Date field extraction from handwritten documents using HMMs

File version
Accepted Manuscript (AM)
Author(s)
Mandal, Ranju
Roy, Partha Pratim
Pal, Umapada
Blumenstein, Michael
Year published
2015
Abstract
Automatic document interpretation and retrieval is an important task for accessing repositories of digitized handwritten documents. The date is an important field in such documents, with applications such as date-wise document indexing and retrieval. This paper proposes a framework for automatic date field extraction from handwritten documents. Sliding-window Local Gradient Histogram (LGH)-based features and a character-level Hidden Markov Model (HMM)-based approach are applied for segmentation and recognition. Individual date components in the month-word (a month written in word form, e.g. January or Jan), numeral, punctuation and contraction categories are segmented and labelled from a text line. Next, Histogram of Gradient (HoG)-based features and a Support Vector Machine (SVM)-based classifier are used to improve the results obtained from the HMM-based recognition system. Subsequently, both numeric and semi-numeric regular expressions of date patterns are applied to the labelled components to extract date patterns. The experiments are performed on an English document dataset, and the encouraging results indicate the effectiveness of the proposed system.
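The last stage named in the abstract, matching numeric and semi-numeric date patterns over labelled components, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the one-letter codes, the extra "other" category for non-date words, the two regular expressions and the function name find_dates are all assumptions made for illustration, and the input is assumed to be a text line already segmented and labelled by an upstream recogniser.

```python
import re
from typing import List, Tuple

# Hypothetical one-letter codes for the component categories in the abstract.
# "other" is an assumed extra category for words that are not date components.
CODES = {
    "month_word": "M",
    "numeral": "N",
    "punctuation": "P",
    "contraction": "C",
    "other": "O",
}

# Illustrative patterns over the code string (assumptions, not the paper's own).
NUMERIC = re.compile(r"NPNPN")                  # e.g. 12 / 05 / 2015
SEMI_NUMERIC = re.compile(r"NC?MP?N|MNC?P?N")   # e.g. 12 th May 2015; May 12 , 2015


def find_dates(components: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (pattern kind, matched text) pairs for one labelled text line.

    Each component is a (text, category) pair. Every component maps to exactly
    one code character, so regex match offsets index directly into the list.
    """
    codes = "".join(CODES[category] for _, category in components)
    hits = []
    for kind, pattern in (("numeric", NUMERIC), ("semi-numeric", SEMI_NUMERIC)):
        for match in pattern.finditer(codes):
            matched = components[match.start():match.end()]
            hits.append((kind, " ".join(text for text, _ in matched)))
    return hits


if __name__ == "__main__":
    line = [
        ("Received", "other"),
        ("12", "numeral"),
        ("th", "contraction"),
        ("May", "month_word"),
        ("2015", "numeral"),
    ]
    print(find_dates(line))
```

Running the regular expressions over category codes rather than over the recognised characters keeps the date patterns independent of the exact transcription, so a misread digit still leaves the component in the numeral category and the pattern can still match.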
Conference Title
2015 13th IAPR International Conference on Document Analysis and Recognition (ICDAR)
Copyright Statement
© 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subject
Artificial intelligence not elsewhere classified