Date field extraction from handwritten documents using HMMs

Ranju Mandal, Partha Pratim Roy, Umapada Pal, Michael Blumenstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic document interpretation and retrieval is an important task for accessing digitized handwritten document repositories. The date is an important field in such documents, with applications such as date-wise document indexing and retrieval. This paper proposes a framework for automatic date field extraction from handwritten documents. The system applies sliding window-wise Local Gradient Histogram (LGH)-based features and a character-level Hidden Markov Model (HMM)-based approach for segmentation and recognition. Individual date components in the month-word (month written in word form, e.g. January, Jan), numeral, punctuation and contraction categories are segmented and labelled from a text line. Next, Histogram of Gradient (HoG)-based features and a Support Vector Machine (SVM)-based classifier are used to improve the results obtained from the HMM-based recognition system. Subsequently, both numeric and semi-numeric regular expressions of date patterns are applied to the labelled components to extract date patterns. The experiments are performed on an English document dataset, and the encouraging results indicate the effectiveness of the proposed system.
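
The final step described in the abstract, matching regular expressions against labelled components, can be illustrated with a minimal sketch. It assumes each segmented component has already received one category label and that each digit is one numeral component. The one-letter codes, the extra "other" category, the two patterns and the function name `find_date_spans` are all hypothetical choices made for illustration; they are not the expressions or labels used in the paper.

```python
import re

# Hypothetical one-letter codes for the component categories named in the abstract.
# "other" is an added assumption so that non-date text can be represented.
CODES = {"month": "M", "numeral": "D", "punct": "P", "contraction": "C", "other": "W"}

# Illustrative patterns over the encoded label string (not the paper's own).
NUMERIC = re.compile(r"D{1,2}PD{1,2}PD{2,4}")  # e.g. 12/05/2015
SEMI_NUMERIC = re.compile(
    r"D{1,2}C?MP?D{2,4}"    # e.g. 12th May, 2015
    r"|MD{1,2}C?P?D{2,4}"   # e.g. May 12, 2015
)


def find_date_spans(labels):
    """Return sorted (start, end, kind) component-index spans matching a date pattern."""
    encoded = "".join(CODES[label] for label in labels)
    spans = []
    for kind, pattern in (("numeric", NUMERIC), ("semi-numeric", SEMI_NUMERIC)):
        for match in pattern.finditer(encoded):
            spans.append((match.start(), match.end(), kind))
    return sorted(spans)


# Labels for a line such as "Dated 12th May 2015", one label per component.
line = ["other", "numeral", "numeral", "contraction", "month",
        "numeral", "numeral", "numeral", "numeral"]
print(find_date_spans(line))
```

Encoding the labels as a string lets ordinary regular expressions operate on component categories rather than on recognized characters, so the match positions map directly back to component indices in the text line.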

Original language: English
Title of host publication: 13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015 - Conference Proceedings
Publisher: IEEE Computer Society
Pages: 866-870
Number of pages: 5
ISBN (Electronic): 9781479918058
Publication status: Published - 20 Nov 2015
Event: 13th International Conference on Document Analysis and Recognition, ICDAR 2015 - Nancy, France
Duration: 23 Aug 2015 to 26 Aug 2015

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2015-November
ISSN (Print): 1520-5363

Conference

Conference: 13th International Conference on Document Analysis and Recognition, ICDAR 2015
Country/Territory: France
City: Nancy
Period: 23/08/15 to 26/08/15
