Preserving Text Content from Historical Handwritten Documents

Author(s)
Chakraborty, Arpita
Blumenstein, Michael
Editor(s)

Lisa O'Conner

Date
2016
Location

GREECE

Abstract

We propose a holistic, dynamic method that preserves text content with zero tolerance while removing marginal noise from historical handwritten document images. The key idea is to identify and analyze the region between the sharp peak at the edge and the page frame of the text content at each margin. Depending on the proximity of the sharp peak to the text, the text content is then extracted from the document image. The method automatically adapts its thresholds to each individual document image and is directly applicable to gray-scale images. It is evaluated on four diverse handwritten historical datasets: Queensland State Archive (QSA), Saint Gall, Parzival and the Prosecution Project. Experimental results show that the proposed method achieves higher accuracy than the other methods tested on the Saint Gall and Parzival datasets, while for the other two datasets, both Australian and introduced here for the first time, the results are very encouraging.
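
The abstract's general idea (a sharp peak near the page edge, judged by its proximity to the text, with thresholds adapted per image) can be illustrated with a toy projection-profile sketch. This is not the authors' implementation: the function names, the `margin_fraction` and `gap_ratio` parameters, and the threshold rule (a fraction of the image's own mean column darkness) are all assumptions made for illustration only.

```python
def column_darkness(image):
    """Vertical projection profile: total darkness (255 - gray) per column.

    `image` is a list of rows of gray values, 0 = black, 255 = white.
    """
    height, width = len(image), len(image[0])
    return [sum(255 - image[y][x] for y in range(height)) for x in range(width)]


def left_text_boundary(profile, margin_fraction=0.25, gap_ratio=0.2):
    """Return the first column to keep on the left side (hypothetical rule).

    The strongest peak inside the outer margin zone is treated as candidate
    marginal noise. It is cut only if a low-darkness gap separates it from
    the content further in; otherwise nothing is cropped, so text is never
    sacrificed.
    """
    width = len(profile)
    zone = max(1, int(width * margin_fraction))
    # Assumed adaptive threshold: derived from this image's own profile.
    threshold = gap_ratio * (sum(profile) / width)
    peak = max(range(zone), key=lambda x: profile[x])
    if profile[peak] <= threshold:
        return 0  # no noticeable peak in the margin zone
    for x in range(peak, zone):
        if profile[x] < threshold:
            return x  # first gap column after the peak
    return 0  # peak runs into the content: keep everything


def crop_margins(image):
    """Crop left and right marginal noise; the right side is the mirrored case."""
    profile = column_darkness(image)
    left = left_text_boundary(profile)
    right = len(profile) - left_text_boundary(profile[::-1])
    return [row[left:right] for row in image]


# Toy page: 2 black border columns, 4 white gap columns,
# 10 gray "text" columns, 4 white columns.
toy_row = [0] * 2 + [255] * 4 + [100] * 10 + [255] * 4
toy_page = [list(toy_row) for _ in range(4)]
cropped = crop_margins(toy_page)
```

On the toy page the black border is removed on the left, while on the right the only peak in the margin zone belongs to the text block, so nothing is cut there. Real scans would also need the horizontal profile for the top and bottom margins and a more robust peak test.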

Conference Title

Proceedings of 12th IAPR Workshop on Document Analysis Systems (DAS 2016)

Subject

Artificial intelligence not elsewhere classified
