dc.contributor.author: Chaudhuri, Bidyut B.
dc.contributor.author: Adak, Chandranath
dc.date.accessioned: 2017-07-27T23:16:10Z
dc.date.available: 2017-07-27T23:16:10Z
dc.date.issued: 2017
dc.identifier.issn: 0031-3203
dc.identifier.doi: 10.1016/j.patcog.2016.07.032
dc.identifier.uri: http://hdl.handle.net/10072/342701
dc.description.abstract: This paper deals with the identification and processing of struck-out texts in unconstrained offline handwritten document images. If fed to an OCR engine, such texts produce nonsensical character-string outputs. Here we present a method for identifying such texts that combines (a) pattern classification and (b) a graph-based approach. For (a), a feature-based two-class (normal vs. struck-out text) SVM classifier is used to detect moderate-sized struck-out components. For (b), the skeleton of the text component is treated as a graph, and the strike-out stroke is identified using a constrained shortest-path algorithm. To identify zigzag or wavy strike-outs, all paths are found and certain properties of zigzag and wavy lines are exploited. Some other types of strike-out strokes are also detected by modifying this method. Large multi-word and multi-line struck-out regions are segmented into smaller components and processed in the same way. The detected struck-out texts can then be blocked from entering the OCR engine. In another application, involving historical documents, page images must be generated along with their annotated ground truth. In this case, the strike-out strokes can be deleted from the words, which are then fed to the OCR engine. For this purpose, an inpainting-based cleaning approach is employed. Experiments on 500 document pages yielded an overall F-measure of 91.56% (91.06%) for struck-out text detection in English (Bengali) script. For strike-out stroke identification and deletion, the F-measures were 89.65% (89.31%) and 91.16% (89.29%), respectively.
dc.description.peerreviewed: Yes
dc.language: English
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.ispartofpagefrom: 282
dc.relation.ispartofpageto: 294
dc.relation.ispartofjournal: Pattern Recognition
dc.relation.ispartofvolume: 61
dc.subject.fieldofresearch: Artificial Intelligence and Image Processing not elsewhere classified
dc.subject.fieldofresearch: Artificial Intelligence and Image Processing
dc.subject.fieldofresearch: Information Systems
dc.subject.fieldofresearch: Electrical and Electronic Engineering
dc.subject.fieldofresearchcode: 080199
dc.subject.fieldofresearchcode: 0801
dc.subject.fieldofresearchcode: 0806
dc.subject.fieldofresearchcode: 0906
dc.title: An approach for detecting and cleaning of struck-out handwritten text
dc.type: Journal article
dc.type.description: C1 - Articles
dc.type.code: C - Journal Articles
gro.hasfulltext: No Full Text
gro.griffith.author: Adak, Chandranath


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • Journal articles
    Contains articles published by Griffith authors in scholarly journals.