Binarization of Old Halftone Text Documents
Author(s)
Adak, Chandranath
Maitra, Prantik
Chaudhuri, Bidyut B.
Blumenstein, Michael
Year published
2015
Abstract
A degraded document image should be cleaned before it is subjected to Optical Character Recognition (OCR); otherwise the result may be erroneous. Although degraded document image cleaning has been studied extensively, halftone documents have received less attention. Because halftone documents contain halftone dot patterns, classical binarization techniques do not produce output suitable for feeding into an OCR engine. In this paper, old halftone documents are considered for text area cleaning and binarization. First, the zone of interest (text area) is found using local binary pattern and contour analysis. Reasonably smaller zones are filtered out as noise. Then the foreground pixels are separated using background estimation. After this, an automated spatial smoothing technique is applied to the foreground. Finally, a local binarization technique is used to produce the binary image. The proposed method was tested on various old and degraded halftone documents and produced fairly good results.
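The abstract does not say which smoothing filter or local binarization rule the method uses. As a rough sketch only, the following assumes a box (mean) filter for the smoothing step and a local-mean threshold for the binarization step. The function names, window radii, and offset are illustrative choices, not taken from the paper.

```python
# Illustrative sketch: box smoothing and local-mean thresholding on a
# grayscale image stored as a list of rows (0 = black, 255 = white).
# These are assumed stand-ins, not the techniques used in the paper.

def window_mean(img, r, c, radius):
    """Mean of the square window centred at (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    total = 0
    count = 0
    for i in range(max(0, r - radius), min(rows, r + radius + 1)):
        for j in range(max(0, c - radius), min(cols, c + radius + 1)):
            total += img[i][j]
            count += 1
    return total / count


def box_smooth(img, radius=1):
    """Replace each pixel by its window mean, which blurs isolated dots."""
    return [[window_mean(img, r, c, radius) for c in range(len(img[0]))]
            for r in range(len(img))]


def local_mean_binarize(img, radius=2, offset=10):
    """Mark a pixel as ink (1) if it is darker than its local mean minus offset."""
    return [[1 if img[r][c] < window_mean(img, r, c, radius) - offset else 0
             for c in range(len(img[0]))]
            for r in range(len(img))]


# A light page with one dark vertical stroke in the middle column.
page = [[200, 200, 30, 200, 200] for _ in range(5)]
binary = local_mean_binarize(page, radius=2, offset=10)
```

Because the threshold follows the local mean instead of a single global value, the stroke is separated from the background even when the page brightness varies from region to region. In practice the smoothing would be applied to the extracted foreground before thresholding.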
Conference Title
2015 IEEE Region 10 Conference. Academic-Research-Industrial Collaboration: Challenges & Opportunities in Electrical, Electronic, Computer, Communication and Information Engineering
Subject
Communications Technologies not elsewhere classified