"Quality" vs. "Readability" in Document Images: Statistical Analysis of Human Perception

File version
Accepted Manuscript (AM)
Author(s)
Alaei, Alireza
Raveaux, Romain
Conte, Donatello
Stantic, Bela
Year published
2018
Abstract
Based on the hypothesis that a good-quality document image is most probably readable, and a poor-quality one most probably unreadable, document image quality and readability have been used interchangeably in the literature. These two terms, however, have different meanings and imply two different perspectives from which human beings look at document images. In document images, the level of quality and the degree of readability may be related or correlated in terms of human perception. However, to the best of our knowledge, no specific study has characterised this relation or validated the above-mentioned hypothesis. In this work, we first created a dataset composed mostly of camera-based document images with various distortion levels. Each document image was then assessed by different individuals with regard to two different measures: the level of quality and the degree of readability. A detailed Normalised Cross Correlation analysis, along with statistical analyses based on the Shapiro-Wilk and Wilcoxon tests, is further provided to demonstrate how document image quality and readability are linked. Our findings indicate that quality and readability differ somewhat in terms of their population distributions. However, the correlation between quality and readability was 0.99, which implies that document quality and readability are highly correlated in human perception.
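To make the correlation figure concrete, the following is a minimal sketch of how a correlation between per-image quality and readability scores might be computed. It assumes the common zero-mean, zero-lag form of normalised cross-correlation (which equals the Pearson correlation coefficient); the paper's exact formulation, score scale, and aggregation may differ. The function name and the score lists are hypothetical and illustrative only, not the paper's data.

```python
import math
from typing import Sequence


def normalised_cross_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalised cross-correlation of two equal-length score lists at zero lag.

    Assumption: this zero-mean, zero-lag form (equivalent to Pearson's r) stands in
    for the paper's NCC analysis; the authors' exact definition may differ.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Centre each list so that constant offsets between raters do not matter.
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        raise ValueError("a score list has zero variance")
    # Result lies in [-1, 1]; values near 1 mean the two scores rise and fall together.
    return numerator / denominator


# Hypothetical mean opinion scores for five images (illustrative only, not the paper's data).
quality = [4.5, 3.8, 2.1, 1.4, 3.0]
readability = [4.7, 3.6, 2.4, 1.2, 3.1]
print(round(normalised_cross_correlation(quality, readability), 3))
```

A high correlation does not by itself mean the two score distributions are identical, which is why distribution-level tests are reported alongside it. Assuming paired per-image scores, `scipy.stats.shapiro` (normality check) and `scipy.stats.wilcoxon` (paired signed-rank test) are standard implementations of the two tests named in the abstract.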
Conference Title
2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
Copyright Statement
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subject
Artificial intelligence not elsewhere classified