Evaluation of Gist Operator for Document Image Retrieval

File version
Accepted Manuscript (AM)
Author(s)
Alaei, Fahimeh
Alaei, Alireza
Pal, Umapada
Blumenstein, Michael
Year published
2018
Abstract
As digitised documents normally contain a wide variety of structures, a document image retrieval method that requires no page segmentation and is independent of document layout is preferable. In this research work, therefore, a wavelet transform is first applied to the original image to obtain several down-sampled sub-images. The Gist operator is then employed as a feature extraction technique to extract a set of global features from the original image as well as from the sub-images obtained by the wavelet transform. In addition, the column-wise variances of the values in each sub-image are computed and concatenated to form a second feature set. For each feature set, locality-sensitive hashing is employed to compute similarity distances between a query and the document images in the database. Finally, a classifier fusion technique based on the mean function combines these distances to produce the document image retrieval result. The combination of these features with a clustering score fusion strategy provides higher document image retrieval accuracy. Two different document image databases are used for experimentation. The experimental results are reported in detail and are encouraging.
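The abstract does not name the wavelet or the decomposition depth. As a minimal sketch, assuming a single-level 2D Haar transform (averaging/differencing variant) on a grey-level image stored as a list of rows with even height and width, the code below produces four half-resolution sub-images and concatenates the column-wise variances of each one into a feature vector. The function names `haar2d`, `column_variances` and `variance_features` are hypothetical.

```python
from statistics import pvariance


def haar2d(img):
    """One-level 2D Haar decomposition (assumed wavelet; averaging variant).

    img: list of rows of numbers; height and width must both be even.
    Returns four half-resolution sub-images (LL, LH, HL, HH).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + d + e) / 4)  # local average (approximation)
            row_lh.append((a + b - d - e) / 4)  # top vs bottom: horizontal edges
            row_hl.append((a - b + d - e) / 4)  # left vs right: vertical edges
            row_hh.append((a - b - d + e) / 4)  # diagonal detail
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def column_variances(sub_image):
    """Population variance of each column of a sub-image."""
    return [pvariance(col) for col in zip(*sub_image)]


def variance_features(img):
    """Concatenate the column-wise variances of every wavelet sub-image."""
    feats = []
    for sub in haar2d(img):
        feats.extend(column_variances(sub))
    return feats
```

Because the length of this vector depends on the image width, images would presumably be resized to a common size before features are compared; that step is not shown.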
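The Gist descriptor is commonly computed by filtering an image with a bank of Gabor filters at several scales and orientations and averaging the response magnitudes over a coarse spatial grid. The sketch below is a simplified, spatial-domain version of that idea, not the Gist implementation used in the paper: the wavelengths, the four orientations, the 4×4 grid, the kernel size and `sigma` are assumed values, and `gabor_kernel`, `filter_magnitude` and `gist_like` are hypothetical names. In the pipeline described above, such a descriptor would be computed for the original image and for each wavelet sub-image.

```python
import math


def gabor_kernel(wavelength, theta):
    """Zero-mean even (cosine) Gabor kernel; sigma and size are assumed choices."""
    sigma = 0.5 * wavelength
    half = wavelength
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    # Remove the DC component so uniform regions give (almost) no response.
    mean = sum(sum(row) for row in kernel) / (len(kernel) ** 2)
    return [[v - mean for v in row] for row in kernel]


def filter_magnitude(img, kernel):
    """|filter response| with zero padding; output has the same size as img.

    The kernel is symmetric under 180-degree rotation, so this correlation
    equals a convolution.
    """
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy, krow in enumerate(kernel):
                rr = r + dy - half
                if 0 <= rr < h:
                    for dx, kv in enumerate(krow):
                        cc = c + dx - half
                        if 0 <= cc < w:
                            acc += kv * img[rr][cc]
            out[r][c] = abs(acc)
    return out


def gist_like(img, wavelengths=(4, 8), orientations=4, grid=4):
    """Mean response magnitude per grid cell, for every scale and orientation.

    Assumes the image is at least `grid` pixels high and wide.
    """
    h, w = len(img), len(img[0])
    feats = []
    for lam in wavelengths:
        for i in range(orientations):
            resp = filter_magnitude(img, gabor_kernel(lam, math.pi * i / orientations))
            for gy in range(grid):
                r0, r1 = h * gy // grid, h * (gy + 1) // grid
                for gx in range(grid):
                    c0, c1 = w * gx // grid, w * (gx + 1) // grid
                    cell = [resp[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                    feats.append(sum(cell) / len(cell))
    return feats
```

With the default parameters the descriptor has 2 × 4 × 16 = 128 values. Naive spatial convolution in pure Python is slow, so a practical system would filter in the frequency domain or use an image-processing library.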
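The abstract does not state which locality-sensitive hashing family is used or how the distances from different feature sets are normalised before fusion. The sketch below assumes random-hyperplane (sign random projection) LSH, Hamming distance between hash codes, min-max normalisation per feature set, and a plain mean as the fusion function; all function names and the `n_bits` and `seed` parameters are hypothetical. Each feature set gets its own random hyperplanes, and documents are ranked by their fused distance, where smaller means more similar.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (sign-random-projection LSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)


def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))


def min_max(scores):
    """Rescale scores to [0, 1] so different feature sets are comparable."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def fused_ranking(query_feats, db_feats, n_bits=64, seed=0):
    """Rank database documents by the mean of per-feature-set LSH distances.

    query_feats: one feature vector per feature set (e.g. Gist, variances).
    db_feats: for each document, its feature vectors in the same order.
    Returns document indices, most similar first.
    """
    per_set = []
    for k, q in enumerate(query_feats):
        planes = make_hyperplanes(len(q), n_bits, seed + k)
        q_code = lsh_code(q, planes)
        dists = [hamming(q_code, lsh_code(doc[k], planes)) for doc in db_feats]
        per_set.append(min_max(dists))
    fused = [sum(col) / len(col) for col in zip(*per_set)]  # mean fusion
    return sorted(range(len(db_feats)), key=lambda i: fused[i])
```

For example, if document 0 has exactly the query's feature vectors, document 1 has them negated (every hash bit flips) and document 2 has them scaled by 2 (same hash codes), `fused_ranking` returns `[0, 2, 1]`.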
Conference Title
2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
Copyright Statement
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subject
Pattern recognition
Data mining and knowledge discovery