Hashing with non-linear manifold learning

File version
Accepted Manuscript (AM)
Author(s)
Liu, Yanzhen
Bai, Xiao
Yan, Cheng
Wang, Jing
Zhou, Jun
Year published
2016
Abstract
The amount of data is exploding with the development of Internet and multimedia technology, and rapid retrieval from massive datasets is becoming increasingly important. To meet this demand, many approximate nearest neighbor methods have been proposed to accelerate the exhaustive search process. Hashing is one such method, offering a good balance between time and accuracy. Hashing methods achieve fast retrieval by converting high-dimensional raw data into binary hash codes while preserving the similarity of the original data in the mapped codes. Many hashing approaches use the Euclidean distance as the similarity measure. However, data in many datasets are distributed on a non-linear manifold, so the geodesic distance on the manifold can represent the semantic similarity of the original data points more accurately than the Euclidean distance. This enables better preservation of semantic similarity in the hash code when the original dataset is mapped to a low-dimensional space. In this paper, we propose to use Isometric Mapping (ISOMAP) for dimensionality reduction and iterative quantization to reduce the quantization loss during the hashing process. The experiments show that our manifold learning method outperforms several alternative hashing methods. Retrieval performance is further boosted when the iterative quantization process is added to Diffusion Hashing (DH) and Spectral Hashing.
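The two stages named in the abstract, ISOMAP for dimensionality reduction followed by iterative quantization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`isomap_embed`, `itq_codes`), the neighbourhood size, the 4-bit code length, and the toy half-cylinder data are all assumptions chosen for illustration, and the ISOMAP step is a textbook version (kNN graph, Floyd-Warshall shortest paths, classical MDS) written with NumPy only.

```python
import numpy as np


def isomap_embed(X, n_neighbors=8, n_components=4):
    """Textbook ISOMAP: kNN graph -> shortest paths -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only edges to each point's k nearest neighbours (symmetrised).
    graph = np.full((n, n), np.inf)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    graph[rows, cols] = dist[rows, cols]
    graph[cols, rows] = dist[rows, cols]
    np.fill_diagonal(graph, 0.0)
    # Floyd-Warshall: graph shortest paths approximate geodesic distances.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("kNN graph is disconnected; increase n_neighbors")
    # Classical MDS: double-centre squared distances, keep top eigenvectors.
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    gram = -0.5 * J @ (graph ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def itq_codes(V, n_iter=50, seed=0):
    """Iterative quantization: learn a rotation R that reduces the
    quantization loss ||B - V R||_F, then binarise sign(V R)."""
    rng = np.random.default_rng(seed)
    V = V - V.mean(axis=0)                      # zero-centre the embedding
    c = V.shape[1]
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)     # fix R, update codes
        U, _, Vt = np.linalg.svd(B.T @ V)       # fix B, orthogonal Procrustes
        R = Vt.T @ U.T
    return (V @ R >= 0).astype(np.uint8), R


# Toy data (assumed): a 15 x 10 grid on a half-cylinder of radius 3.
theta, h = np.meshgrid(np.linspace(0.0, np.pi, 15), np.linspace(0.0, 6.0, 10))
X = np.column_stack([3 * np.cos(theta.ravel()),
                     3 * np.sin(theta.ravel()),
                     h.ravel()])

Y = isomap_embed(X, n_neighbors=8, n_components=4)
codes, R = itq_codes(Y)

# Retrieval by Hamming distance from the first point's code.
hamming = (codes != codes[0]).sum(axis=1)
```

In this sketch each row of `codes` is a 4-bit hash code, and ranking by `hamming` stands in for the fast retrieval step. A practical system would pack the bits and use a learned out-of-sample mapping for queries, which the sketch does not attempt.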
Conference Title
2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
Copyright Statement
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subject
Pattern recognition
Data mining and knowledge discovery