PM-LSH: A Fast and Accurate In-Memory Framework for High-Dimensional Approximate NN and Closest Pair Search

File version

Accepted Manuscript (AM)

Author(s)
Zheng, Bolong
Zhao, Xi
Weng, Lianggui
Nguyen, Quoc Viet Hung
Liu, Hang
Jensen, Christian
Date
2021
Abstract

Nearest neighbor (NN) search is inherently computationally expensive in high-dimensional spaces due to the curse of dimensionality. As a well-known solution, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket-based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate in-memory LSH framework, called PM-LSH, that aims to compute the c-ANN query on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. In addition, we extend PM-LSH to support closest pair (CP) search in high-dimensional spaces. Here, we again adopt the PM-tree to organize the points in a low-dimensional space, and we propose a branch and bound algorithm together with a radius pruning technique to improve the performance of computing c-approximate closest pair (c-ACP) queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy for both NN and CP search.
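The general idea of estimating distances in a low-dimensional projected space can be illustrated with a minimal sketch. This is not the PM-LSH algorithm: it replaces the PM-tree with a brute-force scan of the projected points, and the names `make_projection`, `project`, `ann_query`, and the `budget` parameter are hypothetical, introduced only for illustration. It assumes Gaussian random projections, for which the squared projected distance divided by the squared original distance follows a chi-squared distribution with m degrees of freedom, which is what makes the projected distance usable as an estimator.

```python
import math
import random


def make_projection(dim, m, seed=0):
    """Draw m random Gaussian directions (an assumed 2-stable LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def project(point, directions):
    """Map a high-dimensional point to its m hash values."""
    return [sum(a * x for a, x in zip(d, point)) for d in directions]


def dist(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def ann_query(query, data, directions, budget):
    """Return the index of an approximate nearest neighbour of `query`.

    Candidates are ranked by distance in the projected space (a brute-force
    stand-in for a tree index), and only the best `budget` candidates are
    verified with the exact distance in the original space.
    """
    pq = project(query, directions)
    # In a real index the projections would be precomputed once.
    projected = [project(p, directions) for p in data]
    order = sorted(range(len(data)), key=lambda i: dist(pq, projected[i]))
    candidates = order[:budget]
    return min(candidates, key=lambda i: dist(query, data[i]))
```

A larger `budget` verifies more candidates and so trades query time for accuracy; with `budget` equal to the dataset size the sketch degenerates to an exact linear scan.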

Journal Title

The VLDB Journal

Rights Statement

© 2021 Springer-Verlag Berlin Heidelberg. This is an electronic version of an article published in The VLDB Journal, 2021. The VLDB Journal is available online at: http://link.springer.com/ with the open URL of your article.

Note

This publication has been entered as an advanced online version in Griffith Research Online.

Subject

Artificial intelligence

Citation

Zheng, B; Zhao, X; Weng, L; Nguyen, QVH; Liu, H; Jensen, C, PM-LSH: A Fast and Accurate In-Memory Framework for High-Dimensional Approximate NN and Closest Pair Search, The VLDB Journal, 2021
