Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
File version
Accepted Manuscript (AM)
Author(s)
Dezfouli, Amir
Chades, Iadine
Sanderson, Conrad
Editor(s)
Gong, Mingming
Song, Yiliao
Koh, Yun Sing
Xiang, Wei
Wang, Derui
Location
Melbourne, Australia
Abstract
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale, high-dimensional datasets. ANN methods typically differ in the index structure used to accelerate searches, which results in different recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (such as the addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches consider neither the computational cost of updating the index structure nor the rate and size of index updates. To address this, we empirically evaluate five popular ANN methods on two main applications (online data collection and online feature learning) while taking these considerations into account. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often-used k-d trees method is not suitable for dynamic datasets, as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over the baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than the baseline for recall rates below 75%.
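The two quantities the abstract relies on, recall and the exhaustive search baseline, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The names `ExhaustiveIndex`, `recall_at_k`, and `simulate`, the random data, and the "stale index" stand-in for an approximate method are all hypothetical and chosen only for illustration. The sketch shows why index updates matter on a dynamic dataset: an index that has not yet absorbed the newest batch can miss true neighbours, which lowers recall.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ExhaustiveIndex:
    """Baseline with no index structure: samples live in a plain list,
    so an update is a cheap append and a search scans every sample."""

    def __init__(self):
        self.samples = []

    def add(self, batch):
        self.samples.extend(batch)

    def search(self, query, k):
        """Return the ids of the k samples closest to the query (exact)."""
        ids = range(len(self.samples))
        return heapq.nsmallest(
            k, ids, key=lambda i: squared_l2(self.samples[i], query)
        )


def recall_at_k(returned_ids, exact_ids):
    """Fraction of the true k nearest neighbours that were returned."""
    return len(set(returned_ids) & set(exact_ids)) / len(exact_ids)


def simulate(num_batches=3, batch_size=200, dim=8, k=10, seed=0):
    """Hypothetical online data collection: one batch arrives, then one query.

    `fresh` is updated before each query; `stale` is updated only after it,
    standing in for a method whose index lags behind the data.
    """
    rng = random.Random(seed)
    fresh, stale = ExhaustiveIndex(), ExhaustiveIndex()
    recalls = []
    for _ in range(num_batches):
        batch = [[rng.random() for _ in range(dim)] for _ in range(batch_size)]
        fresh.add(batch)
        query = [rng.random() for _ in range(dim)]
        exact = fresh.search(query, k)       # ground truth on current data
        returned = stale.search(query, k)    # answer from the lagging index
        recalls.append(recall_at_k(returned, exact))
        stale.add(batch)                     # lagging index catches up late
    return recalls
```

Calling `simulate()` returns one recall value per batch. The first value is 0.0 because the lagging index is still empty when the first query arrives. A fuller comparison would also time `add` and `search` for each method, since the abstract's point is that update cost belongs in the trade-off alongside recall and query time.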
Conference Title
AI 2024: Advances in Artificial Intelligence: 37th Australasian Joint Conference on Artificial Intelligence, AI 2024, Melbourne, VIC, Australia, November 25–29, 2024, Proceedings, Part II
Rights Statement
This work is covered by copyright. You must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a specified licence, refer to the licence for details of permitted re-use. If you believe that this work infringes copyright please make a copyright takedown request using the form at https://www.griffith.edu.au/copyright-matters.
Citation
Harwood, B; Dezfouli, A; Chades, I; Sanderson, C, Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation, AI 2024: Advances in Artificial Intelligence: 37th Australasian Joint Conference on Artificial Intelligence, AI 2024, Melbourne, VIC, Australia, November 25–29, 2024, Proceedings, Part II, 2024, pp. 95-106