Scalable and Fast Top-k Most Similar Trajectories Search Using MapReduce In-Memory
Top-k most similar trajectories search (k-NN) is frequently used for classification and in recommendation systems over spatial-temporal trajectory databases. However, k-NN trajectory search is a complex operation, and a multi-user application should be able to process multiple k-NN trajectory searches concurrently and efficiently over large-scale data. The k-NN trajectories problem has received plenty of attention; however, state-of-the-art works either consider neither in-memory parallel processing of k-NN trajectories nor concurrent queries in distributed environments, or parallelize k-NN search for simpler spatial objects (i.e., 2D points) using MapReduce while ignoring the temporal dimension of spatial-temporal trajectories. In this work we propose a distributed parallel approach for k-NN trajectory search in a multi-user environment using in-memory MapReduce. We propose a space/time data partitioning based on Voronoi diagrams and time pages, named Voronoi Pages, in order to provide both spatial-temporal data organization and process decentralization. In addition, we propose a spatial-temporal index over our partitions to efficiently prune the search space and to improve system throughput and scalability. We implemented our solution on top of Spark's RDD data structure, which provides a thread-safe environment for concurrent MapReduce tasks in main memory. We perform extensive experiments to demonstrate the performance and scalability of our approach.
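To make the space/time partitioning idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that a Voronoi cell is identified by the nearest of a fixed set of pivot points and that a time page is a fixed-length time window; the function names, the pivot coordinates, and the page length are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import floor, hypot


def voronoi_cell(x, y, pivots):
    """Return the index of the nearest pivot, i.e. the Voronoi cell containing (x, y)."""
    return min(
        range(len(pivots)),
        key=lambda i: hypot(x - pivots[i][0], y - pivots[i][1]),
    )


def time_page(t, page_length):
    """Return the index of the fixed-length time window that contains t."""
    return floor(t / page_length)


def partition(samples, pivots, page_length):
    """Group trajectory samples (traj_id, x, y, t) by (cell, page) key.

    This plays the role of a map step that emits a key per sample,
    followed by a group-by-key step. Assumption: one sample goes to
    exactly one partition.
    """
    pages = defaultdict(list)
    for traj_id, x, y, t in samples:
        key = (voronoi_cell(x, y, pivots), time_page(t, page_length))
        pages[key].append((traj_id, x, y, t))
    return dict(pages)


# Hypothetical data: two pivots, 60-second time pages.
pivots = [(0.0, 0.0), (10.0, 10.0)]
samples = [("a", 1.0, 1.0, 5.0), ("a", 9.0, 9.0, 65.0), ("b", 2.0, 0.0, 30.0)]
pages = partition(samples, pivots, page_length=60.0)
```

With a keying of this kind, a query only needs to inspect the partitions whose cell and page overlap its own spatial and temporal extent, which is the intuition behind pruning the search space. How the pivots are chosen and how the index is built are not specified in this abstract.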
Lecture Notes in Computer Science
Copyright 2016 Springer International Publishing AG. This is the author-manuscript version of this paper. Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com.