Scalable Distributed Trajectory Clustering Using Apache Spark
Authors/Creators
Description
Trajectory clustering is an important problem in which position data of mobile objects, such as vehicles and vessels, is analyzed to extract knowledge used in a wide range of management tasks. Recently, the production of data-gathering devices has increased sharply, allowing data to be collected in much larger volumes. This challenges existing clustering algorithms, which were not always designed to handle large datasets. In particular, TRACLUS, one of the best-known trajectory clustering algorithms, generalizes DBSCAN to trajectory line segments. However, because of the iterative approach and the repeated use of a spatial index that it inherits from DBSCAN, TRACLUS’s performance degrades as datasets grow, and it can be extremely slow in some cases. To tackle this shortcoming, we propose a distributed implementation of TRACLUS, built on Apache Spark, that can operate on very large datasets by applying different types of partitioning to the input data. Results from an empirical evaluation on real-world trajectories illustrate that our distributed variant achieves improved runtime performance and clustering efficiency.
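The description does not say which partitioning schemes the paper uses, so the following is only a generic illustration of one common idea: splitting line segments across a uniform grid so that each cell can be clustered independently. It is a minimal sketch in plain Python, not the authors' method. The function names, `cell_size`, and `eps` are hypothetical, and proximity is treated as plain Euclidean distance, which is simpler than the segment distance TRACLUS uses. Each segment is copied into every cell that its bounding box, expanded by `eps`, overlaps, so segments within `eps` of each other share at least one cell.

```python
from collections import defaultdict
from math import floor


def grid_cells(segment, cell_size, eps):
    """Return the grid cells overlapped by the segment's eps-expanded bounding box.

    `segment` is ((x1, y1), (x2, y2)). The names and the uniform grid are
    illustrative assumptions, not taken from the paper.
    """
    (x1, y1), (x2, y2) = segment
    min_cx = floor((min(x1, x2) - eps) / cell_size)
    max_cx = floor((max(x1, x2) + eps) / cell_size)
    min_cy = floor((min(y1, y2) - eps) / cell_size)
    max_cy = floor((max(y1, y2) + eps) / cell_size)
    return [
        (cx, cy)
        for cx in range(min_cx, max_cx + 1)
        for cy in range(min_cy, max_cy + 1)
    ]


def partition_segments(segments, cell_size, eps):
    """Map each grid cell to the ids of the segments assigned to it.

    A segment near a cell border is replicated into the neighbouring cell,
    so a per-cell neighbourhood search does not miss nearby segments.
    """
    partitions = defaultdict(list)
    for seg_id, segment in enumerate(segments):
        for cell in grid_cells(segment, cell_size, eps):
            partitions[cell].append(seg_id)
    return dict(partitions)


# Three toy segments: the second one lies close to the border at x = 10.
segments = [
    ((1.0, 1.0), (2.0, 2.0)),
    ((9.5, 1.0), (9.8, 2.0)),
    ((25.0, 25.0), (26.0, 26.0)),
]
partitions = partition_segments(segments, cell_size=10.0, eps=1.0)
for cell in sorted(partitions):
    print(cell, partitions[cell])
```

In a Spark setting, the cell key could serve as the key of a key-value dataset so that each cell is processed by one task. Replicating border segments means that clusters found in adjacent cells would still need a merge step afterwards.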
Files
| Name | Size | Checksum |
|---|---|---|
| BMDA_2023_paper_4347.pdf | 4.1 MB | md5:d623d4925e231d066c8a39e5d8805092 |