Published March 2023 | Version v1
Conference paper Open

Scalable Distributed Trajectory Clustering Using Apache Spark

  • 1. University of Peloponnese
  • 2. National Centre of Scientific Research Demokritos
  • 3. University of Piraeus
  • 4. University of the Aegean

Description

Trajectory clustering is an important problem in which position data of mobile objects, such as vehicles and vessels, is analyzed to extract knowledge used in a wide range of management tasks. Recently, the production of data-gathering devices has increased sharply, allowing data to be collected in much larger volumes. This challenges existing clustering algorithms, which by design cannot always handle large datasets. In particular, TRACLUS, one of the best-known trajectory clustering algorithms, is a generalization of DBSCAN to trajectory line segments. However, because of its iterative approach and the repeated use of a spatial index inherited from DBSCAN, TRACLUS’s performance degrades as datasets grow, and it can be extremely slow in some cases. To address this shortcoming, we propose a distributed implementation of TRACLUS, built on Apache Spark, that can operate on very large datasets by applying different types of partitioning to the input data. Results from an empirical evaluation on real-world trajectories show that our distributed variant achieves improved runtime performance and clustering efficiency.
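
To make the idea in the description concrete, the following is a minimal single-machine sketch, not the paper's implementation. It shows DBSCAN-style density clustering applied to line segments, plus a simple grid partitioning step of the kind a distributed variant might use to split the input. Several things here are assumptions for illustration only: the names `segment_distance`, `cluster_segments`, and `partition_by_grid`, the parameters `eps`, `min_lns`, and `cell_size`, and above all the distance function, which is a plain midpoint distance standing in for the segment distance that TRACLUS actually defines. No Spark is used, and the paper's partitioning schemes are not reproduced.

```python
from collections import defaultdict, deque
from math import floor, hypot

NOISE = -1


def midpoint(seg):
    """Midpoint of a segment given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def segment_distance(a, b):
    """Stand-in distance: Euclidean distance between segment midpoints.

    Assumption for illustration only; this is NOT the TRACLUS distance.
    """
    (ax, ay), (bx, by) = midpoint(a), midpoint(b)
    return hypot(ax - bx, ay - by)


def neighbors(segments, i, eps):
    """Indices of all segments within eps of segment i (including i itself)."""
    return [j for j, s in enumerate(segments)
            if segment_distance(segments[i], s) <= eps]


def cluster_segments(segments, eps, min_lns):
    """DBSCAN-style clustering over segments; returns one label per segment."""
    labels = [None] * len(segments)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(segments)):
        if labels[i] is not None:
            continue
        seeds = neighbors(segments, i, eps)
        if len(seeds) < min_lns:
            labels[i] = NOISE  # may later be relabeled as a border segment
            continue
        labels[i] = cluster_id
        queue = deque(j for j in seeds if j != i)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border segment: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(segments, j, eps)
            if len(j_neighbors) >= min_lns:  # core segment: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


def partition_by_grid(segments, cell_size):
    """Hypothetical partitioning: bucket segments by the grid cell of their midpoint."""
    cells = defaultdict(list)
    for seg in segments:
        mx, my = midpoint(seg)
        cells[(floor(mx / cell_size), floor(my / cell_size))].append(seg)
    return dict(cells)


demo_segments = [
    ((0, 0), (1, 0)), ((0, 0.1), (1, 0.1)), ((0, 0.2), (1, 0.2)),
    ((10, 10), (11, 10)), ((10, 10.1), (11, 10.1)), ((10, 10.2), (11, 10.2)),
    ((50, 50), (51, 50)),
]
demo_labels = cluster_segments(demo_segments, eps=0.5, min_lns=3)
demo_cells = partition_by_grid(demo_segments, cell_size=5.0)
print(demo_labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The `neighbors` function scans every segment on each call, which is the repeated neighborhood search that becomes costly as the input grows. Clustering each grid cell independently shrinks those scans, but clusters that cross cell boundaries would still need a merge step, which this sketch omits.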

Files

BMDA_2023_paper_4347.pdf (4.1 MB)
md5:d623d4925e231d066c8a39e5d8805092

Additional details

Funding

European Commission
VesselAI - ENABLING MARITIME DIGITALIZATION BY EXTREME-SCALE ANALYTICS, AI AND DIGITAL TWINS (grant 957237)