Published April 8, 2020 | Version v1
Conference paper | Open

A Parallel and Distributed Approach for Diversified Top-k Best Region Search

  • 1. Eindhoven University of Technology
  • 2. EPFL
  • 3. Athena R.C.

Description

Given a set of points, the Best Region Search problem asks for the optimal location of a rectangle of a specified size such that the value of a user-defined scoring function over its enclosed points is maximized. A recently proposed top-k algorithm for this problem returns results progressively, while also incorporating additional constraints, such as the overlap among the selected top-k rectangles. However, that algorithm is designed for a centralized setting and does not scale to very large datasets. In this paper, we overcome this limitation by enabling parallel and distributed computation of the results. We first propose a strategy that employs multiple rounds to progressively collect partial top-k results from each node in the cluster, while a coordinator handles the aggregation of the global top-k list, dealing with overlapping results. We then devise a single-round strategy, where the algorithm executed by each node is enhanced with additional conditions that anticipate potential overlapping solutions from neighboring nodes. Additional optimizations are proposed to further increase performance. Our experiments on real-world datasets indicate that our proposed algorithms are efficient and scale to millions of points.
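
For readers unfamiliar with the problem, the following is a minimal, centralized brute-force sketch of Best Region Search with a greedy diversified top-k selection. It is not the algorithm proposed in the paper and makes several simplifying assumptions: the scoring function is monotone (here, the number of enclosed points), so it suffices to try rectangles whose lower-left corner is aligned with point coordinates; rectangles are closed and axis-parallel; and "diversified" is reduced to forbidding any overlap between selected rectangles. All function and parameter names are hypothetical.

```python
from itertools import product


def enclosed(points, x0, y0, width, height):
    """Return the points inside the closed rectangle with lower-left corner (x0, y0)."""
    return [(x, y) for (x, y) in points
            if x0 <= x <= x0 + width and y0 <= y <= y0 + height]


def candidate_anchors(points):
    """Lower-left corners aligned with point coordinates (valid for monotone scores)."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    return product(xs, ys)


def best_region(points, width, height, score=len):
    """Brute-force search for the single highest-scoring rectangle placement."""
    best = None
    for x0, y0 in candidate_anchors(points):
        s = score(enclosed(points, x0, y0, width, height))
        if best is None or s > best[0]:
            best = (s, (x0, y0))
    return best


def overlaps(a, b, width, height):
    """True if two equal-size rectangles anchored at a and b share interior area."""
    return abs(a[0] - b[0]) < width and abs(a[1] - b[1]) < height


def top_k_regions(points, width, height, k, score=len):
    """Greedy selection: take candidates by descending score, skipping overlaps.

    Assumption: no overlap at all is tolerated between selected rectangles.
    """
    candidates = [(score(enclosed(points, x0, y0, width, height)), (x0, y0))
                  for x0, y0 in candidate_anchors(points)]
    candidates.sort(key=lambda c: -c[0])  # stable sort keeps anchor order on ties
    selected = []
    for s, anchor in candidates:
        if all(not overlaps(anchor, other, width, height) for _, other in selected):
            selected.append((s, anchor))
            if len(selected) == k:
                break
    return selected


points = [(0, 0), (1, 1), (2, 2), (10, 10), (11, 11)]
print(best_region(points, 2, 2))       # (3, (0, 0))
print(top_k_regions(points, 2, 2, 2))  # [(3, (0, 0)), (2, (10, 10))]
```

This sketch evaluates every candidate anchor against every point, so it is only practical for small inputs; the scalability to millions of points described above depends on the distributed strategies in the paper, which are not reproduced here.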

Files

EDBT2020_paper_131.pdf

Files (1.0 MB)

md5:7f6f900241483a3f4db155797535a5e4

Additional details

Funding

SmartDataLake – Sustainable Data Lakes for Extreme-Scale Analytics (grant 825041)
European Commission