Published June 30, 2020 | Version 3.0
Project deliverable Open

BigDataGrapes D3.3 - Distributed Indexing Components

Description

The BigDataGrapes (BDG) platform aims to provide components that go beyond the state of the art at various stages of the management, processing, and usage of grapevine-related big data assets, thus making it easier for grapevine-powered industries to take important business decisions. The platform employs the components needed to carry out rigorous analytics processes on complex and heterogeneous data, helping companies and organizations in the sector to evolve methods, standards and processes based on insights extracted from their data.

The goal of the BDG Distributed Indexing activity is to develop novel methodologies and components for realizing efficient indexing over distributed, heterogeneous big data batch and cross-streaming sources.

The work carried out in this activity focuses on the design of time- and space-efficient indexing data structures for structured and unstructured data such as RDF graphs, time series and text documents, including compression techniques for big data management that support a broad range of analytical queries over arbitrary data dimensions.
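As a rough illustration of what a space-efficient index over text documents can look like, the sketch below builds a small inverted index and compresses each posting list by storing gaps between document ids in a variable-byte code. This is a generic textbook scheme chosen for illustration, not the method developed in the deliverable; all function names and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the increasing list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # ids arrive in increasing order
    return dict(index)


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, low bits first.
    The high bit marks the last byte of each integer."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # terminator byte: finish the current integer
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_postings(postings):
    """Encode the gaps between consecutive ids instead of the ids themselves."""
    if not postings:
        return b""
    gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild the original ids by prefix-summing the decoded gaps."""
    postings, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        postings.append(total)
    return postings


# Hypothetical sample documents, used only to exercise the sketch.
docs = ["grape yield data", "grape vine", "soil data grape"]
index = build_inverted_index(docs)
compressed = {term: compress_postings(ids) for term, ids in index.items()}
```

Because ids in a posting list are increasing, the gaps tend to be small numbers that fit in a single byte, which is where the space saving in this kind of scheme comes from.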

Specifically, we investigate the efficiency and effectiveness of indexes for texts, RDF triples, and time series. This investigation led us to develop two novel techniques based on inverted indexes and succinct tries. These solutions substantially outperform competing state-of-the-art approaches. Both scientific results have already been published in Transactions on Knowledge and Data Engineering (TKDE) (Pibiri & Venturini 2019b; Perego, Pibiri & Venturini 2020), a top-tier journal in Computer Science.
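To give a feel for how a trie can serve lookups over RDF triples, the following minimal sketch stores (subject, predicate, object) triples in a three-level trie built from nested dictionaries and answers patterns in which any component may be a wildcard. It is a plain, uncompressed illustration under our own assumptions and not the succinct structure described in the deliverable; the class name `TripleTrie`, its `match` method, and the sample triples are hypothetical.

```python
class TripleTrie:
    """Three-level trie: subject -> predicate -> set of objects."""

    def __init__(self, triples):
        self.root = {}
        for s, p, o in triples:
            self.root.setdefault(s, {}).setdefault(p, set()).add(o)

    def match(self, s=None, p=None, o=None):
        """Return the sorted triples matching the pattern; None is a wildcard."""
        results = []
        # A bound subject selects one branch; a wildcard visits all of them.
        subjects = [s] if s is not None else list(self.root)
        for subj in subjects:
            preds = self.root.get(subj, {})
            pred_keys = [p] if p is not None else list(preds)
            for pred in pred_keys:
                for obj in preds.get(pred, ()):
                    if o is None or obj == o:
                        results.append((subj, pred, obj))
        return sorted(results)


# Hypothetical sample data, used only to exercise the sketch.
triples = [
    ("vineyard1", "hasVariety", "Merlot"),
    ("vineyard1", "locatedIn", "Tuscany"),
    ("vineyard2", "hasVariety", "Merlot"),
    ("vineyard2", "hasVariety", "Syrah"),
]
trie = TripleTrie(triples)
by_subject = trie.match(s="vineyard2")
by_object = trie.match(o="Merlot")
```

In this layout, patterns that bind the subject follow a single path from the root, whereas a pattern that binds only the object has to visit every branch. Keeping additional orderings of the same triples is one common way to make such patterns cheap, at the cost of extra space.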

We believe that these project results will have a high impact on both the project partners and the scientific community working in the field.

A third contribution, on indexing time series, has been added in the last update of the deliverable, due at M30. The software solution discussed is still under development, but the encouraging results achieved so far lead us to believe that it will be very useful and impactful for the project and the community. A top-tier publication is also planned for this result.

This deliverable describes the software components developed and discusses the promising results obtained. An appendix shows how to access the software, install it and reproduce the tests conducted.

Files

D3.3 - Distributed Indexing Components_v3.0 (Submitted to EC).pdf

Files (2.4 MB)

Additional details

Funding

BigDataGrapes – Big Data to Enable Global Disruption of the Grapevine-powered Industries 780751
European Commission