Project deliverable Open Access
Rossano Venturini; Giulio Ermanno Pibiri; Raffaele Perego
The BigDataGrapes (BDG) platform aspires to provide components that go beyond the state of the art at various stages of the management, processing, and usage of grapevine-related big data assets, thus making it easier for grapevine-powered industries to make important business decisions. The platform employs the components needed to carry out rigorous analytics processes on complex and heterogeneous data, helping companies and organizations in the sector to evolve methods, standards, and processes based on insights extracted from their data.
The goal of the BDG Distributed Indexing activity is to develop novel methodologies and components for realizing efficient indexing over distributed, heterogeneous big data batch and cross-streaming sources.
The work carried out in this activity focuses on the design of time- and space-efficient indexing data structures for structured and unstructured data such as RDF graphs, time series, and text documents, including compression techniques for big data management that support a broad range of analytical queries over arbitrary data dimensions.
Specifically, we investigate the efficiency and effectiveness of indexes for texts, RDF triples, and time series. This investigation led us to develop two novel techniques based on inverted indexes and succinct tries. These solutions substantially outperform competitive state-of-the-art approaches. Both scientific results have already been published in Transactions on Knowledge and Data Engineering (TKDE) (Pibiri & Venturini 2019b; Perego, Pibiri & Venturini 2020), a top-tier journal in Computer Science.
We believe that these results will have a high impact on both the project partners and the scientific community working in the field.
A third contribution, on indexing time series, has been added in the latest update of the deliverable, due at M30. The software solution discussed is still under development, but the encouraging results achieved so far lead us to believe that it will be very useful and impactful for the project and the community. A top-tier publication is planned for this result as well.
This deliverable describes the software components developed and discusses the promising results obtained. An appendix shows how to access the software, install it and reproduce the tests conducted.
D3.3 - Distributed Indexing Components_v3.0 (Submitted to EC).pdf