Project deliverable Open Access
Venturini, Rossano; Perego, Raffaele; Yankova, Milena; Karampiperis, Pythagoras
The BigDataGrapes (BDG) platform aims to provide components that go beyond the state of the art at various stages of the management, processing, and usage of grapevine-related big data assets, thus making it easier for grapevine-powered industries to take important business decisions. The platform employs the necessary components for carrying out rigorous analytics processes on complex and heterogeneous data, helping companies and organizations in the sector to evolve methods, standards, and processes based on insights extracted from their data.
The goal of the BDG Distributed Indexing activity is to develop novel methodologies and components for realizing efficient indexing over distributed big data batch and cross-streaming sources.
The activities carried out in this first period focused on the design of time- and space-efficient indexing data structures for structured and unstructured data such as labelled trees, graphs, and text documents, including compression techniques for big data management that support a broad range of analytical queries over arbitrary data dimensions. Specifically, we investigated the efficiency and effectiveness of indexes for RDF triples based on inverted indexes, and designed a novel compression technique that makes these indexes more efficient in both space and time. This deliverable includes the first version of the software components developed and discusses the preliminary results obtained. An appendix shows how to access the software, install it, and reproduce the tests conducted.
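To make the idea of an inverted index over RDF triples concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the component or the compression technique developed in this deliverable: the class `TripleIndex`, its `match` method, and the helper functions are hypothetical names, and the compression shown is the textbook combination of gap (delta) encoding and variable-byte coding, swapped in for the novel technique, which is not described here.

```python
from collections import defaultdict


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, high bit set on the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # terminator byte: emit the accumulated value
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_postings(ids):
    """Store a strictly increasing list of triple IDs as variable-byte coded gaps."""
    gaps = [b - a for a, b in zip([0] + ids, ids)]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild the triple IDs by summing the decoded gaps."""
    ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        ids.append(total)
    return ids


class TripleIndex:
    """Hypothetical inverted index: (position, term) -> compressed list of triple IDs."""

    def __init__(self, triples):
        self.triples = sorted(set(triples))
        postings = defaultdict(list)
        for triple_id, triple in enumerate(self.triples):
            for position, term in enumerate(triple):
                postings[(position, term)].append(triple_id)
        self.postings = {k: compress_postings(v) for k, v in postings.items()}

    def match(self, s=None, p=None, o=None):
        """Return triples matching a pattern; None acts as a wildcard."""
        result = None
        for position, term in enumerate((s, p, o)):
            if term is None:
                continue
            ids = set(decompress_postings(self.postings.get((position, term), b"")))
            result = ids if result is None else result & ids
        if result is None:  # all three positions are wildcards
            result = range(len(self.triples))
        return [self.triples[i] for i in sorted(result)]
```

A pattern such as `index.match(p="ex:grownIn")` would decompress one posting list, while a pattern binding two positions intersects two lists. Gap encoding pays off because sorted triple IDs in a posting list tend to be close together, so most gaps fit in a single byte.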