Workload-Aware Self-Tuning Histograms for the Semantic Web

Zamani, Katerina; Charalambidis, Angelos; Konstantopoulos, Stasinos; Zoulis, Nickolas; Mavroudi, Effrosyni

doi:10.1007/978-3-662-53455-7_6

Published September 10, 2016 | Version v1

Journal article Open

1. NCSR "Demokritos"
2. National Technical University of Athens

Query processing systems typically rely on histograms, data structures that approximate data distribution, in order to optimize query execution. Histograms can be constructed by scanning the database tables and aggregating the values of the attributes in the table, or, more efficiently, progressively refined by analysing query results. Most of the relevant literature focuses on histograms of numerical data, exploiting the natural concept of a numerical range as an estimator of the volume of data that falls within the range. This, however, leaves Semantic Web data outside the scope of the histograms literature, as its most prominent datatype, the URI, does not lend itself to defining such ranges. This article first establishes a framework that formalises histograms over arbitrary data types and provides a formalism for specifying value ranges for different datatypes. This makes explicit the properties that ranges are required to have, so that histogram refinement algorithms are applicable. We demonstrate that our framework subsumes histograms over numerical data as a special case by using it to formulate the state of the art in numerical histograms. We then proceed to use the Jaro-Winkler metric to define URI ranges by exploiting the hierarchical nature of URI strings. This greatly extends the state of the art, where strings are treated as categorical data that can only be described by enumeration. We then present the open-source STRHist system that implements these ideas. We finally present empirical evaluation results using STRHist over a real dataset and query workload extracted from AGRIS, the most popular and widely used bibliographic database on agricultural research and technology.
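To make the idea of a similarity-based URI range concrete, the following is a minimal sketch of the standard Jaro-Winkler similarity together with one possible range test. It is not taken from STRHist: the helper `in_range`, the notion of a range as a central URI plus a similarity threshold, the threshold value, and the example URIs are all assumptions made for illustration.

```python
def jaro(s: str, t: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(ls, lt) // 2 - 1, 0)
    s_matched = [False] * ls
    t_matched = [False] * lt
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(lt, i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / ls + matches / lt
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def in_range(uri: str, center: str, threshold: float) -> bool:
    """Hypothetical range test (an assumption, not the STRHist API):
    a URI belongs to the range if it is similar enough to the center."""
    return jaro_winkler(uri, center) >= threshold


# Illustrative URIs only; they are not drawn from the AGRIS dataset.
center = "http://example.org/agris/record/1001"
near = "http://example.org/agris/record/1002"
far = "http://other.net/x"
print(in_range(near, center, 0.95), in_range(far, center, 0.95))
```

On the textbook pair "MARTHA" and "MARHTA" this implementation gives a Jaro-Winkler similarity of about 0.961. Note that the standard metric caps the prefix bonus at four characters, so URIs that merely share a scheme such as "http" already receive the full bonus; a histogram over URIs would presumably need to account for this, and how the article does so is not stated in the abstract.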

Files

selftuning-TLDKS.pdf (366.1 kB), md5:5f4a9b35409b1781d91e919f79ad6645

Additional details

Is identical to: http://link.springer.com/chapter/10.1007%2F978-3-662-53455-7_6 (URL)
Is part of: http://link.springer.com/book/10.1007/978-3-662-53455-7 (URL); http://tldks.faw.at/volume/32/ (URL)

European Commission
SEMAGROW - SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures 318497

