DistLODStats: Distributed Computation of RDF Dataset Statistics

Sejdiu, Gezim; Ermilov, Ivan; Lehmann, Jens; Mami, Mohamed Nadjib

doi:10.5281/zenodo.3567965

Published October 30, 2018 | Version v1

Conference paper Open

Abstract. Over the last years, the Semantic Web has been growing steadily. Today,
we count more than 10,000 datasets made available online following Semantic
Web standards. Nevertheless, many applications, such as data integration,
search, and interlinking, may not take full advantage of the data without having
a priori statistical information about its internal structure and coverage. In
fact, there are already a number of tools that offer such statistics, providing
basic information about RDF datasets and vocabularies. However, those usually
show severe deficiencies in terms of performance once the dataset size grows
beyond the capabilities of a single machine. In this paper, we introduce a software
component for statistical calculations of large RDF datasets, which scales
out to clusters of machines. More specifically, we describe the first distributed
in-memory approach for computing 32 different statistical criteria for RDF datasets
using Apache Spark. The preliminary results show that our distributed approach
improves upon a previous centralized approach we compare against and provides
approximately linear horizontal scale-up. The criteria are extensible beyond the
32 default criteria, and the component is integrated into the larger SANSA framework
and employed in at least four major usage scenarios beyond the SANSA community.
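
To make the idea of a distributed statistical criterion concrete, the following is a minimal sketch in plain Python, not the paper's Spark implementation. It assumes that a criterion can be expressed as a per-partition filter and count followed by a merge of the partial results. The toy triples, the two criteria shown (class usage and property usage), and all function names are hypothetical and chosen for illustration only.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy dataset of (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:acme", "rdf:type", "foaf:Organization"),
]


def partition(triples, n):
    """Split the triples into n chunks, standing in for cluster partitions."""
    return [triples[i::n] for i in range(n)]


def class_usage_local(part):
    """Filter rdf:type triples in one partition and count each class."""
    return Counter(o for _, p, o in part if p == "rdf:type")


def property_usage_local(part):
    """Count how often each predicate occurs in one partition."""
    return Counter(p for _, p, _ in part)


def run_criterion(local_fn, partitions):
    """Apply a criterion to every partition, then merge the partial counts."""
    return reduce(lambda a, b: a + b, map(local_fn, partitions), Counter())


partitions = partition(TRIPLES, 2)
class_usage = run_criterion(class_usage_local, partitions)
property_usage = run_criterion(property_usage_local, partitions)
```

Because the partial counts are merged by addition, the result does not depend on how the triples are split, which is the property that lets this kind of computation spread across machines.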

Files

Name	Size
iswc_distlodstats.pdf (md5:c90d23a31b82ef4f75345c0315395ff5)	268.2 kB

Additional details

Is documented by: https://link.springer.com/chapter/10.1007/978-3-030-00668-6_13 (URL)

QROWD – QROWD - Because Big Data Integration is Humanly Possible 732194: European Commission
BigDataOcean – BigDataOcean - Exploiting Ocean's of Data for Maritime Applications 732310: European Commission
WDAqua – Answering Questions using Web Data 642795: European Commission
BigDataEurope – Integrating Big Data, Software and Communities for Addressing Europe’s Societal Challenges 644564: European Commission

	All versions	This version
Views	47	47
Downloads	157	156
Data volume	42.4 MB	42.1 MB
