There is a newer version of the record available.

Published February 3, 2022 | Version 2022.02
Dataset | Open access

Automatically Extracted SHACL Shapes for WikiData, DBpedia, YAGO-4, and LUBM & Associated Coverage Statistics

Affiliation: Aalborg University, Denmark

Description

The uploaded files contain SHACL shapes automatically extracted from the following datasets:

  • WikiData (the September 2021 truthy dump, filtered to remove non-English strings) [1]
  • DBpedia [2]
  • YAGO-4 [3] 
  • LUBM (scale factor 500) [4]
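The description does not say exactly how non-English strings were removed from the WikiData dump. As a minimal sketch, assuming the filter drops every triple whose object is a language-tagged literal with a tag other than `en` (or a regional variant such as `en-gb`), the preprocessing step could look like this. The function names `keep_triple` and `filter_non_english` and the tag policy are hypothetical.

```python
import re

# Matches a language-tagged literal in object position at the end of an
# N-Triples line, e.g.  "Berlin"@de .  and captures the tag ("de").
# Assumption: one triple per line, terminated by "." as in N-Triples dumps.
LANG_LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$'
)


def keep_triple(line: str) -> bool:
    """Return False only for triples whose object is a non-English tagged string."""
    match = LANG_LITERAL.search(line)
    if match is None:
        # IRIs, blank nodes, and untagged or datatyped literals are kept.
        return True
    tag = match.group(1).lower()
    # Hypothetical policy: keep "en" and regional variants such as "en-gb".
    return tag == "en" or tag.startswith("en-")


def filter_non_english(lines):
    """Lazily yield the N-Triples lines that pass keep_triple."""
    for line in lines:
        if keep_triple(line):
            yield line


SAMPLE_LINES = [
    '<http://www.wikidata.org/entity/Q64> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    '<http://www.wikidata.org/entity/Q64> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@de .',
    '<http://www.wikidata.org/entity/Q64> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q515> .',
]

if __name__ == "__main__":
    # Prints the English label triple and the P31 (instance of) triple.
    for kept in filter_non_english(SAMPLE_LINES):
        print(kept)
```

Because `filter_non_english` is a generator, it can be applied to a dump opened line by line without loading the whole file into memory.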

The validating shapes were generated by a program that parses the corresponding RDF files in N-Triples (`.nt`) format. The extracted shapes encode SHACL properties and constraints such as `sh:path`, `sh:class`, `sh:datatype`, and `sh:minCount`. For each shape, we also record its coverage, i.e., the number of entities that satisfy the shape, using the `void:entities` predicate.
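The description does not spell out how coverage is computed. As a minimal sketch, assuming a node shape targets one class and its coverage is the number of distinct entities typed with that class, the Python code below counts entities per class and per (class, property) pair from N-Triples lines and prints a shape carrying those counts in `void:entities`. This is not the authors' extraction algorithm, which is distributed as a JAR. The shape IRI scheme (`...Shape>`), attaching `void:entities` to property shapes, emitting `sh:minCount 1` when every entity of the class uses the property, and the simplified parser are all illustrative assumptions; `sh:class` and `sh:datatype` inference is omitted.

```python
from collections import defaultdict

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def parse_nt(line):
    """Split a simple N-Triples line into (subject, predicate, object).

    Simplification: subject and predicate are IRIs or blank nodes without
    spaces, so only the object (possibly a literal with spaces) needs care.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    subject, predicate, rest = line.split(" ", 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()  # drop the statement-terminating dot
    return subject, predicate, obj


def extract_coverage(lines):
    """Return (class_counts, property_counts).

    class_counts[c]         = distinct entities typed with class c
    property_counts[(c, p)] = distinct entities of class c that use predicate p
    """
    triples = [t for t in (parse_nt(l) for l in lines) if t is not None]
    types = defaultdict(set)  # entity -> set of classes
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    class_members = defaultdict(set)
    for s, classes in types.items():
        for c in classes:
            class_members[c].add(s)
    prop_members = defaultdict(set)
    for s, p, _ in triples:
        if p != RDF_TYPE:
            for c in types.get(s, ()):
                prop_members[(c, p)].add(s)
    class_counts = {c: len(m) for c, m in class_members.items()}
    property_counts = {k: len(m) for k, m in prop_members.items()}
    return class_counts, property_counts


def render_shape(cls, class_counts, property_counts):
    """Render a Turtle node shape for cls (hypothetical IRI scheme).

    Assumes the prefixes sh: <http://www.w3.org/ns/shacl#> and
    void: <http://rdfs.org/ns/void#> are declared in the output file.
    """
    shape = cls[:-1] + "Shape>"
    total = class_counts[cls]
    statements = [f"sh:targetClass {cls}", f"void:entities {total}"]
    props = sorted((p, n) for (c, p), n in property_counts.items() if c == cls)
    for p, n in props:
        # Heuristic: mandatory only if every entity of the class has it.
        min_count = "sh:minCount 1 ; " if n == total else ""
        statements.append(f"sh:property [ sh:path {p} ; {min_count}void:entities {n} ]")
    return f"{shape} a sh:NodeShape ;\n    " + " ;\n    ".join(statements) + " ."


SAMPLE_TRIPLES = [
    '<http://ex.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Person> .',
    '<http://ex.org/bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/Person> .',
    '<http://ex.org/alice> <http://ex.org/name> "Alice" .',
    '<http://ex.org/bob> <http://ex.org/name> "Bob" .',
    '<http://ex.org/alice> <http://ex.org/email> "alice@ex.org" .',
]

if __name__ == "__main__":
    classes, properties = extract_coverage(SAMPLE_TRIPLES)
    for cls in sorted(classes):
        print(render_shape(cls, classes, properties))
```

On the five sample triples, the `Person` shape gets `void:entities 2`, the `name` property shape gets `sh:minCount 1` because both entities use it, and the `email` property shape covers one entity. The sketch keeps all triples in memory; a dump the size of WikiData would instead call for two streaming passes over the file (types first, then properties).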

We also provide the program we developed to extract these SHACL shapes as an executable JAR file.
Further details about the source datasets and instructions for running the JAR are available in our GitHub repository: https://github.com/Kashif-Rabbani/validatingshapes.

[1] Vrandečić, Denny, and Markus Krötzsch. "Wikidata: a free collaborative knowledgebase." Communications of the ACM 57.10 (2014): 78-85.

[2] Auer, Sören, et al. "DBpedia: A nucleus for a web of open data." The Semantic Web. Springer, Berlin, Heidelberg, 2007. 722-735.

[3] Pellissier Tanon, Thomas, Gerhard Weikum, and Fabian Suchanek. "YAGO 4: A reason-able knowledge base." European Semantic Web Conference. Springer, Cham, 2020.

[4] Guo, Yuanbo, Zhengxiang Pan, and Jeff Heflin. "LUBM: A benchmark for OWL knowledge base systems." Journal of Web Semantics 3.2-3 (2005): 158-182.

Files (1.8 GB total)

  • 12.0 MB, md5:16ac88a416d35e9210962228a27e5728
  • 148.5 kB, md5:066c76c27153519fda9b7755b897ce08
  • 163.5 MB, md5:d148345d04ca4efa1dc8a37b41548647
  • 1.5 GB, md5:5b5d5f21bbe88e9d069666ff456366e6
  • 89.5 MB, md5:c529e9d9f43994b2ab97dddd2439bf4c
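Each file is listed with its MD5 checksum. A downloaded file can be checked against its listed value with a chunked hash, as in the minimal Python sketch below; the helper names `md5_of_file` and `matches_checksum` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks so that
    large downloads (such as the 1.5 GB file) never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file with a checksum written as on this record, e.g. 'md5:16ac...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("downloaded_file", "md5:16ac88a416d35e9210962228a27e5728")`, where `downloaded_file` is a hypothetical local path, returns `True` only if that file is byte-identical to the 12.0 MB file listed above.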