Automatically Extracted SHACL Shapes for WikiData, DBpedia, YAGO-4, and LUBM & Associated Coverage Statistics
Description
The uploaded files contain automatically extracted SHACL shapes for the following datasets:
- WikiData (the truthy dump from September 2021 filtered by removing non-English strings) [1]
- DBpedia [2]
- YAGO-4 [3]
- LUBM (scale factor 500) [4]
The validating shapes for these datasets are generated by a program that parses the corresponding RDF files (in `.nt` format). The extracted shapes encode various SHACL constraints, e.g., sh:minCount, sh:path, sh:class, and sh:datatype. For each shape, we also record its coverage, i.e., the number of entities satisfying that shape, using the void:entities predicate.
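As a rough illustration of how the void:entities coverage annotation could be read back from a shapes file, the following Python sketch collects the entity count attached to each shape. It assumes the shapes have been serialized as N-Triples; the sample IRIs and counts, the `SAMPLE` data, and the `coverage_by_shape` helper are all hypothetical and not taken from the uploaded files. The regular expression handles only simple literal triples, so a proper RDF library is the better choice for real data.

```python
import re

VOID_ENTITIES = "http://rdfs.org/ns/void#entities"

# Hypothetical sample in N-Triples form; shape IRIs and counts are invented
# for illustration and do not come from the published shapes.
SAMPLE = """
<http://example.org/shapes#PersonShape> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/shacl#NodeShape> .
<http://example.org/shapes#PersonShape> <http://rdfs.org/ns/void#entities> "1200"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://example.org/shapes#namePersonShapeProperty> <http://www.w3.org/ns/shacl#path> <http://example.org/name> .
<http://example.org/shapes#namePersonShapeProperty> <http://rdfs.org/ns/void#entities> "1150"^^<http://www.w3.org/2001/XMLSchema#integer> .
"""

# Matches triples of the form: <subject> <predicate> "literal"^^<datatype> .
LITERAL_TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:\^\^<[^>]+>)?\s*\.$'
)


def coverage_by_shape(ntriples):
    """Return a dict mapping each shape IRI to its void:entities count."""
    counts = {}
    for line in ntriples.splitlines():
        match = LITERAL_TRIPLE.match(line.strip())
        if match and match.group(2) == VOID_ENTITIES:
            counts[match.group(1)] = int(match.group(3))
    return counts


coverage = coverage_by_shape(SAMPLE)

# List shapes from highest to lowest coverage.
for shape, entities in sorted(coverage.items(), key=lambda kv: -kv[1]):
    print(shape, entities)
```

Sorting by coverage in this way is one possible use of the statistic, for example to inspect the most widely satisfied shapes first.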
The program we developed to extract these SHACL shapes is provided as an executable Jar file.
More details about the datasets used to extract these shapes, and instructions for running the Jar, are available in our GitHub repository: https://github.com/Kashif-Rabbani/validatingshapes
[1] Vrandečić, Denny, and Markus Krötzsch. "Wikidata: a free collaborative knowledgebase." Communications of the ACM 57.10 (2014): 78-85.
[2] Auer, Sören, et al. "DBpedia: A nucleus for a web of open data." The semantic web. Springer, Berlin, Heidelberg, 2007. 722-735.
[3] Pellissier Tanon, Thomas, Gerhard Weikum, and Fabian Suchanek. "Yago 4: A reason-able knowledge base." European Semantic Web Conference. Springer, Cham, 2020.
[4] Guo, Yuanbo, Zhengxiang Pan, and Jeff Heflin. "LUBM: A benchmark for OWL knowledge base systems." Journal of Web Semantics 3.2-3 (2005): 158-182.