Thesis Open Access

Schema Inference on Wikidata

Werkmeister, Lucas

Thesis supervisor(s)

Reussner, Ralf; Sack, Harald; Koutraki, Maria

Wikidata, the free knowledge base of the Wikimedia movement, is used by various Wikimedia projects and by third parties as a source of machine-readable information and data. Its community manages and monitors data quality through several quality control mechanisms, which recently came to include formal schemas written in the Shape Expressions (ShEx) language. However, larger schemas can be tedious to write by hand, which makes automatically inferring a schema from a set of exemplary Items an attractive prospect.

This thesis investigates this option. It updates and adapts the RDF2Graph program to infer schemas from a set of Wikidata Items, and it provides a web-based tool that makes this process available to the Wikidata community. Although the resulting schemas are usually not fit for direct validation, they can still be useful as a description of the layout of one area of Wikidata’s data model, as a way to spot potential issues in the source data, or as a starting point for a manually curated schema.

master-thesis-final.tar.gz is a tarball of the thesis’s source code; it should be possible to build the thesis from it by running make. master-thesis-Lucas-Werkmeister.pdf is the PDF built directly by PDFLaTeX. master-thesis-Lucas-Werkmeister-signed.pdf is a signed version of master-thesis-Lucas-Werkmeister.pdf, signed with the PortableSigner software using the author’s certificate from the KIT CA.
