Thesis Open Access
Wikidata, the free knowledge base in the Wikimedia movement, is used by various Wikimedia projects and third parties to provide machine-readable information and data. Its data quality is managed and monitored by its community using several quality control mechanisms, recently including formal schemas in the Shape Expressions language. However, larger schemas can be tedious to write, making automatic inference of schemas from a set of exemplary Items an attractive prospect.
This thesis investigates that approach by updating and adapting the RDF2Graph program to infer schemas from a set of Wikidata Items, and by providing a web-based tool that makes this process available to the Wikidata community. Though the resulting schemas are usually not fit for direct validation, they can still be useful as a description of the layout of an area of Wikidata’s data model, as a way to notice potential issues in the source data, or as a basis for a manually curated schema.