WikiDBs 10k - A Corpus Of Relational Databases From Wikidata
Creators
- 1. Technical University of Darmstadt
- 2. Technical University of Darmstadt, DFKI
Description
WikiDBs-10k (https://wikidbs.github.io/) is a corpus of relational databases built from Wikidata (https://www.wikidata.org/). This is the preliminary 10k version. The newer version with 100k databases (https://zenodo.org/records/11559814) includes more coherent databases and more diverse table and column names.
The WikiDBs-10k corpus consists of 10,000 databases. For more details, see our paper: https://ceur-ws.org/Vol-3462/TADA3.pdf (TaDA@VLDB'23)
Each database is saved in its own sub-folder. The tables are provided as CSV files and the database schema as a JSON file.
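As a rough illustration of this layout, the following sketch loads one database folder using only the Python standard library. The function name `load_database` is hypothetical, and the exact file names and nesting inside each sub-folder are assumptions: the sketch searches the folder recursively for `*.csv` files and treats the first `*.json` file it finds as the schema.

```python
import csv
import json
from pathlib import Path


def load_database(db_dir):
    """Load one database sub-folder (hypothetical helper, not part of the corpus).

    Assumption: every *.csv file below db_dir is a table and the first
    *.json file (in sorted order) is the schema.
    """
    db_dir = Path(db_dir)

    # Read each CSV file into a list of row dictionaries keyed by column name.
    tables = {}
    for csv_path in sorted(db_dir.rglob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            tables[csv_path.stem] = list(csv.DictReader(f))

    # Load the schema if a JSON file is present; otherwise return None for it.
    schema = None
    schema_files = sorted(db_dir.rglob("*.json"))
    if schema_files:
        with schema_files[0].open(encoding="utf-8") as f:
            schema = json.load(f)

    return schema, tables
```

Calling `load_database` on an extracted sub-folder returns the parsed schema and a dictionary that maps each table name (the CSV file stem) to its rows. All cell values come back as strings, so any type conversion would have to be driven by the schema.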
We thank Till Döhmen and Madelon Hulsebos for generously providing the table statistics from their GitSchemas dataset, and Jan-Micha Bodensohn for converting the dataset to SQLite files. This work has been supported by the BMBF and the state of Hesse as part of the NHR Program and the BMBF project KompAKI (grant number 02L19C150), as well as the HMWK cluster project 3AI. Finally, we want to thank hessian.AI and DFKI Darmstadt for their support.
Files
wikidbs_10k.zip
Total size: 761.1 MB
Checksum | Size |
---|---|
md5:71ebb739508f1a54b79c2112687d5d83 | 590.6 MB |
md5:432a413072f6041966fe3129469b595e | 170.5 MB |
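The MD5 checksums listed above can be used to check a download for corruption. The following is a minimal sketch using Python's `hashlib`. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because the listing does not say which checksum belongs to which file, the sketch only checks whether a file's digest matches either listed value.

```python
import hashlib

# MD5 digests copied from the file listing of this record.
LISTED_MD5 = {
    "71ebb739508f1a54b79c2112687d5d83",
    "432a413072f6041966fe3129469b595e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """True if the file's MD5 digest equals one of the listed digests."""
    return md5_of_file(path) in LISTED_MD5
```

Reading in chunks keeps memory use constant, which matters for archives of several hundred megabytes.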
Additional details
Related works
- Is previous version of
- Dataset: 10.5281/zenodo.11559814 (DOI)