Dataset Open Access

GitTables 1.7M

Madelon Hulsebos; Çağatay Demiralp; Paul Groth

GitTables is a corpus of currently 1.7M relational tables extracted from GitHub. Our continuing curation aims to grow the corpus to at least 10M tables. We annotated table columns in GitTables with more than 2K different semantic types from Schema.org and DBpedia. Our column annotations consist of semantic types, hierarchical relations, range types and descriptions. If you have questions, documentation and contact details are provided on our website.

This dataset corresponds to the 1.7M tables used for the analysis in the GitTables paper. Characteristics of the table corpus (e.g. table sizes and topical distribution) are reported in that paper.

Responsible use

The current versions of GitTables, up to 0.0.4, contain tables extracted from CSV files in public GitHub repositories, so some tables might not be associated with a license that allows, for example, commercial use. A new version of GitTables containing only licensed tables will be released soon, with the licenses attached to the file metadata. In the meantime, we suggest using GitHub's License API to retrieve the license associated with each table (you can use the URL in the metadata to do so) and to understand what restrictions apply to it.
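
As an illustration of the license lookup suggested above, here is a minimal Python sketch. It assumes that the URL stored in a table's metadata is a GitHub URL whose path starts with the repository owner and name; the exact layout of the metadata URL is not specified here, so adjust the parsing if it differs. The helper names (repo_from_url, license_endpoint, fetch_license) are hypothetical and not part of GitTables. The lookup uses GitHub's REST endpoint for repository licenses, and unauthenticated requests to it are rate limited.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple
from urllib.parse import urlparse


def repo_from_url(url: str) -> Tuple[str, str]:
    """Return (owner, repo), assuming they are the first two path segments."""
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"cannot find owner/repo in {url!r}")
    return parts[0], parts[1]


def license_endpoint(url: str) -> str:
    """Build the GitHub REST API URL that reports a repository's license."""
    owner, repo = repo_from_url(url)
    return f"https://api.github.com/repos/{owner}/{repo}/license"


def fetch_license(url: str) -> Optional[str]:
    """Return the SPDX identifier of the repository license, or None if absent."""
    request = urllib.request.Request(
        license_endpoint(url),
        headers={"Accept": "application/vnd.github+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # GitHub found no license for this repository
            return None
        raise
    return (payload.get("license") or {}).get("spdx_id")
```

Calling fetch_license with the URL from a table's metadata returns an identifier such as an SPDX license code, which can then be checked against the intended use. A result of None means GitHub did not detect a license, in which case no permission should be assumed.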

Please be aware that this dataset is uncurated, so the underlying data files might contain sensitive, harmful or otherwise undesired data. The spread and exact replication of such content should be avoided. Please report any such observations so that we can remove these files accordingly.

It is also important to assess derived artefacts for the presence of any negative bias before deploying or publishing them. If harmful biases are observed, we would like to be notified so that we can mitigate these problems and improve our guidelines for using GitTables. You can report this through the contact form on our website.

Files (25.6 GB)
Size
313.3 MB
281.3 MB
755.5 MB
1.9 GB
5.3 GB
291.3 MB
4.8 GB
183.7 MB
6.2 GB
5.5 GB
                    All versions    This version
Views               353             353
Downloads           95              95
Data volume         145.8 GB        145.8 GB
Unique views        277             277
Unique downloads    48              48