
Dataset Open Access

GitTables benchmark - column type detection

Madelon Hulsebos; Çağatay Demiralp; Paul Groth

Note: the download page of the entire GitTables corpus is here: https://zenodo.org/record/4943312.

This dataset is a small subset of tables from GitTables, curated for benchmarking column type detection methods. The benchmark evaluates systems that match table columns to semantic types from the DBpedia and Schema.org ontologies. It is featured in the SemTab 2021 challenge as part of the CTA (column type annotation) task.
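Systems are compared by checking their predicted semantic type for each (table, target column) pair against the ground truth. The sketch below computes micro precision, recall and F1 over exact label matches. It is a simplified illustration, not necessarily the official SemTab 2021 scoring procedure. The function name `score_cta` is hypothetical, and ignoring predictions for non-target columns is an assumption.

```python
def score_cta(predictions: dict, ground_truth: dict) -> tuple[float, float, float]:
    """Micro precision, recall and F1 over (table_id, column) keys, exact label match.

    Both arguments map (table_id, target_column) -> semantic type label.
    """
    # Assumption: predictions for columns that are not targets are ignored.
    scored = {key: label for key, label in predictions.items() if key in ground_truth}
    correct = sum(1 for key, label in scored.items() if ground_truth[key] == label)
    precision = correct / len(scored) if scored else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with three ground-truth targets and two predictions of which one is correct, precision is 0.5, recall is 1/3 and F1 is 0.4.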

This dataset consists of the following files:

  • “tables.zip”: archive containing a sample of 1101 tables from GitTables. Filenames correspond to table IDs. In each table, the first column (which has no column name) holds the row indices, and the remaining column names are replaced with "col_0", "col_1", etc., which match the “target_column” values in the targets and ground-truth files.
  • “<ontology>_targets.csv”: target columns per table ID, one file per ontology (DBpedia or Schema.org). Columns are “table_id” (ignore the “_<ontology>” suffix) and “target_column” (the column to be annotated).
  • “<ontology>_gt.csv”: ground-truth column annotations per table ID, one file per ontology. Columns are “table_id” (ignore the “_<ontology>” suffix), “target_column”, “annotation_id” and “annotation_label”.
  • “<ontology>_labels.csv”: unique labels present in the annotated tables, one file per ontology. Columns are “annotation_id” and “annotation_label”.
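The file layout described above can be read with the Python standard library alone. This is a minimal sketch under assumptions that should be checked against the actual files: the CSVs may or may not start with a header row, the suffix on “table_id” is passed in by the caller (for example "_dbpedia", which is a guess), and each table is stored in the archive as "<table_id>.csv", either at the top level or inside a folder. The helper names `strip_suffix`, `load_annotations` and `read_table` are hypothetical.

```python
import csv
import io
import zipfile

# Assumed header of *_gt.csv; a row equal to this is skipped if present.
GT_HEADER = ["table_id", "target_column", "annotation_id", "annotation_label"]


def strip_suffix(table_id: str, suffix: str) -> str:
    """Drop the "_<ontology>" suffix from a table_id (suffix value is an assumption)."""
    if suffix and table_id.endswith(suffix):
        return table_id[: -len(suffix)]
    return table_id


def load_annotations(gt_text: str, suffix: str) -> dict:
    """Map (table_id, target_column) -> annotation_label from the text of a *_gt.csv file."""
    annotations = {}
    for row in csv.reader(io.StringIO(gt_text)):
        if not row or row == GT_HEADER:  # skip blank lines and an optional header row
            continue
        table_id, target_column, _annotation_id, label = row[:4]
        annotations[(strip_suffix(table_id, suffix), target_column)] = label
    return annotations


def read_table(archive: zipfile.ZipFile, table_id: str) -> dict:
    """Return {column_name: [cell values]} for one table, dropping the row-index column."""
    wanted = f"{table_id}.csv"
    # Assumption: the member is "<table_id>.csv", at the top level or in a folder.
    matches = [n for n in archive.namelist() if n == wanted or n.endswith("/" + wanted)]
    if not matches:
        raise KeyError(f"{wanted} not found in archive")
    with archive.open(matches[0]) as raw:
        reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
        header = next(reader)[1:]  # first header cell is empty: it labels the row indices
        columns = {name: [] for name in header}
        for row in reader:
            for name, value in zip(header, row[1:]):
                columns[name].append(value)
    return columns
```

For instance, `load_annotations(open("dbpedia_gt.csv", encoding="utf-8").read(), "_dbpedia")` returns a lookup keyed by the same "col_N" names that `read_table` returns as column keys, so each target column's values can be paired with its label.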

The labels (semantic types) from each ontology come from:

For the entire GitTables corpus, see the corpus record linked above. Visit https://gittables.github.io for more background and contact details.

Files (3.6 MB)

Name                  Size      MD5
dbpedia_gt.csv        176.7 kB  47914a0569d1bfa7dd2391f6172f2267
dbpedia_labels.csv    5.7 kB    40f5ee3776222a9dfb21cf8f29838862
dbpedia_targets.csv   75.2 kB   e8ad9a2ae93b92ccb4ad516c2d19dda5
schema_gt.csv         36.1 kB   9baf387427a8f543efc71288fcefe170
schema_labels.csv     1.5 kB    86503c8c6ee1e5d513a9d2c51f9dbb3d
schema_targets.csv    20.1 kB   9c0c6de86c97632536230180dda6e425
tables.zip            3.3 MB    c61792bc7b77dcf29c165d505a02dea1
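Downloads can be checked against the MD5 checksums in the file listing above. This is a minimal sketch, assuming all seven files were saved under their listed names in one local directory; `md5_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "dbpedia_gt.csv": "47914a0569d1bfa7dd2391f6172f2267",
    "dbpedia_labels.csv": "40f5ee3776222a9dfb21cf8f29838862",
    "dbpedia_targets.csv": "e8ad9a2ae93b92ccb4ad516c2d19dda5",
    "schema_gt.csv": "9baf387427a8f543efc71288fcefe170",
    "schema_labels.csv": "86503c8c6ee1e5d513a9d2c51f9dbb3d",
    "schema_targets.csv": "9c0c6de86c97632536230180dda6e425",
    "tables.zip": "c61792bc7b77dcf29c165d505a02dea1",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: str = ".") -> dict:
    """Map each filename to True (match), False (mismatch) or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results
```

Running `verify("path/to/downloads")` and checking that every value is `True` confirms that the local copies match the published files.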
