Published November 16, 2021 | Version v1
Dataset Open

GitTables benchmark - column type detection

  • 1. University of Amsterdam
  • 2. Sigma Computing

Description

Note: the entire GitTables corpus can be downloaded from https://zenodo.org/record/4943312.

This dataset is a small subset of tables from GitTables, curated for benchmarking column type detection methods. The benchmark evaluates systems that match table columns to semantic types from the DBpedia and Schema.org ontologies. It is featured in the SemTab 2021 challenge (CTA task).

This dataset consists of the following files:

  • “tables.zip”: directory with a sample of 1101 tables from GitTables. Filenames correspond to table IDs. The first column (which has no column name) holds the row indices. Column names are replaced with "col_0", "col_1", etc., which match the target columns and labels (semantic types) in the other files.
  • “<ontology>_targets.csv”: target columns per table ID, one file per ontology (DBpedia or Schema.org). Columns are “table_id” (ignore the "_<ontology>" suffix) and “target_column” (i.e. the column that should be annotated).
  • “<ontology>_gt.csv”: ground truth column annotations per table ID, one file per ontology. Columns are “table_id” (ignore the "_<ontology>" suffix), “target_column”, “annotation_id” and “annotation_label”.
  • “<ontology>_labels.csv”: unique labels present in the annotated tables, one file per ontology. Columns are “annotation_id” and “annotation_label”.
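The file layout above can be read with the Python standard library alone. The following is a minimal sketch, not an official loader: it assumes the tables are comma-separated files with a header row, that “target_column” holds names such as "col_1", and that the ontology suffix is a literal such as "dbpedia". The function names are hypothetical, and the demo rows are made up for illustration rather than taken from the benchmark.

```python
import csv
import io


def strip_ontology_suffix(table_id, ontology):
    """Drop a trailing "_<ontology>" from a table ID, if present."""
    suffix = "_" + ontology
    return table_id[: -len(suffix)] if table_id.endswith(suffix) else table_id


def read_ground_truth(gt_file, ontology):
    """Map (table_id, target_column) to annotation_label from an open *_gt.csv."""
    labels = {}
    for row in csv.DictReader(gt_file):
        table_id = strip_ontology_suffix(row["table_id"], ontology)
        labels[(table_id, row["target_column"])] = row["annotation_label"]
    return labels


def read_column(table_file, column_name):
    """Return the values of one column ("col_0", ...) from an open table file.

    The unnamed first column (row indices) ends up under the key "" and is
    simply ignored here.
    """
    return [row[column_name] for row in csv.DictReader(table_file)]


# In-memory demo with made-up rows (assumed shapes, not real benchmark data).
demo_gt = io.StringIO(
    "table_id,target_column,annotation_id,annotation_label\n"
    "T1_dbpedia,col_1,5,name\n"
)
demo_table = io.StringIO(",col_0,col_1\n0,a,b\n1,c,d\n")

demo_labels = read_ground_truth(demo_gt, "dbpedia")
demo_values = read_column(demo_table, "col_1")
```

With real data, the same functions would take files opened from the extracted “tables.zip” directory and from “<ontology>_gt.csv”. The exact spelling of the suffix in “table_id” should be checked against the files themselves.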

The labels (semantic types) from each ontology come from:

For the entire GitTables corpus, see the download page linked in the note at the top of this description. Visit https://gittables.github.io for more background and contact details.

Files (3.6 MB)

  • md5:47914a0569d1bfa7dd2391f6172f2267 (176.7 kB)
  • md5:40f5ee3776222a9dfb21cf8f29838862 (5.7 kB)
  • md5:e8ad9a2ae93b92ccb4ad516c2d19dda5 (75.2 kB)
  • md5:9baf387427a8f543efc71288fcefe170 (36.1 kB)
  • md5:86503c8c6ee1e5d513a9d2c51f9dbb3d (1.5 kB)
  • md5:9c0c6de86c97632536230180dda6e425 (20.1 kB)
  • md5:c61792bc7b77dcf29c165d505a02dea1 (3.3 MB)
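
A downloaded file can be checked against the MD5 values listed above. The sketch below uses only Python's standard `hashlib`. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the listing here does not show which checksum belongs to which file, so that pairing has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as "md5:<hex digest>"."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("tables.zip", "md5:...")` would return `True` only if the local file is byte-identical to the published one, with the listed value for that file substituted for the elided digest.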