Published December 6, 2023 | Version 1.0.0
Dataset Open

tFoodL: Larger Semantic Table Annotations Benchmark for Food Domain

  • 1. Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Germany
  • 2. City, University of London, UK
  • 3. IBM Research, USA
  • 4. Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Germany

Description

tFoodL is the successor to tFood. It is generated by KG2Tables using 10 levels of a recursive hierarchy of related concepts in Wikidata.

Like tFood, it is a dataset for tabular data to knowledge graph matching. It is derived for the Food domain and has two types of tables: Horizontal Relational Tables, in which each table represents a collection of entities, and Entity Tables, in which each table represents a single entity. Ground truth data is provided with Wikidata as the target knowledge graph (KG).

tFoodL contains 43,255 entity and horizontal tables in total. This repository contains only the validation fold (10%) of the benchmark, along with its ground truth (gt) data.
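The internal layout of the archive is not described on this page, so a reasonable first step after downloading is to list what it contains. The following is a minimal sketch that makes no assumption about folder names: it only counts files per parent folder and file extension. The function name `summarize_archive` is illustrative and not part of the benchmark or of KG2Tables.

```python
import os
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_source):
    """Count archive members per (parent folder, file extension).

    `zip_source` may be a path or a binary file-like object.
    No particular folder layout is assumed.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            member = PurePosixPath(info.filename)
            counts[(str(member.parent), member.suffix.lower())] += 1
    return counts


if __name__ == "__main__":
    # File name taken from the Files section of this record.
    archive_path = "tFood10-val.zip"
    if os.path.exists(archive_path):
        for (folder, ext), n in sorted(summarize_archive(archive_path).items()):
            print(f"{folder:<50} {ext or '(none)':<8} {n}")
```

The per-folder counts give a quick view of how tables and ground truth files are organized in the download before any task-specific parsing is written.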

The supported tasks for semantic table annotations are: 

  1. Topic Detection (TD) links the entire table to an entity or a class from the target KG.
  2. Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.
  3. Column Type Annotation (CTA) links individual table columns to classes from the target KG.
  4. Column Property Annotation (CPA) links pairs of columns to the relations (properties) between them from the target KG.
  5. Row Annotation (RA) links an entire row to a KG entity or property.
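The five tasks differ mainly in which part of a table they annotate: the whole table, a cell, a column, a column pair, or a row. The sketch below models that difference as a small data structure. It is only an illustration: the class name, field names, table identifier, and the `Q000001` / `P000001` identifiers are hypothetical placeholders, and it does not reflect the actual ground truth file format of the benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    task: str                    # "TD", "CEA", "CTA", "CPA" or "RA"
    table: str                   # hypothetical table identifier
    target: str                  # KG identifier (placeholder values below)
    row: Optional[int] = None    # row index, where the task needs one
    col: Optional[int] = None    # column index, where the task needs one
    col2: Optional[int] = None   # second column index (column pairs only)


# Table coordinates each task is expected to carry.
REQUIRED_COORDS = {
    "TD": set(),                 # whole table
    "CEA": {"row", "col"},       # single cell
    "CTA": {"col"},              # single column
    "CPA": {"col", "col2"},      # pair of columns
    "RA": {"row"},               # single row
}


def is_well_formed(annotation):
    """Check that an annotation carries exactly the coordinates its task needs."""
    if annotation.task not in REQUIRED_COORDS:
        return False
    given = {
        name for name in ("row", "col", "col2")
        if getattr(annotation, name) is not None
    }
    return given == REQUIRED_COORDS[annotation.task] and bool(annotation.target)


# Placeholder examples, one per task (identifiers are not real Wikidata IDs).
examples = [
    Annotation("TD", "table_0001", "Q000001"),
    Annotation("CEA", "table_0001", "Q000002", row=1, col=0),
    Annotation("CTA", "table_0001", "Q000003", col=0),
    Annotation("CPA", "table_0001", "P000001", col=0, col2=1),
    Annotation("RA", "table_0001", "Q000004", row=1),
]
```

Keeping the coordinate requirements in one lookup table makes it easy to validate predictions for any of the five tasks with the same function.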

Files

tFood10-val.zip

Files (2.9 MB)

md5:f75dc903100ffe8c1efd085fea6e1df7
Size: 2.9 MB
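The published MD5 checksum can be used to confirm that the download is intact. A minimal sketch using Python's standard `hashlib` module follows; it assumes the archive was saved under its original name, `tFood10-val.zip`, in the current working directory, and the helper name `md5_of_file` is illustrative.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "f75dc903100ffe8c1efd085fea6e1df7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("tFood10-val.zip")  # assumed local download location
    if archive.exists():
        actual = md5_of_file(archive)
        status = "OK" if actual == EXPECTED_MD5 else "MISMATCH"
        print(f"{archive.name}: {actual} ({status})")
    else:
        print(f"{archive.name} not found in the current directory")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.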

Additional details

Related works

Is derived from
Software: https://github.com/fusion-jena/KG2Tables (URL)
Is variant form of
Dataset: 10.5281/zenodo.10048187 (DOI)