Published June 7, 2024 | Version v1
Dataset Open

Mammotab 24 (SemTab)

  • 1. University of Milano-Bicocca

Description

MammoTab is a dataset designed to evaluate semantic table annotation approaches.

It includes annotations that match cell mentions to Knowledge Graph (KG) entities (the Cell-Entity Annotation, or CEA, task).

It comprises 2,500 tables (2,000 for training and 500 for testing) extracted from 21,149,260 Wikipedia pages and annotated against Wikidata v. 20220708. The dataset complies with the data format used in SemTab.
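
As an illustration of what working with CEA annotations might look like, the sketch below parses header-less CSV rows of the form (table id, row index, column index, entity URI). That column order, the `CellAnnotation` type, and the `parse_cea` helper are assumptions made for this example and are not taken from the archive; check the files themselves before relying on them.

```python
import csv
import io
from typing import NamedTuple


class CellAnnotation(NamedTuple):
    """One cell-to-entity link (hypothetical layout, see the note above)."""
    table_id: str
    row: int
    col: int
    entity: str


def parse_cea(text: str) -> list[CellAnnotation]:
    """Parse header-less CSV text into CellAnnotation records.

    Assumes four fields per row: table id, row index, column index, entity URI.
    Blank lines are skipped.
    """
    annotations = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # csv.reader yields an empty list for blank lines
        table_id, row, col, entity = record
        annotations.append(CellAnnotation(table_id, int(row), int(col), entity))
    return annotations


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = '"TABLE_A","1","0","http://www.wikidata.org/entity/Q42"\n'
    for annotation in parse_cea(sample):
        print(annotation)
```

If the real files use a different column order or include a header row, only the unpacking line in `parse_cea` should need to change.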

Files

mammotab_semtab_challenge_2024.zip (21.7 MB)
md5:37b7859434cdee5e60750d527ea637bf
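
After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download`, and the assumption that the archive sits in the current directory, are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum as published in the Files listing of this record.
EXPECTED_MD5 = "37b7859434cdee5e60750d527ea637bf"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("mammotab_semtab_challenge_2024.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.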