Published August 22, 2022 | Version 1.0
Dataset | Open access

MammoTab 22: a giant and comprehensive dataset for Semantic Table Interpretation

  • 1. University of Milano - Bicocca

Description

MammoTab is a dataset designed to evaluate semantic table annotation approaches.

It includes two types of annotation:

  1. cell (mention) to Knowledge Graph (KG) entity matching (CEA task); and
  2. column to KG class matching (CTA task).

It is composed of 980,254 tables extracted from 21,149,260 Wikipedia pages and annotated through Wikidata v. 20220708. The dataset is compliant with the data format used in SemTab2019.
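As a rough illustration of the two annotation types, the sketch below parses SemTab-style annotation rows with Python's standard csv module. The column order (table, column, row, entity for CEA; table, column, class for CTA) is an assumption based on the SemTab2019 convention and should be checked against the files in the archive. The table identifier and the sample rows are made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class CeaAnnotation(NamedTuple):
    """One cell-to-entity annotation (CEA task)."""
    table_id: str
    col: int
    row: int
    entity: str


class CtaAnnotation(NamedTuple):
    """One column-to-class annotation (CTA task)."""
    table_id: str
    col: int
    kg_class: str


def parse_cea(lines):
    # Assumed layout: table id, column index, row index, entity URI.
    return [
        CeaAnnotation(r[0], int(r[1]), int(r[2]), r[3])
        for r in csv.reader(lines)
        if r  # skip blank lines
    ]


def parse_cta(lines):
    # Assumed layout: table id, column index, class URI.
    return [
        CtaAnnotation(r[0], int(r[1]), r[2])
        for r in csv.reader(lines)
        if r  # skip blank lines
    ]


# Hypothetical sample rows; "TBL001" is not a real MammoTab table id.
cea_sample = io.StringIO(
    '"TBL001","0","1","http://www.wikidata.org/entity/Q64"\n'
    '"TBL001","0","2","http://www.wikidata.org/entity/Q90"\n'
)
cta_sample = io.StringIO(
    '"TBL001","0","http://www.wikidata.org/entity/Q515"\n'
)

cea = parse_cea(cea_sample)
cta = parse_cta(cta_sample)
print(len(cea), "cell annotations,", len(cta), "column annotation")
```

In practice the same functions could be given an open file handle instead of the in-memory samples, since csv.reader accepts any iterable of lines.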

Files

mammotab_dataset.zip

Files (3.1 GB)

  • 1.9 GB, md5:a4e3675ef0bd21bfe8b2c631b7e2d8c9
  • 1.2 GB, md5:6037ee0db8ebd16405a19c6d74664a58
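The MD5 checksums listed above can be used to check a download for corruption. The following is a minimal sketch using Python's hashlib. The helper names are hypothetical. Because the listing does not show which file name belongs to which checksum, the sketch only checks that a file's digest matches one of the two published values.

```python
import hashlib

# Checksums as published in the file listing.
PUBLISHED_MD5 = {
    "a4e3675ef0bd21bfe8b2c631b7e2d8c9",  # 1.9 GB file
    "6037ee0db8ebd16405a19c6d74664a58",  # 1.2 GB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """True if the file's MD5 equals one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```

For example, `matches_published_checksum("mammotab_dataset.zip")` should return True for an intact download, assuming the archive was saved under that name in the working directory.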

Additional details

References

  • Marzocchi, M., Cremaschi, M., Pozzi, R., Avogadro, R., & Palmonari, M. (2022). MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation. Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab2022.