Published August 22, 2022 | Version 1.0
Dataset
Open
MammoTab 22: a giant and comprehensive dataset for Semantic Table Interpretation
Authors/Creators
- University of Milano-Bicocca
Description
MammoTab is a dataset designed to evaluate semantic table annotation approaches.
It includes two types of annotations:
- cell (mention) to Knowledge Graph (KG) entity matching (CEA task), and
- column to KG class matching (CTA task).
It comprises 980,254 tables extracted from 21,149,260 Wikipedia pages and annotated with Wikidata (version 20220708). The dataset complies with the data format used in SemTab2019.
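To illustrate the two annotation types, the following is a minimal sketch of how CEA and CTA annotations could be read. It assumes the headerless CSV layout commonly associated with SemTab2019 (table ID, column ID, row ID, entity for CEA; table ID, column ID, class for CTA); the actual column order and file names inside the MammoTab archive should be checked against the dataset itself. The sample rows, type names, and function names are hypothetical and are not taken from the dataset.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class CeaAnnotation(NamedTuple):
    """One cell-to-entity annotation (assumed order: table, column, row, entity)."""
    table_id: str
    col_id: int
    row_id: int
    entity: str


class CtaAnnotation(NamedTuple):
    """One column-to-class annotation (assumed order: table, column, class)."""
    table_id: str
    col_id: int
    kg_class: str


def read_cea(lines: Iterable[str]) -> List[CeaAnnotation]:
    """Parse headerless CEA rows, skipping blank lines."""
    result = []
    for row in csv.reader(lines):
        if row:
            table_id, col_id, row_id, entity = row
            result.append(CeaAnnotation(table_id, int(col_id), int(row_id), entity))
    return result


def read_cta(lines: Iterable[str]) -> List[CtaAnnotation]:
    """Parse headerless CTA rows, skipping blank lines."""
    result = []
    for row in csv.reader(lines):
        if row:
            table_id, col_id, kg_class = row
            result.append(CtaAnnotation(table_id, int(col_id), kg_class))
    return result


# Hypothetical sample rows for illustration only.
CEA_SAMPLE = '"TABLE01","0","1","http://www.wikidata.org/entity/Q42"\n'
CTA_SAMPLE = '"TABLE01","0","http://www.wikidata.org/entity/Q5"\n'

cea = read_cea(io.StringIO(CEA_SAMPLE))
cta = read_cta(io.StringIO(CTA_SAMPLE))
print(cea[0])
print(cta[0])
```

With real files, the same functions can be given an open file handle instead of the in-memory samples.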
Files
mammotab_dataset.zip
(3.1 GB)
| Name | Size |
|---|---|
| md5:a4e3675ef0bd21bfe8b2c631b7e2d8c9 | 1.9 GB |
| md5:6037ee0db8ebd16405a19c6d74664a58 | 1.2 GB |
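The MD5 checksums in the file listing can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. Because the individual file names are not shown in the listing, it only checks whether a file's digest matches one of the two listed values; the function names and any path passed to them are hypothetical.

```python
import hashlib

# MD5 digests copied from the file listing above.
LISTED_MD5 = {
    "a4e3675ef0bd21bfe8b2c631b7e2d8c9",  # 1.9 GB file
    "6037ee0db8ebd16405a19c6d74664a58",  # 1.2 GB file
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str) -> bool:
    """Return True if the file's MD5 digest is one of the listed values."""
    return md5_of_file(path) in LISTED_MD5
```

Reading in chunks matters here because the files are larger than 1 GB. For example, `matches_listed_checksum("path/to/downloaded_file")` would return `True` only for an intact copy of one of the two listed files.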
Additional details
References
- Marzocchi, M., Cremaschi, M., Pozzi, R., Avogadro, R., & Palmonari, M. (2022). MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation. Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab2022.