MultiClaimNet: A Massively Multilingual Dataset of Fact-Checked Claim Clusters
Description
MultiClaimNet is a collection of three multilingual claim cluster datasets. Claims that discuss similar facts are automatically grouped and annotated with a cluster ID. Together, the three datasets in MultiClaimNet contain claims written in 86 languages across diverse topics.
| Dataset | Number of Claims | Number of Clusters | Number of Languages |
|---|---|---|---|
| ClaimCheck | 1187 | 197 | 22 |
| ClaimMatch | 1171 | 192 | 36 |
| MultiClaim | 85.3K | 30.9K | 78 |
Preprint: https://arxiv.org/abs/2503.22280
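As a rough sanity check on a local copy, the per-dataset counts in the table above correspond to the number of records, the number of distinct `ClusterID` values, and the number of distinct `Language` values. The following is a minimal Python sketch that assumes each record has already been loaded as a dict keyed by the field names listed under Content. The sample records and the language codes in them are hypothetical and do not come from the dataset.

```python
from typing import Iterable, Mapping


def summarize(records: Iterable[Mapping[str, str]]) -> dict[str, int]:
    """Count claims, distinct clusters and distinct languages in a set of records."""
    n_claims = 0
    clusters: set[str] = set()
    languages: set[str] = set()
    for rec in records:
        n_claims += 1
        clusters.add(rec["ClusterID"])
        languages.add(rec["Language"])
    return {
        "claims": n_claims,
        "clusters": len(clusters),
        "languages": len(languages),
    }


# Hypothetical toy records for illustration only, not real MultiClaimNet rows.
sample = [
    {"NID": "1", "ClusterID": "c1", "Language": "en", "Claim": "toy claim 1"},
    {"NID": "2", "ClusterID": "c1", "Language": "es", "Claim": "toy claim 2"},
    {"NID": "3", "ClusterID": "c2", "Language": "en", "Claim": "toy claim 3"},
]
print(summarize(sample))  # {'claims': 3, 'clusters': 2, 'languages': 2}
```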
Content:
- Claim - Fact-checked claim
- ClusterID - Cluster ID
- Language - Original language of the claim
- Translation - English translation of the claim
- NID - Unique identifier of the claim
In addition to the above fields, the MultiClaim dataset contains the following fields carried over from the original MultiClaim dataset (Pikuliak et al., 2023):
- Timestamp
- URL
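Because every claim carries a ClusterID, a cluster can be reconstructed by collecting all rows that share the same ID. The following is a minimal Python sketch. The file format is not stated here, so it assumes a CSV file with a header row that uses the field names above. The function name `group_by_cluster` and the toy rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO


def group_by_cluster(fh: TextIO) -> dict[str, list[dict[str, str]]]:
    """Group claim rows by their ClusterID.

    Assumes (hypothetically) a CSV file whose header row uses the field names
    Claim, ClusterID, Language, Translation and NID. Extra columns such as
    Timestamp and URL (MultiClaim only) are kept in each row if present.
    """
    clusters: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(fh):
        clusters[row["ClusterID"]].append(row)
    return dict(clusters)


# Toy CSV content for illustration only; not real MultiClaimNet data.
toy_csv = io.StringIO(
    "NID,ClusterID,Language,Claim,Translation\n"
    "1,c1,es,Afirmación de ejemplo,Example claim\n"
    "2,c1,en,Example claim,Example claim\n"
    "3,c2,en,Another claim,Another claim\n"
)
clusters = group_by_cluster(toy_csv)
for cid, rows in clusters.items():
    # Print each cluster ID with the English translations of its claims.
    print(cid, [r["Translation"] for r in rows])
# c1 ['Example claim', 'Example claim']
# c2 ['Another claim']
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to the function. The path is whatever your local copy is called.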
References
If you use any dataset from MultiClaimNet in any publication, project, tool, or other work, please cite the following paper:
@misc{panchendrarajan2025multiclaimnet,
title={MultiClaimNet: A Massively Multilingual Dataset of Fact-Checked Claim Clusters},
author={Rrubaa Panchendrarajan and Rubén Míguez and Arkaitz Zubiaga},
year={2025},
eprint={2503.22280},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.22280},
}
If you use the MultiClaim dataset from the MultiClaimNet collection in any publication, project, tool, or other work, please also cite the following paper:
@inproceedings{pikuliak-etal-2023-multilingual,
title = "Multilingual Previously Fact-Checked Claim Retrieval",
author = "Pikuliak, Mat{\'u}{\v{s}} and Srba, Ivan and Moro, Robert and Hromadka, Timo and Smole{\v{n}}, Timotej and Meli{\v{s}}ek, Martin and Vykopal, Ivan and Simko, Jakub and Podrou{\v{z}}ek, Juraj and Bielikova, Maria",
editor = "Bouamor, Houda and Pino, Juan and Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.1027",
doi = "10.18653/v1/2023.emnlp-main.1027",
pages = "16477--16500",
}
Additional details
Funding
- European Union: 101073351
- UK Research and Innovation: 101073351