Published March 28, 2025 | Version v1
Dataset Restricted

MultiClaimNet: A Massively Multilingual Dataset of Fact-Checked Claim Clusters

  • 1. Queen Mary University of London
  • 2. Newtral Media Audiovisual
  • 3. National University of Distance Education

Description

MultiClaimNet is a collection of three multilingual claim cluster datasets. Claims that discuss similar facts are automatically grouped and annotated with a cluster ID. Together, the three datasets listed below contain claims written in 86 languages across diverse topics.

 

Dataset      Number of Claims   Number of Clusters   Number of Languages
ClaimCheck   1187               197                  22
ClaimMatch   1171               192                  36
MultiClaim   85.3K              30.9K                78
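
For a rough sense of cluster granularity, the counts in the table above can be turned into an average number of claims per cluster. The following is a minimal sketch that uses only the figures listed; the MultiClaim counts are the rounded values (85.3K and 30.9K), so its ratio is approximate. The function name `mean_cluster_size` is introduced here for illustration.

```python
# Counts copied from the table above.
# MultiClaim values are rounded (85.3K claims, 30.9K clusters), so its ratio is approximate.
DATASETS = {
    "ClaimCheck": (1187, 197),
    "ClaimMatch": (1171, 192),
    "MultiClaim": (85_300, 30_900),
}


def mean_cluster_size(claims: int, clusters: int) -> float:
    """Average number of claims per cluster."""
    return claims / clusters


for name, (claims, clusters) in DATASETS.items():
    print(f"{name}: {mean_cluster_size(claims, clusters):.2f} claims per cluster")
```

On these figures, ClaimCheck and ClaimMatch average about six claims per cluster, while MultiClaim averages fewer than three.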

 

Preprint: https://arxiv.org/abs/2503.22280

 

Content:

  • Claim - Fact-checked claim
  • ClusterID - ID of the cluster the claim belongs to
  • Language - Original language of the claim
  • Translation - English translation of the claim
  • NID - Unique identifier of the claim

In addition to the above fields, the MultiClaim dataset contains the following fields from the original dataset.

  • Timestamp
  • URL

 

References

If you use any dataset from MultiClaimNet in any publication, project, tool, or in any other form, please cite the following paper:

@misc{panchendrarajan2025multiclaimnet,
      title={MultiClaimNet: A Massively Multilingual Dataset of Fact-Checked Claim Clusters}, 
      author={Rrubaa Panchendrarajan and Rubén Míguez and Arkaitz Zubiaga},
      year={2025},
      eprint={2503.22280},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.22280}, 
}


If you use the MultiClaim dataset from the MultiClaimNet collection in any publication, project, tool, or in any other form, please cite the following paper in addition to the one above:

@inproceedings{pikuliak-etal-2023-multilingual,
    title = "Multilingual Previously Fact-Checked Claim Retrieval",
    author = "Pikuliak, Mat{\'u}{\v{s}} and Srba, Ivan and Moro, Robert and Hromadka, Timo and Smole{\v{n}}, Timotej and Meli{\v{s}}ek, Martin and Vykopal, Ivan and Simko, Jakub and Podrou{\v{z}}ek, Juraj and Bielikova, Maria",
    editor = "Bouamor, Houda  and Pino, Juan  and Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.1027",
    doi = "10.18653/v1/2023.emnlp-main.1027",
    pages = "16477--16500",
}

 

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

In order to share the dataset with you, please agree to the following terms:

  1. You will use the dataset strictly for research purposes. The request for access to the dataset must be sent from the official and existing e-mail address of the relevant university, faculty, or other scientific or research institution (for verification purposes).
  2. You will not re-share the dataset (or any of its parts) with anyone else not included in this request.
  3. You will appropriately cite the papers mentioned in the dataset description in any publication, project, or tool that uses this dataset.
  4. You understand how the dataset was created and that the manual or automatically predicted annotations may not be 100% correct. 
  5. You acknowledge that you are fully responsible for the use of the dataset (data) and for any infringement of the rights of third parties (in particular copyright) that may arise from its use beyond the intended purposes.

Additional details

Funding

European Union
101073351
UK Research and Innovation
101073351