AFP-Sum: A Generative-AI-Driven Claim Retrieval System Capable of Detecting and Retrieving Claims from Social Media Platforms in Multiple Languages
- 1. Kempelen Institute of Intelligent Technologies
Description
A Generative-AI-Driven Claim Retrieval System Capable of Detecting and Retrieving Claims from Social Media Platforms in Multiple Languages
Abstract: Online disinformation poses a global challenge, placing significant demands on fact-checkers who must verify claims efficiently to prevent the spread of false information. A major issue in this process is the redundant verification of already fact-checked claims, which increases workload and delays responses to newly emerging claims. This research introduces an approach that retrieves previously fact-checked claims, evaluates their relevance to a given input, and provides supplementary information to support fact-checkers. Our method employs large language models (LLMs) to filter irrelevant fact-checks and generate concise summaries and explanations, enabling fact-checkers to assess more quickly whether a claim has been verified before. In addition, we evaluate our approach through both automatic and human assessments, where humans interact with the developed tool to review its effectiveness. Our results demonstrate that LLMs are able to filter out many irrelevant fact-checks and, therefore, reduce effort and streamline the fact-checking process.
Paper: https://arxiv.org/abs/2504.20668
GitHub Repository: https://github.com/kinit-sk/claim-retrieval
The data are available upon request for research purposes only.
References
If you use this dataset in any publication, project, tool or in any other form, please cite the following paper:
@misc{vykopal2025generativeaidrivenclaimretrievalcapable,
title={A Generative-AI-Driven Claim Retrieval System Capable of Detecting and Retrieving Claims from Social Media Platforms in Multiple Languages},
author={Ivan Vykopal and Martin Hyben and Robert Moro and Michal Gregor and Jakub Simko},
year={2025},
eprint={2504.20668},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.20668},
}
Content
- afp-sum.csv - AFP-Sum dataset consisting of around 19K fact-checks across 23 languages
  - id - Article ID
  - url - A URL of a fact-checking article
  - text - A text extracted from the fact-checking article
  - summary - A summary extracted from the fact-checking article
  - processed_text - Text of the fact-checking article without the summary
  - language - Language of the fact-checking article
- sample2.csv - Sample of 2 fact-checking articles per language from the AFP-Sum dataset
  - id - Article ID
  - url - A URL of a fact-checking article
  - text - A text extracted from the fact-checking article
  - summary - A summary extracted from the fact-checking article
  - processed_text - Text of the fact-checking article without the summary
  - language - Language of the fact-checking article
- sample100.csv - Sample of 100 fact-checking articles per language from the AFP-Sum dataset
  - id - Article ID
  - url - A URL of a fact-checking article
  - text - A text extracted from the fact-checking article
  - summary - A summary extracted from the fact-checking article
  - processed_text - Text of the fact-checking article without the summary
  - language - Language of the fact-checking article
- fact_checks_metadata.csv - Metadata for the MultiClaim dataset and especially for the fact-checking articles
  - fact_check_id - ID of the fact-checks from the original MultiClaim dataset
  - url - A URL of the fact-checking article
  - rating_category - Rating extracted from the fact-checks metadata
  - language - Language of the fact-checking article
  - published_at - Publication date of the fact-checking article
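As a rough illustration of how the files listed above might be read, the following sketch loads one of the AFP-Sum CSV files with Python's standard csv module, checks that the documented columns are present, and counts articles per language. It assumes the files are comma-delimited and UTF-8 encoded, which this record does not state; the function names read_fact_checks and count_by_language are hypothetical helpers introduced here, not part of the claim-retrieval repository.

```python
import csv
from collections import Counter
from typing import TextIO

# Column names as listed in the Content section for afp-sum.csv,
# sample2.csv and sample100.csv.
EXPECTED_COLUMNS = ["id", "url", "text", "summary", "processed_text", "language"]


def read_fact_checks(handle: TextIO) -> list[dict[str, str]]:
    """Read all rows from an open CSV handle (hypothetical helper).

    Assumes a comma-delimited file with a header row. Raises ValueError
    if any of the documented columns is missing from the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_by_language(rows: list[dict[str, str]]) -> Counter:
    """Count fact-checking articles per value of the language column."""
    return Counter(row["language"] for row in rows)
```

With a local copy of the data, one could open sample2.csv using open("sample2.csv", newline="", encoding="utf-8"), pass the handle to read_fact_checks, and then call count_by_language on the result; if the description above holds, each language should appear twice. For the full afp-sum.csv, very long article texts may require raising the limit with csv.field_size_limit before reading.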
Acknowledgments
This project is funded by the European Media and Information Fund (grant number 291191). The sole responsibility for any content supported by the European Media and Information Fund lies with the author(s) and it may not necessarily reflect the positions of the EMIF and the Fund Partners, the Calouste Gulbenkian Foundation and the European University Institute.
Additional details
Software
- Repository URL: https://github.com/kinit-sk/claim-retrieval