FactSpan: Multilingual Fact-Checking Dataset

Saju, Lorraine

doi:10.5281/zenodo.15084388

Published March 25, 2025 | Version 1.0.0

Resource type: Dataset | Access: Open

The FactSpan dataset extends the X-Fact dataset to support multilingual fact-checking research. It addresses limitations of existing datasets by incorporating recent data from the ClaimReview Markup for Data Commons Feed and by providing detailed annotations.

Key Features:

Data Source: Claims are sourced from both the X-Fact dataset (up to 2020) and the Data Commons Feed (post-2020).
Validity: Claims are filtered to include only those from organizations recognized by the International Fact-Checking Network (IFCN) and Duke Reporters’ Lab, ensuring high reliability.
Standardized Labels: Verdict labels are standardized into five categories: False, Mostly False, Partly False/Misleading, Mostly True, and True.
Annotations (Annotated Dataset Only): The FactSpan_annotated.csv dataset includes the following columns, with rich annotations generated using GPT-3.5:
- label: The standardized verdict label.
- claim: The fact-checked claim.
- claimDate: The date of the claim.
- claim_year: The year of the claim.
- language: The language of the claim.
- Position Statements: Indicates the presence of position statements.
- Entity/Event Properties: Indicates the presence of entity or event properties.
- Quote: Indicates the presence of quotes.
- Numerical Data: Indicates the presence of numerical data.
- claim type: Categorizes the claim as factual or opinion.
- topics: Categorizes the claim into one of five predefined topics (Health and Pandemics, Politics and Governance, Society and Culture, Economy and Environment, Conflict and Security).
- mapped_label: An additional mapped label, for edge cases or further label mappings.
Unannotated Dataset: The FactSpan.csv dataset includes:
- label: The standardized verdict label.
- claim: The fact-checked claim.
- claimDate: The date of the claim.
- language: The language of the claim.
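
The column layout and the five standardized labels described above can be checked with a short script. The following is a minimal sketch, not part of the dataset's own tooling: it assumes the CSV header uses the column names exactly as listed, and that label strings match the five categories up to letter case. The sample rows, their date format, and the helper name check_rows are made up for illustration.

```python
import csv
import io

# Column names as listed in the description of FactSpan.csv.
EXPECTED_COLUMNS = {"label", "claim", "claimDate", "language"}

# The five standardized verdicts, lower-cased for comparison.
# Assumption: the CSV spells them as in the description, up to letter case.
STANDARD_LABELS = {
    "false",
    "mostly false",
    "partly false/misleading",
    "mostly true",
    "true",
}


def check_rows(handle):
    """Return (row_count, unexpected_labels) for a FactSpan-style CSV stream."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    count = 0
    unexpected = set()
    for row in reader:
        count += 1
        # Compare case-insensitively, but report the label as written.
        if row["label"].strip().casefold() not in STANDARD_LABELS:
            unexpected.add(row["label"])
    return count, unexpected


# Hypothetical sample standing in for FactSpan.csv; these are not real rows.
SAMPLE = io.StringIO(
    "label,claim,claimDate,language\n"
    "False,Example claim one,2021-03-01,en\n"
    "Mostly True,Example claim two,2019-07-15,de\n"
    "Unproven,Example claim three,2022-01-10,pt\n"
)

row_count, odd_labels = check_rows(SAMPLE)
print(row_count, odd_labels)
```

To run the same check on the downloaded file, pass an open file handle instead of the sample, for example open("FactSpan.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption.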

Purpose:

This dataset aims to facilitate research in multilingual fact-checking, providing a comprehensive and up-to-date resource for developing and evaluating fact-checking models.

Repository:

The dataset is maintained in the GitHub repository (https://github.com/lorraine-dev/FactSpan), which also contains scripts for expanding and updating the dataset.

This work was supported by the German Research Foundation (DFG, project no. 504226141).

Files (25.1 MB)

Name	MD5	Size
FactSpan.csv	069bd2039175db0d1af701348a818d8a	11.0 MB
FactSpan_annotated.csv	254724da69ecee21209ecefbcaf79b4a	14.1 MB
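
The MD5 checksums in the file listing above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The helper names md5_of_file and matches_published are hypothetical, and the local file paths are whatever the reader chose when downloading.

```python
import hashlib
import os
import tempfile

# Checksums copied from the file listing of this record.
PUBLISHED_MD5 = {
    "FactSpan.csv": "069bd2039175db0d1af701348a818d8a",
    "FactSpan_annotated.csv": "254724da69ecee21209ecefbcaf79b4a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, name):
    """Return True if the file at `path` has the checksum published for `name`."""
    return md5_of_file(path) == PUBLISHED_MD5[name]


# Self-check on a throwaway file, so the sketch runs without the dataset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name
demo_digest = md5_of_file(tmp_path)
demo_matches = matches_published(tmp_path, "FactSpan.csv")
os.remove(tmp_path)
print(demo_digest, demo_matches)
```

With the real download in the working directory, matches_published("FactSpan.csv", "FactSpan.csv") should return True if the file arrived unmodified.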

Additional details

Repository URL: https://github.com/lorraine-dev/FactSpan
Programming language: Python
Development Status: Active

