FactSpan: Multilingual Fact-Checking Dataset
Creators
Description
The FactSpan dataset extends the X-Fact dataset to support multilingual fact-checking research. It addresses two limitations of existing datasets: it adds recent claims from the ClaimReview Markup for Data Commons Feed, and it provides detailed annotations.
Key Features:
- Data Source: Claims are sourced from both the X-Fact dataset (up to 2020) and the Data Commons Feed (post-2020).
- Validity: Only claims from organizations recognized by the International Fact-Checking Network (IFCN) and the Duke Reporters’ Lab are included, to support reliability.
- Standardized Labels: Verdict labels are standardized into five categories: False, Mostly False, Partly False/Misleading, Mostly True, and True.
- Annotations (Annotated Dataset Only): The FactSpan_annotated.csv dataset includes rich annotations generated using GPT-3.5:
  - label: The standardized verdict label.
  - claim: The fact-checked claim.
  - claimDate: The date of the claim.
  - claim_year: The year of the claim.
  - language: The language of the claim.
  - Position Statements: Indicates the presence of position statements.
  - Entity/Event Properties: Indicates the presence of entity or event properties.
  - Quote: Indicates the presence of quotes.
  - Numerical Data: Indicates the presence of numerical data.
  - claim type: Categorizes the claim as factual or opinion.
  - topics: Categorizes the claim into one of five predefined topics (Health and Pandemics, Politics and Governance, Society and Culture, Economy and Environment, Conflict and Security).
  - mapped_label: An additional mapped label, for edge cases or further label mappings.
- Unannotated Dataset: The FactSpan.csv dataset includes:
  - label: The standardized verdict label.
  - claim: The fact-checked claim.
  - claimDate: The date of the claim.
  - language: The language of the claim.
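As a rough illustration of how the unannotated file might be read and sanity-checked, the sketch below uses only the Python standard library. It assumes the CSV header uses the column names listed above (label, claim, claimDate, language) and that the label column contains the five standardized verdict strings exactly as written in this description; neither assumption has been verified against the actual file. The function names `load_claims` and `summarize` are hypothetical helpers, not part of the repository.

```python
import csv
from collections import Counter

# Column names as listed in the description; the exact header spelling is assumed.
EXPECTED_COLUMNS = {"label", "claim", "claimDate", "language"}

# The five standardized verdict labels; the exact strings in the file are assumed.
STANDARD_LABELS = {
    "False",
    "Mostly False",
    "Partly False/Misleading",
    "Mostly True",
    "True",
}


def load_claims(path):
    """Read the CSV into a list of dicts, checking that expected columns exist."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)


def summarize(rows):
    """Count claims per label and per language; collect labels outside the five categories."""
    by_label = Counter(row["label"] for row in rows)
    by_language = Counter(row["language"] for row in rows)
    unexpected = set(by_label) - STANDARD_LABELS
    return by_label, by_language, unexpected


# Example usage (the path is a placeholder for a locally downloaded copy):
# rows = load_claims("FactSpan.csv")
# by_label, by_language, unexpected = summarize(rows)
# print(by_label.most_common(), by_language.most_common(10), unexpected)
```

If `unexpected` is non-empty, the label strings in the file differ from the ones assumed here, and `STANDARD_LABELS` should be adjusted to match.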
Purpose:
This dataset aims to facilitate research in multilingual fact-checking, providing a comprehensive and up-to-date resource for developing and evaluating fact-checking models.
Repository:
The dataset is maintained in the GitHub repository (https://github.com/lorraine-dev/FactSpan), which also contains scripts for expanding and updating the dataset.
This work was supported by the German Research Foundation (DFG, project no. 504226141).
Files
(25.1 MB total)
| MD5 checksum | Size |
|---|---|
| 069bd2039175db0d1af701348a818d8a | 11.0 MB |
| 254724da69ecee21209ecefbcaf79b4a | 14.1 MB |
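The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. Because the listing does not show which checksum belongs to which file, the check only tests whether a file matches either of the two published values. The helper names `md5_of_file` and `is_known_file` are hypothetical.

```python
import hashlib

# Checksums copied from the file listing; the file-to-checksum mapping is not stated there.
KNOWN_MD5 = {
    "069bd2039175db0d1af701348a818d8a",
    "254724da69ecee21209ecefbcaf79b4a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_file(path):
    """Return True if the file's MD5 matches one of the published checksums."""
    return md5_of_file(path) in KNOWN_MD5


# Example usage (the path is a placeholder for a locally downloaded copy):
# print(is_known_file("FactSpan.csv"))
```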
Additional details
Software
- Repository URL
- https://github.com/lorraine-dev/FactSpan
- Programming language
- Python
- Development Status
- Active