Spanish Fake News Dataset
Creators
- Universidad Politécnica de Madrid (affiliation)
Description
This dataset contains a structured and annotated collection of false news items in Spanish (Castilian), gathered and processed for academic research on misinformation.
Dataset Scope
The dataset covers most of the false news messages, and their variations, recorded up to 01.02.2021 (1 February 2021).
Content Description
The dataset includes samples of false information in various formats:
- News articles and headlines
- Tweets and Facebook/Instagram/Telegram posts
- YouTube video captions
- WhatsApp text and voice message transcripts
- Transcribed video/audio fragments with false claims
- Fake government documents
- Captions from photos and memes
- Text extracted from images using OCR
Only texts in Spanish (Castilian) were included. Texts in Spain's other co-official languages (e.g., Catalan, Basque, Galician) were excluded for consistency.
Sources
The data was collected from the following verified fact-checking initiatives:
Fact-checkers from these organizations provide detailed articles identifying and explaining falsehoods, often including:
- General context of the event
- Quotes or links to false claims
- Analysis and explanation of why the claims are false
- Verified information or corrections
Collection Method
The dataset was built using both manual extraction (e.g., identifying and quoting false statements) and automated parsing:
- MyNews service: an archive of Spanish mass media
- Custom scripts: for parsing and extracting structured data
- OCR tools: for extracting text from images (e.g., memes and screenshots)
Fields Description
| Column Name | Description |
|---|---|
| Topic | The thematic category of the news item (e.g., Politics, Health, COVID-19, Crime). Normalized and translated to English. |
| Link source | URL to the original news piece, fact-check report, or source of the claim. Invalid links were removed. |
| Media | The platform or outlet where the false claim appeared (e.g., Facebook, YouTube, WhatsApp). Normalized for consistent spelling and language. |
| Date | Publication or verification date of the news item, in YYYY-MM-DD format. |
| Author | (Optional) Author of the news or platform source, if available. May be empty. |
| Headlines | Title or summary of the news item or article containing the false information. |
| Fake statement | Quoted false claim or misinformation as cited in the verification article. |
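The field table above suggests a simple schema check when loading the file. The following is a minimal Python sketch, not part of the dataset release: it assumes the CSV header uses exactly the column names listed in the table, and the helper names (`EXPECTED_COLUMNS`, `missing_columns`, `load_rows`, `is_iso_date`, `bad_dates`) are hypothetical. It fails early on missing columns and flags rows whose `Date` is not a strict YYYY-MM-DD calendar date.

```python
import csv
import re
from datetime import datetime

# Assumption: the CSV header uses exactly the column names from the field
# table; check the actual header of esp_fake_news.csv before relying on this.
EXPECTED_COLUMNS = ["Topic", "Link source", "Media", "Date",
                    "Author", "Headlines", "Fake statement"]

# Shape check only; calendar validity is checked separately with strptime.
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def missing_columns(fieldnames):
    """Return the expected columns absent from a CSV header, in table order."""
    present = set(fieldnames or [])
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_rows(lines):
    """Parse CSV lines into dicts, raising ValueError if columns are missing."""
    reader = csv.DictReader(lines)
    missing = missing_columns(reader.fieldnames)
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def is_iso_date(value):
    """True only for a real calendar date written strictly as YYYY-MM-DD."""
    if not value or not _ISO_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def bad_dates(rows):
    """Return rows whose Date field is empty or not a valid YYYY-MM-DD date."""
    return [row for row in rows if not is_iso_date(row.get("Date"))]
```

Assuming the file sits in the working directory, `with open("esp_fake_news.csv", newline="", encoding="utf-8-sig") as f: rows = load_rows(f)` followed by `bad_dates(rows)` lists any rows that break the documented date format. The `utf-8-sig` codec is itself an assumption; it reads plain UTF-8 and also tolerates a leading byte-order mark.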
⚠️ Notes
- The dataset was preprocessed to remove duplicates, invalid links, and non-textual clutter.
- Field values were normalized to support multilingual and cross-platform analysis.
- Only Castilian Spanish was retained for consistency and clarity.
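The preprocessing claims in these notes can be spot-checked after download. This is a hypothetical sketch, not the authors' pipeline, and the helper names are illustrative: it assumes a duplicate means the same `Fake statement` text after case-folding and whitespace collapsing, and that an invalid link is one lacking an `http`/`https` scheme or a host. It makes no network requests, so dead but well-formed URLs are not detected.

```python
import csv
from collections import Counter
from urllib.parse import urlparse


def read_rows(path):
    """Read the CSV into a list of dicts (UTF-8 with optional BOM is assumed)."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def normalize(text):
    """Case-fold and collapse whitespace so trivially different copies compare equal."""
    return " ".join((text or "").split()).casefold()


def duplicate_statements(rows, column="Fake statement"):
    """Return normalized non-empty statements occurring more than once, with counts."""
    counts = Counter(normalize(row.get(column)) for row in rows)
    return {text: n for text, n in counts.items() if text and n > 1}


def malformed_links(rows, column="Link source"):
    """Return links without an http(s) scheme or host (a syntax check only)."""
    bad = []
    for row in rows:
        url = (row.get(column) or "").strip()
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append(url)
    return bad
```

Normalizing before counting catches copies that differ only in capitalization or spacing; stricter or looser notions of "duplicate" (for example, comparing `Headlines` too) are equally plausible, since the notes do not state the criterion used.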
📚 License & Use
This dataset is intended for non-commercial academic and research purposes. Please cite the original fact-checking organizations and this dataset if used in publications or analysis.
Files (1.6 MB)
| Name | Size | MD5 checksum |
|---|---|---|
| esp_fake_news.csv | 1.6 MB | e0dce05c01037592952076801a2af619 |
|  | 3.7 kB | cf17527874830c5417ddeef2e0e01d0a |
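A downloaded copy can be checked against the MD5 checksums in the file listing. The following is a minimal sketch using Python's standard `hashlib`; the helper names (`md5_of_chunks`, `md5_of_file`, `matches`) are hypothetical, and pairing the first checksum with `esp_fake_news.csv` is an assumption based on the listing order and the 1.6 MB size.

```python
import hashlib


def md5_of_chunks(chunks):
    """Feed byte chunks into one MD5 digest and return its lowercase hex string."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def matches(path, expected_md5):
    """Compare a file's MD5 with a published checksum, ignoring case and padding."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If the pairing assumption holds, `matches("esp_fake_news.csv", "e0dce05c01037592952076801a2af619")` should return `True` for an intact download. MD5 is adequate for detecting accidental corruption, but it does not protect against deliberate tampering.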
Additional details
Related works
- Is supplement to: Conference paper, 10.1007/978-3-031-21753-1_5 (DOI)
Dates
- Available: 2021-02