Published June 4, 2025 | Version v1
Dataset Open

Spanish Fake News Dataset

Description

This dataset contains a structured and annotated collection of false news items in Spanish (Castilian), gathered and processed for academic research on misinformation.

Dataset Scope

The dataset covers most of the false news messages recorded up to 01.02.2021, together with their variations.

Content Description

The dataset includes samples of false information in various formats:

  • News articles and headlines
  • Tweets and Facebook/Instagram/Telegram posts
  • YouTube video captions
  • WhatsApp text and voice message transcripts
  • Transcribed video/audio fragments with false claims
  • Fake government documents
  • Captions from photos and memes
  • Text extracted from images using OCR

Only texts in Spanish (Castilian) were included. Texts in Spain's other co-official languages (e.g., Catalan, Basque, Galician) were excluded for consistency.

Sources

The data was collected from the following verified fact-checking initiatives:

Fact-checkers from these organizations provide detailed articles identifying and explaining falsehoods, often including:

  • General context of the event
  • Quotes or links to false claims
  • Analysis and explanation of why the claims are false
  • Verified information or corrections

Collection Method

The dataset was built using both manual extraction (e.g., identifying and quoting false statements) and automated parsing:

  • MyNews service: an archive of Spanish mass media
  • Custom scripts: for parsing and extracting structured data
  • OCR tools: for extracting text from images (e.g., memes and screenshots)

 

Fields Description

  • Topic: The thematic category of the news item (e.g., Politics, Health, COVID-19, Crime). Normalized and translated to English.
  • Link source: URL to the original news piece, fact-check report, or source of the claim. Invalid links were removed.
  • Media: The platform or outlet where the false claim appeared (e.g., Facebook, YouTube, WhatsApp). Normalized for consistent spelling and language.
  • Date: Publication or verification date of the news item, in YYYY-MM-DD format.
  • Author: (Optional) Author of the news or platform source, if available. May be empty.
  • Headlines: Title or summary of the news item or article containing the false information.
  • Fake statement: Quoted false claim or misinformation as cited in the verification article.
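
The fields described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: it assumes the CSV header uses exactly the column names listed here and a comma delimiter, and the sample row is made up purely to show the shape of a record. The helper name read_rows is hypothetical.

```python
import csv
import io
from datetime import date

# Column names as listed in the field descriptions; the exact header
# spelling in the CSV file is an assumption.
EXPECTED_COLUMNS = [
    "Topic", "Link source", "Media", "Date",
    "Author", "Headlines", "Fake statement",
]


def read_rows(handle):
    """Yield one dict per record, with Date parsed and empty Author as None."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        # Date is documented as YYYY-MM-DD, which date.fromisoformat accepts.
        row["Date"] = date.fromisoformat(row["Date"])
        # Author is optional and may be empty.
        row["Author"] = row["Author"] or None
        yield row


# Tiny in-memory sample with made-up values, only to show the shape.
SAMPLE = io.StringIO(
    "Topic,Link source,Media,Date,Author,Headlines,Fake statement\n"
    "Health,https://example.org/check,WhatsApp,2020-03-15,,Example headline,Example claim\n"
)
rows = list(read_rows(SAMPLE))
```

For the real file, the same function could be called with open("esp_fake_news.csv", newline="", encoding="utf-8"), where the UTF-8 encoding is an assumption. Rows with an empty or differently formatted Date would raise ValueError and would need separate handling.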

 

⚠️ Notes

  • The dataset was preprocessed to remove duplicates, invalid links, and non-textual clutter.
  • Field values were normalized to support multilingual and cross-platform analysis.
  • Only Castilian Spanish was retained for consistency and clarity.
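
The preprocessing mentioned in the notes above (removing duplicates and invalid links) could look roughly like the sketch below. This is an illustration only, not the authors' pipeline: the function names are hypothetical, link validity is checked only syntactically, an invalid link is blanked rather than the whole record dropped, and duplicates are detected by comparing whitespace-normalized, case-folded statements. All of these are assumptions.

```python
from urllib.parse import urlparse


def is_valid_link(url):
    """Syntactic check only: http(s) scheme and a non-empty host."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def normalize_text(text):
    """Collapse whitespace runs and case-fold, for comparison only."""
    return " ".join(text.split()).casefold()


def clean(records):
    """Blank invalid links and drop records whose statement was already seen."""
    seen = set()
    kept = []
    for rec in records:
        rec = dict(rec)  # work on a copy so the input is not modified
        if not is_valid_link(rec["Link source"]):
            rec["Link source"] = ""
        key = normalize_text(rec["Fake statement"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Made-up records, only to exercise the two rules.
records = [
    {"Link source": "https://example.org/a", "Fake statement": "Example  claim"},
    {"Link source": "not a link", "Fake statement": "Another claim"},
    {"Link source": "https://example.org/b", "Fake statement": "example claim"},
]
cleaned = clean(records)
```

Here the third record is dropped as a duplicate of the first, and the second keeps its statement but loses its malformed link.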

📚 License & Use

This dataset is intended for non-commercial academic and research purposes. Please cite the original fact-checking organizations and this dataset if used in publications or analysis.

Files

esp_fake_news.csv

Files (1.6 MB)

  • md5:e0dce05c01037592952076801a2af619, 1.6 MB
  • md5:cf17527874830c5417ddeef2e0e01d0a, 3.7 kB
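
A downloaded file can be checked against the MD5 values listed above with Python's hashlib. This is a minimal sketch; the helper names md5_of_file and matches_listing are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value of the form 'md5:<hex digest>'."""
    algorithm, _, expected = listed.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported checksum type: {algorithm}")
    return md5_of_file(path) == expected.lower()
```

For example, matches_listing("esp_fake_news.csv", "md5:e0dce05c01037592952076801a2af619") should return True for an intact download, assuming the 1.6 MB entry is the CSV file; the listing itself does not name the files.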

Additional details

Related works

Is supplement to
Conference paper: 10.1007/978-3-031-21753-1_5 (DOI)

Dates

Available
2021-02