Published May 1, 2024 | Version v1
Dataset Open

Datasets for the paper: Lost in Translation: Using Global Fact-Checks to Measure Multilingual Misinformation Prevalence, Spread, and Evolution

Description

FullData.csv.gz: Contains links to all claims in the dataset.

  • publishing_date: Date on which the fact-check was published.
  • claim_date: Date that claim was made.
  • verdict: Rating given by the fact-checking organisation.
  • language: Language of the claim.
  • cluster_{threshold}: ID of the cluster that the claim belongs to at the given threshold, with one such column per threshold. An entry of "0" means that the claim is a singleton and is not clustered with any other claim.
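
The column layout above can be explored with a short, dependency-free sketch. It assumes the file is a UTF-8 encoded, comma-separated CSV with a header row. The helper names (`read_claims`, `is_singleton`, `languages_per_cluster`) are illustrative and are not part of the dataset. The exact `cluster_{threshold}` column names are not listed here, so the caller passes the column name explicitly.

```python
import csv
import gzip
from collections import defaultdict


def read_claims(path="FullData.csv.gz"):
    """Yield each row of the gzipped CSV as a dict keyed by column name."""
    # Assumption: UTF-8 text with a header row.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def is_singleton(cluster_id):
    """Return True for the "0" entry that marks an unclustered claim."""
    # Assumption: a blank cell is also treated as "not clustered".
    return cluster_id.strip() in ("", "0", "0.0")


def languages_per_cluster(rows, cluster_col):
    """Map each non-singleton cluster ID to the set of languages it contains."""
    clusters = defaultdict(set)
    for row in rows:
        cluster_id = row[cluster_col]
        if is_singleton(cluster_id):
            continue
        clusters[cluster_id].add(row["language"])
    return dict(clusters)
```

For example, `languages_per_cluster(read_claims(), "cluster_0.9")` would group languages by cluster, where `cluster_0.9` is a hypothetical column name; inspect the header of the downloaded file for the real ones.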

Embeddings.npy: Contains a dictionary mapping each claim to its embedding, calculated with LaBSE.
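
A minimal sketch of reading such a file follows. It assumes the dictionary was written with `numpy.save`, which stores a Python dict as a zero-dimensional object array; the type of the dictionary keys is not documented here. The function names are illustrative. Loading object arrays requires `allow_pickle=True`, which should only be used on files from a trusted source.

```python
import numpy as np


def load_embeddings(path="Embeddings.npy"):
    """Load a dict that was stored with np.save (a 0-d object array)."""
    # allow_pickle=True is needed for object arrays; use only on trusted files.
    array = np.load(path, allow_pickle=True)
    # .item() unwraps the single Python object held by the 0-d array.
    return array.item()


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Cosine similarity is shown only as a common way to compare sentence embeddings; it is not necessarily the measure used to build the clusters.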

Files

Files (873.5 MB)

  • md5:8a02db004e06ffb0b9ec8b5f63c2a5a2 (859.2 MB)
  • md5:69e95774a818320ba3ffcc85a6f09d77 (14.3 MB)
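
The checksums listed above can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`; the helper names are illustrative. The listing does not say which file each checksum belongs to, so the caller passes the local path and the listed value explicitly.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:8a02...'."""
    # Strip the optional "md5:" prefix before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("Embeddings.npy", "md5:8a02db004e06ffb0b9ec8b5f63c2a5a2")` returns `True` only if that local file hashes to the listed value.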
