Published January 8, 2024 | Version v1.0
Dataset Open

VERITE Benchmark

  • 1. Centre for Research and Technology Hellas
  • 2. Aristotle University of Thessaloniki

Description

VERITE is a benchmark dataset designed for evaluating multimodal misinformation detection models. The dataset consists of real-world instances of misinformation collected from Snopes and Reuters, and it addresses unimodal bias by excluding asymmetric misinformation and employing modality balancing. The images are sourced from within the Snopes and Reuters articles, as well as from Google Images. As we do not own the rights to the images, the dataset provides the image URLs along with their captions and labels. VERITE supports multiclass classification of image-caption pairs into three categories (Truthful, Out-of-context, and Miscaptioned), but it can also be used for binary classification. We collected 260 articles from Snopes and 78 from Reuters that met our criteria, 338 articles in total, which translates to 338 Truthful, 338 Miscaptioned, and 324 Out-of-context pairs (1,000 pairs overall).
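The class sizes and the multiclass-versus-binary distinction above can be illustrated with a short Python sketch. The label strings and the `to_binary` helper are assumptions made for illustration only: the description does not state how labels are spelled in the released files or how the binary setting is defined, so check the GitHub repository before relying on this mapping. The sketch collapses both misinformation classes into a single positive class, which is only one possible binary setup.

```python
from collections import Counter

# Class sizes as stated in the dataset description.
# The label spellings are hypothetical; the released files may differ.
CLASS_COUNTS = {
    "truthful": 338,
    "miscaptioned": 338,
    "out-of-context": 324,
}


def to_binary(label: str) -> int:
    """Map a three-class label to 0 (truthful) or 1 (misinformation).

    Assumed mapping: both misinformation classes become the positive class.
    """
    normalized = label.strip().lower()
    if normalized not in CLASS_COUNTS:
        raise ValueError(f"unknown label: {label!r}")
    return 0 if normalized == "truthful" else 1


def binary_counts(class_counts: dict) -> dict:
    """Aggregate three-class counts into counts per binary class."""
    totals = Counter()
    for label, count in class_counts.items():
        totals[to_binary(label)] += count
    return dict(totals)


if __name__ == "__main__":
    print(sum(CLASS_COUNTS.values()))   # total number of image-caption pairs
    print(binary_counts(CLASS_COUNTS))  # pairs per binary class
```

Under this assumed mapping the binary task is imbalanced (338 truthful pairs against 662 misinformation pairs), which is worth keeping in mind when choosing evaluation metrics.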

For more information on how to use the dataset, visit https://github.com/stevejpapad/image-text-verification. If you encounter any problems while downloading or preparing VERITE (e.g., broken image URLs), please contact stefpapad@iti.gr.

The dataset was developed in the context of the vera.ai (VERification Assisted by Artificial Intelligence) project.

 

Files

VERITE.zip

Files (110.7 kB)

Name: VERITE.zip
Size: 110.7 kB
md5: 9e331f3999b8ffd7265491268009c139
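The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The local path `VERITE.zip` and the helper names `md5_of_file` and `verify_archive` are assumptions for illustration and are not part of the dataset's own tooling.

```python
import hashlib
import sys

# Checksum published on this record for VERITE.zip.
EXPECTED_MD5 = "9e331f3999b8ffd7265491268009c139"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Check whether the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; pass the real location as the first argument.
    archive = sys.argv[1] if len(sys.argv) > 1 else "VERITE.zip"
    print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
```

A mismatch usually indicates an incomplete or corrupted download, in which case downloading the archive again is the simplest remedy.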

Additional details

Additional titles

Subtitle
Verification of Image-Text pairs

Related works

Is compiled by
Journal article: 10.1007/s13735-023-00312-6 (DOI)

Funding

European Commission
vera.ai: VERification Assisted by Artificial Intelligence (grant 101070093)