VERITE Benchmark
Description
VERITE is a benchmark dataset designed for evaluating multimodal misinformation detection models. The dataset consists of real-world instances of misinformation collected from Snopes and Reuters, and it addresses unimodal bias by excluding asymmetric misinformation and employing modality balancing. The images are sourced from within the Snopes and Reuters articles, as well as from Google Images. As we do not own the rights to the images, the dataset provides the image URLs along with their captions and labels. VERITE supports multiclass classification over three categories of image-caption pairs (Truthful, Out-of-context, and Miscaptioned), but it can also be used for binary classification. We collected 260 articles from Snopes and 78 from Reuters that met our criteria, which translates to 338 Truthful, 338 Miscaptioned, and 324 Out-of-context pairs (1,000 pairs in total).
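As an illustration of the class structure described above, the following minimal Python sketch collapses the three categories into a binary task. The label strings and the choice to treat both Out-of-context and Miscaptioned pairs as the positive (misinformation) class are assumptions made for this example; the actual label names in the released files and the evaluation protocol used by the authors may differ, so check the repository before relying on them.

```python
from collections import Counter

# Class sizes as stated in the description above.
# The lowercase label strings are an assumption for this sketch.
CLASS_COUNTS = {
    "truthful": 338,
    "miscaptioned": 338,
    "out-of-context": 324,
}


def to_binary(label: str) -> int:
    """Map a three-class label to 0 (truthful) or 1 (misinformation).

    Grouping both misinformation types into a single positive class is
    one possible binary setup, not necessarily the authors' protocol.
    """
    normalized = label.strip().lower()
    if normalized not in CLASS_COUNTS:
        raise ValueError(f"Unknown label: {label!r}")
    return 0 if normalized == "truthful" else 1


# Aggregate the stated class sizes under the binary mapping.
binary_counts = Counter()
for label, count in CLASS_COUNTS.items():
    binary_counts[to_binary(label)] += count

print(sum(CLASS_COUNTS.values()))
print(dict(binary_counts))
```

Under this particular grouping the binary task is imbalanced, so a pairwise setup (for example, Truthful against a single misinformation type) may be preferable when balanced classes are needed.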
For more information on how to use the dataset, visit https://github.com/stevejpapad/image-text-verification. If you encounter any problems while downloading and preparing VERITE (e.g., broken image URLs), please contact stefpapad@iti.gr.
The dataset was developed in the context of the vera.ai (VERification Assisted by Artificial Intelligence) project.
Files
Name | Size | Checksum |
---|---|---|
VERITE.zip | 110.7 kB | md5:9e331f3999b8ffd7265491268009c139 |
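The MD5 checksum listed for VERITE.zip can be used to confirm that a download is intact. The sketch below uses only the Python standard library; the local file path in the usage note is a hypothetical example, and the function names are introduced here for illustration.

```python
import hashlib

# MD5 checksum listed for VERITE.zip on this record.
EXPECTED_MD5 = "9e331f3999b8ffd7265491268009c139"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("VERITE.zip")` should return `True` for an unmodified copy of the archive saved in the current directory. MD5 is suitable here for detecting corrupted downloads, not for guarding against deliberate tampering.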
Additional details
Additional titles
- Subtitle: Verification of Image-Text pairs
Related works
- Is compiled by: Journal article, DOI 10.1007/s13735-023-00312-6
Software
- Repository URL: https://github.com/stevejpapad/image-text-verification