Recod.ai Scientific Image Integrity Dataset (RSIID)
Description
The Recod.ai Scientific Image Integrity Dataset (RSIID) is a benchmark dataset for evaluating forgery detection methods on scientific images. It comprises 39,423 figures (synthetically tampered figures plus their pristine counterparts) derived from 2,923 pristine scientific images sourced from Creative Commons repositories. The dataset is divided into training (26,496 figures) and testing (12,927 figures) sets, all licensed under Creative Commons Attribution (CC-BY).
The RSIID is organized by forgery modality and figure complexity, with figures split into "Simple" and "Compound" categories:
- Simple Scientific Figures: These include forgeries created through Retouching (Blurring, Contrast adjustments), Cleaning (Inpainting, Brute-force removal), and Duplication (Copy-Move, Splicing, Overlap).
- Compound Scientific Figures: These figures consist of multiple panels, where forgeries can occur within a single panel (intra-forgery) or between panels (inter-forgery).
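For code that consumes the dataset, it can help to mirror this taxonomy in a data structure. The sketch below is one hypothetical Python encoding: the variable names and keys are illustrative assumptions, not RSIID's own label format, and the compound entries follow the modalities listed in the Compound Forgeries breakdown table on this page.

```python
# Hypothetical encoding of the RSIID forgery taxonomy; names are illustrative,
# not the dataset's own label scheme.

# Simple figures: category -> forgery modalities.
SIMPLE: dict[str, list[str]] = {
    "Retouching": ["Blurring", "Contrast"],
    "Cleaning": ["Inpainting", "Brute-force"],
    "Duplication": ["Copy-Move", "Splicing", "Overlap"],
}

# Compound figures: forgery location -> category -> modalities,
# following the compound breakdown table.
COMPOUND: dict[str, dict[str, list[str]]] = {
    "Inter-panel": {"Duplication": ["Copy-Move", "Splicing", "Overlap"]},
    "Intra-panel": {
        "Duplication": ["Copy-Move"],
        "Cleaning": ["Inpainting", "Brute-force"],
        "Retouching": ["Blurring", "Contrast"],
    },
}


def compound_modalities(location: str) -> list[str]:
    """Flatten every modality available at a compound forgery location."""
    return [m for mods in COMPOUND[location].values() for m in mods]


print(compound_modalities("Intra-panel"))
# ['Copy-Move', 'Inpainting', 'Brute-force', 'Blurring', 'Contrast']
```

Nesting by location first keeps the intra-/inter-panel distinction explicit, since inter-panel forgeries in this release cover only duplication.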
Additional Resources:
This repository also includes:
- Source Images and Metadata: The original, untampered images used to create the dataset, along with a spreadsheet (artificial_forgery_src_data.zip) detailing the source of each image.
- Compound Forgery Templates: Templates used for creating the compound forgeries (template.zip).
Related Content:
Research Article - Benchmarking Scientific Image Forgery Detectors
GitHub Repository - Recod.ai Scientific Image Integrity Library
Citation
The dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
If you use any content from this repository, please cite:
@article{cardenuto_2022,
  title={Benchmarking scientific image forgery detectors},
  volume={28},
  DOI={10.1007/s11948-022-00391-4},
  number={4},
  journal={Science and Engineering Ethics},
  author={Cardenuto, João P. and Rocha, Anderson},
  year={2022}
}
Dataset Breakdown (Simple Forgeries):
| Data Type | Modality | Train (number of figures) | Test (number of figures) |
|---|---|---|---|
| Source of forgery figures | - | 1932 | 991 |
| Pristine | - | 1932 | 991 |
| Duplication | Copy-Move | 3761 | 1629 |
| | Splicing | 604 | 274 |
| | Overlap | 0 | 660 |
| | Total | 4365 | 2563 |
| Cleaning | Inpainting | 275 | 117 |
| | Brute-force | 957 | 412 |
| | Total | 1232 | 529 |
| Retouching | Blurring | 961 | 414 |
| | Contrast | 966 | 415 |
| | Total | 1927 | 829 |
| Total of figures (excluding source figures) | | 9456 | 4912 |
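As a sanity check on how the simple-forgery table is read, the sketch below (variable and function names are hypothetical) sums the category subtotals, including Pristine, and reproduces the "Total of figures" row. The source-figures row is kept separate because those are the untampered inputs from which the forgeries were generated and do not enter the total.

```python
# Category subtotals from the simple-forgery table, as (train, test).
SIMPLE_SUBTOTALS: dict[str, tuple[int, int]] = {
    "Pristine": (1932, 991),
    "Duplication": (4365, 2563),
    "Cleaning": (1232, 529),
    "Retouching": (1927, 829),
}

# Untampered source figures: listed in the table but not part of the total.
SOURCE_FIGURES: tuple[int, int] = (1932, 991)


def split_totals(rows: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the train and test columns across the given rows."""
    train = sum(t for t, _ in rows.values())
    test = sum(s for _, s in rows.values())
    return train, test


print(split_totals(SIMPLE_SUBTOTALS))  # (9456, 4912)
print(sum(SOURCE_FIGURES))             # 2923
```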
Dataset Breakdown (Compound Forgeries):
| Forgery Location | Data Type | Modality | Train (number of figures) | Test (number of figures) |
|---|---|---|---|---|
| - | Source of forgery figures | - | 1932 | 991 |
| Inter-panel | Duplication | Copy-Move | 9516 | 4094 |
| | | Splicing | 604 | 274 |
| | | Overlap | 0 | 660 |
| | | Total | 10120 | 5028 |
| Intra-panel | Duplication | Copy-Move | 3761 | 1629 |
| | | Total | 3761 | 1629 |
| | Cleaning | Inpainting | 275 | 117 |
| | | Brute-force | 957 | 412 |
| | | Total | 1232 | 529 |
| | Retouching | Blurring | 961 | 414 |
| | | Contrast | 966 | 415 |
| | | Total | 1927 | 829 |
| Total of figures (excluding source figures) | | | 17040 | 8015 |
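The per-table totals can be tied back to the split sizes given in the Description. This sketch (hypothetical names) sums the compound-forgery subtotals and adds the simple-forgery "Total of figures" row, recovering the 26,496 training figures, 12,927 testing figures, and 39,423 figures overall.

```python
# Category subtotals from the compound-forgery table, as (train, test).
COMPOUND_SUBTOTALS: dict[tuple[str, str], tuple[int, int]] = {
    ("Inter-panel", "Duplication"): (10120, 5028),
    ("Intra-panel", "Duplication"): (3761, 1629),
    ("Intra-panel", "Cleaning"): (1232, 529),
    ("Intra-panel", "Retouching"): (1927, 829),
}

# "Total of figures" row of the simple-forgery table.
SIMPLE_TOTAL: tuple[int, int] = (9456, 4912)

compound_train = sum(t for t, _ in COMPOUND_SUBTOTALS.values())
compound_test = sum(s for _, s in COMPOUND_SUBTOTALS.values())

# Whole-dataset split sizes = simple figures + compound figures.
train = SIMPLE_TOTAL[0] + compound_train
test = SIMPLE_TOTAL[1] + compound_test

print(compound_train, compound_test)  # 17040 8015
print(train, test, train + test)      # 26496 12927 39423
```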
Files
artificial_forgery_src_data.zip
Additional details
Funding
- Fundação de Amparo à Pesquisa do Estado de São Paulo (2020/02211-2)
- Fundação de Amparo à Pesquisa do Estado de São Paulo (2017/12646-3)
Dates
- Accepted: 2022-08-09
Software
- Repository URL: https://github.com/phillipecardenuto/rsiil