Recod.ai Scientific Image Integrity Dataset (RSIID)

Cardenuto, João Phillipe; Rocha, Anderson

doi:10.1007/s11948-022-00391-4

Published August 9, 2022 | Version v1

Dataset Open


1. Universidade Estadual de Campinas (UNICAMP)

The Recod.ai Scientific Image Integrity Dataset (RSIID) is a benchmark dataset designed for evaluating forgery detection methods in scientific images. This dataset comprises 39,423 synthetically tampered figures, derived from 2,923 pristine scientific images sourced from Creative Commons repositories. The dataset is divided into training (26,496 figures) and testing (12,927 figures) sets, all licensed under Creative Commons Attribution (CC-BY).

The RSIID is structured by forgery modality and figure complexity, categorized into "Simple" and "Compound" figures:

Simple Scientific Figures: These include forgeries created through Retouching (Blurring, Contrast adjustments), Cleaning (Inpainting, Brute-force removal), and Duplication (Copy-Move, Splicing, Overlap).
Compound Scientific Figures: These figures consist of multiple panels, where forgeries can occur within a single panel (intra-forgery) or between panels (inter-forgery).

Additional Resources:

This repository also includes:

Source Images and Metadata: The original, untampered images used to create the dataset, along with a spreadsheet (artificial_forgery_src_data.zip) detailing the source of each image.
Compound Forgery Templates: Templates used for creating the compound forgeries (template.zip).

Citation

The dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

If you use any content from this repository, please cite:

@article{cardenuto_2022,
  title   = {Benchmarking scientific image forgery detectors},
  author  = {Cardenuto, João P. and Rocha, Anderson},
  journal = {Science and Engineering Ethics},
  volume  = {28},
  number  = {4},
  year    = {2022},
  doi     = {10.1007/s11948-022-00391-4}
}

Dataset Breakdown (Simple Forgeries):

Simple Forgeries
Forgery Type	Modality	Train (number of figures)	Test (number of figures)
Source of forgery figures	-	1932	991
Pristine	-	1932	991
Duplication	Copy-Move	3761	1629
Duplication	Splicing	604	274
Duplication	Overlap	0	660
Duplication	Total	4365	2563
Cleaning	Inpainting	275	117
Cleaning	Brute-force	957	412
Cleaning	Total	1232	529
Retouching	Blurring	961	414
Retouching	Contrast	966	415
Retouching	Total	1927	829
Total of figures	-	9456	4912

Dataset Breakdown (Compound Forgeries):

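Compound Forgeries
Forgery Location	Forgery Type	Modality	Train (number of figures)	Test (number of figures)
Source of forgery figures	-	-	1932	991
Inter-panel	Duplication	Copy-Move	9516	4094
Inter-panel	Duplication	Splicing	604	274
Inter-panel	Duplication	Overlap	0	660
Inter-panel	Duplication	Total	10120	5028
Intra-panel	Duplication	Copy-Move	3761	1629
Intra-panel	Duplication	Total	3761	1629
Intra-panel	Cleaning	Inpainting	275	117
Intra-panel	Cleaning	Brute-force	957	412
Intra-panel	Cleaning	Total	1232	529
Intra-panel	Retouching	Blurring	961	414
Intra-panel	Retouching	Contrast	966	415
Intra-panel	Retouching	Total	1927	829
Total of figures	-	-	17040	8015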
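
The counts in the compound table can be cross-checked against the totals quoted in the description. The following is a minimal sketch, not part of the dataset tooling: the dictionary layout and the names `COMPOUND` and `column_sums` are illustrative assumptions, and the numbers are copied by hand from this page.

```python
# Compound-forgery counts copied from the table above, as (train, test) pairs.
# The grouping keys are illustrative labels, not names used by the dataset.
COMPOUND = {
    "inter-panel duplication": {
        "Copy-Move": (9516, 4094),
        "Splicing": (604, 274),
        "Overlap": (0, 660),
    },
    "intra-panel duplication": {"Copy-Move": (3761, 1629)},
    "intra-panel cleaning": {"Inpainting": (275, 117), "Brute-force": (957, 412)},
    "intra-panel retouching": {"Blurring": (961, 414), "Contrast": (966, 415)},
}

# Totals as stated on this page, again as (train, test).
COMPOUND_TOTAL = (17040, 8015)
SIMPLE_TOTAL = (9456, 4912)
SPLIT_TOTAL = (26496, 12927)
DATASET_TOTAL = 39423


def column_sums(groups):
    """Sum the train and test columns over every modality in every group."""
    train = sum(t for mods in groups.values() for t, _ in mods.values())
    test = sum(s for mods in groups.values() for _, s in mods.values())
    return train, test


if __name__ == "__main__":
    # Per-modality rows should add up to the compound total.
    assert column_sums(COMPOUND) == COMPOUND_TOTAL
    # Simple plus compound figures should give the per-split totals.
    combined = tuple(s + c for s, c in zip(SIMPLE_TOTAL, COMPOUND_TOTAL))
    assert combined == SPLIT_TOTAL
    # Train plus test should give the overall dataset size.
    assert sum(SPLIT_TOTAL) == DATASET_TOTAL
    print("All totals are consistent.")
```

The same pattern applies to the simple-forgery table if its rows are entered in the same (train, test) form.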

Files

Total size: 57.3 GB

Name	MD5	Size
artificial_forgery_src_data.zip	73580eaec4b17dd738fe6bed4f66dd67	430.3 MB
template.zip	586a84709b349122b3753e0142951a57	85.7 MB
testset.7z	c3a4a68b7db4c3d550f3cacbe796e702	19.8 GB
trainset.7z	7f3dd5edd3aa52dc0d30889c428938f1	36.9 GB
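
Because the archives are large, it may be worth checking each download against the MD5 digests listed above before extracting it. The sketch below uses only the Python standard library; the function names `md5_of_file` and `verify_downloads` and the directory name `rsiid_downloads` are hypothetical and chosen for illustration.

```python
import hashlib
from pathlib import Path

# MD5 digests as listed in the file table of this record.
EXPECTED_MD5 = {
    "artificial_forgery_src_data.zip": "73580eaec4b17dd738fe6bed4f66dd67",
    "template.zip": "586a84709b349122b3753e0142951a57",
    "testset.7z": "c3a4a68b7db4c3d550f3cacbe796e702",
    "trainset.7z": "7f3dd5edd3aa52dc0d30889c428938f1",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-gigabyte archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(download_dir):
    """Return {file name: 'ok' | 'mismatch' | 'missing'} for each archive."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # "rsiid_downloads" is a placeholder for wherever the files were saved.
    for name, status in verify_downloads("rsiid_downloads").items():
        print(f"{name}: {status}")
```

A "mismatch" status usually indicates an interrupted or corrupted download, in which case the file should be fetched again.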

Additional details

Funding
Fundação de Amparo à Pesquisa do Estado de São Paulo, grant 2020/02211-2
Fundação de Amparo à Pesquisa do Estado de São Paulo, grant 2017/12646-3

Accepted: 2022-08-09

Repository URL: https://github.com/phillipecardenuto/rsiil

