Published July 14, 2023 | Version V2.0
Dataset Open

The Digital Forensics 2023 dataset - DF2023

  • 1. Austrian Institute of Technology

Description

For a detailed description of the DF2023 dataset, please refer to:

  @inproceedings{Fischinger2023DFNet,
      title={DF2023: The Digital Forensics 2023 Dataset for Image Forgery Detection},
      author={David Fischinger and Martin Boyer},
      booktitle={The 25th Irish Machine Vision and Image Processing Conference (IMVIP)},
      year={2023}
  }

DF2023 is a dataset for image forgery detection and localization. The training set contains 1,000,000 manipulated images and the validation set contains 5,000 manipulated images, together with their ground-truth masks.

The DF2023 training dataset comprises:

  • 100K forged images produced by removal (inpainting) operations
  • 200K images produced by enhancement modifications
  • 300K copy-move manipulated images
  • 400K spliced images

 

=== Naming convention ===

The naming convention of DF2023 encodes information about the applied manipulations. Each image name has the following form:

COCO_DF_0123456789_NNNNNNNN.{EXT} (e.g. COCO_DF_E000G40117_00200620.jpg)

After the identifier of the image data source ("COCO") and the self-reference to the Digital Forensics ("DF") dataset, 10 characters (positions 0-9) encode the manipulation. Position 0 gives the manipulation type; positions 1-9 describe the manipulations applied to the donor image patch:

  • Position 0: the manipulation type, one of copy-move, splicing, removal or enhancement ([C,S,R,E]).
  • Positions 1, 2, 7 and 8 (resample, flip, noise and brightness): a binary value indicating whether this manipulation was applied to the donor image patch.
  • Position 3 (rotate): a value of 0-3 indicating a rotation by 0, 90, 180 or 270 degrees.
  • Position 4: whether BoxBlur (B) or GaussianBlur (G) was used.
  • Position 5: the blurring radius. A value of 0 indicates that no blurring was applied.
  • Position 6: which of the Python-PIL contrast filters EDGE_ENHANCE, EDGE_ENHANCE_MORE, SHARPEN, UnsharpMask or ImageEnhance (values 1-5) was applied. A value of 0 indicates that none of them was applied.
  • Position 9: the JPEG compression factor modulo 10. A value of 0 indicates that no JPEG compression was applied.

The 8 characters NNNNNNNN in the image name template are a running number of the images.
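The naming convention above can be decoded programmatically. The following is a minimal Python sketch, not part of the dataset distribution: the names `parse_df2023_name` and `DF2023Name` are hypothetical, and the filter names are written with underscores on the assumption that they refer to the Pillow `ImageFilter` constants. The description does not state which character occupies position 4 when no blurring was applied, so the sketch accepts any alphanumeric character there and reports no blur whenever the radius at position 5 is 0.

```python
import re
from typing import NamedTuple, Optional

# Lookup tables taken from the naming convention described above.
MANIPULATION_TYPES = {"C": "copy-move", "S": "splicing", "R": "removal", "E": "enhancement"}
BLUR_KINDS = {"B": "BoxBlur", "G": "GaussianBlur"}
CONTRAST_FILTERS = {
    0: None,
    1: "EDGE_ENHANCE",
    2: "EDGE_ENHANCE_MORE",
    3: "SHARPEN",
    4: "UnsharpMask",
    5: "ImageEnhance",
}

# One capture group per position 0-9, then the running number and extension.
# Position 4 is matched leniently (assumption: its value is unspecified
# when no blurring was applied).
NAME_PATTERN = re.compile(
    r"^COCO_DF_([CSRE])([01])([01])([0-3])([A-Za-z0-9])(\d)([0-5])([01])([01])(\d)"
    r"_(\d{8})\.(\w+)$"
)


class DF2023Name(NamedTuple):
    manipulation: str
    resample: bool
    flip: bool
    rotation_degrees: int
    blur: Optional[str]
    blur_radius: int
    contrast_filter: Optional[str]
    noise: bool
    brightness: bool
    jpeg_factor_mod_10: int
    running_number: int
    extension: str


def parse_df2023_name(filename: str) -> DF2023Name:
    """Decode a DF2023 image file name into its manipulation fields."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a DF2023 image name: {filename!r}")
    (mtype, resample, flip, rotate, blur_kind, blur_radius,
     contrast, noise, brightness, jpeg, number, ext) = match.groups()
    radius = int(blur_radius)
    return DF2023Name(
        manipulation=MANIPULATION_TYPES[mtype],
        resample=resample == "1",
        flip=flip == "1",
        rotation_degrees=int(rotate) * 90,
        # A radius of 0 means no blurring, regardless of position 4.
        blur=BLUR_KINDS.get(blur_kind) if radius > 0 else None,
        blur_radius=radius,
        contrast_filter=CONTRAST_FILTERS[int(contrast)],
        noise=noise == "1",
        brightness=brightness == "1",
        jpeg_factor_mod_10=int(jpeg),
        running_number=int(number),
        extension=ext,
    )
```

Applied to the example name `COCO_DF_E000G40117_00200620.jpg`, this sketch reports an enhancement manipulation with GaussianBlur of radius 4, noise and brightness applied, no rotation, and a JPEG factor digit of 7.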

 

=== Terms of Use / Licence ===

The DF2023 dataset is based on the MS COCO dataset. Therefore, the rules for using the images from MS COCO also apply to DF2023:

Images

The COCO Consortium does not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. The users of the images accept full responsibility for the use of the dataset, including but not limited to the use of any copies of copyrighted images that they may create from the dataset.

Notes

This work was co-funded by the European Union, Project 101083573 (GADMO).

Please cite: Fischinger, D. and Boyer, M. (2023). DF2023: The Digital Forensics 2023 Dataset for Image Forgery Detection. IMVIP, 25th Irish Machine Vision and Image Processing Conference.

Files

DF2023_train.zip

Files (15.0 GB)

Size      Checksum
14.9 GB   md5:fec048d1f4eb9be7d733ecfe2403d89c
75.2 MB   md5:9fdd81a6007d944c422d75ee52d870c1
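The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal Python sketch using only the standard library; the helper names `md5_of_file` and `verify_md5` are hypothetical, and which checksum belongs to which archive has to be taken from the file listing itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value such as 'md5:fec0...'."""
    # Accept the checksum with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected_hex
```

Reading in chunks matters here because the larger archive is 14.9 GB and would not fit comfortably in memory if read in one call.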

Additional details

References

  • Fischinger, D. and Boyer, M. (2023). DF2023: The Digital Forensics 2023 Dataset for Image Forgery Detection. IMVIP, 25th Irish Machine Vision and Image Processing Conference.
  • Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, L. (2014). Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV).