PRISM-36K: A Benchmark Dataset for AI-Generated Image Attribution

Ricco, Emanuele; Onofri, Elia; Cima, Lorenzo; Cresci, Stefano; Di Pietro, Roberto

doi:10.5281/zenodo.20065919

Published May 7, 2026 | Version 1.0.1

Dataset Open

1. King Abdullah University of Science and Technology
2. National Research Council
3. King Abdullah University of Science and Technology Department of Computer Science

PRISM-36K is a benchmark dataset of 36,000 AI-generated images for model-attribution research, that is, the task of identifying which generative model produced a given image.
It accompanies the paper "PRISM: Phase-enhanced Radial-based Image Signature Mapping for AI-Generated Image Attribution" (Ricco, Onofri, Cima, Cresci, Di Pietro; arXiv:2509.15270).

What is in the dataset

The dataset contains 36,000 PNG images at 512 × 512 pixels, balanced across six text-to-image generators with 6,000 images per model:

DALL-E 2 (Ramesh et al., 2022) — closed, accessed via OpenAI API
FuseDream (Liu et al., 2021) — GAN + CLIP guidance
PixArt-α (Chen et al., 2024) — diffusion transformer
SANA (Xie et al., 2024) — diffusion transformer
Stable Diffusion 1.4 (Rombach et al., 2022) — latent diffusion
VQGAN-CLIP (Esser et al., 2021) — GAN + CLIP guidance

Each generator produces 150 images per prompt over a fixed set of 40 author-written English prompts (20 short + 20 long, paired by topic).
All images are stored in lossless PNG format to preserve frequency-domain artefacts that are critical to spectral attribution methods.
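The counts above correspond to a fully crossed design: 6 generators × 40 prompts × 150 images per prompt. The short sketch below just spells out that arithmetic. The model labels are copied from the folder names shown under "Repository layout"; the variable names are illustrative and are not taken from the dataset's own scripts.

```python
from collections import Counter
from itertools import product

# Generator labels, spelled as in the images/ subfolders of the repository layout.
MODELS = ["DALLE-2", "FuseDream", "PixArt-alpha", "SANA",
          "StableDiffusion-1.4", "VQGAN-CLIP"]
N_PROMPTS = 40            # 20 short + 20 long prompts
IMAGES_PER_PROMPT = 150   # generations per prompt and per model

# Enumerate every (model, prompt id, iteration) triple of the crossed design.
grid = list(product(MODELS,
                    range(1, N_PROMPTS + 1),
                    range(1, IMAGES_PER_PROMPT + 1)))
per_model = Counter(model for model, _, _ in grid)

print(len(grid))          # 36000 images in total
print(per_model["SANA"])  # 6000 images for each generator
```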

What makes this dataset useful

Prompt-matched generations. The same 40 prompts are issued to every generator, so cross-model differences reflect generator-specific signatures rather than prompt drift.
Architectural diversity. The six generators span CLIP-guided GANs, latent diffusion, and diffusion transformers, with both open-weight and closed-API systems represented.
Reproducible splits. The 100 random prompt-level train/test splits used in the paper are shipped as splits/splits_100.csv, and one canonical "average split" (splits/average_split.json) is provided for direct reproduction of all figures and tables.
Lossless integrity. Every image ships with a SHA-256 hash in checksums/SHA256SUMS (BSD-style, compatible with sha256sum -c) so users can verify their downloads.
Rich metadata. A per-image manifest (metadata/images.csv) and a prompt manifest (metadata/prompts.csv) support filtering by model, prompt length, prompt pair, or specific generation iteration.
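Where sha256sum is not available, the integrity check mentioned under "Lossless integrity" can be approximated in Python. This is a minimal sketch, assuming the usual BSD-style line format `SHA256 (<path>) = <hex digest>` and assuming the paths in SHA256SUMS are relative to the directory passed as `root`; neither detail is confirmed by this record, and the function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Assumed BSD-style line, e.g. "SHA256 (some/file.png) = <64 hex characters>".
BSD_LINE = re.compile(r"^SHA256 \((?P<path>.+)\) = (?P<digest>[0-9a-fA-F]{64})$")

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(sums_file, root):
    """Return the listed paths that are missing or whose digest differs."""
    failures = []
    for line in Path(sums_file).read_text(encoding="utf-8").splitlines():
        m = BSD_LINE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognised lines
        target = Path(root) / m["path"]
        if not target.is_file() or sha256_of(target) != m["digest"].lower():
            failures.append(m["path"])
    return failures

# Example call (both paths are assumptions about where the archive was unpacked):
# failures = verify("checksums/SHA256SUMS", ".")
```

An empty list means every listed file was found and matched its recorded digest.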

Repository layout

PRISM-36K/
├── README.md
├── LICENSE.txt
├── CITATION.cff
├── CHANGELOG.md
├── metadata/
│   ├── prompts.csv
│   └── images.csv
├── splits/
│   ├── average_split.json
│   └── splits_100.csv
├── images/
│   ├── DALLE-2/
│   ├── FuseDream/
│   ├── PixArt-alpha/
│   ├── SANA/
│   ├── StableDiffusion-1.4/
│   └── VQGAN-CLIP/
└── checksums/
    └── SHA256SUMS

Image filename convention: <ModelName>_<promptid>_<iter>.png, with promptid ∈ 1..40 and iter ∈ 1..150.
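The convention above can be parsed by splitting from the right, since model names such as StableDiffusion-1.4 contain hyphens and dots. The following is a sketch only: it assumes model names contain no underscore, the `ImageName` type and `parse_image_name` function are hypothetical helpers, and the example filename is constructed from the stated pattern rather than copied from the archive.

```python
from typing import NamedTuple

class ImageName(NamedTuple):
    model: str
    prompt_id: int
    iteration: int

def parse_image_name(filename):
    """Split '<ModelName>_<promptid>_<iter>.png' into its three fields."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() != "png":
        raise ValueError(f"not a PNG filename: {filename!r}")
    # Split from the right so the model name may contain hyphens or dots.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    model, prompt_id, iteration = parts[0], int(parts[1]), int(parts[2])
    # Ranges stated in the filename convention: promptid in 1..40, iter in 1..150.
    if not (1 <= prompt_id <= 40 and 1 <= iteration <= 150):
        raise ValueError(f"prompt id or iteration out of range: {filename!r}")
    return ImageName(model, prompt_id, iteration)

print(parse_image_name("StableDiffusion-1.4_7_150.png"))
# ImageName(model='StableDiffusion-1.4', prompt_id=7, iteration=150)
```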

Intended uses

Training and evaluating model-attribution classifiers for AI-generated images.
Benchmarking real vs. fake detectors in a controlled multi-source setting (real images must be sourced from a complementary dataset; see Limitations).
Studying frequency-domain and spectral fingerprints of generative models.
Research on content provenance, generative-AI accountability, and related forensic problems.
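For training and evaluating attribution classifiers, the shipped split files should be used when reproducing the paper. The sketch below only illustrates the general idea of a prompt-level split, read here as: prompt ids are partitioned first, and every image inherits the side of its prompt, so no prompt appears in both train and test. The 20% test fraction, the seed, the function names, and the record layout are assumptions for illustration, not the settings used by the authors.

```python
import random

def prompt_level_split(prompt_ids, test_fraction=0.2, seed=0):
    """Randomly partition prompt ids into disjoint train and test sets."""
    ids = sorted(prompt_ids)
    rng = random.Random(seed)  # local RNG keeps the split reproducible
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return set(ids[n_test:]), set(ids[:n_test])

def assign_records(records, train_ids, test_ids):
    """Route (model, prompt_id, iteration) records by their prompt id."""
    train = [r for r in records if r[1] in train_ids]
    test = [r for r in records if r[1] in test_ids]
    return train, test

train_ids, test_ids = prompt_level_split(range(1, 41))
print(len(train_ids), len(test_ids))  # 32 8
```

Because the split is made on prompt ids rather than on individual images, a classifier evaluated this way is tested on prompts it never saw during training.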

Companion resources

Paper: arXiv:2509.15270
Image-generation scripts (the code used to produce these images): github.com/emarich/PRISM-36K
PRISM classifier and evaluation code: released upon full paper acceptance.

Licensing

Dataset (images and metadata): Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Note on DALL-E 2 images. The 6,000 images in images/DALLE-2/ were generated via OpenAI's paid API and are subject to OpenAI's usage policies in addition to CC BY-NC-SA 4.0. Users intending to use these images beyond academic research should consult OpenAI's current terms of service.

Note on NVIDIA-SANA images. The 6,000 images in images/SANA/ are subject to the terms of the Apache License 2.0 in addition to CC BY-NC-SA 4.0.

Citing PRISM-36K

If you use this dataset, please cite both the paper and this Zenodo record. BibTeX entries and a CFF citation file are provided in the repository (README.md, CITATION.cff).

Limitations

Closed-set scope. The dataset covers six specific generators; it is not designed to support open-set attribution to unseen models.
English-only prompts. All prompts were authored by the dataset creators; no multilingual or in-the-wild prompts are included.
Synthetic only. No real photographs are included; for real vs. fake benchmarks, real images must be sourced from a complementary dataset.
No identifiable individuals. Prompts were authored to elicit generic scenes (objects, animals, landscapes); the dataset contains no images of identifiable real persons by design.

Files (14.2 GB)

Name	MD5	Size
_teaser.png	3aa38b0c3ffbd1e1749e52fab5c64dfb	2.3 MB
average_split.json	78ece11704120acec24282ef5de26297	346 Bytes
changelog.md	02c7978972665e90c7c56c14ca97b7d7	2.1 kB
CITATION.cff	0526526a53eaa92b2b2e51c34da1da93	2.8 kB
generate_checksums.py	a4837fe174805875e8ad8b9adc709b24	10.5 kB
images.csv	e1360791bad911db4c808d9a2e5777d7	3.7 MB
images.zip	4ab5525b06cecaa652a2eff2fa0805d2	14.2 GB
LICENSE	fb5d051e53001fdff7fec0f368f47190	20.8 kB
prompts.csv	e4c3b0a0ca641329ce666eef6b715ec1	2.8 kB
README.md	1d13ff65b5b7044d949a4226e23ad5a7	8.6 kB
SHA256SUMS	75a77e2a0182bd59cc01d723333e87b7	3.4 MB
splits_100.csv	f56e03405d8d84241f91987296c848a3	38.1 kB

Additional details

Is supplement to: Preprint: arXiv:2509.15270 (arXiv)
Is supplemented by: Software: https://github.com/emarich/PRISM-36K (URL)
Is version of: Dataset: 10.5281/zenodo.20038953 (DOI)

King Abdullah University of Science and Technology
Center of Excellence on Generative AI 5940

Available: 2026-05-06 (Zenodo publication date)
Collected: 2025-04-10 (images were generated)

Repository URL: https://github.com/emarich/PRISM-36K
Programming language: Python
Development Status: Active

A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, "Hierarchical text-conditional image generation with CLIP latents," arXiv preprint arXiv:2204.06125, 2022.
X. Liu et al., "FuseDream: Training-free text-to-image generation with improved CLIP+GAN space optimization," arXiv preprint arXiv:2112.01573, 2021.
J. Chen et al., "PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis," arXiv preprint arXiv:2310.00426, 2023.
E. Xie et al., "SANA: Efficient high-resolution image synthesis with linear diffusion transformers," arXiv preprint arXiv:2410.10629, 2024.
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 10684–10695.
M. Li, R. Xu, S. Wang, L. Zhou, X. Lin, C. Zhu, M. Zeng, H. Ji, and S.-F. Chang, "CLIP-Event: Connecting text and images with event structures," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 16420–16429, doi:10.1109/CVPR52688.2022.01593.
