Published December 10, 2025 | Version v4
Software Open

Artifacts for: Identifying Provenance of Generative Text-to-Image Models

Authors/Creators

Description

Model Provenance Artifacts

This repository contains the code for the core methodology of our paper:

1. Generating images from 8 base models and one fine-tuned model.
2. Computing S2-CLIP features for the generated images.
3. Calculating model distances based on the extracted features.
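
To make step 3 concrete, here is a minimal, dependency-free sketch of one way a distance between models could be computed from per-image feature vectors. It is an illustration only: the choice of metric (cosine distance between per-model mean features) and the function names `mean_vector`, `cosine_distance`, and `pairwise_model_distances` are assumptions, not the method implemented in `run_provenance.sh` or described in the paper.

```python
import math


def mean_vector(features):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_model_distances(features_by_model):
    """Map each unordered pair of model names to a distance.

    ASSUMPTION: the distance is the cosine distance between the mean
    feature vectors of the two models. This is a placeholder metric
    chosen for illustration, not the one used in the paper.
    """
    means = {name: mean_vector(feats) for name, feats in features_by_model.items()}
    names = sorted(means)
    return {
        (a, b): cosine_distance(means[a], means[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy usage with made-up 2-D features (real S2-CLIP features are much larger):
# pairwise_model_distances({"model_a": [[1.0, 0.0]], "model_b": [[0.0, 1.0]]})
```

With real data, each value in `features_by_model` would hold the feature vectors extracted in step 2 for one model.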

Dataset


We provide 3,000 images for each of the 8 base models (SD2.1, DeepFloyd, SDXL, Kolors, PixArt-Sigma, Hunyuan, FLUX, and Sana) and for one fine-tuned model (FT-SD2.1, fine-tuned with the 100k PCB dataset).
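
After extracting the archive, it can be useful to confirm that every model folder holds the expected number of images. The sketch below assumes one subfolder per model under a common root directory; that layout, the list of image suffixes, and the function names are hypothetical, so adjust them to the actual contents of the archive.

```python
from pathlib import Path

# ASSUMPTION: images use one of these extensions; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def count_images_per_folder(root):
    """Return {subfolder name: number of image files found recursively}."""
    counts = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1
            for f in sub.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def folders_with_unexpected_count(counts, expected=3000):
    """Return only the folders whose image count differs from `expected`."""
    return {name: n for name, n in counts.items() if n != expected}


# Example (hypothetical path):
# counts = count_images_per_folder("path/to/extracted/images")
# print(folders_with_unexpected_count(counts))
```

An empty dictionary from `folders_with_unexpected_count` means every folder matched the expected count.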

Installation


Please use Python `3.10.12` and install the dependencies with `pip3 install -r requirements_provenance.txt`.

Please also install s2wrapper: `pip3 install git+https://github.com/bfshi/scaling_on_scales.git`

Running Provenance


1. `sh compute-s2-clip-features-for-folders.sh`
2. `sh run_provenance.sh`
3. `python3 generate-figure.py`
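
The three steps above can also be chained from a small Python driver that stops at the first failing step. This is a convenience sketch rather than part of the repository: the `run_pipeline` helper and its `dry_run` flag are hypothetical, and the commands are assumed to be run from the directory that contains the scripts.

```python
import subprocess

# Commands copied from the steps above, in the order they are listed.
STEPS = [
    ["sh", "compute-s2-clip-features-for-folders.sh"],
    ["sh", "run_provenance.sh"],
    ["python3", "generate-figure.py"],
]


def run_pipeline(steps=STEPS, dry_run=False):
    """Run each command in order and return the commands that were handled.

    With dry_run=True the commands are only printed. Otherwise a non-zero
    exit status raises subprocess.CalledProcessError, so later steps never
    run on top of a failed earlier step.
    """
    handled = []
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
        handled.append(cmd)
    return handled


# Example: preview the commands without executing them.
# run_pipeline(dry_run=True)
```

Calling `run_pipeline()` without arguments executes the scripts for real, so a dry run first is a cheap way to check the order.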

Miscellaneous


Generating Images

We also include scripts to generate images from all diffusion models in the `generate-images-scripts` subfolder.

Fine-tuning

We fine-tune SD2.1 with Hugging Face's `train_text_to_image.py` script and FLUX with Kohya's sd-scripts. Please refer to the following two GitHub repositories to set up fine-tuning:

* https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py
* https://github.com/kohya-ss/sd-scripts

The two datasets we used for fine-tuning are:

* PCB: https://huggingface.co/datasets/bghira/photo-concept-bucket
* LAION-Aesthetics: https://laion.ai/blog/laion-aesthetics/

Files

model-provenance.zip (34.0 GB)
md5:a12df0e3c392a6f754ac5039e44dd518
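
Because the archive is large, it is worth checking the download against the MD5 checksum listed above before extracting it. The following sketch reads the file in chunks so the full 34.0 GB never has to fit in memory; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the file is assumed to have been saved locally as `model-provenance.zip`.

```python
import hashlib

# Checksum listed for model-provenance.zip on this record.
EXPECTED_MD5 = "a12df0e3c392a6f754ac5039e44dd518"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


# Example (assumes the archive is in the current directory):
# print(matches_expected("model-provenance.zip"))
```

A `False` result usually points to an incomplete or corrupted download, in which case the file should be fetched again.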

Additional details

Dates

Available
2025-12-10