Published December 10, 2025
Version v4
Software
Open
Artifacts for: Identifying Provenance of Generative Text-to-Image Models
Description
Model Provenance Artifacts
This repository contains code for the core methodology of our paper:
1. Generating images with 8 base models and one fine-tuned model.
2. Computing S2-CLIP features for the generated images.
3. Calculating model distances based on the extracted features.
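As an illustration of step 3, the sketch below computes pairwise distances between models from their per-image features. It is a minimal stand-in under stated assumptions: each model is represented by a list of feature vectors (one per generated image), and the distance is the cosine distance between the models' mean feature vectors. This metric, the function names, and the toy vectors are hypothetical and are not necessarily what `run_provenance.sh` computes.

```python
import math
from itertools import combinations


def mean_vector(features):
    """Average a list of equal-length feature vectors (one per image)."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_model_distances(features_by_model):
    """Map each unordered model pair (names sorted) to the cosine distance of their mean features.

    Hypothetical stand-in metric, not necessarily the paper's distance.
    """
    means = {name: mean_vector(feats) for name, feats in features_by_model.items()}
    return {
        (a, b): cosine_distance(means[a], means[b])
        for a, b in combinations(sorted(means), 2)
    }


# Toy 2-D "features" for three models (hypothetical values).
feats = {
    "SD2.1": [[1.0, 0.0], [0.9, 0.1]],
    "FT-SD2.1": [[0.95, 0.05], [0.85, 0.15]],
    "FLUX": [[0.0, 1.0], [0.1, 0.9]],
}
dists = pairwise_model_distances(feats)
```

With these toy features, FT-SD2.1 lands much closer to SD2.1 than FLUX does, which is the kind of signal a provenance check looks for. Averaging discards the spread within each model's images; a distribution-level distance would keep that information at extra cost.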
Dataset
We provide 3000 images for each of the 8 base models (SD2.1, DeepFloyd, SDXL, Kolors, PixArt-Sigma, Hunyuan, FLUX, and Sana) and for one fine-tuned model (FT-SD2.1, fine-tuned on 100k images from the PCB dataset).
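With nine models at 3000 images each, a complete copy of the dataset holds 27,000 images. The sketch below counts images per model folder to spot an incomplete download or extraction. It assumes a hypothetical layout with one subfolder per model under a common root and PNG/JPEG/WebP files; adjust the suffixes and root to match the actual archive.

```python
from pathlib import Path

# Hypothetical set of image extensions; adjust to the archive's actual format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def count_images_per_model(root):
    """Count image files in each immediate subfolder of `root`.

    Assumes (hypothetically) one subfolder per model, e.g. root/SD2.1/.
    """
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1 for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def incomplete_models(counts, expected=3000):
    """Return the model folders whose image count differs from `expected`."""
    return {name: n for name, n in counts.items() if n != expected}
```

For example, `incomplete_models(count_images_per_model("images"))` returns an empty dict when every model folder under the (hypothetical) `images` root holds exactly 3000 images.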
Installation
Use Python `3.10.12` and install the dependencies with `pip3 install -r requirements_provenance.txt`.
Also install s2wrapper: `pip3 install git+https://github.com/bfshi/scaling_on_scales.git`
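A quick way to confirm the environment before running anything is sketched below. It treats the pinned version strictly (an exact match on 3.10.12) and only checks that the `s2wrapper` module can be found on the import path; the function names are hypothetical helpers, not part of the repository.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 10, 12)


def python_version_ok(actual=None, required=REQUIRED_PYTHON):
    """True if (major, minor, micro) exactly matches the pinned version."""
    if actual is None:
        actual = tuple(sys.version_info[:3])
    return tuple(actual) == tuple(required)


def missing_modules(names=("s2wrapper",)):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    print("Python", version, "OK" if python_version_ok() else "(expected 3.10.12)")
    print("Missing modules:", missing_modules() or "none")
```

A looser check comparing only `(3, 10)` would also accept other 3.10 patch releases, at the cost of drifting from the pinned setup.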
Running Provenance
1. `sh compute-s2-clip-features-for-folders.sh`
2. `sh run_provenance.sh`
3. `python3 generate-figure.py`
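If you prefer a single entry point, the three provenance steps can be chained from Python so that a failure in one step stops the run before later steps consume stale outputs. This is a hypothetical convenience wrapper, assuming the commands are run from the directory that contains the scripts; the injectable `runner` argument exists only to make it testable.

```python
import subprocess

# The three provenance steps, in order (run from the scripts' directory).
PROVENANCE_STEPS = [
    ["sh", "compute-s2-clip-features-for-folders.sh"],
    ["sh", "run_provenance.sh"],
    ["python3", "generate-figure.py"],
]


def run_pipeline(steps=PROVENANCE_STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"step failed ({result.returncode}): {' '.join(cmd)}")
        completed.append(cmd)
    return completed
```

Calling `run_pipeline()` with no arguments runs the real scripts through `subprocess.run`.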
Miscellaneous
Generating Images
We also include scripts to generate images from all diffusion models in the `generate-images-scripts` subfolder.
Fine-tuning
We use Hugging Face's diffusers fine-tuning script to fine-tune SD2.1 and Kohya's sd-scripts to fine-tune FLUX. Refer to the following for setting up fine-tuning:
* https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py
* https://github.com/kohya-ss/sd-scripts
The two datasets we used for fine-tuning are:
* PCB: https://huggingface.co/datasets/bghira/photo-concept-bucket
* LAION-Aesthetics: https://laion.ai/blog/laion-aesthetics/
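For the SD2.1 case, the diffusers script is typically started with `accelerate launch`. The helper below assembles such a command line using flags that `train_text_to_image.py` accepts (`--pretrained_model_name_or_path`, `--train_data_dir`, `--resolution`, `--train_batch_size`, `--max_train_steps`, `--learning_rate`, `--mixed_precision`, `--output_dir`). Every path and hyperparameter value here is a hypothetical placeholder, not the configuration used to train FT-SD2.1, and the PCB images are assumed to be prepared as a local training folder.

```python
# All paths and hyperparameter values below are hypothetical placeholders,
# not the settings used for FT-SD2.1.
def build_sd21_finetune_command(
    train_data_dir,
    output_dir,
    base_model="stabilityai/stable-diffusion-2-1",
    resolution=768,
    batch_size=1,
    max_train_steps=10000,
    learning_rate=1e-5,
):
    """Return an argv list that launches diffusers' train_text_to_image.py via accelerate."""
    return [
        "accelerate", "launch", "train_text_to_image.py",
        f"--pretrained_model_name_or_path={base_model}",
        f"--train_data_dir={train_data_dir}",
        f"--resolution={resolution}",
        f"--train_batch_size={batch_size}",
        f"--max_train_steps={max_train_steps}",
        f"--learning_rate={learning_rate}",
        "--mixed_precision=fp16",
        f"--output_dir={output_dir}",
    ]
```

Pass the returned list to `subprocess.run(..., check=True)` from a directory that contains the training script.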
Files
| Name | Size | MD5 |
|---|---|---|
| model-provenance.zip | 34.0 GB | a12df0e3c392a6f754ac5039e44dd518 |
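Because the archive is 34.0 GB, it is worth checking the download against the published MD5 before extracting. The sketch below hashes the file in 1 MiB chunks so it never loads the whole archive into memory; it assumes the file was saved as `model-provenance.zip` in the working directory.

```python
import hashlib

EXPECTED_MD5 = "a12df0e3c392a6f754ac5039e44dd518"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path="model-provenance.zip", expected=EXPECTED_MD5):
    """True if the downloaded archive matches the published checksum."""
    return md5_of_file(path) == expected
```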
Additional details
Dates
- Available: 2025-12-10