Artifacts for: Identifying Provenance of Generative Text-to-Image Models

Anonymous, Anonymous

doi:10.5281/zenodo.17870201

Published December 10, 2025 | Version v4

Software | Open access


Model Provenance Artifacts

This repository contains code for the core methodology of our paper:

1. Generating images from 8 base models and one fine-tuned model.

2. Computing S2-CLIP features for the generated images.

3. Calculating model distances based on the extracted features.
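Steps 2 and 3 amount to comparing feature distributions between models. The sketch below assumes that each model is summarized by the mean of its already extracted S2-CLIP embeddings and that models are compared by cosine distance between those means. The paper's actual distance measure may differ, and `mean_embedding`, `cosine_distance`, and `attribute` are hypothetical names, not functions from this repository.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_embedding(features: Sequence[Sequence[float]]) -> Vector:
    """Average per-image feature vectors into one model-level vector."""
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    sums = [0.0] * dim
    for vec in features:
        if len(vec) != dim:
            raise ValueError("all feature vectors must have the same length")
        for i, x in enumerate(vec):
            sums[i] += x
    return [s / len(features) for s in sums]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def attribute(query: Sequence[Sequence[float]],
              base_models: Dict[str, Sequence[Sequence[float]]]) -> Dict[str, float]:
    """Distance from a query model to every base model (smaller = more likely source)."""
    q = mean_embedding(query)
    return {name: cosine_distance(q, mean_embedding(feats))
            for name, feats in base_models.items()}


# Toy 3-d vectors standing in for real S2-CLIP embeddings (hypothetical data).
bases = {
    "SD2.1": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "SDXL": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
fine_tuned = [[0.95, 0.1, 0.05], [0.85, 0.05, 0.1]]
distances = attribute(fine_tuned, bases)
print(min(distances, key=distances.get))  # prints SD2.1
```

Averaging first keeps the comparison cheap (one vector per model), at the cost of ignoring the spread of each feature distribution; a distribution-level distance would capture more but needs more images per model.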

Dataset

We provide 3,000 images for each of the 8 base models (SD2.1, DeepFloyd, SDXL, Kolors, PixArt-Sigma, Hunyuan, FLUX, and Sana) and for one fine-tuned model (FT-SD2.1, i.e. SD2.1 fine-tuned on 100k images from the PCB dataset).
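To confirm that an extracted copy of the dataset is complete, a short script can count images per model folder. This is a sketch assuming a hypothetical layout with one subfolder per model (e.g. `<root>/SDXL/*.png`); the actual folder names and file formats inside `model-provenance.zip` may differ.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Union

# Assumption: images use common raster formats, one folder per model.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}
EXPECTED_PER_MODEL = 3000  # per the dataset description above


def count_images(paths: Iterable[Union[str, Path]]) -> Dict[str, int]:
    """Count image files, grouped by the name of their parent folder."""
    counts: Counter = Counter()
    for path in map(Path, paths):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.name] += 1
    return dict(counts)


def scan_dataset(root: Union[str, Path]) -> Dict[str, int]:
    """Recursively scan `root` and count images per model folder."""
    return count_images(p for p in Path(root).rglob("*") if p.is_file())


def incomplete_models(counts: Dict[str, int],
                      expected: int = EXPECTED_PER_MODEL) -> Dict[str, int]:
    """Map each model with too few images to how many are missing.

    Folders that contain no images at all never appear in `counts`,
    so compare the keys against the expected model list as well.
    """
    return {model: expected - n for model, n in counts.items() if n < expected}
```

For example, `incomplete_models(scan_dataset("path/to/images"))` returns an empty dict when every model folder it finds holds at least 3,000 images.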

Installation

Please use Python `3.10.12` and install the dependencies with `pip3 install -r requirements_provenance.txt`.

Please also install s2wrapper: `pip3 install git+https://github.com/bfshi/scaling_on_scales.git`
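Before running the pipeline, a quick check can confirm the interpreter version and that `s2wrapper` is importable. This helper is hypothetical and not part of the repository; it assumes the package above installs a top-level module named `s2wrapper`.

```python
import importlib.util
import platform
import sys

# Version taken from the installation notes above.
EXPECTED_PYTHON = (3, 10, 12)


def python_matches(expected=EXPECTED_PYTHON, actual=None) -> bool:
    """True if the interpreter (or the given `actual` tuple) matches exactly."""
    actual = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    return actual == tuple(expected)


def check_environment() -> dict:
    """Report the interpreter version and whether s2wrapper can be found."""
    return {
        "python": platform.python_version(),
        "python_ok": python_matches(),
        # find_spec returns None when a top-level module cannot be located.
        "s2wrapper_installed": importlib.util.find_spec("s2wrapper") is not None,
    }
```

Printing `check_environment()` gives a one-line summary; `find_spec` only locates the module without importing it, so the check stays fast even when heavy dependencies are installed.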

Running Provenance

1. `sh compute-s2-clip-features-for-folders.sh`

2. `sh run_provenance.sh`

3. `python3 generate-figure.py`
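The three commands are meant to run in order; presumably the figure is built from the computed distances, which in turn depend on the extracted features. A minimal Python driver, hypothetical and not part of the repository, assuming it is launched from the folder that contains the scripts:

```python
import subprocess
from typing import List, Sequence

# The three commands from "Running Provenance", in order.
STEPS: List[List[str]] = [
    ["sh", "compute-s2-clip-features-for-folders.sh"],
    ["sh", "run_provenance.sh"],
    ["python3", "generate-figure.py"],
]


def run_pipeline(steps: Sequence[Sequence[str]] = STEPS,
                 dry_run: bool = False) -> List[List[str]]:
    """Run each step in order, stopping at the first failure.

    With dry_run=True nothing is executed; the commands are only
    printed and returned.
    """
    executed = []
    for cmd in steps:
        cmd = list(cmd)
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed step never feeds stale outputs into the next one.
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed
```

Calling `run_pipeline(dry_run=True)` lists the commands without executing them; `run_pipeline()` runs them for real.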

Miscellaneous

Generating Images

We also include scripts to generate images from all diffusion models in the `generate-images-scripts` subfolder.

Fine-tuning

We use Hugging Face's fine-tuning script for SD2.1 and Kohya's sd-scripts for fine-tuning FLUX. Please refer to the following two GitHub links to set up fine-tuning:

* https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py

* https://github.com/kohya-ss/sd-scripts

The two datasets we used for fine-tuning are:

* PCB: https://huggingface.co/datasets/bghira/photo-concept-bucket

* LAION-Aesthetics: https://laion.ai/blog/laion-aesthetics/

Files

Name	Size	MD5
model-provenance.zip	34.0 GB	a12df0e3c392a6f754ac5039e44dd518
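Given the archive's size, it is worth checking the download against the published MD5 before unzipping (on Linux, `md5sum model-provenance.zip` does the same). A minimal sketch using only the standard library; the function names are illustrative, not part of the repository:

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator, Union

# Checksum published for model-provenance.zip.
EXPECTED_MD5 = "a12df0e3c392a6f754ac5039e44dd518"


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a stream of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Union[str, Path], chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in 1 MiB pieces so a 34 GB archive never sits in memory."""
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


def verify_download(path: Union[str, Path], expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected digest (case-insensitive)."""
    return md5_hex(read_chunks(path)) == expected.lower()
```

`verify_download("model-provenance.zip")` returns `True` for an intact download.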

Additional details

Available: 2025-12-10