FLUXSynID: A Synthetic Face Dataset with Document and Live Images
Description
FLUXSynID is a high-resolution synthetic identity dataset containing 14,889 unique synthetic identities, each represented by one document-style image and three live-capture variants. Identities are generated using the FLUX.1 [dev] diffusion model, guided by user-defined identity attributes such as gender, age, region of origin, and various other identity features. The dataset is intended to support biometric research, including face recognition and morphing attack detection.
File Structure
Each identity has a dedicated folder, named with a 12-digit hexadecimal string (e.g., 000e23cdce23), that contains the following five files:
- 000e23cdce23_f.json — metadata including sampled identity attributes, prompt, generation seed, etc. (_f = female; _m = male; _nb = non-binary)
- 000e23cdce23_f_doc.png — document-style frontal image
- 000e23cdce23_f_live_0_e_d1.jpg — live image generated with LivePortrait (_e = expression and pose)
- 000e23cdce23_f_live_0_a_d1.jpg — live image via Arc2Face (_a = arc2face)
- 000e23cdce23_f_live_0_p_d1.jpg — live image via PuLID (_p = pulid)
Document, LivePortrait, and PuLID images are 1024×1024. Arc2Face images are 512×512 due to constraints of the original model.
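The naming scheme above can be decoded programmatically. The following is a minimal sketch, not part of the dataset tooling: the regular expression is inferred only from the example file names listed above, the helper name parse_name is hypothetical, and the trailing token (d1 in the examples) is captured but not interpreted because its meaning is not described here.

```python
import re

# Pattern inferred from the example names in this description (an assumption,
# not an official specification of the naming scheme).
_NAME_RE = re.compile(
    r"^(?P<identity>[0-9a-f]{12})_(?P<gender>f|m|nb)"
    r"(?:_(?P<kind>doc)|_live_(?P<index>\d+)_(?P<method>[eap])_(?P<suffix>\w+))?"
    r"\.(?P<ext>json|png|jpg)$"
)

# Letter codes as given in the file list: e, a, p.
_METHODS = {"e": "LivePortrait", "a": "Arc2Face", "p": "PuLID"}


def parse_name(filename):
    """Return a dict describing a dataset file name, or None if it does not match."""
    m = _NAME_RE.match(filename)
    if m is None:
        return None
    info = {"identity": m["identity"], "gender": m["gender"]}
    if m["ext"] == "json" and m["kind"] is None and m["method"] is None:
        info["role"] = "metadata"
    elif m["kind"] == "doc" and m["ext"] == "png":
        info["role"] = "document"
    elif m["method"] is not None and m["ext"] == "jpg":
        info["role"] = "live"
        info["method"] = _METHODS[m["method"]]
    else:
        return None  # combination not seen in the documented examples
    return info
```

For example, parse_name("000e23cdce23_f_live_0_a_d1.jpg") identifies a live image produced with Arc2Face for identity 000e23cdce23.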
Attribute Sampling and Prompting
The attributes/ directory contains all information about how identity attributes were sampled:
- A set of .txt files (e.g., ages.txt, eye_shape.txt, body_type.txt) — each lists the possible values for one attribute class, along with their respective sampling probabilities.
- file_probabilities.json — defines the inclusion probability for each attribute class (i.e., how likely a class such as "eye shape" is to be included in a given prompt).
- attribute_clashes.json — specifies rules for resolving semantically conflicting attributes. Each clash defines a primary attribute (to be kept) and secondary attributes (to be discarded when the clash occurs).
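The interplay of the three kinds of files can be illustrated with a short sketch. It is not the authors' sampling code: the in-memory structures below (a dict of weighted options, a dict of inclusion probabilities, and a list of clashes with "primary" and "secondary" keys) are hypothetical stand-ins for the file contents, whose exact formats are not described here, and clashes are modeled at the level of attribute classes, which may differ from the actual rules.

```python
import random


def sample_attributes(values, inclusion, clashes, rng):
    """Sample one value per included attribute class, then resolve clashes.

    values:    {class_name: [(value, weight), ...]}      (hypothetical layout)
    inclusion: {class_name: probability of inclusion}    (hypothetical layout)
    clashes:   [{"primary": class_name, "secondary": [class_name, ...]}, ...]
    """
    chosen = {}
    for attr_class, options in values.items():
        # Decide whether this attribute class appears in the prompt at all.
        if rng.random() < inclusion.get(attr_class, 1.0):
            names = [name for name, _ in options]
            weights = [weight for _, weight in options]
            chosen[attr_class] = rng.choices(names, weights=weights, k=1)[0]
    # Keep the primary attribute and drop its conflicting secondaries.
    for clash in clashes:
        if clash["primary"] in chosen:
            for secondary in clash["secondary"]:
                chosen.pop(secondary, None)
    return chosen
```

Passing a seeded random.Random instance as rng makes the sampling reproducible, which mirrors the idea of storing a generation seed in the per-identity metadata.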
Prompts are generated automatically from the selected attributes using the Qwen2.5 large language model and are then used to condition FLUX.1 [dev] during image generation.
Live Image Generation
Each synthetic identity has three live image-style variants:
- LivePortrait: expression/pose changes via keypoint-based retargeting
- Arc2Face: natural variation using identity embeddings (no prompt required)
- PuLID: identity-aware generation using prompt, embedding, and edge-conditioning with a customized FLUX.1 [dev] diffusion model
These approaches provide both controlled and naturalistic identity-consistent variation.
Filtering and Quality Control
Nine supplementary text files are included, each listing a filtered subset of identities. For instance, the file similarity_filtering_adaface_thr_0.333987832069397_fmr_0.0001.txt lists the identities retained after overly similar faces were filtered out using the AdaFace face recognition system (FRS) at the specified threshold and false match rate (FMR).
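A filtered subset can be applied when iterating over the dataset. The sketch below assumes that each list file contains one identity folder name per line; that layout is an assumption, so inspect the files before relying on it. The function names are hypothetical.

```python
from pathlib import Path


def load_retained_ids(list_path):
    """Read a filter list, assuming one identity ID per non-empty line."""
    ids = set()
    for line in Path(list_path).read_text(encoding="utf-8").splitlines():
        token = line.strip()
        if token:
            ids.add(token)
    return ids


def select_identity_dirs(dataset_root, retained):
    """Return the identity folders under dataset_root whose names are retained."""
    return sorted(
        p for p in Path(dataset_root).iterdir()
        if p.is_dir() and p.name in retained
    )
```

Keeping the retained IDs in a set makes the membership check constant-time, which matters little for a few thousand identities but keeps the loop simple.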
Usage and Licensing
This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to use, share, and adapt the dataset for non-commercial purposes, provided that appropriate credit is given.
The images in this dataset were generated using the FLUX.1 [dev] model by Black Forest Labs, which is made available under their Non-Commercial License. While this dataset does not include or distribute the model or its weights, the images were produced using that model.
Users are responsible for ensuring that their use of the images complies with the FLUX.1 [dev] license, including any restrictions it imposes.
Acknowledgments
The FLUXSynID dataset was developed under the EINSTEIN project. The EINSTEIN project is funded by the European Union (EU) under G.A. no. 101121280 and UKRI Funding Service under IFS reference 10093453. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect the views of the EU/Executive Agency or UKRI. Neither the EU nor the granting authority nor UKRI can be held responsible for them.
Files
- FLUXSynID.zip (11.7 GB), md5:af90cbb852c21c5ac3e5361f3885258d
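Given the archive size, it may be worth checking the download against the listed MD5 checksum before extracting. A minimal sketch, assuming the archive was saved locally as FLUXSynID.zip; the function name md5_of_file is hypothetical:

```python
import hashlib

# Checksum as listed for FLUXSynID.zip on this page.
EXPECTED_MD5 = "af90cbb852c21c5ac3e5361f3885258d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing md5_of_file("FLUXSynID.zip") with EXPECTED_MD5 indicates whether the download is intact. Reading in chunks avoids loading the full 11.7 GB file into memory.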
Additional details
Related works
- References
- arXiv:2407.03168 (arXiv)
- arXiv:2404.16022v2 (arXiv)
- arXiv:2403.11641v2 (arXiv)
- https://github.com/black-forest-labs/flux (Other)
- arXiv:2412.15115 (arXiv)
Software
- Repository URL: https://github.com/Raul2718/FLUXSynID
References
- Guo, J., Zhang, D., Liu, X., Zhong, Z., Zhang, Y., Wan, P., & Zhang, D. (2025). LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control. arXiv:2407.03168. https://arxiv.org/abs/2407.03168
- Guo, Z., Wu, Y., Chen, Z., Chen, L., Zhang, P., & He, Q. (2024). PuLID: Pure and Lightning ID Customization via Contrastive Alignment. arXiv:2404.16022. https://arxiv.org/abs/2404.16022
- Papantoniou, F. P., Lattas, A., Moschoglou, S., Deng, J., Kainz, B., & Zafeiriou, S. (2024). Arc2Face: A Foundation Model for ID-Consistent Human Faces. arXiv:2403.11641. https://arxiv.org/abs/2403.11641
- Qwen et al. (2025). Qwen2.5 Technical Report. arXiv:2412.15115. https://arxiv.org/abs/2412.15115
- Black Forest Labs. (2024). FLUX. GitHub. https://github.com/black-forest-labs/flux