# AgeSynth Synthetic Aging Dataset

This repository contains the data accompanying the paper:

> **Identity-Preserving Aging and De-Aging of Faces in the StyleGAN Latent Space**  
> Luis S. Luevano, Pavel Korshunov, Sébastien Marcel  
> *IJCB 2025*  
> [`https://www.idiap.ch/paper/agesynth/`](https://www.idiap.ch/paper/agesynth/)

The dataset builds on auxiliary synthetic ID seeds produced with the **Synthetics-Disco** project:

> **Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion**  
> David Geissbühler, Hatef Otroshi Shahreza, Sébastien Marcel  
> *ICML 2025*  
> [`https://www.idiap.ch/paper/synthetics-disco/`](https://www.idiap.ch/paper/synthetics-disco/)

AgeSynth provides *identity-preserving* aging and de-aging trajectories for 20k synthetic identities (two subsets of 10k each) generated with the Langevin algorithm of Synthetics-Disco.

To synthesize age-modified images yourself or reproduce the AgeSynth dataset from scratch, please use the open-source code released with the paper on our project website: [`https://www.idiap.ch/paper/agesynth/`](https://www.idiap.ch/paper/agesynth/).

---

## 1. Folder layout

```
agesynth/
│ README.md               ← **you are here**
│
├── langevin-10k-iresnet50/   ← 10k synthetic IDs generated with iResNet-50 FR backbone
│   ├── samples.h5            ← Reference latent vectors & metadata in HDF5 format
│   ├── agesynth.h5           ← Aged latents for the 10k iResNet-50 identities
│   └── command.yml           ← YAML file with full CLI parameters used by *synthetics-disco* to generate this subset
│
└── langevin-10k-edgeface-s/  ← 10k synthetic IDs generated with EdgeFace-S FR backbone
    ├── samples.h5            ← Reference latent vectors & metadata in HDF5 format
    ├── agesynth.h5           ← Aged latents for the 10k EdgeFace-S identities
    └── command.yml           ← Generation parameters for the EdgeFace-S subset
```

Each `samples.h5` file contains the **reference latent vector** for every identity produced with the Langevin procedure, before any age editing.  
The aged latents are provided in the **per-folder `agesynth.h5` files**.
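
As a quick sanity check after downloading, the layout above can be verified with a few lines of Python. This is a minimal sketch: the helper name `missing_files` is hypothetical, and it only checks the file names listed in the folder tree.

```python
from pathlib import Path

# Subset folders and per-folder files, as listed in the folder layout above.
SUBSETS = ("langevin-10k-iresnet50", "langevin-10k-edgeface-s")
FILES = ("samples.h5", "agesynth.h5", "command.yml")


def missing_files(root):
    """Return 'subset/file' strings for every expected file absent under root."""
    root = Path(root)
    return [
        f"{subset}/{name}"
        for subset in SUBSETS
        for name in FILES
        if not (root / subset / name).is_file()
    ]
```

Calling `missing_files("agesynth")` on a complete download should return an empty list.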

Each subset folder also includes a **`command.yml`** file that captures the exact command-line parameters passed to the [`synthetics-disco`](https://github.com/idiap/synthetics-disco) package. The YAML file stores two command blocks:

* `generate-database` – arguments for creating the initial StyleGAN2 image database and embeddings.
* `create-references` – Langevin diffusion parameters used to sample identity-preserving reference latents.

These files allow you to reproduce the synthetic IDs end-to-end or modify the configuration for your own experiments.
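
The sketch below shows one way to inspect a `command.yml` file. It assumes that the two command blocks are top-level keys of the YAML document and that PyYAML is installed. The helper names are hypothetical, and the parameter names inside each block are not documented here, so the code only checks that both blocks are present.

```python
# Block names taken from the list above; assumed to be top-level YAML keys.
EXPECTED_BLOCKS = ("generate-database", "create-references")


def missing_blocks(config):
    """Return the expected command blocks that are absent from a parsed config."""
    return [block for block in EXPECTED_BLOCKS if block not in config]


def load_command_file(path):
    """Parse command.yml and fail early if a command block is missing."""
    import yaml  # third-party (PyYAML); imported lazily

    with open(path, "r", encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    missing = missing_blocks(config)
    if missing:
        raise KeyError(f"command blocks not found in {path}: {missing}")
    return config
```

For example, `load_command_file("langevin-10k-iresnet50/command.yml")["create-references"]` would return the stored Langevin parameters under these assumptions.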

---

## 2. Data format

All files use the [HDF5](https://www.hdfgroup.org/solutions/hdf5/) container format and can be opened with `h5py`, `PyTables`, or similar libraries.

### 2.1 `samples.h5`

Schema (per identity):

```
identity_id/                  (HDF5 group)
└── reference/
    ├── embedding : float32[512]   # Normalised FR embedding (same backbone that guided Langevin)
    └── w_latent  : float32[512]   # W-space latent (StyleGAN2)
```

Global attributes:

* `generator`  – StyleGAN2 checkpoint SHA or path
* `backbone`   – Face Recognition backbone used for Langevin (`iresnet50` or `edgeface-s`)
* `n_identities`
* `version`
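
A minimal reading sketch with `h5py`, assuming the schema and global attributes listed above. The function names are hypothetical, and the attribute names are taken from this README rather than verified against the files.

```python
def read_reference(h5_file, identity_id):
    """Return (embedding, w_latent) for one identity.

    `h5_file` is an open h5py.File and `identity_id` a top-level group name.
    """
    ref = h5_file[identity_id]["reference"]
    # Indexing with `[()]` reads the whole dataset into memory.
    return ref["embedding"][()], ref["w_latent"][()]


def load_samples(path, limit=5):
    """Print the global attributes and read the first `limit` identities."""
    import h5py  # third-party; imported lazily

    with h5py.File(path, "r") as f:
        for name in ("generator", "backbone", "n_identities", "version"):
            print(name, "=", f.attrs.get(name))  # None if the attribute is absent
        identity_ids = sorted(f.keys())[:limit]
        return {i: read_reference(f, i) for i in identity_ids}
```

`load_samples("langevin-10k-iresnet50/samples.h5")` would then return a dictionary mapping identity IDs to `(embedding, w_latent)` pairs.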

### 2.2 `agesynth.h5`

Each `agesynth.h5` file (one per backbone folder) stores the **aging / de-aging trajectories** created with the method described in the AgeSynth paper.

Hierarchy:

```
<sample_id>/                             # e.g. "000123"
└── <floating_point_step>/               # e.g. "-0.54"; constant scalar step in the latent space
    └── <stylegan_scalar>/               # scalar applied to reference vector along age direction
        ├── w_latent         : float32[512]  # Edited W-space vector
        └── w_latent_age_svr : float32       # Estimated age (years) from SVR
```

* **`sample_id`** – Unique string identifying the synthetic identity (matches the keys in `samples.h5`).
* **`floating_point_step`** – Constant step size used to move the latent.
* **`stylegan_scalar`** – Exact scalar by which the reference latent is shifted along the learned age direction in W-space.
* **`w_latent`** – Resulting 512-D W-space latent vector.
* **`w_latent_age_svr`** – Predicted age (in years) of the latent, obtained with the Support Vector Regression (SVR) model.
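
The hierarchy above can be traversed as in the following sketch. It assumes that the step and scalar group names parse as floats (as in the `"-0.54"` example) and that the age is stored as a scalar dataset. The function names are hypothetical.

```python
def list_trajectory(sample_group):
    """List (step, scalar) group-name pairs for one identity, ordered by scalar.

    `sample_group` is an h5py.Group such as f["000123"]; any nested mapping
    with the same layout also works. Assumes group names parse as floats.
    """
    pairs = [
        (step, scalar)
        for step in sample_group.keys()
        for scalar in sample_group[step].keys()
    ]
    return sorted(pairs, key=lambda pair: float(pair[1]))


def read_trajectory(path, sample_id):
    """Return a list of (scalar, w_latent, age) tuples for one identity."""
    import h5py  # third-party; imported lazily

    trajectory = []
    with h5py.File(path, "r") as f:
        sample = f[sample_id]
        for step, scalar in list_trajectory(sample):
            node = sample[step][scalar]
            w_latent = node["w_latent"][()]            # edited 512-D W-space vector
            age = float(node["w_latent_age_svr"][()])  # SVR age estimate in years
            trajectory.append((float(scalar), w_latent, age))
    return trajectory
```

Sorting by the scalar puts the edits in order from the most negative to the most positive shift along the age direction, which is convenient for plotting the estimated age against the scalar.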

---

## 3. Citation

If you use **AgeSynth** or the underlying **Synthetics-Disco** identities in your research, please cite **both** works:

```bibtex
@inproceedings{luevano2025identity,
  title={Identity-Preserving Aging and De-Aging of Faces in the StyleGAN Latent Space},
  author={Luevano, Luis S. and Korshunov, Pavel and Marcel, S{\'e}bastien},
  booktitle={International Joint Conference on Biometrics (IJCB)},
  year={2025}
}

@article{geissbuhler2024synthetic,
  title={Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion},
  author={Geissb{\"u}hler, David and Shahreza, Hatef Otroshi and Marcel, S{\'e}bastien},
  journal={arXiv preprint arXiv:2405.00228},
  year={2024}
}
```

---

## 4. License & ethical usage

The dataset is released for **non-commercial research and educational purposes only** under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

---

