Published May 31, 2026 | Version 0.1
Dataset Open

SynPop-DE: Synthetic population of 40 million German households using generative neural networks

  • 1. Potsdam Institute for Climate Impact Research

Description

This is the data repository for SynPop-DE (Synthetic Population of Germany). It accompanies the publication "SynPop-DE: Synthetic population of 40 million German households using generative neural networks".

Introduction: Household microdata combining socio-demographic, housing, income and expenditure attributes are a core resource for many studies in quantitative social science, such as modelling the household-level impacts of the energy transition. Yet no such data are openly available for Germany’s full population. SynPop-DE provides a synthetic population of 40,235,916 households and their 81,629,116 members in all 400 German districts, calibrated to the 2022 census, with 34 attributes per household. Synthetic households are generated by estimating the joint attribute distribution of the German Household Budget Survey with a two-stage machine learning architecture: an autoencoder first compresses the high-dimensional categorical data into a continuous latent space, and a generative adversarial network then learns to sample new records from this representation. These records are then aligned with census marginals for all German districts using iterative proportional updating to ensure spatial representativeness. Validation along three dimensions confirms that the model learns attribute relationships and that the synthetic households reproduce the statistical properties of the survey data (fidelity), support downstream analyses with accuracy comparable to the original survey (utility), and prevent disclosure of individual respondents (privacy). The dataset is openly available at https://synpop.de.
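To illustrate the census alignment step named above, the following is a minimal sketch of iterative proportional updating in its generic form. It is not taken from the SynPop-DE code base: the function name, the `incidence` layout, and the three toy households with their targets are assumptions made purely for illustration.

```python
import numpy as np


def iterative_proportional_updating(incidence, targets, max_iter=500, tol=1e-8):
    """Reweight households so that weighted column totals approach `targets`.

    incidence: (n_households, n_constraints) array. Entry [i, j] is the
               contribution of household i to constraint j (1 for a household
               type, or the number of members of a given person type).
    targets:   (n_constraints,) array of marginal totals to match.
    """
    incidence = np.asarray(incidence, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = np.ones(incidence.shape[0])
    for _ in range(max_iter):
        # Visit each constraint in turn and rescale the households that
        # contribute to it so that this constraint is met exactly.
        for j in range(incidence.shape[1]):
            contributes = incidence[:, j] > 0
            current = weights[contributes] @ incidence[contributes, j]
            if current > 0:
                weights[contributes] *= targets[j] / current
        # Stop once every constraint is matched within the tolerance.
        rel_error = np.abs(weights @ incidence - targets) / targets
        if rel_error.max() < tol:
            break
    return weights


# Hypothetical toy data: three sampled households, two household-level
# constraints (type A, type B) and two person-level constraints (P1, P2).
incidence = np.array([
    [1, 0, 1, 0],  # type A household with one P1 member
    [1, 0, 1, 1],  # type A household with one P1 and one P2 member
    [0, 1, 0, 2],  # type B household with two P2 members
])
targets = np.array([100, 50, 100, 160])

weights = iterative_proportional_updating(incidence, targets)
print(weights.round(3))
```

Each household's weight can be read as the number of times it would be replicated in a district. Matching household-level and person-level totals at the same time is what distinguishes this procedure from fitting household marginals alone.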

Data: The current version of the SynPop-DE data can always be found at https://synpop.de.

Code: The code to reproduce the data can be found here: https://gitlab.pik-potsdam.de/metab/synpopde

 

Files

regional_populations.zip

Files (17.5 GB)

Checksum Size
md5:d92d101e4633b7ea02d91ad825e72b50 8.4 GB
md5:56e7e30ccb942533c657af65d25f552e 9.1 GB
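
Given the size of the archives, it may be worth checking a download against the md5 values listed above. The sketch below uses only the Python standard library. The helper names are hypothetical, and the local path in the usage comment is an assumption, since the listing does not state which checksum belongs to which file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; try both listed checksums if unsure):
# matches_listed_checksum("regional_populations.zip",
#                         "md5:d92d101e4633b7ea02d91ad825e72b50")
```

A mismatch usually points to an incomplete or corrupted download, so the file should be fetched again before unpacking.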

Additional details

Related works

Is supplement to
Preprint: 10.31235/osf.io/zha8v_v1 (DOI)

Funding

Federal Ministry for Economic Affairs and Climate Action
03EI5248C

Software

Repository URL
https://gitlab.pik-potsdam.de/metab/synpopde
Development Status
Active