Published May 13, 2026 | Version 1.0.0
Dataset Open

Synthetic Building Energy Consumption Dataset – Local Calendar Week Sampler (MODERATE Project)

  • 1. Eindhoven University of Technology

Contributors

Project leader:

  • 1. Accademia Europea

Description

This dataset contains 50 synthetic years of building energy and indoor environment data at 2-hour resolution, generated from approximately 3 years of real monitored data using the Local Calendar Week Sampler method developed within the MODERATE project (Horizon Europe GA 101069834).

The synthetic generation approach recombines true historical weekly profiles into statistically plausible synthetic ones, preserving seasonal patterns, weekday/weekend variation, and distributional characteristics of the original data, while providing anonymization through resampling, donor diversification, and a percentile-based source-recognizability guardrail.

Each synthetic year covers a full calendar year at 2-hour temporal resolution, with the following variables:

Column               | Unit | Description
timestamp            | -    | Date and time (2 h intervals)
el. Energy           | kWh  | Electrical energy consumption
th. Energy           | kWh  | Thermal energy consumption
CO2                  | ppm  | Indoor CO₂ concentration
Temperature          | °C   | Indoor air temperature
WZ warm_water_energy | kWh  | Domestic hot water energy
ext. Solar Irradiance| W/m² | External solar irradiance
ext. Temperature     | °C   | External air temperature
synthetic_year_id    | -    | Identifier of the synthetic year (001–050)
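
A minimal loading sketch for the table above, using only the Python standard library. It assumes a comma-separated file with exactly the column names listed and ISO-style timestamps; neither the delimiter nor the timestamp format is stated in this record, so both are assumptions. The sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical excerpt: invented values, reduced to three of the documented columns.
SAMPLE = """timestamp,el. Energy,th. Energy,synthetic_year_id
2021-01-01 00:00:00,1.2,3.4,001
2021-01-01 02:00:00,1.0,3.1,001
2021-01-01 00:00:00,0.9,2.8,002
2021-01-01 02:00:00,1.1,3.0,002
"""

def load_by_year(handle):
    """Group rows by synthetic_year_id, parsing timestamps and energy values."""
    years = defaultdict(list)
    for row in csv.DictReader(handle):
        years[row["synthetic_year_id"]].append({
            "timestamp": datetime.fromisoformat(row["timestamp"]),
            "el_kwh": float(row["el. Energy"]),
            "th_kwh": float(row["th. Energy"]),
        })
    return dict(years)

def has_two_hour_steps(rows):
    """Return True if consecutive timestamps are exactly 2 hours apart."""
    stamps = [r["timestamp"] for r in rows]
    return all(b - a == timedelta(hours=2) for a, b in zip(stamps, stamps[1:]))

years = load_by_year(io.StringIO(SAMPLE))
```

For the real file, `io.StringIO(SAMPLE)` would be replaced by `open("local_calendar_representative_year.csv", newline="")`, adjusting the parsing if the actual delimiter or timestamp format differs.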

Generation Method

The Local Calendar Week Sampler operates as follows:

1. Historical weeks are validated and flagged for gaps (>3h) or extreme values (>95th percentile). 

2. For each target week, a candidate pool of temporally aligned historical weeks is assembled.

3. A primary donor (donor A) provides context variables (CO₂, temperature); a secondary daily donor (donor B) provides energy shape variation.

4. The synthetic profile is constructed as a scaled convex combination of donor profiles, anchored to a candidate-pool median baseline.

5. Weekly totals are rescaled to match a randomly selected week from the candidate pool.

6. A cosine transition layer is applied at week boundaries for temperature and CO₂ to avoid discontinuities.

7. Data is resampled to 2-hour resolution (energy summed, other variables averaged).
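
Steps 4 to 7 can be sketched as follows. This is an illustrative reading of the description, not the MODERATE implementation: the exact mixing formula, the weights, the transition window, and the assumption that the pre-resampling data is hourly are not given in this record, and all function and parameter names are hypothetical.

```python
import math
from statistics import median

def median_baseline(pool):
    """Pointwise median across the candidate-pool profiles (assumed baseline)."""
    return [median(values) for values in zip(*pool)]

def anchored_mix(donor_a, donor_b, baseline, weight_b=0.25, scale=1.0):
    """Convex mix of two donors, applied as a scaled deviation from the baseline.

    Assumed form: baseline + scale * ((1 - w) * A + w * B - baseline).
    """
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must lie in [0, 1] for a convex combination")
    return [
        base + scale * ((1.0 - weight_b) * a + weight_b * b - base)
        for a, b, base in zip(donor_a, donor_b, baseline)
    ]

def rescale_total(profile, target_total):
    """Scale a profile so that its sum equals target_total."""
    total = sum(profile)
    if total == 0:
        return list(profile)  # nothing to scale; leave the profile unchanged
    factor = target_total / total
    return [v * factor for v in profile]

def cosine_blend(outgoing, incoming):
    """Cross-fade from `outgoing` to `incoming` with a raised-cosine weight."""
    n = len(outgoing)
    blended = []
    for i, (old, new) in enumerate(zip(outgoing, incoming)):
        w = 0.5 * (1.0 - math.cos(math.pi * i / (n - 1))) if n > 1 else 1.0
        blended.append((1.0 - w) * old + w * new)
    return blended

def to_two_hourly(hourly, how):
    """Aggregate pairs of hourly values: 'sum' for energy, 'mean' otherwise."""
    pairs = zip(hourly[0::2], hourly[1::2])
    if how == "sum":
        return [a + b for a, b in pairs]
    return [(a + b) / 2.0 for a, b in pairs]

# Toy example with three invented four-point "weeks".
pool = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0], [3.0, 4.0, 5.0, 6.0]]
baseline = median_baseline(pool)
mixed = anchored_mix(pool[0], pool[2], baseline, weight_b=0.25)
week = rescale_total(mixed, target_total=sum(pool[1]))
```

Summing energy while averaging temperature and CO₂ keeps energy totals conserved across the change of resolution, which is why the two variable types are aggregated differently.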

 

Anonymization measures include donor energy contribution capping (10–35%), CO₂ anomaly screening, resampling, and a source-recognizability guardrail that resamples weeks where the original donor remains the closest shape match.
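
The guardrail idea can be illustrated with a small sketch: compare the shape of a synthetic week against every historical week and flag it for resampling if its own donor is the closest match. The shape distance used here (RMSE between mean-normalized profiles) and the reading of the 10–35% cap as a clamp on a donor's share are assumptions; the record does not specify either, and all names are hypothetical.

```python
import math

def shape(profile):
    """Normalize a profile by its mean so that only the shape remains."""
    mean = sum(profile) / len(profile)
    if mean == 0:
        return [0.0 for _ in profile]
    return [v / mean for v in profile]

def shape_rmse(a, b):
    """RMSE between two mean-normalized profiles (assumed shape distance)."""
    sa, sb = shape(a), shape(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa))

def nearest_week(synthetic, history):
    """Return the id of the historical week closest in shape to `synthetic`."""
    return min(history, key=lambda week_id: shape_rmse(synthetic, history[week_id]))

def needs_resampling(synthetic, history, donor_id):
    """Flag the week if its own donor is still the closest shape match."""
    return nearest_week(synthetic, history) == donor_id

def clamp_share(share, low=0.10, high=0.35):
    """Clamp a donor's energy contribution share to [low, high] (assumed reading)."""
    return max(low, min(high, share))

# Invented toy data: three historical "weeks" and one synthetic week.
history = {"w1": [1.0, 2.0, 3.0, 4.0], "w2": [4.0, 3.0, 2.0, 1.0], "w3": [2.0, 2.0, 2.0, 2.0]}
synthetic = [1.1, 2.0, 3.1, 3.9]
flag = needs_resampling(synthetic, history, donor_id="w1")
```

In this toy case the synthetic week is nearly identical in shape to `w1`, so a sampler following this logic would draw a new donor combination for that week.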

Quality Metrics

Statistical fidelity of the synthetic data relative to historical data was assessed using shape-based metrics across all 50 generated years:

Metric                             | Value
Mean nearest weekly NRMSE          | 2.008
Median nearest weekly NRMSE        | 1.605
Weekly close-match rate (p01)      | 0.177
Weekly close-match rate (p05)      | 0.329
Source-is-nearest-weekly rate      | 0.095
Mean source shape correlation      | 0.487
Source-is-top-shape-match rate     | 0.201
Dominant candidate share           | 0.025
Normalized candidate usage entropy | 0.982

The high candidate usage entropy (0.982) indicates well-distributed sampling with no dominant source weeks. The source-is-nearest-weekly rate of 9.5% is low, which supports the effectiveness of the anonymization measures. However, approximately 40% of synthetic days have a historical near-duplicate below the empirical non-self NRMSE threshold. This is a known limitation, and no formal anonymization guarantees are provided.
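
The two usage metrics (dominant candidate share and normalized candidate usage entropy) can be computed along these lines. This sketch assumes Shannon entropy normalized by the logarithm of the number of distinct candidates actually drawn; the record does not state whether the normalization uses drawn or available candidates, and the function name is hypothetical.

```python
import math
from collections import Counter

def usage_stats(picks):
    """Return (dominant share, normalized entropy) for a list of drawn candidate ids."""
    counts = Counter(picks)
    total = sum(counts.values())
    shares = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in shares)
    k = len(counts)
    # Entropy is maximal (log k) when all k candidates are used equally often.
    normalized = entropy / math.log(k) if k > 1 else 0.0
    return max(shares), normalized

even_share, even_entropy = usage_stats(["a", "b", "c", "d"])
skew_share, skew_entropy = usage_stats(["a", "a", "a", "b"])
```

A normalized entropy near 1 together with a small dominant share, as reported in the table, corresponds to the evenly spread case rather than the skewed one.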

Seasonal daily profiles and distributions of electrical energy, thermal energy, and domestic hot water energy were validated visually against historical data.

Limitations

No formal anonymization guarantees are provided.

Thermal energy exhibits near-zero values in summer (near-heating-off conditions), which may amplify relative NRMSE in that season.

The dataset does not include the original historical data.
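
The summer amplification noted for thermal energy can be illustrated numerically. The sketch below assumes an NRMSE normalized by the mean of the reference series, which is only one of several common conventions; the record does not say which normalization was used, and the values are invented.

```python
import math

def nrmse_mean(reference, estimate):
    """RMSE divided by the mean of the reference series (assumed convention)."""
    n = len(reference)
    rmse = math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)
    mean = sum(reference) / n
    return rmse / mean if mean != 0 else float("inf")

# Same absolute error of 0.5 kWh per step in both seasons.
winter = nrmse_mean([50.0, 60.0, 55.0], [50.5, 59.5, 55.5])
summer = nrmse_mean([0.5, 1.0, 0.75], [1.0, 0.5, 1.25])
```

With an identical absolute error, the summer value is far larger simply because the reference mean is close to zero, so seasonal NRMSE figures for thermal energy should be read with that in mind.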

Related Resources

MODERATE project: moderate-project.eu

Grant Agreement No.: 101069834

MODERATE open-source tools: github.com/MODERATE-Project

MODERATE platform: moderate.cloud

Files

local_calendar_representative_year.csv (41.5 MB)
md5:30ac468b336b4e59311185ffcacf966c
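
After downloading, the file can be checked against the MD5 checksum listed above. A minimal sketch using Python's standard `hashlib`; the local path is an assumption and should point to wherever the file was saved.

```python
import hashlib

# Checksum as listed in this record's file section.
EXPECTED_MD5 = "30ac468b336b4e59311185ffcacf966c"

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_checksum("local_calendar_representative_year.csv")` should return `True` for an intact download.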

Additional details

Funding

European Commission
MODERATE - Marketable Open Data Solutions for Optimized Building-Related Energy Services (Grant Agreement No. 101069834)