Synthetic Building Energy Consumption Dataset – Local Calendar Week Sampler (MODERATE Project)
Description
This dataset contains 50 synthetic years of 2-hourly building energy and indoor environment data, generated from approximately 3 years of real monitored data using the Local Calendar Week Sampler method developed within the MODERATE project (Horizon Europe GA 101069834).
The synthetic generation approach recombines true historical weekly profiles into statistically plausible synthetic ones, preserving seasonal patterns, weekday/weekend variation, and distributional characteristics of the original data, while providing anonymization through resampling, donor diversification, and a percentile-based source-recognizability guardrail.
Each synthetic year covers a full calendar year at 2-hour temporal resolution, with the following variables:
| Column | Unit | Description |
|---|---|---|
| timestamp | - | Date and time (2h intervals) |
| el. Energy | kWh | Electrical energy consumption |
| th. Energy | kWh | Thermal energy consumption |
| CO2 | ppm | Indoor CO₂ concentration |
| Temperature | °C | Indoor air temperature |
| WZ warm_water_energy | kWh | Domestic hot water energy |
| ext. Solar Irradiance | W/m² | External solar irradiance |
| ext. Temperature | °C | External air temperature |
| synthetic_year_id | - | Identifier of the synthetic year (001–050) |
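As a quick sanity check after download, the rows can be grouped by `synthetic_year_id` and the energy columns summed per year. The sketch below uses only the Python standard library. It assumes a comma-delimited file whose header spells the column names exactly as in the table above; the helper name `summarize_years` and the sample values are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in the table above; the exact header spelling in the
# CSV file is an assumption.
YEAR_COL = "synthetic_year_id"
ENERGY_COLS = ["el. Energy", "th. Energy", "WZ warm_water_energy"]


def summarize_years(csv_file):
    """Return {year_id: {"rows": n, "<energy column>": total_kwh, ...}}."""
    summary = defaultdict(lambda: {"rows": 0, **{c: 0.0 for c in ENERGY_COLS}})
    for row in csv.DictReader(csv_file):
        entry = summary[row[YEAR_COL]]
        entry["rows"] += 1
        for col in ENERGY_COLS:
            value = row[col].strip()
            if value:  # skip empty cells rather than guessing a fill value
                entry[col] += float(value)
    return dict(summary)


# Tiny in-memory stand-in for the real file (values are invented).
sample = io.StringIO(
    "timestamp,el. Energy,th. Energy,WZ warm_water_energy,synthetic_year_id\n"
    "2001-01-01 00:00,1.5,4.0,0.5,001\n"
    "2001-01-01 02:00,2.5,3.0,0.5,001\n"
    "2001-01-01 00:00,1.0,5.0,,002\n"
)
result = summarize_years(sample)
print(result["001"]["rows"], result["001"]["el. Energy"])
```

For the real file, pass a handle opened with `open("local_calendar_representative_year.csv", newline="")`. If a synthetic year is a complete non-leap year at 2-hour resolution, its row count should be 12 × 365 = 4380.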
Generation Method
The Local Calendar Week Sampler operates as follows:
1. Historical weeks are validated and flagged for gaps (>3h) or extreme values (>95th percentile).
2. For each target week, a candidate pool of temporally aligned historical weeks is assembled.
3. A primary donor (donor A) provides context variables (CO₂, temperature); a secondary daily donor (donor B) provides energy shape variation.
4. The synthetic profile is constructed as a scaled convex combination of donor profiles, anchored to a candidate-pool median baseline.
5. Weekly totals are rescaled to match a randomly selected week from the candidate pool.
6. A cosine transition layer is applied at week boundaries for temperature and CO₂ to avoid discontinuities.
7. Data is resampled to 2-hour resolution (energy summed, other variables averaged).
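Steps 6 and 7 can be illustrated with a small sketch. This is not the project's implementation. It assumes a raised-cosine ramp that spreads the jump at a week boundary over `n` samples on each side, and block-wise aggregation over a fixed number of samples per 2-hour block. The function names, the window length, and the 30-minute native interval used in the example are all assumptions.

```python
import math


def cosine_weight(t):
    """Raised-cosine ramp: 0 at t = 0, 1 at t = 1."""
    return 0.5 * (1.0 - math.cos(math.pi * t))


def smooth_boundary(prev_week, next_week, n):
    """Spread the jump at a week boundary over n samples on each side."""
    if n > min(len(prev_week), len(next_week)):
        raise ValueError("n exceeds the length of a week")
    jump = next_week[0] - prev_week[-1]
    prev_out, next_out = list(prev_week), list(next_week)
    for k in range(n):
        # k = 0 is the sample next to the boundary and gets the full half-jump;
        # the correction fades with distance from the boundary.
        w = cosine_weight(1.0 - k / n)
        prev_out[-1 - k] += 0.5 * jump * w
        next_out[k] -= 0.5 * jump * w
    return prev_out, next_out


def aggregate(values, samples_per_block, how):
    """Aggregate consecutive blocks: 'sum' for energy, 'mean' for the rest."""
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    out = []
    for start in range(0, len(values) - samples_per_block + 1, samples_per_block):
        total = sum(values[start:start + samples_per_block])
        out.append(total if how == "sum" else total / samples_per_block)
    return out


# Indoor temperature jumps from 20 °C to 22 °C at a week boundary.
prev_s, next_s = smooth_boundary([20.0] * 4, [22.0] * 4, n=2)
print(prev_s, next_s)

# Assumed 30-minute samples, so 4 samples form one 2-hour block.
energy_2h = aggregate([1, 1, 1, 1, 2, 2, 2, 2], 4, "sum")
temp_2h = aggregate([1, 1, 1, 1, 2, 2, 2, 2], 4, "mean")
print(energy_2h, temp_2h)
```

Summing energy keeps the total kWh unchanged under resampling, whereas averaging is the natural choice for state variables such as temperature and CO₂.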
Anonymization measures include donor energy contribution capping (10–35%), CO₂ anomaly screening, resampling, and a source-recognizability guardrail that resamples weeks where the original donor remains the closest shape match.
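The guardrail idea can be sketched as follows. The code is an illustration under stated assumptions, not the MODERATE implementation: profiles are compared after scaling to unit mean, distance is plain RMSE, and the blend weight is read as donor B's share of the combined profile. The names `shape`, `blend`, and `source_is_nearest`, as well as the candidate IDs and values, are hypothetical.

```python
import math


def shape(profile):
    """Scale a profile to unit mean so that only its shape is compared."""
    mean = sum(profile) / len(profile)
    return [v / mean for v in profile] if mean else list(profile)


def rmse(a, b):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def blend(donor_a, donor_b, weight_b):
    """Convex combination (1 - w) * A + w * B with 0 <= w <= 1."""
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must lie in [0, 1]")
    return [(1.0 - weight_b) * a + weight_b * b for a, b in zip(donor_a, donor_b)]


def source_is_nearest(synthetic, candidates, source_id):
    """True if the source donor is the closest shape match in the pool."""
    syn = shape(synthetic)
    nearest = min(candidates, key=lambda cid: rmse(syn, shape(candidates[cid])))
    return nearest == source_id


# Invented candidate pool; "w01" plays the role of the source donor.
candidates = {
    "w01": [1.0, 2.0, 3.0, 2.0],
    "w02": [3.0, 2.0, 1.0, 2.0],
    "w03": [2.0, 2.0, 2.0, 2.0],
}
light_mix = blend(candidates["w01"], candidates["w02"], 0.10)
strong_mix = blend(candidates["w01"], candidates["w02"], 0.35)

flag_light = source_is_nearest(light_mix, candidates, "w01")
flag_strong = source_is_nearest(strong_mix, candidates, "w01")
print(flag_light, flag_strong)
```

In this toy pool the lightly mixed week is still closest to its source and would be flagged for resampling, while the more strongly mixed week is closer to another candidate and would pass.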
Quality Metrics
Statistical fidelity of the synthetic data relative to historical data was assessed using shape-based metrics across all 50 generated years:
| Metric | Value |
|---|---|
| Mean nearest weekly NRMSE | 2.008 |
| Median nearest weekly NRMSE | 1.605 |
| Weekly close-match rate (p01) | 0.177 |
| Weekly close-match rate (p05) | 0.329 |
| Source-is-nearest-weekly rate | 0.095 |
| Mean source shape correlation | 0.487 |
| Source-is-top-shape-match rate | 0.201 |
| Dominant candidate share | 0.025 |
| Normalized candidate usage entropy | 0.982 |
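The high normalized candidate usage entropy (0.982) and the low dominant candidate share (0.025) indicate well-distributed sampling with no dominant source weeks. The source-is-nearest-weekly rate of 9.5% is low: for fewer than one in ten synthetic weeks is the source donor also the nearest historical week, which suggests that the source is obscured in most cases. However, approximately 40% of synthetic days have a historical near-duplicate below the empirical non-self NRMSE threshold. This is a known limitation, and no formal anonymization guarantees are provided.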
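For readers unfamiliar with the two usage metrics, the sketch below shows one common way to compute a dominant share and a normalized Shannon entropy from a list of donor IDs. The exact definitions used for the table are not given here, so the normalization by the logarithm of the number of distinct donors, the function name `usage_stats`, and the example IDs are assumptions.

```python
import math
from collections import Counter


def usage_stats(donor_ids):
    """Return (dominant share, normalized Shannon entropy) of donor usage."""
    counts = Counter(donor_ids)
    total = sum(counts.values())
    shares = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in shares)
    # Assumed normalization: divide by the entropy of a uniform distribution
    # over the distinct donors that were actually used.
    max_entropy = math.log(len(shares)) if len(shares) > 1 else 1.0
    return max(shares), entropy / max_entropy


# One donor used twice as often as the others: entropy drops below 1.
skewed_share, skewed_entropy = usage_stats(["a", "a", "b", "c"])
# Perfectly even usage: entropy reaches its maximum.
even_share, even_entropy = usage_stats(["a", "b", "c", "d"])
print(skewed_share, round(skewed_entropy, 3))
print(even_share, round(even_entropy, 3))
```

A normalized entropy close to 1 together with a small dominant share means that usage is spread almost evenly across the candidate pool.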
Seasonal daily profiles and distributions of electrical energy, thermal energy, and domestic hot water energy were validated visually against historical data.
Limitations
No formal anonymization guarantees are provided.
Thermal energy exhibits near-zero values in summer (near-heating-off conditions), which may amplify relative NRMSE in that season.
The dataset does not include the original historical data.
Related Resources
MODERATE project: moderate-project.eu
Grant Agreement No.: 101069834
MODERATE open-source tools: github.com/MODERATE-Project
MODERATE platform: moderate.cloud
Files
| Name | Size | Checksum |
|---|---|---|
| local_calendar_representative_year.csv | 41.5 MB | md5:30ac468b336b4e59311185ffcacf966c |