Dataset and Code for 'Feature Extraction-based Clustering Selection Methodology to Identify Representative Buildings for Scalable Energy Simulations'

Published August 27, 2025 | Version v1

Dataset Open

This repository contains the anonymized dataset and reproducible code accompanying the paper:

“Feature extraction-based clustering selection methodology to identify representative buildings for scalable energy simulations.”

The dataset includes:

epc_metadata.csv: EPC-based building attributes (energy performance value, weighted U-value, air leakage, heat recovery)
dh_merged_kwh_per_m2.csv: hourly district heating consumption, normalized per m²
outdoor_temp_2023.csv: hourly outdoor temperature data (2023)
features_epc_only_standardized.csv: standardized feature table used for clustering
data_dictionary.xlsx: description of all variables
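As a rough illustration of how a standardized feature table such as features_epc_only_standardized.csv could be derived from raw attributes, the sketch below z-scores selected numeric columns using only the Python standard library. It is a minimal sketch under stated assumptions, not the pipeline shipped in the bundle: the column name u_value is hypothetical (the real names are given in data_dictionary.xlsx), and the use of the population standard deviation is an assumption, since the paper's code may scale differently.

```python
import csv
import statistics


def load_rows(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def standardize(rows, columns):
    """Return copies of `rows` with `columns` z-scored.

    Assumption: population standard deviation (ddof = 0) is used;
    the bundle's own scaler may differ.
    """
    out = [dict(row) for row in rows]
    for col in columns:
        values = [float(row[col]) for row in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        for row, value in zip(out, values):
            # A constant column carries no information; map it to 0.
            row[col] = (value - mean) / std if std > 0 else 0.0
    return out


# Hypothetical usage; "u_value" is an illustrative column name only:
# rows = load_rows("epc_metadata.csv")
# scaled = standardize(rows, ["u_value"])
```

Standardizing before clustering keeps attributes with large numeric ranges from dominating distance computations, which is presumably why a standardized table is provided alongside the raw EPC metadata.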

The code provides a minimal pipeline to reproduce the main results of the paper. Its expected outputs (e.g., validation_metrics.csv, agreement_metrics.csv) reproduce the main tables in the manuscript.

Anonymization: all identifiers (school names, locations, addresses) have been removed; only numerical features are included.

Licenses:

Related resources:
The GitHub repository with the same reproducible pipeline is available at:
https://github.com/hatefh/zenodo_code_bundle_epc_clustering_paper

Files

Name	Size
zenodo_code_bundle_v1.0.0.zip (md5: f97519d6d53cc78627dd1f000bcbb934)	1.3 MB
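The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_expected are illustrative and not part of the bundle, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib

# Checksum as listed for zenodo_code_bundle_v1.0.0.zip in this record.
EXPECTED_MD5 = "f97519d6d53cc78627dd1f000bcbb934"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected`."""
    return md5_of_file(path) == expected.lower()


# Hypothetical usage, assuming the archive is in the working directory:
# print(matches_expected("zenodo_code_bundle_v1.0.0.zip"))
```

A mismatch usually indicates an incomplete or corrupted download; MD5 is adequate for that integrity check but is not a security guarantee.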