Dataset and Code for 'Feature Extraction-based Clustering Selection Methodology to Identify Representative Buildings for Scalable Energy Simulations'
Description
This repository contains the anonymized dataset and reproducible code accompanying the paper:
“Feature extraction-based clustering selection methodology to identify representative buildings for scalable energy simulations.”
The dataset includes:
- epc_metadata.csv: EPC-based building attributes (energy performance value, weighted U-value, air leakage, heat recovery)
- dh_merged_kwh_per_m2.csv: hourly district heating consumption, normalized per m² (kWh/m²)
- outdoor_temp_2023.csv: hourly outdoor temperature data for 2023
- features_epc_only_standardized.csv: standardized feature table used for clustering
- data_dictionary.xlsx: description of all variables
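The standardized feature table can be read as a per-column z-score of the EPC attributes. The following is a minimal sketch, assuming z-score standardization with the population standard deviation and using hypothetical column names (the real ones are in data_dictionary.xlsx); the repository's own preprocessing may differ in these details. The toy rows are made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical column names for illustration only; see data_dictionary.xlsx
# for the variable names actually used in the dataset.
FEATURES = ["energy_performance", "u_value_weighted", "air_leakage", "heat_recovery"]


def standardize(rows, columns):
    """Z-score each column: (x - mean) / std.

    Assumes the population standard deviation (statistics.pstdev); a constant
    column is only centred (divided by 1.0) instead of dividing by zero.
    """
    params = {}
    for col in columns:
        values = [float(r[col]) for r in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        params[col] = (mean, std if std > 0 else 1.0)
    return [
        {col: (float(r[col]) - params[col][0]) / params[col][1] for col in columns}
        for r in rows
    ]


# Toy rows standing in for epc_metadata.csv (values are made up).
sample = io.StringIO(
    "energy_performance,u_value_weighted,air_leakage,heat_recovery\n"
    "90,0.40,0.6,1\n"
    "120,0.55,1.2,0\n"
    "150,0.70,1.8,1\n"
)
rows = list(csv.DictReader(sample))
z = standardize(rows, FEATURES)
print([round(r["energy_performance"], 3) for r in z])  # → [-1.225, 0.0, 1.225]
```

Standardizing puts attributes with very different units (kWh/m², W/m²K, air leakage) on a common scale, so no single attribute dominates the distance computations used in clustering.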
The code provides a minimal pipeline to reproduce the main results of the paper:
- Load and preprocess the data
- Extract EPC-based features
- Run clustering (K-Medoids, agglomerative hierarchical clustering, Gaussian mixture models (GMM))
- Compute validation metrics and agreement indices
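Of the three algorithms, K-Medoids is the one whose cluster centres are actual observations, so each medoid is a real building, which makes medoids natural candidates for representative buildings. The sketch below is a from-scratch alternating (Voronoi-iteration) K-Medoids on Euclidean distance using only the standard library. The initialization (the first k points) and the stopping rule are illustrative assumptions; the repository may instead use a library implementation with a different initialization.

```python
import math


def assign(points, medoids):
    """Label each point with the index of its nearest medoid (Euclidean)."""
    return [
        min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, max_iter=100):
    """Alternating K-Medoids; returns (medoid indices, cluster labels)."""
    medoids = list(range(k))  # hypothetical init: the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep an empty cluster's previous medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(math.dist(points[m], points[j]) for j in members))
            )
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    # Final assignment so labels always match the returned medoids.
    return medoids, assign(points, medoids)


# Toy standardized feature vectors forming two obvious groups.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
medoids, labels = k_medoids(pts, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Because the update step only considers existing members, the returned medoid indices point directly at rows of the input table, i.e. at specific buildings.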
Expected outputs (e.g., validation_metrics.csv, agreement_metrics.csv) reproduce the main tables in the manuscript.
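Agreement indices compare the partitions that different algorithms produce for the same buildings. The specific indices used in the manuscript are not listed here; as an illustration, this is a minimal sketch of one common choice, the Adjusted Rand Index (ARI), computed from the contingency table of two label vectors. scikit-learn's `sklearn.metrics.adjusted_rand_score` can serve as a cross-check.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pair counts from the contingency table cells and its row/column marginals.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with permuted labels: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
# Crossed groupings: agreement below chance.
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # → -0.5
```

ARI is invariant to label permutation, which matters here because K-Medoids, agglomerative clustering and GMM number their clusters independently.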
Anonymization: all identifiers (school names, locations, addresses) have been removed; only numerical features are included.
Licenses:
- Code: MIT License
- Data: Creative Commons Attribution 4.0 International (CC BY 4.0)
Related resources:
The GitHub repository with the same reproducible pipeline is available at:
https://github.com/hatefh/zenodo_code_bundle_epc_clustering_paper
Files
| Name | Size | MD5 |
|---|---|---|
| zenodo_code_bundle_v1.0.0.zip | 1.3 MB | f97519d6d53cc78627dd1f000bcbb934 |
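After downloading, the archive can be checked against the MD5 checksum listed above. A minimal sketch using Python's standard `hashlib` module, assuming the archive sits in the current working directory under its original name:

```python
import hashlib
from pathlib import Path

# Checksum listed for the archive on this record.
EXPECTED_MD5 = "f97519d6d53cc78627dd1f000bcbb934"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hex MD5 digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5sum(path):
    """Hex MD5 digest of the file at `path`."""
    with open(path, "rb") as fh:
        return md5_of_stream(fh)


if __name__ == "__main__":
    archive = Path("zenodo_code_bundle_v1.0.0.zip")  # assumed location
    if archive.exists():
        print("checksum OK" if md5sum(archive) == EXPECTED_MD5 else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download.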