Published March 1, 2025
| Version 7
Dataset
Open
GoiEner Smart Meter Dataset (Version 7)
Description
This dataset is the result of a multi-stage processing pipeline applied to raw SIMEL (Sistema de Medidas Eléctricas) files. The pipeline involves splitting the raw SIMEL files into user-specific files, generating intermediate raw consumption time series, and finally imputing missing consumption values. The files included in this dataset are the end results of several of the processing scripts. The scripts used to generate these files can be found on GitHub: https://github.com/quesadagranja/GoiEner-v7. Detailed descriptions of each file and their contents are provided below.
- imputed_goiener_v7.tar.zst
  - Content: This archive contains 28,807 CSV files, each representing an hourly consumption time series after the imputation process has been applied.
  - Columns (in each CSV file):
    • index: A timestamp indicating the date and time in the format “YYYY-MM-DD HH:MM:SS”.
    • fl: Indicates whether the timestamp falls within Daylight Saving Time (DST).
    • kWh: The energy consumption measured in kilowatt-hours.
    • imp: An imputation flag, where 0 indicates an original (non-imputed) value and any nonzero value indicates that the consumption value was imputed.
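As an illustration of the imputed file layout, the following minimal sketch reads one series and separates original from imputed samples. It assumes a comma-delimited file with a header row; the sample rows are invented for the example, and the `fl` column is kept as an opaque string because its encoding is not specified here.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; the values are invented for illustration only.
SAMPLE = """index,fl,kWh,imp
2021-01-01 00:00:00,0,0.125,0
2021-01-01 01:00:00,0,0.110,1
2021-01-01 02:00:00,0,0.098,0
"""

def read_imputed(handle):
    """Parse an imputed series into a list of dicts with typed values."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "index": datetime.strptime(rec["index"], "%Y-%m-%d %H:%M:%S"),
            "fl": rec["fl"],  # DST indicator, left unparsed (encoding assumed unknown)
            "kWh": float(rec["kWh"]),
            "imputed": float(rec["imp"]) != 0,  # any nonzero flag means imputed
        })
    return rows

rows = read_imputed(io.StringIO(SAMPLE))
original = [r for r in rows if not r["imputed"]]
```

For a real file, `read_imputed` could be given a handle from `open(path, newline="")` once the archive has been extracted.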
- simel.tar.zst
  - Content: This archive contains 99,219 raw SIMEL files. These files originate directly from the SIMEL system and include various measurement types such as A5D, B5D, F5D, P5D, RF5D, F1, P1, and P1D.
  - Note: The structure and number of columns in these files vary depending on the measurement type. Each file includes columns corresponding to timestamps, flags, input/output values, and additional parameters as defined by the processing scripts.
- raw_goiener_v7.tar.zst
  - Content: This archive comprises 28,807 CSV files that contain the raw consumption time series data before the imputation process.
  - Columns (in each CSV file):
    • dt: The date and time of the measurement in the format “YYYY/MM/DD HH:MM”.
    • fl: Indicates whether the timestamp falls within Daylight Saving Time (DST).
    • kWh: The measured energy consumption in kilowatt-hours.
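The raw and imputed files use different timestamp formats (“YYYY/MM/DD HH:MM” versus “YYYY-MM-DD HH:MM:SS”), so comparing the two series requires normalizing one of them. The sketch below shows one way to do this; the function name and the sample timestamp are illustrative only.

```python
from datetime import datetime

RAW_FMT = "%Y/%m/%d %H:%M"         # format of the dt column in the raw files
IMPUTED_FMT = "%Y-%m-%d %H:%M:%S"  # format of the index column in the imputed files

def raw_to_imputed_timestamp(dt_text):
    """Rewrite a raw-file timestamp in the imputed-file format."""
    return datetime.strptime(dt_text, RAW_FMT).strftime(IMPUTED_FMT)

# Hypothetical timestamp, chosen only to demonstrate the conversion.
converted = raw_to_imputed_timestamp("2021/03/28 01:00")
```

If the timestamps are in local time, the same wall-clock hour can occur twice when DST ends, so matching rows on the timestamp together with `fl` is probably safer than matching on the timestamp alone.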
- metadata.csv
  - Content: This CSV file includes metadata for consumers and their supply points.
  - Columns:
    • cups: A unique identifier for the supply point (CUPS: Código Universal del Punto de Suministro).
    • fecha_alta: The registration date of the supply point.
    • fecha_baja: The deregistration date, or “NA” if the supply point is still active.
    • p1_kw, p2_kw, p3_kw, p4_kw, p5_kw, p6_kw: Power ratings in kilowatts for different contracts.
    • codigo_postal: The postal code.
    • cnae: The CNAE code representing the economic activity classification.
    • tarifa_atr: The tariff attribution code (if applicable).
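Because `fecha_baja` holds the literal string “NA” for active supply points, it can be used to split the metadata by status. The sketch below assumes a comma-delimited file with a header row; the identifiers, dates, and codes in the sample are invented, and only a subset of the documented columns is shown. Dates are kept as strings because their exact format is not stated here.

```python
import csv
import io

# Hypothetical rows: identifiers, dates, and codes are invented, and only a
# subset of the documented columns is included.
SAMPLE = """cups,fecha_alta,fecha_baja,cnae
id_a,2015-06-01,NA,9820
id_b,2016-02-15,2020-11-30,4711
"""

def split_by_status(handle):
    """Return (active, closed) lists of cups identifiers."""
    active, closed = [], []
    for rec in csv.DictReader(handle):
        # "NA" in fecha_baja marks a supply point that is still active.
        if rec["fecha_baja"] == "NA":
            active.append(rec["cups"])
        else:
            closed.append(rec["cups"])
    return active, closed

active, closed = split_by_status(io.StringIO(SAMPLE))
```

Tools that treat “NA” as a missing-value marker by default, as pandas does, will load these entries as nulls rather than strings, so the comparison would need to be adapted accordingly.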
- imputed_samples.csv
  - Content: This CSV file provides statistics on the imputation process for each processed consumption file.
  - Columns:
    • fname: The filename of the processed consumption file.
    • total_samples: The total number of hourly samples in that file.
    • imputed_samples: The number of samples where the consumption value was imputed due to missing data.
    • pct: The percentage of samples imputed, calculated as (imputed_samples / total_samples) * 100.
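The `pct` formula can be reproduced directly, for instance to recompute the statistic after filtering a series. This is a minimal sketch; the function name and the sample counts are illustrative, and the zero-sample guard is an added assumption rather than documented behavior of the original scripts.

```python
def imputed_pct(imputed_samples, total_samples):
    """Percentage of imputed samples: (imputed_samples / total_samples) * 100."""
    if total_samples <= 0:
        # Assumption: an empty series has no meaningful percentage.
        raise ValueError("total_samples must be positive")
    return (imputed_samples / total_samples) * 100

# Hypothetical counts: 2,190 imputed hours out of 8,760 (one non-leap year).
pct = imputed_pct(2190, 8760)
```

Filtering `imputed_samples.csv` on `pct` is one way to exclude series that rely heavily on imputation before analysis.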
Files (13.7 GB)
| Checksum | Size |
|---|---|
| md5:c27322f357988562afc2fe1d0f6fbb70 | 2.4 GB |
| md5:858bbbf5570a466456abc99b9f80d1da | 2.3 MB |
| md5:54df7e80e5dff0f97e038360f61126e0 | 36.2 MB |
| md5:ecf0ec7341d1b10b41cf19258742e1f9 | 2.4 GB |
| md5:dfe5fdb2d91743527e6c61a9d711dba2 | 8.9 GB |
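The MD5 checksums listed above can be used to verify a download. The sketch below hashes a file in chunks so that multi-gigabyte archives do not need to fit in memory; the function names are illustrative, and which checksum belongs to which file should be confirmed on the record page.

```python
import hashlib
import io

def md5_of_stream(handle, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in 1 MiB chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def md5_matches(path, expected_hex):
    """Compare a file's MD5 digest with the published checksum."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected_hex.lower()

# Small in-memory demonstration; a real check would call md5_matches with
# the path of a downloaded file and the checksum from the list above.
demo = md5_of_stream(io.BytesIO(b"hello"))
```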
Additional details
Related works
- Has version
- Dataset: 10.5281/zenodo.7859412 (DOI)
- Dataset: 10.5281/zenodo.7362093 (DOI)
- Is described by
- Data paper: 10.1038/s41597-023-02846-0 (DOI)
Software
- Programming language
- Python
- Development Status
- Active