Published March 1, 2025 | Version 7
Dataset | Open access

GoiEner Smart Meter Dataset (Version 7)

  • 1. Universidad de Deusto

Contributors

  • 1. Universidad de Deusto

Description

This dataset is the result of a multi-stage processing pipeline applied to raw SIMEL (Sistema de Información de Medidas Eléctricas) files. The pipeline splits the raw SIMEL files into user-specific files, generates intermediate raw consumption time series, and finally imputes missing consumption values. The files in this dataset are the outputs of several of these processing scripts, which are available on GitHub: https://github.com/quesadagranja/GoiEner-v7. Each file and its contents are described below.
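The three archives in this dataset are zstd-compressed tar files (.tar.zst). The sketch below is not one of the authors' scripts; it shows one way to read the CSV archives (imputed_goiener_v7.tar.zst and raw_goiener_v7.tar.zst) without extracting tens of thousands of files to disk. It assumes the third-party `zstandard` package is installed and that the CSV members are UTF-8 encoded; the helper names `open_tar_zst` and `iter_csv_members` are hypothetical.

```python
import csv
import io
import tarfile
from contextlib import contextmanager
from typing import BinaryIO, Dict, Iterator, List, Tuple


@contextmanager
def open_tar_zst(path: str) -> Iterator[BinaryIO]:
    """Yield a decompressed byte stream for a .tar.zst archive.

    Assumption: uses the third-party `zstandard` package, which is not part of
    the dataset's scripts. It is imported here so that iter_csv_members()
    also works on plain, uncompressed tar streams without it.
    """
    import zstandard

    with open(path, "rb") as fh:
        with zstandard.ZstdDecompressor().stream_reader(fh) as reader:
            yield reader


def iter_csv_members(tar_stream: BinaryIO) -> Iterator[Tuple[str, List[Dict[str, str]]]]:
    """Yield (member name, rows) for every .csv member of an uncompressed tar stream."""
    # Mode "r|" reads the tar sequentially, so the stream does not need to be
    # seekable and nothing is written to disk.
    with tarfile.open(fileobj=tar_stream, mode="r|") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            # Assumption: members are UTF-8 text with a header row.
            data = tar.extractfile(member).read().decode("utf-8")
            yield member.name, list(csv.DictReader(io.StringIO(data)))


# Example (needs the downloaded archive and `pip install zstandard`):
# with open_tar_zst("imputed_goiener_v7.tar.zst") as stream:
#     for name, rows in iter_csv_members(stream):
#         print(name, len(rows))
```

Streaming mode processes members strictly in archive order, which keeps memory use bounded to one consumer file at a time.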

  1. imputed_goiener_v7.tar.zst

    • Content:
      This archive contains 28,807 CSV files, each representing an hourly consumption time series after the imputation process has been applied.
    • Columns (in each CSV file):
      • index: A timestamp indicating the date and time in the format “YYYY-MM-DD HH:MM:SS”.
      • fl: Indicates whether the timestamp falls within Daylight Saving Time (DST).
      • kWh: The energy consumption measured in kilowatt-hours.
      • imp: An imputation flag, where 0 indicates an original (non-imputed) value and any nonzero value indicates that the consumption value was imputed.
  2. simel.tar.zst

    • Content:
      This archive contains 99,219 raw SIMEL files, exactly as obtained from SIMEL. They cover several measurement types: A5D, B5D, F5D, P5D, RF5D, F1, P1, and P1D.
    • Note:
      The structure and number of columns vary by measurement type. Each file includes columns for timestamps, flags, input/output values, and additional parameters; the processing scripts parse each type according to its own layout.
  3. raw_goiener_v7.tar.zst

    • Content:
      This archive comprises 28,807 CSV files that contain the raw consumption time series data before the imputation process.
    • Columns (in each CSV file):
      • dt: The date and time of the measurement in the format “YYYY/MM/DD HH:MM”.
      • fl: Indicates whether the timestamp falls within Daylight Saving Time (DST).
      • kWh: The measured energy consumption in kilowatt-hours.
  4. metadata.csv

    • Content:
      This CSV file includes metadata for consumers and their supply points.
    • Columns:
      • cups: A unique identifier for the supply point (CUPS: Código Universal del Punto de Suministro).
      • fecha_alta: The registration date of the supply point.
      • fecha_baja: The deregistration date, or “NA” if the supply point is still active.
      • p1_kw, p2_kw, p3_kw, p4_kw, p5_kw, p6_kw: Contracted power in kilowatts for each tariff period, P1 to P6.
      • codigo_postal: The postal code.
      • cnae: The CNAE code representing the economic activity classification.
      • tarifa_atr: The ATR (Acceso de Terceros a la Red) access tariff code, if applicable.
  5. imputed_samples.csv

    • Content:
      This CSV file provides statistics on the imputation process for each processed consumption file.
    • Columns:
      • fname: The filename of the processed consumption file.
      • total_samples: The total number of hourly samples in that file.
      • imputed_samples: The number of samples where the consumption value was imputed due to missing data.
      • pct: The percentage of samples imputed, calculated as (imputed_samples / total_samples) * 100.
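The column definitions above can be checked against each other. The standard-library sketch below, which is illustrative and not part of the authors' scripts, parses the text of one file from imputed_goiener_v7.tar.zst, recomputes total_samples, imputed_samples, and pct with the formula given for imputed_samples.csv, and rewrites a raw `dt` value in the `index` format so raw and imputed series can be aligned. The function names are hypothetical, each CSV is assumed to have a header row with the listed column names, and the sketch does not use `fl` to resolve timestamps around DST changes.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

# Timestamp formats as documented for each archive.
IMPUTED_FMT = "%Y-%m-%d %H:%M:%S"  # `index` column, imputed_goiener_v7 files
RAW_FMT = "%Y/%m/%d %H:%M"         # `dt` column, raw_goiener_v7 files


def read_imputed_csv(text: str) -> List[Dict[str, object]]:
    """Parse the text of one imputed consumption CSV into typed records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "index": datetime.strptime(row["index"], IMPUTED_FMT),
            "fl": row["fl"],  # DST indicator, kept as the original string
            "kWh": float(row["kWh"]),
            # imp: 0 = original value, any nonzero value = imputed
            "imputed": float(row["imp"]) != 0,
        })
    return records


def imputation_stats(records: List[Dict[str, object]]) -> Dict[str, float]:
    """Recompute total_samples, imputed_samples and pct as in imputed_samples.csv."""
    total = len(records)
    imputed = sum(1 for r in records if r["imputed"])
    pct = imputed / total * 100 if total else 0.0
    return {"total_samples": total, "imputed_samples": imputed, "pct": pct}


def raw_dt_to_index(dt: str) -> str:
    """Rewrite a raw `dt` value in the `index` format of the imputed files."""
    return datetime.strptime(dt, RAW_FMT).strftime(IMPUTED_FMT)
```

For example, `raw_dt_to_index("2021/01/15 13:00")` returns `"2021-01-15 13:00:00"`.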

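For metadata.csv, a supply point is still active when `fecha_baja` is "NA". The sketch below is a hypothetical helper set, not the authors' code. It reads every column as text so that identifiers such as `codigo_postal` keep any leading zero (Spanish postal codes in Álava, for instance, begin with 01), and it assumes empty or "NA" cells in the power columns mean "not set". If pandas is used instead, note that `read_csv` converts the string "NA" to a missing value by default, so active supply points then show NaN in `fecha_baja`, and `codigo_postal` should be read with `dtype=str`.

```python
import csv
import io
from typing import Dict, List, Optional

# The six power columns listed for metadata.csv.
POWER_COLUMNS = [f"p{i}_kw" for i in range(1, 7)]


def load_metadata(text: str) -> List[Dict[str, str]]:
    """Read metadata.csv with every value kept as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def is_active(row: Dict[str, str]) -> bool:
    """fecha_baja holds the literal "NA" while the supply point is active."""
    return row["fecha_baja"] == "NA"


def max_power_kw(row: Dict[str, str]) -> Optional[float]:
    """Largest of p1_kw..p6_kw; missing, empty or "NA" cells are skipped (assumption)."""
    values = [
        float(row[c])
        for c in POWER_COLUMNS
        if row.get(c) not in (None, "", "NA")
    ]
    return max(values) if values else None
```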
Files


Files (13.7 GB)

  • md5:c27322f357988562afc2fe1d0f6fbb70 (2.4 GB)
  • md5:858bbbf5570a466456abc99b9f80d1da (2.3 MB)
  • md5:54df7e80e5dff0f97e038360f61126e0 (36.2 MB)
  • md5:ecf0ec7341d1b10b41cf19258742e1f9 (2.4 GB)
  • md5:dfe5fdb2d91743527e6c61a9d711dba2 (8.9 GB)
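Each file is published with an MD5 checksum. A minimal standard-library sketch for verifying a download against the listed value follows; the function names `md5sum` and `verify` are hypothetical, and the file is hashed in 1 MiB chunks so that multi-gigabyte archives never have to fit in memory.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a listed checksum, with or without the "md5:" prefix."""
    return md5sum(path) == expected.strip().lower().removeprefix("md5:")


# Example: verify("<downloaded file>", "md5:<checksum listed for that file>")
```

`str.removeprefix` requires Python 3.9 or later.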

Additional details

Related works

Has version
Dataset: 10.5281/zenodo.7859412 (DOI)
Dataset: 10.5281/zenodo.7362093 (DOI)
Is described by
Data paper: 10.1038/s41597-023-02846-0 (DOI)

Software

Programming language
Python
Development Status
Active