Hierarchical Demand Forecasting Benchmark for the Distribution Grid
This dataset contains power measurements and meteorological forecasts relative to a set of 24 power meters located in Rolle (Switzerland). These datasets are published to provide a standard benchmark for evaluating forecasting algorithms for demand side management applications.
In L. Nespoli, V. Medici, K. Lopatichki, F. Sossan, Hierarchical Demand Forecasting Benchmark for the Distribution Grid, arXiv, 2019, this dataset is used to test several regressors in predicting the electrical load 24 hours ahead.
This dataset consists of measurements coming from 62 IEC 61000-4-30 Class A power quality meters manufactured by DEPsys (Switzerland) installed in secondary substations and LV cabinets of the distribution grid of the city of Rolle (Switzerland). The dataset has been enriched with numerical weather predictions from commercial provider Meteoblue (Switzerland), updated every 12 hours.
The power measurements are provided as a pickle dataset, which includes:
For each phase:
- mean active and reactive power
- voltage magnitude
- maximum total harmonic distortion (THD)
- voltage frequency \(\omega\)
- the average power over the three phases.
The latter has been used as the target variable in the aforementioned paper.
The meteorological forecasts are provided as a Hierarchical Data Format 5 file, which includes:
- global horizontal and normal irradiance (GHI and GNI, respectively)
- relative humidity (RH)
- wind speed and direction.
How to read the files with Python
The following code opens the files in Python:

    import pandas as pd

    nwp_data = pd.read_hdf("nwp_data.h5", "df")
    power_data = pd.read_pickle("power_data.p")
nwp_data is a pandas DataFrame of arrays. Each column, whose name is self-explanatory, holds a 24-hour forecast of one meteorological variable. Each row contains the most recent forecast available from the NWP service at the corresponding time index of the dataset.
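Because each cell holds an array rather than a scalar, it can be convenient to unpack one column into a two-dimensional table with one row per time index and one column per forecast step. The following is a minimal sketch on synthetic data: the column name "ghi", the four-step horizon and the helper unpack_forecasts are illustrative assumptions, not names taken from the dataset.

```python
import numpy as np
import pandas as pd


def unpack_forecasts(nwp: pd.DataFrame, column: str) -> pd.DataFrame:
    """Stack the per-timestamp forecast arrays of one column into a 2-D frame."""
    values = np.stack(nwp[column].tolist())  # shape: (n_timestamps, n_steps)
    steps = [f"step_{i}" for i in range(values.shape[1])]
    return pd.DataFrame(values, index=nwp.index, columns=steps)


# Synthetic stand-in for nwp_data: 3 timestamps, each holding a 4-step forecast.
# The column name "ghi" and the horizon length are hypothetical.
index = pd.date_range("2018-01-01", periods=3, freq="h")
cells = np.empty(len(index), dtype=object)
for k in range(len(index)):
    cells[k] = np.arange(4.0) + k
toy_nwp = pd.DataFrame({"ghi": cells}, index=index)

ghi_matrix = unpack_forecasts(toy_nwp, "ghi")
print(ghi_matrix.shape)
```

With the real file, the same helper would be called on nwp_data with one of its actual column names, provided every array in that column has the same length.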
power_data is a dict of pandas DataFrames. Each value of the dict, whose key is self-explanatory, is a DataFrame whose columns are named after the meters they refer to. The DataFrame 'P_mean' additionally contains 6 fictitious aggregations of the phase-mean power of the meters, 'S1', 'S2', 'S11', 'S12', 'S21', 'S22', and 'all', which represents the sum of all the meters. The hierarchical structure of the aggregations is the following:
              all
           ____|____
          |         |
          S1        S2
         _|_       _|_
        |   |     |   |
       S11 S12   S21 S22
S11 contains the first quarter of the time series presented in the dataset, while S12, S21 and S22 contain the second, third and fourth quarters, respectively.
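One quick sanity check on a hierarchy like this is to verify that each parent series matches the sum of its children. The sketch below assumes that every aggregation is a plain sum, which the description above states explicitly only for 'all'. It runs on a synthetic stand-in for the 'P_mean' DataFrame, and the helper aggregation_gaps is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Parent -> children, as drawn in the tree of aggregations.
HIERARCHY = {
    "all": ["S1", "S2"],
    "S1": ["S11", "S12"],
    "S2": ["S21", "S22"],
}


def aggregation_gaps(p_mean: pd.DataFrame) -> dict:
    """Largest absolute difference between each parent and the sum of its children."""
    gaps = {}
    for parent, children in HIERARCHY.items():
        residual = p_mean[parent] - p_mean[children].sum(axis=1)
        gaps[parent] = float(residual.abs().max())
    return gaps


# Synthetic stand-in for power_data["P_mean"] (random values, not real measurements).
rng = np.random.default_rng(0)
index = pd.date_range("2018-01-01", periods=6, freq="10min")
toy = pd.DataFrame(rng.random((6, 4)), index=index,
                   columns=["S11", "S12", "S21", "S22"])
toy["S1"] = toy["S11"] + toy["S12"]
toy["S2"] = toy["S21"] + toy["S22"]
toy["all"] = toy["S1"] + toy["S2"]

gaps = aggregation_gaps(toy)
print(gaps)
```

On the real data, gaps well above numerical noise would suggest that the sum assumption does not hold for that level, or that missing values are handled differently in parents and children.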
In addition, the reference paper considered the following vacation days:
The reduced_dataset.pk contains a reduced version of the dataset, with just the mean active power and the GHI and T variables for the NWP forecasts.
This project is carried out within the frame of the Swiss Centre for Competence in Energy Research on the Future Swiss Electrical Infrastructure (SCCER-FURIES) with the financial support of the Swiss Innovation Agency (Innosuisse - SCCER program) and of the Swiss Federal Office of Energy with the project SI/501523.
- L. Nespoli, V. Medici, K. Lopatichki, F. Sossan, Hierarchical Demand Forecasting Benchmark for the Distribution Grid, arXiv, 2019