Published September 7, 2021 | Version 1.0
Dataset Open

A computational intelligence approach to predict energy demand using Random Forest in a Cloudera cluster

  • 1. Intelligent Data Analysis Group (DATAi), Pablo de Olavide University

Description

Society’s energy consumption has risen sharply in recent years, making demand prediction a pressing challenge for ensuring efficient and responsible use. Artificial intelligence techniques have proven to be effective tools for handling tedious tasks and making sense of large-scale data, supporting better business decisions in different areas of knowledge. This article proposes the use of random forest algorithms in a Big Data environment for household energy demand forecasting. The predictions draw on information from different sources and confirm the fundamental role of socioeconomic data in consumers’ behaviour. In addition, Big Data architectures are proposed to scale the solution horizontally and vertically so that it can be used in real environments. Finally, a tool for efficient, high-resolution predictions is introduced, which enables accurate energy management.

Raw data is included in data.csv. This file contains half-hourly home electricity consumption records for 4404 households on fixed tariffs (not subject to dynamic time-of-use pricing) for the period between November 2011 and February 2014. The original information was acquired from the Low Carbon London project led by UK Power Networks (https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households).
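Because data.csv is 11.6 GB, it may be more practical to stream it than to load it whole. The following is a minimal sketch of aggregating half-hourly readings into daily totals per household. The column names (household_id, timestamp, kwh) and the timestamp format are assumptions made for illustration; they are not documented in this record and should be adjusted to the actual file header.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_totals(lines, id_col="household_id", time_col="timestamp",
                 kwh_col="kwh", time_format="%Y-%m-%d %H:%M:%S"):
    """Sum half-hourly kWh readings into one total per (household, day).

    `lines` is any iterable of CSV text lines, e.g. an open file handle.
    All column names and the timestamp format are hypothetical defaults.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        try:
            kwh = float(row[kwh_col])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric readings
        day = datetime.strptime(row[time_col], time_format).date()
        totals[(row[id_col], day)] += kwh
    return dict(totals)


# Tiny in-memory sample standing in for the real file.
sample = io.StringIO(
    "household_id,timestamp,kwh\n"
    "H1,2012-01-01 00:00:00,0.25\n"
    "H1,2012-01-01 00:30:00,0.5\n"
    "H1,2012-01-02 00:00:00,0.125\n"
    "H2,2012-01-01 00:00:00,Null\n"
)
result = daily_totals(sample)
```

For the real file, the same function could be called as daily_totals(open("data.csv", newline="")), which reads one row at a time and keeps only the running totals in memory.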

RFResults.zip contains the energy predictions for each ACORN group produced by the generated Random Forest model. For each group, the first 613 days of a total of 818 daily observations were used for training and the last 205 days for testing.
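The 613/205 split described above is chronological rather than random. A minimal sketch of such a split follows, assuming each ACORN group is represented as a list of daily observations in date order; the function name and the list representation are illustrative and not taken from the original work.

```python
def chronological_split(daily_series, n_train=613, n_test=205):
    """Split an ordered series into leading training days and trailing test days."""
    if len(daily_series) != n_train + n_test:
        raise ValueError(
            f"expected {n_train + n_test} observations, got {len(daily_series)}"
        )
    # No shuffling: the test period strictly follows the training period.
    return daily_series[:n_train], daily_series[n_train:]


# Placeholder series: day indices 0..817 stand in for real daily observations.
days = list(range(818))
train, test = chronological_split(days)
```

Keeping the test days strictly after the training days avoids leaking future consumption into the model, which a shuffled split would allow.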

Meteorological data were acquired from the Dark Sky app (https://darksky.net). These data are included in weather_hourly_darksky.csv.

uk_bank_holidays.xlsx contains the dates of UK bank holidays for the studied period, used as an additional variable related to occupancy.

Notes

Funding: This research was partially supported by the Ministry of Economy and Competitiveness, project TIN2015-64776-C3-2-R, and by the Junta de Andalucia, under the Andalusian Plan for Research, Development and Innovation, TIC-239.

Files

Files (11.6 GB)

Size      Checksum
11.6 GB   md5:01db98e3426967032c78adf7bb925520
3.1 MB    md5:8fa63d3e2a58dd1fdc559e525216c16e
22.4 kB   md5:e20802cda60aa9e2bd1026867b8ae7f1
2.0 MB    md5:050f434480b19e1d0a69e0b5d3500bbd
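
The MD5 checksums listed above can be used to confirm that a download completed intact. The sketch below hashes a file in chunks, so that even the 11.6 GB file does not need to fit in memory; the helper names are hypothetical, and the demonstration uses an in-memory byte string rather than one of the dataset files.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in 1 MiB chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(stream, listed):
    """Compare a stream's MD5 against a listed value such as 'md5:01db...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_stream(stream) == expected


# Demonstration on a known input: the MD5 of b"abc".
demo_ok = matches_listed_checksum(
    io.BytesIO(b"abc"), "md5:900150983cd24fb0d6963f7d28e17f72"
)
```

For a downloaded file, open it in binary mode (for example open(path, "rb")) and pass the handle together with the checksum listed for that file.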