Short term load forecasting based on deep learning for smart grid applications

Short term load forecasting is indispensable for industrial, commercial, and residential smart grid (SG) applications. Load forecasts decide whether to increase or decrease the generation of an already running generator, to add extra units, or to exchange power with neighboring systems. In this regard, a large variety of short term load forecasting models has been proposed in the literature, spanning from legacy time series models to contemporary data analytic models. Some of these models perform better in terms of accuracy, while others perform well in convergence rate. In this paper, a fast and accurate short term load forecasting framework based on a stacked factored conditional restricted Boltzmann machine (FCRBM) and a conditional restricted Boltzmann machine (CRBM) is presented. The stacked FCRBM and CRBM are trained using rectified linear unit (ReLU) and sigmoid functions, respectively. The proposed framework is applied to offline demand side load data of a US utility. Three performance metrics, i.e., mean absolute percentage error (MAPE), normalized root mean square error (NRMSE), and correlation coefficient, are used to validate the proposed framework. The results show that the stacked FCRBM and CRBM are accurate and robust as compared to artificial neural network (ANN) and convolutional neural network (CNN).


Introduction
The energy demand is increasing rapidly with the increase in population. The commercial and residential sectors currently account for about 40%-50% of net energy demand in the developed countries of the world [1]. In order to meet this increasing energy demand, the entire globe is transitioning from the traditional grid to the smart grid (SG). The SG has the ability to forecast, monitor, plan, schedule, and make real time decisions regarding the generation and consumption of energy. One of the features of the SG is the advanced metering infrastructure, which enables the active participation of both utility and consumers in the electricity market. The goals and objectives of the SG are to improve the efficiency and effective utilization of the electricity framework. Modeling and forecasting the load is key to achieving these objectives.
For SG distribution side performance optimization, proper decision making is necessary, which leads to reduced cost, alleviated peaks, and lower power losses. With the aforementioned objectives in mind, current research performs power scheduling using optimization techniques [2], [3]. However, prior to optimization based scheduling, load forecasting is necessary for efficient energy management. Typically, load forecasting has three categories: a) short term load forecasting, ranging from hours to a week; b) medium term forecasting, for a duration of weeks to a year; and c) long term forecasting, for a duration of more than one year. The focus of this work is on short term load forecasting, which is a challenging task due to the stochastic and non-linear consumption behavior of consumers. Many short term load forecasting models have been proposed in [4]-[11]. However, these models have either accuracy or convergence rate problems. In [12], the authors used an ANN based forecasting model to reduce the error and improve the forecast accuracy. However, while improving forecast accuracy, the computational complexity and execution time are compromised due to the tradeoff between accuracy and convergence rate.
The contribution of this paper is twofold. First, we propose a new way to adopt the stacked factored conditional restricted Boltzmann machine (FCRBM) model with the aim of learning the non-linear and stochastic electrical patterns from offline data. Second, a short term load forecasting model is proposed based on the stacked FCRBM and CRBM to improve the relative forecast accuracy. The proposed model is validated by comparison with existing short term load forecasting models based on ANN and CNN in terms of mean absolute percentage error (MAPE), normalized root mean square error (NRMSE), and correlation coefficient. We adopt a modular strategy for short term load forecasting in which the output of each former module is fed into the later module. In short, our system model comprises three modules: a data processing and feature extraction module, a deep learning-based training module, and a deep learning-based forecasting module.
The organization of the paper is as follows: In Section 2, we introduce the proposed methods. Section 3 presents the proposed architecture. In Section 4, simulation results are provided. Finally, the paper is concluded in Section 5.

Proposed methods
In this section, we introduce the deep learning techniques, CRBM and stacked FCRBM, for short term load forecasting. For each of these techniques, we describe three ingredients, i.e., the error function, conditional probability, and update/learning rules. The error function of a given network provides scalar values that are essential for its configuration. The conditional probability gives the probability of an event under a specific condition. Update/learning rules are required for tuning the free parameters.

CRBM
The CRBM [13] is an extension of the RBM [14]. It is a probabilistic model used to model human activities, weather data, collaborative filtering, classification, and time series data [15]. In this paper, we use a CRBM having three layers: a visible layer, a hidden layer, and a conditional history layer. The generic structure of the CRBM is shown in Figure 1. The three ingredients of the CRBM, i.e., the error function, conditional probability, and learning rules, are described in detail as follows.
Error function: The error function expresses the possible correlations between the input, conditional history layer, hidden layer, and output. It is calculated as:

E(v, h \mid u) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i w^{vh}_{ij} h_j - \sum_{k,i} u_k w^{uv}_{ki} v_i - \sum_{k,j} u_k w^{uh}_{kj} h_j

where v = [v_1, v_2, \ldots, v_n] is the real valued vector of visible neurons 1 to n, u = [u_1, u_2, \ldots, u_n] is the real valued vector of history neurons 1 to n, h = [h_1, h_2, \ldots, h_n] is the binary vector of hidden neurons 1 to n, w is the weight matrix, a is the visible layer bias, and b is the hidden layer bias. The weight matrix w_{vh} is bidirectional, while the weight matrices w_{uh} and w_{uv} are unidirectional.
Conditional probability: Conditional probability in the CRBM determines the probability distribution over two inferences. The first, p(h \mid v, u), gives the probability of the hidden layer conditioned on all the other layers, while the second, p(v \mid h, u), gives the probability of the visible layer conditioned on all the other layers. The two inferences lead to:

p(h_j = 1 \mid v, u) = \sigma\Big( b_j + \sum_{i} v_i w^{vh}_{ij} + \sum_{k} u_k w^{uh}_{kj} \Big)

p(v_i \mid h, u) = \mathcal{N}\Big( a_i + \sum_{j} h_j w^{vh}_{ij} + \sum_{k} u_k w^{uv}_{ki},\; 1 \Big)

where \sigma(\cdot) is the sigmoid function and \mathcal{N}(\mu, 1) denotes a unit-variance Gaussian.

Weights and biases learning and update rules: We use a stochastic gradient descent method for learning and updating the weights and biases of the layers, because other methods sometimes suffer from the vanishing gradient problem, which makes the network hard to train. The parameters are fine tuned by maximizing the probability function, and the weight and bias matrices are updated to minimize the gap between the real and forecasted values. The weights are updated as follows:

w^{vh}_{ij}(t+1) = w^{vh}_{ij}(t) + \eta \big( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \big)

The biases are updated by the following equations:

a_i(t+1) = a_i(t) + \eta \big( \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{recon}} \big), \qquad b_j(t+1) = b_j(t) + \eta \big( \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{recon}} \big)

where \eta is the learning rate, t is the iteration number, and \langle \cdot \rangle_{\text{data}} and \langle \cdot \rangle_{\text{recon}} denote averages over the data and the model reconstruction, respectively. The aforementioned procedure is repeated over the epochs until the model converges.
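As an illustration of the inference and update rules above, the following Python sketch computes the hidden-unit probabilities of a CRBM and applies one stochastic gradient step. All function and variable names are our own (hypothetical); a full implementation would also include Gibbs sampling to obtain the reconstruction statistics.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def crbm_hidden_probs(v, u, w_vh, w_uh, b):
    """p(h_j = 1 | v, u): sigmoid of the hidden bias plus the visible
    and history contributions. v, u are lists of visible/history values;
    w_vh[i][j] and w_uh[k][j] are weights; b[j] is the hidden bias."""
    probs = []
    for j in range(len(b)):
        act = b[j]
        act += sum(v[i] * w_vh[i][j] for i in range(len(v)))  # visible -> hidden
        act += sum(u[k] * w_uh[k][j] for k in range(len(u)))  # history -> hidden (directed)
        probs.append(sigmoid(act))
    return probs

def sgd_update(w, grad, eta=0.01):
    """One stochastic-gradient step, w <- w + eta * grad (gradient ascent
    on the log-likelihood, as in contrastive-divergence style training)."""
    return [[w[i][j] + eta * grad[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]
```

With zero weights and biases, every hidden unit has activation 0 and hence probability sigmoid(0) = 0.5, which is a quick sanity check for the wiring.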

Stacked FCRBM
FCRBM is an extension of the CRBM introduced by Taylor and Hinton in [15]. In FCRBM [16], they add the concept of factors and styles to mimic multiple human actions. We propose a new way to adopt a deep learning technique, i.e., stacked FCRBM, for short term load forecasting, where each successive layer takes the output of the previously trained layers to overcome the problem of overfitting and to improve the forecast accuracy. The stacked FCRBM comprises four layers, as shown in Figure 2. The visible and history layers are real valued, while the hidden layer is binary. The visible layer is responsible for encoding the present time series data to forecast the future value, while the history layer encodes historical time series data. The hidden layer is responsible for the discovery of significant features required for analysis. The different styles and parameters essential for forecasting are embedded into the style layer. The relation and interaction between the layers, weights, and factors is expressed by an error function as:

E(v, h \mid u, y) = -\sum_{m} \big[ (v^{\top} w^{v}) \bullet (y^{\top} w^{y}) \bullet (h^{\top} w^{h}) \big]_{m} - \hat{a}^{\top} v - \hat{b}^{\top} h

where E is the error function, v^{\top} w^{v} is the visible factored term, y^{\top} w^{y} is the style factored term, and h^{\top} w^{h} is the hidden factored term. The operator (\cdot) \bullet (\cdot) is the Hadamard product, in which the product operation is element wise, and m indexes the factors. The elements \hat{a} and \hat{b} represent dynamic biases for the visible and hidden layers, respectively, which are defined as follows:

\hat{a} = a + A_v \big( (A_u^{\top} u) \bullet (A_y^{\top} y) \big), \qquad \hat{b} = b + B_h \big( (B_u^{\top} u) \bullet (B_y^{\top} y) \big)

where w^{v}, w^{y}, w^{h} are the weights of the corresponding layers, and A_v, A_u, A_y, B_h, B_u, B_y are the connections of the corresponding layers to the factors; these are known as the free parameters of the model.
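The factored three-way interaction can be sketched in Python as follows. This is a minimal, illustrative rendering of the Hadamard product of per-factor messages projected onto the hidden units; the names are hypothetical, and the dynamic biases are omitted for brevity.

```python
def factored_hidden_input(v, y, w_v, w_y, w_h):
    """Factored three-way interaction of an FCRBM: the visible, style,
    and hidden weight matrices meet at F factors; the per-factor
    messages are multiplied element-wise (the Hadamard product in the
    error function) and projected onto each hidden unit."""
    n_factors = len(w_v[0])
    # per-factor messages from the visible and style layers
    m_v = [sum(v[i] * w_v[i][f] for i in range(len(v))) for f in range(n_factors)]
    m_y = [sum(y[s] * w_y[s][f] for s in range(len(y))) for f in range(n_factors)]
    m = [m_v[f] * m_y[f] for f in range(n_factors)]  # element-wise product
    # project the combined factor messages onto the hidden units
    return [sum(w_h[j][f] * m[f] for f in range(n_factors)) for j in range(len(w_h))]

def relu(x):
    """Rectified linear unit used to train the stacked FCRBM."""
    return x if x > 0.0 else 0.0
```

Because the factor messages are multiplied rather than added, each factor gates the visible signal by the style signal, which is what lets the style layer modulate the forecast.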

Conditional probability
In the case of the stacked FCRBM, conditional probability determines the probability distribution of one layer conditioned on all the remaining layers. In the first case, we define the probability distribution of the hidden layer conditioned on all the remaining layers, p(h \mid v, u, y). The restriction is that there are no intra-layer connections between the neurons of any layer, while there are inter-layer connections between the neurons of different layers. The conditional probability of the hidden layer can be calculated as:

p(h_j = 1 \mid v, u, y) = f\Big( \hat{b}_j + \sum_{m} w^{h}_{jm} \big( \textstyle\sum_{i} v_i w^{v}_{im} \big) \big( \textstyle\sum_{s} y_s w^{y}_{sm} \big) \Big)

For all inputs, the probability of the hidden layer neurons is evaluated using the rectified linear unit (ReLU) activation function f(\cdot).
Finally, we determine the probability of the visible layer conditioned on all the remaining layers, i.e., the history, hidden, and style layers, p(v \mid h, u, y). The visible layer probability is defined as:

p(v_i \mid h, u, y) = \mathcal{N}\Big( \hat{a}_i + \sum_{m} w^{v}_{im} \big( \textstyle\sum_{j} h_j w^{h}_{jm} \big) \big( \textstyle\sum_{s} y_s w^{y}_{sm} \big),\; 1 \Big)

Stacked FCRBM weights and biases learning rules
We adopt stochastic gradient descent for the learning and update rules to overcome the problem of vanishing gradient. Moreover, on a large dataset, stochastic gradient descent converges faster and avoids overfitting as compared to mini-batch training. The weights of the corresponding layers are updated as:

w^{v}(t+1) = w^{v}(t) + \eta \frac{\partial \log p(v \mid u, y)}{\partial w^{v}}, \qquad w^{y}(t+1) = w^{y}(t) + \eta \frac{\partial \log p(v \mid u, y)}{\partial w^{y}}, \qquad w^{h}(t+1) = w^{h}(t) + \eta \frac{\partial \log p(v \mid u, y)}{\partial w^{h}}

The connections and biases are updated in the same manner:

\theta(t+1) = \theta(t) + \eta \frac{\partial \log p(v \mid u, y)}{\partial \theta}, \qquad \theta \in \{A_v, A_u, A_y, B_h, B_u, B_y, a, b\}

Proposed architecture

In this section, we introduce our proposed fast and accurate short term load forecasting model based on the deep learning techniques stacked FCRBM and CRBM, as shown in Figure 3. We adopt a modular strategy for short term load forecasting, in which the output of each former module is fed into the later module. In short, our system model consists of three modules: a data processing and feature extraction module, a deep learning-based training module, and a deep learning-based forecasting module. The detailed description is as follows:

Data processing and feature extraction module
First, the historical data of twenty zones of a US utility, consisting of hourly load and weather data, is taken from the Kaggle repository. This data is given as input to the data processing and feature extraction module. Three data operations are performed on the received data: cleansing, normalization, and structuring. The cleansing operation replaces missing and defective values with the mean of the previous values. After cleansing, the data is normalized in order to reduce and eliminate redundancy. Moreover, since the data has large values, normalization keeps the weighted sums within the limits of the sigmoid function; at the end of the pipeline, we denormalize the outputs to obtain the desired load predictions. After cleansing and normalization, the data is structured in ascending or descending order. The desired features are extracted from the dataset by the feature extraction process, and finally the data is split into training and testing datasets.
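A minimal sketch of the cleansing, normalization, and denormalization steps described above, assuming missing values arrive as None and that min-max scaling is used (the paper does not specify the exact normalization formula):

```python
def clean(series):
    """Replace missing values (None) with the mean of the preceding
    observed values, as in the cleansing step."""
    out = []
    for x in series:
        if x is None:
            x = sum(out) / len(out) if out else 0.0
        out.append(float(x))
    return out

def normalize(series):
    """Min-max scale to [0, 1] so weighted sums stay in the sigmoid's
    sensitive range; return the offset and span for later denormalization."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # guard against a constant series
    return [(x - lo) / span for x in series], lo, span

def denormalize(scaled, lo, span):
    """Map normalized outputs back to the original load scale."""
    return [x * span + lo for x in scaled]
```

The offset and span must be computed on the training data only and reused at forecast time, otherwise the denormalized predictions will be biased.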

Training module
Training based on the deep learning techniques CRBM and stacked FCRBM is the main part of this architecture. These techniques are trained with the training data, through which they learn the non-linear relationship between the demand load profile and historical observations. The output of the data processing and feature extraction module is given as input to the training module. This module takes the training data and chooses one of the models, CRBM or stacked FCRBM, for training. If the chosen model is the stacked FCRBM, the training module trains it using the ReLU activation function, because ReLU mitigates the problems of vanishing gradient and the curse of dimensionality. If the model chosen for training is the CRBM, the training module trains it with the sigmoid activation function. In this way, the deep learning-based training module learns to forecast the future load.
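The module's per-model choice of activation function can be sketched as follows; the string tags and function names are illustrative, not the paper's API.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def activation_for(model):
    """Pick the activation the training module uses for each model:
    ReLU for the stacked FCRBM (mitigates vanishing gradients),
    sigmoid for the CRBM. `model` is an illustrative string tag."""
    return relu if model == "stacked_fcrbm" else sigmoid
```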

Forecasting module
The output of the training module is fed into the forecasting module. On the basis of the trained deep learning models, the forecasting module forecasts the future load. The accuracy of the proposed deep learning techniques is evaluated in terms of MAPE, NRMSE, and correlation coefficient using the testing data. The forecasted results are used for SG applications such as power generation planning, economic operation and unit commitment, power system maintenance and planning, load switching, power purchasing, demand side management, and contract evaluation.

Simulation results
In this section, simulation results of our proposed fast and accurate short term load forecasting model based on the deep learning techniques stacked FCRBM and CRBM are presented. To validate its accuracy, the proposed stacked FCRBM based short term load forecasting model is compared with CRBM, ANN, and CNN in terms of the performance metrics MAPE, NRMSE, and correlation coefficient. The detailed description is as follows. First, we define the forecast accuracy in terms of NRMSE as:

\text{NRMSE} = \frac{ \sqrt{ \frac{1}{\tau} \sum_{t=1}^{\tau} (R_t - F_t)^2 } }{ \max_t R_t - \min_t R_t }

where \tau is the number of steps forecasted in the future, R_t represents the real value, and F_t is the forecasted value. Second, we present the MAPE performance metric to get statistical significance for accuracy assessment. For minimum MAPE, accuracy is maximum, and vice versa. The MAPE is calculated as:

\text{MAPE} = \frac{100}{\tau} \sum_{t=1}^{\tau} \left| \frac{R_t - F_t}{R_t} \right|

Finally, the correlation coefficient is used to check how close the forecasted values are to the real values and is defined by Equation 16:

\rho(R, F) = \frac{ E[(R - \mu_R)(F - \mu_F)] }{ \sigma_R \, \sigma_F }

where \sigma_R and \sigma_F are the standard deviations of the real and forecasted values. The correlation coefficient returns a value between -1 and 1. If the returned value is close to 1, the real and forecasted values are positively correlated; if it is close to -1, they are negatively correlated; and if zero is returned, the real and forecasted values are not correlated.
where E is the expected value operator, µ_R is the mean of the real values, and µ_F is the mean of the forecasted values.
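The three performance metrics can be computed directly from their definitions. The sketch below assumes NRMSE is normalized by the range of the real values, which the text does not state explicitly.

```python
import math

def nrmse(real, forecast):
    """Root mean square error normalized by the range of the real values
    (normalization convention assumed, not specified in the text)."""
    t = len(real)
    rmse = math.sqrt(sum((r - f) ** 2 for r, f in zip(real, forecast)) / t)
    return rmse / (max(real) - min(real))

def mape(real, forecast):
    """Mean absolute percentage error over the forecast horizon."""
    t = len(real)
    return 100.0 / t * sum(abs((r - f) / r) for r, f in zip(real, forecast))

def corr(real, forecast):
    """Pearson correlation coefficient between real and forecasted values."""
    mr = sum(real) / len(real)
    mf = sum(forecast) / len(forecast)
    cov = sum((r - mr) * (f - mf) for r, f in zip(real, forecast))
    sr = math.sqrt(sum((r - mr) ** 2 for r in real))
    sf = math.sqrt(sum((f - mf) ** 2 for f in forecast))
    return cov / (sr * sf)
```

A perfect forecast gives NRMSE = 0, MAPE = 0, and correlation 1; note that the correlation alone can be 1 even when the forecast is systematically scaled, which is why all three metrics are reported together.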

Historical load data
The historical data is taken from the publicly available Kaggle repository of the Global Energy Forecasting Competition 2012 [17]. The dataset consists of the hourly load (kW) of twenty zones of a US utility and the temperature of eleven stations. This historical dataset (load and weather) ranges from the 1st hour of 1/1/2004 to the 6th hour of 30/6/2008. The dataset is divided into training and testing data: three years of data are used to train the network, and one year of data is used to test it. In summer, there is a significant load increase during the daytime as compared to the night time; in winter, the daytime increase over the night is slight. From 2004 to 2008, the maximum and minimum electricity consumption are 540393 kWh and 149 kWh, respectively.
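A simple chronological split consistent with the description above, assuming 8760 hours per year (i.e., ignoring leap days) as an illustrative simplification:

```python
def chronological_split(hourly, train_years=3, hours_per_year=8760):
    """Split an hourly series chronologically: the first `train_years`
    for training, the remainder for testing. No shuffling, so the test
    period strictly follows the training period, as required for
    time series forecasting."""
    cut = train_years * hours_per_year
    return hourly[:cut], hourly[cut:]
```

Keeping the split chronological matters: a random split would leak future hours into training and overstate the forecast accuracy.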

Learning curve
The learning curve describes the error rate across the number of epochs. It examines the difference between the training and testing errors: when the gap between them is large, the forecasting results will be inaccurate, and vice versa. Generally, at the point where the test error starts to increase while the training error keeps decreasing, overfitting has occurred. In such a situation, the model memorizes the given training data rather than learning from it, and the forecasted results will be inaccurate. This problem is addressed by several methods, such as dropout and early stopping. However, observing the learning curves of both deep learning techniques, stacked FCRBM and CRBM, we did not notice overfitting, because the test error decreases as the training error does, as shown in Figure 4. In this situation, the network is learning rather than memorizing, and the forecasted results will be accurate.
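Early stopping, one of the remedies mentioned above, can be sketched as a patience rule on the test error. This is an illustrative implementation, not the procedure used in the paper.

```python
def early_stop(test_errors, patience=3):
    """Return the epoch index at which to stop: the epoch of the best
    test error, once the error has failed to improve for `patience`
    consecutive epochs (a simple guard against overfitting)."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(test_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch
    return len(test_errors) - 1  # never triggered: train to the end
```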

Cumulative distribution function of errors
The NRMSE is expressed in terms of its cumulative distribution function (CDF), as shown in Figure 5. The stacked FCRBM has a 50% better CDF of NRMSE as compared to ANN, CNN, and CRBM, because the stacked FCRBM has more computational capability. Stacked FCRBM based prediction is reliable even when the load is uncertain with high prediction error. The results show that the deep learning technique stacked FCRBM is robust and accurate as compared to CRBM, ANN, and CNN. Moreover, the stacked FCRBM would be the best choice for consumer load forecasting as compared to CRBM, because it has more computational power to capture highly abstract data.

Deep learning based short term load forecasting
The short term load forecasting results for a one week time horizon with hourly resolution are described in Figure 6. We forecast the load of one week in the middle of each season.

Conclusion
A short term load forecasting model based on the deep learning techniques stacked FCRBM and CRBM is proposed in this paper. The proposed model consists of a data processing and feature extraction module, a deep learning-based training module, and a deep learning-based forecasting module, and it forecasts the weekly load profile on the basis of past energy consumption. We found that the stacked FCRBM and CRBM learn effectively from past energy consumption and exhibit better performance compared to forecasting models in the literature. The performance evaluation of the proposed model is carried out in terms of MAPE, NRMSE, and correlation coefficient. Simulation results demonstrate that the stacked FCRBM and CRBM are accurate and robust as compared to ANN and CNN. Moreover, the adopted stacked FCRBM achieved 99.62% accuracy with affordable execution time and complexity. In the future, we will work on medium term and long term forecasting.