Comparing NARX and NARMAX models using ANN and SVM for cash demand forecasting for ATM

This paper compares NARMAX and NARX models, developed with ANN and SVM, for forecasting cash demand at ATMs. A simple methodology for developing SVM-NARMAX models is proposed. The best results were obtained with NARX-ANN models. In addition, no significant differences were found between NARX and NARMAX for either ANN or SVM. Hence, it seems advisable to choose simpler models, such as NARX, and a user-friendly tool like ANN, at least for this particular application.


I. INTRODUCTION
The need for good predictive models is becoming a relevant issue because of the importance of anticipating different phenomena such as climatic variations, changes in stock values or the evolution of variables in complex industrial processes. Unfortunately, the complexity of all these phenomena makes the development of suitable dynamic models based on laws very difficult. A successful alternative approach to this problem is to design data-driven models. In the field of linear systems there is a vast literature originating in the pioneering work of Box and Jenkins [1]. They proposed models in which the evolution of a phenomenon is explained by its previous behavior and by the effect of exogenous variables, when they exist.
For nonlinear phenomena, nonlinear autoregressive (NAR) models or nonlinear autoregressive models with exogenous variables (NARX) are proposed. The predictive power of these models can be increased when previous errors are incorporated as regressors [2]. This results in the so-called nonlinear autoregressive moving average (NARMA) model, or NARMAX when exogenous variables are also included. These models may be better predictors because they use information about past errors to improve the prediction. The price to pay, as will be seen later, is that they are more difficult to identify.
Extensive use of Artificial Neural Networks (ANN) in system identification, predictive control and the design of observers and predictors can be found in the literature [3,4]. Despite the good results achieved with ANN, some difficulties remain in their design, such as choosing the number of neurons in the hidden layers, overfitting, the existence of local minima in the objective function and low generalization capacity, among others. Support Vector Machines (SVM) are considered very efficient tools for classification and regression. They have many advantages, such as good generalization ability and an optimization process based on a convex function with no local minima [5]. In the case of dynamical systems, almost all works that use SVM focus on NARX-type models [6], and few works report the use of SVM to develop NARMAX dynamic models. Suykens et al. [7] established the equations needed to train Least-Squares SVM (LS-SVM), a tool similar to traditional SVM but with some features that make it simpler to use, in order to develop NOE dynamic models (nonlinear, with output error), which have characteristics similar to NARMAX. However, they also noted the great difficulty of handling such equations in practical implementations. Other authors mention NARMAX-SVM modeling applications but are not clear enough about how SVM was used in the development of such dynamic models [8,9]. In [10] a nonlinear response was simulated using SVM in a two-stage implementation: in the first stage an ARMAX model is defined, and its output is then used to train the SVM to simulate the nonlinear response under study in a one-step-ahead prediction setting. No explanation is given about how the model behaves for multiple-step-ahead prediction. That is the reason why ANN are still the preferred tool for this type of model [11].
In a previous related work, Simutis et al. [12] compared ANN and SVM for forecasting cash demand for ATMs (Automatic Teller Machines) using only NARX-type models, while Ramirez and Acuña [13] also compared ANN and SVM for cash demand forecasting but using LS-SVM. In that paper, NARMAX models were used only for ANN, not for SVM.
In the present paper we propose a simple methodology for developing NARMAX-SVM models in order to conduct a comparative study with NARX equivalent models when used to forecast cash demand for ATMs. Additional comparisons with ANN performance are also studied.

II. AUTOMATIC TELLER MACHINES (ATM)
ATMs are funded and administered by financial institutions that make available to customers a simple method for conducting financial transactions in a public space with almost no human intervention. According to estimates developed by ATMIA (ATM Industry Association) the number of ATMs worldwide for 2007 exceeded 1.6 million units [12].
Some banks typically keep up to 40% more cash in their ATMs than they really need, whereas many experts consider that an appropriate cash excess is roughly 15% to 20% [12].
Costs related to keeping cash in an ATM represent between 35% and 60% of total maintenance costs [12]. By improving cash management, banks can avoid losing new business opportunities because too much of their assets is held as cash. This is why it is necessary to develop new and more advanced methods for estimating the demand for money at an ATM. With more accurate predictions and cash levels adjusted to actual flows, financial institutions can lower their operating costs.
Banks and financial services assume that the demand for cash can be associated with certain variables that can have substantial effects on its level. Some of the variables we consider are the following [14]:
• ATM location.
• Seasonal factors such as weekends, holidays, etc.
• Historical data from the ATM.
Previous works show that cash demand is a nonlinear problem; thus it is necessary to use nonlinear modeling tools.

A. Data processing
Data come from the NN5 competition [15] and correspond to a set of 30 series of daily withdrawals from ATMs located in different parts of England. All series show a strong cyclical component of 7 days, as well as some recurring seasonal periods such as summer holidays or Christmas. Almost all series contain missing values, and some series show long-term trends or irregularities such as outliers or "gaps" [16].

B. Outliers and Gaps
All series include 3 types of "gaps" or singularities:
• Observations equal to 0, indicating that no withdrawals took place due to "cash out" of the ATM.
• "Missing values", indicating that on that day the clients' transactions were not recorded.
• Outliers, that is, data above or below the normal behavior of withdrawals at the ATM.
This research addresses the three types of abnormalities by detecting outliers, missing values and values equal to 0.
To detect outliers, the boxplot method based on quartiles is used. To obtain $Q_3$, $f$ observations are counted in the ordered series of $n$ values, i.e. $Q_3 = x_{n+1-f}$ (Eq. 1) [11]. An outlier is then an observation that meets the condition of Eq. 2. Once each anomalous value is identified (870 in total), it is replaced by cubic spline interpolation, that is, by a value taken from a piecewise cubic polynomial fitted through the remaining observations. Figure 1 shows the number of outliers identified for each ATM [13]; the X axis shows the 30 ATMs and the Y axis the amount obtained for each ATM.
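The detection step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fence factor of 1.5 is the conventional boxplot value and is assumed here because the condition of Eq. 2 is not reproduced in this text, `statistics.quantiles` stands in for the depth-based quartile rule of Eq. 1, and the function name `boxplot_outliers` is hypothetical. Missing values are represented as `None`, and the spline replacement itself is not shown.

```python
import statistics


def boxplot_outliers(series, whisker=1.5):
    """Return the indices of values lying outside the boxplot fences.

    whisker=1.5 is the conventional factor (an assumption, see lead-in).
    Entries equal to None are treated as missing values and skipped.
    """
    values = [v for v in series if v is not None]
    # Quartiles of the observed values (stand-in for the rule of Eq. 1).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(series)
            if v is not None and (v < low or v > high)]
```

The returned indices mark the observations that would subsequently be replaced by interpolated values.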

A. NARX
A nonlinear autoregressive model with exogenous input (NARX) is the extension of an ARX model and is given by equation (3) [17]:

$$y(k) = f\big(y(k-1), \ldots, y(k-n),\; u(k-1), \ldots, u(k-m)\big) + e(k) \qquad (3)$$

where e(k) is the prediction error at time k, modeled as a zero-mean Gaussian white noise process with variance σ. It represents the model uncertainty and the noise of the experimental data. The predictor associated with this type of model is given by equation (4) and is outlined in Figure 2:

$$\hat{y}(k) = f\big(y(k-1), \ldots, y(k-n),\; u(k-1), \ldots, u(k-m)\big) \qquad (4)$$

where ŷ is the prediction of the autoregressive variable y from previous experimental data of itself and of the exogenous variable u at times k-1, k-2, ..., through the nonlinear function f [17].
Figure 2. Associated predictor for NARX-type models. u is the exogenous input and y the autoregressive variable. ŷ corresponds to the prediction of the autoregressive variable.
From Figure 2 it can be seen that such models can be identified with the simple series-parallel identification method. Indeed, it is enough to provide the following to the chosen approximator of the prediction function (ANN or SVM in this work).
As input:
• experimental data of the autoregressive variable from k-1 to k-n (n is related to the order of the system model).
• experimental data of the exogenous variables from k-1 to k-m.
As output:
• experimental data of the autoregressive variable at the later time k.
In the case of ANN, the required training algorithm is the well-known backpropagation. In the case of SVM, it is enough to provide adequate inputs and outputs to the optimization method used to train SVM for regression [5].
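As an illustration of the series-parallel setup described above, the following Python sketch builds the input rows and targets that would be handed to the approximator. The function name `build_narx_dataset` and the data layout (one list of exogenous values per time step) are assumptions made for this example.

```python
def build_narx_dataset(y, u, n, m):
    """Build (inputs, targets) for series-parallel NARX identification.

    y: measured outputs, one value per time step.
    u: exogenous inputs, one list of values per time step.
    n, m: number of lags used for y and for u, respectively.
    """
    start = max(n, m)  # first time index with a full set of lags
    inputs, targets = [], []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, n + 1)]             # y(k-1)..y(k-n)
        past_u = [v for j in range(1, m + 1) for v in u[k - j]]  # u(k-1)..u(k-m)
        inputs.append(past_y + past_u)
        targets.append(y[k])                                     # measured y(k)
    return inputs, targets
```

Because every regressor is a measured value, any standard regression trainer can be applied directly to these rows.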

B. NARMAX
A nonlinear autoregressive moving average model with exogenous input (NARMAX) is the extension of an ARMAX model and is given by equation (5) [17]:

$$y(k) = f\big(y(k-1), \ldots, y(k-n),\; u(k-1), \ldots, u(k-m),\; e(k-1), \ldots, e(k-p)\big) + e(k) \qquad (5)$$

where e(k) corresponds to the random variable described above for the NARX model. The additional regressors are the prediction errors from k-1 to k-p (e.g. e(k-1) = y(k-1) - ŷ(k-1)). The predictor associated with this kind of model is given by equation (6) [17] and is outlined in Figure 3:

$$\hat{y}(k) = f\big(y(k-1), \ldots, y(k-n),\; u(k-1), \ldots, u(k-m),\; e(k-1), \ldots, e(k-p)\big) \qquad (6)$$

To identify this kind of model it is necessary to have the values of previous prediction errors. The predictor therefore has to be run during training in order to obtain those errors. This makes the identification, known as the parallel method of system identification, much more complex to perform than the series-parallel identification previously presented for NARX-type models.
In the case of ANN the required training algorithm consists now of the backpropagation-through-time method [18]. However, as explained in the introduction, there are no clearly described methods to implement the NARMAX model with SVM.

C. Proposed method to implement NARMAX with SVM
In this work the following methodology is proposed:
• Divide the data set into three sets: training, validation and test.
• First identify a NARX model, as described in Section IV-A, using the training set. Obtain the one-step-ahead (OSA) prediction error of this NARX model on the validation data set.
• Use the prediction error as an additional input of a second NARX model trained on the validation set. As this second model includes the previous prediction error as input information, it becomes a NARMAX-type model.
• Evaluate the generalization capability of this NARMAX model on the test set for multiple-step-ahead predictions, also known as a Model Predictive Output (MPO) configuration. Here, the model is evaluated when predicting the cash demand for the next 100 days.
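The first three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a k-nearest-neighbour regressor (`fit_knn`) is swapped in for the SVM so that the example stays self-contained, a single lag is used for the exogenous inputs and for the error, the first n errors are set to 0, and all function names are hypothetical.

```python
def fit_knn(X, t, k=3):
    """Stand-in regressor (k nearest neighbours); NOT the paper's SVM or ANN."""
    def predict(x):
        ranked = sorted(range(len(X)),
                        key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        nearest = ranked[:k]
        return sum(t[i] for i in nearest) / len(nearest)
    return predict


def make_rows(y, u, n, e=None):
    """Rows [y(k-1)..y(k-n), u(k-1), (e(k-1))] and targets y(k)."""
    X, t = [], []
    for k in range(n, len(y)):
        row = [y[k - i] for i in range(1, n + 1)] + list(u[k - 1])
        if e is not None:
            row.append(e[k - 1])  # previous one-step-ahead error
        X.append(row)
        t.append(y[k])
    return X, t


def train_two_stage(train, valid, n, fit=fit_knn):
    """Return (narx, narmax) models following the two-stage procedure."""
    (y_tr, u_tr), (y_va, u_va) = train, valid
    # Stage 1: NARX model identified on the training set.
    narx = fit(*make_rows(y_tr, u_tr, n))
    # One-step-ahead errors of that model on the validation set
    # (assumption: the first n errors, which cannot be computed, are 0).
    X_va, t_va = make_rows(y_va, u_va, n)
    errors = [0.0] * n + [t - narx(x) for x, t in zip(X_va, t_va)]
    # Stage 2: second model with e(k-1) as extra input, trained on validation.
    narmax = fit(*make_rows(y_va, u_va, n, e=errors))
    return narx, narmax
```

Any regressor exposing the same `fit(X, t) -> predict` interface could replace `fit_knn`; the test-set evaluation in MPO configuration is a separate step.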

D. Performance Indices
Two indices were used to quantify the prediction performance of the models. The symmetric mean absolute percentage error (SMAPE) is employed in the NN5 competition as the criterion to determine the winner [15], [20]. It is an average of the absolute percent errors, where each error is computed using a denominator equal to the average of the forecast and observed values. SMAPE has an upper limit of 200% and offers a well-defined range to judge the level of accuracy. It should also be less influenced by extreme values [21]. The second index, IA, is the one maximized during training (Section V) and reported in Figure 4.
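The SMAPE description above translates into a few lines of Python. This is a sketch under two assumptions that are not stated in the text: absolute values are used in the denominator (equivalent to the plain average for non-negative cash withdrawals), and a pair in which both values are 0 contributes no error.

```python
def smape(observed, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for y, f in zip(observed, forecast):
        denom = (abs(y) + abs(f)) / 2.0  # average of observed and forecast
        # Assumption: when both values are 0 the pair adds no error.
        terms.append(0.0 if denom == 0 else abs(y - f) / denom)
    return 100.0 * sum(terms) / len(terms)
```

A perfect forecast gives 0, while forecasting 0 for a nonzero observation gives the upper limit of 200.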

V. SIMULATIONS

A. ANN training
A multilayer perceptron (MLP) ANN was used, with hyperbolic tangent activation functions and the Levenberg-Marquardt method to minimize the mean-square error. The NNSYSID Toolbox for Matlab [22] was used to train the NARX and NARMAX models with ANN. To determine the number of autoregressors, a Lipschitz function was applied [22], finding that 4 delays are needed for the output variable (cash demand, y). One delay was considered for the exogenous variables, and one delay was also chosen for the error in NARMAX models. Data were normalized to the range [0, 1], and 150 different trainings were performed for each model, each starting from random weights, in order to avoid local minima. The training process was based on IA maximization for MPO predictions on the validation set.
Several configurations were tested, including different numbers of neurons in the hidden layer. The following variables were considered as exogenous inputs: the day of the month (u1), day of the week (u2), week (u3), month (u4) and a dummy variable (u5) to indicate special dates such as month-end, holidays and other calendar effects of interest. The final architectures for the ANN NARX and NARMAX models are summarized in Table I.
TABLE I. ANN ARCHITECTURES FOR NARX AND NARMAX MODELS

B. SVM training
The library of tools developed at INSA de Rouen, France [23] was used to implement the SVM models. As mentioned above, the same training, validation and test sets were used for both ANN and SVM models. In this case, model training consisted of adjusting the parameters C, sigma and ε. In addition, λ, a parameter of the quadratic programming optimization method of SVM, was also tuned. The same inputs as those used for ANN were considered to train and validate the SVM models. The training process was based on IA maximization for MPO predictions on the validation set. SVM models were developed with a radial basis function kernel.
For NARX, the parameter C was increased by powers of 2 [24]. The exponents took values from -5 to 15 with a step of 1, and a more detailed search was then carried out near the best value of C for each ATM, decreasing the step down to 0.1. The same method was used to find the values of sigma and λ, ranging from -5 to 5 for sigma and from -32 to -7 for λ. For sigma the step decreased in the same way as for C, from 1 to 0.1, while for λ it decreased from 4 to 0.25. Parameter ε varied between 0 and 0.2, with an initial step of 0.05 and a final step of 0.01.
For NARMAX, the methodology described in Section IV-C was applied. The previously trained NARX model was used to obtain the OSA prediction error. This error was introduced as an additional input of a second NARX model. For this second model, the SVM was trained by searching for C, sigma and λ in the neighborhood of the values obtained for the first NARX model, in a range of ±1.5 with a step of 0.1. ε ranged from 0 to 0.2.
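The coarse-to-fine search over powers of 2 can be sketched for a single parameter as follows. This is an illustrative simplification, not the authors' procedure: `score` is a hypothetical callable standing for the validation criterion (IA of the MPO predictions in the paper), and narrowing the window to one previous step on each side of the best exponent is an assumption.

```python
def coarse_to_fine(score, low, high, steps):
    """Maximise score(2 ** exponent) over a grid of exponents.

    steps: decreasing step sizes, e.g. [1, 0.1]. After each pass the
    window is narrowed to one step on each side of the best exponent
    (an assumption; the refined window may extend past the initial range).
    """
    best = None
    for step in steps:
        count = int(round((high - low) / step))
        grid = [low + i * step for i in range(count + 1)]
        best = max(grid, key=lambda e: score(2.0 ** e))
        low, high = best - step, best + step
    return best
```

For example, `coarse_to_fine(score, -5, 15, [1, 0.1])` mirrors the exponent range and steps reported for C. Tuning several parameters jointly would require nesting or alternating such searches.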

VI. RESULTS
Four models were tested as MPO predictors of cash demand for ATMs with a horizon of 100 days. The models are of NARX and NARMAX type, each implemented with both ANN and SVM.
The MPO predictor for NARX models is described in Eq. 9, while the MPO predictor for NARMAX models is described in Eq. 10. For MPO prediction the error variable is fixed to 0 because future values of the error cannot be known a priori [25].
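The MPO configuration can be illustrated with a short Python sketch in which each prediction is fed back as a regressor for the next step. It is a sketch under assumptions, not a transcription of Eqs. 9 and 10: one lag is used for the exogenous inputs and for the error, and the names `mpo_forecast` and `predict` are hypothetical.

```python
def mpo_forecast(predict, history, future_u, n, use_error=False):
    """Multiple-step-ahead (MPO) forecast by feeding predictions back.

    predict: one-step model taking [y(k-1)..y(k-n), u(k-1), (e(k-1))].
    history: measured outputs, oldest first (at least n values).
    future_u: one list of exogenous values per forecast step.
    use_error: True for a NARMAX model; its error input is fixed to 0.
    """
    window = list(history[-n:])
    forecasts = []
    for u_row in future_u:
        row = window[::-1] + list(u_row)  # most recent output first
        if use_error:
            row.append(0.0)               # future errors are unknown
        y_hat = predict(row)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]     # prediction replaces measurement
    return forecasts
```

With a 100-element `future_u`, this produces the 100-day horizon used in the evaluation; no measured output enters the recursion after the initial window.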
Figure 4 compares the IA indices of the NARX and NARMAX ANN models for all 30 ATMs. Solid and dashed lines are only used for better visualization of the results.
Unexpectedly, for this application, the predictive capability of NARMAX models did not outperform that of NARX models. SVM also showed no superiority over ANN. Moreover, the best results were exhibited by NARX ANN models, which is even more remarkable considering the SMAPE index, and ANN were simpler to train (finding the weights required significantly lower computational effort than calibrating the SVM parameters). It therefore seems advisable to choose simpler models, such as NARX, and a user-friendly tool like ANN, at least for this particular application.
However, it is important to underline that the proposed method to implement NARMAX with SVM was easy to implement and allowed the SVM to predict in an MPO configuration. We therefore consider the results of this work encouraging for further study of this method and its application to other model identification problems.