Analysis of hybrid non-linear autoregressive neural network and local smoothing technique for bandwidth slice forecast

The demand for high steady state network traffic utilization is growing exponentially. Therefore, traffic forecasting has become essential for powering resource-hungry applications and services such as the internet of things (IoT) and big data in 5G networks, enabling better resource planning, allocation, and optimization. Forecasting accuracy has become crucial for fundamental network operations such as routing management and congestion management, and for guaranteeing overall quality of service. In this paper, a hybrid network forecast model was analyzed; the model combines a non-linear autoregressive neural network (NARNN) with various smoothing techniques, namely, local regression (LOESS), moving average, locally weighted scatterplot smoothing (LOWESS), the Savitzky-Golay (sgolay) filter, robust LOESS (RLOESS), and robust locally weighted scatterplot smoothing (RLOWESS). The effects of applying smoothing techniques with varied smoothing windows are shown, and the performance of the hybrid NARNN and smoothing techniques is discussed. The results show that the hybrid model can effectively enhance forecasting accuracy with the assistance of the smoothing techniques, which minimized data losses. In this work, root mean square error (RMSE) is used as the performance measure, and the results were verified via statistical significance tests.


INTRODUCTION
Nowadays, networks are deployed at vast scale across various domains, alongside emerging technologies and application-centric services. The capability to forecast network traffic has become a crucial element of network design and a main requirement of various operations because of its benefits in sub-domains such as network security, dynamic slice re-allocation, and resource planning. Network traffic forecasting takes a proactive approach instead of a reactive one: network resources are monitored to ensure that all service requirements are met, in addition to quality of service (QoS) and security. Traffic analysis can also be a crucial stage in building successful preventive congestion controls.
Generally, forecast and prediction are used interchangeably. Nevertheless, forecast can be explicitly defined as the estimation of future values based on an analytical model built from past observations. In this on an artificial bee colony (ABC) algorithm that employed particle swarm optimization (PSO), an evolutionary search algorithm. Moreover, a (5;11;1) MLP-NN was used as the training algorithm. The results showed that the proposed model had a higher prediction accuracy than BP.
Li et al. [6] used a feed forward neural network to predict incoming and outgoing traffic flows. The study argued that inter-data center links are dominated by elephant flows. The study used gradient descent and a wavelet transform to train a hybrid model. SNMP counters and total incoming and outgoing data traffic were gathered in 30-second intervals and used as the dataset. The data were collected from data center (DC) routers over a period of six weeks. The time series was decomposed using a level-10 wavelet transform. However, it must be noted that the wavelet transform can aggressively eliminate parts of the original data if not implemented carefully.
Dyllon et al. [7] developed a nonlinear autoregressive exogenous (NARX) neural network model for time series network traffic analysis. The study implemented a neural network model to predict the future trends of the London South Bank University (LSBU) bandwidth data traffic. The dataset was collected using the Paessler Router Traffic Grapher (PRTG) tool. The results showed that the NARX neural network is a good method for predicting time series data.
Yoo and Sim [8] proposed a forecast model and claimed it could improve resource utilization efficiency in high-bandwidth networks to accommodate the rise in data volume demands for scientific data applications. A seasonal decomposition of time series by LOESS (STL) and ARIMA were applied to SNMP data. The results showed that the proposed forecast model was resilient against abrupt changes in network usage. The multistep forecast was tested as well.
Afolabi et al. [9] discussed the significance of the interference-less machine learning approach in a time series forecast as a crucial component of prediction performance, especially when forecasting many steps ahead of the currently available data. The authors used Hilbert Huang transformation (HHT) as the noise elimination technique. The simulation results were compared with conventional and state-of-the-art approaches.
Joo et al. [10] proposed a prediction method based on wavelet filtering. The proposed framework analyzed the time series in both the time and frequency domains. The proposed approach was applied to various scenarios. The results showed that the proposed method outperformed other approaches that did not use wavelet-filtering techniques. B. Doucoure et al. [11] introduced a prediction method for renewable energy sources to intelligently manage renewable energy. The authors used wavelet decomposition and artificial neural networks and discussed the significance of their results.
Alawe et al. [12] proposed a novel mechanism to scale 5G core network resources by forecasting traffic via ML techniques. The prediction techniques used were based on recurrent neural networks (RNN), long short-term memory (LSTM), artificial neural networks (ANN), and deep neural networks (DNN). Comparisons were made between the different techniques. The simulation results confirmed the higher efficiency of the RNN-based solution compared to the other approaches. No preprocessing or feature extraction was performed.
Wang et al. [13] proposed a wavelet-based neural network model, called the multilevel wavelet decomposition network (mWDN). The proposed model used the wavelet decomposition in frequency learning while enabling the fine-tuning of all parameters under a deep neural network framework. The results showed the effectiveness of the proposed hybrid approach. The wavelet decomposition required several parameters that could affect the forecast performance such as the number of decomposition levels and the selected mother wavelet.
Salih [14] introduced LAN office network bandwidth prediction models as time series models. The proposed forecast models were tested using mean square error (MSE) and performance evaluation plots. However, the study did not use any preprocessing techniques. J. Feng et al. [15] proposed a deep traffic predictor (DeepTP) model to forecast long-period cellular network traffic. The study showed that the model outperformed other traffic forecast models by more than 12.3%. However, LSTM is not suitable for long-period forecasting (multi-steps ahead).
Le et al. [16] proposed a traffic forecasting model using autoregressive models and neural network models to predict network key performance indicators (KPIs) for long-term and short-term forecasting on real data. However, no preprocessing was applied and the study only focused on investigating relationships between network KPIs.
You et al. [17] proposed a hybrid LOESS-ARIMA-based forecast model. Authors claimed that such a model has the potential of enhancing the efficiency of resource utilization, especially in high-speed networks, to accommodate the rapid increase in rising demands for scientific data applications. A seasonal decomposition of time series by LOESS (STL) and (ARIMA) was applied on simple network management protocol (SNMP). The results revealed that the proposed forecast model was resilient against abrupt changes in network usage provided that the multistep forecast was used as the primary scenario.

RESEARCH METHOD
The ML approaches were modeled as time series batch learning. The general process of network bandwidth forecasting is based on machine learning. This process was extended in this study by preprocessing the provided dataset, namely by eliminating unnecessary noise and rapid traffic fluctuations. Moreover, to avoid the erosion of periodic trends and patterns within the series, the system learns local and global trends separately to detect and eliminate short-term or long-term noise. Similar approaches have been used in the past [7]-[11], with techniques including the Hilbert-Huang transformation (HHT), STL, and the wavelet-based approach. However, these techniques are often used to detect high noise levels in the long term and may not be suitable for online or semi-online processes. The current study instead proposes a hybrid approach using a nonlinear autoregressive neural network that focuses mainly on local variations, using various local regression techniques to remove unnecessary noise and fluctuations that may have negative effects on the prediction accuracy, especially in nonlinear and non-stationary time series. Local regression approaches remove noise and fluctuations at short scales and react more dynamically to short-term variations in the noise level than wavelet- and HHT-based techniques. A similar approach was utilized in one study [8], which used ARIMA instead of NAR. The effectiveness of the proposed method was verified using available real network traffic datasets.

Neural network auto-regressive (NNAR)
Neural network training attempts to approximate a function by optimizing the network weights and neuron biases. The NAR model in (1) predicts the current value of the series from its own past values:
$$y(t) = f\big(y(t-1), y(t-2), y(t-3), \ldots\big) + \varepsilon(t) \quad (1)$$
In (1), the term $\varepsilon$ stands for the error. The input features $y(t-1)$, $y(t-2)$, $y(t-3)$ (the bandwidth slice in this case) are the feedback delays. Trial-and-error was used to optimize the hidden layers and neurons to achieve the best performance. However, the system becomes more complex as the number of neurons increases, whereas too few neurons may reduce network efficiency. Levenberg-Marquardt is the most widely used learning rule due to its fast response [9]. The root mean squared error (RMSE), mean squared error (MSE), and error sum of squares (SSE) in (2), (3), and (4) are often used as the performance metrics:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (2)$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (3)$$
$$\mathrm{SSE} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (4)$$
where $\hat{y}_i$ is the predicted data, $y_i$ is the current data, and $n$ is the number of data samples [9]. In this research, gradient descent was used as the learning rule. NARNN was chosen because LSTM and deep learning approaches require a complicated and careful design to produce accurate forecasts. In addition, these techniques work better with high-dimensional and large datasets. Therefore, NARNN was selected in this research as the forecasting technique.
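The error measures in (2) to (4) are simple to compute directly. The following is a minimal Python sketch using their standard definitions; the function names `sse`, `mse`, and `rmse` are illustrative and do not come from the original study.

```python
import math

def sse(actual, predicted):
    """Error sum of squares: sum of squared differences between pairs."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def mse(actual, predicted):
    """Mean squared error: SSE divided by the number of samples."""
    return sse(actual, predicted) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))
```

Because RMSE is expressed in the same units as the series, it is the easiest of the three to compare against the magnitude of the bandwidth values themselves.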
The collected data was divided into training data and testing data. The training stage was used to test the model fit. Then, the time series forecasting model was established using the trained model. The performance was measured accordingly and then compared with actual values.
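The data preparation and multi-step forecasting described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and a linear least-squares predictor is swapped in as a stand-in for the trained neural network so that the example stays self-contained. It shows how lagged inputs $y(t-1), \ldots, y(t-d)$ are built and how one-step predictions are fed back to forecast several steps ahead.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Return (X, y): row t of X holds y(t-1), ..., y(t-n_lags)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack([series[n_lags - k: n - k] for k in range(1, n_lags + 1)])
    y = series[n_lags:]
    return X, y

def fit_linear_stand_in(X, y):
    """Stand-in for the network (assumption): linear least squares with bias."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda lags: float(np.append(lags, 1.0) @ coef)

def recursive_forecast(predict_one, history, n_lags, steps):
    """Forecast `steps` values ahead by feeding predictions back as inputs."""
    window = list(np.asarray(history, dtype=float)[-n_lags:])  # oldest first
    forecasts = []
    for _ in range(steps):
        lags = np.array(window[::-1])  # newest first, i.e. y(t-1), y(t-2), ...
        nxt = predict_one(lags)
        forecasts.append(nxt)
        window = window[1:] + [nxt]
    return forecasts
```

In a multi-step forecast, errors made early in the horizon are fed back as inputs, which is one reason noise in the series matters more for long horizons than for one-step prediction.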

Local smoothing techniques
As discussed in section 1, the persistence of noise in a time series can continuously and cumulatively impair forecasting performance in n-steps-ahead forecasts. This issue has to be tackled carefully when working with forecasting algorithms by minimizing the effects of high- or low-frequency noise within the data, which can be useful for forecasting on short- or long-term scales. The significance of noise processing or removal was addressed in past work [7]-[11]. The next sections discuss the various local smoothing techniques used in this paper.

Local regression techniques
The local regression method is based on the LOESS method [18]. It fits simple models to localized subsets of the data to form a curve that approximates the original data. The observations $(x_i, y_i)$ are assigned neighborhood weights using the tricube weight function shown in (6). Let $\Delta_i(x) = |x_i - x|$ be the distance from $x$ to $x_i$, and let $\Delta_{(i)}(x)$ be these distances ordered from smallest to largest. Then, the neighborhood weight for the observation $(x_i, y_i)$ is defined by the function $w_i(x)$:
$$w_i(x) = T\!\left(\frac{\Delta_i(x)}{\Delta_{(q)}(x)}\right), \qquad T(u) = \begin{cases} \left(1 - |u|^3\right)^3, & |u| < 1 \\ 0, & \text{otherwise} \end{cases} \quad (6)$$
for $x_i$ such that $\Delta_i(x) < \Delta_{(q)}(x)$, where $q$ is the bandwidth that defines the number of observations in the subset of data localized around $x$. In the proposed algorithm, this approach was applied to fit a trend to the last $k$ observations of resource utilization. Accordingly, a new trend line $\hat{g}(x) = \hat{a} + \hat{b}x$ is found for each new observation. This trend line is used to estimate the next observation $\hat{g}(x_{k+1})$. The new observation can be in the form of host resource utilization such as bandwidth slice utilization [18]. Equation (7) shows the final forecast formula using the hybrid LOESS and NARNN:
$$y_t = \alpha_0 + \sum_{j=1}^{h} \alpha_j \, g\!\left(\beta_{0j} + \sum_{i=1}^{n} \beta_{ij}\, y_{t-i}\right) + \varepsilon_t \quad (7)$$
where $n$ is the number of entries, $h$ is the number of hidden layers with activation function $g$, $\beta_{ij}$ is the parameter corresponding to the weight of the connection between the input unit $i$ and the hidden unit $j$, $\alpha_j$ is the weight of the connection between the hidden unit $j$ and the output unit, and $\beta_{0j}$ and $\alpha_0$ are the constants that correspond, respectively, to the hidden unit $j$ and the output unit. Two forms were used: LOWESS, which uses a first-degree polynomial model with weighted linear least squares, and LOESS, which uses a second-degree polynomial model [18].
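The tricube-weighted local fit described above can be sketched in a few lines of Python. This is a minimal illustration of the first-degree (LOWESS-style) case only, assuming the standard tricube form $(1 - |u|^3)^3$ on $|u| < 1$; the function names are hypothetical and it is not the implementation used in the study.

```python
import numpy as np

def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)

def local_linear_fit(x, y, x0, q):
    """Evaluate a tricube-weighted straight-line fit at x0.

    q is the number of nearest observations that set the bandwidth.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.abs(x - x0)
    bandwidth = np.sort(dist)[q - 1]  # q-th smallest distance
    if bandwidth == 0.0:
        raise ValueError("q is too small: all q nearest points coincide with x0")
    w = tricube(dist / bandwidth)
    sw = np.sqrt(w)  # weighted least squares via row scaling
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    a_hat, b_hat = coef
    return a_hat + b_hat * x0  # trend line g(x) = a + b*x evaluated at x0
```

Evaluating the fit at a point just beyond the last observation corresponds to using the trend line to estimate the next observation. Points at or beyond the q-th smallest distance receive zero weight, so q must be large enough to leave at least two points with non-zero weight.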

Robust local regression
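This study adopted the robust variant of the local regression (LR) method, in which the first fit is carried out with weights defined using the tricube weight function. The fit is evaluated at the $x_i$ to get the fitted values $\hat{y}_i$ and the residuals $\hat{\varepsilon}_i = y_i - \hat{y}_i$. At each observation $(x_i, y_i)$, an additional robustness weight $r_i$ is calculated, whose value depends on the magnitude of $\hat{\varepsilon}_i$. Accordingly, a new weight $r_i w_i(x)$ is assigned to each observation, where $r_i$ is defined as in (8) [18]:
$$r_i = B\!\left(\frac{\hat{\varepsilon}_i}{6s}\right), \qquad B(u) = \begin{cases} \left(1 - u^2\right)^2, & |u| < 1 \\ 0, & \text{otherwise} \end{cases} \quad (8)$$
where $B(u)$ is the bisquare weight function and $s$ is the median absolute deviation (MAD), defined per (9):
$$s = \operatorname{median}\,|\hat{\varepsilon}_i| \quad (9)$$
Similarly, two versions were examined, i.e., 'RLOWESS' and 'RLOESS'. In both forms, lower weights are assigned to the outliers in the regression. Moreover, observations whose residuals lie outside six median absolute deviations are assigned zero weight.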
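The robustness step can be illustrated with a short Python sketch. It assumes the usual bisquare form for the robustness weights, with the scale taken as the median of the absolute residuals and a cutoff at six times that scale; the function names are illustrative only.

```python
import numpy as np

def bisquare(u):
    """Bisquare weight: (1 - u^2)^2 for |u| < 1, and 0 otherwise."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1.0, (1.0 - u ** 2) ** 2, 0.0)

def robustness_weights(residuals):
    """Down-weight large residuals; zero weight beyond six MADs."""
    r = np.asarray(residuals, dtype=float)
    s = np.median(np.abs(r))  # median absolute deviation of the residuals
    if s == 0.0:
        return np.ones_like(r)  # perfect fit: nothing to down-weight
    return bisquare(r / (6.0 * s))
```

In a full robust fit, these weights would multiply the tricube neighborhood weights and the local regression would be repeated, so that isolated traffic spikes pull the fitted curve less than they do in the non-robust variants.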

Moving average
In several domains, time series data is usually smoothed using moving averages (MAs). This method is used especially in trend forecasting. The moving average is considered a type of real-time filter that removes high frequencies from data. In signal processing, MAs are therefore also called "low-pass filters" [19], where the filter coefficients are equal to the reciprocal of the span or bandwidth. A related technique is exponential smoothing, which weights recent observations more heavily. Let $x_i$ be the throughput at time $i$, and let $X = \{x_i\},\ i = 1, \ldots, p$ be the time series, where $p$ is the time series length. The moving average of the period $q$ at time $i$ can then be calculated as per (10) [19]. In (10) and (11)
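The low-pass view of the moving average can be made concrete with a short Python sketch, in which every filter coefficient equals the reciprocal of the span. The function name is illustrative, and the window alignment (one average per complete window) is an assumption rather than the exact form of (10).

```python
import numpy as np

def moving_average(x, q):
    """Average every run of q consecutive samples (equal weights 1/q)."""
    coeffs = np.full(q, 1.0 / q)  # FIR coefficients: reciprocal of the span
    return np.convolve(np.asarray(x, dtype=float), coeffs, mode="valid")
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` averages the windows (1, 2, 3), (2, 3, 4), and (3, 4, 5). A larger span averages over more samples and therefore removes more of the rapid fluctuations.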

Savitzky-Golay smoothing filter
The Savitzky-Golay (SG) smoothing filter is considered a type of low-pass filter characterized by two parameters denoted as K and M. The SG filter can be defined as a weighted moving average, i.e., a finite impulse response (FIR) filter. Filter coefficients are calculated using an unweighted linear least-squares regression and a polynomial model of a specified degree (the default is 2). The time series to be estimated is denoted by x(n), and the final output is obtained using (12) and (13). Note that a higher-degree polynomial makes it possible to achieve a high level of smoothing without attenuating the data features [20].
It is worth mentioning that LOESS is also used for seasonal decomposition, but in this work the focus was on using LOESS and other local regression techniques as smoothing techniques, since decomposition may aggressively remove some of the important dataset features.
The question then becomes how to select the bandwidth q. The bandwidth plays a critical role in the overall local regression fit: if the selected bandwidth is very small, insufficient data will fall within the smoothing window, which results in large variances and a noisy fit. On the other hand, if it is very large, not all data will be fitted within the specified window. Ideally, a separate bandwidth is used for each fitting point, bearing in mind features such as the local density. In practice, it is difficult to select an optimum q value, as the researcher does not want to unintentionally eliminate data. The simplest approach is to select q as a constant for all x. This choice could be satisfactory for simple constant-variance data, but when the independent variables have a non-uniform distribution, as in the bandwidth slice, problems such as empty neighborhoods and the accidental removal of data could result. Therefore, the following approach shown in Algorithm 1 was proposed:

Algorithm 1
Input: y, the time series of bandwidth utilization
Output: MSE and ŷ, the locally fitted (predicted) values obtained using local smoothing techniques
1- Initialize: set q = 0
2- Perform local smoothing using the selected q
3- Set q = q + 0.001
4- Calculate the average MSE for all q-values
5- If MSE = 0, go to 2; else stop
6- Set q ← q
7- Return ŷ
Figure 1 shows the effects of different q-values and their corresponding differences from the original bandwidth utilization. As the q-value increases, the curve becomes smoother, but the difference (error) from the original series increases, which in turn increases the overall mean squared error (MSE). In this paper, NNAR(p,k) is used to indicate p lagged inputs and k nodes in the hidden layer. The general approach to searching for the optimal structure of the NNAR model is trial-and-error: numerous networks with varying numbers of inputs and hidden units are tested, and the generalization error of each is calculated to find the structure with the lowest generalization error [21], [22]. The crucial part of NNAR modeling is to find appropriate values for the p lagged inputs and the k hidden nodes. In this work, Akaike's information criterion (AIC) [21]-[23] was used to automate the parameter selection process using the R programming language. This method is asymptotically equivalent to cross-validation [23]. The model with the p and k values that gave the lowest AIC was then chosen.
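One possible reading of Algorithm 1 is a search for the smallest bandwidth q, in steps of 0.001, at which smoothing begins to alter the series (that is, the first q with a non-zero MSE). The Python sketch below follows that reading. It is an assumption-laden illustration: a centered moving average is used as the smoother, q is interpreted as a fraction of the series length that is rounded to an odd span, and all function names are hypothetical.

```python
import numpy as np

def span_from_fraction(q, n):
    """Convert a fractional bandwidth q into an odd window length (assumption)."""
    span = int(round(q * n))
    if span % 2 == 0:
        span -= 1  # keep the window symmetric around each point
    return max(span, 1)

def centered_moving_average(y, span):
    """Centered moving average; the window shrinks near the edges."""
    y = np.asarray(y, dtype=float)
    half = span // 2
    out = np.empty_like(y)
    for i in range(len(y)):
        k = min(half, i, len(y) - 1 - i)
        out[i] = y[i - k: i + k + 1].mean()
    return out

def find_min_q(y, step=0.001, max_q=1.0):
    """Return (q, smoothed, mse) for the first q whose smoothing changes y."""
    y = np.asarray(y, dtype=float)
    k = 0
    while True:
        k += 1
        q = k * step  # multiply instead of accumulating to limit float drift
        if q > max_q:
            raise ValueError("no bandwidth up to max_q changed the series")
        smoothed = centered_moving_average(y, span_from_fraction(q, len(y)))
        err = float(np.mean((y - smoothed) ** 2))
        if err > 0.0:
            return q, smoothed, err
```

Stopping at the first non-zero MSE keeps the smoothed series as close as possible to the original, which matches the stated goal of limiting data loss. Other smoothers (LOESS, LOWESS, or the Savitzky-Golay filter) could be substituted for the moving average in the same loop.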
Two scenarios were examined in this paper: the short-term forecast, which shows how each hybrid technique performs on the short-term scale, and the long-term forecast, which shows the forecast performance on a long-term scale. Each time step represents 28.8 minutes, so every 50 time steps represent one day. This resolution is due to the limitations of the data collection tool. The values were then interpolated, resulting in a time series model. The multi-scale forecast was used to investigate the extent to which the hybrid techniques perform better over various forecast windows. The findings will prove beneficial for real-world core and backbone networks in achieving efficient network resource planning. In this paper, to enhance the time series forecast models, the Box and Cox [24] power transformation was used to normalize the series variances. Moreover, the augmented Dickey-Fuller (ADF) test [24] was used to confirm the stationarity of the time series, although NARNN can be used to model a nonstationary time series. Previous work had advised examining the stationarity of regression models.
Figure 1 (a) shows the LTE bandwidth utilization without smoothing, Figure 1 (b) shows the effects of applying moving average smoothing using q=0.002, and Figure 1 (c) shows the effects of applying moving average smoothing using q=0.003.
As shown in Figure 1 (a), the bandwidth slice exhibited significant seasonal patterns with daily peaks. Nevertheless, the data also shows a stochastic pattern between successive points, with continuous irregular fluctuations. On the other hand, no long-term trend appeared to exist. The minimum smoothing bandwidth (q) was selected as introduced in section 2 in Algorithm 1. From Figure 1 (b), it is noticeable that the effects of applying smoothing techniques can be difficult to observe with the naked eye.
Therefore, the MSE was calculated for each technique, as depicted in Table 1, which shows the effect of applying various smoothing techniques on the selected dataset. Figure 1 (b) shows the LTE slice bandwidth utilization smoothed with the moving average (MA) and smoothing window q=0.003, which removes more of the small fluctuations at the top peaks, thus producing the highest MSE of all the techniques. In this case, (q) has a direct influence on the smoothing performance, since the MSE grows as (q) increases. Therefore, a significant portion of the data could be removed if higher (q) values were used. In fact, the higher the (q) value, the stronger the smoothing and the larger the amount of data lost, as depicted in Figure 1 (c). Consequently, in today's data-centric world, losing even small amounts of data could lead to the violation of service level agreements in addition to inefficient resource utilization and planning. Therefore, (q) has to be selected according to Algorithm 1. LOWESS produced the second largest MSE, as shown in Table 1, most likely because the nonlinear bandwidth slice is fitted less well by a first-degree (linear) polynomial model. However, fitting using the quadratic polynomial based on LOESS produced a smaller MSE, as shown in Table 1, due to the nonlinearity of the second-order local fitting models, as shown in Figure 1 (a). On the other hand, the sgolay filter produced a smaller MSE using a second-degree polynomial, in contrast to LOESS, which used a second-degree polynomial in which the weights were strongly influenced by the q bandwidth, as shown in (6). Finally, RLOESS and RLOWESS shared a similar performance, yielding the lowest MSE values, as shown in Table 1. Based on the AIC calculated automatically from the (autoarima) function in R, it was found that NARNN (28,14) produced the best fit.
Table 2 shows the comparisons and the final results of applying the hybrid NARNN and smoothing techniques to the LTE bandwidth slice forecast for the short horizon of 50 time steps ahead and for the long horizon of 350 time steps ahead. Table 2 also shows the RMSE for NARNN with each smoothing technique for 50 time steps and 350 time steps. Overall, the hybrid NARNN tended to perform better, with better RMSE and a higher smoothing MSE.

RESULTS AND DISCUSSION
It is worth noting that the RMSE values when applying NARNN alone, without any combined technique, were 308 for the 50-time step forecast and 323 for the 350-time step forecast. From Table 2, it is clear that the combination of LOESS and NARNN yielded the best performance, followed by the moving average and NARNN. The Diebold-Mariano test [21]-[23] was then applied to check for statistical significance. The RMSE of NARNN with LOESS was found to be statistically different from that of the other hybrid techniques. The same finding was obtained for the 350-time step forecast. Therefore, NARNN with LOESS yielded better performance, and this was verified statistically via the Diebold-Mariano test. This result confirms the effectiveness and the reliability of the hybrid NARNN and smoothing techniques for forecasting on short- and long-term scales. The autocorrelation function (ACF) obtained using the Ljung-Box test was used for further analysis. The analysis of the ACF was used to calculate the number of inputs of auto-correlated vectors to create an appropriate model. Moreover, it was also used to investigate white noise (zero mean, constant variance, uncorrelated, and normally distributed) in the residuals. Figure 2 (a) shows the ACF and the plots of the residuals of the hybrid NARNN smoothing forecast models for the 50-time step forecast. In the case of the 50-time step forecast using the hybrid NARNN with LOESS, the residuals fell randomly within the horizontal band (between 4e7 and -4e7), and as a result the variance of the residuals appeared to be independent of the size of the fitted values. The same results were found for the 350-time step forecast with the hybrid NARNN. This pattern suggests that the variances of the error terms are equal. Moreover, no single residual stood out from the random pattern, suggesting that there were no outliers. The lags in the ACF plots fell below the 0.08 threshold, and no pattern was evident in the residuals.
Additionally, the residuals followed a random distribution around zero. This result confirms and validates that NARNN with LOESS provided the best forecasting model among those compared. Figure 3 (a) shows the performance comparison of the hybrid NARNN versus the hybrid seasonal autoregressive integrated moving average (SARIMA) for the 50-time step forecast; the hybrid SARIMA was used as a benchmark to validate the obtained results. The results show that, overall, the NARNN hybrid techniques outperform the non-hybrid techniques. NARNN-LOESS had the lowest RMSE, below all SARIMA hybrid techniques, although SARIMA-original slightly outperformed NARNN when used without local smoothing techniques. Therefore, NARNN-LOESS is our best choice, since our objective is to provide the best forecast performance with minimum data loss, as discussed earlier. The same findings were obtained for the 350-time step forecast, as depicted in Figure 3. In this case, NARNN-LOESS had the lowest RMSE value compared to the SARIMA hybrid techniques. The SARIMA hybrids showed a noticeable performance improvement compared to the hybrid NARNN, except for the case of the hybrid NARNN-LOESS, which barely outperformed SARIMA-LOESS. The Diebold-Mariano test was then applied to check the statistical significance of the obtained results. The RMSE of the NARNN-LOESS hybrid technique was found to be better than, and statistically different from, that of SARIMA forecasting, which confirms the superiority of the NARNN hybrid technique. Figure 4 (a) depicts the 50-time step ahead forecast for the NARNN with LOESS and Figure 4 (b) the 350-time step forecast; both figures show that the forecasts lie within the prediction intervals.
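The idea behind the Diebold-Mariano comparison can be sketched in Python as follows. This is a simplified illustration, not the procedure used in the study: it assumes squared-error loss, treats the loss differences as serially uncorrelated (the one-step-ahead case), and uses the normal approximation for the p-value. For the 50- and 350-step forecasts, a fuller implementation would also include autocovariance terms of the loss differential. The function name is hypothetical.

```python
import math
import numpy as np

def diebold_mariano(errors_a, errors_b):
    """Simplified DM test; returns (statistic, two-sided p-value).

    errors_a and errors_b are forecast errors of two models on the same
    test points. A negative statistic favours model A (smaller loss).
    """
    d = np.asarray(errors_a, dtype=float) ** 2 - np.asarray(errors_b, dtype=float) ** 2
    n = len(d)
    var_d = d.var()  # lag-0 autocovariance of the loss differential
    if var_d == 0.0:
        raise ValueError("loss differential is constant; statistic undefined")
    stat = d.mean() / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|stat|))
    return stat, p_value
```

A small p-value indicates that the difference in squared-error loss between the two forecasts is unlikely to be due to chance, which is the sense in which two RMSE values are called statistically different.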

CONCLUSION
In this paper, hybrid local smoothing and neural network auto-regressive (NNAR) modeling approaches were used to forecast LTE core bandwidth slice utilization. Several local smoothing techniques were analyzed, and a local smoothing mechanism was introduced to minimize the effects of data losses, which may carry necessary information resulting from aggressive and uncontrolled smoothing functions. The models showed better forecast performance in terms of RMSE, provided minimum data losses were maintained. Long-term and short-term step forecasts were examined and the results were verified using residual analysis,