dmTP: A Deep Meta-Learning Based Framework for Mobile Traffic Prediction



Abstract
Deep learning technologies have been widely exploited to predict mobile traffic. However, individually training deep learning models for various traffic prediction tasks is not only time consuming but also sometimes unrealistic due to limited traffic records. In this article, we propose a novel deep meta-learning based mobile traffic prediction framework, namely, dmTP, which can adaptively learn to learn the proper prediction model for each distinct prediction task from accumulated meta-knowledge of previously learned prediction tasks. In dmTP, we regard each mobile traffic prediction task as a base-task and adopt an LSTM network with a fixed structure as the base-learner for each base-task. In order to improve the base-learner's prediction accuracy and learning efficiency, we further employ an MLP as the meta-learner to find the optimal hyper-parameter value and initial training status for the base-learner of a new base-task according to its meta-features. Extensive experiments with real-world datasets demonstrate that while guaranteeing a similar or even better prediction accuracy, meta-learning in the proposed dmTP reduces the numbers of epochs and base-samples needed to train the base-learners by around 75 percent and 81 percent, respectively, as compared with the existing prediction models.

Introduction
In order to support the rapidly growing mobile services [1][2], key technical challenges in mobile traffic analysis, including mobile traffic prediction, anomaly detection, attack classification, website fingerprinting, and mobile traffic identification, have been widely studied in recent years [3]. Among these challenges, the accurate prediction of mobile traffic is a critical enabler of advanced management and optimization of network resources.
Mobile traffic prediction, which estimates future traffic loads based on historical traffic records, has been widely considered a time series forecasting problem. The existing prediction models or algorithms can be generally classified into two categories: statistical methods and machine learning based methods [4].
In statistical methods, explicit statistical models with certain parameters are used to fit the mobile traffic patterns for traffic forecasting. In [5] and [6], the linear autoregressive integrated moving average (ARIMA) model and the seasonal ARIMA model, respectively, were adopted to capture short-term or long-term correlations in network traffic. Li et al. [7] demonstrated that the traffic loads generated by three typical mobile services follow heavy-tailed distributions and then proposed to utilize the α-stable model to predict their load fluctuations. However, since realistic mobile traffic tends to show complex irregular patterns, it is difficult for statistical models to predict it accurately.
Unlike the statistical methods, machine learning based methods package the statistical models used for prediction into opaque or semi-opaque black boxes, which need to be trained using historical traffic records. In [8] and [9], linear regression (LR) and support vector regression (SVR), respectively, were used to predict network-level mobile traffic. Nevertheless, these shallow learning methods cannot cope with many practical prediction scenarios because they cannot perform feature extraction on their own and rely on prior knowledge of the input features. Recently, powerful deep learning tools have been leveraged for mobile traffic prediction. Nie et al. [10] employed a deep belief network based model to predict the mobile traffic loads aggregated over a city. Feng et al. [11] proposed a long short-term memory (LSTM) network based prediction model to forecast the traffic loads in a target cell. With a reduced connection complexity, a random connectivity LSTM network based traffic prediction model was proposed in [12]. A convolutional neural network based model [13] and a convolutional LSTM network based model [4] were proposed to forecast the spatial mobile traffic distribution in a city. However, in the existing works, a specific prediction model must be constructed and trained for each individual mobile traffic prediction task, as the traffic patterns handled by various tasks are quite different. Separately training the prediction models for multiple tasks is not only time consuming but also unrealistic, since sufficient historical traffic records are not always available, for example, for newly built areas.
To fill the above gaps, this article presents an early attempt to introduce deep meta-learning into mobile traffic prediction and investigates how to make the prediction model learn to learn for a specific mobile traffic prediction task according to its characteristics (meta-features). The main contributions of this work are summarized as follows:
• We introduce deep meta-learning into mobile traffic prediction, where each prediction task is regarded as a base-task and is represented as a time series forecasting problem. By defining the meta-task as learning to learn the proper prediction models for different base-tasks, we propose a novel deep meta-learning based mobile traffic prediction framework (dmTP). In dmTP, we adopt an LSTM network with a fixed structure as the base-learner to forecast the traffic load for a base-task based on previous values. Using the five main frequency components as the meta-features, we employ a multi-layer perceptron (MLP) as the meta-learner to output the optimal hyper-parameter value and initial status for the base-learner of a new base-task according to the accumulated meta-knowledge and the base-task's meta-features.
• The performance of dmTP is evaluated by extensive tests using real-world mobile traffic data for heterogeneous prediction tasks. Numerical results show that the meta-learning technology not only improves the prediction accuracy and learning efficiency of the base-learners but also makes them more adaptable to abnormal traffic patterns.
Dataset Description and Background Knowledge of Meta-Learning

Mobile Traffic Traces
In this article, we adopt three real-world mobile traffic datasets generated in Milan (Dataset 1), Guangzhou (Dataset 2), and London (Dataset 3), respectively. Figure 1 provides the details of these datasets. Dataset 1 is publicly available [4], while Datasets 2 and 3 were purchased and are not publicly accessible. We refer to each grid in Dataset 1 as a cell and regard the forecasting problem for each cell's mobile traffic load as an individual prediction task. As the city area of Milan is divided into 9999 grids, there are 9999 prediction tasks for Dataset 1. For Datasets 2 and 3, we choose the forecasting problem for the short message service (SMS) traffic load generated in the whole city of Guangzhou and that for the Twitter traffic load generated in the whole city of London as two prediction tasks, respectively. These prediction tasks from Datasets 1, 2, and 3 focus on different spatial scales and involve different kinds of mobile traffic.

Characteristics of Mobile Traffic
We set the time interval resolution at one hour following the settings in [4], mainly because resolutions smaller than one hour would leave many cells from Dataset 1 with numerous zero values in their traffic load series, making them too sparse for traffic prediction. The peak traffic loads among the prediction tasks from the three datasets differ by up to five orders of magnitude, so we normalize the traffic load series of each prediction task into the range [0, 1] using the max-min scaling method. Figure 2a illustrates the normalized mobile traffic loads in three prediction tasks during two weeks. We observe that although the three mobile traffic streams exhibit different temporal patterns, they all exhibit a weekly cyclic pattern, which is also observed in the time series of other cells from Dataset 1.
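The max-min scaling step can be sketched in a few lines (a minimal illustration; the handling of a constant series is our assumption):

```python
import numpy as np

def minmax_normalize(loads):
    """Scale a traffic load series into [0, 1] (max-min scaling)."""
    loads = np.asarray(loads, dtype=float)
    lo, hi = loads.min(), loads.max()
    if hi == lo:  # degenerate constant series: map everything to 0
        return np.zeros_like(loads)
    return (loads - lo) / (hi - lo)
```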
Figure 1. Description of the three adopted datasets. Note that our study does not breach user privacy or raise ethical or legal issues. All the datasets used have been pre-processed to ensure anonymity; in particular, all personal information has been removed before data analysis.

We generate a discrete periodic signal for each mobile traffic prediction task by periodically repeating its normalized real traffic loads in the first whole week (from Monday to Sunday) and then perform an FFT on it. Please note that although the constructed periodic signal only approximates the actual mobile traffic load series, it allows us to capture the main features of a prediction task with much fewer records of historical traffic loads. Figure 2b shows the amplitudes of the frequency components in the FFT results related to the three time series in Fig. 2a. From Fig. 2b, we find that in the frequency domain, ω = π/84, π/12, π/6, π/4, and π/3 (corresponding to the periods of one week, one day, 12 hours, 8 hours, and 6 hours, respectively) are the five main frequency components of a mobile traffic prediction task. However, the amplitudes of the main frequency components vary evidently across different prediction tasks. Figure 2c shows the cumulative distribution function (CDF) of the signal energy carried by the five main frequency components. We observe that in more than 60 percent of the considered prediction tasks, the sum energy of the five main frequency components exceeds 60 percent of the signal energy. We use a frequency component vector of size 10 to record the real and imaginary parts of a prediction task's main frequency components. Obviously, this vector reflects both the amplitude and the phase of each main frequency component. Figure 2d illustrates the inter-dependencies of 5000 randomly selected pairs of time series (i.e., prediction tasks) from Dataset 1 in both the time domain and the corresponding frequency domain. Specifically, Fig. 2d plots the Euclidean distance between the two frequency component vectors of a pair of prediction tasks vs. the Pearson correlation coefficient between the two corresponding time series. It can be observed that the Pearson correlation coefficient of two prediction tasks' time series is negatively correlated with the Euclidean distance between their frequency component vectors. In other words, two prediction tasks' time series tend to have similar time domain variations if their frequency component vectors are close to each other, and vice versa. This implies that the frequency component vector can be used to characterize the features of a prediction task's traffic pattern.
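Under this construction, one 168-hour period of the repeated signal places the five main components at FFT bins k = 1, 7, 14, 21, and 28, since ω = 2πk/168. A minimal numpy sketch of extracting the 10-dimensional frequency component vector (function and constant names are ours):

```python
import numpy as np

HOURS_PER_WEEK = 168
# Bins of a 168-hour period corresponding to periods of one week,
# one day, 12 h, 8 h, and 6 h (i.e., w = pi/84, pi/12, pi/6, pi/4, pi/3).
MAIN_BINS = [1, 7, 14, 21, 28]

def frequency_component_vector(first_week_loads):
    """Build the 10-dim meta-feature vector from one normalized week of loads."""
    week = np.asarray(first_week_loads, dtype=float)
    assert week.size == HOURS_PER_WEEK
    # FFT of one period of the periodically repeated signal, amplitude-normalized.
    spectrum = np.fft.rfft(week) / HOURS_PER_WEEK
    comps = spectrum[MAIN_BINS]
    # Real and imaginary parts keep both amplitude and phase information.
    return np.concatenate([comps.real, comps.imag])
```

For example, a pure daily cosine places all its energy in bin 7, so only the second entry of the vector (the real part of the one-day component) is non-zero.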

Meta-Learning
Meta-learning studies how learning systems can increase learning efficiency through experience. The goal of meta-learning is to understand how learning itself can become flexible according to the domains or tasks under study [14].
For a typical supervised learning task x and a learner L, each sample is obtained by labeling a number of features with an unknown target function F_x, and the hypothesis space of learner L, denoted H_L, is defined as the set of all the possible hypothesis functions generated by L. The training process of L can thus be seen as searching for the hypothesis function h_L(x) that approximates F_x over L's hypothesis space. L usually embeds a set of biases, which may be caused by the adopted learning algorithm, hyper-parameters, or the initial status. These biases may restrict the size of a learner's hypothesis space and will affect how the learner searches the hypothesis space. Meta-learning matches the biases of a learner to an individual task; this is achieved by a meta-task that adaptively generates a proper set of biases for each learning task according to the learning task's meta-features. The meta-task itself can be seen as a learning task and handled by a meta-learner. Accordingly, those original individual learning tasks are referred to as base-tasks, and their learners as base-learners.
The Proposed dmTP

Figure 3 shows the diagram of our proposed deep meta-learning based mobile traffic prediction framework, dmTP. In dmTP, we regard each individual mobile traffic prediction task as a base-task and present an LSTM network based prediction model with a fixed structure as the base-learner for it. We define a base-task's frequency component vector as its meta-features. We regard the number of steps, a hyper-parameter defining the length of the input sequence and denoted by SN, together with the initial values of the neural connection weights and neural thresholds, as a base-learner's set of biases. For each considered value of SN, the hypothesis space of a base-learner is composed of the mappings each of which transforms a 3 × SN-dimensional input sequence into a 1 × SN-dimensional output sequence. Thus, the value of SN determines a base-learner's hypothesis space, and the initial parameter values determine the base-learner's initial searching point in its hypothesis space. Intuitively, since the meta-features of a base-task reflect the characteristics of its traffic pattern, the best set of biases of the base-learner will be influenced by this task's meta-features.¹ We use an MLP as the meta-learner to implicitly extract the correlation between the base-tasks' meta-features and their best sets of biases, and to output the best set of biases for a new base-task's base-learner according to its meta-features.
Notations in dmTP: S_meta^train denotes the meta-task training set, which is equivalent to the set of base-tasks generating meta-samples for meta-task training. For base-task m in S_meta^train, we use S_base_m^train_large(SN) to denote the large base-task training set of base-samples for training m's base-learner when the number of steps equals SN, while we use S_base_m^verify(SN) to denote the set of base-samples for verifying the base-learner's performance. For base-task n not in S_meta^train, we use S_base_n^train_small(SN*_n) to denote the small base-task training set with a few base-samples for fine-tuning the base-learner and S_base_n^test(SN*_n) to denote the set of base-samples for testing the base-learner's performance.

LSTM Network as the Base-Learner
In dmTP, we construct a multi-layer LSTM network with Q layers to act as the base-learner. For a specific prediction task, this LSTM network is continuously fed with a sequence of input vectors related to the previous SN time intervals and predicts (as the output) the normalized mobile traffic load in the next time interval. Each input vector consists of three attributes: the normalized mobile traffic load, the day of the week, and the hour of the day. Figure 4 gives the structure of an LSTM memory block in the qth layer of the base-learner. An LSTM memory block is logically a recurrently connected subnet containing some functional modules called gates. As shown in Fig. 4, if the LSTM memory block's input vector and output vector are of size U_q and V_q, respectively, there will be p_q = 4 · (U_q + V_q) · V_q + 4 · V_q parameters to be trained in the qth layer. Thus, the total number of parameters to be trained in a base-learner is given by P = p_1 + … + p_Q.

¹ For a base-learner with a fixed structure and a set of base-samples obtained from the traffic load series of a base-task, the optimal number of steps and the parameter values making the base-learner optimally fit those base-samples are determined by the frequency characteristics of the base-task's traffic load series, since these frequency characteristics can completely reflect the temporal pattern of the traffic load series. As shown in Fig. 2, the frequency characteristics of a base-task's traffic load series are generally dominated by the five main frequency components, and the frequency component vector composed of them can characterize the features of a base-task's traffic pattern. Hence, we select the frequency component vector as the meta-features of each base-task. Our additional experimental results (not presented in this article) show that enriching the meta-features with more than five frequency components does not significantly further improve the meta-learner's performance.
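The per-layer count p_q and the total P can be checked with a short helper (a sketch; the function names are ours):

```python
def lstm_layer_params(u, v):
    """Trainable parameters in one LSTM layer with input size u and output size v:
    four gate weight matrices over the concatenated [input; recurrent output]
    vector, 4 * (u + v) * v, plus four bias (threshold) vectors, 4 * v."""
    return 4 * (u + v) * v + 4 * v

def base_learner_params(layer_sizes):
    """Total parameter count P = p_1 + ... + p_Q for stacked LSTM layers,
    given a list [(U_1, V_1), ..., (U_Q, V_Q)]."""
    return sum(lstm_layer_params(u, v) for u, v in layer_sizes)
```

For the three-layer base-learner used later in the experiments (input vectors of size 3, two memory blocks with output size 5, and a scalar output), this gives 180 + 220 + 28 = 428 parameters, matching the experimental settings.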
Train the Meta-Learner with Meta-Samples

We assume that a base-learner will obtain high prediction accuracy and training efficiency when its SN value determines the proper hypothesis space and its initial parameters approach the target ones (i.e., the initial searching point in the hypothesis space approaches the target function) [14].
dmTP utilizes a set of base-tasks to construct S_meta^train. For each base-task m in S_meta^train, the best value of SN for its base-learner is selected from multiple candidates through exhaustive trials. Specifically, for every candidate value SN_c, the base-learner is trained using S_base_m^train_large(SN_c) with randomly selected initial values of the P parameters, and its performance is verified via S_base_m^verify(SN_c). All the base-samples in S_base_m^train_large(SN_c) and S_base_m^verify(SN_c) have the same length of input sequence, SN_c. The SN value leading to the base-learner's highest prediction accuracy is then selected and denoted by SN*_m. By labeling base-task m's frequency component vector with SN*_m and the P parameters of the base-learner trained by S_base_m^train_large(SN*_m), one meta-sample is obtained. We propose to construct the meta-learner with an MLP, which consists of at least three layers of operations [15]. We train the MLP based meta-learner using S_meta^train. With the MLP's ability of feature extraction and correlation characterization [15], the meta-learner is able to generate the best step number, SN*_n, and initial parameter values approaching the target ones for the base-learner of a new base-task n.
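The shape of the meta-learner can be sketched as a simple forward pass mapping the 10-dimensional frequency component vector to a step number and P initial base-learner parameters. This is a simplified illustration under stated assumptions: the ReLU activations, the Gaussian weight initialization, and the joint [SN*, parameters] output head are our choices, not details given in the article.

```python
import numpy as np

class MetaLearnerMLP:
    """Sketch of the MLP meta-learner: 10-dim meta-features in,
    best step number SN* and P initial base-learner parameters out.
    Hidden sizes follow the experimental settings (300, 300, 400)."""

    def __init__(self, p=428, hidden=(300, 300, 400), seed=0):
        rng = np.random.default_rng(seed)
        sizes = [10, *hidden, 1 + p]  # last layer: [SN*, P initial parameters]
        self.weights = [rng.normal(0.0, 0.1, (a, b))
                        for a, b in zip(sizes, sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def forward(self, meta_features):
        h = np.asarray(meta_features, dtype=float)
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            h = np.maximum(h @ w + b, 0.0)       # ReLU hidden layers
        out = h @ self.weights[-1] + self.biases[-1]
        sn = int(np.clip(round(out[0]), 3, 24))  # SN candidates range from 3 to 24
        return sn, out[1:]
```

In practice the network would be trained on the meta-samples (frequency component vector labeled with SN*_m and the P trained parameters) before use; only the forward pass is sketched here.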

Fine-Tune the Base-Learner for a New Base-Task
As shown in Fig. 3, for a new mobile traffic prediction task (i.e., a new base-task) n, the frequency component vector is first extracted as its meta-features. The well-trained meta-learner is fed with the meta-features and outputs a set of biases for the base-learner of base-task n.
The base-learner takes the given SN*_n as its number of steps and sets its initial values of neural connection weights and neural thresholds according to the output of the meta-learner. Then, the base-learner is fine-tuned using the small base-task training set, S_base_n^train_small(SN*_n).

Evaluation on Real-World Mobile Traffic Data

Experimental Settings

In our experiments, the base-learner is constructed as a three-layer LSTM network, where the output vectors of the first and the second LSTM memory blocks are both of size 5. According to the LSTM memory block structure, the total number of parameters to be trained for each base-learner is 428. The meta-learner has three hidden layers of size 300, 300, and 400, respectively. We randomly select 8000 out of the 9999 base-tasks from Dataset 1 to construct the meta-training set, S_meta^train. For each base-task m in S_meta^train, we test SN from 3 to 24. For a candidate SN value SN_c, we apply a sliding window of size SN_c to split m's normalized traffic load series and generate the base-samples by labeling each sequence of input vectors with the normalized traffic load in the next time interval. We randomly select 90 percent of these base-samples to construct S_base_m^train_large(SN_c) and use the remaining ones to construct S_base_m^verify(SN_c). We examine the performance of dmTP on the remaining 1999 base-tasks of Dataset 1 that have not been selected for meta-training and on the two testing base-tasks from Datasets 2 and 3. Similarly, for each testing base-task n, we apply a sliding window of size SN*_n, which is output by the meta-learner, to generate the base-samples.
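The sliding-window construction of base-samples can be sketched as follows (a minimal illustration; the function and argument names are ours):

```python
import numpy as np

def make_base_samples(norm_loads, start_hour, start_weekday, sn):
    """Split a normalized hourly load series into base-samples with a sliding
    window of size sn. Each input vector holds the three attributes
    (normalized load, day of week, hour of day); the label is the normalized
    load in the next time interval."""
    loads = np.asarray(norm_loads, dtype=float)
    t = np.arange(loads.size)
    hour = (start_hour + t) % 24
    day = (start_weekday + (start_hour + t) // 24) % 7
    feats = np.stack([loads, day, hour], axis=1)      # one 3-dim vector per hour
    xs = [feats[i:i + sn] for i in range(loads.size - sn)]   # sn x 3 inputs
    ys = [loads[i + sn] for i in range(loads.size - sn)]     # next-interval label
    return np.array(xs), np.array(ys)
```

A series of length T yields T − SN base-samples, each an SN × 3 input sequence, matching the 3 × SN-dimensional inputs of the base-learner.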
The performance of dmTP is compared with existing time series forecasting methods, including ARIMA [5], LR [8], SVR [9], and basic LSTM networks [12] with SN = 12, 24. For a fair comparison, we construct the basic LSTM networks with the same structure as a base-learner in dmTP. We choose the mean square error (MSE) as the loss function, while the adaptive moment estimation algorithm [15] with the default learning rate is utilized to optimize the baseline LSTM networks as well as the meta-learner and base-learners in dmTP. We note that for approximately 20 percent of the base-tasks in Dataset 1, the traffic loads in more than 50 percent of the time intervals are lower than 20 percent of the peak traffic load, while for approximately 84 percent of the base-tasks in Dataset 1 and the two base-tasks from Datasets 2 and 3, the traffic loads in more than 50 percent of the time intervals are lower than 40 percent of the peak traffic load. Considering that the mean values of the traffic load series of a significant portion of base-tasks are relatively low, we use the normalized root mean square error (NRMSE) to evaluate the accuracy of the considered prediction methods.
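For reference, a minimal NRMSE sketch (normalizing the RMSE by the range of the true series is our assumption; other conventions divide by the mean instead):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized root mean square error: RMSE divided by the range of the
    true series (assumed normalization; choose to match the paper's metric)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())
```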

Prediction Performance
Figures 5a-5d compare the prediction accuracy achieved by dmTP and the baseline methods. In order to test the applicability of the meta-learning technology to other prediction methods, we also evaluate the performance of ARIMA, LR, and SVR when their hyper-parameters (i.e., the number of steps for ARIMA and LR, and the kernel function for SVR) for a certain base-task are either fixed or output by a meta-learner. In Fig. 5, all the base-samples that have not been used for testing are used to fine-tune/train the base-learner in dmTP and the baseline models. As can be seen from Figs. 5a-5d, ARIMA and LR perform poorly among all the considered methods, because these simple models are not able to capture the highly nonlinear temporal patterns of mobile traffic loads. SVR, which is a nonlinear prediction method, can deal with the nonlinearities in load variation and thus achieves better performance than ARIMA and LR. Owing to their deep learning capability, the basic LSTM networks with adequate training data are able to learn the deep dependency between traffic loads generated in various time intervals, and thus perform better than ARIMA, LR, and SVR, as shown in Fig. 5a. However, Figs. 5b and 5c show that the prediction accuracy of the basic LSTM networks degrades for the prediction tasks related to Datasets 2 and 3. This is because the basic LSTM networks require a large number of samples to train their models; the small training sets with base-samples generated in a three-week or one-week period result in overfitting and thus poor performance. Figure 5d shows that when there are abnormalities in the testing traffic patterns, the basic LSTM networks have unsatisfactory performance as well, because they depend heavily on the training data and fail to predict abnormal traffic loads if there are no similar samples in their training sets.
The proposed dmTP always obtains the best prediction accuracy, which can be attributed to two aspects. First, the base-learner for each base-task in dmTP has the ability to learn and represent the complex nonlinearities in mobile traffic load variations. Second, unlike the basic LSTM networks embedded with fixed hyper-parameters and randomly selected initial parameter values, the meta-learner in dmTP finds the best SN and proper initial parameter values for a base-learner, which leads to a higher prediction accuracy and a stronger adaptability to abnormal traffic patterns. Compared with ARIMA, LR, and SVR, dmTP reduces the NRMSE by about 25-43 percent, 73-85 percent, and 20-39 percent, respectively, for the testing base-tasks from Datasets 1, 2, and 3. As shown in Fig. 5a, when there is adequate training data and the traffic pattern is normal, dmTP can further reduce the NRMSE by 12 percent compared with the basic LSTM networks, mainly because of the selection of the optimal SN value. As shown in Figs. 5b-5d, when the training sets are small or the traffic pattern is abnormal, dmTP outperforms the basic LSTM networks with a 28-60 percent reduction in NRMSE, owing to the proper selection of initial parameter values in each base-learner. Figure 5e shows the prediction results of dmTP and the basic LSTM networks for a testing base-task in Dataset 1 (cell 1684). We can clearly see that dmTP achieves more accurate predictions than the basic LSTM networks when the traffic pattern has abnormalities or sudden changes. From Fig. 5, we also find that the three baseline methods in conjunction with the meta-learning technology perform better than their counterparts without it; the meta-learners improve the prediction accuracy by about 9 percent, 17 percent, and 8 percent for ARIMA, LR, and SVR, respectively. These results confirm the applicability of the meta-learning technology to ARIMA, LR, and SVR.
Figure 5f shows the average training time (ATT) needed by the base-learners in dmTP and by the baseline methods for the testing base-tasks, as well as the time needed to construct the meta-task training set and train the meta-learner (TCMTM). Although dmTP additionally incurs a considerable TCMTM compared with the baseline methods, once the initial parameter values have been properly set, the base-learner of a new base-task converges faster and needs a shorter ATT than the basic LSTM networks. Note that since the meta-samples can be obtained off-line and the meta-learner only needs to be trained once, the extra off-line complexity caused by meta-learning in dmTP is justified by the improved on-line convergence speed and reduced training time of the base-learners.

Learning Efficiency Improvement for the Base-Learner
In Fig. 6, we use the testing base-tasks from Dataset 1 to further examine how the meta-learner helps the base-learners improve their learning efficiency in terms of convergence speed and the number of base-samples needed. For each testing base-task, the testing base-samples are generated during the two weeks of 12/16/2013-12/22/2013 and 12/23/2013-12/29/2013, while, different from Fig. 5, only a portion of the remaining base-samples are randomly selected to fine-tune/train the base-learner in dmTP and the baseline models. Figure 6a shows the average NRMSE achieved by the proposed dmTP, the basic LSTM networks with random initial parameters, and the LSTM network with biases transferred from a certain base-task in S_meta^train (cell 6395, whose best SN is 6) vs. the number of training epochs, where 840 base-samples are used to fine-tune/train the base-learner in dmTP and the baseline models for each testing base-task.
We can see that for predicting normal mobile traffic, the NRMSE of the basic LSTM networks decreases as the number of epochs increases and settles at a good accuracy level after 40 epochs. Due to the diversity of mobile traffic patterns among various base-tasks, the LSTM network with transferred biases has a slower convergence speed and a higher NRMSE than dmTP, but its transferred initial parameters result in a lower initial NRMSE and faster convergence than the basic LSTM networks. The performance of dmTP becomes stable after 10 epochs, a 75 percent reduction in the number of epochs needed compared with the basic LSTM networks. Thanks to the chosen optimal SN value, dmTP also reduces the NRMSE by about 12 percent compared with the basic LSTM networks. We can also see that for predicting abnormal mobile traffic, dmTP converges much faster than both the basic LSTM networks and the LSTM network with transferred biases. Moreover, the accuracy improvement of dmTP over the baselines after they all become stable is much larger than that for predicting normal mobile traffic. This demonstrates that the meta-learner can not only improve the base-learners' learning efficiency but also make them more adaptable to abnormal mobile traffic, because the accumulated meta-knowledge helps each base-learner in dmTP handle unknown traffic patterns.

Figure 6b displays the average NRMSE achieved by dmTP and the baselines vs. the number of training base-samples selected to fine-tune/train the prediction models for each testing base-task with 100 training epochs. We can see that for predicting normal mobile traffic, the basic LSTM networks and the LSTM network with transferred biases obtain a high prediction accuracy if the base-task training sets are large enough (e.g., more than 500 training base-samples). The prediction accuracy of dmTP with 150 training base-samples exceeds that of the basic LSTM networks with 800 training base-samples.
Compared with the basic LSTM networks, the meta-learner in dmTP helps the base-learners reduce the number of training base-samples needed by about 81 percent. Since each base-learner for a testing base-task in dmTP is given a proper set of biases by the meta-learner, only a limited number of base-samples are required to fine-tune a base-learner to achieve high accuracy. For predicting abnormal traffic, the NRMSE of dmTP is significantly lower than those of the basic LSTM networks and the LSTM network with transferred biases. This is because, without meta-knowledge, the LSTM networks fail to accurately predict unknown traffic patterns if there are no similar base-samples in their training sets.

Conclusion
In this article, we introduce deep meta-learning into mobile traffic prediction and propose a novel deep meta-learning based mobile traffic prediction framework, dmTP. The prediction accuracy of dmTP has been tested on multiple real-world mobile traffic datasets. Experimental results demonstrate that dmTP is able to forecast the mobile traffic loads for prediction tasks with different spatial scales and application types. With meta-learning, the proposed framework achieves a higher prediction accuracy and a stronger adaptability in coping with abnormal traffic patterns than the existing prediction methods. Moreover, the meta-learner reduces the numbers of epochs and base-samples needed to fine-tune the base-learners by about 75 percent and 81 percent, respectively, as compared with LSTM network based prediction models.
Building on this work, it will be interesting to mathematically model and analyze the correlation between a base-task's frequency component vector and the best set of biases for the task's base-learner, as well as to investigate whether the meta-learning technology can be used to optimize the structure of the base-learner for a prediction task. Additionally, it is worth studying whether some other characteristics of mobile traffic patterns can be used as a base-task's meta-features.