Anticipatory Allocation of Communication and Computational Resources at the Edge using Spatio-Temporal Dynamics of Mobile Users

Abstract—Multi-access Edge Computing represents a key enabling technology for emerging mobile networks. It offers intensive computational resources very close to the end-users, useful for task offloading purposes. Many scientific contributions have already proposed approaches for optimally allocating these resources over time. However, most of them fail to take advantage of the prediction of both users' mobility and service demands over a look-ahead temporal horizon. To bridge this gap, this paper formulates a novel methodology for anticipatorily allocating communication and computational resources at the network edge, based on the prediction of spatio-temporal dynamics of mobile users. The conceived architecture exploits a Software-Defined Networking approach to monitor users' mobility, a Convolutional Long Short-Term Memory to predict, over different look-ahead horizons, the number of users within a given number of cells and their related service demands, and Dynamic Programming to optimally allocate users' requests among available Multi-access Edge Computing servers. Computer simulations investigate the effectiveness of the proposed approach in a realistic autonomous driving use case and compare its behavior against a baseline solution. The obtained results demonstrate its unique ability to dynamically and fairly distribute users' requests among the resources available at the network edge, while ensuring the targeted quality of service level.


I. INTRODUCTION
In both fifth generation (5G) and Beyond 5G (B5G) networks, Multi-access Edge Computing (MEC) is emerging as a fundamental enabling technology for the rapid diffusion of advanced services, such as autonomous driving, virtual/augmented reality, e-Health, robotics, and the tactile Internet [1]-[3]. According to European Telecommunications Standards Institute (ETSI)-MEC specifications [4], MEC servers are deployed at the network edge to offer intensive computing and memory capabilities in the proximity of end-users, while guaranteeing low communication latencies to new demanding and real-time services [1]. They are also able to limit network congestion by processing data directly at the edge, instead of forwarding large amounts of data to the cloud. This particularly applies to MEC servers co-located with gNBs (the base stations of 5G networks), which can provide computational capabilities as close as possible to end-users and capture information for further purposes (data analytics and big data processing) [1].
As expected, communication and computational resources available at the network edge should be properly managed to accommodate the spatio-temporal dynamics and the ever-growing amount of users' requests [5], [6]. Most of the scientific contributions in this context address network resource management, computational resource allocation, and task offloading through optimization algorithms [7]-[16] or iterative procedures based on artificial intelligence [17]-[22]. Unfortunately, these contributions generally consider only the current, static picture of the overall system and ignore the impact that future spatio-temporal dynamics of mobile users may have on the system behavior. In contrast, the knowledge (i.e., prediction) of both users' mobility and the communication and computational resources they request over time within a given geographical area could significantly improve network optimization mechanisms [23]-[25]. The current state of the art proposes various tools to forecast the movements of users [26]-[37], their requests [38]-[44], or both [45] (see Section II for more details). Solutions based on machine and deep learning also promise to better anticipate network behaviors and dynamics in heterogeneous and large-scale scenarios [46], [47]. Nevertheless, the resulting network optimization problems (including those presented in [26]-[28], [30]-[33], [35], [40]-[43], [45]) fail to take advantage of the joint prediction of both users' mobility and service demands over a look-ahead temporal horizon and within a standards-compliant ETSI-MEC context.
To bridge this gap, this work formulates an innovative methodology for the anticipatory allocation of communication and computational resources at the network edge (i.e., task offloading), based on the knowledge of spatio-temporal dynamics of mobile users. The conceived approach significantly extends the preliminary contributions presented by the same authors in [48] and [49]. The considered architecture adopts a Software-Defined Networking (SDN) approach to monitor users' mobility over time. Then, it exploits a deep learning architecture based on Convolutional Long Short-Term Memory (ConvLSTM) [50] to predict the distribution of users among cells and their related service demands over a look-ahead temporal horizon. A centralized Multi-access Edge Orchestrator uses this information to anticipatorily distribute users' demands among available MEC servers, while satisfying the communication and computational constraints at the network edge and the latency upper bound expected by mobile users. Specifically, the optimal allocation problem is stated as a sequential decision-making process, which considers future steps in the optimization horizon and is solved by Dynamic Programming [51].
The behavior of the proposed approach is investigated through computer simulations in an autonomous driving use case (with real mobility traces [52] and conceivable network and service settings [13], [53]-[59]). First, the presented study shows that the combined usage of ConvLSTM and Dynamic Programming ensures results comparable with those obtained by the same optimization algorithm running on a perfect knowledge (i.e., ground truth) of the spatio-temporal dynamics of mobile users, which demonstrates the high quality of the prediction process. At the same time, the comparison against a baseline approach, which leverages the distribution of users at the current time instant and allocates users' demands to the closest MEC server, reveals that only the conceived anticipatory approach can fairly distribute users' requests among the resources available at the network edge, while ensuring the targeted quality of service level. Finally, a complexity analysis confirms that the proposed methodology can be effectively and easily implemented in real deployments.
The remainder of the paper is organized as follows. Section II reviews the related work in this area and identifies the gaps bridged in this paper. Section III introduces the considered architecture and the targeted scenario. Section IV describes the proposed optimization approach, including the system model, the problem formulation, and the mobility prediction model. Section V presents numerical results coming from computer simulations and formulates a complexity analysis. Finally, Section VI concludes the paper and draws future research directions.

II. RELATED WORK
Emerging methodologies exploit artificial intelligence technologies, like machine learning, deep learning, and deep reinforcement learning, for network optimization [17]. While most of the contributions in this context focus only on the optimal management of computational resources [18]-[20], other works jointly consider the management and allocation of communication and computational resources [21], [22]. Available approaches aim to maximize the overall resource capacity [20], to minimize energy consumption [19] and delay [18], [21], [22], and to fulfill the expected upper bound for the overall delay [18], [22].
The contributions presented in [23]-[25] highlight that the knowledge (i.e., prediction) of users' mobility and/or the set of requests that they may formulate in a given geographical area over time introduces further key information for network optimization tasks.
The prediction of users' trajectory and location can be achieved with mathematical models [26]-[28]. The mobility forecasting obtained in [26] is used to offload computing tasks (requested by mobile users) to a single remote MEC server. To this end, an optimization problem that jointly minimizes energy consumption and latency, while satisfying the expected maximum delay, is formulated in [26]. The knowledge of trajectories during the next look-ahead window is considered in [27] for planning the migration of virtual machines at the network edge. This goal is reached by employing an optimization problem that minimizes communication latencies, ensuring at the same time the expected upper bounds. Finally, the work in [28] leverages a Markov Decision Process to predict user mobility and formulates an iterative approach for jointly allocating communication resources among available users and placing virtual machines at the network edge. Similarly to [26], the presented solution minimizes energy consumption and delay.
Differently from the above-discussed methodologies, solutions based on machine learning promise to better anticipate network behaviors and dynamics, also in heterogeneous and large-scale scenarios [46], [47]. For example, the prediction of trajectory and location is performed through deep learning architectures, such as Long Short-Term Memory networks (LSTMs) [29], [30], [32], [33], LSTMs with attention mechanisms [34], Convolutional Neural Networks (CNNs) [31], and a combination of recurrent networks and CNNs with Markov Chains [35]. Furthermore, the number of users in a given geographical area is predicted through machine learning-based regressors in [36] and a combination of deep learning and Bayesian networks in [37]. Mobility forecasting in [30] supports an optimization problem that distributes computing and caching capabilities among mobile users, maximizing the overall resource capacity and satisfying the expected maximum delay. The knowledge of locations, until one [31] or more steps ahead [32], [35], is also adopted to drive the migration of virtual machines at the network edge. In more detail, the contribution in [31] describes an iterative procedure for minimizing communication latencies while satisfying the expected maximum delays, whereas optimization problems that minimize delay and energy consumption are formulated in [32] and [35], respectively. Finally, the work discussed in [33] adopts deep reinforcement learning to manage computation offloading tasks among different remote MEC servers in order to minimize the delay.

Instead, traffic volume/load can be accurately predicted through deep learning methods [38], [39], such as Multi-Layer Perceptrons [42], CNNs [44], LSTMs [40], [41], and Multivariate LSTMs [43]. Traffic forecasting during the next look-ahead horizon assists network optimization in terms of computation offloading and resource allocation with one MEC server in [40], [41], minimizing energy consumption.
Traffic prediction also aids the joint communication and computational resource allocation for user association and Service Function Chain placement among MEC servers in [42]. Here, an optimization algorithm is adopted for minimizing delay, while respecting the service latency as an upper bound. Moreover, the knowledge of traffic requests in a Cloud-Radio Access Network context supports the Remote Radio Head (RRH)-Base Band Unit (BBU) mapping in [43], where an optimization problem minimizes deployment cost and energy consumption. The traffic volume of RRHs, together with the number of users moving between a pair of RRHs, is predicted in [45] through a Multivariate LSTM. This information is exploited to optimally perform the RRH-BBU mapping, minimizing energy consumption and delay.
To conclude, Table I summarizes the goals and methodologies followed by the reviewed scientific contributions performing mobility/requests prediction and network optimization, highlighting the main differences with respect to the approach proposed in this paper. It emerges that, to the best of the authors' knowledge, no contributions in the current state of the art jointly predict, through deep learning, the geographical distribution of users over time (i.e., the number of users available within each cell at a given moment) and the related requests for a look-ahead horizon, as proposed in this work in order to better manage task offloading in the 5G slicing paradigm. Thus, they do not take advantage of mobility and requests prediction to dynamically and anticipatorily optimize communication and computational resource management among available MEC servers, while satisfying the upper bound of communication latencies.

III. REFERENCE SCENARIO
This work mainly refers to the task offloading problem, according to which it is necessary to deploy (and properly use) available communication and intensive computational capabilities at the network edge for offering new demanding and latency-critical services with challenging user expectations [1], [4], [49], [54].
The conceived approach can be implemented within the 5G slicing paradigm. In fact, according to 3rd Generation Partnership Project (3GPP) specifications [60], a slice instance represents a set of network functions and related resources which are arranged and configured in a logical network to meet certain network characteristics. To this end, a service provider declares communication service requirements (e.g., coverage area, number and distribution of users, traffic demand, mobility, latency, etc.) to the infrastructure provider. In turn, the infrastructure provider configures the corresponding network slice instance, whose preparation phase includes the on-boarding and verification of network function products and the necessary network environment. From this moment on, the service provider can dynamically allocate the resources belonging to the aforementioned slice to the served mobile users (i.e., the task offloading within a specific slice). Note that in complex deployments, where heterogeneous services are offered through different slices, the proposed approach can be replicated for each slice.
In line with 5G specifications, emerging guidelines for the upcoming B5G systems, and the ETSI-MEC standard [61], the mobile network considered in this work comprises mobile users, gNBs, MEC servers, SDN controllers, and a Multi-access Edge Orchestrator (see Fig. 1). Here, gNBs are part of the 3GPP network integrated within the ETSI-MEC architecture. They provide wireless connectivity to mobile users through heterogeneous technical components at the radio interface [48], [61]. It is important to remark that gNBs can be connected to each other in different ways: ring, tree, or mesh topologies can be implemented by the infrastructure provider [62]. Without loss of generality, a mesh topology is depicted in Fig. 1 as an example of the backhaul network topology, even though the system model described in Section IV-A is general enough to capture the behavior of any topology.
A number of MEC servers (or MEC hosts) expose resources to mobile users, depending on the services they use [61]. In this sense, the example reported in Fig. 1 shows that the black and gray blocks of MEC servers are dedicated to autonomous driving and e-Health services, respectively. According to ETSI-MEC specifications, MEC servers can be deployed at the gNBs, at aggregation points, or at the edge of the core network [4]. Independently of their position, however, MEC resources (i.e., memory and computing) can be used by users attached to different cells. This important flexibility requires a careful distribution of users' demands, which must account for the stringent communication requirements rather than just considering the computational capabilities of MEC servers.
Network resources are monitored, configured, and orchestrated [61]. To this end, SDN controllers continuously interact with gNBs and MEC servers to monitor the number of users served by each cell, the computational resources they request, and the amount of resources exposed and/or available in each MEC server. Note that SDN controllers can retrieve useful information from network elements through standardized protocols (e.g., OpenFlow, RESTCONF) [63]. Specifically, since gNBs know how many users are attached to them, SDN controllers can retrieve the number of users served by each gNB by simply querying the gNBs. This information is delivered to the Multi-access Edge Orchestrator for network optimization purposes. The Multi-access Edge Orchestrator represents a fundamental entity of the ETSI-MEC reference architecture, included in the MEC system-level management [61]. The envisaged solution uses the Multi-access Edge Orchestrator capabilities for managing a certain number of gNBs and MEC servers in a given geographical area (i.e., the radio access network is divided into clusters, each controlled by one orchestrator) in order to optimally allocate computing and communication resources for task offloading, based on the prediction of spatio-temporal users' dynamics, while satisfying heterogeneous traffic demands. The proposed optimization algorithm, which can be aided by mobility and service requests prediction, is executed by each orchestrator instance in order to minimize the latency of each service (one of the leading performance measures of 5G and B5G [5], [6]), while jointly considering network communication and computational requirements and satisfying the upper bound of the service latency and the related network constraints.
Moreover, an intrinsic characteristic of many 5G services (e.g., autonomous driving, virtual/augmented reality assisting museum tours) is mobility. Therefore, the communication and computational resources must be managed by using a mobility-aware approach, which is considered one of the most critical and challenging issues for network orchestration [6], [33].

IV. PROBLEM STATEMENT
In this section, the system model is described and the optimization problem for the reference scenario and the adopted mobility prediction model are formulated. To facilitate the understanding of the notations adopted in what follows, a summary of symbols is reported in Table II.

A. System model
Let I and |I| be the set and the number of users moving in the considered geographical area, respectively. According to the target application, the request formulated by the i-th user is characterized by the following communication and computational requirements: the communication bandwidth b_i, the upper bound of latency τ_i, the input data size s_i, the memory requirement m_i, and the demanded computational capability (expressed in terms of number of CPU cycles) c_i. Let J be the set of available gNBs, |J| its cardinality, and B_j the amount of bandwidth available at the j-th gNB. The total latency experienced by the i-th user attached to the j-th gNB and served by the m-th MEC server in the k-th time interval is given by:

l_ijm(t_k) = l^radio_ij(t_k) + l^backhaul_jm(t_k) + l^exe_im(t_k),   (1)

where l^radio_ij(t_k) is the communication latency experienced between the i-th user and the j-th gNB over the radio interface, l^backhaul_jm(t_k) is the backhaul latency experienced between the j-th gNB and the m-th MEC server, and l^exe_im(t_k) is the execution latency experienced at the m-th MEC server [6], [9], [54]. These different latency contributions are shown in Fig. 1.
In compliance with ITU specifications, the communication latency over the radio interface, l^radio_ij(t_k), is expected to be less than 5 ms [53], [54].
The backhaul latency l^backhaul_jm(t_k) is obtained by dividing the aggregate traffic load generated by the users attached to the j-th gNB and served by the m-th MEC server, that is Σ_{i ∈ I_jm(t_k)} s_i, by the capacity of the backhaul link between the j-th gNB and the m-th MEC server, r_jm(t_k) [42]:

l^backhaul_jm(t_k) = ( Σ_{i ∈ I_jm(t_k)} s_i ) / r_jm(t_k),   (2)

where I_jm(t_k) is the portion of users attached to the j-th gNB and served by the m-th MEC server that share the same backhaul link. The system model described herein is general enough to capture the behavior of any backhaul topology. Without loss of generality, a mesh topology with the same capacity for each backhaul link is considered (see Fig. 1). MEC servers can be deployed at the gNBs, at aggregation points, or at the edge of the core network. Therefore, the backhaul latency varies depending on the scenario. Specifically, assuming without loss of generality that MEC servers are co-located with gNBs, there are two possibilities when calculating the backhaul latency. No additional delay (i.e., l^backhaul_jm(t_k) = 0) is introduced in the backhaul if the user is served by the MEC server co-located with the gNB to which it is attached (i.e., m = j). Conversely, the backhaul latency is calculated for the backhaul path connecting the gNB to which the user is attached with the neighboring MEC host serving the user.
During the time interval t_k, the computing capabilities exposed by each MEC host are assumed to be uniformly allocated among the served users. Therefore, the execution latency l^exe_im(t_k) is equal to [9], [14]:

l^exe_im(t_k) = c_i / f_im(t_k),   (3)

where f_im(t_k) is the number of CPU cycles per second allocated by the m-th MEC server to the i-th user. This equation is generic enough to be used in any realistic scenario with homogeneous or heterogeneous service requirements: the execution latency refers to the computational capability requirements of users, who may execute a single application task as well as more heterogeneous application tasks.
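For illustration, the three latency contributions in (1)-(3) can be computed as in the following minimal sketch (function and variable names are illustrative, and the example values are hypothetical, not taken from the paper's simulation setup):

```python
def total_latency(l_radio, s_inputs, r_jm, c_i, f_im):
    """Total latency l_ijm = l_radio + l_backhaul + l_exe, as in Eqs. (1)-(3).

    l_radio  : radio-interface latency [s]
    s_inputs : input data sizes s_i of the users sharing the backhaul link [bit]
    r_jm     : backhaul link capacity between gNB j and MEC server m [bit/s]
    c_i      : CPU cycles demanded by user i
    f_im     : CPU cycles/s allocated by MEC server m to user i
    """
    l_backhaul = sum(s_inputs) / r_jm   # aggregate traffic over link capacity, Eq. (2)
    l_exe = c_i / f_im                  # execution latency, Eq. (3)
    return l_radio + l_backhaul + l_exe

# Hypothetical example: 5 ms radio latency, three users sharing a 10 Gbit/s
# backhaul link with 5 Mbit inputs each, 300 Megacycles per task, and
# 3 Gigacycles/s allocated to the user.
lat = total_latency(5e-3, [5e6, 5e6, 5e6], 10e9, 300e6, 3e9)
print(lat)  # 0.005 + 0.0015 + 0.1 = 0.1065 s
```

Note that the backhaul term is zero when the serving MEC server is co-located with the user's gNB (m = j), in which case `s_inputs` would be empty.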

B. Optimization problem
The goal of this paper is to distribute users' requests among the available MEC servers, so that the latency of each considered service is minimized and network outage in terms of memory, computing, and bandwidth resources is avoided. Such a problem is stated as a sequential decision-making process: at every decision epoch t_k, control actions aiming at assigning users' demands to the best-suited MEC servers are executed, according to their available memory capabilities and computing abilities, in order to minimize the latencies experienced by users and to satisfy the service latency constraint. At every decision epoch t_k, the requests and, hence, the resources needed to run the user services for the N steps ahead are leveraged, and the control is executed based on the optimization problem P1 stated in (4), subject to constraints (4a)-(4f). The solution of the problem is found by executing the dynamic programming approach [51] at every decision epoch t_k (i.e., each point of the sequential decision-making process where decisions are made), transforming a complex problem into a sequence of simpler problems. In line with the dynamic programming approach [51], the discount factor γ (0 < γ ≤ 1) is introduced to incorporate the concept of discounting over the look-ahead temporal horizon N. Specifically, the decision cycle t_{k,n}, with n ∈ {0, 1, ..., N}, represents the sequence of the considered time steps (with t_{k,n} = t_{k+n}) used to reach and implement decisions in each epoch t_k, whose impact is exponentially weighted through γ^n. Thus, when n = 0, t_{k,0} is weighted through γ^0 = 1, while the future time steps in the sequence have gradually decreasing weights (i.e., from γ^1 for t_{k,1} to γ^N for t_{k,N}) in the decision cycle t_{k,n}. The implemented control is expressed by a binary decision variable α_im(t_k):

α_im(t_k) = 1 if the i-th user is served by the m-th MEC server (i.e., i ∈ I^MEC_m(t_k)), and 0 otherwise.   (5)

Note that α_im(t_k) only involves the backhaul and execution latencies because they depend on the selected MEC server, while the radio component is independent of it.
The constraints in (4a) consider the memory capabilities and requirements: the memory capability M^opt_m(t_{k,n}) of the m-th MEC server cannot be exceeded by the served users in each decision cycle t_{k,n}, and the overall memory capabilities need to be sufficient to satisfy the memory requirements. The constraint in (4b) regards the CPU ability F^opt_m(t_{k,n}) of the m-th MEC server in each decision cycle t_{k,n}. Because of the definition of the execution latency component in (3), computing capabilities are included in the service latency constraint (4c), which is valid for each i-th user in the network; here, the maximum tolerable latency τ_i is the upper bound of the user latency experienced during each decision cycle t_{k,n}. If the computing abilities are not sufficient, (4c) is not satisfied. Bandwidth requirements are considered in (4d), where e_ij(t_{k,n}) is the spectral efficiency between the i-th user and the j-th gNB. Moreover, in every decision cycle t_{k,n} each user can be served by one and only one MEC server, as reported in (4e) and (4f), meaning that the number of users attached to different gNBs must be equal to the number of users served by different MEC hosts.
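A sketch of how some of the constraints above could be verified for one candidate user-to-server assignment is given below. This only checks feasibility of a single assignment (the paper solves the full sequential problem with dynamic programming), and all names are illustrative assumptions:

```python
def assignment_feasible(assign, attach, m_req, b_req, tau, lat, M_cap, B_cap, e):
    """Check constraints (4a), (4c), (4d) for a candidate assignment.

    assign : dict user -> serving MEC server (one server per user, cf. (4e))
    attach : dict user -> gNB the user is attached to
    m_req  : dict user -> memory requirement m_i
    b_req  : dict user -> bandwidth requirement b_i [bit/s]
    tau    : dict user -> latency upper bound tau_i [s]
    lat    : dict user -> experienced total latency [s]
    M_cap  : dict server -> memory capability M_m
    B_cap  : dict gNB -> available bandwidth B_j [Hz]
    e      : dict (user, gNB) -> spectral efficiency e_ij [bit/s/Hz]
    """
    # (4a) memory: served users must not exceed each server's memory capability
    used_mem = {}
    for i, m in assign.items():
        used_mem[m] = used_mem.get(m, 0) + m_req[i]
    if any(used_mem[m] > M_cap[m] for m in used_mem):
        return False
    # (4c) service latency upper bound per user
    if any(lat[i] > tau[i] for i in assign):
        return False
    # (4d) bandwidth: spectrum needed by the users of each gNB within B_j
    used_bw = {}
    for i, j in attach.items():
        used_bw[j] = used_bw.get(j, 0) + b_req[i] / e[(i, j)]
    return all(used_bw[j] <= B_cap[j] for j in used_bw)
```

For example, with b_i = 700 Mbps and e_ij = 30 bit/s/Hz, each user needs about 23.3 MHz, so a 40 MHz cell can satisfy (4d) for one such user but not for two.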
The solution of the optimization problem P1 stated in (4) can be found anticipatorily by forecasting the number of users in the coverage area of each gNB. In what follows, the anticipatory optimization approach presented in this work is referred to as Prediction-based Control (P-C). Since the solution of the network optimization problem P1 stated in (4) can also be found by assuming that the mobility of users is known in advance, Section V also evaluates the anticipatory network optimization approach based on the ground truth, i.e., Ground Truth-based Control (GT-C).

C. Mobility prediction model
The users moving in the considered geographical area may pass from one cell to an adjacent cell. Accordingly, the number of users attached to each gNB changes over time. The goal of the mobility prediction model described in this subsection is to anticipatorily discover the distribution of mobile users among the available cells over a look-ahead temporal horizon. To this aim, this paper leverages real mobility data from the dataset presented in [52], which reports the movements of around 100 taxi cabs in Rome (Italy), from 1 February 2014 to 2 March 2014, with a granularity of about 15 s. The traces of the published version of the dataset give information on when each taxi position was collected, with a precision of microseconds, and the Global Positioning System (GPS) coordinates, in decimal format. The considered geographical area in the center of Rome has been divided into square cells, each covering an area of 1 km × 1 km (an example is reported in Fig. 2). Note that square cells are considered here, but the considerations are also valid for arbitrarily shaped cells.
These real traces are used to generate a list of matrices describing the geographical distribution of users over time. For example, Fig. 3 shows the distribution of the number of users for two cells (i.e., j = 1 and j = 3) with a low and high number of users, respectively.
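The conversion of raw position traces into per-cell user-count matrices can be sketched as follows. This is a simplified illustration with hypothetical grid parameters and function names, not the exact preprocessing pipeline of the paper:

```python
import numpy as np

def users_per_cell(positions, origin, cell_size, grid_shape):
    """Count users in each square cell of a regular grid.

    positions  : list of (x, y) user positions in meters (e.g., projected GPS)
    origin     : (x0, y0) lower-left corner of the grid
    cell_size  : side length of each square cell in meters (1000 m in the paper)
    grid_shape : (rows, cols) of the grid
    """
    counts = np.zeros(grid_shape, dtype=int)
    for x, y in positions:
        col = int((x - origin[0]) // cell_size)
        row = int((y - origin[1]) // cell_size)
        if 0 <= row < grid_shape[0] and 0 <= col < grid_shape[1]:
            counts[row, col] += 1   # user falls inside this cell
    return counts
```

Applying this function once per time interval yields the list of matrices describing the geographical distribution of users over time.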
The conceived mobility prediction model exploits the ConvLSTM architecture to predict the distribution of mobile users among the available cells for the upcoming N consecutive time intervals, based on the knowledge of the distribution of users (i.e., retrieved by SDN controllers from the gNBs, which know how many users are attached to them) observed during the latest T observation time intervals. As depicted in Fig. 4, the considered ConvLSTM architecture is based on the LSTM [64], with convolution operators in the input, forget, and output gates instead of element-wise (Hadamard) products [50]. Therefore, it is able to extract the temporal and spatial correlations of data through the LSTM memory cells and the convolutional operation, respectively [38], [65]. More specifically, this work conceives a learning architecture embracing two 2-dimensional ConvLSTM layers, each followed by a batch normalization layer to accelerate deep network training [66]. The number of epochs and the number of filters are set to 30 (see the convergence analysis proposed in Section V-A) and 200, respectively. At the end, a fully-connected layer with the Rectified Linear Unit (ReLU) activation function [38] predicts the expected distribution of users, after the observation window T, for a specific look-ahead temporal horizon N. The predictor is configured to minimize the Mean Square Error (MSE) loss function, which measures the difference between the ground truth and the predicted distribution of users [48], [65]. The Adam optimizer [67], with a learning rate of 0.001, is used to iteratively update the network weights.
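The per-cell count matrices are typically arranged into sliding windows of T observed frames and N target frames, matching the 5-D tensor shape (samples, time, rows, cols, channels) that 2-D ConvLSTM layers expect. A minimal sketch of this windowing step (array and function names are illustrative assumptions, not the authors' code):

```python
import numpy as np

def make_windows(frames, T, N):
    """Build (input, target) pairs for a ConvLSTM-style predictor.

    frames : array of shape (time, rows, cols) with per-cell user counts
    T      : observation window length
    N      : look-ahead horizon length
    Returns X of shape (samples, T, rows, cols, 1) and
            Y of shape (samples, N, rows, cols, 1).
    """
    X, Y = [], []
    for t in range(len(frames) - T - N + 1):
        X.append(frames[t:t + T])          # observed frames
        Y.append(frames[t + T:t + T + N])  # frames to be predicted
    X = np.asarray(X)[..., np.newaxis]     # add the trailing channel dimension
    Y = np.asarray(Y)[..., np.newaxis]
    return X, Y
```

Each resulting (X, Y) pair is one training sample: T past distribution matrices as input and the N future ones as the prediction target.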

V. PERFORMANCE EVALUATION
Herein, the performance of the conceived anticipatory network optimization scheme is evaluated by means of computer simulations. Without loss of generality, the study considers an autonomous driving use case (with the real mobility traces [52] described in Section IV-C and conceivable network and service settings [13], [53]-[59]). Of course, the whole approach can be applied to other use cases and heterogeneous scenarios by properly adapting the related parameter settings.
A real geographical area of 10 km² in Rome (Italy) is considered, divided into 10 square cells (i.e., |J| = 10). According to the autonomous driving use case, for the i-th user the communication bandwidth and the upper bound of the service latency are set to b_i = 700 Mbps and τ_i = 100 ms, respectively [54], the input data size is set to s_i = 5 Mbit [13], [56], and the memory and computational capability requirements are set to m_i = 16 GB [57] and c_i = 300 Megacycles [56], respectively. The available bandwidth within the cell covered by the j-th gNB and the capacity of the backhaul link between the j-th gNB and the m-th MEC server are set to B_j = 40 MHz [58] and r_jm = 10 Gbps, respectively. Since it is assumed, without loss of generality, that MEC servers are co-located with gNBs, |J| = |M| = 10 in the tests. The parameters of the MEC servers, whose sizing is a key issue in such systems, are adequately dimensioned with respect to the overall requests in each t_{k,n} [59]: the memory capability and the computing ability of the m-th MEC server are set to M^opt_m = 176 GB and F^opt_m = 36 Gigacycles/s, respectively. In the simulations, the upper bound of the communication latency experienced over the radio interface l^radio_ij(t_{k,n}) is considered, which means using constant values for l^radio_ij (i.e., 5 ms [53], [54]) and for the spectral efficiency e_ij (i.e., 30 bit/s/Hz [55]) in each t_{k,n}. The simulated period of time is 3600 s. Table III summarizes the main adopted parameters.
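With these settings, the latency budget of a single offloaded task can be checked directly. As an illustrative assumption (not stated in the paper), suppose the serving MEC server grants the user f_im = 3.6 Gigacycles/s, i.e., one tenth of F^opt_m:

```python
l_radio = 5e-3                 # radio latency upper bound [s], per [53], [54]
l_backhaul = 5e6 / 10e9        # s_i = 5 Mbit over r_jm = 10 Gbit/s (remote server)
l_exe = 300e6 / 3.6e9          # c_i = 300 Megacycles at an assumed f_im = 3.6 Gcycles/s
total = l_radio + l_backhaul + l_exe
print(total, total <= 100e-3)  # ~0.0889 s, within tau_i = 100 ms
```

The execution term dominates (about 83 ms), which illustrates why the computing capabilities enter the service latency constraint (4c) and why the sizing of F^opt_m is critical.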
In line with the dynamic programming approach [51], the optimal solution is found at each decision epoch t_k through the value iteration algorithm, implemented in Matlab. As anticipated in Section IV-B, the optimization problem P1 is solved by considering both the predicted number of users per cell over a specific temporal horizon, i.e., |Î^gNB_j(t_{k,n})|, obtained by using the actual distribution of users for n = 0 and the predicted number of users for the future time steps in the decision cycle t_{k,n}, and the perfect knowledge (ground truth) of users' distributions, i.e., |I^gNB_j(t_{k,n})|. These anticipatory mechanisms based on prediction and on the ground truth are denoted as P-C and GT-C, respectively. Note that the comparison between P-C and GT-C intends to highlight the effectiveness of the prediction procedure and its impact on the overall system performance. The behavior of both P-C and GT-C is studied for different temporal horizons, namely N = 5 s, 10 s, 20 s, and 40 s, to evaluate their effect on the key performance indicators. Moreover, to provide further insight, the anticipatory methods P-C and GT-C are compared with a Baseline approach, which just leverages the distribution of users at the current time instant t_k and allocates their requests to the closest MEC server (i.e., the one co-located with the gNB of the related cell), without solving any optimization problem or enforcing any constraints. Since the related literature lacks works that perform network resource optimization based on the prediction of the number of users (see Section II for further details), the proposed anticipatory optimization approach based on mobility prediction is compared with the anticipatory network optimization approach based on the ground truth and with the Baseline approach defined above.
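The value-iteration step used at each decision epoch can be sketched generically as a finite-horizon backward recursion with discount factor γ. This is a textbook sketch with placeholder states, actions, and costs, not the authors' Matlab implementation:

```python
def finite_horizon_plan(states, actions, cost, transition, N, gamma):
    """Backward value recursion over a look-ahead horizon of N steps.

    cost(s, a, n)    : immediate cost of action a in state s at step n
    transition(s, a) : next state reached from s under a (deterministic sketch)
    Returns the optimal first-step action for every state and the value table;
    step n is weighted by gamma**n, from gamma**0 = 1 down to gamma**N.
    """
    V = {s: 0.0 for s in states}     # terminal values V_N = 0
    policy = {}
    for n in range(N - 1, -1, -1):   # steps N-1, ..., 0
        V_new, pol_n = {}, {}
        for s in states:
            best = min(actions,
                       key=lambda a: cost(s, a, n) + gamma * V[transition(s, a)])
            pol_n[s] = best
            V_new[s] = cost(s, best, n) + gamma * V[transition(s, best)]
        V = V_new
        if n == 0:
            policy = pol_n           # only the first decision is executed at t_k
    return policy, V
```

At every epoch t_k the recursion is re-run with refreshed predictions, and only the n = 0 decision is actually applied, matching the receding-horizon control described in Section IV-B.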
Since the system configuration decision and the user latency are updated with a time granularity of 1 s, the latency constraint (4c) is continuously taken into account by P-C and GT-C, thus considering the impact of users' mobility, handovers, and possible service migrations among MEC servers. As a consequence, the proposed system model and the conceived optimization problem allow the whole service latency constraint to be successfully met. Note that the developed approach has been conceived for 5G and B5G networks; hence, it is possible to assume a 0 ms handover latency (namely, the mobility interruption time in 3GPP specifications) [68]. At the same time, this assumption does not influence the behavior of the proposed approach, because resources are optimally allocated at a much coarser time granularity than the mobility interruption time allowed in 5G (and beyond) deployments. For the same reason, the presented approach does not explicitly consider the virtual machine/container migration process. With a time granularity of 1 s, the Multi-access Edge Orchestrator, which optimally orchestrates the requested services and the available communication and computational resources, communicates with all the network entities. Therefore, any configuration changes (i.e., in the number of users and the related resources to be allocated) are known to the MEC servers through their interaction with the orchestrator. Moreover, the delay of task migration between MEC servers can be considered negligible in a vehicular context [69], such as the analyzed one; alternatively, exploiting the prediction information over the look-ahead temporal horizon N, the migration process can be triggered in advance, so that users do not experience any additional delay due to migration [70].
The measured key performance indicators comprise a complete analysis of the mobility prediction performance and of the latency per user, as well as the number of changes among MEC servers, the distribution of users among MEC servers, the consumed memory, and the CPU usage. The number of changes among MEC servers is included as a key performance metric because, in mobile scenarios, it is important to guarantee service continuity. Changes among MEC servers (and hence also the number of potential migrations) imply the establishment of new backhaul connections, with negative effects on the experienced latency. The backhaul connection quality also affects the computation execution [3]. Therefore, it is better to avoid changes among MEC servers and keep the connectivity between the user and the serving MEC host [3], [71]. Finally, the complexity of the proposed anticipatory network optimization scheme is also evaluated.
All the results reported below have been evaluated over a period of time (i.e., decision epochs) equal to 3600 s, with a time granularity of 1 s, and have been obtained by averaging the outcomes over the 3600 realizations. Together with the average values, the 95% confidence intervals, computed through the Gaussian statistical distribution, are reported for the spatial characterization. For the characterization over the 3600 s, the Cumulative Distribution Functions (CDFs) illustrate only P-C and Baseline, because the GT-C trends overlap with those of P-C and are omitted for the sake of clarity.
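As a reference, the Gaussian 95% confidence interval used for the spatial characterization can be sketched as follows (mean ± 1.96·σ/√n, with σ the sample standard deviation); the input samples below are illustrative.

```python
import math

def confidence_interval_95(samples):
    """Return (mean, half_width) of the Gaussian 95% confidence interval."""
    n = len(samples)
    mean = sum(samples) / n
    # Bessel-corrected sample variance.
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    # 1.96 is the two-sided 95% quantile of the standard normal distribution.
    half_width = 1.96 * math.sqrt(var / n)
    return mean, half_width
```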

A. Mobility prediction performance
Regarding the prediction procedure (integrated within the P-C scheme), the conceived mobility prediction model exploits the ConvLSTM architecture, as described in Section IV-C. To provide further insight, a comparison with a state-of-the-art mobility prediction approach, which uses the LSTM architecture aided by the attention mechanism for capturing long-range dependencies [34], is presented as well. In particular, the reference learning architecture selected for the cross-comparison comprises four LSTM layers (i.e., two with 200 and two with 100 hidden units, respectively, each followed by a batch normalization layer) in order to have a complexity comparable to that of the corresponding ConvLSTM architecture. Note that the mobility prediction architecture needs a different number of training parameters for each considered temporal horizon N, as summarized in Table IV. Fig. 5 shows the prediction loss (i.e., MSE) of the ConvLSTM architecture and of the LSTM architecture with attention as a function of the number of epochs for the training set and the validation set, representing 80% and 20% of the adopted dataset, respectively. The reported curves confirm that the developed ConvLSTM architecture reaches lower MSE values with respect to the LSTM architecture with attention. Moreover, differently from the LSTM architecture with attention, the ConvLSTM architecture quickly converges to stable values and does not need a long training period. Accordingly, the ConvLSTM architecture trained for 30 learning epochs, which achieves better results in terms of prediction loss and convergence time/complexity, is considered hereafter.
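For clarity, the MSE prediction loss compared in Fig. 5 (and, per cell, in Fig. 6) reduces to the following computation on predicted versus actual per-cell user counts; the sequences here are illustrative, as the paper obtains the predictions through the ConvLSTM model.

```python
def mse(predicted, actual):
    """Mean squared error between two equally long sequences of user counts."""
    assert len(predicted) == len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```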
To deeply evaluate the mobility prediction performance in the investigated scenario, Fig. 6 reports the MSE values for each j-th cell, registered with the different temporal horizons N. Cells j = 3 and j = 8, with the highest number of users (see the trend of |I_3^gNB| in Fig. 3), reach the greatest values of prediction loss. Moreover, the MSE value tends to increase with N, as expected. In fact, the highest values of MSE are registered for N = 40 s, even if in this case the MSE is generally lower than 4.

B. Latency per user
Fig. 7 depicts the average latency per user served by each MEC server. The average latency per user registered by both the P-C and GT-C schemes is always lower than the maximum tolerable latency τ_i. Furthermore, the proposed optimization approaches, involving load balancing among MEC servers, keep a nearly stable and uniform latency throughout the network. Thus, they improve both the computation efficiency of the MEC servers, by avoiding overloaded servers, and the user experience, by balancing the MEC server loads and always satisfying the service latency requirements [6], [59]. On the contrary, lacking these mechanisms, the Baseline scheme registers an average latency per user that exceeds the maximum tolerable latency τ_i in highly loaded cells.
The CDFs of all the latencies per user reported in Fig. 8 thoroughly confirm that only P-C always ensures the service latency requirements for all the users in the network, differently from the Baseline approach. To deeply analyze the impact of the horizon N, the values of the average latency per user among MEC servers for P-C with each analyzed horizon are reported. They are equal to 83.2 ms for N = 5 s, 81.1 ms for N = 10 s, 82.7 ms for N = 20 s, and 83.6 ms for N = 40 s. The obtained results clearly reveal that from N = 5 s to N = 10 s the average latency per user among the different MEC servers decreases, while it tends to increase for longer horizons, registering the highest value for N = 40 s. To conclude, N = 10 s is a suitable optimization horizon because of its slightly lower latency values.

C. Changes among MEC servers
Fig. 9 shows the CDFs of the number of changes among MEC servers. The reported curves demonstrate that the presented proposal generally achieves the highest performance levels: only 28.31% of realizations register zero changes for the Baseline scheme, whereas the proposed approach presents around 90% of samples with zero changes. Focusing on the horizon N, the performance improves from N = 5 s to N = 10 s, while it tends to degrade for longer horizons. In fact, when N = 10 s, the number of changes is always lower and the value of the 95th percentile is 4 changes, compared with the 7, 5, and 9 changes registered for N = 5 s, N = 20 s, and N = 40 s, respectively. Thus, increasing the number of future steps considered in the optimization problem P1 (i.e., from N = 5 s to N = 10 s) reduces the number of changes among MEC servers. However, because of higher variability, for longer temporal horizons (i.e., N = 20 s and N = 40 s) the anticipatory network optimization approach starts to imply a higher number of changes among MEC servers, with the highest number reached for N = 40 s. The 95th percentile of the presented approach with N = 10 s also outperforms the Baseline (i.e., 5 changes). In summary, this analysis further confirms that N = 10 s is a suitable optimization horizon, since it minimizes both the average user latency and the number of changes among MEC servers.

D. Distribution of users among MEC servers
Fig. 10 shows the average number of users served by each MEC server. Both anticipatory network optimization methods (P-C and GT-C) are able to fairly distribute users' demands among the different MEC servers, regardless of the gNB with which they are co-located. Moreover, since the ConvLSTM architecture achieves very high prediction performance, P-C behaves very similarly to GT-C. They exhibit exactly the same behavior for MEC servers co-located with gNBs serving a high number of users (e.g., j = 3), which are fully used under the memory and computing constraints. Also when varying the temporal horizon N, P-C and GT-C achieve a very similar average number of users served by each MEC host. Instead, the Baseline approach is deeply biased by the distribution of users among cells: its policy keeps each user at the MEC server co-located with the gNB to which the user is attached. Fig. 11 illustrates the CDFs of the number of users served by the different MEC servers. It further confirms the extremely high similarity among the trends of P-C with different N, which behave differently from the Baseline scheme characterized by higher variability.
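The empirical CDFs and percentiles reported throughout these subsections (e.g., the 95th percentile of the number of MEC server changes in Fig. 9) can be computed from the per-realization samples as in the following sketch; the sample values below are illustrative, not the paper's data.

```python
def empirical_cdf(samples):
    """Return sorted sample values and their cumulative probabilities."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def percentile(samples, q):
    """Smallest sample value whose empirical CDF reaches q (0 < q <= 1)."""
    xs, probs = empirical_cdf(samples)
    for x, p in zip(xs, probs):
        if p >= q:
            return x
    return xs[-1]
```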
E. Amount of memory consumed by MEC servers
Fig. 12 represents the average values of the memory consumed by each MEC server. Also in this case, both the P-C and GT-C methods, with the different horizons N, well balance the load among the MEC servers. This result confirms the fairness property investigated in the previous subsection. In highly loaded cells, the MEC servers co-located with the gNBs saturate their available memory. As a consequence, the proposed approaches redirect some of the requests generated within these cells towards other MEC servers, thus always satisfying the constraint reported in (4a). Instead, the consumed memory M_m^cons for the Baseline, which has no memory constraints, reaches an average value of around 400 GB for MEC servers corresponding to cells with a high number of users (e.g., m = j = 3), as demonstrated by the reported results.
As an additional confirmation, the CDFs reported in Fig. 13 show that the Baseline approach registers a peak memory usage of around 600 GB. On the contrary, the anticipatory optimization scheme developed in this work guarantees a well-balanced distribution of the memory consumed across the available MEC servers, which always remains below the target upper bound.

F. CPU usage of MEC servers
According to the definition in (3) and the related constraint (4b) of the formulated optimization problem P1, the CPU capability of each MEC server is completely consumed in all the implemented approaches. Note that the computing capability of the m-th MEC server in the optimization problem P1, i.e., F_m^opt = 36 Gigacycles/s, is adequately dimensioned with respect to the overall requests and is low compared to typical values of MEC server capability. In fact, these can exceed 1000 Gigacycles/s [72] and, under that assumption, the vast majority of the available CPU resources could be dedicated to other services and purposes.
Of course, the CPU capability of the MEC servers affects the execution latency experienced by each user, which is the most significant latency component. In fact, because of the computing sizing of the MEC servers (i.e., F_m^opt), the average total latency per user achieved by the optimization approach is generally close to the maximum tolerable latency τ_i (as detailed in the next subsection), which validates the hypothesis of considering the radio component as constant. Without load balancing among MEC servers, the same computing capabilities (i.e., F_m = F_m^opt) are not sufficient to always satisfy the upper bound of the service latency τ_i in the Baseline case. In particular, the average number of CPU cycles/second allocated by the m-th MEC server to the i-th user, i.e., f̂_im, is generally lower for the Baseline compared to P-C and GT-C, as demonstrated in Fig. 14. Therefore, the Baseline case has a higher execution latency component because of the lower values of f̂_im. Furthermore, since f̂_im is inversely related to the number of users served by the m-th MEC server, |I_m^MEC|, the Baseline scheme registers the lowest and the highest values of f̂_im for MEC servers co-located with gNBs serving a high and a low number of users, respectively.
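To illustrate why the execution component dominates, the following sketch assumes the per-user latency decomposes into a constant radio component, the backhaul transfer of the input data, and the execution time on the serving MEC server, using the values from Table III; the exact latency model is the one defined earlier in the paper, and the per-user CPU allocation f_im passed in here is a hypothetical value.

```python
def total_latency_ms(f_im_hz,
                     l_radio_ms=5.0,      # constant radio latency [53], [54]
                     s_i_bits=5e6,        # input data size, 5 Mbit
                     r_jm_bps=10e9,       # backhaul link capacity, 10 Gbps
                     c_i_cycles=300e6):   # task size, 300 Megacycles
    """Assumed per-user latency: radio + backhaul transfer + execution."""
    backhaul_ms = s_i_bits / r_jm_bps * 1e3      # 0.5 ms with Table III values
    execution_ms = c_i_cycles / f_im_hz * 1e3    # dominant component
    return l_radio_ms + backhaul_ms + execution_ms
```

For instance, if F_m^opt = 36 Gigacycles/s were shared evenly among 11 users (the most a server can host under the 176 GB / 16 GB memory sizing), the total would be roughly 97 ms, just within τ_i = 100 ms; this is consistent with the execution latency being the most significant component and with the average latencies of about 81-84 ms reported above.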
The related CDFs reported in Fig. 15 clearly illustrate the similar behavior of P-C with the different horizons N, which generally exhibits higher values of f̂_im with respect to the Baseline. Moreover, this figure confirms that the maximum possible value of the number of CPU cycles per second allocated by the m-th MEC server to the i-th user (i.e., f_im) is the CPU capability of the MEC server, F_m^opt.

G. Complexity analysis
Despite the overall better performance reported in the above subsections, introducing the anticipatory methods increases the complexity of the network management system, essentially due to finding the solution to the optimization problem P1. In Table V, P-C and Baseline are compared in terms of the average running time of each decision epoch t_k and the average total number of objective function evaluations needed for each decision epoch t_k; the latter characterizes only P-C, solved through value iteration, and is equal to Σ_{j∈J} |I_j^gNB(t_k,n)| · |M| · N [73]. GT-C is omitted because the cost of its optimization process is analogous to that of P-C. However, it is highlighted here that the mobility prediction model required by P-C needs an extra training phase, which nevertheless converges quickly (Section V-A). Without the optimization problem, the Baseline has a far lower running time, because it does not implement any controls and does not anticipatorily evaluate the user distributions. For P-C, it is evident that the complexity increases with N: the larger the look-ahead horizon N, the deeper into the future the objective function of each k-th optimization problem looks (i.e., by considering N steps ahead in each decision cycle t_k,n). Thus, P-C with N = 5 s has the lowest average running time and the lowest number of objective function evaluations per decision epoch, intermediate values are reached for N = 10 s and N = 20 s, and P-C with N = 40 s registers the highest complexity. Again, N = 10 s offers the best trade-off between performance and complexity. Note that the simulations have been executed on an Intel Core i5 quad-core CPU with 8 GB of RAM; the running time would be greatly reduced on a powerful machine with a GPU, improving the efficiency of the proposed approach [74], [75]. In particular, a GPU server is at least 4-5 times faster than a CPU server (with 16/24 cores) [75].
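The objective function evaluation count stated above, Σ_{j∈J} |I_j^gNB(t_k,n)| · |M| · N, can be sketched as follows; the per-cell user counts in the example are illustrative.

```python
def objective_evaluations(users_per_cell, num_mec_servers, horizon_n):
    """Objective evaluations per decision epoch for value iteration on P1:
    total number of users, times the number of MEC servers |M|, times the
    look-ahead horizon N (in steps)."""
    return sum(users_per_cell) * num_mec_servers * horizon_n
```

This expression makes the linear growth with N explicit, matching the trend observed in Table V.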
The significant gain obtained by using a more powerful machine makes the running time not only comparable to, but much lower than, the duration of a decision epoch of the optimization algorithm. The effectiveness can be further enhanced (i.e., the running time further shortened) by using a GPU server with richer features [75], such as those actually used by network operators.
It is remarked here that the encouraging results achieved by the proposed anticipatory network optimization approach open future research directions aimed at decreasing the computational complexity of the proposed dynamic-programming-based solution while maintaining the same performance. To this aim, solutions based on deep reinforcement learning [76] and distributed training [77] appear to be interesting areas for further exploration.

VI. CONCLUSIONS
This work presented a novel methodology for anticipatorily allocating communication and computational resources at the network edge over different look-ahead temporal horizons. Specifically, a Convolutional Long Short-Term Memory has been used to predict the number of users served within a given number of cells and their related service demands, and Dynamic Programming has been exploited to optimally allocate users' requests among Multi-access Edge Computing servers, for better managing task offloading within a network slice created in a 5G system. Focusing on the autonomous driving use case, computer simulations demonstrated that the proposed solution is able to fairly distribute users' requests at the network edge, while satisfying communication and computational constraints, as well as ensuring the upper bound of the communication latency expected for the considered service. Future research activities will further extend the presented study by considering more complex network scenarios with heterogeneous services sharing a variable amount of resources at the network edge, energy consumption, and realistic Beyond 5G deployments.

Fig. 15. CDF of the average number of CPU cycles/second per user for P-C and Baseline.