Optimal Resource Allocation in C-RAN through DSP Computational Load Forecasting

The Cloud-RAN (C-RAN) paradigm is envisioned to increase the efficiency of future mobile networks by moving the computational resources needed at the Remote Radio Heads (RRH) to the cloud infrastructure. In this work, we provide a framework that optimizes the number of allocated virtual resources by considering both the computational requirements of the RRHs and the Quality of Service of users, who could experience loss of service due to reassociations between the RRHs and the virtual machines. The optimization framework is supported by data coming from a real mobile network of a middle-sized European city, from which we estimate the computational loads generated by the RRHs. We evaluate the performance of the framework in different scenarios, analyzing the impact of different forecasting algorithms as well as different look-ahead intervals for the predictions (short-term / long-term). The results obtained by our framework can be used to assist network operators in the optimization of C-RAN resources and shed some light on the interplay between forecasting errors and overall performance.


I. INTRODUCTION
The emerging 5G-and-Beyond networks will strongly rely on the concept of Network Function Virtualization (NFV), enabling network operators to move all network functions, pertaining to both the Core and the Radio Access Network (RAN), into the Cloud. Indeed, the so-called Cloud-RAN (C-RAN) architecture envisions the aggregation of the computational resources of all Base Stations (BS) in a centralized baseband unit (BBU) pool, which is connected to a densely distributed set of Remote Radio Heads (RRH) through high-capacity links [1]. Such an approach makes it possible to dynamically and optimally allocate computational resources to the individual RRHs, whose hardware design is much simpler than that of legacy LTE eNodeBs. Compared to traditional architectures, the C-RAN approach is envisioned to lower overall hardware costs and energy consumption, while at the same time increasing spectral efficiency and network resource utilization [2]-[5].
However, the actual realization of the C-RAN vision is not without its challenges: first, the complexity of the front-haul network (the network portion connecting the RRHs with the BBU pool) is increasing due to ever-growing bandwidth requirements. For the same reason, the timing requirements of the Digital Signal Processing (DSP) functions (e.g., frame (de)modulation, (de)coding, and IFFT/FFT) to be implemented at the BBU are increasingly tight. It follows that performing a paradigm shift in which such DSP functions are executed on General Purpose Processing (GPP) hardware in the Cloud, rather than on specialized hardware, is a complex operation which may be successfully completed only by (i) properly characterizing the computational requirements needed at the BBU and (ii) being able to accurately forecast them, so that optimal dynamic reallocation (up/down scaling) of the virtual resources can be accomplished.
This paper focuses exactly on these two aspects: first, the computational load required for executing the decoding functions of a real cellular network is characterized, starting from a dataset of measurements related to the radio resources, such as the Modulation and Coding Scheme (MCS) and the used Physical Resource Blocks (PRB). The characterization is performed relying on the Open Air Interface (OAI) emulator, which allows us to accurately evaluate the computational load needed for frame decoding. Then, the obtained computational load requirements are predicted using different forecasting models and prediction look-ahead intervals, in order to thoroughly characterize the prediction error. Finally, the results of these two steps are used to cast a robust optimization problem which minimizes the number of BBU resources needed (i.e., the total cost for the network operator), while ensuring enough computational resources for in-time decoding. The problem is solved over multiple time intervals, also taking into account possible re-associations between RRHs and BBU pools, which may impact the users' Quality of Service. The resulting framework can be used to assist network operators in the optimization of C-RAN resources, also giving insights on the interplay between forecasting look-ahead intervals, prediction errors, and overall performance.
The remainder of this paper is organized as follows: Section II reviews related works in the literature; Section III gives an overview of the system model, while the optimization problem is formulated in Section IV. The characterization of the computational load and the related forecasting insights are given in Section V and Section VI, respectively. Section VII reports on the performance results obtained, and Section VIII concludes the paper and highlights future research directions.

II. RELATED WORK

Several works in the literature deal with the computational requirements of DSP functions in C-RAN architectures. In particular, some works focus primarily on obtaining sped-up implementations through the use of GPUs and multi-core CPUs [6] or parallel architectures [7], which can be utilized in Cloud scenarios. At the same time, several works analyze and emphasize the computational complexity of frame decoding, which is directly influenced by several network-related parameters, such as the Modulation and Coding Scheme (MCS), the number of Physical Resource Blocks (PRB) to be processed at the same time, and the current Signal-to-Noise Ratio (SNR), as well as by Cloud-related parameters such as the CPU frequency and the number of cores of the particular virtual machine used for decoding. The work in [8] gives an excellent review of the related work in the area of C-RAN complexity requirements characterization.
In order to perform realistic, repeatable and scalable experiments in such a scenario, Nikaein et al. propose Open Air Interface (OAI), an open-source reference software implementation of 3GPP-compliant LTE/LTE-A systems [9]. The framework has been used in several works related to the modeling and analysis of the computational requirements of DSP decoding functionalities in Cloud environments. As an example, [10] proposes the CloudIQ resource management framework, using OAI to implement and demonstrate the solution in a realistic scenario. The authors show that C-RAN architectures can potentially save as much as 22% in computing resources compared to legacy approaches, by exploiting the variations in the processing load across base stations. In [11], virtual DSP functions are implemented on OAI, and different virtualization technologies (e.g., virtual machines, containers) are compared in terms of total computing times for different values of MCS and PRB. It is demonstrated that the processing requirements are dominated by uplink decoding and can be estimated accurately as a function of the PRB, the MCS and the virtualization environment. The OAI framework is also leveraged in [12] to characterize the computational energy consumed by a BBU pool. The authors then cast a resource allocation problem to minimize the number of active virtual machines and therefore obtain energy savings.
Several works focus on optimizing the association between RRHs and the BBU pool with the goal of either reducing power consumption [13]-[15] or the total system cost [16]. In [13], the DSP requirements of RRHs from two template cells (business/residential areas) of a real mobile network are split into tasks (decoding, modulation, FFT/IFFT), and each task is allocated to a different BBU so that the total power consumption is minimized. The problem is solved through a simulated annealing heuristic, showing power consumption savings between 5% and 20% compared to a static, non-virtualized architecture. In [16], the authors focus on minimizing the total system cost, letting each UE connect to multiple VMs in the BBU pool and considering limited fronthaul capacity. Each VM in the system is modeled as a FIFO queue, and the resulting problem is solved optimally with efficient search algorithms. However, all system parameters are set to arbitrary values, and no realistic datasets are used. Finally, Boulos et al. in [15] focus on RRH-BBU association optimization. The problem is formulated as a bi-objective problem, minimizing both the power consumption and, similarly to our work, the total number of RRH re-associations to a new BBU. The problem is solved using a heuristic derived from the bin-packing literature. Again, simulation parameters are chosen arbitrarily, without leveraging realistic network datasets. To the best of our knowledge, this is the first work that studies the impact of DSP computational load forecasting on optimal DSP resource allocation using a realistic LTE network dataset.

III. PROBLEM OVERVIEW
We consider a C-RAN network consisting of N RRHs connected to a BBU pool formed by M virtual machines (VM). Each VM is responsible for executing the DSP functions of one or more associated RRHs, while each RRH may be connected to only one VM (i.e., M ≤ N). At any instant, only a subset of m ≤ M VMs is active, providing enough computational resources to serve all RRHs. It is well known that, among the different DSP functions, frame decoding is the most computationally intensive one [11]. In this paper, we therefore assume that the requirements of each RRH are dominated by the decoding operation, whose computational complexity is determined by the number of used PRBs, the MCS distribution (i.e., the number of bits carried by each PRB) and the SNR. Furthermore, we treat the scheduling algorithm controlling the RRH (i.e., deciding how many PRBs to allocate and the corresponding MCS distribution) as a black box, and we assume that our framework is able to observe only the output of such a black box. This scenario closely reflects the knowledge available at the operator side, which generally cannot modify the operational details of the scheduler, but can observe the network KPIs resulting from its use.
We assume that the network operator relies on a third-party platform for managing and running the VMs, paying a price for the service which depends on how many VMs are active and for how long. The main objective of the operator is to minimize the total cost of the virtualization service, which requires forecasting the computational requirements of all the RRHs in order to activate/deactivate VMs accordingly. The process is subject to two main constraints, both related to the Quality of Service (QoS) perceived by the users of the network:
1) In-time decoding: frame decoding for each RRH must be completed by the BBU within stringent time requirements (the typical HARQ loop lasts a few ms in LTE [11], [17]). Any delay in the process due to under-provisioning of the virtual resources may lead to the expiration of the corresponding timeouts and trigger frame retransmissions at the user side, thus decreasing QoS.
2) BBU-RRH reassociation: upon sudden peaks in the computational requirements, or upon any change in the number of active VMs, some of the RRHs will need to re-associate to a new VM in the BBU pool. Depending on how such a reassociation is handled, the process may cause all users connected to the RRH to experience a loss of QoS. Such an issue should be considered by the optimization framework.

IV. PROBLEM FORMULATION
The problem described in the previous section can be formalized as a bin packing problem. Time is divided into discrete epochs of arbitrary length: at each epoch t, the computational requirements of all RRHs are packed into the smallest possible number of VMs.

A. Decision variables
Let $x_{i,j,t}$ be a binary variable defined as:

$$
x_{i,j,t} =
\begin{cases}
1 & \text{if RRH } i \text{ is associated to VM } j \text{ at epoch } t\\
0 & \text{otherwise}
\end{cases}
\qquad (1)
$$

Since each RRH must be associated to exactly one VM in each epoch, we have that:

$$
\sum_{j=1}^{M} x_{i,j,t} = 1 \qquad \forall i, t \qquad (2)
$$

At each epoch, a VM in the BBU pool can be active (i.e., associated to at least one RRH) or inactive. Let $y_{j,t}$ be a binary variable tracking the state of each VM in each epoch, defined as:

$$
y_{j,t} =
\begin{cases}
1 & \text{if } \sum_{i=1}^{N} x_{i,j,t} \geq 1\\
0 & \text{otherwise}
\end{cases}
\qquad (3)
$$

It is easy to show that (3) can be rewritten using the two following linear constraints:

$$
y_{j,t} \geq x_{i,j,t} \qquad \forall i, j, t \qquad (4)
$$

$$
y_{j,t} \leq \sum_{i=1}^{N} x_{i,j,t} \qquad \forall j, t \qquad (5)
$$

B. QoS-related constraints
At the beginning of each epoch t, the operator forecasts the computational load $c_{i,t}$ (expressed in number of CPU cycles) required for decoding the frames coming from the i-th RRH and uses it to allocate VMs properly. As mentioned above, two constraints related to the QoS perceived by users should be considered.
1) In-time decoding: frame decoding has strict time requirements, which must be satisfied in order to avoid QoS degradation. Let d be the deadline (in seconds) for decoding frames at the BBU pool, and b the computational budget (i.e., the CPU frequency) of each VM (in Hz). To ensure that all frames are decoded within the deadline by the BBU pool at epoch t, we may write:

$$
\sum_{i=1}^{N} c_{i,t} \, x_{i,j,t} \leq b \cdot d \qquad \forall j, t \qquad (6)
$$

For simplicity, here we assume that all VMs have the same computational budget b, although the model can be easily generalized to the case where VMs have different CPU frequencies.
2) RRH-BBU reassociation: due to load variations, an RRH may be associated to different VMs in two consecutive epochs. During a reassociation, all users connected to the RRH may experience a loss of QoS, e.g., due to forced handovers to other RRHs. Let $r_{i,j,t}$ be a binary variable defined as:

$$
r_{i,j,t} =
\begin{cases}
1 & \text{if } x_{i,j,t} \neq x_{i,j,t-1}\\
0 & \text{otherwise}
\end{cases}
\qquad (7)
$$

Again, $r_{i,j,t}$ can be formalized with the help of the following linear constraints:

$$
r_{i,j,t} \geq x_{i,j,t} - x_{i,j,t-1} \qquad \forall i, j, t \qquad (8)
$$

$$
r_{i,j,t} \geq x_{i,j,t-1} - x_{i,j,t} \qquad \forall i, j, t \qquad (9)
$$

Rather than being expressed as a hard constraint, RRH-BBU reassociations are managed through a penalty term in the objective function, as explained next.

C. Objective function
Let p be the per-VM price paid by the operator to the third party for managing the virtualization service in each epoch (e.g., hourly or every 15 minutes). We assume that the operator is able to forecast future RRH loads in a time window T composed of several epochs (i.e., t = 1 . . . T). The operator's goal is to minimize the following objective function:

$$
\min \sum_{t=1}^{T} \left( p \sum_{j=1}^{M} y_{j,t} + \frac{\alpha}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} r_{i,j,t} \right) \qquad (10)
$$

where the first term captures the total cost of running the VMs, the second term is a penalty introduced to limit the number of RRH-BBU reassociations, and α is a scaling factor that can be used to balance the two terms and prioritize one over the other. The factor 1/2 is added to avoid counting twice a reassociation of an RRH from one VM to another (which activates two variables $r_{i,j,t}$). Note that here we assume the cost p to be time-invariant, although the model could easily be generalized to the case where p changes over time.
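For concreteness, the following minimal sketch shows how formulation (1)-(10) can be expressed with the Gurobi Python interface (the solver also used in Section VII). The helper name allocate and its argument layout are our own illustrative choices, not part of the formulation.

```python
import gurobipy as gp
from gurobipy import GRB

def allocate(c, b, d, p, alpha, M, x_prev=None):
    """Solve the VM allocation problem over T epochs.
    c: N x T matrix of (forecast) computational loads [CPU cycles]
    b: per-VM budget [Hz]; d: decoding deadline [s]; p: per-VM price
    alpha: reassociation penalty weight
    x_prev: optional N-vector of VM indexes before epoch 0
    """
    N, T = len(c), len(c[0])
    m = gp.Model("cran")
    # x[i,j,t] = 1 iff RRH i is served by VM j at epoch t    -- eq. (1)
    x = m.addVars(N, M, T, vtype=GRB.BINARY, name="x")
    # y[j,t] = 1 iff VM j is active at epoch t               -- eq. (3)
    y = m.addVars(M, T, vtype=GRB.BINARY, name="y")
    # r[i,j,t] = 1 iff association (i,j) changed at epoch t  -- eq. (7)
    r = m.addVars(N, M, T, vtype=GRB.BINARY, name="r")

    # Each RRH is served by exactly one VM per epoch          -- eq. (2)
    m.addConstrs(x.sum(i, "*", t) == 1 for i in range(N) for t in range(T))
    # A VM is active iff it hosts at least one RRH       -- eqs. (4)-(5)
    m.addConstrs(y[j, t] >= x[i, j, t]
                 for i in range(N) for j in range(M) for t in range(T))
    m.addConstrs(y[j, t] <= gp.quicksum(x[i, j, t] for i in range(N))
                 for j in range(M) for t in range(T))
    # In-time decoding: per-VM load fits the budget           -- eq. (6)
    m.addConstrs(gp.quicksum(c[i][t] * x[i, j, t] for i in range(N)) <= b * d
                 for j in range(M) for t in range(T))
    # Linearization of the reassociation indicator       -- eqs. (8)-(9)
    for i in range(N):
        for j in range(M):
            for t in range(T):
                if t == 0:
                    prev = 1 if (x_prev is not None and x_prev[i] == j) else 0
                else:
                    prev = x[i, j, t - 1]
                m.addConstr(r[i, j, t] >= x[i, j, t] - prev)
                m.addConstr(r[i, j, t] >= prev - x[i, j, t])

    # Objective (10): VM cost plus reassociation penalty
    m.setObjective(p * y.sum() + (alpha / 2.0) * r.sum(), GRB.MINIMIZE)
    m.optimize()
    return m
```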

V. LOAD CHARACTERIZATION

A. Dataset
In order to characterize the loads $c_{i,t}$, we leverage a dataset containing measurements from 443 LTE base stations deployed in a middle-sized European city, all working at a frequency of 2100 MHz with a channel bandwidth of 10 MHz. For each base station, two weeks of data sampled at 15-minute intervals are available. We focus on two specific measurements, namely the uplink MCS distribution and the uplink PRB utilization distribution. The former reports the distribution of the uplink MCS assigned by the scheduler in each 15-minute interval, while the latter tracks the distribution of the used uplink radio resources (i.e., how many of the 50 PRBs available in the 10 MHz channel are allocated to users) in each interval. Such distributions are pre-processed in order to obtain two single-valued time series, by picking the maximum MCS and PRB utilization values that occurred within each time interval. We prefer such a conservative approach over using the distribution mean, in order to ensure that the worst-case scenario in terms of computational load is considered. Fig. 1 shows the MCS and PRB traces of one base station in the dataset over one week.
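As an illustration, the following sketch reproduces this pre-processing step with pandas; the file name and column layout (one count column per MCS index / PRB value) are hypothetical, since the actual dataset format is not public.

```python
import pandas as pd

# Hypothetical layout: one row per base station and 15-minute interval,
# with columns mcs_0..mcs_28 and prb_0..prb_50 counting how often each
# MCS index / PRB utilization value was observed in that interval.
def to_worst_case_series(df, prefix, n_values):
    cols = [f"{prefix}_{v}" for v in range(n_values)]
    # For each interval, pick the largest value with a non-zero count:
    # a conservative, worst-case single-valued summary of the distribution.
    def max_observed(row):
        observed = [v for v, col in enumerate(cols) if row[col] > 0]
        return max(observed) if observed else 0
    return df[cols].apply(max_observed, axis=1)

df = pd.read_csv("lte_kpi.csv")                    # hypothetical file name
mcs_series = to_worst_case_series(df, "mcs", 29)   # MCS indexes 0..28
prb_series = to_worst_case_series(df, "prb", 51)   # 0..50 PRBs (10 MHz)
```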

B. Emulation with OAI
The MCS and PRB traces are fed to the Open Air Interface (OAI) platform, which allows emulating the full 3GPP LTE protocol stack, including highly optimized baseband processing functionalities such as turbo decoding. In particular, we leverage the ulsim tool of OAI, which emulates the Physical Uplink Shared Channel (PUSCH) decoding pipeline at the eNodeB. The tool allows specifying several input parameters, including the sub-frame MCS, the current PRB load (i.e., how many resource blocks are allocated to the user), channel parameters such as bandwidth and SNR, as well as other options (e.g., the number of iterations of the turbo decoder). For a given input configuration, the tool emulates the decoding process and keeps track of the corresponding CPU time t. Knowing the CPU frequency f, it is possible to express the computational load for decoding as c = f · t. Fig. 2 shows the computational load in Million Operations (MOP), obtained with OAI at different MCS and PRB working points, averaged over 1000 sub-frames. Due to the lack of fine-grained SNR measurements in our dataset, simulations were run setting the lowest SNR value allowed for each MCS configuration, which corresponds to the working point with the highest computational load, as demonstrated in [8], [18]. All tests were performed on a single core of an Intel(R) Xeon(R) CPU E5-1660 v3 @ 3.00GHz with 16 GB of RAM, running Ubuntu 16.04, using an AWGN channel with a bandwidth of 10 MHz (as in our dataset) and setting the number of iterations of the decoder to 4.
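A minimal sketch of how such measurements can be turned into a load estimator for arbitrary (MCS, PRB) samples is given below; the grid values are illustrative placeholders, not the actual measurements of Fig. 2.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# OAI ulsim working points: CPU time (seconds) per sub-frame, averaged
# over 1000 runs. The numbers below are placeholders; the real values
# come from the measurement campaign summarized in Fig. 2.
mcs_grid = np.array([0, 8, 16, 24, 28])
prb_grid = np.array([1, 10, 25, 50])
cpu_time = 1e-4 * np.outer(1 + 0.1 * mcs_grid, 1 + 0.05 * prb_grid)

F_CPU = 3.0e9  # frequency of the emulation machine [Hz]

# c = f * t: convert CPU time to operations (here in MOP), then
# interpolate between working points to cover every (MCS, PRB) pair.
load_mop = RegularGridInterpolator((mcs_grid, prb_grid),
                                   F_CPU * cpu_time / 1e6)

trace = np.array([[16, 25], [24, 40], [28, 50]])  # (MCS, PRB) samples
print(load_mop(trace))  # estimated per-epoch loads in MOP
```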

C. Estimation of RRH computational load
The curves in Fig. 2 are used to compute a load profile for each base station in the dataset, starting from the MCS and PRB traces described in Section V-A. We observe that different base stations are characterized by different load profiles. Therefore, we perform k-means clustering over the traces, choosing the best k using both the Silhouette and the Davies-Bouldin cluster quality indexes. We reveal the existence of 3 distinct clusters, shown in Fig. 3, where each cluster centroid is represented by the red bold line. As can be observed, all clusters are characterized by the day/night fluctuation typical of mobile traffic. However, the first cluster is mainly composed of base stations characterized by a constant, high load throughout the whole week, with a very small decrease during nights. In the second cluster, the load during nights is approximately halved, while the third cluster is composed of cells that are completely offloaded during nights. Among the 443 base stations in the dataset, 48% belong to the first cluster, 9% to the second, and 43% to the third. We refer to these three clusters as High Load (H), Moderate Load (M) and Low Load (L), respectively.
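The cluster-count selection can be sketched as follows with scikit-learn; the input array and file name are hypothetical, with one row per base station and one column per 15-minute epoch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# loads: one load profile per row (in our case, 443 base stations).
loads = np.load("load_profiles.npy")  # hypothetical file name

for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(loads)
    # Silhouette: higher is better; Davies-Bouldin: lower is better.
    print(k,
          silhouette_score(loads, labels),
          davies_bouldin_score(loads, labels))
```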

VI. LOAD FORECASTING

A. Look-ahead interval
The optimization problem proposed in Section IV requires as input the loads $c_{i,t}$ for the entire time window T. We assume that load forecasting is performed up to a certain prediction horizon, which we refer to as the look-ahead interval L. As an example, if each epoch t lasts 15 minutes and L = 4, forecasting is performed to obtain load values up to 1 hour in the future. Note that L controls how many times the optimization problem needs to be solved. When L = T, the forecasting algorithms predict all values $c_{i,t}$ and only one instance of the problem is solved. When L < T, the optimization problem is solved T/L times, each time considering only the L available predicted loads. In this latter case, we substitute the term T in (10) with L, and the variables $r_{i,j,0}$ at the beginning of each look-ahead interval (except for the very first one) are stored from the previous interval and passed as input parameters.
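This rolling-horizon procedure can be sketched as follows; allocate() refers to the illustrative MILP sketch of Section IV, forecast() stands for any of the predictors sketched in the next subsection, and final_associations() is a hypothetical helper that reads the solved model back.

```python
def final_associations(model, N, M, L):
    # Read the last-epoch associations x[i,j,L-1] back from the solved
    # model (variable names follow the allocate() sketch of Section IV).
    return [next(j for j in range(M)
                 if model.getVarByName(f"x[{i},{j},{L - 1}]").X > 0.5)
            for i in range(N)]

# Solve T // L problem instances of L epochs each, carrying the final
# RRH-VM associations across interval boundaries so that reassociations
# between adjacent intervals are still penalized. history[i] holds the
# full load trace of RRH i; T0 is the offset of the optimization window
# within the trace (the preceding training week is always available).
x_prev = None
for start in range(0, T, L):
    c_hat = [forecast(history[i][:T0 + start], L) for i in range(N)]
    model = allocate(c_hat, b, d, p, alpha, M, x_prev=x_prev)
    x_prev = final_associations(model, N, M, L)
```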

B. Forecasting algorithms
Let t be the epoch at which forecasts need to be computed. We consider three different options for computing the predicted load samples $c_{i,l}$, $l = t+1, \ldots, t+L$: (i) Last Value (LV), which simply repeats the last observed sample; (ii) Last Day (LD), which uses the sample observed at the same time on the previous day; (iii) Multiple Linear Regression (MLR), which predicts the next sample as a linear combination, with parameters $\beta_k$, of the previous W samples. Note that only the sample $c_{i,t+1}$ is predicted from past samples, while all other forecasts up to L are predicted starting from already predicted values. The model parameters $\beta_k$ and the value of W are chosen following a standard machine learning approach: the entire set of computational loads $c_{i,t}$ is divided into a training set (first week of data) and a test set (second week). The model is trained and the parameters are chosen so as to minimize the training Root Mean Squared Error (RMSE). The value of W resulting from such a process is equal to 96 samples, corresponding to one day. Larger values of W resulted in negligible improvements at the cost of a considerably higher amount of memory, and were hence not considered.

Fig. 4 reports the RMSE obtained on the test set by each algorithm for increasing look-ahead intervals. The following observations can be made: (i) in general, as expected, the RMSE increases as the look-ahead interval increases; (ii) MLR outperforms both LD and LV for small look-ahead intervals, not only in terms of average error but also in terms of maximum RMSE; (iii) for larger look-aheads (e.g., 4 hours) the performance of MLR and LD is similar on average, with MLR showing much lower variance at small look-aheads; (iv) the LV algorithm shows the worst performance among all methods. In the following section, we use such an error characterization to analyze the impact of using different algorithms and look-ahead intervals in the C-RAN optimization problem.
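A minimal sketch of the three predictors and of the recursive multi-step scheme, under the definitions given above, follows; forecast() covers LV (no trained weights) and MLR (weights beta fitted on the training week), while forecast_last_day() implements LD.

```python
import numpy as np

SAMPLES_PER_DAY = 96  # 15-minute epochs

def forecast(history, L, beta=None, W=96):
    """Recursive multi-step forecast of the next L load samples.
    history: past loads of one RRH; beta: MLR weights (None -> LV).
    Only the first step uses real samples; later steps re-use predictions.
    """
    h = list(history)
    out = []
    for _ in range(L):
        if beta is None:                      # LV: repeat the last value
            pred = h[-1]
        else:                                 # MLR over the last W samples
            pred = float(np.dot(beta, h[-W:]))
        out.append(pred)
        h.append(pred)                        # recursive scheme
    return out

def forecast_last_day(history, L):
    # LD: value observed at the same time on the previous day
    # (assumes L does not exceed one day of samples).
    return [history[len(history) - SAMPLES_PER_DAY + k] for k in range(L)]

def fit_mlr(train, W=96):
    # Least-squares fit of beta_k minimizing the training RMSE.
    X = np.array([train[s:s + W] for s in range(len(train) - W)])
    y = np.array(train[W:])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```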

VII. PERFORMANCE EVALUATION

A. Setup
We evaluate the optimization framework considering 60 out of the 443 available base stations, randomly selected from the dataset according to the same cluster distribution reported in Section V-C. The time window T is set to one day, coinciding with the first day of the test week, and four look-ahead intervals are tested: 15 minutes, 1 hour, 2 hours, and 4 hours. The problem formulated in Section IV is implemented in Python using the Gurobi interface and solved on an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with 40 cores and 126 GB of RAM, running Ubuntu 16.04. In order to provide realistic values for our scenario, we use parameters derived from the popular Amazon EC2 cloud service. In particular, we set the per-VM budget b to 10 MOP, which corresponds to a price p of 0.2016 dollars for each 15-minute occupation of the resources.

B. Benchmarks
As a first experiment, we run the optimization problem setting α = 0 (that is, without considering RRH-BBU reassociations) and feeding it with the actual loads, i.e., an oracle predictor with no forecasting error. The resulting dynamic C-RAN allocation is compared with two benchmarks: (i) static C-RAN, in which VMs are allocated in a static fashion through over-provisioning, considering the peak load of each RRH and letting RRHs share the same VM instance; (ii) without C-RAN, representing a scenario in which each RRH is associated with one full VM instance, without the possibility of sharing resources among RRHs. As one can see from Fig. 5, dynamic C-RAN allocation is able to provide considerable cost savings, on the order of 25% compared to the static C-RAN case and 2.5 times better than the scenario without C-RAN. Note also that the look-ahead interval does not have any effect in this case, since (i) the predictions returned by the oracle are not affected by the look-ahead interval and (ii) reassociations are not contemplated, making the temporal dependency between adjacent epochs (i.e., the second term of (10)) irrelevant.
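For reference, the two baseline costs can be sketched as follows (the helper names are ours; the static case uses a simple capacity lower bound in place of an exact bin-packing solution).

```python
import math

# Baseline costs over the window (sketch). Static C-RAN packs the
# per-RRH peak loads into shared VMs; "without C-RAN" pins one full VM
# to each RRH for the whole window regardless of its load.
def static_cran_cost(c, b, d, p, T):
    peaks = [max(row) for row in c]           # worst-case load per RRH
    n_vms = math.ceil(sum(peaks) / (b * d))   # lower bound on packing
    return p * n_vms * T

def no_cran_cost(N, p, T):
    return p * N * T                          # one dedicated VM per RRH
```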

C. Numerical results
Next, we run the optimization problem with different values of α, considering several pairs of forecasting algorithms and look-ahead intervals. Note that, as illustrated in Fig. 4, each pair is characterized by a different RMSE distribution. Since the forecasts are used in (6) as input to the optimization problem, it is important to leverage the knowledge of the RMSE distribution to avoid underestimating the predicted loads, which would result in insufficient provisioning of VMs. We take here a worst-case robust optimization approach, in which the forecast loads $c_{i,t}$ are augmented with a value ε, set equal to the maximum RMSE observed for the particular algorithm/look-ahead interval pair under consideration (a minimal sketch of this step is given at the end of this subsection). In this way, we ensure that the active VMs are always enough to satisfy the RRHs' requirements, even when the forecasts are affected by the maximum error. This satisfies the first QoS constraint introduced in Section III.

The four graphs in Fig. 6 report the results for increasing values of α, the factor weighting the penalty on RRH-BBU reassociations. Each graph shows on the left side, with solid bars, the total VM cost (expressed as a percentage increase over the dynamic C-RAN benchmark), and on the right side the number of reassociations (striped bars). We note that:
• A clear trade-off is visible between the two terms of the objective function: small values of α keep the cost increase around 10% for all algorithm/look-ahead interval pairs, with an associated number of reassociations higher than 10 in most cases. Conversely, higher values of α are able to considerably reduce the number of reassociations at a higher VM cost. An interesting case is given by the LV predictor (red bars) for α = 100: the high error associated with the LV algorithm has the effect of greatly overestimating the RRH load requirements, making the solution similar to the static C-RAN scenario. In this case, no reassociations are performed.
• For low values of α, the optimal look-ahead interval changes according to the algorithm used. As an example, for α = 0.1 and 0.5, the VM cost decreases as the look-ahead interval increases in the LD and MLR cases. This does not hold for the LV case, for which the RMSE increases drastically as the look-ahead interval expands.
• For high values of α, the number of reassociations increases as the look-ahead interval increases in the LD and MLR cases, suggesting that short-term predictions should be preferred over long-term ones.
• In general, and as expected, no solution is able to decrease simultaneously both the number of reassociations and the VM cost. It is therefore the operator's duty to tune the α parameter according to the preferred scenario.
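Returning to the robust input construction described at the beginning of this subsection, a minimal sketch follows; max_rmse, algorithm, look_ahead and c_hat are illustrative names for the error table of Fig. 4 and the forecast matrix.

```python
# Worst-case robust augmentation: inflate every forecast sample by the
# maximum RMSE observed on the test week for the chosen
# (algorithm, look-ahead) pair, so that the in-time decoding constraint
# (6) holds even under the maximum forecasting error.
eps = max_rmse[(algorithm, look_ahead)]
c_robust = [[c + eps for c in row] for row in c_hat]
```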

D. Problem complexity
As explained in Section VI-A, the look-ahead interval controls the number of optimization problems solved over the entire interval T. Note that each problem is essentially a bin-packing problem, which is known to be NP-hard. Table I shows the solving times for α = 0.1 in the dynamic C-RAN scenario with oracle predictions and 60 base stations. As one can see, a higher look-ahead interval means a higher solving time. This is promising for cases where short-term predictions provide better results than long-term ones, as solving multiple small instances of the problem is much more efficient than solving fewer, larger ones. Tackling large instances of the problem, entailing hundreds of RRHs and time intervals spanning weeks, requires the development of specific heuristic algorithms, which are left as future work.

VIII. CONCLUSION AND FUTURE WORKS
In this paper, we introduce a framework that exploits computational load forecasting to optimally allocate VM instances in a C-RAN architecture, additionally taking into account the number of service interruptions experienced by the final users due to RRH-BBU reassociations. The outcomes of this work can be used in various 5G-and-Beyond scenarios in which the DSP resources of the RAN are virtualized and combined with dynamic algorithms that re-allocate virtual instances according to the current network loads. Furthermore, in combination with slicing technologies, our approach could be applied on a per-slice basis, since it allows different treatments of service interruptions, which could be defined by the network operator.
Future work includes (i) improving the prediction algorithms to obtain oracle-like forecasts and (ii) developing a heuristic algorithm to tackle larger instances of the problem, considering hundreds of base stations and larger look-ahead intervals.