Resource Calendaring for Mobile Edge Computing in 5G Networks

Mobile Edge Computing (MEC) is a key technology for the deployment of next generation (5G and beyond) mobile networks, specifically for reducing the latency experienced by mobile users which require ultra-low latency, high bandwidth, as well as real-time access to the radio network. In this paper, we propose an optimization framework that considers several key aspects of the resource allocation problem for MEC, by carefully modeling and optimizing the allocation of network resources including computation and storage capacity available on network nodes as well as link capacity. Specifically, both an exact optimization model and an effective heuristic are provided, jointly optimizing (1) the connections admission decision (2) their scheduling, also called calendaring (3) and routing as well as (4) the decision of which nodes will serve such connections and (5) the amount of processing and storage capacity reserved on the chosen nodes. Numerical experiments are conducted in several real-size network scenarios, which demonstrate that the heuristic performs close to the optimum in all the considered network scenarios, while exhibiting a low computing time.


I. INTRODUCTION
Next generation (5G and beyond) mobile networks are currently being deployed, and need to provide services characterized by ultra-low latency, high bandwidth, as well as real-time access to the radio network. To achieve these goals, Mobile Edge Computing (MEC) is envisaged to provide an IT service environment and cloud-computing capabilities at the edge of the mobile network, within the Radio Access Network and in close proximity to mobile subscribers; through this approach the latency experienced by mobile users can be considerably reduced. However, the computation power that can be offered by an edge cloud is limited if compared to a remote cloud. Considering that 5G networks will be likely built in an ultra-dense manner, the edge clouds attached to 5G base stations will also be massively deployed and connected to each other in a specific mesh topology. Thus, by exploiting the cooperation among multiple edge clouds and by carefully allocating edge resources to each connection, we can provide a solution to the limitations of a single MEC unit.
In this paper, we provide an optimization framework (an exact model as well as an efficient heuristic approach) that considers several key aspects of the resource allocation problem in the context of Mobile Edge Computing. Specifically, our proposed model and heuristics jointly optimize (1) the admission decision (which connections are admitted and served by the network, based on the profit they can potentially generate with respect to the required resources for serving demands), (2) the scheduling of admitted connections, also called calendaring (taking into account the flexibility that some users exhibit in terms of starting and ending time tolerated for the required services), (3) the routing of these flows, (4) the decision of which nodes will serve such connections as well as (5) the amount of processing and storage capacity reserved on the chosen nodes that serve such connections, with the objective of maximizing the operator's profit.
To our knowledge, our work is the first one that considers all these five aspects together. Other works focus, instead, on specific aspects. For instance, in [1], the authors study a task offloading model considering constraints on task queue lengths to minimize the users' power consumption, while the work in [2] jointly considers task assignment, computing and transmission resources allocation to minimize system latency in a multi-layer MEC context. The authors in [3] study traffic processing and routing policies for service chains in distributed computing networks to maximize network throughput. These works, however, do not consider the resource scheduling problem. The work in [4] studies bandwidth calendaring to allocate network resources and schedule deadline-constrained data transfers, while [5] studies the problem of scheduling and routing deadline-constrained flows in data center networks to minimize the energy consumption. However, the allocation of computing resources is not considered in these works. In [6], the authors study the problem of dispatching and scheduling jobs in edge-cloud system to minimize the job response time; [7] studies online deadline-aware task dispatching and scheduling in edge computing to maximize the number of completed tasks. Finally, the work in [8] proposes a two-timescale strategy for resource allocation by performing service placement (per frame) and request scheduling (per slot) to reduce the operation cost and system instability. These works, though, do not explicitly consider the routing problem that arises. In our previous works, we first focus in [9] exclusively on minimizing the latency of traffic in a hierarchical network, keeping the network and computation capacity fixed. Then, in [10], we address the joint network planning, slicing and edge computing problem, aimed at minimizing both the total latency and operation cost for arbitrary network topologies. However, in these works we do not consider the requests admission as well as the scheduling of the computation, storage and bandwidth resources.
To jointly optimize allocation of multiple resources and scheduling, we first formulate an exact optimization model, which turns out to be both nonlinear (due to latency constraints that we model accurately) and integer, and then we provide a reformulation 1 that transforms our original problem into an equivalent Mixed Integer Quadratically Constrained Problem (MIQCP), which can be solved by available commercial solvers. Subsequently, an effective heuristic, named Sequential Fixing and Scheduling (SFS), is proposed. We compare our proposed model and heuristic to a greedy approach, which provides a benchmark for our solutions. Numerical results demonstrate that the proposed heuristic performs close to the optimum in all the considered network scenarios, with a very short computing time.
The paper is organized as follows: Section II illustrates the problem formulation and the proposed exact optimization model. Section III presents the heuristic. A numerical analysis and comparison of the proposed model and heuristics is performed and discussed in Section IV. Finally, section V concludes the paper.

II. PROBLEM FORMULATION A. System Overview
We consider an edge cloud network represented by an undirected graph G(V, E), where each node v ∈ V represents an edge computing node having D v and S v as computation and storage capacity, respectively. The two parameters θ v and φ v denote, respectively, the cost of computation and storage capacity of node v. Each edge e ∈ E corresponds to a network link characterized by its bandwidth B e and its cost per unit of flow ψ e . Let K denote the set of requests, with different types, offered to the network. We regard each type of request as an aggregated communication-computation demand, e.g. web, video, game traffic etc., which has to be accommodated in the network and requires some amount of bandwidth, storage and computation resources. We assume that the calendar (i.e., the arriving time and duration) of the requests for the upcoming period is known. This can be achieved assuming that customers have announced their requirements in advance, or that some history-based prediction tools [11] are used.
We discretize the time horizon into a set T of equal duration time-slots, where the slot length is τ . Each request k ∈ K is defined as a tuple (s k , α k , β k , d k , λ k ). The parameter s k is the source node of request k; α k , β k and d k define the arrival time, the latest ending time (deadline) and the duration of request k, respectively. Finally, we consider a Poisson process for each request k with an average packet arrival rate λ k . The arrival and ending times coincide, respectively, with the arrival of the first packet and the departure of the last packet of request k.
A request k could be processed immediately (for delaysensitive tasks) after its arrival, or scheduled for later (for delay-tolerant tasks). Also, it could be entirely processed on the local edge computing node or split into multiple fractions and processed on other nodes. In any case, it must be completed before the deadline β k . Figure 1 shows the arrival time α, deadline β and duration d of requests 1 and 2. Also, it highlights that request 1 is scheduled to be served from time ξ 1 , delayed (shifted) with respect to α 1 but still compatible with β 1 . The ending time for the request will then depend on ξ 1 , d 1 , computing latency, and link latency along the routing path if (some fraction of) the request is offloaded to the neighbor edge computing nodes.
Given a calendar of requests T over a time horizon, the proposed optimization approach must: a) schedule the starting time of each request, b) decide where to compute the requests, and c) route some fractions of requests when it is necessary to process them on other edge computing nodes, in order to maximize the profit of the provider.

B. Life Cycle of A Request
A given request k arriving at an edge node v at time α k could be: i. rejected, ii. processed immediately -this is needed if it is a delay sensitive task -or iii. shifted to a future epoch, if it is delay tolerant. To model the fact the delayed (shifted) starting time ξ k can vary in the time frame [α k , β k − d k ], we express ξ k as: ξ k = β k −d k t=α k t · z kt , and we have: Essentially, z kt is a binary variable that can be 1 at most in one point of time which corresponds to ξ k for request k. When z kt = 0 for all possible time slots, this implies that the request is not admitted and, therefore, not scheduled. Note that by changing the inequality constraint (1) to an equality, the edge cloud will be forced to serve all the incoming requests, which may be unfeasible in some cases. A request can be either processed locally in a computing node or split and offloaded to other edge computing nodes. In the latter case, the processing latency, the storage provisioning constraints and the link latency along a routing path should be taken into account by the calendaring scheme. Considering a node v that is allocated to process a fraction q kv ∈ [0, 1] of request k, the ending time at v, denoted by ξ kv o , can be expressed as: where T kv L and T kv P are respectively the link latency and processing latency. Note that both ξ k and ξ kv o are integer values in the time slot set and τ is the time-slot duration. The ending time of each request depends on the last finished piece, which must be completed before the deadline. Such constraint is expressed as: In the following, we will express the request routing and the two latency components (link and processing latency) in detail.

C. Network Routing
We assume that a request can be split into multiple pieces only at its source node. Each piece can then be offloaded to another edge computing node independently of the other pieces, but it cannot be further split (we say that each piece is unsplittable). Each link e ∈ E may carry different request pieces, q kv (remind that q kv is the fraction of request k to be processed at node v). Then, the total flow of request k on link e, f k e , can be expressed as the sum of all pieces of k that pass through such link: f k e = v∈V: e∈R kv q kv , where R kv ⊂ E denotes the routing path (set of traversed links) for the partial request q kv from source node s k to node v. The traffic flow conservation constraint is enforced by: where Φ − v and Φ + v are, respectively, the set of incoming and outgoing links of node v. The fulfillment of this constraint guarantees continuity and acyclicity for the routing path.

D. Link Latency
Let T kv L denote the link latency for routing request k to node v. Each request is routed in a multi-path way, i.e., different pieces of the request may be dispatched to different nodes via different paths. The transmission time of the requests on each link is described by an M |M |1 model; hence, ∀k, ∀v, T kv L is defined as: where p kv e is the fraction of bandwidth capacity sliced for the piece of request (q kv ) flowing to node v via link e. The link latency is accounted for only if a piece of request k is processed at node v (i.e. q kv > 0) and v = s k . The following constraint ensures that the flow of request k on each link of the routing path does not exceed the allocated capacity: Considering that different requests k ∈ K can share the same link at a given time slot, the reservation constraint of a link capacity at any time slot is expressed as: where p kvt e is the fraction of link e's bandwidth allocated for a piece of request q kv at time slot t. Note that we assume that the reserved bandwidth for each request over its life period does not change in order to provide consistent service guarantee. The superscript t in p kvt e is used to indicate the life status of the flow. The relation between p kvt e and p kv e is given by p kvt e = δ kvt p kv e , where δ kvt is a binary variable which is equal to 1 if ξ k t < ξ k + d k + T kv L τ , and 0 otherwise.

E. Processing Latency and Storage Provisioning
When a request cannot be entirely processed locally, we assume that such request can be segmented and processed on different edge computing nodes. Hence, each node can slice its computation capacity to serve several requests coming from different source nodes. Notice that a request k also requires a fixed amount of storage resource m k on a node v if k is to be processed on this node later. Thus, only if both computation and storage resources on a node are sufficient, a request could be processed on that node. Let variable r kv denote the fraction of computation capacity D v sliced for the piece of request q kv λ k . The processing of user requests is also described by an M |M |1 model. Let T kv P denote the processing latency of edge computing node v for request k. Then, based on the computational capacity r kv D v with an amount q kv λ k to be served, ∀k, ∀v, T kv P is expressed as: where r kv is the fraction of node v's computation capacity sliced to request k, and η k is the processing density [12] of request k measured in "cycles/bit". In the above equation, when request k is not processed on node v, the latency is set to 0 and, at the same time, no computation resource should be allocated to request k. The corresponding constraint is: q kv and r kv also have to fulfill the consistency constraints: Remind that the right hand of equation (9) represents whether a request k is admitted in the system or not. If a request k is rejected by the admission controller, the right hand expression is equal to 0 and q kv = 0 is enforced.
The storage constraint can be expressed as follows: Different requests k ∈ K may share an edge computing node at a time slot. Thus, the reservation constraint of a node computation capacity at any time slot is implemented by: where r kvt is the fraction of node v's computation capacity allocated for request k at time slot t. We assume that the reserved computation power for each request over its life period will not change due to both the computation scaling overhead and task reconfiguration overhead. The superscript t in r kvt allows us to keep track of the life status of the request. The relation between r kvt and r kv is given by r kvt = ρ kvt r kv , where ρ kvt is a binary variable which is equal to 1 if ξ k + T kv L τ t < ξ kv o , and 0, otherwise.

F. Optimization Problem
Our goal in the resource calendaring problem is to maximize the profit computed as the total revenue obtained from serving the users' requests minus the network operation costs in terms of computation, storage and bandwidth resources, under the constraints (starting and ending times) of requests coming from different nodes: where µ k is the revenue gained from serving request k. Problem P0 contains both nonlinear and indicator constraints, therefore, it is a mixed-integer nonlinear programming (MINLP) problem, which is hard to be solved directly [13]. Moreover, we also face the following difficulties: a) routing variables R kv and request fraction variables q kv are "intertwined": to find the optimal routing, the fraction of request processed at each node v should be known, and at the same time, to solve the optimal resource allocation for a request, the routing path should be known; b) P0 contains indicator functions and constraints, which cannot be directly and easily processed by most solvers. To deal with the above critical issues, we propose an equivalent reformulation of P0, which we call P1 that we can efficiently solve with the Branch and Bound method. Intuitively, the reformulation in P1 proceeds as follows: (a) we first reformulate the link and processing latency constraints, as well as the node storage constraint (viz., constraints (4), (7) and (10)), then (b) we handle the difficulties related to indicator variables δ kvt , ρ kvt and variables R kv with the corresponding routing constraints. For space reasons, we do not include the reformulation here. The interested readers can refer to the technical report available online 1 .

III. HEURISTICS
To solve our problem in a reasonable computational time, we propose a heuristic named Sequential Fixing and Scheduling (SFS) which realizes a good trade-off between admitting "valuable" connections (the ones that provide high return to the service provider) and the resources they request in terms of transmission rate, storage and computation.
SFS is detailed in Algorithm 1. We first introduce some auxiliary variables of P1, viz., b kv : whether request k is processed on node v, and γ kv e : whether request piece q kv is routed via link e. The hat notation (likeb kv ) represents the values of the corresponding variables in the solution set S .
We start by sorting all requests in descending order according to the ratio µ k d k λ k m k η k ; this ranking is designed to give a higher weight to requests that generate more revenue and less cost to the operator. Then, we try to define a schedule where we admit as many requests as possible. For each request k, we check whether its activation period overlaps with the one of other requests k that are already admitted, and in such case we say there is a conflict. The overlap value F k is determined by the function check overlap(·) (line 6 of Algorithm 1). This function takes as input k, k and the partial solution S computed up to the current point, returns F k and proceeds as follows: i) it initializes two local variables α and β with the arrival time α k and deadline β k of request k , respectively; ii) it verifies if k is admitted; if yes, it updates, respectively, α and β with the exact starting timeξ k and ending time max v∈Vξ k v o of k according to the solution S ; iii) it computes the (partial) overlapping between k and k as: overlap = β k − α, if α > α k ; overlap = β − α k , otherwise (a negative value of overlap means no overlapping); iv) Finally, it calculates and returns the maximum relative overlap value F k between k and k , which is expressed as: Based on {F k |k ∈ K\{k}}, for each edge node v ∈ V, we select the maximum F k for all k being processed at v, and we identify this overlap value with C v (line 7). Next, we compute the ordered set Q k which contains sets V i of best candidate edge nodes to process request k. In doing so, we consider C v and limit to L v the computation resource of each node in V i , ∀V i ∈ Q k (line 8 of Algorithm 1; details in Algorithm 2). If we successfully find some candidates (Q k = ∅), we further update the residual bandwidth B e for all links e and create a weighted graph G with the reciprocal bandwidth B −1 e . Then (lines 12-16), we select the first V i in Q k that permits to find a profitable solution (O 0) according to Algorithm 1 Sequential fixing and scheduling 1: Initialize z kt = 0, ∀k, t and profit O = 0 for P1 ; 2: Setb kv = 0,r kv = 0,p kv e = 0 (in S ), ∀k, v, e; 3: Sort K in descending order as Kr by µ k d k λ k m k η k , k ∈ K; 4: for k ∈ Kr do 5: if Q k = ∅ then 10: k v e ), ∀e ∈ E; 11: Create graph G weighted by B −1 e ; 12: for Vi ∈ Q k do 13: Set b kv = 1, r kv Lv, ∀v ∈ Vi; else Reject k (set z kt = 0, ∀k, t);

Algorithm 2 Find candidates
Input: k, G, S (solution), Cv(conflict); Output: Q k (candidates), Lv(limit); 1: Lv ← 1 − k ∈K|Cv >wcr k v , ∀v ∈ V; wc = 0.6 2: V s = {v ∈ V | Cv wc || Lv w l }; w l = 0.25 3: Yv ← (−hop(G, s k , v), v / ∈ {s k |k ∈ K}, Lv), ∀v ∈ V s ; 4: Sort V s in descending order by Yv; 5: V1 = ∅, DΣ = 0; 6: for v ∈ V s do 7: if λ k w d DΣ then V1 ← V1 ∪ {v}; w d = 0.9 8: the following criteria: we outsource k to the nodes in V i and bound the computation resource by setting b kv and r kv . Based on G , we route each piece of request q kv using the Dijkstra algorithm (lines 10-14). After fixing variables b kv , γ kv e and the constraints related to z kt , r kv in P1 , we start to optimize P1 to get the profit and the solution denoted, respectively, by O and S. If P1 results infeasible in the current setting (O < 0), we reiterate on the other elements of Q k . If the result of the new optimization improves, we update the current best profit O and solution S , we admit request k and allocate resources to it (including time slots, computation, bandwidth, and storage). We also update the lower bound of P1 to LB = O to accelerate the optimization (line 20). Finally, if the result does not improve or no candidate could be found, we reject k and clear its corresponding variables settings.
In Algorithm 2, we first estimate L v , the remaining computation capacity of each node v, based onr k v in the solution S verifying that the conflicts are higher than a given threshold w c (C v > w c ). Then, we define V s as the set of nodes v satisfying C v w c || L v w l , where w l is a threshold on the remaining computation. Hence, V s represents the set of nodes that are either in less conflict (for request k) or have enough remaining computation power. For each v ∈ V s , we compute three features (denoted by Y v ), i.e., the negative hop distance between s k and v (−hop(G, s k , v)), the indicator of whether v is a source node or not (v / ∈ {s k |k ∈ K}) and the estimated left computation capacity L v (lines 1-3). Based on Y v , we sort V s in descending order to give more priority to a node that is closer to the ingress node for k, better not to be a source, and has more remaining computation capacity with respect to other nodes. Then, we try to add nodes to V 1 , in order, until λ k w d D Σ , where D Σ denotes the estimated total computation capacity that can be used and w d is a threshold controlling the total required computation capacity. Finally, we return the ordered set Q k with V 1 at the first place and the left nodes V s − V 1 being separately stored as unit sets of backup candidates (lines [5][6][7][8][9]. Notice that the values of the thresholds w c/l/d are appropriately chosen based on our experiments. Greedy approach: We further present an alternative benchmark heuristic, which we call Greedy. It first sorts the requests in ascending order by the priority key (−µ k , α k , β k ) and then tries to schedule them one by one. The sorting considers the revenue µ k of a request in the 1st place, the arrival time α k in the 2nd place and the deadline β k in the 3rd (last) place. For each request k, we try to guarantee sufficient computation power by using its closest neighbor nodes with a condition λ k DΣ w greedy , w greedy = 0.6 is a threshold appropriately chosen based on our experiments.

IV. NUMERICAL RESULTS
We evaluate the performance of the proposed model, the SFS and Greedy heuristics in terms of the operator's profit, expressed as in P0, the serving rate (the fraction of admitted requests) and the computing time to get the solution. We first present the experimental setup and then we discuss our results.
Experimental Setup: All numerical results presented in this section have been obtained on a server equipped with an Intel(R) Xeon(R) E5-2640 v4 CPU @ 2.40GHz and 126 GB of RAM and with an open-source framework, SCIP (Solving Constraint Integer Programs). The parameters of SCIP used in our experiments are set to their default values. The results illustrated in Figure 2 are obtained by averaging over 50 instances, with 97% narrow confidence intervals.
The network topologies used in our experiments are generated based on Erdös-Rényi random graph by specifying the numbers of nodes and edges. Due to space constraints, we present and discuss in this section the results obtained for a representative topology (denoted as 30N50E30R), composed of 30 nodes and 50 edges with 30 requests, as well as those for a small topology (denoted as 5N5E3R) consisting of 5 nodes, 5 edges, and 3 incoming requests. The 5N5E3R topology allows us to compare the SFS heuristic to the optimal solution ( Figure 2(d)). The full set of results is available online 1 .
We uniformly extract, at random, a source node as well as the arrival/ending times and duration, and the revenue gained by the operator in serving each request, in range [100, 300]. We further generate random request rates according to a Gaussian distribution N (λ k , σ 2 ), where λ k is randomly selected in range from 30 to 60 Gb/s and σ = 0.5. For the sake of simplicity, we assume that all links have the same bandwidth (B e = 30 Gb/s) and nodes have the same computation capacity (D v = 30 Giga cycles/s) and storage capacity (S v = 40 GB). The costs of using one unit of these three resources, ψ e , θ v , and φ v , are all set to 0.01. Finally, we set the processing density η k = 1 and the storage requirement m k = 10 for all requests. Note that our proposed model and heuristics are general, and can be applied to optimize resource allocation in all network scenarios with any parameters setting.
Effect of the request rate and revenue (λ k , µ k ): Figures 2(a) and 2(b) illustrate the variations of profit and serving rate versus the request rate and revenue in the 30N50E30R topology. Values of λ k and µ k , k ∈ K are both scaled from 0.5 to 2.0 with respect to their initial values. This implicitly indicates that serving each request provides a revenue proportional to its arrival rate. As (λ k , µ k ) increase, the profits for all approaches increase; the network operator, in fact, is able to select and admit the requests which can cover the system cost and provide higher profit at the same time. When the request rate is low, all connections can be served; when it increases, Fig. 2: Profit and serving rate against scaling parameters (λ k , µ k ), D v and λ k . specifically after the point around 1.2, the serving rate of SFS decreases since the system can accommodate less requests, which become more demanding, hence costlier in terms of required resources, as the scale factor (thus λ k ) increases. Finally, SFS exhibits better performance compared to Greedy with gaps up to 18% for profit and 20% for serving rate.
Effect of the computation capacity D v : Figure 2(c) shows the variations of the profit against the edge node computation capacity D v , scaled with respect to its initial value from 0.5 to 1.5, in the 30N50E30R topology. When D v increases, the profit (and serving rate, not shown for space reasons) increase and converge to a specific value for all approaches. Note that, for SFS, the profit increases from 700 up to 980, while for Greedy, from about 630 to 950. As for the serving rate it increases for SFS from 0.64 up to 1, while for Greedy, from about 0.61 to 0.98. These trends reflect the strong effect of the available computation capacity on the profit and serving rate. Additionally, SFS performs better than Greedy with clear gaps: up to 13% for profit and 14% for serving rate. As expected, with large available node computation capacity, the performance gap between SFS and Greedy decreases since the utilization of enhanced algorithms is less critical to perform a good resource allocation, when resources are abundant.
Optimum and Computing time: The exact model P0 could be solved in a reasonable time only in the small topology (5N5E3R), and Figure 2(d) plots the profit versus the request rate λ k keeping the revenue µ k fixed. The decreasing trend of the profit for all the three approaches, when increasing λ k , is due to the fact that more and more resources are needed and hence the cost incurred by the operator continues to increase while the revenue is fixed, and therefore the profit decreases. SFS exhibits indeed very good performance since its curve and that of the optimal solution are completely overlapping, while the Greedy approach shows lower performance.
The optimal solution has an average computing time of 146 s, while SFS took just 5 s, and Greedy 4 s. Furthermore, in larger scenarios (up to 30N50E30R) SFS exhibited a computing time always inferior to 1096 s, thus confirming its efficiency in computing very good solutions in a short time. The Greedy approach needs less computation time, on average 822 s, to obtain the solution, at the cost of higher performance gaps with the proposed model and SFS heuristic.

V. CONCLUSION
In this paper we formulated and solved the resource calendaring problem in mobile networks equipped with Mobile Edge Computing (MEC) capabilities. Specifically, we proposed both an exact optimization model as well as an effective heuristic able to obtain near-optimal solution in all the considered, real-size network scenarios.
The decisions we optimized include admission control for the connections offered to the network, their calendaring (scheduling) and bandwidth constrained routing, as well as the determination of which nodes provide the required computation and storage capacity. Calendaring, in particular, permits to exploit the intrinsic flexibility in the services demanded by different users, whose starting time can be shifted without penalizing the utility perceived by the user while, at the same time, permitting a better resource utilization in the network.