Resource Calendaring for Mobile Edge Computing: Centralized and Decentralized Optimization Approaches

access points of a mobile operator in the Milan area. Results demonstrate that the heuristic performs close to the optimum in all considered network scenarios while exhibiting a low computing time. This provides evidence that our proposal is an effective framework for optimizing resource allocation in next-generation mobile networks.


Introduction
Next-generation (5G and beyond) mobile networks are currently being deployed and need to provide services characterized by ultra-low latency, high bandwidth, and real-time access to the radio network. To achieve these goals, Mobile Edge Computing (MEC) is envisaged to provide an IT service environment and cloud-computing capabilities at the edge of the mobile network, within the Radio Access Network and in close proximity to mobile subscribers. Through this approach, the latency experienced by mobile users when they access specific services can be considerably reduced. However, the computation power that a single edge cloud can offer is limited compared to that available at a remote cloud. This implies that relying exclusively on a single edge cloud to serve user requests is not feasible in practice, resulting in the need to devise approaches that either offload the exceeding traffic to the central cloud or rely on the availability of other close-by edge clouds that can cooperate to help each other. This last case is particularly interesting as it allows a provider to exploit all of its resources but, at the same time, requires a careful allocation of edge resources to each user request. Considering that 5G networks will likely be built in an ultra-dense manner, and that the edge clouds attached to 5G base stations will also be massively deployed and connected to each other in a mesh topology, this approach appears feasible and certainly deserves a specific analysis. It requires the adoption of proper strategies that, on the one hand, guarantee that the user requirements associated with each request are met and, on the other hand, ensure that the operator's resources are not depleted but used in an optimized manner. Meeting the expectations of both end users and operators is not straightforward, as they partly conflict with each other.
In fact, while from the user viewpoint requests must be served within the expected time boundaries, from the provider's perspective serving a request should be sufficiently profitable and should not force the exclusion of other, more profitable requests. These requirements result in the need to develop an approach that controls multiple aspects: the decision of whether to admit a request, the scheduling of admitted requests so as to fulfill the users' requirements, the assignment of requests to a proper serving node and the subsequent allocation of the required resources, and the proper routing of the requests toward their serving nodes.
As discussed in more detail in Section 2, the literature presents approaches that handle some of the aspects mentioned above [1][2][3][4][5][6][7][8][9][10][11]. However, to the best of our knowledge, no existing approach tackles all of the above aspects at the same time.
Our approach is centered on an optimization framework (an exact model as well as an efficient heuristic approach based on sequential fixing) that considers all key aspects of the resource allocation problem in the context of Mobile Edge Computing, by carefully modeling and optimizing the allocation of network resources including computation and storage capacity available in network nodes as well as link capacity. Specifically, our proposed model and heuristics jointly optimize (1) the admission decision (which user requests are admitted and served by the network, based on the profit they can potentially generate with respect to the required resources for serving demands), (2) the scheduling of admitted user requests, also called calendaring (taking into account the flexibility that some users exhibit in terms of starting and ending time tolerated for the required services), (3) the routing of these user requests, (4) the decision of which nodes will serve them as well as (5) the amount of processing and storage capacity reserved on the chosen nodes that serve such user requests, with the objective of maximizing the operator's profit.
Besides the classical optimization approach, we also propose a "multi-agent" framework where the resources in the Mobile Edge Network are managed in a coordinated and decentralized way. The goal of each agent is to dynamically allocate network resources, reacting to (possibly local) network changes in a prompt and effective way. This can help reduce the experienced latency, thus making it possible to satisfy demanding services more effectively. Similar distributed allocation approaches are considered in other related network contexts [12], for example in SDN, where decisions in the control plane are taken by multiple controllers, each in charge of a specific domain, to improve scalability and enable effective resource allocation with fast reaction to network changes.
In our approach each agent solves the optimization problem in a distributed fashion, with limited overhead, by relying on an implementation of the Alternating Direction Method of Multipliers [13]. This is a well-established optimization tool used to decompose a problem into multiple small subproblems that can be solved iteratively. We extend it using a weighting approach especially tailored to our scheduling problem. This paper extends the work presented in [14] by strengthening the evaluation of the optimization approach, which now includes realistic network topologies, and by defining and experimenting with the distributed solution approach, based on ADMM.
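As a concrete illustration of the decomposition principle behind ADMM (not the paper's actual formulation), the following minimal sketch runs consensus ADMM on a toy problem in which each "agent" holds a private quadratic objective and all agents must agree on a common value; the problem, function names, and parameters are all illustrative.

```python
# Minimal consensus-ADMM sketch on a hypothetical toy problem: each agent i
# holds f_i(x) = (x - a_i)^2 and all agents must agree on one x.  The global
# optimum of sum_i f_i(x) is the mean of the a_i.

def consensus_admm(a, rho=1.0, iters=200):
    n = len(a)
    x = [0.0] * n   # local copies held by each agent
    u = [0.0] * n   # scaled dual variables (disagreement prices)
    z = 0.0         # consensus variable
    for _ in range(iters):
        # local step: argmin_x (x - a_i)^2 + (rho/2)(x - z + u_i)^2
        x = [(2 * a[i] + rho * (z - u[i])) / (2 + rho) for i in range(n)]
        # global averaging step (the only coordination needed)
        z = sum(x[i] + u[i] for i in range(n)) / n
        # dual update: each agent penalizes its disagreement with z
        u = [u[i] + x[i] - z for i in range(n)]
    return z

print(consensus_admm([1.0, 4.0, 7.0]))  # converges to the mean, 4.0
```

Each iteration requires only a cheap local solve per agent plus one averaging round, which is the property that makes ADMM attractive for per-edge-node decisions with limited overhead.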
We provide clear quantitative insights regarding the structure of the underlying computational infrastructure, and we compare our proposed model and heuristic to a greedy approach, which provides a benchmark for our solutions. We perform a thorough performance analysis of the proposed model and heuristics using real-size network scenarios and the real radio access point positions of a mobile network operator (Vodafone Italy) in the Milan area. Numerical results demonstrate that our proposed model captures several important aspects of Mobile Edge Computing architectures. Furthermore, the proposed heuristics perform close to the optimum in all considered network scenarios, with a very short computing time, thus representing a very promising solution for the design of efficient and cost-effective mobile networks.
To summarize, our paper makes the following contributions:
• A mathematical model that captures key aspects of Mobile Edge Computing architectures.
• A fully distributed algorithm based on ADMM, which makes it possible to solve our problem with resource allocation decisions made directly by edge nodes.
• A thorough numerical evaluation performed on several realistic, large-scale topologies. We make all topology datasets publicly available in a repository.

Table 1: Classification of related work and of our approach.
Ref.      Approach   Solution        Architecture
[2]       Dynamic    Centralized     Flat MEC
[3]       Static     Centralized     Fog
[4]       Static     Centralized     Multi-layer MEC
[5]       Dynamic    Centralized     Distributed Computing
[6]       Dynamic    Centralized     SDN
[7]       Static     Centralized     Data center
[8]       Dynamic    Decentralized   Flat MEC
[9]       Dynamic    Decentralized   Flat MEC
[10]      Static     Centralized     Flat MEC
[12]      Static     Decentralized   SDN
[17]      Dynamic    Centralized     SDN
[18]      Static     Centralized     Cloud RAN
[19]      Static     Decentralized   Flat MEC
[20]      Static     Decentralized   Flat MEC
Our work  —          Centralized and Decentralized  —

The paper is organized as follows: Section 2 discusses related work. Section 3 illustrates the problem formulation and the proposed exact optimization model. Section 4 presents the heuristics we devised, based on a sequential-fixing approach. Section 5 illustrates a distributed algorithm we propose, based on ADMM, to solve our problem with resource allocation decisions made directly by edge nodes. The numerical analysis and comparison of the proposed model and heuristics are performed and discussed in Section 6, including real-life network scenarios. Finally, Section 7 concludes the paper.

Related Work
In this section we review works that consider task offloading and calendaring/scheduling issues; we further discuss recent works where the ADMM approach has been applied in mobile networking contexts.
To our knowledge, our work is the first to consider together all five aspects illustrated in the previous section (admission decision, scheduling, routing, node offloading decision, and amount of allocated resources; for the latter, we consider the processing and storage capacity reserved on each node). The following works, instead, focus on specific aspects. In Table 1 we summarize the different subproblems tackled in each reference, as well as the solution approach adopted, to provide an easier comparison with our work.
In [2], the authors study a task offloading model with constraints on task queue lengths to minimize the users' power consumption, while the work in [4] jointly considers task assignment and the allocation of computing and transmission resources to minimize system latency in a multi-layer MEC context. The work in [3] studies task distribution and proactive edge caching in fog networks with latency and reliability constraints to minimize the task computing time. The authors of [5] study traffic processing and routing policies for service chains in distributed computing networks to maximize network throughput. These works, however, do not consider the resource scheduling problem. The works in [6,17,18] study bandwidth calendaring to allocate network resources and schedule deadline-constrained data transfers, while in [7] the authors study the problem of scheduling and routing deadline-constrained flows in data center networks to minimize energy consumption. However, the allocation of computing resources is not considered in these works. In [8], the authors study the problem of dispatching and scheduling jobs in an edge-cloud system to minimize the job response time; in [9] the authors study online deadline-aware task dispatching and scheduling in edge computing to maximize the number of completed tasks. Finally, the work in [10] proposes a two-time-scale strategy for resource allocation that performs service placement (per frame) and request scheduling (per slot) to reduce the operation cost and system instability. These works, though, do not explicitly consider the routing problem that arises.
ADMM [13] has recently been applied in the mobile networking context. The work in [1] studies a MEC slicing framework named Sl-EDGE which allows network operators to instantiate heterogeneous slice services on edge devices. The authors formulate the edge slicing problem as a mixed-integer linear programming (MILP) model and design a distributed algorithm based on ADMM such that clusters can locally compute slicing strategies. In [19], the authors propose a distributed cross-domain resource orchestration algorithm based on ADMM for dynamic network slicing in cellular edge computing, while [20] studies the energy-efficient workload offloading problem and proposes a low-complexity distributed solution based on consensus ADMM. The work in [12] proposes a distributed algorithm based on ADMM to solve the multi-path fair bandwidth allocation problem in distributed Software Defined Networks (SDN), under the assumption that the paths are pre-computed once and for all and do not change.
Differently from such works, we further jointly consider the request admission, scheduling and routing problems in edge computing networks, with both centralized and distributed solution approaches.

Problem Formulation
In this section we formulate the resource calendaring problem, which includes the admission of users' requests, their scheduling, and their routing. Our target is to maximize the profit, expressed as the difference between the revenue earned by the provider from serving users' requests and the cost incurred in providing computation and storage resources at edge nodes, as well as bandwidth capacity. Table 2 summarizes the notation used throughout this section. For brevity, we simplify the expression ∀ ∈  as ∀ , and apply the same rule to the other set symbols like , ,  , etc. throughout the rest of this paper.

System Overview
We consider an edge cloud network represented by an undirected graph (, ), where each node ∈  represents an edge computing node having and as computation and storage capacity, respectively. The two parameters and denote, respectively, the cost of computation and storage capacity of node . Each edge ∈  corresponds to a network link characterized by its bandwidth and its cost per unit of flow . Let  denote the set of requests, with different types, offered to the network. We regard each type of request as an aggregated communication-computation demand, e.g. web, video, game traffic etc., which has to be accommodated in the network and requires some amount of bandwidth, storage and computation resources. We assume that the calendar (i.e., the arriving time and duration) of the requests for the upcoming period is known. This can be achieved assuming that customers have announced their requirements in advance, or that some history-based prediction tool [21] is used. Since an accurate prediction of network traffic is an important element for network operators, several techniques have been proposed in the literature to enable efficient resource management, traffic engineering and load balancing [22][23][24][25][26]. The aim of these works is to try to predict parameters like the near-term transmission rate of a connection or, in general, network traffic profiles, based on measurements on the past traffic and on service-level agreements established with network users, as a prediction of future traffic distribution; however, the focus of our paper is not to determine the best traffic predictor. The problem of optimal allocation of current and future bandwidth resources is studied in [24] in the context of Traffic Engineering in SDN. Machine Learning approaches have also been proposed to predict the traffic load on the links of a telecom network [22]. Convolutional Neural Networks have been proposed for predicting network traffic in datacenter networks [23]. 
Finally, when precise prediction is difficult to achieve, online algorithms like the one illustrated in [6] can be considered to deal with unpredictable incoming demands; such approaches are better suited when admission, scheduling and resource allocation decisions must be taken instantaneously.
We discretize the time horizon into a set  of equal duration time-slots, where the slot length is . Each request ∈  is defined as a tuple ( , , , , ). The parameter is the source node of request ; , and define the arrival time, the latest ending time (deadline) and the duration of request , respectively. Finally, we consider a Poisson process for each request with an average packet arrival rate . The arrival and ending times coincide, respectively, with the arrival of the first packet and the departure of the last packet of request .
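Since the symbols of the request tuple were lost in extraction, the following sketch only mirrors the structure described above (source node, arrival time, deadline, duration, Poisson packet rate); all field names are hypothetical, not the paper's notation.

```python
from dataclasses import dataclass

# Hypothetical container mirroring the request tuple described in the text.
@dataclass(frozen=True)
class Request:
    source: int     # ingress edge node
    arrival: int    # earliest start slot
    deadline: int   # latest ending slot
    duration: int   # number of slots of service
    rate: float     # mean Poisson packet arrival rate

    def feasible_starts(self):
        """Slots at which the request may start and still meet its deadline."""
        return range(self.arrival, self.deadline - self.duration + 1)

r = Request(source=3, arrival=2, deadline=9, duration=4, rate=120.0)
print(list(r.feasible_starts()))  # [2, 3, 4, 5]
```

The `feasible_starts` window is exactly the delay tolerance that the calendaring scheme exploits when shifting delay-tolerant requests.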
A request could be processed immediately (for delay-sensitive tasks) after its arrival, or scheduled for later (for delay-tolerant tasks). Also, it could be entirely processed on the local edge computing node or split into multiple fractions and processed on other nodes. In any case, it must be completed before the deadline . As an example, Figure 1 shows the arrival time , deadline and duration of requests 1 and 2. Also, it highlights that request 1 is scheduled to be served from time 1 ⋆ , delayed (shifted) with respect to 1 but still compatible with 1 . The ending time for the request will then depend on 1 ⋆ , 1 , processing latency, and link latency along the routing path if (some fraction of) the request is offloaded to the neighbor edge computing nodes.
Given a calendar of requests  over a time horizon, the proposed optimization approach must: a) schedule the starting time of each request, b) decide where to compute the requests, and c) route some fractions of the requests when it is necessary to process them on other edge computing nodes, in order to maximize the profit of the provider.

Comments
Our approach is specifically tailored to network scenarios where each edge node hosts a stateless servant for a specific request type (for example, a microservice deciding whether some temperature data must trigger an alarm) and the network decides where to send a request. When considering streams of data (that is, multiple packets related to one another), we need a way to route the packets belonging to the same stream to the same destination. On the other hand, if a computation requires traffic to be split according to a specific logic (e.g., the typical MapReduce example of word counting), the traffic cannot be split by the network in a simple way, but must go through application-specific components that decide how to split it. We do not explicitly model errors and retransmissions; a simple way to capture such features could be to compute the expected processing time based on the failure probability and, correspondingly, the number of retries. We leave these extensions as future research issues.

Life Cycle of a Request
A given request arriving at an edge node at time could be: i) rejected, ii) processed immediately (this is needed if it is a delay-sensitive task), or iii) shifted to a future epoch, if it is delay-tolerant. To model the fact that the delayed (shifted) starting time ⋆ can vary in the time frame [ , − ], we express ⋆ as: where is a binary variable that can be 1 in at most one time slot, which then corresponds to ⋆ for request . Meanwhile, we have: (2) When = 0 for all possible time slots, the request is not admitted and, therefore, not scheduled. Note that by changing the inequality constraint (2) to an equality, the edge cloud is forced to serve all incoming requests, which may be infeasible in some cases.
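Because the original equations were lost in extraction, the two scheduling relations described in words above can only be sketched in hedged form, using hypothetical symbols ($y_{k,t}$ for the start indicator, and $a_k$, $d_k$, $t_k$ for the arrival, deadline and duration of request $k$):

```latex
% Hedged reconstruction, hypothetical symbols:
% y_{k,t} = 1 iff request k starts at slot t.
x_k^{\star} = \sum_{t \in \mathcal{T}} t \, y_{k,t},
\qquad
\sum_{t = a_k}^{\,d_k - t_k} y_{k,t} \;\le\; 1 \qquad \forall k
```

The inequality encodes admission control: a request with all $y_{k,t} = 0$ is simply rejected, while turning it into an equality would force every request to be served.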
A request can either be processed locally in a computing node or split and offloaded to other edge computing nodes. In the latter case, the processing latency, the storage provisioning constraints and the link latency along a routing path must be taken into account by the calendaring scheme. Considering a node that is assigned to process a fraction ∈ [0, 1] of request , the ending time at , denoted by , can be expressed as: where and are, respectively, the link latency and the processing latency. Note that both ⋆ and are integer values in the time slot set, and is the time-slot duration. The ending time of each request depends on the last finished piece, which must be completed before the deadline. This constraint is expressed as: In the following, we detail the request routing and the two latency components (link and processing latency).
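As the equation bodies were lost in extraction, the ending-time relation described above can be sketched in hedged form with hypothetical symbols ($\tau$ for the slot duration, $l^{tx}_{k,n}$ and $l^{cp}_{k,n}$ for the link and processing latencies of the piece of request $k$ served at node $n$):

```latex
% Hedged reconstruction, hypothetical symbols.
e_{k,n} \;=\; \bigl(x_k^{\star} + t_k\bigr)\,\tau
          \;+\; l^{tx}_{k,n} \;+\; l^{cp}_{k,n},
\qquad
\max_{n} \, e_{k,n} \;\le\; d_k\,\tau
```

The max over serving nodes captures the fact that a request is finished only when its last piece is.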

Network Routing
We assume that a request can be split into multiple pieces only at its source node. Each piece can then be offloaded to another edge computing node independently of the other pieces, but it cannot be further split (we say that each piece is unsplittable). Each link ∈  may carry different request pieces (recall that is the fraction of request to be processed at node ). The total flow of request on link , , can then be expressed as the sum of all pieces of that pass through that link: where  ⊂  denotes the routing path (set of traversed links) for the partial request from source node to node . The traffic flow conservation constraint is enforced by: where Φ − and Φ + are, respectively, the sets of incoming and outgoing links of node . The fulfillment of this constraint guarantees continuity and acyclicity of the routing path.
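The aggregation of unsplittable pieces into per-link flows can be illustrated with the following sketch, where `alpha[n]` is the fraction of the request processed at node `n` and `path[n]` is the set of links from the source to `n`; all names are hypothetical, since in the paper these quantities are decided jointly by the optimizer.

```python
# Illustrative computation of per-link flow for one request: each piece is
# unsplittable, so the flow on a link is the sum of all pieces whose routing
# path traverses that link.

def link_flows(alpha, path, demand):
    """Total flow of one request on each link (sum of unsplittable pieces)."""
    flows = {}
    for n, frac in alpha.items():
        for link in path.get(n, ()):   # empty path when n is the source node
            flows[link] = flows.get(link, 0.0) + frac * demand
    return flows

alpha = {0: 0.5, 1: 0.3, 2: 0.2}             # fractions processed per node
path = {1: [(0, 1)], 2: [(0, 1), (1, 2)]}    # links traversed to reach node n
print(link_flows(alpha, path, demand=10.0))  # {(0, 1): 5.0, (1, 2): 2.0}
```

Flow conservation then simply requires that, at every intermediate node, incoming and outgoing flows of each request balance.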

Link Latency
Let denote the link latency for routing request to node . Each request is routed in a multi-path way, i.e., different pieces of the request may be dispatched to different nodes via different paths. The transmission time of the requests on each link is described by an M/M/1 model [27]; hence, ∀ , ∀ , is defined as: where is the fraction of bandwidth capacity sliced for the piece of request ( ) flowing to node via link . The link latency is accounted for only if a piece of request is processed at node (i.e., > 0) and ≠ . The following constraint ensures that the flow of request on each link of the routing path does not exceed the allocated capacity: Considering that different requests ∈  can share the same link at a given time slot, the reservation constraint of a link capacity at any time slot is expressed as: where is the fraction of link 's bandwidth allocated for a piece of request at time slot . Note that we assume that the reserved bandwidth for each request does not change over its life period, in order to provide consistent service guarantees. The superscript in is used to indicate the life status of the flow. The relation between and is given by = , where is a binary variable defined as:
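The M/M/1 assumption makes the latency terms easy to evaluate: the expected sojourn time of an M/M/1 system with service rate `mu` and Poisson arrival rate `lam` is `1 / (mu - lam)`. The sketch below uses this standard formula with an illustrative parameterization (the paper's exact symbols were lost in extraction); the same expression applies analogously to the processing latency.

```python
# M/M/1 expected sojourn time: service rate mu (e.g. proportional to the
# bandwidth slice on a link, in packets/s) and Poisson arrival rate lam.

def mm1_delay(mu, lam):
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (mu - lam)

print(mm1_delay(mu=200.0, lam=120.0))  # 0.0125 s expected time in the system
```

Note how the delay diverges as the allocated slice approaches the offered load, which is what couples the bandwidth-slicing variables to the deadline constraints.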

Processing Latency and Storage Provisioning
When a request cannot be entirely processed locally, we assume that it can be segmented and processed on different edge computing nodes. Hence, each node can slice its computation capacity to serve several requests coming from different source nodes. Notice that a request also requires a fixed amount of storage resources on a node if it is to be processed on that node. Thus, a request can be processed on a node only if both the computation and storage resources on that node are sufficient. Let variable denote the fraction of computation capacity sliced for the piece of request . The processing of user requests is also described by an M/M/1 model [28,29]. Let denote the processing latency of edge computing node for request . Then, based on the computational capacity with an amount to be served, ∀ , ∀ , is expressed as: where is the fraction of node 's computation capacity sliced to request , and is the processing density [30] of request measured in cycles/bit. In the above equation, when request is not processed on node , the latency is set to 0 and, at the same time, no computation resource should be allocated to request . The corresponding constraint is: and also have to fulfill the consistency constraints: Recall that the right-hand side of equation (13) indicates whether a request is admitted in the system or not. If a request is rejected by the admission controller, the right-hand expression is equal to 0 and = 0 is enforced. Different requests ∈  may share an edge computing node at a time slot. Thus, the reservation constraint of a node's computation capacity at any time slot is modeled as follows: where is the fraction of node 's computation capacity allocated to request at time slot . We assume that the reserved computation power for each request does not change over its life period, due to both the computation scaling overhead and the task reconfiguration overhead. The superscript in allows us to keep track of the life status of the request.
The relation between and is given by = , where is a binary variable defined as: which indicates whether node processes request at . Finally, based on the definition (15), the storage constraint can be expressed as follows, considering that the storage allocated for an admitted request could be released after the end of its serving process:
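Since the equation bodies in this subsection were lost in extraction, the processing latency can only be reconstructed in hedged form. One common M/M/1 parameterization consistent with the text, with all symbols hypothetical ($C_n$ for node $n$'s computation capacity, $c_{k,n}$ for the fraction sliced to request $k$, $\gamma_k$ for the processing density in cycles/bit, $\alpha_{k,n}$ for the fraction of request $k$ served at $n$, and $\lambda_k$ for its arrival rate), is:

```latex
% Hedged reconstruction: service rate = allocated cycles/s divided by the
% processing density (cycles/bit), arrival rate = the offered fraction.
l^{cp}_{k,n} \;=\;
\frac{1}{\dfrac{c_{k,n}\,C_n}{\gamma_k} \;-\; \alpha_{k,n}\,\lambda_k},
\qquad \text{whenever } \alpha_{k,n} > 0
```

When $\alpha_{k,n} = 0$ the latency is set to 0 and no computation resource is allocated, as stated in the text.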

Comment
The way we model latency and delay is aligned with other approaches in the literature. The work of Ma et al. [27] presents a system delay model with the same components adopted in our paper; the communication delay in the wireless access is modeled, as in our work, using an M/M/1-like expression. Moreover, the authors also assume that traffic is processed across a subset of computing nodes and that the service times of edge hosts and cloud instances are exponentially distributed; hence the service processes of mobile edge and cloud can be modeled as M/M/1 queues in each time interval. The same assumption is made in [31]. In [28], the authors assume that both the congestion delay and the computation delay at each small-cell Base Station (considering a Poisson arrival of the computation tasks) can be modeled as an M/M/1 queuing system; the work in [29] assumes that the baseband processing of each Virtual Machine (VM) on each User Equipment packet can be described as an M/M/1 processing queue, where the service time at the VM of each physical server follows an exponential distribution. Finally, the works [32][33][34][35] also adopt similar choices concerning delay modeling.

Optimization Problem
Our goal in the resource calendaring problem is to maximize the profit, computed as the total revenue obtained from serving the users' requests minus the network operation costs in terms of computation, storage and bandwidth resources, under the constraints (starting and ending times) of requests coming from different nodes: where is the revenue gained from serving request . The variables being optimized, reported under the max operator, are , , , , . Problem (0) contains both nonlinear and indicator constraints; therefore, it is a mixed-integer nonlinear programming (MINLP) problem, which is hard to solve directly [36]. Since this problem contains the multi-commodity integer flow problem as a special case (in our work, flows can be split only at the edge nodes and, once admitted, they cannot be further split and are therefore routed as integer flows), which is known to be NP-complete [37], it turns out to be NP-hard.
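Because the objective expression was lost in extraction, only a hedged sketch of its structure can be given, with all symbols hypothetical ($R_k$ for the revenue of request $k$, $y_{k,t}$ for the start indicator, $p_n$ and $q_n$ for the unit costs of computation and storage at node $n$, $w_e$ for the unit flow cost on link $e$, and $c_{n,t}$, $s_{n,t}$, $f_{e,t}$ for the reserved computation, storage and bandwidth):

```latex
% Hedged sketch of the profit objective: revenue of admitted requests
% minus computation, storage and bandwidth costs.
\max \;\;
\sum_{k} R_k \sum_{t} y_{k,t}
\;-\; \sum_{n,t} p_n\, c_{n,t}
\;-\; \sum_{n,t} q_n\, s_{n,t}
\;-\; \sum_{e,t} w_e\, f_{e,t}
```

The first term rewards admission while the cost terms discourage over-provisioning, which is exactly the trade-off the admission controller resolves.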
Moreover, we also face the following difficulties: a) the routing variables  and the request fraction variables are intertwined: to find the optimal routing, the fraction of each request processed at each node should be known and, at the same time, to find the optimal resource allocation for a request, the routing path should be known; b) (0) contains indicator functions and constraints, e.g., (7), (10), (15), etc., which cannot be directly and easily processed by most solvers. To deal with these critical issues, we propose an equivalent reformulation of (0), which we call (.1), that we can efficiently solve with the Branch and Bound method. Intuitively, the reformulation in (.1) proceeds as follows: (a) we first handle the difficulties related to the variables  and the corresponding routing constraints, then (b) we reformulate the link and processing latency constraints (viz., constraints (7) and (11)). Note that the objective function of (.1) is the same as that of (0), while constraints (4)∼(8), (10)∼(12) and (15) in (0) are reformulated into constraints (57), (60)∼(62) and (67)∼(79) in (.1). For the sake of conciseness, we do not include the reformulation here; interested readers can refer to Appendix A.

Heuristics
To solve our problem in a reasonable computational time, we propose a heuristic named Sequential Fixing and Scheduling (SFS) which realizes a good trade-off between admitting "valuable" user requests (i.e., those that provide a high return to the service provider) and the resources they request in terms of transmission rate, storage and computation. A greedy approach is then illustrated as a benchmark heuristic.

Sequential Fixing and Scheduling
SFS is detailed in Algorithm 1 and summarized in the flowchart of Figure 2. We first introduce the following auxiliary variables: , which indicates whether request is processed on node , and , which indicates whether request piece is routed via link . The hat notation represents the values of the corresponding variables in the solution set  ⋆ . We start by sorting all requests in descending order according to the ratio ; this ranking is designed to give a higher weight to requests that generate more revenue and less cost for the operator. Then, we try to define a schedule in which we admit as many requests as possible. For each request , we check whether its activation period overlaps with that of other requests ′ that are already admitted; in such a case we say there is a conflict. The overlap value ′ is determined by the function ℎ _ (⋅) (line 6 of Algorithm 1; details are provided in Section 4.1.1).
Based on { ′ | ′ ∈ ∖{ }}, for each edge node ∈ , we select the maximum ′ over all ′ being processed at , and we identify this overlap value with (line 7). Next, we compute the ordered set  , which contains sets  of best candidate edge nodes to process request . In doing so, we consider and limit the computation resource of each node in  , ∀ ∈  (line 8 of Algorithm 1; details in Section 4.1.2). If we successfully find some candidates ( ≠ ∅), we further update the residual bandwidth ′ for all links and create a weighted graph  ′ with the reciprocal bandwidth ′−1 . The steps in lines 12-16 constitute the solution exploration phase based on the set of candidate node groups, which can handle infeasibility situations in the optimization process. More specifically, we select the first  in  that permits finding a profitable solution ( ⩾ 0) according to the following criteria: we outsource to the nodes in  and bound the computation resource by setting and . Based on  ′ , we route each piece of request using the Dijkstra algorithm (lines 10-14). After fixing the variables , and the constraints related to , in (.1),

Figure 2: Flowchart of the SFS heuristic.

we start to optimize (.1) to obtain the profit and the solution, denoted respectively by  and . If (.1) turns out to be infeasible in the current setting ( < 0), we iterate over the other elements of  . If the result of the new optimization improves, we update the current best profit  ⋆ and solution  ⋆ ; we hence admit request and allocate resources to it (including time slots, computation, bandwidth, and storage). We also update the lower bound of (.1) to =  ⋆ to accelerate the optimization (line 20). Finally, if the result does not improve or no candidate can be found, we reject and clear its corresponding variable settings.

Algorithm 2 Check Overlap. Input: , ′ ,  ⋆ (solution); Output: ′ .
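Two ingredients of SFS described above, the ranking of requests by a revenue-to-cost ratio and Dijkstra routing on a graph weighted by reciprocal residual bandwidth (so that paths prefer links with more spare capacity), can be sketched as follows; all names are illustrative, not the paper's implementation.

```python
import heapq

def rank_requests(requests):
    """Sort (id, revenue, est_cost) tuples descending by revenue/cost ratio."""
    return sorted(requests, key=lambda r: r[1] / r[2], reverse=True)

def dijkstra(adj, src):
    """adj: node -> list of (neighbor, residual_bw); link weight = 1/residual_bw."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for w, bw in adj.get(v, ()):
            nd = d + 1.0 / bw
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

print(rank_requests([("a", 10, 5), ("b", 9, 3)]))   # "b" first: ratio 3 > 2
print(dijkstra({0: [(1, 10.0)], 1: [(2, 5.0)]}, 0)) # {0: 0.0, 1: 0.1, 2: 0.3}
```

Weighting links by reciprocal residual bandwidth steers pieces of a request away from nearly saturated links, which is the intent of the graph  ′ built by SFS.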

Check overlap
The ℎ _ function takes as input , ′ and the partial solution  ⋆ computed up to the current point, returns ′ , and proceeds as detailed in Algorithm 2: i) it initializes two local variables and with the starting time ′ and the deadline ′ of request ′ , respectively; ii) it verifies whether ′ is admitted; if so, it updates and with, respectively, the exact starting timê ′ ⋆ and ending time max ∈̂ ′ of ′ according to the solution  ⋆ ; iii) it computes the (partial) overlap between and ′ as: = − , if > ; = − , otherwise (a negative value of means no overlap); iv) finally, it calculates and returns the maximum relative overlap value ′ between and ′ .
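The overlap computation can be illustrated with the following sketch, which turns the absolute overlap of two scheduling windows into a relative value in [0, 1]; the function and its normalization are hypothetical, in the spirit of Algorithm 2 rather than a transcription of it.

```python
# Illustrative interval-overlap computation: given the scheduled window of an
# admitted request and the candidate window of a new one, return the overlap
# relative to the shorter window.

def relative_overlap(start_a, end_a, start_b, end_b):
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0                       # disjoint windows: no conflict
    return overlap / min(end_a - start_a, end_b - start_b)

print(relative_overlap(2, 6, 4, 10))     # 2 shared slots / 4-slot window = 0.5
print(relative_overlap(0, 3, 5, 8))      # 0.0, the windows are disjoint
```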

Find candidates
Algorithm 3 determines an appropriate subset of edge nodes that can process traffic requests. We first estimate , the remaining computation capacity of each node , based on̂ ′ in the solution  ⋆ , verifying whether the conflicts are higher than a given threshold ( > ). The threshold reflects the availability of the computation resource of an in-use node in time conflict. Lower values lead to a less efficient utilization of the computation resources, thus producing a worse solution, while higher values can introduce a higher bias in the estimation, possibly leading to an infeasible solution. In the experiments, we set = 0.6. Then, we define  as the set of nodes satisfying ⩽ or ⩾ , where is a threshold on the remaining computation; controls the minimum remaining computation capacity that a candidate should have. Higher values cause a lower resource utilization; we choose a small value, = 0.25, in the experiments. Note that very low values would cause overloading of the computing nodes and lead to an infeasible solution. Hence,  represents the set of nodes that are either in low conflict (for request ) or have enough remaining computation power. For each ∈  , we compute three features (denoted by  ): the negative hop distance between and (−ℎ (, , )), an indicator of whether is a source node or not ( ∉ { ′ | ′ ∈ }), and the estimated remaining computation capacity (lines 1-3). Based on  , we sort  in descending order to give higher priority to a node that 1) is closer to the ingress node of , 2) preferably is not a source node, and 3) has more remaining computation capacity than the other nodes. Then, we add nodes to  1 until < Σ , where Σ denotes the estimated total computation capacity that can be used and is a threshold controlling the total required computation capacity.
Higher values will make the computation resource utilization more efficient, in cooperation with the solution exploration phase (lines 12-16), while lower values will result in less efficient utilization of computation resources. In the experiments, we set = 0.9. Finally, we return the ordered set  with  1 at the first place and the left nodes  −  1 being separately stored as unit sets of backup candidates (lines [5][6][7][8][9]. Notice that the values of the thresholds ∕ ∕ are appropriately chosen based on our experiments.
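A minimal sketch of this candidate-selection logic follows; the function and parameter names, the `hop` callback and the capacity-coverage test are illustrative assumptions, not the authors' exact code:

```python
def find_candidates(request_node, nodes, hop, remaining, sources,
                    demand, eta=0.9):
    """Sketch of the candidate-selection step (in the spirit of
    Algorithm 3).

    `hop(u, v)` gives the hop distance between nodes, `remaining[v]`
    the estimated spare computation of node v, and `sources` the set
    of ingress nodes. Nodes are ranked so that closer, non-source,
    higher-capacity nodes come first; nodes are then taken until the
    accumulated capacity, scaled by `eta`, covers `demand`. The
    leftover nodes are returned as unit sets of backup candidates.
    """
    feats = {v: (-hop(request_node, v), v not in sources, remaining[v])
             for v in nodes}
    ordered = sorted(nodes, key=lambda v: feats[v], reverse=True)
    chosen, total = [], 0.0
    for v in ordered:
        if total * eta >= demand:
            break
        chosen.append(v)
        total += remaining[v]
    backups = [[v] for v in ordered if v not in chosen]
    return chosen, backups
```

The descending sort over feature tuples reproduces the three-way priority (closeness, non-source, spare capacity) in a single comparison.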

Greedy approach
Greedy is a heuristic alternative to SFS that we propose as a benchmark in Section 6. The detailed procedure is listed in Algorithm 4. Greedy shares some common steps with SFS, but it also exhibits several differences, in that it applies different strategies to prioritize requests (line 3) and to search candidate nodes for processing requests (line 6, Algorithm 5); furthermore, it considers neither the requests' overlap nor the exploration of alternative solutions in case of infeasibility. Note that Greedy can solve problems in a short computing time while still obtaining good solutions, as we discuss in Section 6.3. More specifically, it first sorts the requests in ascending order by the priority key (negative revenue, starting time, deadline) and then tries to schedule them one by one. The sorting considers the revenue of a request in the first place, the starting time in the second place and the deadline in the third (last) place. Then, for each request, we try to guarantee sufficient computation power by using its closest neighbor nodes. The steps for searching candidate nodes are detailed in Algorithm 5. Compared with the strategy of SFS (Algorithm 3), Greedy estimates the remaining computation capacity and the potential set of candidate nodes (lines 1-2) without considering the request overlap information, and collects candidate nodes until their accumulated capacity, scaled by a threshold similar to the one used in SFS, covers the request demand; the lower this threshold is, the more computing nodes are required. Unlike SFS, the remaining neighbor nodes are not used for further solution exploration in case of infeasibility; as a result, higher threshold values make the algorithm select fewer computing nodes, which in some cases may not be sufficient for processing the requests, leading to less profitable or infeasible solutions. We set the threshold to 0.6 in our experiments and evaluate its effect on the performance in Section 6.3. Finally, this greedy search strategy yields a shorter computing time, but in general a less profitable solution.
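For illustration, the priority sort can be reproduced in a few lines; the request tuples and their values are hypothetical:

```python
# Requests as (name, revenue, start, deadline); the ascending sort on
# (-revenue, start, deadline) puts high-revenue requests first and
# breaks ties by earlier start, then earlier deadline.
requests = [("r1", 50, 3, 12), ("r2", 80, 1, 8), ("r3", 80, 0, 9)]
order = [name for name, rev, s, d in
         sorted(requests, key=lambda r: (-r[1], r[2], r[3]))]
# r2 and r3 tie on revenue; r3 starts earlier, so it precedes r2.
```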

Complexity Analysis
In this Section we provide a complexity analysis for both the SFS and Greedy approaches.
SFS: Referring to Algorithm 1, the computation complexity comes mostly from two aspects: i) the for loop over the requests (line 4), together with the nested for loop over the candidate node groups (line 12), and ii) the optimization of the reduced problem (.1) in the nested loop (line 15). Regarding the nested for loop, the candidate node groups are picked from the neighbor nodes; in the worst case their number equals the number of nodes in the network, but the actual value is, in practice, much lower, due to the few neighbor nodes that are typically needed for request offloading and to the break step (line 16) in the nested loop. Regarding the optimization step, the reduced problem (.1) is still an MINLP problem, hence NP-hard, since it still contains both integer and continuous variables (e.g., scheduling and offloading variables) as well as nonlinear constraints (e.g., latency components). For simplicity, we denote the complexity of optimizing the reduced problem by a function of the number of requests, the network topology and the time horizon; since in each iteration only one request is involved, a single reduced optimization depends only on one request, the topology and the time horizon. The final complexity of SFS is therefore the product of the number of requests, the (typically small) number of candidate groups explored per request, and the cost of one reduced optimization.
Greedy approach: Compared with Algorithm 1 (SFS), Algorithm 4 has different request prioritization and candidate-seeking strategies, but no request overlap checking and no exploration of alternative solutions in infeasible situations. Therefore, regarding its computation complexity, the main difference is that Greedy does not have the solution exploration phase. Based on a complexity analysis similar to the one illustrated above for SFS, the complexity of Greedy is the number of requests multiplied by the cost of one reduced optimization.

Distributed Resource Allocation
In this Section we illustrate a distributed algorithm for our problem, such that resource allocation decisions can be made directly by edge nodes with limited overhead. To this aim we adopt the Alternating Direction Method of Multipliers (ADMM) [13], a well-established optimization tool used to decompose a problem into multiple small subproblems that can be solved iteratively in a distributed fashion.
ADMM was originally introduced for solving convex problems, with fast convergence properties. Recently, it has also been used as a heuristic to approximately solve some nonconvex problems, but in this case convergence may not be guaranteed [38,39]. In the following, we propose a weighted ADMM algorithm to decompose and solve our scheduling problem, where a weighting strategy is exploited to balance the priorities of the various decision variables in the optimization model. In practice, we observe that the weighting strategy is important for the fast convergence of ADMM in all the network scenarios we considered. For the sake of simplicity and clarity, we first present the ADMM formulation without our weighting scheme; then we present the weighted ADMM algorithm, obtained with a few reformulations. Table 3 summarizes the main notation used in this section. In the following, a superscript such as k+1 denotes the value of the corresponding variable at iteration k+1 of the ADMM algorithm, and the sub-blocks of the primal vector correspond to the five main decision variables of the model; the same convention applies to its copy vector.

ADMM Formulation
Let us recall that our problem aims at scheduling and computing multiple requests from different ingress nodes in a network with a specific (and generic) topology, where each request can be split and offloaded to nearby edge computing nodes. Hence, we can split the problem across both (i) the request set and (ii) the node set. To derive the ADMM formulation of problem (.1), we reformulate it in 3 main steps, illustrated in Figure 3. Specifically, we first select the main decision variables and the corresponding constraints that bind the variables across requests and nodes, which hinder the distributed optimization of the problem. The main variables are written in compact form for the sake of clarity and convenience of derivation. This step transforms the problem into (.2). However, (.2) contains a consensus variable, meaning that, for a given request, this information is shared by all the edge nodes where the request might be offloaded; this also prevents distributed optimization. To handle this issue, we split the consensus variable by introducing a local copy of it on each possible edge node and reformulating the corresponding constraints (17), (18) and (19), transforming the problem into (.3). In (.3), both the objective function and the variables are splittable across the request and node indices, while the main constraints for the requests and capacities still bind the variables together. Finally, to address this, we copy the main decision variables, introduce a penalty function based on the constraints, and formulate the problem as (.4) based on 2-block ADMM. (.4) is a standard ADMM formulation which can be solved in a distributed fashion.
In the following, we present the 3 reformulation steps in detail. Firstly, we write the main decision variables in compact (stacked-vector) form. Then, the resource calendaring optimization problem (auxiliary variables are ignored here) can be reformulated subject to the remaining constraints of (.1), which define a feasible set presented in Appendix A; note that these constraints are splittable across the request and node indices. For the sake of clarity, we replace the previous auxiliary variable representing the start time of each request with a per-node copy, which can be regarded as duplicated starting-time information for the request on each node. Since all such copies refer to the same starting time, as expressed in constraint (17), problem (.2) is equivalent to (.1). Constraints (17), (18) and (19) perform the admission of a request, formulated through the consensus variable, while (20), (21) and (22) represent the capacity constraints for processing, storage and bandwidth, which respectively couple the corresponding sets of variables.
To enable distributed optimization, we need to split the consensus variable and reformulate the corresponding constraints (17), (18) and (19). To do this, we first introduce a local copy of the consensus variable for each request on each possible edge node, and add the consensus constraint forcing all local copies to coincide. Constraints (17) and (18) can thus be rewritten in terms of the local copies. Then, substituting the consensus variable with the average of its local copies into constraint (19) and introducing an auxiliary variable, we can reformulate constraint (19) as a zero-sum condition over the nodes, and reformulate the objective function accordingly. Based on the above, problem (.2) can be rewritten as (.3), where the new feasible set is defined by the constraints in (24) together with (25), (26) and (27). (.3) is more compact and clearer than (.2) in terms of decision variables and constraints, and the objective function is now expressed as the sum of independent functions. However, several constraints remain, including the request scheduling constraints (29) and (30) as well as the capacity provisioning constraints (31), (32) and (33), which bind the variables together across the request and node indices. This makes it difficult to compute the optimal solution in a distributed way. For this reason, we further reformulate (.3) into a 2-block ADMM form, as defined in [13], which can instead be optimized in a distributed manner. To this aim, we define an indicator function of the feasible set given by constraints (29)-(33). By introducing an auxiliary variable as a copy of the primal vector, we obtain the 2-block reformulation (.4).

ADMM Solution
To derive the solution of the optimization (.4), we first write the augmented Lagrangian function, which adds to the objective the inner product between the vector of Lagrange multipliers (or dual variables) and the consensus gap, plus a quadratic penalty term weighted by a positive coefficient. The rationale behind introducing such a penalty term is that the augmented dual function can be shown to be differentiable under rather mild conditions on the original problem [13].
Then, based on the 2-block ADMM, we can write the solution in scaled form, where k is the iteration index and the scaled dual variable is obtained by dividing the multipliers by the penalty coefficient. The steps for updating the local variables and the dual variables can be carried out independently, in parallel, for each (request, node) pair. Notice that the first update step requires solving a problem having one block of variables per (request, node) pair.
In the copy-variable update, since the squared norm in the penalty term is separable and constraints (29)-(33) are independent of each other, we can split the indicator function and separately update the copies of the five decision variables. The update of each sub-block is obtained in closed form from the corresponding averages (for the detailed derivation, we refer the interested reader to Appendix B).
Note that the step updating the local variables can be carried out in parallel for each (request, node) pair. After gathering the average information, the updates of the copy variables can likewise be carried out independently, in parallel, for each (request, node) pair. Finally, the dual updates are also applied in parallel.
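The update order just described (parallel local updates, a gathering copy-update, then the dual update) follows the standard scaled 2-block ADMM pattern. For a toy quadratic consensus problem, unrelated to the paper's exact model, the iteration looks like:

```python
# Scaled 2-block ADMM on a toy consensus problem: minimize
# sum_i (x_i - a_i)^2 subject to x_i = z for all i. This is NOT the
# paper's model; it only illustrates the x-update / z-update / dual
# update order used in the text. rho is the penalty parameter.
a = [1.0, 3.0, 5.0]
rho = 1.0
x = [0.0] * len(a)
z = 0.0
u = [0.0] * len(a)
for _ in range(200):
    # x-update, independently per block:
    # argmin_x (x - a_i)^2 + (rho / 2) * (x - z + u_i)^2
    x = [(2 * ai + rho * (z - ui)) / (2 + rho) for ai, ui in zip(a, u)]
    # z-update gathers the average of x + u
    z = sum(xi + ui for xi, ui in zip(x, u)) / len(a)
    # scaled dual update
    u = [ui + xi - z for ui, xi in zip(u, x)]
# the consensus value z converges to the average of a
```

Each x-update only needs its own data plus the shared (z, u) information, which is what makes the scheme distributable across nodes.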

ADMM Convergence
To check the convergence of ADMM, we first compute the primal and dual residuals; these values converge to zero as the ADMM algorithm progresses [13]. A recommended stopping criterion [13] compares the norms of the two residuals with feasibility tolerances that combine an absolute term, scaled by the square root of the variable dimensionality, and a relative term, scaled by the norms of the current iterates. In practice, we can set both the absolute and the relative tolerance to 10^-5; the choice of these values depends on the application scenario and on the scale of the variable values.
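The stopping test of [13] can be written as follows; this is a generic sketch with flattened variable vectors, and the tolerance names are ours:

```python
from math import sqrt


def norm(v):
    return sqrt(sum(x * x for x in v))


def admm_should_stop(x, z, z_prev, u, rho, eps_abs=1e-5, eps_rel=1e-5):
    """Standard ADMM stopping criterion: the primal residual
    r = x - z and the dual residual s = rho * (z - z_prev) must both
    fall below tolerances combining an absolute part (scaled by
    sqrt(n)) and a relative part (scaled by the iterate norms)."""
    n = len(x)
    r = [xi - zi for xi, zi in zip(x, z)]
    s = [rho * (zi - zpi) for zi, zpi in zip(z, z_prev)]
    eps_pri = sqrt(n) * eps_abs + eps_rel * max(norm(x), norm(z))
    eps_dual = sqrt(n) * eps_abs + eps_rel * norm([rho * ui for ui in u])
    return norm(r) <= eps_pri and norm(s) <= eps_dual
```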

Weighted ADMM
In the augmented Lagrangian function (37), the sub-blocks of the primal vector have different dimensions: the admission copy has dimension 1, the scheduling-related copies have dimension equal to the length of the time horizon, while the offloading block has dimension equal to the number of nodes times the time horizon. Therefore, the penalty term in the Lagrangian function (37) implicitly assigns different weights to the variables. To balance the weights, we design a diagonal weight matrix based on the dimensions and on the L2 norm of the penalty, and reformulate equation (37) accordingly. The 2-block ADMM solution (in scaled form) can then be written with the weight matrix absorbed into the updates: with a slight abuse of notation, rescaling the dual variable by the inverse weight matrix transforms equation (50) into the same form as equation (40). Based on the derivation in Appendix B, the solution of the new copy-variable update (49) can also be transformed into the same form as the solution (41). Finally, the only change is the first update step, which solves equation (48) instead of (38).
Correspondingly, the primal and dual residuals and the feasibility tolerances are rewritten with the weight matrix included. To sum up, our ADMM-based distributed resource scheduling is defined in Algorithm 6. An advantage of ADMM is that the solution of the optimization problem is distributed among all, or part of, the edge computing nodes. Specifically, one edge node plays a management role, distributing and coordinating the whole optimization process. Each edge node separately computes its local updates, and each sub-task can also be computed in parallel inside the node (leveraging different computing cores or servers), based on the solution of Equation (38). After all this computation is done, the management node computes the copy variables in parallel by optimizing (48) (see also (41)), and updates the penalty parameter for the next iteration. This information is then delivered to all edge nodes to continue the next optimization round. During each iteration, the management node checks the convergence conditions to decide whether to finalize the whole task and report the final solution, based on the convergence condition (44).

Comment on convergence
To make the ADMM algorithm converge quickly, the penalty parameter can be tuned at each iteration of the solving procedure. In the ADMM update equations, a large penalty value strongly punishes violations of primal feasibility and hence tends to produce small primal residuals. Conversely, a small value tends to reduce the dual residual, but at the expense of primal feasibility, producing a larger primal residual. A general method for balancing the variations of both primal and dual residuals is introduced in [40,41], mainly designed for convex programming problems; however, it may show instability and non-convergence in some application scenarios [42]. Here, we propose problem-specific modifications of this updating strategy, suited to the characteristics of our optimization model (i.e., mixed-integer, non-linear, non-convex programming). We first recall the original strategy of [40,41] as follows.
where the two scaling factors and the balancing factor are parameters greater than 1; typical values are 2 for both scaling factors and 10 for the balancing factor. The idea behind the above strategy (55) is to try to keep the primal and dual residual norms within a small factor of one another as they both converge to zero.
Our proposed strategy modifies (55) as follows: the penalty parameter is increased both when the primal residual dominates and when a local trap is detected, and is left unchanged otherwise. The local-trap check detects whether the ADMM algorithm is stuck during the solving procedure; we define a local trap as a state in which either the primal or the dual residual remains unchanged for a certain interval, e.g., more than 5 iterations. Compared with the strategy in [40,41], we also eliminate the decreasing branch for the penalty parameter, which, in practice, is a cause of instability and non-convergence of the algorithm.
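Under these assumptions, the modified rule can be sketched as follows; the exact trap test and the constants are illustrative, but the rule keeps the increase branch of [40,41], adds a trap-escape increase, and drops the decrease branch:

```python
def update_rho(rho, r_norm, s_norm, history, mu=10.0, tau=2.0,
               trap_len=5):
    """Sketch of the modified penalty-update rule: increase rho when
    the primal residual dominates the dual residual (as in the
    original strategy), never decrease it, and also increase it when
    either residual has kept the same value for `trap_len` consecutive
    iterations (`history` holds the recent (r_norm, s_norm) pairs).
    All names and constants here are illustrative."""
    history.append((r_norm, s_norm))
    recent = history[-trap_len:]
    stuck = (len(recent) == trap_len and
             (len({r for r, _ in recent}) == 1 or
              len({s for _, s in recent}) == 1))
    if r_norm > mu * s_norm or stuck:
        return rho * tau
    return rho
```

Note that when the dual residual dominates, the rule deliberately leaves rho unchanged instead of shrinking it, which is the stability fix described above.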
We will illustrate in the next section how these approaches allow the algorithm to converge in the practical network scenarios considered in our numerical evaluation.

Numerical Results
In this section we evaluate the performance of the proposed model, of the SFS and Greedy heuristics, and of the distributed (ADMM-based) resource allocation algorithm, in terms of the profit of the operator, expressed as in (.1), the serving rate (the fraction of admitted requests) and the computing time needed to obtain the solution.
Consequently, the rest of this section is organized as follows: section 6.1 presents the network topologies we have considered in our numerical evaluation campaign; section 6.2 describes the setup for our experiments; finally, section 6.3 discusses the results obtained in different network scenarios.

Network Topologies
We evaluate our optimization approach using multiple network topologies, described hereafter, including several random graphs as well as a topology built on a real network scenario.

Random graphs
We first consider Erdős–Rényi random graphs [43], setting the desired number of nodes and edges. As the original Erdős–Rényi algorithm may produce disconnected random graphs with isolated nodes and components, to generate a connected network graph we patch it with a simple strategy that connects isolated nodes to randomly sampled nodes (up to 10 nodes) in the graph. We generate several kinds of topologies with different numbers of nodes and edges, starting from simple ones to larger and more complex networks, as shown in Figure 4. The structural information for all topologies (including the one obtained in the real network scenario illustrated in the following) is reported in Table 4. All topology datasets are publicly available in our repository 1 . These topologies can be considered representative of various edge network configurations where multiple edge nodes are distributed in various ways over the territory.
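A sketch of this patching step follows; it only reattaches isolated nodes (merging disconnected components, which the text also mentions, would require an additional pass), and it is our reading rather than the authors' exact procedure:

```python
import random


def patched_random_graph(n, m, seed=0):
    """G(n, m)-style random graph with a simple repair pass: every
    isolated node is attached to a randomly chosen other node, in the
    spirit of the patching strategy described above. Full connectivity
    across components is not enforced here."""
    rng = random.Random(seed)
    possible = [(i, j) for i in range(n) for j in range(i + 1, n)]
    edges = set(rng.sample(possible, m))
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    for v in range(n):
        if degree[v] == 0:
            w = rng.choice([u for u in range(n) if u != v])
            edges.add((min(v, w), max(v, w)))
            degree[v] += 1
            degree[w] += 1
    return edges
```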

A real network scenario
We further consider a real network scenario, based on the actual deployment of Base Stations (BSs) collected from an open database, OpenCellID 2 , which gathers information on BSs from all over the world, including their positions. This topology was first introduced in [44], but in this paper we use it in a different context, solving the resource calendaring problem in a MEC setting. Specifically, we considered the "Città Studi" area around Politecnico di Milano and selected one mobile operator (Vodafone) with 133 LTE cells falling in that area (see Figure 5(a)).
We then performed a clustering on such cells, as illustrated in Figure 5(b), obtaining 30 clusters. Finally, starting from the cluster centroids, we generated the network topology which, as in real mobile scenarios, has a fat tree-like shape with edge nodes connecting to aggregation nodes, in the following way:
• we connected any two nodes if their distance is lower than a given threshold (800 meters). By doing so, some "leaf"/edge nodes become connected to more than one aggregation node, increasing the redundancy and hence the reliability of the final topology, as happens in real networks. In Figures 5(b) and 5(c) the color (illustrated in the vertical bars) corresponds to the cluster size (number of cells contained in the cluster), with nodes aggregating more than 5 cells being regarded as aggregation nodes;
• we determined the Minimum Spanning Tree of the geometric graph weighted by distance and cluster size, while preserving the redundant links mentioned above.
The resulting topology is illustrated in Figure 5(c); the average node degree resulting from the above procedure is 2.33. In such a topology, edge servers can be installed in all nodes.
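The two construction steps (distance-threshold edges, then an MST over the distance-weighted graph) can be sketched as follows; the coordinates, the 800 m threshold handling, and the omission of cluster-size weights and redundant-link preservation are simplifying assumptions:

```python
import heapq
import math


def edge_topology(coords, threshold=800.0):
    """Connect centroids closer than `threshold`, then keep a minimum
    spanning tree (Prim's algorithm) of the distance-weighted
    geometric graph. `coords` maps node name -> (x, y) in meters."""
    nodes = list(coords)
    adj = {v: [] for v in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d = math.dist(coords[u], coords[v])
            if d < threshold:
                adj[u].append((d, v))
                adj[v].append((d, u))
    start = nodes[0]
    visited = {start}
    heap = [(d, start, v) for d, v in adj[start]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(nodes):
        d, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, round(d, 1)))
        for d2, w in adj[v]:
            if w not in visited:
                heapq.heappush(heap, (d2, v, w))
    return tree
```

For a connected input, the returned tree has exactly one edge fewer than the number of centroids, matching the sparse, tree-like shape of the resulting topology.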

Experimental Setup
We implemented our model and heuristics using SCIP (Solving Constraint Integer Programs) 3 , an open-source framework that solves constraint integer programming problems. All numerical results presented in this section have been obtained on a server equipped with an Intel(R) Xeon(R) E5-2640 v4 CPU @ 2.40GHz and 126 GB of RAM. The parameters of SCIP in our experiments are set to their default values. The results illustrated in the following figures are obtained by averaging over 50 instances, with narrow 97% confidence intervals.
We uniformly extract, at random, the source nodes as well as the starting/ending times and durations of the requests, and the revenue gained by the operator in serving each request, in the [100, 300] range. We further generate random request rates on the ingress edge nodes of the network topologies (see Figures 4(a), 4(b) and 5(c)) according to a Gaussian distribution with mean uniformly selected in the 30 to 50 Gb/s range and standard deviation 0.5. We further consider more complex scenarios by randomly splitting the "heavy" requests on each ingress node to spawn a variety of different "small" requests; specifically, for topology 30N50E (see Table 4), we generate one request on each of the five ingress nodes, then split each request into 6 parts to create a network scenario 30N50E30R having a total of 30 requests; for topology Città Studi, in the same way, we first create a scenario CittàStudi6R having 6 requests, one per ingress node, then split each request into 5 parts to generate a scenario CittàStudi30R having a total of 30 requests. For the sake of simplicity, we assume that all links have the same bandwidth (30 Gb/s) and all nodes the same computation capacity (30 Giga cycles/s) and storage capacity (40 GB). The costs of using one unit of each of these three resources are all set to 0.01. Finally, we set the processing density to 1 and the storage requirement to 10 for all requests. In Table 5, we provide a summary of the reference values we define for the main parameters related to requests. Such values are representative of a scenario with a high load of requests relative to the limited computation capacity. Our request rates result from the aggregation of requests generated by multiple users connected at a given ingress node. More specifically, the values of the request rate are designed to cover several different scenarios, i.e., mice, normal and elephant request loads.
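A hedged sketch of this request-generation step follows; the scheme for splitting a heavy request into smaller ones is our reading, since the paper does not give the exact procedure:

```python
import random


def split_request(rate, parts, rng):
    """Split one 'heavy' request of total `rate` into `parts` smaller
    requests whose random non-negative rates sum back to `rate`
    (illustrative; the exact splitting scheme is not specified)."""
    cuts = sorted(rng.uniform(0, rate) for _ in range(parts - 1))
    bounds = [0.0] + cuts + [rate]
    return [bounds[i + 1] - bounds[i] for i in range(parts)]


rng = random.Random(7)
rate = rng.gauss(40.0, 0.5)   # per-node rate ~ N(mu, 0.5), mu in [30, 50] Gb/s
pieces = split_request(rate, 6, rng)
```

The six sub-request rates sum back to the sampled aggregate rate, preserving the total load offered at the ingress node.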
We select rate values and request durations (starting and ending times) which are typical of 5G usage scenarios (eMBB, URLLC, and mMTC) [45]. For instance, a request with a very short computation duration and a small rate could represent an URLLC use case and a mission-critical application, while a request with a longer computation duration and a high rate could represent an eMBB use case and an augmented reality service.

Table 5: Parameters setting - initial (reference) request data (for the case of high incoming request load); for each request we report the starting time, the ending time (deadline), the computation duration, the rate and the revenue.

Request  Start  End  Duration  Rate  Revenue
A1         3     12      5      50     200
A2         1      8      3      40     100
A3         5     20      8      45     200
A4         3     16      8      35     300
A5         5     18      6      55     250
A6         7     22      9      30     150

With the parameters in Table 5, almost all requests cannot be served using only the resources (computation capacity) available at their respective ingress nodes. Note that our proposed model and heuristics are general, and can be applied to optimize resource allocation in any network scenario with any parameter setting. Table 6 summarizes, for each network scenario considered in our numerical evaluation, the network topology used, the user requests offered to the network (following the definitions of Table 5), how they are split, and the total number of requests. Table 7 illustrates the effect of the Greedy capacity threshold on the profit value and on the corresponding computing time under three demanding scenarios (with limited computation capacity, obtained by scaling the node computation capacity by a factor 0.6). This threshold permits to strike a balance between the obtained performance and the computing time. In fact, Greedy obtains a higher profit value at 0.6 in almost all scenarios except CittàStudi30R, where it achieves a slightly lower profit than at 0.4, but with a much shorter computing time. In scenario CittàStudi6R, Greedy obtains a much better profit at 0.6 than at 0.4 and 0.8, with an increase by a factor of 1.3 and 2.6, respectively.

Discussion of Results
In the following, we first compare the results obtained from the exact model and the heuristics, including our ADMM-based approach, for a small network scenario: the 5N5E network with 3 requests (denoted by 5N5E3R), in section 6.3.1. Then, in section 6.3.2, we analyze the effect of different parameters on the solutions obtained by the two heuristics, SFS and Greedy, for two large network scenarios, i.e., one random topology, 30N50E, and one real network scenario, Città Studi, with 30 requests (denoted by 30N50E30R and CittàStudi30R, respectively). For this latter topology, to model different levels of demand aggregation, we considered a further scenario (denoted by CittàStudi6R) with 6 requests obtained by aggregating the above 30 requests at the 6 ingress nodes, while maintaining the same total request rates. The 5N5E3R topology allows us to compare the solutions obtained by the heuristics (SFS and Greedy) and by the distributed ADMM algorithm against the optimal solution; in fact, the exact model (.1) can be solved in a reasonable time only on the small topology (5N5E3R). Figure 6 plots the profit versus the request rate, keeping the revenue fixed, where Optimal represents the result obtained by solving the exact model (.1). The decreasing trend of the profit for all approaches as the rate increases is due to the fact that more resources are needed; hence the cost incurred by the operator increases while the revenue is fixed, and the profit decreases. The profit drops to around 300 for SFS, ADMM and Optimal, and to around 200 for Greedy. The curves show a step-wise pattern due to the combinatorial expression of the profit in the objective function of the optimization (.1) and to the fact that instance 5N5E3R contains only 3 requests. Both ADMM and SFS exhibit excellent performance, since their curves practically overlap that of Optimal, while Greedy shows lower performance.
In the small network scenario, 5N5E3R, ADMM and SFS obtain good results mainly for two reasons: i) the solution space of the optimization model for 5N5E3R is relatively small and simple; ii) the algorithms capture the main aspects of the problem, namely taking into account the requests' priority and overlap when deciding their admission into the system, and performing an effective exploration of candidate computing nodes, all of which significantly influence the quality of the obtained solution. However, we can expect that in larger scenarios a certain gap with respect to the optimum exists. As for the Greedy algorithm, its lower performance is mainly due to its differences with respect to SFS: it adopts different strategies to prioritize requests and to search candidate nodes for processing them, and it considers neither the requests' overlap nor the exploration of alternative solutions in case of infeasibility.

Comparison of the exact model and the heuristics
As for the computing time, Table 8 summarizes this performance figure for our proposed algorithm in the considered network scenarios; for example, in the 5N5E3R scenario, Optimal has an average computing time of 62 s, ADMM of 59 s, while Greedy takes 1.9 s and SFS just 1.3 s. A detailed discussion on the higher computing time shown by ADMM is provided in Section 6.3.3 where we also analyse its usage in a larger network scenario.

Analysis of SFS and Greedy heuristics' results for large networks
In the following, we illustrate the objective function value (the profit) as a function of different parameters for the two network scenarios 30N50E30R and CittàStudi30R (Figures 7-11), as well as the serving rate, which is plotted as a function of the request rate and revenue (see Figure 9).
Effect of the request rate: Figures 7(a) and 7(b) report the profit as a function of the request rate, scaled from 0.5 to 2.0 with respect to its initial values (see Table 5), for scenarios 30N50E30R and CittàStudi30R, respectively. As the rate increases, the profits of all approaches in the two scenarios decrease, since the revenue from serving each request is fixed while the system cost for serving the growing requests increases. In Figures 7(a) and 7(b), the SFS curves show a similar trend as the rate increases, while the Greedy curves follow a slightly different pattern: the profit in 30N50E30R decreases smoothly, while in CittàStudi30R it rapidly decreases after the scaling point 1.4. Finally, SFS performs better than Greedy in both scenarios, with average gaps up to 8% and 9%, respectively.
Effect of the request rate and revenue: Figures 8(a) and 8(b) illustrate the profit variation versus the request rate and revenue for scenarios 30N50E30R and CittàStudi30R, respectively. The rate and revenue of each request are both scaled, at the same time, from 0.5 to 2.0 with respect to their initial values. Such scaling implicitly indicates that serving each request provides a revenue proportional to its arrival rate. As rate and revenue grow, the profit increases for both approaches (Figures 8(a) and 8(b)); the network operator, in fact, is able to select and admit the requests which can cover the system cost and provide, at the same time, a higher profit. In Figure 8(b), when the scale is larger than 1.7, Greedy shows a slight decreasing trend, since it fails to find a good solution balancing cost and profit.
Figures 9(a) and 9(b) illustrate the variation of the serving rate versus the request rate and revenue for scenarios 30N50E30R and CittàStudi30R, respectively. In Figure 9(a), when the request rate is low, all user requests can be served; when it increases, specifically beyond the point around 1.2, the serving rate of SFS decreases, since the system can accommodate fewer requests, which become more demanding and hence costlier in terms of required resources. In scenario CittàStudi30R (see Figure 9(b)), when the scale factor is lower than 1.25, the serving rates of both approaches slowly decrease as the request rates increase; after that, the decrease for Greedy becomes rapid while for SFS it is slower. Finally, SFS exhibits better performance than Greedy, with gaps up to 18% for the profit and 20% for the serving rate.
Effect of the link capacity: The variation of the profit as a function of the link capacity (scaled from 0.1 to 1.2 with respect to its initial value) is illustrated in Figures 10(a) and 10(b) for scenarios 30N50E30R and CittàStudi30R, which show a very similar trend. When the link capacity increases, the profit increases for both SFS and Greedy in the two scenarios; both curves grow quickly before the scaling point 0.75, reflecting the positive effect of the available link capacity on the profit. Additionally, SFS performs better than Greedy, with clear gaps of up to 77%. For larger values of the available link capacity there are naturally enough resources to satisfy the requests' requirements; SFS and Greedy hence perform similarly and converge to specific values, and the gap between them decreases.
Effect of the computation capacity: Figures 11(a) and 11(b) show the variation of the profit against the edge node computation capacity, scaled from 0.5 to 1.5 with respect to its initial value, in scenarios 30N50E30R and CittàStudi30R. When the computation capacity increases, the profit first increases rapidly and then converges to a plateau for all approaches. Note that, for all allocation algorithms, the increase in achieved profit can be up to 300, while the increase in serving rate is around 0.4. These trends reflect the strong effect of the available computation capacity on the profit and serving rate. Additionally, SFS performs better than Greedy with clear gaps (up to 13% for the profit). As the computation capacity increases, the performance gap between SFS and Greedy decreases, since sophisticated resource allocation algorithms are less critical when resources are abundant. In both network scenarios, SFS allows the operator to achieve higher profit, which stabilizes when the computation capacity scale exceeds 0.9, while Greedy converges after the capacity is scaled up to around 1.1 for 30N50E30R and around 0.9 for CittàStudi30R.
Finally, as reported in Table 8, SFS exhibits an average computing time of 1096 s in scenario 30N50E30R and of 362 s in scenario CittàStudi30R, confirming its efficiency in computing good solutions in a short time. The Greedy approach needs less computing time to obtain a solution, on average 822 s in 30N50E30R and 325 s in CittàStudi30R, at the cost of higher performance gaps with respect to the SFS heuristic. Moreover, compared with 30N50E30R, CittàStudi30R requires slightly less computing time since its topology has a relatively smaller size and a fat-tree structure. Generally, in smaller network scenarios (e.g., 5N5E3R, CittàStudi6R), SFS can be slightly faster than Greedy: in this case, we observe that the quality of the heuristic trial solutions for the subproblems influences the efficiency of the optimization solvers, and SFS provides better solutions than Greedy. For larger networks, on the other hand, the exploration performed by SFS becomes heavier and consumes more time; since the Greedy algorithm does not have this overhead, in such scenarios it runs faster than SFS.

Analysis of ADMM results
To measure the performance of the distributed, ADMM-based algorithm described in Section 5, we run experiments comparing the behavior of ADMM, SFS and Greedy in the CittàStudi6R network scenario with 6 requests. Figures 12(a), 12(b) and 12(c) illustrate the variation of the profit as a function of the request rate, link capacity and computation capacity scaling parameters, respectively. The curves in the sub-figures have trends similar to those in Figures 7, 10 and 11, respectively, thus confirming the effects of these parameters on the solutions obtained in different network scenarios. The curves also show a step-wise pattern (as in Figure 6) due to the same reasons highlighted before. Note that the curves practically overlap, except for very high scaling factors: in Figure 12(a) they coincide for scaling factors below 1.7, differences appear in Figure 12(b) only for scaling values larger than 0.9, and in Figure 12(c) only above 0.8. This indicates that our proposed distributed algorithm achieves practically the same performance as the centralized algorithm (SFS) in a large set of realistic network settings.
As for the computing time, SFS requires, on average, 19 s, Greedy takes 36 s, and ADMM takes 365 s, with an average of 143 iterations. The higher computing time of ADMM is mainly due to the number of iterations needed for convergence, the computing time of each subproblem, and the degree of parallelization of the solving process across all subproblems. For instance, in topology CittàStudi6R, the maximum number of subproblems to be computed in parallel at each iteration of ADMM is 180, and each subproblem takes around 1-2 seconds. To solve the optimization in a reasonable time, we limit the number of neighbor edge nodes that can be explored to 5, which reduces the number of subproblems to 30; the disadvantage is that this limitation degrades the performance of ADMM. At each iteration of ADMM, these subproblems are solved on the simulation server, which has 20 cores (fewer than the 30 subproblems). As mentioned above, in this case ADMM takes around 365 s with an average of 143 iterations. In a real edge network environment, all subproblems of each iteration can be solved in a distributed way on the edge nodes, and as a result both the performance and the computing time can certainly be improved. For larger network scenarios, ADMM has higher potential than SFS and Greedy with respect to performance and computing time, since for ADMM these are mainly influenced by the solving process of each subproblem, whose complexity only depends on the number of slots considered in the time horizon, while for both SFS and Greedy they depend on the complexity of the original problem, which increases exponentially with the problem scale. Table 9 compares the profit value obtained and the corresponding computing time of the different approaches in network scenarios CittàStudi30R and 30N50E30R; three scaling points for the request rate are selected in each scenario.
Regarding the profit value, ADMM performs better than SFS and Greedy except at scaling point 0.5 in the 30N50E30R scenario, where the profit obtained by ADMM is slightly lower than that achieved by SFS. For the computing time, the table reports two values for ADMM: one is the computing time measured on a simulation server with 20 CPU cores (the one we used in our measurement campaign), and the other is the estimated computing time in a real edge computing network where the computation of subproblems can be fully distributed. We estimate this value based on the following analysis. In the experiments, for requests from different ingress nodes, we make the algorithms (SFS, Greedy, ADMM) explore the 5 nearest neighbor nodes of each ingress node, to limit the exploration space and accelerate the process, assuming their total computation capacity is sufficient. Therefore, for ADMM, the number of parallelized subproblems is equal to 30 × 5 = 150, and the theoretical computing time in a real network would be 150∕20 = 7.5 times lower than that obtained with the simulation server; for instance, in the CittàStudi30R scenario at point 0.5, the estimated computing time is 6676∕7.5 ≈ 890 s. Compared with SFS and Greedy, the estimated computing time of ADMM is larger in scenario CittàStudi30R due to the overhead caused by the many iterations needed to converge to the solution. As the problem scale increases, this overhead becomes negligible compared to the total computing time: in scenario 30N50E30R, the estimated computing time of ADMM is almost at the same level as in scenario CittàStudi30R, it is also close to the computing time of SFS, and even lower than it at point 1.5. For SFS and Greedy, due to the higher complexity, the computing time increases by a factor of about 2.8 in scenario 30N50E30R.
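The back-of-envelope estimate above can be reproduced with a few lines of code. This is a minimal sketch of the scaling argument only; the function name is illustrative, and the figures (20 cores, 150 subproblems, 6676 s) are the ones reported in the text.

```python
# Estimate the ADMM computing time in a fully distributed edge network
# from a simulation-server measurement, by rescaling with the ratio of
# parallel subproblems to available cores (as argued in the text).
def estimated_distributed_time(measured_time_s, n_subproblems, n_cores):
    """Ideal rescaling: speedup = n_subproblems / n_cores."""
    speedup = n_subproblems / n_cores  # e.g. 150 / 20 = 7.5
    return measured_time_s / speedup

# CittàStudi30R at scaling point 0.5: 6676 s measured on 20 cores,
# with 30 * 5 = 150 parallelizable subproblems.
print(estimated_distributed_time(6676, 150, 20))  # ~890 s
```

This is an idealized upper bound on the speedup, since it neglects per-iteration communication overhead among edge nodes.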
Another aspect to be considered is that the performance of ADMM depends on the initial penalty parameter and the updating strategy (see Algorithm 6), which can be further tuned to achieve the best performance. In this work we use empirical, intuitive settings for ADMM, both for simplicity and to demonstrate the potential of a distributed algorithm applied to resource scheduling in edge computing networks. A rigorous fine-tuning will be the subject of future work.
Finally, we would like to emphasize that a key advantage of ADMM is that the resource scheduling optimization can be solved in a distributed way across all edge computing nodes, while still providing a very good solution for the operator. This feature is very important: it allows the operator to alleviate the problems deriving from a single point of failure and to obtain a good scheduling solution in environments where, in several practical situations, only a distributed scheme can be applied to optimize resource allocation and scheduling.

Conclusion
In this paper we formulated and solved the resource calendaring problem in mobile networks equipped with Mobile Edge Computing capabilities. Specifically, we first proposed an exact optimization model and an effective heuristic able to obtain a near-optimal solution in all the considered real-size network scenarios.
We further proposed a distributed resource allocation algorithm based on the ADMM method, which we extended with a weighting approach tailored to our problem. The algorithm allows each node to take local decisions while coordinating its actions with the other nodes, converging reasonably fast to near-optimal solutions, as illustrated by our numerical evaluation, which includes both random geometric graphs and realistic mobile network topologies obtained from actual cell positions. The decisions we optimized include admission control for the user requests offered to the network, their calendaring (scheduling), and bandwidth-constrained routing, as well as the determination of which nodes provide the required computation and storage capacity. Calendaring, in particular, permits exploiting the intrinsic flexibility of the services demanded by different users, whose starting time can be shifted without penalizing the utility perceived by the user while, at the same time, permitting a better resource utilization in the network.
The objective function we defined in this work is composed of two terms. The first one is the revenue, calculated as the product of a fixed per-request revenue coefficient and the binary admission variable. The second term is the cost incurred by the operator for serving the users' requests; it is calculated as a function of the computation/storage and bandwidth resources, and it directly depends on the total amount of traffic of the request to be served. Our choice of a fixed revenue per type of request is motivated by three considerations: i) a pricing model (the revenue) which is proportional to the amount of traffic of the user's request and ignores the type of request may be inappropriate in some scenarios, since each type of request may require different amounts of computation/storage/bandwidth resources; ii) the cost defined in the second term of the objective function (which could be perceived as a lower bound on the price to be declared to the user) already takes into account all the aspects mentioned in point (i), including the amount of traffic of the users' requests; iii) finally, a fixed revenue makes the model more flexible and able to admit into the system the requests that can be satisfied with the available resources.
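The two-term objective just described can be sketched as follows. This is a toy illustration, not the paper's exact model: the names `revenue`, `admitted` and `cost` stand in for the paper's symbols, and the cost is given directly per request rather than derived from resource usage.

```python
# Toy sketch of the two-term objective: total profit is the per-request
# revenue collected for admitted requests minus the operator's cost of
# serving them. Rejected requests (admitted = 0) contribute nothing.
def profit(revenue, admitted, cost):
    """revenue[r]: fixed revenue of request type r,
    admitted[r]: 0/1 admission decision,
    cost[r]: serving cost (computation/storage/bandwidth) if admitted."""
    return sum(a * (rev - c) for rev, a, c in zip(revenue, admitted, cost))

# Three requests; the second is admitted at a loss (cost 9 > revenue 8),
# which is exactly the kind of decision admission control should avoid.
print(profit([10, 8, 5], [1, 1, 0], [4, 9, 2]))  # 5
```

An optimizer maximizing this objective would set the second admission variable to 0, raising the profit to 6.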
However, it would be interesting to devise and numerically analyze other objective functions with a more sophisticated pricing model (e.g., with a fixed and a variable/proportional part according to the requests types and requirements [46]) for edge computing services. This aspect will be considered in a future extension of this work.
Other future research directions include more experiments on the ADMM approach, considering larger networks and fine-tuning the initial penalty parameter and the updating strategy, as well as the implementation of the approach on a real network testbed. Another point worth investigating is extending the model to capture errors and subsequent retransmissions in the processing problem, which have an impact on the experienced latency and deadlines of users' requests.
requests enter the network through this dummy node and reach each source node over a dedicated dummy link of unit capacity, where the dummy link set contains exactly one link from the dummy node to each source node. We then extend the definition of the incoming-link set Φ− of each node to include the dummy links, and Equation (6) is transformed accordingly. Correspondingly, we add the analogous constraints for the set of dummy links. The final stage of our procedure is the definition of the constraints that guarantee all the desirable properties a routing path must respect: a single path is used (a request piece is not further splittable), the flow conservation constraints provide continuity of the chosen path, and the routing path contains no cycles. We highlight that a request can be split only at its source node, and each portion of its traffic is destined to one edge node; this is the reason why multiple routing paths may exist for the same request.
To this aim, we introduce the following conditions: • For an arbitrary node, the number of incoming links used by a routing path is at most one, and the routing variables must satisfy the corresponding condition. • The flow conservation constraint (see Eq. (59)) implements the continuity of a traffic flow.
• Every routing path must have an end (a destination), to avoid loops; this is ensured by a dedicated constraint. Satisfying these conditions, along with the constraints illustrated before, guarantees that the desired properties of the routing path are respected. The proof proceeds as follows. Proof. a) Substitute Eq. (58) into (59) and apply the corresponding transformation. c) Combining steps a) and b) yields an intermediate relation. d) Based on c), constraint (60) and conditions (61) and (62) can be rewritten as (64)-(66), whose practical meaning is the following: • (64) ensures that the dummy link is the initial link of any non-empty routing path; • (65) ensures that the destination edge node is the end node of the last link of any non-empty routing path; • (66) ensures that every intermediate node of a routing path has exactly one incoming link and one outgoing link, which also expresses the continuity of a request flow. e) Given a non-empty routing path, its validity can be checked with the above conditions: by (64) the path starts at the dummy link of the source node; if the node reached by the first link is the destination, the path is found; otherwise, applying (66) repeatedly extends the path one link at a time until the destination is reached. Based on the above reformulation of routing, the flow conservation constraints can be further improved as in (68).
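The three path properties enforced by the constraints above (single incoming link per node, flow continuity, no cycles) can be illustrated with a simple validity check on a candidate path. The list-of-links representation and the function name are illustrative assumptions, not part of the paper's model.

```python
# Check the three routing-path properties the constraints enforce:
# the path starts at the source, ends at the destination edge node,
# each link starts where the previous one ended (continuity),
# and no node is visited twice (no cycles).
def is_valid_path(links, source, dest):
    """links: ordered list of directed edges (u, v)."""
    if not links or links[0][0] != source or links[-1][1] != dest:
        return False
    visited = {source}
    for u, v in links:
        if v in visited:  # revisiting a node would create a cycle
            return False
        visited.add(v)
    # continuity: consecutive links must share an endpoint
    return all(links[i][1] == links[i + 1][0] for i in range(len(links) - 1))

print(is_valid_path([(0, 1), (1, 3), (3, 4)], 0, 4))  # True
print(is_valid_path([(0, 1), (1, 0), (0, 4)], 0, 4))  # False (cycle)
```

In the ILP, these checks are not performed procedurally but are implied jointly by constraints (64)-(66).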

A.2 Link Latency
Based on the above definition of the routing variable, we can rewrite constraint (8) accordingly, where Λ is a constant. Note that the additional (1 − ·) term permits implementing the membership condition of Eq. (8).
We now introduce the variable ℎ, defined as follows: this permits transforming Eq. (7) into a sum of per-link latency terms. We then need to linearize the product of the binary routing variable and the continuous variable ℎ, and to this aim we introduce an auxiliary variable equal to their product, which also eliminates the bilinear term. We first compute the value range of ℎ by considering the two cases: when the binary variable equals 1, the denominator of ℎ is reduced by the allocated capacity, so the upper limit of the denominator is the full link capacity; given that ℎ represents the single-link latency, which must be smaller than the maximum allowed latency, the range of ℎ is bounded between the inverse of the link capacity and the inverse of the minimum admissible rate. The linearization is then performed by the following constraints.
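The linearization of a binary-times-continuous product just mentioned follows a standard pattern: for binary x and bounded continuous h ∈ [h_lo, h_hi], four linear inequalities define w = x·h exactly. The sketch below only verifies this envelope numerically; names and the tolerance are illustrative.

```python
# McCormick-style linearization of w = x * h for binary x and bounded
# continuous h in [h_lo, h_hi]. For binary x these four inequalities
# are exact: x = 1 forces w = h, and x = 0 forces w = 0.
def linearization_holds(x, h, w, h_lo, h_hi, eps=1e-9):
    return (w <= h_hi * x + eps and
            w >= h_lo * x - eps and
            w <= h - h_lo * (1 - x) + eps and
            w >= h - h_hi * (1 - x) - eps)

print(linearization_holds(1, 0.3, 0.3, 0.1, 0.5))  # True  (x=1 -> w=h)
print(linearization_holds(0, 0.3, 0.0, 0.1, 0.5))  # True  (x=0 -> w=0)
print(linearization_holds(0, 0.3, 0.2, 0.1, 0.5))  # False (w must be 0)
```

This is why computing the range of ℎ matters: the tighter the bounds, the tighter the resulting linear relaxation.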
At the same time, the link latency is rewritten as:

Since the term above is the product of binary and continuous variables, we linearize it in the same way. Recall that the involved indicator is a binary variable equal to 1 if the time slot falls within the transmission window of the request, and 0 otherwise (see Eq. (10)). As both the upper and lower bounds of this window are variables, we reformulate the indicator through the following constraints, where an auxiliary integer variable expands the ceiling operation. The first and the second inequalities respectively enforce the indicator to 0 when the time slot precedes the transmission start and when it is not earlier than the ending time of the link transmission.
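The pair of inequalities enforcing the window indicator can be illustrated with a small big-M sketch. This is a simplified model with fixed window bounds (in the paper the bounds are themselves decision variables); the function name and the value of M are illustrative.

```python
# Big-M sketch of the window-indicator constraints: the binary z may be
# 1 only when t_start <= t < t_end. M must exceed the time horizon.
def indicator_feasible(z, t, t_start, t_end, M=10**4):
    # z = 1 requires t >= t_start:  t_start - t <= M * (1 - z)
    # z = 1 requires t <  t_end:    t - t_end + 1 <= M * (1 - z)
    return (t_start - t <= M * (1 - z)) and (t - t_end + 1 <= M * (1 - z))

# Slots where the indicator is allowed to be 1, for window [3, 6):
print([t for t in range(10) if indicator_feasible(1, t, 3, 6)])  # [3, 4, 5]
```

Note that z = 0 is always feasible; the objective (or other coupling constraints) is what forces z = 1 inside the window when the slot is actually used.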

A.3 Processing Latency and Storage Provisioning
Equation (11) is a nonlinear indicator function of the decision variables. To handle this issue, we first introduce an auxiliary binary variable indicating whether a request is processed on a given node. According to its definition, we impose a big-M constraint, where M > 0 is a large constant; the constraint implies that if the allocated computation capacity is zero, the request is not processed on that node. Based on the above, we can rewrite constraint (12); note that the additional (1 − ·) term permits implementing the positivity condition of Eq. (12). In Equation (11), we observe that when the indicator equals 1, the processing latency takes a specific value. To handle this case, we define a new auxiliary variable that equals the processing latency when the request is processed on the node, and a neutral value otherwise. More in detail, if a request is accepted and processed on a set of nodes, the processing latency is determined by the maximum per-node latency; the auxiliary variable thus satisfies the related constraints and represents the exact processing latency when the request is accepted, while if the request is rejected all the related variables are zero. Based on constraint (4), specifying that the ending time depends on the maximum latency, and considering that a rational and meaningful request should complete before its deadline, the ending time remains within the time horizon. Based on the consensus constraint, the above minimization problem can be split into independent unconstrained subproblems, one per element of the decomposition. Setting the derivative of the objective function of each subproblem to zero yields the closed-form solution, which is written as an average of the corresponding terms.
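The closed-form averaging solution obtained by zeroing the derivative has the classic consensus-ADMM shape, which can be sketched numerically. Variable names here are illustrative and do not follow the paper's notation.

```python
# Consensus update sketch: minimizing sum_i ||x - (z_i + u_i)||^2 over a
# shared variable x has the closed-form solution x = average(z_i + u_i),
# obtained by setting the derivative 2 * sum_i (x - (z_i + u_i)) to zero.
def consensus_update(copies, duals):
    """copies: local copies z_i; duals: scaled dual variables u_i."""
    n = len(copies)
    return sum(z + u for z, u in zip(copies, duals)) / n

print(consensus_update([1.0, 2.0, 3.0], [0.1, -0.1, 0.0]))  # 2.0
```

This averaging step is cheap and, crucially for the distributed setting, needs only the sums of the local terms rather than the subproblem details.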

For the copy of the variable indicating the fractions of requests processed on the different edge nodes, the update in (39) can also be split into independent subproblems, each of which is a convex optimization problem. We can optimally solve it with the method of Lagrange multipliers: we first write the Lagrangian function with a nonzero multiplier, then derive the KKT conditions, and from these equations we obtain the closed-form solution, expressed in terms of the two average terms.

For the copy of the variable representing the computation capacities of the edge nodes allocated to the different requests, we can similarly split the update in (39) into independent unconstrained subproblems, each subject to the indicator of the feasible set corresponding to constraint (31), i.e., the reservation constraint on an edge node's computation capacity (the total capacity allocated to the requests cannot exceed the node capacity). We can simplify the minimization by introducing the average of the copies over the set; the minimization is then rewritten over this average variable, whose feasible interval corresponds to the original capacity constraint. Minimizing with the average fixed, and following the same approach used for the previous update, we obtain the closed-form solution in terms of the two average terms. Substituting this solution back into the minimization, we obtain an unconstrained problem, and the update reduces to an optimization over the single average variable.

For the copy of the variable related to the computation capacity and storage provisioning of the edge nodes, the update has a structure similar to the previous one. Following the same procedure, the problem can be split into independent unconstrained subproblems: we first introduce the average of the copies and obtain the solution based on it, expressed in terms of the two average terms; the average itself is computed by optimizing an unconstrained problem whose feasible interval corresponds to constraint (32), i.e., the reservation constraint on an edge node's storage capacity.

For the copy of the variable representing the fractions of link bandwidth sliced to the different requests, following the same procedure we first introduce the average of the copies, expressed in terms of the two average terms; the average is computed by optimizing an unconstrained problem whose feasible interval corresponds to constraint (33), i.e., the reservation constraint on a link's capacity. Based on the above simplified updates, substituting equations (80), (82), (83), (85) and (87) into the dual update (40), we obtain Equation (89), which shows that the components of the dual variable are equal to their corresponding average values. Thus, we can further simplify equations (82), (83), (85) and (87), where the remaining average variables are determined by the optimizations (94), (95) and (96) derived above. Each of these optimizations is composed of two components: i) an indicator function on the decision variable, which can be regarded as the constraint, and ii) a squared Euclidean norm, which represents the distance from the current iterate. Thus, they are equivalent to three Euclidean projections onto convex sets, which admit closed-form solutions; a similar representation applies to the remaining dual components, each given by the average of the corresponding items.
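Since the feasible sets in (94)-(96) are simple interval (box) constraints, the Euclidean projections have an elementary closed form: clipping to the interval. The sketch below illustrates this; the function name is an assumption for illustration.

```python
# Euclidean projection onto a box constraint lo <= v <= hi: the closest
# feasible point to v is obtained by clipping, which is the closed-form
# solution of  min ||x - v||^2  subject to  lo <= x <= hi.
def project_box(v, lo, hi):
    return max(lo, min(hi, v))

print([project_box(v, 0.0, 1.0) for v in (-0.2, 0.4, 1.7)])  # [0.0, 0.4, 1.0]
```

Because each projection is a constant-time clip, these updates add negligible cost per ADMM iteration compared to solving the subproblems themselves.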