Collaborative Cloud and Edge Mobile Computing in C-RAN Systems with Minimal End-to-End Latency

Mobile cloud and edge computing protocols make it possible to run computationally heavy applications on mobile devices by offloading computation from the devices to nearby edge servers or to more powerful, but remote, cloud servers. Previous work assumed that computational tasks can be fractionally offloaded to both the cloud processor (CP) and a local edge node (EN) within a conventional Distributed Radio Access Network (D-RAN) that relies on non-cooperative ENs equipped with one-way uplink fronthaul connections to the cloud. In this paper, we propose to integrate collaborative fractional computing across the CP and ENs within a Cloud RAN (C-RAN) architecture with finite-capacity two-way fronthaul links. Accordingly, tasks offloaded by a mobile device can be partially carried out at an EN and at the CP, with multiple ENs communicating with a common CP to exchange data and computational outcomes while allowing for centralized precoding and decoding. Unlike prior work, we investigate the joint optimization of computing and communication resources, across both wireless and fronthaul segments, to minimize the end-to-end latency by accounting for two-way uplink and downlink transmission. The problem is tackled by using fractional programming (FP) and matrix FP. Extensive numerical results validate the performance gain of the proposed architecture as compared to the previously studied D-RAN solution.


I. INTRODUCTION
Mobile cloud and edge computing techniques enable computationally heavy applications such as gaming and augmented reality (AR) by offloading computation tasks from battery-limited mobile user equipments (UEs) to cloud or edge servers, which are located at the cloud processor (CP) or edge nodes (ENs) of a cellular architecture, respectively [1]-[7]. In systems with both cloud and edge computing capabilities, computation tasks can be opportunistically offloaded either to ENs or to the CP [8]. For example, it may be desirable to offload latency-insensitive and computationally heavy tasks to the CP, while relatively light tasks with more stringent latency constraints can be offloaded to edge servers at ENs.
(S. Shamai is with the Department of Electrical and Computer Engineering, Technion, Haifa 3200003, Israel; email: sshlomo@ee.technion.ac.il.)
The optimization of the offloading decision policy was studied in [9], [10] by focusing on the application layer and without including constraints imposed by the Radio Access Network (RAN). To the best of our knowledge, reference [3] was the first to study the joint optimization of computation and communication resources for mobile wireless edge computing systems, with follow-up works including [4]. Both papers [3], [4] aimed at minimizing energy expenditure under constraints on the end-to-end latency that encompass the contributions of both communication and computation. While [3] accounts only for uplink transmission, reference [4] also includes the contribution of downlink communication, which is required to feed back the results of the remote computations. To overcome the inherent non-convexity of the resulting optimization problems, the authors in [3], [4] applied successive convex approximation (SCA) [11], [12], which efficiently finds a locally optimal solution for constrained non-convex problems. Extensions in [13], [14] studied edge computing-based AR applications [13] and edge computing via an unmanned aerial vehicle (UAV) mounted cloudlet [14].
In a system with both cloud and edge computing capabilities, computation tasks can be partially offloaded to the CP and the ENs [8]. Reference [8] tackled the problem of jointly optimizing communication and computational resources with the goal of minimizing a weighted sum of per-UE end-to-end latency metrics within a distributed RAN (D-RAN) architecture [15, Sec. III]. The authors in [8] developed closed-form solutions for the optimal resource allocation and task splitting ratios by focusing on the design of uplink communication from UEs to ENs and CP, while assuming orthogonal time-division multiple access (TDMA) on the wireless uplink channel and a fixed allocation of fronthaul capacity across the UEs. Reference [16] also addressed the design of the task splitting ratios under the assumption that the task of each UE can be split into multiple subtasks that are offloaded to multiple ENs.
In a D-RAN, ENs perform local signal processing for channel encoding and decoding. Thus, the overall performance can be degraded by interference in dense networks. In this paper, we propose integrating collaborative fractional cloud-edge offloading within a cloud radio access network (C-RAN) architecture [17], while accounting for the contributions of both uplink and downlink. In a C-RAN, as illustrated in Fig. 1, joint signal processing at the CP, in the form of cooperative precoding and detection, enables effective interference management. Unlike the case of D-RANs, the design of C-RAN systems entails the additional challenge of optimizing the use of the EN-CP fronthaul links [18]-[20]. In this regard, we note that, although fronthaul constraints were also considered in [8] for the design within a D-RAN system, a simple data forwarding model was assumed with fixed capacity allocation among the UEs. In [21], the authors tackled the optimization of the functional split for collaborative computing systems equipped with a packet-based fronthaul network. However, it was assumed in [21] that the physical-layer (PHY) functionalities, which include channel encoding and decoding, are located only at the ENs. In [22], the authors addressed the task allocation and traffic path planning problem for a C-RAN system under the assumption that the service latency consists only of task processing delay and path delay on fronthaul links.
In this work, we address the optimization of C-RAN signal processing for the purpose of enabling collaborative cloud and edge mobile computing with minimal two-way end-to-end latency. We proceed by first reviewing the design of a collaborative cloud and edge computing system within a D-RAN architecture. Unlike [8], [23], which considered one-way uplink design with inter-UE TDMA and fixed fronthaul capacity allocation, we address the design of two-way communications with both TDMA and non-orthogonal multiple access strategies, and we treat the fronthaul capacity allocation as an optimization variable. Then, we address the design of a C-RAN system for collaborative offloading. For all the design problems, we consider the criterion of minimizing the two-way end-to-end latency for computation offloading as in [8], [24]-[26]. To tackle the formulated problems, which turn out to be non-convex, we adopt fractional programming (FP) and matrix FP [27], [28]. We present extensive numerical results that confirm the convergence of the proposed optimization algorithms, the advantages of the C-RAN architecture as compared to D-RAN [8], and the impact of collaborative cloud and edge computing on latency with C-RAN.
The paper is organized as follows. In Sec. II, we describe the system model, including the computational tasks, computational capabilities, wireless channel and fronthaul transmission models. In Sec. III, we discuss the design of a collaborative cloud and edge mobile computing system within the D-RAN architecture, and the design for a C-RAN system is discussed in Sec. IV. We provide extensive numerical results in Sec. V to validate the performance gain of the proposed architecture as compared to the D-RAN solution. We conclude the paper in Sec. VI.
Notations: We denote the set of all M×N complex matrices by C^{M×N}. The notation x ∼ CN(µ, Ω) indicates that x is a column vector following a circularly symmetric complex Gaussian distribution with mean vector µ and covariance matrix Ω. We also use the notation I(x; y) to represent the mutual information between random vectors x and y. A block diagonal matrix whose diagonal blocks are given as A_1, . . . , A_L is denoted by diag({A_l}_{l∈{1,...,L}}). Lastly, E[·] represents the expectation operator, and ||x|| denotes the Euclidean 2-norm of a vector x.

II. SYSTEM MODEL
As illustrated in Fig. 1, we consider a collaborative cloud and edge mobile computing system in which N_U single-antenna mobile UEs offload their computational tasks to a network consisting of N_E ENs and a CP. In order to exchange computational input information, the UEs communicate with the ENs over a wireless uplink channel, and each EN is connected to the CP through a dedicated fronthaul link of finite capacity C^ul_F bits per second (bps). For communication in the reverse direction, from the CP to each EN, the fronthaul link has a capacity of C^dl_F bps, and the ENs transmit to the UEs on a wireless downlink channel. For convenience, we define the sets N_U ≜ {1, 2, . . . , N_U} and N_E ≜ {1, 2, . . . , N_E} of indices of UEs and ENs, respectively. We denote the number of antennas of EN i as n_{E,i}, and the total number of EN antennas as n_E = Σ_{i∈N_E} n_{E,i}. The bandwidths of the uplink and downlink channels, measured in Hz, are W^ul and W^dl, respectively.

A. Computational Tasks and Collaborative Computing Model
As in [4], [8], we assume that the UEs have limited computing powers, and hence offload their whole tasks to ENs or CP without local processing. We define b I,k and b O,k as the numbers of input and output bits for the task of UE k. We assume that V k CPU cycles are required to process one bit of the task of UE k so that the task of UE k requires b I,k V k CPU cycles in total. The computing powers of each EN i and CP are denoted by F E,i and F C , respectively, whose units are CPU cycles per second.
For each UE k, we allow for collaborative cloud and edge computing [4], [8]. This means that a part of the task of UE k is processed by a predetermined EN i_k, while the rest of the task is offloaded to the CP. We define a variable c_k ∈ [0, 1] which controls the fraction of the task of UE k that is processed by EN i_k. Accordingly, EN i_k receives the input information of c_k b_{I,k} bits from UE k, runs c_k b_{I,k} V_k CPU cycles, and reports the resulting output information of c_k b_{O,k} bits back to UE k. Similarly, the CP receives (1−c_k) b_{I,k} input bits from UE k, runs (1−c_k) b_{I,k} V_k CPU cycles, and sends the resulting (1−c_k) b_{O,k} output bits back to UE k through EN i_k. We define N_{U,i} as the set of UEs that are associated with EN i, i.e., N_{U,i} = {k ∈ N_U : i_k = i}. Therefore, if we denote as F_{E,i,k} the computing power of EN i assigned to UE k, the variables F_{E,i,k}, k ∈ N_{U,i}, are subject to the constraint Σ_{k∈N_{U,i}} F_{E,i,k} ≤ F_{E,i}. The edge computation latency τ^exe_{E,i,k} for UE k at EN i with k ∈ N_{U,i} is given as τ^exe_{E,i,k} = c_k b_{I,k} V_k / F_{E,i,k}. Similarly, denoting the computing power allocated to UE k by the CP as F_{C,k}, the variables F_{C,k}, k ∈ N_U, should satisfy the constraint Σ_{k∈N_U} F_{C,k} ≤ F_C. The cloud computing latency τ^exe_{C,k} for UE k at the CP is given as τ^exe_{C,k} = (1−c_k) b_{I,k} V_k / F_{C,k}.
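The two execution latencies above can be made concrete with a short sketch. The function and variable names below mirror the paper's notation, and all numeric values are illustrative assumptions, not taken from the paper:

```python
def edge_exec_latency(c_k, b_I_k, V_k, F_E_ik):
    # tau^exe_{E,i,k}: EN i_k runs c_k * b_I_k * V_k CPU cycles
    # at F_E_ik cycles/second.
    return c_k * b_I_k * V_k / F_E_ik

def cloud_exec_latency(c_k, b_I_k, V_k, F_C_k):
    # tau^exe_{C,k}: the CP runs (1 - c_k) * b_I_k * V_k CPU cycles
    # at F_C_k cycles/second.
    return (1.0 - c_k) * b_I_k * V_k / F_C_k

# Hypothetical example: a 1-Mbit task, 100 cycles/bit, split evenly
# (c_k = 0.5), with a 5-GHz edge share and a 50-GHz cloud share.
tau_exe_E = edge_exec_latency(0.5, 1e6, 100, 5e9)    # 0.01 s
tau_exe_C = cloud_exec_latency(0.5, 1e6, 100, 50e9)  # 0.001 s
```

The example illustrates the trade-off controlled by c_k: pushing more bits to the faster cloud server lowers execution latency but, as shown later, increases fronthaul traffic.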

B. Wireless Channel Model for Edge Link
Assuming a flat-fading channel model for both the uplink and downlink wireless edge links, the received signal vector y^ul_i ∈ C^{n_{E,i}×1} of EN i on the uplink is given as y^ul_i = Σ_{k∈N_U} h^ul_{i,k} x^ul_k + z^ul_i, where h^ul_{i,k} ∈ C^{n_{E,i}×1} denotes the channel vector from UE k to EN i; x^ul_k ∈ C^{1×1} indicates the transmit signal of UE k; and z^ul_i ∼ CN(0, σ²_{z,ul} I) is the additive noise vector. Similarly, the received signal y^dl_k ∈ C^{1×1} of UE k on the downlink can be written as y^dl_k = Σ_{i∈N_E} h^dl_{k,i} x^dl_i + z^dl_k, where h^dl_{k,i} ∈ C^{1×n_{E,i}} represents the channel vector from EN i to UE k; x^dl_i ∈ C^{n_{E,i}×1} denotes the transmit signal vector of EN i; and z^dl_k ∼ CN(0, σ²_{z,dl}) denotes the additive noise.
The transmit powers of each UE k and EN i are limited as E[|x^ul_k|²] ≤ P^ul and E[||x^dl_i||²] ≤ P^dl, where P^ul and P^dl represent the maximum transmit powers at each UE and EN, respectively. We define the maximum signal-to-noise ratios (SNRs) of the uplink and downlink channels as SNR^ul_max = P^ul/σ²_{z,ul} and SNR^dl_max = P^dl/σ²_{z,dl}, respectively. The symbols described in this section are summarized in Table I.

III. OPTIMIZATION FOR THE D-RAN ARCHITECTURE
In this section, we discuss the design of the collaborative cloud and edge mobile computing system under a D-RAN architecture [15, Sec. III]. Unlike [8], which considered one-way uplink design with inter-UE TDMA and fixed fronthaul capacity allocation, we address the design of two-way communications with both TDMA and non-orthogonal multiple access strategies, while treating the fronthaul capacity allocation as an optimization variable.
In D-RAN, each EN i locally decodes the uplink input information transmitted by the associated UEs N U,i without cooperating with nearby ENs. Also, in the downlink, the computation output information for UEs N U,i is solely encoded and transmitted by the serving EN i. We discuss the designs with orthogonal TDMA and non-orthogonal multiple access strategies in Sec. III-A and III-B, respectively.

A. Orthogonal TDMA
With TDMA, the N_U UEs communicate with the N_E ENs on the wireless edge link in different time slots, so that there is no inter-UE interference on the wireless channel. We define u^ul_k ∈ [0, 1] and u^dl_k ∈ [0, 1] as the uplink and downlink time fractions allocated to UE k. The fraction variables u ≜ {u^ul_k, u^dl_k}_{k∈N_U} should hence satisfy the constraints Σ_{k∈N_U} u^ul_k ≤ 1 and Σ_{k∈N_U} u^dl_k ≤ 1. In the uplink, UE k transmits a baseband signal which encodes the input information for its task. Assuming that Gaussian channel codebooks are used, the transmitted signal x^ul_k of UE k is distributed as x^ul_k ∼ CN(0, p^ul_k). Since there is no co-channel interference with orthogonal TDMA, the transmit power p^ul_k of UE k is set to p^ul_k = P^ul without loss of optimality.
With the described transmission model, the achievable data rate R^ul_k between UE k and EN i in the uplink channel is given as R^ul_k = u^ul_k W^ul I(x^ul_k; y^ul_i), where the mutual information is calculated as I(x^ul_k; y^ul_i) = log₂(1 + P^ul ||h^ul_{i,k}||²/σ²_{z,ul}). The uplink latency τ^ul_{E,k} on the wireless edge link for UE k is then given as τ^ul_{E,k} = b_{I,k}/R^ul_k. Among the b_{I,k} bits received from UE k ∈ N_{U,i}, EN i processes only c_k b_{I,k} bits using its edge server and forwards the remaining (1−c_k) b_{I,k} bits to the CP on the fronthaul link for cloud computing. We denote by C^ul_{F,k} ≥ 0 the partial capacity of the fronthaul link between EN i and the CP that is used for UE k, so that the constraint Σ_{k∈N_{U,i}} C^ul_{F,k} ≤ C^ul_F should be satisfied for all i ∈ N_E. For given C^ul_{F,k}, the uplink fronthaul latency τ^ul_{F,k} of UE k is given as τ^ul_{F,k} = (1−c_k) b_{I,k}/C^ul_{F,k}. The CP processes the received (1−c_k) b_{I,k} bits for UE k, producing output information of (1−c_k) b_{O,k} bits. The output bits are transmitted to the EN i_k that serves UE k. We denote by C^dl_{F,k} ≥ 0 the partial capacity of the fronthaul link from the CP to EN i_k that is used to transfer the (1−c_k) b_{O,k} bits for UE k. Thus, the constraint Σ_{k∈N_{U,i}} C^dl_{F,k} ≤ C^dl_F should be satisfied for all i ∈ N_E. The downlink fronthaul latency τ^dl_{F,k} of UE k for given C^dl_{F,k} is given as τ^dl_{F,k} = (1−c_k) b_{O,k}/C^dl_{F,k}. In the downlink, each EN i reports the computation output information of b_{O,k} bits to UE k ∈ N_{U,i}. To this end, EN i encodes the output information with a Gaussian channel codebook, producing an encoded baseband signal s^dl_k ∼ CN(0, Q^dl_k), and transmits the encoded signal s^dl_k during a fraction u^dl_k of the downlink time slot. For given Q^dl_k, the achievable downlink data rate R^dl_k is given as R^dl_k = u^dl_k W^dl I(s^dl_k; y^dl_k) in (17). The optimal covariance matrix Q^dl⋆_k, which maximizes the mutual information in (17) while satisfying the constraint tr(Q^dl_k) ≤ P^dl, implements conjugate beamforming [29] and is given in (18), where h̄^dl_{k,i} = h^dl_{k,i}/||h^dl_{k,i}||.
By substituting (18) into (17), we obtain the maximized mutual information value I(s^dl_k; y^dl_k) = log₂(1 + P^dl ||h^dl_{k,i}||²/σ²_{z,dl}). The downlink latency τ^dl_{E,k} for UE k on the wireless edge link is hence given as τ^dl_{E,k} = b_{O,k}/R^dl_k. Finally, the overall latency τ_{T,k} for each UE k is given in (21) as τ_{T,k} = τ^ul_{E,k} + max{τ^exe_{E,i_k,k}, τ^ul_{F,k} + τ^exe_{C,k} + τ^dl_{F,k}} + τ^dl_{E,k}, where the second term indicates that local edge computing at EN i_k and fronthaul transmissions can take place simultaneously. As a result, the total latency required for completing the tasks of all the participating UEs is given in (22) as τ_T = max_{k∈N_U} τ_{T,k}. We tackle the problem of optimizing the variables {c, u, F, C_F, τ} with the goal of minimizing the total latency τ_T, which we formulate as problem (23). Problem (23) is non-convex due to the constraints (23c) and (23e)-(23g). We could tackle the non-convex problem by a coordinate descent approach [30, Sec. 1.8], since the problem becomes convex if we fix one of the variable sets c and {F, C_F}. However, the coordinate descent approach cannot be directly applied to the problems that will be discussed in Sec. III-B and IV, and hence we consider FP [27] as a solution method, which overcomes this limitation.
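The latency composition just described, where edge execution runs in parallel with the fronthaul round trip plus cloud execution, can be sketched as follows. All component values are hypothetical assumptions chosen only to exercise the formula:

```python
def per_ue_latency(t_ul_E, t_exe_E, t_ul_F, t_exe_C, t_dl_F, t_dl_E):
    # Uplink edge transfer, then the slower of (i) local edge execution and
    # (ii) uplink fronthaul + cloud execution + downlink fronthaul, then the
    # downlink edge transfer back to the UE.
    return t_ul_E + max(t_exe_E, t_ul_F + t_exe_C + t_dl_F) + t_dl_E

# Hypothetical per-UE components (seconds); the total latency is the
# worst case over the participating UEs.
ues = [
    (0.010, 0.020, 0.005, 0.008, 0.004, 0.010),
    (0.012, 0.015, 0.006, 0.007, 0.005, 0.009),
]
tau_T = max(per_ue_latency(*u) for u in ues)
```

Note that the first UE is edge-limited (its max(...) is decided by edge execution) while the second is cloud-limited, which is exactly the balance the task-splitting variables c_k control.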
We observe that all of the constraints (23c) and (23e)-(23g), which induce the non-convexity of problem (23), can be expressed as functions of ratios of optimization variables. It was shown in [27] that FP is suitable for approximating such constraints by convex constraints. In more detail, based on [27, Cor. 1], we can show that, for any real values λ^ul_{F,k}, λ^dl_{F,k}, λ^exe_{E,i_k,k} and λ^exe_{C,k}, the constraints (24a)-(24d) are stricter than (23c) and (23e)-(23g). These constraints have two desirable properties: they are convex if the auxiliary variables λ^ul_{F,k}, λ^dl_{F,k}, λ^exe_{E,i_k,k} and λ^exe_{C,k} are fixed, and they become equivalent to (23c) and (23e)-(23g) if the auxiliary variables are set as in (25). Based on this observation, we consider the problem obtained by replacing the constraints (23c) and (23e)-(23g) in (23) with (24) and adding λ = {λ^ul_{F,k}, λ^dl_{F,k}, λ^exe_{E,i_k,k}, λ^exe_{C,k}}_{k∈N_U} as optimization variables. To tackle the obtained problem, which has the same optimal value as (23), we propose an iterative algorithm in which the variables {c, u, F, C_F, τ} and λ are alternately updated. Since the optimization of {c, u, F, C_F, τ} for fixed λ is a convex problem, standard convex solvers, such as the CVX software [31], can be used. The optimal λ for fixed {c, u, F, C_F, τ} is given by (25), which makes the constraints (24a)-(24d) equivalent to the original constraints (23c) and (23e)-(23g). We describe the detailed algorithm in Algorithm 1.
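The exact shapes of the surrogate constraints are given in (24); the mechanism behind them can be illustrated with one common scalar identity used in FP transforms of this kind (this is a sketch of the general principle, not the paper's specific constraints): for x, y > 0, we have 2λ√(xy) − λ² ≤ xy for any real λ, with equality at λ = √(xy). Since √(xy) is jointly concave, a bilinear bound like b ≤ xy can be replaced by the stricter, convex (for fixed λ) constraint b ≤ 2λ√(xy) − λ²:

```python
import math

def surrogate_bound(x, y, lam):
    # Lower bound on the product x*y: (sqrt(x*y) - lam)^2 >= 0 implies
    # x*y >= 2*lam*sqrt(x*y) - lam**2, with equality at lam = sqrt(x*y).
    return 2.0 * lam * math.sqrt(x * y) - lam ** 2

# Hypothetical latency bound tau and capacity share C.
tau, C = 0.02, 400.0
lam_star = math.sqrt(tau * C)
tight = surrogate_bound(tau, C, lam_star)        # equals tau * C = 8.0
loose = surrogate_bound(tau, C, 0.5 * lam_star)  # strictly below tau * C
```

Alternating between solving the convex surrogate problem for fixed λ and resetting λ to its tightness-achieving value is precisely the update pattern of the iterative algorithm described above.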
The convex problem solved at Step 4 of each tth iteration of Algorithm 1 has stricter constraints than the original problem (23). Also, the feasible space of the convex problem contains the solution obtained at the (t − 1)th iteration. Thus, the solution of the convex problem at the tth iteration belongs to the feasible space of problem (23) and achieves a latency value no larger than that of the (t − 1)th iteration. Therefore, Algorithm 1 produces monotonically decreasing latency values with respect to the iteration index t, so that it converges to a locally optimal point. For a more formal proof of the convergence of SCA and FP algorithms, we refer to [11], [27].

Algorithm 1 Alternating optimization algorithm that tackles problem (23)
1. Initialize {c, u, F, C_F, τ} to arbitrary values that satisfy the constraints (23b)-(23m), and set t ← 1.
2. Calculate the total latency τ_T in (22) with the initialized {c, u, F, C_F, τ}, and set τ(0) ← τ_T.
3. Set λ according to (25).
4. Update the variables {c, u, F, C_F, τ} as a solution of the convex problem obtained by replacing the constraints (23c) and (23e)-(23g) with (24a)-(24d) and then fixing λ.
5. Calculate the total latency τ_T with the updated {c, u, F, C_F, τ}, and set τ(t) ← τ_T. Stop if convergence is declared; otherwise, set t ← t + 1 and go back to Step 2.

We can run Algorithm 1 from an arbitrary initial point that satisfies the conditions (23b)-(23m). In the simulation section, we initialize the variables {c, u, F, C_F} at Step 1 as in (26). For the given {c, u, F, C_F}, we compute an initial value for τ according to (12), (14), (16), and (20).
The complexity of Algorithm 1 is given by the number of iterations multiplied by the complexity of solving the convex problem at each iteration (i.e., Step 4). The complexity of solving a generic convex problem is upper bounded by O(n(n³ + M) log(1/ε)) [32, p. 4], where n denotes the number of optimization variables, M is the number of arithmetic operations required to compute the objective and constraint functions, and ε represents the desired error tolerance. For the convex problem solved at Step 4 of Algorithm 1, these numbers equal n = 13N_U and M = 45N_U, respectively. However, to the best of our knowledge, the analysis of the convergence rate of general SCA algorithms is still an open problem. Instead, we provide numerical evidence of the fast convergence of Algorithm 1 in Sec. V.
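The per-iteration bound above can be instantiated directly; the sketch below plugs in n = 13N_U and M = 45N_U and confirms the roughly quartic growth in N_U (the absolute numbers are only order-of-magnitude indicators, as the bound hides solver-dependent constants):

```python
import math

def algorithm1_step4_bound(N_U, eps):
    # Generic convex-solver bound O(n * (n**3 + M) * log(1/eps)) [32, p. 4],
    # instantiated with n = 13*N_U variables and M = 45*N_U operations.
    n, M = 13 * N_U, 45 * N_U
    return n * (n ** 3 + M) * math.log(1.0 / eps)

# Doubling the number of UEs scales the bound by roughly 2**4 = 16,
# since the n**4 term dominates the linear term M for moderate N_U.
ratio = algorithm1_step4_bound(20, 1e-3) / algorithm1_step4_bound(10, 1e-3)
```

This quartic scaling in N_U is why the number of UEs, rather than the channel dimensions, dominates the per-iteration cost in the TDMA case.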

B. Non-Orthogonal Multiple Access
In this subsection, we discuss the design with non-orthogonal multiple access, whereby the N_U UEs communicate simultaneously with the N_E ENs on the same time and frequency resource. Therefore, the uplink and downlink communications on the wireless edge link are impaired by inter-UE interference, while benefiting from transmission over a larger time interval. The computation and fronthaul transmission models are the same as those described in Sec. III-A, and we detail here only the uplink and downlink communication phases and the resulting latency performance.
As in Sec. III-A, we assume that each UE k uses a Gaussian channel codebook, so that its transmitted signal is distributed as x^ul_k ∼ CN(0, p^ul_k). Due to the presence of inter-UE interference, full-power transmission at all UEs may cause an optimality loss. This suggests that we need to carefully design the transmit power variables p^ul_k, k ∈ N_U, by adapting to channel state information (CSI).
Each EN i needs to decode the signals {x ul k } k∈NU,i based on the received signal y ul i . We assume that the signals {x ul k } k∈NU,i are detected in parallel without successive interference cancellation (SIC) as in [33], [34] in order to minimize the decoding delay. We leave the design and analysis with SIC decoding [35] while taking into account the decoding delay for future work.
Under the assumption of parallel decoding, the achievable rate R^ul_k of UE k in the uplink channel is given as R^ul_k = W^ul I(x^ul_k; y^ul_{i_k}), with the mutual information value computed as in (27), where we have defined the notation p ≜ {p^ul_k}_{k∈N_U}. For given R^ul_k, the uplink edge latency τ^ul_{E,k} for UE k is given as (12).
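As a concrete illustration of parallel (no-SIC) decoding, the sketch below computes the treat-interference-as-noise rate for the special case of single-antenna ENs; the general multi-antenna expression (27) involves the full channel vectors, and all numeric inputs here are assumptions:

```python
import math

def uplink_rate_no_sic(k, i_k, p, h, W_ul, sigma2):
    # Rate of UE k at its serving EN i_k when all other UEs' signals are
    # treated as noise (no successive interference cancellation).
    # p: dict UE -> transmit power; h: dict (EN, UE) -> scalar channel gain.
    signal = p[k] * abs(h[(i_k, k)]) ** 2
    interference = sum(p[l] * abs(h[(i_k, l)]) ** 2 for l in p if l != k)
    return W_ul * math.log2(1.0 + signal / (sigma2 + interference))

# Two UEs served by EN 0 over a 10-MHz uplink (hypothetical gains/powers).
p = {0: 1.0, 1: 1.0}
h = {(0, 0): 1.0, (0, 1): 0.5}
R0 = uplink_rate_no_sic(0, 0, p, h, 10e6, 1.0)
```

The example makes the coupling visible: raising UE 1's power p[1] lowers R0, which is why the powers p must be optimized jointly rather than set to P^ul as in the TDMA case.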
For the downlink edge link, each EN i transmits a superposition of the signals s^dl_k, k ∈ N_{U,i}, where s^dl_k ∼ CN(0, Q^dl_k) encodes the task output of UE k, so that the transmit signal of EN i is given as x^dl_i = Σ_{k∈N_{U,i}} s^dl_k. With this transmission model, the downlink transmit power constraint (9) can be expressed as Σ_{k∈N_{U,i}} tr(Q^dl_k) ≤ P^dl, and the achievable rate R^dl_k of UE k on the wireless edge link is given as in (30), where Q ≜ {Q^dl_k}_{k∈N_U}. For given R^dl_k, the downlink edge latency τ^dl_{E,k} of UE k is given as (20). For the non-orthogonal multiple access scheme described above, we aim at jointly optimizing the variables p, Q, c, F and C_F with the goal of minimizing the total latency τ_T in (22). The problem can be written as (31), where we have defined R ≜ {R^ul_{E,k}, R^dl_{E,k}}_{k∈N_U}. We note that problem (31) is more challenging than (23) due to the presence of inter-UE interference on the wireless edge links. Accordingly, the uplink and downlink transmission strategies on the edge links, which are characterized by the variables p and Q, need to be jointly optimized. Also, the constraints (31e) and (31f) on the edge throughputs, which involve the matrix variables Q, are not convex. To address these complications, we employ FP [27] as well as matrix FP [28], which is a generalized version of [27].
We first observe that the constraints (31d), which are expressed as functions of ratios of scalar optimization variables, can be handled by FP [27] as in Sec. III-A. Based on [27, Cor. 1], we replace the constraints (31d) with the stricter constraints (24a)-(24d), which become equivalent to (31d) if the variables λ^ul_{F,k}, λ^dl_{F,k}, λ^exe_{E,i_k,k} and λ^exe_{C,k} are set as in (25). The other non-convex constraints (31e) and (31f) contain ratios of matrix variables. Thus, we need to employ matrix FP [28], which generalizes the scalar and vector versions of FP in [27]. From [28, Cor. 1], the constraints (32) are stricter than (31e) and (31f) for any auxiliary variables Γ and θ, where we have defined the transformed variables p̃ and Q̃; the constraints (32) become equivalent to (31e) and (31f) when the auxiliary variables take their optimal values. Using the alternative representations (24) and (32) of the non-convex constraints (31d)-(31f), we restate problem (31) with the additional optimization variables {λ, Γ, θ}. We tackle the obtained problem by alternately optimizing the variables {c, p̃, Q̃, F, C_F, τ, R} and {λ, Γ, θ}. The detailed procedure is summarized in Algorithm 2. Similarly to Algorithm 1, Algorithm 2 achieves monotonically decreasing latency values with respect to the number of iterations, and its solution converges to a locally optimal point of the non-convex problem (31). In Sec. V, we initialize the variables {c, F, C_F} as in (26b)-(26e), and {p̃, Q̃} using random matrices, where the elements of V^dl_k ∈ C^{n_{E,i}×n_{E,i}}, k ∈ N_{U,i}, are independent and identically distributed as CN(0, 1). For the given {c, F, C_F, p̃, Q̃}, we compute the rates R using (27) and (30), from which the latency variables τ can be initialized as in (12), (14), (16), and (20).
The complexity of Algorithm 2 is given by the product of the number of iterations and the complexity of solving the convex problem at Step 4. The complexity of the latter is upper bounded by O(n(n³ + M) log(1/ε)) [32, p. 4], where the numbers of optimization variables and arithmetic operations are given as n = N_U(4ñ²_E + 14) and M = N_U(ñ_E(14ñ_E + 1) + 41) + ñ_E(8ñ²_E + 5ñ_E + 3), respectively. Here we have assumed that every EN uses the same number ñ_E of antennas, i.e., n_{E,i} = ñ_E for all i ∈ N_E. Numerical evidence of the convergence rate of Algorithm 2 is provided in Sec. V.

IV. OPTIMIZATION FOR THE C-RAN ARCHITECTURE
In this section, we investigate the design of a collaborative cloud and edge mobile computing system within a C-RAN architecture [18]-[20]. In C-RAN, the baseband signals of distributed ENs are processed by the CP in a centralized manner for the purpose of effective interference management. In the following subsections, we describe the uplink and downlink communication phases and the total end-to-end latency.

Algorithm 2 Alternating optimization algorithm that tackles problem (31)

A. Uplink Communication and Latency
As illustrated in Sec. II-A, each UE k splits its computation input information into two parts of c k b I,k and (1−c k )b I,k bits, and sends the former and latter parts to its serving EN i k and the CP, respectively. In the D-RAN protocol detailed in Sec. III, both parts were encoded into a single codeword, since all the input information had to be decoded by the serving EN i k . However, in the C-RAN scheme, only one part is decoded by EN i k , and the other codeword is decoded by the CP based on the fronthaul received signals. To accommodate this requirement, we leverage superposition coding as discussed next.
We denote the encoded signals for the two parts of c_k b_{I,k} and (1−c_k) b_{I,k} bits by s^ul_{E,k} and s^ul_{C,k}, respectively. Under independent Gaussian channel codebooks, the two signals are independent of each other and distributed as s^ul_{E,k} ∼ CN(0, p^ul_{E,k}) and s^ul_{C,k} ∼ CN(0, p^ul_{C,k}). UE k transmits a superposition of the encoded signals, so that the transmit signal is given as x^ul_k = s^ul_{E,k} + s^ul_{C,k}, and the transmit power constraint (8) can be written as p^ul_{E,k} + p^ul_{C,k} ≤ P^ul. Based on the uplink received signal y^ul_i, EN i detects the signals s^ul_{E,k} transmitted by its serving UEs k ∈ N_{U,i}. The achievable rate R^ul_{E,k} of each signal s^ul_{E,k} in bps is given as R^ul_{E,k} = W^ul I(s^ul_{E,k}; y^ul_i), where we have defined p^ul ≜ {p^ul_{E,k}, p^ul_{C,k}}_{k∈N_U}.
After the local decoding described above, EN i cancels the impact of the decoded signals from the received signal y^ul_i, obtaining the signal ỹ^ul_i. Since the fronthaul link connecting EN i to the CP has finite capacity C^ul_F bps, a quantized version of the signal ỹ^ul_i, denoted by ŷ^ul_i, is forwarded to the CP. We assume a Gaussian test channel as in [19], [20]. Then, the quantized signal is modeled as ŷ^ul_i = ỹ^ul_i + q^ul_i, where the quantization distortion noise q^ul_i is independent of ỹ^ul_i and is distributed as q^ul_i ∼ CN(0, Ω^ul_i). Under the quantization model (39), the compression rate γ^ul_i, which equals the number of bits representing the quantized signal ŷ^ul_i per baseband sample, is given in (40) [36]. EN i should send W^ul τ^ul_E γ^ul_i bits to the CP on the fronthaul link of capacity C^ul_F bps, since the duration of each baseband sample is approximately 1/W^ul sec, and hence τ^ul_E/(1/W^ul) = W^ul τ^ul_E quantized baseband samples should be forwarded to the CP. Due to the parallel operation of the fronthaul links of different ENs, the uplink fronthaul latency is given in (41) as τ^ul_F = max_{i∈N_E} W^ul τ^ul_E γ^ul_i / C^ul_F. The CP recovers the quantized signals ŷ^ul_1, ŷ^ul_2, . . . , ŷ^ul_{N_E} from the bit streams received on the fronthaul links. The vector ŷ^ul = [ŷ^ulH_1 ŷ^ulH_2 · · · ŷ^ulH_{N_E}]^H, which stacks the quantized signals from all ENs, can be written in a compact linear form, where we have defined the stacked channel vector h^ul_k of UE k, and 1_{(·)} is an indicator function which takes value 1 if the statement in the subscript is true and 0 otherwise. The stacked noise vectors q^ul and z^ul are distributed as q^ul ∼ CN(0, Ω̄^ul) and z^ul ∼ CN(0, σ²_{z,ul} I), respectively, with Ω̄^ul = diag({Ω^ul_i}_{i∈N_E}). Using the recovered quantized signal vector ŷ^ul, the CP detects all the signals s^ul_{C,k}, which are necessary for cloud computing. The achievable rate of the signal s^ul_{C,k} is given as R^ul_{C,k} = W^ul I(s^ul_{C,k}; ŷ^ul). Consequently, the latency τ^ul_E for uploading the input information of the UEs on the uplink channel is given in (44).
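The quantize-and-forward accounting above can be sketched for the scalar special case of single-antenna ENs (n_{E,i} = 1), where the Gaussian test channel's per-sample rate reduces to a log ratio of variances; all numeric values are illustrative assumptions:

```python
import math

def compression_rate(sig_var, omega):
    # gamma = I(y~; y^) per sample for the scalar Gaussian test channel
    # y^ = y~ + q with q ~ CN(0, omega): log2((sig_var + omega) / omega).
    return math.log2((sig_var + omega) / omega)

def uplink_fronthaul_latency(gammas, W_ul, tau_ul_E, C_F):
    # Each EN i ships W_ul * tau_ul_E samples at gamma_i bits/sample over a
    # C_F-bps link; links run in parallel, so the slowest EN dominates.
    return max(W_ul * tau_ul_E * g / C_F for g in gammas)

# Two ENs: signal variances 3.0 and 1.0, unit distortion -> 2 and 1 bit/sample.
gammas = [compression_rate(3.0, 1.0), compression_rate(1.0, 1.0)]
tau_F = uplink_fronthaul_latency(gammas, W_ul=10e6, tau_ul_E=0.01, C_F=100e6)
```

The sketch exposes the core trade-off of the C-RAN design: a smaller distortion Ω improves the cloud decoding rate but inflates γ and hence the fronthaul latency, which is why Ω is an optimization variable in Sec. IV-D.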

B. Downlink Communication and Latency
After completing the computation tasks, the CP encodes the computation output information of (1−c_k) b_{O,k} bits for each UE k with a Gaussian channel codebook, obtaining an encoded baseband signal s^dl_{C,k} ∈ C^{n_E×1} ∼ CN(0, Q^dl_{C,k}). The CP computes a signal vector x̃^dl ∈ C^{n_E×1} by superimposing the encoded signals as x̃^dl = Σ_{k∈N_U} s^dl_{C,k}. The ith subvector x̃^dl_i ∈ C^{n_{E,i}×1} of x̃^dl = [x̃^dlH_1 · · · x̃^dlH_{N_E}]^H is transferred to EN i on the fronthaul link. To this end, it is quantized, and we model the quantized signal x̂^dl_i under the Gaussian test channel [19], [20] as x̂^dl_i = x̃^dl_i + q^dl_i, where the quantization distortion noise q^dl_i is independent of x̃^dl_i and distributed as q^dl_i ∼ CN(0, Ω^dl_i). The compression rate γ^dl_i needed to represent the quantized signal x̂^dl_i in bits per baseband sample follows from the test-channel model, where the elements of E_i ∈ C^{n_E×n_{E,i}} are all zero except for the rows from Σ_{j=1}^{i−1} n_{E,j} + 1 to Σ_{j=1}^{i} n_{E,j}, which form an identity matrix of size n_{E,i}×n_{E,i}. Similar to (41) for the uplink, the downlink fronthaul latency τ^dl_F for given γ^dl_i, i ∈ N_E, and τ^dl_E is computed in (48) as τ^dl_F = max_{i∈N_E} W^dl τ^dl_E γ^dl_i / C^dl_F. Each EN i also encodes the edge computation output information of c_k b_{O,k} bits for UE k ∈ N_{U,i}, producing an encoded baseband signal s^dl_{E,k} ∈ C^{n_{E,i}×1} ∼ CN(0, Q^dl_{E,k}). EN i then transmits a superposition of the locally encoded signals s^dl_{E,k}, k ∈ N_{U,i}, and the quantized signal x̂^dl_i, which was received on the fronthaul, over the downlink channel to the UEs. Thus, the signal transmitted by EN i is given in (49) as x^dl_i = Σ_{k∈N_{U,i}} s^dl_{E,k} + x̂^dl_i. With (49), the transmit power constraint (9) at EN i can be rewritten with three power terms on the left-hand side (LHS). The first term measures the power of the signals {s^dl_{E,k}}_{k∈N_{U,i}}, which encode the computation output information processed by EN i. The sum of the second and third terms is the power of the signal x̂^dl_i, which is a quantized version of x̃^dl_i that encodes the signals {s^dl_{C,k}}_{k∈N_U} processed by the CP.
Each UE k detects the signals s^dl_{E,k} and s^dl_{C,k} based on the downlink received signal y^dl_k. The achievable rates of s^dl_{E,k} and s^dl_{C,k} are given as R^dl_{E,k} = W^dl I(s^dl_{E,k}; y^dl_k) and R^dl_{C,k} = W^dl I(s^dl_{C,k}; y^dl_k), respectively, with the corresponding mutual information values computed for the stacked downlink channel h^dl_k of UE k. With the downlink rates described above, the latency τ^dl_E for downloading the output information on the downlink channel is given in (52).

C. Total End-to-End Latency With C-RAN
The total end-to-end latency $\tau_T$ for completing all the tasks within the described C-RAN architecture is modeled as
$$\tau_T = \tau_E^{\mathrm{ul}} + \tau_F^{\mathrm{ul}} + \max\big\{\tau_E^{\mathrm{exe}},\, \tau_C^{\mathrm{exe}}\big\} + \tau_F^{\mathrm{dl}} + \tau_E^{\mathrm{dl}}, \tag{53}$$
where the fronthaul latencies $\tau_F^{\mathrm{ul}}$, $\tau_F^{\mathrm{dl}}$ and the edge latencies $\tau_E^{\mathrm{ul}}$, $\tau_E^{\mathrm{dl}}$ are defined in (41), (48), (44) and (52), respectively. Also, $\tau_E^{\mathrm{exe}}$ and $\tau_C^{\mathrm{exe}}$ represent the latencies for executing the computation tasks at the ENs and the CP, which are given as
$$\tau_E^{\mathrm{exe}} = \max_{k\in\mathcal{N}_U} \tau_{\mathrm{E},i_k,k}^{\mathrm{exe}} \quad \text{and} \quad \tau_C^{\mathrm{exe}} = \max_{k\in\mathcal{N}_U} \tau_{\mathrm{C},k}^{\mathrm{exe}},$$
with $\tau_{\mathrm{E},i_k,k}^{\mathrm{exe}}$ and $\tau_{\mathrm{C},k}^{\mathrm{exe}}$ given in (3) and (5).
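The latency model amounts to summing the sequential transmission phases while the edge and cloud executions proceed in parallel. The following one-liner makes that composition explicit; it is a sketch under our reading of the model (edge and cloud executions overlap, transmission phases do not).

```python
def total_latency(tau_ul_E, tau_ul_F, tau_exe_E, tau_exe_C,
                  tau_dl_F, tau_dl_E):
    """End-to-end latency: uplink access + uplink fronthaul +
    parallel execution (max of EN and CP latencies) +
    downlink fronthaul + downlink access."""
    return (tau_ul_E + tau_ul_F + max(tau_exe_E, tau_exe_C)
            + tau_dl_F + tau_dl_E)
```

Note that the slower of the two execution stages is the only one contributing to $\tau_T$, which is what makes the task split $c_k$ a useful degree of freedom for balancing the EN and CP workloads.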

D. Optimization
We aim at jointly optimizing the task splitting variables $c$, the uplink communication strategies $\{p^{\mathrm{ul}}, \Omega^{\mathrm{ul}}\}$ and the downlink communication strategies $\{Q^{\mathrm{dl}}, \Omega^{\mathrm{dl}}\}$ with the goal of minimizing the end-to-end latency $\tau_T$ in (53). The problem at hand can be stated as
$$\underset{p\,\ge\, 0,\; c\,\ge\, 0,\; Q\,\succeq\, 0,\; \Omega\,\succeq\, 0,\; F,\; \tau,\; R}{\text{minimize}} \quad \tau_T \tag{55}$$
subject to the constraints (55b)-(55l). We note that problem (55) is more difficult to solve than problems (23) and (31) for D-RAN, since (55) involves more optimization variables, including the fronthaul quantization strategies $\Omega^{\mathrm{ul}}$ and $\Omega^{\mathrm{dl}}$, and the constraints (55d) and (55g) on the fronthaul latency have a more complicated form than (23c) and (23e) for D-RAN systems. To address these complications, we apply FP and matrix FP [27], [28], as in the methodology outlined above for D-RAN, as well as the convex approximation method introduced in [19, Lem. 1].
To this end, we first replace the constraints (55h) with (24c) and (24d), which are convex for fixed $\lambda_{\mathrm{E},i_k,k}^{\mathrm{exe}}$ and $\lambda_{\mathrm{C},k}^{\mathrm{exe}}$ and become equivalent to (55h) when $\lambda_{\mathrm{E},i_k,k}^{\mathrm{exe}}$ and $\lambda_{\mathrm{C},k}^{\mathrm{exe}}$ are given as in (25). Similarly, based on [27, Cor. 1], we consider constraints that are stricter than (55b), (55c), (55e) and (55f); these become equivalent to (55b), (55c), (55e) and (55f) when the auxiliary variables $\lambda^{m}$, $m\in\{\mathrm{ul}, \mathrm{dl}\}$, take their optimal closed-form values.

Next, we discuss the non-convex constraints (55d) and (55g). Using the epigraph form, the constraint (55d) can be restated as in (58). If we fix the auxiliary variables $\alpha^{\mathrm{ul}}$ and $\Sigma_i^{\mathrm{ul}}$, the constraints (59) are convex; they become equivalent to (58) when $\alpha^{\mathrm{ul}}$ and $\Sigma_i^{\mathrm{ul}}$ take the corresponding optimal values. Similarly, instead of (55g) for the downlink, we consider stricter constraints that are equivalent to (55g) under the analogous choice of the downlink auxiliary variables.

Algorithm 3 Alternating optimization algorithm that tackles problem (55)
1. Initialize $\{p, c, Q, \Omega, \tau, R\}$ as arbitrary matrices/values that satisfy the constraints (55b)-(55l), and set $t \leftarrow 1$.
Otherwise, set t ← t + 1 and go back to Step 3.
Lastly, using [28, Cor. 1], we replace the remaining non-convex constraints (55i)-(55l) with stricter constraints for $k\in\mathcal{N}_U$, which are equivalent to (55i)-(55l) when the auxiliary variables are given as in (64). The detailed algorithm is described in Algorithm 3. The solution obtained by Algorithm 3 is a locally optimal solution due to the non-convexity of problem (55). In Sec. V, we initialize $\{p, c\}$ as $p_{\mathrm{E},k}^{\mathrm{ul}} \leftarrow P^{\mathrm{ul}}$, $p_{\mathrm{C},k}^{\mathrm{ul}} \leftarrow P^{\mathrm{ul}}$ and $c_k \leftarrow 1/2$ for $k\in\mathcal{N}_U$. To initialize the covariance matrices of the downlink signals $Q$ and the quantization noise signals $\Omega$, we first set them as in (65), where the elements of $V_{\mathrm{E},k} \in \mathbb{C}^{n_{E,i}\times n_{E,i}}$, $V_{\mathrm{C},k} \in \mathbb{C}^{n_E\times n_E}$ and $V_{\Omega,k} \in \mathbb{C}^{n_{E,i}\times n_{E,i}}$ follow $\mathcal{CN}(0, 1)$. The covariance matrices obtained in (65) may not satisfy the power constraints (50). To resolve this issue, we repeatedly multiply the matrices $Q$ and $\Omega$ by a scalar $\eta < 1$ until the constraints (50) are satisfied. In the simulations, we set $\eta = 1/2$. Once the variables $\{p, c, Q, \Omega\}$ are fixed, the rate variables $R$ can be computed using (37), (43) and (51), and the latency variables $\tau$ are initialized as in (41), (44), (48), and (52).
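The FP alternation underlying Algorithm 3 can be illustrated on a toy single-ratio problem. The sketch below is a hypothetical stand-in, not Algorithm 3 itself (which requires a convex solver): it applies the quadratic transform of [27], alternating a closed-form auxiliary-variable update with a closed-form primal update.

```python
import numpy as np

def fp_max_ratio(P_max, tol=1e-8, t_max=30):
    """Maximize p / (1 + p) over 0 <= p <= P_max by alternating:
    (i)  auxiliary update  y <- sqrt(p) / (1 + p)
    (ii) primal update     p <- argmax_p 2 y sqrt(p) - y^2 (1 + p),
         whose unconstrained maximizer is p = 1 / y^2 (then clipped)."""
    p = P_max / 2.0                      # feasible initialization
    for _ in range(t_max):
        y = np.sqrt(p) / (1.0 + p)       # step (i): fix p, update y
        p_new = min(1.0 / y**2, P_max)   # step (ii): fix y, update p
        if abs(p_new - p) <= tol * max(p, 1.0):
            return p_new
        p = p_new
    return p
```

Since $p/(1+p)$ is increasing in $p$, the iterates climb monotonically and hit the power budget $P_{\max}$ within a handful of iterations, mirroring the fast convergence reported for Algorithm 3 in Sec. V.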
As discussed in Sec. III, the complexity of Algorithm 3 is given by the number of iterations multiplied by the complexity of solving the convex problem at Step 4. The complexity of the latter is upper bounded by $\mathcal{O}(n(n^3 + M)\log(1/\epsilon))$ [32, p. 4], where $n$ and $M$ denote the numbers of optimization variables and constraints, with $n = N_U(4\tilde{n}^2 + \cdots)$. Here, $D_L$ is defined as the number of arithmetic operations needed to calculate the determinant of an $L\times L$ matrix, which is given as $D_L = \mathcal{O}(L^3)$ with Gaussian elimination [37, p. 1]. We discuss the convergence rate of Algorithm 3 in Sec. V.

V. NUMERICAL RESULTS
In this section, we validate via numerical results the performance gain of the proposed C-RAN architecture as compared to the D-RAN reference system. We assume that the locations of the $N_U$ UEs and $N_E$ ENs are independently and uniformly sampled from a square area with side length of 500 m, and we impose a minimum separation of 10 m between any pair of UE and EN. We consider a path-loss model $\rho_0 (d/d_0)^{-\eta}$ [38], [39], where $\rho_0$ is the path loss at a reference distance $d_0$, $d$ denotes the distance between the transmitting and receiving nodes, and $\eta$ is the path-loss exponent. We set $d_0 = 30$ m, $\rho_0 = 10$ dB and $\eta = 3$, and assume an independent Rayleigh small-scale fading channel model for all the channel coefficients. We consider a system that is symmetric between uplink and downlink, with $\mathrm{SNR}_{\max}^{\mathrm{ul}} = \mathrm{SNR}_{\max}^{\mathrm{dl}} = \mathrm{SNR}_{\max}$, $W^{\mathrm{ul}} = W^{\mathrm{dl}} = W$, and $C_F^{\mathrm{ul}} = C_F^{\mathrm{dl}} = C_F$. The computation capabilities of the CP and the ENs are set to $F_C = 10^{11}$ [4] and $F_{E,i} \in \{1.0, 2.5\}\times 10^{10}$ [13], [40], respectively, unless stated otherwise. We also assume that there are $b_{I,k} = b_{O,k} = 10^6$ input and output bits for each UE, and that the task of each UE $k$ requires $V_k = 700$ CPU cycles per input bit [8]. To solve the convex problems at Step 4 of Algorithms 1, 2 and 3, the CVX software [31] with the SDPT3 solver [41] is adopted. Without claim of optimality, we associate each UE $k$ with the closest EN, so that $i_k$ is set to $i_k = \arg\min_{i\in\mathcal{N}_E} \mathrm{dist}_{i,k}$, where $\mathrm{dist}_{i,k}$ denotes the geographical distance between UE $k$ and EN $i$.
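The simulation topology can be reproduced along the following lines. This is an illustrative sketch with function and parameter names of our choosing; the dB convention for $\rho_0$ follows the text.

```python
import numpy as np

def sample_topology(N_U=4, N_E=2, side=500.0, min_sep=10.0,
                    d0=30.0, rho0_db=10.0, eta=3.0, seed=0):
    """Drop UEs and ENs uniformly in a square of the given side length,
    redraw until the minimum UE-EN separation is met, compute the
    path loss rho0 (d/d0)^(-eta) in dB, and associate each UE k with
    its closest EN: i_k = argmin_i dist_{i,k}."""
    rng = np.random.default_rng(seed)
    while True:
        ues = rng.uniform(0.0, side, size=(N_U, 2))
        ens = rng.uniform(0.0, side, size=(N_E, 2))
        dist = np.linalg.norm(ues[:, None, :] - ens[None, :, :], axis=2)
        if dist.min() >= min_sep:
            break
    pathloss_db = rho0_db - 10.0 * eta * np.log10(dist / d0)
    i_k = dist.argmin(axis=1)            # nearest-EN association
    return dist, pathloss_db, i_k
```

Rayleigh small-scale fading would then be drawn independently per coefficient on top of these path losses; the association rule is exactly the $\arg\min$ stated above.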

A. Convergence of the Proposed Algorithm
The convergence rate of FP is analyzed in [27] with a focus on single-ratio problems, and reference [28] discusses the convergence rate of matrix FP via numerical examples. Similar to [28], we provide numerical evidence of the fast convergence of the proposed algorithms in Fig. 2. In the figure, we plot the end-to-end latency $\tau_T$ of the D-RAN and C-RAN schemes versus the number of iterations for $N_U = 4$, $N_E = 2$, $n_{E,i} = 2$, $W = 20$ MHz, $C_F = 1$ Gbps, $F_{E,i} = 10^{10}$ and $\mathrm{SNR}_{\max} \in \{0, 20\}$ dB. We plot both snapshots and the average latency, where the latter is averaged over 100 channel samples. The figure shows that, regardless of the SNR, the proposed algorithms converge reliably within a few iterations. We leave the analysis of the convergence rate of the proposed algorithms for future work. Throughout the following experiments, we set the threshold value for convergence to $\delta = 10^{-4}$ and limit the maximum number of iterations to $t_{\max} = 30$.
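The stopping rule used throughout the experiments (latency improvement below $\delta = 10^{-4}$, or $t_{\max} = 30$ iterations) can be factored out as a small helper. Here `step` stands in for one full round of auxiliary-variable and primal updates; the names are illustrative.

```python
def run_until_converged(step, tau_init, delta=1e-4, t_max=30):
    """Iterate tau <- step(tau) until the latency change drops below
    delta, or until t_max iterations; returns (tau, iterations used)."""
    tau = tau_init
    for t in range(1, t_max + 1):
        tau_next = step(tau)
        if abs(tau - tau_next) < delta:
            return tau_next, t
        tau = tau_next
    return tau, t_max
```

For a contraction such as `step = lambda x: 0.5 * x + 1.0` (fixed point 2), the rule stops once successive iterates agree to within $\delta$, well inside the $t_{\max}$ budget.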

B. Performance Gains of the C-RAN Architecture
In this subsection, we investigate the performance gains of the C-RAN architecture introduced in Sec. IV for collaborative cloud and edge mobile computing as compared to the benchmark D-RAN systems described in Sec. III. To this end, in Fig. 3, we plot the average end-to-end latency $\tau_T$ versus the fronthaul capacity $C_F$ for $N_U = 4$, $N_E = 2$, $n_{E,i} = 2$, $W = 20$ MHz, $F_{E,i} = 10^{10}$ and $\mathrm{SNR}_{\max} = 20$ dB. The figure shows that deploying the C-RAN architecture is not advantageous when the fronthaul capacity $C_F$ is small, due to the large latency caused by the fronthaul transmission. However, as $C_F$ increases, the C-RAN scheme significantly outperforms the benchmark D-RAN schemes, since it enables more effective interference management by means of centralized encoding and decoding at the CP.
In Fig. 4, we examine the energy consumption at the UEs under the same set-up considered in Fig. 3. We calculate the energy consumption at UE $k$ as $E_k = E_k^{\mathrm{ul}} + E_k^{\mathrm{dl}}$, where the uplink and downlink energy expenditures are defined as $E_k^{\mathrm{ul}} = \tau_{E,k}^{\mathrm{ul}} \tilde{p}_k^{\mathrm{ul}}$ and $E_k^{\mathrm{dl}} = \tau_{E,k}^{\mathrm{dl}} d_k^{\mathrm{dl}}$, respectively. Here, $d_k^{\mathrm{dl}}$ indicates the mobile receiving energy expenditure per second in the downlink, and is set to $d_k^{\mathrm{dl}} = 0.625$ J/s as in [13]. The uplink transmit power $\tilde{p}_k^{\mathrm{ul}}$ of UE $k$ is given as $\tilde{p}_k^{\mathrm{ul}} = p_k^{\mathrm{ul}}$ for the D-RAN system and $\tilde{p}_k^{\mathrm{ul}} = p_{\mathrm{E},k}^{\mathrm{ul}} + p_{\mathrm{C},k}^{\mathrm{ul}}$ for the C-RAN system. Unlike D-RAN, the energy consumption of the UEs with C-RAN decreases with $C_F$. This is because the ENs and the CP can exchange quantized baseband signals of better resolution for larger $C_F$, and hence the latency on the edge links becomes lower.

Fig. 5 plots the average end-to-end latency $\tau_T$ with respect to the number $n_{E,i}$ of antennas of each EN for $N_U = 3$, $N_E = 2$, $W = 20$ MHz, $F_{E,i} = 10^{10}$, $C_F = 3$ Gbps and $\mathrm{SNR}_{\max} = 5$ dB. Comparing the performance of D-RAN with different access techniques, we see that TDMA shows a lower latency than non-orthogonal access when the ENs use a small number of antennas. However, when the ENs are equipped with sufficiently many antennas, the non-orthogonal scheme outperforms the TDMA scheme, since the co-channel interference signals can be suppressed by local array processing at the ENs. In this regime, each EN can suppress the interference signals with local processing alone, and hence C-RAN does not provide performance benefits, while significant gains are observed for lower values of $n_{E,i}$.
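The UE energy accounting above amounts to the following minimal sketch (names are ours; $d^{\mathrm{dl}} = 0.625$ J/s as in [13], and for C-RAN the uplink power is the sum of the edge- and cloud-stream powers).

```python
def ue_energy(tau_ul_E, tau_dl_E, p_ul, d_dl=0.625):
    """E_k = tau_ul_E * p_ul + tau_dl_E * d_dl: uplink transmit energy
    plus downlink receive energy (d_dl joules per second of reception)."""
    return tau_ul_E * p_ul + tau_dl_E * d_dl

# Illustrative numbers only: in C-RAN the edge and cloud streams share
# the UE transmit power, and a shorter uplink phase reduces the energy.
e_dran = ue_energy(tau_ul_E=2.0, tau_dl_E=1.0, p_ul=0.1)
e_cran = ue_energy(tau_ul_E=1.5, tau_dl_E=1.0, p_ul=0.05 + 0.05)
```

This also makes the trend in Fig. 4 intuitive: any reduction in the transmission latencies translates directly into UE energy savings, since the power terms are fixed.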
In Fig. 6, we plot the average end-to-end latency $\tau_T$ versus the number $N_E$ of ENs for $N_U = 8$, $n_{E,i} = 2$, $W = 50$ MHz, $F_{E,i} = 2.5\times 10^{10}$, $C_F = 2$ Gbps and $\mathrm{SNR}_{\max} = 20$ dB. When the network has a single EN, i.e., $N_E = 1$, there is no advantage in deploying the C-RAN architecture of Sec. IV compared to the D-RAN of Sec. III. This is because the noise signals caused by fronthaul quantization degrade the spectral efficiency in both uplink and downlink. However, as $N_E$ increases, C-RAN shows significantly better latency performance than the D-RAN schemes. These gains are achieved by the centralized signal processing carried out at the CP on behalf of the connected ENs, which enables effective interference management.

C. Performance Gains of Collaborative Cloud-Edge Computing
In this subsection, we study the performance gains of the collaborative cloud and edge computing system with optimized computational resource allocation as compared to benchmark schemes that rely only on edge computing (i.e., setting $c_k = 1$ for all $k\in\mathcal{N}_U$) or only on cloud computing (i.e., $c_k = 0$ for all $k\in\mathcal{N}_U$). Note that the optimization of these benchmark schemes can be addressed by adopting the proposed algorithm with minor modifications. For reference, we also evaluate the performance of a hybrid strategy that selects the better of the two benchmark schemes. We adopt the optimized C-RAN architecture of Sec. IV for all cases except edge computing, for which the C-RAN system is not applicable; in that case we select D-RAN with non-orthogonal multiple access.
In Fig. 7, we plot the average end-to-end latency $\tau_T$ versus the fronthaul capacity $C_F$ for $N_U = 4$, $N_E = 2$, $n_{E,i} = 2$, $W = 50$ MHz, $F_{E,i} = 2.5\times 10^{10}$ and $\mathrm{SNR}_{\max} = 10$ dB. Since edge computing does not utilize the fronthaul links, its performance is not affected by $C_F$. In contrast, the latency of the cloud computing scheme decreases as $C_F$ increases. While selecting between the edge and cloud computing schemes does not yield significant benefits, the proposed collaborative cloud and edge scheme achieves notable gains, particularly in the intermediate regime of $C_F$.
In Fig. 8, we plot the average end-to-end latency $\tau_T$ versus the maximum SNR for $N_U = 4$, $N_E = 2$, $n_{E,i} = 2$, $W = 100$ MHz, $F_{E,i} = 2.5\times 10^{10}$ and $C_F = 250$ Mbps. The figure shows that, although increased SNR levels are beneficial for all the schemes, the performance of cloud computing is more significantly affected by the SNR than that of edge computing. This is because the edge latency of edge computing is limited by interference, and hence its performance saturates as the SNR increases. The performance of the C-RAN scheme is instead limited by the fronthaul capacity as the SNR grows larger.

Figure 7. Average end-to-end latency $\tau_T$ versus the fronthaul capacity $C_F$ ($N_U = 4$, $N_E = 2$, $n_{E,i} = 2$, $W = 50$ MHz, $F_{E,i} = 2.5\times 10^{10}$ and $\mathrm{SNR}_{\max} = 10$ dB); the compared schemes are edge computing ($c_k = 1$, $\forall k$), cloud computing ($c_k = 0$, $\forall k$), selection between edge/cloud computing, and collaborative cloud-edge computing.

Fig. 9 plots the average end-to-end latency $\tau_T$ by varying the edge computing capability $F_{E,i}$ normalized by $F_C$ for $N_U = 4$, $N_E = 2$, $n_{E,i} = 2$, $W = 100$ MHz, $C_F = 500$ Mbps, $\mathrm{SNR}_{\max} = 10$ dB and $F_C = 10^{11}$. When $F_{E,i}$ is too small, it is preferable to choose $c_k = 0$ for all $k\in\mathcal{N}_U$, so that all the tasks are offloaded to the CP. As $F_{E,i}$ increases, offloading some tasks to the ENs can improve the performance, and the proposed scheme with optimized task allocation provides a notable gain as compared to all the benchmark schemes.
In Fig. 10, we plot the average task ratio $c_k$ assigned to the ENs versus the fronthaul capacity $C_F$ for $N_U \in \{2, 4\}$, $N_E = 2$, $n_{E,i} = 1$, $W = 100$ MHz and $F_{E,i} \in \{0.1, 0.5\}\times 10^{10}$. The task ratio variables are obtained from the proposed algorithm in Sec. IV-D. We observe from the figure that, as the fronthaul capacity $C_F$ increases, more tasks are assigned to the CP due to the reduced fronthaul latency. Similarly, as the ENs are equipped with stronger computing power $F_{E,i}$, they process a larger portion of the tasks. Moreover, increasing the number $N_U$ of UEs results in smaller ratios $c_k$, since the ENs, whose computing power is limited, offload more tasks to the CP when $N_U$ is larger.

Figure 9. Average end-to-end latency $\tau_T$ versus the normalized edge computing capability $F_{E,i}/F_C$ ($N_U = 4$, $N_E = 2$, $n_{E,i} = 2$, $W = 100$ MHz, $C_F = 500$ Mbps, $\mathrm{SNR}_{\max} = 10$ dB and $F_C = 10^{11}$); the compared schemes are edge computing ($c_k = 1$, $\forall k$), cloud computing ($c_k = 0$, $\forall k$), selection between edge/cloud computing, and collaborative cloud-edge computing.

VI. CONCLUSIONS
We have studied the design of collaborative cloud and edge mobile computing within a C-RAN architecture for minimal end-to-end latency. We have tackled the joint design of computational resource allocation and C-RAN signal processing strategies with the goal of minimizing the end-to-end latency required for completing the computational tasks of all the participating UEs in the network. To tackle the resulting non-convex optimization problem, we have applied FP and matrix FP. Via extensive numerical results, we have validated the convergence of the proposed optimization algorithms, the performance gain of the C-RAN architecture as compared to D-RAN, and the impact of optimized computational resource allocation in collaborative cloud and edge computing. As future work, we mention the extension to collaborative AR [13], heterogeneous C-RAN and mobile computing integrated systems [42]-[44], robust design with imperfect CSI [45], and energy-efficient design [3], [4] for energy-limited mobile UEs. It would also be relevant to verify the effectiveness of the proposed algorithms by deriving a tight lower bound on the optimal latency values.