pDCell: an end-to-end transport protocol for mobile edge computing architectures

To deal with increasingly demanding services and the rapid growth in number of devices and traffic, 5G and beyond mobile networks need to provide extreme capacity and peak data rates at very low latencies. Consequently, applications and services need to move closer to the users into so-called edge data centers. At the same time, there is a trend to virtualize core and radio access network functionalities and bring them to edge data centers as well. However, as is known from conventional data centers, legacy transport protocols such as TCP are vastly suboptimal in such a setting. In this work, we present pDCell, a transport design for mobile edge computing architectures that extends data center transport approaches to the mobile network domain. Specifically, pDCell ensures that data traffic from application servers arrives at virtual radio functions (i.e., C-RAN Central Units) timely to (i) minimize queuing delays and (ii) to maximize cellular network utilization. We show that pDCell significantly improves flow completion times compared to conventional transport protocols like TCP and data center transport solutions, and is thus an essential component for future mobile networks.


INTRODUCTION
The fifth generation (5G) of cellular mobile networks aims at supporting high mobility, massive connectivity, extremely high data rates and ultra-low latency [30]. While previous generation systems were optimized for a specific objective (e.g., voice or data), 5G networks need to support several or even all of the above requirements simultaneously. Enhanced mobile broadband (eMBB), massive machine type communications (mMTC) and ultra-reliable low-latency communications (URLLC) services exhibit highly diverse requirements and traffic characteristics. For example, automation processes pose very stringent latency and reliability requirements to complete actuation commands (< 1 ms latency and 10^-9 packet loss rate), while communications between sensors employed for smart city services are infrequent and pose relaxed latency requirements [27]. To support such services, the architectural design of mobile networks needs to evolve. In particular, to empower URLLC in systems for 5G and beyond, applications and services need to move closer to the end users.
Mobile Edge Computing (MEC), Software-Defined Networking (SDN) and Network Function Virtualization (NFV) are the main driving factors to enable 5G services. The MEC paradigm brings computing and storage resources closer to the end users. SDN decouples the control plane from the user plane, and NFV decouples network functions from dedicated hardware so that they can run on commercial off-the-shelf hardware. Future mobile networks will leverage these technologies to bring key core and radio functions near the user, in the so-called edge data center. Virtual Central Units (vCus) implementing parts of the air interface stack (following a C-RAN approach) will directly connect users to MEC applications running in the edge data center, such as local Evolved Packet Cores (EPCs), keeping local traffic local, reducing the load of the transport network and reducing the latency experienced by the user.
The adoption of edge data centers also brings the whole transport, from application server to the mobile terminal, under the control of the Mobile Network Operator, which allows for performance optimizations. Indeed, conventional transport protocols applied to such an ecosystem would perform poorly. For example, cell load increase is known to limit user bandwidth availability [24], and the increased delay, due to large per-user queues at base stations, can reduce the precision of the TCP retransmission timeout estimation. Consequently, conventional TCP may experience unnecessary timeouts, causing retransmissions and slow start, and thus leading to poor link utilization. Transport protocols for MEC architectures, bridging data center and radio networks together, should take advantage of Cu pooling and of the channel quality feedback that is available in MEC environments as part of the Radio Network Information Services [23]. Furthermore, they should also take advantage of technological advances in data center transport aiming at minimizing the flow completion time (FCT) of small flows [6,17,22,26]. Cross-domain transport optimization makes it possible to fully benefit from the potential gains, e.g., better cell load management, and enables just-in-time scheduling, so that MEC servers send traffic such that it arrives at the Cus exactly at the right point in time to be scheduled.

This work presents pDCell, a new transport design spanning from the data center domain to the mobile end users. The main novelty of pDCell is to couple the transport protocol with the scheduler of the air interface, residing within the Cu at the edge data center. pDCell vastly improves transport efficiency and can be deployed in a way that is transparent to services and applications by exposing a TCP-compliant socket application interface at the server and mobile terminal side. By coupling the transport congestion control with the wireless scheduler, pDCell can immediately slow down a source sending traffic to a user when radio channel conditions worsen, rather than waiting for slow TCP end-to-end congestion control to adapt. Therefore, buffer sizes in the transport network can be substantially smaller, reducing overall latency.

Our main findings are as follows:
• pDCell significantly outperforms data center transport solutions that are not aware of the wireless domain.
• pDCell is scalable, as the increase in network load does not significantly affect the average FCT.
• pDCell operates with minimal queue occupancy and prevents unnecessary retransmissions to achieve ultra-low end-to-end latency.

MEC ARCHITECTURE AND MOTIVATING EXAMPLE

2.1 MEC Architecture
2.1.1 Centralized/Cloud-RAN. C-RAN systems split the air interface protocol stack into two separate parts: one that remains at the antenna site, called Distributed Unit (Du), and a Centralized Unit (Cu), which is moved to centralized pools in nearby edge data centers. For the operators, C-RAN systems enable significant cost reduction to deploy and maintain antenna sites while boosting the network capacity, e.g., by enabling Cooperative Multi Point techniques with tight synchronization between the aggregated cells. The complexity of distributed and central units differs based on the functions that are processed locally. The different points in the protocol stack where the separation of central and distributed functions can take place are called functional splits (see Fig. 1(c)) [8,12]. The implementation of a given functional split uniquely defines the properties of the system design [2], and of the network connecting the Du and Cu, known as fronthaul (Fh). In the full C-RAN scenario, radio I/Q samples are transported in the fronthaul and only circuit-based fiber technology can support the massive capacity required (approximately 1.229 Gbps of CBR traffic per 20 MHz LTE channel, independent of the user traffic) and the very low latency requirements (approximately 100 µs one way) [14]. With less demanding splits, such as Packet Data Convergence Protocol/Radio Link Control Protocol (PDCP/RLC) (see Fig. 1(c)), high throughput packet-based fronthaul technologies, such as millimeter-wave or Ethernet, can be used, with the caveat that packetization delays need to be taken into account [11].

There is a trade-off between the benefits and drawbacks of the different functional splits. The nearer the split is to the physical layer, the more complex are the signal processing techniques that have to be performed in the Cu, and the higher the potential performance gains and cost savings. At the same time, such lower split options impose very stringent delay and rate requirements on the transport between the Du and Cu. Higher layer functional splits like PDCP/RLC provide some of the benefits of the lower layer splits, such as separation between control and user plane and the possibility of using some joint transmission features to boost performance, at a much lower transport cost. For the PDCP/RLC split, the transport requirements just include an overhead of around 20% over the user traffic and easily achievable delays between Du and Cu on the order of milliseconds.

Control and data plane are tightly coupled, especially in the P-GW and S-GW, through a number of dedicated interfaces and protocols. In LTE systems, the GPRS Tunneling Protocol (GTP) is employed to carry data traffic to the end users. With GTP, end users maintain the same IP address while moving, and traffic is routed between the P-GW and the eNodeB through the S-GW. The Stream Control Transmission Protocol (SCTP) is used for control traffic within the EPC and between the EPC and eNodeB. The recently specified 5G Core redesigns the EPC towards a more decentralized architecture by instantiating virtual cores in edge data centers through MEC and NFV. In MEC architectures (see Fig. 1(b)), the aforementioned functionalities are virtualized and executed in edge data centers along with pools of Cus performing the radio processing. Although in virtualized environments LTE control signalling can lead to significant overhead [28], in this work we focus on data delivery of small-size traffic typical in data center and mobile networks [9,32], where LTE events are executed rarely.

By allowing the Cus to exploit such feedback and couple it to the edge data center, it becomes possible for the data center transport to perform flow scheduling and to adjust the source sending rate to the amount of radio resources that will be allocated by the mobile network, achieving at the same time high data rates and low latency.
To verify the need for this new transport protocol, Fig. 3 presents the FCT performance for i) TCP New Reno, ii) a state-of-the-art transport for data center networks (DCNs), pHost [17], with infinite queue size and iii) pHost with limited queue size at the Cu. The objective is to compare TCP and unmodified existing data center solutions applied to a MEC edge data center. While TCP runs end-to-end, pHost is terminated at the Cu, and from there to the UE either UDP or TCP can be employed according to the split option (see § 2.1.1). pHost transport empowers end hosts to perform flow scheduling on the basis of grant assignments. Destinations can choose which source is entitled to transmit data packets by sending tokens. Sources decide to which destination to reply when receiving multiple tokens. pHost exploits packet spraying to eliminate congestion by leveraging the full bisection bandwidth of DCNs and avoiding explicit path scheduling. The traffic flows from applications towards the EPC located in the MEC platform, which in turn forwards the traffic to the end user through the Cu. For the cases of TCP and pHost with limited queue size, the per-user buffer size of an LTE (Long Term Evolution) UE category 3 is assumed [1]. The traffic trace used for this experiment is explained in § 4 and its flow size distribution is shown in Fig. 7.

The FCT of pHost with finite queue shows moderate delays for all flows. pHost tries to fill the buffer of the Cu as soon as possible and, since it does not have any feedback about the wireless domain, packets are discarded once the buffer is full. pHost responds to the loss of packets with retransmissions by reissuing grants. Considering that the timeouts of pHost are designed for DCNs, this retransmission is extremely fast. The FCT of pHost with infinite queue shows the higher delays. When the buffer of the Cu is assumed to be infinite, no packets are lost due to buffer overflow. For this reason, packets experience high queuing delays. Finally, the FCT of TCP shows an intermediate performance. Although the congestion control of TCP considers the delay and bandwidth of the channel, its reaction time [...]

Given the above results, we argue that a new transport needs to be designed specifically for mobile edge data centers. We base this design on data center transport rationale, but parameterize and adapt it to the requirements of the mobile network. The new transport must incorporate specific aspects of the wireless domain and be flexible enough to adapt to latency and bandwidth variation, while maintaining the benefits of DCN transport such as low queue occupancy, low FCT and low losses.

THE PDCELL TRANSPORT
This section discusses the design rationale of pDCell transport, illustrating how it integrates the data center and wireless domains.

Assumptions and Design Challenges
We consider an end-to-end architecture where each Du is controlled by one Cu and each mobile user is attached to a single Du at a time. We will discuss in § 5 the modification to the architecture layout necessary to relax this assumption. pDCell has to overcome a number of challenges:

(1) The RTTs of the two components of the end-to-end system (mobile and data center networks) are different. While future 5G mobile networks should support millisecond RTTs, current RTTs are in the order of tens of milliseconds to seconds [32]. In contrast, the data center delay is in the order of microseconds [13]. Hence, pDCell should take this difference into account and prevent bursts of packets from being queued simultaneously at Cus, which would increase latency. To this end, pDCell operates on flow scheduling and source sending rate adaptation in the data center to seamlessly incorporate the requirements of the mobile network.

(2) Multiple data center sources can send traffic to the same mobile user simultaneously. The size of such flows can vary and they have to be processed by the Cu hosting the baseband processing of the Du associated with the mobile user. Hence, these flows share the same buffer and compete with flows destined to other users. To minimize FCT, short flows should be prioritized over long ones. Moreover, while the LTE standard requires packet-level guarantees, data center networks only enforce flow-level guarantees [28]. To this end, Cus schedule the processing order of incoming data center traffic with a newly defined scheduling policy, explained in § 3.2 and § 3.6.

(3) pDCell should ensure a clear separation between control and data traffic. In edge data centers, the mobile network control traffic, attributed to message exchange between the diverse functionalities of the EPC, coexists with the data center control traffic.

(4) While in data centers packet losses are infrequent, in the wireless domain they are common, and hence pDCell should react differently depending on the nature of losses. Whereas in the wireless domain pDCell relies on existing mechanisms, in the data center environment pDCell operates at the buffer management level by defining an admission policy that prevents packets from being dropped and later retransmitted.

Architectural Design Overview
In this work, we consider a MEC architecture as in Fig. 1(b), implementing a functional split according to option 2 (see Fig. 1(c)). This split limits the multiplexing gains, as it leaves a considerable part of the baseband processing at Dus. However, it poses much less stringent fronthaul requirements in terms of latency and rate [12,18], allowing edge data centers and Dus to be interconnected with packet-based technology. To exploit available radio resources, each Du performs resource allocation based on CQI user feedback and takes care of retransmissions if needed (at the RLC and MAC layers). Note that implementing split option 8 would require CQI feedback to be propagated to the Cu, so that scheduling would be based on less up-to-date channel state information. This is the second advantage of implementing split option 2. However, to solve challenge (1) and perform sophisticated scheduling that takes into account the states of both data center and mobile networks, the Cu also needs to be aware of the user channel conditions, which split option 2 does not provide. Thus, the challenge to solve is: how to propagate such information back to the data center? To resolve this challenge, we resort to buffer management and propose a new mechanism tailored to the requirements of the mobile network.

Buffer Management at Cus and Dus
At the Cu, an IP packet undergoes the PDCP protocol with associated robust header compression (see bottom of Fig. 4). There exists a direct mapping between a PDCP service data unit (SDU) and a protocol data unit (PDU), each of a maximum size of 8188 Bytes, to also handle MTUs larger than those of typical IP packets. PDCP PDUs are then sent to the RLC layer at Dus. To better manage wireless resources, the association between RLC SDUs and PDUs is not one-to-one. The payload of an RLC PDU can contain multiple SDUs, e.g., combining new incoming packets from PDCP and retransmissions. An RLC SDU can also be split into multiple PDUs. At this stage, the information on the original data center flows is no longer available, which precludes scheduling with joint information on data center and mobile network and thus meeting objectives like FCT minimization. Hence, pDCell advocates the need for combined scheduling decisions at Cus by leveraging both flow and channel quality information. The total per-user buffer space allocated for RLC in acknowledged mode is given in number of SDUs, typically 1024 for a UE cat 3, i.e., 1.4 MB to accommodate 1500 Bytes long RLC SDUs. When TCP and URLLC traffic coexist, the queuing delay can grow up to 540 ms, which is highly detrimental to the performance of interactive applications [21].

[Figure 4: labels CU, Fronthaul, To DU 1, To DU 2, Propagation of virtual thresholds]

To prevent such behaviour, we advocate a better use of buffer space and implement the following mechanism for pDCell. At the Cu level, one queue per user is necessary to control application sources in the data center. The allocation is performed by the Radio Resource Control (RRC) protocol when it detects the status of a user as active. The queue size is defined via a virtual threshold that changes based on: i) the feedback from the corresponding Du, ii) the current load at Cu level. Since the sum of the virtual thresholds of per-user queues could exceed the processing capacity of the Cu, the buffer space is shared among all per-user queues and the total per-user queue size is adjusted to the processing capacity of the Cu. The number of output ports in this shared memory switch is equal to the number of associated Dus, where the outgoing traffic of each port is shaped to the Du processing capacity (see Fig. 4(a)). If, for the same user, the virtual threshold value propagated from a Du is smaller than the one currently available in the Cu, the latter is decreased. Otherwise, the value of the virtual threshold increases if the current buffer occupancy allows. Similarly to prior work [21], the shared memory switch operates at PDCP level, although for scalability reasons per-user and not per-flow queues are allocated.
The natural choice for buffer management in a shared memory switch is a Longest-Queue-Drop (LQD) [4] policy that drops already admitted packets from the longest queue in case of congestion. By design, LQD provides some level of fairness among users in accessing the Cu processing capacity. Since the goal of the transport is to ensure reliability and minimize retransmissions, already admitted packets should not be dropped. Hence, the LQD behavior needs to be modified. When the Cu buffer is full, the virtual threshold of the currently longest queue Q is decreased by one unit without dropping packets in Q, and the incoming packet currently evaluated for admission is simply not admitted. Hence, we guarantee that each admitted packet has buffer space reserved. Furthermore, incoming packets belonging to a user whose queue occupancy is equal to its virtual threshold are also not admitted, regardless of the buffer occupancy. In § 3.7 we will show how to handle retransmission of non-admitted packets to limit out-of-order arrivals and the consequent time-consuming reordering. We call this buffer management policy Highest-Threshold-Decrease (HTD).

Fig. 5 highlights the difference between the LQD and the proposed HTD admission controls. Before the arrival of the new packet p, the virtual thresholds of the first and the second queues (Q_i and Q_j) are 4 and 2, respectively. The actual queue occupancies of Q_i and Q_j are 3 and 1, respectively. Since the size of the buffer is 4, the newly arriving packet p to Q_j causes congestion in the Cu buffer. LQD drops one already admitted packet from Q_i, as it is the longest queue, and admits p to Q_j. The virtual thresholds for both queues remain the same. In contrast, HTD never drops an already admitted packet; hence, HTD reduces the virtual threshold of Q_i by one unit and prevents p from being admitted. Note that HTD reacts more slowly than LQD, but satisfies the requirement that admitted packets are never dropped (challenge (4)).
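The contrast between the two admission controls can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `SharedBuffer`, its method names and the helper `fig5_state` are hypothetical, the buffer is counted in packets, LQD is assumed to drop from the tail of the longest queue, and a virtual threshold is assumed never to fall below zero.

```python
from collections import deque


class SharedBuffer:
    """Shared Cu buffer with per-user queues and virtual thresholds (hypothetical names)."""

    def __init__(self, capacity, thresholds):
        self.capacity = capacity                  # total number of packets the buffer holds
        self.thresholds = dict(thresholds)        # per-user virtual thresholds
        self.queues = {user: deque() for user in thresholds}

    def occupancy(self):
        return sum(len(q) for q in self.queues.values())

    def longest_queue(self):
        return max(self.queues, key=lambda user: len(self.queues[user]))

    def admit_htd(self, user, packet):
        """HTD: never drop an admitted packet. Returns True if the packet is admitted."""
        # A user already at its virtual threshold is refused regardless of buffer load.
        if len(self.queues[user]) >= self.thresholds[user]:
            return False
        if self.occupancy() >= self.capacity:
            # Buffer full: shrink the threshold of the longest queue, refuse the arrival.
            longest = self.longest_queue()
            # Assumption: thresholds are floored at zero.
            self.thresholds[longest] = max(0, self.thresholds[longest] - 1)
            return False
        self.queues[user].append(packet)
        return True

    def admit_lqd(self, user, packet):
        """LQD baseline: on congestion drop from the longest queue, then admit.

        Returns the dropped packet, or None if nothing was dropped.
        """
        dropped = None
        if self.occupancy() >= self.capacity:
            # Assumption: the most recently admitted packet (tail) is the one dropped.
            dropped = self.queues[self.longest_queue()].pop()
        self.queues[user].append(packet)
        return dropped


def fig5_state():
    """State before packet p arrives: thresholds 4 and 2, occupancies 3 and 1, buffer size 4."""
    buf = SharedBuffer(capacity=4, thresholds={"Qi": 4, "Qj": 2})
    buf.queues["Qi"].extend(["a", "b", "c"])
    buf.queues["Qj"].append("d")
    return buf
```

Calling `fig5_state().admit_htd("Qj", "p")` refuses `p` and lowers the threshold of `Qi` from 4 to 3 while leaving both queues untouched, whereas `admit_lqd` on the same state removes a packet from `Qi` and admits `p`.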
Since each mobile user is controlled by a single Du, each Du maintains a single queue per user (see Fig. 4(b)). In stationary regime, the queue status at t + 1 is given as follows:

Q(t + 1) = Q(t) + A(t) + R(t) − D(t),

where Q(t) is the queue status at time t, A(t) denotes incoming packets from the Cu accepted by the HTD policy, R(t) denotes PDUs to be retransmitted and D(t) corresponds to PDUs successfully acknowledged by the mobile user, which can be removed from the buffer. Users experiencing good channel quality report a high CQI index, which in turn allows the MAC scheduler to employ a higher modulation and coding scheme and transport block size. Hence, the component D(t) drains the queue fast and the component R(t) is marginal. In contrast, for users with low channel quality, the MAC scheduler uses a more robust modulation and coding scheme, which reduces the number of bits per resource block. Consequently, the components D(t) and R(t) may cause Q to grow, limiting the space for new incoming packets. Dus then propagate A(t) to the Cu (using uplink channels) and the corresponding virtual threshold increases if there is room. The propagation of A(t) makes the Cus indirectly aware of the channel conditions that split option 2 precludes, so that the sending rate of data center sources can be limited or increased through the adaptation layer (see § 3.5). Note that the size of Q cannot grow indefinitely, i.e., the sum of all allocated per-user queues cannot exceed the overall Cu buffer space.

Because of the propagation delay between the Cu and the associated Dus, the values of per-user virtual thresholds in the Cu may be outdated, and the Cu may transmit more traffic than the associated Dus can accept. To accommodate this excess traffic, one solution would be to add extra buffer space, which is however detrimental to interactive applications. Therefore, to avoid allocating this extra buffer for every queue, all virtual per-user queues in a Du share the same physical buffer space.
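To make the roles of the queue components concrete, the following sketch steps a per-user Du queue forward in time. It assumes the additive recursion Q(t+1) = Q(t) + A(t) + R(t) − D(t) implied by the definitions above. The function names, the clamping of the queue to [0, capacity] and the numeric values are illustrative assumptions, not taken from the paper.

```python
def du_queue_step(q, admitted, retransmit, acked, capacity):
    """One update of a per-user Du queue, all quantities in PDUs.

    Assumed recursion: Q(t+1) = Q(t) + A(t) + R(t) - D(t),
    clamped to [0, capacity] (the clamp is an assumption of this sketch).
    """
    nxt = q + admitted + retransmit - acked
    return max(0, min(nxt, capacity))


def headroom(q, capacity):
    """Space left for newly admitted packets, which bounds the next A(t)."""
    return max(0, capacity - q)


def simulate(q0, steps, capacity):
    """Apply a list of (A, R, D) tuples and return the queue trajectory."""
    trajectory = [q0]
    for admitted, retransmit, acked in steps:
        trajectory.append(du_queue_step(trajectory[-1], admitted, retransmit, acked, capacity))
    return trajectory


# Good channel: D(t) dominates and R(t) is marginal, so the queue drains.
good = simulate(10, [(4, 0, 8)] * 3, capacity=100)
# Poor channel: few acknowledgements and frequent retransmissions, so the queue grows.
poor = simulate(10, [(4, 3, 2)] * 3, capacity=100)
```

With these hypothetical numbers the good-channel queue shrinks step by step while the poor-channel queue grows, which reduces `headroom` and hence the room for new packets from the Cu.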

pDCell Design
The transport designs of the mobile network and of the data center are transparent to each other. The latter, however, needs to incorporate mechanisms that reflect specific properties of the wireless environment. In the context of edge data centers, keeping the data center transport simple to avoid delays introduced by scheduling is imperative, as timeouts and retransmissions negatively impact the overall FCT. Consequently, for pDCell, we rely on transports where flow scheduling, congestion control and reliability are performed by the end hosts, following a minimalistic approach with requests-to-transmit (RTS) and credit assignment. Both are exchanged by end data center servers through credit packets, as detailed in § 3.5. The design of pDCell is inspired by the previous credit-based transports pHost [17] and ExpressPass [13], and shares the fast start approach common in other recent works [20]: to ensure low latency, pDCell sends data packets without completing a handshake procedure, probing for available bandwidth. Specifically, pDCell allows a flow to start by sending data packets along with a request-to-transmit packet.

In contrast to protocols like pHost and ExpressPass that are specific to data center applications, pDCell is adapted to the unique features of a mobile network as follows. Conventional data center control traffic (credit packets) is exchanged between end hosts, and the arrival order of credits at the bottleneck link schedules the subsequent arrival of data packets, which helps to avoid the incast problem. pDCell inherits this property as well, as shown in § 4.1. In MEC architectures, the data traffic needs to traverse EPC hosts before being processed at Cus (challenge (3)). pDCell supports this feature as illustrated in § 3.5.

The Design of the Adaptation Layer
pHost performs transmission control through grant exchange. Each credit packet informs the sender of the next packet (on a packet-by-packet basis) to be sent for each flow. Hence, pHost effectively performs scheduling at a per-packet granularity. pDCell incorporates this design. Similar to pHost and PIAS [7], pDCell makes use of the limited number of priority queues in commodity switches (4-8 per port [7]) to prioritize credit over data packets. Fig. 6 shows the workflow of pDCell for two sources and two destinations and highlights the traffic exchange between all segments of the considered architecture. A data center source sends a request-to-transmit packet to inform the adaptation layer at the Cu about the estimated flow size, along with a small number of data packets. Although some works question the availability of precise information on flow sizes [7], it can be estimated [5]. The adaptation layer processes the request-to-transmit and admits arriving data packets into the corresponding per-user queue. Implicitly, this preallocates buffer space in the Cu buffer and allows credits to be allocated on the basis of the difference between the virtual threshold and the current queue occupancy. This difference defines the current capability of the mobile user to receive data according to the channel conditions. Obviously, the current queue occupancy cannot be larger than the virtual threshold, and when the two values are equal, no more credits are generated. Note that this is a conservative approach, as more credits could be generated because of the queue turnaround time.

Figure 6: Example of pDCell workflow

Each credit has two associated timers, t_s and t_e: the first prevents sources from transmitting before its expiration and the second prevents sources from transmitting data packets after its expiration. If credits expire, i.e., the source does not use them in due time, they are simply re-sent by the destination. By default, we set t_s = 0 and t_e = t_s + 1.5× the MTU-sized packet transmission time. Only Cus can update t_s, to ensure that only mobile traffic benefits from the adaptation required to solve challenge (1). Congestion is detected by Cus when no data packets are received as a reply to a credit packet. There are multiple possible reasons: i) data packets have not been admitted at the Cu, ii) data packets have been discarded in the network fabric, or iii) the sender has not utilized the token given by credit packets before the expiration of t_e. Consequently, pDCell adjusts t_s as t_s = TTI + α, where TTI is the Transmission Time Interval and α is the time necessary to transmit A(t) packets at the lowest modulation and coding scheme. In this way, pDCell prevents the data center source from sending traffic to the mobile user, and this backoff period is set to be sufficient to absorb the already queued traffic.
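The credit rules above can be written out as a small sketch. It is illustrative only: the `Credit` record, the function names, the 1500-byte MTU and the interpretation of t_s and t_e as offsets on a common clock are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass

MTU_BYTES = 1500  # assumed MTU for this example


@dataclass
class Credit:
    packet_id: int  # ID of the next data packet the source may send
    t_s: float      # source must not transmit before this time (seconds)
    t_e: float      # source must not transmit after this time (seconds)


def mtu_tx_time(link_bps, mtu_bytes=MTU_BYTES):
    """Time to serialize one MTU-sized packet on a link of the given rate."""
    return mtu_bytes * 8 / link_bps


def credits_available(virtual_threshold, occupancy):
    """Credits the adaptation layer may issue: threshold minus current occupancy."""
    return max(0, virtual_threshold - occupancy)


def make_credit(packet_id, link_bps, t_s=0.0):
    """Default timers: t_s = 0 and t_e = t_s + 1.5 x MTU transmission time."""
    return Credit(packet_id, t_s, t_s + 1.5 * mtu_tx_time(link_bps))


def backoff_start(tti, a_t_packets, lowest_mcs_bps, mtu_bytes=MTU_BYTES):
    """On congestion: t_s = TTI + alpha, with alpha the time needed to send
    A(t) packets at the rate of the lowest modulation and coding scheme."""
    alpha = a_t_packets * mtu_bytes * 8 / lowest_mcs_bps
    return tti + alpha


def may_send(credit, now):
    """A source may use the credit only inside the window [t_s, t_e]."""
    return credit.t_s <= now <= credit.t_e
```

For instance, on a 10 Gbps link an MTU-sized packet takes 1.2 µs to transmit, so a default credit is usable for 1.8 µs. A queue whose occupancy equals its virtual threshold yields `credits_available(...) == 0`, matching the rule that no more credits are generated in that case.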

Wireless Scheduling Algorithms
Although Dus are in charge of actual packet scheduling and mapping to transport blocks, to pursue specific objectives like FCT minimization the Cus need to select the processing order of incoming flows. A FIFO approach would naturally process packets according to their arrival time, which can be detrimental for the FCT of small flows. Hence, at every interval t_a, equal to a frame duration of 1 TTI, the Cus read packets from the buffer B and construct virtual LTE frames to emulate the actual schedule at Dus. Specifically, each per-user queue (see § 3.3) is assigned a weight, and the scheduler visits high-priority queues first. This weight is defined through the time it takes to transmit the current flow with the current modulation and coding scheme of the mobile user: the longer the time, the lower the weight.

To assess the impact of the wireless scheduler in the overall framework of pDCell, we propose a new scheduling algorithm and compare its performance against the state of the art:
• Minimum Remaining Flow Size (MRFS) schedules flows according to the remaining time required by the flow to complete. It considers the bandwidth available to the user in the wireless domain and the estimated number of packets remaining in the flow. This algorithm optimizes the FCT, especially for small flows and users with good channel quality, providing an advantage to interactive applications. The details of the scheduling algorithm are given in Algorithm 1.
• Proportional Fair Scheduler (PFS) preferentially schedules users with good channel quality (while maintaining proportional fairness). PFS is commonly implemented in current mobile networks [10] and does not consider the characteristics of the flows to be scheduled. This algorithm ensures fairness among the users and exploits the wireless medium efficiently; hence, we employ it as the baseline for comparison with MRFS.
Algorithm 1 Minimum Remaining Flow Size (MRFS)
 1: queue ← array of all per-user queues
 2: F_a ← array of flows at queue ordered by minimum completion time (size_flow / bw_user)
 3: N_Q ← number of per-user queues
 4: sf ← sub-frame
 5: while (size(sf [...]
    for all f in {0, size(F_a)} do
 7: [...]
    while find(queues(q), f_id) != NULL do
10: [...]
    end for
18: end while

The choice of the employed scheduling algorithm makes it possible to fulfill specific goals. To understand the rationale behind the MRFS design, let us consider the following example. Two sources in the data center, S_1 and S_2, send two flows each to two mobile users, Ue_1 and Ue_2, served by the same Cu. Assume CQI_{Ue1} > CQI_{Ue2}, and that the paths between S_2 and the Cu experience congestion while the paths between S_1 and the Cu do not. Consider F_{S1,Ue1} < F_{S1,Ue2} and F_{S2,Ue1} > F_{S2,Ue2}, with F_{S1,Ue1} and F_{S2,Ue2} of equal size. Then, to minimize overall FCT, the transport should schedule, in order, F_{S1,Ue1}, F_{S2,Ue2}, F_{S1,Ue2} and F_{S2,Ue1}.
Given the preference that MRFS gives to scheduling short flows first, long ones can be starved in the short term. By penalizing fairness and giving high priority to short flows, we ensure that the latencies required for URLLC traffic are met. Note that in the long term the long flows do get scheduled because, as their remaining flow size becomes smaller, their scheduling priority increases.
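The ordering rule behind MRFS (sort flows by remaining size divided by the bandwidth of the destination user) can be illustrated with a short sketch. This is not a reconstruction of Algorithm 1: the `Flow` record, the function names, and the flow sizes and user rates below are hypothetical values chosen only to satisfy the inequalities of the example above, and the effect of congestion on the paths from S_2 is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    user: str
    remaining_bytes: int


def completion_time(flow, user_rate_bps):
    """Estimated time to finish the flow at the user's current wireless rate."""
    return flow.remaining_bytes * 8 / user_rate_bps[flow.user]


def mrfs_order(flows, user_rate_bps):
    """Return flows sorted by increasing estimated completion time."""
    return sorted(flows, key=lambda f: completion_time(f, user_rate_bps))


# Hypothetical rates: Ue1 reports the better CQI, hence the higher rate.
rates = {"Ue1": 2e6, "Ue2": 1e6}

# Hypothetical sizes with F_S1_Ue1 < F_S1_Ue2, F_S2_Ue1 > F_S2_Ue2,
# and F_S1_Ue1 equal in size to F_S2_Ue2.
flows = [
    Flow("F_S2_Ue1", "Ue1", 100_000),
    Flow("F_S1_Ue2", "Ue2", 30_000),
    Flow("F_S2_Ue2", "Ue2", 10_000),
    Flow("F_S1_Ue1", "Ue1", 10_000),
]

order = [f.name for f in mrfs_order(flows, rates)]
```

With these values the two equal-sized short flows come first, the one towards the user with the better channel leading, followed by the two longer flows. A long flow moves forward in this order as its `remaining_bytes` shrinks, which is the mechanism that prevents starvation in the long term.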

Handling Packet Losses
Although the credit-based scheme is designed to minimize packet losses and has been proven to ensure almost zero losses in data centers, this no longer holds in an end-to-end architecture. In mobile networks, packet losses are more frequent because of the inherent characteristics of the wireless channel. Consequently, pDCell takes the different nature of packet losses into account and provides different reaction mechanisms.
In the data center, HTD (see § 3.2) ensures reliability by not dropping already admitted packets. When data packets are lost or not admitted at the Cu, credit packets, which include the ID of the next data packet to be sent, are issued again. This also limits out-of-order arrivals and the consequent reordering, which is detrimental for interactive applications.
In the wireless domain, pDCell relies on the existing reliability mechanisms of mobile networks, namely Automatic Repeat Request (ARQ) at the RLC layer and Hybrid ARQ (HARQ) at the MAC layer. Hence, mobile users do not propagate explicit per-packet feedback to the Cus, which would require either modifying the existing protocols through explicit signaling over uplink control channels or adopting a technology- and split-dependent eCPRI solution that estimates HARQ timeouts. Furthermore, through the combined effect of HTD and MRFS, mobile users with bad channel quality implicitly signal this status to the data center sources, which in turn reduce their sending rate so that a significant amount of traffic is not exposed to a high loss probability.
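The credit re-issue behavior described for the data center domain can be sketched as follows. This is a deliberately simplified illustration, assuming one outstanding credit at a time; the class `CreditReceiver` and the helper `run_flow` are hypothetical names and do not correspond to an actual pDCell or pHost interface.

```python
class CreditReceiver:
    """Receiver side of a credit-based exchange (illustrative only)."""

    def __init__(self, total_packets):
        self.total = total_packets
        self.next_id = 0        # ID of the next data packet expected in order
        self.credits_sent = 0

    def next_credit(self):
        """Return the packet ID carried by the next credit, or None when done."""
        if self.next_id >= self.total:
            return None
        self.credits_sent += 1
        return self.next_id

    def on_data(self, packet_id, admitted):
        """Advance only if the credited packet was admitted.

        A lost or non-admitted packet leaves next_id unchanged, so the next
        credit carries the same ID again and delivery stays in order.
        """
        if admitted and packet_id == self.next_id:
            self.next_id += 1


def run_flow(total_packets, outcomes):
    """Drive one flow; outcomes[i] tells whether the i-th transmission is admitted."""
    receiver = CreditReceiver(total_packets)
    results = iter(outcomes)
    delivered = []
    while (packet_id := receiver.next_credit()) is not None:
        admitted = next(results, True)  # assume success once outcomes run out
        receiver.on_data(packet_id, admitted)
        if admitted:
            delivered.append(packet_id)
    return delivered, receiver.credits_sent


# The second transmission (packet 1) is not admitted, so its credit is re-issued.
delivered, credits = run_flow(3, [True, False, True, True])
```

Here three packets are delivered in order using four credits, the extra credit being the re-issued one for packet 1.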

EVALUATION
This section evaluates pDCell by means of simulations in a MEC setting composed of an edge data center and a wireless domain. Our evaluation setup builds on the publicly available one used in previous works [6,17], extending it to incorporate the mobile network component.

Evaluation Methodology
Network Topology: The network topology spans from the DCN to the cellular network. The latter consists of LTE cells configured with a 20 MHz channel and 4 × 4 MIMO, providing up to 100 Mbps of user-plane data. A set of 100 LTE category 3 UEs is randomly located in the coverage area. The channel model and the UE link quality measurement report follow [15]. The functional split follows 3GPP functional split 2. The fronthaul interface is modeled as a serial line of 10 Gbps. For CQI reports, we employ the traces reported in [16] and assume a reporting interval of 5 ms, which is the typical value used in LTE to carry the CQI feedback over the Physical Uplink Shared Channel under fading conditions [3]. The CQI feedback is used by the Du to determine the modulation and coding scheme for data transmissions to the end user.
As in prior work [6,17], the DCN is a two-tier leaf-spine topology comprising 144 servers grouped into 9 racks. Unlike large cloud data centers hosting hundreds of thousands of servers, edge data centers for baseband processing are expected to be smaller [31]. Interconnectivity between the computing servers and the 4 core switches is provided by 9 top-of-rack (TOR) switches through a full mesh. As a result, the data center network is a non-oversubscribed topology where each TOR has 16 10 Gbps links towards the servers and 4 40 Gbps links towards the core switches. Both core and TOR switches implement packet spraying and are equipped with tiny per-port buffers of 36 KB each. The DCN hosts include the Cu and a distributed EPC, in compliance with § 2.1.1 and the latest technology developments [19].
The wireless scheduler decides which packets are transmitted to the Du and it works on the time scale of frequency-division duplex (FDD) sub-frames (1 ms).
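The topology figures quoted above (9 racks, 16 server links of 10 Gbps and 4 core links of 40 Gbps per TOR) can be cross-checked with a few lines of Python. The function name `tor_oversubscription` is introduced here purely for illustration.

```python
def tor_oversubscription(server_links, server_gbps, core_links, core_gbps):
    """Ratio of server-facing to core-facing capacity at one TOR switch."""
    downlink = server_links * server_gbps  # capacity towards the servers
    uplink = core_links * core_gbps        # capacity towards the core switches
    return downlink / uplink


racks = 9
servers = racks * 16                           # one server per 10 Gbps TOR link
ratio = tor_oversubscription(16, 10, 4, 40)    # 160 Gbps down vs. 160 Gbps up
```

The server count comes out at 144 and the ratio at 1, consistent with the description of the DCN as non-oversubscribed.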

Workloads:
The data center workloads used in the evaluation rely on flow size distributions based on measurements from a production data center with a topology similar to the one in our study and from a cluster running data-mining jobs [9]. Fig. 7 shows the CDF of the flow size. Most of the flows are small (≤ 10 KB), while a limited number of long flows contain most of the traffic (the largest flow is 3 MB and 1 GB for the production data center and the cluster, respectively). The flow size distribution is consistent with measurements of cellular network traffic [25]. To test the impact of background traffic (i.e., traffic generated by other applications hosted in the data center) on mobile traffic, we perform the following experiment. Similar to previous works [6,17], flow arrivals follow a Poisson process with a rate that ensures a given target network load, set by default to 0.6. As explained in § 3.4, pDCell inherits the incast avoidance of credit-exchange-based DCN transport protocols, i.e., no throughput degradation is perceived when multiple sources concurrently request to send data to the same receiver. To evaluate this property, we compare the CDF of the flow completion time under two different traffic matrices. In the non-incast one, 20% of the flows are destined to a single Cu server, while the remaining flows are exchanged between random hosts in the data center. In the incast traffic matrix, all sources send traffic to the same Cu server. Fig. 8 compares the completion time of the flows. The CDF shows negligible differences between the two traffic matrices. Consequently, in the remainder of the paper, we present results for a single Cu, which can be extrapolated to scenarios where several Cus are located in the same edge data center.
Performance Metrics: To evaluate pDCell, we adopt FCT as the main performance metric, i.e., the time between the first packet of a given flow being sent and the last one being received. In the rest of the paper the FCT is measured end-to-end and not only within the data center. Hence, lower values of the slowdown indicate better system performance. We assess and discuss the effect of the buffer size at the Cu and its implication on FCT. To evaluate the impact of the protocol in the wireless domain, we analyze the FCT of flows of users with different channel quality, e.g., users located near the edge of the cell versus users located near the base station.
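The Poisson flow arrival process used for the workloads can be sketched as follows. The arrival rate that yields a target load is obtained by dividing the offered bit rate by the mean flow size; the 10 Gbps link capacity matches the server links of the topology, whereas the mean flow size of 100 KB is a purely hypothetical value chosen for illustration, as are the function names.

```python
import random


def arrival_rate(load, link_bps, mean_flow_bytes):
    """Flows per second needed to offer `load` (0..1) on a link of `link_bps`."""
    return load * link_bps / (mean_flow_bytes * 8)


def poisson_arrivals(rate, n, seed=0):
    """Return n arrival times (seconds) of a Poisson process with the given rate."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential inter-arrival times
        times.append(t)
    return times


# Target load 0.6 on a 10 Gbps link; the mean flow size is an assumed value.
rate = arrival_rate(0.6, 10e9, 100_000)
times = poisson_arrivals(rate, 20_000)
```

With these assumed values the rate is 7500 flows per second; a different mean flow size, such as the one of the actual workload distributions, scales the rate inversely.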

Results
The first experiment compares the FCT of pDCell against solutions whose congestion control is not aware of the requirements of the wireless domain. Specifically, Fig. 9 compares the performance of pDCell with the different scheduling algorithms illustrated in § 3.6 against TCP New Reno and legacy pHost in the data center domain, both employing the PFS scheduler. We observe that pDCell, being aware of the wireless network status, significantly outperforms the other solutions regardless of the employed scheduling algorithm. Focusing on pDCell performance, the MRFS scheduling algorithm consistently reduces FCT with respect to PFS. In particular, small flows for users with good channel quality benefit the most from MRFS, as they are scheduled immediately in the next frame after arrival at the Cu. PFS instead treats all users and flows the same way, using the bandwidth of the cellular network sub-optimally and increasing the overall FCT for all flows.
The following experiment aims at verifying the effectiveness of channel quality awareness in the congestion control protocol and compares the performance of the wireless schedulers. Fig. 10(a) analyzes the effect of the CQI reporting. For the experiment, we compare the cases where the CQI report remains fixed during the simulation period and where it varies. In the former case, we assign to each user a CQI value uniformly distributed in the range [0, 15]. Compared to PFS, MRFS provides a significant advantage with trace-based CQI reports, as it adjusts the scheduling according to channel quality and prioritizes the scheduling of short flows. Hence, when the CQI is fixed, MRFS can only take decisions based on flow size and performs sub-optimally. PFS performance is similar with fixed and trace-based CQI, as its scheduling decisions do not take the flow characteristics into account. For the next experiment, we focus on small flows, i.e., the < 10 KB flows that represent nearly 80% of the traffic. Fig. 10(b) shows the FCT grouped according to the users' channel quality. Specifically, users with CQI values in the range 1 ≤ CQI ≤ 6 experience a bad channel quality and use QPSK modulation, users with 7 ≤ CQI < 10 use 16QAM, while users experiencing good channel quality with 10 ≤ CQI ≤ 15 use 64QAM. MRFS, with its awareness of channel quality statistics and remaining flow size, significantly outperforms PFS. In particular, the FCT achieved by users experiencing bad channel quality improves by 2 orders of magnitude compared with PFS. Interestingly, while the dependence between the FCT and the user CQI is linear for PFS, it is not for MRFS. The reason is that, by design, MRFS always chooses to schedule first the flows with minimum remaining size of users in favorable channel conditions.
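The grouping of users by channel quality used for Fig. 10(b) follows the standard LTE mapping of CQI indices to modulation schemes (indices 1 to 6 use QPSK, 7 to 9 use 16QAM and 10 to 15 use 64QAM, with index 0 meaning out of range). A minimal sketch of such a grouping is given below; the function names and the sample values are hypothetical and are not taken from the evaluation.

```python
from collections import defaultdict


def modulation_for_cqi(cqi):
    """Map an LTE CQI index (0..15) to its modulation scheme."""
    if not 0 <= cqi <= 15:
        raise ValueError("CQI index must be in 0..15")
    if cqi == 0:
        return "out of range"
    if cqi <= 6:
        return "QPSK"
    if cqi <= 9:
        return "16QAM"
    return "64QAM"


def group_fct_by_modulation(samples):
    """Group (cqi, fct_seconds) samples by the modulation class of the user."""
    groups = defaultdict(list)
    for cqi, fct in samples:
        groups[modulation_for_cqi(cqi)].append(fct)
    return dict(groups)


# Made-up samples, only to show the shape of the result.
groups = group_fct_by_modulation([(3, 0.20), (8, 0.05), (12, 0.01), (5, 0.30)])
```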
The next set of experiments measures the FCT performance of pDCell while varying the amount of buffer space at the Cu for both schedulers. To achieve ultra-low latency, it is essential that flows encounter small queuing delays along the path to the end user. Fig. 11(a) shows the percentage of non-admitted packets and compares the performance of pDCell and pHost for different buffer sizes. The former, by leveraging information on user-perceived channel quality, slows down application sources and thereby reduces the fraction of non-admitted packets that have to be rescheduled. In contrast, the sending rates of pHost application sources do not adapt to the user-perceived channel quality, and the sources keep pushing traffic to the Cu buffers, which fill up quickly and maintain a high percentage of non-admitted packets. This negatively impacts the FCT. Fig. 11(b) analyzes buffer utilization after setting the maximum Cu buffer size to 3 MB. Note that the Cu buffer is shared among all the users and flows, hence the 3 MB buffer corresponds to an average per-flow buffer of only two packets. pHost reaches full buffer occupancy more quickly than pDCell and then maintains it for the entire simulation period. After the initial period in which the queue fills up, pDCell achieves lower buffer utilization. Hence, it reduces the fraction of non-admitted packets and consequently improves FCT. Fig. 11(c) shows that FCT values remain very similar when the buffer size varies from 3 to 30 MB for both schedulers. These results show that the performance of pDCell remains very similar for a wide range of buffer sizes and that pDCell can operate with very small buffers, independently of the choice of scheduler.

PERSPECTIVES FOR IMPLEMENTATION IN 5G SYSTEMS
The system enhancements proposed and evaluated so far are applicable to MEC architectures for current LTE mobile networks and upcoming 5G systems with users attached to one Du at a time. Next, we show that only minimal modifications to the buffering architecture are needed to support multi-connectivity. 3GPP Release 12 introduced the concept of dual-connectivity: a UE can be simultaneously connected to two eNodeBs, a master (MeNB) and a secondary (SeNB). The MeNB handles both control and user plane, while the SeNB typically handles the user plane only, providing additional resources. For 5G systems, dual-connectivity evolved into multi-connectivity, where a UE can be simultaneously attached to an LTE eNodeB and a 5G next generation eNodeB (gNB). 3GPP Release 15 defines the concepts of 5G new radio (NR) and provides the basis for interworking between LTE and NR [29]. Leveraging the PDCP/RLC split, user-plane traffic can be split or duplicated at the aggregation point (the PDCP layer) in the Cu [27]. Then, traffic undergoes separate procedures in the lower-layer protocols, and UEs can individually benefit from separate scheduling, transmission on different frequencies, and reliability schemes.
The buffer management architecture in § 3.2 can support multi-connectivity with minimal modifications. The number of per-user queues at the Cu is no longer one per Du (see Fig. 4); instead, the RRC protocol allocates two queues to each active user, one for the MeNB and one for the SeNB. Since traffic for each Du is scheduled on a different output port of the shared memory switch and multiple per-user queues are allocated at the Cu, the update of virtual thresholds at the Du is driven by both the CQI from the user side and feedback from the Cu.
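The two-queues-per-user allocation can be sketched as a small data structure. The sketch below is illustrative only: the class and method names are hypothetical, and the rule used for splitting (send each packet to the currently shorter queue) is an assumption made here, not a policy stated for pDCell.

```python
from collections import deque


class MultiConnectivityQueues:
    """Per-user queues at the Cu, one per serving node (illustrative sketch)."""

    def __init__(self):
        self.queues = {}  # (user, leg) -> deque of packets

    def activate(self, user, legs=("MeNB", "SeNB")):
        """Allocate one queue per leg when the user becomes active."""
        for leg in legs:
            self.queues[(user, leg)] = deque()

    def enqueue(self, user, packet, mode="split"):
        """Place a PDCP packet on one leg (split) or on all legs (duplicate)."""
        legs = [leg for (u, leg) in self.queues if u == user]
        if mode == "duplicate":
            targets = legs
        else:
            # Assumed split rule: pick the leg with the shortest queue.
            targets = [min(legs, key=lambda leg: len(self.queues[(user, leg)]))]
        for leg in targets:
            self.queues[(user, leg)].append(packet)


cu = MultiConnectivityQueues()
cu.activate("ue1")
for seq in range(4):
    cu.enqueue("ue1", seq, mode="split")
cu.enqueue("ue1", 99, mode="duplicate")
```

In this run the four split packets alternate between the two legs and the duplicated packet is appended to both queues.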

CONCLUSION
MEC architectures leverage edge data centers to process core and baseband functions of mobile networks. While considerable effort has been devoted to the analysis of architectural requirements, transport protocol design has remained largely an open research area. In this work, we present pDCell, the first transport for edge data centers hosting MEC applications, which is specifically adapted to cover the whole domain from the data center to the end users attached to base stations. To this end, pDCell incorporates requirements of the wireless domain into the data center transport, including a novel congestion control protocol coupled with a wireless scheduler. Our performance evaluation highlights that pDCell significantly outperforms data center transport solutions that are agnostic of the wireless domain. Specifically, it vastly reduces FCT, which makes our solution a prerequisite for ultra-low latency applications.

Figure 1: Architectures of a mobile network operator: (a) conventional and (b) MEC realization, and (c) functional split options

inspiration from recent advances in data center transport designs and extends the data center transport to seamlessly incorporate the requirements of the mobile domain. The main novelty of pDCell is to couple the transport protocol with the scheduler of the air interface, residing within the Cu at the edge data center. pDCell vastly improves transport efficiency and can be deployed in a way that is transparent to services and applications by exposing a TCP-compliant socket application interface at the server and mobile terminal side. By coupling the transport congestion control with the wireless scheduler, pDCell can immediately slow down a source sending traffic to a user when radio channel conditions worsen, rather than waiting for slow TCP end-to-end congestion control to adapt. Therefore, buffer sizes in the transport network can be substantially smaller, reducing overall latency. Our main findings are as follows:

2.1.2 Evolved Packet Core (EPC). The complete set of core functions specified by the 3GPP is called the Evolved Packet Core, which jointly with an E-UTRAN (LTE RAN) forms the complete EPS (Evolved Packet System) or cellular network. The EPC consists of multiple components (see Fig. 1(a)), including the Packet Data Network Gateway (P-GW), Serving Gateway (S-GW), Mobility Management Entity (MME) and Policy and Charging Rules Function (PCRF). Each

Figure 2: Setup OpenEPC

Figure 3: Performance (in terms of FCT) of legacy transport protocols in the edge data center scenario

can be on the order of seconds as packets traverse and wait in long queues. Given the above results, we argue that a new transport needs to be designed specifically for mobile edge data centers. We base this design on data center transport rationale, but parameterize and adapt it to the requirements of the mobile network. The new transport must incorporate specific aspects of the wireless domain and be flexible enough to adapt to latency and bandwidth variation, while maintaining the benefits of the DCN transport such as low queue occupancy, low FCT and low losses.

Figure 4: Buffering architecture of the wireless domain: (a) each Cu is a shared memory switch with a number of output ports equal to the number of associated Dus; (b) each Du is a shared memory switch with a number of output ports equal to the number of associated users.

the queuing delay can grow up to 540 ms, which is highly detrimental to the performance of interactive applications [21]. To prevent such behaviour, we advocate a better use of buffer space and implement the following mechanism for pDCell. At the Cu level, one queue per user is necessary to control application sources in the data center. The allocation is performed by the Radio Resource Control (RRC) protocol when it detects that the status of a user is active. The queue size is defined via a virtual threshold that changes based on: i) the feedback from the corresponding Du, ii) the current load at Cu level. Since the sum of the virtual thresholds of per-user queues could exceed the processing capacity of the Cu, the buffer space is shared among all per-user queues and the total per-user queue size is adjusted to the processing capacity of the Cu. The number of output ports in this shared memory switch is equal to the number of associated Dus, where outgoing traffic of each port is shaped to the Du processing capacity (see Fig. 4(a)). If for the same user the value propagated from a Du virtual threshold is smaller than the one currently available in the Cu, the latter is decreased. Otherwise, the value of the virtual threshold increases if the current buffer occupancy allows. Similarly to prior work [21], the shared memory switch operates at PDCP level, although for scalability reasons per-user and not per-flow queues are allocated.

The natural choice for buffer management in a shared memory switch is a Longest-Queue-Drop (LQD) [4] policy that drops already admitted packets from the longest queue in case of congestion. By design, LQD provides some level of fairness among users in accessing the Cu processing capacity. Since the goal of the transport is to ensure reliability and minimize retransmissions, already admitted packets should not be dropped. Hence, the LQD behavior needs to be modified. When the Cu buffer is full, the virtual threshold of the currently longest queue Q is decreased by one unit without dropping packets in Q, and the incoming packet currently evaluated for admission is simply not admitted. Hence, we guarantee that each admitted packet has buffer space reserved. Furthermore, incoming packets belonging to a user whose queue occupancy is equal to its virtual threshold are also not admitted, regardless of the buffer occupancy. In § 3.7 we will show how to handle the retransmission of non-admitted packets to limit out-of-order arrivals and the consequent time-consuming reordering. We call this buffer management policy Highest-Threshold-Decrease (HTD).

Figure 5: Admission of an incoming packet at Cu. Difference between LQD and the proposed HTD before, during and after admission of a single packet. The size of the Cu buffer B is 4.

Fig. 5 highlights the difference between the LQD and the proposed HTD admission controls. Before the arrival of the new packet p, the virtual thresholds of the first and the second queues ($Q_i$ and $Q_j$) are 4 and 2, respectively. The actual queue occupancies of $Q_i$ and $Q_j$ are 3 and 1, respectively. Since the size of the buffer is 4, the newly arriving packet p for $Q_j$ causes congestion in the Cu buffer. LQD drops one already admitted packet from $Q_i$, as it is the longest queue, and admits p to $Q_j$. The virtual thresholds of both queues remain the same. In contrast, HTD never drops an already admitted packet; hence, HTD reduces the virtual threshold of $Q_i$ by one unit and prevents p from being admitted. Note that HTD reacts more slowly than LQD, but satisfies the requirement that admitted packets are never dropped (challenge (4)).

Since each mobile user is controlled by a single Du, each Du maintains a single queue per user (see Fig. 4(b)). In stationary regime, the queue status at t + 1 is given as follows:
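The difference between the LQD and HTD admission decisions in the Fig. 5 example can be reproduced with a short Python sketch. It is a simplified model, assuming queues are represented only by their occupancy and virtual threshold counters; the function names `lqd_admit` and `htd_admit` are hypothetical.

```python
def lqd_admit(occupancy, thresholds, queue, buffer_size):
    """Longest-Queue-Drop: on a full buffer, drop from the longest queue, then admit."""
    occupancy = dict(occupancy)
    if sum(occupancy.values()) >= buffer_size:
        longest = max(occupancy, key=occupancy.get)
        occupancy[longest] -= 1          # an already admitted packet is dropped
    occupancy[queue] += 1
    return occupancy, dict(thresholds), True


def htd_admit(occupancy, thresholds, queue, buffer_size):
    """HTD as described in the text: never drop an admitted packet.

    On a full buffer the virtual threshold of the longest queue is decreased
    by one and the incoming packet is not admitted. A packet is also refused
    when its own queue has reached its virtual threshold.
    """
    occupancy, thresholds = dict(occupancy), dict(thresholds)
    if sum(occupancy.values()) >= buffer_size:
        longest = max(occupancy, key=occupancy.get)
        thresholds[longest] = max(thresholds[longest] - 1, 0)
        return occupancy, thresholds, False
    if occupancy[queue] >= thresholds[queue]:
        return occupancy, thresholds, False
    occupancy[queue] += 1
    return occupancy, thresholds, True


# State of the Fig. 5 example: buffer size 4, packet p arrives for Qj.
occ = {"Qi": 3, "Qj": 1}
thr = {"Qi": 4, "Qj": 2}
lqd_result = lqd_admit(occ, thr, "Qj", buffer_size=4)
htd_result = htd_admit(occ, thr, "Qj", buffer_size=4)
```

In this state LQD ends with occupancies of 2 and 2 and unchanged thresholds, whereas HTD leaves the occupancies at 3 and 1, lowers the threshold of $Q_i$ from 4 to 3 and does not admit p.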

Figure 7: Distribution of flow sizes of data center traffic to the Cu

Figure 8: FCT with different traffic matrices

Figure 9: Distribution of FCT for pDCell, pHost and TCP

Figure 10: Analysis of FCT for users experiencing different channel quality

Figure 11: Analysis on buffer size at Cu

Table 1
The CQI report is directly related to the modulation and coding scheme and, in turn, to the transport block size used to allocate physical resource blocks for data transmission to the mobile user.