Design and Validation of an Open Source Cloud Native Mobile Network

Network technologies are embracing the cloud-native paradigm, following the current best practices in cloud computing. Cloud-native technologies might be applied to different types of network functions in a mobile network, but they are particularly relevant nowadays for core network functions, as the recent standard introduces the service-based architecture that matches modern cloud-native technologies such as Docker and Kubernetes. In parallel, a number of open source software initiatives already provide researchers and practitioners with usable software that implements the key functionality of a mobile network (for both LTE and 5G). These software solutions, however, are monolithic and not integrated into state-of-the-art cloud-native frameworks. In this article, we fill this gap by describing the implementation of a cloud-native mobile network, which supports channel emulation and provides an affordable and scalable way of testing orchestration algorithms with standardized VIM interfaces. Our experimental evaluation shows the applicability of our solution, which is released as open source and illustrates its flexibility.


Introduction
Following the recent needs for higher levels of network flexibility, programmability, and automation, the network function virtualization (NFV) [1] technology can be considered as one of the most important drivers for the 5G networks (and beyond). When compared to (legacy) 4G/LTE services, 5G network services exhibit very diverse characteristics; for example, they may operate in very short timescales, should be restricted to certain geographic locations, may require more integrated means of interacting with the network, and eventually may require higher degrees of flexibility. Because of this, proprietary hardware and telecommunication protocols, especially those related to network management, have often been considered a severe impediment to the development of novel technologies.
The characteristics of NFV cope well with the increasingly demanding requirements of 5G networks and beyond. Besides the advantages in terms of operational costs, due to the commoditization of the network through its softwarization, the extreme flexibility of this paradigm for (re-)configuration, scaling, and continuous deployment and integration matches the very diverse 5G network stakeholder ecosystem. With this new approach, it is possible to deploy services on commercial off-the-shelf (COTS) servers, traditionally used for cloud computing. This has enabled communication providers to multiplex services over the same physical infrastructure, share resources between the different functions, deploy on demand according to user load, and, consequently, reduce operational expenses (OPEX).
However, even though virtualization technologies based on virtual machines provide full hardware independence, they are based on hypervisors, an intermediate virtualization layer that is considered difficult to operate and manage, and that may introduce unacceptable overhead when dealing with fast timescale operations. Also, the deployment, configuration, and management of novel networking services are prone to a series of drawbacks, resulting from the tight dependency between software and hardware. This constitutes an obstacle to the effective deployment of NFV technology and has motivated significant efforts to migrate from these "static" physical network functions (PNFs) to "dynamic" virtual network functions (VNFs). As a result, operators have been reluctant to discard their investments in PNFs for a solution that struggles to yield the desired levels of resiliency [2].
To overcome the above issue, cloud network functions (CNFs) have gained a lot of attention, as they are already widely used in cloud computing environments and have thus been tested for their robustness. CNFs rely on a very thin virtualization layer and have a minimal footprint on the infrastructure, allowing the implementation of modern software architectures such as microservices [3] and providing a very high degree of configurability. In fact, CNFs are attracting a lot of interest: on one hand, standardization bodies such as the European Telecommunications Standards Institute (ETSI) have adopted orchestration frameworks tailored to this paradigm [4], while on the other hand, the open source Cloud Native Computing Foundation (CNCF) [5] has already defined a roadmap composed of technologies, tools, and patterns for containerization, CI/CD, observability, messaging, distributed databases and, most importantly, networking. However, as CNFs are just arriving in the telecommunications domain, they lack the maturity levels required for their adoption in production environments (beyond proprietary solutions). Since the use of CNFs in mobile networking requires overcoming many challenges [6], a democratization of the corresponding software components (e.g., released as open source) shall foster the development and testing of the required solutions.
Motivated by the above, in this article we present an end-to-end mobile network with channel emulation capabilities, monitored and managed by a CNF orchestrator, which is released as open source. Our contributions are:
• We provide researchers and practitioners with an open source scalable solution to emulate a complete 5G mobile network, building on well-known cloud-native tools.
• We develop the required modules and tools to support the network operation, namely:
  - A module to support the channel emulation, which is particularly useful for those scenarios where specialized hardware is not available, or where repeatable experiments with varying channel conditions are to be performed
  - Data shippers to monitor the different network functions, supporting the implementation of autonomous network management decisions
  - A collection of configuration blueprints (i.e., Helm charts) to automate the orchestration processes
• We present two use cases to illustrate the features of our solution and to validate the behavior of the tools and modules developed.
As mentioned, interested practitioners can experiment with our proposed cloud-native mobile network, as it is available as open source (including the code required to deploy the use cases). For consistency reasons we split the codebase into core and orchestration (https://github.com/kaposnick/k8s-open5gs), and access (https://github.com/kaposnick/srslte).
The rest of the article is structured as follows. We present the design of the solution, introducing the different building blocks required to build a cloud-native mobile network; we describe the procedures to deploy the network; we introduce the experiments we performed to benchmark the solution and assess its performance; we discuss related research work and initiatives; and finally, we conclude the article.

Design of a Cloud-Native Ecosystem
The softwarization of mobile network functions has allowed network operators to reduce their OPEX by using open software, and has also enabled researchers and practitioners to experiment with new network functionalities. This software-driven transition is being accelerated by the introduction of cloud-native architectures [7], which greatly increase the flexibility and scalability of the software. In what follows, we introduce the different components of our cloud-native mobile network design, several of them chosen among the set of open source solutions that have become available during the last few years (we review them later).

Overview
The different components of our design are illustrated in Fig. 1. For orchestration, we chose Kubernetes, an open source system for deploying and managing containerized applications. For the core network implementation, we selected Open5Gs, motivated by its adaptability to the cloud native paradigm. For the user equipment (UE) and the access network, we rely on srsRAN, open source software that provides a full mobile network stack implementation for those elements. Furthermore, the srsRAN community is very active, and its contributors keep intensively improving the product to keep up with the latest Third Generation Partnership Project (3GPP) standards. To simplify the deployment and support repeatability, we developed a simple channel emulator, which enables deploying a fully programmable end-to-end network without requiring expensive hardware. Finally, we chose Prometheus, one of the monitoring solutions promoted by the Cloud Native Computing Foundation (CNCF) [5], as the monitoring framework to collect the status of the whole system. In what follows, we detail these components.

Management and Orchestration
Kubernetes (also known as k8s) is a container orchestration platform that has been widely adopted in the cloud computing domain for more than five years. A typical cloud-native application consists of a set of container images (e.g., Docker images) that are deployed and managed by Kubernetes using configuration files that describe the functionality and the services they provide. Its scaling and redundancy capabilities have already been demonstrated in production environments, and it is envisioned that telco operators will rely on Kubernetes for next-generation networks, motivated by the need to make more efficient use of the heterogeneous virtualized infrastructure (which spans central and edge data centers). To handle the additional management complexity, a huge ecosystem of open source projects that extend the possible operational tasks has been created under the umbrella of the CNCF [5].

Core Network Functions
Open5Gs is a very popular open source implementation of a mobile network core. Written in C, it stands as a reference among researchers and mobile telecommunications practitioners for experimentation and future enhancements. Currently supporting up to 3GPP 5G Release 16, it contains the most important components of the 5G core (5GC) and 4G evolved packet core (EPC) with control-user plane separation (CUPS) [8], meaning it can operate in both 5G non-standalone (NSA) and standalone (SA) modes, as it can serve both 4G E-UTRAN NodeBs (eNBs) and 5G next generation NodeBs (gNBs). Its straightforward build procedure makes its deployment in small-scale private networks very easy, while its modular architecture makes it a natural candidate for running as microservices in cloud-native environments powered by Kubernetes, which is the focus of this article. As illustrated in Fig. 2, Open5Gs pods are deployed over the virtual infrastructure, while srsENB runs directly on top of Docker containers.

RAN Functions
srsRAN [9], formerly srsLTE, is another open source software framework, developed by Software Radio Systems, that provides the functionality from the PHY up to the radio resource control (RRC) layer for eNB/gNBs, while also supporting 4G/5G UEs. It is written in C/C++, and its configuration parameters cover a wide range of possible base station and UE configurations. Regarding the RF interface, its contributors have developed drivers for various commercial hardware RF front-ends such as USRP, Soapy SDR, and BladeRF.
Additionally, srsRAN provides a software RF front-end based on ZeroMQ, an open source message queueing library written in C++. When using this driver, the I/Q baseband symbols transmitted between UE and base station are transferred over one of several transport methods, such as inter-process communication (IPC) or TCP sockets. Choosing this driver avoids the need for deep expertise in RF channel configuration and lowers the entry barrier for researchers who want to emulate a radio access network (RAN) environment but for whom the RF channel is not the main area of interest, or who would be reluctant to invest in actual hardware transceivers. We next discuss how we take advantage of ZeroMQ to implement a channel emulation model.

Channel Emulation: The GRC Broker
As mentioned, the use of ZeroMQ enables running the network in a fully softwarized way. Furthermore, it also enables the emulation of complex topologies via programming (e.g., arbitrarily complex topologies can be created by dynamically connecting endpoints, as is done in large-scale emulators using hardware in the loop [10]). The main challenge when emulating complex mobility scenarios lies in the handling of I/Q symbols (e.g., sending low-power symbols from the eNB to trigger a Handover Request from the UE).
Following the srsRAN documentation, we utilized the GNU Radio Companion (GRC), which is another open source project mainly used for software-defined radio (SDR) and which, among others, contains modules for the ZeroMQ library. We coded in Python a GRC Broker, located between the UE and the cells of the eNB(s) implementing the RF interface, as illustrated in Fig. 3. This module intercepts the transmitted I/Q symbols and performs operations on them to emulate the channel condition for the two traffic directions:
• In the downlink direction, in order to simulate the distance between the UE and the cells, we made use of multiplier blocks followed by an adder. Multiplier blocks multiply the cells' transmitted time-domain samples by a constant gain value in the range [0, 1], adjusting the UE's perception of the cells' signal strength. The adder block simply performs the superposition of the modified samples, as would be the case in a real environment.
• In the uplink direction, we used a throttle block that re-transmits the I/Q symbols toward the cells of the eNB(s).
We leverage this setup to create different evaluation scenarios (discussed later) and leave as future work the implementation of additional features into the GRC Broker.
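The downlink path described above (one multiplier per cell followed by an adder) can be sketched in plain Python. This is only an illustration of the arithmetic, not the GRC Broker itself, which is built from GNU Radio blocks; the function name mix_downlink and the sample values are hypothetical.

```python
from typing import Sequence


def mix_downlink(cell_samples: Sequence[Sequence[complex]],
                 gains: Sequence[float]) -> list[complex]:
    """Scale each cell's I/Q stream by its gain and superpose the results."""
    if len(cell_samples) != len(gains):
        raise ValueError("one gain per cell is required")
    if any(not 0.0 <= g <= 1.0 for g in gains):
        raise ValueError("gains must lie in [0, 1]")
    # zip(*...) walks the streams sample by sample (truncating to the shortest).
    return [
        sum(g * s for g, s in zip(gains, samples_at_t))
        for samples_at_t in zip(*cell_samples)
    ]


# Hypothetical I/Q samples from two cells; the UE is "closer" to cell A.
cell_a = [1 + 1j, 2 + 0j, 0 - 1j]
cell_b = [0 + 2j, 1 + 1j, 4 + 0j]
mixed = mix_downlink([cell_a, cell_b], gains=[0.5, 0.25])
print(mixed)
```

Lowering one gain while raising the other changes the relative signal strength the UE perceives from each cell, which is the mechanism the mobility scenario relies on.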

Monitoring
In order to have a consistent view of the mobile network's current status in a unified platform, we deployed the cloud-native version of Prometheus (kube-prometheus-stack) in the same Kubernetes cluster used for the Open5Gs. In this section, we describe the steps taken in order to monitor both core and RAN through Prometheus.
Core Network: When kube-prometheus-stack is deployed in a Kubernetes cluster, by default it installs exporters in all the master and worker nodes, which collect and expose metrics like CPU usage, RAM, networking status, and IOPS at different granularity levels (node, pod, container). By exporting those metrics, we can monitor the health of the virtual infrastructure running the different core network functions, as depicted in Fig. 2. The Prometheus operator pods set those exporters as targets and poll them at fixed intervals over HTTP on a well-known path (/metrics). Those metrics are directly accessible through either the Prometheus WebUI or Grafana, a visualization tool, using PromQL, a functional query language.
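As an illustration of how such metrics can be retrieved programmatically, the sketch below builds a request URL for the Prometheus HTTP API instant-query endpoint (/api/v1/query) with a PromQL expression for per-pod CPU usage. The base URL and the namespace "open5gs" are assumptions made for the example; container_cpu_usage_seconds_total is the cumulative CPU counter normally exposed by the container-level exporter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def cpu_query_url(base_url: str, namespace: str, window: str = "1m") -> str:
    """Build an instant-query URL for the per-pod CPU usage rate."""
    promql = (
        "sum by (pod) (rate(container_cpu_usage_seconds_total"
        f'{{namespace="{namespace}"}}[{window}]))'
    )
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical Prometheus address and namespace.
url = cpu_query_url("http://prometheus.example:9090/", "open5gs")
decoded_query = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded_query)
```

The same expression can be pasted into the Prometheus WebUI or used as a Grafana panel query.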
RAN: srsRAN produces an array of metrics for the lower layers of the networking stack that can be exported to the console or to a text file. However, srsRAN does not have an agent acting as a metrics exporter toward third-party software. Therefore, we modified the source code of the srsENB to develop such an exporter and integrate it into the final binary. The first step in this direction was to integrate the lighttpd project, a lightweight open source HTTP server written in C. Then we developed the REST application programming interface (API) through which the metrics, in key-value pair format, are exported to the Prometheus operator. Afterward, we developed Kubernetes manifests that comprise a pod, whose role is to proxy the HTTP requests received from the Prometheus operator toward the end srsENB process. Those manifests are deployed to the Kubernetes cluster with the IP address of the srsENB as an input parameter.
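To make the "key-value pair format" concrete, the following sketch renders a dictionary of metrics in the Prometheus text exposition format, which is what a scrape of an exporter endpoint returns. It is not the exporter integrated into srsENB (which is written in C/C++); the metric names and the enb_id label are hypothetical.

```python
def render_metrics(metrics: dict[str, float], labels: dict[str, str]) -> str:
    """Render gauges in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical MAC-layer bit rates reported by one eNB.
body = render_metrics(
    {"enb_dl_brate_bps": 1500000.0, "enb_ul_brate_bps": 250000.0},
    {"enb_id": "0x19B"},
)
print(body, end="")
```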

Last, leveraging the discovery capabilities of Prometheus, we deployed a ServiceMonitor configuration resource, which automatically discovers the "proxy" pods existing in the cluster and sets them as Prometheus targets. Thus, for every srsENB process we instantiate, an auto-discovery service is performed, and once it is completed a few seconds later, the eNB starts getting polled (or scraped, using the Prometheus terminology).
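A ServiceMonitor of this kind can be sketched as a Python dictionary serialized to JSON, which kubectl accepts in place of YAML. The resource name, label, port name, and scrape interval are assumptions for illustration. Note that a ServiceMonitor selects Service objects by label, so this sketch assumes each proxy pod is fronted by a Service carrying the label shown.

```python
import json


def service_monitor(name: str, match_labels: dict[str, str],
                    port_name: str, interval: str = "15s") -> dict:
    """Build a ServiceMonitor that scrapes /metrics on matching Services."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": match_labels},
            "endpoints": [
                {"port": port_name, "path": "/metrics", "interval": interval}
            ],
        },
    }


# Hypothetical label attached to every eNB proxy Service.
monitor = service_monitor("enb-proxies", {"app": "enb-metrics-proxy"}, "http")
print(json.dumps(monitor, indent=2))
```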

Building and Deploying the Network
One of the main advantages of cloud-native solutions is their extreme flexibility thanks to the easy templating and parametrization of the software components. In the following, we discuss how we use two key components of our setup, manifests and Helm charts, to perform the mobile network configuration.

Kubernetes Manifests
To be able to manage very heterogeneous software components, Kubernetes needs to abstract the specific parameters associated with them. To this aim, so-called manifests are used to define the information model (i.e., the parameters associated with each function) for each network function. Thus, we had to provide the Kubernetes resource files for each software component in the network architecture. Open5Gs provides software implementing the network functions (NFs) for both EPC and 5GC under the same umbrella, as depicted in Fig. 4:
• Control-plane functions: MME, HSS, PCRF, SGW-C, PGW-C (for the 4G/LTE); AMF, SMF, UDM, UDR, NRF, NSSF, AUSF, PCF, BSF (for the 5GC)
• User-plane functions: SGW-U, PGW-U (for the 4G/LTE), and UPF (for the 5GC)
Hence, manifests include information such as the exposed ports, internal interconnections, and software components to be deployed by Kubernetes. Each NF is instantiated within the context of a StatefulSet controller instead of a standalone Pod so that it is appropriately managed in case of termination. Additionally, to support realistic use case scenarios, where control plane and user plane NFs should be closely located but in distinct nodes, we set the nodeSelector attribute to mobile-core: control for control plane NFs and mobile-core: user for user plane NFs. This attribute is enforced by Kubernetes at the scheduling stage.
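The structure of such a manifest can be sketched as follows: a StatefulSet for one NF with the nodeSelector described above, built as a Python dictionary and printed as JSON. The container image name is a placeholder, and the app label and single replica are assumptions; the actual manifests in the repository may differ.

```python
import json


def nf_statefulset(name: str, image: str, plane: str) -> dict:
    """Build a StatefulSet for one NF, pinned to control or user plane nodes."""
    if plane not in ("control", "user"):
        raise ValueError("plane must be 'control' or 'user'")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Scheduler only places the pod on nodes with this label.
                    "nodeSelector": {"mobile-core": plane},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# Placeholder image reference, not the one used in the article.
upf = nf_statefulset("upf", "registry.example/open5gs-upf:latest", "user")
print(json.dumps(upf, indent=2))
```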
For the inter-pod communication, we use the ClusterIP service type so that the corresponding service pods are accessible only within the Kubernetes cluster (preventing external access), while for the interfaces exposed toward the RAN (i.e., S1-MME/N2 by the MME/AMF, S1-U/N3 by the SGW-U/UPF), we use the LoadBalancer service type. This type of service is used in cloud environments to enable external connectivity toward the cluster's pods, and typically uses a load balancer at the frontier of the cloud data center (DC).
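The two service types can be contrasted with a small sketch that builds Service manifests as Python dictionaries. The N2 and N3 entries use the standard NGAP (SCTP 38412) and GTP-U (UDP 2152) ports; the internal SMF port, the port names, and the app selector labels are assumptions for illustration.

```python
import json


def nf_service(name: str, port_name: str, port: int,
               protocol: str, external: bool) -> dict:
    """Build a Service: LoadBalancer if RAN-facing, ClusterIP otherwise."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer" if external else "ClusterIP",
            "selector": {"app": name},
            "ports": [{"name": port_name, "port": port, "protocol": protocol}],
        },
    }


services = [
    nf_service("amf", "n2-ngap", 38412, "SCTP", external=True),
    nf_service("upf", "n3-gtpu", 2152, "UDP", external=True),
    # Assumed internal port for the service-based interface.
    nf_service("smf", "sbi", 7777, "TCP", external=False),
]
for svc in services:
    print(json.dumps(svc))
```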

Helm Charts
The deployment stage of a cloud computing service (a mobile network core, in this case) has proven to be prone to human errors for a number of reasons:
• A considerable number of Kubernetes manifests (3 manifests per NF) have to be applied for every single iteration of an experiment. During the de-provisioning of the service, the reverse procedure has to be executed, taking care to de-instantiate all the NFs.
• A few manifest attributes have to be provided at the initial stage, as they depend on the current state of the environment. Such attributes are the IP addresses that must be accessible by the RAN. Defining them statically, as would be done in a production environment such as an operator's DC, limits the flexibility of the deployment in a local environment, as they depend on the network interface card (NIC) IPs, which are obtained through DHCP and are renewed upon reboot. Those IPs are used for setting the LoadBalancer service configuration as described above and the IP address that the UPF advertises to the SMF (and consequently to the AMF and the gNB) when addressing a service request.
Thus, as is already done by major cloud computing services, we employ Helm charts to automate those configuration tasks by storing parameterized Kubernetes manifests along with a single configuration file (containing all the global parameters) in a centralized location (this chart can be uploaded to an online registry for accessibility reasons, as in the case of Docker images). Then Helm automatically generates the equivalent Kubernetes manifests and afterward provides them to the Kubernetes API server. This results in a single command executed for (de)provisioning of the entire core network, thus greatly improving the automation.
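The single-command (de)provisioning can be sketched by assembling the helm install and helm uninstall invocations, with the environment-dependent IP addresses passed as --set overrides. The release name, chart path, and value keys below are hypothetical and do not necessarily match the chart in the repository; the commands are only built and printed, not executed.

```python
import shlex


def helm_install_cmd(release: str, chart: str, values: dict[str, str]) -> list[str]:
    """Build a 'helm install' command with per-environment overrides."""
    cmd = ["helm", "install", release, chart]
    for key, value in sorted(values.items()):
        cmd += ["--set", f"{key}={value}"]
    return cmd


def helm_uninstall_cmd(release: str) -> list[str]:
    """Build the matching 'helm uninstall' command."""
    return ["helm", "uninstall", release]


# Hypothetical value keys; 192.0.2.0/24 is a documentation-only address range.
install = helm_install_cmd(
    "open5gs",
    "./open5gs-chart",
    {"upf.advertiseIP": "192.0.2.11", "amf.loadBalancerIP": "192.0.2.10"},
)
print(shlex.join(install))
print(shlex.join(helm_uninstall_cmd("open5gs")))
```

In a DHCP-based local setup, the values dictionary would be filled with the addresses currently assigned to the NICs before each deployment.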

Experiments
We ran the Kubernetes cluster on a KVM cloud platform using Ubuntu 18.04.6. We then used Ubuntu 20.04.1 LTS as the image of all the nodes of the cluster (one master and two workers, as illustrated in Fig. 2), with Docker 20.10.8 as the container runtime, as well as kubelet and kubectl. The control plane was set up using the kubeadm utility, with Flannel as the container networking interface (CNI). The network functions were connected to an internal network for control plane and inter-pod communication, and NAT forwarding was enabled for management purposes through SSH. The two worker nodes had an additional network adapter of type Bridged Adapter attached to another Ethernet adapter of the host machine, exposing the N2 and N3 interfaces toward the gNB(s). The two worker nodes were also labeled with mobile-core: control and mobile-core: user, which are taken into account during the deployment stage of the CNFs. The RAN (srsENB/srsUE) was set up using Docker on a commercial laptop with a 4-core CPU at 1.8 GHz and 16 GB of RAM. Each srsUE process runs in a separate network namespace in order to have distinct routing tables.
In what follows, we deploy two different scenarios to illustrate the use of our platform and validate the adequate behavior of the developed modules. For each considered scenario, the Open5Gs Helm chart had to be deployed from scratch to fully decommission software components from previous runs. Also, the UDR database had to be populated with the UEs' information for authentication purposes. At each chart deployment, the core network reached a healthy state after approximately 2 minutes, with several pod restarts due to unfulfilled interdependencies: for instance, the UDM container has to wait for the UDR database container to start up; otherwise, it is restarted. To check the end-to-end state of the core network, we access the Prometheus REST API (which exposes the retrieved metrics, as discussed before). We then analyze the performance of the system for the following scenarios. We remark that, in addition to these two scenarios, the platform can be used for several other research activities, such as prototyping new radio schedulers, developing novel fine-grained orchestration algorithms involving CPU pinning or thread-level quota, and building on the GRC broker for the emulation of more complex scenarios with multiple users.
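One possible way to check the state of the core network through the Prometheus REST API is sketched below: it parses the JSON body of an instant query and lists the pods that are not yet ready. The sketch assumes the query kube_pod_status_ready{condition="true"} restricted to the core network namespace (a kube-state-metrics series whose value is 1 for ready pods); the pod names and the sample response are invented for the example.

```python
import json


def not_ready_pods(response_body: str) -> list[str]:
    """Return the pods whose readiness sample is not 1 in an instant query."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query failed")
    return sorted(
        sample["metric"].get("pod", "<unknown>")
        for sample in payload["data"]["result"]
        # Each sample value is a [timestamp, "value-as-string"] pair.
        if float(sample["value"][1]) != 1.0
    )


# Invented response: the UDM is still restarting while waiting for the UDR.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "open5gs-udr-0"}, "value": [1700000000.0, "1"]},
            {"metric": {"pod": "open5gs-udm-0"}, "value": [1700000000.0, "0"]},
            {"metric": {"pod": "open5gs-amf-0"}, "value": [1700000000.0, "1"]},
        ],
    },
})
pending = not_ready_pods(sample_body)
print(pending)
```

Polling until the returned list is empty gives a simple readiness gate before starting an experiment.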

Scenario I: Dynamic Orchestration
In this case, the objective is to stress the user plane network functions. We start the experiment by deploying one mobile network instance, consisting of one 5GC and one gNB with one associated UE, and configure the UE to run an iperf test toward an iperf server connected to the UPF through the N6 interface. We measure the amount of CPU resources consumed by the different components via the Prometheus exporter, with the results depicted in Fig. 5. According to the results, the user plane function occupies most of the CPU, while the rest of the network functions have an almost negligible footprint.
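CPU consumption is typically exposed as a cumulative counter of CPU-seconds, so the usage over an interval is the counter difference divided by the elapsed time. The sketch below shows this arithmetic on invented samples; it ignores counter resets, which the PromQL rate() function handles, and it is not taken from the article's tooling.

```python
def cpu_cores_used(samples: list[tuple[float, float]]) -> float:
    """Average cores used between the first and last (time, cpu-seconds) sample."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (c1 - c0) / (t1 - t0)


# Invented counters sampled 60 s apart for two network functions.
usage = {
    "upf": cpu_cores_used([(0.0, 100.0), (60.0, 145.0)]),
    "amf": cpu_cores_used([(0.0, 10.0), (60.0, 13.0)]),
}
busiest = max(usage, key=usage.get)
print(usage, busiest)
```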
To illustrate the dynamic orchestration capabilities of our system, which may be leveraged by, for example, artificial intelligence (AI) algorithms in combination with the monitoring capabilities, we then trigger the instantiation of another network at approximately 1000 s. While this procedure is done manually for this experiment, in an autonomous environment the data shippers provide all the needed information about, for example, the network function load, to support this operation. As the figure illustrates, the two instances create a [...]
We start by deploying a mobile network instance with one EPC and two eNBs, where each eNB generates two cells. We then deploy two UEs, both associated with the same eNB but in a different cell, since the current ZeroMQ-based implementation limits the number of UEs per cell to one. As in the previous case, each UE generates bidirectional iperf traffic toward a server connected to the PGW-U.
From the above scenario, we trigger the handovers by taking advantage of the GRC broker to emulate a change in the channel conditions. More specifically, we program the GRC broker to emulate the movement of UEs between cells by judiciously changing the received power from the UEs to each eNB (in the uplink) and from the eNBs to the UE (in the downlink). This is done by modifying the gains that multiply the samples from the eNBs and throttling the samples from the UEs (Fig. 3), achieving the effect of slowly moving away from one eNB and getting closer to the other one. The inverse procedure is applied to emulate the movement back to the previous cell, and the whole process is repeated indefinitely.
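The back-and-forth movement can be sketched as a periodic gain schedule for the two downlink multipliers. The linear (triangular) ramp and the period used below are assumptions chosen for simplicity; the article does not specify the exact gain profile, and real path loss is not linear in distance.

```python
def cell_gains(t: float, period: float) -> tuple[float, float]:
    """Gains (cell A, cell B): at A when t=0, at B when t=period/2, then back."""
    if period <= 0:
        raise ValueError("period must be positive")
    phase = (t % period) / period          # position in the cycle, in [0, 1)
    gain_b = 1.0 - abs(2.0 * phase - 1.0)  # ramps 0 -> 1 -> 0 over one period
    return 1.0 - gain_b, gain_b


# Assumed 120 s round trip between the two cells.
for t in (0.0, 30.0, 60.0, 90.0, 120.0):
    print(t, cell_gains(t, period=120.0))
```

Feeding these two values to the per-cell multipliers on a timer makes the UE see one cell fade while the other strengthens, which should eventually lead it to report measurements that trigger a handover.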
We depict in Fig. 6 the resulting downlink (top) and uplink (bottom) throughput at the medium access control (MAC) layer of the two eNBs. First, it is worth noting the relatively small throughput values, which are caused by the emulation of the channel that "enlarges" the duration of the TTIs (thus decreasing the actual throughput). Second, the figure clearly illustrates how the throughput moves between eNBs, with approximately only one eNB sending and receiving traffic at a time. These results showcase the flexibility of the implementation of the GRC broker, which can be used and extended to support more complex radio scenarios, and be used as a sandbox for the development of mobility-triggered mechanisms (e.g., follow-the-load orchestration) in a controlled environment.

Related Work
The rapid, easy, and cost-efficient deployment of end-to-end mobile networks using open source software has been studied in various works, and different implementations have been published, mostly for large deployments. Works such as [11,12] leverage the Open Air Interface (OAI) [13] as the RAN solution, using a hardware-based radio interface (i.e., USRPs) as the front-end, thus preventing the evaluation of more complex channel conditions and mobility scenarios (note that channel emulation is available with the latest version of OAI). Additionally, the core network is provided and orchestrated as a single container without management capabilities at NF granularity and without providing control and user plane separation. In [14], certain extensions are proposed in the Kubernetes resources' specification to support NFV loads. In general, the experimental work in this field focuses on the RAN component: for instance, [15] leverages a network slicing aware modified version of srsRAN to provide a slice-aware orchestration solution. Still, as this setup is also leveraging real radio front-ends, the extensibility to more UEs or different radio conditions is limited. On the other hand, large-scale radio testbeds like Colosseum [10] have less focus on the core part, which is a fundamental aspect when testing end-to-end solutions. Hence, to the best of the authors' knowledge, there has not been any open source effort for end-to-end mobile network stack emulation using cloud technologies, exclusively conventional hosts, and no RF hardware front-end, which would be ideal for small experimental deployments for research and educational purposes.

Conclusion
Telecommunication operators' infrastructure is currently being migrated to cloud environments, with statically deployed PNFs being replaced by dynamic and flexible CNFs. The observability of heterogeneous network components is crucial to achieving predictability, fast fault detection, and scalability. With these new concepts being adopted, networking researchers need the ability to deploy end-to-end mobile networks for experimentation and development in an easy and affordable manner. In this article, by leveraging existing open source software projects and developing new modules and tools, we have designed and developed an end-to-end cloud-native open source 4G/5G mobile network. This solution is a cost-efficient tool for the development and validation of novel control and orchestration algorithms since it does not require specialized hardware. We have illustrated the use of our solution with two use cases, leveraging the data shippers, the automatic deployment, and the channel emulation capabilities that have been developed.