Reliability comparisons of mobile network operators: an experimental case study from a crowdsourced dataset

Abstract: It is of great interest for Mobile Network Operators (MNOs) to know how their network infrastructure performs in different geographical regions of their operating country relative to their horizontal competitors. However, traditional network monitoring and measurement methods rely on a limited number of measurement points, which is insufficient for detailed analysis and expensive to scale with an internal workforce. On the other hand, the abundance of crowdsourced content can open up various unforeseen opportunities for MNOs to cope with this scaling problem. This paper investigates end-to-end reliability and packet loss (PL) performance comparisons of MNOs using a previously collected, real-world proprietary crowdsourced dataset gathered through a user application over a 13-month period in Turkey. More particularly, a unified crowdsourced data-aided statistical MNO comparison framework is proposed, consisting of data collection and network performance analysis steps. Our results are statistically supported by confidence interval analysis for the mean difference of PL ratios and reliability levels of MNOs, using statistical analysis for unpaired observations. The network performance results indicate that significant performance differences exist between MNOs across different regions of the country. Moreover, we observe that the overall ordered list of MNOs' comparative reliability performance does not change when the PL and latency requirements vary.


Introduction
Almost all Mobile Network Operators (MNOs) are heavily investing in network infrastructure where necessary changes and technological upgrades have a clear impact on their daily end-to-end network performances.
However, providing nationwide coverage with high-quality Key Performance Indicators (KPIs) is not an easy task. Additionally, maintaining better KPI guarantees in some areas means that other parts of the country (e.g., urban, suburban, or rural areas) will not easily achieve the same excellence. As a matter of fact, services in large geographical areas have different network demands and requirements, and various technical approaches (e.g., statistics, machine learning) need to be exploited for adequate provisioning in urban, suburban, and rural areas of the country. For example, to support an acceptable Internet connection, services such as voice over IP (VoIP), live streaming (e.g., YouTube or Netflix), 4K Ultra or full HD video, online video gaming, and instant messaging should be enabled for all MNOs depending on geographical location as well as users' periodic demands at certain times of the day [1].

* Correspondence: engin.zeydan@cttc.cat. This work is licensed under a Creative Commons Attribution 4.0 International License.
MNOs' performances over the mobile infrastructure have changed dramatically in recent years, and a close race exists between MNOs to upgrade to an advanced mobile infrastructure. The introduction of 4G and Long Term Evolution Advanced (LTE-A) services has increased subscribers' expectations in terms of data rates, latency, and reliability. The end-to-end network performances of different MNOs also differ based on their different investment and management strategies. For example, for some MNOs deploying massive Internet of Things (IoT) devices, highly reliable data transmission with a low block error rate (BLER) may be of interest, whereas for other MNOs interested in providing communication for time-critical applications, low latency can be critical. Moreover, these KPI performances can differ based on the target metrics and requirements of different geographical locations. Some of the important KPIs studied in this paper for comparing MNO performances as well as defining target requirements are as follows: (i) Latency is the Round Trip Time (RTT) a packet takes to travel from the user equipment (UE) to the application server and back to the UE; (ii) Packet Loss (PL) is defined as lost data divided by total sent data, i.e. the ratio of data packets that do not arrive at the intended destination; (iii) Reliability is defined as the average percentage of observations that satisfy the given requirements, namely that β bytes of data are transmitted with a BLER of less than λ and within a latency of less than γ ms. In other words, reliability measures the provision of a high level of correct message transmission within a latency bound. In the present paper, we adopt the reliability definition of the 3rd Generation Partnership Project (3GPP) [2], which is one of the KPIs according to 3GPP's Fifth Generation (5G) requirements for the Ultra-Reliable and Low-Latency Communications (URLLC) use case [3].
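The reliability KPI described in (iii) can be sketched as a simple filter-and-count over per-test observations. The field names below (`pl_ratio`, `latency_ms`) are illustrative, not the actual schema of the dataset:

```python
# Sketch of the 3GPP-style reliability KPI: the percentage of tests that
# deliver the data with a PL ratio below lambda and a latency below gamma.

def reliability(tests, pl_max, latency_max_ms):
    """Percentage of tests meeting both the PL and the latency requirement."""
    if not tests:
        return 0.0
    ok = sum(1 for t in tests
             if t["pl_ratio"] < pl_max and t["latency_ms"] < latency_max_ms)
    return 100.0 * ok / len(tests)

tests = [
    {"pl_ratio": 0.001, "latency_ms": 25.0},
    {"pl_ratio": 0.020, "latency_ms": 22.0},  # fails the PL requirement
    {"pl_ratio": 0.002, "latency_ms": 40.0},  # fails the latency requirement
    {"pl_ratio": 0.005, "latency_ms": 27.0},
]
print(reliability(tests, pl_max=0.007, latency_max_ms=28.279))  # 50.0
```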
On the other hand, crowdsourcing data from the masses has gained steam in the last few years as a desired source of insights [4]. Crowdsourced data differs from traditional data collection procedures in many aspects, including cost efficiency, frequency, immediacy, and usability, and may offer MNOs significant potential for analyzing their users' behaviors. Together with a large number of network performance observations scattered nationwide, unlocking valuable insights about a recent product, service, or brand becomes possible. This can help solve the large-scale network monitoring problems of MNOs in a cost-effective manner using the power of crowds. For example, analyzing large-scale test results from end-users provides the opportunity to understand network service performance, relate anomalies in KPI tests to other datasets (e.g., social media data), and study the evolution of network performance over the observation time. For MNOs, this analysis is useful for building accurate models that can help in designing smarter next-generation infrastructure. Additionally, extracting knowledge from crowdsourced data is especially valuable as it can enable the development of fine-grained business models.
To obtain valuable information from wireless 4G cellular networks using a crowdsourced dataset, appropriate statistical and intelligent analytic techniques are needed. In this light, a natural question arises: how can crowdsourced data and statistics help us extract insight into the performances of MNOs as well as their competitors' wireless networks? Contributing to the solution of this question, in this paper we utilize a real-world proprietary dataset of KPI observations built from experiments performed by different MNOs' UEs across large-scale geographical locations over a long period. Using statistical analysis methods on this dataset, we focus on end-to-end performance comparisons of the three major MNOs in Turkey. Our results reveal the impact of location on the latency, reliability, and PL KPIs of different MNOs providing nationwide coverage.

Related work
In this section, we will classify the related works into three broad categories, namely network performance measurements, crowdsourced data, and reliability aspects.

Network performance measurements
For extracting knowledge from mobile network performance measurements, there are various works in the literature on network planning [5][6][7][8] and on analyzing the KPI performance differences of MNOs [9][10][11]. A recent enabler technology that provides low-latency and ultra-high-reliability communication services is given elsewhere [8]. Coverage analysis based on Base Station (BS) KPIs to reduce drive test and field measurement costs using big data analytics is proposed [12]. A case study targeting the minimization of dropped calls and bad quality of service is provided [6]. To identify variance in the end-to-end performance of network behavior, space- and time-based properties of the network performances of different carriers are analyzed [9]. A previous study [10] investigated interactions between radio layers, network protocols, and applications and their effects on 4G Long Term Evolution (LTE) network performance. The impact of packet size on one-way delay for the Download (DL) direction in 3G mobile networks, using measurements from several Swedish mobile operators, is shown [11].
In a recent work performed in the UK [13], it is shown that network performance effects are quite important and play a significant role in end-users' choice of MNOs. Similarly, the significant positive impact of network effects on the demand for mobile services and end-users' decisions is demonstrated by analyzing data from January 1998 to June 2003 in Germany [14]. Experimental results on the network performances of 12 commercial mobile operators across Europe are described [15]. Finally, large-scale cellular network traffic collected from thousands of BSs is analyzed for traffic modeling purposes [16]. However, in most of the above cases, network performance measurements and comparisons between MNOs are either done using small-scale datasets collected via experimental trials (such as drive tests with devices capable of collecting KPI data using multiple MNO subscriber identification module (SIM) cards) or lack appropriate comparative reliability analysis using large-scale crowdsourced datasets collected from thousands of UEs.

Crowdsourced data
Crowdsourced datasets are useful for the analysis and exploration of MNOs' performances. There are numerous works employing different aspects of crowdsourced data for better network optimization in MNOs [17,18]. Using the Austrian Regulatory Authority for Broadcasting and Telecommunications (RTR) Open Data (a crowdsourced dataset), Kousias et al. [17] investigated the effect of different features that distinguish MNOs from each other. On the other hand, Apajalahti et al. [18] used both the RTR Nettest and Netradar [19] crowdsourced datasets to design a feature mapping strategy between them. A set of quality-of-service (QoS)-related metrics crowdsourced from UEs is used to analyze MNOs' network performances under stringent ElectroMagnetic Field (EMF) constraints and regulations [20]. Fida and Marina [21] rely on crowdsourced data to provide coverage maps of different MNOs and also focus on coverage map accuracy from the perspective of device diversity. Compared to most of the existing analyses based on crowdsourced datasets, in this work we focus on understanding the comparative performance behaviors of MNOs using specific network infrastructure KPIs (e.g., calculated reliability together with packet loss and latency KPIs) on a large scale (i.e. nationwide locations, with thousands of measurements of the major MNOs' metrics in Turkey over a long period).

Reliability
The reliability metric has been the focus of the 5G era for mission-critical services that require URLLC [22][23][24].
URLLC cases demand 99.999% network reliability and 1 ms latency [25]. In general, reliability specifies that packets are successfully delivered while the latency bound is satisfied. However, other definitions also exist, including the reliability definition by 3GPP, reliability per node, and control channel reliability [2,8]. A low-cost portable measurement framework to quantify the end-to-end latency and reliability metrics of communication links is given [24]. Qu et al. [22] introduced a REliability-Aware service CHaining (REACH) framework to ensure reliable service chaining with Virtual Network Functions (VNFs). In another paper [23], the authors propose a softwarized 5G architecture to ensure the end-to-end reliability of network services such as mission-critical traffic. Although there have been many works on ensuring reliability for MNOs, no experimental evaluations have been devised for comparing different MNOs in terms of composite metrics such as reliability using a crowdsourced dataset. In our previous related work, we investigated the performance of three MNOs in terms of latency, DL/Upload (UL) speed, jitter, and PL [26]. Different from our previous analysis, in the present paper we extend these comparison ideas by combining different metrics to evaluate the performance of MNOs in terms of their reliability.

Our contributions
The statistical comparative performance analysis of major MNOs is still an open and active research area, where MNOs, users, and service providers are willing to know how well their horizontal and vertical competitors are performing in certain regions of their operating country. In this paper, different from the previous works outlined above, we analyze end-to-end performance comparisons of MNOs based on reliability and PL KPIs, using statistical confidence interval (CI) analysis for proportions as our basis methodology. Compared to the state-of-the-art, our MNO reliability analysis contributions mainly focus on a real-world proprietary crowdsourced network performance test dataset collected from thousands of UEs over a period of 13 months in Turkey, which differs from pure simulation-based analyses that rely mostly on synthetic data generators. A summary of our key findings is given as follows: • A framework that invokes statistical techniques using crowdsourced data and extracts MNO network performances in different cities in Turkey is proposed, where the visualization is done using the Folium visualization tool's interactive map.
• Statistical approaches suitable for comparisons of MNOs are proposed according to different KPIs such as reliability and PL.
• Our results indicate that there may be major variations in end-to-end network performances between MNOs across different geographical regions of Turkey, and that the ordered list of reliability performance comparisons of the analyzed MNOs does not change when the latency and PL ratio requirements vary.

Table 1 provides the notations and their corresponding descriptions used throughout the rest of the paper. Within the paper, sets are denoted by uppercase calligraphic letters such as N and L. The rest of the paper is organized as follows: in Section 3, we provide the system model and architecture. In Section 4, we provide the requirements for reliability and the concepts for statistical analysis using performance comparisons of proportions. In Section 5, we provide evaluation results for comparisons of the major MNOs in Turkey. Finally, in Section 6, we provide the conclusions and future work.

System model and architecture
Figure 1 demonstrates the system-level architecture for data analysis over the network performance test dataset for an example of three MNOs. As illustrated in Figure 1, the data collection process involves several steps. First, UEs perform the measurement tests via a network performance test application installed in the mobile UE of each MNO and send the measured KPI values to the application service provider, marked as data flow in step-1. The application service provider, marked as step-2, has a privacy-preserving unit, marked as step-3, which performs two main tasks. First, it generates unique test

Network performance analysis
As illustrated in Figure 1, network performance analysis runs the statistical methods to compare the KPI performances of MNOs. The network performance test data are extracted from the application server database and transferred into the Pandas data analytics toolbox for further statistical analysis, as marked by step-6. To determine the CI of the network performance test observations, we perform CI analysis for proportions, where a proportion is defined as the percentage of reliability or PL. At certain times of the month, one MNO can yield better latency or lower PL performance than the other MNOs. This is expected due to randomness in the number of test data samples and in the wireless network. Therefore, statistics on such randomly sampled data need to be specified with CIs. In the present paper, we study an unpaired comparison case where there is no one-to-one correspondence between the network test data observations of MNOs [27]. After the comparisons are calculated, data visualizations over the map are performed using Folium & Leaflet interactive maps, marked as step-7 in Figure 1.
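The extraction step-6 can be sketched as a small pandas aggregation: group the test records per MNO and city, and compute the mean PL proportion together with the sample size that the later CI analysis needs. The column names and values here are illustrative, not the actual schema of the proprietary dataset:

```python
# Minimal sketch of step-6: load test records into pandas and compute,
# per MNO and city, the mean PL ratio and the number of observations.
import pandas as pd

records = [
    {"mno": "MNO-1", "city": "Istanbul", "pl_ratio": 0.004},
    {"mno": "MNO-1", "city": "Istanbul", "pl_ratio": 0.010},
    {"mno": "MNO-2", "city": "Istanbul", "pl_ratio": 0.006},
    {"mno": "MNO-2", "city": "Ankara",   "pl_ratio": 0.002},
]
df = pd.DataFrame(records)

# "mean" feeds the proportion estimate p_m, "count" is the sample size N_m.
summary = df.groupby(["mno", "city"])["pl_ratio"].agg(["mean", "count"])
print(summary)
```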

Reliability and packet loss analysis
To compare the performance of MNOs, providing high reliability is a critical requirement for stable data exchange between the UEs and the content servers. Reliability performance can also be a differentiating factor for MNOs. In general, 4G systems' reliability performance is good, with a typical BLER (or PL ratio in this paper) around 10^{-2} to 10^{-3} [28]. Depending on the application, some may also require much higher reliability together with low latency (e.g., 10 ms for some URLLC applications such as a cooperative collision avoidance system (CCAS) [29]). On the other hand, it is difficult to achieve both high reliability and low latency simultaneously. For our analysis, we aim to meet the following reliability requirement: • PL < λ for the transmission of β bytes with an end-to-end latency of γ ms.

CI for packet loss ratio and reliability percentages
To calculate unpaired PL performance comparisons of MNOs based on the collected unpaired and independent KPI-$k \in \mathcal{K}$ observations, we compute the sample means of reliability percentages and PL ratios and then use the statistical difference of two observations to conclude about the performance advantages or weaknesses of a given MNO compared to the others. For calculating the PL CI, we utilize the CI analysis for proportions method [27]. For proportions, we calculate the average PL ratio at each geographic location $l \in \mathcal{L}$ over the observation duration for every MNO $m \in \mathcal{M}$ and KPI $k \in \mathcal{K}$. To compare two MNOs using the difference between two means of reliability and PL ratios, we utilize the CI for comparative studies method. When two observations are independent of each other, the standard error for the difference between the two means of observations is

$$ s_e^{m,n}(k,l) = \sqrt{\left(s_e^{m}(k,l)\right)^2 + \left(s_e^{n}(k,l)\right)^2}, \qquad (1) $$

where $s_e^{m}(k,l)$ and $s_e^{n}(k,l)$ are the standard errors of the observation of KPI-$k \in \mathcal{K}$ for MNO-$m$ and MNO-$n \in \mathcal{M}$, respectively, for a given geographic location $l \in \mathcal{L}$. The standard error is calculated as

$$ s_e^{m}(k,l) = \sqrt{\frac{p_m(k,l)\left(1 - p_m(k,l)\right)}{N_m}}, \qquad (2) $$

where $p_m(k,l)$ is the PL ratio (or reliability) over $N_m$ observations for the $m$-th MNO and location $l$. If the normal approximation of the binomial distribution holds, i.e. $N_m p_m(k,l) \geq 10$, the confidence interval for the PL ratio or reliability percentage can be calculated as $p_m(k,l) \pm z_{1-\alpha/2}\, s_e^{m}(k,l)$, where $\alpha$ denotes the significance level and $z_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of $\mathcal{N}(0,1)$. Thus, the CI at confidence level $(1-\alpha)100\%$ for the difference between two PL ratios (or reliability percentages) is given by

$$ \mathrm{CI}_{m,n}(k,l) = \left(\bar{p}_m(k,l) - \bar{p}_n(k,l)\right) \pm z_{1-\alpha/2}\, s_e^{m,n}(k,l), \qquad (3) $$

where $\bar{p}_m(k,l)$ is the average PL ratio (or reliability) over all observations. After observing the CI value of (3), if the CI contains zero within this $(1-\alpha)100\%$ confidence interval, then the statistical difference is insignificant and no conclusions can be drawn about the PL (or reliability) ratios of the two MNOs.

Evaluation results
For the evaluation results of the three MNOs, anonymized data are collected offline on a daily basis according to the data collector process of Section 3.2 and transferred into the network performance test database for a period of 13 months, from January 2017 to February 2018. The statistics of the measurement tests are given in Table 2 for all MNOs, which yields a relatively large sample size for fair comparisons. To obtain graphical results, we have utilized Python's seaborn statistical data visualization library.
The network performance test measurements are done by "real users", i.e. the users of the MNOs themselves, performing tests at different nationwide locations and times using the preinstalled applications on their UEs.
These users measure the network performance of their MNOs at various positions (e.g., enterprises, homes, airports, shopping centers, etc.). Note also that the experiments done by the users of each MNO do not use the same infrastructure; each MNO has its own network infrastructure (radio access network (RAN), transport, and core networks). Moreover, the amounts of data used for comparisons are at city scale or nationwide.

Parameters for comparisons
For our evaluation results, we have selected the reliability requirements for λ and γ values as the mean values of PL ratios and latency, respectively, of all measurements in Turkey as also given in Table 2. Therefore, the average reliability in Table 2 is also given with respect to these selected values.
For our MNO comparisons, without loss of generality, we have selected the numerical requirements λ = 0.007 (mean nationwide PL ratio) and γ = 28.279 ms (mean nationwide latency) for a transmission size of β = 32 bytes. Note that these requirements could also be selected to correspond to specific application requirements, which is outside the scope of this paper. To make fair comparisons, we have also assumed that latency is linearly proportional to the transmitted data size and have updated all latency values with respect to the transmission of β = 32 bytes (similar to the definition in 3GPP [2]), i.e. all experiments' latency values are multiplied by a factor of 32/x_i, where x_i is the transmitted data size of the i-th experiment.
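The latency normalization above is a one-line rescaling; a minimal sketch, assuming the linear-scaling model stated in the text:

```python
# Rescale each experiment's measured latency to the beta = 32-byte
# reference size, assuming latency is linearly proportional to payload size.
BETA = 32  # bytes, target transmission size

def normalize_latency(latency_ms, tx_bytes, beta=BETA):
    """Multiply the measured latency by beta / x_i for the i-th experiment."""
    return latency_ms * beta / tx_bytes

# A 64-byte test measured at 50 ms maps to 25 ms at the 32-byte reference.
print(normalize_latency(50.0, 64))  # 25.0
```

The normalized value is then compared against the γ = 28.279 ms requirement.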

Performance difference of MNOs across different geographical regions
In this subsection, we investigate the performance differences of MNOs with respect to reliability and PL in different city locations in Turkey. We also visualize the CI data using the Folium interactive heatmap tool. We can observe that in major cities such as İstanbul and İzmir, MNO-1 performs better than MNO-2 and MNO-3 performs better than MNO-2 (a similar trend is also observed in Adana). In the other big city, Ankara, MNO-1 is again better than MNO-2, whereas MNO-2 has better reliability than MNO-3 (a similar trend is also observed in Çorum). In small cities such as Çankırı and Bitlis, we cannot conclude with a 90% confidence level that MNO-1 or MNO-2 is better than the other. Moreover, the CI levels in those cities are higher due to the relatively small numbers of observations.

Figure 4 shows reliability versus increasing latency requirement γ (ms) for the three major MNOs in Turkey. Note that we have also shown the CI values for each point using Eq. (2). We can easily observe that MNO-3 performs best, followed by MNO-1 and MNO-2. We can also observe that as the latency requirement γ increases, the performance ordering of the MNOs does not change, i.e. MNO-3 performs best, followed by MNO-1 and MNO-2. Moreover, irrespective of MNO comparisons, we observe that as the latency requirement increases beyond a certain point, reliability no longer changes. This is because reliability increases as the latency requirement is relaxed; however, since the PL rate requirement λ is kept constant at 0.007, the reliability values eventually saturate at fixed points for all MNOs.
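The saturation effect behind the γ-sweep can be reproduced with a few lines; the test tuples (`pl_ratio`, `latency_ms`) below are illustrative, not taken from the dataset:

```python
# Sketch of the gamma-sweep: reliability rises as the latency requirement
# is relaxed, then saturates once only the fixed PL requirement binds.
tests = [(0.001, 20.0), (0.002, 26.0), (0.004, 35.0), (0.020, 15.0)]
LAMBDA = 0.007  # fixed PL requirement

for gamma in (18.0, 28.0, 40.0, 60.0):
    ok = sum(1 for pl, lat in tests if pl < LAMBDA and lat < gamma)
    print(gamma, 100.0 * ok / len(tests))
```

Past γ = 40 ms the printed reliability stays at 75%: the remaining test fails the PL requirement, so relaxing latency further cannot help, mirroring the saturation seen for all MNOs.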

Reliability results and CI visualizations
For general reliability comparisons of MNO-2 and MNO-3 in all cities in Turkey, we plot Figure 5 using Folium data visualization. From this figure, we can easily observe that MNO-2 is better than MNO-3 in the majority of cities in the western and southern regions, whereas MNO-3 performs better in the central regions of the country. We cannot conclude with a 90% confidence level that MNO-2 or MNO-3 is better than the other in the eastern cities of the country, which are marked in white.
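The three-way coloring of the map follows directly from where the CI of Eq. (3) falls relative to zero. A minimal sketch of this decision rule, with illustrative CI endpoints:

```python
# Map a city's 90% CI for the reliability difference (MNO-2 minus MNO-3)
# to one of the three map categories.

def decision(ci_low, ci_high, a="MNO-2", b="MNO-3"):
    """Classify a city from the CI of the (a - b) reliability difference."""
    if ci_low > 0:
        return f"{a} is better than {b}"
    if ci_high < 0:
        return f"{b} is better than {a}"
    return "No Decision"  # CI contains zero: difference is insignificant

print(decision(0.8, 2.1))    # MNO-2 is better than MNO-3
print(decision(-1.5, -0.2))  # MNO-3 is better than MNO-2
print(decision(-0.4, 0.9))   # No Decision
```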

Figure 5. Comparison of the reliability of MNO-2 and MNO-3 in all cities in Turkey using Folium data visualization when λ = 0.007, β = 32 bytes, and γ = 28.279 ms (map legend: MNO-2 is better than MNO-3 / No Decision / MNO-3 is better than MNO-2).

Figure 6 shows reliability versus increasing PL ratio requirement λ for the three major MNOs in Turkey. We can observe a trend similar to that in Figure 4, where MNO-3 performs better than MNO-1 and MNO-2. Similarly, the overall reliability performance ordering of the MNOs (i.e. MNO-3 being better than MNO-1, followed by MNO-2) does not change when the PL requirement λ varies. Similar to the observations in Figure 4, and irrespective of MNO comparisons, as the PL requirement increases beyond a certain point, reliability no longer changes. This is because reliability increases as the PL requirement is relaxed; however, since the latency requirement γ is kept constant at 28.279 ms, the reliability values eventually saturate at fixed points for all MNOs.

City-level comparisons are also obtained for all cities in Turkey. We can observe that in big cities, such as İstanbul and Ankara, MNO-3 is better than MNO-2 and MNO-1 is better than MNO-2. Similar to Figure 3, in small cities, such as Bitlis, Mardin, and Çorum, the CI levels are higher due to the low numbers of observations. In some cities, such as İzmir, we cannot make a comparison decision with a 90% confidence level between MNO-1 and MNO-2. Similarly, no-decision results are obtained in Yalova for the MNO-2 and MNO-3 comparison.

Conclusions and future work
In this paper, we investigated end-to-end network performance comparisons of MNOs using a crowdsourced dataset of network performance tests with different KPIs, collected from UEs over a duration of 13 months. We utilized PL and latency values to compare the reliability performances of the three major MNOs in Turkey. Our results indicate that significant performance differences exist across the MNOs depending on the location/region of the operating service as well as the preset latency and PL requirements of the reliability analysis. Overall, we find that the ordered list of MNOs' comparative reliability performance does not change when the latency and PL ratio requirements vary. In future work, we are planning to investigate reliability comparisons corresponding to different applications, such as massive machine-type communications, enhanced mobile broadband, and ultra-reliable low-latency communications over 5G networks.