Location-based Analytics in 5G and Beyond

Location-based analytics leverage accurate location awareness enabled by the fifth generation (5G) mobile technology standard, as well as the integration of heterogeneous technologies, to empower a plethora of new services for 5G verticals and optimize the use of network resources. This paper proposes an end-to-end architecture integrated in the 5G network infrastructure to provide location-based analytics as a service. Based on such architecture, we present an overview of cutting-edge applications in 5G and beyond, focusing on people-centric and network-centric location-based analytics.


I. INTRODUCTION
Location information is a pivotal service of 5G and beyond cellular networks and will enable a plethora of new location-dependent use cases. Indeed, since Release 16, the 3GPP is enhancing 5G networks and devices with localization functionalities targeting a very high level of location accuracy (with sub-meter accuracy 95% of the time or more) [1], [2]. Besides the localization of users, there is a growing interest in location-based analytics, i.e., the analysis of the location and behavior of people and objects in public areas, roads, and buildings, through dedicated infrastructures or by relying on user devices [3]-[10]. While closely related, location-based analytics are not a mere extension of user equipment (UE) localization, but rather a new paradigm that enables a large variety of scenarios and applications.
Location-based analytics can be classified as people-centric and network-centric. People-centric analytics refer to the ensemble of information related to people presence and movements in physical spaces (e.g., people counting, dynamic map creation, flow tracking, fusion of spatiotemporal data with multimodal information, and anomalous behavior detection) [3]-[7]. Network-centric analytics refer to the ensemble of information related to network operation (e.g., network planning, fault detection, resilience, location-aware diagnosis and troubleshooting) [8]-[10]. On the one hand, the ability to operate 5G networks in both sub-6 GHz and millimeter wave (mmWave) frequency bands and the use of massive antenna arrays significantly extend the capabilities of cellular localization. On the other hand, such new 5G features, including beamforming, multi-connectivity, and the adoption of new spectrum portions, pose new challenges for autonomous network management. The provision of location-based analytics relies on complex features and mobility patterns extracted from raw location-related data inherent in physical and network events. This calls for an extension of the 5G network functions (e.g., the scheduler) to interface with location data in a multi-layer and flexible architecture that i) facilitates secure sharing and re-use of accurate location and context data for diverse localization services, and ii) combines the different network functions for the extraction of location-based analytics. There is a unique opportunity for network providers to make location-based analytics a network-native service in 5G and beyond, which will be pivotal to creating new disruptive services and optimizing network performance.
This paper proposes a full-stack architecture integrated in the 5G network infrastructure to enable a plethora of services requiring location-based analytics. Such analytics rely on enhanced positioning provided in 5G together with heterogeneous data. Finally, we present a set of case studies on people-centric and network-centric analytics that can be implemented with the proposed architecture.
II. END-TO-END ARCHITECTURE
The 5G system consists of the next-generation radio access network (5G-RAN) and the 5G core network. The 5G-RAN is a distributed set of base stations, or gNBs, managing the efficient use of the radio spectrum. The 5G core operates via network functions and service-based interfaces. Network virtualization decouples network functions from the hardware they operate on, leading to a more dynamic system that can be controlled by software. We propose new system functionalities integrated in the 5G network infrastructure to allow operators and service providers to expose location-based analytics as a service. Such functionalities leverage 5G network information combined with heterogeneous data from other radio access technologies [2].

A. Localization and analytics functions
We propose the use of virtualization techniques to run the localization and analytics functions as virtual functions, with the support of both traditional virtual machine and cloud-native container based techniques. This provides an augmentation of the 5G architecture by leveraging the ETSI network function virtualization framework, which represents the 3GPP standard for operators to deploy 5G network functions in virtualized infrastructures. This augmentation of the 5G architecture offers operators and service providers the possibility to expose new location-based analytics to third parties and exploit location data for smart network management. Fig. 1 shows a comprehensive view of the proposed system architecture and includes details of how the location-related functions coexist on top of a virtualized infrastructure for on-demand deployment in the form of localization services. The proposed system is compliant with the 3GPP 5G core architecture as it makes use of a service-based architecture integrated in the 5G network functions and augments it with atomized and independent location functions. Specifically, the system is aligned with the enhanced 3GPP Location Service (eLCS) architecture, which specifies 5G network functions, interfaces, and workflows for location-related functionalities [1]. Here, the location management function (LMF) coordinates and calculates the user position for location-based services requested by external or internal eLCS clients, including other network functions. In our proposed system architecture, the localization enablers provide the other system functions with UE location data (e.g., coordinates, velocity, direction). In particular, localization enablers implement the LMFs deployed on-demand to fulfill specific performance requirements: integration of 5G New Radio (NR), global navigation satellite system (GNSS), and WiFi, as well as device-free localization. 
Such LMFs provide location data to the location data analytics functions (LDAFs) for the provision of location-based analytics. LDAFs can be considered as LCS clients and use location data from LMFs. People-centric and network-centric LDAFs apply descriptive, predictive, prescriptive, and diagnostic algorithms to respectively perform statistical analysis on location and network data, assess future possible conditions, search for actions to be taken, and determine the causes for specific conditions. Finally, the integrity, security, and privacy functions provide authentication and advanced cryptographic techniques on the localization and analytics data to be exposed towards external applications, secure conditional sharing techniques, and data management policies (e.g., anonymization, obfuscation).
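As an illustration of the data management policies mentioned above, the following minimal sketch shows two simple measures: pseudonymizing a UE identifier with a salted hash and obfuscating coordinates by snapping them to the center of a grid cell. The function names, the salt handling, and the 10 m cell size are assumptions made for this example; they are not part of the proposed architecture or of any 3GPP specification.

```python
import hashlib
import math

# Hypothetical helpers, for illustration only.

def pseudonymize(ue_id: str, salt: str) -> str:
    """Replace a UE identifier with a truncated salted SHA-256 digest."""
    return hashlib.sha256((salt + ue_id).encode("utf-8")).hexdigest()[:16]

def obfuscate(x_m: float, y_m: float, cell_m: float = 10.0) -> tuple:
    """Snap a position (in meters) to the center of its grid cell."""
    return (math.floor(x_m / cell_m) * cell_m + cell_m / 2,
            math.floor(y_m / cell_m) * cell_m + cell_m / 2)

# Example record as it might be exposed to an external application.
record = {
    "ue": pseudonymize("ue-001", salt="demo-salt"),
    "position": obfuscate(12.3, 47.9),
}
```

Pseudonymization of this kind does not by itself guarantee anonymity, since trajectories can still be re-identified; a coarser grid trades analytics accuracy for privacy.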

B. Localization analytics as service APIs
The localization analytics are exposed as services through dedicated application programming interfaces (APIs). A service can be seen as a combination of multiple localization related functions (i.e., LMFs and LDAFs) that need to be wired in the form of pipelines to provide the desired output requested by external applications. This requires a workflow execution engine (management and orchestration) to translate service requests into functional steps that involve the localization related functions.
The localization analytics output is then exposed through Localization Analytics Service APIs either as an on-demand RESTful service or as a continuous data stream. This is managed via dedicated access control functions within the API layer. The overall approach has the main goal to provide a flexible and composable platform where the various localization functions can be combined while facilitating sharing and re-use of some of the key functionalities (e.g., those for the localization enablers or data security and privacy) across different localization services.
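The pipeline idea described above can be sketched as follows: a service is an ordered list of functions (a localization enabler, an analytics function, and a privacy function), and a small execution engine runs them in sequence for each request. All names in this sketch (LocationFix, lmf_5g_nr, ldaf_crowd_size, privacy_filter, SERVICES, execute) and the canned positions are hypothetical; this is not a 3GPP or ETSI interface, only one possible way to picture the composition.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class LocationFix:
    ue_id: str
    x_m: float
    y_m: float

def lmf_5g_nr(request: dict) -> List[LocationFix]:
    """Stand-in localization enabler returning canned UE positions."""
    return [LocationFix("ue-1", 0.0, 0.0),
            LocationFix("ue-2", 3.0, 4.0),
            LocationFix("ue-3", 50.0, 50.0)]

def ldaf_crowd_size(fixes: List[LocationFix], request: dict) -> dict:
    """Stand-in analytics function: count UEs inside a circular area."""
    cx, cy, r = request["center_x_m"], request["center_y_m"], request["radius_m"]
    inside = [f.ue_id for f in fixes
              if (f.x_m - cx) ** 2 + (f.y_m - cy) ** 2 <= r ** 2]
    return {"crowd_size": len(inside), "ue_ids": inside}

def privacy_filter(result: dict, request: dict) -> dict:
    """Stand-in privacy function: expose aggregates only, never UE ids."""
    return {k: v for k, v in result.items() if k != "ue_ids"}

# A service is a pipeline: first a source step, then transformation steps.
SERVICES = {"crowd_size": [lmf_5g_nr, ldaf_crowd_size, privacy_filter]}

def execute(service: str, request: dict) -> dict:
    """Minimal workflow engine: run the steps of a service in order."""
    source, *steps = SERVICES[service]
    data = source(request)
    for step in steps:
        data = step(data, request)
    return data

response = execute("crowd_size",
                   {"center_x_m": 0.0, "center_y_m": 0.0, "radius_m": 10.0})
```

In this toy request, two of the three canned UEs fall inside the 10 m circle, and the response carries only the aggregate count. Registering a new service amounts to adding another list of functions, which is how the same localization enabler or privacy step could be re-used across services.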

III. 5G LOCALIZATION
This section presents the ongoing 3GPP standardization activities and the research in the area of 5G localization, to better define the eLCS involved within the proposed architecture. We also give an overview of the main technologies that can be combined with 5G location data to improve localization.

A. 5G standardization and metrics
The 5G NR was defined in two phases (Phase 1 and 2) corresponding to 3GPP Rel. 15 and 16. Localization in 5G was introduced in Rel. 15 for non-standalone operation (5G networks aided by existing 4G infrastructure) and continued in Rel. 16 with standalone NR operation, with further enhancements in Rel. 17. 5G localization mainly relies on measurements of single-value metrics, such as downlink and uplink time difference of arrival (DL/UL-TDoA) and beamforming angle of arrival (AoA) or angle of departure (AoD). Depending on the use case, some received signal strength indicators such as the reference signal received power (RSRP) and the reference signal received quality (RSRQ) can also be used for positioning.
The use of richer information within the localization process, e.g., exploiting multipath or prior information about the environment, can be extended to use soft information [11] to significantly improve the localization accuracy in 5G scenarios, especially in challenging environments.
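To make the role of the TDoA metric concrete, the following minimal sketch estimates a 2D position from noise-free TDoA measurements taken with respect to a reference gNB, using a plain Gauss-Newton iteration. It is a textbook formulation under simplifying assumptions (known gNB positions, perfect synchronization, no noise or multipath), not the positioning procedure specified by 3GPP; the gNB layout and the function name solve_tdoa are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def solve_tdoa(anchors, tdoas, guess, iterations=20):
    """Gauss-Newton solver for 2D TDoA positioning.

    anchors: gNB positions [(x, y), ...]; anchors[0] is the reference.
    tdoas:   arrival-time differences (s) of anchors[1:] w.r.t. anchors[0].
    guess:   initial position; it must not coincide with an anchor.
    """
    x, y = guess
    x0, y0 = anchors[0]
    for _ in range(iterations):
        d0 = math.hypot(x - x0, y - y0)
        a11 = a12 = a22 = b1 = b2 = 0.0  # entries of J^T J and J^T r
        for (ax, ay), t in zip(anchors[1:], tdoas):
            di = math.hypot(x - ax, y - ay)
            r = (di - d0) - C * t                    # range-difference residual
            jx = (x - ax) / di - (x - x0) / d0       # d(residual)/dx
            jy = (y - ay) / di - (y - y0) / d0       # d(residual)/dy
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 += jx * r
            b2 += jy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry, stop iterating
        x -= (a22 * b1 - a12 * b2) / det
        y -= (a11 * b2 - a12 * b1) / det
    return x, y

# Illustrative setup: four gNBs on a 100 m square, UE at (30, 40).
gnbs = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
true_pos = (30.0, 40.0)
ref_dist = math.dist(true_pos, gnbs[0])
measured = [(math.dist(true_pos, g) - ref_dist) / C for g in gnbs[1:]]
estimate = solve_tdoa(gnbs, measured, guess=(50.0, 50.0))
```

With noisy measurements the same iteration yields a least-squares estimate, and each residual would typically be weighted by the measurement variance.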

B. Heterogeneous location data fusion
The fusion of radio access technology (RAT)-dependent and RAT-independent location data in a hybrid fashion can help to meet the demanding localization requirements on accuracy, latency and integrity level for 5G use cases.
GNSS is supported in 3GPP for 5G and the combination with cellular positioning is needed for many use cases in which one technology is not fully operating or has limited coverage, such as in tunnels or urban canyon scenarios. Studies show that use of even only one high-accuracy 5G timing measurement can significantly improve the horizontal positioning accuracy with respect to GNSS standalone solutions [12].
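One simple way to see why adding an accurate measurement helps is inverse-variance weighting of two independent position estimates: the fused variance is always smaller than either input variance. The sketch below assumes independent, unbiased estimates with isotropic errors, and the numbers are illustrative; it is a loosely coupled fusion, not the hybrid GNSS and 5G method evaluated in [12].

```python
def fuse(est_a, var_a, est_b, var_b):
    """Inverse-variance fusion of two independent position estimates.

    est_a, est_b: (x, y) tuples in meters.
    var_a, var_b: scalar error variances in m^2 (isotropic assumption).
    Returns the fused (x, y) and the fused variance.
    """
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    fused = tuple((w_a * a + w_b * b) / (w_a + w_b)
                  for a, b in zip(est_a, est_b))
    return fused, 1.0 / (w_a + w_b)

# Illustrative values: a coarse GNSS fix (5 m std) and a 5G fix (1 m std).
gnss_fix, gnss_var = (10.0, 20.0), 25.0
nr_fix, nr_var = (12.0, 22.0), 1.0
fused_fix, fused_var = fuse(gnss_fix, gnss_var, nr_fix, nr_var)
```

The fused position lies close to the more accurate 5G fix, while the GNSS fix still contributes a small correction.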
Concerning the integration of other RAT-independent positioning methods, the combination of ranging measurements for a UE from multiple WiFi access points (APs) and 5G NR cells, for both indoor and outdoor scenarios, is envisaged to accomplish high-accuracy positioning. However, in 5G networks, the location server may not have the information about the WiFi APs' exact locations; this limits the usefulness of WiFi data at the location server. In such cases, for instance, smartphone movements can be estimated using WiFi Fine Time Measurement ranging measurements [13]. These data can be integrated in a network-based location system defined in 3GPP, where the network collects timing and angle measurements sent from the UE.
In this context, the large bandwidth of mmWave networks not only enables very high accuracy positioning, but also enables simultaneous localization and mapping (SLAM) through AoA information. SLAM in mmWave networks relies on anchor location estimation, device localization, and environment mapping for both physical and virtual anchors. A low complexity SLAM algorithm fully integrated with a mmWave communication system is feasible with median error smaller than 0.3-0.5 m [14].

C. Device-free localization
Device-free localization relies on the detection and analysis of signals reflected by device-free targets (persons, vehicles, etc.) as in radar networks. Such networks sense the wireless environment to infer the location of targets and can take advantage of any modulated signal at any frequency of operation. The ultra-low latency connectivity and a finer radar range resolution enabled by 5G are paving the way to the use of 5G NR waveforms for joint radar and communication. As an example application, a 5G integrated radar service has been proposed in [15] for future vehicle networks. In this context, the use of mmWave technology is particularly relevant since the reduced wavelength allows the use of massive arrays with electronic steering capabilities, thereby improving the directionality properties for detection and tracking of device-free targets.
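The link between signal bandwidth and radar range resolution can be made concrete with the standard relation ΔR = c / (2B), where B is the bandwidth. The sketch below evaluates it for 100 MHz and 400 MHz, used here as illustrative values corresponding to the maximum NR channel bandwidths in the sub-6 GHz and mmWave ranges; the function name is an assumption of this example.

```python
C = 299_792_458.0  # speed of light in m/s

def range_resolution_m(bandwidth_hz: float) -> float:
    """Radar range resolution c / (2B) for a signal of bandwidth B."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return C / (2.0 * bandwidth_hz)

resolution_100mhz = range_resolution_m(100e6)  # about 1.5 m
resolution_400mhz = range_resolution_m(400e6)  # about 0.37 m
```

Quadrupling the bandwidth improves the resolution by a factor of four, which is one reason the mmWave bands are attractive for joint radar and communication.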

IV. FROM LOCALIZATION TO ANALYTICS
This section presents a set of case studies for people-centric and network-centric location-based analytics. They are conceived for implementation as LDAFs (see Fig. 1) for compatibility and direct integration in the 3GPP 5G core architecture. These examples only cover a subset of possible use cases, and are introduced to showcase the system functionalities in the envisioned architecture, which is aligned with the ongoing work in 3GPP and could be further extended based on the technical implementation of use cases.

A. People-centric location-based analytics
People-centric analytics provide insights and empower domains such as smart cities and transportation enabling a number of 5G services.
1) Mobility clustering: This use case investigates the mobility patterns in large-scale mobility datasets, which can be implemented within the proposed architecture using the output of 5G LMFs as input. Such datasets pose challenges in terms of granularity, regularity, and accuracy, which motivate the use of modern deep learning techniques implemented as LDAFs. We investigate sequence-to-sequence autoencoders based on recurrent networks [4] for human mobility analysis. We conduct unsupervised spatiotemporal clustering on the OpenPFLOW dataset [3], which represents walking, biking, and commuting mobility in the city of Tokyo over 24 h at regular 1-minute time steps. The autoencoding model is formed by stacking layers of gated recurrent units in an encoder/decoder structure.
After training, spatiotemporal aspects of the mobility data are encoded in the latent space represented by the encoder output. There we apply principal component analysis, and then use the K-Means method to detect clusters. Fig. 2 shows the process applied to walking trajectories from [3]. The visualization on the actual Tokyo map indicates potential trends such as regional, sub-regional, and cross-regional mobility concentration, as well as patterns of stationary and non-stationary behavior across different time periods. The fusion of heterogeneous technologies and contextual information enabled by the proposed end-to-end architecture, such as network conditions, events and geographic labels from the surrounding environment, will further improve such mobility analytics, e.g., with dedicated network functions for anomaly detection and the interplay with other aspects of human activity. The proposed approach also builds on the architecture's ability to maintain a steady influx of data in order to validate, update, and retrain the proposed model.
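The final clustering step can be sketched with a small K-Means implementation operating on latent vectors. The sketch assumes that the encoder and the principal component analysis have already produced low-dimensional vectors; the six 2D points below are toy values, not OpenPFLOW data, and the deterministic initialization is a simplification of what a production implementation would use.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain K-Means (Lloyd's algorithm) on a list of equal-length tuples.

    Initialization is deterministic (first k points) to keep the example
    reproducible; practical implementations randomize it.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable, converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy latent vectors standing in for encoded trajectories after PCA.
latent = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
cluster_labels, cluster_centers = kmeans(latent, k=2)
```

Each label can then be mapped back to its trajectory to visualize the clusters on a map, in the spirit of the process shown in Fig. 2.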

2) Group detection and people counting:
There is a growing interest in designing crowd-centric device-free [5] and device-based methods for group detection and people counting that infer the number of targets directly from the measured data without estimating their locations.
The Group-In method [6] is an LDAF that provides group inference using as input the wireless traces collected by the WiFi LMF. A previous Group-In study [6] used experiments in an indoor setting but did not consider the application of Group-In at a large scale using WiFi datasets. In this paper, we apply the Group-In algorithms to a city-scale dataset [7] that is the result of a pilot study in Gold Coast, Australia. The Group-In LDAF provides the following localization analytics APIs: 1) group detection; 2) long-term group detection; and 3) crowd size. Group detection infers the groups of people during short time intervals (e.g., 2 min); long-term group detection aggregates the group detection results for each pair of people based on their frequency of appearance in the same groups over a longer time interval (e.g., weekly); and crowd size gives the number of people in each time interval.
As the city-scale dataset does not contain ground-truth values for people groups, we select a parameter set based on the controlled lab experiments. The fixed (selected) parameters provide satisfactory performance in almost all scenarios (more than 80% pairwise and Jaccard accuracy except when the groups of devices are consistently closer than 2 m to each other). Moreover, the performance is cross-validated by dividing the controlled datasets into five equal data chunks and applying the same parameters to each chunk without re-calibration. Because the approach is based on unsupervised graph clustering, no explicit training phase is needed, and the analytics can be applied to the large-scale dataset without additional training. With ground-truth data, a more precise calibration of the parameters could be achieved. We observe that it is computationally feasible to apply Group-In at a large scale, infer groups out of more than 100 people, and generate insights in near real-time. Fig. 3 shows the results of applying Group-In to a one-week trace (10 min time interval, 30 s sampling time). The preliminary results include static WiFi devices as well as mobile WiFi devices that are in vehicles (on the road). Moreover, a single person is considered as one group. As a result, the most commonly observed group was a one-person group, followed by two-person groups, and so on. As expected, we observe a positive correlation between the number of groups and the number of people. The data follows a daily trend with a peak value (up to 110 people) every day. Results indicate that Group-In is a promising technique for the analysis of city-scale data for long periods. Accurate localization through 5G will lead to more granular insights for people counting and group behavior identification without additional computational complexity.
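To give a flavor of unsupervised graph-based grouping, the following sketch links two devices when their signal-strength traces at the same access point are similar, and reports the connected components of the resulting graph as groups. This is a generic illustration and not the Group-In algorithm of [6]; the similarity rule, the 5 dB threshold, and the traces are assumptions made for this example.

```python
from itertools import combinations

def detect_groups(traces, max_mean_diff_db=5.0):
    """Group devices whose signal-strength traces are similar.

    traces: {device_id: [signal strength in dBm per sample]}, aligned in time.
    Two devices are linked when their mean absolute difference is at most
    max_mean_diff_db; groups are connected components (union-find).
    """
    parent = {d: d for d in traces}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path compression
            d = parent[d]
        return d

    for a, b in combinations(traces, 2):
        diffs = [abs(x - y) for x, y in zip(traces[a], traces[b])]
        if diffs and sum(diffs) / len(diffs) <= max_mean_diff_db:
            parent[find(a)] = find(b)

    groups = {}
    for d in traces:
        groups.setdefault(find(d), set()).add(d)
    return sorted(sorted(g) for g in groups.values())

# Toy traces for one short time interval (values are illustrative).
window = {
    "A": [-50.0, -52.0, -55.0],
    "B": [-51.0, -53.0, -54.0],
    "C": [-80.0, -78.0, -79.0],
    "D": [-60.0, -70.0, -65.0],
}
groups = detect_groups(window)
crowd_size = len(window)
```

As in the results above, a device that is linked to no other device forms a one-person group. No training phase is involved: the only tunable quantity is the threshold, which plays the role of the parameter set calibrated in controlled experiments.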

B. Network-centric location-based analytics
Two use cases are now presented to show the use of location-based analytics for network management: network optimization for efficient service provisioning considering the dynamic changes in the network and location-aware diagnosis/troubleshooting for the maintenance of the cellular network by identifying problems as well as ensuring the resilience of the network itself.
1) Network optimization: An example of location-aware network optimization is pencil beamforming based on the estimated UE position. Pencil beamforming relies on the location information of the LMF communicated to network-centric LDAFs. Unlike other types of services, however, it does not interface with third parties via APIs; instead, it communicates these analytics to other management network functions for gNB beam management within the 5G-RAN. We have performed a preliminary analysis of the impact of pencil beamforming on the quality of service (QoS) of 5G networks and on the electromagnetic field (EMF) exposure. To this aim, an open-source simulator has been developed [9] that is able to synthesize the traffic beams for each gNB, both in direction and beamwidth, by exploiting UE localization accuracy. Each beam is directed towards the center of a circular area in which the UE is assumed to be, where the diameter of this circular area indicates the uncertainty level of the UE location estimate. Unlike [9], we summarize here the main insights about the location-aware management of the pencil beams, by analyzing the average EMF and throughput over the territory. Table I presents the values of the average EMF [V/m] and the average throughput [Mbps] according to different location uncertainty levels, together with the confidence intervals (C.I.). Results show that an increase of the location uncertainty level results in a higher EMF (due to possible overlap of the wider beams) and a lower throughput (the larger beamwidth lowers the beam's directivity). Therefore, higher localization accuracy helps to reduce the EMF exposure while increasing the throughput.
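The geometric relation between location uncertainty and beamwidth can be sketched as follows: the beam is steered towards the center of the uncertainty circle, and its width is chosen so that the beam edges are tangent to that circle. This is a simplified 2D model written for illustration; it is not the beam synthesis implemented in the simulator of [9], and the function name and coordinates are assumptions.

```python
import math

def pencil_beam(gnb_xy, ue_xy, uncertainty_diameter_m):
    """Return (azimuth_deg, beamwidth_deg) of a beam covering the UE area.

    The beam points at the center of the circular uncertainty area and its
    edges are tangent to the circle, so the half-width is asin(r / d).
    """
    dx, dy = ue_xy[0] - gnb_xy[0], ue_xy[1] - gnb_xy[1]
    distance = math.hypot(dx, dy)
    radius = uncertainty_diameter_m / 2.0
    if distance <= radius:
        raise ValueError("gNB lies inside the uncertainty area")
    azimuth_deg = math.degrees(math.atan2(dy, dx))
    beamwidth_deg = math.degrees(2.0 * math.asin(radius / distance))
    return azimuth_deg, beamwidth_deg

# Same UE estimate with two illustrative uncertainty levels.
accurate = pencil_beam((0.0, 0.0), (100.0, 100.0), uncertainty_diameter_m=2.0)
coarse = pencil_beam((0.0, 0.0), (100.0, 100.0), uncertainty_diameter_m=30.0)
```

The direction is identical in both cases, but the coarse estimate requires a much wider beam, which is consistent with the lower directivity and the higher chance of beam overlap discussed above.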
2) Network diagnosis: Location-aware network diagnosis can rely on contextualized indicators, i.e., time-series metrics combining location and cellular network measurement. Such indicators are extracted from the network measurements reported by the users in different areas of interest, including the cell coverage, center, and edge. This concept can be especially beneficial for 5G ultra-dense scenarios, characterized by a high dynamicity of users and an increased demand due to the reduced coverage areas and inter-site distances [10].
Supported by the high-accuracy localization produced by the 5G LMFs and provided to the network-centric analytics (LDAFs), of which the contextualized indicators are part, novel mechanisms for failure diagnosis can be implemented and offered as applications to network operators. Going beyond the previously cited approach [10] (which relied on manually defined areas and simple Bayes classifiers), novel developments support the complete automation of the definition of the areas of interest by estimating, for each cell, the coverage area, center, edge, influencing area on other cells (the area that will most likely be covered by the cell in case of a failure in its neighbor), and the area influenced by each of its neighbors. This has led to an increased number of available contextualized indicators that can be used for diagnosis. Fig. 4 compares the performance of failure diagnosis mechanisms using only classic metrics with the use of both classic and contextualized metrics (fusion). This is done for the indoor ultra-dense scenario with 12 picocells and multiple modelled failures presented in [10]. Network diagnosis is performed based on three classifiers, namely K-nearest neighbors, discriminant analysis classification, and multiclass error-correcting output codes classification. Results show that, for the different classifiers, the use of contextualized data considerably decreases the diagnosis error rate with respect to using only classical metrics, thus providing a powerful tool for 5G failure management. The availability of localization data for the generation of the location-enriched metrics allows the median diagnosis error rate for the three classifiers to be reduced significantly, going below 1% for the discriminant analysis and multiclass error-correcting output codes classifiers. This demonstrates the relevance of location-aware information for improving failure management of 5G networks. The proposed network management approaches can use different types of data.
For example, Minimization of Drive-Test (MDT) data can be directly integrated in the proposed approaches, e.g., in the calculation of contextualized indicators. Therefore, when MDT traces are available, they could help in both obtaining more accurate location-based analytics and improving network management and diagnosis.
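A contextualized indicator can be pictured as a classic metric averaged only over the measurement reports located in a given area of interest. The sketch below splits located RSRP reports into cell center and cell edge using a fixed radius around the site and averages each subset. The fixed radius, the function name, and the report values are assumptions made for this example; the approach described above estimates the areas of interest automatically, which is not reproduced here.

```python
import math
from statistics import mean

def contextualized_rsrp(reports, site_xy, center_radius_m):
    """Average RSRP separately for the cell center and the cell edge.

    reports: iterable of (x_m, y_m, rsrp_dbm) located measurement reports.
    Reports within center_radius_m of the site count as cell center.
    """
    center, edge = [], []
    for x, y, rsrp in reports:
        dist = math.hypot(x - site_xy[0], y - site_xy[1])
        (center if dist <= center_radius_m else edge).append(rsrp)
    return {
        "rsrp_center_dbm": mean(center) if center else None,
        "rsrp_edge_dbm": mean(edge) if edge else None,
    }

# Illustrative located reports (e.g., from UEs or MDT traces).
located_reports = [
    (1.0, 1.0, -70.0),
    (2.0, 0.0, -72.0),
    (30.0, 0.0, -95.0),
    (0.0, 40.0, -105.0),
]
indicators = contextualized_rsrp(located_reports, site_xy=(0.0, 0.0),
                                 center_radius_m=10.0)
```

Tracking such per-area values over time yields the time-series indicators that a classifier can consume alongside the classic cell-level metrics.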
Fig. 4: Comparison between the diagnosis error rate (DER) obtained by classic and location-enriched contextualized metrics in an ultra-dense scenario, using k-nearest neighbors (KNN), discriminant analysis classification (disc) and multiclass error-correcting output codes classification (ECOC).

V. CONCLUSION
This paper has presented a new system architecture for the provision of location-based analytics as a service, which will enable a plethora of new people-centric and network-centric applications for 5G verticals. The proposed system architecture is an augmentation of the 5G architecture, where network and user data from heterogeneous technologies are combined to extract on-demand analytics that can serve third party applications and can be used to optimize the network performance. Example analytics for case studies involving people grouping, mobility clustering, network optimization, as well as network diagnosis have been illustrated, showing the effectiveness of the proposed architecture.
VI. ACKNOWLEDGMENT
This work was supported by the European Union's Horizon 2020 research and innovation programme under Grant no. 871249. The pilot study in Gold Coast is conducted with NEC Australia.