A Survey of Enabling Technologies for Network Localization, Tracking, and Navigation

Location information for events, assets, and individuals, mostly focusing on two dimensions so far, has triggered a multitude of applications across different verticals, such as consumer, networking, industrial, health care, public safety, and emergency response use cases. To fully exploit the potential of location awareness and enable new advanced location-based services, localization algorithms need to be combined with complementary technologies including accurate height estimation, i.e., three dimensional location, reliable user mobility classification, and efficient indoor mapping solutions. This survey provides a comprehensive review of such enabling technologies. In particular, we present cellular localization systems including recent results on 5G localization, and solutions based on wireless local area networks, highlighting those that are capable of computing 3D location in multi-floor indoor environments. We overview range-free localization schemes, which have been traditionally explored in wireless sensor networks and are nowadays gaining attention for several envisioned Internet of Things applications. We also present user mobility estimation techniques, particularly those applicable in cellular networks, that can improve localization and tracking accuracy. Regarding the mapping of physical space inside buildings for aiding tracking and navigation applications, we study recent advances and focus on smartphone-based indoor simultaneous localization and mapping approaches. The survey concludes with service availability and system scalability considerations, as well as security and privacy concerns in location architectures, discusses the technology roadmap, and identifies future research directions.


I. INTRODUCTION
LOCALIZATION, tracking, and navigation systems are attracting growing attention from researchers, engineers, and practitioners due to the consumer penetration of high-end, sensor-rich mobile devices and the ubiquity of wireless communication networks. These systems span different application domains from customer-centric location-based services, such as mobile advertising and behavioral retail analytics, to resource allocation in wireless networks, to emergency call positioning, such as E911 in the U.S.A. and E112 in the EU.
In particular, cellular network operators have a strong interest in localization technology mainly due to their needs for network planning and optimization. For instance, identifying traffic hotspots (i.e., crowded areas where network capacity is insufficient during peak hours and/or public events) and poor coverage areas, as well as performing root cause analysis of call drops, failed Hand-Overs/Offs (HO), and low key performance or quality indicators, is critical for ensuring uninterrupted service, fast recovery from undesirable network conditions, and, ultimately, improved end-user experience. Moreover, location information is important for optimizing small and macro cell deployment to address the increasing needs of cell phone users. Combined with user mobility classification (i.e., static, walking, motorway, railway), location information allows operators to improve network efficiency through load balancing, transmission scheduling, etc.
While Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS), are the default solution for outdoor localization with clear sky view, there is no prevailing technology for GNSS-deprived areas, including densely built city centers, urban canyons, and importantly deep inside buildings, where satellite signals are severely attenuated or totally blocked, and affected by multipath propagation. As statistics indicate that people spend most of their time inside buildings [1] and the majority of cellular calls and data connections originate from indoors, there is an increasing demand for highly accurate indoor localization systems. Especially for emergency response services, the U.S.A. Federal Communications Commission (FCC) issued stringent requirements in February 2015, asking network operators to provide a 50-meter horizontal accuracy incrementally for 40%-80% of emergency calls within 2-6 years, and to propose a vertical accuracy metric to be approved and complied with within 6 years [2].
In fact, vertical accuracy is critical for realizing the vision for 3D location, especially inside multi-floor buildings and skyscrapers in modern city centers. For instance, it is more helpful for emergency responders to know the correct floor where an emergency call was initiated (even if the estimated user location is several meters away from the true location), rather than being directed to the wrong floor (even if the estimated user location is exactly below the true location on the lower floor). In 2013, the FCC Communications Security, Reliability and Interoperability Council (CSRIC) documented several emerging indoor location technologies [3] and reported extensive accuracy results of a number of commercial systems during localization of more than 13,400 test E911 calls across 19 buildings [4]. The great interest in performance evaluation of indoor localization systems under real-life conditions is also evident from newly released standards [5] and related competitions, including the Microsoft Indoor Localization Competition [6], [7], the EvAAL contest for evaluating Ambient and Assisted Living (AAL) systems through competitive benchmarking [8], [9], and most recently PerfLoc competition for smartphone indoor localization applications announced by the U.S.A. National Institute of Standards and Technology (NIST) [10].
In order to meet the requirements for next generation 2D/3D location several enabling technologies need to be advanced and tested in the field, including accurate height prediction (also known as floor determination) in modern multi-floor residential and enterprise buildings, reliable user mobility estimation, and time-efficient mapping of the physical space in indoor environments. Moreover, with the pervasive penetration of Internet of Things (IoT) applications into daily life, the importance of solutions that have been studied in Wireless Sensor Networks (WSN), such as multi-hop range-free localization, is increasing. Also, upcoming technologies and industry trends in the context of 5G communication networks necessitate the use of enhanced localization techniques to attain not only high 2D/3D location accuracy, but also address new challenges, such as availability, scalability, security, and privacy [11]. These solutions require advanced signal processing methods, hybridization of existing techniques, and intelligent information fusion algorithms to fully exploit location-dependent signals from cellular 2G/3G/4G, and Wireless Local Area Networks (WLANs).
In fact, WLAN is a promising localization technology, not only due to the ubiquitous infrastructure and the ease of collecting WLAN Received Signal Strength (RSS) measurements on Wi-Fi enabled mobile devices, but importantly because it is provisioned by the 3rd Generation Partnership Project (3GPP), which is a mobile communications industry collaboration leading and driving the development of mobile communications standards [12], [13]. This opens the way for bringing Wi-Fi localization technology out of the lab, or out of small and medium scale deployments, into large scale application scenarios in the field, complementing and, under certain circumstances, improving the accuracy of 3GPP standardized cellular localization methods.
There are a number of relevant surveys, including [11], [14]-[24], that focus on a subset of the above areas and/or partially cover the related methods and algorithms. A general survey of techniques employing time, angle, and signal strength measurements is presented in [11] and [14]-[17]. Brief reviews on enabling wireless technologies for localization are available in [15] and [18], and an overview of localization methods is presented in [11], [14], [16], and [19]. Theoretical analysis on principles of time-based localization and Non-Line-of-Sight (NLoS) mitigation algorithms for different wireless systems including but not limited to cellular networks was conducted in [16].
In [18], commercial indoor positioning systems based on different technologies, e.g., infrared, ultrasound, Radio Frequency IDentification (RFID), WLAN, Bluetooth, Ultra-Wide Band (UWB), magnetic, vision-based, and audible sound, were discussed along with their corresponding architectures and localization methods. This work also provided evaluation criteria for indoor positioning systems, and the comparisons among the commercial systems were conducted in terms of security, cost, accuracy and precision, robustness, user preference, commercial availability, and limitations. Dardari et al. conducted a comprehensive review on indoor tracking methods including Bayesian filtering, distributed and cooperative tracking, fingerprinting, Simultaneous Localization and Mapping (SLAM), and data fusion [15]. In [11], challenges and pitfalls of each localization system based on radio frequency and inertial sensors were analyzed, and the authors explored recent applications of learning algorithms to localization.
Due to the attractiveness of WLAN-based localization, emerging fingerprinting methods for indoor localization were reviewed in the survey papers [11], [20]-[22]. He and Chan [20] provided a comprehensive overview of signal strength fingerprint-based methods and a detailed survey on advanced methods for indoor localization with the focus on efficient system deployment. Solutions that rely on fingerprints for outdoor localization are covered in [21], considering not only signal strength measurements from WLAN but also data gathered from other available sensors, like accelerometer, microphone, compass, and even daily patterns of usage, to identify unique signatures that can locate a device. The work in [22] focused on localization methods with the use of measurements available on smartphones. Deterministic and probabilistic signal strength fingerprint matching algorithms for WLAN-based localization were discussed, and approaches for mitigation of signal strength changes were reviewed in [20]. The authors also provided a brief survey on Bluetooth beacon positioning, magnetic-field fingerprinting, and map-aided methods, which are complementary to signal strength fingerprinting. Another focus of [20] is in-depth reviews on Pedestrian Dead-Reckoning (PDR) algorithms, classified into walk detection and step counting, step length estimation, and walking direction estimation.
Reviews on localization methods in WSN were carried out in the works [14], [17], [23], [24]. Mao et al. [17] conducted a general survey on localization for static WSN, in which range-free and distance-based algorithms operating in a multi-hop fashion were discussed. Features and localization algorithms for mobile WSN were reviewed in [14]. Han et al. [24] overviewed range-based and range-free localization algorithms for mobile WSN and provided a comparative survey with respect to mobility models and path planning schemes. Theoretical foundations on cooperative localization algorithms for WSN were established in [23].
With respect to the above literature, the objective of this survey is to provide a timely and comprehensive overview of enabling technologies for localization, tracking, and navigation¹ in wireless networks. The key contributions include:
• Overview of industrial and commercial systems developed by technology vendors and/or network providers, in addition to academic solutions, so that readers are educated on both state-of-the-art research and well-established solutions used in practice.
• Discussion of 5G localization and related challenges as a promising research direction.
• Discussion of advanced solutions for range-free localization in WSN where multi-hop paths for node pairs experience different local properties, referred to as network anisotropy.
• Description of solutions for vertical positioning in cellular and WLAN networks, with or without the aid of sensors, to achieve the vision for accurate and reliable 3D location.
• Presentation of mobility state estimation methods in cellular networks for improving the performance and user experience in tracking and navigation applications.
• Discussion of solutions for mapping the indoor physical space, additionally covering ongoing work in standardization bodies as well as map-related challenges, such as indoor space modeling, privacy, security, and map representation issues, beyond the simultaneous localization and mapping systems presented in existing surveys.
In particular, we discuss solutions and algorithms in the following areas:
• Cellular network localization: Recent academic systems that rely mostly on Global System for Mobile communication (GSM) networks [25]-[28], commercial location platforms and industry geolocation solutions intended for 2G/3G/4G networks [29]-[38], as well as new research directions and recent results in 5G localization [39]-[56].
• WLAN-based localization: Web and mobile geolocation services that are based either on private [57]-[59] or publicly available Wi-Fi databases [60]-[62], state-of-the-art academic solutions that rely on Wi-Fi technology either standalone or in combination with inertial sensors and/or statistical filtering while addressing the challenges of the fingerprint-based approach [63]-[88], solutions that follow the lateration approach [89]-[93], and systems leveraging crowdsourcing to construct radio signal databases for localization [94]-[113].
• Range-free localization in WSN: Solutions exploiting connectivity information between radio nodes (devices), rather than energy prohibitive distance (range) measurements, which are a viable option for localization in IoT use cases [114]-[130].
• Data fusion: Techniques for combining diverse types of location-dependent measurements that are applicable in wireless networks [131]-[137].
• Vertical positioning: Cellular-based solutions in academic and industrial research [138]-[143] as well as commercial 3D location systems [144]-[146], WLAN-based approaches [147]-[153], and sensor-based solutions [154]-[165].
• Mobility state estimation: User mobility classification (i.e., static, walking, motorway, railway, etc.) using GPS location [166], signal power measurements [167]-[171] and HO information [172]-[176] in cellular networks.
• Indoor mapping: Processing raster images, architectural floor plans, or photographs of evacuation plans [177]-[180], and more sophisticated SLAM approaches developed on light-weight mobile devices, e.g., smartphones and tablets, which can output both the physical indoor map and the signal map (cellular, Wi-Fi, etc.) of the building [181]-[188] or only the physical map [189]-[197]. In addition, map-related challenges are discussed including indoor space modeling [198]-[203], as well as privacy, security and map representation issues [110], [177], [204]-[206].
¹ The terms localization and positioning are used interchangeably to denote the process of determining the location (position) of a device; tracking is the process of monitoring moving users or objects that is not merely the result of sequential localization, but more advanced spatio-temporal processing; navigation is the process of monitoring and guiding a user/device from one place to another.
This survey is structured as follows. Fundamental network localization techniques are briefly discussed in Section II. Section III overviews the most typical localization architectures and presents the 4G Long Term Evolution (LTE) positioning architecture as a case study. Localization in cellular networks is surveyed in Section IV focusing on commercial 2D solutions that are extensively used by network operators and recent research results on 5G localization. WLAN-based localization systems are presented in Section V, followed by range-free localization schemes for WSN in Section VI. Data fusion techniques for combining multi-source (i.e., radio and sensors) measurements are described in Section VII. The provision of reliable vertical positioning to achieve the vision for 3D location, particularly in cellular and WLAN systems, is discussed in Section VIII. Moving to solutions for supporting tracking and navigation applications, Section IX discusses mobility state estimation in cellular networks, while recent advances in mapping the physical space in indoor environments are described in Section X. Section XI summarizes some architecture considerations related to service availability, system scalability, as well as security and privacy, followed by our outlook on the technology roadmap. Finally, future research directions are provided in Section XII.

II. NETWORK LOCALIZATION FUNDAMENTALS
Looking at the wide availability and adoption of car and pedestrian hand-held navigation systems equipped with multi-constellation satellite receivers, it is evident that GNSS is the dominant technology for outdoor localization, tracking, and navigation applications. Modern GNSS are capable of delivering localization accuracy within a few meters under ideal environmental conditions; however, they have a number of limitations. For instance, even high-end GNSS receivers have high energy requirements, which is undesirable for battery-powered devices, such as sensor nodes. Moreover, GNSS receivers suffer from high time-to-first-fix, i.e., it may take several seconds or even minutes to detect and lock enough satellite signals to determine user location when the GNSS receiver is turned on, which may be prohibitive for delay-sensitive application scenarios. In addition, on top of their well known accuracy degradation in urban and indoor environments, still not all devices feature GNSS chipsets while several applications, including environmental monitoring and weather forecasting among others, do not require the high accuracy delivered by GNSS.
For these reasons, there is room for the development of alternative localization systems that rely on wireless communication networks, including cellular, WLAN, and WSN. Such systems may be preferable in specific application scenarios because they either leverage the existing network infrastructure, e.g., cellular Base Stations (BS) and WLAN Access Points (AP), or the network can be cheaply and easily deployed in the target area, as in the case of WSNs. Moreover, network localization systems do not require the installation of dedicated and expensive hardware, e.g., custom transceivers, antennas, and cabling, or privacy-infringing equipment such as surveillance cameras.
In this section, we outline the main techniques for user localization using information about the network topology (e.g., known location of network transmitters, antenna orientation, etc.) and/or measurements from radio signals in wireless networks, followed by a brief discussion of classical algorithms for estimating location. Fundamental localization techniques are illustrated in Fig. 1 and can be categorized as follows based on the underlying location-dependent measurements.
1) Proximity: The user location is estimated as the known location of the transmitter that is associated with the user-carried equipment, i.e., Mobile Station (MS). A representative method in this category is the Cell-ID method standardized in GSM cellular systems that returns the location of the serving BS as the user location. A similar approach is followed in RFID and Bluetooth-based systems where the location of the closest transmitter is assumed. A known shortcoming of this approach is that the location accuracy depends on the density of network transmitters, ranging from a few tens of meters in urban small-cell networks to several hundreds of meters in rural macro-cell network deployments.
2) Angle of Arrival: The Angle of Arrival (AOA) technique uses simple geometric relationships to estimate the user location at the intersection of lines formed by measuring the arrival angles of radio signals exchanged between the MS and multiple BSs. This technique is known as triangulation. AOA can be measured with the aid of directive antennas or antenna arrays, while a minimum of two BSs are required to determine location in 2D.
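The geometric idea behind AOA can be illustrated with a minimal sketch that intersects two bearing lines in 2D. The function name `triangulate_aoa` and the angle convention (radians, counter-clockwise from the +x axis, pointing from each BS towards the MS) are assumptions made for this illustration and are not taken from any standard; measurement noise, which in practice prevents the lines from intersecting exactly at the true location, is ignored.

```python
import math


def triangulate_aoa(bs1, theta1, bs2, theta2):
    """Estimate a 2D position from two arrival angles.

    bs1, bs2: (x, y) base station coordinates in meters.
    theta1, theta2: bearings in radians, counter-clockwise from the +x
    axis at each BS (an assumed convention for this sketch).
    """
    x1, y1 = bs1
    x2, y2 = bs2
    d1x, d1y = math.cos(theta1), math.sin(theta1)
    d2x, d2y = math.cos(theta2), math.sin(theta2)
    # Solve bs1 + t1*d1 = bs2 + t2*d2 for t1 with Cramer's rule.
    det = d2x * d1y - d1x * d2y
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; no unique intersection")
    t1 = (d2x * (y2 - y1) - d2y * (x2 - x1)) / det
    return (x1 + t1 * d1x, y1 + t1 * d1y)


# Two BSs on the x axis observing an MS located at (5, 5).
estimate = triangulate_aoa((0.0, 0.0), math.pi / 4, (10.0, 0.0), 3 * math.pi / 4)
print(estimate)  # approximately (5.0, 5.0)
```

The sketch does not check that the intersection lies in front of both antennas; with more than two BSs, a least-squares intersection would typically replace the exact solve.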
3) Signal Strength: RSS readings observed at the MS can be employed to estimate the corresponding distances from surrounding BSs through mathematical models (known as path loss models) that describe signal attenuation as a function of distance. Each distance defines a circle on which the user may reside and essentially the user location can be inferred from the intersection of circles. RSS measurements from at least three BSs are required to resolve the 2D user location unambiguously and in this case the technique is known as trilateration; the term multilateration is used if more measurements are available.
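As a hedged illustration of RSS-based lateration, the sketch below inverts a log-distance path loss model, RSS(d) = P0 - 10 n log10(d / d0), and then solves the linearized circle equations in the least-squares sense. The model parameters (`p0_dbm`, `n`, `d0`) and all function names are hypothetical values chosen for the example; in practice they must be calibrated for the environment.

```python
import math


def rss_to_distance(rss_dbm, p0_dbm=-40.0, n=3.0, d0=1.0):
    """Invert the log-distance path loss model (parameters are assumed)."""
    return d0 * 10 ** ((p0_dbm - rss_dbm) / (10.0 * n))


def lateration(anchors, dists):
    """Least-squares 2D position from >= 3 anchors and estimated distances.

    Subtracting the first circle equation from the others gives a linear
    system in (x, y), solved here through its 2x2 normal equations.
    """
    (x0, y0), r0 = anchors[0], dists[0]
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (xi, yi), ri in zip(anchors[1:], dists[1:]):
        ax, ay = 2.0 * (xi - x0), 2.0 * (yi - y0)
        rhs = r0 ** 2 - ri ** 2 + xi ** 2 + yi ** 2 - x0 ** 2 - y0 ** 2
        a11 += ax * ax
        a12 += ax * ay
        a22 += ay * ay
        b1 += ax * rhs
        b2 += ay * rhs
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-9:
        raise ValueError("anchors are collinear or too few")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Noise-free example: three BSs and an MS at (3, 4).
bss = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_d = [math.hypot(3.0 - x, 4.0 - y) for x, y in bss]
rss = [-40.0 - 30.0 * math.log10(d) for d in true_d]  # forward model, n = 3
print(lateration(bss, [rss_to_distance(r) for r in rss]))  # close to (3.0, 4.0)
```

With noisy RSS the recovered distances are biased, so the circles no longer meet at a point and the least-squares solution only approximates the true location.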
The lateration approach can be affected by the inherent inaccuracy of the path loss model, as well as NLoS conditions and multipath propagation due to signal reflection and diffraction on obstacles especially in complex urban and indoor environments. This may introduce large errors in the estimated distances, thus leading to inaccurate user location.
Fingerprint matching, also known as scene analysis, is an increasingly popular technique to address the above limitations by collecting location-tagged signal signatures (i.e., fingerprints) at known locations and storing them together with the associated location information in a database, commonly known as radiomap. Location can be determined by finding the best match between the fingerprint observed at the MS and the fingerprints in the radiomap through pattern recognition methods. In this case, higher accuracy can be attained at the expense of data collection time and effort for populating the radiomap to cover the target area.
4) Time: Time of Arrival (TOA) can be measured when a signal is transmitted by the MS and received at multiple BSs to estimate the distances from the corresponding BSs by multiplication with the speed of light. Therefore, each TOA measurement provides a circle and the lateration approach described previously can be employed to determine location. Alternatively, Time Difference of Arrival (TDOA) can be measured when the transmitted signal is received at multiple pairs of BSs. A TDOA measurement defines a hyperbola, instead of a circle, on which the user may reside, with its foci located at the two BSs. Typically, one of the BSs is taken as reference and used to obtain TDOA measurements from the remaining BSs. Contrary to TOA, the exact time of signal transmission is not required, which removes the need for precise clock synchronization between the MS and the BSs (the BSs themselves must still be synchronized with each other).
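To make the TDOA formulation concrete, the following sketch fits a 2D position to time differences measured against a reference BS using a few Gauss-Newton iterations. It is only one of several possible solvers and is not the method of any specific system surveyed here; the function name `tdoa_locate`, the starting point, and the iteration count are assumptions for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def tdoa_locate(anchors, tdoas, start, iters=20):
    """Gauss-Newton fit of a 2D position to TDOA measurements.

    anchors[0] is the reference BS; tdoas[i - 1] is the arrival time at
    anchors[i] minus the arrival time at anchors[0], in seconds.
    The iterate must not coincide with a BS (division by zero).
    """
    x, y = start
    x0, y0 = anchors[0]
    for _ in range(iters):
        r0 = math.hypot(x - x0, y - y0)
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (xi, yi), tau in zip(anchors[1:], tdoas):
            ri = math.hypot(x - xi, y - yi)
            res = (ri - r0) - C * tau  # predicted minus measured range difference
            jx = (x - xi) / ri - (x - x0) / r0  # d(res)/dx
            jy = (y - yi) / ri - (y - y0) / r0  # d(res)/dy
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * res
            g2 += jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            raise ValueError("degenerate geometry")
        x -= (a22 * g1 - a12 * g2) / det
        y -= (a11 * g2 - a12 * g1) / det
    return (x, y)


# Noise-free example: four BSs on a 100 m square, MS at (30, 60).
bss = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
toa = [math.hypot(30.0 - bx, 60.0 - by) / C for bx, by in bss]
tdoas = [t - toa[0] for t in toa[1:]]
print(tdoa_locate(bss, tdoas, start=(50.0, 50.0)))  # close to (30.0, 60.0)
```

Only differences of arrival times enter the residual, so the unknown transmission time cancels; the BS clocks, however, are implicitly assumed to share a common time base.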
5) Hybrid: The above types of measurements can be used in combination to build hybrid location systems that either improve the localization accuracy compared to stand-alone systems due to the additional location-related information or offer a fall-back solution in case of lack of a specific type of measurements thus increasing system availability in challenging scenarios.
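Among the techniques in this section, fingerprint matching is the one that does not reduce to a closed-form geometric solve, so a small sketch may help. The example below uses a deterministic k-nearest-neighbor match in signal space, which is one common choice rather than the only one; the radiomap layout, the RSS values, and the function name `knn_locate` are made up for illustration.

```python
import math


def knn_locate(radiomap, observed, k=3):
    """Average the locations of the k radiomap entries closest in signal space.

    radiomap: list of ((x, y), [rss_ap1, rss_ap2, ...]) entries in dBm.
    observed: RSS vector measured at the MS, same AP order as the radiomap.
    """
    ranked = sorted(radiomap, key=lambda entry: math.dist(entry[1], observed))
    nearest = ranked[:k]
    x = sum(loc[0] for loc, _ in nearest) / len(nearest)
    y = sum(loc[1] for loc, _ in nearest) / len(nearest)
    return (x, y)


# Hypothetical radiomap: four reference locations, two APs.
radiomap = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((10.0, 0.0), [-70.0, -40.0]),
    ((0.0, 10.0), [-55.0, -80.0]),
    ((10.0, 10.0), [-80.0, -55.0]),
]
print(knn_locate(radiomap, [-41.0, -69.0], k=1))  # (0.0, 0.0)
```

Probabilistic matching, which models the RSS distribution at each reference location instead of a single vector, follows the same lookup structure with a likelihood in place of the Euclidean distance.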

III. LOCALIZATION ARCHITECTURES
To successfully deploy an indoor positioning solution in a commercially viable way, careful consideration needs to be given to the technical system architecture. Architectural considerations are closely related to the use case to be served with the indoor positioning solution. While architectures can be classified from various perspectives, the categorization selected here is based on the entity responsible for the location estimation. The following summarizes the three most typical architectures and discusses their applicability for different use cases.

A. UE-Based Architecture
In this architectural model the mobile device, denoted User Equipment (UE), is responsible for the location estimation [207]. The device performs the location estimation using the assistance data it receives from the network. This assistance data can, e.g., be information regarding the radio node locations. More complex variants may include, for example, spatial signal strength profiles for each radio node. The UE-based model is best suited for scenarios where the device itself needs to be location aware and there is a large user base. An exemplary design approach is to serve per-building signal maps, also known as radiomaps, from a Content Delivery Network (CDN) to the devices that want to locate themselves in the building. The CDN-based delivery guarantees global low-cost, low-latency delivery capability with failovers for high availability. As the only network interaction required is the download of the building radiomap, the operational costs of the service remain under control and do not scale unpredictably with the increasing number of service users. Also, because only assistance data is carried over the network and no information regarding the user location is exchanged between the client and the server, the security measures need not necessarily be as rigorous. In particular, performing the location estimation in the device itself preserves the user privacy.

B. UE-Assisted Architecture
In this scenario the role of the device to be located is to perform measurements and provide them to the network entity for location estimation [207]. A simple example is an IoT device that measures the Wi-Fi and/or Bluetooth signal strengths and provides them to the network element via Wi-Fi or Bluetooth Mesh connectivity. The network element is responsible for the location estimation.
The UE-assisted approach is well-suited for cases in which the device to be located does not need the location information. An example of such a use case is object tracking. When tracking the movement of deliveries, the delivery itself does not need to know its whereabouts, but the location information is of interest to the process controllers. The operational costs of the UE-assisted approach are potentially much higher than those of the UE-based architecture. This is because a server transaction is required for each location event. While the amount of data transferred per event is not large, the sheer volume of the objects to be tracked may result in the net amount of transactions being very high. Moreover, the operational costs increase unpredictably as the number of transactions is related to the usage of the service and not directly to the number of users. On the other hand, the complexity and the cost of the tracking device itself can be kept low as it does not need to be able to perform any calculations or have memory for storing radiomaps.

C. Network-Based Architecture
In this approach the network itself is responsible for the measurements and the location estimation. To exemplify, a building might be equipped with Bluetooth sniffers that detect advertisement packets from Bluetooth devices. The sniffers can, e.g., estimate the distance and direction to the devices and through this process estimate their location. When many such sniffers detect the same device, the location estimate can be made even more accurate.
What makes the network-based architecture appealing is its passiveness. In the previous example, nothing is required of the devices to be located except for the Bluetooth radio. However, the complexity lies in deploying the sniffers, as they need to be powered and require network connectivity. Moreover, as the sniffers need to detect each device to be positioned, there is an upper limit to the number of devices the system can track.

D. Case Study: LTE Positioning Architecture
In this section we outline the positioning architecture for 4G LTE networks based on the introduction of the LTE Positioning Protocol (LPP) [12] and the LTE Positioning Protocol annex (LPPa) [13] by 3GPP. In particular, LPP aims to define Assisted GNSS (A-GNSS) and cellular positioning for 4G networks, i.e., LTE and LTE-Advanced (LTE-A).  LPP is natively a control plane positioning protocol. With control plane implementations, most commonly used in emergency services, positioning messages are exchanged between the network and the UE, i.e., the LTE device to be positioned, over the signaling connection [208]. The LTE location architecture is shown in Figure 2, where the Evolved Serving Mobile Location Center (E-SMLC) is the component in charge of positioning activities, which resides at the Location Service (LCS) server. The Mobility Management Entity (MME) gives the positioning request to the E-SMLC, which then controls the UE and, possibly, LTE base stations (denoted eNodeBs or eNBs), to perform positioning [209]. Depending on the application scenario, the UE location information can then be forwarded back to a requesting LCS client through the Gateway Mobile Location Center (GMLC).
On the other hand, user plane positioning over LTE uses the data traffic link to transmit positioning information, and is enabled by the Secure User Plane Location (SUPL) protocol proposed by a group of companies in the Open Mobile Alliance (OMA) [210]. SUPL 2.0 supports positioning over LTE as well as 2G and 3G networks, and provides a common user plane platform for all air interfaces. On the network side, SUPL is enabled by an entity called the SUPL Location Platform (SLP) at the LCS server. The SLP handles SUPL messaging, and is typically able to interface with the E-SMLC for obtaining assistance data. SUPL messages are routed over the data link via the Packet Gateway (P-GW) and the Serving Gateway (S-GW) entities, as shown in Figure 3.
Different localization methods for LTE are defined in the 3GPP standard, including A-GNSS, Enhanced Cell-ID, and Observed Time Difference of Arrival (O-TDOA) in Release 9. Enhanced Cell-ID improves the accuracy compared to the traditional Cell-ID method that localizes the user at the location coordinates of the BS. This is feasible using sector-cell information, where a particular Cell-ID corresponds to a directional antenna, e.g., 3-sector and 6-sector cells, that places the user inside a circular sector of 120° or 60° width centered at the main antenna beam. Moreover, timing information similar to Timing Advance (TA) in GSM and Round Trip Time (RTT) in Universal Mobile Telecommunications System (UMTS) networks places the user inside a ring of variable width depending on the resolution of the timing information. This can be further refined with AOA information.
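The way sector and timing information jointly constrain the user location in Enhanced Cell-ID can be sketched as the intersection of a circular sector and a ring. The code below is an illustration under stated assumptions, not the standardized procedure: the function names are hypothetical, the ring for timing index k is assumed to span [k * step, (k + 1) * step], azimuths are measured counter-clockwise from the +x axis, and the step of 550 m used in the example only approximates the commonly quoted GSM TA granularity.

```python
import math


def ecid_ring(bs, azimuth_deg, ta_index, step_m):
    """Return (inner, outer, estimate) for a timing ring around the BS.

    The point estimate is placed at mid-ring along the antenna boresight.
    """
    inner = ta_index * step_m
    outer = (ta_index + 1) * step_m
    mid = 0.5 * (inner + outer)
    az = math.radians(azimuth_deg)
    estimate = (bs[0] + mid * math.cos(az), bs[1] + mid * math.sin(az))
    return inner, outer, estimate


def in_region(point, bs, azimuth_deg, width_deg, inner, outer):
    """Check whether a point lies inside the sector-ring intersection."""
    dx, dy = point[0] - bs[0], point[1] - bs[1]
    rng = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular offset from boresight, wrapped to [-180, 180).
    off = (bearing - azimuth_deg + 180.0) % 360.0 - 180.0
    return inner <= rng <= outer and abs(off) <= width_deg / 2.0


# 3-sector cell (120 degree sector), boresight along +x, timing index 2.
inner, outer, est = ecid_ring((0.0, 0.0), 0.0, 2, 550.0)
print(inner, outer, est)  # 1100.0 1650.0 (1375.0, 0.0)
print(in_region((1200.0, 100.0), (0.0, 0.0), 0.0, 120.0, inner, outer))  # True
```

A finer timing resolution narrows the ring, and an additional AOA measurement would narrow the angular extent in the same way.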
O-TDOA follows the same principle as in UMTS systems where the time difference of signals transmitted by two BSs is observed (i.e., measured) at the UE, thus defining a hyperbola on which the user is located; see [211] for more details. Known issues with O-TDOA are the need for TOA measurements from multiple BSs; the interference between the serving BS and the neighboring BSs, whose signals are usually weak; the synchronization of the BSs' clocks; and the fact that synchronization signals are not suitable for positioning. These issues are addressed in LTE by considering Reference Signal Time Difference (RSTD) measurements, which are based on dedicated Positioning Reference Signals (PRS). In particular, PRS are transmitted on non-overlapping sub-carriers, while no data is included in PRS sub-frames. Moreover, PRS frames are transmitted with higher power than data frames, and transmissions are muted to prevent collisions. Finally, the network BSs are synchronized and different seeds are used for the PRS random code [212].
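To give a feel for the scale involved, the short sketch below converts a time difference expressed in LTE basic time units (Ts = 1 / (15000 × 2048) s) into the range difference that parameterizes the corresponding hyperbola. The function name is hypothetical, and the sketch assumes the input is already a time difference in units of Ts; the quantization and reporting format of RSTD defined in the 3GPP specifications is not modeled.

```python
C = 299_792_458.0            # speed of light in m/s
TS = 1.0 / (15000 * 2048)    # LTE basic time unit in seconds (about 32.55 ns)


def rstd_to_range_difference(rstd_ts):
    """Convert a time difference in units of Ts to a range difference in meters."""
    return C * rstd_ts * TS


# One Ts of timing error corresponds to just under 10 m of range difference.
print(round(rstd_to_range_difference(1), 2))  # 9.76
```

This is why timing resolution and BS synchronization accuracy translate so directly into positioning error for O-TDOA.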
O-TDOA is a downlink positioning method, which is inherently UE-assisted. In contrast, the Uplink Time Difference of Arrival (U-TDOA) method is also defined in the LTE standard (Release 11), where the arrival time of a signal transmitted by the UE to a number of Location Measurement Units (LMU) is measured at the network side as shown in Figure 2, thus making U-TDOA a network-based method [213].
Interestingly, the LTE standard (Release 13) makes provision for additional methods, including barometric sensor, Wi-Fi, Bluetooth, and Terrestrial Beacon System (TBS) that is discussed later in Section VIII-A. In line with these developments, OMA released in 2014 the specification of a positioning protocol that builds on top of LPP referred to as OMA LPP extensions (LPPe) [214]. LPPe protocol supports positioning with the aid of Wi-Fi as well as short range nodes, e.g., Bluetooth and Bluetooth Low Energy (BLE) tags or beacons, RFID tags, Near Field Communication (NFC), etc., which can be augmented with the use of Inertial Measurement Unit (IMU) sensors integrated into the UE, such as accelerometer and gyroscope, and other sensors including magnetometer and barometer. Therefore, Wi-Fi positioning methods have become fully standardized because both the 3GPP control plane solution [215] and the OMA SUPL user plane solution [210] allow for UE-assisted and UE-based Wi-Fi based positioning, via support of the OMA LPPe protocol. For a single Wi-Fi AP, the corresponding attributes include AP location Lat/Lon or civic address, transmit power, antenna gain and coverage area among others.

E. Lessons Learned
The architecture of the localization system is directly related to the target use case and a typical classification of localization architectures is based on the entity that computes the estimated user location. Three main categories can be identified, namely UE-based, UE-assisted, and network-based architectures.
In the UE-based architectural model the device performs the location estimation using location-dependent measurements available in-situ and possibly assistance data from the network (e.g., BS locations, spatial signal strength profiles, etc.). This option is preferred in application scenarios where the device itself needs location-awareness or where there is a large user (or customer) base. Its advantages include less strict security requirements, because only assistance data, and no information that can be linked to the user location, is exchanged with the network, and preservation of user privacy, as location is estimated directly by the device itself.
In the UE-assisted architecture, localization is performed on the network side with the aid of location-dependent measurements collected by the user device and forwarded to the appropriate network element. This architecture is usually selected for applications where the device does not need the location information, e.g., in object or asset tracking scenarios. One of the drawbacks is the amount of data that needs to be transferred to the network for handling location requests, which may become overwhelming as the number of objects/assets increases or the usage of the service (not proportional to the number of users) scales up. On the other hand, it removes the computational overhead from the device, which is highly desirable for resource-limited devices, e.g., sensor nodes in WSNs or connected IoT devices.
Network-based solutions compute the user/device location using only information available at the network side without any explicit communication with the device. This option is appealing for monitoring applications and has been traditionally used by cellular network operators because it exploits network measurements recorded as part of standard network operation without introducing additional communication overhead. On the downside, the cost of deploying extra infrastructure for offering a location service in a new area can be high; however, this is not an issue in the case of cellular networks or WLANs that are typically deployed with the main objective of delivering connectivity, while localization is an added value built on top.

IV. CELLULAR NETWORK LOCALIZATION
Besides the need for location information to support network planning and optimization, network operators have recently identified several scenarios for monetizing the huge volume of location data that is logged daily on the network side. These scenarios include Smart City use-cases (e.g., transit planning, traffic management, store siting, autonomous vehicles and intelligent transportation systems, public transportation optimization, and large-scale event response), public safety services (e.g., emergency response E911/E112, tracing lost children and elderly, etc.), as well as consumer and mobile gaming applications (e.g., in-shop advertisements and customer analytics, indoor PokemonGO-like applications, etc.).
In this section, we focus on solutions that compute 2D location; the more challenging 3D location calculation is discussed later in Section VIII-A. Cellular-based systems that deliver 2D location are summarized in Table I.

A. Academic Solutions
There are several efforts from the academic research community to address 2D location estimation. Some of them rely on fingerprint matching algorithms that leverage the radiomap collected prior to localization to determine location by finding the best match between the fingerprint observed by the user and the fingerprints in the radiomap; see Section V-B for more details about fingerprint matching. For instance, CellSense is a probabilistic RSS fingerprint matching location determination system for GSM phones that delivered a median error of 42.43 m and 27.86 m in a rural and an urban test-bed, respectively [25]. Chakraborty et al. [26] use semi-supervised and unsupervised machine learning techniques to reduce or eliminate the effort to collect location-tagged measurement data and report sub-100 m median localization accuracy with very little or no location-tagged data in a GSM network. CAPS (Cell-ID Aided Positioning System) uses a cell-ID sequence matching technique to estimate current position based on the history of cell-ID and GPS position sequences that match the current cell-ID sequence [27], while the reported error ranges from 31.0 m to 72.3 m for two mobile phones across four different routes through AT&T's GSM network and Verizon's Code Division Multiple Access (CDMA) network. The CTrack system uses a two-pass Hidden Markov Model (HMM) that sequences cellular GSM fingerprints directly without converting them to geographic coordinates (i.e., through a fingerprint matching algorithm), and fuses data from low-energy sensors available on most commodity smartphones, including accelerometers (to detect movement) and magnetic compasses (to detect turns) [28]. The authors report a median error of 45 m; however, CTrack employs data from the smartphone's sensors, which does not allow the solution to be applied directly for network-based localization before the LPPe protocol is fully deployed by network operators.
Another major limitation of the solutions presented in [25]- [28] is that they require a high number of BSs to be present in the observation. This is the reason for considering mostly GSM networks, where the observations may contain RSS measurements from up to seven cells (i.e., the serving and six strongest neighbor cells). However, the situation is very different in 3G and 4G networks. In fact, researchers at Sprint, U.S.A. report that more than 50% of the observations in Sprint's commercial CDMA2000 network contain only one BS [29]. To address this challenge, the authors incorporate information such as the distance to the BS, location of neighboring BSs, and levels of interference and noise into a Bayesian-based method that improves the standardized Cell-ID method (enhanced with RTT measurements) by 20%. The low number of BSs is confirmed by researchers at Alcatel-Lucent, U.S.A. who report that most of the observations in a 4G LTE commercial network contain only signal strength information from the serving cell and in some cases (depending on the network event that generated the measurement) one additional signal strength value from the strongest neighbor cell [30]. A machine learning solution is presented based on supervised training of Random Forest with labeled drive-test data to learn the signal strength values at different locations, combined with particle filter-based HMM to perform user tracking with network Measurement Reports (MR). The median error is 20 m to 25 m depending on the ratio of training to testing data.

B. Commercial Solutions
This implies that there might be a mismatch between academic research and what is observed in practice. Therefore, in the following we overview some commercially proven solutions that are currently used by network operators. In fact, the market of location platforms is dominated by some key players, including Comtech Telecommunications (former TeleCommunication Systems -TCS) with their wide range of Location-Based Services (LBS) products [31] and Ericsson with their Mobile Positioning System that supports complementary positioning methods for 2G, 3G and 4G/LTE networks [32]. They both sit in very strong positions based on their traditional place in this market and very strong market share [216].
Other companies that provide LBS platforms to network operators include Viavi, Netscout, and Groundhog Technologies. Viavi (former Arieso, and later JDSU) offers the ariesoGEO platform, which uses proprietary methods to geolocate and analyze billions of events per day, while the platform supports a wide range of infrastructure vendors and cellular access technologies [33]. In a recent field trial over a 2-week period more than 5 million voice and data calls were localized by ariesoGEO with 100 m accuracy, which is suitable for identifying hotspots to aid microcell and in-building system deployment. Netscout (former Newfield Wireless) offers the TrueCall platform featuring a geolocation engine that includes a multi-step algorithm to derive locations based on the sector (antenna) database and network measurements [34]. Their algorithm is weighted by timing information, signal strength, and uses multiple sectors to increase accuracy. Using this platform, LTE operators can assess for example channel quality index of their LTE network. As a former MIT Media Lab spin-off, Groundhog Technologies launched its mobility intelligence platform based on chaos theory and multi-dimensional modeling [35]. The application of chaos theory gave rise to the company's mathematical models of subscribers' mobility and usage behavior, which can be used for different applications such as by mobile operators to optimize networks according to the user demands.
TruePosition (merged with Skyhook) follows a U-TDOA approach to determine location based on the time it takes a signal to travel from a mobile phone to a number of LMUs [36]. They devised a method that utilizes High-Speed Uplink Packet Access (HSUPA) sessions to establish a higher handset transmit power, emulating the transmit power of a powered-up voice call placing an E911 call. In addition, their hybrid Assisted GPS (A-GPS) and U-TDOA algorithm takes both locations and returns the location that has lower uncertainty. According to field trials, the U-TDOA solution achieves positioning error of 57.1 m @67% in urban and 28.4 m @67% in suburban areas, whereas the hybrid solution achieves 48.8 m @67% in urban and 20.5 m @67% in suburban areas, respectively [217].
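The hybrid selection rule and the accuracy metric quoted above can be sketched as follows. This is an illustration only: the tuple layout `(x, y, uncertainty_m)` and both function names are hypothetical, and "@67%" is read here as the 67th percentile of the horizontal error distribution, which is our interpretation of the notation rather than a definition taken from [217].

```python
import numpy as np

def pick_lower_uncertainty(fix_a, fix_b):
    """Return whichever (x, y, uncertainty_m) fix reports the smaller uncertainty."""
    return fix_a if fix_a[2] <= fix_b[2] else fix_b

def error_at_percentile(estimates, truths, q=67):
    """Horizontal error (in the input units) not exceeded by q% of the fixes."""
    est = np.asarray(estimates, dtype=float)
    ref = np.asarray(truths, dtype=float)
    err = np.linalg.norm(est - ref, axis=1)   # per-fix Euclidean error
    return float(np.percentile(err, q))
```

Reporting a percentile rather than a mean keeps a few very large errors, which are common in dense urban areas, from dominating the figure.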
Advanced Forward Link Trilateration (AFLT) is a solution proposed by Qualcomm that uses ranging to multiple cell towers, which is also relevant to the O-TDOA location solution. Their hybrid A-GPS/AFLT location solution takes advantage of the complementary nature of the GPS satellite constellation and the terrestrial wireless network. Based on extensive field tests for localizing indoor E911 calls, the hybrid solution delivered 155.8 m, 226.8 m, 75.1 m, and 48.5 m@67% in dense urban, urban, suburban, and rural areas, respectively [4]. Obviously, this solution works best in suburban and rural areas, where satellite signals are not severely obscured.
The Radio Frequency Pattern Matching (RFPM) technology developed by Polaris Wireless uses radio frequency fingerprint matching to compare mobile measurements (RSS values, signal-to-interference ratios, time delays, etc.) against a geo-referenced database of the mobile operator's radio environment [37]. During the same performance evaluation with Qualcomm's solution, RFPM was reported to attain 116.7 m, 198.4 m, 232.1 m, and 575.7 m@67% in dense urban, urban, suburban, and rural areas, respectively [4].
Finally, Glopos offers a software-only solution for positioning, which estimates location based on signals and network parameters from cellular BSs [38]. This technology uses self-learning probabilistic models to estimate positions based on these data, referred to as Intelligent Probability Hierarchy (IPH), models of cell area and shape, and integration of signal data from neighboring cells when available. In the original network-assisted solution, all of the data can be crowdsourced, and when it is available on the phone, the system can run without Internet connectivity. In principle, however, the solution can also be used for network-side positioning of mobile devices. Glopos technology demonstrated 6 to 13 m accuracy during field trials in the Grand Gateway 66 Mall, Shanghai, China using different cellular radio access technologies [218].

C. 5G Localization
Unlike conventional macro-cell cellular networks, 5G wireless networks will provide a significantly more favorable foundation for mobile localization. In past years, various techniques based on TOA, TDOA, AOA, and signal strength have been considered for localization in cellular networks [219]. However, their accuracy was limited to tens or hundreds of meters due to the severe channel impairments (i.e., multipath, shadowing, NLoS propagation) between the BS and the mobile device, as well as insufficient bandwidth and received signal power.
Location-awareness in 5G can be used for various applications, including content prefetching, radio environment maps, proactive radio resource management, routing in the backhaul, and cognitive localization and prediction [39]. Dammann et al. [40] discuss the prospects of positioning with respect to technologies envisaged for 5G communication systems, including higher carrier frequencies, higher signal bandwidths, denser networks, MIMO technologies, etc., which will significantly increase the pseudorange estimation accuracy for signal propagation delay based positioning methods like TDOA. Waveform optimization is explored in [41] for positioning in 5G based on signal propagation delay estimation in the uplink case. In this parametric waveform approach, a scalar parameter is provided for controlling the distribution of the available signal power over the spectrum. The prospects and enabling technologies for high-efficiency device localization in 5G ultra-dense networks are discussed in [39].
Key features of 5G wireless networks consist of small-cell, Device-to-Device (D2D) communication, Heterogeneous Networks (Het-Net), massive MIMO, millimeter-wave (mmWave) communication with highly directive transmission, which have the potential to enable centimeter-level accuracy localization systems; yet, as this is a new area of study, it remains to be verified through measurement campaigns.
To date, the potential for such high accuracy has been demonstrated mostly through simulations in several recent works. For instance, the feasibility of using 5G signals in an autonomous driving scenario was studied in [42], which showed that accuracy below 30 cm can be obtained with the current 50 or 100 MHz system bandwidth. In [43], it is reported that accurate positioning performance can be achieved in cmWave-based 5G ultra-dense networks under time-varying clock errors by continuously estimating the clock parameters. The compressive sensing approach of [44] can facilitate such high accuracy in mmWave channels, as demonstrated through simulations. Centimeter-level accuracy was reported in [45] using mmWave MIMO channel measurements obtained by a vector network analyzer.
1) mmWave and Massive MIMO: In 3GPP Release 9, PRS was defined so that the TOA can be measured between BS and mobile [46]. Many mobile device and service providers are investigating the localization performance with PRS. However, the performance is still significantly limited by the bandwidth and the NLoS propagation. In 5G networks, the small-cell and D2D transmission will prevail to make the distance between the BS and mobile shorter. It is noted that the shorter distance will increase the Line-of-Sight (LoS) probability. In addition, the increasing bandwidth of the wireless networks will clearly make the localization more accurate.
Highly directive transmission with mmWave and massive MIMO is another important feature that will make TOA and AOA more reliable localization measurements [47]. In particular, mmWave transmission occupies bandwidths of up to 2 GHz with center frequencies around 20 GHz and above. Considering that currently available UWB technology uses bandwidths of around 500 MHz and 1 GHz, the large bandwidth of mmWave is a clear advantage. This will also be the case with RSS measurements. For instance, Savic and Larsson [48] investigate fingerprint matching techniques based on vectors of RSS measurements in a massive MIMO system.
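The bandwidth comparison above can be made concrete with a common rule of thumb, stated here as an assumption rather than a result from the cited works: two propagation paths can be separated in delay if they differ by roughly 1/B, which corresponds to a path-length difference of about c/B, where B is the signal bandwidth and c the speed of light.

```latex
\Delta d \approx \frac{c}{B}, \qquad
B = 500~\text{MHz} \Rightarrow \Delta d \approx 0.60~\text{m}, \quad
B = 1~\text{GHz} \Rightarrow \Delta d \approx 0.30~\text{m}, \quad
B = 2~\text{GHz} \Rightarrow \Delta d \approx 0.15~\text{m}.
```

These figures indicate multipath resolvability only; the ranging accuracy actually achieved also depends on signal-to-noise ratio, synchronization, and LoS availability.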
Talvitie et al. [44] exploit the sparsity of the mmWave channel, and employ a compressive sensing approach with iterative refinement steps for accurate estimation of the channel parameters, including the departure and arrival angles as well as the time-of-arrival for each observed propagation path, that can facilitate such high accuracy. A site-specific propagation model is used for indoor localization in 5G to exploit multipath in mmWave MIMO channels [45]. Along the same line, Channel-SLAM is a recursive Bayesian filtering approach that treats multipath components as signals emitted from virtual transmitters, thus leading to a more accurate position estimate or enabling positioning when the number of physical transmitters is insufficient [49].
In 5G mmWave systems, where high-speed mobility scenarios and massive MIMO are widely considered, angle estimation can be actively exploited for localization. However, without knowing the orientation of the device containing the antenna array, the direction of arrival or departure can be insufficient to determine the true direction of the signal in three-dimensional space. Some works look into estimating the orientation of devices in 5G scenarios. The performance bound on position and orientation estimation is derived in [50], where the potential advantage of 5G mmWave systems is shown. Guerra et al. [51] investigate the localization and orientation performance limits of networks employing wideband massive arrays both at receiving and transmitting devices for enabling mobile terminal localization using only one single AP. In addition, Abu-Shaban et al. show in [52] that the 5G mmWave system provides sub-meter localization capability by computing the Cramer-Rao lower bound.
2) Cooperative Localization: In the recent localization literature, a special emphasis is made on cooperative localization. Although it has been recently spotlighted in [220] and [23] and experimentally evaluated in [221], network cooperation in the distributed estimation has been already long studied in [222]. Localization in cooperative ad-hoc network has been shown in [223] where the TOA and AOA measurements are combined in a distributed estimation framework. The locations of nodes were updated iteratively by means of extended Kalman filter with an optimal information fusion technique. In addition, nonparametric belief propagation has appeared for localization in sensor networks in [224].
A cooperative fingerprint matching localization algorithm was shown through computer simulations to significantly improve accuracy in LTE networks [225]. It was assumed that the UE uses signal strength, i.e., RSRP in LTE systems, TA measurements for UE-eNodeB connections and RTT measurements for UE-UE connections. The performance gain is mainly because fingerprint matching is independent of LoS links and performs very well in rich multipath and NLoS environments.
Another cooperative positioning algorithm addresses the hearability problem in modern cellular networks (i.e., the mobile device can only utilize the estimated distance from its home BS) [53]. The distances from neighboring users can be estimated through RSS ranging from D2D communication and then all distance estimates are forwarded to a processing unit for centralized position estimation.
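The centralized step described above, i.e., converting RSS to distance and combining several distance estimates into one position, can be sketched as follows. This is a simplified illustration and not the algorithm of [53]: it assumes a log-distance path-loss model with hypothetical parameters `p0_dbm` and `n`, and it treats the home BS and the neighboring users as anchors with known 2D coordinates, whereas a full cooperative scheme would estimate the neighbors' positions jointly.

```python
import numpy as np

def rss_to_distance(rss_dbm, p0_dbm=-40.0, n=2.0, d0=1.0):
    """Invert the log-distance model RSS = P0 - 10 n log10(d / d0).

    p0_dbm (RSS at reference distance d0) and n (path-loss exponent)
    are illustrative values that must be calibrated per environment.
    """
    return d0 * 10.0 ** ((p0_dbm - rss_dbm) / (10.0 * n))

def multilaterate(anchors, dists):
    """Linearized least-squares 2D position from three or more anchor distances."""
    a = np.asarray(anchors, dtype=float)
    d = np.asarray(dists, dtype=float)
    # Subtracting the first range equation from the others removes |x|^2.
    A = 2.0 * (a[1:] - a[0])
    b = (d[0] ** 2 - d[1:] ** 2) + np.sum(a[1:] ** 2, axis=1) - np.sum(a[0] ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

With a single home BS only one distance is available and the position is ambiguous along a circle; the distances obtained through D2D links supply the additional equations that make the system solvable.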
A Sum-Product Algorithm over Wireless Networks (SPAWN) was shown in [220] to localize nodes in a distributed manner with the cooperation between nodes by exchanging messages containing probability density functions, and achieves significant performance gains compared to non-cooperative algorithms. In [54], cooperative self-localization and distributed tracking have been combined to localize multiple agents including non-cooperative objects. In general, Bayesian belief propagation for cooperative localization suffers from the complexity arising from exchanging messages. Meyer et al. [55] showed a sigma point belief propagation by which a low-complexity approximation can be achieved.
Recently, a cooperative localization strategy via a distributed optimization technique known as the Alternating Direction Method of Multipliers (ADMM) was introduced in [56]. In this work, the message passing algorithm is implemented in the form of ADMM, and the paper demonstrates how such scheme can be used in the cooperative driving scenario. It provides a practical, in terms of complexity, solution compared to the optimal SPAWN under the 5G autonomous driving scenarios.
Dammann et al. [40] investigate the capability of D2D communication to enable cooperative positioning in 5G for scenarios with high UE density and demonstrate through simulations for an exemplary environment that when the density is greater than 1,100 UEs per square kilometer sub-meter positioning accuracy with outage probabilities converging to zero can be achieved.

D. Lessons Learned
Academic projects for cellular network localization usually rely on signal strength measurements from a sufficiently large number of BSs. This is mainly due to the lack of access to real cellular data from network operators that necessitates the use of applications installed on the user device for collecting network data. Such data is limited to the signal strength values and BS identities, while more than one or two BSs are commonly available only for older generation GSM networks, but not for 3G/4G. On the other hand, commercial solutions developed by network operators or LBS platform vendors feature advanced capabilities and offer higher accuracy, due to the access to real network data including time, angle, and signal strength measurements, and are typically applicable to a much broader range of cellular networks.
Especially in the context of the upcoming commercial deployments of 5G networks, significantly higher localization accuracy is anticipated. This is due to higher carrier frequencies and signal bandwidth, network densification with the proliferation of small-cell installations, mmWave and massive MIMO technologies, as well as opportunities for D2D communication that enables cooperative localization. Recent results in this important new research area have demonstrated the potential for centimeter-level accuracy.

V. WLAN-BASED LOCALIZATION
With the widespread deployment of WLANs and Wi-Fi equipped devices, Wi-Fi positioning emerges as a promising location solution in areas covered by Wi-Fi signals (especially indoors). As WLANs are typically uncoordinated and deployed for individual purposes in the open industrial, scientific, and medical spectrum bands, timing information (i.e., TOA, TDOA) is rarely provided. Moreover, due to NLoS conditions and signal fluctuation, which stem from the presence and appearance/disappearance of obstacles, radio interference, and noise, the majority of Wi-Fi positioning solutions are based on RSS information. Table II summarizes WLAN-based localization and tracking solutions.

A. Geometric Approach
Originally, location-tagged vectors of RSS measurements (i.e., fingerprints) were used to build a database with approximate AP locations. The main idea is that using the RSS values of a specific AP observed at known locations, the location of the AP itself is approximated as the weighted centroid of those measured locations, e.g., a strong RSS value at location A would pull the location of the AP closer to location A, compared to a weaker RSS value of the same AP observed at another location B. Using the database of approximate AP locations, the unknown user location can be estimated as the centroid of the (approximate) locations of the APs contained in the measured fingerprint weighted by the corresponding RSS values.
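The weighted-centroid idea described above applies in both directions (AP location from tagged observations, and user location from AP locations), so a single helper suffices for a sketch. The weighting is an assumption: RSS values in dBm are negative and cannot serve as weights directly, so the example converts them to linear power in mW; deployed systems may use other weighting functions.

```python
import numpy as np

def weighted_centroid(points, rss_dbm):
    """Centroid of 2D points weighted by linear-scale received power.

    points  : (N, 2) known coordinates (observation sites or AP locations).
    rss_dbm : (N,) RSS values in dBm associated with each point.
    """
    pts = np.asarray(points, dtype=float)
    w = 10.0 ** (np.asarray(rss_dbm, dtype=float) / 10.0)  # dBm -> mW
    return (w[:, None] * pts).sum(axis=0) / w.sum()
```

For example, with two observation sites 10 m apart and RSS values of -50 dBm and -60 dBm, the estimate lies about 0.9 m from the first site, reflecting the pull of the stronger reading.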
This approach is followed by Google Maps Geolocation API, while the database is built using GPS-tagged RSS observations collected by survey cars during the imagery data collection for Google Street View project [57]. Companies including Skyhook and Navizon use the same approach. The Skyhook positioning system is a metropolitan-wide location determination system, which combines Wi-Fi based positioning system as discussed above, with cellular based positioning system, GPS and accelerometer information in order to quickly deliver accurate and reliable location information [58]. Fingerprint data are collected by professionals using a large fleet of survey cars. Apple and Samsung are using Skyhook as a location service provider. On a different line, Navizon's global positioning relies on its own global AP database with known geographic location, which is assembled and maintained by a worldwide community of over 1.2 million users [59]. This database covers most urban and sub-urban areas around the world. Navizon licenses access to the database, including location lookup or to third parties, e.g., carriers who may choose to deploy a Wi-Fi location solution. There are also some open public AP databases updated using data collected by volunteers, which provide location information to user requests through dedicated APIs. Such public databases include Wigle [60], Mozilla Location Service [61], and OpenCellID [62] which typically include locations of both cellular BSs and Wi-Fi APs computed from data collected with smartphone logging applications.

B. Fingerprint Matching
Instead of using RSS fingerprints to infer AP locations, another approach is to store those fingerprints as raw data in a database and then employ sophisticated pattern recognition algorithms to determine user location given the observed RSS values. This approach is commonly known as Wi-Fi fingerprint matching and has attracted interest of the research community after the seminal work of Bahl and Padmanabhan at Microsoft Research who introduced the RADAR location and tracking system in 2000 [63]. Since then, several research teams around the world developed work in Wi-Fi fingerprinting, and a very large number of papers have been presented at scientific events or published by scholarly journals.
The basic operation of Wi-Fi fingerprint matching is as follows. In the offline phase, a radiomap is constructed with a set of RSS fingerprints measured at Reference Points (RP), i.e., points with known coordinates, either in a global coordinate system compatible with GNSS solutions or a local Cartesian system. In the online phase, a user's RSS fingerprint is compared with those in the radiomap, and the location of the user is determined as the closest RP (or combination of closest RPs) in signal space.
Deterministic [63], [64] and probabilistic methods [65], [66] can be used for estimating the position, both with advantages and drawbacks in terms of complexity and accuracy of the position estimates. In deterministic methods, the user's RSS fingerprint is compared directly with each one of the fingerprints in the radiomap by using a similarity function (e.g., the Euclidean distance in the signal space). In the 1-Nearest Neighbour method (1-NN) the position where the most similar fingerprint in the radiomap was collected is assumed to be the best position estimate. Alternatively, in the k-Nearest Neighbour (k-NN) the k most similar fingerprints can be used to estimate the position as the (weighted) centroid of the corresponding positions. It has also been shown that using different functions and methods to compare the user's RSS fingerprint with those in the radiomap has a significant impact on the positioning accuracy [67]- [69]. In [68], several alternatives for matching fingerprints are evaluated in the context of a GSM-based underground positioning system. In this case, the Euclidean distance proved to be the function leading to a better accuracy. In a different context, that of a multi-building, multi-floor environment, [69] reports the results of an exhaustive study involving 53 alternative functions to measure the similarity between Wi-Fi fingerprints. Using k-NN, the authors concluded that the Sørensen distance function with k = 13 leads to the best results.
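A minimal sketch of the deterministic 1-NN and (weighted) k-NN estimators follows. It assumes that every fingerprint contains a value for the same ordered set of APs (missing readings would need a placeholder value), uses the Euclidean distance in signal space, and weights neighbors by inverse distance; the function name `knn_locate` and the default `k` are illustrative.

```python
import numpy as np

def knn_locate(radiomap_rss, rp_xy, observed, k=3, weighted=True):
    """Estimate position from the k reference points closest in signal space.

    radiomap_rss : (n_rp, n_ap) RSS fingerprints stored in the radiomap.
    rp_xy        : (n_rp, 2) coordinates of the reference points.
    observed     : (n_ap,) RSS fingerprint measured by the user.
    """
    rss = np.asarray(radiomap_rss, dtype=float)
    xy = np.asarray(rp_xy, dtype=float)
    dist = np.linalg.norm(rss - np.asarray(observed, dtype=float), axis=1)
    idx = np.argsort(dist)[:k]                 # k most similar fingerprints
    if not weighted:
        return xy[idx].mean(axis=0)            # plain centroid
    w = 1.0 / (dist[idx] + 1e-9)               # avoid division by zero
    return (w[:, None] * xy[idx]).sum(axis=0) / w.sum()
```

Setting `k=1` gives the 1-NN estimate; replacing the Euclidean norm with another similarity function is a one-line change, which is the design freedom explored in [67]- [69].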
With probabilistic methods, the position is estimated by computing the probability of receiving the measured RSS value at each location based on the distribution of RSS values across the operational area. During the offline phase, the histogram of RSS values measured from each AP at each RP is used to model the probability of observing a given RSS value at each location. In the online phase, the estimated position is obtained as the RP where the probability of observing the measured RSS fingerprint is the highest [66]. With this method, accuracy of 0.6 m (average error) has been reported in [66], with the advantage of being computationally lightweight.
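The histogram-based estimator can be sketched as below. This is an illustration under stated assumptions and not the exact method of [66]: RSS values from different APs are treated as independent so that per-AP likelihoods multiply, Laplace smoothing is added so that unseen RSS bins do not yield zero probability, and all names and the bin layout are hypothetical.

```python
import numpy as np

def fit_histograms(samples, bins):
    """Offline phase: per-RP, per-AP RSS histograms as probabilities.

    samples : (n_rp, n_samples, n_ap) RSS values collected at each RP.
    bins    : increasing array of bin edges in dBm.
    Returns an (n_rp, n_ap, n_bins) array.
    """
    s = np.asarray(samples, dtype=float)
    n_rp, _, n_ap = s.shape
    probs = np.empty((n_rp, n_ap, len(bins) - 1))
    for r in range(n_rp):
        for a in range(n_ap):
            counts, _ = np.histogram(s[r, :, a], bins=bins)
            # Laplace smoothing keeps every bin probability strictly positive.
            probs[r, a] = (counts + 1.0) / (counts.sum() + len(counts))
    return probs

def ml_locate(probs, bins, observed):
    """Online phase: index of the RP with the highest fingerprint likelihood."""
    obs = np.asarray(observed, dtype=float)
    b = np.clip(np.digitize(obs, bins) - 1, 0, len(bins) - 2)  # bin per AP
    loglik = np.log(probs[:, np.arange(len(obs)), b]).sum(axis=1)
    return int(np.argmax(loglik))
```

Working with log-likelihoods avoids numerical underflow when many APs are present, and the online step reduces to table lookups, which is consistent with the low computational cost noted above.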
Creating radiomaps for large buildings is a tedious and time-consuming task, as many fingerprints need to be collected manually over a large set of RPs. Moreover, it is known that the denser the set of RPs and the larger the number of fingerprints collected at each RP, the better the accuracy of the position estimates, which calls for even harder work in building the radiomaps [70], [71]. Kim et al. [70] performed an experimental study where they measured the accuracy of a positioning system for several densities of the radiomap, namely by considering a different number of fingerprints collected at each RP. They concluded that accuracy increases significantly as more fingerprints are collected. An evaluation on the impact of the radiomap density in the accuracy of the positioning system, using four different methods, was also performed in [71]. The key finding was that a reduction on the density of RPs always results in an accuracy degradation, while also showing that some methods are more robust than others to a decrease of the radiomap density.
In practice, the radiomap becomes outdated by the RSS variation [72], and poor results are obtained. This phenomenon is known to be caused by changes in environmental factors (e.g., humidity, people movement, door/window open/close, etc.), heterogeneous device types, and device states (e.g., hand-held, carried in pocket or bag, device orientation, etc.) between the offline and online phases [73], and cell breathing (i.e., dynamic transmit power control for network operation reasons [226]). Moreover, RSS readings can be affected by RF interference from other electronic devices, e.g., microwave ovens or cordless phones operating on the same frequency, which calls for robust fingerprint matching algorithms, as discussed in [67].
Given the effort associated with creating and maintaining radiomaps, solutions based on crowdsourcing (discussed later in this section), SLAM (discussed in Section X), as well as parametric and non-parametric models have been proposed [71], [106]. In particular, radiomaps based on models can be obtained from a much sparser set of RSS fingerprints and have the potential of reducing the computation effort in estimating the position. While model-based radiomaps usually result in accuracy degradation, models based on non-parametric Gaussian Process (GP) regression have been reported to provide better accuracy results than the traditional radiomap construction based solely on data collection [71].
Based on the empirical observation that RSS variation due to device heterogeneity follows linearity, the work in [74] proposes a device calibration step between the offline and online phases to create a mapping in signal space. However, this approach requires numerous RSS samples at several known locations for a new mapping device, which is a labor-intensive task, and it brings lack of device compatibility. To avoid the inconvenience of data collection and to increase compatibility, a histogram-based approach is proposed in [75]. This approach exploits RSS histograms and does not require location information at which the measurements are obtained. Hence, it allows a user to perform calibration (repeating every 10 seconds) while positioning. Other techniques manipulate the absolute RSS values to compute differences or ratios of RSS values from Wi-Fi APs within the original RSS fingerprint, which are shown to mitigate the effect of device diversity; see [76], and references therein for an overview and evaluation of such techniques with real-life data.
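Two of the ideas above can be sketched briefly: fitting a linear mapping between the RSS values of a new device and those of the reference device, and replacing absolute RSS values by pairwise differences. Both functions are illustrative and are not the implementations of [74] or [76]; the linear fit assumes paired measurements taken at the same known locations.

```python
import numpy as np

def fit_linear_mapping(rss_new, rss_ref):
    """Least-squares (a, b) such that rss_ref is approximately a * rss_new + b."""
    a, b = np.polyfit(np.asarray(rss_new, dtype=float),
                      np.asarray(rss_ref, dtype=float), 1)
    return float(a), float(b)

def rss_differences(fingerprint):
    """Pairwise differences rss[i] - rss[j] for all AP pairs with i < j."""
    f = np.asarray(fingerprint, dtype=float)
    i, j = np.triu_indices(len(f), k=1)
    return f[i] - f[j]
```

Note that differences cancel only a constant offset between devices; a gain different from one (the slope `a` in the linear model) is not removed and would still require calibration or a ratio-based feature.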
Contrary to the idea that device diversity results mainly from the use of different Wi-Fi chipsets, different antenna design and placement within the mobile devices, and even different device drivers and operating systems [20], [77], recent results reported in [78] suggest that noise and fast fading have a significant impact on the measured RSS values. Experimental results show that RSS measurements taken simultaneously, at the same location, by a set of similar Wi-Fi devices, are poorly correlated.
Wi-Dist is an indoor localization framework that fuses noisy fingerprints with uncertain mutual distances given by their bounds. It achieves low errors by a convex-optimization formulation, which jointly considers distance bounds and only the first two moments of measured fingerprint signals [79].
Experimental results indicate that Wi-Dist achieves significantly better accuracy than other state-of-the-art schemes (often by more than 40%).
Due to the low reliability of RSS, Luo et al. [80] present a robust and efficient model for integrating human-centric collaboration to improve the accuracy of a baseline Wi-Fi system by collecting both positive and negative feedback from users on their estimated locations. The model is robust with respect to malicious feedback, quickly self-correcting based on subsequent helpful feedback from users. More advanced solutions complement Wi-Fi fingerprint matching with Bayesian filters such as Kalman and particle filters to improve accuracy. For instance, the system presented in [81] establishes a Bayesian-rule-based objective function and then applies the particle swarm optimization technique to identify the optimal solution (i.e., the estimated location). Subsequently, the Kalman filter is used to update the initial location and track the mobile user, thus mitigating the estimation error. Other systems employ motion sensors together with Bayesian filters [73], [82], [83]. In these solutions, the device location is predicted from motion sensor readings, and the predicted location is updated with Wi-Fi fingerprint matching results. The location prediction can be done with discrete Markov chain models (e.g., random walk). However, in the presence of RSS variation, the fingerprint matching results contain large errors, and the filter output will diverge (i.e., its error increases monotonically) after a few iterations. This problem is interpreted as model mismatch in the sense of Bayesian inference [83], [84]. Although the aforementioned calibration methods can mitigate the model mismatch, they are still vulnerable to sudden changes in the RSS variation characteristics.
One major challenge in fingerprint-based systems is modeling the statistics of the errors, i.e., estimating the error associated with each position estimate. While some progress has been reported recently, error estimation still needs further investigation [85], [86]. Berkvens et al. [85] propose a conditional entropy metric as a dynamic measure of the uncertainty associated with each position estimate, and conclude that a low value of the conditional entropy is highly correlated with small positioning errors, while high values of the conditional entropy are associated with both small and large errors. Aiming to dynamically estimate the positioning error, an extensive analysis of the causes of large errors in Wi-Fi fingerprint matching was performed in [86], using both simulation and real-world data. In this work, the authors concluded that some of the causes of large errors are related to the geometry of the space and the placement of access points, and that, in the real world, it is quite difficult to estimate the error associated with each position estimate.
The work in [82] proposes a non-parametric information filter to adaptively compute the reliability (or uncertainty) of Wi-Fi fingerprint matching results. In addition, RP selection, AP selection, and outlier detection are introduced to avoid divergence due to RSS variation. More specifically, in this approach, only RPs close to the predicted location are selected, and only APs whose RSS observations are stable are used for Wi-Fi fingerprint matching. Such selections enforce the proximity constraint and are effective in preventing large errors. Outlier detection checks whether the fingerprint positioning results fall within an acceptable range; if not, the results are discarded. A similar idea is also presented in [83], where Wi-Fi fingerprint matching is formulated as a compressive sensing problem. Although these approaches improve robustness to RSS variation, their strong belief in the sensors makes them vulnerable to sensor biases.
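The proximity constraint and the outlier check described above can be illustrated with a short Python sketch. It restricts nearest-neighbor matching to RPs within a radius of the predicted location and gates the result by its distance from the prediction. The data layout, the function names, the radius, and the gate are assumptions made for illustration only; the information filter and the AP selection step of [82] are not reproduced.

```python
import math


def match_fingerprint(radiomap, observed, predicted, radius, k=1):
    """k-nearest-neighbor matching restricted to RPs near the predicted location.

    radiomap: list of ((x, y), {ap_id: rss_dbm}) entries.
    observed: {ap_id: rss_dbm} measured online.
    Returns the mean position of the k best candidate RPs, or None.
    """
    candidates = [(pos, fp) for pos, fp in radiomap
                  if math.dist(pos, predicted) <= radius]
    if not candidates:
        return None

    def signal_distance(fp):
        common = set(fp) & set(observed)
        if not common:
            return math.inf
        return math.sqrt(sum((fp[ap] - observed[ap]) ** 2 for ap in common)
                         / len(common))

    best = sorted(candidates, key=lambda c: signal_distance(c[1]))[:k]
    x = sum(pos[0] for pos, _ in best) / len(best)
    y = sum(pos[1] for pos, _ in best) / len(best)
    return (x, y)


def is_outlier(estimate, predicted, gate):
    """Flag a matching result that lies too far from the predicted location."""
    return math.dist(estimate, predicted) > gate


# Hypothetical radiomap: the distant RP happens to match the observation best.
radiomap = [
    ((0.0, 0.0), {"ap1": -50.0, "ap2": -70.0}),
    ((2.0, 0.0), {"ap1": -55.0, "ap2": -65.0}),
    ((20.0, 20.0), {"ap1": -52.0, "ap2": -68.0}),
]
observed = {"ap1": -52.0, "ap2": -68.0}
predicted = (1.0, 0.0)

estimate = match_fingerprint(radiomap, observed, predicted, radius=3.0)
```

In this toy case, unconstrained matching would select the RP at (20, 20), whereas the proximity constraint keeps the estimate near the predicted location.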
The Peak-based Wi-Fi Fingerprinting (PWF) approach proposed in [73] exploits a time sequence of RSS observations to correct the location estimate by capturing an inherent local property. When a user moves towards a specific AP, the RSS value of that AP increases under a low-noise assumption. Based on this fact, the PWF approach adjusts the location estimate by selecting the RP at which an RSS peak is detected. This approach provides reliable results when the user moves along a corridor, but it may fail in rooms where RSS levels are similar and when RSS measurements are missing. Besides, since the particle weight is simply computed to be proportional to the Euclidean distance between the particle and the fingerprint matching result, the tracking framework offers no robustness to RSS variation.
Recent work in [72] introduces a Bayesian framework for simultaneous user tracking and mitigation of RSS variation. This approach is based on a time-varying RSS variation model and a spatial correlation-based channel estimation [227], which can be done by GP [228]. While other works assume a constant RSS variation for all the APs, the solution in [72] presumes that the RSS variation for each AP varies independently in order to consider different propagation conditions over time and space. By estimating the contributions of random effects in measurements, this approach enables higher accuracy and improved robustness to RSS variation with a small number of APs. However, it also has a definite weakness under severe RSS variation conditions, which is common to model-based approaches.
Yiu et al. [87] address the need of Wi-Fi fingerprint matching for an up-to-date database by using GP to model RSS values and to create the radiomap from little training data. In particular, a parametric path loss model for the GP mean and a flexible non-parametric covariance function are used to get reliable estimates with low data collection effort. Experimental results suggest that with 23 RPs the proposed solution performs as well as traditional fingerprint matching with over 230 RPs for an office space of 2500 m². Along the same line, to address the cost of constructing the radiomap in the offline phase, a new empirical propagation model called Regional Propagation Model (RPM) is used in [88]. The proposed system first collects sparse fingerprints at selected RPs and then applies an affinity propagation clustering algorithm, which operates on the sparse fingerprints to automatically divide the whole scenario into several clusters or sub-regions. The parameters of the RPM are obtained in the next step and are further used to recover the entire fingerprint database.

C. Lateration Approach
Even though the vast majority of modern WLAN-based positioning systems rely on fingerprint matching to compute the user location especially in indoor environments, in principle the lateration approach is also applicable. In fact, several early systems tested indoor positioning solutions based on lateration; for example, the RADAR system combines empirical RSS measurements with signal propagation modelling to determine user location and compares the performance against fingerprint matching [63].
The inputs for lateration can be either TOA/TDOA or RSS measurements. For instance, TOA/TDOA can be measured using different signalling techniques such as Direct Sequence Spread Spectrum (DSSS) as described in [89], where Li et al. analyze the performance of geolocation systems for DSSS and OFDM WLANs and compare them in terms of symbol synchronization performance. However, timing measurements are hard to obtain because precise synchronization is required, while the multipath propagation and NLoS conditions indoors due to walls and obstacles introduce high inaccuracies in timing measurements.
To this end, most of the Wi-Fi lateration solutions rely on RSS measurements. Typically, these solutions employ a signal propagation model or equivalently a path loss model that provides the signal attenuation as a function of the distance from a Wi-Fi AP. Capturing signal propagation in complex indoor environments with a model is very challenging because the signal strength at a given distance can be significantly lower than expected due to walls or people walking or even higher than expected owing to constructive reflections. Another limitation is that the exact locations of the Wi-Fi APs need to be known for the lateration algorithm which is not always the case in indoor installations, as opposed to outdoor cellular tower deployments.
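To make the two steps concrete, the following Python sketch inverts a log-distance path loss model to obtain ranges from RSS and then solves a linearized least-squares lateration problem in 2D. The model form RSS(d) = P0 - 10 n log10(d/d0), the parameter values (P0 = -40 dBm at d0 = 1 m, exponent n = 3), and the AP layout are assumptions chosen for illustration; the example uses noise-free synthetic data, so it says nothing about accuracy in real indoor environments.

```python
import math


def distance_to_rss(d, p0_dbm=-40.0, path_loss_exp=3.0, d0=1.0):
    """Log-distance path loss model: RSS in dBm at distance d (meters)."""
    return p0_dbm - 10.0 * path_loss_exp * math.log10(d / d0)


def rss_to_distance(rss_dbm, p0_dbm=-40.0, path_loss_exp=3.0, d0=1.0):
    """Invert the model to obtain a range estimate from an RSS value."""
    return d0 * 10.0 ** ((p0_dbm - rss_dbm) / (10.0 * path_loss_exp))


def laterate(anchors, distances):
    """Linearized least-squares lateration in 2D.

    Subtracting the circle equation of the last AP from the others yields a
    linear system in (x, y), solved here through the 2x2 normal equations.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least three APs with one distance each")
    xn, yn = anchors[-1]
    dn = distances[-1]
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), di in zip(anchors[:-1], distances[:-1]):
        a1 = 2.0 * (xi - xn)
        a2 = 2.0 * (yi - yn)
        b = dn ** 2 - di ** 2 + xi ** 2 - xn ** 2 + yi ** 2 - yn ** 2
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("AP geometry is degenerate (collinear APs)")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return (x, y)


# Hypothetical AP layout and a true position used to synthesize RSS values.
aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_pos = (3.0, 4.0)
rss = [distance_to_rss(math.dist(ap, true_pos)) for ap in aps]

ranges = [rss_to_distance(r) for r in rss]
estimate = laterate(aps, ranges)
```

Because the range depends exponentially on the RSS error, a deviation of a few dB caused by walls or reflections translates into a large range error, which is the limitation discussed above.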
Therefore, before the application of the actual lateration algorithm, several research works exploit Wi-Fi data collected in the target indoor environment and attempt to estimate the AP locations and derive optimal coefficients for the propagation model in the sense of best fitting the real data. Nurminen et al. [90] use a Bayesian method for off-line estimation of the position and the path loss model parameters of a transmitter (i.e., cellular tower or Wi-Fi AP) and then test three different methods in an indoor office environment: a grid method that uses standard Monte Carlo integration, the Metropolis-Hastings algorithm, and the Iteratively Reweighted Least Squares algorithm. Along the same line, a Bayesian positioning algorithm based on the Rao-Blackwellized particle filter, where the parameters of the path loss model are estimated independently for each AP in addition to localizing the user, is presented in [91]. The key idea of the EZ localization system is that the RSS observations, a few with a GPS location fix at the building entrance or near a window and most from unknown indoor locations, are constrained by the physics of wireless propagation [92]. EZ models these constraints and then uses a genetic algorithm to solve them, yielding a median localization error of 2 m and 7 m in a small and a large building respectively, which is slightly worse than, but comparable to, the Horus fingerprint matching solution [66] while avoiding the data collection effort.
Interestingly, recent research efforts investigate the combination of lateration with fingerprint matching to leverage on their strengths. For instance, the INTRI system first forms a contour consisting of all the RPs with the same signal level from an AP received by the device and then finds the device location by minimizing the distance between the position and all the contours with an optimization formulation following the spirit of trilateration [93].

D. Radiomap Construction Through Crowdsourcing
While some fingerprint matching solutions reduce the cost of constructing the radiomap through signal propagation models and sparse data collection by professionals or trained surveyors (e.g., [87] and [88]), there is an increasing trend to exploit localization data collected and shared by common people acting as volunteers. Such solutions leverage on the emerging new paradigm of crowdsourcing and the high availability of smartphone devices featuring multiple sensors [229].
Crowdsourcing is becoming increasingly popular for Wi-Fi RSS fingerprint matching systems mostly targeting indoor environments, which are also known as "organic" systems; however, the concept has been applied successfully to create radiomaps from cellular signals, including [57]-[62] discussed above, or signals from other radio sources. In fact, the Intel Place Lab project was one of the pioneering research efforts to build an indoor/outdoor localization system that relies solely on users scanning and contributing ambient radio signals from cellular, Wi-Fi, and Bluetooth beacons [94]. Place Lab built on top of early systems, like ActiveCampus [95], which introduced the concept of employing feedback from regular users for fast and accurate updating of the Wi-Fi radiomap. Hossain et al. [96] argue that feedback about a user's actual position as indicated by the user to the system, either explicitly or implicitly, greatly helps in fine-tuning an under-trained positioning system with proper filtering. Moreover, if users are well-behaved, it was shown that the participation of end-users can assist in the construction of a radiomap incrementally from scratch, while their Bluetooth-based system adapts well when the surroundings change.
An approach similar to Place Lab was followed subsequently by several systems, including Herecast [97], where users contribute by sporadically reporting their location at room level using a simple laptop application, Redpin [98], where users train the system while using it in a collaborative fashion, and more recently SmartCampusAAU [99]. More advanced solutions motivate users to take part in crowdsourcing by trying to reduce the data collection effort on the user side. For instance, Zee enables training data for the radiomap to be crowdsourced without any explicit effort on the part of users by leveraging the inertial sensors present in the mobile devices carried by users to track them as they traverse an indoor environment, while simultaneously performing Wi-Fi scans [100].
Recently, autonomous crowdsourcing systems have been presented that rely on the Trusted Portable Navigator to build and update the databases for trilateration (i.e., AP locations and propagation parameters) and fingerprint matching Wi-Fi positioning systems [101], [102]. These systems eliminate various limitations of current crowdsourcing systems, such as the requirement for a floor plan or GPS, suitability only to specific environments, and the implementation of simple sensor-based navigation solutions.
Even though crowdsourcing is very appealing for the creation and updating of the radiomap, it brings new challenges that need to be addressed in order to deliver similar or slightly worse accuracy compared to systems where the radiomap is built rigorously by experienced professionals or trained volunteers. These challenges include i) the construction of the radiomap using data collected with heterogeneous devices, ii) determining when user input is actually required, iii) discarding erroneous data contributed either unintentionally or maliciously (also known as "polluted" data) as well as stale data, and iv) radiomap scalability, i.e., keeping the radiomap size manageable as the volume of user contributions increases.
In the face of these challenges, Park et al. [103] use Voronoi regions for reasoning about gaps in coverage (i.e., areas with low density of fingerprints) and a clustering method for identifying potentially erroneous user data. They demonstrate rapid coverage while maintaining positioning accuracy comparable to that achieved with a professionally collected radiomap. Data from various sensors, such as accelerometer and gyroscope, are used in [104] to tag more accurately the locations of the Wi-Fi RSS fingerprints collected by numerous users, while optimization algorithms along with a filtering method are employed to remove erroneous data. Moreira and Meneses [105] assess the quality of a radio map built collaboratively and propose a method to classify the credibility of individual contributions and places recognized by the system, as well as the reputation of individual users.
The Molé system allows for aggregation of fingerprints from many users and is compact enough for on-device storage, while it employs a scalable cloud-based fingerprint distribution system [106]. FreeLoc addresses radiomap construction across heterogeneous devices by employing relative, rather than absolute RSS values, and uses techniques for maintaining a single fingerprint for each location in the radiomap, irrespective of any number of uploaded data sets for a given location, thus keeping the radiomap to a reasonable size [107]. Differential, instead of absolute, RSS values have been explored for fusing crowdsourced RSS data collected with heterogeneous devices to make the resulting radiomap completely device independent [108]. The Anyplace system [109], [110] uses the concept of RSS differences for crowdsourcing the radiomap, while it guides crowdsourcers to uncovered or low fingerprint density regions through heatmaps that visualize the volume of data collected in different areas or rooms.
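The intuition behind differential RSS fingerprints can be shown with a few lines of Python. If two devices differ by a constant offset in dB, the pairwise differences between APs are identical, so the offset cancels. The dictionary layout, the AP identifiers, and the offset value are illustrative assumptions and do not reproduce the specific constructions in [107] or [108].

```python
from itertools import combinations


def rss_differences(fingerprint):
    """Pairwise RSS differences (dB) between all AP pairs of one fingerprint.

    fingerprint: {ap_id: rss_dbm}. AP identifiers are sorted so that the
    pair ordering is the same for every device.
    """
    aps = sorted(fingerprint)
    return {(a, b): fingerprint[a] - fingerprint[b]
            for a, b in combinations(aps, 2)}


# Hypothetical fingerprints at the same location from two devices whose
# readings differ by a constant offset of 8 dB.
device_a = {"ap1": -50.0, "ap2": -63.0, "ap3": -71.0}
offset_db = -8.0
device_b = {ap: rss + offset_db for ap, rss in device_a.items()}

diff_a = rss_differences(device_a)
diff_b = rss_differences(device_b)
```

Note that only an additive offset cancels in this way; a device-dependent gain (a slope different from one in the linear mapping) would still scale the differences.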
Even though there is still room for improvement with regards to the above challenges, crowdsourcing is a promising and viable solution for decreasing the cost of building and updating the radiomap, thus increasing the adoption of fingerprint matching systems. This is evident from the fact that several commercial fingerprint matching systems, including IndoorAtlas [111], indoo.rs [112], and Navigine [113], offer crowdsourcing as a main feature.
We note that, in the context of localization and tracking, crowdsourcing has also been explored for floor determination and sensor calibration (Section VIII) and indoor mapping of the physical space (Section X).

E. Lessons Learned
Fingerprint matching is the preferred approach for WLAN-based localization in many application scenarios because the complex propagation conditions and multipath are captured in the measured RSS fingerprints spanning the area of interest. Therefore, fingerprint matching methods outperform methods based on signal propagation due to the inherent inaccuracies of the signal models in typical indoor environments. The key objective in fingerprint matching is the construction of a reliable radiomap in an efficient and cost-effective manner, while addressing challenges related to heterogeneous devices, the impact of AP and RP density on the position accuracy, and keeping the size of the radiomap as small as possible.
Crowdsourcing has emerged as a promising and feasible alternative for building and updating the required radiomaps. Non-participatory systems usually rely on experienced professionals for collecting radiomap data, as in the Ekahau commercial solution, or on trained volunteers, as in the KAILOS academic project [230]. However, the data collection task, especially for large-scale indoor environments, is not only tedious but could also become cost prohibitive. For instance, covering the 450,000 m² COEX underground shopping mall area in South Korea required 15 collectors to collect, point by point, 200,000 Wi-Fi RSS readings at 10,000 unique locations over about two weeks. Also, a measurement campaign following the deployment of the Ekahau system can cost $10,000 for a large office building with no maintenance included. On the other hand, crowdsourcing approaches require the use of well-planned incentive strategies to engage users and motivate them to contribute their collected data, while guiding them to cover areas with sparse data. In addition, users' contributions are prone to errors and the system is vulnerable to misbehaving or malicious users, which requires the use of proper methods to identify and filter out low-quality fingerprints.

VI. RANGE-FREE LOCALIZATION IN WIRELESS SENSOR NETWORKS
Multi-hop range-free localization, which uses connectivity information between radio nodes (devices), has attracted research interest in the field of WSNs for many years. The fundamental idea behind multi-hop range-free localization is to accept less accurate location information for each node in exchange for cost- and energy-effectiveness from the network perspective and robustness to NLoS propagation. In this sense, the importance of multi-hop range-free localization is increasing with the predicted boom in IoT applications over the next years.
The multi-hop range-free localization problem has been investigated from two perspectives: 1) the problem of converting the hop counts (i.e., minimum hop counts) measured along the shortest paths between anchor-node pairs into physical distances and performing trilateration, and 2) the graph embedding problem with the hop counts among all the nodes in a centralized manner or among neighboring nodes in a decentralized manner. A brief overview of both problems and relevant algorithms is presented in [17]. In this paper, we focus on the former problem and review emerging approaches, with emphasis on recent advances in the presence of network anisotropy, which causes NLoS paths between two nodes and deteriorates localization accuracy, and which has not been analyzed in existing survey papers.
Distance Vector-Hop (DV-Hop) [114] is a well-known multi-hop range-free algorithm, and due to its simplicity, many algorithms [115]-[119] have been developed by modifying the original DV-Hop algorithm. Assuming that the path between any pair of nodes is linear and isotropic, in the DV-Hop algorithm an average hop progress (i.e., the average size of one hop) is computed by each anchor, and the distance from an anchor to a target node is estimated by multiplying the anchor's average hop progress by the minimum hop count between them. Many works have focused on reducing errors in the anchors' average hop progress in a probabilistic [115], [116] or heuristic [117] manner and from the optimization perspective [118], [119]. However, the underlying assumption does not hold in practice due to the existence of network anisotropy (e.g., non-uniform node deployments, irregular radio propagation), and significant performance degradation is observed in anisotropic networks.
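The distance estimation step described above can be sketched in Python as follows. Each anchor derives its average hop progress from the known distances and minimum hop counts to the other anchors, and then scales the hop count to the target node. The graph representation, the function names, and the five-node chain are assumptions made for illustration; the subsequent trilateration step and the refinements in [115]-[119] are omitted.

```python
import math
from collections import deque


def hop_counts(adjacency, source):
    """Minimum hop counts from source to every reachable node (BFS)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def dv_hop_distances(adjacency, anchor_positions, target):
    """Estimate the distance from each anchor to the target node.

    anchor_positions: {node_id: (x, y)} for the anchors only.
    Returns {anchor_id: estimated distance}.
    """
    estimates = {}
    for anchor, pos in anchor_positions.items():
        hops = hop_counts(adjacency, anchor)
        total_distance = 0.0
        total_hops = 0
        for other, other_pos in anchor_positions.items():
            if other != anchor and other in hops:
                total_distance += math.dist(pos, other_pos)
                total_hops += hops[other]
        if total_hops == 0 or target not in hops:
            continue  # no reachable anchor pair, or target is unreachable
        hop_progress = total_distance / total_hops  # average size of one hop
        estimates[anchor] = hop_progress * hops[target]
    return estimates


# Hypothetical chain of five nodes spaced 10 m apart; nodes 0 and 4 are anchors.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
anchors = {0: (0.0, 0.0), 4: (40.0, 0.0)}

estimates = dv_hop_distances(adjacency, anchors, target=2)
```

In this straight, evenly spaced chain the estimates are exact (20 m from each anchor). If the shortest path had to detour around a hole, the hop count would grow while the Euclidean distance would not, which is the anisotropy problem noted above.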
The works in [120] and [121] propose location refinement algorithms to minimize the average localization error over the network under the strict constraint of one-hop or two-hop connectivity. While these approaches considerably improve the performance of the DV-Hop family of algorithms, a large increase in communication overhead is observed due to the information exchange between neighboring nodes. Moreover, changes in network topology caused by radio irregularity lead to an oscillation of the location estimate. Based on a network hole detection method [122] that detects nodes at the boundaries of network holes, the Rendered Path (REP) algorithm measures the deviation angle of the shortest path and estimates the distance between the nodes with the cosine rule [123]. Since higher localization accuracy can be achieved with a small number of anchors, the REP algorithm is known to be a cost-effective solution. However, this approach is vulnerable to undetected small holes, while the use of global connectivity information for hole detection is energy intensive, which makes it unsuitable for energy-limited radio devices.
The work in [124] proposes a pattern-driven algorithm that classifies anchors according to hop count thresholds determined from empirical observations. This approach uses anchors within eight hops of a target node for localization. In particular, anchors that are less than four hops away from the node are considered reliable, and the distances to those anchors are simply computed as in the DV-Hop algorithm. The distances to the rest of the selected anchors are estimated with the aid of the nearest reliable anchor to reduce the error.
Other reliable anchor selection algorithms are also found in recent works [125]-[127]. In the Reliable Anchor-based Localization (RAL) algorithm [125], anchors having average hop progresses larger than the minimum hop progress are classified into a reliable anchor set. The works in [126] and [127] use the concept of geometric dilution of precision to select reliable anchors. The fundamental idea of these algorithms is to test whether anchors are reliable by using a closely placed anchor as a reference. Hence, these approaches may fail in sparse anchor scenarios. Moreover, the distance estimates to the reliable anchors may include large errors, which means that the reliability decisions could be wrong, but this point has been overlooked.
To deal with such drawbacks, the Reliable Anchor Pair Selection (RAPS) approach tests if the shortest path between two anchors via a target node is linear-shaped (undistorted) and selects anchor pairs passing the test [128]. After selection, the distances to the reliable anchors are estimated with the geometric approximation of the node location. The RAPS algorithm does not require any reference anchor for the reliability test; therefore, this approach is well-suited for networks with a small number of anchors.
Other geometric approximation-based distance estimators having robustness to path detour are found in [129] and [130]. The work in [129] introduces a virtual hole construction method to approximate a detoured path as an arc of a circular sector of a circular-shaped virtual hole. In [130], a Pascal's triangle model is proposed to provide location candidates and probabilities for nodes of the shortest path. As the approximation gives the possibility of the target node being placed at each location by considering the degree of path detour, an error from path detour can be effectively relieved. However, the approximation is invalid for heavily detoured paths, which can be configured in countless shapes.

A. Lessons Learned
The hop count varies significantly with the configuration of the shortest path between nodes, which is determined by local properties including node deployments (e.g., node density, deployment method, and geography) and channel characteristics. The path configurations of anchor pairs serve as references in distance estimation for an anchor-node pair. Specifically, the anchor pair information is used to transform the hop count of the anchor-node pair into the distance domain.
However, this anchor pair information becomes useless for anchor-node pairs when the local properties of the anchor pairs and the anchor-node pairs do not match, as happens in anisotropic networks. Methods of selecting good anchors on the node side have been studied to avoid the mismatches, but a fundamental requirement on the number of anchors remains a fatal flaw. In other words, they are cost prohibitive from the network operation perspective. Geometric approximations, which derive a mathematical model for each path configuration, could be an alternative, but their application is limited to paths with low uncertainty in configuration.

VII. DATA FUSION
To improve system performance in terms of reliability of the estimates (integrity), accuracy, and availability, it is appealing to process information obtained from a number of sensors by means of fusion techniques [131], [132].
Multiple data fusion solutions have already been released as commercial products. For instance, the SiRFstarV architecture by Cambridge Silicon Radio (CSR), which was acquired by Qualcomm, gathers real-time information from GPS, Galileo, GLONASS and COMPASS satellites, multiple radio systems, such as Wi-Fi and cellular, and multiple IMU sensors, like accelerometers, gyroscopes and magnetometers. It then combines this real-time information with ephemeris data, mapping, cellular BS and Wi-Fi AP location data and other cloud-based aiding information using the SiRFusion platform, which is now part of Qualcomm iZat location services [133]. Positioning error of 9 m@68% and 13.1 m@95% over several test runs in a shopping mall is reported [134].
A generic framework with different layers of fusion for tracking in radio networks is presented in [135]. In the first level, radio measurements can be combined, including TOA, TDOA, RTT, AOA, RSS, and Doppler parameter β that provides a measurement of the relative user speed. In the second level, spatial fusion of radio measurements from a sufficiently large number of transmitters takes place in the form of lateration algorithms (i.e., trilateration or multilateration depending on the number of transmitters) for TOA, TDOA and RSS, triangulation for AOA, fingerprint matching for RSS, and multi-static radar for Doppler. In the third level, information fusion involves the combined processing of multiple measurements of different modality (kind). At this level, fusion of complementary sensor data can be performed, including barometric pressure data to resolve vertical ambiguity and IMU data from the devices' onboard sensors. Such sensory data can be utilized to infer the mobility state of the user as static, walking, running, cycling, etc., as discussed in Section IX. Finally, temporal filtering of the location-dependent measurements and/or predicted user locations with the aid of an appropriate mobility model can be applied to smooth the estimated user trajectory. Well-studied Bayesian tools for such processing include the Kalman filter [131] and different variants, e.g., the position Kalman filter [136], as well as particle filters [137].
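As a minimal illustration of the temporal filtering stage, the Python sketch below applies a scalar Kalman filter with a random-walk mobility model to a sequence of noisy position estimates along one coordinate. The noise variances, the initial state, and the measurement values are assumptions chosen for illustration; the framework in [135] and the position Kalman filter in [136] are more general than this sketch.

```python
def kalman_smooth(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter with a random-walk state model.

    measurements: noisy position estimates along one coordinate (meters).
    process_var:  variance of the random-walk step between measurements.
    meas_var:     variance of the measurement noise.
    x0, p0:       initial state estimate and its variance.
    Returns the list of filtered state estimates.
    """
    x, p = x0, p0
    filtered = []
    for z in measurements:
        # Prediction: the state is unchanged, its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        filtered.append(x)
    return filtered


# Hypothetical noisy estimates of a user standing near x = 5 m.
measurements = [5.8, 4.1, 5.5, 4.6, 5.2]
filtered = kalman_smooth(measurements, process_var=0.01, meas_var=1.0,
                         x0=0.0, p0=100.0)
```

The large initial variance makes the first output follow the first measurement closely, after which the gain shrinks and the trajectory is smoothed. A 2D tracker could, as a simplification, run one such filter per coordinate, while velocity states or IMU inputs would require the vector form of the filter.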
Angelis and Fischione [132] study the optimal sensor fusion of heterogeneous position related measurements and investigate the fundamental performance of linear fusion. The proposed estimator combines information coming from ranging, speed, and angular measurements, which are jointly fused by a Pareto optimization problem where the mean and the variance of the tracking error are simultaneously minimized, while assuming a very simple dynamical model for mobility.

A. Lessons Learned
Data fusion is a powerful methodology for unleashing the full potential in terms of accuracy, reliability, and robustness of localization and tracking systems in application scenarios where complementary multi-source measurement data are available. This is evident from the fact that existing commercial products, developed in either software or hardware, leverage sophisticated fusion techniques to optimally combine satellite, terrestrial radio, and sensor data to deliver a high level of accuracy. As described in [135], fusion techniques can be employed across different levels for enhancing tracking systems in wireless networks; namely at the radio measurement level (e.g., TOA, TDOA, RTT, AOA, RSS, and Doppler), at the algorithm level (e.g., proximity, triangulation, lateration, fingerprint matching), at the processing level for combining multi-modal measurements (e.g., data from inertial or environmental sensors), and finally at the post-processing level with the spatio-temporal filtering of measurements or rough location estimates by means of statistical processing or Bayesian filters.
In particular, in the context of localization in 5G, hybrid fusion of multiple radio and sensor data is attracting research interest for achieving the envisioned centimeter-level and uninterrupted positioning required, for example, by the automotive industry for enabling ITS applications or by emergency response services.

VIII. VERTICAL POSITIONING
Most of the 2D fingerprint matching algorithms that rely on RSS observations (either cellular or WLAN) can be easily extended to three dimensions, as long as the radiomap contains height information for every collection point across multiple floors. Many practical positioning and tracking scenarios inside real-world multi-floor buildings require determination of the floor where the user or device is located. In some cases, floor determination is even necessary to identify the floor plan required by some tracking techniques, including those based on scene analysis and especially those using particle filters. Reliable detection of the floor has been identified as even more important than 2D position for safety and rescue operations, as it minimizes search operations [138]. The combination of 2D positioning with floor estimation is often referred to as 2.5D, 3D, or multi-floor positioning.
In 2015, NIST identified 3D geolocation as the top gap with highest priority for LBS R&D investment [231]. This highlights the importance of delivering vertical (i.e., height) location information, either in the form of absolute height value or floor label. In the following, we overview solutions that rely on cellular networks, WLAN, and sensor-based approaches using either the accelerometer or the barometer (i.e., atmospheric pressure sensor). These solutions are summarized in Table III.

A. Cellular-Based Solutions
The appeal of RSS-based floor detection solutions derives from the ubiquity of cellular and Wi-Fi networks. One of the first works to report the use of RSS fingerprints for floor detection is based on RSS data from GSM networks [138]. The proposed SkyLoc system attained accuracy around 73% in detecting the correct floor (97% within a two-floor error margin) during an extensive evaluation performed in three tall buildings in Washington D.C., Seattle, and Toronto.
Recently, baseline positioning performance results based on the 3GPP 3D Multiple-Input-Multiple-Output (MIMO) deployment and propagation model that has been adopted in 3GPP Release 13 were presented in [139]. Simulation results pertaining to outdoor-only and indoor-outdoor network deployments with different numbers of macro cells and small cells indicate that the Cell-ID and O-TDOA methods defined for LTE are capable of meeting the FCC requirements for positioning E911 calls. Interestingly, in the indoor-outdoor simulation scenario for an 8-floor building with sufficient indoor small cells the horizontal accuracy of Cell-ID and O-TDOA is 31 m and 16 m for 90% of the tests, respectively. Moreover, the vertical accuracy of Cell-ID is within 1 m error for 99% of the tests, compared to 25% of the tests for O-TDOA. The reason for this surprising performance of Cell-ID is that all users are served by a cell on the same floor, and the difference between the antenna and the user device heights is 1 m. This implies that Cell-ID can accurately estimate the 3D user location in the case of dense small cell deployments. This is confirmed in an experimental LTE femtocell test-bed, where the Cell-ID approach is reported to achieve 1.79 m vertical error and 69.88% floor detection rate [140].
Researchers at Ericsson present a 3D location solution for LTE cellular networks that complies with the 3GPP standard [141]. Their solution is built upon the 2D Adaptive Enhanced Cell IDentity (AECID) algorithm, which measures fingerprint location information whenever high accuracy A-GPS, O-TDOA or U-TDOA position measurements occur in the LTE network [142]. The high accuracy measurements with the same fingerprint are then clustered and a 3GPP polygon that describes the boundary of the cluster is computed. AECID is extended by using altitude information from A-GPS, O-TDOA, and U-TDOA positions of opportunity to provide the radiomap with altitude-tagged fingerprinted polygons. Subsequently, a geographical shape conversion algorithm transforms the polygon with altitude information to the 3GPP point with an ellipsoidal uncertainty, since the latter format is standardized on all LTE position reporting interfaces.
A new 3D fingerprint matching scheme called Fingerprint Correlation Localization (FCL), powered by an Enhanced Nearest Neighbor Localization (ENNL) algorithm, is presented in [143]. The authors employ a cell matching degree and choose the most reasonable fingerprint according to the correlation of its surrounding points, while improving search efficiency across the radiomap by introducing a new searching window. Results in a 4-floor supermarket building in Chengdu, China with real network data, where the network MRs contain the RSS values from seven cells, demonstrate 60 m horizontal error and 7 m vertical error in 90% of the tests.
Regarding commercial solutions, InvisiTrack has developed signal processing techniques that evolve existing O-TDOA, with advanced multipath mitigation technology and ranging signal processing [144]. Their Positioning over LTE (PoLTE) technology uses LTE Sounding Reference Signal (SRS) for an uplink solution. In this case, no changes would be needed to the user's mobile device, but the firmware in the LTE eNB would require modifications. InvisiTrack's downlink location methods are compatible with LTE Cell-specific Reference Signals (CRS) only defined in Release 8, unlike other methods which rely on the deployment of LTE PRS defined in Release 9. If PRS is present, InvisiTrack will enhance PRS functionality, while they can use PRS exclusively or in conjunction with CRS. The reported accuracy is 1 m to 10 m (horizontal) and less than 3 m (vertical).
Nokia Siemens Networks (NSN) is also using a 3D geolocation solution to enhance network planning and optimization after buying and further developing a 3D radio propagation modeling technology from Israel-based NICE Systems. Such a 3D solution enables network operators to replace much of the traditional drive and walk testing, reducing costs and deployment time by up to 80% [145]. NextNav has commenced deployment of a nationwide network of wireless transmitters in the U.S.A. to deliver a positioning service to cellular and other mobile devices in environments where GPS and other GNSS signals (e.g., GLONASS) are significantly degraded or unavailable, such as indoors or in urban areas [146]. Although this is not a pure cellular location method, it has been provisioned in the LTE standard (3GPP Release 13) as TBS. Thus, it has the potential to be seamlessly integrated with existing cellular methods in the future. The NextNav location system utilizes GPS-like signals transmitted by proprietary beacon transmitters. These transmitters are deployed across a geographical area and are strategically installed on selected sites (typically on existing cell network sites) to optimize their geometric distribution and ensure high accuracy, but with significantly lower density than cellular communication systems. Indoor accuracy during field trials across multiple buildings in dense urban, urban, suburban and rural areas was shown to be 94 m (horizontal) and 4.8 m (vertical) in 90% of the tests [3].

B. Floor Identification With WLAN
Regarding 3D location with absolute height information in WLAN, Caso et al. [147] present a novel variant of the well-known Weighted k-NN (Wk-NN) fingerprint matching algorithm that is based on frequentist, rather than Bayesian, inference and adopts a statistical metric based on Pearson's correlation. Even though several works follow the same approach, the majority of the WLAN-based solutions focus on identifying the floor where the user resides.
For example, results reported in [148] show an accuracy of 86% in detecting the correct floor using a Nearest Floor Algorithm, a simplification of the k-NN method, with Wi-Fi data. A method similar to 1-NN, combined with RF trilateration, has also been used in the work described in [149], with results showing accuracy of 100% in detecting the correct floor, although tested only on a single building, with a limited number of test samples. Accuracy of 97% in detecting the correct floor is also reported in [150], where a RSS solution based on path loss propagation model including a floor loss factor is proposed. The solution has been evaluated for Wi-Fi at 2.4 GHz and 5 GHz, and also using BLE, with the 2.4 GHz technology providing the best results.
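To make the k-NN idea behind such floor detectors concrete, the following is a minimal sketch of a generic k-NN floor classifier that votes among the closest radiomap fingerprints in signal space. It is not the exact algorithm of [148] or [149]; the AP names, the RSS values, and the -100 dBm placeholder for unheard APs are assumptions made for illustration.

```python
import math
from collections import Counter

MISSING_RSS_DBM = -100.0  # assumed placeholder for APs absent from a scan


def rss_distance(fp_a, fp_b):
    """Euclidean distance in signal space over the union of observed APs."""
    aps = set(fp_a) | set(fp_b)
    return math.sqrt(sum(
        (fp_a.get(ap, MISSING_RSS_DBM) - fp_b.get(ap, MISSING_RSS_DBM)) ** 2
        for ap in aps
    ))


def knn_floor(radiomap, observed, k=3):
    """Return the majority floor label among the k nearest fingerprints.

    radiomap: list of (fingerprint dict {ap_id: rss_dbm}, floor label).
    observed: fingerprint dict measured by the device.
    """
    ranked = sorted(radiomap, key=lambda entry: rss_distance(entry[0], observed))
    votes = Counter(floor for _, floor in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-floor radiomap.
radiomap = [
    ({"ap1": -40, "ap2": -80}, 0),
    ({"ap1": -45, "ap2": -78}, 0),
    ({"ap1": -75, "ap2": -50}, 1),
    ({"ap1": -80, "ap2": -45}, 1),
    ({"ap1": -78, "ap3": -60}, 1),
]
floor = knn_floor(radiomap, {"ap1": -42, "ap2": -79})
```

Setting k=1 reduces the classifier to a nearest-fingerprint lookup; larger k trades sensitivity to outliers against blurring near stairwells and atria, where fingerprints from adjacent floors look alike.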
Campos et al. [151] employ unsupervised clustering to allow the collected fingerprints to group freely in the signal strength space, without precluding, through the imposition of architectural constraints, any natural arrangement of the collected fingerprints. This is combined with majority voting committees of backpropagation artificial neural networks to deliver floor detection rates between 91% and 97%.
To reduce the computational complexity of floor determination in Wi-Fi, due to the size of the fingerprint database, Rahman et al. [152] apply a two-step process, first by rearranging the fingerprints according to the unique APs listed by each fingerprint, and second by filtering the unique AP list by selecting only 'significant' APs of the building. The floor detection algorithm based on the reduced database employs the Bayesian posterior probability of each floor and is reported to achieve 75% and 86% correct detection rates in two buildings. Going one step further, Locus is a heuristics-based indoor localization, tracking and navigation system for multistory buildings that determines floor and location by using the locations of infrastructure points, and without the need for radiomaps [153]. Initial experimental results in an indoor space spanning 175,000 ft² show that Locus can determine the floor with 99.97% accuracy and the location with an average location error of 7 m.

C. Sensor-Based Solutions
Researchers have addressed the problem of floor identification using accelerometers or barometric sensors embedded in high-end smartphones. However, as discussed in the following, due to the technical challenges in processing sensory data, most of the existing approaches rely on sensors to reliably detect floor changes and then resort to cellular or WLAN data to identify the correct floor.
It is known that atmospheric pressure decreases with height (i.e., altitude), and this dependency can be exploited for floor estimation as it is used outdoors to complement GPS altitude estimations. However, the use of barometric data indoors to estimate the floor is not straightforward [154]. Pressure variations across the same floor due to temperature variations and air flow, air conditioning systems, weather changes, errors in pressure sensors and other causes, make it difficult to associate absolute pressure values to specific floors [155], [156].
The work performed by Li et al. [155] is one of the first to perform an in-depth analysis of the potential of using barometers to estimate the height for indoor positioning. After a detailed analysis of the dynamics of pressure indoors, the authors conclude that it is impossible to accurately estimate the height using barometers in an absolute manner. To overcome this limitation, the authors propose two solutions, namely the use of reference pressure stations, and the combination of pressure data with Wi-Fi positioning (fingerprint matching or other). Evaluation of the combined pressure and Wi-Fi solution on a 6-story building resulted in 100% accuracy in detecting the correct floor.
Recent results indicate that the barometric sensor is a robust and reliable solution achieving 0.42 m vertical error and 98.8% floor detection rate [140]. However, it requires regular calibration to remove fluctuations due to local pressure changes. For instance, Ye et al. [160] present the Scalable Barometer Calibration (SBC) algorithm, which automatically calibrates the barometers of a large number of smartphone users, requires neither infrastructure nor human intervention, and uses only the smartphone barometer and accelerometer.
The limitations of the barometer sensor are confirmed by Muralidharan et al. [157]. In this work it is reported that pressure difference can be used as a useful fingerprint to detect the exact number of floors changed with almost 100% accuracy, while pressure-based features (such as the change in pressure) enable the classification of vertical activities (such as taking escalators, stairs or elevators) with high accuracy. However, the main finding was that it is difficult to use the barometer to determine the actual floor that a user is on.
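The distinction between relative and absolute floor estimation can be illustrated with a short sketch that converts a pressure difference into a number of floors changed. It uses the standard-atmosphere barometric formula (constants 44330 m, exponent 1/5.255, reference 1013.25 hPa); the 3 m floor height is an assumption, and the code is not the method of [157].

```python
def pressure_to_altitude_m(pressure_hpa, reference_hpa=1013.25):
    """Standard-atmosphere altitude (m) for a pressure reading (hPa)."""
    return 44330.0 * (1.0 - (pressure_hpa / reference_hpa) ** (1.0 / 5.255))


def floors_changed(start_hpa, end_hpa, floor_height_m=3.0):
    """Signed number of floors traversed between two pressure readings.

    Positive means the device moved up. Only the difference matters, so a
    constant sensor offset or slow weather drift largely cancels out.
    floor_height_m is an assumed, building-specific value.
    """
    climb_m = pressure_to_altitude_m(end_hpa) - pressure_to_altitude_m(start_hpa)
    return round(climb_m / floor_height_m)


# A drop of about 1 hPa corresponds to roughly 8.3 m near sea level.
delta = floors_changed(1013.25, 1012.25)
```

The result is a floor difference only; mapping it to an absolute floor label still requires a known starting floor or an external reference, which is consistent with the finding reported above.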
The idea of using reference pressure stations is also exploited in [156]. After recognizing that pressure data suffers from offsets and variations due to numerous causes, the authors propose a solution where a reference pressure sensor is installed in each one of the floors of the buildings to be covered. Data from the reference sensors is shared through a central server. The solution has been evaluated in three large buildings (an office building, an airport terminal, and an underground shopping mall) with accuracy results around 96% in estimating the correct floor (estimations while changing floors were not considered).
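A minimal sketch of the reference-station idea follows: the device pressure is compared with the current reading of each floor's reference sensor and the closest one is selected. The data layout, function name, and optional per-device offset are hypothetical; the actual matching rule in [156] may differ.

```python
def floor_from_reference_stations(device_hpa, reference_hpa_by_floor,
                                  device_offset_hpa=0.0):
    """Pick the floor whose reference sensor reads closest to the device.

    reference_hpa_by_floor: dict {floor label: latest reference pressure, hPa},
        e.g. as retrieved from a central server.
    device_offset_hpa: assumed, previously calibrated bias of the device sensor.
    """
    if not reference_hpa_by_floor:
        raise ValueError("at least one reference station is required")
    corrected = device_hpa - device_offset_hpa
    return min(reference_hpa_by_floor,
               key=lambda floor: abs(reference_hpa_by_floor[floor] - corrected))


# Hypothetical readings, about 0.36 hPa apart per floor.
references = {0: 1013.20, 1: 1012.84, 2: 1012.48}
floor = floor_from_reference_stations(1012.80, references)
```

Because the reference sensors experience the same weather-induced pressure changes as the device, a common drift shifts all readings together and leaves the decision unchanged.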
Gupta et al. [158] also consider the use of pressure data to improve a Wi-Fi RSS-based floor detection solution. In their work, pressure data is used to calculate the value of a Vertical Motion Indicator which, in turn, is used to detect when a user is moving to a different floor, blending that information with the output of the RSS-based floor estimator. Limited evaluation on a single building showed accuracy higher than 99% in detecting the correct floor. This solution requires access to a database with the 3D position of the Wi-Fi APs. Another solution based on Wi-Fi fingerprints and pressure data has also been evaluated in the context of the IPIN 2016 Competition, and the results obtained from processing data collected independently in 4 distinct buildings show an average floor detection accuracy of 96.5% in real-world settings [159].
Detecting when a pedestrian is changing floors is also the fundamental idea in most of the solutions based on processing data from accelerometers. One common feature in many of these approaches is the recognition of activities, such as taking an elevator up or down or walking up or down stairs, and distinguishing these activities from standing still or walking across a single floor [161]-[164]. However, detecting when a pedestrian is changing floors does not solve the problem of detecting the absolute floor the user is on, and also requires the user to acquire and process data continuously. While results reported by several authors, including those referred to above, point to 100% accuracy in detecting floor changes, some of these solutions do not solve the absolute floor detection problem, or require the user to manually indicate the initial floor. To this end, the F-Loc system leverages crowdsourcing and mobile phone sensing to collect users' Wi-Fi traces and accelerometer readings for building the Wi-Fi map of the entire building through advanced clustering and cluster manipulation techniques, which can then be used for floor localization [165]. A field study in a 10-floor building shows that F-Loc achieves an accuracy of over 98%.

D. Lessons Learned
While some of the proposed solutions show performance compatible with the requirements of most location-based applications, including emergency and rescue operations (accuracy higher than 95%), some challenges still need further investigation. Approaches based on the processing of data from accelerometers in smartphones depend on the way different individuals handle their devices when recognizing activities (such as climbing stairs). Moreover, most of the works only consider changing floors using elevators or stairs, failing to address the use of escalators or ramps, which are very common in airports and shopping malls.
On the other hand, solutions based on signals of opportunity, such as cellular, Wi-Fi or Bluetooth, whether based on fingerprint matching, trilateration, or other methods, often depend on previous data collection efforts for building the radiomap. Moreover, these solutions might be of little use in disaster situations where there is a complete outage of the communication infrastructure. In this respect, the use of pressure or accelerometer data is more robust to infrastructure damage, but solutions based only on a single type of data cannot determine the absolute floor.

IX. MOBILITY STATE ESTIMATION
Mobility State Estimation (MSE) is an integral part of cellular networks mainly for ensuring uninterrupted communication service to all users, especially those moving at high speeds. For instance, 3GPP introduces Mobility Robustness Optimization (MRO) features to its LTE self-optimization functions, which can dynamically improve the network performance of HO to provide enhanced quality of experience for the users, and increased network capacity. MRO can be done by automatically adapting cell parameters to adjust HO thresholds based on feedback of performance indicators [232]. Therefore, MSE is crucial for optimizing HOs in order to reduce call drop and network signaling flow, optimize traffic scheduling, and achieve resource optimization. MSE can also be very beneficial to transmission scheduling, mobility load balancing, channel quality indicator feedback enhancement, energy efficiency, and many resource management scenarios. One such scenario is choosing the most suitable channel dependent scheduling scheme for LTE, either frequency selective scheduling or frequency diversity scheduling according to the user speed, as demonstrated in [233]. For HO performance improvement the goal of 3GPP is not to obtain user speed estimates that are highly accurate, but rather coarsely classify the user speed into four mobility classes (i.e., 0-30 km/h, 30-60 km/h, 60-90 km/h, and ≥90 km/h).
At the same time reliable MSE is also beneficial to user tracking and navigation. For instance, if a user is determined to be static, then the output of the user tracker could "freeze" to prevent the undesirable jumping of the estimated location due to signal fluctuations or measurement errors. Alternatively, in case of high mobility the user location could be snapped to a motorway on the road network, rather than a low speed residential road. In both cases, the performance of tracking and navigation can be significantly improved.
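The "freeze" behavior described above can be sketched as a thin wrapper around any tracker output. The function below is a hypothetical illustration that assumes a per-sample static/moving flag is already available from an MSE module.

```python
def stabilize_track(raw_positions, is_static_flags):
    """Hold the last reported position while the user is classified as static.

    raw_positions: list of (x, y) tracker outputs.
    is_static_flags: list of booleans from a mobility state estimator (assumed).
    """
    stabilized = []
    held = None  # last position reported before or at the start of a static period
    for position, is_static in zip(raw_positions, is_static_flags):
        if is_static and held is not None:
            stabilized.append(held)       # suppress jitter from signal fluctuations
        else:
            held = position               # moving (or first sample): pass through
            stabilized.append(position)
    return stabilized


raw = [(0.0, 0.0), (1.0, 0.0), (1.4, 0.3), (0.8, -0.2), (2.0, 0.0)]
flags = [False, False, True, True, False]
track = stabilize_track(raw, flags)
```

Here the two noisy samples recorded while the user is static are replaced by the last position reported during movement, and normal tracking resumes once motion is detected again.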
Recent approaches employ sensory data to address MSE, including the use of similarities among the sensor data received from a pair of magnetic sensors to determine the speed of a vehicle [234]. However, they are not directly applicable in a network-based solution due to the requirement for data that are collected using external sensors, which are mounted on the vehicle/user. In the following, we overview several MSE approaches that rely on different information available at the network side to infer the mobility state of the user. These approaches are summarized in Table IV.

A. GPS Location
Several systems rely on GPS readings to determine the user transportation mode (typically among static, walk, bike, bus, train), including the semi-supervised approach in [166]. In many scenarios, however, GPS locations are not available. For instance, network operators perform MSE using cell network data, i.e., MRs generated during network events, which typically do not contain the GPS location of the user device.

B. Signal Power Measurements
Some early solutions count the number of times that RSS readings cross a certain level [167] in order to estimate the maximum Doppler frequency, which is proportional to the mobile speed. Other methods estimate the speed of mobiles by computing the covariance function of the RSS. For instance, the algorithm in [168] employs a modified normalized auto-covariance of received signal power. The proposed algorithm works well for frequency selective Rayleigh and Rician channels and provides accurate speed estimation even if the Signal-to-Noise Ratio (SNR) is as low as 0 dB, while simulation results indicate that it can reliably estimate mobile speed corresponding to a maximum Doppler up to 500 Hz. Although covariance-based methods are more efficient than crossing-based counterparts for small observation windows, they are both sensitive to noise for small Doppler spreads. This limitation could be addressed with Doppler spread estimation techniques such as the Maximum Likelihood approach that relies on periodic channel estimation and could provide near-optimal performance [235]. However, this technique requires knowledge of the SNR and the Gaussian noise level, while it suffers from high implementation complexity.
The solution proposed in [169] estimates the speed of a mobile phone by matching time series of RSS data to a known signal strength trace from the same road. The main idea is that RSS profiles along roads remain relatively stable over time, i.e., multiple passes on the same road with the same speed generate similar signal strength traces. Passing over the same road at a lower or higher speed leads to either a stretched or a compressed version of the signal strength trace, respectively. Thus, one can determine the speed of a phone by matching its signal strength profile onto a training RSS profile obtained at a known speed. The downside is that it might be difficult to collect training RSS profile traces at a known speed.
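As a worked example of the crossing-based idea, the sketch below inverts the classical level-crossing rate of a Rayleigh-fading envelope under isotropic scattering, N = sqrt(2π) · f_D · ρ · exp(-ρ²), to obtain the maximum Doppler frequency f_D, and then converts it to speed via v = f_D · c / f_c. The Rayleigh/isotropic model and the function names are assumptions; this is not necessarily the estimator used in [167].

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def doppler_from_crossing_rate(crossings_per_s, rho):
    """Maximum Doppler frequency (Hz) from a measured level-crossing rate.

    rho: crossing threshold normalized by the RMS envelope level.
    Assumes Rayleigh fading with isotropic scattering.
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    return crossings_per_s / (math.sqrt(2.0 * math.pi) * rho * math.exp(-rho ** 2))


def speed_from_doppler(max_doppler_hz, carrier_hz):
    """Mobile speed (m/s) corresponding to a maximum Doppler frequency."""
    return max_doppler_hz * SPEED_OF_LIGHT_M_S / carrier_hz


# Hypothetical measurement: 92.2 crossings per second at the RMS level, 2 GHz carrier.
f_d = doppler_from_crossing_rate(92.2, 1.0)
speed_kmh = 3.6 * speed_from_doppler(f_d, 2.0e9)
```

With these numbers the Doppler frequency is close to 100 Hz, i.e., about 54 km/h at 2 GHz, which would fall in the second of the four 3GPP mobility classes mentioned earlier.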
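The stretch-and-compress matching idea can be sketched with a brute-force search over candidate speeds: for each candidate, the training profile is resampled at the positions the phone would reach at that speed, and the candidate with the smallest mean squared error wins. The synthetic road profile, the common start point, and all names below are assumptions for illustration; [169] may use a different matching procedure.

```python
import math


def interpolate(profile, spacing_m, distance_m):
    """Linearly interpolate an RSS-versus-distance profile sampled every spacing_m."""
    idx = distance_m / spacing_m
    lo = int(idx)
    if lo >= len(profile) - 1:
        return profile[-1]
    frac = idx - lo
    return profile[lo] * (1.0 - frac) + profile[lo + 1] * frac


def estimate_speed(training, training_speed_mps, test, period_s, candidates_mps):
    """Return the candidate speed whose resampled training profile best fits test.

    Assumes both traces start at the same point on the road and share period_s.
    """
    spacing_m = training_speed_mps * period_s
    max_distance_m = spacing_m * (len(training) - 1)
    best_speed, best_err = None, float("inf")
    for v in candidates_mps:
        if v * period_s * (len(test) - 1) > max_distance_m:
            continue  # this speed would run past the end of the training profile
        err = sum(
            (interpolate(training, spacing_m, v * period_s * j) - rss) ** 2
            for j, rss in enumerate(test)
        ) / len(test)
        if err < best_err:
            best_speed, best_err = v, err
    return best_speed


def road_rss(d_m):
    """Synthetic, assumed RSS profile (dBm) as a function of distance (m)."""
    return -70.0 + 10.0 * math.sin(d_m / 20.0)


training = [road_rss(10.0 * i) for i in range(100)]  # recorded at 10 m/s, 1 s period
test = [road_rss(15.0 * j) for j in range(40)]       # same road driven at 15 m/s
speed = estimate_speed(training, 10.0, test, 1.0, [5.0, 10.0, 15.0, 20.0, 25.0])
```

In practice the start offset is unknown and the profile is noisy, so a real implementation would also search over alignment, for instance with dynamic time warping.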
The MSE solution in [170] takes advantage of the speed dependent time variations of the shadowing in the uplink SRS measurements and the computed metrics are then compared with a reference curve or look-up table (database), with respect to the shadowing decorrelation distance. Two algorithms are presented, namely the Spectral Analysis Method (SAM), which evaluates the maximum frequency of oscillation of SRS measurements built on Fast Fourier Transform (FFT), and the Time-based Spectrum Spreading Method (TSSM), which evaluates in time domain the speed dependent spectrum spreading of the SRS signal. The computational cost of TSSM is very low and has very limited impact to the processing unit of the LTE eNB. Moreover, TSSM is easier to implement as SAM requires an accurate selection of the frequency peak, which may be difficult to obtain under some conditions. A highly desirable property of both algorithms is that they are network-based according to 3GPP-LTE standard because they are based on SRS power measurements to be conducted at LTE eNBs. However, the SRS sampling frequency is an issue because various network equipment vendors provide variable sampling rates ranging from milliseconds to seconds. Moreover, the database with reference curves needs to be created in advance.
The MonoSense system leverages serving cell information only and the idea is that the phone speed can be correlated with features extracted from both the serving cell tower ID and the corresponding RSS [171]. This is an interesting approach because the majority of network MRs, especially in LTE networks, usually contain information only from the serving cell and sometimes the strongest neighboring cell [29], [30]. MonoSense extracts features from both the time and frequency domain information available from the serving cell tower over different sliding window sizes. Both the logarithmic and linear RSS scales can provide different information about user movement, further enriching the feature space and leading to higher accuracy. Results show an average precision and recall of 89.26% and 89.84% respectively in differentiating between the stationary, walking, and driving modes.

C. HO Information
HO-based solutions count the number of HOs made by the user device during a predefined time window. For instance, the invention in [172] proposes a threshold-based classifier using cell and HO count.
The solution in [173] detects the location of the mobile based on existing knowledge of HO zones. A HO zone is the most probable location in a given road segment where the mobile switches from the current BS to a new one. Whenever a HO occurs in the testing trace, the location of the mobile is estimated to be the location of the most probable HO zone. A HO typically occurs when the SNR drops below a certain threshold. It turns out that on any given road segment, the locations where the SNR drops below the threshold remain stable. The average speed estimate is then the distance between the previously estimated HO location and the currently estimated HO location divided by the total time between the previous and the current HO. Building the database of HO zones is the main limitation of this approach.
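The speed computation described above is simple arithmetic and can be sketched as follows. The planar coordinates and the straight-line distance are simplifying assumptions; on a curved road segment the distance along the road would be used instead.

```python
import math


def average_speed_kmh(prev_zone_xy_m, curr_zone_xy_m, prev_time_s, curr_time_s):
    """Average speed between two consecutive HO events.

    prev_zone_xy_m, curr_zone_xy_m: (x, y) of the most probable HO zones, in
        metres, looked up from a pre-built HO zone database (assumed available).
    """
    elapsed_s = curr_time_s - prev_time_s
    if elapsed_s <= 0:
        raise ValueError("HO timestamps must be strictly increasing")
    distance_m = math.dist(prev_zone_xy_m, curr_zone_xy_m)  # straight-line assumption
    return 3.6 * distance_m / elapsed_s


# Two HO zones 1 km apart, crossed 60 s apart.
speed = average_speed_kmh((0.0, 0.0), (600.0, 800.0), 0.0, 60.0)
```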
To prevent overestimation of the UE's mobility state in HetNets, specific weights can be assigned to different HO events and the final HO count for MSE is computed using a weighted sum of the HO events, e.g., 0.45, 0.25 and 0.1 for macro-to-pico, pico-to-macro, and pico-to-pico HO events [174]. Deogun et al. [175] build upon this weighted MSE approach and present two approaches that combine HO with signal power information, namely the Trajectory-based MSE and the Enhanced Trajectory-based MSE scheme. The Trajectory-based MSE scheme counts the number of mobility events, i.e., HO successes and RSRP threshold crossing events, during the counting time period. Each mobility event is assigned a distinct weight based on the HO type (i.e., macro-to-macro, macro-to-pico, pico-to-macro, and pico-to-pico) or crossing event. While this scheme includes only successful HO events (as in the standard 3GPP MSE scheme), the Enhanced Trajectory-based MSE scheme considers both successful and failed HO events.
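A minimal sketch of the weighted HO count is given below, using the example weights quoted from [174]. The macro-to-macro weight of 1.0 and the two state thresholds are assumptions introduced for illustration; in a real network the thresholds are configured by the operator.

```python
HO_WEIGHTS = {
    "macro-to-macro": 1.0,   # assumed: a full count for the conventional case
    "macro-to-pico": 0.45,   # example weights from the text
    "pico-to-macro": 0.25,
    "pico-to-pico": 0.1,
}


def weighted_ho_count(events):
    """Weighted sum of HO events observed within the counting window."""
    return sum(HO_WEIGHTS[event] for event in events)


def mobility_state(events, medium_threshold=3.0, high_threshold=6.0):
    """Classify the UE as normal, medium, or high mobility.

    Both thresholds are hypothetical values chosen for this example.
    """
    count = weighted_ho_count(events)
    if count >= high_threshold:
        return "high"
    if count >= medium_threshold:
        return "medium"
    return "normal"


# A slow user crossing ten small cells in a dense pico deployment.
state = mobility_state(["pico-to-pico"] * 10)
```

An unweighted count of ten HOs would exceed the assumed high threshold, whereas the weighted count is about 1.0 and the user is classified as normal mobility, which is the overestimation the weighting is meant to prevent.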
Another solution models densely deployed small cells using stochastic geometry, then analyzes the statistics of the number of HOs as a function of user device velocity, small-cell density, and HO count measurement time window, and develops a minimum variance unbiased velocity estimator, whose variance tightly matches with the Cramer-Rao Lower Bound [176]. Using this velocity estimator, they formulate the problem of detecting the user mobility state as low, medium, or high.
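To give a feel for how HO counts relate to velocity under a stochastic geometry model, the sketch below uses the known mean boundary-crossing rate of a Poisson-Voronoi tessellation, 4√λ/π crossings per unit length for cell density λ, and inverts it with a simple moment-based estimator. This assumes a straight-line trajectory and nearest-cell association, and it is not claimed to be the minimum variance unbiased estimator derived in [176].

```python
import math


def expected_ho_count(speed_mps, cell_density_per_m2, window_s):
    """Mean number of HOs for a straight-line path through Poisson-Voronoi cells."""
    path_length_m = speed_mps * window_s
    return 4.0 * path_length_m * math.sqrt(cell_density_per_m2) / math.pi


def speed_from_ho_count(ho_count, cell_density_per_m2, window_s):
    """Moment-based speed estimate (m/s) from an observed HO count."""
    if cell_density_per_m2 <= 0 or window_s <= 0:
        raise ValueError("density and window must be positive")
    return math.pi * ho_count / (4.0 * window_s * math.sqrt(cell_density_per_m2))


# Hypothetical deployment: 100 small cells per km^2, 60 s counting window.
density = 100.0 / 1.0e6
mean_hos = expected_ho_count(10.0, density, 60.0)
estimate = speed_from_ho_count(mean_hos, density, 60.0)
```

Because the HO count is an integer with significant randomness over short windows, the accuracy of any such estimator improves with denser deployments and longer measurement windows, which is the dependence analyzed in [176].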

D. Lessons Learned
While the importance of accurate MSE was recognized in the early generations of cellular networks for supporting efficient and timely HO functionality while users are moving, it has recently become a critical enabling technology for tracking and navigation applications. For tracking, typical examples include fixing the user location when she/he is estimated to be static or optimizing the tracker processing and output if the user is moving at higher, rather than lower, speeds. For navigation, examples demonstrating the usefulness of MSE in improving the user experience include showing a driving user on a motorway, instead of a nearby residential road, if she/he is moving at high speed (outdoor), or showing a user moving to another floor if a vertical movement, e.g., taking the stairs or elevator, is detected (indoor).
On the network side, user mobility can be identified through temporal processing of network parameters including signal power measurements, Doppler, or sequence of BSs involved in HO operations. Higher resolution can be achieved if the GNSS user location computed on the UE, and/or data from the integrated inertial sensors are transmitted back to the network, at the expense of increased signaling and traffic in the control plane channels. In contrast, MSE does not have these limitations in UE-based solutions, thus triggering the development of a wide range of fitness and physical activity monitoring applications enabled by accurate mobility estimation.

X. INDOOR MAPPING
Indoor navigation and tracking solutions are impaired by the unavailability of floor maps and the lack of standards. While outdoor maps are now freely available from companies such as Google [236], Apple [237] and Microsoft [238], and even as the result of collaborative initiatives such as OpenStreetMap (OSM) [239], no such global solutions exist for indoor maps. Some of the companies mentioned above have been developing proprietary indoor mapping solutions, including IndoorOSM [240], Google Maps Indoor [241], HERE maps that were originally developed by Nokia [242], and apparently Apple as well with their rumored indoor mapping and surveying application [243]. Other companies are also entering this business, such as MapsPeople [244].
To this end, the development of tools to assist in creating indoor maps has received a lot of attention from researchers during the last decade. A maps editor, part of a toolkit for building and using indoor maps, has been proposed in [177], where Stahl and Haupert identify the need for maps to enable seamless indoor/outdoor pedestrian navigation, and propose a hierarchical model that integrates geometric and symbolic maps. The proposed editor can be used to create maps on top of raster images or architectural floor plans, and includes support for multiple floors and the topology required for navigation. The process is, however, completely manual and time consuming. The authors also identify the need for adequate formats for the representation of the maps and propose a solution based on eXtensible Markup Language (XML).
Towards automation in the creation of indoor maps, Schäfer et al. [178] proposed a method to automatically generate maps by parsing Computer-Aided Design (CAD) files containing architectural floor plans. The extracted maps are also modeled in a way that facilitates their use by particle filters employed in some indoor systems. Peter et al. [179] also proposed a method to automatically create indoor maps from photographs of evacuation plans, which are further enhanced by processing data collected from an IMU. The motivation is that those evacuation plans are readily available in most buildings. WiFiSLAM, which was acquired by Apple in 2013, followed the same approach to extract the building maps [180].

A. Simultaneous Localization and Mapping
SLAM is an active research field that was first explored by the robotics community. Nowadays, it has become very popular, bringing together robotics, computer vision, signal processing, data fusion, and sensor experts. Essentially, SLAM allows the 3D reconstruction of the interior physical map while an individual or robot is moving freely and being continuously tracked inside an unknown indoor environment.
SLAM technology spans from expensive systems based on Light Imaging, Detection, And Ranging (LIDAR) technology, including [249]-[252], that deliver centimeter-level mapping accuracy to cost-effective smartphone-based solutions that achieve meter-level accuracy. Between these two extremes, camera-based solutions, especially those developed around Microsoft Kinect including [253]-[256], provide a good balance between cost (in terms of equipment price and survey time) and map accuracy. In this survey, we overview the increasingly popular smartphone-based SLAM solutions because of their capability to generate indoor radiomaps (i.e., cellular, Wi-Fi, magnetic) together with the building physical map using the on-board wireless communication and inertial sensor modules. Such indoor radiomaps are necessary for enabling accurate 3D fingerprint matching localization algorithms.
Note that several works in the literature take an existing physical map (e.g., floorplan blueprint) as input and output only the corresponding signal map. In this case, the building map puts hard constraints on the collected data trajectories. Thus, powerful map-matching techniques and Bayesian filtering methods, such as particle and Kalman filters, can be used to increase the user tracking accuracy and consequently the quality of the signal map; see [22] for an overview of map-matching algorithms on a contemporary smartphone including application of wall constraints, topological indoor maps, and building geometry for heading correction. These are still considered as SLAM systems by some researchers because they build the signal map. However, we focus on solutions that have no prior knowledge of the indoor space and output primarily the physical map and optionally the radiomaps. These solutions are summarized in Table V.
The WiFiSLAM solution 5 uses Gaussian Process Latent Variable Models (GP-LVM) for mapping high-dimensional data (i.e., RSS from surrounding Wi-Fi APs) to a lowdimensional latent space (i.e., xy coordinates of the user device) [181]. WiFiSLAM requires no IMU sensor data and the underlying model incorporates a set of constraints, e.g., nearby locations should observe similar RSS values, similar RSS values should be observed at nearby locations, and successive locations in the data should be nearby assuming a walking user. The localization accuracy is 3.97 ± 0.59 m. On the downside, WiFiSLAM relies on a signature uniqueness assumption, which limits applicability to only signal-rich environments featuring a large number of Wi-Fi APs. Moreover, the computational complexity is high requiring O(N 3 ) operations per iteration, where N is the number of user poses (i.e., 2D position and orientation/heading). The WiFi GraphSLAM solution uses Wi-Fi RSS, pedometry (i.e., measured distance between two Wi-Fi scans), and gyroscope data [182]. GraphSLAM is a commonly used technique in robotics community for simultaneously estimating a trajectory and building a map offline. WiFi GraphSLAM shares many benefits of GP and uses similar Wi-Fi RSS observations for loop closure. Loop closure refers to the process of identifying that the user has returned to a physical location, which was visited previously while surveying, and enables resetting the PDR tracking algorithm to avoid inertial sensor drift; a survey of PDR systems based on inertial sensors can be found in [257]. WiFi GraphSLAM requires O(N 2 ) operations per iteration and accuracy is 2.23 ± 1.25 m. The main disadvantage is that it uses a pedometry sensor to measure travelled distance between two Wi-Fi scans, which may not work well with smartphone IMU sensors. This is because step counting works fairly well, but accurate step length estimation is challenging on smartphones.
The WiSLAM solution uses foot-mounted IMU and Wi-Fi RSS data. The main idea is to adapt the FootSLAM and PlaceSLAM algorithms and combine them in a Dynamic Bayesian Network approach [183]. FootSLAM uses a Bayesian approach, where the state is the user's pose and step measurements allow updating both the user trajectory and the environment map. It is implemented as a Rao-Blackwellized Particle Filter (RBPF), where each particle is composed of a user trajectory instance and its related map [184]. On the other hand, PlaceSLAM assumes proximity information relative to some well recognizable places, e.g., doors [185]. In WiSLAM, RSS measurements provide distance, instead of proximity information, and require no human interaction or RFID tags. The main limitations are the foot-mounted IMU, which can be prohibitive in many application scenarios, and the use of a log-distance propagation model for Wi-Fi RSS with the path loss exponent fixed to 2, which is unrealistic for indoor environments. Wi-Fi AP locations are estimated during SLAM.
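To make the sensitivity to the path loss exponent concrete, recall the standard log-distance model RSS(d) = P0 - 10 n log10(d / d0), where P0 is the RSS at the reference distance d0 and n is the path loss exponent. The sketch below inverts this model to obtain a distance from an RSS reading. The default reference power of -40 dBm at 1 m is an assumed illustrative value, not a parameter reported for WiSLAM.

```python
def rss_to_distance(rss_dbm, p0_dbm=-40.0, n=2.0, d0_m=1.0):
    """Invert the log-distance model RSS(d) = P0 - 10 n log10(d / d0).

    rss_dbm: measured RSS in dBm.
    p0_dbm:  RSS at the reference distance d0_m (assumed value).
    n:       path loss exponent (2 corresponds to free space).
    Returns the estimated distance in meters.
    """
    return d0_m * 10 ** ((p0_dbm - rss_dbm) / (10.0 * n))

# The same -60 dBm reading maps to very different distances
# depending on the exponent that is assumed.
d_free_space = rss_to_distance(-60.0, n=2.0)
d_cluttered = rss_to_distance(-60.0, n=4.0)
```

Under these assumptions, a reading 20 dB below the reference corresponds to 10 m for n = 2 but only about 3.2 m for n = 4, which shows why a fixed exponent of 2 can badly overestimate distances in cluttered indoor environments.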
The SignalSLAM system uses Wi-Fi/Bluetooth RSS, 4G LTE RSRP, magnetic field, GPS reference locations (outdoors), NFC tag or QR code readings at specific landmarks, and PDR based on IMU data [186]. This is essentially a modification of the WiFi GraphSLAM approach and the main difference is that the similarity in signal space conditions the proximity in physical space. Moreover, loop closure is performed when landmarks (NFC, QR, GPS) are revisited or known landmark locations can be directly used in the model. The reported median error is 11 m to 14 m for tracking using only Wi-Fi readings collected with different devices, which is not directly comparable to previous SLAM approaches. The main disadvantage is that it requires landmarks (NFC, QR, GPS) with known or unknown location for loop closure.
The DPSLAM system uses Distributed Particle filter SLAM (DPSLAM) to provide constraints on the drift of a simple hip-mounted smartphone IMU [187]. DPSLAM does not require any prior knowledge of floor plans, transmitter locations, RSS signal maps, etc. The user is simply required to revisit locations periodically to enable IMU drifts to be observed and corrected using loop closure (or when GNSS fixes are available). The complexity is O(PN) for the correction step after loop closure (not iterative), where P is the number of particles. Localization tests indicate 4 m error at the final location after a 15-minute walk, while the largest error across the whole path is 12 m. The smartphone IMU is hip-mounted for more robust PDR tracking, which can affect the applicability of the system, but it could be feasible with other carrying modes (hand-held, pocket, bag, etc.) as in SignalSLAM.
Fingerprint Extended Kalman Filter SLAM (FEKFSLAM) is a low complexity approximation of the full SLAM [188]. FEKFSLAM maintains only a single hypothesis of the state vector, unlike DPSLAM, and requires a loop closure detection step at every measurement epoch. FEKFSLAM is recommended when there is no RSS signal map available, but the user's PDR parameters are well known (e.g., the step length and any compass bias have been recently calibrated during a period of GNSS availability). FEKFSLAM exhibits lower complexity, O(N) for the correction step after loop closure (not iterative), at the expense of lower accuracy compared to DPSLAM (i.e., 4 m error at the final location after a 550-step walk against 3 m in a different test).
Both DPSLAM and FEKFSLAM are part of the SmartSLAM hybrid solution that intelligently switches between these two solutions, as well as other simpler fusion algorithms, depending on the available information to reduce the computational load of the tracking engine and save battery, while maintaining good accuracy [188]. Other simpler fusion algorithms include a simple step-and-compass PDR solution and a Fingerprint Extended Kalman Filter (FEKF) that uses fingerprint maps and PDR. SmartSLAM is a general purpose all-in-one solution that trades off computational complexity (directly affecting battery life) with accuracy. However, it introduces overhead in terms of system complexity in application specific scenarios, where the available information is well known in advance.
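For reference, a step-and-compass PDR update can be sketched in a few lines. This is a generic illustration rather than the SmartSLAM implementation: it assumes that a step detector already supplies one (step length, heading) pair per step, with the heading measured clockwise from north as on a compass, and the function name is hypothetical.

```python
import math

def pdr_track(start, steps):
    """Dead-reckon a 2D track from per-step measurements.

    start: (east, north) starting position in meters.
    steps: iterable of (step_length_m, heading_rad) pairs, with the
           heading measured clockwise from north (compass convention).
    Returns the list of positions, including the starting point.
    """
    east, north = start
    track = [(east, north)]
    for length, heading in steps:
        east += length * math.sin(heading)
        north += length * math.cos(heading)
        track.append((east, north))
    return track

# Usage: one step north followed by one step east, 0.7 m each.
track = pdr_track((0.0, 0.0), [(0.7, 0.0), (0.7, math.pi / 2)])
```

Because every position is the sum of all previous steps, any bias in step length or heading accumulates along the track, which is why the SLAM variants above rely on loop closure, fingerprints, or GNSS fixes to bound the drift.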
The following solutions focus on the generation of the physical map, without building any radiomap. Essentially, indoor floor plan maps are created by processing location-dependent data traces obtained from crowdsourcing. The basic idea is to mimic what has been done in the past for creating outdoor maps from GPS traces contributed by volunteers. Since GNSS is not available or is not reliable indoors, several research efforts looked into traces obtained from dead reckoning after processing data from IMUs.
For instance, researchers at UC Berkeley developed a method to automatically create the maps of indoor buildings [189]. They used data collected from an IMU and a foot-mounted piezoelectric sensor to infer the trajectories of pedestrians, and then used those trajectories to estimate the structure of the buildings. While this method can only be used to create maps of the hallways, it showed that users of indoor positioning systems can contribute to build indoor maps automatically.
The CIMLoc system follows a similar approach, where data collected through crowdsourcing from smartphone sensors (magnetometer, gyroscope and accelerometer) are uploaded to a server for processing to derive the users' trajectories using PDR and a particle filter [190]. These trajectories are then segmented and clustered to generate indoor maps that are then used by the particle filter to improve the overall tracking performance. In this case, the only use of the created maps is to assist in the positioning method. This contrasts with the MapGENIE system, where the goal is to create indoor maps that can also be used for visualization [191]. In that approach, data collected from a foot-mounted IMU is processed to detect steps using a Zero-Velocity-Update protocol. The inferred trajectories are then processed to generate the hallway skeleton of the building (requires the exterior outline of the building to be provided). The traces are then further processed and combined with grammars to estimate the remaining structure of the building, including the geometry of rooms and their areas. Essentially, grammars are used to encode structural information such as the dimensions of rooms, the number of rooms, the relative room ordering, geometric constraints, etc. While almost fully automatic, these solutions required the use of specific equipment (foot-mounted IMUs or other sensors) and some level of previous calibration.
Recently, smartphone-based SLAM solutions addressed the limitations of external IMU sensors. For instance, Walkie-Markie exploits the Wi-Fi infrastructure to define landmarks, referred to as WiFi-Marks, to fuse crowdsourced user trajectories obtained from inertial sensors on users' mobile phones [192]. WiFi-Marks are special pathway locations at which the trend of the received Wi-Fi signal strength changes from increasing to decreasing when moving along the pathway. Walkie-Markie is able to reconstruct a high-quality pathway map for a real office-building floor after only 5-6 rounds of walks, with accuracy gradually improving as more user data becomes available. The maximum discrepancy between the inferred pathway map and the real one is within 3 m and 2.8 m for the anchor nodes and path segments, respectively.
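The WiFi-Mark definition, i.e., a point where the RSS trend switches from increasing to decreasing, can be illustrated with a simple peak detector on a smoothed RSS sequence. This is only a sketch under assumed conditions (a single AP, uniformly spaced samples along a pathway, and a hypothetical moving-average window); the criteria actually used by Walkie-Markie in [192] may differ.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def find_trend_reversals(rss, window=3):
    """Indices where the smoothed RSS switches from increasing to decreasing."""
    s = moving_average(rss, window)
    return [i for i in range(1, len(s) - 1)
            if s[i] > s[i - 1] and s[i] >= s[i + 1]]

# Usage: RSS (dBm) from one AP while walking past it along a corridor.
rss_trace = [-80, -75, -70, -62, -55, -60, -68, -74, -79]
marks = find_trend_reversals(rss_trace)
```

In this trace the only reversal is at the sample where the walker is closest to the AP, so that sample index would serve as the landmark candidate for aligning trajectories from different users.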
CrowdInside uses Wi-Fi RSS and IMU sensors and corrects inertial motion traces by using points of interest in the indoor environment, such as elevators and stairs, for error resetting [193]. CrowdInside achieves 1 m distance error of the building entrance position and the displacement error is 6 m in 90% of the cases, while the number of rooms is always correctly identified with enough traces. The system requires sufficient anchor points, e.g., locations with GPS reception or special inertial data signature, such as escalators, elevators, and stairs, for calibrating traces. Moreover, several traces that pass through these anchor points need to be collected. Another drawback is that different rooms should have distinctive Wi-Fi signatures to ensure high room identification rate.
SenseWit uses only IMU data to identify motion state (walking, static, and irregular), extract features (turning, water dispenser, door, etc.), label featured locations and bundle together sequences of locations according to featured locations. Subsequently, a complete floor plan is progressively generated [194]. Map generation results suggest that hallway shape similarity and room size error are better than CrowdInside. The main disadvantage is that a significant volume of crowdsourced trajectories is needed.
Recent advances in image processing created new opportunities to automate the creation of indoor maps. For instance, Jigsaw extracts the position, size, and orientation information of individual landmark objects from images taken by users [195]. It also obtains the spatial relation between adjacent landmark objects from inertial sensor data. Map generation results at 3 floors of two large shopping malls indicate that the position and orientation errors of landmark objects are within 1-2 m and 5°-9° in 90% of the tests, while hallway connectivity and connection areas between floors are 100% correct. Mapping performance is slightly better than SenseWit; however, Jigsaw suffers from high energy consumption due to imagery data from the smartphone's camera, as opposed to low-power IMU data recording. Other disadvantages include the need for several crowdsourced trajectories, the significant labor cost to obtain the images, and the requirement for high image quality.
A method for automatic generation of 2.5D indoor maps by processing images collected using off-the-shelf tablets or smartphones is presented in [196]. These images are processed to generate accurate maps of single rooms. If the used devices are equipped with IMUs, a common feature in most of the more recent smartphones, maps from single rooms can be combined to create maps of entire buildings. While these maps are not proposed specifically to assist indoor positioning systems, they can certainly be used to support many applications including pedestrian navigation.
Along the same line, Google Tango (formerly Project Tango) enables applications to compute a device's position and orientation within a detailed 3D environment, and to recognize known environments [197]. This technology uses advanced sensors, including better IMU and multiple cameras, such as RGB, depth, and motion tracking, to enable a mobile device to map indoor spaces and to know the location of the device within that space. First experiments with Google Tango reported that this technology does not allow performing highly detailed scanning [248]. However, as discussed in [248], the point cloud produced by Google Tango can be processed with standard shape detection methods or simple heuristics to identify floors and walls or detect doors/openings. Therefore, it is a good candidate for constructing a 2D floor plan map depicting rooms and corridors, as well as doors connecting them. Furthermore, this could become the leading solution for 3D indoor mapping and spatial positioning driven from the recent release of consumer smartphones that feature such high-end sensors (e.g., Lenovo Phab 2 Pro) and the increasing availability of enhanced technology-enabled applications on Google Play Store.

B. Map-Related Challenges
1) Indoor Space Modeling: Modeling of indoor spaces is important to complement physical maps. Indoor spaces exhibit complex topologies and they are composed of entities that are unique to indoor settings, e.g., rooms and hallways connected by doors. Therefore, conventional Euclidean distances are inapplicable indoors, which necessitates the use of symbolic and graph-based models [198].
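The gap between Euclidean and walking distance can be illustrated with a small graph model in which nodes are spaces (rooms, hallways) and edges are door connections weighted by walking distance. The sketch below uses Dijkstra's algorithm on a hypothetical two-room layout; the node names and distances are made up for illustration and do not follow any particular model from [198].

```python
import heapq
import math

def walking_distance(graph, source, target):
    """Shortest walking distance between two spaces (Dijkstra).

    graph: dict mapping a space to {neighbor: walking distance in meters},
           where an edge exists only if a door connects the two spaces.
    Returns math.inf if the target cannot be reached.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf

# Two rooms share a wall (centers 4 m apart) but connect only via the hallway.
floor = {
    "room_a": {"hallway": 5.0},
    "room_b": {"hallway": 5.0},
    "hallway": {"room_a": 5.0, "room_b": 5.0},
}
euclidean_m = 4.0
walking_m = walking_distance(floor, "room_a", "room_b")
```

Here the two rooms are 4 m apart in a straight line, yet a person must walk 10 m through the hallway, which is the kind of discrepancy that motivates graph-based models.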
The technology roadmap is towards indoor Geographic Information Systems (GIS) integration, including GeoJSON and IndoorGML. GeoJSON is a format for encoding a variety of geographic data structures, which is not specifically for indoor environments [199]. It supports multiple geometry types, including Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon. In 2015, the Internet Engineering Task Force (IETF), together with the authors of the specification, formed a working group to standardize GeoJSON and the new specification of the GeoJSON format was released in 2016 [200]. Although very popular, it is not a formal GIS standard, i.e., it is not supported by the Open Geospatial Consortium (OGC).
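As an illustration of how a room outline might be encoded, the following sketch builds a GeoJSON Feature with a Polygon geometry. The coordinates and the property names `name` and `level` are hypothetical: since GeoJSON is not specific to indoor environments, it defines no floor or room semantics, so such attributes have to be carried as application-defined properties.

```python
import json

def room_feature(name, level, ring):
    """Build a GeoJSON Feature for one room.

    ring: list of [longitude, latitude] pairs outlining the room.
    'name' and 'level' are illustrative property names, not part of GeoJSON.
    """
    ring = [list(point) for point in ring]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # a polygon ring must end where it starts
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"name": name, "level": level},
    }

# Usage: a made-up room outline, listed counterclockwise as [lon, lat].
feature = room_feature(
    "Meeting room 2.01", 2,
    [[24.0000, 60.0000], [24.0001, 60.0000],
     [24.0001, 60.0001], [24.0000, 60.0001]],
)
collection = {"type": "FeatureCollection", "features": [feature]}
encoded = json.dumps(collection, indent=2)
```

The floor number lives only in the free-form properties, so two applications must agree on that convention out of band, which is one reason a dedicated indoor data model is attractive.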
On the other hand, the IndoorGML standard [201] specifies an open data model and XML schema for indoor spatial information, which is an application schema of the OGC Geography Markup Language (GML) [258]. While there are several 3D building modeling standards such as CityGML, Keyhole Markup Language (KML), and the Industry Foundation Classes (IFC) data model, which deal with the interior space of buildings from geometric, cartographic, and semantic viewpoints, IndoorGML intentionally focuses on modeling indoor spaces for navigation purposes. IndoorGML will soon be applied to model the indoor network in the V-world 3D geospatial platform, which includes the construction of a national 3D map of South Korea [202]. Note that IndoorOSM, which was an older effort by OSM, is deprecated. Recent efforts from industrial and academic partners try to develop an indoor counterpart to OSM by bridging the gaps between IndoorOSM and IndoorGML in the context of the EU-funded i-locate open geodata project [203].
2) Privacy, Security and Map Representation: While several research teams addressed some problems related to the availability of indoor maps, very few referred to or proposed solutions for issues related to privacy, formats for the representation of the maps, protocols to access indoor maps, and the different types of maps needed in the indoor environment.
Building owners might not want the indoor maps of their buildings to be publicly available mainly due to security reasons. However, they might recognize the usefulness of disclosing those maps locally to the users inside their buildings. For instance, the Anyplace system offers the option in the map architect to release a map as either public or private, thus restricting access to specific users if necessary [110]. On the other hand, probably indoor maps with certain levels of detail should only be available to a certain group of users, while less detailed maps could be provided to the general public. As an example, consider a hospital or an airport; patients or travelers only need access to the public areas of the buildings, while medical staff or security personnel would benefit from having access to more detailed maps. These requirements constitute a challenge on how to provide secure access to indoor maps, while simultaneously preserving privacy.
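The idea of serving different levels of map detail to different user groups can be sketched as a simple filter over map features. The role names, the `min_role` attribute, and the hospital features below are all hypothetical; this is not the access model of Anyplace or any other system, and a real deployment would enforce the check on the server after authenticating the user.

```python
# Assumed ordering of roles from least to most privileged.
ACCESS_LEVELS = {"public": 0, "staff": 1, "security": 2}

def visible_features(features, role):
    """Return the map features that a user with the given role may see.

    Each feature is a dict with a 'min_role' key naming the least
    privileged role that is allowed to view it.
    """
    level = ACCESS_LEVELS[role]
    return [f for f in features if ACCESS_LEVELS[f["min_role"]] <= level]

# Usage: a hospital floor with areas of increasing sensitivity.
hospital_map = [
    {"name": "lobby", "min_role": "public"},
    {"name": "ward", "min_role": "staff"},
    {"name": "server room", "min_role": "security"},
]
patient_view = visible_features(hospital_map, "public")
staff_view = visible_features(hospital_map, "staff")
```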
Global access to indoor maps also calls for standard formats for map representation, and standardized protocols to access them (also related to privacy). This problem has been briefly discussed in [204] and a platform to make indoor maps available to client applications has been proposed. In that work, authors highlight the difference between symbolic and geometric maps, and their different uses (see also [177]). In a user study conducted in 2009 in a large shopping mall, it was concluded that outdoor-like maps and architectural floor plans are not optimal for indoor navigation because ". . . corridors do not have street names". In other words, outdoor and indoor spaces are organized in very different ways. Actually, indoor spaces are not like roads; thus, they are not well represented by simple directed graphs where edges connect vertices as in road networks. Moreover, pedestrian navigation is far less constrained than car navigation, even in outdoor spaces.
The adequacy of indoor maps based on architectural floor plans for visualization and pedestrian navigation has also been discussed in [205] and [206]. Among other aspects, the author highlights the fact that different users have different needs depending, for example, on their familiarity with the visited place. In [205], a new type of indoor map, inspired by London Underground maps, is proposed that can arguably better support several tasks performed indoors. In [206], the different types of indoor maps are further discussed and a framework is proposed to evaluate their quality in seven dimensions including privacy, interactivity, and semantic accuracy.
The above works contributed to a better understanding of the requirements for effective indoor mapping. We now understand that maps are needed for visualization of people or assets (geometry and/or symbolic maps), in large and small displays, for pedestrian and robot navigation (topological maps), for use by tracking techniques (e.g., particle filters), or even for augmented reality applications.

C. Lessons Learned
The creation of indoor maps, as well as the mapping process (i.e., the survey and data collection for producing the maps), poses many challenges. Firstly, each building owner or administrator has to build the corresponding indoor maps before they can be used, while outdoor maps are readily available. For large and complex buildings, this task might be hard, in particular for old buildings for which blueprints are no longer available. Secondly, indoor maps built for one platform cannot be used in applications supported by other platforms due to the lack of interoperability. Standards for indoor map formats and for open access to maps (Web services) would stimulate interoperability among systems from different vendors, and would maximize the benefits of spending considerable effort in building indoor maps. Appropriate tools to assist in the production of indoor maps by non-professionals would also contribute to increasing the number and quality of maps. On the other hand, if the floor plan maps are already available or can be extracted from architectural plans to a usable picture format (e.g., jpeg, png, etc.), then they can be easily overlaid on top of world maps. For instance, some services offer Web tools for resizing, stretching, and rotating these pictures to align the floor plans with the boundaries of the building on world maps, like the KAILOS positioning engine [230], [259], or on top of Google Maps, like the Anyplace indoor navigation service [110].
Smartphone-based SLAM approaches are a user-friendly and cost-effective solution for the creation of indoor physical maps. Many solutions in this category are also capable of building, as a by-product, the corresponding radiomap with location tags that are accurate enough to be used later by fingerprint matching solutions for localization. However, SLAM solutions usually require the collection of a considerable volume of (possibly crowdsourced) data for producing coarse-grained indoor maps, which are usually of lower quality compared to other mapping methods.

XI. DISCUSSION AND OUTLOOK
In this section, we first summarize lessons learned with respect to the topics covered in this survey, followed by a discussion on architectural considerations and an outline of the technology roadmap and industry trends.
A. Lessons Learned
1) Localization architectures are classified as UE-based, UE-assisted, and network-based, each having advantages and disadvantages; the choice among them depends entirely on the target application.
2) While cellular network localization has been studied for several decades and mature solutions have made their way into standards, the advent of commercial 5G deployments featuring novel radio technologies will bring new opportunities for reaching centimeter-level accuracy.
3) Fingerprint matching with signal strength data seems to be the dominant approach for WLAN-based localization, while crowdsourcing solutions address the overhead of building and updating the radiomap at the expense of countermeasures to ensure the integrity of the crowdsourced contributions.
4) Multi-hop range-free localization is a promising solution for low-power and low-complexity localization in resource-constrained WSNs and high-volume IoT deployments, given that network anisotropy is sufficiently mitigated.
5) Data fusion techniques are applicable at different levels, ranging from raw measurements to location algorithms and advanced post-processing methods at the final stage, and are expected to further enable the high accuracy demanded in low-latency safety-critical 5G applications.
6) Vertical positioning can be achieved by cellular, WLAN and sensor-based solutions; however, reliable height/floor estimation for the provision of accurate 3D location is yet to come. The importance of vertical positioning is illustrated by emergency response: after an E-911 call initiated inside a multi-floor building, responders prefer a location estimate on the correct floor with possibly higher horizontal error over an estimate on the wrong floor, because it is easier for them to spot the victim.
7) MSE is an important enabling technology for tracking and navigation. It is easier to implement on mobile devices than on the network side, because satellite, terrestrial radio, and sensor data are readily available on the devices, whereas transmitting these data to improve network-based MSE can be costly in terms of bandwidth.
8) Indoor mapping has been greatly facilitated by the recent advances in smartphone-based SLAM solutions; yet, many challenges remain to be resolved, including indoor space modeling and map security, privacy and representation. In addition, even though there are many ongoing standardization efforts for indoor maps and data management systems, there is still a long way to go before indoor GIS systems offer the same services as their well-established outdoor counterparts.
B. Architecture Considerations
1) Availability: Architectural considerations also come into play when designing systems for robustness. It can easily be envisioned that indoor positioning will become part of society's critical infrastructure in the next decade or so. GNSS is already considered such critical infrastructure, especially due to its use for timing of data networks, banking services, etc. With the advent of the FCC E911 emergency call positioning requirements for indoors [2], it is to be expected that indoor positioning will also become a critical infrastructure.
Because such a development is inevitable, any party willing to provide their products for such purposes needs to consider Service Level Availability (SLA) requirements. It may well be that accuracy is a somewhat secondary consideration in E911 applications, while the primary concern is the availability of both the service itself and the location estimate [2]. The first concern can be addressed by redundancy, i.e., distributing the required servers geographically and having the proper DNS failover mechanisms in place. The second concern is related to the positioning coverage and can be addressed by appropriate crowd-learning techniques.
2) Scalability: Scalability is related to the cost and effort required to deploy and maintain the indoor positioning solution. It is a relative measure with respect to the expected added value. Therefore, there is no universal answer to the question, whether some technology is expensive or cheap to deploy.
Indoor positioning solutions can be divided into Business-to-Business (B2B) and into small and large scale Business-to-Consumer (B2C) solutions depending upon their intended target users. In the B2B use case the service end users are, e.g., the employees of a certain company. The application may be a smart office application helping the employees to find meeting rooms or services. Another example is an industrial automation application targeted at increasing productivity on the factory floor by, e.g., guiding maintenance personnel to correct locations.
The common factor for the B2B deployments is that they are deployed to a limited number of buildings and can typically add high value to the organization and for the individual users. Also, the bulk of the investment is in developing the end user application and its integration to the organization's other services. Therefore, the deployment cost of the indoor positioning solution does not play a large role in the investment decision. The ease of deployment is required, but the solution does not necessarily need to be scalable.
In the B2C solutions the end user is a consumer. An example of such a use case is an application targeted at shopping mall visitors. In the small scale, such an application may be created by a single shopping mall or retail chain for loyal customers. The added value per user may be small, but the large volume compensates for the low per-unit value. Again, in the small scale the cost of the solution is dictated by the application development, not the expense related to the indoor positioning deployment.
The large scale B2C case refers to the situation in which the availability of indoor location becomes platformized. An analogy can be found in GNSS and 2D WLAN-based positioning; today, application developers can trust that platforms such as iOS and Android provide location information at some level of accuracy anywhere in the world [260]. However, this is not yet the case with accurate 3D indoor location (i.e., location error of a few meters, floor level). Before this can be achieved, the scalability issue must be resolved. In practice, to enable indoor positioning in public and semi-public venues on a large scale globally, crowdsourcing technologies must evolve. It may be acceptable, although this is debatable, to require the initial radio survey to be made manually by a site visit, but the upkeep and maintenance of the radiomap must take place automatically. Full automation would also solve the venue owners' problem of deciding which vendor to cooperate with.
3) Security and Privacy: All information regarding the whereabouts of people and assets needs to be treated as confidential and private data by default. Indoor location information, whether it refers to individuals or objects, is no exception. In some cases, the whereabouts of people indoors may be even more private than outdoors. Also, knowledge of the indoor radio environment may be considered private information, as it may provide malicious parties with valuable information about the company network. Therefore, it is advisable to also handle the assistance information (locations of the radio nodes, signal strength landscape) with proper care in the indoor positioning system. There are many aspects to protecting personal and private data, which are discussed next.
Authentication: Refers to the set of measures by which an individual or machine is verified to be the entity it claims to be. The mechanism can range from using an application-specific identity or code (not safe) to requiring individuals to log in or to use client-side/personal certificates (very safe). The authentication mechanism needs to be decided based on the use case and the confidentiality of the data to be accessed. The basic question, however, is always whether the entity, a person or a machine, accessing the location information can be identified reliably with a suitably strong authentication mechanism.
Authorization: Once the entity has been identified, the next step is to authorize the access to the services and information. There may be various access levels (e.g., lower and higher accuracy indoor positioning) or available services (indoor and outdoor positioning) depending upon the person and the type of the object. The authorization step is typically already easier after the entity has been authenticated at sufficient certainty.
Encryption: Relates to the set of measures ensuring that, even in case of data leakage, the data cannot be utilized by any malicious party unless they also get access to the encryption keys. There are at least three dimensions to encryption that need to be taken care of. Firstly, the data needs to be protected at rest in the cloud or in databases. This ensures that even if a malicious user gains access to the data or databases, the data is not compromised, given that the encryption keys are appropriately protected. Secondly, whenever data is transferred over the Internet, it needs to be protected by suitable means to prevent eavesdropping. In the most typical case it is sufficient to use HTTPS transport. Finally, the data also needs to be protected at rest in the client device. For instance, on Android devices it may suffice to use the internal storage (sandbox for the application) without extra encryption. However, whenever there is sensitive data, it should also be encrypted on the client side. Further issues then arise related to storing the actual encryption key securely in the application.
Privacy: Includes both technical and behavioral aspects. The technical aspect refers to designing the system so that user location information is neither stored with unique identifiers nor transferred as plain text over the Internet. On the other hand, the behavioral aspect is harder to control. In the end, it is up to the users how much they are willing to expose to others through, e.g., social media applications.

C. Technology Roadmap and Industry Trends
Radnosrati et al. [135] have identified some important trends that are expected to have an impact on radio network positioning in the near future. One trend is that information and algorithms are shared between different layers in the classical Open Systems Interconnection (OSI) model. For instance, if Power Delay Profile (PDP) information was available not only at the physical layer, but in the higher layers as well, then more sophisticated algorithms could easily be derived by taking into account the variance of measurement uncertainty, the temporal correlation of the uncertainty between successive measurements of the same type, and the correlation between signal strength and timing measurements. Another trend is the increasing availability of new and better information, including Fine Timing Measurements (FTM) in the IEEE 802.11v standard [261], accurate Direction of Arrival (DOA) estimation in MIMO systems, and distance estimation based on short-range single-hop or multi-hop techniques in ad-hoc networks where no direct communication is required [135]. Another trend is the deployment of new infrastructure, including BLE beacons, IoT devices, Machine-to-Machine (M2M) networks that contain a number of devices such as RFID, sensors, tags, etc., and 5G communication networks [135].
So far, wireless networks have been used to determine user location for enabling a multitude of location-based services and applications. A recent trend is to move from network-based localization to location-aided communications [262]. This has opened an entirely new research field regarding the use of location information to improve network operation, especially in upcoming 5G networks. This is of great value to network operators who can improve network performance (e.g., throughput, latency, etc.) and optimize resource allocation to meet increasing demands.
Di Taranto et al. [263] highlight the importance of location-awareness in 5G networks and identify how location information could be employed across the OSI protocol stack. For instance, in the physical layer, network processes that can benefit include spatial spectrum sensing for cognitive radio and interference coordination in 5G, slow adaptive modulation and coding or channel estimation, beamforming, pilot decontamination in MIMO systems as described in [264] and [265], and Channel State Information (CSI) estimation [266]-[269]. Medium Access Control (MAC) layer applications include resource scheduling algorithms (e.g., for frequency reuse), inter-cell interference coordination techniques, and multicasting algorithms. Network and transport layer applications include enhanced HO mechanisms and routing protocols (known as georouting) in ad-hoc and vehicular networks. Higher layer applications include location-assisted information delivery (e.g., advertising) and multimedia streaming, Intelligent Transportation Systems (ITS) and autonomous vehicles, and location-enabled security and privacy schemes such as encryption key management and wormhole attack detection.

XII. FUTURE RESEARCH DIRECTIONS
In future wireless communication systems, the problem of accurate localization and tracking will have to be addressed by combining methods from wireless access, networking theory, statistical data analysis, and optimization theory, among others [132]. In these systems, localization services will face new possibilities and challenges arising from heterogeneous protocols, low-latency communications, and mmWave communications. Recent ground-breaking results in Machine Learning will also likely have an important impact on the redefinition of localization and tracking. We briefly address these topics in the following.

A. Fundamental Research
One of the most rapidly emerging areas of fundamental research is Machine Learning, and especially Statistical Machine Learning. The recent developments in this domain will likely have a major impact on the engineering of localization systems and services. In fact, machine learning has already been exploited in the well-known SLAM methods, where wireless devices estimate or learn the received signal strength at a given point in space with respect to a Wi-Fi AP or cellular BS, and this knowledge is then used to derive the location. Among the emerging breakthroughs in Statistical Machine Learning are new theories capable of estimating missing or corrupted data by exploiting the hidden statistical correlation among various measurements. These ideas can potentially be applied to improve existing localization methods or to define new ones, especially in scenarios where the measurements are insufficient or missing.

B. Heterogeneous Networking Protocols
Future wireless access networks will rely on heterogeneous networking protocols, which will operate according to different standards and on different frequencies. A fundamental question to be addressed is how rapidly a device may switch from one communication protocol to another while still ensuring accurate positioning services. This will be particularly difficult, because current state-of-the-art physical layer and MAC methods do not treat positioning accuracy as a primary metric.
1) Heterogeneous IoT Protocols: Within the IoT networking protocols, a tangible example of this heterogeneity is given by the numerous standards, such as Bluetooth, ZigBee, SigFox, LoRa, and Narrow Band IoT (NB-IoT), among others. In principle, a wireless device could use one such standard at a time and could profit from communicating with anchor nodes or other distributed nodes that use those standards. However, how to make such a localization procedure efficient is an interesting open question, due to the unavoidable delays introduced by switching from one standard to another and the time needed to receive beaconing information.
2) 5G New Radio: The 5G wireless communication networks will use a new wireless access technology called New Radio (NR) [270], [271]. NR is being standardized by 3GPP in Release 15. The essential characteristic of NR is that it will simultaneously support several wireless requirements and protocols (in terms of data rates, latency, coverage, capacity, and reliability) for many use cases (such as machine-to-machine, industrial, people-centric, infrastructure, and vehicular communications). NR will have a new and complex set of protocol specifications for the physical and medium access control layers; in particular, the novelties concern the waveform, frame structure, multiple access, and initial access management. How to exploit these new characteristics is a largely open problem. At the time of writing, very few or no publications can be found concerning NR and indoor localization, whereas some initial studies are available for outdoor communications [272].

C. A Prominent IoT Protocol: NB-IoT
Among the emerging protocols for IoT, NB-IoT will most likely play an essential role in the future. This standard has been introduced in the 3GPP releases and will be compatible with LTE, with particular reference to M2M communications. NB-IoT is expected to be capable of covering wide areas with massive IoT deployments. The characteristics of LTE, such as modulation, channel coding, and multiple access, will be inherited by NB-IoT. Therefore, IoT devices using NB-IoT may enhance their localization abilities not only through the traditional localization methods implemented among NB-IoT devices, but also by exploiting communication with mobile cellular phones using LTE and GSM. An initial activity in this direction can be found in [273], where it was shown that the localization method based on O-TDOA is promising. However, the obtained accuracy is of the order of tens of meters, which suggests that much research is still needed to reach meter-level precision for both indoor and outdoor localization.
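As a reminder of what TDOA-based positioning involves, the sketch below estimates a 2D position from range differences to known anchors using plain Gauss-Newton iterations. It is a generic textbook-style illustration under idealized assumptions (noise-free measurements, synchronized anchors, a hypothetical 100 m square layout), not the method evaluated in [273]; all function names and coordinates are made up for the example.

```python
import math


def tdoa_ranges(position, anchors):
    """Range differences (m) of anchors[1:] relative to the reference anchors[0]."""
    d0 = math.dist(position, anchors[0])
    return [math.dist(position, a) - d0 for a in anchors[1:]]


def solve_tdoa(anchors, measured, guess, iterations=50, tol=1e-9):
    """Gauss-Newton solution of the 2D hyperbolic positioning problem."""
    x, y = guess
    for _ in range(iterations):
        d0 = math.dist((x, y), anchors[0])
        u0x, u0y = (x - anchors[0][0]) / d0, (y - anchors[0][1]) / d0
        # Accumulate the 2x2 normal equations (J^T J) delta = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), r_meas in zip(anchors[1:], measured):
            di = math.dist((x, y), (ax, ay))
            jx = (x - ax) / di - u0x  # d(range difference)/dx
            jy = (y - ay) / di - u0y  # d(range difference)/dy
            res = r_meas - (di - d0)  # measured minus predicted
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 += jx * res
            b2 += jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # poor geometry: the update is not well defined
        dx = (a22 * b1 - a12 * b2) / det
        dy = (a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            break
    return x, y


# Hypothetical layout: four base stations at the corners of a 100 m square.
ANCHORS = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
TRUE_POSITION = (30.0, 40.0)
measurements = tdoa_ranges(TRUE_POSITION, ANCHORS)
estimate = solve_tdoa(ANCHORS, measurements, guess=(50.0, 50.0))
print("estimated position:", estimate)
```

In practice the range differences are noisy, since each one is a time difference multiplied by the speed of light, so even small timing errors on a narrowband signal translate into large position errors.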

D. Low-Latency Communications
The new paradigm of low-latency communications [274] predicts that any wireless device will be capable of communicating short messages within a few milliseconds from source to destination and back. Such a short latency is expected to enable new localization methods that deliver centimeter-level accuracy. One of the problems with network-aided localization is the unpredictable delay introduced by the communication protocols in the physical and MAC layers. Low-latency communication will substantially reduce these delays, while carrying timing information directly on the waveforms.
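The sensitivity of timing-based ranging to unaccounted protocol delays can be made concrete with a short calculation. The sketch below assumes a simple two-way exchange in which the responder's reply delay is known and subtracted; the function names are hypothetical and the model ignores clock drift and multipath.

```python
C = 299_792_458.0  # speed of light in m/s


def two_way_range(t_round_s, t_reply_s):
    """Distance (m) from a measured round-trip time and a known reply delay."""
    return C * (t_round_s - t_reply_s) / 2.0


def range_error(delay_error_s):
    """Range error (m) caused by an unaccounted delay in the round trip."""
    return C * delay_error_s / 2.0


def timing_budget(target_error_m):
    """Largest unaccounted round-trip delay (s) for a given range error target."""
    return 2.0 * target_error_m / C


for label, delay in (("1 us", 1e-6), ("1 ns", 1e-9)):
    print(f"unaccounted delay of {label}: {range_error(delay):.3f} m range error")
print(f"budget for 1 cm: {timing_budget(0.01) * 1e12:.1f} ps")
```

Under this model, an unaccounted delay of one microsecond corresponds to roughly 150 m of range error and one nanosecond to roughly 15 cm, so centimeter-level ranging requires the residual timing uncertainty to be on the order of tens of picoseconds.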
Wireless communication networks capable of offering latencies below 1 ms therefore have tremendous potential to revolutionize the localization accuracy of mobile wireless devices. An interesting research direction is to understand the fundamental performance limits of localization services in low-latency networks. In fact, there is a tradeoff between latency and throughput for different MAC scheduling schemes. The classic tradeoff between latency and reliability of slotted ALOHA and CSMA is shown in [275], and the tradeoff between delay and energy consumption in [276]. The tradeoff between latency and throughput in ad hoc network routing is given in [277]. The tradeoffs may differ across operating regions (i.e., different throughput or reliability). Such tradeoffs are expected to exist for other techniques as well, and the related boundaries should be well determined. Given that these tradeoffs can determine the actual latency of the communications, and that the latency can be one of the inputs to the localization algorithms, it will be interesting to characterize the tradeoff between location accuracy and the techniques mentioned above.
However, not all communication services and use cases will benefit from low-latency communications, due to the cost and complexity of such a requirement. Therefore, only the use cases that demand low latency will potentially also benefit from sub-centimeter localization algorithms.

E. High Data Rate Wireless Systems
The growing need for higher data rates has motivated the development of 60 GHz mmWave communications for future high-data-rate wireless systems [278]. The FCC released 7 GHz of continuous unlicensed spectrum around 60 GHz, and similar allocations exist in many countries worldwide. This unlicensed band is particularly useful for applications such as smart communication environments, smart cities, independent robot navigation, assistive technology, habitat monitoring, vehicular networks, industrial logistics, and medical equipment [279]. Thus, with the emergence of mmWave communications, accurate and low-cost localization algorithms are essential for devices using such a communication technology. This calls for new fundamental communication theory methods (e.g., location-aided beamforming) to ensure fast communications for positioning. For instance, Koivisto et al. [280] demonstrate that geometric location-based beamforming schemes become technically feasible and can offer substantially reduced reference symbol overhead compared to classical full CSI-based beamforming.
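The basic idea behind geometric location-based beamforming can be sketched in a few lines: if the positions of the base station and the device are known, the steering direction follows from geometry alone, without estimating the full channel. The example below is a simplified illustration and not the scheme of [280]; it assumes a uniform linear array along the x-axis with half-wavelength spacing, a pure line-of-sight far-field channel, and hypothetical function names and coordinates.

```python
import cmath
import math


def bearing_from_positions(bs_xy, ue_xy):
    """Angle (rad) of the device measured from the array broadside (y-axis)."""
    return math.atan2(ue_xy[0] - bs_xy[0], ue_xy[1] - bs_xy[1])


def steering_vector(theta, n_antennas, spacing_wavelengths=0.5):
    """Far-field response of a uniform linear array toward angle theta."""
    phase_step = 2.0 * math.pi * spacing_wavelengths * math.sin(theta)
    return [cmath.exp(1j * phase_step * n) for n in range(n_antennas)]


def location_based_weights(bs_xy, ue_xy, n_antennas):
    """Unit-norm beamforming weights computed from positions only."""
    theta = bearing_from_positions(bs_xy, ue_xy)
    scale = math.sqrt(n_antennas)
    return [a / scale for a in steering_vector(theta, n_antennas)]


def array_gain(weights, theta):
    """Power gain |w^H a(theta)|^2 of the beam in direction theta."""
    response = sum(w.conjugate() * a
                   for w, a in zip(weights, steering_vector(theta, len(weights))))
    return abs(response) ** 2


# Hypothetical geometry: base station at the origin, device at (10 m, 10 m).
weights = location_based_weights((0.0, 0.0), (10.0, 10.0), n_antennas=16)
print("gain toward device:", array_gain(weights, math.pi / 4))
print("gain at broadside: ", array_gain(weights, 0.0))
```

With unit-norm weights, the gain toward the device equals the number of antennas, while other directions are strongly attenuated. The same property means that a position error directly becomes a pointing error, which couples beamforming performance to localization accuracy.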
A drawback of millimeter-wave communications is the time needed to set up and maintain the narrow beams [281]. Such a delay can potentially affect localization algorithms that depend on communication delays between transmitter and receiver. Another major issue is that, in indoor applications of millimeter waves, obstacles between transmitter and receiver (e.g., people) will block the line-of-sight path, and therefore the actual communication will occur through reflectors. In such a case, the direction of arrival of the signal will not be that of the line of sight, which may hinder good accuracy.
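The effect of a blocked line of sight on angle-based localization can be illustrated with the image-source model of a single specular reflection: the reflected ray arrives as if it had been emitted by the mirror image of the transmitter behind the reflecting wall. The sketch below assumes a 2D scene with one ideal reflecting wall along the line y = wall_y; the function names and coordinates are hypothetical.

```python
import math


def true_bearing(rx_xy, tx_xy):
    """Line-of-sight direction of arrival (rad) at the receiver."""
    return math.atan2(tx_xy[1] - rx_xy[1], tx_xy[0] - rx_xy[0])


def reflected_bearing(rx_xy, tx_xy, wall_y):
    """Direction of arrival (rad) and apparent source for a single wall bounce."""
    image = (tx_xy[0], 2.0 * wall_y - tx_xy[1])  # mirror the transmitter in the wall
    angle = math.atan2(image[1] - rx_xy[1], image[0] - rx_xy[0])
    return angle, image


# Hypothetical scene: receiver at the origin, transmitter at (4 m, 1 m), wall at y = 3 m.
rx, tx, wall_y = (0.0, 0.0), (4.0, 1.0), 3.0
los = true_bearing(rx, tx)
nlos, image = reflected_bearing(rx, tx, wall_y)
bearing_error_deg = math.degrees(nlos - los)
position_error_m = math.dist(image, tx)
print(f"bearing error: {bearing_error_deg:.1f} degrees")
print(f"apparent source displaced by {position_error_m:.1f} m")
```

In this example, a receiver that treats the reflected path as the line of sight would place the transmitter at the image position, several meters away from its true location, unless the reflection is detected or the room geometry is known.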