Data Fusion With Model-Based Machine Learning For Weighted Least Squares Based Positioning

This work presents data fusion techniques based on Machine Learning algorithms for hybrid cooperative positioning. More precisely, it uses model-based Machine Learning methods built on Bayesian inference in Bayesian Networks with a Belief Propagation algorithm. The linear systems consist of Gaussian Factor Graphs (FG) that compute the probability of the target position given the distance measurements to anchor nodes whose positions are known. The models are based on Least Squares (LS) and Weighted LS (WLS) algorithms with anchor-based positioning. It is assumed that the nodes are able to estimate ranges with wireless technologies such as IEEE 802.15.4-UWB and with ranging techniques such as two-way Time of Arrival (ToA). One of the main objectives of this work is to study Machine Learning techniques that improve positioning in both good and challenging scenarios. Simulation results show that the presented FG with WLS algorithm achieves better results than the LS and WLS algorithms in various scenarios with good and poor conditions for positioning.


I. INTRODUCTION
Global Navigation Satellite System (GNSS) is considered the legacy solution for positioning and navigation in outdoor environments. However, in dense urban areas or urban canyons, strong multipath echoes and even the total blockage of the GNSS signals degrade the position accuracy, and in indoor scenarios or tunnels GNSS positioning is not available at all. Data fusion and multi-sensor hybrid approaches can address many of the vulnerabilities of GNSS in those environments [1]. For vehicle applications or indoor positioning, recent research has focused on the development of localisation systems that either use advanced sensors or fuse on-board and off-board information [2]. The position information can be used in the radio network for enhanced communications and mobility management in 5G, or alternatively offered to enable location-based services such as intelligent transportation systems (ITS), self-driving cars or indoor applications. Different combinations of sensors and technologies are suited to positioning and its applications, depending on the environment, dynamics, budget, accuracy requirements, and the degree of robustness or integrity required [3]. Moreover, there are different approaches to integrating data that take into account its nature for optimal weighting as well as the transmission conditions. In [4], the authors presented a sensor fusion platform for GNSS/INS with UWB cooperative distance measurements for terrestrial vehicle navigation.
Machine Learning (ML) and Deep Learning methods have been explored and applied to positioning and navigation. They can offer adaptability and increase the accuracy of the system in challenging scenarios. For example, [5] presents techniques for NLOS identification and mitigation in UWB-based localization using experimental data, and [6] presents a localization algorithm based on a graphical model (factor graph) for statistical inference.
This work presents a data fusion framework designed with model-based Machine Learning for hybrid and cooperative positioning. The model-based Machine Learning approach provides a framework that supports the creation of models tailored to each new problem [7]. This framework emerged from three key ideas: (i) a Bayesian viewpoint, (ii) the use of graphical models such as Bayesian Networks and Factor Graphs, and (iii) the application of inference algorithms such as Belief Propagation. In this framework, the problem is modelled in a probabilistic graphical form, with all variables expressed as random variables.
In the presented case study, ranging is performed with radio devices such as Ultra Wide Band (UWB, IEEE 802.15.4) devices, where the transmitted signals are used to estimate the range. Positioning is based on a centralized single-epoch integration architecture, in which the ranging measurements from the UWB subsystems at the same epoch are combined using Weighted Least-Squares (WLS) estimation. Weighting the ranges requires the standard deviation of the error of each measurement, provided by the UWB nodes. The distance errors of Line-Of-Sight (LOS) transmissions are assumed to be Gaussian. The proposed positioning algorithm is a model-based ML method built on Bayesian inference and implemented by probabilistic graphical models using message-passing algorithms [7]. Specifically, the graphical models considered are Bayesian Networks [8] and Factor Graphs (FG) with the Belief Propagation (BP) algorithm (or sum-product) [9]. We further consider linear systems and FGs with Gaussian distributions as in [10]. However, the FG proposed in [10] has loops, which are undesirable in factor graph design because BP on a graph with loops is not guaranteed to converge or to yield exact marginals. In contrast, this work proposes a different FG-based algorithm that avoids loops and is built on the WLS algorithm. Weighting allows the distances with better covariance to contribute more than the others to the position estimate. Moreover, the proposed FG-based algorithm yields better positioning results than the WLS and LS algorithms in challenging scenarios. The proposed algorithm has a first stage in which the distances to the anchor nodes (i.e., random variables) are grouped, both to avoid loops in the FG and so that the position solution of each group is weighted according to its covariance.
The remainder of the paper is organized as follows. Section II describes the system model, Section III presents the proposed algorithm, Section IV shows the simulation results, and Section V concludes the paper.

II. SYSTEM MODEL
A. Scenario
The scenario of interest consists of two types of nodes: anchor nodes, whose positions are known, and a target node, whose position is unknown. We consider anchor-based positioning with LS and WLS algorithms. We want to know the probability of the target position x given the measurements $m_i$ to the anchor nodes; thus, we work with the posterior pdf $p(\mathbf{x} \mid m_1, \dots, m_N)$. We define the two-dimensional (M = 2) coordinates of the nodes as $\mathbf{x} = [x, y]^T$ for the target node and $\mathbf{x}_a^{(i)} = [x_a^{(i)}, y_a^{(i)}]^T$, i = 1, ..., N, for the anchor nodes. In this data fusion model for cooperative and hybrid positioning, we assume that the nodes are able to estimate ranges with different wireless technologies (e.g., IEEE 802.15.4-UWB) and different ranging techniques based on Time of Arrival (ToA), Received Signal Strength (RSS), etc. The geometrical distance between the target node and the i-th anchor is defined as

$$\rho_i = \|\mathbf{x} - \mathbf{x}_a^{(i)}\| = \sqrt{(x - x_a^{(i)})^2 + (y - y_a^{(i)})^2}.$$

B. Ranging Model
In this work, it is assumed that a ToA estimation technique based on IEEE 802.15.4-UWB technology is used to obtain the range estimates $d_i$. For the ToA-based model, in LOS and without bias error, the measurements $m_i = d_i$ and the distance $\rho_i$ between the target and anchor node i are modelled as

$$d_i = \rho_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma_{d_i}^2). \qquad (1)$$

The errors in distance estimation with ToA-based techniques may have different sources. For instance, the signal travelling between sensor nodes is affected by the environment and obstacles (e.g., objects, people, walls and cars), causing multipath propagation.
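The LOS ranging model (1) can be simulated in a few lines. The following is a minimal Python/NumPy sketch, not the authors' simulator: the anchor layout, target position and per-anchor $\sigma_{d_i}$ values are hypothetical, and `twr_distance` shows the textbook single-sided two-way ToA relation (time of flight equals half of the round-trip time minus the responder's reply delay), which may differ from the exact ranging scheme of the UWB modules.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum, m/s

def twr_distance(t_round: float, t_reply: float) -> float:
    """Single-sided two-way ToA ranging: the time of flight is half of the
    round-trip time minus the responder's reply delay."""
    return C * (t_round - t_reply) / 2.0

def simulate_ranges(x, anchors, sigma, rng):
    """LOS, bias-free ranging model (1): d_i = rho_i + eps_i, eps_i ~ N(0, sigma_i^2)."""
    rho = np.linalg.norm(anchors - x, axis=1)  # geometric distances rho_i
    return rho + rng.normal(0.0, sigma)        # one Gaussian draw per anchor

# Hypothetical geometry: four anchors on a 10 m square, target at (3, 4).
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
x_true = np.array([3.0, 4.0])
sigma = np.array([0.1, 0.1, 0.5, 0.5])  # assumed per-anchor ranging std, m
rng = np.random.default_rng(0)
d = simulate_ranges(x_true, anchors, sigma, rng)
```

Averaged over many draws, the simulated ranges are centred on $\rho_i$ with spread $\sigma_{d_i}$, which is exactly the unbiased LOS assumption; NLOS bias would require an extra positive offset term.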
For the factor graph modelling, we consider a linear system. We use the ranging models that relate positions with the estimates $d_i$. However, the models in (1) are not linear in x. Thus, (1) is linearized with a first-order Taylor series around $\mathbf{x}^{(0)} = [x^{(0)}, y^{(0)}]^T$:

$$d_i \approx \rho_i^{(0)} + \frac{x^{(0)} - x_a^{(i)}}{\rho_i^{(0)}}\,(x - x^{(0)}) + \frac{y^{(0)} - y_a^{(i)}}{\rho_i^{(0)}}\,(y - y^{(0)}) + \varepsilon_i, \qquad (2)$$

where the superscript (0) indicates the initial values for each iteration of the algorithm and $\rho_i^{(0)} = \|\mathbf{x}^{(0)} - \mathbf{x}_a^{(i)}\|$. Once (1) is linearized and the terms are rearranged, one obtains the linear model (3), which relates the random variables (modelled with Gaussian distributions) in the factor graph.
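The linearization (2) can be checked numerically. This minimal Python/NumPy sketch (the helper `linearize` and the geometry are hypothetical) builds the rows $(\mathbf{x}^{(0)} - \mathbf{x}_a^{(i)})/\rho_i^{(0)}$ and the residuals $d_i - \rho_i^{(0)}$, then verifies that, for noise-free ranges taken at a point close to $\mathbf{x}^{(0)}$, the residuals match the linear prediction up to a second-order error.

```python
import numpy as np

def linearize(x0, anchors, d):
    """First-order Taylor expansion (2) of rho_i(x) = ||x - x_a^(i)|| around x0.

    Returns H (N x 2), whose i-th row is (x0 - x_a^(i)) / rho_i^(0), i.e. the
    unit vector from anchor i towards x0, and delta_d_i = d_i - rho_i^(0).
    """
    diff = np.asarray(x0, dtype=float) - anchors  # (N, 2)
    rho0 = np.linalg.norm(diff, axis=1)           # rho_i^(0)
    H = diff / rho0[:, None]
    return H, d - rho0

# Hypothetical check: noise-free ranges measured at x0 + dx.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
x0 = np.array([3.0, 4.0])
dx = np.array([0.01, -0.02])
rho_true = np.linalg.norm(x0 + dx - anchors, axis=1)
H, delta_d = linearize(x0, anchors, rho_true)
# delta_d is close to H @ dx; the mismatch is second order in ||dx||.
```

Each row of H is a unit vector, so it only encodes the direction to the anchor; this is why the anchor geometry alone determines the conditioning of the LS/WLS problem.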

C. Positioning Model
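For ToA-based ranging with N anchor nodes, we obtain the known linear model [11]

$$\Delta\mathbf{d} = \mathbf{H}\,\Delta\mathbf{x} + \boldsymbol{\varepsilon}, \qquad (3)$$

where $\Delta\mathbf{d} = [d_1 - \rho_1^{(0)}, \dots, d_N - \rho_N^{(0)}]^T$, $\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}^{(0)}$, $\boldsymbol{\varepsilon} = [\varepsilon_1, \dots, \varepsilon_N]^T$, and the i-th row of the $N \times 2$ matrix H is $[(x^{(0)} - x_a^{(i)})/\rho_i^{(0)},\ (y^{(0)} - y_a^{(i)})/\rho_i^{(0)}]$. The system can be solved by applying iterative LS for the LS algorithm,

$$\Delta\hat{\mathbf{x}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\Delta\mathbf{d}, \qquad (4)$$

and iterative Weighted LS for the WLS algorithm. The latter involves solving a WLS problem in each step:

$$\Delta\hat{\mathbf{x}} = (\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\,\Delta\mathbf{d}, \qquad (5)$$

where W is the diagonal matrix of weights $w_i$. In both cases the estimate is updated as $\hat{\mathbf{x}} = \mathbf{x}^{(0)} + \Delta\hat{\mathbf{x}}$, which is used as $\mathbf{x}^{(0)}$ in the next iteration.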
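The iterative LS and WLS solvers described above can be sketched as a Gauss-Newton loop. This is a minimal Python/NumPy illustration under stated assumptions: `iterative_wls` is a hypothetical name, the starting point and geometry are made up, and the WLS example uses inverse-variance weights $w_i = \sigma_{d_i}^{-2}$, for which $(\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}$ is the covariance of the estimate; the weights used in the paper's simulations are those of Table I.

```python
import numpy as np

def iterative_wls(d, anchors, x0, weights=None, n_iter=10):
    """Iterative LS (weights=None, W = I) or WLS positioning.

    Each step linearizes around the current estimate, solves (4)/(5),
    delta_x = (H^T W H)^{-1} H^T W delta_d, and sets x <- x + delta_x.
    Returns the estimate and (H^T W H)^{-1} of the last step, which is the
    estimate covariance only if W is the inverse noise covariance (assumed).
    """
    assert n_iter >= 1
    x = np.asarray(x0, dtype=float)
    w = np.ones(len(d)) if weights is None else np.asarray(weights, dtype=float)
    W = np.diag(w)
    for _ in range(n_iter):
        diff = x - anchors
        rho0 = np.linalg.norm(diff, axis=1)   # rho_i^(0) at the current x
        H = diff / rho0[:, None]              # rows of H in (3)
        A = H.T @ W @ H
        x = x + np.linalg.solve(A, H.T @ W @ (d - rho0))
    return x, np.linalg.inv(A)

# Hypothetical example with noisy ranges.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
x_true = np.array([3.0, 4.0])
sigma = np.array([0.1, 0.1, 0.5, 0.5])
d = np.linalg.norm(anchors - x_true, axis=1) + np.random.default_rng(0).normal(0.0, sigma)
x_ls, _ = iterative_wls(d, anchors, x0=[5.0, 5.0])
x_wls, P = iterative_wls(d, anchors, x0=[5.0, 5.0], weights=1.0 / sigma**2)
```

`np.linalg.solve` is used instead of forming the inverse explicitly for the update, which is numerically preferable; the inverse is only computed once at the end to report the covariance.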

III. THE ALGORITHM
The variables of the described ranging and positioning models are related through the Bayesian Network (BN) of equation (6). A BN is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph. Figs. 1a and 1b show the set of variables and their dependencies through the FG models, where the factors (squares) relate the random variables (circles).
Therefore, the probability of the target position x given the distance measurements is expressed by the BN (6). Appendix A provides an introduction to BNs and FGs, detailing the main steps from a BN to an FG solved with a message-passing algorithm (Belief Propagation, or sum-product algorithm).

A. Algorithm With Loops
The BN (6) can be represented by a factor graph with the factors depicted in Fig. 1a. Note that the factors involve Gaussian random variables; therefore, we design a factor graph that can be solved with Gaussian messages in the BP algorithm [9].
This solution is similar to [10] and also has loops. Loops are undesirable in factor graph design because BP on a graph with loops is not guaranteed to converge or to yield exact marginals. Therefore, an approach that avoids loops is proposed in the next section.

B. Intermediate Algorithm Avoiding Loops
In this section, an intermediate solution that avoids the loops in the factor graph is proposed. The range variables are grouped into Q vectors $\Delta\mathbf{d}_i$, i = 1, ..., Q, each with D components, and the factors are defined over these grouped variables (Fig. 1b). When D = 2, the matrix H of each group has the minimum dimensions (2×2) needed to apply the pseudoinverse, solving the determined system (3) and finding a solution with the LS or WLS methods as in (4) and (5).
Note that the covariance of the WLS estimate, $(\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}$, is related to the Geometric Dilution of Precision (GDOP), $\mathrm{GDOP} = \sqrt{\operatorname{trace}\big((\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\big)}$, of the network of reference UWB nodes [11]. In the factors $f_{\Delta d_i}(\Delta\mathbf{d}_i, \Delta\mathbf{x})$ of Fig. 1b, a position solution $\Delta\mathbf{x}_i$ (with the LS or WLS algorithm) is obtained for each group i = 1, ..., Q. Each group solution, obtained with D ranges, is weighted with its covariance to obtain the solution $\Delta\mathbf{x}$; this corresponds to the product of the Gaussian densities of the variables $\Delta\mathbf{x}_i$. In this way, groups with lower covariance contribute more than the others to the estimate of $\Delta\mathbf{x}$ and thus of the position x.
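The covariance weighting of the group solutions is the normalized product of Gaussian densities, which in information form reads $\mathbf{P}^{-1} = \sum_i \mathbf{P}_i^{-1}$ and $\Delta\mathbf{x} = \mathbf{P}\sum_i \mathbf{P}_i^{-1}\Delta\mathbf{x}_i$. The minimal Python/NumPy sketch below (hypothetical helper names `fuse_gaussians` and `gdop`) implements this fusion together with the usual square-root-of-trace GDOP; the numbers in the example are made up.

```python
import numpy as np

def fuse_gaussians(means, covs):
    """Normalized product of Gaussian densities N(mu_i, P_i), i = 1..Q.

    Precisions add, P^{-1} = sum_i P_i^{-1}, and the fused mean is the
    precision-weighted average mu = P sum_i P_i^{-1} mu_i, so groups with
    lower covariance pull the result towards their own solution.
    """
    infos = [np.linalg.inv(P) for P in covs]
    P = np.linalg.inv(sum(infos))
    mu = P @ sum(J @ m for J, m in zip(infos, means))
    return mu, P

def gdop(H, W):
    """Weighted GDOP with the usual definition sqrt(trace((H^T W H)^{-1}))."""
    return float(np.sqrt(np.trace(np.linalg.inv(H.T @ W @ H))))

# Hypothetical group solutions: the second group has a 4x larger covariance.
dx1, P1 = np.array([0.0, 0.0]), np.eye(2)
dx2, P2 = np.array([5.0, 0.0]), 4.0 * np.eye(2)
dx, P = fuse_gaussians([dx1, dx2], [P1, P2])
```

Here the fused mean is [1, 0] with covariance 0.8 I: the estimate stays close to the better-conditioned group, and the fused covariance is smaller than either input covariance.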

C. Pseudocode of Algorithm Avoiding Loops
The main steps for solving the factor graph while avoiding loops are detailed in Algorithm 1, which covers both the LS and WLS models. First, the variables related to the scenario and those related to the FG algorithm and the Taylor series are initialized. Then, the FG is solved with the BP algorithm.
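The steps above can be sketched in Python/NumPy as follows. This is a hypothetical illustration, not the authors' Algorithm 1: each Gaussian message $N(\Delta\mathbf{x}_i, \mathbf{P}_i)$ is taken to be the group's WLS solution and its covariance $(\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}$, the weights are assumed to be inverse variances (WLS) or ones (LS), and the grouping of anchors is supplied by the caller.

```python
import numpy as np

def fg_intermediate(d, anchors, sigma, groups, x0, n_iter=10, weighted=True):
    """Hypothetical sketch of the loop-free FG iteration (LS or WLS model).

    groups: list of index tuples with D >= 2 anchors each; how the groups are
    formed is left to the caller (assumption). Each iteration:
      1. linearizes all ranges around the current estimate (Taylor, x^(0));
      2. solves LS/WLS per group -> Gaussian message N(dx_i, P_i);
      3. fuses the Q messages as a product of Gaussians -> N(dx, P);
      4. updates x <- x + dx, which becomes the next x^(0).
    """
    assert n_iter >= 1
    x = np.asarray(x0, dtype=float)
    # Assumed weights: inverse variances for WLS, ones for LS.
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2 if weighted else np.ones(len(d))
    for _ in range(n_iter):
        diff = x - anchors
        rho0 = np.linalg.norm(diff, axis=1)
        H_all = diff / rho0[:, None]
        dd_all = d - rho0
        info, info_vec = np.zeros((2, 2)), np.zeros(2)
        for g in groups:
            idx = list(g)
            H, W = H_all[idx], np.diag(w[idx])
            J_i = H.T @ W @ H                                    # group precision P_i^{-1}
            dx_i = np.linalg.solve(J_i, H.T @ W @ dd_all[idx])  # group solution, as in (5)
            info += J_i                                          # product of Gaussians:
            info_vec += J_i @ dx_i                               # precisions add
        P = np.linalg.inv(info)
        x = x + P @ info_vec
    return x, P

# Hypothetical scenario: N = 4 anchors, Q = 3 groups of D = 3 ranges.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
x_true = np.array([3.0, 4.0])
sigma = np.array([0.1, 0.1, 0.5, 0.5])
groups = [(0, 1, 2), (1, 2, 3), (0, 1, 3)]
d = np.linalg.norm(anchors - x_true, axis=1) + np.random.default_rng(0).normal(0.0, sigma)
x_hat, P = fg_intermediate(d, anchors, sigma, groups, x0=[5.0, 5.0])
```

Because every group sees the whole graph only through the shared variable $\Delta\mathbf{x}$, the resulting graph is a tree and the fusion step is exact for Gaussian messages; the price is that a range belonging to several groups is counted several times.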

IV. SIMULATION RESULTS
The proposed algorithm with D = 3 (groups of 3 ranges) was tested in scenarios with one target node (black dot), N anchor nodes (red stars) and a ToA-based ranging technique. The system model and algorithms are detailed in Sections II and III, respectively. We want to know the target position x given the measurements $d_i$ to the anchor nodes. We use the GDOP (estimated as in [11]) as a measure of the effect on the positioning solution of i) the geometry of the anchor-node network and ii) the distance error to each anchor node. The results of the proposed algorithm in each scenario were averaged over $N_C = 1000$ independent Monte Carlo trials. In the figures, the WLS or LS models with FG without loops (intermediate solution) are labelled "WLS with FG - Intermediate" or "LS with FG - Intermediate", and "WLS" or "LS" denote the algorithms without the FG technique.
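The Monte Carlo evaluation described above can be sketched as follows. This minimal Python/NumPy illustration assumes the usual position RMSE, $\sqrt{\frac{1}{N_C}\sum_{k}\|\hat{\mathbf{x}}_k - \mathbf{x}\|^2}$, computed per iteration l; it evaluates plain iterative WLS only, and the helper names, geometry and noise levels are hypothetical rather than the Table I settings.

```python
import numpy as np

def rmse(estimates, x_true):
    """RMSE over trials: sqrt(mean_k ||x_hat_k - x||^2)."""
    err = np.asarray(estimates, dtype=float) - np.asarray(x_true, dtype=float)
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=-1))))

def wls_trajectory(d, anchors, x0, w, n_iter):
    """Estimates after each iteration l = 1..n_iter of iterative WLS."""
    x, W, traj = np.asarray(x0, dtype=float), np.diag(w), []
    for _ in range(n_iter):
        diff = x - anchors
        rho0 = np.linalg.norm(diff, axis=1)
        H = diff / rho0[:, None]
        x = x + np.linalg.solve(H.T @ W @ H, H.T @ W @ (d - rho0))
        traj.append(x)
    return np.array(traj)

def monte_carlo_rmse(x_true, anchors, sigma, x0, n_trials=1000, n_iter=5, seed=0):
    """RMSE per iteration, averaged over n_trials independent noise draws."""
    rng = np.random.default_rng(seed)
    rho = np.linalg.norm(anchors - x_true, axis=1)
    trajs = np.array([
        wls_trajectory(rho + rng.normal(0.0, sigma), anchors, x0, 1.0 / sigma ** 2, n_iter)
        for _ in range(n_trials)
    ])  # shape (n_trials, n_iter, 2)
    return np.array([rmse(trajs[:, l], x_true) for l in range(n_iter)])

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
x_true = np.array([3.0, 4.0])
curve = monte_carlo_rmse(x_true, anchors, np.array([0.1, 0.1, 0.5, 0.5]), x0=np.array([5.0, 5.0]))
```

Returning one RMSE value per iteration mirrors the RMSE-versus-iteration curves discussed for each scenario, so divergence of an algorithm shows up as a growing curve.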
Two types of scenarios are proposed to show the behavior of the algorithms: on the one hand, more theoretical scenarios (A, B and C), in which we assume that the FG algorithms know the error of the measurements; on the other hand, a more realistic scenario (D), in which they do not. Table I shows the scenario settings. In the more realistic scenario D, the weights $w_i$ differ from $\sigma_{d_i}^{-1}$. In that case, the $w_i$ parameters are set from the information given by the manufacturer of the MDEK1001 system with DWM1001 UWB modules [12], according to which the X-Y location accuracy is typically below 10 cm. In future work, these values $w_i$ will be studied in more detail and estimated from real data. Moreover, with real data, we will study the improvement and the effect on the position estimation of the grouping of variables.
Scenario A: when the number of anchor nodes is reduced to N = 3 and the GDOP value is poor (high error $\sigma_{d_i}$ and bad geometry for positioning), the algorithms that are not based on FG techniques perform worse. At the first iteration (l = 1), the RMSE of the algorithms not based on FG is higher than 20 m. In this scenario we assume that the FG algorithms know the measurement errors $\sigma_{d_i}$. Scenario A is shown in Fig. 2, its parameters are listed in Table I, and the RMSE (9) of the target node position is shown in Fig. 3.
Scenario B: when the number of anchor nodes is N = 4 and the GDOP value is good, the algorithms without FG do not diverge, and the WLS with FG algorithm performs slightly better. Scenario B is shown in Fig. 4, while the estimated GDOP and RMSE are shown in Figs. 5 and 6, respectively. In this scenario we assume that the FG algorithms know the measurement errors $\sigma_{d_i}$. The simulation parameters are listed in Table I.
Scenario C: it is shown in Fig. 7 and its parameters are listed in Table I. The number of anchor nodes is N = 4. The GDOP value is poor because of bad geometry; however, some distances have a low error $\sigma_{d_i}$. The RMSE results, shown in Fig. 8, are better for the algorithms based on FG than for those without it.
Scenario D: it is shown in Fig. 9. It is more realistic in that the FG does not know the error $\sigma_{d_i}$. Its parameters are listed in Table I. The number of anchor nodes is N = 4. In the proposed algorithm, the ranges are grouped into Q = 3 groups of D = 3 components. The GDOP value obtained with each group is shown in Fig. 10; one group has a better GDOP than the others. The RMSE results, shown in Fig. 11, are better for the algorithm based on FG and WLS than for the algorithm based on LS. Thus, weighting allows the distances with better covariance to contribute more to the position estimate.
Note that the initial value $\mathbf{x}^{(0)}$ in (2) at each iteration of the algorithms is the estimate $\hat{\mathbf{x}}$ of the previous iteration. For scenario A (Fig. 2), the RMSE (Fig. 3) of $\hat{\mathbf{x}}$ for the non-FG-based algorithms is higher than that of the FG-based algorithms. A bad geometry of the anchor nodes affects the solution estimated with the iterative LS and WLS algorithms: the RMSE of these non-FG-based algorithms increases and their solutions diverge. In contrast, for scenario D (Fig. 9), the RMSE (Fig. 11) of $\hat{\mathbf{x}}$ converges to a stable value for all algorithms.
The position obtained with the FG-based algorithm (intermediate solution avoiding loops) is estimated through the variable $\Delta\mathbf{x}$. This variable is computed as the product of the Gaussian variables $\Delta\mathbf{x}_i$ of the groups i = 1, ..., Q. Thus, each solution obtained with D = 3 anchor nodes is weighted with its covariance to obtain the final solution $\Delta\mathbf{x}$. Moreover, in each iteration of the FG-based algorithm, the marginal probability of the target node position $\hat{\mathbf{x}}$ is obtained as the product of the incoming Gaussian messages, which weights them according to their covariances. As a result, in scenarios with poor GDOP, the presented iterative FG without loops (with LS or WLS models) achieves better results than the iterative LS and WLS algorithms.
In summary, weighting allows the distances with better covariance to contribute more than the others to the position estimate in each group, both when the GDOP of the target node is not poor (scenarios B and D) and when it is poor (scenario C). Another interesting conclusion is that, in challenging scenarios where the GDOP of the anchor nodes is poor (scenarios A and C), the algorithms based on FG without loops achieve better positioning results than the algorithms not based on FG.

V. CONCLUSIONS
This work presented model-based Machine Learning algorithms built on Bayesian inference in Bayesian Networks using a Belief Propagation algorithm. We have presented linear systems consisting of Factor Graphs (with Gaussian distributions) that compute the probability of the target position given the distance measurements to anchor nodes with known positions. The models are based on Least Squares (LS) and Weighted Least Squares (WLS) algorithms with anchor-based positioning. The systems are data fusion frameworks for hybrid cooperative positioning. We consider positioning techniques that use radio devices such as UWB (IEEE 802.15.4). One of the main objectives was to study Machine Learning techniques that improve positioning in both good and challenging scenarios. Simulation results show that the presented algorithm based on the Factor Graph technique and WLS takes advantage of weighting: it allows the distances with better covariance to contribute more than the others to the final position estimate, in both good and challenging scenarios. Moreover, the presented FG solution weights the position estimates according to their covariances, which improves the positioning results in challenging scenarios. In future work, we will consider hybrid model-based and non-model-based solutions, deep learning, and other ranging and positioning (or navigation) techniques to improve the solutions with NLOS transmissions and non-Gaussian distributions, applying real data from UWB nodes.

APPENDIX A: INTRODUCTION TO FACTOR GRAPHS AND THE SUM-PRODUCT ALGORITHM
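In general, let $f(v_1, \dots, v_n)$ denote the joint probability mass (or density) function of a collection of random variables. By the chain rule of conditional probability, we may always express this function with conditional and marginal probability distributions as

$$f(v_1, \dots, v_n) = p(v_n \mid v_1, \dots, v_{n-1}) \cdots p(v_2 \mid v_1)\, p(v_1) = \prod_{i=1}^{n} p(v_i \mid v_1, \dots, v_{i-1}). \qquad (10)$$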
A graphical model is a probabilistic model that describes probability distributions. There are different types of graphical models. Bayesian Networks are based on directed acyclic graphs [8]. A Bayesian Network represents the random variables and their conditional relationships as

$$f(v_1, \dots, v_n) = \prod_{i=1}^{n} p\big(v_i \mid a(v_i)\big),$$

where $a(v_i)$ denotes the set of parents of $v_i$. If $v_i$ has no parents, its term reduces to $p(v_i)$. For example, a Bayesian network may be written as in (11). Another type of graphical model is the factor graph. A factor graph is a bipartite graph that enables efficient computation of marginal distributions through the sum-product algorithm [9]. Probability distributions can be represented by factor graphs, since the conditional dependence between random variables can be expressed in terms of a factorization of their joint probability density function. For example, from the factorization in (11), the factors of the factor graph may be defined as in (12). Thus, a factor graph has a variable node for each variable $v_i$ and a factor node for each local function $f_j$, with an edge connecting variable node $v_i$ to factor node $f_j$ if $v_i$ is an argument of $f_j$.
The sum-product algorithm applied to the factor graph computes the marginal probabilities through messages exchanged between the nodes of the factor graph [9]. The main rules of the sum-product algorithm are the following:
• Message from variable to factor:
$$\mu_{v \to f}(v) = \prod_{h \in n(v) \setminus \{f\}} \mu_{h \to v}(v) \qquad (13)$$
• Message from factor to variable:
$$\mu_{f \to v}(v) = \int f(V) \prod_{u \in n(f) \setminus \{v\}} \mu_{u \to f}(u)\; d\big(V \setminus \{v\}\big) \qquad (14)$$
where $V = n(f)$ is the set of arguments of the function f and $n(v)$ is the set of factors connected to v. In (14), the factors f represent conditional probabilities; therefore, the message from a factor to a variable is obtained by marginalizing (integrating out) all the other arguments of the factor, weighted by their incoming messages. After all the messages are passed, the marginals of interest can be calculated: the marginal probability of a variable is proportional to the product of its incoming messages, $p(v) \propto \prod_{f \in n(v)} \mu_{f \to v}(v)$. For multivariate Gaussian random variables with linear relations between them, the integrals in (14) can be calculated in closed form. The marginal of a joint Gaussian distribution is another Gaussian distribution whose mean and covariance are the corresponding sub-vector of the joint mean and sub-block of the joint covariance [7]. Therefore, for Gaussian factor graphs with linear relations between variables, each message can be represented by the mean vector and covariance matrix of the corresponding multivariate Gaussian random variable.
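For linear Gaussian factors these rules reduce to a few matrix operations on (mean, covariance) pairs. The minimal Python/NumPy sketch below uses hypothetical helper names: `factor_to_var` evaluates (14) for a factor $\mathcal{N}(\mathbf{y}; \mathbf{A}\mathbf{x}, \mathbf{R})$, `var_product` multiplies incoming Gaussian messages at a variable node as in (13) and in the marginal, and `marginalize` extracts the marginal of a joint Gaussian.

```python
import numpy as np

def factor_to_var(A, R, m_x, P_x):
    """Message (14) from a linear Gaussian factor N(y; A x, R) to y, given the
    incoming message N(m_x, P_x) on x. The integral over x has the closed form
    N(y; A m_x, A P_x A^T + R)."""
    return A @ m_x, A @ P_x @ A.T + R

def var_product(messages):
    """Product of incoming Gaussian messages (mean, cov) at a variable node:
    in information form the precisions and precision-weighted means add."""
    infos = [np.linalg.inv(P) for _, P in messages]
    P = np.linalg.inv(sum(infos))
    return P @ sum(J @ m for J, (m, _) in zip(infos, messages)), P

def marginalize(mean, cov, idx):
    """Marginal of a joint Gaussian: the sub-vector of the mean and the
    corresponding sub-block of the covariance."""
    idx = np.asarray(idx)
    return mean[idx], cov[np.ix_(idx, idx)]

# Hypothetical factor y = x_1 + x_2 + noise with variance 0.1, and x ~ N([1, 2], I).
m_y, P_y = factor_to_var(np.array([[1.0, 1.0]]), np.array([[0.1]]),
                         np.array([1.0, 2.0]), np.eye(2))
```

In this example the message to y has mean [3.0] and variance [[2.1]]: the means propagate through A, and the covariance grows by the factor noise R.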