On the Use of Fuzzy Metrics for Robust Model Estimation: a RANSAC-based Approach

Application domains such as robotics and computer vision (actually, any sensor data processing field) often require robust model estimation techniques because of the imprecise nature of sensor data. In this regard, this paper describes a robust model estimator which is a modified version of RANSAC that takes inspiration from the notion of fuzzy metric, as a suitable tool for measuring similarities in the presence of the uncertainty inherent to noisy data. More precisely, it makes use of a fuzzy metric within the main RANSAC loop to encode as a similarity the compatibility of each sample with the current hypothesis/model. Further, once a number of hypotheses have been explored and the winning model has been selected, we make use of the same fuzzy metric to obtain a refined version of the model. In this work, we consider two fuzzy metrics that permit us to express the distance between the sample and the model under consideration as a degree of similarity measured relative to a parameter. To illustrate the performance of the approach, we report on the accuracy achieved by the proposed estimator and other RANSAC variants for a benchmark comprising two kinds of perception problems typically encountered in vision applications, and a large number of datasets with varying proportions of outliers and different levels of noise. The proposed estimator is shown to outperform the classical counterparts considered.


Introduction
The Random Sample Consensus algorithm (RANSAC) [6] is a robust estimation technique whose most distinctive feature is the use of random sampling and a voting scheme to find the optimal set of model parameters to fit/explain a given dataset comprising both inliers and outliers. RANSAC is so widely used nowadays that it has become commonplace in robotics-related algorithms since, in this application domain, it is often necessary to solve model estimation problems whenever a perception task is addressed.
This work is partially supported by projects PGC2018-095709-B-C21 (MCIU/AEI/FEDER, UE), EU-H2020 BUGWRIGHT2 (GA 871260) and ROBINS (GA 779776), and PROCOE/4/2017 (Govern Balear, 50% P.O. FEDER 2014-2020 Illes Balears). This publication reflects only the authors' views and the European Union is not liable for any use that may be made of the information contained therein.
Nowadays, facing this kind of situation requires coping with new challenges due to an increased use of potentially poor, low-cost sensors, and the ever-growing deployment of robotic devices which may operate in potentially unknown environments. In general terms, the underlying algorithms need to be robust against, in particular, strong uncertainty levels. In this regard, a robust estimator is able to correctly find the original model that the input data supposedly fits, even when the data is noisy and contains outliers, i.e. data items which are not consistent with the original model due to an arbitrary bias affecting them. (See [8] for the details on the concepts, techniques and technical issues surrounding robust estimation.) Fuzzy methodologies have been shown to be useful to deal with imprecise data, targeting the design of systems that are able to cope with uncertainty one way or another and even degrade gracefully if needed [9]. In this work, we propose a variant of RANSAC which avoids discriminating between inliers and outliers and instead makes use of a fuzzy metric, in the sense of I. Kramosil and J. Michalek [11], to associate to every sample a degree of compatibility with regard to the current model. The same fuzzy metric is also used in a final model refinement step that runs after the main hypothesis selection loop.
The rest of the paper is organized as follows: Section 2 overviews RANSAC; Section 3 introduces two fuzzy metrics of relatively distinct nature though oriented to be embedded within the main RANSAC loop, while Section 4 details the RANSAC variation that incorporates these fuzzy metrics; Section 5 reports on a number of experiments to illustrate the performance achieved; Section 6 concludes the paper.
Brief review of the RANSAC approach for robust model estimation
Regarding model estimation, a common measure of estimation robustness is the breakdown point (BDP), defined as a percentage threshold on the outlier rate beyond which the technique under consideration is no longer robust to outliers. RANSAC is one of those robust estimators with BDP higher than fifty percent. Fifty percent is the limit of the Least Median of Squares (LMedS) [20], another robust estimator that has also enjoyed high popularity as a high BDP technique. Least Trimmed Squares (LTS) and Minimum Probability of Randomness (MINPRAN) are other high-BDP algorithms [Olu16], although less popular than RANSAC and LMedS. The BDP for others, such as the M-estimators family [HR11], is below 50%. Applications in statistics typically require less than fifty percent BDP, since outliers in this context are anomalies or exceptions in the data. However, the case is often different in robotics and computer vision applications, where outliers are defined with respect to the best among competing models, each describing well a fraction of the input data.
By randomly generating hypotheses on the model parameters, RANSAC tries to achieve a maximum consensus in the input dataset in order to deduce the inliers. Once the inliers are discriminated, they are used to estimate the parameters of the underlying model by regression. In more detail, instead of using every sample in the dataset to perform the estimation as in traditional regression techniques, RANSAC tests in turn many random sets of samples. Since picking an extra point decreases exponentially the probability of selecting an outlier-free sample [5], RANSAC takes the Minimum Sample Set size (MSS) to determine a unique candidate model, thus increasing its chances of finding an all-inlier sample set. This model is assigned a score based on the cardinality of its consensus set.
Finally, RANSAC returns the hypothesis that has achieved the highest consensus, and the corresponding model is refined through a last minimization step that only involves the inliers found.
Searching for an all-inlier sample, RANSAC typically runs for N iterations:
$$N = \frac{\log(1 - \rho)}{\log\left(1 - (1 - \omega)^{s}\right)} \qquad (1)$$
where ρ is the desired probability of success, i.e. the probability that at least one of the considered random sets is outlier-free, s is the size of the MSS for the problem at hand and ω is the ratio of outliers. (See [6] for the details on Eq. (1).)
There have been a number of efforts aiming at enhancing the standard RANSAC algorithm, e.g. MSAC, MLESAC, MAPSAC, PROSAC, R-RANSAC, LO-RANSAC and U-RANSAC [4], since, while robust, it has drawbacks regarding accuracy, efficiency, stability and response time [17,19]. Among these variants, only a few adopt fuzzy methodologies [12,23]. In both cases, the authors address a homography fitting problem, which, in [12], is solved by discriminating data samples into the good, bad and vague fuzzy sets using a fuzzy classifier, while [23] defines a triangle-type membership function for the set of inliers and combines this with a Monte Carlo method for sample selection. It must be pointed out that these two variants of RANSAC differ significantly from the one described in this paper, which is based on distance fuzzification.
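The iteration count that guarantees, with probability ρ, at least one outlier-free MSS can be evaluated with a few lines of code. The sketch below implements the standard expression N = log(1 − ρ) / log(1 − (1 − ω)^s), rounded up to the next integer; the function name `ransac_iterations` and the argument checks are illustrative assumptions and not part of the original algorithm description.

```python
import math


def ransac_iterations(rho: float, s: int, omega: float) -> int:
    """Smallest integer N such that 1 - (1 - (1 - omega)**s)**N >= rho.

    rho   : desired probability of drawing at least one outlier-free MSS
    s     : size of the minimum sample set (MSS)
    omega : ratio of outliers in the dataset
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    if not 0.0 <= omega < 1.0:
        raise ValueError("omega must lie in [0, 1)")
    p_clean = (1.0 - omega) ** s  # probability that a single MSS is outlier-free
    if p_clean >= 1.0:
        return 1  # no outliers: a single draw suffices
    return math.ceil(math.log(1.0 - rho) / math.log(1.0 - p_clean))
```

For instance, with ρ = 0.99 and ω = 0.5, the function returns 17 iterations for s = 2 and 146 for s = 5, which shows how quickly the required number of hypotheses grows with the size of the MSS.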

Fuzzy metrics for robust model estimation
Two mathematical tools can be found in the related literature with regard to the measurement of the degree of nearness between two points with respect to a parameter. On the one hand, we have the so-called modular metrics [2]. In this regard, we recall that a function w :]0, ∞[×X × X → [0, ∞] is a modular metric on a non-empty set X if, for each x, y, z ∈ X and each θ, µ > 0, the following is satisfied: (MM1) w(θ, x, y) = 0 for all θ ⇔ x = y; (MM2) w(θ, x, y) = w(θ, y, x); (MM3) w(θ + µ, x, z) ≤ w(θ, x, y) + w(µ, y, z).
This kind of generalized metric has typically been used in modeling problems that arise in classical Newtonian mechanics, where the numerical value w(θ, x, y) is interpreted as the velocity of a body traveling from location x to location y in time θ. However, in general terms, w(θ, x, y) can be thought of as a dissimilarity measurement between objects x and y relative to the value θ of a parameter. Hence, the smaller the value, the closer the points x and y are with respect to θ. (We refer the reader to [3], and references therein, for a recent account of the theory.) From now on, the value w(θ, x, y) will be denoted by w_θ(x, y).
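As a concrete instance, a standard example in the modular metric literature is obtained from an ordinary metric d on X by setting w_θ(x, y) = d(x, y)/θ, which matches the velocity interpretation given above (distance divided by time). The short verification below checks (MM1)-(MM3) for this choice; it is included only as an illustration, and it relies on nothing beyond the metric axioms of d and the fact that θ, µ > 0.

```latex
\begin{align*}
\text{(MM1)}\quad & \frac{d(x,y)}{\theta} = 0 \ \text{for all } \theta > 0
   \iff d(x,y) = 0 \iff x = y, \\
\text{(MM2)}\quad & \frac{d(x,y)}{\theta} = \frac{d(y,x)}{\theta}, \\
\text{(MM3)}\quad & \frac{d(x,z)}{\theta + \mu}
   \le \frac{d(x,y)}{\theta + \mu} + \frac{d(y,z)}{\theta + \mu}
   \le \frac{d(x,y)}{\theta} + \frac{d(y,z)}{\mu}.
\end{align*}
```

The first inequality in (MM3) is the triangle inequality of d, and the second holds because enlarging a denominator can only decrease a non-negative fraction.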
On the other hand, we have the notion of fuzzy metric. This type of metric tool arises with the aim of extending to the fuzzy framework the notion of statistical metric due to K. Menger. In the sequel, we assume that the reader is familiar with the basics of fuzzy sets and t-norms. (An outstanding general reference on these topics is [10].) According to [11], a fuzzy metric space is a triplet (X, M, *) where X is a non-empty set, * is a continuous t-norm and M is a fuzzy set on X × X × ]0, ∞[ satisfying, for each x, y, z ∈ X and θ, µ ∈ ]0, ∞[, the following: (KM1) M(x, y, θ) = 1 for all θ ⇔ x = y; (KM2) M(x, y, θ) = M(y, x, θ); (KM3) M(x, z, θ + µ) ≥ M(x, y, θ) * M(y, z, µ); (KM4) the function M_{x,y} : ]0, ∞[ → [0, 1], given by M_{x,y}(θ) = M(x, y, θ), is left-continuous.
The value M(x, y, θ) can be understood as a degree of similarity between two points x, y ∈ X relative to the value θ ∈ ]0, ∞[ of a parameter. Thus, the larger the value of M(x, y, θ), the closer the points x and y are with respect to θ.
At this point, it is worth noting that fuzzy metrics have been shown to be a very appropriate similarity measure when working with data affected by vagueness or imprecision, like noisy data; e.g. see [1,7,14-16] for successful applications to image filtering and to the study of perceptual colour differences. Despite the applicability of fuzzy metrics, the literature offers few examples of them, a lack that becomes a handicap when trying to extend their use to new fields of application.
At a glance, the axiomatics of both notions of metric are in essence dual. Motivated by this fact and by the aforementioned lack of examples, the intuitive duality relationship was formally proved in [13] with the aim, among others, of introducing new methods for generating fuzzy metrics and thus overcoming the aforesaid handicap. Specifically, the next result was proved.
Theorem 1. If w is a modular metric on X, then the triplet (X, M^{w,f_*}, *) is a fuzzy metric on X, where the fuzzy set M^{w,f_*} :
Within the framework of the aforementioned metrics, we are now concerned with obtaining a suitable metric tool for RANSAC; that is to say, a metric that is suitable as a measurement in the presence of noise and that, in addition, is able to encode the compatibility of each sample with the current model/hypothesis. In this regard, we next introduce, applying Theorem 1, two fuzzy metrics induced from modular metrics and the use of, on the one hand, the Łukasiewicz t-norm and, on the other hand, the Aczél-Alsina t-norms. To this end, we first recall a few pertinent facts that will play a central role in our subsequent discussion.
On the one hand, the Łukasiewicz t-norm *_L and the family of Aczél-Alsina t-norms (*^α_AA)_{α∈]0,∞[} are given, for all x, y ∈ [0, 1], as follows [10]:
$$x *_L y = \max\{x + y - 1, 0\}, \qquad x *^{\alpha}_{AA} y = \exp\left(-\left((-\ln x)^{\alpha} + (-\ln y)^{\alpha}\right)^{1/\alpha}\right).$$
On the other hand, given a metric space ( for all x, y ∈ X and for all θ ∈ ]0, ∞[ [3].
In view of the exposed facts, we construct two new fuzzy metrics, (M^d_{1,n}, X, *_L) and (M^d_{2,n}, X, *^n_AA), aiming at, among others, encoding the compatibility of each sample with the current hypothesis within the framework of a RANSAC-based model estimator. Notice that n ∈ N, where N denotes the set of positive integers. To this end, given a metric space (X, d), consider the modular metric w^d(θ, x, y) on X and notice that w^d_θ(x, y) = w^d_θ(x, y) for all x, y ∈ X and for all θ ∈ ]0, ∞[.
Next, we induce the fuzzy metric based on the t-norm *_L. Define the fuzzy θ , 0}. By Theorem 1, we deduce that (M^{w^d_θ,f_*}, X, *_L) is a fuzzy metric. On account of [18, Theorem 4.15] and [21], (F_{*_P}(M^{w^d_θ,f_{*_L}}, ..., M^{w^d_θ,f_{*_L}}), X, *_L) is a fuzzy metric, where *_P is the product t-norm and the function F_{*_P} : [0, 1]^n → [0, 1] is defined by F_{*_P}(a_1, ..., a_n) = a_1 *_P a_2 *_P ... *_P a_n (n ∈ N). It follows that

Fuzzy metric-based model scoring and refinement for RANSAC
As already described, RANSAC adopts a hypothesize-and-verify approach to fit a model to data contaminated by random noise and outliers: for every hypothesis/model considered, data samples are classified into inliers and outliers by comparing the fitting error with a threshold τ_I related to data noise, and the model accumulating the largest number of inliers is the one finally chosen as the solution of the estimation problem. This simple approach has been systematically used for robust estimation of model parameters in the presence of arbitrary noise, although, over the years, alternative implementations have been proposed to counteract the misbehaviours and shortcomings that have been detected.
In this work, we focus on three facets of RANSAC: (1) sample classification into inliers and outliers, in which we prevent the estimator from explicitly, and prematurely, deciding which samples are relevant; (2) model scoring, for which we replace the pure cardinality of the inlier set of plain RANSAC by an expression involving the individual fitting errors, similarly to what MSAC and MLESAC do [22]; and (3) model refinement once the main hypothesis-checking loop has finished, for which we adopt an iterative re-weighting scheme that makes use of all the available data samples without any distinction between inliers and outliers, contrary to plain RANSAC and other variants, which adopt least squares regression only for the set of inliers (notice that the distinction between inliers and outliers depends on the current model under consideration, and thus changes with every model).
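To make facets (1) and (2) more tangible, the sketch below contrasts three ways of scoring a 2D line hypothesis from its fitting errors: the inlier count of plain RANSAC, the truncated quadratic cost of MSAC, and a sum of fuzzy compatibility values. The last one is only an illustration: the compatibility form max(1 − ε/θ, 0)^n and the aggregation by summation are assumptions made here for the example, and all function names are hypothetical.

```python
def line_errors(points, model):
    """Point-to-line distances for a model (a, b, c) with a*a + b*b == 1."""
    a, b, c = model
    return [abs(a * x + b * y + c) for x, y in points]


def ransac_score(errors, tau):
    """Plain RANSAC: cardinality of the consensus set (higher is better)."""
    return sum(1 for e in errors if e <= tau)


def msac_cost(errors, tau):
    """MSAC: truncated quadratic cost (lower is better)."""
    return sum(min(e * e, tau * tau) for e in errors)


def fm_compatibility(err, theta, n=1):
    """ASSUMED compatibility value in [0, 1]: max(1 - err/theta, 0)**n."""
    return max(1.0 - err / theta, 0.0) ** n


def fm_score(errors, theta, n=1):
    """ASSUMED aggregation: sum of compatibility values (higher is better)."""
    return sum(fm_compatibility(e, theta, n) for e in errors)
```

Under these assumptions, no sample is ever labelled as inlier or outlier: a sample close to the hypothesis contributes a value near 1, a sample at distance θ or beyond contributes 0, and everything in between contributes gradually.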
Algorithm 1 describes formally the RANSAC variant that is proposed in this work. The details regarding points (1)-(3) above can be found next:
1. Sample classification. As already mentioned, no distinction is made between inliers and outliers, but we make use of a fixed fuzzy metric M^{w,f_*} generated by the technique in Theorem 1 to obtain a compatibility value φ ∈ [0, 1] between each sample x_j and the current model M_{Θ_k}, given the fitting error (x_j; M_{Θ_k}). Observe that the compatibility value obtained from the fuzzy metric depends on the set of parameters (d, Φ) with Φ = (n, θ) when either M^d_{1,n} or M^d_{2,n} is under consideration. From now on, such a value will be denoted by φ_i( ; Φ) to make clear that it refers to the fitting error and that it comes from the fuzzy metric M^d_{i,n} (i ∈ {1, 2}). Since we contemplate the use of a single, specific distance d, i.e. the one related to the fitting error, we will denote both fuzzy metrics as M_{i,n} (i ∈ {1, 2}), eliminating the allusion to the metric d.
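The refinement stage mentioned as facet (3) can be illustrated for the straight line case with an iteratively re-weighted fit over all samples. The following is a minimal sketch under stated assumptions, not the procedure of Algorithm 1: the weight of each sample is taken to be max(1 − ε/θ, 0)^n, the weighted fit is a total least squares fit, and the stopping rule (parameter change below `tol`) as well as every function name are choices made only for this example.

```python
import math


def compatibility(err, theta, n=1):
    """ASSUMED weight in [0, 1]: max(1 - err/theta, 0)**n."""
    return max(1.0 - err / theta, 0.0) ** n


def weighted_line_fit(points, weights):
    """Weighted total least squares line; returns (a, b, c) with a*a + b*b == 1."""
    w_sum = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, points)) / w_sum
    my = sum(w * y for w, (_, y) in zip(weights, points)) / w_sum
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, points))
    syy = sum(w * (y - my) ** 2 for w, (_, y) in zip(weights, points))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, points))
    # Direction of largest weighted scatter; the line normal is perpendicular to it.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    a, b = -math.sin(angle), math.cos(angle)
    return a, b, -(a * mx + b * my)


def refine_line(points, model, theta, n=1, max_iter=50, tol=1e-9):
    """Re-weight every sample and refit until the model stops changing.

    `model` is (a, b, c) with a*a + b*b == 1. Returns (model, iterations).
    """
    for t in range(1, max_iter + 1):
        a, b, c = model
        weights = [compatibility(abs(a * x + b * y + c), theta, n) for x, y in points]
        if sum(weights) == 0.0:
            return model, t  # no sample is compatible; keep the current model
        new_model = weighted_line_fit(points, weights)
        # (a, b, c) and (-a, -b, -c) are the same line: align signs before comparing.
        if new_model[0] * a + new_model[1] * b < 0.0:
            new_model = tuple(-v for v in new_model)
        change = max(abs(u - v) for u, v in zip(new_model, model))
        model = new_model
        if change < tol:
            return model, t
    return model, max_iter
```

Note that samples far from the current model simply receive a zero weight, so the whole dataset can be passed in without a prior inlier/outlier split.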

Experimental results
In this section, we illustrate the performance of the RANSAC variant proposed in Section 4, using either M_{1,n} or M_{2,n}, for a number of experiments that:
- Consider two model fitting problems, namely straight line fitting and ellipse fitting. The former is for 2D lines described by parameters Θ = (a, b, c), corresponding to a straight line in general form ax + by + c = 0. The latter is for ellipses expressed as ax^2 + by^2 + 2cxy + 2dx + 2ey + f = 0, and hence Θ = (a, b, c, d, e, f). The respective dimensionalities are clearly different.
- Compare with plain RANSAC and MSAC (their computational requirements are similar to ours).

Experimental setup
For testing purposes, we generate 500 synthetic datasets for the straight line estimation problem and 200 synthetic datasets for the ellipse estimation problem. Each dataset contains a total of 300 points which comprise both inliers and outliers, the latter in a proportion equal to ω. The respective samples stem from either 2D lines in random orientations and positions or ellipses with random axes lengths and orientations. Given a random point p = (x, y) over the respective curve and the normal vector n at p, an inlier p_I of the dataset is generated by shifting p along n using a zero-mean Gaussian distribution with standard deviation σ, i.e. p_I = p + N(0, σ) · n. In both cases, outliers p_O are uniformly generated within a rectangular area containing the ellipse or a part of the straight line, ensuring that outliers lie outside a ±3σ stripe along the curve. Every combination (σ, ω) gives rise to a different dataset.
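The generation procedure for the straight line case can be sketched as follows. The sampling ranges (a square of half-width `half_size` centred at the origin), the way the random line is drawn and the function name are assumptions of this example; only the inlier rule p_I = p + N(0, σ) · n and the rejection of outliers inside the ±3σ stripe come from the description above.

```python
import math
import random


def make_line_dataset(n_points=300, omega=0.4, sigma=1.0, half_size=100.0, seed=0):
    """Return (true_model, inliers, outliers) for a random 2D line.

    true_model is (a, b, c) with a*a + b*b == 1, i.e. a*x + b*y + c = 0.
    """
    rng = random.Random(seed)
    phi = rng.uniform(0.0, math.pi)              # random orientation of the normal
    a, b = math.cos(phi), math.sin(phi)
    c = rng.uniform(-0.5 * half_size, 0.5 * half_size)
    n_out = round(omega * n_points)
    inliers, outliers = [], []
    while len(inliers) < n_points - n_out:
        t = rng.uniform(-half_size, half_size)   # position along the line
        px, py = -c * a - t * b, -c * b + t * a  # point p lying on the line
        shift = rng.gauss(0.0, sigma)            # N(0, sigma) offset along the normal
        inliers.append((px + shift * a, py + shift * b))
    while len(outliers) < n_out:
        x = rng.uniform(-half_size, half_size)
        y = rng.uniform(-half_size, half_size)
        if abs(a * x + b * y + c) > 3.0 * sigma:  # keep outliers off the 3-sigma stripe
            outliers.append((x, y))
    return (a, b, c), inliers, outliers
```

Fixing the seed makes each (σ, ω) dataset reproducible, which is convenient when several estimators have to be compared on exactly the same data.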
Regarding hypothesis generation within the main loop, in all experiments, the size of the MSS is always set to the minimum, i.e. s = 2 for straight lines and s = 5 for ellipses (Θ is normalized to unit norm). Besides, the number of iterations k_max is calculated according to Eq. (1), with ρ = 99%. The parameters of φ_i( ; Φ), Φ = (θ, n), are set as follows: θ = κ · σ, the same value being used for the threshold τ_I of RANSAC/MSAC, considering different values for κ; n = 1 or 2, as indicated for each experiment. Finally, to compare RANSAC, MSAC and our estimator properly, we make use of the same sequence of MSSs to avoid the effect of randomness.

Results and discussion
In the following, to measure the estimation accuracy:
- For the straight line fitting problem, we make use of the average µ[ε] of the angle ε between the true and the estimated normal vectors.
- For the ellipse fitting problem, we make use of the average µ[ε] of the maximum relative error ε between the true p* and the estimated p vector of coefficients (a, b, c, d, e, f), calculated as:
$$\varepsilon = \max_{i} \frac{|p_i - p^{*}_i|}{|p^{*}_i|}$$
- For both cases, we also report on the average number of iterations spent during model refinement, µ[t].
Table 1 shows performance results for the straight lines case, for the two fuzzy metrics M_{1,n} and M_{2,n} and several outlier ratios ω and Gaussian noise magnitudes σ. In view of these results, it is worth noting that: (1) the estimation accuracy for M_{1,n} is above that of plain RANSAC and MSAC in all cases, while for M_{2,n} the accuracy is in general better than the classical counterparts except for n = 1 and ω = 0.5 and 0.6, although the difference with MSAC is very small; (2) M_{1,n} behaves in general better than M_{2,n}; (3) the value of θ in M_{i,n} does not seem to be critical, since very similar errors result for κ = 1-3, although somewhat more variation in performance is observed for M_{2,1}; (4) the estimation accuracy does not differ significantly for M_{1,1} and M_{1,2}, while, for M_{2,n}, M_{2,2} seems to be better; (5) as for the number of iterations of the refinement stage t, in general, µ[t] is very similar for n = 1 and n = 2 and for both fuzzy metrics; (6) it grows with the amount of noise in the data, as expected; and (7) higher values of κ reduce t, indicating that outliers are nullified within the main loop and therefore fewer iterations of refinement are required.
Table 1: Straight line fitting case: estimation accuracy and number of iterations of the refinement stage for (a) different outlier ratios ω, (b) different noise magnitudes σ and (c) different settings for τ_I, θ = κ · σ. When they do not change, σ = 1, ω = 0.4 and κ = 3. Lighter background means higher performance.
Table 2 reports on the accuracy obtained for the ellipse fitting case. On this occasion: (1) again the behaviour for M_{1,n} is better than that of plain RANSAC and MSAC in general, with higher accuracy for M_{1,1}, except for some very particular cases, i.e. ω = 0.60 or σ = 2; (2) M_{2,n} clearly behaves better for n = 2, also outperforming RANSAC and MSAC; (3) as for the number of refinement iterations, it is above what is necessary for straight lines, as expected because of the higher number of model parameters to estimate; (4) the dependency of µ[t] on a correct selection of κ also seems to be higher for this estimation problem.
Figures 1 and 2 report on the best- and worst-case estimations among the full collection of datasets, for our approach and the two estimation problems considered with regard to MSAC; that is to say, the best case is the case for which FM-based RANSAC outperforms MSAC the most, and the worst case is the case in which MSAC outperforms FM-based RANSAC the most. Besides, we report on several percentiles of the respective ε for all three methods. In both figures, the colour code of the left plots is as follows: the true/estimated model is indicated as gray/black lines; regarding MSAC, inliers/outliers are indicated as blue/red dots; as for FM-based RANSAC, φ_i( (x_j; M_Θ) ; Φ) is coded in gray scale.
As can be observed, for the straight-lines estimation case, data samples are correctly scored by our approach, and the estimated and true models are almost identical even for the worst case, i.e. even for the worst estimation the error is not significant. Regarding ellipse estimation and the best case, we can see that the FM-based RANSAC correctly scores the inliers and hence manages to find the ellipse, while MSAC cannot identify it correctly. As for the worst case, all three variants fail to locate the ellipse correctly, though they all produce estimates of the same quality. The percentile plots included in Figs. 1 and 2 for both estimation problems provide more insight into the global performance of all three methods, showing that M_{1,n} outperforms M_{2,n} in general and that the FM-based RANSAC leads to significantly lower estimation errors than MSAC.
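The two accuracy measures used throughout this section, the angle between true and estimated line normals and the maximum relative error between coefficient vectors, can be computed as sketched below. Two choices are assumptions of this example rather than statements about the experiments: the angle is reported in degrees and folded into [0, 90], since Θ and −Θ describe the same line, and the relative error is taken coefficient-wise with respect to the true coefficients, which must all be non-zero.

```python
import math


def normal_angle_deg(true_line, est_line):
    """Angle between the normals (a, b) of two lines, folded into [0, 90] degrees."""
    a1, b1 = true_line[0], true_line[1]
    a2, b2 = est_line[0], est_line[1]
    cos_ang = abs(a1 * a2 + b1 * b2) / (math.hypot(a1, b1) * math.hypot(a2, b2))
    return math.degrees(math.acos(min(1.0, cos_ang)))  # clamp against round-off


def max_relative_error(p_true, p_est):
    """Largest coefficient-wise relative error; assumes no true coefficient is zero."""
    return max(abs(e - t) / abs(t) for t, e in zip(p_true, p_est))
```

When comparing ellipse coefficient vectors, both vectors are expected to share the same normalization (for example unit norm and a common sign), since the conic equation is only defined up to scale.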

Conclusions
This work introduces two new fuzzy metrics (FM) which have been successfully embedded within a revised version of RANSAC, thus proving useful for robust model estimation. Further, this revised version of RANSAC includes an iteratively re-weighted least-squares stage for model refinement that makes use of the same FM. By means of either of the two FMs considered, we avoid discriminating between inliers and outliers, and instead make use of a compatibility value with regard to the current model/hypothesis, provided by the FM itself for each data sample. These compatibility values are aggregated to score the model against other models generated inside the main RANSAC loop. Experimental results show good performance for the two FMs when used within the FM-based RANSAC, which outperforms plain RANSAC and MSAC in most of the cases considered.