Comparative study on BBA determination using different distances of interval numbers

Dempster-Shafer theory (DST) is an important theory for information fusion. However, how to determine the basic belief assignment (BBA) in DST is still an open issue. The interval number based BBA determination method is simple and effective: the features of each class's samples are modeled using interval numbers, i.e., an interval number model is constructed for each focal element. The distances between interval numbers are then used to measure the similarity degrees between the testing sample and each focal element, and the similarity degrees are used to determine the BBA. The definition of the interval numbers' distance is crucial for the effectiveness of interval number based BBA determination methods. In this paper, we use different interval numbers' distances to determine BBAs. Using an artificial data set and the Iris data set from the open UCI database, we compare and analyze the BBAs determined with different distances.


I. INTRODUCTION
Dempster-Shafer theory (DST) [1] was proposed by Dempster in the 1960s and was further developed by Shafer [2]. In DST, basic beliefs are assigned to the elements of the power set of the frame of discernment (FOD), which makes it possible to describe the uncertainty of the sources of evidence. Pieces of evidence (i.e., basic belief assignments, BBAs) originating from different sources can be fused using Dempster's combination rule [1]. DST has been widely used in information fusion [3]-[5].
When using DST, the first step is to determine the BBAs, which is still an open issue. BBA determination methods can mainly be categorized into two branches [6]: (1) experts give the BBAs directly according to their personal experience; (2) the BBAs are determined from samples using specific determination rules. In the first branch, the determination of BBAs relies on the experts' subjective points of view. In this paper, we focus on the second branch, i.e., the BBAs are determined based on the available samples. Researchers have proposed many approaches in this branch. Selzer et al. [3] determined the BBAs based on the number of classes and an environmental weighting coefficient. Shafer [2] proposed a BBA determination method based on statistical evidence. Bi et al. [7] designed a BBA with three focal elements for the text classification problem. Szlzenstein et al. [8] used a Gaussian model to obtain the BBAs through iterative estimation. Deng et al. [9] defined a similarity measure based on the radius of gravity and used it to determine the BBAs. Boudraa et al. [10] and Florea et al. [11] determined the BBAs based on membership functions. Han et al. [12] proposed a method for transforming a fuzzy membership function into BBAs by solving a constrained maximization or minimization problem. Recently, Kang et al. [6] designed a BBA determination method using interval numbers.
Kang's interval number based BBA determination method is simple and effective. Kang's method first constructs an interval number [14] model for each focal element (including the singleton focal elements with a single class and the compound focal elements with multiple classes) based on the set of training samples. In Kang's method, Tran and Duckstein's [14], [16] interval number distance (TD-IND) is used to measure the similarity degree between the testing sample and the interval number models of the different focal elements. Finally, the similarities are normalized to obtain the values of the BBA. The definitions of the interval numbers' distances (INDs) are crucial for the performance of the interval number based BBA determination method. There exist many possible choices for INDs, e.g., Gowda and Ravi's distance [15] (GR-IND), Tran and Duckstein's distance [16] (TD-IND), the Hausdorff distance [17]

II. BASICS OF DEMPSTER-SHAFER THEORY
Dempster-Shafer theory (DST), also known as evidence theory, is an appealing mathematical framework that can effectively describe uncertain information about the state of nature. In DST, the frame of discernment (FOD) is denoted by $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. The elements in $\Theta$ are mutually exclusive and exhaustive. The basic belief assignment (BBA) function assigns basic beliefs on the power set of $\Theta$, i.e., $2^\Theta$. The BBA is also called the mass function, which satisfies:
$$\sum_{A \subseteq \Theta} m(A) = 1, \quad m(\emptyset) = 0 \tag{1}$$
The Belief (Bel) and Plausibility (Pl) of $A$ are defined as:
$$Bel(A) = \sum_{B \subseteq A} m(B) \tag{2}$$
$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) \tag{3}$$
The interval $[Bel(A), Pl(A)]$ is called the belief interval, which represents the uncertainty of the support degree of $A$.
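As an illustration of these definitions, the following minimal sketch represents a BBA as a Python dictionary that maps focal elements (frozensets) to masses and computes Bel and Pl from it. The representation, the function names and the example masses are assumptions made for this sketch; they are not taken from the paper.

```python
def is_valid_bba(m, tol=1e-9):
    """Check the mass-function conditions: m(empty) = 0, masses >= 0, sum = 1."""
    if m.get(frozenset(), 0.0) != 0.0:
        return False
    if any(v < 0.0 for v in m.values()):
        return False
    return abs(sum(m.values()) - 1.0) <= tol


def belief(m, a):
    """Bel(A): total mass of the non-empty focal elements contained in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


# Illustrative BBA on the frame {t1, t2, t3} (values chosen arbitrarily).
example_bba = {
    frozenset({"t1"}): 0.5,
    frozenset({"t1", "t2"}): 0.3,
    frozenset({"t1", "t2", "t3"}): 0.2,
}

if __name__ == "__main__":
    a = frozenset({"t1", "t2"})
    print("valid:", is_valid_bba(example_bba))
    print("belief interval of {t1, t2}:", (belief(example_bba, a), plausibility(example_bba, a)))
```

Since every focal element contained in A also intersects A, Bel(A) never exceeds Pl(A), which is why the pair can be read as an interval.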
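Different information sources can provide different pieces of evidence, i.e., BBAs. In DST, two BBAs $m_1$ and $m_2$ associated with two distinct sources of evidence can be combined according to Dempster's rule, as in Eq. (4).
$$m(A) = \begin{cases} 0, & A = \emptyset \\ \dfrac{1}{1-K} \sum\limits_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset \end{cases} \tag{4}$$
where $K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$ denotes the conflicting coefficient. Dempster's combination rule is both commutative and associative.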
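The combination rule can be sketched in a few lines of Python. The dictionary-of-frozensets representation, the function name `dempster_combine` and the example masses are assumptions of this sketch rather than part of the paper.

```python
def dempster_combine(m1, m2):
    """Combine two BBAs with Dempster's rule.

    Returns the combined BBA and the conflicting coefficient K.
    """
    unnormalized = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            inter = b & c
            if inter:
                unnormalized[inter] = unnormalized.get(inter, 0.0) + mass_b * mass_c
            else:
                # Mass that falls on the empty set is the conflict K.
                conflict += mass_b * mass_c
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    combined = {a: v / (1.0 - conflict) for a, v in unnormalized.items()}
    return combined, conflict


# Illustrative BBAs (values chosen arbitrarily).
m_first = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.4}
m_second = {frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.5}

if __name__ == "__main__":
    fused, k = dempster_combine(m_first, m_second)
    print("K =", round(k, 4))
    for focal, mass in fused.items():
        print(sorted(focal), round(mass, 4))
```

Because the rule is commutative and associative, more than two BBAs can be fused by applying the function repeatedly in any order.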
To make a probabilistic decision, the fused BBA can be transformed into probabilities using the Pignistic probability transformation:
$$BetP(\theta_i) = \sum_{A \subseteq \Theta,\ \theta_i \in A} \frac{m(A)}{|A|} \tag{5}$$
where $|A|$ denotes the cardinality of $A$.
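A minimal sketch of the Pignistic transformation follows: the mass of each focal element is shared equally among its elements. The function name and the example BBA are illustrative assumptions.

```python
def pignistic(m):
    """Pignistic transformation: split each mass m(A) equally among the elements of A."""
    betp = {}
    for focal, mass in m.items():
        if not focal:
            continue  # the empty set carries no mass in a normalized BBA
        share = mass / len(focal)
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + share
    return betp


# Illustrative BBA (values chosen arbitrarily).
bba_example = {
    frozenset({"t1"}): 0.5,
    frozenset({"t1", "t2"}): 0.3,
    frozenset({"t1", "t2", "t3"}): 0.2,
}

if __name__ == "__main__":
    probabilities = pignistic(bba_example)
    decision = max(probabilities, key=probabilities.get)
    print({k: round(v, 4) for k, v in sorted(probabilities.items())})
    print("decision:", decision)
```

Note that the mass assigned to compound focal elements is counted toward every class they contain, so a BBA whose singleton masses rank the classes one way can still yield a different ranking after the transformation.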

III. KANG'S BBA DETERMINATION METHOD BASED ON THE INTERVAL NUMBERS' DISTANCES
When using DST, the determination of the BBAs is the first step, and it is still a challenging task. Interval numbers, which can describe uncertain or insufficient information, are useful for determining the BBAs. The definition of an interval number is as follows: an interval number $\tilde{a}$ in $R$ is the set of real numbers that lie between two real numbers, i.e., $\tilde{a} = [a^-, a^+] = \{x \mid a^- \le x \le a^+\}$, with $a^-, a^+ \in R$ and $a^- \le a^+$. Kang et al. [6] proposed a BBA determination method based on interval number models, where the basic beliefs assigned to different focal elements are determined from the interval numbers' distances between the testing sample and the interval number models of the focal elements. Here, we first recall Kang's interval number based BBA determination method.
Kang's method determines a BBA on each single feature separately. For a single feature, Kang's method models the different focal elements (including the focal elements with a single class and the focal elements with multiple classes) using interval numbers, and the testing sample is treated as a degenerate interval (a precise number) of zero length. Kang's method measures the distances between the testing sample and the interval number models of the focal elements. The smaller the distance, the higher the similarity degree between the testing sample and the focal element, and the higher the basic belief assigned to that focal element. The steps of Kang's method are as follows:
1) The interval number models of the focal elements with a single class are constructed by finding the minimum and the maximum of the corresponding class's training samples. Then, the interval number models of the focal elements with mixture classes are obtained by finding the overlapping region of the corresponding single classes' interval number models. The interval number models of the different focal elements are denoted by $\tilde{b}_f$, $f \in 2^\Theta$.
2) Calculate the distances between the testing sample (denoted by $\tilde{a}$) and the interval number models of the different focal elements, i.e., $D(\tilde{a}, \tilde{b}_f)$, $\forall f \in 2^\Theta$. Note that the length of $\tilde{a}$ is 0, i.e., $a^+ = a^-$.
3) Calculate the similarity degrees from the distances according to Eq. (6):
$$S(\tilde{a}, \tilde{b}_f) = \frac{1}{1 + \alpha D(\tilde{a}, \tilde{b}_f)} \tag{6}$$
where $\alpha > 0$ is the support coefficient. Empirically, it is appropriate to set $\alpha = 5$ [6].
4) The BBA is determined by normalizing the similarity degrees of all the focal elements.
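The four steps can be sketched for a single feature as follows. This is an illustration under stated assumptions, not the authors' implementation: the similarity is assumed to take the form S = 1 / (1 + αD), the distance shown is the closed-form Hausdorff distance between intervals (any other IND could be passed in), and all function names and the toy training values are hypothetical.

```python
from itertools import combinations


def interval_models(samples_by_class):
    """Step 1: [min, max] per class, and the overlap region for each compound focal element."""
    single = {c: (min(v), max(v)) for c, v in samples_by_class.items()}
    models = {frozenset([c]): iv for c, iv in single.items()}
    for r in range(2, len(single) + 1):
        for combo in combinations(sorted(single), r):
            lo = max(single[c][0] for c in combo)
            hi = min(single[c][1] for c in combo)
            if lo <= hi:  # compound focal elements without overlap get no model
                models[frozenset(combo)] = (lo, hi)
    return models


def hausdorff_ind(a, b):
    """Hausdorff distance between two intervals given as (lower, upper) tuples."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def bba_from_sample(x, models, distance=hausdorff_ind, alpha=5.0):
    """Steps 2-4: distances, similarity degrees (assumed 1 / (1 + alpha * D)), normalization."""
    a = (x, x)  # the testing sample is a degenerate interval
    sims = {f: 1.0 / (1.0 + alpha * distance(a, b)) for f, b in models.items()}
    total = sum(sims.values())
    return {f: s / total for f, s in sims.items()}


# Hypothetical one-feature training values for three classes.
toy_training = {"c1": [1.0, 2.0, 4.0], "c2": [3.0, 5.0, 7.0], "c3": [5.0, 6.0, 8.0]}

if __name__ == "__main__":
    toy_models = interval_models(toy_training)
    for focal, mass in bba_from_sample(2.0, toy_models).items():
        print(sorted(focal), toy_models[focal], round(mass, 4))
```

Passing the distance as a parameter keeps steps 1, 3 and 4 fixed while the IND is swapped, which is the comparison the paper carries out.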
Kang's method defines the similarity degrees using an interval numbers' distance, and the BBAs are obtained by normalizing the similarity degrees. Thus, the definition of the IND (i.e., $D(\tilde{a}, \tilde{b}_f)$) is crucial for this method. The BBAs determined by Kang's method using different INDs are compared in the next section.

IV. COMPARISONS OF INTERVAL
is the length of the domain [14] of the interval numbers. To measure the difference between two interval numbers, many interval numbers' distances (INDs) have been proposed. Here, we introduce four widely used INDs.
Gowda and Ravi (1995) [15]: Gowda and Ravi proposed a metric (denoted by GR-IND) that combines a position component and a size component, as in Eq. (7).
Tran and Duckstein (2002) [16]: In the framework of fuzzy data analysis, Tran and Duckstein proposed the interval numbers' distance TD-IND, as in Eq. (10).
Hausdorff distance [17]:

B. Numerical example
Different INDs can be used to implement the BBA determination. Here, we use a numerical example to compare the interval number based BBA determination methods using different INDs. The methods are applied to a three-class classification problem. In this numerical example, we give the feature ranges of the different classes directly, as shown in Figure 1, where the feature range of class 1 ($\theta_1$) is [1, 4], that of class 2 ($\theta_2$) is [3, 7] and that of class 3 ($\theta_3$) is [5, 8].
From Figure 1, the interval number models of the focal elements can be constructed; they are listed in Table I.

Focal elements | Interval number model
$\{\theta_1\}$ | [1, 4]
$\{\theta_2\}$ | [3, 7]
$\{\theta_3\}$ | [5, 8]
$\{\theta_1, \theta_2\}$ | [3, 4]
$\{\theta_2, \theta_3\}$ | [5, 7]

Note that in this example $\{\theta_1, \theta_3\}$ and $\{\theta_1, \theta_2, \theta_3\}$ do not have interval number models, because the interval number models of $\{\theta_1\}$ and $\{\theta_3\}$ have no overlapping region. Suppose we have a testing sample whose feature value is 2, i.e., $\tilde{a} = [2, 2]$, shown as the purple dot on the X-axis of Figure 1. We then use different INDs, i.e., the GR-IND as in Eq. (7), the TD-IND as in Eq. (10), the H-IND as in Eq. (12), and the Nq-IND as in Eq. (13) (with $q = 2$), to measure the distance between $\tilde{a}$ and the interval number models of the different focal elements. The distances are listed in Table II. Then, using the distances, the similarity degrees are calculated according to Eq. (6), where the support coefficient is set to $\alpha = 5$. By normalizing the similarity degrees, the BBAs are obtained, as listed in Table III.
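The computation in this example can be sketched as follows for three of the four INDs. Several parts are assumptions of the sketch: the TD-IND is written in its commonly cited closed form (squared midpoint difference plus one third of the summed squared half-widths), the Nq-IND is written as the usual norm-q of the bound differences, and the similarity is assumed to be 1 / (1 + αD). The GR-IND is left out because its expression is not reproduced in the text. The values printed by this sketch are therefore not guaranteed to coincide with those reported in Tables II and III.

```python
import math


def td_ind(a, b):
    """Tran and Duckstein distance, assumed closed form for intervals (lower, upper)."""
    mid_diff = (a[0] + a[1]) / 2.0 - (b[0] + b[1]) / 2.0
    half_a = (a[1] - a[0]) / 2.0
    half_b = (b[1] - b[0]) / 2.0
    return math.sqrt(mid_diff ** 2 + (half_a ** 2 + half_b ** 2) / 3.0)


def hausdorff_ind(a, b):
    """Hausdorff distance between intervals: largest absolute bound difference."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def nq_ind(a, b, q=2):
    """Norm-q distance on the interval bounds (assumed form)."""
    return (abs(a[0] - b[0]) ** q + abs(a[1] - b[1]) ** q) ** (1.0 / q)


# Interval number models of the focal elements, taken from the class ranges in the example.
MODELS = {
    ("theta1",): (1.0, 4.0),
    ("theta2",): (3.0, 7.0),
    ("theta3",): (5.0, 8.0),
    ("theta1", "theta2"): (3.0, 4.0),
    ("theta2", "theta3"): (5.0, 7.0),
}


def bba_for(sample, distance, alpha=5.0):
    """Distances -> similarity degrees (assumed 1 / (1 + alpha * D)) -> normalized BBA."""
    a = (sample, sample)
    sims = {f: 1.0 / (1.0 + alpha * distance(a, b)) for f, b in MODELS.items()}
    total = sum(sims.values())
    return {f: s / total for f, s in sims.items()}


if __name__ == "__main__":
    for name, dist in (("TD-IND", td_ind), ("H-IND", hausdorff_ind), ("Nq-IND", nq_ind)):
        masses = bba_for(2.0, dist)
        print(name, {"+".join(f): round(v, 4) for f, v in masses.items()})
```

Under these assumed forms, the H-IND and the Nq-IND with q = 2 give the testing sample the same distance to [1, 4] and to [3, 4], so $\{\theta_1\}$ and $\{\theta_1, \theta_2\}$ receive equal mass; the TD-IND separates the two.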
As shown in Table III, the basic beliefs assigned to different focal elements differ less when GR-IND is used than when TD-IND, H-IND or Nq-IND is used. For example, using GR-IND, the basic beliefs assigned to $\{\theta_1\}$ and $\{\theta_2\}$ are 0.2552 and 0.2305, which are close to each other. Using TD-IND, the basic beliefs of $\{\theta_1\}$ and $\{\theta_2\}$ are 0.4086 and 0.1289, whose difference is larger. Here, we use the Pignistic probability transformation (as in Eq. (5)) to transform the BBAs into probabilities for decision making. The probabilities of the testing sample belonging to the different classes are listed in Table IV.

V. EXPERIMENT
To compare the interval number based BBA determination method using different INDs, we run Monte-Carlo experiments on the classification of the artificial set and the iris set. The information fusion based classification is implemented as follows. In each classification, the interval number based method is used to determine a BBA on each single feature. These BBAs are then combined using Dempster's combination rule as in Eq. (4). The combined BBA is transformed into probabilities using the Pignistic probability transformation as in Eq. (5). The testing sample is assigned to the class with the largest Pignistic probability.
In the experiments, the interval number based methods using different INDs are used to determine the BBAs, respectively. In the Nq-IND, we take $q = 2$. The parameter $\alpha$ used to generate the similarity degrees in the interval number based BBA determination method (as in Eq. (6)) is set to 5. The Monte-Carlo classification experiments are repeated 100 times with random testing samples. The effectiveness of the interval number based BBA determination methods using different INDs is compared using the average accuracy over the 100 runs.
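The classification procedure just described can be sketched end to end as below. The sketch assumes the similarity form 1 / (1 + αD), uses the closed-form Hausdorff distance between intervals as the IND, and relies on hypothetical function names and toy data; it is meant to show how the pieces fit together, not to reproduce the reported experiments.

```python
from itertools import combinations


def hausdorff_ind(a, b):
    """Hausdorff distance between intervals given as (lower, upper)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def build_models(train, k):
    """Interval number models on feature k: [min, max] per class, overlaps for compound sets."""
    single = {c: (min(s[k] for s in rows), max(s[k] for s in rows)) for c, rows in train.items()}
    models = {frozenset([c]): iv for c, iv in single.items()}
    for r in range(2, len(single) + 1):
        for combo in combinations(sorted(single), r):
            lo = max(single[c][0] for c in combo)
            hi = min(single[c][1] for c in combo)
            if lo <= hi:
                models[frozenset(combo)] = (lo, hi)
    return models


def feature_bba(x, models, alpha=5.0):
    """BBA on one feature from normalized similarity degrees (assumed 1 / (1 + alpha * D))."""
    sims = {f: 1.0 / (1.0 + alpha * hausdorff_ind((x, x), iv)) for f, iv in models.items()}
    total = sum(sims.values())
    return {f: s / total for f, s in sims.items()}


def dempster(m1, m2):
    """Dempster's combination rule for two BBAs keyed by frozensets."""
    out, conflict = {}, 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            if b & c:
                out[b & c] = out.get(b & c, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict")
    return {a: v / (1.0 - conflict) for a, v in out.items()}


def pignistic(m):
    """Pignistic probabilities: each mass is shared equally among the classes of its focal element."""
    betp = {}
    for focal, mass in m.items():
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + mass / len(focal)
    return betp


def classify(sample, train, alpha=5.0):
    """One BBA per feature, fused with Dempster's rule, decided by the largest BetP."""
    fused = None
    for k, x in enumerate(sample):
        m = feature_bba(x, build_models(train, k), alpha)
        fused = m if fused is None else dempster(fused, m)
    betp = pignistic(fused)
    return max(betp, key=betp.get)


# Hypothetical two-feature training set with two classes.
toy_train = {
    "A": [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0)],
    "B": [(2.5, 20.0), (4.0, 21.0), (5.0, 22.0)],
}

if __name__ == "__main__":
    print(classify((1.5, 10.5), toy_train))
    print(classify((4.5, 21.0), toy_train))
```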

A. Experiment on artificial set
The generated artificial set contains 3 classes. Each class has 50 samples, and each sample has 3 features. The features of the different classes are generated according to a Gaussian distribution, i.e., $G(\mu, \sigma^2)$. The standard deviations ($\sigma$) of all features of all classes are set to $\sigma = 1$. The mean ($\mu$) settings of the different features of the different classes are listed in Table V. The features of the different classes in the generated artificial set are shown in Figures 2-4. As shown in Figures 2-4, in feature 1, class 3 is linearly separable from class 1 and class 2, whereas class 1 and class 2 are not linearly separable from each other. Similarly, class 2 and class 3 are not linearly separable from each other in feature 2, and class 1 and class 3 are not linearly separable from each other in feature 3. In each Monte-Carlo run, we randomly select 25 samples from each class (75 samples in total) as the set of training samples, and the remaining samples are used as the testing samples. We first classify each testing sample according to the BBA determined on each single feature, respectively. Then, we combine the BBAs determined on the 3 features and use the combined BBA to classify the testing sample. The results of the methods based on different INDs are listed in Table VI.
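The data generation and the Monte-Carlo protocol can be sketched with the standard library alone. The class means below are placeholders, since the values of Table V are not reproduced in the text, and the nearest-mean classifier is only a stand-in that keeps the sketch runnable; in the actual experiments the interval number based classifier would be passed in its place.

```python
import random

# Placeholder means (one tuple of 3 feature means per class); NOT the values of Table V.
HYPOTHETICAL_MEANS = {1: (0.0, 0.0, 0.0), 2: (4.0, 4.0, 4.0), 3: (8.0, 8.0, 8.0)}


def make_artificial_set(means, n_per_class=50, sigma=1.0, seed=0):
    """Draw n_per_class Gaussian samples per class, one independent draw per feature."""
    rng = random.Random(seed)
    return {
        label: [tuple(rng.gauss(mu, sigma) for mu in mus) for _ in range(n_per_class)]
        for label, mus in means.items()
    }


def split_train_test(data, n_train=25, seed=0):
    """Randomly pick n_train samples per class for training; the rest are testing samples."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, rows in data.items():
        shuffled = rows[:]
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test


def nearest_mean_classifier(sample, train):
    """Stand-in classifier (nearest class mean); replace with the interval number based one."""
    def sq_dist_to_mean(rows):
        means = [sum(col) / len(rows) for col in zip(*rows)]
        return sum((x - m) ** 2 for x, m in zip(sample, means))
    return min(train, key=lambda label: sq_dist_to_mean(train[label]))


def monte_carlo_accuracy(data, classify, runs=100):
    """Average classification accuracy over repeated random train/test splits."""
    accuracies = []
    for run in range(runs):
        train, test = split_train_test(data, seed=run)
        total = sum(len(rows) for rows in test.values())
        correct = sum(classify(x, train) == label for label, rows in test.items() for x in rows)
        accuracies.append(correct / total)
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    dataset = make_artificial_set(HYPOTHETICAL_MEANS)
    print(round(monte_carlo_accuracy(dataset, nearest_mean_classifier), 4))
```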

B. Experiment on iris set
The iris set contains 3 classes. Each class has 50 samples, and each sample has 4 features. In this experiment, we randomly select different numbers of samples as the training samples (the same number of samples is selected from each class), and all the samples are used as the testing samples. The results of the interval number based BBA determination methods using different INDs are shown in Figure 5. According to Figure 5, the methods using TD-IND, H-IND and Nq-IND perform well both with a small number of training samples and with a large number of training samples. The method using TD-IND performs best among the four INDs. The results of the method using GR-IND show a counter-intuitive behavior, since its accuracy decreases as the number of training samples increases. When the number of training samples is large, the generated interval numbers can better model the features of the corresponding classes, especially for the mixture classes' focal elements (i.e., the overlapping range of the corresponding classes' interval number models). However, as discussed in the numerical example in Section IV-B, the interval number based method using GR-IND is not recommended for determining the BBA, especially when the masses of the mixture-class focal elements are counted together. That is why the method using GR-IND performs poorly when the number of training samples is large.

VI. CONCLUSION
In this paper, we have tested different INDs for implementing the interval number based BBA determination method. The effectiveness of the resulting BBAs is compared on information fusion based classification problems. The experiments show that combining the BBAs determined by interval number based methods with different INDs performs well on the classification problems. The methods using the TD-IND, H-IND and Nq-IND provide similar performance, with the one using TD-IND being the best. Using the GR-IND, the construction of the basic beliefs is not very effective. With GR-IND, the differences between the basic beliefs assigned to different focal elements are small, which is not discriminating enough for making decisions, especially when the masses of the mixture classes' focal elements are counted together. Therefore, the method using the GR-IND is not recommended.
Up to now, the interval number based BBA determination methods have been implemented on single features. In future work, we will try to use interval numbers to determine the BBAs on multi-feature spaces and compare the effectiveness of the methods using different INDs. We will also explore different decision-making strategies (e.g., DSmP, min of $d_{BI}$, etc.) and test other rules of combination to see whether the classification performance can be improved.
(H-IND) and De Carvalho's norm-q distance [18] (Nq-IND). In this paper, we implement Kang's interval number based method using different INDs. We analyze the differences between the BBAs determined using different INDs based on numerical examples. Furthermore, we use Monte-Carlo experiments to compare the performance of the interval number based methods with different INDs by classifying an artificial set and the iris set.
NUMBER BASED BBA DETERMINATION METHOD USING DIFFERENT INDS
As aforementioned, the definition of the IND is crucial for the interval number based BBA determination methods. Many INDs have been proposed. Here, we introduce four widely used INDs.
A. Introduction of the interval numbers' distances
Suppose $\tilde{a} = [a^-, a^+]$ and $\tilde{b} = [b^-, b^+]$ are two interval numbers. Then [13], [14], $\tilde{c} = \tilde{a} \oplus \tilde{b} = [c^-, c^+]$, where $c^- = \min(a^-, b^-)$ and $c^+ = \max(a^+, b^+)$. The length (or width) of the interval number $\tilde{a}$ is μ
two sets $A$ and $B$ of points of $R^n$, and a distance $d(x, y)$, where $x \in A$ and $y \in B$. The Hausdorff distance (H-IND) is defined as follows:
$$D_H(A, B) = \max\left\{ \sup_{x \in A} \inf_{y \in B} d(x, y),\ \sup_{y \in B} \inf_{x \in A} d(x, y) \right\} \tag{11}$$
If $d(x, y)$ is the Manhattan distance (also called the City block distance), i.e., $d(x, y) = |x - y|$, then Chavent et al. (2002) proved that
$$D_H(\tilde{a}, \tilde{b}) = \max\left\{ |a^- - b^-|,\ |a^+ - b^+| \right\} \tag{12}$$
De Carvalho et al. (2006) [18]: A family of distances between interval numbers has been proposed by De Carvalho et al. based on the bounds of the interval numbers. The metric of norm-q (Nq-IND) is defined as:

Fig. 5. Performances of the interval number based methods using different INDs with different scales of training samples on iris data set.

TABLE I. THE INTERVAL NUMBER MODELS OF FOCAL ELEMENTS.

TABLE III. THE BBAS DETERMINED BASED ON DIFFERENT INDS.

TABLE IV. THE PIGNISTIC PROBABILITIES OBTAINED BASED ON DIFFERENT INDS.
Intuitively, the testing sample most likely belongs to class 1, as shown in Fig. 1. According to Table IV, the methods using the TD-IND, H-IND and Nq-IND all classify it correctly. According to the probabilities obtained with the GR-IND, the testing sample would be assigned to class 2. Revisiting the BBA determined with GR-IND, the basic beliefs assigned to the focal elements with a single class follow the right trend, i.e., $m(\{\theta_1\}) > m(\{\theta_2\}) > m(\{\theta_3\})$. However, the Pignistic probabilities obtained with the GR-IND are counter-intuitive, because the beliefs assigned to the focal elements with multiple classes are also counted. From this perspective, the BBA determined with GR-IND is less satisfactory. In this numerical example, the interval number based methods using the TD-IND, H-IND and Nq-IND are more appropriate for BBA determination than the one using the GR-IND if the decision is based on the maximum of BetP.

TABLE V. THE MEAN (μ) SETTINGS OF DIFFERENT CLASSES' DIFFERENT FEATURES.

TABLE VI. THE RESULTS OF THE METHODS BASED ON DIFFERENT INDS.