A DSmT based combination scheme for multi-class classification

This paper presents a new combination scheme that reduces the number of focal elements to be manipulated, in order to reduce the complexity of the combination process in the multi-class framework. The basic idea consists in using p sources of information, which provide p kinds of complementary information, to feed each set of p one-class support vector machine classifiers independently of each other; each set is designed to detect the outliers of the same target class. The outputs of each set of classifiers are then combined through the theory of plausible and paradoxical reasoning for each target class. The main objective of this approach is to provide calibrated outputs even when less complementary responses are encountered. A version of Appriou's model adapted to the estimation of the generalized basic belief assignments is also presented. The proposed methodology allows decomposing an n-class problem into a series of n combinations, while providing n calibrated outputs in the multi-class framework. The effectiveness of the proposed combination scheme with the proportional conflict redistribution algorithm is validated on a digit recognition application and compared with existing statistical, learning, and evidence theory based combination algorithms.


I. INTRODUCTION
Nowadays, a large number of classifiers and feature generation methods have been developed in various application areas of pattern recognition [1], [2]. Nevertheless, no method has shown an incontestable superiority over the others, either for feature generation or for classification. Rather than trying to optimize a single classifier by choosing the best features for a given problem, researchers found it more interesting to combine recognition methods [2]. Indeed, the combination of classifiers makes it possible to exploit the redundant and complementary nature of the responses issued from different classifiers.
Researchers have proposed increasingly numerous and varied approaches for combining classifiers, which has led to the development of several schemes that treat data in different ways [2]. Generally, three approaches for combining classifiers can be considered: the parallel, sequential and hybrid approaches [2]. Furthermore, the combination can be performed at the class level, the rank level, or the measure level [3].
However, given the constraints arising from the joint use of classifiers and feature generation methods, an appropriate operating method based on mathematical approaches is needed, one that takes into account two notions: the uncertainty and the imprecision of the classifier responses. In general, most theoretical advances devoted to probability theory can represent uncertain knowledge but cannot easily model information that is imprecise, incomplete, or not totally reliable. Moreover, they often confuse the concepts of uncertainty and imprecision within the probability measure. Therefore, new original theories dealing with uncertain and imprecise information have been introduced, such as fuzzy set theory [4], evidence theory [5], possibility theory [6] and, more recently, the theory of plausible and paradoxical reasoning [7], [8], [9]. The Dezert-Smarandache theory (DSmT) of plausible and paradoxical reasoning was elaborated by Jean Dezert and Florentin Smarandache for dealing with imprecise, uncertain and paradoxical sources of information. The main objective of DSmT was to introduce combination rules that correctly combine evidence issued from different information sources, even in the presence of conflicts between sources or of constraints corresponding to an appropriate model (i.e., free or hybrid DSm models [7]). DSmT has proved useful in many kinds of applications [7], [8], [9]. Indeed, DSmT has a feasible computational complexity for industrial uses, which are considered small-dimensional problems [10], [11]. In contrast, extending this theory to the multi-class framework raises an applicability problem because of its high computational complexity. This is closely related to the number of elements to be processed in the framework of this theory, which follows the sequence of Dedekind's numbers [12]. Using the free DSm model, which considers the set of all subsets of the original classes closed under the union and intersection operators, is not easy and becomes intractable for more than 6 elements in the discernment space [13].
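The growth that makes the free model intractable can be checked numerically. The sketch below is our illustration, not code from the paper: it counts the monotone Boolean functions of n variables by brute force, which gives the Dedekind number d(n), and uses the standard DSmT relation |D^Θ| = d(n) − 1. The function names are hypothetical, and brute force over all 2^(2^n) truth tables is only practical up to n = 4.

```python
def dedekind(n: int) -> int:
    """Count monotone Boolean functions of n variables by brute force.

    A function f: {0,1}^n -> {0,1} is stored as a truth table of 2**n bits:
    bit x of the integer `f` holds f(x), where x encodes the input vector.
    f is monotone if switching any input variable from 0 to 1 never turns
    the output from 1 to 0.
    """
    size = 1 << n                          # number of input vectors
    count = 0
    for f in range(1 << size):             # every truth table, 2**(2**n) of them
        monotone = True
        for x in range(size):
            if (f >> x) & 1:               # f(x) = 1 ...
                for i in range(n):
                    y = x | (1 << i)       # ... raising variable i ...
                    if not (f >> y) & 1:   # ... must keep f(y) = 1
                        monotone = False
                        break
            if not monotone:
                break
        if monotone:
            count += 1
    return count


def hyperpowerset_size(n: int) -> int:
    """|D^Theta| for a frame of n hypotheses under the free DSm model."""
    return dedekind(n) - 1


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, dedekind(n), hyperpowerset_size(n))
```

Running it prints d(n) = 3, 6, 20, 168 and |D^Θ| = 2, 5, 19, 167 for n = 1, ..., 4. The known value d(6) = 7828354 is far out of reach of this approach, which illustrates why working on the full hyperpowerset quickly becomes impractical as the number of classes grows.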
In this paper, we propose an effective combination scheme of one-class classifiers in a general belief function framework, incorporating an intelligent learning technique to reduce the number of focal elements. This allows us to drastically reduce the computational complexity of the combination process and, in particular, to extend the applicability of DSmT to the multi-class classification framework. The objective of this work is not to choose a particular kind of one-class classifier, but only to illustrate, through a practical application, the advantage of this new combination scheme for real-time implementation purposes.
Section II reviews related works. The theoretical formulation of the Proportional Conflict Redistribution (PCR6) combination rule, and the way it can be extended to solve the multi-class classification problem, are presented in Section III. Section IV gives a multi-class classification scheme based on belief function theories. The database of isolated handwritten digits, the methods used for generating features and the algorithm used for validating the OC-SVM models are described in Section V. The experimental and statistical results are summarized in Section VI, and Section VII concludes the paper.

II. RELATED WORKS
Dezert and Smarandache [14] proposed a first approach for ordering all the elements generated with the free DSm model, enabling matrix calculus similar to that developed in the DST framework [15], [16]. However, this proposal has limitations since, in practical applications, it is preferable to manipulate only the focal elements [17], [18], [19], [3].
So far, few works have focused on the computational complexity of the combination algorithms formulated in the DSmT framework. Djiknavorian and Grenier [18] showed that the high complexity of the DSm hybrid (DSmH) combination algorithm can be avoided by designing code that performs a complete DSmH combination in a very short time. However, even though they obtained an optimal process for evaluating the DSmH algorithm, some parts of their code are not really optimized, and it has been developed only for dynamic fusion. Martin [20] further proposed a practical codification of the focal elements that assigns a single integer to each part of the Venn diagram representing the discernment space. Contrary to Smarandache's codification [13] used in [21] and to the codes proposed in [18], the author argues that the constraints given by the application must be integrated directly into the codification of the focal elements in order to obtain a reduced discernment space. Therefore, this codification can drastically reduce the number of possible focal elements, and hence the complexity of both the DST and DSmT frameworks. A disadvantage of this codification is that the complexity increases drastically with the number of combined sources, especially for multi-class problems. To address this issue, Li et al. [22] proposed a criterion called evidence supporting measure of similarity (ESMS), which selects, among all the available sources, only a subset of sources of evidence in order to reduce the complexity of the combination process; however, this criterion has been justified only for a two-class problem. Reducing both the number of combined sources and the size of the discernment space therefore remains a research challenge.

To generate the combined masses from all the $p$ one-class classifiers, a set of sources $S = \{S_1, \ldots, S_p\}$ is defined, which attributes to each element $S_k$ a mass function $m_k$, where $k$ is the index of the corresponding source. According to the finite set of hypotheses, a combination rule is then applied to the elementary masses to generate the partial masses from all the $p$ sources as follows:

$$m(A) = [m_1 \oplus m_2 \oplus \cdots \oplus m_p](A),$$

where $m(A)$ is the partial mass associated with the elementary or compound hypothesis $A$ of the combined sources, verifying $\sum_{A} m(A) = 1$. Hence, the choice of an appropriate combination rule depends on the set of predefined hypotheses. An example of such approaches is the Proportional Conflict Redistribution (PCR6) rule based on DSmT.

A. Combination Rule Based On the DSmT
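Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ be the discernment space of the multi-class classification problem under consideration, having $n$ exhaustive elementary hypotheses $\theta_i$, which are not necessarily mutually exclusive in DSmT. The main concept of DSmT is to distribute the unit mass of certainty over all the composite propositions built from the elements of $\Theta$ with the $\cup$ (union) and $\cap$ (intersection) operators, namely the hyperpowerset $D^{\Theta}$, instead of making this distribution over the elementary or union hypotheses only. For example, the hyperpowerset for two hypotheses (classes) $\theta_1, \theta_2 \in \Theta$ is $D^{\Theta} = \{\emptyset, \theta_1, \theta_2, \theta_1 \cup \theta_2, \theta_1 \cap \theta_2\}$. DSmT uses the generalized basic belief mass, also known as the generalized basic belief assignment (gbba), computed on the hyperpowerset and defined as a mapping $m: D^{\Theta} \rightarrow [0, 1]$ such that

$$m(\emptyset) = 0, \qquad \sum_{A \in D^{\Theta}} m(A) = 1,$$

where, in the two-class example, $m(\theta_1 \cap \theta_2)$ represents the mass of the conflict (or paradoxical information). The way the conflicting mass is redistributed yields several versions of the Proportional Conflict Redistribution (PCR) rules [23], [19]. From PCR1 to PCR2, PCR3, PCR4 and PCR5, both the complexity of the rules and the exactness of the redistribution of conflicting masses increase. The PCR5 combination rule proposed in [23] for two sources is mathematically one of the best for the proportional redistribution of the conflict in the context of DSmT. Martin and Osswald proposed the following alternative to PCR5 for combining more than two sources altogether (i.e., $p \geq 3$). This rule, denoted PCR6, does not follow the track of the conjunctive rule as the general PCR5 formula does, but it gives better intuitive results. For $p = 2$, PCR5 and PCR6 coincide. The partial gbba combined by means of the PCR6 rule [19] is defined for $X \in G^{(i)} \setminus \Phi$ as:

$$m_{PCR6}(X) = m_{\wedge}(X) + \sum_{k=1}^{p} m_k(X)^2 \sum_{\substack{(Y_{\sigma_k(1)}, \ldots, Y_{\sigma_k(p-1)}) \in (G^{(i)})^{p-1} \\ X \cap \left(\bigcap_{j=1}^{p-1} Y_{\sigma_k(j)}\right) \in \Phi}} \frac{\prod_{j=1}^{p-1} m_{\sigma_k(j)}\left(Y_{\sigma_k(j)}\right)}{m_k(X) + \sum_{j=1}^{p-1} m_{\sigma_k(j)}\left(Y_{\sigma_k(j)}\right)},$$

and $m_{PCR6}(X) = 0$ for $X \in \Phi$, where $m_k$ is the gbba provided by source $S_k$, $\Phi = \{\Phi_{\mathcal{M}}, \emptyset\}$ is the set of all relatively and absolutely empty elements, $\Phi_{\mathcal{M}}$ is the set of all elements of $G^{(i)}$ that have been forced to be empty in the hybrid model $\mathcal{M}$ defined by the exhaustive and exclusive constraints, $\emptyset$ is the empty set, the denominator is different from zero, and $\sigma_k$ counts from 1 to $p$ avoiding $k$, i.e.:

$$\sigma_k(j) = \begin{cases} j & \text{if } j < k, \\ j + 1 & \text{if } j \geq k. \end{cases}$$

Here, $m_{\wedge}(X)$ corresponds to the classical DSm rule on the free DSm model [24], which is defined as:

$$m_{\wedge}(X) = \sum_{\substack{Y_1, \ldots, Y_p \in G^{(i)} \\ Y_1 \cap \cdots \cap Y_p = X}} \prod_{k=1}^{p} m_k(Y_k).$$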
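To make the PCR6 redistribution concrete, the following Python sketch (ours, not the authors' code) combines p mass functions on a two-hypothesis frame {θ_i, θ̄_i} under Shafer's model, so that θ_i ∩ θ̄_i is the only empty intersection. It uses the equivalent per-product form of PCR6: each conflicting product Π_k m_k(Y_k) is given back to the Y_k in proportion to m_k(Y_k). The bitmask encoding and the names `pcr6`, `THETA`, `NOT_THETA` and `UNION` are illustrative assumptions.

```python
from __future__ import annotations

from itertools import product

# Elements of the two-hypothesis frame {theta_i, not theta_i}, encoded as bitmasks
# so that set intersection is a bitwise AND (illustrative encoding, our assumption).
THETA = 0b01                 # target class theta_i
NOT_THETA = 0b10             # complementary class (all other classes)
UNION = THETA | NOT_THETA    # theta_i ∪ not theta_i
EMPTY = 0                    # theta_i ∩ not theta_i, forced empty under Shafer's model


def pcr6(masses: list[dict[int, float]]) -> dict[int, float]:
    """Combine p mass functions with PCR6 on the bitmask-encoded frame.

    Each tuple (Y_1, ..., Y_p) of focal elements contributes prod_k m_k(Y_k).
    If Y_1 ∩ ... ∩ Y_p is non-empty, that product goes to the intersection
    (conjunctive part). Otherwise it is a partial conflict, split among the
    Y_k in proportion to m_k(Y_k) (PCR6 redistribution; equals PCR5 for p = 2).
    """
    combined: dict[int, float] = {}
    for combo in product(*(m.items() for m in masses)):
        focal = [y for y, _ in combo]
        weights = [w for _, w in combo]
        prod = 1.0
        for w in weights:
            prod *= w
        if prod == 0.0:
            continue  # nothing to assign or redistribute
        inter = UNION
        for y in focal:
            inter &= y
        if inter != EMPTY:
            combined[inter] = combined.get(inter, 0.0) + prod
        else:
            total = sum(weights)  # > 0 because every weight is positive here
            for y, w in zip(focal, weights):
                combined[y] = combined.get(y, 0.0) + prod * w / total
    return combined


if __name__ == "__main__":
    m1 = {THETA: 0.6, NOT_THETA: 0.3, UNION: 0.1}
    m2 = {THETA: 0.2, NOT_THETA: 0.7, UNION: 0.1}
    print(pcr6([m1, m2]))
```

For the two sources in the `__main__` block, the result is approximately m(θ_i) ≈ 0.4178, m(θ̄_i) ≈ 0.5722 and m(θ_i ∪ θ̄_i) = 0.01, which sums to one. The bitmask trick only works because, on this frame, set intersection coincides with bitwise AND; a free or larger hybrid model would need a richer codification of the focal elements.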

B. Effective Combination Scheme of One-Class Classifiers
For the computation of the global combined mass, the direct use of the PCR6 combination rule on $D^{\Theta}$ yields a computational cost that increases drastically with $n$ and may even be prohibitive, because the number of elements of $D^{\Theta}$ becomes huge when dealing with gbba's within the DSmT framework (i.e., $|D^{\Theta}| = d(n) - 1$, where $d(n)$ is the Dedekind number of $n$). However, in our multi-class problem, separating the data according to the One Against All (OAA) approach [25] renders the data of each two-class problem highly unbalanced. Hence the need for a one-class classifier able to distinguish the samples of the target class $\theta_i$ from the outliers belonging to its complementary class $\bar{\theta}_i$ (the union of the other classes). From this principle, we propose a combination scheme that decomposes an $n$-class problem into a series of $n$ combinations, where the reasoning of the $i$-th combination is performed on the subset $\Theta^{(i)} = \{\theta_i, \bar{\theta}_i\}$ instead of the reference space $\Theta$. The scheme uses complementary features captured by the different sources of information $S_1, \ldots, S_p$ from the input probe data to feed the one-class classifiers, which operate independently of each other for each target class $\theta_i$; the partial opinions (i.e., transformed measures) provided by these classifiers are then combined all together through an appropriate rule on the subset $\Theta^{(i)}$. Finally, all the $n$ partial combined masses are incorporated in a single module for decision making. Table I gives the number of focal elements obtained according to our computing method within the DSmT framework. In this way, we can drastically reduce, within the DSmT framework, the number of focal elements to be manipulated.

IV. MULTI-CLASS CLASSIFICATION SCHEME BASED ON BELIEF FUNCTION THEORIES
The proposed multi-class classification scheme mainly incorporates four modules: i) one-class support vector machine (OC-SVM) classification, ii) transformation of the normalized OC-SVM outputs into belief assignments using an estimation technique based on a modified version of Appriou's model, iii) combination of masses through an algorithm based on belief function theories, and iv) decision making.

A. Classification Based On OC-SVM
In the proposed combination scheme, we adopt OC-SVMs, which enable us to incorporate an intelligent learning technique that efficiently avoids both the closed-set and the good-distribution assumptions in the multi-class classification framework. In the following, we briefly review concept learning with the one-class SVM.
Review of OC-SVM: Schölkopf et al. [26] proposed the OC-SVM classifier by modifying the standard support vector machines initially introduced by Vapnik [27]. Pattern classification using OC-SVM has been successfully used in many applications, such as biometric verification [28], [29]. This classifier is an unsupervised learning algorithm that requires only samples of the target class for training. In fact, pattern classification with OC-SVM consists in defining a boundary around the target class such that it accepts as many target samples as possible while minimizing the chance of accepting outliers.
The concept of the OC-SVM is to find a hypersphere that includes most of the learning data within a minimum volume. More specifically, the objective of the OC-SVM is to estimate a function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ that encloses most of the learning data in a hypersphere of minimum volume, where $d$ is the size of the feature vector [26]. The decision function $f(x)$ is given as [26]:

$$f(x) = \sum_{j=1}^{Sv} \alpha_j K(x_j, x) - \rho,$$

where $Sv$ is the number of support vectors $x_j$ from the training dataset, $\alpha_j$ are Lagrange multipliers such that $0 \leq \alpha_j \leq \frac{1}{\nu m}$ and $\sum_{j} \alpha_j = 1$, $m$ is the cardinality of the training dataset, $\nu$ is the percentage of data considered as outliers, $\rho$ defines the distance of the hypersphere from the origin, and $K(\cdot, \cdot)$ is the OC-SVM kernel, which projects data from the original space to the feature space [30].
A pattern $x$ is then accepted when $f(x) \geq 0$; otherwise, it is rejected. Various kernel functions can be used, such as the polynomial, Radial Basis Function (RBF) or multilayer perceptron kernels [27]. The RBF kernel is generally used for its better performance; it is defined as:

$$K(x, y) = \exp\left(-\gamma \lVert x - y \rVert^2\right),$$

where $\gamma$ is the kernel parameter.
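The decision rule and the RBF kernel above can be evaluated directly once a model is trained. This minimal sketch, in plain Python, assumes the support vectors, the multipliers α_j and the offset ρ are already available (for instance from a quadratic-programming solver, which is not shown); the numeric values in the example are made up for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def ocsvm_decision(x, support_vectors, alphas, rho, gamma) -> float:
    """Decision value f(x) = sum_j alpha_j K(x_j, x) - rho."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, alphas)) - rho


def accept(x, support_vectors, alphas, rho, gamma) -> bool:
    """Accept x as a target-class pattern when f(x) >= 0, reject it otherwise."""
    return ocsvm_decision(x, support_vectors, alphas, rho, gamma) >= 0.0


if __name__ == "__main__":
    # Made-up "trained" model: two support vectors whose alphas sum to 1.
    svs = [(0.0, 0.0), (1.0, 0.0)]
    alphas = [0.5, 0.5]
    print(accept((0.5, 0.0), svs, alphas, rho=0.3, gamma=1.0))  # True: close to the data
    print(accept((5.0, 5.0), svs, alphas, rho=0.3, gamma=1.0))  # False: far from the data
```

Because the RBF kernel decays with distance, a pattern far from every support vector gets f(x) close to −ρ and is rejected, which is the outlier-detection behaviour the OC-SVM relies on.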
In the following, we show how OC-SVM-based concept learning can be extended to construct a multi-class OC-SVM with multiple hyperspheres.

Extension of OC-SVM for Constructing Multi-Class OC-SVM:
Basically, the OC-SVM classifier has been conceived to deal with a one-class classification problem [26]. Its extension to the multi-class scenario may yield uncalibrated outputs for some classifiers. In this paper, we use a sigmoid transformation to map the outputs of the different OC-SVM classifiers, reassigned using a logarithmic function, to probabilities; normalization factors are introduced in the probabilistic framework in order to respect the normality condition (i.e., the posterior probabilities over all the classes sum to one). In this way, the OC-SVM classifier is extended to the multi-class classification framework, and the posterior probability $P(\theta_i \mid x)$ of each class $\theta_i$ of the frame $\Theta$ can be obtained directly according to equation (8). Finally, the maximum likelihood (ML) test is used for decision making: the test pattern $x$ is assigned to the class

$$\hat{\theta} = \arg\max_{\theta_i \in \Theta} P(\theta_i \mid x),$$

where $x$ is characterized by the source of information $S_k$.
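The calibration and decision chain (sigmoid, normalization, ML test) can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's exact transformation: the logarithmic reassignment step is not reproduced, the sigmoid parameters `a` and `b` are hypothetical, and the raw decision values are fed to the sigmoid directly.

```python
import math


def _stable_sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def sigmoid(values, a=1.0, b=0.0):
    """Squash raw OC-SVM decision values into (0, 1); a and b are hypothetical."""
    return [_stable_sigmoid(a * f + b) for f in values]


def normalize(scores):
    """Rescale scores so that they sum to one (normality condition)."""
    total = sum(scores)
    return [s / total for s in scores]


def ml_decision(posteriors):
    """Index of the class with the largest posterior probability."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)


if __name__ == "__main__":
    f = [-0.5, 0.2, 1.1]      # hypothetical decision values of three OC-SVMs
    p = normalize(sigmoid(f))
    print(p, ml_decision(p))
```

Since the sigmoid is monotone and the normalization is a common rescaling, the ML decision here is the class with the largest decision value; the calibration matters when the probabilities are later used as evidence rather than only ranked.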

B. Estimation of Masses
In this paper, the gbba's $m_k^{(i)}(\cdot)$ provided by the sources are estimated using a version of Appriou's model, which was initially defined for two classes [31]. The modified version of Appriou's model in the DSmT framework is defined over the hyperpowerset of the subset $\Theta^{(i)} = \{\theta_i, \bar{\theta}_i\}$; in this model, a term computed from the calibrated OC-SVM output quantifies the belief that the pattern $x$ belongs to the elements of this subset, and the constant $\varepsilon$ involved in the model is fixed here to 0.001.

C. Combination of Masses
In order to manage the conflict generated by the $p$ sources of information (i.e., $S_1, \ldots, S_p$), the global combined masses are computed over the subsets $G^{(i)}$, $i = 1, \ldots, n$, as follows:

$$m(A) = \frac{1}{n} \, m^{(i)}(A), \qquad A \in G^{(i)},$$

where $\frac{1}{n}$ is a normalization factor, introduced in the DSmT framework to respect the normality condition of masses over the set of all focal elements $\bigcup_{i=1}^{n} G^{(i)}$, and $m^{(i)}$ is the partial combined mass obtained with the PCR6 combination rule, such that $m^{(i)}(A) = [m_1^{(i)} \oplus m_2^{(i)} \oplus \cdots \oplus m_p^{(i)}](A)$; here $\oplus$ represents the combination operator, which is composed of both the conjunctive and the redistribution terms of the PCR6 rule when dealing with the DSmT framework.
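The 1/n normalization can be illustrated with a short sketch. It assumes, as stated above, that each partial combined mass m^(i) already sums to one over its own subset G^(i); the string labels and the function name `global_masses` are hypothetical, and elements are keyed by (class index, element label) so that the n subsets stay distinct.

```python
from __future__ import annotations


def global_masses(partial: list[dict[str, float]]) -> dict[tuple[int, str], float]:
    """Scale each partial combined mass m^(i) by 1/n and index it by its class i.

    Each partial mass is assumed to sum to one over its own subset G^(i), so
    the n scaled blocks together sum to one over the union of the subsets.
    """
    n = len(partial)
    return {(i, element): value / n
            for i, block in enumerate(partial)
            for element, value in block.items()}


if __name__ == "__main__":
    partial = [
        {"theta": 0.5, "not_theta": 0.4, "union": 0.1},   # combination for class 0
        {"theta": 0.2, "not_theta": 0.7, "union": 0.1},   # combination for class 1
    ]
    g = global_masses(partial)
    print(g[(0, "theta")], sum(g.values()))
```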

D. Decision Rule
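Combining evidence with the proposed combination scheme yields combined beliefs, from which a decision is made using a statistical classification technique. First, the combined beliefs are converted into a probability measure using a probabilistic transformation called DSmP, which maps a belief measure to a subjective probability measure [32] and is defined as:

$$DSmP_{\varepsilon}(X) = \sum_{Y \in F} \frac{\sum_{Z \subseteq X \cap Y,\; C_{\mathcal{M}}(Z) = 1} m(Z) + \varepsilon \, C_{\mathcal{M}}(X \cap Y)}{\sum_{Z \subseteq Y,\; C_{\mathcal{M}}(Z) = 1} m(Z) + \varepsilon \, C_{\mathcal{M}}(Y)} \, m(Y),$$

where $\varepsilon \geq 0$ is a tuning parameter and $F$ corresponds to the set of all focal elements, including eventually all the integrity constraints (if any; i.e., $F = 2^{\Theta}$ for Shafer's model and $F = D^{\Theta}$ when all the hypotheses are paradoxical); $C_{\mathcal{M}}(X)$ denotes the DSm cardinal of the set $X$ [14]. In the context of some particular multi-class classification problems, the simple classes $\theta_i$ are truly exclusive and Shafer's model is adopted. Therefore, $DSmP_{\varepsilon}(\theta_i)$ can be obtained directly from the masses of the focal elements $\theta_i$, $\bar{\theta}_i$ and $\theta_i \cup \bar{\theta}_i$ as follows:

$$DSmP_{\varepsilon}(\theta_i) = m(\theta_i) + m(\theta_i \cup \bar{\theta}_i) \, \frac{m(\theta_i) + \varepsilon}{m(\theta_i) + m(\bar{\theta}_i) + 2\varepsilon}.$$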
In this manner, the combined belief assignment is transformed into a probability measure, so that the statistical classification approach can be applied to compute the final decision. Finally, the DSmP-based maximum likelihood (ML) test is used for decision making: the test pattern $x$ is assigned to the class

$$\hat{\theta} = \arg\max_{\theta_i \in \Theta} DSmP_{\varepsilon}(\theta_i),$$

where $x$ is characterized by the $p$ sources of information $S_1, \ldots, S_p$, and $\varepsilon$ is fixed to 0.001 in the decision measure given by (17).
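The closed form of DSmP under Shafer's model and the resulting ML decision can be sketched as below. It is a minimal illustration, assuming each class i is summarized by the combined masses of θ_i, θ̄_i and θ_i ∪ θ̄_i; the function names and the example masses are hypothetical.

```python
from __future__ import annotations


def dsmp_target(m_theta: float, m_not: float, m_union: float, eps: float = 0.001) -> float:
    """DSmP_eps(theta_i) on the frame {theta_i, not theta_i} under Shafer's model.

    m(theta_i) stays on theta_i, and the mass of theta_i ∪ not theta_i is
    shared in proportion (m(theta_i) + eps) / (m(theta_i) + m(not theta_i) + 2 eps).
    """
    return m_theta + m_union * (m_theta + eps) / (m_theta + m_not + 2.0 * eps)


def dsmp_ml_decision(per_class, eps=0.001):
    """Return (index of the winning class, list of DSmP_eps(theta_i) values).

    per_class holds one (m(theta_i), m(not theta_i), m(theta_i ∪ not theta_i))
    triple per target class.
    """
    probs = [dsmp_target(mt, mn, mu, eps) for mt, mn, mu in per_class]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    masses = [(0.2, 0.7, 0.1), (0.6, 0.3, 0.1), (0.4, 0.5, 0.1)]
    print(dsmp_ml_decision(masses))
```

A quick sanity check of the formula: applying the same expression to θ̄_i (swapping the first two arguments) gives the complementary probability, so the two values always sum to one for any ε ≥ 0.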

V. DATABASE AND ALGORITHMS USED FOR VALIDATION
The proposed OC-SVM classifiers are trained using different feature generation methods on a database of isolated handwritten digits. In this section, we briefly describe the database, the methods used for generating features and the algorithm used for validating the OC-SVM models.

A. Database Description and Performance Criteria
To validate the proposed combination scheme, the well-known US Postal Service (USPS) database is used for the handwriting recognition task. This database contains normalized grey-level images of handwritten digits from 10 numeral classes, extracted from US postal envelopes. All images are segmented and normalized to a size of 16 × 16 pixels. There are 7291 training images and 2007 test images, some of which are corrupted and difficult to classify correctly. To evaluate the performance of the combination scheme, two popular rates are considered: the Recognition Rate (RR) for each class and the Mean Recognition Rate (MRR) over all classes.
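The two performance criteria can be computed as in the sketch below. It assumes that RR is the fraction of correctly recognized test samples of a class and that MRR is the unweighted mean of the per-class RRs; the paper does not spell out the averaging, so this is an assumption, and `recognition_rates` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter


def recognition_rates(y_true: list[int], y_pred: list[int]) -> tuple[dict[int, float], float]:
    """Per-class recognition rate RR and their unweighted mean MRR (assumed definition)."""
    totals = Counter(y_true)                                      # test samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)   # correct ones per class
    rr = {c: hits[c] / totals[c] for c in sorted(totals)}
    mrr = sum(rr.values()) / len(rr)
    return rr, mrr


if __name__ == "__main__":
    rr, mrr = recognition_rates([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 0, 2])
    print(rr, mrr)
```

With these toy labels, RR is 0.5, 2/3 and 1.0 for classes 0, 1 and 2, and MRR is their mean, about 0.722.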

B. Methods Used for Generating Features
The objective of the feature generation step is to highlight the relevant information that initially exists in the raw data. Thus, an appropriate choice of descriptor significantly improves the accuracy of the combination scheme. In this study, we use a collection of popular feature generation methods, which can be categorized into background features, foreground features and geometric features [3].

C. Algorithm Used for Validation of OC-SVM Models
An OC-SVM model is produced for each class according to the descriptor used. Hence, the training dataset is partitioned into ten subsets of samples, each one used as a learning subset to train the corresponding OC-SVM classifier, which operates independently of the others. Let $\gamma_1 < \gamma_2 < \cdots < \gamma_N$ be $N$ different values of the RBF parameter, sorted in increasing order. Among all the models fulfilling the validation condition, the optimal one is selected using the maximum number of support vectors $Sv$ as criterion. The rationale is that the higher the number of support vectors, the more representative the retained information is of each class.
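The selection step can be sketched as a filter followed by an arg-max. The exact validation condition is not fully recoverable here; the sketch assumes it is an upper bound on the validation error, with a default of 10% taken as an assumption, and the `Candidate` record and `select_gamma` function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    gamma: float             # RBF parameter value
    validation_error: float  # assumed: error rate measured during validation
    n_support_vectors: int   # Sv of the trained OC-SVM


def select_gamma(candidates: list[Candidate], max_error: float = 0.10) -> Candidate:
    """Keep the models meeting the (assumed) error bound, then maximise Sv.

    Candidates are expected in increasing order of gamma; max() returns the
    first maximum it meets, so ties are broken in favour of the smallest gamma.
    """
    admissible = [c for c in candidates if c.validation_error <= max_error]
    if not admissible:
        raise ValueError("no candidate model satisfies the error bound")
    return max(admissible, key=lambda c: c.n_support_vectors)


if __name__ == "__main__":
    grid = [Candidate(0.01, 0.15, 40), Candidate(0.1, 0.08, 120),
            Candidate(1.0, 0.05, 300), Candidate(10.0, 0.12, 900)]
    print(select_gamma(grid).gamma)
```

In this made-up grid, γ = 10 has the most support vectors but violates the error bound, so γ = 1.0 is retained.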

VI. EXPERIMENTAL RESULTS
The effectiveness of the proposed combination scheme is demonstrated experimentally by evaluating the recognition performance on all the isolated handwritten digits of the test dataset. We first perform experiments to select a subset of global complementary sources of information using the proposed extension of OC-SVM to the multi-class (MC) classification framework, namely MC-OC-SVM; the proposed combination scheme is then evaluated in the framework of belief function theories.

A. Performance Evaluation of the Proposed Descriptors
In these experiments, we compute during the test phase the mean recognition rate (MRR) of the MC-OC-SVM classifier using Geometric Features (GF), Foreground Features (FF), Background Features (BF), and the descriptors resulting from the concatenation of two or more simple descriptors, namely (BF,FF), (BF,GF), (FF,GF) and (BF,FF,GF). The experiments show that an appropriate choice of descriptors, and of their concatenation, to represent each digit class in the feature generation step provides good recognition performance. As shown in Table II, the MRR varies from one descriptor to another, and the MRRs of the concatenated descriptors are relatively high compared with those of the simple descriptors.
Nevertheless, improving the recognition performance by concatenating features is difficult, since most of the time the concatenated descriptor does not take into account the complementary nature that may exist between the descriptors. Hence, among all the available descriptors (see Table II), we choose only those for which the corresponding MC-OC-SVM classifiers attain an improvement in recognition performance. The BF, FF and GF-based descriptors yield an MRR of 89.50%, 83.75% and 78.90% in experiments (c), (b) and (a), respectively. When using the (FF,GF)-based descriptor in experiment (f), we obtain a significant improvement in the recognition performance of the MC-OC-SVM classifier, from 83.75% to 87.59%. Further, an important gain of 4.12% in recognition performance, reaching an MRR of 91.71%, is obtained in experiment (g) when the BF-based descriptor is concatenated to the (FF,GF)-based descriptor to obtain the new (BF,FF,GF)-based descriptor.
Hence, in the following subsection, we use the three descriptors BF, (FF,GF) and (BF,FF,GF) as global sources of information to feed the OC-SVM classifier of each target class. This allows us to evaluate the recognition performance of the proposed combination scheme and to better exploit the complementary nature of these descriptors. In this way, the recognition performance can be improved even when the concatenation of descriptors fails to provide the correct solution for some specific handwritten digit recognition problems.

B. Performance Evaluation of the Proposed Combination Scheme
In these experiments, we evaluate the recognition performance of the proposed combination scheme using the sum, DS and PCR6 rules. This combination scheme makes it possible to exploit the complementary nature of the three sources of information $S_1$, $S_2$ and $S_3$, and to manage the conflict between the outputs of the OC-SVM classifiers of each target class, for each corresponding combination. In Table III, each line presents the false acceptance rate (FAR) computed for each class of handwritten digits, for each of the three sources of information.
For better comparison, the recognition results corresponding to the combination of the three sources $S_1$, $S_2$ and $S_3$ by the sum, DS and PCR6 rules are given in Table IV. The proposed combination scheme using the DS rule yields an MRR of 92.76%, corresponding to an improvement of 1.05% over the (BF,FF,GF)-based MC-OC-SVM classifier (MRR = 91.71%), while the sum rule yields a slightly lower MRR of 92.68%. This is due to the direct estimation technique of masses, which assigns confidence only to the simple classes in the probabilistic framework; hence, the sum rule cannot correctly manage the conflict generated by the three sources. Furthermore, the experimental results show that the DS rule is not able to handle most of the conflicting cases between the three sources. Hence, DST is not appropriate for solving our handwritten digit recognition problem in the multi-class classification framework. Indeed, the DS rule redistributes the beliefs in the combination process through a simple normalization by $(1 - K)$, where $K$ is the total conflicting mass. However, when the responses of the OC-SVM classifiers are less complementary, neither the sum rule nor the DS rule provides reliable decisions. In Table IV, the PCR6 rule improves the RR compared with both the sum and DS rules for all classes of handwritten digits except a few. This is because some digits belonging to these classes are wrongly characterized by the combined sources $S_1$, $S_2$ and $S_3$. In other words, the PCR6-based combination is not reliable when the complementary information provided by the sources of information is not correctly preserved. To obtain a higher MRR, the combined sources of information should provide complementary information. The proposed combination scheme with the PCR6 rule yields the best MRR of 95.03% when combining the three sources $S_1$, $S_2$ and $S_3$ all together. After redistribution, the combined mass is transformed into the DSm probability, and the DSmP-based ML test is used for decision making. Finally, the proposed combination scheme using the PCR6 rule in the DSmT framework is the most stable across all experiments, whereas the recognition rates of the DS combination rule vary significantly.
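The normalization by (1 − K) discussed above can be seen on a small example. The sketch below (ours, for illustration) implements Dempster's rule on the same two-hypothesis frame {θ_i, θ̄_i}, with a bitmask encoding and names that are illustrative assumptions: the conjunctive masses are computed first, and the total conflict K is then removed by dividing every non-empty mass by 1 − K.

```python
from __future__ import annotations

from itertools import product

# Illustrative bitmask encoding of the frame {theta_i, not theta_i}.
THETA, NOT_THETA = 0b01, 0b10
UNION = THETA | NOT_THETA


def dempster(masses: list[dict[int, float]]) -> dict[int, float]:
    """Dempster's rule: conjunctive combination, then division by (1 - K)."""
    conj: dict[int, float] = {}
    conflict = 0.0                      # K, the total conflicting mass
    for combo in product(*(m.items() for m in masses)):
        inter = UNION
        prod = 1.0
        for y, w in combo:
            inter &= y
            prod *= w
        if inter == 0:                  # empty intersection: conflicting product
            conflict += prod
        else:
            conj[inter] = conj.get(inter, 0.0) + prod
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in conj.items()}


if __name__ == "__main__":
    m1 = {THETA: 0.6, NOT_THETA: 0.3, UNION: 0.1}
    m2 = {THETA: 0.2, NOT_THETA: 0.7, UNION: 0.1}
    print(dempster([m1, m2]))
```

For these two sources K = 0.48, and the result is approximately m(θ_i) ≈ 0.3846, m(θ̄_i) ≈ 0.5962 and m(θ_i ∪ θ̄_i) ≈ 0.0192: the conflicting mass is spread over every focal element by the global rescaling, including the ignorance θ_i ∪ θ̄_i, which did not take part in the conflict. This is exactly the behaviour that a proportional redistribution such as PCR6 avoids.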

VII. CONCLUSION AND FUTURE WORK
In this paper, an effective combination scheme of one-class classifiers in a general belief function framework has been proposed. The OC-SVM classifiers are incorporated as an intelligent learning technique for reducing the number of focal elements. The scheme uses a subset of global complementary sources of information to feed the OC-SVM classifiers corresponding to each target class, which allows an n-class problem to be decomposed into a series of n combinations while providing n calibrated outputs in the multi-class framework. This drastically reduces the computational complexity of the combination process and, in particular, extends the applicability of DSmT to the multi-class classification framework. Experimental results show that the proposed combination scheme with the PCR6 rule yields the best performance on the handwritten digit recognition application compared with the sum and DS rules, even when the individual MC-OC-SVM classifiers provide uncalibrated outputs.
As a continuation of the present work, our next objective is to adapt the evidence supporting measure of similarity (ESMS) criterion to select complementary sources of information for each target class within the same combination scheme, in an attempt to improve the RR and MRR.

$Sv_{ik}$ and $\rho_{ik}$ are the number of support vectors and the distance of the hypersphere from the origin for each OC-SVM model indexed by $ik$. The parameters of each OC-SVM model are tuned during the validation phase using the corresponding training subset of samples. In this work, we have allowed up to 10% of error on the training dataset (i.e., the percentage of training data considered as outliers); the corresponding rate is computed during the validation phase and is used in the selection of the optimal value of the RBF parameter. The proposed combination scheme, which uses ten combinations (i.e., one combination per target class), measures ten values of conflict for each isolated handwritten digit of the test dataset. In the context of isolated handwritten digit recognition, the conflicting regions of each test digit are modeled by the ten paradoxical elements $\theta_i \cap \bar{\theta}_i$.

Indeed, the PCR6 rule efficiently redistributes the partial conflicting mass only to the elements involved in the partial conflict (i.e., the target class $\theta_i$ and its complementary class $\bar{\theta}_i$).

TABLE II. RECOGNITION RATES OF THE MC-OC-SVM CLASSIFIER USING DIFFERENT METHODS OF GENERATING FEATURES