Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural Network Classifier

ABSTRACT


INTRODUCTION
Reducing the dimensionality of datasets has become increasingly critical as the amount of available data keeps growing. In many areas, the solution of a system problem is based on a set of variables [1][2]. As the number of variables characterizing the problem increases, difficulties arise at many levels, such as complexity, computing time, and deterioration of the solution in the presence of noisy data. One way to reduce dimensionality is to find a representation of the original data in a smaller space. Dimensionality reduction can roughly be divided into two categories [3][4]: feature extraction and feature selection. Feature extraction generates a small set of new features by combining the original features, whereas feature selection picks a small subset of the original ones.
Variable selection, or feature selection, is a search process used to select a subset of variables for building robust learning models [5] such as neural networks and decision trees. The learning data may contain irrelevant and/or redundant variables that make learning harder and decrease the performance of learning models. Variable selection methods can be classified into three main categories: filter, wrapper, and embedded. Filter methods were the first to be used for variable selection. They evaluate the relevance of a variable according to measures that rely on the properties of the learning data. Filter techniques are fast on high-dimensional datasets, but they ignore the interaction with the classifier [6]. Wrapper methods use the predictive accuracy of a predetermined learning algorithm to determine the best feature subset [7]. They tend to find the feature subset best suited to the learning algorithm, but they are computationally very expensive. Unlike wrapper and filter methods, embedded methods incorporate variable selection into the learning process, and they combine the advantages of filter and wrapper techniques [8]. A filter approach, such as one using only the MRMR criterion, determines relevant and redundant variables independently of the classifier, so it is not recommended to use it alone [9]. A filter method might select variables better if it took into account how the filtered variables are used by the classifier. A wrapper method, for example heuristic variable selection (HVS), evaluates a subset of features by its classification performance using a learning algorithm [10]. In this work, we propose to incorporate the MRMR criterion into the ranking scheme of HVS. The hybrid is built as a convex combination of the relevance given by the HVS criterion and the MRMR criterion.
The rest of the paper is organized as follows. Section 2 presents related works. Section 3 presents the research method. Section 4 presents the results of our experimental studies, including the experimental methodology, the experimental results, and a comparison with heuristic variable selection (HVS) and Minimum Redundancy Maximum Relevance (MRMR). Conclusions are drawn in Section 5.

RELATED WORKS
Variable selection is generally defined as a search process that finds a subset of "relevant" characteristics from the original set [5], [11][12][13]. The notion of relevance of a subset of variables always depends on the objectives and requirements of the system. The problem of variable selection for a classification task can be stated as follows: given the original set G of N features, find a subset F consisting of N' relevant features, where N' < N. The subset F is selected so as to maximize the performance of the classification models built from it.
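For a sense of scale: every nonempty subset of G is a candidate F, so an exhaustive search would have to evaluate

$$\sum_{N'=1}^{N} \binom{N}{N'} = 2^{N} - 1$$

subsets. For N = 30 features this is already 2^30 - 1 = 1,073,741,823 candidate subsets, which is why the methods discussed below rank the variables and explore subsets sequentially rather than exhaustively.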

The Heuristic Variable Selection
Consider a multilayer perceptron (MLP) [14], a neural network architecture defined by A(I, H, O), where I is the input layer, H the hidden layers, O the output layer, and W the weight matrix. The value of the connection w_ij between two neurons j and i reflects the importance of their relationship. This value can be positive or negative depending on whether the connection is excitatory (+) or inhibitory (-). Yacoub et al. proposed a variable selection method named heuristic variable selection (HVS) [15]. The HVS criterion focuses on the strength of these connections, quantified by |w_ij|. The partial contribution π(i, j) of hidden neuron j to output i is the proportion of the total connection strength arriving at neuron i that comes from j (see Figure 1):
$$\pi(i, j) = \frac{|w_{ij}|}{\sum_{k} |w_{ik}|}$$
where k ranges over all units that send a connection to neuron i.
To estimate the relative contribution C_j of unit j to the final decision of the system, note that unit j sends connections to a set of units i, each of which receives from j the partial contribution π(i, j) (see Figure 2).
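The recursion behind C_j is not spelled out above. A common formulation of HVS, assumed here rather than reproduced from this paper, sums the partial contributions directly when the receiving units i are output units, and weights them by the downstream contributions otherwise: C_j = Σ_i π(i, j) for output units, and C_j = Σ_i π(i, j) C_i for hidden units. The plain-Python sketch below applies this to an MLP with a single hidden layer; the nested-list weight layout (w[i][j] is the weight of the connection from j to i), the function names, and ignoring biases are all our assumptions.

```python
# Sketch of HVS contributions for an MLP with a single hidden layer.
# Assumptions: biases are ignored; w[i][j] holds the weight of the
# connection from unit j to unit i (hypothetical layout).

def partial_contributions(w):
    """pi[i][j] = |w_ij| / sum_k |w_ik|: share of the connection strength
    arriving at unit i that comes from unit j."""
    pi = []
    for row in w:
        total = sum(abs(v) for v in row)
        pi.append([abs(v) / total if total > 0 else 0.0 for v in row])
    return pi


def hvs_input_contributions(w_hidden, w_output):
    """Relative contribution C_j of every input variable j.

    w_hidden[h][j]: weight from input j to hidden unit h
    w_output[o][h]: weight from hidden unit h to output unit o
    """
    pi_out = partial_contributions(w_output)  # hidden -> output
    pi_hid = partial_contributions(w_hidden)  # input -> hidden
    n_hidden, n_inputs = len(w_hidden), len(w_hidden[0])

    # Hidden units connect to outputs: C_h = sum over outputs o of pi(o, h).
    c_hidden = [sum(row[h] for row in pi_out) for h in range(n_hidden)]

    # Inputs connect to hidden units: C_j = sum over h of pi(h, j) * C_h.
    return [sum(pi_hid[h][j] * c_hidden[h] for h in range(n_hidden))
            for j in range(n_inputs)]


if __name__ == "__main__":
    w_hidden = [[1.0, -1.0], [0.0, 2.0]]  # 2 hidden units, 2 inputs
    w_output = [[1.0, 3.0]]               # 1 output unit
    print(hvs_input_contributions(w_hidden, w_output))  # [0.125, 0.875]
```

Because every row of π sums to one, the input contributions add up to the number of output units, which makes a quick sanity check; the input with the smallest C_j is the one HVS regards as least important.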

Minimum Redundancy Maximum Relevance
The MRMR (Minimum Redundancy Maximum Relevance) method [16] selects variables that have maximal relevance to the target class and are also minimally redundant. In this work, to find a maximally relevant and minimally redundant set of variables, we use the mutual-information-based MRMR criterion. The relevance and redundancy of a variable are given by equations (4) and (5). I(i, Y) is the mutual information between the class labels Y and variable i; it quantifies the relevance of variable i to the classification. The relevance of a variable is given by:
$$V(i) = I(i, Y) \qquad (4)$$
The redundancy of a variable subset is determined by the mutual information among the variables. The redundancy of variable i with the other variables is given by:
$$W(i) = \frac{1}{|S|} \sum_{j \in S} I(i, j) \qquad (5)$$
where S and |S| respectively denote the set of variables and its size, and I(i, j) is the mutual information between i and j. The score of a variable combines these two factors:
$$S_{MRMR}(i) = \frac{V(i)}{W(i)}$$
The relevance and redundancy measures can be combined in several ways, but taking the quotient of relevance by redundancy selects highly relevant variables with low redundancy [17]. After this individual variable evaluation, a sequential search technique is used with a classifier to select the final subset of variables: the classifier evaluates the subset made of the best-scoring variable, then the best two, and so on, until the subset that minimizes the classification error is found.
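A minimal plain-Python sketch of the quotient score, assuming the variables are already discretized; mutual information is estimated here from empirical frequencies, which is one choice among several and not necessarily the authors'. Reading "with the other variables" literally, S is taken to be the set of variables other than i, and a small eps (our assumption) avoids dividing by zero. The helper best_prefix is a hypothetical illustration of the sequential search: it evaluates the best-scoring variable, then the best two, and so on, and keeps the prefix with the lowest error reported by a user-supplied function.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) = (c/n) / ((px/n)(py/n)) = c * n / (px * py)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mrmr_scores(columns, y, eps=1e-12):
    """Quotient MRMR score S_MRMR(i) = V(i) / W(i) for every variable.

    columns: list of discrete feature columns; y: class labels.
    """
    scores = []
    for i, xi in enumerate(columns):
        relevance = mutual_information(xi, y)                 # V(i) = I(i, Y)
        others = [xj for j, xj in enumerate(columns) if j != i]
        redundancy = (sum(mutual_information(xi, xj) for xj in others)
                      / len(others)) if others else 0.0        # W(i)
        scores.append(relevance / (redundancy + eps))
    return scores


def best_prefix(ranked, error_of):
    """Sequential search over the top-1, top-2, ... ranked variables;
    returns the prefix with the smallest error_of(prefix)."""
    prefixes = [ranked[:k] for k in range(1, len(ranked) + 1)]
    return min(prefixes, key=error_of)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    scores = mrmr_scores(columns, y)
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    print(scores, ranked)
```

In the toy example, the first two columns copy the labels (and each other) and score about 2, while the third column is independent of the labels and scores 0.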

RESEARCH METHOD
Using a filter method such as MRMR alone may not give the best performance, because it operates independently of the classifier, which is therefore not involved in the selection of variables. On the other hand, HVS does not take into account the redundancy among variables. Our objective is to improve HVS variable selection by introducing an MRMR filter that minimizes the redundancy among relevant variables. As the experiments show, this improves classifier performance by balancing the relevance and redundancy of variables.
In our HVS-MRMR approach, variables are ranked by a convex combination of the relevance given by the HVS contributions and the MRMR criterion. For variable i, the ranking measure R_i is given by
$$R_i = \alpha\,|C_i| + (1 - \alpha)\,S_{MRMR}(i)$$
where the parameter α ∈ [0, 1] determines the compromise between the HVS and MRMR criteria. The search strategy is one of the defining properties of variable selection algorithms. There are three strategies: forward selection, backward elimination, and stepwise selection. In forward selection, variables are progressively incorporated into larger and larger subsets. In backward elimination, one starts with the set of all variables and progressively eliminates the least promising ones [3]. Our algorithm uses backward elimination. To better balance the redundancy and relevance of variables, we use the S_MRMR(i) criterion for ranking, together with |C_i|, the HVS criterion, as the measure of variable relevance.
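The ranking step can be sketched as follows, assuming R_i = α|C_i| + (1 - α)S_MRMR(i) with α weighting the HVS term. The function name is hypothetical, and no normalization is applied, so the sketch presumes the two criteria are already on comparable scales.

```python
def hvs_mrmr_ranking(contributions, mrmr, alpha):
    """Return variable indices sorted from least to most important by
    R_i = alpha * |C_i| + (1 - alpha) * S_MRMR(i).

    contributions: HVS contributions C_i; mrmr: MRMR scores S_MRMR(i).
    Assumption: alpha weights the HVS term and the inputs need no rescaling.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    r = [alpha * abs(c) + (1.0 - alpha) * s
         for c, s in zip(contributions, mrmr)]
    # Ascending order: the first index is the candidate for elimination.
    return sorted(range(len(r)), key=lambda i: r[i])


if __name__ == "__main__":
    print(hvs_mrmr_ranking([0.1, 0.5, 0.4], [2.0, 0.5, 1.0], alpha=0.5))
    # [1, 2, 0]
```

Setting alpha to 1 reduces the ranking to plain HVS and setting it to 0 reduces it to plain MRMR, which is what makes the convex combination a tunable compromise.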
Algorithm 1 illustrates the HVS-MRMR variable selection method. In each iteration, the variables in the set G are ranked and the least important one is identified. The least significant variable is removed, and the remaining subset goes through the same iterative process; each time a variable is removed, the MLP must be retrained. The algorithm removes the variables one by one until only one variable remains.
Let S_p be a subset of variables, where p is the number of these variables.

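Algorithm 1: HVS-MRMR for variable selection
Begin
  Set α
  Given a set of variables S ⊂ G
  Repeat:
    Train the MLP with the training dataset;
    For each variable i in the current subset do
      Compute C_i by equation (2);
      Compute S_MRMR(i) by equation (7);
      Compute R_i by equation (8);
    End for
    Remove the variable with the smallest R_i from the subset;
  Until only one variable remains
End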
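A runnable Python sketch of the loop in Algorithm 1, under two assumptions: R_i = α|C_i| + (1 - α)S_MRMR(i) with α weighting the HVS term, and the MRMR scores are recomputed on the current subset at every iteration. The callables train and score are hypothetical placeholders for the MLP training step and the MRMR scoring; how the final subset is picked among the visited ones (for instance by validation error) is left to the caller.

```python
# Sketch of the backward-elimination loop of Algorithm 1.
# `train` and `score` are hypothetical callables supplied by the caller.

def backward_elimination(variables, train, score, alpha):
    """Remove the lowest-ranked variable until one variable remains.

    train(subset) -> {i: C_i}: retrains the MLP on the training set restricted
        to `subset` and returns the HVS contribution of each variable.
    score(subset) -> {i: S_MRMR(i)}: MRMR score of each variable in `subset`.
    Returns every subset visited, from the full set down to one variable.
    """
    current = list(variables)
    visited = [list(current)]
    while len(current) > 1:
        contrib = train(current)  # relearning is required after each removal
        mrmr = score(current)
        ranking = {i: alpha * abs(contrib[i]) + (1 - alpha) * mrmr[i]
                   for i in current}
        weakest = min(current, key=lambda i: ranking[i])
        current.remove(weakest)
        visited.append(list(current))
    return visited


def fake_train(subset):
    """Toy stand-in: fixed contributions instead of training an MLP."""
    return {0: 0.9, 1: 0.1, 2: 0.5}


def fake_score(subset):
    """Toy stand-in: fixed MRMR scores."""
    return {0: 1.0, 1: 0.2, 2: 0.6}


if __name__ == "__main__":
    print(backward_elimination([0, 1, 2], fake_train, fake_score, alpha=0.5))
    # [[0, 1, 2], [0, 2], [0]]
```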

RESULTS AND ANALYSIS
To evaluate the performance of HVS-MRMR, we use eight real-world classification datasets. Table 1 shows detailed information on each dataset. Each dataset was partitioned into three sets: a training set, a validation set, and a testing set. The training and testing sets were used to train the MLP and to evaluate the classification accuracy of the trained MLP, respectively, while the validation set was used to estimate the prediction error of the MLP.
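The three-way partition can be sketched as below. The split fractions and the fixed random seed are placeholders chosen for illustration, since the proportions used for each dataset are not given here; three_way_split is a hypothetical helper name.

```python
import random


def three_way_split(examples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle and partition examples into training, validation and test sets.

    The default fractions are illustrative placeholders, not the paper's values.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    items = list(examples)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n_train = int(fractions[0] * len(items))
    n_val = int(fractions[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


if __name__ == "__main__":
    train, val, test = three_way_split(range(100))
    print(len(train), len(val), len(test))  # 50 25 25
```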

Preprocessing
We preprocessed the datasets by rescaling the input variable values to the range [0, 1] with the linear (min-max) normalization function:
$$x_{i,new} = \frac{x_{i,old} - x_{i,min}}{x_{i,max} - x_{i,min}}$$
where x_{i,new} and x_{i,old} are the new and old values of attribute i, respectively, and x_{i,max} and x_{i,min} are the maximum and minimum values of variable i, respectively. After normalization, all features of these examples lie between zero and one.
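A plain-Python sketch of this min-max rescaling, applied column by column. Mapping a constant column to 0 is our assumption, since the formula divides by zero when x_max = x_min; the function name is hypothetical.

```python
def min_max_normalize(columns):
    """Rescale each variable (column) to [0, 1]:
    x_new = (x_old - x_min) / (x_max - x_min).

    Constant columns are mapped to 0.0 (assumption: the formula is
    undefined when x_max == x_min).
    """
    out = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        out.append([(v - lo) / span if span > 0 else 0.0 for v in col])
    return out


if __name__ == "__main__":
    print(min_max_normalize([[2, 4, 6], [5, 5, 5]]))
    # [[0.0, 0.5, 1.0], [0.0, 0.0, 0.0]]
```

When the data are split into training, validation and test sets, one option is to take x_min and x_max from the training set and reuse them for the other sets, so that no test information enters the preprocessing.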

Parameter Estimation
This section specifies some parameters of HVS-MRMR. The initial weights of the MLP were chosen randomly in the range [-1, 1]. The training error threshold was set to 0.002, 0.003, 0.04, 0.02, 0.003, 0.02, 0.01, and 0.03 for the cancer, diabetes, glass, hepatitis, horse, ionosphere, vehicle, and waveform datasets, respectively. The validation error threshold was set to 0.001, 0.002, 0.025, 0.018, 0.001, 0.014, 0.007, and 0.025 for the same datasets, respectively. The learning rate and initial weight values are parameters of the well-known back-propagation algorithm [18][19]; these values were set following the suggestions of previous works [20][21] and after some preliminary tests. The α value for HVS-MRMR variable selection was then determined empirically from the set {0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8}, based on the best tenfold cross-validation performance. Table 2 shows the results of HVS-MRMR, MRMR, and HVS over 20 independent runs on eight classification datasets, including the classification accuracy (Acc). To determine the importance of the selected variables, we measured how frequently each variable was selected. The frequency of a variable i is defined as
$$f_i = \frac{H_i}{T}$$
where H_i is the number of times variable i is selected over all tests and T is the total number of tests.

Results
Figure 3, Figure 4, and Figure 5 show the selection frequency of variables for the diabetes, cancer, and glass datasets, respectively. Figure 3 shows that HVS-MRMR selected features 1, 2, 6, 7, 8, and 9 of the Cancer dataset very frequently: the selection frequency of these variables is one or nearly one.
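The selection frequency f_i = H_i / T can be computed from the subsets selected in each run, as in this short sketch. Variables are indexed from 0 here, whereas the figures number them from 1, and the function name is hypothetical.

```python
from collections import Counter


def selection_frequency(selected_subsets, n_variables):
    """f_i = H_i / T, where H_i counts the runs that selected variable i
    and T is the total number of runs (one selected subset per run)."""
    T = len(selected_subsets)
    if T == 0:
        raise ValueError("need at least one run")
    # set() guards against a variable being listed twice in one subset.
    counts = Counter(i for subset in selected_subsets for i in set(subset))
    return [counts[i] / T for i in range(n_variables)]


if __name__ == "__main__":
    print(selection_frequency([[0, 1], [0, 2], [0, 1]], 3))
```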

"-" means not available.
It can be observed that our method achieved the best classification accuracy among all compared algorithms on five of the eight datasets (Cancer, Glass, Hepatitis, Horse, and Ionosphere). On the remaining three datasets, HVS-MRMR ranked second best, while HVS (Waveform), ANNIGMA-WRAPPER (Diabetes), and GPSFSCD (Vehicle) each achieved the best classification accuracy on one dataset.
Variable selection increases classification accuracy by discarding irrelevant variables from the original feature set. An important task in such a process is to keep the necessary information while excluding irrelevant variables; otherwise, the performance of classifiers may decrease. The efficacy of embedding the MRMR filter in HVS is evidenced by the improved classification performance on the benchmark datasets. The proposed algorithm outperformed the other methods on most of the datasets tested and was able to select the relevant variables in each dataset. By analyzing the performance of the subsets of variables, one can choose the variables that have a strong relationship with the classification. However, in terms of execution time, our approach is more expensive than HVS and MRMR, since each iteration requires both retraining the MLP to obtain the HVS contributions and computing the MRMR scores.

CONCLUSION
It is very important to remove redundant and irrelevant variables from the data before applying data mining techniques to analyze the datasets. In this research, we proposed a new variable selection method based on the HVS and MRMR criteria, called HVS-MRMR, which integrates filter and wrapper variable selection procedures to improve classification performance.
We applied HVS-MRMR to eight classification problems. The experimental results show that HVS-MRMR selected fewer variables while achieving high classification accuracy compared to MRMR and HVS.
In future work, we intend to improve this approach to find better subsets of variables and to further improve classification accuracy [26][27].