The pertinent single-attribute-based classifier for small datasets classification

Received Jul 27, 2019; Revised Dec 5, 2019; Accepted Dec 11, 2019

Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used in such cases due to its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. SAB-HR uses a feature selection method based on the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute among the attributes in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository showed that the SAB-HR classifier significantly outperformed the classical OneR classifier for small datasets. In addition, using the H-Ratio as a feature selection criterion for selecting the single attribute was more effective than other traditional criteria, such as Information Gain (IG) and Gain Ratio (GR).


INTRODUCTION
Classification is one of the main tasks of data mining and machine learning [1] and is widely used to predict different real-life situations. High accuracy is a key indicator of a successful prediction model. Building an accurate classifier is one of the important goals, and rich datasets make this task easier and more effective [2]. Classifying small datasets efficiently is essential because some real situations cannot provide a sufficient number of cases. A limited training set is difficult to learn from and, as a result, to base a decision on. In many multivariable classification or regression problems, such as estimation or forecasting, we have a training set $T_p = \{(x_i, t_i)\}$ of p pairs, each consisting of an input vector $x_i \in \mathbb{R}^n$ and a scalar target $t_i$. Thus, according to Vapnik's definition, a small dataset for $T_p$ is determined as follows: "For estimating functions with VC dimension h, we consider the size p of data to be small if the ratio p/h is small (say p/h < 20)" [3].
The problem with the small dataset is that, if not elaborately collected, it is not a representative sample. Non-representative instances hinder the process of providing enough information for the learner model because of the gaps existing between instances; thus, the model does not generalize well. Many works have been proposed in the literature to solve the problem of small data size by using different methods. One of the common methods used is to increase the size of data by adding artificial instances [4], but this approach lacks data credibility and reflection on real-life use. Some researchers have used feature-selection methods [5][6][7][8], whereas a novel technique using multiple runs for model development was proposed by [9] and others.
A simple solution is one of the requirements when the problem is becoming increasingly complex. This philosophy has been stated by Occam's razor [1]. Literature in the field of classification has shown some successful attempts of very simple rules to achieve high accuracy with many datasets [10]. OneR is one of the simple and widely used algorithms in machine learning to build a simple classifier. A trade-off between simplicity and high performance [10] makes OneR's performance slightly less accurate than state-of-the-art classification algorithms [11,12], although sometimes it outperforms them [13,14]. Its main advantage is that it balances the best accuracy possible with a model that is still simple enough for humans to understand [12].
OneR is a single-attribute-based classifier that involves only one attribute at classification time. The single attribute concept is powerful if the attribute can directly and positively influence the classification accuracy of the dataset. Not all attributes contribute positively to the classification process, which can increase the relative power of a single attribute. The single attribute rule can be more effective than complex methods when it is difficult to learn from the dataset because it is simple, small, noisy, or complex. A study by [15] used the single attribute concept by creating multiple one-dimensional classifiers from the original dataset in the training phase and combining the results in the prediction phase. That method is unlike OneR because it considers the contributions of all attributes at prediction time.
Feature selection is a data-mining preprocessing step widely used to improve classification and reduce running time. It is effective in reducing the dataset's dimensionality by eliminating non-contributing attributes. It uses different techniques to arrive at a single attribute or a subset of attributes [16,17]. Moreover, it has proven its effectiveness in improving the predictive accuracy of various applications [18][19][20].
In this paper, we tackle the problem of classifying small datasets by expanding the power of a pertinent single attribute using the SAB-HR classifier. SAB-HR is similar to the OneR classifier in that it uses a single attribute in the classification phase. It differs in that, instead of generating a rule for each attribute, it employs a feature selection method to select the attribute that is least heterogeneous among the attributes. We calculated the H-Ratio [21] for each attribute (att), then identified the attribute with the lowest H-Ratio value (att_H-Ratio). We used the pair (att_H-Ratio, c), where c is the class value, to learn and classify the small dataset. The results were encouraging and showed a significant improvement compared to the classical OneR classifier. In addition, we created further classifiers in the same manner as SAB-HR, using different criteria to select the pertinent single attribute. We used IG and GR in the feature-selection process and created the SAB-IG and SAB-GR classifiers, respectively. We individually compared the new classifier SAB-HR with the others (i.e., SAB-IG and SAB-GR).
The remainder of this paper is organized as follows: Section 2 reviews the background of our work. In Section 3, we propose the research method, the SAB-HR classifier. The experiments and a brief discussion of the findings are in subsections 3.1 and 3.2, respectively. Finally, Section 4 concludes the paper.

BACKGROUND
In this section we will review some of the techniques that will be used in this study.

OneR classifier
OneR, short for "One Rule", was introduced by Rob Holte [22,10]. It is one of the most primitive techniques, based on a 1-level decision tree: it creates one rule for each attribute in the dataset, then selects the rule with the fewest classification errors as its "one rule". To create a rule for an attribute, it constructs a frequency table for that attribute against the class [22]. Figure 1 shows the pseudocode of the OneR algorithm. OneR has been shown to work well in practice with real-world data and can compete with state-of-the-art classification algorithms in some situations [13,14,23]. Because OneR uses one attribute for classification, many consider it a feature selection method whose feature subset contains a single attribute [24]. Compared with the baseline classifier ZeroR [14], OneR is one step beyond. Both OneR and ZeroR are useful for determining a minimum standard for other classification algorithms. OneR's accuracy is always higher than or at least equal to that of the baseline classifier when evaluated on the training data. The authors in [25] proposed enhancements to the performance of OneR that address two issues: the quantization of continuous-valued attributes and the treatment of missing values.

Figure 1. The pseudocode of OneR algorithm [15]
    For each attribute (att),
        For each value of that att, make a rule as follows:
            Count how often each value of class appears
            Find the most frequent class
            Make the rule assign that class to this value of the att
        Calculate the total error of the rules of each att
    Choose the att with the smallest total error.

ISSN: 2088-8708 The pertinent single-attribute-based classifier for small datasets classification (M. Jamjoom) 3229
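The pseudocode in Figure 1 can be sketched in Python as follows. This is a minimal illustration for nominal attributes only, not the WEKA implementation used in this paper; the function names `one_r` and `predict`, the tie-breaking behaviour (first attribute and first-seen class win), and the toy data are assumptions made for the example.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute index, rule, training errors) for the best single attribute.

    rows   -- list of tuples of nominal attribute values
    labels -- list of class values, one per row
    """
    best = None
    for att in range(len(rows[0])):
        # Frequency table: attribute value -> count of each class value.
        table = defaultdict(Counter)
        for row, label in zip(rows, labels):
            table[row[att]][label] += 1
        # One rule per value: assign the most frequent class to that value.
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        # Total error: instances that do not belong to the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in table.values())
        # Keep the attribute with the smallest total error (ties keep the earlier one).
        if best is None or errors < best[2]:
            best = (att, rule, errors)
    return best


def predict(model, row, default=None):
    """Classify a row with the selected attribute; unseen values get `default`."""
    att, rule, _ = model
    return rule.get(row[att], default)


# Toy data (made up for illustration): columns are (outlook, windy).
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("overcast", "no")]
labels = ["play", "play", "stay", "stay", "play"]
model = one_r(rows, labels)
```

On this toy data the first attribute separates the classes without error, so it is the one selected, and `predict(model, ("rainy", "no"))` returns the class stored for "rainy".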

Feature selection
Feature selection methods attempt to find the minimal subset of features that does not significantly decrease the classification accuracy. Feature selection methods can be categorized as wrapper methods or filter methods [17]. Surveys by [17] and [16] describe many such methods. A wrapper method is a model-based approach in which the quality of the selected features is measured by the classification accuracy of the classification algorithm being used. Some use a greedy search to select the subset [16]. In a filter method, also called a model-free approach, the selection of features is done independently of the classification algorithm. It selects the features of the subset based on general measurable characteristics of each feature, such as Information Gain, Gain Ratio, Pearson Correlation, Mutual Information (MI) [16], and Heterogeneity Ratio [21]. In this paper, we used feature selection that utilizes filter methods (i.e., attribute evaluation) and focused on some of the mentioned measures (i.e., IG, GR, and H-Ratio). A brief description of each follows.
-Information gain [21] measures the amount of information given by an attribute about the class. It is defined by formula (1):

$$IG(att) = H(Y) - H_{att}(Y) \qquad (1)$$

where $H_{att}(Y)$ measures the entropy of class Y given the attribute att, while $H(Y)$ calculates the entropy of class Y. Entropy is the quantity of information contained in or delivered by a source of information. It is also used in measuring relevancy and is defined by formula (2):

$$H(Y) = -\sum_{y \in Y} P(y) \log_2 P(y) \qquad (2)$$

-Gain ratio [26] is the ratio of information gain to intrinsic information. It determines the relevancy of an attribute. GR is calculated using formula (3):

$$GR(att) = \frac{IG(att)}{H(att)} \qquad (3)$$

where $H(att) = -\sum_{j} P(v_j) \log_2 P(v_j)$ and $P(v_j)$ represents the probability of the value $v_j$ among all values of the attribute att.
-Heterogeneity ratio is a new measure defined by [21] that measures the ratio of heterogeneity of a nominal attribute among the dataset attributes. In other words, it quantifies the homogeneity of a set of instances sharing the same value of attributes. The H-Ratio is defined by formula (4): The ratio
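
The information gain and gain ratio measures above can be computed for a nominal attribute as in the following sketch. It follows the standard textbook definitions of formulas (1) to (3); the function names, the convention of returning 0 for a constant attribute, and the toy data are assumptions made for illustration. The H-Ratio is not included because its formula is defined in [21] and is not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of nominal values, as in formula (2)."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def conditional_entropy(att_values, labels):
    """Entropy of the class within each attribute value, weighted by value frequency."""
    groups = {}
    for value, label in zip(att_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())


def information_gain(att_values, labels):
    """Formula (1): class entropy minus class entropy given the attribute."""
    return entropy(labels) - conditional_entropy(att_values, labels)


def gain_ratio(att_values, labels):
    """Formula (3): information gain divided by the entropy of the attribute itself."""
    intrinsic = entropy(att_values)
    if intrinsic == 0:
        return 0.0  # constant attribute; this convention is an assumption of the sketch
    return information_gain(att_values, labels) / intrinsic


# Toy data (made up): one attribute that determines the class, one that does not.
labels = ["x", "x", "y", "y"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
ig_good = information_gain(informative, labels)
ig_bad = information_gain(uninformative, labels)
gr_good = gain_ratio(informative, labels)
```

Here the informative attribute removes all uncertainty about the class, while the uninformative one removes none, so ranking by either measure would select the first attribute.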

RESULTS AND ANALYSIS
In this section, we introduce a new single-attribute-based classifier, SAB-HR, to classify small datasets. The new algorithm uses a new criterion to select the pertinent single attribute that will contribute to the classification. Unlike OneR, SAB-HR does not generate a rule for each attribute. Instead, it calculates the H-Ratio for each attribute in the dataset to determine the attribute that is least heterogeneous among the attributes. The attribute with the lowest H-Ratio value (att_H-Ratio) is paired with the class c, as (att_H-Ratio, c), in the classification process, while the remaining attributes are eliminated. The power of the single attribute selected for SAB-HR lies in its homogeneity with the other attributes, through which it provides enough information for the classifier to predict correctly. att_H-Ratio is a representative attribute that is sufficient for small datasets. The algorithmic description of SAB-HR is presented in Figure 2.
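
The select-then-classify structure described above can be sketched as follows. This is not the authors' implementation: the scoring function is passed in as a parameter, and because the H-Ratio formula from [21] is not reproduced here, the example uses conditional entropy as a clearly labeled stand-in (minimizing it is equivalent to maximizing IG, so the demo behaves like SAB-IG). The names `fit_single_attribute` and `predict`, and the majority-class fallback for unseen values, are assumptions.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(att_values, labels):
    """Stand-in score (NOT the H-Ratio): entropy of the class given the attribute."""
    groups = defaultdict(list)
    for value, label in zip(att_values, labels):
        groups[value].append(label)
    total = len(labels)
    score = 0.0
    for group in groups.values():
        for count in Counter(group).values():
            p = count / len(group)
            score -= len(group) / total * p * math.log2(p)
    return score


def fit_single_attribute(rows, labels, score_fn):
    """Select the attribute with the lowest score, then learn value -> majority class."""
    columns = list(zip(*rows))
    # Feature selection step: one score per attribute, lowest wins.
    selected = min(range(len(columns)), key=lambda i: score_fn(columns[i], labels))
    # Classification step on the pair (selected attribute, class); others are dropped.
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[row[selected]][label] += 1
    rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
    fallback = Counter(labels).most_common(1)[0][0]  # majority class for unseen values
    return selected, rule, fallback


def predict(model, row):
    selected, rule, fallback = model
    return rule.get(row[selected], fallback)


# Toy data (made up): the first attribute determines the class, the second does not.
rows = [("a", "p"), ("a", "q"), ("b", "p"), ("b", "q"), ("a", "p")]
labels = ["x", "x", "y", "y", "x"]
model = fit_single_attribute(rows, labels, conditional_entropy)
```

Passing a different `score_fn` changes only the selection step, which mirrors how SAB-HR, SAB-IG and SAB-GR differ from one another only in the measure used to pick the single attribute.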

Experiments
In the following experiments, we aim to evaluate the performance of the new SAB-HR classifier when dealing with small datasets. In addition, we want to compare the performance of SAB-HR with other single attribute classifiers that use different criteria, such as IG and GR, when selecting the single attribute during the feature-selection process. We used the well-known open source software WEKA [27]. The datasets were obtained from the UCI Repository for Machine Learning [28]. We selected 12 small datasets corresponding to Vapnik's definition [3]. Table 1 lists the main characteristics of the datasets collected and used in terms of number of instances, number of attributes, and Vapnik's ratio for determining the dataset's size. The number beside the dataset name will be its reference in the figures.
OneR was used as the base classifier. A 10-fold cross-validation and a paired t-test with a confidence level of 95% were used to determine whether the differences in classification accuracy were statistically significant; significant differences are underlined in the tables. We compared the different methods with respect to the average classification accuracy and the number of datasets for which each method achieved better results. Better results are shown in the tables in bold font. In the tables, we named each technique using the abbreviation SAB, for single-attribute-based, suffixed with an abbreviation for the measure used to select the single attribute in the feature-selection process. The new classifiers, with respect to the different measures, are named as follows: SAB-HR, SAB-IG and SAB-GR. In our experiments, we applied the feature-selection process using the different measures (H-Ratio, IG, and GR) to select the pertinent single attribute, then we eliminated the remaining (i.e., unselected) attributes and classified with the pair (pertinent single attribute, class).
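
As an illustration of the significance test described above, the following sketch computes a textbook paired t statistic over the per-fold accuracies of two classifiers evaluated on the same 10 folds. The per-fold accuracies are made-up numbers, the function name is hypothetical, and the exact test variant applied by WEKA may differ in detail; the critical value 2.262 is the two-sided 95% value of Student's t distribution with 9 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (10 folds).
T_CRITICAL_DF9 = 2.262


def paired_t_statistic(acc_a, acc_b):
    """t statistic of the per-fold differences (undefined if all differences are equal)."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


# Made-up per-fold accuracies for a new classifier and a base classifier.
acc_new = [0.70, 0.65, 0.72, 0.68, 0.66, 0.71, 0.69, 0.67, 0.73, 0.64]
acc_base = [0.55, 0.50, 0.60, 0.52, 0.58, 0.54, 0.51, 0.57, 0.59, 0.49]

t_value = paired_t_statistic(acc_new, acc_base)
significant = abs(t_value) > T_CRITICAL_DF9
```

Pairing the accuracies fold by fold removes the variation that both classifiers share because they see the same splits, which is why the paired form is preferred over comparing two independent means.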

Results and discussion
The results of the experiments are combined in Table 2, which compares the performance of the classical OneR with the newly created classifiers. Noticeably, the classical OneR performs worse than the new classifiers. The overall average accuracy of the new classifiers (i.e., SAB-HR, SAB-IG and SAB-GR) is 64.6%, 49.72% and 61.31%, respectively, compared with 48.53% for the classical OneR classifier. Furthermore, the difference in average accuracy between SAB-HR and the classical OneR is statistically significant. The average difference between the classical OneR and the applied classifiers (i.e., SAB-HR, SAB-IG and SAB-GR) is 16.07%, 1.19% and 12.78%, respectively, favoring the new classifiers. Figure 3 (a-c) compares the applied classifiers to the classical OneR classifier in terms of average accuracy. The least heterogeneous attribute classifier (SAB-HR) ranks first, followed by SAB-GR with a slight difference (3.29%) from the first. SAB-IG is far behind the other two classifiers and behaves much like the classical OneR; the two lines are approximately identical, as shown in Figure 3 (b). The attribute (att_IG) used in SAB-IG contains the largest amount of information about the class. In a small dataset, however, it may be more important to consider the consistency of the attribute with the other attributes because of the limited number of instances in the dataset. This would minimize the gaps existing between the instances in the dataset. The homogeneity of the dataset helps make it more representative and, thus, easier to learn accurately. In addition, the new classifiers achieved better average accuracy on more datasets than OneR, as shown in Table 2. Figure 4 (a-c) shows each new classifier in comparison to OneR. The number of datasets on which the new classifier achieved the better result is 8, 4 and 8 for SAB-HR, SAB-IG and SAB-GR, respectively, compared with 3, 1, and 3 for the OneR classifier.
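
The differences quoted above follow directly from the reported average accuracies, as the short check below shows. It uses only the four averages stated in the text; the variable names are chosen for illustration.

```python
# Average accuracies (in percent) as reported in the text for Table 2.
averages = {"OneR": 48.53, "SAB-HR": 64.6, "SAB-IG": 49.72, "SAB-GR": 61.31}

# Improvement of each new classifier over the classical OneR, in percentage points.
gains = {
    name: round(accuracy - averages["OneR"], 2)
    for name, accuracy in averages.items()
    if name != "OneR"
}

# Gaps between SAB-HR and the other two single-attribute-based classifiers.
hr_vs_gr = round(averages["SAB-HR"] - averages["SAB-GR"], 2)
hr_vs_ig = round(averages["SAB-HR"] - averages["SAB-IG"], 2)
```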
From Table 2, it is clear that selecting the single attribute with the lowest classification error rate, as the OneR classifier does, is not always optimal, especially for small datasets. Using a more deliberate technique to select the single attribute has a positive impact on classification accuracy and on the number of datasets with better results. We also developed Table 3 to highlight the new classifier SAB-HR, which uses homogeneity for the pertinent single attribute selection. Table 3 shows a comparison between the new classifier SAB-HR and the other classifiers created for the same purpose (i.e., SAB-IG and SAB-GR). The results showed that SAB-HR's average accuracy exceeds SAB-IG's by 14.88%, while the difference from SAB-GR is only 3.29%. In general, the performance of the SAB-HR classifier is remarkable when compared to the classical OneR or the applied classifiers (i.e., SAB-IG and SAB-GR). Figure 5 (a) and (b) show the difference in performance on each dataset between SAB-HR and the other applied classifiers in terms of average accuracy.
In summary, we can conclude that, for small datasets, using a simple classifier, such as OneR, is one of the main options for enhancing classification accuracy. In addition, employing a feature-selection method that selects a single attribute using a common measure like H-Ratio, IG or GR does so with better results. Moreover, considering the homogeneity of the attribute for pertinent single attribute selection can positively impact the classification process. It helped to reduce the gap between instances and accordingly yielded a more representative dataset. Consequently, it provided enough information for the classifier to learn and achieve a decent average accuracy. From the previous results, a single-attribute-based classifier can be powerful for classifying small datasets when the pertinent attribute is selected. That is the case with the new SAB-HR, which is recommended among the classifiers tested in this work.

CONCLUSION
In this work, we have explored the power of the single attribute when selected using an effective feature-selection criterion. We have addressed the small dataset mining problem because it is not always easy to gather a large amount of real data. The new algorithm SAB-HR is a pertinent single-attribute-based classifier that combines simplicity and effectiveness to contribute positively to classifying small datasets. Selecting the single attribute that is most homogeneous with the other attributes in the dataset gives more consistency between instances. Our empirical results used 12 benchmark datasets of a small size according to Vapnik's definition. The results show that SAB-HR significantly outperforms the classical OneR. In addition, we compared the performance of SAB-HR with other single attribute classifiers that use different attribute selection criteria (e.g., IG and GR), and all the results confirmed the effectiveness of the SAB-HR classifier. In future work, we intend to investigate algorithms to improve the classification accuracy of small datasets using more advanced classifiers. In addition, we aim to propose simpler methods for classification.