Journal article Open Access

A hybrid classification framework based on clustering

Xiao, Jin; Tian, Yuhang; Xie, Ling; Jiang, Xiaoyi; Huang, Jing

Traditional supervised classification algorithms tend to focus on uncovering the relationship between sample attributes and class labels; they seldom consider the potential structural characteristics of the sample space, which often leads to unsatisfactory classification results. To improve the performance of classification models, many scholars have sought to construct hybrid models by combining supervised and unsupervised learning. Although the existing hybrid models have shown significant potential in industrial applications, our experiments indicate that some shortcomings remain. To overcome these shortcomings, this article proposes a hybrid classification framework based on clustering (HCFC). First, it applies a clustering algorithm to partition the training samples into K clusters. It then constructs a clustering-based attribute selection measure, the hybrid information gain ratio, and uses this measure to train a C4.5 decision tree. Depending on the clustering algorithm used, this article constructs two versions of the HCFC (HCFC-K and HCFC-D) and tests them on eight benchmark datasets from the healthcare and disease diagnosis industries and on 15 datasets from other fields. The results indicate that both versions of the HCFC achieve comparable or even better classification performance than the three other hybrid models and six single models considered. In addition, the HCFC-D resists class noise better than the HCFC-K.
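
The abstract does not give the formula for the hybrid information gain ratio, so the following is only a minimal sketch of the general idea, not the authors' definition. It computes the standard C4.5 gain ratio of a categorical attribute once against the class labels and once against cluster labels (assumed to come from any clustering algorithm run beforehand), then blends the two. The function `hybrid_gain_ratio` and its weight `alpha` are hypothetical names and choices introduced for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, target):
    """Standard C4.5 gain ratio of a categorical attribute w.r.t. a target."""
    n = len(target)
    # Group the target values by the attribute value they co-occur with.
    groups = {}
    for a, t in zip(attribute, target):
        groups.setdefault(a, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(attribute)
    if split_info == 0:  # attribute takes a single value: no useful split
        return 0.0
    return (entropy(target) - conditional) / split_info


def hybrid_gain_ratio(attribute, classes, clusters, alpha=0.5):
    """HYPOTHETICAL blend of class-based and cluster-based gain ratios.

    Assumption: a convex combination weighted by alpha. The article's
    actual measure may be defined differently.
    """
    return (alpha * gain_ratio(attribute, classes)
            + (1 - alpha) * gain_ratio(attribute, clusters))


# Toy data: the attribute separates the classes perfectly but carries
# no information about the (assumed, precomputed) cluster assignment.
attribute = ["a", "a", "b", "b"]
classes = [0, 0, 1, 1]
clusters = [0, 1, 0, 1]
score = hybrid_gain_ratio(attribute, classes, clusters)
```

In a decision tree learner of the C4.5 kind, a score like this would be evaluated for every candidate attribute at each node, and the attribute with the highest score would be chosen for the split.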
