Published February 15, 2021 | Version v1
Journal article Open

A hybrid classification framework based on clustering

  • 1. Sichuan University, China
  • 2. Zunyi Medical University, China
  • 3. University of Münster, Germany

Description

Traditional supervised classification algorithms tend to focus on uncovering the relationship between sample attributes and class labels; they seldom consider the potential structural characteristics of the sample space, which often leads to unsatisfactory classification results. To improve the performance of classification models, many scholars have sought to construct hybrid models by combining supervised and unsupervised learning. Although the existing hybrid models have shown significant potential in industrial applications, our experiments indicate that some shortcomings remain. To overcome these shortcomings, this article proposes a hybrid classification framework based on clustering (HCFC). First, it applies a clustering algorithm to partition the training samples into K clusters. It then constructs a clustering-based attribute selection measure, namely the hybrid information gain ratio, and uses this measure to train a C4.5 decision tree. Depending on the clustering algorithm used, this article constructs two versions of the HCFC (HCFC-K and HCFC-D) and tests them on eight benchmark datasets from the healthcare and disease diagnosis industries and on 15 datasets from other fields. The results indicate that both versions of the HCFC achieve classification performance comparable to, or better than, that of the three other hybrid models and six single models considered. In addition, the HCFC-D is more robust to class noise than the HCFC-K.
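
The attribute selection step described above can be illustrated with a minimal Python sketch. The `entropy` and `gain_ratio` functions follow the standard C4.5 definition of the gain ratio for categorical attributes. The abstract does not define the hybrid information gain ratio, so `hybrid_gain_ratio` and its weight `alpha` are hypothetical: they assume, purely for illustration, a weighted blend of the gain ratio with respect to class labels and the gain ratio with respect to cluster assignments. This is not the measure from the article.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attr_values, targets):
    """Standard C4.5 gain ratio of a categorical attribute w.r.t. targets."""
    n = len(targets)
    groups = {}
    for value, target in zip(attr_values, targets):
        groups.setdefault(value, []).append(target)
    # Expected entropy of the targets after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(targets) - conditional
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(attr_values)
    return info_gain / split_info if split_info > 0 else 0.0


def hybrid_gain_ratio(attr_values, class_labels, cluster_ids, alpha=0.5):
    """HYPOTHETICAL blend, not the article's definition.

    Assumes the measure mixes supervised information (class labels) with
    unsupervised structure (cluster assignments) through a weight alpha.
    """
    supervised = gain_ratio(attr_values, class_labels)
    structural = gain_ratio(attr_values, cluster_ids)
    return alpha * supervised + (1 - alpha) * structural


# Toy data: one attribute, class labels, and cluster ids from any clusterer.
attr = ["a", "a", "b", "b"]
classes = [0, 0, 1, 1]
clusters = [0, 1, 0, 1]
score = hybrid_gain_ratio(attr, classes, clusters, alpha=0.5)
```

In a C4.5-style tree builder, a score of this kind would be computed for every candidate attribute at a node, and the attribute with the highest score would be chosen for the split. The cluster ids would come from whichever clustering algorithm partitions the training samples into K clusters.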

Files

ieee_tii_2020.pdf (3.0 MB)
md5:cd56985410834af72569a3aef15c5130

Additional details

Funding

European Commission
ULTRACEPT - Ultra-layered perception with brain-inspired information processing for vehicle collision avoidance (grant 778062)