Published August 22, 2023 | Version v1
Journal article Open

Early thyroid risk prediction by data mining and ensemble classifiers

  • 1. Computer Engineering Department, College of Engineering and Petroleum, Kuwait University, P.O. Box 5969, Kuwait City, 13060 Safat, Kuwait

Description

Thyroid disease is among the most prevalent endocrinopathies worldwide. Because the thyroid gland controls human metabolism, thyroid illness is a significant concern for human health. To save time and reduce error rates, an automatic, reliable, and accurate machine learning (ML) system for identifying thyroid disease is essential. The proposed model aims to address limitations of existing work, such as the lack of detailed feature analysis and visualization and the need for improved prediction accuracy and reliability. We used a public thyroid illness dataset containing 29 clinical features from the University of California, Irvine (UCI) ML repository. These clinical features allow us to build an ML model that predicts thyroid illness from early symptoms, replacing manual analysis of these attributes. Feature analysis and visualization facilitate an understanding of the role of each feature in the thyroid prediction task. In addition, overfitting was addressed through 5-fold cross-validation and data balancing with the synthetic minority oversampling technique (SMOTE). Ensemble learning supports the reliability of the prediction model because multiple classifiers contribute to each prediction decision. With the boosting method, the proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity, which makes it applicable to real-time computer-aided diagnosis (CAD) systems that ease diagnosis and promote early treatment.
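The description combines 5-fold cross-validation with SMOTE balancing. The sketch below shows one common way to combine the two; it is not necessarily the authors' procedure. It assumes that oversampling is applied only to the training portion of each fold so that the held-out fold stays untouched. The helpers `stratified_folds` and `smote` are hypothetical, simplified stand-ins written for this example, and the data are made-up toy points, not the thyroid dataset.

```python
import random
from collections import Counter
from math import dist


def stratified_folds(labels, n_splits=5, seed=0):
    """Split sample indices into n_splits folds with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Up to k nearest minority neighbours of the chosen base sample.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up data: 40 majority and 10 minority two-feature samples.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
y = [0] * 40 + [1] * 10

folds = stratified_folds(y, n_splits=5)
balanced_counts = []
for held_out in range(5):
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    minority = [x for x, label in zip(train_X, train_y) if label == 1]
    n_new = train_y.count(0) - train_y.count(1)
    # Oversample the minority class in the training portion only.
    train_X += smote(minority, n_new, k=3, seed=held_out)
    train_y += [1] * n_new
    balanced_counts.append(Counter(train_y))
    # A boosting classifier would be fitted on (train_X, train_y) here and
    # scored on the untouched held-out fold.
```

Resampling inside each fold, rather than once before splitting, keeps synthetic points derived from a held-out sample out of the training data. This is the usual reason for this ordering when cross-validation is used to estimate generalization.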

Files

hypothyroid.csv

Files (807.8 kB)

Checksum Size
md5:3a35974cc59f5c524c3c173a0362dd87 517.7 kB
md5:b9c6997af786c1e98ed781fecf6bd30b 279.9 kB
md5:f4e803c33aa7688741e9f40680386a3b 2.3 kB
md5:465b082c9344c10eb35049fff6cf6e13 7.9 kB