Early thyroid risk prediction by data mining and ensemble classifiers
Creators
- 1. Computer Engineering Department, College of Engineering and Petroleum, Kuwait University, P.O. Box 5969, Kuwait City, 13060 Safat, Kuwait
Description
Thyroid disease is among the most prevalent endocrinopathies worldwide. Because the thyroid gland regulates human metabolism, thyroid illness is a significant concern for human health. To save time and reduce diagnostic error rates, an automatic, reliable, and accurate machine learning (ML) system for identifying thyroid disease is essential. The proposed model aims to address limitations of existing work: the lack of detailed feature analysis and visualization, and the need for better prediction accuracy and reliability. We used a public thyroid illness dataset containing 29 clinical features from the University of California, Irvine (UCI) Machine Learning Repository. These clinical features allow us to build an ML model that predicts thyroid illness by analyzing early symptoms, replacing manual analysis of these attributes. Feature analysis and visualization clarify the role of each feature in the thyroid prediction task. In addition, overfitting was mitigated with 5-fold cross-validation and by balancing the classes with the synthetic minority oversampling technique (SMOTE). Ensemble learning improves the reliability of the prediction model because multiple classifiers contribute to each prediction decision. With a boosting method, the proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity, making it applicable to real-time computer-aided diagnosis (CAD) systems to ease diagnosis and promote early treatment.
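The description combines three techniques: SMOTE balancing, 5-fold cross-validation, and a boosting classifier scored by accuracy, sensitivity, and specificity. The sketch below shows one way these pieces can fit together. It rests on several assumptions not stated in the record. The boosting algorithm is not named, so scikit-learn's `GradientBoostingClassifier` stands in for it. The SMOTE step is a hand-written minimal version (the helper names `smote` and `evaluate` are hypothetical) that interpolates between each minority sample and one of its `k` nearest minority neighbours. Oversampling is applied only inside each training fold, a common choice that keeps synthetic points out of the test folds. Synthetic data from `make_classification` with 29 features replaces the UCI thyroid dataset. No results from the original work are reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold


def smote(X, y, minority=1, k=5, rng=None):
    """Minimal SMOTE sketch (hypothetical helper): oversample `minority` until
    it matches the size of the remaining classes, by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority]
    n_needed = int((y != minority).sum() - len(X_min))
    if n_needed <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances among minority samples (self excluded).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    # Pick a random base sample and a random one of its k neighbours.
    base = rng.integers(0, len(X_min), size=n_needed)
    nb = neighbours[base, rng.integers(0, k, size=n_needed)]
    # New point = base + gap * (neighbour - base), with gap drawn from [0, 1).
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[nb] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)])
    return X_out, y_out


def evaluate(X, y, n_splits=5, seed=0):
    """Stratified k-fold CV; SMOTE is fitted on each training fold only.
    Returns mean accuracy, sensitivity (TP rate), specificity (TN rate)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_tr, y_tr = smote(X[train_idx], y[train_idx], rng=seed)
        clf = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
        y_pred = clf.predict(X[test_idx])
        tn, fp, fn, tp = confusion_matrix(
            y[test_idx], y_pred, labels=[0, 1]).ravel()
        scores.append({
            "accuracy": (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    return {m: float(np.mean([s[m] for s in scores])) for m in scores[0]}


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for the UCI thyroid data (29 features).
    X, y = make_classification(n_samples=1000, n_features=29, n_informative=8,
                               weights=[0.9, 0.1], random_state=0)
    print(evaluate(X, y))
```

Oversampling before the split would let interpolated copies of test-fold patients leak into training. That leakage can inflate cross-validated scores, which is why this sketch oversamples per fold.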