Published November 18, 2025 | Version v1
Journal article Open

New Horizons in Diabetes Prediction: Comparative Machine Learning Models Using Orange Data Mining

  • 1. Gaziantep City Hospital, Gaziantep, Türkiye

Description

Background: Diabetes mellitus remains a growing global health concern. Early prediction based on clinical and metabolic parameters may improve prevention and management strategies. This study aims to compare the performance of different supervised machine learning models for diabetes prediction using the Pima Diabetes dataset, implemented through the Orange Data Mining platform, a no-code visual analytics environment.
Methods: The Pima Indians Diabetes Dataset was originally developed by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) in the United States. It includes data collected from female patients of Pima Indian heritage, aged 21 years or older, living near Phoenix, Arizona. The dataset was analyzed in Orange; the workflow comprised data preprocessing (missing value imputation, normalization), stratified train/test splitting, and model training with cross-validation. Six supervised learning algorithms (Logistic Regression, Neural Network, Random Forest, Naïve Bayes, k-Nearest Neighbors, and AdaBoost) were compared. Model evaluation was based on ROC-AUC as the primary metric, along with PR-AUC, F1-score, sensitivity, specificity, and calibration metrics (Brier score and reliability plots).
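For readers who prefer code to a visual workflow, the preprocessing and splitting steps named in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Orange workflow: the function names, the choice of median imputation and min-max scaling, the 20% test fraction, and the toy values are all hypothetical.

```python
import random
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

def min_max_normalize(column):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in column]

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy data (hypothetical values, not taken from the dataset)
glucose = [148, 85, None, 89, 137, 116, 78, None, 197, 125]
labels = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]

glucose_clean = min_max_normalize(impute_median(glucose))
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
```

For brevity the sketch computes the imputation and scaling statistics on the full column. To avoid information leaking from the test set, they would normally be estimated on the training portion only and then applied to the test portion.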
Results: Among the six supervised models tested, Logistic Regression and Neural Network achieved the best overall performance, with AUC values of 0.835 and 0.816, respectively. Both models showed balanced accuracy and good calibration, while AdaBoost was the weakest performer (AUC = 0.655). The Calibration Plot confirmed that Logistic Regression provided the most reliable probability estimates, consistent with its lower Brier score.
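The metrics reported above can be computed from predicted probabilities with a few lines of code. The following is a minimal sketch using their standard definitions, not the Orange implementation: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made for illustration and are unrelated to the study's results.

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """True positive rate and true negative rate at a fixed threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Toy predictions (hypothetical, for illustration only)
y_true = [1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.3, 0.1]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
sens, spec = sensitivity_specificity(y_true, y_prob)
```

A lower Brier score means the predicted probabilities sit closer to the observed outcomes, which is why it is read alongside the calibration plot rather than alongside AUC, which depends only on the ranking of the predictions.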
Conclusions: Orange Data Mining enabled an easy and reproducible comparison of supervised learning algorithms for diabetes prediction. Logistic Regression and Neural Network models showed the most reliable and well-calibrated performance, indicating that accurate prediction can be achieved even in a no-code visual environment.

Files

29.pdf (1.9 MB)
md5:3aeac10eb9460f428e51ba821761c55a