Published June 6, 2026 | Version v1

Software Defect Prediction Using Machine Learning: A Comparative Analysis of Random Forest, Decision Tree, and XGBoost

Authors/Creators

Description

Software defect prediction (SDP) is a critical proactive
methodology in modern software engineering, allowing
development teams to allocate testing resources efficiently by
identifying fault-prone modules. However, the inherent class imbalance
in software repositories, where clean code vastly outnumbers
defective code, makes overall accuracy a misleading measure of
classifier reliability, a phenomenon known as the Accuracy
Paradox. This study presents a comparative analysis of three tree-based
machine learning architectures: a baseline Decision Tree,
a bagging ensemble (Random Forest), and a boosting ensemble
(XGBoost), to evaluate their robustness against imbalanced
static code metrics. Using the NASA MDP JM1 dataset, the
methodology applies the Synthetic Minority Over-sampling
Technique (SMOTE) to balance the training data. Performance
was evaluated using Accuracy, Precision, Recall, F1-Score, and
ROC-AUC. The empirical results confirm the Accuracy Paradox:
while XGBoost achieved the highest global accuracy (80.15%),
it exhibited the lowest recall (31.03%), missing more than two-thirds
of actual defects. Conversely, the Random Forest model proved
superior in navigating the noisy feature space, achieving the
highest F1-Score (0.4187) and ROC-AUC (0.7472). The findings
indicate that, for static software metric analysis, variance
reduction through bootstrap aggregating (bagging) yields more reliable
defect predictions than sequential error correction (boosting).
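
The Accuracy Paradox described above can be illustrated with a small worked example. The sketch below computes Accuracy, Precision, Recall, and F1-Score from confusion-matrix counts for two classifiers on an imbalanced test set. All counts are hypothetical and chosen only for illustration; they are not the study's results, and the function name `classification_metrics` is an assumption of this sketch rather than code from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp: defective modules correctly flagged
    fp: clean modules wrongly flagged
    fn: defective modules missed
    tn: clean modules correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 1000 modules, 200 of them defective (20%).

# Classifier A rarely flags a module, so it is right about most clean
# modules but misses 140 of the 200 defects.
conservative = classification_metrics(tp=60, fp=40, fn=140, tn=760)

# Classifier B flags more modules, raising false alarms but catching
# 120 of the 200 defects.
balanced = classification_metrics(tp=120, fp=160, fn=80, tn=640)

for name, m in (("conservative", conservative), ("balanced", balanced)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

In this illustration the conservative classifier has the higher accuracy (0.82 versus 0.76) but the lower recall (0.30 versus 0.60) and F1-Score (0.40 versus 0.50), which is the same pattern the abstract reports between XGBoost and Random Forest.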

Files

Software Defect Prediction Using Machine Learning A Comparative Analysis of Random Forest, Decision Tree, and XGBoost.pdf