Software Defect Prediction Using Machine Learning: A Comparative Analysis of Random Forest, Decision Tree, and XGBoost

JASANI, NAITIKKUMAR

doi:10.5281/zenodo.20573276

Published June 6, 2026 | Version v1

Dataset Open

Software defect prediction (SDP) is a critical proactive methodology in modern software engineering, allowing development teams to allocate testing resources efficiently by identifying fault-prone modules. However, the inherent class imbalance in software repositories, where clean code vastly outnumbers defective code, severely limits the reliability of traditional classification models, a phenomenon known as the Accuracy Paradox. This study presents a comparative analysis of three tree-based machine learning architectures, namely a baseline Decision Tree, a bagging ensemble (Random Forest), and a boosting ensemble (XGBoost), to evaluate their robustness against imbalanced static code metrics. Using the NASA MDP JM1 dataset, the methodology applies the Synthetic Minority Over-sampling Technique (SMOTE) to balance the training data. Performance was evaluated using Accuracy, Precision, Recall, F1-Score, and ROC-AUC. The empirical results confirm the Accuracy Paradox: while XGBoost achieved the highest overall accuracy (80.15%), it exhibited the lowest recall (31.03%), missing a large share of actual defects. Conversely, the Random Forest model proved superior in navigating the noisy feature space, achieving the highest F1-Score (0.4187) and ROC-AUC (0.7472). The findings demonstrate that, for static software metric analysis, variance reduction through bootstrap aggregating provides a more reliable predictive threshold than sequential error correction.
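The Accuracy Paradox described in the abstract can be illustrated with a small worked example. The sketch below is not the author's code and does not use the JM1 data: the confusion-matrix counts are hypothetical, chosen only to mimic a set of 1,000 modules with a 20% defect rate, and the names `Confusion`, `metrics`, `always_clean`, `cautious`, and `balanced` are introduced here for illustration. It computes the standard definitions of accuracy, precision, recall, and F1-Score from raw counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, with 'defective' as the positive class."""
    tp: int  # defective modules correctly flagged
    fp: int  # clean modules wrongly flagged
    fn: int  # defective modules missed
    tn: int  # clean modules correctly passed


def metrics(c: Confusion) -> dict:
    """Return accuracy, precision, recall and F1 for one confusion matrix."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    # Guard against division by zero when nothing is predicted or present.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 1,000 modules, 200 defective, 800 clean.
# These counts are assumptions for illustration, not results from the study.
always_clean = Confusion(tp=0, fp=0, fn=200, tn=800)    # predicts "clean" for everything
cautious = Confusion(tp=60, fp=40, fn=140, tn=760)      # flags few modules
balanced = Confusion(tp=120, fp=200, fn=80, tn=600)     # flags many modules

for name, cm in [("always_clean", always_clean),
                 ("cautious", cautious),
                 ("balanced", balanced)]:
    m = metrics(cm)
    print(f"{name:13s} " + "  ".join(f"{k}={v:.4f}" for k, v in m.items()))
```

In this toy setting the classifier that never flags a defect still reaches 80% accuracy, and the "cautious" classifier has the highest accuracy of the three while finding fewer than a third of the defects. The "balanced" classifier scores lower on accuracy but higher on recall and F1-Score, which is the same pattern the abstract reports when comparing XGBoost with Random Forest.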

Files (340.0 kB)

Name	MD5	Size
defect_prediction_research.rar	8288fe5a0b74cd85e37b170ca8572c0d	242.9 kB
Software Defect Prediction Using Machine Learning A Comparative Analysis of Random Forest, Decision Tree, and XGBoost.pdf	4cfdca57299aeec4b45ddd6029792085	97.1 kB

	All versions	This version
Views	5	5
Downloads	1	1
Data volume	194.2 kB	194.2 kB
