Published July 30, 2025 | Version CC-BY-NC-ND 4.0
Journal article Open

A Comparative Analysis of Techniques, Datasets, Feature Selection Methods, and Evaluation Metrics in Software Fault Prediction

  • 1. Research Scholar, Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib (Punjab), India, and Assistant Professor, Department of Computer Applications, Chandigarh Business School of Administration, Landran, Mohali (Punjab), India.

Contributors

Contact person:

Researcher:

  • 1. Research Scholar, Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib (Punjab), India, and Assistant Professor, Department of Computer Applications, Chandigarh Business School of Administration, Landran, Mohali (Punjab), India.
  • 2. Assistant Professor, Department of Computer Science, Sri Guru Granth Sahib World University, Fatehgarh Sahib (Punjab), India.

Description

Abstract: This study presents a systematic literature review (SLR) of recent advances in Software Fault Prediction (SFP) methodologies. The review covers five dimensions: techniques, datasets, feature selection methods, software metrics, and evaluation criteria. Five research questions were defined to guide the assessment of current trends in SFP research, and they were answered by analyzing significant studies from digital libraries such as ACM, IEEE, SpringerLink, and ScienceDirect. The findings show that machine learning approaches, particularly neural networks, deep learning, and ensemble methods, are increasingly employed because they can manage the complexity of software fault data. Public datasets, notably those from the PROMISE and NASA MDP repositories, are widely used, which underlines the importance of dataset diversity for model performance. Feature selection methods, particularly wrapper techniques, are often employed to improve predictive accuracy. Model evaluation relies predominantly on confusion-matrix-based metrics such as Accuracy, Precision, Recall, and F1-Score. Despite these advances, challenges remain in addressing class imbalance, adapting to rapidly evolving software environments, and achieving real-time fault prediction. The study highlights the need for greater classifier diversity and continued methodological improvement to make SFP models more robust and generalizable.
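
As an illustration of the confusion-matrix-based metrics named in the abstract, the following minimal Python sketch computes Accuracy, Precision, Recall, and F1-Score from raw counts. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the reviewed studies; the formulas are the standard definitions, with "faulty" treated as the positive class.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return standard confusion-matrix metrics for a binary fault predictor.

    tp: faulty modules predicted faulty
    fp: clean modules predicted faulty
    fn: faulty modules predicted clean
    tn: clean modules predicted clean
    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced dataset: 12 faulty modules out of 100.
    metrics = confusion_metrics(tp=8, fp=2, fn=4, tn=86)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, Accuracy is 0.94 while Recall is only about 0.67, which shows why Accuracy alone can be misleading on imbalanced fault data, one of the challenges the abstract mentions.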

Files

B828014020725.pdf

Files (1.0 MB)

md5:f708b48bc1dd51cbfb663eb54d828050

Additional details

Dates

Accepted
2025-07-30
Manuscript received on 12 June 2025 | First Revised Manuscript received on 05 July 2025 | Second Revised Manuscript received on 10 July 2025 | Manuscript Accepted on 15 July 2025 | Manuscript published on 30 July 2025.
