Machine Learning-Based Email Spam Detection: Accuracy, Overfitting and Robustness Analysis

Fareed, Bushra; Din, Ghulam Muhayyu; Khan, Muhammad Rehan Ahmed; Fatima, Tehreem; Shahid, Shifa

doi:10.59324/ejaset.2025.3(6).06

Published November 2, 2025 | Version v1

Journal article | Open access

1. Institute of Computer & Software Engineering, Khwaja Fareed University of Engineering and Information Technology, Pakistan
2. Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Pakistan
3. Department of Computer Science, University of Wah, Wah Cantt, Pakistan
4. Department of Computer Science, The Islamia University of Bahawalpur, Pakistan

This study evaluates classic and modern machine-learning methods for email spam detection using term frequency-inverse document frequency (TF-IDF) features and a public dataset of ham (legitimate) and spam emails. Nineteen classifiers were trained and compared on accuracy, precision, recall, F1 score, and variance-based stability. Although several models (e.g., Gradient Boosting, Ridge Classifier CV, Bernoulli Naive Bayes) achieved high test accuracy, robustness analysis shows that Random Forest and Logistic Regression with cross-validation deliver steadier performance and less overfitting. Standard-deviation results and train-test gaps expose the high variance of single decision trees and highlight the practical value of ensembles and regularized linear models. The work underscores that deployment choices should favor consistent, generalizable behavior over peak scores alone.
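The TF-IDF features mentioned in the abstract weight each term by how often it occurs in an email and how rare it is across the whole corpus. The sketch below is a minimal, dependency-free illustration, not the authors' pipeline. It assumes a simplified whitespace tokenizer, raw term counts, the smoothed IDF formula documented for scikit-learn's `TfidfVectorizer` (idf(t) = ln((1 + n) / (1 + df(t))) + 1), and L2 normalization of each vector. The function names and the toy emails are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one L2-normalized sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, always positive and never undefined.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency within this document
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Scale to unit length so long and short emails are comparable.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


if __name__ == "__main__":
    # Hypothetical toy emails, not drawn from the paper's dataset.
    emails = [
        "win a free prize now",
        "meeting moved to friday",
        "free prize claim now",
    ]
    for vec in tfidf(emails):
        # Show the three highest-weighted terms per email.
        print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

In the toy corpus, "win" occurs in only one email while "free" occurs in two, so "win" receives the larger weight in the first vector. In practice, `sklearn.feature_extraction.text.TfidfVectorizer` performs the same weighting with a more careful default tokenizer.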
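The abstract compares models on accuracy, precision, recall, and F1, and judges robustness by the standard deviation of scores and the gap between training and test performance. The following sketch shows how those quantities can be computed. It assumes binary labels with spam as the positive class (encoded as 1) and per-fold scores from k-fold cross-validation. The paper's exact protocol (number of folds, which metric the standard deviation is taken over) is not stated here, and every function name and number below is a hypothetical illustration rather than a reported result.

```python
import statistics


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary task (spam = positive class, an assumption)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # spam correctly flagged
            else:
                fp += 1  # ham wrongly flagged as spam
        elif t == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # ham correctly passed
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def stability_report(train_scores, test_scores):
    """Summarize per-fold scores: test mean, test standard deviation, train-test gap."""
    return {
        "test_mean": statistics.mean(test_scores),
        # Sample standard deviation across folds; requires at least two folds.
        "test_std": statistics.stdev(test_scores),
        # A large positive gap suggests the model fits training data it cannot generalize from.
        "gap": statistics.mean(train_scores) - statistics.mean(test_scores),
    }


if __name__ == "__main__":
    # Hypothetical labels and predictions (1 = spam, 0 = ham).
    print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
    # Hypothetical per-fold accuracies for a model that memorizes its training folds.
    print(stability_report([1.0, 1.0, 1.0], [0.90, 0.94, 0.92]))
```

A model with perfect training scores but a noticeable gap and a larger fold-to-fold standard deviation is the pattern the abstract attributes to single decision trees. With scikit-learn, `sklearn.metrics.precision_recall_fscore_support` and `sklearn.model_selection.cross_validate(..., return_train_score=True)` provide the same quantities.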

Files (858.3 kB)

377-Article Text-664-1-10-20251102.pdf (858.3 kB, md5:85fc709cf46e8f54c30df2889ceeb0a5)