Innovative Spam Detection Using Hybrid Machine Learning Algorithms: A Data-Centric Approach

Sree Vidya Venigalla

doi:10.35940/ijitee.D8310.14121125

Published November 30, 2025 | Version CC-BY-NC-ND 4.0

Journal article | Open access


Sree Vidya Venigalla (Contact person)¹

Contributors

Contact person: Sree Vidya Venigalla¹
Researcher: Dr. K.V.D. Kiran²

1. Student, Department of Computer Science Engineering, Koneru Lakshmaiah Educational Foundation, Vijayawada (Andhra Pradesh), India.
2. Professor, Department of Computer Science Engineering, Koneru Lakshmaiah Educational Foundation, Vijayawada (Andhra Pradesh), India.

Abstract: The growing volume of spam, which carries malware, phishing attacks, and other unsolicited content, poses a serious threat to internet users and security infrastructure. Conventional filters that rely solely on fixed rules and keyword lists struggle to keep pace with contemporary spammer tactics that disguise malicious content. This study addresses the problem with a hybrid machine learning method that combines Naive Bayes (NB) and a Support Vector Machine (SVM) in an ensemble to improve the accuracy and robustness of spam detection. The method is applied to the well-known SMS Spam Collection Dataset. It extracts textual features with TF-IDF (term frequency-inverse document frequency) and adds non-textual features such as message length, word capitalisation, and the frequency of predefined keywords. The system is evaluated with standard classification metrics (accuracy, precision, recall, and F1 score) to assess its reliability and validity. The results indicate that the hybrid ensemble reduces false positives while coping better with the challenges of real-world spam data. The system is also computationally efficient enough for most real-time deployments in automated anti-spam systems. This research thus contributes a scalable, adaptive spam-detection mechanism suited to real-time messaging environments.
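The pipeline described in the abstract (TF-IDF text features plus message length, capitalisation, and keyword frequency, fed to NB and SVM classifiers whose outputs are combined) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Every one of the following is a hypothetical choice, as are all class and function names: the keyword list `SPAM_KEYWORDS`, the 200-character length cap, Laplace smoothing with `alpha=1`, a Pegasos subgradient solver standing in for a library SVM, giving NB only the TF-IDF vector, and an equal-weight soft vote. That soft vote takes the mean of the NB posterior and a plain sigmoid of the SVM score, an uncalibrated stand-in for Platt scaling. The `evaluate` helper computes the four metrics the abstract names.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Hypothetical keyword list for illustration; the paper's actual list is not given here.
SPAM_KEYWORDS = {"free", "win", "winner", "prize", "claim", "urgent", "cash"}

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

class Tfidf:
    """Smoothed TF-IDF, idf = ln((1 + n) / (1 + df)) + 1, as L2-normalised sparse dicts."""
    def fit(self, docs):
        df = Counter(t for d in docs for t in set(tokenize(d)))
        self.idf = {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in df.items()}
        return self
    def transform(self, doc):
        tf = Counter(t for t in tokenize(doc) if t in self.idf)  # unseen terms are dropped
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

def extra_features(text):
    """Non-textual features: capped length, uppercase-letter ratio, keyword count, bias."""
    letters = [ch for ch in text if ch.isalpha()]
    caps = sum(ch.isupper() for ch in letters) / max(len(letters), 1)
    keywords = sum(t in SPAM_KEYWORDS for t in tokenize(text))
    return {"feat:length": min(len(text), 200) / 200.0, "feat:caps": caps,
            "feat:keywords": float(keywords), "feat:bias": 1.0}

class MultinomialNB:
    """Multinomial NB over TF-IDF weights, Laplace smoothing; labels 0/1, both present."""
    def fit(self, X, y, alpha=1.0):
        vocab = {t for x in X for t in x}
        totals = {0: defaultdict(float), 1: defaultdict(float)}
        for x, label in zip(X, y):
            for t, v in x.items():
                totals[label][t] += v
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in (0, 1)}
        self.log_lik = {}
        for c in (0, 1):
            denom = sum(totals[c].values()) + alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + alpha) / denom) for t in vocab}
        return self
    def prob_spam(self, x):
        s = [self.log_prior[c] + sum(v * self.log_lik[c].get(t, 0.0) for t, v in x.items())
             for c in (0, 1)]
        m = max(s)  # log-sum-exp shift for numerical stability
        return math.exp(s[1] - m) / (math.exp(s[0] - m) + math.exp(s[1] - m))

class LinearSVM:
    """Linear SVM (hinge loss) trained with the Pegasos stochastic subgradient method."""
    def fit(self, X, y, lam=0.01, epochs=50, seed=0):
        rng = random.Random(seed)
        data = [(x, 1.0 if label == 1 else -1.0) for x, label in zip(X, y)]
        self.w, step = defaultdict(float), 0
        for _ in range(epochs):
            rng.shuffle(data)
            for x, target in data:
                step += 1
                eta = 1.0 / (lam * step)
                violated = target * self.score(x) < 1.0  # margin checked before the update
                for t in self.w:
                    self.w[t] *= 1.0 - eta * lam  # L2 shrinkage
                if violated:
                    for t, v in x.items():
                        self.w[t] += eta * target * v
        return self
    def score(self, x):
        return sum(self.w.get(t, 0.0) * v for t, v in x.items())

class HybridSpamFilter:
    """Soft vote: mean of the NB posterior and a sigmoid of the SVM decision score."""
    def fit(self, texts, labels):
        self.tfidf = Tfidf().fit(texts)
        X_text = [self.tfidf.transform(t) for t in texts]
        X_full = [{**xt, **extra_features(t)} for xt, t in zip(X_text, texts)]
        self.nb = MultinomialNB().fit(X_text, labels)  # NB sees TF-IDF only
        self.svm = LinearSVM().fit(X_full, labels)  # SVM sees TF-IDF + extra features
        return self
    def prob_spam(self, text):
        xt = self.tfidf.transform(text)
        s = max(-30.0, min(30.0, self.svm.score({**xt, **extra_features(text)})))
        return 0.5 * (self.nb.prob_spam(xt) + 1.0 / (1.0 + math.exp(-s)))
    def predict(self, text):
        return int(self.prob_spam(text) >= 0.5)

def evaluate(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": sum(t == p for t, p in pairs) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}
```

As a usage sketch, `HybridSpamFilter().fit(texts, labels)` trains on a list of messages with a list of 0/1 labels in which both classes appear. After training, `predict(message)` returns 1 (spam) or 0 (ham), and `evaluate(y_true, y_pred)` scores a held-out split, for instance one drawn from the SMS Spam Collection. Averaging probabilities rather than taking a hard vote avoids the ties that a two-member majority vote cannot break.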

Files

D831014041125.pdf (497.1 kB, md5:0b6d4cff47ba92a529d50bbf48587863)

Additional details

DOI: 10.35940/ijitee.D8310.14121125
EISSN: 2278-3075

Accepted: 2025-11-15

Manuscript received on 28 October 2025 | First Revised Manuscript received on 06 November 2025 | Second Revised Manuscript received on 11 November 2025 | Manuscript Accepted on 15 November 2025 | Manuscript published on 30 November 2025.


