A Comparative Analysis of Machine Learning and Deep Learning Methods for Spam Detection in YouTube Comments

Utku, Anıl

doi:10.5281/zenodo.15719417

Published June 23, 2025 | Version v1

Journal article Open

Utku, Anıl (Producer)

Because spam content threatens information security on social media platforms and manual detection methods fall short, developing automated spam detection systems is of great importance. Machine learning and deep learning techniques offer major advantages in classifying spam comments, because they take contextual relationships and the meaning of language into account rather than relying on keywords alone. This study presents a comparative analysis of different machine learning and deep learning models for automatically detecting spam in YouTube comments. Comprehensive analyses were carried out to detect spam comments using LR, RF, SVM, XGBoost, Bi-LSTM, and BERT. The texts were converted to numerical form with the TF-IDF vectorization method, producing a data representation suitable for training the models. The experimental results showed that BERT, owing to its ability to learn long-range dependencies in text data, outperformed the other models in the comparison with a classification accuracy of 97.7%.
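The TF-IDF vectorization step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the study's implementation: the whitespace tokenizer, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, the L2 normalization, the function names, and the toy comments are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also handle punctuation.
    return text.lower().split()


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every term seen."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Return an L2-normalized sparse TF-IDF vector as a dict of term -> weight."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical toy comments, not taken from the dataset used in the study.
comments = [
    "check out my channel and subscribe",
    "great song love the video",
    "subscribe to my channel for free gifts",
    "this video made my day",
]
idf = fit_idf(comments)
vec = tfidf_vector(comments[0], idf)
```

In this sketch, terms that occur in few comments (such as "check") receive a larger weight than terms spread across many comments (such as "my"), which is the property that lets a downstream classifier focus on distinctive words.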

Files

Name	Size
3.pdf (md5:10e81503238bc973a44f5e6aaf7fffdc)	611.1 kB

Additional details

Submitted: 2025-03-06
Accepted: 2025-05-29


