Published October 1, 2020 | Version v1
Journal article Open

Indonesian language email spam detection using N-gram and Naïve Bayes algorithm

  • 1. Universitas Multimedia Nusantara

Description

Indonesia ranks eighth among all countries in the world for global spammers. A web-based spam filter service exposed as a REST API can be used to detect Indonesian-language email spam on an email server or in various email server applications. Through the REST API, applications exchange data in JSON format using existing HTTP commands. One commonly used type of spam filter is Bayesian filtering, in which the Naïve Bayes algorithm serves as the classification algorithm. In this study, the N-gram method is used to increase the accuracy of the Naïve Bayes implementation. N-gram and Naïve Bayes algorithms for detecting Indonesian-language spam email were successfully implemented, with accuracy ranging from 0.615 to 0.94, precision from 0.566 to 0.924, recall from 0.96 to 1.00, and F-measure from 0.721 to 0.942. The best result was obtained with the 5-gram method, which achieved the highest accuracy of 0.94, with precision of 0.924, recall of 0.96, and F-measure of 0.942.
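
The combination of N-gram features with a Naïve Bayes classifier described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes word-level N-grams, a multinomial Naïve Bayes model with Laplace smoothing, and a tiny invented training set, none of which are specified in the description. The names `ngrams`, `NgramNaiveBayes`, and `f_measure` are hypothetical. The last function uses the standard F-measure formula (harmonic mean of precision and recall), which is consistent with the reported 5-gram figures.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Split text into lowercase word-level n-grams (an assumption;
    the description does not say whether word or character n-grams are used)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with Laplace smoothing."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                    # n-gram size
        self.alpha = alpha            # smoothing constant (assumed value)
        self.counts = {}              # label -> Counter of n-gram counts
        self.doc_counts = Counter()   # label -> number of training messages
        self.vocab = set()            # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(gram_counts.values()) + self.alpha * len(self.vocab)
            for gram in ngrams(text, self.n):
                score += math.log((gram_counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Invented toy training data, for illustration only.
texts = [
    "menang hadiah uang tunai sekarang",
    "klik link ini untuk hadiah gratis",
    "rapat tim besok pagi jam sembilan",
    "tolong kirim laporan proyek hari ini",
]
labels = ["spam", "spam", "ham", "ham"]

model = NgramNaiveBayes(n=2).fit(texts, labels)
```

With the toy data above, `model.predict("hadiah uang tunai gratis")` returns `"spam"`, and `round(f_measure(0.924, 0.96), 3)` gives `0.942`, matching the F-measure reported for the 5-gram method. Bigrams are used here only because the toy messages are short; the study reports its best results with 5-grams.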

Files

33-2444.pdf (270.0 kB)
md5:9ddc81718c93956c879116aa94d987e5