Performance analysis in text clustering using k-means and k-medoids algorithms for Malay crime documents

doi:10.11591/ijece.v12i5.pp5014-5026

Published October 1, 2022 | Version v1

Journal article Open

1. Universiti Malaysia Terengganu
2. Universiti Kebangsaan Malaysia

Few studies on text clustering for the Malay language have been conducted, owing to limitations that remain to be addressed. This article compares two clustering algorithms, k-means and k-medoids, using Euclidean distance as the dissimilarity measure, to determine which is better suited to clustering documents. Both algorithms are applied to 1,000 documents on housebreaking crimes involving a variety of modus operandi. The comparison indicates that k-means performed best at clustering the relevant documents, with an accuracy of 78%. K-means also outperforms k-medoids in cluster evaluation when measured by average within-cluster distance. However, k-medoids performs exceptionally well on the Davies-Bouldin index (DBI). Furthermore, the accuracy of k-means depends on the initial number of clusters; an appropriate number can be determined using the elbow method.
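The comparison described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the six 2-D points stand in for document vectors, the farthest-first seeding and the alternating medoid update (rather than full PAM swapping) are assumptions made to keep the example short and deterministic, and all function names are hypothetical. It clusters the same data with both update rules under Euclidean distance, then reports the average within-cluster distance, the Davies-Bouldin index, and the values an elbow plot would use.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points]

def mean(members):
    """k-means update: coordinate-wise mean of the cluster."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def medoid(members):
    """k-medoids update: the member with the smallest total distance to the others."""
    return min(members, key=lambda c: sum(euclidean(c, p) for p in members))

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch, not from the article)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    return centers

def cluster(points, k, update, max_iter=100):
    """Alternate assignment and center update until the centers stop changing."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(update(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def avg_within_distance(points, labels, centers):
    """Mean distance from each point to the center of its own cluster."""
    return sum(euclidean(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index (lower is better); requires at least two clusters."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(euclidean(p, centers[j]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

# Toy data: two well-separated groups standing in for document vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

km_labels, km_centers = cluster(points, 2, mean)
kmed_labels, kmed_centers = cluster(points, 2, medoid)
for name, labels, centers in [("k-means", km_labels, km_centers),
                              ("k-medoids", kmed_labels, kmed_centers)]:
    print(name,
          "avg within-cluster distance:", avg_within_distance(points, labels, centers),
          "DBI:", davies_bouldin(points, labels, centers))

# Elbow method: the within-cluster distance for k = 1..4; the "elbow" is the
# value of k after which further increases bring only small improvements.
elbow = [avg_within_distance(points, *cluster(points, k, mean)) for k in range(1, 5)]
print("elbow values:", elbow)
```

On this toy data both algorithms recover the same two groups, and the largest drop in the elbow values occurs between k = 1 and k = 2. Real document collections would first need a vector representation (for example, term weights), which this sketch does not cover, and the relative ranking of the two algorithms on each metric will depend on the data.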

Files

49 1570743644 27817 EMr 19apr22 16Jul21 K.pdf (634.6 kB, md5:73b179f3a7ef25cf96f51c57c232453f)
