Published March 15, 2025 | Version v1
Journal article | Open access

A MODIFIED K-MEANS APPROACH FOR EFFECTIVE CLUSTERING USING WEIGHTED ADJACENT MATRIX

Description

K-means clustering has several limitations, including sensitivity to the choice of initial centroids and the difficulty of determining the number of clusters in advance. It is also sensitive to outliers, struggles to identify clusters with irregular shapes or varying sizes, and cannot easily handle categorical data directly. Because the initial centroids are chosen at random, outcomes are unstable, and this unpredictable initialization makes clustering results difficult to reproduce.
This study aims to improve existing k-means clustering algorithms and proposes two distinct proximity matrices for this purpose. Spectral clustering begins by constructing a similarity matrix and then applies eigenvalue decomposition to the Laplacian matrix derived from it, which yields a spectral representation of the data. However, an optimal clustering outcome cannot be guaranteed at the initial stage of the spectral clustering algorithm, where the similarity matrix is built. To address this issue, the study recommends an Initialization & Similarity approach, in which the representation and the similarity matrix are determined jointly, and it further improves clustering performance with sum-of-norms regularization.
Measured by normalized mutual information (NMI), purity, and accuracy, the proposed technique outperforms the original k-means algorithm and other traditional clustering algorithms. By integrating a weighted adjacent matrix into k-means, this novel approach significantly enhances clustering accuracy and handles high-dimensional data effectively.
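For orientation, the standard spectral clustering pipeline can be sketched as follows: build a weighted adjacency (similarity) matrix, form the symmetric normalized Laplacian, take its eigenvectors for the smallest eigenvalues as the spectral representation, and run k-means on that representation. This is a minimal sketch of the textbook normalized (Ng-Jordan-Weiss style) variant, not the authors' KM-AM, KM-WAM, or Initialization & Similarity method. The Gaussian kernel, its width `sigma`, and all function names are illustrative assumptions. The `seed` argument fixes the random centroid initialization, which is the reproducibility problem described above.

```python
import numpy as np


def gaussian_adjacency(X, sigma=1.0):
    """Assumed weighting: W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def spectral_embedding(W, k):
    """Row-normalized eigenvectors of L_sym = I - D^{-1/2} W D^{-1/2} for the k smallest eigenvalues."""
    d = W.sum(axis=1)  # node degrees; assumed strictly positive
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues returned in ascending order
    U = eigvecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(Z, k, seed=0, n_iter=100):
    """Plain Lloyd's k-means; initial centroids are random rows of Z, chosen with a fixed seed."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def spectral_kmeans(X, k, sigma=1.0, seed=0):
    W = gaussian_adjacency(X, sigma)
    Z = spectral_embedding(W, k)
    return kmeans(Z, k, seed=seed)


# Usage: two well-separated Gaussian blobs of 20 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels = spectral_kmeans(X, k=2, sigma=1.0, seed=0)
print(labels)
```

Running k-means on the spectral representation rather than on the raw coordinates is what lets spectral clustering recover non-convex clusters. Calling `spectral_kmeans` again with the same `seed` returns identical labels.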
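The description does not define the sum-of-norms term. In its standard form (sum-of-norms, or convex, clustering), each data point x_i in R^d gets its own centroid u_i, and a penalty on pairwise centroid differences pulls centroids together. How the study weights this term or couples it to its learned similarity matrix is not stated here, so the nonnegative weights w_ij and the parameter λ below are assumptions:

```latex
\min_{u_1, \dots, u_n \in \mathbb{R}^{d}} \;
  \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \lambda \sum_{i<j} w_{ij} \, \lVert u_i - u_j \rVert_2
```

With λ = 0 every point is its own centroid (u_i = x_i). As λ grows, the non-squared norm drives more differences u_i - u_j exactly to zero, and points that share the same optimal centroid form one cluster. Because the objective is convex, the result does not depend on a random initialization.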
The proposed methods, KM-AM and KM-WAM, achieve better scores on metrics such as normalized mutual information, accuracy, and purity, offering a more efficient and robust solution for a range of data analysis applications.
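The three evaluation metrics can be computed from the contingency table of true classes against predicted clusters. The sketch below shows one common definition of each. Purity credits every cluster with its majority class. Accuracy uses the best one-to-one matching between clusters and classes, found here by brute force, which is practical only for a few clusters. NMI is normalized by the geometric mean of the two entropies. The description does not say which NMI normalization the study uses, so that choice, like the function names, is an assumption.

```python
import itertools

import numpy as np


def contingency(y_true, y_pred):
    """table[i, j] = number of points with true class i assigned to cluster j."""
    _, y = np.unique(y_true, return_inverse=True)
    _, c = np.unique(y_pred, return_inverse=True)
    table = np.zeros((y.max() + 1, c.max() + 1), dtype=int)
    np.add.at(table, (y, c), 1)
    return table


def purity(y_true, y_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    table = contingency(y_true, y_pred)
    return table.max(axis=0).sum() / table.sum()


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping (brute force)."""
    table = contingency(y_true, y_pred)
    if table.shape[1] > table.shape[0]:
        table = table.T  # match the smaller label set into the larger one
    rows, cols = table.shape
    best = max(
        sum(table[perm[j], j] for j in range(cols))
        for perm in itertools.permutations(range(rows), cols)
    )
    return best / table.sum()


def nmi(y_true, y_pred):
    """NMI = I(Y; C) / sqrt(H(Y) * H(C)), using natural logarithms."""
    p_joint = contingency(y_true, y_pred) / len(y_true)
    p_y = p_joint.sum(axis=1)
    p_c = p_joint.sum(axis=0)
    nz = p_joint > 0
    mutual_info = np.sum(p_joint[nz] * np.log(p_joint[nz] / np.outer(p_y, p_c)[nz]))
    h_y = -np.sum(p_y * np.log(p_y))
    h_c = -np.sum(p_c * np.log(p_c))
    if h_y == 0 or h_c == 0:
        # Convention: a single class and a single cluster agree perfectly.
        return 1.0 if h_y == h_c else 0.0
    return mutual_info / np.sqrt(h_y * h_c)


# Usage: six points, two classes; one point of class 0 is misplaced.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]
print(purity(y_true, y_pred))               # 5/6
print(clustering_accuracy(y_true, y_pred))  # 5/6
print(nmi(y_true, y_pred))
```

All three metrics ignore how the cluster ids are numbered, so the labeling `[1, 1, 0, 0]` scores 1.0 against the classes `[0, 0, 1, 1]`.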

Files

24Vol103No5.pdf

967.9 kB | md5:044d1179c1e19a8ba05c64ccbdb9530e