A MODIFIED K-MEANS APPROACH FOR EFFECTIVE CLUSTERING USING WEIGHTED ADJACENT MATRIX
Description
K-means clustering has several well-known limitations. It is sensitive to centroid initialization and requires the number of clusters to be determined; it is sensitive to outliers; it struggles to identify clusters with irregular shapes or varying sizes; and it cannot easily handle categorical data directly. Because centroids are chosen at random, outcomes are unstable and clustering results are difficult to replicate. Spectral clustering has a related weakness. It begins by creating a similarity matrix and then applies eigenvalue decomposition to the Laplacian matrix to obtain a spectral representation, but this initial stage cannot guarantee optimal clustering outcomes. To address these issues, this study presents methods to improve existing k-means clustering algorithms, built on two distinct proximity matrices. It recommends an Initialization & Similarity approach in which the representation and the similarity matrix are determined in a cohesive manner, and it uses sum-of-norms regularization to improve clustering performance further. Measured by normalized mutual information, purity, and accuracy, the proposed technique performs better than the original k-means algorithm and other traditional clustering methods. By integrating a weighted adjacent matrix into K-Means, the approach enhances clustering accuracy and handles high-dimensional data effectively.
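For orientation, the conventional spectral clustering pipeline mentioned in the description (similarity matrix, Laplacian, eigenvalue decomposition, then k-means on the spectral representation) can be sketched as follows. This is a minimal illustration of the baseline only, not the method proposed in the study. The Gaussian similarity, the bandwidth `sigma`, the symmetric normalized Laplacian, and the function names `spectral_embedding` and `kmeans` are all assumptions made for the example.

```python
import numpy as np


def spectral_embedding(X, k, sigma=1.0):
    """Return an n x k spectral representation of the rows of X.

    Assumes a Gaussian similarity with bandwidth sigma and that every
    point has non-zero total similarity to the others.
    """
    # Pairwise squared Euclidean distances between all rows.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Similarity matrix with self-similarity removed.
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Normalize each row to unit length before clustering.
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain k-means with randomly chosen initial centroids (seeded)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        # Move each non-empty centroid to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

Calling `kmeans(spectral_embedding(X, k), k)` gives one label per row of `X`. The random choice of initial centroids in `kmeans` is exactly the step the description identifies as a source of unstable results; changing `seed` can change the outcome on harder data.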
The proposed methods, KM-AM and KM-WAM, improve normalized mutual information, accuracy, and purity, offering a more efficient and robust solution for a range of data analysis applications.
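The three evaluation measures named above can be computed from a list of ground-truth classes and a list of predicted cluster labels. The sketch below uses common textbook definitions, which may differ in detail from those used in the paper: NMI is normalized here by the geometric mean of the two entropies, and accuracy is taken as the best agreement over one-to-one relabelings of clusters. The function names are illustrative, and the brute-force matching is only practical for a small number of clusters.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def purity(y_true, y_pred):
    """Fraction of points that fall in the majority class of their cluster."""
    total = 0
    for cluster in set(y_pred):
        members = [t for t, p in zip(y_true, y_pred) if p == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(y_true)


def nmi(y_true, y_pred):
    """Mutual information divided by the geometric mean of the entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    pt, pp = Counter(y_true), Counter(y_pred)
    mi = sum(c / n * log(c * n / (pt[t] * pp[p])) for (t, p), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in pt.values())
    h_pred = -sum(c / n * log(c / n) for c in pp.values())
    # Assumed convention: return 0 when either labeling has a single group.
    if h_true == 0 or h_pred == 0:
        return 0.0
    return mi / sqrt(h_true * h_pred)


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one cluster-to-class assignments."""
    classes, clusters = list(set(y_true)), list(set(y_pred))
    # Pad with None so surplus clusters can be left unmatched.
    pool = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(pool, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)
```

All three functions return values in the range 0 to 1, with higher values indicating closer agreement with the ground truth. None of them depends on the particular names given to the clusters, so a clustering that matches the classes up to relabeling scores 1 on each.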
Files
| Name | Size |
|---|---|
| 24Vol103No5.pdf (md5:044d1179c1e19a8ba05c64ccbdb9530e) | 967.9 kB |