memgonzales/meta-learning-clustering: Distance Metric Recommendation for k-Means via Meta-Learning
- De La Salle University
Description
This is the repository for the paper "Distance Metric Recommendation for k-Means Clustering: A Meta-Learning Approach," presented at the 2022 IEEE Region 10 Conference (TENCON 2022). The project page is https://github.com/memgonzales/meta-learning-clustering. The paper is available at IEEE Xplore: https://ieeexplore.ieee.org/abstract/document/9978037.
The choice of distance metric impacts the clustering quality of centroid-based algorithms such as k-means. Theoretical attempts to select the optimal metric entail deep domain knowledge, while experimental approaches are resource-intensive. This paper presents a meta-learning approach to automatically recommend a distance metric for k-means clustering that optimizes the Davies-Bouldin score. Three distance measures were considered: Chebyshev, Euclidean, and Manhattan. General, statistical, information-theoretic, structural, and complexity meta-features were extracted, and a random forest was used to construct the meta-learning model; borderline SMOTE was applied to address class imbalance. The model registered an accuracy of 70.59%. An analysis using Shapley additive explanations (SHAP) found that the mean of the sparsity of the attributes has the highest meta-feature importance. Feeding only the top 25 most important meta-features increased the accuracy to 71.57%. The main contribution of this paper is twofold: the construction of a meta-learning model for distance metric recommendation and a fine-grained analysis of the importance and effects of the meta-features on the model's output.
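The recommendation target can be illustrated by labeling a single dataset with its best metric. The sketch below runs k-means once per distance measure and keeps the metric whose clustering yields the lowest (best) Davies-Bouldin score. It uses only the standard library. Several details are assumptions and are not taken from the paper or the repository:

- Centroids are updated as arithmetic means under every metric.
- Initialization uses deterministic farthest-first seeding.
- The Davies-Bouldin score is computed with Euclidean distances, as in scikit-learn's `davies_bouldin_score`.
- The helper names (`kmeans`, `davies_bouldin`, `best_metric`) are hypothetical.

```python
import math

# The three distance measures considered in the paper.
def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

METRICS = {"chebyshev": chebyshev, "euclidean": euclidean, "manhattan": manhattan}

def mean_point(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def farthest_first(data, k, dist):
    """Assumed seeding: start at data[0], then repeatedly add the point
    farthest (under `dist`) from all centroids chosen so far."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(data, k, dist, max_iter=100):
    """Hypothetical Lloyd-style k-means: points are assigned to the nearest
    centroid under `dist`; centroids are updated as means (an assumption)."""
    centroids = farthest_first(data, k, dist)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in data]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean_point(members)
    return labels

def davies_bouldin(data, labels, dist=euclidean):
    """Davies-Bouldin score (lower is better); needs >= 2 non-empty clusters.
    Uses Euclidean distances by default, mirroring scikit-learn."""
    clusters = sorted(set(labels))
    cents = {c: mean_point([p for p, lab in zip(data, labels) if lab == c]) for c in clusters}
    # Average distance of each cluster's points to its centroid.
    scatter = {
        c: sum(dist(p, cents[c]) for p, lab in zip(data, labels) if lab == c) / labels.count(c)
        for c in clusters
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)

def best_metric(data, k):
    """Hypothetical meta-target: the metric whose k-means clustering has the
    lowest Davies-Bouldin score, plus the per-metric scores."""
    scores = {name: davies_bouldin(data, kmeans(data, k, dist))
              for name, dist in METRICS.items()}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    blobs = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    print(best_metric(blobs, 2))
```

Across a collection of datasets, the metric names returned by `best_metric` would serve as the class labels. A meta-learner (a random forest in the paper) learns to predict these labels from the extracted meta-features. When metrics tie, as can happen on trivially separable data like the two blobs above, this sketch breaks the tie by dictionary order. That tie-break is another arbitrary choice of the sketch.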
Files
Name | Size | MD5 |
---|---|---|
memgonzales/meta-learning-clustering-v1.0.0.zip | 93.5 MB | c69934785e0649ff2cab5c9a9b2eb208 |
Additional details
Related works
- Is published in
  - Conference paper: 10.1109/TENCON55691.2022.9978037 (DOI)
- Is supplement to
  - https://github.com/memgonzales/meta-learning-clustering/tree/v1.0.0 (URL)