Published May 25, 2025 | Version v1
Journal article Open

COMPARATIVE STUDY OF CLUSTERING ALGORITHMS FOR STUDENT PERFORMANCE EVALUATION

Description

Predicting student performance is essential for enhancing educational outcomes, enabling educators to identify students
who may need additional support or intervention. Clustering algorithms, as unsupervised data mining techniques, are
particularly effective at uncovering patterns in student performance data. These algorithms can group students based
on their exam scores, providing insights that allow for more tailored and targeted educational strategies. This study
compares four unsupervised methods, namely K-Means, DBSCAN, Hierarchical Clustering (Ward linkage), and Gaussian
Mixture Models (GMM), on a dataset of 200 students’ scores across five exam questions. After standardizing the data,
we project it into two dimensions via Principal Component Analysis (PCA) for visualization. We then evaluate each
model using three validation metrics: Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index. K-Means
with k = 5 achieves the highest Silhouette (0.387) and Calinski-Harabasz (90.156) scores and the lowest Davies-Bouldin
Index (0.883), outperforming the alternatives in both visual separation and quantitative metrics. DBSCAN
identifies noise but yields overlapping clusters; Hierarchical clustering shows moderate cohesion; GMM produces
softer boundaries. Our results demonstrate that K-Means offers the most interpretable and robust grouping for this
educational dataset, providing a practical tool for segmenting students into performance tiers. Future work may explore
dynamic k-selection methods, incorporation of additional student features, and deployment in intelligent tutoring
systems.
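
The workflow summarized above (standardize, project with PCA, fit four clustering models, score each with three validation indices) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic stand-ins for the 200 x 5 score matrix, the DBSCAN parameters (eps, min_samples) are arbitrary guesses, and the choices to compute the metrics on the standardized five-dimensional data and to exclude DBSCAN noise points from scoring are assumptions that the abstract does not confirm.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study's data (NOT the real scores):
# 200 students x 5 exam questions, drawn around 5 hypothetical tiers.
rng = np.random.default_rng(0)
tier_means = rng.uniform(2.0, 9.0, size=(5, 5))
scores = np.vstack([rng.normal(m, 1.0, size=(40, 5)) for m in tier_means])

# Standardize each question to zero mean and unit variance.
X = StandardScaler().fit_transform(scores)

# Two-dimensional PCA projection, used here only for visualization.
X_2d = PCA(n_components=2).fit_transform(X)

k = 5  # number of clusters reported for K-Means in the abstract
models = {
    "K-Means": KMeans(n_clusters=k, n_init=10, random_state=0),
    # eps and min_samples are illustrative guesses, not the study's values.
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),
    "Hierarchical (Ward)": AgglomerativeClustering(n_clusters=k, linkage="ward"),
    "GMM": GaussianMixture(n_components=k, random_state=0),
}


def evaluate(data, labels):
    """Return the three validation indices, or None if fewer than 2 clusters."""
    mask = labels != -1  # assumption: DBSCAN noise points (-1) are not scored
    if len(set(labels[mask])) < 2:
        return None
    d, l = data[mask], labels[mask]
    return {
        "silhouette": silhouette_score(d, l),                # higher is better
        "davies_bouldin": davies_bouldin_score(d, l),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(d, l),  # higher is better
    }


results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = evaluate(X, labels)

for name, metrics in results.items():
    print(name, metrics)
```

Because the input is synthetic, the printed values will not match the figures reported in the abstract; the sketch is meant only to show how the comparison could be set up.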

Files

MAY50.pdf (498.0 kB)
md5:f028d8d9c6bd4c62771f6456374f96be