Student Performance and Learning Behavior Dataset for Educational Analytics

NAJEM, Kamal

doi:10.5281/zenodo.16459132

Published July 26, 2025 | Version v1

Dataset Open

NAJEM, Kamal¹

1. Mohammed V University

The dataset used in this study integrates quantitative data on student learning behaviors, engagement patterns, demographics, and academic performance. It was compiled by merging two publicly available Kaggle datasets, resulting in a combined file (“merged_dataset.csv”) containing 14,003 student records with 16 attributes. All records are anonymized and contain no personally identifiable information.

The dataset covers the following categories of variables:

Study behaviors and engagement: StudyHours, Attendance, Extracurricular, AssignmentCompletion, OnlineCourses, Discussions
Resource access and learning environment: Resources, Internet, EduTech
Motivation and psychological factors: Motivation, StressLevel
Demographic information: Gender, Age (ranging from 18 to 30 years)
Learning preference classification: LearningStyle
Academic performance indicators: ExamScore, FinalGrade
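As a quick sanity check after download, the column names listed above can be compared against the CSV header. The following is a minimal sketch: the helper name `check_schema` is hypothetical, and because the description does not document dtypes or category encodings, only the column names and the stated Age range are checked.

```python
import pandas as pd

# Column names taken from the variable list above (16 attributes in total).
EXPECTED_COLUMNS = [
    "StudyHours", "Attendance", "Extracurricular", "AssignmentCompletion",
    "OnlineCourses", "Discussions", "Resources", "Internet", "EduTech",
    "Motivation", "StressLevel", "Gender", "Age", "LearningStyle",
    "ExamScore", "FinalGrade",
]


def check_schema(df: pd.DataFrame) -> dict:
    """Report missing/unexpected columns and rows whose Age is outside 18-30."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    extra = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    age_out_of_range = 0
    if "Age" in df.columns:
        # between() is inclusive on both ends; missing ages count as out of range.
        age_out_of_range = int((~df["Age"].between(18, 30)).sum())
    return {
        "rows": len(df),
        "missing": missing,
        "extra": extra,
        "age_out_of_range": age_out_of_range,
    }
```

With the real file in the working directory, a call might look like `check_schema(pd.read_csv("merged_dataset.csv"))`; a row count other than 14,003 or a non-empty `missing` list would suggest the download differs from what is described here.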

In this study, “ExamScore” and “FinalGrade” served as the primary performance indicators. The remaining variables were used to derive behavioral and contextual profiles, which were clustered using unsupervised machine learning techniques.

The analysis and modeling were implemented in Python through a structured Jupyter Notebook (“Project.ipynb”), which included the following main steps:

Environment Setup – Import of essential libraries (NumPy, pandas, Matplotlib, Seaborn, SciPy, StatsModels, scikit-learn, imbalanced-learn) and visualization configuration.
Data Import and Integration – Loading the two source CSV files, harmonizing columns, removing irrelevant attributes, aligning formats, handling missing values, and merging them into a unified dataset (merged_dataset.csv).
Data Preprocessing –
- Encoding categorical variables using LabelEncoder.
- Scaling features using both z-score standardization (for statistical tests and PCA) and Min–Max normalization (for clustering).
- Detecting and removing duplicates.
Clustering Analysis –
- Applying K-Means clustering to segment learners into distinct profiles.
- Determining the optimal number of clusters using the Elbow Method and Silhouette Score.
- Evaluating cluster quality with internal metrics (Silhouette Score, Davies–Bouldin Index).
Dimensionality Reduction & Visualization – Using PCA for 2D/3D cluster visualization and feature importance exploration.
Mapping Clusters to Learning Styles – Associating each identified cluster with the most relevant learning style model based on feature patterns and alignment scores.
Statistical Analysis – Conducting ANOVA and regression to test for significant differences in performance between clusters.
Interpretation & Practical Recommendations – Analyzing cluster-specific characteristics and providing implications for adaptive and mobile learning integration.
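The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the code from Project.ipynb: it runs on a small synthetic table rather than merged_dataset.csv, uses only a handful of the columns, assumes Motivation and StressLevel are stored as text categories, and picks the number of clusters by the highest silhouette score, which is one possible way to combine the Elbow Method and Silhouette Score mentioned in the description.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

# Synthetic stand-in for merged_dataset.csv; all values are made up.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "StudyHours": rng.integers(0, 40, n),
    "Attendance": rng.integers(50, 101, n),
    "Motivation": rng.choice(["Low", "Medium", "High"], n),
    "StressLevel": rng.choice(["Low", "Medium", "High"], n),
    "ExamScore": rng.integers(40, 101, n),
})

# Preprocessing: drop duplicates, then label-encode the categorical columns.
df = df.drop_duplicates().reset_index(drop=True)
categorical = ["Motivation", "StressLevel"]  # assumed to be text categories
features = df.drop(columns=["ExamScore"]).copy()
for col in categorical:
    features[col] = LabelEncoder().fit_transform(features[col])

X_minmax = MinMaxScaler().fit_transform(features)    # input for clustering
X_zscore = StandardScaler().fit_transform(features)  # input for PCA

# Candidate k values: keep inertia (for an elbow plot) and the silhouette score.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_minmax)
    scores[k] = (km.inertia_, silhouette_score(X_minmax, km.labels_))
best_k = max(scores, key=lambda k: scores[k][1])  # assumption: max silhouette

# Final model and internal quality metrics.
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_minmax)
df["Cluster"] = model.labels_
sil = silhouette_score(X_minmax, model.labels_)
dbi = davies_bouldin_score(X_minmax, model.labels_)

# 2D PCA coordinates, e.g. for a scatter plot coloured by cluster.
coords = PCA(n_components=2).fit_transform(X_zscore)

# One-way ANOVA: does mean ExamScore differ between clusters?
groups = [g["ExamScore"].to_numpy() for _, g in df.groupby("Cluster")]
f_stat, p_value = f_oneway(*groups)
print(f"k={best_k} silhouette={sil:.3f} DBI={dbi:.3f} ANOVA p={p_value:.3f}")
```

Because the synthetic columns are independent random draws, the clusters here carry no real meaning; the sketch only shows how the scaling, clustering, evaluation, projection, and testing steps fit together.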

Files

Files (2.4 MB)

Name	Size	MD5
merged_dataset.csv	539.6 kB	f4e2d66446de33c9591292a90e6a7423
Project.ipynb	1.9 MB	5a9e4e66505f111b1367d2917827f82a
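The MD5 checksums listed for the two files can be used to confirm that a download is intact. A minimal sketch using only the Python standard library follows; the helper names `md5_of` and `verify` are hypothetical, and the files are assumed to sit in a single local directory.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table above.
EXPECTED_MD5 = {
    "merged_dataset.csv": "f4e2d66446de33c9591292a90e6a7423",
    "Project.ipynb": "5a9e4e66505f111b1367d2917827f82a",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return {filename: True/False/None}; None means the file was not found."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results
```

Calling `verify()` in the download directory returns `True` for each file whose digest matches the listed value.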

Additional details

Available: 2025-07-26

	All versions	This version
Views	779	779
Downloads	813	813
Data volume	821.3 MB	821.3 MB
