Published October 28, 2024 | Version v1
Dataset Open

A New Dataset for Streaming Learning Analytics

  • 1. University of Bari

Description

This research introduces a novel dataset developed for streaming learning analytics, derived from the Open University Learning Analytics Dataset (OULAD). The dataset incorporates essential temporal information that captures the timing of student interactions with the Virtual Learning Environment (VLE). By integrating these time-based interactions, the dataset enhances the capabilities of stream algorithms, which are particularly well-suited for real-time monitoring and analysis of student learning behaviors.
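
To illustrate how a stream algorithm consumes time-ordered records, the following is a minimal sketch of prequential (test-then-train) evaluation, a common protocol in stream learning. It is not the authors' method: the `MajorityClassLearner` class, the `clicks` feature name, and the toy records are all hypothetical and stand in for a real online model and the real data.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy online learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        # No prediction is possible before the first label has been seen.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn(self, features, label):
        self.counts[label] += 1


def prequential_accuracy(stream, learner):
    """Test-then-train: predict each record first, then use it to update the learner."""
    correct = 0
    total = 0
    for features, label in stream:
        if learner.predict(features) == label:
            correct += 1
        learner.learn(features, label)
        total += 1
    return correct / total if total else 0.0


# Hypothetical, time-ordered toy records (not taken from the dataset).
# Labels follow the dataset's encoding: 0 = Withdrawn, 2 = Pass.
toy_stream = [
    ({"clicks": 3}, 2),
    ({"clicks": 5}, 2),
    ({"clicks": 0}, 0),
    ({"clicks": 4}, 2),
]

accuracy = prequential_accuracy(toy_stream, MajorityClassLearner())
print(f"prequential accuracy: {accuracy:.2f}")
```

Because every record is scored before the learner sees its label, the running accuracy reflects how the model would have performed had it been deployed while the course was in progress.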

 

The dataset consists of 34 features and 1,718,983 samples. Each sample combines a student's demographic information, assessment scores, and VLE interactions at a specific time (T), for a student (S) within a given course (C) and module (M). The target classes 'Withdrawn', 'Fail', 'Pass', and 'Distinction' are encoded as 0, 1, 2, and 3, respectively. The data is significantly imbalanced, with most records belonging to students who passed the final examination. The class distribution is: 'Pass' (1,022,760 samples), 'Distinction' (308,642 samples), 'Fail' (227,550 samples), and 'Withdrawn' (160,031 samples).
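
The imbalance can be quantified directly from the counts stated above. The sketch below uses only those published figures and the stated label encoding. The inverse-frequency weighting is one common way to compensate for imbalance and is shown here as an assumption, not as a step prescribed by the dataset authors.

```python
# Label encoding and per-class sample counts as stated in the description.
CLASS_CODES = {"Withdrawn": 0, "Fail": 1, "Pass": 2, "Distinction": 3}
CLASS_COUNTS = {
    "Pass": 1_022_760,
    "Distinction": 308_642,
    "Fail": 227_550,
    "Withdrawn": 160_031,
}

total = sum(CLASS_COUNTS.values())

# Share of each class in the full dataset.
shares = {name: count / total for name, count in CLASS_COUNTS.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = max(CLASS_COUNTS.values()) / min(CLASS_COUNTS.values())

# Inverse-frequency class weights keyed by encoded label (an assumed remedy):
# weight = total / (number_of_classes * class_count).
class_weights = {
    CLASS_CODES[name]: total / (len(CLASS_COUNTS) * count)
    for name, count in CLASS_COUNTS.items()
}

for name, share in shares.items():
    print(f"{name:<12} code={CLASS_CODES[name]} share={share:.1%}")
print(f"imbalance ratio (largest/smallest): {imbalance_ratio:.2f}")
```

The counts sum to the stated 1,718,983 samples, which is a quick consistency check to repeat after loading the file.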

For further details on the data, please refer to the manuscript: Gabriella Casalino, Giovanna Castellano, Gianluca Zaza, "Does Time Matter in Analyzing Educational Data? - A New Dataset for Streaming Learning Analytics", CEUR Workshop Proceedings, 2024.

Files

Files (162.0 MB)

Name: dataset4classes.csv
Size: 162.0 MB
md5: 22843500eaec301984926be96c18a9f6
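
After downloading, the file can be checked against the MD5 checksum listed above. The sketch below assumes the file has been saved locally. The helper names `file_md5` and `verify_download` are illustrative and not part of any published tooling.

```python
import hashlib

# Checksum as listed on this record for dataset4classes.csv.
EXPECTED_MD5 = "22843500eaec301984926be96c18a9f6"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return file_md5(path) == expected.lower()


# Example call, assuming the file sits in the working directory:
# verify_download("dataset4classes.csv")
```

Reading in chunks avoids loading the full 162.0 MB file into memory at once.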

Additional details

Dates

Accepted
2024-10

References

  • Gabriella Casalino, Giovanna Castellano, Gianluca Zaza, "Does Time Matter in Analyzing Educational Data? - A New Dataset for Streaming Learning Analytics", CEUR Workshop Proceedings, 2024.