A New Dataset for Streaming Learning Analytics
Description
This research introduces a novel dataset developed for streaming learning analytics, derived from the Open University Learning Analytics Dataset (OULAD). The dataset incorporates essential temporal information that captures the timing of student interactions with the Virtual Learning Environment (VLE). By integrating these time-based interactions, the dataset enhances the capabilities of stream algorithms, which are particularly well-suited for real-time monitoring and analysis of student learning behaviors.
The dataset consists of 34 features and 1,718,983 samples, encompassing students' demographic information, assessment scores, and interactions with the VLE for a specific time ( T ), corresponding to each student ( S ) within a given course ( C ) and module ( M ). The target classes—'Withdrawn', 'Fail', 'Pass', and 'Distinction'—were encoded as 0, 1, 2, and 3, respectively. Notably, the data exhibits a significant imbalance, with a substantial prevalence of records associated with students who passed the final examination. The class distribution is as follows: 'Pass' (1,022,760 samples), 'Distinction' (308,642 samples), 'Fail' (227,550$ samples), and 'Withdrawn' (160,031 samples).
For further details on the data, please refer to the manuscript: Gabriella Casalino, Giovanna Castellano, Gianluca Zaza, "Does Time Matter in Analyzing Educational Data? - A New Dataset for Streaming Learning Analytics.", CEUR Proceedings
Files
dataset4classes.csv
Files
(162.0 MB)
Name | Size | Download all |
---|---|---|
md5:22843500eaec301984926be96c18a9f6
|
162.0 MB | Preview Download |
Additional details
Identifiers
Related works
- Is variant form of
- https://analyse.kmi.open.ac.uk/open_dataset (URL)
Dates
- Accepted
-
2024-10
References
- Gabriella Casalino, Giovanna Castellano, Gianluca Zaza, "Does Time Matter in Analyzing Educational Data? - A New Dataset for Streaming Learning Analytics.", CEUR Workshop Proceedings, 2024