Published March 15, 2023 | Version v1
Journal article Open

Review of CSEDM Data and Introduction of Two Public CS1 Keystroke Datasets

  • 1. Utah State University


Analysis of programming process data has become popular in computing education research and educational
data mining in the last decade. This type of data is quantitative, often of high temporal resolution,
and it can be collected non-intrusively while the student is in a natural setting. Many levels of granularity
can be obtained, such as submission, compilation, edit, and keystroke events, with keystroke-level logs
being the most fine-grained of commonly used dataset types. However, open datasets remain scarce, especially
at the keystroke level. Several factors contribute to this scarcity, the most prominent being the
deidentification challenges peculiar to keystroke log data. In this paper, we present
the public release of two fully deidentified keystroke datasets that are the first of their kind in terms of
both event and metadata richness. We describe our collection technique and properties of the data along
with deidentification techniques that, while not fully relieving researchers of significant effort, at least
reduce and streamline manual work in hopes that researchers will release similar datasets in the future.




Additional details

Related works