Published November 14, 2024 | Version v1
Dataset Open

Student discourse in small-group collaborative contexts in real-world higher education

  • University College London (UCL)

Description

This dataset contains anonymised student discourse in small-group collaborative contexts in real-world higher education settings.

There are 28 sessions, comprising 10,799 utterances in total.

Each utterance is described by 13 columns:

  • session: the session number
  • start: the starting time of the utterance
  • end: the ending time of the utterance
  • speaker: anonymised speaker name
  • content: the content of the utterance
  • ep: the episode number of the utterance. An episode refers to a single topic of discussion with multiple utterances.
  • C: a binary value indicating the presence or absence of a cognitive challenge, where 0 means no challenge, and 1 means there is a cognitive challenge.
  • E: a binary value indicating the presence or absence of an emotional/motivational challenge, where 0 means no challenge, and 1 means there is an emotional/motivational challenge.
  • M: a binary value indicating the presence or absence of a metacognitive challenge, where 0 means no challenge, and 1 means there is a metacognitive challenge.
  • T: a binary value indicating the presence or absence of a technical/other challenge, where 0 means no challenge, and 1 means there is a technical/other challenge.
  • TA: a binary value indicating the presence or absence of the regulatory process "task analysis," where 0 means no regulation and 1 means task analysis is present.
  • MC: a binary value indicating the presence or absence of the regulatory process "monitoring/control," where 0 means no regulation and 1 means monitoring/control is present.
  • RA: a binary value indicating the presence or absence of the regulatory process "reflection/adaptation," where 0 means no regulation and 1 means reflection/adaptation is present.
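The column layout above can be checked with a few lines of Python. The following is a minimal sketch, not part of the original release: it assumes the file is named dataset.csv, that the CSV header uses exactly the column names listed above, and that the seven labels are stored as 0/1 values. The helper name summarise is hypothetical.

```python
import csv
import os
from collections import Counter

# Column names as listed in the description; assumed to match the CSV header exactly.
LABELS = ["C", "E", "M", "T", "TA", "MC", "RA"]
COLUMNS = ["session", "start", "end", "speaker", "content", "ep"] + LABELS


def summarise(lines):
    """Count sessions, utterances, and positive labels from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    sessions = set()
    positives = Counter()
    n_utterances = 0
    for row in reader:
        n_utterances += 1
        sessions.add(row["session"])
        for label in LABELS:
            # Assumption: labels are 0/1 (possibly written as "1.0"); blank cells count as 0.
            value = row[label].strip()
            if value and float(value) == 1:
                positives[label] += 1
    return {
        "sessions": len(sessions),
        "utterances": n_utterances,
        "positives": dict(positives),
    }


if __name__ == "__main__":
    # Only runs when a local copy of the file is present.
    if os.path.exists("dataset.csv"):
        with open("dataset.csv", newline="", encoding="utf-8") as f:
            print(summarise(f))
```

If the assumptions hold, the reported session and utterance counts should agree with the totals stated in the description; a mismatch would suggest a different header or encoding than assumed here.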

This annotated dataset was used to model challenge moments in the paper cited below. For more details on how the dataset was generated, please refer to that paper.

Suraworachet, W., Seon, J., & Cukurova, M. (2024). Predicting challenge moments from students’ discourse: A comparison of GPT-4 to two traditional natural language processing approaches. Proceedings of the 14th Learning Analytics and Knowledge Conference, 473–485. https://doi.org/10.1145/3636555.3636905
 

Files

dataset.csv (1.2 MB)
md5:d44e5650fc72fdb58c43d1b9ef738b99
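The md5 value listed for dataset.csv can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the local path dataset.csv and the helper name md5_of_file are assumptions for illustration.

```python
import hashlib
import os

# Checksum as published on this record.
EXPECTED_MD5 = "d44e5650fc72fdb58c43d1b9ef738b99"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    if os.path.exists("dataset.csv"):
        actual = md5_of_file("dataset.csv")
        print("checksum OK" if actual == EXPECTED_MD5 else f"checksum mismatch: {actual}")
```

MD5 is suitable here only as an integrity check against accidental corruption, which is how it is used on this record.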

Additional details