Assessing the impact of hints in learning formal specification: Research artifact
Description
This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study investigating the impact of different types of automated hints on students learning a formal specification language, in terms of immediate performance, learning retention, and emotional response. This research artifact provides all the material required to replicate the study (except for the proprietary questionnaires used to assess emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.
Dataset
The artifact contains the resources described below.
Experiment resources
The resources needed for replicating the experiment, namely in directory experiment:
- alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was provided in Portuguese due to the population of the experiment.
- alloy_sheet_en.pdf: an English translation of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment.
- docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.
- api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.
Experiment data
The task database used in our application of the experiment, namely in directory data/experiment:
- Model.json, Instance.json, and Link.json: JSON files used to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.
- identifiers.txt: the list of all (104) participant identifiers available for the experiment.
Collected data
Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the form of JSON files and CSV files with a header row, namely in directory data/results:
- data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).
- data_socio.csv: data collected from the socio-demographic questionnaire in the 1st session of the experiment, namely:
  - participant identification: participant's unique identifier (ID);
  - socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20, where NA denotes a preference not to disclose).
- data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:
  - participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
  - detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.
- data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:
  - participant identification: participant's unique identifier (ID);
  - user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).
- participants.txt: the list of participant identifiers that have registered for the experiment.
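As an illustration of the encoding described above, the following minimal sketch decodes rows of data_socio.csv. It assumes a comma-separated file whose header uses exactly the column names ID, AGE, SEX and GRADE, and that AGE is always filled in; the function name `parse_socio` and the sample rows are hypothetical and not part of the artifact.

```python
import csv
import io

# Code-to-label mapping taken from the description of the SEX column above.
SEX_LABELS = {
    "1": "female",
    "2": "male",
    "3": "prefer not to disclose",
    "4": "other",
}


def parse_socio(text):
    """Parse the content of data_socio.csv into a list of dictionaries."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        grade = record["GRADE"]
        rows.append({
            "id": record["ID"],
            "age": int(record["AGE"]),  # assumption: AGE is never NA
            "sex": SEX_LABELS[record["SEX"]],
            # NA means the participant preferred not to disclose the grade.
            "grade": None if grade == "NA" else float(grade),
        })
    return rows


# Made-up sample rows, not taken from the collected data.
sample = "ID,AGE,SEX,GRADE\n0CAN,20,1,15\nCA0L,21,2,NA\n"
for row in parse_socio(sample):
    print(row)
```

To apply it to the real file, read the file content first, for example with `open("data_socio.csv").read()`, and pass it to `parse_socio`.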
Analysis scripts
The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:
- analysis.r: an R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.
- requirements.r: an R script to install the required libraries for the analysis script.
- normalize_task.py: a Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.
- normalize_emo.py: a Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.
- Dockerfile: a Docker script to automate the analysis script from the collected data.
Setup
To replicate the experiment and the analysis of the results, only Docker is required.
If you wish to manually replicate the experiment and collect your own data, you'll need to install:
- A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.
If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:
- Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.
- R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.
Usage
Experiment replication
This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.
To launch the Alloy4Fun platform populated with the tasks for each session, run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch; wait for the "Started your app" message to show.
cd experiment
docker-compose up
This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.
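Since the treatment group is encoded in the last character of the identifier, the distribution of identifiers across groups can be checked with a few lines of Python. This is only a sketch: the helper names `treatment_group` and `group_sizes` are hypothetical, and the four identifiers used in the example are the ones quoted in this document rather than the full content of identifiers.txt.

```python
from collections import Counter

# Valid treatment groups, as encoded in the last character of an identifier.
GROUPS = {"N", "L", "E", "D"}


def treatment_group(identifier):
    """Return the treatment group encoded in the last character."""
    group = identifier.strip()[-1]
    if group not in GROUPS:
        raise ValueError(f"unexpected group suffix in {identifier!r}")
    return group


def group_sizes(identifiers):
    """Count how many identifiers belong to each treatment group."""
    return Counter(treatment_group(i) for i in identifiers if i.strip())


# Example identifiers quoted in this document, one per group.
print(group_sizes(["0CAN", "CA0L", "350E", "27AD"]))
```

Passing an open file handle for identifiers.txt instead of the list also works, since the function only iterates over lines and skips blank ones.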
In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would access http://localhost:3000/0CAN. The next task becomes available after a correct submission to the current task or when a time-out occurs (5 minutes). Each participant was assigned to a treatment group, so different kinds of hints are provided depending on the permalink. Below are 4 permalinks, one for each hint group:
- Group N (no hints): http://localhost:3000/0CAN
- Group L (error locations): http://localhost:3000/CA0L
- Group E (counter-example): http://localhost:3000/350E
- Group D (error description): http://localhost:3000/27AD
In the 2nd session, as in the 1st session, each permalink gave access to 12 sequential tasks, and the next task becomes available after a correct submission or a time-out (5 minutes). The permalink is constructed by prefixing the participant's identifier with P-, so participant 0CAN would access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints, so the permalinks do not differ between groups.
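The permalink scheme for the two sessions can be summarised in a short sketch, for instance to generate the links to hand out to participants. The function name `session_permalinks` is hypothetical, and the base URL assumes the default local deployment described above.

```python
BASE_URL = "http://localhost:3000"  # default local deployment


def session_permalinks(identifier, base_url=BASE_URL):
    """Return the permalinks for the 1st and 2nd sessions of a participant."""
    first = f"{base_url}/{identifier}"      # 1st session: the identifier itself
    second = f"{base_url}/P-{identifier}"   # 2nd session: identifier prefixed with P-
    return first, second


for participant in ["0CAN", "CA0L", "350E", "27AD"]:
    print(participant, *session_permalinks(participant))
```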
Before the 1st session the participants should answer the socio-demographic questionnaire, which should ask for the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.
Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier and for the degree to which the participant feels each of the 14 depicted emotions, expressed on a 5-point Likert scale.
After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the unique user identifier and answers to the standard 4 questions on a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric (a score ranging from 0 to 100) from the answers, please see the original paper:
- Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
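As a rough guide, the sketch below computes a UMUX score using the scoring rule commonly described for this questionnaire: odd-numbered items contribute the answer minus 1, even-numbered items contribute 7 minus the answer, and the sum is rescaled from the 0 to 24 range to 0 to 100. The function name `umux_score` is hypothetical, and the rule should be checked against the paper above before use.

```python
def umux_score(answers):
    """Compute a 0-100 UMUX score from four answers on a 7-point scale.

    Assumes the usual UMUX layout: items 1 and 3 are positively worded,
    items 2 and 4 are negatively worded.
    """
    if len(answers) != 4 or any(not 1 <= a <= 7 for a in answers):
        raise ValueError("expected four answers between 1 and 7")
    total = 0
    for index, answer in enumerate(answers, start=1):
        if index % 2 == 1:
            total += answer - 1   # positively worded item
        else:
            total += 7 - answer   # negatively worded item
    return total * 100 / 24       # rescale the 0-24 sum to 0-100


print(umux_score([7, 1, 7, 1]))  # best possible answers
print(umux_score([4, 4, 4, 4]))  # neutral answers
```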
Analysis of other applications of the experiment
This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.
The analysis script expects data in 4 CSV files, data_socio.csv, data_tasks.csv, data_emo.csv, and data_umux.csv, as well as a list of registered participants in participants.txt. The process to create these files is as follows, assuming that the docker-compose up command is still running with the Alloy4Fun platform and that the working directory is the experiment directory of the artifact.
File participants.txt
Each participant must be assigned an identifier from the list available at identifiers.txt which gives access to the challenges in Alloy4Fun through permalinks (see Experiment replication). Create a file participants.txt with the identifiers of the registered participants, one per line, selected from identifiers.txt.
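Before running the analysis it may help to confirm that every line of participants.txt is a known identifier and appears only once. The following sketch does this over any two iterables of lines; the function name `invalid_participants` is hypothetical and the sample values are for illustration only.

```python
def invalid_participants(participants, identifiers):
    """Return participant entries that are unknown or duplicated.

    Both arguments are iterables of lines, such as open file handles
    or lists of strings; blank lines are ignored.
    """
    available = {line.strip() for line in identifiers if line.strip()}
    seen = set()
    problems = []
    for line in participants:
        participant = line.strip()
        if not participant:
            continue
        if participant not in available or participant in seen:
            problems.append(participant)
        seen.add(participant)
    return problems


# Illustrative values: XXXX is unknown and 0CAN is listed twice.
print(invalid_participants(["0CAN", "XXXX", "0CAN"], ["0CAN", "CA0L"]))
```

An empty result means that participants.txt only contains distinct identifiers taken from identifiers.txt.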
File data_socio.csv
Participants who reported familiarity with Alloy should be excluded from the analysis, and the remaining data from the socio-demographic questionnaire should be encoded in file data_socio.csv with the same CSV format used for the data collected in our experiment (see Collected data).
File data_tasks.csv
The task data must be retrieved from the Alloy4Fun database and converted into CSV. After the 2nd session takes place, execute the following commands to retrieve the JSON file with the task data and convert it into the normalized CSV file:
docker-compose exec mongo mongoexport --db meteor --port 27017 --collection Model --out data_sessions.json
docker-compose cp mongo:data_sessions.json .
python ../analysis/normalize_task.py data_sessions.json participants.txt > data_tasks.csv
The resulting data_tasks.csv file contains the following information:
- participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
- overall task resolution: task resolution data in the two sessions, namely variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1); the number of tries is also provided (TRY1 and TRY2), from which efficiency was calculated;
- task resolution per domain model: the task resolution data from the 1st session but split by domain model (suffix A for the social network and suffix B for course management), which was used to discuss a possible threat to internal validity in Section 5 of the paper.
Note that with this setup a distracted participant can still access the tasks from the 1st session, so it is wise to double-check the resulting JSON.
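One simple way to double-check the normalized output is to verify that the documented value ranges hold. The sketch below is not part of the artifact: it assumes a comma-separated data_tasks.csv whose header contains the columns ID, HINT, PROD1, PROD2, EFF1 and EFF2, and it assumes that missing values, if any, are written as NA. The function name `range_violations` and the sample rows are hypothetical.

```python
import csv
import io

# Documented ranges: productivity in 0..12, efficiency in 0..1.
CHECKS = [("PROD1", 0, 12), ("PROD2", 0, 12), ("EFF1", 0, 1), ("EFF2", 0, 1)]
GROUPS = {"N", "L", "E", "D"}


def range_violations(text):
    """Return (ID, column) pairs whose values fall outside the documented ranges."""
    problems = []
    for record in csv.DictReader(io.StringIO(text)):
        if record["HINT"] not in GROUPS:
            problems.append((record["ID"], "HINT"))
        for column, low, high in CHECKS:
            value = record[column]
            if value == "NA":  # assumption: NA marks a missing value
                continue
            if not low <= float(value) <= high:
                problems.append((record["ID"], column))
    return problems


# Made-up rows: the second one violates the PROD1 and EFF2 ranges.
sample = (
    "ID,HINT,PROD1,PROD2,EFF1,EFF2\n"
    "0CAN,N,10,8,0.5,0.4\n"
    "CA0L,L,13,7,0.6,1.2\n"
)
print(range_violations(sample))
```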
File data_emo.csv
The unique identifier and the difference between the Likert scores for all emotions after each session, as reported in the PrEmo 2 questionnaires, should be encoded in file data_emo.csv with the same CSV format used for the data collected in our experiment (see Collected data). Afterwards, run the following script to compute the respective aggregate values.
python ../analysis/normalize_emo.py data_emo.csv participants.txt > data_emo_aggregate.csv
The resulting data_emo_aggregate.csv file contains the following information:
- participant identification: participant's unique identifier (ID);
- emotional response data: summarised emotional response data (EMO1 and EMO12), the sum of the differentials in positive and negative emotions, ranging from -14 to 14 (for each emotion we only measured whether there was a variation in either direction, ranging from -1 to +1), or NA if the participant did not submit the questionnaire.
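To make the aggregation concrete, here is a minimal sketch of one way to compute such a value from a row of data_emo.csv. It is not the artifact's normalize_emo.py. In particular, it assumes that an increase in a positive emotion counts as +1 and an increase in a negative emotion counts as -1 (and vice versa for decreases); this sign convention is an assumption, so check it against the script before relying on it. The function names are hypothetical.

```python
POSITIVE = ["Admiration", "Desire", "Hope", "Fascination", "Joy",
            "Satisfaction", "Pride"]
NEGATIVE = ["Anger", "Boredom", "Contempt", "Disgust", "Fear",
            "Sadness", "Shame"]


def sign(value):
    """Return -1, 0 or +1 according to the direction of the variation."""
    return (value > 0) - (value < 0)


def aggregate_emotion(row, session):
    """Aggregate the 14 differentials of one session into a -14..14 value.

    Returns None when any value is NA (questionnaire not submitted).
    ASSUMPTION: negative emotions contribute with inverted sign.
    """
    total = 0
    for name in POSITIVE + NEGATIVE:
        cell = row[f"{name}{session}"]
        if cell == "NA":
            return None
        direction = sign(int(cell))
        total += direction if name in POSITIVE else -direction
    return total


# Made-up row: all positive emotions increased, all negative ones unchanged.
row = {f"{name}1": "2" for name in POSITIVE}
row.update({f"{name}1": "0" for name in NEGATIVE})
print(aggregate_emotion(row, 1))
```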
File data_umux.csv
The unique identifier and the usability metrics of both sessions collected from the UMUX questionnaires should be encoded in file data_umux.csv with the same CSV format used for the data collected in our experiment (see Collected data).
Running the analysis
The CSV data is then analysed with an R script. The first step is to install the required libraries. This step may require super-user privileges:
sudo Rscript ../analysis/requirements.r
Then just run the analysis script to get the result of the statistical tests:
Rscript ../analysis/analysis.r
The data analysis results are reported in the terminal by default. For more details on running R, you can consult https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Invoking-R.
Files
artifact.zip (21.8 MB)
md5:6994657dcdd4530c6449435936f3973a
Additional details
Related works
- Is supplement to conference paper: https://doi.org/10.1145/3639474.3640050
Dates
- Collected: 2023-03-24/2023-03-31 (data collection)