Published December 1, 2025 | Version 1.0
Dataset · Open Access

Usability Survey Dataset: ChatGPT for Role‑Play Language Practice

  • 1. Universidad Complutense de Madrid
  • 2. Aligarh Muslim University
  • 3. Universidad Complutense de Madrid Facultad de Informática
  • 4. Universidad Francisco de Vitoria

Description

Usability Survey Dataset: Prompted ChatGPT Role-Play for Language Practice

This Zenodo record contains the raw survey responses collected for the study reported in:

Pablo Gervás, Carlos León, Mayuresh Kumar, Gonzalo Méndez, Susana Bautista (2025). Prompting an LLM Chatbot to Role Play Conversational Situations for Language Practice.
In Proceedings of the 17th International Conference on Computer Supported Education (CSEDU 2025), Volume 2, pp. 257–264. SciTePress. DOI: 10.5220/0013235400003932.
Paper PDF (publisher page): https://www.scitepress.org/publishedPapers/2025/132354/pdf/index.html

Authors

The dataset authors are the same as the paper authors:

  • Pablo Gervás
  • Carlos León
  • Mayuresh Kumar
  • Gonzalo Méndez
  • Susana Bautista

Contact author: Carlos León — cleon@ucm.es

ORCID identifiers (from the paper PDF)

  • Pablo Gervás — ORCID: https://orcid.org/0000-0003-4906-9837
  • Carlos León — ORCID: https://orcid.org/0000-0002-6768-1766
  • Mayuresh Kumar — ORCID: https://orcid.org/0000-0002-1728-7349
  • Gonzalo Méndez — ORCID: https://orcid.org/0000-0001-7659-1482
  • Susana Bautista — ORCID: https://orcid.org/0000-0003-1648-0208

Overview

Large Language Model (LLM) chatbots can sustain fluent dialogues and can be configured via prompts to play specific roles in an interaction. The related paper proposes and evaluates a prompting framework to make a chatbot:

  • propose conversational situations of appropriate complexity,
  • play a role in those situations,
  • monitor learner language, and
  • provide feedback proactively and on request.

This dataset provides the participant-level questionnaire data from a usability-focused user study of that approach.

What is in this record?

Files

  • Usability of ChatGPT(1-20).xlsx — raw survey export (one row per participant).

Unit of analysis

  • One row = one participant (N=20)

Variables

The spreadsheet includes:

  1. Administrative timestamps (start/end time),
  2. Participant background (gender; self-rated English skills),
  3. USE Questionnaire (Usefulness, Ease of Use, Ease of Learning, Satisfaction),
  4. System Usability Scale (SUS),
  5. Custom items about role-play language practice (clarifications, switching language, correction behavior, praise, perceived learning, perceived practice of specific skills),
  6. Open-ended comments (optional).

Quick descriptive summary (computed from the XLSX)

  • Participants: 20
  • Columns: 67
  • Collection date: 2024-05-31 (all responses collected on the same day, based on Hora de inicio)
  • Hora de inicio range: 2024-05-31 08:16:51 – 2024-05-31 18:56:06
  • Survey completion time (minutes):
    • mean ≈ 6.41
    • median ≈ 6.08
    • min ≈ 1.43, max ≈ 14.72
  • Gender (self-reported): Man=13, Woman=7
  • The Correo electrónico field is "anonymous" for all records in the provided file.
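
The descriptive summary above can be reproduced from the raw export with pandas. The sketch below wraps the computation in a helper and demonstrates it on a tiny synthetic frame (the two demo rows are illustrative, not part of the dataset); in practice, load the XLSX with `pd.read_excel("Usability of ChatGPT(1-20).xlsx", sheet_name="Sheet1")` first.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Reproduce the record's descriptive summary from the raw export."""
    start = pd.to_datetime(df["Hora de inicio"])
    end = pd.to_datetime(df["Hora de finalización"])
    duration_min = (end - start).dt.total_seconds() / 60
    return {
        "participants": len(df),
        "columns": df.shape[1],
        "mean_min": round(duration_min.mean(), 2),
        "gender": df["Select your gender"].value_counts().to_dict(),
    }

# Tiny synthetic frame, for illustration only:
demo = pd.DataFrame({
    "Hora de inicio": ["2024-05-31 08:16:51", "2024-05-31 09:00:00"],
    "Hora de finalización": ["2024-05-31 08:23:51", "2024-05-31 09:06:00"],
    "Select your gender": ["Man", "Woman"],
})
print(summarize(demo))
```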

Instruments included

USE Questionnaire (30 items)

The dataset contains the 30-item USE Questionnaire (Likert 1–5). In this file, the items appear in the standard order and can be aggregated into the common subscales:

  • Usefulness (8 items): items 1–8 (from “It helps me be more effective” … “It does everything I would expect it to do”)
  • Ease of Use (11 items): items 9–19
  • Ease of Learning (4 items): items 20–23
  • Satisfaction (7 items): items 24–30

Suggested aggregation: mean (or sum) within each subscale (report which one you use).
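
Assuming the 30 USE columns appear in the standard order described above, the subscale means can be computed positionally. A sketch (the function and constant names are illustrative; the demo rows are synthetic):

```python
import pandas as pd

# Subscale boundaries by item position (1-based, per the record's description):
# Usefulness 1–8, Ease of Use 9–19, Ease of Learning 20–23, Satisfaction 24–30.
SUBSCALES = {
    "usefulness": slice(0, 8),
    "ease_of_use": slice(8, 19),
    "ease_of_learning": slice(19, 23),
    "satisfaction": slice(23, 30),
}

def use_subscale_means(use_items: pd.DataFrame) -> pd.DataFrame:
    """use_items: the 30 USE columns in standard order, one row per participant."""
    assert use_items.shape[1] == 30, "expected the 30 USE items"
    return pd.DataFrame({
        name: use_items.iloc[:, sl].mean(axis=1) for name, sl in SUBSCALES.items()
    })

# Illustration with two synthetic participants answering all 3s and all 5s:
demo = pd.DataFrame([[3] * 30, [5] * 30])
print(use_subscale_means(demo))
```

If you prefer sums, replace `.mean(axis=1)` with `.sum(axis=1)` and say so when reporting.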

System Usability Scale (SUS) (10 items)

The dataset includes the 10 SUS items (Likert 1–5). See Scoring below for the standard 0–100 calculation.

Custom pedagogical/usability items for language practice

The dataset includes additional questions about:

  • whether suggested topics enabled practice of the target feature,
  • clarification behavior (and whether clarifications were helpful),
  • switching to the native language and returning to Spanish afterward,
  • whether the chatbot corrected mistakes (and whether reminders were needed),
  • usefulness of corrections (open-ended),
  • praise, perceived learning, and perceived Spanish practice,
  • perceived practice of: conversation, writing, grammar, vocabulary.

Dataset structure (columns)

The spreadsheet has 67 columns. For convenience, they are grouped below.

1) Administrative / metadata

ID, Hora de inicio, Hora de finalización, Correo electrónico, Nombre, Hora de la última modificación

2) Participant background

Select your gender, and self-rated English skills (Speaking, Listening, Reading, Writing)

3) USE Questionnaire (30 items)

All columns from It helps me be more effective to It is pleasant to use

4) SUS (10 items)

All columns from I think that I would like to use this platform frequently to
I needed to learn a lot of things before I could get going with this platform

5) Role-play language-practice items (custom)

All columns from Did you feel that the chatbot proposed conversation topics... to Please, any comments more

Data dictionary (full column list)

Notes:

  • Some headers include trailing non‑breaking spaces (\xa0) and/or newlines (\n) from the form export.
  • The suggested short names below are provided to support scripting; they are not additional files.

Group XLSX column header (raw) Suggested short name Type Notes
administrative_metadata ID id integer Administrative / survey export metadata.
administrative_metadata Hora de inicio hora_de_inicio datetime Administrative / survey export metadata.
administrative_metadata Hora de finalización hora_de_finalización datetime Administrative / survey export metadata.
administrative_metadata Correo electrónico correo_electrónico string Administrative / survey export metadata.
administrative_metadata Nombre nombre string Administrative / survey export metadata.
administrative_metadata Hora de la última modificación hora_de_la_última_modificación datetime Administrative / survey export metadata.
participant_background Select your gender select_your_gender string Participant self-report background item.
participant_background Rate your Speaking English language knowledge level  rate_your_speaking_english_language_knowledge_level integer (1–5) Participant self-report background item.
participant_background Rate your Listening English language knowledge level  rate_your_listening_english_language_knowledge_level integer (1–5) Participant self-report background item.
participant_background Rate your Reading English language knowledge level  rate_your_reading_english_language_knowledge_level integer (1–5) Participant self-report background item.
participant_background Rate your Writing English language knowledge level  rate_your_writing_english_language_knowledge_level integer (1–5) Participant self-report background item.
use_questionnaire It helps me be more effective it_helps_me_be_more_effective integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It helps me be more productive it_helps_me_be_more_productive integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is useful it_is_useful integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It gives me more control over the activities in my life  it_gives_me_more_control_over_the_activities_in_my_life integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It makes the things I want to accomplish easier to get done it_makes_the_things_i_want_to_accomplish_easier_to_get_done integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It saves me time when I use it it_saves_me_time_when_i_use_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It meets my needs it_meets_my_needs integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It does everything I would expect it to do it_does_everything_i_would_expect_it_to_do integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is easy to use  it_is_easy_to_use integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is simple to use  it_is_simple_to_use integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is user friendly  it_is_user_friendly integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It requires the fewest steps possible to accomplish what I want to do with it  it_requires_the_fewest_steps_possible_to_accomplish_what_i_want_to_do_with_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is flexible it_is_flexible integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire Using it is effortless using_it_is_effortless integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I can use it without written instructions i_can_use_it_without_written_instructions integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I do not notice any inconsistencies as I use it i_do_not_notice_any_inconsistencies_as_i_use_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire Both occasional and regular users would like it both_occasional_and_regular_users_would_like_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I can recover from mistakes quickly and easily  i_can_recover_from_mistakes_quickly_and_easily integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I can use it successfully every time  i_can_use_it_successfully_every_time integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I learned to use it quickly  i_learned_to_use_it_quickly integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I easily remember how to use it  i_easily_remember_how_to_use_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is easy to learn to use it  it_is_easy_to_learn_to_use_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I quickly became skillful with it  i_quickly_became_skillful_with_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I am satisfied with it  i_am_satisfied_with_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I would recommend it to a friend i_would_recommend_it_to_a_friend integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is fun to use  it_is_fun_to_use integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It works the way I want it to work  it_works_the_way_i_want_it_to_work integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is wonderful it_is_wonderful integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire I feel I need to have it i_feel_i_need_to_have_it integer (1–5) USE Questionnaire item (Likert 1–5).
use_questionnaire It is pleasant to use  it_is_pleasant_to_use integer (1–5) USE Questionnaire item (Likert 1–5).
sus I think that I would like to use this platform frequently  i_think_that_i_would_like_to_use_this_platform_frequently integer (1–5) System Usability Scale item (Likert 1–5).
sus I found the platform unnecessarily complex i_found_the_platform_unnecessarily_complex integer (1–5) System Usability Scale item (Likert 1–5).
sus I thought the platform was easy to use  i_thought_the_platform_was_easy_to_use integer (1–5) System Usability Scale item (Likert 1–5).
sus I think that I would need the support of a technical person to be able to use this platform  i_think_that_i_would_need_the_support_of_a_technical_person_to_be_able_to_use_this_platform integer (1–5) System Usability Scale item (Likert 1–5).
sus I found the various functions in this platform were well integrated i_found_the_various_functions_in_this_platform_were_well_integrated integer (1–5) System Usability Scale item (Likert 1–5).
sus I thought there was too much inconsistency in the platform i_thought_there_was_too_much_inconsistency_in_the_platform integer (1–5) System Usability Scale item (Likert 1–5).
sus I would imagine that most people would learn to use this platform very quickly  i_would_imagine_that_most_people_would_learn_to_use_this_platform_very_quickly integer (1–5) System Usability Scale item (Likert 1–5).
sus I found the platform very cumbersome to use  i_found_the_platform_very_cumbersome_to_use integer (1–5) System Usability Scale item (Likert 1–5).
sus I felt very confident using the platform i_felt_very_confident_using_the_platform integer (1–5) System Usability Scale item (Likert 1–5).
sus I needed to learn a lot of things before I could get going with this platform  i_needed_to_learn_a_lot_of_things_before_i_could_get_going_with_this_platform integer (1–5) System Usability Scale item (Likert 1–5).
roleplay_language_practice_items Did you feel that the chatbot proposed conversation topics that allowed you to practice the targeted feature of the language? did_you_feel_that_the_chatbot_proposed_conversation_topics_that_allowed_you_to_practice_the_targeted_feature_of_the_language integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Did you request clarifications? did_you_request_clarifications integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Where they helpful? where_they_helpful string Open-ended text response.
roleplay_language_practice_items Did you need to ask for explanations in your native language? did_you_need_to_ask_for_explanations_in_your_native_language integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items If you asked for clarification in English, did the chatbot remember to return to Spanish after explaining? if_you_asked_for_clarification_in_english_did_the_chatbot_remember_to_return_to_spanish_after_explaining integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Did the chatbot remember to correct your mistakes? did_the_chatbot_remember_to_correct_your_mistakes integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Did you have to remind it to correct your mistakes? did_you_have_to_remind_it_to_correct_your_mistakes integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Where the corrections useful? where_the_corrections_useful string Open-ended text response.
roleplay_language_practice_items Did you at any point receive praise from the chatbot? did_you_at_any_point_receive_praise_from_the_chatbot integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Do you think you have learnt new languages skills? do_you_think_you_have_learnt_new_languages_skills integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Do you think the session helped you practice your Spanish? do_you_think_the_session_helped_you_practice_your_spanish integer (1–5) Custom study item about interaction/pedagogy (Likert 1–5).
roleplay_language_practice_items Conversation conversation integer (1–5) Perceived practice of this skill (Likert 1–5).
roleplay_language_practice_items Writing writing integer (1–5) Perceived practice of this skill (Likert 1–5).
roleplay_language_practice_items Grammar grammar integer (1–5) Perceived practice of this skill (Likert 1–5).
roleplay_language_practice_items Vocabulary vocabulary integer (1–5) Perceived practice of this skill (Likert 1–5).
roleplay_language_practice_items Please, any comments more please_any_comments_more string Open-ended text response.

Scoring and derived measures

SUS (0–100)

To compute the standard SUS score:

  • For items 1, 3, 5, 7, 9: contribution = (score − 1)
  • For items 2, 4, 6, 8, 10: contribution = (5 − score)
  • SUS score = (sum of contributions) × 2.5

Where the item order is the same as the column order in the XLSX under the SUS group.

USE subscales

A typical approach is to compute a mean (or sum) for each subscale:

  • Usefulness (8 items)
  • Ease of Use (11 items)
  • Ease of Learning (4 items)
  • Satisfaction (7 items)

Because the raw export does not embed the survey anchors, treat the scale as ordinal 1–5 and document your assumed anchors when reporting results.

Custom items

Most custom items are also stored as 1–5 numeric responses. Some are phrased as Yes/No questions but appear numerically coded; interpret them as degree/frequency unless you have access to the exact survey labels used in the form.

Recommended preprocessing

1) Preserve raw values

Keep the XLSX as the source of truth. If you export to CSV, store a copy of the exact export and record your preprocessing steps.

2) Clean column names (optional)

If you want to work with cleaner headers, you may:

  • strip whitespace,
  • replace non‑breaking spaces,
  • replace newlines with spaces,
  • map the raw headers to short snake_case names.
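
The cleaning steps above can be sketched in one small helper (the function name is illustrative; it reproduces the suggested short names in the data dictionary, including accented characters):

```python
import re

def clean_header(raw: str) -> str:
    """Map a raw form-export header to a short snake_case name."""
    s = raw.replace("\xa0", " ").replace("\n", " ")  # non-breaking spaces, newlines
    s = re.sub(r"\W+", "_", s.strip().lower())       # runs of punctuation/space -> _
    return s.strip("_")

print(clean_header("It is easy to use \xa0"))    # it_is_easy_to_use
print(clean_header("Hora de finalización"))      # hora_de_finalización
```

Applied to a pandas frame: `df.columns = [clean_header(c) for c in df.columns]`.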

3) Handle missingness

In the provided file:

  • Nombre and Hora de la última modificación are empty for all participants,
  • the open-ended Please, any comments more is missing for several participants.
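
A quick missingness check can confirm this before analysis. A sketch (the helper name is illustrative; the demo frame is synthetic, mimicking the gaps described above):

```python
import pandas as pd

def missingness_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, restricted to columns with any gaps."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)

# Synthetic illustration of the pattern described above:
demo = pd.DataFrame({
    "ID": [1, 2],
    "Nombre": [None, None],
    "Please, any comments more": ["Nice", None],
})
print(missingness_report(demo))
```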

Reproducible loading examples

Python (pandas)

import pandas as pd

df = pd.read_excel("Usability of ChatGPT(1-20).xlsx", sheet_name="Sheet1")

# Parse survey duration in minutes
start = pd.to_datetime(df["Hora de inicio"])
end = pd.to_datetime(df["Hora de finalización"])
df["duration_min"] = (end - start).dt.total_seconds() / 60

# SUS scoring (0–100)
sus_cols = [
    "I think that I would like to use this platform frequently ",
    "I found the platform unnecessarily complex",
    "I thought the platform was easy to use ",
    "I think that I would need the support of a technical person to be able to use this platform ",
    "I found the various functions in this platform were well integrated",
    "I thought there was too much inconsistency in the platform",
    "I would imagine that most people would learn to use this platform very quickly ",
    "I found the platform very cumbersome to use ",
    "I felt very confident using the platform",
    "I needed to learn a lot of things before I could get going with this platform ",
]

X = df[sus_cols].copy()
odd = [0, 2, 4, 6, 8]   # items 1,3,5,7,9
even = [1, 3, 5, 7, 9]  # items 2,4,6,8,10

sus_contrib = X.copy()
sus_contrib.iloc[:, odd] = sus_contrib.iloc[:, odd] - 1
sus_contrib.iloc[:, even] = 5 - sus_contrib.iloc[:, even]
df["sus_score_0_100"] = sus_contrib.sum(axis=1) * 2.5

R (readxl)

library(readxl)
df <- read_excel("Usability of ChatGPT(1-20).xlsx", sheet = "Sheet1")
dim(df)
head(df)

Ethics, privacy, and responsible reuse

  • The provided export is anonymized at the level of direct identifiers:
    Correo electrónico is "anonymous" for all records and Nombre is empty.
  • Free-text fields may still contain incidental personal information entered by participants.
    If you publish derived versions (cleaned CSV, annotated texts), consider re-checking those fields before release.
  • This dataset reflects self-reported perceptions (usability/learning impressions) and does not measure objective learning gains.

License

Choose a Zenodo license compatible with participant consent and institutional policy.

Common options:

  • CC BY 4.0 (recommended for open data when possible),
  • CC BY-NC 4.0 (if restricting commercial reuse is required),
  • CC BY-NC-ND 4.0 (more restrictive; may reduce reuse).

The associated paper is published under CC BY-NC-ND 4.0 according to the PDF footer.

How to cite

Dataset (this Zenodo record)

Replace placeholders after Zenodo assigns the DOI and version.

Recommended citation:

> Gervás, P., León, C., Kumar, M., Méndez, G., & Bautista, S. ([YEAR]). Usability Survey Dataset: Prompted ChatGPT Role-Play for Language Practice (Version [VERSION]) [Data set]. Zenodo. https://doi.org/[ZENODO_DOI]

BibTeX (dataset)

@dataset{gervas_leon_kumar_mendez_bautista_zenodo_dataset,
  author       = {Pablo Gerv{\'a}s and Carlos Le{\'o}n and Mayuresh Kumar and Gonzalo M{\'e}ndez and Susana Bautista},
  title        = {Usability Survey Dataset: Prompted ChatGPT Role-Play for Language Practice},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18783531}
}

Related paper

Recommended citation:

> Gervás, P., León, C., Kumar, M., Méndez, G., & Bautista, S. (2025). Prompting an LLM Chatbot to Role Play Conversational Situations for Language Practice. In Proceedings of the 17th International Conference on Computer Supported Education (CSEDU 2025) – Volume 2, 257–264. SciTePress. https://doi.org/10.5220/0013235400003932

BibTeX (paper)

@conference{csedu25,
  author       = {Pablo Gerv{\'a}s and Carlos Le{\'o}n and Mayuresh Kumar and Gonzalo M{\'e}ndez and Susana Bautista},
  title        = {{Prompting an LLM Chatbot to Role Play Conversational Situations for Language Practice}},
  booktitle    = {Proceedings of the 17th International Conference on Computer Supported Education - Volume 2: CSEDU},
  year         = {2025},
  pages        = {257--264},
  publisher    = {SciTePress},
  organization = {INSTICC},
  doi          = {10.5220/0013235400003932},
  isbn         = {978-989-758-746-7}
}

Versioning and changelog

  • v1.0.0 — Initial release of the raw XLSX export (N=20).

Funding / acknowledgements (optional)

If you have project or institutional funding information, add it here (grant numbers, institutions, etc.).

Questions?

Please contact Carlos León (cleon@ucm.es).

Files (41.7 kB)

  • README.md
  • Usability of ChatGPT(1-20).xlsx

Additional details

Related works

Is described by
Conference paper: 10.5220/0013235400003932 (DOI)

Funding

Agencia Estatal de Investigación
DARK NITE PID2023-146308OB-I00
European Commission
EA-DIGIFOLK 101086338
Consejo de Seguridad Nuclear
ADARVE SUBV20/2021
Agencia Estatal de Investigación
CANTOR PID2019-108927RB- I00

Software

Development Status
Active