ESQAD: Educational Spanish Question-Answer Dataset
Creators
- 1. Universidad Politécnica de Madrid
- 2. Universidad Politécnica de Madrid, Escuela Universitaria de Informática
Description
Spanish Question-Answer Generation (QAG) Dataset and Code
This repository contains the dataset and source code developed for the article: "ESQAD: An Open Spanish Dataset for Curriculum-Aligned Question-Answer Generation in Educational Settings"
The resources include:
- A Spanish QAG dataset aligned with the national curriculum assessed in the university entrance exam (EVAU).
- Automatically generated question-answer pairs from literary and legal sources.
- A pilot-study subset with questions validated by teachers and students.
Dataset Structure
1. EVAU
- File: `evau/docs/EvAU_QA.csv`
- Description: Manually curated questions and answers aligned with the Spanish *Evaluación para el Acceso a la Universidad (EVAU)*.
- Columns: `question`, `answer`, `subject`, `difficulty`
- Purpose: Benchmark for educational QAG tasks in Spanish.
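A minimal loading sketch for the EVAU file, assuming it is a UTF-8 CSV with a header row and the default comma delimiter (neither is stated above). It checks that the documented columns are present and counts QA pairs per subject or difficulty; the helper names `load_qa_rows` and `count_by` are hypothetical, not part of the released code.

```python
import csv
from collections import Counter
from typing import Iterable

# Column names as documented for evau/docs/EvAU_QA.csv.
EVAU_COLUMNS = ("question", "answer", "subject", "difficulty")


def load_qa_rows(lines: Iterable[str], expected_columns: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into row dicts.

    Assumption: header row present, comma-delimited. Raises ValueError if any
    documented column is missing from the header.
    """
    reader = csv.DictReader(lines)
    missing = set(expected_columns) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def count_by(rows: list[dict[str, str]], column: str) -> Counter:
    """Count rows per distinct value of one column (e.g. subject or difficulty)."""
    return Counter(row[column] for row in rows)
```

Usage: open the file with `open("evau/docs/EvAU_QA.csv", encoding="utf-8", newline="")`, pass the handle to `load_qa_rows(handle, EVAU_COLUMNS)`, then call `count_by(rows, "subject").most_common()` to see how the benchmark is distributed across subjects.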
2. Quijote
- File: `quijote/docs/Quijote_QA.csv`
- Description: Automatically generated QAG pairs from *Don Quijote de la Mancha*.
- Columns: `question`, `answer`, `chapter`, `difficulty`
- Purpose: Evaluation of QAG performance on literary texts.
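As a sketch of how the `chapter` and `difficulty` columns might be used together, the snippet below indexes the Quijote pairs by chapter and draws a reproducible quiz for one chapter at one difficulty level. It assumes a comma-delimited CSV with a header row and treats chapter and difficulty values as raw strings, since their exact format is not documented here; `index_by_chapter` and `sample_quiz` are hypothetical helpers.

```python
import csv
import random
from collections import defaultdict
from typing import Iterable

Row = dict[str, str]


def index_by_chapter(lines: Iterable[str]) -> dict[str, list[Row]]:
    """Group QA rows by their `chapter` value, kept as the raw string."""
    index: dict[str, list[Row]] = defaultdict(list)
    for row in csv.DictReader(lines):
        index[row["chapter"]].append(row)
    return dict(index)


def sample_quiz(index: dict[str, list[Row]], chapter: str, difficulty: str, k: int, seed: int = 0) -> list[Row]:
    """Pick up to k rows from one chapter at one difficulty level.

    A fixed seed makes the selection reproducible across runs.
    """
    pool = [row for row in index.get(chapter, []) if row["difficulty"] == difficulty]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

Capping `k` at the pool size means a sparse chapter returns fewer questions instead of raising an error.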
3. Legal FAQs
- File: `legal_faqs/docs/Legal_QA.csv`
- Description: Questions and answers extracted from, or generated based on, FAQs about two Spanish administrative laws (*Ley 39/2015* and *Ley 40/2015*).
- Columns: `question`, `answer`, `law_reference`
- Purpose: Testing QAG in legal and administrative contexts.
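To split the legal subset by law, one option is to pull the law number out of `law_reference` with a regular expression. This is a sketch under a loudly stated assumption: that each `law_reference` value contains the number in `NN/YYYY` form (for example `Ley 39/2015, art. 21`); the actual format of that column is not documented here, and `law_number` and `count_by_law` are hypothetical helpers.

```python
import csv
import re
from collections import Counter
from typing import Iterable, Optional

# Assumption: the law number appears as "NN/YYYY" somewhere in law_reference.
LAW_NUMBER = re.compile(r"\b(\d{1,3}/\d{4})\b")


def law_number(reference: str) -> Optional[str]:
    """Return the first NN/YYYY law number found, or None if there is none."""
    match = LAW_NUMBER.search(reference)
    return match.group(1) if match else None


def count_by_law(lines: Iterable[str]) -> Counter:
    """Count QA rows per law; rows without a recognizable number go to 'unknown'."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        counts[law_number(row["law_reference"]) or "unknown"] += 1
    return counts
```

Keeping an explicit `unknown` bucket makes it visible when the assumed format does not hold for some rows.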
4. Exams (Pilot Study)
- File: `exams/exams_QA_validated.json`
- Description: 923 automatically generated QAG pairs evaluated by teachers and students during a pilot study:
  - Ratings: clarity, complexity, and pedagogical value (1–3 scale).
  - Difficulty: intended vs. perceived difficulty levels.
  - Comments: free-text feedback from participants.
- Purpose: Benchmark for evaluating QAG quality with human-validated data.
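The pilot-study annotations lend themselves to two simple summaries: the mean of each 1–3 rating and how often perceived difficulty matches the intended level. The record layout below is entirely hypothetical (a top-level JSON list whose items have a `ratings` object with `clarity`, `complexity`, and `pedagogical_value`, plus `intended_difficulty` and `perceived_difficulty` keys); rename the keys to whatever `exams/exams_QA_validated.json` actually uses.

```python
import json
from statistics import mean
from typing import Any

# Hypothetical key names; the real JSON schema is not documented here.
RATING_KEYS = ("clarity", "complexity", "pedagogical_value")


def load_records(path: str) -> list[dict[str, Any]]:
    """Load the validated exam file, assuming a top-level JSON list of records."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of QA records")
    return data


def summarize(records: list[dict[str, Any]]) -> dict[str, float]:
    """Mean of each rating (1-3 scale) plus the intended/perceived difficulty agreement rate."""
    summary: dict[str, float] = {}
    for key in RATING_KEYS:
        values = [r["ratings"][key] for r in records if key in r.get("ratings", {})]
        if values:  # skip ratings absent from every record
            summary[key] = mean(values)
    paired = [r for r in records if "intended_difficulty" in r and "perceived_difficulty" in r]
    if paired:
        matches = sum(r["intended_difficulty"] == r["perceived_difficulty"] for r in paired)
        summary["difficulty_agreement"] = matches / len(paired)
    return summary
```

If each item carries several ratings (one per teacher or student), the per-item values would need to be flattened before averaging; the sketch assumes one rating per key per record.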
Citation
This dataset accompanies the article:
Badenes-Olmedo, C., Eyzaguirre-Barreda, P., Chu-Artzt, N., & Gayoso-Cabada, J. (2025).
"ESQAD: A Curriculum-Aligned Spanish Dataset for Educational Question Answering"
Submitted to *Computer Speech & Language* (Elsevier, 2025).
Please cite the article once it is published; in the meantime, cite this Zenodo record's DOI.
License
- Datasets: CC BY 4.0
- Source Code: MIT License
Contact
- Carlos Badenes-Olmedo: carlos.badenes@upm.es
- Noa Chu-Artzt: noa.chu.artzt@alumnos.upm.es
- Paul Eyzaguirre-Barreda: paul.eyzaguirre@alumnos.upm.es
- Joaquin Gayoso-Cabada: j.gayoso@upm.es
Files
| Name | Size | MD5 |
|---|---|---|
| QAG_Spanish_ExpertSystems_v1.0.zip | 676.1 MB | ab6d34257ca2f8d2daa6c6dcd6f238a2 |
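After downloading, the archive can be checked against the MD5 checksum listed for `QAG_Spanish_ExpertSystems_v1.0.zip`. The sketch below hashes the file in 1 MiB chunks so the roughly 676 MB archive never has to be held in memory; the chunk size and helper names are illustrative choices, not part of the release.

```python
import hashlib
from typing import Iterable, Iterator

# Checksum published with the Zenodo record.
EXPECTED_MD5 = "ab6d34257ca2f8d2daa6c6dcd6f238a2"


def md5_hexdigest(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into a single MD5 digest and return it as hex."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks (1 MiB by default)."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_hexdigest(read_chunks(path)) == expected.lower()
```

Usage: `verify_archive("QAG_Spanish_ExpertSystems_v1.0.zip")` should return `True` for an intact download. On Linux, `md5sum` on the same file gives an equivalent check from the shell.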