There is a newer version of the record available.

Published July 27, 2023 | Version 2.0.0
Dataset Open


  • 1. Grupo de Patología Musculoesquelética. Hospital Clínico San Carlos. Instituto de Investigación Sanitaria San Carlos (IdISSC), Prof. Martin Lagos s/n, Madrid, 28040, Spain
  • 2. Reumatología. Hospital Universitario la Paz-IdiPaz, Paseo de la Castellana, 261, Madrid, 28046, Spain
  • 3. Medicina Interna. Hospital Universitario del Henares, Avenida de Marie Curie, 0, Madrid, 28822, Spain


This dataset accompanies the research paper entitled: 

Harnessing ChatGPT and GPT-4 for Evaluating the Rheumatology Questions of the Spanish Access Exam to Specialized Medical Training.

Alfredo Madrid-García, Zulema Rosales-Rosado, Dalifer Dayanira Freites-Núñez, Inés Pérez-Sancristobal, Esperanza Pato-Cour, Chamaida Plasencia-Rodríguez, Luis Cabeza-Osorio, Lydia Abasolo Alcazar, Leticia Leon Mateos, Benjamín Fernández-Gutiérrez, Luis Rodríguez-Rodríguez

medRxiv 2023.07.21.23292821; doi:

The dataset contains 145 rheumatology-related questions extracted from the Spanish MIR exams held between the academic years 2009-2010 and 2022-2023. Each question is answered by ChatGPT, GPT-4, BARD and CLAUDE. Six rheumatologists assess the clinical reasoning of ChatGPT and GPT-4.

The dataset is made up of the following columns:

Column | Description
Id | Question identifier
Question (ES) | MIR exam question in Spanish
Question (EN) | Translation of `Question (ES)` column
Year | Academic year of the question (from 2009-2010 to 2022-2023)
Question Type | Case or factual question
Genre | Male, Female, Does not apply, No sex (newborn)
Invalidated question | 0, 1 (question invalidated by the Spanish Ministry of Health)
Official answer | Official answer given by the Spanish Ministry of Health
GPT-4 answer | Answer provided by GPT-4
Correct answer GPT-4 | 0, 1 (whether the answer provided by GPT-4 is correct)
Clinical reasoning GPT-4 (ES) | Clinical reasoning provided by GPT-4
Clinical reasoning GPT-4 (EN) | Translation of `Clinical reasoning GPT-4 (ES)` column
Eval1_GPT4 | Score of the `Clinical reasoning GPT-4 (ES)` column given by the first evaluator
Eval2_GPT4 | Score of the `Clinical reasoning GPT-4 (ES)` column given by the second evaluator
Eval3_GPT4 | Score of the `Clinical reasoning GPT-4 (ES)` column given by the third evaluator
Eval4_GPT4 | Score of the `Clinical reasoning GPT-4 (ES)` column given by the fourth evaluator
Eval5_GPT4 | Score of the `Clinical reasoning GPT-4 (ES)` column given by the fifth evaluator
Eval6_GPT4 | Score of the `Clinical reasoning GPT-4 (ES)` column given by the sixth evaluator
ChatGPT answer | Answer provided by ChatGPT
Correct answer ChatGPT | 0, 1 (whether the answer provided by ChatGPT is correct)
Clinical reasoning ChatGPT (ES) | Clinical reasoning provided by ChatGPT
Clinical reasoning ChatGPT (EN) | Translation of `Clinical reasoning ChatGPT (ES)` column
Eval1_ChatGPT | Score of the `Clinical reasoning ChatGPT (ES)` column given by the first evaluator
Eval2_ChatGPT | Score of the `Clinical reasoning ChatGPT (ES)` column given by the second evaluator
Eval3_ChatGPT | Score of the `Clinical reasoning ChatGPT (ES)` column given by the third evaluator
Eval4_ChatGPT | Score of the `Clinical reasoning ChatGPT (ES)` column given by the fourth evaluator
Eval5_ChatGPT | Score of the `Clinical reasoning ChatGPT (ES)` column given by the fifth evaluator
Eval6_ChatGPT | Score of the `Clinical reasoning ChatGPT (ES)` column given by the sixth evaluator
Disease category (ES) | The disease that the question addresses (Bone metabolism, Infective arthritis, Microcrystalline arthritis, Others, Rheumatoid arthritis, Scleroderma, Spondyloarthropathies, Systemic lupus erythematosus, Vasculitis)
Disease category (EN) | Translation of `Disease category (ES)` column
CLAUDE answer | Answer provided by CLAUDE
Correct answer CLAUDE | 0, 1 (whether the answer provided by CLAUDE is correct)
Clinical reasoning CLAUDE (ES) | Clinical reasoning provided by CLAUDE
Clinical reasoning CLAUDE (EN) | Translation of `Clinical reasoning CLAUDE (ES)` column
BARD answer | Answer provided by BARD
Correct answer BARD | 0, 1 (whether the answer provided by BARD is correct)
Clinical reasoning BARD (ES) | Clinical reasoning provided by BARD
Clinical reasoning BARD (EN) | Translation of `Clinical reasoning BARD (ES)` column
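
The columns above lend themselves to two simple summaries: the fraction of correct answers per model, and the mean evaluator score for a question's clinical reasoning. The following is a minimal sketch, not the authors' analysis. It assumes the file can be read as CSV with the column names listed above, that `1` in `Invalidated question` marks an invalidated question, and that such questions should be excluded from accuracy. The helper names `accuracy_by_model` and `mean_reasoning_score` are hypothetical, and the sample rows are invented for illustration only. Note that the evaluator columns use the suffix `GPT4`, while the answer columns use `GPT-4`.

```python
import csv
import io
from statistics import mean

MODELS = ["GPT-4", "ChatGPT", "CLAUDE", "BARD"]


def accuracy_by_model(rows, models=MODELS, skip_invalidated=True):
    """Return {model: fraction of correct answers} from 'Correct answer <model>'."""
    if skip_invalidated:
        # Assumption: "1" flags an invalidated question.
        rows = [r for r in rows if str(r["Invalidated question"]).strip() != "1"]
    result = {}
    for model in models:
        key = f"Correct answer {model}"
        flags = [int(r[key]) for r in rows if r.get(key) not in (None, "")]
        result[model] = sum(flags) / len(flags) if flags else None
    return result


def mean_reasoning_score(row, suffix):
    """Average the available Eval1..Eval6 scores; suffix is 'GPT4' or 'ChatGPT'."""
    keys = [f"Eval{i}_{suffix}" for i in range(1, 7)]
    scores = [float(row[k]) for k in keys if row.get(k) not in (None, "")]
    return mean(scores) if scores else None


# Invented sample rows with a subset of the columns (not real dataset values).
SAMPLE = """Id,Invalidated question,Correct answer GPT-4,Correct answer ChatGPT,Eval1_GPT4,Eval2_GPT4
1,0,1,1,5,4
2,0,1,0,4,
3,1,0,0,3,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
acc = accuracy_by_model(rows, models=["GPT-4", "ChatGPT"])
print(acc)
print(mean_reasoning_score(rows[0], "GPT4"))
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an open file handle; if the file is distributed in another format (for example a spreadsheet), the rows would need to be loaded with a suitable reader first.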

The questions and the clinical reasoning were translated from Spanish into English with DeepL.


Files (587.7 kB)

