There is a newer version of the record available.

Published April 23, 2025 | Version 3
Dataset Open

CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 3)

  • 1. Consejo Superior de Investigaciones Científicas
  • 2. Spanish Royal Academy of Medicine
  • 3. Hospital General Universitario Gregorio Marañón
  • 4. Agencia Estatal Consejo Superior de Investigaciones Científicas

Description

A collection of 1200 texts (292173 tokens) in Spanish about clinical trial studies and clinical trial announcements:

- 500 abstracts from journals published under a Creative Commons license, e.g. those available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trial announcements published in the European Clinical Trials Register and the Repositorio Español de Estudios Clínicos.

Texts were annotated with the following entity types:

- Semantic groups from the Unified Medical Language System: ANAT, CHEM, DEVI, DISO, LIVB, PHYS and PROC.
- Medical drug information: Contraindicated, Dose or strength, Form and Route or mode of administration.
- Temporal expressions: Age, Date, Duration, Frequency and Time.
- Miscellaneous medical entities: Concept, Food or drink, Observation or finding, Quantifier_or_Qualifier, and Result_or_Value.
- Negation/Speculation: Neg_cue, Negated, Spec_cue and Speculated.
- Temporality attributes (History_of and Future) and experiencer attributes (Patient, Family_member and Other).

In addition, the following semantic relationships were annotated: 

- Intervention-related relations
    • Has_Dose_or_Strength
    • Has_Drug_Form
    • Has_Route_or_Mode
    • Combined_with
    • Used_for
    • Has_Result_or_Value
- Temporal relations
    • Before
    • After
    • Overlap
    • Has_Age 
    • Has_Frequency
    • Has_Duration_or_Interval
- Event-related relations
    • Causes
    • Experiences
    • Has_Quantifier_or_Qualifier
    • Location_of
- Assertion relations
    • Negation
    • Speculation

Of all annotated entities, 81.72% were normalized to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs).
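
To get a feel for how the entity and relation labels listed above are distributed in a document, the annotations can be tallied per label. The sketch below assumes the annotations are stored in BRAT standoff format (`.ann` files with tab-separated `T` lines for entities and `R` lines for relations); this record does not state the file format, so check the archive before relying on it. The function name `count_labels` and the sample annotation are hypothetical and only illustrate the idea.

```python
from collections import Counter


def count_labels(ann_text):
    """Count entity and relation labels in BRAT-style standoff text.

    Assumption: each annotation is one tab-separated line whose first field
    is an ID ("T1" for entities, "R1" for relations) and whose second field
    starts with the label.
    """
    entities = Counter()
    relations = Counter()
    for line in ann_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ann_id, body = fields[0], fields[1]
        label = body.split(" ", 1)[0]
        if ann_id.startswith("T"):
            entities[label] += 1
        elif ann_id.startswith("R"):
            relations[label] += 1
    return entities, relations


# Hypothetical annotation for the text "paracetamol para fiebre";
# it is not taken from the corpus.
SAMPLE_ANN = (
    "T1\tCHEM 0 11\tparacetamol\n"
    "T2\tDISO 17 23\tfiebre\n"
    "R1\tUsed_for Arg1:T1 Arg2:T2\n"
)

if __name__ == "__main__":
    entity_counts, relation_counts = count_labels(SAMPLE_ANN)
    print(dict(entity_counts))
    print(dict(relation_counts))
```

Attribute and normalization lines, if present, are ignored by this sketch; they would need their own handling.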

Files

CT-EBM-SP-v3.zip

Files (12.2 MB)

md5:a72a9c018f234b0166f0dfe9e2b7d6d6
12.2 MB
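
After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; it assumes the file was saved as `CT-EBM-SP-v3.zip` in the current directory, and the helper names `md5_of_file` and `verify_archive` are illustrative, not part of the dataset's repository.

```python
import hashlib
from pathlib import Path

# Checksum as published on this record.
EXPECTED_MD5 = "a72a9c018f234b0166f0dfe9e2b7d6d6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("CT-EBM-SP-v3.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of archive size.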

Additional details

Related works

Continues
Dataset: 10.5281/zenodo.13880599 (DOI)

Funding

Agencia Estatal de Investigación
CLARA-MeD, funded by MICIU/AEI/10.13039/501100011033 in project call "Proyectos I+D+i Retos Investigación" PID2020-116001RA-C33
Consejo Superior de Investigaciones Científicas
JAE Intro 2021

Software

Repository URL
https://github.com/lcampillos/ct-ebm-sp-v3/
Programming language
Python
Development Status
Active