CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 3)
Authors/Creators
Description
A collection of 1,200 texts (292,173 tokens) about clinical trial studies and clinical trial announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. those available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trial announcements published in the European Clinical Trials Register and the Repositorio Español de Estudios Clínicos.
Texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System: ANAT, CHEM, DEVI, DISO, LIVB, PHYS and PROC.
- Medical drug information: Contraindicated, Dose_or_Strength, Form, and Route_or_Mode_of_administration.
- Temporal expressions: Age, Date, Duration_or_Interval, Frequency and Time.
- Miscellaneous medical entities: Concept, Food_or_Drink, Observation_or_Finding, Quantifier_or_Qualifier, and Result_or_Value.
- Negation/Speculation: Neg_cue, Negated, Spec_cue and Speculated.
- Attributes of temporality (Future, Family_history_of, and History_of), experiencer (Patient, Family_member and Other) and other information (Hypothetical).
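This record does not state the on-disk annotation format. The sketch below assumes BRAT-style standoff `.ann` files, where entities are `T` lines ("Type start end" plus the covered text) and attributes are `A` lines ("Name TargetID [Value]"). The sample lines, the attribute name `Experiencer`, and the function name are hypothetical and only illustrate how entity types and attributes like those listed above could be tallied.

```python
from collections import Counter

# Hypothetical BRAT-style excerpt (illustrative only; not taken from the corpus).
SAMPLE_ANN = (
    "T1\tDISO 10 16\tfiebre\n"
    "T2\tCHEM 25 36\tparacetamol\n"
    "T3\tDose_or_Strength 37 43\t500 mg\n"
    "A1\tExperiencer T1 Patient\n"
    "A2\tFuture T2\n"
)


def parse_entities_and_attributes(ann_text):
    """Return (entities, attributes) parsed from assumed BRAT standoff text."""
    entities, attributes = {}, []
    for line in ann_text.splitlines():
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T") and len(fields) >= 3:
            # Second field is "Type start end"; discontinuous spans add ";start end" pairs.
            label = fields[1].split(" ", 1)[0]
            entities[ann_id] = {"type": label, "text": fields[2]}
        elif ann_id.startswith("A") and len(fields) >= 2:
            # Second field is "Name TargetID [Value]"; binary attributes carry no value.
            parts = fields[1].split()
            value = parts[2] if len(parts) > 2 else True
            attributes.append({"name": parts[0], "target": parts[1], "value": value})
    return entities, attributes


entities, attributes = parse_entities_and_attributes(SAMPLE_ANN)
# Tally entities per type (e.g. DISO, CHEM, Dose_or_Strength).
type_counts = Counter(e["type"] for e in entities.values())
print(type_counts)
```

Returning attributes as a flat list, rather than attaching them to entities, keeps binary attributes (such as Future) and valued ones (such as an experiencer) in a single uniform shape.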
In addition, the following semantic relationships were annotated:
- Intervention-related relations:
• Has_Dose_or_Strength
• Has_Drug_Form
• Has_Route_or_Mode
• Combined_with
• Used_for
• Has_Result_or_Value
- Temporal relations:
• Before
• After
• Overlap
• Has_Age
• Has_Frequency
• Has_Duration_or_Interval
- Event-related relations:
• Causes
• Experiences
• Has_Quantifier_or_Qualifier
• Location_of
- Assertion relations:
• Negation
• Speculation
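Under the same assumption of BRAT-style standoff files (relations as `R` lines of the form "Label Arg1:Tn Arg2:Tm"), the relation labels above could be read into (argument, label, argument) triples and mapped back to the four groups listed in this record. The sample lines and function names are hypothetical; the label-to-group table follows the lists above.

```python
import re

# Hypothetical BRAT-style excerpt (illustrative only; not taken from the corpus).
SAMPLE_ANN = (
    "T1\tDISO 10 16\tfiebre\n"
    "T2\tCHEM 25 36\tparacetamol\n"
    "T3\tDose_or_Strength 37 43\t500 mg\n"
    "R1\tHas_Dose_or_Strength Arg1:T2 Arg2:T3\n"
    "R2\tUsed_for Arg1:T2 Arg2:T1\n"
)

# Relation groups exactly as listed in the corpus description.
RELATION_GROUPS = {
    "Intervention": ["Has_Dose_or_Strength", "Has_Drug_Form", "Has_Route_or_Mode",
                     "Combined_with", "Used_for", "Has_Result_or_Value"],
    "Temporal": ["Before", "After", "Overlap", "Has_Age", "Has_Frequency",
                 "Has_Duration_or_Interval"],
    "Event": ["Causes", "Experiences", "Has_Quantifier_or_Qualifier", "Location_of"],
    "Assertion": ["Negation", "Speculation"],
}
LABEL_TO_GROUP = {label: group for group, labels in RELATION_GROUPS.items()
                  for label in labels}

# Assumed binary-relation layout: "Label Arg1:Tn Arg2:Tm".
RELATION_RE = re.compile(r"^(\S+) Arg1:(T\d+) Arg2:(T\d+)$")


def parse_relations(ann_text):
    """Return (arg1 text, label, arg2 text) triples from assumed BRAT standoff text."""
    spans, raw = {}, []
    for line in ann_text.splitlines():
        fields = line.split("\t")
        if fields[0].startswith("T") and len(fields) >= 3:
            spans[fields[0]] = fields[2]
        elif fields[0].startswith("R") and len(fields) >= 2:
            match = RELATION_RE.match(fields[1].strip())
            if match:
                raw.append(match.groups())
    # Resolve IDs after the loop so relation lines may precede their entities;
    # relations pointing at unknown IDs are skipped.
    return [(spans[a1], label, spans[a2]) for label, a1, a2 in raw
            if a1 in spans and a2 in spans]


for arg1, label, arg2 in parse_relations(SAMPLE_ANN):
    print(f"{LABEL_TO_GROUP.get(label, '?')}: {arg1} --{label}--> {arg2}")
```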
In total, 81.75% of all annotated entities were normalized to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs).
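A corpus-wide normalization rate like the one above is a pooled ratio: normalized entities over all entities, summed across files, not an average of per-file percentages. The sketch below assumes, hypothetically, that CUIs are stored as BRAT `AnnotatorNotes` lines (`#` lines) attached to entity IDs; the CUI values in the sample only illustrate the "C followed by seven digits" format, and the function names are illustrative.

```python
import re

# UMLS CUIs have the form "C" followed by seven digits.
CUI_RE = re.compile(r"\bC\d{7}\b")

# Hypothetical excerpt: CUIs assumed to live in AnnotatorNotes lines.
SAMPLE_ANN = (
    "T1\tDISO 10 16\tfiebre\n"
    "#1\tAnnotatorNotes T1\tC0015967\n"
    "T2\tCHEM 25 36\tparacetamol\n"
    "#2\tAnnotatorNotes T2\tC0000970\n"
    "T3\tDose_or_Strength 37 43\t500 mg\n"
)


def count_normalized(ann_text):
    """Return (entities with a CUI note, total entities) for one .ann text."""
    entity_ids, normalized = set(), set()
    for line in ann_text.splitlines():
        fields = line.split("\t")
        if fields[0].startswith("T"):
            entity_ids.add(fields[0])
        elif fields[0].startswith("#") and len(fields) >= 3:
            note_type, _, target = fields[1].partition(" ")
            if note_type == "AnnotatorNotes" and CUI_RE.search(fields[2]):
                normalized.add(target.strip())
    return len(normalized & entity_ids), len(entity_ids)


def corpus_normalization_rate(ann_texts):
    """Pool counts over all files, then divide (not a mean of per-file rates)."""
    hits = total = 0
    for text in ann_texts:
        n, t = count_normalized(text)
        hits += n
        total += t
    return 100.0 * hits / total if total else 0.0


print(count_normalized(SAMPLE_ANN))
```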
This is the final version of the corpus, which incorporates the corrections made after each file was reviewed by a second annotator; that is, two annotators reviewed each corpus file.
Python code for relation extraction is available in the companion GitHub repository: https://github.com/lcampillos/ct-ebm-sp-v3
Files
| Name | Size | MD5 checksum |
|---|---|---|
| CT-EBM-SP-v3.zip | 21.8 MB | cec1b6e9650bcda5d5bbeec1098b23e7 |
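The archive is published with an MD5 checksum. A minimal sketch, using only Python's standard library, for checking a downloaded copy against it; the local path you pass in (wherever you saved CT-EBM-SP-v3.zip) is an assumption.

```python
import hashlib
from pathlib import Path

# Checksum listed for CT-EBM-SP-v3.zip in this record.
EXPECTED_MD5 = "cec1b6e9650bcda5d5bbeec1098b23e7"


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large archives are not read into memory at once."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_archive("CT-EBM-SP-v3.zip")` returns `True` only for an intact download. MD5 is adequate for detecting transfer corruption, though not for tamper resistance.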
Additional details
Related works
- Continues: Dataset, 10.5281/zenodo.13880599 (DOI)
Funding
- Agencia Estatal de Investigación: CLARA-MeD, funded by MICIU/AEI/10.13039/501100011033 in the project call "Proyectos I+D+i Retos Investigación" (PID2020-116001RA-C33)
- Consejo Superior de Investigaciones Científicas: JAE Intro 2021
- Agencia Estatal de Investigación: ExPlain4Health project, funded by MICIU/AEI/10.13039/501100011033 (PID2024-158912NB-I00)
Dates
- Available: 2025-12-29
Software
- Repository URL: https://github.com/lcampillos/ct-ebm-sp-v3/
- Programming language: Python
- Development status: Active