CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish (version 3)
Description
A collection of 1,200 texts (292,173 tokens) on clinical trial studies and clinical trial announcements in Spanish:
- 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO).
- 700 clinical trial announcements published in the European Clinical Trials Register and the Repositorio Español de Estudios Clínicos.
The texts were annotated with the following entity types:
- Semantic groups from the Unified Medical Language System: ANAT, CHEM, DEVI, DISO, LIVB, PHYS and PROC.
- Medical drug information: Contraindicated, Dose or strength, Form and Route or mode of administration.
- Temporal expressions: Age, Date, Duration, Frequency and Time.
- Miscellaneous medical entities: Concept, Food or drink, Observation or finding, Quantifier_or_Qualifier, and Result_or_Value.
- Negation/Speculation: Neg_cue, Negated, Spec_cue and Speculated.
- Temporality attributes (History_of and Future) and experiencer attributes (Patient, Family_member and Other).
In addition, the following semantic relationships were annotated:
- Intervention-related relations:
• Has_Dose_or_Strength
• Has_Drug_Form
• Has_Route_or_Mode
• Combined_with
• Used_for
• Has_Result_or_Value
- Temporal relations:
• Before
• After
• Overlap
• Has_Age
• Has_Frequency
• Has_Duration_or_Interval
- Event-related relations:
• Causes
• Experiences
• Has_Quantifier_or_Qualifier
• Location_of
- Assertion relations:
• Negation
• Speculation
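The relation inventory above can be written down as a small lookup table, which is handy when filtering or counting relations by category. This is only a sketch: the names `RELATION_GROUPS` and `relation_group` are hypothetical, and the label strings are copied from the list above, so they may not match the exact spelling used in the annotation files.

```python
# Relation labels grouped as in the description above.
# Assumption: the strings mirror this description, not necessarily the files.
RELATION_GROUPS = {
    "Intervention-related": [
        "Has_Dose_or_Strength",
        "Has_Drug_Form",
        "Has_Route_or_Mode",
        "Combined_with",
        "Used_for",
        "Has_Result_or_Value",
    ],
    "Temporal": [
        "Before",
        "After",
        "Overlap",
        "Has_Age",
        "Has_Frequency",
        "Has_Duration_or_Interval",
    ],
    "Event-related": [
        "Causes",
        "Experiences",
        "Has_Quantifier_or_Qualifier",
        "Location_of",
    ],
    "Assertion": [
        "Negation",
        "Speculation",
    ],
}


def relation_group(label):
    """Return the group a relation label belongs to, or None if unknown."""
    for group, labels in RELATION_GROUPS.items():
        if label in labels:
            return group
    return None
```

For example, `relation_group("Has_Age")` returns `"Temporal"`, and an unlisted label returns `None`.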
Of all annotated entities, 81.72% were normalized to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs).
Files
| Name | Size | Checksum |
|---|---|---|
| CT-EBM-SP-v3.zip | 12.2 MB | md5:a72a9c018f234b0166f0dfe9e2b7d6d6 |
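After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_archive` are hypothetical, and the default path assumes the archive was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum published on the record page for CT-EBM-SP-v3.zip.
EXPECTED_MD5 = "a72a9c018f234b0166f0dfe9e2b7d6d6"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path="CT-EBM-SP-v3.zip"):
    """Return True if the file at `path` matches the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling `verify_archive()` returns `True` only when the local file is byte-identical to the published archive; a `False` result suggests an incomplete or corrupted download.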
Additional details
Related works
- Continues: Dataset, DOI 10.5281/zenodo.13880599
Funding
- Agencia Estatal de Investigación: CLARA-MeD, funded by MICIU/AEI/10.13039/501100011033 in project call "Proyectos I+D+i Retos Investigación" PID2020-116001RA-C33
- Consejo Superior de Investigaciones Científicas: JAE Intro 2021
Software
- Repository URL: https://github.com/lcampillos/ct-ebm-sp-v3/
- Programming language: Python
- Development status: Active