CARMEN-I: Clinical Entities Annotation Guidelines in Spanish
Authors/Creators
- 1. Barcelona Supercomputing Center, Spain
Description
CARMEN-I is a corpus of 2,000 de-identified clinical records generated at the Hospital Clínic of Barcelona (HCB) from March 2020 to March 2022, during the height of the COVID-19 pandemic, and developed in collaboration with the Barcelona Supercomputing Center. It consists of discharge letters, referrals and radiology reports written mainly in Spanish, with some sections in Catalan. The corpus covers patients admitted with COVID-19 and includes a wide variety of comorbidities, such as kidney failure, chronic cardiovascular and respiratory diseases, malignancies and immunosuppression. CARMEN-I has been exhaustively anonymized and validated by hospital physicians, natural language processing experts and linguists, who followed detailed annotation guidelines and replaced the original sensitive data elements with synthetic equivalents. A subset of the corpus has been annotated by experts with key medical concepts, namely symptoms, diseases, procedures, medications, species and humans (including family members), using an annotation scheme based on previously released biomedical corpora such as DisTEMIST, ProcTEMIST and LivingNER.
This repository includes the Spanish-language annotation guidelines for clinical entities developed for the corpus. They cover six types of clinical entities: diseases, symptoms, procedures, drugs, species, and humans. The rules are based on previous work carried out by the Barcelona Supercomputing Center team: the DisTEMIST corpus for diseases, SympTEMIST for symptoms and findings, ProcTEMIST for procedures, DrugTEMIST for drugs, and LivingNER for species and humans. These guidelines compile the rules followed for the annotation of these clinical entities and serve as a summary of the previously published annotation guidelines.
CARMEN-I is available on PhysioNet upon request.
Other relevant links:
- CARMEN-I Clinical Entities Annotation Guidelines (Spanish version): zenodo.org/doi/10.5281/zenodo.10171539
- CARMEN-I Clinical Entities Annotation Guidelines (English version): zenodo.org/doi/10.5281/zenodo.10171646
- CARMEN-I Anonymization Protocol (Spanish version): zenodo.org/doi/10.5281/zenodo.10171660
- CARMEN-I Anonymization Protocol (English version): zenodo.org/doi/10.5281/zenodo.10171681
If you use this document, please cite:
@article{LimaLopez2025,
author = {Salvador Lima-López and Eulàlia Farré-Maduell and Luis Gasco and Jan Rodríguez-Miret and Santiago Frid and Xavier Pastor and Xavier Borrat and Martin Krallinger},
title = {A textual dataset of de-identified health records in Spanish and Catalan for medical entity recognition and anonymization},
journal = {Scientific Data},
volume = {12},
pages = {Article 1088},
year = {2025},
publisher = {Nature Publishing Group},
doi = {10.1038/s41597-025-05320-1},
url = {https://www.nature.com/articles/s41597-025-05320-1}
}
Files
| Name | Size |
|---|---|
| [HCB-BSC] Guías generales de anotación CARMEN-I.pdf (md5:65390beb65813c41639ff4d28a660e49) | 2.1 MB |
Additional details
Related works
- Is variant form of
- Other: 10.5281/zenodo.10171646 (DOI)