CARMEN-I: Anonymization Protocol for Clinical Reports in Spanish
Creators
- 1. Barcelona Supercomputing Center, Spain
- 2. Hospital Clínic de Barcelona, Spain
Description
CARMEN-I is a corpus of 2,000 de-identified clinical records generated at the Hospital Clínic of Barcelona (HCB) from March 2020 to March 2022, during the height of the COVID-19 pandemic, and developed in collaboration with the Barcelona Supercomputing Center (BSC). It consists of discharge letters, referrals and radiology reports written mainly in Spanish, with some sections in Catalan. The corpus covers patients admitted with COVID-19 and includes a wide variety of comorbidities, such as kidney failure, chronic cardiovascular and respiratory diseases, malignancies and immunosuppression. CARMEN-I has been exhaustively anonymized and validated by hospital physicians, natural language processing experts and linguists, following detailed annotation guidelines, with original sensitive data elements replaced by synthetic equivalents. A subset of the corpus has been annotated by experts with key medical concepts, namely symptoms, diseases, procedures, medications, species and humans (incl. family members), using an annotation scheme based on previously released biomedical corpora such as DisTEMIST, ProcTEMIST and LivingNER.
This repository includes the anonymization protocol in Spanish. The document describes the protocol created for the data anonymization process, as well as the control mechanisms put in place for that process. It also includes addenda to the MEDDOCAN guidelines for the annotation of sensitive data, criteria for the inclusion and exclusion of documents, and a list of indirect identifiers.
CARMEN-I is available on PhysioNet upon request.
Other relevant links:
- CARMEN-I Clinical Entities Annotation Guidelines (Spanish version): zenodo.org/doi/10.5281/zenodo.10171539
- CARMEN-I Clinical Entities Annotation Guidelines (English version): zenodo.org/doi/10.5281/zenodo.10171646
- CARMEN-I Anonymization Protocol (Spanish version): zenodo.org/doi/10.5281/zenodo.10171660
- CARMEN-I Anonymization Protocol (English version): zenodo.org/doi/10.5281/zenodo.10171681
- MEDDOCAN Anonymization Corpus: zenodo.org/doi/10.5281/zenodo.4279322
- MEDDOCAN Anonymization Guidelines: zenodo.org/doi/10.5281/zenodo.4279337
If you use this document, please cite:
@article{LimaLopez2025,
author = {Salvador Lima-López and Eulàlia Farré-Maduell and Luis Gasco and Jan Rodríguez-Miret and Santiago Frid and Xavier Pastor and Xavier Borrat and Martin Krallinger},
title = {A textual dataset of de-identified health records in Spanish and Catalan for medical entity recognition and anonymization},
journal = {Scientific Data},
volume = {12},
pages = {Article 1088},
year = {2025},
publisher = {Nature Publishing Group},
doi = {10.1038/s41597-025-05320-1},
url = {https://www.nature.com/articles/s41597-025-05320-1}
}
Files
Name | Size | MD5 |
---|---|---|
[HCB-BSC] Protocolo y criterios anonimización.pdf | 2.1 MB | 6d1a39c8001e4dc939dd1cf9629885d4 |
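The MD5 checksum listed for the PDF can be used to confirm that a downloaded copy is intact. The following is a minimal sketch in Python using only the standard library; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the local file path is an assumption supplied on the command line by whoever runs the script.

```python
import hashlib
import sys

# Checksum as published on this record for the protocol PDF.
EXPECTED_MD5 = "6d1a39c8001e4dc939dd1cf9629885d4"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical usage: python verify_md5.py <path-to-downloaded-pdf>
    if len(sys.argv) == 2:
        ok = matches_expected(sys.argv[1])
        print("checksum OK" if ok else "checksum MISMATCH")
```

A mismatch usually indicates an incomplete or corrupted download. MD5 is suitable here as an integrity check against accidental corruption, not as a security guarantee.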
Additional details
Related works
- Is variant form of: 10.5281/zenodo.10171681 (DOI; resource type: Other)