MEDDOPROF corpus: complete gold standard annotations for occupation detection in medical documents in Spanish
Creators
- 1. Barcelona Supercomputing Center
- 2. D-REAL
Description
UPDATE 27/09/2022: A complete normalization of all mentions in the corpus to SNOMED CT has been added to the 'meddoprof-norm.tsv' file.
This repository contains the complete MEDDOPROF Gold Standard, a collection of 1,844 clinical cases in Spanish annotated for occupations, working statuses and activities. MEDDOPROF is a shared task held in 2021 that explores the application of natural language processing to occupational health. To learn more, please visit: https://temu.bsc.es/meddoprof.
Folder and File Structure
The corpus files are provided in the format used by the brat annotation tool. That is, for each clinical case there is a .txt file with the text and a .ann file with the corresponding annotations.
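As an illustration of how a .txt/.ann pair fits together, here is a minimal sketch of a reader for brat text-bound annotations. It assumes the standard brat standoff layout (ID, then label and character offsets, then mention text, separated by tabs). The `Entity` class, the `parse_ann` function and the sample sentence are hypothetical and invented for this example; they are not taken from the corpus or from any official MEDDOPROF tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    ann_id: str  # annotation identifier, e.g. "T1"
    label: str   # annotation label, e.g. "PROFESION"
    start: int   # character offset where the mention starts
    end: int     # character offset where the mention ends (exclusive)
    text: str    # mention text as stored in the .ann file


def parse_ann(ann_content: str) -> List[Entity]:
    """Parse text-bound ("T") annotations from the contents of a brat .ann file."""
    entities = []
    for line in ann_content.splitlines():
        # Only text-bound annotations are handled; other line types are skipped.
        if not line.startswith("T"):
            continue
        ann_id, type_and_span, text = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        # Discontinuous spans are written as "s1 e1;s2 e2"; keep the outer bounds.
        offsets = [int(x) for frag in span.split(";") for x in frag.split()]
        entities.append(Entity(ann_id, label, offsets[0], offsets[-1], text))
    return entities


# Invented example sentence and annotations (not from the corpus).
sample_text = "Paciente jubilado que trabajaba como carpintero."
sample_ann = (
    "T1\tSITUACION_LABORAL 9 17\tjubilado\n"
    "T2\tPROFESION 37 47\tcarpintero\n"
)

for entity in parse_ann(sample_ann):
    # The offsets index directly into the text of the matching .txt file.
    print(entity.label, sample_text[entity.start:entity.end])
```

With real data, the contents of a case's .ann file would be passed to `parse_ann` and the offsets checked against the text read from the .txt file with the same base name.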
- meddoprof-ner/
Clinical cases annotated with these labels: PROFESION (PROFESSION), SITUACION_LABORAL (WORKING_STATUS) or ACTIVIDAD (ACTIVITY).
- meddoprof-class/
Clinical cases with the same annotations as 'meddoprof-ner' but with these labels instead: PACIENTE (patient), FAMILIAR (family member), SANITARIO (health professional) or OTRO (other).
- ner_class_joint/
Clinical cases with both levels of annotation (ner and class) combined (that is, a mention classified as PROFESION in meddoprof-ner and as PACIENTE in meddoprof-class would be PROFESION-PACIENTE here).
- meddoprof-norm.tsv
Tab-separated file (.tsv) with the mapping of each mention in the corpus to ESCO and SNOMED CT. The file has five columns: filename, mention text, span, ESCO code and SNOMED code.
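The five-column layout of 'meddoprof-norm.tsv' can be read with Python's standard csv module. The sketch below is an assumption-laden example: the column names in `COLUMNS` are our own labels for the five columns listed above, `read_norm` is a hypothetical helper, and the sample row uses placeholder values rather than real ESCO or SNOMED CT codes. Whether the file includes a header row is not stated here; if it does, that row will appear as the first entry and should be dropped.

```python
import csv
import io
from typing import Dict, List

# Our own labels for the five columns described above (assumed order).
COLUMNS = ["filename", "mention_text", "span", "esco_code", "snomed_code"]


def read_norm(handle) -> List[Dict[str, str]]:
    """Read rows of a five-column, tab-separated normalization file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        # Skip blank lines or lines that do not have exactly five fields.
        if len(fields) != len(COLUMNS):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Placeholder row invented for illustration; the codes are not real.
sample = "caso_001\tjubilado\t9 17\tESCO-PLACEHOLDER\tSCTID-PLACEHOLDER\n"
rows = read_norm(io.StringIO(sample))
print(rows[0]["mention_text"], rows[0]["snomed_code"])

# With the real file (path assumed relative to the unzipped corpus):
# with open("meddoprof-norm.tsv", encoding="utf-8", newline="") as f:
#     rows = read_norm(f)
```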
Additionally, two files with the filenames of the train and test partitions are included.
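To reproduce the train/test split, the partition lists can be matched against the document names in the corpus folders. This is only a sketch: it assumes each partition file lists one filename per line, possibly with a .txt or .ann extension, which should be checked against the actual files. The names `doc_id`, `load_partition`, `split_docs` and the sample identifiers are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def doc_id(name: str) -> str:
    """Strip whitespace and a trailing .txt/.ann extension, if present."""
    name = name.strip()
    for ext in (".txt", ".ann"):
        if name.endswith(ext):
            return name[: -len(ext)]
    return name


def load_partition(lines: Iterable[str]) -> Set[str]:
    """Build a set of document identifiers from the lines of a partition file."""
    return {doc_id(line) for line in lines if line.strip()}


def split_docs(
    doc_ids: Iterable[str], train_ids: Set[str], test_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Assign each document to train or test according to the partition sets."""
    docs = list(doc_ids)
    train = sorted(d for d in docs if d in train_ids)
    test = sorted(d for d in docs if d in test_ids)
    return train, test


# Invented identifiers for illustration only.
train_ids = load_partition(["caso_001.txt\n", "caso_002\n", "\n"])
test_ids = load_partition(["caso_003.ann\n"])
train, test = split_docs(["caso_003", "caso_001", "caso_002"], train_ids, test_ids)
print(train, test)
```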
Please cite if you use this resource:
Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias and Martin Krallinger. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. In Procesamiento del Lenguaje Natural, 67. 2021.
@article{meddoprof,
title={NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts},
author={Lima-López, Salvador and Farré-Maduell, Eulàlia and Miranda-Escalada, Antonio and Brivá-Iglesias, Vicent and Krallinger, Martin},
journal = {Procesamiento del Lenguaje Natural},
volume = {67},
year={2021},
issn = {1989-7553},
url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6393},
pages = {243--256}
}
Related Resources:
- Web
- Test set
- Codes Reference List (for MEDDOPROF-NORM)
MEDDOPROF is part of the IberLEF 2021 workshop, which is co-located with the SEPLN 2021 conference. For further information, please visit https://temu.bsc.es/meddoprof/ or email us at encargo-pln-life@bsc.es
MEDDOPROF is promoted by the Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (Plan TL) and the Spanish government's 2020 Proyectos de I+D+i RTI Tipo A (AI4PROFHEALTH - DESCIFRANDO EL PAPEL DE LAS PROFESIONES EN LA SALUD DE LOS PACIENTES A TRAVES DE LA MINERIA DE TEXTOS (PID2020-119266RA-I00)).
Files
MEDDOPROF_GS.zip (14.1 MB)
md5:58b641fe2bc31b934b7566a4506f3704