There is a newer version of the record available.

Published March 15, 2022 | Version v2
Dataset | Open Access

MEDDOPROF corpus: complete gold standard annotations for occupation detection in medical documents in Spanish

Description

The MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in Spanish clinical cases from different medical specialties. Systems that can automatically process clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizens' associations and patients. Other NLP tasks, such as anonymization, can also benefit from this type of data.

MEDDOPROF has three different sub-tasks:

1) MEDDOPROF-NER: Participants must find the beginning and end of occupation mentions and classify them as PROFESION (PROFESSION), SITUACION_LABORAL (WORKING_STATUS) or ACTIVIDAD (ACTIVITY).

2) MEDDOPROF-CLASS: Participants must find the beginning and end of occupation mentions and classify them according to their referent (PACIENTE [patient], FAMILIAR [family member], SANITARIO [health professional] or OTRO [other]).

3) MEDDOPROF-NORM: Participants must find the beginning and end of occupation mentions and normalize them according to a reference code list.
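All three sub-tasks annotate the same occupation spans, each adding a different layer of information. The following is a minimal Python sketch of how one might represent such a mention. It assumes that "beginning and end" are character offsets with an exclusive end. The `Mention` class and the example sentence are hypothetical and are not part of the distributed corpus.

```python
from dataclasses import dataclass
from typing import Optional

# Label inventories as listed in the sub-task descriptions above.
NER_LABELS = {"PROFESION", "SITUACION_LABORAL", "ACTIVIDAD"}
CLASS_LABELS = {"PACIENTE", "FAMILIAR", "SANITARIO", "OTRO"}


@dataclass(frozen=True)
class Mention:
    """Hypothetical container for one occupation mention.

    Assumes `start`/`end` are character offsets into the document text,
    with `end` exclusive (Python slicing convention).
    """
    start: int
    end: int
    text: str
    ner: Optional[str] = None   # MEDDOPROF-NER label
    cls: Optional[str] = None   # MEDDOPROF-CLASS label
    code: Optional[str] = None  # MEDDOPROF-NORM code

    def validate(self, document: str) -> None:
        """Raise ValueError if the offsets or labels are inconsistent."""
        if document[self.start:self.end] != self.text:
            raise ValueError(f"span {self.start}-{self.end} does not match {self.text!r}")
        if self.ner is not None and self.ner not in NER_LABELS:
            raise ValueError(f"unknown NER label: {self.ner}")
        if self.cls is not None and self.cls not in CLASS_LABELS:
            raise ValueError(f"unknown CLASS label: {self.cls}")


# Illustrative sentence (made up, not taken from the corpus).
doc = "Varón de 52 años, albañil de profesión."
start = doc.find("albañil")
m = Mention(start, start + len("albañil"), "albañil", ner="PROFESION", cls="PACIENTE")
m.validate(doc)  # passes silently when offsets and labels are consistent
```

Keeping the NER label, the CLASS referent and the NORM code on a single object mirrors the way the three sub-tasks share one span.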

This is the complete Gold Standard. Annotations for the NER and CLASS sub-tasks are provided both separately and jointly (with the two annotation levels separated by a dash, e.g. PROFESION-PACIENTE). The normalized mentions are given as a tab-separated file (.tsv) with four columns: filename, mention text, span and code.
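The two formats described above can be read with a few lines of standard-library Python. This is a sketch under stated assumptions, not an official loader. The helper names `split_joint_label` and `read_norm_tsv` are hypothetical. Whether the TSV has a header row is left to the caller. The span column is kept as a raw string because its internal format is not described here.

```python
import csv
import io
from typing import List, NamedTuple, TextIO, Tuple


class NormRow(NamedTuple):
    """One row of the normalization TSV: filename, mention text, span, code."""
    filename: str
    text: str
    span: str  # kept raw; its internal format is not specified on this record
    code: str


def split_joint_label(label: str) -> Tuple[str, str]:
    """Split a joint label such as 'PROFESION-PACIENTE' into (NER, CLASS).

    Splits on the first dash only; the NER labels use underscores
    (e.g. SITUACION_LABORAL), not dashes, so this is unambiguous.
    """
    ner, sep, cls = label.partition("-")
    if not sep:
        raise ValueError(f"not a joint label: {label!r}")
    return ner, cls


def read_norm_tsv(handle: TextIO, *, has_header: bool) -> List[NormRow]:
    """Read the four-column normalization TSV from an open text handle.

    `has_header` is an assumption the caller must make explicitly.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip blank lines
    if has_header and rows:
        rows = rows[1:]
    result = []
    for row in rows:
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        result.append(NormRow(*row))
    return result


# Usage with made-up values (the span and code shown are placeholders).
sample = "filename\ttext\tspan\tcode\ndoc_001\talbañil\t18 25\tCODE_X\n"
rows = read_norm_tsv(io.StringIO(sample), has_header=True)
print(split_joint_label("SITUACION_LABORAL-FAMILIAR"))  # ('SITUACION_LABORAL', 'FAMILIAR')
```

`csv.QUOTE_NONE` is used so that quotation marks inside mention texts are read literally rather than treated as field delimiters.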

Please cite the following publication if you use this resource:

Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias and Martin Krallinger. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. In Procesamiento del Lenguaje Natural, 67. 2021.

@article{meddoprof,
    title={NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts},
    author={Lima-López, Salvador and Farré-Maduell, Eulàlia and Miranda-Escalada, Antonio and Brivá-Iglesias, Vicent and Krallinger, Martin},
    journal={Procesamiento del Lenguaje Natural},
    volume={67},
    year={2021},
    issn={1989-7553},
    url={http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6393},
    pages={243--256}
}

Resources:

- Web

- Training Data

- Test set

- Codes Reference List (for MEDDOPROF-NORM)

- Annotation Guidelines

- Occupations Gazetteer

MEDDOPROF is part of the IberLEF 2021 workshop, which is co-located with the SEPLN 2021 conference. For further information, please visit https://temu.bsc.es/meddoprof/ or email us at encargo-pln-life@bsc.es

MEDDOPROF is promoted by the Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (Plan TL).

Files

MEDDOPROF_GS.zip (14.1 MB)
md5:2c88f1a9d0190efc6f95282c0f02abad
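To check that a downloaded copy of the archive is intact, you can compare its MD5 digest against the checksum listed above. The following is a minimal sketch using Python's standard-library `hashlib`. The local path `MEDDOPROF_GS.zip` is an assumption and should be adjusted to wherever the file was saved.

```python
import hashlib
from pathlib import Path

# Checksum listed on this record for MEDDOPROF_GS.zip.
EXPECTED_MD5 = "2c88f1a9d0190efc6f95282c0f02abad"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("MEDDOPROF_GS.zip")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading the file in chunks keeps memory use constant regardless of archive size.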