Published April 15, 2021 | Version v2
Other Open

MEDDOPROF guidelines

Description

The MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in Spanish clinical cases from different specialties. Systems capable of automatically processing clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizens' associations and patients. Other NLP tasks, such as anonymization, can also benefit from this type of data.

These guidelines describe the process followed by the clinical and linguist experts who manually annotated the MEDDOPROF corpus, and a series of rules for annotating occupations in clinical texts.

Annotation quality:

We performed a consistency analysis of the corpus: approximately 10% of the documents were annotated by an internal annotator as well as by the linguist experts, following these annotation guidelines. After multiple rounds, the average Inter-Annotator Agreement (pairwise agreement) is around 0.9.
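As an illustration of what a pairwise agreement score can look like, the following is a minimal sketch in Python. It is not the procedure used for MEDDOPROF, whose exact agreement formula is not stated here. It assumes each annotation is an exact `(start, end, label)` span and measures agreement as the share of spans both annotators produced out of all spans either produced. The function names, the annotator names, and the labels `OCCUPATION` and `EMPLOYMENT_STATUS` are placeholders chosen for the example.

```python
from itertools import combinations

# Hypothetical representation: (start offset, end offset, label).
Span = tuple[int, int, str]


def pairwise_agreement(a: set[Span], b: set[Span]) -> float:
    """Fraction of spans annotated identically by both annotators.

    Assumption: agreement = |A intersect B| / |A union B| with exact span
    and label match. Other definitions (e.g. F1-based) are equally plausible.
    """
    union = a | b
    if not union:
        # Neither annotator marked anything: treat as full agreement.
        return 1.0
    return len(a & b) / len(union)


def average_pairwise_agreement(annotations: dict[str, set[Span]]) -> float:
    """Mean of pairwise_agreement over every pair of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        raise ValueError("at least two annotators are required")
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


# Toy data for one document (offsets and labels are invented).
doc_annotations = {
    "annotator_1": {(10, 18, "OCCUPATION"), (42, 52, "EMPLOYMENT_STATUS")},
    "annotator_2": {
        (10, 18, "OCCUPATION"),
        (42, 52, "EMPLOYMENT_STATUS"),
        (60, 70, "OCCUPATION"),
    },
}
score = average_pairwise_agreement(doc_annotations)
```

In this toy case the two annotators share two of three distinct spans. A corpus-level figure would typically average such scores over all doubly annotated documents.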

Please cite if you use this resource:

Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias and Martin Krallinger. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. In Procesamiento del Lenguaje Natural, 67. 2021.

@article{meddoprof,
    title = {NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts},
    author = {Lima-López, Salvador and Farré-Maduell, Eulàlia and Miranda-Escalada, Antonio and Brivá-Iglesias, Vicent and Krallinger, Martin},
    journal = {Procesamiento del Lenguaje Natural},
    volume = {67},
    year = {2021},
    issn = {1989-7553},
    url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6393},
    pages = {243--256}
}

Resources:

- Web

- Complete corpus

- Training Data

- Test set

MEDDOPROF is part of the IberLEF 2021 workshop, which is co-located with the SEPLN 2021 conference. For further information, please visit https://temu.bsc.es/meddoprof/ or email us at encargo-pln-life@bsc.es.

MEDDOPROF is promoted by the Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (Plan TL) and the Spanish government's 2020 Proyectos de I+D+i RTI Tipo A (DESCIFRANDO EL PAPEL DE LAS PROFESIONES EN LA SALUD DE LOS PACIENTES A TRAVES DE LA MINERIA DE TEXTOS (PID2020-119266RA-I00)).

Files

MEDDOPROF_guias_final.pdf (557.1 kB)
md5:949d0035695bd7e5b75b2c1f2d3a374e