MEDDOPROF corpus: training set

Eulàlia Farré-Maduell; Salvador Lima-López; Antonio Miranda-Escalada; Vicent Briva-Iglesias; Martin Krallinger

doi:10.5281/zenodo.4709056

Published April 15, 2021 | Version v2

Journal article | Open access

1. Barcelona Supercomputing Center
2. D-REAL

The MEDDOPROF Shared Task tackles the detection of occupations and employment statuses in Spanish clinical cases from different specialties. Systems capable of automatically processing clinical texts are of interest to the medical community, social workers, researchers, the pharmaceutical industry, computer engineers, AI developers, policy makers, citizens' associations and patients. Other NLP tasks, such as anonymization, can also benefit from this type of data.

MEDDOPROF has three different sub-tasks:

1) MEDDOPROF-NER: Participants must find the beginning and end of occupation mentions and classify them as PROFESION (PROFESSION) or SITUACION_LABORAL (WORKING_STATUS).

2) MEDDOPROF-CLASS: Participants must find the beginning and end of occupation mentions and classify them according to their referent (PACIENTE [patient], FAMILIAR [family member], SANITARIO [health professional] or OTRO [other]).

3) MEDDOPROF-NORM: Participants must find the beginning and end of occupation mentions and normalize them according to a reference list of codes.
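To make the span-plus-label idea in the first two sub-tasks concrete, the sketch below represents a mention as a pair of character offsets with a label. The label sets come from the task description; everything else is an assumption made for illustration: the `Mention` class, the `make_mention` helper, the exclusive end offset, and the example sentence are all hypothetical and do not describe the corpus file format, which this page does not specify.

```python
from dataclasses import dataclass

# Label sets taken from the sub-task descriptions above.
NER_LABELS = {"PROFESION", "SITUACION_LABORAL"}
CLASS_LABELS = {"PACIENTE", "FAMILIAR", "SANITARIO", "OTRO"}


@dataclass(frozen=True)
class Mention:
    """Hypothetical container for one annotated mention."""
    start: int  # offset of the first character (inclusive, assumed convention)
    end: int    # offset just past the last character (exclusive, assumed convention)
    label: str


def make_mention(text: str, surface: str, label: str, allowed: set) -> Mention:
    """Locate the first occurrence of `surface` in `text` and attach a validated label."""
    if label not in allowed:
        raise ValueError(f"unknown label: {label!r}")
    start = text.find(surface)
    if start < 0:
        raise ValueError(f"{surface!r} does not occur in the text")
    return Mention(start, start + len(surface), label)


# Invented example sentence, not taken from the corpus.
text = "Paciente de 45 años, camarero, actualmente en paro."

profession = make_mention(text, "camarero", "PROFESION", NER_LABELS)
status = make_mention(text, "en paro", "SITUACION_LABORAL", NER_LABELS)
referent = make_mention(text, "camarero", "PACIENTE", CLASS_LABELS)

for mention in (profession, status, referent):
    print(mention.label, repr(text[mention.start:mention.end]))
```

Under these assumptions, the NER and CLASS sub-tasks share the same spans and differ only in the label set, which is why one container serves both.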

MEDDOPROF is part of the IberLEF 2021 workshop, which is co-located with the SEPLN 2021 conference. For further information, please visit https://temu.bsc.es/meddoprof/ or email us at encargo-pln-life@bsc.es.

MEDDOPROF is promoted by the Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (Plan TL).

UPDATE 22/04/21: A new version of the training data has been uploaded after detecting some minor errors in some of the annotations. Training data for Task 3 (MEDDOPROF-NORM) has also been added. Please make sure to download the latest version!

Resources:

- Web

- Annotation Guidelines

Files

meddoprof_train_set.zip (7.5 MB), md5:2036531bd1a3476ae4585b43b26beea7
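The MD5 checksum listed with the file can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_archive` are hypothetical, and the local path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the file entry above.
EXPECTED_MD5 = "2036531bd1a3476ae4585b43b26beea7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive = Path("meddoprof_train_set.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually means the download was truncated or that a different version of the archive was retrieved, which is worth checking given the update note above.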
