Published May 15, 2023 | Version v1
Dataset Open

MEDDOPROF Silver Standard: Predictions for Occupations, Working Statuses and Activities in Clinical Case Reports

  • 1. Barcelona Supercomputing Center

Description

MEDDOPROF Silver Standard

This repository contains the MEDDOPROF Silver Standard, a collection of clinical case reports from multiple specialties that have been automatically annotated with occupations, working statuses and activities using named entity recognition models trained on the MEDDOPROF corpus. MEDDOPROF and this Silver Standard were created by the NLP for Biomedical Information Analysis team at the Barcelona Supercomputing Center.

About the MEDDOPROF Corpus

MEDDOPROF is a corpus and shared task focused on the detection and normalization of occupations and related information, such as working statuses and non-paid activities, in clinical texts in Spanish. All mentions in the corpus have been normalized to SNOMED CT, and occupations have also been normalized to the European Skills, Competences, and Occupations classification (ESCO). MEDDOPROF was released as part of the IberLEF 2021 workshop, which in turn was held within the SEPLN 2021 conference.

About this Silver Standard

The Silver Standard includes 13,357 documents and a total of 5,568,844 tokens. The documents belong to more than 10 different medical specialties, including occupational health, oncology, radiology, infectology, dermatology, psychiatry and more.

The corpus includes three different labels: PROFESION (occupation), SITUACION-LABORAL (working status) and ACTIVIDAD (activity/hobby). There are 4,332 predictions in total (1,325 unique). The most common label is PROFESION with 3,599 predictions (1,181 unique), followed by SITUACION-LABORAL with 682 (180 unique) and ACTIVIDAD with 51 (44 unique).

The annotations are presented in .ann format, which is the format used by the annotation tool brat.
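As a rough illustration of how the .ann files could be read, the following minimal Python sketch parses brat text-bound annotations (lines starting with "T") and counts the labels. The function name parse_ann, the Entity type and the sample lines are hypothetical and are not taken from the dataset. The sketch assumes the standard brat standoff layout (ID, tab, label and character offsets, tab, mention text) and skips any other annotation types.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str                    # e.g. "T1"
    label: str                     # e.g. "PROFESION"
    spans: List[Tuple[int, ...]]   # (start, end) character offsets
    text: str                      # the annotated mention


def parse_ann(content: str) -> List[Entity]:
    """Parse text-bound ("T") annotations from the contents of a brat .ann file."""
    entities = []
    for line in content.splitlines():
        # Only text-bound annotations are handled; other line types are skipped.
        if not line.startswith("T"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 3:
            continue
        ann_id, type_and_offsets, text = parts
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous mentions separate their fragments with ";".
        spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        entities.append(Entity(ann_id, label, spans, text))
    return entities


# Hypothetical sample lines, written only to illustrate the format.
sample = (
    "T1\tPROFESION 25 31\tmédico\n"
    "T2\tSITUACION-LABORAL 40 48\tjubilado\n"
)
entities = parse_ann(sample)
label_counts = Counter(entity.label for entity in entities)
print(label_counts)
```

To apply this to the real data, the contents of each .ann file would be read with UTF-8 encoding and passed to parse_ann. The attached README remains the authoritative description of the file layout.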

For more information, please check the attached README file.

Citation

If you use this Silver Standard or any MEDDOPROF materials, please cite the task's overview paper:
Lima-López, Salvador, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias, & Martin Krallinger. “NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts.” Procesamiento del Lenguaje Natural [Online], 67 (2021): 243-256.

License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Contact
If you have any questions or suggestions, please contact us at:

- Salvador Lima-López (<salvador [dot] limalopez [at] gmail [dot] com>)
- Martin Krallinger (<krallinger [dot] martin [at] gmail [dot] com>)

Acknowledgements
MEDDOPROF was promoted by the Spanish government's 2020 Proyectos de I+D+i RTI Tipo A (AI4ProfHealth - DESCIFRANDO EL PAPEL DE LAS PROFESIONES EN LA SALUD DE LOS PACIENTES A TRAVES DE LA MINERIA DE TEXTOS (PID2020-119266RA-I00)).

Related Links

- MEDDOPROF website: https://temu.bsc.es/meddoprof/

- MEDDOPROF Gold Standard: https://doi.org/10.5281/zenodo.5070540

- MEDDOPROF Annotation Guidelines: https://doi.org/10.5281/zenodo.4694675

Files

meddoprof_silver_standard.zip (25.7 MB)
md5:621ea59a32074ae143d247b4775cc927
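
The MD5 checksum listed for the archive can be used to check that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The helper name md5_of_file and the local path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this page for meddoprof_silver_standard.zip.
EXPECTED_MD5 = "621ea59a32074ae143d247b4775cc927"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed location of the downloaded archive; adjust as needed.
archive = Path("meddoprof_silver_standard.zip")
if archive.exists():
    if md5_of_file(archive) == EXPECTED_MD5:
        print("checksum OK")
    else:
        print("checksum mismatch")
```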