MEDDOPROF Silver Standard: Predictions for Occupations, Working Statuses and Activities in Clinical Case Reports
Description
MEDDOPROF Silver Standard
This repository contains the MEDDOPROF Silver Standard, a collection of clinical case reports from multiple specialties that have been automatically annotated with occupations, working statuses and activities using named entity recognition models trained on the MEDDOPROF corpus. MEDDOPROF and this Silver Standard were created by the NLP for Biomedical Information Analysis group at the Barcelona Supercomputing Center.
About the MEDDOPROF Corpus
MEDDOPROF is a corpus and shared task focused on the detection and normalization of occupations and related information, such as working statuses and non-paid activities, in clinical texts in Spanish. All mentions in the corpus have been normalized to SNOMED CT, and occupations have additionally been normalized to the European Skills, Competences, Qualifications and Occupations classification (ESCO). MEDDOPROF was released as part of the IberLEF 2021 workshop, which in turn was held within the SEPLN 2021 conference.
About this Silver Standard
The Silver Standard includes 13,357 documents and a total of 5,568,844 tokens. The documents belong to over 10 different medical specialties, including occupational health, oncology, radiology, infectology, dermatology, psychiatry and more.
The corpus uses three labels: PROFESION (occupation), SITUACION-LABORAL (working status) and ACTIVIDAD (activity/hobby). There are 4,332 predictions in total (1,325 unique). The most common label is PROFESION with 3,599 predictions (1,181 unique), followed by SITUACION-LABORAL with 682 (180 unique) and ACTIVIDAD with 51 (44 unique).
The annotations are presented in .ann format, which is the format used by the annotation tool brat.
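As an illustration of how the .ann files could be read, the following is a minimal Python sketch that parses text-bound annotations in brat standoff format (tab-separated ID, label with character offsets, and mention text) and counts predictions per label. The `Entity` type, the `parse_ann` helper and the sample lines are hypothetical and made up for this example; they are not taken from the Silver Standard or from any official MEDDOPROF tooling.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    """One text-bound annotation (hypothetical helper type)."""
    ann_id: str
    label: str
    spans: List[Tuple[int, ...]]  # (start, end) character offsets
    text: str


def parse_ann(content: str) -> List[Entity]:
    """Parse the text-bound ('T') lines of a brat standoff file."""
    entities = []
    for line in content.splitlines():
        if not line.startswith("T"):
            continue  # skip notes ('#'), attributes, relations, etc.
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip malformed lines
        ann_id, type_and_offsets, text = parts[0], parts[1], parts[2]
        label, offsets = type_and_offsets.split(" ", 1)
        # brat separates the fragments of a discontinuous span with ';'
        spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        entities.append(Entity(ann_id, label, spans, text))
    return entities


# Invented sample content, only to show the expected line layout.
sample = (
    "T1\tPROFESION 25 32\talbañil\n"
    "T2\tSITUACION-LABORAL 40 48\tjubilado\n"
    "#1\tAnnotatorNotes T1\thypothetical note\n"
)

entities = parse_ann(sample)
label_counts = Counter(e.label for e in entities)
print(label_counts)
```

To process the real files, the content of each .ann file would be read with UTF-8 encoding and passed to `parse_ann`; the directory layout inside the archive is described in the attached README and is not assumed here.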
For more information, please check the attached README file.
Citation
If you use this Silver Standard or any MEDDOPROF materials, please cite the task's overview paper:
Lima-López, Salvador, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias, & Martin Krallinger. “NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts.” Procesamiento del Lenguaje Natural [Online], 67 (2021): 243-256.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Contact
If you have any questions or suggestions, please contact us at:
- Salvador Lima-López (<salvador [dot] limalopez [at] gmail [dot] com>)
- Martin Krallinger (<krallinger [dot] martin [at] gmail [dot] com>)
Acknowledgements
MEDDOPROF was promoted by the Spanish government's 2020 Proyectos de I+D+i RTI Tipo A (AI4ProfHealth - DESCIFRANDO EL PAPEL DE LAS PROFESIONES EN LA SALUD DE LOS PACIENTES A TRAVES DE LA MINERIA DE TEXTOS (PID2020-119266RA-I00)).
Related Links
- MEDDOPROF website: https://temu.bsc.es/meddoprof/
- MEDDOPROF Gold Standard: https://doi.org/10.5281/zenodo.5070540
- MEDDOPROF Annotation Guidelines: https://doi.org/10.5281/zenodo.4694675
Files
Name | Size | Checksum |
---|---|---|
meddoprof_silver_standard.zip | 25.7 MB | md5:621ea59a32074ae143d247b4775cc927 |
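The MD5 checksum listed for the archive can be used to check that a download is intact. The sketch below is one possible way to do this in Python with the standard `hashlib` module; the `md5_of_file` helper is a hypothetical name, and the local path is an assumption about where the file was saved.

```python
import hashlib
import os

# Checksum as listed for meddoprof_silver_standard.zip in this record.
EXPECTED_MD5 = "621ea59a32074ae143d247b4775cc927"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed local path; adjust to wherever the archive was downloaded.
archive_path = "meddoprof_silver_standard.zip"
if os.path.exists(archive_path):
    matches = md5_of_file(archive_path) == EXPECTED_MD5
    print("checksum OK" if matches else "checksum mismatch")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be downloaded again.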