MedProcNER/ProcTEMIST Corpus: Gold Standard annotations for Clinical Procedures Information Extraction
- Barcelona Supercomputing Center
Description
MedProcNER stands for MEDical PROCedure Named Entity Recognition. It is a shared task and set of resources focused on the detection, normalization and indexing of clinical procedures in medical documents in Spanish. MedProcNER is complementary to the DisTEMIST corpus (https://temu.bsc.es/distemist) as they both use the same document collection, which is why it's also called ProcTEMIST.
This repository contains the Train Set of the task, a total of 750 documents. The unannotated test text files are also included so that predictions can be generated for them. Finally, it includes a gazetteer of candidate SNOMED CT codes for the normalization and indexing tasks. For more information, please check the attached README file.
UPDATE MAY 2nd 2023: Second part of the train set, test set texts and gazetteer now available!
MedProcNER was developed by the Barcelona Supercomputing Center's NLP for Biomedical Information Analysis group and was used as part of BioASQ @ CLEF 2023. For more information on the corpus, the annotation scheme and the task in general, please visit: https://temu.bsc.es/medprocner.
Related Links:
- MedProcNER website: https://temu.bsc.es/medprocner
- MedProcNER Guidelines: https://doi.org/10.5281/zenodo.7817666
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Contact
If you have any questions or suggestions, please contact us at:
- Salvador Lima-López (<salvador [dot] limalopez [at] gmail [dot] com>)
- Martin Krallinger (<krallinger [dot] martin [at] gmail [dot] com>)
Files
Name | Size | Checksum |
---|---|---|
medprocner_gs_train+test+gazz_230502.zip | 6.7 MB | md5:4f82f5fc97dbb6ae1796ea1f74f62bd3 |
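The MD5 checksum listed for the archive can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only Python's standard library. The helper names `md5_of_file` and `verify_download` are illustrative and not part of the corpus distribution, and the script assumes the archive was saved under its original name in the current working directory.

```python
import hashlib
from pathlib import Path

# Checksum and file name as listed in the Files section of this record.
EXPECTED_MD5 = "4f82f5fc97dbb6ae1796ea1f74f62bd3"
ARCHIVE_NAME = "medprocner_gs_train+test+gazz_230502.zip"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path(ARCHIVE_NAME)
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif verify_download(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH; try downloading the file again")
```

A mismatch most often indicates an interrupted or truncated download. MD5 is suitable here as an integrity check only, not as a security guarantee.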