There is a newer version of the record available.

Published August 8, 2023 | Version 0
Dataset Open

SympTEMIST Corpus: Gold Standard annotations for clinical symptoms, signs and findings information extraction

Description

SympTEMIST stands for Symptoms TExt MIning Shared Task. It is a shared task and set of resources focused on the detection, normalization and indexing of symptoms, signs and findings in medical documents in Spanish. SympTEMIST is complementary to the DisTEMIST corpus (https://temu.bsc.es/distemist) and MedProcNER/ProcTEMIST (https://temu.bsc.es/medprocner) as they all use the same document collection.

This repository contains the training set for the task's Named Entity Recognition subtask, comprising 750 documents. Training data for the other two subtasks (entity linking and multilingual) will be released later.

The annotations are provided in two formats, each in its own folder; the text files are also provided separately. The first format consists of .ann files in brat's standoff format (for more information on brat's format, please visit: https://brat.nlplab.org/standoff.html). The second is a .tsv file in which each line represents one annotation. The .tsv file has the following columns:

- "filename": document name

- "ann_id": mention identifier

- "label": mention type (always SINTOMA)

- "start_span": starting position of the mention in the document in characters

- "end_span": ending position of the mention in the document in characters

- "text": annotated string
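
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the official release: it assumes the .tsv file has a header row with exactly the column names listed, that the file is UTF-8 encoded, and that "end_span" is exclusive (as in brat's standoff format). The `Mention` class, the function names and the sample data are hypothetical and only serve as illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Mention:
    filename: str
    ann_id: str
    label: str
    start_span: int
    end_span: int
    text: str


def read_mentions(tsv_file):
    """Parse a tab-separated annotation file with the six columns listed above."""
    # QUOTE_NONE: treat quote characters inside annotated strings as plain text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Mention(
            filename=row["filename"],
            ann_id=row["ann_id"],
            label=row["label"],
            start_span=int(row["start_span"]),
            end_span=int(row["end_span"]),
            text=row["text"],
        )
        for row in reader
    ]


def span_matches(mention, document_text):
    """Check that the offsets select the annotated string (end assumed exclusive)."""
    return document_text[mention.start_span:mention.end_span] == mention.text


# Made-up example data, not taken from the corpus.
sample_tsv = io.StringIO(
    "filename\tann_id\tlabel\tstart_span\tend_span\ttext\n"
    "doc_001\tT1\tSINTOMA\t21\t27\tfiebre\n"
)
sample_text = "Paciente que refiere fiebre desde hace dos días."

mentions = read_mentions(sample_tsv)
for mention in mentions:
    print(mention.ann_id, mention.label, span_matches(mention, sample_text))
```

With the real data, the `io.StringIO` object would be replaced by something like `open(path, encoding="utf-8", newline="")`, where `path` points to the .tsv file in the extracted archive. If `span_matches` returns `False` for real annotations, the offset or encoding assumptions stated above are the first thing to check.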

SympTEMIST was developed by the Barcelona Supercomputing Center's NLP for Biomedical Information Analysis group and was used as part of BioCREATIVE 2023. For more information on the corpus, the annotation scheme and the task in general, please visit: https://temu.bsc.es/symptemist.

Related Links:

- SympTEMIST website: https://temu.bsc.es/symptemist

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Contact

If you have any questions or suggestions, please contact us at:

- Salvador Lima-López (<salvador [dot] limalopez [at] gmail [dot] com>)
- Martin Krallinger (<krallinger [dot] martin [at] gmail [dot] com>)

Additional resources

If you are interested in SympTEMIST, you might want to check out these corpora and resources that use the same text documents:

- DisTEMIST: https://temu.bsc.es/distemist
- MedProcNER/ProcTEMIST: https://temu.bsc.es/medprocner

Files

symptemist_train.zip

Size: 2.9 MB
md5:6a654bed16e9945aec5a18df105d9ea6
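
The MD5 checksum listed for symptemist_train.zip can be used to confirm that a download is complete and unmodified. The following is a minimal sketch using Python's standard library; the function names are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# Checksum listed on this page for symptemist_train.zip.
EXPECTED_MD5 = "6a654bed16e9945aec5a18df105d9ea6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `is_intact("symptemist_train.zip")` should return `True` for an intact copy of the archive, assuming it was saved under that name in the current directory.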