Overview of SympTEMIST at BioCreative VIII: corpus, guidelines and evaluation of systems for the detection and normalization of symptoms, signs and findings from text
Creators
- 1. Barcelona Supercomputing Center, Spain
Description
Abstract
Systems able to detect and normalize symptom mentions in clinical texts are crucial for healthcare data mining, AI applied to clinical systems, medical analytics and predictive applications. As opposed to other clinical information types, such as diagnoses/diseases, procedures, lab test results or even medications, clinical symptoms can often only be recovered in detail directly from written clinical narratives. Because clinical symptoms are highly complex and variable, and annotated corpora for them are difficult to generate, few manually annotated data collections have been constructed so far. Previous efforts typically showed limitations, such as missing entity normalization to controlled vocabularies, use of dictionaries for pre-annotations, lack of multilingual solutions or underspecified annotation guidelines. To address these issues, we proposed the SympTEMIST track at the BioCreative VIII evaluation initiative. It is structured into the following three sub-tracks: automatic detection of exact mentions of symptoms (SymptomNER), normalization of symptoms to their SNOMED CT concept identifiers (SymptomNorm), and an experimental sub-track that aims to promote entity linking and concept normalization for several languages, namely English, Portuguese, French, Italian and Dutch. From a total of 25 teams, 11 submitted results for at least one of the three sub-tracks. The best result for the SymptomNER task was an F1-score of 0.7477 (with precision of 0.8039 and recall of 0.6988), while the top-performing team for the SymptomNorm task obtained an accuracy of 0.6070. Taking into account the complexity of symptom mentions, which often include long descriptive or nested entities and abbreviations, the results obtained and the datasets used can be considered a relevant contribution to future approaches for mining symptoms from clinical texts. The SympTEMIST Gold Standard is freely available at: https://zenodo.org/doi/10.5281/zenodo.8223653.
This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.
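The F1-score quoted in the abstract is the harmonic mean of the reported precision and recall, which can be checked directly. The sketch below does that and also illustrates how exact-match span scoring might be computed. The `(doc_id, start, end)` span representation and the function names are assumptions made for illustration; this record does not describe the official evaluation script, so this is not a reproduction of it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_prf(gold: set, predicted: set) -> tuple:
    """Micro precision, recall and F1 over exact-match spans.

    Each span is a hypothetical (doc_id, start, end) tuple; a prediction
    counts as correct only if all three fields match a gold span.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Consistency check on the figures reported in the abstract.
print(round(f1_score(0.8039, 0.6988), 4))  # 0.7477

# Toy example (made-up spans): one of two predictions matches exactly.
gold = {("d1", 0, 5), ("d1", 10, 20), ("d2", 3, 9)}
predicted = {("d1", 0, 5), ("d2", 3, 8)}
p, r, f = strict_prf(gold, predicted)
```

In the toy example the second prediction is off by one character, so under exact matching it counts as both a false positive and a missed gold span.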
Files
bc8_symptemist_overview.pdf (948.0 kB)
md5:1bfccb04b0f44411669ea0c07b6f06d1
Additional details
Related works
- Is published in
- Conference proceeding: 10.5281/zenodo.10103190 (DOI)