Published November 12, 2023 | Version v1
Conference proceeding (Open Access)

Overview of SympTEMIST at BioCreative VIII: corpus, guidelines and evaluation of systems for the detection and normalization of symptoms, signs and findings from text

Description

Abstract

Systems able to detect and normalize symptom mentions in clinical texts are crucial for healthcare data mining, clinical AI systems, medical analytics and predictive applications. Unlike other clinical information types, such as diagnoses/diseases, procedures, lab test results or even medications, clinical symptoms can often be recovered in detail only from written clinical narratives. Because symptom mentions are highly complex and variable, and annotated corpora for them are difficult to produce, few manually annotated data collections have been constructed so far. Previous efforts typically showed limitations, such as missing normalization of entities to controlled vocabularies, use of dictionaries for pre-annotation, lack of multilingual solutions or underspecified annotation guidelines. To address these issues, we proposed the SympTEMIST track as part of the BioCreative VIII evaluation initiative. It is structured into three sub-tracks: automatic detection of exact symptom mentions (SymptomNER), normalization of symptoms to their SNOMED CT concept identifiers (SymptomNorm), and an experimental sub-track aimed at promoting entity linking and concept normalization for several languages, namely English, Portuguese, French, Italian and Dutch. Of a total of 25 teams, 11 submitted results for at least one of the three sub-tracks. The top-scoring team obtained an F1-score of 0.7477 for the SymptomNER sub-track (with a precision of 0.8039 and a recall of 0.6988), while the top-performing team for the SymptomNorm sub-track obtained an accuracy of 0.6070. Given the complexity of symptom mentions, which often include long descriptive or nested entities and abbreviations, the obtained results and the datasets used can be considered a relevant contribution to future approaches for mining symptoms from clinical texts. The SympTEMIST Gold Standard is freely available at: https://zenodo.org/doi/10.5281/zenodo.8223653.
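The reported SymptomNER figures are internally consistent: F1 is the harmonic mean of precision and recall, so 2 × 0.8039 × 0.6988 / (0.8039 + 0.6988) ≈ 0.7477. The Python sketch below reproduces that arithmetic and, purely as an illustration, shows how strict exact-match precision/recall and normalization accuracy could be computed. The span representation (document ID, start and end character offsets), the function names and the placeholder concept IDs are hypothetical assumptions; this is not the official SympTEMIST evaluation script.

```python
from typing import Dict, Set, Tuple

# Hypothetical mention representation (an assumption): (document_id, start_offset, end_offset).
Mention = Tuple[str, int, int]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_ner_scores(gold: Set[Mention], pred: Set[Mention]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1; a predicted span is a true
    positive only if it matches a gold span exactly (same document and offsets)."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


def normalization_accuracy(gold_codes: Dict[Mention, str], pred_codes: Dict[Mention, str]) -> float:
    """Share of gold mentions whose predicted concept ID equals the gold ID
    (assumes, hypothetically, that systems are scored on the gold mentions)."""
    if not gold_codes:
        return 0.0
    correct = sum(1 for m, code in gold_codes.items() if pred_codes.get(m) == code)
    return correct / len(gold_codes)


if __name__ == "__main__":
    # Reported SymptomNER top-team precision and recall.
    print(round(f1_score(0.8039, 0.6988), 4))  # 0.7477

    # Toy data: offsets and concept IDs are placeholders, not real annotations.
    gold = {("doc1", 0, 5), ("doc1", 10, 20), ("doc2", 3, 8)}
    pred = {("doc1", 0, 5), ("doc1", 10, 19)}  # second span is off by one character
    print(strict_ner_scores(gold, pred))

    gold_codes = {("doc1", 0, 5): "CONCEPT_A", ("doc1", 10, 20): "CONCEPT_B"}
    pred_codes = {("doc1", 0, 5): "CONCEPT_A", ("doc1", 10, 20): "CONCEPT_C"}
    print(normalization_accuracy(gold_codes, pred_codes))  # 0.5
```

Under strict matching, a prediction that is off by a single character earns no credit, which illustrates why long descriptive or nested symptom mentions can be demanding to detect exactly.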


This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

bc8_symptemist_overview.pdf (948.0 kB, md5:1bfccb04b0f44411669ea0c07b6f06d1)

Additional details

Related works

Is published in
Conference proceeding: 10.5281/zenodo.10103190 (DOI)