Symptom normalization using unsupervised learning and text similarity
Creators
- 1. Department of Computer Engineering, Bogazici University, Istanbul, Turkey
Description
Abstract
Mapping named entities in raw text to their respective IDs is an important task, as inaccuracies can significantly affect data consistency and the precision of information retrieval. It matters especially in medical texts, where correct entity identification can have a major impact on diagnostic accuracy and patient care. SympTEMIST is a shared task dedicated, as its name suggests, to text mining of medical symptoms, signs, and findings. In this paper, we present the participation of our team, BounNLP, in subtask 2, which aims primarily to map Spanish symptom mentions to their corresponding SNOMED CT concept IDs. We propose an unsupervised approach to named entity normalization based on clustering and text similarity, using both string similarity and BERT-based contextual word vector representations.
This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.
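The string-similarity side of the approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the concept IDs (placeholders, not real SNOMED CT codes), the `link_mention` helper, and the 0.6 threshold are all assumptions made for illustration, and the clustering and BERT-based components are omitted entirely.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon mapping placeholder concept IDs to Spanish terms.
# These IDs are NOT real SNOMED CT codes.
LEXICON = {
    "C001": "dolor abdominal",
    "C002": "fiebre",
    "C003": "dolor de cabeza",
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_mention(mention, lexicon, threshold=0.6):
    """Return (concept_id, score) for the most similar lexicon term.

    If the best score is below the (assumed) threshold, concept_id is None.
    """
    best_id, best_score = None, 0.0
    for concept_id, term in lexicon.items():
        score = similarity(mention, term)
        if score > best_score:
            best_id, best_score = concept_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    for mention in ["Dolor  abdominal", "fiebres", "xyz"]:
        print(mention, "->", link_mention(mention, LEXICON))
```

In a fuller system along the lines the abstract sketches, a similarity score of this kind would presumably be combined with embedding-based similarity rather than used on its own.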
Files
Name | Size |
---|---|
bc8_symptemist_bounnlp.pdf (md5:542c4f67ce68ad3055d30003ab679ce4) | 131.4 kB |
Additional details
Related works
- Is published in
- Conference proceeding: 10.5281/zenodo.10103190 (DOI)