Published November 12, 2023 | Version v1
Conference proceeding Open

Symptom normalization using unsupervised learning and text similarity

  • 1. Department of Computer Engineering, Bogazici University, Istanbul, Turkey

Description

Abstract

Mapping named entities in raw text to their respective IDs is an important task, as inaccuracies can significantly affect data consistency and the precision of information retrieval. This is especially true of medical texts, where correct entity identification can have a major impact on diagnostic accuracy and patient care. SympTEMIST is a shared task dedicated to text mining of medical symptoms, signs, and findings. In this paper, we present our team BounNLP's participation in Subtask 2, which aims to map Spanish symptom mentions to their corresponding SNOMED CT concept IDs. We propose an unsupervised approach to named entity normalization based on clustering and text similarity, using both string similarity and BERT-based contextual word vector representations.
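As a rough illustration of the string-similarity side of such an approach, the sketch below maps a symptom mention to the closest entry in a small lexicon. It is not the BounNLP system: the clustering and BERT-based steps are omitted, the similarity measure (Python's `difflib.SequenceMatcher`) is a stand-in, and the lexicon, the `normalize` helper, the threshold, and the IDs are all hypothetical placeholders rather than real SNOMED CT concept IDs.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon: Spanish terms mapped to placeholder IDs.
# These are NOT real SNOMED CT concept IDs.
LEXICON = {
    "fiebre": "ID-0001",
    "dolor de cabeza": "ID-0002",
    "tos seca": "ID-0003",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def normalize(mention: str, lexicon=LEXICON, threshold: float = 0.6):
    """Return (concept_id, score) for the most similar lexicon term.

    The concept_id is None when the best score falls below the
    threshold (an assumed cut-off, chosen only for illustration).
    """
    best_term = max(lexicon, key=lambda term: similarity(mention, term))
    score = similarity(mention, best_term)
    if score < threshold:
        return None, score
    return lexicon[best_term], score


if __name__ == "__main__":
    for mention in ["fiebre", "Dolores de cabeza", "xyz"]:
        print(mention, "->", normalize(mention))
```

A contextual-embedding variant would replace `similarity` with a cosine similarity between vector representations of the mention and each lexicon term, leaving the rest of the lookup unchanged.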

 

This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

bc8_symptemist_bounnlp.pdf (131.4 kB)
md5:542c4f67ce68ad3055d30003ab679ce4

Additional details

Related works

Is published in
Conference proceeding: 10.5281/zenodo.10103190 (DOI)