Symptom normalization using unsupervised learning and text similarity

Kavak, Berke; Özgür, Arzucan

doi:10.5281/zenodo.10104184

Published November 12, 2023 | Version v1

Conference proceeding | Open access

1. Department of Computer Engineering, Bogazici University, Istanbul, Turkey

Abstract

Mapping named entities to their respective IDs in raw texts is an important task, as inaccuracies can significantly affect data consistency and the precision of information retrieval. This is especially important in medical texts, where correct entity identification can have a major impact on diagnostic accuracy and patient care. SympTEMIST is a shared task dedicated to, as the name suggests, text mining of medical symptoms, signs, and findings from texts. In this paper, we present our team BounNLP's participation in subtask 2, which primarily aims to map Spanish symptom mentions to the corresponding SNOMED CT concept IDs. We propose an unsupervised approach for named entity normalization based on clustering and text similarity, using both string similarity and BERT-based contextual word vector representations.
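As a rough illustration of the string-similarity side of this idea only, the sketch below links a symptom mention to the closest entry in a small lexicon. It is not the authors' system: the lexicon, the placeholder IDs (which are not real SNOMED CT codes), the `normalize` function, and the 0.6 threshold are all assumptions made for the example, and the clustering and BERT-based components mentioned in the abstract are left out.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon. The IDs are placeholders for illustration,
# NOT real SNOMED CT concept IDs.
LEXICON = {
    "fiebre": "ID-001",
    "dolor de cabeza": "ID-002",
    "tos seca": "ID-003",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def normalize(mention: str, lexicon=LEXICON, threshold: float = 0.6):
    """Return (concept_id, score) for the most similar lexicon term.

    The concept ID is None when no term reaches the (assumed) threshold,
    so dissimilar mentions are left unlinked rather than force-matched.
    """
    best_term = max(lexicon, key=lambda term: similarity(mention, term))
    score = similarity(mention, best_term)
    if score < threshold:
        return None, score
    return lexicon[best_term], score


if __name__ == "__main__":
    for mention in ["Fiebre", "dolor de cabezas", "xyz"]:
        print(mention, "->", normalize(mention))
```

A purely lexical matcher like this handles spelling variants and inflections but not paraphrases, which is presumably one reason to combine it with contextual vector representations as the abstract describes.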

This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

Name	Size
bc8_symptemist_bounnlp.pdf (md5:542c4f67ce68ad3055d30003ab679ce4)	131.4 kB

Additional details

Is published in: Conference proceeding: 10.5281/zenodo.10103190 (DOI)

	All versions	This version
Views	136	136
Downloads	84	84
Data volume	13.0 MB	13.0 MB

Resource type

Conference proceeding

Publisher

Zenodo

Imprint

Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models. New Orleans, USA.

Conference

AMIA 2023 Annual Symposium, New Orleans, USA, November 2023

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 10, 2023
Modified: July 10, 2024
