Published November 12, 2023 | Version v1
Conference proceeding Open

ICB-UMA at BioCreative VIII @ AMIA 2023 Task 2 SYMPTEMIST (Symptom TExt Mining Shared Task)

  • 1. Dept. of Computer Languages and Sciences & Research Institute of Multilingual Language Technologies, Universidad de Málaga, Málaga, Spain

Description

Abstract

These working notes summarize the contribution of the ICB research group at the University of Málaga to the BioCreative VIII Workshop @ AMIA 2023, where we participated in Task 2 - SympTEMIST. We took part in both subtasks: recognition of symptom, sign, and clinical finding entities (subtask 1 - SymptomNER) and their normalization to the corresponding SNOMED CT concepts (subtask 2 - SymptomNorm). For subtask 1, we analyzed the performance of several BERT-based models adapted to Spanish clinical text. These models, fine-tuned on the SymptomNER corpus, achieved a precision of 0.804, a recall of 0.699, and an F1-score of 0.748 on the test set. For the SymptomNorm subtask, we incorporated recent strategies based on bi-encoder and cross-encoder models, in particular SapBERT models combined with FAISS for similarity search. Finally, the model's predictions were further refined using a gazetteer of more than 150,000 concepts. Our strategy achieved an accuracy of 0.58 on the test set.
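As a quick consistency check on the subtask 1 figures, the F1-score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below is illustrative only; the function name `f1_score` is a hypothetical helper and not part of the authors' code or the official task evaluation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper).

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the SymptomNER test set.
reported_precision = 0.804
reported_recall = 0.699

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.748, matching the reported F1-score
```

The recomputed value agrees with the reported F1-score to three decimal places.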

 

This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

bc8_symptemist_icbuma.pdf (114.1 kB)
md5:c5914b88675088bad2a67caac7ad8ed0

Additional details

Related works

Is published in
Conference proceeding: 10.5281/zenodo.10103190 (DOI)