ICB-UMA at BioCreative VIII @ AMIA 2023 Task 2 SYMPTEMIST (Symptom TExt Mining Shared Task)
Creators
- 1. Dept. of Computer Languages and Sciences & Research Institute of Multilingual Language Technologies, Universidad de Málaga, Málaga, Spain
Description
Abstract
These working notes summarize the contribution of the ICB research group from the University of Málaga to the BioCreative VIII Workshop @ AMIA 2023, based on our participation in Task 2, SympTEMIST. We took part in both subtasks: the recognition of symptom, sign, and clinical finding entities (subtask 1, SymptomNER) and their normalization to the corresponding SNOMED CT concepts (subtask 2, SymptomNorm). For subtask 1, we analyzed the performance of several BERT-based models adapted to Spanish clinical text. After fine-tuning on the SymptomNER corpus, these models achieved a precision of 0.804, a recall of 0.699, and an F1-score of 0.748 on the test set. For the SymptomNorm subtask, we adopted recent strategies based on bi-encoder and cross-encoder models, in particular SapBERT models combined with FAISS for similarity search. Finally, the predictions were further refined using a gazetteer of more than 150,000 concepts. This strategy achieved an accuracy of 0.58 on the test set.
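The F1-score reported for subtask 1 is the harmonic mean of precision $P$ and recall $R$, so the three figures can be cross-checked against each other. The check below uses the rounded values from the abstract, so a discrepancy in the last digit would be attributable to rounding:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.804 \times 0.699}{0.804 + 0.699}
    = \frac{1.123992}{1.503}
    \approx 0.748
```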
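The normalization step can be pictured as a nearest-neighbour lookup of each mention against a list of concept names. The following is a minimal, stdlib-only sketch under stated assumptions. It is not the authors' implementation:

* The character-trigram `embed` function is a hypothetical stand-in for a SapBERT bi-encoder.
* `DenseIndex.search` performs the same exhaustive inner-product search that a FAISS `IndexFlatIP` runs over L2-normalised embeddings.
* The exact-match-first policy in `normalize` is only one assumed way a gazetteer could refine predictions.
* The concept codes `C1` to `C4` are placeholders, not SNOMED CT identifiers.
* The cross-encoder stage is not shown.

```python
import math
from collections import Counter


def embed(text, n=3):
    """HYPOTHETICAL stand-in for a SapBERT encoder: an L2-normalised
    bag of character n-grams, stored as a sparse dict."""
    padded = f"#{text.lower().strip()}#"
    grams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    norm = math.sqrt(sum(v * v for v in grams.values())) or 1.0
    return {g: v / norm for g, v in grams.items()}


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(w * b.get(g, 0.0) for g, w in a.items())


class DenseIndex:
    """Exhaustive inner-product search over normalised vectors, the
    operation a FAISS IndexFlatIP performs (here in pure Python)."""

    def __init__(self, names, codes):
        self.codes = list(codes)
        self.vectors = [embed(name) for name in names]

    def search(self, query, k=1):
        # Score every concept name and return the k best (score, code) pairs.
        q = embed(query)
        scored = [(cosine(q, v), c) for v, c in zip(self.vectors, self.codes)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def normalize(mention, index, gazetteer):
    """ASSUMED refinement policy: an exact gazetteer hit wins;
    otherwise fall back to the top dense-retrieval candidate."""
    key = mention.lower().strip()
    if key in gazetteer:
        return gazetteer[key]
    return index.search(mention, k=1)[0][1]


def accuracy(predicted, gold):
    # Share of mentions whose predicted code equals the gold code.
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy data: placeholder codes, not real SNOMED CT identifiers.
index = DenseIndex(["fiebre", "cefalea", "tos seca", "dolor abdominal"],
                   ["C1", "C2", "C3", "C4"])
gazetteer = {"dolor de cabeza": "C2"}
preds = [normalize(m, index, gazetteer) for m in ["fiebres", "dolor de cabeza", "tos"]]
print(preds)  # ['C1', 'C2', 'C3']
print(accuracy(preds, ["C1", "C2", "C3"]))  # 1.0
```

Replacing `embed` with a real encoder and `DenseIndex` with a FAISS index would change the quality of the candidates, not the shape of the pipeline. In a retrieve-and-rerank setup, a cross-encoder would re-score the top-k pairs returned by `search` before the final code is chosen.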
This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.
Files
- bc8_symptemist_icbuma.pdf (114.1 kB, md5:c5914b88675088bad2a67caac7ad8ed0)
Additional details
Related works
- Is published in: Conference proceeding, 10.5281/zenodo.10103190 (DOI)