PicusLab @ BC8 SympTEMIST track: Disambiguating Entity Linking Candidates with Question Answering
- 1. Department of Electrical Engineering and Information Technology (DIETI), University of Naples "Federico II", Via Claudio 21, Naples, Italy
Description
Abstract
In biomedical informatics, entity linking plays a pivotal role in enhancing search capabilities, integrating heterogeneous data, and fostering advanced semantic understanding. Nonetheless, linguistic resources specifically designed for developing entity linking frameworks remain scarce: most existing datasets and concept aliases in the primary ontologies are in English. This scarcity makes it harder both to generate candidate entities for linkage and to disambiguate among them, particularly for under-resourced languages. In this work, we describe our contribution to the BioCreative SympTEMIST shared task, which focuses on the detection and normalization of symptoms, signs, and findings in medical documents in Spanish. Our methodology employs a pre-trained Spanish RoBERTa model in tandem with a cross-lingual SapBERT model for candidate generation, followed by a disambiguation phase that uses a Question Answering-based module. We also extend the approach to the multilingual sub-task, demonstrating its adaptability. Experiments on an internal test set that mirrors the SNOMED code distribution of the training set underscored the efficacy of our approach. Nonetheless, the challenge results highlight the need for further investigation to adapt the framework to previously unseen codes.
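The two-stage pipeline summarized in the abstract (embedding-based candidate generation, then question-style disambiguation) can be illustrated with a minimal sketch. This is not the authors' implementation: the character n-gram `embed` function is a toy stand-in for the Spanish RoBERTa and cross-lingual SapBERT encoders, the gazetteer codes are placeholders rather than real SNOMED CT identifiers, and the question template in `build_question` is a hypothetical format assumed only for illustration.

```python
from collections import Counter
from math import sqrt


def embed(text, n=3):
    """Toy stand-in for a sentence encoder: bag of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_candidates(mention, gazetteer, k=2):
    """Return the k gazetteer entries whose terms are most similar to the mention."""
    query = embed(mention)
    ranked = sorted(
        gazetteer,
        key=lambda entry: cosine(query, embed(entry[1])),
        reverse=True,
    )
    return ranked[:k]


def build_question(mention, context, candidates):
    """Phrase disambiguation as a multiple-choice question (hypothetical template)."""
    options = "\n".join(
        f"{i}. {term}" for i, (_, term) in enumerate(candidates, start=1)
    )
    return (
        f"Context: {context}\n"
        f"Which concept does the mention '{mention}' refer to?\n"
        f"{options}"
    )


# Placeholder codes, not real SNOMED CT identifiers.
GAZETTEER = [
    ("CODE-A", "fiebre"),
    ("CODE-B", "dolor torácico"),
    ("CODE-C", "dolor abdominal"),
    ("CODE-D", "cefalea"),
]

context = "Paciente con dolor en el abdomen desde hace tres días."
candidates = generate_candidates("dolor en el abdomen", GAZETTEER)
question = build_question("dolor en el abdomen", context, candidates)
print(question)
```

In a full system, a Question Answering model would read the generated question and select one of the listed options; that step is omitted here because the abstract does not specify how the module is queried.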
This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.
Files
bc8_symptemist_picus.pdf (2.4 MB)
md5:2c6eef21844c28980d8e6b06ccb7dc94
Additional details
Related works
- Is published in: Conference proceeding, 10.5281/zenodo.10103190 (DOI)