DUTIR-BioNLP@BC8 Track 3: Genetic Phenotype Extraction and Normalization with Biomedical Pre-trained Language Models
Creators
- 1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
Description
Abstract
Automatically extracting and normalizing key medical findings from the observations recorded during teratology physical examinations is an important task. BioCreative VIII Track 3 aims to advance and assess systems that automatically extract and normalize phenotype entities from electronic health records (EHRs). This paper describes the method behind our submissions to the track. Our pipeline for phenotype concept extraction splits the process into two subtasks: Named Entity Recognition and Named Entity Normalization. Both subtasks use cutting-edge biomedical pre-trained language models, and an ensemble method is then applied to further improve the final performance. The official results on the test set show that our best submission achieves F1-scores of 0.7632 on Subtask 3a and 0.7112 on Subtask 3b.
This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.
Files
Name | Size |
---|---|
bc8_phenotypes_dutir.pdf (md5:7a741901ea78f98d6fef47153adc8482) | 539.3 kB |
Additional details
Related works
- Is published in: Conference proceeding, DOI 10.5281/zenodo.10103190