Published November 12, 2023 | Version v1
Conference proceeding Open

DUTIR-BioNLP@BC8 Track 3: Genetic Phenotype Extraction and Normalization with Biomedical Pre-trained Language Models

  • 1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China

Description

Abstract

Automatically extracting and normalizing key medical findings from the observations recorded during physical examinations in teratology is an important task. BioCreative VIII Track 3 aims to facilitate the development and assessment of systems that automatically extract and normalize phenotype entities from electronic health records (EHRs). This paper describes the method behind our submissions to the track. Our pipeline splits phenotype concept extraction into two subtasks: Named Entity Recognition and Named Entity Normalization. State-of-the-art biomedical pre-trained language models are used for both subtasks, and an ensemble method is then applied to further improve the final performance. The official results on the test set show that our best submission achieves F1-scores of 0.7632 on Subtask 3a and 0.7112 on Subtask 3b.
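The abstract gives only the overall shape of the system (recognition, then normalization, with an ensemble on top), so the following is a minimal sketch of that shape rather than the authors' implementation. It assumes the ensemble is a simple vote over predicted spans and that normalization is an exact lexicon lookup; both are stand-ins for the pre-trained language models the paper uses. The names `ensemble_spans`, `normalize`, `extract_and_normalize`, `make_keyword_tagger`, and the `CONCEPT:` identifiers are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive
Tagger = Callable[[str], List[Span]]


def ensemble_spans(predictions: List[List[Span]], min_votes: int) -> List[Span]:
    """Keep spans proposed by at least `min_votes` models (assumed voting scheme)."""
    votes = Counter(span for spans in predictions for span in set(spans))
    return sorted(span for span, n in votes.items() if n >= min_votes)


def normalize(mention: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Map a mention to a concept ID by exact lowercase lookup (placeholder step)."""
    return lexicon.get(mention.lower().strip())


def extract_and_normalize(
    text: str, taggers: List[Tagger], lexicon: Dict[str, str], min_votes: int
) -> List[Tuple[int, int, str, Optional[str]]]:
    """Stage 1: recognize spans with an ensemble. Stage 2: normalize each mention."""
    spans = ensemble_spans([tag(text) for tag in taggers], min_votes)
    return [(s, e, text[s:e], normalize(text[s:e], lexicon)) for s, e in spans]


def make_keyword_tagger(keywords: List[str]) -> Tagger:
    """Toy stand-in for a trained NER model: tags the first match of each keyword."""
    def tag(text: str) -> List[Span]:
        lowered = text.lower()
        spans = []
        for kw in keywords:
            idx = lowered.find(kw)
            if idx != -1:
                spans.append((idx, idx + len(kw)))
        return spans
    return tag


if __name__ == "__main__":
    taggers = [
        make_keyword_tagger(["low-set ears", "high arched palate"]),
        make_keyword_tagger(["low-set ears"]),
        make_keyword_tagger(["low-set ears", "high arched palate", "palate"]),
    ]
    # Placeholder concept IDs, not real ontology identifiers.
    lexicon = {"low-set ears": "CONCEPT:0001", "high arched palate": "CONCEPT:0002"}
    text = "Low-set ears and a high arched palate."
    for row in extract_and_normalize(text, taggers, lexicon, min_votes=2):
        print(row)
```

In this sketch a span survives only if enough models agree on its exact boundaries, which trades some recall for precision; a real system could instead vote at the token level or weight models by validation performance.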

 

This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

bc8_phenotypes_dutir.pdf (539.3 kB)
md5:7a741901ea78f98d6fef47153adc8482

Additional details

Related works

Is published in
Conference proceeding: 10.5281/zenodo.10103190 (DOI)