Published November 12, 2023 | Version v1
Conference proceeding Open

IKMLab@BC8 Track 3: Sequence Tagging for Position-Aware Human Phenotype Extraction with Pre-trained Language Models

  • 1. Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

Description

Abstract

Automatic extraction and normalization of human phenotypes from unstructured physical examination reports is a crucial and challenging task in clinical genetics. This paper presents the system submitted by IKMLab for BioCreative VIII Track 3, Genetic Phenotype Extraction and Normalization. We target Subtask 3b and aim to provide accurate locations of human phenotype findings in a given observation text. Our system consists of two stages. In the first stage, we use the output of an existing baseline (e.g., PhenoTagger) to obtain a preliminary set of Human Phenotype Ontology (HPO) terms for each observation. In the second stage, we design a sequence tagging schema based on a pre-trained language model and perform token classification to locate the spans of those HPO terms. Our best system achieved F1 scores of 60.4% (Exact) and 64.2% (Overlapping) in the final evaluation. Further experiments show that our approach helps to better locate both separated and consecutive spans that describe HPO terms in observations.
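The second stage described above ends with token-level tags that must be turned back into character positions in the observation text. The abstract does not specify the tagging schema, so the following is only a minimal sketch under stated assumptions: a plain BIO scheme whose labels carry the HPO identifier, and whitespace tokenization in place of a language model's subword tokenizer. The function names `whitespace_offsets` and `bio_to_spans`, the example sentence, and the tags are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

Offset = Tuple[int, int]
Span = Tuple[int, int, str]


def whitespace_offsets(text: str) -> List[Offset]:
    """Return (start, end) character offsets of whitespace-separated tokens.

    Assumption: a stand-in for the offsets a subword tokenizer would give.
    """
    offsets = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        offsets.append((start, end))
        pos = end
    return offsets


def bio_to_spans(offsets: List[Offset], tags: List[str]) -> List[Span]:
    """Merge token-level BIO tags into (start, end, label) character spans.

    Assumption: tags look like "B-<label>", "I-<label>", or "O".
    An "I-" tag that does not continue an open span with the same
    label is ignored (it also closes any span that was open).
    """
    spans: List[Span] = []
    current = None  # mutable [start, end, label] for the open span
    for (start, end), tag in zip(offsets, tags):
        if tag.startswith("B-"):
            if current is not None:
                spans.append(tuple(current))
            current = [start, end, tag[2:]]
        elif tag.startswith("I-") and current is not None and current[2] == tag[2:]:
            current[1] = end  # extend the open span to this token
        else:
            if current is not None:
                spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans


if __name__ == "__main__":
    # Illustrative observation and hand-written tags (not system output).
    text = "short stature and low-set ears"
    tags = ["B-HP:0004322", "I-HP:0004322", "O", "B-HP:0000369", "I-HP:0000369"]
    for start, end, label in bio_to_spans(whitespace_offsets(text), tags):
        print(label, start, end, repr(text[start:end]))
```

This sketch only merges consecutive tokens. A finding split into separated pieces comes out as several spans that share a label, and grouping those into one discontinuous mention would need an extra step that the abstract does not describe.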

This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

bc8_phenotypes_ikmlab.pdf (1.2 MB)
md5:239d3fbdc513a983d6bd06e18202d07f

Additional details

Related works

Is published in
Conference proceeding: 10.5281/zenodo.10103190 (DOI)