IKMLab@BC8 Track 3: Sequence Tagging for Position-Aware Human Phenotype Extraction with Pre-trained Language Models
Creators
- 1. Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Description
Abstract
Automatic extraction and normalization of human phenotypes from unstructured physical examination reports is a crucial and challenging task in clinical genetics. This paper presents the system submitted by IKMLab for the BioCreative VIII Task 3 - Genetic Phenotype Extraction and Normalization. We target Subtask 3b and aim to provide accurate locations of human phenotype findings in a given observation text. Our system consists of two stages. In the first stage, we use the output of an existing baseline (e.g., PhenoTagger) to obtain a preliminary set of Human Phenotype Ontology (HPO) terms for each observation. In the second stage, we design a sequence tagging schema based on a pre-trained language model and perform token classification to locate the spans for those HPO terms. Our best system achieved Exact and Overlapping F1 scores of 60.4% and 64.2%, respectively, in the final evaluation. Further experiments show that our approach helps to better locate separated and consecutive spans describing HPO terms in observations.
This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.
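As an illustration of the second stage described in the abstract, the sketch below decodes per-token tags into character spans for one candidate HPO term, and shows how a term can map to several separated spans. It is a minimal sketch under stated assumptions, not the authors' implementation: the plain B/I/O tag set, the function names (`tags_to_spans`, `exact_match`, `overlaps`), and the example observation are all hypothetical, and the actual tagging schema is defined in the paper.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def tags_to_spans(offsets: List[Span], tags: List[str]) -> List[Span]:
    """Merge per-token B/I/O tags into character spans.

    Assumption: a plain B/I/O tag set; an "I" that follows an "O"
    opens a new span, so one term can yield several separated spans.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    end: Optional[int] = None
    for (tok_start, tok_end), tag in zip(offsets, tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:          # close the span that was open
                spans.append((start, end))
            start, end = tok_start, tok_end
        elif tag == "I":                   # extend the open span
            end = tok_end
        else:                              # "O" closes any open span
            if start is not None:
                spans.append((start, end))
            start = end = None
    if start is not None:                  # flush a span ending at the last token
        spans.append((start, end))
    return spans


def exact_match(pred: Span, gold: Span) -> bool:
    """Exact criterion: both boundaries must agree."""
    return pred == gold


def overlaps(pred: Span, gold: Span) -> bool:
    """Overlapping criterion: the spans share at least one character."""
    return pred[0] < gold[1] and gold[0] < pred[1]


# Hypothetical observation text and token offsets (not taken from the dataset).
text = "broad nasal bridge and tip"
offsets = [(0, 5), (6, 11), (12, 18), (19, 22), (23, 26)]
# Hypothetical tags for one candidate term covering "broad nasal ... tip".
tags = ["B", "I", "O", "O", "I"]

spans = tags_to_spans(offsets, tags)
print([text[s:e] for s, e in spans])
```

In this toy case the candidate term is recovered as two separated spans, "broad nasal" and "tip". The two helper predicates only illustrate the difference between exact and overlapping span matching; the official scoring is defined by the challenge organizers.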
Files
- bc8_phenotypes_ikmlab.pdf (1.2 MB, md5:239d3fbdc513a983d6bd06e18202d07f)
Additional details
Related works
- Is published in: Conference proceeding, 10.5281/zenodo.10103190 (DOI)