IKMLab@BC8 Track 3: Sequence Tagging for Position-Aware Human Phenotype Extraction with Pre-trained Language Models

Lin, Ying-Jia; Feng, Zhi-Quan; Kao, Hung-Yu

doi:10.5281/zenodo.10104936

Published November 12, 2023 | Version v1

Conference proceeding | Open access

1. Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

Abstract

Automatic extraction and normalization of human phenotypes from unstructured physical examination reports is a crucial and challenging task in clinical genetics. This paper presents the system submitted by IKMLab for BioCreative VIII Track 3, Genetic Phenotype Extraction and Normalization. We target Subtask 3b and aim to provide accurate locations of human phenotype findings in a given observation text. Our system consists of two stages. In the first stage, we use the output of an existing baseline (e.g., PhenoTagger) to obtain a preliminary set of Human Phenotype Ontology (HPO) terms for each observation. In the second stage, we design a sequence tagging schema based on a pre-trained language model and perform token classification to locate the spans of those HPO terms. Our best system achieved Exact and Overlapping F1 scores of 60.4% and 64.2%, respectively, in the final evaluation. Further experiments show that our approach helps to better locate both separated and consecutive spans describing HPO terms in observations.
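
The second stage and the two reported metrics can be illustrated with a minimal sketch. It assumes a plain B/I/O tag set and character offsets per token; the paper's actual tagging schema and the official BioCreative VIII scorer are not described on this page, so the names `bio_to_spans` and `span_f1` are hypothetical, and the matching here ignores HPO identifiers for simplicity.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def bio_to_spans(tags: List[str], offsets: List[Span]) -> List[Span]:
    """Merge tokens tagged B/I into character spans (assumed B/I/O schema)."""
    spans: List[Span] = []
    start = end = None
    for tag, (tok_start, tok_end) in zip(tags, offsets):
        if tag == "B" or (tag == "I" and start is None):
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.append((start, end))
            start, end = tok_start, tok_end
        elif tag == "I":
            end = tok_end  # extend the open span
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, end))
            start = end = None
    if start is not None:
        spans.append((start, end))
    return spans


def span_f1(pred: List[Span], gold: List[Span], overlap: bool = False) -> float:
    """F1 over spans; exact match by default, any character overlap if overlap=True."""
    def match(a: Span, b: Span) -> bool:
        if overlap:
            return a[0] < b[1] and b[0] < a[1]
        return a == b

    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative observation (made up for this sketch): "short neck and low-set ears"
offsets = [(0, 5), (6, 10), (11, 14), (15, 22), (23, 27)]
tags = ["B", "I", "O", "B", "O"]      # the tagger misses the token "ears"
pred = bio_to_spans(tags, offsets)    # [(0, 10), (15, 22)]
gold = [(0, 10), (15, 27)]
exact_f1 = span_f1(pred, gold)
overlap_f1 = span_f1(pred, gold, overlap=True)
```

In this toy case the truncated second span counts as a miss under exact matching but as a hit under overlap matching, which is why an Overlapping F1 is never lower than the Exact F1 under these definitions.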

This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

Name	Size
bc8_phenotypes_ikmlab.pdf (md5:239d3fbdc513a983d6bd06e18202d07f)	1.2 MB

Additional details

Is published in: Conference proceeding: 10.5281/zenodo.10103190 (DOI)

Usage statistics	All versions	This version
Views	100	100
Downloads	53	53
Data volume	69.5 MB	69.5 MB
