Assessing the state of the art in biomedical relation extraction: evaluating ChatGPT, PubMedBERT and BioREx for the BioRED track at BioCreative VIII

Lai, Po-Ting; Islamaj, Rezarta; Wei, Chih-Hsuan; Luo, Ling; Lu, Zhiyong

doi:10.5281/zenodo.10351286

Published November 12, 2023 | Version v1

Conference proceeding Open

1. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
2. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China

Abstract

Biomedical relation extraction aims to identify and categorize relationships between biomedical entities in unstructured text. This is crucial for various biomedical NLP applications, from drug discovery to custom medical solutions. The BioRED track at the BioCreative VIII challenge and workshop aimed to foster the development of novel algorithms for biomedical relation extraction. This challenge differed from previous relation extraction challenges in three ways: it addressed relation extraction at the document level, it covered relations between five entity types in eight semantic categories, and it asked for each relation to be classified as either a novel finding of the document or background knowledge. In this paper, we use the BioRED track training dataset to build three benchmarking systems based on BERT, GPT and BioREx methods. The BioRED track consisted of two sub-tasks. The first sub-task provided participants with article titles, abstracts and human-annotated mentions of genes, diseases, chemicals, gene variants, cell lines and species, with all annotated entities linked to database identifiers. The second sub-task did not provide entity annotations and asked for end-to-end relation extraction systems. Although we discuss three different systems, we followed a similar overall strategy for both sub-tasks and treated them as multi-label classification problems. For sub-task 1 we used the provided human annotations as entity inputs, while for sub-task 2 we retrieved the PubTator output for all articles. Here we discuss our three approaches and offer our perspective on their advantages and limitations. Our best-performing system was the BioREx model, with F1-scores of 75.68% and 56.89% on recognizing entity pairs and relation types, respectively, surpassing the median (73.56% and 53.17%) and average (67.03% and 47.74%) scores of all participating teams.
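
The two F1 figures reported above correspond to two levels of strictness: whether a related entity pair was found at all, and whether its relation type was also correct. The sketch below illustrates that distinction only; it is not the official BioRED track scorer. The tuple layout, the `normalize`, `f1` and `score` helpers, the treatment of pairs as unordered, and the example identifiers and relation labels are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record layout (an assumption, not the track's file format):
# (document id, entity id 1, entity id 2, relation type)
Relation = Tuple[str, str, str, str]


def normalize(rel: Relation) -> Relation:
    """Order the two entity ids so that (A, B) and (B, A) compare equal.

    Assumes relations are scored as unordered pairs.
    """
    doc_id, e1, e2, rel_type = rel
    first, second = sorted((e1, e2))
    return (doc_id, first, second, rel_type)


def f1(gold: Set[tuple], pred: Set[tuple]) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def score(gold: Iterable[Relation], pred: Iterable[Relation]) -> Tuple[float, float]:
    """Return (entity-pair F1, entity-pair + relation-type F1)."""
    gold_typed = {normalize(r) for r in gold}
    pred_typed = {normalize(r) for r in pred}
    # Entity-pair level: drop the relation type before comparing.
    gold_pairs = {r[:3] for r in gold_typed}
    pred_pairs = {r[:3] for r in pred_typed}
    return f1(gold_pairs, pred_pairs), f1(gold_typed, pred_typed)


# Made-up example: both pairs are found, but one relation type is wrong.
gold = [
    ("doc1", "G1", "D1", "Association"),
    ("doc1", "C1", "D1", "Positive_Correlation"),
]
pred = [
    ("doc1", "D1", "G1", "Association"),
    ("doc1", "C1", "D1", "Negative_Correlation"),
]
pair_f1, type_f1 = score(gold, pred)
print(f"entity-pair F1 = {pair_f1:.2f}, relation-type F1 = {type_f1:.2f}")
```

In this toy example the entity-pair score is perfect while the typed score is halved by the single mislabeled relation, which shows in miniature why a relation-type F1 can fall below the corresponding entity-pair F1.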

This article is part of the Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models.

Files

Name	Size
BioCreativeVIII-BioRED-Track-assessing-state-of-the-art.pdf (md5:788729bb8144829ca4c42a207071286f)	792.7 kB

Additional details

Is published in: Journal article: 10.5281/zenodo.10103190 (DOI)
