Enhancing BERT Performance with LLMs: Structured Data Augmentation for Biomedical Entity Recognition
Description
Abstract
Large Language Models (LLMs) have shown remarkable capabilities across many NLP tasks, but their performance on domain-specific named entity recognition (NER), such as in the biomedical field, remains limited. Meanwhile, BERT-based models continue to achieve strong results in biomedical NER but require substantial amounts of high-quality annotated data. In this work, we investigate how to harness LLMs to generate auxiliary annotation data for BERT-based NER models, offering a cost-effective alternative to manual annotation. We address three key research questions: (1) whether LLMs or fine-tuned BERT models provide more effective weak supervision for improving BERT-based NER, (2) how to best integrate augmented and gold-standard data during training, and (3) how factors such as data source and augmentation size affect downstream performance. In particular, we introduce a structured supervision framework in which an LLM is fine-tuned to generate entity annotations in a context-rich JSON format, which are then decoded into token-level labels for BERT training. Experimental results on a biomedical NER dataset show that LLM-generated auxiliary annotation data effectively enhances BERT performance. Our findings provide practical insights into designing hybrid systems that combine LLMs and BERT for scalable, high-quality biomedical NER.
This article is part of the Proceedings of the BioCreative IX Challenge and Workshop (BC9): Large Language Models for Clinical and Biomedical NLP at the International Joint Conference on Artificial Intelligence (IJCAI).
Files

| Name | Size |
|---|---|
| BC9_paper06.pdf (md5:5f221aa047671664922fd1cc8a05d662) | 168.0 kB |