Published February 20, 2026 | Version v1
Presentation Open

Challenges with deploying agentic AI for biomedical data integration and curation

Authors/Creators

  • 1. Lawrence Berkeley National Laboratory

Description

Agentic AI — large language models operating within tool-using, iterative reasoning loops — represents a fundamental paradigm shift from the earlier model of LLMs as passive oracles. Here I report on direct deployment experience across four biomedical data curation domains: ontology development, gene function annotation review, disease mechanism knowledge base construction, and sample metadata harmonization. Across these efforts, I identify four recurring categories of challenge: task misalignment, harness engineering, evaluation, and upskilling.

A central observation is what I term the "iceberg of misalignment." Traditional AI tools have largely targeted well-defined, tractable tasks — named entity recognition, metadata slot-filling, term extraction — while the highest-value curatorial work remains below the surface: reconciling conflicting evidence, refactoring ontologies, synthesizing multiple lines of literature, and engineering community consensus. Aligning agent deployment with this more complex task space requires mapping what I describe as the "jagged frontier" of agent capability for biological data specifically.

A second finding is that naive agent deployments are prone to well-known failure modes, including hallucination and output degradation, but that the biological data community is unusually well-positioned to address them through its existing, mature infrastructure. Tools such as ROBOT, the Ontology Development Kit, OWL reasoning, and LinkML provide a deterministic harness layer that agents can operate within and be validated against.
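
As a rough illustration of the harness idea, the sketch below gates a hypothetical agent's proposed sample-metadata record behind a deterministic check and feeds any errors back for another attempt. It is a minimal sketch under stated assumptions, not the workflow used in the presentation: `validate_record`, `run_with_harness`, the `ALLOWED_TISSUES` vocabulary, and the stub agent are all hypothetical, and in practice the check would presumably be delegated to tools such as LinkML schema validation or an OWL reasoner run rather than hand-written rules.

```python
from typing import Callable

# Hypothetical controlled vocabulary; a real harness would consult an ontology or schema.
ALLOWED_TISSUES = {"liver", "kidney", "lung"}
REQUIRED_FIELDS = ("sample_id", "tissue")


def validate_record(record: dict) -> list[str]:
    """Deterministic check: return a list of error messages (empty if valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in record]
    tissue = record.get("tissue")
    if tissue is not None and tissue not in ALLOWED_TISSUES:
        errors.append(f"unknown tissue term: {tissue}")
    return errors


def run_with_harness(agent: Callable[[str, list[str]], dict], task: str, max_attempts: int = 3):
    """Call the agent until its output passes validation or attempts run out."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        record = agent(task, feedback)
        feedback = validate_record(record)
        if not feedback:
            return record, attempt
    return None, max_attempts


def stub_agent(task: str, feedback: list[str]) -> dict:
    """Stand-in for an LLM agent: corrects itself once it sees validator feedback."""
    if feedback:
        return {"sample_id": "S1", "tissue": "liver"}
    return {"sample_id": "S1", "tissue": "hepatic stuff"}


result, attempts = run_with_harness(stub_agent, "harmonize sample S1")
print(result, attempts)
```

The point of the design is that acceptance is decided by the deterministic validator, never by the agent itself; the agent only sees the validator's error messages and tries again.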

Third, evaluation remains an open and pressing problem. Strong benchmarks exist for simpler tasks, but agentic workflows are open-ended and resist binary success criteria. The field currently relies heavily on informal assessment; I argue that rubric-based evaluation frameworks are needed to establish the trust required for broader adoption.
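
To make the contrast with binary success criteria concrete, the following is a minimal sketch of how a rubric-based score might be computed for one agent output. The criteria names, weights, and the 0 to 2 rating scale are hypothetical placeholders chosen for illustration; they are not taken from the presentation, and a real rubric would be defined by curators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative importance; hypothetical values


# Hypothetical rubric for reviewing an agent-proposed annotation.
RUBRIC = [
    Criterion("evidence_cited", 2.0),
    Criterion("ontology_terms_valid", 2.0),
    Criterion("conflicts_acknowledged", 1.0),
]


def score(ratings: dict[str, int], rubric=RUBRIC, max_rating: int = 2) -> float:
    """Weighted mean of per-criterion ratings (0..max_rating), normalized to 0..1."""
    missing = [c.name for c in rubric if c.name not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    total = sum(c.weight * ratings[c.name] for c in rubric)
    return total / (max_rating * sum(c.weight for c in rubric))


# Ratings a human reviewer (or a second model) might assign to one output.
ratings = {"evidence_cited": 2, "ontology_terms_valid": 1, "conflicts_acknowledged": 0}
overall = score(ratings)
print(round(overall, 2))
```

Unlike a pass/fail benchmark, a score of this kind records partial credit per criterion, so reviewers can see which aspects of an open-ended output fell short.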

Finally, I discuss the challenge of upskilling: effective use of agents requires deliberate investment in training and institutional access, and biocurators currently face significant barriers to both.

Files

Challenges with deploying agentic AI for biomedical data integration and curation.pdf