Published June 9, 2026 | Version v3

Structure Grounding Is Not Enough: Real Execution as the Ground Truth for LLM-Generated Bioinformatics Workflows

  • 1. BioMate AI

Description

Large language models (LLMs) applied to bioinformatics workflow generation hallucinate function names, cite packages absent from current releases, and produce non-executable workflows; across a 20-task benchmark over seven biological domains, ungrounded generation attains only 71.4% function-citation accuracy. We show that Bioconductor's formal structure — NAMESPACE export manifests, S4 typed class hierarchies, BiocViews controlled vocabularies, and standardized vignettes — serves as a grounding scaffold that suppresses this hallucination, raising citation accuracy to 88.2% and nearly eliminating wrong-package citations (14.3% → 2.1%). A dual-agent LLM cross-validation protocol, in which two reviewer agents and a mediator adjudicate step quality at scale, reaches 90.0% step correctness with inter-rater agreement κ = 0.96 after calibration.
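
As a minimal sketch of how citations could be checked against NAMESPACE export manifests, the Python below parses `export()` and `exportMethods()` directives and labels each cited `package::function` pair. The function names (`parse_exports`, `classify_citation`, `citation_accuracy`) and the three labels are hypothetical illustrations of the abstract's "function-citation accuracy" and "wrong-package citations", not the authors' evaluation code. It also assumes simple NAMESPACE files: `exportPattern()`, `exportClasses()`, and names containing `#` or `)` are not handled.

```python
import re

# Matches export(...) and exportMethods(...) directives, including
# multi-line ones. exportPattern() and exportClasses() are ignored here.
_EXPORT_RE = re.compile(r"\bexport(?:Methods)?\s*\(([^)]*)\)")


def parse_exports(namespace_text):
    """Return the set of names listed in export()/exportMethods() directives."""
    # Drop R comments line by line (assumes no '#' inside exported names).
    stripped = "\n".join(
        line.split("#", 1)[0] for line in namespace_text.splitlines()
    )
    names = set()
    for args in _EXPORT_RE.findall(stripped):
        for raw in args.split(","):
            name = raw.strip().strip("\"'")
            if name:
                names.add(name)
    return names


def classify_citation(package, function, exports_by_package):
    """Label one cited package::function pair (labels are illustrative)."""
    if function in exports_by_package.get(package, set()):
        return "verified"
    if any(function in names for names in exports_by_package.values()):
        return "wrong_package"  # exported, but by a different package
    return "hallucinated"  # not exported by any known package


def citation_accuracy(citations, exports_by_package):
    """Fraction of (package, function) citations labelled 'verified'."""
    if not citations:
        return 0.0
    verified = sum(
        classify_citation(pkg, fn, exports_by_package) == "verified"
        for pkg, fn in citations
    )
    return verified / len(citations)
```

Here `exports_by_package` would map each package name to the set returned by `parse_exports` for that package's NAMESPACE file. A lookup of this kind can only confirm that a cited name exists in the stated package. It says nothing about whether the call will run.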

The central contribution is the recognition that structure grounding suppresses hallucination but does not guarantee execution: a workflow can cite only verified functions, pass dual-agent review, and still fail at runtime through version skew, data-contract mismatch, or un-synthesizable inputs. We therefore introduce execution-grounded validation — a real end-to-end run in a dependency-complete environment on a synthesized realistic input — as the strongest tier of a grounding hierarchy (lexical ⊂ structural ⊂ executional), whose binary pass/fail is a self-supervised correctness oracle needing no human or LLM judgement. The "structure-as-guardrail" thesis thus generalizes to "execution-as-guardrail": wherever reproducible run environments exist, execution is the only ground truth, and the prior tiers are necessary but not sufficient.
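
The binary pass/fail oracle described above can be sketched as a thin wrapper around a real process run. This is an illustration under stated assumptions, not the authors' harness: `run_workflow` and `ExecutionResult` are hypothetical names, the dependency-complete environment and the synthesized input are assumed to be prepared by the caller, and "pass" is simplified to a zero exit code within a time limit.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ExecutionResult:
    passed: bool                 # the binary oracle signal
    returncode: Optional[int]    # None if the process never finished
    stderr_tail: str             # last part of stderr, kept for diagnosis


def run_workflow(command: Sequence[str], timeout_s: float = 600.0,
                 cwd: Optional[str] = None) -> ExecutionResult:
    """Run a workflow end to end and reduce the outcome to pass/fail.

    Assumption: exit code 0 means success. A stricter oracle could also
    check that the expected output files exist and are well formed.
    """
    try:
        proc = subprocess.run(
            list(command),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            cwd=cwd,
        )
    except subprocess.TimeoutExpired:
        return ExecutionResult(False, None, "timed out")
    except OSError as exc:
        # For example, the interpreter is not installed in this environment.
        return ExecutionResult(False, None, str(exc))
    return ExecutionResult(proc.returncode == 0, proc.returncode,
                           proc.stderr[-2000:])


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; a real call might be
    # run_workflow(["Rscript", "workflow.R", "input.tsv"]) with
    # hypothetical file names.
    result = run_workflow([sys.executable, "-c", "print('ok')"])
    print(result.passed)
```

Because the result depends only on what the process actually did, version skew or a data-contract mismatch surfaces as a failure even when every cited function has been verified.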

Keywords: large language models; LLM hallucination; bioinformatics; Bioconductor; structure-grounded extraction; retrieval-augmented generation; agentic bioinformatics; dual-agent validation; workflow execution; grounding hierarchy; workflow generation; reproducibility; AI code generation

Additional details

Software

Repository URL
https://github.com/bioMate-AI/biomate-bioconductor-kb
Programming language
Python, R
Development Status
Active