Structure Grounding Is Not Enough: Real Execution as the Ground Truth for LLM-Generated Bioinformatics Workflows

Zhang, Yaoyun

doi:10.5281/zenodo.20616544

Published June 9, 2026 | Version v3

Preprint | Open access

Zhang, Yaoyun (Project leader)¹

1. BioMate AI

Large language models (LLMs) applied to bioinformatics workflow generation hallucinate function names, cite packages absent from current releases, and produce non-executable workflows; across a 20-task benchmark over seven biological domains, ungrounded generation attains only 71.4% function-citation accuracy. We show that Bioconductor's formal structure — NAMESPACE export manifests, S4 typed class hierarchies, BiocViews controlled vocabularies, and standardized vignettes — serves as a grounding scaffold that suppresses this hallucination, raising citation accuracy to 88.2% and nearly eliminating wrong-package citations (14.3% → 2.1%). A dual-agent LLM cross-validation protocol, in which two reviewer agents and a mediator adjudicate step quality at scale, reaches 90.0% step correctness with inter-rater agreement κ = 0.96 after calibration.
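As an illustration of the export-manifest check described above, the following is a minimal sketch that scores function citations against the `export(...)` directives of an R NAMESPACE file. It is not the authors' implementation: the helper names `parse_exports` and `citation_accuracy`, the package `toyPkg`, and all function names in the sample data are hypothetical, and only plain `export(...)` directives are handled (no `exportPattern` or `exportMethods`).

```python
import re


def parse_exports(namespace_text: str) -> set[str]:
    """Collect names listed in plain export(...) directives of a NAMESPACE file."""
    # Drop R-style comments so commented-out directives are ignored.
    text = re.sub(r"#.*", "", namespace_text)
    exports: set[str] = set()
    # "export" must be followed directly by "(", so exportMethods(...) and
    # exportPattern(...) are deliberately not matched in this sketch.
    for match in re.finditer(r"\bexport\s*\(([^)]*)\)", text):
        for name in match.group(1).split(","):
            name = name.strip().strip("\"'")
            if name:
                exports.add(name)
    return exports


def citation_accuracy(cited, manifests) -> float:
    """Fraction of (package, function) citations found in that package's exports."""
    if not cited:
        return 0.0
    verified = sum(1 for pkg, fn in cited if fn in manifests.get(pkg, set()))
    return verified / len(cited)


# Hypothetical NAMESPACE content for a made-up package.
toy_namespace = """
export(fitModel, plotFit)
exportMethods(show)
# export(hidden)
import(methods)
"""
manifests = {"toyPkg": parse_exports(toy_namespace)}

# Two verified citations, one hallucinated function, one wrong package.
cited = [
    ("toyPkg", "fitModel"),
    ("toyPkg", "plotFit"),
    ("toyPkg", "magicNormalize"),
    ("otherPkg", "fitModel"),
]
print(citation_accuracy(cited, manifests))  # 0.5
```

A check of this kind covers only whether a cited name exists in a manifest; it says nothing about whether the call would run.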

The central contribution is the recognition that structure grounding suppresses hallucination but does not guarantee execution: a workflow can cite only verified functions, pass dual-agent review, and still fail at runtime through version skew, data-contract mismatch, or un-synthesizable inputs. We therefore introduce execution-grounded validation — a real end-to-end run in a dependency-complete environment on a synthesized realistic input — as the strongest tier of a grounding hierarchy (lexical ⊂ structural ⊂ executional), whose binary pass/fail is a self-supervised correctness oracle needing no human or LLM judgement. The "structure-as-guardrail" thesis thus generalizes to "execution-as-guardrail": wherever reproducible run environments exist, execution is the only ground truth, and the prior tiers are necessary but not sufficient.
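The binary pass/fail oracle described above can be sketched as a thin wrapper around a process exit code. This is an assumption-laden illustration rather than the authors' harness: the function name `execution_oracle`, the timeout value, and the idea of treating "exit status 0 within the time limit" as a pass are all choices made for this example. A real run would invoke the workflow (for instance through an R script runner) inside a dependency-complete environment on a synthesized input.

```python
import subprocess
import sys


def execution_oracle(command: list[str], timeout_s: float = 600.0) -> bool:
    """Return True only if the command finishes with exit status 0 in time.

    Assumption for this sketch: any non-zero exit, timeout, or failure to
    launch the process counts as a failed workflow.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,  # keep stdout/stderr out of the caller's terminal
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # OSError covers a missing interpreter or a non-executable file.
        return False
    return completed.returncode == 0


# Stand-in commands: the Python interpreter plays the role of the workflow
# runner so the sketch can be tried without an R installation.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(3)"]

print(execution_oracle(passing))  # True
print(execution_oracle(failing))  # False
```

Because the verdict depends only on the run itself, no human or LLM judgement enters the result, which is the property the abstract attributes to the executional tier.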

Keywords: large language models; LLM hallucination; bioinformatics; Bioconductor; structure-grounded extraction; retrieval-augmented generation; agentic bioinformatics; dual-agent validation; workflow execution; grounding hierarchy; workflow generation; reproducibility; AI code generation

Files (45.7 kB)

Name	Size
Structure Grounding Is Not Enough: Real Execution as the Ground Truth for LLM-Generated Bioinformatics Workflows.docx (md5:3255ff83034d45093bba16e1f9d40251)	45.7 kB

Additional details

Repository URL: https://github.com/bioMate-AI/biomate-bioconductor-kb
Programming language: Python, R
Development Status: Active
