Structure Grounding Is Not Enough: Real Execution as the Ground Truth for LLM-Generated Bioinformatics Workflows
Description
Large language models (LLMs) applied to bioinformatics workflow generation hallucinate function names, cite packages absent from current releases, and produce non-executable workflows; across a 20-task benchmark over seven biological domains, ungrounded generation attains only 71.4% function-citation accuracy. We show that Bioconductor's formal structure — NAMESPACE export manifests, S4 typed class hierarchies, BiocViews controlled vocabularies, and standardized vignettes — serves as a grounding scaffold that suppresses this hallucination, raising citation accuracy to 88.2% and nearly eliminating wrong-package citations (14.3% → 2.1%). A dual-agent LLM cross-validation protocol, in which two reviewer agents and a mediator adjudicate step quality at scale, reaches 90.0% step correctness with inter-rater agreement κ = 0.96 after calibration.
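The NAMESPACE-based check described above can be illustrated with a minimal sketch. It assumes citations are written as `pkg::fn` and that export manifests are available as plain text; the abbreviated manifest strings and the helper names `parse_exports`, `classify_citation`, and `citation_accuracy` are hypothetical illustrations, not the authors' pipeline.

```python
import re

# Abbreviated, illustrative NAMESPACE snippets (assumption: real manifests
# would be read from the installed packages, not hard-coded).
NAMESPACES = {
    "DESeq2": "export(DESeq, results, DESeqDataSetFromMatrix)\nexportMethods(counts)\n",
    "edgeR": "export(DGEList, calcNormFactors, glmQLFit)\n",
}


def parse_exports(namespace_text):
    """Collect symbols named in export(...) and exportMethods(...) directives."""
    exports = set()
    for body in re.findall(r"\bexport(?:Methods)?\(([^)]*)\)", namespace_text):
        for name in body.split(","):
            name = name.strip().strip("\"'")
            if name:
                exports.add(name)
    return exports


def classify_citation(citation, manifests):
    """Label a 'pkg::fn' citation as verified, wrong_package, or unverified."""
    pkg, _, fn = citation.partition("::")
    if fn in manifests.get(pkg, set()):
        return "verified"
    # The function exists, but under a different package than the one cited.
    if any(fn in exported for exported in manifests.values()):
        return "wrong_package"
    return "unverified"


def citation_accuracy(citations, manifests):
    """Fraction of citations whose function is exported by the cited package."""
    verdicts = [classify_citation(c, manifests) for c in citations]
    return verdicts.count("verified") / len(verdicts)


MANIFESTS = {pkg: parse_exports(text) for pkg, text in NAMESPACES.items()}
```

Under these assumptions, `classify_citation("edgeR::DESeq", MANIFESTS)` is labelled a wrong-package citation, because `DESeq` appears only in the DESeq2 manifest. This covers the lexical and structural tiers only; it says nothing about whether the call runs.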
The central contribution is the recognition that structure grounding suppresses hallucination but does not guarantee execution: a workflow can cite only verified functions, pass dual-agent review, and still fail at runtime through version skew, data-contract mismatch, or un-synthesizable inputs. We therefore introduce execution-grounded validation — a real end-to-end run in a dependency-complete environment on a synthesized realistic input — as the strongest tier of a grounding hierarchy (lexical ⊂ structural ⊂ executional), whose binary pass/fail is a self-supervised correctness oracle needing no human or LLM judgement. The "structure-as-guardrail" thesis thus generalizes to "execution-as-guardrail": wherever reproducible run environments exist, execution is the only ground truth, and the prior tiers are necessary but not sufficient.
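The binary pass/fail oracle can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' harness: the function name `execution_passes` and the timeout default are hypothetical, "pass" is taken to mean exit status 0 within the time limit, and the script runs with the current Python interpreter unless another one (for example `("Rscript",)`, assuming it is on the PATH of a dependency-complete environment) is supplied.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def execution_passes(script_text, interpreter=(sys.executable,), timeout_s=60):
    """Run a generated script in a fresh process and report pass/fail.

    Returns True only if the process exits with status 0 before the timeout.
    No human or LLM judgement is involved in the verdict.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "workflow_script"
        script.write_text(script_text)
        try:
            proc = subprocess.run(
                [*interpreter, str(script)],
                cwd=workdir,          # isolate any files the script writes
                capture_output=True,  # keep stdout/stderr out of the caller's console
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        return proc.returncode == 0
```

A script that imports a missing module fails this check even if every function it cites is verified, which is the gap between the structural and executional tiers. A fuller harness would also supply the synthesized input data and pin package versions.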
Keywords: large language models; LLM hallucination; bioinformatics; Bioconductor; structure-grounded extraction; retrieval-augmented generation; agentic bioinformatics; dual-agent validation; workflow execution; grounding hierarchy; workflow generation; reproducibility; AI code generation
Files
| Name | Size |
|---|---|
| md5:3255ff83034d45093bba16e1f9d40251 | 45.7 kB |
Additional details
Software
- Repository URL: https://github.com/bioMate-AI/biomate-bioconductor-kb
- Programming language: Python, R
- Development Status: Active