Published June 3, 2026 | Version v2
Working paper | Open

Structural Defects Over Task Errors: A Unified Framework for Diagnosing Unreliable Software Artifacts Across Agentic, Supply-Chain, and Decompilation Pipelines

  • 1. Saluca LLC

Description

Version 2: revised in response to an external structural review and an automated critique pass. See the "Response to Review" appendix in the PDF for the change log.

A recurring pattern across recent software engineering research is that the most consequential failures in complex software pipelines are not task-level errors (wrong answers, incorrect transformations, or bad predictions) but structural defects: gaps in component coverage, mismatches between artifact representations, and integration failures that mask the signal task-level monitors are designed to detect. This paper synthesises five corpus findings spanning agentic system monitoring, software bill of materials (SBOM) generation, binary decompilation evaluation, credential leakage detection, and multi-agent LLM collaboration topology to argue a single defensible thesis: **the dominant failure mode in modern software pipelines is structural ambiguity, not functional error, and current tooling is systematically blind to this distinction**.

The evidence base draws from cs.SE preprints covering monitoring methodology [corpus:arxiv:2606.02494], SBOM component inclusion [corpus:arxiv:2606.02442], decompilation reusability metrics [corpus:arxiv:2605.29490], credential classification [corpus:arxiv:2605.31520], and multi-agent topology experiments [corpus:arxiv:2606.01490]. Across these domains, a consistent pattern emerges: pipelines optimised for one quality axis (readability, task accuracy, alert precision) systematically underperform on orthogonal axes (functionality, structural coverage, integration completeness), and this orthogonality is rarely measured.

**Important framing note:** The thesis is a *heuristic reading* of five independent studies, not a derivation from a shared formal structure. The five domains use different formalisms, different evaluation methodologies, and different operationalisations of "structural defect." The claim is that a common pattern is visible across them, not that they are instances of a single proven theorem.

The primary falsification path is empirical: if a tool or monitoring regime can be shown to simultaneously maximise performance on all quality dimensions without explicit cross-axis measurement, the thesis fails. Current evidence suggests this has not been demonstrated for any of the reviewed systems. Secondary falsification requires showing, in a controlled regression study, that structural monitors add no predictive power beyond task-level monitors. The paper concludes with a proposal for a maturity-staged, multi-axis evaluation discipline applicable across these domains.

Authorship: Saluca Agentic AI Research Team (Saluca LLC). AI-drafted from an arXiv preprint corpus on the date given in the filename.

Cited arXiv preprints: 2605.27328, 2605.27332, 2605.29490, 2605.31520, 2606.01490, 2606.02442, 2606.02494

Notes

This paper was AI-drafted by an internal multi-persona research agent over a curated arXiv corpus. It is not peer-reviewed. All cited works are listed by arXiv ID; readers should follow those links to verify claims against the primary preprints.

Files

20260602_speedy_structural-defects-dominate-task-errors-software-pipelines_v2.pdf