There is a newer version of the record available.

Published March 17, 2026 | Version 1.0
Publication Open

Generative Specification: A Pragmatic Programming Paradigm for the Stateless Reader

Description

The ladder has been moving in one direction since the first compiler freed the engineer from machine code. Each step produced a more capable reader. Each more capable reader demanded a richer specification.

  The dominant failure mode of AI-assisted software development is not incorrect code — it is architectural drift: structurally incoherent output produced at generation speed across sessions that share no persistent context. Each AI session starts stateless. Without an explicit, self-contained specification, intent degrades with every context boundary. This paper addresses that failure mode directly.

  Generative Specification (GS) is the first programming discipline of the pragmatic dimension: the tier at which derivability — what a stateless reader can correctly determine from the artifacts alone — becomes a binding constraint. Where Robert C. Martin's paradigm sequence (structured, object-oriented, functional) constrains syntactic form, and the semantic disciplines (SOLID, TDD, DDD) constrain meaning for a contextual reader, GS constrains what can be derived by a reader carrying no accumulated context. That reader now exists at scale: a large language model that approximates, by structural analogy, what Chomsky classifies as context-sensitive reading — output whose meaning depends on surrounding context in ways no context-free parser can achieve — stateless by architecture, reading the specification as the only available instrument. Paradigm carries Martin's sense throughout: a discipline defined by what it removes from programmer freedom, not Kuhn's sense of a scientific revolution. Whether GS constitutes the latter is a determination for the community; the claim this paper advances is narrower — that GS occupies the pragmatic tier left vacant by prior disciplines — and is answerable by structural inspection. The seven properties that define a generative specification — Self-describing, Bounded, Verifiable, Defended, Auditable, Composable, and Executable — operationalize this constraint as a measurable artifact standard. A precisely stated use case simultaneously seeds an implementation contract, an acceptance test, and user documentation: three artifacts from one production rule, with test difficulty serving as the diagnostic for underspecification.
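
  The "three artifacts from one production rule" idea can be sketched in a few lines of Python. This is an illustrative assumption, not the paper's artifact grammar: the `UseCase` record, its field names, the `derive_*` helpers, and the `register_user` example are all hypothetical. The sketch only shows how one precisely stated use case could seed a contract stub, an acceptance-test statement, and a documentation sentence, and how a blank field surfaces as underspecification at the point where the test cannot be written.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UseCase:
    """Hypothetical minimal use-case record; field names are illustrative only."""
    name: str    # identifier reused by every derived artifact
    actor: str   # who performs the action
    given: str   # precondition
    when: str    # triggering action
    then: str    # observable outcome


def derive_contract(uc: UseCase) -> str:
    """Implementation contract: a stub whose docstring carries pre/postcondition."""
    return (
        f"def {uc.name}(request):\n"
        f'    """Requires: {uc.given}. Ensures: {uc.then}."""\n'
        f"    raise NotImplementedError\n"
    )


def derive_acceptance_test(uc: UseCase) -> str:
    """Acceptance test, stated as a given/when/then sentence."""
    return f"test_{uc.name}: given {uc.given}, when {uc.when}, then {uc.then}"


def derive_user_doc(uc: UseCase) -> str:
    """User documentation: one sentence built from the same fields."""
    return f"As {uc.actor}, when {uc.when}, {uc.then}."


def underspecified_fields(uc: UseCase) -> List[str]:
    """Blank fields mark the places where a test could not be written."""
    return [field for field, value in vars(uc).items() if not value.strip()]


# Hypothetical example use case (not taken from the paper's repositories).
register = UseCase(
    name="register_user",
    actor="a visitor",
    given="the email is not already registered",
    when="they submit a valid email and password",
    then="an account is created",
)

if __name__ == "__main__":
    print(derive_contract(register))
    print(derive_acceptance_test(register))
    print(derive_user_doc(register))
```

  A use case with an empty `given` or `then` still yields a contract stub, but `underspecified_fields` flags the gap, which loosely mirrors the abstract's claim that test difficulty is the diagnostic for underspecification.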

  Empirical evidence across six production projects demonstrates consistent structural outcomes. SafetyCorePro was a brownfield takeover of 411 source files with zero unit tests, transformed in under 48 hours of active development: 174 changed files, 16,229 lines inserted, and 484 tests covering a fully layered architecture. ForgeCraft is a developer tooling system built under the methodology it implements: 40 commits, 307 tests, and a complete package rename executed in one commit with zero regressions. The six cases span five challenge categories (takeover, brownfield, greenfield, extension, migration); all were executed by one engineer with AI assistance, and all are verifiable from public or reviewer-accessible repository history.

  The paper defines the theoretical principle, presents the artifact grammar through which GS operates in practice, documents the empirical record, and presents one completed replication study (Rx, evidence committed) and one practitioner study scheduled for April 2026 (Dx). The multi-agent adversarial study (Ax), with seven conditions (three pre-registered, four post-hoc) and results incorporated below, tests derivation quality as a function of specification completeness under controlled conditions. The Replication Experiment (Rx) independently derived and executed a scoped subset of the benchmark domain (user management, articles, profiles, and tags; comments and favourites are explicitly out of scope per the Rx specification) using a fresh GS document, producing 104 passing tests and zero failures across seven test suites against a live PostgreSQL instance; the evidence is committed to the repository at experiments/rx/ and is reproducible by any reader with an Anthropic API key. The human practitioner experiment (Dx), scheduled for April 2026 (40 developers, dual rubric), will test whether the methodology transfers to engineers other than its author; its design is stated in §7.7.A. The adversarial study provides the controlled condition the single-author practitioner cases cannot supply; the practitioner study will establish between-practitioner replication when completed. The paradigm characterization is advanced as a theoretical claim for community evaluation; both studies are designed to test it against its most exposed flanks.

  A structural corollary, the community convergence theorem, establishes that when a practitioner community contributes to a shared GS methodology under quality gates, the specification floor across all governed domains rises monotonically and, by construction, cannot retreat while quality gates hold, compounding across domains without conflict. The structural argument, grounded in five architectural properties of the methodology, is developed in §10; its implication extends beyond software to any domain in which human work can be reduced to organizable intent.
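
  The "cannot retreat while quality gates hold" property can be pictured as a ratchet. The toy model below is an assumption made purely for illustration: it represents each domain's specification floor as an integer count of passed checks and models the quality gate as a rule that rejects any contribution scoring below the current floor. The class, method, and domain names are hypothetical, and the sketch does not model the five architectural properties on which the paper's argument in §10 rests.

```python
from typing import Dict, List, Tuple


class SharedMethodology:
    """Toy model: per-domain floors guarded by a quality gate (hypothetical)."""

    def __init__(self) -> None:
        self.floor: Dict[str, int] = {}

    def contribute(self, domain: str, checks_passed: int) -> bool:
        """Accept a contribution only if it does not lower the domain's floor."""
        current = self.floor.get(domain, 0)
        if checks_passed < current:
            return False  # quality gate: regressions are rejected
        self.floor[domain] = checks_passed  # other domains are left untouched
        return True


def replay(contributions: List[Tuple[str, int]]) -> List[Dict[str, int]]:
    """Apply contributions in order and record a snapshot of every floor."""
    spec = SharedMethodology()
    history: List[Dict[str, int]] = []
    for domain, score in contributions:
        spec.contribute(domain, score)
        history.append(dict(spec.floor))
    return history


if __name__ == "__main__":
    # The third contribution (api, 1) falls below the api floor and is rejected.
    for snapshot in replay([("api", 3), ("cli", 2), ("api", 1), ("api", 5)]):
        print(snapshot)
```

  Under these assumptions each domain's floor is non-decreasing by construction, and a contribution to one domain never changes another domain's floor. Remove the gate check and the guarantee disappears, which is the sense in which the property holds only while the gates do.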

Files

GenerativeSpecification_PractitionerProtocol.md

Files (1.6 MB)

md5:b1a9f622e54c314ae5d4b6aabfb1e76e (92.0 kB)
md5:27467bae7f46297d296b582f3ee42503 (1.5 MB)
md5:989881a8f6cafe05db6cd8b47763da7d (41.2 kB)
md5:aad9279f1f175d1ddbff8535ee24d62c (1.0 kB)

Additional details

Related works

Software