You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference
Creators
- Anima Core Inc.
- Shamim Institute of Soul Systems
Description
This paper introduces Meaning-First Execution (MFEE), a control-layer execution framework that reduces unnecessary transformer inference by deciding when high-capacity language models must be invoked and when they can be safely avoided without changing outputs.
MFEE operates as a meaning-gated execution layer placed upstream of a transformer renderer. Under a strict execution contract, MFEE routes each request into one of four actions: direct response, no-op, abstention, or full transformer rendering. Crucially, whenever the transformer is invoked, MFEE enforces exact output equivalence to the baseline transformer configuration under deterministic decoding.
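The four-way execution contract described above can be sketched as a small router. This is a minimal illustration, not the paper's implementation: the names (`Action`, `Decision`, `execute`, `gate`, `renderer`) and the gate's return shape are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional, Tuple

class Action(Enum):
    DIRECT_RESPONSE = auto()  # answered by the control layer itself
    NO_OP = auto()            # nothing needs to be emitted
    ABSTAIN = auto()          # declined (e.g. safety-sensitive)
    RENDER = auto()           # full transformer invocation

@dataclass
class Decision:
    action: Action
    output: Optional[str] = None  # populated for DIRECT_RESPONSE / RENDER

def execute(
    request: str,
    gate: Callable[[str], Tuple[Action, Optional[str]]],
    renderer: Callable[[str], str],
) -> Decision:
    """Route one request under the MFEE contract: the transformer runs
    only when the gate chooses RENDER, and in that case its output must
    equal the baseline's (deterministic decoding is assumed)."""
    action, payload = gate(request)
    if action is Action.RENDER:
        return Decision(action, renderer(request))
    if action is Action.DIRECT_RESPONSE:
        return Decision(action, payload)
    return Decision(action)  # NO_OP and ABSTAIN produce no text
```

Under this contract, every saving comes from the three non-RENDER branches; the RENDER branch is bit-for-bit the baseline path.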
We present a reproducible evaluation harness demonstrating that, on a 1,000-request replay set spanning factual, conversational, creative, and safety-sensitive prompts, MFEE avoids transformer invocation 75.1% of the time while maintaining a 100.0% exact-match rate on all requests that are routed to the transformer (N = 249). All observed performance, energy, and cost improvements arise exclusively from avoided execution rather than accelerated generation.
The paper details:
- The MFEE execution contract and architectural design
- A replay-based equivalence validation methodology
- MLPerf-style measurement protocols
- Derived latency, energy, and cost implications at production scale
- A black-box evaluation framework enabling third-party validation on custom workloads
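The replay-based equivalence methodology listed above can be illustrated with a short harness: replay a fixed request set, count how often the transformer is avoided, and exact-match every rendered output against the unmodified baseline. All function names and the gate predicate here are hypothetical, assuming only deterministic decoding on both paths.

```python
from typing import Callable, Iterable, Tuple

def replay_equivalence(
    requests: Iterable[str],
    should_render: Callable[[str], bool],  # hypothetical MFEE gate predicate
    renderer: Callable[[str], str],        # transformer behind the MFEE layer
    baseline: Callable[[str], str],        # unmodified transformer, same decoding config
) -> Tuple[float, float]:
    """Return (avoidance_rate, exact_match_rate) over a replayed workload.

    avoidance_rate: fraction of requests that never reach the transformer.
    exact_match_rate: of the rendered requests, the fraction whose output
    is byte-identical to the baseline transformer's output.
    """
    total = rendered = matched = 0
    for req in requests:
        total += 1
        if should_render(req):
            rendered += 1
            if renderer(req) == baseline(req):  # deterministic decoding assumed
                matched += 1
    avoidance = 1.0 - rendered / total if total else 0.0
    exact_match = matched / rendered if rendered else 1.0
    return avoidance, exact_match
```

On the paper's 1,000-request replay set this protocol yields 75.1% avoidance with 100.0% exact match on the 249 rendered requests; the sketch shows only the shape of the check, not the published harness.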
MFEE reframes inference optimization as a control problem rather than a model problem, demonstrating that large-scale transformer inference is structurally over-invoked in realistic production workloads. The framework is model-agnostic and does not assume any specific internal representation, enabling application beyond the specific implementation evaluated.
This work is intended for systems researchers, infrastructure engineers, and practitioners operating large-scale AI deployments who seek to reduce inference cost and latency without degrading output quality.
Files
- MFEE_v1.6.pdf (456.0 kB)
- md5:32d209f1c9c4d4e3d5b78265450f4ab7
Additional details
Related works
- Continues: Preprint 10.5281/zenodo.17873275 (DOI)
- Is supplement to: Preprint 10.5281/zenodo.17973641 (DOI)
Dates
- Submitted: 2025-12-24
Software
- Repository URL: https://github.com/Anima-Core/meaning-first-execution
- Programming language: Python
- Development status: Active