
Published December 24, 2025 | Version 1.6
Preprint | Open Access

You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference

  • 1. Anima Core Inc.
  • 2. Shamim Institute of Soul Systems

Description

This paper introduces Meaning-First Execution (MFEE), a control-layer execution framework that reduces unnecessary transformer inference by deciding when high-capacity language models must be invoked and when they can be safely avoided without changing outputs.

MFEE operates as a meaning-gated execution layer placed upstream of a transformer renderer. Under a strict execution contract, MFEE routes each request into one of four actions: direct response, no-op, abstention, or full transformer rendering. Crucially, whenever the transformer is invoked, MFEE enforces exact output equivalence to the baseline transformer configuration under deterministic decoding.
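The four-way routing contract described above can be sketched as a minimal dispatcher. The gating rules below are illustrative placeholders only; the actual MFEE gate and its decision criteria are defined in the paper, not here.

```python
from enum import Enum, auto

class Action(Enum):
    DIRECT_RESPONSE = auto()  # answer from a cheap deterministic path
    NO_OP = auto()            # request needs no output at all
    ABSTAIN = auto()          # decline to answer (e.g. safety-sensitive)
    RENDER = auto()           # invoke the full transformer

def route(request: str) -> Action:
    """Hypothetical gate standing in for the MFEE execution contract.
    Only requests that reach Action.RENDER touch the transformer, and for
    those the contract requires byte-exact equivalence to the baseline."""
    text = request.strip().lower()
    if not text:
        return Action.NO_OP
    if text in {"hi", "hello", "thanks"}:
        return Action.DIRECT_RESPONSE
    if "disallowed request" in text:
        return Action.ABSTAIN
    return Action.RENDER
```

Because the contract only constrains *when* the transformer runs, not *how*, any router satisfying it leaves rendered outputs unchanged.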

We present a reproducible evaluation harness demonstrating that, on a 1,000-request replay set spanning factual, conversational, creative, and safety-sensitive prompts, MFEE avoids transformer invocation 75.1% of the time while maintaining a 100.0% exact-match rate on all requests that are routed to the transformer (N = 249). All observed performance, energy, and cost improvements arise exclusively from avoided execution rather than accelerated generation.
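As a sanity check on the headline numbers, the request count and avoidance rate from the abstract imply the reported N directly; the per-request cost symbol here is illustrative, not from the paper.

```python
total = 1000           # replay-set size reported in the abstract
avoid_rate = 0.751     # fraction of requests never sent to the transformer

# Requests actually routed to the transformer:
rendered = round(total * (1 - avoid_rate))  # matches the reported N = 249

# If each transformer call costs c, expected cost per request falls to
# (1 - 0.751) * c, i.e. roughly a 4x reduction, assuming the gating
# layer's own overhead is negligible (an assumption, not a claim here).
print(rendered)
```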

The paper details:

  • The MFEE execution contract and architectural design
  • A replay-based equivalence validation methodology
  • MLPerf-style measurement protocols
  • Derived latency, energy, and cost implications at production scale
  • A black-box evaluation framework enabling third-party validation on custom workloads
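The replay-based equivalence methodology listed above can be approximated in a few lines. All names below (`mfee_system`, `baseline_model`) are hypothetical stand-ins for the harness interfaces, which the paper defines.

```python
def validate_equivalence(requests, baseline_model, mfee_system):
    """Replay each request through both systems. Whenever MFEE routes a
    request to the transformer ("render"), its output must byte-match the
    baseline transformer under deterministic (e.g. greedy) decoding.
    Returns (rendered_count, exact_match_rate)."""
    rendered = exact = 0
    for req in requests:
        action, output = mfee_system(req)
        if action == "render":
            rendered += 1
            if output == baseline_model(req):
                exact += 1
    return rendered, (exact / rendered) if rendered else 1.0
```

A 100.0% exact-match rate on this check is what licenses attributing all savings to avoided execution rather than altered generation.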

MFEE reframes inference optimization as a control problem rather than a model problem, demonstrating that large-scale transformer inference is structurally over-invoked in realistic production workloads. The framework is model-agnostic and does not assume any specific internal representation, enabling application beyond the specific implementation evaluated.

This work is intended for systems researchers, infrastructure engineers, and practitioners operating large-scale AI deployments who seek to reduce inference cost and latency without degrading output quality.

Files

MFEE_v1.6.pdf (456.0 kB)
md5:32d209f1c9c4d4e3d5b78265450f4ab7

Additional details

Related works

Continues
Preprint: 10.5281/zenodo.17873275 (DOI)
Is supplement to
Preprint: 10.5281/zenodo.17973641 (DOI)

Dates

Submitted
2025-12-24

Software

Repository URL
https://github.com/Anima-Core/meaning-first-execution
Programming language
Python
Development Status
Active