
Published December 24, 2025 | Version 1.6
Preprint | Open Access

You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference

  • 1. Anima Core Inc.
  • 2. Shamim Institute of Soul Systems

Description

This paper introduces Meaning-First Execution (MFEE), a control-layer execution framework that reduces unnecessary transformer inference by deciding when high-capacity language models must be invoked and when they can be safely avoided without changing outputs.

MFEE operates as a meaning-gated execution layer placed upstream of a transformer renderer. Under a strict execution contract, MFEE routes each request into one of four actions: direct response, no-op, abstention, or full transformer rendering. Crucially, whenever the transformer is invoked, MFEE enforces exact output equivalence to the baseline transformer configuration under deterministic decoding.
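The four-way routing contract described above can be sketched as a minimal dispatcher. The gating rules below are illustrative placeholders only; the actual MFEE gate and its decision criteria are defined in the paper, not here.

```python
from enum import Enum, auto

class Action(Enum):
    DIRECT_RESPONSE = auto()  # answer from a cheap deterministic path
    NO_OP = auto()            # request needs no output at all
    ABSTAIN = auto()          # decline to answer (e.g. safety-sensitive)
    RENDER = auto()           # invoke the full transformer

def route(request: str) -> Action:
    """Hypothetical gate standing in for the MFEE execution contract.
    Only requests that reach Action.RENDER touch the transformer, and for
    those the contract requires byte-exact equivalence to the baseline."""
    text = request.strip().lower()
    if not text:
        return Action.NO_OP
    if text in {"hi", "hello", "thanks"}:
        return Action.DIRECT_RESPONSE
    if "disallowed request" in text:
        return Action.ABSTAIN
    return Action.RENDER
```

Because the contract only constrains *when* the transformer runs, not *how*, any router satisfying it leaves rendered outputs unchanged.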

We present a reproducible evaluation harness demonstrating that, on a 1,000-request replay set spanning factual, conversational, creative, and safety-sensitive prompts, MFEE avoids transformer invocation 75.1% of the time while maintaining a 100.0% exact-match rate on all requests that are routed to the transformer (N = 249). All observed performance, energy, and cost improvements arise exclusively from avoided execution rather than accelerated generation.
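As a sanity check on the headline numbers, the request count and avoidance rate from the abstract imply the reported N directly; the per-request cost symbol here is illustrative, not from the paper.

```python
total = 1000           # replay-set size reported in the abstract
avoid_rate = 0.751     # fraction of requests never sent to the transformer

# Requests actually routed to the transformer:
rendered = round(total * (1 - avoid_rate))  # matches the reported N = 249

# If each transformer call costs c, expected cost per request falls to
# (1 - 0.751) * c, i.e. roughly a 4x reduction, assuming the gating
# layer's own overhead is negligible (an assumption, not a claim here).
print(rendered)
```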

The paper details:

  • The MFEE execution contract and architectural design
  • A replay-based equivalence validation methodology
  • MLPerf-style measurement protocols
  • Derived latency, energy, and cost implications at production scale
  • A black-box evaluation framework enabling third-party validation on custom workloads
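The replay-based equivalence methodology listed above can be approximated in a few lines. All names below (`mfee_system`, `baseline_model`) are hypothetical stand-ins for the harness interfaces, which the paper defines.

```python
def validate_equivalence(requests, baseline_model, mfee_system):
    """Replay each request through both systems. Whenever MFEE routes a
    request to the transformer ("render"), its output must byte-match the
    baseline transformer under deterministic (e.g. greedy) decoding.
    Returns (rendered_count, exact_match_rate)."""
    rendered = exact = 0
    for req in requests:
        action, output = mfee_system(req)
        if action == "render":
            rendered += 1
            if output == baseline_model(req):
                exact += 1
    return rendered, (exact / rendered) if rendered else 1.0
```

A 100.0% exact-match rate on this check is what licenses attributing all savings to avoided execution rather than altered generation.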

MFEE reframes inference optimization as a control problem rather than a model problem, demonstrating that large-scale transformer inference is structurally over-invoked in realistic production workloads. The framework is model-agnostic and does not assume any specific internal representation, enabling application beyond the specific implementation evaluated.

This work is intended for systems researchers, infrastructure engineers, and practitioners operating large-scale AI deployments who seek to reduce inference cost and latency without degrading output quality.

Files

MFEE_v1.6.pdf (456.0 kB)
md5:32d209f1c9c4d4e3d5b78265450f4ab7

Additional details

Related works

Continues
Preprint: 10.5281/zenodo.17873275 (DOI)
Is supplement to
Preprint: 10.5281/zenodo.17973641 (DOI)

Dates

Submitted
2025-12-24

Software

Repository URL
https://github.com/Anima-Core/meaning-first-execution
Programming language
Python
Development Status
Active