AXIOM: Adaptive eXecution with Intelligent Operations Memory — A Sparse Dynamic Routing Architecture for Cost-Efficient LLM Inference
Authors/Creators
Description
We present AXIOM, a lightweight machine learning framework implemented in pure Rust for training and deploying small transformer-based text classifiers. Its primary architectural contribution is a sparse computation graph with four distinct traversal directions (forward, lateral, feedback, and temporal), which enables non-local communication between classification nodes. The 75+ existing text classifiers and LLM routers we surveyed all make single-pass decisions; AXIOM nodes instead exchange information and form dynamic coalitions before committing to a classification. The framework has no external ML framework dependencies, achieves microsecond-latency inference on CPU, and provides a complete training pipeline including backpropagation, Adam/AdamW optimisation, and JSON weight serialisation. AXIOM combines the sparse-graph structural encoder (128 dimensions, 1.2M parameters) with a trainable semantic encoder (2-layer transformer, 128 dimensions, 4 attention heads, 512 FFN, 37K parameters) through an always-fuse classification architecture. As a demonstration, we apply AXIOM to LLM query complexity routing, achieving 94.8% classification accuracy across 1,000 diverse queries with a mean inference latency of 90 microseconds. On the RouterBench benchmark (36,511 queries, 11 models), we report a 31.6% cost reduction and a key finding: linguistic complexity classification and cost-optimal model selection are fundamentally different objectives. The framework trains in four minutes on a laptop CPU and compiles to a single binary suitable for on-device, edge, and embedded deployment. Code is available at https://github.com/olliverc1985/AXIOM.
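To make the idea of a sparse graph with typed traversal directions more concrete, the following is a minimal illustrative sketch in Rust using only the standard library. It is not taken from the AXIOM repository: the names `Direction`, `SparseGraph`, `exchange`, and `commit`, the mean-mixing update, and the sum-then-argmax decision rule are all assumptions chosen for illustration, and the actual implementation may differ substantially.

```rust
use std::collections::HashMap;

/// The four traversal directions named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Direction {
    Forward,
    Lateral,
    Feedback,
    Temporal,
}

/// Hypothetical sparse graph: only edges that actually exist are stored,
/// keyed by (source node, direction).
#[derive(Default)]
pub struct SparseGraph {
    edges: HashMap<(usize, Direction), Vec<usize>>,
}

impl SparseGraph {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add_edge(&mut self, from: usize, dir: Direction, to: usize) {
        self.edges.entry((from, dir)).or_default().push(to);
    }

    /// Neighbours reachable from `from` along `dir` (empty if none).
    pub fn neighbours(&self, from: usize, dir: Direction) -> &[usize] {
        self.edges
            .get(&(from, dir))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

/// One exchange round (assumed update rule): each node mixes its own class
/// scores with the mean scores of its neighbours along `dir`.
/// Nodes without neighbours in that direction keep their state unchanged.
pub fn exchange(
    graph: &SparseGraph,
    states: &[Vec<f32>],
    dir: Direction,
    mix: f32,
) -> Vec<Vec<f32>> {
    states
        .iter()
        .enumerate()
        .map(|(i, own)| {
            let nbrs = graph.neighbours(i, dir);
            if nbrs.is_empty() {
                return own.clone();
            }
            own.iter()
                .enumerate()
                .map(|(k, &v)| {
                    let mean =
                        nbrs.iter().map(|&j| states[j][k]).sum::<f32>() / nbrs.len() as f32;
                    (1.0 - mix) * v + mix * mean
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Commit to a class only after the exchange: sum the per-node scores and
/// return the index of the largest total (None if there are no nodes).
pub fn commit(states: &[Vec<f32>]) -> Option<usize> {
    let classes = states.first()?.len();
    let mut totals = vec![0.0f32; classes];
    for s in states {
        for (t, &v) in totals.iter_mut().zip(s) {
            *t += v;
        }
    }
    totals
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

In this sketch, a caller would build the graph with `add_edge`, run one or more `exchange` rounds along whichever directions are wired up, and call `commit` only afterwards. That ordering is what separates this style of classifier from a single-pass decision, although the real coalition mechanism is presumably richer than a simple mean.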