Published March 16, 2025 | Version v3

AXIOM: Adaptive eXecution with Intelligent Operations Memory — A Sparse Dynamic Routing Architecture for Cost-Efficient LLM Inference

Description

We present AXIOM, a lightweight machine learning framework, implemented in pure Rust, for training and deploying small transformer-based text classifiers. Its primary architectural contribution is a sparse computation graph with four distinct traversal directions (forward, lateral, feedback, and temporal), which enables non-local communication between classification nodes. All of the 75+ existing text classifiers and LLM routers we surveyed make single-pass decisions; AXIOM nodes instead exchange information and form dynamic coalitions before committing to a classification. The framework has no external ML framework dependencies, achieves microsecond-latency inference on CPU, and provides a complete training pipeline, including backpropagation, Adam/AdamW optimisation, and JSON weight serialisation. AXIOM combines a sparse-graph structural encoder (128 dimensions, 1.2M parameters) with a trainable semantic encoder (2-layer transformer, 128 dimensions, 4 attention heads, FFN dimension 512, 37K parameters) through an always-fuse classification architecture. As a demonstration, we apply AXIOM to LLM query complexity routing, where it achieves 94.8% classification accuracy on 1,000 diverse queries with a mean inference latency of 90 microseconds. On the RouterBench benchmark (36,511 queries, 11 models) we report a 31.6% cost reduction, and we identify a key finding: linguistic complexity classification and cost-optimal model selection are fundamentally different objectives. The framework trains in four minutes on a laptop CPU and compiles to a single binary suitable for on-device, edge, and embedded deployment. Code is available at https://github.com/olliverc1985/AXIOM.
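To make the idea of a sparse graph with four traversal directions more concrete, the following is a minimal Rust sketch using only the standard library. It is not taken from the AXIOM code base: the names `Direction`, `SparseGraph`, `exchange`, and the blending weight `mix` are hypothetical, and the averaging rule is an assumed stand-in for whatever information exchange the framework actually performs between nodes.

```rust
use std::collections::HashMap;

/// The four traversal directions named in the description.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Direction {
    Forward,
    Lateral,
    Feedback,
    Temporal,
}

/// Sparse graph stored as adjacency lists keyed by (node, direction).
/// Hypothetical layout chosen for illustration only.
#[derive(Default)]
pub struct SparseGraph {
    edges: HashMap<(usize, Direction), Vec<usize>>,
}

impl SparseGraph {
    /// Adds a directed edge `from -> to` along the given direction.
    pub fn add_edge(&mut self, from: usize, dir: Direction, to: usize) {
        self.edges.entry((from, dir)).or_default().push(to);
    }

    /// Returns the neighbours of `node` along `dir` (empty if none).
    pub fn neighbours(&self, node: usize, dir: Direction) -> &[usize] {
        self.edges
            .get(&(node, dir))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

/// One assumed exchange round: each node blends its own value with the mean
/// of its neighbours' values along `dir`. `mix` in [0, 1] is a hypothetical
/// blending weight. Neighbour indices must be valid indices into `values`.
pub fn exchange(graph: &SparseGraph, values: &[f32], dir: Direction, mix: f32) -> Vec<f32> {
    values
        .iter()
        .enumerate()
        .map(|(node, &own)| {
            let nbrs = graph.neighbours(node, dir);
            if nbrs.is_empty() {
                return own; // no neighbours along this direction: keep own value
            }
            let mean = nbrs.iter().map(|&n| values[n]).sum::<f32>() / nbrs.len() as f32;
            (1.0 - mix) * own + mix * mean
        })
        .collect()
}
```

In this sketch, a single-pass classifier would correspond to reading `values` directly, whereas calling `exchange` one or more times before taking a decision lets a node's output depend on nodes it is not connected to in the forward direction.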

Files

md5:95e2ec33e2464ec619e7aed49444cede (17.1 kB)