Carbon-Aware Inference Routing for Large Language Models: A Real-Time Framework for Sustainable AI Serving
Authors/Creators
- 1. Independent Researcher, Vancouver, BC, Canada
Description
This paper introduces CAIR, a real-time carbon-aware inference routing framework for large language models (LLMs). CAIR routes each inference request to the smallest model capable of satisfying its accuracy floor and latency SLA, using three concurrent signals: a per-prompt complexity score, live grid carbon intensity, and a time-bounded carbon budget.
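To make the routing rule concrete, here is a minimal sketch of "pick the smallest model that satisfies the request's constraints". It is an illustration under stated assumptions, not the CAIR implementation: the tier names, the `max_complexity` thresholds (a stand-in for the accuracy floor), the latency and energy figures, and the `route` function are all hypothetical, and the description does not specify how CAIR actually combines the three signals. In this sketch, grid carbon intensity is used only to estimate per-request emissions, and the carbon budget appears as a cap on the largest permitted tier.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_complexity: float          # assumed proxy for "meets the accuracy floor"
    p95_latency_ms: float          # assumed serving latency for this tier
    energy_kwh_per_request: float  # assumed average energy per request


# Hypothetical tiers, ordered smallest first. All numbers are placeholders.
TIERS: List[ModelTier] = [
    ModelTier("small-7b", 0.35, 120.0, 0.0002),
    ModelTier("medium", 0.70, 300.0, 0.0010),
    ModelTier("large", 1.00, 800.0, 0.0040),
]


def route(
    complexity: float,
    latency_sla_ms: float,
    grid_gco2_per_kwh: float,
    max_tier_index: int = len(TIERS) - 1,
) -> Tuple[ModelTier, float]:
    """Return (chosen tier, estimated gCO2 for this request).

    Walks the permitted tiers from smallest to largest and returns the
    first one that handles the prompt's complexity within the latency SLA.
    If none qualifies, falls back to the largest permitted tier (an
    assumed fallback policy, chosen here for illustration).
    """
    permitted = TIERS[: max_tier_index + 1]
    chosen = permitted[-1]  # fallback: largest tier the budget allows
    for tier in permitted:
        if complexity <= tier.max_complexity and tier.p95_latency_ms <= latency_sla_ms:
            chosen = tier
            break
    estimated_gco2 = chosen.energy_kwh_per_request * grid_gco2_per_kwh
    return chosen, estimated_gco2


if __name__ == "__main__":
    tier, grams = route(complexity=0.2, latency_sla_ms=500.0, grid_gco2_per_kwh=400.0)
    print(tier.name, round(grams, 4))
```

Because the tiers are scanned in ascending size, the first match is by construction the smallest qualifying model; passing a lower `max_tier_index` shows how a budget layer could restrict the choice without changing the per-request logic.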
The carbon budget layer tracks cumulative emissions against a configurable period cap (daily or monthly) and progressively constrains the available model tier as the budget depletes, enabling carbon governance at the inference layer rather than optimising only per request. Per-request signals then optimise within whatever tier the budget permits.
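The budget behaviour described above can be sketched as a small state machine that maps cumulative emissions to a maximum permitted tier. This is a hedged illustration only: the description names a CRITICAL state, but the `NORMAL` and `CONSTRAINED` state names, the 70% and 90% thresholds, the three-tier indexing, and the `CarbonBudget` class are assumptions introduced here.

```python
from enum import Enum


class BudgetState(Enum):
    NORMAL = "NORMAL"            # assumed name
    CONSTRAINED = "CONSTRAINED"  # assumed name
    CRITICAL = "CRITICAL"        # state named in the description


class CarbonBudget:
    """Tracks cumulative emissions against a period cap (e.g. daily)."""

    # Assumed thresholds, as fractions of the cap already consumed.
    CONSTRAINED_AT = 0.70
    CRITICAL_AT = 0.90

    # Assumed mapping from state to the largest tier index allowed,
    # where 0 is the smallest model and 2 is the largest.
    MAX_TIER = {
        BudgetState.NORMAL: 2,
        BudgetState.CONSTRAINED: 1,
        BudgetState.CRITICAL: 0,
    }

    def __init__(self, cap_gco2: float) -> None:
        if cap_gco2 <= 0:
            raise ValueError("cap_gco2 must be positive")
        self.cap_gco2 = cap_gco2
        self.used_gco2 = 0.0

    def record(self, grams: float) -> None:
        """Add the emissions of one served request to the running total."""
        self.used_gco2 += grams

    def reset(self) -> None:
        """Start a new budget period (called daily or monthly)."""
        self.used_gco2 = 0.0

    def state(self) -> BudgetState:
        used_fraction = self.used_gco2 / self.cap_gco2
        if used_fraction >= self.CRITICAL_AT:
            return BudgetState.CRITICAL
        if used_fraction >= self.CONSTRAINED_AT:
            return BudgetState.CONSTRAINED
        return BudgetState.NORMAL

    def max_tier_index(self) -> int:
        """Largest tier index the per-request router may choose right now."""
        return self.MAX_TIER[self.state()]


if __name__ == "__main__":
    budget = CarbonBudget(cap_gco2=1000.0)
    budget.record(750.0)
    print(budget.state().value, budget.max_tier_index())
```

Keeping the budget as a separate object that only exposes a tier ceiling mirrors the layering in the description: the budget decides which tiers are available, and per-request routing chooses among them.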
Preliminary analysis on a 1M-prompt/day workload suggests a ~62% reduction in inference carbon from routing approximately 65% of requests to a 7B-parameter model. The framework is serving-layer-agnostic and integrates with existing LLM deployment infrastructure (vLLM, Ollama). The audit logger is designed for direct use in CSRD ESRS E1 and EU AI Act Art. 53 compliance reporting. A Phase 1 empirical evaluation on 50 human-labelled tasks reports routing precision of 100% on simple tasks, 90% on medium tasks, and 100% on complex tasks; a 45.5% carbon reduction vs an always-large baseline; a P95 routing overhead of 0.27 ms; and 100% fallback reliability. The budget enforcement layer independently reduces carbon by 92.5% in the CRITICAL state vs uncapped routing.
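As a rough sanity check on the preliminary estimate, the arithmetic below shows what per-request carbon ratio the ~62% and ~65% figures would imply. It rests on a simplifying assumption that is not stated in the description: a two-tier workload in which every request not routed to the 7B model is served by the always-large baseline at unchanged cost. The function names are hypothetical.

```python
def total_reduction(fraction_small: float, small_to_large_ratio: float) -> float:
    """Fractional carbon saved vs. an always-large baseline.

    Assumes two tiers only: a fraction `fraction_small` of requests costs
    `small_to_large_ratio` times the large model's carbon, and the rest
    cost the same as the baseline.
    """
    return fraction_small * (1.0 - small_to_large_ratio)


def implied_small_to_large_ratio(fraction_small: float, reduction: float) -> float:
    """Invert total_reduction to recover the implied per-request ratio."""
    return 1.0 - reduction / fraction_small


if __name__ == "__main__":
    # Figures quoted in the description: ~65% routed small, ~62% total reduction.
    ratio = implied_small_to_large_ratio(0.65, 0.62)
    print(f"{ratio:.3f}")  # prints 0.046
```

Under this two-tier assumption, the quoted figures would correspond to the 7B model emitting roughly 5% of the large model's carbon per request. The real workload analysis may use more tiers or different accounting, so this is an illustrative reading rather than a reported result.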
Framework repository: https://github.com/pretzelslab/sa1-carbon-inference-router
Files
| Name | Size |
|---|---|
| CAIR_CarbonRouter_Preprint.pdf (md5:c881cc95807d72c99a60df90d298ff16) | 186.9 kB |
Additional details
Dates
- Submitted: 2026-04-30
Software
- Repository URL: https://github.com/pretzelslab/sa1-carbon-inference-router
- Programming language: Python
- Development Status: Concept