Published May 1, 2026 | Version 0.1
Preprint | Open access

Carbon-Aware Inference Routing for Large Language Models: A Real-Time Framework for Sustainable AI Serving

  • 1. Independent Researcher, Vancouver, BC, Canada

Description

This paper introduces CAIR, a real-time carbon-aware inference routing framework for large language models (LLMs). CAIR routes each inference request to the smallest model capable of satisfying its accuracy floor and latency SLA, using three concurrent signals: a per-prompt complexity score, live grid carbon intensity, and a time-bounded carbon budget.
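A minimal Python sketch of the per-request selection rule, assuming model tiers are ordered from smallest to largest and that each tier can be summarised by a hypothetical capability score (the highest complexity it serves at the accuracy floor), a measured P95 latency and an energy-per-request figure. `ModelTier`, `RouteDecision`, `route` and all field names are illustrative, not CAIR's API. In this sketch the live grid intensity only converts the chosen tier's energy into an emissions estimate for budget accounting; how CAIR weighs intensity in the choice itself is not specified here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: float      # hypothetical: max complexity score (0-1) served at the accuracy floor
    p95_latency_ms: float  # hypothetical: measured P95 latency for this tier
    energy_wh: float       # hypothetical: average energy per request, in Wh


@dataclass(frozen=True)
class RouteDecision:
    tier: ModelTier
    est_gco2: float        # estimated emissions for this request, grams CO2e


def route(
    complexity: float,
    latency_sla_ms: float,
    grid_gco2_per_kwh: float,
    tiers: Sequence[ModelTier],
    max_tier_index: int,
) -> Optional[RouteDecision]:
    """Return the smallest permitted tier meeting the complexity and latency constraints.

    `tiers` must be ordered smallest to largest; `max_tier_index` is the cap
    imposed by the carbon budget (0 = only the smallest tier is allowed).
    """
    for tier in tiers[: max_tier_index + 1]:
        if tier.capability >= complexity and tier.p95_latency_ms <= latency_sla_ms:
            # Convert Wh to kWh, then multiply by live grid intensity (gCO2e/kWh).
            est = tier.energy_wh / 1000.0 * grid_gco2_per_kwh
            return RouteDecision(tier=tier, est_gco2=est)
    return None  # no permitted tier fits; the caller applies its fallback policy


if __name__ == "__main__":
    # Hypothetical tiers and numbers, for illustration only.
    tiers = [
        ModelTier("small-7b", capability=0.4, p95_latency_ms=120, energy_wh=0.5),
        ModelTier("medium", capability=0.7, p95_latency_ms=300, energy_wh=1.5),
        ModelTier("large", capability=1.0, p95_latency_ms=800, energy_wh=4.0),
    ]
    decision = route(0.55, 1000, 400, tiers, max_tier_index=2)
    print(decision.tier.name, round(decision.est_gco2, 3))
```

A `None` result signals that no permitted tier satisfies both constraints, which is the point where a fallback policy would take over.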

The carbon budget layer tracks cumulative emissions against a configurable period cap (daily or monthly) and progressively constrains the available model tiers as the budget depletes, enabling carbon governance at the inference layer rather than only per-request optimisation. Per-request signals then optimise within whatever tier the budget permits.
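The progressive tightening can be pictured as a small state machine over the fraction of the period cap already consumed. This is an illustrative Python sketch, not CAIR's implementation: the `NORMAL` and `CONSTRAINED` states, the 70% and 90% thresholds, and the state-to-tier mapping are hypothetical; only the existence of a `CRITICAL` state comes from the description.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetState(Enum):
    NORMAL = "NORMAL"            # hypothetical state name
    CONSTRAINED = "CONSTRAINED"  # hypothetical state name
    CRITICAL = "CRITICAL"


# Hypothetical mapping: highest tier index (0 = smallest model) allowed per state.
MAX_TIER = {
    BudgetState.NORMAL: 2,
    BudgetState.CONSTRAINED: 1,
    BudgetState.CRITICAL: 0,
}


@dataclass
class CarbonBudget:
    cap_gco2: float               # period cap (daily or monthly), grams CO2e
    constrained_at: float = 0.7   # hypothetical threshold: fraction of cap consumed
    critical_at: float = 0.9      # hypothetical threshold: fraction of cap consumed
    used_gco2: float = 0.0        # cumulative emissions in the current period

    def record(self, gco2: float) -> None:
        """Add the emissions of one served request to the running total."""
        self.used_gco2 += gco2

    def reset(self) -> None:
        """Start a new budget period."""
        self.used_gco2 = 0.0

    @property
    def state(self) -> BudgetState:
        frac = self.used_gco2 / self.cap_gco2
        if frac >= self.critical_at:
            return BudgetState.CRITICAL
        if frac >= self.constrained_at:
            return BudgetState.CONSTRAINED
        return BudgetState.NORMAL

    def max_tier_index(self) -> int:
        """Largest tier index the per-request router may consider right now."""
        return MAX_TIER[self.state]


if __name__ == "__main__":
    budget = CarbonBudget(cap_gco2=1000.0)
    for spent in (300.0, 450.0, 200.0):
        budget.record(spent)
        print(budget.state.value, budget.max_tier_index())
```

In this design the budget would call `reset()` at each daily or monthly boundary, and `max_tier_index()` caps which tiers the per-request logic may choose from, so per-request optimisation always happens inside the budget's envelope.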

Preliminary analysis of a 1M-prompt/day workload suggests a ~62% reduction in inference carbon, achieved by routing approximately 65% of requests to a 7B-parameter model. The framework is serving-layer-agnostic and integrates with existing LLM deployment infrastructure such as vLLM and Ollama. The audit logger is designed for direct use in CSRD (ESRS E1) and EU AI Act Art. 53 compliance reporting.

Phase 1 empirical evaluation on 50 human-labelled tasks shows routing precision of 100% on simple and complex tasks and 90% on medium tasks, a 45.5% carbon reduction versus an always-large baseline, a P95 routing overhead of 0.27 ms, and 100% fallback reliability. Independently, the budget enforcement layer reduces carbon by 92.5% in the CRITICAL state compared with uncapped routing.
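The reduction figures compare a routing mix against an always-large baseline. Under the simplifying assumption that every request sees the same grid intensity (so it cancels out), the fractional reduction is 1 - sum(f_i * E_i) / E_large, where f_i is the share of requests sent to tier i and E_i is that tier's energy per request. The Python sketch below computes this; `carbon_reduction` is a hypothetical helper, and the tier names and numbers in its usage are illustrative, not the paper's measurements.

```python
from typing import Mapping


def carbon_reduction(
    mix: Mapping[str, float],
    energy_wh: Mapping[str, float],
    baseline: str,
) -> float:
    """Fractional carbon reduction of a routing mix vs. sending every request to `baseline`.

    Assumes a single grid intensity for all requests, so emissions are
    proportional to energy and the intensity term cancels out.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("routing fractions must sum to 1")
    # Expected energy per request under the routing mix.
    routed_wh = sum(frac * energy_wh[tier] for tier, frac in mix.items())
    return 1.0 - routed_wh / energy_wh[baseline]


if __name__ == "__main__":
    # Hypothetical tier shares and per-request energy (Wh), for illustration only.
    mix = {"small-7b": 0.65, "medium": 0.20, "large": 0.15}
    energy = {"small-7b": 0.1, "medium": 0.4, "large": 1.0}
    print(f"{carbon_reduction(mix, energy, baseline='large'):.1%}")
```

When grid intensity varies over the day, the same comparison needs per-request emissions (energy times the intensity at serving time) instead of energy alone.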

Framework repository: https://github.com/pretzelslab/sa1-carbon-inference-router

Files

CAIR_CarbonRouter_Preprint.pdf (186.9 kB)
md5:c881cc95807d72c99a60df90d298ff16

Additional details

Dates

Submitted: 2026-04-30

Software

Repository URL: https://github.com/pretzelslab/sa1-carbon-inference-router
Programming language: Python
Development Status: Concept