AI_Bleeding: Semantic Exhaustion via Out-of-Distribution Linguistic Payload — An Empirical Study of Inference Cost Amplification and Economic Denial of Sustainability in LLM Deployments
Authors/Creators
Description
We present AI_Bleeding, a novel attack vector that targets LLM inference infrastructure through out-of-distribution (OOD) linguistic payloads. The attack exploits transformer attention behavior when processing content absent from the training distribution: the model consumes disproportionate GPU resources without failing or refusing the request.

Key findings (Llama 3, self-hosted Ollama): time to first token (TTFT) is 59.8% higher for OOD languages than for the baseline (p=0.036); normalized compute cost is 2.8% higher (p=0.006); throughput degrades by 14.1% at a 2048-token context; KV-cache allocation cost follows the power law TTFT=292.9×n^0.196; and the Amplification Factor (AF) reaches 17.56 Wh/KB on exposed Ollama instances.

We describe four attack scenarios: Economic Denial of Sustainability (EDoS), browser-based JavaScript distribution, exposed Ollama relay amplification, and frontier providers acting as involuntary attack relays. Mitigations are proposed for each deployment tier.

Also available on ResearchGate: https://doi.org/10.13140/RG.2.2.26767.96166
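To make the reported power law concrete, the following minimal sketch evaluates TTFT=292.9×n^0.196 at a few context lengths. Only the two fitted constants come from the description. The function names are hypothetical, and the units (TTFT in milliseconds, n in tokens) are an assumption, since the description does not state them.

```python
# Sketch only: evaluates the power law quoted in the description.
# Assumptions (not stated in the source): TTFT is in milliseconds and
# n is the context length in tokens. Function names are illustrative.

COEFFICIENT = 292.9  # fitted scale factor from the description
EXPONENT = 0.196     # fitted exponent from the description


def ttft_power_law(n_tokens: float) -> float:
    """Return the TTFT predicted by the power law for a context of n_tokens."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return COEFFICIENT * n_tokens ** EXPONENT


def scaling_ratio(factor: float) -> float:
    """Return how much predicted TTFT grows when the context grows by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor ** EXPONENT


if __name__ == "__main__":
    for n in (256, 512, 1024, 2048):
        print(f"n={n:5d}  predicted TTFT={ttft_power_law(n):8.1f}")
    print(f"doubling the context multiplies TTFT by {scaling_ratio(2):.3f}")
```

Because the exponent is well below 1, the fit is sublinear: under this model, doubling the context length multiplies the predicted TTFT by 2^0.196, roughly 1.15, regardless of the starting length.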
Files
AI_Bleeding_CenturiaLab_2026_v1.1.pdf
Files (292.4 kB)

| File (MD5 checksum) | Size |
|---|---|
| md5:60bc696e8f21a5df10d23067a2edbd61 | 270.2 kB |
| md5:0113a0d6a84a48dda18fe29216a7c685 | 22.1 kB |