Published June 2, 2026 | Version 1.1
Preprint | Open

AI_Bleeding: Semantic Exhaustion via Out-of-Distribution Linguistic Payload — An Empirical Study of Inference Cost Amplification and Economic Denial of Sustainability in LLM Deployments

Authors/Creators

Description

We present AI_Bleeding, a novel attack vector that targets LLM inference infrastructure through out-of-distribution (OOD) linguistic payloads. The attack exploits transformer attention behavior on content absent from the training distribution: the model consumes disproportionate GPU resources without failing or refusing. Key findings (Llama 3, self-hosted Ollama): time to first token (TTFT) increases by 59.8% for OOD languages relative to baseline (p=0.036); normalized compute cost increases by 2.8% (p=0.006); throughput degrades by 14.1% at 2048-token context; KV-cache allocation cost follows the power law TTFT = 292.9 × n^0.196; and the Amplification Factor reaches AF = 17.56 Wh/KB on exposed Ollama instances. We describe four attack scenarios: Economic Denial of Sustainability (EDoS), browser-based JavaScript distribution, exposed Ollama relay amplification, and frontier providers used as involuntary attack relays. We propose mitigations for each deployment tier. Also available on ResearchGate: https://doi.org/10.13140/RG.2.2.26767.96166
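
To make the reported power law concrete, the following minimal sketch evaluates TTFT = 292.9 × n^0.196 and shows how the fitted value scales when n grows. It rests on assumptions not stated in the description: n is taken to be the context length in tokens, the TTFT unit is left unspecified, and the function names are hypothetical rather than taken from the paper.

```python
# Sketch only: evaluates the power law quoted in the description.
# Assumptions (not confirmed by the description): n is the context length in
# tokens, and the TTFT unit is whatever unit the paper's fit uses.

COEFFICIENT = 292.9   # fitted coefficient from the description
EXPONENT = 0.196      # fitted exponent from the description


def ttft_power_law(n_tokens: float) -> float:
    """Return the fitted TTFT for a context of n_tokens."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return COEFFICIENT * n_tokens ** EXPONENT


def scaling_ratio(factor: float) -> float:
    """Return the factor by which fitted TTFT grows when n is multiplied by factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor ** EXPONENT


if __name__ == "__main__":
    for n in (256, 512, 1024, 2048):
        print(f"n={n:5d}  fitted TTFT={ttft_power_law(n):.1f}")
    print(f"doubling n multiplies fitted TTFT by {scaling_ratio(2):.3f}")
```

Because the exponent is well below 1, the fit is sublinear: under these assumptions, doubling n raises the fitted TTFT by roughly 15% rather than doubling it.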

Files

AI_Bleeding_CenturiaLab_2026_v1.1.pdf

Files (292.4 kB)

Name Size
md5:60bc696e8f21a5df10d23067a2edbd61
270.2 kB
md5:0113a0d6a84a48dda18fe29216a7c685
22.1 kB

Additional details