Published May 6, 2026 | Version 1 | Report
Toward LLM-Assisted Policy Enforcement at the Kernel Boundary
Description
AI coding agents now execute file, process, and network operations on developer hosts with the user's full token authority. We study whether LLM-assisted policy verdicts can support runtime enforcement for these agents under syscall-blocking latency constraints. We describe a Windows runtime-guardrail architecture with kernel-mode hooks for file-system, network, and process events, and a userspace policy pipeline that matches a small pre-registered pattern set synchronously for unambiguous high-risk actions and routes ambiguous events to Claude Haiku 4.5 via AWS Bedrock.
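The routing split described above can be sketched as follows. All names here (`Verdict`, `HIGH_RISK_PATTERNS`, `verdict_for`) and the placeholder patterns are our own illustrative assumptions, not the prototype's actual API or its six pre-registered patterns.

```python
# Illustrative sketch of a hybrid verdict pipeline: a synchronous
# deterministic fast-path for unambiguous high-risk actions, with
# ambiguous events deferred to a batched LLM reviewer.
import queue
import re
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REVIEW = "review"   # deferred to the LLM batch reviewer

# Placeholder high-risk patterns (hypothetical, for illustration only).
HIGH_RISK_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),       # destructive recursive delete
    re.compile(r"curl\s+[^|]+\|\s*sh"),  # pipe-to-shell download
]

# Queue consumed asynchronously by a batched Bedrock reviewer thread.
llm_queue: "queue.Queue[str]" = queue.Queue()

def verdict_for(event: str) -> Verdict:
    """Synchronous fast-path executed on the publishing thread; events
    that match no pattern are queued for slower semantic LLM review,
    so the LLM path is not load-bearing for the blocking deadline."""
    for pat in HIGH_RISK_PATTERNS:
        if pat.search(event):
            return Verdict.BLOCK  # deterministic, millisecond-scale path
    llm_queue.put(event)
    return Verdict.REVIEW
```

The key design point mirrored here is placement: the pattern scan runs inline on the thread that publishes the event, rather than being enqueued behind the single-threaded LLM reviewer.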
In this paper we measure only the userspace verdict pipeline of the prototype, under a user-mode-fallback configuration in which the kernel drivers were not loaded on the test host. The measurement covers 1,247 events spanning 1,000 scenarios drawn from a five-category threat taxonomy for AI coding agents. We report: (E1) Bedrock round-trip and event-to-verdict latency CDFs at the prototype-default batching configuration, (E6) the impact of a synchronous fast-path that bypasses the LLM for unambiguous cases, (E5) a cost-vs-latency sweep across six batching configurations, and (E4) the geographic latency floor across three Bedrock regions.
Two findings dominate. First, an LLM-only critical path is fragile under typical batching: at the prototype-default configuration (BEDROCK_MAX_BATCH=10, BEDROCK_BATCH_DELAY=2.0 s), event-to-verdict p99 reaches 7,741 ms and 65% of Bedrock-routed events (excluding synchronous fast-path hits) exceed a 4 s userspace timeout in our measurements; counted across all events the rate is 56%. A re-tuned configuration with a shorter batch window (d=0.5 s) reduces the Bedrock-routed rate to 7%, at a 71% increase in API calls. Second, where a synchronous fast-path is implemented, architectural placement matters as much as the pattern set: the same six pre-registered patterns produce p99 fast-path latency of 3,617 ms when executed inside the single-threaded LLM reviewer, versus 1.00 ms when executed synchronously on the publishing thread — a 3,617-fold reduction with no change to the patterns or workload. The fast-path's coverage in our pilot pattern set is small (about 14% of events), so the architectural finding is a latency claim, not a coverage claim. The results support a hybrid design: deterministic synchronous controls for unambiguous high-risk actions, with the LLM reserved for slower semantic review on a path that is not load-bearing for the intended kernel-mode deadline.
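The batching tradeoff behind the first finding can be illustrated with a toy flush policy: a batch is dispatched when it is full or when the batch delay has elapsed since its first event. The function and trace below are our own sketch; the counts it produces come from a synthetic uniform event trace, not the paper's measured workload, so they show the direction of the cost-vs-latency tradeoff rather than reproducing the reported 71% figure.

```python
def count_batches(arrival_times, max_batch=10, batch_delay=2.0):
    """Count API calls (batches) for a trace of event arrival times in
    seconds, flushing a batch when it reaches max_batch events or when
    batch_delay has elapsed since the batch's first event. Toy model."""
    batches = 0
    batch_start = None
    batch_size = 0
    for t in arrival_times:
        if batch_start is None:
            batch_start, batch_size = t, 1
        elif t - batch_start >= batch_delay or batch_size >= max_batch:
            batches += 1                  # flush the current batch
            batch_start, batch_size = t, 1
        else:
            batch_size += 1
    if batch_size:
        batches += 1                      # flush the trailing batch
    return batches

# Synthetic trace: one event every 0.3 s for 30 s. A shorter batch
# window lowers queueing delay but sends more, smaller batches.
trace = [i * 0.3 for i in range(100)]
calls_slow = count_batches(trace, batch_delay=2.0)
calls_fast = count_batches(trace, batch_delay=0.5)
```

Under this policy an event can wait up to the full batch delay before its batch is even dispatched, which is why shrinking the window from 2.0 s to 0.5 s cuts event-to-verdict latency at the price of additional API calls.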
Files

| Name | Size |
|---|---|
| whitepaper.pdf (md5:e12f8317fefaaa007e3012ee37a62290) | 503.9 kB |
Additional details
Software
- Repository URL: https://github.com/gdf-ai/agent-runtime-guardrail