EmbedGuard: Cross-Layer Detection and Provenance Attestation for Adversarial Embedding Attacks in RAG Systems
Description
Abstract
Embedding-based Retrieval-Augmented Generation (RAG) systems are critical infrastructure for production AI applications, yet they remain vulnerable to embedding space poisoning attacks that achieve disproportionate success with minimal payloads: less than 1% corpus contamination can yield attack success rates above 80%. Current defense approaches optimize for isolated attack surfaces, which leaves them vulnerable to coordinated attacks that distribute adversarial signals across architectural layers.
EmbedGuard is an adaptive, cross-layer detection framework integrating hardware-backed cryptographic attestation with statistical anomaly detection across four RAG architectural layers: prompt injection detection, TEE-based embedding attestation, retrieval distributional analysis, and output consistency verification. The framework employs weighted multi-signal fusion to correlate individually benign signals that collectively indicate attacks.
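To make the idea of weighted multi-signal fusion concrete, the following is a minimal sketch and not the EmbedGuard implementation. It assumes each of the four layers emits an anomaly score in [0, 1]; the layer keys, weights, per-layer alert level, and fused threshold are all hypothetical values chosen for illustration.

```python
# Hypothetical per-layer weights (assumed, not taken from EmbedGuard).
WEIGHTS = {
    "prompt_injection": 0.3,
    "embedding_attestation": 0.3,
    "retrieval_distribution": 0.2,
    "output_consistency": 0.2,
}
PER_LAYER_ALERT = 0.7   # assumed level at which one layer alone would alert
FUSED_THRESHOLD = 0.5   # assumed threshold on the fused score


def fuse(scores: dict[str, float]) -> float:
    """Return the weighted average of the per-layer anomaly scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


def is_attack(scores: dict[str, float]) -> bool:
    """Flag a query if any single layer alerts or the fused score is high."""
    single_layer_alert = any(scores[name] >= PER_LAYER_ALERT for name in WEIGHTS)
    return single_layer_alert or fuse(scores) >= FUSED_THRESHOLD


# Every layer is below its own alert level, but the signals agree.
coordinated = {name: 0.6 for name in WEIGHTS}
# One mildly elevated layer, the rest quiet.
isolated = {name: 0.1 for name in WEIGHTS} | {"prompt_injection": 0.6}

print(is_attack(coordinated), is_attack(isolated))
```

In this sketch the `coordinated` query is flagged even though no single layer crosses its own alert level, which is the case the fusion step is meant to catch; the `isolated` query is not flagged.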
Evaluation on the EmbedGuard Benchmark v1.0, which comprises Natural Questions (N=50), HotpotQA (N=25), MS-MARCO (N=25), and a curated injection attack dataset (N=35) spanning 25 attack categories, demonstrates a 100% detection rate (30/30 attacks) with a 0% false positive rate (0/105 benign queries) at sub-millisecond latency (0.04 ms mean, p99 < 0.14 ms on AMD EPYC 7542). Statistical significance is confirmed via a Wilcoxon signed-rank test (p < 0.001). The cross-layer architecture detects all tested attack variants, including direct instruction injection, jailbreak attempts, encoding obfuscation, context manipulation, and composite multi-vector attacks.
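The headline rates follow directly from the reported counts. The sketch below recomputes them and, as an addition that is not part of the original evaluation, attaches a one-sided 95% exact binomial (Clopper-Pearson) bound for the zero-error case to show how much uncertainty remains at these sample sizes. The function names are hypothetical.

```python
def rate(hits: int, total: int) -> float:
    """Fraction of hits among total trials."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """One-sided exact upper bound on a proportion when 0 of n trials were positive.

    Solves (1 - p)^n = alpha for p, the Clopper-Pearson bound for x = 0.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - alpha ** (1.0 / n)


# Counts as reported in the abstract.
detection_rate = rate(30, 30)          # detected attacks / attacks
false_positive_rate = rate(0, 105)     # flagged benign / benign queries

# Bounds implied by the sample sizes alone (an illustrative addition).
fpr_upper_95 = zero_event_upper_bound(105)
detection_lower_95 = 1.0 - zero_event_upper_bound(30)  # 0 misses out of 30

print(f"detection rate {detection_rate:.0%}, lower bound {detection_lower_95:.1%}")
print(f"false positive rate {false_positive_rate:.0%}, upper bound {fpr_upper_95:.1%}")
```

Under these assumptions, the bound on the false positive rate is a little under 3% and the bound on the detection rate is roughly 90%, so the point estimates of 0% and 100% are best read together with the benchmark sizes.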
Files
embedguard-v1.0-zenodo.zip (103.0 kB)
md5:bf2e370fcfaa2329cec098ecc99bdc34