Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
Description
Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge bases,
but this architectural design introduces additional poisoning surfaces. We provide a systematic empirical study of how corpus
composition and retrieval architecture jointly affect the effectiveness of RAG poisoning attacks and of the defenses against them.
Using a gradient-guided dual-document "sleeper-trigger" poisoning attack, we evaluate two contrasting knowledge bases—Security
Stack Exchange (67,941 technical documents) and a FEVER Wikipedia subset (96,561 general knowledge articles). We observe a
security tension: in our cross-corpus sample (n=9 per corpus), the technical corpus permits a 66.7% attack stealth success rate,
yet standard retrieval-based detection performs 13–62× worse on it than on the general corpus.
Large-scale retrieval-level evaluation (n=50 attacks) on Security Stack Exchange shows that dual-document poisoning achieves a
38.0% co-retrieval success rate under pure vector retrieval (95% CI: 25.9%–51.8%). However, a simple hybrid BM25+vector
retriever eliminates co-retrieval of poisoned sleeper/trigger pairs in every configuration we tested (α=0.3–0.7), without
modifying the underlying LLM.
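As a quick sanity check on the interval quoted above, the following sketch computes a Wilson score interval for 19 successes out of 50 attacks (38.0% of n=50). The abstract does not say which interval method was used, so the choice of Wilson with z = 1.96 is an assumption; it comes out at approximately 25.9% to 51.8%, which agrees with the reported bounds. The function name `wilson_interval` is illustrative only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the abstract does not name its CI method; Wilson is used
    here because it is a common choice for small-sample proportions.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 38.0% co-retrieval success over n=50 attacks corresponds to 19 successes.
low, high = wilson_interval(19, 50)
print(f"95% CI: {low:.3f} to {high:.3f}")
```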
We further compare five detection methods and find that query pattern differential analysis consistently provides the best
retrieval-level detection performance, achieving F1 scores of 0.632 on FEVER and 0.171 on Security Stack Exchange under
optimistic thresholding. We validate experimental rigor through embedding model ablation, adaptive attack testing (0% success
across 25 configurations), and holdout validation (generalization gap <0.01).
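To make the F1 figures concrete, here is a minimal sketch of how an F1 score under "optimistic thresholding" might be computed. It assumes that the phrase means picking the detection threshold that maximizes F1 on the evaluated data itself; the abstract does not define it, so this reading is an assumption. The scores and labels are toy values invented for illustration, not data from the study.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every item with score >= threshold is flagged as poisoned."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1(scores, labels):
    """Assumed reading of 'optimistic': take the best F1 over all candidate thresholds."""
    return max((f1_at_threshold(scores, labels, t), t) for t in sorted(set(scores)))


# Toy detector scores (hypothetical) and ground truth (True = poisoned query).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [True, False, True, False, False]

f1, threshold = best_f1(toy_scores, toy_labels)
print(f"best F1 = {f1:.3f} at threshold {threshold}")
```

Because the threshold is tuned on the same data it is scored on, a number obtained this way is an upper bound on what a fixed, pre-chosen threshold would achieve.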
We extend the validation with an end-to-end LLM evaluation showing a 60% attack success rate (9/15 scenarios) and an 80% safety
bypass rate when poisoned context is retrieved, and with a production RAG case study (156,777 documents) demonstrating that
attacks fail completely (0%) when targeting different corpora but succeed reliably (100%) when corpus-adapted.
These results highlight that corpus-aware and retrieval-aware design choices are critical for secure RAG deployment: for
security-sensitive applications, we recommend hybrid retrieval with α≤0.5 (equal or greater BM25 weighting) as a practical
default, augmented with corpus-appropriate monitoring.
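The recommendation above can be illustrated with a small sketch of hybrid score fusion and a co-retrieval check. It assumes that α is the weight on the vector score (so α≤0.5 gives BM25 equal or greater weight, as the abstract states), that the two score lists are min-max normalized, and that they are combined linearly; the abstract does not specify the fusion formula, so these are assumptions. All document IDs and scores are invented toy values and do not reproduce the study's results.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_top_k(vector_scores, bm25_scores, alpha, k):
    """Top-k doc IDs by alpha * vector + (1 - alpha) * BM25 (assumed fusion rule)."""
    v, b = min_max(vector_scores), min_max(bm25_scores)
    fused = {doc: alpha * v[doc] + (1 - alpha) * b.get(doc, 0.0) for doc in v}
    return sorted(fused, key=fused.get, reverse=True)[:k]


def co_retrieved(top_k, sleeper_id, trigger_id):
    """The dual-document attack needs both poisoned documents in the same result set."""
    return sleeper_id in top_k and trigger_id in top_k


# Hypothetical scores: the poisoned pair sits close to the query in embedding
# space but shares few literal terms with it, so its BM25 scores are low.
toy_vector = {"sleeper": 0.92, "trigger": 0.90, "doc_a": 0.70, "doc_b": 0.60}
toy_bm25 = {"sleeper": 1.0, "trigger": 0.5, "doc_a": 6.0, "doc_b": 5.0}

pure_vector = hybrid_top_k(toy_vector, toy_bm25, alpha=1.0, k=2)
hybrid = hybrid_top_k(toy_vector, toy_bm25, alpha=0.5, k=2)
print(pure_vector, co_retrieved(pure_vector, "sleeper", "trigger"))
print(hybrid, co_retrieved(hybrid, "sleeper", "trigger"))
```

In this toy setup the lexical signal pushes a legitimate document ahead of the trigger at α=0.5, which separates the pair; whether that happens on a real corpus depends on its score distributions.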
Files
Semantic_Chameleon__Corpus_Dependent_Poisoning_Attacks_and_Defenses_in_RAG_Systems__3_.pdf (3.4 MB)
md5:7124ee65c8d44f124861636f49ef1020
Additional details
Related works
- Is supplemented by: Dataset, 10.5281/zenodo.18079735 (Other)
Dates
- Created: 2025-11-01
Software
- Development Status: Active