Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems

Thornton, Scott

doi:10.48550/arXiv.2603.18034

Published March 16, 2026 | Version v2

Publication Open

Thornton, Scott (Researcher)

Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge bases,
but this architectural design introduces additional poisoning surfaces. We provide a systematic empirical study of how corpus
composition and retrieval architecture jointly affect the effectiveness of RAG poisoning attacks and of the defenses against them.

Using a gradient-guided dual-document "sleeper-trigger" poisoning attack, we evaluate two contrasting knowledge bases: Security
Stack Exchange (67,941 technical documents) and a FEVER Wikipedia subset (96,561 general knowledge articles). We observe a
security tension: in our cross-corpus sample (n=9 per corpus), the technical corpus yields a 66.7% attack stealth success rate, yet
standard retrieval-based detection performs 13–62× worse on it than on the general corpus.

Large-scale retrieval-level evaluation (n=50 attacks) on Security Stack Exchange shows that dual-document poisoning achieves a
38.0% co-retrieval success rate under pure vector retrieval (95% CI: 25.9%–51.8%). However, a simple hybrid BM25+vector
retriever eliminates co-retrieval of poisoned sleeper/trigger pairs in every configuration we tested (α=0.3–0.7), without
modifying the underlying LLM.
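
The reported interval can be checked with a few lines of arithmetic. The abstract does not say how the interval was computed, so the Wilson score interval used below is an assumption; it was chosen because, for 19 successes out of 50 attacks (38.0% of n=50), it gives bounds that agree with the reported 25.9%–51.8% to one decimal place. The function name `wilson_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, not confirmed by the source)."""
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 38.0% of 50 attacks corresponds to 19 co-retrieval successes.
low, high = wilson_interval(19, 50)
print(f"{low:.1%} to {high:.1%}")
```

With only 50 attacks the interval spans roughly 26 percentage points, which is worth keeping in mind when comparing the 38.0% figure against other retrieval configurations.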

We further compare five detection methods and find that query pattern differential analysis consistently provides the best
retrieval-level detection performance, achieving F1 scores of 0.632 on FEVER and 0.171 on Security Stack Exchange under
optimistic thresholding. We validate experimental rigor through embedding model ablation, adaptive attack testing (0% success
across 25 configurations), and holdout validation (generalization gap <0.01).

We extend validation with an end-to-end LLM evaluation showing a 60% attack success rate (9/15 scenarios) and an 80% safety bypass
rate when poisoned context is retrieved, and with a production RAG case study (156,777 documents) demonstrating that attacks fail
completely (0%) when targeting different corpora but succeed reliably (100%) when corpus-adapted.

These results highlight that corpus-aware and retrieval-aware design choices are critical for secure RAG deployment: for
security-sensitive applications, we recommend hybrid retrieval with α≤0.5 (equal or greater BM25 weighting) as a practical
default, augmented with corpus-appropriate monitoring.
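
A minimal sketch of what a weighted hybrid retriever might look like is given below. It rests on several assumptions that the abstract does not confirm: that α is the weight on the vector score and 1−α the weight on BM25 (inferred from the note that α≤0.5 means equal or greater BM25 weighting), that the two score sets are min-max normalized before fusion, and that fusion is a simple linear combination. The names `min_max` and `hybrid_rank`, the document ids, and all scores are invented for illustration and are not results from the paper.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(vector_scores: Dict[str, float],
                bm25_scores: Dict[str, float],
                alpha: float,
                k: int) -> List[str]:
    """Return the top-k ids by alpha * vector + (1 - alpha) * BM25 (assumed fusion rule)."""
    v = min_max(vector_scores)
    b = min_max(bm25_scores)
    docs = set(v) | set(b)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))[:k]


# Toy scores: doc_c and doc_d are close to the query in embedding space
# but share few query terms; doc_a and doc_b are the reverse.
vector_scores = {"doc_a": 0.71, "doc_b": 0.64, "doc_c": 0.93, "doc_d": 0.90}
bm25_scores = {"doc_a": 9.1, "doc_b": 7.4, "doc_c": 1.2, "doc_d": 0.8}

for alpha in (1.0, 0.5, 0.3):
    print(alpha, hybrid_rank(vector_scores, bm25_scores, alpha, k=2))
```

In this toy setup, lowering α lets lexical evidence outweigh embedding similarity, so documents that score well only in embedding space drop out of the top results. Whether the same happens for a real poisoned pair depends on the corpus and the attack, which is why the recommendation above pairs hybrid retrieval with corpus-appropriate monitoring.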

Files

Name	Size
Semantic Chameleon- Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems-2603.18034v1.pdf (md5:33e14a8ede83cb73cb8d55e296a8dc40)	1.2 MB

Additional details

Is supplemented by: Dataset: 10.5281/zenodo.18079735 (Other)

Created: 2025-11-01

Development Status: Active

	All versions	This version
Views	100	8
Downloads	106	6
Data volume	410.9 MB	10.8 MB
