Published March 16, 2026 | Version v2
Publication Open

Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems


Description

  Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge bases,
  but this architectural design introduces additional poisoning surfaces. We provide a systematic empirical study of how corpus
  composition and retrieval architecture jointly affect the effectiveness of RAG poisoning attacks and of defenses against them.

  Using a gradient-guided dual-document "sleeper-trigger" poisoning attack, we evaluate two contrasting knowledge bases—Security
  Stack Exchange (67,941 technical documents) and a FEVER Wikipedia subset (96,561 general knowledge articles). We observe a
  security tension: in our cross-corpus sample (n=9 per corpus), the technical corpus enables 66.7% attack stealth success yet
  shows 13–62× worse detection performance using standard retrieval-based detection than the general corpus.

  Large-scale retrieval-level evaluation (n=50 attacks) on Security Stack Exchange shows that dual-document poisoning achieves a
  38.0% co-retrieval success rate under pure vector retrieval systems (95% CI: 25.9%–51.8%). However, a simple hybrid BM25+vector
  retriever eliminates co-retrieval of poisoned sleeper/trigger pairs in every configuration we tested (α=0.3–0.7), without
  modifying the underlying LLM.
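
  The reported confidence interval can be sanity-checked with a short calculation. The sketch below assumes that 38.0% of
  n=50 corresponds to 19 successful co-retrievals and uses a Wilson score interval; the abstract does not say which interval
  method was used, so that choice is an assumption. Under it, the result is roughly 25.9% to 51.8%, matching the figures above.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# Assumption: 38.0% of 50 attacks means 19 co-retrieval successes.
low, high = wilson_interval(19, 50)
print(f"{low:.1%} to {high:.1%}")
```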

  We further compare five detection methods and find that query pattern differential analysis consistently provides the best
  retrieval-level detection performance, achieving F1 scores of 0.632 on FEVER and 0.171 on Security Stack Exchange under
  optimistic thresholding. We validate experimental rigor through embedding model ablation, adaptive attack testing (0% success
  across 25 configurations), and holdout validation (generalization gap <0.01).
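
  For readers unfamiliar with the metric, the sketch below computes F1 for a score-based detector and then picks the threshold
  that maximizes it on the same labeled data. Treating "optimistic thresholding" as this kind of best-case threshold selection
  is an assumption, since the abstract does not define the term. The scores and labels are hypothetical toy values, not data
  from the study.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every item with score >= threshold is flagged as poisoned."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1(scores, labels):
    """Best-case (F1, threshold) pair, chosen on the evaluated data itself."""
    return max((f1_at_threshold(scores, labels, t), t) for t in sorted(set(scores)))


# Hypothetical detector scores; label 1 marks a poisoned item.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
best_score, best_threshold = best_f1(toy_scores, toy_labels)
print(best_score, best_threshold)
```

  Because the threshold is tuned on the data being scored, a figure obtained this way is an upper bound on what a fixed,
  pre-chosen threshold would achieve.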

  We extend validation with end-to-end LLM evaluation showing a 60% attack success rate (9/15 scenarios) with an 80% safety bypass
  rate when poisoned context is retrieved, and a production RAG case study (156,777 documents) demonstrating that attacks fail
  completely (0%) when targeting different corpora but succeed reliably (100%) when corpus-adapted.

  These results highlight that corpus-aware and retrieval-aware design choices are critical for secure RAG deployment: for
  security-sensitive applications, we recommend hybrid retrieval with α≤0.5 (equal or greater BM25 weighting) as a practical
  default, augmented with corpus-appropriate monitoring.
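
  The role of α in the recommendation can be illustrated with a minimal score-fusion sketch. It assumes α weights the vector
  score and (1−α) weights the BM25 score, which is consistent with "α≤0.5" meaning equal or greater BM25 weighting. The min-max
  normalization, the function names, and the toy scores are all assumptions made for illustration; the abstract does not
  describe the actual fusion procedure.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(bm25_scores, vector_scores, alpha=0.5, k=2):
    """Return indices of the top-k documents under a weighted score fusion.

    alpha is the weight on the vector score; (1 - alpha) goes to BM25.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(bm25_scores) != len(vector_scores):
        raise ValueError("score lists must have the same length")
    b = min_max(bm25_scores)
    v = min_max(vector_scores)
    fused = [alpha * vi + (1 - alpha) * bi for vi, bi in zip(v, b)]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order[:k]


# Hypothetical scores: documents 3 and 4 are close to the query in embedding
# space but share few query terms, so their BM25 scores are low.
toy_vector = [0.60, 0.55, 0.50, 0.90, 0.88]
toy_bm25 = [8.0, 7.0, 6.5, 1.0, 0.5]

print(hybrid_rank(toy_bm25, toy_vector, alpha=1.0))  # pure vector retrieval
print(hybrid_rank(toy_bm25, toy_vector, alpha=0.3))  # BM25-weighted hybrid
```

  In this toy setup, documents that score well only on embedding similarity drop out of the top results once lexical overlap
  carries most of the weight. It shows the mechanism only and does not reproduce the study's measurements.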

Files

Semantic Chameleon- Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems-2603.18034v1.pdf

Additional details

Related works

Is supplemented by
Dataset: 10.5281/zenodo.18079735 (Other)

Dates

Created
2025-11-01

Software

Development Status
Active