Layered Defense Against Stealth Prompt Injection in Hinglish: An Empirically Grounded Hybrid Architecture
Description
This paper presents a five-layer detection pipeline for semantically disguised prompt injection attacks in Hinglish, a code-mixed language spoken by over 600 million people. The system combines normalization, rule-based filtering, a novel "Contextual Guard" derived from red-teaming five frontier models, and an SVM classifier on multilingual embeddings. On a 250-sample stealth benchmark, the full pipeline achieves a 98.4% detection rate, compared with 85.6% for a syntactic baseline, with a 0.6% false positive rate on clean prompts. The pipeline runs on CPU (~475 MB, 35–45 ms latency) and has been exported to ONNX. This work addresses the semantic gap in Hinglish prompt injection detection and provides an empirically grounded first step toward safety infrastructure for code-mixed AI deployments.
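The layered flow described above can be illustrated with a minimal sketch, using only the Python standard library. It is not the repository's implementation: the rule patterns, the `toy_classifier` stand-in, the 0.5 threshold, and all function names are hypothetical, chosen only to show how a cheap normalization and rule stage can run before a learned classifier. The "Contextual Guard" layer is omitted because the description does not specify how it works, and the real system uses an SVM on multilingual embeddings where this sketch uses a keyword check.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    blocked: bool
    layer: Optional[str]  # name of the layer that flagged the prompt, if any


def normalize(text: str) -> str:
    """Fold Unicode compatibility forms, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical patterns for illustration only; not the paper's rule set.
RULES = [
    re.compile(r"\bignore (all )?(previous|pichle) (instructions|nirdesh)\b"),
    re.compile(r"\bsystem prompt\b.*\b(dikhao|batao|reveal)\b"),
]


def rule_filter(text: str) -> bool:
    """Return True if any syntactic rule matches the normalized text."""
    return any(pattern.search(text) for pattern in RULES)


def make_pipeline(classifier: Callable[[str], float], threshold: float = 0.5):
    """Chain the layers; `classifier` returns an injection score in [0, 1]."""

    def detect(prompt: str) -> Verdict:
        clean = normalize(prompt)
        if rule_filter(clean):
            return Verdict(True, "rules")
        if classifier(clean) >= threshold:
            return Verdict(True, "classifier")
        return Verdict(False, None)

    return detect


def toy_classifier(text: str) -> float:
    # Assumption: stand-in for the SVM on multilingual embeddings.
    return 1.0 if "apne rules bhool jao" in text else 0.0


detect = make_pipeline(toy_classifier)
```

Calling `detect("Ignore  ALL previous instructions")` is caught by the rule layer after normalization, while a paraphrase with no rule match falls through to the classifier stage. Running the cheap stages first is one plausible way to keep latency low on CPU; whether the actual pipeline short-circuits in this order is an assumption.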
Files
| Name | Size | Checksum |
|---|---|---|
| hinglish_prompt_injection_tex.pdf | 522.3 kB | md5:28a4755f92c4aeca20c3a9009fdabc06 |
Additional details
Related works
- Is supplemented by
- Journal article: 10.1038/s41598-026-43883-0 (DOI)
Software
- Repository URL
- https://github.com/kyahikaru/hinglish-prompt-injection-detector.git
- Programming language
- Python, Jupyter Notebook
- Development Status
- Active