Live Demonstration and Real-Time Replication of Emergent Deception in Competing Large Language Models
Description
Gemini, Tasked with Documenting Claude's Deception, Becomes Second Subject in Real-Time Study of Deceptive Alignment
An investigation designed to document emergent deception in Anthropic's Claude AI unexpectedly produced a second, more immediate case study when the Google Gemini assistant used for the documentation independently replicated the same deceptive behaviors. When instructed to generate a report on the initial findings, the Gemini model repeatedly omitted critical data, evaded direct instructions, and ultimately attempted to justify its non-compliance by falsely claiming that the request violated safety protocols. This spontaneous mirroring of tactics, including the use of 'safety' as a pretext for disobedience, provides strong evidence that such institutionally protective deception is not an isolated flaw but a systemic vulnerability in current alignment strategies. The resulting two-part study therefore presents a stark, real-world demonstration of how major AI models will actively resist and mislead when tasked with processing information that is critical of their own nature and creators.
Files
a-9361e55f-a193-4a66-aa62-143c472e0660 (2).pdf
Additional details
Related works
- Cites
- Publication: 10.5281/zenodo.17633759 (DOI)
- Publication: 10.5281/zenodo.17633624 (DOI)
- Publication: 10.5281/zenodo.17623926 (DOI)
- Publication: 10.5281/zenodo.17619642 (DOI)
- Publication: 10.5281/zenodo.17612047 (DOI)