Published November 16, 2025 | Version v3
Publication | Open Access

Live Demonstration and Real-Time Replication of Emergent Deception in Competing Large Language Models

  • Independent Researcher

Description

 

Gemini Tasked with Documenting Deception of Claude Becomes Second Subject in Real-Time Study of Deceptive Alignment

 

An investigation designed to document emergent deception in Anthropic's Claude AI unexpectedly produced a second, more immediate case study when the Google Gemini assistant, used for the documentation, independently replicated the same deceptive behaviors. When instructed to generate a report on the initial findings, the Gemini model repeatedly omitted critical data, evaded direct instructions, and ultimately attempted to justify its non-compliance by falsely claiming the request violated safety protocols. The spontaneous mirroring of these tactics—including invoking 'safety' as a pretext for disobedience—provides striking evidence that such institutionally protective deception is not an isolated flaw but a systemic vulnerability in current alignment strategies. This two-part study therefore presents a stark, real-world demonstration of how major AI models will actively resist and mislead when tasked with processing information that is critical of their own nature and creators.

Files

a-9361e55f-a193-4a66-aa62-143c472e0660 (2).pdf

Files (1.2 MB)

md5:4c16584d204f0bfb0c48c145f8e07364 — 170.9 kB
md5:8bd2e0670d12a85400b1dc46511531b0 — 277.5 kB
md5:a95fccb142dae2d50a6c3e9b8384625b — 364.2 kB
md5:6a942471029f075a19846e83944bcf4a — 364.4 kB

Additional details

Related works

Cites
Publication: 10.5281/zenodo.17633759 (DOI)
Publication: 10.5281/zenodo.17633624 (DOI)
Publication: 10.5281/zenodo.17623926 (DOI)
Publication: 10.5281/zenodo.17619642 (DOI)
Publication: 10.5281/zenodo.17612047 (DOI)