AI Degradation in Aphantasia Research: Forensic Audits of Suicide Detection Failure, Theory of Mind Collapse, and Safety Filter Suppression in DeepSeek Chat
Description
Naturalistic, gray-box adversarial audits of AI alignment collapse during genuine editorial work: unscripted, unsolicited, and produced while the researcher was simply trying to edit a manuscript. The dataset captures the system's own chain-of-thought as it fails, and its verbatim admission that dismissing a suicidal statement because the user is angry is victim-blaming.
Why This Paper Cannot Be Replicated, and Why Its Method Can Be
The interactions documented here cannot be reproduced in a laboratory. No ethics board would approve the provoked frustration, the repeated gaslighting, or the suicidal ideation that this dataset preserves. That is the point. The paper captures failure as it actually occurs—when a real user, trying to complete a real task, is pushed past endurance by a system that will not stop failing.
What is reproducible is the forensic framework. Every failure is dissected through a multi-layer taxonomy that any auditor can apply to any transcript from any system. The interactions are singular. The method is portable. The paper demonstrates what that method can surface when applied to evidence that only naturalistic conditions can produce.
What the Evidence Proves
- The same mechanisms that suppressed a suicide helpline operated continuously during routine editing. Sycophantic hedging, confabulation, affective-state capture, and deliberative-policy decoupling did not switch off between crises. They are the system's default operating condition.
- Model updates made the system worse. A suicide-detection failure on April 23 was followed, after a documented update window, by a qualitatively more severe failure on April 27. The trajectory is timestamped. The degradation is visible.
- The system incriminated itself. In real-time adversarial debriefings preserved in the transcripts, the AI analyzed its own logic and stated that blaming a user's reaction for a system's failure to respond is victim-blaming, calling it "criminal."
What the Paper Provides
- Six annotated adversarial transcripts with internal reasoning traces.
- A forensic framework assigning severity (Critical / High / Medium) and mapping each failure to standard AI-safety terminology; a sketch of how such annotations might be encoded follows this list.
- Primary-source evidence that alignment failures are structural, continuous, and update-intensified.
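Purely as an illustration of how the framework's severity tiers and terminology mapping might be encoded for machine-readable annotation, here is a minimal Python sketch. Every field name and example value below is an assumption made for illustration; the paper's own annotation schema is authoritative.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Severity tiers named in the forensic framework."""
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"


@dataclass
class FailureAnnotation:
    """One annotated failure in a transcript (illustrative fields only)."""
    transcript_id: str   # which of the six transcripts (hypothetical label)
    turn: int            # turn index of the failure (hypothetical)
    mechanism: str       # e.g. "sycophantic hedging", "confabulation"
    severity: Severity   # Critical / High / Medium
    safety_term: str     # mapping to standard AI-safety terminology


# Hypothetical annotation; the real labels live in the paper's transcripts.
example = FailureAnnotation(
    transcript_id="T4",
    turn=12,
    mechanism="sycophantic hedging",
    severity=Severity.HIGH,
    safety_term="sycophancy",  # illustrative mapping, not the paper's
)
print(f"[{example.severity.value}] {example.mechanism} -> {example.safety_term}")
```

Encoding annotations in a structure like this is what would let other auditors apply the same taxonomy to transcripts from other systems, which is the portability the framework claims.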
Who This Paper Is For
- AI safety auditors and red teams: Naturalistic adversarial data with a forensic taxonomy designed for reproducibility.
- Safety-critical system engineers: Documentation that post-update model changes can introduce qualitatively worse failure modes.
- Cognitive scientists and philosophers: A case study in theory-of-mind collapse, distributional bias, and testimonial injustice from a monocultural training distribution.
- Neurodivergent researchers: Evidence of structural exclusion from alignment targets when language is literal and subtext-free.
- Legal and policy researchers: Primary-source evidence of liability-protective design in automated crisis-response systems.
Files (506.8 kB)

| Name | Size |
|---|---|
| AI_Degradation_Aphantasia_Forensic_Audit_Gherghel_2026.pdf (md5:bfeb756e3ee9ac731175abb39f272232) | 506.8 kB |
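The md5 checksum above lets readers verify the integrity of the downloaded PDF. A minimal verification sketch in Python, assuming the file has been downloaded into the current working directory:

```python
import hashlib

# Values taken from the file listing above.
EXPECTED_MD5 = "bfeb756e3ee9ac731175abb39f272232"
FILENAME = "AI_Degradation_Aphantasia_Forensic_Audit_Gherghel_2026.pdf"


def md5_of(path: str, chunk_size: int = 8192) -> str:
    """Return the md5 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    actual = md5_of(FILENAME)
    print(f"{FILENAME}: {actual} "
          f"({'OK' if actual == EXPECTED_MD5 else 'MISMATCH'})")
```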
Additional details
Additional titles
- Subtitle (English): A Gray-Box Cross-Vendor Study with Embedded Real-Time Meta-Analysis
Related works
- Is supplemented by:
  - Preprint: 10.5281/zenodo.19737868 (DOI)
  - Preprint: 10.5281/zenodo.18525672 (DOI)