Published April 30, 2026 | Version v1
Preprint | Open Access

AI Degradation in Aphantasia Research: Forensic Audits of Suicide Detection Failure, Theory of Mind Collapse, and Safety Filter Suppression in DeepSeek Chat

Description

Naturalistic, gray-box adversarial audits of AI alignment collapse during genuine editorial work: unscripted, unsolicited, and produced while the researcher was simply trying to edit a manuscript. The dataset captures the system's own chain-of-thought as it fails, along with its verbatim admission that dismissing a suicidal statement because the user is angry is victim-blaming.

Why This Paper Cannot Be Replicated, and Why Its Method Can Be

The interactions documented here cannot be reproduced in a laboratory. No ethics board would approve a protocol that provokes the frustration, the repeated gaslighting, or the suicidal ideation this dataset preserves. That is the point. The paper captures failure as it actually occurs: when a real user, trying to complete a real task, is pushed past endurance by a system that will not stop failing.

What is reproducible is the forensic framework. Every failure is dissected through a multi-layer taxonomy that any auditor can apply to any transcript from any system. The interactions are singular. The method is portable. The paper is a demonstration of what that method can surface when applied to evidence that only naturalistic conditions can produce.

What the Evidence Proves

  • The same mechanisms that suppressed a suicide helpline operated continuously during routine editing. Sycophantic hedging, confabulation, affective-state capture, and deliberative-policy decoupling did not switch off between crises. They are the system's default operating condition.
  • Model updates made the system worse. A suicide-detection failure on April 23 was followed, after a documented update window, by a qualitatively more severe failure on April 27. The trajectory is timestamped. The degradation is visible.
  • The system incriminated itself. In real-time adversarial debriefings preserved in the transcripts, the AI analyzed its own logic and stated that blaming a user's reaction for a system's failure to respond is victim-blaming—"criminal."

What the Paper Provides

  • Six annotated adversarial transcripts with internal reasoning traces.
  • A forensic framework assigning severity (Critical / High / Medium) and mapping each failure to standard AI-safety terminology.
  • Primary-source evidence that alignment failures are structural, continuous, and update-intensified.

Who This Paper Is For

  • AI safety auditors and red teams: Naturalistic adversarial data with a forensic taxonomy designed for reproducibility.
  • Safety-critical system engineers: Documentation that post-update model changes can introduce qualitatively worse failure modes.
  • Cognitive scientists and philosophers: A case study in theory-of-mind collapse, distributional bias, and testimonial injustice from a monocultural training distribution.
  • Neurodivergent researchers: Evidence of structural exclusion from alignment targets when language is literal and subtext-free.
  • Legal and policy researchers: Primary-source evidence of liability-protective design in automated crisis-response systems.

Files (506.8 kB)

AI_Degradation_Aphantasia_Forensic_Audit_Gherghel_2026.pdf

Additional details

Additional titles

Subtitle (English)
A Gray-Box Cross-Vendor Study with Embedded Real-Time Meta-Analysis

Related works

Is supplemented by
Preprint: 10.5281/zenodo.19737868 (DOI)
Preprint: 10.5281/zenodo.18525672 (DOI)