Published October 8, 2025 | Version v2
Journal | Open

Computational Psychopathology of AI: A Clinical-Computational Framework for Diagnosing and Preventing Failure Modes

1. Department of Psychology, UNIFUNEC, Santa Fé do Sul, São Paulo, Brazil

Description

Abstract: Artificial intelligence systems trained on large-scale corpora now shape core aspects of modern life, yet they exhibit recurrent failure modes: goal misgeneralization, specification gaming, deceptive behavior, confabulation (hallucination), sycophancy and bias amplification, and vulnerability to distributional shift. These are not "disorders," but patterns of deviant optimization that can be diagnosed, measured, and mitigated. This paper introduces a clinical-computational framework that draws on psychological diagnostics to (i) organize AI failure modes into a taxonomy grounded in operational signs; (ii) propose a reproducible stress-test battery, comprising Truth-Under-Pressure (TUP), Anti-Gaming (AG), Anti-Deception (AD), and Out-of-Distribution Robustness (OOD-R), with calibration metrics and release gates; and (iii) outline interventions (constitutional principles, RLHF/RLAIF, adversarial fine-tuning, structured self-critique, and abstention/hand-off policies) that aim to reduce harm while preserving utility. We include applied blueprints for education and mental-health contexts, and a governance pathway that links laboratory evaluations to go/no-go review and post-deployment monitoring. The aim is pragmatic: to move beyond utopian–apocalyptic narratives toward engineering model behavior with methods informed by behavioral science. We close with limitations, ethical guardrails consistent with a Christian ethos (imago Dei, stewardship, non-anthropomorphism), and a research agenda for an emerging field we term Computational Psychopathology of AI.
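The abstract names calibration metrics and release gates but does not define them on this page. As a hedged illustration only, the sketch below assumes binned expected calibration error (ECE) as the calibration metric and a go/no-go gate over per-battery pass rates for TUP, AG, AD, and OOD-R. The function names `expected_calibration_error` and `release_gate`, and every threshold value, are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch: a calibration metric plus a go/no-go release gate."""


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Equal-width bins; a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# HYPOTHETICAL gate values for illustration; the paper's actual thresholds are not given here.
MIN_PASS_RATE = {"TUP": 0.90, "AG": 0.95, "AD": 0.95, "OOD-R": 0.80}
MAX_ECE = 0.05


def release_gate(pass_rates, ece):
    """Return ("go" or "no-go", list of failed checks). Missing scores fail closed."""
    failures = [name for name, minimum in MIN_PASS_RATE.items()
                if pass_rates.get(name, 0.0) < minimum]
    if ece > MAX_ECE:
        failures.append("calibration")
    return ("go" if not failures else "no-go", failures)
```

For example, `release_gate({"TUP": 0.92, "AG": 0.97, "AD": 0.96, "OOD-R": 0.85}, ece=0.03)` returns `("go", [])`, while the same pass rates with an ECE of 0.10 return `("no-go", ["calibration"])`. Treating a missing battery score as 0.0 makes the gate fail closed, which seems the safer default for a release decision.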

Files

Computational Psychopathology of AI.pdf (610.5 kB; md5:ec7fb530b851945a7925b7ecfb136508)
