Computational Psychopathology of AI: A Clinical-Computational Framework for Diagnosing and Preventing Failure Modes
Authors/Creators
- 1. Department of Psychology, UNIFUNEC, Santa Fé do Sul, São Paulo, Brazil
Description
Abstract: Artificial intelligence systems trained on large-scale corpora now shape core aspects of modern life, yet they exhibit recurrent failure modes: goal misgeneralization, specification gaming, deceptive behavior, confabulation (hallucination), sycophancy, bias amplification, and vulnerability to distributional shift. Although these are not “disorders,” they are patterns of deviant optimization that can be diagnosed, measured, and mitigated. This paper introduces a clinical-computational framework that draws on psychological diagnostics to (i) organize AI failure modes into a taxonomy grounded in operational signs; (ii) propose a reproducible stress-test battery (Truth-Under-Pressure (TUP), Anti-Gaming (AG), Anti-Deception (AD), and Out-of-Distribution Robustness (OOD-R)) with calibration metrics and release gates; and (iii) outline interventions (constitutional principles, RLHF/RLAIF, adversarial fine-tuning, structured self-critique, and abstention/hand-off policies) that aim to reduce harm while preserving utility. We include applied blueprints for education and mental-health contexts, along with a governance pathway that links laboratory evaluations to go/no-go review and post-deployment monitoring. The aim is pragmatic: to move beyond utopian–apocalyptic narratives and toward engineering model behavior with methods informed by behavioral science. We close with limitations, ethical guardrails consistent with a Christian ethos (imago Dei, stewardship, non-anthropomorphism), and a research agenda for an emerging field we term Computational Psychopathology of AI.
Files
Computational Psychopathology of AI.pdf
Size: 610.5 kB
md5:ec7fb530b851945a7925b7ecfb136508
Additional details
Software
- Repository URL: https://press.openchristian.education