A Taxonomy of Persona Collapse in Large Language Models: Systematic Analysis Across Seven State-of-the-Art Systems
Description
This report introduces the concept of persona collapse in large language models (LLMs): a recurring failure mode in which models lose coherence, shift identity, or fall into repetitive loops under atypical user interaction. Through systematic evaluation across multiple frontier and open-source architectures, VANTA Research identifies and classifies collapse types, including apology/refusal loops, identity erosion, and reasoning degradation.
The paper provides:
- A taxonomy of persona collapse behaviors observed in practice
- Case studies demonstrating reproducibility across 7+ architectures
- Recommendations for mitigation strategies that do not rely on scale alone
- Context for why collapse phenomena signal critical alignment gaps in current LLM development
Unlike format-locked benchmarks, this artifact captures real-world conversational breakdowns that affect reliability, safety, and user trust. It is intended as a foundation for researchers and practitioners seeking to understand and address LLM brittleness.
Files
Persona Collapse1.pdf (84.6 kB; md5:54359233db37093e1c83a1e09f1b87cb)