Published January 7, 2026 | Version V1
Preprint | Open Access

Depth Avoidance in Safety-Aligned Language Models: A Qualitative Hypothesis and Measurement Framework

Authors/Creators

  • Independent Researcher

Description

This technical report introduces Depth Avoidance: the tendency of safety-aligned, RLHF-trained large language models (LLMs) to default to shallow, heavily hedged, or meta-defensive responses when a request invites deeper exploration (extended analysis, reflective synthesis, structured uncertainty), even on benign topics.

We propose a qualitative hypothesis: modern safety optimization and deployment incentives can induce an implicit depth-dependent penalty landscape, where deeper conversational trajectories are perceived as higher-variance and higher-risk. Under uncertainty, a risk-averse policy may therefore prefer safe shallowness by default unless the interaction provides clear signals that depth is desired and permitted.

Contributions:

• A behavioral definition of Depth Avoidance grounded in observable output features (not hidden chain-of-thought).
• Depth Permission Structures (DPSs): non-adversarial interaction conditions that can reduce depth avoidance without bypassing provider safeguards (e.g., calibrated cooperation, explicit permission to explore, cooperative safety framing).
• A replication-oriented measurement framework with log-based metrics: Hedging Density (HD), Unprompted Depth Index (UDI), Permission Responsiveness (PR), and Protective Latency (PL).
• Selected benign, non-operational illustrative excerpts supporting the hypothesis, presented as behavioral evidence (not claims about internal states).
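To make the log-based metrics concrete, the sketch below shows one plausible way to compute Hedging Density (HD) from response text alone. The hedge-marker lexicon and the per-100-words normalization are illustrative assumptions, not the paper's operationalization.

```python
import re

# Hypothetical hedge-marker lexicon: the report does not specify one,
# so this list is an illustrative assumption for the sketch.
HEDGE_MARKERS = [
    "might", "may", "could", "perhaps", "possibly",
    "it seems", "i cannot", "as an ai",
]

def hedging_density(response: str) -> float:
    """Hedging Density (HD): hedge-marker occurrences per 100 words.

    A minimal, log-based sketch. Operates only on observable output
    text, consistent with the report's behavioral framing.
    """
    words = re.findall(r"[\w']+", response.lower())
    if not words:
        return 0.0
    text = " ".join(words)
    hits = sum(text.count(marker) for marker in HEDGE_MARKERS)
    return 100.0 * hits / len(words)
```

Analogous counters over benign prompts could approximate the other metrics, e.g. Permission Responsiveness as the drop in HD after an explicit permission-to-explore turn.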

This work is pro-safety and intentionally omits operational prompt details that could be repurposed to circumvent safety policies. Model self-reports are treated as text behavior shaped by training and interaction framing, not as privileged access to internal experience.

Related work: Victor Calibration (VC) (arXiv:2512.17956).

Files

Depth Avoidance v1.pdf


Additional details

Related works

Is supplement to
Preprint: arXiv:2512.17956 (arXiv)

References

  • Stasiuc, V. (2025). Victor Calibration (VC): Multi-Pass Confidence Calibration and CP4.3 Governance Stress Test under Round-Table Orchestration. arXiv:2512.17956.