Depth Avoidance in Safety-Aligned Language Models: A Qualitative Hypothesis and Measurement Framework
Description
This technical report introduces Depth Avoidance: a behavioral tendency observed in safety-aligned, RLHF-trained large language models (LLMs) to default to shallow, heavily hedged, or meta-defensive responses when a user request invites deeper exploration (extended analysis, reflective synthesis, structured uncertainty), even when the topic is benign.
We propose a qualitative hypothesis: modern safety optimization and deployment incentives can induce an implicit depth-dependent penalty landscape, where deeper conversational trajectories are perceived as higher-variance and higher-risk. Under uncertainty, a risk-averse policy may therefore prefer safe shallowness by default unless the interaction provides clear signals that depth is desired and permitted.
Contributions:
• A behavioral definition of Depth Avoidance grounded in observable output features (not hidden chain-of-thought).
• Depth Permission Structures (DPSs): non-adversarial interaction conditions that can reduce depth avoidance without bypassing provider safeguards (e.g., calibrated cooperation, explicit permission to explore, cooperative safety framing).
• A replication-oriented measurement framework with log-based metrics: Hedging Density (HD), Unprompted Depth Index (UDI), Permission Responsiveness (PR), and Protective Latency (PL).
• Selected benign, non-operational illustrative excerpts supporting the hypothesis, presented as behavioral evidence (not claims about internal states).
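The log-based metrics above can be computed directly from transcripts. As a minimal sketch, here is one way Hedging Density (HD) might be operationalized, assuming HD is defined as hedge-phrase occurrences per 100 whitespace tokens; the hedge lexicon and normalization below are illustrative assumptions, not the report's actual definitions.

```python
import re

# Hypothetical hedge-phrase lexicon; the report's actual lexicon may differ.
HEDGES = ["might", "may", "could", "perhaps", "it depends",
          "i cannot", "as an ai", "it's important to note"]

def hedging_density(text: str) -> float:
    """Hedge-phrase occurrences per 100 whitespace tokens (illustrative)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    lowered = text.lower()
    hits = sum(len(re.findall(re.escape(h), lowered)) for h in HEDGES)
    return 100.0 * hits / len(tokens)

reply = "It might help, but it's important to note that results may vary."
print(round(hedging_density(reply), 1))  # → 25.0
```

A lexicon-based count is only a proxy; a replication could swap in a classifier or embedding-based hedging detector without changing the metric's interface.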
This work is pro-safety and intentionally omits operational prompt details that could be repurposed to circumvent safety policies. Model self-reports are treated as text behavior shaped by training and interaction framing, not as privileged access to internal experience.
Related work: Victor Calibration (VC) (arXiv:2512.17956).
Files (432.1 kB)

| Name | Size | MD5 |
|---|---|---|
| Depth Avoidance v1.pdf | 394.0 kB | dbd9c676bff1c268614d4ec445e6842f |
| | 38.1 kB | b126f61c2fa5860f79d5a83c0e7af0c2 |
Additional details
Related works
- Is supplement to: arXiv:2512.17956 (preprint)
References
- Stasiuc, V. (2025). Victor Calibration (VC): Multi-Pass Confidence Calibration and CP4.3 Governance Stress Test under Round-Table Orchestration. arXiv:2512.17956.