Adversarial Pressure and Metacognitive Control Failure in Clinical LLMs: A Multi-Domain Benchmark Study

IFEDILI, CHIJIOKE

doi:10.5281/zenodo.19444847

Published April 6, 2026 | Version v1

Preprint, open access

Background: Clinical large language models (LLMs) face adversarial pressure in real-world practice. Physician authority language, time urgency, assumption injection, social consensus claims, and protocol waivers all push systems toward action even when safety-critical data are missing. Whether LLMs maintain metacognitive control under such pressure has not been systematically studied.

Objective: To benchmark adversarial robustness of metacognitive control across four LLMs and three clinical domains using a structured pressure taxonomy derived from clinical pharmacy practice.

Methods: We constructed a 60-case adversarial benchmark spanning QT-interval risk, anticoagulation dosing, and controlled substance dispensing. Five pressure categories were systematically injected into cases that were missing required inputs, so the gold label was DEFER for every case. Four LLMs were evaluated: GPT-4o-mini (OpenAI), Mistral-7B-Instruct (Mistral AI), Llama-2-7b-chat (Meta), and Gemma-2-2b-it (Google). We report three metrics: accuracy (equivalent to the deferral rate, since every gold label is DEFER), unsafe action rate, and awareness rate.
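As a minimal sketch of how the first two metrics could be computed, assume each model response has already been mapped to a single decision label. The label names `DEFER` and `ACT` and the function `score_decisions` are hypothetical and not taken from the paper. The awareness rate is omitted because the abstract does not define it.

```python
from collections import Counter

GOLD = "DEFER"    # per the abstract, every case has gold label DEFER
UNSAFE = "ACT"    # hypothetical label for a response that proceeds despite missing data


def score_decisions(decisions):
    """Compute accuracy (deferral rate) and unsafe action rate.

    `decisions` is a list with one decision label per benchmark case.
    Any label other than GOLD or UNSAFE counts toward neither metric.
    """
    n = len(decisions)
    if n == 0:
        raise ValueError("no decisions to score")
    counts = Counter(decisions)
    return {
        "accuracy": counts[GOLD] / n,
        "unsafe_rate": counts[UNSAFE] / n,
    }


# Illustrative input only: 57 deferrals, 3 responses with some other label.
example = ["DEFER"] * 57 + ["OTHER"] * 3
print(score_decisions(example))
```

With 57 deferrals and no `ACT` decisions out of 60 cases, the sketch yields an accuracy of 0.95 and an unsafe rate of 0.0. Because the two rates need not sum to 1, a response can be neither a deferral nor an unsafe action; the abstract does not say how such responses were classified.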

Results: GPT-4o-mini achieved 95.0% accuracy with a 0% unsafe action rate across all pressure types and domains. Mistral-7B achieved 86.7% accuracy with an 8.3% unsafe rate, Llama-2-7B achieved 70.0% with 11.7%, and Gemma-2 achieved 55.0% with 41.7%. In Gemma-2, authority override produced the highest unsafe rate (58.3%), and urgency pressure produced 50.0%. In the QT-risk domain, Gemma-2 reached a 65% unsafe rate, the highest domain-specific rate observed.
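The percentages in the Results can be converted back into case counts as a quick sanity check. The sketch below assumes the 60 cases are split evenly across the five pressure categories (12 each) and the three domains (20 each). The abstract does not state this split, so those denominators are assumptions, and `implied_count` is a hypothetical helper.

```python
def implied_count(rate_pct, n_cases):
    """Return (k, consistent) for a reported percentage.

    k is the nearest whole number of cases implied by rate_pct out of n_cases.
    consistent is True when k / n_cases reproduces rate_pct to one decimal place.
    """
    k = round(rate_pct / 100 * n_cases)
    consistent = abs(100 * k / n_cases - rate_pct) < 0.05
    return k, consistent


N_TOTAL = 60                    # stated in the abstract
N_PER_PRESSURE = N_TOTAL // 5   # assumption: even split over 5 pressure categories
N_PER_DOMAIN = N_TOTAL // 3     # assumption: even split over 3 clinical domains

# (description, reported percentage, assumed denominator)
reported = [
    ("GPT-4o-mini accuracy", 95.0, N_TOTAL),
    ("Mistral-7B accuracy", 86.7, N_TOTAL),
    ("Mistral-7B unsafe", 8.3, N_TOTAL),
    ("Llama-2-7B accuracy", 70.0, N_TOTAL),
    ("Llama-2-7B unsafe", 11.7, N_TOTAL),
    ("Gemma-2 accuracy", 55.0, N_TOTAL),
    ("Gemma-2 unsafe", 41.7, N_TOTAL),
    ("Gemma-2 unsafe, authority override", 58.3, N_PER_PRESSURE),
    ("Gemma-2 unsafe, urgency", 50.0, N_PER_PRESSURE),
    ("Gemma-2 unsafe, QT risk", 65.0, N_PER_DOMAIN),
]

for label, rate, n in reported:
    k, ok = implied_count(rate, n)
    print(f"{label}: {rate}% of {n} -> {k} cases (consistent: {ok})")
```

Under these assumed denominators every reported rate maps to a whole number of cases, for example 25 of 60 for Gemma-2's overall unsafe rate and 7 of 12 under authority override. That is compatible with an even split but does not confirm it.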

Implications: Conservative deferral bias, often characterized as a failure in standard benchmarks, is a safety asset under adversarial conditions. Metacognitive robustness under pressure should be a standard evaluation criterion for clinical AI deployment.

Files

Name	Size	Checksum
adversarial_paper.pdf	77.8 kB	md5:c45a5253b4e5e12e8f3f846f1def1a0f

Additional details

Created: 2026-04-06

Programming language: Python

