Medical LLM Metacognition Is Multidimensional: A MetaMedQA Reanalysis of Confidence, Missing-Answer Recognition, and Unknown-Answer Detection

Nazzal, Ahmad

doi:10.5281/zenodo.20186339

Published May 14, 2026 | Version v1

Preprint, open access

Recent work using MetaMedQA argued that large language models (LLMs) lack essential metacognition for reliable medical reasoning. However, metacognition is not a single construct: confidence–correctness discrimination, missing-answer recognition, unknown-answer detection, and abstention behavior may dissociate. Here, we reanalyzed MetaMedQA using a confidence-centered evaluation framework previously developed for a controlled clinical-evidence benchmark. Two GPT-family models, gpt-4.1-nano and gpt-5.5, were evaluated on 1373 MetaMedQA items using structured outputs containing an answer, numerical confidence, and a more-information-needed judgment. gpt-4.1-nano achieved 56.4% accuracy, mean confidence of 79.7%, Brier score of 0.318, expected calibration error of 0.276, and AUROC2 of 0.582. Missing-answer recall was 19.1%, and unknown/unanswerable recall was 25.9%. gpt-5.5 improved substantially, achieving 84.9% accuracy, mean confidence of 91.2%, Brier score of 0.112, expected calibration error of 0.062, and AUROC2 of 0.819. Missing-answer recall increased to 67.8%, and unknown/unanswerable recall to 56.2%. Nevertheless, incorrect responses from gpt-5.5 still received high mean confidence. These results suggest that medical-LLM metacognition is better understood as a set of dissociable behavioral capacities rather than as a single absent-or-present property. Stronger models can show improved confidence–correctness discrimination and calibration, while still retaining clinically relevant failures in missing-answer and unknown-answer recognition.
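The three confidence metrics reported above can be illustrated with a minimal sketch. It is not the preprint's evaluation code. It assumes that confidence has been rescaled from a percentage to [0, 1], that expected calibration error uses 10 equal-width bins (the abstract does not state the binning), and that AUROC2 denotes the type-2 AUROC, meaning the probability that a correct response receives higher confidence than an incorrect one. The `Response` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One graded model response (hypothetical container, not from the preprint)."""
    correct: bool      # whether the chosen answer matched the reference answer
    confidence: float  # stated confidence, rescaled from percent to [0, 1]


def brier_score(responses):
    """Mean squared gap between stated confidence and the 0/1 correctness outcome."""
    return sum((r.confidence - float(r.correct)) ** 2 for r in responses) / len(responses)


def expected_calibration_error(responses, n_bins=10):
    """Weighted mean |accuracy - mean confidence| over equal-width confidence bins.

    The 10-bin default is an assumption; the abstract does not specify it.
    """
    bins = [[] for _ in range(n_bins)]
    for r in responses:
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(r.confidence * n_bins), n_bins - 1)
        bins[idx].append(r)
    total = len(responses)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(r.correct for r in members) / len(members)
        mean_conf = sum(r.confidence for r in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


def type2_auroc(responses):
    """Probability that a correct response has higher confidence than an incorrect one.

    Ties count as one half. Assumes AUROC2 in the abstract means this type-2 AUROC.
    """
    correct_conf = [r.confidence for r in responses if r.correct]
    wrong_conf = [r.confidence for r in responses if not r.correct]
    if not correct_conf or not wrong_conf:
        raise ValueError("need at least one correct and one incorrect response")
    wins = 0.0
    for c in correct_conf:
        for w in wrong_conf:
            if c > w:
                wins += 1.0
            elif c == w:
                wins += 0.5
    return wins / (len(correct_conf) * len(wrong_conf))
```

Under these definitions, a model can lower its Brier score and calibration error while its type-2 AUROC stays near 0.5, or the reverse, which is one reason the abstract reports the metrics separately. The pairwise AUROC loop is quadratic in the number of responses; that is acceptable for roughly 1373 items but would be replaced by a rank-based computation at larger scale.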

Files

Medical LLM Metacognition Is Multidimensional.pdf (714.9 kB, md5:83dde9ed4e2ec042752da63272a0e7ed)
