Beyond Confidence Scores: Why Multi-Dimensional Quality Assessment Is Required for Transparent AI Systems
Description
Large language model systems that report a single confidence score to indicate output quality obscure which aspects of generation succeeded or failed. This opacity prevents users from making informed trust decisions and prevents engineers from performing targeted debugging. We argue that quality must be decomposed into orthogonal dimensions measuring independent concerns, rather than aggregated into an opaque numerical score.
We present the case for multi-dimensional quality assessment as a foundational requirement for trustworthy AI systems. By measuring source credibility, claim support, and policy compliance independently, systems can provide transparency that enables appropriate trust calibration and actionable engineering diagnostics. This paper examines why aggregate metrics are structurally insufficient, what properties dimensional quality must satisfy, and why this represents a necessary evolution in how AI system quality is conceptualized and communicated.
The core insight: quality is multifaceted, and different quality failures require different responses. A single number cannot communicate which aspects of output are trustworthy and which are not. Dimensional decomposition transforms quality from an opaque assertion into transparent, actionable information.
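To make the idea of dimensional decomposition concrete, the following is a minimal sketch of what a multi-dimensional quality report might look like in code. The class name, dimension names, and threshold are illustrative assumptions, not an implementation from the paper; the point is that each dimension is scored and inspected independently, so a failure maps to a specific remediation rather than a single opaque number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityReport:
    """Quality decomposed into orthogonal dimensions, each scored in [0, 1].

    The three dimensions mirror those named in the abstract:
    source credibility, claim support, and policy compliance.
    """
    source_credibility: float
    claim_support: float
    policy_compliance: float

    def failing_dimensions(self, threshold: float = 0.7) -> list[str]:
        """Names of dimensions below threshold; each implies a distinct fix."""
        return [name for name, score in vars(self).items() if score < threshold]


# An output with strong sources and compliant phrasing, but weak claim support:
report = QualityReport(
    source_credibility=0.9,
    claim_support=0.4,
    policy_compliance=0.95,
)
print(report.failing_dimensions())  # → ['claim_support']
```

An aggregate of these scores (e.g. their mean, 0.75) would pass a typical threshold and hide the claim-support failure entirely; the decomposed report surfaces it directly.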
Files
- Beyond Confidence Scores_ Why Multi-Dimensional Quality Assessment Is Required for Transparent AI Systems.pdf (147.5 kB, md5:9aad7dda6c06114f6f6bbb00963ffefd)