Toward High-Assurance AI: Safety by Design for Autonomous Systems
Authors/Creators
Description
AI safety is a broad umbrella spanning misuse, malfunctions, human and social harms, and systemic risks. We address one cross-cutting weakness visible across many of these settings: the structural distance between what developers and deployers of AI systems claim about behavior and what outside parties can independently verify. We call this the integrity gap.
As AI systems move from advisory roles into physically embedded, agentic, and regulated deployments where failures are not recoverable, empirical evaluation, interpretability research, and process controls remain necessary but are no longer sufficient. Such settings increasingly require stronger evidence for bounded, safety-relevant claims made under explicit assumptions.
We argue for safety by design: a deployment-time evidence architecture that integrates formal safety specifications, verifiable computation, hardware attestation and provenance, privacy-preserving computation, and structured safety cases. Each primitive exists in partial form today; none is yet production-ready for frontier-scale deployment. We do not claim that assurance mechanisms resolve alignment or substitute for governance; we argue that they address an increasingly important and underdeveloped part of AI safety.
We make four contributions:
- We propose a layered architecture whose composed output is an assurance bundle that a third party can independently evaluate.
- We name the integrity gap as the deployment-time evidence failure this architecture targets.
- We introduce an assurance claim ladder that separates execution, bounded-property, and real-world-safety claims.
- We survey each layer’s maturity while framing a concrete research agenda for high-assurance AI.
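
To make the idea of an assurance bundle and a claim ladder more concrete, the following is a minimal illustrative sketch in Python. It is not the paper's design. The class names, the evidence kinds, and the `REQUIRED_EVIDENCE` mapping from ladder rungs to evidence are all assumptions made up for this example; only the three rung names (execution, bounded-property, real-world-safety) come from the description above. The sketch shows one plausible reading: a third party marks each evidence item as verified, and a claim at a given rung is supported only if every lower rung is supported as well.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional, Set, Tuple


class ClaimLevel(IntEnum):
    """Rungs of the claim ladder, ordered from weakest to strongest."""
    EXECUTION = 1
    BOUNDED_PROPERTY = 2
    REAL_WORLD_SAFETY = 3


# Hypothetical mapping (an assumption of this sketch, not taken from the paper):
# which evidence kinds must be independently verified to support each rung.
REQUIRED_EVIDENCE = {
    ClaimLevel.EXECUTION: {"hardware_attestation", "verifiable_computation"},
    ClaimLevel.BOUNDED_PROPERTY: {"formal_specification"},
    ClaimLevel.REAL_WORLD_SAFETY: {"safety_case"},
}


@dataclass(frozen=True)
class Evidence:
    kind: str                          # e.g. "hardware_attestation"
    assumptions: Tuple[str, ...] = ()  # explicit assumptions the evidence relies on
    verified: bool = False             # set by the third party after its own check


@dataclass
class AssuranceBundle:
    system_id: str
    evidence: List[Evidence] = field(default_factory=list)

    def verified_kinds(self) -> Set[str]:
        """Evidence kinds that the third party has independently verified."""
        return {e.kind for e in self.evidence if e.verified}


def highest_supported_claim(bundle: AssuranceBundle) -> Optional[ClaimLevel]:
    """Climb the ladder rung by rung; stop at the first unsupported rung."""
    have = bundle.verified_kinds()
    supported = None
    for level in ClaimLevel:  # iterates in definition order, weakest first
        if REQUIRED_EVIDENCE[level] <= have:
            supported = level
        else:
            break
    return supported


bundle = AssuranceBundle(
    system_id="example-controller",  # hypothetical identifier
    evidence=[
        Evidence("hardware_attestation", verified=True),
        Evidence("verifiable_computation", verified=True),
        Evidence("formal_specification", ("sensor noise is bounded",), verified=False),
    ],
)
print(highest_supported_claim(bundle))
```

In this reading, an unverified formal specification caps the bundle at the execution rung, and a verified safety case alone supports nothing if the lower rungs are missing. Whether rungs should be strictly cumulative in this way is a design choice of the sketch.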
Files
Toward High-Assurance AI Safety by Design for Autonomous Systems.pdf (173.2 kB, md5:2a4fc4e0c79bd347b69cbdbf15345c13)