Gold-Standard AGI: Outer AGI Superalignment
Description
The way in which AI (and, in particular, agentic superintelligent AGI) develops over the coming decades will determine the fate of all humanity for all eternity. In order to maximise the net benefit of AGI for all humanity, without favouring any subset thereof, we imagine a Gold-Standard AGI that is maximally-aligned and maximally-validated. The first of these properties, alignment, is traditionally decomposed into outer alignment (how do we define a final goal FG_G that correctly states what we want?) and inner alignment (how do we build an agent G that forever pursues FG_G as intended?). This paper presents a complete, foundational, and self-contained theory of AGI, culminating in an implementation-neutral solution to the outer AGI alignment problem in the case that G is superintelligent (hence "superalignment"). Given the AGI alignment problem's profound relevance to AGI governance, we adopt a pedagogic style throughout, so that the paper is accessible to less technical readers such as AGI policymakers. We envisage that the definitions of practical-maximal-alignment and practical-maximal-validation presented in this paper could form the basis of an international standard for Gold-Standard AGI certification. Competent certification authorities could then use this standard to formally certify AGI, such that only formally-certified Gold-Standard AGI systems could be lawfully deployed within each authority's jurisdiction.
Files
| Name | Size |
|---|---|
| TTQ___Outer_AGI_Superalignment___AIE-DRAFT-v217.pdf (md5:75c071319bd61364f9e7f3350c9b058c) | 5.1 MB |