Published October 23, 2025 | Version v178
Preprint | Open Access

TTQ: An Implementation-Neutral Solution to the Outer AGI Superalignment Problem

  • BigMother.AI

Description

The way in which AI (and particularly superintelligent AGI) develops over the coming decades may determine the fate of all humanity for all eternity. In order to maximise the net benefit of AGI for all humanity, without favouring any subset thereof, we imagine a Gold-Standard AGI that is maximally-aligned, maximally-validated, and maximally-superintelligent.

The first of these three properties, alignment, is traditionally decomposed into outer alignment (how do we define a final goal FG_S that correctly states what we want?) and inner alignment (how do we build an agent S that forever pursues FG_S as intended?). This paper addresses the former problem (outer alignment) under the assumption that S is superintelligent (hence "superalignment").

To that end, we formulate a final goal TTQ and a corresponding Outer Alignment Precondition OAP such that, if a goal-less superintelligent agent S^- satisfies OAP (irrespective of the specific technology used to implement S^-), then the final goal TTQ works as intended ("strives to maximise the net benefit of AGI for all humanity, without favouring any subset thereof"). That is, the superintelligent agent S (where S = S^- + TTQ) forever strives, to the best of its ability (which is at least that of any human), to behave in a manner that is at all times maximally aligned with a maximally fair aggregation of the individual idealised preferences (i.e. the actual, rational, well-informed, and freely-determined preferences) of all human beings, living or future. Thus the (hard) problem of building a maximally-aligned agentic superintelligence S is reduced to the (much easier) problem of building an OAP-compliant non-agentic superintelligence S^-.
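
Schematically, the reduction claim described above can be stated as a single implication. The following is a sketch only: the predicate names OAP(.) and MaxAligned(.) are shorthand introduced here for readability, not notation taken from the paper.

\forall S^{-} .\;\; \mathrm{OAP}(S^{-}) \;\Longrightarrow\; \mathrm{MaxAligned}(S^{-} + \mathrm{TTQ})

In words: establishing the precondition OAP for a goal-less, non-agentic superintelligence S^- is claimed to be sufficient for the composed agent S = S^- + TTQ to pursue TTQ as intended.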

Files

TTQ___Outer_AGI_Superalignment-DRAFT-v178.pdf (10.5 MB)
