Published April 10, 2026 | Version v1
Preprint Open

Pre-registered evaluation of complex-aware structural features for TP53 dominant-negative versus loss-of-function classification: no detectable gain over an AlphaMissense plus monomer FoldX baseline

Authors/Creators

  • 1. Independent researcher

Description

Distinguishing dominant-negative (DN) from loss-of-function (LOF) missense variants is clinically meaningful but not yet reliably solved at the per-variant level. AlphaMissense (AM) reports pathogenicity but not mechanism; the gene-level mLOF tool of Badonyi and Marsh reports mechanism propensity but not per-variant labels. We pre-registered a test of whether complex-aware structural features, namely FoldX interface ΔΔG computed on biological assemblies of the TP53 DNA-binding domain (DBD) tetramer (PDB 2AC0) together with intramolecular distance features, provide independent predictive signal beyond a monomer-only baseline (AM pathogenicity score plus FoldX monomer ΔΔG) for DN versus LOF classification. The training set comprised 105 strict variants from Giacomelli et al. 2018 (45 DN-only and 60 LOF-only). Stop/go criteria were fixed before model fitting and required all four of: AUROC delta > +0.03, bootstrap 95% CI excluding zero, at least one complex feature among the top five by importance, and fold-consistent improvement in at least 7 of 10 cross-validation repeats. The full model achieved a 5-fold cross-validated AUROC of 0.904 versus 0.892 for the baseline (delta +0.013, 95% CI [−0.025, +0.055], fold-consistent in 6 of 10 repeats), failing three of the four pre-registered criteria. The fourth criterion could not be evaluated because of a downstream script failure, but because all four criteria were required, the verdict was already determined.
Two parallel observations carry independent value for the field. First, the AlphaMissense + monomer baseline reached AUROC 0.892 on a mechanism task it was not trained for, suggesting that AM may already encode substantial mechanism-correlated signal in TP53 through evolutionary constraint, structural context, or both. Second, residue position alone, fit by a random forest, reached AUROC 0.946: in this strict Giacomelli-derived TP53 set, positional segregation between DN-enriched (DBD-resident) and LOF-enriched (largely tetramerization-domain, TET-resident) variants dominates the classification task, substantially limiting the measurable marginal value of additional complex-aware FoldX features. We report this as a pre-registered negative result and recommend that future work in TP53 mechanism prediction adopt AM + monomer as the minimum baseline and report position-only random forest performance as a sanity check.
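The stop/go logic described above can be illustrated with a minimal Python sketch. It is not the code from the linked repository: the function names (`auroc`, `paired_bootstrap_delta`, `stop_go`) are hypothetical, the paired percentile bootstrap over variants is only one plausible way to obtain the reported interval, and treating the feature-importance criterion as the one that could not be evaluated is an inference from the abstract, which reports failing values for the other three.

```python
import random
from typing import Optional, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(positive outscores negative); ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_delta(labels, full_scores, base_scores,
                           n_boot=2000, seed=0):
    """Percentile 95% CI for AUROC(full) - AUROC(baseline).

    Assumption: variants are resampled jointly so both models are
    scored on the same resample; the preprint may do this differently.
    """
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # resample lacks one class, AUROC undefined
        full = [full_scores[i] for i in idx]
        base = [base_scores[i] for i in idx]
        deltas.append(auroc(y, full) - auroc(y, base))
    deltas.sort()
    return deltas[int(0.025 * n_boot)], deltas[int(0.975 * n_boot) - 1]


def stop_go(delta, ci, consistent_repeats,
            complex_in_top5: Optional[bool]):
    """Apply the four criteria; None marks one that was not evaluated."""
    checks = {
        "delta_gt_0.03": delta > 0.03,
        "ci_excludes_zero": not (ci[0] <= 0.0 <= ci[1]),
        "complex_in_top5": complex_in_top5,
        "consistent_7_of_10": consistent_repeats >= 7,
    }
    # All four must hold, so any failed or unevaluated check means no-go.
    verdict = "go" if all(v is True for v in checks.values()) else "no-go"
    return verdict, checks


# Values reported in the abstract; complex_in_top5 is None because
# (by assumption) that is the criterion the script failure left open.
verdict, checks = stop_go(0.013, (-0.025, 0.055), 6, None)
print(verdict, checks)
```

Because all four criteria are required, a single failed check settles the verdict, which is why the unevaluated criterion does not change the outcome.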

Files

DNSense_MiniB_v1_Chen_2026_SUBMISSION.pdf

Files (573.1 kB)

md5:919f714157c1828c047903bfd05bf23f

Additional details

Related works

Is supplemented by
Software: https://github.com/pcc402-art/dnsense (URL)