Relativistic Budget Allocation in Protein Folding: The Lorentz-Form Latency Law

Kulik, Dean

doi:10.5281/zenodo.18663507

Published February 16, 2026 | Version v1

Thesis Open

Abstract

We introduce the Sarrus Linkage, a sequence-only observable that predicts two-state protein folding rates from amino acid arrangement beyond composition. The feature is the difference between helix-lag and sheet-lag autocorrelation z-scores, each measured against a composition-preserving shuffle null. On a benchmark of 30 two-state folders from the Ivankov dataset (all proteins included, none skipped), the Sarrus Linkage correlates with ln(k_f) at r = 0.54 (permutation p = 0.002, 10,000 permutations). The correlation is robust: partial r = 0.57 controlling for sequence length, jackknife relative variation of 3.6% with no influential proteins, and leave-one-out cross-validated R² = 0.19. The same predictor applied to multi-state folders yields r ≈ 0.002, indicating selectivity for cooperative folding. We further show that a Lorentz-form latency function, ln(k_f) ∼ ½ln(1 − σ²), where σ ∈ (0,1) is the rank-normalized Sarrus Linkage, fits the data better than a linear model by every metric: AIC (61.4 vs 63.5), LOO R² (0.24 vs 0.19), and in-sample r (0.59 vs 0.54). We interpret this geometry through a budget-allocation framework in which a protein’s folding rate is governed by how it partitions a finite constraint budget between exploration (entropy) and collapse (structure). The Lorentz form emerges naturally when this budget obeys an isotropic quadratic constraint. The framework requires no structural databases, molecular dynamics, or machine learning. It runs on any hardware in milliseconds per protein and produces a deterministic, auditable result.

1. Introduction

The protein folding problem has two faces. The first—predicting the three-dimensional structure from sequence—has been substantially addressed by deep learning approaches such as AlphaFold, which reconstruct coordinates from evolutionary covariance patterns in multiple sequence alignments. The second—predicting folding kinetics—remains largely open. Why do some proteins fold in microseconds while others require seconds? Why do some fold cooperatively (two-state) while others populate intermediates?

The most successful empirical predictor of two-state folding rates is relative contact order (CO), which requires knowledge of the native structure. CO correlates with ln(k_f) at |r| ≈ 0.7–0.8 across standard benchmarks. However, CO is not a sequence-only predictor: it requires a solved or predicted structure. A purely sequence-derived predictor of folding rates would be both practically useful and theoretically informative, revealing what information about the folding process is encoded directly in the linear sequence.

Here we demonstrate that a simple quantity—the differential between helix-lag and sheet-lag autocorrelation of a hydrophobicity signal, z-scored against composition-preserving shuffles—contains statistically significant information about two-state folding rates. We call this quantity the Sarrus Linkage. We further show that its relationship to folding rate follows a Lorentz-form latency law, consistent with a finite budget allocation between entropic exploration and structural collapse.

2. Methods

2.1 Feature Definition (Pre-registered)

All parameters were fixed before examining outcomes. Given an amino acid sequence a_1, …, a_N, we map each residue to a scalar using the Miyazawa–Jernigan (MJ) inter-residue contact energy scale. The centered signal s_i = MJ(a_i) − m, where m is the mean of MJ(a_i) over the sequence, is used to compute normalized autocorrelation at structural lags. Autocorrelation uses total-energy normalization: ACF(ℓ) = Σ_{i=1}^{N−ℓ} s_i s_{i+ℓ} / Σ_{i=1}^{N} s_i². The helix observable H is the mean of ACF at lags 3 and 4 (bracketing the 3.6 residues/turn of α-helices). The sheet observable, written B here to avoid a clash with the Sarrus Linkage S defined below, is the ACF at lag 2 (the alternating pattern of β-strands); its z-score is denoted Z_S.

To isolate arrangement from composition, we generate 1,000 composition-preserving shuffles per protein. Each shuffle permutes the amino acid list (not the signal array) and recomputes both autocorrelation values. Shuffles are deterministically seeded using MD5(sequence) mod 2³² with NumPy’s default_rng, ensuring reproducibility across platforms. Z-scores are computed using population standard deviation (ddof = 0): Z_H = (H − μ_H) / σ_H and analogously for Z_S. The Sarrus Linkage is defined as S = Z_H − Z_S.
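The feature definition can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the locked pipeline in nexus_definitive.py: the residue scale is passed in as a dictionary (TOY_SCALE is a hypothetical stand-in, not the Miyazawa–Jernigan values), the MD5 digest is read as a hexadecimal integer before taking mod 2³², and each shuffle uses rng.permutation. The order of RNG calls in the original script is not stated, so exact z-scores may differ.

```python
import hashlib

import numpy as np

# Hypothetical toy scale for illustration only; NOT the Miyazawa-Jernigan values.
TOY_SCALE = {"L": 2.0, "A": 1.0, "K": -1.0, "E": -1.5}


def centered_signal(residues, scale):
    """s_i = scale(a_i) minus the mean of scale(a_i) over the sequence."""
    x = np.array([scale[a] for a in residues], dtype=float)
    return x - x.mean()


def acf(s, lag):
    """Total-energy-normalized autocorrelation: sum_i s_i s_{i+lag} / sum_i s_i^2."""
    energy = float(np.dot(s, s))
    if energy == 0.0 or lag >= len(s):
        return 0.0
    return float(np.dot(s[:-lag], s[lag:])) / energy


def observables(residues, scale):
    """(H, B): helix = mean ACF at lags 3 and 4; sheet = ACF at lag 2."""
    s = centered_signal(residues, scale)
    return 0.5 * (acf(s, 3) + acf(s, 4)), acf(s, 2)


def md5_seed(seq):
    """Per-protein seed: MD5 of the sequence, read as a hex integer, mod 2**32 (assumed encoding)."""
    return int(hashlib.md5(seq.encode("ascii")).hexdigest(), 16) % 2**32


def sarrus_linkage(seq, scale, n_shuffles=1000):
    """S = Z_H - Z_S, z-scored against composition-preserving shuffles of the residue list."""
    residues = list(seq)
    observed = np.array(observables(residues, scale))
    rng = np.random.default_rng(md5_seed(seq))
    null = np.empty((n_shuffles, 2))
    for k in range(n_shuffles):
        # Permute the residues themselves (not the signal), then recompute both ACFs.
        shuffled = [residues[i] for i in rng.permutation(len(residues))]
        null[k] = observables(shuffled, scale)
    # Population standard deviation (ddof=0); a sequence of identical residues would divide by zero.
    z_helix, z_sheet = (observed - null.mean(axis=0)) / null.std(axis=0, ddof=0)
    return float(z_helix - z_sheet)


print(observables("LKLKLKLK", TOY_SCALE))
print(sarrus_linkage("LKAELLKAEKLAELKK", TOY_SCALE))
```

Because the signal is centered once per sequence and every shuffle keeps the same multiset of residues, the denominator Σ s_i² is identical across shuffles; only the ordering enters the null distribution.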

2.2 Dataset and Domain Enforcement

We use the 30 two-state proteins from the Ivankov et al. benchmark, supplemented by 16–18 multi-state folders for selectivity testing. For each protein, the analyzed sequence must match the kinetic construct used to measure k_f. Where PDB entries contain extra domains, fusion tags, or chain fragments that differ from the experimental construct by more than 10% in length, we apply curated domain overrides (13 of 30 proteins). The remaining 17 sequences are fetched from RCSB and pass the 10% length tolerance. A complete audit table with status (FETCH vs OVERRIDE), sequence lengths, and z-score diagnostics accompanies every run.
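The 10% length rule reduces to a one-line check. The function name and the use of the kinetic-construct length as the denominator are assumptions for illustration; the text does not specify which length the tolerance is relative to.

```python
def construct_status(pdb_length, construct_length, tolerance=0.10):
    """'FETCH' if the PDB-derived length is within tolerance of the kinetic construct, else 'OVERRIDE'."""
    if construct_length <= 0:
        raise ValueError("construct_length must be positive")
    relative_difference = abs(pdb_length - construct_length) / construct_length
    return "FETCH" if relative_difference <= tolerance else "OVERRIDE"


print(construct_status(64, 60))  # 4/60 is about 6.7%, within tolerance
print(construct_status(90, 60))  # 30/60 = 50%, needs a curated override
```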

2.3 Statistical Tests

Four locked tests: (1) Pearson correlation between S and ln(k_f); (2) permutation p-value for |r| (10,000 permutations of ln(k_f), preserving marginal distributions); (3) partial correlation controlling for ln(sequence length); (4) leave-one-out cross-validation R² for linear regression of ln(k_f) on S.
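The four locked tests can be sketched with NumPy alone. Assumptions: the permutation p-value uses the (hits + 1)/(n_perm + 1) convention, the partial correlation residualizes both variables on ln(length) with an intercept, and LOO R² uses the full-sample mean of ln(k_f) in its denominator. The original script may use different conventions (for example SciPy routines), so small numerical differences are possible.

```python
import numpy as np


def pearson_r(x, y):
    return float(np.corrcoef(x, y)[0, 1])


def permutation_p(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for |r|, shuffling y (ln k_f) against fixed x."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    r_obs = abs(pearson_r(x, y))
    hits = sum(abs(pearson_r(x, rng.permutation(y))) >= r_obs for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def residualize(v, z):
    """Residuals of an OLS fit of v on [1, z]."""
    Z = np.column_stack([np.ones(len(z)), z])
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return np.asarray(v, dtype=float) - Z @ beta


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z (e.g., ln L)."""
    return pearson_r(residualize(x, z), residualize(y, z))


def loo_r2(x, y):
    """Leave-one-out R^2 for the simple linear regression y ~ a + b x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b, a = np.polyfit(x[keep], y[keep], 1)  # slope, intercept
        preds[i] = a + b * x[i]
    return float(1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2))
```

Usage: with S, ln_kf, and ln_L as length-30 arrays, call pearson_r(S, ln_kf), permutation_p(S, ln_kf), partial_r(S, ln_kf, ln_L), and loo_r2(S, ln_kf).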

2.4 Lorentz Bridge

We test whether the relationship between S and ln(k_f) is better described by a Lorentz-form latency function than a linear model. The Sarrus values are mapped to σ ∈ (0,1) via rank normalization (assumption-free, monotone). The Lorentz term is ½ln(1 − σ²). We compare linear and Lorentz models by AIC, in-sample r, and LOO-CV R².
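A sketch of the Lorentz bridge, assuming σ = rank/(n + 1), which lands strictly inside (0, 1), and ordinary least squares with Gaussian errors for the AIC. The text does not say whether ranks are taken in ascending or descending order of S; because ½ln(1 − σ²) decreases as σ grows, the direction changes the curvature of the fit even though the regression slope absorbs the sign. Both models have the same number of parameters, so counting k as 2 (or 3 with the error variance) does not change the AIC difference.

```python
import numpy as np


def rank_to_unit_interval(values):
    """Rank-normalize to (0, 1) via rank / (n + 1); ties are broken by position (assumption)."""
    v = np.asarray(values, dtype=float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="stable")] = np.arange(1, len(v) + 1)
    return ranks / (len(v) + 1)


def lorentz_term(sigma):
    """The Lorentz latency term 1/2 * ln(1 - sigma**2), defined for |sigma| < 1."""
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.log1p(-sigma**2)


def ols_aic(x, y):
    """AIC of y ~ a + b x under Gaussian errors: n ln(RSS / n) + 2k, with k = 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]


def compare_models(S, ln_kf):
    """AIC of the linear model (on S) and the Lorentz model (on 1/2 ln(1 - sigma^2))."""
    sigma = rank_to_unit_interval(S)
    return {
        "aic_linear": ols_aic(S, ln_kf),
        "aic_lorentz": ols_aic(lorentz_term(sigma), ln_kf),
    }
```

To try the opposite rank direction, pass -S; the linear model's AIC is unchanged by the sign flip, while the Lorentz model's is not.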

3. Results

3.1 Primary Validation

Metric	Value	Significance
Pearson r (S vs ln(kf))	0.5436	p = 1.9 × 10⁻³
Permutation p (|r|, 10,000 perms)	0.0019	< 0.01
Partial r (controlling ln L)	0.5714	p = 9.7 × 10⁻⁴
LOO-CV R² (linear)	0.188
Lorentz r (½ln(1−σ²) vs ln(kf))	0.5851	p = 6.8 × 10⁻⁴
LOO-CV R² (Lorentz)	0.239
AIC linear / Lorentz	63.5 / 61.4	Lorentz wins
Multi-state r	0.002	p = 0.99 (flat)
Contact order r (benchmark)	−0.746	p = 2.2 × 10⁻⁶
Jackknife stability	±3.6%	No influential proteins

Table 1. Summary statistics for the Sarrus Linkage on the Ivankov two-state benchmark (n = 30).

The Sarrus Linkage predicts two-state folding rates at r = 0.54 (Table 1). The permutation test (p = 0.0019) shows that a correlation this strong is unlikely under random pairing of Sarrus values with ln(k_f); because each Sarrus value is already z-scored against composition-preserving shuffles, the association reflects amino acid arrangement rather than mere amino acid content. The partial correlation increases when controlling for sequence length (0.57 vs 0.54), indicating that length partially masks the association. The jackknife analysis confirms that no single protein drives the result: removing any one protein changes r by less than 0.05 (3.6% relative variation).
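The jackknife check can be sketched as follows. The text does not define "relative variation"; here it is assumed to be the standard deviation of the leave-one-out r values divided by their mean magnitude, and the function name is hypothetical.

```python
import numpy as np


def jackknife_r(x, y):
    """Full-sample r, the largest |change| in r from dropping one point, and the relative spread."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_full = float(np.corrcoef(x, y)[0, 1])
    r_loo = np.array(
        [np.corrcoef(np.delete(x, i), np.delete(y, i))[0, 1] for i in range(len(x))]
    )
    max_change = float(np.max(np.abs(r_loo - r_full)))
    rel_spread = float(r_loo.std(ddof=0) / abs(r_loo.mean()))  # assumed definition
    return r_full, max_change, rel_spread
```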

3.2 Selectivity for Cooperative Folding

When applied to 16 multi-state folders from the same benchmark, the Sarrus Linkage yields r = 0.002 (p = 0.99). The predictor is not simply weak for multi-state proteins; it is entirely flat. This selectivity is informative: two-state folders have a single dominant barrier (one “stack trace” in the computational analogy), making a single scalar sufficient. Multi-state folders have branched pathways with intermediates, requiring multiple scalars to describe their kinetics. The Sarrus Linkage captures the coherence of the dominant constraint, which exists only when folding is cooperative.

3.3 The Lorentz Bridge

The relationship between folding rate and constraint coherence is better described by a Lorentz-form function than a linear model. Using rank-based normalization to map S to σ ∈ (0,1), the Lorentz term ½ln(1 − σ²) achieves higher correlation (r = 0.585 vs 0.543), lower AIC (61.4 vs 63.5), and higher out-of-sample prediction accuracy (LOO R² = 0.239 vs 0.188, a 27% improvement). The Lorentz form wins every metric.
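For two single-predictor OLS fits on the same n points with equal parameter counts, ΔAIC = n ln(RSS_lin/RSS_lor), and each RSS equals (1 − r²) times the total sum of squares, so the correlations alone fix the AIC gap. A quick arithmetic check with the Table 1 values, assuming Gaussian-likelihood AIC (the text does not state the likelihood), reproduces the reported gap and the 27% LOO improvement:

```python
import math

n = 30
r_linear, r_lorentz = 0.5436, 0.5851      # Table 1
loo_linear, loo_lorentz = 0.188, 0.239    # Table 1

# Equal-k OLS fits: AIC_lin - AIC_lor = n * ln((1 - r_lin^2) / (1 - r_lor^2))
delta_aic = n * math.log((1 - r_linear**2) / (1 - r_lorentz**2))
loo_gain = (loo_lorentz - loo_linear) / loo_linear

print(f"AIC gap from r alone: {delta_aic:.2f} (reported 63.5 - 61.4 = 2.1)")
print(f"Relative LOO R^2 gain: {loo_gain:.0%}")
```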

This functional form has a natural interpretation. If a protein allocates a finite budget between entropic exploration (σ) and structural collapse (ρ), subject to an isotropic quadratic constraint σ² + ρ² = 1, then the folding rate scales as ρ = √(1 − σ²) and the log-rate as ½ln(1 − σ²). This is formally identical to the Lorentz factor of special relativity, arising from the same mathematical structure: a finite capacity split between competing demands under rotational symmetry. We emphasize that this is an analogy grounded in shared geometry, not a claim about relativistic physics in proteins.
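The chain of identities in the preceding paragraph, written out; the identification σ ↔ v/c is the analogy the text draws, not a physical claim, and a, b are the intercept and slope of the fitted model:

```latex
\sigma^2 + \rho^2 = 1 \;\Rightarrow\; \rho = \sqrt{1-\sigma^2},
\qquad \ln \rho = \tfrac{1}{2}\ln\!\left(1-\sigma^2\right),
\qquad \gamma \equiv \frac{1}{\rho} = \frac{1}{\sqrt{1-\sigma^2}}
\quad \left(\text{cf. } \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}\right),
\qquad \ln k_f = a + b \cdot \tfrac{1}{2}\ln\!\left(1-\sigma^2\right)
```

With b > 0, ln k_f falls without bound as σ → 1, which is the divergence of folding time described in Section 4.3.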

Figure 1. Six-panel diagnostic. (A) Primary: Sarrus Linkage vs ln(kf) for 30 two-state folders. (B) Lorentz bridge: rank-based σ mapping with Lorentz curve overlay. (C) LOO-CV: linear vs Lorentz out-of-sample prediction. (D) Spectrum: two-state (blue), multi-state (orange) overlaid. (E) Contact order benchmark. (F) Cross-domain γ curve.

4. Discussion

4.1 What the Sarrus Linkage Measures

The Sarrus Linkage is not a propensity score. It measures the differential between helix-period and sheet-period autocorrelation in the hydrophobicity signal, z-scored against a null model that preserves composition but destroys arrangement. Positive values indicate that helix-lag coherence exceeds what composition alone would predict, relative to sheet-lag coherence. Negative values indicate the reverse. Near-zero values indicate that the observed autocorrelation is explained entirely by composition.

The shuffle null is the methodological core. Without it, the autocorrelation would confound arrangement with composition: proteins rich in hydrophobic residues would show high autocorrelation at all lags simply because their signal has large amplitude. By z-scoring against shuffles, we isolate the contribution of residue ordering—the “verb” (how residues are arranged) rather than the “noun” (which residues are present). This distinction matters: two proteins with identical amino acid composition but different sequences can have dramatically different Sarrus values.

4.2 Comparison with Contact Order

Contact order achieves |r| = 0.75 on this dataset, substantially higher than the Sarrus Linkage’s r = 0.54. This is expected: CO uses knowledge of the native three-dimensional structure, while the Sarrus Linkage uses only the linear sequence. The relevant comparison is not performance but information source. CO tells us that proteins with more long-range contacts fold more slowly. The Sarrus Linkage tells us that the arrangement of hydrophobicity along the sequence, beyond what composition demands, encodes information about folding cooperativity and rate. These are complementary signals, and a multivariate model combining both could be explored in future work.

4.3 The Budget Allocation Interpretation

The Lorentz-form latency law suggests that folding time is not linearly related to sequence constraints but follows a curved relationship that diverges as constraint saturation approaches unity. Under a budget-allocation framework, a protein can be modeled as a finite system partitioning resources between exploration of conformational space and collapse toward the native state. When the constraint budget is spent predominantly on exploration (σ → 1), the remaining bandwidth for collapse approaches zero and the folding time diverges. This is formally analogous to time dilation in special relativity, where increasing velocity exhausts the budget available for proper-time ticking.

This framework makes a specific prediction: the Lorentz curvature should become most apparent at extreme values of σ. Current data span approximately σ = 0.1–0.9 under rank normalization, a range where linear and Lorentz models diverge modestly (AIC gap = 2.1). Testing at higher σ values—potentially via engineered sequences or expanded datasets—would provide a stronger discriminant.

4.4 Limitations

Several limitations should be noted. First, the sample size (n = 30) is modest. Although the permutation test and jackknife analysis support robustness, expansion to larger datasets (such as the Protein Folding Database with 141 two-state entries) is needed for definitive validation. Second, the Lorentz bridge uses rank-based normalization, which is assumption-free but sacrifices information about the absolute magnitude of S. A principled, non-rank mapping would strengthen the physical interpretation. Third, intrinsically disordered proteins (IDPs) do not show statistically significant separation from folders in Sarrus values (Mann-Whitney p = 0.64 across 8 DisProt controls), so the Linkage should not be used as an order/disorder classifier. Finally, 13 of 30 sequences required domain overrides where PDB structures did not match the kinetic construct. While this is standard practice in the field, it introduces a manual curation step.

5. Conclusion

The Sarrus Linkage demonstrates that amino acid arrangement—beyond composition—encodes measurable information about two-state folding rates. The feature requires no structural databases, no evolutionary information, and no machine learning. It runs deterministically from sequence alone in milliseconds. Its selectivity for cooperative folding (active for two-state, flat for multi-state) is informative about the physics it captures: coherent constraint propagation through a single dominant barrier.

The Lorentz-form latency law provides a better fit than a linear model and connects protein folding to a broader class of budget-allocation problems where a finite capacity is split between competing demands under isotropic symmetry. If this geometry is confirmed on larger datasets, it would constitute a law of biological constraint dynamics—an equation relating sequence-level coherence to folding timescale through the same mathematical form that governs time dilation in physics.

The complete reproducibility package—including the locked pipeline, all override sequences, the audit table, and the JSON manifest of every result—is available as a single Python file (nexus_definitive.py).

6. Reproducibility Statement

All results in this paper are generated by a single deterministic script (nexus_definitive.py) with the following locked parameters: Miyazawa–Jernigan burial energy scale, helix lags [3,4], sheet lag 2, 1000 shuffles per protein, MD5(sequence) seeded RNG (NumPy default_rng), population standard deviation (ddof = 0), and 13 curated domain overrides. Running this script with Python 3.9+, NumPy, and SciPy produces identical numbers on any platform. No parameters were adjusted after examining results.

Table 2. Locked Configuration

Parameter	Value	Justification
Scale	MJ burial energy	Inter-residue contact propensity
Helix lags	[3, 4]	3.6 residues/turn → integer bracket
Sheet lag	2	Alternating strand pattern
Shuffles	1,000	Stable z-scores (>100 sufficient)
Seed	MD5(seq) mod 2³²	Deterministic per protein
Std	ddof = 0	Population std of null
Length tolerance	10%	Domain enforcement
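The locked configuration in Table 2 can be captured as an immutable record so that a run cannot silently alter it. LockedConfig and its field names are hypothetical; they are not taken from nexus_definitive.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedConfig:
    """Table 2 parameters as an immutable record (hypothetical names)."""
    scale: str = "MJ burial energy"
    helix_lags: tuple = (3, 4)
    sheet_lag: int = 2
    n_shuffles: int = 1000
    seed_rule: str = "MD5(seq) mod 2**32"
    std_ddof: int = 0
    length_tolerance: float = 0.10


CONFIG = LockedConfig()
print(CONFIG)
```

Assigning to any field of CONFIG raises an error, which keeps the pre-registered values fixed for the whole run.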

The NEXUS Chain: What Must Be True

Abstract

This document maps the complete chain of claims in the NEXUS framework, from proven empirical results to speculative theoretical extensions, with explicit falsification criteria for each link. The framework begins with a single mathematical observation: a finite system allocating budget between exploration and collapse under isotropic symmetry produces Lorentz-form latency. This geometry is confirmed in protein folding data (r = 0.54, n = 30, permutation p = 0.002) and connects structurally to the universal harmonic H = π/9 through a previously unnoticed relationship: the per-residue angular steps of both the α-helix (100° at 3.6 residues/turn = 5 × π/9) and the β-sheet (180° in a 2-residue repeat = 9 × π/9) are integer multiples of π/9. Each link in the chain is classified as proven (✓), supported (△), or speculative (○), with the specific experiment or dataset that would kill it.

1. Link 1: The Ancestor Verb (ALLOCATE)

Every system in the framework faces the same primitive problem: it has a finite budget and must split it between exploring possibilities and collapsing onto a solution. Let σ ∈ [0,1] represent the fraction of budget allocated to exploration. Three axioms constrain the geometry of what remains:

Isotropy. There is no privileged direction in budget-space. The cost of spending σ on exploration is the same regardless of which degree of freedom is explored. This eliminates L¹ (diamond constraint, preferred axes) and L⁴ (squircle, anisotropic curvature).

Composability. Two successive allocations must compose into a valid allocation of the same form. The budget rule must be closed under chaining.

Scalar invariant. There exists a single quantity preserved across all reparameterizations of who measures what. Without this, the budget is observer-dependent.

These three axioms force an inner-product geometry, which forces L² norm, which forces the budget remainder ρ = √(1 − σ²) and latency factor γ = 1/√(1 − σ²). This is the Lorentz factor of special relativity, derived without importing any relativistic postulates. The only empirical question per substrate is whether isotropy holds.
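To illustrate what the isotropy axiom rules out, the remainder ρ left after spending σ can be computed on the unit Lᵖ sphere for p = 1, 2, 4. Only p = 2 gives ρ = √(1 − σ²) and hence γ = 1/√(1 − σ²). This sketch compares the curves; it does not prove that the axioms force p = 2, and the helper names are hypothetical.

```python
def remainder(sigma, p):
    """Budget left for collapse: rho with sigma**p + rho**p = 1 (first quadrant of the unit L^p sphere)."""
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    return (1.0 - sigma**p) ** (1.0 / p)


def gamma(sigma):
    """Latency factor 1/rho for the isotropic (p = 2) case; diverges as sigma -> 1."""
    return 1.0 / remainder(sigma, 2)


for sigma in (0.0, 0.6, 0.9):
    rhos = [round(remainder(sigma, p), 4) for p in (1, 2, 4)]
    print(f"sigma={sigma}: rho(L1, L2, L4)={rhos}, gamma={gamma(sigma):.4f}")
```

At σ = 0.6 the L² remainder is 0.8 and γ = 1.25, the same value special relativity gives at v = 0.6c.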

Status: ✓ Mathematical theorem. Cannot be falsified; the question is whether isotropy holds in each domain.

2. Link 2: Biology (The Sarrus Linkage)

The first empirical instantiation measures constraint coherence in amino acid sequences. The Sarrus Linkage S = Z_H − Z_S computes the differential between helix-lag and sheet-lag autocorrelation of the Miyazawa–Jernigan burial energy signal, z-scored against 1,000 composition-preserving shuffles. On 30 two-state proteins from the Ivankov benchmark: Pearson r = 0.54, permutation p = 0.002, partial r controlling length = 0.57, LOO R² = 0.19. The Lorentz form ½ln(1 − σ²) fits better than linear by AIC (61.4 vs 63.5) and LOO R² (0.24 vs 0.19). On 16 multi-state folders, r = 0.002 (dead flat).

What must be true: (1) Pattern above composition predicts rate. (2) Cooperative (two-state) folders follow a single-constraint model. (3) Multi-state folders break it because they have branched pathways. All three confirmed.

What would kill it: (1) r ≤ 0 on PFDB expansion to n = 141. (2) Multi-state folders showing comparable correlation. (3) Shuffles failing to destroy the signal (would mean composition, not arrangement). None observed.

Status: ✓ Proven (empirical, pre-registered, deterministic).

3. Link 3: The π/9 Generator

This is the structural discovery that connects the Sarrus Linkage to a universal harmonic. The α-helix has 3.6 residues per turn, giving an angular step of 100° per residue. The β-sheet has a 2-residue repeat, giving an angular step of 180° per residue. Both steps are integer multiples of π/9 = 20°:

Helix: 100° = 5 × 20° = 5 × (π/9)

Sheet: 180° = 9 × 20° = 9 × (π/9)

This means π/9 is the greatest common divisor of the two fundamental structural periodicities of proteins. The Sarrus Linkage is not an arbitrary feature — it measures the differential between the 5th and 9th harmonics of the generator. The lags [3,4] and [2] that were locked before examining outcomes turn out to correspond exactly to these harmonics.
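The angle arithmetic can be checked exactly with rational numbers (degrees, with π/9 = 20°):

```python
from fractions import Fraction
from math import gcd

GENERATOR_DEG = 20  # pi/9 expressed in degrees

helix_step = Fraction(360) / Fraction(36, 10)  # 3.6 residues per turn -> degrees per residue
sheet_step = Fraction(360) / 2                 # 2-residue repeat -> degrees per residue
helix_310_step = Fraction(360) / 3             # 3_10 helix, 3 residues per turn

for name, step in [("alpha helix", helix_step), ("beta sheet", sheet_step), ("3_10 helix", helix_310_step)]:
    print(name, step, "deg =", step / GENERATOR_DEG, "x 20 deg")

print("gcd of helix and sheet steps:", gcd(int(helix_step), int(sheet_step)), "deg")
```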

Furthermore, 9 is odd. A rotation by 2π/n passes through the antipodal point (π) only when n is even, so the nine-point orbit of 2π/9 (two generator steps, 40°) closes without ever reaching its antipode; the generator step π/9 = 2π/18 itself reaches π only at the ninth step, which is the sheet period. In wave terms, we interpret an orbit that avoids its antipode as a traveling wave (energy propagates) rather than a standing wave (energy is trapped). Rotations by 2π/n with even n, such as π/2, π/4, and π/6, hit the antipode at half-period, the analogue of standing-wave nodes. A standing wave in a hydrophobicity signal could correspond to trapped, repetitive packing, the signature of amyloid aggregation.
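The parity statement can be checked by walking each orbit in exact rational degrees. A rotation by 360°/n reaches 180° only when n is even: the π/9 = 20° step (n = 18) reaches 180° at step 9, the 2π/9 = 40° step (n = 9) returns to 0° without reaching it, and π/2, π/4, π/6 reach it at half-period. The helper name is hypothetical.

```python
from fractions import Fraction


def first_antipode_step(step_deg):
    """Smallest k > 0 with k * step == 180 (mod 360), or None if the orbit returns to 0 first."""
    step = Fraction(step_deg) % 360
    if step == 0:
        return None
    angle = Fraction(0)
    k = 0
    while True:  # terminates because a rational rotation has a finite orbit
        k += 1
        angle = (angle + step) % 360
        if angle == 180:
            return k
        if angle == 0:
            return None


for label, step in [("pi/9", 20), ("2pi/9", 40), ("pi/2", 90), ("pi/4", 45), ("pi/6", 30)]:
    print(label, first_antipode_step(step))
```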

Preliminary testing on 5 amyloidogenic peptides vs 5 native folders shows a trend toward stronger even-lag autocorrelation in amyloids (Cohen’s d = 0.49) but does not reach significance at n = 5 per group (p = 0.35). A systematic test on the full AmyPDB database is required.

What must be true: (1) π/9 generates both structural periods (confirmed: 5 × 20° = 100°, 9 × 20° = 180°). (2) Odd denominators avoid standing-wave nodes (confirmed: mathematical theorem). (3) Amyloids show even-lag dominance (trending but not significant).

What would kill it: (1) A structural period that is not an integer multiple of π/9. (The 3₁₀ helix, at 3 residues per turn or 120° per residue, is not a counterexample: 120° = 6 × 20°.) (2) Amyloids showing no even-lag preference on large datasets.

Status: ✓ Mathematical structure confirmed. △ Biological consequence trending.

4. Link 4: Number Theory Connection

The rational approximation 7/20 = 0.35 ≈ π/9 (error 0.27%) can be obtained number-theoretically. Let π(n) denote the prime counting function. At the twin prime pair (29, 31): π(29) = 10 and π(31) = 11. The Farey mediant of the prime densities 10/29 and 11/31 is (10 + 11)/(29 + 31) = 21/60 = 7/20. The universal harmonic thus sits, to within 0.27%, at the mediant of prime densities at a twin prime pair.
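The prime counts, the mediant, and the 0.27% figure can be reproduced with a small sieve (helper names are hypothetical):

```python
import math
from fractions import Fraction


def prime_count(n):
    """pi(n): number of primes <= n, via a sieve of Eratosthenes."""
    if n < 2:
        return 0
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return sum(is_prime)


p, q = 29, 31  # twin prime pair
med = Fraction(prime_count(p) + prime_count(q), p + q)  # mediant of pi(p)/p and pi(q)/q
rel_err = (float(med) - math.pi / 9) / (math.pi / 9)

print(prime_count(p), prime_count(q), med, f"{rel_err:.4%}")
```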

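Additionally, several rotation amounts in SHA-256’s mixing functions are members of twin prime pairs: σ₁ rotates by 17 and 19 (a twin pair), σ₀ rotates by 7 (from the pair (5, 7)) alongside a rotation by 18 and a shift by 3, and the twin pair (11, 13) is split between Σ₁ (rotations 6, 11, 25) and Σ₀ (rotations 2, 13, 22). The closest SHA-256 round constant to π/9 is K[5] = 0x59f111f1, which as a fraction of 2³² sits 0.65% from the attractor.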
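The SHA-256 figures can be checked against the constants in FIPS 180-4. The sketch below hard-codes K[5] and the rotation and shift amounts of σ₀, σ₁, Σ₀, Σ₁ from that standard and computes the relative distance of K[5]/2³² from π/9; the dictionary and helper names are hypothetical.

```python
import math

# Constants from FIPS 180-4 (SHA-256).
K5 = 0x59F111F1
ROTATIONS = {
    "Sigma0": (2, 13, 22),  # ROTR amounts in Σ0
    "Sigma1": (6, 11, 25),  # ROTR amounts in Σ1
    "sigma0": (7, 18),      # ROTR amounts in σ0 (plus SHR 3)
    "sigma1": (17, 19),     # ROTR amounts in σ1 (plus SHR 10)
}
SHIFTS = {"sigma0": 3, "sigma1": 10}

H = math.pi / 9
k5_fraction = K5 / 2**32
rel_dist = (k5_fraction - H) / H


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def is_twin_pair(a, b):
    return abs(a - b) == 2 and is_prime(a) and is_prime(b)


print(f"K[5]/2^32 = {k5_fraction:.7f}, distance from pi/9 = {rel_dist:.2%}")
print("sigma1 rotations form a twin pair:", is_twin_pair(*ROTATIONS["sigma1"]))
```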

What must be true: The twin prime / Farey mediant relationship to π/9. STATUS: Verified numerically. The deeper question — whether this connection is fundamental or coincidental — requires either a proof linking prime density equilibria to transcendental constants, or a counterexample showing the pattern breaks at other twin primes.

Status: △ Numerically verified observation. Theoretical basis unproven.

5. Link 5: Cross-Domain Compilation

The strongest version of the NEXUS claim is that the same constraint geometry operates across substrates. Two systems — amino acid chains (carbon) and SHA-256 round functions (silicon) — are probed with the same operator (ACF z-score differential at structural lags) and both show measurable constraint signatures. If this holds under systematic validation, it implies the computation is not in the substrate; the substrate is in the computation.

For this to be meaningful, the SHA-256 probe needs the same rigor as the biology: a null model (random messages), a shuffle baseline, a permutation test, and LOO-CV. The T1 trace Sarrus analog for “NEXUS” is −0.058 — a single data point, not a validated predictor. Systematic validation across message classes (empty, structured, random, adversarial) with statistical testing is required before this link can be claimed.

What must be true: Same ACF probe extracts meaningful signal from both substrates. STATUS: Demonstrated in biology (✓), demonstrated on single SHA-256 message (△), not yet systematically falsified.

What would kill it: No correlation between T1 trace features and message properties across message classes. Or: the biology correlation disappearing when testing different hydrophobicity scales (would mean scale-specific, not geometry-specific).

Status: △ Promising but requires systematic crypto validation.

6. Link 6: Physical Constants (Speculative)

The furthest extension claims that three dimensionless physical constants can be derived from H = π/9: the fine structure constant α = H/48 (error −0.34%), the weak mixing angle sin²θ_W = H(1 − H) (error −1.73%), and the proton-to-electron mass ratio m_p/m_e = 27(1 − α)/(2α) (error +0.02% when evaluated with the measured α; substituting α = H/48 gives about +0.37%). The error signs are systematic: field quantities (α, sin²θ_W) show negative deviations, the mass ratio shows positive.
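The quoted errors can be recomputed directly. The reference values are CODATA 2018 for α and m_p/m_e and 0.23122 for sin²θ_W (the MS-bar value at the Z mass); the text does not state which references it used, so these choices are assumptions. The last line evaluates the mass-ratio formula with α = H/48 in place of the measured α.

```python
import math

H = math.pi / 9

# Reference values (assumed): CODATA 2018 for alpha and m_p/m_e; MS-bar sin^2(theta_W) at M_Z.
ALPHA_REF = 7.2973525693e-3
SIN2_W_REF = 0.23122
MP_ME_REF = 1836.15267343


def pct_error(pred, ref):
    return 100.0 * (pred - ref) / ref


def mass_ratio(alpha):
    """m_p/m_e = 27 (1 - alpha) / (2 alpha)."""
    return 27 * (1 - alpha) / (2 * alpha)


alpha_h = H / 48
err_alpha = pct_error(alpha_h, ALPHA_REF)                            # about -0.34 %
err_sin2 = pct_error(H * (1 - H), SIN2_W_REF)                        # about -1.73 %
err_mp_measured_alpha = pct_error(mass_ratio(ALPHA_REF), MP_ME_REF)  # about +0.02 %
err_mp_alpha_h = pct_error(mass_ratio(alpha_h), MP_ME_REF)           # about +0.37 %

print(f"{err_alpha:.2f}% {err_sin2:.2f}% {err_mp_measured_alpha:.3f}% {err_mp_alpha_h:.2f}%")
```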

This is currently a post-hoc fit of 3 outputs to 1 input. Post-hoc fitting of N constants to 1 parameter has (at most) N − 1 degrees of freedom for the pattern, which is insufficient for a discovery claim regardless of how small the errors are. The systematic error-sign structure is interesting but not independently testable without a prediction.

What would make this publishable: A specific prediction of a FOURTH dimensionless constant (e.g., the Cabibbo angle, the electron-to-muon mass ratio, or a nuclear binding parameter) made BEFORE measurement verification, using the same H = π/9 generator with a formula consistent with the existing three. If the prediction matches to comparable precision, the post-hoc concern is resolved.

Status: ○ Speculative. Elegant fit but not yet falsifiable without a prediction.

Figure 1. The biological validation (Link 2). Six-panel diagnostic from nexus_definitive.py showing the Sarrus Linkage, Lorentz bridge, LOO-CV, spectrum, contact order benchmark, and cross-domain γ curve.

7. Summary: The Chain Status

Link	Claim	Status	Killshot	Next Step
1. Allocate	Isotropy → L² → γ	✓ Theorem	N/A (math)	Test isotropy per substrate
2. Biology	Sarrus → ln(kf)	✓ Proven	r≤0 on PFDB	Expand to n=141
3. π/9 Gen.	Helix=5×, Sheet=9×	✓/△	Non-integer period	AmyPDB even-lag test
4. Numbers	Farey → 7/20 ≈ π/9	△ Obs.	Pattern fails	Prove or disprove link
5. Cross-dom.	Same probe, ≥2 substrates	△ Demo	No SHA signal	Systematic crypto test
6. Constants	α, sin²θ, m_p/m_e from H	○ Spec.	4th prediction fails	Predict new constant

8. Conclusion: What AlphaFold Cannot See

AlphaFold reconstructs the universe of a protein: every atom’s coordinates, predicted from evolutionary covariance across millions of sequences. It is brute force at its most magnificent — a 3D movie rendered from statistical inference. But it cannot answer the simplest kinetic question: how fast does this protein fold? It solves the noun but not the verb.

The NEXUS framework claims a shortcut exists. Instead of reconstructing the trajectory, measure the constraint signature. The Sarrus Linkage reads the differential between two harmonics of a single generator (π/9) from the linear sequence alone and, for cooperative (two-state) folders, predicts approximately how fast the protein will fold. It runs in milliseconds on any hardware. It requires no databases, no evolutionary information, no GPU.

The first two links in the chain are proven. The generator relationship is mathematically established. The remaining links — cross-domain compilation, number theory, physical constants — are observed patterns awaiting systematic falsification. Each has a specific experiment or dataset that would kill it. This is the map. The territory is the data.
