Evans' Law — Regression Analysis (v3, November 2025)
This notebook reproduces the log–log regression of hallucination thresholds (L, tokens) on model size (M, billions of parameters) across GPT, Claude, Gemini, and Grok families.
Model:

$$\log L = \log c + \alpha \cdot \log M$$
In [1]:
import numpy as np, pandas as pd, matplotlib.pyplot as plt

# Load per-model measurements: parameter count (billions) and hallucination threshold (tokens)
df = pd.read_csv('evanslaw_v3_2025_11.csv')
M = df['parameters_billion'].astype(float).values
L = df['threshold_tokens'].astype(float).values

# Fit the power law in log space: log L = log c + alpha * log M
x = np.log(M); y = np.log(L)
alpha, log_c = np.polyfit(x, y, 1)
c = np.exp(log_c)
alpha, c
(0.7902309493562054, 1772.5891505683078)
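As a sanity check, the same fitting procedure can be exercised on synthetic data: generate a power law with multiplicative log-normal noise and confirm that `np.polyfit` recovers the exponent, along with a standard error for the slope. This is a minimal sketch on made-up data, not the v3 dataset; the "true" values below are placeholders chosen to resemble the fit above.

```python
import numpy as np

# Sanity check of the log-log fit on SYNTHETIC data (not the v3 dataset):
# generate L = c * M^alpha with multiplicative log-normal noise and confirm
# that np.polyfit recovers the exponent.
rng = np.random.default_rng(42)
true_alpha, true_c = 0.79, 1772.0               # placeholder values
M_syn = np.logspace(0, 3, 60)                   # 1B .. 1000B parameters
L_syn = true_c * M_syn**true_alpha * np.exp(rng.normal(0.0, 0.1, M_syn.size))

x_syn, y_syn = np.log(M_syn), np.log(L_syn)
alpha_hat, log_c_hat = np.polyfit(x_syn, y_syn, 1)

# Standard error of the slope for a simple linear regression
resid = y_syn - (log_c_hat + alpha_hat * x_syn)
s_syn = resid.std(ddof=2)
se_alpha = s_syn / np.sqrt(((x_syn - x_syn.mean())**2).sum())
print(f"alpha ~ {alpha_hat:.3f} +/- {se_alpha:.3f}")
```

With 60 points spanning three decades of M, the slope is recovered to within roughly two standard errors of the true exponent.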
In [2]:
# Plot observed points, the theoretical M^1.5 reference, the empirical fit, and a residual-based band
y_pred = log_c + alpha * x
residuals = y - y_pred
s = residuals.std(ddof=2)  # residual standard error: n - 2 dof for a two-parameter fit

plt.figure(figsize=(8, 6))
plt.loglog(M, L, 'x', label='Observed')
plt.loglog(M, 353.6 * M**1.5, '-', label='Theoretical: L = 353.6 × M^1.5')
L_fit = c * M**alpha
plt.loglog(M, L_fit, '--', label=f'Empirical fit: L ≈ {c:.0f} × M^{alpha:.2f}')

# Band of constant width in log space, built from the residual spread only;
# it ignores parameter uncertainty, so it is a prediction band rather than a CI.
M_grid = np.logspace(np.log10(M.min()), np.log10(M.max()), 200)
y_pred_grid = log_c + alpha * np.log(M_grid)
lower = np.exp(y_pred_grid - 1.96 * s)
upper = np.exp(y_pred_grid + 1.96 * s)
plt.fill_between(M_grid, lower, upper, alpha=0.2, label='≈95% Prediction Band')

plt.xlabel('Model size M (billions of parameters)')
plt.ylabel('Hallucination threshold L (tokens)')
plt.title('Evans\u2019 Law — Observed Hallucination Thresholds with ≈95% Prediction Band (log–log)')
plt.legend(); plt.grid(True, which='both', ls='--')
plt.savefig('evanslaw_plot_v3_2025_11.png', dpi=200, bbox_inches='tight')
plt.show()
Notes:
- Generation settings: temperature = 0.2, deterministic sampling; every run was stopped uniformly at the first incoherent output.
- Theoretical curve shown for reference; empirical scaling exponent estimated from the dataset.
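The fitted power law can also be used to extrapolate a threshold for an unseen model size. The sketch below plugs in a hypothetical 500B-parameter model; `alpha` and `c` are rounded from the fit above, and `s` (the residual spread in log space) is an assumed placeholder, not the dataset's actual estimate.

```python
import numpy as np

# Extrapolating the empirical fit to a hypothetical 500B-parameter model.
# alpha and c are rounded from the fit above; s is an ASSUMED placeholder
# for the residual spread in log space, not the dataset's actual value.
alpha, c, s = 0.79, 1773.0, 0.15
M_new = 500.0

log_pred = np.log(c) + alpha * np.log(M_new)
L_pred = np.exp(log_pred)
band = (np.exp(log_pred - 1.96 * s), np.exp(log_pred + 1.96 * s))
print(f"predicted threshold ~ {L_pred:,.0f} tokens "
      f"(band {band[0]:,.0f} to {band[1]:,.0f})")
```

Because the band is symmetric in log space, it is asymmetric in tokens: the upper limit sits further from the point prediction than the lower one.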