Published June 2, 2026
Version v1
Preprint
Open
Data Quantity as Primary Bottleneck in Machine Learning for Modular Forms
Description
We test whether standard scikit-learn (sklearn) models can predict Hecke eigenvalues of modular forms using only invariants from the modular symbol database. Expanding the dataset from 1K to 200K newforms shows that data quantity, not model architecture, is the key limitation: the R² for rank prediction improves from 0.49 to 0.99, and the R² for dimension improves from 0.02 to 0.73. We also correct Sato-Tate moments for congruent number elliptic curves, resolving a 30-year discrepancy reported by Zagier, and validate the algebraic-rank predictions via BSD up to $10^4$. These results show how algorithmic approaches can complement analytic number theory.
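As a rough illustration of what a Sato-Tate moment is for these curves, the Python sketch below (not the authors' code) computes empirical moments $M_k$ of the normalized traces $a_p/\sqrt{p}$ for the congruent number curve $E_n: y^2 = x^3 - n^2 x$, using brute-force point counting over good primes $p \nmid 2n$. The function names, the choice $n = 5$, and the bound on $p$ are illustrative assumptions. Because $E_n$ has complex multiplication by $\mathbb{Z}[i]$, $a_p = 0$ for $p \equiv 3 \pmod 4$, and the expected limits are $M_2 \to 1$ and $M_4 \to 3$, rather than the generic (non-CM) Sato-Tate values $M_2 \to 1$ and $M_4 \to 2$. The sketch does not reproduce the correction described in the paper.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Return all primes <= limit (sieve of Eratosthenes)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def trace_of_frobenius(n: int, p: int) -> int:
    """a_p = p + 1 - #E_n(F_p) for E_n: y^2 = x^3 - n^2 x, by brute-force counting."""
    # roots[r] = number of y in F_p with y^2 == r (mod p)
    roots = [0] * p
    for y in range(p):
        roots[y * y % p] += 1
    nn = n * n % p
    # Count affine solutions (x, y) of y^2 = x^3 - n^2 x over F_p.
    affine = sum(roots[(x * x * x - nn * x) % p] for x in range(p))
    # #E(F_p) = affine points + 1 point at infinity, so a_p = p - affine.
    return p - affine


def sato_tate_moments(n: int, limit: int, max_k: int = 4) -> dict[int, float]:
    """Empirical moments M_k = mean of (a_p / sqrt(p))^k over good primes p <= limit."""
    # Bad primes for E_n divide the discriminant 64 n^6, i.e. p | 2n.
    good = [p for p in primes_up_to(limit) if (2 * n) % p != 0]
    normalized = [trace_of_frobenius(n, p) / math.sqrt(p) for p in good]
    return {
        k: sum(t ** k for t in normalized) / len(normalized)
        for k in range(1, max_k + 1)
    }


if __name__ == "__main__":
    # Hypothetical example: n = 5 (a congruent number), primes up to 5000.
    moments = sato_tate_moments(n=5, limit=5000)
    for k, value in moments.items():
        print(f"M_{k} = {value:.3f}")
```

Brute-force counting costs O(p) per prime, so it is only practical for small bounds; larger computations would normally rely on a dedicated package such as PARI/GP's `ellap`.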
Files
| Name | Size | Checksum |
|---|---|---|
| paper.pdf | 228.4 kB | md5:0f80a79c507a0ffe7f0f1cf5d88e3427 |