Published June 2, 2026 | Version v1
Preprint Open

Data Quantity as Primary Bottleneck in Machine Learning for Modular Forms

Authors/Creators

  • 1. Independent

Description

We test whether off-the-shelf scikit-learn models can predict Hecke eigenvalues of modular forms using only invariants from the modular symbol database. Expanding the dataset from 1K to 200K newforms reveals that data quantity, not model architecture, is the key limitation: R² for rank prediction improves from 0.49 to 0.99, and R² for dimension prediction improves from 0.02 to 0.73. We correct Sato-Tate moments for congruent number elliptic curves, resolving a 30-year discrepancy reported by Zagier, and validate $[\ldots]^{\mathrm{alg}} = \mathrm{rank}$ predictions via the Birch and Swinnerton-Dyer conjecture (BSD) for $[\ldots] \leq 10^4$. This demonstrates that algorithmic approaches can complement analytic number theory.
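
As a rough illustration of the Sato-Tate moment computation mentioned above, the following minimal sketch estimates empirical moments of the normalized traces a_p/sqrt(p) for the congruent number curve E_n: y^2 = x^3 - n^2 x by naive point counting. It is not the method used in the paper. The function names (`primes_up_to`, `trace_of_frobenius`, `normalized_moments`) and the prime bound are assumptions chosen for illustration only. Because these curves have complex multiplication, a_p vanishes at primes p ≡ 3 (mod 4) of good reduction, and the limiting second and fourth moments are expected to be 1 and 3 rather than the non-CM values 1 and 2.

```python
from math import isqrt, sqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            # Zero out every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def trace_of_frobenius(n, p):
    """a_p for E_n: y^2 = x^3 - n^2 x at an odd prime p not dividing n.

    Uses a_p = -sum over x in F_p of the Legendre symbol of x^3 - n^2 x,
    which costs O(p) modular exponentiations (fine for small p only).
    """
    n2 = n * n % p
    total = 0
    for x in range(p):
        v = (x * x * x - n2 * x) % p
        if v:
            # Euler's criterion: v^((p-1)/2) is 1 for squares, p-1 otherwise.
            total += 1 if pow(v, (p - 1) // 2, p) == 1 else -1
    return -total


def normalized_moments(n, bound, orders=(2, 4)):
    """Empirical moments of a_p / sqrt(p) over good odd primes p <= bound."""
    ts = [
        trace_of_frobenius(n, p) / sqrt(p)
        for p in primes_up_to(bound)
        if p > 2 and n % p != 0
    ]
    return {k: sum(t**k for t in ts) / len(ts) for k in orders}


if __name__ == "__main__":
    # n = 6 is a congruent number (area of the 3-4-5 triangle).
    print(normalized_moments(6, 2000))
```

Convergence of such moments is slow, so values at a small prime bound only loosely approximate the limits; a serious computation would replace the O(p) point count with a faster method.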

Files

paper.pdf (228.4 kB)
md5:0f80a79c507a0ffe7f0f1cf5d88e3427