Published March 6, 2026
Version v1
Preprint
Open access
Bypassing the Memory Wall in Lattice Gauge Theory: A Register-Forced Stochastic Engine for $SU(N_c)$ MCMC
Description
We present a scalable, hardware-optimized implementation of Wilson $SU(N_c)$ lattice gauge theory that resolves the memory-bandwidth bottleneck of GPU-accelerated Markov chain Monte Carlo (MCMC) simulations. We replace pre-allocated random-number arrays in global memory with a Block-Stride Weyl mixing hash that is evaluated in registers as a pseudorandom number generator (PRNG). This trades abundant arithmetic logic unit (ALU) cycles for scarce VRAM bandwidth. On commercial off-the-shelf hardware (an NVIDIA RTX 4060), the approach sustains a throughput of $\sim\!511$ million updates per second (MUPS). We also describe critical CPU-side optimizations. The most important is preventing implicit promotion to 64-bit integers, which breaks spurious read/write dependencies and restores 8-bit single instruction, multiple data (SIMD) vectorization. We show that this register-forced stochastic engine strictly preserves detailed balance, gauge invariance, and ergodicity. It also maintains thermodynamic equilibrium at extreme scales, including $SU(256)$ criticality sweeps on $512^3$ spatial lattices.
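The idea of a register-level PRNG can be illustrated with a small counter-based sketch. The record does not specify the Block-Stride Weyl mixing hash, so every detail below is an assumption and not the authors' method. The assumed pieces are the Weyl constant `0x9E3779B9`, the MurmurHash3 `fmix32` finalizer used as the mixer, the hypothetical helpers `uniform`, `metropolis_accept`, and `two_level_occupancy`, and a counter layout in which each sweep owns a contiguous block of `n_sites` counters. Python integers masked to 32 bits stand in for `uint32` register arithmetic. Each uniform is a pure function of `(seed, site, sweep, draw)`, so nothing is read from or written to a random array. A Metropolis test on a two-level system then checks that the acceptance rule driven by these numbers reproduces the Boltzmann ratio required by detailed balance.

```python
import math

MASK32 = 0xFFFFFFFF
WEYL = 0x9E3779B9  # odd 32-bit constant (2^32 / golden ratio); assumed, not from the paper
DRAWS_PER_SITE = 4  # hypothetical cap on random draws per site per sweep


def fmix32(h: int) -> int:
    """MurmurHash3 32-bit finalizer (assumed mixer); a bijection on uint32."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def uniform(seed: int, site: int, sweep: int, n_sites: int, draw: int = 0) -> float:
    """Stateless uniform in [0, 1) for one (site, sweep, draw) triple.

    Hypothetical block-stride layout: sweep k owns counters
    [k * n_sites * DRAWS_PER_SITE, (k + 1) * n_sites * DRAWS_PER_SITE).
    No memory is touched, so on a GPU this would stay in registers.
    """
    if not 0 <= draw < DRAWS_PER_SITE:
        raise ValueError("draw index exceeds the per-site stream")
    counter = (sweep * n_sites + site) * DRAWS_PER_SITE + draw
    x = (seed + WEYL * counter) & MASK32  # Weyl-sequence step (period 2^32)
    return (fmix32(x) >> 8) / float(1 << 24)  # top 24 bits -> float in [0, 1)


def metropolis_accept(delta_s: float, u: float) -> bool:
    """Metropolis rule: accept with probability min(1, exp(-delta_s))."""
    return delta_s <= 0.0 or u < math.exp(-delta_s)


def two_level_occupancy(delta_e: float, beta: float, n_sweeps: int, seed: int = 12345) -> float:
    """Fraction of sweeps a single two-level 'site' spends in its excited state.

    With a symmetric flip proposal, detailed balance gives
    p1 / p0 = exp(-beta * delta_e) in equilibrium.
    """
    state, excited = 0, 0
    for sweep in range(n_sweeps):
        ds = beta * delta_e if state == 0 else -beta * delta_e
        if metropolis_accept(ds, uniform(seed, site=0, sweep=sweep, n_sites=1)):
            state ^= 1
        excited += state
    return excited / n_sweeps


if __name__ == "__main__":
    exact = math.exp(-1.0) / (1.0 + math.exp(-1.0))
    print("sampled:", two_level_occupancy(1.0, 1.0, 100_000), "exact:", exact)
```

Because the generator is stateless, any GPU thread can recompute its draws from its own indices, and this is what removes the random-array traffic. The cost is that counter collisions must be ruled out by construction. In this sketch that means no more than `DRAWS_PER_SITE` draws per site per sweep and fewer than $2^{32}$ counters in total, since the Weyl step wraps modulo $2^{32}$.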
Files
QCM.pdf (425.5 kB), md5:014e3e958059b96c04ed6741e945f70f