Single-Kernel Fusion for Sequential Fitness Evaluation via WebGPU Compute Shaders

Gunaydin, Ahmet Baris

doi:10.5281/zenodo.19331834

Published March 30, 2026 | Version v1

Preprint Open

Gunaydin, Ahmet Baris¹

1. Independent Researcher

Fusing sequential fitness evaluations into single GPU compute shader dispatches eliminates the per-step kernel launch overhead that dominates framework-based GPU computation. On a 1,500-timestep financial simulation, a WebGPU compute shader achieves 46.2 gen/s, which is 7.2× faster than JAX GPU with lax.scan+vmap (6.43 gen/s on a Tesla T4) and 94× faster than PyTorch CUDA. On Acrobot-v1 (500 timesteps, RK4), the gap narrows to 1.29× over JAX GPU, revealing that the fusion advantage scales with episode length L. JAX GPU dominates on the embarrassingly parallel Rastrigin benchmark (1,164 vs. 170 gen/s), confirming that the advantage is specific to sequential workloads. A native Metal baseline via wgpu quantifies Chrome's browser overhead at 1.92×. We show that torch.compile fails at L ≥ 1,000 and that WebGPU dominates CMA-ES across all tested dimensionality regimes. The central insight, that hand-fused compute shaders outperform even XLA-compiled loop fusion on long sequential fitness functions, applies beyond WebGPU; WebGPU makes such fusion accessible with zero installation.
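The launch-overhead argument in the abstract can be illustrated with a toy cost model. The sketch below is not taken from the paper: the function names and the `launch_us` and `step_us` values are hypothetical placeholders, chosen only to show why a cost paid once per timestep grows with episode length L while a fused dispatch pays it once. The last lines simply recompute the 7.2× ratio from the two throughput figures quoted above.

```python
# Toy cost model (illustrative assumption, not the paper's measurement method).
# launch_us: hypothetical fixed cost of one kernel launch, in microseconds.
# step_us:   hypothetical compute cost of one simulation timestep, in microseconds.

def per_step_time_us(L: int, launch_us: float, step_us: float) -> float:
    """One kernel launch per timestep: the launch cost is paid L times."""
    return L * (launch_us + step_us)

def fused_time_us(L: int, launch_us: float, step_us: float) -> float:
    """One fused dispatch: the launch cost is paid once, the loop runs inside the kernel."""
    return launch_us + L * step_us

def fusion_speedup(L: int, launch_us: float = 10.0, step_us: float = 1.0) -> float:
    """Ratio of per-step time to fused time under the toy model."""
    return per_step_time_us(L, launch_us, step_us) / fused_time_us(L, launch_us, step_us)

# Throughput figures quoted in the abstract (generations per second).
webgpu_gen_per_s = 46.2
jax_gen_per_s = 6.43
reported_ratio = webgpu_gen_per_s / jax_gen_per_s

if __name__ == "__main__":
    for L in (1, 500, 1500):
        print(f"L={L:5d}  modelled speedup={fusion_speedup(L):.2f}x")
    print(f"abstract ratio: {reported_ratio:.1f}x")
```

Under this model the speedup rises with L and saturates near `(launch_us + step_us) / step_us`. It describes frameworks that launch a kernel per step; the JAX lax.scan baseline already fuses the loop through XLA, so its measured gap reflects other factors that this sketch does not capture.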

Files

Files (450.5 kB)

Name	Size	MD5
Single_Kernel_Fusion_for_Sequential_Fitness_Evaluation_via_WebGPU_Compute_Shaders__6_.pdf	450.5 kB	1b3fa0d38bd815f2bf1a0ceba9a4a2f4

Additional details

Repository URL: https://github.com/abgnydn/webgpu-kernel-fusion
Development Status: Active

	All versions	This version
Views	297	91
Downloads	202	79
Data volume	110.7 MB	40.1 MB
