Published April 15, 2026
| Version v2
Journal article
Open
Benchmarking AlphaGenome on NVIDIA GPUs: latency, memory, and feasibility across sequence lengths
Authors/Creators
- 1. University of Washington
- 2. Stanford University
Description
AlphaGenome pushes genomic modeling to sequence lengths up to 1 Mb, but practical adoption still comes down to a simple question: what fits on the GPU you actually have, and how long will it take?
This post benchmarks the official JAX implementation and the community PyTorch port across seven NVIDIA GPUs: H200 (141 GB), H100 (80 GB), A100 (80 GB), L40S (48 GB), L40 (48 GB), A40 (48 GB), and RTX 6000 (24 GB). We report inference, heads-only finetuning, and full-weights finetuning on real genomic workloads.
Key takeaways:
* Inference up to 1 Mb is feasible on every tested GPU from 48 GB upward; 1 Mb typically requires about 35-41 GB of peak memory
* Heads-only finetuning fits on every tested GPU up to 1 Mb, making it the accessible adaptation path for most labs
* Full-weights finetuning is memory-bound: 1 Mb is comfortable on H200 and borderline on H100; 80 GB cards otherwise reach 524 kb, and 48 GB cards top out at 262 kb
* Above about 131 kb, memory scaling is close to linear for all three workloads
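
The takeaways above can be turned into a rough feasibility check. The following is a minimal sketch, not part of the benchmark code. It uses only the GPU capacities listed in the description. The helper names `gpus_that_fit` and `extrapolate_linear` and the `headroom_gb` parameter are hypothetical. It assumes that a workload fits whenever its peak memory is at most the card's nominal capacity, which ignores framework overhead and fragmentation. It also assumes the near-linear scaling reported above about 131 kb when extrapolating between two of your own measurements.

```python
# GPU memory capacities in GB, as listed in the description above.
GPU_MEMORY_GB = {
    "H200": 141,
    "H100": 80,
    "A100": 80,
    "L40S": 48,
    "L40": 48,
    "A40": 48,
    "RTX 6000": 24,
}


def gpus_that_fit(peak_gb, headroom_gb=0.0):
    """Return the GPUs whose nominal capacity covers peak_gb plus headroom.

    headroom_gb is a hypothetical safety margin; the source does not specify one.
    """
    needed = peak_gb + headroom_gb
    return [name for name, capacity in GPU_MEMORY_GB.items() if capacity >= needed]


def extrapolate_linear(len_a, mem_a, len_b, mem_b, target_len):
    """Estimate peak memory at target_len from two measured (length, memory) points.

    Assumes memory grows linearly with sequence length, which the takeaways
    report as approximately true only above about 131 kb.
    """
    slope = (mem_b - mem_a) / (len_b - len_a)
    return mem_a + slope * (target_len - len_a)


if __name__ == "__main__":
    # Upper end of the 35-41 GB range quoted for 1 Mb inference.
    print(gpus_that_fit(41))
```

With the upper end of the quoted 1 Mb inference range (41 GB), the check keeps the six cards with 48 GB or more and drops the 24 GB RTX 6000, which matches the first takeaway. For `extrapolate_linear`, supply two peak-memory measurements from your own runs at lengths above about 131 kb; the estimate is only as good as the linearity assumption.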