
Published June 7, 2022 | Version v1
Presentation Open

HPC Software Codesign in GROMACS

  • 1. KTH Royal Institute of Technology

Description

Abstract

GROMACS is a widely used molecular dynamics simulation package known for its versatility, performance, and portability, with excellent efficiency and scalability from laptops to supercomputers. This is enabled by state-of-the-art parallel algorithms and bottom-up performance optimizations that target all levels of hardware parallelism: SIMD, multicore, NUMA, accelerators, and intra- and inter-node parallelism. Motivated by the end of Dennard scaling and the resulting need to parallelize and to make efficient use of microprocessors, the GROMACS engine has evolved through a series of algorithmic and parallelization redesigns. Codesign has been at the core of these efforts, and this talk gives an overview of them.

Fundamental molecular dynamics algorithms have been reformulated to target wide SIMD/SIMT-style architectures, combining a physics- and accuracy-based approach with computer science and HPC performance engineering. A SIMD abstraction layer and an algorithm benchmark miniapp were developed to facilitate codesign and to allow quick and relatively easy porting to new SIMD instruction sets without expert knowledge of the algorithms or the application, often in just hours to days.
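The idea of a SIMD abstraction layer can be illustrated with a minimal C++ sketch. This is not the GROMACS implementation: the type `PackedReal`, the width `kWidth`, and the functions `load`, `broadcast`, `fusedMulAdd`, `invSqrt`, `reduce`, and `sumInverseDistance` are hypothetical names chosen for illustration. The sketch assumes that a port to a new instruction set only has to supply the small set of operations at the top, while the kernel at the bottom is written once against that interface. Here the operations are implemented with a portable array-based fallback so the example compiles without any intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical lane count; a real backend would fix this per instruction set.
constexpr std::size_t kWidth = 4;

// Hypothetical packed type. A backend for a real instruction set would wrap
// a hardware register here; this fallback wraps a plain array instead.
struct PackedReal {
    std::array<float, kWidth> lane;
};

// --- The operations a new backend would have to provide (illustrative) ---

inline PackedReal load(const float* p) {
    PackedReal r;
    for (std::size_t i = 0; i < kWidth; ++i) r.lane[i] = p[i];
    return r;
}

inline PackedReal broadcast(float x) {
    PackedReal r;
    for (std::size_t i = 0; i < kWidth; ++i) r.lane[i] = x;
    return r;
}

inline PackedReal operator+(PackedReal a, PackedReal b) {
    for (std::size_t i = 0; i < kWidth; ++i) a.lane[i] += b.lane[i];
    return a;
}

inline PackedReal operator*(PackedReal a, PackedReal b) {
    for (std::size_t i = 0; i < kWidth; ++i) a.lane[i] *= b.lane[i];
    return a;
}

// Computes a * b + c lane by lane.
inline PackedReal fusedMulAdd(PackedReal a, PackedReal b, PackedReal c) {
    return a * b + c;
}

// Computes 1 / sqrt(a) lane by lane.
inline PackedReal invSqrt(PackedReal a) {
    for (std::size_t i = 0; i < kWidth; ++i) a.lane[i] = 1.0f / std::sqrt(a.lane[i]);
    return a;
}

// Sums all lanes into a single scalar.
inline float reduce(PackedReal a) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < kWidth; ++i) sum += a.lane[i];
    return sum;
}

// --- A kernel written only against the operations above ---

// Returns the sum of 1/|r_i| for points stored as separate x, y, z arrays.
// The arrays must have equal length, a multiple of kWidth.
float sumInverseDistance(const std::vector<float>& x,
                         const std::vector<float>& y,
                         const std::vector<float>& z) {
    assert(x.size() == y.size() && y.size() == z.size());
    assert(x.size() % kWidth == 0);
    PackedReal acc = broadcast(0.0f);
    for (std::size_t i = 0; i < x.size(); i += kWidth) {
        const PackedReal px = load(&x[i]);
        const PackedReal py = load(&y[i]);
        const PackedReal pz = load(&z[i]);
        // Squared distance: x*x + y*y + z*z
        const PackedReal r2 = fusedMulAdd(px, px, fusedMulAdd(py, py, pz * pz));
        acc = acc + invSqrt(r2);
    }
    return reduce(acc);
}
```

In this arrangement the kernel never names an instruction set, so under the stated assumptions a port would consist of reimplementing `PackedReal` and the handful of operations on it, which is consistent with the short porting times described above.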

As early as 2011, a collaboration with NVIDIA became an integral part of GROMACS development. This collaboration had two-way impact: learning from hardware and library engineers provided essential guidance, while our use cases motivated improvements and features exposed in CUDA, such as stream priorities, 3D FFT optimizations, and most recently the distributed cuFFTmp library. The collaboration evolved into an ongoing codesign project with important recent results, including a GPU-resident loop, the design and implementation of the direct GPU communication layer, and a distributed GPU-optimized particle mesh Ewald algorithm.

The Intel CoE collaboration project delivered long-term impact in the form of a nearly complete SYCL backend for GROMACS. This backend is planned to become the new standards-based GPU portability layer, with support for all major GPU platforms, and the primary means of targeting AMD and Intel GPUs.

Notes

Additional funding:

  • SSF Infrastructure Fellow programme: SECI
  • Swedish e-Science Research Centre (SeRC)

Files

GROMACS-HPC-codesign_compr.pdf (6.6 MB)
md5:8aa6bd605a807fbb4809e761faed3681

Additional details

Funding

BioExcel-2 – BioExcel Centre of Excellence for Computational Biomolecular Research 823830
European Commission