Published May 19, 2023 | Version 1.0.2
Dataset Open

Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes

  • 1. Technical University of Munich
  • 2. Durham University
  • 3. NVIDIA

Description

We identify an OpenMP bottleneck in the administration of GPU memory and show how to overcome it. The bottleneck arises for a wave equation solver on dynamically adaptive block-structured Cartesian meshes. The solver keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns: the GPU memory bookkeeping offered through OpenMP's map clause, i.e., the allocation and freeing, becomes another runtime challenge alongside expensive data transfers and the actual computation. We therefore propose to retain the memory management responsibility on the host: a caching mechanism acquires memory on the accelerator for all CPU threads, keeps hold of this memory, and hands it out to the offloading threads on demand. We show that this user-managed, CPU-based memory administration helps us to overcome the GPU memory bookkeeping bottleneck and speeds up the time-to-solution of Finite Volume kernels by more than an order of magnitude.
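
The host-side caching idea described above can be illustrated with a minimal sketch. It is not the code from this record: the class name `DeviceBufferCache` and its methods are hypothetical, and the default allocator uses host `malloc`/`free` as a stand-in so that the example runs without a GPU. In an OpenMP offloading build, one would presumably pass callables wrapping `omp_target_alloc` and `omp_target_free` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <mutex>

// Hypothetical host-managed cache of accelerator buffers (illustrative names).
// Buffers are allocated once, kept after use, and handed out again on demand,
// so repeated offloads of equally sized patch sets avoid fresh allocations.
class DeviceBufferCache {
public:
  using AllocFn = std::function<void*(std::size_t)>;
  using FreeFn  = std::function<void(void*)>;

  // Assumption: the defaults use host memory as a stand-in. With OpenMP one
  // might pass lambdas that call omp_target_alloc / omp_target_free.
  explicit DeviceBufferCache(
      AllocFn alloc  = [](std::size_t n) { return std::malloc(n); },
      FreeFn free_fn = [](void* p) { std::free(p); })
      : alloc_(std::move(alloc)), free_(std::move(free_fn)) {}

  // All buffers still held by the cache are freed when the cache dies.
  ~DeviceBufferCache() {
    for (auto& entry : idle_) free_(entry.second);
  }

  // Hand out an idle buffer of exactly this size, or allocate a new one.
  void* acquire(std::size_t bytes) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = idle_.find(bytes);
      if (it != idle_.end()) {
        void* ptr = it->second;
        idle_.erase(it);
        ++hits_;
        return ptr;
      }
      ++allocations_;
    }
    return alloc_(bytes);  // cache miss: fall back to a real allocation
  }

  // Return a buffer to the cache instead of freeing it.
  void release(void* ptr, std::size_t bytes) {
    assert(ptr != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    idle_.emplace(bytes, ptr);
  }

  std::size_t allocations() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return allocations_;
  }

  std::size_t hits() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return hits_;
  }

private:
  AllocFn alloc_;
  FreeFn free_;
  mutable std::mutex mutex_;               // guards idle_ and the counters
  std::multimap<std::size_t, void*> idle_; // idle buffers, keyed by size
  std::size_t allocations_ = 0;
  std::size_t hits_ = 0;
};
```

In this sketch, an offloading thread would call `acquire` before launching a kernel and `release` afterwards. Only the free list is protected by the mutex, and an exact size match is required for reuse; a production cache might instead round sizes up to buckets or cap the amount of memory it retains.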

Files

AMD Results.pdf

Files (4.2 MB)

Checksum  Size
md5:ed795d349dde3726a0ee690788e81c8f  169.5 kB
md5:37d6d46809f9bde7f83c7f98cf360a50  633.1 kB
md5:7407c4281fbf46cb97644bb9cd0d3ccc  650.4 kB
md5:3d61f86afbdfbc09fdd898fcf41016d7  159.4 kB
md5:fa7204a4b09fb4076ebee8157ad9d01f  239.8 kB
md5:d13ed845a6cb26f40e0f460c178a07fa  6.6 kB
md5:8fa0482e116febd38c159a5f3fffb6b7  6.3 kB
md5:48af74cbbac7e56da2270bbe4173b779  3.7 kB
md5:a4f97fbc0a3812be92f1c70d6acee837  797.0 kB
md5:bb68a45a27b26dfac59fd1f9837baaa0  760.2 kB
md5:0c8bfcec4d3fa6085988c6e1223c70e9  798.3 kB