Published May 4, 2014 | Version v1
Working paper Open

Multi-GPGPU Cellular Automata Simulations using OpenACC

  • 1. sebastian.szkoda@ift.uni.wroc.pl Faculty of Physics and Astronomy, University of Wroclaw, Poland; Wrocław Centre for Networking and Supercomputing, Wroclaw University of Technology, Poland

Contributors

  • 1. sebastian.szkoda@ift.uni.wroc.pl Faculty of Physics and Astronomy, University of Wroclaw, Poland
  • 2. Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Poland; Wrocław Centre for Networking and Supercomputing, Wroclaw University of Technology, Poland

Description

The Frisch-Hasslacher-Pomeau (FHP) model is a lattice gas cellular automaton designed to simulate fluid flows using exact, purely Boolean arithmetic, without any round-off error. Here we investigate the problem of porting it efficiently to clusters of Fermi-class graphics processing units. To this end, two multi-GPU implementations were developed and examined: one using the NVIDIA CUDA and GPUDirect technologies explicitly, and the other using CUDA implicitly through OpenACC compiler directives, with the MPICH2 MPI interface for communication. For a single Tesla C2090 GPU device, both implementations yield up to a 7-fold acceleration over an algorithmically comparable, highly optimized multi-threaded implementation running on a server-class CPU. The weak scaling of the explicit multi-GPU CUDA implementation is almost linear for up to 8 devices (the maximum number of devices used in the tests), which suggests that the FHP model can be run successfully on much larger clusters and is a prospective candidate for exascale computational fluid dynamics. The scaling of the OpenACC approach turns out to be less favorable due to compiler-related technical issues. We found that the multi-GPU approach can bring considerable benefits for this class of problems, and that GPU programming can be significantly simplified through the use of the OpenACC standard without a significant loss of performance, provided that compilers supporting OpenACC improve their handling of communication between GPUs.
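To make the "purely Boolean, no round-off" claim concrete, the following is a minimal sketch of a collision step for the simplest textbook variant, FHP-I, written in C with OpenACC directives. It is not the implementation from the working paper: the abstract does not say which FHP variant or data layout was used. The function names (`fhp1_collide`, `fhp1_momentum`, `fhp1_collide_all`), the one-byte-per-node bit layout, and the parity-based choice of rotation direction are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one byte per lattice node; bit i (0..5) is set when a
   particle moves in direction i, with directions spaced 60 degrees apart. */

/* FHP-I collision for a single node. r is one bit choosing the rotation
   sense for head-on pairs. All other states pass through unchanged. */
#pragma acc routine seq
static uint8_t fhp1_collide(uint8_t s, int r)
{
    switch (s & 0x3F) {
    /* Head-on pairs (i, i+3) rotate by +60 degrees (r = 1) or -60 (r = 0). */
    case 0x09: return r ? 0x12 : 0x24;  /* directions 0,3 */
    case 0x12: return r ? 0x24 : 0x09;  /* directions 1,4 */
    case 0x24: return r ? 0x09 : 0x12;  /* directions 2,5 */
    /* Symmetric three-body states swap into each other. */
    case 0x15: return 0x2A;             /* directions 0,2,4 -> 1,3,5 */
    case 0x2A: return 0x15;             /* directions 1,3,5 -> 0,2,4 */
    default:   return s;
    }
}

/* Integer momentum of a node in axial (skewed) hexagonal coordinates,
   so conservation can be checked exactly, without floating point. */
static void fhp1_momentum(uint8_t s, int *px, int *py)
{
    static const int dx[6] = { 1, 0, -1, -1,  0,  1 };
    static const int dy[6] = { 0, 1,  1,  0, -1, -1 };
    *px = 0;
    *py = 0;
    for (int i = 0; i < 6; ++i) {
        if (s & (1u << i)) {
            *px += dx[i];
            *py += dy[i];
        }
    }
}

/* Apply the collision rule to every node. The rotation bit is taken from
   the parity of (node index + time step), a deterministic stand-in for a
   random bit. Compilers without OpenACC support ignore the pragmas and
   run the loop serially. */
void fhp1_collide_all(uint8_t *cells, size_t n, unsigned step)
{
    assert(cells != NULL || n == 0);
    #pragma acc parallel loop copy(cells[0:n])
    for (size_t i = 0; i < n; ++i) {
        int r = (int)((i + step) & 1u);
        cells[i] = fhp1_collide(cells[i], r);
    }
}
```

Every state is a 6-bit integer and every rule is a table-like mapping between such integers, so particle number and momentum are conserved exactly rather than to within a floating-point tolerance. In a multi-GPU setting, a step like this would be followed by a propagation step and a halo exchange between neighbouring subdomains, which is the communication the abstract identifies as the weak point of the OpenACC version.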

Files

WP154.pdf (480.3 kB)
md5:f6c1a8874a56200385c4064968817c3a

Additional details

Funding

PRACE-3IP – PRACE - Third Implementation Phase Project 312763
European Commission