Fine-grained MPI+OpenMP plasma simulations: communication overlap with dependent tasks
- 1. CEA, IRFM
- 2. CEA, maison de la simulation
- 3. INRIA, Univ. Lyon
Description
This paper demonstrates how OpenMP 4.5 tasks can be used
to efficiently overlap computations and MPI communications based on a
case-study conducted on multi-core and many-core architectures. It focuses
on task granularity, dependencies and priorities, and also identifies
some limitations of OpenMP. Results on 64 Skylake nodes show that
while 64% of the wall-clock time is spent in MPI communications, 60%
of the cores are busy in computations, which is a good result. Indeed,
the chosen dataset is small enough to be a challenging case in terms of
overlap and thus useful to assess worst-case scenarios in future simulations.
Two key features were identified: by using task priority we improved the
performance by 5.7% (mainly due to an improved overlap), and with recursive
tasks we shortened the execution time by 9.7%. We also illustrate
the need to have access to tools for task tracing and task visualization.
These tools allowed a fine understanding and a performance increase for
this task-based OpenMP+MPI code.
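The combination of task dependencies and priorities mentioned above can be illustrated with a minimal sketch. It is not the paper's code: `overlap_demo`, `compute_block`, `exchange_block`, `sum_block`, and the sizes `BLOCK` and `MAX_BLOCKS` are hypothetical names chosen for illustration, and `exchange_block` is only a local copy standing in for a real MPI exchange. The sketch assumes an OpenMP 4.5 compiler (for example with `-fopenmp`); without OpenMP the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 4, MAX_BLOCKS = 16 };  /* hypothetical sizes */

/* Computation on one block: scale every element. */
static void compute_block(double *data, size_t n, double factor) {
    for (size_t i = 0; i < n; ++i) data[i] *= factor;
}

/* Hypothetical stand-in for communication: a plain copy into a halo
 * buffer. A real code would issue MPI calls for this block here. */
static void exchange_block(const double *src, double *halo, size_t n) {
    for (size_t i = 0; i < n; ++i) halo[i] = src[i];
}

/* Consume the "received" block. */
static double sum_block(const double *halo, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += halo[i];
    return s;
}

/* One compute -> exchange -> reduce chain per block. Chains of different
 * blocks are independent, so the runtime may run the exchange of one
 * block while another block is still being computed. */
double overlap_demo(int n_blocks) {
    assert(n_blocks > 0 && n_blocks <= MAX_BLOCKS);
    double data[MAX_BLOCKS * BLOCK];
    double halo[MAX_BLOCKS * BLOCK];
    double partial[MAX_BLOCKS];

    for (int i = 0; i < n_blocks * BLOCK; ++i) data[i] = 1.0;

    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < n_blocks; ++b) {
            /* The first element of each block serves as dependency tag. */
            #pragma omp task depend(out: data[b * BLOCK])
            compute_block(&data[b * BLOCK], BLOCK, 2.0);

            /* priority is only a hint; it has an effect only if the
             * runtime honours it and OMP_MAX_TASK_PRIORITY is set
             * high enough. */
            #pragma omp task depend(in: data[b * BLOCK]) \
                             depend(out: halo[b * BLOCK]) priority(10)
            exchange_block(&data[b * BLOCK], &halo[b * BLOCK], BLOCK);

            #pragma omp task depend(in: halo[b * BLOCK]) \
                             depend(out: partial[b])
            partial[b] = sum_block(&halo[b * BLOCK], BLOCK);
        }
        #pragma omp taskwait
    }

    double total = 0.0;
    for (int b = 0; b < n_blocks; ++b) total += partial[b];
    return total;
}
```

Calling `overlap_demo(8)` builds eight independent three-task chains. Giving the exchange tasks a higher priority asks the runtime to start communication as early as possible, which is the general idea behind using priorities to improve overlap; how much this helps depends on the runtime and on the task granularity.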
Files
Name | Size |
---|---|
Richard_2019_MPI_OpenMP_communication_overlap.pdf (md5:d777fbbb6c2ab0588733484ed56dca34) | 669.3 kB |
Additional details
References
- In European Conference on Parallel Processing, pages 419-433. Springer (2019)