Published January 6, 2021 | Version v1
Journal article Open

Fine-grained MPI+OpenMP plasma simulations: communication overlap with dependent tasks

  • 1. CEA, IRFM
  • 2. CEA, maison de la simulation
  • 3. INRIA, Univ. Lyon

Description

This paper demonstrates how OpenMP 4.5 tasks can be used
to efficiently overlap computations and MPI communications based on a
case-study conducted on multi-core and many-core architectures. It focuses
on task granularity, dependencies and priorities, and also identifies
some limitations of OpenMP. Results on 64 Skylake nodes show that
while 64% of the wall-clock time is spent in MPI communications, 60%
of the cores are busy in computations, which is a good result. Indeed,
the chosen dataset is small enough to be a challenging case in terms of
overlap and thus useful to assess worst-case scenarios in future simulations.
Two key features were identified: by using task priority we improved the
performance by 5.7% (mainly due to an improved overlap), and with recursive
tasks we shortened the execution time by 9.7%. We also illustrate
the need to have access to tools for task tracing and task visualization.
These tools allowed a fine understanding and a performance increase for
this task-based OpenMP+MPI code.
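
As an illustration of the overlap pattern the description refers to, here is a minimal sketch in C using OpenMP 4.5 `depend` and `priority` clauses. It is not taken from the paper: `fake_exchange`, `compute_interior`, `compute_boundary`, `run_step`, `NBLOCKS` and `BLOCK` are hypothetical names, and the communication is replaced by a plain function so the example stays self-contained (a real code would issue MPI calls there). Note that `priority` is only a hint, and runtimes typically ignore it unless `OMP_MAX_TASK_PRIORITY` is set to a non-zero value.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 8     /* hypothetical number of data blocks */
#define BLOCK   1024  /* hypothetical number of elements per block */

/* Stand-in for a halo exchange. A real code would perform MPI
   communication here; this sketch only writes a known value. */
static void fake_exchange(double *halo, int b) { *halo = (double)b; }

/* Interior work: does not need the halo, so it can run while the
   exchange task for the same block is still in flight. */
static void compute_interior(double *block, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        block[i] += 1.0;
}

/* Boundary work: consumes the halo, so it must wait for the exchange. */
static void compute_boundary(double *block, double halo) { block[0] += halo; }

/* One step over all blocks; returns the sum of the boundary cells. */
double run_step(double data[NBLOCKS][BLOCK], double halo[NBLOCKS])
{
    assert(data != NULL && halo != NULL);

    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < NBLOCKS; ++b) {
            /* Communication task: higher priority so it is started early. */
            #pragma omp task depend(out: halo[b]) priority(10) firstprivate(b)
            fake_exchange(&halo[b], b);

            /* Independent of halo[b]: free to overlap with the exchange. */
            #pragma omp task depend(inout: data[b][0]) firstprivate(b)
            compute_interior(data[b], BLOCK);

            /* Ordered after both the exchange and the interior update. */
            #pragma omp task depend(in: halo[b]) depend(inout: data[b][0]) firstprivate(b)
            compute_boundary(data[b], halo[b]);
        }
    } /* implicit barrier: all tasks have completed here */

    double sum = 0.0;
    for (int b = 0; b < NBLOCKS; ++b)
        sum += data[b][0];
    return sum;
}
```

Compiled without OpenMP support the pragmas are ignored and the tasks simply run in program order, which gives the same result. With OpenMP enabled, the dependency clauses are what allow the runtime to schedule interior work of one block while the exchange of that or another block is pending.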

Files

Richard_2019_MPI_OpenMP_communication_overlap.pdf (669.3 kB)
md5:d777fbbb6c2ab0588733484ed56dca34

Additional details

Funding

European Commission: EoCoE-II – Energy Oriented Center of Excellence: toward exascale for energy (grant 824158)

References

  • In European Conference on Parallel Processing, pages 419-433. Springer (2019)