Report Open Access
Aumage, Olivier;
Carpenter, Paul;
Benkner, Siegfried
As HPC hardware continues to evolve and diversify and workloads become more dynamic and complex, applications need to be expressed in a way that facilitates high performance across a range of hardware and situations. The main application code should be platform-independent, malleable and asynchronous, with an open, clean, stable and dependable interface between the higher levels of the application, library or programming model and the kernels and software layers tuned for the machine. The platform-independent part should avoid direct references to specific resources and their availability, and instead provide the information needed to optimise behaviour.
This paper summarises why task abstraction, which first appeared in the 1990s and is already mainstream in HPC, should be the basis for a composable and dynamic performance-portable interface. It outlines the innovations required in the programming model and runtime layers, and highlights the need for application developers to place greater trust in the ability of the underlying software layers to extract full performance. These steps will help realise the vision of performance portability across current and future architectures and problems.
Name | Size |
---|---|
ETP4HPC_WP_Task-based-PP_FINAL.pdf (md5:21ef9ac844d9d6479d047cf095cef572) | 4.8 MB |