Heterogeneous High Performance Computing
- 1. BSC
- 2. HPE
- 3. Seagate Systems
- 4. Forschungszentrum Jülich
Description
Modern HPC systems are becoming increasingly heterogeneous in all of their components, from processing units through memory hierarchies and network components to storage systems. Two needs drive this trend: building larger yet more energy-efficient systems, and optimising systems (or parts of them) for particular workloads. Heterogeneity is not limited to the systems themselves: scientific and industrial applications increasingly combine different technologies into complex workflows that include simulation, data analytics, visualisation, and artificial intelligence/machine learning. Different steps in these workflows call for different hardware, so today's HPC systems are often composed of different modules optimised for particular workflow stages.
While the trend towards heterogeneity is helpful in many respects, it makes programming these systems and using them efficiently much more complicated. Often a combination of different programming models is required, and selecting suitable technologies for a given task, or even for part of an algorithm, is difficult. Heterogeneous components may require novel methods, and some novel methods only become feasible because of them. The trend is continuing: new technologies around the corner, e.g. neuromorphic or quantum accelerators, in-memory computing, and other non-von-Neumann approaches, will further increase heterogeneity.
In this paper, we present an overview of the different levels of heterogeneity found in HPC technologies and recommend research directions to help address the challenges they pose. We also point out opportunities that applications in particular can profit from by exploiting these technologies. Research efforts will be needed across the full spectrum, from system architecture, compilers, and programming models/languages to runtime systems, algorithms, and novel mathematical approaches.
Files
Name | Size |
---|---|
ETP4HPC_WP_Heterogeneous-HPC_20220216.pdf (md5:b2f9858906fe0c9711db4603206de8b5) | 3.2 MB |
Additional details
Funding
- European Commission: MAESTRO – Middleware for memory and data-awareness in workflows (grant agreement 801101)
- European Commission: DEEP-SEA – DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (grant agreement 955606)
- European Commission: EuroEXA – Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon (grant agreement 754337)
- European Commission: FocusCoE – Concerted action for the European HPC CoEs (grant agreement 823964)