E-CAM Software Porting and Benchmarking Data III
Description
This document is a joint technical report on the results of the initial porting and optimization of eight new E-CAM modules for massively parallel machines, and of their benchmarking and scaling on a variety of architectures. The modules were developed in the context of the E-CAM program of Extended Software Development Workshop (ESDW) events.
The applications investigated were:
- for Classical Molecular Dynamics: jobqueue_features, a High Throughput Computing library developed by E-CAM. The associated modules were developed in the context of the ESDW "Intelligent High Throughput Computing for Scientific Applications".
- for Electronic Structure: the ESL demonstrator, which is built from the components of the ESL bundle. The associated modules were developed in the context of an ESDW on scaling electronic structure applications.
- for Quantum Dynamics: CP2K integration into PaPIM code, and the new Surface Hopping code. The associated modules were developed in the context of an ESDW in Quantum Dynamics.
- for Meso- and Multi-scale Modelling: DL_MESO_DPD multi-GPU support, and GROMACS implementation of GC-AdResS. The associated modules were developed in relation to an ESDW in Meso and multiscale modeling.
For the jobqueue_features HTC library, PaPIM, and GC-AdResS, the modules presented in this deliverable represent the incorporation or use of external, scalable community codes (in particular LAMMPS, CP2K, and GROMACS) as libraries or test-beds. We examined the scalability of these community codes in previous iterations of this deliverable and do not repeat that effort here. Since these applications are the computational workhorses, we instead investigate the overhead incurred by their incorporation. The HTC library developed by E-CAM is shown to have very low overhead, with the potential for significant time (and CPU) savings for appropriate applications. The CP2K integration in PaPIM has been verified, and a scientific use case that takes this combination to extreme scale is under preparation. For the GC-AdResS implementation in GROMACS, we find that the automated load balancing of GROMACS does not cope well with the adaptive resolution scheme, and scalability is quite poor as a result. Incorporating the scheme into ESPResSo++ is being considered and is likely to benefit from the load-balancing library also being developed by E-CAM.
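The task-farming pattern that such an HTC library supports can be illustrated with a minimal sketch in plain Python. This is a generic illustration using only the standard library, not the jobqueue_features API: many short, independent tasks are farmed out to a pool of workers, and the per-task scheduling overhead remains small relative to the work itself. The `simulate_task` workload is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def simulate_task(seed: int) -> float:
    """Stand-in for one independent simulation run (hypothetical workload)."""
    # A tiny deterministic computation in place of a real MD or quantum task.
    return sum((seed + i) ** 0.5 for i in range(1000))

def run_task_farm(n_tasks: int, n_workers: int) -> list:
    """Farm n_tasks independent tasks out to a pool of n_workers workers.

    In a real HTC setting each task would be a separate batch job or an
    MPI-parallel simulation; here a thread pool stands in for the scheduler.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(simulate_task, range(n_tasks)))

if __name__ == "__main__":
    start = time.perf_counter()
    results = run_task_farm(n_tasks=32, n_workers=4)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} tasks completed in {elapsed:.2f} s")
```

Because the tasks are fully independent, the overhead measured in the deliverable reduces to the cost of dispatching and collecting results, which is what this pattern keeps low.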
For the ESL bundle and demonstrator, there is still room to improve the scalability of the demonstrator, which we hope will be further addressed in the upcoming second part of the relevant ESDW. We show here only some initial assessments of the ESL demonstrator (which is built on top of the ESL bundle).
We find that the Surface Hopping code is quite scalable but suffers from a problem similar to the previous iteration of PaPIM: there is insufficient computational work to keep the cores busy, and MPI overheads can dominate as a result.
A significant success story has been the multi-GPU development undertaken for DL_MESO_DPD, which has been shown to scale to 2048 Tesla P100 GPUs, equivalent to almost 10 petaflops of raw double-precision compute performance.
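The quoted aggregate figure can be sanity-checked with a quick back-of-the-envelope calculation, assuming the published peak double-precision rate of roughly 4.7 teraflops per Tesla P100 (the PCIe variant's specification; the exact model used is not stated here):

```python
# Assumed per-GPU peak double-precision rate (NVIDIA Tesla P100 PCIe spec).
P100_DP_TFLOPS = 4.7
N_GPUS = 2048

# Aggregate peak, converted from teraflops to petaflops.
peak_pflops = N_GPUS * P100_DP_TFLOPS / 1000.0
print(f"Aggregate peak: {peak_pflops:.1f} PFLOPS")  # prints "Aggregate peak: 9.6 PFLOPS"
```

At about 9.6 petaflops of peak double-precision throughput, this is consistent with the "almost 10 petaflops" stated above.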
Files
D7.6_01052019.pdf (2.8 MB, md5:7d2e24384fb25d8e6deb2084f1b7bf0b)