Published November 5, 2019 | Version v1
Presentation | Open Access

MPI-based tools for large-scale training and optimization at HPC sites

Description

MPI-learn and MPI-opt are libraries for large-scale training and hyper-parameter optimization of deep neural networks. Both libraries are based on the Message Passing Interface and allow these tasks to be performed on GPU clusters through different kinds of parallelism. Their main characteristic is flexibility: thanks to multi-backend support, the user has complete freedom in building her own model. In addition, the libraries support several cluster architectures, allowing deployment on multiple platforms. This generality could make them the basis for a train & optimise service for the HEP community. We present scalability results obtained from two typical HEP use cases: jet identification from raw data and shower generation with a GAN model. Results on GPU clusters were obtained at the ORNL Titan supercomputer and other HPC facilities, as well as on commercial cloud resources and OpenStack. A comprehensive comparison of scalability performance across platforms will be presented, together with a detailed description of the libraries and their functionalities.
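To make the two-level parallelism described above concrete, here is a minimal sketch of the general technique, not the actual MPI-learn/MPI-opt API: the world communicator is split into independent groups, one per hyper-parameter trial, and ranks within a group train the same model via data parallelism with an allreduce over gradients. It uses mpi4py and NumPy; the names N_GROUPS, N_PARAMS, local_gradient, and the learning-rate grid are hypothetical placeholders.

```python
# Sketch only: two-level MPI parallelism for training + hyper-parameter search.
# Assumptions (not from the record): mpi4py/NumPy, toy gradient, toy search space.
import numpy as np
from mpi4py import MPI

N_GROUPS = 2   # hypothetical: number of concurrent hyper-parameter trials
N_PARAMS = 10  # hypothetical: size of the flattened model parameters

world = MPI.COMM_WORLD
rank = world.Get_rank()

# Split the world communicator into independent training groups, one per
# hyper-parameter trial; ranks within a group train one model together.
group_id = rank % N_GROUPS
comm = world.Split(color=group_id, key=rank)

# Each trial gets its own hyper-parameter value (hypothetical search space).
learning_rate = 10.0 ** -(1 + group_id)

params = np.zeros(N_PARAMS)
rng = np.random.default_rng(seed=rank)

def local_gradient(params, rng):
    """Stand-in for a gradient computed on this rank's shard of the data."""
    return rng.standard_normal(params.shape)

for step in range(100):
    grad = local_gradient(params, rng)
    avg = np.empty_like(grad)
    # Data parallelism: sum gradients across the ranks of this group,
    # then average and take a synchronous SGD step.
    comm.Allreduce(grad, avg, op=MPI.SUM)
    avg /= comm.Get_size()
    params -= learning_rate * avg

if comm.Get_rank() == 0:
    print(f"trial {group_id}: lr={learning_rate:g}, "
          f"|params|={np.linalg.norm(params):.3f}")
```

Run with, e.g., `mpirun -n 8 python sketch.py`: the eight ranks split into N_GROUPS trials, and each trial trains on its own subset of ranks.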

Files (1.1 MB)

CHEP2019_35.pdf (1.1 MB, md5:6682bb5d1067abaa6a3afe65fe0f33be)