MPI-based tools for large-scale training and optimization at HPC sites

Vladimir Loncar; Jean-Roch Vlimant; Sofia Vallecorsa; Gul Rukh Khattak; Maurizio Pierini; Thong Nguyen; Federico Carminati

doi:10.5281/zenodo.3598749

Published November 5, 2019 | Version v1

Presentation (Open Access)


MPI-learn and MPI-opt are libraries for large-scale training and hyper-parameter optimization of deep neural networks. Both libraries are based on the Message Passing Interface (MPI) and allow these tasks to be performed on GPU clusters through different kinds of parallelism. Their main characteristic is flexibility: thanks to multi-backend support, users have complete freedom in building their own models. In addition, the libraries support several cluster architectures, allowing deployment on multiple platforms. This generality could make them the basis for a train & optimise service for the HEP community. We present scalability results obtained from two typical HEP use cases: jet identification from raw data and shower generation with a GAN model. Results on GPU clusters were obtained at the ORNL TITAN supercomputer and other HPC facilities, as well as on commercial cloud resources and OpenStack. A comprehensive comparison of scalability performance across platforms will be presented, together with a detailed description of the libraries and their functionalities.
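The abstract does not spell out how training is distributed. As a rough illustration of one common kind of parallelism for this task, synchronous data-parallel SGD, the Python sketch below simulates several workers inside one process: each worker computes a gradient on its own data shard, the gradients are averaged (the step an MPI allreduce would perform across ranks), and every worker applies the same update. The names `shard`, `local_gradient`, `allreduce_mean` and `train` are hypothetical and chosen for illustration only; this is not the MPI-learn API and not necessarily the algorithm the libraries use. In a real MPI job using mpi4py, the averaging step could instead be written as `comm.allreduce(grad, op=MPI.SUM) / comm.Get_size()`.

```python
"""Minimal sketch of synchronous data-parallel SGD.

Assumption: N MPI ranks are simulated inside a single process. In a real MPI
job each shard would live on a different rank and allreduce_mean() would be an
MPI allreduce. This is an illustration, not the MPI-learn implementation.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training pair


def shard(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin into one shard per worker (rank)."""
    return [data[r::n_workers] for r in range(n_workers)]


def local_gradient(w: float, shard_data: List[Sample]) -> float:
    """Gradient of the mean squared error of the model y = w * x on one shard."""
    n = len(shard_data)
    return sum(2.0 * (w * x - y) * x for x, y in shard_data) / n


def allreduce_mean(values: List[float]) -> float:
    """Stand-in for an MPI allreduce (sum) followed by division by world size."""
    return sum(values) / len(values)


def train(data: List[Sample], n_workers: int, lr: float = 0.01,
          steps: int = 200, w: float = 0.0) -> float:
    """Run synchronous data-parallel SGD and return the learned weight."""
    shards = shard(data, n_workers)
    for _ in range(steps):
        # On a real cluster these gradients would be computed concurrently.
        grads = [local_gradient(w, s) for s in shards]
        # Every worker receives the same averaged gradient and applies the
        # same update, so all model replicas stay identical.
        w -= lr * allreduce_mean(grads)
    return w
```

For example, fitting y = 3x on the points x = 1 to 8 with `train(data, n_workers=4)` recovers a weight close to 3. Because the four shards here are equal in size, averaging the per-shard gradients gives exactly the full-batch gradient; with unequal shards, each gradient would have to be weighted by its shard size for the same equivalence to hold.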

Files


Name	Size	MD5 checksum
CHEP2019_35.pdf	1.1 MB	6682bb5d1067abaa6a3afe65fe0f33be


Statistic	All versions	This version
Views	36	36
Downloads	25	25
Data volume	27.5 MB	27.5 MB


DOI

10.5281/zenodo.3598749

Resource type

Presentation

Publisher

Zenodo

Conference

24th International Conference on Computing in High Energy & Nuclear Physics, Adelaide, Australia (Session Track 9 - Exascale Science)

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: January 6, 2020
Modified: July 22, 2024
