Published June 16, 2021 | Version v1
Presentation Open

SmartSim: Online Analytics and Machine Learning for HPC Simulations

Description

SmartSim is an open-source library for online analysis and machine learning (ML) in traditional high-performance computing (HPC) simulations. It lets simulations written in C, C++, Fortran, and Python call out to PyTorch, TorchScript, and TensorFlow models, as well as any model that supports the ONNX format (e.g., scikit-learn). In addition, SmartSim's in-transit architecture allows simulation data to be streamed for online analysis, processing, and training.
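The call-out pattern described above can be illustrated with a minimal sketch. It does not use the SmartSim API: `InTransitStore`, its methods, the key names, and the "damping" model are all hypothetical stand-ins, with a plain in-memory dictionary taking the place of a shared store and a Python function taking the place of a trained model. The point is only the shape of the loop: the simulation publishes its state, asks the store to run a model on it, and reads the result back before the next timestep.

```python
from typing import Callable, Dict, List

Tensor = List[float]


class InTransitStore:
    """Toy in-memory stand-in for a shared data/model store (hypothetical)."""

    def __init__(self) -> None:
        self._tensors: Dict[str, Tensor] = {}
        self._models: Dict[str, Callable[[Tensor], Tensor]] = {}

    def put_tensor(self, key: str, data: Tensor) -> None:
        # Store a copy so later changes by the caller do not alter the store.
        self._tensors[key] = list(data)

    def get_tensor(self, key: str) -> Tensor:
        return list(self._tensors[key])

    def set_model(self, name: str, fn: Callable[[Tensor], Tensor]) -> None:
        self._models[name] = fn

    def run_model(self, name: str, input_key: str, output_key: str) -> None:
        # Inference happens next to the data; only keys cross the boundary.
        self._tensors[output_key] = self._models[name](self._tensors[input_key])


def simulate(store: InTransitStore, steps: int) -> List[Tensor]:
    """Toy time loop that requests one inference per timestep."""
    state: Tensor = [1.0, 2.0, 3.0]
    history: List[Tensor] = []
    for step in range(steps):
        in_key, out_key = f"state_{step}", f"correction_{step}"
        store.put_tensor(in_key, state)              # stream state out
        store.run_model("damping", in_key, out_key)  # online inference
        correction = store.get_tensor(out_key)       # read result back
        state = [s + c for s, c in zip(state, correction)]
        history.append(state)
    return history


store = InTransitStore()
# Assumption: a trivial function plays the role of a trained ML model.
store.set_model("damping", lambda x: [-0.5 * v for v in x])
history = simulate(store, steps=2)
print(history[-1])
```

Because every timestep's state is left in the store under its own key, a separate analysis or training process could read the same data while the simulation is still running, which is the streaming use case the description mentions.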

In this talk, we detail the SmartSim architecture and present benchmarks, including online inference and throughput, on multiple Cray XC50 supercomputers. We walk through examples, including how we used SmartSim to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes and all communicating with the same ML architecture at each simulation timestep. Lastly, we present our plans for open-source community involvement and describe current development directions and research.
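For a sense of scale, the ensemble figures above can be combined in a few lines. This is a back-of-the-envelope sketch: `EnsembleLayout` is a hypothetical helper, and the total counts only the nodes running the simulations, not any nodes that may host the ML side, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleLayout:
    """Hypothetical helper describing an ensemble's node footprint."""

    members: int
    nodes_per_member: int

    @property
    def simulation_nodes(self) -> int:
        # Nodes used by the simulations alone (ML hosting nodes excluded).
        return self.members * self.nodes_per_member


layout = EnsembleLayout(members=12, nodes_per_member=19)
print(layout.simulation_nodes)  # 12 * 19 = 228
```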

Files

Partee_2021-06-16.mp4

Files (89.5 MB)

Checksum                                Size
md5:51799c4856609c2179f2a094bd6104f8    85.0 MB
md5:d1b6d800ecf8617b06931cca51074527    4.5 MB