Presentation Open Access

SmartSim: Online Analytics and Machine Learning for HPC Simulations

Partee, Sam; Ellis, Matthew; Rigazzi, Alessandro; Bachman, Scott; Marques, Gustavo; Shao, Andrew

SmartSim is an open-source library dedicated to enabling online analysis and Machine Learning (ML) for traditional High Performance Computing (HPC) simulations. SmartSim provides the ability for simulations written in C, C++, Fortran, and Python to call out to PyTorch, TorchScript, and TensorFlow models, as well as any model that supports the ONNX format (e.g., scikit-learn). In addition, the in-transit architecture of SmartSim enables simulation data streaming for online analysis, processing, and training.

In this talk, we detail the SmartSim architecture and present benchmarks, including online inference and throughput, on multiple Cray XC50 supercomputers. We describe examples, including how we used SmartSim to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes, with all members communicating with the same ML architecture at each simulation timestep. Lastly, we present our plans for open-source community involvement and outline current development directions and research.
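
The ensemble pattern described above, in which several simulation members exchange data with one shared ML model at every timestep, can be pictured with a small toy sketch. This is an illustration under stated assumptions, not SmartSim code: `ToyStore`, `run_member`, and `toy_model` are hypothetical names, the store is an in-process dictionary rather than a distributed in-transit database, and the "model" is a placeholder function that halves its input.

```python
# Toy illustration only: an in-memory stand-in for a shared in-transit store.
# None of these names are SmartSim APIs; all are hypothetical.
from typing import Callable, Dict, List

Tensor = List[float]


class ToyStore:
    """Hypothetical shared store holding tensors and registered models."""

    def __init__(self) -> None:
        self._tensors: Dict[str, Tensor] = {}
        self._models: Dict[str, Callable[[Tensor], Tensor]] = {}

    def put_tensor(self, key: str, value: Tensor) -> None:
        self._tensors[key] = list(value)

    def get_tensor(self, key: str) -> Tensor:
        return list(self._tensors[key])

    def set_model(self, name: str, fn: Callable[[Tensor], Tensor]) -> None:
        self._models[name] = fn

    def run_model(self, name: str, in_key: str, out_key: str) -> None:
        # Inference runs next to the stored data, not inside the simulation.
        self._tensors[out_key] = self._models[name](self._tensors[in_key])


def run_member(store: ToyStore, member: int, steps: int) -> Tensor:
    """Advance one toy 'simulation', querying the shared model each timestep."""
    state = [float(member), float(member) + 1.0]
    for step in range(steps):
        in_key = f"member{member}_step{step}_in"
        out_key = f"member{member}_step{step}_out"
        store.put_tensor(in_key, state)                # send state to the store
        store.run_model("toy_model", in_key, out_key)  # request inference
        state = store.get_tensor(out_key)              # retrieve the result
    return state


store = ToyStore()
store.set_model("toy_model", lambda x: [0.5 * v for v in x])  # placeholder "ML model"

# Twelve members, as in the ensemble described above, all sharing one model.
final_states = [run_member(store, member, steps=3) for member in range(12)]
```

Each member only sends and retrieves tensors by key, so the model can be swapped out in the store without changing the simulation loop.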

Files (89.5 MB)
Partee_2021-06-16.mp4, 85.0 MB, md5:51799c4856609c2179f2a094bd6104f8
Partee_2021-06-16.pdf, 4.5 MB, md5:d1b6d800ecf8617b06931cca51074527
