Accelerated Machine Learning as a Service for Particle Physics Computing
Authors/Creators
- Javier Duarte (1)
- Burt Holzman (2)
- Sergo Jindariani (2)
- Thomas Klijnsma (2)
- Benjamin Kreis (2)
- Mia Liu (2)
- Kevin Pedro (2)
- Nhan Tran (2)
- Aristeidis Tsaris (2)
- Phil Harris (3)
- Dylan Rankin (3)
- Vladimir Loncar (4)
- Jennifer Ngadiuba (4)
- Maurizio Pierini (4)
- Suffian Khan (5)
- Brian Lee (5)
- Brandon Perez (5)
- Ted W. Way (5)
- Colin Versteeg (5)
- Scott Hauck (6)
- Shih-Chieh Hsu (6)
- Matthew Trahms (6)
- Dustin Werran (6)
- Zhenbin Wu (7)
- 1. University of California San Diego
- 2. Fermi National Accelerator Laboratory
- 3. MIT
- 4. CERN
- 5. Microsoft
- 6. University of Washington
- 7. University of Illinois at Chicago
Description
Large-scale particle physics experiments face challenging demands for high-throughput computing resources both now and in the future. New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that requires minimal modification to the current computing model. As an example, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC. Using Intel FPGAs deployed through Microsoft Azure Machine Learning to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework deployed as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600-700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
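The latency and throughput figures quoted above can be cross-checked with some simple arithmetic. The following is a minimal sketch in Python, not part of the original work: the function names are hypothetical, and the derived quantities (the implied CPU latency and the mean time per inference at the service) are back-of-envelope estimates computed from the quoted numbers rather than values reported in the description.

```python
# Figures quoted in the description. Latencies are in milliseconds.
CLOUD_LATENCY_MS = 60.0   # average inference time as a cloud service
EDGE_LATENCY_MS = 10.0    # average inference time as an edge/on-premises service
CLOUD_SPEEDUP = 30.0      # "approximately 30" relative to CPU inference
EDGE_SPEEDUP = 175.0      # "approximately 175" relative to CPU inference
THROUGHPUT_PER_S = (600.0, 700.0)  # inferences per second, image batch of one


def implied_cpu_latency_ms(accelerated_ms: float, speedup: float) -> float:
    """CPU latency implied by an accelerated latency and a speedup factor."""
    return accelerated_ms * speedup


def mean_time_per_inference_ms(throughput_per_s: float) -> float:
    """Mean spacing between completed inferences at a given throughput.

    This is not a per-request latency: the quoted throughput is reached
    with many CPUs calling the service concurrently.
    """
    return 1000.0 / throughput_per_s


cpu_from_cloud = implied_cpu_latency_ms(CLOUD_LATENCY_MS, CLOUD_SPEEDUP)
cpu_from_edge = implied_cpu_latency_ms(EDGE_LATENCY_MS, EDGE_SPEEDUP)

if __name__ == "__main__":
    print(f"Implied CPU latency (cloud figures): {cpu_from_cloud:.0f} ms")
    print(f"Implied CPU latency (edge figures):  {cpu_from_edge:.0f} ms")
    for rate in THROUGHPUT_PER_S:
        spacing = mean_time_per_inference_ms(rate)
        print(f"{rate:.0f} inferences/s -> {spacing:.2f} ms per inference")
```

Both speedup figures point to a CPU baseline of roughly 1.75 to 1.8 seconds per inference, which suggests the two quoted factors are mutually consistent. The mean time per inference at 600-700 inferences per second is well below the 10 ms edge latency, which is only possible because many requests are in flight at once.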
Files
| Name | Size |
|---|---|
| NeurIPS_ML4PS_2019_64.pdf (md5:530b873ddc85d6e60697e85c0b2091be) | 301.4 kB |