Benchmarking Neural Networks on Heterogeneous Hardware Resources
Authors/Creators
- 1. Aptiv Services Deutschland GmbH
- 2. University of Hildesheim
Description
In recent years, artificial intelligence (AI) has become a key enabling technology in many domains. To achieve the best performance, modern AI methods have high resource demands, e.g., GPU servers for training neural networks. With the advent of further processor technologies, such as tensor processors or reconfigurable (re-wirable) processors, AI methods may execute in less time while also saving energy. For many application domains, such as autonomous driving or unmanned aerial vehicles, real-time constraints mandate low end-to-end latencies in AI processing.
In this paper, we present a combined micro- and macro-benchmarking approach to analyze both the performance and the power demands of modern processor architectures, using convolutional neural networks as the workload. We discuss trade-offs among the different processor types and point out issues and challenges that arise when performing such benchmarks on heterogeneous hardware resources.
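The micro-benchmarking idea described above can be illustrated with a minimal, hedged sketch. This is not the authors' actual harness: the function names (`benchmark`, `toy_conv`) and the toy workload are illustrative assumptions; in the paper's setting the measured callable would be a CNN inference on the target device, and power would be sampled with device-specific tooling.

```python
import statistics
import time

def benchmark(fn, warmup=10, iters=100):
    """Micro-benchmark: run fn repeatedly and report latency statistics.

    Warm-up iterations are discarded so caches and frequency scaling
    settle before measurement -- essential when comparing
    heterogeneous processors fairly.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_ms": statistics.median(samples) * 1e3,
        "p95_ms": sorted(samples)[int(0.95 * iters)] * 1e3,
    }

# Stand-in workload: a toy 1-D "convolution" over a Python list.
def toy_conv():
    data = list(range(1024))
    kernel = [0.25, 0.5, 0.25]
    return [sum(data[i + j] * k for j, k in enumerate(kernel))
            for i in range(len(data) - 2)]

stats = benchmark(toy_conv)
print(f"median latency: {stats['median_ms']:.3f} ms")
```

Reporting the median rather than the mean makes the result robust against scheduling outliers, which matters when the same harness runs on devices with very different operating systems and drivers.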
We show that FPGAs deliver a 7x to 45x performance increase over high-end GPUs while using only 10% of the power. In the consumer space, novel architectures such as the Apple M1 offer 3-5x better performance at 10-20% of the power draw of current x86 CPU or GPU hardware.
This artifact contains the replication package (including the paper and slides) for the corresponding paper presented at the Symposium on Software Performance 2021.