LSTM acceleration with FPGA and GPU devices for edge computing applications in B5G MEC
Creators
- 1. National Technical University of Athens, Greece
- 2. National Technical University of Athens, Greece & National and Kapodistrian University of Athens, Greece
Description
The advent of AI/ML in B5G and Multi-Access Edge Computing (MEC) will rely on the acceleration of neural networks. The current work focuses on the acceleration of Long Short-Term Memory (LSTM) kernels, which play a key role in numerous applications. We consider various LSTM sizes and target FPGA and GPU hardware for both embedded and server-class MEC deployments. We systematically perform a design space exploration to determine the most efficient acceleration approach and the most suitable configuration for each device. We use High-Level Synthesis (HLS) to implement our proposed circuit architectures on Xilinx FPGAs, and high-level tools such as PyTorch's JIT compiler and ONNX Runtime for NVIDIA GPUs. Our exploration shows that fully parallelizing an LSTM's array multiplication quickly overutilizes the FPGA, whereas LSTM models can be deployed more readily on GPUs. Instead, the best approach for FPGAs is to balance the parallelization of the LSTM gates against that of the vector multiplications. Our comparative study shows that FPGAs prevail for lightweight LSTM models, whereas GPUs prevail for larger model topologies. Moreover, we show that far- and near-edge FPGAs achieve similar latency, whereas near-edge GPUs can execute one order of magnitude faster than far-edge GPUs. The best results range from 0.3 to 5 ms latency per execution, with acceleration factors of 12×–174×.
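As a minimal sketch of the GPU deployment flow named above (PyTorch's JIT compiler and ONNX Runtime), the snippet below wraps a standalone LSTM kernel, traces it to TorchScript, and exports it to ONNX. The layer sizes, file names, and wrapper class are hypothetical placeholders, not the configurations or scripts used in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical LSTM dimensions; the paper explores a range of sizes.
INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS, SEQ_LEN, BATCH = 32, 128, 1, 16, 1

class LSTMKernel(nn.Module):
    """Standalone LSTM kernel wrapped for JIT compilation / ONNX export."""
    def __init__(self) -> None:
        super().__init__()
        self.lstm = nn.LSTM(INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)   # discard the (h_n, c_n) state tuple
        return out

model = LSTMKernel().eval()
example = torch.randn(BATCH, SEQ_LEN, INPUT_SIZE)

# Path 1: TorchScript via the JIT compiler (tracing), runnable on a GPU with model.cuda().
scripted = torch.jit.trace(model, example)
scripted.save("lstm_kernel.pt")

# Path 2: ONNX export, for execution with ONNX Runtime (e.g. its CUDA execution provider).
torch.onnx.export(model, example, "lstm_kernel.onnx",
                  input_names=["x"], output_names=["y"], opset_version=13)
```

Either artifact can then be benchmarked per-inference on the target GPU; the FPGA path instead goes through HLS, where the degree of unrolling across gates and vector multiplications is the main tuning knob discussed in the abstract.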
Files
- LSTM___SAMOS_2022.pdf (521.9 kB, md5:1cfe9d36a0ea908f8d1c8111d4bee156)