Published May 16, 2023 | Version Pre-print
Conference paper | Open Access

LSTM acceleration with FPGA and GPU devices for edge computing applications in B5G MEC

  • 1. National Technical University of Athens, Greece
  • 2. National Technical University of Athens, Greece & National and Kapodistrian University of Athens, Greece

Description

The advent of AI/ML in B5G and Multi-access Edge Computing (MEC) will rely on the acceleration of neural networks. The current work focuses on the acceleration of Long Short-Term Memory (LSTM) kernels, which play a key role in numerous applications. We consider various LSTM sizes while targeting FPGA and GPU hardware for both embedded and server MEC purposes. We systematically perform a design space exploration to determine the most efficient acceleration approach and the most suitable configuration for each device. We use High-Level Synthesis (HLS) to implement our proposed circuit architectures on Xilinx FPGAs, while for NVIDIA GPUs we use high-level tools such as PyTorch's JIT compiler and ONNX Runtime. Our exploration shows that fully parallelizing an LSTM's array multiplications quickly overutilizes the FPGA, whereas LSTM models can be deployed more easily on GPUs. Instead, the best approach for FPGAs is to strike a balance between parallelizing the LSTM gates and parallelizing the vector multiplications within each gate. Our comparative study shows that FPGAs prevail for light LSTM models, whereas GPUs prevail for larger model topologies. Moreover, we show that far- and near-edge FPGAs achieve similar latency; however, near-edge GPUs can achieve one order of magnitude faster execution than far-edge GPUs. The best results range from 0.3 to 5 ms latency per execution, with acceleration factors of 12×–174×.
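For context, the LSTM cell underlying these kernels computes four gates per time step, each dominated by matrix-vector products against the input and the previous hidden state. This is the standard LSTM formulation, not notation reproduced from the paper:

```latex
\[
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]
```

Parallelism can be applied across the four gates, within each matrix-vector product, or both, which is the trade-off the FPGA design space exploration navigates.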
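As an illustration of the GPU deployment paths named in the abstract, the minimal sketch below compiles an LSTM with PyTorch's JIT (TorchScript tracing) and also exports it to ONNX for execution through ONNX Runtime. It is not the authors' code: the layer sizes, sequence length, file name, and provider list are illustrative placeholders, not configurations from the paper.

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Hypothetical LSTM kernel; sizes are placeholders, not the paper's models.
model = nn.LSTM(input_size=64, hidden_size=128, num_layers=1).eval()
x = torch.randn(25, 1, 64)  # (sequence length, batch, input features)

# Path 1: TorchScript JIT compilation via tracing.
traced = torch.jit.trace(model, (x,))
out, (h, c) = traced(x)

# Path 2: ONNX export, then inference through ONNX Runtime.
# nn.LSTM's (output, (h_n, c_n)) return value flattens to three ONNX outputs.
torch.onnx.export(model, (x,), "lstm.onnx",
                  input_names=["x"], output_names=["y", "h", "c"])
sess = ort.InferenceSession("lstm.onnx",
                            providers=["CUDAExecutionProvider",
                                       "CPUExecutionProvider"])
(y,) = sess.run(["y"], {"x": x.numpy()})
```

Either path hands the model to an optimized GPU runtime without hand-written kernels, which is what makes GPU deployment of LSTMs comparatively easy relative to the HLS flow.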

Files

LSTM___SAMOS_2022.pdf (521.9 kB)
md5:1cfe9d36a0ea908f8d1c8111d4bee156

Additional details

Funding

AI@EDGE – A secure and reusable Artificial Intelligence platform for Edge computing in beyond 5G Networks (Grant No. 101015922)
European Commission