Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs
Description
Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision of neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats (FP8) in the context of PTQ for model inference. However, floating-point formats smaller than 8 bits, and their comparison against integers in terms of accuracy and hardware cost, remain unexplored on FPGAs. In this work, we present minifloats, which are reduced-precision floating-point formats capable of further reducing the memory footprint, latency, and energy cost of a model while approaching full-precision model accuracy. We implement a custom FPGA-based multiply-accumulate operator library and explore the vast design space, comparing minifloat and integer representations across 3 to 8 bits for both weights and activations. We also examine the applicability of various integer-based quantization techniques to minifloats. Our experiments show that minifloats offer a promising alternative for emerging workloads such as vision transformers.
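To illustrate the idea of a minifloat, the sketch below rounds a real value to the nearest value representable in a generic sign/exponent/mantissa format with IEEE-like subnormals and saturation. This is a simplified, hypothetical illustration only, not the paper's FPGA operator library or Brevitas's actual implementation, and it ignores format-specific choices such as reserved NaN/infinity encodings.

```python
import math

def quantize_minifloat(x, exp_bits=4, man_bits=3):
    """Round x to the nearest value representable in a signed minifloat
    with exp_bits exponent bits and man_bits mantissa bits.
    Simplified sketch: IEEE-like bias and subnormals, saturating at the
    largest representable magnitude, no NaN/inf handling."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    bias = 2 ** (exp_bits - 1) - 1
    # Largest exponent if the all-ones exponent field encodes a normal value
    max_exp = 2 ** exp_bits - 1 - bias
    max_val = (2.0 - 2.0 ** (-man_bits)) * 2.0 ** max_exp
    min_exp = 1 - bias                 # smallest normal exponent
    e = math.floor(math.log2(x))
    e = max(e, min_exp)                # below this binade, values are subnormal
    scale = 2.0 ** (e - man_bits)      # quantization step within this binade
    q = round(x / scale) * scale       # round-to-nearest
    return sign * min(q, max_val)      # saturate instead of overflowing

# Example: a 4-exponent-bit, 3-mantissa-bit minifloat
print(quantize_minifloat(0.1))   # nearest representable value to 0.1
print(quantize_minifloat(1e6))   # saturates at the format's maximum
```

Varying `exp_bits` and `man_bits` spans the kind of 3- to 8-bit design space the paper explores, trading dynamic range (exponent bits) against resolution (mantissa bits).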
Files
- Shedding_the_Bits.pdf (2.1 MB) — md5:19a021beb0835dc0761957a0cc96c493
Additional details
Identifiers
- arXiv
- arXiv:2311.12359
- DOI
- 10.1109/FPL64840.2024.00048
Software
- Repository URL
- https://github.com/Xilinx/brevitas
- Programming language
- Python
- Development Status
- Active