Flexible Accelerator Library: Approximate Convolution Accelerator
Authors/Creators
- 1. Instituto Tecnológico de Costa Rica
- 2. Instituto Tecnológico de Costa Rica, Università degli Studi di Trieste
Description
In this work, we propose a convolution engine parameterised by the input and output dimensions, the datatype, and the arithmetic operators. Making these configurable allows approximate computing techniques to be applied, which makes better use of hardware resources than standard datatypes and exact arithmetic.
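\(Y_{ij} = \mathcal{S}^{\kappa}_{n=-\kappa} \{ \mathcal{S}^{\kappa}_{m=-\kappa} \{ \mathcal{M} \{ X_{i+n,j+m}, K_{mn} \} \} \}\)
where \(X\) is the input, \(K\) is the \((2\kappa+1) \times (2\kappa+1)\) kernel, and \(\mathcal{S}\) and \(\mathcal{M}\) are the configurable, possibly approximate, summation and multiplication operators.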
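The equation can be read as a loop nest in which both operators are passed in as parameters. The C++ sketch below is illustrative only: it fixes the datatype to `std::int32_t` for brevity, expects the input to be padded by \(\kappa\) on every side, and stores the kernel as `k[m + kappa][n + kappa]` to mirror the \(K_{mn}\) indexing. The names `convolve`, `exact_mul`, `exact_add` and `TruncatedMul` are hypothetical, and the truncating multiplier is just one example of an approximate operator, not one taken from the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<std::int32_t>>;

// Y_ij = S_{n=-kappa}^{kappa} S_{m=-kappa}^{kappa} M(X_{i+n, j+m}, K_mn).
// `x` is the input already padded by kappa pixels on every side, so output
// pixel (i, j) reads x[i + n + kappa][j + m + kappa]. `mul` plays the role
// of M and `acc` the role of S (a binary accumulate step).
template <typename Mul, typename Acc>
Matrix convolve(const Matrix& x, const Matrix& k, int kappa, Mul mul, Acc acc) {
  const int ksize = 2 * kappa + 1;
  assert(static_cast<int>(k.size()) == ksize);
  const int rows = static_cast<int>(x.size()) - 2 * kappa;
  const int cols = static_cast<int>(x[0].size()) - 2 * kappa;
  Matrix y(rows, std::vector<std::int32_t>(cols, 0));
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      std::int32_t sum = 0;
      for (int n = -kappa; n <= kappa; ++n) {
        for (int m = -kappa; m <= kappa; ++m) {
          // Kernel indexed as K_mn, exactly as written in the equation.
          sum = acc(sum, mul(x[i + n + kappa][j + m + kappa], k[m + kappa][n + kappa]));
        }
      }
      y[i][j] = sum;
    }
  }
  return y;
}

// Exact operators.
std::int32_t exact_mul(std::int32_t a, std::int32_t b) { return a * b; }
std::int32_t exact_add(std::int32_t a, std::int32_t b) { return a + b; }

// Hypothetical approximate multiplier: clears the `drop` least significant
// bits of the exact product. Illustration only.
struct TruncatedMul {
  int drop;
  std::int32_t operator()(std::int32_t a, std::int32_t b) const {
    return (a * b) & ~((std::int32_t{1} << drop) - 1);
  }
};
```

Swapping `exact_mul` for `TruncatedMul{2}` changes only the operator argument, not the loop structure, which is the point of parameterising \(\mathcal{M}\) and \(\mathcal{S}\).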
To increase the computing capacity of each processing element (PE), we propose that each unit compute a window of output pixels instead of a single pixel. The output window size is also configurable, which increases parallelism at the PE level: the PE computes multiple output pixels while multiple inputs arrive at it. From now on, the window-based convolution derived from the equation above will be called Window-Based Spatial Convolution.
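A minimal sketch of the window-based idea, assuming integer data, an input padded by \(\kappa\) on every side, and output dimensions that are multiples of the window size `wo`. The names `pe_window` and `window_convolve` are hypothetical and do not come from the library; the sketch only shows how one PE call produces a whole `wo` x `wo` block of outputs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<std::int32_t>>;

// Hypothetical PE model: computes the wo x wo output window whose top-left
// output pixel is (r0, c0). `x` is padded by kappa on every side, so the PE
// reads the (wo + 2*kappa) x (wo + 2*kappa) input pixels covering its window.
Matrix pe_window(const Matrix& x, const Matrix& k, int kappa, int r0, int c0, int wo) {
  Matrix out(wo, std::vector<std::int32_t>(wo, 0));
  for (int n = -kappa; n <= kappa; ++n) {
    for (int m = -kappa; m <= kappa; ++m) {
      const std::int32_t w = k[m + kappa][n + kappa];  // K_mn
      // The wo * wo accumulations below are independent of each other, so
      // hardware can perform them in parallel for every kernel tap.
      for (int a = 0; a < wo; ++a)
        for (int b = 0; b < wo; ++b)
          out[a][b] += x[r0 + a + n + kappa][c0 + b + m + kappa] * w;
    }
  }
  return out;
}

// Covers the whole output with non-overlapping wo x wo windows.
Matrix window_convolve(const Matrix& x, const Matrix& k, int kappa, int wo) {
  const int rows = static_cast<int>(x.size()) - 2 * kappa;
  const int cols = static_cast<int>(x[0].size()) - 2 * kappa;
  assert(rows % wo == 0 && cols % wo == 0);  // sketch: no partial windows
  Matrix y(rows, std::vector<std::int32_t>(cols, 0));
  for (int r0 = 0; r0 < rows; r0 += wo)
    for (int c0 = 0; c0 < cols; c0 += wo) {
      const Matrix win = pe_window(x, k, kappa, r0, c0, wo);
      for (int a = 0; a < wo; ++a)
        for (int b = 0; b < wo; ++b) y[r0 + a][c0 + b] = win[a][b];
    }
  return y;
}
```

Placing the kernel-tap loops outermost means each weight is fetched once per window and reused for all `wo * wo` outputs; an HLS tool could unroll the two inner loops into parallel multiply-accumulate units.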
We also implemented the Winograd convolution. In the current implementation, the Winograd PEs are restricted to kernel sizes \(n_k \in \{3, 5, 7\}\) and an output window size of \(n_y = 2\), which makes them less general than the window-based spatial convolution. The input tile size \(n_x \times n_x\) is given by \(n_x = n_k + n_y - 1\). For \(n_k = 3\), the Winograd operations are written out explicitly without loops (loop unrolling plus algebraic simplifications). For larger kernels, the transformations are computed with a for-loop-based matrix multiplication function, since writing the operations out explicitly becomes unreadable and impractical. Moreover, the intermediate results are stored in matrices whose entries have twice the bit width of the input/output matrix entries.
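For a 3x3 kernel with a 2x2 output window, the input tile is 4x4 (3 + 2 - 1). The following is an illustrative C++ sketch of Winograd F(2x2, 3x3), not the project's code: it uses the textbook transform matrices \(B^T\), \(G\), \(A^T\) together with a for-loop matrix product, scales \(G\) (which contains entries of 1/2) by 2 so that all arithmetic stays in integers, and divides the result by 4 at the end. It widens 16-bit inputs to 32-bit intermediates to mirror the doubled bit width, and it assumes input values small enough that those intermediates cannot overflow. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// R x C matrix of 32-bit intermediates (twice the width of the 16-bit data).
template <std::size_t R, std::size_t C>
using Mat = std::array<std::array<std::int32_t, C>, R>;

// For-loop matrix product: (R x I) * (I x C).
template <std::size_t R, std::size_t I, std::size_t C>
Mat<R, C> matmul(const Mat<R, I>& a, const Mat<I, C>& b) {
  Mat<R, C> out{};
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t i = 0; i < I; ++i) out[r][c] += a[r][i] * b[i][c];
  return out;
}

template <std::size_t R, std::size_t C>
Mat<C, R> transpose(const Mat<R, C>& a) {
  Mat<C, R> t{};
  for (std::size_t r = 0; r < R; ++r)
    for (std::size_t c = 0; c < C; ++c) t[c][r] = a[r][c];
  return t;
}

// Winograd F(2x2, 3x3): 4x4 input tile d, 3x3 kernel g, 2x2 output
// Y = A^T [(G g G^T) .* (B^T d B)] A, computed as a correlation like the
// spatial convolution. G is scaled by 2 so that everything stays integer;
// this scales Y by 4, which is undone at the end.
Mat<2, 2> winograd_f2x2_3x3(const std::array<std::array<std::int16_t, 4>, 4>& d16,
                            const std::array<std::array<std::int16_t, 3>, 3>& g16) {
  static const Mat<4, 4> BT = {{{1, 0, -1, 0}, {0, 1, 1, 0}, {0, -1, 1, 0}, {0, 1, 0, -1}}};
  static const Mat<4, 3> G2 = {{{2, 0, 0}, {1, 1, 1}, {1, -1, 1}, {0, 0, 2}}};  // 2 * G
  static const Mat<2, 4> AT = {{{1, 1, 1, 0}, {0, 1, -1, -1}}};

  // Widen the 16-bit inputs to 32-bit intermediates.
  Mat<4, 4> d{};
  Mat<3, 3> g{};
  for (std::size_t r = 0; r < 4; ++r)
    for (std::size_t c = 0; c < 4; ++c) d[r][c] = d16[r][c];
  for (std::size_t r = 0; r < 3; ++r)
    for (std::size_t c = 0; c < 3; ++c) g[r][c] = g16[r][c];

  const Mat<4, 4> U = matmul(matmul(G2, g), transpose(G2));  // 4 * (G g G^T)
  const Mat<4, 4> V = matmul(matmul(BT, d), transpose(BT));  // B^T d B
  Mat<4, 4> M{};
  for (std::size_t r = 0; r < 4; ++r)
    for (std::size_t c = 0; c < 4; ++c) M[r][c] = U[r][c] * V[r][c];  // element-wise
  const Mat<2, 2> Y4 = matmul(matmul(AT, M), transpose(AT));  // 4 * Y

  Mat<2, 2> y{};
  for (std::size_t r = 0; r < 2; ++r)
    for (std::size_t c = 0; c < 2; ++c) y[r][c] = Y4[r][c] / 4;  // exact division
  return y;
}
```

The element-wise product needs 16 multiplications per 2x2 output tile, compared with 36 for the direct 3x3 convolution of the same four pixels; the extra additions live in the transforms.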
This release:
- Adds the first version of the project
- Includes:
  - Stable spatial convolution (window-based)
  - Stable Winograd convolution for 3x3 kernels
  - Unstable Winograd convolution for 5x5 and 7x7 kernels
  - Unstable FFT-based convolution
- Also includes two accelerator examples:
  - An accelerator with control
  - A streaming accelerator without control, which is faster than the controlled one
- Also includes a build system for synthesising IPs and running co-simulation