Published April 1, 2022 | Version v0.1.0
Software Open

Flexible Accelerator Library: Approximate Convolution Accelerator

  • 1. Instituto Tecnológico de Costa Rica
  • 2. Instituto Tecnológico de Costa Rica, Università degli Studi di Trieste

Description

In this work, we propose a convolution engine parameterised in the input and output dimensions, the datatype, and the arithmetic operators. This makes it possible to apply approximate computing techniques and to use resources more efficiently than with standard datatypes and exact arithmetic.

 

\(Y_{ij} = \mathcal{S}^{ \kappa }_{n=-\kappa} \{ \mathcal{S}^{\kappa}_{m=-\kappa} \{ \mathcal{M} \{ X_{i+n,j+m} , K_{mn}\} \} \}\)
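
As a rough software illustration of the formula above, the following sketch treats \(\mathcal{M}\) as a pluggable element product and \(\mathcal{S}\) as a pluggable accumulation. This reading of the two symbols is an assumption, and the names `generic_conv2d`, `m_op`, `s_op`, and `truncated_mul` are hypothetical; none of them are taken from the library itself.

```python
from functools import reduce
from operator import add, mul


def generic_conv2d(x, k, m_op=mul, s_op=add):
    """Valid-region 2D convolution with pluggable operators.

    m_op plays the role of M (element product) and s_op the role of S
    (accumulation) in the formula; both names are illustrative.
    """
    kappa = len(k) // 2  # kernel is (2*kappa + 1) x (2*kappa + 1)
    rows, cols = len(x), len(x[0])
    y = []
    for i in range(kappa, rows - kappa):
        out_row = []
        for j in range(kappa, cols - kappa):
            # n offsets the row of X, m the column; K is read as K[m][n],
            # following the index order printed in the formula.
            products = [
                m_op(x[i + n][j + m], k[m + kappa][n + kappa])
                for n in range(-kappa, kappa + 1)
                for m in range(-kappa, kappa + 1)
            ]
            out_row.append(reduce(s_op, products))
        y.append(out_row)
    return y


def truncated_mul(a, b, drop=2):
    """Hypothetical approximate multiplier: drops the low bits of each operand."""
    return ((a >> drop) * (b >> drop)) << (2 * drop)
```

Calling `generic_conv2d(x, k)` gives the exact result for integer inputs, while `generic_conv2d(x, k, m_op=truncated_mul)` swaps in the approximate product without touching the loop structure.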


To add more computing power to the processing elements (PEs), we propose that each unit compute a window of pixels instead of a single pixel. The output window size is also configurable, which increases the parallelism at the PE level and allows multiple pixels to be computed while multiple inputs arrive at the PE. The window-based convolution based on (7) will be called Window-Based Spatial Convolution from now on.
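
The window idea can be sketched in software as follows. This is a minimal model with exact arithmetic, assuming that a PE receives a square input tile and returns a square output window, and that the output size is a multiple of the window size; `pe_window` and `window_conv2d` are hypothetical names, not functions of the library. An output window of side `ny` needs an input tile of side `nk + ny - 1`.

```python
def pe_window(tile, k):
    """One PE: turn an Nx-by-Nx input tile into an Ny-by-Ny output window."""
    nk = len(k)
    ny = len(tile) - nk + 1
    return [
        [
            # K is read as k[b][a] to follow the K_{mn} index order of the formula.
            sum(tile[r + a][c + b] * k[b][a] for a in range(nk) for b in range(nk))
            for c in range(ny)
        ]
        for r in range(ny)
    ]


def window_conv2d(x, k, ny=2):
    """Valid-region convolution assembled from Ny-by-Ny output windows."""
    nk = len(k)
    nx = nk + ny - 1  # side of the input tile each PE consumes
    out_rows, out_cols = len(x) - nk + 1, len(x[0]) - nk + 1
    if out_rows % ny or out_cols % ny:
        raise ValueError("output size must be a multiple of the window size")
    y = [[0] * out_cols for _ in range(out_rows)]
    for r0 in range(0, out_rows, ny):
        for c0 in range(0, out_cols, ny):
            # Neighbouring tiles overlap by nk - 1 rows and columns.
            tile = [row[c0:c0 + nx] for row in x[r0:r0 + nx]]
            window = pe_window(tile, k)
            for r in range(ny):
                y[r0 + r][c0:c0 + ny] = window[r]
    return y
```

Each call to `pe_window` is independent of the others, which is what lets a hardware design run several PEs side by side.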

We also implemented the Winograd convolution. In our current implementation, we restricted the Winograd PEs to kernel sizes 𝑁𝑘 ∈ {3, 5, 7} and 𝑁𝑦 = 2 output windows, making it more specific than the former convolution technique. The input matrix size 𝑁𝑥 × 𝑁𝑥 is computed as 𝑁𝑥 = 𝑁𝑘 + 𝑁𝑦 − 1. For small 𝑁𝑘, the Winograd operations are written out explicitly without loops (loop unrolling plus algebraic simplifications). For larger kernels, we use a for-loop-based matrix multiplication function to compute the transformation, given that writing out every operation becomes unreadable and impractical. Moreover, the intermediate results are stored in matrices whose entries occupy twice the bits of the input/output matrix entries.
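
For the 3x3 case with a 2x2 output window (so a 4x4 input tile), the textbook Winograd F(2x2, 3x3) transform can be sketched as below. This is not the library's unrolled implementation: it uses the standard transform matrices with a generic loop-based matrix product for readability, indexes the kernel row first, and ignores the widened intermediate storage. The names `matmul`, `transpose`, and `winograd_2x2_3x3` are hypothetical.

```python
def matmul(a, b):
    """Plain loop-based matrix product of two lists of lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Standard F(2x2, 3x3) transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def winograd_2x2_3x3(d, g):
    """Return the 2x2 output window for a 4x4 input tile d and a 3x3 kernel g."""
    u = matmul(matmul(G, g), transpose(G))    # kernel transform, 4x4
    v = matmul(matmul(BT, d), transpose(BT))  # input transform, 4x4
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]  # 16 products
    return matmul(matmul(AT, m), transpose(AT))  # output transform, 2x2
```

The element-wise stage uses 16 multiplications, against 36 for the direct computation of the same four outputs. The kernel transform `u` depends only on `g`, so it can be computed once and reused for every tile.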

This release:

  • Adds the first version of the project
  • Includes:
    • Stable Spatial convolution: window-based
    • Stable Winograd convolution for 3x3 kernels
    • Unstable Winograd convolution for 5x5 and 7x7 kernels
    • Unstable FFT convolutions
  • It also includes two accelerator examples:
    • An accelerator with control
    • A streaming accelerator without control, which is faster
  • Build system for synthesising IPs and running co-simulation

Files

approximate-convolution-accelerator-v0.1.0.zip

Files (2.4 MB)

md5:924ab42966ad1f09bdbfdbec7aad0b69 (2.4 MB)
md5:987bd9cd6f2ae1034be2888464db789a (11.3 kB)
md5:1334a84cb36e30ec4e3cd6e0908c9603 (4.6 kB)