Conference paper Open Access
Marco Cococcioni; Federico Rossi; Emanuele Ruffaldi; Sergio Saponara
With the pervasiveness of deep neural networks in scenarios with real-time requirements, there is an increasing need for optimized arithmetic on high-performance architectures. In this paper we adopt two key ideas: (i) extensive use of vectorization to accelerate the computation of deep neural network kernels; (ii) adoption of posit arithmetic, a compressed number format, to reduce memory transfers between the vector registers and the rest of the memory architecture. Finally, we present our first results on a real hardware implementation of the ARM Scalable Vector Extension (SVE).
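The memory-transfer argument rests on how compactly posits pack real numbers. For scale, SVE vector registers are 128 to 2048 bits wide, so a 512-bit register holds 16 32-bit floats but 64 8-bit posits. As an illustration only, the sketch below decodes an n-bit posit (default 8 bits with `es = 0` exponent bits) into a Python float using the usual posit layout: sign bit, variable-length regime, `es` exponent bits, then fraction. The function name `decode_posit` and its parameters are hypothetical; this is not the authors' implementation, which targets vectorized kernels on SVE hardware.

```python
def decode_posit(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode an n-bit posit with es exponent bits into a float (illustrative sketch)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR (Not a Real), the single exception value
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    # Regime: run of identical bits right after the sign bit.
    pos = n - 2
    first = (bits >> pos) & 1
    run = 0
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    pos -= 1  # skip the bit that terminates the regime (if present)

    # Exponent: next es bits; bits that fall off the end count as 0.
    e = 0
    for _ in range(es):
        e <<= 1
        if pos >= 0:
            e |= (bits >> pos) & 1
            pos -= 1

    # Fraction: whatever bits remain, with an implicit leading 1.
    frac_bits = max(pos + 1, 0)
    frac = bits & ((1 << frac_bits) - 1)
    mantissa = 1.0 + frac / (1 << frac_bits)

    # value = useed**k * 2**e * mantissa, with useed = 2**(2**es)
    scale = k * (1 << es) + e
    value = mantissa * 2.0 ** scale
    return -value if sign else value
```

For posit<8,0>, `decode_posit(0x50)` gives 1.5, `decode_posit(0xC0)` gives -1.0, and `decode_posit(0x7F)` gives 64.0, the largest representable value. This scalar, bit-by-bit decoder favors clarity over speed.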
Name | Size
---|---
Rossi_C5_applepies_2021.pdf (md5:86938c6836de4bcbf4c52beb705f6a58) | 279.1 kB