Combining Relevance and Magnitude for Resource-saving DNN Pruning
Description
Pruning neural networks, i.e., removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline, especially in resource- and/or bandwidth-constrained scenarios. In this context, the pruning technique, i.e., how to choose the parameters to remove, is critical to the system performance. In this paper, we propose a novel pruning approach, called FlexRel and predicated upon combining training-time and inference-time information, namely, parameter magnitude and relevance, in order to improve the resulting accuracy whilst saving both computational resources and bandwidth. Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.
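The abstract describes scoring parameters by combining magnitude (training-time information) with relevance (inference-time information) and pruning the lowest-scoring ones. The sketch below is only an illustrative rendering of that idea, not the paper's actual FlexRel algorithm: the scoring function `flexrel_style_prune`, the mixing weight `alpha`, and the min-max normalisation are all assumptions made for the example.

```python
import numpy as np

def flexrel_style_prune(weights, relevance, alpha=0.5, prune_frac=0.35):
    """Illustrative combined-score pruning (NOT the paper's exact method).

    weights:    parameter tensor (|weights| is the magnitude signal)
    relevance:  per-parameter relevance scores, same shape as weights
    alpha:      assumed mixing weight between magnitude and relevance
    prune_frac: fraction of parameters to zero out
    """
    # Normalise both signals to [0, 1] so they are comparable.
    mag = np.abs(weights)
    mag = mag / (mag.max() + 1e-12)
    rel = relevance / (relevance.max() + 1e-12)

    # Combined score: parameters low in BOTH magnitude and relevance
    # are the best pruning candidates.
    score = alpha * mag + (1.0 - alpha) * rel

    # Zero out the prune_frac lowest-scoring parameters.
    k = int(prune_frac * weights.size)
    idx = np.argsort(score, axis=None)[:k]
    mask = np.ones(weights.size, dtype=bool)
    mask[idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask
```

Pruning 35% of the parameters this way mirrors the bandwidth saving quoted above: only the surviving ~65% of weights need to be stored or transmitted.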
Files

| Name | Size | MD5 |
|---|---|---|
| Combining_Relevance_and_Magnitude_for_Resource-saving_DNN_Pruning.pdf | 379.7 kB | d8ac8ec274283f5576b3ef7d830c7b01 |
Additional details

Dates
- Accepted: 2025-04-24