Published April 24, 2025 | Version v1
Publication | Open

Combining Relevance and Magnitude for Resource-saving DNN Pruning

  • 1. Polytechnic University of Turin
  • 2. Politecnico di Torino

Description

Pruning neural networks, i.e., removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline, especially in resource- and/or bandwidth-constrained scenarios. In this context, the pruning technique, i.e., how to choose the parameters to remove, is critical to the system performance. In this paper, we propose a novel pruning approach, called FlexRel and predicated upon combining training-time and inference-time information, namely, parameter magnitude and relevance, in order to improve the resulting accuracy whilst saving both computational resources and bandwidth. Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.
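The abstract describes ranking parameters by a combination of magnitude and relevance and removing the lowest-ranked ones. A minimal sketch of that general idea is below; the function name, the convex-combination weighting `alpha`, and the normalization are illustrative assumptions, not the paper's actual FlexRel algorithm.

```python
import numpy as np

def combined_prune_mask(weights, relevance, alpha=0.5, prune_fraction=0.35):
    """Rank parameters by a convex combination of normalized magnitude
    and relevance scores; mask out the lowest-ranked fraction.
    (Illustrative sketch only -- not the FlexRel scoring rule.)"""
    mag = np.abs(weights)
    mag = mag / (mag.max() + 1e-12)              # normalize magnitudes to [0, 1]
    rel = relevance / (relevance.max() + 1e-12)  # normalize relevance to [0, 1]
    score = alpha * mag + (1.0 - alpha) * rel    # combined importance score
    k = int(prune_fraction * weights.size)       # number of parameters to drop
    threshold = np.partition(score.ravel(), k)[k]
    return score >= threshold                    # True = keep this parameter

# Usage with random stand-ins for weights and relevance scores:
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
r = np.abs(rng.normal(size=(64, 64)))
mask = combined_prune_mask(w, r, alpha=0.5, prune_fraction=0.35)
```

Applying `w * mask` then zeroes the pruned parameters; with continuous scores, exactly `prune_fraction` of them are dropped.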

Files (379.7 kB)

Combining_Relevance_and_Magnitude_for_Resource-saving_DNN_Pruning.pdf

Additional details

Dates

Accepted
2025-04-24