Published September 22, 2024 | Version v1
Conference paper | Open Access

InDistill: Information flow-preserving knowledge distillation for model compression

  • 1. Centre for Research and Technology Hellas
  • 2. Centre for Research and Technology-Hellas

Description

In this paper, we introduce InDistill, a method that serves as a warmup stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved through a curriculum learning-based training scheme that considers the distillation difficulty of each layer and the critical learning periods during which the information flow paths are established. This procedure yields a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation when there is a discrepancy between the widths of the teacher and student layers. This pruning operation reduces the width of the teacher's intermediate layers to match those of the student, allowing direct distillation without the need for an encoding stage. The proposed method is extensively evaluated using various pairs of teacher-student architectures on the CIFAR-10, CIFAR-100, and ImageNet datasets, showing that preserving the information flow paths consistently increases the performance of the baseline KD approaches in both classification and retrieval settings.
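
To make the two key ingredients mentioned above more concrete (matching teacher and student widths by pruning, then distilling intermediate layers in a curriculum-ordered warmup), here is a minimal, hypothetical PyTorch sketch. The function names, the L1-norm channel-selection heuristic, and the simple layer-by-layer schedule are illustrative assumptions and not the authors' implementation; the paper prunes the teacher's intermediate layers themselves, whereas this sketch prunes feature channels per batch for brevity.

```python
# Hypothetical sketch of the idea described in the abstract (not the authors' code):
# prune teacher feature maps to the student's channel width, then distill intermediate
# layers one at a time as a curriculum-style warmup before standard KD.
import torch
import torch.nn as nn
import torch.nn.functional as F


def prune_teacher_features(t_feat: torch.Tensor, s_channels: int) -> torch.Tensor:
    """Keep the s_channels teacher channels with the largest L1 norm so teacher and
    student widths match and features can be compared directly (no encoding stage)."""
    # t_feat: (batch, C_t, H, W) with C_t >= s_channels
    importance = t_feat.abs().flatten(2).sum(dim=(0, 2))  # per-channel L1 norm
    keep = importance.topk(s_channels).indices
    return t_feat[:, keep]


def indistill_warmup_loss(t_feats, s_feats, current_layer: int) -> torch.Tensor:
    """Warmup loss that distills only layers up to `current_layer`; a stand-in for
    the curriculum schedule based on per-layer distillation difficulty."""
    loss = torch.zeros((), device=s_feats[0].device)
    for i in range(current_layer + 1):
        t = prune_teacher_features(t_feats[i].detach(), s_feats[i].shape[1])
        if t.shape[-2:] != s_feats[i].shape[-2:]:
            t = F.adaptive_avg_pool2d(t, tuple(s_feats[i].shape[-2:]))
        loss = loss + F.mse_loss(s_feats[i], t)
    return loss


if __name__ == "__main__":
    # Toy feature maps from three teacher/student layer pairs (teacher is twice as wide).
    t_feats = [torch.randn(4, c, 16, 16) for c in (64, 128, 256)]
    s_feats = [torch.randn(4, c, 16, 16, requires_grad=True) for c in (32, 64, 128)]
    print(indistill_warmup_loss(t_feats, s_feats, current_layer=1))
```

After such a warmup, the student would typically be trained with a standard KD objective on the task at hand; the sketch only covers the information-flow-preserving warmup stage described above.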

Files

InDistill (1).pdf (418.9 kB)
md5:a4fe194a20da734e82d3044945aa9704

Additional details

Funding

  • AI4Media - A European Excellence Centre for Media, Society and Democracy (European Commission, grant 951911)
  • ELIAS: European Lighthouse of AI for Sustainability (UK Research and Innovation, grant 10080425)
  • MediaVerse - A universe of media assets and co-creation opportunities at your fingertips (European Commission, grant 957252)