InDistill: Information flow-preserving knowledge distillation for model compression
Creators
Description
In this paper, we introduce InDistill, a method that serves as a warmup stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved through a curriculum learning-based training scheme that considers the distillation difficulty of each layer and the critical learning periods during which the information flow paths are established. This procedure leads to a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation when there is a discrepancy between the widths of the teacher and student layers. This pruning operation reduces the width of the teacher's intermediate layers to match those of the student, allowing direct distillation without the need for an encoding stage. The proposed method is extensively evaluated on various pairs of teacher-student architectures using the CIFAR-10, CIFAR-100, and ImageNet datasets, showcasing that preserving the information flow paths consistently increases the performance of the baseline KD approaches in both classification and retrieval settings.
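As a rough illustration of the two mechanisms described above (pruning the teacher's intermediate layers to the student's width, and a layer-by-layer warmup curriculum), here is a minimal PyTorch sketch. It is not the authors' implementation: the activation-magnitude pruning criterion, the `forward_features()` interface, and all function names are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def prune_teacher_channels(teacher_feat: torch.Tensor, student_width: int) -> torch.Tensor:
    """Reduce the teacher's channel dimension to the student's width by keeping
    the channels with the largest mean activation magnitude (illustrative
    criterion; the paper's pruning operation may use a different rule)."""
    # teacher_feat: (batch, C_t, H, W) with C_t >= student_width
    saliency = teacher_feat.abs().mean(dim=(0, 2, 3))      # per-channel importance
    keep = torch.topk(saliency, k=student_width).indices   # retained channel indices
    return teacher_feat[:, keep]                            # (batch, C_s, H, W)

def layer_distillation_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Match a student layer directly to the pruned teacher layer,
    with no learned encoding stage in between."""
    pruned = prune_teacher_channels(teacher_feat, student_feat.shape[1])
    # Resize spatially if the two feature maps disagree in H x W.
    if pruned.shape[2:] != student_feat.shape[2:]:
        pruned = F.adaptive_avg_pool2d(pruned, student_feat.shape[2:])
    return F.mse_loss(student_feat, pruned.detach())

def indistill_warmup(student: nn.Module, teacher: nn.Module, loader, optimizer,
                     num_layers: int, epochs_per_layer: int = 1, device: str = "cuda") -> None:
    """Curriculum warmup: distill one intermediate layer at a time, shallow to
    deep, before standard KD. Assumes both models expose a hypothetical
    forward_features() returning a list of intermediate feature maps."""
    teacher.eval()
    for layer_idx in range(num_layers):            # easier (shallower) layers first
        for _ in range(epochs_per_layer):
            for images, _ in loader:
                images = images.to(device)
                with torch.no_grad():
                    t_feats = teacher.forward_features(images)
                s_feats = student.forward_features(images)
                loss = layer_distillation_loss(s_feats[layer_idx], t_feats[layer_idx])
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```

After such a warmup, the student would then be trained with a standard KD objective, which is the stage InDistill is designed to prepare it for.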
Files
| Name | Size | MD5 |
|---|---|---|
| InDistill (1).pdf | 418.9 kB | md5:a4fe194a20da734e82d3044945aa9704 |
Additional details
Funding
- European Commission
  - AI4Media - A European Excellence Centre for Media, Society and Democracy (951911)
- UK Research and Innovation
  - ELIAS: European Lighthouse of AI for Sustainability (10080425)
- European Commission
  - MediaVerse - A universe of media assets and co-creation opportunities at your fingertips (957252)