Published October 25, 2024 | Version v1
Conference proceeding | Open Access

Information Dissimilarity Measures in Decentralized Knowledge Distillation: A Comparative Analysis

  • 1. Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo"
  • 2. University of Strathclyde
  • 3. University of St Andrews

Description

Knowledge distillation (KD) is a key technique for transferring knowledge from a large, complex "teacher" model to a smaller, more efficient "student" model. Although initially developed for model compression, it has found applications across various domains due to the benefits of its knowledge transfer mechanism. While Cross Entropy (CE) and Kullback-Leibler (KL) divergence are commonly used in KD, this work investigates the applicability of loss functions based on underexplored information dissimilarity measures, such as Triangular Divergence (TD), Structural Entropic Distance (SED), and Jensen-Shannon Divergence (JS), for both independent and identically distributed (iid) and non-iid data distributions.
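To make the role of these measures concrete, the sketch below shows how a dissimilarity measure can replace KL divergence in a distillation loss. It is a minimal illustration only, assuming a PyTorch setup with temperature-softened logits; the function names, temperature value, and tensor shapes are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def jensen_shannon(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Jensen-Shannon divergence between batches of probability vectors (rows sum to 1)."""
    m = 0.5 * (p + q)
    kl_pm = torch.sum(p * (torch.log(p + eps) - torch.log(m + eps)), dim=-1)
    kl_qm = torch.sum(q * (torch.log(q + eps) - torch.log(m + eps)), dim=-1)
    return (0.5 * (kl_pm + kl_qm)).mean()


def triangular_divergence(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Triangular divergence: sum_i (p_i - q_i)^2 / (p_i + q_i), averaged over the batch."""
    return torch.sum((p - q) ** 2 / (p + q + eps), dim=-1).mean()


def distillation_loss(student_logits, teacher_logits, measure=jensen_shannon, temperature=2.0):
    """Dissimilarity between temperature-softened teacher and student output distributions."""
    p_student = F.softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return measure(p_student, p_teacher)


# Toy usage: 8 samples, 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
loss = distillation_loss(student_logits, teacher_logits, measure=triangular_divergence)
loss.backward()
```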
The primary contributions of this study include an empirical evaluation of these dissimilarity measures within a decentralized learning context, i.e., where independent clients collaborate without a central server coordinating the learning process. Additionally, the paper assesses per-client performance by comparing pairwise distillation averaging among clients to conventional peer-to-peer pairwise distillation.
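Roughly, the two decentralized schemes being compared can be pictured as below. This is only one plausible reading of the abstract, not the authors' implementation: the names and the exact point at which peer predictions are aggregated are illustrative assumptions, and `measure` stands for any dissimilarity such as the `jensen_shannon` sketch above.

```python
import torch


def peer_to_peer_distillation(student_probs, peer_probs_list, measure):
    # Conventional pairwise scheme (assumed): distill against each peer's
    # soft predictions separately and accumulate the pairwise losses.
    return sum(measure(student_probs, peer_probs) for peer_probs in peer_probs_list)


def averaged_distillation(student_probs, peer_probs_list, measure):
    # Averaging scheme (assumed): average the peers' soft predictions first,
    # then distill once against the averaged distribution.
    averaged = torch.stack(peer_probs_list).mean(dim=0)
    return measure(student_probs, averaged)
```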
Results indicate that while the dissimilarity measures perform comparably in iid settings, non-iid distributions favor SED and JS, which also show consistent performance across clients.

Files

2024_SISAP__Information_Dissimilarity_Measures_in_Decentralized_Knowledge_Distillation.pdf

Additional details

Funding

European Commission
SUN - Social and hUman ceNtered XR (grant agreement no. 101092612)