Published August 23, 2023 | Version v1
Preprint | Open Access

CED: Consistent ensemble distillation for audio tagging

  • 1. Xiaomi

Description

Augmentation and knowledge distillation (KD) are well-established techniques employed in the realm of audio classification tasks, aimed at enhancing performance and reducing model sizes on the widely recognized Audioset (AS) benchmark. 
Although both techniques are effective individually, their combined use, known as consistent teaching, has not previously been explored.
This paper proposes CED, a simple training framework that distils student models from large teacher ensembles with consistent teaching.
To achieve this, CED efficiently stores logits as well as the augmentation methods on disk, making it scalable to large-scale datasets.
Central to CED's efficacy is its label-free nature: only the stored logits are used to optimize a student model, requiring just 0.3% additional disk space for AS.
The study trains various transformer-based models, including a 10M parameter model achieving a 49.0 mean average precision (mAP) on AS. 
Pretrained models and code are available here.
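
As a rough illustration of the approach described above, the following minimal PyTorch-style sketch shows how a student could be optimized against stored teacher logits under the same stored augmentation, with no ground-truth labels. All function and key names here (apply_stored_augmentation, the batch keys) are hypothetical assumptions and not taken from the released code.

import torch
import torch.nn.functional as F

def apply_stored_augmentation(waveform, aug_params):
    # Placeholder: in practice this would re-apply the stored augmentation
    # (e.g. masking or mixing parameters) so the student sees exactly the
    # input the teacher ensemble saw; identity here for illustration only.
    return waveform

def distill_step(student, batch, optimizer, temperature=1.0):
    """One label-free optimization step using only stored teacher logits."""
    waveform = batch["waveform"]
    teacher_logits = batch["teacher_logits"]       # read from disk, ensemble output
    aug_params = batch["augmentation_params"]      # read from disk

    # Consistent teaching: re-apply the same augmentation used when the
    # teacher logits were computed.
    augmented = apply_stored_augmentation(waveform, aug_params)
    student_logits = student(augmented)

    # Distillation loss against soft teacher targets; AudioSet tagging is
    # multi-label, so binary cross-entropy over sigmoid outputs is used here.
    soft_targets = torch.sigmoid(teacher_logits / temperature)
    loss = F.binary_cross_entropy_with_logits(student_logits / temperature, soft_targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()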

Files

logits.zip

Files (2.5 GB)

MD5                                       Size
md5:1176ecdbe080494d6816c659dc84951d      341.3 MB
md5:55ea264cfe908f35a062db9e64d3562a      343.0 MB
md5:9fedb09f881543c48c06f1daaa89b998      38.4 MB
md5:eab24de99d10c2af39b381f30eb69d76      38.9 MB
md5:aff2802f7741705dd9107658b3922d49      85.8 MB
md5:1135356bd7ce7506ab8bf1bf276d24b8      86.6 MB
md5:9320a0dfb6c43ec8d4229b996c03e71e      21.7 MB
md5:5c10b91d56655e533166df937705423b      22.1 MB
md5:2eb13d581d455afdb258f9f7c6497456      1.5 GB