Thesis Open Access
Pablo Alonso; Dmitry Bogdanov
Deep learning models have recently led to significant improvements in a wide variety of tasks. Although known as powerful tools capable of generalizing better than traditional machine learning approaches, deep learning models still rely heavily on large quantities of annotated data. As the field of music information retrieval remains subject to data sparsity, automatic music classification is still a challenging problem, and many models fail to generalize to out-of-distribution music collections. This project investigates possible directions for improving the generalization capacity of deep learning music classifiers. More specifically, we propose a set of guidelines to address the generalization problem of music classifiers trained on very small datasets. We first propose ways to maximize the amount of information extracted from small datasets through outlier detection and efficient audio data augmentation. We then show that accounting for the perceptual ambiguity of each classification task through label smoothing can help obtain more generalizable classification boundaries. We also highlight the impact label noise can have in a small-dataset setting and explore ways to improve model robustness. Finally, we argue that leveraging common knowledge among related classification tasks can result in a more generalizable internal representation learned by the model. To illustrate this assumption, we employ a simple multi-task learning architecture to jointly learn pairs of tasks and list other promising directions for further exploration. All the suggested approaches are experimentally assessed on two state-of-the-art CNN architectures for automatic music classification. They all lead to consistent improvements over baseline models and raise new questions that invite a rethinking of automatic music classification.
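
The label smoothing mentioned above can be illustrated with a minimal sketch: the standard formulation redistributes a fraction ε of the probability mass of a one-hot target uniformly across all classes, yielding softer targets for perceptually ambiguous tasks. The function name and ε value below are illustrative, not taken from the thesis itself.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Standard label smoothing: move epsilon of the probability mass
    from the one-hot target to a uniform distribution over all classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * p + epsilon / k for p in one_hot]

# Example with 3 classes: the true class keeps 0.9 + 0.1/3 of the mass,
# while each other class receives 0.1/3.
smoothed = smooth_labels([0.0, 1.0, 0.0], epsilon=0.1)
```

Training on such smoothed targets discourages the network from producing over-confident logits, which is one common rationale for more generalizable decision boundaries on small, ambiguous datasets.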
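
As one concrete instance of audio data augmentation, additive noise at a controlled signal-to-noise ratio is a common choice; the sketch below is a hypothetical example of that general technique and does not reproduce the specific augmentations studied in the thesis.

```python
import math
import random

def add_gaussian_noise(signal, snr_db, seed=0):
    """Return a copy of `signal` (a list of samples) with Gaussian noise
    added at the target signal-to-noise ratio given in decibels."""
    rng = random.Random(seed)
    # Mean signal power, then the noise power implied by the target SNR.
    power = sum(s * s for s in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

# Augment a short waveform fragment at 20 dB SNR.
augmented = add_gaussian_noise([0.5, -0.25, 0.1, 0.0], snr_db=20)
```

Generating several such perturbed copies per training clip is one way to extract more information from a very small dataset, at the cost of assuming the class label is invariant to the perturbation.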