Thesis Open Access
Won, Minz Sanghee
Since deep learning showed outstanding performance in computer vision, Music Information Retrieval (MIR) researchers have started to adopt these successful models in their own field. Unfortunately, many publications still simply apply deep learning algorithms to new problems or datasets without understanding the models they use. For a more principled model design process, interpreting the architecture and the mechanism of hidden layers has become increasingly important, and several publications have proposed methods for investigating the information learnt in hidden layers: visualization, auralization, and playlist generation. However, because these methods are very time-consuming, hidden layers still remain a black box. In this work, I propose two ideas for investigating hidden layers more efficiently: ranking tags and deriving filter importances. Using both the conventional approaches and the proposed methods, I investigate the latent semantics learnt in the hidden layers of deep learning models, particularly Convolutional Neural Networks (CNNs). A prototype experiment was conducted on the Ballroom dataset, and the main experiment was conducted on the Beatport dataset, which consists of 15k electronic music tracks. The experimental results report the latent semantics of kernels learnt by pre-trained CNNs for electronic music genre classification, a task that has received little attention in deep learning research.