Thesis Open Access
Severe hearing loss can be treated with a surgically implanted electrical device called a cochlear implant (CI). These devices perform well for speech intelligibility but still struggle to represent more complex audio signals such as music. However, previous studies show that CI recipients find music more enjoyable when the vocals are enhanced relative to the background music. In this thesis, source separation (SS) algorithms are used to remix music multi-tracks by applying gain to the lead vocal.
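The remixing step itself can be sketched as a simple gain applied to the separated vocal stem before summing it back with the accompaniment. The function name, the mono float-array representation, and the 6 dB default below are illustrative assumptions, not details taken from the thesis:

```python
import numpy as np

def remix(vocals, accompaniment, vocal_gain_db=6.0):
    """Remix separated stems, boosting the lead vocal by a given gain in dB.

    Assumes `vocals` and `accompaniment` are mono float arrays of equal
    length; the 6 dB default is purely illustrative.
    """
    gain = 10.0 ** (vocal_gain_db / 20.0)  # dB to linear amplitude
    return gain * vocals + accompaniment

# Toy stems: a vocal "signal" and a quieter background
vocals = np.array([0.1, -0.1, 0.1, -0.1])
accomp = np.array([0.05, 0.05, -0.05, -0.05])
mix = remix(vocals, accomp, vocal_gain_db=6.0)
```

In practice the stems come from the output of an SS algorithm rather than from clean multi-track recordings, so any separation artifacts are boosted along with the vocal; this is why the perceptual relevance of those artifacts matters.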
This work evaluates deep convolutional auto-encoders (DCAEs), a deep recurrent neural network (DRNN), a multilayer perceptron (MLP), and non-negative matrix factorization (NMF), both objectively and subjectively, through two perceptual experiments involving normal-hearing (NH) subjects and CI recipients. The evaluation assesses the perceptual relevance of the artifacts introduced by the SS algorithms in relation to their degree of complexity, since this study aims to propose one of the algorithms for real-time implementation. Moreover, this work presents a benchmark that relates the measured distortions to the preference ratings observed in CI subjects. Objective results based on the source-to-distortion ratio (SDR) and source-to-artifacts ratio (SAR) show that the DCAEs outperform the other methods only when presented with data similar to their training data; the MLP, on the other hand, performs consistently across the tested data, matching the DRNN's performance while reducing algorithmic complexity.
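As a point of reference for the SDR metric mentioned above, a minimal sketch of its basic energy-ratio form is shown below. Note this is a simplification: the full BSS Eval framework used for SDR/SAR additionally decomposes the estimation error into interference and artifact components, which this sketch does not do.

```python
import numpy as np

def sdr(reference, estimate):
    """Simplified source-to-distortion ratio in dB.

    Basic energy-ratio form: 10 * log10(||s||^2 / ||s - s_hat||^2).
    The full BSS Eval SDR further splits the error term into
    interference and artifact components (from which SAR is derived).
    """
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))

ref = np.array([1.0, 0.0, -1.0, 0.0])
est = 0.9 * ref  # a slightly attenuated estimate
score = sdr(ref, est)  # error energy is 1% of signal energy -> 20 dB
```

Higher values indicate less distortion; a perfect estimate drives the denominator toward zero and the ratio toward infinity.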
Based on this benchmark and a MUSHRA listening test, we propose the MLP for real-time audio SS.