Published September 30, 2015 | Version v1
Thesis Open

Music remixing using source separation to improve cochlear implant users music perception


  • 1. MTG, Universitat Pompeu Fabra


Music appreciation remains rather poor for many Cochlear Implant (CI) users due to their poor pitch perception. Simple music structures with a clear rhythm/beat are well perceived by CI users. A previous publication, which studied the mixing preferences of CI users on Western vocal music, shows a significant preference for higher vocal levels and attenuated background instruments. Re-mixing the music makes it possible to simplify the signal and make it more suitable for implantees. However, the multitrack recordings necessary to generate a re-mix are not always accessible; often only mono/stereo pre-mixed audio files are available. In order to overcome this limitation, we propose to use current state-of-the-art Source Separation (SS) techniques to estimate the multitrack recordings. The perceptual studies conducted focus on how the errors/artifacts produced by an SS algorithm, Non-negative Matrix Factorization (NMF), affect the music mixing preferences. These studies show that when the background instruments are attenuated by 6 dB, the artifacts/errors present in the vocals are not perceived by CI users. Therefore, SS can be used to estimate the multitrack recordings. To our knowledge, no previous work exists on simplifying classical music for CI users by means of re-mixing. This work shows the influence of the music genre on CI users' mixing preferences. We show that CI users with classical musical training have a significant preference for mixing pre-sets that emphasize musicological details (other than beat) that are difficult to encode with CIs. However, CI users without classical music training do not show any significant preference, probably due to a lack of music understanding. This work also shows that CI users may not benefit from general mixing pre-set solutions. Technologies like SS, which allow individual configurations, seem to be the right approach towards better music appreciation.
Additionally, we studied a new approach for source signal separation based on deep recurrent neural networks (DRNN). Recently, some researchers successfully used DRNN for singing voice separation from monaural recordings in a supervised setting. A great advantage of this technique, compared to NMF, is that it allows similar performance while reducing the processing time, which is crucial for CI applications. In this work, we investigated how different theoretically motivated initialization schemes behave when training DRNN for SS. We conclude that if the initialization allows the output activations to be inside the data range, the model is able to find a good local minimum. We also introduce a theoretically motivated interpretation of why music models (which take neighbouring frames as the input vector) do not suffer from the vanishing/exploding gradient problem.
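
The re-mixing step mentioned in the abstract (keeping the vocals and attenuating the background instruments by 6 dB) can be illustrated with a minimal sketch. It assumes the separated vocal and background stems are already available as equal-length sequences of samples; the function names `db_to_gain` and `remix` are hypothetical and not taken from the thesis, and the sketch does not perform any source separation itself.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def remix(vocals, background, background_db=-6.0):
    """Sum two equal-length stems after scaling the background by background_db.

    Assumption: both stems are plain sequences of float samples at the same
    sampling rate, e.g. the outputs of a source separation stage.
    """
    if len(vocals) != len(background):
        raise ValueError("stems must have the same length")
    gain = db_to_gain(background_db)
    return [v + gain * b for v, b in zip(vocals, background)]


if __name__ == "__main__":
    # Toy stems (illustrative values only, not real audio).
    vocals = [0.5, -0.25, 0.0]
    background = [0.2, 0.2, -0.4]
    print(remix(vocals, background))
```

A 6 dB attenuation corresponds to roughly halving the amplitude of the background stem, so any separation artifacts that leak into the background are also reduced by about the same factor in the final mix.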



Files (15.4 MB)