Other Open Access

Binaural Source Separation with Convolutional Neural Networks

Erruz, Gerard

Thesis supervisor(s)

Miron, Marius; Garriga, Adan

This work is a study on source separation techniques for binaural music mixtures. The chosen framework uses a Convolutional Neural Network (CNN) to estimate time-frequency soft masks. This masks are used to extract the different sources from the original two-channel mixture signal. Its baseline single-channel architecture performed state-of-the-art results on monaural music mixtures under low-latency conditions. It has been extended to perform separation in two-channel signals, being the first two-channel CNN joint estimation architecture. This means that filters are learned for each source by taking in account both channels information. Furthermore, a specific binaural condition is included during training stage. It uses Interaural Level Difference (ILD) information to improve spatial images of extracted sources. Concurrently, we present a novel tool to create binaural scenes for testing purposes. Multiple binaural scenes are rendered from a music dataset of four instruments (voice, drums, bass and others). The CNN framework have been tested for these binaural scenes and compared with monaural and stereo results. The system showed a great amount of adaptability and good separation results in all the scenarios. These results are used to evaluate spatial information impact on separation performance.

Files (4.6 MB)
Name Size
4.6 MB Download
All versions This version
Views 221221
Downloads 592592
Data volume 2.7 GB2.7 GB
Unique views 210210
Unique downloads 554554


Cite as