Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

Daniel Stoller; Sebastian Ewert; Simon Dixon

doi:10.5281/zenodo.1492417

Published September 23, 2018 | Version v1

Conference paper (open access)

Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyperparameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, employing a long temporal input context on the sample level is difficult, but required for high-quality separation results because of long-range temporal correlations. In this context, we propose the Wave-U-Net, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales. We introduce further architectural improvements, including an output layer that enforces source additivity, an upsampling technique and a context-aware prediction framework to reduce output artifacts. Experiments for singing voice separation indicate that our architecture yields a performance comparable to a state-of-the-art spectrogram-based U-Net architecture, given the same data. Finally, we reveal a problem with outliers in the currently used SDR evaluation metrics and suggest reporting rank-based statistics to alleviate this problem.
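To illustrate what an output layer that "enforces source additivity" can mean, here is a minimal NumPy sketch, assuming K sources where the network predicts only K-1 of them and the last source is defined as the mixture minus the sum of the others, so the K estimates always add up to the input mixture. The function name `difference_output`, the (channels, samples) array layout and the random stand-in signals are hypothetical illustrations, not the authors' code; for singing voice separation (K = 2) this reduces to predicting vocals and taking accompaniment = mixture - vocals.

```python
import numpy as np


def difference_output(mixture: np.ndarray, estimates: np.ndarray) -> np.ndarray:
    """Complete K-1 source estimates into K estimates that sum to the mixture.

    mixture:   array of shape (channels, samples)
    estimates: array of shape (K - 1, channels, samples), standing in for the
               outputs of hypothetical per-source output layers
    returns:   array of shape (K, channels, samples)
    """
    if estimates.shape[1:] != mixture.shape:
        raise ValueError("each estimate must have the same shape as the mixture")
    # The last source absorbs whatever the other estimates leave unexplained,
    # so the K outputs add up to the mixture (up to floating-point rounding).
    last = mixture - estimates.sum(axis=0)
    return np.concatenate([estimates, last[np.newaxis]], axis=0)


# Usage with random stand-in signals: a stereo mixture of arbitrary length
# and a single predicted source (e.g. vocals), so K = 2.
rng = np.random.default_rng(0)
mix = rng.uniform(-1.0, 1.0, size=(2, 16384))
vocals = rng.uniform(-1.0, 1.0, size=(1, 2, 16384))
sources = difference_output(mix, vocals)
```

Because the additivity constraint is built into the output rather than learned, the network cannot produce a set of estimates that fails to reconstruct the mixture; the trade-off is that any error in the predicted sources is pushed into the last one.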

Files (606.6 kB)

205_Paper.pdf | 606.6 kB | md5:a1e146c272e9a5daebc854f120531a09

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 19th International Society for Music Information Retrieval Conference, pp. 334-340. Paris, France.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2018), Paris, France, September 23-27, 2018

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 20, 2018
Modified: August 2, 2024
