Sudo rm -rf pre-trained audio source separation models
Description
Efficient pre-trained models for 8 kHz 2-speaker source separation (anechoic, and noisy with reverberation). The full code, alongside a basic description of the models' performance and computational requirements, is available here: github-codebase. You can git clone the repo and download the pre-trained models under: sudo_rm_rf/pretrained_models
We have also prepared an easy-to-use example for the pre-trained Sudo rm -rf models here: python-notebook, so you can take all the models for a spin 🏎️. Simply normalize the input audio and infer!
```python
import torch

# Load a pre-trained model
# (anechoic_model_p: path to a downloaded checkpoint under sudo_rm_rf/pretrained_models)
separation_model = torch.load(anechoic_model_p)

# Normalize the input mixture waveform (input_mix: tensor of shape (batch, samples))
input_mix_std = input_mix.std(-1, keepdim=True)
input_mix_mean = input_mix.mean(-1, keepdim=True)
input_mix = (input_mix - input_mix_mean) / (input_mix_std + 1e-9)

# Apply the model
rec_sources_wavs = separation_model(input_mix.unsqueeze(1))

# Rescale the estimated sources with the mixture's mean and standard deviation
rec_sources_wavs = (rec_sources_wavs * input_mix_std) + input_mix_mean
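```
For a fuller picture, here is a minimal end-to-end sketch built around the snippet above. The file names, the torchaudio dependency, and the assumed output shape (batch, n_sources, samples) are illustrative assumptions, not part of the repo:
```python
import torch
import torchaudio

MODEL_SAMPLE_RATE = 8000  # the pre-trained models operate on 8 kHz audio

# Load a mixture from disk and resample it to the model's rate if needed.
mixture, sr = torchaudio.load("mixture.wav")  # hypothetical input file
if sr != MODEL_SAMPLE_RATE:
    mixture = torchaudio.functional.resample(mixture, sr, MODEL_SAMPLE_RATE)
mixture = mixture.mean(0, keepdim=True)  # downmix to mono: (1, samples)

# anechoic_model_p: path to a downloaded checkpoint, as in the snippet above.
separation_model = torch.load(anechoic_model_p)
separation_model.eval()

with torch.no_grad():
    mix_std = mixture.std(-1, keepdim=True)
    mix_mean = mixture.mean(-1, keepdim=True)
    normalized = (mixture - mix_mean) / (mix_std + 1e-9)
    est_sources = separation_model(normalized.unsqueeze(1))  # assumed (1, n_sources, samples)
    est_sources = est_sources * mix_std + mix_mean

# Write each estimated speaker to its own file.
for i, source in enumerate(est_sources[0]):
    torchaudio.save(f"speaker_{i}.wav", source.unsqueeze(0).cpu(), MODEL_SAMPLE_RATE)
```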
One of the main points that the Sudo rm -rf models bring forward is that focusing only on reconstruction fidelity, while ignoring all other computational metrics such as execution time and actual memory consumption, is an ideal way of wasting resources for an almost negligible performance improvement. To that end, we show that the Sudo rm -rf models provide a very effective alternative for a range of separation tasks, while also being respectful to users who do not have access to immense computational power, or to researchers who prefer not to train their models for weeks on a multitude of GPUs.
Thus, Sudo rm -rf models perform on par with the SOTA, and even surpass it in certain cases, with minimal computational overhead in terms of both time and memory. This also makes apparent the importance of reporting all of the above metrics when proposing a new model. We conducted all experiments assuming an 8 kHz sampling rate and 4 seconds of input audio, on a server with an NVIDIA GeForce RTX 2080 Ti (11 GB) and a 12-core Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz. OOM means out of memory for the corresponding configuration. A value of Z ex/sec corresponds to the throughput of each model; in other words, for each second that passes, the model is capable of processing (in either a forward or a backward pass) Z audio files of 32,000 samples each. The attention models, which undoubtedly provide the best performance in most cases, are extremely heavy in terms of actual time and memory consumption (even if their number of parameters appears rather small). They also become prohibitively expensive for longer sequences.
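To make the throughput metric concrete, the following is a rough sketch of how a forward-pass ex/sec figure can be measured on 32,000-sample (4 s at 8 kHz) inputs. The function and its defaults are illustrative; this is not the exact benchmarking script behind the reported numbers:
```python
import time
import torch

def measure_throughput(model, batch_size=4, n_samples=32000, n_iters=20, device="cuda"):
    """Rough forward-pass throughput in examples/sec for 4 s of 8 kHz audio.

    Illustrative only: this shows how a 'Z ex/sec' figure can be obtained,
    not the exact benchmarking code used for the paper.
    """
    model = model.to(device).eval()
    dummy = torch.randn(batch_size, 1, n_samples, device=device)
    with torch.no_grad():
        for _ in range(3):  # warm-up so lazy CUDA initialization is not timed
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_iters):
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
    elapsed = time.time() - start
    return batch_size * n_iters / elapsed  # examples processed per second
```
For example, `measure_throughput(separation_model)` after loading a checkpoint gives a rough ex/sec estimate on your own hardware, which will of course differ from the server configuration above.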
Please cite as:
```bibtex
@inproceedings{tzinis2020sudo,
  title={Sudo rm-rf: Efficient networks for universal audio source separation},
  author={Tzinis, Efthymios and Wang, Zhepei and Smaragdis, Paris},
  booktitle={2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP)},
  pages={1--6},
  year={2020},
  organization={IEEE}
}

@article{tzinis2022compute,
  title={Compute and Memory Efficient Universal Sound Source Separation},
  author={Tzinis, Efthymios and Wang, Zhepei and Jiang, Xilin and Smaragdis, Paris},
  journal={Journal of Signal Processing Systems},
  year={2022},
  volume={94},
  number={2},
  pages={245--259},
  publisher={Springer}
}
```