Librispeech Slakh Unmix (LSX)

Petermann, Darius; Wichern, Gordon; Le Roux, Jonathan

doi:10.5281/zenodo.7765140

Published March 30, 2023 | Version v1


1. Indiana University
2. Mitsubishi Electric Research Laboratories (MERL)

Introduction

Librispeech Slakh Unmix (LSX) is a proof-of-concept source separation dataset for training and testing algorithms that separate a monaural audio signal using hyperbolic embeddings for hierarchical separation. The dataset is composed of artificial mixtures built from audio in the LibriSpeech (clean subset) and Slakh2100 datasets. It was introduced in our paper Hyperbolic Audio Source Separation.

At a Glance

The size of the unzipped dataset is ~28 GB
Each mixture is 60 s long and uses the first 60 s of the bass, drums, and guitar stems of the associated Slakh2100 track
Audio is encoded as 16-bit WAV files at a sampling rate of 16 kHz
The data is split into training tr (1390 mixtures), validation cv (348 mixtures), and testing tt (209 mixtures) subsets
The directory for each mixture contains eight wav files:
- mix.wav the overall mixture from the five child sources
- music_mix.wav the music submix containing guitar, bass, and drums
- speech_mix.wav the speech submix containing both male and female speech signals
- bass.wav original bass submix from the Slakh2100 track
- drums.wav original drums submix from the Slakh2100 track
- guitar.wav original guitar submix from the Slakh2100 track
- speech_male.wav concatenated male speech utterances filling the length of the song
- speech_female.wav concatenated female speech utterances filling the length of the song
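
The file list above implies a two-level hierarchy (mixture, then music and speech submixes, then five leaf sources). The following is a minimal sketch, not part of the dataset's own tooling, of how one might check that a single mixture directory contains the eight files and that each is a 16-bit, 16 kHz WAV file. The function name `check_mixture_dir` and the `HIERARCHY` mapping are hypothetical names introduced here for illustration; only the file names and audio format come from the description above.

```python
import wave
from pathlib import Path

# File names as listed in the dataset description.
EXPECTED_FILES = (
    "mix.wav", "music_mix.wav", "speech_mix.wav",
    "bass.wav", "drums.wav", "guitar.wav",
    "speech_male.wav", "speech_female.wav",
)

# Parent -> children hierarchy implied by the file descriptions
# (an illustrative mapping, not a structure shipped with the dataset).
HIERARCHY = {
    "mix.wav": ("music_mix.wav", "speech_mix.wav"),
    "music_mix.wav": ("bass.wav", "drums.wav", "guitar.wav"),
    "speech_mix.wav": ("speech_male.wav", "speech_female.wav"),
}


def check_mixture_dir(mixture_dir, sample_rate=16000, sample_width_bytes=2):
    """Return a list of problems found in one mixture directory (empty if none)."""
    mixture_dir = Path(mixture_dir)
    problems = []
    for name in EXPECTED_FILES:
        path = mixture_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        # The standard-library wave module reads uncompressed PCM WAV headers.
        with wave.open(str(path), "rb") as wav:
            if wav.getframerate() != sample_rate:
                problems.append(f"{name}: sample rate {wav.getframerate()} Hz")
            if wav.getsampwidth() != sample_width_bytes:
                problems.append(f"{name}: {8 * wav.getsampwidth()}-bit samples")
    return problems
```

Calling `check_mixture_dir` on the path of one unzipped mixture directory returns an empty list when all eight files are present in the expected format. How mixture directories are arranged under the tr, cv, and tt subsets is not specified here, so the caller is assumed to supply the directory path.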

Other Resources

PyTorch code for training models, along with our hyperbolic separation interface, is available here

Citation

If you use LSX in your research, please cite our paper:

@InProceedings{Petermann2023ICASSP_hyper,
  author =   {Petermann, Darius and Wichern, Gordon and Subramanian, Aswin and {Le Roux}, Jonathan},
  title =    {Hyperbolic Audio Source Separation},
  booktitle =    {Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year =     2023,
  month =    jun
}

Copyright and License

The LSX dataset is released under the CC-BY-4.0 license.

All data:

Created by Mitsubishi Electric Research Laboratories (MERL), 2022-2023
 
SPDX-License-Identifier: CC-BY-4.0

Files

Name	Size
lsx.zip (md5: e584e91d6e89eff009ba58331583ab3d)	21.9 GB
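
Because the archive is large, it can be worth confirming the download against the MD5 checksum listed for lsx.zip. The sketch below uses only Python's standard `hashlib` module and reads the file in chunks so the archive is never loaded into memory at once. The function names `md5_of_file` and `verify_archive` are hypothetical helpers introduced for illustration; the checksum value is the one given in the file listing.

```python
import hashlib

# MD5 checksum listed for lsx.zip in the file table.
EXPECTED_MD5 = "e584e91d6e89eff009ba58331583ab3d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected
```

With the archive downloaded to the working directory, `verify_archive("lsx.zip")` should return `True` for an intact copy; the local file name is an assumption about where the download was saved.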

