WHISPER SET 1: a dataset for multi-channel, multi-device speech separation and speech enhancement
Creators
- Institute of Neuroinformatics, UZH and ETH Zurich
Description
WHISPER SET 1 is a dataset for speech enhancement and source separation, recorded with a Wireless Acoustic Sensor Network (WASN) called WHISPER (Kiselev2018). It contains samples with up to 4 concurrent speakers, as well as samples of speech in noise. The recordings were made with 16 microphones in a room with low reverberation (T_60 = 0.2 s). In general, each track begins with a calibration phase in which the speakers are active one after another, each alone for 15 seconds, followed by 15 seconds of all the speakers together (plus noise in some cases).
If you use this dataset please cite:
- E. Ceolini, I. Kiselev and S. Liu, "Evaluating multi-channel multi-device speech separation algorithms in the wild: a hardware-software solution," in IEEE/ACM Transactions on Audio, Speech, and Language Processing.
===
Each sample is a 16-channel wav file whose channels are ordered as follows:
- 0 - module 5 mic 1
- 1 - module 5 mic 2
- 2 - module 5 mic 3
- 3 - module 5 mic 4
- 4 - module 6 mic 1
- 5 - module 6 mic 2
- 6 - module 6 mic 3
- 7 - module 6 mic 4
- 8 - module 7 mic 1
- 9 - module 7 mic 2
- 10 - module 7 mic 3
- 11 - module 7 mic 4
- 12 - module 8 mic 1
- 13 - module 8 mic 2
- 14 - module 8 mic 3
- 15 - module 8 mic 4
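The channel order above follows a regular pattern: four consecutive channels per module, with modules numbered 5 to 8. The following is a minimal Python sketch of that mapping. The function names `channel_to_module_mic` and `module_channels` are illustrative and are not part of the dataset or its notebook.

```python
def channel_to_module_mic(channel: int) -> tuple[int, int]:
    """Map a 0-based wav channel index to (module number, mic number)."""
    if not 0 <= channel < 16:
        raise ValueError(f"channel must be in 0..15, got {channel}")
    module = 5 + channel // 4  # modules are numbered 5, 6, 7, 8
    mic = 1 + channel % 4      # mics are numbered 1..4 within a module
    return module, mic


def module_channels(module: int) -> list[int]:
    """Return the four 0-based channel indices that belong to one module."""
    if module not in (5, 6, 7, 8):
        raise ValueError(f"module must be one of 5, 6, 7, 8, got {module}")
    start = (module - 5) * 4
    return list(range(start, start + 4))
```

For instance, `module_channels(7)` gives the indices to select when only the microphones of module 7 are of interest.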
Refer to the floor plan for a visual illustration of the microphone arrangement.
The files are divided into two subfolders, one for the samples of speech enhancement and one for the samples of speech separation.
- In the folder of speech separation, the files are divided into subfolders defining the number of speakers in the mixtures (2, 3, or 4)
- In the folder of speech enhancement, the files are divided into subfolders following the SNR of the mixture (0, -5, -10 dB)
Samples are organized in folders. Each sample folder contains a 15-second, 16-channel mixture.wav file, plus one 15-second, 16-channel calibX.wav file for each source in the mixture, recorded with that source (a speaker or the noise) active alone. For example, a sample whose mixture has 4 speakers has 4 calibration files (calib1.wav, calib2.wav, calib3.wav, calib4.wav), and a sample with one speaker plus noise has 2 calibration files: one for speech (calib1.wav) and one for noise (calib2.wav).
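The sample folder layout described above can be walked with a few lines of Python. This is only a sketch using the standard library: the helper name `list_sample_files` is hypothetical, and it relies solely on the file names given here (mixture.wav and calibX.wav). Reading the audio itself is left to whichever wav library you prefer.

```python
import re
from pathlib import Path


def list_sample_files(sample_dir):
    """Return (mixture_path, calib_paths) for one sample folder.

    calib_paths is sorted by the numeric index X in calibX.wav,
    so calib2.wav comes before calib10.wav.
    """
    sample_dir = Path(sample_dir)
    mixture = sample_dir / "mixture.wav"
    if not mixture.is_file():
        raise FileNotFoundError(f"no mixture.wav in {sample_dir}")

    pattern = re.compile(r"calib(\d+)\.wav")
    indexed = []
    for path in sample_dir.iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    indexed.sort()  # sort by the numeric index, not alphabetically
    return mixture, [path for _, path in indexed]
```

The number of calibration files returned equals the number of sources in the mixture, for example 2 for a speech-plus-noise sample.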
==
A Jupyter notebook is included as an example of how to use the data for speech separation and speech enhancement with beamforming. The notebook depends on a beamforming library and on a tool for evaluating the quality of the separation.
==
Refer to the README.md in the dataset for more information.
For any questions, please contact enea.ceolini@gmail.com
Files
Name | Size | MD5 |
---|---|---|
WHISPER_SET_1.zip | 3.5 GB | 6a9a882caaad43932a448d8f5182ac3e |
Additional details
Related works
- Is derived from
- Journal article: 10.1109/TASLP.2020.2989545 (DOI)