Published February 26, 2020 | Version 2.0
Dataset Open

WHISPER SET 1: a dataset for multi-channel, multi-device speech separation and speech enhancement

  • 1. Institute of Neuroinformatics, UZH and ETH Zurich

Description

WHISPER SET 1 is a dataset for speech enhancement and source separation, recorded with a Wireless Acoustic Sensor Network (WASN) called WHISPER [Kiselev2018]. It contains samples with up to 4 concurrent speakers, as well as samples of speech in noise. The recordings were made with 16 microphones in a room with low reverberation (T_60 = 0.2 s). In general, each track first contains a calibration phase in which each speaker is active alone, one after another, for 15 seconds. This is followed by 15 seconds of all the speakers talking together (plus noise in some cases).
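The track layout described above maps directly onto sample ranges. The following is a minimal sketch, assuming one contiguous 15-second calibration segment per source followed by one 15-second mixture segment, and taking the sample rate as a parameter because it is not stated in this description (16000 Hz in the usage line is only a hypothetical value). `track_segments` is a hypothetical helper, not part of the dataset.

```python
# Hypothetical helper: sample ranges of one continuous track, assuming
# n_sources calibration segments (each source alone) followed by one mixture.
SEGMENT_SECONDS = 15


def track_segments(n_sources, sample_rate):
    """Return [(label, start_sample, end_sample), ...] for one track.

    The sample rate is a parameter because it is an assumption here.
    """
    seg = SEGMENT_SECONDS * sample_rate
    segments = []
    for i in range(n_sources):
        # Calibration phase: source i + 1 is active alone.
        segments.append((f"calib{i + 1}", i * seg, (i + 1) * seg))
    # Mixture phase: all sources are active together.
    segments.append(("mixture", n_sources * seg, (n_sources + 1) * seg))
    return segments


print(track_segments(2, 16000))
# [('calib1', 0, 240000), ('calib2', 240000, 480000), ('mixture', 480000, 720000)]
```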

If you use this dataset please cite:

  • E. Ceolini, I. Kiselev and S. Liu, "Evaluating multi-channel multi-device speech separation algorithms in the wild: a hardware-software solution," in IEEE/ACM Transactions on Audio, Speech, and Language Processing.

===

Each sample is a 16-channel wav file with the channels in the following order:

0 - module 5 mic 1
1 - module 5 mic 2
2 - module 5 mic 3
3 - module 5 mic 4
4 - module 6 mic 1
5 - module 6 mic 2
6 - module 6 mic 3
7 - module 6 mic 4
8 - module 7 mic 1
9 - module 7 mic 2
10 - module 7 mic 3
11 - module 7 mic 4
12 - module 8 mic 1
13 - module 8 mic 2
14 - module 8 mic 3
15 - module 8 mic 4
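The channel order is module-major: four consecutive channels per module, for modules 5 to 8. The sketch below shows the index arithmetic this ordering implies. The function names are hypothetical helpers, not something shipped with the dataset.

```python
# Channel layout of each 16-channel file: modules 5..8, mics 1..4,
# four consecutive channels per module (module-major order).
FIRST_MODULE = 5
N_MODULES = 4
MICS_PER_MODULE = 4


def channel_to_mic(channel):
    """Map a 0-based channel index to a (module, mic) pair."""
    if not 0 <= channel < N_MODULES * MICS_PER_MODULE:
        raise ValueError(f"channel must be in 0..15, got {channel}")
    return FIRST_MODULE + channel // MICS_PER_MODULE, channel % MICS_PER_MODULE + 1


def mic_to_channel(module, mic):
    """Inverse mapping: (module, mic) -> 0-based channel index."""
    if not (FIRST_MODULE <= module < FIRST_MODULE + N_MODULES and 1 <= mic <= MICS_PER_MODULE):
        raise ValueError(f"no channel for module {module}, mic {mic}")
    return (module - FIRST_MODULE) * MICS_PER_MODULE + (mic - 1)


def module_channels(module):
    """All four channel indices recorded by one module."""
    return [mic_to_channel(module, mic) for mic in range(1, MICS_PER_MODULE + 1)]


print(channel_to_mic(6))   # (6, 3)
print(module_channels(8))  # [12, 13, 14, 15]
```

Selecting `module_channels(m)` gives the four microphones of a single device, which is convenient when comparing single-device and multi-device processing.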

Refer to the floor plan for a visual illustration of the microphone arrangement.

The files are divided into two subfolders: one for the speech enhancement samples and one for the speech separation samples.

  • In the speech separation folder, the files are divided into subfolders by the number of speakers in the mixture (2, 3, or 4).
  • In the speech enhancement folder, the files are divided into subfolders by the SNR of the mixture (0, -5, or -10 dB).
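Because each enhancement sample also comes with a speech-only and a noise-only calibration recording, the SNR on a given channel can be estimated as 10·log10(P_speech / P_noise). The following is a minimal sketch under two assumptions: the two recordings have equal length, and the SNR is measured per channel. The exact definition used to label the 0, -5, and -10 dB folders (reference channel, averaging, weighting) is not stated here. `estimate_snr_db` is a hypothetical helper.

```python
import math


def estimate_snr_db(speech, noise):
    """Estimate SNR in dB as 10 * log10(P_speech / P_noise).

    Assumption: `speech` and `noise` are equal-length samples from the same
    channel, e.g. taken from a speech-only and a noise-only calibration file.
    """
    if len(speech) != len(noise) or not speech:
        raise ValueError("speech and noise must be non-empty and equal length")
    p_speech = sum(x * x for x in speech) / len(speech)  # mean speech power
    p_noise = sum(x * x for x in noise) / len(noise)     # mean noise power
    if p_noise == 0:
        return math.inf
    return 10.0 * math.log10(p_speech / p_noise)


print(round(estimate_snr_db([1.0, -1.0], [0.1, -0.1]), 6))  # 20.0
```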

Samples are organized in folders. Each sample folder contains a 15-second, 16-channel mixture.wav file, plus one 15-second, 16-channel calibX.wav file for each source in the mixture recorded alone (a single speaker, or the noise). For example, a sample with 4 speakers has 4 calibration files (calib1.wav, calib2.wav, calib3.wav, calib4.wav). A sample with one speaker plus noise has 2 calibration files: one for speech (calib1.wav) and one for noise (calib2.wav).
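One sample folder can be loaded with Python's standard library. The following is a minimal sketch, assuming the wav files are 16-bit integer PCM that the standard `wave` module can read. The actual sample format and rate are not stated here. Files in another encoding, such as 32-bit float or WAVE_FORMAT_EXTENSIBLE on Python versions before 3.12, would need a different reader, for example the `soundfile` package. `read_wav_channels` and `load_sample` are hypothetical helpers.

```python
import re
import sys
import wave
from array import array
from pathlib import Path


def read_wav_channels(path):
    """Read a wav file into (list of per-channel sample lists, sample rate).

    Assumption: 16-bit integer PCM (sample width of 2 bytes).
    """
    with wave.open(str(path), "rb") as wf:
        n_channels = wf.getnchannels()
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM, got {8 * wf.getsampwidth()}-bit")
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":  # wav sample data is little-endian
        samples.byteswap()
    # Frames are interleaved: ch0, ch1, ..., ch15, ch0, ch1, ...
    channels = [samples[c::n_channels].tolist() for c in range(n_channels)]
    return channels, rate


def load_sample(folder):
    """Load mixture.wav and calib1.wav ... calibN.wav from one sample folder."""
    folder = Path(folder)
    mixture, rate = read_wav_channels(folder / "mixture.wav")
    calib_paths = [p for p in folder.glob("calib*.wav")
                   if re.fullmatch(r"calib\d+\.wav", p.name)]
    # Sort by the number in the file name, so calib2 comes before calib10.
    calib_paths.sort(key=lambda p: int(re.search(r"\d+", p.name).group()))
    calibs = [read_wav_channels(p)[0] for p in calib_paths]
    return mixture, calibs, rate
```

By the folder convention above, `len(calibs)` should equal the number of sources in the mixture: 2 to 4 speakers for separation samples, and 2 (speech and noise) for enhancement samples.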

== 

A Jupyter notebook is included that shows how to use this dataset for speech separation and speech enhancement with beamforming. The notebook depends on an external beamforming library and on a tool for evaluating separation quality.
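As a general illustration of beamforming on these multi-channel recordings, here is a minimal delay-and-sum beamformer in plain Python. This is a swapped-in textbook technique, not the method used in the notebook, whose beamformers come from the external library and may differ. It assumes the target's integer per-channel delays are already known, for example estimated from the target's calibration recording, and that every delay is non-negative relative to a reference channel. `delay_and_sum` is a hypothetical helper.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer with non-negative integer sample delays.

    channels: equal-length per-channel sample lists.
    delays:   delays[m] = how many samples later the target reaches channel m
              than the reference channel (assumed known and >= 0).
    Each channel is advanced by its delay so the target lines up, then the
    channels are averaged. The output is truncated so that every channel
    contributes to every output sample.
    """
    if not channels or len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    n_out = length - max(delays)
    m = len(channels)
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / m
            for n in range(n_out)]


ref = [0, 1, 2, 3, 4, 5]
late = [0, 0, 0, 1, 2, 3]  # same signal, arriving 2 samples later
print(delay_and_sum([ref, late], [0, 2]))  # [0.0, 1.0, 2.0, 3.0]
```

Averaging the aligned channels reinforces the target, while sources arriving with other delays add incoherently. This is the basic idea that more elaborate beamformers build on.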

==

Refer to the README.md in the dataset for more information.

For any questions, please contact enea.ceolini@gmail.com.

Files

WHISPER_SET_1.zip

Size: 3.5 GB
md5:6a9a882caaad43932a448d8f5182ac3e
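To check that a download of the archive is complete, its MD5 digest can be compared with the value listed above. The following is a minimal sketch using Python's standard-library `hashlib`, reading the file in chunks so the 3.5 GB archive is never held in memory at once. The helper names are hypothetical, and the path of the downloaded file is up to the reader.

```python
import hashlib
from pathlib import Path

# MD5 listed for WHISPER_SET_1.zip on this page.
EXPECTED_MD5 = "6a9a882caaad43932a448d8f5182ac3e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


# Usage: verify_archive(Path("WHISPER_SET_1.zip")) should return True
# for an intact download.
```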

Additional details

Related works

Is derived from
Journal article: 10.1109/TASLP.2020.2989545 (DOI)

Funding

COCOHA – Cognitive Control of a Hearing Aid 644732
European Commission
HEAR-EAR 200021_172553
Swiss National Science Foundation