WHISPER SET 1: a dataset for multi-channel, multi-device speech separation and speech enhancement

Enea Ceolini; Ilya Kiselev; Shih-Chii Liu

doi:10.5281/zenodo.3688540

Published February 26, 2020 | Version 2.0

Dataset Open

1. Institute of Neuroinformatics, UZH and ETH Zurich

This dataset is WHISPER SET 1, a dataset for speech enhancement and source separation recorded with a Wireless Acoustic Sensor Network (WASN) called WHISPER (Kiselev2018). It contains samples with up to 4 concurrent speakers, as well as samples of speech in noise. The dataset was recorded with 16 microphones in a room with low reverberation (T_60 = 0.2 s). In general, each track begins with a calibration phase in which each speaker is active alone, in sequence, for 15 seconds; this is followed by 15 seconds of all the speakers together (plus noise in some cases).
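The track layout described above can be sketched as a list of time segments. This is only an illustration: the function name `track_segments` is hypothetical, the `calibN` labels are borrowed from the file names used later in this description, and the exact 15-second boundaries are an assumption, since the description says the layout holds "in general".

```python
SEGMENT_SECONDS = 15.0  # nominal segment length stated in the description


def track_segments(n_sources, segment_seconds=SEGMENT_SECONDS):
    """Return (label, start_s, end_s) tuples for a track with n_sources sources.

    Assumes each source (speaker or noise) is active alone for one segment,
    in sequence, followed by one segment with all sources active together.
    """
    segments = []
    for k in range(n_sources):
        start = k * segment_seconds
        segments.append((f"calib{k + 1}", start, start + segment_seconds))
    mix_start = n_sources * segment_seconds
    segments.append(("mixture", mix_start, mix_start + segment_seconds))
    return segments


if __name__ == "__main__":
    for label, start, end in track_segments(3):
        print(f"{label:8s} {start:5.1f} s to {end:5.1f} s")
```

Under these assumptions, a track with N sources lasts (N + 1) × 15 seconds.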

If you use this dataset please cite:

E. Ceolini, I. Kiselev and S. Liu, "Evaluating multi-channel multi-device speech separation algorithms in the wild: a hardware-software solution," in IEEE/ACM Transactions on Audio, Speech, and Language Processing.

===

Each sample is a 16-channel wav file whose channels are ordered as follows:

0 - module 5 mic 1
1 - module 5 mic 2
2 - module 5 mic 3
3 - module 5 mic 4
4 - module 6 mic 1
5 - module 6 mic 2
6 - module 6 mic 3
7 - module 6 mic 4
8 - module 7 mic 1
9 - module 7 mic 2
10 - module 7 mic 3
11 - module 7 mic 4
12 - module 8 mic 1
13 - module 8 mic 2
14 - module 8 mic 3
15 - module 8 mic 4
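The channel order listed above is regular (four microphones per module, modules 5 to 8 in order), so it can be expressed as a small lookup. The sketch below is a convenience helper, not part of the dataset; the names `channel_to_mic` and `module_channels` are hypothetical.

```python
MODULES = (5, 6, 7, 8)   # module IDs in channel order, as listed above
MICS_PER_MODULE = 4      # each module contributes four consecutive channels


def channel_to_mic(channel):
    """Map a 0-based channel index to a (module, mic) pair."""
    if not 0 <= channel < len(MODULES) * MICS_PER_MODULE:
        raise ValueError(f"channel must be in 0..15, got {channel}")
    module_idx, mic_idx = divmod(channel, MICS_PER_MODULE)
    return MODULES[module_idx], mic_idx + 1  # mics are numbered from 1


def module_channels(module):
    """Return the four 0-based channel indices that belong to one module."""
    start = MODULES.index(module) * MICS_PER_MODULE
    return list(range(start, start + MICS_PER_MODULE))


if __name__ == "__main__":
    for ch in range(16):
        module, mic = channel_to_mic(ch)
        print(f"{ch} - module {module} mic {mic}")
```

`module_channels` is handy for selecting a single device's sub-array, for example when comparing single-module and multi-module processing.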

Refer to the floor plan for a visual illustration of the microphone arrangement.

The files are divided into two subfolders, one for the samples of speech enhancement and one for the samples of speech separation.

In the speech separation folder, the files are divided into subfolders by the number of speakers in the mixture (2, 3, or 4).
In the speech enhancement folder, the files are divided into subfolders by the SNR of the mixture (0, -5, or -10 dB).
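For reference, an SNR in dB is conventionally defined as 10·log10 of the ratio of speech power to noise power. The sketch below applies that standard definition to two single-channel sample sequences. It is an assumption that this matches how the SNR labels of the subfolders were obtained; the description does not state how or at which microphone the SNR was measured. The names `mean_power` and `snr_db` are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech, noise):
    """SNR in dB: 10 * log10(speech power / noise power)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


if __name__ == "__main__":
    speech = [0.1, -0.1, 0.1, -0.1]
    noise = [0.2, -0.2, 0.2, -0.2]
    print(f"{snr_db(speech, noise):.2f} dB")
```

A negative value means the noise is stronger than the speech, as in the -5 and -10 dB subfolders.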

Samples are organized in folders. Each sample folder contains a 15-second, 16-channel mixture.wav file, plus one 15-second, 16-channel calibX.wav file for each source (speaker or noise) recorded alone. That is, a sample whose mixture has 4 speakers will have 4 calibration files (calib1.wav, calib2.wav, calib3.wav, calib4.wav), and a sample with one speaker plus noise will have 2 calibration files, one for speech (calib1.wav) and one for noise (calib2.wav).
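Given the file naming above, the contents of one sample folder can be collected with the standard library alone. This is a minimal sketch: the function name `describe_sample` is hypothetical, and the path passed to it is up to the reader, since the names of the enclosing subfolders are not given here. Reading the audio itself is left to whichever wav library the reader prefers.

```python
import re
from pathlib import Path

CALIB_PATTERN = re.compile(r"calib(\d+)\.wav")


def describe_sample(sample_dir):
    """Return (mixture_path, calib_paths) for one sample folder.

    calib_paths is sorted by the numeric index X in calibX.wav, so
    calib2.wav comes before calib10.wav.
    """
    sample_dir = Path(sample_dir)
    mixture = sample_dir / "mixture.wav"
    if not mixture.is_file():
        raise FileNotFoundError(f"no mixture.wav in {sample_dir}")
    indexed = []
    for path in sample_dir.iterdir():
        match = CALIB_PATTERN.fullmatch(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    indexed.sort(key=lambda item: item[0])
    return mixture, [path for _, path in indexed]
```

For a speech enhancement sample, the returned list would hold two paths (speech, then noise); for a separation sample, one path per speaker.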

==

A Jupyter notebook is included that shows how to use the data in this dataset for speech separation and speech enhancement with beamforming. The notebook depends on a beamforming library and on a tool for evaluating the quality of the separation.

==

Refer to the README.md in the dataset for more information.

For any questions, please contact enea.ceolini@gmail.com

Files

Name	Size
WHISPER_SET_1.zip (md5:6a9a882caaad43932a448d8f5182ac3e)	3.5 GB

Additional details

Is derived from: Journal article: 10.1109/TASLP.2020.2989545 (DOI)

European Commission
COCOHA - Cognitive Control of a Hearing Aid 644732
Swiss National Science Foundation
HEAR-EAR 200021_172553
