MIVIA Speech Command (FELICE Project)
Creators
- Department of Information Engineering, Electrical Engineering, and Applied Mathematics (DIEM) (Hosting institution)
- Vento, Mario (Project leader)
- Saggese, Alessia (Data curator)
- Carletti, Vincenzo (Data curator)
- Greco, Antonio (Contact person)
- Ritrovato, Pierluigi (Project manager)
- Rosa, Francesco (Data collector)
- De Simone, Giuseppe (Data collector)
Description
The speech command dataset supports vocal human-robot communication. It consists of speech commands recorded through crowdsourcing with a Telegram bot and with the microphones mounted on the robot and the adaptive workstation. The dataset also includes synthetic samples produced with text-to-speech services and negative samples that reproduce the “normal” speech of workers during their assembly operations. To reproduce the typical noisy environment of the assembly line, an augmentation procedure adds random noise, collected at real industrial sites, to the voice samples at different SNRs.
Deployment environment:
The dataset includes voice samples recorded by real people with the microphone installed on board the robot and/or the adaptive workstation and/or with the Telegram bot. In addition, synthetic samples are produced with text-to-speech algorithms. Finally, an automatic augmentation procedure adds random noise to the voice samples at variable SNRs, in order to reproduce different types of industrial noise.
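The noise augmentation described above can be illustrated with a minimal sketch. This is not the project's actual procedure, which is not specified here; it only shows one common way to mix a noise recording into a voice sample at a target SNR. The function names `power` and `mix_at_snr`, the random choice of the noise excerpt, and the representation of audio as plain lists of float samples are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(voice, noise, snr_db, rng=random):
    """Add a randomly positioned noise excerpt to `voice` at the requested SNR (dB).

    Hypothetical helper: the dataset's own augmentation code may differ.
    """
    if len(noise) < len(voice):
        # Loop the noise clip so that it covers the whole voice sample.
        repeats = math.ceil(len(voice) / len(noise))
        noise = list(noise) * repeats
    # Pick a random excerpt of the noise with the same length as the voice.
    start = rng.randrange(len(noise) - len(voice) + 1)
    excerpt = noise[start:start + len(voice)]

    p_voice = power(voice)
    p_noise = power(excerpt)
    if p_noise == 0.0:
        return list(voice)  # silent noise excerpt: nothing to add
    # Choose the gain so that 10*log10(p_voice / (gain**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_voice / (p_noise * 10 ** (snr_db / 10)))
    return [v + gain * n for v, n in zip(voice, excerpt)]
```

Calling `mix_at_snr(voice, noise, snr_db)` with several `snr_db` values, for example drawn at random from a chosen range, would yield multiple noisy variants of the same command. Lower values correspond to a louder background relative to the voice.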
Data acquisition:
The samples are collected with the Telegram bot available at https://t.me/speechcommand_bot. Using a widespread open-source tool like Telegram makes it possible to collect a large amount of data from a large number of people in a short time. In addition, speech commands have been collected with the microphones installed on board the robot and the adaptive workstation in the CRF use case. Ground-truth labels are double-checked by experts.
MIVIA Speech Command:
The dataset can be split into two parts:
- Training and Validation Sets: These subsets, used for training and validation, are available in two versions:
  - With synthetic samples: speech_command_dataset_with_synth.zip
  - Without synthetic samples: speech_command_dataset_without_synth.zip
- Test Set: This subset contains only real samples collected in real-world scenarios, specifically within CRF.