MIVIA Speech Command (FELICE Project)

Department of Information Engineering, Electrical Engineering, and Applied Mathematics (DIEM); Vento, Mario; Saggese, Alessia; Carletti, Vincenzo; Greco, Antonio; Ritrovato, Pierluigi; Rosa, Francesco; De Simone, Giuseppe

doi:10.5281/zenodo.14771083

Published January 30, 2025 | Version v2

Dataset Open

1. University of Salerno

The speech command dataset supports human-robot vocal communication. It consists of speech commands collected through crowdsourcing with a Telegram bot and recorded with the microphones mounted on the robot and the adaptive workstation. The dataset also includes synthetic samples produced with text-to-speech services, and negative samples that reproduce the "normal" speech of workers during assembly operations. To reproduce the typically noisy environment of the assembly line, an augmentation procedure adds random noise, recorded at real industrial sites, to the voice samples at different signal-to-noise ratios (SNRs).

Deployment environment:

The dataset includes voice samples recorded by real people with the microphone installed on board the robot, on the adaptive workstation, or through the Telegram bot. In addition, synthetic samples are produced with text-to-speech algorithms. Finally, an automatic augmentation procedure adds random noise to the voice samples at variable SNRs, in order to reproduce different types of industrial noise.
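
The record does not describe the augmentation procedure in detail, so the following is only a minimal sketch of how noise can be mixed into a voice sample at a chosen SNR. The function names mix_at_snr and measured_snr_db, the random choice of the noise segment, and the looping of short noise clips are assumptions made for illustration, not the authors' implementation. Samples are plain lists of floats at a common sample rate.

```python
import math
import random


def mix_at_snr(voice, noise, snr_db, rng=None):
    """Return voice plus scaled noise so that the mix has the requested SNR (dB).

    Hypothetical helper: voice and noise are sequences of float samples
    recorded at the same sample rate.
    """
    rng = rng or random.Random()
    if not voice or not noise:
        raise ValueError("voice and noise must be non-empty")

    # Take a noise segment as long as the voice clip, starting at a random
    # offset and wrapping around if the noise recording is shorter.
    start = rng.randrange(len(noise))
    segment = [noise[(start + i) % len(noise)] for i in range(len(voice))]

    voice_power = sum(s * s for s in voice) / len(voice)
    noise_power = sum(s * s for s in segment) / len(segment)
    if voice_power == 0 or noise_power == 0:
        raise ValueError("voice and noise segment must have non-zero power")

    # SNR_dB = 10 * log10(P_voice / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(voice_power / (noise_power * 10 ** (snr_db / 10)))
    return [v + gain * n for v, n in zip(voice, segment)]


def measured_snr_db(voice, mixed):
    """SNR of a mix, given the clean voice it was built from."""
    residual = [m - v for m, v in zip(mixed, voice)]
    voice_power = sum(s * s for s in voice) / len(voice)
    residual_power = sum(s * s for s in residual) / len(residual)
    return 10 * math.log10(voice_power / residual_power)
```

Calling mix_at_snr with several snr_db values on the same voice sample yields progressively noisier copies. The sketch does not guard against clipping, so a real pipeline would likely rescale or limit the mixed signal before writing it to an audio file.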

Data acquisition:

The samples are collected with the Telegram bot available at this link: https://t.me/speechcommand_bot. Using a widespread open-source tool like Telegram makes it possible to collect a large amount of data from a large number of people in a short time. In addition, speech commands have been collected with the microphones installed on board the robot and the adaptive workstation in the CRF use case. Ground-truth labels are double-checked by experts.

MIVIA Speech Command:

The dataset is split into two parts:

Training and Validation Sets: These subsets are available in two versions:
- With synthetic samples: speech_command_dataset_with_synth.zip
- Without synthetic samples: speech_command_dataset_without_synth.zip
Test Set (testset.zip): This subset contains only real samples collected in real-world scenarios, specifically within CRF.

Files

Files (15.2 GB)

Name	MD5	Size
speech_command_dataset_with_synth.zip	fe5f36f39af7038d4a79bc6785903fe2	7.0 GB
speech_command_dataset_without_synth.zip	fbafb13a4021a4cd7fcfc953783bfda3	7.8 GB
testset.zip	2e091287097aa993962585e41fff1b7c	364.3 MB
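
Given the size of the archives, it can be worth checking each download against the MD5 checksums listed for the files. The following is a minimal sketch using only the Python standard library. The file names and checksums are taken from the record, while the helper names md5_of_file and verify_downloads and the idea of a single download directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in the record.
EXPECTED_MD5 = {
    "speech_command_dataset_with_synth.zip": "fe5f36f39af7038d4a79bc6785903fe2",
    "speech_command_dataset_without_synth.zip": "fbafb13a4021a4cd7fcfc953783bfda3",
    "testset.zip": "2e091287097aa993962585e41fff1b7c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

For example, verify_downloads("downloads") returns a dictionary with one entry per archive. A False value means the file is missing from that (hypothetical) directory or should be downloaded again.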

Additional details

Funding

European Commission
FELICE - FlExible assembLy manufacturIng with human-robot Collaboration and digital twin modEls (grant No. 101017151)

	All versions	This version
Views	137	97
Downloads	57	48
Data volume	242.6 GB	242.6 GB
