Published January 17, 2022 | Version 1.0
Dataset | Open Access

Common Phone: A Multilingual Dataset for Robust Acoustic Modelling

  • Friedrich-Alexander-Universität Erlangen-Nürnberg

Description

Release Date: 17.01.2022

Welcome to Common Phone 1.0

Legal Information

Common Phone is a subset of the Common Voice corpus collected by Mozilla Corporation. By using Common Phone, you agree to the Common Voice Legal Terms. Common Phone is maintained and distributed by speech researchers at the Pattern Recognition Lab of Friedrich-Alexander-University Erlangen-Nuremberg (FAU) under the CC0 license.

As with Common Voice, you must not make any attempt to identify the speakers who contributed to Common Phone.

About Common Phone

This corpus aims to provide a basis for Machine Learning (ML) researchers and enthusiasts to train and test their models against a wide variety of speakers, hardware/software ecosystems and acoustic conditions to improve generalization and availability of ML in real-world speech applications.
The current version of Common Phone comprises 116.5 hours of speech, collected from 11,246 speakers in six languages:

Language   Speakers (train / dev / test)   Hours (train / dev / test)
English     4716 /  771 /  774             14.1 /  2.3 /  2.3
French       796 /  138 /  135             13.6 /  2.3 /  2.2
German      1176 /  202 /  206             14.5 /  2.5 /  2.6
Italian     1031 /  176 /  178             14.6 /  2.5 /  2.5
Spanish      508 /   88 /   91             16.5 /  3.0 /  3.1
Russian      190 /   34 /   36             12.7 /  2.6 /  2.8
Total       8417 / 1409 / 1420             85.8 / 15.2 / 15.5

The train, dev and test splits presented here are not identical to those shipped with Common Voice. Speaker separation across splits was achieved by using only those speakers who had provided age and gender information. This information can only be provided by registered users of the website. When a user is logged in, the session ID of contributed recordings is always linked to their account, so we could easily link recordings to individual speakers. Keep in mind that this would not be possible for unregistered users, as their session ID changes if they decide to contribute more than once.
During speaker selection, we took into account that some speakers had contributed to more than one of the six Common Voice datasets (one for each language). In Common Phone, a speaker appears in only one language.
The dataset is structured as follows (a minimal loading sketch follows the list):

  • Six top-level directories, one for each language.
  • Each language folder contains:
    • [train|dev|test].csv files list the audio files together with the respective speaker ID and a plain-text transcript.
    • meta.csv provides speaker information: age group, gender, language, accent (if available) and which of the three splits the speaker was assigned to.
    • /grids/ contains a phonetic transcription for every audio file in Praat TextGrid format. File names match the corresponding audio file names except for their extension.
    • /mp3/ contains the audio files in MP3 format, identical to those of Common Voice, i.e., sampling rates have been preserved and may vary between files.
    • /wav/ contains uncompressed audio files at 16 bits/sample, 16 kHz, single channel. They were created from the original MP3 audio and are provided for convenience; keep in mind that their source had undergone MP3 compression.
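
As a quick illustration (not part of the distribution), the Python sketch below iterates over one split CSV and loads the matching 16 kHz WAVE files. The root path, the language directory name and the CSV column names used here are assumptions; check them against the extracted archive before use.

    # Minimal sketch: walk one language split and load the audio.
    # "CommonPhone", "de" and the column names audio_file / text are assumptions.
    from pathlib import Path

    import pandas as pd
    import soundfile as sf   # pip install soundfile

    ROOT = Path("CommonPhone")   # wherever the archive was extracted (assumption)
    LANG = "de"                  # language directory name (assumption)

    split = pd.read_csv(ROOT / LANG / "train.csv")

    for row in split.itertuples(index=False):
        # The wav file shares the mp3 file's base name; only the extension differs.
        wav_path = (ROOT / LANG / "wav" / row.audio_file).with_suffix(".wav")
        audio, sr = sf.read(wav_path)   # 16 kHz, mono, 16 bit per the description above
        assert sr == 16000
        # ... feed `audio` and `row.text` to your feature extraction / model
        break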

Where does the phonetic annotation come from?

Phonetic annotation was computed via BAS Web Services. We used the regular Pipeline (G2P-MAUS) without ASR to create an alignment of text transcripts with audio signals. We chose International Phonetic Alphabet (IPA) output symbols as they work well even in a multi-lingual setup. Common Phone annotation comprises 101 phonetic symbols, including silence.
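
To inspect the annotation programmatically, a sketch like the one below reads a single TextGrid with the third-party textgrid Python package and prints the time-aligned IPA segments. Both the package and the phone tier name used here ("MAU", the tier MAUS typically writes) are assumptions, not something Common Phone prescribes.

    # Minimal sketch: print the IPA phone segments of one TextGrid.
    import textgrid  # pip install textgrid (third-party reader, assumption)

    GRID = "CommonPhone/de/grids/example.TextGrid"  # hypothetical path; pick any file from /grids/

    tg = textgrid.TextGrid.fromFile(GRID)
    phone_tier = next(t for t in tg.tiers if t.name == "MAU")  # tier name is an assumption
    for interval in phone_tier:
        if interval.mark:  # skip empty intervals
            print(f"{interval.minTime:.3f}\t{interval.maxTime:.3f}\t{interval.mark}")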

Why Common Phone?

  • Large number of speakers and varying acoustic conditions to improve robustness of ML models
  • Time-aligned IPA phonetic transcription for every audio sample
  • Gender-balanced and age-group-matched (equal number of female/male speakers in every age group; see the check sketched after this list)
  • Support for six different languages to leverage multi-lingual approaches
  • Original MP3 files plus standard WAVE files
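
The gender balance can be checked directly from meta.csv; the sketch below assumes the column names "age" and "gender", which may differ in the shipped files.

    # Minimal sketch: count speakers per age group and gender for one language.
    import pandas as pd

    meta = pd.read_csv("CommonPhone/de/meta.csv")  # path and column names are assumptions
    print(meta.groupby(["age", "gender"]).size().unstack(fill_value=0))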

Is there any publication available?

Yes, a paper describing Common Phone in detail is currently under review for LREC 2022. A pre-print titled “Common Phone: A Multilingual Dataset for Robust Acoustic Modelling” is available on arXiv.

Files (13.3 GB)

  • Common Phone.pdf: 138.5 kB, md5:c7c99eb0c18696acf7f4d7ef6e811a25
  • Dataset archive: 13.3 GB, md5:2022f16ef7296b9141b275e2288280b9
