CRPIH_UVigo-GL-Voices: Galician TTS dataset
Description
CRPIH_UVigo-GL-Voices is a multi-speaker Galician TTS dataset containing audio recordings from four speakers (two female and two male voices). The characteristics of each voice are detailed in the table below:
| Voice Name | Gender | Speaker | Recording | # Utterances | Duration | Sampling Rate | Format |
|---|---|---|---|---|---|---|---|
| Sabela | Female | Professional radio broadcaster | Professional studio | 9,999 | 14h 28m | 16 kHz / 44 kHz | 16-bit PCM |
| Icía | Female | Amateur | Semi-professional studio | 2,950 | 4h 5m | 16 kHz / 96 kHz | 16/24-bit PCM |
| Iago | Male | Amateur | Radio studio | 1,316 | 1h 13m | 16 kHz / 48 kHz | 16-bit PCM |
| Paulo | Male | Amateur | Radio studio | 1,316 | 1h 15m | 16 kHz | 16-bit PCM |
Each speaker recorded a subset of utterances from a text corpus of 10,000 sentences, each between 1 and 44 words long. The corpus consists mainly of press excerpts, plus a small subset of manually designed sentences. The press excerpts were extracted from newspapers published before 2010 ("O Correo Galego", "Galicia Hoxe" and "Vieiros"), whereas the hand-crafted sentences were created at the CRPIH in 1999.
The data is organized into one folder per speaker. Each speaker's folder contains the following subdirectories:
- txt → Audio transcripts encoded in ISO 8859-1.
- fon → Phoneme-level forced alignment between phonemic transcriptions (provided by Cotovía) and audio recordings.
- wav_[bit depth]bits_[sampling rate]kHz → WAV audio files at [sampling rate] kHz and [bit depth] bits (see table above).
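As a minimal sketch of how this layout might be consumed in Python: the helper names `read_transcript` and `wav_dir_name` are hypothetical, and the exact directory spelling (for instance `wav_16bits_16kHz`) is inferred from the pattern above rather than confirmed. The point worth showing is that transcripts need an explicit ISO 8859-1 decode, since reading them as UTF-8 would fail or garble accented characters.

```python
from pathlib import Path


def read_transcript(txt_path):
    """Read one transcript file, decoding it as ISO 8859-1 (not UTF-8)."""
    return Path(txt_path).read_text(encoding="iso-8859-1").strip()


def wav_dir_name(bit_depth, sampling_rate_khz):
    """Build a WAV subdirectory name following the documented pattern.

    Assumption: the placeholders are filled with plain integers,
    e.g. wav_16bits_16kHz.
    """
    return f"wav_{bit_depth}bits_{sampling_rate_khz}kHz"


# Hypothetical usage, assuming the dataset was unpacked under ./sabela:
# text = read_transcript("sabela/txt/crpih_uvigo_gl_sabela_00001.txt")
# wav_dir = Path("sabela") / wav_dir_name(16, 16)
```

Once decoded, the text is an ordinary Python string and can be written back out as UTF-8 if a downstream tool expects it.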
The file naming convention is as follows: two lowercase elements indicating the creators of the dataset (“crpih_uvigo”), the ISO code for the Galician language (“gl”), the name of the voice (e.g., “sabela”), and a 5-digit number identifying the utterance. All components are separated by underscores (e.g., “crpih_uvigo_gl_sabela_00001.txt”). For some of the speakers, there is an additional element identifying the dataset split (e.g., “crpih_uvigo_gl_iago_a_00001.txt” and “crpih_uvigo_gl_iago_b_00001.txt”).
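The naming convention above can be parsed with a regular expression. The following is a sketch under stated assumptions, not an official parser: `FILENAME_RE`, `UtteranceName`, and `parse_name` are hypothetical names, the voice name is assumed to be written in plain lowercase ASCII, and the split element is assumed to be a single lowercase letter, as in the “a” and “b” examples.

```python
import re
from typing import NamedTuple, Optional

# Assumptions: voice names are lowercase ASCII letters and the optional
# split marker is one lowercase letter (as in the "a"/"b" examples).
FILENAME_RE = re.compile(
    r"^crpih_uvigo_gl_"
    r"(?P<voice>[a-z]+)"
    r"(?:_(?P<split>[a-z]))?"
    r"_(?P<number>\d{5})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


class UtteranceName(NamedTuple):
    voice: str
    split: Optional[str]  # None when the file name has no split element
    number: int           # utterance number without zero padding
    ext: str


def parse_name(filename):
    """Split a dataset file name into its components, or raise ValueError."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return UtteranceName(
        voice=match["voice"],
        split=match["split"],
        number=int(match["number"]),
        ext=match["ext"],
    )


print(parse_name("crpih_uvigo_gl_sabela_00001.txt"))
print(parse_name("crpih_uvigo_gl_iago_a_00001.txt"))
```

Matching against the full name, including the fixed “crpih_uvigo_gl” prefix, means stray files in a speaker folder are rejected with an error rather than silently misparsed.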
Acknowledgements
We would like to thank the speakers for recording and donating their voices.