Published April 29, 2024 | Version 1.1.0
Dataset
Restricted
Okey Aura Wake-up Word Dataset
Authors/Creators
- 1. Amazon AGI
- 2. Telefónica Innovación Digital
- 3. Stanford University
- 4. Universitat de Barcelona
Description
Speech dataset for wake-up word (WuW) detection in Telefónica's home assistant, Aura. It contains 1247 utterances (1.4 hours) from ~80 speakers. Speakers pronounce the wake-up word "Okey Aura" itself, as well as other sentences, some of which are similar to "Okey Aura" and some of which are not.
The dataset includes rich metadata annotations, which make it possible to study factors and biases that may affect wake-up word detection performance, such as accent, gender, prosody/emotion, room size, and distance to the microphone. It also includes recordings of sentences that are phonetically similar to "Okey Aura", such as "Porque Laura..." or "... como Aura...", for experiments with difficult sentences.
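As a rough illustration of how the metadata could support a bias study, the sketch below groups detector outputs by one metadata factor and computes a false rejection rate and a false alarm rate per group. The field names (`label`, `accent`, `detected`), the group values, and the toy records are all hypothetical; they are not taken from the dataset's actual annotation schema, which should be consulted before adapting this.

```python
from collections import defaultdict

# Hypothetical records: the field names and values are illustrative only and
# are NOT the dataset's real metadata schema. "detected" stands for the
# output of some wake-up word detector under evaluation.
records = [
    {"label": "wuw", "accent": "A", "detected": True},
    {"label": "wuw", "accent": "A", "detected": False},
    {"label": "wuw", "accent": "B", "detected": True},
    {"label": "other", "accent": "A", "detected": False},
    {"label": "other", "accent": "B", "detected": True},
    {"label": "other", "accent": "B", "detected": False},
]


def rates_by_factor(rows, factor):
    """Return {group: (false_rejection_rate, false_alarm_rate)} for one factor."""
    counts = defaultdict(lambda: {"pos": 0, "miss": 0, "neg": 0, "fa": 0})
    for row in rows:
        c = counts[row[factor]]
        if row["label"] == "wuw":
            # Positive utterance: a miss is a wake-up word that was not detected.
            c["pos"] += 1
            c["miss"] += int(not row["detected"])
        else:
            # Negative utterance: a false alarm is a detection on other speech.
            c["neg"] += 1
            c["fa"] += int(row["detected"])
    return {
        group: (
            c["miss"] / c["pos"] if c["pos"] else None,
            c["fa"] / c["neg"] if c["neg"] else None,
        )
        for group, c in counts.items()
    }


result = rates_by_factor(records, "accent")
for group, (frr, far) in sorted(result.items()):
    print(group, frr, far)
```

The same function could be called with any other annotated factor (for example a hypothetical `gender` or `room_size` field) to compare error rates across groups, and restricting the negative rows to the phonetically similar sentences would isolate the false alarm rate on the difficult cases.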
Additional details
Related works
- Is published in: Journal article, 10.3390/app12041974 (DOI)
Dates
- Available: 2024-04-29
References
- Cámbara, G.; López, F.; Bonet, D.; Gómez, P.; Segura, C.; Farrús, M.; Luque, J. TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants. Appl. Sci. 2022, 12, 1974. https://doi.org/10.3390/app12041974