Published December 6, 2022 | Version 1.0.2
Dataset
Open
Voice of America: Ukrainian ASR Dataset of Broadcast Speech
Creators
Contributors
Data collectors:
Description
The dataset is based on public Voice of America recordings (https://ukrainian.voanews.com), with the audio extracted from their videos.
The dataset contains 398 hours of speech.
The dataset was created with the ASR Corpus Creator (https://zenodo.org/record/7396705).
File format: WAV with a 16 kHz sampling rate.
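After unpacking the archives, the stated 16 kHz rate can be checked with a short script. The following is a minimal sketch using only Python's standard `wave` module; it assumes the files are uncompressed PCM WAV (the description does not state the encoding) and that they carry a `.wav` extension. The function names and the `EXPECTED_RATE` constant are illustrative and not part of the dataset.

```python
import wave
from pathlib import Path

# Sampling rate stated in the dataset description (16 kHz).
EXPECTED_RATE = 16_000


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate, frames / rate


def find_unexpected_rates(root, expected=EXPECTED_RATE):
    """List (path, rate) for WAV files under `root` whose rate differs from `expected`."""
    mismatches = []
    for path in sorted(Path(root).rglob("*.wav")):
        rate, _ = wav_info(path)
        if rate != expected:
            mismatches.append((path, rate))
    return mismatches
```

Calling `find_unexpected_rates("path/to/unpacked/archive")` (a hypothetical directory) returns an empty list when every file matches the expected rate. Summing the durations from `wav_info` gives a rough cross-check against the stated total of 398 hours.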
Files
(42.0 GB)
MD5 checksum | Size |
---|---|
md5:8fc5518a31e4686b16b8f9447de4b3de | 9.5 GB |
md5:cdb0a8072f11babc9aaf3463563c2cb2 | 5.3 GB |
md5:d47ba6b2bafcc7886899abf7e6ec911f | 6.9 GB |
md5:462045dedd8f084d9e34de2d65aeb160 | 6.3 GB |
md5:c8e0562572a17113de27e6c4cb779409 | 6.8 GB |
md5:d692cd1629322ec67b233c1802ccc509 | 6.8 GB |
md5:13e486a0fe76ef0b519485bf6d6f4ef1 | 127.2 MB |
md5:920447b4bd9056612dfe6b877083ba89 | 147.6 MB |
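Because the archives are several gigabytes each, it is worth comparing each download against the MD5 value listed for it. The following is a minimal sketch with Python's standard `hashlib`, reading the file in chunks so that a large archive does not have to fit in memory. The function names are illustrative, and the pairing of a local file with a listed checksum is left to the reader, since the file names are not shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:8fc5...' (prefix optional)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("path/to/archive.zip", "md5:...")`, with a hypothetical local path and the checksum listed for that file, returns `True` only if the download is intact.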
Additional details
References
- Speech Recognition for Ukrainian, https://github.com/egorsmkv/speech-recognition-uk
- Smoliakov, Yehor. (2022). ASR Corpus Creator (1.5.1). Zenodo. https://doi.org/10.5281/zenodo.7396705