Introducing the COVID-19 YouTube (COVYT) speech dataset featuring the same speakers with and without infection
Creators
- 1. University of Augsburg
Description
The COVYT dataset contains speech samples from individuals who self-reported their COVID-19 infection on public social media platforms (YouTube, Xiaohongshu). These videos, together with videos of the same people recorded prior to infection, were collected to provide publicly available data for COVID-19 research. This release includes links to the original videos along with the manual segmentation and diarisation that identifies the utterances of the target individuals. It also includes features derived from the segmented utterances, as well as partitioning information for four different cross-validation schemes. See the arXiv preprint for more details: https://arxiv.org/abs/2206.11045
Files
Name | Size | MD5 |
---|---|---|
COVYT.zip | 330.4 MB | d4008a76ae7f1af5967114f259439fe7 |
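The MD5 checksum listed for COVYT.zip can be used to confirm that a download completed without corruption. The following is a minimal sketch, assuming the archive has been saved as `COVYT.zip` in the current working directory; the helper names `md5_of_file` and `matches_expected` are illustrative and not part of the dataset release.

```python
import hashlib
from pathlib import Path

# Checksum as listed for COVYT.zip on this record page.
EXPECTED_MD5 = "d4008a76ae7f1af5967114f259439fe7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest against the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("COVYT.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if matches_expected(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.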
Additional details
Related works
- Is described by
- Preprint: https://arxiv.org/abs/2206.11045