Introducing the COVID-19 YouTube (COVYT) speech dataset featuring the same speakers with and without infection

Andreas Triantafyllopoulos; Anastasia Semertzidou; Meishu Song; Florian B. Pokorny; Björn W. Schuller

doi:10.5281/zenodo.6962930

Published September 1, 2022 | Version 1.0.0

Dataset Open

1. University of Augsburg

The COVYT dataset contains speech samples from individuals who self-reported their COVID-19 infection on public social media platforms (YouTube, Xiaohongshu). These videos, together with videos of the same people recorded before infection, were mined to gather publicly available data for COVID-19 research. This release includes links to the original videos along with the manual segmentation and diarisation that identify the utterances of the target individuals. We additionally release features derived from the segmented utterances. Finally, the dataset includes partitioning information for four different cross-validation schemes. See the arXiv preprint for more details: https://arxiv.org/abs/2206.11045

Files (330.4 MB)

Name	Checksum	Size
COVYT.zip	md5:d4008a76ae7f1af5967114f259439fe7	330.4 MB
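
The MD5 checksum listed for COVYT.zip can be used to confirm that a download completed without corruption. The following is a minimal sketch, assuming the archive was saved as `COVYT.zip` in the current working directory; the helper names `md5_of_file` and `verify_archive` are illustrative and not part of the dataset release.

```python
import hashlib
from pathlib import Path

# Checksum as published on the record page for COVYT.zip.
EXPECTED_MD5 = "d4008a76ae7f1af5967114f259439fe7"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the 330 MB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = Path("COVYT.zip")
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif verify_archive(archive):
        print("checksum OK")
    else:
        print("checksum mismatch; consider downloading the archive again")
```

A mismatch most likely indicates an incomplete or corrupted download rather than a problem with the dataset itself.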

Additional details

Related works
Is described by: Preprint: https://arxiv.org/abs/2206.11045

Funding
European Commission
sustAGE - Smart environments for person-centered sustainable work and well-being 826506

	All versions	This version
Views	362	362
Downloads	68	68
Data volume	27.4 GB	27.4 GB
