UPDATE: Zenodo migration postponed to Oct 13 from 06:00-08:00 UTC. Read the announcement.

Journal article Open Access

The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms

Orlandic, Lara; Teijeiro, Tomas; Atienza, David

Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. The COUGHVID dataset provides over 25,000 crowdsourced cough recordings representing a wide range of participant ages, genders, geographic locations, and COVID-19 statuses. First, we contribute our open-sourced cough detection algorithm to the research community to assist in data robustness assessment. Second, four experienced physicians labeled more than 2,800 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world’s most urgent health crises.

Files (1.8 MB)
Name Size
1.8 MB Download
Views 74
Downloads 111
Data volume 202.4 MB
Unique views 69
Unique downloads 102


Cite as