Published June 23, 2021 | Version v1
Journal article | Open access

The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms

Description

Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. The COUGHVID dataset provides over 25,000 crowdsourced cough recordings representing a wide range of participant ages, genders, geographic locations, and COVID-19 statuses. First, we contribute our open-source cough detection algorithm to the research community to assist in assessing data robustness. Second, four experienced physicians labeled more than 2,800 recordings to diagnose medical abnormalities present in the coughs, making this one of the largest expert-labeled cough datasets in existence and one suited to many cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and those labeled as COVID-19 originate from countries with high infection rates. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.
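The description says the cough detection algorithm is released to help assess data robustness. One way to use such a detector, sketched minimally below, is to screen out recordings that probably do not contain a cough before training. This assumes each recording has already been scored by a detector producing a probability-like value in [0, 1]; the `Recording` fields, the status labels, and the 0.8 threshold are all hypothetical and are not the dataset's documented schema.

```python
from dataclasses import dataclass

# Hypothetical cut-off; in practice it would be chosen by inspecting the
# detector's score distribution on the data at hand.
COUGH_THRESHOLD = 0.8


@dataclass
class Recording:
    uuid: str           # hypothetical recording identifier
    cough_score: float  # hypothetical detector output in [0, 1]
    status: str         # hypothetical label, e.g. "healthy", "symptomatic", "COVID-19"


def filter_likely_coughs(recordings, threshold=COUGH_THRESHOLD):
    """Keep only recordings whose detector score meets the threshold."""
    return [r for r in recordings if r.cough_score >= threshold]


def status_counts(recordings):
    """Count recordings per status label, e.g. to check class balance after filtering."""
    counts = {}
    for r in recordings:
        counts[r.status] = counts.get(r.status, 0) + 1
    return counts
```

For example, with scores 0.95, 0.40 and 0.85, the default threshold keeps the first and third recordings. Counting labels after filtering is worthwhile because screening can shift the class proportions that a downstream model sees.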

Files (1.8 MB)

Name: s41597-021-00937-4.pdf
Size: 1.8 MB
Checksum: md5:c15f746d0766b3bebc15d6e321dc00af
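The record lists the MD5 checksum c15f746d0766b3bebc15d6e321dc00af for s41597-021-00937-4.pdf. Below is a minimal sketch for checking a downloaded copy against it. It assumes the PDF was saved under its original filename in the current directory; the function names are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksum published on the record for s41597-021-00937-4.pdf
EXPECTED_MD5 = "c15f746d0766b3bebc15d6e321dc00af"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    pdf = Path("s41597-021-00937-4.pdf")  # assumed local download location
    if pdf.exists():
        print("checksum OK" if verify_download(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found")
```

MD5 is adequate for catching corrupted or truncated downloads. It is not collision-resistant, however, so a matching digest should not be treated as proof against deliberate tampering.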

Additional details

Related works

Is a new version of the preprint arXiv:2009.11644 (arXiv).

Funding

DeepHealth – Deep-Learning and HPC to Boost Biomedical Applications for Health (European Commission, grant 825111)
ML-edge: Enabling Machine-Learning-Based Health Monitoring in Edge Sensors via Architectural Customization (Swiss National Science Foundation, grant 200020_182009)