The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

Lara Orlandic; Tomas Teijeiro; David Atienza

doi:10.5281/zenodo.4048312

Published September 24, 2020 | Version 1.0

Resource type: Dataset | Access: Open


Affiliation: EPFL

Description

Cough audio signal classification has been used successfully to diagnose a variety of respiratory conditions, and there is significant interest in leveraging machine learning (ML) to provide widespread COVID-19 screening. The COUGHVID dataset provides more than 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. In addition, experienced pulmonologists labeled more than 2,000 of these recordings to diagnose medical abnormalities present in the coughs. This makes COUGHVID one of the largest expert-labeled cough datasets in existence, suitable for a wide variety of cough audio classification tasks. The dataset therefore offers a large pool of cough recordings for training ML models to address the world's most urgent health crises.

Notes

For more information about the data collection, pre-processing, validation, and data structure, please refer to the following publication: https://arxiv.org/abs/2009.11644

The cough pre-processing and feature extraction code is available from the following c4science repository: https://c4science.ch/diffusion/10770/

Files (951.4 MB)

Name	Size	MD5 checksum
public_dataset.zip	951.4 MB	5c30a8b00c8bb7783a2c15a48cb8ea9e
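The record publishes an MD5 checksum for public_dataset.zip. The following is a minimal sketch, using only the Python standard library, for checking a local download against that checksum and for counting the archive's members by file extension without extracting it. The local path `public_dataset.zip` and the helper names are hypothetical; the sketch makes no assumption about how the archive is organized internally.

```python
import hashlib
import zipfile
from collections import Counter
from pathlib import Path

# Checksum listed on this record for public_dataset.zip (Version 1.0).
EXPECTED_MD5 = "5c30a8b00c8bb7783a2c15a48cb8ea9e"


def md5_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: MD5 hex digest of a file, read in 1 MiB chunks
    so the ~951 MB archive never has to be held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 matches the expected digest."""
    return md5_of_file(path) == expected.lower()


def count_members_by_extension(path):
    """Hypothetical helper: count archive members by extension, skipping
    directory entries, without extracting anything to disk."""
    with zipfile.ZipFile(path) as zf:
        return Counter(
            Path(name).suffix.lower() or "<none>"
            for name in zf.namelist()
            if not name.endswith("/")
        )


if __name__ == "__main__":
    archive = Path("public_dataset.zip")  # assumed download location
    if not archive.exists():
        print(f"{archive} not found")
    elif verify_archive(archive):
        print("Checksum OK")
        for ext, n in count_members_by_extension(archive).most_common():
            print(f"{ext}\t{n}")
    else:
        print("Checksum mismatch: re-download the archive")
```

Reading in fixed-size chunks keeps memory use constant regardless of archive size, and listing members through `zipfile` lets you inspect the contents before committing disk space to a full extraction.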

Additional details

Funding

European Commission: DeepHealth - Deep-Learning and HPC to Boost Biomedical Applications for Health (grant no. 825111)
Swiss National Science Foundation: ML-edge: Enabling Machine-Learning-Based Health Monitoring in Edge Sensors via Architectural Customization (grant no. 200020_182009)

