Published January 3, 2024 | Version 1.0.0
Software documentation (Open Access)

TBscreen: A Passive Cough Classifier for Tuberculosis Screening with a Controlled Dataset

Description

Abstract

Recent respiratory disease screening studies suggest promising performance of cough classifiers, but potential biases in model training and dataset quality preclude robust conclusions. To examine tuberculosis (TB) cough diagnostic features, we enrolled subjects with pulmonary TB (n = 149) and controls with other respiratory illnesses (n = 46) in Nairobi. We collected a dataset with 33,000 passive coughs and 1600 forced coughs in a controlled setting with similar demographics. We trained a ResNet18-based cough classifier using images of passive cough scalograms as input and obtained a fivefold cross-validation sensitivity of 0.70 (±0.11 SD). The smartphone-based model had better performance in subjects with higher bacterial load {receiver operating characteristic–area under the curve (ROC-AUC): 0.87 [95% confidence interval (CI): 0.87 to 0.88], P < 0.001} or lung cavities [ROC-AUC: 0.89 (95% CI: 0.88 to 0.89), P < 0.001]. Overall, our data suggest that passive cough features distinguish TB from non-TB subjects and are associated with bacterial burden and disease severity.

 

Code

The code to generate features, load the dataset, and reproduce the manuscript results is provided. It has the following structure:

├── audio_preprocessing
    ├── cough_processing.py
    ├── generate_melspec.py
    ├── generate_vggish.py
    ├── generate_wavelet.py
    ├── generate_waveletImages.py
    └── wavelet_to_image_params
├── dataset_loaders
    ├── datasetload_spectralfeatures.py
    └── datasetload_waveletImage.py
├── Figures
    ├── datacsv
        ├── folds
        ├── result_baseline
        ├── result_scalogram
        └── result_tb_multiclass
    ├── R_scripts
        ├── ci_bootstrap.R
        └── ROC_CI.R
    ├── Binary_model_metrics.ipynb
    ├── Fig2_coughCounts.py
    ├── Fig4_ROC.py
    ├── SupplementalFig2_ROC.ipynb
    ├── SupplementalFig3a.ipynb
    ├── SupplementalFig3b.ipynb
    └── SupplementalFig3c.ipynb
├── train_scripts
    ├── earlystopping.py
    ├── prediction.py
    ├── train_tb.py
    └── train_tb_multiclass.py
├── utils
    ├── audioset
        ├── mel_features.py
        ├── vggish_input.py
        ├── vggish_params.py
        ├── vggish_postprocess.py
        └── vggish_slim.py
    ├── model
        ├── Modelresnet.py
        ├── pytorch_vggish.pth
        └── vggish.py
    └── TB_multiclass_foldsGen.ipynb    
├── predict_k_fold_scalogram.py
├── train_k_fold_scalogram.py
├── train_k_fold_scalogram_multiclass.py
├── train_k_fold_mspec.py
├── train_k_fold_vggish.py
├── requirements
└── README.md
    


Files in the audio_preprocessing folder are used to clean the dataset and generate various spectral features. The files in Scalogram_numpy provided in the Nairobi dataset can be converted to images using generate_waveletImages.py.
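As an illustration, a single Scalogram_numpy file can be rendered to an image roughly like this. This is a minimal sketch assuming the .npy files hold 2D magnitude arrays; generate_waveletImages.py is the authoritative implementation and may use a colormap and different scaling:

```python
import numpy as np
from PIL import Image

def scalogram_to_image(npy_path, png_path, size=(224, 224)):
    """Render a saved scalogram array as a grayscale PNG.

    Assumes the .npy file holds a 2D array (an assumption about the
    dataset's layout, not confirmed by the repo).
    """
    scal = np.abs(np.load(npy_path)).astype(np.float64)
    scal -= scal.min()
    if scal.max() > 0:
        scal /= scal.max()                    # normalize to [0, 1]
    img = Image.fromarray((scal * 255).astype(np.uint8))
    img = img.resize(size)                    # ResNet18-friendly input size
    img.save(png_path)
```

The 224×224 size matches the usual ResNet18 input resolution, though the repo's scripts may resize differently.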

Scripts to load and transform the dataset are provided in dataset_loaders.
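A minimal map-style loader in the spirit of the dataset_loaders scripts might look like this. The column names 'filepath' and 'label' are hypothetical, not the repo's actual schema:

```python
import csv

class CoughCSVDataset:
    """Map-style dataset over a fold CSV; implements the __len__/__getitem__
    protocol that torch.utils.data.DataLoader expects. Column names here
    ('filepath', 'label') are assumptions for illustration."""

    def __init__(self, csv_path, transform=None):
        with open(csv_path, newline="") as f:
            self.rows = list(csv.DictReader(f))
        self.transform = transform

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        sample = row["filepath"]  # a real loader would load the image/array here
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, int(row["label"])
```

Because it only needs `__len__` and `__getitem__`, such a class can be passed directly to a PyTorch DataLoader without subclassing anything.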

The Figures folder contains CSV files of various results and fold-specific dataset CSVs. It additionally contains scripts and notebooks to generate the manuscript results.

Some analyses (computing 95% confidence intervals, comparing ROCs) were performed in R; the scripts are available in the R_scripts folder.
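For readers without R, the percentile-bootstrap idea behind ci_bootstrap.R can be sketched in Python. This is an illustrative analog, not the repo's implementation:

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a statistic.

    Resamples `values` with replacement n_boot times, computes `stat` on
    each resample, and returns the empirical (alpha/2, 1 - alpha/2)
    percentiles. Parameter defaults are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    boots = [stat(rng.choice(values, size=len(values), replace=True))
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```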

The train_scripts folder contains functions to support training and validation.
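The early-stopping pattern implemented by files like earlystopping.py typically looks like the following generic sketch; the class and parameter names here are assumptions, not the repo's exact API:

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience`
    consecutive epochs (a common pattern; details are illustrative)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: reset the counter
            self.counter = 0
        else:
            self.counter += 1         # no improvement this epoch
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop
```

A training loop would call `step(val_loss)` once per epoch and break when it returns True, optionally checkpointing the model each time `best` improves.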

 

Dataset

The accompanying dataset can be downloaded from https://tbscreen.s3.amazonaws.com/TBscreen_Dataset.zip (download size: 395.4 GB).

The dataset consists of two sets of coughs: passive (natural) coughs and forced (voluntary) coughs. Coughs were recorded using three devices: a smartphone (Pixel), a boundary microphone (Codec), and a high-end condenser microphone (Yeti). Audio files were annotated by human annotators at the University of Washington using the Audacity software. Coughs with background noise such as a fan, door, or speech, or with other respiratory sounds such as a sneeze or clearing of the nose/throat, were discarded.

Each cough sound was processed to a fixed length of 1 second and stored as a WAV file. Recordings longer than a second were divided into multiple audio files, and audio shorter than 1 second was centered and zero-padded to one second. Scalograms were generated using the complex Morlet transform. Mel spectrogram features were generated using PyTorch's torchaudio, and VGGish embeddings were computed in addition. These features are stored as NumPy files.

Some cough audio files have been removed based on subject consent; cough features for all coughs are shared. Metadata.csv contains subject details, symptoms reported by subjects, and various TB-related test results. Column names and the associated key are provided in Dataset_key.csv. Passive_coughs.csv and Forced_coughs.csv contain paths to the files in Passive Coughs and Forced Coughs along with subject metadata. Raw audio (.wav) files exist only for rows with Permission_sound == 'yes'.

The downloaded folder expands to the following structure on unzipping:
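The fixed one-second processing described above can be sketched as follows; the sample rate and the handling of sub-second remainders are assumptions, and audio_preprocessing/cough_processing.py defines the actual pipeline:

```python
import numpy as np

def to_one_second_clips(audio, sr=44100):
    """Split a mono waveform into 1-second clips, or center-pad a short one.

    Assumptions: recordings longer than 1 s are cut into whole 1-s segments
    (any sub-second remainder is dropped here), and sr may differ from the
    dataset's actual sample rate.
    """
    n = sr                                   # samples per 1-second clip
    if len(audio) >= n:
        return [audio[i:i + n] for i in range(0, len(audio) - n + 1, n)]
    pad = n - len(audio)
    left = pad // 2
    return [np.pad(audio, (left, pad - left))]  # centered zero-padding
```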

├── Forced_Coughs
    ├── Audio_files
    ├── Scalogram_numpy
    ├── Melspectrogram_numpy
        ├── Vggish_embeddings
        └── pytorch_embeddings
    └── Forced_coughs.csv
├── Passive Coughs
    ├── Audio_files
    ├── Scalogram_numpy
    ├── Melspectrogram_numpy
        ├── Vggish_embeddings
        └── pytorch_embeddings
    └── Passive_coughs.csv
├── Metadata.csv
├── Dataset_key.csv
└── README.md
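The complex Morlet scalograms stored in Scalogram_numpy can be approximated with a NumPy-only sketch like the one below. Conventions such as the number of cycles and the normalization are assumptions; generate_wavelet.py defines the actual features:

```python
import numpy as np

def cmor_scalogram(x, sr, freqs, n_cycles=6.0):
    """Magnitude scalogram via complex Morlet wavelets, one row per frequency.

    For each center frequency f, builds a complex Morlet wavelet
    (complex exponential times a Gaussian envelope), convolves it with the
    signal, and keeps the magnitude. Parameters are illustrative only.
    """
    rows = []
    for f in freqs:
        sigma = n_cycles / (2.0 * np.pi * f)        # Gaussian width in seconds
        t = np.arange(-4 * sigma, 4 * sigma, 1.0 / sr)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))
        wavelet /= np.sqrt(sr) * np.linalg.norm(wavelet)  # rough normalization
        rows.append(np.abs(np.convolve(x, wavelet, mode="same")))
    return np.stack(rows)
```

Rendering the resulting 2D array as an image yields the kind of scalogram picture used as classifier input.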
    

 

Citation

Cite "Sharma et al. TBscreen: A Passive Cough Classifier for Tuberculosis Screening with a Controlled Dataset. Science Advances. 2024" for any use of the code or dataset.

Files

TBscreen_code.zip (284.2 MB)

md5:2ea000c8cf093d15fe443e9ac57f2c9e