Published July 15, 2022 | Version v5
Journal article Open

Robust Machine Learning predicts COVID-19 Disease Severity based on Single-cell RNA-seq from multiple hospitals

  • 1. Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Im Neuenheimer Feld 669, 69120 Heidelberg, Germany
  • 2. Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Germany
  • 3. Department of Computational Biology for Individualised Medicine, Centre for Individualised Infection Medicine (CiiM) & TWINCORE, joint ventures between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany

Description

Coronavirus disease 2019 (COVID-19) varies widely in severity. Possible associations between peripheral blood signatures and disease severity have been investigated since the emergence of the pandemic. Although several signatures have been identified through exploratory analyses of single-cell omics data, there are no validated state-of-the-art models that predict COVID-19 severity from comprehensive transcriptome profiling of peripheral blood mononuclear cells (PBMCs) across multiple sites. In this manuscript, we present a computational workflow based on a multilayer perceptron (MLP) network that predicts disease severity from single-cell RNA-seq data of PBMCs from COVID-19 patients. The study includes patient cohorts from different sites: the University Hospital in Bonn, the Charité in Berlin, the Stanford University COVID-19 Biobanking studies, and three Korean medical centers. Each cohort is annotated with severity status in terms of the need for mechanical ventilation. Training and model validation are performed on randomly selected samples solely from the Berlin and Bonn cohorts, while testing is performed on completely unseen samples from the Stanford and Korean datasets. Our proposed predictive model achieves a high area under the receiver operating characteristic curve (AUROC) on the testing datasets (Korea: 1 (CI: 1-1); Stanford: 0.86 (CI: 0.81-0.9)), which supports the model's robustness across sites. Moreover, we identified a number of features that contributed strongly to the prediction, some of which have previously been reported to be strongly related to the severity state. In summary, we show that the expression of 15 genes and the cell proportion profile of 29 PBMC cell types alone are sufficient to distinguish between severe and mild COVID-19 disease states. Our model is publicly available at https://github.com/dieterich-lab/ImmunOMICS.
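To make the reported metric concrete, the following is a minimal sketch of how an AUROC and a confidence interval of the kind quoted above could be computed for a held-out cohort. It is not taken from the ImmunOMICS repository. The description does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names `auroc` and `bootstrap_ci`, as well as the toy labels and scores, are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive sample outscores
    a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy cohort: 1 = severe, 0 = mild, scores = model outputs.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
toy_auc = auroc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

In this toy cohort every severe sample outscores every mild one, so each bootstrap resample also yields an AUROC of 1 and the interval collapses to a single point. An interval such as "1 (CI: 1-1)" can therefore arise whenever a test cohort is perfectly separated.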

Files

age.csv

Files (10.7 GB)

Checksum (MD5)                          Size
md5:c3329025236e193cd1380bb7f56b5814    1.9 kB
md5:fcff2bb6d527f5685ed6b3934a824d4a    1.5 GB
md5:9c6f68b63c57331ba727d63262d7ee29    467.3 MB
md5:b7058bc5707a37e07485fccb77ae4a12    1.2 GB
md5:d3480b3b730cf0ec8b7c4dda347b0827    676.9 MB
md5:adfe218c42a7e2b2aadbd86f11bd095c    2.3 GB
md5:a3588fe2aa7addd6ce1e7cae7a8dc3cb    1.5 kB
md5:46a485a31d5ac97b764ec5888ea7ee94    322.3 MB
md5:3f6e4554f51f1111837f74f31a840d9d    882.6 MB
md5:868197c3fb74477f96fc9731cb098a48    815.4 MB
md5:8cc2276a60a06ffa7753ffb9ec6b81d5    2.5 GB