Robust Machine Learning predicts COVID-19 Disease Severity based on Single-cell RNA-seq from multiple hospitals
Authors/Creators
- 1. Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Im Neuenheimer Feld 669, 69120 Heidelberg, Germany
- 2. Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Germany
- 3. Department of Computational Biology for Individualised Medicine, Centre for Individualised Infection Medicine (CiiM) & TWINCORE, joint ventures between the Helmholtz-Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
Description
Coronavirus disease 2019 (COVID-19) varies widely in severity. Possible associations between peripheral blood signatures and disease severity have been investigated since the start of the pandemic. Although several signatures appear to have been identified through exploratory analyses of single-cell omics data, there are no validated state-of-the-art models that predict COVID-19 severity from comprehensive transcriptome profiling of peripheral blood mononuclear cells (PBMCs) across multiple sites. In this manuscript, we present a computational workflow based on a multilayer perceptron (MLP) network that predicts disease severity from PBMC single-cell RNA-seq data of COVID-19 patients. The study includes patient cohorts from different sites: the University Hospital in Bonn, the Charité in Berlin, the Stanford University COVID-19 Biobanking studies, and three Korean medical centers. Each cohort is annotated with severity status, defined by the need for mechanical ventilation. Training and model validation are performed on randomly selected samples solely from the Berlin and Bonn cohorts, while testing is performed on completely unseen samples from the Stanford and Korean datasets. Our proposed predictive model achieves a high area under the receiver operating characteristic curve (AUROC) on the testing datasets (Korea: 1 (CI: 1-1); Stanford: 0.86 (CI: 0.81-0.9)), which supports the model's robustness across sites. Moreover, we identified a number of features that contributed strongly to the prediction, some of which have previously been reported to be strongly related to disease severity. In summary, we show that the expression of 15 genes and the cell proportion profile of 29 PBMC cell types alone are sufficient to distinguish between severe and mild COVID-19 disease states. Our model is publicly available at https://github.com/dieterich-lab/ImmunOMICS.
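The description reports AUROC values with confidence intervals on the held-out cohorts but does not say how the intervals were computed. As a rough illustration only, the sketch below computes AUROC from the Mann-Whitney statistic and a percentile bootstrap interval. The bootstrap, the function names `auroc` and `bootstrap_ci`, and the toy labels and scores are all assumptions made for this example and are not taken from the ImmunOMICS code.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive sample outscores a negative one.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[max(0, int((alpha / 2) * n_boot))]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = severe, 0 = mild; scores are model outputs.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90]
point_estimate = auroc(toy_labels, toy_scores)
interval = bootstrap_ci(toy_labels, toy_scores)
```

When the two classes are perfectly separated, every valid resample also yields an AUROC of 1, so a percentile bootstrap of this kind returns a degenerate interval, which is the shape of the "1 (CI: 1-1)" result reported for the Korean cohort.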
Files
age.csv
Total size: 10.7 GB

| MD5 checksum | Size |
|---|---|
| c3329025236e193cd1380bb7f56b5814 | 1.9 kB |
| fcff2bb6d527f5685ed6b3934a824d4a | 1.5 GB |
| 9c6f68b63c57331ba727d63262d7ee29 | 467.3 MB |
| b7058bc5707a37e07485fccb77ae4a12 | 1.2 GB |
| d3480b3b730cf0ec8b7c4dda347b0827 | 676.9 MB |
| adfe218c42a7e2b2aadbd86f11bd095c | 2.3 GB |
| a3588fe2aa7addd6ce1e7cae7a8dc3cb | 1.5 kB |
| 46a485a31d5ac97b764ec5888ea7ee94 | 322.3 MB |
| 3f6e4554f51f1111837f74f31a840d9d | 882.6 MB |
| 868197c3fb74477f96fc9731cb098a48 | 815.4 MB |
| 8cc2276a60a06ffa7753ffb9ec6b81d5 | 2.5 GB |
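Because several of these files are larger than 1 GB, it may be worth checking a download against its listed MD5 value before use. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file path in the usage comment is a placeholder, since this listing does not show which checksum belongs to which file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value; an optional 'md5:' prefix is ignored."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (placeholder path; pair each file with its own checksum from the record):
# matches_checksum("downloaded_file", "md5:fcff2bb6d527f5685ed6b3934a824d4a")
```

MD5 is adequate here for detecting truncated or corrupted downloads; it is not a safeguard against deliberate tampering.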