Longitudinal plasma proteomic analysis of 1,117 hospitalized COVID-19 patients identifies features associated with severity and outcomes
Description
Data availability:
Data files are available at ImmPort (immport.org) under accession number SDY1760.
Abstract:
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is characterized by highly heterogeneous manifestations, ranging from asymptomatic cases to death, for reasons that are still incompletely understood. As part of the IMmunoPhenotyping Assessment in a COVID-19 Cohort (IMPACC) study, we mapped the plasma proteomes of 1,117 hospitalized coronavirus disease 2019 (COVID-19) patients from 15 hospitals across the USA. Up to 6 samples were collected within ~28 days of hospitalization, resulting in one of the largest COVID-19 plasma proteomics cohorts with 2,934 samples. Using perchloric acid to deplete the most abundant plasma proteins allowed for detecting 2,910 proteins. Our findings show that increased levels of neutrophil extracellular trap and heart damage markers are associated with fatal outcomes. Our analysis also identified prognostic biomarkers for worsening severity and death. Our comprehensive longitudinal plasma proteomics study, involving 1,117 participants and 2,934 samples, allowed for testing the generalizability of the findings of many previous COVID-19 plasma proteomics studies using much smaller cohorts.
Data:
MATERIALS & METHODS
Ethics Statement
NIAID staff conferred with the Department of Health and Human Services Office for Human Research Protections (OHRP) regarding potential applicability of the public health surveillance exception [45CFR46.102](39, 93) to the IMPACC study protocol. OHRP concurred that the study satisfied criteria for the public health surveillance exception, and the IMPACC study team sent the study protocol and participant information sheet for review and assessment to institutional review boards (IRBs) at participating institutions. Twelve institutions elected to conduct the study as public health surveillance, while 3 sites with prior IRB-approved biobanking protocols elected to integrate and conduct IMPACC under their institutional protocols (University of Texas at Austin, IRB 2020-04-0117; University of California San Francisco, IRB 20-30497; Case Western Reserve University, IRB STUDY20200573) with informed consent requirements. Participants enrolled under the public health surveillance exclusion were provided information sheets describing the study, samples to be collected, and plans for data de-identification and use. Those who requested not to participate after reviewing the information sheet were not enrolled. In addition, participants did not receive compensation for study participation while inpatient but were subsequently offered compensation during outpatient follow-ups(40).
Cohort and Study Design
The cohort and study design of the IMPACC study have been previously published(39, 40). In brief, hospital inpatients 18 years and older admitted to one of the 20 USA hospitals (affiliated with 15 academic institutions) were enrolled in the study within 72 hours of hospital admission. Symptomatic patients with a confirmed positive SARS-CoV-2 PCR test were followed longitudinally for up to 28 days of their hospital stay. Patient outcome was followed for up to 12 months after discharge. The November 2021 data freeze of the clinical data was used for the subsequent analysis.
Cohort Demographics, Timepoints and Data Collected
All study participant data, containing all relevant deidentified variables, was collected using a secure electronic data collection form(40). Plasma samples were collected upon admission at the hospital and up to 5 additional samples were collected during the acute phase in the hospital. contains demographic information, including sex, median age, median BMI, and median symptom onset.
Outcome Categorization
As reported(40), the clinical severity of illness was assessed using a 7-point ordinal scale (OS), adapted from the World Health Organization COVID-19 and NIAID disease ordinal severity scales. The 7-point OS comprises: OS1 = Not hospitalized, no limitations; OS2 = Not hospitalized, activity limitations or requires home O2; OS3 = Hospitalized, not requiring supplemental O2; OS4 = Hospitalized, requiring O2; OS5 = Hospitalized on non-invasive ventilation or high-flow O2; OS6 = Hospitalized on invasive mechanical ventilation and/or extracorporeal membrane oxygenation (ECMO); OS7 = Death. The 7-point OS for respiratory status was calculated at each hospitalization time point. Patients were then classified into 5 clinical trajectory groups (TG) based on longitudinal modeling of OS over time. A subset of the full IMPACC cohort was analyzed using plasma proteomics after depletion of the most abundant classical plasma proteins. We used the definition of ‘classical plasma proteins’ from Anderson and Anderson(45) and applied the biochemical depletion method as described in Viode et al.(44). The detailed description of the clinical characteristics of the full IMPACC cohort has been reported(40). The five TGs are: TG1 (n=230), characterized by a mild respiratory disease and brief hospital stay with a largely uncomplicated hospital course; TG2 (n=272), which generally required more respiratory support than TG1 and had a longer hospital stay but was discharged without limitations; TG3 (n=260), characterized by roughly similar respiratory support requirements and a similar length of hospital stay as TG2 but generally with limitations at discharge; TG4 (n=199), which generally received more aggressive respiratory support and generally experienced a prolonged hospital stay; and TG5 (n=98), characterized by high respiratory support requirements that progressed to mortality by day 28. For some analyses, TG4 was split into 2 sub-groups: TG4 survivors (TG4-S) and TG4 fatalities (TG4-F).
TG4-F includes participants who eventually died within the study but only after the 28-day sampling period.
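The 7-point ordinal scale described above can be represented as a simple lookup when working with the clinical data. The following is a minimal sketch in Python, not the study's own code (the original analysis was performed in R); the names `ORDINAL_SCALE`, `is_hospitalized`, and `worst_score` are hypothetical and introduced here only for illustration. It does not reproduce the longitudinal modeling used to assign trajectory groups.

```python
# Illustrative lookup for the 7-point ordinal scale (OS).
# All names here are hypothetical and not part of the IMPACC code base.
ORDINAL_SCALE = {
    1: "Not hospitalized, no limitations",
    2: "Not hospitalized, activity limitations or requires home O2",
    3: "Hospitalized, not requiring supplemental O2",
    4: "Hospitalized, requiring O2",
    5: "Hospitalized on non-invasive ventilation or high-flow O2",
    6: "Hospitalized on invasive mechanical ventilation and/or ECMO",
    7: "Death",
}


def is_hospitalized(score: int) -> bool:
    """Return True for OS3 to OS6, the states describing a hospitalized patient."""
    if score not in ORDINAL_SCALE:
        raise ValueError(f"unknown ordinal score: {score}")
    return 3 <= score <= 6


def worst_score(scores):
    """Return the highest (most severe) OS value across a patient's time points."""
    scores = list(scores)
    if not scores:
        raise ValueError("no scores supplied")
    for s in scores:
        if s not in ORDINAL_SCALE:
            raise ValueError(f"unknown ordinal score: {s}")
    return max(scores)
```

For example, `worst_score([3, 4, 6, 5])` returns the most severe state reached during the sampled period.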
Sample Preparation
Fifty microliters of neat plasma samples were diluted with 450 µL water, and 25 µL of perchloric acid (70%) was added(44). After vigorous agitation, the suspension was kept at -20°C for 15 min. The suspension was centrifuged for 60 min (4°C, 3200 ×g) and the supernatant was retained. The supernatant was mixed with 40 µL of 1% trifluoroacetic acid and loaded onto a µSPE HLB plate (Waters, catalog #186001828BA), pre-conditioned with 300 µL methanol and twice with 500 µL of 0.1% trifluoroacetic acid. Proteins were eluted from the µSPE HLB plate with 100 µL of 90% acetonitrile, 0.1% trifluoroacetic acid. After elution, the samples were dried using a Speedvac. The samples were resuspended in 35 µL of 50 mM ammonium bicarbonate and digested with 10 µL trypsin (Promega, catalog #V5280, 500 ng) overnight at 37°C. Digestion was stopped by the addition of 5 µL of 10% formic acid. The samples were stored at -80°C before LC/MS analysis.
Sample MS Data Acquisition
Two microliters of tryptic peptides were loaded onto an Evotip and analyzed using an Evosep One liquid chromatography system (Evosep) connected to a timsTOF Pro (Bruker). The Evosep One method was the 60 samples per day method (21 min gradient), and the mass spectrometer was operated in DDA-PASEF mode. Four PASEF MS/MS scans were triggered per cycle. DDA-PASEF parameters were set as follows: m/z range 100-1700; mobility (1/K0) range 0.70-1.45 V·s/cm²; accumulation and ramp times 100 ms. Target intensity per individual PASEF precursor was set to 5000. The values for mobility-dependent collision energy ramping were set to 51 eV at an inverse reduced mobility (1/K0) of 1.45 V·s/cm² and 21 eV at 0.7 V·s/cm². Collision energies were linearly interpolated between these two 1/K0 values.
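The linear interpolation of collision energy between the two stated anchor points can be written out explicitly. The sketch below is illustrative Python, not instrument software; the function name `collision_energy` and the choice to reject values outside the 0.70-1.45 V·s/cm² range are assumptions made here, since the text does not describe behavior outside that range.

```python
def collision_energy(inv_k0, low=(0.70, 21.0), high=(1.45, 51.0)):
    """Linearly interpolate collision energy (eV) between two (1/K0, eV) anchors.

    The default anchors are the values stated in the text: 21 eV at
    0.70 V.s/cm2 and 51 eV at 1.45 V.s/cm2. Rejecting out-of-range input
    is an assumption of this sketch.
    """
    (k_lo, ce_lo), (k_hi, ce_hi) = low, high
    if not (k_lo <= inv_k0 <= k_hi):
        raise ValueError(f"1/K0 = {inv_k0} is outside [{k_lo}, {k_hi}]")
    # Slope in eV per unit of 1/K0 (30 eV over 0.75 V.s/cm2 with the defaults).
    slope = (ce_hi - ce_lo) / (k_hi - k_lo)
    return ce_lo + slope * (inv_k0 - k_lo)
```

With the default anchors the slope is 40 eV per V·s/cm², so a precursor at 1/K0 = 1.0 V·s/cm² would be assigned approximately 33 eV.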
Sample Search
The method used for protein identification and quantification is described in van Zalm et al.(41). In brief, data were copied to a high-performance computing (HPC) system (https://www.mghpcc.org), for which we wrote a parallelization strategy to facilitate the identification and quantification of proteins in such a large LC/MS dataset(41) using MSFragger(94). This parallelization strategy allowed for data analysis, including match between runs, in less than 2 weeks of computing time. The Uniprot human protein sequences without isoforms were combined with the protein sequences of the SARS-CoV-2 virus into a single FASTA file downloaded on March 27th, 2021. Methionine oxidation and protein N-term acetylation were set as variable modifications; no fixed modifications were specified. A maximum of three modifications was allowed during peptide spectrum matching. A 1% false discovery rate was applied using the Philosopher toolkit(95). IonQuant(96), which uses MS1 spectra to determine the relative quantification between samples, was used for quantification. At least one ion was required for protein quantification.
Statistical Analysis
Statistical analysis was performed using RStudio. Protein intensities were normalized using VSN(97) and log2 transformed for further analysis. ClusterProfiler(49) was used for the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and ReactomePA(98) for Reactome. The heatmap in Figure 3 was generated using ComplexHeatmap(99).
To identify longitudinal associations, presented in Figure 4, we tested whether protein kinetics during hospitalization differed across the TGs via a generalized additive model with mixed effects (gamm4 v0.2.6) while controlling for sex and age. Proteins for which the average (intercept in the gamm4 documentation) or shape (smoothing term in the gamm4 documentation) differed between the TGs at FDR<5% were considered significantly dysregulated (detailed description in (100)).
Results from Figure 5 were visualized using ClusterProfiler(49): after a paired t-test and Benjamini Hochberg correction, the significant proteins for each trajectory group were submitted to the CompareCluster function of the ClusterProfiler tool.
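The Benjamini-Hochberg correction mentioned above can be illustrated with a short, dependency-free sketch. This is Python written for illustration only and is not the study's code (the original analysis was performed in R); the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases (step-up monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(p_values, fdr=0.05):
    """Return indices whose adjusted p-value is below the chosen FDR threshold."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < fdr]
```

Under this sketch, the proteins passed on to the enrichment step would be those returned by `significant` for each trajectory group.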
For the discovery of the prognostic biomarker panels, missing values were imputed protein-wise with half the minimum value for each protein. Data were split between a training and a test cohort. This split met the FDA definition of independence because the two cohorts comprised samples from two independent sets of hospitals. The samples from the training cohort were collected at: University of Arizona (UA)-Tucson, Baylor, Brigham and Women's Hospital (BWH) Boston, Case Western, University of Oklahoma Health Sciences Center (OUHSC), University of California, Los Angeles (UCLA), and Yale. The samples from the test cohort were collected at: Drexel/Tower Health, Emory, University of Florida (UF), Icahn School of Medicine at Mount Sinai (ISMMS), Oregon Health & Science University (OHSU), Stanford, University of California, San Francisco (UCSF), and The University of Texas (UT) Austin.
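The half-minimum imputation and the hospital-based split can be sketched as follows. This is an illustrative Python sketch, not the study's R code. The function names, the `"site"` key, and the short site labels are hypothetical; the text does not state whether imputation was applied to raw or transformed intensities, so the sketch simply operates on whatever scale it is given.

```python
import math

# Site labels are hypothetical shorthand for the hospitals listed in the text.
TRAINING_SITES = {"UA-Tucson", "Baylor", "BWH", "Case Western", "OUHSC", "UCLA", "Yale"}
TEST_SITES = {"Drexel/Tower Health", "Emory", "UF", "ISMMS", "OHSU",
              "Stanford", "UCSF", "UT Austin"}


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or math.isnan(value)


def impute_half_minimum(values):
    """Fill missing entries of one protein with half its minimum observed value."""
    observed = [v for v in values if not _is_missing(v)]
    if not observed:
        return list(values)  # no observed value to derive a minimum from
    fill = min(observed) / 2.0
    return [fill if _is_missing(v) else v for v in values]


def split_by_site(samples):
    """Split sample records (dicts with a 'site' key) into training and test lists."""
    train = [s for s in samples if s["site"] in TRAINING_SITES]
    test = [s for s in samples if s["site"] in TEST_SITES]
    return train, test
```

Splitting on the site rather than on individual samples is what keeps the two cohorts independent: no hospital contributes to both sets.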
A Mann-Whitney test was performed, on the training cohort only, between the 2 conditions, i.e., Death vs. Survival or ECMO/invasive mechanical ventilation Yes vs. No. The 20 most significant proteins were further evaluated by stepwise selection using the Akaike Information Criterion (AIC)(101) to find the best prognostic biomarker panel. Only significant proteins were selected (Supplementary Figure 2). For ROC analysis and AUROC calculation, pROC(102) was used.
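The Mann-Whitney test and the AUROC are closely related: the AUROC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The sketch below illustrates this relationship in plain Python. It is not the pROC implementation used in the study, and the function name `auroc` is hypothetical.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to U / (n_pos * n_neg), where U is the Mann-Whitney statistic
    for the positive group. Tied pairs count as half a correct ranking.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    u = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(scores_pos) * len(scores_neg))
```

Here `scores_pos` would hold the panel scores of patients with the outcome (for example, death) and `scores_neg` those without it. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but slower than rank-based implementations.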
Files
Name | Size |
---|---|
md5:d02f0bfbad6e6deb709f88a036a77685 | 166.7 kB |
Additional details
Dates
- Other
- 2024-04-03
Software
- Repository URL
- https://github.com/SteenOmicsLab/COVID19
- Programming language
- RMarkdown
- Development Status
- Inactive