There is a newer version of the record available.

Published November 28, 2020 | Version v1
Dataset Open

Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first six months of the COVID-19 pandemic -- Supplementary Tables and Models

  • 1. Rutgers University
  • 2. Grinnell College
  • 3. University of Notre Dame
  • 4. University of Maryland--Baltimore County
  • 5. Stevens Institute of Technology
  • 6. Frostburg State University
  • 7. Youngstown State University
  • 8. University of Central Florida
  • 9. New York City College of Technology
  • 10. Howard University
  • 11. Watchung Hills Regional High School
  • 12. Xavier University
  • 13. Hope College
  • 14. Ursinus College
  • 15. State University of New York--Oswego
  • 16. Roger Williams University
  • 17. Brandeis University
  • 18. University of Puerto Rico--Rio Piedras
  • 19. John Jay College
  • 20. Grand View University
  • 21. Rochester Institute of Technology

Description


https://covid-19_proteome_evolution_paper.iqb.rutgers.edu 

 

Legends for Supplementary Figures for 29 SARS-CoV-2 Study Proteins

Separate analysis of protein changes was performed for each study protein and complex. Description below applies to all figures.

A: Greyscale representation of observed frequencies for all unique sequence variant (USV) substitutions of Native Residue (i.e., amino acid type in the reference protein sequence) changing to Substituted Residue for a given protein/complex. Red boxes enclose conservative substitutions for hydrophobic, uncharged polar, positively charged, and negatively charged amino acids, respectively, in order from upper left to lower right. Cysteine, Glycine, and Proline are excluded from these groupings.
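
The conservative-substitution grouping described for panel A can be sketched as a small lookup. The group memberships below are an assumption (a common textbook classification); the legend names the four classes and the three excluded residues but does not list the members of each class.

```python
# Assumed group memberships; the legend names the classes but not their members.
# Cysteine (C), Glycine (G) and Proline (P) are left out of every group.
CONSERVATIVE_GROUPS = {
    "hydrophobic": set("AVLIMFWY"),
    "uncharged_polar": set("STNQ"),
    "positively_charged": set("KRH"),
    "negatively_charged": set("DE"),
}


def is_conservative(native: str, substituted: str) -> bool:
    """Return True when both residues fall in the same assumed group."""
    if native == substituted:
        return False  # identical residues are not a substitution
    return any(
        native in group and substituted in group
        for group in CONSERVATIVE_GROUPS.values()
    )
```

Under these assumed groups, any substitution to or from C, G, or P is reported as non-conservative.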

B-D: Normalized Frequency histograms for ΔΔGApp calculated for all USVs for a given protein/complex. These were calculated using three methods, which we refer to as hard-hard (B), soft-hard (C), and soft-soft (D), based on the scoring functions used for sidechain rotamer optimization and gradient-based energy minimization, respectively (see Methods). All energy values described in the text were obtained using the soft-hard method. Each energy histogram is overlaid with a fitted bi-Gaussian curve (solid red line) and fitted single Gaussian curves for subsets of USVs with surface (green), boundary layer (yellow), or core (blue) substitutions. USVs with multiple substitutions were included in single Gaussian fitting when all substitutions mapped to the same region of the study protein. The data used for fitting include the energies of all unique protein models produced by a given method, excluding extreme outliers with energy values more than 3 standard deviations away from the central mean.
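
Two of the data-selection rules in this legend can be sketched in a few lines: assigning a multi-substitution USV to a layer only when all of its substitutions share that layer, and dropping energies more than 3 standard deviations from the mean. The function names are hypothetical, and the use of the plain arithmetic mean and the population standard deviation is an assumption; the legend does not say how the "central mean" or the standard deviation was computed.

```python
from statistics import mean, pstdev


def usv_layer(layers):
    """Return the shared burial layer of a USV, or None if its substitutions span layers."""
    unique = set(layers)
    return unique.pop() if len(unique) == 1 else None


def drop_extreme_outliers(energies, n_sd=3.0):
    """Keep energies within n_sd standard deviations of the mean (assumed definition)."""
    mu = mean(energies)
    sd = pstdev(energies)
    if sd == 0:
        return list(energies)  # all values identical, nothing to drop
    return [e for e in energies if abs(e - mu) <= n_sd * sd]
```

For example, `usv_layer(["core", "core"])` gives `"core"`, while a USV with one core and one surface substitution gives `None` and would be left out of the per-layer fits.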

E-G: USV Count histograms indicate the number of USVs among the full set for a given protein in which each site included a substitution. Sites are separated by burial layer. Substitutions at sites that are absent from the available crystal structures are excluded from the histograms. In most cases, only a single protein is analyzed, and only panel E is included. In the case of complexes, a separate histogram is provided for each protein in the complex: for methyltransferase nsp10-nsp16, E is nsp10 and F is nsp16; for RDRP nsp12-nsp7-nsp8, E is nsp7, F is nsp8, and G is nsp12.
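
The per-site USV counts plotted in panels E-G can be sketched as a simple tally. The input shape (each USV given as a list of `(chain, site)` pairs) and the function name are assumptions made for illustration.

```python
from collections import Counter


def usv_count_per_site(usvs):
    """Count, for each (chain, site), how many USVs include a substitution there.

    `usvs` is an iterable of USVs, each given as a list of (chain, site) pairs.
    """
    counts = Counter()
    for sites in usvs:
        counts.update(set(sites))  # a site counts at most once per USV
    return counts
```

Splitting the resulting counts by burial layer, as in the figures, would need the `layer` value for each site from the tables described below.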

 

Legends for Supplementary Tables for 29 SARS-CoV-2 Study Proteins

Table: USVs: All identified USVs for a protein/complex. Columns are:

  • date: Date of first collection of a strain with the USV reported to GISAID
  • gisaid_count: The number of sequences in the GISAID database that include the USV
  • id: The GISAID strain identification for the first collected instance of the USV
  • location: The country in which the first strain including the USV was collected
  • substitutions: All substitutions in the USV, in the form [chain]_[sequence][site][substitution], with multiple substitutions separated by semicolons
  • is_in_PDB: Whether a substitution is present in the PDB model used to generate the USV structure, with multiple substitutions separated by semicolons
  • multiple: Whether more than one amino acid substitution is present in the USV
  • conservative: Whether a substitution is conservative, with multiple substitutions separated by semicolons
  • layer: Identification of the burial layer (surface, boundary, or core) of a substitution in the reference structure, with multiple substitutions separated by semicolons and substitutions absent from the PDB excluded
  • sh_rmsd: The RMSD of the USV to the reference structure when modeled using the soft-hard method
  • sh_ddG: The ΔΔGApp of the USV when modeled using the soft-hard method
  • hh_rmsd: The RMSD of the USV to the reference structure when modeled using the hard-hard method
  • hh_ddG: The ΔΔGApp of the USV when modeled using the hard-hard method
  • ss_rmsd: The RMSD of the USV to the reference structure when modeled using the soft-soft method
  • ss_ddG: The ΔΔGApp of the USV when modeled using the soft-soft method
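
A minimal sketch of parsing the `substitutions` column follows. It assumes that `[sequence]` and `[substitution]` are one-letter amino acid codes and that `[site]` is an integer residue number; the record gives only the general pattern. The type and function names are hypothetical, and the example strings in the usage note are made up.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    chain: str
    reference: str
    site: int
    mutant: str


# Assumed token shape: chain ID, underscore, one-letter residue, site number, one-letter residue.
_TOKEN = re.compile(r"^(?P<chain>[^_]+)_(?P<ref>[A-Z])(?P<site>\d+)(?P<mut>[A-Z])$")


def parse_substitutions(field: str):
    """Split a semicolon-separated substitutions field into Substitution records."""
    subs = []
    for token in field.split(";"):
        token = token.strip()
        if not token:
            continue
        m = _TOKEN.match(token)
        if m is None:
            raise ValueError(f"unrecognized substitution token: {token!r}")
        subs.append(Substitution(m["chain"], m["ref"], int(m["site"]), m["mut"]))
    return subs
```

With these assumptions, `parse_substitutions("A_D614G;A_L5F")` yields two records, the first with chain `A`, reference `D`, site `614`, and mutant `G`. The per-substitution columns (`is_in_PDB`, `conservative`, `layer`) use the same semicolon separator and can be split the same way.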

 

Table: Substitutions: All substitutions identified for a protein/complex

  • chain: The chain identifier of the protein in the PDB file in which the substitution is present
  • site: The residue number at which the substitution is present
  • reference: The one-letter amino acid name of the residue in the reference sequence
  • mutant: The one-letter amino acid name of the residue in a USV
  • conservative: Indication of whether a substitution is conservative
  • in_pdb: Whether the substitution site is present in the PDB model used to generate the USV structure
  • layer: Identification of the burial layer (surface, boundary, or core) of a substitution in the reference structure
  • date: Date of first collection of a strain with the substitution reported to GISAID
  • location: The country in which the first strain including the substitution was collected
  • gisaid_count: The number of sequences in the GISAID database including the substitution
  • usv_count: The number of identified USVs including the substitution
  • ddG: The soft-hard ΔΔGApp of the USV that includes only the substitution, left empty if no single-substitution USV was identified with the substitution
  • single: Indication of whether the substitution was present in a single-substitution USV
  • multiple: Indication of whether the substitution was present in a USV with multiple substitutions
  • associates: List of all other substitutions that were identified in a USV that included the substitution
  • strains: List of all USV-representative GISAID strains that included the substitution, with the single-substitution USV strain listed first if one was available
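
As a sketch of how a Substitutions table might be read, the snippet below filters rows by burial layer and skips rows whose `ddG` is empty (no single-substitution USV). It assumes the CSV header uses the column names listed above and that `layer` holds the literal strings `surface`, `boundary`, or `core`. The function name is hypothetical, and the sample rows are made up for illustration and are not values from the dataset.

```python
import csv
import io


def single_substitution_energies(csv_file, layer):
    """Yield (chain, site, reference, mutant, ddG) for rows in `layer` that have a ddG value."""
    for row in csv.DictReader(csv_file):
        if row["layer"] != layer or not row["ddG"].strip():
            continue
        yield row["chain"], int(row["site"]), row["reference"], row["mutant"], float(row["ddG"])


# Made-up rows for illustration only; not values from the dataset.
sample = io.StringIO(
    "chain,site,reference,mutant,layer,ddG\n"
    "A,10,L,F,core,1.5\n"
    "A,22,S,T,surface,\n"
    "A,31,K,R,surface,-0.2\n"
)
rows = list(single_substitution_energies(sample, "surface"))
```

For a real file, pass an open file handle (for example, one opened with `newline=""` as the `csv` module recommends) in place of `sample`.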

 

Table: Gaussian Fit Statistics: Fitted models for the energies of all USVs either together (ALL) or by study protein.

  • fit: The number of Gaussian curves in the fitted energy model 
  • protein: The protein/complex name
  • method: The modeling method used to calculate energy values
  • layer: The subset burial layer (surface, boundary, or core) of USVs for which the energy model was fitted, excluding all USVs with substitutions not in that layer
  • μ1: Mean of the first Gaussian in the fitted model
  • σ1: Variance of the first Gaussian in the fitted model
  • wt1: Weight of the first Gaussian in the fitted model
  • μ2: Mean of the second Gaussian in the fitted model
  • σ2: Variance of the second Gaussian in the fitted model
  • wt2: Weight of the second Gaussian in the fitted model
  • R2: R-squared value indicating the goodness of fit
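
The fitted parameters can be turned back into a curve with a short sketch like the one below. It assumes the fitted model is a weighted sum of normal densities and that R2 is the usual coefficient of determination; neither formula is stated in this record. Because the σ columns are described as variances, the hypothetical `sigma_is_variance` flag converts them to standard deviations before evaluation.

```python
import math


def gaussian_pdf(x, mu, sd):
    """Normal density with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bi_gaussian(x, mu1, s1, wt1, mu2, s2, wt2, sigma_is_variance=False):
    """Weighted sum of two normal densities (assumed form of the fitted model)."""
    if sigma_is_variance:
        s1, s2 = math.sqrt(s1), math.sqrt(s2)
    return wt1 * gaussian_pdf(x, mu1, s1) + wt2 * gaussian_pdf(x, mu2, s2)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For rows where `fit` is 1, the same function applies with `wt2` set to 0.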

 

Description of Computed Structural Models for Unique Sequence Variants for 29 SARS-CoV-2 Study Proteins.

USV Computed Structural Models. Computed structural models for all amino acid substituted USVs. We are providing the structural models of all study proteins modeled using the soft-hard modeling method (see Methods). Structural models are named according to the GISAID strain identification of the first strain in which the USV was identified, followed by an underscore-separated list of substitutions in the form [chain]_[sequence][site][substitution]. Atomic coordinates for each computed structural model are provided in the legacy Protein Data Bank format used by most molecular graphics software tools (see https://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html for detailed description).
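
Atomic coordinates in the legacy PDB format are stored in fixed-width columns, so they can be read without a dedicated library. The sketch below follows the column positions for ATOM/HETATM records in the format v3.3 specification linked above; the function name and the choice of returned fields are illustrative.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed columns of PDB format v3.3.

    Returns None for any other record type.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }
```

Applying this to every line of a model file and keeping the non-`None` results gives a list of atoms that can be compared against the reference structure, for instance at the substituted sites.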

Files

E-protein Substitutions.csv

Files (2.7 GB)
