There is a newer version of the record available.

Published November 28, 2020 | Version v2
Dataset Open

Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first six months of the COVID-19 pandemic -- Supplementary Tables and Models

  • 1. Rutgers University
  • 2. Grinnell College
  • 3. University of Notre Dame
  • 4. University of Maryland--Baltimore County
  • 5. Stevens Institute of Technology
  • 6. Frostburg State University
  • 7. Youngstown State University
  • 8. University of Central Florida
  • 9. New York City College of Technology
  • 10. Howard University
  • 11. Watchung Hills Regional High School
  • 12. Xavier University
  • 13. Hope College
  • 14. Ursinus College
  • 15. State University of New York--Oswego
  • 16. Roger Williams University
  • 17. Brandeis University
  • 18. University of Puerto Rico--Rio Piedras
  • 19. John Jay College
  • 20. Grand View University
  • 21. Rochester Institute of Technology

Description

Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first six months of the COVID-19 pandemic

https://iqb.rutgers.edu/covid-19_proteome_evolution

 

Legends for Supplementary Figures for 29 SARS-CoV-2 Study Proteins

Separate analysis of protein changes was performed for each study protein and complex. Description below applies to all figures.

A: Greyscale representation of the observed frequencies of all substitutions found in USVs (unique sequence variants), from Native Residue (i.e., the amino acid type in the reference protein sequence) to Substituted Residue, for a given protein/complex. Red boxes enclose conservative substitutions among hydrophobic, uncharged polar, positively charged, and negatively charged amino acids, in that order from upper left to lower right. Cysteine, glycine, and proline are excluded from these groupings.
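The conservative-substitution grouping in panel A can be sketched as a simple class lookup. The legend names the four classes but does not list their members, so the residue assignments below are an assumption based on common textbook groupings and may differ from the ones used for the figures.

```python
# Assumed class membership (not taken from the dataset). Cysteine (C),
# glycine (G), and proline (P) belong to no class, as stated in the legend.
GROUPS = {
    "hydrophobic": set("AVILMFWY"),
    "uncharged_polar": set("STNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
}

def residue_group(aa):
    """Return the assumed class name for a one-letter residue code, or None."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    return None

def is_conservative(native, substituted):
    """True when both residues fall in the same assumed class."""
    group = residue_group(native)
    return group is not None and group == residue_group(substituted)

print(is_conservative("D", "E"))  # both negatively charged
print(is_conservative("P", "L"))  # proline is in no class
```

Under these assumed classes, any substitution involving C, G, or P is reported as non-conservative.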

B-D: Normalized Frequency histograms of ΔΔGApp calculated for all USVs of a given protein/complex. Energies were calculated using three methods, referred to as hard-hard (B), soft-hard (C), and soft-soft (D) after the scoring functions used for sidechain rotamer optimization and gradient-based energy minimization, respectively (see Methods). All energy values described in the text were obtained using the soft-hard method. Each energy histogram is overlaid with a fitted bi-Gaussian curve (solid red line) and fitted single Gaussian curves for the subsets of USVs with surface (green), boundary layer (yellow), or core (blue) substitutions. USVs with multiple substitutions were included in single Gaussian fitting when all substitutions mapped to the same region of the study protein. The data used for fitting include the energies of all unique protein models produced by a given method, excluding extreme outliers with energy values more than 3 standard deviations from the central mean.
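The outlier exclusion and histogram normalization described for panels B-D might look like the minimal sketch below. It assumes a single filtering pass with the population standard deviation, and it assumes "normalized frequency" means the fraction of values per bin; the function names and the bin width are hypothetical.

```python
import math
import statistics

def drop_outliers(energies, n_sigma=3.0):
    """Keep values within n_sigma standard deviations of the mean (one pass)."""
    mean = statistics.mean(energies)
    sd = statistics.pstdev(energies)
    return [e for e in energies if abs(e - mean) <= n_sigma * sd]

def normalized_histogram(values, bin_width=1.0):
    """Map each bin's left edge to the fraction of values falling in that bin."""
    counts = {}
    for v in values:
        left = math.floor(v / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    total = len(values)
    return {left: n / total for left, n in sorted(counts.items())}

# Made-up energies: twenty values at zero and one extreme value.
energies = [0.0] * 20 + [100.0]
kept = drop_outliers(energies)
print(len(energies), len(kept))
print(normalized_histogram(kept))
```

A curve-fitting routine would then be applied to the filtered values; that step is omitted here because the fitting procedure is not specified in this record.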

E-G: USV Count histograms show, for each site, the number of USVs in the full set for a given protein that include a substitution at that site. Sites are separated by burial layer. Substitutions at sites that are absent from the available crystal structures are excluded from the histograms. In most cases, only a single protein is analyzed, and only panel E is included. For complexes, a separate histogram is provided for each protein in the complex: for methyltransferase nsp10-nsp16, E is nsp10 and F is nsp16; for RDRP nsp12-nsp7-nsp8, E is nsp7, F is nsp8, and G is nsp12.
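The per-site tallies behind panels E-G can be sketched as nested counters keyed by chain and burial layer. The input layout (one list of (chain, site, layer) tuples per USV, already restricted to sites present in the structure) is an assumption made for illustration, and the example values are made up.

```python
from collections import Counter, defaultdict

def usv_counts_by_site(usvs):
    """Count, per chain and burial layer, how many USVs substitute each site.

    usvs: iterable of USVs, each a list of (chain, site, layer) tuples.
    Returns a nested mapping: chain -> layer -> Counter({site: n_usvs}).
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for substitutions in usvs:
        # The set guards against counting one USV twice at the same site.
        for chain, site, layer in set(substitutions):
            counts[chain][layer][site] += 1
    return counts

# Hypothetical input: three USVs across two chains.
usvs = [
    [("A", 10, "surface")],
    [("A", 10, "surface"), ("A", 50, "core")],
    [("B", 7, "boundary")],
]
counts = usv_counts_by_site(usvs)
print(counts["A"]["surface"][10])
```

Each chain's counters correspond to one panel, with one histogram series per burial layer.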

 

Legends for Supplementary Tables for 29 SARS-CoV-2 Study Proteins

Table: USVs: All identified USVs for a protein/complex. Columns are:

  • date: Date of first collection of a strain with the USV reported to GISAID
  • gisaid_count: The number of sequences in the GISAID database that include the USV
  • id: The GISAID strain identification for the first collected instance of the USV
  • location: The country in which the first strain including the USV was collected
  • substitutions: All substitutions in the USV, in the form [chain]_[sequence][site][substitution], with multiple substitutions separated by semicolons
  • is_in_PDB: Whether each substitution is present in the PDB model used to generate the USV structure, with multiple substitutions separated by semicolons
  • multiple: Whether more than one amino acid substitution is present in the USV
  • conservative: Whether each substitution is conservative, with multiple substitutions separated by semicolons
  • layer: Identification of the burial layer (surface, boundary, or core) of a substitution in the reference structure, with multiple substitutions separated by semicolons and substitutions absent from the PDB excluded
  • sh_rmsd: The RMSD of the USV to the reference structure when modeled using the soft-hard method
  • sh_ddG: The ΔΔGApp of the USV when modeled using the soft-hard method
  • hh_rmsd: The RMSD of the USV to the reference structure when modeled using the hard-hard method
  • hh_ddG: The ΔΔGApp of the USV when modeled using the hard-hard method
  • ss_rmsd: The RMSD of the USV to the reference structure when modeled using the soft-soft method
  • ss_ddG: The ΔΔGApp of the USV when modeled using the soft-soft method
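The semicolon-separated columns of the USVs table can be unpacked per substitution, as in the sketch below. Several details are assumptions: that [sequence] is the one-letter reference residue, that is_in_PDB is written as True/False, and that the layer column simply skips substitutions absent from the PDB model, so it has to be realigned using is_in_PDB. The example row is hypothetical.

```python
import re

# Assumed token shape: chain id, underscore, reference residue, site, mutant.
SUB_RE = re.compile(r"^(?P<chain>[^_]+)_(?P<ref>[A-Z])(?P<site>\d+)(?P<mut>[A-Z])$")

def parse_substitution(token):
    """Split one [chain]_[sequence][site][substitution] token into fields."""
    m = SUB_RE.match(token.strip())
    if m is None:
        raise ValueError(f"unrecognized substitution: {token!r}")
    return {"chain": m["chain"], "ref": m["ref"],
            "site": int(m["site"]), "mut": m["mut"]}

def parse_usv_row(row):
    """Return one dict per substitution, with in_pdb and layer attached."""
    subs = [parse_substitution(t) for t in row["substitutions"].split(";")]
    in_pdb = [f.strip().lower() == "true" for f in row["is_in_PDB"].split(";")]
    # Layers are listed only for substitutions present in the PDB model.
    layers = iter([s.strip() for s in row["layer"].split(";") if s.strip()])
    for sub, present in zip(subs, in_pdb):
        sub["in_pdb"] = present
        sub["layer"] = next(layers, None) if present else None
    return subs

# Hypothetical row; the values are made up for illustration.
row = {"substitutions": "A_A10V;B_K20R", "is_in_PDB": "False;True", "layer": "core"}
parsed = parse_usv_row(row)
print(parsed)
```

Here the single layer value is assigned to the second substitution, because the first one is flagged as absent from the PDB model.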

 

Table: Substitutions: All substitutions identified for a protein/complex

  • chain: The chain identifier of the protein in the PDB file in which the substitution is present
  • site: The residue number at which the substitution is present
  • reference: The one-letter amino acid name of the residue in the reference sequence
  • mutant: The one-letter amino acid name of the residue in a USV
  • conservative: Indication of whether a substitution is conservative
  • in_pdb: Whether the substitution site is present in the PDB model used to generate the USV structure
  • layer: Identification of the burial layer (surface, boundary, or core) of a substitution in the reference structure
  • date: Date of first collection of a strain with the substitution reported to GISAID
  • location: The country in which the first strain including the substitution was collected
  • gisaid_count: The number of sequences in the GISAID database including the substitution
  • usv_count: The number of identified USVs including the substitution
  • ddG: The soft-hard ΔΔGApp of the USV that includes only the substitution, left empty if no single-substitution USV was identified with the substitution
  • single: Indication of whether the substitution was present in a single-substitution USV
  • multiple: Indication of whether the substitution was present in a USV with multiple substitutions
  • associates: List of all other substitutions that were identified in a USV that included the substitution
  • strains: List of all USV-representative GISAID strains that included the substitution, with the single-substitution USV strain listed first if one was available
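A Substitutions table could be loaded as in the following sketch, assuming it is distributed as a CSV file whose header uses the column names listed above. The empty ddG field, which marks substitutions with no single-substitution USV, is converted to None. The sample rows include only a subset of the columns and their values are made up.

```python
import csv
import io

def load_substitutions(handle):
    """Read substitution rows, converting numeric fields and empty ddG values."""
    rows = []
    for row in csv.DictReader(handle):
        row["site"] = int(row["site"])
        row["usv_count"] = int(row["usv_count"])
        # ddG is left empty when no single-substitution USV was identified.
        row["ddG"] = float(row["ddG"]) if row["ddG"].strip() else None
        rows.append(row)
    return rows

# Hypothetical in-memory sample standing in for a downloaded file.
sample = io.StringIO(
    "chain,site,reference,mutant,usv_count,ddG\n"
    "A,10,A,V,12,-0.8\n"
    "A,20,K,R,3,\n"
)
rows = load_substitutions(sample)
with_energy = [r for r in rows if r["ddG"] is not None]
print(len(rows), len(with_energy))
```

For a real file, the same function can be called on a handle opened with open(path, newline="").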

 

Table: Gaussian Fit Statistics: Fitted models for the energies of all USVs either together (ALL) or by study protein.

  • fit: The number of Gaussian curves in the fitted energy model 
  • protein: The protein/complex name
  • method: The modeling method used to calculate energy values
  • layer: The subset burial layer (surface, boundary, or core) of USVs for which the energy model was fitted, excluding all USVs with substitutions not in that layer
  • μ1: Mean of the first Gaussian in the fitted model
  • σ1: Variance of the first Gaussian in the fitted model
  • wt1: Weight of the first Gaussian in the fitted model
  • μ2: Mean of the second Gaussian in the fitted model
  • σ2: Variance of the second Gaussian in the fitted model
  • wt2: Weight of the second Gaussian in the fitted model
  • R2: R-squared value indicating the goodness of fit
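The fitted parameters can be turned back into a curve as sketched below, assuming the two-component model is a weighted sum of normal densities. The table describes σ1 and σ2 as variances, while the symbol σ conventionally denotes a standard deviation, so the hypothetical spread_is_variance flag lets either reading be used; checking a reconstructed curve against the published figures would settle which applies.

```python
import math

def gaussian_pdf(x, mu, sd):
    """Normal density with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bi_gaussian(x, mu1, s1, wt1, mu2, s2, wt2, spread_is_variance=True):
    """Weighted sum of two normal densities (assumed form of the fitted model)."""
    if spread_is_variance:
        # Convert variances to standard deviations before evaluating.
        s1, s2 = math.sqrt(s1), math.sqrt(s2)
    return wt1 * gaussian_pdf(x, mu1, s1) + wt2 * gaussian_pdf(x, mu2, s2)

# Made-up parameters, for illustration only.
print(bi_gaussian(0.0, mu1=0.0, s1=1.0, wt1=0.7, mu2=3.0, s2=4.0, wt2=0.3))
```

For a single Gaussian fit, the second weight can be set to zero with any positive second spread.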

 

Description of Computed Structural Models for Unique Sequence Variants for 29 SARS-CoV-2 Study Proteins

USV Computed Structural Models. Computed structural models are provided for all amino acid substituted USVs of all study proteins, modeled using the soft-hard method (see Methods). Structural models are named according to the GISAID strain identification of the first strain in which the USV was identified, followed by an underscore-separated list of substitutions in the form [chain]_[sequence][site][substitution]. Atomic coordinates for each computed structural model are provided in the legacy Protein Data Bank format used by most molecular graphics software tools (see https://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html for a detailed description).
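The naming convention and file format above can be handled with the standard library alone, as in this sketch. It assumes a .pdb extension, single-character chain identifiers, and a one-letter reference residue in each substitution token; the strain identifier in the example is hypothetical. Because the strain identifier may itself contain underscores, substitution tokens are peeled off from the right. The ATOM column positions follow the fixed-width layout of the PDB format v3.3.

```python
import re

# A trailing "_<chain>_<ref><site><mut>" token (assumed shape).
TOKEN_RE = re.compile(r"_([A-Za-z0-9])_([A-Z])(\d+)([A-Z])$")

def parse_model_name(filename):
    """Split a model file name into (strain id, list of substitutions)."""
    stem = filename[:-4] if filename.lower().endswith(".pdb") else filename
    subs = []
    while True:
        m = TOKEN_RE.search(stem)
        if m is None:
            break
        subs.append((m.group(1), m.group(2), int(m.group(3)), m.group(4)))
        stem = stem[:m.start()]
    subs.reverse()  # restore left-to-right order
    return stem, subs

def ca_coordinates(lines):
    """Map (chain, residue number) to alpha-carbon (x, y, z) from ATOM records."""
    coords = {}
    for line in lines:
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords

print(parse_model_name("STRAIN_ID_000_A_A10V_B_K20R.pdb"))
```

To read a downloaded model, pass an open file handle to ca_coordinates; alpha-carbon coordinates keyed this way can be compared between a USV model and the reference structure.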

Files

Files (2.8 GB)

Additional details

Related works

Is referenced by
Preprint: 10.1101/2020.12.01.406637 (DOI)