Published May 8, 2020 | Version 0.1
Dataset Open

All-atom simulation snapshots and contact map analysis scripts for SARS-CoV-2002 and SARS-CoV-2 spike proteins with and without the ACE2 enzyme

  • 1. Institute of Fundamental Technological Research, Polish Academy of Sciences, Pawinskiego 5B, 02-106 Warsaw, Poland
  • 2. Institute of Physics, Polish Academy of Sciences, al. Lotnikow 32/46, 02-668 Warsaw, Poland
  • 3. Department of Chemistry, The College of New Jersey, 2000 Pennington Road, Ewing, NJ 08628, United States
  • 4. Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia

Description

The dataset contains a total of 40 snapshots from the four simulated systems (10 snapshots per system: two per replica x 5 replicas per system):

  1. SARS-CoV-2002 spike protein without ACE2
  2. SARS-CoV-2 spike protein without ACE2
  3. SARS-CoV-2002 spike protein with ACE2
  4. SARS-CoV-2 spike protein with ACE2
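The snapshot count above follows from simple multiplication. The short Python sketch below enumerates every (system, replica, snapshot) combination to confirm the total; the replica and snapshot indices are illustrative labels only, not the dataset's actual file names.

```python
from itertools import product

# System labels follow the list above; replica/snapshot indices are
# hypothetical labels (the dataset's real file names may differ).
SYSTEMS = [
    "SARS-CoV-2002 spike without ACE2",
    "SARS-CoV-2 spike without ACE2",
    "SARS-CoV-2002 spike with ACE2",
    "SARS-CoV-2 spike with ACE2",
]
REPLICAS_PER_SYSTEM = 5
SNAPSHOTS_PER_REPLICA = 2


def enumerate_snapshots():
    """Return every (system, replica, snapshot) tuple, 1-based indices."""
    return list(
        product(
            SYSTEMS,
            range(1, REPLICAS_PER_SYSTEM + 1),
            range(1, SNAPSHOTS_PER_REPLICA + 1),
        )
    )


snapshots = enumerate_snapshots()
per_system = REPLICAS_PER_SYSTEM * SNAPSHOTS_PER_REPLICA
print(per_system, len(snapshots))  # 10 40
```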

Molecular dynamics simulations (320 ns per trajectory) were performed with the Amber ff14SB force field and the Amber18 package on the NSF-funded (OAC-1826915, OAC-1828163) ELSA high-performance computing cluster at The College of New Jersey, using the following methodology:

All-atom simulations were carried out with Amber18 (ambermd.org); the protein, ions, and water were modeled with the included ff14SB and TIP3P parameter sets. Energy minimization used the CPU version of pmemd, and all later simulation stages used GPU pmemd. CoV2 (SARS-CoV-2) and CoV1 (SARS-CoV-2002) systems with one RBD up, with and without ACE2, were solvated in 12 angstrom water shells. Cysteine residues identified in the initial models as forming disulfide bonds were bonded using tLeap. All simulations used 0.150 M NaCl. Hydrogen mass repartitioning was applied only to the protein to enable a 4 fs timestep (https://pubs.acs.org/doi/abs/10.1021/ct5010406). The SHAKE algorithm was applied to bonds involving hydrogen, and a real-space cutoff of 8 angstroms was used. Periodic boundary conditions were applied, and PME was used for long-range electrostatics.

Minimization consisted of 2000 steps of steepest descent followed by 3000 steps of conjugate gradient. Heating was performed in two stages: (1) NVT heating from 0 K to 100 K (50 ps), and (2) NPT heating from 100 K to 300 K (100 ps). C-alpha atoms were restrained with 10 kcal mol-1 angstrom-2 during minimization and heating. During 6 ns of equilibration at 300 K, the C-alpha restraints were gradually reduced from 10 kcal mol-1 angstrom-2 to 0.1 kcal mol-1 angstrom-2.

Finally, the restraints were released and 320 ns unrestrained production simulations were carried out for the CoV2 and CoV1 systems, starting from the final equilibrated snapshots; five copies of each system were simulated. Because unrestrained systems can rotate freely, we monitored the simulations for close contacts with periodic images. In one copy of the CoV1 simulation without ACE2 (one RBD up), a few contacts close to 8 angstroms occurred near the end of the 320 ns between the RBD and a different subdomain of the spike complex in a periodic image. However, this did not influence the analyzed structural properties, as verified by comparing results across simulations.
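The periodic-image check described above can be sketched as follows. This is a minimal illustration, not the authors' monitoring procedure: it assumes an orthorhombic box (the box shape is not stated here) and takes coordinates for two atom selections, for example the RBD and the rest of the spike complex, as arrays. It returns the shortest distance from the first selection to any neighboring periodic image of the second, which can then be compared with the 8 angstrom real-space cutoff.

```python
import itertools

import numpy as np


def min_image_contact(sel_a, sel_b, box):
    """Shortest distance (angstroms) from atoms in sel_a to periodic images of sel_b.

    sel_a: (N, 3) coordinates, e.g. the RBD atoms.
    sel_b: (M, 3) coordinates, e.g. the rest of the spike complex.
    box:   (3,) edge lengths of an orthorhombic box (assumed box shape).
    Only the 26 neighboring images are checked; the primary cell
    (zero shift) is skipped, so ordinary intra-protein contacts are ignored.
    """
    a = np.asarray(sel_a, dtype=float)
    b = np.asarray(sel_b, dtype=float)
    edges = np.asarray(box, dtype=float)
    best = np.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue
        image = b + np.asarray(shift, dtype=float) * edges
        # All pairwise distances via broadcasting: (N, 1, 3) - (1, M, 3).
        dist = np.linalg.norm(a[:, None, :] - image[None, :, :], axis=-1)
        best = min(best, float(dist.min()))
    return best


# Toy example: two atoms near opposite faces of a 100 A cubic box.
d = min_image_contact([[1.0, 50.0, 50.0]], [[95.0, 50.0, 50.0]], [100.0] * 3)
print(d, d < 8.0)  # 6.0 True
```

For systems of several hundred thousand atoms, the full pairwise array is large, so in practice one would restrict the selections (for example to C-alpha atoms) or process them in chunks.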
The Monte Carlo barostat was used to maintain a pressure of 1 atm, and the Langevin thermostat (collision frequency 1 ps-1) maintained the temperature at 300 K, as implemented in Amber18. In aggregate, nearly 7 microseconds of simulation was carried out for this work, on systems ranging from 396,147 to 879,100 atoms; the production runs alone account for 6.4 microseconds (4 systems x 5 copies x 320 ns).
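To make the parameters above concrete, the sketch below writes pmemd `&cntrl` namelists for the minimization and production stages and checks the timestep arithmetic (320 ns at 4 fs is 80,000,000 steps). It is a hypothetical reconstruction, not the authors' input files: the heating and equilibration stages are omitted, the restraint mask `'@CA'` and random seed `ig=-1` are assumptions, and hydrogen mass repartitioning does not appear because it is applied to the topology file rather than to the mdin.

```python
# Hypothetical reconstruction of pmemd input (mdin) files from the parameters
# described above. These are NOT the authors' actual input files.


def minimization_mdin(sd_steps=2000, cg_steps=3000, ca_weight=10.0):
    """Restrained minimization: ncyc steepest-descent steps, then conjugate
    gradient until maxcyc total steps (Amber's default ntmin=1 behavior)."""
    return (
        "Minimization, C-alpha restraints\n"
        " &cntrl\n"
        f"  imin=1, maxcyc={sd_steps + cg_steps}, ncyc={sd_steps},\n"
        "  ntb=1, cut=8.0,\n"
        f"  ntr=1, restraint_wt={ca_weight}, restraintmask='@CA',\n"
        " /\n"
    )


def production_mdin(length_ns=320, timestep_fs=4, pressure_atm=1.0):
    """Unrestrained NPT production with Langevin thermostat and MC barostat."""
    nstlim = length_ns * 1_000_000 // timestep_fs  # 1 ns = 10**6 fs
    dt_ps = timestep_fs / 1000                     # Amber's dt is in ps
    pres0_bar = round(pressure_atm * 1.01325, 5)   # pres0 is in bar
    return (
        "Production, no restraints\n"
        " &cntrl\n"
        "  imin=0, irest=1, ntx=5,\n"                   # restart coords + velocities
        f"  nstlim={nstlim}, dt={dt_ps},\n"
        "  ntc=2, ntf=2,\n"                             # SHAKE on bonds to hydrogen
        "  cut=8.0, ntb=2, ntp=1,\n"                    # 8 A cutoff, isotropic NPT (PME by default)
        f"  barostat=2, pres0={pres0_bar},\n"           # Monte Carlo barostat
        "  ntt=3, gamma_ln=1.0, temp0=300.0, ig=-1,\n"  # Langevin, 1 ps^-1, 300 K
        " /\n"
    )


# Aggregate production time: 4 systems x 5 copies x 320 ns.
total_production_ns = 4 * 5 * 320

print(production_mdin())
print(total_production_ns / 1000, "microseconds")  # 6.4 microseconds
```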
For further details on the trajectories, please contact Joseph Baker (bakerj@tcnj.edu).

The contact map analysis scripts (contactMaps_Analysis.tar.gz) implement the following workflow:

contactmap      --> source files for the contact_map executable
process_nc.sh   --> converts the raw all-atom simulation output into numbered PDB files and computes the contact maps
frequency.lua   --> reads a set of PDB files and outputs the frequency count for each contact
consensus.fasta --> sequence alignment of SARS-CoV-2 (COVID-19) and SARS-CoV-2002, generated with Chimera
consensus.lua   --> reads the previously generated data and computes, among other things, the contact frequency per residue
consensus.sh    --> supplies the input information to consensus.lua
consensus.gp    --> gnuplot script that plots the figures
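The frequency step performed by frequency.lua and consensus.lua can be illustrated with a short Python sketch. It is not a port of those Lua scripts: it assumes each snapshot's contact map has already been reduced to a list of residue-pair labels, reports each contact's frequency as the fraction of snapshots in which it occurs, and uses one possible per-residue summary (the sum of the frequencies of contacts involving that residue). The residue labels in the example are hypothetical toy data.

```python
from collections import Counter


def contact_frequencies(snapshots):
    """Fraction of snapshots in which each residue-residue contact occurs.

    snapshots: list with one entry per snapshot, each an iterable of
    (residue, residue) pairs. (i, j) and (j, i) are treated as one contact.
    """
    if not snapshots:
        return {}
    counts = Counter()
    for contacts in snapshots:
        # Use a set so a contact is counted at most once per snapshot.
        counts.update({tuple(sorted(pair)) for pair in contacts})
    return {pair: n / len(snapshots) for pair, n in counts.items()}


def per_residue_frequency(freqs):
    """Sum of the frequencies of all contacts that involve each residue."""
    per_res = Counter()
    for (res_i, res_j), f in freqs.items():
        per_res[res_i] += f
        per_res[res_j] += f
    return dict(per_res)


# Hypothetical chain:residue labels for four snapshots.
snaps = [
    [("A:10", "B:25"), ("A:12", "B:30")],
    [("B:25", "A:10")],
    [("A:10", "B:25"), ("A:15", "B:40")],
    [],
]
freqs = contact_frequencies(snaps)
print(freqs[("A:10", "B:25")])               # 0.75
print(per_residue_frequency(freqs)["A:12"])  # 0.25
```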

This dataset and the code are part of a tripartite collaboration between:

  • The Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland (supported by the National Science Centre, Poland, under grant No. 2017/26/D/NZ1/0046)
  • Department of Chemistry, The College of New Jersey, New Jersey, United States (supported by the National Science Foundation under grant numbers OAC-1826915 and OAC-1828163).
  • Jožef Stefan Institute, Ljubljana, Slovenia (supported by the Slovenian Research Agency, funding No. P1-0055).

Files (41.3 MB)

  • 14.9 kB, md5:944d5edb02918224661d936775cb7272
  • 41.3 MB, md5:ad08309d2be91cac1d3492cc6fb1ccc5

Additional details

Related works