Published April 21, 2022 | Version v1
Preprint Open

Single-chain CG Polymers in Simulate Time-integrated Coarse-grained Molecular Dynamics with Geometric Machine Learning

Description

This is the preprocessed single-chain coarse-grained polymer dataset described in the paper "Simulate Time-integrated Coarse-grained Molecular Dynamics with Geometric Machine Learning".

Paper: https://arxiv.org/abs/2204.10348

Code: https://github.com/kyonofx/mlcgmd/

Website: https://xiangfu.co/mlcgmd

Video: https://youtu.be/l3aGVjQezsc

Dataset description:

We preprocessed and down-sampled the raw trajectories to reduce the dataset size. The uploaded dataset consists of three archives:

  • polymer_train.tar.gz contains MD trajectories for 100 training class-I polymers, each 50k tau long and recorded every 5 tau (10k recorded steps).
  • polymer_test_5M.tar.gz contains MD trajectories for 40 testing class-II polymers, each 4.95 million tau long and recorded every 500 tau (9900 recorded steps). This data is used for final evaluation.
  • polymer_test.tar.gz contains MD trajectories for 40 testing class-II polymers, each 5k tau long and recorded every 5 tau (1k recorded steps). This data is used for initializing the learned simulator at test time.
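
The step counts quoted for each archive follow from dividing the trajectory length by the recording interval. A minimal Python sketch of that arithmetic is given here as a sanity check; the dictionary layout and the function name `recorded_steps` are illustrative assumptions and are not part of the dataset or the mlcgmd code.

```python
# Trajectory length and recording interval (both in tau) for each archive,
# copied from the dataset description. The dict layout is illustrative only.
ARCHIVES = {
    "polymer_train.tar.gz": (50_000, 5),
    "polymer_test_5M.tar.gz": (4_950_000, 500),
    "polymer_test.tar.gz": (5_000, 5),
}


def recorded_steps(length_tau: int, interval_tau: int) -> int:
    """Return the number of recorded steps in one trajectory."""
    if interval_tau <= 0:
        raise ValueError("recording interval must be positive")
    if length_tau % interval_tau != 0:
        raise ValueError("length is not a whole multiple of the interval")
    return length_tau // interval_tau


# Recorded steps per trajectory, keyed by archive name.
STEPS = {name: recorded_steps(*spec) for name, spec in ARCHIVES.items()}

if __name__ == "__main__":
    for name, count in STEPS.items():
        print(f"{name}: {count} recorded steps per trajectory")
```

Whether the initial frame is stored in addition to these steps is not stated in the description, so the actual array lengths in the archives may differ by one.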

Paper Abstract:

Molecular dynamics (MD) simulation is the workhorse of various scientific domains but is limited by high computational cost. Learning-based force fields have made major progress in accelerating ab-initio MD simulation but are still not fast enough for many real-world applications that require long-time MD simulation. In this paper, we adopt a different machine learning approach where we coarse-grain a physical system using graph clustering, and model the system evolution with a very large time-integration step using graph neural networks. A novel score-based GNN refinement module resolves the long-standing challenge of long-time simulation instability. Despite only trained with short MD trajectory data, our learned simulator can generalize to unseen novel systems and simulate for much longer than the training trajectories. Properties requiring 10-100 ns level long-time dynamics can be accurately recovered at several-orders-of-magnitude higher speed than classical force fields. We demonstrate the effectiveness of our method on two realistic complex systems: (1) single-chain coarse-grained polymers in implicit solvent; (2) multi-component Li-ion polymer electrolyte systems.

If you find this dataset useful, please consider citing the following in your paper:

@article{fu2022simulate,
  title={Simulate Time-integrated Coarse-grained Molecular Dynamics with Geometric Machine Learning},
  author={Fu, Xiang and Xie, Tian and Rebello, Nathan J and Olsen, Bradley D and Jaakkola, Tommi},
  journal={arXiv preprint arXiv:2204.10348},
  year={2022}
}

And:

@article{webb2020targeted,
  title={Targeted sequence design within the coarse-grained polymer genome},
  author={Webb, Michael A and Jackson, Nicholas E and Gil, Phwey S and de Pablo, Juan J},
  journal={Science Advances},
  volume={6},
  number={43},
  pages={eabc6216},
  year={2020},
  publisher={American Association for the Advancement of Science}
}

Files (13.5 GB)

MD5 checksum                          Size
md5:644ea9f1efbd1df2e355f054cfdab7dd  371.0 MB
md5:8c5c003fbcfd48edc75a20ca6cb2a3e3  3.7 GB
md5:cc8279c05a75b267ed000b8b7b2c3e96  9.5 GB
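
The MD5 checksums above can be used to confirm that a download completed without corruption. The following is a minimal Python sketch using only the standard library. The listing does not say which checksum belongs to which archive, so the sketch only checks whether a file's digest is one of the three listed values; the function names `md5_of_file` and `matches_listing` are illustrative assumptions.

```python
import hashlib

# MD5 digests copied from the file listing. The listing does not pair them
# with file names, so they are kept as an unordered set here.
LISTED_MD5 = {
    "644ea9f1efbd1df2e355f054cfdab7dd",
    "8c5c003fbcfd48edc75a20ca6cb2a3e3",
    "cc8279c05a75b267ed000b8b7b2c3e96",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 digest is one of the listed values."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listing("polymer_test.tar.gz")` should return `True` for an intact download, assuming the archive was saved under that name in the current directory.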

Additional details

Related works

Is compiled by
Software: https://github.com/kyonofx/mlcgmd/ (URL)
Is described by
Preprint: https://arxiv.org/abs/2204.10348 (URL)