Preprint Open Access

Single-chain CG Polymers in "Simulate Time-integrated Coarse-grained Molecular Dynamics with Geometric Machine Learning"

Fu, Xiang; Xie, Tian; Rebello, Nathan; Olsen, Bradley; Jaakkola, Tommi

The preprocessed single-chain coarse-grained (CG) polymer dataset described in the paper "Simulate Time-integrated Coarse-grained Molecular Dynamics with Geometric Machine Learning".





Dataset description:

We preprocessed and down-sampled the raw simulation data to reduce its very large size. The uploaded dataset consists of three archives:

  • polymer_train.tar.gz contains MD trajectories for 100 class-I training polymers, each 50k tau long and recorded every 5 tau (10k frames).
  • polymer_test_5M.tar.gz contains MD trajectories for 40 class-II test polymers, each 4.95 million tau long and recorded every 500 tau (9,900 frames). This data is used for final evaluation.
  • polymer_test.tar.gz contains MD trajectories for 40 class-II test polymers, each 5k tau long and recorded every 5 tau (1k frames). This data is used to initialize the learned simulator at test time.
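Each frame count in parentheses is simply the trajectory length divided by the recording interval. The short Python sketch below encodes the three archives as described in the list, recomputes those counts, builds a time axis per split, and peeks inside an archive without extracting it. The `Split` class and the `list_members` helper are hypothetical names introduced for illustration. The file layout inside each archive is not documented on this page, so the sketch only lists member names, and it assumes the first recorded frame sits at t = 0.

```python
import itertools
import tarfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One archive, with the settings stated in the dataset description."""
    archive: str
    n_polymers: int
    length_tau: float    # simulated time per polymer, in LJ time units (tau)
    interval_tau: float  # time between recorded snapshots, in tau

    @property
    def n_frames(self) -> int:
        # Recorded snapshots per polymer = trajectory length / recording interval.
        return round(self.length_tau / self.interval_tau)

    def times(self) -> list[float]:
        # Time stamp (in tau) of each frame; ASSUMES frame 0 is recorded at t = 0.
        return [i * self.interval_tau for i in range(self.n_frames)]


SPLITS = [
    Split("polymer_train.tar.gz", 100, 50_000, 5),
    Split("polymer_test_5M.tar.gz", 40, 4_950_000, 500),
    Split("polymer_test.tar.gz", 40, 5_000, 5),
]


def list_members(path: str, limit: int = 10) -> list[str]:
    """Return the first `limit` member names of a .tar.gz without extracting it."""
    with tarfile.open(path, "r:gz") as tar:
        # Iterating a TarFile reads member headers lazily, so only the part of
        # the archive up to the requested members is decompressed.
        return [member.name for member in itertools.islice(tar, limit)]


if __name__ == "__main__":
    for s in SPLITS:
        print(f"{s.archive}: {s.n_polymers} polymers x {s.n_frames} frames")
```

Running it prints 10000, 9900, and 1000 frames per polymer for the three archives, matching the figures in the list. Stopping after a few members with `itertools.islice` avoids decompressing a multi-gigabyte archive just to inspect its layout.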

Paper Abstract:

Molecular dynamics (MD) simulation is the workhorse of various scientific domains but is limited by high computational cost. Learning-based force fields have made major progress in accelerating ab-initio MD simulation but are still not fast enough for many real-world applications that require long-time MD simulation. In this paper, we adopt a different machine learning approach: we coarse-grain a physical system using graph clustering and model the system evolution with a very large time-integration step using graph neural networks. A novel score-based GNN refinement module resolves the long-standing challenge of long-time simulation instability. Despite being trained only on short MD trajectory data, our learned simulator can generalize to unseen novel systems and simulate for much longer than the training trajectories. Properties that require long-time dynamics on the 10-100 ns scale can be accurately recovered at speeds several orders of magnitude higher than classical force fields. We demonstrate the effectiveness of our method on two realistic complex systems: (1) single-chain coarse-grained polymers in implicit solvent; (2) multi-component Li-ion polymer electrolyte systems.
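The abstract names two ingredients: coarse-graining a system by graph clustering, and learning its evolution over a very large time-integration step. The NumPy sketch below illustrates each in its simplest generic form, assuming trajectories are available as arrays of shape (frames, particles, 3). The function names `coarse_grain` and `strided_pairs`, the mass-weighted-centroid bead mapping, and the displacement-target formulation are illustrative assumptions, not the paper's implementation; the GNN and the score-based refinement module are not shown.

```python
import numpy as np


def coarse_grain(positions: np.ndarray, masses: np.ndarray,
                 assignment: np.ndarray) -> np.ndarray:
    """Map particle coordinates (N, 3) to bead coordinates (M, 3).

    assignment[i] is the bead index (0..M-1) of particle i, e.g. the output of
    any graph-clustering routine. Each bead is placed at the mass-weighted
    centroid of its particles; every bead must own at least one particle.
    """
    n_beads = int(assignment.max()) + 1
    bead_mass = np.bincount(assignment, weights=masses, minlength=n_beads)
    weighted_sum = np.zeros((n_beads, 3))
    # Unbuffered scatter-add: each row of masses * positions is added to its bead.
    np.add.at(weighted_sum, assignment, masses[:, None] * positions)
    return weighted_sum / bead_mass[:, None]


def strided_pairs(traj: np.ndarray, stride: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a trajectory (T, M, 3) into inputs and displacement targets.

    Frame t is paired with its displacement to frame t + stride, so a model
    fit to these pairs advances the system by `stride` frames per step.
    """
    if stride < 1:
        raise ValueError("stride must be a positive number of frames")
    inputs = traj[:-stride]
    displacements = traj[stride:] - traj[:-stride]
    return inputs, displacements


if __name__ == "__main__":
    pos = np.array([[0.0, 0, 0], [2, 0, 0], [10, 0, 0], [10, 4, 0]])
    beads = coarse_grain(pos, np.array([1.0, 1, 1, 3]), np.array([0, 0, 1, 1]))
    print(beads)  # bead 0 at (1, 0, 0), bead 1 at (10, 3, 0)
```

With the 5 tau recording interval of the training archive, `stride=100`, for example, would correspond to a 500 tau step per prediction; the step size actually used in the paper is not stated on this page.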

If you find this dataset useful, please consider citing the following papers in your work:

  • Fu, Xiang; Xie, Tian; Rebello, Nathan J; Olsen, Bradley D; Jaakkola, Tommi. "Simulate Time-integrated Coarse-grained Molecular Dynamics with Geometric Machine Learning." arXiv preprint arXiv:2204.10348.
  • Webb, Michael A; Jackson, Nicholas E; Gil, Phwey S; de Pablo, Juan J. "Targeted sequence design within the coarse-grained polymer genome." Science Advances. American Association for the Advancement of Science.


Files (13.5 GB in total): three archives of 371.0 MB, 3.7 GB, and 9.5 GB.