Preprint Open Access

Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations

Fu, Xiang; Wu, Zhenghao; Wang, Wujie; Xie, Tian; Keten, Sinan; Gomez-Bombarelli, Rafael; Jaakkola, Tommi


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Fu, Xiang</dc:creator>
  <dc:creator>Wu, Zhenghao</dc:creator>
  <dc:creator>Wang, Wujie</dc:creator>
  <dc:creator>Xie, Tian</dc:creator>
  <dc:creator>Keten, Sinan</dc:creator>
  <dc:creator>Gomez-Bombarelli, Rafael</dc:creator>
  <dc:creator>Jaakkola, Tommi</dc:creator>
  <dc:date>2022-10-13</dc:date>
  <dc:description>The preprocessed datasets described in the paper: "Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations." 

Paper: https://arxiv.org/abs/2210.07237

Code: https://github.com/kyonofx/MDsim/

Dataset Description:

The MD17 and LiPS datasets are adapted from previous work; their source data are available from the original works cited below. We include the source data for our alanine dipeptide dataset (alanine_dipeptide.npy) and water dataset (water.npy), along with preprocessed versions of all datasets used in the paper: MD17, water, alanine dipeptide, and LiPS (mdsim_data.tar.gz). Please refer to the paper for details on each dataset.

Paper Abstract:

Molecular dynamics (MD) simulation techniques are widely used for various natural science applications. Increasingly, machine learning (ML) force field (FF) models begin to replace ab-initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such techniques are primarily benchmarked by their force/energy prediction errors, even though the practical use case would be to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for ML MD simulation. We curate representative MD systems, including water, organic molecules, peptide, and materials, and design evaluation metrics corresponding to the scientific objectives of respective systems. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, along with offering directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate further work.

If you find this dataset useful, please consider citing our paper:

@article{fu2022forces,
      title={Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations}, 
      author={Xiang Fu and Zhenghao Wu and Wujie Wang and Tian Xie and Sinan Keten and Rafael Gomez-Bombarelli and Tommi Jaakkola},
      journal={arXiv preprint arXiv:2210.07237},
      year={2022},
}

For the MD17 dataset, please cite:

@article{chmiela2017machine,
  title={Machine learning of accurate energy-conserving molecular force fields},
  author={Chmiela, Stefan and Tkatchenko, Alexandre and Sauceda, Huziel E and Poltavsky, Igor and Sch{\"u}tt, Kristof T and M{\"u}ller, Klaus-Robert},
  journal={Science advances},
  volume={3},
  number={5},
  pages={e1603015},
  year={2017},
  publisher={American Association for the Advancement of Science}
}

For the LiPS dataset, please cite:

@article{batzner20223,
  title={E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials},
  author={Batzner, Simon and Musaelian, Albert and Sun, Lixin and Geiger, Mario and Mailoa, Jonathan P and Kornbluth, Mordechai and Molinari, Nicola and Smidt, Tess E and Kozinsky, Boris},
  journal={Nature communications},
  volume={13},
  number={1},
  pages={1--11},
  year={2022},
  publisher={Nature Publishing Group}
}

 </dc:description>
  <dc:identifier>https://zenodo.org/record/7196767</dc:identifier>
  <dc:identifier>10.5281/zenodo.7196767</dc:identifier>
  <dc:identifier>oai:zenodo.org:7196767</dc:identifier>
  <dc:relation>url:https://github.com/kyonofx/MDsim</dc:relation>
  <dc:relation>url:https://arxiv.org/abs/2210.07237</dc:relation>
  <dc:relation>doi:10.5281/zenodo.7196578</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>molecular dynamics</dc:subject>
  <dc:subject>machine learning force fields</dc:subject>
  <dc:subject>machine learning potentials</dc:subject>
  <dc:title>Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations</dc:title>
  <dc:type>info:eu-repo/semantics/preprint</dc:type>
  <dc:type>publication-preprint</dc:type>
</oai_dc:dc>
Statistic          All versions   This version
Views              457            442
Downloads          274            272
Data volume        943.6 GB       943.5 GB
Unique views       407            398
Unique downloads   180            178
