Published August 15, 2022 | Version v1
Dataset Open

Alanine dipeptide in an implicit solvent at 300K

  • 1. University of Cambridge
  • 2. InstaDeep
  • 3. Max Planck Institute for Intelligent Systems

Description

This dataset was introduced in the article Midgley et al., "Flow Annealed Importance Sampling Bootstrap", 2022.

It contains samples from the Boltzmann distribution of alanine dipeptide in an implicit solvent, generated with a replica exchange molecular dynamics (REMD) simulation. The ff96 force field with an OBC GBSA implicit solvent model was used.

The REMD simulation uses 21 replicas, starting at a temperature of 300 K and increasing in increments of 50 K. Replicas are exchanged every 200 iterations, and the state at each multiple of 1000 time steps is used as a sample. Many of these simulations were run in parallel with different seeds. Each system was equilibrated for \(2\times10^5\) iterations and then simulated for \(2\times10^6\) iterations.
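The simulation settings above imply a temperature ladder and a sample count per run, which the following sketch spells out. It assumes that samples are collected only during the \(2\times10^6\) production iterations and that each run contributes one sample per 1000 steps; the run-count estimate at the end additionally assumes that every such sample ends up in one of the splits. These assumptions are not stated in the description, and all variable names are illustrative.

```python
# Values taken from the dataset description.
N_REPLICAS = 21
T_MIN_K = 300
T_STEP_K = 50
EXCHANGE_INTERVAL = 200        # iterations between replica exchanges
SAMPLE_INTERVAL = 1000         # iterations between stored samples
PRODUCTION_STEPS = 2_000_000   # iterations after equilibration


def temperature_ladder(n=N_REPLICAS, t_min=T_MIN_K, step=T_STEP_K):
    """Linearly spaced replica temperatures in kelvin."""
    return [t_min + step * i for i in range(n)]


temperatures = temperature_ladder()

# Assumption: only the production phase yields samples, one per interval.
samples_per_run = PRODUCTION_STEPS // SAMPLE_INTERVAL
exchanges_per_run = PRODUCTION_STEPS // EXCHANGE_INTERVAL

# Assumption: every stored sample is used in the train, validation or test split.
total_samples = 10**6 + 10**6 + 10**7
estimated_runs = total_samples // samples_per_run

print("lowest / highest temperature (K):", temperatures[0], temperatures[-1])
print("samples per run:", samples_per_run)
print("exchange attempts per run:", exchanges_per_run)
print("estimated number of parallel runs:", estimated_runs)
```

Under these assumptions the ladder runs from 300 K to 1300 K, and the large number of independent runs is what makes the parallel, multi-seed setup necessary.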

The data is split into a training set of \(10^6\) samples, a validation set of \(10^6\) samples, and a test set of \(10^7\) samples.

The data is provided both as raw \((x,y,z)\)-coordinates, stored as *.h5 files, and as internal coordinates, stored as *.pt files.
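Internal coordinates describe a molecule through bond lengths, bond angles and dihedral angles rather than absolute positions; a molecule with N atoms has 3N Cartesian coordinates but only 3N - 6 internal degrees of freedom. As a minimal illustration of the idea, the sketch below computes one dihedral angle from four atom positions. It is not the transformation used to produce the *.pt files, which is defined in the authors' repository, and the function names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in radians defined by four (x, y, z) points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Normalise the central bond so projections onto it are simple dot products.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.atan2(y, x)


# Four points in a planar, cis arrangement give a dihedral of zero.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

The same function applied to the backbone atoms of a Cartesian sample would give the torsion angles that are commonly used to visualise alanine dipeptide conformations.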

More details about the data and how to use it are given in our GitHub repository and paper.

Files (8.4 GB)

Checksum                                Size
md5:060e0165dcaba7c596293fc04f274081    2.2 GB
md5:fc12871ced6f450bec83ccaff90990b2    4.8 GB
md5:8d34fdda8694ee8d6745fc45b9eb3380    220.2 MB
md5:8abcf91b93cad050bd737ab1b4b0928e    480.0 MB
md5:7f93e4931ae4b2ef424d15b42218c9db    220.2 MB
md5:925bbeaf03dd09fb8b030646ca0eecef    480.0 MB
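
Since the files are large, it can be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the path passed in is whatever location the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `verify_md5("path/to/downloaded_file", "md5:060e0165dcaba7c596293fc04f274081")` returns `True` only if the file at that (placeholder) path matches the first checksum in the list.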

Additional details

Related works

Is supplement to
Preprint: 10.48550/arXiv.2208.01893 (DOI)