There is a newer version of the record available.

Published February 24, 2025 | Version v1_400K
Dataset Open

tmQM-xtb dataset

  • 1. Memorial Sloan Kettering Cancer Center
  • 2. Vanderbilt University

Description

This record provides an HDF5 file for the tmQM-xtb dataset. The dataset was generated from the tmQM dataset (release 13Aug2024, https://github.com/uiocompcat/tmQM), which contains 108541 unique molecules. Each molecule was evaluated using gfn2-xtb, and a short MD simulation was then performed to provide additional configurations of each molecule.

  • The tblite package was used to evaluate the energetics of the system using the gfn2-xtb formalism.
  • MD simulations were performed using the Atomic Simulation Environment (ASE) with the Langevin integrator.
  • Simulations were performed at 400 K with a 1 fs timestep and a friction (damping) coefficient of 0.01 1/fs.
  • In all trajectories, the first configuration corresponds to the energy minimized configuration reported in the original tmQM dataset.
  • 100 steps were taken between snapshots, with 10 total snapshots per molecule.
  • During MD sampling, gfn2-xtb accuracy was set to 2; all reported properties were calculated at accuracy level 1. 
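The settings listed above can be sketched with ASE and the tblite ASE calculator as follows. This is a minimal illustration, not the authors' script (see the repository linked below for that). It assumes that the 10 snapshots include the starting tmQM structure, so that 9 MD segments are run, and that the molecular charge is passed in by the caller; the function names `md_segments` and `sample_configurations` are hypothetical.

```python
# Settings taken from the dataset description.
TEMPERATURE_K = 400.0          # Langevin thermostat temperature
TIMESTEP_FS = 1.0              # integration timestep
FRICTION_PER_FS = 0.01         # friction (damping) coefficient
STEPS_BETWEEN_SNAPSHOTS = 100
TOTAL_SNAPSHOTS = 10
MD_ACCURACY = 2.0              # gfn2-xtb accuracy used while sampling
PROPERTY_ACCURACY = 1.0        # gfn2-xtb accuracy used for reported properties


def md_segments(total_snapshots=TOTAL_SNAPSHOTS):
    """Number of MD segments, ASSUMING the starting structure is snapshot 1."""
    return total_snapshots - 1


def sample_configurations(atoms, charge=0):
    """Return a list of (positions, energy in eV) pairs for one molecule.

    `atoms` is an ase.Atoms object holding the tmQM geometry; `charge` is the
    total molecular charge (an assumption about how it would be supplied).
    """
    # Imported lazily so the settings above can be inspected without
    # ASE or tblite installed.
    from ase import units
    from ase.md.langevin import Langevin
    from tblite.ase import TBLite

    md_calc = TBLite(method="GFN2-xTB", charge=charge,
                     accuracy=MD_ACCURACY, verbosity=0)
    prop_calc = TBLite(method="GFN2-xTB", charge=charge,
                       accuracy=PROPERTY_ACCURACY, verbosity=0)

    atoms = atoms.copy()
    atoms.calc = md_calc
    dyn = Langevin(
        atoms,
        timestep=TIMESTEP_FS * units.fs,
        temperature_K=TEMPERATURE_K,
        friction=FRICTION_PER_FS / units.fs,
    )

    snapshots = []
    for index in range(TOTAL_SNAPSHOTS):
        if index > 0:
            # Advance the trajectory; the first snapshot is the input geometry.
            dyn.run(STEPS_BETWEEN_SNAPSHOTS)
        frame = atoms.copy()       # copy() does not carry the MD calculator
        frame.calc = prop_calc     # re-evaluate at the tighter accuracy
        snapshots.append((frame.get_positions(), frame.get_potential_energy()))
    return snapshots
```

Under the stated assumption, each molecule is propagated for 9 × 100 = 900 steps, i.e. 900 fs of simulated time.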

Scripts used to perform the sampling can be found at https://github.com/chrisiacovella/xtb_config_gen

Note: this dataset corresponds to the raw output of the gfn2-xtb based calculations. No validation or exclusions have been applied, and thus not all configurations will necessarily be stable or realistic.

 

Note: this version now includes units for each entry; these were inadvertently left out of the prior version.

 

Files

Files (4.0 GB)

md5:8c599df8cf52484a470955638ae839a8 (4.0 GB)
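Because the file is several gigabytes, it may be worth checking the download against the MD5 checksum listed for it. The sketch below uses only the Python standard library and reads the file in chunks so it does not need to fit in memory; the helper names and any file path passed to them are hypothetical, since the file name is not given here.

```python
import hashlib

# Checksum listed for the file in this record.
EXPECTED_MD5 = "8c599df8cf52484a470955638ae839a8"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the local path of the downloaded HDF5 file returns `False` if the download is truncated or corrupted.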

Additional details

Related works

Is derived from
Dataset: https://github.com/uiocompcat/tmQM (URL)
Journal: 10.1021/acs.jcim.0c01041 (DOI)
Dataset: 10.5281/zenodo.14188396 (DOI)

Software