Published April 14, 2025 | Version v1_PdZnFeCuNiPtIrRhCrAg_300K
Dataset Open

tmQM-xtb dataset

  • 1. Memorial Sloan Kettering Cancer Center
  • 2. Vanderbilt University

Description

This record provides an HDF5 file containing a subset of the tmQM-xtb dataset sampled at T = 300 K.

The subset consists of 51,252 systems (1,537,560 configurations in total, 30 per system), considering only those systems with the following metal centers and organic elements, respectively:
- Pd, Zn, Fe, Cu, Ni, Pt, Ir, Rh, Cr, or Ag
- C, H, P, S, O, N, F, Cl, or Br
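The element criteria above can be expressed as a simple set check. The following is a minimal sketch, not the filter actually used to build the dataset: the function name `matches_criteria` and the exact rule (at least one listed metal, and no element outside the two lists) are assumptions made for illustration.

```python
# Element lists taken from the description above.
METALS = frozenset({"Pd", "Zn", "Fe", "Cu", "Ni", "Pt", "Ir", "Rh", "Cr", "Ag"})
ORGANIC = frozenset({"C", "H", "P", "S", "O", "N", "F", "Cl", "Br"})


def matches_criteria(symbols):
    """Return True if a molecule's element symbols satisfy the subset criteria.

    Assumed rule (hypothetical): the molecule contains at least one of the
    listed metals, and every element is in one of the two lists.
    """
    elements = set(symbols)
    has_metal = bool(elements & METALS)
    only_allowed = elements <= (METALS | ORGANIC)
    return has_metal and only_allowed


if __name__ == "__main__":
    print(matches_criteria(["Pd", "C", "H", "Cl"]))  # listed metal, allowed ligand elements
    print(matches_criteria(["Au", "C", "H"]))        # Au is not in the metal list
```

The configuration count quoted above follows directly from the sampling: 51,252 systems with 30 configurations each gives 1,537,560 configurations.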

This dataset was generated starting from the tmQM dataset (release 13Aug2024, https://github.com/uiocompcat/tmQM), which contains 108,541 unique molecules. Each molecule that matches the criteria above was evaluated using gfn2-xtb, and a short MD simulation was then performed to provide additional configurations of the molecule.

  • The tblite package was used to evaluate the energetics of the system using the gfn2-xtb formalism.
  • MD simulations were performed using the Atomic Simulation Environment (ASE) with the Langevin integrator.
  • Simulations were performed at 300 K with a 1 fs timestep and a friction damping factor of 0.01 1/fs.
  • In all trajectories, the first configuration corresponds to the energy-minimized configuration reported in the original tmQM dataset.
  • 100 steps were taken between snapshots, with 30 total snapshots per molecule.
  • During MD sampling, gfn2-xtb accuracy was set to 2; all reported properties were calculated at accuracy level 1. 
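The sampling protocol listed above can be sketched as follows. This is an illustrative outline under stated assumptions, not the script used to generate the data (see the linked repository for that). The function names `snapshot_steps` and `sample_configurations`, the `charge` argument, and the choice to count the starting geometry as snapshot 0 (so that 29 MD segments follow it) are assumptions. The ASE and tblite imports are placed inside the function so the step-counting helper can be used without either package installed.

```python
def snapshot_steps(n_snapshots=30, steps_between=100):
    """Cumulative MD step index at which each snapshot is recorded.

    Assumption: snapshot 0 is the starting (tmQM-minimized) geometry at
    step 0, and each later snapshot follows another `steps_between` steps.
    """
    return [i * steps_between for i in range(n_snapshots)]


def sample_configurations(atoms, charge=0, n_snapshots=30, steps_between=100):
    """Run Langevin MD on an ase.Atoms object and collect snapshots.

    Returns a list of (positions, energy_in_eV) tuples.
    """
    # Third-party imports are local so that snapshot_steps() has no dependencies.
    from ase import units
    from ase.md.langevin import Langevin
    from tblite.ase import TBLite

    # Accuracy 2 for the dynamics, accuracy 1 for reported properties,
    # mirroring the settings described above.
    md_calc = TBLite(method="GFN2-xTB", charge=charge, accuracy=2.0)
    prop_calc = TBLite(method="GFN2-xTB", charge=charge, accuracy=1.0)

    atoms = atoms.copy()
    atoms.calc = md_calc
    dyn = Langevin(
        atoms,
        timestep=1.0 * units.fs,      # 1 fs timestep
        temperature_K=300.0,          # 300 K
        friction=0.01 / units.fs,     # 0.01 1/fs friction
    )

    snapshots = []
    for i in range(n_snapshots):
        if i > 0:
            dyn.run(steps_between)    # advance 100 steps between snapshots
        frame = atoms.copy()          # copy drops the MD calculator
        frame.calc = prop_calc        # re-evaluate at accuracy 1
        snapshots.append((frame.get_positions(), frame.get_potential_energy()))
    return snapshots
```

Under the assumption above, the last snapshot is taken at step 2900, which corresponds to 2.9 ps of simulated time per molecule at a 1 fs timestep.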

Scripts used to perform the sampling can be found at https://github.com/chrisiacovella/xtb_config_gen

Note that this dataset corresponds to the raw output of the gfn2-xtb based calculations. No validation or exclusion has been performed, so not all configurations will necessarily be stable or realistic.

Files

Files (5.5 GB)

md5:48b3041fb4fdf6b0a68c68149d891d88
5.5 GB
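Because the file is large, it can be worth checking the download against the md5 checksum listed above. The following is a minimal sketch using only the Python standard library; the function name `md5_of_file` and the local path in the usage comment are hypothetical placeholders.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "48b3041fb4fdf6b0a68c68149d891d88"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in chunks
    so that a multi-gigabyte file is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (the path is a placeholder for wherever the file was saved):
# assert md5_of_file("path/to/downloaded_file.hdf5") == EXPECTED_MD5
```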

Additional details

Related works

Is derived from
Dataset: https://github.com/uiocompcat/tmQM (URL)
Journal: 10.1021/acs.jcim.0c01041 (DOI)
Dataset: 10.5281/zenodo.14188396 (DOI)

Software