There is a newer version of the record available.

Published April 15, 2025 | Version v1.1_PdZnFeCuNiPtIrRhCrAg_T200K
Dataset Open

modelforge curated dataset: tmQM-xtb

Description

Curated tmQM-xtb Dataset:
- T=200K dataset restricted to [Pd, Zn, Fe, Cu, Ni, Pt, Ir, Rh, Cr, and Ag]
- Version: v1.1_PdZnFeCuNiPtIrRhCrAg_T200K

This dataset contains 51249 unique systems with 1317625 total configurations (about 26 per system on average, from up to 30 snapshots each), sampled at T=200K.

This dataset is limited to systems that contain the transition metals Pd, Zn, Fe, Cu, Ni, Pt, Ir, Rh, Cr, or Ag and whose remaining elements are restricted to C, H, P, S, O, N, F, Cl, and Br.
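The composition rule can be written as a small predicate. This is a minimal sketch, not the curation code itself: the function name `is_allowed_system` is hypothetical, and it assumes a system qualifies when it contains at least one of the listed metals and no element outside the two lists.

```python
# Element symbols taken from the dataset description.
ALLOWED_METALS = {"Pd", "Zn", "Fe", "Cu", "Ni", "Pt", "Ir", "Rh", "Cr", "Ag"}
ALLOWED_NONMETALS = {"C", "H", "P", "S", "O", "N", "F", "Cl", "Br"}


def is_allowed_system(symbols):
    """Return True if the element symbols satisfy the composition rule.

    Assumed reading of the rule: at least one allowed metal is present and
    every element belongs to one of the two allowed sets.
    """
    elements = set(symbols)
    has_metal = bool(elements & ALLOWED_METALS)
    only_allowed = elements <= (ALLOWED_METALS | ALLOWED_NONMETALS)
    return has_metal and only_allowed
```

For example, a system with symbols Pd, C, H, and Cl passes, while one containing Si or Au does not.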

Potentially problematic configurations (i.e., those that are unstable or show structural changes) were removed. Briefly, bond inference was performed on the initial configuration using RDKit, and a configuration was excluded if any of those bond distances changed by more than 0.15 angstroms relative to the initial, energy-minimized state.
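The distance criterion can be sketched as follows. This is an illustration under stated assumptions rather than the script used for curation: the bond list is taken as given (the RDKit bond inference step is not reproduced), the function names are hypothetical, and coordinates are expected in angstroms, so positions stored in nanometers would need to be multiplied by 10 first.

```python
import math

MAX_BOND_CHANGE = 0.15  # angstroms, threshold from the description


def bond_lengths(positions, bonds):
    """Distance for each (i, j) bond; positions are (x, y, z) in angstroms."""
    return [math.dist(positions[i], positions[j]) for i, j in bonds]


def keep_configuration(reference, candidate, bonds, tol=MAX_BOND_CHANGE):
    """True if no bond length differs from the reference by more than tol."""
    ref = bond_lengths(reference, bonds)
    new = bond_lengths(candidate, bonds)
    return all(abs(n - r) <= tol for n, r in zip(new, ref))
```

Here `reference` is the energy-minimized configuration and `candidate` is any later snapshot of the same system with the same atom ordering.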

This dataset was generated starting from the tmQM dataset; the original tmQM repository (https://github.com/uiocompcat/tmQM) was forked and a release was made that corresponds to the data committed on 13 August 2024 (https://github.com/chrisiacovella/tmQM/releases/tag/2024Aug13).

Each molecule was evaluated using gfn2-xtb, and a short MD simulation was then performed to provide additional configurations of the molecules.

  • The tblite package was used to evaluate the energetics of the system using the gfn2-xtb formalism.
  • MD simulations were performed using the Atomic Simulation Environment (ASE), using the Langevin integrator.
  • Simulations were performed at 200K with a 1 fs timestep and a friction (damping) coefficient of 0.01 1/fs.
  • In all trajectories, the first configuration corresponds to the energy-minimized configuration reported in the original tmQM dataset.
  • 100 steps were taken between snapshots, with 30 total snapshots per molecule.
  • During MD sampling, gfn2-xtb accuracy was set to 2; all reported properties were calculated at accuracy level 1. 
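The protocol above could be set up roughly as follows with ASE and the tblite ASE calculator. This is a hedged sketch, not the authors' script (which lives in the xtb_config_gen repository): the function names are hypothetical, velocities are not explicitly initialized, the 30 snapshots are assumed to include the starting configuration, and the final recalculation of properties at accuracy level 1 is not shown.

```python
N_SNAPSHOTS = 30          # per molecule; assumed to include the starting geometry
STEPS_PER_SNAPSHOT = 100  # MD steps between saved configurations
TIMESTEP_FS = 1.0
TEMPERATURE_K = 200.0
FRICTION_PER_FS = 0.01


def snapshot_steps(n_snapshots=N_SNAPSHOTS, stride=STEPS_PER_SNAPSHOT):
    """MD step index at which each snapshot is taken (step 0 = input geometry)."""
    return [k * stride for k in range(n_snapshots)]


def sample_configurations(atoms):
    """Run Langevin MD on an ase.Atoms object and return the saved snapshots.

    ASE and tblite are imported here so the constants and the schedule helper
    above can be used without either package installed.
    """
    from ase import units
    from ase.md.langevin import Langevin
    from tblite.ase import TBLite

    # Looser numerical accuracy during sampling, as in the description.
    atoms.calc = TBLite(method="GFN2-xTB", accuracy=2.0)
    dyn = Langevin(
        atoms,
        timestep=TIMESTEP_FS * units.fs,
        temperature_K=TEMPERATURE_K,
        friction=FRICTION_PER_FS / units.fs,
    )
    snapshots = [atoms.copy()]  # first entry: the energy-minimized input
    for _ in range(N_SNAPSHOTS - 1):
        dyn.run(STEPS_PER_SNAPSHOT)
        snapshots.append(atoms.copy())
    return snapshots
```

Under the stated assumption, each trajectory covers 2900 MD steps, i.e. 2.9 ps of simulated time.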

Scripts used to perform the sampling can be found at https://github.com/chrisiacovella/xtb_config_gen

Properties included: 

  • atomic_numbers
  • positions
    • "per_atom"
    • "nanometer"
  • forces
    • "per_atom"
    • "kilojoule_per_mole / nanometer"
  • partial_charges
    • "per_atom"
    • "elementary_charge"
  • energies
    • "per_system"
    • "kilojoule_per_mole"
  • dipole_moment_per_system
    • "per_system"
    • "elementary_charge * nanometer"
  • total_charge
    • "per_system"
    • "elementary_charge"
  • spin_multiplicities
    • "per_system"
    • "dimensionless"
  • stoichiometry
    • "meta_data"
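The properties are stored in kJ/mol, nanometers, and elementary charges. For workflows that expect eV, angstroms, or debye, the conversions can be sketched as below. The helper names are hypothetical and not part of modelforge; the constants are standard values (1 eV = 96.48533212 kJ/mol, 1 e·angstrom = 4.803204 D).

```python
KJ_PER_MOL_PER_EV = 96.48533212  # 1 eV expressed in kJ/mol
ANGSTROM_PER_NM = 10.0
DEBYE_PER_E_ANGSTROM = 4.803204  # 1 e*angstrom expressed in debye


def energy_to_ev(energy_kj_mol):
    """kilojoule_per_mole -> eV."""
    return energy_kj_mol / KJ_PER_MOL_PER_EV


def length_to_angstrom(length_nm):
    """nanometer -> angstrom."""
    return length_nm * ANGSTROM_PER_NM


def force_to_ev_per_angstrom(force_kj_mol_nm):
    """kilojoule_per_mole / nanometer -> eV / angstrom."""
    return force_kj_mol_nm / KJ_PER_MOL_PER_EV / ANGSTROM_PER_NM


def dipole_to_debye(dipole_e_nm):
    """elementary_charge * nanometer -> debye."""
    return dipole_e_nm * ANGSTROM_PER_NM * DEBYE_PER_E_ANGSTROM
```

Each helper works on plain floats; for arrays of per-atom values the same factors can be applied elementwise.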

Files

Files (4.7 GB)

Size: 4.7 GB
md5:a1d03a025ecfd48d7dc286b3d71cb900
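The md5 checksum listed for the file can be used to verify a download. The sketch below uses only the Python standard library; the function names are hypothetical, and the path passed in is whatever local name the downloaded file was saved under (the file name is not shown in this record).

```python
import hashlib

EXPECTED_MD5 = "a1d03a025ecfd48d7dc286b3d71cb900"  # checksum from the record


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's md5 digest equals the expected checksum."""
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use small, which matters for a 4.7 GB file.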

Additional details

Related works

Is derived from
Publication: 10.1021/acs.jcim.0c01041 (DOI)
Is part of
Dataset: 10.5281/zenodo.14920177 (DOI)

Software

Repository URL
https://github.com/choderalab/modelforge
Programming language
Python
Development Status
Active