Published November 18, 2024 | Version full_dataset_v1
Dataset Open

modelforge curated dataset: tmQM

  • 1. Memorial Sloan Kettering Cancer Center
  • 2. Open Molecular Software Foundation

Description

Curated tmQM Dataset:

Full dataset, version "full_dataset_v1":

This record provides a curated HDF5 file for the tmQM dataset (release 13Aug2024), designed to be compatible with modelforge, an infrastructure for implementing and training neural network potentials (NNPs). The data file includes 108541 unique molecules. Note that only a single configuration is provided per unique molecule.

Change from full_dataset_v0: fixed a minor labeling bug and a scaling issue in the scaled version of the computed dipole moment.

When applicable, the units of properties are provided in the data file, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:

This curated dataset was generated using the modelforge software at commit <add commit>:

  • Link to the source code at this commit: <add commit>
  • Link to the script file used to generate the dataset: <add commit>
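
For a first look at the contents of the HDF5 file described above, a generic walk over the file can list each stored array together with any attributes attached to it. The following is a minimal sketch, not the modelforge loading API: it assumes the third-party h5py package is installed, the function name summarize_hdf5 and the file name in the usage comment are hypothetical, and it makes no assumption about how the curated file is organized or where its unit strings are stored.

```python
def summarize_hdf5(path, max_items=20):
    """Return (name, shape, dtype, attrs) for up to max_items datasets in an HDF5 file."""
    import h5py  # third-party package (assumed installed); imported lazily

    found = []

    def visit(name, obj):
        # Only arrays are recorded; groups are traversed but not listed.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype), dict(obj.attrs)))
        # visititems stops as soon as the callback returns a non-None value.
        return True if len(found) >= max_items else None

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return found


# Hypothetical usage; replace the path with the downloaded file:
# for name, shape, dtype, attrs in summarize_hdf5("tmqm_dataset.hdf5"):
#     print(name, shape, dtype, attrs)
```

Capping the number of reported datasets keeps the output readable, since the file holds 108541 molecules.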

Files

Files (310.6 MB)

md5:c584662a02964d78b0d5c6bc28960867
310.6 MB
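
The md5 checksum listed for the file can be used to confirm that a download completed without corruption. Below is a minimal sketch using only the Python standard library; the function names md5_of_file and matches_published_md5 and the path in the usage comment are hypothetical, and the expected value is copied from the listing above.

```python
import hashlib

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "c584662a02964d78b0d5c6bc28960867"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest equals the published checksum."""
    return md5_of_file(path) == expected.lower()


# Hypothetical usage; replace the path with the downloaded file:
# print(matches_published_md5("tmqm_dataset.hdf5"))
```

Reading in chunks avoids loading the full 310.6 MB file into memory at once.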

Additional details

Related works

Is derived from
Dataset: https://github.com/bbskjelstad/tmqm (URL)
Is published in
Publication: 10.1021/acs.jcim.0c01041 (DOI)

Software

Repository URL
https://github.com/choderalab/modelforge
Programming language
Python
Development Status
Active