modelforge curated dataset: GEOM QM9
Authors/Creators
- 1. Memorial Sloan Kettering Cancer Center
Description
Modelforge Curated GEOM QM9 Dataset:
- Full Dataset
- Version: full_dataset_v1.0
This record provides a curated HDF5 file for the QM9 subset of the Geometric Ensemble Of Molecules (GEOM) dataset (https://doi.org/10.1038/s41597-022-01288-4). The GEOM QM9 dataset samples the 133,885 organic molecules with up to nine heavy atoms (C, O, N, or F; excluding H) from the original QM9 dataset (https://doi.org/10.1038/sdata.2014.22), generating multiple configurations for each molecule with the CREST software, which relies on GFN2-xTB. Energies were evaluated with DFT in ORCA 5.0.2 using the r2SCAN-3c functional and the mTZVPP basis set.
The provided HDF5 file contains a subset of this dataset intended for testing purposes. It is designed to be compatible with modelforge, an infrastructure for implementing and training neural network potentials (NNPs).
The dataset contains 133,258 unique records with 1,822,137 configurations in total.
When applicable, the units of properties are provided in the data file, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:
Properties Included:
- atomic_numbers
- positions
  - "per_atom"
  - "nanometer"
- dft_total_energy
  - "per_system"
  - "kilojoule_per_mole"
- total_charge
  - "per_system"
  - "elementary_charge"
- smiles
  - "meta_data"
  - "meta_data"
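The property list above gives positions in nanometers and energies in kilojoules per mole, while many tools expect angstroms, kcal/mol, or eV. The following is a minimal sketch of those conversions using plain constants. The function names and example values are hypothetical and are not part of modelforge or the data file. The openff-units package mentioned above could perform the same conversions, but it is deliberately not used here so that no assumptions are made about its API.

```python
# Conversion factors out of the units named in the property list.
ANGSTROM_PER_NM = 10.0      # exact
KJ_PER_KCAL = 4.184         # exact (thermochemical calorie)
KJ_MOL_PER_EV = 96.485332   # CODATA 2018 value, rounded


def positions_nm_to_angstrom(positions_nm):
    """Convert a list of [x, y, z] coordinates from nanometers to angstroms."""
    return [[coord * ANGSTROM_PER_NM for coord in xyz] for xyz in positions_nm]


def energy_kj_mol_to_kcal_mol(energy_kj_mol):
    """Convert an energy from kJ/mol to kcal/mol."""
    return energy_kj_mol / KJ_PER_KCAL


def energy_kj_mol_to_ev(energy_kj_mol):
    """Convert an energy from kJ/mol to eV (per particle)."""
    return energy_kj_mol / KJ_MOL_PER_EV


# Illustrative values only; they are not taken from the dataset.
example_positions_nm = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
example_energy_kj_mol = -41.84

print(positions_nm_to_angstrom(example_positions_nm))
print(energy_kj_mol_to_kcal_mol(example_energy_kj_mol))
print(energy_kj_mol_to_ev(example_energy_kj_mol))
```

Because the data file stores its units as strings next to each property, it is safer to read those strings and confirm them before applying fixed factors like these.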
Files
| Checksum | Size |
|---|---|
| md5:79031ae528e08f7839feab4449fde201 | 906.4 MB |
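Since the file is about 906 MB, it can be worth checking a download against the MD5 digest listed for it. The sketch below uses only the Python standard library and reads the file in chunks so it never has to fit in memory. The helper names are hypothetical, and the local file path is left to the caller because the file name is not shown in the listing.

```python
import hashlib

# Digest copied from the "Files" listing of this record.
EXPECTED_MD5 = "79031ae528e08f7839feab4449fde201"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected(path)` with the path of the downloaded file returns `True` only when the download is intact. MD5 is adequate for detecting accidental corruption, but it is not a defense against deliberate tampering.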
Additional details
Related works
- Is derived from
  - Publication: 10.1038/s41597-022-01288-4 (DOI)
  - Dataset: 10.7910/DVN/JNGTDF (DOI)
Software
- Repository URL: https://github.com/choderalab/modelforge
- Programming language: Python
- Development Status: Active