There is a newer version of the record available.

Published June 17, 2024 | Version nc_1000_v0
Dataset Open

modelforge curated dataset: ANI-2x

  • 1. Memorial Sloan Kettering Cancer Center

Description

Curated ANI-2x Dataset:

1000 conformer test set, version "nc_1000_v0":

This provides a curated HDF5 file for a subset of the ANI-2x dataset, designed to be compatible with modelforge, an infrastructure for implementing and training neural network potentials (NNPs). This test dataset contains 1000 conformers in total across 101 unique entries. (Note: conformers are partitioned into entries based on the array of atomic species appearing in sequence in the source data file.)
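The partitioning rule in the note above can be illustrated with a minimal sketch. It assumes one reading of "appearing in sequence": consecutive conformers that share an identical array of atomic species form one entry. The function name `partition_into_entries` and the toy input are hypothetical and are not taken from modelforge; the actual curation code may differ.

```python
from itertools import groupby


def partition_into_entries(species_per_conformer):
    """Group consecutive conformers that share the same species array.

    Assumption: a new entry starts whenever the species array differs
    from that of the previous conformer in the source file.
    """
    entries = []
    start = 0
    # groupby only merges *adjacent* items with equal keys, which matches
    # the "in sequence" reading assumed here.
    for species, run in groupby(species_per_conformer, key=tuple):
        count = len(list(run))
        entries.append(
            {
                "species": species,
                "conformer_indices": list(range(start, start + count)),
            }
        )
        start += count
    return entries


# Toy input (hypothetical): atomic numbers for five conformers.
toy_species = [[8, 1, 1], [8, 1, 1], [6, 1, 1, 1, 1], [8, 1, 1], [8, 1, 1]]
entries = partition_into_entries(toy_species)
for entry in entries:
    print(entry["species"], entry["conformer_indices"])
```

Under this assumption, the same species array can yield more than one entry if its conformers are not adjacent in the source file, as the toy input shows.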

When applicable, the units of properties are provided in the data file, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:

This curated dataset was generated using the modelforge software at commit c5c7153:

 

Source Dataset:

The ANI-2x dataset includes properties for small organic molecules that contain H, C, N, O, S, F, and Cl. The full source dataset contains 9,651,712 conformers. The data were generated at the ωB97X/6-31G(d) level of theory used in the original ANI-2x paper, calculated using Gaussian 09.

Citations:

ANI-2x publication:

  • Devereux, C., Zubatyuk, R., Smith, J. et al. "Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens." Journal of Chemical Theory and Computation 16.7 (2020): 4192-4202. https://doi.org/10.1021/acs.jctc.0c00121

Source dataset, released under the Creative Commons Attribution 4.0 International license:

Files

Files (174.0 kB)

md5:c6b01060d164cf98497b6927da49dd07
174.0 kB
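The MD5 checksum listed above can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` and the file name in the usage comment are hypothetical, since the record text does not give the file name.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "c6b01060d164cf98497b6927da49dd07"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Usage (hypothetical file name; substitute the path of the downloaded file):
# print(verify_download("downloaded_dataset.hdf5"))
```

MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.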

Additional details

Related works

Is derived from
Dataset: 10.5281/zenodo.10108942 (DOI)
Is described by
Journal article: 10.1021/acs.jctc.0c00121 (DOI)

Software

Repository URL
https://github.com/choderalab/modelforge
Programming language
Python
Development Status
Active