There is a newer version of the record available.

Published June 19, 2024 | Version nc_1000_v0
Dataset Open

modelforge curated dataset: ANI-1x

  • 1. Memorial Sloan Kettering Cancer Center

Description

Curated ANI-1x Dataset:

1000 conformer test set, version "nc_1000_v0":

This record provides a curated HDF5 file containing a subset of the ANI-1x dataset, formatted for compatibility with modelforge, an infrastructure for implementing and training neural network potentials (NNPs). The subset contains 1000 conformers in total, drawn from 135 unique entries, with a maximum of 10 conformers per record.
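Because the internal layout of the HDF5 file is not described on this page, a quick way to get oriented is to list every dataset with its shape and attributes. The following is a minimal sketch, assuming the third-party h5py package is installed; the helper names `iter_datasets` and `summarize` and the file path are hypothetical, and no particular group or attribute names are assumed. Any unit strings stored with the data would be expected to show up among the printed attributes, but that is an assumption, not something confirmed here.

```python
from typing import Iterator, Tuple


def iter_datasets(group, prefix: str = "") -> Iterator[Tuple[str, object]]:
    """Yield (path, node) for every leaf below an h5py-style group.

    Anything exposing .items() is treated as a group and recursed into;
    everything else is treated as a dataset.
    """
    for name, node in group.items():
        path = f"{prefix}/{name}"
        if hasattr(node, "items"):
            yield from iter_datasets(node, path)
        else:
            yield path, node


def summarize(path_to_file: str, max_lines: int = 20) -> None:
    """Print path, shape, dtype, and attributes for the first few datasets."""
    import h5py  # third-party dependency, imported only when actually needed

    with h5py.File(path_to_file, "r") as handle:
        for index, (path, dataset) in enumerate(iter_datasets(handle)):
            if index >= max_lines:
                break
            # Attributes are printed verbatim; their names are not assumed.
            print(path, dataset.shape, dataset.dtype, dict(dataset.attrs))


# Hypothetical usage; replace the path with the downloaded file:
# summarize("path/to/downloaded_file.hdf5")
```

Limiting the output with `max_lines` keeps the listing readable when a file holds many records that share the same layout.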

When applicable, the units of properties are provided in the data file, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:

This curated dataset was generated using the modelforge software at commit c5c7153:

 

Source Dataset:

The ANI-1x dataset includes properties for small organic molecules containing H, C, N, and O, and comprises nearly 5 million conformers. The data were generated at the wB97X/6-31G(d) level of theory using Gaussian 09. A subset of the conformers (~500K) was also computed with accurate coupled cluster methods (ANI-1ccx).

Citations:

ANI-1x publications:

  • ANI-1x dataset

    Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less Is More: Sampling Chemical Space with Active Learning. J. Chem. Phys. 2018, 148 (24), 241733.
    https://doi.org/10.1063/1.5023802

  • ANI-1ccx dataset

    Smith, J. S.; Nebgen, B. T.; Zubatyuk, R.; Lubbers, N.; Devereux, C.; Barros, K.; Tretiak, S.; Isayev, O.; Roitberg, A. E. Approaching Coupled Cluster Accuracy with a General-Purpose Neural Network Potential through Transfer Learning. Nat. Commun. 2019, 10 (1), 2903.
    https://doi.org/10.1038/s41467-019-10827-4

  • wB97x/def2-TZVPP data

    Zubatyuk, R.; Smith, J. S.; Leszczynski, J.; Isayev, O. Accurate and Transferable Multitask Prediction of Chemical Properties with an Atoms-in-Molecules Neural Network. Sci. Adv. 2019, 5 (8), eaav6490.
    https://doi.org/10.1126/sciadv.aav6490

Source dataset, released under the CC0 1.0 Universal License:

  • Smith, Justin S; Zubatyuk, Roman; Nebgen, Benjamin; Lubbers, Nicholas; Barros, Kipton; Roitberg, Adrian; et al. (2020). The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.4712477.v1

GitHub repository:

Files

Files (1.8 MB)

Size: 1.8 MB
md5:ab72a43b342bf57b94f4e242c132da97
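
The md5 checksum listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the function names and the local file path are hypothetical, since the file name is not shown on this page.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "ab72a43b342bf57b94f4e242c132da97"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Hypothetical usage; replace the path with the downloaded file:
# print(verify_download("path/to/downloaded_file.hdf5"))
```

A mismatch usually indicates a truncated or corrupted download, or a file from a different version of the record.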

Additional details

Software

Repository URL
https://github.com/choderalab/modelforge
Programming language
Python
Development Status
Active