There is a newer version of the record available.

Published June 14, 2024 | Version full_dataset_v0_HCNOFClS
Dataset | Open access

modelforge curated dataset: SPICE 2

  • 1. Memorial Sloan Kettering Cancer Center

Description

Curated SPICE 2 Dataset:

Full dataset restricted to elements H, C, N, O, F, Cl, S, version "full_dataset_v0_HCNOFClS":

This record provides a curated HDF5 file for a subset of the SPICE 2 dataset (release v2.0.1), designed to be compatible with modelforge, an infrastructure for implementing and training neural network potentials (NNPs). This subset is limited to molecules composed only of the following 7 elements: H, C, N, O, F, Cl, and S. The dataset contains 1,620,239 total conformers for 97,280 unique molecules.
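The element restriction amounts to a subset test on each molecule's atomic numbers. The sketch below is illustrative only: it does not reproduce modelforge's curation code, and the function names and the dict-of-atomic-numbers input layout are hypothetical.

```python
# Atomic numbers of the seven elements retained in this subset:
# H=1, C=6, N=7, O=8, F=9, S=16, Cl=17.
ALLOWED_ATOMIC_NUMBERS = frozenset({1, 6, 7, 8, 9, 16, 17})


def within_element_subset(atomic_numbers):
    """Return True if every atom belongs to the H/C/N/O/F/Cl/S subset."""
    return set(atomic_numbers) <= ALLOWED_ATOMIC_NUMBERS


def filter_molecules(molecules):
    """Keep only molecules whose atoms all fall within the allowed set.

    `molecules` is a hypothetical mapping of molecule name -> sequence of
    atomic numbers; it is not the layout of the curated HDF5 file.
    """
    return {name: z for name, z in molecules.items() if within_element_subset(z)}


example = {
    "methanol": [6, 8, 1, 1, 1, 1],
    "bromobenzene": [6] * 6 + [1] * 5 + [35],  # Br (Z=35) is outside the subset
}
print(sorted(filter_molecules(example)))  # ['methanol']
```

A set comparison keeps the check independent of atom ordering and of how many times each element appears.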

When applicable, the units of properties are provided in the data file, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:
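Because units are stored as strings, downstream code has to map each string to a conversion factor before mixing quantities. In practice the openff-units package parses these strings directly; the standard-library sketch below only illustrates the idea. The unit spellings "hartree" and "hartree / bohr" are assumptions, not values read from this file, and the lookup tables are hypothetical.

```python
# CODATA 2018 values: 1 hartree = 2625.4996394799 kJ/mol,
# 1 bohr = 0.0529177210903 nm.
HARTREE_TO_KJ_PER_MOL = 2625.4996394799
BOHR_TO_NM = 0.0529177210903

# Hypothetical lookup tables: unit string -> factor to a common target unit.
TO_KJ_PER_MOL = {
    "hartree": HARTREE_TO_KJ_PER_MOL,
    "kilojoule_per_mole": 1.0,
}
TO_KJ_PER_MOL_PER_NM = {
    "hartree / bohr": HARTREE_TO_KJ_PER_MOL / BOHR_TO_NM,
    "kilojoule_per_mole / nanometer": 1.0,
}


def convert(values, unit_string, table):
    """Scale `values` by the factor registered for `unit_string` in `table`.

    Raises ValueError for unit strings the table does not know, rather than
    silently returning unconverted numbers.
    """
    try:
        factor = table[unit_string]
    except KeyError:
        raise ValueError(f"unsupported unit string: {unit_string!r}") from None
    return [v * factor for v in values]


energies_kj = convert([-0.5, -1.25], "hartree", TO_KJ_PER_MOL)
```

Failing loudly on an unknown string is deliberate: a silently skipped conversion between hartree and kJ/mol would shift energies by a factor of about 2600.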

This curated dataset was generated using the modelforge software at commit c5c7153:


Source Dataset:

Small-molecule/Protein Interaction Chemical Energies (SPICE).

The SPICE 2 dataset contains roughly 2 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 17 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. SPICE 2 is an update to SPICE 1: it roughly doubles the total dataset size and adds 2 elements. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory using Psi4 1.4.1, along with other useful quantities such as multipole moments and bond orders.
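Forces are the negative gradient of the energy with respect to atomic positions, which is why a dataset that provides both can constrain an NNP more tightly than energies alone. The sketch below shows a common sanity check of that relationship using central finite differences. It uses a toy one-dimensional harmonic potential, not SPICE data, and all function names and parameter values are hypothetical.

```python
def energy(x, k=2.0, x0=0.5):
    """Toy 1D harmonic potential E(x) = 0.5 * k * (x - x0)**2."""
    return 0.5 * k * (x - x0) ** 2


def force(x, k=2.0, x0=0.5):
    """Analytic force F(x) = -dE/dx = -k * (x - x0)."""
    return -k * (x - x0)


def finite_difference_force(energy_fn, x, h=1e-5):
    """Central-difference estimate of -dE/dx at position x."""
    return -(energy_fn(x + h) - energy_fn(x - h)) / (2.0 * h)


def forces_consistent(energy_fn, force_fn, points, tol=1e-6):
    """True if force_fn agrees with -dE/dx at every point within tol."""
    return all(
        abs(force_fn(x) - finite_difference_force(energy_fn, x)) < tol
        for x in points
    )


print(forces_consistent(energy, force, [-1.0, 0.0, 0.5, 2.0]))
```

For real molecules the same check is applied per Cartesian coordinate of each atom, with a step size small enough to limit truncation error but large enough to avoid round-off.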


Citations:

Original SPICE 1 publication:

  • Eastman, P., Behara, P.K., Dotson, D.L. et al. SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials. Sci Data 10, 11 (2023). https://doi.org/10.1038/s41597-022-01882-6

Source dataset, released under the CC0 1.0 Universal license:

  • Eastman, P., Behara, P. K., Dotson, D., Galvelis, R., Herr, J., Horton, J., Mao, Y., Chodera, J., Pritchard, B., Wang, Y., De Fabritiis, G., & Markland, T. (2024). SPICE 2.0.1 (2.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10975225

Files

Files (21.7 GB)

  • md5:af1af7b7f50b5dfdd6a3d94924c7147f (21.7 GB)

Additional details

Related works

Is derived from
Dataset: 10.5281/zenodo.10975225 (DOI)
Is described by
Journal article: 10.1038/s41597-022-01882-6 (DOI)

Software

Repository URL
https://github.com/choderalab/modelforge
Programming language
Python
Development Status
Active