modelforge curated dataset: SPICE 2
Description
Curated SPICE 2 Dataset:
Full dataset restricted to elements H, C, N, O, F, Cl, S, version "full_dataset_v0_HCNOFClS":
This provides a curated HDF5 file for a subset of the SPICE 2 dataset (release v2.0.1), designed to be compatible with modelforge, an infrastructure for implementing and training neural network potentials (NNPs). The subset is limited to molecules composed only of the following 7 elements: H, C, N, O, F, Cl, and S. It contains 1,620,239 conformers in total, covering 97,280 unique molecules.
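As an illustration of the element restriction described above, the following minimal sketch checks whether a molecule falls within the 7-element subset. It is not taken from the modelforge curation script, which may apply the filter differently (for example, on atomic numbers). The names `ALLOWED_ELEMENTS` and `is_within_subset` are hypothetical and used only for this example.

```python
# Illustrative only: the element symbols come from the dataset description;
# the function and constant names are hypothetical, not part of modelforge.
ALLOWED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "Cl", "S"})


def is_within_subset(symbols, allowed=ALLOWED_ELEMENTS):
    """Return True if every element symbol in `symbols` is in `allowed`."""
    return set(symbols) <= allowed


# Methane uses only C and H, so it passes the filter.
methane = ["C", "H", "H", "H", "H"]
# Bromomethane contains Br, which is outside the 7-element subset.
bromomethane = ["C", "H", "H", "H", "Br"]

print(is_within_subset(methane))
print(is_within_subset(bromomethane))
```

A subset test is used here because a molecule qualifies only when all of its elements are in the allowed set, not merely when it contains one of them.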
When applicable, the units of properties are provided in the datafile, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:
This curated dataset was generated using the modelforge software at commit c5c7153:
- Link to the source code at this commit: https://github.com/choderalab/modelforge/tree/c5c7153e06172fe8e6f25015250ecb5db05655cc
- Link to the script file used to generate the dataset: https://github.com/choderalab/modelforge/blob/c5c7153e06172fe8e6f25015250ecb5db05655cc/modelforge/curation/scripts/curate_spice2.py
Source Dataset:
Small-molecule/Protein Interaction Chemical Energies (SPICE).
The SPICE 2 dataset contains roughly 2 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 17 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. SPICE 2 is an update to SPICE 1 that roughly doubles the total dataset size and adds 2 elements. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory using Psi4 1.4.1, along with other useful quantities such as multipole moments and bond orders.
Citations:
Original SPICE 1 publication:
- Eastman, P., Behara, P.K., Dotson, D.L. et al. SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials. Sci Data 10, 11 (2023). https://doi.org/10.1038/s41597-022-01882-6
Source dataset, released under the CC0 1.0 Universal license:
- Eastman, P., Behara, P. K., Dotson, D., Galvelis, R., Herr, J., Horton, J., Mao, Y., Chodera, J., Pritchard, B., Wang, Y., De Fabritiis, G., & Markland, T. (2024). SPICE 2.0.1 (2.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10975225
Files
(21.7 GB)
Checksum | Size |
---|---|
md5:af1af7b7f50b5dfdd6a3d94924c7147f | 21.7 GB |
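Because the file is large, it can be worth checking a download against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` are hypothetical, and the local file path is left to the caller because the file name is not given in this record.

```python
import hashlib

# MD5 checksum as listed in the Files section of this record.
EXPECTED_MD5 = "af1af7b7f50b5dfdd6a3d94924c7147f"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte file never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the path of the downloaded HDF5 file returns `True` when the digest matches. A mismatch usually indicates an incomplete or corrupted download.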
Additional details
Related works
- Is derived from
- Dataset: 10.5281/zenodo.10975225 (DOI)
- Is described by
- Journal article: 10.1038/s41597-022-01882-6 (DOI)
Software
- Repository URL
- https://github.com/choderalab/modelforge
- Programming language
- Python
- Development Status
- Active