SPICE 1.1.3
Creators
- 1. Stanford University
- 2. University of California, Irvine
- 3. The Open Force Field Initiative
- 4. Acellera Labs
- 5. University of Notre Dame
- 6. Newcastle University
- 7. Memorial Sloan Kettering Cancer Center
- 8. Virginia Polytechnic Institute and State University
- 9. Weill Cornell Graduate School of Medical Sciences
- 10. Universitat Pompeu Fabra
Description
SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for training potential functions. The emphasis is particularly on simulating drug-like small molecules interacting with proteins. It is described in this publication:
Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, and Thomas E. Markland. "SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials." https://doi.org/10.48550/arXiv.2209.10702 (2022).
The HDF5 file is structured as follows.
- There is one top level group for each unique molecule or cluster. The name of each group is either a PubChem Substance ID (for PubChem molecules), an amino acid sequence (for dipeptides and solvated amino acids), or a SMILES string (for everything else).
- Each group contains the following datasets. `N` is the number of atoms in the molecule and `M` is the number of conformations. (Some groups may be missing some of them, for example if MBIS failed to converge.)
  - `subset`: The name of the data subset the molecule is from.
  - `smiles`: The canonical SMILES string for the molecule. It includes explicit hydrogens and atom indices.
  - `atomic_numbers`: Array of length `N` containing the atomic number of every atom. They are ordered following the indices in the SMILES string.
  - `conformations`: Array of shape `(M, N, 3)` containing the atomic coordinates for every conformation.
  - `formation_energy`: Array of length `M` containing the total energy of each conformation, minus the reference energies of the individual atoms when infinitely separated. This is the most useful energy for most purposes, since it contains all energy components that vary with atom positions but removes the large constant part corresponding to the internal energies of individual atoms.
  - `dft_total_energy`: Array of length `M` containing the energy of each conformation.
  - `dft_total_gradient`: Array of shape `(M, N, 3)` containing the gradient of the energy with respect to the atomic coordinates.
  - `mbis_charges`: Array of shape `(M, N, 1)` containing the MBIS charge of each atom.
  - `mbis_dipoles`: Array of shape `(M, N, 3)` containing the MBIS dipole of each atom.
  - `mbis_quadrupoles`: Array of shape `(M, N, 3, 3)` containing the MBIS quadrupole of each atom.
  - `mbis_octupoles`: Array of shape `(M, N, 3, 3, 3)` containing the MBIS octupole of each atom.
  - `scf_dipoles`: Array of shape `(M, 3)` containing the dipole of each molecule.
  - `scf_quadrupole`: Array of shape `(M, 3, 3)` containing the quadrupole of each molecule.
  - `mayer_indices`: Array of shape `(M, N, N)` containing the Mayer bond indices.
  - `wiberg_lowdin_indices`: Array of shape `(M, N, N)` containing the Wiberg bond indices using orthogonal Löwdin orbitals.
- All values are in atomic units. Distances are in bohr and energies in hartree.
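The layout above can be read with any HDF5 library. The following is a minimal sketch, assuming the `h5py` package is installed; the function names, the `path` and `name` arguments, and the choice of target units (ångström and kJ/mol) are illustrative and not part of the dataset. It reads a few datasets from one group, skipping any that are absent, and converts from atomic units using CODATA 2018 factors.

```python
# Conversion factors (CODATA 2018, rounded).
BOHR_TO_ANGSTROM = 0.529177210903
HARTREE_TO_KJ_PER_MOL = 2625.4996394799


def to_angstrom(length_bohr):
    """Convert a length (float or NumPy array) from bohr to angstrom."""
    return length_bohr * BOHR_TO_ANGSTROM


def to_kj_per_mol(energy_hartree):
    """Convert an energy (float or NumPy array) from hartree to kJ/mol."""
    return energy_hartree * HARTREE_TO_KJ_PER_MOL


def forces_kj_per_mol_per_angstrom(gradient_hartree_per_bohr):
    """Forces are the negative of the energy gradient.

    Converts hartree/bohr to kJ/mol/angstrom.
    """
    return -gradient_hartree_per_bohr * (HARTREE_TO_KJ_PER_MOL / BOHR_TO_ANGSTROM)


def load_group(path, name,
               keys=("conformations", "formation_energy", "dft_total_gradient")):
    """Read the requested datasets from one top-level group.

    `path` is the location of the HDF5 file and `name` is the group name
    (both supplied by the caller). Datasets missing from the group are skipped.
    """
    import h5py  # imported lazily so the helpers above work without it

    data = {}
    with h5py.File(path, "r") as f:
        group = f[name]
        for key in keys:
            if key in group:  # some groups lack some datasets
                data[key] = group[key][()]  # read the whole dataset into memory
    return data
```

Given the returned dictionary, `to_angstrom(data["conformations"])` yields coordinates of shape `(M, N, 3)` in ångström, and `forces_kj_per_mol_per_angstrom(data["dft_total_gradient"])` yields forces of the same shape.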
Files
(11.3 GB)
Checksum | Size |
---|---|
md5:be93706b3bb2b2e327b690b185905856 | 11.3 GB |