======= GENERAL INFORMATION =======
This is the dataset used to train the PhysNet model in "Reactive Atomistic Simulations of Diels-Alder Reactions: the Importance of Molecular Rotations". It contains energies, forces and dipole moments calculated at the M06-2X/6-31G* level of theory for all 378 possible "amon" [1] structures of 2,3-dibromo-1,3-butadiene (DBB) and maleic anhydride (MA). Multiple geometries of each structure were sampled by running Langevin dynamics at 1000 K at the PM7 level of theory. Additional geometries were generated by adaptive sampling [2,3]. In total, the dataset contains 224483 data points.
For more details, see https://arxiv.org/abs/1906.07455.
[1] Huang, B. and von Lilienfeld, O. A., arXiv:1707.04146 (2017)
[2] Behler, J., J. Phys.: Condens. Matter 26, 183001 (2014)
[3] Behler, J., Int. J. Quantum Chem. 115, 1032 (2015)
======= HOW TO CITE? =======
When using this dataset, please cite the following papers:
Rivero, U.; Unke, O. T.; Meuwly, M. and Willitsch, S. "Reactive Atomistic Simulations of Diels-Alder Reactions: the Importance of Molecular Rotations" arXiv:1906.07455 (2019).
Unke, O. T. and Meuwly, M. "PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments and Partial Charges" J. Chem. Theory Comput. 2019, 15(6), 3678-3693.
and the dataset itself via its digital object identifier (DOI):
Rivero, U.; Unke, O. T.; Meuwly, M. and Willitsch, S. (2019). Diels-Alder reactions dataset. Zenodo. http://doi.org/10.5281/zenodo.3291503.
======= DATA FORMAT =======
The dataset is stored as a Python dictionary in a compressed NumPy binary file (.npz). The dictionary contains seven NumPy arrays:
R (num_data, max_atoms, 3): Cartesian coordinates of nuclei (in Angstrom [A])
Q (num_data,): Total charge (in elementary charges [e])
D (num_data, 3): Dipole moment vector with respect to the origin (in elementary charges times Angstrom [eA])
E (num_data,): Potential energy with respect to free atoms (in electronvolt [eV])
F (num_data, max_atoms, 3): Forces acting on the nuclei (in electronvolt per Angstrom [eV/A])
Z (num_data, max_atoms): Nuclear charges/atomic numbers of nuclei
N (num_data,): Number of atoms in each structure (structures consisting of fewer than max_atoms atoms are zero-padded)
Please note that the potential energy is given with respect to free atoms (i.e. total atomization).
The following constants were subtracted from the original ab initio values for each occurrence of the corresponding element:
H : -13.514961470009688 eV
C : -1029.3784600921310 eV
O : -2041.7274541382831 eV
Br: -69980.227285891879 eV
In order to recover the original ab initio values, simply add the constants back.
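As a minimal sketch of this step, assuming the arrays E and Z have the shapes listed above and that zero-padded entries of Z are 0, the constants can be added back per atom as shown here. The names ATOM_REFERENCE_EV and recover_ab_initio_energies are illustrative only and are not part of the dataset or of "read_data.py":

```python
import numpy as np

# Per-element constants (in eV), copied from the list above and keyed by
# atomic number. The dictionary name is illustrative, not part of the dataset.
ATOM_REFERENCE_EV = {
    1: -13.514961470009688,   # H
    6: -1029.3784600921310,   # C
    8: -2041.7274541382831,   # O
    35: -69980.227285891879,  # Br
}


def recover_ab_initio_energies(E, Z):
    """Add the per-atom constants back to energies given relative to free atoms.

    E: array of shape (num_data,), energies in eV.
    Z: array of shape (num_data, max_atoms), atomic numbers.
       Assumption: padded entries are 0 and therefore contribute nothing.
    """
    # Lookup table indexed by atomic number; index 0 (padding) stays 0.0.
    lookup = np.zeros(max(ATOM_REFERENCE_EV) + 1)
    for atomic_number, energy in ATOM_REFERENCE_EV.items():
        lookup[atomic_number] = energy
    Z = np.asarray(Z, dtype=int)
    # Sum the constants over all atoms of each structure and add them to E.
    return np.asarray(E, dtype=float) + lookup[Z].sum(axis=1)
```

With the dataset loaded as "data", this would be called as recover_ab_initio_energies(data["E"], data["Z"]).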
To read the dataset, load the dictionary with Python:
>>> import numpy as np
>>> data = np.load("diels-alder_reactions.npz")
and access individual entries with the appropriate dictionary key, e.g. "Z" for the nuclear charges:
>>> nuclear_charges = data["Z"]
See also "read_data.py" for a more comprehensive example.
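Because R, F and Z are padded to max_atoms, it is often convenient to strip the padding when working with a single structure. The following is a minimal sketch under two assumptions: the arrays follow the layout described above, and the padding occupies the trailing entries along the atom axis. The helper name get_structure is hypothetical and is not taken from "read_data.py":

```python
import numpy as np


def get_structure(data, index):
    """Return one structure from the dataset with the zero-padding removed.

    data: dictionary-like object holding the arrays R, Q, D, E, F, Z and N
          (for example the object returned by np.load).
    index: position of the structure along the first (num_data) axis.
    Assumption: padding occupies the trailing entries along the atom axis.
    """
    n = int(data["N"][index])  # number of real atoms in this structure
    return {
        "Z": data["Z"][index, :n],   # atomic numbers, shape (n,)
        "R": data["R"][index, :n],   # coordinates in Angstrom, shape (n, 3)
        "F": data["F"][index, :n],   # forces in eV/Angstrom, shape (n, 3)
        "E": float(data["E"][index]),  # energy in eV
        "Q": float(data["Q"][index]),  # total charge in e
        "D": data["D"][index],       # dipole moment in e*Angstrom, shape (3,)
    }
```

For example, get_structure(data, 0)["R"] would give the coordinates of the first structure with shape (N[0], 3).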