DATA SET FORMAT:
The data are saved in a compressed NumPy binary file (.npz), which
holds seven NumPy arrays that are accessed by key, like a Python dictionary:
R: Cartesian coordinates of atoms (in Angstrom [A]), (num_data, 4, 3)
Q: Total charge (in elementary charges [e]), (num_data,)
D: Dipole moment vector with respect to the origin (in elementary charges times
Angstrom [eA]), (num_data, 3)
E: Potential energy with respect to free atoms (in electronvolt [eV]), (num_data,)
F: Forces acting on atoms (in electronvolt per Angstrom [eV/A]), (num_data, 4, 3)
Z: Atomic number of atoms, (num_data, 4)
N: Number of atoms in structure, (num_data,)
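The shapes listed above can be checked programmatically before the data are used for training. The following is a minimal sketch under stated assumptions: the helper name check_format is hypothetical, and num_atoms defaults to 4 because every H2CO structure in this set has four atoms. It works both on the object returned by np.load and on a plain dictionary of arrays.

```python
import numpy as np

def check_format(data, num_atoms=4):
    """Hypothetical helper: verify that each array has the documented shape.

    Returns num_data on success, raises ValueError otherwise.
    """
    num_data = data["E"].shape[0]  # number of structures, taken from E
    expected = {
        "R": (num_data, num_atoms, 3),  # coordinates [A]
        "Q": (num_data,),               # total charge [e]
        "D": (num_data, 3),             # dipole moment [eA]
        "E": (num_data,),               # energy w.r.t. free atoms [eV]
        "F": (num_data, num_atoms, 3),  # forces [eV/A]
        "Z": (num_data, num_atoms),     # atomic numbers
        "N": (num_data,),               # number of atoms per structure
    }
    for key, shape in expected.items():
        actual = np.shape(data[key])
        if actual != shape:
            raise ValueError(f"{key}: expected shape {shape}, got {actual}")
    return num_data
```

For example, check_format(np.load("h2co_B3LYP_cc-pVDZ_4001.npz")) should return the number of structures in that file if it matches the description above.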
Note that the potential energies of the structures are given with
respect to free atoms: for each occurrence of an atom in a structure,
the corresponding constant listed below is subtracted from the total
energy in the Orca/Molpro output. The constants are given in hartree,
whereas E is stored in eV:
For B3LYP:
H: -0.497858658764 hartree
C: -37.830617391474 hartree
O: -75.039041613326 hartree
For MP2:
H: -0.499821176024 hartree
C: -37.759560677467 hartree
O: -74.959294141352 hartree
For CCSD(T)-F12:
H: -0.499946213283 hartree
C: -37.788204984713 hartree
O: -75.000839553994 hartree
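The subtraction described above amounts to E = (E_total - sum over atoms of E_atom) converted to eV. A minimal sketch follows, assuming the conversion factor 1 hartree = 27.211386245988 eV (CODATA 2018); the data set description does not state which factor was actually used, so small differences from the stored E values are possible. The names FREE_ATOM_ENERGIES and energy_wrt_free_atoms are hypothetical; the constants are copied from the table above.

```python
# Assumed conversion factor (CODATA 2018); not stated in the data set description.
HARTREE_TO_EV = 27.211386245988

# Free-atom energies in hartree, keyed by method and atomic number (H=1, C=6, O=8).
FREE_ATOM_ENERGIES = {
    "B3LYP": {1: -0.497858658764, 6: -37.830617391474, 8: -75.039041613326},
    "MP2": {1: -0.499821176024, 6: -37.759560677467, 8: -74.959294141352},
    "CCSD(T)-F12": {1: -0.499946213283, 6: -37.788204984713, 8: -75.000839553994},
}

def energy_wrt_free_atoms(total_energy_hartree, Z, method):
    """Hypothetical helper: subtract one free-atom energy per atom in Z,
    then convert the result from hartree to eV."""
    ref = FREE_ATOM_ENERGIES[method]
    atomic_sum = sum(ref[int(z)] for z in Z)  # one constant per atom occurrence
    return (total_energy_hartree - atomic_sum) * HARTREE_TO_EV
```

For H2CO, Z would be [6, 8, 1, 1], so the C and O constants are subtracted once and the H constant twice.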
---------------------------------------------------------------------------------------
ACCESS DATA SET:
The data set can be loaded with NumPy in Python:
>>> import numpy as np
>>> data = np.load("h2co_B3LYP_cc-pVDZ_4001.npz")
The keys of the stored arrays can be listed using
>>> data.files
['Q', 'D', 'F', 'Z', 'R', 'E', 'N']
and the individual arrays are loaded by indexing with the corresponding
key, e.g. for the energy:
>>> energies = data["E"]
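The same key-based access can be used to pull out a single structure. The sketch below is one possible way to do this, not part of the original data set tooling: the helper name load_structure is hypothetical, and it uses N to slice the per-atom arrays so that it would also work if structures had fewer atoms than the array dimension (for H2CO, N is always 4).

```python
import numpy as np

def load_structure(path, i):
    """Hypothetical helper: return (Z, R, F, E) for structure i of an .npz file."""
    with np.load(path) as data:  # NpzFile supports the context-manager protocol
        n = int(data["N"][i])    # number of atoms in structure i
        Z = data["Z"][i, :n]     # atomic numbers, shape (n,)
        R = data["R"][i, :n]     # Cartesian coordinates in A, shape (n, 3)
        F = data["F"][i, :n]     # forces in eV/A, shape (n, 3)
        E = float(data["E"][i])  # energy w.r.t. free atoms in eV
    return Z, R, F, E
```

Usage: Z, R, F, E = load_structure("h2co_B3LYP_cc-pVDZ_4001.npz", 0). Arrays read from an NpzFile are copied into memory, so they remain valid after the file is closed.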