Published June 30, 2020 | Version 1
Dataset Open

H2CO Dataset

Description

The deposited data sets were used to compare three state-of-the-art machine
learning (ML) approaches for representing potential energy surfaces (PESs).
The comparison is meant to be representative: it covers a purely kernel-based
approach (reproducing kernel Hilbert space plus forces, RKHS+F) [1], a purely
neural-network-based approach (PhysNet) [2], and the FCHL representation [3]
used within kernel ridge regression. Formaldehyde, H2CO, is used as the
benchmark system.

H2CO is a small molecule for which PESs can be calculated at different levels
of theory, which makes it suitable for an in-depth theoretical study. In
addition, very high-level calculations have already been presented (see e.g.
Ref. [4]) and experimental reference data are available for comparison [5].

ML models are trained with each of the three methods on reference data
calculated at three different levels of quantum chemical theory
(B3LYP/cc-pVDZ, MP2/aug-cc-pVTZ and CCSD(T)-F12/aug-cc-pVTZ-F12). The
performance of the models is then examined by considering energy and force
learning curves, harmonic frequencies, and IR spectra from finite-temperature
molecular dynamics (MD) simulations.

The data sets contain different geometries for the H2CO molecule generated using
the normal mode sampling approach [6] performed at different temperatures. Four
data sets are deposited:


i)   "h2co_B3LYP_cc-pVDZ_4001.npz": 4001 geometries of H2CO generated using normal mode
     sampling and calculated using ORCA [7] (B3LYP/cc-pVDZ).
ii)  "h2co_mp2_avtz_4001.npz": 4001 geometries of H2CO generated using normal mode
     sampling and calculated using MOLPRO 2019 [8] (MP2/aug-cc-pVTZ).
iii) "h2co_ccsdt_avtz_4001.npz": 4001 geometries of H2CO generated using normal mode
     sampling and calculated using MOLPRO 2019 [8] (CCSD(T)-F12/aug-cc-pVTZ-F12).
iv)  "h2co_ccsdt_avtz_2500_extrapol.npz": 2500 geometries of H2CO generated using normal mode
     sampling and calculated using MOLPRO 2019 [8] (CCSD(T)-F12/aug-cc-pVTZ-F12). This sampling
     was carried out at a higher temperature (5000 K compared to 2000 K) to test the extrapolation
     ability of the ML methods.
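
The deposited files are NumPy .npz archives, so their contents can be inspected
before use. The following is a minimal sketch, not part of the original
deposit: the helper name summarize_npz is illustrative, and the names of the
arrays stored in each archive are not documented here, so the code lists
whatever it finds rather than assuming particular keys. The file is assumed to
have been downloaded into the working directory.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz archive."""
    # np.load on an .npz file returns a lazy NpzFile; the context manager closes it.
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


if __name__ == "__main__":
    # File name taken from the list above; its location is an assumption.
    path = "h2co_B3LYP_cc-pVDZ_4001.npz"
    if os.path.exists(path):
        for name, (shape, dtype) in summarize_npz(path).items():
            print(f"{name}: shape={shape}, dtype={dtype}")
    else:
        print(f"{path} not found; download it from the Zenodo record first")
```

Listing the stored arrays first avoids guessing key names and makes it easy to
check that the number of geometries matches the count given for each file.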

For more details, see http://arxiv.org/abs/2006.16752

---------------------------------------------------------------------------------------
HOW TO CITE:

When using this dataset, please cite the following paper:
Käser, S. and Koner, D. and Christensen, A. S. and von Lilienfeld, O. A. and Meuwly, M.
"ML Models of Vibrating H2CO: Comparing Reproducing Kernels, FCHL and PhysNet"
arXiv:2006.16752

and the dataset itself via its digital object identifier (DOI):
Käser, S. and Koner, D. and Christensen, A. S. and von Lilienfeld, O. A. and Meuwly, M. (2020).
H2CO Dataset. Zenodo. http://doi.org/10.5281/zenodo.3923823

---------------------------------------------------------------------------------------

[1] Koner, D.; Meuwly, M. arXiv e-prints 2020, arXiv:2005.04667
[2] Unke, O. T.; Meuwly, M. J. Chem. Theory Comput. 2019, 15, 3678–3693
[3] Faber, F. A.; Christensen, A. S.; Huang, B.; von Lilienfeld, O. A. J. Chem. Phys. 2018, 148, 241717
[4] Zhang, X.; Zou, S.; Harding, L. B.; Bowman, J. M. J. Phys. Chem. A 2004, 108, 8980–8986
[5] Herndon, S. C.; Nelson Jr, D. D.; Li, Y.; Zahniser, M. S. J. Quant. Spectrosc. Radiat. Transf. 2005, 90, 207–216
[6] Smith, J. S.; Isayev, O.; Roitberg, A. E. Sci. Data 2017, 4, 170193
[7] Neese, F. The ORCA program system. WIREs Comput. Mol. Sci. 2012, 2, 73–78
[8] Werner, H.-J.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Schütz, M.; et al. https://www.molpro.net

Files

Files (2.9 MB)

Checksum                              Size
md5:65247051f74418360e04333163f6b289  786.0 kB
md5:fb8c8e08e7fbca44ccbe3020e944d823  507.8 kB
md5:de3b3d5be02760747d96494462653951  803.1 kB
md5:95c93047a56d788ee9c7fb6fddc85a2d  803.1 kB
md5:0b4a10a9f1c38c6eaedea770dc5e6d0d  1.6 kB
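
The md5 checksums listed above can be used to confirm that a downloaded file is
intact. The following is a minimal sketch using Python's standard hashlib
module; the function names md5_of_file and matches_checksum are illustrative
and not part of the deposit. The listing does not pair checksums with file
names, so the expected value has to be chosen by the reader from the record
page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's md5 digest with an expected value.

    The expected value may carry the "md5:" prefix used in the listing.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as matches_checksum("h2co_mp2_avtz_4001.npz", "md5:...") returns
True only if the digest of the local file equals the value copied from the
record; the elided checksum is left for the reader to fill in.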

Additional details

Related works

Is documented by
Preprint: arXiv:2006.16752 (arXiv)