Published November 23, 2020 | Version 2.0
Dataset Open

QM7-X: A comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules

  • 1. Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg, Luxembourg.
  • 2. Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853, USA.
  • 3. Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, IL 60439, USA.

Description

Here, we introduce QM7-X, a comprehensive dataset of more than 40 physicochemical properties for ~4.2 M equilibrium and non-equilibrium structures of small organic molecules with up to seven non-hydrogen (C, N, O, S, Cl) atoms. To span this fundamentally important region of chemical compound space (CCS), QM7-X includes an exhaustive sampling of (meta-)stable equilibrium structures, comprising constitutional/structural isomers and stereoisomers, e.g., enantiomers and diastereomers (including cis-/trans- and conformational isomers), as well as 100 non-equilibrium structural variations of each, for a total of ~4.2 M molecular structures. Computed at the tightly converged quantum-mechanical PBE0+MBD level of theory, QM7-X contains global (molecular) and local (atom-in-a-molecule) properties ranging from ground-state quantities (such as atomization energies and dipole moments) to response quantities (such as polarizability tensors and dispersion coefficients). By providing a systematic, extensive, and tightly converged dataset of quantum-mechanically computed physical and chemical properties, we expect that QM7-X will play a critical role in the development of next-generation machine-learning-based models for exploring greater swaths of CCS and performing in silico design of molecules with targeted properties.

The dataset is provided as eight HDF5-based files (compressed as .xz files). A README file is also provided here, with technical usage details and examples of how to access the information stored in the dataset (see createDB.py).
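Because the HDF5 files are distributed as .xz archives, they need to be decompressed before an HDF5 library can open them. The following is a minimal sketch using only the Python standard library; the function name `decompress_xz` and the example file names are hypothetical and are not taken from the README or createDB.py.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dest, chunk_size=1024 * 1024):
    """Stream-decompress an .xz archive to `dest`.

    The data is copied in chunks so that a multi-gigabyte archive
    does not have to fit in memory. Returns the output path.
    """
    src, dest = Path(src), Path(dest)
    with lzma.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dest


# Hypothetical usage (file names are placeholders, not the real ones):
# decompress_xz("archive.xz", "archive.hdf5")
```

The decompressed file can then be opened with any HDF5 reader; the README and createDB.py remain the authoritative reference for the internal layout of the data.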

The paper describing how the data stored in QM7-X were generated is published in Sci Data 8, 43 (2021). DOI: 10.1038/s41597-021-00812-2. arXiv: https://arxiv.org/abs/2006.15139

Notes

JH, LMS, and AT acknowledge financial support from the European Research Council (ERC-CoG grant BeStMo). BGE and RAD are grateful for support from start-up funding through the College of Arts and Sciences at Cornell University. The results presented in this publication have been partially obtained using the HPC facilities of the University of Luxembourg. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Files

README.txt

Files (9.6 GB)

Checksum                              Size
md5:b50c6a5d0a4493c274368cf22285503e  715.4 MB
md5:4418a813daf5e0d44aa5a26544249ee6  1.0 GB
md5:f7b5aac39a745f11436047c12d1eb24e  2.1 GB
md5:26819601705ef8c14080fa7fc69decd4  1.5 GB
md5:85ac444596b87812aaa9e48d203d0b70  1.1 GB
md5:787fc4a9036af0e67c034a30ad854c07  2.0 GB
md5:5ecce00a188410d06b747cb683d8d347  1.1 GB
md5:c893ae88b8f5c32541c3f024fc1daa45  89.4 MB
md5:52e90f3bc00230602112c4f49cb2c0c7  3.6 kB
md5:5d886ccac38877c8cb26c07704dd1034  39.7 kB
md5:df8cfd48e0e3f565591afe4ee9a0693d  4.4 kB
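
The md5 values listed for each file can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the file path in the usage comment is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Hypothetical usage (the path is a placeholder for a downloaded file):
# matches_listed_md5("downloaded_file.xz", "md5:b50c6a5d0a4493c274368cf22285503e")
```

A result of `False` suggests the file should be downloaded again before decompressing it.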