Published November 23, 2020 | Version 2.0
Dataset Open

QM7-X: A comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules

  • 1. Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg, Luxembourg.
  • 2. Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853, USA.
  • 3. Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, IL 60439, USA.


Here, we introduce QM7-X, a comprehensive dataset of > 40 physicochemical properties for ~4.2 M equilibrium and non-equilibrium structures of small organic molecules with up to seven non-hydrogen (C, N, O, S, Cl) atoms. To span this fundamentally important region of chemical compound space (CCS), QM7-X includes an exhaustive sampling of (meta-)stable equilibrium structures, comprising constitutional/structural isomers and stereoisomers, e.g., enantiomers and diastereomers (including cis-/trans- and conformational isomers), as well as 100 non-equilibrium structural variations thereof, to reach a total of ~4.2 M molecular structures. Computed at the tightly converged quantum-mechanical PBE0+MBD level of theory, QM7-X contains global (molecular) and local (atom-in-a-molecule) properties ranging from ground-state quantities (such as atomization energies and dipole moments) to response quantities (such as polarizability tensors and dispersion coefficients). By providing a systematic, extensive, and tightly converged dataset of quantum-mechanically computed physical and chemical properties, we expect that QM7-X will play a critical role in the development of next-generation machine-learning-based models for exploring greater swaths of CCS and performing in silico design of molecules with targeted properties.

The dataset is provided in eight HDF5-based files (compressed as .xz files). A README file with technical usage details and examples of how to access the information stored in the dataset is also provided here (see 

*The paper explaining the generation of the data stored in QM7-X can be found in Sci Data 8, 43 (2021). DOI: 10.1038/s41597-021-00812-2. arXiv: .
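
Since the data files are HDF5-based and distributed as .xz archives, a first look at one of them could go roughly as sketched below. This is a minimal sketch and not the procedure from the README: the helper names `decompress_xz` and `list_datasets` are hypothetical, the output suffix `.hdf5` is an arbitrary choice, and the second helper assumes the third-party `h5py` package is installed. It makes no assumption about the internal group layout and simply walks whatever hierarchy the file contains.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dst=None):
    """Stream-decompress an .xz archive to disk and return the output path."""
    src = Path(src)
    # Default output name (an assumption): swap the ".xz" suffix for ".hdf5".
    dst = Path(dst) if dst is not None else src.with_suffix(".hdf5")
    # Copy in 1 MiB chunks so multi-GB archives never sit fully in memory.
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dst


def list_datasets(hdf5_path, limit=20):
    """Return (name, shape, dtype) for the first `limit` datasets in the file."""
    import h5py  # third-party; imported lazily so decompression works without it

    found = []

    def visit(name, obj):
        # Groups are skipped; only array-like datasets are recorded.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))
        # Returning a non-None value tells h5py to stop the traversal early.
        return True if len(found) >= limit else None

    with h5py.File(hdf5_path, "r") as f:
        f.visititems(visit)
    return found
```

Calling `decompress_xz` on a downloaded archive and then passing the returned path to `list_datasets` prints nothing by itself; it returns a short list of dataset paths, shapes, and dtypes that can be compared against the property names documented in the README. The `limit` argument keeps the traversal short, which matters for files holding a very large number of structures.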


JH, LMS, and AT acknowledge financial support from the European Research Council (ERC-CoG grant BeStMo). BGE and RAD are grateful for support from start-up funding through the College of Arts and Sciences at Cornell University. The results presented in this publication have been partially obtained using the HPC facilities of the University of Luxembourg. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.



Files (9.6 GB)

Size
715.4 MB
1.0 GB
2.1 GB
1.5 GB
1.1 GB
2.0 GB
1.1 GB
89.4 MB
3.6 kB
39.7 kB
4.4 kB