Published November 14, 2023 | Version v1
Dataset Open

COMP6v2 Release

Description

COmprehensive Machine-learning Potential (COMP6) Benchmark Suite version 2.0

COMP6v2 extends the COMP6 benchmark, which is available in the following repository: https://github.com/isayev/COMP6

COMP6v2 is a data set of properties computed with density functional theory (DFT) for molecules containing H, C, N, O, S, F, and Cl.

It is available at the following levels of theory: 

  • wB97X/631Gd (the data used to train the model in the ANI-2x paper)
  • wB97MD3BJ/def2TZVPP
  • wB97MV/def2TZVPP
  • B973c/def2mTZVP

COMP6v1 is split into the following subsets:

  • ANI-MD
  • DrugBank
  • GDB07to09
  • GDB10to13
  • Tripeptides
  • s66x8

Because this release covers multiple levels of theory, each h5 file combines all of these subsets.

The sample data loader script shows how to access the contents of the data sets using h5py.
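Independently of the sample data loader script, the following is a minimal sketch of how an h5 file could be inspected with h5py. It makes no assumption about the group or dataset names used in COMP6v2 and simply reports whatever the file contains. The function name list_datasets is a placeholder chosen for illustration and is not part of the release.

```python
import h5py


def list_datasets(h5_path):
    """Return a list of (path, shape, dtype) for every dataset in an HDF5 file.

    Hypothetical helper for illustration; it does not rely on any
    particular layout of the COMP6v2 files.
    """
    entries = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file.
        # Only datasets hold array data, so groups are skipped.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, obj.shape, str(obj.dtype)))

    # Open read-only so the downloaded file cannot be modified by accident.
    with h5py.File(h5_path, "r") as f:
        f.visititems(visit)
    return entries
```

Calling list_datasets("path/to/file.h5") on a downloaded file (the path here is a placeholder) and printing each tuple gives a quick overview of the stored arrays before consulting the supplementary information for their meaning and units.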

See the supplementary information (supplementary_information.pdf) for details on the format and contents of the data sets.

Details on how this data was generated can be found in the ANI-2x paper (DOI: 10.1021/acs.jctc.0c00121).

Files

supplementary_information.pdf

Files (538.5 MB)

Checksum Size
md5:90c4167b244cfb03fc0fdbf8ff8f3b64 138.9 MB
md5:c3cdcbe2ef396bd669586d171967093b 183.8 MB
md5:2e5f716c490fb0ad127be5cf0d413bf6 48.9 MB
md5:0a417148966022e72f54c135d8f3d4e7 166.9 MB
md5:93065af62e2e0ebe634f74a0c34d231a 664 Bytes
md5:fd6312a019b70f554374a3bd1271842e 72.9 kB
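The md5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The function names md5_of_file and matches_checksum are placeholders chosen for illustration, and the file is read in chunks so that files of well over 100 MB do not have to fit in memory at once.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a listed value such as 'md5:90c4...'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected_hex = expected.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum("path/to/file.h5", "md5:90c4167b244cfb03fc0fdbf8ff8f3b64") returns True only if the file at that placeholder path has the first checksum in the listing.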

Additional details

Related works

References
Journal: 10.1021/acs.jctc.0c00121 (DOI)

Dates

Available
2023-11-14

References

  • Christian Devereux, Justin S. Smith, Kate K. Huddleston, Kipton Barros, Roman Zubatyuk, Olexandr Isayev, and Adrian E. Roitberg. Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens. Journal of Chemical Theory and Computation 2020, 16 (7), 4192-4202. DOI: 10.1021/acs.jctc.0c00121