XTB2-MolData : Dataset of 12 Million Molecules
Creators
- 1. Univ. Lyon, Université Claude Bernard Lyon 1, CNRS, Institut Lumière Matière, UMR5306, F-69622 Villeurbanne, France
- 2. Univ. Lyon, Université Claude Bernard Lyon 1, CNRS, Institut Lumière Matière, UMR5306, F-69622 Villeurbanne, France
Description
This dataset is an open chemistry database containing optimized molecular geometries and electronic properties calculated with the GFN2-xTB method (C. Bannwarth et al.) for 12.6 million organic molecules composed of C, H, O, and N atoms.
The initial geometries, before optimization with the GFN2-xTB method, are taken from the PubChem PM6 database (Shimazaki et al.).
We also include our Python code for managing a large molecular database. It includes scripts to generate input files for the Gaussian software, to read Gaussian output files, to create a small reduced dataset using a clustering algorithm, and many scripts to analyze the molecular properties included in the database.
The code is also available on GitHub: https://github.com/Castaneche/MolDataFW.
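As an illustration of what generating a Gaussian input file involves, here is a minimal sketch. It is not the MolDataFW API: the function name `gaussian_input`, its parameters, and the route line used in the example are all assumptions chosen for illustration. It only follows the standard Gaussian input layout (route section, blank line, title, blank line, charge and multiplicity, Cartesian coordinates, terminating blank line).

```python
def gaussian_input(symbols, coords, route, title="molecule", charge=0, multiplicity=1):
    """Build the text of a Gaussian input file (hypothetical helper, not from MolDataFW).

    symbols: list of element symbols, e.g. ["O", "H", "H"]
    coords:  list of (x, y, z) tuples in Angstrom, one per atom
    route:   route section, e.g. "# PM6 Opt" (a placeholder; choose your own method)
    """
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")

    # Route section, blank line, title, blank line, then charge and multiplicity.
    lines = [route, "", title, "", f"{charge} {multiplicity}"]

    # One line per atom: element symbol followed by Cartesian coordinates.
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f"{symbol:<2s} {x:14.8f} {y:14.8f} {z:14.8f}")

    # The molecule specification is terminated by a blank line.
    lines.append("")
    return "\n".join(lines) + "\n"


# Example: a water molecule with a placeholder route line.
water = gaussian_input(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.117), (0.0, 0.757, -0.469), (0.0, -0.757, -0.469)],
    route="# PM6 Opt",
    title="water",
)
```

The returned string can be written to a `.gjf` or `.com` file. The level of theory in the route line is a placeholder and does not reflect the settings used by the authors.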
Files
Name | Size | MD5 |
---|---|---|
MolData_XTB2_V1.zip | 34.2 GB | 9adfae897fc6c9c7869b2e7193124d15 |
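Because the archive is large (34.2 GB), it is worth checking the download against the listed MD5 checksum before unpacking it. The following is a minimal sketch using only the Python standard library. The local path and the helper names `md5_of_file` and `verify_archive` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Checksum listed for MolData_XTB2_V1.zip in the file table.
EXPECTED_MD5 = "9adfae897fc6c9c7869b2e7193124d15"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that the whole archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("MolData_XTB2_V1.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.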
Additional details
References
- Tomomi Shimazaki, Masatomo Hashimoto, and Toshiyuki Maeda, J. Chem. Inf. Model. 2020, 60, 12, 5891–5899. https://doi.org/10.1021/acs.jcim.0c00740
- C. Bannwarth, E. Caldeweyher, S. Ehlert, A. Hansen, P. Pracht, J. Seibert, S. Spicher, S. Grimme, WIREs Comput. Mol. Sci. 2020, 11, e01493. https://doi.org/10.1002/wcms.1493