Published December 31, 2024 | Version v2
Dataset Open

Datasets for understanding the importance of conformation in property prediction models

  • 1. Nara Institute of Science and Technology
  • 2. Nara Institute of Science and Technology (NAIST)

Description

Descriptor and conformer data sets for molecular property and reaction selectivity prediction tasks. The PQC data set is derived from a subset of the PubChemQC PM6 dataset (J. Chem. Inf. Model. 2020, 60, 12, 5891–5899) and contains two- and three-dimensional descriptors and conformers. The APTC data sets are based on the enantioselectivity data sets for asymmetric phase-transfer catalysts (https://github.com/Laboratoire-de-Chemoinformatique/3D-MIL-QSSR/tree/main/datasets). The melting point data set was created from the Jean-Claude Bradley Double Plus Good (Highly Curated and Validated) Melting Points Dataset (https://doi.org/10.6084/m9.figshare.1031638.v1).

Each data set contains descriptors and conformers for training and validating machine learning models.

Detailed instructions for using these data sets can be found in the GitHub repository: https://github.com/YuHamakawa/Conformation-Importance-ML-Models

Files

Files (9.3 GB)

Name: dataset.zip
Size: 9.3 GB
md5:34959746f624327b7f79813d5a3cc4ca
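
Because the archive is large (9.3 GB), it may be worth confirming that a download is complete before unpacking it. The following is a minimal sketch, not part of the record or the accompanying repository, that compares a local file against the MD5 checksum listed above. The local path `dataset.zip` and the helper names `md5_of_file` and `verify_download` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed for dataset.zip on this record.
EXPECTED_MD5 = "34959746f624327b7f79813d5a3cc4ca"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that the whole archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("dataset.zip")  # assumed location of the downloaded file
    if not archive.exists():
        print(f"{archive} not found")
    elif verify_download(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH: the download may be incomplete or corrupted")
```

Reading in chunks keeps memory use constant regardless of archive size. A mismatch most often indicates an interrupted download, in which case fetching the file again is the usual remedy.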