There is a newer version of the record available.

Published August 30, 2023 | Version 1.0.0
Dataset Open

Dataset for "ConfSolv: Prediction of solute conformer free energies across a range of solvents"

  • 1. Massachusetts Institute of Technology
  • 2. BASF SE Scientific Modelling
  • 3. Katholieke Universiteit Leuven

Description

This dataset contains two archives. The first archive, full_dataset.zip, contains geometries and free energies for nearly 44,000 solute molecules with almost 9 million conformers, in 42 different solvents. The geometries and gas-phase free energies are computed using density functional theory (DFT). The solvation free energy of each conformer is computed using COSMO-RS, and the solution free energy is the sum of the gas-phase free energy and the solvation free energy. The geometries of the solute conformers are provided as ASE Atoms objects within a pandas DataFrame, found in the compressed file dft_coords.pkl.gz within full_dataset.zip. The gas-phase free energies, solvation free energies, and solution free energies are provided as a pandas DataFrame in the compressed file free_energy.pkl.gz, also within full_dataset.zip. Ten example data splits for both random and scaffold split types are also provided in the ZIP archive for training models. Scaffold split index 0 is used to generate the results in the corresponding publication.
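As a rough illustration of how the gzip-compressed pickle files described above might be read, the sketch below loads a DataFrame with pandas and recomputes a solution free energy as the sum of a gas-phase and a solvation term. The column names (G_gas, dG_solv, G_soln_check), the helper names, and the toy values are hypothetical; the actual column layout of free_energy.pkl.gz is not documented here and should be inspected after loading. Reading dft_coords.pkl.gz would additionally require the ASE package to be installed, since the stored Atoms objects are unpickled on load, and pickle files should only be opened when they come from a trusted source.

```python
import os
import tempfile

import pandas as pd


def load_frame(path):
    """Read a gzip-compressed pickled pandas DataFrame (e.g. a *.pkl.gz file)."""
    return pd.read_pickle(path, compression="gzip")


def add_solution_free_energy(df, gas_col, solv_col, out_col="G_soln_check"):
    """Return a copy of df with gas-phase + solvation free energy in out_col."""
    out = df.copy()
    out[out_col] = out[gas_col] + out[solv_col]
    return out


# Toy stand-in for free_energy.pkl.gz: column names and values are made up.
toy = pd.DataFrame({"G_gas": [0.0, 1.5], "dG_solv": [-2.0, -0.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_free_energy.pkl.gz")
    toy.to_pickle(path, compression="gzip")  # write in the same container format
    loaded = load_frame(path)                # read it back

result = add_solution_free_energy(loaded, "G_gas", "dG_solv")
```

With the real archive extracted, the same load_frame call could be pointed at free_energy.pkl.gz, after which loaded.columns shows the names to pass in place of the hypothetical ones used here.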

The second archive, refined_conf_search.zip, contains geometries and free energies for a representative sample of 28 solute molecules from the full dataset that were subjected to a refined conformer search, which located more conformers for them. The format of the data is identical to that of full_dataset.zip.

Files

full_dataset.zip

Files (11.5 GB)

md5:f0f27186f9e5091ea17604ddb25085f1  11.4 GB
md5:6f65c02802997c13ca1d3d7d8343804f  120.4 MB