Dataset for "ConfSolv: Prediction of solute conformer free energies across a range of solvents"
Creators
- 1. Massachusetts Institute of Technology
- 2. BASF SE Scientific Modelling
- 3. Katholieke Universiteit Leuven
Description
This dataset contains two archives. The first archive, full_dataset.zip, contains geometries and free energies for nearly 44,000 solute molecules, totaling almost 9 million conformers, in 42 different solvents. The geometries and gas-phase free energies are computed using density functional theory (DFT). The solvation free energy of each conformer is computed using COSMO-RS, and each solution free energy is the sum of the gas-phase free energy and the solvation free energy. The geometries for each solute conformer are provided as ASE Atoms objects within a pandas DataFrame, stored in the compressed file dft_coords.pkl.gz within full_dataset.zip. The gas-phase free energies, solvation free energies, and solution free energies are likewise provided as a pandas DataFrame in the compressed file free_energy.pkl.gz within full_dataset.zip. Ten example data splits for each of the random and scaffold split types are also provided in the ZIP archive for training models. Scaffold split index 0 was used to generate the results in the corresponding publication.
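The relationship between the three energy quantities described above can be sketched in a few lines of Python. This is only an illustration: the class name, field names, and numeric values are hypothetical and are not taken from the dataset, and reporting energies relative to the lowest conformer is a common convention assumed here rather than something stated in this description. Units only need to be consistent across the inputs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ConformerEnergies:
    """Energies of one conformer in one solvent (hypothetical container)."""

    gas_free_energy: float        # gas-phase free energy (DFT in the dataset)
    solvation_free_energy: float  # solvation free energy (COSMO-RS in the dataset)

    @property
    def solution_free_energy(self) -> float:
        # Solution free energy = gas-phase free energy + solvation free energy.
        return self.gas_free_energy + self.solvation_free_energy


def relative_solution_free_energies(
    conformers: Sequence[ConformerEnergies],
) -> List[float]:
    """Return each conformer's solution free energy relative to the lowest one."""
    totals = [c.solution_free_energy for c in conformers]
    if not totals:
        return []
    lowest = min(totals)
    return [total - lowest for total in totals]


# Made-up illustrative values, not data from the archive.
EXAMPLE = [
    ConformerEnergies(gas_free_energy=0.0, solvation_free_energy=-5.0),
    ConformerEnergies(gas_free_energy=1.5, solvation_free_energy=-7.0),
    ConformerEnergies(gas_free_energy=0.5, solvation_free_energy=-5.25),
]
```

In this made-up example the second conformer is the highest in the gas phase but the lowest in solution, which is the kind of solvent-dependent reordering the solvation term can produce.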
The second archive, refined_conf_search.zip, contains geometries and free energies for a representative sample of 28 solute molecules from the full dataset that were subjected to a refined conformer search, which located more conformers for each of them. The format of the data is identical to that of full_dataset.zip.
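Because both archives share the same layout, one loader can serve either of them. The following is a minimal sketch, assuming the tables are ordinary gzip-compressed pickles and that each file name occurs exactly once inside the archive. The exact member paths inside the ZIP files are not given in this description, so the helper searches by base name. The function names are hypothetical. Unpickling the real files additionally requires pandas and ASE to be installed, since the stored objects are a pandas DataFrame holding ASE Atoms objects.

```python
import gzip
import pickle
import zipfile


def find_member(archive: zipfile.ZipFile, basename: str) -> str:
    """Return the single archive member whose file name equals `basename`."""
    matches = [
        name
        for name in archive.namelist()
        if name == basename or name.endswith("/" + basename)
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one member named {basename!r}, found {matches}"
        )
    return matches[0]


def load_pickled_table(zip_path: str, basename: str):
    """Load a gzip-compressed pickle from a ZIP archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        member = find_member(archive, basename)
        # Stream: ZIP member -> gzip decompression -> pickle deserialization.
        with archive.open(member) as raw, gzip.GzipFile(fileobj=raw) as stream:
            return pickle.load(stream)
```

A call such as `load_pickled_table("full_dataset.zip", "free_energy.pkl.gz")` would then return the stored object. Extracting the archive first and calling `pandas.read_pickle` on the `.pkl.gz` file is an equivalent route. Inspecting `.columns` on the result is the safest way to learn the column names, which are not listed here.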
Files
Total size: 11.5 GB
Name | MD5 checksum | Size |
---|---|---|
full_dataset.zip | f0f27186f9e5091ea17604ddb25085f1 | 11.4 GB |
refined_conf_search.zip | 6f65c02802997c13ca1d3d7d8343804f | 120.4 MB |
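The MD5 checksums in the file listing can be used to confirm that a download completed without corruption. The sketch below uses only Python's standard `hashlib` module and reads the file in chunks, so the 11.4 GB archive does not have to fit in memory. The function names and the chunk size are illustrative choices.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `verify_md5("full_dataset.zip", "f0f27186f9e5091ea17604ddb25085f1")` should return `True` for an intact download, assuming that checksum belongs to the 11.4 GB archive, a pairing inferred here from the file sizes.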