
There is a newer version of the record available.

Published June 17, 2023 | Version 1.0
Dataset | Open access

Data sets and machine learning models for: Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates

Affiliation: Massachusetts Institute of Technology

Description

This record contains the datasets and final machine learning model files for the manuscript "Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates". Please cite the manuscript directly:

  • Chung, Y.; Green, W. H. Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates. ChemRxiv 2023, doi: 10.26434/chemrxiv-2023-f20bg

To use the machine learning models, see the sample files and instructions at https://github.com/yunsiechung/chemprop/tree/RxnSolvKSE_ML

Detailed information can be found in the README.md file.


Details on the files

In the pretraining and fine-tuning CSV files, the columns are:

  1. rxn_smiles: atom-mapped reaction SMILES
  2. solvent_smiles: solvent SMILES
  3. ddGsolv: solvation free energy of activation of a reaction-solvent pair at 298 K in kcal/mol (main prediction target)
  4. ddHsolv: solvation enthalpy of activation of a reaction-solvent pair at 298 K in kcal/mol (main prediction target)
  5. dGsolv_reactant: solvation free energy of reactant(s) at 298 K in kcal/mol (additional feature)
  6. dGsolv_product: solvation free energy of product(s) at 298 K in kcal/mol (additional feature)
  7. dHsolv_reactant: solvation enthalpy of reactant(s) at 298 K in kcal/mol (additional feature)
  8. dHsolv_product: solvation enthalpy of product(s) at 298 K in kcal/mol (additional feature)
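As a minimal sketch of how these columns might be read with Python's standard csv module, the helper below parses each row into a small record. It assumes a header row that uses exactly the column names listed above (not verified for every file in the archive) and requires the two main targets, so it suits the files that contain ddGsolv and ddHsolv. `RxnSolventRecord` and `read_records` are hypothetical names introduced here, not part of the dataset or of Chemprop.

```python
import csv
from dataclasses import dataclass

# Additional-feature columns as listed above (kcal/mol at 298 K).
FEATURE_COLUMNS = ("dGsolv_reactant", "dGsolv_product",
                   "dHsolv_reactant", "dHsolv_product")


@dataclass
class RxnSolventRecord:
    """Hypothetical container for one reaction-solvent row."""
    rxn_smiles: str      # atom-mapped reaction SMILES
    solvent_smiles: str  # solvent SMILES
    ddGsolv: float       # main target, kcal/mol at 298 K
    ddHsolv: float       # main target, kcal/mol at 298 K
    features: dict       # whichever additional features the file has


def read_records(lines):
    """Yield one RxnSolventRecord per CSV row.

    `lines` can be an open text file or any iterable of CSV lines.
    Assumes the header uses the column names listed above; feature
    columns absent from the header are simply skipped.
    """
    for row in csv.DictReader(lines):
        yield RxnSolventRecord(
            rxn_smiles=row["rxn_smiles"],
            solvent_smiles=row["solvent_smiles"],
            ddGsolv=float(row["ddGsolv"]),
            ddHsolv=float(row["ddHsolv"]),
            features={c: float(row[c]) for c in FEATURE_COLUMNS if c in row},
        )
```

For example, after unzipping, `with open("pretraining_rxn_solvent_ddGsolv_ddHsolv_with_features_all.csv", newline="") as f: records = list(read_records(f))` would load every row of that file into memory.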

Data sets under 'RxnSolvKSE_dataset_v1.0.zip'

  • pretraining_set: contains the dataset used for pre-training
    • all_data: contains all calculated data
      • pretraining_rxn_solvent_ddGsolv_ddHsolv_with_features_all.csv: contains both main prediction targets and additional features for reaction-solvent pairs
      • pretraining_solvent_info.csv: list of all solvents
      • pretraining_unique_rxn.csv: list of all reactions, both forward and reverse directions
    • chosen_500k_data: contains the chosen subset of 500k data points
      • pretraining_rxn_solvent_ddGsolv_ddHsolv_500k.csv: contains main prediction targets for reaction-solvent pairs
      • pretraining_features_react_prod_dGsolv_dHsolv_500k.csv: contains additional features for reaction-solvent pairs
  • finetuning_set: contains the dataset used for fine-tuning
    • all_data: contains all calculated data
      • finetuning_rxn_solvent_ddGsolv_ddHsolv_with_features_all.csv: contains both main prediction targets and additional features for reaction-solvent pairs. The rxn_key column indicates whether the reaction is a bimolecular hydrogen abstraction (bihabs), a unimolecular hydrogen migration (intrahabs), or a radical addition to a multiple bond (raddition); 'fwd' and 'rev' denote the forward and reverse reactions, respectively.
      • finetuning_solvent_info.csv: list of all solvents
      • finetuning_unique_rxn.csv: list of all reactions, both forward and reverse directions
    • chosen_data: contains the chosen subset of the data
      • finetuning_rxn_solvent_ddGsolv_ddHsolv_chosen.csv: contains main prediction targets for reaction-solvent pairs
      • finetuning_features_react_prod_dGsolv_dHsolv_chosen.csv: contains additional features for reaction-solvent pairs
  • experimental_set/expt_rxn_atom_mapped_smiles.csv: contains the atom-mapped reaction SMILES used for the experimental data. The original experimental data can be found at https://zenodo.org/record/7747557.
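The rxn_key column of the fine-tuning file encodes the reaction family and direction. Its exact layout is not documented here, so the hypothetical `parse_rxn_key` helper below only assumes that each key contains one family label (bihabs, intrahabs or raddition) and one of 'fwd' or 'rev' as substrings. It is a sketch for grouping rows, not a specification of the key format.

```python
# Reaction families named in the description of the rxn_key column.
FAMILIES = {
    "bihabs": "bimolecular hydrogen abstraction",
    "intrahabs": "unimolecular hydrogen migration",
    "raddition": "radical addition to a multiple bond",
}


def parse_rxn_key(rxn_key):
    """Return (family, direction) for an rxn_key value.

    Hypothetical helper: assumes the key contains a family label and
    'fwd' or 'rev' as substrings. Unrecognised parts come back as None.
    """
    key = rxn_key.lower()
    # None of the three labels is a substring of another, so the first match is unambiguous.
    family = next((name for name in FAMILIES if name in key), None)
    if "fwd" in key:
        direction = "forward"
    elif "rev" in key:
        direction = "reverse"
    else:
        direction = None
    return family, direction
```

Combined with a CSV reader, this makes it straightforward to, for instance, count how many fine-tuning rows fall into each family and direction.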

Machine learning model files under 'RxnSolvKSE_ML_model_files.zip'

  • Contains the Chemprop machine learning model files for predicting ddGsolv and ddHsolv for a reaction-solvent pair. The models take atom-mapped reaction SMILES and solvent SMILES as inputs.
  • To use these ML models, see the sample files and instructions at https://github.com/yunsiechung/chemprop/tree/RxnSolvKSE_ML
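To connect a predicted ddGsolv to a solvent effect on a rate constant, the sketch below uses standard transition state theory, where ln(k_solution / k_gas) = -ddGsolv / RT. It assumes that ddGsolv equals the activation free energy in solution minus that in the gas phase, with consistent standard states in both phases. It also shows one simple way to use ddHsolv: a linear extrapolation of ddGsolv away from 298 K that assumes ddHsolv and ddSsolv are temperature independent. Both functions are illustrative assumptions, not necessarily the procedure used in the manuscript.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)
T_REF = 298.0                 # K, temperature of the tabulated values


def ddGsolv_at(T, ddGsolv_298, ddHsolv_298):
    """Estimate ddGsolv (kcal/mol) at temperature T (K).

    Assumes temperature-independent ddHsolv and ddSsolv, with
    ddSsolv = (ddHsolv - ddGsolv) / 298 K, so ddG(T) = ddH - T * ddS.
    """
    ddSsolv = (ddHsolv_298 - ddGsolv_298) / T_REF
    return ddHsolv_298 - T * ddSsolv


def log10_rate_ratio(ddGsolv, T=T_REF):
    """Return log10(k_solution / k_gas) from transition state theory.

    Assumes ddGsolv = dG_act(solution) - dG_act(gas) in kcal/mol,
    so a negative ddGsolv means the solvent accelerates the reaction.
    """
    return -ddGsolv / (R_KCAL * T * math.log(10))
```

As a rough scale, a ddGsolv of -1 kcal/mol at 298 K gives log10(k_solution / k_gas) of about 0.73, roughly a 5.4-fold acceleration.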

Files (525.1 MB total)

  • md5:6c9d2c8226a05fc074dedb83476fc97d (3.9 kB)
  • md5:e66e3bea65772bf9435106b117a6a1fd (230.6 MB)
  • md5:f9e1d33baa230b2cf7d22af86df34fa8 (294.5 MB)
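Since each file is listed with an MD5 checksum, a downloaded copy can be checked before unzipping. The sketch below uses Python's hashlib and reads the file in chunks so the large archives are not loaded into memory at once. The function names are hypothetical, and the path is whatever name the download was saved under; compare it against the checksum listed for that particular file.

```python
import hashlib


def md5_hex(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read 1 MiB at a time by default."""
    with open(path, "rb") as handle:
        return md5_hex(iter(lambda: handle.read(chunk_size), b""))


def matches_listed_checksum(actual_hex, listed):
    """Compare a digest with a checksum written as 'md5:<hex>' or bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return actual_hex.lower() == expected
```

For example, `matches_listed_checksum(md5_of_file(path), "md5:...")` returns True only if the download is intact.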