Published April 6, 2023 | Version 1.1.0
Dataset Open

Data sets and machine learning models for: Predicting critical properties and acentric factor of fluids using multi-task machine learning

  • 1. Massachusetts Institute of Technology

Description

The experimental data sets, data splits, additional features, QM calculations, model predictions, and final machine learning models for the manuscript "Predicting Critical Properties and Acentric Factor of Fluids Using Multi-Task Machine Learning". Citation should refer directly to the manuscript:

  • Biswas, S.; Chung, Y.; Ramirez, J.; Wu, H.; Green, W. H. Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning. Journal of Chemical Information and Modeling. 2023, 63 (15), 4574-4588. DOI: 10.1021/acs.jcim.3c00546

To use the machine learning models, please refer to the sample files and instructions on https://github.com/yunsiechung/chemprop/tree/crit_prop

Detailed information can be found in the README.md file.

Details on the properties considered

The data set includes the following 8 properties:

  • Tc: critical temperature, in K
  • Pc: critical pressure, in bar
  • rhoc: critical density, in mol/L
  • omega: acentric factor, unitless
  • Tb: boiling point, in K
  • Tm: melting point, in K
  • dHvap: enthalpy of vaporization at boiling point, in kJ/mol
  • dHfus: enthalpy of fusion at melting point, in kJ/mol
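
For calculations that expect SI units, the units listed above can be converted with a small lookup table. The following is a minimal sketch only: the names `PROPERTY_UNITS`, `TO_SI`, and `to_si` are illustrative assumptions and are not part of the data set or of Chemprop.

```python
# Units exactly as listed in this record, keyed by property name.
# The dictionary and function names are hypothetical helpers.
PROPERTY_UNITS = {
    "Tc": "K",
    "Pc": "bar",
    "rhoc": "mol/L",
    "omega": "unitless",
    "Tb": "K",
    "Tm": "K",
    "dHvap": "kJ/mol",
    "dHfus": "kJ/mol",
}

# Multiplicative conversion factors to SI-based units.
TO_SI = {
    "bar": (1.0e5, "Pa"),
    "mol/L": (1.0e3, "mol/m^3"),
    "kJ/mol": (1.0e3, "J/mol"),
}


def to_si(prop, value):
    """Return (converted value, SI unit) for one of the 8 properties."""
    unit = PROPERTY_UNITS[prop]
    # Units without an entry in TO_SI (K, unitless) are returned unchanged.
    factor, si_unit = TO_SI.get(unit, (1.0, unit))
    return value * factor, si_unit
```

For example, `to_si("Pc", 1.0)` returns `(100000.0, "Pa")`, while temperatures and the acentric factor pass through unchanged.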

Details on the files

1. Data sets under CritProp_v1.1.0:

  • all_data: includes the data sets used in this work. All data points are listed for each chemical compound as well as its corresponding data source. The details of the data sources can be found in the README.md file. The distribution of the data set is included in each folder.
    • estimated_data_for_pretraining: contains the estimated data from Yaws' handbook that are used to pre-train our machine learning (ML) model.
    • experimental_data: contains the experimental data (references 1 - 15) used to fine-tune our final ML model.
  • additional_features: includes the additional features tested for the ML model. The Abraham features are generated for all data (references 1 - 15) while the acsf, qm, and rdkit features are only generated for the data from references 1 - 9.
    • abraham: Abraham solute parameters (E, S, A, B, L). Molecular features.
    • acsf: ACSF (atom-centered symmetry functions). Atomic features that are converted from the 3D coordinates of the compound.
    • qm_atom: QM (quantum chemical) atomic features.
    • qm_mol: QM molecular feature.
    • rdkit: Selected RDKit 2D molecular features.
  • data_splits_and_model_predictions: contains the training and test sets used to evaluate the model. It also contains the predicted values from our final ML model for each test set.
    • random and scaffold splits: training and test sets that include the data from references 1 - 9.
    • external test set: a test set that includes the data from only references 10 - 15.
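
The relationship between the reference numbers and the splits can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce the provided split files: the record field `ref`, the function names, and the `test_fraction` value are hypothetical, and only the random split is shown (a scaffold split would group compounds by molecular scaffold instead of shuffling them).

```python
import random

IN_DISTRIBUTION_REFS = set(range(1, 10))   # references 1 - 9
EXTERNAL_REFS = set(range(10, 16))         # references 10 - 15


def partition_by_reference(records):
    """Separate records into the split pool and the external test set.

    Each record is assumed to be a dict with an integer "ref" key
    (a hypothetical field name; check README.md for the real columns).
    """
    pool = [r for r in records if r["ref"] in IN_DISTRIBUTION_REFS]
    external = [r for r in records if r["ref"] in EXTERNAL_REFS]
    return pool, external


def random_split(pool, test_fraction=0.1, seed=0):
    """Shuffle the pool and return (train, test).

    test_fraction is an arbitrary placeholder, not the value used in the paper.
    """
    shuffled = list(pool)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

To reproduce the published results, use the split files shipped in data_splits_and_model_predictions rather than regenerating splits this way.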

2. Machine learning (ML) model files:

  • CritProp_ML_model_files_with_abraham_feat.zip: contains the Chemprop ML model files trained with Abraham features as additional molecular features. These models give the best results.
  • CritProp_ML_model_files_without_additional_feat.zip: contains the Chemprop ML model files trained without any additional features. These models give the second-best results.

To use these ML models, please refer to the sample files and instructions on https://github.com/yunsiechung/chemprop/tree/crit_prop

3. QM (quantum chemical) calculations:

  • QM_calculations.zip: contains the results of the QM calculations performed to compute the QM features.

Files (1.8 GB)

  • md5:92764f6c5a40085dd8d21ecd6ceb2536, 798.1 MB
  • md5:cd90384331e81b2e331f6249b374c47b, 797.6 MB
  • md5:0962dcd12b6cab5fe3ef6d20285deece, 15.8 MB
  • md5:eaf2c09b85eb193705d95010369eb696, 161.9 MB
  • md5:664b23b4d748352a837a6e7a019f794d, 6.8 kB
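
After downloading, a file can be checked against its listed MD5 checksum. The sketch below uses only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are illustrative, and the local file path is whatever name the download was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:92764f6c5a40085dd8d21ecd6ceb2536")` should return `True` for an intact copy of the corresponding file, where `local_path` is a placeholder for the downloaded file's location.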