Data sets and machine learning models for: Predicting critical properties and acentric factor of fluids using multi-task machine learning
- 1. Massachusetts Institute of Technology
Description
The experimental data sets, data splits, additional features, QM calculations, model predictions, and final machine learning models for the manuscript "Predicting Critical Properties and Acentric Factor of Fluids Using Multi-Task Machine Learning". Citation should refer directly to the manuscript:
- Biswas, S.; Chung, Y.; Ramirez, J.; Wu, H.; Green, W. H. Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning. Journal of Chemical Information and Modeling. 2023, 63 (15), 4574-4588. DOI: 10.1021/acs.jcim.3c00546
To use the machine learning models, please refer to the sample files and instructions on https://github.com/yunsiechung/chemprop/tree/crit_prop.
Detailed information can be found in the README.md file.
Details on the properties considered
The data set includes the following 8 properties:
- Tc: critical temperature, in K
- Pc: critical pressure, in bar
- rhoc: critical density, in mol/L
- omega: acentric factor, unitless
- Tb: boiling point, in K
- Tm: melting point, in K
- dHvap: enthalpy of vaporization at boiling point, in kJ/mol
- dHfus: enthalpy of fusion at melting point, in kJ/mol
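Because Tc, Pc, and rhoc are reported in K, bar, and mol/L, they can be combined directly with the gas constant expressed in L·bar/(mol·K). As a minimal sketch of working with these units, the snippet below computes the critical compressibility factor Zc = Pc / (rhoc · R · Tc). The function name and the water values are illustrative assumptions (approximate literature values), not entries taken from the data set.

```python
# Universal gas constant in L*bar/(mol*K), chosen to match the units listed above.
R_L_BAR_PER_MOL_K = 0.0831446


def critical_compressibility(tc_k, pc_bar, rhoc_mol_per_l):
    """Return Zc = Pc / (rhoc * R * Tc) for Tc in K, Pc in bar, rhoc in mol/L."""
    if tc_k <= 0 or pc_bar <= 0 or rhoc_mol_per_l <= 0:
        raise ValueError("Tc, Pc, and rhoc must all be positive")
    return pc_bar / (rhoc_mol_per_l * R_L_BAR_PER_MOL_K * tc_k)


# Approximate literature values for water (illustration only, not from the data set).
zc_water = critical_compressibility(tc_k=647.1, pc_bar=220.6, rhoc_mol_per_l=17.9)
print(f"Zc(water) ~ {zc_water:.3f}")
```

For most fluids Zc falls between roughly 0.2 and 0.3, so a value far outside that range is a quick hint that one of the three inputs is in the wrong unit.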
Details on the files
1. Data sets under CritProp_v1.1.0:
- all_data: includes the data sets used in this work. All data points are listed for each chemical compound as well as its corresponding data source. The details of the data sources can be found in the README.md file. The distribution of the data set is included in each folder.
- estimated_data_for_pretraining: contains the estimated data from Yaws' handbook that are used to pre-train our machine learning (ML) model.
- experimental_data: contains the experimental data (references 1 - 15) used to fine-tune our final ML model.
- additional_features: includes the additional features tested for the ML model. The Abraham features are generated for all data (references 1 - 15) while the acsf, qm, and rdkit features are only generated for the data from references 1 - 9.
- abraham: Abraham solute parameters (E, S, A, B, L). Molecular features.
- acsf: ACSF (atom-centered symmetry functions). Atomic features that are converted from the 3D coordinates of the compound.
- qm_atom: QM (quantum chemical) atomic feature.
- qm_mol: QM molecular feature.
- rdkit: Selected RDKit 2D molecular features.
- data_splits_and_model_predictions: contains the training and test sets used to evaluate the model. It also contains the predicted values from our final ML model for each test set.
- random and scaffold splits: training and test sets that include the data from references 1 - 9.
- external test set: a test set that includes the data from only references 10 - 15.
2. Machine learning (ML) model files:
- CritProp_ML_model_files_with_abraham_feat.zip: contains the Chemprop ML model files that are trained using Abraham features as additional molecular features. This gives the best results.
- CritProp_ML_model_files_without_additional_feat.zip: contains the Chemprop ML model files that are trained without any additional features. This gives the second best results.
To use these ML models, please refer to the sample files and instructions on https://github.com/yunsiechung/chemprop/tree/crit_prop
3. QM (quantum chemical) calculations:
- QM_calculations.zip: contains the results of the QM calculations that are performed to compute QM features.
Files
Total size: 1.8 GB
MD5 checksum | Size |
---|---|
92764f6c5a40085dd8d21ecd6ceb2536 | 798.1 MB |
cd90384331e81b2e331f6249b374c47b | 797.6 MB |
0962dcd12b6cab5fe3ef6d20285deece | 15.8 MB |
eaf2c09b85eb193705d95010369eb696 | 161.9 MB |
664b23b4d748352a837a6e7a019f794d | 6.8 kB |
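The MD5 checksums listed above can be used to confirm that a large archive downloaded intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_download` are hypothetical, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 digest with an expected value, with or without an 'md5:' prefix."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected
```

A call such as `verify_download("QM_calculations.zip", "md5:<checksum from the record>")` returns `True` only when the local file matches the published digest. Reading in 1 MiB chunks keeps memory use low even for the archives of several hundred megabytes.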