Group contribution models for heat of formation (SUB 2018)
Authors/Creators
- 1. University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry, 24 Tsar Assen St., Plovdiv 4000, Bulgaria
Description
We present a set of group contribution models for predicting the heat of formation of organic compounds. A dataset of 1004 molecular structures from the DIPPR database was split into a learning set and a test set, which were then used for model training and validation. The models were built with the Ambit-GCM software (https://doi.org/10.5281/zenodo.1470793). A set of preliminary models was built using various fragmentation schemes, with and without correction factors and external descriptors, and additive schemes of different orders were studied. Every model in the set was validated by the leave-one-out procedure and the Y-scrambling technique, and its performance was tested on the external test set. Full data and the corresponding statistical characteristics of the five best models are available in models.zip. Model 2 is also provided in the archive as a JSON file and can be used for theoretical prediction of the heat of formation.
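The additive idea behind these models, together with the two validation checks mentioned above, can be sketched in Python. This is a minimal illustration, not Ambit-GCM's implementation: it assumes a first-order scheme in which Hf is an intercept plus the sum of group counts times group contributions, fitted by ordinary least squares (the fitting method used by Ambit-GCM is not stated here), and it runs on made-up group counts rather than the DIPPR data. The function names `fit_contributions`, `loo_q2` and `y_scrambling_r2` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a first-order additive group contribution model,
#   Hf ~ c0 + sum_j n_j * c_j,
# where n_j is the count of group j in a molecule and c_j is its contribution.
# Fitting by ordinary least squares is an assumption, not Ambit-GCM's documented method.


def fit_contributions(counts: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [c0, c1, ..., cm] fitted by least squares; counts has shape (n_molecules, m_groups)."""
    X = np.column_stack([np.ones(len(y)), counts])  # leading column of ones -> intercept c0
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef: np.ndarray, counts: np.ndarray) -> np.ndarray:
    """Additive prediction: intercept plus group counts weighted by their contributions."""
    return coef[0] + counts @ coef[1:]


def r2(y: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination (reported as Q2 when y_pred comes from cross-validation)."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def loo_q2(counts: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out: refit without molecule i, then predict molecule i; return Q2."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_contributions(counts[keep], y[keep])
        preds[i] = predict(coef, counts[i : i + 1])[0]
    return r2(y, preds)


def y_scrambling_r2(counts: np.ndarray, y: np.ndarray, n_rounds: int = 100, seed: int = 0) -> np.ndarray:
    """Refit on randomly permuted y values; return the training R2 of each scrambled model."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = rng.permutation(y)
        coef = fit_contributions(counts, y_perm)
        scores.append(r2(y_perm, predict(coef, counts)))
    return np.array(scores)


if __name__ == "__main__":
    # Made-up data: 40 "molecules" described by counts of 4 hypothetical groups.
    rng = np.random.default_rng(1)
    counts = rng.integers(0, 6, size=(40, 4)).astype(float)
    true_coef = np.array([10.0, -20.0, -5.0, 30.0, -100.0])
    y = predict(true_coef, counts) + rng.normal(0.0, 1.0, size=40)

    print("fitted contributions:", np.round(fit_contributions(counts, y), 1))
    print("LOO Q2:", round(loo_q2(counts, y), 4))
    print("mean scrambled R2:", round(float(y_scrambling_r2(counts, y).mean()), 4))
```

On such synthetic data the leave-one-out Q2 should be close to 1 while the scrambled models fit poorly; if scrambled models fitted nearly as well as the real one, the apparent fit would likely be a chance correlation.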
To use model 2, download gcm-predict.jar from https://doi.org/10.5281/zenodo.1470793 (the examples below use the file name gcm-predict-v1.1.jar). An example of applying the gcm-predict (Ambit-GCM) module to a single structure is given below:
java -jar gcm-predict-v1.1.jar -s "CC(C)OCC(C)O" -c model_2.json
GCM value (Hf) for CC(C)OCC(C)O is -528.7163407123614
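When many single-structure predictions are needed from a script, the command above can be wrapped in Python. This is a hedged sketch: it uses only the `-s` and `-c` options shown in the example and assumes gcm-predict prints the result line to standard output in exactly the form shown. `parse_single_output` and `predict_single` are hypothetical helper names, not part of Ambit-GCM.

```python
import re
import subprocess

# Matches the single-structure result line shown above, e.g.
#   GCM value (Hf) for CC(C)OCC(C)O is -528.7163407123614
# Assumption: gcm-predict prints this line to standard output in this exact form.
_RESULT_LINE = re.compile(r"GCM value \((?P<prop>[^)]+)\) for (?P<smiles>\S+) is (?P<value>\S+)")


def parse_single_output(text: str) -> tuple[str, str, float]:
    """Extract (property, SMILES, predicted value) from gcm-predict's console output."""
    match = _RESULT_LINE.search(text)
    if match is None:
        raise ValueError(f"no GCM result line found in: {text!r}")
    return match["prop"], match["smiles"], float(match["value"])


def predict_single(smiles: str, jar: str = "gcm-predict-v1.1.jar", model: str = "model_2.json") -> float:
    """Run gcm-predict with the -s and -c options from the example and return the predicted value."""
    # Passing arguments as a list bypasses the shell, so the parentheses in SMILES need no quoting.
    completed = subprocess.run(
        ["java", "-jar", jar, "-s", smiles, "-c", model],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_single_output(completed.stdout)[2]
```

With Java installed and the jar and model_2.json in the working directory, `predict_single("CC(C)OCC(C)O")` should return the value from the result line shown above.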
The gcm-predict (Ambit-GCM) module can also be applied to a set of structures. In the following example, 5 molecules are supplied as a *.csv file:
java -jar gcm-predict-v1.1.jar -i Prediction_Examples.csv -c model_2.json
GCM calculateting property Hf for 5 molecules ...
Mol#,ModelValue(Hf),SMILES,CalcStatus
1,-126.23055043388635,CCCC,OK
2,-353.4883670381994,c1ccc(c(c1)O)O,OK
3,-524.1758445616103,CCC(CO)O,OK
4,-220.91676720730212,CC(Cc1ccccc1)O,OK
5,-728.1309488149211,C(C(Cl)(Cl)F)(F)F,OK
Each output line contains the molecule number, the predicted Hf, the SMILES string, and the calculation status.
The full training and validation data are available in learn-test-sets.zip and can be used to retrain or improve the models.
More examples of using the Ambit-GCM software for group contribution modeling and property prediction are available at https://doi.org/10.5281/zenodo.1471646.