Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction

doi:10.1021/acsomega.7b01079

Published October 4, 2017 | Version v1

Journal article Open


1. Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany

In computational chemistry and chemoinformatics, the support vector machine (SVM) algorithm is among the most widely used machine learning methods for the identification of new active compounds. In addition, support vector regression (SVR) has become a preferred approach for modeling nonlinear structure-activity relationships and predicting compound potency values. For the closely related SVM and SVR methods, fingerprints (i.e., bit string or feature set representations of chemical structure and properties) are generally the preferred descriptors. Herein, we have compared SVM and SVR calculations for the same compound data sets to evaluate which features are responsible for predictions. Systematic feature weight analysis yielded rather surprising results. Fingerprint features were frequently identified that contributed differently to the corresponding SVM and SVR models. The overlap between the feature sets determining the predictive performance of SVM and SVR was very small. Furthermore, features were identified that had opposite effects on SVM and SVR predictions. Feature weight analysis in combination with feature mapping also made it possible to interpret individual predictions, thus offsetting the black box character of SVM/SVR modeling.
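The comparison described in the abstract (overlap between highly weighted features, and features with opposite effects in the two models) can be illustrated with a minimal sketch. It assumes that per-feature weights have already been extracted from an SVM and an SVR model trained on the same compounds; how the weights are obtained is not shown. The function names, the `bit_*` feature labels, and all weight values are hypothetical and made up for illustration; they are not data or code from the article.

```python
from typing import Dict, List, Set


def top_features(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the largest absolute weight (ties broken by name)."""
    ranked = sorted(weights, key=lambda f: (-abs(weights[f]), f))
    return ranked[:k]


def overlap_fraction(a: List[str], b: List[str]) -> float:
    """Fraction of features shared by two top-k lists, relative to the shorter list."""
    if not a or not b:
        return 0.0
    return len(set(a) & set(b)) / min(len(a), len(b))


def opposite_sign_features(w1: Dict[str, float], w2: Dict[str, float]) -> Set[str]:
    """Features present in both models whose weights have opposite signs."""
    return {f for f in w1.keys() & w2.keys() if w1[f] * w2[f] < 0}


# Toy weights for hypothetical fingerprint bits (assumed values, not from the paper).
svm_weights = {"bit_12": 0.90, "bit_40": -0.10, "bit_77": 0.55,
               "bit_301": 0.05, "bit_512": -0.70}
svr_weights = {"bit_12": 0.20, "bit_40": 0.80, "bit_77": -0.60,
               "bit_301": 0.65, "bit_512": -0.05}

svm_top = top_features(svm_weights, k=3)
svr_top = top_features(svr_weights, k=3)
shared = overlap_fraction(svm_top, svr_top)
flipped = opposite_sign_features(svm_weights, svr_weights)

print("Top SVM features:", svm_top)
print("Top SVR features:", svr_top)
print("Top-3 overlap fraction:", round(shared, 3))
print("Opposite-sign features:", sorted(flipped))
```

In this toy case only one of the three top-ranked features is shared, and two features carry weights of opposite sign in the two models, which mirrors the kind of disagreement the abstract reports without reproducing its actual results.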

Files

Name	Size
acsomega.7b01079(1).pdf (md5:7f2b01074565dc17471146d0b404f4b9)	1.5 MB

Additional details

BIGCHEM – Big Data in Chemistry 676434: European Commission

