Journal article Open Access

Assessing QSAR Limitations - A Regulatory Perspective

Tong, Weida; Hong, Huixiao; Xie, Qian; Shi, Leming; Fang, Hong; Perkins, Roger

Wider acceptance of QSARs would result in a constellation of benefits and savings to both private and public sectors. For this to occur, particularly in regulatory applications, a model's limitations need to be identified. We define a model's limitations as encompassing assessment of overall prediction accuracy, applicability domain and chance correlation. A general guideline is presented in this review for assessing a model's limitations, with emphasis on, and examples of, application to consensus modeling methods. More specifically, we discuss the commonalities and differences between external validation and cross-validation for assessing a model's limitations. We illustrate two common ways of assessing overall prediction accuracy, depending on whether or not the intended application domain is predefined. Since even a high-quality model will have different confidence in accuracy for predicting different chemicals, we further demonstrate, using the novel Decision Forest consensus modeling method, a means to determine prediction confidence (i.e., certainty for an individual chemical's prediction) and domain extrapolation (i.e., the prediction accuracy for a chemical that is outside the chemistry space defined by the training chemicals). We show that prediction confidence and domain extrapolation are related measures that together determine the applicability domain of a model, and that prediction confidence is the more important measure. Lastly, the importance of assessing chance correlation is emphasized and illustrated with several examples of models having a high degree of chance correlation despite cross-validation indicating high prediction accuracy. Generally, a dataset with a skewed distribution, small data size and/or low signal-to-noise ratio tends to produce a model with high chance correlation.
We conclude that it is imperative to assess all three aspects (i.e., overall accuracy, applicability domain and chance correlation) of a model for the regulatory acceptance of QSARs.
