Published July 13, 2021 | Version v2
Dataset | Restricted

Predicting the skin sensitization potential of small molecules with machine learning models trained on biologically meaningful descriptors

  • 1. Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany; HITeC e.V., 22527 Hamburg, Germany
  • 2. Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
  • 3. Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
  • 4. Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
  • 5. MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden; Department of Computer and Systems Sciences, Stockholm University, SE-16407 Kista, Sweden; Department of Pharmaceutical Biosciences, Uppsala University, SE-75124 Uppsala, Sweden
  • 6. Front End Innovation, Beiersdorf AG, 22529 Hamburg, Germany
  • 7. Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany; Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria; Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway

Description

In recent years, a number of machine learning models for predicting the skin sensitization potential of small organic molecules have been reported and made available. These models generally perform well within their applicability domains, but because they rely on molecular fingerprints and other non-intuitive descriptors, their interpretability is clearly limited. The aim of this work is to develop a strategy for replacing the non-intuitive features with predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model ("Skin Doctor CP:Bio") obtained an efficiency of 0.82 and a Matthews correlation coefficient (MCC) of 0.52 (at a significance level of 0.20). Skin Doctor CP:Bio is available from the authors free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted to develop more interpretable machine learning models for predicting the bioactivity and toxicity of small organic compounds.
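Efficiency and a significance level are conformal prediction metrics. As an illustration only, the sketch below shows how these two numbers are commonly computed for a binary conformal predictor, assuming that a label enters the prediction set when its p-value exceeds the significance level, that efficiency is the fraction of compounds receiving exactly one label, and that MCC is computed over those single-label compounds only. The function names and the toy p-values are hypothetical and are not taken from Skin Doctor CP:Bio or its data set.

```python
import math
from typing import Sequence


def prediction_set(p_values: Sequence[float], significance: float) -> set[int]:
    """Labels (0 = non-sensitizer, 1 = sensitizer) whose p-value exceeds the significance level."""
    return {label for label, p in enumerate(p_values) if p > significance}


def efficiency_and_mcc(p_values: Sequence[Sequence[float]],
                       y_true: Sequence[int],
                       significance: float) -> tuple[float, float]:
    """Efficiency = share of single-label sets; MCC is computed on those compounds only."""
    tp = tn = fp = fn = 0
    n_single = 0
    for pv, y in zip(p_values, y_true):
        region = prediction_set(pv, significance)
        if len(region) != 1:
            continue  # empty or two-label sets are excluded from the MCC
        n_single += 1
        y_pred = next(iter(region))
        if y_pred == 1 and y == 1:
            tp += 1
        elif y_pred == 0 and y == 0:
            tn += 1
        elif y_pred == 1:
            fp += 1
        else:
            fn += 1
    efficiency = n_single / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return efficiency, mcc


if __name__ == "__main__":
    # Hypothetical (p_class0, p_class1) pairs, NOT from the published model.
    p_values = [(0.05, 0.60), (0.70, 0.10), (0.40, 0.35),
                (0.03, 0.15), (0.50, 0.08), (0.25, 0.12)]
    y_true = [1, 0, 1, 0, 0, 1]
    eff, mcc = efficiency_and_mcc(p_values, y_true, significance=0.20)
    print(round(eff, 2), round(mcc, 2))  # prints 0.67 0.58
```

Compounds with empty or two-label prediction sets count toward the denominator of efficiency but are left out of the MCC, so the two figures are best read together.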

The corresponding research article has been published in Pharmaceuticals 2021, 14(8), 790, DOI: https://doi.org/10.3390/ph14080790

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Access to the files can be requested. Files are permitted for academic research only.
