Published July 8, 2022 | Version v4
Journal article Open

ProfhEX: AI-based platform for small molecules liability profiling

Description

Background

Off-target drug interactions are one of the main reasons for candidate failure in the drug discovery process. Anticipating a drug’s potential adverse effects in the early stages is necessary to minimize health risks to patients, animal testing, and economic costs. With the constantly increasing size of virtual screening libraries, AI-driven methods can be exploited as first-tier screening tools that provide liability estimates for drug candidates.

Objectives

We present ProfhEX, an AI-driven suite of 46 OECD-compliant machine learning models able to profile small molecules on 7 relevant liability groups, namely: cardiovascular, central nervous system, gastrointestinal, endocrine disruption, renal, pulmonary and immune response toxicities.

Methods

Experimental affinity data were collected from public and commercial data sources. The entire chemical space comprised 289’202 activity data points for a total of 210’116 unique compounds, spanning 46 targets with dataset sizes ranging from 819 to 18’896. Gradient boosting and random forest algorithms were initially employed and then ensembled for the selection of a champion model. Models were validated according to the OECD principles, including robust internal (cross-validation, bootstrap, y-scrambling) and external validation.
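To illustrate the y-scrambling check named above, the following is a minimal sketch and not the ProfhEX pipeline. It assumes a one-descriptor least-squares line as a stand-in for the gradient boosting and random forest models, synthetic data, and hypothetical function names (`fit_line`, `r_squared`, `y_scrambling`). The idea is that a model refitted on shuffled activities should lose almost all of its apparent fit.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Coefficient of determination of the fitted line on its own training data."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def y_scrambling(x, y, n_rounds=100, seed=0):
    """Return the real R^2 and the R^2 values obtained after shuffling y."""
    rng = random.Random(seed)
    scrambled_scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-activity relationship
        scrambled_scores.append(r_squared(x, shuffled))
    return r_squared(x, y), scrambled_scores


# Synthetic example data (assumption): one descriptor, noisy linear response.
data_rng = random.Random(42)
x = [data_rng.uniform(0.0, 10.0) for _ in range(50)]
y = [0.5 * a + 5.0 + data_rng.gauss(0.0, 0.3) for a in x]

real_r2, scrambled_r2 = y_scrambling(x, y)
mean_scrambled = sum(scrambled_r2) / len(scrambled_r2)
print(f"real R^2 = {real_r2:.2f}, mean scrambled R^2 = {mean_scrambled:.2f}")
```

A large gap between the real score and the scrambled scores suggests the model has learned a genuine relationship and not a chance correlation.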

Results

Champion models achieved an average Pearson correlation coefficient of 0.81 (SD of 0.06) and a root mean squared error of 0.75 (SD of 0.09). All liability groups showed good hit-retrieval power, with an average enrichment factor (at 5%) of 13.1 (SD of 3.0) and AUC of 0.92 (SD of 0.05).
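For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. The record does not give the exact definitions used, so the formulas here are assumptions: the enrichment factor is taken as the active rate in the top-scored fraction divided by the active rate in the whole set, and the AUC is computed by pairwise comparison of actives and inactives. All function names and the toy data are hypothetical.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def enrichment_factor(is_active, scores, fraction=0.05):
    """Active rate in the top-scored fraction divided by the overall active rate."""
    n = len(scores)
    n_top = max(1, round(fraction * n))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    return (hits_top / n_top) / (sum(1 for a in is_active if a) / n)


def roc_auc(is_active, scores):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, a in zip(scores, is_active) if a]
    neg = [s for s, a in zip(scores, is_active) if not a]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 20 compounds, the two best-scored ones are active.
toy_scores = [float(20 - i) for i in range(20)]
toy_labels = [i < 2 for i in range(20)]
print(enrichment_factor(toy_labels, toy_scores), roc_auc(toy_labels, toy_scores))
```

Under this definition the enrichment factor at a 5% cutoff cannot exceed 1/0.05 = 20, which gives a scale for reading the reported average.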

Conclusion

ProfhEX would be a useful tool for large-scale liability profiling of small molecules. This suite will be further expanded through the inclusion of new targets and of complementary modelling approaches, including docking and pharmacophore-based models.

Files

Ligands_PCAdataset.csv

Files (303.0 MB)

Checksum Size
md5:89175a24e7978493d70badfbe89a1562 2.1 MB
md5:eb138b5f796be3dd3db025578813fd2c 20.2 MB
md5:010f0b9780b7ff7c148fc018184aa4af 114.8 MB
md5:7c1b354d5aaca0cb05f04c822a2d8f41 20.6 MB
md5:6f2b21ec5ce2e553b7a58f5a657b0a0d 144.7 MB
md5:a0481ac3a915f4ef4d2e75d314b628a4 128.0 kB
md5:05c7919fdfe24e39256aca3d6bdb88bf 250.5 kB
md5:18123b219979c4556513b0718fb92640 196.9 kB
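
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file name in the usage comment is a placeholder, since the listing does not pair file names with checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Usage (placeholder file name, replace with the path of a downloaded file):
# matches_checksum("downloaded_file", "md5:89175a24e7978493d70badfbe89a1562")
```

Reading in chunks keeps memory use flat for the larger files in the record.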