Published December 25, 2021 | Version v1
Journal article Open

AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands

  • 1. Northeastern University
  • 2. University of California, San Diego
  • 3. Harvard Medical School

Contributors

Contact person:

  • 1. Northeastern University

Description

Identifying novel drug-target interactions (DTI) is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network rather than learning the node features. Then, we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. We also validate these predictions via auto-docking simulations and comparison with recent experimental evidence, and we advance the interpretation of machine learning predictions of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a high-throughput approach to identifying drug-target combinations, with the potential to become a powerful tool in drug discovery.
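The topology shortcut mentioned in the description can be illustrated with a small sketch. This is not the authors' code or the AI-Bind pipeline. It is a hypothetical baseline, assuming binding annotations arrive as (ligand, protein, label) triples, that scores a pair purely from how often each node appears with a positive label and never looks at molecular or sequence features. The names `degree_ratios`, `topology_score`, and the toy identifiers are invented for illustration.

```python
from collections import defaultdict


def degree_ratios(edges):
    """Fraction of positive annotations per node in the bipartite network.

    edges: iterable of (ligand, protein, label), label 1 = binds, 0 = does not.
    Ligand and protein nodes are keyed separately to keep the graph bipartite.
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for ligand, protein, label in edges:
        for node in (("L", ligand), ("P", protein)):
            totals[node] += 1
            positives[node] += label
    return {node: positives[node] / totals[node] for node in totals}


def topology_score(ratios, ligand, protein, default=0.5):
    """Score a pair from node annotation statistics only (no node features).

    Nodes absent from the training annotations fall back to `default`,
    an assumed uninformative value.
    """
    ligand_ratio = ratios.get(("L", ligand), default)
    protein_ratio = ratios.get(("P", protein), default)
    return (ligand_ratio + protein_ratio) / 2


# Toy annotations (hypothetical identifiers, not real binding data).
train = [
    ("lig_a", "prot_1", 1),
    ("lig_a", "prot_2", 1),
    ("lig_b", "prot_1", 0),
    ("lig_b", "prot_3", 0),
    ("lig_c", "prot_2", 1),
]

ratios = degree_ratios(train)
seen_pair = topology_score(ratios, "lig_a", "prot_2")
novel_pair = topology_score(ratios, "lig_new", "prot_new")
```

In this toy setting, a pair of frequently annotated nodes gets a confident score, while a pair of never-before-seen nodes can only receive the fallback value. That is the failure mode the description attributes to models that learn topology instead of node features.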

Files

Files (14.3 GB)

md5:cb1cf5806e2377681e1f0cb6ea8f3c72 (14.3 GB)
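Because the archive is large, it may be worth checking a download against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`. The record does not show a file name, so the path passed to `verify_download` is left to the reader, and the helper names are assumptions made for illustration.

```python
import hashlib

# Checksum as listed on this record.
EXPECTED_MD5 = "cb1cf5806e2377681e1f0cb6ea8f3c72"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the local path of the downloaded archive returns `True` only when the file matches the listed checksum. A mismatch usually indicates an incomplete or corrupted transfer.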

Additional details

Related works

Is cited by
Journal article: 10.48550/arXiv.2112.13168 (DOI)