There is a newer version of the record available.

Published February 9, 2025 | Version 1.2.0
Publication Open

Deep Learning for Protein-Ligand Docking: Are We There Yet?

Description

This record provides the preprocessed datasets, corresponding protein multiple sequence alignments, notebook metadata, and benchmark method predictions that accompany the benchmarking manuscript "Deep Learning for Protein-Ligand Docking: Are We There Yet?" [1]. Specifically, the preprocessed Astex Diverse, PoseBusters Benchmark, and DockGen-E datasets, as well as the publicly available CASP15 targets referenced in the manuscript, are available for download. Baseline predictions from a variety of deep learning and conventional docking methods (e.g., AlphaFold 3, AutoDock Vina) are also provided for each of these benchmarking datasets. Note that the AlphaFold 3-predicted protein structures have been pre-aligned to the corresponding ground-truth (holo) protein structures: these are the "holo_aligned" structures for the Astex Diverse, PoseBusters Benchmark, and DockGen-E datasets and the "predicted_structures" structures for the CASP15 dataset. ESMFold's predicted protein structures for each dataset are included separately for licensing flexibility.
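
To illustrate what pre-aligning a predicted structure to a holo structure involves, the following is a minimal sketch of rigid-body superposition using the Kabsch algorithm. It is an assumption-laden illustration, not the procedure used to produce the files in this record: the function name kabsch_align is hypothetical, and the sketch assumes two coordinate arrays of equal length with matched atom ordering (e.g., C-alpha atoms), whereas the benchmark's own alignment may select atoms differently.

```python
import numpy as np


def kabsch_align(mobile, target):
    """Rigidly superimpose `mobile` onto `target` (both N x 3, same atom order).

    Hypothetical helper for illustration only; returns the transformed
    coordinates of `mobile` and the RMSD after superposition.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)

    # Center both point sets on their centroids.
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center

    # Covariance matrix and its singular value decomposition.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the centered mobile coordinates, then move them onto the target centroid.
    aligned = p @ rotation.T + target_center
    rmsd = float(np.sqrt(((aligned - target) ** 2).sum(axis=1).mean()))
    return aligned, rmsd
```

Because the provided structures are already aligned, a step like this would only be needed when comparing other predicted structures against the same holo references.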

Paper Abstract:

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of the latest docking and structure prediction methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to new proteins); (2) binding multiple (cofactor) ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for generalization to unknown pockets). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL methods for apo-to-holo protein-ligand docking and protein-ligand structure prediction using both primary ligand and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL co-folding methods generally outperform comparable conventional and DL docking baselines, yet popular methods such as AlphaFold 3 are still challenged by prediction targets with novel protein sequences; (2) certain DL co-folding methods are highly sensitive to their input multiple sequence alignments, while others are not; and (3) DL methods struggle to strike a balance between structural accuracy and chemical specificity when predicting novel or multi-ligand protein targets. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.

References:

[1] Morehead A, Giri N, Liu J, Neupane P, Cheng J. Deep Learning for Protein-Ligand Docking: Are We There Yet? arXiv; 2025. Available from: http://arxiv.org/abs/2308.05777

Files

Files (17.1 GB)

MD5 checksum                      Size
5c08085921858dbe7e67b08a94000533  848.5 MB
06e11417a74b4db9f5bcdb3fdd0b87ac  528.0 MB
93fbbfd977578dfc3341dd6e1960b4e6  670.5 MB
05c95018bff0691c693ea236a311d5c5  242.2 MB
ab37f7de53fe8f8a8a5d05e5fabe6c17  1.5 GB
0d3f21f9aae08a999bf972dc84e4e227  10.9 MB
73d82258c2025023c153a2633cad7bcc  1.4 GB
d2c3a774277c9f0700708047281398b4  8.3 GB
81af75ef6450aeea80a2dcd5f70a2479  1.6 GB
c14277752c7e21c63c8a0e36a3943188  38.4 MB
a6dd51dbbc89048377c6f1b892a7eaf0  1.9 GB
f960ab6b736dda8112f6da537548a5b8  107.2 MB
1cb9a03906956f405bd7787f9c978273  18.8 MB
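
Since each file is listed with an MD5 checksum, a downloaded archive can be checked for corruption before use. The following is a minimal sketch using only the Python standard library; the function names and the example file name are hypothetical, as the listing above does not include file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:5c08...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example usage (the file name is a placeholder, not taken from the record):
# ok = matches_listed_md5("downloaded_archive.tar.gz",
#                         "md5:5c08085921858dbe7e67b08a94000533")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.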

Additional details

Software

Repository URL: https://github.com/BioinfoMachineLearning/PoseBench
Programming language: Python
Development status: Active