Deep Learning for Protein-Ligand Docking: Are We There Yet?
Authors/Creators
- University of Missouri (affiliation)
Description
Included are preprocessed datasets and corresponding protein multiple sequence alignments, notebook metadata, and benchmark method predictions accompanying the benchmarking manuscript "Deep Learning for Protein-Ligand Docking: Are We There Yet?" [1]. In particular, the preprocessed Astex Diverse, PoseBusters Benchmark, and DockGen-E datasets as well as the publicly available CASP15 targets referenced in the manuscript are available for download. Also available are baseline method predictions from a variety of deep learning and conventional docking methods (e.g., AlphaFold 3, AutoDock Vina) for each of these benchmarking datasets. Note that the "holo_aligned" AlphaFold 3-predicted protein structures provided for the Astex Diverse, PoseBusters Benchmark, and DockGen-E datasets have been pre-aligned to the corresponding ground-truth (holo) protein structures. Similarly, the "predicted_structures" AlphaFold 3-predicted protein structures provided for the CASP15 dataset have been pre-aligned to the corresponding ground-truth (holo) protein structures. ESMFold's predicted protein structures for each dataset are also included separately for licensing flexibility.
Paper Abstract:
The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of the latest docking and structure prediction methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to new proteins); (2) binding multiple (cofactor) ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for generalization to unknown pockets). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL methods for apo-to-holo protein-ligand docking and protein-ligand structure prediction using both primary ligand and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL co-folding methods generally outperform comparable conventional and DL docking baselines, yet popular methods such as AlphaFold 3 are still challenged by prediction targets with novel protein sequences; (2) certain DL co-folding methods are highly sensitive to their input multiple sequence alignments, while others are not; and (3) DL methods struggle to strike a balance between structural accuracy and chemical specificity when predicting novel or multi-ligand protein targets. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.
References:
[1] Morehead A, Giri N, Liu J, Neupane P, Cheng J. Deep Learning for Protein-Ligand Docking: Are We There Yet? arXiv; 2025. Available from: http://arxiv.org/abs/2308.05777
Files
Total size: 17.1 GB

| MD5 checksum | Size |
|---|---|
| 5c08085921858dbe7e67b08a94000533 | 848.5 MB |
| 06e11417a74b4db9f5bcdb3fdd0b87ac | 528.0 MB |
| 93fbbfd977578dfc3341dd6e1960b4e6 | 670.5 MB |
| 05c95018bff0691c693ea236a311d5c5 | 242.2 MB |
| ab37f7de53fe8f8a8a5d05e5fabe6c17 | 1.5 GB |
| 0d3f21f9aae08a999bf972dc84e4e227 | 10.9 MB |
| 73d82258c2025023c153a2633cad7bcc | 1.4 GB |
| d2c3a774277c9f0700708047281398b4 | 8.3 GB |
| 81af75ef6450aeea80a2dcd5f70a2479 | 1.6 GB |
| c14277752c7e21c63c8a0e36a3943188 | 38.4 MB |
| a6dd51dbbc89048377c6f1b892a7eaf0 | 1.9 GB |
| f960ab6b736dda8112f6da537548a5b8 | 107.2 MB |
| 1cb9a03906956f405bd7787f9c978273 | 18.8 MB |
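Each file in this record is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch of such a check using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` and the archive name in the usage note are hypothetical and are not part of PoseBench.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so that
    multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    `expected` may be given with or without the "md5:" prefix used
    in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, after downloading the 848.5 MB file to a local path of your choosing (here the placeholder `downloaded_archive`), calling `matches_checksum("downloaded_archive", "md5:5c08085921858dbe7e67b08a94000533")` should return `True` if the download is intact.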
Additional details
Software
- Repository URL: https://github.com/BioinfoMachineLearning/PoseBench
- Programming language: Python
- Development status: Active