Published November 4, 2025 | Version 1.4.0
Publication Open

Assessing the potential of deep learning for protein-ligand docking

Description

This record contains the preprocessed datasets, the corresponding protein multiple sequence alignments, notebook metadata, and benchmark method predictions that accompany the benchmarking manuscript "Assessing the potential of deep learning for protein-ligand docking" [1]. Specifically, the preprocessed Astex Diverse, PoseBusters Benchmark, and DockGen-E datasets are available for download, along with the publicly available CASP15 targets referenced in the manuscript. Baseline predictions from a range of deep learning and conventional docking methods (e.g., AlphaFold 3, AutoDock Vina) are provided for each of these benchmarking datasets. Note that the AlphaFold 3-predicted protein structures have been pre-aligned to the corresponding ground-truth (holo) protein structures: these are the "holo_aligned" structures for the Astex Diverse, PoseBusters Benchmark, and DockGen-E datasets and the "predicted_structures" structures for the CASP15 dataset. ESMFold's predicted protein structures for each dataset are also included separately for licensing flexibility.

Paper Abstract:

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of the latest docking and structure prediction methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to new proteins); (2) binding multiple (cofactor) ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for generalization to unknown pockets). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL methods for apo-to-holo protein-ligand docking and protein-ligand structure prediction using both primary ligand and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL co-folding methods generally outperform comparable conventional and DL docking baseline algorithms, yet popular methods such as AlphaFold 3 are still challenged by prediction targets with novel protein-ligand binding poses; (2) certain DL co-folding methods are highly sensitive to their input multiple sequence alignments, while others are not; and (3) DL methods struggle to strike a balance between structural accuracy and chemical specificity when predicting novel or multi-ligand protein targets. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.

References:

[1] Morehead A, Giri N, Liu J, Neupane P, Cheng J. Assessing the potential of deep learning for protein-ligand docking. arXiv; 2025. Available from: http://arxiv.org/abs/2308.05777

Files

Files (17.7 GB)

Checksum  Size
md5:e700b19e890cc5eae5fbcf5580298b7f  852.1 MB
md5:06e11417a74b4db9f5bcdb3fdd0b87ac  528.0 MB
md5:29c20fde5a7b5f1f9c7ab35e1652b896  400.1 MB
md5:153209cf1ccb65a232e303025c3f9858  716.5 MB
md5:05c95018bff0691c693ea236a311d5c5  242.2 MB
md5:75d4a65c7979c509503290b0cff98896  1.6 GB
md5:a75946453eacc346f0d4749993071f2f  11.2 MB
md5:73d82258c2025023c153a2633cad7bcc  1.4 GB
md5:2fc79642d553f6cf1ebff42200b0069d  8.3 GB
md5:82630c475f47d7de894e1faba9d77808  1.6 GB
md5:175ab0e7d183482a1a97eac554a9aeb3  45.7 MB
md5:a6dd51dbbc89048377c6f1b892a7eaf0  1.9 GB
md5:2c610d092c4afcbc5ee9122769d4a4cd  110.3 MB
md5:23e22cb527666d21afb29518b2fc7dda  18.8 MB
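
Because several of these archives are large, it can be worth confirming that a download completed intact by comparing its MD5 digest with the checksum listed above. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_record` and the local file name `downloaded_archive.tar.gz` are hypothetical, chosen for illustration; this listing does not show which file name goes with which checksum, so the pairing in the usage portion is an assumption.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so multi-gigabyte archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed_checksum):
    """Compare a local file with a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Hypothetical local file name; pairing it with this checksum is an assumption.
    archive = Path("downloaded_archive.tar.gz")
    if archive.exists():
        ok = matches_record(archive, "md5:e700b19e890cc5eae5fbcf5580298b7f")
        print("checksum OK" if ok else "checksum mismatch, consider re-downloading")
```

MD5 is adequate here for detecting truncated or corrupted transfers; it is not meant as a defense against deliberate tampering.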

Additional details

Software

Repository URL: https://github.com/BioinfoMachineLearning/PoseBench
Programming language: Python
Development status: Active