There is a newer version of the record available.

Published September 30, 2024 | Version 1.1.0
Publication Open

Deep Learning for Protein-Ligand Docking: Are We There Yet?

Description

This record contains the preprocessed datasets and benchmark method predictions that accompany the benchmarking manuscript "Deep Learning for Protein-Ligand Docking: Are We There Yet?" [1]. The preprocessed Astex Diverse, PoseBusters Benchmark, and DockGen datasets are available for download, as are the publicly available CASP15 targets referenced in the manuscript. Baseline predictions from a variety of deep learning and conventional docking methods (e.g., DiffDock-L, Vina) are provided for each of these benchmarking datasets, including pocket-only baseline results for the PoseBusters Benchmark dataset. Note that the AlphaFold 3-predicted protein structures have been pre-aligned to the corresponding ground-truth (holo) protein structures: these are the "holo_aligned" structures for the Astex Diverse, PoseBusters Benchmark, and DockGen datasets and the "predicted_structures" structures for the CASP15 dataset.

 

Paper Abstract:

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to unknown structures); (2) docking multiple ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for unknown pocket generalization). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL methods consistently outperform conventional docking algorithms; (2) most recent DL docking methods fail to generalize to multi-ligand protein targets; and (3) training DL methods with physics-informed loss functions on diverse clusters of protein-ligand complexes is a promising direction for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.

 

References:

[1] Morehead A, Giri N, Liu J, Cheng J. Deep Learning for Protein-Ligand Docking: Are We There Yet? arXiv; 2024. Available from: http://arxiv.org/abs/2308.05777

Files (30.2 GB)

Checksum (size)
md5:429e2a296a291acc5212477eba2c1af3 (48.7 MB)
md5:73e75640eb442daa06319de1b61885d7 (22.4 MB)
md5:0f28a241e9efe5e10173eab6820760be (1.0 GB)
md5:35a460616ca32d7650238f3d62211d5d (10.9 MB)
md5:0e7f94fb8ec3985310680f3551da6628 (744.6 MB)
md5:6ba22ed916c745d3f30cb02ba24e1051 (37.8 MB)
md5:753db7840c26e6b39ac327c2040d7231 (43.7 MB)
md5:6127d194a74956e6505ce8eb964c2f33 (48.8 MB)
md5:131f8f5808361b1fdecb2087585e9beb (12.7 GB)
md5:f5bd903061cfce25daea3b152d2c4fb1 (1.6 GB)
md5:ed507d1934802a41098144a38ace758d (13.2 GB)
md5:66ed32cb705d1db1b33f88d2e45cba0a (275.6 MB)
md5:da3204c81686459c5429178aa6a74326 (127.3 MB)
md5:25145bccd641242b7d968d48587b5d59 (160.8 MB)
md5:8dc020c109f67fb45701b9a590c46efd (893.8 kB)
md5:decc3bae9a10850a2dadb1671bbfcf72 (73.6 MB)
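
Because several of the archives listed above exceed 10 GB, it can be worth checking a download against its listed MD5 checksum before extracting it. The following is a minimal sketch using only the Python standard library. The function names are hypothetical helpers introduced here for illustration (they are not part of PoseBench), and the file path in the usage note is a placeholder, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' (the form
    used in the listing above) or as a bare hex string."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_record_checksum("downloaded_archive.tar.gz", "md5:429e2a296a291acc5212477eba2c1af3")` would return `True` only if the placeholder file `downloaded_archive.tar.gz` is byte-for-byte identical to the first file in the listing. MD5 is adequate here for detecting truncated or corrupted transfers, but it is not a defence against deliberate tampering.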

Additional details

Software

Repository URL: https://github.com/BioinfoMachineLearning/PoseBench
Programming language: Python
Development Status: Active