Deep Learning for Protein-Ligand Docking: Are We There Yet?

Morehead, Alex; Giri, Nabin; Liu, Jian; Cheng, Jianlin

doi:10.5281/zenodo.11477766

Published June 5, 2024 | Version 1.0.1

Resource type: Publication | Access: Open


Affiliation: University of Missouri

Included are preprocessed datasets and benchmark method predictions accompanying the benchmarking manuscript "Deep Learning for Protein-Ligand Docking: Are We There Yet?" [1]. In particular, the preprocessed Astex Diverse, PoseBusters Benchmark, and DockGen datasets, as well as the publicly available CASP15 targets referenced in the manuscript, are available for download. Also available are baseline method predictions from a variety of deep learning and conventional docking methods (e.g., DiffDock-L, Vina) for each of these benchmarking datasets, including pocket-only baseline results for the PoseBusters Benchmark dataset. Note that the "holo_aligned" predicted protein structures provided for the Astex Diverse, PoseBusters Benchmark, and DockGen datasets, as well as the "predicted_structures" predicted protein structures provided for the CASP15 dataset, have been pre-aligned to their corresponding ground-truth (holo) protein structures.
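The description says the predicted structures were pre-aligned to the holo structures but does not say how. As an illustration only, a common way to rigidly superimpose one structure onto another is the Kabsch algorithm. The minimal sketch below assumes NumPy is installed and that the two coordinate arrays are already in one-to-one atom correspondence (e.g., matched C-alpha atoms). The function name `kabsch_align` is hypothetical, and this is not necessarily the procedure used to produce the released files.

```python
import numpy as np


def kabsch_align(mobile, target):
    """Rigidly superimpose `mobile` onto `target` with the Kabsch algorithm.

    Illustrative sketch; not necessarily how the released files were aligned.
    Both inputs are (N, 3) coordinate arrays whose rows are assumed to be
    in one-to-one correspondence. Returns the aligned coordinates, the
    fitted rotation and translation, and the RMSD after superposition.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)

    # Cross-covariance matrix of the centered point sets.
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - mobile_center @ rotation.T

    aligned = mobile @ rotation.T + translation
    rmsd = float(np.sqrt(((aligned - target) ** 2).sum(axis=1).mean()))
    return aligned, rotation, translation, rmsd
```

Because the fitted `rotation` and `translation` are returned, they could be estimated on a subset of atoms and then applied to every atom of a predicted structure as `coords @ rotation.T + translation`.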

Paper Abstract:

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the practical context of (1) predicted (apo) protein structures (e.g., for broad applicability); (2) multiple ligands concurrently binding to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for pocket generalization). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for practical protein-ligand docking. PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that all recent DL docking methods but one fail to generalize to multi-ligand protein targets and also that template-based docking algorithms perform as well as or better than recent single-ligand DL docking methods for multi-ligand docking, suggesting areas of improvement for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.

References:

[1] Morehead A, Giri N, Liu J, Cheng J. Deep Learning for Protein-Ligand Docking: Are We There Yet? arXiv; 2024. Available from: http://arxiv.org/abs/2308.05777

Files (36.4 GB)

Name	Size	MD5 checksum
astex_diverse_ensemble_benchmark_method_predictions.tar.gz	44.2 MB	eeba48db091d63b80e7613a34b92f115
astex_diverse_set.tar.gz	23.3 MB	f0bc078d73ff5af74ae2e82f450757ba
casp15_ensemble_benchmark_method_predictions.tar.gz	992.5 MB	f5bace269a8e288f44e8b7f2865fc73f
casp15_set.tar.gz	257.8 MB	06b32ab4382bc61b8242c441027ba60a
diffdock_benchmark_method_predictions.tar.gz	40.8 MB	d07823a878d06add9aa23c25e95703a4
dockgen_ensemble_benchmark_method_predictions.tar.gz	89.2 MB	713ae92ed337fffbc2464d5c23e65761
dockgen_set.tar.gz	85.3 MB	4065f15d828cd4e4ec9d304c923d0ce8
dynamicbind_benchmark_method_predictions.tar.gz	14.5 GB	a2dc761f4dfcac8c814bf4c8d9583a7e
fabind_benchmark_method_predictions.tar.gz	1.1 MB	5f813ad44ff04f91ec13cfeb80ed2f91
neuralplexer_benchmark_method_predictions.tar.gz	19.8 GB	e5a51f880c35846975b6023bd44dd685
posebusters_benchmark_ensemble_benchmark_method_predictions.tar.gz	243.8 MB	1e7636692afc97a998a55512bcc1eb9a
posebusters_benchmark_set.tar.gz	145.3 MB	9f65953bbc91bb894fee31ec6e3b48a1
rfaa_benchmark_method_predictions.tar.gz	160.8 MB	907537557790488d816d0feed8d6f0e3
tulip_benchmark_method_predictions.tar.gz	962.1 kB	97e8100adbf2d0df6dfddd4433a100a1
vina_benchmark_method_predictions.tar.gz	47.1 MB	cf76ae8fd41e6e6dff21b80bcb40ca6e
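Each archive in the file listing is published with an MD5 checksum. A minimal sketch of checking a downloaded archive against its listed checksum before unpacking it, using only the Python standard library; the helper names `md5sum` and `verify_and_extract` are hypothetical and are not part of PoseBench.

```python
import hashlib
import tarfile
from pathlib import Path


# Hypothetical helpers for illustration; not part of the PoseBench codebase.
def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, dest):
    """Raise ValueError on a checksum mismatch; otherwise extract the .tar.gz into dest."""
    actual = md5sum(archive)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {archive}: expected {expected_md5}, got {actual}")
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter when this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

For example, `verify_and_extract("astex_diverse_set.tar.gz", "f0bc078d73ff5af74ae2e82f450757ba", "data")` would check the Astex Diverse archive against its listed checksum and then unpack it into a local `data` directory (a directory name chosen here only for illustration). Hashing in chunks matters for the larger archives, such as the 19.8 GB NeuralPLexer predictions.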

Additional details

Repository URL: https://github.com/BioinfoMachineLearning/PoseBench
Programming language: Python
Development Status: Active

