FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction

Morehead, Alex; Cheng, Jianlin

doi:10.5281/zenodo.15066450

Published March 21, 2025 | Version 0.0.4

Preprint Open

1. University of Missouri

This record contains the preprocessed datasets and model weights accompanying the manuscript "FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction" [1]. In particular, it provides the preprocessed PDBBind-E dataset, which comprises pairs of predicted (apo) and crystal (holo) protein structures for PDBBind 2020, Binding MOAD, DockGen, and the PDB-based van der Mers (vdm) dataset. Note that the "holo_aligned" protein structures included for each constituent dataset, which were predicted by ESMFold, have been pre-aligned to the corresponding holo (crystal) protein structures.

Paper Abstract:

Motivation: Powerful generative AI models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently or have been rigorously benchmarked on pharmacologically relevant drug targets, hindering their widespread adoption in drug discovery efforts.

Results: In this work, we propose FlowDock, the first deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. For the well-known PoseBusters Benchmark dataset, FlowDock outperforms single-sequence AlphaFold 3 with a 51% blind docking success rate using unbound (apo) protein input structures and without any information derived from multiple sequence alignments, and for the challenging new DockGen-E dataset, FlowDock outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top-5 methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening.

Availability and implementation: Source code, data, and pre-trained models are available at https://github.com/BioinfoMachineLearning/FlowDock.

References

[1] Morehead A, Cheng J. FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction. arXiv; 2024. Available from: https://arxiv.org/abs/2412.10966 (ISMB 2025)

Files (39.9 GB)

Name	MD5	Size
alphafold3_baseline_method_predictions.tar.gz	58f116a86fb703a73bdbca74e01ca26e	334.0 MB
chai_baseline_method_predictions.tar.gz	f0d3e4c72c25b3e6881f3d55aefe99b1	596.0 MB
diffdock_baseline_method_predictions.tar.gz	208a7e70ad2052576f43c279285c3397	9.7 MB
dynamicbind_baseline_method_predictions.tar.gz	6c2f9127e781b1fa0cc44033d62036cd	6.7 GB
flowdock_aft_baseline_method_predictions.tar.gz	a8520df183c957b2260f867996120e4c	1.2 GB
flowdock_baseline_method_predictions.tar.gz	7d3fc8b045dfd00c47ddef9ca806632b	1.1 GB
flowdock_chai_baseline_method_predictions.tar.gz	574c4d5b5902f6b0f7e896c64f08661e	711.3 MB
flowdock_checkpoints.tar.gz	a19bbf4d49f20af5f94303c9370f9dee	3.0 GB
flowdock_dockgen_data.tar.gz	99df042d68e30e37ff205ef954185d61	808.5 MB
flowdock_esmfold_baseline_method_predictions.tar.gz	df7115b343847aad1a1c30a3a048e806	1.2 GB
flowdock_hp_baseline_method_predictions.tar.gz	c368a31f008c0d04fa4957e635606d4c	1.1 GB
flowdock_moad_data.tar.gz	f83d6cbd788ae126f95347cb17944c8a	2.5 GB
flowdock_pdbbind_data.tar.gz	ef0f7858171fe8bc1e2f7e214a20b3eb	884.2 MB
flowdock_pdbsidechain_data.tar.gz	44d110aa29e83530a2ee1160f70c5793	18.3 GB
flowdock_pft_baseline_method_predictions.tar.gz	54c7b25f2f28782f416be648727eb03f	349.4 MB
neuralplexer_baseline_method_predictions.tar.gz	0c30c09ca508d1371dc7cd1450cf6ece	1.1 GB
rfaa_baseline_method_predictions.tar.gz	f3f9bcc5e1aeac4907b6ed650652a834	81.3 MB
vina_p2rank_baseline_method_predictions.tar.gz	c1c1a9ec2a617dbca49988b5924199c7	13.6 MB
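
Because several of these archives are multiple gigabytes, it may be worth checking each download against the MD5 checksum listed with it before unpacking. The following is a minimal sketch of one way to do that using only the Python standard library. The helper names (`md5_of`, `verify_and_extract`) and the `EXPECTED_MD5` mapping are illustrative assumptions and are not part of the FlowDock codebase; the two checksums shown are copied from the listing above, and the local paths in the usage note are placeholders.

```python
import hashlib
import tarfile
from pathlib import Path

# Checksums copied from the file listing (two shown; extend as needed).
EXPECTED_MD5 = {
    "flowdock_checkpoints.tar.gz": "a19bbf4d49f20af5f94303c9370f9dee",
    "flowdock_pdbbind_data.tar.gz": "ef0f7858171fe8bc1e2f7e214a20b3eb",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, dest):
    """Extract `archive` into `dest` only if its MD5 matches `expected_md5`."""
    actual = md5_of(archive)
    if actual != expected_md5.lower():
        raise ValueError(f"MD5 mismatch for {archive}: got {actual}")
    Path(dest).mkdir(parents=True, exist_ok=True)
    # Hypothetical helper: unpacks the whole gzip-compressed tarball.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return actual
```

With an archive already downloaded to the working directory, a call such as `verify_and_extract("flowdock_checkpoints.tar.gz", EXPECTED_MD5["flowdock_checkpoints.tar.gz"], "checkpoints")` would raise a `ValueError` on a corrupted download instead of unpacking it. Reading in chunks keeps memory use flat even for the 18.3 GB archive.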

Additional details

Available: 2025-03-21 (public release)

Repository URL: https://github.com/BioinfoMachineLearning/FlowDock
Programming language: Python
Development Status: Active

