Published August 24, 2023 | Version v1
Preprint Open

PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences

  • 1. University of Oxford

Description

This record contains the protein-ligand complexes of the Astex Diverse set and the PoseBusters Benchmark set, as described in the paper "PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences" [1]. The associated code is available at https://github.com/maabuu/posebusters.

All protein-ligand complexes were originally submitted to the Protein Data Bank (PDB) [2]. The Astex Diverse set was curated by Hartshorn et al. [3] and contains 85 cases. The PoseBusters Benchmark set was curated by us using the PDB and contains 428 cases. All complexes were downloaded from the PDB as MMTF [4] files and processed with PyMOL [5]. 

Note: During peer review we learned that some of the 428 structures contain crystal contacts (e.g. 5S8I_2LY). For the journal paper the results are reported on a subset containing 308 structures. The IDs for this subset can be downloaded here: PoseBusters 308 identifiers
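As a rough illustration of restricting the full benchmark to the smaller subset, the sketch below filters a list of complex identifiers against a list of subset identifiers. It assumes identifiers follow the pattern seen in the example above (a four-character PDB code, an underscore, then a ligand code, as in 5S8I_2LY); the function names and the plain list-of-strings input are hypothetical and not taken from the PoseBusters code base.

```python
def parse_complex_id(complex_id):
    """Split an identifier such as '5S8I_2LY' into (PDB code, ligand code).

    The 'PDBCODE_LIGAND' layout is an assumption inferred from the single
    example ID quoted in the note above.
    """
    pdb_code, _, ligand_code = complex_id.strip().partition("_")
    if len(pdb_code) != 4 or not ligand_code:
        raise ValueError(f"unexpected identifier format: {complex_id!r}")
    return pdb_code.upper(), ligand_code.upper()


def select_subset(all_ids, subset_ids):
    """Keep only the identifiers that also appear in subset_ids.

    Comparison ignores case and surrounding whitespace; the order of
    all_ids is preserved.
    """
    keep = {item.strip().upper() for item in subset_ids if item.strip()}
    return [item for item in all_ids if item.strip().upper() in keep]
```

In practice, `subset_ids` would come from the downloadable identifier list and `all_ids` from the names of the complexes in the archive; how those are stored is not specified here.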


Paper Abstract:
The last few years have seen the development of numerous deep learning-based protein-ligand docking methods. They offer huge promise in terms of speed and accuracy. However, despite claims of state-of-the-art performance in terms of crystallographic root-mean-square deviation (RMSD), upon closer inspection, it has become apparent that they often produce physically implausible molecular structures. It is therefore not sufficient to evaluate these methods solely by RMSD to a native binding mode. It is vital, particularly for deep learning-based methods, that they are also evaluated on steric and energetic criteria. We present PoseBusters, a Python package that performs a series of standard quality checks using the well-established cheminformatics toolkit RDKit. The PoseBusters test suite validates chemical and geometric consistency of a ligand including its stereochemistry, and the physical plausibility of intra- and intermolecular measurements such as the planarity of aromatic rings, standard bond lengths, and protein-ligand clashes. Only methods that both pass these checks and predict native-like binding modes should be classed as having “state-of-the-art” performance. We use PoseBusters to compare five deep learning-based docking methods (DeepDock, DiffDock, EquiBind, TankBind, and Uni-Mol) and two well-established standard docking methods (AutoDock Vina and CCDC Gold) with and without an additional post-prediction energy minimisation step using a molecular mechanics force field. We show that both in terms of physical plausibility and the ability to generalise to examples that are distinct from the training data, no deep learning-based method yet outperforms classical docking tools. In addition, we find that molecular mechanics force fields contain docking-relevant physics missing from deep-learning methods.
PoseBusters allows practitioners to assess docking and molecular generation methods and may inspire new inductive biases still required to improve deep learning-based methods, which will help drive the development of more accurate and more realistic predictions.
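To make the abstract's two-part success criterion concrete, the following minimal sketch combines an RMSD test with a single, highly simplified physical check. It is not the PoseBusters implementation: the 2.0 Å RMSD cutoff is a common docking convention assumed here, the fixed 2.0 Å minimum protein-ligand distance is a hypothetical stand-in for the package's actual clash test, and the RMSD ignores molecular symmetry and assumes both poses list atoms in the same order.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two equally ordered 3D coordinate lists."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def has_clash(ligand, protein, min_distance=2.0):
    """True if any ligand atom lies closer than min_distance to any protein atom.

    A fixed distance cutoff is an assumption made for illustration only.
    """
    return any(math.dist(a, b) < min_distance for a in ligand for b in protein)


def is_success(pred, ref, protein, rmsd_cutoff=2.0, min_distance=2.0):
    """A pose counts only if it is both native-like and free of clashes."""
    native_like = rmsd(pred, ref) <= rmsd_cutoff
    return native_like and not has_clash(pred, protein, min_distance)
```

The point of the combined predicate is that a low RMSD alone is not enough: a pose that overlaps the protein is rejected even when it sits close to the crystal ligand.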

References

[1] Buttenschoen M, Morris GM, Deane CM. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. arXiv; 2023. Available from: http://arxiv.org/abs/2308.05777

[2] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–42. Available from: https://doi.org/10.1093/nar/28.1.235

[3] Hartshorn MJ, Verdonk ML, Chessari G, Brewerton SC, Mooij WTM, Mortenson PN, et al. Diverse, high-quality test set for the validation of protein-ligand docking performance. J Med Chem. 2007 Feb 1;50(4):726–41. 

[4] Bradley AR, Rose AS, Pavelka A, Valasatava Y, Duarte JM, Prlić A, et al. MMTF—an efficient file format for the transmission, visualization, and analysis of macromolecular structures. Schneidman D, editor. PLoS Comput Biol. 2017 Jun 2;13(6):e1005575. 

[5] Schrödinger, LLC. The PyMOL molecular graphics system. 2015. Available from: https://github.com/schrodinger/pymol-open-source

Files

posebusters_paper_data.zip

Files (55.0 MB)

Checksum Size
md5:f004ac7c4e68317b5348497d2bb6bee6 53.7 MB
md5:a7cbe725e86e412fdfeb3c3e35c566dd 1.3 MB
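The MD5 checksums listed above can be used to confirm that a download arrived intact. The sketch below uses only the Python standard library; the helper names are illustrative, and which checksum belongs to which file name is not stated in this listing, so the pairing in the usage note is an assumption.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, if the 53.7 MB entry corresponds to posebusters_paper_data.zip, then `matches_checksum("posebusters_paper_data.zip", "md5:f004ac7c4e68317b5348497d2bb6bee6")` should return `True` for an uncorrupted download.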

Additional details