There is a newer version of the record available.

Published October 31, 2025 | Version v1
Dataset Open

NextTopDocker

  • 1. Université Paris Cité, CNRS UMR 8251, INSERM ERL 1133, F-75013 Paris, France
  • 2. Department of Bioengineering, Imperial College London, London SW7 2AZ, UK

Description

NextTopDocker is a large-scale, up-to-date (as of May 2025), and fully open-access data set of 19,239 PDB-derived protein-ligand complexes. It is split into 14,038 training and 5,201 test entries via a strict cold-ligand strategy and comes with nine ligand-similarity-aware training subsets. Together, these provide a challenging, diverse, and reproducible foundation for evaluating pose generation and docking performance.
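The record does not spell out how the cold-ligand split is built. As a minimal sketch, assuming "cold-ligand" means that no ligand in the test set also occurs in the training set, the idea can be illustrated as follows. The function name, the (PDB ID, ligand ID) tuple layout, and the toy identifiers are all hypothetical and are not taken from the data set.

```python
def cold_ligand_split(complexes, test_ligands):
    """Split (pdb_id, ligand_id) pairs so that test ligands never appear in training.

    Assumption: a complex belongs to the test set if and only if its ligand
    identifier is in `test_ligands`. The real data set may use a different rule.
    """
    train, test = [], []
    for pdb_id, ligand_id in complexes:
        target = test if ligand_id in test_ligands else train
        target.append((pdb_id, ligand_id))
    return train, test


# Hypothetical toy entries, for illustration only.
complexes = [
    ("1abc", "LIG_A"),
    ("2def", "LIG_A"),
    ("3ghi", "LIG_B"),
    ("4jkl", "LIG_C"),
]
train, test = cold_ligand_split(complexes, test_ligands={"LIG_B"})

# The defining property of the split: the two ligand sets do not overlap.
train_ligands = {ligand for _, ligand in train}
held_out_ligands = {ligand for _, ligand in test}
assert train_ligands.isdisjoint(held_out_ligands)
```

Splitting by ligand identity rather than by complex keeps a model from being tested on a ligand it has already seen bound to another protein, which is presumably what makes the benchmark challenging.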

On this benchmark data set, our simple logistic regression models, LogReg (x%), were trained on Smina and GNINA 1.3 scores from chemically dissimilar ligands and applied to Smina-generated poses. They achieved docking power comparable to or exceeding that of four state-of-the-art end-to-end ML docking tools: DeepDock, Interformer, SurfDock, and Uni-Mol Docking v.2.
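To make the rescoring idea concrete, here is a rough sketch of a logistic regression that combines two per-pose scores into a probability and then picks the top-ranked pose. It is not the authors' implementation. Several things are assumptions: that the two features are a Smina score and a GNINA CNN pose score, that the label marks a pose as near-native, that plain gradient descent is an acceptable stand-in for whatever solver was used, and all toy numbers, which are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a pose with feature vector `x` is labeled 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logreg(features, labels, lr=0.05, epochs=3000):
    """Fit logistic regression with batch gradient descent (no regularization)."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Invented toy data: (smina_score, cnn_score) per pose.
# Assumption: lower Smina scores and higher CNN scores indicate better poses.
train_x = [(-9.5, 0.90), (-8.7, 0.80), (-9.0, 0.70),
           (-5.0, 0.20), (-6.1, 0.30), (-5.5, 0.10)]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = near-native pose (hypothetical label)

weights, bias = fit_logreg(train_x, train_y)

# Rank hypothetical candidate poses of one ligand by predicted probability.
poses = {"pose_1": (-6.0, 0.25), "pose_2": (-9.2, 0.85)}
best = max(poses, key=lambda name: predict_proba(weights, bias, poses[name]))
print(best)
```

In practice a library implementation with feature scaling would be a more robust choice; the hand-written loop is only there to keep the sketch dependency-free.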

Files

DeepDock.zip

Files (33.6 GB)

MD5 checksum                          Size
md5:f2f2a84cda7e5a72e5a9225c0edf6781  290.8 MB
md5:a14ce1fc348a23e7f8f4319c81829602  250.1 MB
md5:9f8a188f2958b9fcb200cee70dac3dd1  12.9 GB
md5:2c9a3bac624b2aa81a1bb9a3cf0ca4b5  874.8 MB
md5:a6463de2f2163bea42274f097fa2b993  18.9 GB
md5:def5e26fc9ffecad7a50d9459e7760dd  453.8 MB
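Because the archives are large, it can be worth checking a download against the listed MD5 checksum before unpacking it. The following is a minimal sketch using Python's standard hashlib module. The helper names and the file path in the usage comment are hypothetical, and the listing above does not show which checksum belongs to which file name, so the pairing has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (hypothetical path; substitute the checksum listed for that file):
# ok = matches_checksum("path/to/downloaded_archive.zip",
#                       "md5:<checksum from the file listing>")
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte archives in this record.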