There is a newer version of the record available.

Published August 27, 2021 | Version 1.0
Dataset Open

TocoDecoy: a new approach to design unbiased datasets for training and benchmarking machine-learning scoring functions


  • 1. Zhejiang University


This dataset file contains the TocoDecoy datasets generated from the targets and active ligands of LIT-PCBA:

  • TD set: for the active ligands and their topologically dissimilar decoys, the ligand file name, 2D t-SNE vectors, SMILES, molecular weight (MW), Wildman-Crippen partition coefficient (log P), number of rotatable bonds (RB), number of hydrogen-bond acceptors (HBA), number of hydrogen-bond donors (HBD), number of halogens (HAL), topological similarity of each decoy to its seed active ligand, activity label (active or inactive), and split label (training set or test set).
  • CD set: the decoy conformations with low docking scores, generated by docking active ligands into protein pockets using Glide (Schrödinger).
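
As a rough illustration of how the TD set fields described above might be used, the sketch below reads a small delimited table and groups rows by split label and activity label. The column names (`name`, `smiles`, `similarity`, `label`, `train`), the `1`/`0` label encoding, the comma delimiter, and the three sample rows are all assumptions made for this example, not taken from the archive; the remaining descriptor columns (MW, log P, RB, HBA, HBD, HAL, t-SNE vectors) are omitted for brevity. Check the headers of the actual files before adapting it.

```python
import csv
import io
from collections import Counter

# Placeholder rows with HYPOTHETICAL column names and values; the real TD set
# files may use different headers, delimiters, and label encodings.
SAMPLE_TD = """name,smiles,similarity,label,train
active_0,CCO,1.0,1,1
decoy_0,CCN,0.3,0,1
decoy_1,CCCl,0.2,0,0
"""


def split_td_rows(handle):
    """Group rows into 'train' and 'test' using the assumed `train` column."""
    groups = {"train": [], "test": []}
    for row in csv.DictReader(handle):
        split = "train" if row["train"] == "1" else "test"
        groups[split].append(row)
    return groups


def count_labels(rows):
    """Count actives and decoys using the assumed `label` column (1 = active)."""
    return Counter("active" if row["label"] == "1" else "decoy" for row in rows)


groups = split_td_rows(io.StringIO(SAMPLE_TD))
train_counts = count_labels(groups["train"])
test_counts = count_labels(groups["test"])
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

To run this against a real file, replace `io.StringIO(SAMPLE_TD)` with an opened file handle and adjust the column names to match the file's header row.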



Files (4.9 GB)
