There is a newer version of the record available.

Published August 27, 2021 | Version 1.0
Dataset | Open access

TocoDecoy: a new approach to design unbiased datasets for training and benchmarking machine-learning scoring functions

Creators

  • 1. Zhejiang University

Description

This record contains TocoDecoy datasets generated from the targets and active ligands of LIT-PCBA.

1_property_filtered.zip:

  • TD set: for each active ligand and its topologically dissimilar decoys, the ligand file name, 2D t-SNE vectors, SMILES, molecular weight (MW), Wildman-Crippen partition coefficient (log P), number of rotatable bonds (RB), number of hydrogen-bond acceptors (HBA), number of hydrogen-bond donors (HBD), number of halogens (HAL), topological similarity of each decoy to its seed active ligand, activity label (active or inactive), and training-set label (whether the molecule belongs to the training set or the test set).
  • CD set: decoy conformations with low docking scores, generated by docking the active ligands into the protein pockets with Glide (Schrödinger).
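The TD set lists six physicochemical properties for every molecule and the topological similarity of each decoy to its seed active, but this record does not state the selection rule or its thresholds. The following is a minimal sketch, assuming a common property-matching scheme: a decoy is kept if each property lies within a tolerance of the active's value and its Tanimoto similarity to the active's fingerprint is at or below a cutoff. The `Props` type, the tolerances, the cutoff, and the set-of-on-bits fingerprint representation are all hypothetical illustrations, not TocoDecoy's actual parameters.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Props:
    """The six physicochemical properties listed for the TD set."""
    mw: float    # molecular weight
    logp: float  # Wildman-Crippen log P
    rb: int      # rotatable bonds
    hba: int     # hydrogen-bond acceptors
    hbd: int     # hydrogen-bond donors
    hal: int     # halogens


# HYPOTHETICAL tolerances and cutoff, chosen for illustration only;
# they are not taken from the TocoDecoy record.
TOLERANCES = Props(mw=40.0, logp=1.0, rb=1, hba=1, hbd=1, hal=1)
MAX_SIMILARITY = 0.3


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention used here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def properties_match(active: Props, decoy: Props, tol: Props = TOLERANCES) -> bool:
    """True if every decoy property is within the tolerance of the active's value."""
    return all(
        abs(getattr(active, f.name) - getattr(decoy, f.name)) <= getattr(tol, f.name)
        for f in fields(Props)
    )


def keep_decoy(active: Props, active_fp: set[int],
               decoy: Props, decoy_fp: set[int]) -> bool:
    """Keep a decoy that is property-matched AND topologically dissimilar."""
    return (properties_match(active, decoy)
            and tanimoto(active_fp, decoy_fp) <= MAX_SIMILARITY)
```

In practice the property values and fingerprints would come from a cheminformatics toolkit; the sketch takes them as precomputed inputs so it stays dependency-free.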


Files (4.9 GB)

  • 1_property_filtered.zip: 4.9 GB, md5:24e5ee6f96a941730bfe41ae771e0e05
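Because the archive is 4.9 GB, it is worth checking the download against the md5 checksum published for 1_property_filtered.zip before unpacking it. A minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify` are illustrative, and the file is read in 1 MiB chunks so the whole archive is never held in memory.

```python
import hashlib
from pathlib import Path

# md5 checksum published on this record for 1_property_filtered.zip.
EXPECTED_MD5 = "24e5ee6f96a941730bfe41ae771e0e05"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, streamed in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's md5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify(Path("1_property_filtered.zip"))` returns `True` only if the local copy matches the published checksum.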