There is a newer version of the record available.

Published June 1, 2018 | Version 1.0.0
Dataset | Open access

Datasets for practical model selection for prospective virtual screening

Description

This repository contains datasets for the manuscript "Practical model selection for prospective virtual screening":

  • pria_rmi_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets. The files also contain the associated continuous % inhibition values and chemical features represented as SMILES strings and ECFP4 fingerprints. The dataset has been split into five folds for cross-validation.
  • pria_rmi_pcba_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets as well as public PubChem BioAssay datasets. The files also contain the PriA-SSB and RMI-FANCM continuous % inhibition values and chemical features represented as SMILES strings and ECFP4 fingerprints. The dataset has been split into five folds for cross-validation. Missing values are left blank.
  • pria_prospective.csv.gz: A compressed file containing chemical screening data for the PriA-SSB prospective binary dataset. The file also contains the continuous % inhibition values and chemical features represented as SMILES strings and ECFP4 fingerprints.
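A minimal Python sketch of how the five-fold split could be used for leave-one-fold-out cross-validation, assuming pandas and NumPy. The column names (`Fingerprints`, `PriA-SSB AS`), the encoding of ECFP4 fingerprints as strings of 0/1 characters, and the helper functions are all hypothetical; check the headers of the extracted files before relying on it. Blank labels, as in pria_rmi_pcba_cv.tar.gz, are read by pandas as NaN and dropped one task at a time.

```python
import numpy as np
import pandas as pd

# Hypothetical column name: the exact headers in the released files are not
# stated in this record. ECFP4 is assumed to be stored as a string of '0'/'1'
# characters; reading it as str keeps pandas from parsing it as an integer
# (which would silently drop leading zeros).
FINGERPRINT_COL = "Fingerprints"


def load_fold(path_or_buffer):
    """Read one fold; pandas infers gzip compression from a .gz file extension."""
    return pd.read_csv(path_or_buffer, dtype={FINGERPRINT_COL: str})


def fingerprints_to_array(fp_strings):
    """Convert bit strings such as '0101...' into a 2-D uint8 feature matrix."""
    return np.array([[int(bit) for bit in fp] for fp in fp_strings], dtype=np.uint8)


def leave_one_fold_out(folds, test_index):
    """Hold out folds[test_index] for testing; concatenate the rest for training."""
    test = folds[test_index]
    train = pd.concat(
        [fold for i, fold in enumerate(folds) if i != test_index],
        ignore_index=True,
    )
    return train, test


def labeled_subset(df, label_col):
    """Keep rows that have a label for one task; blank cells are read as NaN."""
    mask = df[label_col].notna()
    X = fingerprints_to_array(df.loc[mask, FINGERPRINT_COL])
    y = df.loc[mask, label_col].astype(int).to_numpy()
    return X, y
```

For example, `train, test = leave_one_fold_out(folds, 0)` followed by `labeled_subset(train, "PriA-SSB AS")` gives a training matrix for one task with the first fold held out. Dropping missing labels per task rather than per molecule keeps molecules that are labeled for other tasks.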

If you use this data in a publication, please cite:

Shengchao Liu+, Moayad Alnammi+, Spencer S. Ericksen, Andrew F. Voter, James L. Keck, F. Michael Hoffmann, Scott A. Wildman, Anthony Gitter. Practical model selection for prospective virtual screening. bioRxiv 2018. doi:10.1101/337956

PubChem data were provided by the PubChem database.  Follow the PubChem citation guidelines if you use the PubChem data.

Files

Total size: 65.3 MB

  • 2.1 MB, md5:56e2670b220a1e1992dbbcc62ca42382
  • 6.7 MB, md5:5a59e477bd4243b73c0dc775b4cfe057
  • 56.5 MB, md5:7c89e92d4269a7ea1ac39dcda65ae70c
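A short Python sketch for checking a download against the MD5 checksums listed for this record, using only the standard library's `hashlib`. Because this copy of the record does not pair each checksum with a file name, the hypothetical helper `matches_record` only checks whether a file's digest is one of the three listed values.

```python
import hashlib

# Checksums as listed in this record. The file each one belongs to is not
# named here, so a file is accepted if it matches any of them.
EXPECTED_MD5 = {
    "56e2670b220a1e1992dbbcc62ca42382",
    "5a59e477bd4243b73c0dc775b4cfe057",
    "7c89e92d4269a7ea1ac39dcda65ae70c",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest in chunks, without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Return True if the file's MD5 digest equals one of the listed checksums."""
    return md5sum(path) in EXPECTED_MD5
```

Reading in fixed-size chunks keeps memory use constant, which matters for the 56.5 MB archive.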

References

  • Liu et al. (2018) Practical model selection for prospective virtual screening. bioRxiv doi:10.1101/337956