Datasets for practical model selection for prospective virtual screening
Description
This repository contains datasets for the manuscript "Practical model selection for prospective virtual screening":
- pria_rmi_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets. The files also contain the associated continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints. The dataset has been split into five folds for cross-validation.
- pria_rmi_pcba_cv.tar.gz: A compressed directory containing chemical screening data for the PriA-SSB AS, PriA-SSB FP, and RMI-FANCM FP binary datasets as well as public PubChem BioAssay datasets. The files also contain the PriA-SSB and RMI-FANCM continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints. The dataset has been split into five folds for cross-validation. Missing values are left blank.
- pria_prospective.csv.gz: A compressed file containing chemical screening data for the binary dataset PriA-SSB prospective. The file also contains the continuous % inhibition values and chemical features represented as SMILES and Morgan fingerprints.
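A minimal Python sketch for working with these files: reading a gzip-compressed CSV, decoding a Morgan fingerprint, and rotating five folds into train/test splits. Several details are assumptions rather than documented properties of the release: the column names `rdkit SMILES` and `Fingerprints`, fingerprints being stored as strings of '0'/'1' characters, and the contents of the .tar.gz directories also being CSV files. Check the headers of the files you download before relying on them. `read_gz_csv`, `parse_fingerprint`, and `cv_splits` are hypothetical helper names.

```python
import csv
import gzip
from typing import Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical column names (assumption): the headers in the released files may differ.
SMILES_COL = "rdkit SMILES"
FP_COL = "Fingerprints"


def read_gz_csv(source) -> List[Dict[str, str]]:
    """Read a gzip-compressed CSV (path or binary file object) into row dicts."""
    with gzip.open(source, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle))


def parse_fingerprint(bits: str) -> List[int]:
    """Decode a fingerprint assumed to be stored as a string of '0'/'1' characters."""
    return [int(b) for b in bits.strip()]


def cv_splits(folds: Sequence[List[T]]) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs, holding out each fold exactly once as the test set."""
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, list(test)
```

For example, `read_gz_csv("pria_prospective.csv.gz")` would return one dictionary per compound. Because missing values are left blank, `csv.DictReader` returns them as empty strings, so numeric columns need a check such as `float(v) if v else None` before use.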
If you use these data in a publication, please cite:
Shengchao Liu+, Moayad Alnammi+, Spencer S. Ericksen, Andrew F. Voter, Gene E. Ananiev, James L. Keck, F. Michael Hoffmann, Scott A. Wildman, Anthony Gitter. Practical Model Selection for Prospective Virtual Screening. Journal of Chemical Information and Modeling. 2018. doi:10.1021/acs.jcim.8b00363
PubChem data were provided by the PubChem database. Follow the PubChem citation guidelines if you use the PubChem data. See Voter et al. 2017 (PubChem AID 1272365) for the PriA-SSB screening data and Voter et al. 2016 (PubChem AID 1159607) for RMI-FANCM.
Version 1.1.0 updates all of the data files. We standardized the SMILES in all files by generating canonical SMILES with RDKit version 2016.03.4. In addition, we removed 2845 chemicals from pria_prospective.csv.gz that were duplicates of compounds in pria_rmi_cv.tar.gz.
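Standardizing SMILES before removing duplicates matters because one compound can be written as several different strings (for example, `CCO` and `OCC` both describe ethanol), so exact string comparison only finds duplicates once every string is in a single canonical form. The sketch below filters a candidate list against a reference set under the assumption that duplicates are identified by exact match of canonical SMILES; that matching criterion is an assumption, and `remove_overlap` and `identity` are hypothetical names. The RDKit functions mentioned in the comment (`Chem.MolFromSmiles`, `Chem.MolToSmiles`) exist, but canonical output can differ between RDKit releases, which is why one pinned version should be used for both sets.

```python
from typing import Callable, Iterable, List


def identity(smiles: str) -> str:
    """Placeholder canonicalizer; assumes inputs are already canonical SMILES."""
    return smiles


# With RDKit installed, a canonicalizer could look like this (not executed here).
# Chem.MolFromSmiles returns None for unparsable SMILES, so handle that case first.
#   from rdkit import Chem
#   def rdkit_canonical(smiles):
#       return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))


def remove_overlap(
    candidates: Iterable[str],
    reference: Iterable[str],
    canonicalize: Callable[[str], str] = identity,
) -> List[str]:
    """Return candidate SMILES whose canonical form does not appear in the reference set."""
    reference_keys = {canonicalize(s) for s in reference}
    return [s for s in candidates if canonicalize(s) not in reference_keys]
```

Building the reference keys as a set keeps each membership check constant-time, which matters when the reference set holds tens of thousands of compounds.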
Files
Total size: 65.0 MB
| Size | MD5 checksum |
|---|---|
| 1.6 MB | 874559a45bb0b38c11f06d1c0aef2767 |
| 6.7 MB | d08574d9a9de52fb4581935b265d55fe |
| 56.7 MB | 2c2d94f819754cd50a09a387cc9a60b8 |
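The MD5 checksums listed for the files can confirm that a download completed intact. A short sketch using Python's standard `hashlib` module; `md5sum` and `verify` are illustrative names. The file listing does not name the file for each checksum, so pair each downloaded file with its checksum by size.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a downloaded file's digest with an expected checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.lower()
```

Reading in chunks keeps memory use flat even for the largest (56.7 MB) file.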
Additional details
Related works
- Is supplement to:
  - https://github.com/gitter-lab/pria_lifechem (URL)
  - 10.1021/acs.jcim.8b00363 (DOI)
References
- Liu et al. (2018) Practical Model Selection for Prospective Virtual Screening. Journal of Chemical Information and Modeling. doi:10.1021/acs.jcim.8b00363