Improving the drug discovery process by using multiple classifier systems

Ruano-Ordás, David

doi:10.5281/zenodo.1261217

Published June 4, 2018 | Version v1.0.0

Dataset Open

Ruano-Ordás, David¹

1. University of Vigo

High-quality dataset gathered from ChEMBL version 22 for the target with UniProt accession P34972. The activity data were filtered as follows: potential duplicates were ignored, records carrying an activity comment or a data validity comment were excluded, and only data from binding assays with a pChEMBL value were kept. This led to a dataset composed of 3925 chemical compounds (instances), each represented by 2132 features. The first 2048 features encode chemical structure fingerprints (in FCFP_6 notation), while the remaining 84 correspond to physicochemical descriptors (such as Fractional Polar Surface Area, Rotatable Bonds or Molecular Weight). Finally, the set was transformed into a binary classification set, with the activity cut-off defined at a pChEMBL value > 7, and written to a tab-delimited text file. The final set contains 1977 active compounds and 1948 inactive compounds. Table 3 shows the codification of each feature grouped by type.
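The labelling rule and the feature layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the ordering (fingerprint bits first, descriptors last) is taken from the description, and the published file may already store the binary label rather than the raw pChEMBL value, so the column layout should be checked before reuse.

```python
N_FINGERPRINT = 2048   # FCFP_6 fingerprint features, per the description
N_PHYSCHEM = 84        # physicochemical descriptors, per the description
ACTIVITY_CUTOFF = 7.0  # a compound is labelled active when pChEMBL > 7


def is_active(pchembl):
    """Return True when a pChEMBL value is strictly above the cut-off."""
    return float(pchembl) > ACTIVITY_CUTOFF


def split_features(row):
    """Split one feature row into (fingerprint, descriptors).

    Assumes the row holds exactly the 2132 features, with no
    identifier or label column mixed in (hypothetical layout).
    """
    expected = N_FINGERPRINT + N_PHYSCHEM
    if len(row) != expected:
        raise ValueError(f"expected {expected} features, got {len(row)}")
    return row[:N_FINGERPRINT], row[N_FINGERPRINT:]
```

Because the comparison is strict, a compound with a pChEMBL value of exactly 7 falls in the inactive class.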

Files (17.8 MB)

Name	Size	Checksum
d4n_corpus_physchem.csv	17.8 MB	md5:9f5da5cd7ab2172624dad674dd4ee970
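The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names and the local file path are assumptions, not part of the record.

```python
import hashlib

# Checksum listed on the record for d4n_corpus_physchem.csv
EXPECTED_MD5 = "9f5da5cd7ab2172624dad674dd4ee970"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("d4n_corpus_physchem.csv")` should return `True` for an uncorrupted copy saved under that (assumed) local name.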

