Published August 7, 2019 | Version v2
Dataset | Open access

Compound activity classes from ChEMBL for machine learning analysis

1. Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität

Description

Ten activity classes extracted from ChEMBL version 24 are provided for machine learning studies. Compounds are given as SMILES strings. The following selection criteria were applied. Compounds had to be tested in a direct binding assay against a single human protein with a ChEMBL assay confidence score of 9, and Ki measurements had to be available. If multiple Ki values were available for a compound and they did not fall within the same order of magnitude, the compound was excluded. Furthermore, only compounds with a (mean) pKi of at least 5 (that is, Ki ≤ 10 µM) were considered. Finally, each activity class had to contain at least 200 compounds belonging to at least 50 computationally determined analog series. The ten deposited classes comprise 243 to 955 compounds and 57 to 216 analog series.
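The compound-level and class-level criteria described above can be sketched as a chain of filters. This is a minimal sketch under stated assumptions, not the authors' pipeline: the record keys in `passes_assay_filter` are hypothetical (loosely modelled on ChEMBL assay fields), Ki values are assumed to be in nM, "within the same order of magnitude" is interpreted as a pKi spread of at most one log unit, and analog series assignments are taken as a precomputed input because the series method is not described here.

```python
import math
from statistics import mean

# Sketch of the selection criteria; record keys and the "same order of
# magnitude" rule are assumptions, not the dataset authors' code.

def passes_assay_filter(record: dict) -> bool:
    """Keep Ki records from binding assays on a single human protein.

    Hypothetical keys, loosely modelled on ChEMBL assay/activity fields.
    """
    return (
        record["assay_type"] == "B"              # binding assay
        and record["confidence_score"] == 9      # direct single-protein target
        and record["organism"] == "Homo sapiens"
        and record["standard_type"] == "Ki"
    )

def pki_from_ki_nm(ki_nm: float) -> float:
    """pKi = -log10(Ki in mol/L); Ki is assumed to be given in nM."""
    return 9.0 - math.log10(ki_nm)

def consistent_mean_pki(ki_values_nm, max_spread=1.0):
    """Mean pKi, or None if values span more than max_spread log units (assumed rule)."""
    pkis = [pki_from_ki_nm(k) for k in ki_values_nm]
    if max(pkis) - min(pkis) > max_spread:
        return None
    return mean(pkis)

def select_compounds(ki_by_smiles, min_pki=5.0):
    """Map SMILES -> mean pKi for compounds with consistent Ki and pKi >= min_pki."""
    selected = {}
    for smiles, kis in ki_by_smiles.items():
        p = consistent_mean_pki(kis)
        if p is not None and p >= min_pki:
            selected[smiles] = p
    return selected

def class_qualifies(selected, series_of, min_compounds=200, min_series=50):
    """True if the class has >= min_compounds compounds in >= min_series analog series.

    series_of maps SMILES -> analog series ID (hypothetical, precomputed input).
    """
    n_series = len({series_of[s] for s in selected})
    return len(selected) >= min_compounds and n_series >= min_series

if __name__ == "__main__":
    toy = {"CCO": [50.0, 120.0], "c1ccccc1": [20000.0], "CCN": [1.0, 500.0]}
    print(sorted(select_compounds(toy)))  # → ['CCO']
```

In the toy input, benzene is dropped because its pKi is below 5, and the "CCN" entry is dropped because its two Ki values differ by more than one order of magnitude.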

Files (541.0 kB)

Size | md5 checksum
541.0 kB | 8608ff11ec4dc8204d5fffef59a23eba
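The md5 checksum listed for the deposited file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the file name is not given on this page, so the path is passed as a command-line argument, and the helper names are illustrative only.

```python
import hashlib
import sys

# Checksum listed on the record page for the deposited file.
EXPECTED_MD5 = "8608ff11ec4dc8204d5fffef59a23eba"

def md5_of_file(path, chunk_size=1 << 20):
    """Hex md5 digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's md5 matches the expected (case-insensitive) hex digest."""
    return md5_of_file(path) == expected.lower()

if __name__ == "__main__":
    # Usage: python verify_md5.py <path-to-downloaded-file>
    print("OK" if verify_download(sys.argv[1]) else "MISMATCH")
```

md5 is adequate for detecting accidental corruption during transfer, which is its role here; it is not meant as protection against deliberate tampering.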