There is a newer version of the record available.

Published February 2, 2024 | Version v3
Dataset Open

Morphing libraries, QSAR models, and compounds predicted to be active on the Glucocorticoid receptor (GR)

  • 1. University of Chemistry and Technology

Description

This repository contains datasets and files from a computational drug discovery project exploring the chemical space of the glucocorticoid receptor (GR). The accompanying Python code is freely available on GitHub (https://github.com/Iagea/GRML_analyses).

Morphing Libraries:

  • GRML_library.csv: The GRML library is the collection of 999,015 virtual compounds generated by Molpher [1-2] starting from GR ligands with unique Bemis-Murcko scaffolds collected from the ChEMBL17 and IMG libraries.
  • RML_library.csv: The RML library is the collection of 1,346,310 virtual compounds generated by Molpher starting from compounds with unique Bemis-Murcko scaffolds randomly selected from the ZINC database.

IMG library:

  • IMG_non_proprietary.csv: The non-proprietary IMG library subset containing 12,956 compounds and their corresponding B-scores from the primary screen.

Molpher inputs:

  • GR_inputs.csv: The GR inputs are the ligands used to create the GRML library, 204 compounds from ChEMBL17 (95 compounds) and the non-proprietary dataset from IMG (109 compounds).
  • Random_inputs.csv: The random inputs are 249 random ZINC compounds used to create the RML library.
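As a quick sanity check after download, the compound counts quoted above can be compared with the number of data rows in each CSV file. The sketch below uses only the Python standard library. It assumes, without confirmation from the record, that each file has a single header row followed by one compound per row; `count_data_rows` and `check_counts` are hypothetical helper names.

```python
import csv
import os

# Compound counts as stated in the description above.
EXPECTED_COUNTS = {
    "GRML_library.csv": 999_015,
    "RML_library.csv": 1_346_310,
    "IMG_non_proprietary.csv": 12_956,
    "GR_inputs.csv": 204,
    "Random_inputs.csv": 249,
}


def count_data_rows(path, has_header=True):
    """Count CSV records in `path`, skipping one header row if requested.

    Assumption: one compound per record; the real file layout may differ.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # discard the header row, if there is one
        return sum(1 for row in reader if row)  # blank lines are ignored


def check_counts(directory="."):
    """Return {file name: (rows found, rows expected)} for files that exist."""
    report = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = os.path.join(directory, name)
        if os.path.exists(path):
            report[name] = (count_data_rows(path), expected)
    return report
```

Calling `check_counts("path/to/download")` returns a pair of found and expected counts for each file present. A mismatch may simply mean the header assumption is wrong, so it is worth inspecting the first lines of the file before concluding that a download is incomplete.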

Model training sets:

  • Model33_training_set.csv: Training set for the random forest classification model. It includes 865 compounds: known GR actives and inactives from ChEMBL33 (738 compounds) and non-proprietary active ligands from the IMG library (127 compounds).
  • Model17_training_set.csv: Training set for the random forest classification model. It includes 601 compounds: known GR actives and inactives from ChEMBL17 (474 compounds) and non-proprietary active ligands from the IMG library (127 compounds).
  • RFR_training_set.csv: Training set for the random forest regression model. It includes 89 compounds: known GR actives and inactives from ChEMBL33 that fit the four-feature GR pharmacophore described in our paper.

Models:

  • Model33.pkl: Python pickle file containing the trained random forest classification models used along with Mondrian cross-conformal prediction to classify GR actives/inactives. This model was trained on the ChEMBL33 and IMG libraries.
  • Model17.pkl: Python pickle file containing the trained random forest classification models used along with Mondrian cross-conformal prediction to classify GR actives/inactives. This model was trained on the ChEMBL17 and IMG libraries.
  • RFR_model.pkl: Python pickle file containing the trained random forest regression model used to predict GR pEC50. This model was trained on RFR_training_set.csv.
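For readers unfamiliar with Mondrian conformal prediction, the following is a minimal sketch of the general idea and not the authors' implementation. For each candidate class, a p-value is computed from calibration examples of that class only, and a class is kept in the prediction set when its p-value exceeds the chosen significance level. The nonconformity score is an assumption here (for example, one minus the predicted probability of the class). Taking the median over folds is one common way to aggregate in a cross-conformal setting; the record does not state which aggregation was used. All function names are hypothetical.

```python
from statistics import median


def mondrian_p_value(cal_scores, cal_labels, test_score, label):
    """Class-conditional conformal p-value for one candidate label.

    Only calibration examples whose true label equals `label` are used,
    which is what makes the procedure "Mondrian".
    """
    same_class = [s for s, y in zip(cal_scores, cal_labels) if y == label]
    # Count calibration examples at least as nonconforming as the test one.
    n_at_least = sum(1 for s in same_class if s >= test_score)
    return (n_at_least + 1) / (len(same_class) + 1)


def aggregate_over_folds(fold_p_values):
    """Combine per-fold p-values; the median is an assumed choice."""
    return median(fold_p_values)


def prediction_set(p_by_label, significance):
    """Labels whose p-value exceeds the significance level."""
    return {label for label, p in p_by_label.items() if p > significance}
```

With two classes, the prediction set for a compound can contain one label, both labels, or neither, depending on the p-values and the significance level.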

Predicted active morphs:

  • all_morphs_actives_predicted.xlsx: An Excel spreadsheet containing two sheets. 1) All 22,524 GRML morphs predicted to be active. 2) All 4,341 RML morphs predicted to be active. The QED, NIBR severity score, and MolSkill score are given for each morph.

Proposed GR active ligands:

  • designed_ligands.xlsx: An Excel spreadsheet containing two sheets. 1) All 54 designed GR ligands with their QED, NIBR severity score, MolSkill score, predicted activity (pEC50 value), and the result of the manual annotation and remarks, if available. 2) The structures of the 54 ligands based on their manual annotation and whether or not they are present in the ChEMBL33 database.
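The predicted activities are reported as pEC50 values, that is, the negative base-10 logarithm of the EC50 expressed in mol/L. To read them as concentrations, a small conversion such as the sketch below may help. The function names are hypothetical and not part of the accompanying code.

```python
import math


def pec50_to_ec50_nm(pec50):
    """Convert pEC50 (-log10 of EC50 in mol/L) to EC50 in nanomolar."""
    # 1 mol/L = 1e9 nmol/L, hence the offset of 9 in the exponent.
    return 10.0 ** (9.0 - pec50)


def ec50_nm_to_pec50(ec50_nm):
    """Convert an EC50 in nanomolar back to pEC50."""
    if ec50_nm <= 0:
        raise ValueError("EC50 must be a positive concentration")
    return 9.0 - math.log10(ec50_nm)
```

For example, a pEC50 of 7 corresponds to an EC50 of 100 nM, and each additional pEC50 unit corresponds to a tenfold lower EC50.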

Researchers and professionals in the field of drug discovery and cheminformatics may find these resources useful for further analysis and investigations.

Bibliography

[1] Hoksza, D., Škoda, P., Voršilák, M. et al. Molpher: a software framework for systematic chemical space exploration. J Cheminform 6, 7 (2014). https://doi.org/10.1186/1758-2946-6-7

[2] https://github.com/lich-uct/molpher-lib

Files (419.2 MB)

  • md5:db5e196e1a792b4a31bbaf6bc173125c (1.5 MB)
  • md5:91822a0ac10263631789fcc3811670a5 (228.0 kB)
  • md5:452d6878510a123706f0cb6c30dd0aa8 (12.1 kB)
  • md5:7fe280d66c663355822fc89cd13292b3 (59.1 MB)
  • md5:5f3fad7d86c072405d50d30aa9f4ad06 (928.6 kB)
  • md5:afaca101c56c44eac9b4d890f081b76b (92.2 MB)
  • md5:57f1d7fabab3ec26e6d3c8d5217b2fff (46.1 kB)
  • md5:044291f49591a46673627a1aa8caf50b (188.1 MB)
  • md5:8d4a748810adecf0e6ba03b65565bf65 (81.1 kB)
  • md5:d576cebc82e8d4d4a7a3127506832e70 (11.5 kB)
  • md5:56890ff6748ed0048bcb2aadf25f4162 (168.0 kB)
  • md5:690917d111628ded2487c07ae354665a (9.0 kB)
  • md5:d7d9a147ffef05b5de8b4e3fd29b28cd (76.9 MB)