==============================================================================================

Supporting data and code for: Machine Learning analysis of protein-ligand interaction fingerprints (IF) extracted from tau-RAMD dissociation trajectories for inhibitors of HSP90

This is a set of Jupyter Notebook scripts for analyzing protein-ligand binding contacts in ligand dissociation trajectories generated in RAMD (random acceleration molecular dynamics) simulations. The aim is to derive a regression model for estimating relative residence times and to identify the molecular features that lead to longer residence times. The present scripts are written for HSP90 and require some adaptation before they can be applied to another system.


This script has been used to generate the results described in:

Kokh DB, Kaufmann T, Kister B, Wade RC. Machine learning analysis of tauRAMD trajectories to decipher molecular determinants of drug-target residence times (2019), submitted.


Authors:

    Daria Kokh, Tom Kaufmann

version:

    v1.0 27.03.2019

Project Manager:

    Dr. Rebecca Wade
    Heidelberg Institute for Theoretical Studies (HITS)
    Schloss-Wolfsbrunnenweg 35
    D-69118 Heidelberg, Germany

Contact:

  E-Mail:   mcmsoft(at)h-its.org

1. Prerequisite


2. Running scripts


3. Method

Script construct:

  - Regression models (RM) for the prediction of the residence time: linear regression with ridge regularization (LR) and support vector regression (SVR). The models are trained on the experimental unbinding rates, koff, on the logarithmic scale.
  - Clustering using Gaussian mixture models
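As an illustration of the regression step, the sketch below fits a ridge-regularized linear model to log10(koff) using the closed-form solution in NumPy. It is a minimal sketch and not the code from the notebooks: the function names `fit_ridge` and `predict_log_koff`, the regularization strengths, and the synthetic fingerprint matrix are assumptions made for the example.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # center features and target
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)  # (Xc'Xc + alpha*I) w = Xc'yc
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict_log_koff(X, coef, intercept):
    """Predict log10(koff) for a matrix of interaction fingerprints."""
    return np.asarray(X, dtype=float) @ coef + intercept


# Synthetic stand-in data (NOT the HSP90 data set): 30 hypothetical
# compounds, 5 hypothetical contact-frequency features in [0, 1).
rng = np.random.default_rng(0)
X = rng.random((30, 5))
true_w = np.array([-1.5, 0.0, 0.8, -0.3, 0.0])
koff = 10.0 ** (X @ true_w - 2.0)            # made-up rates, in 1/s

# Train on the logarithmic scale, as described above.
y = np.log10(koff)
coef_weak, b_weak = fit_ridge(X, y, alpha=1e-8)
coef_strong, b_strong = fit_ridge(X, y, alpha=10.0)

# Residence time tau = 1/koff, so log10(tau) = -log10(koff).
log_tau_pred = -predict_log_koff(X, coef_weak, b_weak)
```

A larger `alpha` shrinks the coefficients toward zero, which helps when the number of fingerprint features is large relative to the number of compounds. The appropriate value would normally be chosen by cross-validation.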


4. Input Data

Metadata:

../data/metadata1.xlsx

This file contains a list of all compounds with their measured kinetic rate constants and computed molecular properties. The experimental data are from:

  - Amaral, M., et al. (2017). Nat. Commun. 8, 2276
  - Kokh, D., et al. (2018). J. Chem. Theory Comput. 14, 3859-3869
  - Schuetz, D. A., et al. (2018). J. Med. Chem. 61, 4397-4411

There are three data sets of interaction fingerprints (IF) for 94 inhibitors of HSP90. They differ in the length of the initial part of each trajectory that is discarded:

  1. ../data/fulldataallligandsnew-2-94 - Model A: snapshots are discarded if fewer than 2 of the protein-ligand contacts in the bound-state structure are lost
  2. ../data/fulldataallligandsnew-20-94 - Model B: snapshots are discarded if less than 20% of the protein-ligand contacts in the bound-state structure are lost
  3. ../data/fulldataallligandsnew-60-94 - Model C: snapshots are discarded if less than 60% of the protein-ligand contacts in the bound-state structure are lost
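
The discarding rules for Models A, B, and C can be written as a single filter over per-snapshot contact sets. The following is a minimal sketch, assuming each snapshot is represented as a set of contact labels; the function names and the toy contact labels are hypothetical and do not come from the data files.

```python
def lost_contacts(bound, snapshot):
    """Number of bound-state contacts that are absent in a snapshot."""
    return len(set(bound) - set(snapshot))


def keep_snapshot(bound, snapshot, threshold, relative):
    """Return True if the snapshot is kept, False if it is discarded.

    relative=False: threshold is an absolute number of lost contacts
                    (Model A uses 2).
    relative=True:  threshold is a percentage of the bound-state contacts
                    (Model B uses 20, Model C uses 60).
    """
    lost = lost_contacts(bound, snapshot)
    if relative:
        # Integer arithmetic avoids floating-point issues at the boundary.
        return lost * 100 >= threshold * len(set(bound))
    return lost >= threshold


def filter_trajectory(bound, snapshots, threshold, relative):
    """Indices of the snapshots that survive the filter."""
    return [i for i, snap in enumerate(snapshots)
            if keep_snapshot(bound, snap, threshold, relative)]


# Toy example with made-up contact labels (not real HSP90 contacts).
bound_state = {"c1", "c2", "c3", "c4", "c5"}
trajectory = [
    {"c1", "c2", "c3", "c4", "c5"},  # 0 contacts lost
    {"c1", "c2", "c3", "c4"},        # 1 lost (20%)
    {"c1", "c2"},                    # 3 lost (60%)
    set(),                           # all 5 lost
]

kept_a = filter_trajectory(bound_state, trajectory, 2, relative=False)
kept_b = filter_trajectory(bound_state, trajectory, 20, relative=True)
kept_c = filter_trajectory(bound_state, trajectory, 60, relative=True)
```

In this sketch a snapshot that has lost exactly the threshold amount is kept, which follows from the wording "fewer than" and "less than" in the list above.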

These IFs were obtained from analysis of the ligand dissociation trajectories generated using the following protocol:

Each directory contains:


6. Output Data