Published December 1, 2022 | Version v2
Dataset Open

Data for "Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-generated Protein-Ligand Structures: Towards Per-target Scoring Functions"

  • 1. School of Science and Technology, Physics Division, University of Camerino, I-62032 Camerino (MC), Italy
  • 2. School of Pharmacy, Medicinal Chemistry Unit, University of Camerino, I-62032 Camerino (MC), Italy
  • 3. School of Pharmacy, Physics Unit, University of Camerino, Via Madonna delle Carceri 9, I-62032 Camerino (MC), Italy
  • 4. School of Science and Technology, Physics Division, University of Camerino, I-62032 Camerino (MC), Italy, and INFN-Sezione di Perugia, I-06123 Perugia, Italy

Description

Data used in "Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-generated Protein-Ligand Structures: Towards Per-target Scoring Functions"
by F. Pellicani, D. Dal Ben, A. Perali, S. Pilati

If you use these data or the Python script for your research or other activities, please cite the corresponding journal article.

 

====================

Uncompressing the zipped file DataSFUnicam.zip provides the following files and folders:


DataSFUnicam/

 

    ExperimentalDataPDBFiles/
    This folder contains 2408 .pdb files of experimental complex structures. Each file is named with a unique code that identifies the protein-ligand complex.
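A minimal sketch of how the files in this folder could be indexed by complex code. It assumes that the code is simply the file name without the ".pdb" extension; the function name index_pdb_files is hypothetical and not part of the published script.

```python
from pathlib import Path


def index_pdb_files(folder):
    """Map each complex code to the path of its .pdb file.

    Assumption: the code is the file name without the '.pdb' extension.
    """
    index = {}
    for path in sorted(Path(folder).glob("*.pdb")):
        index[path.stem] = path
    return index
```

Calling index_pdb_files("DataSFUnicam/ExperimentalDataPDBFiles") on the uncompressed archive should give a dictionary with 2408 entries if the folder is complete.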

 

    ExperimentalDataXLSXFile.xlsx
    This Excel file reports the experimental chemical information for the protein-ligand complexes. In the sheet named “Foglio1”, the first column contains the unique code of the protein-ligand complex and the second column contains the experimentally measured pK_d.
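For readers who prefer dissociation constants, the sketch below converts between pK_d and K_d. It assumes the usual convention pK_d = -log10(K_d), with K_d expressed in mol/L; the helper names are hypothetical.

```python
import math


def pkd_to_kd_molar(pkd):
    """Convert pK_d to K_d in mol/L, assuming pK_d = -log10(K_d)."""
    return 10.0 ** (-pkd)


def kd_molar_to_pkd(kd_molar):
    """Convert K_d in mol/L to pK_d under the same convention."""
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return -math.log10(kd_molar)
```

Under this convention, a pK_d of 9 corresponds to a nanomolar dissociation constant.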

 

    SyntheticDataPDBFiles/
    This folder contains the .pdb files of the synthetic complex structures. The .pdb files are grouped into 17 folders, one per target protein, and each folder is named after the corresponding protein. Each folder contains, for every protein-ligand pair, the .pdb file of the best position according to the MOE docking score. The files are named with a unique code.

 

    SyntheticDataXLSXFiles/
    This folder contains 17 Excel files with the chemical information of the synthetic protein-ligand complexes. The files are named after the corresponding target protein. In the sheet named “Foglio1” of each .xlsx file, the first column contains a unique code for the protein-ligand complex in each conformation, the second column contains an auxiliary numerical code identifying the protein-ligand pair, the third column contains the experimentally measured pK_i, and the fourth column contains the docking score provided by the MOE software.
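The four columns described above can be used to recover the best conformation of each protein-ligand pair. The sketch below works on rows that have already been read from a sheet into (code, pair code, pK_i, docking score) tuples; reading the .xlsx file is left out. It assumes that a lower (more negative) docking score is better, which should be checked against the MOE settings used for the dataset. The function name and the example codes are hypothetical.

```python
def best_pose_per_pair(rows):
    """Keep, for each protein-ligand pair, the row with the lowest docking score.

    rows: iterable of (complex_code, pair_code, pki, docking_score) tuples.
    Assumption: a lower docking score means a better pose.
    """
    best = {}
    for code, pair_code, pki, score in rows:
        current = best.get(pair_code)
        if current is None or score < current[3]:
            best[pair_code] = (code, pair_code, pki, score)
    return best


# Hypothetical rows: two conformations of pair 1 and one of pair 2.
example_rows = [
    ("ligA_1", 1, 7.2, -6.5),
    ("ligA_2", 1, 7.2, -8.1),
    ("ligB_1", 2, 5.0, -4.0),
]
selected = best_pose_per_pair(example_rows)
```

With these example rows, the conformation "ligA_2" is kept for pair 1 because it has the lower score.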

====================

USER GUIDE FOR THE PYTHON SCRIPT

Download and uncompress the zipped file "SFUnicam.zip" with a command like "unzip SFUnicam.zip". 
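Where the "unzip" command is not available, the archive can also be extracted from Python with the standard zipfile module. This is only a sketch; the function name extract_archive is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, destination="."):
    """Extract a zip archive into destination and return the member names."""
    destination = Path(destination)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)
        return sorted(archive.namelist())
```

For example, extract_archive("SFUnicam.zip") extracts the archive into the current folder.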

The following file structure is created:

SFUnicam/

        ComplexToBePredictedFolder/4ey5_30.pdb 
        MaxAssMatrix.npy
        my_model
        devStndSynt.npy
        mediaSynt.npy
        UnicamSF13prot.py
        README.txt

        
The subfolder "ComplexToBePredictedFolder/" contains the example PDB file "4ey5_30.pdb".

-) To execute the script "UnicamSF13prot.py", Python 3 must be installed together with the following libraries and sublibraries:
Keras:
      Regularizers
      Sequential (keras.models)
      Conv1D, Dense, MaxPooling1D, GlobalMaxPooling1D, GlobalAveragePooling1D, AveragePooling1D (keras.layers)
      Adam (keras.optimizers)
NumPy
TensorFlow
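Before running the script, it can be useful to confirm that the libraries listed above are installed. The sketch below only checks that the top-level modules can be found; it does not check versions. The module names "keras", "numpy" and "tensorflow" are the usual import names and are assumed here.

```python
import importlib.util

# Usual import names of the libraries listed above (assumed).
REQUIRED_MODULES = ("keras", "numpy", "tensorflow")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list returned by missing_modules() means that all three libraries were found in the current Python environment.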

Operation:
-) Copy the .pdb file of the protein-ligand complex whose affinity is to be predicted into the subfolder “ComplexToBePredictedFolder/”.
-) Make sure the following files are in the same folder as the Python script:
MaxAssMatrix.npy
mediaSynt.npy
devStndSynt.npy
my_model

-) Run the code using Python 3 with a command like "python3 UnicamSF13prot.py".
-) When prompted, enter the name of the protein-ligand PDB file whose affinity is to be predicted (excluding the extension ".pdb").
-) Read the predicted affinity from the screen.
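The steps above can also be driven from another Python program. The following sketch checks that the required files are in place and then starts the script, passing the complex name on standard input. It assumes that "UnicamSF13prot.py" reads the name from standard input and prints the prediction to standard output; the exact output format is not documented here, so the raw text is returned. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

REQUIRED_FILES = ("MaxAssMatrix.npy", "mediaSynt.npy", "devStndSynt.npy", "my_model")
INPUT_FOLDER = "ComplexToBePredictedFolder"


def missing_required_files(script_dir, complex_name=None):
    """Return the required items that are absent from script_dir."""
    script_dir = Path(script_dir)
    missing = [name for name in REQUIRED_FILES if not (script_dir / name).exists()]
    if complex_name is not None:
        pdb_path = script_dir / INPUT_FOLDER / f"{complex_name}.pdb"
        if not pdb_path.exists():
            missing.append(f"{INPUT_FOLDER}/{complex_name}.pdb")
    return missing


def run_prediction(script_dir, complex_name, script_name="UnicamSF13prot.py"):
    """Run the script in script_dir and return everything it prints.

    Assumption: the script asks for the complex name on standard input.
    """
    missing = missing_required_files(script_dir, complex_name)
    if missing:
        raise FileNotFoundError("Missing: " + ", ".join(missing))
    result = subprocess.run(
        [sys.executable, script_name],
        cwd=Path(script_dir),
        input=complex_name + "\n",  # name without the ".pdb" extension
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For the bundled example, print(run_prediction("SFUnicam", "4ey5_30")) would show the script output, including the predicted affinity.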
 

 

Files

DataSFUnicam.zip

Files (2.9 GB)

md5:2ac8fe2f084088ee58c2a562dc55e2e5   2.9 GB
md5:b8f91cbb297c7b917baf71c4d4c4b73a   317.9 kB
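The md5 checksums listed above can be used to verify a download. A minimal sketch using the standard hashlib module is given below; the file is read in chunks so that the 2.9 GB archive does not have to fit in memory. The function name md5_of_file is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should match the value after "md5:" for the corresponding file, for example 2ac8fe2f084088ee58c2a562dc55e2e5 for the 2.9 GB archive.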