Data for "Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-generated Protein-Ligand Structures: Towards Per-target Scoring Functions"
- 1. School of Science and Technology, Physics Division, University of Camerino, I-62032 Camerino (MC), Italy
- 2. School of Pharmacy, Medicinal Chemistry Unit, University of Camerino, I-62032 Camerino (MC), Italy
- 3. School of Pharmacy, Physics Unit, University of Camerino, Via Madonna delle Carceri 9, I-62032 Camerino (MC), Italy
- 4. School of Science and Technology, Physics Division, University of Camerino, I-62032 Camerino (MC), Italy, and INFN-Sezione di Perugia, I-06123 Perugia, Italy
Description
Data used in "Machine Learning Scoring Functions for Drug Discovery from Experimental and Computer-generated Protein-Ligand Structures: Towards Per-target Scoring Functions"
by F. Pellicani, D. Dal Ben, A. Perali, S. Pilati
If you use these data or the Python script for your research or other activities, please cite the corresponding journal article.
====================
Uncompressing the zipped file DataSFUnicam.zip provides the following files and folders:
DataSFUnicam/
ExperimentalDataPDBFiles/
This folder contains 2408 .pdb files of experimental complex structures. Each file is named with a unique code identifying the protein-ligand complex.
ExperimentalDataXLSXFile.xlsx
This Excel file reports the chemical information of the experimental protein-ligand complexes. In the sheet named “Foglio1”, the first column contains the unique code of the protein-ligand complex and the second column contains the experimentally measured pK_d.
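As an illustration, the pairing between structure files and affinities could be checked with a short Python sketch like the one below. It assumes that the name of each .pdb file (without extension) is identical to the code in the first column of the sheet, and that the two columns have already been loaded into a dictionary (for instance with pandas.read_excel and sheet_name="Foglio1", which is not shown here). The function names, the codes and the pK_d values in the demo are hypothetical placeholders. The conversion uses the standard definition pK_d = -log10(K_d), with K_d in mol/L.

```python
import tempfile
from pathlib import Path


def match_structures(pdb_dir, affinities):
    """Pair .pdb files with affinities by comparing file stems to codes.

    affinities: dict mapping complex code -> pK_d (assumed already loaded).
    Returns (matched, codes_without_structure, structures_without_affinity).
    """
    stems = {p.stem: p for p in Path(pdb_dir).glob("*.pdb")}
    matched = {
        code: (stems[code], pkd)
        for code, pkd in affinities.items()
        if code in stems
    }
    without_structure = sorted(set(affinities) - set(stems))
    without_affinity = sorted(set(stems) - set(affinities))
    return matched, without_structure, without_affinity


def pkd_to_kd_molar(pkd):
    """Convert pK_d to K_d in mol/L, using pK_d = -log10(K_d)."""
    return 10.0 ** (-pkd)


# Demo on a throwaway folder; codes and pK_d values are placeholders,
# not entries of the actual dataset.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("code_a.pdb", "code_b.pdb"):
        (Path(tmp) / name).write_text("END\n")
    matched, no_structure, no_affinity = match_structures(
        tmp, {"code_a": 6.0, "code_c": 4.5}
    )

print(sorted(matched))   # codes found both in the folder and in the table
print(no_structure)      # codes listed in the table but lacking a .pdb file
print(no_affinity)       # .pdb files lacking a table entry
```

On the real data, both unmatched lists are expected to be empty if the naming assumption holds.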
SyntheticDataPDBFiles/
This folder contains the .pdb files of the synthetic complex structures. The .pdb files are grouped in 17 folders, one per target protein, and each folder is named after the corresponding protein. Each folder contains, for every protein-ligand pair, the .pdb file of the best pose according to the MOE docking score. The files are named with a unique code.
SyntheticDataXLSXFiles/
This folder contains 17 Excel files with the chemical information of the synthetic protein-ligand complexes. The files are named after the corresponding target protein. In the sheet named “Foglio1” of each .xlsx file, the first column contains the unique code of the protein-ligand complex in each conformation, the second column contains an auxiliary numerical code identifying the protein-ligand pair, the third column contains the experimentally measured pK_i, and the fourth column contains the docking score provided by the MOE software.
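Since the sheets list every conformation while the .pdb folders hold only the best pose of each pair, the rows could be reduced to one per protein-ligand pair with a sketch like the following. It assumes that a lower (more negative) MOE docking score indicates a better pose; if the opposite convention applies, set lower_is_better=False. The function name and the rows in the demo are hypothetical placeholders.

```python
def best_pose_per_pair(rows, lower_is_better=True):
    """Keep, for each pair code, the row with the best docking score.

    rows: iterable of (conformation_code, pair_code, pki, docking_score),
    following the four-column layout described above.
    Assumption: lower docking score means better pose, unless overridden.
    """
    best = {}
    for conf_code, pair_code, pki, score in rows:
        current = best.get(pair_code)
        if current is None:
            is_better = True
        elif lower_is_better:
            is_better = score < current[3]
        else:
            is_better = score > current[3]
        if is_better:
            best[pair_code] = (conf_code, pair_code, pki, score)
    return best


# Placeholder rows, not taken from the dataset.
rows = [
    ("lig1_pose1", 1, 7.2, -6.1),
    ("lig1_pose2", 1, 7.2, -7.4),
    ("lig2_pose1", 2, 5.9, -5.0),
]
best = best_pose_per_pair(rows)
print(best[1][0])  # lig1_pose2
print(best[2][0])  # lig2_pose1
```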
====================
USER GUIDE FOR THE PYTHON SCRIPT
Download and uncompress the zipped file "SFUnicam.zip" with a command like "unzip SFUnicam.zip".
The following file structure is created:
SFUnicam/
ComplexToBePredictedFolder/4ey5_30.pdb
MaxAssMatrix.npy
my_model
devStndSynt.npy
mediaSynt.npy
UnicamSF13prot.py
README.txt
The subfolder "ComplexToBePredictedFolder/" contains the example PDB file "4ey5_30.pdb".
-) To execute the script "UnicamSF13prot.py", Python 3 must be installed with the following libraries and submodules:
Keras:
    Regularizers
    Sequential (keras.models)
    Conv1D, Dense, MaxPooling1D, GlobalMaxPooling1D, GlobalAveragePooling1D, AveragePooling1D (keras.layers)
    Adam (keras.optimizers)
NumPy
TensorFlow
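Before running the script, one way to check that these libraries are visible to the Python interpreter is the small sketch below. It only tests whether the top-level packages can be located and does not check versions, since no required versions are stated here. The function name missing_modules is a hypothetical helper.

```python
import importlib.util

# Top-level import names of the libraries listed above.
REQUIRED = ("numpy", "tensorflow", "keras")


def missing_modules(names=REQUIRED):
    """Return the names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries were found.")
```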
Operation:
-) Copy the .pdb file of the protein-ligand complex whose affinity is to be predicted into the subfolder “ComplexToBePredictedFolder/”.
-) Make sure the following files are in the same folder as the Python script:
MaxAssMatrix.npy
mediaSynt.npy
devStndSynt.npy
my_model
-) Run the code using Python 3 with a command like "python3.x UnicamSF13prot.py".
-) Enter the name of the protein-ligand PDB file whose affinity is to be predicted (excluding the extension ".pdb").
-) Read the predicted affinity from the screen.
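The preparatory steps above can be sketched as a small helper that verifies the auxiliary files, copies the structure into the input subfolder, and returns the name to type at the prompt. This is only an illustration under the assumptions stated in the comments. The function prepare_run is hypothetical and is not part of the distributed script.

```python
import shutil
from pathlib import Path

# Auxiliary files that must sit next to UnicamSF13prot.py (from the list above).
REQUIRED_FILES = ("MaxAssMatrix.npy", "mediaSynt.npy", "devStndSynt.npy", "my_model")
INPUT_FOLDER = "ComplexToBePredictedFolder"


def prepare_run(script_dir, pdb_path):
    """Check the auxiliary files, copy the .pdb file, return the name to enter.

    Hypothetical helper: it does not run the prediction itself.
    """
    script_dir = Path(script_dir)
    pdb_path = Path(pdb_path)

    missing = [n for n in REQUIRED_FILES if not (script_dir / n).exists()]
    if missing:
        raise FileNotFoundError("Missing next to the script: " + ", ".join(missing))
    if pdb_path.suffix.lower() != ".pdb":
        raise ValueError("Expected a .pdb file, got: " + pdb_path.name)

    target_dir = script_dir / INPUT_FOLDER
    target_dir.mkdir(exist_ok=True)
    destination = target_dir / pdb_path.name
    # Skip the copy when the file is already in the input subfolder.
    if destination.resolve() != pdb_path.resolve():
        shutil.copy(pdb_path, destination)

    # The script asks for the file name without the ".pdb" extension.
    return pdb_path.stem
```

For the example file shipped with the script, prepare_run("SFUnicam", "SFUnicam/ComplexToBePredictedFolder/4ey5_30.pdb") would return "4ey5_30", which is the name to enter when the script asks for it.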
Files
Files (2.9 GB)
MD5 checksum | Size |
---|---|
md5:2ac8fe2f084088ee58c2a562dc55e2e5 | 2.9 GB |
md5:b8f91cbb297c7b917baf71c4d4c4b73a | 317.9 kB |
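After downloading, the integrity of an archive can be checked against the MD5 values listed above. The following is a minimal sketch using Python's standard hashlib module. The helper names are hypothetical, and the demo runs on a throwaway file rather than on the actual archives.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file with a listed value such as 'md5:0123...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a temporary file with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"placeholder content")
    listed_value = "md5:" + hashlib.md5(b"placeholder content").hexdigest()
    demo_ok = matches_listing(sample, listed_value)
    demo_bad = matches_listing(sample, "md5:" + "0" * 32)

print(demo_ok, demo_bad)  # True False
```

For a downloaded archive, pass its path together with the value listed for that file, for example matches_listing("DataSFUnicam.zip", "md5:...") with the corresponding checksum pasted in.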