PepScorer::RMSD: an improved Machine Learning scoring function for protein-peptide docking
Description
1. Background:
PepScorer::RMSD is a machine learning-based scoring function (SF) specifically tailored to the pose-selection task for short peptides. The need for such an SF arose from the strong interest in peptides as therapeutic entities in recent years and from the unsatisfactory performance of protein-peptide docking, due in particular to non-specific scoring functions.
2. Methods:
PepScorer::RMSD consists of a regression machine learning (ML) model that predicts the root-mean-squared deviation (RMSD) between a given pose and the corresponding native one. To develop PepScorer::RMSD, we collected and curated a high-quality dataset of 298 protein-peptide complexes with peptides of 3 to 10 amino acids. For each complex, we generated a set of binding poses that, together with the X-ray pose, were used to train and evaluate the model.
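For readers unfamiliar with the regression target, the following is a minimal sketch of how an RMSD between a pose and its native structure can be computed. It assumes that both structures list the same atoms in the same order and that no superposition is applied; which peptide atoms PepScorer::RMSD actually uses is not stated in this record, and the function name and toy coordinates are illustrative only.

```python
import math


def rmsd(pose, native):
    """Root-mean-squared deviation between two matched lists of (x, y, z) coordinates.

    Assumption: atoms are already paired by index and both structures share
    the same reference frame (no alignment is performed here).
    """
    if len(pose) != len(native) or not pose:
        raise ValueError("pose and native must be non-empty and contain the same number of atoms")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, native):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(pose))


# Toy coordinates (made up): the pose is the native shifted by 2 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(pose, native))
```

A pose identical to the native gives an RMSD of 0, so lower predicted values indicate poses expected to be closer to the crystallographic binding mode.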
3. Results:
PepScorer::RMSD outperformed conventional, ML-based, and peptide-specific scoring functions, with a Pearson correlation coefficient R of 0.75, a mean absolute error (MAE) of 1.69 Å, and a top-1 DP of 96% on the single evaluation set and 81% on the curated external test set.
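The three reported metrics can be illustrated with a short sketch. The Pearson R and MAE functions follow their standard definitions. The top-1 function is an assumption: it treats top-1 DP as the fraction of complexes whose best-scored pose (lowest predicted RMSD) has a true RMSD within a cutoff, and the 2.0 Å cutoff is a commonly used value, not one confirmed by this record. All names and numbers below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def mean_absolute_error(true_values, predicted_values):
    """Average absolute difference between true and predicted RMSD values."""
    return sum(abs(t - p) for t, p in zip(true_values, predicted_values)) / len(true_values)


def top1_success_rate(complexes, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose has a true RMSD <= cutoff.

    complexes maps a complex ID to a list of (true_rmsd, predicted_rmsd) pairs.
    The cutoff of 2.0 Å is an assumption made for this sketch.
    """
    hits = 0
    for poses in complexes.values():
        top_pose = min(poses, key=lambda pair: pair[1])  # lowest predicted RMSD
        if top_pose[0] <= cutoff:
            hits += 1
    return hits / len(complexes)


# Made-up example: two complexes with three poses each.
toy = {
    "cplx_a": [(0.5, 0.8), (4.0, 3.5), (7.0, 6.0)],
    "cplx_b": [(1.0, 3.0), (5.0, 2.5), (8.0, 7.5)],
}
true_rmsd = [t for poses in toy.values() for t, _ in poses]
pred_rmsd = [p for poses in toy.values() for _, p in poses]

print(pearson_r(true_rmsd, pred_rmsd))
print(mean_absolute_error(true_rmsd, pred_rmsd))
print(top1_success_rate(toy))
```

In the toy data, the model ranks a near-native pose first for one complex and a poor pose first for the other, which shows how correlation and MAE can look reasonable while the top-1 rate is limited by individual ranking errors.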
4. Files explanation:
The X-ray structures underwent energy minimization, treating the protein backbone and the peptide as rigid and the protein side chains as flexible. The resulting structures were used as reference structures, and the corresponding pose was called “pepxray”. From these, we generated two further structures by energy minimization, keeping the protein backbone fixed and the protein side chains free to move, with the peptide either free or subject to a constraint of 0.5. These two generated poses were called “freelig” and “05lig”, respectively. The other poses were obtained by molecular docking with PLANTS or ADCP. The best 23 poses in terms of RMSD were selected for the model development.
CSV files:
1) PepScorerRMSD_proteins.csv: list of all the complexes included in the dataset, identified by their PDB ID and annotated with the peptide and protein chain identifiers, the peptide length, and the structural group to which the complex belongs.
2) PepScorerRMSD_poses.csv: list of all the filenames of the poses, the PDB IDs, and the RMSD of the poses.
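As a sketch of how the pose table might be consumed, the snippet below groups poses by complex and sorts them by RMSD using only the Python standard library. The column names (`filename`, `pdb_id`, `rmsd`), the sample rows, and the complex IDs are hypothetical; the actual header of PepScorerRMSD_poses.csv should be checked and passed to the function accordingly.

```python
import csv
import io
from collections import defaultdict


def group_poses_by_complex(handle, id_col="pdb_id", file_col="filename", rmsd_col="rmsd"):
    """Read a poses table and return {complex ID: [(filename, rmsd), ...]} sorted by RMSD.

    The default column names are assumptions, not the documented CSV header.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row[id_col]].append((row[file_col], float(row[rmsd_col])))
    for poses in grouped.values():
        poses.sort(key=lambda item: item[1])  # lowest RMSD first
    return dict(grouped)


# Made-up rows standing in for the real file.
example = io.StringIO(
    "filename,pdb_id,rmsd\n"
    "demo1_pose_a.pdb,demo1,3.4\n"
    "demo1_pose_b.pdb,demo1,0.9\n"
    "demo2_pose_a.pdb,demo2,5.1\n"
)
grouped = group_poses_by_complex(example)
for complex_id, poses in grouped.items():
    print(complex_id, poses)

# With the real file (after checking its header), one could write:
# with open("PepScorerRMSD_poses.csv", newline="") as fh:
#     grouped = group_poses_by_complex(fh, id_col=..., file_col=..., rmsd_col=...)
```

Passing an open file handle instead of a path keeps the function easy to try on in-memory data before pointing it at the downloaded CSV.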
Directories:
1) Proteins:
· Reference: reference protein structures, obtained after energy minimization with flexible side chains and a fixed peptide ligand.
· Minimization_05: protein structures obtained after energy minimization with flexible side chains and a partially flexible peptide ligand (constraint of 0.5).
· Minimization_free: protein structures obtained after energy minimization with flexible side chains and a flexible peptide ligand.
2) Poses: the 23 poses for each protein.
3) PepScorerRMSD:
· objects: directory where files for running the model are stored.
· test: test files.
· predict.py: Python script for running the model.
· README.pdf: instructions for running the model.
· requirements.txt: required Python libraries.
Files
PepScorerRMSD.zip
Total size: 168.2 MB
MD5 checksum | Size |
---|---|
md5:1814ea3710415bc029c6a4d4953d9fa1 | 13.2 MB |
md5:21553432250e12e91aa9afe0d38a2212 | 340.3 kB |
md5:add2a66ca6e1142b61e544b740c2042e | 4.8 kB |
md5:619842a0d7c70524827f85b52afa3b05 | 25.5 MB |
md5:8ae69a202861d477249b184e1898c17d | 129.1 MB |
Additional details
Software
- Repository URL: https://github.com/andregiuseppecavalli/PepScorerRMSD
- Programming language: Python
- Development Status: Active