PepScorer::RMSD: an improved Machine Learning scoring function for protein-peptide docking

Cavalli, Andrea Giuseppe; Mazzolari, Angelica; Vistoli, Giulio; Pedretti, Alessandro

doi:10.5281/zenodo.15011382

Published March 12, 2025 | Version v1

Software Open


1. University of Milan

1. Background:

PepScorer::RMSD is a machine learning-based scoring function (SF) specifically tailored to the pose-selection task for short peptides. The need for such an SF arose from the strong interest in peptides as therapeutic entities in recent years and from the unsatisfactory performance of protein-peptide docking, due in particular to scoring functions that are not peptide-specific.

2. Methods:

PepScorer::RMSD consists of a regression machine learning (ML) model that predicts the root-mean-squared deviation (RMSD) between a given pose and the corresponding native one. To develop PepScorer::RMSD, we collected and curated a high-quality dataset of 298 protein-peptide complexes, with peptides of 3 to 10 amino acids. For each complex, we generated a set of binding poses that, together with the X-ray pose, were used to train and evaluate the model.
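As an illustration of the regression target, the following minimal sketch computes the RMSD between a pose and a reference as sqrt((1/N) * sum of squared atom-to-atom distances). It assumes that both structures list the same atoms in the same order and that no superposition is applied, since docking poses already share the protein frame. Which atoms the authors include in the RMSD is not stated here, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(pose, reference):
    """RMSD between two coordinate lists with matching atom order.

    Assumption: no fitting or superposition is performed; the two
    structures are compared in the coordinate frame they are given in.
    """
    if len(pose) != len(reference):
        raise ValueError("pose and reference must contain the same number of atoms")
    if not pose:
        raise ValueError("at least one atom is required")
    squared_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return math.sqrt(squared_sum / len(pose))


# Toy coordinates in angstroms (made up for illustration only).
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x, y + 2.0, z) for x, y, z in native]

print(rmsd(native, native))   # 0.0
print(rmsd(shifted, native))  # 2.0
```

A rigid 2 Å translation of every atom gives an RMSD of exactly 2 Å, which is a convenient sanity check for any RMSD routine.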

3. Results:

PepScorer::RMSD outperformed commonly used, ML-based, and peptide-specific scoring functions, with a Pearson correlation coefficient R of 0.75, a mean absolute error (MAE) of 1.69 Å, and a top-1 DP of 96% on the single evaluation set and 81% on the curated external test set.
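The two regression metrics quoted above can be computed as in the sketch below, which uses only the standard definitions of the Pearson correlation coefficient and the mean absolute error. The function names and toy values are hypothetical and are not taken from the PepScorer::RMSD code or results; the top-1 DP metric is not covered because its definition is not given here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


def mean_absolute_error(true_values, predicted_values):
    """Average absolute difference between true and predicted values."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return total / len(true_values)


# Toy RMSD values in angstroms (made up for illustration only).
true_rmsd = [1.0, 2.0, 3.0, 4.0]
predicted_rmsd = [1.5, 2.5, 3.5, 4.5]

print(pearson_r(true_rmsd, predicted_rmsd))
print(mean_absolute_error(true_rmsd, predicted_rmsd))  # 0.5
```

In the toy data the predictions are offset by a constant 0.5 Å, so the correlation is essentially perfect while the MAE is 0.5 Å. This is why the two metrics are usually reported together.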

4. Files explanation:

The X-ray structures underwent energy minimization, treating the protein backbone and the peptide as rigid and the protein side chains as flexible. The resulting structures were used as reference structures, and the corresponding pose was called “pepxray”. Starting from them, we generated two further structures by energy minimization, keeping the protein backbone fixed and the protein side chains free to move, with the peptide either free or subject to a constraint of 0.5. These two poses were called “freelig” and “05lig”, respectively. The other poses were obtained by molecular docking with PLANTS or ADCP. The best 23 poses in terms of RMSD were selected for the model development.
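The final selection step can be sketched as follows, assuming that "best in terms of RMSD" means lowest RMSD to the reference pose. The helper name, the pose labels, and the RMSD values are hypothetical, and whether the "pepxray", "freelig", and "05lig" poses count toward the 23 is not specified here.

```python
import heapq


def select_lowest_rmsd(poses, n=23):
    """Return the n (name, rmsd) pairs with the smallest RMSD, in ascending order."""
    return heapq.nsmallest(n, poses, key=lambda item: item[1])


# Toy candidates (made up for illustration only).
candidates = [("pose_a", 3.2), ("pose_b", 0.8), ("pose_c", 1.9), ("pose_d", 5.4)]

print(select_lowest_rmsd(candidates, n=2))
# [('pose_b', 0.8), ('pose_c', 1.9)]
```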

CSV files:

1) PepScorerRMSD_proteins.csv: list of all the complexes included in the dataset, identified by their PDB ID and annotated with the peptide and protein chain identifiers, the peptide length, and the structural group to which the complex belongs.

2) PepScorerRMSD_poses.csv: list of all the pose filenames, with the corresponding PDB IDs and RMSD values.
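A poses table of this kind can be grouped by complex as in the sketch below. The column names ("filename", "pdb_id", "rmsd"), the comma delimiter, and the sample rows are assumptions made for illustration; the actual header of PepScorerRMSD_poses.csv should be checked before use, and the column names are passed as parameters for that reason.

```python
import csv
import io
from collections import defaultdict


def group_rmsd_by_complex(handle, id_column, rmsd_column):
    """Map each complex identifier to the list of RMSD values of its poses."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[id_column]].append(float(row[rmsd_column]))
    return dict(groups)


# In-memory stand-in for the real file; identifiers and values are made up.
sample = io.StringIO(
    "filename,pdb_id,rmsd\n"
    "cplx1_a,cplx1,0.0\n"
    "cplx1_b,cplx1,2.5\n"
    "cplx2_a,cplx2,1.2\n"
)

groups = group_rmsd_by_complex(sample, id_column="pdb_id", rmsd_column="rmsd")
for complex_id, values in groups.items():
    # Report how many poses each complex has and its lowest RMSD.
    print(complex_id, len(values), min(values))
```

With the real file, the same function could be called on `open("PepScorerRMSD_poses.csv", newline="")`, for example to confirm the number of poses listed per complex.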

Directories:

1) Proteins:

· Reference: reference protein structures, obtained after energy minimization with flexible side chains and the peptide ligand fixed.

· Minimization_05: protein structures obtained after energy minimization with flexible side chains and the peptide ligand partially flexible (0.5 constraint).

· Minimization_free: protein structures obtained after energy minimization with both the side chains and the peptide ligand flexible.

2) Poses: the 23 poses for each protein.

3) PepScorerRMSD:

· objects: directory where files for running the model are stored.

· test: test files.

· predict.py: Python script for running the model.

· README.pdf: instructions to run the model.

· requirements.txt: required Python libraries.

Files

Files (168.2 MB)

Name	MD5	Size
PepScorerRMSD.zip	1814ea3710415bc029c6a4d4953d9fa1	13.2 MB
PepScorerRMSD_poses.csv	21553432250e12e91aa9afe0d38a2212	340.3 kB
PepScorerRMSD_proteins.csv	add2a66ca6e1142b61e544b740c2042e	4.8 kB
Poses.zip	619842a0d7c70524827f85b52afa3b05	25.5 MB
Proteins.zip	8ae69a202861d477249b184e1898c17d	129.1 MB
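The MD5 checksums listed above can be used to verify downloaded files. The following is a minimal sketch using Python's standard hashlib module; the helper names are hypothetical, and the files are assumed to sit in the current working directory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Checksums as listed on this record.
EXPECTED = {
    "PepScorerRMSD.zip": "1814ea3710415bc029c6a4d4953d9fa1",
    "PepScorerRMSD_poses.csv": "21553432250e12e91aa9afe0d38a2212",
    "PepScorerRMSD_proteins.csv": "add2a66ca6e1142b61e544b740c2042e",
    "Poses.zip": "619842a0d7c70524827f85b52afa3b05",
    "Proteins.zip": "8ae69a202861d477249b184e1898c17d",
}

for name, checksum in EXPECTED.items():
    # Assumption: files are in the current directory; missing files are skipped.
    if os.path.exists(name):
        status = "OK" if matches_checksum(name, checksum) else "MISMATCH"
        print(name, status)
```

Reading in chunks keeps memory use low for the larger archives such as Proteins.zip.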

Additional details

Repository URL: https://github.com/andregiuseppecavalli/PepScorerRMSD
Programming language: Python
Development Status: Active

	All versions	This version
Views	45	32
Downloads	94	69
Data volume	2.8 GB	2.3 GB
