HOPPred – Prediction of peptide hormones using an ensemble of machine learning and similarity‑based methods

Kaur, Dashleen; Arora, Akanksha; Raghava, Gajendra

doi:10.5281/zenodo.19910668

Published April 30, 2026 | Version v1.0

Software Open

1. Indraprastha Institute of Information Technology Delhi

Title:
HOPPred Dataset – Experimentally validated peptide hormones and non‑hormonal peptides

Description:

Project: HOPPred – Prediction of peptide hormones using an ensemble of machine learning and similarity‑based methods

Publication: Kaur, D., Arora, A., Vigneshwar, P., & Raghava, G.P.S. (2024). Prediction of peptide hormones using an ensemble of machine learning and similarity‑based methods. Proteomics, 24, e2400004. https://doi.org/10.1002/pmic.202400004

Overview: This dataset accompanies HOPPred, the first computational tool for predicting peptide hormones. Peptide hormones are genome‑encoded signal transduction molecules essential for regulating growth, development, and homeostasis; their dysregulation leads to endocrine disorders (e.g., diabetes, neoplasia). The dataset is curated from Hmrbase2 and other sources, is balanced (1,174 hormonal and 1,174 non‑hormonal peptides), and is redundancy‑reduced with CD‑HIT at 90% sequence identity.

Content:

Dataset	Peptides
Hormonal (positive)	1,174
Non‑hormonal (negative)	1,174
Total	2,348

Key Findings – Compositional analysis:

Hormonal peptides enriched in: Cysteine (C), Aspartic acid (D), Phenylalanine (F), Glycine (G), Arginine (R), Serine (S), Asparagine (N), Proline (P), Tyrosine (Y) – statistically significant (Mann‑Whitney U, p < 0.05)
Non‑hormonal peptides enriched in: Glutamic acid (E), Isoleucine (I), Leucine (L), Methionine (M), Glutamine (Q), Lysine (K), Threonine (T), Valine (V)
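
Example (compositional comparison): The comparison above can be illustrated with a minimal sketch. It computes the percent amino acid composition of each peptide and the Mann‑Whitney U statistic for one residue across two groups. The function names and the toy sequences are assumptions made for illustration and are not the authors' code. The sketch stops at the U statistic; a p‑value would normally be obtained from a statistics library, for example SciPy's mannwhitneyu.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Percent composition of the 20 standard amino acids in one peptide."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys by pairwise comparison (a tie counts 0.5)."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Toy sequences invented for illustration only (not taken from the dataset).
hormonal = ["CCGFRS", "GCPYNC", "FCGRSD"]
non_hormonal = ["ELLKMV", "QIKTEL", "VLEKCM"]

# Compare the cysteine percentage between the two groups.
cys_h = [aa_composition(s)["C"] for s in hormonal]
cys_n = [aa_composition(s)["C"] for s in non_hormonal]
u_stat = mann_whitney_u(cys_h, cys_n)
print(f"U for cysteine: {u_stat} out of a maximum of {len(cys_h) * len(cys_n)}")
```

A U value close to its maximum (the product of the two group sizes) indicates that the residue is consistently more abundant in the first group.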

Exclusive motifs in hormonal peptides (MERCI): FGPR, WFGP, WFGPR, FGPRL, GPRL, MWFGPRL, LCGS (LCGS is a known motif in insulin chain B)
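
Example (motif scan): As a rough illustration of how the motifs listed above could be used, the sketch below looks for exact occurrences of each motif in a query peptide. It does not reproduce MERCI, which discovers motifs from training data; it only scans for motifs that are already known. The function name and the query sequence are hypothetical.

```python
# Motifs reported above as exclusive to hormonal peptides.
HORMONE_MOTIFS = ["FGPR", "WFGP", "WFGPR", "FGPRL", "GPRL", "MWFGPRL", "LCGS"]


def find_motifs(seq, motifs=HORMONE_MOTIFS):
    """Return (motif, start) pairs for every exact occurrence (0-based start)."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((motif, start))
            # Continue one position later so overlapping occurrences are found.
            start = seq.find(motif, start + 1)
    return hits


# Hypothetical query peptide, invented for illustration.
query = "AAMWFGPRLKK"
for motif, pos in find_motifs(query):
    print(motif, pos)
```

Because several motifs are substrings of one another, a single region of a peptide can produce more than one hit.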

Best Model Performance (validation set – 20% held out):

Model	AUC	MCC	Accuracy	Sensitivity	Specificity
Ensemble (LR + Motif + BLAST)	0.96	0.80	89.8%	90.1%	89.5%
LR (ML alone – top 50 features)	0.93	0.72	86.0%	85.3%	86.6%
TextCNN (DL)	0.90	0.67	83.0%	87.0%	79.0%
RF (ML – top 50 features)	0.90	0.64	82.1%	80.2%	84.0%
TabNet (DL)	0.75	0.57	74.0%	73.0%	75.0%
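
Example (threshold‑dependent metrics): The sensitivity, specificity, accuracy, and MCC columns above all derive from a confusion matrix at a fixed decision threshold. The sketch below shows the standard formulas. The counts are hypothetical and do not come from the HOPPred validation set. AUC is not covered because it requires the per‑peptide scores rather than the four counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced set of 100 positives and 100 negatives.
metrics = binary_metrics(tp=90, fn=10, tn=88, fp=12)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```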

Top features align with motifs: DPC1_CF (Cys‑Phe), TPC_FRP (Phe‑Arg‑Pro), TPC_GNF (Gly‑Asn‑Phe), TPC_LMG (Leu‑Met‑Gly), TPC_RGL (Arg‑Gly‑Leu). These features overlap with motifs such as FGPR and WFGPRL, which supports their biological relevance.
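
Example (dipeptide and tripeptide composition): The feature names above refer to dipeptide (DPC) and tripeptide (TPC) composition. The sketch below assumes the common definition, in which each feature is the percentage of overlapping windows of length k that equal a given k‑mer. The exact normalisation used by the authors' feature tool is not stated in this record, so treat the function as an approximation. The peptide is hypothetical.

```python
from collections import Counter


def kmer_composition(seq, k):
    """Percent of overlapping k-residue windows equal to each observed k-mer."""
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return {}  # sequence shorter than k: no windows to count
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    return {kmer: 100.0 * c / n_windows for kmer, c in counts.items()}


# Hypothetical peptide, invented for illustration.
peptide = "GNFRPCF"
dpc = kmer_composition(peptide, 2)  # dipeptide composition
tpc = kmer_composition(peptide, 3)  # tripeptide composition
print("CF:", dpc.get("CF", 0.0))
print("GNF:", tpc.get("GNF", 0.0))
```

Only k‑mers that occur in the sequence are returned; a fixed‑length feature vector would fill the remaining entries with zero.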

Data Curation & Quality Control:

Source: Hmrbase2 (hormone database) + PeptideAtlas + UniProt/Swiss‑Prot
Redundancy reduction: CD‑HIT at 90% sequence identity
Negative set: Randomly selected from Swiss‑Prot excluding known hormones
Train/validation split: 80/20 (5‑fold CV on training)
Feature selection: RFE (Recursive Feature Elimination) with Logistic Regression as estimator
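
Example (80/20 split with 5‑fold cross‑validation): The split described above can be sketched with the standard library alone. The code holds out 20% of each class for validation and partitions the remaining training indices into five folds. The seed, the function names, and the resulting subset sizes are assumptions of this sketch and are not taken from the publication. Feature selection is not shown; scikit‑learn provides an RFE class that accepts a logistic regression estimator.

```python
import random


def stratified_split(labels, valid_fraction=0.2, seed=0):
    """Return (train_idx, valid_idx), holding out valid_fraction of each class."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_valid = round(len(idx) * valid_fraction)
        valid.extend(idx[:n_valid])
        train.extend(idx[n_valid:])
    return sorted(train), sorted(valid)


def k_folds(indices, k=5, seed=0):
    """Partition indices into k disjoint folds of near-equal size (not stratified)."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Labels for a balanced set: 1,174 positives followed by 1,174 negatives.
labels = [1] * 1174 + [0] * 1174
train_idx, valid_idx = stratified_split(labels)
folds = k_folds(train_idx)
print(len(train_idx), len(valid_idx), [len(f) for f in folds])
```

In each cross‑validation round, one fold serves as the internal test set and the other four are used for training; the 20% validation set is left untouched until the final evaluation.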

Usage: Predicting peptide hormones from sequence, designing novel hormone peptides (Design module), scanning protein sequences for hormone regions (Protein Scan module), identifying hormone‑associated motifs, developing peptide‑based therapeutics and endocrine disorder treatments.
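
Example (scanning a protein): Scanning a protein for candidate hormone regions is typically done with a sliding window. The sketch below is one way to express that idea and is not the implementation of the Protein Scan module. The window length, step, threshold, and especially the placeholder scorer are hypothetical; in practice the scorer would be the trained model.

```python
def scan_protein(seq, score_fn, window=20, step=1, threshold=0.5):
    """Score each window; return (start, end, fragment, score) at or above threshold."""
    seq = seq.upper()
    if not seq:
        return []
    hits = []
    # A sequence shorter than the window is scored once as a whole.
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        fragment = seq[start:start + window]
        score = score_fn(fragment)
        if score >= threshold:
            hits.append((start, start + len(fragment), fragment, score))
    return hits


def toy_score(fragment):
    """Placeholder scorer: fraction of residues listed above as hormone-enriched."""
    enriched = set("CDFGRSNPY")
    return sum(aa in enriched for aa in fragment) / len(fragment)


# Hypothetical protein sequence, invented for illustration.
protein = "AAAACCCC"
for start, end, fragment, score in scan_protein(protein, toy_score, window=4, threshold=1.0):
    print(start, end, fragment, score)
```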

Related Resources: Web server: https://webs.iiitd.edu.in/raghava/hoppred/ | GitHub: https://github.com/raghavagps/HOPPRED

Contact: raghava@iiitd.ac.in (Gajendra P. S. Raghava)

Files

Files (167.9 kB)

Name	Size
raghavagps/hoppred-v1.0.zip (md5:04f3c0fd3f96250c820216afdbdcd176)	167.9 kB
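
Example (verifying the download): The MD5 checksum listed above can be used to confirm that the archive was downloaded intact. The sketch below uses only the Python standard library. The local file name passed to verify_download is an assumption and depends on where the archive was saved.

```python
import hashlib

# Checksum listed for the archive in this record.
EXPECTED_MD5 = "04f3c0fd3f96250c820216afdbdcd176"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at path matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, verify_download("hoppred-v1.0.zip") returns True only when the local copy matches the published checksum.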

Additional details

Is supplement to: Software: https://github.com/raghavagps/hoppred/tree/v1.0 (URL)

Repository URL: https://github.com/raghavagps/hoppred
