Published April 30, 2026 | Version v1.0
Software Open

AFProPred – Prediction of antifreeze proteins using machine learning and evolutionary information

  • 1. Indraprastha Institute of Information Technology
  • 2. Indraprastha Institute of Information Technology Delhi

Description

Title:
AFProPred Dataset – Experimentally validated antifreeze proteins (AFPs) and non‑AFPs from reviewed UniProt entries

Description:

Project: AFProPred – Prediction of antifreeze proteins using machine learning and evolutionary information

Publication: Kumar, N., Patiyal, S., Choudhury, S., Bajiya, N., & Raghava, G.P.S. (2025). AFProPred: Prediction of antifreeze proteins using machine learning and evolutionary information. Proteomics, e202400157. https://doi.org/10.1002/pmic.202400157

Overview: This dataset accompanies AFProPred, a machine learning method for predicting antifreeze proteins (AFPs). AFPs enable organisms (fish, insects, fungi, bacteria) to survive in sub‑zero temperatures via thermal hysteresis and ice recrystallisation inhibition, with applications in food preservation, medicine, and cryosurgery. Unlike existing methods evaluated on unreviewed data, this study uses a validation dataset of reviewed (Swiss‑Prot) AFPs and non‑AFPs.

Content:

 
| Dataset | AFPs | Non‑AFPs | Source |
| --- | --- | --- | --- |
| Main (training) | 8,134 | 9,439 | UniProt (unreviewed) + AFP‑Pred |
| Validation (independent) | 80 | 73 | Swiss‑Prot (reviewed); keyword: "antifreeze protein" vs. "NOT_antifreeze_protein" |

Validation set length range: 16–2,439 amino acids (CD‑HIT 40% redundancy reduction)
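A minimal sketch of applying the stated 16–2,439 residue length range to a set of sequences. The FASTA parser and the names `read_fasta` and `within_length_range` are illustrative assumptions and are not taken from the AFProPred code.

```python
from typing import Dict, Iterable

# Length range stated for the validation set (in amino acids).
MIN_LEN, MAX_LEN = 16, 2439


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


def within_length_range(records: Dict[str, str],
                        lo: int = MIN_LEN,
                        hi: int = MAX_LEN) -> Dict[str, str]:
    """Keep only sequences whose length lies in [lo, hi]."""
    return {h: s for h, s in records.items() if lo <= len(s) <= hi}
```

With a real file, `read_fasta(open(path))` would supply the records; the path and file layout depend on how the dataset is unpacked.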

Key Findings: Compositional analysis shows that AFPs are enriched in alanine (A), isoleucine (I), valine (V), and threonine (T). Threonine increases AFP activity by adding hydrogen bonds at the protein surface.
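The compositional analysis rests on amino acid composition (AAC), the share of each of the 20 standard residues in a sequence. The sketch below shows one common way to compute it, assuming a percentage scale and ignoring non-standard residue codes; the function name is illustrative and the exact scaling used by AFProPred is not stated here.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the percentage of each standard residue in `sequence`.

    Non-standard characters (e.g. X, B, Z) are ignored, which is an
    assumption of this sketch.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}
```

Comparing the mean value per residue between the AFP and non‑AFP sets is then enough to see which residues are enriched.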

Best Model Performance (validation set – 80 AFPs + 73 non‑AFPs, reviewed):

 
| Model | Features | AUC | MCC | Accuracy |
| --- | --- | --- | --- | --- |
| ET | PSSM + AAC | 0.93 | 0.77 | 88.2% |
| RF | PSSM + AAC | 0.91 | 0.64 | 81.7% |
| ET | 150 selected (mRMR) | 0.90 | 0.69 | 84.3% |
| XGB | AAC only | 0.89 | 0.63 | 81.7% |
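For readers unfamiliar with the reported metrics, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from confusion-matrix counts, and AUC as the fraction of positive/negative pairs ranked correctly. The counts and scores in the usage lines are made-up toy values, not results from this study.

```python
import math
from typing import Sequence


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win. Runs in O(n_pos * n_neg), which is fine
    for small sets such as 80 vs. 73 sequences.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy values for illustration only.
print(accuracy(tp=8, tn=7, fp=3, fn=2))
print(mcc(tp=8, tn=7, fp=3, fn=2))
print(auc([0.9, 0.8], [0.1, 0.85]))
```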

Comparison with existing methods (same validation dataset – reviewed):

 
| Method | AUC | MCC | Accuracy |
| --- | --- | --- | --- |
| AFProPred (ET + PSSM+AAC) | 0.93 | 0.77 | 88.2% |
| AFP‑CKSAAP (2019) | 0.89 | 0.65 | 82.0% |
| AFP‑LSE (2020) | | 0.48 | 74.0% |
| CryoProtect (2017) | 0.61 | 0.23 | 60.1% |
| AFP‑SRC (2022) | | 0.14 | 57.0% |

Alignment‑based methods (BLAST, MERCI motifs) failed due to poor coverage, so ML models are essential for this task.

Data Curation & Quality Control:

  • Validation set: Swiss‑Prot reviewed entries only (manually curated)

  • Training set: Unreviewed UniProt + AFP‑Pred (CD‑HIT 40% identity)

  • Length filter: 16–2,439 amino acids

  • Evolutionary features: PSSM generated via PSI‑BLAST (Swiss‑Prot, 3 iterations)

  • Feature selection: mRMR (minimum redundancy maximum relevance)
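The PSSM step above can be sketched as a PSI‑BLAST call from NCBI BLAST+. The snippet only builds the command line; the file names and the database name `swissprot` are assumptions (a locally formatted Swiss‑Prot database), and any parameters not listed above, such as the E-value threshold, are left at the tool's defaults.

```python
import shlex
from typing import List


def psiblast_pssm_command(query_fasta: str,
                          db: str,
                          out_pssm: str,
                          iterations: int = 3) -> List[str]:
    """Build a BLAST+ `psiblast` command that writes an ASCII PSSM."""
    return [
        "psiblast",
        "-query", query_fasta,            # single-sequence FASTA file
        "-db", db,                        # assumed local Swiss-Prot database
        "-num_iterations", str(iterations),
        "-out_ascii_pssm", out_pssm,      # PSSM written as plain text
    ]


cmd = psiblast_pssm_command("query.fasta", "swissprot", "query.pssm")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, provided BLAST+ is installed and the database exists at the given location.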

Usage: Predicting antifreeze proteins for food preservation, cryopreservation, and medical applications; scanning protein sequences for AFP regions; designing AFP mutants.
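Scanning a sequence for AFP regions can be thought of as scoring overlapping windows and keeping those above a threshold. The sketch below illustrates that idea only: `toy_score` is a stand-in that counts the A/I/V/T residues reported as enriched in AFPs, not the AFProPred model, and the window size and threshold are arbitrary.

```python
from typing import Callable, Iterator, List, Tuple


def sliding_windows(sequence: str, window: int,
                    step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, fragment) for each full-length window."""
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


def toy_score(fragment: str) -> float:
    """Placeholder scorer: fraction of A, I, V, T residues (not a real model)."""
    return sum(fragment.count(aa) for aa in "AIVT") / len(fragment)


def scan_sequence(sequence: str,
                  score_fn: Callable[[str], float],
                  window: int,
                  threshold: float,
                  step: int = 1) -> List[Tuple[int, str, float]]:
    """Return (start, fragment, score) for windows scoring >= threshold."""
    hits = []
    for start, fragment in sliding_windows(sequence, window, step):
        score = score_fn(fragment)
        if score >= threshold:
            hits.append((start, fragment, score))
    return hits
```

In practice `score_fn` would wrap a trained classifier's probability output; the same loop applies unchanged.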

Related Resources: Web server: https://webs.iiitd.edu.in/raghava/afpropred/ | GitHub: https://github.com/raghavagps/afpropred

Contact: raghava@iiitd.ac.in (Gajendra P. S. Raghava)

Files

raghavagps/afpropred-v1.0.zip (4.0 MB)
md5:d71b704c66575f9ac60956d61a3c4851
