AFProPred – Prediction of antifreeze proteins using machine learning and evolutionary information
Description
Title: AFProPred Dataset – Experimentally validated antifreeze proteins (AFPs) and non‑AFPs from reviewed UniProt entries
Description:
Project: AFProPred – Prediction of antifreeze proteins using machine learning and evolutionary information
Publication: Kumar, N., Patiyal, S., Choudhury, S., Bajiya, N., & Raghava, G.P.S. (2025). AFProPred: Prediction of antifreeze proteins using machine learning and evolutionary information. Proteomics, e202400157. https://doi.org/10.1002/pmic.202400157
Overview: This dataset accompanies AFProPred, a machine learning method for predicting antifreeze proteins (AFPs). AFPs enable organisms such as fish, insects, fungi, and bacteria to survive sub‑zero temperatures through thermal hysteresis and ice recrystallisation inhibition, and they have applications in food preservation, medicine, and cryosurgery. Whereas existing methods were evaluated on unreviewed data, this study evaluates on an independent validation dataset of reviewed (Swiss‑Prot) AFPs and non‑AFPs.
Content:
| Dataset | AFPs | Non‑AFPs | Source |
|---|---|---|---|
| Main (training) | 8,134 | 9,439 | UniProt (unreviewed) + AFP‑Pred |
| Validation (independent) | 80 | 73 | Swiss‑Prot (reviewed) – keyword: "antifreeze protein" vs. "NOT_antifreeze_protein" |
Validation set length range: 16–2,439 amino acids (CD‑HIT 40% redundancy reduction)
Key findings (compositional analysis): AFPs are enriched in alanine (A), isoleucine (I), valine (V), and threonine (T). Threonine increases AFP activity by adding hydrogen-bonding capacity to the protein surface.
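The enrichment above comes from comparing amino acid composition (AAC) between AFPs and non‑AFPs. The sketch below shows one way to compute AAC as a percentage over the 20 standard residues and a simple mean-difference enrichment; dropping non-standard letters (X, B, U, ...) and using a plain mean difference are assumptions for illustration, not necessarily what AFProPred does.

```python
from __future__ import annotations

# Assumption: only the 20 standard residues are counted; other letters are dropped.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> dict[str, float]:
    """Percentage composition of the 20 standard amino acids in one sequence."""
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * residues.count(aa) / len(residues) for aa in STANDARD_AA}


def mean_aac(sequences: list[str]) -> dict[str, float]:
    """Average AAC profile over a set of sequences."""
    profiles = [aac(s) for s in sequences]
    return {aa: sum(p[aa] for p in profiles) / len(profiles) for aa in STANDARD_AA}


def enrichment(positives: list[str], negatives: list[str]) -> dict[str, float]:
    """Mean AAC of positives minus mean AAC of negatives, per residue.

    A positive value means the residue is more frequent in the positive set.
    """
    pos, neg = mean_aac(positives), mean_aac(negatives)
    return {aa: pos[aa] - neg[aa] for aa in STANDARD_AA}
```

Applied to the AFP and non‑AFP sets, residues with the largest positive values would be the candidates for "enriched in AFPs"; a statistical test per residue would be needed before calling a difference significant.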
Best model performance on the validation set (80 AFPs + 73 non‑AFPs, reviewed); ET = Extra Trees, RF = Random Forest, XGB = XGBoost:
| Model | Features | AUC | MCC | Accuracy |
|---|---|---|---|---|
| ET | PSSM + AAC | 0.93 | 0.77 | 88.2% |
| RF | PSSM + AAC | 0.91 | 0.64 | 81.7% |
| ET | 150 selected (mRMR) | 0.90 | 0.69 | 84.3% |
| XGB | AAC only | 0.89 | 0.63 | 81.7% |
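Accuracy and MCC are both derived from the confusion matrix on the 153 validation sequences; for example, an accuracy of 88.2% corresponds to 135 correct calls out of 153 (135/153 ≈ 0.882). The sketch below computes these metrics from confusion-matrix counts. The specific counts in the example are hypothetical and are not the paper's confusion matrix; they only illustrate that MCC depends on how the 18 errors split between false negatives and false positives.

```python
import math


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Threshold-dependent metrics for a binary AFP / non-AFP classifier."""
    total = tp + fn + tn + fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),  # fraction of AFPs recovered
        "specificity": tn / (tn + fp),  # fraction of non-AFPs rejected
        "accuracy": (tp + tn) / total,
        # Common convention: MCC = 0 when any row or column of the matrix is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# HYPOTHETICAL error split for an 80 AFP + 73 non-AFP set (not taken from the paper).
m = confusion_metrics(tp=70, fn=10, tn=65, fp=8)
print(f"accuracy = {m['accuracy']:.3f}, MCC = {m['mcc']:.3f}")
```

Because MCC uses all four cells, two models with identical accuracy (such as RF and XGB above, both 81.7%) can still report different MCC values.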
Comparison with existing methods (same validation dataset – reviewed):
| Method | AUC | MCC | Accuracy |
|---|---|---|---|
| AFProPred (ET + PSSM+AAC) | 0.93 | 0.77 | 88.2% |
| AFP‑CKSAAP (2019) | 0.89 | 0.65 | 82.0% |
| AFP‑LSE (2020) | — | 0.48 | 74.0% |
| CryoProtect (2017) | 0.61 | 0.23 | 60.1% |
| AFP‑SRC (2022) | — | 0.14 | 57.0% |
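Unlike accuracy and MCC, AUC needs a ranking score for every sequence rather than a hard label. A minimal sketch of AUC via its rank (Mann-Whitney) interpretation is shown below: the probability that a randomly chosen AFP scores higher than a randomly chosen non‑AFP, with ties counted as one half. This is a generic formulation, not code from the AFProPred repository.

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. This pairwise loop is O(n_pos * n_neg), which is
    fine for a validation set of 80 + 73 sequences.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For instance, scores of [0.9, 0.8, 0.4] for AFPs and [0.7, 0.3] for non‑AFPs rank five of six pairs correctly, giving an AUC of about 0.833.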
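Alignment‑based approaches (BLAST similarity search and MERCI motif search) performed poorly because of low coverage, i.e. few validation sequences obtained hits, so machine learning models are needed for reliable prediction.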
Data Curation & Quality Control:
- Validation set: Swiss‑Prot reviewed entries only (manually curated)
- Training set: unreviewed UniProt entries + AFP‑Pred data (CD‑HIT, 40% identity)
- Length filter: 16–2,439 amino acids
- Evolutionary features: PSSM generated via PSI‑BLAST (Swiss‑Prot, 3 iterations)
- Feature selection: mRMR (minimum redundancy maximum relevance)
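mRMR greedily picks features that are highly informative about the class label while penalising overlap with features already chosen. The sketch below implements the mutual-information-difference (MID) form of mRMR on discrete features. The record does not state which mRMR variant, implementation, or discretisation AFProPred used, so continuous PSSM/AAC values would first need binning (an assumption here); the feature names in the example are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def mutual_information(x: list, y: list) -> float:
    """Mutual information (in nats) between two discrete variables."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log((c * n) / (px[a] * py[b]))
    return mi


def mrmr(features: dict[str, list], labels: list, k: int) -> list[str]:
    """Greedy mRMR (MID criterion): relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected: list[str] = []
    remaining = sorted(features)  # sorted so ties break deterministically

    def mid_score(name: str) -> float:
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=mid_score)  # first maximum wins on ties
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: an informative feature, its exact copy, and a noisy one.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "a_relevant": [0, 0, 0, 1, 1, 1, 1, 1],
    "b_duplicate": [0, 0, 0, 1, 1, 1, 1, 1],
    "c_noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr(toy_features, labels, k=2))
```

In the toy example the exact duplicate is heavily penalised for redundancy, so the second pick is the less redundant feature; selecting 150 features, as in the table above, repeats the same loop with k = 150.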
Usage: predicting antifreeze proteins for food preservation, cryopreservation, and medical applications; scanning protein sequences for AFP regions; and designing AFP mutants.
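Scanning a protein for AFP regions is typically done by scoring overlapping fixed-length windows. The sketch below shows only the windowing logic; `toy_score` is a hypothetical stand-in (fraction of the AFP-enriched residues A, I, V, T) and is not the AFProPred model, and the window length, step, and threshold are illustrative assumptions rather than values from the web server or GitHub code.

```python
from __future__ import annotations

from typing import Callable, Iterator


def sliding_windows(sequence: str, window: int, step: int = 1) -> Iterator[tuple[int, str]]:
    """Yield (0-based start, peptide) for every full-length window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


def scan(sequence: str, window: int, score_fn: Callable[[str], float],
         threshold: float, step: int = 1) -> list[tuple[int, int, float]]:
    """Return (start, end, score) hits with 1-based inclusive coordinates."""
    hits = []
    for start, peptide in sliding_windows(sequence, window, step):
        score = score_fn(peptide)
        if score >= threshold:
            hits.append((start + 1, start + window, score))
    return hits


def toy_score(peptide: str) -> float:
    """HYPOTHETICAL scorer: fraction of A/I/V/T residues (not AFProPred)."""
    return sum(aa in "AIVT" for aa in peptide) / len(peptide)


print(scan("GGGATAVTGGG", window=4, score_fn=toy_score, threshold=1.0))
```

Swapping `toy_score` for a trained classifier's probability output turns the same loop into a region scanner; sequences shorter than the window yield no hits.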
Related Resources: Web server: https://webs.iiitd.edu.in/raghava/afpropred/ | GitHub: https://github.com/raghavagps/afpropred
Contact: raghava@iiitd.ac.in (Gajendra P. S. Raghava)
Files
| Name | Size | MD5 |
|---|---|---|
| raghavagps/afpropred-v1.0.zip | 4.0 MB | d71b704c66575f9ac60956d61a3c4851 |
Additional details
Related works
- Is supplement to: Software, https://github.com/raghavagps/afpropred/tree/v1.0
Software
- Repository URL: https://github.com/raghavagps/afpropred