Published April 23, 2007 | Version v1
Dataset | Open access

GSTPred: SVM-based method for predicting Glutathione S-transferase proteins

  • 1. Institute of Microbial Technology
  • 2. Indraprastha Institute of Information Technology Delhi

Description

Welcome to the official documentation for GSTPred, a specialized computational tool for the prediction and functional annotation of Glutathione S-transferase (GST) proteins. GSTs are a ubiquitous enzyme superfamily essential for detoxification, stress survival, and drug resistance in both prokaryotes and eukaryotes.

Web Server: https://webs.iiitd.edu.in/raghava/gstpred/

Citation

Nitish Kumar Mishra, Manish Kumar and G.P.S. Raghava (2007). Support Vector Machine Based Prediction of Glutathione S-Transferase Proteins. Protein & Peptide Letters, 14(6). https://doi.org/10.2174/092986607780990046


GitHub: https://github.com/Manish-IIITD-repository/GSTPred

About the Platform

GSTPred was developed to overcome a key limitation of traditional similarity-based searches (such as BLAST or FASTA), which often fail to identify novel proteins that lack significant sequence similarity to entries in existing databases. The platform uses Support Vector Machines (SVMs) to classify proteins from global sequence features, specifically amino acid composition and local sequence order.

The platform is designed to:

  • Annotate Genomes: Provide high-speed functional assignment for uncharacterized protein products in the post-genomic era.
  • Support Drug Discovery: Identify targets for asthma, cancer, and HIV, where GSTs play a central role in drug metabolism.
  • Analyze Resistance: Help researchers understand anti-cancer drug resistance related to the overexpression of specific GST classes, such as GST π.

Key Features

Predictive Modeling

  • Machine Learning: Built with the SVM_light package using a Radial Basis Function (RBF) kernel, which outperformed linear and polynomial kernels on this task.
  • High Performance:
    • Amino Acid Composition: Achieved 91.59% accuracy.
    • Dipeptide Composition: Achieved 95.79% accuracy.
    • Tripeptide Composition: Achieved a maximum accuracy of 97.66%, outperforming traditional HMM-based profile searching (96.26%).
  • Validation: Evaluated by five-fold cross-validation on a non-redundant dataset.
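As a rough illustration of the learning setup above, the sketch below shows the RBF kernel that SVM_light computes when run with its `-t 2` option, K(x, y) = exp(-gamma ||x - y||^2), and one way to write a labelled feature vector in SVM_light's sparse text input format (`<target> <index>:<value> ...`, with feature indices starting at 1). The helper names `rbf_kernel` and `to_svmlight_line` are hypothetical, and the gamma value in the example is arbitrary; the kernel parameters actually used by GSTPred are not stated here.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def to_svmlight_line(label: int, features: Sequence[float]) -> str:
    """Format one example in SVM_light's sparse text format (hypothetical helper).

    label is +1 (GST) or -1 (non-GST); feature indices start at 1 and
    zero-valued features are omitted, as the sparse format allows.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    parts = [f"{label:+d}"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)


if __name__ == "__main__":
    print(to_svmlight_line(1, [0.5, 0.0, 0.25]))        # +1 1:0.5 3:0.25
    print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.1))  # 1.0
```

A training file made of such lines could then be passed to `svm_learn -t 2 -g <gamma>`; gamma (and the trade-off parameter C) would normally be tuned by cross-validation rather than fixed in advance.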

Integrated Features

  • Compositional Analysis: Represents each protein as a 20-dimensional (amino acid), 400-dimensional (dipeptide), or 8000-dimensional (tripeptide) composition vector; the dipeptide and tripeptide vectors capture local sequence order in addition to residue frequency.
  • N-terminal Analysis: Incorporates the finding that the N-terminal region of GSTs is rich in Tyr, Ser, and Gly, residues essential for active-site interactions with glutathione (GSH).
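The composition features listed above can be sketched as follows. This is a minimal illustration, assuming each feature is the count of a residue, dipeptide, or tripeptide divided by the number of valid overlapping k-mers in the sequence, and assuming k-mers that contain non-standard residues (e.g. X) are skipped; GSTPred's exact normalization is not described here. The function name `kmer_composition` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence: str, k: int) -> list[float]:
    """Return the 20**k-dimensional k-mer composition of a protein sequence.

    k=1 gives amino acid composition (20 values), k=2 dipeptide (400),
    k=3 tripeptide (8000). Each value is a count divided by the number of
    valid overlapping k-mers; k-mers with non-standard residues are skipped.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    seq = sequence.upper()
    # Slide a window of length k along the sequence (overlapping k-mers).
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    aac = kmer_composition("ACDA", 1)
    print(len(aac), aac[AMINO_ACIDS.index("A")])  # 20 0.5
    print(len(kmer_composition("ACDA", 2)), len(kmer_composition("ACDA", 3)))  # 400 8000
```

Because the three vectors come from the same function with k = 1, 2, 3, the 20/400/8000 dimensions follow directly from 20^k. A hypothetical N-terminal feature could be built the same way from a sequence prefix, e.g. `kmer_composition(seq[:25], 1)`; the 25-residue window is illustrative only.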

Overview of Model Development

The training dataset consisted of 107 experimentally annotated GST proteins and 107 non-GST proteins, filtered using CD-HIT to ensure no two proteins shared more than 90% sequence identity.
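To make the five-fold validation on this dataset concrete, the following sketch partitions the 107 GST and 107 non-GST proteins into five folds, each used once as the test set while the remaining four train the model. Stratifying by class, the shuffling, and the `seed` parameter are assumptions for illustration; the source only states that five-fold cross-validation was used. The function name `stratified_five_fold` and the placeholder IDs are hypothetical.

```python
import random


def stratified_five_fold(positives, negatives, seed=0, k=5):
    """Split two ID lists into k class-balanced folds.

    Returns a list of k (train_ids, test_ids) pairs; each ID appears in
    exactly one test set.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for group in (positives, negatives):
        shuffled = list(group)
        rng.shuffle(shuffled)
        # Deal IDs round-robin so each fold gets a near-equal share of this class.
        for i, item in enumerate(shuffled):
            folds[i % k].append(item)
    splits = []
    for i in range(k):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    gst_ids = [f"GST_{n}" for n in range(107)]         # placeholder IDs
    non_gst_ids = [f"NONGST_{n}" for n in range(107)]  # placeholder IDs
    for fold, (train, test) in enumerate(stratified_five_fold(gst_ids, non_gst_ids), 1):
        print(fold, len(train), len(test))
```

With 107 proteins per class, the folds hold 22, 22, 21, 21 and 21 proteins of each class, so every protein is tested exactly once and the reported figures would be averaged or pooled over the five test sets.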

| Approach | SVM threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|---|
| Mono-peptide | 0.2 | 91.59 | 91.59 | 91.59 | 0.83 |
| Di-peptide | 0.0 | 96.26 | 95.33 | 95.79 | 0.92 |
| Tri-peptide | -0.2 | 97.20 | 98.13 | 97.66 | 0.95 |
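The columns of this table follow the standard definitions of sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC), computed from true/false positives and negatives at the listed SVM score threshold. The sketch below assumes a protein is called GST when its score is at or above the threshold (the tie rule is an assumption). The counts in the example (104 of 107 GSTs and 105 of 107 non-GSTs classified correctly) are not reported in the source; they are back-calculated from the tri-peptide percentages assuming 107 proteins per class, and reproduce that row. The function names are hypothetical.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Count (tp, fn, tn, fp); labels are +1 for GST and -1 for non-GST.

    A protein is predicted as GST when its SVM score is >= threshold
    (assumed tie rule).
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_gst = score >= threshold
        if label == 1:
            if predicted_gst:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_gst:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def evaluate(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy (as fractions) and MCC."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Tri-peptide row; counts back-calculated assuming 107 proteins per class.
    m = evaluate(tp=104, fn=3, tn=105, fp=2)
    print(f"Sn {m['sensitivity']:.2%}  Sp {m['specificity']:.2%}  "
          f"Acc {m['accuracy']:.2%}  MCC {m['mcc']:.2f}")
    # Sn 97.20%  Sp 98.13%  Acc 97.66%  MCC 0.95
```

The same arithmetic with 98/107 correct in each class gives the mono-peptide MCC of 0.83, and 103/107 GSTs plus 102/107 non-GSTs gives the di-peptide row (MCC 0.92), which is a quick consistency check on the table.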

Applications

  • Cancer Research: Identifying GST π as a prominent marker for cancer and studying its role in drug resistance and apoptosis.
  • Immunology: Investigating how GST levels indirectly regulate cellular immune responses by controlling GSH levels, which determine Th1 or Th2 response patterns.
  • Broad-Spectrum Use: Finding GST proteins across diverse organisms including prokaryotes and eukaryotes (animals, plants, and fungi).

Contact & Support

Prof. G.P.S. Raghava, Bioinformatics Center, Institute of Microbial Technology (IMTECH), Chandigarh, India
Email: raghava@imtech.res.in | Tel: +91-172-2690557

Files

Manish-IIITD-repository/GSTPred-v1.zip (55.3 kB)
md5:a5be81a7dbbe9f7f4fe4195b5191de66

Additional details

Related works