GSTPred: SVM-based method for predicting Glutathione S-transferase protein

Kumar Mishra, Nitish; Kumar, Manish; Raghava, Gajendra

doi:10.5281/zenodo.20069377

Published April 23, 2007 | Version v1

Dataset Open

GSTPred: SVM-based method for predicting Glutathione S-transferase protein

1. Institute of Microbial Technology
2. Indraprastha Institute of Information Technology Delhi

Welcome to the official documentation for GSTPred, a specialized computational tool developed for the prediction and functional annotation of Glutathione S-transferase (GST) proteins[cite: 1900]. [cite_start]GSTs are a ubiquitous enzyme superfamily essential for detoxification, stress survival, and drug resistance in both prokaryotes and eukaryotes.

Web Server: (https://webs.iiitd.edu.in/raghava/gstpred/)

Citation

Nitish Kumar Mishra, Manish Kumar and G.P.S. Raghava (2007). Support Vector Machine Based Prediction of Glutathione S-Transferase Proteins. Protein & Peptide Letters, 14(6),(https://doi.org/10.2174/092986607780990046)

GitHub:-https://github.com/Manish-IIITD-repository/GSTPred

About the Platform

[cite_start]GSTPred was developed to overcome the limitations of traditional similarity-based searching (like BLAST or FASTA), which often fail to identify novel proteins that lack significant sequence similarity to known databases.The platform utilizes Support Vector Machines (SVM) to classify proteins based on global sequence features, specifically amino acid composition and local order.

The platform is designed to:

Annotate Genomes: Provide high-speed functional assignment for uncharacterized protein products in the post-genomic era.
Support Drug Discovery: Identify targets for asthma, cancer, and HIV, where GSTs play a central role in drug metabolism.
Analyze Resistance: Help researchers understand anti-cancer drug resistance related to the overexpression of specific GST classes, such as GST π.

Key Features

Predictive Modeling

Machine Learning: Built using the SVM_light package with a Radial Basis Function (RBF) kernel, which proved superior to linear and polynomial kernels for this task.
High Performance:
- Amino Acid Composition: Achieved 91.59% accuracy.
- Dipeptide Composition: Achieved 95.79% accuracy.
- Tripeptide Composition: Achieved a maximum accuracy of 97.66%, outperforming traditional HMM-based profile searching (96.26%).
Validation: Evaluated using a rigorous five-fold cross-validation technique on a non-redundant dataset.

Integrated Features

Compositional Analysis: Encapsulates 20-dimensional (amino acid), 400-dimensional (dipeptide), and 8000-dimensional (tripeptide) vectors to capture both frequency and local sequence order.
N-terminal Analysis: Incorporates findings that the N-terminal of GSTs is rich in Tyr, Ser, and Gly, which are essential for active site interactions with Glutathione (GSH).

Overview of Model Development

The training dataset consisted of 107 experimentally annotated GST proteins and 107 non-GST proteins, filtered using CD-HIT to ensure no two proteins shared more than 90% sequence identity.

Approach	Threshold	Sensitivity	Specificity	Accuracy	MCC
Mono-peptide	0.2	91.59%	91.59%	91.59%	0.83
Di-peptide	0.0	96.26%	95.33%	95.79%	0.92
Tri-peptide	-0.2	97.20%	98.13%	97.66%	0.95

Applications

Cancer Research: Identifying GST π as a prominent marker for cancer and studying its role in drug resistance and apoptosis.
Immunology: Investigating how GST levels indirectly regulate cellular immune responses by controlling GSH levels, which determine Th1 or Th2 response patterns.
Broad-Spectrum Use: Finding GST proteins across diverse organisms including prokaryotes and eukaryotes (animals, plants, and fungi).

Contact & Support

Prof. G.P.S. Raghava Bioinformatics Center, Institute of Microbial Technology (IMTECH), Chandigarh, India
Email: raghava@imtech.res.in Tel: +91-172-2690557

Files

Manish-IIITD-repository/GSTPred-v1.zip

Files (55.3 kB)

Name	Size	Download all
Manish-IIITD-repository/GSTPred-v1.zip md5:a5be81a7dbbe9f7f4fe4195b5191de66	55.3 kB	Preview Download

Additional details

Is supplement to: Software: https://github.com/Manish-IIITD-repository/GSTPred/tree/v1 (URL)

Repository URL: https://github.com/Manish-IIITD-repository/GSTPred

	All versions	This version
Views	22	22
Downloads	3	3
Data volume	165.9 kB	165.9 kB

GSTPred: SVM-based method for predicting Glutathione S-transferase protein

Authors/Creators

Description

Citation

About the Platform

Key Features

Predictive Modeling

Integrated Features

Overview of Model Development

Applications

Contact & Support

Files

Manish-IIITD-repository/GSTPred-v1.zip

Files (55.3 kB)

Additional details

Related works

Software