A Linear Algebra approach on the Human Proteome: Protein Interaction Prediction
Authors/Creators
Description
Abstract
The prediction of Protein-Protein Interactions (PPI) is a central problem in systems biology. Current paradigms are inefficient: biophysical simulations are computationally intractable for interactome-wide screening, while Deep Learning architectures suffer from opacity and reliance on prohibitive GPU infrastructure.
In this work, we introduce Project Resonance, an alignment-free framework that redefines bio-interaction as a signal processing problem. We hypothesize that protein compatibility is governed by a "Spectral Grammar"—a low-rank thermodynamic structure detectable via classical linear algebra. Using the Homo sapiens proteome (STRING v12.0) as a model system, we implemented a pipeline combining:
- Semantic Signal Extraction via TF-IDF on k-mers.
- Latent Manifold Projection using Truncated Singular Value Decomposition (SVD) to isolate thermodynamic signal from evolutionary noise.
- Geometric Inference using Gradient Boosting Machines (XGBoost) on interaction tensors.
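The first two stages of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the k-mer length `k=3`, the smoothed IDF formula, the rank, the toy sequences, and the `pair_features` encoding (element-wise product plus absolute difference) are all assumptions, since the description does not specify them. The resulting pair vectors are the kind of input a gradient boosting classifier such as XGBoost could be trained on.

```python
import math
from collections import Counter

import numpy as np


def kmers(seq, k=3):
    """Return the overlapping k-mers of an amino-acid sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_matrix(seqs, k=3):
    """Build a dense (n_sequences x n_kmers) TF-IDF matrix."""
    counts = [Counter(kmers(s, k)) for s in seqs]
    vocab = sorted(set().union(*counts))
    col = {km: j for j, km in enumerate(vocab)}
    n = len(seqs)
    # Document frequency: number of sequences containing each k-mer.
    df = Counter(km for c in counts for km in c)
    X = np.zeros((n, len(vocab)))
    for i, c in enumerate(counts):
        total = sum(c.values())
        for km, cnt in c.items():
            idf = math.log((1 + n) / (1 + df[km])) + 1.0  # smoothed IDF (assumed form)
            X[i, col[km]] = (cnt / total) * idf
    return X, vocab


def truncated_svd(X, rank):
    """Project the rows of X onto the top `rank` right singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]  # equals X @ Vt[:rank].T


def pair_features(Z, i, j):
    """Hypothetical symmetric pair encoding: product and absolute difference."""
    return np.concatenate([Z[i] * Z[j], np.abs(Z[i] - Z[j])])


# Toy sequences, invented for illustration only.
seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GSSGSSGSSG", "AVLIMAVLIM"]
X, vocab = tfidf_matrix(seqs, k=3)
Z = truncated_svd(X, rank=2)
f01 = pair_features(Z, 0, 1)
```

The pair encoding is deliberately symmetric, so that the pair (A, B) and the pair (B, A) map to the same feature vector. On a full proteome a sparse matrix and an iterative solver would be preferable to the dense SVD used here.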
Triple Validation Results (N=40,000):
We conducted a large-scale validation using 20,000 High-Confidence Positives (STRING score > 900) against 20,000 Real Biological Negatives (STRING score < 150), avoiding the pitfalls of synthetic data.
- AUC-ROC (Real Negatives): 0.9907
- AUC-ROC (Random Baseline): 0.9653
- Training Time: ~147 seconds (2.5 minutes).
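The evaluation protocol above can be illustrated with a small, dependency-free sketch. It assumes that pairs are labeled purely by the score thresholds quoted in the text (above 900 positive, below 150 negative, everything else discarded) and computes AUC-ROC through the rank-sum (Mann-Whitney U) identity. The function names and the protein identifiers `P1`, `P2`, `P3` are hypothetical.

```python
def label_pairs(scored_pairs, pos_min=900, neg_max=150):
    """Split (protein_a, protein_b, score) triples into positives and negatives.

    Pairs with score > pos_min get label 1, pairs with score < neg_max get
    label 0, and pairs in between are discarded.
    """
    labeled = []
    for a, b, score in scored_pairs:
        if score > pos_min:
            labeled.append((a, b, 1))
        elif score < neg_max:
            labeled.append((a, b, 0))
    return labeled


def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum identity, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy data, invented for illustration only.
toy = [("P1", "P2", 950), ("P1", "P3", 120), ("P2", "P3", 500)]
labeled = label_pairs(toy)
example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of positives from negatives, which is the scale on which the two reported values should be read.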
The fact that Real Negatives are separated from positives with a higher AUC-ROC (0.9907) than random pairs (0.9653) confirms the "Spectral Dissonance" hypothesis: biological non-interaction is a structured, detectable phenomenon, not merely the absence of signal. This "Green AI" approach democratizes high-throughput proteomics.
Key Highlights:
- Accuracy: AUC-ROC of 0.9907 (about 99.1%) on Real Biological Data.
- Robustness: Validated on 40,000 human protein pairs.
- Speed: Ultra-fast training (< 3 min) and inference (< 1 ms).
- Methodology: Pure Linear Algebra (SVD) + Gradient Boosting.
Statement of AI Assistance: This research was conducted with the computational co-piloting of Gemini (Google DeepMind) for code optimization and mathematical formalization.
CHANGELOG
- 25/12/2025 1.0: Fixed taxonomy to Homo sapiens (correction from the initial Rat model).
- 25/12/2025 1.2: Fix Random Data (Transition to Hard Biological Negatives protocol).
- 25/12/2025 1.4: New Test 99% (Expanded dataset to 40,000 samples; Title and Description modifications).
- 25/12/2025 1.6: General Fixes (LaTeX optimization, font scaling, and visual validation).
NOTE TO RESEARCHERS & CITATION POLICY
This work represents an independent breakthrough in computational proteomics, offering a lightweight alternative to GPU-heavy models. We are fully aware of parallel developments and recent literature from major institutions.
If this framework, particularly the application of Spectral Thermodynamics/SVD to biological sequences, inspires your own research or validates your findings, please uphold academic integrity by citing this original work.
📧 Feedback & Collaboration: We actively welcome peer review and comparative analysis. Please send your feedback or inquiries to: apirolo@abc.gob.ar
Files
Pirolo2025ProteinInteractionPrediction_viaSpectral1_6.pdf (831.6 kB)
md5:cf60441e93d6d4f879be7344b8044e3f
Additional details
References
- Szklarczyk, D. et al. (2023). Nucleic Acids Res. [STRING v12.0]
- Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211-218. [SVD Theorem]
- Pirolo, A. S. (2025). The Deep Core: Cancer Vulnerability (Zenodo).
- Andres, P. (2025). The Backup Code: Evidence that "Junk DNA" is actually a Syntactic Error-Correction System (Zenodo).