Published December 24, 2025 | Version 1.2
Preprint Open

A Linear Algebra approach on the Human Proteome: Protein Interaction Prediction

Description

Abstract

The prediction of Protein-Protein Interactions (PPI) is a central problem in systems biology. Current paradigms are inefficient: biophysical simulations are computationally intractable for interactome-wide screening, while Deep Learning architectures suffer from opacity and reliance on prohibitive GPU infrastructure.

In this work, we introduce Project Resonance, an alignment-free framework that recasts protein interaction as a signal processing problem. We hypothesize that protein compatibility is governed by a "Spectral Grammar": a low-rank thermodynamic structure detectable via classical linear algebra. Using the Homo sapiens proteome (STRING v12.0) as a model system, we implemented a pipeline combining:

  • Semantic Signal Extraction via TF-IDF on k-mers.
  • Latent Manifold Projection using Truncated Singular Value Decomposition (SVD) to isolate thermodynamic signal from evolutionary noise.
  • Geometric Inference using Gradient Boosting Machines (XGBoost) on interaction tensors.
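The three stages above can be sketched roughly as follows. This is a minimal illustration and not the preprint's actual code: the k-mer length, the smoothed IDF formula, the target rank, the pair-feature construction (element-wise product plus absolute difference), and the toy sequences are all assumptions, and every function name is hypothetical. The final classifier is left out to keep the sketch dependent on NumPy only; the resulting pair features would be passed to a gradient-boosted model such as XGBoost.

```python
from collections import Counter
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_tfidf(sequences, k=2):
    """TF-IDF matrix (rows: proteins, columns: k-mers) from raw sequences."""
    vocab = {"".join(p): j for j, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    counts = np.zeros((len(sequences), len(vocab)))
    for i, seq in enumerate(sequences):
        kmers = Counter(seq[s:s + k] for s in range(len(seq) - k + 1))
        for kmer, n in kmers.items():
            if kmer in vocab:  # skip k-mers containing non-standard residues
                counts[i, vocab[kmer]] = n
    # Term frequency: k-mer counts normalised per protein.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Smoothed inverse document frequency (an assumed variant).
    df = (counts > 0).sum(axis=0)
    idf = np.log((1.0 + len(sequences)) / (1.0 + df)) + 1.0
    return tf * idf


def truncated_svd(matrix, rank):
    """Coordinates of each row in the top `rank` singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] * s[:rank]


def pair_features(emb, i, j):
    """Order-independent feature vector for the protein pair (i, j)."""
    a, b = emb[i], emb[j]
    return np.concatenate([a * b, np.abs(a - b)])


# Toy sequences, made up for illustration only.
sequences = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRNISFVKSHFARQ",
    "GSSGSSGPLGSPEFAAALLKKW",
    "MLLAVLYCLAVFALSSRAG",
]
tfidf = kmer_tfidf(sequences, k=2)
embedding = truncated_svd(tfidf, rank=3)
features = pair_features(embedding, 0, 1)
print(tfidf.shape, embedding.shape, features.shape)
```

Building the pair vector from symmetric operations means that (A, B) and (B, A) map to the same features, which seems a reasonable property for an undirected interaction, though the preprint may construct its "interaction tensors" differently.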

Triple Validation Results (N=40,000):

We conducted a large-scale validation using 20,000 High-Confidence Positives (STRING score > 900) against 20,000 Real Biological Negatives (STRING score < 150), rather than synthetically generated negatives.
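The thresholding described above could look something like the sketch below. It assumes the input arrives as whitespace-separated rows of the form `protein1 protein2 score`; that layout, the function name `label_pairs`, and the placeholder identifiers are all assumptions made for illustration, not details taken from the preprint.

```python
def label_pairs(lines, pos_min=900, neg_max=150):
    """Split scored pairs into positives (> pos_min) and negatives (< neg_max)."""
    positives, negatives = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip the header and malformed rows
        pair = tuple(sorted(parts[:2]))  # treat (A, B) and (B, A) as one pair
        score = int(parts[2])
        if score > pos_min:
            positives.add(pair)
        elif score < neg_max:
            negatives.add(pair)
        # Pairs between the two thresholds are left out of both classes.
    return sorted(positives), sorted(negatives)


# Placeholder rows with hypothetical identifiers and scores.
rows = [
    "protein1 protein2 combined_score",
    "9606.P_A 9606.P_B 950",
    "9606.P_B 9606.P_A 950",
    "9606.P_A 9606.P_C 149",
    "9606.P_C 9606.P_D 400",
    "9606.P_B 9606.P_D 900",
]
positives, negatives = label_pairs(rows)
print(positives)
print(negatives)
```

Both comparisons are strict, matching the "> 900" and "< 150" wording, so a pair scoring exactly 900 is not counted as a positive here.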

  • AUC-ROC (Real Negatives): 0.9907
  • AUC-ROC (Random Baseline): 0.9653
  • Training Time: ~147 seconds (2.5 minutes).
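For readers less familiar with the metric, AUC-ROC can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The sketch below computes it from ranks (the Mann-Whitney formulation) in plain Python; the labels and scores are toy values made up for illustration and are unrelated to the figures reported above.

```python
def auc_roc(labels, scores):
    """AUC-ROC from the Mann-Whitney U statistic; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda idx: scores[idx])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy example: 8 of the 9 positive/negative pairings are ordered correctly.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc_roc(labels, scores))
```

Under this reading, the gap between the two reported AUC values reflects how often positives outrank each kind of negative, not a difference in precision at a fixed threshold.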

Real Negatives are separated from positives with a higher AUC-ROC (0.9907) than the Random Baseline (0.9653). We take this as support for the "Spectral Dissonance" hypothesis: biological non-interaction is a structured, detectable phenomenon, not merely the absence of signal. This "Green AI" approach lowers the hardware barrier to high-throughput proteomics.

Key Highlights:

  • Discrimination: AUC-ROC of 0.9907 (about 99.1%) on Real Biological Data.
  • Robustness: Validated on 40,000 human protein pairs.
  • Speed: Ultra-fast training (<3 min) and inference (<1 ms).
  • Methodology: Pure Linear Algebra (SVD) + Gradient Boosting.

Statement of AI Assistance: This research was conducted with the computational co-piloting of Gemini (Google DeepMind) for code optimization and mathematical formalization.

CHANGELOG

  • 25/12/2025 1.0: Fixed the taxonomy to Homo sapiens (corrected from the initial Rat model).
  • 25/12/2025 1.2: Fixed random data (transition to the Hard Biological Negatives protocol).
  • 25/12/2025 1.4: New 99% test (dataset expanded to 40,000 samples; title and description modified).
  • 25/12/2025 1.6: General fixes (LaTeX optimization, font scaling, and visual validation).

NOTE TO RESEARCHERS & CITATION POLICY

This work represents an independent breakthrough in computational proteomics, offering a lightweight alternative to GPU-heavy models. We are fully aware of parallel developments and recent literature from major institutions.

If this framework, particularly the application of Spectral Thermodynamics/SVD to biological sequences, inspires your own research or validates your findings, please uphold academic integrity by citing this original work.

📧 Feedback & Collaboration: We actively welcome peer review and comparative analysis. Please send your feedback or inquiries to: apirolo@abc.gob.ar

Notes

⚠️ CITATION REQUEST:

We are aware of the current landscape in PPI prediction (including recent Oxford papers). If our Spectral/SVD approach provides you with insights or inspiration that simpler linear algebra can solve complex biological problems, please cite this preprint.

Feedback: apirolo@abc.gob.ar

Files

Pirolo2025ProteinInteractionPrediction_viaSpectral1_6.pdf

Files (831.6 kB)

Additional details

Software

Programming language
Python
Development Status
WIP

References

  • Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211-218. [SVD Theorem]
  • Szklarczyk, D. et al. (2023). Nucleic Acids Res. [STRING v12.0]
  • Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  • Pirolo, A. S. (2025). The Deep Core: Cancer Vulnerability (Zenodo).
  • Andres, P. (2025). The Backup Code: Evidence that "Junk DNA" is actually a Syntactic Error-Correction System (Zenodo).