A Linear Algebra approach on the Human Proteome: Protein Interaction Prediction

Andrés Sebastián, Pirolo

doi:10.5281/zenodo.18042953

Published December 24, 2025 | Version 1.2

Preprint Open

Abstract

The prediction of Protein-Protein Interactions (PPI) is a central problem in systems biology. Current paradigms are inefficient: biophysical simulations are computationally intractable for interactome-wide screening, while Deep Learning architectures suffer from opacity and reliance on prohibitive GPU infrastructure.

In this work, we introduce Project Resonance, an alignment-free framework that redefines bio-interaction as a signal processing problem. We hypothesize that protein compatibility is governed by a "Spectral Grammar"—a low-rank thermodynamic structure detectable via classical linear algebra. Using the Homo sapiens proteome (STRING v12.0) as a model system, we implemented a pipeline combining:

Semantic Signal Extraction via TF-IDF on k-mers.
Latent Manifold Projection using Truncated Singular Value Decomposition (SVD) to isolate thermodynamic signal from evolutionary noise.
Geometric Inference using Gradient Boosting Machines (XGBoost) on interaction tensors.
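
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: the k-mer length (3), the rank (2), the toy sequences, and the pair features (elementwise product and absolute difference) are all hypothetical choices, since the abstract does not specify them or define its "interaction tensors". The sketch uses only NumPy and stops at the feature vector; in the described pipeline that vector would be passed to a gradient boosting classifier such as XGBoost.

```python
import math
from collections import Counter

import numpy as np


def kmers(seq, k=3):
    """Return all overlapping k-mers of an amino-acid sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_matrix(seqs, k=3):
    """Build a dense TF-IDF matrix (rows: sequences, columns: k-mers)."""
    counts = [Counter(kmers(s, k)) for s in seqs]
    vocab = sorted(set().union(*counts))
    col = {kmer: j for j, kmer in enumerate(vocab)}
    n = len(seqs)
    # Document frequency: in how many sequences each k-mer occurs.
    df = Counter(kmer for c in counts for kmer in c)
    X = np.zeros((n, len(vocab)))
    for i, c in enumerate(counts):
        total = sum(c.values())
        for kmer, cnt in c.items():
            idf = math.log((1 + n) / (1 + df[kmer])) + 1.0  # smoothed IDF (assumed variant)
            X[i, col[kmer]] = (cnt / total) * idf
    return X, vocab


def truncated_svd(X, rank):
    """Keep the top `rank` singular directions and return per-sequence embeddings."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]


def pair_features(za, zb):
    """Order-independent features for a protein pair (hypothetical choice)."""
    return np.concatenate([za * zb, np.abs(za - zb)])


# Toy sequences, invented for illustration only.
seqs = ["MKTAYIAKQR", "MKTAYLAKQR", "GSSGSSGLLV", "PLVGSSGAAK"]
X, vocab = tfidf_matrix(seqs, k=3)
Z = truncated_svd(X, rank=2)
features = pair_features(Z[0], Z[1])  # input row for a downstream classifier
```

Both pair features are symmetric in the two proteins, so the classifier sees the same input for (A, B) and (B, A). A real run over a whole proteome would use a sparse matrix and a sparse SVD solver rather than the dense arrays used here.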

Triple Validation Results (N=40,000):

We conducted a large-scale validation using 20,000 High-Confidence Positives (Score > 900) against 20,000 Real Biological Negatives (Score < 150), avoiding the pitfalls of synthetic data.
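
The selection rule described above can be sketched as a simple threshold filter. The function names, the tuple layout, and the toy protein identifiers are hypothetical; only the two thresholds (score above 900 for positives, below 150 for negatives) come from the text, and pairs in between are assumed to be discarded.

```python
import random

POSITIVE_MIN = 900  # "Score > 900" in the text
NEGATIVE_MAX = 150  # "Score < 150" in the text


def label_pairs(scored_pairs):
    """Split (protein_a, protein_b, score) tuples into positives and negatives.

    Pairs with scores between the two thresholds are dropped (assumption).
    """
    positives, negatives = [], []
    for a, b, score in scored_pairs:
        if score > POSITIVE_MIN:
            positives.append((a, b))
        elif score < NEGATIVE_MAX:
            negatives.append((a, b))
    return positives, negatives


def balanced_sample(positives, negatives, n_per_class, seed=0):
    """Draw the same number of pairs from each class, without replacement."""
    rng = random.Random(seed)
    return rng.sample(positives, n_per_class), rng.sample(negatives, n_per_class)


# Toy data with invented identifiers and scores.
toy = [
    ("P1", "P2", 950), ("P1", "P3", 120), ("P2", "P3", 500),
    ("P3", "P4", 901), ("P2", "P4", 149), ("P1", "P4", 900),
    ("P4", "P5", 150),
]
positives, negatives = label_pairs(toy)
pos_sample, neg_sample = balanced_sample(positives, negatives, n_per_class=2)
```

With `n_per_class=20000` on the full score table, this would yield the balanced 40,000-pair set reported here, provided each class has at least that many candidates.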

AUC-ROC (Real Negatives): 0.9907
AUC-ROC (Random Baseline): 0.9653
Training Time: ~147 seconds (2.5 minutes).
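
For readers unfamiliar with the metric, AUC-ROC is the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The sketch below computes it directly from that definition. It is illustrative only: the function name and the toy labels and scores are invented, and the quadratic loop is too slow for 40,000 pairs, where a rank-based or library implementation would be used instead.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
example = auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 0.75
```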

Real negatives are separated from positives with a higher AUC-ROC (0.9907) than the random baseline (0.9653). We interpret this as support for the "Spectral Dissonance" hypothesis: biological non-interaction is a structured, detectable phenomenon, not merely the absence of signal. This "Green AI" approach democratizes high-throughput proteomics by removing the dependence on GPU infrastructure.

Key Highlights:

Discrimination: AUC-ROC of 0.9907 (about 99.1%) on Real Biological Data.
Robustness: Validated on 40,000 human protein pairs.
Speed: Fast training (< 3 min) and inference (< 1 ms).
Methodology: Pure Linear Algebra (SVD) + Gradient Boosting.

Statement of AI Assistance: This research was conducted with the computational co-piloting of Gemini (Google DeepMind) for code optimization and mathematical formalization.

CHANGELOG

25/12/2025 1.0: Fix corresponding Homo sapiens taxonomy (Correction from initial Rat model).
25/12/2025 1.2: Fix Random Data (Transition to Hard Biological Negatives protocol).
25/12/2025 1.4: New Test 99% (Expanded dataset to 40,000 samples; Title and Description modifications).
25/12/2025 1.6: General Fixes (LaTeX optimization, font scaling, and visual validation).

NOTE TO RESEARCHERS & CITATION POLICY

This work represents an independent breakthrough in computational proteomics, offering a lightweight alternative to GPU-heavy models. We are fully aware of parallel developments and recent literature from major institutions.

If this framework, particularly the application of Spectral Thermodynamics/SVD to biological sequences, inspires your own research or validates your findings, please uphold academic integrity by citing this original work.

📧 Feedback & Collaboration: We actively welcome peer review and comparative analysis. Please send your feedback or inquiries to: apirolo@abc.gob.ar

Notes

⚠️ CITATION REQUEST:

We are aware of the current landscape in PPI prediction (including recent Oxford papers). If our Spectral/SVD approach provides you with insights or inspiration that simpler linear algebra can solve complex biological problems, please cite this preprint.

Feedback: apirolo@abc.gob.ar

Files

Pirolo2025ProteinInteractionPrediction_viaSpectral1_6.pdf (831.6 kB, md5:cf60441e93d6d4f879be7344b8044e3f)

Additional details

Programming language: Python
Development Status: WIP (work in progress)

References

Pirolo, A. S. (2025). The Deep Core: Cancer Vulnerability (Zenodo).
Andres, P. (2025). The Backup Code: Evidence that "Junk DNA" is actually a Syntactic Error-Correction System (Zenodo).
Szklarczyk, D. et al. (2023). Nucleic Acids Res. [STRING v12.0]
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211-218. [SVD Theorem]

