Published May 31, 2026 | Version v1
Dataset Open

AF_Cache: Efficient Pipeline for Running AlphaFold for High-Throughput Protein-Protein Interaction Prediction

  • 1. Stockholm University
  • 2. Linköping University

Description

Data, scripts, and analysis code for the AF_Cache benchmarking study, provided as a .tar.gz archive together with a README.md.

Motivation: Accurate prediction of protein-protein interactions is essential for understanding biological processes, and recent advances such as AlphaFold2 and AlphaFold3 have enabled structure-based interaction prediction with unprecedented accuracy. However, the high computational cost of these methods limits their applicability in large-scale, high-throughput settings. This cost is driven primarily by repeated CPU-based multiple sequence alignment (MSA) generation and, for AlphaFold2, by repeated model recompilations. This creates a need for efficient pipelines that retain predictive performance while substantially reducing runtime.

Results: We present AF_Cache, a high-throughput Nextflow pipeline for accelerating protein-protein interaction prediction using AlphaFold2 and AlphaFold3. AF_Cache combines GPU-accelerated MSA generation with MMseqs2, feature caching to eliminate redundant alignment computations, and sequence length bucketing to minimise repeated JAX compilations. Benchmarking on a dataset of 5,050 human mitochondrial protein pairs demonstrates a ~2-fold reduction in inference time for AlphaFold2 and up to a 13-fold speedup of MSA generation. Prediction scores correlate across runs and configurations with Pearson coefficients of r = 0.6 to 0.7, rising to r ≥ 0.94 for structurally implied interactions. AF_Cache enables efficient large-scale interaction screening and provides a practical framework for deploying AlphaFold-based methods in high-throughput applications.
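The two ideas named above, feature caching and sequence length bucketing, can be illustrated with a minimal Python sketch. This is not the AF_Cache implementation: the class `FeatureCache`, the functions `bucket_length` and `group_pairs_by_bucket`, the in-memory dictionary store, and the bucket size of 256 residues are all hypothetical choices made for illustration. The sketch only shows why computing per-sequence features once, and padding pair lengths to a small set of shapes, reduces repeated work.

```python
import hashlib


def seq_key(seq):
    """Stable cache key derived from the sequence itself (hypothetical scheme)."""
    return hashlib.sha256(seq.encode("ascii")).hexdigest()


class FeatureCache:
    """In-memory cache: per-sequence features are computed once and reused."""

    def __init__(self, compute):
        self._compute = compute  # stand-in for an expensive MSA/feature step
        self._store = {}
        self.misses = 0

    def get(self, seq):
        key = seq_key(seq)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(seq)
        return self._store[key]


def bucket_length(total_len, bucket_size=256):
    """Round a length up to the next multiple of bucket_size."""
    return -(-total_len // bucket_size) * bucket_size


def group_pairs_by_bucket(pairs, seqs, bucket_size=256):
    """Group pairs by padded combined length so each shape is compiled once."""
    buckets = {}
    for a, b in pairs:
        padded = bucket_length(len(seqs[a]) + len(seqs[b]), bucket_size)
        buckets.setdefault(padded, []).append((a, b))
    return buckets


if __name__ == "__main__":
    # Toy sequences; identifiers and lengths are made up for the example.
    seqs = {"P1": "A" * 100, "P2": "C" * 200, "P3": "D" * 300}
    ids = sorted(seqs)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i:]]

    cache = FeatureCache(compute=len)  # len() stands in for feature generation
    for a, b in pairs:
        cache.get(seqs[a])
        cache.get(seqs[b])
    print("feature computations:", cache.misses, "for", 2 * len(pairs), "lookups")

    for padded, members in sorted(group_pairs_by_bucket(pairs, seqs).items()):
        print(padded, members)
```

In this toy run, six pairs require twelve feature lookups but only three feature computations, and five distinct combined lengths collapse into three padded shapes. A smaller bucket size wastes less padding but produces more distinct shapes, so the right value depends on the length distribution of the input set.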

The dataset contains prediction metrics, runtime summaries, predicted model outputs, pregenerated MSAs, and figure/table generation scripts comparing standard AlphaFold 2.3 and 3.0 workflows with AF_Cache-based workflows. It also includes the version of the AF_Cache code used for the paper and the input FASTA sequences for the benchmark set.

Files

README.md

Files (21.1 GB)

Name Size
md5:fe3cd1283501a7748f7a27ab6e18a621 21.1 GB
md5:0a0d1e7ff7b6f050bf4664db39e246d1 10.1 kB
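After downloading, the md5 checksums listed above can be used to confirm that each file arrived intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of` and `matches_record` are hypothetical, and the file path is whatever name the download was saved under, since the archive file name is not shown in this listing.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, assuming the 10.1 kB entry corresponds to README.md, `matches_record("README.md", "md5:0a0d1e7ff7b6f050bf4664db39e246d1")` should return `True` for an uncorrupted download. Reading in chunks keeps memory use constant, which matters for the 21.1 GB archive.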