Published September 29, 2023 | Version v1
Dataset Open

Alignment-based protein mutational landscape prediction: doing more with less

  • 1. Sorbonne University
  • 2. Technical University Munich
  • 3. University of Paris

Description

The wealth of genomic data has boosted the development of computational methods that predict the phenotypic outcomes of missense variants. The most accurate of these methods exploit multiple sequence alignments, which can be costly to generate. Recent efforts to democratize protein structure prediction have overcome this bottleneck by leveraging the fast homology search of MMseqs2. Here, we show the usefulness of this strategy for mutational outcome prediction through a large-scale assessment of 1.5M missense variants across 72 protein families. Our study demonstrates the feasibility of producing alignment-based mutational landscape predictions for entire proteomes that are both high-quality and compute-efficient. We provide the community with the mutational landscape of the whole human proteome and simplified access to our predictive pipeline.

Notes

Funding provided by: Agence Nationale de la Recherche
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001665
Award Number: ANR-20-CE44-0010

Funding provided by: Bayerisches Staatsministerium für Bildung und Kultus, Wissenschaft und Kunst
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100004563
Award Number: 031L0168

Files

README.md

Files (28.3 GB)

Checksum | Size
md5:83b5707c2aa5e9f42bffb56ce756a868 | 7.9 MB
md5:058e656c18cb60d230bfca9ba6d55e50 | 23.7 kB
md5:2dfeb2d397f27082225854535eea6f0e | 28.0 GB
md5:93e7eb19ab302fdf2ffb1fc33d864fbd | 295.2 MB
md5:9edb9e7c568eafe0f2325c0d13cd1a42 | 10.9 kB
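
Since each file in this record is listed with an MD5 checksum, a download can be checked against it before use, which is mostly worthwhile for the 28.0 GB archive. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_download` are illustrative and not part of the authors' pipeline, and the local file path is an assumption, because the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a local file against a checksum copied from the record page.

    Accepts the checksum with or without the leading "md5:" prefix.
    """
    expected = expected_md5.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected


# Hypothetical usage; replace the path with wherever the file was saved:
# ok = verify_download("path/to/downloaded_file",
#                      "md5:2dfeb2d397f27082225854535eea6f0e")
```

A mismatch usually points to an interrupted or corrupted transfer, in which case downloading the file again is the simplest remedy.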

Additional details

Related works

Is cited by
10.1101/2022.12.13.520259 (DOI)