Published March 31, 2023 | Version 1.2.0
Dataset | Open access

Cross-phyla protein annotation by structural prediction and alignment

Description

Background: Protein annotation is a major goal in molecular biology, yet experimentally determined knowledge is typically limited to a few model organisms. In non-model species, sequence-based prediction of gene orthology can be used to infer protein identity; however, this approach loses predictive power at longer evolutionary distances. Here we propose a workflow for protein annotation using structural similarity, exploiting the fact that similar protein structures often reflect homology and are more conserved than protein sequences.

Results: We propose a workflow of openly available tools for the functional annotation of proteins via structural similarity (MorF: MorphologFinder) and use it to annotate the complete proteome of a sponge. Sponges are highly relevant for inferring the early history of animals, yet their proteomes remain sparsely annotated. MorF accurately predicts the functions of proteins with known homology in >90% of cases and annotates an additional 50% of the proteome beyond standard sequence-based methods. We uncover new functions for sponge cell types, including extensive FGF, TGF and Ephrin signalling in sponge epithelia, and redox metabolism and control in myopeptidocytes. Notably, we also annotate genes specific to the enigmatic sponge mesocytes, proposing that they function to digest cell walls.
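As a rough illustration of the idea behind annotation by structural similarity, the step of transferring an annotation from the best structural match can be sketched as below. This is not the MorF implementation: the three-column tab-separated hit table (query, target, e-value), the `max_evalue` cutoff of 1e-3, the helper names `best_hits` and `transfer_annotations`, and the example identifiers are all hypothetical.

```python
import csv

def best_hits(hit_lines, max_evalue=1e-3):
    """Keep the lowest-e-value structural hit per query.

    Assumes (hypothetically) tab-separated lines of: query, target, e-value.
    Hits with an e-value above max_evalue are discarded.
    """
    best = {}
    for row in csv.reader(hit_lines, delimiter="\t"):
        query, target, evalue = row[0], row[1], float(row[2])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (target, evalue)
    return best

def transfer_annotations(best, reference_annotations):
    """Assign each query the annotation of its best hit, if the hit is annotated."""
    return {
        query: reference_annotations[target]
        for query, (target, _evalue) in best.items()
        if target in reference_annotations
    }

# Hypothetical example data: query_2's only hit fails the e-value cutoff.
hits = ["query_1\tref_A\t1e-20", "query_1\tref_B\t1e-5", "query_2\tref_C\t0.5"]
reference = {"ref_A": "annotation A", "ref_B": "annotation B"}
print(transfer_annotations(best_hits(hits), reference))  # {'query_1': 'annotation A'}
```

Keeping only the single best hit is the simplest possible transfer rule; a real pipeline would likely also weigh alignment coverage, structural scores, or agreement among several top hits.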

Conclusions: Our work demonstrates that structural similarity is a powerful approach that complements and extends sequence similarity searches to identify homologous proteins over long evolutionary distances. We anticipate that it will boost discovery in numerous -omics datasets, especially for non-model organisms.

Notes

We have uploaded two additional tables containing the yeast and Arabidopsis proteins with structures similar to human proteins, together with information on whether they are homologs.

Files (21.2 GB), including alphafold_performance.zip

MD5 checksum                        Size
7cc0c55cb9f0b69ee0892ce0a72feb83    7.8 GB
466e8359c8be4772d0c0fd3058453821    1.4 GB
b487dc52148942ff3d9e1343c575ec10    1.6 MB
45cf9fd053f24864bc971a7d6867d4b8    6.6 MB
36088d89f9ae8c40c0d6d22146cb78c0    96.8 MB
2a425b1f26fd872b2101db035277e988    536.3 MB
0ba5c9dcd2f200d0a030a7e2626e8c34    4.3 kB
068cab23fde5d6c31ef6a3d067556983    6.6 GB
a4c7162dd60ff7bcab8cd2a0faf95935    175.5 MB
8e34ca298bbd73b098154343558deda1    4.3 GB
ca99114333a7fa299785163d0eae28ad    244.5 MB
2920604e67bbcc15babcc943e9755bbc    10.7 MB
f68a9d70b7fa5834491f3fd7dfe0f71d    40.4 MB
db913e2f0aae96b38a2d537fc5c9caf8    2.1 kB
e218d0e7bfe37525441198c73cf6274e    1.9 MB

Additional details

Related works

Is supplement to
Preprint: 10.1101/2022.07.05.498892 (DOI)

Funding

European Commission
DeCoDe Platy: Reconstructing the complete developmental lineage of Platynereis dumerilii (101031984)
European Commission
IGNITE: Comparative genomics of non-model invertebrates (764840)
