Published September 12, 2023 | Version v1.0
Preprint Open

Viral protein family embeddings

Authors/Creators

Description

Each dataset is serialized as a Python pickle (.pkl) dictionary in which the keys are protein sequences and the values are their embeddings. Embeddings were produced using the protbert_bfd model from ProtTrans (DOI: 10.1109/TPAMI.2021.3095381). Example code for loading the embedding dictionaries can be found in the project GitHub repository: https://github.com/kellylab/viral-protein-function-annotation-with-protein-language-model
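The dictionary layout described above (sequences as keys, embeddings as values) can be read with the standard pickle module. The following is a minimal sketch, not the code from the project repository. The helper name load_embeddings, the file name toy_embeddings.pkl, and the toy sequences and vectors are all hypothetical, chosen only so the example runs on its own. The record does not state the type of the embedding values, so loading the real files may additionally require numpy or torch to be installed.

```python
import pickle
import tempfile
from pathlib import Path


def load_embeddings(path):
    """Load a pickled {sequence: embedding} dictionary from disk.

    Hypothetical helper for illustration. Only unpickle files from
    sources you trust: pickle can execute arbitrary code on load.
    """
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict, got {type(data).__name__}")
    return data


# Build a tiny synthetic file so the sketch is self-contained.
# The real files are several GB; these toy values are made up.
demo_path = Path(tempfile.mkdtemp()) / "toy_embeddings.pkl"
toy = {"MKTAYIAK": [0.1, 0.2, 0.3], "MSLNE": [0.4, 0.5, 0.6]}
with open(demo_path, "wb") as handle:
    pickle.dump(toy, handle)

embeddings = load_embeddings(demo_path)

# Keys are sequences, values are the corresponding embeddings.
first_sequence = next(iter(embeddings))
first_embedding = embeddings[first_sequence]
print(len(embeddings), first_sequence, len(first_embedding))
```

For the published files, replace demo_path with the local path of the downloaded .pkl file. Because the larger file is about 10 GB on disk, expect the loaded dictionary to need at least that much memory.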

PHROGs sequences come from the Prokaryotic virus Remote Homologous Groups database v3 (DOI: 10.1093/nargab/lqab067).

EFAM sequences come from the efam database (DOI: 10.25739/9vze-4143).


Files (12.3 GB)

md5:10131a5115ca2d784a58f73ed10a5458 (10.1 GB)
md5:2ca0f87a9c37f2894001a141cec73d86 (2.2 GB)
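
The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using the standard hashlib module. The helper names md5_of_file and matches_checksum are hypothetical, and the demo hashes a small temporary file rather than the real data. The listing does not show file names, so match each checksum to a file by its size.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value.

    Accepts either a bare hex digest or the 'md5:<hex>' form
    used in the file listing.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Self-contained demo on a small temporary file (not the real data).
demo_file = Path(tempfile.mkdtemp()) / "demo.bin"
demo_file.write_bytes(b"hello")
demo_expected = "md5:" + hashlib.md5(b"hello").hexdigest()

print(matches_checksum(demo_file, demo_expected))
```

For a real download, pass the local file path and the corresponding checksum string from the listing, for example matches_checksum(local_path, "md5:2ca0f87a9c37f2894001a141cec73d86") for the 2.2 GB file, where local_path is wherever the file was saved.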