Published September 12, 2023 | Version v1.0
Preprint
Open
Viral protein family embeddings
Authors/Creators
Description
Each dataset is serialized as a Python pickle (pkl) dictionary whose keys are sequences and whose values are the corresponding embeddings. Embeddings were produced using the protbert_bfd model from ProtTrans (DOI: 10.1109/TPAMI.2021.3095381). Example code for loading the embedding dictionaries can be found in the project GitHub repository: https://github.com/kellylab/viral-protein-function-annotation-with-protein-language-model
PHROGs sequences come from the Prokaryotic virus Remote Homologous Groups database v3 (DOI: 10.1093/nargab/lqab067).
EFAM sequences come from the efam database (DOI: 10.25739/9vze-4143).
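As a rough illustration of the format described above, the following minimal sketch loads one of the pickled dictionaries using only the Python standard library. The helper names `load_embeddings` and `summarize` are hypothetical and are not taken from the project repository, and the sketch assumes each embedding is a one-dimensional sequence of numbers. Unpickling executes code stored in the file, so it should only be done with files from a trusted source.

```python
import pickle
from pathlib import Path


def load_embeddings(path):
    """Load a pickled {sequence: embedding} dictionary from disk.

    Hypothetical helper: the record only states that each dataset is a
    pickled dictionary keyed by sequence.
    """
    with Path(path).open("rb") as handle:
        embeddings = pickle.load(handle)
    if not isinstance(embeddings, dict):
        raise TypeError(f"expected a dict, got {type(embeddings).__name__}")
    return embeddings


def summarize(embeddings):
    """Return (number of sequences, length of the first embedding).

    Assumes each embedding supports len(), e.g. a list or 1-D array.
    """
    first = next(iter(embeddings.values()), None)
    length = len(first) if first is not None else 0
    return len(embeddings), length
```

Calling `load_embeddings` on a downloaded file and passing the result to `summarize` gives a quick sanity check before further use. If the embeddings were stored as NumPy arrays (an assumption, since the record does not say), NumPy must be installed for unpickling to succeed.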
Files
(12.3 GB)
| File (MD5 checksum) | Size |
|---|---|
| md5:10131a5115ca2d784a58f73ed10a5458 | 10.1 GB |
| md5:2ca0f87a9c37f2894001a141cec73d86 | 2.2 GB |
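The MD5 checksums listed above can be used to confirm that a large download completed without corruption. The following is a minimal sketch using Python's `hashlib`. The function names are hypothetical, and the file is read in chunks so that a multi-gigabyte file does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

When comparing against a value from the table, pass only the hexadecimal part, without the `md5:` prefix.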