Published April 2, 2023 | Version v1.0.0
Dataset Open

Protein Function Embeddings: First Beta Release of Datasets

  • 1. University of Bonn
  • 2. ZB MED - Information Centre for Life Sciences
  • 1. Bonn-Aachen International Centre for Information Technology (B-IT), University of Bonn
  • 2. University of Cologne

Description

This release contains the datasets generated in a thesis project that explores how information about protein function can be captured in embeddings and then used to improve protein function annotation. The underlying hypothesis is that any two proteins with high sequence similarity also share a similar biological function, and that this shared function is reflected in the corresponding protein embeddings. This is compared and evaluated using two text-driven embedding approaches: Word2doc2Vec and Hybrid-Word2doc2Vec.
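As a rough illustration of the hypothesis, the sketch below compares two embedding vectors with cosine similarity. The choice of cosine similarity, the function name, and the toy vectors are all assumptions made for illustration; the description does not state which similarity measure the thesis uses, and the vectors are not taken from the datasets.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, NOT values from the released files.
protein_a = [0.2, 0.7, 0.1]
protein_b = [0.25, 0.65, 0.05]
similarity = cosine_similarity(protein_a, protein_b)
```

Under the stated hypothesis, pairs of proteins with high sequence similarity would be expected to score closer to 1 than unrelated pairs.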

Files

annotations.zip

Files (7.6 GB)

Checksum                                Size
md5:b986686409ca84357f74d06734ef2c10    193.7 MB
md5:3b84b7cb871581570ffe907b42e1af7f    776.6 MB
md5:84424e2145ac304b747b2aa05e61a57e    3.6 GB
md5:ef61a4d5ed13d4872060974f8c7a1d99    2.8 kB
md5:0ca4e6c05669ceb2bb7f6e9dc6119266    2.3 GB
md5:b16ccdb72d70ce4b39983980f108cf93    117.5 kB
md5:2d95252227718f9d09c07166c89b6532    114.5 MB
md5:97b99720fd4048dc8e2613f9664b0f3c    171.7 MB
md5:8384bbe965be628bb4f2948fbfccaa16    363.9 MB
md5:bdae70019b15528bdb22d9df25273c94    1.9 MB
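The md5 values listed for the files can be used to check that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are hypothetical, and which checksum belongs to which file has to be read from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("annotations.zip", "md5:<value shown for that file>")` would return `True` only if the local copy is byte-identical to the published file.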

Additional details

Related works

Is derived from
Software: 10.5281/zenodo.7781870 (DOI)