Published October 28, 2020 | Version 1.0
Journal article | Open Access

Learning Functional Properties of Proteins with Language Models

  • 1. Cancer Systems Biology Laboratory (KanSiL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  • 2. Department of Biostatistics and Computer Sciences, Karadeniz Technical University, Trabzon, Turkey

Description

This dataset includes:

- Precomputed representation vectors of human proteins, generated with various protein embedding models.

- Precomputed representation vectors of the protein sequences in the SKEMPI dataset, generated with various protein embedding models.

- Multiple sequence alignments (MSAs) of human proteins, computed with HHblits.

     -- The human protein MSA archive is split into several tar.gz parts. Concatenate and extract them with: cat human_protein_msa.tar.gz.* | tar xzvf -

- MSAs of the protein sequences in the SKEMPI dataset, computed with HHblits.
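On systems without a POSIX shell, the `cat ... | tar xzvf -` step can be approximated in Python. The sketch below streams the parts, in sorted filename order, into the standard `tarfile` module, so no combined archive has to be written to disk. It is an illustration, not part of the dataset: the helper names `ConcatenatedParts` and `extract_split_archive` are hypothetical, and it assumes the part names sort in the intended order, which the shell glob in the command above also relies on.

```python
import glob
import io
import os
import tarfile


class ConcatenatedParts(io.RawIOBase):
    """Read-only stream over several files in order, like `cat part1 part2 ...`."""

    def __init__(self, paths):
        super().__init__()
        self._paths = list(paths)
        self._current = None

    def readable(self):
        return True

    def readinto(self, buffer):
        while True:
            if self._current is None:
                if not self._paths:
                    return 0  # every part consumed: end of stream
                self._current = open(self._paths.pop(0), "rb")
            n = self._current.readinto(buffer)
            if n:
                return n
            # Current part exhausted; move on to the next one.
            self._current.close()
            self._current = None

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None
        super().close()


def extract_split_archive(pattern, dest="."):
    """Join the parts matching `pattern` (sorted by name) and extract the
    gzip-compressed tar stream into `dest` (hypothetical helper)."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no archive parts match {pattern!r}")
    os.makedirs(dest, exist_ok=True)
    with ConcatenatedParts(parts) as stream:
        # "r|gz" reads the archive as a forward-only stream.
        with tarfile.open(fileobj=stream, mode="r|gz") as tar:
            # Use the safer "data" extraction filter where this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(path=dest, **extra)
```

Usage, with the part names used by this dataset: `extract_split_archive("human_protein_msa.tar.gz.*", "human_protein_msa")`. Streaming the parts keeps peak disk usage close to the size of the extracted MSAs, rather than also requiring space for a reassembled archive.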

Files (11.2 GB)

MD5 checksum | Size
--- | ---
f63d0ee17f1e5fdcde3fb4498afa35c7 | 1.1 GB
6e8b90f0f50c34df23f9e299eebedd48 | 1.1 GB
1dca9c24ae9d17f9955a81626ec824f8 | 1.1 GB
9f82ad4e4dac947de8e795a37cfeab34 | 1.1 GB
39f8fca9bd196720bb4d9148b4e0621f | 1.1 GB
da4e8d88cdc038bccd47d20ddb1d3e47 | 1.1 GB
e9bfd5047ed66c9da0a9d8e549483ada | 700.4 MB
e3e7f713dcc3e3df6bdce12226a8731f | 3.2 GB
ffc029cec8d988b4869bfa8da0d62a28 | 404.5 MB
2411098fe9b4f5639b81830b33612a32 | 435.0 MB
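Each downloaded file can be checked against its listed MD5 checksum before extraction. A minimal sketch using Python's standard `hashlib`, reading in 1 MiB chunks so multi-gigabyte files are never loaded into memory at once. The function names are hypothetical, and because the file names are not reproduced in this listing, the caller has to pair each file with its checksum.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file against a checksum given bare or as "md5:<hex>"."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

For example, `verify_md5(path_to_download, "e3e7f713dcc3e3df6bdce12226a8731f")` returns `True` only if the file at `path_to_download` matches that listed checksum; a mismatch usually means an incomplete or corrupted download.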