There is a newer version of the record available.

Published March 27, 2023 | Version 1.1.0
Dataset Open

TemStaPro Datasets

  • 1. Institute of Biotechnology, Life Sciences Center, Vilnius University; Institute of Informatics, Faculty of Mathematics and Informatics, Vilnius University
  • 2. Institute of Biotechnology, Life Sciences Center, Vilnius University
  • 3. CasZyme
  • 4. Institute of Biotechnology, Life Sciences Center, Vilnius University; CasZyme

Description

This dataset contains the protein sequences used to train, validate, and test the binary classifiers that make up the TemStaPro program, which predicts protein thermostability with respect to nine temperature thresholds, from 40 to 80 degrees Celsius in steps of five degrees.
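As a rough illustration of the threshold scheme, the sketch below enumerates the nine thresholds and turns a single growth temperature into one binary label per threshold. The labeling rule (class 1 when the temperature is at or above the threshold) and the names `THRESHOLDS` and `binary_labels` are assumptions made for this example; the corresponding paper is the authority on how classes are actually defined.

```python
from typing import Dict

# Nine thresholds: 40, 45, ..., 80 degrees Celsius.
THRESHOLDS = list(range(40, 85, 5))


def binary_labels(growth_temp: float) -> Dict[int, int]:
    """Return one 0/1 label per threshold for a growth temperature.

    Assumption: a sequence is labeled 1 when the organism's growth
    temperature is greater than or equal to the threshold.
    """
    return {t: int(growth_temp >= t) for t in THRESHOLDS}


if __name__ == "__main__":
    # Hypothetical organism growing at 62 degrees Celsius.
    print(binary_labels(62.0))
```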

The data are provided as FASTA files. Each protein sequence has a header made of three values separated by vertical bar symbols: the UniParc taxonomy identifier of the organism to which the protein belongs; the UniProtKB/TrEMBL identifier of the protein sequence; and the organism's growth temperature, taken from the dataset of growth temperatures of over 21 thousand organisms (Engqvist, 2018).
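The header layout described above can be read with a few lines of plain Python. This is a minimal sketch, assuming the three fields appear in the order listed, that the growth temperature parses as a number, and that a sequence may span several lines. The names `Record` and `read_fasta` and the example header values are hypothetical and not taken from the dataset.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Record(NamedTuple):
    taxonomy_id: str    # organism's taxonomy identifier
    protein_id: str     # UniProtKB/TrEMBL identifier
    growth_temp: float  # organism's growth temperature, degrees Celsius
    sequence: str


def _make_record(header: str, chunks: List[str]) -> Record:
    # The header holds three values separated by vertical bars.
    taxonomy_id, protein_id, growth_temp = header.split("|")
    return Record(taxonomy_id, protein_id, float(growth_temp), "".join(chunks))


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield one Record per FASTA entry read from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


if __name__ == "__main__":
    # Made-up entry used only to show the assumed layout.
    example = io.StringIO(">12345|A0A000XYZ0|37\nMKTAY\nIAKQR\n")
    for record in read_fasta(example):
        print(record)
```

For a real file, the same function can be given a handle from `open(path)` in place of the in-memory example.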

The TemStaPro-Major set is composed of 21 files:

  • one training
  • one validation
  • one imbalanced testing
  • nine balanced testing sets, one for each of the nine binary classifiers' thresholds
  • nine balanced samples of 2000 sequences, one drawn from each of the balanced testing sets

The TemStaPro-Minor set is composed of training, validation, and testing files, all balanced for the 65 degrees Celsius temperature threshold.

The SupplementaryFileC2EPsPredictions.tsv file contains thermostability predictions made with the default mode of the TemStaPro program to check the thermostability of different C2EP groups.

The detailed description is given in the corresponding paper (https://doi.org/10.1101/2023.03.27.534365).

If you use the data from this dataset, please cite both the paper and the DOI of the dataset.

Notes

This project has received funding from the European Regional Development Fund (project No 13.1.1-LMT-K-718-05-0021) under a grant agreement with the Research Council of Lithuania (LMTLT). Funded as the European Union's measure in response to the COVID-19 pandemic.

Files

Files (762.1 MB)

Checksum Size
md5:885d33021413e2af75d8c06745f752a8 4.9 MB
md5:899496d62aae8f49e6e2e74b3f29b39e 377.8 kB
md5:1fd267390df9f9f15b53dfe8e57ec024 42.9 MB
md5:665019a9148324b5bed7db7f7ca70a0c 376.4 kB
md5:2508a7720159e6db3067f971832345d2 38.0 MB
md5:2681aeafff0ad9ad99edd11559caaa9d 375.2 kB
md5:147874fd202ab12838bd3971b4bce087 30.3 MB
md5:96e977241862f680c5dd1a225b1a5b84 371.1 kB
md5:f6d035006a24e2432f13f596bb299b1f 21.3 MB
md5:170785e9a6e3fe26e9cdbe8e13dd6fe8 371.8 kB
md5:4e45e2f43266bfb2bc3cdf5f31b09fc2 12.4 MB
md5:097aca7720f3b1d873dda95948690fa0 367.0 kB
md5:31a5482845f543df7c8770bd72edc113 9.6 MB
md5:b4b8129feb895251e2ad71730d0eed34 373.9 kB
md5:3010b1af2d2761eaadfdfb3a4e8b1bff 7.8 MB
md5:eae08e4181e9484a5a744ffa04c63c2c 355.0 kB
md5:3ffeac2e7f313e48e253f22e5fafe5ac 3.1 MB
md5:dc5ba0ef194676064e725e9a89b033b4 349.3 kB
md5:627e6623c7009b4498a840d2cd11929e 2.5 MB
md5:7d8bc3fe4338f24d2ff230ac724b61a8 109.6 MB
md5:c6a5a49350dc5aae6fccee15cc8bf9f3 330.4 MB
md5:faf1ae9c71ca12b79fe15f7bec24dd5f 70.7 MB
md5:3a0a537fc98406ca4a3321a28d5fa050 12.9 MB
md5:d9eb30ea4df60962e6c1e85bb323f4b2 51.3 MB
md5:7a4920210b2354e708ed05b82039a4e9 11.1 MB
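
The md5 values listed above can be used to check a downloaded file for corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `matches_listing` are hypothetical, and because file names are not shown in this listing, the path passed in has to be supplied by the reader.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file's digest with a listed value such as 'md5:885d...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(downloaded_path, "md5:885d33021413e2af75d8c06745f752a8")` returns `True` only when the file's digest equals the listed one.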

Additional details

Related works

Is supplement to
Preprint: 10.1101/2023.03.27.534365 (DOI)

References

  • Engqvist, Martin Karl Magnus. (2018). Growth temperatures for 21,498 microorganisms (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.1175609
  • Engqvist, Martin Karl Magnus. (2018). Correlating enzyme annotations with a large set of microbial growth temperatures reveals metabolic adaptations to growth at diverse temperatures. BMC Microbiology, 18, 1-14. https://doi.org/10.1186/s12866-018-1320-7