TemStaPro Datasets
Authors/Creators
- 1. Institute of Biotechnology, Life Sciences Center, Vilnius University; Institute of Informatics, Faculty of Mathematics and Informatics, Vilnius University
- 2. Institute of Biotechnology, Life Sciences Center, Vilnius University
- 3. CasZyme
- 4. Institute of Biotechnology, Life Sciences Center, Vilnius University; CasZyme
Description
This dataset contains the protein sequences used to train, validate, and test the binary classifiers that make up the TemStaPro program. TemStaPro predicts protein thermostability with respect to nine temperature thresholds, from 40 to 80 °C in steps of 5 °C.
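The nine thresholds, and one way a growth temperature could be turned into one binary label per classifier, can be sketched as follows. The labelling rule used here (label 1 if the organism's growth temperature is at least the threshold) is an assumption for illustration, and the names `THRESHOLDS` and `threshold_labels` are hypothetical. See the paper for the exact definition.

```python
# Sketch only: the nine TemStaPro thresholds (40-80 °C, step 5 °C) and an
# ASSUMED labelling rule: growth temperature >= threshold -> label 1.
THRESHOLDS = list(range(40, 81, 5))  # [40, 45, ..., 80], nine values


def threshold_labels(growth_temp: float) -> dict[int, int]:
    """Return a 0/1 label for each threshold (hypothetical helper).

    Label 1 means the organism's growth temperature is at least the
    threshold; whether the original work uses >= or > is an assumption.
    """
    return {t: int(growth_temp >= t) for t in THRESHOLDS}


if __name__ == "__main__":
    print(len(THRESHOLDS))         # 9
    print(threshold_labels(62.0))  # {40: 1, 45: 1, 50: 1, 55: 1, 60: 1, 65: 0, 70: 0, 75: 0, 80: 0}
```

Because a single growth temperature yields all nine labels at once, the same sequence can appear as a positive example for low thresholds and as a negative one for high thresholds.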
The data are given as FASTA files. Each protein sequence has a header made of three values separated by vertical bars (|): the UniParc taxonomy identifier of the organism to which the protein belongs; the UniProtKB/TrEMBL identifier of the protein sequence; and the organism's growth temperature, taken from the dataset of growth temperatures of over 21 thousand organisms (Engqvist, 2018).
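A minimal Python sketch for reading these files, assuming each header is a single line of the form `>taxon|accession|temperature` with the temperature written as a plain number. The class `Header`, the functions `parse_header` and `read_fasta`, and the example header values are hypothetical and not part of the dataset or of TemStaPro.

```python
from typing import Iterator, NamedTuple, Tuple


class Header(NamedTuple):
    taxon_id: str       # organism's taxonomy identifier (first field)
    uniprot_id: str     # UniProtKB/TrEMBL identifier (second field)
    growth_temp: float  # organism's growth temperature in °C (third field)


def parse_header(line: str) -> Header:
    """Split a header such as '>12345|A0A000ABC1|37.0' into its three fields."""
    fields = line.lstrip(">").strip().split("|")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(fields)}: {line!r}")
    taxon_id, uniprot_id, temp = fields
    return Header(taxon_id, uniprot_id, float(temp))


def read_fasta(path: str) -> Iterator[Tuple[Header, str]]:
    """Yield (Header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = parse_header(line), []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # last record in the file
```

For example, `for header, seq in read_fasta(path): ...` iterates over one file lazily, which keeps memory use low even for the larger training files.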
The TemStaPro-Major set is composed of 21 files:
- one training set
- one validation set
- one imbalanced testing set
- nine balanced testing sets, one for each of the nine binary classifiers' thresholds
- nine balanced samples of 2000 sequences, one drawn from each of the balanced testing sets
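How a balanced 2000-sequence sample could be drawn from a balanced testing set for one threshold is sketched below. The labelling rule (growth temperature at or above the threshold counts as positive) and uniform random sampling without replacement are assumptions; the actual sampling procedure is described in the paper. The function name `balanced_sample` is hypothetical.

```python
import random


def balanced_sample(records, threshold, size=2000, seed=0):
    """Return `size` records, half at/above `threshold` and half below it.

    `records` is any iterable of (growth_temp, item) pairs. The >= rule and
    uniform sampling without replacement are assumptions, not the
    documented procedure.
    """
    if size % 2:
        raise ValueError("size must be even for a 50/50 split")
    records = list(records)  # allow generators; we scan the data twice
    positives = [r for r in records if r[0] >= threshold]
    negatives = [r for r in records if r[0] < threshold]
    half = size // 2
    if len(positives) < half or len(negatives) < half:
        raise ValueError("not enough records in one class for a balanced sample")
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    sample = rng.sample(positives, half) + rng.sample(negatives, half)
    rng.shuffle(sample)  # mix the two classes
    return sample
```

Using a dedicated `random.Random(seed)` instance keeps the subset reproducible without touching the global random state.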
The TemStaPro-Minor set is composed of training, validation, and testing files, all balanced for the 65 °C temperature threshold.
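To check what "balanced for the 65 °C threshold" means for a given file, one can count how many growth temperatures fall at or above versus below 65 °C. This is a sketch: the >= convention and the tolerance parameter `max_ratio_gap` are assumptions, and `class_counts` and `is_balanced` are hypothetical helper names.

```python
from collections import Counter


def class_counts(growth_temps, threshold=65.0):
    """Count records at/above (key 1) and below (key 0) the threshold.

    The >= convention is an assumption about how labels are assigned.
    """
    return Counter(int(t >= threshold) for t in growth_temps)


def is_balanced(growth_temps, threshold=65.0, max_ratio_gap=0.0):
    """True if the class sizes differ by at most `max_ratio_gap` of the total."""
    counts = class_counts(growth_temps, threshold)
    total = counts[0] + counts[1]
    if total == 0:
        return False  # an empty set is not considered balanced
    return abs(counts[1] - counts[0]) / total <= max_ratio_gap
```

The growth temperatures can be taken from the third field of each FASTA header.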
The SupplementaryFileC2EPsPredictions.tsv file contains thermostability predictions made with the default mode of the TemStaPro program to assess the thermostability of different C2EP groups.
The detailed description is given in the corresponding paper (https://doi.org/10.1101/2023.03.27.534365).
If you use the data from this dataset, please cite both the paper and the DOI of the dataset.
Files (762.1 MB)
| Checksum | Size |
|---|---|
| md5:885d33021413e2af75d8c06745f752a8 | 4.9 MB |
| md5:899496d62aae8f49e6e2e74b3f29b39e | 377.8 kB |
| md5:1fd267390df9f9f15b53dfe8e57ec024 | 42.9 MB |
| md5:665019a9148324b5bed7db7f7ca70a0c | 376.4 kB |
| md5:2508a7720159e6db3067f971832345d2 | 38.0 MB |
| md5:2681aeafff0ad9ad99edd11559caaa9d | 375.2 kB |
| md5:147874fd202ab12838bd3971b4bce087 | 30.3 MB |
| md5:96e977241862f680c5dd1a225b1a5b84 | 371.1 kB |
| md5:f6d035006a24e2432f13f596bb299b1f | 21.3 MB |
| md5:170785e9a6e3fe26e9cdbe8e13dd6fe8 | 371.8 kB |
| md5:4e45e2f43266bfb2bc3cdf5f31b09fc2 | 12.4 MB |
| md5:097aca7720f3b1d873dda95948690fa0 | 367.0 kB |
| md5:31a5482845f543df7c8770bd72edc113 | 9.6 MB |
| md5:b4b8129feb895251e2ad71730d0eed34 | 373.9 kB |
| md5:3010b1af2d2761eaadfdfb3a4e8b1bff | 7.8 MB |
| md5:eae08e4181e9484a5a744ffa04c63c2c | 355.0 kB |
| md5:3ffeac2e7f313e48e253f22e5fafe5ac | 3.1 MB |
| md5:dc5ba0ef194676064e725e9a89b033b4 | 349.3 kB |
| md5:627e6623c7009b4498a840d2cd11929e | 2.5 MB |
| md5:7d8bc3fe4338f24d2ff230ac724b61a8 | 109.6 MB |
| md5:c6a5a49350dc5aae6fccee15cc8bf9f3 | 330.4 MB |
| md5:faf1ae9c71ca12b79fe15f7bec24dd5f | 70.7 MB |
| md5:3a0a537fc98406ca4a3321a28d5fa050 | 12.9 MB |
| md5:d9eb30ea4df60962e6c1e85bb323f4b2 | 51.3 MB |
| md5:7a4920210b2354e708ed05b82039a4e9 | 11.1 MB |
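After downloading, each file can be checked against the MD5 checksums listed in the table. A minimal sketch using Python's standard `hashlib`; the helper names `md5sum` and `verify` are hypothetical, and the file path you pass in is whatever name the download was saved under.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:885d...'."""
    expected = expected.lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]  # accept the table's 'md5:' prefix
    return md5sum(path) == expected
```

Reading in chunks keeps memory use constant, which matters for the largest file in the table (330.4 MB).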
Additional details
Related works
- Is supplement to: Preprint, 10.1101/2023.03.27.534365 (DOI)
References
- Engqvist, Martin Karl Magnus. (2018). Growth temperatures for 21,498 microorganisms (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.1175609
- Engqvist, Martin Karl Magnus. (2018). Correlating enzyme annotations with a large set of microbial growth temperatures reveals metabolic adaptations to growth at diverse temperatures. BMC Microbiology, 18, 1-14. https://doi.org/10.1186/s12866-018-1320-7