Published March 26, 2025 | Version v3
Dataset | Open access
PRESCOTT/ESCOTT/iGEMME mutational effect predictions of all single point mutations for ~3000 proteins
Authors/Creators
Description
This dataset contains all the data needed to reproduce our analyses on ~3000 proteins. It consists of four compressed archives:
- All ColabFold MSAs and structures for the ~3000 proteins: colabfold-sequences-structures-3000-proteins.tar.bz2
- All ESCOTT predictions: escott-v-1-6-0-max-two-components-colabfold-msas-entire-single-point-mutations-cvRC7.tgz
- All iGEMME predictions: escott-v-1-6-0-tjet-only-colabfold-msas-entire-single-point-mutations.tgz
- All gnomAD v4.0.0 CSV files used for the PRESCOTT predictions: gnomadv4-0-0-csv-files.tgz
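The archives use two compression formats, .tar.bz2 (bzip2) and .tgz (gzip). A minimal sketch with Python's standard-library `tarfile` module, which detects either compression when opened in `"r:*"` mode, is shown below. The helper name `extract_archive` and any output directory are illustrative assumptions; the layout of files inside each archive is not documented here.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.bz2 or .tgz archive into dest and return its member names.

    Hypothetical helper for illustration; not part of the dataset.
    """
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile pick gzip or bzip2 decompression from the file itself.
    with tarfile.open(archive, mode="r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: the "data" filter rejects absolute paths,
            # members escaping dest via "..", and other unsafe entries.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, `extract_archive(Path("gnomadv4-0-0-csv-files.tgz"), Path("gnomad_csv"))` would unpack the smallest archive. Calling `getnames()` first reads through the whole archive, which can take a while for the multi-GB bzip2 file.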
The list of the 500 proteins used to determine the PRESCOTT coefficients is given in TRAINING-all_gene_names_v4_set1_no_acmg.txt.
The list of the 1883 proteins used for testing is given in TESTING-all_gene_names_v4_set1_with_acmg.txt.
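The two gene lists can be loaded and compared with a short Python sketch. It assumes each file holds one gene name per line, which the description does not state; the function names `read_gene_list` and `check_split` are hypothetical.

```python
from pathlib import Path


def read_gene_list(path: Path) -> list[str]:
    """Read gene names, ASSUMING one name per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def check_split(train_path: Path, test_path: Path) -> dict[str, int]:
    """Count the genes in each list and how many names appear in both."""
    train = read_gene_list(train_path)
    test = read_gene_list(test_path)
    return {
        "train": len(train),
        "test": len(test),
        "overlap": len(set(train) & set(test)),
    }
```

Calling `check_split(Path("TRAINING-all_gene_names_v4_set1_no_acmg.txt"), Path("TESTING-all_gene_names_v4_set1_with_acmg.txt"))` should report 500 and 1883 entries if the one-name-per-line assumption holds. The description does not say whether the two sets are disjoint; the `overlap` count lets a reader check.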
Files (40.5 GB)
| Size | MD5 checksum |
|---|---|
| 16.7 GB | e926667f5c6c56d710459729cc11be1d |
| 11.4 GB | fdbda059fa886a9f9bab511782bfd8d0 |
| 12.0 GB | cea2e12a561ebfe4ad7ba299a68b80ce |
| 490.5 MB | dad9725c4a82697733189f8c77ef0636 |
| 21.6 kB | c7cf69e68088b3f669f9d04a6f58e5df |
| 5.8 kB | 1ccd86fe91a9623e849f386f0bf0db09 |
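Each file in this record is published with an MD5 checksum. Because the file names are not shown next to the checksums on this page, the Python sketch below instead tests whether a downloaded file's digest matches any of the six listed values. The helper names `md5sum` and `is_listed` are illustrative, not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksums listed for the six files of this record.
LISTED_MD5 = {
    "e926667f5c6c56d710459729cc11be1d",
    "fdbda059fa886a9f9bab511782bfd8d0",
    "cea2e12a561ebfe4ad7ba299a68b80ce",
    "dad9725c4a82697733189f8c77ef0636",
    "c7cf69e68088b3f669f9d04a6f58e5df",
    "1ccd86fe91a9623e849f386f0bf0db09",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so multi-GB files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel keeps reading until end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path: Path) -> bool:
    """True if the file's MD5 matches one of the checksums published for this record."""
    return md5sum(path) in LISTED_MD5
```

An intact download such as `gnomadv4-0-0-csv-files.tgz` should make `is_listed` return `True`; a `False` result points to a truncated or corrupted transfer.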
Additional details
Related works
- Is published in (Publication): 10.1186/s13059-025-03581-y (DOI)