Dataset for "Computational structural genomics unravels common folds and predicted functions in the secretome of fungal phytopathogen Magnaporthe oryzae"
Description
Datasets for:
"Computational structural genomics unravels common folds and predicted functions in the secretome of fungal phytopathogen Magnaporthe oryzae" https://apsjournals.apsnet.org/doi/abs/10.1094/MPMI-03-21-0071-R
This version of the dataset contains structures predicted by AlphaFold (https://github.com/deepmind/alphafold). The datasets produced with TrRosetta and I-TASSER are available in the previous version of this dataset. AlphaFold was run with "--preset=full_dbs" against all of the databases it requires. The template structures were downloaded around July 20th, and all templates were allowed for modeling. The only modification to the databases was that ~1,650 fungal genome annotations from the Joint Genome Institute (JGI) were appended to the UniRef90 database.
In total, five PDB structures were generated per protein sequence: four with the CASP14 models (model_1, model_3, model_4 and model_5) and one with model_2_ptm, which additionally provides the predicted TM-score (pTM).
The datasets included are:
1) Best_models.tar.gz: This compressed archive contains the best of the five models (ranked_0.pdb) for every secreted protein.
2) Best_models_pkl.tar.gz: This compressed archive contains the result_model_<>.pkl files for the best models. These pickle files store additional information about the predicted structures; for details, see AlphaFold's GitHub page.
3) Network.tar.gz: This compressed archive contains the sequence-based homology and structure-based analogy search results, the filtered multiple sequence alignments for each secreted protein, and the structural similarity search results against the databases.
4) Magnaporthe_Oryza_Structure_prediction_and_clustering_metadata.zenodo.xlsx: This file, similar to Table S5, contains metadata about the secreted proteins and their assignment to clusters based on sequence and structural similarity. Only AlphaFold models were used for structure-based clustering, with the same clustering criteria as for the TrRosetta models.
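As a rough illustration of working with the pickle archive, the sketch below streams a .tar.gz of AlphaFold result pickles and reports the mean pLDDT of each model without extracting the archive to disk. It assumes each pickle holds a dict with a per-residue "plddt" array, as AlphaFold's result pickles do; the function name mean_plddt_per_model is hypothetical, and NumPy must be installed to unpickle the real files.

```python
import pickle
import tarfile
from statistics import fmean


def mean_plddt_per_model(archive_path):
    """Return {member name: mean pLDDT} for every .pkl file in a .tar.gz archive.

    Hypothetical helper (not part of AlphaFold). Assumes each pickle is a dict
    with a per-residue "plddt" array, as in AlphaFold result pickles.
    """
    scores = {}
    # "r|gz" reads the archive as a forward-only stream, so nothing is
    # written to disk and only one pickle is held in memory at a time.
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            # Skip directories and any non-pickle files in the archive.
            if not (member.isfile() and member.name.endswith(".pkl")):
                continue
            with tar.extractfile(member) as fh:
                # Only unpickle files from a source you trust.
                result = pickle.load(fh)
            scores[member.name] = fmean(float(x) for x in result["plddt"])
    return scores
```

For example, mean_plddt_per_model("Best_models_pkl.tar.gz") would return one score per pickle. Streaming mode is used because random access is not needed here and the pickle archive may be very large.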
If you have questions about the outputs or need additional data, please email us (s.kyungyong@berkeley.edu and kseniak@berkeley.edu).
Files
Total size: 77.4 GB

MD5 checksum | Size
---|---
8393ca4c7543c9854c0b18ea754eee14 | 137.9 MB
19827e18544a4e757e4bb8866f77d120 | 75.2 GB
b3de2052f3c8423e3f3dcc926929f072 | 747.4 kB
2c8cb879f7c123ab3c15ff7bc7a9cb32 | 2.1 GB
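Before extracting the archives, it may be worth checking each download against the MD5 checksums listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names md5sum and matches_checksum are hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat even for multi-gigabyte archives.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5sum(path) == expected_md5.strip().lower()
```

For example, matches_checksum("downloaded_file.tar.gz", "8393ca4c7543c9854c0b18ea754eee14") checks one download, where the filename is a placeholder for whichever file that checksum belongs to. On Linux, the md5sum command-line tool gives the same digest.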