Phold INPHARED 1419 and Benchmarking Datasets Protein Structures, Genomes and Annotations
Authors/Creators
Description
* This tarball contains all ColabFold structures, genomes, and pharokka and phold (with ColabFold structures) annotations for the INPHARED 1419, Tara, Cook and Crass datasets analysed and/or used for benchmarking Phold.
* We anticipate the INPHARED 1419 virus protein structures may be especially useful (see the supplementary information for details on the viruses we generated proteomes for). Please contact george.bouras@adelaide.edu.au if you have any issues or questions.
* All 1419 INPHARED viruses from unique genera have their structures in separate subdirectories within the top-level `inphared_structures.tar.gz` tarball, named by RefSeq accession. For example, all structures for Gordonia phage Forza NC_070763 are in the `NC_070763` subdirectory. These proteins are from the pharokka/pyrodigal-gv gene calls (not the NCBI GenBank CDS).
* All 1419 INPHARED viruses also have separate `.gbk` GenBank Phold annotation files within the `separate_gbks` subdirectory of the tarball `all_final_inphared_unique_genus_phold_compare_structures.tar.gz`.
* Phynteny (https://github.com/susiegriggo/Phynteny_transformer) annotations were added in v2 of this record for each of the four benchmarking datasets (INPHARED 1419, Cook, Crass, Tara). These are in `phynteny_annotations.tar.gz`. The Phold (with ColabFold structures) GenBank files were used as input for Phynteny.
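The per-accession layout described above can be explored without unpacking the whole archive. The following is a minimal Python sketch, not part of the record itself. It assumes that structure files sit somewhere beneath a directory named exactly after the RefSeq accession (as in the `NC_070763` example) and that accessions match a two-letter prefix, an underscore and digits. The function name `group_members_by_accession` and the regular expression are illustrative choices, and the exact nesting and file extensions inside the tarball are not confirmed here.

```python
import re
import tarfile
from collections import defaultdict

# Assumed accession pattern, e.g. NC_070763 (two capital letters, underscore, digits).
ACCESSION_RE = re.compile(r"^[A-Z]{2}_\d+$")


def group_members_by_accession(tar):
    """Map each accession-named directory to the regular files stored beneath it."""
    groups = defaultdict(list)
    # Iterating the TarFile streams through the archive one member at a time.
    for member in tar:
        if not member.isfile():
            continue
        directories = member.name.split("/")[:-1]
        # Use the deepest directory component that looks like an accession.
        for part in reversed(directories):
            if ACCESSION_RE.match(part):
                groups[part].append(member.name)
                break
    return dict(groups)
```

Possible usage, assuming `inphared_structures.tar.gz` has already been extracted from the record's download into the working directory:

```python
with tarfile.open("inphared_structures.tar.gz", "r:gz") as tar:
    groups = group_members_by_accession(tar)
print(len(groups))                      # number of accession subdirectories found
print(groups.get("NC_070763", [])[:5])  # first few files for Gordonia phage Forza
```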
Files
(6.5 GB)
| Name | Size |
|---|---|
| md5:44e2d0d3952849d8d7096c093d8c7a51 | 6.3 GB |
| md5:f6fde2327cc32cea5320ff77e0c83c5a | 137.9 MB |