Published November 11, 2025 | Version v2
Dataset Open

Phold INPHARED 1419 and Benchmarking Datasets Protein Structures, Genomes and Annotations

Authors/Creators

Description

* This tarball contains all ColabFold structures, genomes, and pharokka and Phold annotations (Phold run with the ColabFold structures) for the INPHARED 1419, Tara, Cook and Crass datasets analysed and/or used for benchmarking Phold.

* We anticipate that the INPHARED 1419 virus protein structures may be especially useful (see the supplementary information for details on the viruses we generated proteomes for). Please contact george.bouras@adelaide.edu.au if you have any issues or questions.

* All 1419 INPHARED viruses from unique genera have their structures in separate subdirectories within the top-level `inphared_structures.tar.gz` tarball, named by RefSeq accession, e.g. all structures for Gordonia phage Forza NC_070763 are in the `NC_070763` subdirectory. These proteins are from the pharokka/pyrodigal-gv gene calls (not the NCBI GenBank CDS).

* All 1419 INPHARED viruses also have separate `.gbk` GenBank Phold annotation files within the `separate_gbks` subdirectory of the tarball `all_final_inphared_unique_genus_phold_compare_structures.tar.gz`.

* Phynteny (https://github.com/susiegriggo/Phynteny_transformer) annotations were added in v2 of this record for each of the four benchmarking datasets (INPHARED 1419, Cook, Crass, Tara). These are in `phynteny_annotations.tar.gz`. The GenBank files produced by Phold with ColabFold structures were used as input for Phynteny.
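Because the structures are grouped into one subdirectory per RefSeq accession, a single virus can be pulled out of `inphared_structures.tar.gz` without unpacking the whole archive. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the exact directory layout inside the tarball is an assumption: the code simply matches any file whose parent directories include the accession.

```python
import tarfile
from pathlib import PurePosixPath


def members_for_accession(tar_path, accession):
    """Return names of regular files stored under a directory named `accession`."""
    selected = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            # parts[:-1] are the directories that contain the file.
            parent_dirs = PurePosixPath(member.name).parts[:-1]
            if member.isfile() and accession in parent_dirs:
                selected.append(member.name)
    return selected


def extract_accession(tar_path, accession, dest):
    """Extract only the files for one accession into `dest`."""
    wanted = set(members_for_accession(tar_path, accession))
    with tarfile.open(tar_path, "r:*") as tar:
        members = [m for m in tar if m.name in wanted]
        tar.extractall(path=dest, members=members)
    return sorted(wanted)
```

For example, `extract_accession("inphared_structures.tar.gz", "NC_070763", "forza_structures")` would write only the Gordonia phage Forza files into a (hypothetical) `forza_structures` directory. Reading a multi-gigabyte gzip archive still requires a sequential scan, so listing members may take some time.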

Files

Files (6.5 GB)

Checksum Size
md5:44e2d0d3952849d8d7096c093d8c7a51 6.3 GB
md5:f6fde2327cc32cea5320ff77e0c83c5a 137.9 MB
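
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below shows one way to do this in Python; the function names are hypothetical, and because the listing does not show which file name belongs to which checksum, the path passed in is whatever file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_record(downloaded_path, "md5:44e2d0d3952849d8d7096c093d8c7a51")` returns `True` only if the file on disk hashes to the listed value. Reading in chunks keeps memory use low for the multi-gigabyte tarball.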