There is a newer version of the record available.

Published August 6, 2025 | Version v1.0.0
Software Open

gbouras13/phold: v1.0.0

  • 1. Flinders University
  • 2. Institut Pasteur

Description

Major Phold release to go with the preprint. For more details, see the preprint and updated documentation.

Major Changes

  • The Phold search database has been modified, filtered and curated to contain 1,363,704 protein structures with functional labels (see https://zenodo.org/records/16741548). In particular, since the previous release of Phold, the enVhogs were re-clustered and re-labelled by the authors of that work. This release contains the updated enVhog structures.
  • We additionally make available a larger database containing 3,166,602 structures (i.e. the Phold search database plus an extra 1.8M efam and enVhog proteins without PHROG assignment or functional label), which can be downloaded using phold install --extended_db. Using this database yields marginally fewer functional annotations and takes longer than using the default Phold search database, so it is not recommended for functional annotation. However, it finds more hits overall (i.e. including hits to proteins of unknown function), so it may be of interest for viral identification tasks.
  • PHROG functional labels have been updated for 2,798 PHROGs using manual curation informed by structural similarity searches. See the preprint for more details. The updated annotations are available in the Phold database under phold_annots.tsv.
  • The Phold search database is no longer pre-clustered, as pre-clustering was shown not to significantly change sensitivity or runtime compared with the unclustered updated database.
  • Phold supports Foldseek-GPU acceleration for NVIDIA GPUs using foldseek_gpu. Note that it is still ideal to run Phold with multiple CPU-threads (e.g. -t 8 or however many threads you have available), as GPU acceleration only accelerates and improves the prefilter of Foldseek.
  • Phold supports custom user-specified Foldseek databases with --custom_db.
  • Phold adds high, medium and low confidence annotation heuristics to guide the user (especially users from wet-lab backgrounds or without much understanding of protein structural alignment metrics) as to which annotations they can trust with a very high degree of confidence, and which they should prioritise for manual curation. See the documentation for more.
  • Phold will now mask all residues with a ProstT5 confidence below 25 by default (set with --mask_threshold), as this was shown to increase annotation performance compared to no masking.
  • If you only want to annotate hypothetical proteins from Pharokka to save runtime and resource usage, you can use --hyps.
  • You can run Phold with fine-tuned ProstT5 models using --finetune (phage fine-tuned ProstT5 encoder and phage fine-tuned CNN) or --vanilla (phage fine-tuned ProstT5 encoder and vanilla PDB-based CNN). Annotation performance with these models does not differ dramatically from that of the default ProstT5 (see the preprint), but they may be of interest to some users of Phold.
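
The confidence masking described in the list above can be illustrated with a minimal sketch. This is not Phold's implementation. The function name, the "X" mask character, the 0-100 confidence scale and the one-score-per-residue input layout are all assumptions made for illustration. Only the default threshold of 25 comes from the release notes.

```python
def mask_low_confidence(sequence, confidences, threshold=25.0, mask_char="X"):
    """Replace residues whose confidence is below `threshold` with `mask_char`.

    Hypothetical helper: the mask character and the per-residue score layout
    are assumptions, not taken from Phold itself.
    """
    if len(sequence) != len(confidences):
        raise ValueError("sequence and confidences must have the same length")
    # A residue is masked only when its score is strictly below the threshold.
    return "".join(
        mask_char if score < threshold else residue
        for residue, score in zip(sequence, confidences)
    )
```

For example, mask_low_confidence("dvlq", [90, 10, 25, 24.9]) masks the second and fourth residues, because a score of exactly 25 is not below the threshold.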

Files

gbouras13/phold-v1.0.0.zip (28.0 MB)
md5:65c336510696ad536e8c0615ee698706
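
The md5 value listed above can be used to check a downloaded copy of the archive. The following is a minimal sketch using Python's standard hashlib module. The function names and the local file name are hypothetical. Only the checksum itself comes from this record.

```python
import hashlib

# Checksum as listed on this record for gbouras13/phold-v1.0.0.zip.
EXPECTED_MD5 = "65c336510696ad536e8c0615ee698706"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected`."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved locally as phold-v1.0.0.zip (a hypothetical path), matches_expected("phold-v1.0.0.zip") should return True for an intact download.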

Additional details

Related works

Is supplement to
Software: https://github.com/gbouras13/phold/tree/v1.0.0 (URL)

Software