Release date: 31 January 2026
The Pathogen–Host Interactions Database (PHI-base) is an online knowledge base that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete, and bacterial pathogens, which infect animal, plant, fungal, and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention.
Information in PHI-base is manually curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as references to the literature in which the original experiments are described. Annotations are made using terms from ontologies and controlled vocabularies, including the Gene Ontology (GO), Brenda Tissue Ontology (BTO), and the Pathogen–Host Interaction Phenotype Ontology (PHIPO).
PHI-base 5 includes information that was curated using a new curation process, described in Cuzick et al. (2023). PHI-base 5 uses a different data schema from PHI-base 4, but the majority of information from PHI-base 4 has been migrated to PHI-base 5. Information that has not yet been migrated can still be viewed in the PHI-base 4 dataset that is available on Zenodo at https://doi.org/10.5281/zenodo.5356870.
For more information about the planned transition from PHI-base 4 to PHI-base 5, see the Help and Announcements page on the PHI-base 5 website.
This version of the PHI-base 5 dataset contains the following types of information:
| Data type | Count |
|---|---|
| Genes | 10475 |
| Interactions | 33876 |
| Pathogen species | 309 |
| Host species | 237 |
| Diseases | 344 |
| References (publications) | 5273 |
| Annotations | |
| Pathogen-host interaction phenotype | 19898 |
| Gene-for-gene phenotype | 575 |
| Pathogen phenotype | 12421 |
| Host phenotype | 21 |
| GO biological process | 1540 |
| GO cellular component | 139 |
| GO molecular function | 215 |
| Post-translational modification | 8 |
| Physical interaction | 118 |
| WT RNA expression | 76 |
| WT protein expression | 2 |
phi-base_v5.3.xlsx: the PHI-base dataset as an Excel spreadsheet. This format follows the layout of the PHI-base 5 website, with sheets corresponding to the sections of gene pages on the website. This format is designed for use by non-technical users.
phi-base_v5.3.json: the PHI-base dataset in JSON format. This is modelled on the export format used by PHI-Canto, the curation tool used by PHI-base. This format is primarily intended for programmatic usage and has additional information (e.g. metadata for curation sessions) that is not included in the spreadsheet format.
phi-base.schema.json: a JSON Schema file for the JSON format of the dataset. This is included as documentation for the fields in the JSON file, but can also be used to validate the dataset.
To cite this version of the dataset (version 5.3), use the following citation:
Chang, H., Seager, J., Urban, M., & Hammond-Kosack, K. (2026). PHI-base: the Pathogen-Host Interactions Database, version 5.3 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18449986
Rights holder: Rothamsted Research
Licence: Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/)
Citation: Chang, H., Seager, J., Urban, M., & Hammond-Kosack, K. (2026). PHI-base: the Pathogen-Host Interactions Database, version 5.3 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18449986
Rothamsted Research relies on the integrity of our users to ensure that we receive suitable acknowledgment as being the originator of this dataset. This enables us to monitor the use of this dataset and to demonstrate its value. Please send us a link to any publication that uses this dataset.
The current members of the PHI-base team are the authors of this dataset.
The following professional curators provided curation for this release:
No members of the research community were recorded as providing curation or review to this version of the dataset.
Some contributors may not have consented to having their personal details shown here. For a complete list of contributors, see the record page on Zenodo for this dataset: https://doi.org/10.5281/zenodo.18449986
The following sections provide guidance on how to interpret and use the information in the dataset.
In PHI-base, a ‘metagenotype’ is a concept that represents the combined genotype of a pathogen and host during a pathogen–host interaction. The metagenotype is annotated with phenotypes that describe the outcome of the interaction and its effects on the pathogen and host, which may be closely interconnected.
The concept of a metagenotype is broadly synonymous with ‘interaction’, although we view an interaction as a dynamic process, whereas a metagenotype is more static.
PHI-base applies annotations to biological features, which include genes, genotypes, and metagenotypes. These features are displayed on the website and in the spreadsheet using concise display names. The syntax of the display names is based on the syntax used by the PHI-Canto curation tool.
The display name for a gene is the gene name provided by UniProtKB, where available. The spreadsheet typically uses the UniProtKB accession number instead of the gene name to reduce ambiguity.
Genotype names consist of a list of alleles, where each allele is followed by the species name, then the strain name in parentheses. Each allele contains: an allele name, which may immediately be followed by a description of the allele in parentheses; the allele type, in parentheses; and changes in expression level, if applicable, in square brackets.
The following template illustrates the syntax:
allele_name(description) (allele_type)[expression] species_name (strain)
An example of a deletion allele, where the expression level does not apply:
lsrKΔ (deletion) Escherichia coli (APEC94)
An example of an allele with a description:
AvrSen1deltaSP(1-30) (partial_amino_acid_deletion)[Not assayed] Synchytrium endobioticum (MB42)
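The template above can be turned into a small parser. The following is a minimal sketch in Python, assuming a display name that contains a single allele and species and strain names without parentheses. The function name `parse_allele_display_name` and the regular expression are illustrative only and are not part of PHI-base or PHI-Canto. Names that do not follow the template (for example, multi-allele genotypes or wild type genotypes) are not handled and return `None`.

```python
import re

# Hypothetical pattern for a single-allele genotype display name:
#   allele_name(description) (allele_type)[expression] species_name (strain)
ALLELE_PATTERN = re.compile(
    r"^(?P<allele_name>\S+?)"            # allele name (no spaces)
    r"(?:\((?P<description>[^)]*)\))?"   # optional description, directly after the name
    r" \((?P<allele_type>[^)]*)\)"       # allele type in parentheses
    r"(?:\[(?P<expression>[^\]]*)\])?"   # optional expression level in square brackets
    r" (?P<species>.+)"                  # species name
    r" \((?P<strain>[^)]*)\)$"           # strain name in parentheses
)


def parse_allele_display_name(name):
    """Return the parts of a display name as a dict, or None if it does not match."""
    match = ALLELE_PATTERN.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_allele_display_name("lsrKΔ (deletion) Escherichia coli (APEC94)"))
    print(parse_allele_display_name(
        "AvrSen1deltaSP(1-30) (partial_amino_acid_deletion)[Not assayed] "
        "Synchytrium endobioticum (MB42)"
    ))
```

Parts that are absent from a display name (such as the description or the expression level) are returned as `None`.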
Metagenotype names consist of two genotype names, the first being the pathogen genotype name, and the second being the host genotype name.
Shown below is an example of a metagenotype for an interaction between Trypanosoma cruzi (the pathogen) and Mus musculus (the host).
CUBΔ (deletion) Trypanosoma cruzi (Tulahuen) wild type Mus musculus (BALB/c)
PHI-base uses annotation extensions to provide additional information about the annotations made to biological features. Examples include the tissue site infected during a pathogen–host interaction, the change in infective ability of the pathogen during an interaction, and interactions used as a control case for an experiment.
Each annotation extension can be thought of as a named relation between the annotation (the domain) and a value (the range), where the value can be an ontology term, a gene, a metagenotype, and so on.
The syntax of an annotation extension is the relation name, followed by a display name for the value, followed by a unique identifier for the value in parentheses.
In the following example, ‘infects_tissue’ is the relation name, ‘leaf’ is the display name for an ontology term, and BTO:0000713 is an identifier for the ontology term.
infects_tissue leaf (BTO:0000713)
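The syntax above can be split into its three parts with a short regular expression. This is a minimal sketch in Python, assuming that the relation name contains no spaces and that the identifier is the final parenthesised group and contains no parentheses itself. The function name `parse_extension` is hypothetical.

```python
import re

# Hypothetical pattern: relation name, display name, then identifier in parentheses.
EXTENSION_PATTERN = re.compile(
    r"^(?P<relation>\S+) (?P<label>.+) \((?P<identifier>[^()]+)\)$"
)


def parse_extension(text):
    """Split an annotation extension string into relation, label and identifier."""
    match = EXTENSION_PATTERN.match(text.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_extension("infects_tissue leaf (BTO:0000713)"))
```

Because the display name is matched greedily, a display name that itself contains parentheses (such as a metagenotype name) is kept whole, and only the last parenthesised group is treated as the identifier.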
PHI-base uses ontology terms in annotation. Ontology terms have a human-readable term label in addition to a machine-readable unique identifier. For example, the label ‘pathogen host interaction phenotype’ has the identifier PHIPO:0000001.
Most of the ontologies used by PHI-base follow the principles set out by the OBO Foundry, which mandates the use of a unique ontology prefix (e.g. PHIPO, GO, BTO) followed by a zero-padded number, usually seven digits in length.
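As an illustration of this identifier format, the sketch below splits a term identifier into its prefix and number, and builds a persistent URL using the OBO Foundry convention (`http://purl.obolibrary.org/obo/PREFIX_NUMBER`). The function names are hypothetical, and the URL is only meaningful for ontologies that follow the OBO Foundry identifier policy, which may not be true of every ontology used in the dataset.

```python
import re

# Prefix, a colon, then a number (usually zero-padded to seven digits).
TERM_ID_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9_]*):(?P<local_id>\d+)$")


def split_term_id(term_id):
    """Return (prefix, local_id) for an identifier such as 'PHIPO:0000001'."""
    match = TERM_ID_PATTERN.match(term_id)
    if match is None:
        raise ValueError(f"not an ontology term identifier: {term_id!r}")
    return match.group("prefix"), match.group("local_id")


def obo_purl(term_id):
    """Build an OBO Foundry style URL (assumes the ontology follows that policy)."""
    prefix, local_id = split_term_id(term_id)
    return f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"


if __name__ == "__main__":
    print(split_term_id("PHIPO:0000001"))
    print(obo_purl("BTO:0000713"))
```

The local part is kept as a string so that the zero padding is not lost.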
Ontology term labels should not be regarded as stable or unique identifiers for an ontology term: only the ontology term identifier has these properties. Fortunately, the term identifier can be used to resolve the current term label. This can be done using lookup services, such as the Ontology Lookup Service provided by the EBI.
Note that the ontology term labels in the PHI-base spreadsheet are provided as a convenience. If you are planning to load information from PHI-base into your own application, we recommend using an ontology term lookup service to retrieve the latest term labels.
This guidance is for users of the PHI-base JSON format.
The PHI-base JSON export is organised into a collection of curation
sessions (the curation_sessions object). Each curation
session corresponds to a single publication that has been curated by
PHI-base.
Annotations and biological features can reference the unique identifier of other biological features: for example, a genotype can reference one or more alleles. To retrieve the information for the referenced biological feature, the identifier must be looked up elsewhere in the JSON file.
The hierarchy of biological features is shown below. Essentially, each type of biological feature ‘contains’ the feature below it:
Allele objects contain a gene identifier as the value of the
gene key. This identifier will match a key of the
genes object of the curation session.
Genotype objects contain allele identifiers in the loci
array. These identifiers match the keys of the alleles
object of the curation session object.
Metagenotype objects contain genotype identifiers as the values of
the pathogen_genotype and host_genotype keys.
These identifiers match the keys of the genotypes object of
the curation session object.
Annotation objects in the annotations array can contain
one of the following identifiers: a gene identifier (the
gene key), a genotype identifier (the genotype
key), or a metagenotype identifier (the metagenotype
key). The metagenotype identifiers match the keys of the
metagenotypes object of the curation session object; the
other identifiers can be resolved using the process described above.
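The lookups described above can be sketched in Python as follows. This is not an official client: the function names are hypothetical, the identifiers in the toy session are invented placeholders, and the exact shape of the entries in the loci array is an assumption (the helper accepts plain strings, objects with an `id` key, or nested arrays of either). Consult phi-base.schema.json for the actual structure.

```python
def iter_allele_ids(loci):
    """Yield allele identifiers from a 'loci' array.

    Assumption: an entry is a string, an object with an 'id' key,
    or a nested array of such entries.
    """
    for entry in loci:
        if isinstance(entry, str):
            yield entry
        elif isinstance(entry, dict):
            yield entry["id"]
        else:
            yield from iter_allele_ids(entry)


def genes_for_genotype(session, genotype_id):
    """Resolve genotype -> alleles -> gene identifiers within one session."""
    genotype = session["genotypes"][genotype_id]
    alleles = session["alleles"]
    return [alleles[a]["gene"] for a in iter_allele_ids(genotype.get("loci", []))]


def genes_for_metagenotype(session, metagenotype_id):
    """Resolve a metagenotype to the gene identifiers on each side."""
    metagenotype = session["metagenotypes"][metagenotype_id]
    return {
        "pathogen": genes_for_genotype(session, metagenotype["pathogen_genotype"]),
        "host": genes_for_genotype(session, metagenotype["host_genotype"]),
    }


def annotation_feature(annotation):
    """Return (feature_type, identifier) for the feature an annotation refers to."""
    for key in ("gene", "genotype", "metagenotype"):
        if key in annotation:
            return key, annotation[key]
    return None


# Toy curation session with invented placeholder identifiers.
session = {
    "genes": {"gene-1": {}},
    "alleles": {"allele-1": {"gene": "gene-1"}},
    "genotypes": {
        "genotype-p": {"loci": [[{"id": "allele-1"}]]},
        "genotype-h": {"loci": []},
    },
    "metagenotypes": {
        "meta-1": {"pathogen_genotype": "genotype-p", "host_genotype": "genotype-h"}
    },
    "annotations": [{"metagenotype": "meta-1"}],
}

if __name__ == "__main__":
    for annotation in session["annotations"]:
        feature_type, identifier = annotation_feature(annotation)
        if feature_type == "metagenotype":
            print(genes_for_metagenotype(session, identifier))
```

All lookups stay within a single curation session object, because the identifiers are resolved against the genes, alleles, genotypes and metagenotypes objects of that session.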
Organism objects in the organisms object are a special
case: they are identified by their NCBI Taxonomy ID (as a string type)
but are referenced in two different ways:
Gene objects reference organisms by their scientific name, in the
value of the organism key, which corresponds to the value
of the full_name key of the organism object.
Genotype objects reference organisms in the value of the
organism_taxonid key, which is the NCBI Taxonomy ID as an
integer type (instead of a string type).
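The two ways of referencing an organism can be handled as in the following Python sketch. The function names and the gene and genotype identifiers are hypothetical; only the key names (organisms, full_name, organism, organism_taxonid) are taken from the description above.

```python
def organism_for_genotype(session, genotype_id):
    """Look up the organism of a genotype via its integer NCBI Taxonomy ID."""
    taxon_id = session["genotypes"][genotype_id]["organism_taxonid"]
    # Keys of the organisms object are strings, so convert the integer first.
    key = str(taxon_id)
    return key, session["organisms"][key]


def organism_for_gene(session, gene_id):
    """Look up the organism of a gene by matching its scientific name."""
    name = session["genes"][gene_id]["organism"]
    for taxon_id, organism in session["organisms"].items():
        if organism.get("full_name") == name:
            return taxon_id, organism
    return None  # no organism with that scientific name in this session


# Toy curation session; 'gene-1' and 'genotype-1' are invented placeholders.
session = {
    "organisms": {"562": {"full_name": "Escherichia coli"}},
    "genes": {"gene-1": {"organism": "Escherichia coli"}},
    "genotypes": {"genotype-1": {"organism_taxonid": 562}},
}

if __name__ == "__main__":
    print(organism_for_genotype(session, "genotype-1"))
    print(organism_for_gene(session, "gene-1"))
```

Both functions return the taxonomy ID as a string, so the result can be used directly as a key of the organisms object.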