Mobile genetic elements (MGEs) are genetic material that can move within a genome or be transferred from one species or replicon to another. Genes newly acquired through this mechanism can increase fitness by providing new or additional functions. On the other hand, MGEs can also decrease fitness by introducing disease-causing alleles or mutations. For instance, prophages are bacteriophages that have been inserted and integrated into the bacterial chromosome or a plasmid; they are the latent form of a phage. ICEs (integrative and conjugative elements), in turn, are integrative mobile genetic elements that encode their own conjugation machinery. They can confer selective advantages and can also encode resistance determinants and virulence factors.
In this context, this pipeline can automatically annotate some mobile genetic elements using publicly available resources such as:
BLASTp
All predictions were filtered with user-defined thresholds for minimum coverage and identity:
Minimum coverage: > 85
Minimum identity: > 85
PHAST is a protein database scanned via BLASTp. ICEberg is a protein and nucleotide database that contains the full-length sequences of known ICEs as well as the sequences of many proteins commonly found inside these ICEs; full-length ICEs are aligned to the genome via BLASTn, while the protein sequences are aligned to the predicted genes via BLASTp. Plasmidfinder is a nucleotide database scanned via BLASTn. The other tools apply their own metrics.
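The thresholding described above can be illustrated with a minimal sketch. It assumes the BLAST results are in tabular form with the columns qseqid, sseqid, pident and qcovs (for example, produced with -outfmt "6 qseqid sseqid pident qcovs"); this layout and the function name filter_hits are assumptions made for illustration and are not necessarily what the pipeline uses internally.

```python
from typing import Dict, Iterable, List

# Assumed (hypothetical) column layout of the tabular BLAST output.
COLUMNS = ("qseqid", "sseqid", "pident", "qcovs")


def filter_hits(lines: Iterable[str],
                min_identity: float = 85.0,
                min_coverage: float = 85.0) -> List[Dict[str, object]]:
    """Keep hits whose identity AND coverage are strictly above the thresholds."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            continue  # skip malformed rows
        pident = float(fields[2])
        qcovs = float(fields[3])
        if pident > min_identity and qcovs > min_coverage:
            kept.append({
                "qseqid": fields[0],
                "sseqid": fields[1],
                "pident": pident,
                "qcovs": qcovs,
            })
    return kept


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        "geneA\tphage1\t97.5\t99",
        "geneB\tphage2\t80.0\t100",  # identity too low
        "geneC\tphage3\t90.0\t85",   # coverage not strictly above 85
    ]
    for hit in filter_hits(rows):
        print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["qcovs"])
```

The comparison is strict (greater than), mirroring the "> 85" thresholds listed in this report.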
Genomic Islands (GIs) were predicted with islandPath. The predicted genomic islands are integrated into the JBrowse genome viewer so that users can interactively interrogate the results and check the genes found inside these islands. The resulting genome browser is provided in the jbrowse directory inside the main query output directory. It can be opened with the http-server command or the JBrowse Desktop software.
Additionally, these genomic islands were parsed in a very generic manner to provide a simple visualization of the annotation in these regions. The plots were rendered with the Python package gff-toolbox and are available in the genomic_islands/plots directory inside the main query output directory. An example of these plots is shown in Figure 1.
Figure 1: Example visualization of genomic island regions with the gff-toolbox package.
As discussed, these images were rendered in a very generic manner, only to show some of the visualization possibilities. If desired, users can use the gff-toolbox package to produce more customized plots.
Plasmidfinder is a tool for the in silico detection of plasmids. Its results are summarized in Table 1 and are available in plasmids/plasmidfinder under the main output directory.
Platon detects plasmid contigs within bacterial draft genomes from WGS short-read assemblies. To do so, it analyzes the natural distribution biases of certain protein-coding genes between chromosomes and plasmids. This analysis is complemented by comprehensive contig characterizations upon which several heuristics are applied. Its results are summarized in Table 2.
The full results are available in plasmids/platon under the main output directory.
MOB-typer provides in silico predictions of the replicon family, relaxase type, mate-pair formation type and predicted transferability of the plasmid. Using a combination of biomarkers and MOB-cluster codes, it also provides an observed host range of the plasmid based on its replicon, relaxase and cluster assignment. This is combined with information mined from the literature to predict the taxonomic rank at which the plasmid is likely to be stably maintained, but it does not provide source attribution predictions.
Its results are available in plasmids/mob_suite under the main output directory.
All prophage sequences and genes are available in the provided genome browser; it is worth taking note of the prophage genomic regions to explore them more easily there. The genome browser was created automatically (stored in a directory called jbrowse) and can be visualized with JBrowse Desktop or http-server.
Phigaro is a standalone command-line application that is able to detect prophage regions taking raw genome and metagenome assemblies as an input. It also produces dynamic annotated “prophage genome maps” and marks possible transposon insertion spots inside prophages. Its results can be nicely visualized in its own html report file stored in its output directory. The genomic regions predicted as putative prophage sequences are also summarized in Table 4.
These results are available in prophages/phigaro in the main output directory. The HTML report is 01_ANNOTATION/prophages/phigaro/ecoli_phigaro.html.
PhiSpy is a standalone tool that identifies prophages in Bacterial (and probably Archaeal) genomes. Given an annotated genome it will use several approaches to identify the most likely prophage regions. The genomic regions predicted as putative prophage sequences are also summarized in Table 5.
These results are available in prophages/phispy in the main output directory.
All prophage genes from the PHAST database that had good alignments to the genes of the query genome are summarized in Table 6. The protein sequences of these genes were aligned against the gene sequences predicted by Prokka via BLASTp. They are all available in the provided genome browser. A good way to interrogate this annotation is to visualize the putative prophage regions predicted by Phigaro and PhiSpy and cross-reference them with the prophage gene annotation obtained from the PHAST database.
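Besides inspecting the genome browser, the same cross-referencing can be sketched programmatically as a simple interval overlap. The example below is only an illustration: the Region type, the function names and the coordinates are hypothetical, and users would need to load the actual region and gene coordinates from the pipeline outputs themselves.

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    """A named interval on a contig (1-based, inclusive coordinates)."""
    contig: str
    start: int
    end: int
    name: str


def overlaps(a: Region, b: Region) -> bool:
    """True when both intervals are on the same contig and share at least one base."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def genes_in_regions(regions: List[Region], genes: List[Region]) -> Dict[str, List[str]]:
    """Map each predicted region name to the genes that overlap it."""
    result: Dict[str, List[str]] = {region.name: [] for region in regions}
    for region in regions:
        for gene in genes:
            if overlaps(region, gene):
                result[region.name].append(gene.name)
    return result


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    regions = [
        Region("contig_1", 1000, 5000, "phigaro_1"),
        Region("contig_1", 4000, 9000, "phispy_1"),
    ]
    genes = [
        Region("contig_1", 1200, 1800, "geneA"),
        Region("contig_1", 4500, 4900, "geneB"),
        Region("contig_2", 1200, 1800, "geneC"),
    ]
    for name, hits in genes_in_regions(regions, genes).items():
        print(name, hits)
```

Regions supported by both predictors and containing several genes with prophage database hits are reasonable candidates to inspect first.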
Unfortunately, the PHASTER database has no searchable interface for visualizing its prophages. Therefore, this table has no links to external sources.
Full-length ICEs are available in the ICEberg database as nucleotide FASTA files, while the proteins found inside these ICEs are available as protein FASTA files. Because the ICEfinder script is not licensed for incorporation into the pipeline, we search for the full-length ICEs directly. However, complete ICEs are very difficult to find in new genomes, so they are scanned without coverage or identity thresholds. The filtering and selection of these alignments is left to the user. A total of 35 alignments were found in the query genome; they are listed in Table 7.
Users are advised to also use the ICEfinder tool to predict the putative genomic position of known ICEs since we are not allowed to include this step under this pipeline.
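Because the full-length ICE alignments are reported without thresholds, one possible way to prioritize them is to estimate how much of each ICE is covered once overlapping alignments are merged. The sketch below illustrates this idea only; the function name, the input format (pairs of start and end coordinates on the ICE sequence) and the example numbers are assumptions, not part of the pipeline.

```python
from typing import Iterable, Tuple


def merged_coverage(intervals: Iterable[Tuple[int, int]], ice_length: int) -> float:
    """Fraction of an ICE covered by alignments after merging overlapping ones.

    Coordinates are 1-based and inclusive; reverse-strand alignments
    (start > end) are normalized before merging.
    """
    if ice_length <= 0:
        raise ValueError("ice_length must be positive")
    spans = sorted((min(s, e), max(s, e)) for s, e in intervals)
    covered = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent alignment: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / ice_length


if __name__ == "__main__":
    # Made-up alignments on a hypothetical 1000 bp ICE.
    alignments = [(1, 100), (50, 150), (400, 301)]
    print(f"{merged_coverage(alignments, 1000):.2%}")
```

ICEs with a high merged coverage are more plausible candidates for being present as a whole, whereas scattered short alignments more likely reflect shared genes.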
All query genes predicted by Prokka that have a match in the ICEberg database are shown in Table 8. The table summarizes each ICE id and all of its genes that were found in the query genome. All of them are linked to the database for further investigation.
Take note: The fact that the genome possesses some proteins from ICEs does not necessarily mean that the ICE is present in the genome. Please check the number of proteins that the ICE of origin possesses in the ICEberg database list of ICEs, and then make inferences based on the alignments you see.
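The check suggested above can be sketched as a small calculation: for each ICE, compare the number of distinct proteins matched in the query genome with the total number of proteins that ICE has in ICEberg. In this sketch the totals must be supplied by the user after looking them up in the database; the function name, identifiers and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def ice_protein_support(
    hits: Iterable[Tuple[str, str]],
    ice_protein_totals: Dict[str, int],
) -> Dict[str, Tuple[int, Optional[int], Optional[float]]]:
    """Summarize, per ICE, how many of its proteins were found in the query.

    hits: (ice_id, protein_id) pairs taken from the protein alignments.
    ice_protein_totals: user-supplied total number of proteins per ICE.
    Returns ice_id -> (proteins found, total proteins, fraction found).
    """
    found = defaultdict(set)
    for ice_id, protein_id in hits:
        found[ice_id].add(protein_id)  # count each protein only once

    summary = {}
    for ice_id, proteins in found.items():
        total = ice_protein_totals.get(ice_id)
        fraction = len(proteins) / total if total else None
        summary[ice_id] = (len(proteins), total, fraction)
    return summary


if __name__ == "__main__":
    # Made-up identifiers and totals, for illustration only.
    hits = [("ICE_1", "p1"), ("ICE_1", "p2"), ("ICE_1", "p1"), ("ICE_2", "p9")]
    totals = {"ICE_1": 40, "ICE_2": 2}
    for ice_id, (n_found, total, fraction) in ice_protein_support(hits, totals).items():
        print(ice_id, n_found, total, fraction)
```

A low fraction suggests that only a few shared proteins were matched, which on its own is weak evidence that the ICE is present.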
Figure 2: The number of genes from known ICEs (from ICEberg) found in the query genome
Insertion sequences were predicted with digIS. The digIS search pipeline operates in the following steps:
The program is executed with the GenBank annotation
No integrons were predicted with Integron Finder. This may be because the genome truly lacks integron sequences or because of misassemblies. You can always try to run the online version of the tool: https://integronfinder.readthedocs.io/en/latest/user_guide/webserver.html