ecoli general annotation report

About

This report summarises the results of the most generic annotation steps: Prokka, Barrnap, mlst, KofamScan and refseq_masher. If you’d like to see any other result included in this report, please open an enhancement issue on GitHub.

RefSeq Masher

RefSeq Masher is a tool that rapidly finds which NCBI RefSeq genomes match, or are contained within, your sequence data, using Mash MinHash with a Mash sketch database of NCBI RefSeq genomes. The results are shown below (bacannot outputs only the top 10).
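The MinHash idea behind this kind of search can be illustrated with a minimal pure-Python sketch. It is not the refseq_masher or Mash implementation: the function names, the SHA-1 hash and the small `k` and `size` values are assumptions chosen for readability (Mash itself uses MurmurHash3 on canonical k-mers with much larger defaults). Each sequence is reduced to its smallest k-mer hashes, and the Jaccard similarity is estimated from those reduced sets alone.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def bottom_sketch(seq, k=5, size=50):
    """Return the `size` smallest distinct k-mer hash values, sorted."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=50):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    # Keep the `size` smallest hashes of the union, then count how many
    # of them are present in both sketches.
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


# Toy sequences, purely illustrative.
same = jaccard_estimate(bottom_sketch("ACGTACGTTGCA"), bottom_sketch("ACGTACGTTGCA"))
disjoint = jaccard_estimate(bottom_sketch("A" * 20), bottom_sketch("C" * 20))
```

Identical sequences give an estimate of 1.0 and sequences with no shared k-mers give 0.0. Real genomes fall in between, and references can be ranked by that estimate.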

MLST

Bacannot uses the mlst package to scan the genome against the available PubMLST schemes and assign it a multilocus sequence type. The results for ecoli are shown below.

Prokka

Prokka is a generic prokaryotic genome annotation tool that produces standards-compliant output files.

In bacannot, the Prokka database is extended with either the TIGRFAM HMMs hosted at NCBI or, when the parameter --prokka_use_pgap is used, the more extensive PGAP HMM database, also hosted at NCBI.

Barrnap

Barrnap is a fast ribosomal RNA predictor for bacteria, from the same developer as Prokka. It produces a GFF file of the predicted rRNAs (see below).
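For readers who want to post-process the GFF file, the sketch below shows one way to read a GFF3 feature line in Python. The helper name `parse_gff_line` and the example line (contig name, coordinates, score and attributes) are made up for illustration and are not taken from this report.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into a dict; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    # GFF3 has nine tab-separated columns.
    seqid, source, ftype, start, end, score, strand, phase, attrs = line.split("\t")
    # The ninth column holds key=value pairs separated by semicolons.
    attributes = dict(item.split("=", 1) for item in attrs.split(";") if "=" in item)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Hypothetical line in the general shape of an rRNA prediction.
example = "contig_1\tbarrnap\trRNA\t100\t1640\t0\t+\t.\tName=16S_rRNA;product=16S ribosomal RNA"
feature = parse_gff_line(example)
length = feature["end"] - feature["start"] + 1  # GFF coordinates are 1-based and inclusive
```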

KEGG KOs

KEGG KOs are annotated with KofamScan, a gene function annotation tool based on KEGG Orthology and hidden Markov models. It requires the KOfam database. An online version is available at https://www.genome.jp/tools/kofamkoala/.

After annotation, the results are plotted with KEGGDecoder (see below).

Figure 1: KEGGDecoder heatmap of KofamScan annotation results.

Sourmash

Sourmash is a command-line tool and Python/Rust library for metagenome analysis and genome comparison using k-mers. It supports the compositional analysis of metagenomes, rapid search of large sequence databases, and flexible taxonomic profiling with both NCBI and GTDB taxonomies (see the sourmash prepared databases for more information). Sourmash works well with sequences of 30 kb or larger, including bacterial and viral genomes.

In Bacannot, sourmash was used to compare genomes and plot a dendrogram of all the genomes given as input plus the 10 genomes identified as closest to each of them in the refseq_masher results.

Duplicate genomes were removed, since the same reference genome can be among the closest to more than one input.
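The way the comparison set is assembled can be sketched in a few lines of Python. This is an illustration of the logic described above, not bacannot's own code; the function name `build_comparison_set` and the genome identifiers are hypothetical.

```python
def build_comparison_set(inputs, closest_by_input, top_n=10):
    """Return the input genomes followed by each input's top_n closest references, without duplicates."""
    candidates = list(inputs)
    for genome in inputs:
        # Keep only the first top_n references listed for this input.
        candidates.extend(closest_by_input.get(genome, [])[:top_n])
    # dict.fromkeys keeps the first occurrence of each id and preserves order.
    return list(dict.fromkeys(candidates))


# Hypothetical identifiers: ref_2 is close to both inputs but is kept once.
inputs = ["sample_A", "sample_B"]
closest = {
    "sample_A": ["ref_1", "ref_2"],
    "sample_B": ["ref_2", "ref_3"],
}
genomes = build_comparison_set(inputs, closest)
# genomes == ["sample_A", "sample_B", "ref_1", "ref_2", "ref_3"]
```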

The sourmash genome comparison results and the compositional data of each sample are given as output, so that users can build customised sourmash plots as described in the sourmash documentation.


Figure 2: Sourmash genome comparison