Published January 10, 2024 | Version 3
Other Open

Bioinformatics tools for basic analysis of Next Generation Sequencing data

  • 1. Istituto Superiore di Sanità
  • 1. French Agency for Food, Environmental and Occupational Health & Safety
  • 2. Istituto Superiore di Sanità
  • 3. National Institute for Public Health and the Environment
  • 4. Technical University of Denmark
  • 5. National Veterinary Institute
  • 6. Swedish Food Agency - Livsmedelsverket

Description

In the framework of the activities of the Inter-EURLs working group on Next Generation Sequencing, an inventory was carried out during summer 2018 of the bioinformatics tools in use for the analysis of next generation sequencing (NGS) data across the networks of National Reference Laboratories (NRLs). The aim was to collect information on the most commonly used tools, to be provided to the NRLs, and to identify potential areas of implementation. This inventory was used as the basis for compiling a list of tools routinely used by the majority of NRLs and by the EURLs for NGS data analysis. The list is routinely updated to include newly developed tools and to remove any tools that are no longer maintained. This document is not to be interpreted as a list of validated tools, but only as information on the tools most widely used across the networks of NRLs and EURLs, along with links for quick and easy access to them.

Brief description of analytical steps for basic NGS analysis

  • Quality check: This step performs a preliminary assessment of the overall quality of the sequences produced. All the tools performing quality check accept raw sequencing files in .fastq format as input.
  • Trimming: This step removes adapter sequences and low-quality sequences. All the tools performing trimming accept raw sequencing files in .fastq format as input.
  • Assembly: This step identifies overlapping regions among the sequencing reads included in the sequencing file (.fastq), with the aim of producing longer sequences representative of genomic regions (contigs), compiled in an output file in .fasta format. Some assemblers are specific for long single-molecule sequencing reads, such as those produced by PacBio or Oxford Nanopore platforms. Assembly pipelines can use assembly tools to optimize the assembly process. Assembly correction tools correct raw contigs generated by rapid assembly methods by comparing them with consensus sequences generated through read alignment.
  • Seven-gene Multi Locus Sequence Typing (MLST): This step is used to type bacterial strains according to established schemes of allelic sequences of housekeeping genes.
  • Virulotyping: The identification of the presence of virulence genes in the sequencing files, through comparison with precompiled databases of sequences of virulence genes.
  • Serotype identification: Typing of the analysed bacteria by identifying serotype-associated genes in the sequencing files, through comparison with sequences in precompiled databases.
  • Inference on antimicrobial resistance: This step is used to predict the antimicrobial resistance of bacterial strains from whole genome sequences, through comparison with precompiled databases of antimicrobial resistance genes and with databases of known chromosomal mutations conferring resistance to antimicrobial compounds.
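
As an illustration of the first two steps above (quality check and trimming), the following is a minimal Python sketch and not one of the tools covered by the inventory. It assumes four-line .fastq records with Phred+33 quality encoding and an arbitrary quality threshold of 20; the function names (parse_fastq, mean_quality, trim_low_quality_tail) are hypothetical and chosen only for this example. Real quality-check and trimming tools do considerably more, for instance adapter removal and per-position statistics.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 encoding


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines]
    while lines and lines[-1] == "":  # ignore trailing blank lines
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def mean_quality(qual, offset=PHRED_OFFSET):
    """Quality check: mean Phred score of one read."""
    return mean(ord(c) - offset for c in qual)


def trim_low_quality_tail(seq, qual, min_q=20, offset=PHRED_OFFSET):
    """Trimming: drop 3' bases while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Example with one made-up read: six high-quality bases, two poor ones.
example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIII##\n"]
for read_id, seq, qual in parse_fastq(example):
    trimmed_seq, trimmed_qual = trim_low_quality_tail(seq, qual)
    print(read_id, mean_quality(qual), trimmed_seq)
```

To run the same functions on a real file, pass an open file handle to parse_fastq, for example `with open("sample.fastq") as fh:` followed by the same loop; the file name here is a placeholder.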

Files

Biorisks EURLs WG on NGS - Del4_Bioinformatics tools for basic NGS analysis-Michelacci-20240110-v3.pdf