Software Open Access

GnS-PIPE: an optimized bioinformatic pipeline to efficiently assess microbial taxonomic diversity of complex environments using high throughput sequencing technologies

TERRAT, Sébastien

The recent development of high-throughput sequencing technologies has made it possible to obtain millions of sequences from a single metagenomic DNA sample at an affordable cost (Westcott and Schloss, 2015). These developments have led to the discovery of new organisms at a higher rate than taxonomists can describe and name them (He et al., 2015). PCR amplification and sequencing of rRNA genes (16S, 18S, 23S, etc.) from metagenomic DNA are now widely used to study microbial communities in complex environments. However, the massive amounts of data produced now require solutions to efficiently process and analyze this information.

In addition, newly developed bioinformatic tools must now be validated with biological tests. This is particularly true for the key steps used to assess microbial diversity and richness. Here, we present a new pipeline named GnS-PIPE, a software application that performs bacterial, archaeal and fungal taxonomic diversity analyses. A key design principle of GnS-PIPE was to biologically validate defined bioinformatic steps. These biological tests were performed using the expertise of the GenoSol platform, a biological resource centre unique in France devoted to the conservation and analysis of the genetic resources of soil microbial communities.

GnS-PIPE is mainly written in Perl (v5.16.0), except for some specific steps written in Python 2.7 and C. GnS-PIPE is designed to be run from the command line on Linux systems. Several dependencies are required: the Perl modules Inline, List::Util, File::Path, File::Copy, Math::Int64 and POSIX; the Python modules math, decimal, gmpy, os, sys, types, random and matplotlib (Hunter, 2007), and PyCogent (Knight et al., 2007). Several third-party tools are also essential: PrinSeq (Schmieder and Edwards, 2011), Flash (Magoc and Salzberg, 2011), Usearch (Edgar et al., 2010), FastTree (Price et al., 2010), INFERNAL (Nawrocki et al., 2009) and RDP Classifier (Wang et al., 2007).
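Because the pipeline relies on many Python modules and external programs, it can help to confirm they are available before a run. The following is a minimal sketch, not part of GnS-PIPE itself, and it rests on several assumptions. It is written for Python 3, whereas the pipeline uses Python 2.7. It assumes "gmpy" and "cogent" are the import names for gmpy and PyCogent. It uses hypothetical executable names (prinseq-lite.pl, flash, usearch, FastTree, cmalign) that depend on how each tool was installed. The Perl modules and the Java-based RDP Classifier are not checked.

```python
import importlib.util
import shutil

# Python modules listed in the GnS-PIPE description. "gmpy" and "cogent"
# (PyCogent's import name) are assumed import names; newer systems may
# only provide gmpy2.
PYTHON_MODULES = ["math", "decimal", "gmpy", "os", "sys", "types",
                  "random", "matplotlib", "cogent"]

# Hypothetical executable names for the third-party tools; the real names
# depend on the local installation. RDP Classifier (a Java jar) is omitted.
EXTERNAL_TOOLS = ["prinseq-lite.pl", "flash", "usearch", "FastTree",
                  "cmalign"]


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executable names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_modules(PYTHON_MODULES):
        print("missing Python module:", name)
    for name in missing_tools(EXTERNAL_TOOLS):
        print("missing executable on PATH:", name)
```

Checking with `find_spec` and `shutil.which` only reports whether something can be found. It does not import the modules or run the tools, so version mismatches would still need to be checked separately.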
