Published December 14, 2015 | Version v1
Software Open

Metapasta

  • 1. Era7 bioinformatics

Description

Metapasta is an open-source, fast, and horizontally scalable tool for community profiling based on the analysis of 16S metagenomics data. It is entirely cloud-based and designed to exploit the scalability of the cloud: it profiles the community of a sample starting from raw Illumina reads in approximately 1 hour, and needs approximately the same time to do so for hundreds of samples. Reads are mapped with BLAST or LAST, and other mapping solutions can be integrated. Taxonomic assignment uses best-hit and lowest-common-ancestor paradigms, with the NCBI taxonomy as reference. As output, Metapasta generates tab-separated value text files containing the frequencies of all taxa identified in any of the samples. This output includes direct assignment frequencies and cumulative frequencies based on the hierarchical structure of the taxonomy tree. The report format can be configured using a DSL similar to spreadsheet formulas, and PDF files showing the assigned taxonomy tree can be rendered.
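The lowest-common-ancestor assignment and the direct versus cumulative frequencies described above can be illustrated with a minimal sketch. This is not Metapasta's code (Metapasta is written in Scala and queries the NCBI taxonomy); the toy taxonomy, the read names, and the function names below are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a stand-in for the NCBI tree.
PARENT = {
    "Bacteria": "root",
    "Firmicutes": "Bacteria",
    "Proteobacteria": "Bacteria",
    "Bacillus": "Firmicutes",
    "Clostridium": "Firmicutes",
    "Escherichia": "Proteobacteria",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    common = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon))
    # Walking the first lineage bottom-up, the first shared node is the deepest.
    return next(node for node in lineage(taxa[0]) if node in common)

def cumulative_counts(direct):
    """Credit each direct count to the taxon itself and to all its ancestors."""
    total = Counter()
    for taxon, count in direct.items():
        for node in lineage(taxon):
            total[node] += count
    return total

# Hypothetical mapping results: the taxa hit by each read.
hits = {
    "read1": ["Bacillus"],
    "read2": ["Bacillus", "Clostridium"],
    "read3": ["Bacillus", "Escherichia"],
}

# Direct assignment: each read is assigned to the LCA of its hits.
direct = Counter(lowest_common_ancestor(taxa) for taxa in hits.values())
# Cumulative assignment: counts propagated up the taxonomy tree.
cumulative = cumulative_counts(direct)

for taxon in sorted(cumulative):
    print(f"{taxon}\t{direct[taxon]}\t{cumulative[taxon]}")
```

In this sketch a read whose hits disagree is pushed up the tree to the deepest taxon that covers all of them, so the direct count of a node reflects only reads assigned exactly there, while its cumulative count also includes reads assigned anywhere in its subtree.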

Metapasta is implemented in Scala and runs on cloud computing infrastructure (Amazon Web Services). The graph data platform Bio4j is used to retrieve taxonomy-related information, and the tool Compota is used to distribute and coordinate compute tasks.

Files

metapasta.zip (136.0 kB)
md5:44521a15e5f9535a4b83d240065a6b7b

Additional details

Funding

European Commission
INTERCROSSING - Innotive Training Environment for Researchers Combining the Resources of Statistical Science, Informatics & Genetics (grant 289974)