Presentation Open Access
The MG-RAST portal [Meyer] and its European sister project MGnify [Mitchell] at the European Bioinformatics Institute (EMBL-EBI) provide metagenome analysis services to a large, international community of scientists. Both systems capture metadata about each data set according to the standards of the Genomic Standards Consortium (GSC) [Field], and both have more recently begun to convert their workflows to the Common Workflow Language (CWL) [Amstutz] format.
Both metagenome data and computation with metagenomes are expensive [Thomas], and the computational analysis has significant degrees of freedom, underscoring the need for reproducibility in the field of environmental sequence analysis. While existing initiatives are attempting to benchmark different computational approaches [Sczyrba], it is vital for researchers to understand the provenance of information derived from metagenomes.
Our CWL-formatted workflows allow rapid comparison of the two pipelines, with CWL-described tools being reused to form other, related workflows for the analysis of marine eukaryotic transcriptomics.
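As a rough illustration of why a shared workflow format makes such a comparison quick, the sketch below collects the tool descriptions referenced by the `run` field of each step in two CWL workflows and reports which are shared. It is a minimal sketch, not the comparison method used by either project: the two workflows, their step names, and the tool file names are hypothetical and heavily simplified, and the workflows are assumed to be already parsed into Python dictionaries.

```python
def tools_used(workflow):
    """Return the set of tool files referenced by the workflow's steps."""
    steps = workflow.get("steps", {})
    # CWL allows `steps` to be a map keyed by step id or a list of step objects.
    step_list = steps.values() if isinstance(steps, dict) else steps
    # Only file references are collected; tools embedded inline are skipped.
    return {s["run"] for s in step_list if isinstance(s.get("run"), str)}


# Hypothetical pipeline A, with steps written as a map.
pipeline_a = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "steps": {
        "qc": {"run": "tools/trim.cwl"},
        "rna": {"run": "tools/rna_detect.cwl"},
        "annotate": {"run": "tools/annotate_a.cwl"},
    },
}

# Hypothetical pipeline B, with steps written as a list.
pipeline_b = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "steps": [
        {"id": "qc", "run": "tools/trim.cwl"},
        {"id": "taxonomy", "run": "tools/classify_b.cwl"},
    ],
}

shared = tools_used(pipeline_a) & tools_used(pipeline_b)
only_a = tools_used(pipeline_a) - tools_used(pipeline_b)
only_b = tools_used(pipeline_b) - tools_used(pipeline_a)

print("shared tools:", sorted(shared))
print("only in A:", sorted(only_a))
print("only in B:", sorted(only_b))
```

Comparing file references only shows that two steps point at the same tool description; whether the steps behave identically also depends on their parameters, which this sketch does not inspect.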
MG-RAST has captured metadata about its data objects using GSC standards for several years and exports it via RESTful APIs [Wilke], as does MGnify [Mitchell]. Together we expect to use Research Objects to export provenance information as part of our APIs. We are now working towards domain-specific profiles, evaluating CWLProv [Khan] to identify any community-specific extensions that might be needed for the microbiome research community.
Full abstract: https://doi.org/10.5281/zenodo.1309962