Presentation Open Access

Towards solving the metagenomics reproducibility crisis with CWL and RO

Folker Meyer

The MG-RAST portal [Meyer] and its European sister project MGnify [Mitchell] at the European Bioinformatics Institute (EMBL-EBI) provide metagenome analysis services to a large, international community of scientists. Both systems capture metadata about each data set according to the standards of the Genomic Standards Consortium (GSC) [Field], and both have more recently begun to convert their workflows to the Common Workflow Language (CWL) [Amstutz] format.

Both metagenome data and computation with metagenomes are expensive [Thomas], and significant degrees of freedom exist in the computational analysis, underscoring the need for reproducibility in the field of environmental sequence analysis. While existing initiatives are attempting to benchmark different computational approaches [Sczyrba], it is vital for researchers to understand the provenance of information derived from metagenomes.

Our CWL-formatted workflows allow rapid comparison of the two pipelines, with CWL-described tools also being reused in other, related workflows for the analysis of marine eukaryotic transcriptomics.

MG-RAST has captured metadata about the data objects using GSC standards for several years and exports it via RESTful APIs [Wilke], as does MGnify [Mitchell]. Together we expect to use Research Objects to export provenance information as part of our APIs. We are now working towards domain-specific profiles, evaluating CWLProv [Khan] to identify any community-specific extensions that might be needed for the microbiome research community.

Full abstract: https://doi.org/10.5281/zenodo.1309962

Invited Talk at RO2018
Files (4.6 MB)
S03E01-Folker Meyer - folker_ro_amsterdam.pdf (4.6 MB)
md5:86f02678ed1f8d6cb7c24d885019ff9e