Folker Meyer
2018-10-29
<p>The <a href="https://www.mg-rast.org/">MG-RAST portal</a> [<a href="https://doi.org/10.1186/1471-2105-9-386">Meyer</a>] and its European sister project <a href="https://www.ebi.ac.uk/metagenomics/">MGnify</a> [<a href="https://doi.org/10.1093/nar/gkx967">Mitchell</a>] at the European Bioinformatics Institute (EMBL-EBI) provide metagenome analysis services to a large, international community of scientists. Both systems capture metadata about each data set according to the standards of the <a href="http://gensc.org/">Genomic Standards Consortium</a> (GSC) [<a href="https://doi.org/10.1371/journal.pbio.1001088">Field</a>], and both have more recently begun to convert their workflows to the <a href="http://commonwl.org/">Common Workflow Language</a> (CWL) [<a href="https://doi.org/10.6084/m9.figshare.3115156.v2">Amstutz</a>] format.</p>
<p>Both metagenome data and computation with metagenomes are expensive [<a href="https://doi.org/10.1186/2042-5783-2-3">Thomas</a>], and significant degrees of freedom exist in the computational analysis, underscoring the need for reproducibility in the field of environmental sequence analysis. While existing initiatives are attempting to benchmark different computational approaches [<a href="https://doi.org/10.1038/nmeth.4458">Sczyrba</a>], it is vital for researchers to understand the provenance of information derived from metagenomes.</p>
<p>Our CWL-formatted workflows allow rapid comparison of the two pipelines, with CWL-described tools also shared with other, related workflows for the analysis of marine eukaryotic transcriptomics.</p>
<p>MG-RAST has captured metadata about its data objects using GSC standards for several years and exports it via RESTful APIs [<a href="http://doi.org/10.1371/journal.pcbi.1004008">Wilke</a>], as does MGnify [<a href="https://doi.org/10.1093/nar/gkx967">Mitchell</a>]. Together we expect to use Research Objects to export provenance information as part of our APIs. We are now working towards domain-specific profiles, evaluating <a href="https://w3id.org/cwl/prov/">CWLProv</a> [<a href="https://doi.org/10.5281/zenodo.1208477">Khan</a>] to identify any community-specific extensions that might be needed for the microbiome research community.</p>
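<p>As a minimal sketch of how a client might address these exports, the Python snippet below builds the metadata and Research Object manifest URLs for a single metagenome. The two URL patterns follow the example identifiers associated with this record (with the path normalized to a single slash). The function names, the ID pattern, and the <code>fetch_text</code> helper are illustrative assumptions, not part of the MG-RAST API, and no particular response format is assumed.</p>

```python
import re
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as it appears in the identifiers listed with this record.
API_BASE = "https://api.mg-rast.org"

# Assumption: metagenome IDs look like "mgm4441680.3"
# (inferred from a single example, not from an API specification).
_ID_PATTERN = re.compile(r"^mgm\d+\.\d+$")


def _check_id(mgid: str) -> str:
    """Return the ID unchanged if it matches the assumed pattern."""
    if not _ID_PATTERN.match(mgid):
        raise ValueError(f"unexpected metagenome ID: {mgid!r}")
    return mgid


def metagenome_url(mgid: str) -> str:
    """URL of the metadata resource for one metagenome."""
    return f"{API_BASE}/metagenome/{quote(_check_id(mgid))}"


def manifest_url(mgid: str) -> str:
    """URL of the Research Object manifest for one metagenome."""
    return f"{API_BASE}/researchobject/manifest/{quote(_check_id(mgid))}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: download a resource and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(metagenome_url("mgm4441680.3"))
    print(manifest_url("mgm4441680.3"))
```

<p>Calling <code>fetch_text(manifest_url("mgm4441680.3"))</code> would request the manifest over the network; whether the endpoint is reachable and what it returns should be checked against the service itself.</p>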
<p>Full abstract: <a href="https://doi.org/10.5281/zenodo.1309962">https://doi.org/10.5281/zenodo.1309962</a></p>
Invited Talk at RO2018
https://doi.org/10.5281/zenodo.1484480
oai:zenodo.org:1484480
Zenodo
https://doi.org/10.5281/zenodo.1309962
https://doi.org/10.1186/1471-2105-9-386
https://www.mg-rast.org/
https://doi.org/10.1093/nar/gkx967
https://doi.org/10.5281/zenodo.1208477
https://doi.org/10.6084/m9.figshare.3115156.v2
https://w3id.org/cwl/prov/
https://doi.org/10.1371/journal.pbio.1001088
https://doi.org/10.1186/2042-5783-2-3
https://doi.org/10.1038/nmeth.4458
https://doi.org/10.1371/journal.pcbi.1004008
https://mg-rast.org/
http://researchobject.org/
https://doi.org/10.1109/DataCloud.2014.6
http://www.mg-rast.org/mgmain.html?mgpage=download&metagenome=mgm4441680.3
https://api.mg-rast.org//metagenome/mgm4441680.3
https://api.mg-rast.org/researchobject/manifest/mgm4441680.3
https://zenodo.org/communities/ro
https://doi.org/10.5281/zenodo.1484479
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
RO2018, Workshop on Research Objects, IEEE eScience 2018, Amsterdam, Netherlands, 2018-10-29
Towards solving the metagenomics reproducibility crisis with CWL and RO
info:eu-repo/semantics/lecture