Published August 28, 2017 | Version v1
Journal article | Open Access

Using MIxS: An Implementation Report from Two Metagenomic Information Systems

  • 1. Agriculture and Agri-Food Canada, Ottawa, Canada
  • 2. NOAA Southwest Fisheries Science Center, La Jolla, CA, United States of America
  • 3. Agriculture and Agri-Food Canada, Ottawa, Canada

Description

MIxS (Minimum Information about any Sequence) (Yilmaz et al. 2011) is a metadata standard of the Genomics Standards Consortium (GSC), designed to make sequence data findable, accessible, and interoperable. It contains fields for recording the physical and chemical characteristics of the sampling environment, geographic and habitat information, and other metadata about the sample and its provenance; these are critical for downstream interpretation of data derived from the sample. We will present our experience implementing MIxS in two metagenomic information systems: the Earth Microbiome Project (EMP) and the Government of Canada (GoC) Ecobiomics project.

The EMP (Gilbert et al. 2014) is an ongoing effort to crowdsource environmental microbiome samples from around the world, then sequence and analyze them using a standardized workflow. The EMP has aggregated and sequenced over 50,000 samples, which are queryable through a publicly available catalogue. A meta-analysis of the first 25,000 samples is currently in review. MIxS and the Environment Ontology (ENVO) (Buttigieg et al. 2016) have been useful in structuring environmental metadata from EMP studies. For the EMP meta-analysis in particular, however, several issues arose: there are often multiple 'correct' assignments for the biome, feature, and material fields; the fields are not hierarchical, which limits logical organization; and the primary ecological factors differentiating microbial communities are not captured. In response, the EMP team worked with the ENVO team to devise a new hierarchical structure, the EMP Ontology (EMPO), that captures the primary axes along which microbial communities tend to be structured (host-associated or not, saline or not). EMPO is an application ontology, formally defined in a W3C Web Ontology Language (OWL) document that maps its terms to existing ontologies, enabling reuse by the microbial ecology community.
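
To illustrate why a hierarchy helps, the following minimal Python sketch encodes the two axes named above (host-associated or not, saline or not) as a two-level label path and then counts samples at either level. The label strings, the function names `empo_like_path` and `rollup`, and the choice to apply the salinity axis to every sample are hypothetical simplifications, not the published EMPO terms or structure.

```python
from collections import Counter


# Hypothetical label path built from the two axes named in the text.
# These strings are illustrative placeholders, not official EMPO labels,
# and applying the salinity axis to every sample is a simplification.
def empo_like_path(host_associated: bool, saline: bool) -> tuple[str, str]:
    level1 = "Host-associated" if host_associated else "Free-living"
    level2 = "Saline" if saline else "Non-saline"
    return (level1, level2)


def rollup(paths: list[tuple[str, str]], level: int) -> Counter:
    """Count samples by the first `level` elements of their label path."""
    return Counter(path[:level] for path in paths)


# Hypothetical samples; the comments suggest what each might represent.
samples = [
    empo_like_path(host_associated=False, saline=True),   # e.g. seawater
    empo_like_path(host_associated=False, saline=False),  # e.g. soil
    empo_like_path(host_associated=True, saline=False),   # e.g. animal gut
    empo_like_path(host_associated=False, saline=True),   # e.g. marine sediment
]

print(rollup(samples, level=1))  # coarse grouping: host-associated or not
print(rollup(samples, level=2))  # finer grouping: adds salinity
```

Because each label is a path, a coarser grouping is simply a count over path prefixes, which flat biome, feature, and material fields cannot provide on their own.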

Ecobiomics is a joint project of multiple GoC departments and covers the complete workflow, from sampling in a variety of aquatic, soil, and benthic environments, through sample preparation, DNA extraction, library preparation, sequencing, and analysis. In contrast to the EMP, where some of the samples and metadata were collected before the MIxS standards were established, the Ecobiomics project has been able to create a metadata profile for each sub-project that conforms to, extends, and builds upon the existing MIxS standards.
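
A metadata profile of the kind described above can be sketched as layered field sets: a core checklist, an environmental package, and project-specific extensions, merged so that later layers may add fields or make optional ones mandatory but never relax a core requirement. The field names below resemble MIxS terms but are assumptions for illustration; `ecobiomics_subproject`, `build_profile`, and `conforms` are hypothetical and do not describe the Ecobiomics project's actual profiles.

```python
# Illustrative field sets only; real MIxS checklists and environmental packages
# define many more terms. Each maps field name -> True if mandatory.
CORE = {
    "lat_lon": True,
    "collection_date": True,
    "geo_loc_name": True,
    "env_biome": True,
    "env_feature": True,
    "env_material": True,
}
WATER_PACKAGE = {"depth": True, "temp": False, "salinity": False}
# Hypothetical project-specific layer: adds one field and tightens another.
PROJECT_EXTENSION = {"ecobiomics_subproject": True, "temp": True}


def build_profile(*layers: dict[str, bool]) -> dict[str, bool]:
    """Merge layers in order. A later layer can add fields or make an
    optional field mandatory, but can never make a required field optional."""
    profile: dict[str, bool] = {}
    for layer in layers:
        for name, required in layer.items():
            profile[name] = profile.get(name, False) or required
    return profile


def conforms(profile: dict[str, bool], base: dict[str, bool]) -> bool:
    """True if every field that is mandatory in `base` is mandatory in `profile`."""
    return all(profile.get(name, False) for name, required in base.items() if required)


profile = build_profile(CORE, WATER_PACKAGE, PROJECT_EXTENSION)
print(conforms(profile, CORE))            # merged profile keeps all core requirements
print(conforms({"lat_lon": True}, CORE))  # hand-written profile that drops core fields
print(profile["temp"])                    # optional in the package, mandatory in the project
```

Keeping the layers separate makes it easy to see which requirements come from MIxS and which are local extensions, and `conforms` can check a hand-edited profile against the core before it is used.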

Despite these two different contexts, EMP and Ecobiomics encountered a number of common issues that prevented a complete implementation of MIxS. These include ambiguous term names and definitions; inconsistencies among the environmental packages; non-standard handling of units; and a number of issues surrounding ENVO, which is required for filling in the mandatory MIxS fields "Environmental material", "Biome", and "Environmental feature". We will describe these issues and, more generally, the successes and challenges of our implementations.
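
Some of these issues can be caught mechanically. This minimal Python sketch, assuming the field keys `env_biome`, `env_feature`, and `env_material` (our assumption about the identifiers behind the three mandatory fields) and values written as ENVO CURIEs such as `ENVO:00000447`, flags missing or non-ENVO values in those fields, and checks that measurements follow a single `<number> <unit>` form with a unit drawn from a per-field list. The `ALLOWED_UNITS` table and the `validate` helper are hypothetical. A pattern match only checks the shape of an identifier; confirming that a term exists and belongs to the appropriate ENVO branch (for example, that a biome value is actually a biome) would require querying the ontology itself.

```python
import re

# Assumed keys for the three ENVO-backed mandatory fields.
ENVO_FIELDS = ("env_biome", "env_feature", "env_material")
# ENVO identifiers are CURIEs: the "ENVO:" prefix followed by eight digits.
ENVO_ID = re.compile(r"ENVO:\d{8}")
# Hypothetical per-field unit lists standing in for a shared unit vocabulary.
ALLOWED_UNITS = {"temp": {"degree Celsius"}, "depth": {"m"}}
# A measurement written as "<number> <unit>", e.g. "4.5 m".
MEASUREMENT = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s+(.+?)\s*")


def validate(record: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means no issues found."""
    problems = []
    for field in ENVO_FIELDS:
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not ENVO_ID.fullmatch(value):  # format only; does not look the term up
            problems.append(f"{field}: not an ENVO ID: {value!r}")
    for field, units in ALLOWED_UNITS.items():
        if field not in record:
            continue  # optional measurement not supplied
        match = MEASUREMENT.fullmatch(record[field])
        if match is None:
            problems.append(f"{field}: expected '<number> <unit>'")
        elif match.group(2) not in units:
            problems.append(f"{field}: unexpected unit {match.group(2)!r}")
    return problems


# Hypothetical record; the ENVO ID is a placeholder chosen only for its format.
record = {
    "env_biome": "ENVO:00000447",
    "env_feature": "marine",   # free text instead of an ENVO term
    "temp": "12 degree Celsius",
    "depth": "30 ft",          # unit outside the allowed list
}
for problem in validate(record):
    print(problem)
```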

Files

TDWGProc_article_20637.pdf

Files (80.4 kB)

  • 53.4 kB, md5:be10351e7670ba4f091c2d97bf1ce606
  • 27.0 kB, md5:aabe515b3ea2f10c730905289fe3423e
