Project deliverable Open Access
Inge Alexander Raknes; Guy Cochrane; Lars Ailo Bongo; Nils Peder Willassen; Rob Finn; Juan Fu; Sudhagar Veerabadran Balasundaram; Espen Robertsen; Terje Klementsen; Giacomo Tartari
The marine databases MarRef, MarDb, and MarCat are publicly available resources that promote marine research and innovation.
The marine resources, which have been implemented in the Marine Metagenomics Portal (MMP), are a collection of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. MarRef is a reference database of completely sequenced marine prokaryotic genomes, whereas MarDb includes all sequenced marine prokaryotic genomes regardless of level of completeness. MarCat represents a gene (protein) catalogue of uncultivable (and cultivable) marine genes and proteins derived from metagenomics samples.
The first versions of MarRef and MarDb contain 484 and 2557 entries, respectively. Each record is built up of 104 metadata fields, including attributes for sampling, sequencing, assembly and annotation in addition to organism and taxonomic information. For MarRef and MarDb, data from various sources, such as sequence, contextual, taxonomy and literature databases, in addition to data from bacterial diversity metadata and culture collection databases, have been curated and integrated to produce robust databases. The corresponding genome, gene and protein sequence databases have been built by downloading the individual entries from ENA (European Nucleotide Archive).
MarCat currently contains the Tara Ocean samples, comprising 1433 entries. In MarCat, each record contains 103 metadata fields. As for MarRef and MarDb, each entry has been manually curated and enriched with taxonomic annotation, assembly and functional annotation data. The corresponding DNA, gene and protein databases were generated using META-pipe, a pipeline for taxonomic classification and functional annotation of metagenomics samples.
The contextual databases are generated using controlled vocabularies and ontologies, which allow more streamlined curation, better consistency of the data, enhanced quality control (QC) and, not least, easier aggregation and analysis of the data. The manual curation of the data produces more robust, richly annotated datasets with highly accurate and detailed information.
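As a rough illustration of how a controlled vocabulary can support quality control, the following minimal Python sketch checks a record's fields against a set of allowed terms. The field names (`assembly_level`, `isolation_source`), the terms and the function `validate_record` are hypothetical and chosen only for illustration; they are not taken from the actual MarRef, MarDb or MarCat schemas.

```python
# Hypothetical controlled vocabulary: field names and allowed terms are
# illustrative assumptions, not the real MMP metadata schema.
CONTROLLED_VOCAB = {
    "assembly_level": {"complete", "scaffold", "contig"},
    "isolation_source": {"seawater", "sediment", "host-associated"},
}


def validate_record(record):
    """Return QC messages for fields that are missing or use an unknown term."""
    problems = []
    for field, allowed in CONTROLLED_VOCAB.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value.strip().lower() not in allowed:
            # Values are normalised (trimmed, lower-cased) before comparison.
            problems.append(f"{field}: '{value}' not in controlled vocabulary")
    return problems


if __name__ == "__main__":
    record = {"assembly_level": "draft", "isolation_source": "Seawater"}
    for message in validate_record(record):
        print(message)
```

Because every accepted value maps to one term in a fixed set, records that pass such a check can be grouped and counted by field value without further clean-up, which is one way the easier aggregation mentioned above could come about.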
The contextual and sequence databases have been incorporated into the Marine Metagenomics Portal (MMP) and are available at https://mmp.sfb.uit.no/.