Specific marine databases

Inge Alexander Raknes; Guy Cochrane; Lars Ailo Bongo; Nils Peder Willassen; Rob Finn; Juan Fu; Sudhagar Veerabadran Balasundaram; Espen Robertsen; Terje Klementsen; Giacomo Tartari

doi:10.5281/zenodo.557021

Published April 24, 2017 | Version v1

Project deliverable | Open access

1. UiT
2. EMBL-EBI

The marine databases MarRef, MarDb and MarCat are publicly available resources that promote marine research and innovation.

The marine resources, which have been implemented in the Marine Metagenomics Portal (MMP), are a collection of richly annotated, manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. MarRef is a database of completely sequenced marine prokaryotic genomes and serves as a marine prokaryotic reference genome database, whereas MarDb includes all sequenced marine prokaryotic genomes regardless of their level of completeness. MarCat is a gene (protein) catalogue of uncultivable (and cultivable) marine genes and proteins derived from metagenomic samples.
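The relationship between the MarRef and MarDb tiers can be pictured as a filter over genome records: MarDb keeps every marine prokaryotic genome, while MarRef keeps only the complete ones. This is a minimal sketch under stated assumptions; the `GenomeRecord` class, its `assembly_level` field and the completeness levels are hypothetical simplifications, not the MMP schema (real records carry 104 metadata fields).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class AssemblyLevel(Enum):
    """Hypothetical assembly-completeness levels used for illustration only."""
    COMPLETE = "complete genome"
    CHROMOSOME = "chromosome"
    SCAFFOLD = "scaffold"
    CONTIG = "contig"


@dataclass(frozen=True)
class GenomeRecord:
    """A hypothetical, heavily simplified marine prokaryotic genome record."""
    accession: str
    organism: str
    assembly_level: AssemblyLevel


def mardb_entries(records: Iterable[GenomeRecord]) -> List[GenomeRecord]:
    # MarDb tier: every sequenced marine prokaryotic genome,
    # regardless of how complete the assembly is.
    return list(records)


def marref_entries(records: Iterable[GenomeRecord]) -> List[GenomeRecord]:
    # MarRef tier: only completely sequenced genomes, so it is
    # always a subset of the MarDb tier built from the same records.
    return [r for r in records if r.assembly_level is AssemblyLevel.COMPLETE]


if __name__ == "__main__":
    # Placeholder accessions and organisms, not real database entries.
    records = [
        GenomeRecord("GENOME_1", "Example organism A", AssemblyLevel.COMPLETE),
        GenomeRecord("GENOME_2", "Example organism B", AssemblyLevel.CONTIG),
    ]
    print(len(mardb_entries(records)), len(marref_entries(records)))
```

Modelling MarRef as a filter over the same records makes the subset relationship explicit: anything in MarRef is, by construction, also in MarDb.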

The first versions of MarRef and MarDb contain 484 and 2557 entries, respectively. Each record is built up of 104 metadata fields, including attributes for sampling, sequencing, assembly and annotation in addition to organism and taxonomic information. For MarRef and MarDb, data from various sources, such as sequence, contextual, taxonomy and literature databases, as well as from bacterial diversity metadata and culture collection databases, have been curated and integrated to produce robust databases. The corresponding genome, gene and protein sequence databases have been built by downloading the individual entries from ENA (European Nucleotide Archive).

MarCat currently contains the Tara Oceans samples, comprising 1433 entries. Each record in MarCat contains 103 metadata fields. As for MarRef and MarDb, each entry has been manually curated and enriched with taxonomic annotation, assembly and functional annotation data. The corresponding DNA, gene and protein databases were generated using META-pipe, a pipeline for taxonomic classification and functional annotation of metagenomic samples.

Controlled vocabularies and ontologies are used to generate the contextual databases. They allow more streamlined curation, better data consistency, enhanced quality control (QC) and, not least, easier aggregation and analysis of the data. Manual curation produces more robust, richly annotated datasets with highly accurate and detailed information.
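To illustrate why a controlled vocabulary helps both QC and aggregation, here is a minimal sketch. The field name `environment` and the term set are hypothetical examples, not the vocabularies or ontologies actually used for MarRef, MarDb or MarCat.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical controlled vocabulary for one contextual field; the actual
# vocabularies and ontologies used for the MMP databases are not listed here.
ENVIRONMENT_TERMS = frozenset({"marine", "sea water", "marine sediment"})


def normalize(value: str) -> str:
    # Collapse case and surrounding whitespace so equivalent entries match.
    return value.strip().lower()


def validate_field(record: Dict[str, str], field: str, allowed: frozenset) -> List[str]:
    """Return QC messages for a field whose value must be a controlled term."""
    value = record.get(field)
    if value is None:
        return [f"{field}: missing value"]
    if normalize(value) not in allowed:
        return [f"{field}: '{value}' is not a controlled term"]
    return []


def count_terms(records: Iterable[Dict[str, str]], field: str) -> Counter:
    # Because values come from a fixed vocabulary, records can be grouped
    # and counted directly instead of by ad hoc string matching.
    return Counter(normalize(r[field]) for r in records if field in r)
```

A value such as "ocean water" would be flagged by `validate_field` for a curator to map onto an allowed term, after which `count_terms` groups it together with the other records.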

The contextual and sequence databases have been incorporated into the Marine Metagenomics Portal (MMP) and are available at https://mmp.sfb.uit.no/.

Files (2.1 MB)

Name	MD5 checksum	Size
D6.1 Specific marine databases .pdf	cdf6b16b6a6c29d32721be58b59adccb	2.1 MB
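The MD5 checksum listed for the PDF can be used to confirm that a downloaded copy is intact. A minimal sketch using Python's standard `hashlib` module; the local path you pass in is an assumption about where the file was saved.

```python
import hashlib

# Checksum listed for "D6.1 Specific marine databases .pdf" in the file table.
EXPECTED_MD5 = "cdf6b16b6a6c29d32721be58b59adccb"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("D6.1 Specific marine databases .pdf")` returns `True` only if the local copy matches the published checksum.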

Additional details

Funding

European Commission, grant 676559: ELIXIR-EXCELERATE (Fast-track ELIXIR implementation and drive early user exploitation across the life sciences)
