Published February 23, 2023 | Version 0.91
Dataset Open

The North Pacific Eukaryotic Gene Catalog: Raw assemblies from Gradients 1, 2 and 3

Description

The North Pacific Eukaryotic Gene Catalog consolidates eukaryotic metatranscriptome data from three latitudinal transects of the North Pacific transition zone and one cruise in the subtropical gyre. Metatranscriptomes were gathered from latitudinally resolved surface samples and diel-resolved temporal studies, with samples taken in duplicate or triplicate and collected on 0.2-100 μm, 0.2-3 μm, and 3-100 μm or 3-200 μm size fractions. These metatranscriptome data were de novo assembled into 175 independent assemblies, totalling 182 million clustered nucleotide contigs. Assemblies were annotated by taxonomy and function. This catalog provides assembled environmental contigs, their translated peptide sequences, and their taxonomic and functional annotations, with the aim of facilitating continued discoveries about the molecular ecology of microbial eukaryotes in the North Pacific.

A full description of these data is published in Scientific Data: The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Please cite this publication if your research uses these data:

Groussman, R. D., Coesel, S. N., Durham, B. P., Schatz, M. J., & Armbrust, E. V. (2024). The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Scientific Data, 11(1), 1161.

This dataset repository is associated with a codebase and documentation repository:
https://github.com/armbrustlab/NPac_euk_gene_catalog
Please see this code repository for additional data and project updates.

Translated and processed protein sequences and their annotations are available in this repository:
https://zenodo.org/doi/10.5281/zenodo.10472589

99% identity clustered nucleotide sequences and kallisto enumerations are available here:
https://zenodo.org/doi/10.5281/zenodo.10570448

File contents: this repository contains four .tar.gz compressed tarballs with raw de novo Trinity assemblies of poly-A selected metatranscriptomes from the Gradients 1 through 3 cruises, and a plain-text file with the custom spike-in mRNA standards (CustomStandardSequences.txt).


Gradients1.KOK1606.PA.assemblies.tar.gz

- Link to G1PA project GitHub page
- Simons CMAP cruise page and datasets: https://simonscmap.com/catalog/cruises/KOK1606
- Short read processing code: G1PA.process_short_reads.sh
- Trinity assembly code: G1PA.trinity_assemblies.sh


Gradients2.MGL1704.PA.assemblies.tar.gz

- Link to G2PA project GitHub page
- Simons CMAP cruise page and datasets: https://simonscmap.com/catalog/cruises/MGL1704
- Short read processing code: G2PA.process_short_reads.sh
- Trinity assembly code: G2PA.trinity_assemblies.sh


Gradients3.KM1906.PA.assemblies.tar.gz

- Link to G3PA project GitHub page
- Simons CMAP cruise page and datasets: https://simonscmap.com/catalog/cruises/KM1906
- Short read processing code: G3PA_UW.process_short_reads.sh
- Trinity assembly code: G3PA_UW.trinity_assemblies.sh


G3_diel.KM1906.PA.assemblies.tar.gz

- Link to G3PA project GitHub page
- Simons CMAP cruise page and datasets: https://simonscmap.com/catalog/cruises/KM1906
- Short read processing code: G3PA_diel.process_short_reads.sh
- Trinity assembly code: G3PA_diel.trinity_assemblies.sh
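
The assembly tarballs listed above are several gigabytes each, so it can be useful to inspect one before unpacking it. The following is a minimal Python sketch using only the standard library. It assumes the archive members are FASTA files, as Trinity writes its assemblies in FASTA format, but the internal directory layout of these tarballs is not described here, so any member name you pass in is something you would first discover with the listing function. The function names `list_tarball_members` and `count_fasta_records` are hypothetical helpers, not part of the project codebase.

```python
import tarfile
from typing import Iterator, Tuple


def list_tarball_members(tar_path: str) -> Iterator[Tuple[str, int]]:
    """Yield (member name, size in bytes) for each regular file in a .tar.gz archive."""
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(tar_path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, member.size


def count_fasta_records(tar_path: str, member_name: str) -> int:
    """Count FASTA header lines (lines starting with '>') in one archive member.

    The member is streamed from the archive, so nothing is written to disk.
    Assumption: the member is a plain-text FASTA file.
    """
    with tarfile.open(tar_path, mode="r:gz") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file in {tar_path}")
        with handle:
            return sum(1 for line in handle if line.startswith(b">"))
```

For example, `for name, size in list_tarball_members("Gradients1.KOK1606.PA.assemblies.tar.gz"): print(name, size)` would print the contents of the Gradients 1 archive once it has been downloaded to the working directory.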


CustomStandardSequences.txt
- Plain-text FASTA file with the spike-in standards used during mRNA extraction and sequencing prep
- Link to publication of spike-in standards methods: https://www.nature.com/articles/s41564-019-0507-5
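
Because CustomStandardSequences.txt is a plain-text FASTA file, the spike-in standards can be loaded with a few lines of Python. The sketch below is illustrative only: `parse_fasta` is a hypothetical helper, and it assumes conventional FASTA formatting (a `>` header line followed by one or more sequence lines), keeping the first whitespace-delimited token of each header as the sequence identifier.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them into one string.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

A typical call would be `with open("CustomStandardSequences.txt") as fh: standards = parse_fasta(fh)`, after which `len(standards)` gives the number of standard sequences in the file.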

The 2015 SCOPE Diel metatranscriptome raw assemblies have been released in a previous Zenodo repository, and are not included again in this deposition. We provide the links to the Diel1 resources here:
- Diel1 raw metatranscriptome assembly Zenodo repository: https://zenodo.org/records/5009803
- Dataset DOI: https://doi.org/10.5281/zenodo.5009803
- Associated publication: https://www.frontiersin.org/articles/10.3389/fmicb.2021.682651/full
- Codebase: https://github.com/armbrustlab/diel_eukaryotes
- Simons CMAP cruise page and datasets: https://simonscmap.com/catalog/cruises/KM1513
- Short read processing code: D1PA.process_short_reads.sh
- Trinity assembly code: D1PA.trinity_assemblies.sh



Files

Files (34.8 GB)

Checksum Size
md5:2cdf16373c80c4607938aca7f3bba938 13.0 kB
md5:2bd622b69eb5d5b5d9aed53cf27d151e 5.3 GB
md5:6cf87325f0285025c5e21c83081c7c8c 9.7 GB
md5:757d4a2fd83ff93b0ed2ad94297fd86a 12.8 GB
md5:eadf9de8169060ad4ad90df8e0846039 7.0 GB
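
After downloading, each file can be checked against the MD5 checksum published for it in this record. The following is a minimal sketch using Python's standard `hashlib` module. `md5_of_file` and `matches_checksum` are hypothetical helper names, and the file is read in chunks so that multi-gigabyte tarballs do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected value.

    The expected value may carry the "md5:" prefix used in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For instance, `matches_checksum(path, "md5:2cdf16373c80c4607938aca7f3bba938")` should return `True` when `path` points to an intact copy of the 13.0 kB file in the listing.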