Published July 27, 2024 | Version v1
Dataset Open

DNA sampling in the Gulf of Mexico on GOMECC-4

  • 1. University of New Hampshire
  • 2. NOAA Atlantic Oceanographic and Meteorological Laboratories
  • 3. Cooperative Institute for Marine and Atmospheric Studies
  • 4. Northern Gulf Institute
  • 5. University of Louisiana at Lafayette
  • 6. North Carolina State University
  • 7. NOAA National Marine Fisheries Service

Description

Tourmaline files for processing of 16S (Bacteria and Archaea) and 18S (protists) DNA metabarcoding samples that were collected in the Gulf of Mexico as part of the fourth Gulf of Mexico Ecosystems and Carbon Cycle (GOMECC-4) cruise. These files are associated with the following project: https://github.com/aomlomics/gomecc 

Tourmaline uses a Snakemake workflow that wraps programs such as DADA2 and QIIME 2 to infer amplicon sequence variants (ASVs) and assign taxonomy with common reference databases. All Tourmaline files are included for the 16S and 18S samples, in separate folders. Taxonomic reference database files are also included for each marker region: SILVA (Version 138.1) for 16S and the Protistan Ribosomal Reference database (PR2; Version 5.0.1) for 18S. The 16S samples were sequenced on two separate sequencing runs, so DADA2 was run in Tourmaline separately on run 1 (plates 1-2) and run 2 (plates 3-6) using the same parameters. The 16S ASV tables from the two runs were then merged, and taxonomy was assigned on the combined table to avoid inflating the number of 16S ASVs.
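The merge step described above can be illustrated with a minimal Python sketch. It is not the Tourmaline or QIIME 2 code used for this dataset. It assumes each ASV is identified by its exact sequence, and the function name, table layout, sequences, and sample names are all hypothetical. The point is that an ASV found in both runs becomes a single row in the combined table rather than being counted twice, which is why both runs need identical DADA2 parameters.

```python
from collections import defaultdict


def merge_asv_tables(*tables):
    """Merge per-run ASV count tables that are keyed by ASV sequence.

    Each table maps an ASV sequence to {sample_id: read_count}.
    Identical sequences from different runs collapse into one entry.
    """
    merged = defaultdict(dict)
    for table in tables:
        for sequence, counts in table.items():
            for sample_id, count in counts.items():
                previous = merged[sequence].get(sample_id, 0)
                merged[sequence][sample_id] = previous + count
    return dict(merged)


# Hypothetical toy tables standing in for run 1 and run 2.
run1 = {"ACGT": {"plate1_s1": 10}, "GGCC": {"plate2_s1": 4}}
run2 = {"ACGT": {"plate3_s1": 7}, "TTAA": {"plate5_s1": 2}}

merged = merge_asv_tables(run1, run2)
# "ACGT" occurs in both runs but is a single ASV in the merged table.
print(len(merged), merged["ACGT"])
```

Taxonomy would then be assigned once, on the merged set of unique sequences, rather than once per run.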

Folders with final Tourmaline output:

  • 16S - two Tourmaline runs, for plates 1-2 and plates 3-6
  • 18S - single Tourmaline run

Folders with reference database files (e.g., raw files and classifiers):

  • 16S taxonomy (16S-tax-files)
  • 18S taxonomy (18S-tax-files)

Other archived files include a metadata table and a Markdown file describing the Tourmaline code used for bioinformatics.

Raw FASTQ sequence data are available in NCBI SRA under BioProject ID PRJNA887898. FASTQ files included in the Tourmaline runs have been trimmed to remove primers. 

This work was funded in part through the NOAA Ocean Acidification Program (OAP) ROR #02bfn4816 under project numbers 21392 (Thompson) and 20708 (Barbero) and by awards NA16OAR4320199 and NA21OAR4320190 to the Northern Gulf Institute from NOAA’s Office of Oceanic and Atmospheric Research, U.S. Department of Commerce. This research was carried out in part under the auspices of the Cooperative Institute for Marine and Atmospheric Studies (CIMAS) and NOAA, cooperative agreement NA20OAR4320472.

Files

16S-tax-files.zip

Files (10.0 GB)

Checksum  Size
md5:6abeda3608473fe01f799f01a7c2799b  259.2 MB
md5:151eb2a6bf764787425b68707925a754  6.3 GB
md5:d90b9adcc95768b527bed7c3cd52c09c  120.3 MB
md5:0e42f7cd6f371da1e93138a88f83ef1d  3.3 GB
md5:2b3075ba48ad9f982380e311d0acd6b2  7.9 kB
md5:2a9591fb864526a15728aec6cf665b5f  468.4 kB
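
The md5 values listed above can be used to check that a download completed intact. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names are hypothetical, and the local file path is a placeholder that must be replaced with the actual downloaded file, since this listing does not pair each checksum with a file name. The file is read in chunks because several of the archives are multiple gigabytes.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:6abe...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("downloaded-file.zip", "md5:6abeda3608473fe01f799f01a7c2799b")` would return `True` only if the placeholder file `downloaded-file.zip` has exactly that checksum.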