This MeroInd_README.txt file was generated on 2021-05-09 by Raphaelle Descoteaux


GENERAL INFORMATION

--------------------
1. Title of Dataset:
--------------------
Data from: Meroplankton diversity, seasonality and life-history traits across the Barents Sea Polar Front revealed by high-throughput DNA barcoding

---------------------
2. Author Information
---------------------
A. Principal Investigator Contact Information
    Name: Raphaelle Descoteaux
    Institution: UiT, the Arctic University of Norway
    Email: raphaelle.descoteaux@uit.no

B. Co-investigator Contact Information
    Name: Elizaveta Ershova
    Institution: Norwegian Institute of Marine Research
    Email: elizaveta.ershova@hi.no

C. Co-investigator Contact Information
    Name: Owen S. Wangensteen
    Institution: UiT, the Arctic University of Norway
    Email: owen.wangensteen@uit.no

D. Co-investigator Contact Information
    Name: Kim Præbel
    Institution: UiT, the Arctic University of Norway
    Email: kim.praebel@uit.no

E. Co-investigator Contact Information
    Name: Paul Renaud
    Institution: Akvaplan-niva
    Email: per@akvaplan.niva.no

F. Co-investigator Contact Information
    Name: Finlo Cottier
    Institution: Scottish Association For Marine Science
    Email: finlo.cottier@sams.ac.uk

G. Co-investigator Contact Information
    Name: Bodil Bluhm
    Institution: UiT, the Arctic University of Norway
    Email: bodil.bluhm@uit.no

---------------------------
3. Date of data collection:
---------------------------
2017-11-23 to 2018-08-11 (5 seasonal cruises)

------------------------------------------
4. Geographic location of data collection:
------------------------------------------
Barents Sea north (~77.5 °N, ~30.0 °E) and south (~75.5 °N, ~30.0 °E) of Polar Front.

--------------------------------------------------------------------------------
5. Information about funding sources that supported the collection of the data:
--------------------------------------------------------------------------------
This research has been jointly funded by UiT the Arctic University of Norway and the Tromsø Research Foundation under the project "Arctic Seasonal Ice Zone Ecology", project number 01vm/h1 as well as by the Fram Centre Flagship “Climate Change in Fjord and Coast” grant number 272019 and the Fonds de Recherche Nature et Technologies du Québec (file 270604). The ArcticPRIZE project (NE/P006302/1 - UK Natural Environment Research Council) and the Nansen Legacy project (Norwegian Research Council project 276730) contributed ship time for sampling. The UiT library covered the costs of open access publication. The work of EE was done within the framework of the state assignment of IO RAS (theme No. 0128-2021-0007).


SHARING/ACCESS INFORMATION

--------------------------------------------
1. Licenses/restrictions placed on the data:
--------------------------------------------
None

---------------------------------------------------
2. Links to publications that cite or use the data:
---------------------------------------------------
Descôteaux R, Ershova E, Wangensteen OS, Praebel K, Renaud PE, Cottier F and Bluhm BA (2021) Meroplankton Diversity, Seasonality and Life-History Traits Across the Barents Sea Polar Front Revealed by High-Throughput DNA Barcoding. Front. Mar. Sci. 8:677732. doi: 10.3389/fmars.2021.677732

------------------------------------------------------------
3. Links to other publicly accessible locations of the data:
------------------------------------------------------------
None

----------------------------------------------
4. Links/relationships to ancillary data sets:
----------------------------------------------
See DATA & FILE OVERVIEW section 2 below.

----------------------------------------
5. Was data derived from another source?
----------------------------------------
No

-----------------------------------------
6. Recommended citation for this dataset:
-----------------------------------------
Descoteaux, Raphaelle et al. (2021), Data from: Meroplankton diversity, seasonality and life-history traits across the Barents Sea Polar Front revealed by high-throughput DNA barcoding, Dryad, Dataset, https://doi.org/10.5061/dryad.n8pk0p2vf


DATA & FILE OVERVIEW

-------------
1. File List:
-------------
For processing of raw DNA data:
    MeroInd_Rcode_Bioinformatics.R
    XXXX_metadata.csv (3 files)
    ngsfilter_XXXX.csv (19 files)

For data analysis, figures and tables:
    MeroInd_Rcode_DataAnalysis.Rmd
    MeroInd_IndividualData.csv
    MeroInd_SequenceAssignments.csv
    MeroInd_IDMatch.csv
    MeroInd_Quantification.csv
    MeroInd_Env_Summary.csv
    MeroInd_StationInfo.csv
    Plankton photos (1881 files)

--------------------------------------------
2. Relationship between files, if important:
--------------------------------------------
The FASTQ files (see section 3 below), combined with the ngsfilter_XXXX.csv files, the XXXX_metadata.csv files and MeroInd_Rcode_Bioinformatics.R, can be used to produce MOTU tables for all sequencing libraries. Note that the R package Mjolnir and its parameterization presented here are equivalent to the OBITools software used in the article related to this dataset.

Data from the MOTU tables are shown in their processed form in the MeroInd_IndividualData.csv file (variables: Sample, Sequence, Highest.seq, Total.reads, Percent.highest.seq). Generally, the sequence with the highest number of reads in each sample was assumed to correspond to the individual of interest. Note that in a few cases, two individuals were combined into one sample for DNA extraction, amplification and sequencing, and the second individual therefore had to be added manually to the MeroInd_IndividualData.csv file. In these cases, the two individuals have different Ind.No values but share the same Sample number.
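
The read-count summary described above can be illustrated with a minimal sketch. It is not the authors' code (their processing is in the R scripts listed in section 1) and it assumes a hypothetical input: a mapping from each sequence found in one sample to its read count. The function name top_sequences and the example sequences are made up for illustration. The returned values correspond to the Sequence, highest read count, Total.reads and Percent.highest.seq variables.

```python
from typing import Dict, List, Tuple


def top_sequences(read_counts: Dict[str, int], n: int = 1) -> List[Tuple[str, int, int, float]]:
    """Rank the sequences of one sample by read count.

    read_counts is a hypothetical mapping {sequence: number of reads} for a
    single sample. Returns up to n tuples of
    (sequence, reads, total reads in sample, percent of total reads).
    Use n=2 for the few samples in which two larvae were pooled.
    """
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    # Sort from most to least abundant; ties keep their input order.
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    return [(seq, reads, total, 100.0 * reads / total) for seq, reads in ranked[:n]]


# Made-up read counts for one sample.
example_counts = {"ACGT": 90, "TTGA": 8, "GGCC": 2}
best = top_sequences(example_counts)[0]
print(best)
```

For a pooled sample, top_sequences(example_counts, n=2) would return the most abundant and second most abundant sequences, one per larva.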

The MeroInd_IndividualData.csv, MeroInd_SequenceAssignments.csv, MeroInd_IDMatch.csv, MeroInd_Quantification.csv, MeroInd_Env_Summary.csv and the R code MeroInd_Rcode_DataAnalysis.Rmd are used together to produce the data analysis, figures and tables found in the related article.

----------------------------------------------------------------------------------------
3. Additional related data collected that was not included in the current data package:
----------------------------------------------------------------------------------------
The raw sequencing datasets presented in this study are publicly available in the Sequence Read Archive (SRA) repository of NCBI. Bioproject name: PRJNA725248. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA725248

CTD data (temperature, salinity and fluorescence) could not be included here due to copyright issues. Most of the code can be run without these data, but please contact the corresponding author for information on how to access them if needed.

----------------------------------------------
4. Are there multiple versions of the dataset?
----------------------------------------------
No


--------------------------
METHODOLOGICAL INFORMATION
--------------------------
See: Descôteaux R, Ershova E, Wangensteen OS, Praebel K, Renaud PE, Cottier F and Bluhm BA (2021) Meroplankton Diversity, Seasonality and Life-History Traits Across the Barents Sea Polar Front Revealed by High-Throughput DNA Barcoding. Front. Mar. Sci. 8:677732. doi: 10.3389/fmars.2021.677732


---------------------------------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: PLAY_metadata.csv, PLRX_metadata.csv, ILXX_metadata.csv
---------------------------------------------------------------------------------------
1. Number of variables: 3
2. Number of cases/rows: 736 (PLAY), 480 (PLRX), 521 (ILXX)
3. Variable List:
    mjolnir_agnomens – Name given in Mjolnir (same as original sample name).
    original_samples – Original sample name. The first four letters/numbers represent the sequencing library and the last three numbers refer to the sample number in that library.
    position – The last 3 digits correspond to positions in ngsfilter_XXXX.tsv files.


-------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: ngsfilter_XXXX.tsv (19 files)
-------------------------------------------------------------
1. Number of variables: 6
2. Number of cases/rows: 96 – except IL25 (72), PLAH (64)
3. Variable List:
    *** Note that the variables are not explicitly labelled
    Library – Alphanumerical identification of each sequencing library.
    Sample – Alphanumerical identification of each sample for DNA extraction, amplification and sequencing. The first four letters/numbers represent the sequencing library and the last three numbers refer to the sample number in that library. No units.
    Library Tag – Library tag sequence.
    Forward primer – Forward primer sequence.
    Reverse primer – Reverse primer sequence.
    Position – Position in sequencing library.


----------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: MeroInd_IndividualData.csv
----------------------------------------------------------
1. Number of variables: 17
2. Number of cases/rows: 1745
3. Variable List:
    Ind.No – Unique ID for each larval individual. No units.
    Month – Month of the year when the individual was collected. No units.
    Location – Refers to north or south of the Barents Sea Polar Front. No units.
    Depth.max – Maximum (deepest) depth of zooplankton net tow in meters (m).
    Depth.min – Minimum (shallowest) depth of zooplankton net tow in meters (m).
    Visual.ID – Visual identification of each individual. Not standardized across samples but corresponds to Visual.ID in MeroInd_Quantification.csv within each sample. No units.
    Visual.ID.standardized – Visual identification of each individual, standardized across samples. No units.
    Visual.ID.comments – Special characteristics that distinguish the morphotype. Also includes comments from the sorting and extraction process, for example when two individuals were placed in the same vial for extraction, amplification, etc. No units.
    Photo – Number of the photograph of each individual within each sample. No units.
    Length – Length in micrometers (µm) of each individual. Only measured for bivalves and ophiuroids.
    Feature.measured – Body part or feature measured. No units.
    Measurement.comments – Comments about the measuring process, when appropriate. No units.
    Sample – Alphanumerical identification of each sample for DNA extraction, amplification and sequencing. Note that one sample generally corresponds to one individual larva, but in some cases two larvae were combined into one vial and therefore assigned the same sample number. The first four letters/numbers represent the sequencing library and the last three numbers refer to the sample number in that library. No units.
    Sequence – Most abundant DNA sequence, without primers or tags. Note that on the few occasions when two individuals were combined into one vial for extraction, amplification and sequencing, the most abundant and second most abundant sequences were assumed to correspond to the two larvae. No units.
    Highest.seq.counts – Number of DNA reads assigned to the sequence.
    Total.reads – Total number of DNA reads for that sample.
    Percent.highest.seq – Percent of the total reads which are assigned to the most abundant sequence. Unit = %
4. Missing data codes: NA


---------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: MeroInd_SequenceAssignments.csv
---------------------------------------------------------------
1. Number of variables: 19
2. Number of cases/rows: 180
3. Variable List:
    Sequence – DNA sequence without primers or tags.
    BOLD.ID – Closest match on Barcode of Life Database (BOLD). Labelled as “No match” when the database did not produce a match.
    BOLD.Percent – Percent match of sequence to closest match on Barcode of Life Database (BOLD). Units = %
    BLAST.ID – Closest match on NCBI Blast database. Note that BLAST.ID was “Not checked” when BOLD already produced a match >98%. Labelled as “No match” when the database did not produce a match.
    BLAST.Percent – Percent match of sequence to closest match on NCBI Blast database.
    Comments – Comments related to BOLD and BLAST matches, particularly when the sequence matched two different organisms equally or, for BLAST searches, when % query was less than 100%.
    Best.ID – Either BOLD or BLAST ID, depending on which one had the highest percent match. Labelled as “No match” when neither database produced a match.
    Best.ID.Percent – Percent match of sequence to Best.ID match.
    Best.ID.source – Which of BOLD or BLAST provided the “Best.ID”. Labelled as “No match” when neither database produced a match.
    Meroplankton – Whether the Best.ID corresponds to a meroplanktonic taxon or not. Labelled “Ambiguous” when it is uncertain whether the taxon is meroplanktonic, especially when identified at a coarse taxonomic resolution. Labelled as “No match” when neither database produced a match.
    Resolution – Taxonomic level at which the larva was identified.
    ScientificName.accepted – Accepted taxonomic name according to World Register of Marine Species (WoRMS). Labelled as “No match” when neither database produced a match.
    Kingdom – The full scientific name of the kingdom in which the taxon is classified.
    Phylum – The full scientific name of the phylum in which the taxon is classified.
    Class – The full scientific name of the class in which the taxon is classified.
    Order – The full scientific name of the order in which the taxon is classified.
    Family – The full scientific name of the family in which the taxon is classified.
    Genus – The full scientific name of the genus in which the taxon is classified.
    Species – The full scientific name of the species in which the taxon is classified.
4. Missing data codes: NA


---------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: MeroInd_IDMatch.csv
---------------------------------------------------
1. Number of variables: 5
2. Number of cases/rows: 524
3. Variable List:
    Best.ID – Closest match in BOLD or BLAST databases. See MeroInd_SequenceAssignments.csv.
    Visual.ID – Visual identification of each individual. Corresponds to Visual.ID in MeroInd_IndividualData.csv.
    Visual.ID.comments – Special characteristics that distinguish the morphotype. Also includes comments from the sorting and extraction process, for example when two individuals were placed in the same vial for extraction, amplification, etc. No units. Corresponds to Visual.ID.comments in MeroInd_IndividualData.csv.
    Number – Number of individuals which correspond to this particular combination of Best.ID, Visual.ID and Visual.ID.comments.
    Match – Decision whether the Best.ID (DNA-based) matches the Visual.ID.
4. Missing data codes: NA


----------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: MeroInd_Quantification.csv
----------------------------------------------------------
1. Number of variables: 12
2. Number of cases/rows: 293
3. Variable List:
    Location – Refers to north or south of the Barents Sea Polar Front. No units.
    Month – Month of the year when the individual was collected. No units.
    Depth.max – Maximum (deepest) depth of zooplankton net tow in meters (m).
    Depth.min – Minimum (shallowest) depth of zooplankton net tow in meters (m).
    Net.opening – Surface area in square meters (m2) of the zooplankton net opening.
    Sieved.volume – Total volume of the concentrated zooplankton sample in milliliters (mL).
    Subs.volume – Volume of the subsample quantified in milliliters (mL).
    Previous.subs – The proportion of the total volume previously taken. Only applicable when multiple subsamples were taken from the same sample.
    Visual.ID – Visual identification of each individual. Not standardized across samples but corresponds to Visual.ID in MeroInd_IndividualData.csv within each sample. No units.
    Visual.ID.standardized – Visual identification of each individual, standardized across samples. No units.
    Comments – Special characteristics that distinguish the morphotype. No units.
    Total.subs – Total number of individuals of each morphotype in the subsample.


-------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: MeroInd_Env_Summary.csv
-------------------------------------------------------
1. Number of variables: 5
2. Number of cases/rows: 27
3. Variable List:
    Location – Refers to north or south of the Barents Sea Polar Front. No units.
    Month – Month of the year when the individual was collected. No units.
    Depth.max – Maximum (deepest) depth of zooplankton tow in meters (m).
    Layer – Categorical depth layer (Surface, Intermediate 1, Intermediate 2, Deep).
    WaterMass – Water mass assignment of each layer: Atlantic Water (AW), Arctic Water (ArW), Barents Sea Water (BSW) and Surface Water (SW).


-------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: MeroInd_StationInfo.csv
-------------------------------------------------------
1. Number of variables: 5
2. Number of cases/rows: 9
3. Variable List:
    Location – Refers to north or south of the Barents Sea Polar Front. No units.
    Month – Month of the year when the individual was collected. No units.
    Date collected – Date of sample collection in format DD.MM.YYYY.
    Latitude – The geographic latitude (in decimal degrees).
    Longitude – The geographic longitude (in decimal degrees).


---------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: Photos (.jpg, 1881 files)
---------------------------------------------------------
Photos are named according to Location (N vs S), Month (Nov, Jan, Apr, Jun, Aug), Depth.max, Depth.min, Photo #.
Example: N_Apr_120_80_003.jpg is photo #3 of an individual collected between 120 and 80 m depth at the northern location in April. Note that Photo # is only unique within a sample.
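
For users who want to link photos back to the data tables, the naming convention can be decoded programmatically. The following is a minimal sketch that assumes every file name follows the underscore-separated pattern of the example exactly. The names PhotoName and parse_photo_name are hypothetical and not part of the dataset's own code.

```python
import re
from typing import NamedTuple


class PhotoName(NamedTuple):
    location: str   # "N" (north) or "S" (south) of the Polar Front
    month: str      # one of Nov, Jan, Apr, Jun, Aug
    depth_max: int  # deepest depth of the net tow (m)
    depth_min: int  # shallowest depth of the net tow (m)
    photo_no: int   # photo number, unique only within a sample


# Pattern assumed from the single example given in this README.
_PHOTO_RE = re.compile(
    r"^(?P<location>[NS])_(?P<month>Nov|Jan|Apr|Jun|Aug)_"
    r"(?P<depth_max>\d+)_(?P<depth_min>\d+)_(?P<photo_no>\d+)\.jpg$"
)


def parse_photo_name(filename: str) -> PhotoName:
    """Split a photo file name into its Location, Month, depth and number parts."""
    match = _PHOTO_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected photo file name: {filename!r}")
    return PhotoName(
        location=match.group("location"),
        month=match.group("month"),
        depth_max=int(match.group("depth_max")),
        depth_min=int(match.group("depth_min")),
        photo_no=int(match.group("photo_no")),
    )


print(parse_photo_name("N_Apr_120_80_003.jpg"))
```

Because Photo # is only unique within a sample, a lookup against MeroInd_IndividualData.csv would need to combine it with Location, Month, Depth.max and Depth.min.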