Published September 29, 2023 | Version v1
Dataset Open

SOMLIT-Astan time-series (2009-2016) rDNA 18S V4 ASV table (dada2)

  • 1. CNRS, FR 2424, ABiMS Platform, Station Biologique de Roscoff, Sorbonne Université, Roscoff, France
  • 2. CNRS, Station Biologique de Roscoff, AD2M, UMR7144, Sorbonne Université, Roscoff, France
  • 3. CNRS, Station Biologique de Roscoff, FR2424, Sorbonne Université, Roscoff, France

Description

This repository contains an rDNA 18S V4 ASV table (astan-18sv4_dada2_v1.0.filtered.table.with.taxo.lulu.tsv.gz) for the SOMLIT-Astan time-series (2009-2016). Each ASV, one per row, is described by the following fields: amplicon = ASV identifier; taxonomy = taxonomic path assigned to the ASV using IDTAXA; confidence = IDTAXA confidence scores for each taxonomic rank; sequence = ASV nucleic acid sequence; total = total number of reads across the entire dataset; spread = number of samples in which the ASV is detected; RAXXXXXX-X = number of reads in each of the 375 SOMLIT-Astan time-series samples. Sample ids encode the sampling date and the size fraction. The six digits after RA indicate the date (year, month and day), and the value after the hyphen indicates the size fraction: 02 for 0.2 to 3 µm and 3 for greater than 3 µm.
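The sample id convention described above can be decoded programmatically. The following is a minimal sketch, not part of the dataset: the function names, the regular expression and the example ids are hypothetical, and it assumes that the six digits are laid out as YYMMDD with a two-digit year in the 2000s (consistent with the 2009-2016 span).

```python
import re
from datetime import date

# Assumed layout: "RA" + YYMMDD + "-" + size fraction code ("02" or "3").
SAMPLE_ID = re.compile(r"^RA(\d{2})(\d{2})(\d{2})-(02|3)$")

# Size fraction codes as given in the description.
SIZE_FRACTIONS = {"02": "0.2 to 3 µm", "3": "greater than 3 µm"}


def parse_sample_id(sample_id):
    """Return (sampling date, size fraction label) for one sample id."""
    match = SAMPLE_ID.match(sample_id)
    if match is None:
        raise ValueError(f"not a sample id: {sample_id!r}")
    yy, mm, dd, fraction = match.groups()
    # Assumption: two-digit years belong to the 2000s.
    return date(2000 + int(yy), int(mm), int(dd)), SIZE_FRACTIONS[fraction]


def sample_columns(header):
    """Keep only the column names that look like sample ids."""
    return [name for name in header if SAMPLE_ID.match(name)]


if __name__ == "__main__":
    # Hypothetical header: the six descriptive fields plus two made-up ids.
    header = ["amplicon", "taxonomy", "confidence", "sequence",
              "total", "spread", "RA090120-02", "RA090120-3"]
    for name in sample_columns(header):
        print(name, *parse_sample_id(name))
```

To apply this to the real table, the header row can be read from the gzipped TSV with the standard gzip and csv modules and passed to sample_columns().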

How this table was generated:

The procedures used for DNA extraction and amplification of the 18S V4 region of the ribosomal operon are described in https://doi.org/10.1111/mec.16539. The eukaryote-specific primers used were TAReuk454FWD1 (5’-CCAGCASCYGCGGTAATTCC-3’, Saccharomyces cerevisiae position 565‐584) and TAReukREV3 (5’-ACTTTCGTTCTTGATYRA-3’, Saccharomyces cerevisiae position 964‐981) (Stoeck et al., 2010). Raw sequences are available at the European Nucleotide Archive (ENA) under the project id PRJEB48571.
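Both primers contain IUPAC degenerate bases (S, Y, R), so locating them in a read requires expanding those codes. The sketch below illustrates the idea with a regular expression. It is a simplified illustration, not the Cutadapt command used for this dataset: it only accepts an exact match anchored at the start of the read, whereas Cutadapt performs error-tolerant matching, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Primer sequences as given in the description (5' to 3').
TAREUK454FWD1 = "CCAGCASCYGCGGTAATTCC"
TAREUKREV3 = "ACTTTCGTTCTTGATYRA"


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex over A/C/G/T."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_primer(read, primer):
    """Remove the primer from the start of the read.

    Returns None when the primer is not found, mirroring the idea of
    discarding untrimmed reads.
    """
    match = primer_to_regex(primer).match(read.upper())
    return read[match.end():] if match else None


if __name__ == "__main__":
    # Made-up read: one concrete instance of the forward primer plus 4 bases.
    print(trim_primer("CCAGCAGCTGCGGTAATTCCACGT", TAREUK454FWD1))
```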

The paired-end fastq files obtained from sequencing were demultiplexed and primers were removed using Cutadapt v2.8, filtering out untrimmed reads. Then, forward and reverse reads were truncated at position 210, and reads with ambiguous nucleotides or with more than 2 expected errors (maxEE = 2) were filtered out using the function filterAndTrim() from the R package dada2 version 1.22 with R version 4.1.1. For each run, error rates were learned using the function learnErrors(), and reads were dereplicated using the function derepFastq() and denoised using the function dada() with default options before being merged. Remaining chimaeras were removed using the function removeBimeraDenovo(). Only amplicon sequence variants (ASVs) with at least three reads in two samples were retained. ASVs were taxonomically assigned using IDTAXA with default parameters and the PR2 database version 4.14. Finally, the LULU curation approach was applied to the ASV table to remove remaining erroneous amplicons. For more details about the bioinformatic pipeline used to generate the ASV tables, see https://gitlab.sb-roscoff.fr/nhenry/rosko-naples-bioinfo.
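To make the read-filtering criteria concrete, the sketch below reimplements them in plain Python for a single read. It is an illustration only, not the dada2 code: the function names are hypothetical, quality strings are assumed to be Phred+33 encoded, and expected errors are computed as the sum of 10^(-Q/10) over the truncated read. As in filterAndTrim(), reads shorter than the truncation length are discarded, and the ambiguity and expected-error checks are applied after truncation.

```python
TRUNC_LEN = 210  # truncation position stated in the description
MAX_EE = 2.0     # maximum number of expected errors stated in the description


def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities, EE = sum(10 ** (-Q / 10)).

    Assumes an ASCII quality string with the given Phred offset.
    """
    return sum(10 ** (-(ord(char) - offset) / 10) for char in quality)


def passes_filter(sequence, quality, trunc_len=TRUNC_LEN, max_ee=MAX_EE):
    """Apply truncation, ambiguity and expected-error checks to one read."""
    if len(sequence) < trunc_len:
        return False  # too short to be truncated at trunc_len
    truncated_seq = sequence[:trunc_len]
    truncated_qual = quality[:trunc_len]
    if "N" in truncated_seq.upper():
        return False  # ambiguous nucleotide within the retained part
    return expected_errors(truncated_qual) <= max_ee


if __name__ == "__main__":
    # Made-up reads: uniform Q40 ("I") versus uniform Q10 ("+").
    print(passes_filter("A" * 250, "I" * 250))
    print(passes_filter("A" * 250, "+" * 250))
```

With uniform Q40 qualities, each base contributes 0.0001 expected errors, so a 210-base read stays far below the threshold, whereas uniform Q10 qualities contribute 0.1 per base and exceed it after about 20 bases.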

Files (1.5 MB)