Tara Oceans (2009-2013) rDNA 18S V4 ASV table (dada2)
Creators
- 1. Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- 2. CNRS, FR 2424, ABiMS Platform, Station Biologique de Roscoff, Sorbonne Université, Roscoff, France
- 3. Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- 4. Génomique Métabolique, Génoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
- 5. Sorbonne Université, CNRS, Station Biologique de Roscoff, AD2M, UMR 7144, ECOMAP, Roscoff, France
Description
This repository contains an rDNA 18S V4 ASV table (TARA-Oceans_18S-V4_dada2_table.tsv.gz) for Tara Oceans (2009-2013) and complementary taxonomic results (TARA-Oceans_18S-V4_dada2_vsearch_taxo.tsv.gz) obtained by pairwise global alignment (VSEARCH's usearch_global command).
In the file TARA-Oceans_18S-V4_dada2_table.tsv.gz, each ASV, one per row, is described by the following fields: amplicon = ASV identifier; taxonomy = taxonomic path assigned to the ASV using IDTAXA; confidence = IDTAXA confidence scores for each taxonomic rank; total = total number of reads for the entire dataset; spread = number of samples in which the ASV is detected; sequence = ASV nucleic acid sequence; TARA_XXXXXXXXXX = number of reads in each of the 1,011 Tara Oceans samples.
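The field definitions above imply two simple invariants: total should equal the sum of the per-sample read counts, and spread should equal the number of samples with a non-zero count. The following Python sketch checks them on a tiny in-memory stand-in for the table. The sample names and all values are invented for illustration, and the real file would have to be opened with gzip.open() first.

```python
import csv
import io

# Tiny stand-in for TARA-Oceans_18S-V4_dada2_table.tsv.gz (all values invented).
# The real file would be opened with gzip.open(path, "rt", newline="").
SAMPLE_TSV = (
    "amplicon\ttaxonomy\tconfidence\ttotal\tspread\tsequence\tTARA_A\tTARA_B\tTARA_C\n"
    "asv1\tEukaryota\t100\t12\t2\tACGT\t7\t0\t5\n"
    "asv2\tEukaryota\t98\t4\t3\tTTGA\t1\t2\t1\n"
)

def read_asv_table(handle):
    """Yield one dict per ASV with per-sample read counts parsed as integers."""
    reader = csv.DictReader(handle, delimiter="\t")
    sample_cols = [c for c in reader.fieldnames if c.startswith("TARA_")]
    for row in reader:
        yield {
            "amplicon": row["amplicon"],
            "total": int(row["total"]),
            "spread": int(row["spread"]),
            "counts": {s: int(row[s]) for s in sample_cols},
        }

def is_consistent(asv):
    """total should match the summed reads; spread the number of non-zero samples."""
    counts = list(asv["counts"].values())
    return asv["total"] == sum(counts) and asv["spread"] == sum(c > 0 for c in counts)

asvs = list(read_asv_table(io.StringIO(SAMPLE_TSV)))
checks = {a["amplicon"]: is_consistent(a) for a in asvs}
print(checks)
```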
In the file TARA-Oceans_18S-V4_dada2_vsearch_taxo.tsv.gz, each ASV is described by the following fields: amplicon = ASV identifier; similarity = percentage of similarity with the most similar reference sequence(s) (best hit(s)) in PR2 v4.14; taxonomy = last common ancestor (LCA) of the best hit(s); refs = identifier(s) of the best hit(s).
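As an illustration of the LCA idea, the sketch below takes the taxonomic paths of several equally good hits and keeps only the leading ranks they all share. It is not the script used to build the file: the function name, the "|" separator and the example paths are assumptions made for this example.

```python
def last_common_ancestor(paths, sep="|"):
    """Return the longest shared prefix of rank names across taxonomic paths."""
    split_paths = [p.split(sep) for p in paths]
    shared = []
    # zip() walks the paths rank by rank and stops at the shortest path.
    for ranks in zip(*split_paths):
        if len(set(ranks)) != 1:
            break  # the hits disagree from this rank downwards
        shared.append(ranks[0])
    return sep.join(shared)

# Two hypothetical best hits that agree down to the third rank only.
best_hits = [
    "Eukaryota|Alveolata|Dinoflagellata|Dinophyceae",
    "Eukaryota|Alveolata|Dinoflagellata|Syndiniales",
]
lca = last_common_ancestor(best_hits)
print(lca)
```

With a single best hit, the function returns that hit's full path unchanged.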
Detailed information about DNA extraction, PCR amplification and Illumina sequencing of metabarcodes, as well as subsequent sequence data cleaning and taxonomic assignment, can be found in de Vargas et al. (2015) and Alberti et al. (2017). Briefly, DNA samples were amplified by PCR targeting the hypervariable region V4 (385 ± 4 base pairs in length; primer pair TAReuk454FWD1 5’-CCAGCASCYGCGGTAATTCC-3’ and TAReukREV3 5’-ACTTTCGTTCTTGATYRA-3’; Stoeck et al. 2010) of the 18S rRNA marker gene, followed by high-throughput sequencing of the amplicons (de Vargas et al. 2015). The details of PCR mixes, thermocycling and sequencing conditions are provided in Alberti et al. (2017).
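Both primers contain degenerate IUPAC bases (S, Y and R), so each one stands for several concrete sequences. The Python sketch below expands a primer into a regular expression to make this explicit. It performs exact matching only and is an illustration, not the matching procedure of any particular tool. The read used in the example is invented.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_to_regex(primer):
    """Compile a primer with degenerate bases into an exact-match regex."""
    parts = []
    for base in primer:
        choices = IUPAC[base]
        parts.append(choices if len(choices) == 1 else "[" + choices + "]")
    return re.compile("".join(parts))

FORWARD = "CCAGCASCYGCGGTAATTCC"  # TAReuk454FWD1
REVERSE = "ACTTTCGTTCTTGATYRA"    # TAReukREV3

forward_re = primer_to_regex(FORWARD)
reverse_re = primer_to_regex(REVERSE)

# Invented read that begins with one concrete variant of the forward primer.
read = "CCAGCAGCCGCGGTAATTCCAGCTCCAATAGCG"
starts_with_forward = forward_re.match(read) is not None
print(forward_re.pattern, starts_with_forward)
```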
The resulting paired-end reads were mixed-oriented, meaning that both the R1 and R2 files contain a mix of forward and reverse reads. PCR primer sequences were trimmed from the paired-end reads using Cutadapt v2.7 (Martin, 2011), and the reads were dispatched into four files: two for the classical orientation (forward reads in R1 and reverse reads in R2) and two for the opposite orientation (reverse reads in R1 and forward reads in R2). Read pairs that did not contain both primers were filtered out using the option --discard-untrimmed. Forward and reverse reads were truncated at position 215, and reads with ambiguous nucleotides or with more than 2 expected errors (maxEE) were filtered out using the function filterAndTrim() from the R package dada2 (Callahan et al., 2016). For each run and read orientation, error rates were estimated using the function learnErrors(), and reads were denoised using the dada() function with pool = TRUE before being merged using mergePairs() with default parameters. Reads of both orientations from the same sample, along with sequencing replicates, were summed together. Remaining chimeras were removed using the function removeBimeraDenovo(). Scripts producing the ASV table are publicly available here: https://gitlab.univ-nantes.fr/combi-ls2n/taradada.
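To make the quality filter concrete, the sketch below reimplements its logic in plain Python. It is not dada2 code. It assumes the usual definition of expected errors, EE = sum(10^(-Q/10)) over the Phred scores of the truncated read, and it assumes that reads shorter than the truncation length are discarded. The function names and the example reads are invented for illustration.

```python
def expected_errors(qualities):
    """Sum of per-base error probabilities: EE = sum(10^(-Q/10))."""
    return sum(10 ** (-q / 10) for q in qualities)

def passes_filter(sequence, qualities, trunc_len=215, max_ee=2.0):
    """Truncate to trunc_len, then reject short, ambiguous or error-prone reads."""
    if len(sequence) < trunc_len:
        return False  # assumption: reads shorter than trunc_len are discarded
    seq = sequence[:trunc_len]
    quals = qualities[:trunc_len]
    if "N" in seq:
        return False  # ambiguous nucleotide within the retained part
    return expected_errors(quals) <= max_ee

# Invented reads of 250 bases with uniform quality scores.
high_quality = passes_filter("A" * 250, [30] * 250)   # EE is about 0.215
low_quality = passes_filter("A" * 250, [20] * 250)    # EE is about 2.15
ambiguous = passes_filter("A" * 100 + "N" + "A" * 149, [40] * 250)
too_short = passes_filter("A" * 200, [40] * 200)
print(high_quality, low_quality, ambiguous, too_short)
```

A read with a uniform quality of Q30 accumulates about 0.215 expected errors over 215 bases and is kept, whereas the same read at Q20 accumulates about 2.15 and is removed.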
To remove potentially spurious ASVs, ASVs with fewer than 3 reads or present in only one sample were filtered out.
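Expressed in terms of the total and spread fields, this filter keeps an ASV only when total is at least 3 and spread is at least 2. A minimal Python sketch follows. The function name, parameter names and example values are invented for illustration.

```python
def keep_asv(total, spread, min_reads=3, min_samples=2):
    """Keep an ASV only if it has enough reads and occurs in enough samples."""
    return total >= min_reads and spread >= min_samples

# (amplicon, total, spread) triples with invented values.
candidates = [
    ("asv_a", 2, 2),     # fewer than 3 reads: removed
    ("asv_b", 150, 1),   # present in a single sample: removed
    ("asv_c", 3, 2),     # meets both criteria: kept
]
retained = [name for name, total, spread in candidates if keep_asv(total, spread)]
print(retained)
```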
ASVs were taxonomically assigned using IDTAXA (50% confidence threshold) (Murali, Bhargava, and Wright 2018) with the PR2 database version 4.14 (Guillou et al. 2013) and results were added to the ASV table. Scripts used for the taxonomic assignment (IDTAXA) and the pairwise comparison (VSEARCH) are available here: https://gitlab.sb-roscoff.fr/nhenry/abims-metabarcoding-taxonomic-assignment/-/tree/v1.0.1
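The effect of a confidence threshold can be sketched as follows: the taxonomic path is kept down to the last rank whose confidence reaches the threshold, and deeper ranks are dropped. This is a simplified Python illustration, not the IDTAXA implementation. It assumes that the path and its confidence scores are available as two parallel lists, and the example values are invented.

```python
def truncate_assignment(ranks, confidences, threshold=50.0):
    """Keep leading ranks until the confidence first drops below the threshold."""
    kept = []
    for rank, confidence in zip(ranks, confidences):
        if confidence < threshold:
            break  # this rank and all deeper ranks are left unassigned
        kept.append(rank)
    return kept

# Invented assignment whose confidence decreases towards finer ranks.
ranks = ["Eukaryota", "Alveolata", "Dinoflagellata", "Dinophyceae"]
confidences = [100.0, 97.5, 81.2, 42.0]
assigned = truncate_assignment(ranks, confidences)
print(assigned)
```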
Alberti, A., Poulain, J., Engelen, S., Labadie, K., Romac, S., Ferrera, I., Albini, G., Aury, J.-M., Belser, C., Bertrand, A., Cruaud, C., Da Silva, C., Dossat, C., Gavory, F., Gas, S., Guy, J., Haquelle, M., Jacoby, E., Jaillon, O., Lemainque, A., Pelletier, E., Samson, G., Wessner, M., Genoscope Technical Team, Bazire, P., Beluche, O., Bertrand, L., Besnard-Gonnet, M., Bordelais, I., Boutard, M., Dubois, M., Dumont, C., Ettedgui, E., Fernandez, P., Garcia, E., Aiach, N.G., Guerin, T., Hamon, C., Brun, E., Lebled, S., Lenoble, P., Louesse, C., Mahieu, E., Mairey, B., Martins, N., Megret, C., Milani, C., Muanga, J., Orvain, C., Payen, E., Perroud, P., Petit, E., Robert, D., Ronsin, M., Vacherie, B., Acinas, S.G., Royo-Llonch, M., Cornejo-Castillo, F.M., Logares, R., Fernández-Gómez, B., Bowler, C., Cochrane, G., Amid, C., Hoopen, P.T., De Vargas, C., Grimsley, N., Desgranges, E., Kandels-Lewis, S., Ogata, H., Poulton, N., Sieracki, M.E., Stepanauskas, R., Sullivan, M.B., Brum, J.R., Duhaime, M.B., Poulos, B.T., Hurwitz, B.L., Tara Oceans Consortium Coordinators, Acinas, S.G., Bork, P., Boss, E., Bowler, C., De Vargas, C., Follows, M., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Jaillon, O., Kandels-Lewis, S., Karp-Boss, L., Karsenti, E., Not, F., Ogata, H., Pesant, S., Raes, J., Sardet, C., Sieracki, M.E., Speich, S., Stemmann, L., Sullivan, M.B., Sunagawa, S., Wincker, P., Pesant, S., Karsenti, E., Wincker, P., 2017. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition. Sci Data 4, 170093. https://doi.org/10.1038/sdata.2017.93
Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A., Holmes, S.P., 2016. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583. https://doi.org/10.1038/nmeth.3869
Guillou, L., Bachar, D., Audic, S., Bass, D., Berney, C., Bittner, L., Boutte, C., Burgaud, G., de Vargas, C., Decelle, J., del Campo, J., Dolan, J.R., Dunthorn, M., Edvardsen, B., Holzmann, M., Kooistra, W.H.C.F., Lara, E., Le Bescot, N., Logares, R., Mahé, F., Massana, R., Montresor, M., Morard, R., Not, F., Pawlowski, J., Probert, I., Sauvadet, A.-L., Siano, R., Stoeck, T., Vaulot, D., Zimmermann, P., Christen, R., 2013. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Research 41, D597–D604. https://doi.org/10.1093/nar/gks1160
Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j. 17, 10. https://doi.org/10.14806/ej.17.1.200
Murali, A., Bhargava, A., Wright, E.S., 2018. IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 6, 140. https://doi.org/10.1186/s40168-018-0521-5
de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, F., Logares, R., Lara, E., Berney, C., Le Bescot, N., Probert, I., Carmichael, M., Poulain, J., Romac, S., Colin, S., Aury, J.-M., Bittner, L., Chaffron, S., Dunthorn, M., Engelen, S., Flegontova, O., Guidi, L., Horák, A., Jaillon, O., Lima-Mendez, G., Lukeš, J., Malviya, S., Morard, R., Mulot, M., Scalco, E., Siano, R., Vincent, F., Zingone, A., Dimier, C., Picheral, M., Searson, S., Kandels-Lewis, S., Tara Oceans Coordinators, Acinas, S.G., Bork, P., Bowler, C., Gorsky, G., Grimsley, N., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Raes, J., Sieracki, M.E., Speich, S., Stemmann, L., Sunagawa, S., Weissenbach, J., Wincker, P., Karsenti, E., Boss, E., Follows, M., Karp-Boss, L., Krzic, U., Reynaud, E.G., Sardet, C., Sullivan, M.B., Velayoudon, D., 2015. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605. https://doi.org/10.1126/science.1261605
Files
(39.4 MB)
| Checksum | Size |
|---|---|
| md5:fdf02f7f9aebbc3cb33a4dc69835d8b3 | 31.6 MB |
| md5:30aeb8f9fd47faf547b9b1c191db0378 | 7.8 MB |