Are we overestimating protistan diversity in nature?*
Description
*Supplemental information for submitted publication: David A. Caron & Sarah K. Hu. Are we overestimating protistan diversity in nature? In Review. Trends in Microbiology.
18S rRNA gene tag-sequencing dataset
Summary: Single-celled microbial eukaryotes (protists) fulfill fundamental roles in carbon fixation, energy flow, elemental transfer, decomposition, and disease in virtually all environments on the planet. Until recently it has been difficult for ecologists to determine the full breadth of protistan species richness in natural communities using traditional morphology-based taxonomy, but high-throughput sequencing (HTS) of target genes has revealed unprecedented and unexpected species richness. The most common use of HTS is to cluster sequences into approximately species-level designations, often as Operational Taxonomic Units (OTUs); yet recent estimates of total OTUs appear to be approaching the total number of individual protists in a sample. Thus, are we overestimating protistan diversity in nature? To explore this question, we processed a set of sequences using several common approaches (Figure 1, Schematic_V4tagseqsamples.pdf). A full description of sample collection, processing, and analysis is also available below.
Data availability & contents:
- All sequence data are publicly available from the Sequence Read Archive (SRA) under SRA ID: SRP110149. Sequences amplified using the Stoeck et al. (2010) primers are under accession numbers SAMN07211761-SAMN07211764, and those amplified with the Balzano et al. (2015) primers are under SAMN07211767-SAMN07211770.
- QIIME2 artifact files for running closed-reference, de novo, and open-reference OTU clustering: derep_table.qza, derep_seqs.qza. To run these, you will need the PR2 database (https://github.com/vaulot/pr2database).
- QIIME2 artifact file of trimmed reads for running QC and determining Amplicon Sequence Variants: demux_trimmed.qza
- Schematic_V4tagseqsamples.pdf - representation of sample collection, processing, and analysis (See below for full description).
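Before importing the .qza files into an analysis, it can help to confirm what each artifact contains. The sketch below assumes the standard QIIME 2 archive layout, in which a .qza file is a zip archive holding a single top-level directory (named by the artifact UUID) with a flat `metadata.yaml` file. The function name `describe_artifact` is illustrative and not part of QIIME 2.

```python
import sys
import zipfile


def describe_artifact(path):
    """Return the key/value pairs (uuid, type, format) from a .qza metadata.yaml."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: the archive has one top-level directory containing metadata.yaml.
        metadata_names = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not metadata_names:
            raise ValueError(f"no top-level metadata.yaml found in {path}")
        text = archive.read(metadata_names[0]).decode("utf-8")

    # metadata.yaml is a flat "key: value" file, so a YAML parser is not required.
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip().strip("'\"")
    return info


if __name__ == "__main__":
    # Usage: python describe_qza.py derep_table.qza derep_seqs.qza demux_trimmed.qza
    for artifact_path in sys.argv[1:]:
        print(artifact_path, describe_artifact(artifact_path))
```

The reported `type` indicates which QIIME2 actions accept a given artifact as input.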
Sample collection and PCR amplification
Seawater was collected in April 2014 at the San Pedro Ocean Time-series station (SPOT; 33°33′N, 118°24′W) from 5 m, the subsurface chlorophyll maximum (SCM, 32 m), 150 m, and 890 m. Water was sequentially pre-filtered through 200 µm and 80 µm mesh to reduce the presence of multicellular eukaryotes (metazoa), then vacuum filtered onto 47 mm GF/F filters (nominal pore size 0.7 µm; Whatman International Ltd, Florham Park, NJ, USA) and immediately flash frozen for RNA extraction. Details can be found in Hu et al. (2016) and at protocols.io (dx.doi.org/10.17504/protocols.io.hisb4ee).
Total extracted RNA was reverse transcribed into cDNA (iScript Reverse Transcription Supermix; Bio-Rad Laboratories, Hercules, CA, USA, #170-8840). cDNA was PCR amplified using either the Stoeck et al. (2010) V4 primers [FWD 5’-CCAGCASCYGCGGTAATTCC-3’, REV 5’-ACTTTCGTTCTTGATYRA-3’] or the Balzano et al. (2015) V4 primers [FWD 5’-CCAGCASCYGCGGTAATTCC-3’, REV 5’-ACTTTCGTTCTTGATYRR-3’], the latter having an extra degenerate nucleotide on the 3’ end of the reverse primer. PCR reactions for the two primer sets were identical apart from the V4 primers added, and consisted of 1X Phusion High-Fidelity DNA polymerase (New England Biolabs, Ipswich, MA, USA, #M0530S), 200 µM of dNTPs, 0.5 µM of each V4 forward and reverse primer, 3% DMSO, 50 mM of MgCl2, and 5 ng of cDNA starting material.
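The difference between the two reverse primers can be made concrete by expanding their IUPAC degenerate bases. The following is a minimal sketch using the primer sequences quoted above and the standard IUPAC nucleotide codes. The helper names are illustrative.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

STOECK_REV = "ACTTTCGTTCTTGATYRA"
BALZANO_REV = "ACTTTCGTTCTTGATYRR"


def expand_primer(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def differing_positions(a, b):
    """Return 0-based positions where two equal-length primers differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    stoeck = set(expand_primer(STOECK_REV))
    balzano = set(expand_primer(BALZANO_REV))
    print(len(stoeck), len(balzano))                     # 4 and 8 variants
    print(differing_positions(STOECK_REV, BALZANO_REV))  # only the 3' terminal base
    print(stoeck <= balzano)                             # Stoeck variants are a subset
```

Because R encodes A or G, every sequence matched by the Stoeck reverse primer is also covered by the Balzano reverse primer, which additionally matches templates with G at that final position.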
The PCR thermal profile for the Stoeck et al. (2010) V4 primers consisted of a 98°C denaturation step for 30 seconds (s), followed by 10 cycles of 10 s at 98°C, 30 s at 53°C, and 30 s at 72°C, then 15 cycles of 10 s at 98°C, 30 s at 48°C, and 30 s at 72°C, and a final elongation step at 72°C for ten minutes, as described in Rodríguez-Martínez et al. (2012). The thermal profile for the Balzano et al. (2015) PCR reactions consisted of a 98°C denaturation step for 30 s, followed by 14 cycles of 30 s at 98°C, 30 s at 53°C, and 30 s at 72°C, then 21 cycles of 30 s at 98°C, 30 s at either 53°C or 48°C (duplicate reactions were run at each temperature), and 30 s at 72°C, and a final elongation step at 72°C for one minute (Balzano et al. 2015). PCR products were purified using AMPure XP beads (manufacturer info) and quality checked on the Qubit fluorometer (v. 2.0, Life Technologies) and Agilent Bioanalyzer (information). Extraction and PCR amplification steps can be found at protocols.io (dx.doi.org/10.17504/protocols.io.hk3b4yn and dx.doi.org/10.17504/protocols.io.hdmb246).
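To compare the two thermal profiles at a glance, they can be written down as data and summed. This sketch encodes only the hold times and temperatures stated above; ramp times between steps are not given and are ignored, so the totals are lower bounds on actual run time. The dictionary layout and function names are illustrative.

```python
def total_cycles(profile):
    """Total number of amplification cycles across all stages."""
    return sum(stage["cycles"] for stage in profile["stages"])


def hold_seconds(profile):
    """Sum of programmed hold times in seconds (ramp times are ignored)."""
    total = profile["initial_denaturation_s"] + profile["final_elongation_s"]
    for stage in profile["stages"]:
        total += stage["cycles"] * sum(seconds for _temp_c, seconds in stage["steps"])
    return total


# Each step is (temperature in degrees C, hold time in seconds).
STOECK = {
    "initial_denaturation_s": 30,
    "stages": [
        {"cycles": 10, "steps": [(98, 10), (53, 30), (72, 30)]},
        {"cycles": 15, "steps": [(98, 10), (48, 30), (72, 30)]},
    ],
    "final_elongation_s": 600,
}


def balzano_profile(second_stage_anneal_c):
    """Balzano profile; the second-stage annealing temperature was 53 or 48 C."""
    return {
        "initial_denaturation_s": 30,
        "stages": [
            {"cycles": 14, "steps": [(98, 30), (53, 30), (72, 30)]},
            {"cycles": 21, "steps": [(98, 30), (second_stage_anneal_c, 30), (72, 30)]},
        ],
        "final_elongation_s": 60,
    }


if __name__ == "__main__":
    print(total_cycles(STOECK), hold_seconds(STOECK))
    for anneal_c in (53, 48):
        profile = balzano_profile(anneal_c)
        print(anneal_c, total_cycles(profile), hold_seconds(profile))
```

Under these assumptions the Stoeck profile runs 25 cycles with 2380 s of programmed holds, and the Balzano profile runs 35 cycles with 3240 s.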
Sequence analysis
Raw sequences were quality filtered and clustered into OTUs or ASVs using both QIIMEv1 and QIIMEv2. Bioinformatic pipelines are available on GitHub (https://github.com/shu251). Initial quality control for the QIIMEv1 pipeline consisted of merging paired-end sequences, quality filtering (Q>20), removing V4 primers (cutadapt), and filtering sequences by length (sequences longer than 500 bp or shorter than 150 bp were removed). In QIIMEv2, V4 primers were removed (cutadapt), sequences were filtered for quality (Q>20), and paired-end reads were merged and dereplicated.
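Two of the steps above, length filtering and dereplication, are simple enough to illustrate directly. The sketch below is a stand-in written in plain Python, not the QIIME or vsearch implementation; it takes the 150 bp and 500 bp bounds from the text and treats them as inclusive, which is an assumption. Quality filtering and read merging are not covered.

```python
from collections import Counter

MIN_LEN = 150  # sequences shorter than this are removed
MAX_LEN = 500  # sequences longer than this are removed


def filter_by_length(seqs, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep sequences whose length lies within [min_len, max_len]."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


def dereplicate(seqs):
    """Collapse identical sequences into a {sequence: abundance} mapping."""
    return dict(Counter(seqs))


if __name__ == "__main__":
    # Toy reads: lengths 149 (dropped), 150 (kept, seen twice), 501 (dropped).
    reads = ["A" * 149, "C" * 150, "C" * 150, "G" * 501]
    kept = filter_by_length(reads)
    for seq, abundance in dereplicate(kept).items():
        print(len(seq), abundance)
```

Dereplicating before clustering means each unique sequence is compared once while its abundance is carried along, which is presumably what derep_seqs.qza and derep_table.qza record, respectively.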
Before OTU clustering in QIIMEv1, quality-checked sequences were combined and chimeras were removed using a reference-based approach (vsearch). Open-reference OTU clustering was conducted (Rideout et al.) with step 4 suppressed. De novo OTU clustering, closed-reference OTU clustering, and Amplicon Sequence Variant (ASV) determination were conducted in QIIMEv2. Following OTU or ASV clustering, chimeras were removed using a reference-based approach (vsearch). For reference-based bioinformatic steps (chimera removal, open-reference OTU clustering, and closed-reference OTU clustering), the Protist Ribosomal Reference database (PR2) v4.75 (https://github.com/vaulot/pr2database) was used. OTU or ASV tables were compiled in R, and all code is available on GitHub (https://github.com/shu251).
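The three OTU clustering strategies named above differ mainly in how they treat sequences that do not match the reference database. The toy sketch below illustrates that difference only. It is not how QIIME or vsearch work internally: identity is computed by position-wise comparison of equal-length strings rather than by alignment, and the similarity threshold is a hypothetical parameter, since no threshold is stated here.

```python
def identity(a, b):
    """Fraction of matching positions; a crude stand-in for alignment identity."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(seqs, references, threshold):
    """Assign each sequence to its best reference hit; return (otus, failures)."""
    otus, failures = {}, []
    for seq in seqs:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in references.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            otus.setdefault(best_id, []).append(seq)
        else:
            failures.append(seq)  # closed-reference discards these reads
    return otus, failures


def de_novo(seqs, threshold, prefix="denovo"):
    """Greedy centroid clustering without any reference database."""
    centroids, otus = [], {}
    for seq in seqs:
        for otu_id, centroid in centroids:
            if identity(seq, centroid) >= threshold:
                otus[otu_id].append(seq)
                break
        else:
            otu_id = f"{prefix}{len(centroids)}"
            centroids.append((otu_id, seq))
            otus[otu_id] = [seq]
    return otus


def open_reference(seqs, references, threshold):
    """Closed-reference first, then de novo clustering of the failures."""
    otus, failures = closed_reference(seqs, references, threshold)
    otus.update(de_novo(failures, threshold))
    return otus
```

With a single reference and reads from three distinct lineages, closed-reference clustering yields one OTU and discards the rest, while de novo and open-reference clustering both retain all three lineages; the open-reference result additionally ties the matching lineage to a named reference.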
References
Balzano S, Abs E, Leterme SC. 2015. Protist diversity along a salinity gradient in a coastal lagoon. Mar 25;74:263-277.
Hu, S. K. and others 2016. Protistan diversity and activity inferred from RNA and DNA at a coastal ocean site in the eastern North Pacific. FEMS Microbiol. Ecol. 92: 1-13.
Rodríguez-Martínez R, Rocap G, Logares R, Romac S, Massana R. 2012. Low evolutionary diversification in a widespread and abundant uncultured protist (MAST-4). Mol Biol Evol. May 01;29:1393-1406.
Stoeck T, Bass D, Nebel M, Christen R, Jones MDM, Breiner H-W, Richards TA. 2010. Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Mol Ecol. Mar 01;19:21-31.
Files
Schematic_V4tagseqsamples.pdf
Total size: 3.3 GB
| MD5 checksum | Size |
|---|---|
| ae9b2dace934a9cffa032ac32d6dc99b | 2.9 GB |
| 853c0e6aacf0391451bef560a9d5ba7d | 325.8 MB |
| a9f2583c618ad0bfd08c3d1cc7d232da | 94.7 MB |
| be24008d03ba0eaea3eceb7c5cacff9e | 11.2 MB |
| 6bd73ef3d44c4f97c0762f1508aae99d | 836.2 kB |
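The MD5 checksums listed for the files above can be used to confirm that a download completed intact. A minimal sketch using Python's standard `hashlib` follows; the file is read in chunks so that the multi-gigabyte artifact does not need to fit in memory. The script name and function names are illustrative.

```python
import hashlib
import sys


def md5sum(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5sum(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Usage: python verify_md5.py <downloaded_file> <md5_from_the_table>
    file_path, expected = sys.argv[1], sys.argv[2]
    print("OK" if verify(file_path, expected) else "MISMATCH")
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again before use.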