Published October 12, 2021 | Version v2
Dataset Open

Plant Communities at the Eden Project, UK, derived from soil eDNA

  • 1. Centre for Ecology and Conservation, University of Exeter, Penryn, UK
  • 2. Evolution et Diversité Biologique (EDB UMR 5174), Université Toulouse 3 Paul Sabatier

Description

The project seeks to understand the potential of eDNA collected from soil for characterising plant communities. To do so, soils were sampled at the Eden Project in Cornwall, UK, within the two covered biomes, where the structure and composition of the plant communities are well understood (and further quantified with above-ground plant coverage inventories). A total of 32 plots were established across 10 different plant assemblages, each of which experiences subtle differences in soil chemistry and microclimate. Each plot consists of a 2 x 2 m quadrat, with four soil aggregates collected at each corner.

eDNA was then extracted and amplified following the methods detailed in Zinger et al. (2016) and Donald et al. (2021). The primers used targeted the P6 loop of the chloroplast trnL intron [primer_fwd: GGGCAATCCTGAGCCAA, primer_rev: CCATTGAGTCTCTGCACCTATC] (Taberlet et al. 2007). To account for potential errors generated during sample processing, 16 extraction, 54 sequencing, and 16 PCR controls were included, along with a mock community (4 positive controls) of 10 known plant sequences to guide filtering thresholds. PCR products were pooled and sequencing libraries were constructed using the Illumina TruSeq NanoPCRFree kit following the supplier’s instructions (Illumina Inc., San Diego, California, USA), except that the ligation product was not PCR amplified, to limit tag-jump biases (Taberlet et al. 2018). The libraries were then sequenced on an Illumina HiSeq platform (San Diego, CA, USA).
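As an illustration of how the two primers delimit the barcode, the following minimal Python sketch locates the forward primer and the reverse complement of the reverse primer in a template and returns the region between them. The primer sequences are those given above; the function names and the toy template are hypothetical, and only exact matches are considered, which is stricter than a real amplification or demultiplexing step.

```python
# Primer sequences as listed in the description above.
FWD = "GGGCAATCCTGAGCCAA"
REV = "CCATTGAGTCTCTGCACCTATC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_barcode(template, fwd=FWD, rev=REV):
    """Return the region between the primers, or None if either is missing.

    Exact matching only: this is a simplification for illustration.
    """
    template = template.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    start += len(fwd)
    # On the forward strand the reverse primer appears reverse-complemented.
    end = template.find(reverse_complement(rev), start)
    if end == -1:
        return None
    return template[start:end]


# Invented insert, for illustration only (not a real P6 loop sequence).
toy_insert = "ATCCGTATTATAGGAACAATAATTTTATTTTCTAGAAAAGG"
toy_template = "TTT" + FWD + toy_insert + reverse_complement(REV) + "AAA"
barcode = extract_barcode(toy_template)
print(barcode)
```

A template lacking either primer site yields None, so sequences that would not be amplified are simply skipped.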

Sequence analysis was conducted on the GenoToul bioinformatics platform (Toulouse, France) with the OBITOOLS package (Boyer et al. 2016). The sequence data were processed using the following steps. First, ‘illuminapairedend’ was used to assemble paired-end reads. This program is based on an exact alignment algorithm that considers the quality scores at all positions during the assembly process. Subsequently, we used the ‘ngsfilter’ command to identify and remove the primers and tags on each read, and to assign reads to their respective samples (NGS filter file provided: ngsfilter_TRNL_PLANTS_EDEN_PROJECTb.txt). This program was used with its default parameters, tolerating two mismatches for each of the two primers and no mismatch for the tags. Following this, sequencing reads were dereplicated using the ‘obiuniq’ command. The resulting data.uniq.fasta file is supplied here. Sequences of low quality (containing Ns or with paired-end alignment scores below 50) and sequences represented by only one read (singletons) were then removed using the ‘obigrep’ command. To remove PCR/sequencing errors as well as intraspecific variability, we built OTUs (Operational Taxonomic Units) using the ‘sumaclust’ clustering algorithm (Mercier et al. 2013), which considers the most abundant sequence of each cluster as the cluster representative. OTUs were defined at a sequence similarity threshold of 95%. To assign a taxon to plant OTUs, we built a reference sequence database using the ecoPCR programme (Ficetola et al. 2010) on the European Molecular Biology Laboratory nucleotide database (EMBL; release 141). OTUs were then assigned a taxonomy using the OBITOOLS ecotag programme (Boyer et al. 2016), which performs a global alignment of each OTU sequence (the query) against each reference. The reference taxon assigned to each OTU corresponds to the Last Common Ancestor of all the best-match sequences for the query.
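The Last Common Ancestor rule used for taxonomic assignment can be illustrated with a short Python sketch. This is a simplified reading of the rule as described above, not ecotag's actual implementation; the function names, reference identifiers, similarity values and lineages are all hypothetical.

```python
def last_common_ancestor(lineages):
    """Return the shared root-first prefix of several lineages."""
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def assign_taxon(similarities, lineages):
    """Assign a query to the LCA of its best-matching references.

    similarities: reference id -> similarity to the query (0 to 1)
    lineages:     reference id -> lineage as a root-first list of taxa
    Returns (best similarity, assigned taxon or None).
    """
    best = max(similarities.values())
    best_refs = [ref for ref, sim in similarities.items() if sim == best]
    lca = last_common_ancestor([lineages[ref] for ref in best_refs])
    return best, (lca[-1] if lca else None)


# Hypothetical example: two equally good matches from the same genus.
toy_similarities = {"refA": 1.0, "refB": 1.0, "refC": 0.9}
toy_lineages = {
    "refA": ["Viridiplantae", "Fabaceae", "Acacia", "Acacia dealbata"],
    "refB": ["Viridiplantae", "Fabaceae", "Acacia", "Acacia longifolia"],
    "refC": ["Viridiplantae", "Poaceae", "Poa", "Poa annua"],
}
best_similarity, taxon = assign_taxon(toy_similarities, toy_lineages)
print(best_similarity, taxon)
```

Because the two best matches differ at species level, the query is assigned to the genus; this is why some OTUs end up with a coarser taxonomic resolution than others.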


Datasets were subsequently filtered to remove contaminants as well as artefacts such as PCR chimeras and remaining sequencing errors, using routines implemented in the metabaR R package (Zinger et al. 2021), in R version 3.6.1 (R Development Core Team, 2013). The filtering process consisted of four steps: (i) A negative control-based filtering. OTUs whose maximum abundance was found in extraction/PCR negative controls were removed from the dataset, as they were likely to be reagent/aerosol contaminants, which are better amplified in the absence of competing DNA fragments than they are in biological samples. (ii) A reference-based filtering. OTUs that are too dissimilar from sequences available in reference databases are potential chimeras generated during sequencing and amplification. In this study, we set the similarity threshold at 100%. (iii) An abundance-based filtering. This procedure targets the incorrect assignment of a small number of sequences from true OTUs to the wrong sample, a phenomenon called “tag-switching”. It consists of setting OTU abundances to 0 in samples where their abundance represents < 0.03% of the total OTU abundance in the entire dataset. (iv) Finally, a PCR-based filtering. Any PCR reaction that yielded fewer than 1000 reads was considered non-functional and removed from the dataset. The script used to implement these steps is provided (metabaR_Eden_Plants_100sim.html), with sequence data processed to remove contaminants, OTUs of low taxonomic resolution, and PCRs with too low a read count. The cleaned data are provided (eden_plant_postclean_100sim.rds).
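The four filtering steps can be sketched on a toy read-count table. The Python sketch below is not the metabaR code used for this dataset (that is in the supplied script); it only mirrors the logic described above. The function name, data layout and toy values are hypothetical, and the order of operations (thresholds applied to raw counts, PCR read totals computed after steps i to iii) is an assumption.

```python
def clean_counts(counts, controls, best_identity,
                 min_identity=1.0, min_fraction=0.0003, min_reads=1000):
    """Apply the four filters to counts[pcr][otu] = number of reads.

    controls:      set of PCR ids that are negative controls
    best_identity: otu -> similarity to its best reference (0 to 1)
    Returns the filtered table for sample PCRs only.
    """
    otus = {otu for reads in counts.values() for otu in reads}
    samples = [pcr for pcr in counts if pcr not in controls]
    negatives = [pcr for pcr in counts if pcr in controls]

    def peak(otu, pcrs):
        return max((counts[pcr].get(otu, 0) for pcr in pcrs), default=0)

    # (i) OTUs whose maximum abundance occurs in a negative control.
    contaminants = {o for o in otus if peak(o, negatives) > peak(o, samples)}
    # (ii) OTUs too dissimilar from their best reference sequence.
    dissimilar = {o for o in otus if best_identity.get(o, 0.0) < min_identity}
    kept = otus - contaminants - dissimilar

    # (iii) Zero counts below min_fraction of the OTU's dataset-wide total.
    totals = {o: sum(counts[pcr].get(o, 0) for pcr in counts) for o in kept}
    cleaned = {}
    for pcr in samples:
        cleaned[pcr] = {}
        for otu in kept:
            n = counts[pcr].get(otu, 0)
            cleaned[pcr][otu] = n if n >= min_fraction * totals[otu] else 0

    # (iv) Drop PCRs with fewer than min_reads reads remaining.
    return {pcr: reads for pcr, reads in cleaned.items()
            if sum(reads.values()) >= min_reads}


# Hypothetical toy data: three sample PCRs and one negative control.
toy_counts = {
    "plot1_pcr1": {"otu1": 5000, "otu2": 1, "otu3": 40, "otu4": 300},
    "plot2_pcr1": {"otu1": 200, "otu2": 4000, "otu3": 10, "otu4": 0},
    "plot3_pcr1": {"otu1": 300, "otu2": 0, "otu3": 5, "otu4": 0},
    "neg_pcr1": {"otu1": 0, "otu2": 0, "otu3": 900, "otu4": 0},
}
toy_identity = {"otu1": 1.0, "otu2": 1.0, "otu3": 1.0, "otu4": 0.97}
toy_clean = clean_counts(toy_counts, {"neg_pcr1"}, toy_identity)
print(toy_clean)
```

In this toy case otu3 is removed as a contaminant, otu4 for its low similarity, the single otu2 read in plot1_pcr1 is zeroed as a likely tag-switch, and plot3_pcr1 is dropped for having too few reads.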

References:

Boyer, F. et al. (2016) ‘obitools: a unix-inspired software package for DNA metabarcoding’, Molecular Ecology Resources, 16(1), pp. 176–182. doi:10.1111/1755-0998.12428.

Donald, J. et al. (2021) ‘Multi-taxa environmental DNA inventories reveal distinct taxonomic and functional diversity in urban tropical forest fragments’, Global Ecology and Conservation, 29, e01724.

Mercier, C. et al. (2013) ‘SUMATRA and SUMACLUST: fast and exact comparison and clustering of sequences’, in Programs and Abstracts of the SeqBio 2013 workshop. Abstract. Citeseer, pp. 27–29.

Taberlet, P. et al. (2007) ‘Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding’, Nucleic Acids Research, 35(3), pp. e14–e14. doi:10.1093/nar/gkl938.

Taberlet, P. et al. (2018) Environmental DNA: For Biodiversity Research and Monitoring. Oxford University Press.

R Core Team (2013) R: A language and environment for statistical computing. Vienna, Austria.

Zinger, L. et al. (2016) ‘Extracellular DNA extraction is a fast, cheap and reliable alternative for multi-taxa surveys based on soil DNA’, Soil Biology and Biochemistry, 96, pp. 16–19.

Zinger, L. et al. (2021) ‘metabaR: An R package for the evaluation and improvement of DNA metabarcoding data quality’, Methods in Ecology and Evolution. doi:10.1111/2041-210X.13552.

Files

ngsfilter_TRNL_PLANTS_EDEN_PROJECTb.txt

Files (1.2 GB)

md5:3458912f4358ef481fab8815a6759d66 (1.2 GB)
md5:2d8e993fc6a13297858f09e22cf4b906 (201.3 kB)
md5:2b7a36e9acdbd4ffd7a63b02e1917e8d (2.9 MB)
md5:59201b2d1f44471c36dc575f5f7a981f (285.1 kB)

Additional details

Funding

European Commission
MICOCO - How do biotic and environmental variation affect soil microbial community composition and functioning across spatial scales? 101024135
Agence Nationale de la Recherche
CEBA - CEnter of the study of Biodiversity in Amazonia 10-LABX-0025
Agence Nationale de la Recherche
TULIP - Towards a Unified theory of biotic Interactions: the roLe of environmental 10-LABX-0041