Multi-taxa environmental DNA inventories reveal distinct taxonomic and functional diversity in urban tropical forest fragments

Donald, Julian; Murienne, Jerome; Chave, Jerome; Iribar, Amaia; Louisanna, Eliane; Manzi, Sophie; Orivel, Jerome; Roy, Melanie; Schimann, Heidy; Tao, Shengli; Zinger, Lucie

doi:10.5281/zenodo.6516869

Published July 21, 2021 | Version v3

Dataset Open

1. Evolution et Diversité Biologique (EDB UMR 5174), Université Toulouse 3 Paul Sabatier, CNRS, IRD - Toulouse, France; Centre for Ecology and Conservation, University of Exeter, Penryn TR10 9FE, UK
2. Evolution et Diversité Biologique (EDB UMR 5174), Université Toulouse 3 Paul Sabatier, CNRS, IRD - Toulouse, France
3. INRAE, CNRS, AgroParisTech, CIRAD, Université de Guyane, Université des Antilles, UMR Ecologie des Forêts de Guyane (EcoFoG), Campus agronomique, Kourou, France
4. Institut de Biologie de l'ENS (IBENS), Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France

Urban expansion and the associated habitat transformation drive shifts in biodiversity, with declines in taxonomic and functional diversity. Forest fragments within urban landscapes offer a number of ecosystem services, and help to maintain biodiversity and ecosystem functions. Here, we focus on a tropical forest environment, and on the soil biota. Using eDNA metabarcoding, we compare forest fragments within the city of Cayenne, French Guiana, with a neighbouring continuous undisturbed forest. We wished to determine whether urban forest fragments conserve high levels of alpha and beta diversity, as well as similar functional composition, for plants, soil animals, fungi and bacteria. We found that alpha diversity is similar across habitats for plants and fungi, lower in urban forests for metazoans and higher for bacteria. We also found that urban forest communities differ from undisturbed forests in their taxonomic composition, with urban forests exhibiting greater turnover between fragments, potentially caused by ecological drift and limited dispersal. However, their functional composition exhibited limited differences, with an enrichment of palms, arbuscular mycorrhizal fungi and bacteria and a depletion of climber plants and termites. Thus, although urban forest fragments do shelter soil biodiversity that differs from native forests, the losses of soil functions may be relatively limited. This study demonstrates the strong potential of a multi-taxa eDNA approach for rapid inventories across taxonomic kingdoms, in particular for cryptic soil diversity. It also demonstrates the key role of urban forest fragments in conserving biodiversity and ecosystem function, and points to a need for more systematic monitoring of these areas in urban management plans.

For each of the 16 samples per plot, 15 g of soil was used for eDNA analyses. Extracellular DNA was extracted as described previously (Zinger et al., 2016; 2019): each soil sample was added to 15 ml of saturated phosphate buffer (Na₂HPO₄; 0.12 M; pH ≈ 8) in a 50 ml Falcon tube. The tube was agitated for 15 minutes, after which a 2 ml aliquot of the soil/phosphate buffer mixture was pipetted into an Eppendorf tube and centrifuged for five minutes at 13000 rcf. 500 μl of the resulting supernatant was then recovered and used for the subsequent extraction steps, carried out with a commercial kit for soil DNA (NucleoSpin® Soil; Macherey-Nagel, Düren, Germany), skipping the lysis step and otherwise following the manufacturer’s instructions. The DNA extract was recovered in 100 μl and diluted 10 times before being used as PCR template.

For each plot, one DNA extraction negative control was performed, adding up to 17 extractions per plot. PCR amplifications were then conducted for four DNA molecular markers, with primers targeting either Viridiplantae (subsequently referred to as plants), Eukaryotes, Fungi or Bacteria (Table 1). For each marker, PCR amplification of samples occurred across 12 plates. Each PCR reaction was performed in a total volume of 20 μl and comprised 10 μl of AmpliTaq Gold Master Mix (Life Technologies, Carlsbad, CA, USA), 5.84 μl of Nuclease-Free Ambion Water (Thermo Fisher Scientific, Massachusetts, USA), 0.25 μM of each primer, 3.2 μg of BSA (Roche Diagnostic, Basel, Switzerland), and 2 μl of DNA template that had previously been diluted 10-fold to reduce the amount of PCR inhibitors. Thermocycling conditions for each primer pair are indicated in Table 1. A negative extraction control per site and a negative PCR control per PCR plate were amplified and sequenced in parallel with the regular samples. Positive controls were also included and consisted of mock communities of plant and fungal DNA (no mock communities were built for bacteria or eukaryotes here), which were used to guide choices in our data curation process. Two PCR replicates were performed for each sample and control. Amplification was conducted using a double indexing strategy (Binladen et al. 2007) based on a system of 32 by 36 octamers, with at least five differences between them, located at the 5’ end of each primer (Coissac 2012). In doing so, each PCR product had a unique combination of tags for the forward and reverse primers, allowing for the retrieval of sequence data for each sample. Ten wells per PCR plate were left empty to act as sequencing controls (non-used tag combinations) for downstream data curation (see below).
PCR products were pooled and sequencing libraries were constructed using the Illumina TruSeq NanoPCRFree kit following the supplier’s instructions (Illumina Inc., San Diego, California, USA), except that the ligation product was not PCR amplified, to limit tag-jump biases (Taberlet et al. 2018). The libraries were then sequenced on different Illumina platforms (San Diego, CA, USA) depending on the marker considered (Table S1), using the paired-end technology.
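
The tag design described above (octamers that differ at five or more positions, combined as forward and reverse pairs) can be illustrated with a minimal Python sketch. The tag sequences below are invented for illustration and are not the tags used in the study; only the 32 by 36 layout and the five-difference rule come from the text.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags) -> int:
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


# Hypothetical octamers (NOT the study's tags).
example_tags = ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG", "TTTTTTTT", "ACGTACGT"]

# The design rule from the text: every pair of tags differs at >= 5 positions.
design_ok = min_pairwise_distance(example_tags) >= 5

# 32 forward tags x 36 reverse tags give this many unique tag pairs.
n_tag_pairs = 32 * 36
```

With at least five differences between any two tags, a read carrying a few sequencing errors in its tag still cannot be mistaken for another valid tag, which is presumably why the later demultiplexing step can afford to reject any tag mismatch outright.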

Bioinformatic analyses were performed on the GenoToul bioinformatics platform (Toulouse, France), with the OBITOOLS package (Boyer et al. 2016). First, ‘illuminapairedend’ was used to assemble paired-end reads. This program is based on an exact alignment algorithm that considers the quality scores at all positions during the assembly process. Subsequently, we used the ‘ngsfilter’ command to identify and remove the primers and tags on each read, and to assign reads to their respective samples. This program was used with its default parameters, which tolerate two mismatches for each of the two primers and no mismatch for the tags. Following this, sequencing reads were dereplicated using the ‘obiuniq’ command. Sequences of low quality (containing Ns or with paired-end alignment scores below 50) were excluded using the ‘obigrep’ command. The same command was used to exclude sequences represented by only one read (singletons), as they are more likely to be molecular artefacts (Taberlet et al. 2018). Sequences outside the preset range were also discarded (Table 1). To remove PCR/sequencing errors as well as intraspecific variability, we built OTUs (Operational Taxonomic Units) using the ‘sumaclust’ clustering algorithm (Mercier et al. 2013), which considers the most abundant sequence of each cluster as the cluster representative. OTUs were defined at a sequence similarity threshold of 97% for eukaryotes, fungi and bacteria, following the standards in microbial ecology, but this was lowered to 95% for plants because the eDNA target region is shorter (typically around 50 base pairs), so that a single mismatch inherently results in a lower percentage of similarity. To assign a taxon to plant and fungal OTUs, we built two reference sequence databases: one global, built by running the ecoPCR programme (Ficetola et al. 2010) with the plant- and fungi-specific markers on the European Molecular Biology Laboratory database (EMBL; release 141), and one local, generated from specimens of fungi (Jaouen et al. 2019) and plants (see Zinger et al. 2019) collected in French Guiana. OTUs were then assigned a taxonomy using the OBITOOLS ecotag programme (Boyer et al. 2016), which performs a global alignment of each OTU sequence (the query) against each reference. The reference taxon assigned to each OTU corresponds to the Last Common Ancestor of all the best-match sequences for the query. For taxonomic assignment of bacteria and eukaryote OTUs, the SILVA taxonomic database was used (version 1.3; Quast et al., 2012). Classification was performed by a local nucleotide BLAST search against the non-redundant version of the SILVA SSU Ref dataset (release 132; http://www.arb-silva.de) using blastn (version 2.2.30+; http://blast.ncbi.nlm.nih.gov/Blast.cgi) with standard settings (Camacho et al., 2009). Metazoan OTUs from the eukaryote marker that were identified as Arthropoda, Annelida or Nematoda were then assigned a finer taxonomy, using reference sequence databases built as above for these phyla with the ecoPCR programme on EMBL release 141.
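
The Last Common Ancestor rule described above can be sketched in a few lines of Python. This is a simplified illustration and not the ecotag implementation: it assumes similarity scores have already been computed, treats only exact score ties as "best matches", and represents each reference lineage as a hypothetical root-to-tip list of names.

```python
def last_common_ancestor(lineages):
    """Return the longest prefix shared by several root-to-tip lineages."""
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def assign_taxon(scores, lineages):
    """Assign a query to the LCA of its best-scoring references.

    scores:   {reference_id: similarity to the query}
    lineages: {reference_id: [rank names from root to tip]}
    """
    best = max(scores.values())
    best_refs = [ref for ref, score in scores.items() if score == best]
    return best, last_common_ancestor([lineages[ref] for ref in best_refs])


# Hypothetical references and scores, for illustration only.
ref_lineages = {
    "ref_A": ["Viridiplantae", "Arecaceae", "Astrocaryum"],
    "ref_B": ["Viridiplantae", "Arecaceae", "Euterpe"],
    "ref_C": ["Viridiplantae", "Fabaceae", "Inga"],
}
query_scores = {"ref_A": 0.98, "ref_B": 0.98, "ref_C": 0.90}

best_score, taxon = assign_taxon(query_scores, ref_lineages)
```

Here the two best matches belong to different genera of the same family, so the query is assigned at family level rather than being forced onto one genus.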

Datasets were subsequently filtered to remove contaminants as well as artefacts such as PCR chimeras and remaining sequencing errors, following Zinger et al. (2019) and using routines now implemented in the metabaR R package (Zinger et al. 2020b), in R version 3.6.1 (R Development Core Team, 2013). The filtering process consisted of four steps. (i) Negative control-based filtering: OTUs whose maximum abundance was found in extraction/PCR negative controls were removed from the dataset, as they were likely to be reagent/aerosol contaminants, which amplify better in the absence of the competing DNA fragments present in biological samples. (ii) Reference-based filtering: OTUs that are too dissimilar from sequences available in reference databases are potential chimeras generated during amplification and sequencing. In this study, we set similarity thresholds at 95% for plants, 80% for bacteria and eukaryotes and, because the marker is more polymorphic, 65% for fungi. For plants and fungi, the remaining assignments were then checked against the local database to confirm whether the assigned taxa also occurred in the local dataset, with preference given to the local assignment. In addition, we removed all taxa not targeted by the primers used. (iii) Abundance-based filtering: this procedure targets the incorrect assignment of a small number of sequences from true OTUs to the wrong sample, a phenomenon called “tag-switching” (Esling et al. 2015), “tag jumps” (Schnell et al. 2015) or “cross-talk” (Edgar 2018). It consists of setting OTU abundances to 0 in samples where their abundance represents < 0.03% of the total OTU abundance in the entire dataset. (iv) PCR-based filtering: any PCR reaction that yielded fewer than 100 reads for plants, or fewer than 1000 reads for fungi, bacteria and eukaryotes, was considered non-functional and removed from the dataset.
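
Steps (i), (iii) and (iv) above can be sketched in plain Python as follows. This is an illustrative reading of the text, not the metabaR code: the data layout (a dictionary of read counts per OTU and per PCR), the function names and the toy numbers are all assumptions, and the 0.03% rule is interpreted here as a fraction of each OTU's total reads across the dataset.

```python
def remove_control_otus(counts, control_pcrs):
    """Step (i): drop OTUs whose maximum abundance occurs in a negative control.

    counts: {otu_id: {pcr_id: reads}}; ties are resolved by dictionary order.
    """
    kept = {}
    for otu, per_pcr in counts.items():
        top_pcr = max(per_pcr, key=per_pcr.get)
        if top_pcr not in control_pcrs:
            kept[otu] = dict(per_pcr)
    return kept


def zero_tag_jumps(counts, threshold=0.0003):
    """Step (iii): zero counts below `threshold` of the OTU's total reads."""
    cleaned = {}
    for otu, per_pcr in counts.items():
        total = sum(per_pcr.values())
        cleaned[otu] = {
            pcr: (n if total and n / total >= threshold else 0)
            for pcr, n in per_pcr.items()
        }
    return cleaned


def drop_failed_pcrs(counts, min_reads):
    """Step (iv): remove PCRs whose total read count is below `min_reads`."""
    totals = {}
    for per_pcr in counts.values():
        for pcr, n in per_pcr.items():
            totals[pcr] = totals.get(pcr, 0) + n
    ok = {pcr for pcr, total in totals.items() if total >= min_reads}
    return {
        otu: {pcr: n for pcr, n in per_pcr.items() if pcr in ok}
        for otu, per_pcr in counts.items()
    }


# Toy read counts (invented): two sample PCRs and one negative control.
toy_counts = {
    "otu1": {"s1": 9000, "s2": 2, "neg": 0},
    "otu2": {"s1": 5, "s2": 10, "neg": 400},
    "otu3": {"s1": 500, "s2": 50, "neg": 1},
}

step1 = remove_control_otus(toy_counts, control_pcrs={"neg"})
step3 = zero_tag_jumps(step1, threshold=0.0003)
filtered = drop_failed_pcrs(step3, min_reads=100)
```

In this toy example, otu2 is discarded because it peaks in the negative control, the two reads of otu1 in s2 are set to zero as a likely tag jump, and s2 and the control are then dropped for yielding fewer than 100 reads.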

The data provided consist of four OTU tables, one for each of the markers used to target different components of the soil biota. Rows represent OTUs, and columns give the features of each OTU within the dataset, namely its id code, its number of reads in the analysed dataset, its similarity score against the taxonomic database used to identify it and, when possible, the functional group assignment used in the manuscript. Details of these can be found above and in the manuscript and supplementary information.

For each of the four datasets, we also provide a .rds file, corresponding to the processed dataset used in manuscript preparation. This is in the format of a metabaR list, which includes the PCR, Sample and Read count tables as well as the separately provided OTU table. To facilitate interpretation, please refer to Zinger, L., Lionnet, C., Benoiston, A.S., Donald, J., Mercier, C. and Boyer, F., 2021. metabaR: an R package for the evaluation and improvement of DNA metabarcoding data quality. Methods in Ecology and Evolution, 12(4), pp.586-592.

For the fungal (ITS) data, we also provide:

- the R1/R2 raw fastq files of the samples used in the paper + experimental controls

- a tsv file containing the tag combinations corresponding to the samples/PCR replicates, to enable demultiplexing of data.

- a csv file containing the description of each sample.

Files (7.7 GB)

Name	MD5	Size
data.uniq.urban.bac.tab	3be540be65b0e8dc764b4fe81d057c46	1.7 GB
data.uniq.urban.euk.tab	8de210872030cdb21906ad4404b6739c	4.1 GB
data.uniq.urban.fungi.tab	eb02a56fc13a274f7b9904b0134a4ca2	693.3 MB
data.uniq.urban.plant.tab	d774609bc7b2a2fbaf6d95ceebc6c859	618.6 MB
Diamond_Bacteria_clean_annotated.rds	5ffaf0b7ec426649d55413dc37874bc8	1.6 MB
Diamond_Fungi_clean_annotated.rds	28c597f3f7a531732c11d2ae8e638348	659.5 kB
Diamond_Metazoa_clean_annotated.rds	7b54daaf8e227c59d22dde8b0d9a752b	933.3 kB
Diamond_Plant_clean_annotated.rds	9244dd7a232f0b409bc5fde16ec32bc4	82.1 kB
DIAMOND_URBAN_BACTERIA_Motus.csv	e24a1d40279c7267588027091d4c02ad	7.5 MB
DIAMOND_URBAN_FUNGI_Motus.csv	841d957f7f326b2a69ff0393668b7535	3.3 MB
DIAMOND_URBAN_METAZOA_Motus.csv	39b3b764064f70e7689ae82d545f5399	1.3 MB
DIAMOND_URBAN_PLANT_Motus.csv	3309d2c9309ca26f0a1803c0b5703704	1.7 MB
Donald_2021_GEC_fungi_ngsfilter.tsv	28b4acd1b4e57337832f27a388dfa42f	123.5 kB
Donald_2021_GEC_fungi_R1.fastq.gz	d89e127a88d51cabe808a9fedf607812	264.5 MB
Donald_2021_GEC_fungi_R2.fastq.gz	7a29ceecfcc3164e34b86518e0e74791	316.8 MB
Donald_2021_GEC_fungi_sample_info.csv	685a078f471e7b66568e01d59c9562b4	39.3 kB
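
Since several of these files are large, it may be worth checking downloads against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the function names and the local `downloads` directory are hypothetical, and only two of the listed checksums are reproduced for brevity.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected):
    """Return {file name: True/False} for each expected file found locally.

    Files that are absent from `directory` are skipped rather than reported.
    """
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == checksum.lower()
    return results


# Checksums copied from the file listing above.
expected_md5 = {
    "Donald_2021_GEC_fungi_sample_info.csv": "685a078f471e7b66568e01d59c9562b4",
    "Donald_2021_GEC_fungi_ngsfilter.tsv": "28b4acd1b4e57337832f27a388dfa42f",
}

# "downloads" is a hypothetical local directory; missing files are ignored.
report = check_downloads("downloads", expected_md5)
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte `.tab` files.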

	All versions	This version
Views	808	222
Downloads	1,474	841
Data volume	758.2 GB	515.0 GB
