Dispersal of bacteria and stimulation of permafrost decomposition by Collembola: data analysis pipeline

Sylvain Monteux - Sveriges Lantbruksuniversitet / Stockholms Universitet - 3/31/2022

Sylvain Monteux, Janine Marien, Eveline J. Krab

Biogeosciences Discussions, 2022, bg-2022-98

File structure

1. Acknowledgements

2. Summary of the molecular work

3. Summary of the bioinformatics work

4. Directory structure

5. Pipeline

1. Acknowledgements

The molecular work was hosted by the Department of Forest Mycology and Plant Pathology at the Swedish University of Agricultural Sciences (SLU).

Sequencing was performed by the SNP&SEQ Technology Platform in Uppsala. The facility is part of the National Genomics Infrastructure (NGI) Sweden and Science for Life Laboratory. The SNP&SEQ Platform is also supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation.

2. Summary of the molecular work

  • Soils were freeze-dried and homogenized in Precellys Dry hard tissue (15 ml) grinding tubes on a Precellys bead-beater at 5000 rpm for 2 x 30 s
  • DNA was extracted from c. 0.20 g of freeze-dried material with the DNeasy PowerSoil Pro Kit (Qiagen) according to the manufacturer’s instructions
  • 2 µl of DNA template at 5 ng/µl was used in a 25 µl PCR reaction volume with Phusion Taq ready-made mastermix at 0.25 µM final primer concentration
  • The V4V5 region of the 16S rRNA gene was targeted by PCR using the following primers, supplemented with Nextera adapters (3’):
    • 515F (5’ GTGYCAGCMGCCGCGGTAA 3’)
    • 926R (5’ CCGYCAATTYMTTTRAGTTT 3’)
    • Parada et al., 2016: https://doi.org/10.1111/1462-2920.13023
    • PCR conditions: initial denaturation 98°C for 3 min; 25 cycles of (denaturation 98°C for 15 s; annealing 50°C for 30 s; elongation 72°C for 40 s); final elongation 72°C for 10 min
    • Three extraction blanks, two PCR blanks and four positive controls (ZymoBIOMICS mock community DNA standard at 5 and 0.5 ng/µl) were included
    • PCR products were cleaned and their concentration normalized with SequalPrep Normalization Plate Kit
  • A second PCR step was carried out to add Nextera indexing barcodes
    • PCR conditions: initial denaturation 98°C for 3 min; 8 cycles of (denaturation 98°C for 30 s; annealing 55°C for 30 s; elongation 72°C for 40 s); final elongation 72°C for 10 min
  • PCR products were cleaned, their concentration normalized, and they were pooled equimolarly with SequalPrep Normalization Plate Kit, using serial elution across columns
  • DNA was sent to SciLifeLab for sequencing on Illumina MiSeq using V3 chemistry (2 x 300 bp) and a 15% PhiX spike-in
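Both primers listed above contain IUPAC degenerate bases (Y, M, R), so each one stands for a small pool of concrete oligonucleotides. The sketch below is an illustration only and is not part of the published pipeline: it expands the two primer sequences into the variants they encode. The function name `expand_primer` is hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in this README (5' to 3', without adapters).
PRIMERS = {
    "515F": "GTGYCAGCMGCCGCGGTAA",
    "926R": "CCGYCAATTYMTTTRAGTTT",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(bases) for bases in product(*choices)]


for name, seq in PRIMERS.items():
    variants = expand_primer(seq)
    print(f"{name}: {len(seq)} nt, {len(variants)} variants")
```

Tools that trim primers generally need to treat these codes as wildcards rather than literal characters, which is why the degenerate positions matter for the primer-removal step.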

3. Summary of the bioinformatics work

  • Bioinformatics analyses to produce the data are found at https://git.bolin.su.se/smonteux/Collembola_vector
    • Primers were removed with cutadapt v3.10 using the default maximum error rate (0.1)
    • ASVs were computed using DADA2 1.18.0 in R 4.1.3
      • Sequences were trimmed to 250bp for forward reads and 210bp for reverse after visual check of bp-quality distribution on a subset of samples
      • Error learning, denoising and ASV inference were carried out using default parameters, with pseudo-pooling
      • Only ASVs between 366 and 390bp long were kept, discarding the tails of the length distribution
      • Chimeras were identified by the consensus method
    • Taxonomy was assigned using the RDP classifier within DADA2 with SILVA 138.1 datasets as reference data (https://doi.org/10.5281/zenodo.4587955), and further assigned to putative species as implemented in the assignSpecies algorithm, using the same database
    • ASV sequences were aligned using the AlignSeqs algorithm in DECIPHER 2.22.0
    • A phylogenetic tree was computed by Neighbor-Joining using phangorn 2.8.1, and its likelihood was optimized with the optim.pml function
    • Putative contaminants were identified and removed with the decontam 1.14.0 R package (https://doi.org/10.1186/s40168-018-0605-2)
      • Combined prevalence- and frequency-based method with the default threshold of 0.1
      • This identified 2 ASVs as contaminants, one of which was part of the mock communities and was therefore not removed
      • Frequency-based method with a threshold of 0.05 identified another 3 contaminants not detected by the combined method, which were removed
      • Prevalence-based method with a threshold of 0.05 identified another 6 contaminants, one of which was abundant in mock communities and was kept; the other 5 were removed
      • All contaminant ASVs removed amounted to 0.0243% of the total reads
      • The list of contaminant ASVs is found in results/contaminant_asvs.txt
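The numbers in the list above can be cross-checked with some simple arithmetic. The sketch below is an illustration in Python, not the R code used in the pipeline. It assumes that the truncation lengths apply to reads with primers already removed, and that a merged ASV length equals forward length plus reverse length minus the overlap. The toy sequences are placeholders and `overlap` and `keep_by_length` are hypothetical helper names.

```python
# Truncation lengths and ASV length window, taken from the list above.
TRUNC_FWD = 250
TRUNC_REV = 210
MIN_LEN, MAX_LEN = 366, 390


def overlap(merged_len, fwd=TRUNC_FWD, rev=TRUNC_REV):
    """Overlap (bp) between truncated reads for a merged ASV of merged_len bp."""
    return fwd + rev - merged_len


def keep_by_length(seqs, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep only sequences whose length falls inside the window (inclusive)."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


# Placeholder sequences (not real ASVs) just outside and inside the window.
toy_asvs = ["A" * n for n in (365, 366, 390, 391)]
kept_lengths = [len(s) for s in keep_by_length(toy_asvs)]

# Contaminant ASVs removed: combined method (2 found, 1 kept),
# frequency method (3 found), prevalence method (6 found, 1 kept).
removed_asvs = (2 - 1) + 3 + (6 - 1)

print("overlap for shortest kept ASV:", overlap(MIN_LEN))
print("overlap for longest kept ASV:", overlap(MAX_LEN))
print("kept lengths:", kept_lengths)
print("contaminant ASVs removed:", removed_asvs)
```

Under these assumptions every retained ASV has an overlap well above the 12 bp that DADA2's mergePairs requires by default, which is consistent with the chosen truncation lengths.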

4. Directory structure

  • intermediate/ - temporary files used throughout the analysis, which can be reused to modify individual steps
    • phyloseq_object.rds - saved R object of the assembled phyloseq object after initial processing with dada2
  • metadata/
    • metadata_vector.txt - Contains sample information and DNA concentrations
    • flux_incubated.txt - Contains flux measurement data in mg-C, expressed per day, per weight and as cumulative values
    • data_description.txt - Description of data found in metadata_vector.txt and flux_incubated.txt
  • results/ - Final result files
    • contaminant_asvs.txt - List of ASVs identified as contaminants (see scripts/contaminants.html)
    • dada2_asvs.fasta - ASV sequences
    • anova.manyglm.2k.rds - saved R object, anova.manyglm object
    • *.pdf and *.txt - figure and table files
  • scripts/
    • Manuscript_figures.Rmd - R notebook of analyses carried out for the manuscript
  • ./README.md - This document
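
Before running the notebook, it may help to confirm that a local copy matches the layout described above. The following sketch is an optional helper and not part of the repository. It lists the named files from this section and reports any that are missing. The function name `missing_files` is hypothetical, and the figure and table outputs matched by *.pdf and *.txt are not checked.

```python
from pathlib import Path

# Files named in the directory structure above, relative to the repository root.
EXPECTED = [
    "intermediate/phyloseq_object.rds",
    "metadata/metadata_vector.txt",
    "metadata/flux_incubated.txt",
    "metadata/data_description.txt",
    "results/contaminant_asvs.txt",
    "results/dada2_asvs.fasta",
    "results/anova.manyglm.2k.rds",
    "scripts/Manuscript_figures.Rmd",
    "README.md",
]


def missing_files(root="."):
    """Return the expected files that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if everything is present.
    for rel in missing_files():
        print("missing:", rel)
```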