Published February 7, 2023 | Version v1
Dataset Open

Intraspecific genome SNP frequencies comparison

  • 1. Jiangxi Science and Technology Normal University
  • 2. McMaster University

Description

Genome sequence analyses can provide crucial information for understanding the origin and spread of infectious diseases, population history, speciation, and taxonomy. In the class Agaricomycetes, to which most mushroom-forming fungi belong, most species have so far been defined based on morphological, ecological, and/or molecular features, but no defined threshold for any type of feature can be applied across multiple genera, families, and orders. In this study, we investigated genome-wide single nucleotide polymorphism (SNP) frequencies within species to understand the patterns of variation in both the nuclear and mitochondrial genomes of species that currently have whole-genome sequences. In total, our analyses included 398 nuclear and 106 mitochondrial publicly available genomes of Agaricomycetes. Intraspecific SNP frequencies ranged from 0.00% to 7.69% among nuclear genomes and from 0.00% to 4.41% among mitochondrial genomes. Spearman's non-parametric rank correlation test showed a weak but statistically significant positive correlation between the paired nuclear and mitochondrial genome datasets. Overall, we observed a significantly higher SNP frequency in the nuclear genome than in the mitochondrial genome between strains within most species. Interestingly, across the broad Basidiomycetes, the ratios of mitochondrial genome SNPs to nuclear genome SNPs between pairs of strains within each species were highly similar, with a mean of 0.24. We discuss the implications of these results for Agaricomycetes systematics and the implementation of genome sequence-based species delimitation in fungi.

Notes

The genome-wide SNP analyses within individual species were performed with the alignment-based program MUMmer 3.23. In each pairwise comparison, the longer assembly (the larger and better-assembled genome, i.e. the one with fewer scaffolds) served as the reference for the analyzed species. Our alignments used the following specific commands: the "--mum -p" parameters for aligning each pair of assembled genomes and identifying overlapping regions between the two profiles (maxgap=500, mincluster=100), followed by "delta-filter -1" to filter out repeated comparisons, and then "show-snps -CHITrl" to detect base substitutions. Insertions and deletions (InDels) in those overlapping regions were excluded from SNP frequency calculations.
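The three steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' script: it assumes the aligner is MUMmer's `nucmer` (the record names only the "--mum -p" parameters), that `-g` and `-c` are how maxgap and mincluster were set, and that the first three tab-separated columns of the `show-snps -CHITrl` output are the reference position, reference base, and query base. The function names, file names, and the choice of aligned bases as the denominator of the SNP frequency are hypothetical.

```python
from typing import Iterable, List


def mummer_commands(reference: str, query: str, prefix: str) -> List[List[str]]:
    """Build argument lists for the three steps (assumed nucmer-based pipeline)."""
    return [
        # Step 1: align the pair of assemblies (maxgap=500, mincluster=100).
        ["nucmer", "--mum", "-g", "500", "-c", "100", "-p", prefix, reference, query],
        # Step 2: keep one-to-one alignments; delta-filter writes to stdout,
        # so its output must be redirected to the (hypothetical) filtered file.
        ["delta-filter", "-1", f"{prefix}.delta"],
        # Step 3: report substitutions only (-I drops indels, -T is tab-delimited).
        ["show-snps", "-CHITrl", f"{prefix}.filtered.delta"],
    ]


def count_substitutions(lines: Iterable[str]) -> int:
    """Count base substitutions in tab-delimited show-snps output lines."""
    count = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ref_base, qry_base = fields[1], fields[2]
        # show-snps marks an indel with "." on one side; skip defensively
        # even though -I should already have excluded them.
        if ref_base == "." or qry_base == ".":
            continue
        count += 1
    return count


def snp_frequency_percent(n_snps: int, aligned_bases: int) -> float:
    """SNP frequency in percent; the denominator is an assumption of this sketch."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 100.0 * n_snps / aligned_bases
```

For example, `mummer_commands("ref.fasta", "qry.fasta", "pair1")` returns three argument lists that could be passed to `subprocess.run`, with the output of the second step redirected to `pair1.filtered.delta`.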

Funding provided by: Natural Sciences and Engineering Research Council of Canada
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000038
Award Number: RGPIN-2020-05732

Files

README.md

Files (111.0 kB)

Name Size
md5:a0f4c66f396a0f9168a14a8819518156 3.5 kB
md5:3a378abc6dfa0f6071395d06a4d916b1 107.5 kB