Population genetics of the Four Corners potato (Solanum jamesii)
Authors/Creators
- 1. University of Utah
- 2. University of Wisconsin-Madison
- 3. Edward T. Schafer Agricultural Research Center
- 4. Natural History Museum of Utah
Description
Premise: The domestication of wild plant species can begin with gathering and transport of propagules by Indigenous peoples. The effect on genomic composition, especially in clonal, self-incompatible perennials would be instantaneous and drastic with respect to new, anthropogenic populations subsequently established. Reductions in genetic diversity and mating capability would be symptomatic and the presence of unique alleles and genetic sequences would reveal the origins and ancestry of populations associated with archaeological sites. The current distribution of the Four Corners potato, Solanum jamesii Torr. in the Southwestern USA, may thus reflect the early stages of a domestication process that began with tuber transport.
Methods: Herein genetic sequencing (GBS) data are used to further examine the hypothesis of domestication in this culturally significant species by sampling 25 archaeological and non-archaeological populations.
Key Results: Archaeological populations from Utah, Colorado, and northern Arizona have lower levels of polymorphic loci, unique alleles, and heterozygosity than non-archaeological populations from the Mogollon Region of central Arizona and New Mexico. Principle components analysis, Fst values, and structure analysis revealed that genetic relationships among archaeological populations did not correspond to geographic proximity. Populations in Escalante, Utah were related to those on the Mogollon Rim (400 km south) and had multiple origins and significant disjunctions with those in Bears Ears, Chaco Canyon, and Mesa Verde.
Conclusions: Movement of tubers from the Mogollon Region may have occurred many times and in multiple directions during the past, resulting in the complex genetic patterns seen in populations from across the Four Corners Region.
Notes
Methods
Selection of sample sites – The S. jamesii accession database contains a total of 169 extant occurrences from across the entire range of the species (Bamberg et al., 2003, with updates), of which 36 are associated with ancient sites (e.g. pueblos, granaries, habitation structures, kivas). These were divided into three latitudinal strata, each containing roughly a third of the total occurrences. From each stratum, a total of 12 primary and secondary candidate sites were randomly chosen. Candidate sites were surveyed in 2019, 2020, and 2021, and depending on monsoon rainfall, sampled for DNA analysis (in dry years a secondary site might be substituted for a primary site if plants were not present in the latter). In all we were able to obtain leaf samples and site metadata from 14 archaeological (A) and 11 non-archaeological (NA) populations (Fig. 1 and Data available from Dryad at https://doi.org/10.5061/dryad.cz8w9gj9w).
Field sampling – Leaf samples of +30 individuals (optimized by Bamberg and del Rio, 2004) were obtained from multiple (6 to 8) widely separated patches in each of the 25 study populations of S. jamesii. There were placed in individual paper tea bags and immediately submerged in silicone gel and/or powder to desiccate. Each sampled individual was geo-referenced with sub-meter GPS (Eos Arrow 100 GNSS, Anatum GeoMobile Solutions, Portland, Oregon, USA), and notes taken on habitat and archeological features (if present within a 300m radius).
Sequencing
DNA extraction and GBS library construction – Plant DNA was extracted from tissue for each of the samples collected in the collecting trips using a QIACube HT (Qiagen, Germantown, Maryland, USA), an automated extraction robot, which makes high throughput extractions in 96-well plates. These DNA samples were then utilized for Genotyping by Sequencing (GBS), using 48- and 96-plex plates. The first step was to create a reduced representation of the genomes through restriction endonucleases. This cleaved genomic DNA into manageable fragments for PCR and sequencing. Previous work in Solanum jamesii determined that the enzyme combination of PstI and BfaI was the most effective for GBS (Bamberg et al., 2021). After the enzymatic cleavage the sticky ends of DNA were used as anchors for the ligation of barcoded adaptors that served (a) to identify each sample individually from the others during pooled DNA sequencing, (b) as primer binding sites for PCR amplification of the DNA before sequencing, and (c) to adhere other sequencing instrument specific adaptors. After barcode ligation the DNA was amplified via PCR, and then purified and quantified before sequencing. The sequencing was done using a NovaSeq 6000 Sequencer (Illumina, San Diego, California, USA), at the University of Wisconsin-Madison Biotechnology Center (UWBC, https://www. biotech.wisc.edu/).
GBS sequence analysis and SNP discovery – To perform the sequencing, reads having low-quality, unidentified bases, and parts of the adapter sequences were eliminated using computational pipelines developed at the Bioinformatics Resource Center (BRC), which is part of the Advanced Genome Analysis Resource unit of the UWBC. The software Skewer (Jiang et al., 2014) was applied to pre-process the raw fastq data files, removing remaining sequences of adapters and/or primers from reads used for sequencing. The reads found to be trimmed too short with poor quality, and those that did not meet minimum sequence length of 150 bp were discarded from downstream analysis. This process recovers true targeted DNA sequences. Reads were then checked for quality using FastQC; each sequence was recognized by its unique barcode adapter (https://sourceforge.net/projects/gbsbarcode/). Raw sequences were processed using Tassel 3.0 and Universal Network Enabled Analysis Kit (UNEAK) pipelines (Lu et al., 2013) for SNP discovery.
The bioinformatics computational steps were done on the Unix platform Zcluster at the UWBC. The raw reads were analyzed for quality and sequences were trimmed from the 3' end of the reads until reaching a Phred quality of 20 (Ewing et al. 1998). These were de-multiplexed using GBSSeqToTagDBPlugin, a step in Tassel GBS Pipeline, generating high-quality reads for each genotype. Genebank accession number GCA_000226075.1 was used as the Reference Genome (Massa et al. 2011) and reads were mapped using Bowtie 2 alignment software (Langmead and Salzberg 2012). SAMtools version 1.3.1 (Li et al., 2009) was used for analyzing and manipulating alignments. A SNP master matrix for each plant collected in the 25 S. jamesii populations was created after the pipeline for SNP discovery and genotype calling using TASSEL reference-based GBS analysis pipeline for SNP calling (Glaubitz et al. 2014). The pipeline for SNP discovery followed standard conditions of depth coverage (minDP≥2), maximum mismatch for alignment (n=3), sufficient data for imputation (i.e. SNP sites with data for >=80% of samples), and minimum Minor Allele Frequency (MAF≥0.05).
Data Analysis
To determine levels of genetic diversity in the populations, SNP markers were used to calculate genetic relationships among populations and standard estimates of population genetic structure using GenAlex software version 6.5 (Peakall and Smouse, 2006). These include levels of polymorphism, heterozygosity, and number of unique alleles. Principal Component (PC) analyses were also generated with GenAlex to present visual representations of the genetic diversity and differentiation among populations. The purpose of PC was to organize the populations in a space of reduced dimensionality while keeping their genetic identities and divergence, especially in this study where a very large number of individual plants were genotyped.
An analysis to determine patterns in the distribution of genetic diversity among the 25 populations was done with the software STRUCTURE 2.3.4 (Pritchard et al., 2000). Hence, a Bayesian approach was used to estimate posterior probabilities to support whether individuals belong to a given K source population by placing individuals into clusters sharing similar patterns of variation. This was done for all the individual plants collected from each population at the sampling sites. An admixture model was assumed for the analyses. The burn-in generation and the MCMC were set to 250,000, with 20 iterations. Delta K, the optimal number of clusters (K) for the sample set, was estimated using the Evanno method (Evanno et al., 2005; Earl and Vonholdt, 2012) through STRUCTURE Harvester.
The probability of sampling from the same ramet ("P-clone") in a population was roughly estimated by tallying the number of individual stems sampled (plants) that had a coefficient of genetic similarity > 0.975 estimated as the complement of Nei's Genetic Distance coefficient (Nei 1972) and dividing by the total number of stems sampled. Having maximized the distance between samples during collection, this estimate roughly indicates the degree to which cloning (tubering) demographically structures each population.
Files
Allelic_Variation.pdf
Files
(1.6 MB)
| Name | Size | Download all |
|---|---|---|
|
md5:ba5139ecabf9299301ca7036d72e121e
|
802.4 kB | Preview Download |
|
md5:84ab350b0ab0d00b3315556b4d43dc7d
|
824.8 kB | Preview Download |
Additional details
Related works
- Is derived from
- 10.5061/dryad.cz8w9gj9w (DOI)