Complex cycles of divergence and migration shape lineage structure in the common kingsnake species complex

The Nearctic is a complex patchwork of habitats and geologic features that form barriers to gene flow resulting in phylogeographic structure and speciation in many lineages. Habitats are rarely stable over geologic time, and the Nearctic has undergone major climatic changes in the past few million years. We use the common kingsnake species complex to study how climate, geography, and history influence lineage formation over a large, complex landscape.


| INTRODUC TI ON
Identifying shared barriers and processes driving lineage formation is one of the primary goals of phylogeography. Genetic differentiation at shared boundaries has now been identified in numerous taxa (e.g. Mayden, 1988;Myers et al., 2019;Omernik, 1987;Remington, 1968;Soltis et al., 2006;Swenson, 2006). Although these barriers shape the distributions of numerous species and intraspecific lineages, many species groups with lineage boundaries at these barriers show incomplete reproductive isolation (e.g. Burbrink et al., 2021;Dufresnes et al., 2021;Myers et al., 2019;Nevado et al., 2018). In temperate regions of the world, genetic structure is influenced not only by present-day barriers and environmental complexity, but also by major climatic changes and glacial cycles that have caused drastic changes in the biotic and abiotic landscape over time (Hewitt, 2004).
Moreover, repeated changes in climate associated with glacial cycles of the Pleistocene have isolated or compressed populations into refugia (Burbrink et al., 2016;Hewitt, 2000). All of these barriers and environmental processes can have strong effects of isolating lineages and shaping the genomic structure of lineages.
Recent work across biomes has demonstrated that introgression is particularly widespread among lineages that have overlapping geographical distributions and occur in areas exhibiting strong climatic fluctuations through time, such as the Pleistocene glacial cycles of North America (Singhal et al., 2021). Geographic proximity promotes introgression for the obvious reason that if individuals cannot physically traverse the distance between two lineages, no inter-lineage reproduction can occur. Climatic fluctuations may promote introgression given that these climatic cycles result in cyclic patterns of isolation and fusion of lineages as individuals track contracting and expanding suitable habitat (Rosenblum et al., 2012).
Widely distributed taxa often span not just one, but multiple major barriers and ecotones. In temperate regions, such widespread taxa are subject to a variety of highly divergent habitats, both across space and through time as Pleistocene and earlier climatic fluctuations have had major impacts on the environment. In North America, many taxa have expanded since the Pleistocene in response to the fluctuations in suitable habitat during glacial cycles (e.g. Burbrink et al., 2016;Derkarabetian et al., 2016;Myers et al., 2017;Pelletier & Carstens, 2016). Widespread species in North America are therefore expected to have undergone some lineage fragmentation and isolation in the past, with resulting degrees of genetic differentiation, reproductive isolation and introgression at present depending on length of isolation and strength of differential environmental adaptation (Gavrilets, 2004;Singhal & Moritz, 2013). One possibility is that while isolated, lineages accumulated incompatibilities via selection or drift such that introgression is now maladaptive and lineages maintain their own identities with little or no gene flow (Orr, 1995).
Alternatively, less divergent lineages may remain genetically and ecological compatible and may continue to exchange genes. If rates of gene flow among lineages are high enough, then it is expected that lineages will eventually merge (Behm et al., 2010;Seehausen et al., 2008;Taylor et al., 2006).
Here, we examine the common kingsnake (Lampropeltis getula) species complex to investigate how complex landscapes and climate changes may influence genetic variation in a widespread taxon. Common kingsnakes range from the Pacific to Atlantic coasts of the southern USA and northern Mexico (Ernst & Ernst, 2003;Heimes, 2016), thus spanning multiple known phylogeographic boundaries, including the Appalachian Mountains, the Mississippi River, and the Cochise filter barrier at the Chihuahuan and Sonoran Deserts (Myers et al., 2019;Omernik, 1987;Soltis et al., 2006). Whereas many snakes that have similarly wide distributions are generally found in somewhat similar microhabitats (e.g. they are found in eastern North American forests and mesic regions of western North America, but are excluded from more arid habitats in the west [e.g. racers, Coluber constrictor; Powell et al., 2016]), common kingsnakes occur in nearly the full range of ecoregions in the southern half of North America, from wet eastern forest to grassy plains to sandy, open desert (Blaney, 1977;Ernst & Ernst, 2003). These radically different habitats are characterized by unique climatic regimes and ecological communities and represent major barriers important for defining the ranges of many species (Omernik, 1987;Omernik & Griffith, 2014;Savage, 1960).
Common kingsnakes show striking colour pattern variation across their range, with boundaries between variants at several major ecological transitions and geographical boundaries, including the Cochise filter barrier, the transition from desert to great plains, and the transition from great plains to eastern forest (Blaney, 1977).
Previous work has shown that several of these boundaries between pattern variants (many previously considered to be subspecies) also exhibit major mitochondrial DNA (mtDNA) breaks, and Pyron and Burbrink (2009) named these divergent mtDNA lineages as species ( Figure 1). However, as nuclear DNA (nDNA) datasets have rapidly become larger and more easily generated over the last decade, it has become increasingly clear that signals in mtDNA and nDNA datasets can be geographically discordant and that geographic groupings identified in mtDNA datasets may not always reflect barriers to the movement of nuclear alleles (Burbrink et al., 2021;Fontenot et al., 2011;McGuire et al., 2007;Toews & Brelsford, 2012).
To provide further insight into how major ecological transitions across North America affect patterns of genetic differentiation across the L. getula species complex, we used double digest restriction site-associated (ddRAD) sequencing to sequence thousands of nuclear loci from kingsnakes spanning the range of the species complex. Using these data, we asked a set of related questions: (1) how many discrete lineages exist within the L. getula species complex and do lineage breaks correspond to known biogeographical barriers; (2) what are the environmental correlates of genetic differentiation that may be driving or maintaining lineage isolation; and (3) what is the history of isolation and gene flow among lineages and to what degree have isolation or migration changed through the Pleistocene. We show that there are two primary genetic breaks within common kingsnakes: the transition from great plains to desert and the Cochise filter barrier.
East of these barriers, isolation by distance, rather than discrete structure, dominates. Gene flow among discrete genetic clusters is high and asymmetric, and demographic models suggest a history of multiple cycles of isolation followed by subsequent migration, likely shaped by the dynamic climatic history of the Nearctic.
These results show that widespread species or species complexes may show little or no structure across some established boundaries, and that the amounts of divergence and gene flow occurring at lineage boundaries that do exist can be modulated by the timing and duration of past cycles of isolation as climate changes.

| Dataset
We extracted DNA from 56 tissue samples from members of the L. getula species complex obtained from museum collections (Appendix S1, available on Dryad at https://doi.org/10.5061/ dryad.18931 zd16) using the Qiagen DNEasy kit following manufacturer protocol. We sent samples to the University of Wisconsin-Madison Biotechnology Center for ddRAD library preparation using enzymes PstI and Bfal and subsequent sequencing on an Illumina NovaSeq6000 to generate paired-end 150 bp reads. We obtained 165 M reads across all 56 samples. We assembled the data using the ipyrad 0.9.63 pipeline (Eaton & Overcast, 2020) using primarily default parameters (e.g. maximum of 5 low-quality bases, clustering threshold of 0.85; remaining parameters are shown in the params file on Github (https://github.com/seanh arrin gton2 56/Lgetu la_gbs)). After an initial round of assembly, we decided to remove five individuals from the dataset due to their large amounts of missing data, and proceeded with a dataset consisting of 51 individuals.
We required a locus to be present in at least 38 of 51 individuals (~75% of individuals) to be retained, resulting in a final single nucleotide polymorphism (SNP) matrix of 168,568 sites containing ~19% missing data.
We also used ipyrad to generate a dataset using the same parameters, but including only the 35 individuals from what we defined as the eastern lineage (see results) to determine whether analyses on this lineage only would show finer discrete genetic structure. We required a locus to be present in at least 26 individuals (~75% of individuals) to be retained, resulting in a SNP matrix of 160,995 sites with ~19% missing data. Finally, we generated a western dataset that included the individuals in each of two western lineages, which correspond to the currently recognized L. californiae and L. splendida lineages (see results). This included 16 individuals, and we required loci to be present in at least 12 individuals to be retained, resulting in a SNP matrix of 73,363 sites containing ~18% missing data.

| Lineage clusters
We identified lineage clusters and admixture among those clusters across the species complex. We used the sparse nonnegative matrix factorization (sNMF) method (Frichot et al., 2014) implemented in the 'LEA' R package v3.2.0 (Frichot & François, 2015) in R version 4.1.1 (R Core Team, 2021). We ran sNMF for values of K (the number of discrete clusters) ranging from 1 to 10 with 10 replicates each and selected the optimal value of K as the value with the lowest cross-entropy score. We also ran discriminant analysis of principal components (DAPC), a method that does not explicitly model genetic variation, for each value of K using functions in the 'adegenet' R package v2.1.5 (Jombart, 2008;Jombart & Ahmed, 2011;Jombart et al., 2010) to confirm consistent clustering between methods. We repeated these same analyses for the datasets each composed of only eastern or only western individuals.

| Correlates of genetic structure
We next tested the effects of geographic and environmental distances in shaping genetic distances across the complex. We calculated uncorrected genetic distances and linear geographical distances in R and used functions from the 'adegenet' package to create kernel density plots to visualize the relationship between genetic and geographical distances. To simultaneously test the effect of geographic and environmental distances on genetic distances, we used a generalized dissimilarity modelling (GDM; Ferrier et al., 2007;Fitzpatrick & Keller, 2015) approach applying functions from the 'gdm' R package v1.5.0-1 (Fitzpatrick et al., 2021). We downloaded the Bioclim dataset (Hijmans et al., 2005) and elevation data from the Worldclim database (Fick & Hijmans, 2017) and combined these with the Envirem dataset (Title & Bemmels, 2018), then removed highly correlated variables with r > 0.8, leaving us with 13 climatic and environmental variables (Bio1, mean annual temperature; Bio2, mean diurnal temperature range; Bio4, temperature seasonality; Bio5, maximum temp of the warmest month; Bio8, mean temperature of the wettest quarter; Bio9, mean temperature of the driest quarter; Bio12 annual precipitation; Bio15, precipitation seasonality; Bio18, precipitation of the warmest quarter; SAGA-GIS topographic wetness index; terrain roughness index; Thorn's aridity index; PET seasonality; elevation). We calculated pairwise distances among our samples for each variable and then ran a GDM model including all environmental distances plus geographical distance as predictors of genetic distance. We repeated these same analyses for the dataset composed of only eastern or only western individuals.

| Historical demography
Finally, we used FastSimCoal v2.6 (FSC; Excoffier et al., 2013;Excoffier & Foll, 2011) to estimate population genetic parameters and the best model of divergence among the lineages identified by sNMF. This program requires the site frequency spectrum (SFS) as input; we used easySFS (https://github.com/isaac overc ast/easySFS) to convert SNPs to the SFS and assigned samples to lineages based on the best fit K = 3 clustering from sNMF. Admixed individuals were assigned to whichever cluster contributed the largest ancestral allele proportion to an individual. We tested the fit of 14 models that included various combinations of divergence time, migration and population size parameters ( Figure S1, available on Dryad at https://doi. org/10.5061/dryad.18931 zd16). These models are primarily distinguished by whether and when migration is included among lineages.
One model includes no migration and is a simple isolation model. Some models include migration that is constant among lineages beginning at the time of divergence. Other models represent secondary contact that include a period of isolation with no migration immediately after lineage divergence followed by migration. Finally, a subset of these models were parameterized to include population size change along branches ( Figure S1). For all models other than star tree models, we assumed a phylogeny in which the two western lineages are sister to each other to the exclusion of the eastern lineage. This is based on the mtDNA results of Pyron and Burbrink (2009) and our K = 2 clustering results, which support the split between the eastern lineage and western lineages combined as the most differentiated split (see results below).
We ran 50 replicate FSC analyses for each model, with each analysis consisting of 100,000 simulations for composite likelihood estimation and 40 ECM cycles for parameter optimization. We consider the maximum pseudo-likelihood estimate for each model to be the single replicate run with the highest likelihood. Parameter estimation from the folded SFS in FSC requires that at least one parameter value be fixed. We chose to fix the root split of the L. getula complex to 4.4 Mya based on multiple previous studies that have converged on estimates near this value using different methods and datasets (Burbrink & Gehara 2018;Chen et al., 2017;Pyron & Burbrink, 2009;Zheng & Wiens, 2016). We used the number of parameters and likelihood estimated by FSC to calculate Akaike information criterion (AIC) and AIC weights for each model to compare model fits. For the single best fit model, we performed parametric bootstrapping to estimate confidence intervals around the parameter estimates. We used FSC to generate 100 bootstrap SFS replicates from the best fit model parameters, and then re-estimated the parameters for each dataset following the same procedure as for the empirical SFS described above.

| Discrete structure
A model of three lineages provided the best fit to the data based on the cross-entropy scores from sNMF (Figures 2, Figure S2). This includes a break in west Texas at the boundary between the Great Plains and the more arid desert regions immediately to the west and a boundary at the Cochise filter barrier at the Arizona/New Mexico border.
Admixed individuals are present at both boundaries. We also examined plots of K = 2, 4 and 5, although these were suboptimal based on cross-entropy scores ( Figures S2 and S3, available on Dryad). When individuals were classified into only two clusters, a single genetic break occurs at the Great Plains/desert transition, implying the most differentiation between the eastern lineage and the two western lineages. At K = 4, the eastern lineage splits into two near the Mississippi River, with very extensive admixture among the two eastern lineages.
At K = 5, those previously admixed individuals are assigned to a unique cluster, still showing considerable admixture with the lineage to the west. The two western lineages in the best fit K = 3 model are broadly concordant with the species (and previously subspecies) designations for the California, L. californiae, and desert, L. splendida, kingsnakes and so for the duration of this paper, we refer to these as the L. californiae and L. splendida lineages, while referring to the remaining lineage simply as the eastern lineage. Results from DAPC analyses at each value of K were concordant with results from sNMF ( Figure S3).

| Environmental and geographical effects on lineage structure
A kernel density plot of genetic versus geographic distance shows a pattern consistent with IBD, but also additional spatial structure ( Figure 3). If IBD were the only force shaping spatial genetic diversity, we would expect a single, linear cloud of points. Instead, we see partial disjunctions in the cloud and a change in slope that indicates the presence of discrete genetic structure as well. Our GDM analysis, which includes geographical as well as several environmental distances as potential predictors of genetic distance, supports the strong role of geographic distance in shaping genetic distance across the kingsnake complex. Although geographic distance is the most important predictor, we also find significant effects of precipitation in the warmest quarter and Thorn's aridity index (Figure 4). Both of these environmental predictors show curves approximating exponential increases in genetic distances  with increases in environmental distance, whereas geographic and genetic distances show an approximately linear relationship ( Figure 4).
After running the above analyses, we repeated them on a dataset composed of only the eastern lineage to determine if finer-scale structure was being masked when analysing the full dataset. We found that the lowest cross-entropy score from sNMF was for K = 2; however, this was only slightly lower than the score for K = 1, indicating it does provide a considerably better fit ( Figure S4). A kernel density plot of genetic versus geographic distance showed a single cloud of points indicating continuous spatial structure, rather than discrete structure and GDM showed a significant and roughly linear effect of geographic distance on genetic distance ( Figure S5).
Therefore, we consider the eastern cluster to be a single large lineage, although we do not rule out the possibility that eastern individuals were once fragmented into multiple populations that have now merged. When analysing the western individuals alone with sNMF, we find the same differentiation into the L. californiae and L. splendida lineages as when we analysed all individuals together ( Figure S6). Additionally, the kernel density plot of genetic against geographic distance for these western individuals showed an obvious disjunction indicating discrete structure ( Figures S6 and S7).
A unique set of environmental variables significantly affect genetic distance for the eastern and western datasets. In the eastern lineage, isolation by distance is the most important variable, as in the model for all individuals, with precipitation in the warmest quarter as the only significant environmental predictor. In the GDM model for only western samples (L. californiae + L. splendida), geographic distance, mean diurnal temperature range and mean temperature of the driest quarter were all significant predictors of genetic distance with the two environmental predictors both having higher variable importance than geographic distance ( Figure S8).

| Demographic modelling
A model including multiple episodes of divergence and secondary contact plus population size changes best fit these data using the program FSC ( Figure 5,    (Table S1), and we therefore expect that this discrepancy is a result of a factor not modelled by any of the FSC models we included. We consider the most likely candidate to be continuous geographical structure within lineages (as shown by GDM analyses and kernel density plots of genetic and geographical distance), which is not adequately modelled in conjunction with complex scenarios of divergence and gene flow in any program of which we are aware.

| DISCUSS ION
Using genome-wide RAD data, we show that the genetic structure and history of divergence within the common kingsnake complex are much more complicated than the scenario detectable from mtDNA alone. This is an increasingly common pattern across many taxa (e.g. Burbrink et al., 2021;Chan & Levin, 2005;Firneno et al., 2021;Myers et al., 2020;Streicher et al., 2016;Toews & Brelsford, 2012), as the relatively recent ability to obtain 1,000s of nuclear loci has greatly expanded our power to make complex population genetic inferences. We identify three discrete lineages within the common kingsnake complex, corresponding to L. californiae, L. splendida and a single eastern lineage composed of the rest of the complex. Our analyses suggest considerable gene flow among these lineages, and continuous spatial structure resulting from isolation by distance within them. Although we identify considerable gene flow across the complex that was not hypothesized based on mtDNA alone, we note that the lineages identified by Pyron and Burbrink (2009) are not discordant with our findings per se. When we examined clusters inferred by sNMF under a five cluster model, the inferred lineages matched those proposed as species by Pyron and Burbrink (2009).
However, higher sNMF support for all eastern samples as a single cluster and a linear relationship between genetic and geographical distance for the eastern samples indicates that the structure in the eastern lineage of the L. getula complex (holbrooki, nigra and getula sensu stricto) is primarily continuous spatial structure, consistent with isolation by distance, rather than discrete lineage structure.
A similar pattern was identified in the Eastern Pinesnake (Pituophis melanoleucus) complex, which also shows clinal structure across the eastern Nearctic dominated by isolation by distance (Nikolakis  Time ( Parameter estimates are shown as violin plots of estimates from 100 parametric bootstraps, with the maximum pseudo-likelihood estimates shown as black diamonds. Note that the y-axis for migration rates is truncated, due to high uncertainty in the migration rate from the ancestral western lineage into the eastern lineage in the past. et al., 2022). This does not imply that the eastern lineage was not fragmented at some point during glacial cycles or subject to other processes, only that our data do not indicate the presence of multiple discrete lineages at present.
In addition to the strong, expected effect of geographic distance on genetic distance across the L. getula complex, we also identified roles for environmental factors in shaping genetic distance. Precipitation in the warmest quarter (i.e. summer rains in the Nearctic) and Thorn's aridity index both have strong effects on genetic distances, with genetic distances increasing more than linearly with these environmental distances. The transition from the great plains to the desert is the most dramatic shift in precipitation and temperature (both of which define degree of aridity) and also the most differentiated lineage break that we identify in the common kingsnake complex. Organisms in desert environments regularly encounter extreme high temperatures and water scarcity not encountered by organisms of the plains and forested environments.
Survival in these harsh environments is often facilitated through combinations of adaptive and plastic changes to temperature tolerances, water loss, metabolic rates and thermoregulatory behaviours (Bradshaw, 1988;Lahav & Dmi'El, 1996;Rocha et al., 2021;Williams & Tieleman, 2005). Additionally, it has been shown that water loss is highest in mesic habitats and corresponds to greatly reduced biogeographical transitions between arid and mesic habitats relative to between arid and semi-arid regions (Cox & Cox, 2015). We therefore suspect that kingsnakes in desert environments likely show aridspecific adaptation and that this may help to drive lineage differentiation and the association between genetic distance and aridity and precipitation across the complex.
We find that aridity is not a significant predictor of genetic distance in the subset datasets containing only eastern or western samples, suggesting that aridity may specifically be important in shaping the boundary between lineages at the plains/desert boundary. In the eastern lineage, precipitation of the warmest quarter remains a significant non-spatial predictor of genetic distance. In the western lineage, two temperature-based environmental predictors are each more important than geographical distance in shaping these west-  (Provost et al., 2021).
Overall, this model suggests multiple rounds of divergence in isolation, followed by high migration in secondary contact, consistent with the mixing-isolation-mixing model of divergence (He et al., 2019). It is possible that the history of isolation and contact is more complex than can be modelled with these data, with several successive rounds of isolation and contact. Therefore, additional isolation and contact may have occurred in response to the multiple glacial cycles during the Pleistocene or earlier climatic fluctuations.
However, there is limited resolution to detect isolation and contact over short time-scales. Distinguishing even a single isolation and contact event from an isolation with migration model can be difficult if lineages were not isolated for sufficient time before coming into contact (Roux et al., 2016). We expect that the mixing-isolationmixing model of divergence may be prevalent among many widespread species groups in areas of cyclic climate change, and hope that the continued development of new methods will enable accurate identification of this process.
Our results suggest that lineages within the L. getula complex are clearly not completely reproductively isolated from one another. Migration rates among current lineages range from 0.31 to 4.93 effective individuals/generation. The latter migration rate is far above the often cited threshold of 1 individual/generation that will cause populations to merge into panmixia (Wright, 1931); however, this threshold is derived under a simple population model that does include complexities such as continuous spatial structure (Wang, 2004). It is intriguing that this order of magnitude difference in migration rates is consistently asymmetric: migration rates from east to west are all less than one, whereas eastward migration rates out of western lineages are always greater than one. This pattern occurs among the three present-day lineages and in the past prior to the divergence of the L. californiae and L. splendida lineages. One speculative hypothesis for this is that survival in the arid environments of the west may require adapta-

| CON CLUS IONS
We demonstrate that across the varied habitats of the Nearctic, genetic diversity in the common kingsnake is shaped by complex mixtures of present climatic gradients, geography, and cyclical rounds of isolation and contact influenced by a changing climate. These varied processes result in a complicated mixture of discrete and continuous spatial structure. Across the complex, geographic distance, rainfall and aridity affect genetic distance, suggesting a primary role for the transition from great plains to desert habitats in contributing to lineage differentiation. At least two rounds of isolation followed by secondary contact characterize the divergence history of the complex, indicating a role for climatic cycles in temporarily isolating lineages, after which point they partially or perhaps near fully merged before the next round of isolation. We propose that survival in harsh desert environments may require genetic adaptations which mediate asymmetric migration and maintain lineage boundaries, such that selection limits the ability of eastern snakes and the alleles they carry to migrate westward. Testing this hypothesis will require a genomic approach in the future to characterize the landscape of loci that do or do not resist migration across lineage boundaries.

ACK N OWLED G EM ENTS
Tissues were obtained from the following personal and institutional collections (and therefore no permits were required), and we thank all institutions and staff for their contributions: Lauren

CO N FLI C T O F I NTE R E S T
The authors declare that we have no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All scripts for data processing and analysis are available at https://