Published October 24, 2025 | Version v1
Software Open

Data from: Genetic, phenotypic, and environmental drivers of local adaptation and climate-change induced maladaptation in yellow warblers

  • 1. Colorado State University
  • 2. University of California, Davis

Description

Understanding the processes driving local adaptation in wild species is a key goal in evolutionary biology, but linking genotype to phenotype to environmental drivers of natural selection remains challenging. This dataset contains the data needed to replicate the analyses in Rodriguez et al., which explores the connections between genotypes, phenotypes, and environment in yellow warblers across their breeding range. First, we conduct genome-wide association studies (GWAS) to identify loci related to bill shape and individual quality. We then conduct a gene-environment association (GEA) analysis on the resulting loci and find that precipitation underlies putative selection on bill shape. Finally, we test whether contemporary individuals whose bill shape deviates from historical relationships with precipitation exhibit increased stress, measured by telomere length, resulting from maladaptation. We collected samples from 121 yellow warblers from two reference populations in Michigan and Pennsylvania. At each site, birds were captured using mist-netting, bill depth measurements were taken, and blood samples were collected via brachial venipuncture and preserved in Queen's lysis buffer. Further, we collected an additional 171 genetic samples from 22 sites across the yellow warbler breeding range to validate associations between allele frequencies and environmental variables in key loci. Of these 171 samples, 63 samples with bill depth measurements from 10 sites across the breeding range were also used to validate the associations between bill depth and environmental variables. In addition, 169 historical yellow warbler samples were collected from museum specimens from the breeding range for a population structure analysis testing whether local populations have shifted their geographic ranges over the last century.

Notes

Funding provided by: U.S. National Science Foundation
ROR ID: https://ror.org/021nxhr62
Award Number: 006784

Methods

We collected samples from 121 yellow warblers from two reference populations in Michigan and Pennsylvania. At each site, birds were captured using mist-netting, bill depth measurements were taken, and blood samples were collected via brachial venipuncture and preserved in Queen's lysis buffer. Further, we collected an additional 171 genetic samples from 22 sites across the yellow warbler breeding range to validate associations between allele frequencies and environmental variables in key loci. Of these 171 samples, 63 samples with bill depth measurements from 10 sites across the breeding range were also used to validate the associations between bill depth and environmental variables.

Whole-genome sequencing libraries were prepared following modifications of Illumina's Nextera Library Preparation protocol, with a target sequencing depth of 2X per individual. We used Trimmomatic 0.39 to trim the sequence data, removing Illumina adapter sequences and polyG tails with a sliding-window approach (SLIDINGWINDOW:4:20). We then mapped reads to the yellow warbler reference genome (NCBI BioProject PRJNA777222) using BWA 0.7.17. After mapping, the resulting SAM files were sorted, converted to BAM files, and indexed using Samtools version 1.16. We used MarkDuplicates from Picard (http://broadinstitute.github.io/picard) to mark duplicate reads and clipped overlapping reads with the clipOverlap function from bamUtil. To reduce variation in sequencing depth, we used the downsample function from Picard to downsample BAM files with greater than 3X coverage to 3X coverage. This resulted in an average read depth of 2.7X.
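The downsampling step reduces to one retention probability per BAM file. The following is a minimal sketch, not the authors' script, assuming the fraction of reads kept is the target depth divided by the file's observed mean depth (the kind of value Picard's DownsampleSam accepts as its PROBABILITY argument). The function name and the example depths are hypothetical, and the record does not state how per-file depth was estimated.

```python
def downsample_probability(mean_depth, target_depth=3.0):
    """Return the fraction of reads to keep so a file lands near target_depth.

    Files already at or below the target are left untouched (probability 1.0).
    Both depths are mean genome-wide coverage values (assumed to be known).
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    if mean_depth <= target_depth:
        return 1.0
    return target_depth / mean_depth


# Hypothetical example: a 6X file keeps half of its reads, a 2X file keeps all.
for depth in (6.0, 2.0):
    print(depth, downsample_probability(depth))
```

Because only files above 3X are reduced, the post-downsampling average can fall below 3X, which is consistent with the reported mean of 2.7X.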

To identify genetic markers from low-coverage WGS data, we used HaplotypeCaller in the Genome Analysis Toolkit (GATK version 4.1.6.0), applying a minimum base quality score of 33 and a minimum mapping quality score of 20 to reduce lane effects. To parallelize genotype calling, we generated genomic databases in ~3 Mb intervals across the genome, then combined and indexed the genotyped VCF files with BCFtools 1.16. To remove systematic errors, we applied a hard filter to the resulting VCF file with the parameters "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0", filtering indels separately with "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0". We then used BCFtools to keep biallelic sites (-m 2 -M 2) missing in fewer than 20% of the sampled individuals ('F_MISSING < 0.20'), with a minor allele frequency of at least 0.05 (--min-af 0.05, --max-af 0.95), and with a sequencing quality score above 30 ('QUAL > 30'). This filtering resulted in 2,999,708 variants in 298 individuals with an average of 21% missing data.
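The site-level filters can be read as a single predicate per variant. The sketch below restates the thresholds in plain Python for illustration only; it is not BCFtools, and it assumes genotypes are already encoded as alternate-allele counts (0, 1, 2) with `None` for missing calls. The function name, the encoding, and the inclusive handling of the allele-frequency bounds are assumptions.

```python
def passes_site_filters(alt_counts, qual,
                        max_missing=0.20, min_af=0.05, max_af=0.95,
                        min_qual=30.0):
    """Check one biallelic site against the missingness, AF, and QUAL thresholds.

    alt_counts: per-individual alternate-allele counts (0, 1, 2) or None if missing.
    qual: the site's QUAL value.
    """
    n_samples = len(alt_counts)
    called = [g for g in alt_counts if g is not None]
    if n_samples == 0 or not called:
        return False
    f_missing = 1.0 - len(called) / n_samples
    # Diploid individuals: two allele copies per called genotype.
    alt_af = sum(called) / (2 * len(called))
    return (f_missing < max_missing
            and min_af <= alt_af <= max_af
            and qual > min_qual)


# Hypothetical site: 10 individuals, 1 missing call, intermediate frequency.
print(passes_site_filters([0, 1, 2, 1, 0, 1, 1, 0, 1, None], qual=50.0))
```

Bounding the alternate-allele frequency between 0.05 and 0.95 is what makes the filter act on the minor allele regardless of which allele is the reference.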

We measured telomere length from BAM files using TelSeq v0.0.2. We modified parameters in the TelSeq source code to adapt it to the yellow warbler genome, changing the number of chromosomal ends, the read length, and the total genome length (bp) within the relevant GC-content range. The parameters TELOMERE_ENDS, READ_LENGTH, and GENOME_LENGTH_AT_TEL_GC were set to 62, 100, and 143831148, respectively. We calculated the last of these by summing the length of all 150 bp windows in the yellow warbler genome with a GC content between 48% and 52%.
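The GENOME_LENGTH_AT_TEL_GC calculation can be sketched as a window scan over the reference sequences. This is an illustration under stated assumptions rather than the authors' code: windows are taken as non-overlapping, a trailing partial window is ignored, ambiguous bases such as N count as non-GC, and the function name is hypothetical.

```python
def length_at_telomeric_gc(sequences, window=150, gc_min=0.48, gc_max=0.52):
    """Sum the length of fixed-size windows whose GC fraction is in [gc_min, gc_max].

    sequences: iterable of chromosome/scaffold strings (already loaded in memory).
    """
    total = 0
    for seq in sequences:
        seq = seq.upper()
        # Step through non-overlapping windows; skip the incomplete tail.
        for start in range(0, len(seq) - window + 1, window):
            chunk = seq[start:start + window]
            gc_fraction = (chunk.count("G") + chunk.count("C")) / window
            if gc_min <= gc_fraction <= gc_max:
                total += window
    return total


# Toy example with 10 bp windows: the first window is 50% GC, the second 0%.
print(length_at_telomeric_gc(["GCGCGATATAATATATATAT"], window=10))
```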

To compare the current and pre-climate-change associations between phenotype and environment, we used data from Wiedenfeld (1991), which include morphometric measurements from 153 yellow warblers captured between 1873 and 1987. We used wing chord as a proxy for body size to calculate body-size-corrected bill depth in historical and current samples. Using capture locations, we extracted historical monthly climate data from WorldClim for the breeding months of May, June, and July for the years 1901–1950 for each sample, which we then averaged. Because bioclim variables are not available for historical time periods, we used average precipitation. We then used the 'lm' function in R version 3.5.3 (https://www.R-project.org) to fit linear models testing the association between bill depth and the environment for both our historical and contemporary samples. We then calculated the residuals of the contemporary samples from the historical line of best fit. We used those residuals as a measure of change between the historical and contemporary relationships between bill depth and climate, where a larger residual means a bigger mismatch between bill depth and the environment relative to what we assume is the pre-climate-change optimum.
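The mismatch measure described here amounts to fitting a line to the historical samples and scoring each contemporary bird by its distance from that line. A minimal sketch in plain Python follows (the analysis itself used R's `lm`); the function names and the toy numbers are hypothetical, and bill depth is assumed to be body-size corrected before this step.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def residuals_from_historical(hist_precip, hist_bill, cont_precip, cont_bill):
    """Observed contemporary bill depth minus the value the historical fit predicts."""
    slope, intercept = fit_line(hist_precip, hist_bill)
    return [bill - (intercept + slope * precip)
            for precip, bill in zip(cont_precip, cont_bill)]


# Toy data: the historical relationship is bill = 2 * precip.
print(residuals_from_historical([1, 2, 3], [2, 4, 6], [2, 4], [5, 7]))
```

The residuals keep their sign, so a bird can sit above or below the historical expectation; taking the absolute value would give an unsigned mismatch.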

We used population structure analyses to ask whether local populations have shifted their geographic ranges over the last century. We assembled a collection of 169 historical samples of yellow warblers sampled on their breeding range. Historical samples were skin or toe pads loaned from museums (Supplementary Table ##). All samples were extracted using the Qiagen DNeasy Blood and Tissue Kit and genotyped at a set of 96 SNPs previously identified for geographic assignment (Bay et al. 2021), using a Fluidigm 96.96 IFC controller. After SNP genotyping, we discarded individuals with poor-quality data (<50% of SNPs genotyped). Genotypes from historical samples were combined with previously genotyped contemporary yellow warblers (1990-present) sampled on their breeding range (Bay et al. 2021). This left us with a final set of 551 samples (129 historical and 422 contemporary).
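The quality filter on genotyped individuals is a per-sample call-rate threshold. A minimal sketch, assuming genotypes are held as a mapping from sample ID to a list of calls with `None` marking a failed SNP; the function name, data layout, and sample IDs are hypothetical.

```python
def filter_by_call_rate(genotypes, min_call_rate=0.5):
    """Keep samples genotyped at min_call_rate or more of their SNPs.

    genotypes: dict mapping sample ID -> list of calls (None = not genotyped).
    Samples below the threshold (here, <50% of SNPs genotyped) are discarded.
    """
    kept = {}
    for sample, calls in genotypes.items():
        if not calls:
            continue
        call_rate = sum(call is not None for call in calls) / len(calls)
        if call_rate >= min_call_rate:
            kept[sample] = calls
    return kept


# Hypothetical samples at 4 SNPs: one fully genotyped, one at 25%.
example = {
    "hist_001": [0, 1, 2, 1],
    "hist_002": [None, None, None, 1],
}
print(sorted(filter_by_call_rate(example)))
```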

We performed principal components analysis (PCA) on contemporary samples only to establish the relationship between genetic variation and geography. PCA was performed using the SNPRelate package in R v4.3.2. We then predicted loadings of historical samples using the snpgdsPCASampLoading function. Historical samples were plotted alongside contemporary samples to visualize whether relationships between genetic variation and geography changed over time. We used linear models to test for effects of latitude, longitude, and time (historical vs. contemporary) on PC axes.
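The projection idea (define the axes from contemporary samples, then place historical samples on them) can be illustrated without SNPRelate. The sketch below is a simplified stand-in, not a reimplementation of `snpgdsPCASampLoading`: it assumes complete genotypes coded 0/1/2, centers by contemporary SNP means without SNPRelate's allele-frequency scaling, and recovers only the first axis by power iteration. All function names are hypothetical.

```python
import math


def column_means(rows):
    """Per-SNP mean genotype across samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def first_pc_loadings(rows, means, iterations=200):
    """Leading SNP-loading vector via power iteration on X^T X (X = centered rows)."""
    centered = [[g - m for g, m in zip(row, means)] for row in rows]
    n_snps = len(means)
    # Deterministic, non-uniform start; fails only if it is orthogonal to PC1.
    v = [1.0 / (j + 1) for j in range(n_snps)]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new_v = [sum(s * row[j] for s, row in zip(scores, centered))
                 for j in range(n_snps)]
        norm = math.sqrt(sum(w * w for w in new_v))
        if norm == 0:
            raise ValueError("no variance along the starting direction")
        v = [w / norm for w in new_v]
    return v


def project(rows, means, loadings):
    """PC1 scores for any samples, centered with the reference (contemporary) means."""
    return [sum((g - m) * w for g, m, w in zip(row, means, loadings))
            for row in rows]


contemporary = [[0, 0], [1, 1], [2, 2]]   # toy genotypes, 3 samples x 2 SNPs
historical = [[2, 2]]
means = column_means(contemporary)
loadings = first_pc_loadings(contemporary, means)
print(project(contemporary, means, loadings), project(historical, means, loadings))
```

Centering the historical samples with the contemporary means is the key design choice: it keeps the historical samples from influencing the axes they are being compared against. The sign of a principal axis is arbitrary, so only relative positions are meaningful.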

Files

Files (16.9 kB)

Checksum, Size
md5:1afea3b10815cfc19ae6255738fa6a55, 1.7 kB
md5:878bf243dac83d8ea8794ce42ef1a396, 2.4 kB
md5:eeffa19229698978c4c1124bcfc1d22f, 6.6 kB
md5:292fb55dd00071a2e512adffe918bce9, 410 Bytes
md5:e599d6636b12a1560d9ab299bb5c7905, 2.0 kB
md5:fd24df8d51879956e5dcb7eb0b634aef, 3.4 kB
md5:314d4bae5a564d5681fbb7bbb7c6a184, 360 Bytes

Additional details

Related works

Is source of
10.5061/dryad.hmgqnk9tp (DOI)