# Data and code repository README

21 Oct 2022
Carla Cicero, Nicholas A. Mason, Zheng Oong, Pascal O. Title, Melissa E. Morales, Kevin A. Feldheim, Michelle S. Koo, and Rauri C. K. Bowie. Deep ecomorphological and genetic divergence in Steller's Jays (Cyanocitta stelleri, Aves: Corvidae). Ecology & Evolution.

## Description of the data and file structure

This directory contains R scripts and data files to replicate all analyses presented in the manuscript. 

The main Dryad directory contains 5 folders: supplemental tables, scripts, data, utility and output. When running the R scripts, set your working directory to this main directory. All paths are relative to this position.

* The supplemental tables directory contains original data used in this study.
* The scripts directory contains all R code to run analyses and to generate figures.
* The data directory contains data files needed to run the analyses. 
* The utility directory contains other files needed for the analyses.
* The output directory is empty, but is where R scripts will save figures upon execution of the code.
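
The layout described above can be checked from R before running any script. The following is a minimal sketch, not part of the deposited code: `missing_folders` is a hypothetical helper, and the folder names are assumed to match the wording of this README exactly (the supplemental tables folder is left out because its exact name on disk is not stated here).

```r
# Hypothetical helper (not part of the deposited scripts): return the names
# of expected folders that are NOT present under `root`.
# Folder names are an assumption based on the wording of this README.
missing_folders <- function(root,
                            expected = c("scripts", "data", "utility", "output")) {
  expected[!dir.exists(file.path(root, expected))]
}

# Assumed usage: first set the working directory to the main Dryad directory,
# e.g. setwd("path/to/main/directory")  # placeholder path, replace with your own
missing <- missing_folders(getwd())
if (length(missing) > 0) {
  message("Folders not found under the working directory: ",
          paste(missing, collapse = ", "))
}
```

If the message lists any folders, the working directory is probably not the main directory, and the relative paths used by the scripts will not resolve.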

For all modeling and analyses related to the geographic distribution of Steller's Jays and to environmental data, please see the SDM-specific README (README_SDM.txt).

Descriptions of files:

* The supplemental tables directory contains...
************************************************
- SuppInfo_TableS1_specimen_records_and_GenBank_numbers: This file contains the collection data for specimens used in this study, as well as GenBank accession numbers and the number of microsatellite loci analyzed for each specimen.
- SuppInfo_TableS2_measurement_data: This file contains the raw morphological measurements and frontal streak coloration for each specimen used in this study.
- SuppInfo_TableS3_popart_input: This is a Nexus file for input of ND2 haplotypes to PopART.
- SuppInfo_TableS4_microsatellites_rawdata: This file contains raw genotype data at 12 microsatellite loci for all specimens used in this study.
- SuppInfo_TableS6_microsatellite_variability_allpops: This file contains data on microsatellite variability at each locus across all populations studied.
- SuppInfo_TableS7_microsatellite_LD_test_summary: This file contains pairwise comparison tests for linkage disequilibrium in the microsatellite data.

In addition to these 6 supplemental tables, Supplemental Table S5 is deposited in Zenodo and can be accessed at the following URL: https://doi.org/10.5281/zenodo.7311504. This file contains Steller's Jay records downloaded from the Global Biodiversity Information Facility (GBIF) and filtered to reduce spatial sampling bias in species distribution models. The file is not deposited in Dryad because of a licensing conflict. The file contains the GBIF licensing for each record.


* The data directory contains...
********************************
- Cst_msat_Contact_K2.txt: This file contains the microsatellite Q-scores obtained from STRUCTURE analysis, subset to the individuals from the 11-population transect along the putative contact zone from northern Idaho to southern Utah. It is the input file for fitting the microsatellite cline (HZAR) in Cyanocitta_HZAR_msat_v3.R.
- Cst_msat_Contact_K2_Locality.txt: This file contains the cumulative linear distances of the 11 localities along the contact zone transect. It provides the locality information for the microsatellite cline fitting (HZAR) in Cyanocitta_HZAR_msat_v3.R.
- Cyanocitta_morpho_clinal_locality.txt: This file contains the cumulative linear distances of the 11 localities along the contact zone transect. It provides the locality information for the morphological cline fitting (HZAR) in Cyanocitta_HZAR_morpho.R. It is identical to Cst_msat_Contact_K2_Locality.txt.
- Cyanocitta_morpho_PCAscores.txt: This file contains the first 2 principal component scores obtained from the morphology PCA, subset to the individuals from the 11-population transect along the putative contact zone from northern Idaho to southern Utah. It is the input file for fitting the morphological cline (HZAR) in Cyanocitta_HZAR_morpho.R.
- hzar_mtDNA_input.txt: This file is a tab-delimited table of the Interior vs. Rocky Mountains ND2 haplotype frequencies for the 11 contact zone populations. It also contains the cumulative linear distances of the 11 localities. It is the input file for fitting the mtDNA cline (HZAR) in Cyanocitta_HZAR_ND2_v3.R.
- Cyanocitta_streaks_input.txt: This file is a tab-delimited table of the blue vs. white frontal streak frequencies for the 11 contact zone populations. It also contains the cumulative linear distances of the 11 localities. It is the input file for fitting the frontal streaks cline (HZAR) in Cyanocitta_HZAR_streaks.R.
- Cst-nDemes500-chain1: This folder contains all the output from the first replicate chain from the program EEMS run with 500 demes (vertices) with the microsatellite data set. See https://github.com/dipetkov/eems for more information.
- Cst-nDemes500-chain2: This folder contains all the output from the second replicate chain from the program EEMS run with 500 demes (vertices) with the microsatellite data set. See https://github.com/dipetkov/eems for more information.
- Cst-nDemes500-chain3: This folder contains all the output from the third replicate chain from the program EEMS run with 500 demes (vertices) with the microsatellite data set. See https://github.com/dipetkov/eems for more information.
- 1_All_Sequences - Edited_forPopART.nex: This is the input file formatted to create the haplotype network based on mtDNA data.
- 2_Cyanocitta_microsats_19Apr2022_FINAL.xlsx: This file contains the microsatellite data from our study and is referenced by various scripts here. The "Master list" tab has all the sample names and the corresponding repeat lengths for each microsatellite locus. The "Sample (sorted)" and the "Input_file" sorted tabs are differently formatted versions of the "Master list" tab. The "Pop list" tab indicates which numerical population each sample belongs to and provides a name for each population. The "Pop list with ind #" tab indicates which number corresponds to which population and which individuals belong to each population.
- AllPopsK2.outfile: This is the STRUCTURE output for all populations run with K = 2. It is referenced in the scripts that generate the STRUCTURE figures.
- AllPopsK3.outfile: This is the STRUCTURE output for all populations run with K = 3. It is referenced in the scripts that generate the STRUCTURE figures.
- AllPopsK4.outfile: This is the STRUCTURE output for all populations run with K = 4. It is referenced in the scripts that generate the STRUCTURE figures.
- AllPopsK5.outfile: This is the STRUCTURE output for all populations run with K = 5. It is referenced in the scripts that generate the STRUCTURE figures.
- AllPopsK6.outfile: This is the STRUCTURE output for all populations run with K = 6. It is referenced in the scripts that generate the STRUCTURE figures.
- AllPopsK7.outfile: This is the STRUCTURE output for all populations run with K = 7. It is referenced in the scripts that generate the STRUCTURE figures.
- CoastalK3.outfile: This is the STRUCTURE output for the Coastal populations run with K = 3. It is referenced in the scripts that generate the STRUCTURE figures.
- CosIntK2.outfile: This is the STRUCTURE output for the Coastal and Interior populations run with K = 2. It is referenced in the scripts that generate the STRUCTURE figures.
- Cst.coord: input file for running EEMS
- Cst.outer: input file for running EEMS
- Cst.sites: input file for running EEMS
- Cyanocitta_stelleri_measurements_FINAL_20190416_forR.xlsx: Phenotypic measurements that are referenced and plotted in the script STJA_Phenotypes_v10.R.
- RockyOnlyK2.outfile: This is the STRUCTURE output for the Rocky Mountain populations run with K = 2. It is referenced in the scripts that generate the STRUCTURE figures.
- STJA_ND2_subsampled.nex: This is a Nexus file of ND2 samples from each major ecomorphological clade, with one individual sampled per haplotype.
- STJA_run123_MCC.tre: This is the maximum clade credibility tree file output from BEAST that was used to generate the phylogeny figure in the main paper. 
- Cyanocitta_HZAR_ND2.Rdata: Output from the R package hzar for ND2 data that is referenced in the script STJA_hzar_plotcombine_v5.R.
- Cyanocitta_morpho_PCAscores.Rdata: Output from the R package hzar for morphological PCA scores that is referenced in the script STJA_hzar_plotcombine_v5.R.
- Cyanocitta_streaks_HZAR_analysis.Rdata: Output from the R package hzar for frontal streaks that is referenced in the script STJA_hzar_plotcombine_v5.R.
- StellersJay_msat_clinal_18May2018.Rdata: Output from the R package hzar for microsatellite data that is referenced in the script STJA_hzar_plotcombine_v5.R.
- DFAdat directory: This directory contains input files needed to generate main Figure 4.
- SDM directory: This directory contains data files specific to SDM-related analyses. See the SDM-specific README_SDM.txt for additional details.
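
This README does not document which R objects the four hzar `.Rdata` files listed above contain. A minimal sketch for inspecting one without overwriting objects in the current workspace is shown here; `list_rdata_objects` is a hypothetical helper, and the relative path assumes the file sits directly in the data folder as described.

```r
# Hypothetical helper (not part of the deposited scripts): load an .Rdata
# file into its own environment and return the names of the objects it holds,
# leaving the global workspace untouched.
list_rdata_objects <- function(path) {
  env <- new.env()
  load(path, envir = env)
  sort(ls(env))
}

# Assumed relative path, with the working directory set to the main directory.
rdata_path <- file.path("data", "Cyanocitta_HZAR_ND2.Rdata")
if (file.exists(rdata_path)) {
  print(list_rdata_objects(rdata_path))
}
```

Loading into a separate environment is a cautious choice: a plain `load()` would silently replace any workspace objects that share a name with the stored ones.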

* The scripts directory contains...
***********************************
- Cyanocitta_HZAR_streaks.R: This script runs the cline analysis (HZAR) for frontal streaks presented in the manuscript.
- Cyanocitta_HZAR_ND2_v3.R: This script runs the cline analysis (HZAR) for ND2 haplotype frequency presented in the manuscript.
- Cyanocitta_HZAR_msat_v3.R: This script runs the cline analysis (HZAR) for microsatellite Q-scores presented in the manuscript.
- Cyanocitta_HZAR_morpho.R: This script runs the cline analysis (HZAR) for morphological measurements (PC1 scores) presented in the manuscript. 
- STJA_EEMSOutput_v2.R: This script generates a plot of the output for genetic diversity and migration rates from the EEMS program.
- STJA_HaplotypeSetup_v3.R: This script creates the input file that PopART uses to generate the haplotype network.
- STJA_Phenotypes_v10.R: This script runs the phenotypic analyses presented in the manuscript.
- STJA_phylo_analyze_v3.R: This script plots the phylogeny and gets the genetic distances reported in the manuscript.
- STJA_SisterSpeciesComparisons_v3.R: This script downloads mtDNA data from GenBank and runs the sister species comparisons presented in the paper.
- STJA_StructureOutput_manyKs_v3.R: This script creates the figure in the supplementary information that shows STRUCTURE output for all populations at many different K values.
- STJA_StructureOutput_v6.R: This script creates the main STRUCTURE figure presented in the manuscript.
- STJA_hzar_plotcombine_v5.R: This script generates the figure of combined cline analyses from HZAR into a single plot along with an inset that shows the location of the sampling points in the cline transect.
- SDM directory: This directory contains R scripts specific to SDM analyses. Please see README_SDM.txt for additional details. 

* The utility directory contains...
***********************************
- Cst_msat_clinal_locality.txt: Distances along an approximate transect that corresponds to the cline analyses presented in the manuscript.
- Cyanocitta_stelleri_population_mapping.xlsx: Coordinates for each of the populations sampled in the manuscript.
- Cyanocitta_stelleri_shp: A directory containing the shapefile for the species' geographic range.
- na_adm0: A folder that contains a shapefile and ancillary files for the country boundaries for Canada, USA, and Mexico.
- na_adm1: A folder that contains a shapefile and ancillary files for the state boundaries for Canada, USA, and Mexico.
- na_alt: A raster file that contains elevational data used for plotting purposes (not included in quantitative analyses).
- Taxon_pair_comparisons_ND2_divergence_v2.xlsx: The set of sister taxa pairs that were compared in our manuscript.

## Sharing/access Information

Specimen data are accessible through Arctos (https://arctos.database.museum).
GenBank data are accessible through NCBI (https://www.ncbi.nlm.nih.gov/genbank).