Data used in: Assessing models of speciation under different biogeographic scenarios; an empirical study using multi-locus and RNA-seq analyses

Taylor Edwards, Marc Tollis, PingHsun Hsieh, Ryan N. Gutenkunst, Zhen Liu, Kenro Kusumi, Melanie Culver, Robert W. Murphy

Ecology and Evolution

SampleID    Lineage
LS11        Sinaloan
LS12        Sinaloan
SJRNA1      Sonoran
SJRNA2      Sonoran
TC11        Mojave
TC14        Mojave

File: EdwardsEtAl_Gopherus_TranscriptomeSuperAssembly.fa

Superassembly generated by combining RNA-seq data from three individuals, one for each of the following lineages: a captive individual of G. agassizii in Arizona that originated in California (Moj_A haplotype); a captive individual of Sonoran G. morafkai from Arizona; and a wild-caught Sinaloan G. morafkai obtained from just outside of Alamos, Sonora, Mexico (Rancho Las Cabras; RLC). We created a de novo transcriptome assembly from the reads of all three libraries and used it as a reference. We assembled transcript contigs using TRINITY (Grabherr et al. 2011; Haas et al. 2013) with default settings. Because de novo transcriptome assemblies often consist of many thousands of possibly chimeric contigs that lack clear gene content (Cahais et al. 2012), we further filtered the TRINITY output for contigs with single gene annotations. To accomplish this, we used the TRINITY contigs as queries in a BLASTX search of mouse and chicken proteins from UniProt (Magrane et al. 2011) with an E-value cutoff of 1e-6. We then selected contigs with unique BLAST hits to incorporate into a reference transcriptome for downstream analyses.

File: EdwardsEtAl_Filtered_SNPS_6Gopherus.vcf

We followed a slightly modified version of the De Wit et al. (2012) protocol for mapping and variant detection. We performed the analysis using the six low-coverage RNA-seq samples and mapped these to the reference transcriptome (File: EdwardsEtAl_Gopherus_TranscriptomeSuperAssembly.fa).
We used BURROWS-WHEELER ALIGNER 0.6.1 (BWA; Li & Durbin 2009) to generate Sequence Alignment/Map (SAM) files. We performed several trials to assess parameter sets and settled on default parameters with an assumed quality-score offset of 33, allowing for 0.005 differences between reference and query (-n) and up to five differences in the seed (-k), so that more than 67% of each individual's reads mapped to the reference transcriptome. We then converted the SAM files to BAM format and removed duplicates using SAMTOOLS 0.1.18 (Li et al. 2009). Next, we merged all six individuals into a single BAM file and followed the recommendations of the De Wit et al. (2012) protocol to realign poorly mapped regions near indels using GENOMEANALYSISTK-1.0.5974 (GATK; McKenna et al. 2010). We also used GATK to detect and annotate variants and to generate Variant Call Format (VCF) files (DePristo et al. 2011). Following the recommendations of De Wit et al. (2012), we called only variant sites with a Phred-scaled quality greater than 30. We then performed low-threshold variant detection and Variant Quality Score Recalibration (VQSR) following De Wit et al. (2012), building a Gaussian mixture model to distinguish true variant sites from false positives.

File: EdwardsEtAl_Genotypes_shared_by_all_6_Gopherus.txt

We ran the Python script provided by De Wit et al. (2012) to extract only the variant sites for which genotype information is available for all individuals, using a Phred quality score cutoff of 20 (source file: EdwardsEtAl_Filtered_SNPS_6Gopherus.vcf). This file can be opened in Microsoft Excel to view the genotype data.
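
The filtering behind EdwardsEtAl_Genotypes_shared_by_all_6_Gopherus.txt can be illustrated with a minimal sketch. This is not the De Wit et al. (2012) script. It is a hypothetical stand-in that assumes each VCF record carries GT and GQ entries in its FORMAT column, and that the cutoff of 20 is inclusive (GQ >= 20). The function name shared_genotypes and the three-sample example records are invented for illustration only.

```python
def shared_genotypes(vcf_lines, min_gq=20):
    """Yield (contig, position, {sample: genotype}) for each site where
    every sample has a called genotype with GQ >= min_gq.

    Hypothetical helper, not the De Wit et al. (2012) script.
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:GQ
        calls = {}
        for sample, entry in zip(samples, fields[9:]):
            values = dict(zip(keys, entry.split(":")))
            gt = values.get("GT", ".")
            gq = values.get("GQ", ".")
            if "." in gt or gq == "." or float(gq) < min_gq:
                break  # missing or low-quality genotype: reject the site
            calls[sample] = gt
        if samples and len(calls) == len(samples):
            yield fields[0], int(fields[1]), calls


# Invented records with three of the six sample IDs, for brevity.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tLS11\tSJRNA1\tTC11",
    "contig1\t101\t.\tA\tG\t55\tPASS\t.\tGT:GQ\t0/1:35\t0/0:22\t1/1:40",
    "contig1\t250\t.\tC\tT\t48\tPASS\t.\tGT:GQ\t0/1:35\t./.\t1/1:40",
    "contig2\t77\t.\tG\tA\t60\tPASS\t.\tGT:GQ\t0/1:12\t0/0:30\t1/1:40",
]

for contig, pos, calls in shared_genotypes(example):
    print(contig, pos, calls)
```

In this example only the first site is retained: the second has a missing genotype for SJRNA1, and the third has a genotype quality below 20 for LS11. To apply the same sketch to the real data, pass an open file handle for EdwardsEtAl_Filtered_SNPS_6Gopherus.vcf in place of the example list.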