This RAWLENCE_ET_AL_2022_README.txt file was generated on 2022-03-28 by Nic Rawlence


GENERAL INFORMATION

1. Title of Dataset: Data from: Rapid radiation of Southern Ocean shags in response to receding sea ice.

2. Author Information
    A. Principal Investigator Contact Information
        Name: Dr Nic Rawlence
        Institution: Department of Zoology, University of Otago, Dunedin, New Zealand
        Email: nic.rawlence@otago.ac.nz

    B. Associate or Co-investigator Contact Information
        Name: Dr Martyn Kennedy
        Institution: Department of Zoology, University of Otago, Dunedin, New Zealand
        Email: martyn.kennedy@otago.ac.nz

3. Date of data collection: 2013-2021

4. Geographic location of data collection: Dunedin, New Zealand

5. Information about funding sources that supported the collection of the data: Department of Zoology, University of Otago; French Polar Institute; German Research Foundation; British Antarctic Survey


SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: None

2. Links to publications that cite or use the data: https://onlinelibrary.wiley.com/doi/10.1111/jbi.14360

3. Was data derived from another source? No

4. Recommended citation for this dataset: Rawlence et al. (2022), Data from: Rapid radiation of Southern Ocean shags in response to receding sea ice, Dryad, Dataset


DATA & FILE OVERVIEW

1. File List:

    File 1 Name: Leucocarbo_mt_nuDNA_v6_edited.nex
    File 1 Description: Mitochondrial and nuclear gene alignment for phylogenetic analysis

    File 2 Name: Leucocarbo_CR_Haplotype.txt
    File 2 Description: Mitochondrial Control Region alignment for phylogenetic network analysis


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:

Tissue, blood or feathers were obtained from a number of sources, covering the geographic distribution of Leucocarbo shags (Appendix Table S1.1–1.2).
Total genomic DNA was extracted using a phenol/chloroform extraction, a 5% Chelex 100 solution or the Qiagen DNeasy Tissue Kit (Kennedy & Spencer, 2014; Walsh et al., 2013). Negative controls were included with each extraction.

For the phylogenetic dataset (using a single location per taxon, except for four taxa where two locations were used and one taxon where three locations were used, see Appendix Table S1.1), DNA was amplified for five mitochondrial genes (12S, overlapping ATPase 8 and 6, ND2, COI) and five nuclear genes (FIB7, PARK7, IRF2, CRYAA and RAPGEF1) (following Kennedy & Spencer, 2014; Kennedy et al., 2019). Primer details for the phylogenetic dataset are given in Table S1 of Kennedy et al. (2019).

To investigate within-taxon diversity, control region (CR) sequences were amplified for multiple individuals per location (except for two taxa where only a single individual could be used, see Appendix Table S1.2) (following Rawlence et al., 2014). Negative controls were included with each PCR.

PCR products were purified using the Ultra-Sep Gel extraction kit (Omega) and sequenced on an Applied Biosystems 3730 capillary sequencer. Newly generated sequences for the phylogenetic and CR datasets were added to, and aligned with, those previously published (Kennedy & Spencer, 2014; Rawlence et al., 2014; Rawlence et al., 2017) (see Appendix Tables S1.1–S1.2).

2. Methods for processing the data:

The phylogenetic dataset (including two outgroups, the double-crested cormorant Nannopterum auratum and the neotropic cormorant N. brasilianum [Kennedy & Spencer, 2014; Gill et al., 2021], see Appendix Table S1.1) was divided into nine partitions: five nuclear loci and four mitochondrial loci (the overlapping ATPase 8 and 6 were treated as a single ATPase partition). Models of nucleotide substitution were selected using the Akaike Information Criterion of Modeltest 3.7 (Posada & Crandall, 1998).
The models selected for each gene region were as follows:

    12S:      HKY + I (2st + I)
    ATPase:   GTR + I (6st + I)
    ND2:      TIM + G (6st + G)
    COI:      TIM + I (6st + I)
    FIB7:     HKY + I (2st + I)
    PARK7:    HKY (2st)
    IRF2:     TrN (6st)
    CRYAA:    HKY (2st)
    RAPGEF1:  HKY + I (2st + I)

We used StarBEAST2 (v. 0.15.13) implemented in BEAST 2.6.3 (Bouckaert et al., 2019) to jointly infer the Leucocarbo species tree along with co-estimation of the mitochondrial and individual nuclear gene trees. We implemented an analytical population size integration model (Bouckaert et al., 2019), unlinked substitution models for all partitions, linked trees for mitochondrial genes and a birth–death tree prior.

Strict molecular clocks were used due to the shallow phylogenetic scale encompassed by Leucocarbo shags and the absence of fossil calibration points within crown-group Phalacrocoracidae (Worthy, 2011): one linked clock for nuclear genes and unlinked clocks for mitochondrial genes. The clock rates for mitochondrial genes were modelled as normal priors, with the mean substitution rate estimated from rates for terminal nodes 33–39 (i.e. the clade that encompasses Leucocarbo and Pelecaniformes, with the most recent common ancestor at node 102) in the phylogeny of Figure S2 of Pacheco et al. (2011). Rates corresponding to terminal nodes 33–39 were obtained from Table S2 of Pacheco et al. (2011).

The mean substitution rates in substitutions/site/million years (s/s/Ma), with standard deviations in parentheses, used in our analysis were as follows:

    ND2:     0.00388 s/s/Ma (0.0013)
    COI:     0.00232 s/s/Ma (0.0007)
    ATPase:  0.0029 s/s/Ma (0.0011)
    12S:     0.00145 s/s/Ma (0.0011)

Substitution rates for individual nuclear genes were modelled using uninformative 1/X priors. We ran three independent MCMC chains, each run for 50 million steps, sampling every 5000 steps.
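
The clock-rate priors and chain settings above can be tabulated in a few lines of Python. This is only an illustrative sketch, not part of the original analysis (which was configured in BEAST 2). The names CLOCK_RATE_PRIORS, central_interval and samples_per_chain are hypothetical, and the 95% interval is shown only to give a feel for the spread each normal prior implies.

```python
from statistics import NormalDist

# Mean and standard deviation (s/s/Ma) of each mitochondrial clock-rate prior,
# copied from the text above. The dictionary name is hypothetical.
CLOCK_RATE_PRIORS = {
    "ND2": (0.00388, 0.0013),
    "COI": (0.00232, 0.0007),  # cytochrome c oxidase subunit I (also written COX1)
    "ATPase": (0.0029, 0.0011),
    "12S": (0.00145, 0.0011),
}


def central_interval(mean, sd, mass=0.95):
    """Return the (lower, upper) bounds holding `mass` of a normal prior."""
    dist = NormalDist(mu=mean, sigma=sd)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def samples_per_chain(chain_length, sample_every):
    """Number of states logged by one chain, ignoring the initial state."""
    return chain_length // sample_every


if __name__ == "__main__":
    for gene, (mean, sd) in CLOCK_RATE_PRIORS.items():
        low, high = central_interval(mean, sd)
        print(f"{gene}: {low:.5f} to {high:.5f} s/s/Ma")
    print(samples_per_chain(50_000_000, 5_000), "samples per chain")
```

With these settings, each 50-million-step chain sampled every 5000 steps logs 10,000 states (not counting the initial state).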
Additionally, to estimate per-species population sizes, analyses were rerun using the "linear with constant root populations" model (Barido-Sottani et al., 2018; Heled & Drummond, 2010) with the same parameters, but with the MCMC chain length increased to 100 million steps, sampling every 10,000 steps. We checked for convergence and sufficient sampling of parameters in Tracer v1.7.1 (Rambaut et al., 2018) and, after discarding the first 10% of steps as burn-in, combined the individual runs in LogCombiner. Maximum clade credibility consensus trees were generated in TreeAnnotator using the median node age. DensiTree v2.2.7 (Bouckaert, 2010) was used to simultaneously visualize all post-burn-in trees and to generate consensus trees scaled by estimated effective population size.

PopArt (Leigh & Bryant, 2015) was used to construct a median-joining network of the mitochondrial control region (CR) data. Sites with >5% unidentified states were masked in the analysis.
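
The >5% masking rule can be sketched in Python as below. This is an illustration under stated assumptions, not PopArt's own code. The function names are hypothetical, the set of symbols treated as "unidentified" (N, ? and the gap character) is an assumption, and masked columns are simply dropped here.

```python
from typing import List

# Assumption: these symbols count as "unidentified" states. The README does
# not list them, so adjust this set to match the alignment being checked.
UNIDENTIFIED = frozenset("N?-")


def sites_to_mask(alignment: List[str], threshold: float = 0.05) -> List[int]:
    """Return 0-based column indices whose unidentified fraction exceeds threshold."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    masked = []
    for col in range(length):
        missing = sum(1 for seq in alignment if seq[col].upper() in UNIDENTIFIED)
        # Strictly greater than the threshold, matching ">5%" in the text.
        if missing / len(alignment) > threshold:
            masked.append(col)
    return masked


def apply_mask(alignment: List[str], masked: List[int]) -> List[str]:
    """Return the alignment with the masked columns removed."""
    drop = set(masked)
    return ["".join(ch for i, ch in enumerate(seq) if i not in drop)
            for seq in alignment]


if __name__ == "__main__":
    toy = ["ACGT", "ACNT", "AC-T", "ACGT"]  # toy data, not from the dataset
    columns = sites_to_mask(toy)
    print(columns, apply_mask(toy, columns))
```

Because the comparison is strict, a column with exactly 5% unidentified states (for example 1 of 20 sequences) is kept.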