Population genomics and phylogeography of four Australasian waterfowl

ABSTRACT Biogeographic barriers can restrict gene flow, but variation in ecological drivers of dispersal influences the effectiveness of these barriers among different species. Detailed information about the genetic connectivity and movement of waterfowl across biogeographic barriers in northern Australia and Papua New Guinea is limited. We compared genetic connectivity for four species of Australasian waterfowl that vary in their capacity and predisposition for dispersal: Radjah Shelduck (Radjah radjah), Wandering Whistling Duck (Dendrocygna arcuata), Green Pygmy Goose (Nettapus pulchellus), and Pacific Black Duck (Anas superciliosa). We obtained >3,700 loci from double-digest restriction-associated DNA sequencing for 15 to 40 individuals per species and found idiosyncratic patterns of population structure among the four species. The mostly sedentary Radjah Shelduck exhibited clear genetic differences between New Guinea and Australia as well as among locations within Australia. Although the population structure was consistent with isolation by distance, the Torres Strait and Carpentaria Barrier contributed more to genetic differences than geographic distance alone. In contrast, the presumed sedentary Green Pygmy Goose did not show obvious structure. Likewise, populations of the more dispersive Wandering Whistling Duck and Pacific Black Duck were unstructured and genetically indistinguishable between southern New Guinea and northern Australia. Our data suggest that some Australo-Papuan biogeographical barriers are insufficient to impede gene flow in waterfowl species capable of dispersing great distances. In sedentary species like the Radjah Shelduck, these barriers, perhaps coupled with its ecology and natural history, restrict gene flow. Our findings bring new insight into the population ecology of Australo-Papuan waterfowl.


Introduction
Species' ranges are the product of evolutionary history, fundamental and realised niches, historical or current environmental conditions, and movement ecology (e.g. Chase and Leibold 2003;Broennimann et al. 2006;Cumming et al. 2012). The integration of movement ecology (Cumming et al. 2012) into classical biogeographical theory is necessary to provide a greater understanding of responses to habitat and climate change and is essential to our ability to predict those responses (Winkler et al. 2014;Lamb et al. 2019). Specifically, dispersal ability is an indicator of a species' potential to escape declining environmental conditions, find mates, and exploit seasonal or variable resource surges (Nathan et al. 2008;Cumming et al. 2012).
Impediments to dispersal can facilitate local adaptation, allopatric speciation, and idiosyncratic patterns in distribution among species (Mayr 1942;Kyrkjeeide et al. 2016). Thus, changes in environmental conditions and species' responses to them can potentially contribute to the formation of new species, especially when a species' capacity for dispersal is low. Although banding, telemetry, and aerial surveys can be used to study movement patterns, such methods are often hampered by small sample sizes, short time periods, and year-to-year variation in dispersal patterns (Mech and Barber 2002;Corrigan et al. 2018;Caley et al. 2022). Conversely, studying the molecular diversity of populations can reveal whether populations are genetically connected over long periods of time, and estimates of connectivity can often be accomplished with a few individuals (Sonsthagen et al. 2019;Brown et al. 2021). Thus, by identifying systematic variation in allele frequencies, phylogeographic investigations can provide insight into both historical and current impediments to gene flow.
In Australia and New Guinea, approximately 30 biogeographic barriers, including both terrestrial and marine, have been inferred from the distributions of multiple species (reviews in Schodde and Mason 1999;Schodde 2006;Bryant and Krosch 2016). Major marine, climatic, habitat or topological barriers within the region under study here include the Torres Strait, Barkly Tablelands, several barriers within the Great Dividing Mountain Range and the Range itself, the Carpentaria Barrier, and regions of savanna and rainforest that could hinder movements between freshwater wetland areas in Papua New Guinea (Supplementary material; Figure S1). The strength of these barriers is expected to vary dramatically among species, depending on species-specific idiosyncrasies in habitat requirements, capacity for dispersal, and other life history traits.
Historically, the importance of these barriers fluctuated as climatic oscillations resulted in variable sealevels and the variable extent of the Australian arid zone (Byrne et al. 2008). Fluctuations in sea level created and dissipated marine barriers to movement between Australia and the island of New Guinea (Keast 1961;Ford 1987;Schodde and Mason 1999;Joseph et al. 2019;Lamb et al. 2019). Several phylogeographic studies have documented the influence of these barriers and the eustatic fluctuations throughout the Pleistocene on the genetic structuring and genomic divergence of populations of many Australo-Papuan birds (Donnellan et al. 2009;Joseph and Omland 2009;Guay et al. 2010;Kearns et al. 2010;Toon et al. 2010;Dhami et al. 2013;Dolman and Joseph 2012;Edwards et al. 2017;Peñalba et al. 2019;Brown et al. 2021).
Understanding of the relationship between movement behaviour and genetic connectivity across biogeographic barriers is limited in Southern Hemisphere birds, especially in remote parts of Australia and the island of New Guinea (Peck and Congdon 2004;Rhymer et al. 2004;Guay et al. 2010;Dhami et al. 2013;Roshier et al. 2012;Peñalba et al. 2019). A study of genetic connectivity of two closely related species of Australian teal, for example, revealed differential patterns of population genetic structure consistent with differences in movement behaviour (Dhami et al. 2013). The Grey Teal (Anas gracilis) is a highly dispersive species that lacks population structure despite being geographically distributed across known biogeographic barriers. In contrast, the more sedentary Chestnut Teal (Anas castanea) is more structured across its range in the southern portions of Australia's mesic zones. Here, we report on genetic connectivity in four waterfowl species distributed across the Torres Strait in both southern Papua New Guinea and northern Australia.

Study species
Our study taxa are four waterfowl species that differ in life history traits, ranges, and dispersal propensity, and which occur in both Australia and the island of New Guinea: Radjah Shelduck (Radjah radjah), Green Pygmy Goose (Nettapus pulchellus), Wandering Whistling Duck (Dendrocygna arcuata), and Pacific Black Duck (Anas superciliosa). Within these species, subspecies are often divided by important geographical features. Following the International Ornithological Congress checklist of birds (Gill et al. 2023), Radjah Shelduck comprises two recognised subspecies: R. r. radjah in New Guinea and Maluku Islands (Moluccas) and R. r. rufitergum in northern Australia. The Pacific Black Duck has two recognised subspecies: A. s. superciliosa is in southern New Guinea, Australia and New Zealand, whereas A. s. pelewensis is in northern New Guinea and several islands throughout the South Pacific. The Wandering Whistling Duck has three recognised subspecies: D. a. arcuata in the Philippines and Indonesia, D. a. australis in southern New Guinea and northern Australia, and D. a. pygmaea in northern New Guinea. In contrast, the Green Pygmy Goose is monotypic.
Among our studied species, Radjah Shelduck and Green Pygmy Goose are thought to only make restricted, local movements. In contrast, Wandering Whistling Duck and Pacific Black Duck readily undertake long-distance movements of hundreds of kilometres in response to environmental factors (Marchant and Higgins 1990;McEvoy et al. 2015), although some populations on permanent waters can be mostly sedentary. Our objective was to quantify the genetic connectivity among sampling locations of these species of ducks within and between Papua New Guinea and Australia. We predict that as a result of greater capacity and predisposition for dispersal, the Pacific Black Duck and Wandering Whistling Duck (long-and intermediate-distance movements, respectively) will have greater genetic connectivity among locations (i.e. no population sub-structuring) than the Green Pygmy Goose or Radjah Shelduck (local movements). In particular, we predicted that the Torres Strait, separating Papua New Guinea and Australia, serves as a prominent barrier to dispersal. In addition, given the discontinuous breeding ranges of several species of ducks in northern Australia, we predicted that the Carpentaria Barrier will separate eastern and western populations in northern Australia and that the Black Mountain Barrier and Burdekin Gap will separate populations within northern Queensland (see Figure S1).

Sampling
Tissue samples for Radjah Shelduck (N = 32), Wandering Whistling Duck (N = 27), Green Pygmy Goose (N = 15), and Pacific Black Duck (N = 40) from northern Australia and Papua New Guinea (PNG) were obtained from the Australian National Wildlife Collection (Supplementary Material, Table S1). The PNG samples were collected from the Western Province and Central Province (PNG-W, PNG-C, respectively), and those from Australia were from northern Western Australia (WA), Northern Territory (NT), Cape York Peninsula (CYP), and eastern Queensland south of Cape York Peninsula (QLD).

Library preparation and assembly
To generate DNA sequences from a pseudo-random sampling across the genome, we used the doubledigest restriction site associated DNA sequencing (ddRAD-seq) (Miller et al. 2007) protocol of Dacosta and Sorenson (2014). Genomic DNA was extracted using a DNeasy Blood and Tissue Kit (Qiagen, Valencia, USA) and quantified using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, USA). Approximately 1 µg of genomic DNA was digested using 10 U of restriction enzymes SbfI and EcoRI. Adapters containing sequences compatible with Illumina TruSeq reagents and barcodes for de-multiplexing were ligated to the sticky ends generated by double digest. Adapterligated DNA was size-selected (300-450 bp) using gel electrophoresis (2% low-melt agarose) and a MinElute gel extraction kit (Qiagen, Valencia, USA). Size-selected fragments were amplified using the polymerase chain reaction (PCR) with Phusion high-fidelity DNA polymerase (Thermo Scientific, Pittsburgh, USA). Amplified products were purified using magnetic AMPure XP beads (Beckman Coulter Inc., Indianapolis, USA) and quantified using real-time PCR with an Illumina library quantification kit (KAPA Biosystems, Wilmington, USA) on an ABI 7900HT SDS (Applied Biosystems, Foster City, USA). Samples with compatible barcode combinations were pooled in equimolar concentrations, and multiplexed libraries were sequenced on an Illumina HiSeq 2500 at TUCF Genomics, Tufts University (Medford, USA).
Raw Illumina reads from each species were processed in separate runs using the computational pipeline of Dacosta and Sorenson (2014) [Scripts available at: http://github.com/BU-RAD-seq/ddRADseq-Pipeline]. Reads from all individuals were assigned to individual samples based on unique barcode combinations. For each sample, identical reads were combined while retaining read counts and the highest quality score for each nucleotide position. Individual reads >10% divergent and/or those with an average Phred score of <20 were removed. Reads were then clustered into putative loci using the UCLUST function in USEARCH v. 5 (Edgar 2010) with an -id setting of 0.85. The highest read quality from each cluster was mapped to an assembled mallard (Anas platyrhynchos) reference genome (provided by T. Farault, unpubl. data) using BLASTN V. 2 (Altschul et al. 1990). Reads within each cluster (i.e. putative locus) were aligned using MUSCLE V. 3 (Edgar 2004), and samples within each aligned cluster were genotyped. Alignments with end gaps due to indels and/or polymorphisms at the SbfI restriction site were either automatically trimmed or flagged for manual editing. Alignments with greater than or equal to two polymorphisms in the last five base pairs were also flagged for manual inspection. Genotypes were scored as described in Dacosta and Sorenson (2014): homozygous genotypes were defined if greater than 93% of sequence reads were consistent with a single haplotype, whereas heterozygotes were defined if a second haplotype was represented by at least 29% of reads, or if a second haplotype was represented by as few as 10% of reads and the haplotype was present in other individuals. Individual genotypes that did not meet either criteria or contained more than two haplotypes were flagged. From these flagged samples, we retained the allele represented by the majority of reads and scored additional alleles as missing data. The second allele was scored as missing for apparently homozygous genotypes based on 1 to 5 reads which was considered 'low depth'. We retained all loci, with ≤10% missing genotypes and ≤5% flagged genotypes.
We categorised ddRAD-seq loci as either autosomal or sex-linked on the basis of alignments to the Mallard genome and on sex-specific patterns of read depth and heterozygosity (Lavretsky et al. 2015;Peters et al. 2016). Because females carry only one copy of the Z-chromosome, they should have half as many reads for Z-linked loci compared to males and no heterozygosity. In contrast, the number of reads for autosomal loci should be comparable between males and females.

Kinship
To eliminate relatedness as a confounding factor in our population analyses, we calculated genotype similarities among individuals using maximum likelihood as implemented in ML-RELATE (Kalinowski et al. 2006). ML-RELATE assumes that no two individuals being compared are inbred, no migrants enter the population, and individuals are sampled from a panmictic population. Under these conditions, full siblings are expected to have a relatedness (r) value of 0.5, and half siblings will have an r of 0.25.
When populations are not panmictic, individuals will be more likely to breed with genetically similar individuals within subpopulations, which can result in individuals being genetically equivalent to close kin in analyses of relatedness that include individuals from other subpopulations. Thus, to determine whether inferred sibling groups were indicative of true siblings or resulted from population structure, we used SPAGeDi 1.5a to calculate Ritland's (1996) kinship coefficient (F ij ) between pairs of individuals and an inbreeding coefficient (F) within individuals (Hardy and Vekemans 2002). The kinship coefficient is the probability that two alleles, one sampled from each of two individuals, are identical by state for a given locus. The inbreeding coefficient is the probability that an individual carries two alleles that are identical by state at a given locus. Both of these values are calculated relative to the probability of randomly sampling two identical alleles from the entire population; thus, positive values of F ij indicate two individuals are more related to each other than expected by chance, and negative values indicate individuals are less related than expected. Similarly, positive values of F indicate inbreeding.
If our population is panmictic and a random sample from that population does not contain siblings, then we expect both F ij and F to be near zero. Deviations from our null hypothesis would suggest that our samples include related individuals or that our population is subdivided (Ritland 1996). In subdivided populations, individuals from the same population will be genetically more similar to each other than individuals from different populations, and therefore, kinship coefficients will be positive. Population subdivision should also result in higher inbreeding coefficients (i.e. higher homozygosity) than expected under a model of panmixia. In contrast, if a sample from a population that is not subdivided includes siblings, then we expect kinship coefficients between those individuals to be high relative to inbreeding coefficients. In the event that these analyses suggested that siblings were a better explanation of genetic similarity than population structure, we filtered the data for full-sib (r ≈ 0.5) and half-sib (r ≈ 0.25) relationships, retaining for further analyses the individual with the largest number of recovered loci.

Population structure
We used several approaches to visualise the population structure within each species. First, to determine if our waterfowl species have discrete population units or are a panmictic population (see Peters et al. 2016), we used the R package adegenet 2.1.2 (Jombart 2008) to conduct a Principal Component Analysis (PCA), which reduces the high dimensionality of genomic data sets while capturing the major components of genetic variation. In general, individuals that are more genetically similar are expected to cluster together within the PCA plot. For this analysis, alleles were categorised based on full sequences and coded from 1 to n, where n is the observed number of unique alleles/haplotypes at each locus.
Second, we identified the optimum number of genetic populations (K) within each species by calculating individual assignment probabilities in STRUCTURE v. 2.3.4 (Pritchard et al. 2000). Alleles were coded as described for the PCA (see above). We evaluated the ln Pr(X|K) for K populations of one to six and without incorporating a priori information about sampling locality or population origin. We included only parsimony informative loci in analyses. STRUCTURE was run using an admixture model and correlated allele frequencies for 500,000 burn-in and 1,000,000 sampling generations. We replicated each analysis five times and calculated ∆K to determine the most likely number of populations (Evanno et al. 2005) using STRUCTURE HARVESTER v0.6.94 (Earl and von Holdt 2012).
Third, composite pairwise values of the relative divergence (i.e. Φ ST ; the proportion of total genetic variation partitioned among groups) and nucleotide diversity (the average number of pairwise differences among all sequences within populations) were calculated in the R package PopGenome 2.6.1 (Pfeifer et al. 2014).
Finally, we measured the association between geographic Euclidean distance and relatedness between all sampling sites across Australia and Papua New Guinea. We calculated pairwise kinship coefficients (F ij ) as described above. A Mantel test was then used to test for a correlation between geographic distance between pairs of samples and kinship using the program zt (Bonnet and Van de Peer 2002). Under an isolation-by-distance model, we expect a negative correlation between kinship and geographic distance, because individuals within close proximity have a higher probability of sharing alleles. In the event of significant isolation by distance, we used a partial Mantel test to determine whether the Carpentaria Barrier, Torres Strait, and/or the Black Mountain Barrier and Burdekin Gap contributed to population structure while controlling for geographic distance. For these analyses, pairwise comparisons were coded as 0 if the two individuals were sampled from the same side of the putative barrier and 1 if they were sampled on opposite sides of the barrier. Significance for both tests was based on 100,000 randomisations of variables.
For species with evidence of population structure, relationships among groups were visualised by estimating a maximum likelihood (ML) species tree in combination with the direction and weight (w) of gene flow based on allele frequencies in the program TreeMix version 1.12 (Pickrell and Pritchard 2012). An unrooted nuclear tree was reconstructed using bi-allelic SNPs. Analyses were run across each bi-allelic SNP (-k 1), with global rearrangement occurring during tree building (-global). The node support was based on 1,000 bootstraps using the python script treemix_tree_with_bootstraps.py (https://github.com/mgharvey/misc_python/blob/ master/bin/TreeMix/treemix_tree_with_bootstraps. py). The final tree and nodal support were summarised across bootstraps using TreeAnnotator v2.5.2 and viewed in FigTree v1.4.0 (http://tree.bio. ed.ac.uk/software/figtree). Connectivity among populations was analysed by sequentially adding migration events (-m 0 -n) for up to the total number (n) of possible pair-wise population comparisons. The optimum number of migration edges is determined by the proportion of the variance explained by each migration model and estimated with the 'get_f()' R function provided with the TreeMix Package. To limit the overconfidence in the tree model, migration edges were added until >98% of the variance in the tree model was explained. Finally, standard errors (-se) calculated in TreeMix were used to assess the statistical significance of each migration edge.

Results
We obtained an average of 715,627 (± 396,054) high quality reads per individual. For each species, greater than 3,700 loci were recovered from >90% of the individuals with ≤5% flagged genotypes (Supplementary Material, Table S2). The median read depth for these loci for all four species was 135 (±75.2) reads per individual per locus. Genotypes were complete for 98.1% and partial (one allele scored) for 0.6% of the individuals per locus. We inferred 94.3-95.7% of the loci were autosomal and 4.3-5.7% of the loci were Z-linked (Table S2).

Radjah Shelduck
A total of 3,838 variable ddRAD-seq loci were recovered for 32 individual Radjah Shelducks. We detected four sets of closely related shelducks comprising a total of 10 individuals related at the half-sib or full-sib level (r = 0.18-0.49; Supplementary Material, Figure S2). Based on relatedness and inbreeding values, two pairs of sibling groups (N = 4 individuals in total) from WA were identified; in both cases, putative siblings were collected on the same date and location. The WA sibling groups had high kinship (avg. F ij = 0.19) and low inbreeding coefficients (avg. F = 0.03), which suggests a level of relatedness that is not the result of population subdivision. In contrast, a group of four highly related (r = 0.21-0.49) Radjah Shelducks were identified in eastern Queensland, but they were sampled over three different years and from two sites. These QLD samples had high inbreeding coefficients (avg. F = 0.11), suggesting the high kinship resulted from genetic isolation rather than true sibships. Finally, our last sibling group consisted of two PNG-C samples with high kinship (F ij = 0.11) and high inbreeding coefficients (avg. F = 0.15), which suggests a population structure. Thus, the QLD and PNG-C samples were retained for further analyses, whereas one individual from each of the WA sibling groups was removed. Relatedness and inbreeding coefficients were low for the remaining samples (r < 0.01; avg. F = 0.04). After filtering out full-and half-siblings, our final sample size for Radjah Shelduck was 29 individuals.
Both PCA (Figure 1(b)) and STRUCTURE (Figure 1(c)) revealed four main Radjah Shelduck clusters, which largely corresponded with geography (Figure 1(a)). WA and NT samples clustered together, whereas samples from CYP, QLD, and PNG-C each clustered into different groups (Figure 1(b)). Analyses in STRUCTURE indicated the data best fit a four-population model, and population horizontal stripe), and central PNG (PNG-C; diagonal stripe). b) Principal component analysis using 1,789 parsimony-informative ddRAD-seq loci from 29 individuals revealing site-specific clusters. Percentages on the x-and y-axes indicate the percent of total genetic diversity explained by PC1 and PC2, respectively. c) STRUCTURE bar plots for K = 4 populations (best-supported model) and K = 5 populations, which revealed additional separation within PNG. assignment probabilities mirrored the PCA results (Figure 1(c)). However, the additional resolution provided when evaluating assignment probabilities at K = 5 suggested PNG-W and PNG-C may be genetically differentiated. Estimates of relative differentiation further supported a moderate level of genetic differentiation (Φ ST = 0.09). Specifically, WA and NT were genetically similar, indicating weak or no population structure (Φ ST = 0.01), whereas all the other sampling locations were genetically differentiated from each other (Φ ST = 0.04-0.17; Table S3).
Overall, we found a significant pattern of isolation by distance (r = −0.69, P < 0.00001). Controlling for the influence of geographic distance, the Torres Strait contributed significantly to differentiation (i.e. PNG and Australian individuals were more differentiated from each other than expected based on geographic distance alone; r = −0.35, P < 0.00001), Carpentaria Barrier contributed somewhat to differentiation (r = −0.15, P = 0.018), whereas the Black Mountain Barrier and Burdekin Gap did not (r = 0.09, P = 0.13).
Relationships among the four Radjah Shelduck populations were further visualised in TreeMix, where a tree was reconstructed using bi-allelic ddRAD-seq autosomal SNPs. A TreeMix tree without gene flow was the optimum model and was found to explain >99% of the variance ( Figure S3). Nevertheless, a single migration edge from CY to QLD was recovered across trees when the model included gene flow ( Figure S3). Finally, closer relationships were recovered between CY and NT/WA and between QLD and PNG.

Wandering Whistling Duck
We recovered 4,150 variable ddRAD-seq loci in 26 individual Wandering Whistling Duck samples. From these data, we detected three sets of closely related individuals comprising a total of 11 individuals related at the half-sib or full-sib level (r = 0.13-0.51; Figure S4). A group of five individuals collected from the same site and on the same day, had high relatedness (r = 0.41-0.50), high kinship coefficients (F ij = 0.11-0.14) but low inbreeding coefficients (avg. F < 0.0). Another group of four individuals from a different site on CYP were also all collected on the same day and had high relatedness (r = 0.41-0.48), high kinship (F ij = 0.13-0.16), but low inbreeding (avg. F < −0.005). In contrast, three additional individuals collected from the same site and day had low relatedness and low kinship with each other and with the four related individuals (r = 0.0; F ij < −0.009). Finally, two individuals collected from a third site on CYP appeared to be related at a level equivalent to first cousins (r = 0.13) and had a moderately high kinship coefficient (F ij = 0.058) but low inbreeding (F = 0.01). The high relatedness and high kinship coefficients, coupled with low inbreeding coefficients, suggest that genetic similarity among individuals is better explained by familial relationships rather than population structure. Relatedness was low for the remaining samples (r < 0.01; F ij < −0.007).
After filtering out closely related individuals (N = 8), our final sample size for Wandering Whistling Duck was 18 samples (Figure 2(a)). Pairwise composite Φ ST values across sampling sites were generally low, with an average of 0.021 (Φ ST range = 0.006-0.037; Table S3). In addition to these low Φ ST estimates, PCA (Figure 2(b)) and STRUCTURE (Figure 2(c)) demonstrated an absence of population structure across sites, and there was not a significant pattern of isolation by distance (r = 0.092, P = 0.32).

Green Pygmy Goose
We recovered 4,110 variable ddRAD-seq loci in 15 individual Green Pygmy Goose samples for our analyses. After removing one individual with a low number of reads, our final sample size was 14 individuals. No sibling relationships were recovered (r < 0.01; Figure  S5). Pairwise sampling site composite Φ ST values ranged from 0 to 0.015 (Table S3), indicating no genetic structure amongst our samples. PCA (Figure 3(b)) and STRUCTURE (Figure 3(c)) corroborated the lack of genetic structure, and there was not a significant pattern of isolation by distance (r = 0.0080, P = 0.46).

Pacific Black Duck
We recovered 3,720 variable ddRAD-seq loci in 40 individual Pacific Black Ducks for our analyses. From our data, we did not detect half-or full-sib groups; relatedness was low among all samples (r < 0.01; Fig.  S6). Low pair-wise composite Φ ST (Φ ST range = 0.005-0.035; Table S3), an absence of clustering in PCA (Figure 4(b)) and STRUCTURE (Figure 4(c)), and no evidence of isolation by distance (r = 0.033, P = 0.17) suggested a lack of population structure across the six black duck sampling sites.

Discussion
Here, we present population genetic data from four waterfowl species -Radjah Shelduck, Green Pygmy Goose, Wandering Whistling Duck and Pacific Black Duck -each having been sampled in both southern Papua New Guinea and throughout northern Australia. Based on reported differences in their Principal component analysis using 2,812 parsimony-informative ddRAD-seq loci from 18 individuals did not reveal sitespecific clusters. Percentages on the x-and y-axes indicate the percent of total genetic diversity explained by PC1 and PC2, respectively. c) STRUCTURE bar plot for K = 2 populations; K = 1 population was the best-supported model, and K = 2 populations did not reveal population genetic structure. Percentages on the x-and y-axes indicate the percent of total genetic diversity explained by PC1 and PC2, respectively. c) STRUCTURE bar plot for K = 2 populations; K = 1 population was the best-supported model, and K = 2 populations did not reveal population genetic structure.
propensity to move long distances (Marchant and Higgins 1990;Kear 2005), we predicted differences in genetic connectivity among our study species. In general, population structure or lack thereof was partially explained by differences in vagility. For example, both Pacific Black Duck (Figure 4) and Wandering Whistling Duck (Figure 2) were unstructured across sampling sites, and we did not find evidence of isolation by distance. This agrees with our initial expectations for greater genetic connectivity in these species as a result of a high propensity for long-distance movements in response to resource availability (Marchant and Higgins 1990;Kear 2005). Our finding of low genetic differentiation among Australian sampling sites of Pacific Black Duck was consistent with results from 19 nuclear introns investigated by Brown et al. (2021), though the same study reported population structure in mitochondrial DNA sequences, suggesting sex-biased dispersal. The nuclear DNA results further suggest that the proposed biogeographic barriers are not impeding gene flow for these species. We note, however, that the northern New Guinean populations of both Wandering Whistling Duck and Pacific Black Duck are recognised as a different subspecies (Beehler et al. 2016). As our sampling did not include any individuals from that region, there might be more population structure over a short distance than what was detected in this study.
Consistent with their sedentary nature, our data suggest at least four subpopulations in Radjah Shelduck: (1) Western Australia and Northern Territory, (2) Cape York Peninsula, (3) eastern Queensland (notwithstanding some evidence of inbreeding there), and (4) Papua New Guinea ( Figure 2). Population structure reflected a pattern of isolation by distance, but there was also evidence of geographic barriers contributing to subpopulation differentiation. First, differentiation between southern Papua New Guinea and northern Australia was greater than expected given geographic distance, suggesting the Torres Strait acts as an effective barrier to movement of this species, and is consistent with the recognition of separate subspecies on these landmasses (Marchant and Higgins 1990). We also found support for population structure between the two sampled sites in Papua New Guinea, but this is based on a limited number of samples per site. Second, although subtler than the Torres Strait, the Carpentaria Barrier, separating eastern and western subpopulations, also had a greater contribution to population structure than geographic distance alone. These findings suggest that limited dispersal and movement coupled with landscape complexity have contributed to this population structure. parsimony-informative ddRAD-seq loci from 40 individuals did not reveal sample-site specific clusters. Percentages on the xand y-axes indicate the percent of total genetic diversity explained by PC1 and PC2, respectively. c) STRUCTURE bar plot for K = 2 populations; K = 1 population was the best-supported model, and K = 2 populations did not reveal population genetic structure.
Interestingly, Radjah Shelduck samples from eastern Queensland were highly inbred. This inbreeding likely explains why they were so highly segregated from the other shelduck populations (Figure 2(b,c)) given that reduced genetic diversity within the region will contribute to greater differentiation from neighbouring populations in metrics like F ST . Yet, this differentiation fit the broader pattern of isolation by distance rather than the Black Mountain and Burdekin Gap acting as barriers. Consistent with this scenario, gene flow from Cape York to Queensland, across these hypothesised barriers, was the only inference of gene flow uncovered in our TreeMix models. Sampling in closer proximity to these potential barriers could clarify the contribution of geographic distance versus physical barriers to inbreeding in this population.
We predicted population structure within the Green Pygmy Goose because this species was thought to be largely sedentary (Marchant and Higgins 1990). However, the genetic data support a single, panmictic population (Φ ST = 0; Figure 3), which is consistent with this species being treated as monotypic. Weak genetic differentiation may reflect a recent population expansion and an insufficient amount of time for genetic differences to accumulate. Alternatively, our results may suggest that Green Pygmy Goose is more dispersive than previously thought and that individuals occasionally or even regularly move between regions. The presumed sedentary nature might simply reflect the limited natural history and movement data on this species. Uncovering discordances between presumed dispersal behaviour and the genetic signature of a species demonstrates the utility of genome-scale data in quantifying genetic connectivity as a means to better understand the movement ecology of poorly studied species (Sonsthagen et al. 2019).
One caveat to our interpretations of population structure is that uneven sampling can bias population assignments, especially in the context of interpreting admixture proportions and distorting principal component space (McVean 2009, Lawson et al. 2018. Furthermore, if a differentiated population is represented by only one or two samples, the population structure could go undetected in our analyses. Thus, the lack of population structure in Wandering Whistling Duck, Pacific Black Duck, and Green Pygmy Goose should be interpreted cautiously. In several cases, our sampling was likely insufficient for detecting population structure, especially if subtle allele frequency differences distinguish populations. Increasing sample sizes and expanding geographic coverage would provide greater resolution for further testing our inferences.

Kinship confounds population structure
In our study, we found several pairs/groups of genetically similar individuals within Radjah Shelduck and Wandering Whistling Duck. Including or excluding familial groups from analyses of the shelducks did not have a major effect on the visualisation of population structure ( Figure S7). However, the effect familial groups had on the visualisation of population structure was especially prominent in the Wandering Whistling Duck ( Figure S8). The inclusion of familial groups in our whistling duck data set resulted in three clusters. Individuals from Cape York Peninsula were dispersed among all three clusters, whereas all other individuals grouped together within a single cluster. To mitigate high relatedness skewing estimates of population structure, we retained one representative sample from each familial group. After sib-groups were excluded from our analyses, we observed no site-specific clustering or evidence of population structure.
Our Wandering Whistling Duck results conflict with previously reported inferences of population structure by Roshier et al. (2012), who investigated population structure in the same species using microsatellite data from seven nuclear loci. They found fine-scale population structure from samples collected in northern Australia, Papua New Guinea and Timor Leste (Roshier et al. 2012). Indeed, two flocks sampled 12.5 km and 1 week apart in the Aurukun area of the western Cape York Peninsula were genetically differentiated. In our study, we included a small subset of the samples from Roshier et al. (2012) and found high kinship among several pairs of individuals collected from this region. This suggests that kin groups might have contributed to the signatures of population structure. However, kinship seems inadequate to explain all the patterns. For example, in addition to the structure found within the Cape York Peninsula, Roshier et al. (2012) found significant differentiation between Papua New Guinea and Western Australia. Yet despite including a subset of the same samples as Roshier et al. (2012), we detected neither population differences between nor kin groups within these regions. The larger sample size examined in their study may have provided finer scale resolution for detecting population structure than our data, despite the much lower number of markers examined.
Our results further demonstrate that finding highly related individuals within a sampling site can be the product of population structure, kinship, or both (Sul et al. 2018). In general, a strong population structure would indicate that populations are subdivided, that gene flow is restricted, and thus, that individuals within the local population are genetically more similar to each other than to individuals from different populations (Ritland 1996). Conversely, kinship, sibling or family groups would likely underestimate genetic differences among individuals within a particular region, and thus, would be of less value in estimating the genetic connectivity of these individuals to the larger population (Voight and Pritchard 2005;Sul et al. 2018). Future genomic studies with additional samples are necessary to resolve the contrast between our results and those of Roshier et al.'s (2012) microsatellite data to better understand population connectivity in Wandering Whistling Ducks.

Conclusions
Perhaps unsurprisingly, the Torres Strait, Carpentaria Barrier, Burdekin Gap, and other biogeographic barriers in northern Australia appear insufficient to impede gene flow in waterfowl that can disperse large distances across them, such as the Wandering Whistling Duck, Pacific Black Duck, and perhaps Green Pygmy Goose. In contrast, and arguably more surprisingly for such inherently vagile birds, these barriers coupled with the birds' ecology and natural history interact to restrict gene flow for the mostly sedentary Radjah Shelduck. Our study has provided significant new and sometimes surprising knowledge concerning the genetic connectivity of these four waterfowl species. Our data offer insight for conservation and wildlife biologists into the movements and structure of waterfowl populations within the Australo-Papuan region that are not so readily available through other kinds of data, such as from tracking and behavioural studies.