Going against the flow: Barriers to gene flow impact patterns of connectivity in cryptic coral reef gobies throughout the western Atlantic

Complex oceanographic features have historically caused difficulty in understanding gene flow in marine taxa. Here, we evaluate the impact of potential phylogeographic barriers to gene flow and assess demography and evolutionary history of a coral reef goby species complex. Specifically, we test how the Amazon River outflow and ocean currents impact gene flow.


| INTRODUC TI ON
Gene flow is an important biological process that can lead to genetic homogenization of populations and prevent taxon divergence (Tigano & Friesen, 2016). Although marine systems exhibit few obvious biogeographic barriers, at least four major factors are known to impact among-population isolation throughout the oceanic realm.
First, life-history characteristics such as spawning modes influence overall dispersal potential such that demersal spawners typically exhibit greater genetic structure than pelagic spawners (e.g. Floeter et al., 2008). Second, currents can also act as barriers that prevent larval dispersal (Gaylord & Gaines, 2000), and lead to genetic isolation (e.g. Huyghe & Kochzius, 2018;Santos et al., 2006). Third, freshwater and sediment outflow from rivers can create nearly impassable biogeographic barriers to some taxa (Rocha, 2003). Fourth, genetic isolation can result among populations living in geographically isolated habitats, such as oceanic islands (Dias et al., 2019).
The western Atlantic is characterized by several biogeographic barriers and is thus divided into Caribbean and Brazilian biogeographic provinces (Floeter et al., 2008). At the largest scale, marked differences in species composition between the Caribbean and Brazilian provinces have been attributed to the Amazon River outflow which acts as an ecological barrier to gene flow (Briggs & Bowen, 2012;Rocha, 2003). At finer spatial scales, genetic patterns show an east-west division within the Caribbean and isolation of the Bahamas that are likely due to ocean currents (DeBiasse et al., 2016;Foster et al., 2012;Jackson et al., 2014;Taylor & Hellberg, 2006). Within the Brazilian province, genetic structure among populations of decapods, photosynthetic dinoflagellates and fish has been attributed to the Southern Equatorial Current (SEC) and Cabo Frio barriers resulting in northern, central and southern clusters (Boschi, 2000;Picciani et al., 2016;Santos et al., 2006). Species with lower dispersal potential (i.e. demersal spawners) are model species to evaluate the impact of biogeographic barriers throughout the western Atlantic. Two sister species, the bridled goby (Coryphopterus glaucofraenum Gill 1863) and the sand-canyon goby (C. venezuelae Cervigón 1966), are small (<55 mm), benthic fishes that lay demersal eggs in nests on sandy patches near coral reefs which they defend (Forrester et al., 2010). Both C. glaucofraenum and C. venezuelae occur throughout the Caribbean while C. glaucofraenum extends to southern Brazil (Robins & Ray 1986). Early studies of C. glaucofraenum described subtle morphological variation in bone and ray counts as well as coloration among western Atlantic populations (Böhlke & Robins, 1960). Subsequent studies elevated multiple subspecies of C. glaucofraenum to full species designation, including C. venezuelae, based on genetic and morphological characteristics (Baldwin et al., 2009). However, morphology is often similar between species within Coryphopterus, particularly larval morphology, which commonly leads to misidentification. Because most gobies are demersal spawners and often demonstrate significant genetic structure across their geographic range (Milá et al., 2017), it is likely that whole-range genetic sampling can facilitate taxonomic differentiation between closely related, sympatric species.
Here, we sampled across a broad geographic scale and employed mtDNA sequence data and thousands of single nucleotide polymorphisms (SNPs) to test if species-or population-level differences were present within C. glaucofraenum and C. venezuelae. We hypothesized that the Amazon River outflow and the isolation of the Brazilian oceanic reefs would create unique genetic clades indicative of species-level genetic divergence among populations on either side of these major barriers. Second, we hypothesized that minor barriers in both the Caribbean and Brazilian provinces would promote population structure such that populations within the eastern or western Caribbean would be more genetically similar to each other than populations compared across the Caribbean. Similarly, we hypothesized that coastal Brazilian populations would be separated into three genetic clusters based on the SEC and Cabo Frio barriers. Finally, we used coalescent simulations to test how dispersal occurred across these barriers.

| COI sequence analyses
We collected 112 individuals of C. glaucofraenum across the Brazilian coast and supplemented these with 94 individuals of C. glaucofraenum and C. venezuelae from the Caribbean through GenBank (Figure 1, Table 1). Tissue and fin clips from field capture were placed in 95% ethanol and frozen for long-term storage. We extracted genomic DNA using a Serapure bead protocol (Rohland & Reich, 2012) and amplified a 690 bp alignment of the COI gene with FishF1 and FishR1 primers using slight modifications (i.e. annealing temp was 35 s at 53°C) from Ward et al. (2005). PCR products were sent to Eurofins Genomics for sequencing. We verified chromatographs by eye using Sequencher version 5.1 (Gene Codes, Ann Arbor, MI, USA).
Sequences were then trimmed and aligned with GenBank samples using mega version 7 (Kumar et al., 2016) followed by file formatting for each analysis using PgDSPiDer (Lischer & Excoffier, 2012).
In order to determine the evolutionary relationships among lineages, we performed a Bayesian phylogenetic analysis using BeaSt2 (Bouckaert et al., 2014) with the HKY + G model of evolution as determined in PartitionFinder (Lanfear et al., 2012). We used a related species, C. tortugae, as an outgroup and performed four independent runs of 100 million generations each with samples being taken every 10,000 generations. Each run was checked in tracer version 1.6 (Bouckaert et al., 2014) to ensure effective sample sizes (ESS) were ≥200 for each parameter. LogcomBiner version 2.4.7 (Bouckaert et al., 2014) was used to discard 10% burnin for each run and combine a subset of trees from each run for a total of 9,000 tree states.
Using this combined file, we used treeannotator to create a 50% majority-rule consensus tree which was viewed in Figtree version 1.4.2 (Rambaut, 2016). We considered Bayesian posterior support values >0.95 to be indicative of highly supported nodes. We also created a TCS (Clement et al., 2000) haplotype network in PoPart (Leigh & Bryant, 2015) to visualize the distribution of haplotypes among clades and populations.
F I G U R E 1 Western Atlantic map of populations of C. glaucofraenum and C. venezuelae used in this study. All sampling points include COI data while circles with dots indicate that SNPs were also used. Haplotype network represents COI data with shading that illustrates populations. Dashed

| COI population genetic analyses
We first summarized basic genetic diversity estimates. In arLequin To evaluate whether barriers impact population connectivity in Brazil, we also tested whether C. glaucofraenum populations within northern, central or southern Brazil were more similar to each other than population pairs across these regions using a Wilcoxon rank-sum test.

| SNP generation and filtering
A reduced-sample SNP dataset was generated using 103 total individuals from 16 populations across the range of both C. glaucofraenum and C. venezuelae including three individuals of C. tortugae as an outgroup ( Figure 1, Table 1). Genomic DNA was converted TA B L E 1 Collection location and genetic diversity estimates for COI and SNP datasets with standard deviation in parentheses. Populations with < 5 samples were not included in estimates of genetic diversity.

| SNP phylogenetic analyses
To estimate evolutionary relationships among species, we first to accurately assess the relationships among species. Constraining these taxa is justified based on the strong support of the COI dataset (see Results). Additionally, we performed a maximum likelihood analysis in raxmL (Stamatakis, 2014) using concatenated loci and a correction bias due to using all variable sites (Lewis, 2001). Here, only two samples from AR with large amounts of missing data were constrained while all other taxa were not. Both trees were visualized and modified in Figtree.

| SNP population genetic analyses
To obtain a comprehensive understanding of the genetic diversity present, we first summarized basic genetic diversity metrics for populations for the SNP dataset. To evaluate population structure within clades identified in the phylogenetic analysis, a Bayesian clustering analysis was performed within each clade and without population location priors in Structure (Pritchard et al., 2000). Here, we performed 10 runs for each population (K) up to the maximum number of populations within each clade using a 50,000-replicate burnin and 500,000 replicates for each run. The Evanno ΔK method (Evanno et al., 2005) was used in StructureharveSter (Earl & VonHoldt, 2012) to determine the most likely value for K.  (Rousset, 2008). Other regions had too few populations to make similar comparisons.
Lastly, we tested four of the most plausible alternative hypotheses of the demographic history of these lineages ( Figure 2

| COI sequence analyses
We found four highly supported monophyletic clades with all four highly divergent from one another (Figure 3)  primary clades ranged from 6.58% (between C. venezuelae and AR C. glaucofraenum) to 13.29% (between Caribbean C. glaucofraenum and Brazil C. glaucofraenum; Table S2). Within C. venezuelae, individuals collected from Venezuela showed strong support for monophyly despite having diverged <1% from the rest of the Caribbean C. venezuelae samples. The haplotype network revealed that none of the 74 haplotypes were shared among clades and a minimum of 27 (AR C. glaucofraenum -C. venezuelae) and a maximum of 45 mutations (C.
The overall star-shape configuration of the haplotype network suggests Brazil C. glaucofraenum and Caribbean C. glaucofraenum have undergone a recent expansion. As with the phylogeny above, the haplotype network showed that within C. venezuelae, the Venezuela population was isolated from any other Caribbean population. In contrast, Brazilian haplotypes were evenly distributed among areas with no structure detected across barriers.

| COI population genetic analyses
Basic genetic diversity estimates are summarized in

| SNP filtering
Although 103 individuals were sent out for SNP genotyping, the final dataset included 91 samples because 12 were removed due to poorquality sequencing. Despite failing to meet the a priori threshold for <20% missing data within an individual (Table S1), the two samples from AR were maintained in the dataset due to their importance for phylogenetic analyses. Locus outlier detection (using Bayescan v.2.1 (Foll and Gaggiotti, 2008)) revealed zero outlier loci in any of the pairwise comparisons ( Figure S1), so all loci were retained for downstream analyses.

| SNP phylogenetic analyses
As with the COI tree, the Bayesian SNP tree showed strong support for four monophyletic clades (Figure 4). There was high support for the overall clade consisting of C. venezuelae, AR C. glaucofraenum and Brazil C. glaucofraenum with strong support for AR and Brazil being sister taxa. Similarly, C. venezuelae nodes were strongly supported, particularly for Belizean individuals, which formed a monophyletic clade. In contrast, samples within the Brazilian clade largely consisted of a polytomy. The maximum likelihood analysis was topologically identical at all major nodes to the Bayesian tree.

| SNP population genetic analyses
Using the ΔK approach, Bayesian clustering analyses in Structure  Figure S3).
For demographic model selection using coalescent simulations, AIC identified Model 3 (Figure 2) as the most likely model (Table 3).
This model suggested that an ancestral Caribbean population dispersed to mainland Brazil. Once established along the Brazilian coast, there was an additional dispersal event to AR.

| D ISCUSS I ON
In this study, we were able to demonstrate that C. glaucofraenum and C. venezuelae, exhibit significant genetic structure throughout the western Atlantic that corresponded to previously described biogeographic barriers. We used two informative datasets to find incongruence between taxonomy and evolutionary relationships.
Overall, we identified two novel clades across the Amazon barrier that are indicative of species-level genetic divergence; one clade was endemic to the Brazilian coast while the other was restricted to Atol das Rocas (AR) off the northeast coast of Brazil.
According to our coalescent simulations, a Caribbean population of C. venezuelae likely colonized Brazil first, followed by dispersal from Brazil to AR. In addition, several population-level barriers were found including an east-west Caribbean divide, isolation of Venezuela from the rest of the Caribbean, the cold-water upwelling at Cabo Frio and possibly the weak barrier from the southern equatorial current (SEC). These results are discussed in more detail below as they relate to phylogeography of marine taxa in the western Atlantic.

| Phylogeny and taxonomy
Even though each monophyletic clade was strongly supported, the relationships among these lineages were discordant between the SNP and COI data. The mtDNA suggests AR C. glaucofraenum and C. venezuelae are more closely related, but SNP data suggests AR C. glaucofraenum and Brazil C. glaucofraenum are more closely related. Although previous studies of coral reef fishes have suggested genetic connections between the Caribbean and AR due to ecologically similar environments (Lima et al., 2005;Pinheiro et al., 2018;Rocha et al., 2005), the relationships determined using SNPs more likely represent an accurate representation of the species tree for two reasons. First, the proximity of AR and the coast (260 km) relative to AR and the Caribbean (>2,000 km) should allow more gene flow to occur across a short distance. Second and more importantly, sampling many genes from across the genome (as was the case with the SNP dataset) is likely to infer a more accurate species tree overall and resolve homoplasy caused by either introgression or incomplete lineage sorting that is likely to occur when analysing only a single (mitochondrial) gene (Brito & Edwards, 2009).
We found the closely related Brazilian and AR lineages were likely formed as a result of dispersal and subsequent isolation from the Amazon River outflow rather than a vicariant event where the Amazon River restricted gene flow between a continuous F I G U R E 4 Bayesian and ML phylogeny of C. glaucofraenum and C. venezuelae using SNP data. Black, grey and white nodes represent posterior probabilities of 1.0, ≥0.95 and ≥ 0.90 respectively. Values above nodes represent bootstrap support from clades found in RAxML analysis. Each clade was constrained to monophyly for the MrBayes tree while only the two samples from AR were constrained in the RAxML tree. This approach resulted in an identical topology between the two approaches. The ML approach used concatenated loci and a correction bias was implemented due to using all variable sites (Lewis, 2001). A GTR + G nucleotide substitution model was implemented followed by 1,000 bootstraps for likelihood estimation [Colour figure can be viewed at wileyonlinelibrary.com] population. The Amazon River is a well-known ecological barrier for many marine taxa and often results in speciation for low dispersal organisms like C. glaucofraenum and C. venezuelae (Briggs & Bowen, 2012;Dias et al., 2019;Floeter et al., 2008). The Amazon River likely intermittently restricted gene flow for the past 9 Myr due to fluctuating sea levels that created or prevented dispersal corridors (Hoorn et al., 2017;Rocha, 2003 Rocha, 2003;Rocha et al., 2005). Given that these fish exhibit a pattern of isolation-by-distance, it may be a combination of geographic distance and ecological differences that result in genetic isolation of AR here. Additional morphological and ecological data are needed to more thoroughly differentiate these taxa.

| Caribbean biogeography
There are two synergistic hypotheses to explain why marine biodiversity is high in the Caribbean: 1) the Caribbean serves as a centre of origin for marine speciation in the western Atlantic (Floeter et al., 2008) and 2) the Caribbean serves as a centre of accumulation from nearby areas (Bowen et al. 2013;Rocha et al. 2008

| Brazilian biogeography
In accordance with our prediction, two possible barriers were found in Brazil that genetically divide north, central and southern Brazil. The weaker of the two barriers separates northern from central Brazil and could be caused by two potential mechanisms. Cabo Frio serves as the southern distribution limit for many taxa (Spalding et al., 2007) and has been known to cause differentiation in crustaceans and fishes (Boschi, 2000;Machado et al., 2017;Santos et al., 2006). Two possible reasons for differentiation across the Cabo Frio barrier are 1) ecological differences across the barrier and 2) currents that prevent larval dispersal. First, the cold water and nutrient upwelling system represents an ecological transition from tropical coral reefs northward to warm temperate rocky reefs southward (Ferreira et al., 2004;Santos et al., 2006). Ecological transitions were also suggested to have caused tropical and subtropical clades to diverge in Chaetodipterus faber (Machado et al., 2017). Second, ocean currents may physically restrict gene flow between central and southern populations. There is a tendency for the Brazil Current to lose pelagic larvae off the continental shelf rather than follow the coastline and maintain connectivity between central and southern Brazil (D'Agostini et al., 2015;Endo et al., 2019). The similarity in biogeographic patterns across many species along Brazil is intriguing and warrants further investigation among different taxa.
Overall, this study has demonstrated how the biogeography of a western Atlantic goby species complex was influenced by genetic connectivity through permeable biogeographic barriers. The Amazon River outflow has isolated Brazilian from Caribbean lineages while the Brazilian oceanic reef lineage has also diverged from the coastal lineage. Furthermore, both COI and SNP datasets provided important information regarding barriers to gene flow within regions. The mtDNA dataset provided widespread sampling throughout the range of both Caribbean species which helped detect the east-west Caribbean barrier, whereas the SNP dataset provided in-depth information concerning the SEC/São Francisco and Cabo Frio barriers in Brazil that were undetectable using a single coarse marker. Lastly, coalescent simulations shed light on the processes that led to observed phylogeographic patterns. This study provides groundwork for future seascape genetics studies to evaluate how different oceanographic features impact geneflow among populations. We suggest additional studies be performed in the western Atlantic across taxa to confirm the processes at work with particular emphasis along Brazil's coast.

ACK N OWLED G EM ENTS
We would like to thank Ramon Noguchi, Ben Victor, Raphael Macieira and Carole Baldwin for providing samples for this study. R.
Macieira helped with species identification. We would like to thank Michelle Gaither and two anonymous reviewers for helpful edits on the manuscript. This project was supported by the UCF Department