Seascape genetics in a polychaete worm: Disentangling the roles of a biogeographic barrier and environmental factors

Seascape genomic studies aim to understand how environmental variables shape species diversity through genotype–environment associations. Identifying these effects on lecithotrophic larval species that live in intertidal zones is particularly challenging because they are subject to environmental heterogeneity and anthropogenic events. Here, we evaluate how biotic and abiotic features in the Southwest Atlantic littoral zone can affect a high dispersal species' present and historical demography.


| INTRODUC TI ON
Marine populations were long thought to be panmictic, with high rates of gene flow (Caley et al., 1996;Palumbi, 1994), explained by the lack of explicit geographical barriers. Some barriers to gene flow in marine systems are seasonally determined, resulting in partially isolated populations (Palumbi, 1994). In most marine species, the larval phase is the dominant dispersal stage, and pelagic larval duration (PLD) is directly linked to dispersal ability (Bohonak, 1999;Levin, 2006;Pascual et al., 2017). Consequently, species with planktonic larval stages are more likely to present better connected populations than those without planktonic stages. Nevertheless, many studies have shown that complex interactions between speciesspecific traits and environmental characteristics result in different patterns of connectivity that cannot be explained solely by larval dispersion through ocean currents (Bernardi et al., 1993;Bernatchez et al., 2018). Environmental conditions like temperature and salinity are suggested to have adaptive importance in shaping gene flow in marine species (Bernatchez et al., 2018;Saenz-Agudelo et al., 2015;Sandoval-Castillo et al., 2017).
Seascape genetic studies of single-nucleotide polymorphisms (SNPs) are highly sensitive for detecting patterns of population isolation and local adaptation because of the high number of markers analysed. A few hundred SNPs can detect discrete changes in gene flow (Bernatchez et al., 2018;Saenz-Agudelo et al., 2015;Selkoe et al., 2016), allowing the identification of genomic regions that are possibly involved in the adaptive process, which provides important insights into a changing environment. Among invertebrates, most studies have focused on species of commercial importance, primarily molluscs and crustaceans. Therefore, these patterns remain unknown for many invertebrate groups, such as polychaetes, which are crucial in marine communities. Polychaeta includes diverse invertebrates under Annelida, comprising burial, planktonic, tubedwelling and reef-building species (Nunes et al., 2021;Pamungkas et al., 2019). They are among the most abundant and species-rich marine metazoans in benthic environments (Díaz-Castañeda & Reish, 2009;Fauchald & Jumars, 1979;Rouse & Pleijel, 2001).
Perinereis ponteni Kinberg, 1865, is a Nereididae polychaete species with a broad spatial distribution in the Atlantic Ocean, from Mexico to Brazil (Figure 1). Although these animals are more easily found in mussels of the Brachidontes genus, they also inhabit algae, oysters and barnacle beds, acting as essential food generalists in these communities. Despite their ecological importance, there have been no detailed studies of their reproduction. The species is known to exhibit epitoky with a planktonic stage and might have a lecithotrophic larval phase like most Nereididae species (Bakken et al., 2018; F I G U R E 1 (a) Perinereis ponteni collection sites from North to South-Sabiaguaba (SBP); Prado (PAP); Gamboa (GBP); Itaoca (ItaP); Prainha (PP); Jabaquara (JP); Praia dura (DuP); Araçá (AP); (b) P. ponteni head detail; (c) known distribution, from Mexico to South Brazil. The projection of the main map is the Equirectangular.  Rouse, 2000). Previous studies using mitochondrial markers to investigate the population structure of P. ponteni on the Brazilian coast found a high level of mixture, with high rates of gene flow among all populations (Paiva et al., 2019).
In this study, high levels of gene flow between the Brazilian populations of P. ponteni were expected. Any disruption in this pattern is likely a result of environmental factors possibly acting on the larval dispersion and post-settlement survival rates as well as the past demographic scenarios. This study is among the first to evaluate the environmental impact on population connectivity of an important invertebrate species in the Southwest Atlantic Ocean. We coupled genomic scale sequencing with ecological association tests, oceanic circulation and niche-based models to understand the present and historic gene flow patterns and how these can be affected by a changing environment.

| Sampling
The specimens were collected from oyster and barnacle clusters in eight localities along the Brazilian coast ( Figure 1; Supplementary Material) from 2017 to 2019. The clusters were collected from exposed rocks and pillars with a spatula and transported to the laboratory in plastic bags. In the laboratory, clusters were submerged in seawater for a few hours before being taken apart. Subsequently, all worms were carefully removed with forceps. The specimens were relaxed using pulverized menthol dissolved in seawater and identified under a stereomicroscope. Subsequently, the worms were fixed in 99% ethanol and stored at −20°C until DNA extraction.

| DNA extraction, library preparation and sequencing
Genomic DNA was extracted from 61 specimens of P. ponteni using the protocol by Doyle and Doyle (1987). DNA quality and concentration were assessed by gel electrophoresis, spectrometry (NanoDrop Lite spectrophotometer; Thermo Fisher Scientific) and fluorometry (Qubit 3.0 Fluorometer; Invitrogen). Approximately 100 ng of the extracted DNA per sample was sent to EcoMol Consultoria e Projetos Ltd., where GBS libraries were prepared according to the protocol described by Elshire et al. (2011) and modified by Nunes et al. (2017).
The libraries were sequenced in 100 bp single-end fragments using the Illumina HiSeq 2500 platform at the Centro de Genômica Funcional ESALQ-USP.
Structure software (Pritchard et al., 2000) was used to analyse the complete SNP dataset, putatively outliers and neutral SNPs (see below), using values of K from 1 to 16 with default settings, except for lambda and alpha values. The alpha value was set to one in all runs, and the lambda value was first estimated in one run with all other values fixed. The estimated lambda value was fixed for 10 other runs. The results from the 10 iterations were compacted and analysed using the R package Structure Harvester (Earl, 2012), where the likelihood of each K was compared, and the most suitable K was selected. The selected K was used to generate a bar plot showing the most likely ancestry of each individual using the R package 'PopHelper' (Francis, 2017).
The dataset was used in the 'vegan' package (Oksanen et al., 2020) as input for the Mantel test (Mantel, 1967) to identify possible isolation-by-distance (IBD), and for the isolationby-environment (IBE) analysis (Wang & Bradburd, 2014). Each collection site was considered a different population for this test.
The genetic distance was inferred using pairwise linearized F ST (Weir & Cockerham, 1984). The smallest distance between the two sites was calculated as the line along the coast connecting the two points using Google Maps (Google Corporation). Mantel and partial Mantel tests were performed with 1000 simulations. Partial Mantel tests were performed separately for each environmental variable using geographic distance as a control for 18 variables (Table S1). The present and past conditions from the mid-Holocene were considered for each variable because they were the same as those used for the palaeodistribution simulations (see below and Table S1, first 18 rows). Climatic distances were calculated as Euclidean distances using the dist function in R.

| Cline analyses
The genetic transition between populations through cline-fitting models was performed in 'hzar' (Derryberry et al., 2014). This R package allows flexibility in fitting cline models of varying complexity owing to its modular design. This package uses the allele frequencies of each SNP and attempts to correlate its frequency with geographic distribution, creating a cline-fitting model for each allele.
Only 100 SNPs with a higher contribution to principal component analysis (PCA) structuring were used ( Figure S1), as they had the most contrasting frequencies among the populations. Sixteen models, including a null model, were tested by combining the different tails and scaling values for each SNP (Table S2). Three independent runs for each model were conducted, each consisting of 20,000 iterations, following an initial burn-in phase of 500 iterations. The model with the lowest Akaike information criterion (AIC) for each locus was retained. Only the results where the null model was not accepted are discussed.

| Demographic analyses and estimates of historical migration
The migration matrix and demographic scenarios were estimated using Fastsimcoal 2.7 (Excoffier & Foll, 2011). The minor allele site frequency spectrum parameter computed with Arlequin with 1000 bootstraps was used as the input. In the population expansion scenarios, the two clusters identified in the Structure software were used as populations, and the two models were tested. The first model estimated population expansion from the northeast (NE), with posterior migration towards the southeast (SE). The second considered population expansion from the SE, with posterior northward migration. Fifty independent replicates were performed for each model. Each of these replicates included 40 estimation loops with 300,000 coalescence simulations. Given the observed data, the probability of each model was determined based on the maximum likelihood value and AIC. The historical migration rates were estimated using each locality as a population, with 10 independent replicates, each including 40 estimation loops with 60,000 coalescence simulations, and assuming current migration between all pairs of populations. In both cases, the mutation rate was also estimated using Fastsimcoal, with the first cluster's results ranging from 1e −9 to 1e −6 .
The population connection through larval dispersion was simulated using OpenDrift (1.7.1, Dagestad et al., 2017), an open-source particle tracking framework that uses the Eulerian velocity and thermohaline fields from an Eulerian model and a fourth-order Runge-Kutta (Edwards et al., 2006) scheme to transport particles within the domain. The OpenDrift module Larvalfish has been adapted to P. ponteni larvae, where the functions of fish growth (Folkvord, 2005) and vertical migration of larvae were applied to fit the module to the maximum length of the organism (~310 μm) and a maximum depth of vertical migration (−2.5 m). The adapted module considers the drift composed of ocean currents, wind drift at the surface, vertical mixing, vertical migration, the hatching from an egg to larvae and its continuous growth and the terminal velocity, according to Sundby (1983).
The between the other two data locations is considerable; therefore, the currents would not allow a displacement long enough to establish a direct connection between those populations.

| Outlier SNPs
To identify SNPs under selection, the SNP matrix was used as input in BayeScan 2.0 (Foll & Gaggiotti, 2008), using 20 pilot runs of 50,000 iterations followed by 100,000 simulations with prior odds of 100 and 1% false discovery rate. Each location was considered a different population.
The genotype-environment association method redundancy analysis (RDA) was applied to detect possible SNPs under selection (Forester et al., 2018). The environmental variables analysed were the same as those used in the IBE inferences (Table S1). First, a correlation analysis was performed using all 18 variables, and 11 highly correlated variables (|r| > 0.7) were removed. The remaining seven variables (annual mean sea surface salinity, annual mean sea surface temperature, bathymetry, east-west aspect, north-south aspect, profile curvature, and concavity) were set as RDA predictors, and only the SNPs indicated by the significant RDs were considered.
Possible associations between environmental variables and SNP frequency were also assessed with the latent factor mixed model tests (LFMM; Frichot et al., 2013;Rellstab et al., 2015) implemented in the R package 'LEA' (Frichot & François, 2015). To select the environmental variables, we first ran a PCA using 24 environmental variables (Table S3)

| Niche-based inferences
The palaeodistribution of P. ponteni was based on niche-based models that use known occurrences and climate variables to pre-

| Population structure
The Structure results from the matrices with 23,300 SNPs  (Table S8). For AMOVA analyses, we defined each sampling site as a population and the two clusters defined by Structure as groups. The results revealed that most of the variation was within individuals (84.05%), followed by variation among individuals in each population (13.14%) (Table S9).

| Cline analyses
Of the 100 SNPs that contributed the most to genetic structuring, five were outlier SNPs, and 95 were putatively neutral SNPs. The null model had the lowest AIC values for all outlier SNPs and the 81 putatively neutral SNPs. Therefore, these SNPs cannot produce a cline-fitting graph. The Hzar test indicated a cline in the geographic area between the two clusters delimited by the Structure software.
Fourteen putatively neutral SNPs presented the lowest AIC for Model 1 (no tail and fixed scaling) ( Figure S3). The centre slope of these clines is approximately 2900 km from Sabiaguaba (NE), which is near Prainha (SE

| Demographic analyses and estimates of migration
The optimal-fitting model tested by Fastsimcoal, according to the highest likelihood (−6415.83) and the lowest AIC (29,258), was the one in which the population divergence initiated in the northeastern region towards the south approximately 555,000 generations ago.
The best model assumed an initial unidirectional gene flow, followed by bidirectional migration (Figure 3).

| Outlier SNPs
BayeScan identified 12 outlier SNPs, all were indicative of positive selection ( Figure S4). In contrast, LFMM indicated the presence of 404 loci (1421 SNPs) with allele frequencies correlated to the eight environmental variables tested (Table S11; Figures S5 and S6). However, we were able to identify genes responsible for intracellular transport (kinase heavy chain) and cytoskeletal dynamics (myosin), among others (Table S12).
Outlier SNPs were also used to investigate the population structure to evaluate any possible structuring related to environmental variables. The PCA results exhibited the same pattern for all loci (data not shown). Structure indicated the presence of three genetic clusters. The first comprised specimens from SE locations up to Rio de Janeiro, the second included specimens from the remaining localities and the third cluster was observed in all individuals ( Figure S8).
The Structure results using only putatively neutral SNPs revealed only two genetic clusters ( Figure S9).

| Niche-based inferences
The Sul (South region) was highly suitable for species survival (Figure 6b).
The current spatial distribution simulation showed similar results, with all coasts being highly suitable for P. ponteni (Figure 6c).

According to the suitability distribution, the southern coast of South
America and the Amazonas River estuary show lower values of suitability, indicating that these areas might act as permeable barriers.

| DISCUSS ION
We attempted to assess genomic diversity and possible factors influ- According to Rouse (2000), all Nereididae species have lecithotrophic larvae with a shortened PLD until settlement compared to planktotrophic larvae (Hoegh-Guldberg & Emlet, 1997). The PLD is long enough for larvae to travel thousands of kilometres, connecting the Brazilian populations carried by ocean currents. The detected low but significant value of the global F ST (0.037) is an indication of limited connectivity. However, the absence of IBD and the presence of IBE (indicated by the significant correlation between genetic and temperature differences) suggest that the environment, and not the geographic distance, is responsible for such genetic differentiation. The presence of IBE in the absence of IBD was detected for other marine species, like the clownfish Amphiprion bicinctus (Nanninga et al., 2014;Saenz-Agudelo et al., 2015), the shore crab Cyclograpsus punctatus and the Cape urchin Parechinus angulosus (Nielsen et al., 2020).
The difference in temperature in this case seems to affect not only the dispersion phase (i.e. larvae) but mainly the adult phase when P.
ponteni has lower motility and is exposed to local conditions for extended periods (Bakken et al., 2018). In addition, the temperature can affect these animals directly and indirectly by changing the amount of dissolved O 2 and microbiota abundance (White et al., 1991).
Structural results support this possibility, highlighting a boundary for migration in the northern limit of the SE region around Cabo São Tomé, where something seems to act as a boundary for connectivity.
Structure, PCA, cline analyses and oceanic circulation models indicated that this boundary is located around the Cabo Frio region.
The Cabo Frio region, which is the most known region of the upwelling system, seems to be a gene flow barrier or a sieve for many marine species, such as the crustaceans Excirolana braziliensis and Litopenaeus schmitti (Hurtado et al., 2016;Maggioni et al., 2003), the coral Mussismilia hispida (Peluso et al., 2018), the fish Mugil liza (Mai et al., 2014) and even the dolphin Pontoporia blainvillei (Lázaro et al., 2004). Mainly during the summer, this region receives strong winds from the continental area that push the warm coastal waters, giving rise to deeper cold waters rich in nutrients and forming a constant upwelling (Martins et al., 2021;Valentin et al., 1987). Therefore, the temperature and nutrient regimes in this area are quite different from those in subjacent zones and are thought to alter the local community and act as a semipermeable boundary for some species (Mai et al., 2014;Martins et al., 2021;Peluso et al., 2018;Volk et al., 2021). In addition, our results indicate a hydrodynamic barrier preventing larval exchange between adjacent areas (such as Prainha and Gamboa) rather than an ecological barrier. Hydrodynamic barriers are of cause isolation among coral species between the Great Barrier Reef and Lord Howe Island (Keith et al., 2015;Wood et al., 2014), which might also be the cause of the pattern seen here.

| Palaeodistribution and origin of present populations
A previous study on mitochondrial genes found evidence of an important founder effect in P. ponteni populations along the Brazilian coast (Paiva et al., 2019). This is congruent with our findings in pal-  (Steig, 1999). The ocean currents and ecological factors were also similar to the present ones (Gu et al., 2018), and the species distribution in this period was very similar to the present.

| Signals of adaptive selection
Of the 23,300 SNPs analysed, 1571 were supposedly under selection according to BayeScan, RDA and LEA, representing over 450  (Table S13). RDA and LFMM have a similar power to identify genetic markers related to environmental variables (Capblancq et al., 2018). RDA has the advantage of being able to detect the main environmental gradients. In addition to IBE, indicating the importance of temperature in these populations, a potential association between allele frequency and temperature was supported by the RDA and LFMM, indicating that this variable also has a high contribution to the genotype distribution, according to the pattern observed in Figure S7. Temperature variation was already reported as being capable of driving population divergences for invertebrate species across different geographic scales (Benestan et al., 2016;Miller et al., 2019;Takeuchi et al., 2020). Moreover, other predictors seem to be relevant to the distribution of allele frequencies, such as salinity, bathymetry and curvature profile. Such variables play an essential role in the performance and survival of species with planktonic larval stages, especially for those who inhabit rocky shores. Furthermore, three-dimensional features, such as curvature profile, can cause widespread impacts on biodiversity and functional traits (Borland et al., 2022;Demopoulos et al., 2018), since it influences water flow dynamics across the surface. These assumptions are related to how geomorphic metrics can distinguish different benthic communities, even for polychaetes (De Leo et al., 2014), showing again that environmental heterogeneity, rather than only the geographic distance, is responsible for the genetic differentiation (Harley & Helmuth, 2003). This study sheds light on the influence of temperature on connectivity among Brazilian populations of P. ponteni, as indicated by the genetic boundary in the upwelling region. Many outlier loci discovered here were directly (971 SNPs) or indirectly (863 SNPs) related to temperature. This is not only due to the disruption of local conditions but also because ocean currents are determined by the differences in ocean temperature and salinity, both of which can change dramatically in the coming years (Voosen, 2020). This change in the velocity and direction of the currents will most likely affect the connection between populations, affecting gene flow in the species.
Not only does temperature affect species survival or fitness, but other environmental factors such as primary production may also significantly affect Perinereis fitness. This study reveals that environmental changes can also affect polychaetes at the genomic level.
It might be an interesting model for analysing the oceanic warming effect and its consequences on natural populations. de São Paulo) and its staff for providing the essential laboratory facilities and logistics. We would like to thank Editage (www.edita ge.com) for the English language editing and both reviewers for their insightful comments, which improved the manuscript.

CO N FLI C T O F I NTE R E S T
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets analysed in this study can be found in the NCBI SRA da-