A genomic island linked to ecotype divergence in Atlantic cod

The genomic architecture underlying ecological divergence and ecological speciation with gene ﬂow is still largely unknown for most organisms. One central question is whether divergence is genome-wide or localized in ‘genomic mosaics’ during early stages when gene ﬂow is still pronounced. Empirical work has so far been limited, and the relative impacts of gene ﬂow and natural selection on genomic patterns have not been fully explored. Here, we use ecotypes of Atlantic cod to investigate genomic patterns of diversity and population differentiation in a natural system characterized by high gene ﬂow and large effective population sizes, properties which theoretically could restrict divergence in local genomic regions. We identify a genomic region of strong population differentiation, extending over approximately 20 cM, between pairs of migratory and stationary ecotypes examined at two different localities. Furthermore, the region is characterized by markedly reduced levels of genetic diversity in migratory ecotype samples. The results highlight the genomic region, or ‘genomic island’, as potentially associated with ecological divergence and suggest the involvement of a selective sweep. Finally, we also conﬁrm earlier ﬁndings of localized genomic differentiation in three other linkage groups associated with divergence among eastern Atlantic populations. Thus, although the underlying mechanisms are still unknown, the results suggest that ‘genomic mosaics’ of differentiation may even be found under high levels of gene ﬂow and that marine ﬁshes may provide insightful model systems for studying and identifying initial targets of selection during ecological divergence.


Introduction
The genomic architecture underlying adaptation to local environments and ultimately ecological speciation (Schluter 2001;Nosil 2012) is poorly understood for most organisms (Wu 2001;Nosil et al. 2009;Feder et al. 2012a,b). Recent studies have suggested that, during early stages of ecological divergence where gene flow is still on-going, genetic differentiation may be limited to a few specific genomic locations, or 'genomic islands', while the majority of the genome remains homogenized by gene flow (Wu 2001;Turner et al. 2005;Via & West 2008;Nosil et al. 2009;Feder et al. 2012a,b). Various mechanisms, such as chromosomal inversions (Kirkpatrick & Barton 2006;Feder et al. 2011), divergence hitchhiking (Via & West 2008) and processes promoting the genomic co-localization of genes involved in adaptation Yeaman & Whitlock 2011), have been proposed as potential mechanisms that would allow differing levels of divergence to evolve within a single genome in the face of gene flow. However, theoretical work has indicated that the conditions, with respect to the relative strengths of selection and gene flow, available for such mechanisms to operate can be relatively restricted (Feder & Nosil 2009, 2010Feder et al. 2011Feder et al. , 2012b and that genome-wide divergence should be more common due to the effects of reproductive isolation and selection on multiple loci, leading to genome-wide reductions in gene flow (Feder & Nosil 2010). While high gene flow has been predicted to constrain the formation of localized genomic divergence (Feder & Nosil 2009, 2010, it has also been suggested that gene flow should promote the clustering of genes involved in local adaptation (Yeaman & Whitlock 2011). Moreover, divergence limited to specific genomic regions should in fact be most readily observable early in the process of divergence, for example, between ecotypes (Mallet 2008), rather than at later stages where gene flow is more restricted and genomic divergence pronounced (Via 2009;Weetman et al. 2012).
Hitherto, the investigation of genomic patterns associated with ecological divergence has been restricted to a few, well-known model systems, such as walking stick insects (Nosil et al. 2008), Heliconius butterflies (Nadeau et al. 2012), pea aphids (Via & West 2008), malaria mosquitos (Turner et al. 2005;Lawniczak et al. 2010), coregonid whitefish (Bernatchez et al. 2010), three-spined stickleback (Shapiro et al. 2004;Colosimo et al. 2005;Roesti et al. 2012a) and salmonids (Miller et al. 2012). Marine fishes provide excellent models for studying interactions between gene flow and selection in the wild because they are often distributed over diverse ecological habitats and are typically characterized by high levels of gene flow and large effective population sizes (Nielsen et al. 2009a). However, although population genetics of non-model organisms, including most marine fishes, has recently moved from the analyses of neutral processes towards targeting adaptation to local environments (Luikart et al. 2003;Nielsen et al. 2009a;Helyar et al. 2011), no studies have yet investigated the genomic architecture associated with ecological divergence in these taxa.
Atlantic cod, Gadus morhua, has a wide geographical distribution and exploits diverse ecological niches (Mieszkowska et al. 2009), ranging from brackish to highly saline environments, and from low temperatures in the Arctic to high and variable temperatures in the southern parts of the distribution (Righton et al. 2010). As typical for marine fishes, population structuring is generally shallow (Nielsen et al. 2003;O'Leary et al. 2007), suggesting high levels of gene flow (Waples 1998) and large effective population sizes (Poulsen et al. 2006;Therkildsen et al. 2010). Thus, both gene flow and natural selection are predicted to shape genomic patterns of divergence among populations.
Ecologically distinct ecotypes, usually characterized as 'migratory' and 'stationary' behavioural types, have been described for cod in both eastern and western parts of the Atlantic (Palsson & Thorsteinsson 2003;Robichaud & Rose 2004;Grabowski et al. 2011;Nordeide et al. 2011). In the eastern Atlantic, these ecotypes are well described in both Iceland and Norway. Migratory individuals are also named 'frontal cod' in Iceland and 'Northeast Arctic cod' in Norway, while stationary individuals are known also as 'coastal cod' in Iceland and 'Norwegian coastal cod' in Norway. In general, migratory ecotypes exploit deeper and more offshore habitats at some times of the year compared to stationary individuals which frequent coastal water habitats during their entire life (Palsson & Thorsteinsson 2003;Nordeide et al. 2011). Migratory individuals from both locations may also undertake pronounced vertical migrations and cross-thermal fronts, formed where warm Atlantic and cold Arctic water meet, during the feeding season (Stensholt 2001;Palsson & Thorsteinsson 2003;Pampoulie et al. 2008a,b) 3 . Furthermore, Norwegian migratory individuals are characterized by long-distance migrations, for example, the~800 km migration from Lofoten on the Norwegian coast to the feeding areas in the Barents Sea (Jørgensen et al. 2008;Sundby & Nakken 2008). In addition to migratory and feeding characteristics, differences in several other lifehistory-related traits, such as growth rate and age at maturity, and in bioenergetics (Pardoe & Marteinsdottir 2009;Nordeide et al. 2011) suggest pronounced ecological differences between the two ecotypes [see Nordeide et al. (2011) for a comprehensive review]. Thus, it is likely that the two ecotypes represent divergent lifehistory strategies encompassing several behavioural and physiological characteristics of adaptive importance in both Iceland and Norway. Although the ecotypes are ecologically distinct, there is a potential for hybridization between the two types as spawning areas overlap in some regions (Grabowski et al. 2011;Nordeide et al. 2011). Individuals displaying an intermediate type of behaviour have been identified through electronic tagging of fish in the wild (Grabowski et al. 2011), suggesting that hybridization may occur in nature, but the degree of interbreeding and level of gene flow between ecotypes is presently unknown. Traditionally, morphological characters, such as ear bone structures (otoliths), and single gene markers, such as the membrane protein gene pantophysin (Pan I), have been used to designate individuals as either migratory or stationary (Berg & Albert 2003;Pampoulie et al. 2008a,b;Wennevik et al. 2008). Recently, population genetic work has provided some molecular evidence for adaptive divergence between the ecotypes from Norway (Moen et al. 2008;Nielsen et al. 2009b), and the finding of consistent migratory profiles over consecutive years for individual fish has suggested a genetic basis for ecotypic divergence in Iceland (Thorsteinsson et al. 2012). Yet, the evolutionary relationship between ecotypes is still largely unknown (Nordeide et al. 2011) as is the underlying genomic architecture associated with the observed ecotypic differentiation. Furthermore, despite the ecological similarities described earlier, the evolutionary relationship between Norwegian and Icelandic populations in these parallel systems has not previously been explored.
Here we investigate genomic signatures associated with ecological divergence in a high gene flow scenario. We use the migratory and stationary ecotypes in Atlantic cod as a model system and examine single nucleotide polymorphisms (SNPs) in population samples of both ecotypes from the two partially isolated systems in Iceland and Norway, along with reference samples from the major population complexes in the species. Information from the Atlantic cod linkage map and the Atlantic cod genome assembly is used to investigate genomic patterns associated with ecotypic divergence.

Sampling
Tissue samples of 31-40 adult individuals were collected from each of seven spawning locations and one feeding ground ( Fig. 1 and Table 1). Samples representing stationary ecotypes, named 'coastal cod' or 'stationary cod' in Iceland and 'Norwegian coastal cod' in Norway, and migratory ecotypes, named 'frontal cod' or 'migratory cod' in Iceland and 'Northeast Arctic cod' in Norway, were collected from spawning grounds from Iceland and Norway, and individuals were assigned to ecotype based on sampling location and depth (Iceland) and ear bone (otolith) morphology [Norway, see also Wennevik et al. (2008)]. In Iceland, samples were collected in inshore waters (depth: 58 m), known to be mainly inhabited by the stationary ecotype, and from a deeper offshore location (depth: 135 m), where the migratory ecotype has been suggested to predominate (Pampoulie et al. 2006(Pampoulie et al. , 2008a. In Norway, stationary and migratory ecotypes were collected on spawning grounds near the island of Lofoten on the northern Norwegian coast. Due to overlapping spawning areas between the two ecotypes (Grabowski et al. 2011;Nordeide et al. 2011), there is a risk of including hybrids and/or misclassified individuals in samples collected from spawning areas. Thus, we included a sample from the extreme northern feeding grounds in the Barents Sea ( Fig. 1 and Table 1), which are used only by the migratory ecotype (Nordeide et al. 2011) and therefore represents a pure 'migratory' ecotype sample. To relate findings from the stationary/migratory comparison to neighbouring areas, we also included one sample from the highly divergent Baltic Sea (Nielsen et al. 2001) and a sample from the North Sea, representing populations near the southernmost part of the distribution in the eastern Atlantic. Finally, one western Atlantic sample was included as an outgroup. Thus, with the reference populations, the sampling scheme targeted the major population complexes in the species (O'Leary et al. 2007;Bigg et al. 2008). The reference populations in the North Sea and the Baltic Sea are not known to undertake long-distance migrations. However, to allow a direct comparison between the two ecotypes, we refer only to the 'stationary' ecotype where it can potentially interbreed with the 'migratory' ecotype.
To assess temporal stability of genomic patterns, we also analysed temporally replicated samples collected from migratory and stationary populations from Norwegian spawning grounds (Lofoten) and from reference populations in the North Sea and Baltic Sea (Table 1).
Genotyping and initial data filtering DNA was recovered from samples using the Omega EZNA Tissue DNA kit (Omega Bio-Tek) and subsequently normalized to 50 ng/lL. Samples were genotyped for 1536 single nucleotide polymorphisms, most of which were originally developed from EST sequences from western Atlantic cod populations [(Hubert et al.  Table S1, Supporting information], using Illumina's GoldenGate SAM assay on the Bead Array Reader platform. Data were checked against internal sample-independent quality controls, clustered and the resulting genotypes then edited manually using the proprietary GENOMESTUDIO software 4 . A replicate individual was included on all plates to ensure genotype reproducibility. Loci with low signal and/or poor clustering were excluded from the analyses.

Linking to the genome assembly
We used the published linkage map consisting of 1310 SNPs  to infer linkage group and position within linkage group for individual SNPs. In addition, a number of SNPs were anchored to the linkage map by mapping the 120 bp flanking sequence of each SNP, available in public data bases, onto the ATLCOD1A genome assembly (Star et al. 2011) using BLASTN with an e-value threshold of 10 À10 . While these SNPs could be assigned to linkage groups, their position within linkage groups is unknown. We highlight loci in linkage groups previously found to be targets of selection in Atlantic cod [i.e. loci in linkage groups 2, 7 and 12, see Bradbury et al. (2010)] along with loci in linkage group 1, which was found to be highly differentiated between ecotypes in this study (see Results). The ATLCOD1A genome assembly was also used to estimate the distance (in base pairs) between adjacent SNPs located within the same scaffolds.

Population genetic analyses
For each analysis, loci fixed in all population samples and loci with more than 15% missing genotypes in any sample were removed. Conformance to Hardy-Weinberg equilibrium was tested for each locus in each sample with the package GENETICS v. 1.3.4 for R (R development core team 2011). To exclude loci with consistent HWE departures across samples, we excluded loci deviating at the 5% level of significance in more than half of the eight samples. This filtering should assure that loci deviating due to systematic technical or biological reasons were excluded from the analyses. When examining departures from Hardy-Weinberg equilibrium across loci within each sample, we corrected results for multiple testing using a false discovery rate (FDR) threshold of 5%. FDR correction was done with the package STATS for R, following (Benjamini & Hochberg 1995).
Individual locus pairwise F ST coefficients, following (Weir & Cockerham 1984), were estimated with the R package GENELAND (Guillot et al. 2005), and mean and 95% confidence intervals were estimated from 1000 data sets generated by bootstrapping over loci.
Population structuring over all loci was examined through correspondence analysis in the package ADEGE-NET for R (Jombart 2008), using six axes to describe the relationship among the seven eastern Atlantic population samples. In addition to the full data set, overall pairwise F ST was estimated and correspondence analysis conducted on a data set where highly divergent outlier loci identified through Bayesian regression had been excluded. Loci in the reduced data set were presumed to be primarily affected by neutral evolutionary forces, such as gene flow and genetic drift. We also investigated the effects of removing loci with global minor allele frequencies below 10% in both the full and the reduced data set, as the correspondence analyses gives higher weight to rare alleles (Jombart et al. 2009), potentially biasing these analyses. Observed levels of heterozygosity within samples were estimated for each locus with the R package GENETICS v. 1.3.4, and the R package ZOO was used to calculate moving averages of single locus estimates with a window size of 10 SNPs along each individual linkage group.
A statistical test for F ST outliers was conducted by the Bayesian regression method implemented in BAYESCAN 2.1 (Foll & Gaggiotti 2008). The method uses reversiblejump Markov chain Monte Carlo sampling to estimate posterior odds for a model with selection against a model without selection for individual loci. Prior odds for a model without selection were set to 10:1 and 20 pilot runs of each 5000 samplings were used to adjust acceptance rates and to obtain a prior estimate of mean and variance of parameter distributions. Pilot runs were followed by an additional burn in of 50 000 and 5000 samplings with a thinning interval of 10 for the estimation of posterior distributions. The false discovery rate was controlled at 5% with the R function plot bayescan distributed with the package (available from http:// cmpg.unibe.ch/software/bayescan/ 5 ). Outliers were identified in a data set excluding the highly divergent western Atlantic sample to reduce bias due to hierarchical levels of population structuring (Excoffier et al. 2009) and to allow a more detailed investigation of patterns among eastern Atlantic samples. Loci with minor allele frequencies below 2% across all samples were excluded as loci with low information content may bias computations (Beaumont & Balding 2004). The additional filtering step reduced the number of loci to 975 in this analysis. As loci with low levels of variation may bias outlier tests due to a depression of global F ST (Roesti et al. 2012b), we estimated global F ST for different minor allele frequency thresholds in the eastern Atlantic data set to examine whether the chosen threshold had an effect on global F ST . In addition, we conducted the outlier test for a data set where loci with a minor allele frequency below 10% had been excluded to examine whether outliers were confirmed at a more stringent threshold.

Data filtering and control
Following genotyping and initial data filtering, 295 individuals and 1282 loci were exported for statistical analyses (Table S1, Supporting information). Data quality among retained loci was generally high, with 95% of loci having an average GenCall (GC) score above 0.61 for called genotypes. Initial blast results identified three pairs of identical loci mapping to the same scaffold and position within scaffold (Table S1, Supporting informa-tion). One locus from each pair was removed from further analyses. Ten loci were removed from all analyses due to departures from Hardy-Weinberg equilibrium in more than half of the eight samples. After this filtering, only a few loci (between 0 and 11, see Table S1, Supporting information) deviated significantly in each sample, suggesting conformance to Hardy-Weinberg expectations within each of the sampled populations. Following the removal of loci fixed in all population samples and loci with more than 15% missing genotypes in any sample, 1199 loci remained for further analyses when all eight population samples were used. For analyses focusing on the seven eastern Atlantic samples, similar data filtering resulted in a data set consisting of 1164 loci. The lower number resulted from a higher number of monomorphic loci among these samples. In addition, observed levels of heterozygosity (Ho) were similar in the eastern Atlantic and Baltic Sea (range of average Ho: 0.23-0.26), but lower than in the western Atlantic (average Ho: 0.34, Table S1, Supporting information), indicating effects from ascertainment bias (see also Discussion).

Genomic distribution of SNPs
The majority of analysed loci, 983 of 1199, were already placed on the linkage map (Table S1, Supporting information). In addition, we were able to assign linkage groups to another 161 SNPs, although with unknown position within linkage groups, through blasting against the ATLCOD1A genome assembly (Table S1, Supporting information). Among the remaining 55 loci, 32 SNPs did not map to a scaffold while 23 SNPs were found in scaffolds that did not contain mapped SNPs. Thus, these loci could not be assigned to any linkage group. While most loci mapped to a scaffold, 227 SNPs mapped to scaffolds containing just the one SNP. The remaining loci were distributed on 236 scaffolds, with the majority of scaffolds containing only few SNPs (Fig. S1, Supporting information). This distribution illustrates the relatively fragmented nature of the current genome assembly. The distribution of distances between adjacent SNPs within scaffolds was also skewed towards lower values (Fig. S2, Supporting information). Thus, the distance to the previous SNP within the same scaffold was below 50 000 bp for most loci and only few pairwise distances were above 1 Mb.

Population genetics
Correspondence analysis showed marked differences between the two ecotypes with migratory and stationary samples forming completely separate clusters, each containing both Icelandic and Norwegian samples,  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53 when all markers were included in the analysis (Fig. 2a). In contrast, these samples grouped according to geographic origin when a reduced 'neutral' data set (i.e. where 87 significant and highly divergent outlier loci had been removed, see also below) was analysed (Fig. 2b). The North Sea and Baltic Sea samples, representing geographically isolated samples, were also genetically isolated in both data sets (Fig. 2). These results were confirmed when loci with a minor allele frequency below 10% were removed (Fig. S3, Supporting information), illustrating that these global patterns were robust to the inclusion of rare alleles. The patterns were supported by estimates of pairwise F ST (Table S2, Supporting information). With the reduced (neutral) data set, confidence intervals overlapped with zero when comparing ecotypes from spawning grounds within localities. In contrast, although pairwise F ST estimates were low, confidence intervals did not overlap with zero when similar ecotypes were compared across the two localities (Table S2, Supporting information).
Levels of population differentiation, assessed through individual locus pairwise F ST , varied along the linkage groups ( Fig. 3; see also Fig. S4, Supporting information for all comparisons). The pairwise comparisons of migratory and stationary ecotypes collected in both Norway and Iceland (Fig. 3a-c) showed markedly increased levels of differentiation for loci in linkage groups 1, 2 and 7 in addition to a few loci that were not mapped to a linkage group. In contrast, the pairwise comparisons between similar ecotypes across geographic locations (Fig. 3d,e) showed that differentiation was very shallow across all linkage groups. The pairwise comparison between the southernmost eastern Atlantic location from the North Sea and the Norwegian stationary ecotype collected in the northern Atlantic (Fig. 3f) revealed elevated levels of structure for loci in linkage groups 2, 7 and 12, while most remaining loci were weakly differentiated, thus confirming earlier findings of high differentiation in these linkage groups (Bradbury et al. 2010). The comparison between the North Sea and the Baltic Sea samples (Fig. 3g), representing reproductively isolated populations (Nielsen et al. 2003; see also Discussion), showed elevated differentiation for loci across most linkage groups, as did the comparison between the North Sea and the western Atlantic sample (Fig. 3h).
Observed levels of heterozygosity also varied among linkage groups (Fig. 4). Remarkably different patterns in the distribution of heterozygosity were observed among the populations, with dramatic reductions in linkage group 1 in the migratory ecotype samples (Fig. 4a-c). In addition, reduced levels of heterozygosity were observed in linkage group 7 for the migratory ecotype samples (Fig. 4a-c), the North Sea population sample (Fig. 4d) and the western Atlantic sample (Fig. 4h), while the stationary ecotype samples showed increased levels of heterozygosity for the same genomic region (Fig. 4e,f).
Eighty-seven high F ST outlier loci were identified through Bayesian regression on a data set excluding the highly divergent western Atlantic sample and loci with a minor allele frequency below 2%. These outlier loci were primarily located in linkage groups 1, 2, 7 and 12 (71 of 87 outliers; Table S3, Supporting information). Global F ST changed only slightly (from 0.056 to 0.065) between minor allele frequency thresholds of 0% and 20% (Fig. S5, Supporting information). Changes in global F ST were larger for thresholds above 20%, but these analyses only included few loci because most of the loci were removed from analysis at these very high thresholds. In addition, an outlier test including only loci with minor allele frequencies above 10% identified almost the same set of outliers as the test applied on loci with minor allele frequencies above 2% (only four outlier loci were not identified with a threshold of 10%, see Table S3, Supporting information). Thus, results from the outlier test appear very robust to the effects of loci with low information content [see also discussion in Roesti et al. (2012b)].
Patterns of single locus population differentiation and genetic diversity were confirmed when temporal replicates of the samples from the North Sea, the Baltic sea and both migratory and stationary ecotypes from Norwegian spawning grounds were analysed (Figs S6 and S7, Supporting information). Differentiation was increased in linkage groups 1, 2 and 7 in the comparison between the two ecotypes, while differentiation was increased in linkage groups 2, 7 and 12 in the comparison between the North Sea and the stationary samples. Differentiation was low across the remaining linkage groups in these comparisons, while differentiation was high across all linkage groups in comparisons involving the Baltic Sea sample (Fig. S6, Supporting information). Genetic diversity was drastically reduced in linkage group 1 in the migratory sample. In addition, linkage group 7 showed decreased diversity in the migratory and North Sea samples, while it showed increased diversity in the stationary sample. Finally, loci in linkage group 12 showed decreased diversity in the North Sea sample (Fig. S7, Supporting information). These results indicate temporal stability of observed patterns. A detailed investigation of the loci in linkage group 1 revealed that loci displaying elevated levels of population differentiation between migratory and stationary ecotypes were located between 14.3 and 37.2 cM (Fig. 5 and Table S4, Supporting information). This pattern was evident for both Norwegian and Icelandic comparisons. The previously intensely studied locus in the gene pantophysin (Pan I) is located at position 25.1 cM in this linkage group [  and Table S1, Supporting information].

Discussion
In addition to identifying a region of high differentiation between ecotypes in linkage group 1, we confirmed earlier findings suggesting selection in linkage groups 2, 7 and 12 in Atlantic cod (Bradbury et al. 2010). However, these signals were not specifically associated with the migratory ecotype as was the case for the highly differentiated region in linkage group 1. The region of elevated differentiation between ecotypes extends over

Origin of migratory ecotype
Despite decades of research on the ecotypes in both Norway and Iceland (Palsson & Thorsteinsson 2003;Nordeide et al. 2011), no study has so far directly compared populations from the two regions through the use of a large number of genetic markers. Genetic differentiation between Norway and Iceland (across ecotypes) revealed with neutral genetic markers (Fig 2b  and Table S2, Supporting information) suggests reproductive isolation between these locations. Yet, results illustrate marked similarities in genomic signatures associated with ecotypic divergence. Thus, although the description of the ecotypes (or behaviour types) in Icelandic waters has so far only been based on the information from data storage tags (Palsson & Thorsteinsson 2003;Pampoulie et al. 2008a,b;Grabowski et al. 2011), our study confirms the presence of two divergent groups in coastal and deep off-shore locations, respectively. The region of increased differentiation between ecotypes is also characterized by dramatically reduced levels of diversity in samples representing the migratory ecotype, a classical signal of a selective sweep (Storz 2005). This suggests that initially these populations may have experienced a selective sweep involving the specific region on linkage group 1.
Extremely shallow population differentiation across most of the genome (Fig. 3a-c) as well as the close relationship among populations within geographic locations (across ecotypes), as estimated with neutral genetic markers (Fig. 2b), suggest two possible scenarios for the origin of migratory ecotype populations. In one scenario, the migratory ecotype arose twice through convergent evolution in two parallel systems (Iceland and Norway) following colonization after the last glacial maximum (LGM) around 21 000 years ago. Similarities within geographic regions (Fig. 2b) could then reflect shared ancestry and recent divergence ) rather than effects from gene flow between ecotypes. However, highly divergent allele lineages for one gene in the region affected by the selective sweep, pantophysin (Pogson & Mesa 2004), suggest that the split of the two ecotypes is ancient compared to the LGM. If the pantophysin gene is representative for the region, these data suggest that recent convergent adaptation is not likely. In contrast, a more parsimonious scenario is that the two ecotypes were already present when deglaciated regions around Iceland and Norway were colonized following the LGM (Kettle et al. 2011) and that the geographically based structure at neutral markers is caused by on-going gene flow between ecotypes within localities. This scenario is also consistent with the hypothesized, although still highly speculative, existence of both coastal and off-shore refugia for Atlantic cod during the LGM (Pampoulie et al. 2008a,b;Kettle et al. 2011). Modelling work has suggested that periods of allopatry, for instance, in isolated glacial refugia, could favour the establishment of local genomic differentiation under some models of adaptive divergence (Feder et al. 2011). With the current data set, it is not possible to determine whether secondary contact between ecotypes occurred before or after colonization. However, the combination of highly divergent allele lineages within and extremely shallow differentiation outside the region on linkage group 1 is difficult to explain without a significant role for gene flow. Indeed, if the split is very old and gene flow is not occurring between ecotypes, we would expect to see similar patterns of  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54 structuring for neutral markers as those observed for the loci within this specific genomic region as neutral markers would then reveal common ancestry of ecotypes across locations. In addition, on-going gene flow is also indirectly supported by observations of individuals expressing an intermediate type of behaviour in nature (Grabowski et al. 2011), which could suggest on-going hybridization between the ecotypes.
Neutral genetic differentiation between Norway and Iceland (for both ecotypes) also suggests at least partial isolation of the two geographical systems (Waples & Gaggiotti 2006) and that gene flow mostly occurs between ecotypes within the two regions. This gene flow would then be counteracted by on-going selection in the two parallel systems in the specific genomic region in linkage group 1.

Underlying mechanism for genomic differentiation
A number of mechanisms could be responsible for generating and maintaining strong differentiation between ecotypes in the specific region in linkage group 1. If, as suggested above, natural selection is involved, both exogenous (e.g. adaptation to local environmental conditions) and endogenous (i.e. intrinsic incompatibilities) factors could be important and it may be very difficult to disentangle such effects (Bierne et al. 2011). While an intrinsic incompatibility unrelated to known ecological and environmental differences cannot be ruled out, the data are also consistent with the alternative interpretation that the migratory ecotype was affected by a selective sweep linked to the unique life-history characteristics known for these populations. It is plausible that the life-history strategy of the migratory ecotype is linked to utilizing high productivity frontal niches in the Arctic for feeding (Stensholt 2001;Grabowski et al. 2011) and that the well-described migratory and behavioural characteristics reflect this adaptation. Alternative and more specialized adaptations to different temperature conditions (Righton et al. 2010;Grabowski et al. 2011) are also likely linked to these differences in life-history strategies between ecotypes.
Many studies have discussed selection on the pantophysin gene [e.g. (Pogson 2001;Karlsson & Mork 2003;Case et al. 2005;Skarstein et al. 2007)], while some authors have noted that observed patterns of linkage disequilibrium within the gene could indicate that selection is instead targeting a linked gene (Fevolden & Pogson 1997). The latter hypothesis is supported by the present study, which suggests that pantophysin may be linked to a large genomic region, potentially harbouring hundreds of genes, rather than the actual target of selection.
Although the link between ecotypes and genomic patterns is consistent with patterns resulting from natural selection (through exogenous or endogenous factors) in local populations, alternative explanations could, in principle, also explain our findings. For instance, it has been suggested that transient phases during the fixation process of a globally favourable mutation could generate signals similar to selective sweeps in local populations (Bierne 2010). However, in a scenario of a globally favourable mutation, sweep signals of different magnitudes should be observed in all populations and should be unrelated to specific ecological characteristics [see also (Roesti et al. 2012a)]. Thus, expected patterns under a globally favourable mutation model are difficult to reconcile with observed patterns, where sweep signals are specifically observed in populations characterized by the migratory life-history strategy. Similarly, structural chromosomal features, such as chromosome centromeres, could potentially explain localized genomic increases in population differentiation due to reduced recombination rates in these regions Roesti et al. 2012a). However, while recombination rate variation would be expected to result in increased levels of differentiation in some parts of the genome, it cannot explain the extreme reduction in diversity observed only in the migratory population samples. Thus, the most plausible explanation remains a balance between local selection and gene flow. Finally, ascertainment bias could have affected some of the analyses conducted in this study because markers were primarily developed from western Atlantic cod populations. Previous studies have not found markedly different levels of diversity in eastern and western Atlantic cod populations (O'Leary et al. 2007;Bigg et al. 2008), and the lower levels of variation observed in the eastern Atlantic in this study could therefore suggest an effect from ascertainment bias. However, we still do not expect these effects to severely bias the major conclusions drawn from analyses focusing on eastern Atlantic populations, as levels of variation are similar in the eastern Atlantic samples (Table S1, Supporting information) and as all samples in the eastern Atlantic (migratory and stationary populations, in particular) are weakly differentiated from each other and show common divergence from the western Atlantic [Table S2, Supporting information, see also e.g. Rosenblum & Novembre (2007)]. Thus, ascertainment bias would be expected to affect eastern Atlantic samples to the same degree.
While data suggest increased differentiation over one large genomic region, the relatively modest genome coverage in this study and the fragmented nature of the current cod genome assembly (see Figs S1 and S2, Supporting information) does not allow a formal assessment of whether the signals reflect few or several targets of selection [see discussion in Via (2012)]. It is possible that future studies applying higher genome coverage may identify more complex patterns of differentiation between cod ecotypes, such as observed in malaria mosquitoes Neafsey et al. 2010). Similarly, the data do not allow for an assessment of whether divergence hitchhiking, chromosomal rearrangement, such as inversions, or another mechanism is most likely responsible for the observed patterns. It is likely, however, that dense sequencing of the region could elucidate the underlying processes responsible.

Genomic mosaic of differentiation in Atlantic cod
In contrast to patterns observed in linkage group 1, regions of increased differentiation in linkage groups 2, 7 and 12 are not associated with the migratory ecotype samples. These patterns have previously been attributed to co-evolution of several genes in response to common environmental conditions [temperature; (Bradbury et al. 2010)], but they have not been related to the extremely low levels of differentiation across other parts of the genome, as observed here.
Collectively our results suggest that, on a genomewide scale, relatively few and potentially large regions, or 'genomic islands', could be affected by selection in populations still influenced by gene flow. These patterns are consistent with a 'genomic mosaic of divergence' (Wu 2001;Via & West 2008), originally proposed to underlie early stages of ecological divergence in malaria mosquitoes and pea aphids (Turner et al. 2005;Via & West 2008;Via 2009Via , 2012White et al. 2010). Since these original studies, theoretical and conceptual work has considered whether divergence should be localized or genome-wide during different stages of the 'divergence-with-gene-flow continuum' (Feder et al. 2012a,b;Via 2012). Although the number of empirical studies is increasing, relatively few model systems have so far been studied. While some studies have identified genome-wide patterns of divergence, for instance, in walking stick insects (Nosil et al. 2008) and three-spined stickleback (Roesti et al. 2012a), others have suggested localized divergence, for example, in pea aphids (Via & West 2008;Via et al. 2012) and Heliconius butterflies (Nadeau et al. 2012). Interestingly, results from the original model case introducing the 'genomic island' metaphor (Turner et al. 2005) have been reinterpreted with the availability of genome-wide data to actually reflect pervasive divergence throughout the genome Neafsey et al. 2010), and even studies on the same species under different settings have arrived at different conclusions (Hohenlohe et al. 2012;Roesti et al. 2012a). Thus, so far, empirical work has not identified a universal remnant genomic signature following ecological divergence, and it seems likely that different processes operate on different stages of the continuum from panmixia to complete reproductive isolation (Feder et al. 2012a).
In Atlantic cod, patterns of genomic differentiation associated with clearly differentiated populations from the Baltic Sea and the western Atlantic were different from those observed between weakly differentiated groups. Among highly divergent populations, population differentiation was found across all linkage groups ( Fig. 3 and Fig. S4, Supporting information), suggesting reproductive isolation and reduced gene flow (Nielsen et al. 2003;Feder et al. 2012a). Divergence between the eastern and western Atlantic is believed to be more than 100 000 years old, predating the last glacial maximum (Bigg et al. 2008). Thus, it may not be surprising that time has allowed genomic differentiation to develop across the Atlantic. In the case of the Baltic Sea, however, Atlantic cod most likely colonized the region following the last glacial retreat from this area around 8000 years ago (Nielsen et al. 2003;Johannesson & Andre 2006). For Atlantic cod and many other marine species, it is therefore plausible that genomic differentiation arose over a relatively short evolutionary timescale following a colonization process involving adaptation, reproductive isolation and increased levels of genetic drift in the Baltic Sea (Johannesson & Andre 2006). Indeed, several lifehistory characteristics, such as unique sperm activity and egg buoyancy (Nissling & Westin 1997), as well as pronounced genetic differentiation for both neutral and non-neutral genetic markers (Nielsen et al. 2003(Nielsen et al. , 2009b of Atlantic cod in the Baltic Sea, suggest significant roles for both neutral and non-neutral evolutionary forces in Baltic Sea populations. The scenarios represented by the Atlantic cod system may therefore represent different stages on the continuum from panmixia to complete isolation (Feder et al. 2012a;Via 2012). Importantly, even though the initial split between ecotypes was not recent per se,t h es c e n a r i o may still represent an early stage of divergence, that is, a stage where populations remain connected through significant levels of gene flow (Via 2009). In contrast, reductions in gene flow between highly differentiated groups illustrate that genome-wide effects from neutral evolutionary forces will make it difficult to detect genomic regions associated with initial stages of divergence if populations are investigated at later stages (Via 2009(Via , 2012.

Conclusions
The Atlantic cod ecotypes have contributed novel insights on the possible genomic signatures underlying ecological divergence in a high gene flow species. Even though the responsible mechanism and the nature of targets of selection are still unknown, our findings provide additional insights into the long-standing controversy on the interactions between diversifying selection and homogenizing gene flow (Ehrlich & Raven 1969;Mayr 1969;Lenormand 2002;Garant et al. 2007). While predictions on the extent and pattern of adaptive divergence can be tested using comparisons of phenotypic traits across populations, analysis at the genomic level allows for unequivocal identification of the integrated effects of selection and gene flow, as well as indicating genes potentially of major effect. Importantly, the frequently documented negative correlations between phenotypic differences and gene flow (Rasanen & Hendry 2008) may be underlain by a much more complex genomic mosaic of response even in high gene flow species [see also Nadeau et al. (2012)]. Thus, the Atlantic cod ecotypes represent an informative model to study evolution in action (Via 2009), particularly in relation to the dramatic environmental changes predicted for Arctic marine environments under future climate change (Solomon et al. 2007).

Supporting information
Additional supporting information may be found in the online version of this article.
Table S1 Genotyped SNPs with SNPs equence, associated dbSNP reference number, information from the genome alignment and estimates of heterozygosity in each of the eight samples.

Table S2
Pairwise F ST with 95% confidence intervals from 1000 bootstrapped data sets for the full data set (1199 loci, below diagonal) and for a reduced data set where outliers identified through Bayesian regression were removed (1112 loci, above diagonal).