Drivers and dynamics of a massive adaptive radiation in cichlid fishes

Adaptive radiation is the likely source of much of the ecological and morphological diversity of life1–4. How adaptive radiations proceed and what determines their extent remains unclear in most cases1,4. Here we report the in-depth examination of the spectacular adaptive radiation of cichlid fishes in Lake Tanganyika. On the basis of whole-genome phylogenetic analyses, multivariate morphological measurements of three ecologically relevant trait complexes (body shape, upper oral jaw morphology and lower pharyngeal jaw shape), scoring of pigmentation patterns and approximations of the ecology of nearly all of the approximately 240 cichlid species endemic to Lake Tanganyika, we show that the radiation occurred within the confines of the lake and that morphological diversification proceeded in consecutive trait-specific pulses of rapid morphospace expansion. We provide empirical support for two theoretical predictions of how adaptive radiations proceed, the ‘early-burst’ scenario1,5 (for body shape) and the stages model1,6,7 (for all traits investigated). Through the analysis of two genomes per species and by taking advantage of the uneven distribution of species in subclades of the radiation, we further show that species richness scales positively with per-individual heterozygosity, but is not correlated with transposable element content, number of gene duplications or genome-wide levels of selection in coding sequences. Analyses of molecular, anatomical, pigmentation and ecological characteristics of nearly all of the approximately 240 species of cichlid fishes in Lake Tanganyika show that the massive adaptive radiation occurred within the confines of the lake through trait-specific pulses of accelerated evolution.

At the macroevolutionary level, the diversity of life has been shaped mainly by two antagonistic processes: evolutionary radiations increase, and extinction events decrease, organismal diversity over time 5,8,9 . Evolutionary radiations are referred to as adaptive radiations if new lifeforms evolve rapidly through adaptive diversification into a variety of ecological niches, which typically presupposes ecological opportunity [1][2][3]10 . Whether or not an adaptive radiation occurs depends on a variety of extrinsic and intrinsic factors as well as on contingency, whereas the magnitude of an adaptive radiation is determined by the interplay between its main components, speciation (minus extinction) and adaptation to distinct ecological niches 1,2,4,11 . Despite considerable scientific interest in the phenomenon of adaptive radiation as the cradle of organismal diversity 1,2,10,12,13 , many predictions regarding its drivers and dynamics remain untested, particularly in exceptionally species-rich instances. Here, we examine what some consider as the "most outstanding example of adaptive radiation" 14 , the species flock of cichlid fishes in Lake Tanganyika. This cichlid assemblage comprises about 240 species 15 , which together feature an extraordinary degree of morphological, ecological and behavioural diversity [14][15][16][17] . We construct a species tree of Lake Tanganyika's cichlid fauna on the basis of genome-wide data, demonstrate the adaptive nature of the radiation, reconstruct eco-morphological diversification along the species tree, and test general and cichlid-specific predictions related to adaptive radiation.

In situ radiation in Lake Tanganyika
To establish the phylogenetic context of cichlid evolution in Lake Tanganyika, we estimated the age of the radiation through divergence time analyses based on cichlid and other teleost fossils 18 , and constructed time-calibrated species trees using 547 newly sequenced cichlid genomes (Supplementary Table 1). Our new phylogenetic hypotheses ( Fig. 1, Extended Data Figs. 1-4, Supplementary Figs. 1, 2) support the assignment of the Tanganyikan cichlid fauna into 16 subcladescorresponding to the taxonomic grouping of species into tribes 15 -and confirm that the Tanganyikan representatives of the tribes Coptodonini, Oreochromini and Tylochromini belong to more ancestral and widespread lineages that have colonized the lake secondarily 12,15,19 (Supplementary Discussion). It has been under debate whether all endemic Tanganyikan cichlid tribes evolved within the confines of Lake Tanganyika or whether some of them evolved elsewhere before the formation of the lake [20][21][22] . Our time calibrations establish that the most recent common ancestor of the cichlid radiation in Lake Tanganyika lived around 9.7 million years ago (Ma) (95% highest-posterior-density   Fig. 1 | Time-calibrated species tree of the cichlid fishes of African Lake Tanganyika. The species tree was time calibrated with a relaxed-clock model and is based on a maximum-likelihood topology inferred from genome-wide SNPs. Species names are abbreviated using a six-letter code, whereby the first three letters represent the genus and the last three letters the species name (Supplementary Table 1; see Extended Data Fig. 2 for the phylogeny with full species names). Branches are coloured according to tribes, and for all lake species an illustration is shown. Representatives of riverine cichlids (grey font) are nested within the radiation. The inset shows the time-calibrated phylogeny of more ancestral cichlid lineages (estimated under the multi-species coalescent model, Extended Data Fig. 1), highlighting the phylogenetic positions of the Tanganyikan representatives of the tribes Coptodonini (Coptodon rendalli (Copren)), Oreochromini (Oreochromis tanganicae (Oretan)) and Tylochromini (Tylochromis polylepis (Tylpol)), which colonized the lake secondarily. The schematic map of the African continent shows the position of the three Great Lakes Victoria, Malawi and Tanganyika, with a magnified section of Lake Tanganyika. The presumed age of Lake Tanganyika 23 (9)(10)(11)(12) Article age interval: 10.1-9.1 Ma) (Fig. 1), which coincides with the appearance of lacustrine conditions in the Tanganyikan Rift 23 . This suggests that the radiation commenced shortly after the lake had formed and that all endemic cichlid tribes have evolved and diversified in situ, that is, within the temporal and geographical context of Lake Tanganyika.

Phenotypes correlate with environments
Because-in the case of adaptive radiation-diversification occurs via niche specialization, a strong association is expected in the extant fauna between the environment occupied by a species and the specific morphological features used to exploit it 2,3 . To quantify eco-morphological diversification across the radiation, we investigated three trait complexes through landmark-based morphometric analyses. Specifically, we quantified body shape and upper oral jaw morphology using 2D landmarks acquired from X-ray images and the shape of the lower pharyngeal jaw bone based on 3D landmarks derived from micro-computed tomography (μCT) scans (Extended Data Fig. 5). To approximate the ecological niche of each species, we used the carbon and nitrogen stable-isotope composition of muscle tissue, which provides information about the relative position along the benthic-pelagic axis (δ 13 C value) and the relative trophic level (δ 15 N value), respectively 16,24 -a pattern that we corroborate here for Lake Tanganyika (Extended Data Fig. 6a, Supplementary Discussion). The major axes of shape variation for each trait complex were identified through a principal component analysis (PCA). To test for phenotype-environment correlations and to identify the ecologically most relevant components of each of these trait complexes, we performed a two-block partial least-square analysis (PLS) with the stable-isotope measurements, and applied a phylogenetic generalized least-square analysis (pGLS) to account for phylogenetic dependence.
The quantification of variation in body shape revealed that principal component 1 (PC1) represented mainly differences in aspect ratio, whereas PC2 was loaded with changes in head morphology (Fig. 2a). The changes in aspect ratio (comparable to PC1) were correlated with the δ 13 C and δ 15 N values (PLS: Pearson's r = 0.69, R 2 = 0.48, P = 0.001; pGLS: R 2 = 0.12, P < 0.001, λ pGLS = 1.007). PC1 of upper oral jaw morphology mainly represented changes in the orientation and relative size of the premaxilla, which was also the main correlate to the stable C and N isotope composition (PLS: Pearson's r = 0.62, R 2 = 0.38, P = 0.001; pGLS: R 2 = 0.09, P < 0.001, λ pGLS = 1.023), whereas PC2 was defined by changes in the ratio of the rostral versus the lateral part of the bone (Fig. 2b). For lower pharyngeal jaw shape, we found that PC1 reflected mainly changes in the aspect ratio of the jaw bone in combination with an increased posterior thickness, whereas PC2 involved similar shifts in thickness, yet in this case in combination with changes in the lengths of the postero-lateral horns that act as muscle-attachment structures 25 (Fig. 2c). The PLS revealed that shape changes similar to PC2 are best associated with stable-isotope values (PLS: Pearson's r = 0.67, R 2 = 0.45, P = 0.001; pGLS: R 2 = 0.16, P < 0.001, λ pGLS = 1.018). The PCAs further revealed that the occupied area of the morphospace and ecospace scales with the number of species in the tribes (Extended Data Figs. 6, 7; ecospace: Pearson's r = 0.88, d.f. = 9, P < 0.001; body shape: Pearson's r = 0.91, d.f. = 9, P < 0.001; upper oral jaw morphology: Pearson's r = 0.88, d.f. = 9, P < 0.001; lower pharyngeal jaw shape: Pearson's r = 0.83, d.f. = 9, P = 0.002), a pattern that is not driven by sample size only (Supplementary Discussion).
Overall, the significant association between each of the three traits and the stable C and N isotope composition underpins their adaptive value (Extended Data Fig. 8a-c). A joint consideration points out that deep-bodied cichlids with inferior mouths and thick lower pharyngeal jaws with short horns are associated with higher stable-isotope projections (high δ 13 C and low δ 15 N values), indicating that such fishes occur predominantly in the benthic/littoral zone of the lake and feed on plants and algae, whereas more elongated species with more superior mouths and longer and thinner lower pharyngeal jaws are generally associated with lower stable-isotope projections (low δ 13 C and high δ 15 N values), suggesting a more pelagic lifestyle and a higher position in the food chain.

Pulses of morphological diversification
Next, we investigated the temporal dynamics of how the observed eco-morphological disparity emerged over the course of the radiation. In addition to the three eco-morphological traits, we also scored male pigmentation patterns to approximate disparity along the signalling axis-another potentially important component of diversification in adaptive radiations 1,6,7,26 . For all four traits, we estimated morphospace expansion through time using ancestral-state reconstructions along the time-calibrated species tree and applying a variable-rates model of trait evolution 27,28 (Extended Data Fig. 8d, e). We calculated morphological disparity as the extent of occupied morphospace in time intervals of 0.15 million years (Myr) in comparison to a null model that assumes Brownian motion. Likewise, evolutionary rates through time were calculated as mean evolutionary rates derived from the variable-rates model, sampled at the same time points along the phylogeny.
Our analyses uncovered a pattern of discrete pulses in morphospace expansion, which were followed, in most cases, by morphospace packing (Fig. 3). The timing of these pulses differed among the traits. For body shape, we found a pulse of rapid morphospace expansion early in the radiation, alongside the first pulse of lower pharyngeal jaw shape diversification (Fig. 3b, c); this early phase of the radiation also features the highest evolutionary rates for body shape (Fig. 3d). The pulse in upper oral jaw diversification occurred in the middle phase of the radiation. Evolutionary rates were increased during this period, and were even higher at a later phase that was dominated by packing of the upper oral jaw morphospace rather than its expansion (Fig. 3b- suggests that, in that later phase, rapidly evolving lineages diverged into pre-occupied regions of the morphospace, ultimately resulting in convergent forms 16 . The second pulse in lower pharyngeal jaw morphospace expansion happened late in the radiation when evolutionary rates were also highest for this trait (Fig. 3b-d). Thus, the theoretical prediction that eco-morphological diversification is rapid early in an adaptive radiation and slows down through time as the available niche space becomes filled 1,5 applies only to body shape. Yet, this early burst in body shape diversification was not connected to a substantial increase in lineage accumulation (Fig. 3c). Pigmentation patterns showed a single pulse of diversification and increased evolutionary rates late in the radiation-a signature unlikely to be caused by a high turnover rate in this trait (Supplementary Discussion). This late pulse of diversification in pigmentation patterns, together with the consecutive pulses of morphospace expansion in the eco-morphological traits, is in agreement with the prediction that Article diversification in an adaptive radiation proceeds in discrete temporal stages-first in macrohabitat use, then by trophic specialization, followed by a final stage of divergence along the signalling axes 1,6,7 . However, in contrast to the conventional stages model, the most recent stage of the cichlid adaptive radiation in Lake Tanganyika, which coincides with a large number of speciation events (Fig. 3c), is characterized by temporally overlapping pulses of diversification in both a putative signalling trait and in an ecologically relevant trait. The lower pharyngeal jaw shape is the only trait complex showing two discrete pulses of morphospace expansion-one early in the radiation and one late when niche space already became limited. This later pulse suggests that diversification in the pharyngeal jaw apparatus facilitated fine-scaled resource partitioning after body shape and upper oral jaw morphospaces had been explored, resulting in the densely packed niche space observed today (Figs. 2, 3b).

Genomic features and species richness
Finally, we examined whether the diversity patterns arising over the course of the radiation are linked with particular genomic features. It has previously been suggested-on the basis of five reference cichlid genomes-that the radiating African cichlid lineages are characterized by increased transposable element counts, increased levels of gene duplications, and genome-wide accelerated coding-sequence evolution 13 . Because of the phylogenetic substructure of Lake Tanganyika's cichlid fauna and the widely differing species numbers among tribes, our data offered the opportunity to examine genomic features for an association with per-tribe species richness within a large-scale radiation. We did not find evidence that members of species-rich tribes exhibit greater numbers of transposable elements (Fig. 4a) or more duplicated genes in their genomes (Fig. 4b), nor do they feature elevated genome-wide signatures of selection in coding sequences ( Fig. 4c) (see also Extended Data Fig. 9). However, we found that a tribe's species richness scales positively with a common measure of genetic diversity: genome-wide heterozygosity (Fig. 4d). That genetic diversity is linked to species richness has been previously suspected, although the nature of this relationship and the determinants of genetic diversity are under debate 29,30 .
Elevated levels of heterozygosity could potentially result from hybridization 31 , which has itself been suggested as a trigger of cichlid radiations 22,32,33 . In Tanganyikan cichlids, the level of gene flow within tribes (estimated using f 4 -ratio values 34 ) does not correlate with a tribe's species richness (Fig. 4e, Extended Data Fig. 10). Nevertheless, much of the variation in heterozygosity as well as its correlation with species richness can be explained by the observed levels of gene flow within tribes in combination with the reduced gene flow among them: through coalescent simulations of genome evolution along the species tree we show that variation in migration rates, sampled from the empirical f 4 -ratio estimates, can produce levels of heterozygosity that are similar to the ones observed in nature (Fig. 4f). Hence, the correlation between species richness and heterozygosity can be explained by gene flow and phylogenetic structure, which is consistent with the expectation that the effect of gene flow scales positively with the number of hybridizing species and the divergence among these. In the cichlid radiation in Lake Malawi, which is an order of magnitude younger than the one in Lake Tanganyika, heterozygosity levels vary much less among lineages and do not scale with species richness, which-according to our findingscan be explained by the much lower levels of genetic differentiation between the hybridizing species 33 .

Conclusion
On the basis of a comprehensive dataset on cichlid fishes from African Lake Tanganyika, we tested predictions related to the phenomenon of adaptive radiation. We establish that the Tanganyikan cichlid radiation unfolded within the temporal and spatial confines of the lake, giving rise to an endemic fauna consisting of about 240 species in 52 genera and 13 tribes in less than 10 Myr. Although the ancestors of these tribes initially found comparable ecological opportunity, present-day species numbers differ by two orders of magnitude among these phylogenetic sublineages. Our analyses of morphological, ecological and genomic information revealed that, taken as a whole, species-rich tribes occupy larger fractions of the morphospace and ecospace and contain species that are, at the per-genome level, genetically more diverse, which appears to be linked to gene flow. We demonstrate a phenotype-environment association in three trait complexes (body shape, upper oral jaw morphology and lower pharyngeal jaw shape) and pinpoint their most relevant adaptive components. We show that eco-morphological diversification was not gradual over the course of the radiation. Instead, we identified trait-specific pulses of accelerated phenotypic evolution, whereby only diversification in body shape shows an early burst 1,5 . The sequence of the trait-specific pulses essentially follows the pattern postulated in the stages model of adaptive radiation 1,6,7 , with the extension that the most recent stage of the cichlid adaptive radiation in Lake Tanganyika, which is characterized by a large number of speciation events, is defined by increased diversification in both an ecological (lower pharyngeal jaw) and a signalling (pigmentation) trait. To what extent the observed diversity and disparity patterns were shaped by past environmental fluctuations and extinction dynamics cannot be answered conclusively through the investigation of the extant fauna alone.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2930-4.

Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Sampling
Sampling was conducted between 2014 and 2017 at 130 locations at Lake Tanganyika. To maximise taxon coverage, we included additional specimens from previous expeditions (4.9% of the samples) as well as from other collections (0.8%). The final dataset (301 taxa; n = 2,723 specimens) contained an almost complete taxon sampling of the cichlid fauna of Lake Tanganyika, as well as 18 representative cichlid species from nearby waterbodies, and 32 outgroup species. All analyses described below are based on the same set of typically 10 specimens per species, or subsets thereof (Supplementary Tables 1, 2, Supplementary Methods).

Whole-genome sequencing
Genomic DNA of typically one male and one female specimen per species (n = 547) was extracted from fin clips preserved in ethanol using the E. Determining the age of the radiation To determine the age of the cichlid radiation in Lake Tanganyika, we applied phylogenomic molecular-clock analyses for representatives of all cichlid subfamilies and the most divergent tribes, together with non-cichlid outgroups (44 species; Extended Data Fig. 1). Following Matschiner et al. 18 we identified and filtered orthologue sequences from genome assemblies and compiled 'strict' and 'permissive' datasets that contained alignments for 510 and 1,161 genes and had total alignment lengths of 542,922 and 1,353,747 bp, respectively. We first analysed the topology of the species with the multi-species coalescent model implemented in ASTRAL 49 (v.5.6.3), based on gene trees that we estimated for both datasets with BEAST2 50 (v.2.5.0). As undetected past introgression can influence divergence-time estimates in molecular clock analyses, we further tested for signals of introgression in the form of asymmetric species relationship in gene trees and excluded five species (Fundulus heteroclitus, Tilapia brevimanus, Pelmatolapia mariae, Tilapia sparrmanii, and Steatocranus sp. 'ultraslender') potentially affected by introgression from all subsequent molecular-clock analyses. We then estimated divergence times among the most divergent cichlid tribes and the age of the cichlid radiation in Lake Tanganyika with the multi-species coalescent model in StarBEAST2 51 (v.0.15.5), using the 'strict' set of gene alignments (Extended Data Fig. 1). Further details are provided in the Supplementary Methods.

Phylogenetic inference
To infer a complete phylogeny of the cichlid radiation in Lake Tang Fig. 4). Species-level phylogenies resulting from these different approaches were used as topological constraints in subsequent relaxed-clock analyses of divergence times (see below). In addition, we estimated the mitochondrial phylogeny based on maximum-likelihood with RAxML ( Supplementary Fig. 2). Further details are provided in the Supplementary Methods.

Divergence time estimates within the radiation
For relaxed-clock analyses, the 1,272 alignments were further filtered by applying stricter thresholds on the proportion of missing data and the strength of recombination signals. Ten remaining alignments with a length greater than 2,500 bp and less than 130 hemiplasies (total length: 30,738 bp; completeness: 95.8%), were then used jointly to estimate divergence times with the uncorrelated-lognormal relaxed-clock model implemented in BEAST2. To account for phylogenetic uncertainty in downstream phylogenetic comparative analyses, we performed three separate sets of relaxed clock analyses, in which the topology was either fixed to the species-level phylogeny inferred with RAxML ( Fig. 1, Extended Data Fig. 2), the species tree inferred with ASTRAL (Extended Data Fig. 3) or the Bayesian species tree inferred with SNAPP (Extended Data Fig. 4). Further details are provided in the Supplementary Methods.

Morphometrics
To quantify body shape and upper oral jaw morphology, we applied a landmark-based geometric morphometric approach to digital X-ray images (for the full set of 10 specimens per species whenever possible; n = 2,197). We selected 21 landmarks, of which 17 were distributed across the skeleton and four defined the premaxilla (Extended Data Fig. 5a).
To extract overall body shape information, we excluded landmark 16, which marks the lateral end of the premaxilla, hence minimizing the impact of the orientation of the upper oral jaw. We then applied a Procrustes superimposition to remove the effect of size, orientation, and translational position of the coordinates. For upper oral jaw morphology, we used a subset of four landmarks. A crucial feature of the oral jaw morphology is the orientation of the mouth relative to the body axes. However, this component of the upper oral jaw morphology would be lost in a classical geometric morphometric analysis, in which only pure shape information is retained. To overcome this, we extracted the premaxilla-specific landmarks (1, 2, 16 and 21) after Procrustes superimposition of the entire set of landmarks and subsequently recentred the landmarks to align the specimens without rotation. Thus, the resulting landmark coordinates do not represent the pure shape of the premaxilla but additionally contain information on its orientation and size in relation to body axes and body size, respectively.
To quantify lower pharyngeal jaw bone shape in 3D, a landmark-based geometric morphometric approach was applied on μCT scans of the head region of five specimens per species (n = 1,168). To capture all potential functionally important structures of the lower pharyngeal jaw bone, we selected a set of 27 landmarks (10 true landmarks and 17 sliding semi-landmarks) well distributed across the left side of the bone (Extended Data Fig. 5b). Landmark coordinates were acquired using TINA 56 (v.6.0). To retain the lateral symmetric properties of the shape data during superimposition, we reconstructed the right side of the lower pharyngeal jaw bone by mirroring the landmark coordinates across the plane of bilateral symmetry fitted through all landmarks theoretically lying on this plane. We then superimposed the resulting 42 landmarks while sliding the semi-landmarks along the curves by minimizing Procrustes distances and retained the symmetric component only.
To identify the major axes of shape variation across the multivariate datasets we performed a PCA for each trait. We also calculated morphospace size per tribe as the square root of the convex hull area spanned by species means of the PC1 and PC2 scores. We then tested for a correlation between morphospace size and estimated species richness of a tribe 15 (log-transformed to obtain normal distribution). To account for phylogenetic non-independence, we calculated phylogenetic independent contrasts with the R package ape 57 (v.5.2) using the species tree ( Fig. 1) pruned to the tribe level. We then calculated Pearson's correlation coefficients for independent contrasts using the function cor.table of the R package picante 58 (v.1.8).

Stable-isotope analysis
To approximate ecology for each species, we measured the stable carbon (C) and nitrogen (N) isotope composition of all available specimens from Lake Tanganyika (n = 2,259). We analysed a small (0.5-1 mg) dried muscle sample of each specimen with a Flash 2000 elemental analyser coupled to a Delta Plus XP continuous-flow isotope ratio mass spectrometer (IRMS) via a Conflo IV interface (Thermo Fisher Scientific). Carbon and nitrogen isotope data were normalized to the VPDB (Vienna Pee Dee Belemnite) and Air-N 2 scales, respectively, using laboratory standards which were calibrated against international standards. Values are reported in standard per-mil notation (‰), and long-term analytical precision was 0.2‰ for δ 13 C values and 0.1‰ for δ 15 N values. Note that we have used some of these stable-isotope values in a previous study 62 .
To confirm interpretability of the δ 13 C and δ 15 N values, we additionally collected and analysed baseline samples covering several trophic levels from the northern and the southern basin of Lake Tanganyika (Supplementary Methods, Supplementary Discussion).
To test for a correlation of ecospace size with species richness of the tribes, we applied the same approach as described above to the δ 13 C and δ 15 N values.

Phenotype-environment association
For each trait (body shape, upper oral jaw, lower pharyngeal jaw) we performed a two-block PLS analysis based on species means of the Procrustes aligned landmark coordinates and the stable C and N isotope compositions using the function two.b.pls in geomorph. To account for phylogenetic dependence of the data we applied a pGLS as implemented in the R package caper 63 (v.1.0.1) across the two sets of PLS scores (each morphological axis and the stable-isotope projection) using the time-calibrated species tree based on the maximum-likelihood topology. The strength of phylogenetic signal in the data was accounted for by optimising the branch length transformation parameter lambda using a maximum-likelihood approach.

Scoring pigmentation patterns
To quantify a putative signalling trait in cichlids, we scored the pigmentation patterns in typically five male specimens per species (n = 1,016), on the basis of standardized images taken in the field after capture of the specimens (see Supplementary Methods). Following the strategy described in Seehausen et al. 64 , the presence or absence of 20 pigmentation features was recorded, whereby we extended number of scored features to include additional body and fin pigmentation patterns (Extended Data Fig. 5c). We then applied a logistic PCA implemented in the R package logisticPCA 65 (v.0.2) and used the PC1 scores as univariate proxy for differentiation along the signalling axes for further analyses.

Trait evolution modelling and disparity estimates
To investigate the temporal dynamics of morphological diversification over the course of the radiation we essentially followed the strategy of Cooney et al. 28 (which is based on measurements on extant taxa and assumes constant niche space and no or constant extinction over the course of the radiation), using the PLS scores of body shape, upper oral jaw morphology, and lower pharyngeal jaw shape and the PC1 scores of pigmentation patterns as well as the time-calibrated maximum-likelihood species tree topology. For each trait we assessed the phylogenetic signal in the data by calculating Pagel's lambda and Blomberg's K with the R package phytools 66 (v.0.6-60). We then tested the fit of four models of trait evolution for each of the four traits. We applied a white noise model, a Brownian motion model, a single-optimum Ornstein-Uhlenbeck model and an early burst model of trait evolution using the function fitContinuous of the R package geiger 67 (v.2.0.6.1). Additionally, we fitted a variable-rates model (a Brownian motion model which allows for rate shift on branches and nodes) using the software BayesTrait (http://www.evolution.rdg.ac.uk/; v.3) with uniform prior distributions adjusted to our dataset (alpha: −1-1, sigma: 0-0.001 for morphometric traits; alpha: 0-10, sigma: 0-10 for pigmentation pattern) and applying single-chain Markov-chain Monte Carlo runs with one billion iterations. We sampled parameters every 100,000th iteration, after a pre-set burnin of 10,000,000 iterations. We then tested for each trait for convergence of the chain using a Cramervon Mises statistic implemented in the R package coda 68 (v.0.19-3). The models were compared by calculating their log-likelihood and Akaike information criterion (AIC) difference (Extended Data Fig. 8d). Based on differences in AIC, the variable-rates model was best supported for all traits but body shape, which showed a strong signal of an early burst of trait evolution (Extended Data Fig. 8d, note that the variable-rates model has the highest log-likelihood for body shape as well). We nevertheless focused on the variable-rates model for further analyses of all traits to be able to compare temporal patterns of trait evolution among the traits.

Article
To estimate morphospace expansion through time we used a maximum-likelihood ancestral-state reconstruction implemented in phytools. To account for differences in the rate of trait evolution along the phylogeny, we reconstructed ancestral states using the mean rate-transformed tree derived from the variable-rates model. We then projected the ancestral states onto the original species tree and calculated the morphospace extent (that is, the range of trait values) in time intervals of 0.15 million years (note that this is an arbitrary value; however, differently sized time intervals had no effect on the interpretation of the results). For each time point we extracted the branches existing at that time and predicted the trait value linearly between nodes. We then compared the resulting morphospace expansion over time relative to a null model of trait evolution. We therefore simulated 500 datasets (PLS and PC1 scores) under Brownian motion given the original species tree with parameters derived from the Brownian motion model fit to the original data. For each simulated dataset we produced morphospace-expansion curves using the same approach as described above. We then compared the slopes of our observed data with each of the null models by calculating the difference of slopes through time (Fig. 3) using linear models fitted for each time interval with the two subsequent time intervals. Note that for body shape we also estimated morphospace expansion through time using the early burst model for ancestral-state reconstruction, which resulted in a very similar pattern of trait diversification.
Unlike other metrics of disparity (for example, variance or mean pairwise distances) morphospace extent is not sensitive to the density distribution of measurements within the morphospace and captures its full range 69 . Hence, comparing the extent of morphospace between observed data and the null model directly unveils the contribution of morphospace expansion relative to the null model; and because the increase in lineages over time is identical in the observed and the simulated data, this comparison also provides an estimate for morphospace packing.
To summarize evolutionary rates we calculated the mean rate of trait evolution inferred by the variable-rates model in the same 0.15 million years intervals along the phylogeny.
To account for phylogenetic uncertainty in the tree topology we repeated the analyses of trait evolution using the time-calibrated trees based on tree topologies estimated with ASTRAL and SNAPP (Extended Data Figs. 3, 4; Supplementary Methods; Supplementary Discussion). Furthermore, to also account for uncertainty in branch lengths, we repeated the analysis on 100 trees from the Bayesian posterior distribution for each of the three trees (Extended Data Fig. 8d, e, results are provided on Dryad).
Further details can be found in the Supplementary Methods.

Characterization of repeat content
For the repeat content analysis, we randomly selected one de novo genome assembly per species of the radiation (n = 245). We performed a de novo identification of repeat families using RepeatModeler (v. to identify and soft-mask interspersed repeats and low complexity DNA sequences in each assembly. The reported summary statistics were obtained using RepeatMasker's buildSummary.pl script (Fig. 4a, Extended Data Fig. 9a, results per genome are provided on Dryad).

Gene duplication estimates
Per genome, gene duplication events were identified with the structural variant identification pipeline smoove (population calling method; https://github.com/brentp/smoove, docker image cloned 20/12/2018), which builds upon lumpy 70 , svtyper 71 and svtools (https:// github.com/hall-lab/svtools). Variants were called per sample (n = 488 genomes, 246 taxa of the Tanganyika radiation) from the initial mapping files against the Nile tilapia reference genome with the function 'call'. The union of sites across all samples was obtained with the function 'merge', then all samples were genotyped at those sites with the function 'genotype', and depth information was added with --duphold. Genotypes were combined with the function 'paste' and annotated with 'annotate' and the reference genome annotation file. The obtained VCF file was filtered with BCFtools to keep only duplications longer than 1 kb and of high quality (MSHQ >3 or MSHQ = −1, FMT/DHFFC[0] > 1.3, QUAL >100). The resulting file was loaded into R (v.3.6.0) with vcfR 72 (v.1.8.0) and filtered to keep only duplications with less than 20% missing genotypes. Next, we removed duplication events with a length outside 1.5 times the interquartile range above the upper quartile of all duplication length, resulting in a final dataset of 476 duplications (Fig. 4b).

Analyses of selection on coding sequence
To predict genes within the de novo genome assemblies, we used AUGUSTUS 73 (v.3.2.3) with default parameters and 'zebrafish' as species parameter (n = 485 genomes, 245 taxa). For each prediction we inferred orthology to Nile tilapia genes (GCF_001858045.1_ASM185804v2) with GMAP (GMAP-GSNAP 74 ; v.2017-08-15) applying a minimum trimmed coverage of 0.5 and a minimum identity of 0.8. We excluded specimens with less than 18,000 tilapia orthologous genes detected (resulting in n = 471 genomes, 243 taxa). Next, we kept only those tilapia protein coding sequences that had at least one of their exons present in at least 80% of the assemblies (260,335 exons were retained, representing 34,793 protein coding sequences). Based on the Nile tilapia reference genome annotation file, we reconstructed for each assembly the orthologous coding sequences. Missing exon sequences were set to Ns. We then kept a single protein coding sequence per gene (the one being present in the maximum number of species with the highest percentage of sequence length), resulting in 15,294 protein coding sequences. Per gene, a multiple sequence alignment was then produced using MACSE 75 (v.2.01). We calculated for each specimen and each gene the number of synonymous (S) and non-synonymous (N) substitutions by pairwise comparison to the orthologue tilapia sequence using codeml with runmode -2 within PAML 76 (v.4.9e). To obtain an estimate of the genome-wide sequence evolution rate that is independent of filtering thresholds, we calculated the genome-wide dN/dS ratio for each specimen based on the sum of dS and dN across all genes (Fig. 4c, Extended Data Fig. 9b).

Signals of past introgression
We used the f 4 -ratio statistic 34 to assess genomic evidence for interspecific gene exchange. We calculated the f 4 -ratio for all combinations of trios of species on the filtered VCF files using the software Dsuite 77 (v.0.2 r20), with T. sparrmanii as outgroup species (we excluded N. cancellatus as all specimens of this species appeared to be F 1 hybrids; Supplementary Methods). The f 4 -ratio statistic estimates the admixture proportion, that is, the proportion of the genome affected by gene flow. The results presented in this study (Fig. 4e, Extended Data Fig. 10) are based on the 'tree' output of the Dsuite function Dtrios, with each trio arranged according to the species tree on the basis of the maximum-likelihood topology. The per-tribe analyses (Fig. 4e) were based only on comparisons where all species within a trio belong to the same tribe (n = 243 taxa).
In addition to the f 4 -ratio we also identified signals of past introgression among species using a phylogenetic approach by testing for asymmetry in the relationships of species trios in 1,272 local maximum-likelihood trees generated using IQ-TREE (Supplementary Methods; Extended Data Fig. 10).

Heterozygosity
We calculated the number of heterozygous sites per genome (n = 488 genomes, 246 taxa from the Tanganyika radiation) from the VCF files using the BCFtools function stats and then quantified the percentage of heterozygous sites among the number of callable sites per genome (see above) (Fig. 4d).
To explore if the observed levels of heterozygosity per tribe can be explained by the levels of gene flow within tribes we performed coalescent simulations with msprime 78 (v.0.7.4). We simulated genome evolution of all species of the radiation following the time-calibrated species tree (Fig. 1), assuming a generation time of 3 years 79 and a constant effective population size of 20,000 individuals. Species divergences were implemented as mass migration events and introgression within tribes as migration between species pairs with rates set according to their introgression (f 4 -ratio) signals inferred with Dsuite. To convert the f 4 -ratio values into migration rates, we applied a scaling factor of 5 × 10 −6 , which results in a close correspondence in magnitude of the simulated introgression signals to those observed empirically (Fig. 4, Extended Data Fig. 9c). In each of 20 separate simulations, we randomly sampled one pairwise f 4 -ratio value for each pair of species (there are many f 4 ratios per species pair-one for each possible third species added to the test trio; the maximum values per pair are shown in Extended Data Fig. 10). The simulated data consisted of one chromosome of 100 kb (mutation rate: 3.5 × 10 −9 per bp per generation 33 , recombination rate: 2.2 × 10 −8 per bp per generation; see Supplementary Methods). Levels of heterozygosity were calculated for all simulated datasets as described for the empirical data.
To account for between-tribe gene flow we further performed simulations in which migration between tribes was also sampled from the empirical f 4 -ratio distribution. For simplicity in setting up the simulation model, we assume that gene flow between tribes is ongoing until present day, which is clearly an overestimate (see Supplementary  Discussion). Nevertheless, the results of these simulations support our hypothesized scenario, confirming that much of the variation in heterozygosity as well as its correlation with species richness can be explained by the observed levels of gene flow.

Correlation of genome-wide statistics with species richness
We tested for a correlation between tribe means (based on species means) of each genomic summary statistics (transposable element counts, number of gene duplications, genome-wide dN/dS ratio, per-genome heterozygosity, and f 4 -ratio, as well as the heterozygosity and f 4 -ratio statistics derived from simulated genome evolution) and species richness of the tribes, applying the same approach as described above for tests of correlation between morpho-and ecospace size and species richness.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All newly sequenced genomes for this study and their raw reads are available from NCBI under the BioProject accession number PRJNA550295 (https://www.ncbi.nlm.nih.gov/bioproject/). The VCF file, tree files, summary statistics of the assembled genomes and phenotypic datasets generated and analysed during this study are available as downloadable files on Dryad (https://doi.org/10.5061/ dryad.9w0vt4bbf). The Nile tilapia reference genome used is available under RefSeq accession GCF_001858045.1. All X-ray data are available on MorphoSource under the project number P1093. Source data are provided with this paper.

Code availability
Code used to analyse the data is available on GitHub (https://github. com/cichlidx/ronco_et_al), except for analyses where single commands from publicly available software were used and where all settings are fully reported in the Methods and/or Supplementary Methods.  Age of the LT radiation 12  Extended Data Fig. 2 | Time-calibrated species tree of the cichlid adaptive radiation in Lake Tanganyika. The species tree is based on the maximum-likelihood topology estimated with RAxML (Fig. 1) and was time-calibrated using a relaxed-clock model in BEAST2, applied to a selected set of alignments.    Fig. 4 | Alternative time-calibrated species tree of the cichlid adaptive radiation in Lake Tanganyika. The species tree is based on the topology estimated with SNAPP and was time-calibrated using a relaxed-clock model in BEAST2, applied to a selected set of alignments. Fig. 5 | Phenotyping of the specimens. a, Two-dimensional landmarks placed on X-ray images of the specimens. To quantify overall body shape we excluded landmark 16 (to minimise the effect of the orientation of the oral jaw). To analyse upper oral jaw morphology we used landmarks 1, 2, 16 and 21. b, Three-dimensional landmarks used to analyse lower pharyngeal jaw shape on μCT scans of the heads. True landmarks are indicated in red, sliding semi-landmarks are indicated in blue. c, Body regions scored for presence/ absence of pigmentation patterns. Extended Data Fig. 6 | Ecospace and morphospace occupation of the cichlid adaptive radiation in Lake Tanganyika. Scatter plots for each focal tribe (indicated with colours, see Fig. 1 for colour key) against the total eco-and morphospace (grey). Species ranges are indicated with convex hulls. a, Stable N and C isotope compositions (δ 15 N and δ 13 C values). The additional plot shows δ 15 N and δ 13 C values of a baseline dataset which confirms the interpretability of the stable N and C isotope composition in Lake Tanganyika (see Supplementary Methods and Discussion). b, PC1 and PC2 of body shape (for shape changes associated with the PC axes see Fig. 2). The last plot for each trait shows the size of the traitspace per tribe in relation to species numbers (stable isotopes: Pearson's r = 0.88, d.f. = 9, P = 0.0004; body shape: Pearson's r = 0.91, d.f. = 9, P = 0.0001). Traitspace size was calculated as the square root of the convex hull area spanned by species means.  Last updated by author(s): Aug 7, 2020 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Data Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability All newly sequenced genomes for this study and their raw reads are available from NCBI under the BioProject accession number PRJNA550295 (https:// www.ncbi.nlm.nih.gov/bioproject/). The VCF file, tree files, summary statistics of the assembled genomes, and phenotypic datasets generated and analysed during this study are available as downloadable files on Dryad (https://doi.org/10.5061/dryad.9w0vt4bbf). The Nile tilapia reference genome used is available under RefSeq accession GCF_001858045.1. All X-ray data are available on MorphoSource under the project number P1093.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. All studies must disclose on these points even when the disclosure is negative.

Study description
For the purpose of a comprehensive exploration of the evolution of cichlid fishes in Lake Tanganyika, we collected ten specimens of nearly all cichlid species occurring in that lake; sequenced the genome of one male and one female specimen per species (plus one genome of some outgroups and riverine sister taxa); assessed eco-morphological divergence by quantifying body shape (10 per species), oral jaw morphology (10 per species), lower pharyngeal jaw shape (5 per species) and stable carbon and nitrogen isotope compositions (10 per species); and quantified divergence in pigmentation patterns (5 per species).

Research sample
Our set of samples consists of ten specimens of nearly all cichlid fish species occurring in Lake Tanganyika, a set of selected outgroup species and a set of riverine species nested within the radiation. This sample was selected to maximally represent the cichlid fauna in the Lake Tanganyika drainage and the phylogenetic spectrum of East African cichlids. A comprehensive list of taxa (n=297) and specimens (n= 2'723; typically 5 males and 5 females per species) including information on the sex of the specimens is provided as Supplementary Tables 1 and 2. The ages are unknown for all specimens, but all specimens were adults. No manipulations were performed.

Sampling strategy
We collected specimens of cichlid fishes at African Lake Tanganyika that were either caught with barrier nets while snorkeling or Scuba diving, or purchased from local fishermen. After euthanasia with clove oil, we measured, weighted and photographed each specimen and took a fin clip for later DNA extraction. Specimen were formalin fixed and in a standardized way. Sampling was performed under research permits issued by the relevant authorities in the Republic of Burundi, the United Republic of Tanzania, and the Republic of Zambia.
To maximize taxon sampling we included additional specimens from previous expeditions (4.9% of the samples) as well as from other collections (0.8%). The final dataset (297 taxa; n = 2'723 specimens) contained an almost complete taxon sampling of the cichlid fauna of Lake Tanganyika, 18 non-Tanganyikan cichlids nested within the radiation, and 28 outgroup species (see Supplementary  Tables 1 and 2 for details).
No sample size calculations were performed a priori. We sampled 10 adult specimens per species, which is sufficient to quantify ecomorphological disparity and estimate representative species means for comparative analyses. For genome sequencing we selected, whenever available, one male and one female individual per species to have both sexes represented.

Data collection
Digitalisation of Landmarks for body shape and upper oral jaw morphology: Data recorded by Fabrizia Ronco using the Software FIJI (v2.0.0-rc-68/1.521i) based on X-ray Images of the specimens.
Digitalisation of Landmarks for lower pharyngeal jaw morphology: Data recorded by Fabrizia Ronco using the Software TINA (v.6.0) based on CT-scans of the specimens.
Scoring pigmentation pattern: Data recorded by Walter Salzburger, scored by eye based on photographs of the specimens. Timing and spatial scale Sampling was conducted between 2014 and 2017 at 130 locations around Lake Tanganyika, followed by sample processing in Basel and Oslo, which required the following work packages and durations: DNA extraction and genome sequencing: April 2014 -February 2017 (Basel and Oslo).