How sexual selection can drive the evolution of costly sperm ornamentation

Post-copulatory sexual selection (PSS), fuelled by female promiscuity, is credited with the rapid evolution of sperm quality traits across diverse taxa. Yet, our understanding of the adaptive significance of sperm ornaments and the cryptic female preferences driving their evolution is extremely limited. Here we review the evolutionary allometry of exaggerated sexual traits (for example, antlers, horns, tail feathers, mandibles and dewlaps), show that the giant sperm of some Drosophila species are possibly the most extreme ornaments in all of nature and demonstrate how their existence challenges theories explaining the intensity of sexual selection, mating-system evolution and the fundamental nature of sex differences. We also combine quantitative genetic analyses of interacting sex-specific traits in D. melanogaster with comparative analyses of the condition dependence of male and female reproductive potential across species with varying ornament size to reveal complex dynamics that may underlie sperm-length evolution. Our results suggest that producing few gigantic sperm evolved by (1) Fisherian runaway selection mediated by genetic correlations between sperm length, the female preference for long sperm and female mating frequency, and (2) longer sperm increasing the indirect benefits to females. Our results also suggest that the developmental integration of sperm quality and quantity renders post-copulatory sexual selection on ejaculates unlikely to treat male–male competition and female choice as discrete processes.

sperm competition 15 and cryptic female choice 2 . The best-known adaptation to post-copulatory sexual selection (PSS) is the production of copious sperm. More sperm should nearly always enhance competitive fertilization success, thus explaining the widespread positive correlation between relative testis size and sperm competition risk 15 . Taxa with this adaptation will tend to exhibit positive covariation between the strength of PSS and sexual disparity in reproductive potential, similar to the pattern for premating sexual selection.
A theoretical conundrum arises, however, when considering that PSS also selects for longer sperm in Drosophila 3,16-18 and numerous other taxa 1 . Because sperm length competes locally for resources with sperm number owing to their spatial and temporal co-occurrence within the developmental environment of the testes, the two traits are relatively constrained to evolutionarily trade off against one another 19 . Across Drosophila species, sperm length displays a strong negative correlation with both the number of sperm manufactured (slope = −0.97, R² = 0.55) and ejaculated (slope = −1.56, R² = 0.90) 20 . Consequently, species with gigantic sperm (and particularly intense PSS) exhibit the least sex difference in reproductive potential 4 . For example, D. bifurca has 5.8-cm-long sperm, and only a few times more sperm than eggs are produced in the population 4 . Because sexual selection theory predicts the weakest sexual selection for such species (see above), this phenomenon was coined the 'big-sperm paradox' 4 .
To better characterize this paradox, we first examined the evolutionary allometry of sperm length and egg volume across all Drosophila species that had reports for both traits in the literature (n = 46 species; Extended Data Table 2 and Extended Data Fig. 1) using phylogenetic reduced major-axis (RMA) regressions. The slope of the sperm-length allometry was 5.52 (Fig. 1a; P < 0.0001, λ = 1.0), which is approximately twofold greater than slopes for nearly all other sexually selected traits previously studied (Fig. 2; Supplementary Tables 1-3; Extended Data Figs 2 and 3). In sharp contrast, linearized egg size was negatively allometric, albeit not significantly so (Fig. 1b; slope = 0.84, P = 0.19, λ = 1.00). We further examined all available data on ovariole number for this set of species as an index of the number of eggs produced 21 and found it to exhibit positive allometry (n = 35, slope = 2.63, P < 0.0001, λ = 0.99). Finally, egg volume declined as ovariole number increased in a phylogenetic regression controlled for body size (n = 35, r = −0.69, P < 0.0001; thorax length: r = 0.77, P < 0.0001; λ < 0.0001^(1.00, 0.02)). That larger-bodied species produce fewer, longer sperm, yet more eggs, reinforces the big-sperm paradox by further limiting the number of sperm competing for each egg 4 and hence the predicted intensity of PSS on sperm quality 9 . Bjork and Pitnick 4 showed that, contrary to theoretical prediction, the 'opportunity for sexual selection', which is the standardized intra-sexual variance in the number of offspring produced and expresses the maximum potential strength of sexual selection 22 , did not decline with increasing sperm length. Moreover, the female-specific opportunity for sexual selection increased with sperm length (R² = 0.994) 4 . However, Bjork and Pitnick 4 were unable to explain these patterns despite the ratio of sperm to eggs approaching parity.
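As a rough illustration of what a reduced major-axis slope measures, the sketch below computes an ordinary (non-phylogenetic) RMA slope as the ratio of the two standard deviations, signed by the covariance. It is not the phylogenetic analysis reported here: the function name `rma_slope` and the toy numbers are hypothetical, and log-transforming both traits beforehand is an assumption.

```python
import math
from statistics import mean, stdev


def rma_slope(x, y):
    """Ordinary reduced major-axis slope and intercept of y on x.

    The slope is sd(y) / sd(x), carrying the sign of the covariance.
    No phylogenetic correction is applied in this sketch.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = math.copysign(stdev(y) / stdev(x), cov)
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical log10 body size and log10 trait size, not data from the study.
log_body = [0.00, 0.10, 0.20, 0.30, 0.40]
log_trait = [0.05, 0.52, 1.10, 1.58, 2.06]
slope, intercept = rma_slope(log_body, log_trait)
print(f"RMA slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Once both traits share the same dimensionality, a slope above 1 indicates positive allometry and a slope below 1 negative allometry.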
Achieving a resolution to the big-sperm paradox requires explaining the mechanism(s) by which a stronger female preference compensates for the theoretically predicted (but not realized 4 ) intrinsic decline in the strength of PSS resulting from reduced sperm numbers with increasing investment per sperm. A resolution should also discern how females benefit from their preference for longer sperm. The length of the female's primary sperm-storage organ, the seminal receptacle (SR), co-diversifies with sperm length in Drosophila 23 and numerous other taxa 1 and has been demonstrated to be the proximate basis of a cryptic female preference for sperm length. Specifically, longer sperm are superior at displacing, and resisting displacement by, shorter competitor sperm within the SR 3,16-18 , and longer SRs drive sperm-length evolution by enhancing this competitive advantage 3 . Because there are substantive developmental and longevity costs associated with longer SRs 18 , SR length is more likely to evolutionarily increase if these costs are compensated for by direct and/or indirect benefits accrued by biasing fertilization in favour of longer sperm. Although Drosophila sperm have been shown to contribute no direct benefits to the female or her offspring 24,25 , indirect benefits postulated to explain the evolution of premating female preferences may similarly explain cryptic postmating female preferences 2 .
We first investigated whether Fisherian runaway sexual selection could provide a countervailing mechanism for the intrinsic decline in the strength of selection predicted to accompany increases in sperm length. We conducted an intraspecific test of an essential prediction of this hypothesis (a positive genetic correlation between SR and sperm length) using a well-replicated diallel breeding design between ten D. melanogaster isogenic lines and evaluating the genetic architecture underlying trait variation (see Methods and also ref. 26). We found a highly significant, positive genetic correlation between sperm and SR length (Table 1), which would theoretically serve to drive sperm-length evolution as SR length evolves (and vice versa). Importantly, increases in SR length would further intensify directional selection on sperm length, as SR length was negatively genetically correlated with female remating interval and positively correlated with the time interval between insemination and active female ejection of excess last-male and displaced resident sperm from the reproductive tract (Table 1). Faster remating enhances PSS, and later sperm ejection prolongs direct competition between sperm for limited storage space and affords longer sperm greater opportunity to exert their superior competitiveness 26 (also note the positive genetic correlation between SR length and the proportion of resident sperm displaced; Table 1).
We next explored the potential for females to accrue indirect (genetic) benefits by virtue of sperm length serving as a reliable indicator of male quality. We compared D. melanogaster reared in benign and stressful developmental environments within a quantitative genetic framework to assess the sensitivity of sperm length to the nutritional history and the physiological condition of males 11-13 . Sperm length was highly heritable (Table 1) but not condition-dependent (linear mixed-effects model controlling for genetic background of 45 nuclear genotypes: t = −0.57, P = 0.58; Extended Data Fig. 4). At face value, this result refutes all indicator models as an explanation for SR-length evolution. Nevertheless, because of the strong negative evolutionary relationship between sperm length and number in Drosophila 20 , sperm-length evolution may be mediated by its influence on the condition dependence of sperm number. We thus investigated seven Drosophila species varying in body sizes, sperm lengths and egg volumes (Extended Data Table 2; Extended Data Fig. 5). Rearing each under varying larval densities, we produced a range of adult body sizes as a proxy for condition 12,13,27 , as previous studies employing a similar approach with Drosophila have demonstrated positive associations between male body size and fitness 28 . These adults were assayed for reproductive potential with no reproductive competition and ad libitum access to mates, food and oviposition substrate. We then examined the strength and slope of the within-species, sex-specific relationships between body condition and reproductive potential (see Methods) to test the prediction that male reproductive potential becomes increasingly condition-dependent as sperm length increases.
Male reproductive potential increased with condition in all species (Extended Data Fig. 6a-g), although not significantly so in D. arizonae with the shortest sperm (Extended Data Fig. 6a; r = 0.36, P = 0.11; all other species: r ≥ 0.49, P ≤ 0.01; Extended Data Table 3 and Extended Data Fig. 5a, c). Drosophila bifurca, with the longest sperm, exhibited the strongest relationship (r = 0.93, P < 0.0001; Table 3). Note that D. arizonae (Extended Data Fig. 6h) has the smallest eggs and D. hydei (Extended Data Fig. 6m) has medium-sized eggs; D. melanogaster showed the strongest relationship (Extended Data Fig. 6i), also with medium-sized eggs (Extended Data Table 2). Next, we combined these intraspecific relationships for all seven species into comparative analyses to determine how much of the among-species variation in the condition dependence of sex-specific reproductive potential is explained by variation in gamete size (Fig. 3). In phylogenetic regressions, the male reproductive potential became increasingly condition-dependent as sperm length increased (r = 0.82, P = 0.02, λ < 0.0001^(1.0, 0.04); Fig. 3a), with the standardized slopes also becoming steeper (r = 0.94, P = 0.002, λ = 1.0^(0.09, 1.00); Fig. 3b). Hence, males of any condition can produce and inseminate many 'cheap' sperm, but only high-quality males have the available resources to produce abundant 'expensive' sperm. In striking contrast, producing larger eggs did not increase the condition dependence of the reproductive potential in females (r = 0.51, P = 0.24, λ < 0.0001^(1.0, 0.17); Fig. 3c), nor did the intraspecific slopes become steeper as egg volume increased (r = 0.66, P = 0.11, λ < 0.0001^(1.0, 0.11); Fig. 3d). Hence, investment per gamete underlies interspecific variation in the condition dependence of reproductive potential for males but not females.
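The among-species comparison above rests on intraspecific correlation coefficients that, according to the Methods, were converted with Fisher's transformation and weighted by sample size. A minimal sketch of that conversion follows. The exact weighting scheme is not spelled out in the text, so the weight n − 3 (the inverse sampling variance of Fisher's z) is an assumption, as are the function names and the sample sizes in the example.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def z_weight(n):
    """Inverse sampling variance of Fisher's z, n - 3 (assumed weighting)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    return n - 3


# r values are quoted in the text; the sample sizes are hypothetical.
for species, r, n in [("D. arizonae", 0.36, 20), ("D. bifurca", 0.93, 20)]:
    print(species, round(fisher_z(r), 3), z_weight(n))
```

The transformation stretches correlations near ±1, so a difference between r = 0.93 and r = 0.36 becomes larger on the z scale than on the raw scale.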
Our findings offer a possible resolution to the big-sperm paradox by revealing an interacting combination of trait covariance and mating-system characteristics antithetical to the weakening of the sexual selection intensity as sperm length increases. Given the substantial costs of producing long sperm 20,29 , it is unclear how this trait has evaded the theoretically predicted development of condition dependence found for other costly sexual characters 13 . Nevertheless, the intimate developmental association between sperm length and number renders the latter trait a surrogate indicator of correlated condition. Smaller (poor-quality) males pay higher costs for the same increase in trait size 11,30 , making the production of plentiful long sperm an intrinsically 'unfakeable' trait. Females of species with longer SRs remate more frequently, owing to both a negative genetic correlation between the two traits and faster sperm depletion when receiving smaller ejaculates. In D. bifurca and other species with very long sperm, females typically mate with several males each day 4 , which may explain the previously observed, strong positive relationship between sperm length and the female-specific opportunity for sexual selection 4 . What is perhaps most critical to our understanding of sperm-length evolution is that only males in good condition can produce sufficient sperm to capitalize on the increased mating opportunities, with females consequently receiving indirect genetic benefits. These results reveal a novel component to our understanding of the operation of sexual selection: the intensity of selection on female preferences can remain strong owing to within-population variance in male reproductive potential, even when sex-specific mean reproductive potentials and the operational sex ratio approach unity.
Experimental manipulation of sperm length and number in D. melanogaster previously showed that both traits contribute to competitive fertilization success, with the relative fitness contribution of sperm length increasing as sperm numbers decreased 16 . Here we further demonstrate the non-independence of selection on sperm quantity and quality, and hence the false dichotomy of sperm competition and cryptic female choice as forces shaping the evolution of sperm form. For many species, what may matter most in PSS is not simply transferring the most sperm or the best sperm, but rather the greatest number of sperm that are designed to survive and compete best given the specific female reproductive environment.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

METHODS
No statistical methods were used to predetermine sample size. Flies were randomly assigned to experimental treatments. All measurements and counts were conducted blind to treatment and to values of other traits and outcomes in mating experiments. Experimental material. Condition dependence of sex-specific reproductive potential was assayed using strains of D. eohydei Quantitative genetic analyses of sperm and female reproductive tract morphology, sperm handling and sperm competition outcomes were performed with genetically transformed LHm populations of D. melanogaster that express a protamine labelled with either green (GFP) or red fluorescent protein (RFP) in sperm heads 31 . All experimental flies derived from isogenic lines 32 ('isolines') of the respective GFP and RFP populations, following 15 generations of full-sibling inbreeding (theoretical inbreeding coefficient = 0.96) 33 . Evolutionary allometry of sperm and egg size. Sperm length, egg volume, ovariole number and the sex-specific thorax length data for 46 species were obtained from the literature 21,29,34 , with novel data (except ovariole number) obtained for ten additional species using identical methods (Extended Data Table 2). Drosophila ficusphila was excluded from the analyses including ovariole number due to being an extreme outlier (13 compared to 22.6-52.86 ovarioles in all other species; Extended Data Table 2). Evolutionary allometry of exaggerated, sexually selected traits from different taxa. For comparison of the allometric slope of Drosophila sperm with slopes of other sexual traits that are widely considered to be exaggerated due to intra- or intersexual selection, we obtained interspecific allometric slopes or comparative data sets permitting such analyses from the literature for a range of classic examples 14,35-38 (Extended Data Table 1; Supplementary Tables 1-3).
Reported allometric slopes were not usually controlled for phylogeny and could not always be reanalysed because data sets were not provided, but where possible, we reanalysed them by incorporation of a molecular phylogeny (Extended Data Figs 2 and 3; Supplementary Table 3). Since all phylogenies were reconstructed from published figures without branch length information or were combined from different molecular trees, we used equal branch lengths in all taxa. Based on slope comparisons with and without phylogenetic control, however, the lack of such control did not have a major impact on the interspecific slopes. Within these constraints, precise slope estimates should be used with care. Condition dependence of sperm length. Using the same isolines as the quantitative genetic analyses (see below) but in a half-diallel instead of diallel cross design (that is, n = 45), 40 newly-hatched larvae of each cross were transferred to a rearing vial with regular fly medium (see above) and another 40 larvae to a vial with 75% less yeast in the medium and only half the amount of medium in the vial. Larvae were randomly assigned to rearing treatments. Following development under these benign and moderately stressful conditions, respectively, five random males of each cross and rearing treatment were aged for at least a week before measuring their thorax length and the length of five sperm per male. Condition dependence of reproductive potential. For all seven species, variation in body size was generated by transferring first-instar larvae randomly to culture vials at three different densities: 25, 75, and 150 larvae per 8-dram vial containing 8 ml of medium. Virgin flies were then collected on the day of eclosion and thorax length, a reliable index of total dry mass 39 , was recorded. 
Focal males and females were selected to represent the entire size distribution, with each fly then isolated within a vial containing medium and live yeast and transferred to a fresh vial every three days until reaching two days post-reproductive maturity, the age of which varies between sexes and among species 29 . All virgin males and females used as mates of focal flies were derived from population bottles.
The reproductive potential of each focal male (n = 15-27 per species) was assayed by placing it with eight randomly assigned virgin females in a plastic 200 ml bottle that was inverted over a small Petri dish containing medium and live yeast. Every 24 h, across four successive days, the male was removed and transferred to a new bottle containing eight virgin females. Because males could exhibit size-related variation in the number of mature sperm stored in the seminal vesicles at the start of the experiment, the eight females from day 1 were discarded. The 24 females from days 2-4 were provided with fresh oviposition plates daily until the production of offspring ceased (that is, no eggs hatched). Oviposition plates were stored at 25 °C and the number of larvae hatching on each plate was counted after 48 h. All larvae produced by the 24 females exposed to each male were summed as a measure of that male's reproductive potential.
Female reproductive potential was assayed in a manner similar to males, except that each focal female (n = 25-36 per species) was placed with three randomly assigned virgin males in a vial containing medium and live yeast. Each focal female was transferred to a fresh vial with three new virgin males every 24 h across four successive days. The day 1 vial was discarded to control for variation among females in the number of mature oocytes at the start of the experiment. All eggs laid by each female from days 2-4 were summed as a measure of that female's reproductive potential. Quantitative genetic analyses of female preference, male ornament and associated characters. To vary the female genetic background, single pairs of virgin males and females of ten different RFP isolines were crossed in all non-self combinations (that is, 90 diallel crosses with 45 different nuclear genotypes, all independent of the RFP standard competitor male 26 ). In each of two blocks separated by two generations, we assayed three random F1 females from each of three separate male-female pairs per cross (that is, 90 crosses × 2 blocks × 3 families × 3 females = 1,620 females). All virgin flies were aged for three days before their first mating. All experimental males were F1 progeny from crosses among a single pair of isolines with either GFP- or RFP-tagged sperm.
Using a double-mating design, reproductive outcomes were quantified immediately after female sperm ejection (that is, <5 h after mating and before the first egg has entered the bursa for fertilization) following the second mating, which we have shown repeatedly to directly predict paternity shares among competing males over the three subsequent days of oviposition 17,26,31 . Each female was mated with a virgin GFP male and, two days later, with a virgin RFP male, with additional 6-h remating opportunities on days 3-4 for any refractory females. Each male was used for only one mating. Following all matings with a second male, we used established protocols to quantify (i) copulation duration, (ii) the number of resident first-male sperm at the time of remating, (iii) time until female ejection of excess second-male and displaced first-male sperm, (iv) the number of displaced first-male sperm, the number of second-male sperm (v) transferred and (vi) ejected, (vii) the proportion of each male's sperm ejected, (viii) the distribution of both competitors' sperm, respectively, across the different organs of the female reproductive tract (that is, bursa copulatrix, SR, and paired spermathecae) and (ix) the proportional representation of sperm derived from the first (S 1 ) or second male (S 2 ) in each respective location (for example, the SR, which is the primary source of sperm for fertilization 31 ) and in the entirety of the female reproductive tract. For one random female of each family (that is, six females per cross), we additionally measured the length of the thorax and the SR 17,26,31 . Statistical analyses. All analyses were performed using the statistical package R version 3.0.2 (R Development Core Team 2013) and SAS v9.3 (SAS Institute 2011). Evolutionary allometry of sperm and eggs. We used phylogenetically controlled reduced major-axis regressions (phyl.RMA in R package phytools). For these analyses, additional species (that is, D. mettleri, D. pachea, D. 
subpalustris, D. rhopaloa and D. suzukii) were added to the van der Linde et al. 40 phylogeny based on other molecular phylogenies 29,41 (Extended Data Fig. 1). We linearized egg volume by the cube root for consistent dimensionality with female thorax length and sperm length 22 . For comparison, however, we also used egg length, the allometric slope of which was identical to linearized egg volume up to the third decimal point (b = 0.836 compared to 0.835). Evolutionary allometry of exaggerated, sexually selected traits from different taxa. Wherever data and corresponding phylogenies were available, we analysed them using phyl.RMA as for Drosophila gametes. For direct comparison between taxa and/or traits, we adjusted all data to equal dimensionality (that is, cube-rooting mass variables or square-rooting area variables) to ensure that isometry was at a slope of 1. All analyses were confirmed to exhibit a significant association between the two traits compared in phylogenetic least-squares regressions before calculating phylogenetic RMA slopes. Condition dependence of sperm length. Treatment effects on sperm length were analysed in linear mixed-effects models controlling for the genetic background of sires and dams and their interaction as random effects. For comparison, we repeated these analyses on the thorax length of the same males. Condition dependence of reproductive potential. For each of the seven species, regression analyses were used to examine the relationship between either the total number of progeny produced and male size (that is, thorax length) or the total number of eggs laid and female size. For these relationships, we calculated the intraspecific correlation coefficients, r, which represent their strength and direction, as well as the standardized slopes, for use in subsequent comparative analyses. 
A Bartlett's test of homogeneity of variances confirmed no differences among the seven species in the coefficient of thorax length for males (K² = 9.92, P = 0.13). Although there was a marginally significant difference for females (K² = 12.67, P = 0.05), this was primarily attributable to a greater standard deviation in female thorax length in D. hydei (Extended Data Fig. 7; a Bartlett's test revealed no significant difference among the remaining species when D. hydei was excluded: n = 6, K² = 4.38, P = 0.50).
To compare the degree of intraspecific condition dependence among species, we converted the correlation coefficients, r, of the intraspecific regressions using Fisher's transformation and weighted them by sample size to obtain a weighted Z_r for each species 42 . Comparative relationships between weighted Z_r values and the species-specific means of sperm length (for males) and egg volume (for females), respectively, were then examined. These among-species relationships, as well as those of the standardized slopes, were examined using phylogenetic generalized least-squares (PGLS) regressions 43 to account for statistical non-independence of data points due to shared ancestry of species, based on the same molecular phylogeny as in the allometric relationships above 40 . Using maximum-likelihood methods, PGLS models estimate the phylogenetic scaling parameter Pagel's λ to evaluate the phylogenetic relationship of the covariance in the residuals 43 . We used likelihood ratio tests to establish whether the models with the maximum-likelihood value of λ differed from models with values of λ = 0 or λ = 1, respectively, with λ close to 0 indicating phylogenetic independence and λ close to 1 indicating a strong phylogenetic association of the traits 43 . Quantitative genetic analyses of female preference, male ornament and associated characters. The genetic architecture underlying each trait was evaluated by using the 'animal model' and a resampling approach to estimate the variance components 44,45 . Means of each of the six families per isoline cross, rather than individual flies, represented our sample size in order to minimize missing data and because, for some traits such as SR and thorax length, we had only one measure per family 26 . We resampled with replacement among the three family means per isoline cross and block using the SURVEYSELECT procedure in SAS v9.3 (SAS Institute 2011) and calculated their mean for each of 1,000 resampling replicates.
For each replicate data set, we then fitted a generalized linear mixed model (procedure GLIMMIX) to these mean values, with block as a fixed effect, paternal and maternal lines and their interactions as random effects, and a multimember effect defining the nuclear parental contributions. This model is an incomplete diallel with reciprocal but no self crosses 44,45 : in the diallel analysis it is assumed that the nuclear contributions (N) of the males and females are drawn from the same distribution.
The model decomposed for each replicate the total phenotypic variance into different genetic and residual contributions 44,45 :
$$Y_{ijk} = \mu + N_i + N_j + T_{ij} + M_j + P_i + K_{ij} + R_{k(ij)}$$
where $Y_{ijk}$ is the trait of the kth replicate cross between isoline i sires and isoline j dams, and $\mu$ is the trait mean of the population. $N_i$ and $N_j$ represent the additive contributions by nuclear genes of the respective parental isolines, independent of sex; $T_{ij}$ is the interaction between the haploid nuclear contributions; $M_j$ represents the maternal genetic and environmental effects of isoline j and $P_i$ the paternal genetic and environmental effects of isoline i; $K_{ij}$ reflects the interaction between maternal and paternal contributions; and $R_{k(ij)}$ is the effect of the kth replicate cross within each combination of dam × sire lines 46,47 . Means and standard errors of these variance components across all replicate data sets were then bootstrapped and their statistical significance was determined by testing their z scores (that is, variance component divided by its bootstrapped standard error) against the corresponding significance levels from a standard normal probability table. We used one-tailed significance levels under the a priori constraint that variances are means of squared values, which therefore necessarily have a positive sign.
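The resampling and z-score steps described above can be sketched as follows. This is only an illustration under stated assumptions: the mixed model fitted to each replicate is not reproduced, the helper names (`resample_means`, `z_test`, `one_tailed_p`) and all numbers are hypothetical, and the standard deviation of the replicate estimates is taken as the bootstrapped standard error.

```python
import math
import random
from statistics import mean, stdev


def resample_means(values, n_replicates=1000, seed=1):
    """Mean of values resampled with replacement, once per replicate."""
    rng = random.Random(seed)
    return [mean(rng.choices(values, k=len(values)))
            for _ in range(n_replicates)]


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_test(replicate_estimates):
    """Estimate, bootstrapped SE, z score and one-tailed P value."""
    est = mean(replicate_estimates)
    se = stdev(replicate_estimates)  # assumed to serve as the bootstrap SE
    if se == 0:
        raise ValueError("replicate estimates show no variation")
    z = est / se
    return est, se, z, one_tailed_p(z)


# Hypothetical family means for one isoline cross and block.
family_means = [1.92, 2.05, 1.98]
replicate_means = resample_means(family_means)

# Hypothetical variance-component estimates from replicate model fits.
variance_estimates = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]
est, se, z, p = z_test(variance_estimates)
print(f"estimate = {est:.2f}, SE = {se:.2f}, z = {z:.2f}, one-tailed P = {p:.4f}")
```

The one-tailed test reflects the constraint stated in the text that a variance cannot be negative, so only the upper tail of the normal distribution is relevant.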
In the present study, we used only the additive nuclear variance components, $\sigma^2_n$, which were necessary to calculate the heritability of, and genetic correlations between, traits of interest. Based on the estimates of the variance components from the diallel analysis, the causal component of the additive nuclear variance, $V_A$, was estimated as $V_A = 4\sigma^2_n/(1 + f)$, where f is the theoretical inbreeding coefficient (f = 0.96 based on 15 generations of full-sibling inbreeding 37 ). The additive-by-additive epistatic variance was ignored under the assumption that such higher-order variance is generally very small 45,48 . Mean values calculated in the above resampling procedure were used to estimate the variances and covariances based on separate univariate analyses of traits $x_1$ and $x_2$, and $x_1 + x_2$, resulting in covariances as $\mathrm{cov}(x_1, x_2) = [\mathrm{var}(x_1 + x_2) - \mathrm{var}(x_1) - \mathrm{var}(x_2)]/2$. We then calculated the corresponding genetic correlations as $r_A = \mathrm{cov}(x_1, x_2)/\sqrt{\mathrm{var}(x_1) \times \mathrm{var}(x_2)}$ for each of the 1,000 replicates 49 , bootstrapped the genetic correlation coefficient and its standard error, and tested for statistical significance by comparing the z scores to two-tailed significance levels derived from a standard normal distribution 50 .
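The arithmetic in this paragraph can be checked with a short sketch. The function names and example numbers are hypothetical. The inbreeding coefficient is computed with the standard textbook recursion for full-sib mating, which is an assumption about how the quoted value of 0.96 was obtained. The correlation uses the usual definition, with the square root of the product of the two variances in the denominator.

```python
import math


def full_sib_inbreeding(generations):
    """Inbreeding coefficient after t generations of full-sib mating,
    using F_t = (1 + 2 F_{t-1} + F_{t-2}) / 4 from an outbred base."""
    f_prev, f_curr = 0.0, 0.0
    for _ in range(generations):
        f_prev, f_curr = f_curr, (1.0 + 2.0 * f_curr + f_prev) / 4.0
    return f_curr


def additive_variance(sigma2_n, f):
    """V_A = 4 * sigma2_n / (1 + f), as given in the text."""
    return 4.0 * sigma2_n / (1.0 + f)


def covariance_from_sum(var_x1, var_x2, var_sum):
    """cov(x1, x2) = [var(x1 + x2) - var(x1) - var(x2)] / 2."""
    return (var_sum - var_x1 - var_x2) / 2.0


def genetic_correlation(cov, var_x1, var_x2):
    """r_A = cov(x1, x2) / sqrt(var(x1) * var(x2))."""
    return cov / math.sqrt(var_x1 * var_x2)


f = full_sib_inbreeding(15)
print(f"f after 15 generations = {f:.4f}")

# Hypothetical variance components for two traits and their sum.
var_x1, var_x2, var_sum = 4.0, 9.0, 19.0
cov = covariance_from_sum(var_x1, var_x2, var_sum)
print(f"cov = {cov:.2f}, r_A = {genetic_correlation(cov, var_x1, var_x2):.2f}")
```

Deriving the covariance from the variance of the summed trait means that only univariate variance estimates are needed, which matches the separate univariate analyses described above.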