Admixture Mapping Comes of Age ∗

Admixturemappingisbasedonthehypothesisthatdifferencesindisease rates between populations are due in part to frequency differences in disease-causing genetic variants. In admixed populations, these genetic variants occur more often on chromosome segments inherited from the ancestral population with the higher disease variant frequency. A genome scan for disease association requires only enough markers to identify the ancestral chromosome segments; for recently admixed populations, such as African Americans, 1,500–2,500 ancestry-informative markers (AIMs) are sufﬁcient. The method was proposed over 50 years ago, but the AIM panels and statistical methods required have only recently become available. Since the ﬁrst admixture scan in 2005, the geneticbasesforarangeofdiseases/traitshavebeenidentiﬁedbyadmix-ture mapping. Here, we provide a historical perspective, review AIM panels and software packages,


INTRODUCTION
A major goal of human genetics is to identify genetic variation predisposing to complex, common human diseases. Genome-wideassociation scans (GWAS) have led to the discovery of thousands of alleles associated with human diseases and traits (31), but GWAS are costly and, because of the large number of single nucleotide polymorphisms (SNPs) analyzed, incur a stiff statistical penalty (61). This is problematic because of the growing realization that most common-risk alleles have small effect sizes while alleles with large effect sizes tend to be much less frequent (37). To have the power to identify small to moderate effect requires thousands of cases and controls ( Table 1). Admixture mapping provides an attractive and more powered alternative to GWAS for gene discovery in admixed populations for a subset of diseases/traits that are differentially distributed in the ancestral (parental) populations (49). The idea is straightforwardthe genetic variation causing the disease/trait of interest will be more frequent on chromosome segments derived from the parental population with the higher disease/trait incidence.
In this review, we provide only a brief overview of the theoretical basis and statistic methods for admixture mapping as these have been expertly treated by others (41,43,69,72,85). Instead, we provide a comprehensive review of admixture mapping software programs, the development of admixture panels, applications, and successes in the 5-year span marking the passage from an elegant theoretical approach to a powerful gene mapping application with several notable successes.

HISTORICAL PERSPECTIVE
The human diaspora that has occurred in the last 400-600 years has resulted in gene flow between previously separated human subpopulations (Figure 1). Meiotic crossover in admixed populations leads to a mosaic of chromosomal segments derived from one or the other ancestral (parental) subpopulation ( Figure  2). The duration, direction, and rate of gene flow between two populations influence the proportion of admixture and the length of chromosomal segments derived from the ancestral populations, which will vary among individuals.
In admixed populations where gene flow commenced within the last several hundred years (e.g., African Americans or Hispanics), linked alleles will show extended linkage disequilibrium (LD) relative to the ancestral populations.
Using the admixture linkage disequilibrium (ALD) generated in recently admixed human populations to assign the traits to linkage groups was first proposed by Rife in 1954 (59), but it took nearly 4 decades for the approach to gain serious attention. In 1988, Chakroborty & Weiss (13) renewed interest in ALD to map genes by positing a classical linkage approach analogous to family or hybrid animal studies that exploits long range LD to limit the number of markers required for genome-wide coverage. This was followed by a flurry of theoretical papers in the 1990s (11,41,60,72); these early pioneers advanced various statistical strategies and methods, all of which were based on the association between a marker allele and trait to assign genes to a linkage group-a method Stephens et al. (72) termed mapping by admixture linkage disequilibrium (MALD). Major migrations and diasporas, 1400-1800, that are sources of important admixed populations for admixture mapping.
In 1998, McKeigue (42) proposed an alternative approach to disease gene localization that tested for the linkage of the disease or trait with parental ancestry at each locus, defined as 0, 1, or 2 allele copies inherited from the ancestral populations. McKeigue named this approach admixture mapping because it is based on the association of local chromosomal ancestry with the disease rather than on LD between the marker and phenotype (Figure 3). It was quickly appreciated that admixture mapping could be applied to case-control studies by comparing locus-specific ancestry at each ancestryinformative marker (AIM) between cases and controls. Hoggart et al. (32) and Montana & Pritchard (45) showed through simulations that the extent of ancestry at each locus could also be compared to genome-wide average ancestry for a case-only study design (32,45). They demonstrated that for rare diseases, an affectedcase-only design is highly efficient and better powered than a case-control design. A statistical deviation in local ancestry away from the genome-wide average could be used to identify a peak region of interest. In studies where both cases and controls are available, most investigators elect to calculate both a case-control statistic quantifying the difference in ancestry between cases and controls, and a case-only statistic comparing the extent of ancestry at each locus to the genome-wide average. Like hybrid or family linkage studies, a moderate number (≈1,500-2,500) of ancestryinformative markers is needed for the initial genome-wide scan, followed by fine mapping with additional markers to identify the causal allele (49). Further, Patterson has shown that admixture mapping has the power to detect association with relatively modest odds ratios with 2,500 or fewer cases (Figure 4); this offers an advantage over standard GWAS results because it uses far fewer markers and thus has a much more modest correction for multiple comparisons.
The development of the theoretical approach to using ancestry to map genes by McKeigue in 1998 was followed rapidly by the development of statistical methods, panels of AIMs for admixture typing, and software programs, many of which became available by Schematic pattern of chromosomal ancestry resulting from a moderate number (∼8-20) of generations since a two-way admixture event. Starting with the second generation, recombination produces chromosomal blocks of different continental ancestries. The present day admixed population has a varying extent of overall ancestry and has blocks of ancestry that vary in size both because of the random nature of recombination and because the original chromosomes have been subject to recombination for different numbers of generations.
2004. The successful application of admixture mapping rapidly followed the publication of well-designed Latino and African genotyping panels of AIMs (16,50,70). Admixture mapping has come of age and is now being applied to a wide range of traits and diseases for which it is hypothesized that the differences in disease rates across populations are due to populationspecific frequency differences of the causal variant(s).

THE PROCESS OF ADMIXTURE
Gene flow between reproductively isolated populations results in chromosomal admixture with contributions from each contributing ancestral population. The gene flow can be a single event in time or continuous over many generations. The gene flow results in the temporary generation of long haplotype blocks, which includes polymorphic variants, derived from one or the other ancestral population (Figure 2). These blocks of alleles in ALD are extremely extended in the first few generations following introgression, but the LD segments become progressively shorter by recombination with increasing generations. The length of haplotype segments derived from each of the ancestral parent populations is a function of both the number of generations since the initial admixture event and the duration of gene flow.
In recently admixed populations, the extent of ALD is intermediate between ancestral populations in which recombination has occurred over hundreds of thousands of generations and families in which recombination has occurred in only a few generations. The alternative fixation of the FY− allele of the Duffy antigen receptor for chemokines (DARC or FY) in African populations and the FY+ allele in European populations were used to empirically track ALD between the FY locus and neighboring markers in African Americans. ALD was found to extend across a 30-cM region, centered on the FY locus, but was strongest for a flanking interval of 5-10 cM ( Figure 5) (35). These results are in agreement with other studies that indicate similar-sized regions of ALD (49). Using a set of 3,011 AIMs spanning the genomes of African Americans, Patterson and colleagues estimated that strong ALD extends on average for approximately 17 cM (49), indicating an average of six to seven generations of admixture (49,70). These findings of extended ALD in the recently admixed African American population provided empirical support for the practical application of admixture mapping.

ADMIXED POPULATIONS
Although gene flow and resulting admixture have occurred throughout human history, it is the relatively recent gene flow between continental populations that is currently most amenable for admixture mapping. The forced diaspora of Africans to the Americas in the sixteenth century has resulted in two-way admixture between Africans and Europeans in the United States (Figure 1). On average, African Americans have gametes that are approximately 80% African derived and 20% European derived (49,58). Of course, for any single individual, African ancestry may vary from 100% African derived to 1% (44,47,48,51). Admixture has also occurred between the Spanish and Amerindians as a result of the Spanish conquest and colonization of the New World over three centuries, beginning with the 1492 discovery of the New World. In a survey of ancestry estimates of Latino populations from California (Los Angeles), Mexico, Brazil, and Columbia, the Latino populations from Mexico and Los Angeles had approximately 50% European ancestry and 40% North American Amerindian, whereas Latinos from Brazil and Columbia had 71% European ancestry; average African ancestry ranges from approximately 4% in Mexicans to 10-11% in South American Latinos ( Table 2) (53). Other surveys of Latino ancestry indicate considerable heterogeneity among regions, with a range of 33-95% European ancestry, 0-58% Native American, and 0-29% West African; the proportions show regional and geographic variation (8,10,16,36) as well as differences in ancestry associated with socioeconomic status (24).
Unique admixed populations are the sotermed Cape Colored residing in the western Cape of South Africa (50) and the Uyghurs of west China (81,82). The Cape Colored population is genetically heterogeneous, with admixture contributions from the isiXhosa, Eu- Schematic of the pattern of chromosomal admixture around a disease locus. We suppose the disease is inherited from the majority ancestry population (dark green), with the minority ancestry population shown in light green. The graphs show the percentage of ancestry derived from the dark green segment of chromosome. (a) In the region of the disease locus (yellow bar), there is an excess of majority ancestry blocks among cases, revealed as a spike in a graph of average ancestry for cases along the chromosome. The orange bar indicates the location of the disease gene. (b) Among population controls, the distribution of ancestry blocks is random across the chromosome. The spike of ancestry can be quantified either by comparing case ancestry with control ancestry at the same location or by comparing peak case ancestry with average case ancestry across all chromosomes.
ropeans, South Asians, and Indonesians. Since the isiXhosa are themselves admixed, with major ancestry contribution from the Bantu and to a lesser degree from the Bushman (San), the South African Cape Colored are at least five-way admixed (50 Locus where risk is higher in the population contributing more ancestry (e.g., multiple myeloma in African Americans) Locus where risk is higher in the population contributing less ancestry (e.g., testicular cancer in African Americans)

Figure 4
Number of samples required to detect a disease or trait locus with perfect information on ancestry and the same proportions of two-way ancestry in each parent. The sample number needed to detect an association in African Americans is estimated by averaging the power for a given risk model and the percentage of ancestry over the percentages of ancestry seen in African Americans (European ancestry ∼20 ± 12%). In practice, the power is robust for ancestry ranging from 10-90%. From Reference (49).
connecting East Asia with Central Asia and Europe. Studies by Xu & Jin (82) found that the Uyghurs, representing 50% of the population of the Xinjiang Uyghur Autonomous   have significant amounts of European ancestry, estimated at approximately 50%. Assuming a single pulse of admixture, STRUCTURE estimates the Asian-European admixture occurred 2,080-2,720 years ago (104-136 generations), whereas ADMIXMAP dates the event to 1,680-2,400 years ago (84-120 generations). The length of the chromosomal regions derived from East Asian and European populations averages 2.4 cM and 4.1 cM, respectively; therefore, it will take approximately tenfold as many AIMs for genome-wide coverage. Other populations where recent admixture has been documented are the Australian Aboriginals (63) and the Pacific Island populations (e.g., Hawaii, Norfolk Island) (28,29,40). These populations offer unique opportunities to identify genes associated with medical conditions or physiological traits that differ across populations. With increasing knowledge gained through resequencing and high-density genotyping arrays of diverse populations, we anticipate the identification of more admixed populations and finer grained discernment for both interand intracontinental ancestry contributions. To date, high density, genome-wide admixture mapping panels have been constructed only for African Americans, Latino/Hispanics, and Uyghurs (Table 3)-populations with intercontinental ancestry mixtures. Withincontinent admixture can also be exploited for gene mapping; the 1,000 Genomes Project and International HapMap Project will no doubt identify a subset of SNPs that is differentiated among intracontinental populations (e.g., northern and southern Europeans) to construct fine-grained admixture maps for these admixed populations.
Two-way admixture between continental populations has been modeled and applied to admixture studies, but three-way admixture is not uncommon, particularly among Latinos/Hispanics, who may have ancestral contributions from northern, central, or southern Native Americans, Africans, and Europeans. Further, as observed in the Cape Colored population in the western Cape of South Africa, gene flow may occur between continental groups (e.g., African Americans), within continental groups (e.g., isiXhosa in South Africa), or in combination (e.g., Cape Colored ethnic group in South Africa). Current statistical methods and panels for ancestry-informative markers are optimized for two-way admixture between continental groups. However, as discussed below, tools for distinguishing recent and ancient admixture, as well as fine-grained intracontinental admixture, are being developed.

METHODS AND PROGRAMS
Here, we give a brief conceptual review of the general algorithmic approach to inferring chromosomal ancestry and of using this inference to estimate the likelihood that a genomic region is associated with disease ( Table 3). Zhu gives an extensive review of the mathematical issues of admixture mapping (85). The proposal of McKeigue to infer the ancestry of stretches of chromosome was the starting point for practical use of admixture mapping (42). The accuracy of inference of ancestry for a single locus is strictly set by the allele frequency difference between the two ancestral populations; for the extremely rare case of an allele with a fixed difference [fixation index (F st ) = 1] between populations, the allele in an admixed individual must be inherited from the populations carrying that allele. This is key to McKeigue's proposal; however, this inference becomes rapidly less certain as the frequency difference [delta (δ)] becomes smaller. By con-sidering the ancestry of chromosomal segments rather than that of individual loci, one takes advantage of the fact that, as long as the average distance between tested loci is less than the average distance of recombination fragments between the admixed populations, adjacent markers will have a significantly greater-than-chance likelihood of being on the same segment.
This tendency is exploited by the hidden Markov model (HMM) algorithms that form the basis of the ancestry calculations (42,49). Considering a chromosome in an admixed individual, by assumption, each locus is inherited from one of the ancestral groups. However, we actually observe only the allele, not the ancestry. Starting from a particular locus, if we proceed in one direction down the chromosome, we encounter successive tested loci. Between one locus and the next, there may or may not have been a recombination event since the original population admixture events. If there was a recombination event, the ancestry of the loci switches. We assume the chance of crossover between any two loci is independent of the presence or absence of recombination between previous loci (this is not strictly true biologically but is a reasonable approximation). With this assumption, the succession of ancestries along the sequence of loci forms a Markov process; that is, the chance that locus n is of African ancestry depends on whether locus n−1 is African or European, and on the likelihood of recombination between n−1 and n, but not on the ancestry of any of the earlier loci. Because www.annualreviews.org • Admixture Mapping Comes of Age Example of the influence of underlying chromosomal ancestry on observed genotype. For simplicity, we suppose we are viewing a single chromosome (X chromosome or autosomal chromosome with known phase). Observation of the genotype at a locus allows a probabilistic inference of the ancestry of the locus; e.g., for locus n, the observed allele 1 is more likely to have come from an A chromosome than from a B chromosome (here, for simplicity, allele 1 is always the more frequent allele in ancestral population A).
Where recombination has occurred since the admixture event, the chromosomal ancestry switches, so there is a succession of blocks of alternating ancestry. The observed alleles will probabilistically follow the allele frequencies from the underlying ancestral population of that chromosomal block. The task is to use knowledge of the ancestral allele frequencies, proportion of A and B ancestry, and amount of recombination (a function of the genetic distance between the loci and the time since admixture) to infer the succeeding blocks of A and B ancestry from observation of the genotypes.
we cannot actually observe the ancestry of the loci, this is a hidden Markov model. What we actually observe are the genotypes at the loci. Knowing the overall ancestry of the individual, and the likelihood of recombination between successive loci, we may calculate probabilities of different sequences of ancestries of the loci. If we know the ancestral allele frequencies at each locus, we know the probabilities of observing a given allele at each locus, given a specific sequence of ancestries ( Figure 6). It is now a computationally intensive, but tractable, problem to infer the probability of a given sequence of ancestry, given the observed sequence of alleles. The programs diverge in the mathematical approach to this inference of ancestries from genotypes, with one group using a Markov chain Monte Carlo (MCMC) algorithm (22,32,45,49) and a second group using maximum-likelihood inference (Table 3) (73,86). The output of these inferences is an estimate of whether an individual has zero, one, or two chromsomes inherited from a particular ancestry, at a given point on the chromsome (Figure 7).
The common assumption in these programs is that the loci are not in linkage disequilibrium in any of the ancestral populations. This assumption is required if we assume that the probability of a particular allele at locus n is determined totally by the ancestry of that locus. Conversely, if there is LD in an ancestral population between locus n and locus n−1, and if the two loci both have that ancestry, then the allele frequency at locus n is also a function of the allele present at locus n−1, by the definition of LD.

SCORING DISEASE ASSOCIATION
If there is in fact a detectable ancestryassociated disease association due to genetic variation, there will be a peak around the disease locus in the calculated fraction of ancestry from the ancestral population at greater risk for the disease group. A measure of the strength and statistical significance of this peak is needed; available software programs supply several alternative but fairly equivalent measures. For case-only data, all of these measures involve a comparison of the proportion of ancestry from the ancestral population with higher frequency of the risk allele, at a putative disease associated marker, compared with the genome average or average over unrelated sections of the genome. A straightforward example is the Z score statistic used in both ADMIX-PROGRAM and MALDsoft (45,86), which for a case-only design compares the difference Ideal output from a chromosomal ancestry inference program: ancestry for an autosomal chromosome pair from an individual (a) and from a second individual (b). For each point along the chromosome, the program indicates whether 0, 1, or 2 chromosomes carry the specified ancestry (light green). Realistically, programs indicate the probability of carrying 0, 1, or 2 chromosomes from the specified ancestry; in favorable cases, the program predicts the ancestry with near certainty.
between the calculated proportion of ancestry at the marker m and the overall proportion of ancestry, with the variance in the proportion of ancestry: where Z C (m) is the Z score for the marker; p d (m) and p d (unl) are, respectively, the calculated proportions of ancestry from the risk population at the marker and at an unlinked region of the genome; and σ ( p d (m)) is the variance of the calculated proportion of ancestry at the marker. Z scores for case-control designs are calculated in an analogous manner. Alternative approaches (ANCESTRYMAP, ADMIXMAP) calculate a likelihood ratio for the probabilities of the calculated proportion of risk population ancestry at the marker, comparing the probability under the assumption that the marker is linked to a disease locus with the probability under the assumption that the marker is unlinked (32,49).

NEWER SOFTWARE ANALYSIS PROGRAMS
Part of the original motivation for developing admixture mapping was economizing on genotyping; it was not anticipated how rapidly typing ∼1M SNPs per individual would become standard. Thus the methods described above rely on several thousand SNPs, which, to be optimally informative, are chosen to have large frequency differences between the ancestral populations and, for simplicity in the HMM calculation, should not have significant LD between markers in the ancestral populations. However, there are fundamental reasons why typing a dense set of markers, e.g., from a standard GWAS panel, intrinsically provides more information for ancestry determination. Above all, such a panel makes information on short-range haplotypes from the ancestral populations available; these are powerful for ancestry estimation as, although it is extremely rare for an allele of a single polymorphism to be unique to a continental population, it is quite common for a specific haplotype to be unique to a population.

74
Winkler · Nelson · Smith Allowing ancestral LD between markers implies relaxing the Markov model assumption, since with LD the genotypes within a single inherited block follow their own Markov model; thus the genotype probabilities are a function of the previous genotype as well as of the hidden Markov model. Thus such approaches have been referred to as Markov hidden Markov Models (MHMMs) (85). These algorithms must deal with the substantially increased complexity of considering the multiple haplotype states within each ancestral population in addition to the multiple ancestral populations. As of this writing, to our knowledge, none of these programs (listed in Table 3) include a disease-association calculation; rather, the programs output estimated ancestral population frequencies across the genome, which can be applied by the user to association analysis.

ANCESTRY-INFORMATIVE MARKERS FOR ADMIXTURE MAPPING
Ancestry informative markers are genetic polymorphisms that differ in allele frequencies between the ancestral populations. Although any marker can be used [e.g., single tandem repeats (STRs), single nucleotide polymorphisms (SNPs)], the most often employed are biallelic SNPs; SNPs are abundant, evenly spaced across the genome, and readily genotyped. Table 4 summarizes genome-wide AIM panels informative for ancestry between continental populations. Admixture panels comprise markers with high information content for ancestry determination that are evenly spaced for genome-wide coverage. The optimal density of the panel is determined by the length of the ALD blocks, which is in part determined by the number of generations since the admixture event-increasing generations lead to a decay of ALD and restoration of linkage equilibrium that result in shorter ALD blocks. Shorter ALD blocks or more complex ancestry (e.g., three-or four-way admixture ancestry) will require a higher density of markers to differentiate chromosome ancestry transition due to mei- otic crossover events. The key requirement for admixture mapping panels is a set of genetic markers that provides information at each locus of the ancestry origin of the allele. In addition, the markers need to be distributed across the genome, to be independent (i.e., not in LD with each other), and dense enough to provide resolution of the ancestry transition from one ancestral chromosomal state to the other. A number of methods can be used to determine the information content of a marker or a set of markers. Generally, markers are selected based on their large differences, or delta (δ), in allele frequency between the ancestral populations. Alleles with an F st = 1 provide maximal information, but such highly differentiated markers are the exception. Usually, a fixation index (F st ) > 0.5 is considered sufficient for ancestry differentiation. The information content of markers for distinguishing two ancestral populations can be quantified by any of six methods: the absolute allele frequency difference (δ) between the two population samples; F st , a measure of intra-and interpopulation; variation the Shannon information content (SIC); the Fisher information content (FIC); pairwise Kullback-Leibler divergence; and the informativeness for assignment (I n ) (62,85). A comparison of the six methods by Rosenberg et al. (62) indicates that the measurements are highly correlated, although I n may perform slightly better.
www.annualreviews.org • Admixture Mapping Comes of Age A major challenge in marker selection is genetic heterogeneity that may exist among populations contributing to the ancestral parental population-this can be reduced by testing AIMs in several subpopulations and selecting SNPs that have low information content (e.g., F st < 0.05 in pairwise comparisons among North and South American Amerindian populations) (53). In contrast to the relative homogeneity of Europeans and West Africans, Amerindian populations are more genetically diverse (53,70). It should be noted that admixture maps constructed to date are for admixture mapping in populations where the founding ancestral populations are continental populations-African and European or Amerindian and European. It should also be possible to use admixture mapping for continental subpopulations using the same principles; however, large SNP databases will be required to select SNPs that differentiate between two closely related founder populations since the allele frequency differences are estimated to be on average much lower compared to intercontinental [e.g., among African (17,77) or Amerindian populations (16, 53)] allelefrequency differences.
The first admixture panel comprised 744 STRs for admixture mapping in African American and Hispanic (Latino) populations (68). This admixture map was superseded in 2004 by a high-density admixture map for African Americans (70). The development of the first high-density SNP map illustrates the challenges of SNP selection for ancestry informativeness, prior to the publication of the first phase of the International HapMap Project in 2007 (2). A total of 450,000 SNPs from various public and private sources were queried to obtain a subset of nonredundant markers with concordant physical and genetic (De-Code) positions and known frequencies in Europeans and Africans. The "best" SNPs were selected using a computer program written to choose SNPs that were evenly spaced across the genome and that were maximally informative for ancestry at each locus along the genome. Using an iterative greedy algorithm, SNPs were selected to add the most information to SNPs that had already been selected. That is, some information may already be provided by previously selected SNPs in the 8-cM window of the candidate gene; new markers were added only if they provided additional information as determined by the SIC. Additional criteria for each added SNP were that it had to be at least 50 kb from its nearest neighbor; the estimates of frequencies from databases were adjusted to account for sampling fluctuation that might inflate differential frequencies; and the estimates of frequencies were adjusted by transforming all SNPs so they were 7% closer to 0.5. This prompted the program to select markers for the map close to even the most informative SNPsthereby insuring power even if genotypes were missing.
Once the SNPs were selected, they were genotyped on European Americans, African Americans, sub-Saharan Africans, and Mexican Americans. The validated SNPs were included on the final mapping panel if they (a) genotyped successfully in West African and European American parental populations; (b) conformed to Hardy-Weinberg equilibrium in the parental populations; (c) had a minimum level of informativeness (SIC > 0.035), out of a maximal of 0.709 at the DARC null (FY−) locus; and (d) were similar in frequency for intracontinental populations. In addition, SNPs were eliminated if they were spaced <50 kb from each other or if they were in LD in the parental populations. The final map comprised 3,011 AIMs with a 1.2-cM spacing and 70% informativeness for distinguishing between African and European origins of chromosomal segments. Illumina now has a commercially available product for admixture mapping, with 1,509 ancestry-informative SNPs for admixture studies in African Americans, developed in collaboration with the Reich and Patterson group (26,49).
A second-generation, high-density map for African Americans was published in 2006 (77) using extensive SNP (∼4 million) genotypefrequency data obtained for Asians from Beijing (BEI) and Tokyo (CHB), West African 76 Winkler · Nelson · Smith Yoruba from Ibadan, Nigeria (YRI), and CEPH Europeans (CEU) by the International HapMap Project first and second phases (2,25,75) for initial SNP selection. Over 300,000 SNPs showing a high degree of allele frequency differences between YRI and CEU, with F st > 0.25 and Fisher's information content (FIC) > 1.0, were selected for further evaluation. The FIC identified SNPs that were particularly informative where one parental population (African) contributed substantially more than that of the other population (European) in the admixed African American population. Using the FIC values, >5,000 SNPs from the original 300,000 were selected by choosing a maximum of 4 SNPs in a 2-Mb window, with a minimal distance of 100 kb between SNPs, and by eliminating SNPs that either provided redundant information or were difficult to genotype. This set of selected HapMap SNPs was further tested on two populations of West Africans (Bini and Kanuri), as well as the HapMap CEU and an independent set of Europeans, for differences in FIC, F st , delta (δ), and allele frequencies to identify and remove SNPs that showed heterogeneous allele frequencies within continental populations or that were not in Hardy-Weinberg equilibrium in the parental populations. The final SNP set comprised 4,222 SNP AIMs. This admixture map included only SNPs that were separated by a minimum of 100 kb, but the SNPs were not eliminated if they were in LD with each other. It has been proposed that LD between SNPs might result in false-positive signals. The results of simulations using case-only and case-control algorithms for the effects of LD between SNPs with ADMIXMAP, STRUCTURE, and MALDSOFT ( Table 3) indicated that ANCESTRYMAP was more sensitive to false-positive peaks of excess ancestry with a case-only design, but all programs were robust against false positive ancestry signals using case-control algorithms (77).
Both the Smith et al. (70) and the Tian et al. (77) African American admixture mapping panels were constructed using SNP rather than mi-crosatellite markers and rigorous criteria for SNP selection. The Tian et al. panel had a larger SNP pool for AIM selection; this permitted the group to select SNPs that were close to fixation in West Africans by first choosing SNPs with high FIC rather than high F st values, thereby increasing the informativeness in African American subjects where the individual admixture was 20:80 European: African (77). It should be noted that the Tian et al. panel provides a very dense map with 4,222 SNP AIMs and also a less dense map of 2,000 SNP AIMs; the 4,222 AIMs are reported to extract >60% and >70% of the admixture information for more than 98% and 90%of the genome, respectively, whereas information extraction with the 2,000 AIMs was decreased. At 80% ancestry information, the 4,222 AIMs provided coverage for more than 60% of the genome compared with only 35% for the 2,000 AIMs. The use of denser SNP panels may be more informative for diseases with small ethnicity/ancestry risk ratios (77).
Latinos/Hispanics in the United States and throughout Latin America are largely a mix of European and Native American ancestry, resulting from European colonial rule from the fifteenth to nineteenth century, and in some regions include a variable degree of African ancestry (53,64,66). The ancestry of Latino/Hispanic populations also shows considerable regional differences due in part to historical differences in the extent of European and African immigration, the density of Native American populations, and the duration of gene flow (64,66). Hence, the admixture of Latinos/Hispanics may be two-or three-way, with varying degrees of ancestry contribution from Native Americans, Europeans, and Africans. Added challenges to designing Hispanic/Latino SNP panels are that the Native American populations are genetically heterogeneous (12,16,18,70) and have considerably greater linkage disequilibrium than other populations (18). Attesting to the interest of gene discovery for high-frequency conditions (notably type 2 diabetes and metabolic syndrome) that occur more often in Native Americans and www.annualreviews.org • Admixture Mapping Comes of Age Hispanic/Latinos compared with their European counterparts ( Table 4), three SNP admixture maps were published in a single volume of the American Journal of Human Genetics (38,53,76). Although each of the panels differs in selection criteria and targeted Hispanic/Latino populations, they make the application of admixture mapping to this diverse population practical.
The Mao et al. panel comprises 2,120 SNPs, with high-frequency differences between Native American and European populations and an average intermarker genetic distance of 1.7 cM. The SNPs were selected from the Affymetrix GeneChip Human Mapping 500-K array used to obtain genotypes for population samples from Europeans, MesoAmericans (Mexico) comprising Maya and Nahua, and South Americans comprising Aymara/Quechua (Boliva) and Quechua (Peru). The primary criteria for SNP selection were maximizing allele frequency differences between Amerindians and Europeans and minimizing the allele frequency differences, or delta (δ), among the Native American populations, making the panel appropriate for admixture mapping in two-way admixed populations throughout the Americas.
The Tian et al. SNP panel for Mexican American admixture mapping (76) is similar to the Mao et al. panel (38) in that the markers were selected to differeniate between Amerindian and European ancestries. The AIMs were selected from over 400,000 SNPs chosen from the gene-centric 100-K Illumina array, the 317-K Human Hap array (that utilized HapMap data to select haplotypetagging SNPs), and a set of 20,000 selected on their high F st > 0.25 values between East Asians and Europeans in the International HapMap Project. Because of the shared ancestry between Asians and Amerindians, these SNPs were enriched for ancestry information content. Two Amerindian populations, Pima (Arizona) and Mayan (Guatemala), were used to eliminate the subset of SNPs that differentiated only one Amerindian group from Europeans. The final panel of 8,144 SNPs had an F st > 30 (mean = 0.48), with all but 3 separated by a minimum of 50 kb. A subset of this panel (5,287 SNPs) was shown to discriminate between Europeans and Amerindians from South America, indicating that it, like the Mao et al. panel, would be broadly useful for admixture studies of disparate Hispanic/Latino populations.
It is not uncommon for Hispanics/Latinos to have African ancestry. Price et al. (53) designed an SNP admixture panel that differentiates Amerindian ancestry from both European and African ancestry. SNPs were identified that had similar allele frequencies between Africans and Europeans but substantially different allele frequencies in Amerindians; a small amount of African ancestry was unlikely to be powered for disease-gene detection but might be sufficient to inflate signals if African and Amerindian alleles had similar allele frequencies (53). By decreasing the complexity introduced by three-way admixture, the panel can be usefully analyzed using currently available admixture software programs developed for two-way admixture. The panel is also robust for both North and South American admixed populations in which Amerindian ancestry increases or decreases risk of disease. The investigators were careful to select markers that reduced within-Amerindian differentiation by using six populations from Central and North America and six from South America. A particular and unique strength of this panel is that the complexity of three-way ancestry is controlled by the selection of SNPs having similar allele frequencies between Europeans and Africans (53).

GENE DISCOVERY BY ADMIXTURE MAPPING
In the fall of 2005, the first two genome-wide admixture studies were reported in Nature Genetics for hypertension and for multiple sclerosis in European Americans (56,84). Five years later, admixture scans have been conducted for a range of traits and diseases that have different rates in Europeans, Latinos, and African Americans ( Table 5). Admixture mapping has been successfully applied to discrete disease 78 Winkler · Nelson · Smith phenotypes [prostate cancer (9,26) and nondiabetic kidney disease (33,34)] and quantitative traits [e.g., interleukin 6 and IL6 soluble receptor levels (57), lipid levels (5, 6), obesity (14,15), and white blood cell counts (46,55)], as well as hypertension (83,84), type 2 diabetes (21,33), breast cancer (23), and peripheral arterial disease (67). Because of the extensive length of ALD, it was widely predicted that the admixture scan, like family linkage studies, would coarsely map a region of interest, and extensive fine mapping would be needed to identify a disease gene and its causal variation. However, in practice, admixture mapping has been remarkably adaptable to fine mapping. Reich et al. (57) provided the proof of principle that admixture mapping could effectively be used to fine-map a gene or causal variant. By adding SNPs to the 95%-credible interval identified in the admixture scan, SNP associations with the disease or trait that are above and beyond the admixture ancestry association can be identified. Highly significant, validated associations using this approach have now been reported for nondiabetic kidney disease in the region of MYH9 (34), low white blood cell counts due to neutropenia with the DARC FY− null promoter mutation (33), and IL-6 and IL-6-soluble receptor levels to a causal, functional allele of the IL-6 receptor gene (57). Without European-African admixture, the identification of the DARC promoter allele (−46 T > C) association with white blood cell counts (WBCs) would have been difficult if not impossible as the alternative alleles are close to fixation in each ancestral population. The FY− mutation abrogates DARC expression on erythroid cells by disrupting the binding site for the GATA1 erythroid transcription factor-homozygotes for the mutation are protected against Plasmodium vivax malaria. The null mutation (−46C) is nearly fixed in sub-Saharan Africa, whereas the alternative wildtype allele (−46T) is fixed outside of Africa (78). The association of increased European ancestry with high WBC and African ancestry with low WBC due to benign neutropenia was critical to finding the association (46,55).
Admixture mapping also provides information about the contribution of nongenetic, sociocultural factors and genetic factors in major U.S. health disparities, e.g., hypertension (71), kidney disease (19), prostate cancer (27), and early onset, invasive breast cancer (1,23). Each of these common diseases, causes of considerable morbidity and mortality, is more frequent in African Americans compared to their European American counterparts. Although it is hypothesized that the increased burden of these diseases is multifactoral-that is, a combination of genetic and environmental factorsancestry mapping may provide clues to the relative contribution and the effect size of genetic factors contributing to the differential risk. If excess global African ancestry is noted across the entire genome in the affected group relative to the control group but there is no significant rise in local ancestry at a particular locus, this may point to a stronger role for sociocultural factors (e.g., access to health care, diet, or lifestyle) that may be tracking ancestry (20,30). Deo et al. (20) conducted a well-powered admixture scan using a robust set of AIMs for hypertension. This study, consistent with another (74), indicates that hypertensive cases tend to have higher African ancestry compared to normotensive controls. However, there were no significant or suggestive increases in local chromosomal African ancestry using either a caseonly or case-control statistic using the ANCES-TRYMAP program. The Deo group was able to eliminate 98% of the genome for harboring genetic variation with OR > 1.7 associated with hypertension by exclusion mapping. While this does not exclude multiple smalleffect genes contributing to the hypertension in persons of African ancestry, it does suggest that there are no large-effect genes that explain the disparate risk. This finding is consistent with the research of Gavlee et al. (30) and others indicating that sociocultural factors such as racial and cultural identity may play an important role in this health disparity. However, the proportion of this epidemiological difference that can be ascribed to genetic or environmental factors is unknown.    On the other hand, there is a genetic basis for the predilection of prostate cancer, nondiabetic kidney disease, and lower white blood counts due to benign neutropenia in African Americans. African American men in the United States have the highest incidence and mortality from prostate cancer in the world-the risk is 1.6-fold higher in African Americans compared with European Americans. An admixture scan identified a region of increased African ancestry in the 8q24 region in younger African American men with prostate cancer, a region previously implicated by a linkage study of Icelanders with prostate cancer and replicated in African Americans (3). The Freedman et al. (26) study estimated that African ancestry at the 8q24 region explains as much as 49% of prostate cancer incidence in the African American population. African American women are more likely to present with aggressive breast cancer tumors that do not express estrogen or progesterone receptors (ER−PR−) whereas European women generally present with ER+PR+ tumors that are more responsive to treatment (4,52). An admixture scan of African American women with breast cancer found an overall increase in European ancestry in women with ER+PR+ tumors and with localized tumors. Unlike prostate cancer, there were no genome-wide-significant associations with African or European ancestry at any specific locus and breast cancer, hormone receptor status, or grade. The increase in European global ancestry with localized tumors and ER+ status positively suggests that differences in breast cancer risk are unlikely to be due to large-effect genetic variation (OR > 1.5) but may be due to population differences in multiple small-to moderate-effect genetic variants and/or population differences in nongenetic factors (e.g., parity, age of menarche onset, physical activity) (7,23). The resolution of the role of ancestry in breast cancer risk will require larger population sizes and inclusion of more patients with ER−/PR− breast cancer tumors to detect smaller effect-size associations.
It has long been recognized that African Americans are at increased risk for chronic and end stage renal disease. HIV-associated nephropathy (HIVAN) is rarely observed in individuals not of African descent, and focal segmental glomerulosclerosis (FSGS) is fourfold higher in African Americans compared with European Americans. End stage renal disease due to any cause is also threefold higher in African Americans. It has long been hypothesized that this disparity was due to genetic risk factors. Two independent studies, one using biopsy-proven HIVAN and FSGS as outcomes and the other nondiabetic end stage kidney disease, identified peak of excess African ancestry on chromosome 22. Subsequent association analysis identified risk alleles in MYH9 on chromosome 22 as strongly associated with HIVAN, FSGS, and nondiabetic end stage renal disease (OR = 5, 4, 2.2, respectively) after correcting for local ancestry. Although the causal allele has not yet been identified, the associated markers occur with high frequency in African Americans (60%) and are infrequent or absent in other populations. The attributable risk for HIVAN and FSGS is 100% and 70%, respectively, thus the increased risk of major forms of kidney disease in the African American population has a strong genetic basis.

SUMMARY
These studies attest to the utility of admixture mapping to quantify the contributions of ancestry to many traits and diseases that are disparate across populations. Although most of the applications to date have been in African Americans, admixture mapping holds promise for identifying the role of genetics and ancestry for conditions such as metabolic syndrome, type 2 diabetes, obesity, and gallbladder disease with higher incidence in persons of Amerindian ancestry compared to Europeans (39,79,80). Patterson and colleagues (49) have shown that the power for admixture mapping was robust for ancestry ranging from 10-90% and can detect disease associations for diseases on either the majority or minority ancestry with near equal power (Figure 8) (49). As shown in Results of the admixture mapping genome scan for combined kidney diseases focal segmental glomerulosclerosis (FSGS) and HIV-associated nephropathy (HIVAN), which are, respectively, fourfold and 60-fold more frequent in African Americans than in European Americans. The sharp peak of African ancestry among cases occurs in the region of MYH9 on chromosome 22. The inset shows the close up of the peak, and the localization of the association to a 95% credible interval of ∼3 Mb. Also shown are genomewide and peak (LOD) scores for several calculations; genome-wide LOD scores greater than 2 are considered significant. Adapted from Reference (34).
human cancers also have disparate rates in diverse populations. Since these diseases are complex, the relevant diseases and traits are those for which the disparate rates are not fully explained by environmental factors. However, ancestry may also be tracking environmental influences such as socioeconomic status, access to health care, and sociocultural factors that influence complex diseases-the identification of excess global parental ancestry in an affected group in the absence of a local spike in parental ancestry in the affected group suggests the importance of nongenetic (i.e., cultural and environmental) factors correlated with ancestry. These insights are critical to developing public health policies and interventions to reduce the disease burden of complex diseases due to environmental factors and to improve clinical outcomes for diseases with a biological basis through rational-based drug development, personalized drug therapies, and genetic screening.  Relative frequencies of cancers in African Americans and European Americans. Cancers with significant differences in frequency (red or green) are potential targets for admixture mapping. Data were extracted using SEER software using U.S. cancer incidence from 2000-2005, age adjusted using 2000 census results as the standard. Incidence rates were calculated separately for European (EA) and American Americans (AA) for the number of cases per 100,000 person years.

DISCLOSURE STATEMENT
The authors are listed as inventors on a patent application for the use of MYH9 genetic variation as a predictor of propensity for kidney disease. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.