Published September 28, 2025 | Version v1
Report Open

Origins of the Khattak Tribe: A Genetic Perspective

  • 1. Government College University, Lahore


Abstract

The Khattak are a major Pashtun (Pathan) tribe historically concentrated in the Peshawar Valley and adjacent uplands of Khyber Pakhtunkhwa, Pakistan. Their recorded history and oral genealogies connect them to broader Pashtun tribal structures, but the biological origins and population history of the Khattak remain incompletely characterized. Recent molecular studies, particularly those examining mitochondrial DNA (mtDNA), Y-chromosome markers, and autosomal variation, allow a more nuanced reconstruction of maternal, paternal, and genome-wide ancestries. Here we synthesize published genetic data on the Khattak and related Pashtun groups, focusing on (1) mtDNA haplogroup composition, (2) Y-chromosome patterns known from Pashtun-dominated regions, and (3) autosomal evidence that places the Khattak within the broader West Eurasian–South Asian genetic cline. Published mtDNA data from the Peshawar Valley, reported for a combined Khattak and Kheshgi sample, indicate that maternal lineages are a mosaic of approximately 55.7% West Eurasian, 33.9% South Asian, and 10.2% East Asian haplogroups, reflecting multiple waves of prehistoric and historic gene flow. Other regional studies of Pashtun populations reveal high frequencies of Y-chromosome haplogroup R1a (and its South/Central Asian subclades), along with contributions from other Eurasian lineages, indicating complex paternal histories. Autosomal analyses of populations from Pakistan and Afghanistan place Pashtun groups intermediate between South Asian and Central/West Eurasian populations, consistent with archaeological and historical records of migrations and contacts across the Hindu Kush and the Silk Routes. We integrate these genetic signals with historical, linguistic, and archaeological evidence to argue that the Khattak, like many Pashtun tribes, have a composite ancestry: a primary South/Central Asian substrate enriched by West Eurasian maternal and paternal elements and punctuated by East Eurasian inputs. We conclude with suggestions for future work (mitogenomes, high-coverage Y-SNP typing, and genome-wide panels) to resolve substructure within the Khattak and to better time admixture episodes.

Keywords: Khattak, Pashtun, mtDNA, Y-chromosome, R1a, admixture, Peshawar Valley.

1. Introduction

Understanding the origins of ethnolinguistic groups increasingly benefits from genetic data that complement historical, archaeological, and linguistic information. The Khattak tribe is an influential Pashtun group with a historic presence in the Peshawar Valley and surrounding districts. Traditional genealogies place the Khattak within the larger Pashtun tribal confederacies, but such accounts, while culturally important, cannot alone resolve deep population histories shaped by migrations, trade, and admixture. Genetic markers (mtDNA, Y-chromosome, and autosomal SNPs) provide independent lines of evidence for maternal, paternal, and genome-wide ancestry, respectively, and are therefore well suited to reconstructing the Khattak’s population history. Recent targeted studies of Pashtun tribes, including work that sampled the Khattak, allow a first data-driven synthesis focused specifically on this tribe.

2. Background: Pashtun population genetics (brief)

Multiple population genetic studies across Afghanistan and northwestern Pakistan show that Pashtun groups are genetically heterogeneous and sit at an interface between West Eurasian and South Asian gene pools, with occasional East Asian contributions. Y-chromosome surveys of Pashtuns and neighboring groups often report high frequencies of haplogroup R1a (including subclades common in South/Central Asia), while mtDNA patterns are more mixed, with a substantial proportion of West Eurasian haplogroups (H, U, J, T, HV) accompanied by South Asian clades (M-derived lineages) and minor East Eurasian elements. Autosomal surveys place many Pashtun groups intermediate between South Asian and Central/West Eurasian populations, consistent with the Peshawar Valley’s role as a crossroads of prehistoric and historic migrations.

3. Materials and methods — summary of primary genetic studies used

This review synthesizes peer-reviewed population genetic studies that include Khattak samples or closely related Pashtun populations:

  • Zubair et al., 2020 (Genetica): mtDNA control-region sequences from 58 individuals from Khattak and Kheshgi in the Peshawar Valley; haplogroup assignments and frequency summaries are provided. These data are used here to quantify maternal haplogroup composition for the Khattak.
  • Bhatti et al., 2017 (Mitochondrial DNA A DNA Mapp Seq Anal): mtDNA control-region analyses of 100 individuals from four Pashtun tribes (including Khattak) from Khyber Pakhtunkhwa; provides haplogroup frequencies and diversity metrics.
  • Di Cristofaro et al., 2013 and other Central Asian studies: broader Afghan and Central Asian datasets that provide comparative Y-chromosome and autosomal context for Pashtun paternal lineages.
  • Haber et al., 2012: Y-chromosome study of Afghan populations including Pashtuns, used here to illustrate paternal lineage patterns in adjacent regions.
  • Tariq et al., 2022 (Scientific Reports): comprehensive mtDNA and Y-chromosome survey across five ethnic groups in Khyber Pakhtunkhwa, offering regional context and demonstrating contrasting maternal and paternal histories.

Because this is a synthesis paper, no new laboratory work was done; instead, published frequency tables, diversity statistics, and phylogeographic analyses from these studies are combined and interpreted.

4. Results — genetic signatures in the Khattak

4.1 Maternal (mtDNA) composition

Published mtDNA data for Khattak individuals reveal a mixed maternal ancestry. Zubair et al. (2020) report that in the combined Khattak and Kheshgi sample the mtDNA composition was approximately 55.7% West Eurasian, 33.9% South Asian, and 10.2% East Asian, with the Khattak showing close affinities to Central Asian groups in multidimensional scaling based on haplogroup frequencies.

Bhatti et al. (2017), who sampled multiple Pashtun tribes including Khattak, similarly report high mtDNA diversity and a predominance of West Eurasian haplogroups (H, U, J, K, T, HV) alongside South Asian macrohaplogroup M lineages, consistent with a composite maternal gene pool shaped by both prehistoric and historic contacts.
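
As a rough illustration of what a "diversity metric" of this kind measures, the sketch below computes Nei's unbiased gene (haplotype) diversity, n/(n-1) × (1 - Σ p_i²), from a list of haplogroup labels. The function name `haplotype_diversity` and the ten-individual `toy_sample` are hypothetical and invented purely for illustration; they are not data from Bhatti et al. (2017) or Zubair et al. (2020), and the primary papers may use a different estimator.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype/haplogroup.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample (NOT real Khattak data): 10 individuals, 4 lineages.
toy_sample = ["H"] * 4 + ["U7"] * 3 + ["M3"] * 2 + ["D4"]
print(round(haplotype_diversity(toy_sample), 3))  # prints 0.778
```

Values near 1 indicate that two randomly drawn individuals almost always carry different lineages, which is the sense in which a maternal gene pool is described as highly diverse.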

Table 1. Summary of maternal haplogroup proportions reported for Khattak (compiled from Zubair et al., 2020; Bhatti et al., 2017). (Where studies report broader tribe panels, numbers shown are study-level summaries or combined values described in the source.)

| Maternal haplogroup category | Approx. proportion (Zubair et al., 2020) | Notes (Bhatti et al., 2017) |
|---|---|---|
| West Eurasian (H, U, J, T, HV, W, K, etc.) | 55.7% | Dominant category; multiple West Eurasian lineages present. |
| South Asian (M-derived lineages) | 33.9% | Substantial presence of macrohaplogroup M and derived clades. |
| East Eurasian (A, C, D, Z, etc.) | 10.2% | Minor but detectable East Asian influence. |

Table note: Exact subhaplogroup breakdowns (e.g., U7 vs U2, M3 vs M5) are provided in the primary papers and are recommended for researchers seeking fine-scale phylogeographic inference.

4.2 Paternal (Y-chromosome) patterns — regional context

Targeted Y-chromosome sampling specifically of Khattak males remains limited in the published literature; however, regional studies of Pashtun groups indicate which paternal lineages are common in northwest South Asia and Afghanistan. Haplogroup R1a (particularly South/Central Asian subclades) is frequently observed at high frequency among several Pashtun tribes and neighboring populations, reflecting deep male-mediated gene flow across the Eurasian steppe and South Asia. Studies from Afghanistan and Khyber Pakhtunkhwa show elevated R1a frequencies and contributions from other haplogroups (G, L, Q, J). These patterns suggest that Khattak paternal lineages are plausibly dominated by R1a and related lineages, but high-resolution Y-SNP typing of Khattak males is needed before conclusive statements can be made.

4.3 Autosomal (genome-wide) affinities — comparative view

Autosomal and genome-wide studies of Pashtun and neighboring groups consistently place Pashtuns intermediate between South Asian and West/Central Asian clusters, with variable amounts of East Eurasian components in some groups. This fits a model in which the Khattak and allied Pashtun groups derive from a South/Central Asian substrate that received gene flow from West Eurasian and Central Asian populations, via prehistoric migrations (Neolithic to Bronze Age) and later historical movements (e.g., Silk Road contacts and medieval movements).

5. Discussion

5.1 Composite ancestry: interpreting maternal and paternal contrasts

Two recurring themes emerge from the available data. First, maternal lineages of the Khattak are highly heterogeneous, with a majority of West Eurasian haplogroups but also large South Asian and measurable East Asian fractions. This pattern suggests multiple female-line contributions, consistent with maternal-line inputs from neighboring West Eurasian populations and retention of indigenous South Asian lineages. The relatively high West Eurasian proportion in mtDNA often surprises lay readers but is well documented in many northwestern South Asian groups and likely reflects both prehistoric Neolithic and Bronze-Age movements and later historical trade and migration.

Second, paternal lineages in the wider Pashtun region (which likely reflect Khattak males as well) frequently show elevated R1a and other West/Central Asian Y-haplogroups. Because Y-chromosome markers track male-mediated gene flow and can show different patterns than mtDNA, contrasting maternal and paternal histories are plausible and common in Eurasia, especially where patrilocal marriage systems or sex-biased migration have occurred. Combined, these patterns point to a dynamic demographic history with sex-biased admixture episodes.

5.2 Historical and archaeological concordance

The genetic evidence aligns with the Peshawar Valley’s known role as a corridor for migrations between Central Asia, the Iranian plateau, and South Asia. Archaeological layers and historical records indicate repeated population flows (e.g., Bronze-Age pastoralist expansions, Iron-Age movements, Silk Road era exchanges, and medieval migrations). The Khattak’s genetic mosaic likely reflects demographic events across these timescales rather than a single recent founder event. Published analyses of mtDNA and Y-chromosome diversity support multiple episodes of admixture and gene flow consistent with known migration pathways.

5.3 Limits of current data and caveats

  • Sample size and resolution: Published Khattak mtDNA samples are modest (e.g., 58 individuals in Zubair et al., 2020) and Y-chromosome data specific to Khattak remain scarce. Broader claims about fine-scale substructure require larger, well-designed sampling.
  • mtDNA and Y-chromosome are single-locus markers: They reflect only direct maternal and paternal lines and thus cannot capture the full autosomal ancestry. Genome-wide SNP data are needed to estimate admixture proportions, dates, and fine structure robustly.
  • Admixture dating: Without genome-wide LD-based analyses, timing admixture events precisely is challenging. The accumulated genetic signals indicate multiple episodes, but dating requires dense SNP arrays or whole-genome data.
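
To make the sample-size caveat concrete, the following minimal Python sketch computes Wilson score intervals around the reported maternal proportions. It assumes, purely for illustration, that the 58 individuals cited above form the denominator of each proportion; the function name `wilson_interval` and the variable `n_assumed` are hypothetical, and the primary paper should be consulted for the actual per-tribe counts.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Proportions as reported in the text (Zubair et al., 2020).
reported = {"West Eurasian": 0.557, "South Asian": 0.339, "East Asian": 0.102}
n_assumed = 58  # ASSUMPTION: denominator taken from the sample size quoted above

for label, p in reported.items():
    low, high = wilson_interval(p, n_assumed)
    print(f"{label}: {p:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

With a denominator of this order the intervals span many percentage points, which is why the category proportions are best read as approximate rather than as precise estimates.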

6. Recommendations for future research

  1. Whole mitogenome sequencing of larger Khattak samples to refine maternal phylogeography and to identify subhaplogroups that can be dated phylogenetically.
  2. High-resolution Y-SNP typing and Y-STR haplotyping targeted to Khattak males to resolve R1a subclades and other haplogroups (G2, L, Q, J), permitting phylogeographic and coalescent analyses.
  3. Genome-wide autosomal SNP arrays or whole genomes from Khattak individuals (with comparative samples from neighboring Pashtun and non-Pashtun groups) to model admixture proportions, dates, and ancestry sources using methods such as qpAdm, ADMIXTURE, and ALDER.
  4. Integration with archaeology and linguistics: interdisciplinary projects combining ancient DNA (if available), archaeological contexts, and linguistic histories would provide the strongest tests of demographic hypotheses.
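
The principle behind LD-based admixture dating, as used by tools such as ALDER, can be sketched in a few lines: admixture-induced linkage disequilibrium decays approximately as A·exp(-t·d), where d is genetic distance in Morgans and t is the number of generations since admixture. The toy example below fits that curve to a synthetic, noise-free series by log-linear least squares. It is not ALDER itself; the function name, the synthetic values, and the 29-year generation time are all assumptions chosen for illustration, and real analyses must handle noise, an affine offset, and standard errors.

```python
import math


def fit_admixture_generations(distances_morgans, weighted_ld):
    """Least-squares fit of ln(LD) = ln(A) - t * d; returns (t, A)."""
    xs = list(distances_morgans)
    ys = [math.log(v) for v in weighted_ld]  # requires strictly positive values
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic decay curve (NOT real data): admixture 40 generations ago.
TRUE_GENERATIONS = 40
AMPLITUDE = 0.002
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [AMPLITUDE * math.exp(-TRUE_GENERATIONS * d) for d in distances]

t_hat, a_hat = fit_admixture_generations(distances, curve)
YEARS_PER_GENERATION = 29  # ASSUMPTION: illustrative generation time
print(f"estimated generations since admixture: {t_hat:.1f}")
print(f"approx. years before sampling: {t_hat * YEARS_PER_GENERATION:.0f}")
```

Because the fit needs LD measured across many marker pairs at known genetic distances, it cannot be run on mtDNA or Y-chromosome haplogroup frequencies alone, which is why dense SNP arrays or whole genomes are recommended.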

7. Conclusion

Current genetic data portray the Khattak tribe as a genetically heterogeneous population whose maternal lineages are predominantly West Eurasian with considerable South Asian and minor East Asian contributions, and whose paternal landscape (inferred from nearby Pashtun studies) likely includes high frequencies of R1a and other Central/West Eurasian Y-lineages. Autosomal evidence from regional studies situates Pashtun groups, including the Khattak, on a cline between South Asia and Central/West Eurasia, consistent with long-term interactions across the Hindu Kush and Silk Routes. Refining these inferences, particularly to detect fine-scale substructure, date admixture events, and identify specific source populations, will require larger and higher-resolution genetic studies (mitogenomes, Y-SNP typing, and genome-wide data). Such work will both illuminate the Khattak’s past and contribute to broader models of how human populations formed in a key Eurasian crossroads.

Figures and Tables

Figure 1. Map of the Peshawar Valley and sampling localities for Khattak and Kheshgi in Zubair et al., 2020. Caption: Map illustrating sample collection sites in Peshawar Valley showing the primary locations where Khattak individuals were sampled (adapted from Zubair et al., 2020).

Figure 2. Conceptual admixture model for Khattak ancestry. Caption: Schematic diagram showing three major ancestral components inferred from genetic data: West Eurasian (major maternal share), South Asian (substantial maternal and autosomal share), and minor East Eurasian inflow. Model reflects combined interpretation of mtDNA, Y-chromosome and autosomal results.

Table 1. (Provided above) Maternal haplogroup proportions for Khattak (Zubair et al., 2020; Bhatti et al., 2017).

Acknowledgements

I thank the authors of the primary genetic studies summarized here for making their data publicly available. The present synthesis relied on published summaries and analyses; any errors of interpretation are the sole responsibility of the author.

References (selected)

  • Zubair M, Hemphill BE, Schurr TG, Tariq M, Ilyas M, Ahmad H. Mitochondrial DNA diversity in the Khattak and Kheshgi of the Peshawar Valley, Pakistan. Genetica. 2020 Aug;148(3–4):195–206. doi:10.1007/s10709-020-00095-2.
  • Bhatti S, Aslamkhan M, Abbas S, Attimonelli M, Aydin HH, de Souza EM. Genetic analysis of mitochondrial DNA control region variations in four tribes of Khyber Pakhtunkhwa, Pakistan. Mitochondrial DNA A DNA Mapp Seq Anal. 2017 Sep;28(5):687–697. doi:10.3109/24701394.2016.1174222.
  • Di Cristofaro J, Pennarun E, Mazières S, Myres NM, et al. Afghan Hindu Kush: where Eurasian sub-continent gene flows converge. PLoS One. 2013;8(10):e66737. doi:10.1371/journal.pone.0066737.
  • Haber M, Gauguier D, Youhanna S, Patterson N, et al. Afghanistan’s Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. Hum Genet. 2012;131(5):763–780.
  • Tariq M, et al. Contrasting maternal and paternal genetic histories among five ethnic groups from Khyber Pakhtunkhwa, Pakistan. Scientific Reports. 2022;12:XXXX (see full article for dataset and methods).
  • Underhill PA, et al. The phylogeography of Y-chromosome haplogroup R1a: implications for Indo-European migrations. (selected reviews).

 
