Fine-Scale Genetic Profile and Admixture History of Two Hmong-Mien–Speaking Miao Tribes from Southwest China Inferred from Genome-Wide Data

ABSTRACT As the dominant indigenous minority in southern China, Hmong-Mien-speaking Miao people were thought to be the descendants of Neolithic Yangtze rice farmers. However, the fine-scale population structure and genetic profile of the Miao populations remain unclear due to the limited Miao samples from southern China and Southeast Asia. We genotyped 19 individuals from the two largest Miao tribes in Guizhou Province (Southwest China) via SNP chips and co-analyzed the data with published modern and ancient East Asians. The Guizhou Miao displayed a closer genomic affinity with present-day and Neolithic to Iron Age southern East Asians (SEAs) than with most northern East Asians (NEAs). The genetic substructure within Miao groups was driven by different levels of genetic interaction with other ethnolinguistic groups: Hunan Miao (Central China) harbored higher proportions of NEA-related ancestry; Guizhou Miao (Southwest China) and Vietnam Miao (mainland Southeast Asia) received additional gene flow mainly from surrounding groups with Tai-Kadai–related ancestry. There were also more complex admixture events in the newly studied groups between Guizhou Xijiang Miao and surrounding populations compared with Guizhou Congjiang Miao. The qpAdm model further demonstrated that the primary ancestry of Hunan Miao, Guizhou Miao studied here, and Vietnam Miao derived from ancient SEA-related ancestry (represented by coastal early Neolithic SEA Liangdao2), with the additional gene flow from ancient NEA-related ancestry (represented by spatiotemporally inland Yellow River farmers), with slightly different proportions. Our genomic evidence reveals the complex and distinct demographic history of different Miao tribes.

Hui Tan,1,2# Rui Wang,3#* and Chuan-Chao Wang1,3,4*
Southern China, one of the well-known birthplaces of rain-fed rice agriculture, harbors diverse ethnicities that can be categorized into five main language families: Hmong-Mien (HM)-, Tai-Kadai (TK)-, Austroasiatic (AA)-, Austronesian (AN)-, and Sino-Tibetan (ST)-speaking populations. Ancient DNA studies shed light on the massive population movement and the resulting admixture events that gave rise to the formation of population structure and transformed the genetic diversity in southern China and Southeast Asia. Late Neolithic southern East Asians (SEAs) (represented by Tanshishan and Xitoucun) harbored more northern East Asian (NEA)-related ancestry compared with the early Neolithic SEAs (represented by Liangdao2 and Qihe) due to southward migrations of NEAs; Neolithic coastal Shandong hunter-gatherers (represented by Bianbian, Boshan, Xiaogao, and Xiaojingshan) and Fujian-related ancestries (represented by Liangdao2 and Qihe) contributed considerable ancestry to present-day East Asians (Yang et al. 2020). The population expansion of Yangtze River farmers (termed the Yangtze "ghost" population) may have spread AN, TK, and AA languages and shaped the genetic profiles of populations in southern China and Southeast Asia (C.-C. Wang et al. 2021). Based on newly reported Guangxi ancients, T. Wang et al. (2021) revealed the temporal genetic dynamics and complex admixture interactions at the crossroads of East and Southeast Asia: an 11,000 YBP (years before present) inland Guangxi Longlin-related lineage (Southwest China), coastal Fujian Qihe-related ancestry (Southeast China), and deep Asians related to Hòabìnhian hunter-gatherers in inland Southeast Asia were involved in the formation of the Early Neolithic Guangxi ancients (represented by Baojianshan). In contrast, local Longlin-related ancestries did not contribute to the ancestry compositions of historical Guangxi ancients and present-day indigenous populations in southern China (T. Wang et al. 2021). The first Southeast Asian farmers (represented by Late Neolithic Vietnam Man Bac) could be fitted as having their main ancestry from present-day southern Chinese and deep-diverged ~8,000 YBP Hòabìnhian hunter-gatherers, supporting that southern Chinese spread agricultural technologies into mainland Southeast Asia via demic diffusion (Lipson et al. 2018; McColl et al. 2018).
Located in the joint area of Yunnan, Hunan, and Sichuan Provinces and Guangxi Zhuang Autonomous Region, Guizhou Province is one of the most ethnolinguistically diverse regions in southern China. With a population size of approximately 4 million, the Miao people, who belong to the HM language family (also referred to as Miao-Yao), are the sixth largest ethnic group of 55 minorities in mainland China, based on the 2010 Chinese census. The present-day Miao people are widely distributed in southern China and Southeast Asia via multiple waves of historic migrations (Li 2000) and mainly inhabit Guizhou Province (Southwest China), as well as adjoining regions, including Hunan (Central China), Yunnan (Southwest China), and Vietnam (mainland Southeast Asia). Recent genetic analysis of HM-speaking Miao populations focused on low-density autosomal/uniparental short tandem repeat (STR) and indel markers and mainly assessed forensic characteristics and investigated the genetic diversity of Miao, revealing the genetic affinity between Miao and geographically close populations, such as Guizhou Gelao and Guizhou Han (Chen et al. 2019a), Hunan Han and Guizhou Sui (Chen et al. 2018), and Enshi Tujia (Chen et al. 2019b). The shared rare patrilineage O-M7 suggests that present-day Miao might be direct descendants of Neolithic Daxi culture (6,500-5,300 YBP)-related Yangtze River farmers (Su et al. 1999; Li et al. 2007; Yu and Li 2021). The STR diversity of Y-chromosomal haplogroups O2a-M95, O3a3b-M7, and O3a3c1-M117 indicated one unidirectional route from Mon-Khmer (MK) to HM to ST, where the ancestors of East Asians migrated from Southeast Asia into East Asia through the Yun-Gui Plateau (Cai et al. 2011). Wen et al. (2005) proposed the southern origin of HM-speaking populations from a mitochondrial DNA (mtDNA) lineage perspective, whereas the observed higher frequency of northern-dominant mtDNA lineages in Hmong suggests they had more genetic contact with NEAs than did Mien speakers. Key genetic findings of populations from Southeast Asia and southern China based on genome-wide single-nucleotide polymorphism (SNP) data revealed extensive genetic interactions among ST-, TK-, and HM-speaking populations, which may have led to the rise of the HM-related genetic cline (Xia et al. 2019; Huang et al. 2021; Kutanan et al. 2021; Liu et al. 2022).
Ancient DNA preserves poorly in the humid and hot environment of southern China, so ancient samples from the region are scarce, and present-day Miao samples from the various Miao tribes have not been systematically collected. As a result, population studies based on genome-wide data from present-day Miao groups and ancient Neolithic Yangtze River farmers, which can provide genomic insights into population substructure, admixture history, and genetic origin, are still insufficient. In this study, we genotyped approximately 700,000 SNPs via SNP arrays in 19 Miao individuals from two of the largest Miao tribes: (1) "Thousand Miao Villages Xijiang," located in Guizhou Province, Southwest China, well known as the largest Miao village worldwide; and (2) Congjiang County, located in southeast Guizhou, bordering on Guangxi Zhuang Autonomous Region. We then comprehensively compared these data with published data of modern and ancient reference East Asians to dissect the population structure among HM-speaking populations, genetic relationships between the Miao groups and East Asians, and the admixture history of Miao people.

Sample Collection, Genotyping, and Data Quality Control
We collected 19 saliva samples from two Miao tribes in Guizhou Province (Southwest China) with informed consent. The study was reviewed and approved by the Medical Ethics Committee of Xiamen University (approval no. XDYX2019009), and the participants provided their written informed consent to participate in this study. The geographic locations for the sampled populations are displayed in Figure 1, and Table 1 provides details on these samples. We genotyped 699,537 SNPs using Illumina WeGene arrays. We first employed PLINK 1.9 (Purcell et al. 2007) to filter poor-quality data with the following parameters: "--hwe 1e-6 --maf 0.01 --geno 0.01 --mind 0.01". PLINK 1.9 with the option "--missing" was used to calculate the SNP calling rate for each individual.
To exclude close relatives, we first used GCTA (genome-wide complex trait analysis) software (Yang et al. 2011) to estimate the kinship between every pair of our newly sampled individuals (--autosome --make-grm) and then applied PLINK 1.9 (Purcell et al. 2007) with the option "--remove" to exclude close relatives within third-degree kinship. The genetic relationship matrix (GRM) based on the raw data and the GRM after removal of related individuals are displayed in Supplementary Figure S1. We obtained 10 unrelated Xijiang Miao individuals and 7 unrelated Congjiang Miao individuals, with 417,227 genome-wide SNPs for population genetic analysis.
To explore the phylogenetic relationships among northern East Asians (NEAs) and southern East Asians (SEAs), we constructed an unrooted maximum likelihood tree with 0-3 migration edges via TreeMix (Pickrell and Pritchard 2012) and a neighbor-joining phylogenetic tree based on FST genetic distance via MEGA 7 software (Kumar et al. 2016).
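As an illustration of what the missingness and allele-frequency thresholds in the quality-control step mean, the following minimal sketch applies analogues of "--mind", "--geno", and "--maf" to a small genotype matrix. It is not PLINK and does not reproduce the "--hwe" test; the function name qc_masks, the 0/1/2 coding with NaN for missing calls, and the choice to filter samples before variants are assumptions made only for this example.

```python
import numpy as np

def qc_masks(genotypes, mind=0.01, geno=0.01, maf=0.01):
    """Return boolean keep-masks (samples, SNPs) for a genotype matrix.

    genotypes: (n_samples, n_snps) array of alternate-allele counts
    (0, 1, 2), with np.nan marking a missing call (assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)

    # Sample filter (analogue of --mind): drop samples whose
    # missing-call rate exceeds `mind`.
    keep_samples = np.isnan(g).mean(axis=1) <= mind
    g = g[keep_samples]

    # Variant filter (analogue of --geno): drop SNPs whose missing-call
    # rate among the retained samples exceeds `geno`.
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    n_kept = max(g.shape[0], 1)
    keep_snps = (1.0 - n_called / n_kept) <= geno

    # Variant filter (analogue of --maf): drop SNPs whose minor-allele
    # frequency is below `maf`. SNPs with no calls give NaN and are dropped.
    alt_count = np.where(called, g, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_count / (2.0 * n_called)
        minor = np.minimum(alt_freq, 1.0 - alt_freq)
        keep_snps &= minor >= maf

    return keep_samples, keep_snps
```

With the thresholds quoted in the text (0.01 for each filter), a sample or SNP is retained only if at least 99% of its calls are present, and a SNP is retained only if its minor allele reaches 1% frequency in the retained samples.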

Allele Sharing Analysis
We used the qp3Pop and qpDstat programs in AdmixTools (Patterson et al. 2012) to calculate f3 and f4 statistics. To measure the shared genetic drift between populations X and Y since their divergence from the outgroup Yoruba, we calculated outgroup f3 statistics in the form f3(X, Y; Yoruba). To investigate the potential admixture signals for the target populations, we performed admixture f3 statistics in the form f3(source 1, source 2; target). To detect the admixture events across four populations, we applied f4 statistics in the form f4(Yoruba, W; X, Y). A significantly negative f4 (Z < -3) indicated that W shared more derived alleles with X than with Y. Similarly, Z > 3 suggested that W shared more alleles with Y than with X, whereas |Z| < 3 indicated that W shared alleles with X and Y to a statistically indistinguishable degree.
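The forms of these statistics can be sketched directly from per-SNP allele frequencies. The example below is a simplified illustration, not the AdmixTools implementation: it omits the finite-sample corrections that AdmixTools applies, and its jackknife uses equal-sized contiguous SNP blocks rather than genomic segments. The function names and the default of 10 blocks are assumptions for this sketch.

```python
import numpy as np

def f3(source1, source2, target):
    """f3(source1, source2; target): mean of (t - s1) * (t - s2) over SNPs."""
    s1, s2, t = (np.asarray(x, dtype=float) for x in (source1, source2, target))
    return float(np.mean((t - s1) * (t - s2)))

def f4(a, b, c, d):
    """f4(a, b; c, d): mean of (a - b) * (c - d) over SNPs."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))

def jackknife_z(per_snp_values, n_blocks=10):
    """Mean of per-SNP values and its Z-score from a delete-one-block jackknife."""
    values = np.asarray(per_snp_values, dtype=float)
    blocks = np.array_split(values, n_blocks)
    total, n = values.sum(), values.size
    # Re-estimate the mean with each block left out in turn.
    leave_one_out = np.array([(total - b.sum()) / (n - b.size) for b in blocks])
    estimate = total / n
    se = np.sqrt((n_blocks - 1) / n_blocks
                 * np.sum((leave_one_out - leave_one_out.mean()) ** 2))
    return estimate, estimate / se
```

Under this definition, f4(Yoruba, W; X, Y) is negative when W and X depart from Yoruba in the same direction at the same SNPs, which matches the reading given above; a negative f3(source 1, source 2; target) arises when the target allele frequencies lie between those of the two sources.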

Inferring the Admixture Coefficients
We applied qpWave and qpAdm in AdmixTools (Patterson et al. 2012) to estimate the minimum number of ancestry streams and to identify the optimal admixture model. The model with p > 0.05 for rank = 0 was the acceptable n-way admixture model (n = number of the source populations). A nested p-value > 0.05 indicated that the (n - 1)-way admixture model was better than the n-way model. We accepted only the models with (1) p-value > 0.05, (2) nested p-value < 0.05, and (3) admixture proportions ranging from 0 to 1.
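The three acceptance criteria can be written as a single predicate. This is only a sketch of the decision rule stated above; the function name accept_qpadm_model and its arguments are hypothetical and are not part of AdmixTools, whose output would have to be parsed separately.

```python
def accept_qpadm_model(p_value, nested_p_value, proportions, alpha=0.05):
    """Apply the three acceptance criteria to one fitted n-way model.

    p_value: fit p-value of the n-way model.
    nested_p_value: p-value of the comparison with the (n - 1)-way model.
    proportions: estimated admixture proportions of the n sources.
    """
    fits = p_value > alpha                      # criterion (1)
    simpler_rejected = nested_p_value < alpha   # criterion (2)
    feasible = all(0.0 <= w <= 1.0 for w in proportions)  # criterion (3)
    return fits and simpler_rejected and feasible
```

For example, with made-up values, accept_qpadm_model(0.30, 0.001, [0.7, 0.3]) is accepted, whereas the same fit with a nested p-value of 0.20 is rejected because the simpler model cannot be excluded.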

Merging Data
We merged our newly collected Guizhou Miao data with present-day and ancient reference East Asians via the mergeit package in Eigensoft (Patterson et al. 2006).

Population Structure Analyses
We applied the smartpca program implemented in Eigensoft (Patterson et al. 2006) to conduct PCA, with the following options: lsqproject: yes; shrinkmode: yes; and numoutlieriter: 0. We used PLINK 1.9 (Purcell et al. 2007) [...]

Overview of Population Genetic Structure in East Asia
We initially constructed the top two principal components (PCs) based on the genomic data of present-day East Asians and then projected the ancient samples onto the PC axes (PC1 variation: 1.296%; PC2 variation: 0.7311%) to investigate the population substructure and patterns of genetic relationships among East Asians (Figure 2). The populations belonging to the same linguistic or geographic categories tend to cluster together. We identified three meta-population clusters: (1) the NEA-related genetic cline, which consisted of Turkic, Mongolic, Tungusic, Tibeto-Burman (TB), Japanese, Korean, and Sinitic-related populations; (2) the SEA-related cluster, comprising AA, TK, and AN from southern China and Southeast Asia; and (3) HM-speaking populations from southern China and Southeast Asia forming a unique genetic cline, illustrating the population substructure within HM speakers. Our Guizhou Miao populations took an intermediate position between Vietnam Hmong and Hunan Miao. Our Miao Congjiang sample shifted toward HM-speaking Vietnam Hmong, while our Miao Xijiang individuals showed a relatively closer relationship with some of the HM speakers in Guangxi, such as Miao Rongshui, Miao Huanjiang, and Vietnam PaThen, and fell [...]
We then used model-based Admixture to explore the population structure further. The lowest cross-validation error occurred at K = 4 (cross-validation error = 0.59663; Supplementary Figure S2). At K = 4 (Figure 3), the clustering patterns revealed by the Admixture plot were highly associated with population linguistic affiliations and geographical distributions. We observed four main predefined ancestries in our Guizhou Miao sample. The orange component in Figure 3 was enriched in HM-speaking Vietnam Hmong and historical GaoHuaHua samples and was widely distributed in modern TK-, AA-, and Sinitic-speaking populations but nearly absent in Neolithic ancient individuals from Guangxi (Southwest China), Fujian (Southeast China), and the Amur River Basin (Northeast China). The pink component in Figure 3 was maximal in modern Tibetans and ancient NEAs from the Amur River, Western Liao River (WLR), and coastal Siberia; the yellow-component-related ancestry was dominant in AA-speaking Mang and Late Neolithic populations from Malaysia and Laos and appeared widely in populations from southern China and Southeast Asia; coastal Late Neolithic Fujian ancients, Iron Age Taiwan Gongguan individuals, and present-day indigenous AN-speaking Amis harbored a high proportion of blue-component-related ancestry. The Guizhou Miao Congjiang individuals were assigned considerable Hmong-related (orange) ancestry (76%) with 15% AA-related yellow ancestry, 7% Taiwan ancient or modern AN-related blue ancestry, and a small proportion of NEA-related and Tibetan-related pink ancestry (2%), displaying a genetic profile similar to that of HM-speaking Vietnam Hmong and historical Guangxi GaoHuaHua individuals. The Miao Xijiang clustered with Vietnam PaThen and carried less Hmong-related ancestry (50%) but higher proportions of AA-related (27%), indigenous AN-related (14%), and Tibetan-related components (9%) compared with the Miao Congjiang. Published Hunan Miao and Fujian She displayed nearly equivalent ancestral compositions, clustering with Han Chinese and Hunan Tujia rather than with Guizhou Miao and Vietnam HM-related speakers, indicating possible intermarriage with geographically close ST-related populations.

Genetic Affinities among Studied Guizhou Miao Groups and Published East Asians
To explore the genetic relationships between the Miao populations studied here and reference modern and ancient East Asians, we measured the shared genetic drift via outgroup f3 statistics in the form f3(X, Y; Yoruba). Based on the merged HO data set (Figure 4), we found that newly genotyped Miao groups, HM-speaking Hmong, and PaThen had the closest relationships with each other and also shared high genetic drift with TK speakers (e.g., Dong, CoLao, and Maonan) and neighboring Sinitic speakers. In contrast, HM-speaking Hunan Miao and Fujian She clustered with Han from southern China, such as Guangdong, Fujian, and Sichuan Provinces, in accordance with the genetic affinity revealed in PCA and Admixture results. The alleles shared between studied Guizhou Miao groups and ancient East Asians in the merged 1240K data set (Figure 5) indicated the Guizhou Miao populations in our sample had the largest f3 values with 500 YBP historical Guangxi GaoHuaHua, followed by 1,500 YBP historical Guangxi ancients (BabanQinCen, Layi, Yiyang, and Shenxian), Iron Age Gongguan, Hanben ancients from Taiwan, and Neolithic-Bronze Age WLR and Yellow River (YR) basin populations.
Neighbor-joining tree analysis based on a pairwise FST matrix revealed phylogenetic relationships among Guizhou Miao and representative modern AA, AN, TK, Sinitic-, TB-, and Altaic-related populations (Figure 6). HM-related populations formed a genetic clade intermediate between TK-related and Sinitic-related clusters; newly genotyped Miao groups displayed a close genetic affinity with each other, generally agreeing with the observed patterns in outgroup f3 statistics. Furthermore, we used TreeMix software to explore the population splits and potential gene flow based on genetic variations of modern East Asian populations. The maximum likelihood phylogenetic trees with predefined 0-3 migration edges strongly supported that Miao and other HM-speaking groups were genetically close and then clustered with TK and southern Han people, in agreement with the FST neighbor-joining tree analysis and outgroup f3 statistics, but we did not detect specific gene flow into the Miao groups (Figure 7).
To investigate genetic similarities and differentiation among the Miao studied here and published [...]
The YR farmer-related ancestry was mainly represented by Neolithic to Iron Age NEAs from the inland middle and upper YR basin. However, no ancient DNA data are available from the Yangtze River-related culture. We used all available ancient samples from southern China (Fujian, Taiwan, and Guangxi) and present-day indigenous groups (Qiongzhong Li, and Taiwan Amis and Atayal) to represent Yangtze River rice farmer-related ancestry, although this hypothesis can be tested only with future Yangtze River Valley-related ancient DNA data. Numerous significant positive Z-scores from f4(Yoruba, Yangtze River-related populations; East Asians, studied Miao) suggested that studied Guizhou Miao derived their main ancestry from Yangtze River rice farmers, partly supporting the southern origin of studied Miao (Figure 8). The significant positive Z-scores in f4(Yoruba, YR-related populations; SEAs, studied Miao) showed that Guizhou Miao harbored more inland YR-basin farmer-related ancestry than many SEAs, such as Maonan, Li, Muong, Dai, Kinh, Amis, Atayal, Tagalog, Mang, and Cambodian (Figure 8). Furthermore, to explore whether the Miao received additional gene flow compared with the potential source, we calculated symmetrical f4 statistics in the form f4(Yoruba, East Asians; studied Miao, YR/Yangtze River farmers) (Supplementary Figure S5). Intriguingly, we observed that nearly all present-day NEA and SEA reference populations (via SNP-array genotype) shared more derived alleles with the Guizhou Miao than with the hypothesized Yangtze River farmer-related ancients (via pseudohaploid calls), including Guangxi historical GaoHuaHua ancients, which had closer relationships with Guizhou Miao than with other published modern and ancient East Asian populations in both the descriptive analysis (i.e., PCA and Admixture) and quantitative analysis (i.e., outgroup f3 statistics and f4(Yoruba, GaoHuaHua; ref, studied Guizhou Miao) > 0). The results of f4(Yoruba, ref; TK-speaking Li, studied Guizhou Miao) suggested that studied Guizhou Miao received genomic influence mainly from NEA-related ancestry (e.g., Tibetan and Oroqen) after divergence from the present-day Qiongzhong Li population (via SNP-array genotype), who were assumed to represent proto-TK-speaking-related ancestry.
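The pairwise FST matrix that underlies the neighbor-joining tree (Figure 6) can be illustrated with a short sketch. The text does not state which FST estimator was used, so the Hudson-type ratio-of-averages estimator below is an assumption, as are the function names and the convention that sample sizes are counts of sampled chromosomes.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson-type FST between two populations (ratio of averages over SNPs).

    p1, p2: per-SNP allele frequencies; n1, n2: numbers of sampled
    chromosomes (assumed constant across SNPs in this sketch).
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)
           - p2 * (1 - p2) / (n2 - 1))
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return float(num.sum() / den.sum())

def pairwise_fst(freqs, sizes):
    """Symmetric FST matrix for a dict of population -> allele frequencies."""
    names = list(freqs)
    m = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                m[i, j] = m[j, i] = hudson_fst(freqs[a], freqs[b],
                                               sizes[a], sizes[b])
    return names, m
```

A distance matrix of this kind is what a neighbor-joining program takes as input; the tree-building step itself is not reproduced here.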
Focusing on f4(Yoruba, studied Miao; NEAs 1, NEAs 2), we observed that Guizhou Miao shared more alleles with northern Chinese Han, Japanese, and Korean populations than with other TB or Altaic speakers. The observed significant Z-scores of f4(Yoruba, studied Miao; ancient 1, ancient 2) also suggested that the Guizhou Miao showed strong genetic affinity with historical Guangxi ancients (GaoHuaHua, Shenxian, LaCen, Layi, Yiyang, and BabanQinCen), Iron Age Gongguan and Hanben ancients from Taiwan, and Neolithic to Iron Age populations from the YR basin (YR_MN, YR_LN, YR_LBIA, and Upper_YR_IA) and WLR basin (WLR_LN) relative to other published ancients.
To elucidate the fine-scale population substructure within Miao populations (i.e., Hunan Miao, Vietnam Hmong, and newly collected Miao Congjiang and Miao Xijiang from Guizhou), we subsequently conducted a series of symmetrical f4 statistics in the form f4(Yoruba, 127 HO-based reference East Asians; Miao population 1, Miao population 2). The significant f4 values confirmed genetic differentiation among the Miao populations from different geographic locations. HM-speaking Hmong, PaThen, and historical GaoHuaHua shared more derived alleles with Miao Congjiang than with Miao Xijiang and Hunan Miao. No significant Z-scores in f4(Yoruba, East Asians; Miao Congjiang, Hmong) indicated that Miao from Congjiang and Vietnam formed a clade relative to the references used (Figure 9A), while historical GaoHuaHua and Miao Congjiang shared significantly more alleles with Hmong than did Miao Xijiang. With the threshold of Z-scores set to 2, Han Guangdong, TK-speaking Mulam, AN-speaking Atayal, Tagalog, and AA-speaking Kinh shared more derived alleles with Miao Xijiang than with Hmong (i.e., f4(Yoruba, East Asians; Miao Xijiang, Hmong), -2.426 < Z-score < -2.027), suggesting that geographically close non-Miao populations might have played an important role in the formation of Miao Xijiang (Figure 9B). Hunan Miao harbored more Tibetan/YR farmer/Oroqen-related ancestry and less HM/TK-speaking-related ancestry (e.g., Dong, CoLao, PaThen, and GaoHuaHua) compared with other Miao groups, suggesting Hunan Miao received genomic influence mainly from NEA-related populations, while Guizhou Miao and Vietnam Miao may have admixed with the surrounding TK-related speakers at different levels (Figure 9C).
[...] genetic heterogeneity between Hunan Miao and Guizhou Miao (p-value < 0.0021989016 for rank = 0), suggesting the genetic profile of the Hunan Miao differed from the Guizhou Miao studied here. Miao Xijiang and Miao Congjiang were relatively genetically homogeneous (p-value = 0.434071112 for rank = 0).

Exploring Admixture History and Estimating Admixture Proportions for Each Studied Miao Group
We performed admixture f3 statistics in the form f3(source 1, source 2; studied Guizhou Miao) to explore potential admixture signals (Figure 10). Z-scores < -3 suggested that the allele frequency of studied Guizhou Miao is between that of source 1 and source 2. The statistically significant positive f3 values in f3(Hmong, non-Hmong; Miao Congjiang) suggest that Miao Congjiang did not receive substantial gene flow from other language family groups. We observed significant negative Z-scores for Miao Xijiang when we combined Miao Congjiang with one NEA source (e.g., Korean; Altaic-speaking Oroqen and Mongolian; TB-speaking highland Tibetan and Tu; Chinese Han from Jiangsu, Hubei, Shanxi, and Henan) or SEA source (e.g., AN-speaking Tagalog and Atayal; AA-speaking Kinh; TK-speaking Zhuang, Thai, and Mulam), indicating that Miao Xijiang was a genetically admixed population. There were more complex admixture events between Miao Xijiang and surrounding populations compared with Miao Congjiang. However, we observed that one source from Amur River, WLR, or coastal Siberia (AR_IA, AR_EN, or Boisman_MN) and the other related to Late Neolithic SEAs (Vietnam_LN, Malaysia_LN, Indonesia_LN_BA, or Tanshishan_LN) could generate the top negative f3 values for both Miao Xijiang and Miao Congjiang, suggesting possible north-south admixture patterns for Guizhou Miao.
We calculated the admixture proportions via a series of qpAdm-based two-way admixture models (Figure 11). We used spatiotemporally diverse YR farmers (i.e., Neolithic to Iron Age YR farmers from the central and upper YR basins: Shimao_LN, Upper_YR_LN, Upper_YR_IA, YR_LN, YR_MN, and YR_LBIA) as NEA-related ancestry; we used Yangtze River basin farmer-related ancestry as the proxy of SEAs. Guizhou Miao derived 61.9-77% of its ancestry from Yangtze River-related populations (represented by coastal SEA Liangdao2) and 23-38.1% YR-related ancestry, while Hunan Miao harbored more NEA-related ancestry (41.1-56.6%) than did the Guizhou Miao.

Discussion
Several lines of genetic evidence have repeatedly supported that numerous population divergence, migration, and admixture events contributed to the patterns of population structure in East Asians (Ning et al. 2020; Yang et al. 2020; C.-C. Wang et al. 2021; T. Wang et al. 2021). Nearly half of the HM-speaking Miao populations inhabit Guizhou Province in Southwest China. However, limited sample sizes and low-resolution genetic markers failed to reveal the comprehensive admixture history of Miao from different tribes. Here, we compared the high-density SNP data of 17 present-day Miao individuals from two Miao tribes in Guizhou Province with all published Neolithic to historical East Asian ancients, as well as modern East Asians from HM, TK, AA, AN, ST, and Altaic language families, to provide genomic insights into the demographic history of the Guizhou Miao and their relationships with ancient and modern East Asians.
Our genomic evidence based on PCA, Admixture, pairwise FST, TreeMix, and allele-sharing-based f-statistics suggests that, fitting for their common linguistic affiliations, HM speakers had a strong genetic affinity with SEAs, especially neighboring TK speakers, which was consistent with the genetically close relationships revealed via low-density genetic markers and genome-wide SNP data from other Miao tribes from Southwest China (Zhang et al. 2015; Chen et al. 2018, 2019a, 2019b; Han et al. 2019; Zhang et al. 2019; Fan et al. 2020; Feng et al. 2020; Tang et al. 2020; Huang et al. in press; Luo et al. 2021).
Furthermore, we observed genetic differentiation among HM speakers from Guizhou (Southwest China), Hunan (Central China), and Vietnam (mainland Southeast Asia). Based on the PCA plot and Admixture clustering patterns, we identified strong associations between geographical locations and the population substructure of HM speakers. Our results from admixture f3 analysis and symmetrical f4 statistics indicate that the Miao Xijiang population harbors more NEA-related ancestry and neighboring TK- and AN-related ancestry and was an admixture between studied Miao Congjiang-related ancestry and representative NEAs/SEAs, suggesting geographically close ethnic-group-related ancestry contributed to the gene pool of the Miao Xijiang, consistent with the observed bidirectional gene flow between Vietnam/Guangxi/Guizhou Hmong and geographically close TK and ST speakers (Fan et al. 2020; Liu et al. 2020; Huang et al. in press). The Miao Congjiang formed a clade with Vietnam Hmong and could be modeled as nearly unmixed populations of Vietnam Hmong-related ancestry. Hunan Miao displayed more NEA-related ancestry and less SEA-related ancestry than did the Guizhou Miao and Vietnam Hmong, which may relate to possible intermarriage with Han-related immigrants, in agreement with historical accounts of multiple waves of migration of the Miao people.
C.-C. Wang et al. (2021) reconstructed the deep history of East Asians, demonstrating that YR-related farmers spread the ST language and admixed with southern indigenous populations and then contributed to the formation of the north-south Han-related cline. The population movement of Yangtze River-related farmers spread the TK, AA, and AN languages, accompanied by admixture events with surrounding populations. Archaeological and historical evidence supports the link between proto-HM speakers and Neolithic Yangtze River rice farmers related to the Daxi culture (5,300-6,400 YBP) and the Qujialing culture (4,600-5,000 YBP) (Yu and Li 2021). Although ancient samples from the Neolithic Yangtze River Valley are not available, ancient Gongguan and Hanben individuals from Taiwan (who had additional gene flow from NEA-related ancestry) could be regarded as their nearly direct descendants (C.-C. Wang et al. 2021). Based on a series of f4 statistics, we observed that studied Guizhou Miao groups were not the direct descendants of representative YR/Yangtze River farmer-related ancestry; they received additional gene flow mainly from ST-related populations and surrounding TK-related populations. The results of qpAdm-based two-way admixture models revealed that the Guizhou Miao derived their primary ancestry from ancient coastal SEAs, whereas Hunan Miao harbored more ancient inland NEA-related ancestry. The difference in ancestry proportions supports the fine-scale population substructure of Miao groups. Miao people from different geographic locations, even within the same province, displayed detectable genetic differentiation, mirroring the complex demographic history of HM-speaking Miao populations, consistent with the uniparental genetic evidence (Wen et al. 2005) and the Central China-Southwest China-Southeast Asia route according to the historical records (Li 2000).
Whole-genome sequence data of Miao individuals, which carry more genetic information than SNP arrays, are essential for reconstructing the deep history of the Miao people. Facilitated by capture/shotgun sequencing technology, we obtained ancient DNA in Southwest China (Guangxi) (T. Wang et al. 2021) and Southeast China (Fujian, Taiwan) (Yang et al. 2020; C.-C. Wang et al. 2021). Once the data gap of Yangtze River rice farmer-related ancestry is also addressed, it can provide further insights into the complex prehistoric evolutionary history and the genetic origin of present-day East Asians.

FIGURE 3. Model-based Admixture results with four predefined ancestral sources. Yellow, AA-speaking Mang-related ancestry; blue, component shared by AN-related populations; pink, NEA-related ancestry; orange, component enriched in HM-speaking Hmong and historical Guangxi GaoHuaHua.

FIGURE 5. Outgroup f3 profile in the form f3(ancient ref, studied Guizhou Miao; Yoruba) based on the merged 1240K data set. Error bars denote standard error. Higher f3 values indicate closer genetic relationship.

FIGURE 6. Neighbor-joining tree based on the FST genetic distance matrix among modern East Asians.

FIGURE 7. TreeMix. The maximum likelihood model with 0 migration events revealed the phylogenetic relationships among modern East Asians.

FIGURE 10. Admixture f3 statistics in the form f3(source 1, source 2; target). Significant negative values (Z < -3) denote that the target could be modeled as an admixture between source 1 and source 2.

FIGURE 11. Estimating admixture proportions via the two-way admixture model. Error bars denote standard error. The p-value of each qpAdm model appears next to the population name.

SUPPLEMENTARY FIGURE S4. Assessing allele sharing between our Miao sample and ancient reference populations via f4(Yoruba, studied Miao; ancient East Asians 1, ancient East Asians 2).