Ethnic Origin of Crime Scene Evidential Materials Determination in Three Main Ethno-linguistic Population Groups in Nigeria

Original Research Article. Agbo et al.; ARRB, 12(4): 1-8, 2017; Article no. ARRB.32783

DNA analysis using autosomal short tandem repeat (microsatellite) polymorphism is a useful tool for forensic purposes such as individual identification, stain analysis and paternity testing. Such analyses compare profiles from questioned or crime scene samples with those from suspects, victims or a database. In some instances, the profile generated matches neither a suspect nor the database. The objective of the current study is to identify population-specific markers that show distinct genetic variability among the three main ethno-linguistic population groups in Nigeria, so that the profiles generated can be used to infer the ethnic origin of test samples from these populations in an ethnically blinded test. Allele frequencies for each ethnic group were generated from 315 unrelated individuals representing the three populations (Ibo, Hausa and Yoruba) using 15 microsatellite (STR) loci from Applied Biosystems. Multi-locus genotype frequencies were used to test conformity with Hardy-Weinberg equilibrium. The chi-square goodness-of-fit test showed seven loci to be in Hardy-Weinberg equilibrium. However, after Bonferroni correction all loci were found to conform to Hardy-Weinberg equilibrium. The allele frequencies generated for each population were then tested by determining the ethnic origin of twenty randomly collected test samples in an ethnically blinded test. The ethnic origins of the twenty samples were correctly determined with 99.5% success, using the principle that the ratio of profile frequencies for the same profile in different ethnic groups is a likelihood ratio.


INTRODUCTION
Nigeria is located on the Gulf of Guinea in the western part of Africa, with an estimated population of 183 million according to the July 2016 estimate of the American Central Intelligence Agency (CIA) World Factbook [1]. It comprises 527 ethno-linguistic groups, 520 of them in existence and 7 extinct [2]. Each ethnic group is associated with a distinct language belonging to one of the three main African linguistic families: Niger-Kordofanian, Afro-Asiatic and Nilo-Saharan [3].
The three main ethnic groups in Nigeria (Hausa, Yoruba and Ibo) constitute over 68% of its total population. The Hausa ethnic group is found in the North and belongs to the Afro-Asiatic linguistic family, in the subphylum 'Chadic', together with the Angas, Bachamas, Tangale, Bokkos and others. They are reported to have migrated from Upper Egypt in North Africa. The Yoruba and Ibo ethnic groups are found in the West and East respectively. They belong to the Niger-Kordofanian linguistic family, in the subphylum 'Kwa', together with the Edo, Itsekiri, Tiv, Igala, Idoma and others. Studies suggest that they migrated from East Africa [4]. Linguistic diversity in Nigeria is attributed to this complex history of migration and settlement over a long period of time, which altered the linguistic landscape of the country and resulted in a rich heritage of cultural, linguistic and ethnic histories [5]. Populations separated from one another for a long time will, through genetic drift and mutation, become statistically differentiated by their differing allele frequencies [6]. Alleles that show a large frequency differential between populations are referred to as population specific alleles (PSAs) [6,7].
Many studies have shown that most human genetic variation, about 80% to 90%, is observed within populations, and only about 10% to 20% is due to differences between populations [8,9].
Genetic markers found in one population and not in another were first used by Neel, who referred to them as private alleles, to estimate mutation rates. Chakraborty called them "unique alleles" and concluded that those with the largest allele frequency differences among populations are the most useful for forensic and admixture mapping studies. Shriver referred to them as population specific alleles (PSAs) and stated that they are markers which exhibit a large frequency differential of more than 50% [10].
Population specific alleles of forensic relevance can be used to establish the ethnic origin of crime scene evidential materials. Klinstar supported this conclusion and further observed that the frequency differences between populations could be used to situate an individual within broad, geographically based groups by a genetic clustering method. According to the same author, a forensic sample is associated with the population in which its profile is most common [11]. Brenner established that the ethnic origin of a sample can be determined using the principle that the ratio of the profile frequencies for the same profile in different ethnic groups is a likelihood ratio [12].
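The likelihood ratio principle can be written out as follows. This is a sketch only: the notation (groups A and B, profile G with single-locus genotypes g_l, allele frequencies p) is introduced here for illustration, and the product form assumes independent loci and Hardy-Weinberg genotype proportions within each group.

```latex
\mathrm{LR}_{A:B}(G) = \frac{\Pr(G \mid A)}{\Pr(G \mid B)}
= \prod_{l=1}^{L} \frac{f_A(g_l)}{f_B(g_l)},
\qquad
f_X(g_l) =
\begin{cases}
p_{X,i}^{2} & \text{if } g_l = (i,i),\\
2\,p_{X,i}\,p_{X,j} & \text{if } g_l = (i,j),\ i \neq j.
\end{cases}
```

A ratio greater than 1 indicates that the profile is more common in group A than in group B.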

Populations
Twenty different communities in six states, within the three ethnic groups in Nigeria, were selected based on historical antecedents and ethnic affiliations. Six communities were selected from the Hausa ethnic group, and seven communities from each of the Ibo and Yoruba ethnic groups. The genealogy of each participant was traced to the fourth generation within the target population by self-declaration.

Short Tandem Repeat (STR) Markers
Fifteen microsatellite (STR) loci were utilized in the analysis. They include the 13 core CODIS loci widely used in forensic DNA analysis and two additional forensically relevant loci. All the selected loci have tetranucleotide repeat motifs and are located on 13 different chromosomes. The loci CSF1PO and D5S818 are both located on chromosome five, and D2S1338 and TPOX are both located on chromosome two, but they are unlinked, which makes them suitable for the study.

DNA Extraction
Mouth swabs were collected from three hundred and fifteen (315) consenting individuals, one hundred and five (105) from each ethnic group, using DNase-free cotton swabs from Sirche Forensic Inc. DNA was extracted from each sample using the Applied Biosystems PrepFiler® forensic DNA extraction kit according to the manufacturer's protocol.

Amplification and Genotyping
The extracted DNA was then amplified using an Applied Biosystems GeneAmp 9700 Thermal Cycler. This was done in a single multiplex reaction for each individual using AmpFlSTR® Identifiler® Direct Kit PCR reagents according to the manufacturer's protocol, with 26 cycles as validated by the laboratory (Applied Biosystems AmpFlSTR® Identifiler® Direct Kit PCR Manual, 2012) [13]. The amplified DNA fragments (amplicons) were separated by capillary electrophoresis using a 310 Genetic Analyzer according to the validated method of the laboratory.

Statistical Analysis
The allele frequency for each locus was calculated from the number of genotypes in the sample set by the gene counting method, since the STR loci are autosomal. Conformity with Hardy-Weinberg genotype frequencies was assessed by two tests: the chi-square test and the exact test for multi-allelic loci [11]. Bonferroni correction was used to ascertain significant departures from Hardy-Weinberg equilibrium [14].
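As an illustration of gene counting and the chi-square test against Hardy-Weinberg expectations, a minimal Python sketch is given here. The function names and the sample genotypes are hypothetical and are not study data; the sketch returns the test statistic and degrees of freedom only, and a p-value would have to be obtained separately from the chi-square distribution.

```python
from collections import Counter
from itertools import combinations_with_replacement


def allele_frequencies(genotypes):
    """Gene counting: each diploid genotype contributes two alleles."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)
    return {allele: n / total for allele, n in counts.items()}


def hwe_chi_square(genotypes):
    """Chi-square goodness-of-fit statistic against Hardy-Weinberg expectations."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    statistic = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected proportion: p^2 for homozygotes, 2pq for heterozygotes.
        proportion = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected = n * proportion
        statistic += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = len(freqs)
    # Genotype classes minus number of alleles: k(k+1)/2 - k.
    degrees_of_freedom = k * (k - 1) // 2
    return statistic, degrees_of_freedom


# Hypothetical genotypes at one locus (allele repeat numbers), not study data.
sample = [(8, 8), (8, 9), (8, 9), (9, 9), (8, 9), (9, 9), (8, 8), (8, 9)]
freqs = allele_frequencies(sample)
chi2, dof = hwe_chi_square(sample)

# Bonferroni: divide the nominal 0.05 level by the number of loci tested.
bonferroni_alpha = 0.05 / 15
```

With 15 loci, the Bonferroni-adjusted per-locus significance level is 0.05/15, approximately 0.0033.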
Ethnic origin inferences were carried out by pairwise likelihood ratio analysis of the genotype frequencies of the twenty samples collected in Lagos using the different ethnic group allele frequencies generated by the study [12].

Allele Frequency Database Generation and Power of Discrimination (PD) Determination
Genotype data were generated from the DNA profiles obtained. These were used to calculate the allele frequencies for each ethno-linguistic group across the 15 loci. Combined genotype and allele frequencies for the three ethno-linguistic groups were also generated. The DNA profiles generated were used to calculate the power of discrimination (PD) across the 15 loci using Power Stat from Promega Inc [15].
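For orientation, PD at a single locus is commonly defined as one minus the sum of the squared observed genotype frequencies. The Python sketch below assumes that definition; whether the software used in the study applies exactly this formula is an assumption, and the function name and sample genotypes are hypothetical.

```python
from collections import Counter


def power_of_discrimination(genotypes):
    """PD = 1 - sum of squared observed genotype frequencies at one locus."""
    n = len(genotypes)
    # Sort each genotype so that (6, 7) and (7, 6) are counted together.
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical genotypes at one locus, not study data.
sample = [(6, 7), (7, 6), (7, 9), (9, 9)]
pd_value = power_of_discrimination(sample)
```

The more evenly the individuals are spread over many genotypes, the closer PD gets to 1.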

Determination of Ethnic Origin
Mouth swabs were collected from 20 randomly selected, unrelated, consenting subjects in Lagos, in the south-western part of Nigeria, without lineage screening for the three ethnic groups. DNA was extracted from the samples, amplified, separated and genotyped across the 15 loci. The allele frequencies obtained from the study were used to calculate the genotype frequencies of each of the 20 individuals across the 15 loci. The likelihood ratio of the genotype frequencies for each individual across the 15 loci was also calculated. Then, for each individual, the percentage of loci at which each of the three ethno-linguistic groups gave the highest likelihood ratio was calculated. Each individual was then assigned to the ethnic group with the largest number of loci showing the highest likelihood ratio.
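The assignment rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's implementation: genotype frequencies are taken as Hardy-Weinberg proportions, the `floor` value for alleles missing from a group's table is an arbitrary choice made here, and all names and frequency tables are hypothetical.

```python
from collections import Counter


def genotype_frequency(genotype, allele_freqs, floor=0.005):
    """Hardy-Weinberg genotype frequency. `floor` is an assumed minimum
    frequency for alleles absent from a group's table."""
    a, b = genotype
    p = allele_freqs.get(a, floor)
    q = allele_freqs.get(b, floor)
    return p * p if a == b else 2 * p * q


def count_locus_wins(profile, tables):
    """Count, per group, the loci at which that group gives the highest
    genotype frequency, so that every pairwise likelihood ratio in its
    favour exceeds 1. Tied loci are not counted for any group."""
    wins = Counter()
    for locus, genotype in profile.items():
        scores = {group: genotype_frequency(genotype, table[locus])
                  for group, table in tables.items()}
        best = max(scores.values())
        leaders = [g for g, s in scores.items() if s == best]
        if len(leaders) == 1:
            wins[leaders[0]] += 1
    return wins


# Hypothetical allele frequency tables for three loci, not study data.
tables = {
    "Hausa": {"TH01": {6: 0.2, 7: 0.5, 9: 0.3},
              "FGA": {22: 0.4, 24: 0.6},
              "vWA": {15: 0.6, 16: 0.4}},
    "Ibo": {"TH01": {6: 0.4, 7: 0.3, 9: 0.3},
            "FGA": {22: 0.3, 24: 0.7},
            "vWA": {15: 0.5, 16: 0.5}},
    "Yoruba": {"TH01": {6: 0.3, 7: 0.4, 9: 0.3},
               "FGA": {22: 0.5, 24: 0.5},
               "vWA": {15: 0.4, 16: 0.6}},
}
profile = {"TH01": (7, 7), "FGA": (24, 24), "vWA": (15, 15)}

wins = count_locus_wins(profile, tables)
assigned = wins.most_common(1)[0][0]
```

In this toy example two of the three loci favour the first group, so the profile is assigned to it. A sample whose leading groups win the same number of loci would remain unresolved under this rule.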

RESULTS
Seven of the 15 loci used in the current study (TH01, vWA, D2S1338, D3S1358, D16S539, D18S51 and D21S11) were found to be in conformity with Hardy-Weinberg equilibrium using the chi-square goodness-of-fit test. However, after Bonferroni correction all the loci were found to be in conformity with Hardy-Weinberg equilibrium (Table 1). The exact test for multi-allelic loci also showed all loci to be in conformity with Hardy-Weinberg proportions of genotype frequencies.

Allele Frequencies
The variation in allele frequency distribution across the 15 loci for the combined population was substantial, with a population variance of 7.13257 ±0.2166 (Fig. 1). The Hausa ethnic group had the highest variation in allele frequency distribution, with a population variance of 7.3742 ±1.045. The Ibo ethnic group had the lowest variation, with a population variance of 6.8095 ±0.96, while the Yoruba ethnic group had a population variance of 7.00581 ±0.99 (Fig. 2). These differences in variance between the ethnic groups are very small and not statistically significant.

Power of Discrimination (PD)
Power of Discrimination (PD), an important parameter for individual identification, ranged from 0.829 at the TH01 locus in the Ibo ethnic group to 0.975 at the FGA locus in the Hausa ethnic group, with a population variance of 0.00126 ±0.0080 (Table 2). Pairwise comparison of the population variance among the three ethno-linguistic groups showed that the Ibo and Yoruba ethnic groups have the lowest difference, 0.00001 (Fig. 3).

Ethnically Blinded Test
The sum of the likelihood ratios of the genotype frequencies across the 15 loci in the ethnically blinded test gave the correct ethnic origin for 19 of the 20 randomly collected samples from unrelated subjects from the three populations. The remaining sample gave the same value for both the Hausa and Ibo groups. However, a crosscheck of the ethnic lineage recorded on the individual's questionnaire showed that the individual was of mixed ancestry, with an Ibo father and a Hausa mother (Fig. 4).
The percentage success of each locus in determining the ethnic origin of a sample among the three population groups showed that CSF1PO, D3S1358 and D16S539 had equal success in both Ibo and Yoruba, while D2S1338 and D13S317 had equal success in both Hausa and Ibo. D5S818 and D7S820 had the highest success in the Ibo ethnic group. D13S317, D19S433, D21S11, FGA and TPOX had the highest success in the Yoruba ethnic group. D8S1179, D18S51, TH01 and vWA had the highest success in the Hausa ethnic group (Fig. 5).

DISCUSSION
Microsatellites are highly polymorphic and the most suitable genetic markers for studies in the field of population genetics [9]. In the current investigation, they have been used to infer the ethnic origin of test samples. The observed deviations in the chi-square test for conformity of the genotype frequencies with Hardy-Weinberg (HW) equilibrium are due to the sample size. This resulted in the absence of some alleles at some loci, since it is practically impossible to capture all the alleles in the population in the sample drawn, as well as in allele combinations of both high and low frequencies, which usually produce deviations from the equilibrium [6]. However, no individual locus deviated significantly from HW equilibrium after application of Bonferroni correction for the 15 loci, which adjusted the P-value to 0.003. The exact test also showed all the loci to be in conformity with HW equilibrium. These results agree with earlier studies of such population structure [14,16].

Fig. 4. Percentage allele frequency proportions of the twenty test samples

Fig. 5. Percentage success of ethnic determination locus-wise