Admixpop work package 1: simulated three-way admixture with local ancestries
Authors/Creators
Description
Overview: This dataset contains 30 replicates of a simulated three-way admixture scenario between African, East Asian, and European superpopulations. Each scenario consists of four populations: an African population (AFR), an East Asian population (EAS), a European population (EUR), and an admixed population.
File structure: The dataset consists of a set of compressed folders and files. Each folder contains the output of a single simulation replicate. Within each folder is a VCF file with phased haplotypes for admixed individuals (admixed.vcf.gz), a VCF file with phased haplotypes for ancestral (AFR, EAS, EUR) individuals (ancestral.vcf.gz), a file with the single-population ancestry of each individual (AFR, EAS, EUR, or admixed; global_ancestries.csv), a file with local ancestries for all individuals (local_ancestries.csv), and .tbi index files for the VCF files.
├── replicate1/
│ ├── admixed.vcf.gz
│ ├── admixed.vcf.gz.tbi
│ ├── ancestral.vcf.gz
│ ├── ancestral.vcf.gz.tbi
│ ├── global_ancestries.csv
│ └── local_ancestries.csv
...
└── replicate30/
    ├── admixed.vcf.gz
    ├── admixed.vcf.gz.tbi
    ├── ancestral.vcf.gz
    ├── ancestral.vcf.gz.tbi
    ├── global_ancestries.csv
    └── local_ancestries.csv
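As a quick sanity check after unpacking, the layout above can be verified with a short script. This is a minimal sketch, not part of the dataset: the function name `missing_files` and the assumption that all replicate folders sit directly under one root directory are illustrative only.

```python
from pathlib import Path

# File names taken from the folder listing above.
EXPECTED_FILES = (
    "admixed.vcf.gz",
    "admixed.vcf.gz.tbi",
    "ancestral.vcf.gz",
    "ancestral.vcf.gz.tbi",
    "global_ancestries.csv",
    "local_ancestries.csv",
)


def missing_files(root, n_replicates=30):
    """Return {replicate folder name: [missing file names]} for incomplete replicates.

    Assumes (hypothetically) that replicate1/ ... replicateN/ were all
    extracted directly under `root`.
    """
    root = Path(root)
    problems = {}
    for i in range(1, n_replicates + 1):
        folder = root / f"replicate{i}"
        missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
        if missing:
            problems[f"replicate{i}"] = missing
    return problems
```

An empty dictionary from `missing_files("path/to/extracted/data")` would indicate that every replicate folder holds all six expected files.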
The global_ancestries.csv files: A comma-separated file with two named columns, individual and population, and one row for each individual. The individual column contains the name of the individual and the population column contains the population that the individual belongs to. For example, the first 7 lines of this file (header plus six records) for replicate 1 are:
┌────────────────┬────────────┐
│ individual     │ population │
├────────────────┼────────────┤
│ EURanc_HG00368 │ EURanc     │
│ EURanc_HG02215 │ EURanc     │
│ EURanc_HG01513 │ EURanc     │
│ EURanc_NA11930 │ EURanc     │
│ EURanc_NA20529 │ EURanc     │
│ EURanc_NA12234 │ EURanc     │
└────────────────┴────────────┘
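To illustrate how this file might be consumed, the sketch below counts individuals per population using only the Python standard library. It assumes the header line reads exactly `individual,population`, as described above; the function name `count_populations` and the inline sample (copied from the example rows) are illustrative.

```python
import csv
import io
from collections import Counter


def count_populations(handle):
    """Count individuals per population label in a global_ancestries.csv handle."""
    reader = csv.DictReader(handle)
    return Counter(row["population"].strip() for row in reader)


# Inline sample mirroring the example rows above; for a real file, use
# open("replicate1/global_ancestries.csv", newline="") instead.
sample = io.StringIO(
    "individual,population\n"
    "EURanc_HG00368,EURanc\n"
    "EURanc_HG02215,EURanc\n"
    "EURanc_HG01513,EURanc\n"
    "EURanc_NA11930,EURanc\n"
    "EURanc_NA20529,EURanc\n"
    "EURanc_NA12234,EURanc\n"
)
counts = count_populations(sample)
```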
The local_ancestries.csv files: A comma-separated file with five named columns: individual, chromosome, haplotype, basepairs, and ancestry. In this file, each row is a haplotype block of consecutive loci with the same ancestry. The individual column contains the name of the individual that carries the haplotype block, the chromosome column contains the name of the chromosome that the haplotype block is on, the haplotype column denotes whether the haplotype block is on the paternally or maternally inherited haplotype, the basepairs column contains the base-pair range that the haplotype block spans, and the ancestry column contains the ancestry of the haplotype block. For example, the first 7 lines of this file (header plus six records) for replicate 1 are:
┌───────────────┬────────────┬───────────┬───────────────────┬──────────┐
│ individual    │ chromosome │ haplotype │ basepairs         │ ancestry │
├───────────────┼────────────┼───────────┼───────────────────┼──────────┤
│ admixed_85786 │ chr22      │ 1         │ 10516173:20036729 │ EUR      │
│ admixed_85786 │ chr22      │ 1         │ 20036766:20464190 │ EAS      │
│ admixed_85786 │ chr22      │ 1         │ 20464193:27856703 │ EUR      │
│ admixed_85786 │ chr22      │ 1         │ 27856777:33165365 │ AFR      │
│ admixed_85786 │ chr22      │ 1         │ 33165377:50807862 │ EUR      │
│ admixed_85786 │ chr22      │ 2         │ 10516173:50807862 │ AFR      │
└───────────────┴────────────┴───────────┴───────────────────┴──────────┘
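The haplotype blocks can be summarised into per-individual ancestry proportions. The sketch below is one possible way to do this, under two assumptions that the description does not confirm: the basepairs range is written as `start:end`, and both endpoints are inclusive. The function name `ancestry_fractions` is illustrative, and the inline sample is copied from the example rows above.

```python
import csv
import io
from collections import defaultdict

# Inline sample mirroring the example rows above.
SAMPLE = """individual,chromosome,haplotype,basepairs,ancestry
admixed_85786,chr22,1,10516173:20036729,EUR
admixed_85786,chr22,1,20036766:20464190,EAS
admixed_85786,chr22,1,20464193:27856703,EUR
admixed_85786,chr22,1,27856777:33165365,AFR
admixed_85786,chr22,1,33165377:50807862,EUR
admixed_85786,chr22,2,10516173:50807862,AFR
"""


def ancestry_fractions(handle):
    """Return {individual: {ancestry: fraction of covered base pairs}}.

    Block length is computed as end - start + 1 (inclusive endpoints are an
    assumption). Both haplotypes of an individual are pooled.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(handle):
        start, end = (int(x) for x in row["basepairs"].split(":"))
        totals[row["individual"]][row["ancestry"]] += end - start + 1
    return {
        ind: {anc: bp / sum(by_anc.values()) for anc, bp in by_anc.items()}
        for ind, by_anc in totals.items()
    }


fractions = ancestry_fractions(io.StringIO(SAMPLE))
```

Consecutive blocks in the example do not always abut (for instance, 20036729 is followed by 20036766), so the fractions here are taken over base pairs covered by blocks rather than over the full chromosome span.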
Simulation: The simulated data are based on bi-allelic data from the 1000 Genomes Project (see References). Only genotypes from the AFR, EAS, and EUR superpopulations were used, and only data from chromosome 22.
Files
Total size: 48.9 GB
md5:27ad24c37183a5e3fb3dcef342921689
Additional details
Funding
- Danmarks Frie Forskningsfond
- Unraveling Admixture-introduced Complex Genetic Variation in Genomic Research 4254-00060B
References
- Lowy-Gallego E, Fairley S, Zheng-Bradley X et al. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project [version 2; peer review: 2 approved]. Wellcome Open Res 2019, 4:50 (https://doi.org/10.12688/wellcomeopenres.15126.2)