Published October 20, 2023 | Version v1
Dataset Open

Supplementary Data - Haplotype-based inference of recent effective population size in modern and ancient DNA samples

Description

This repository contains the simulated data analyzed in our manuscript titled "Haplotype-based Inference of Recent Effective Population Size in Modern and Ancient DNA Samples". The data is split into 7 datasets:

- demographies.tar.gz: the simulated demographic models.
- modern_data.tar.gz: simulated SNP-array data. The genotypes were simulated under 4 demographic histories (see demographies.tar.gz) for 256 samples and under 10 different random seeds (Replicates 1-10). Additionally, Replicates 11 and 12 include simulations at larger sample sizes. Note that the results at lower sample sizes can be obtained by keeping the first N samples of the simulated files.
- true_ibd.tar.gz: the IBD segments from the simulated modern_data (ground truth from the ARGON simulator).
- ancient_data.tar.gz: simulated aDNA data at different coverages (MISSING_$M, where $M = exp(-coverage)), sample sizes, demographic models, and random seeds. This archive is split into three parts; see below for instructions on how to reconstruct it.
- structure.tar.gz and admixture.tar.gz: data simulated under more complex demographic histories involving 2 isolated populations (structure.tar.gz) or a single population undergoing a recent admixture event (admixture.tar.gz). The manuscript provides more details about the demographic histories.
- imputed.tar.gz: simulated imputed aDNA data. Each region was simulated independently. Each folder corresponds to a chromosome arm and contains:
    - data.anc_array.npy: list of the SNPs included in the analysis (simulating a 1240k array)
    - data.glimpse.vcf.gz: ancient population data as phased by GLIMPSE v1
    - data.glimpse.vcf.gz.csi: index for the above VCF file
    - data.imputed.vcf.gz: imputed ancient population data (unphased, including dosages and genotype posteriors)
    - data.imputed.vcf.gz.csi: index file for the above VCF
    - data.map.gz: genetic map for GLIMPSE (tab-separated format: pos chr cM)
    - data.ref.tsv.gz: file used for calculating genotype likelihoods with the BCFtools mpileup command, to be used as input for GLIMPSE (format: chromosome position ref_allele,alt_allele)
    - data.ref.tsv.gz.tbi: index for the above file
    - data.ref.vcf.gz: simulated sequencing data from the reference panel
    - data.ref.vcf.gz.csi: index file for the above VCF
    - data.target_ground_truth.vcf.gz: simulated sequencing data for the ancient population (ground truth)
    - datalist.txt: list of genotype likelihood files for each target individual (to be used for merging into a single file)
    - data.temp.map.gz: genetic map for data creation with the msprime simulator (format: chr position rate(cM/Mb) cM)
    - data.tree: msprime simulator output containing all samples, both reference panel samples and ancient population samples
    - data.vcf.gz: VCF containing ground truth sequencing data, phased genotypes for reference panel samples and for ancient population samples
    - dataref.fa.fai: index for reference fasta file used during reads creation
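The coverage convention used in the ancient_data folder names (MISSING_$M, where $M = exp(-coverage)) can be evaluated with a short shell helper. This is a minimal sketch: the function name missing_fraction is hypothetical, and the number of decimals printed here is an assumption that may not match how $M is formatted in the actual folder names.

```shell
# Hypothetical helper: print M = exp(-coverage) for a given coverage.
# The 4-decimal formatting is an assumption, not the repository's convention.
missing_fraction() {
  awk -v c="$1" 'BEGIN { printf "%.4f\n", exp(-c) }'
}

missing_fraction 1    # prints 0.3679
missing_fraction 0.5  # prints 0.6065
```

Higher coverage gives a smaller $M, so folders with a small MISSING value correspond to the better-covered simulations.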

 

Note that some of the datasets have been split into multiple parts. For example, admixed.tar.gz has been split into three parts: admixed.tar.gz-part-aa, admixed.tar.gz-part-ab, and admixed.tar.gz-part-ac.
To reconstruct the archive, concatenate the parts in order (adjust the pattern to match the part file names as downloaded):

cat admixed.tar.gz.part-* > admixed.tar.gz
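The reconstruction step can be wrapped together with a checksum check, since the record lists an MD5 value for every file. The sketch below is illustrative only: join_parts and verify_md5 are hypothetical helper names, md5sum is assumed to be available (GNU coreutils; macOS ships md5 instead), and the expected checksum must be copied from the record page for the file in question.

```shell
# Hypothetical helper: concatenate part files, in the order given, into one archive.
# usage: join_parts <output> <part> [<part> ...]
join_parts() {
  out="$1"
  shift
  cat "$@" > "$out"
}

# Hypothetical helper: succeed only if the file's MD5 matches the expected value.
# Accepts the checksum with or without the "md5:" prefix shown on the record page.
# usage: verify_md5 <file> <expected_md5>
verify_md5() {
  expected="${2#md5:}"
  actual="$(md5sum "$1" | awk '{ print $1 }')"
  [ "$actual" = "$expected" ]
}

# Example usage (not run here; <expected_md5> is a placeholder):
#   join_parts admixed.tar.gz admixed.tar.gz.part-*
#   verify_md5 admixed.tar.gz-part-aa <expected_md5> && echo "checksum OK"
#   tar -xzf admixed.tar.gz
```

The shell expands the glob in lexical order, so the aa, ab, ac suffixes are concatenated in the right sequence. Checking each downloaded file before concatenating is the safer order, because the listed checksums presumably refer to the files as hosted rather than to the reassembled archive.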


 

Files

Files (47.4 GB)

Checksum Size
md5:4e266637a9b1f13c309500ec92887325 3.2 GB
md5:801bd46ccaa5c53792c79137be7eb13f 3.2 GB
md5:6de8821898578c3e04260a26f43c4837 3.2 GB
md5:aefef28bf8026e3415cce66d3b7c87a3 3.2 GB
md5:c44d6ab92eda2d3befc3b881c9fd214f 3.2 GB
md5:708e9b9f51fb5f8fdf6e260536607b1c 3.2 GB
md5:04d8100242ceceb196d4a94015fc88e3 3.2 GB
md5:f887981e4e4f19c5fdb87a764a18b722 3.2 GB
md5:76fc9c6745f3d7073054f22105636951 3.2 GB
md5:67b8d8c075a142be4b6499becce551db 3.2 GB
md5:d73721f61310598acb5e36b953fce062 3.2 GB
md5:9cc4186fba1195008dfa7f3628de717a 571.8 MB
md5:b23ecba310507fec1dda34a968c0901b 1.2 kB
md5:b326bf0edca290ad3f2e5c87a0b728c3 3.9 GB
md5:fb6b67961c1c90a43da0e66ed36ba062 4.9 GB
md5:60120322eb196f92ff41476982ce111d 1.8 GB
md5:a20fdd856bd3d09a13757c04d2fa56a0 900.9 MB