Twigstats scripts and example dataset
Description
This repository provides all scripts needed to run Relate and Twigstats on imputed ancient genomes. It also includes a complete, self-contained example dataset; the same scripts should work unchanged on your own datasets.
Installation
- Please install bcftools if you haven't already (https://samtools.github.io/bcftools/howtos/install.html). Please make sure that the executable is added to your PATH and that BCFTOOLS_PLUGINS is set to the correct plugin path (see bcftools link).
- Please download Relate from https://myersgroup.github.io/relate/
- Please also install the R package Twigstats from https://leospeidel.github.io/twigstats/
- Optional: For plotting purposes and downstream analyses, please install the R packages:
  - relater from https://github.com/leospeidel/relater/
  - ggplot2
  - dplyr
  - tidyr
  - plyr
  - umap
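Before running anything, it can help to confirm that the prerequisites above are visible to your shell. The following is a minimal sketch, assuming a POSIX-compatible shell; the helper name `have_tool` is hypothetical and not part of the provided scripts. It only reports what it finds and does not install or change anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the named command is found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# bcftools and Rscript are both invoked by the pipeline described here.
for tool in bcftools Rscript; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done

# BCFTOOLS_PLUGINS should point at the bcftools plugin directory.
if [ -n "${BCFTOOLS_PLUGINS:-}" ] && [ -d "$BCFTOOLS_PLUGINS" ]; then
  echo "BCFTOOLS_PLUGINS=$BCFTOOLS_PLUGINS"
else
  echo "BCFTOOLS_PLUGINS is unset or not a directory"
fi
```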
Download
To run this on your own dataset, please download scripts.tgz and Relate_input_files.tgz.
To run the provided example, please additionally download example_data_chr1.tgz or example_data.tgz.
All output files that are generated by run_wg.sh are stored under results/.
Running the scripts
Please extract the tarballs, e.g. using tar -xzvf scripts.tgz.
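If you downloaded several of the archives, they can be extracted in one pass. This is a minimal sketch, assuming the archives sit in the current directory; the helper name `extract_if_present` is hypothetical. Archives that were not downloaded are skipped rather than treated as errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a gzipped tarball if it exists, otherwise skip it.
extract_if_present() {
  archive="$1"
  if [ -f "$archive" ]; then
    tar -xzf "$archive" && echo "extracted: $archive"
  else
    echo "skipped (not downloaded): $archive"
  fi
}

# Archive names as listed in the Download section.
for archive in scripts.tgz Relate_input_files.tgz example_data_chr1.tgz example_data.tgz; do
  extract_if_present "$archive"
done
```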
The script run.sh shows how to run everything in order for chromosome 1. The script run_wg.sh runs everything for the whole genome.
You can find the individual scripts that are being called under scripts/.
Input files
The directory example_data_chr1 stores files for only chromosome 1, whereas example_data stores files for the whole genome.
Under example_data/ and example_data_chr1/ you will find the following files:
- GLIMPSE imputed vcf, here named ancients_glimpse2_chr1.bcf.
- Modern vcf (e.g. 1000G), here named 1000GP_sub_chr1.bcf.
- A poplabels file listing population labels for each individual. Individuals have to appear in the same order as in the merged vcf file. The file should contain four columns: ID POP GROUP SEX. The second column is used for population assignment.
- A second poplabels file used for the MDS analysis. The second column should now list IDs of all individuals plotted in the MDS (i.e. should be identical to first column). The outgroup should be grouped together into one population.
- File containing sample ages in generations, two lines per sample (diploid), e.g. for 3 samples of ages 0, 10, and 100 generations:
0
0
10
10
100
100
- We provide all the other required Relate input files under Relate_input_files/. You can reuse these in your analysis.
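The sample ages file described above repeats each sample's age on two consecutive lines. As a minimal sketch, assuming you start from a list with one age (in generations) per sample, in the same order as the samples, the two-line format can be produced with awk. The function name `expand_ages` is hypothetical and not part of the provided scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one age per line on stdin and print each age
# twice, once per haplotype of a diploid sample. Blank lines are ignored.
expand_ages() {
  awk 'NF { print $1; print $1 }'
}

# Three samples of ages 0, 10, and 100 generations.
printf '%s\n' 0 10 100 | expand_ages
```

For the three ages above, this prints the six lines shown in the example (0, 0, 10, 10, 100, 100). Redirect the output to a file to use it as input.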
In this example, we use data from the 1000 Genomes Project dataset (Nature 2015). We additionally use low-coverage shotgun genomes from Anglo-Saxon contexts, British Iron/Roman Age, Irish Bronze Age, and the Scandinavian Early Iron Age (Cassidy et al., PNAS 2016; Martiniano et al., Nature Communications 2016; Anastasiadou et al., Communications Biology 2023; Schiffels et al., Nature Communications 2016; Gretzinger et al., Nature 2022; Rodriguez-Varela et al., Cell 2023). These were imputed using GLIMPSE (https://odelaneau.github.io/GLIMPSE).
Step by step guide
Please follow run.sh (chromosome 1 only). The script run_wg.sh will run the whole genome.
These scripts will:
- Run scripts/1_prep_vcf.sh to filter the imputed genotypes.
- Then run scripts/2_prep_Relate.sh to prepare Relate input files.
- Finally run scripts/3_run_Relate.sh to estimate genealogies.
We can use these Relate files for various analyses:
- You can run Twigstats and infer admixture proportions using Rscript scripts/4_run_Twigstats.R.
- You can estimate coalescence rates and population sizes using Rscript scripts/5_plot_popsize.R.
- You can run an MDS using Rscript scripts/6_plot_MDS.R.
To see the arguments required in each script, you can execute the script without arguments, e.g. by executing scripts/1_prep_vcf.sh or Rscript scripts/4_run_Twigstats.R.
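To review the arguments of all six scripts at once, each can be called without arguments in pipeline order. This is a minimal sketch, assuming it is run from the directory that contains scripts/ and that the shell scripts are executable; the helper name `show_usage` is hypothetical. It does not pass any arguments, so it makes no assumption about what each script expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a script with no arguments so that it prints
# its own usage message. R scripts are run through Rscript.
show_usage() {
  script="$1"
  if [ ! -f "$script" ]; then
    echo "not found: $script"
    return 1
  fi
  echo "== $script =="
  case "$script" in
    *.R) Rscript "$script" ;;
    *)   "$script" ;;
  esac
}

# Script names as listed in the step by step guide.
for s in scripts/1_prep_vcf.sh scripts/2_prep_Relate.sh scripts/3_run_Relate.sh \
         scripts/4_run_Twigstats.R scripts/5_plot_popsize.R scripts/6_plot_MDS.R; do
  show_usage "$s" || true   # a script may exit non-zero after printing usage
done
```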
The expected output is shown in the attached pdf.
Files
Fig1.pdf
Total size: 7.5 GB
Checksum | Size |
---|---|
md5:d577433626ee4923c4b3dcdf133515dc | 3.4 GB |
md5:8e7e3f23e95115f4cdfeb9c8f7865270 | 273.4 MB |
md5:86600acea139d78206757bb8eb1a55a8 | 152.0 kB |
md5:f5082440da3424f061143cd3fd2f674e | 2.3 GB |
md5:bd17f09a941a750ce7f3fe45072f3baf | 1.5 GB |
md5:72834206b3171c9203d3c88984cbd6e3 | 1.8 kB |
md5:eb6c325edfcfe702123c01af6da4d62c | 1.9 kB |
md5:d6e09afa397f60d1ac103420f7feb820 | 8.5 kB |
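The md5 checksums listed for the files can be used to confirm that a download completed intact. This is a minimal sketch, assuming the GNU coreutils `md5sum` command is available (on macOS the equivalent tool is `md5`); the helper name `verify_md5` is hypothetical. Pass the checksum without the "md5:" prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's md5 digest with an expected value.
# Returns 0 on a match and 1 otherwise.
verify_md5() {
  file="$1"
  expected="$2"
  actual=$(md5sum "$file" | awk '{ print $1 }')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)"
    return 1
  fi
}

# Usage (fill in the file you downloaded and its listed checksum):
#   verify_md5 <downloaded-file> <md5-from-the-file-list>
```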
Additional details
Software
- Repository URL: https://leospeidel.github.io/twigstats/