Twigstats scripts and example dataset

Speidel, Leo

doi:10.5281/zenodo.13880459

Published October 2, 2024 | Version v1

Dataset | Open access

This repository provides all scripts needed to run Relate and Twigstats on imputed ancient genomes. We also provide a complete, self-contained example dataset, but you should be able to use exactly the same scripts on your own datasets.

Installation

Please install bcftools if you haven't already (https://samtools.github.io/bcftools/howtos/install.html). Make sure that the bcftools executable is on your PATH and that BCFTOOLS_PLUGINS points to the bcftools plugin directory (see the bcftools installation instructions linked above).
Please download Relate from https://myersgroup.github.io/relate/
Please also install the R package Twigstats from https://leospeidel.github.io/twigstats/
Optional: for plotting and downstream analyses, please also install the following R packages:
- relater from https://github.com/leospeidel/relater/
- ggplot2
- dplyr
- tidyr
- plyr
- umap
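Before running anything, it can help to confirm that these tools are visible to your shell. The following is a minimal, hypothetical pre-flight check written in Python. It is not part of the released scripts. It assumes the executables are named bcftools and Rscript, and it only verifies that they are on PATH and that BCFTOOLS_PLUGINS names an existing directory. It does not check Relate or the R packages.

```python
import os
import shutil


def check_environment(executables=("bcftools", "Rscript")):
    """Hypothetical pre-flight check (not part of the released scripts).

    Returns a dict mapping each requirement to True (found) or False (missing).
    """
    # An executable counts as available if it can be found on PATH.
    status = {exe: shutil.which(exe) is not None for exe in executables}
    # Treat BCFTOOLS_PLUGINS as a path list and require at least one
    # entry to be an existing directory.
    plugin_dirs = [d for d in os.environ.get("BCFTOOLS_PLUGINS", "").split(os.pathsep) if d]
    status["BCFTOOLS_PLUGINS"] = any(os.path.isdir(d) for d in plugin_dirs)
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Running the file prints one OK or MISSING line per requirement. Fix anything reported as MISSING before extracting and running the scripts.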

Download

To run the scripts on your own dataset, please download scripts.tgz and Relate_input_files.tgz.

To run the provided example, please additionally download either example_data_chr1.tgz (chromosome 1 only) or example_data.tgz (whole genome).

All output files that are generated by run_wg.sh are stored under results/.

Running the scripts

Please extract the tarballs, e.g. using tar -xzvf scripts.tgz.
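You may prefer to unpack all archives in one go, for example from a Python-based workflow. In that case, the standard-library tarfile module does roughly what tar -xzvf does for each archive. This is a sketch and is not part of the repository. The helper name extract_all is hypothetical. Only use it on archives you trust, such as the ones downloaded from this record.

```python
import glob
import tarfile


def extract_all(pattern="*.tgz", dest="."):
    """Extract every gzip-compressed tarball matching `pattern` into `dest`.

    Roughly equivalent to running `tar -xzvf <archive>` on each file.
    Returns the sorted list of archives that were extracted.
    """
    archives = sorted(glob.glob(pattern))
    for path in archives:
        print(f"extracting {path}")
        with tarfile.open(path, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Python versions support extraction filters; "data"
                # rejects absolute paths and links pointing outside `dest`.
                tar.extractall(path=dest, filter="data")
            else:
                tar.extractall(path=dest)
    return archives


if __name__ == "__main__":
    extract_all()
```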

The script run.sh shows how to run all steps in order for chromosome 1; the script run_wg.sh runs all steps for the whole genome.
You can find the individual scripts that are being called under scripts/.

Input files

The directory example_data_chr1 stores files for only chromosome 1, whereas example_data stores files for the whole genome.

Under example_data/ and example_data_chr1/ you will find the following files:

A GLIMPSE-imputed VCF/BCF of the ancient genomes, here named ancients_glimpse2_chr1.bcf.
A VCF/BCF of modern genomes (e.g. from the 1000 Genomes Project), here named 1000GP_sub_chr1.bcf.
A poplabels file listing a population label for each individual. Individuals must appear in the same order as in the merged VCF file. The file should contain four columns: ID POP GROUP SEX. The second column (POP) is used for population assignment.
A second poplabels file used for the MDS analysis. Here, the second column should list the ID of every individual plotted in the MDS (i.e. it should be identical to the first column). Outgroup individuals should instead be grouped together into a single population.
A file containing sample ages in generations, with one line per haplotype (i.e. two lines per diploid sample). For example, for 3 samples of ages 0, 10, and 100 generations:
0
0
10
10
100
100
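The three sample-level files above can be generated from a single sample table. The Python sketch below is hypothetical and is not part of the repository. It assumes an in-memory list of (ID, POP, GROUP, SEX, age_in_generations) records that is already in the same order as the merged VCF. It writes Relate-style poplabels files with an "ID POP GROUP SEX" header line. It also writes the ages file with each age repeated once per haplotype. The function name, the output paths, and the outgroup handling (outgroup_pops, outgroup_label) are illustrative assumptions. Check them against what the scripts expect.

```python
def write_sample_files(samples, poplabels_path, mds_poplabels_path, ages_path,
                       outgroup_pops=(), outgroup_label="OUTGROUP"):
    """Hypothetical helper writing the poplabels, MDS poplabels and ages files.

    `samples` is a list of (ID, POP, GROUP, SEX, age) tuples in the same order
    as the merged VCF; `age` is in generations.
    """
    header = "ID POP GROUP SEX\n"

    # Standard poplabels: the second column (POP) defines the populations.
    with open(poplabels_path, "w") as out:
        out.write(header)
        for sid, pop, group, sex, _age in samples:
            out.write(f"{sid} {pop} {group} {sex}\n")

    # MDS poplabels: column 2 repeats each individual's own ID, except that
    # all outgroup individuals share a single population label.
    with open(mds_poplabels_path, "w") as out:
        out.write(header)
        for sid, pop, group, sex, _age in samples:
            label = outgroup_label if pop in outgroup_pops else sid
            out.write(f"{sid} {label} {group} {sex}\n")

    # Ages file: one line per haplotype, i.e. two lines per diploid sample.
    with open(ages_path, "w") as out:
        for sample in samples:
            age = sample[4]
            out.write(f"{age}\n{age}\n")


if __name__ == "__main__":
    # Hypothetical sample table reproducing the ages example above.
    demo = [("sample1", "POP1", "G1", 1, 0),
            ("sample2", "POP2", "G1", 2, 10),
            ("sample3", "POP3", "G2", 1, 100)]
    write_sample_files(demo, "demo.poplabels", "demo_mds.poplabels", "demo_ages.txt")
```

With the demo table, demo_ages.txt contains the lines 0, 0, 10, 10, 100, 100, matching the example above.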

We provide all the other required Relate input files under Relate_input_files/. You can reuse these in your analysis.

In this example, we use data from the 1000 Genomes Project (1000 Genomes Project Consortium, Nature 2015). We additionally use low-coverage shotgun-sequenced genomes from Anglo-Saxon contexts, the British Iron/Roman Age, the Irish Bronze Age, and the Scandinavian Early Iron Age (Cassidy et al., PNAS 2016; Martiniano et al., Nature Communications 2016; Anastasiadou et al., Communications Biology 2023; Schiffels et al., Nature Communications 2016; Gretzinger et al., Nature 2022; Rodriguez-Varela et al., Cell 2023). These genomes were imputed using GLIMPSE (https://odelaneau.github.io/GLIMPSE).

Step by step guide

Please follow run.sh (chromosome 1 only); the script run_wg.sh runs the same steps for the whole genome.

These scripts run the following steps in order:

1. scripts/1_prep_vcf.sh filters the imputed genotypes.
2. scripts/2_prep_Relate.sh prepares the Relate input files.
3. scripts/3_run_Relate.sh estimates genealogies with Relate.
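run.sh already chains these steps. If you drive them from another tool, each step should start only after the previous one has succeeded. The sketch below is a hypothetical Python driver that does this and is not part of the repository. It assumes the scripts can be launched with bash. The argument lists are left as placeholders because the required arguments are only documented by each script's own usage message.

```python
import subprocess


def run_pipeline(steps):
    """Run each command in order, stopping at the first failure.

    `steps` is a list of argument lists. Returns the number of steps that
    completed successfully (equal to len(steps) if all succeeded).
    """
    for i, cmd in enumerate(steps):
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"step {i + 1} failed with exit code {result.returncode}; stopping")
            return i
    return len(steps)


# Hypothetical step list: append each script's required arguments
# (run a script without arguments to see them) before calling run_pipeline.
PIPELINE = [
    ["bash", "scripts/1_prep_vcf.sh"],
    ["bash", "scripts/2_prep_Relate.sh"],
    ["bash", "scripts/3_run_Relate.sh"],
]
```

Stopping at the first non-zero exit code prevents later steps from silently running on missing or partial outputs.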

The resulting Relate output can then be used for several downstream analyses:

You can run Twigstats and infer admixture proportions using Rscript scripts/4_run_Twigstats.R.
You can estimate coalescence rates and population sizes using Rscript scripts/5_plot_popsize.R.
You can run an MDS using Rscript scripts/6_plot_MDS.R.

To see the arguments a script requires, run it without arguments, e.g. scripts/1_prep_vcf.sh or Rscript scripts/4_run_Twigstats.R.
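A small loop can print the usage message of every numbered script at once. This Python sketch is hypothetical and is not part of the repository. It makes three assumptions:

- the numbered scripts live directly under scripts/;
- .sh files can be run with bash;
- .R files can be run with Rscript.

It prints both stdout and stderr because it does not know which stream each script uses.

```python
import subprocess
from pathlib import Path


def usage_command(script):
    """Build the command that runs `script` without any arguments."""
    suffix = Path(script).suffix
    if suffix == ".R":
        return ["Rscript", script]
    if suffix == ".sh":
        return ["bash", script]
    raise ValueError(f"unsupported script type: {script}")


def show_usage(scripts):
    """Run each script without arguments and print whatever it reports."""
    for script in scripts:
        result = subprocess.run(usage_command(script), capture_output=True, text=True)
        print(f"== {script} ==")
        print(result.stdout + result.stderr)


if __name__ == "__main__":
    # Matches e.g. scripts/1_prep_vcf.sh and scripts/4_run_Twigstats.R.
    show_usage(sorted(str(p) for p in Path("scripts").glob("[0-9]_*")))
```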

The expected output is shown in the attached Fig1.pdf.

Files


Total size: 7.5 GB

Name	MD5	Size
example_data.tgz	d577433626ee4923c4b3dcdf133515dc	3.4 GB
example_data_chr1.tgz	8e7e3f23e95115f4cdfeb9c8f7865270	273.4 MB
Fig1.pdf	86600acea139d78206757bb8eb1a55a8	152.0 kB
Relate_input_files.tgz	f5082440da3424f061143cd3fd2f674e	2.3 GB
results.tgz	bd17f09a941a750ce7f3fe45072f3baf	1.5 GB
run.sh	72834206b3171c9203d3c88984cbd6e3	1.8 kB
run_wg.sh	eb6c325edfcfe702123c01af6da4d62c	1.9 kB
scripts.tgz	d6e09afa397f60d1ac103420f7feb820	8.5 kB
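After downloading, you can check each file against the MD5 checksums listed in this record (on Linux, md5sum reports the same digests). The Python sketch below is a hypothetical helper and is not part of the record. It hashes files in 1 MiB chunks, so the multi-gigabyte archives do not need to fit in memory. It reports files that are not present as None rather than as failures.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "example_data.tgz": "d577433626ee4923c4b3dcdf133515dc",
    "example_data_chr1.tgz": "8e7e3f23e95115f4cdfeb9c8f7865270",
    "Fig1.pdf": "86600acea139d78206757bb8eb1a55a8",
    "Relate_input_files.tgz": "f5082440da3424f061143cd3fd2f674e",
    "results.tgz": "bd17f09a941a750ce7f3fe45072f3baf",
    "run.sh": "72834206b3171c9203d3c88984cbd6e3",
    "run_wg.sh": "eb6c325edfcfe702123c01af6da4d62c",
    "scripts.tgz": "d6e09afa397f60d1ac103420f7feb820",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory=".", expected=EXPECTED_MD5):
    """Return {file name: True (match) / False (mismatch) / None (absent)}."""
    results = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        results[name] = md5sum(path) == md5 if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'not downloaded' if ok is None else ('OK' if ok else 'MISMATCH')}")
```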

Additional details

Repository URL: https://leospeidel.github.io/twigstats/

