Published July 24, 2024 | Version v1
Dataset Open

Genetic datasets, climatic conditions at sampled localities, and occurrence data to: Ice age-driven range shifts of diploids and expanding autotetraploids within a conserved niche (Grünig, Patsiou & Parisod, 2024, New Phytologist)

  • 1. University of Fribourg
  • 2. University of Bern

Description

This repository includes:

- An overview of the raw sequencing reads deposited in the European Nucleotide Archive (ENA) for the 370 individuals sampled in 17 diploid and 19 tetraploid field populations
- Scripts used to genotype diploid and autotetraploid samples of Biscutella laevigata from ddRADseq data
- Input data (in VCF format) used in the population genetic analyses
- Scripts used to run the different genetic analyses
- Dataset of extracted climatic conditions at sampled localities
- Occurrence dataset used for the climatic niche modelling

Description of the data and file structure

00.ENA_samples_correspondance.txt: provides the ENA project ID, run ID (i.e. raw FASTQ files), sample ID, and alias for each sample included in the study.
 
1.scripts_reads_to_vcf.zip consists of the following:
- 1.reads_to_vcf.md: Markdown file with scripts documenting the read quality check, demultiplexing, mapping, SNP calling with GATK4, and filtering steps
- Additional scripts called within 1.reads_to_vcf.md:
-- 1.3. Mapping: 02_run_mapping_XXX.py and BWA-mem_bisc1_sg.py scripts
-- 1.4.a. HaplotypeCaller: 03_V1_gvcf.py
-- 1.4.b. GenomicsDBImport (GDBI) + GenotypeGVCFs: 03_V3_gdbi_genotype_per100scaf.py
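The three GATK4 steps named above (per-sample HaplotypeCaller in GVCF mode, GenomicsDBImport, then GenotypeGVCFs) can be sketched as the command lines below. This is a minimal illustration, not the content of 03_V1_gvcf.py or 03_V3_gdbi_genotype_per100scaf.py: the reference, BAM, workspace and interval-list names are hypothetical placeholders, and passing each sample's ploidy (2 or 4) to HaplotypeCaller through `-ploidy` is an assumption about how the two cytotypes could be handled.

```python
"""Hypothetical sketch of the GATK4 joint-genotyping sequence.
All paths, the interval list and the ploidy values are placeholders."""

import shlex
from typing import List


def haplotypecaller_cmd(ref: str, bam: str, out_gvcf: str, ploidy: int) -> List[str]:
    # -ERC GVCF writes a per-sample gVCF for later joint genotyping;
    # -ploidy overrides GATK's default sample ploidy of 2 (assumed: 4 for tetraploids)
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
            "-O", out_gvcf, "-ERC", "GVCF", "-ploidy", str(ploidy)]


def genomicsdbimport_cmd(gvcfs: List[str], workspace: str, intervals: str) -> List[str]:
    # Consolidates the per-sample gVCFs over the given intervals into a GenomicsDB workspace
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", workspace, "-L", intervals]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotypegvcfs_cmd(ref: str, workspace: str, out_vcf: str) -> List[str]:
    # Joint genotyping reads directly from the workspace via the gendb:// prefix
    return ["gatk", "GenotypeGVCFs", "-R", ref,
            "-V", f"gendb://{workspace}", "-O", out_vcf]


if __name__ == "__main__":
    # Hypothetical samples: one diploid and one tetraploid
    samples = {"dip_01": 2, "tet_01": 4}
    ref = "reference.fasta"
    gvcfs = []
    for name, ploidy in samples.items():
        gvcf = f"{name}.g.vcf.gz"
        gvcfs.append(gvcf)
        print(shlex.join(haplotypecaller_cmd(ref, f"{name}.bam", gvcf, ploidy)))
    print(shlex.join(genomicsdbimport_cmd(gvcfs, "gdb_batch01", "scaffolds_batch01.list")))
    print(shlex.join(genotypegvcfs_cmd(ref, "gdb_batch01", "batch01.vcf.gz")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with GATK4 installed. The script name 03_V3_gdbi_genotype_per100scaf.py suggests that scaffolds were processed in batches of 100, which the hypothetical `-L` interval list only mimics.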

2.datasets_genetics.tar.gz consists of the following:

- bisc_all370_diminDP15_tetraminDP30.vcf.gz: "Initial SNPs dataset", i.e. biallelic SNPs that pass the GATK recommended quality hard filters and are present in at least 50% of samples. Genotypes with DP < 15 in diploids and DP < 30 in tetraploids are set to no-call. This VCF was the basis for preparing the fastsimcoal dataset and for the subsequent selection of loci meeting the requirements of each analysis. It includes 2,246,701 biallelic SNPs for 370 samples.

- bisc_all370_diminDP15_tetraminDP30_MD05_pruned.vcf.gz: subset of the "Initial SNPs dataset" retaining SNPs called in at least 50% of samples and pruned for linkage disequilibrium. This VCF includes 107,574 biallelic SNPs for 370 samples and was used to estimate the proportion of diploid diagnostic alleles shared by tetraploids.

- bisc_all370_diminDP15_tetraminDP30_MD01_pruned.vcf.gz: subset of the "Initial SNPs dataset" retaining SNPs called in at least 90% of samples and pruned for linkage disequilibrium. This VCF includes 4,444 biallelic SNPs for 370 samples and was used in the analyses of population diversity and differentiation (SpaGeDi, GenoDive, PCA) and in the f3 statistics.

- bisc_all370_diminDP15_tetraminDP30_MD0.1_pruned_MAC3rm.vcf.gz: subset of the "Initial SNPs dataset" retaining SNPs called in at least 90% of samples, pruned for linkage disequilibrium, and filtered for a minimum minor allele count (MAC) of 3. This VCF includes 2,593 biallelic SNPs for 370 samples and was used in the STRUCTURE analysis.
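The ploidy-specific depth masking and the per-site call-rate and minor-allele-count filters described for these files can be illustrated on a toy site. This Python sketch is not the authors' pipeline (documented in 1.reads_to_vcf.md): it assumes genotypes are given as (GT, DP) pairs with known sample ploidy, handles biallelic sites only, omits LD pruning, and `min_mac=3` assumes that "MAC3rm" denotes removal of sites with a minor allele count below 3.

```python
"""Toy illustration of genotype-depth masking and per-site filters
(assumptions: biallelic sites, (GT, DP) pairs, known sample ploidy)."""

from typing import List, Optional, Tuple

# Minimum genotype depth by ploidy, as stated for the "Initial SNPs dataset"
MIN_DP = {2: 15, 4: 30}

Genotype = Tuple[str, Optional[int]]  # e.g. ("0/1", 20) or ("0/0/1/1", 35)


def alleles(gt: str) -> List[str]:
    # Treat phased ("|") and unphased ("/") separators alike
    return gt.replace("|", "/").split("/")


def is_called(gt: str) -> bool:
    # A genotype counts as called only if no allele is missing (".")
    return "." not in alleles(gt)


def mask_low_depth(genotypes: List[Genotype], ploidies: List[int]) -> List[str]:
    """Set genotypes below the ploidy-specific DP threshold to no-call."""
    masked = []
    for (gt, dp), ploidy in zip(genotypes, ploidies):
        if dp is None or dp < MIN_DP[ploidy]:
            masked.append("/".join(["."] * ploidy))
        else:
            masked.append(gt)
    return masked


def call_rate(gts: List[str]) -> float:
    return sum(is_called(gt) for gt in gts) / len(gts)


def minor_allele_count(gts: List[str]) -> int:
    """Smaller of the REF and ALT allele counts over called genotypes."""
    ref = alt = 0
    for gt in gts:
        if is_called(gt):
            for a in alleles(gt):
                if a == "0":
                    ref += 1
                else:
                    alt += 1
    return min(ref, alt)


def keep_site(gts: List[str], min_call_rate: float, min_mac: int = 0) -> bool:
    # min_call_rate=0.5 for "MD05", 0.9 for "MD01"; min_mac=3 for "MAC3rm" (assumed)
    return call_rate(gts) >= min_call_rate and minor_allele_count(gts) >= min_mac


if __name__ == "__main__":
    site = [("0/1", 20), ("0/0", 10), ("0/0/1/1", 35), ("0/0/0/1", 25)]
    ploidies = [2, 2, 4, 4]
    gts = mask_low_depth(site, ploidies)
    print(gts)                                       # ['0/1', './.', '0/0/1/1', './././.']
    print(call_rate(gts), minor_allele_count(gts))   # 0.5 3
    print(keep_site(gts, 0.5), keep_site(gts, 0.9))  # True False
```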


3.pres_2x.txt: list of the 128 diploid occurrences used in climatic niche modelling

3.pres_4x_strat_reg.txt: list of the 924 tetraploid occurrences used in climatic niche modelling

biscall_chelsa_ordered_noDEM.txt: climatic data extracted from the CHELSA dataset at sampled localities

4.plot_GTfreqs.md: Markdown file with scripts to plot allele and genotype frequencies

 

Sharing/Access information

Raw sequencing reads have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under the accession number PRJEB48869: https://www.ebi.ac.uk/ena/browser/view/PRJEB48869
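Run-level metadata for this project can also be retrieved programmatically. The sketch below assumes the ENA Portal API `filereport` endpoint and a hypothetical choice of fields (run_accession, sample_accession, fastq_ftp, fastq_md5); the result can be cross-checked against 00.ENA_samples_correspondance.txt.

```python
"""Sketch: build (and optionally fetch) an ENA Portal API file report
for the read runs of a project. The field list is an assumption."""

import csv
import io
from typing import Dict, List, Sequence
from urllib.parse import urlencode
from urllib.request import urlopen

FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"
FIELDS = ("run_accession", "sample_accession", "fastq_ftp", "fastq_md5")


def filereport_url(accession: str, fields: Sequence[str] = FIELDS) -> str:
    # urlencode percent-encodes the commas in the field list
    params = {"accession": accession, "result": "read_run",
              "fields": ",".join(fields), "format": "tsv"}
    return f"{FILEREPORT}?{urlencode(params)}"


def fetch_runs(accession: str) -> List[Dict[str, str]]:
    # Network call: returns one dict per sequencing run, keyed by the TSV header
    with urlopen(filereport_url(accession), timeout=60) as resp:
        text = resp.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    print(filereport_url("PRJEB48869"))
    # runs = fetch_runs("PRJEB48869")  # requires internet access
```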

Files (2.8 GB)

md5:f5d3ab018c8fc772afa7e861e5b75f63 (58.3 kB)
md5:c80908c5ae6b878a6de2c85b4fdd1e33 (25.7 kB)
md5:e759b7b33481d4671a3b3ff481784ed5 (2.8 GB)
md5:62ab87419e134ccac9b84cedb6ecd4ae (5.4 kB)
md5:716cb2f614e998fb5a20c437e3b4c9bb (7.3 kB)
md5:02e033527ce2c4984e932386fa8017ec (35.1 kB)
md5:e43ee7ad3c7bd529aa9c533985e90962 (4.3 kB)
md5:15ee6e42c691891dfb4dd8f222c071aa (3.7 kB)
md5:55549d0947dc4611f9f787bca2ac53ad (62.7 kB)
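Downloaded files can be checked against the MD5 checksums listed above. This is a minimal sketch using Python's standard `hashlib` module; the file name in the usage block is only an example, and linking the 2.8 GB checksum to 2.datasets_genetics.tar.gz is an assumption based on file size alone.

```python
"""Sketch: verify a downloaded file against a Zenodo-style "md5:<hex>" checksum."""

import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    # Read in 1 MiB chunks so multi-GB archives need not fit in memory
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    # Accept both "md5:<hex>" and a bare "<hex>" string
    return md5sum(path) == expected.split(":", 1)[-1].lower()


if __name__ == "__main__":
    archive = Path("2.datasets_genetics.tar.gz")  # assumed to be the 2.8 GB file
    if archive.exists():
        print(verify(archive, "md5:e759b7b33481d4671a3b3ff481784ed5"))
```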