Published July 24, 2024
| Version v1
Dataset
Open
Genetic datasets, climatic conditions at sampled localities, and occurrence data to: Ice age-driven range shifts of diploids and expanding autotetraploids within a conserved niche (Grünig, Patsiou & Parisod, 2024, New Phytologist)
Authors/Creators
Description
This repository includes:
- An overview of the raw sequencing reads deposited in the European Nucleotide Archive (ENA) for the 370 individuals sampled in 17 diploid and 19 tetraploid field populations
- Scripts used to genotype diploid and autotetraploid samples of Biscutella laevigata from ddRADseq data
- Input data (as vcf format) used in population genetic analyses
- Scripts used to run the different genetic analyses
- Dataset of extracted climatic conditions at sampled localities
- Occurrence dataset used for the climatic niche modelling
Description of the data and file structure
00.ENA_samples_correspondance.txt: provides the ENA project ID, run ID (i.e. raw fastq files), sample ID, and alias for each sample included in the study.
1.scripts_reads_to_vcf.zip consists of the following:
- 1.reads_to_vcf.md: Markdown file with scripts documenting the read quality check, demultiplexing, mapping, SNP calling using GATK4, and filtering steps
- Additional scripts called within 1.reads_to_vcf.md:
-- 1.3. Mapping: 02_run_mapping_XXX.py and BWA-mem_bisc1_sg.py scripts
-- 1.4.a. HaplotypeCaller: 03_V1_gvcf.py
-- 1.4.b. GDBI + genotypeGVCF: 03_V3_gdbi_genotype_per100scaf.py
2.datasets_genetics.tar.gz consists of the following:
- bisc_all370_diminDP15_tetraminDP30.vcf.gz: "Initial SNPs dataset" = biallelic SNPs fulfilling the GATK hard-filtering quality recommendations and present in at least 50% of samples. Genotypes with DP<15 for diploids and DP<30 for tetraploids are set to no-call. This vcf was used as the basis for preparing the fastsimcoal dataset and for subsequently selecting loci that fulfil the requirements of each analysis. It includes 2246701 biallelic SNPs for 370 samples.
- bisc_all370_diminDP15_tetraminDP30_MD05_pruned.vcf.gz: subset of the "Initial SNPs dataset", retaining SNPs called in at least 50% of samples, and pruned for linkage disequilibrium. This vcf includes 107574 biallelic SNPs for 370 samples and was used in the analysis of the proportion of diploid diagnostic alleles shared by tetraploids.
- bisc_all370_diminDP15_tetraminDP30_MD01_pruned.vcf.gz: subset of the "Initial SNPs dataset", retaining SNPs called in at least 90% of samples, and pruned for linkage disequilibrium. This vcf includes 4444 biallelic SNPs for 370 samples and was used in the analyses of population diversity and differentiation (SpaGeDi, GenoDive, PCA) and in the f3-statistics.
- bisc_all370_diminDP15_tetraminDP30_MD0.1_pruned_MAC3rm.vcf.gz: subset of the "Initial SNPs dataset", retaining SNPs called in at least 90% of samples, pruned for linkage disequilibrium, and with a minor allele count of 3. This vcf includes 2593 biallelic SNPs for 370 samples and was used in the STRUCTURE analysis.
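The ploidy-specific depth masking and the per-SNP call rate described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (their scripts are in 1.scripts_reads_to_vcf.zip and rely on GATK4); the function names `mask_low_depth`, `mask_record`, and `call_rate` are hypothetical, and the sketch assumes that each VCF data line carries `GT` and `DP` in its FORMAT column and that the ploidy of each sample column is known in advance.

```python
# Minimum read depth per ploidy level, as stated in the dataset description.
MIN_DP = {2: 15, 4: 30}


def mask_low_depth(sample_field, fmt_keys, ploidy):
    """Set GT to no-call when the sample's DP is below the ploidy threshold."""
    values = sample_field.split(":")
    fields = dict(zip(fmt_keys, values))
    dp = fields.get("DP", ".")
    # Assumption of this sketch: a missing depth is treated as zero depth.
    depth = int(dp) if dp.isdigit() else 0
    if depth < MIN_DP[ploidy]:
        # "./." for diploids, "./././." for tetraploids.
        values[fmt_keys.index("GT")] = "/".join(["."] * ploidy)
    return ":".join(values)


def mask_record(line, ploidies):
    """Apply mask_low_depth to every sample column of one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")  # column 9 of a VCF line is FORMAT
    samples = [
        mask_low_depth(sample, fmt_keys, ploidy)
        for sample, ploidy in zip(cols[9:], ploidies)
    ]
    return "\t".join(cols[:9] + samples)


def call_rate(line):
    """Fraction of samples with a fully called genotype on one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    genotypes = [sample.split(":")[0] for sample in cols[9:]]
    called = sum(1 for gt in genotypes if "." not in gt)
    return called / len(genotypes)
```

A SNP would then be kept for a "called in at least 90% of samples" subset when `call_rate(masked_line) >= 0.9`, and for a 50% subset when the value is at least 0.5.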
3.pres_2x.txt: list of the 128 diploid occurrences used in climatic niche modelling
3.pres_4x_strat_reg.txt: list of the 924 tetraploid occurrences used in climatic niche modelling
biscall_chelsa_ordered_noDEM.txt: climatic data extracted from the CHELSA dataset at sampled localities
4.plot_GTfreqs.md: markdown file including scripts to plot allele and genotype frequencies
Sharing/Access information
Raw sequencing reads have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under the accession number PRJEB48869: https://www.ebi.ac.uk/ena/browser/view/PRJEB48869
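For readers who want to list the raw read files programmatically, the following Python sketch builds a query URL for the ENA Portal API file report and parses the tab-separated response. It is only an illustration: the helper names `filereport_url` and `parse_filereport` are hypothetical, the selected field names (`run_accession`, `sample_alias`, `fastq_ftp`) are assumptions that should be checked against the current ENA documentation, and the sketch does not perform the download itself.

```python
from urllib.parse import urlencode

# ENA Portal API endpoint for per-run file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "sample_alias", "fastq_ftp")):
    """Build the file report URL for a study accession such as PRJEB48869."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Turn the tab-separated report into a list of dicts keyed by column name."""
    lines = tsv_text.strip().splitlines()
    header = lines[0].split("\t")
    return [dict(zip(header, row.split("\t"))) for row in lines[1:]]
```

The text returned by fetching `filereport_url("PRJEB48869")` can be passed to `parse_filereport`, and the resulting run IDs can be matched against 00.ENA_samples_correspondance.txt.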
Files
(2.8 GB)
| File (md5 checksum) | Size |
|---|---|
| md5:f5d3ab018c8fc772afa7e861e5b75f63 | 58.3 kB |
| md5:c80908c5ae6b878a6de2c85b4fdd1e33 | 25.7 kB |
| md5:e759b7b33481d4671a3b3ff481784ed5 | 2.8 GB |
| md5:62ab87419e134ccac9b84cedb6ecd4ae | 5.4 kB |
| md5:716cb2f614e998fb5a20c437e3b4c9bb | 7.3 kB |
| md5:02e033527ce2c4984e932386fa8017ec | 35.1 kB |
| md5:e43ee7ad3c7bd529aa9c533985e90962 | 4.3 kB |
| md5:15ee6e42c691891dfb4dd8f222c071aa | 3.7 kB |
| md5:55549d0947dc4611f9f787bca2ac53ad | 62.7 kB |