Published August 31, 2022
| Version v1
Journal article
Evidence of backcross inviability and mitochondrial DNA paternal leakage in sea turtle hybrids
Authors/Creators
- 1. Leibniz Institute for Zoo and Wildlife Research
- 2. University of Ferrara
- 3. Fundação Projeto Tamar
- 4. Kwata NGO
- 5. BOREA, National Museum of Natural History (MNHN), CNRS
- 6. Universidade Federal de Minas Gerais
Description
This dataset contains ddRAD data used in our sea turtle genomics research on hybridization in Brazil. In the associated paper, we analyzed hybridization patterns, population structure and relatedness between southwest Atlantic populations, with a focus on Brazilian populations with a high frequency of interspecific hybrids. Source code for the R2SCO pipeline is available at https://github.com/mazzoni-izw/mazzoni-begendiv/ and raw sequencing data are available in the NCBI Sequence Read Archive (SRA) database under BioProject PRJNA857276. The data uploaded here contain:
- Hybrids_SNPs.vcf: SNP dataset for population analyses.
- Hybrids_SNPs_DP15.recode.vcf: SNP dataset for population analyses, filtered for a minimum coverage (DP) of 15.
- R2SCO-MseI-EcoRI-384-448_CcEiHyb.fasta: reference fasta sequences (R2SCO loci) used to map raw reads from loggerheads, hawksbills and their hybrids. Generated using the R2SCO pipeline described in Driller et al. (2020).
- R2SCO-MseI-EcoRI-384-448_CcEiLoCm.fasta: reference fasta sequences (R2SCO loci) used to map raw reads from loggerheads, hawksbills, olive ridleys, green turtles, and their hybrids. Generated using the R2SCO pipeline described in Driller et al. (2020).
- checkRestrictionSites.py: custom script that filters a fastq file to start and end with defined sequences
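As an illustration of the kind of filtering checkRestrictionSites.py is described as doing, the following is a minimal sketch that keeps only FASTQ records whose sequence starts and ends with given motifs. It is not the authors' script: the function names and the `START_SEQ` and `END_SEQ` values are assumptions chosen for the example, and the real script may handle paired reads, reverse complements or mismatches differently.

```python
import io
from typing import Iterable, Iterator, Tuple

# A FASTQ record: header, sequence, separator line, quality string.
Record = Tuple[str, str, str, str]

# Hypothetical motifs, for illustration only; substitute the sequences
# that the enzymes used in your own library leave at the read ends.
START_SEQ = "AATTC"
END_SEQ = "TTA"


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; an incomplete trailing record is dropped."""
    it = (line.rstrip("\r\n") for line in lines)
    # Passing the same iterator four times groups consecutive lines in fours.
    for header, seq, sep, qual in zip(it, it, it, it):
        yield header, seq, sep, qual


def filter_by_ends(records: Iterable[Record], start: str, end: str) -> Iterator[Record]:
    """Keep records whose sequence starts with `start` and ends with `end`."""
    for header, seq, sep, qual in records:
        upper = seq.upper()
        if upper.startswith(start) and upper.endswith(end):
            yield header, seq, sep, qual


if __name__ == "__main__":
    # Two made-up reads: only the first carries both motifs.
    demo = io.StringIO(
        "@r1\nAATTCGGGTTA\n+\nIIIIIIIIIII\n"
        "@r2\nCCCCGGGTTA\n+\nIIIIIIIIII\n"
    )
    for record in filter_by_ends(read_fastq(demo), START_SEQ, END_SEQ):
        print("\n".join(record))
```

For a real file, the `io.StringIO` object can be replaced by an open file handle, since `read_fastq` only needs an iterable of lines.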
Structure folder
- Create_Haplotype_Structure.py: Python script used to convert the VCF output from Stacks populations into a Structure file based on haplotypes (instead of SNPs).
- Hybrids_Haps_DP15.str: input Structure file for the haplotype dataset.
- Hybrids_SNPs_DP15.stru: input Structure file for the SNP dataset.
- ParallelStructure_HAPS.R: command line used in ParallelStructure for the haplotype dataset.
- ParallelStructure_SNPs.R: command line used in ParallelStructure for the SNP dataset.
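To make the idea of a haplotype-based Structure file concrete, here is a minimal sketch that recodes per-locus haplotype sequences as integer alleles and writes two rows per diploid individual. It does not reproduce Create_Haplotype_Structure.py: the input dictionary, the function names and the bare layout (individual label plus allele columns, no marker-name row and no population column) are assumptions made for this example, and parsing the Stacks VCF itself is left out.

```python
from typing import Dict, List, Optional, Tuple

# One diploid genotype per locus: a pair of haplotype sequences, or None if missing.
Genotype = Optional[Tuple[str, str]]

MISSING = "-9"  # missing-data code commonly used in Structure input files


def encode_haplotypes(genotypes: Dict[str, List[Genotype]]) -> Dict[str, List[Tuple[str, str]]]:
    """Replace each distinct haplotype at a locus with an integer code (1, 2, ...)."""
    n_loci = len(next(iter(genotypes.values())))
    codes: List[Dict[str, int]] = [dict() for _ in range(n_loci)]
    encoded: Dict[str, List[Tuple[str, str]]] = {}
    for ind, loci in genotypes.items():
        row = []
        for i, gt in enumerate(loci):
            if gt is None:
                row.append((MISSING, MISSING))
                continue
            # Codes are assigned per locus in order of first appearance.
            pair = [str(codes[i].setdefault(hap, len(codes[i]) + 1)) for hap in gt]
            row.append((pair[0], pair[1]))
        encoded[ind] = row
    return encoded


def to_structure_lines(encoded: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Format two tab-separated rows per individual, one per chromosome copy."""
    lines = []
    for ind, row in encoded.items():
        lines.append("\t".join([ind] + [a for a, _ in row]))
        lines.append("\t".join([ind] + [b for _, b in row]))
    return lines


if __name__ == "__main__":
    # Made-up individuals and haplotypes, for illustration only.
    demo = {
        "ind1": [("ACGT", "ACGA"), ("TT", "TT")],
        "ind2": [("ACGA", "ACGA"), None],
    }
    print("\n".join(to_structure_lines(encode_haplotypes(demo))))
```

Treating each whole-locus haplotype as one multi-allelic marker, rather than each SNP as a separate marker, is the design choice this sketch is meant to illustrate.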
relatedness folder
- populations.snps.vcf.gz: VCF file of SNP dataset with only loggerheads, hawksbills and their hybrids
- run.sh: vcftools command line used to run relatedness analysis
- relatednessmatrix.relatedness: square matrix with relatedness estimates
- inds.txt: individuals included in the square matrix.
- relatedness.R: plotting script in R for relatedness results (Figure 3).
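A square relatedness matrix paired with a list of individuals can be inspected with a few lines of code. The sketch below assumes the matrix is whitespace-delimited with no header row and that the individuals file lists one ID per line in the same order as the matrix rows; neither assumption is confirmed by this record, and the function names and demo values are made up for illustration.

```python
from typing import Iterable, List, Tuple


def load_square_matrix(matrix_lines: Iterable[str], ids: List[str]) -> List[List[float]]:
    """Parse a whitespace-delimited square matrix and check it matches the ID list."""
    rows = [[float(x) for x in line.split()] for line in matrix_lines if line.strip()]
    if len(rows) != len(ids) or any(len(r) != len(ids) for r in rows):
        raise ValueError("matrix dimensions do not match the number of individuals")
    return rows


def top_pairs(matrix: List[List[float]], ids: List[str], n: int = 5) -> List[Tuple[str, str, float]]:
    """Return the n most related pairs, read from the upper triangle only."""
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            pairs.append((ids[i], ids[j], matrix[i][j]))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:n]


if __name__ == "__main__":
    # Made-up IDs and values standing in for the contents of the two files.
    demo_ids = ["a", "b", "c"]
    demo_matrix = ["1.0 0.5 0.0", "0.5 1.0 0.25", "0.0 0.25 1.0"]
    matrix = load_square_matrix(demo_matrix, demo_ids)
    for ind1, ind2, value in top_pairs(matrix, demo_ids, n=2):
        print(ind1, ind2, value)
```

With real files, the demo lists can be replaced by the lines read from the matrix file and the stripped lines of the individuals file.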
NewHybrids folder
- Hybrids.NewHyb.input: input file for NewHybrids analysis.
Additional details
References
- Driller et al. (2020). bioRxiv. https://doi.org/10.1101/2020.04.03.024331