Exon capture museomics deciphers the nine-banded armadillo species complex and identifies a new species endemic to the Guiana Shield
Creators
- 1. Institut des Sciences de l'Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Montpellier, France
- 2. Department of Ecology and Genetics, Uppsala University, Sweden
- 3. Department of Conservation Biology, CICESE, Ensenada, Baja California, México
- 4. Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- 5. Department of Biology, Valdosta State University, Valdosta, GA, USA
- 6. Institut Pasteur de la Guyane, Cayenne, French Guiana
- 7. Muséum national d'Histoire naturelle, Centre de Recherche en Paléontologie – Paris (CR2P), CNRS/MNHN/Sorbonne Université, Paris, France
Description
Mathilde Barthe*, Loïs Rancilhac, Maria C. Arteaga, Anderson Feijó, Marie-Ka Tilak, Fabienne Justy, W. J. Loughry, Colleen M. McDonough, Benoit de Thoisy, François Catzeflis, Guillaume Billet, Lionel Hautier, Benoit Nabholz, and Frédéric Delsuc*
*Corresponding authors: mathilde.barthe.pro@gmail.com; frederic.delsuc@umontpellier.fr
Description of available files.
01_Figures_&_tables_of_the_main_text.zip
- Figure 1: Phylogenetic relationships reconstructed by maximum likelihood and maps representing the distribution of individuals according to their lineage.
- Figure 2: Assignment of individuals to lineages according to phylogenetic analyses, admixture analysis and phylogenetic delimitation.
- Figure 3: Principal Component Analysis of genetic variance.
- Figure 4: Distribution map and genetic composition of individuals of the four recognized species.
02_Supplementary_tables_&_figures.zip
- Figure S1: Distribution of targeted nuclear loci along a chromosome scale assembly.
- Figure S2: Mitochondrial genome depth of coverage.
- Figure S3: Calculation of mitochondrial lineage support for detecting contamination.
- Figure S4: Mitochondrial lineage support for each individual.
- Figure S5: a) Inbreeding coefficient and b) heterozygosity estimate for individuals according to cleaning steps.
- Figure S6: Percentage of missing data per captured locus.
- Figure S7: Summary information of the 837 cleaned nuclear loci (number of sequences, the proportion of variable sites and the percentage of missing data).
- Figure S8: Phylogenetic relationships of the 62 Dasypus individuals obtained using Astral on the 832 ML gene trees from the captured nuclear loci reconstructed with IQ-Tree and ModelFinder.
- Figure S9: Results of analyses to detect introgression.
- Figure S10: Cross validation errors according to the number of clusters (K) investigated.
- Figure S11: Detailed analysis of the substructure within the newly recognized D. novemcinctus (Southern lineage).
- Figure S12: Species delimitation estimated by bPTP-h.
- Figure S13: Comparison of the three best models from the model selection estimated with PHRAPL.
- Figure S14: Species delimitation estimated using GMYC.
- Figure S15: Heatmaps of pairwise genetic indexes between lineages.
- Figure S16: Effect of filters on admixture results.
- Figure S17: Updated map from Arteaga et al. (2020).
- Figure S18: Maximum likelihood phylogenetic tree of 212 bp of the 16S ribosomal RNA of five individuals analyzed in Abba et al. (2018) and three from this study.
- Table S1: List of biological samples with detailed information.
- Table S2a: Quality statistics by locus after filtering steps.
- Table S2b: Quality statistics by individuals after filtering steps.
- Table S3: Species delimitation estimated using PHRAPL for the four combinations.
- Table S4: Comparison of the lineage of the nineteen individuals in common with Arteaga et al. (2020) and our study.
- Table S5: Adult cranial measurements (in millimeters) of the four Dasypus species recognized in this study following Feijó & Cordeiro-Estrela (2016).
- Table S6: Adult external measurements (in millimeters) of the four Dasypus species recognized in this study.
03_Mitogenomes.zip
- Mitogenome_reference_Dasypus_novemcinctus.fasta: Mitogenome reference used to map reads and extract mitochondrial DNA.
- Concatenated_mitochondrial_genes.fasta: Concatenated nucleotide sequences of 15 mitochondrial genes (13 protein-coding + 2 rRNAs). Sites with more than 50% missing data were excluded resulting in a total of 13,924 sites.
- Concatenated_mitochondrial_genes_partition.txt: Partition file of concatenated sequences of the 15 mitochondrial genes (13 protein-coding + 2 rRNAs).
- Concatenated_mitochondrial_genes_TESTNEW.treefile: Maximum likelihood phylogenetic tree inferred from the concatenated sequences of the 15 mitochondrial genes using IQ-TREE under a partitioned model applying ModelFinder on each partition.
- Depth_coverage_mitogenomes.csv: Table of mean depth of coverage and proportion of missing data (Ns) of the 72 reconstructed mitochondrial genomes sequenced for this study.
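The 50% missing-data threshold mentioned for Concatenated_mitochondrial_genes.fasta can be illustrated with a minimal sketch. It assumes that missing data are coded as N, ? or -, and the function name and toy alignment are hypothetical; it is not taken from the authors' pipeline.

```python
def filter_sites(alignment, max_missing=0.5, missing_chars="N?-"):
    """Drop alignment columns whose fraction of missing characters exceeds max_missing.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    The set of missing-data characters is an assumption, not taken from the dataset.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(s[i].upper() in missing_chars for s in seqs) / len(seqs) <= max_missing
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy alignment: column 4 is missing in every sequence and is dropped;
# column 3 is missing in exactly half of the sequences and is kept.
toy = {
    "ind1": "ACNNT",
    "ind2": "ACGNT",
    "ind3": "A-N-T",
    "ind4": "ATGNA",
}
filtered = filter_sites(toy)
```

A site with exactly 50% missing data is retained here, following the wording "more than 50%".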
04_Reanalyses.zip
- Dloop_alignment.fasta: Alignment of the D-loop sequences obtained in this study with those from Arteaga et al. (2020).
- Dloop_alignment.treefile: Maximum likelihood phylogenetic tree inferred from the D-loop alignment using IQ-TREE (GTR+G model).
- Abba_shotgun.fasta: Alignment of the 16S rRNA of individuals from this study and those from Abba et al. (2018).
- Abba_shotgun.fasta.treefile: Maximum likelihood phylogenetic tree inferred from the 16S rRNA alignment using IQ-TREE (GTR+G model).
05_Contamination_exploration.zip
- Mitochondrial_diagnostic_positions.csv: Table of the 350 diagnostic mitochondrial positions used to estimate proportion of reads supporting each lineage. Position number refers to the Complete_mitogenome_alignment.fasta file.
- Read_support_to_diagnostic_positions.csv: For each individual, this table reports the Diagnostic Rate (proportion of diagnostic positions per lineage supported by at least 3 reads), the Read Proportion (mean read proportion supporting diagnostic positions per lineage), Index (proportion of synapomorphies per lineage normalized by average frequency of reads supporting these synapomorphies) and the type of tissue (museum or fresh tissue).
- Contamination_exploration.R: R script used to plot read support to lineages and the effect of tissue type (fresh or museum).
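As a rough illustration of two of the per-lineage statistics described for Read_support_to_diagnostic_positions.csv, the sketch below computes a Diagnostic Rate and a Read Proportion from read counts. The input layout (one pair of counts per diagnostic position) and the function name are assumptions made for the example; the Index is not reproduced because its exact formula is not given here.

```python
def lineage_support(positions, min_reads=3):
    """Summarise read support for one lineage in one individual.

    positions: list of (reads_supporting_lineage, total_reads) tuples,
               one per diagnostic position (hypothetical input layout).
    Returns (diagnostic_rate, read_proportion).
    """
    if not positions:
        raise ValueError("no diagnostic positions")
    # Diagnostic Rate: share of positions supported by at least min_reads reads.
    supported = sum(1 for s, _ in positions if s >= min_reads)
    diagnostic_rate = supported / len(positions)
    # Read Proportion: mean fraction of supporting reads, over covered positions only
    # (ignoring uncovered positions is an assumption of this sketch).
    covered = [(s, t) for s, t in positions if t > 0]
    read_proportion = sum(s / t for s, t in covered) / len(covered) if covered else 0.0
    return diagnostic_rate, read_proportion


# Hypothetical counts for four diagnostic positions of one lineage.
toy_positions = [(10, 10), (3, 4), (1, 10), (0, 0)]
rate, prop = lineage_support(toy_positions)
```

In this toy case two of the four positions reach the three-read threshold, so the Diagnostic Rate is 0.5.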
06_Nuclear_dataset.zip
- TATU_1000exons4baits.fasta: Reference sequences of 1,000 exons and flanking regions used to define the probes for exon capture extracted from the Dasypus novemcinctus genome.
- Dasypus_capture_Final_Baits_Set.fas: Sequences of the 16,146 probes used to capture the 997 nuclear loci (exons and flanking regions).
- Diploid_837_nuclear_loci.fasta: Diploid sequences of the 837 nuclear loci for the 62 individuals in PopPhyl format (Locus|lineage|individual|Allele).
- Mean_coverage_by_individuals.csv: Table of mean depth of coverage, horizontal coverage, and number of loci per individual after filtering.
- Mean_coverage_by_loci.csv: Table of mean depth of coverage, horizontal coverage and number of loci per loci after filtering.
- Location_loci_targeted.bed: List of the loci targeted by exon capture, with their genomic locations on the chromosome scale assembly of Dasypus novemcinctus (mDasNov1.hap2).
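The PopPhyl header layout given for Diploid_837_nuclear_loci.fasta (Locus|lineage|individual|Allele) can be read with a few lines of code. The sketch below assumes standard FASTA headers starting with ">" and treats the allele field as an opaque label; the locus, lineage and individual names in the toy input are hypothetical.

```python
from collections import namedtuple

PopPhylHeader = namedtuple("PopPhylHeader", "locus lineage individual allele")


def parse_header(line):
    """Split a header of the form >Locus|lineage|individual|Allele into its fields."""
    fields = line.lstrip(">").strip().split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}: {line!r}")
    return PopPhylHeader(*fields)


def individuals_per_locus(lines):
    """Count the distinct individuals represented at each locus."""
    seen = {}
    for line in lines:
        if line.startswith(">"):
            header = parse_header(line)
            seen.setdefault(header.locus, set()).add(header.individual)
    return {locus: len(inds) for locus, inds in seen.items()}


# Hypothetical records; the allele labels are placeholders.
toy_lines = [
    ">locus1|lineageA|ind1|Allele_1", "ACGT",
    ">locus1|lineageA|ind1|Allele_2", "ACGA",
    ">locus1|lineageB|ind2|Allele_1", "ACGT",
    ">locus2|lineageA|ind1|Allele_1", "TTGA",
]
counts = individuals_per_locus(toy_lines)
```

Because each individual contributes two allele sequences per locus, individuals are collected in a set before counting.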
07_Disentangling_genotyping_errors.zip
- Table_of_heterozygosity_and_inbreeging_coefficient.csv: Table of heterozygosity (He) and inbreeding coefficient (F) estimated at each cleaning step: initial data, after correction of heterozygous positions (must be supported by a proportion of reads between 0.3 and 0.7), and after exclusion of 159 potentially paralogous loci.
- Plot_effect_of_cleaning_on_He&F.R: R script used to plot the effect of cleaning steps on heterozygosity (He) and inbreeding coefficient (F).
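The read-proportion criterion applied to heterozygous positions (between 0.3 and 0.7) can be expressed as a simple check. This is a minimal sketch with a hypothetical function name; treating the bounds as inclusive is an assumption, and what is done with calls that fail the check is not specified in this description.

```python
def heterozygote_supported(reads_allele1, reads_allele2, low=0.3, high=0.7):
    """Return True if the read proportion of the first allele lies within [low, high].

    Inclusive bounds are an assumption of this sketch.
    """
    total = reads_allele1 + reads_allele2
    if total == 0:
        return False  # no reads, so the heterozygous call has no support
    return low <= reads_allele1 / total <= high


balanced = heterozygote_supported(5, 5)   # read proportion 0.5
skewed = heterozygote_supported(9, 1)     # read proportion 0.9
uncovered = heterozygote_supported(0, 0)  # no coverage
```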
08_Distribution_maps.zip
- Coordinates_according_mito_nuclear_lineages.csv: Table of GPS coordinates of individuals according to their mitochondrial and nuclear lineages.
- Plot_mito_nuclear_distribution.R: R script used to plot individuals on the Neotropical map according to their mitochondrial and nuclear lineages in Figure 1.
- Mitochondrial_distribution.pdf: Geographical distribution of the 75 individuals according to their mitochondrial lineage.
- Nuclear_distribution.pdf: Geographical distribution of the 58 individuals according to their nuclear lineage.
09_Phylogenetic_inference.zip
● Phylogram_Tree
- Concatenated_nuclear_loci.fasta: Concatenated sequences of the 837 nuclear loci representing a total of 506,355 sites.
- Concatenated_nuclear_loci_partition.txt: Partition file for the 837 nuclear loci concatenation.
- Concatenated_nuclear_loci_TESTNEW.treefile: Maximum likelihood phylogenetic tree inferred from the 837 nuclear loci concatenation using IQ-TREE under a partitioned model applying ModelFinder on each partition.
● Ultrametric_Tree
- Ultrametric_tree_concatenated_nuclear_loci.treefile: Ultrametric tree inferred from the 837 nuclear loci concatenation (Concatenated_nuclear_loci.fasta in Phylogram_Tree folder) using a partitioned model applying ModelFinder on each partition (Concatenated_nuclear_loci_partition.txt in Phylogram_Tree folder). The ML phylogram (Concatenated_nuclear_loci_TESTNEW.treefile in Phylogram_Tree folder) was used as a guide tree. The root was dated at 6 Mya.
● Gene_Tree
- Concatenate_gene_tree.treefile: File containing all gene trees reconstructed using IQ-TREE applying ModelFinder to each gene.
- Astral_consensus_tree.txt: Summary species tree reconstructed with Astral using Concatenate_gene_tree_TESTNEW.treefile
● Introgression analyses:
- Concordance_factors_Dasypus.csv
- Topology_Weighting_Dasypus_plots.R
- SnaQ_results_hmax0.out
- SnaQ_results_hmax1.out
- SnaQ_results_hmax2.out
- SnaQ_results_hmax3.out
- twisst_guianensis_spmap.txt
- twisst_guianensis_Weights
- twisst_mexico_spmap.txt
- twisst_mexico_Weights
10_Species_delimitation.zip
● BPP
- input_for_bpp.phy: Sequence alignments of the 837 nuclear loci in phylip format.
- lineage_for_BPP: Correspondence file between individuals and lineages.
- r1 and r2: folders containing config files (bpp.ctl) and outputs of the BPP analysis.
● bPTP
- PTPh_Support_Partition.txt: Details of the most supported species partition.
- PTPh_tree_partition.png: Tree illustrating the most supported species partition.
● GMYC
- Script_GMYC.R: R script used to run the GMYC delimitation method on the ultrametric tree (09_Phylogenetic_inference.zip/Ultrametric_Tree/Ultrametric_tree_concatenated_nuclear_loci.treefile).
- Figure_GMYC.png: Figure illustrating the results of the GMYC species delimitation analysis.
● PHRAPL
- Script_PHRAPL.R: R script used to run the PHRAPL delimitation method on the gene trees (09_Phylogenetic_inference.zip/Gene_Tree/Concatenate_gene_tree.treefile).
11_Population_genetic_analyses.zip
● PCA
- Input_for_PCA.fasta: Diploid sequences of the 57 individuals (DNO-MC21 and DPI-L29 excluded) in PopPhyl format (Locus|species|individual|allele).
- PCA_Output: Output of the PopPhyl2PCA analysis using the Input_for_PCA.fasta file.
- Script_to_plot_PCA.R: R script used to plot PCA according to the mitochondrial lineage and nuclear composition (Admixture results).
● ADMIXTURE
- lineage_for_Admixture.list: Correspondence file between individuals and lineages.
- Input_Admixture.*: 19,872 SNPs from nuclear data across the Dasypus complex.
- Output_Admixture.k.*: Output from the Admixture analysis according to K values (from 1 to 7).
- Output_Admixture.cv.error: Summary of the error value according to K.
- Plot_Admixture.R: R script used to plot Admixture results reordered by phylogeny.
- Plot_map_distribution_admixture.R: R script used to plot Admixture results on the Neotropical map.
● Stats_Da_Dxy_GDI
- Pairwise_genetic_statistics.csv: Summary statistics computed using ABCstat_global.txt from the DILSmcsnp program for all pairwise combinations of individuals from the different lineages.
- Pairwise_GDI.csv: Genetic Differentiation Index estimates for all pairwise combinations of individuals from the different lineages.
- Plot_genetic_statistics.R: R script used to plot mean genetic statistics between lineages.
● Sublineage_structure
○ ADMIXTURE
- Plot_map_distribution_sublineage_admixture.R: R script used to plot Admixture results on the Neotropical map.
○ PCA
- Input_for_PCA_southern_lineage.fasta: Diploid sequences of the 24 individuals of the Southern lineage in PopPhyl format (Locus|species|individual|allele).
- PCA_Output_southern_lineage: Output of the PopPhyl2PCA analysis using the Input_for_PCA_southern_lineage.fasta file.
- Script_to_plot_PCA_sublineage.R: R script to plot PCA according to the mitochondrial lineage and nuclear composition (Admixture results) focussing on individuals from the Southern lineage.
12_Morpho_molecular_distribution.zip
- Coordinates_according_morphogroup_lineages.csv: GPS coordinates of individuals used in Hautier et al. (2017) according to their morphogroup.
- Plot_map_distribution_morpho_admixture.R: R script used to plot Admixture results and the individuals from Hautier et al. (2017) on the Neotropical map in Figure 5.
- Skull_lateral_*.png: Illustration of the lateral view of the skull of four individuals representing each species.
- Skull_sinuses_*.png: Illustration of the skull and paranasal sinuses of four individuals representing each species.
Files
(257.7 MB)
| md5 | Size |
|---|---|
| 493c05460cda594a87cd8cad909174c8 | 19.3 MB |
| 2af8b384c923f2940745bc170994e957 | 11.9 MB |
| 77b75319bbcba92590a69036ff81484a | 169.7 kB |
| 348f04ea4ca96d2e0b0e45445b21f87f | 10.2 kB |
| 61b6b8b9683c52b3463806a485e7eb9c | 8.3 kB |
| 0083f293839e69f8765ab45ba7707225 | 183.2 MB |
| e39d24258bab1a513095eed2e8b58af0 | 4.7 kB |
| f96940b7344c5316d875fe8092bab60c | 244.6 kB |
| 54f0d9986e9b5726c16784cf5b037ca9 | 9.0 MB |
| 19a8d7a0c8a5803eefc5a053edc5ae7e | 15.5 MB |
| b39a5332b7d6938914bdb95ae61cff45 | 13.7 MB |
| 7137dbc627f78efcce4031a30c80a373 | 4.7 MB |
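The md5 checksums listed above can be used to confirm that a downloaded archive is intact. The sketch below uses only the Python standard library; the helper names are hypothetical, and the demonstration runs on a small temporary file rather than on one of the archives.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:493c...' or a bare digest."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file containing the bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
demo_digest = md5_of_file(demo_path)
demo_ok = matches_listing(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
```

For a real check, `matches_listing` would be called with the path of a downloaded archive and the corresponding checksum from the listing.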
Additional details
Funding
- European Commission
- ConvergeAnt - An Integrative Approach to Understanding Convergent Evolution in Ant-eating Mammals 683257
- Agence Nationale de la Recherche
- CEBA - CEnter of the study of Biodiversity in Amazonia ANR-10-LABX-0025
- Agence Nationale de la Recherche
- CeMEB - Mediterranean Center for Environment and Biodiversity ANR-10-LABX-0004