Data from: Identifying Key Biodiversity Areas based on distinct genetic diversity

Gronefeld, Sarah Christin; López, Heriberto; Schmidt, Robin; Hochkirch, Axel

doi:10.5281/zenodo.13898217

Published November 7, 2025 | Version v1

Software Open

1. Universität Trier
2. Instituto de Productos Naturales y Agrobiología
3. Technische Universität Braunschweig
4. Musée National d'Histoire Naturelle

Key Biodiversity Areas (KBAs) are sites that contribute significantly to the global persistence of biodiversity. Distinct genetic diversity has been introduced as one of the metrics to estimate whether a site holds a threshold proportion of a species' global genetic diversity during the KBA identification process. However, genetic data have so far not been used due to the lack of thoroughly tested methods and guidance. We tested the applicability of Analysis of Molecular Variance (AMOVA), allelic overlap, the diversity index Simpson's λ, average taxonomic distinctness (AvTD, Δ+), and effective population size (Ne), calculated with the two programs SPEED-NE and NEESTIMATOR, for the identification of KBAs. We conclude that Δ+, a measure that was originally developed to quantify the taxonomic distinctness of biotic communities, performs best in the context of KBA identification, as it encompasses both genetic distinctiveness and diversity, is based on simple allele abundances, and is straightforward to calculate and apply. AMOVA, allelic overlap, and λ are not suitable as they do not capture distinctiveness and diversity simultaneously. The use of Ne poses several challenges due to strict data requirements and the occurrence of negative and infinite estimates. NEESTIMATOR proved to be more efficient and user-friendly than SPEED-NE, can be applied to a larger number of datasets, and is free of charge. As the calculation of Δ+ is new in the context of genetic analyses, we provide a simple tool to facilitate calculation of this metric.

Here, we deposit additional information on which methods were applied to which datasets and how they were calculated, including references, as well as information on how these methods performed. We also include raw reads, an already processed .str file, and further information on the additional dataset we created to add a case study to our publication. The code we used for our analyses can be found on GitHub and Zenodo. For more information about the code and the detailed procedure, see the associated publication, the readme on GitHub, and the Supplemental_Information.pdf file published here.

Notes

Funding provided by: N/A
Crossref Funder Registry ID: 0

Methods

As both SNP and microsatellite datasets are commonly used to analyze intraspecific genetic variance, we tested the performance of our chosen analytical approaches on 30 published diploid datasets: 15 SNP datasets (with an average of 184 SNP loci) and 15 microsatellite datasets (with an average of 31 microsatellite loci). Each dataset was analyzed with six methods: AMOVA, allelic overlap, Δ+, D_est, λ corrected for sample size (λ_cor), and N_e. To apply all six methods, an R project was created that draws on several packages for working with genetic data and tables and for displaying results.
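As an illustration of what "corrected for sample size" can mean for Simpson's index, the sketch below uses one common correction: 1 − Σp² multiplied by N/(N − 1). The deposited scripts are written in R, so this Python is not the authors' implementation. The function names, and the choice to count generic categories (alleles or genotypes), are assumptions made for the example.

```python
from collections import Counter


def simpson_lambda(items):
    """Uncorrected Simpson's index: 1 - sum(p_i^2) over category frequencies."""
    n = len(items)
    counts = Counter(items)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def simpson_lambda_corrected(items):
    """Simpson's index multiplied by N / (N - 1).

    Assumption: this is one common sample-size correction; the record
    does not state which formula the deposited R scripts use.
    """
    n = len(items)
    if n < 2:
        raise ValueError("need at least two observations")
    return simpson_lambda(items) * n / (n - 1)


if __name__ == "__main__":
    sample = ["A", "A", "B", "B"]  # hypothetical allele calls at one site
    print(simpson_lambda(sample), simpson_lambda_corrected(sample))
```

The correction matters most for small samples, where the uncorrected index is biased downwards.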

For better comparability between the six methods, all datasets were prepared in the same way. Sites with fewer than 30 individuals were removed from the analysis. Individuals with > 20% missing data were removed from the dataset. Missing allele abundances were replaced with the means of remaining alleles. Datasets were only analyzed if N_e could be calculated to allow a proper comparison of all methods.
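The three preparation steps can be sketched as follows. This is a minimal Python illustration, not the deposited R code. It assumes that the steps run in the order listed, that missing values are marked with `None`, and that "means of remaining alleles" refers to the per-column mean of the observed values. The function name `prepare` and the fallback of 0.0 for a column with no observed values are hypothetical.

```python
from collections import Counter
from statistics import mean


def prepare(rows, sites, min_per_site=30, max_missing=0.20):
    """rows: one list of allele abundances per individual (None = missing).
    sites: the site label of each individual.
    Returns the filtered, imputed rows and their site labels."""
    # Step 1: keep only sites with at least `min_per_site` individuals.
    counts = Counter(sites)
    pairs = [(r, s) for r, s in zip(rows, sites) if counts[s] >= min_per_site]

    # Step 2: drop individuals with more than `max_missing` missing values.
    pairs = [
        (r, s) for r, s in pairs
        if sum(v is None for v in r) / len(r) <= max_missing
    ]
    if not pairs:
        return [], []

    # Step 3: replace each missing value with the mean of the observed
    # values in the same column (assumed reading of the description).
    n_cols = len(pairs[0][0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r, _ in pairs if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)

    filled = [
        [col_means[j] if v is None else v for j, v in enumerate(r)]
        for r, _ in pairs
    ]
    return filled, [s for _, s in pairs]
```

Because the order of the site filter and the individual filter changes which sites survive, the order used here should be checked against the deposited scripts before reuse.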

To explore similarities between the different approaches, Kendall correlations between the results of all six methods (allelic overlap, AMOVA, Δ+, D_est, N_e, and λ_cor) were calculated in R. To better understand how the diversity-measuring methods relate to one another, correlations with allelic richness were additionally calculated. Two outliers were removed from the AMOVA results. The correlations between allelic overlap, allelic richness, AMOVA, Δ+, D_est, N_e, and λ_cor are based on different sample sizes, since N_e could not be calculated for some areas.
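A Kendall correlation with pairwise deletion of missing values explains why sample sizes differ between method pairs. The sketch below is a plain-Python illustration of the tau-b variant; the authors computed their correlations in R, and the choice of tau-b and the use of `None` for a missing N_e are assumptions.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b on the pairs where both values are present."""
    # Pairwise deletion: an area without N_e drops out of this pair only.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]

    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counts toward neither term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1

    base = concordant + discordant
    denom = sqrt((base + ties_x_only) * (base + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


if __name__ == "__main__":
    # Hypothetical per-area results for two methods; one N_e is missing.
    delta_plus = [0.41, 0.37, 0.52, 0.48]
    n_e = [120.0, None, 310.0, 95.0]
    print(kendall_tau_b(delta_plus, n_e))
```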

We applied the A1b (> 1 % of the global distinct genetic diversity occurs at this site) and B1 (> 10 % of the global distinct genetic diversity occurs at this site) thresholds to our studied methods. For each of the six methods, the proportion of distinct genetic diversity at a location was calculated as that location's value divided by the sum of the values across all locations. Areas lacking N_e were assigned the median of the remaining areas.
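The threshold step reduces to a proportion and a comparison, which the following Python sketch illustrates. It is not the deposited R code. The function names and the example values are hypothetical, and the median fill-in reflects the treatment described for areas lacking N_e.

```python
from statistics import median

# Thresholds as stated in the text: a site qualifies if its share exceeds these.
THRESHOLDS = {"A1b": 0.01, "B1": 0.10}


def kba_proportions(values):
    """values: dict mapping site -> metric value, or None if unavailable.
    Missing values are replaced by the median of the available ones,
    then each site's share of the total is returned."""
    observed = [v for v in values.values() if v is not None]
    fill = median(observed)
    filled = {site: (fill if v is None else v) for site, v in values.items()}
    total = sum(filled.values())
    return {site: v / total for site, v in filled.items()}


def sites_meeting(proportions, criterion):
    """Sites whose share is strictly above the threshold of `criterion`."""
    limit = THRESHOLDS[criterion]
    return sorted(site for site, p in proportions.items() if p > limit)


if __name__ == "__main__":
    # Hypothetical metric values for four sites; site "c" has no estimate.
    shares = kba_proportions({"a": 50, "b": 30, "c": None, "d": 5})
    print(sites_meeting(shares, "A1b"))
    print(sites_meeting(shares, "B1"))
```

With few sites, every share is large and almost all sites pass both thresholds, so the number of sampled locations strongly affects the outcome.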

Structure analyses were conducted in addition to the calculation of the six metrics for two case studies: the Chinook salmon (Oncorhynchus tshawytscha; Gomez-Uchida et al. 2019) and the Tenerife Short-winged Bush-cricket (Ariagona margaritae). The latter dataset has not yet been published for another project. The site selection was based upon KBA criterion B1. The results were processed in R. Afterwards, maps were created.

To create the Tenerife Short-winged Bush-cricket (Ariagona margaritae Kraus, 1892) dataset, specimens were collected between 2010 and 2023 on Tenerife and El Hierro. DNA was extracted using the Qiagen DNeasy® Blood & Tissue kit. ddRADseq libraries were prepared for paired-end sequencing on a High-Output flow cell of an Illumina NextSeq platform (2 × 75 bp). Stacks 2.6.6 was used to demultiplex, filter, and trim raw reads to 65 bp, and to create an assembly and a catalogue of loci from which SNPs were finally identified (n = 64, -p 150, -r 1). Default settings were otherwise maintained. Individuals with more than 20% missing data were removed from the analysis. The resulting dataset comprised 108 individuals and 5198 loci. No area was excluded from the analysis as each had ≥20 sequenced individuals. The allelic overlap method was omitted for this dataset due to extensive calculation times. For N_e, the smallest natural number that made all estimates positive was added to each N_e value. Apart from that, this dataset was analyzed in the same way as the previously used datasets.
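One reading of the N_e transformation is a constant shift by the smallest natural number that leaves every finite estimate strictly positive. The Python sketch below illustrates that reading only. The function name is hypothetical, and passing infinite estimates through unchanged is an assumption that the description does not address.

```python
import math


def shift_to_positive(ne_estimates):
    """Add the smallest natural number that makes all finite values > 0.

    Returns the shifted values and the shift that was applied.
    Assumption: infinite estimates are ignored when choosing the shift.
    """
    finite = [v for v in ne_estimates if math.isfinite(v)]
    if not finite:
        return list(ne_estimates), 0
    lowest = min(finite)
    # floor(-lowest) + 1 is the smallest integer k with lowest + k > 0.
    shift = 0 if lowest > 0 else math.floor(-lowest) + 1
    return [v + shift for v in ne_estimates], shift


if __name__ == "__main__":
    # Hypothetical N_e estimates, including a negative and an infinite one.
    print(shift_to_positive([-3.2, 10.0, float("inf")]))
```

A constant shift preserves the ranking of sites but changes each site's share of the total, which is relevant when the shifted values feed into proportion-based thresholds.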

Files

Files (187.2 kB)

Name	MD5	Size
Barcodes_Lib1.txt	bf042b76b3302f04ab47d92f57717f79	1.3 kB
Barcodes_Lib2.txt	9998143dddf20012a7876239788fb6e5	1.4 kB
examples_Ariagona_create_structure_df.R	acc7b51a290ffea6ceffb1713e046059	8.4 kB
examples_Ariagona_genetic_D_D__NeEstimator.R	98adfbc730a198460ff582d1325607e8	7.2 kB
examples_convert_structure_to_xlsx.R	b87636dc27c5c943b0c937f52243c0f6	2.6 kB
examples_convert_xlsx_to_structure.R	bc800bb42d5d229ea1da5840f46c17aa	1.1 kB
examples_create_Structure_df_for_GIS.R	33b77e506310edebecfff872bf7f22c8	12.5 kB
examples_PCA_Ariagona.R	10b175f94bd5b9a6359e35b25dc5a890	4.9 kB
examples_PCA_calculation.R	844a9d6b1bc27892ca90826a516f4ce6	11.0 kB
examples_PCA_plot.R	6fb311190d0622a701dc8ef322bc56b1	6.8 kB
genetic_D_D__AMOVA.R	b618ddc2edf41f71a6c01390f6a0fecc	9.9 kB
genetic_D_D__AvTD.R	e276d93d2982e3d6bb2a10c9ef808e54	6.4 kB
genetic_D_D__correlations.R	ffc9f42499d1caa39d710c73ce0e959f	18.2 kB
genetic_D_D__Dest.R	58b5942660da1401dbb01bacbb6d8e67	6.0 kB
genetic_D_D__distribution.R	9e31641774727d88d904491b81bbf45b	6.9 kB
genetic_D_D__functions.R	26de15a3741769678d281e66d9a9a31d	24.3 kB
genetic_D_D__KBA_criteria.R	a8595a7fc77695a674b0636b1de288a2	8.1 kB
genetic_D_D__lambda.R	e4a314c5e9b900d2e1ecf891c536f26e	7.1 kB
genetic_D_D__libraries.R	ad54d7750c90c04e42601a560fd9799a	1.5 kB
genetic_D_D__Main_Script.R	b933a8fe8cbfc52e9e826ea2961198f8	3.5 kB
genetic_D_D__NeEstimator.R	0179e2b41bfee838a7c1922c1ca7fccb	9.1 kB
genetic_D_D__Overlaps.R	0f4163cdf4b8d4e9f07e2788722588a6	15.5 kB
how_to_format_files_ideas.R	140c445c21af7e9cc70a8f5880d7bb6b	3.9 kB
ID5.txt	666198d845202053afcdb90c3bfa07db	1.3 kB
loop_all_data_example.R	5a12b007dbbe9f9eb372e151e5175c98	2.2 kB
Pop_and_ID.txt	838872f7fe7205974d1fbddcf8d0560f	1.7 kB
Pop_and_ID2.txt	abb91f98b8f287e1d3c13ee2e46e2eb5	1.6 kB
Pop_and_ID2_sorted.txt	249c7a690743ff71b55ef64b5716a1eb	1.6 kB
Stacksscript.txt	522dc124b7321297e9e4d1cc712aea23	1.2 kB
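The MD5 values listed with each file can be used to confirm that a download is intact. A minimal Python sketch follows; the function names are hypothetical, and the file path in the usage note assumes the file was saved to the current directory.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 digest matches the listed value."""
    return md5_of(path) == expected_md5.strip().lower()
```

For example, `verify("Barcodes_Lib1.txt", "bf042b76b3302f04ab47d92f57717f79")` should return `True` for an unmodified copy of the first file in the list.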

Additional details

Is source of: 10.5061/dryad.573n5tbhk (DOI)

	All versions	This version
Views	141	85
Downloads	965	596
Data volume	6.0 MB	3.8 MB
