Data from: Jointly representing long-range genetic similarity and spatially heterogeneous isolation-by-distance

Shastry, Vivaswat; Musiani, Marco; Novembre, John

doi:10.5281/zenodo.15007585

Published March 12, 2025 | Version v1

Other Open

1. University of Chicago
2. University of Bologna

Isolation-by-distance patterns are a widespread feature of the geographic structure of genetic variation in many species, and many methods have been developed to illuminate such patterns in genetic data. However, long-range genetic similarities also exist, often as a result of rare or episodic long-range gene flow. Jointly characterizing patterns of isolation-by-distance and long-range genetic similarity is an open data analysis challenge that, if resolved, could help produce more complete representations of the geographic structure of genetic data in any given species. Here, we present a computationally tractable method that identifies long-range genetic similarities against a background of spatially heterogeneous isolation-by-distance variation. The method uses a coalescent-based framework and models long-range genetic similarity in terms of directional events, each with a source fraction describing the fraction of ancestry at a location that traces back to a remote source. The method produces geographic maps annotated with inferred long-range edges, as well as maps of uncertainty in the geographic location of each source of long-range gene flow. We have implemented the method in a package called FEEMSmix (an extension to FEEMS from Marcus et al. 2021) and validated its implementation using simulations representative of typical data applications.
We also apply this method to two empirical data sets. In a data set of over 4,000 humans (Homo sapiens) across Afro-Eurasia, we recover many known signals of long-distance dispersal from recent centuries. Similarly, in a data set of over 100 gray wolves (Canis lupus) across North America, we identify several previously unknown long-range connections, some of which were attributable to recording errors in sampling locations. Therefore, beyond identifying genuine long-range dispersals, our approach also serves as a useful tool for quality control in spatial genetic studies.

Methods

The wolf data set (wolvesadmix_corrected) consists of 108 individuals and 17,729 SNPs. For this study, we corrected the locations of two individuals based on an analysis of the sample metadata and removed three individuals with ambiguous locations from the original data set of 111 wolves compiled in Schweizer et al. 2016 (data available here: https://doi.org/10.5061/dryad.p8cz8wb18).
The human data set (c1global1nfd_public) consists of 4,070 individuals and 19,954 SNPs. For this study, we subset to individuals with public sharing permissions from the larger data set of 4,697 individuals in Peter et al. 2020 (data available on Zenodo as 'Supplemental information').
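The stated sample and SNP counts can be cross-checked against the size of the genotype file. The sketch below assumes that c1global1nfd_public.bed is a standard PLINK 1 binary file in SNP-major layout (3 header bytes, then 2 bits per genotype with each SNP padded to a whole byte); the record does not state the layout explicitly, and the function name is illustrative.

```python
import math


def expected_bed_size(n_individuals, n_snps):
    """Expected size in bytes of a SNP-major PLINK 1 .bed file.

    Assumes the standard layout: a 3-byte header followed by one block
    per SNP, where each block packs 4 genotypes per byte.
    """
    bytes_per_snp = math.ceil(n_individuals / 4)
    return 3 + n_snps * bytes_per_snp


# Counts taken from the Methods text for the human data set.
human_bytes = expected_bed_size(4070, 19954)
print(human_bytes)                    # 20313175
print(f"{human_bytes / 1e6:.1f} MB")  # 20.3 MB
```

Under that assumption the result agrees with the 20.3 MB listed for c1global1nfd_public.bed in this record.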

Files

Files (21.6 MB)

Name	MD5	Size
c1global1nfd_public.bed	2597b8fb49629df848721ceabc084fc7	20.3 MB
c1global1nfd_public.bim	717f727032ef3d8b386d9a39e9481503	585.2 kB
c1global1nfd_public.coord	cc4f3848959ed8d2a535a8cae4ef26c3	204.1 kB
c1global1nfd_public.fam	076a3d0185bf10fe00d9c43e2985faf0	144.7 kB
c1global1nfd_public.indiv_meta	23aa1fe2f4086812215f0e58ba574578	290.9 kB
c1global1nfd_public.outer	53053ceb4b81339982ef7dae0da8d9d1	65.5 kB
papers.txt	4af88798c95a43de092f29b3de097132	5.8 kB
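After downloading, the files can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the helper names (`md5_of_file`, `verify_directory`) and the assumption that all files sit in one local directory are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "c1global1nfd_public.bed": "2597b8fb49629df848721ceabc084fc7",
    "c1global1nfd_public.bim": "717f727032ef3d8b386d9a39e9481503",
    "c1global1nfd_public.coord": "cc4f3848959ed8d2a535a8cae4ef26c3",
    "c1global1nfd_public.fam": "076a3d0185bf10fe00d9c43e2985faf0",
    "c1global1nfd_public.indiv_meta": "23aa1fe2f4086812215f0e58ba574578",
    "c1global1nfd_public.outer": "53053ceb4b81339982ef7dae0da8d9d1",
    "papers.txt": "4af88798c95a43de092f29b3de097132",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_directory(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling `verify_directory("path/to/download")` (a hypothetical local path) returns one status per file, so a truncated or corrupted download shows up as `mismatch` rather than failing later during analysis.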

Additional details

Is cited by: 10.1101/2025.02.10.637386 (DOI)
Is derived from: 10.5061/dryad.p8cz8wb18 (DOI)
