Published October 17, 2021 | Version v1
Dataset Open

A machine learning approach to integrating genetic and ecological data in tsetse flies (Glossina pallidipes) for spatially explicit vector control planning

  • 1. University of California, Berkeley
  • 2. Yale University
  • 3. Uppsala University
  • 4. University of California*
  • 5. Kenya Agricultural and Livestock Research Organization
  • 6. Maseno University
  • 7. Kenya Medical Research Institute
  • 8. Vector & Vector-Borne Diseases Research Institute
  • 9. Yale School of Public Health*
  • 10. Utah State University

Description

Introduction - Control of vector populations is an effective strategy for addressing vector-borne disease transmission. Effective vector control requires knowledge of habitat use and connectivity. Our goal was to improve this knowledge for the tsetse species Glossina pallidipes, a vector of animal African trypanosomiasis, which is a wasting disease in livestock and represents a serious socioeconomic burden across sub-Saharan Africa.

Methods and Results - We used random forest regression to: (i) build and integrate models of G. pallidipes habitat suitability and genetic connectivity across Kenya and northern Tanzania, and (ii) provide novel vector control recommendations. Inputs for the models included field-survey records from 349 trap locations, genetic data from 11 microsatellite loci from 659 flies and 29 sampling sites, and remotely sensed environmental data. The suitability and connectivity models explained approximately 80% and 67% of the variance in the occurrence and genetic data, respectively, and exhibited high accuracy based on cross-validation. The bivariate map showed that suitability and connectivity vary independently across the landscape and inform vector control recommendations. Post-hoc analyses show spatial variation in the correlations between the most important environmental predictors from our models and each response variable (i.e. suitability and connectivity) as well as heterogeneity in expected future climatic change of these predictors.

Discussion - The bivariate map suggests vector control is most likely to be successful in the Lake Victoria basin, and supports the previous recommendation that most of eastern Kenya should be managed as a single unit. We further recommend that future monitoring efforts should focus on tracking potential changes in vector presence and dispersal around the Serengeti and the Lake Victoria basin based on projected local climatic shifts. The strong performance of the spatial models suggests potential for our integrative methodology to be used to understand future impacts of climate change in this and other vector systems.
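The bivariate map described above combines the suitability and connectivity surfaces pixel by pixel. The following is only a minimal sketch of that kind of combination, assuming both surfaces have been rescaled to the range 0 to 1 and cut into three equal-interval classes. The class breaks, the labels, and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration: the breaks (1/3, 2/3) and the "S?-C?" labels
# are assumptions, not the classification scheme used by the authors.

def bin_value(value, breaks=(1 / 3, 2 / 3)):
    """Return 0 (low), 1 (medium) or 2 (high) for a value in [0, 1]."""
    return sum(value > b for b in breaks)


def bivariate_class(suitability, connectivity):
    """Combine two binned scores into one of nine classes, e.g. 'S2-C0'."""
    return f"S{bin_value(suitability)}-C{bin_value(connectivity)}"


def classify_grid(suit_grid, conn_grid):
    """Apply bivariate_class cell by cell to two equally sized grids."""
    return [
        [bivariate_class(s, c) for s, c in zip(suit_row, conn_row)]
        for suit_row, conn_row in zip(suit_grid, conn_grid)
    ]


# Made-up 2 x 2 surfaces, for illustration only.
suit = [[0.9, 0.2], [0.5, 0.8]]
conn = [[0.1, 0.7], [0.5, 0.9]]
classes = classify_grid(suit, conn)
for row in classes:
    print(row)
```

In this toy grid, a cell labelled S2-C0 has high suitability and low connectivity, which illustrates how the two surfaces can vary independently.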

Notes

The Bishop2021_HabitatSuitability_Data.csv file contains the data used in the habitat suitability model (i.e. information about the trap locations). Abbreviations: TrapNo (trap number), Lat (latitude), Long (longitude), StartDate (date traps were set out), EndDate (date flies were collected from traps), NumberDays (number of days between StartDate and EndDate).
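As a rough illustration of how NumberDays relates to StartDate and EndDate, the sketch below reads a few made-up rows that use the column names listed above and recomputes the trapping duration. The ISO date format (YYYY-MM-DD), the sample values, and the helper names are assumptions; check the real file's date format before reusing this.

```python
import csv
import io
from datetime import date

# Made-up rows using the documented column names. The date format is an
# assumption and may differ from the one used in the real file.
SAMPLE = """TrapNo,Lat,Long,StartDate,EndDate
1,-1.25,34.80,2015-03-02,2015-03-05
2,-1.31,34.92,2015-03-02,2015-03-04
"""


def number_days(start, end):
    """Days between setting a trap out and collecting flies from it."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def read_traps(handle):
    """Read trap rows and attach a recomputed NumberDays value to each."""
    rows = []
    for row in csv.DictReader(handle):
        row["NumberDays"] = number_days(row["StartDate"], row["EndDate"])
        rows.append(row)
    return rows


traps = read_traps(io.StringIO(SAMPLE))
for trap in traps:
    print(trap["TrapNo"], trap["NumberDays"])
```

To run this against the downloaded file, pass an open file handle to read_traps instead of the in-memory sample.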

The Bishop2021_GenConModel_AllData.csv file contains the data used in the genetic connectivity model. All columns starting with "BIO" are the median values of each bioclimatic variable along straight paths between sites. The "kernel" column contains the median values along straight paths between sites from the kernel density layer. The "pixvals" column contains the geographic distance between sites in units of pixels (1 km resolution). The "Distance" column contains the Cavalli-Sforza and Edwards' chord (CSE) genetic distances between sites. See methods of the paper (Bishop et al., 2021) for more detail.
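The "BIO", "kernel" and "pixvals" columns all derive from a straight path between two sites on a raster. The sketch below shows one simple way such values could be computed on a small made-up grid: it samples evenly spaced points along the line, rounds each to the nearest cell, and takes the median of the cell values. The sampling scheme and all function names are assumptions for illustration; the authors' actual extraction procedure is described in the paper.

```python
import math
from statistics import median


def path_cells(start, end):
    """Grid cells (row, col) visited on a straight line from start to end.

    Samples max(|d_row|, |d_col|) + 1 evenly spaced points and rounds each
    to the nearest cell. This is a simple stand-in, not the authors' method.
    """
    (r0, c0), (r1, c1) = start, end
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return [(r0, c0)]
    cells = []
    for i in range(steps + 1):
        t = i / steps
        cell = (round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0)))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


def path_median(grid, start, end):
    """Median raster value along the straight path between two cells."""
    return median(grid[r][c] for r, c in path_cells(start, end))


def pixel_distance(start, end):
    """Straight-line distance between two cells, in units of pixels."""
    return math.hypot(end[0] - start[0], end[1] - start[1])


# Made-up 4 x 4 raster, for illustration only.
grid = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
print(path_median(grid, (0, 0), (3, 3)))
print(pixel_distance((0, 0), (0, 3)))
```

At the 1 km resolution stated above, a distance of 3 pixels corresponds to roughly 3 km.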

The Gpd_KenTza_11loci_659indv_genepop.txt file contains the microsatellite genotypes for the 659 individuals used in this study in GenePop format (https://genepop.curtin.edu.au/). The Gpd_KenTza_11loci_659indv_sample_info.csv file provides information about these individuals.
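For readers unfamiliar with GenePop input, the sketch below parses a tiny made-up example in that layout: a title line, locus names, then "Pop" separators followed by one line per individual. The sample identifiers, locus names, and allele codes are invented, and the parser is a minimal illustration that assumes each genotype is a single token whose two halves are the two alleles. It has not been checked against the real file.

```python
# Made-up GenePop-style text; none of these values come from the dataset.
SAMPLE = """Made-up example, not from the real file
LocusA
LocusB
Pop
ind1 , 101103 120120
ind2 , 103103 000000
Pop
ind3 , 101101 118120
"""


def parse_genepop(text):
    """Parse GenePop text into (loci, populations).

    populations is a list with one entry per "Pop" block. Each entry is a
    list of (sample_id, genotypes), where genotypes holds one
    (allele_a, allele_b) pair of strings per locus.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    body = lines[1:]  # the first line is a free-text title
    first_pop = next(i for i, ln in enumerate(body) if ln.lower() == "pop")
    # Locus names may be one per line or comma-separated on a single line.
    loci = [
        name.strip()
        for ln in body[:first_pop]
        for name in ln.split(",")
        if name.strip()
    ]
    populations = []
    for ln in body[first_pop:]:
        if ln.lower() == "pop":
            populations.append([])
            continue
        sample_id, _, calls = ln.rpartition(",")
        genotypes = []
        for call in calls.split():
            half = len(call) // 2  # works for 2- or 3-digit allele codes
            genotypes.append((call[:half], call[half:]))
        populations[-1].append((sample_id.strip(), genotypes))
    return loci, populations


loci, populations = parse_genepop(SAMPLE)
print(loci)
print(len(populations), [len(pop) for pop in populations])
```

In GenePop files an allele code made up entirely of zeros denotes missing data, so a downstream analysis would normally skip pairs such as ("000", "000").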

Funding provided by: Foundation for the National Institutes of Health
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000009
Award Number: U01 AI115648

Funding provided by: Foundation for the National Institutes of Health Fogarty Global Infectious Diseases Training Grant*
Crossref Funder Registry ID:
Award Number: D43TW007391

Files

Bishop2021_GenConModel_AllData_DF.csv

Files (261.9 kB)

Checksum Size
md5:40b9386019f7b0b43c2c345b5bd521a9 76.4 kB
md5:2e231cbbab518a0b2ed7f9472b9d5b27 37.1 kB
md5:678ceb3bf47ab7adc398ff9b79b71148 5.3 kB
md5:3938c19c01b6252a4c53e125f8677a34 61.8 kB
md5:e11191153be41505c4b90db6411a4e9a 81.3 kB
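The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The sketch below shows one way to do this with Python's standard library. The helper names are hypothetical, and because this listing does not say which checksum belongs to which file, the pairing has to be taken from the repository page itself. The demonstration runs on a throwaway temporary file rather than on the real data.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a throwaway file, not on the real dataset.
payload = b"TrapNo,Lat,Long\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_path = tmp.name

demo_ok = matches_listing(demo_path, "md5:" + hashlib.md5(payload).hexdigest())
demo_mismatch = matches_listing(demo_path, "md5:40b9386019f7b0b43c2c345b5bd521a9")
os.remove(demo_path)
print(demo_ok, demo_mismatch)
```

For a real check, call matches_listing with the path of the downloaded file and the checksum string shown next to it on the repository page.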