There is a newer version of the record available.

Published September 23, 2020 | Version 1.2
Dataset | Open access

Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses

  • 1. Florida State University
  • 2. Yale University Peabody Museum of Natural History
  • 3. Agriculture and Agri-Food Canada
  • 4. American Museum of Natural History
  • 5. University of Florida
  • 6. Arizona State University


This repository is associated with NSF DBI 2033973, RAPID Grant: Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses. Specifically, this repository contains (1) raw data from iDigBio and GBIF, (2) R code for reproducible data wrangling and improvement, (3) protocols associated with data enhancements, and (4) enhanced versions of the dataset published at various project milestones. Additional code associated with this grant can be found in the BIOSPEX repository. Long-term management of the enhanced specimen data created by this project is expected to be carried out by the natural history collections that curate the physical specimens; a list of these collections can be found in this Zenodo resource.

Grant abstract: "The award to Florida State University will support research contributing to the development of georeferenced, vetted, and versioned data products of the world's specimens of horseshoe bats and their relatives for use by researchers studying the origins and spread of SARS-like coronaviruses, including the causative agent of COVID-19. Horseshoe bats and other closely related species are reported to be reservoirs of several SARS-like coronaviruses. Species of these bats are primarily distributed in regions where these viruses have been introduced to populations of humans. Currently, data associated with specimens of these bats are housed in natural history collections that are widely distributed both nationally and globally. Additionally, information tying these specimens to localities is mostly vague, or in many instances missing. This decreases the utility of the specimens for understanding the source, emergence, and distribution of SARS-CoV-2 and similar viruses. This project will provide quality georeferenced data products through the consolidation of ancillary information linked to each bat specimen, using the extended specimen model. The resulting product will serve as a model of how data in biodiversity collections might be used to address emerging diseases of zoonotic origin. Results from the project will be disseminated widely in open-access journals, at scientific meetings, and via websites associated with the participating organizations and institutions. Support of this project provides a quality resource optimized to inform research relevant to improving our understanding of the biology and spread of SARS-CoV-2.
The overall objectives are to deliver versioned data products, in formats used by the wider research and biodiversity collections communities, through an open-access repository; to share project protocols and code via GitHub and describe them in a peer-reviewed paper; and to sustain engagement with biodiversity collections throughout the project so that improved data can be reintegrated into their local specimen data management systems, improving long-term curation.

This RAPID award will produce and deliver a georeferenced, vetted, and consolidated data product for horseshoe bats and related species to facilitate understanding of the sources, distribution, and spread of SARS-CoV-2 and related viruses. It is a timely response to the ongoing global pandemic caused by SARS-CoV-2 and an important contribution to the global effort to consolidate and provide quality data relevant to understanding emergent and other properties of the current pandemic. This RAPID award is made by the Division of Biological Infrastructure (DBI) using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria."

Files included in this resource

  • Raw data from iDigBio
  • Raw data from GBIF
  • Raw data from GBIF
  • RAPID-code_compile-deduplicate.R: code associated with compiling and deduplicating raw data
  • RAPID-code_standardize-country.R: code associated with standardizing country data
  • rapid-data-providers_2020-09-23.csv: list of data providers and the number of records each contributed to rapid-joined-records_country-cleanup_2020-09-23.csv
  • rapid-joined-records_country-cleanup_2020-09-23.csv: initial version of the data product, in which the raw data have been compiled and deduplicated and the country data standardized
  • RAPID-protocol_compile-deduplicate.pdf: protocol associated with compiling and deduplicating raw data
  • RAPID-protocol_standardize-country.pdf: protocol associated with standardizing country data
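The compile, deduplicate, and country-standardization steps named in the file list can be pictured with a small Python sketch. It is not a translation of RAPID-code_compile-deduplicate.R or RAPID-code_standardize-country.R, whose actual logic is documented in the accompanying protocol PDFs. The duplicate key (institution code plus catalog number), the country lookup table, the provider count by institution code, and the toy records are all hypothetical choices made for illustration; only the field names follow Darwin Core terms (institutionCode, catalogNumber, country).

```python
from collections import Counter

# Hypothetical lookup of cleaned country-name variants to one canonical form.
# The project's real reference list is described in RAPID-protocol_standardize-country.pdf.
COUNTRY_VARIANTS = {
    "viet nam": "Vietnam",
    "vietnam": "Vietnam",
    "peoples republic of china": "China",
    "china": "China",
    "lao pdr": "Laos",
    "laos": "Laos",
}


def compile_records(*sources):
    """Concatenate record lists from several aggregators into one list."""
    combined = []
    for records in sources:
        combined.extend(records)
    return combined


def dedup_key(record):
    """Assumed duplicate key: institution code plus catalog number, case- and space-normalized."""
    return (
        record["institutionCode"].strip().upper(),
        record["catalogNumber"].strip().upper(),
    )


def deduplicate(records):
    """Keep the first record seen for each key; later copies are dropped."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def standardize_country(record):
    """Return a copy of the record with a canonical country name when the variant is known."""
    raw = record.get("country", "")
    cleaned = raw.strip().lower().replace("'", "").replace(".", "")
    return {**record, "country": COUNTRY_VARIANTS.get(cleaned, raw.strip())}


def records_per_provider(records):
    """Count records per normalized institution code (a stand-in for 'data provider')."""
    return Counter(r["institutionCode"].strip().upper() for r in records)


# Toy records with made-up institutions and catalog numbers, not taken from the dataset.
idigbio = [
    {"institutionCode": "MUSEUM-A", "catalogNumber": "M-101", "country": "Viet Nam"},
    {"institutionCode": "MUSEUM-B", "catalogNumber": "7", "country": "Lao P.D.R."},
]
gbif = [
    {"institutionCode": "museum-a ", "catalogNumber": "m-101", "country": "Vietnam"},
    {"institutionCode": "MUSEUM-C", "catalogNumber": "3321", "country": "People's Republic of China"},
]

joined = [standardize_country(r) for r in deduplicate(compile_records(idigbio, gbif))]
provider_counts = records_per_provider(joined)
```

Because the sketch keeps the first record per key, the order in which sources are compiled decides which copy survives; a real pipeline would probably need an explicit rule for choosing among duplicates, such as preferring the record with the most complete locality data.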


Funding by the U.S. National Science Foundation DBI 2033973.


Files (629.4 MB)

  • 6.2 MB
  • 5.8 MB
  • 317.1 MB
  • 9.2 kB
  • 2.3 kB
  • 17.9 kB
  • 300.1 MB
  • 73.1 kB
  • 68.3 kB