There is a newer version of the record available.

Published June 11, 2021 | Version 1.5
Dataset Open

Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses

  • 1. Florida State University
  • 2. Yale University Peabody Museum of Natural History
  • 3. Agriculture and Agri-Food Canada
  • 4. American Museum of Natural History
  • 5. University of Florida
  • 6. Arizona State University

Description

This repository is associated with NSF DBI 2033973, RAPID Grant: Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2033973). Specifically, this repository contains (1) raw data from iDigBio (http://portal.idigbio.org) and GBIF (https://www.gbif.org), (2) R code for reproducible data wrangling and improvement, (3) protocols associated with data enhancements, and (4) enhanced versions of the dataset published at various project milestones. Additional code associated with this grant can be found in the BIOSPEX repository (https://github.com/iDigBio/Biospex). Long-term data management of the enhanced specimen data created by this project is expected to be accomplished by the natural history collections curating the physical specimens, a list of which can be found in this Zenodo resource.

Grant abstract: "The award to Florida State University will support research contributing to the development of georeferenced, vetted, and versioned data products of the world's specimens of horseshoe bats and their relatives for use by researchers studying the origins and spread of SARS-like coronaviruses, including the causative agent of COVID-19. Horseshoe bats and other closely related species are reported to be reservoirs of several SARS-like coronaviruses. Species of these bats are primarily distributed in regions where these viruses have been introduced to populations of humans. Currently, data associated with specimens of these bats are housed in natural history collections that are widely distributed both nationally and globally. Additionally, information tying these specimens to localities are mostly vague, or in many instances missing. This decreases the utility of the specimens for understanding the source, emergence, and distribution of SARS-COV-2 and similar viruses. This project will provide quality georeferenced data products through the consolidation of ancillary information linked to each bat specimen, using the extended specimen model. The resulting product will serve as a model of how data in biodiversity collections might be used to address emerging diseases of zoonotic origin. Results from the project will be disseminated widely in opensource journals, at scientific meetings, and via websites associated with the participating organizations and institutions. Support of this project provides a quality resource optimized to inform research relevant to improving our understanding of the biology and spread of SARS-CoV-2. 
The overall objectives are to deliver versioned data products, in formats used by the wider research and biodiversity collections communities, through an open-access repository; project protocols and code via GitHub and described in a peer-reviewed paper, and; sustained engagement with biodiversity collections throughout the project for reintegration of improved data into their local specimen data management systems improving long-term curation.

This RAPID award will produce and deliver a georeferenced, vetted and consolidated data product for horseshoe bats and related species to facilitate understanding of the sources, distribution, and spread of SARS-CoV-2 and related viruses, a timely response to the ongoing global pandemic caused by SARS-CoV-2 and an important contribution to the global effort to consolidate and provide quality data that are relevant to understanding emergent and other properties the current pandemic. This RAPID award is made by the Division of Biological Infrastructure (DBI) using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria."

Files included in this resource

  • 9d4b9069-48c4-4212-90d8-4dd6f4b7f2a5.zip: Raw data from iDigBio, DwC-A format
  • 0067804-200613084148143.zip: Raw data from GBIF, DwC-A format
  • 0067806-200613084148143.zip: Raw data from GBIF, DwC-A format
  • 1620226888.zip: Full export of this project's data (enhanced and raw) from BIOSPEX, CSV format
  • bionomia-datasets-attributions.zip: Directory containing 103 Frictionless Data packages, one for each dataset that contains Rhinolophids or Hipposiderids and has attributions made. Each package also contains a CSV file of mismatches between a person's date of birth/death and the specimen eventDate. The file bionomia-datasets-attributions-key_2021-02-25.csv in this directory provides a key between dataset identifier (how the Frictionless Data package files are named) and dataset name.
  • bionomia-problem-dates-all-datasets_2021-02-25.csv: List of 21 Hipposiderid or Rhinolophid records, across all datasets, whose eventDate or dateIdentified does not match a Wikidata recipient's date of birth or death.
  • flagEventDate.txt: file containing term definition to reference in DwC-A
  • flagExclude.txt: file containing term definition to reference in DwC-A
  • flagGeoreference.txt: file containing term definition to reference in DwC-A
  • flagTaxonomy.txt: file containing term definition to reference in DwC-A
  • georeferencedByID.txt: file containing term definition to reference in DwC-A
  • identifiedByNames.txt: file containing term definition to reference in DwC-A
  • RAPID-code_collection-date.R: code associated with enhancing collection dates
  • RAPID-code_compile-deduplicate.R: code associated with compiling and deduplicating raw data
  • RAPID-code_external-linkages-bold.R: code associated with enhancing external linkages
  • RAPID-code_external-linkages-genbank.R: code associated with enhancing external linkages
  • RAPID-code_external-linkages-standardize.R: code associated with enhancing external linkages
  • RAPID-code_people.R: code associated with enhancing data about people
  • RAPID-code_standardize-country.R: code associated with standardizing country data
  • rapid-data-providers_2021-05-03.csv: list of data providers and number of records provided to rapid-joined-records_country-cleanup_2020-09-23.csv
  • rapid-final-data-product_2021-05-07.csv: Enhanced dataset, final version from BIOSPEX
  • rapid-joined-records_country-cleanup_2020-09-23.csv: data product initial version where raw data has been compiled and deduplicated, and country data has been standardized
  • RAPID-protocol_collection-date.pdf: protocol associated with enhancing collection dates
  • RAPID-protocol_compile-deduplicate.pdf: protocol associated with compiling and deduplicating raw data
  • RAPID-protocol_external-linkages.pdf: protocol associated with enhancing external linkages
  • RAPID-protocol_georeference.pdf: protocol associated with georeferencing
  • RAPID-protocol_people.pdf: protocol associated with enhancing data about people
  • RAPID-protocol_standardize-country.pdf: protocol associated with standardizing country data
  • RAPID-protocol_taxonomic-names.pdf: protocol associated with enhancing taxonomic name data
  • RAPIDAgentStrings1_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol
  • Rhinolophid-HipposideridAgentStrings_and_People2_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol
  • recordedByNames.txt: file containing term definition to reference in DwC-A
  • wikidata-notes-for-bat-collectors_leachman_2020.docx: resource used in conjunction with RAPID people protocol
  • wikidata-notes-for-bat-collectors_leachman_2020.pdf: resource used in conjunction with RAPID people protocol
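
The raw iDigBio and GBIF downloads listed above are in DwC-A (Darwin Core Archive) format, i.e. a zip file whose meta.xml describes a delimited core data file. The following is a minimal Python sketch of reading such an archive. It is not part of this repository's R code. The function name read_dwca_core is hypothetical, and the sketch assumes that meta.xml sits at the root of the zip and follows the Darwin Core text guidelines.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by meta.xml in the Darwin Core text guidelines.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_dwca_core(archive_path, limit=None):
    """Return rows of a DwC-A core file as dicts keyed by short term name.

    Hypothetical helper for illustration; assumes meta.xml is at the
    root of the zip archive.
    """
    with zipfile.ZipFile(archive_path) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
        core = meta.find(f"{DWC_TEXT_NS}core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text

        # meta.xml stores the delimiter as an escaped string such as "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        quote = core.get("fieldsEnclosedBy", '"')
        skip = int(core.get("ignoreHeaderLines", "0"))
        encoding = core.get("encoding", "UTF-8")

        # Map column index -> last path segment of the term URI.
        columns = {}
        id_element = core.find(f"{DWC_TEXT_NS}id")
        if id_element is not None and id_element.get("index") is not None:
            columns[int(id_element.get("index"))] = "id"
        for field in core.findall(f"{DWC_TEXT_NS}field"):
            if field.get("index") is not None:
                term = field.get("term").rsplit("/", 1)[-1]
                columns[int(field.get("index"))] = term

        rows = []
        with archive.open(location) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            if quote:
                reader = csv.reader(text, delimiter=delimiter, quotechar=quote)
            else:
                # An empty fieldsEnclosedBy means fields are not quoted.
                reader = csv.reader(text, delimiter=delimiter,
                                    quoting=csv.QUOTE_NONE)
            for line_number, values in enumerate(reader):
                if line_number < skip:
                    continue  # header lines declared in meta.xml
                rows.append({name: values[index]
                             for index, name in columns.items()
                             if index < len(values)})
                if limit is not None and len(rows) >= limit:
                    break
        return rows
```

For example, read_dwca_core("0067804-200613084148143.zip", limit=5) would return the first five core records of that archive, provided the file has been downloaded to the working directory. Extension files declared in meta.xml are ignored by this sketch.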

Notes

Funding by the U.S. National Science Foundation DBI 2033973.

Files

Files (953.9 MB)

Checksum and size of each file:

  • md5:05ce53b0ac3eaa4018bf5e0fc1a74857, 6.2 MB
  • md5:5f94c6be52ce63412865665fc5eccbb3, 5.8 MB
  • md5:3da246071e81f72269c778a664973636, 53.1 MB
  • md5:3daed8fdb7b4f9c39579ee668af2df6e, 317.1 MB
  • md5:92ad6abe193a098a27011f9b10bceb09, 92.3 MB
  • md5:25baafd9cdc36f7fa64fb416093fa1e6, 4.1 kB
  • md5:93e0d35c21649cfa97a1d899db15015a, 197 Bytes
  • md5:e3b29b1e6bcd6770a642c5ee5f153f5e, 209 Bytes
  • md5:5ed587213a3bcb99f74f775f58d39dad, 190 Bytes
  • md5:aa692f34c8fdef3c77eb8a224eaa1bfe, 194 Bytes
  • md5:c9ee9daf369298567bf84312b61fab4a, 221 Bytes
  • md5:4b5ce7d95cecfd217310025a7ae649a2, 333 Bytes
  • md5:a8d69065f1895c400fcc0e81f63aeed0, 13.1 kB
  • md5:e0534cac7c89426d047492cc7289b824, 9.2 kB
  • md5:41f407c76a7cad91d10c658358aeacc9, 2.6 kB
  • md5:0526da7607ba4d1f3047d124c00955fa, 4.5 kB
  • md5:7bf33f3178116a5606b5ce6dd1721a54, 2.8 kB
  • md5:1a9fab42b92e7ce2ecc144c87d0a6e04, 5.7 kB
  • md5:487ff45da3a19fb8bdc1153df594c11d, 2.3 kB
  • md5:360b6b0cb6b995c04904ec1dc221577f, 18.0 kB
  • md5:aec2e12ea35e961170c914a582d1bb33, 172.6 MB
  • md5:fc2228e3edf9b144a34d0915e0fdea83, 300.1 MB
  • md5:533ed6dbeff30b5b9049167f98ffb7b0, 94.5 kB
  • md5:5b01fe32bbca066beb0ba48bdb39d69e, 73.8 kB
  • md5:b3f1e392c2c44a889e7ea9cc84789af3, 91.3 kB
  • md5:c12ab0cb35203ee749073121bed6339b, 174.4 kB
  • md5:db947a088f88499291a793a6321ca8be, 123.1 kB
  • md5:0cddeb798ed3912c4ef9052b86149fc0, 80.1 kB
  • md5:e7c9ed46865bf6b244d55e8d9fb45cc2, 104.1 kB
  • md5:da70a92097a9eeb0b31ee452b0c1b9eb, 634.0 kB
  • md5:101994b0b522b8e10f06458513d52e45, 329 Bytes
  • md5:c26cecb34727d56dccbd2ccb28c7f9d7, 456.5 kB
  • md5:7b44eaf08de7e851d727fdeea6d2a851, 3.7 MB
  • md5:892f4526ad0450f7596d15f168a2ac28, 925.5 kB
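
The md5 values above can be used to check that a download is intact. The following is a minimal Python sketch of such a check using only the standard library. The function names md5_of_file and matches_listed_md5 are hypothetical, and the file is read in chunks because several of the files are hundreds of megabytes in size.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed checksum such as 'md5:05ce...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as matches_listed_md5(local_path, "md5:...") returns True when the file on disk matches the listed value. File names do not appear next to the checksums in this listing, so the pairing of a local file with its checksum has to be taken from the record page itself.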