Dataset Open Access

Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses

Mast, Austin R.; Paul, Deborah L.; Rios, Nelson; Bruhn, Robert; Dalton, Trevor; Krimmel, Erica R.; Pearson, Katelin D.; Sherman, Aja; Shorthouse, David P.; Simmons, Nancy B.; Soltis, Pam; Upham, Nathan; Abibou, Djihbrihou

This repository is associated with NSF DBI 2033973, RAPID Grant: Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2033973). Specifically, this repository contains (1) raw data from iDigBio (http://portal.idigbio.org) and GBIF (https://www.gbif.org), (2) R code for reproducible data wrangling and improvement, (3) protocols associated with data enhancements, and (4) enhanced versions of the dataset published at various project milestones. Additional code associated with this grant can be found in the BIOSPEX repository (https://github.com/iDigBio/Biospex). Long-term data management of the enhanced specimen data created by this project is expected to be accomplished by the natural history collections curating the physical specimens, a list of which can be found in this Zenodo resource.

Grant abstract: "The award to Florida State University will support research contributing to the development of georeferenced, vetted, and versioned data products of the world's specimens of horseshoe bats and their relatives for use by researchers studying the origins and spread of SARS-like coronaviruses, including the causative agent of COVID-19. Horseshoe bats and other closely related species are reported to be reservoirs of several SARS-like coronaviruses. Species of these bats are primarily distributed in regions where these viruses have been introduced to populations of humans. Currently, data associated with specimens of these bats are housed in natural history collections that are widely distributed both nationally and globally. Additionally, information tying these specimens to localities is mostly vague, or in many instances missing. This decreases the utility of the specimens for understanding the source, emergence, and distribution of SARS-CoV-2 and similar viruses. This project will provide quality georeferenced data products through the consolidation of ancillary information linked to each bat specimen, using the extended specimen model. The resulting product will serve as a model of how data in biodiversity collections might be used to address emerging diseases of zoonotic origin. Results from the project will be disseminated widely in open-source journals, at scientific meetings, and via websites associated with the participating organizations and institutions. Support of this project provides a quality resource optimized to inform research relevant to improving our understanding of the biology and spread of SARS-CoV-2.

The overall objectives are to deliver versioned data products, in formats used by the wider research and biodiversity collections communities, through an open-access repository; project protocols and code via GitHub, described in a peer-reviewed paper; and sustained engagement with biodiversity collections throughout the project for reintegration of improved data into their local specimen data management systems, improving long-term curation.

This RAPID award will produce and deliver a georeferenced, vetted, and consolidated data product for horseshoe bats and related species to facilitate understanding of the sources, distribution, and spread of SARS-CoV-2 and related viruses, a timely response to the ongoing global pandemic caused by SARS-CoV-2 and an important contribution to the global effort to consolidate and provide quality data that are relevant to understanding emergent and other properties of the current pandemic. This RAPID award is made by the Division of Biological Infrastructure (DBI) using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria."

Files included in this resource

  • 9d4b9069-48c4-4212-90d8-4dd6f4b7f2a5.zip: Raw data from iDigBio, DwC-A format
  • 0067804-200613084148143.zip: Raw data from GBIF, DwC-A format
  • 0067806-200613084148143.zip: Raw data from GBIF, DwC-A format
  • 1623690110.zip: Full export of this project's data (enhanced and raw) from BIOSPEX, CSV format
  • bionomia-datasets-attributions.zip: Directory containing 103 Frictionless Data packages for datasets that contain Rhinolophids or Hipposiderids and have attributions made; each package also contains a CSV file listing mismatches between a person's date of birth/death and the specimen eventDate. The file bionomia-datasets-attributions-key_2021-02-25.csv in this directory provides a key between the dataset identifier (used to name the Frictionless Data package files) and the dataset name.
  • bionomia-problem-dates-all-datasets_2021-02-25.csv: List of 21 Hipposiderid or Rhinolophid records whose eventDate or dateIdentified conflicts with a Wikidata recipient’s date of birth or death, across all datasets.
  • flagEventDate.txt: file containing term definition to reference in DwC-A
  • flagExclude.txt: file containing term definition to reference in DwC-A
  • flagGeoreference.txt: file containing term definition to reference in DwC-A
  • flagTaxonomy.txt: file containing term definition to reference in DwC-A
  • georeferencedByID.txt: file containing term definition to reference in DwC-A
  • identifiedByNames.txt: file containing term definition to reference in DwC-A
  • instructions-to-get-people-data-from-bionomia-via-datasetKey.txt: instructions given to data providers
  • RAPID-code_collection-date.R: code associated with enhancing collection dates
  • RAPID-code_compile-deduplicate.R: code associated with compiling and deduplicating raw data
  • RAPID-code_external-linkages-bold.R: code associated with enhancing external linkages (BOLD)
  • RAPID-code_external-linkages-genbank.R: code associated with enhancing external linkages (GenBank)
  • RAPID-code_external-linkages-standardize.R: code associated with standardizing external linkages
  • RAPID-code_people.R: code associated with enhancing data about people
  • RAPID-code_standardize-country.R: code associated with standardizing country data
  • RAPID-data-dictionary.pdf: metadata about terms included in this project’s data, in PDF format
  • RAPID-data-dictionary.xlsx: metadata about terms included in this project’s data, in spreadsheet format
  • rapid-data-providers_2021-05-03.csv: list of data providers and number of records provided to rapid-joined-records_country-cleanup_2020-09-23.csv
  • rapid-final-data-product_2021-06-29.zip: Enhanced data from BIOSPEX, DwC-A format
  • rapid-final-gazetteer.zip: Gazetteer providing georeference data and metadata for 10,341 localities assessed as part of this project
  • rapid-joined-records_country-cleanup_2020-09-23.csv: initial version of the data product, in which raw data have been compiled and deduplicated and country data have been standardized
  • RAPID-protocol_collection-date.pdf: protocol associated with enhancing collection dates
  • RAPID-protocol_compile-deduplicate.pdf: protocol associated with compiling and deduplicating raw data
  • RAPID-protocol_external-linkages.pdf: protocol associated with enhancing external linkages
  • RAPID-protocol_georeference.pdf: protocol associated with georeferencing
  • RAPID-protocol_people.pdf: protocol associated with enhancing data about people
  • RAPID-protocol_standardize-country.pdf: protocol associated with standardizing country data
  • RAPID-protocol_taxonomic-names.pdf: protocol associated with enhancing taxonomic name data
  • RAPIDAgentStrings1_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol
  • recordedByNames.txt: file containing term definition to reference in DwC-A
  • Rhinolophid-HipposideridAgentStrings_and_People2_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol
  • wikidata-notes-for-bat-collectors_leachman_2020: please see https://zenodo.org/record/4724139 for this resource

Funded by the U.S. National Science Foundation, DBI 2033973.
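
Several of the files above are described as DwC-A (Darwin Core Archive) format. A Darwin Core Archive is a zip file whose meta.xml descriptor names the core data file, its delimiter, and the term assigned to each column. The following is a minimal sketch of how such an archive might be inspected with the Python standard library. It is not part of this repository's R code; the function name read_core_rows is hypothetical, and the sketch assumes that meta.xml sits at the top level of the zip and that every core field of interest has an index attribute.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the Darwin Core text guidelines for meta.xml.
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def read_core_rows(archive_path, limit=5):
    """Return up to `limit` core records as dicts keyed by short term name.

    Hypothetical helper: assumes meta.xml is at the top level of the zip.
    """
    with zipfile.ZipFile(archive_path) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
        core = meta.find("dwc:core", DWC_TEXT_NS)
        location = core.find("dwc:files/dwc:location", DWC_TEXT_NS).text.strip()

        # meta.xml usually writes a tab delimiter as the two characters "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        quote = core.get("fieldsEnclosedBy", '"')
        skip = int(core.get("ignoreHeaderLines", "0"))
        encoding = core.get("encoding", "UTF-8")

        # Map column index -> short term name (last segment of the term URI).
        columns = {}
        id_elem = core.find("dwc:id", DWC_TEXT_NS)
        if id_elem is not None:
            columns[int(id_elem.get("index"))] = "id"
        for field in core.findall("dwc:field", DWC_TEXT_NS):
            if field.get("index") is None:
                continue  # field with only a default value, no column
            columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]

        rows = []
        with archive.open(location) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            if quote:
                reader = csv.reader(text, delimiter=delimiter, quotechar=quote)
            else:
                reader = csv.reader(text, delimiter=delimiter, quoting=csv.QUOTE_NONE)
            for line_number, values in enumerate(reader):
                if line_number < skip:
                    continue  # skip header lines declared in meta.xml
                rows.append(
                    {name: values[i] for i, name in columns.items() if i < len(values)}
                )
                if len(rows) >= limit:
                    break
        return rows
```

Called as read_core_rows("rapid-final-data-product_2021-06-29.zip"), this would return the first few core records of the enhanced data product, provided the archive follows the layout assumed above. The sketch reads only the core file; any extension files declared in meta.xml are ignored.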

Files (797.3 MB)

  • 0067804-200613084148143.zip (6.2 MB; md5:05ce53b0ac3eaa4018bf5e0fc1a74857)
  • 0067806-200613084148143.zip (5.8 MB; md5:5f94c6be52ce63412865665fc5eccbb3)
  • 1623690110.zip (53.1 MB; md5:5e2519bd129ebec59bf0f5bb54b73111)
  • 9d4b9069-48c4-4212-90d8-4dd6f4b7f2a5.zip (317.1 MB; md5:3daed8fdb7b4f9c39579ee668af2df6e)
  • bionomia-datasets-attributions.zip (92.3 MB; md5:92ad6abe193a098a27011f9b10bceb09)
  • bionomia-problem-dates-all-datasets_2021-02-25.csv (4.1 kB; md5:25baafd9cdc36f7fa64fb416093fa1e6)
  • flagEventDate.txt (197 Bytes; md5:93e0d35c21649cfa97a1d899db15015a)
  • flagExclude.txt (209 Bytes; md5:e3b29b1e6bcd6770a642c5ee5f153f5e)
  • flagGeoreference.txt (190 Bytes; md5:5ed587213a3bcb99f74f775f58d39dad)
  • flagTaxonomy.txt (194 Bytes; md5:aa692f34c8fdef3c77eb8a224eaa1bfe)
  • georeferencedByID.txt (221 Bytes; md5:c9ee9daf369298567bf84312b61fab4a)
  • identifiedByNames.txt (333 Bytes; md5:4b5ce7d95cecfd217310025a7ae649a2)
  • instructions-to-get-people-data-from-bionomia-via-datasetKey.txt (1.3 kB; md5:ed033fd44e370b41386e9fbab548aafb)
  • RAPID-code_collection-date.R (13.1 kB; md5:a8d69065f1895c400fcc0e81f63aeed0)
  • RAPID-code_compile-deduplicate.R (9.2 kB; md5:e0534cac7c89426d047492cc7289b824)
  • RAPID-code_external-linkages-bold.R (2.6 kB; md5:41f407c76a7cad91d10c658358aeacc9)
  • RAPID-code_external-linkages-genbank.R (4.5 kB; md5:0526da7607ba4d1f3047d124c00955fa)
  • RAPID-code_external-linkages-standardize.R (2.8 kB; md5:7bf33f3178116a5606b5ce6dd1721a54)
  • RAPID-code_people.R (5.7 kB; md5:1a9fab42b92e7ce2ecc144c87d0a6e04)
  • RAPID-code_standardize-country.R (2.3 kB; md5:487ff45da3a19fb8bdc1153df594c11d)
  • RAPID-data-dictionary.pdf (120.0 kB; md5:77370179cb4d4b598737ec317f655813)
  • RAPID-data-dictionary.xlsx (37.5 kB; md5:4d03da8790368e410a33291e2ce566d7)
  • rapid-data-providers_2021-05-03.csv (18.0 kB; md5:360b6b0cb6b995c04904ec1dc221577f)
  • rapid-final-data-product_2021-06-29.zip (10.3 MB; md5:5c6039ee8cfec4ada2b755ed412045c3)
  • rapid-final-gazetteer.zip (10.3 MB; md5:c8b4d28cafab4e0668e3e41bf6a8c0fc)
  • rapid-joined-records_country-cleanup_2020-09-23.csv (300.1 MB; md5:fc2228e3edf9b144a34d0915e0fdea83)
  • RAPID-protocol_collection-date.pdf (94.5 kB; md5:533ed6dbeff30b5b9049167f98ffb7b0)
  • RAPID-protocol_compile-deduplicate.pdf (73.8 kB; md5:5b01fe32bbca066beb0ba48bdb39d69e)
  • RAPID-protocol_external-linkages.pdf (91.3 kB; md5:b3f1e392c2c44a889e7ea9cc84789af3)
  • RAPID-protocol_georeference.pdf (174.4 kB; md5:c12ab0cb35203ee749073121bed6339b)
  • RAPID-protocol_people.pdf (123.1 kB; md5:db947a088f88499291a793a6321ca8be)
  • RAPID-protocol_standardize-country.pdf (80.1 kB; md5:0cddeb798ed3912c4ef9052b86149fc0)
  • RAPID-protocol_taxonomic-names.pdf (104.1 kB; md5:e7c9ed46865bf6b244d55e8d9fb45cc2)
  • RAPIDAgentStrings1_archivedCopy_30March2021.ods (634.0 kB; md5:da70a92097a9eeb0b31ee452b0c1b9eb)
  • recordedByNames.txt (329 Bytes; md5:101994b0b522b8e10f06458513d52e45)
  • Rhinolophid-HipposideridAgentStrings_and_People2_archivedCopy_30March2021.ods (456.5 kB; md5:c26cecb34727d56dccbd2ccb28c7f9d7)
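
The MD5 values listed with each file can be used to confirm that a download is complete and unaltered. The following is a minimal sketch using the Python standard library. The function names md5_of_file and verify_download are hypothetical, and the example assumes that flagEventDate.txt has been downloaded into the current working directory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large files (such as the 317.1 MB iDigBio archive) are
    never loaded into memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the listed value.

    Accepts the value with or without the leading "md5:" label.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Checksum copied from the file listing for flagEventDate.txt.
    listed = {"flagEventDate.txt": "md5:93e0d35c21649cfa97a1d899db15015a"}
    for name, checksum in listed.items():
        if not os.path.exists(name):
            print(f"{name}: not found in the current directory")
        elif verify_download(name, checksum):
            print(f"{name}: checksum matches")
        else:
            print(f"{name}: checksum MISMATCH, consider downloading again")
```

Other files can be checked by adding their names and listed checksums to the dictionary.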