Search and retrieve data from the Global Biodiversity Information Facility (GBIF)

About the package

rgbif is an R package to search and retrieve data from the Global Biodiversity Information Facility (GBIF). rgbif wraps R code around the GBIF API to allow you to talk to GBIF from R.

Get rgbif

Install from CRAN

Or install the development version from GitHub

devtools::install_github("ropensci/rgbif")

Load rgbif

library("rgbif")

Number of occurrences

Search by type of record, all observational in this case

occ_count(basisOfRecord='OBSERVATION')
#> [1] 20055605

Records for Puma concolor with lat/long data (georeferenced) only. Note that hasCoordinate in occ_search() is the same as georeferenced in occ_count().

occ_count(taxonKey=2435099, georeferenced=TRUE)
#> [1] 3421

All georeferenced records in GBIF

occ_count(georeferenced=TRUE)
#> [1] 943606574

Records from Denmark

denmark_code <- isocodes[grep("Denmark", isocodes$name), "code"]
occ_count(country=denmark_code)
#> [1] 27301026

Number of records in a particular dataset

occ_count(datasetKey='9e7ea106-0bf8-4087-bb61-dfe4f29e0f17')
#> [1] 4591

All records from 2012

Records for a particular dataset, and only for preserved specimens

occ_count(datasetKey='e707e6da-e143-445d-b41d-529c4a777e8b', basisOfRecord='OBSERVATION')
#> [1] 0

Search for taxon names

Get possible values to be used in taxonomic rank arguments in functions

name_lookup() does full text search of name usages covering the scientific and vernacular name, the species description, distribution and the entire classification across all name usages of all or some checklists. Results are ordered by relevance as this search usually returns a lot of results.

By default name_lookup() returns five slots of information: meta, data, facets, hierarchies, and names. hierarchies and names elements are named by their matching GBIF key in the data.frame in the data slot.

out <- name_lookup(query='mammalia')

Search for a genus

Search for the class mammalia

Look up the species Helianthus annuus

The function name_usage() works with lots of different name endpoints in GBIF, listed at http://www.gbif.org/developer/species#nameUsages.

The function name_backbone() is used to search against the GBIF backbone taxonomy

The function name_suggest() is optimized for speed, and gives back suggested names based on query parameters.

Search for occurrences

By default occ_search() returns a dplyr like output summary in which the data printed expands based on how much data is returned, and the size of your window. You can search by scientific name:

occ_search(scientificName = "Ursus americanus", limit = 20)
#> Records found [8907] 
#> Records returned [20] 
#> No. unique hierarchies [1] 
#> No. media records [17] 
#> No. facets [0] 
#> Args [limit=20, offset=0, scientificName=Ursus americanus, fields=all] 
#> # A tibble: 20 x 75
#>    name     key decimalLatitude decimalLongitude issues datasetKey
#>    <chr>  <int>           <dbl>            <dbl> <chr>  <chr>     
#>  1 Ursu… 1.81e9            37.7           -120.  cdrou… 50c9509d-…
#>  2 Ursu… 1.80e9            29.3           -103.  cdrou… 50c9509d-…
#>  3 Ursu… 1.81e9            42.0           -124.  cdrou… 50c9509d-…
#>  4 Ursu… 1.88e9            25.4           -101.  cdrou… 50c9509d-…
#>  5 Ursu… 1.81e9            40.8            -81.7 cdrou… 50c9509d-…
#>  6 Ursu… 1.81e9            34.4           -119.  gass84 50c9509d-…
#>  7 Ursu… 1.88e9            31.8           -105.  cdrou… 50c9509d-…
#>  8 Ursu… 1.81e9            25.4           -101.  gass84 50c9509d-…
#>  9 Ursu… 1.80e9            30.0            -84.3 cdrou… 50c9509d-…
#> 10 Ursu… 1.88e9            39.2           -121.  cdrou… 50c9509d-…
#> 11 Ursu… 1.84e9            49.4           -123.  cdrou… 50c9509d-…
#> 12 Ursu… 1.84e9            44.9           -110.  cdrou… 50c9509d-…
#> 13 Ursu… 1.84e9            34.0           -117.  gass84 50c9509d-…
#> 14 Ursu… 1.84e9            34.0           -117.  gass84 50c9509d-…
#> 15 Ursu… 1.81e9            39.1           -120.  cdrou… 50c9509d-…
#> 16 Ursu… 1.83e9            25.4           -101.  cdrou… 50c9509d-…
#> 17 Ursu… 1.81e9            34.4           -120.  cdrou… 50c9509d-…
#> 18 Ursu… 1.88e9            25.3           -101.  gass84 50c9509d-…
#> 19 Ursu… 1.83e9            34.0           -117.  gass84 50c9509d-…
#> 20 Ursu… 1.81e9            35.8            -83.6 cdrou… 50c9509d-…
#> # ... with 69 more variables: publishingOrgKey <chr>, networkKeys <chr>,
#> #   installationKey <chr>, publishingCountry <chr>, protocol <chr>,
#> #   lastCrawled <chr>, lastParsed <chr>, crawlId <int>, extensions <chr>,
#> #   basisOfRecord <chr>, taxonKey <int>, kingdomKey <int>,
#> #   phylumKey <int>, classKey <int>, orderKey <int>, familyKey <int>,
#> #   genusKey <int>, speciesKey <int>, acceptedTaxonKey <int>,
#> #   scientificName <chr>, acceptedScientificName <chr>, kingdom <chr>,
#> #   phylum <chr>, order <chr>, family <chr>, genus <chr>, species <chr>,
#> #   genericName <chr>, specificEpithet <chr>, taxonRank <chr>,
#> #   taxonomicStatus <chr>, dateIdentified <chr>, stateProvince <chr>,
#> #   year <int>, month <int>, day <int>, eventDate <chr>, modified <chr>,
#> #   lastInterpreted <chr>, references <chr>, license <chr>,
#> #   identifiers <chr>, facts <chr>, relations <chr>, geodeticDatum <chr>,
#> #   class <chr>, countryCode <chr>, country <chr>, rightsHolder <chr>,
#> #   identifier <chr>, verbatimEventDate <chr>, datasetName <chr>,
#> #   collectionCode <chr>, gbifID <chr>, verbatimLocality <chr>,
#> #   occurrenceID <chr>, taxonID <chr>, catalogNumber <chr>,
#> #   recordedBy <chr>, http...unknown.org.occurrenceDetails <chr>,
#> #   institutionCode <chr>, rights <chr>, eventTime <chr>,
#> #   occurrenceRemarks <chr>,
#> #   http...unknown.org.http_..rs.gbif.org.terms.1.0.Multimedia <chr>,
#> #   identificationID <chr>, infraspecificEpithet <chr>,
#> #   coordinateUncertaintyInMeters <dbl>, informationWithheld <chr>

Or to be more precise, you can search for names first, make sure you have the right name, then pass the GBIF key to the occ_search() function:

key <- name_suggest(q='Helianthus annuus', rank='species')$key[1]
occ_search(taxonKey=key, limit=20)
#> Records found [42034] 
#> Records returned [20] 
#> No. unique hierarchies [1] 
#> No. media records [15] 
#> No. facets [0] 
#> Args [limit=20, offset=0, taxonKey=9206251, fields=all] 
#> # A tibble: 20 x 96
#>    name     key decimalLatitude decimalLongitude issues datasetKey
#>    <chr>  <int>           <dbl>            <dbl> <chr>  <chr>     
#>  1 Heli… 1.81e9            52.6             10.1 cdrou… 6ac3f774-…
#>  2 Heli… 1.81e9            25.6           -100.  cdrou… 50c9509d-…
#>  3 Heli… 1.82e9            59.8             17.5 gass8… 38b4c89f-…
#>  4 Heli… 1.84e9            34.1           -116.  gass84 50c9509d-…
#>  5 Heli… 1.82e9            56.6             16.6 cdrou… 38b4c89f-…
#>  6 Heli… 1.81e9            25.7           -100.  cdrou… 50c9509d-…
#>  7 Heli… 1.82e9            56.6             16.4 cdrou… 38b4c89f-…
#>  8 Heli… 1.81e9            32.0           -102.  cdrou… 50c9509d-…
#>  9 Heli… 1.84e9            33.9           -117.  cdrou… 50c9509d-…
#> 10 Heli… 1.84e9             0                0   cucdm… d2470ef8-…
#> 11 Heli… 1.88e9            26.2            -98.2 cdrou… 50c9509d-…
#> 12 Heli… 1.83e9            58.4             14.9 gass8… 38b4c89f-…
#> 13 Heli… 1.81e9            25.7           -100.  cdrou… 50c9509d-…
#> 14 Heli… 1.91e9            30.3            -98.0 cdrou… 50c9509d-…
#> 15 Heli… 1.83e9            58.6             16.2 gass8… 38b4c89f-…
#> 16 Heli… 1.83e9            34.0           -117.  cdrou… 50c9509d-…
#> 17 Heli… 1.84e9            23.8           -107.  cdrou… 50c9509d-…
#> 18 Heli… 1.84e9            26.2            -98.3 cdrou… 50c9509d-…
#> 19 Heli… 1.88e9            26.1            -98.2 cdrou… 50c9509d-…
#> 20 Heli… 1.84e9            25.8           -100.  cdrou… 50c9509d-…
#> # ... with 90 more variables: publishingOrgKey <chr>, networkKeys <chr>,
#> #   installationKey <chr>, publishingCountry <chr>, protocol <chr>,
#> #   lastCrawled <chr>, lastParsed <chr>, crawlId <int>, extensions <chr>,
#> #   basisOfRecord <chr>, taxonKey <int>, kingdomKey <int>,
#> #   phylumKey <int>, classKey <int>, orderKey <int>, familyKey <int>,
#> #   genusKey <int>, speciesKey <int>, acceptedTaxonKey <int>,
#> #   scientificName <chr>, acceptedScientificName <chr>, kingdom <chr>,
#> #   phylum <chr>, order <chr>, family <chr>, genus <chr>, species <chr>,
#> #   genericName <chr>, specificEpithet <chr>, taxonRank <chr>,
#> #   taxonomicStatus <chr>, coordinateUncertaintyInMeters <dbl>,
#> #   year <int>, month <int>, day <int>, eventDate <chr>,
#> #   lastInterpreted <chr>, license <chr>, identifiers <chr>, facts <chr>,
#> #   relations <chr>, geodeticDatum <chr>, class <chr>, countryCode <chr>,
#> #   country <chr>, recordedBy <chr>, catalogNumber <chr>,
#> #   institutionCode <chr>, locality <chr>, gbifID <chr>,
#> #   collectionCode <chr>,
#> #   http...unknown.org.http_..rs.gbif.org.terms.1.0.Multimedia <chr>,
#> #   dateIdentified <chr>, stateProvince <chr>, modified <chr>,
#> #   references <chr>, rightsHolder <chr>, identifier <chr>,
#> #   verbatimEventDate <chr>, datasetName <chr>, verbatimLocality <chr>,
#> #   occurrenceID <chr>, taxonID <chr>,
#> #   http...unknown.org.occurrenceDetails <chr>, rights <chr>,
#> #   eventTime <chr>, identificationID <chr>, individualCount <int>,
#> #   continent <chr>, county <chr>, municipality <chr>,
#> #   identificationVerificationStatus <chr>, language <chr>, type <chr>,
#> #   occurrenceStatus <chr>, vernacularName <chr>, taxonConceptID <chr>,
#> #   informationWithheld <chr>, endDayOfYear <chr>, startDayOfYear <chr>,
#> #   datasetID <chr>, accessRights <chr>, higherClassification <chr>,
#> #   occurrenceRemarks <chr>, habitat <chr>, elevation <dbl>,
#> #   elevationAccuracy <dbl>, recordNumber <chr>,
#> #   ownerInstitutionCode <chr>, identifiedBy <chr>

Like many functions in rgbif, you can choose what to return with the return parameter, here, just returning the metadata:

You can choose what fields to return. This isn’t passed on to the API query to GBIF as they don’t allow that, but we filter out the columns before we give the data back to you.

Most parameters are vectorized, so you can pass in more than one value:

splist <- c('Cyanocitta stelleri', 'Junco hyemalis', 'Aix sponsa')
keys <- sapply(splist, function(x) name_suggest(x)$key[1], USE.NAMES=FALSE)
occ_search(taxonKey=keys, limit=5)
#> Occ. found [2482598 (704794), 9362842 (3810565), 2498387 (1246763)] 
#> Occ. returned [2482598 (5), 9362842 (5), 2498387 (5)] 
#> No. unique hierarchies [2482598 (1), 9362842 (1), 2498387 (1)] 
#> No. media records [2482598 (5), 9362842 (5), 2498387 (5)] 
#> No. facets [2482598 (0), 9362842 (0), 2498387 (0)] 
#> Args [limit=5, offset=0, taxonKey=2482598,9362842,2498387, fields=all] 
#> 3 requests; First 10 rows of data from 2482598
#> 
#> # A tibble: 5 x 72
#>   name     key decimalLatitude decimalLongitude issues datasetKey
#>   <chr>  <int>           <dbl>            <dbl> <chr>  <chr>     
#> 1 Cyan… 1.81e9            48.4            -124. cdrou… 50c9509d-…
#> 2 Cyan… 1.80e9            38.0            -105. gass84 50c9509d-…
#> 3 Cyan… 1.81e9            49.2            -124. cdrou… 50c9509d-…
#> 4 Cyan… 1.80e9            49.2            -124. cdrou… 50c9509d-…
#> 5 Cyan… 1.81e9            32.9            -108. cdrou… 50c9509d-…
#> # ... with 66 more variables: publishingOrgKey <chr>, networkKeys <chr>,
#> #   installationKey <chr>, publishingCountry <chr>, protocol <chr>,
#> #   lastCrawled <chr>, lastParsed <chr>, crawlId <int>, extensions <chr>,
#> #   basisOfRecord <chr>, taxonKey <int>, kingdomKey <int>,
#> #   phylumKey <int>, classKey <int>, orderKey <int>, familyKey <int>,
#> #   genusKey <int>, speciesKey <int>, acceptedTaxonKey <int>,
#> #   scientificName <chr>, acceptedScientificName <chr>, kingdom <chr>,
#> #   phylum <chr>, order <chr>, family <chr>, genus <chr>, species <chr>,
#> #   genericName <chr>, specificEpithet <chr>, taxonRank <chr>,
#> #   taxonomicStatus <chr>, dateIdentified <chr>,
#> #   coordinateUncertaintyInMeters <dbl>, stateProvince <chr>, year <int>,
#> #   month <int>, day <int>, eventDate <chr>, modified <chr>,
#> #   lastInterpreted <chr>, references <chr>, license <chr>,
#> #   identifiers <chr>, facts <chr>, relations <chr>, geodeticDatum <chr>,
#> #   class <chr>, countryCode <chr>, country <chr>, rightsHolder <chr>,
#> #   identifier <chr>, verbatimEventDate <chr>, datasetName <chr>,
#> #   collectionCode <chr>, gbifID <chr>, verbatimLocality <chr>,
#> #   occurrenceID <chr>, taxonID <chr>, catalogNumber <chr>,
#> #   recordedBy <chr>, http...unknown.org.occurrenceDetails <chr>,
#> #   institutionCode <chr>, rights <chr>, eventTime <chr>,
#> #   http...unknown.org.http_..rs.gbif.org.terms.1.0.Multimedia <chr>,
#> #   identificationID <chr>

Maps

Using the GBIF map web tile service, making a raster and visualizing it.

x <- map_fetch(taxonKey = 2480498, year = 2000:2017)
library(raster)
plot(x)
plot of chunk gbifmap1

plot of chunk gbifmap1