Global Biodiversity Information Facility (GBIF): an exhaustive list of gbif record ids, dataset keys, and their associated Occurrence IDs, Institution Code, Collection Codes and Catalog Numbers. hash://sha256/ea88f03a7bfd1ba853fdbea3203d54ab81ac3cdc8e8da7c96bbbba9c4b05d933 hash://md5/c49fe34785354847b37ea4509261e130
Description
The Global Biodiversity Information Facility (GBIF) indexes thousands of biodiversity datasets from Natural History Collections, citizen science initiatives (e.g., iNaturalist, eBird), and other sources. As part of the indexing process, GBIF associates at least two identifiers with each indexed record: a record id (aka gbifID) and a dataset id (aka dataset key). These ids are central to looking up records, referencing data, and packaging interpreted data products.
This publication contains an exhaustive list of GBIF ids and the ids assigned by the original data providers, as derived from:
GBIF.org (01 March 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.pk3trq
The resource (size: ~260GB) provided by GBIF had content id hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 and was used to generate the resource included in this publication using
preston cat 'zip:hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97!/0015281-230224095556074.csv'\
| cut -f 1,2,3,37,38,39\
| gzip\
> gbifid.tsv.gz
with the content id of gbifid.tsv.gz (size: ~35GB) being hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8 .
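The content id above can be checked against a local copy of the file. The following is a minimal sketch, assuming GNU coreutils `sha256sum` is installed; the function names `content_id` and `verify_content_id` are hypothetical helpers for illustration and are not part of Preston.

```shell
#!/bin/sh
# content_id FILE: print the sha256 content id of FILE in hash://sha256/... notation.
# (hypothetical helper, not a Preston command)
content_id() {
  # Reading from stdin makes sha256sum print "<hex digest>  -"; keep only the digest.
  printf 'hash://sha256/%s\n' "$(sha256sum < "$1" | cut -d ' ' -f 1)"
}

# verify_content_id FILE EXPECTED: succeed only if FILE hashes to EXPECTED.
verify_content_id() {
  [ "$(content_id "$1")" = "$2" ]
}
```

For example, `verify_content_id gbifid.tsv.gz hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8 && echo ok` would print `ok` only if the local file matches the content id stated above.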
The first 10 lines of gbifid.tsv.gz, as extracted via
preston cat --remote https://zenodo.org/record/7789866/files,https://linker.bio hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8\
| gunzip\
| head
are:
gbifID datasetKey occurrenceID institutionCode collectionCode catalogNumber
2997162320 c71c8000-9fc7-422c-804a-ce6abe751771 3399442 CEPEC CEPEC CEPEC00109669
2997162309 c71c8000-9fc7-422c-804a-ce6abe751771 2733085 CEPEC CEPEC CEPEC00000818
2997162317 c71c8000-9fc7-422c-804a-ce6abe751771 2733086 CEPEC CEPEC CEPEC00000888
2997162313 c71c8000-9fc7-422c-804a-ce6abe751771 3399443 CEPEC CEPEC CEPEC00109744
2997162306 c71c8000-9fc7-422c-804a-ce6abe751771 2733087 CEPEC CEPEC CEPEC00000889
2997162316 c71c8000-9fc7-422c-804a-ce6abe751771 3399440 CEPEC CEPEC CEPEC00109605
2997162324 c71c8000-9fc7-422c-804a-ce6abe751771 2733088 CEPEC CEPEC CEPEC00000890
2997162308 c71c8000-9fc7-422c-804a-ce6abe751771 3399441 CEPEC CEPEC CEPEC00109615
2997162303 c71c8000-9fc7-422c-804a-ce6abe751771 2733089 CEPEC CEPEC CEPEC00000891
Note that at the time of writing, the HTML resources associated with the GBIF record id (gbifID) 2997162320 and the dataset key c71c8000-9fc7-422c-804a-ce6abe751771 (both taken from the first data row of the example above) are available via:
https://gbif.org/occurrence/2997162320
and
https://gbif.org/dataset/c71c8000-9fc7-422c-804a-ce6abe751771
respectively.
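The URL pattern used in this example can be applied to any row of the table. The following is a minimal sketch, assuming tab-separated input with the header line shown above and the gbifID and datasetKey in columns 1 and 2; the function name `gbif_urls` is hypothetical, and the URL patterns are simply those of the two example links.

```shell
#!/bin/sh
# gbif_urls: read gbifid.tsv-style rows (tab-separated, header line first) on stdin
# and print "<occurrence URL><TAB><dataset URL>" for each data row.
# (hypothetical helper; URL patterns copied from the example links in this description)
gbif_urls() {
  awk -F '\t' 'NR > 1 {
    printf "https://gbif.org/occurrence/%s\thttps://gbif.org/dataset/%s\n", $1, $2
  }'
}
```

For example, `gunzip -c gbifid.tsv.gz | head -n 3 | gbif_urls` would print the URL pairs for the first two data rows.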
This resource was initially created to support integration with Bionomia (https://bionomia.net) by associating the people identifiers provided by Bionomia with their original records via GBIF ids. Bionomia re-uses GBIF record ids to define links between records and the people (e.g., curators, collectors, identifiers) who worked on them.
In other words, this resource provides a versioned translation table from the GBIF data universe (as defined by GBIF record ids and dataset keys) to the data collections that exist (and evolve) independently of it.
Note that the resource identified by hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 was not included in this publication because it was too big (~260GB) to fit. You may be able to retrieve the resource from its original location at https://api.gbif.org/v1/occurrence/download/request/0015281-230224095556074.zip .
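Used as a translation table, the file can be scanned for the provider-assigned codes of a record to find its GBIF ids. The following is a minimal sketch, assuming tab-separated input with the six columns listed in the header above; the function name `lookup_gbifid` is hypothetical, and a linear scan with awk is only one of several ways to do this.

```shell
#!/bin/sh
# lookup_gbifid INSTITUTION COLLECTION CATALOG: read tab-separated rows on stdin
# (gbifID, datasetKey, occurrenceID, institutionCode, collectionCode, catalogNumber)
# and print "gbifID<TAB>datasetKey" for every row whose three codes match exactly.
# (hypothetical helper; note that awk -v interprets backslash escapes in the arguments)
lookup_gbifid() {
  awk -F '\t' -v inst="$1" -v coll="$2" -v catno="$3" '
    # Appending "" forces string comparison, so "007" does not match "7".
    ($4 "") == (inst "") && ($5 "") == (coll "") && ($6 "") == (catno "") {
      print $1 "\t" $2
    }
  '
}
```

For example, `gunzip -c gbifid.tsv.gz | lookup_gbifid CEPEC CEPEC CEPEC00000818` would print the gbifID and dataset key of the matching record. Because this reads the whole file once per query, loading the table into an indexed store may be preferable for repeated lookups.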
Files
(37.2 GB)
MD5 checksum | Size |
---|---|
md5:221954c11c873d4254cb8de53a6581c3 | 4.0 kB |
md5:48d6d35d0e3fc60ee07674373ef391f4 | 78 Bytes |
md5:9e548db73bb2dfd023e2c2ebb94fdd5c | 78 Bytes |
md5:4f931115040a004f6f17dbd2ceaa5485 | 78 Bytes |
md5:fb6b69696c9ee749b0911d1a41e9557e | 4.7 kB |
md5:7d7ba92e73fab4c8028f36c7e7c082e9 | 6.4 kB |
md5:6de0a4bacfaef0ade544f0abfca50036 | 621.4 kB |
md5:dbc22141c3abe11df59f341c27c06072 | 37.2 GB |
md5:1732a4206b415a5a4de742bd90e6f5e1 | 78 Bytes |
md5:755807adfb1d2e768d0994c70fd6b5d8 | 78 Bytes |
md5:fce4196cea972207e3206978a348d118 | 547 Bytes |
md5:7785b1e1a6d795ec7c13408ecd239260 | 3.8 kB |
md5:c49fe34785354847b37ea4509261e130 | 2.8 kB |
md5:d1c0554a30b0f64f45748f87b3ee6975 | 3.3 kB |
Additional details
Related works
- Is compiled by
- https://github.com/bio-guoda/preston (URL)
- 10.5281/zenodo.7789745 (DOI)
- Is derived from
- Dataset: 10.15468/dl.pk3trq (DOI)
- hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 (URL)
- https://api.gbif.org/v1/occurrence/download/request/0015281-230224095556074.zip (URL)
Funding
- U.S. National Science Foundation
- EAGER: Towards the Web of Biodiversity Knowledge: Understanding Data Connectedness to Improve Identifier Practices 1839201