Published March 31, 2023 | Version 0.1
Dataset Open

Global Biodiversity Information Facility (GBIF): an exhaustive list of gbif record ids, dataset keys, and their associated Occurrence IDs, Institution Code, Collection Codes and Catalog Numbers. hash://sha256/ea88f03a7bfd1ba853fdbea3203d54ab81ac3cdc8e8da7c96bbbba9c4b05d933 hash://md5/c49fe34785354847b37ea4509261e130

Creators

Description

The Global Biodiversity Information Facility (GBIF) indexes thousands of biodiversity datasets from natural history collections, citizen science initiatives (e.g., iNaturalist, eBird), and other sources. As part of the indexing process, GBIF associates at least two identifiers with each indexed record: a record id (aka gbifID) and a dataset id (aka dataset key). These ids are central to looking up records, referencing data, and packaging interpreted data products.

This publication contains an exhaustive list of GBIF ids and the ids assigned to the same records by their data providers, as derived from:

GBIF.org (01 March 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.pk3trq

The resource (size: ~260GB) provided by GBIF had content id hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 and was used to generate the resource included in this publication using

preston cat 'zip:hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97!/0015281-230224095556074.csv'\
| cut -f 1,2,3,37,38,39\
| gzip\
> gbifid.tsv.gz

with the content id of gbifid.tsv.gz (size: ~35GB) being hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8 .

The first 10 lines of gbifid.tsv.gz as extracted via

preston cat --remote https://zenodo.org/record/7789866/files,https://linker.bio hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8\
 | gunzip\
 | head

are:

gbifID	datasetKey	occurrenceID	institutionCode	collectionCode	catalogNumber
2997162320	c71c8000-9fc7-422c-804a-ce6abe751771	3399442	CEPEC	CEPEC	CEPEC00109669
2997162309	c71c8000-9fc7-422c-804a-ce6abe751771	2733085	CEPEC	CEPEC	CEPEC00000818
2997162317	c71c8000-9fc7-422c-804a-ce6abe751771	2733086	CEPEC	CEPEC	CEPEC00000888
2997162313	c71c8000-9fc7-422c-804a-ce6abe751771	3399443	CEPEC	CEPEC	CEPEC00109744
2997162306	c71c8000-9fc7-422c-804a-ce6abe751771	2733087	CEPEC	CEPEC	CEPEC00000889
2997162316	c71c8000-9fc7-422c-804a-ce6abe751771	3399440	CEPEC	CEPEC	CEPEC00109605
2997162324	c71c8000-9fc7-422c-804a-ce6abe751771	2733088	CEPEC	CEPEC	CEPEC00000890
2997162308	c71c8000-9fc7-422c-804a-ce6abe751771	3399441	CEPEC	CEPEC	CEPEC00109615
2997162303	c71c8000-9fc7-422c-804a-ce6abe751771	2733089	CEPEC	CEPEC	CEPEC00000891

Note that, at the time of writing, the HTML resources associated with the GBIF record id 2997162320 and the dataset key c71c8000-9fc7-422c-804a-ce6abe751771 (both taken from the first data row of the example above) are available via:

https://gbif.org/occurrence/2997162320

and

https://gbif.org/dataset/c71c8000-9fc7-422c-804a-ce6abe751771

respectively.
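
The same URL pattern can be applied to every row of the table. The sketch below assumes the six-column, tab-separated layout shown in the example and assumes that the two URL prefixes above remain valid; gbif_urls is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Read gbifid.tsv-style rows on stdin (tab-separated, with a header line) and
# print, for each data row, the occurrence page URL followed by the dataset page URL.
gbif_urls() {
  awk -F '\t' 'NR > 1 {
    print "https://gbif.org/occurrence/" $1   # column 1: gbifID
    print "https://gbif.org/dataset/" $2      # column 2: datasetKey
  }'
}

# Example call on the decompressed table:
# gunzip -c gbifid.tsv.gz | head | gbif_urls
```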

This resource was initially created to support integration with Bionomia (https://bionomia.net) by associating the people identifiers provided by Bionomia with their original records via their GBIF ids. Bionomia re-uses GBIF record ids to define links between records and the people (e.g., curators, collectors, identifiers) who worked on them.

In other words, this resource provides a versioned translation table from the GBIF data universe (as defined by GBIF record ids and dataset keys) to the data collections that exist (and evolve) independently of it.
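
As an illustration of using the table for translation, the sketch below looks up the GBIF record id and dataset key for a provider-assigned triple of institution code, collection code, and catalog number. It assumes the six-column, tab-separated layout shown in the example above; lookup_gbifid is a hypothetical helper name, and a linear scan like this is only one of several possible approaches.

```shell
#!/usr/bin/env bash

# Read gbifid.tsv-style rows on stdin and print "gbifID<TAB>datasetKey" for every
# row whose institutionCode, collectionCode and catalogNumber match the arguments.
# Appending "" forces string comparison, so "001" does not match "1".
# Note: awk -v interprets backslash escapes in the supplied values.
lookup_gbifid() {
  awk -F '\t' -v inst="$1" -v coll="$2" -v num="$3" '
    ($4 "") == (inst "") && ($5 "") == (coll "") && ($6 "") == (num "") {
      print $1 "\t" $2
    }'
}

# Example call on the decompressed table:
# gunzip -c gbifid.tsv.gz | lookup_gbifid CEPEC CEPEC CEPEC00109669
```

Because the file is scanned from start to finish, repeated lookups may be better served by loading the table into an indexed store first.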

Note that the resource identified by hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 was not included in this publication because it was too big (~260GB) to fit. You may be able to retrieve the resource from its original location at https://api.gbif.org/v1/occurrence/download/request/0015281-230224095556074.zip .

Files

Files (37.2 GB)

Checksum  Size
md5:221954c11c873d4254cb8de53a6581c3  4.0 kB
md5:48d6d35d0e3fc60ee07674373ef391f4  78 Bytes
md5:9e548db73bb2dfd023e2c2ebb94fdd5c  78 Bytes
md5:4f931115040a004f6f17dbd2ceaa5485  78 Bytes
md5:fb6b69696c9ee749b0911d1a41e9557e  4.7 kB
md5:7d7ba92e73fab4c8028f36c7e7c082e9  6.4 kB
md5:6de0a4bacfaef0ade544f0abfca50036  621.4 kB
md5:dbc22141c3abe11df59f341c27c06072  37.2 GB
md5:1732a4206b415a5a4de742bd90e6f5e1  78 Bytes
md5:755807adfb1d2e768d0994c70fd6b5d8  78 Bytes
md5:fce4196cea972207e3206978a348d118  547 Bytes
md5:7785b1e1a6d795ec7c13408ecd239260  3.8 kB
md5:c49fe34785354847b37ea4509261e130  2.8 kB
md5:d1c0554a30b0f64f45748f87b3ee6975  3.3 kB

Additional details

Funding

U.S. National Science Foundation
EAGER: Towards the Web of Biodiversity Knowledge: Understanding Data Connectedness to Improve Identifier Practices (award 1839201)