Dataset Open Access

Global Biotic Interactions: Interpreted Data Products

Poelen, Jorrit H.


Global Biotic Interactions (GloBI, https://globalbioticinteractions.org, [1]) aims to facilitate access to existing species interaction records (e.g., predator-prey, plant-pollinator, virus-host). This data publication provides interpreted species interaction data products. These products are produced by linking versioned, existing species interaction datasets ([2]) to the GloBI Taxon Graph ([3]) and transforming the result into various aggregate formats (e.g., TSV, CSV, Neo4j, RDF/N-Quads, Darwin Core-style archives). In addition, the name maps used for this linking are included to make the taxonomic interpretation explicit.
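To illustrate the linking step described above, here is a minimal sketch of applying a name map to interaction records. It is not GloBI's implementation (the integration software [6] resolves names against several naming schemes); the record layout, the field names (`sourceName`, `targetName`, `...ResolvedId`) and the example names and ids are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical name map: verbatim name -> (resolved id, resolved name).
NameMap = Dict[str, Tuple[str, str]]


def link_names(records: List[dict], name_map: NameMap) -> List[dict]:
    """Attach resolved taxon ids/names to each record's source and target.

    Names missing from the map get resolved fields set to None, so
    unresolved names stay visible instead of being silently dropped.
    """
    linked = []
    for rec in records:
        out = dict(rec)
        for role in ("source", "target"):
            verbatim = rec[f"{role}Name"]
            resolved: Optional[Tuple[str, str]] = name_map.get(verbatim)
            out[f"{role}ResolvedId"] = resolved[0] if resolved else None
            out[f"{role}ResolvedName"] = resolved[1] if resolved else None
        linked.append(out)
    return linked


if __name__ == "__main__":
    # Toy data for illustration only; "EX:1" is a placeholder identifier.
    name_map = {
        "Apis mellifera": ("EX:1", "Apis mellifera"),
        "honey bee": ("EX:1", "Apis mellifera"),
    }
    records = [
        {"sourceName": "honey bee", "interaction": "visitsFlowersOf",
         "targetName": "Trifolium repens"},
    ]
    for row in link_names(records, name_map):
        print(row)
```

Keeping unresolved names as explicit `None` values mirrors the intent of shipping the name maps alongside the data: a reader can see which names were, and were not, linked.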

Citation
--------

GloBI is made possible by researchers, collections, projects and institutions openly sharing their datasets. When using this data, please make sure to attribute these *original data contributors*, including citing the specific datasets in derivative work. Each species interaction record indexed by GloBI contains a reference citation and a dataset citation. A full list of all references can also be found in the citations.csv.gz/citations.tsv.gz files in this publication. If you have ideas on how to make it easier to cite original datasets, please open or join a discussion via https://globalbioticinteractions.org or related projects.
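Because each record carries its own dataset citation, one way to compile the list of datasets to credit in a derivative work is to tally records per citation. A minimal sketch, assuming the dataset citation sits in a column named `sourceCitation`; that column name is an assumption, so check the header of the file you use and pass the actual name.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Mapping, Optional


def tally_citations(rows: Iterable[Mapping[str, Optional[str]]],
                    column: str = "sourceCitation") -> Counter:
    """Count interaction records per dataset citation.

    `column` is an assumed header name; rows lacking it are counted under None.
    """
    return Counter(row.get(column) for row in rows)


if __name__ == "__main__":
    # Hypothetical two-column sample standing in for an interactions file.
    sample = io.StringIO(
        "sourceCitation\tinteractionTypeName\n"
        "Dataset A\teats\n"
        "Dataset B\tpollinates\n"
        "Dataset A\teats\n"
    )
    rows = csv.DictReader(sample, delimiter="\t", quoting=csv.QUOTE_NONE)
    for citation, n in tally_citations(rows).most_common():
        print(f"{n}\t{citation}")
```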

To credit GloBI for making interaction data easier to find, please use the following citation:

Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.

Bias and Errors
--------

As with any analysis and processing workflow, care should be taken to understand the bias and error propagation of data sources and related data transformation processes. The datasets indexed by GloBI are biased geospatially, temporally and taxonomically ([4], [5]). Also, the mapping of verbatim names from datasets to known name concepts may contain errors due to synonym mismatches, outdated name lists, typos or conflicting name authorities. Finally, software bugs may introduce bias and errors into the resulting integrated data product.
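One symptom of the name-mapping problems listed above is a verbatim name that maps to more than one distinct resolved name, for instance a homonym or a name interpreted differently by two authorities. A minimal screening sketch, assuming the name map is available as (provided name, resolved name) pairs. A flagged name is only a candidate for review: one-to-many mappings can be legitimate, and a name mapped consistently to the wrong concept will not be flagged.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ambiguous_names(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return verbatim names that map to two or more distinct resolved names.

    Resolved names are normalized (whitespace collapsed, lowercased) first,
    so "Puma concolor" and "puma  concolor" count as the same name.
    """
    resolved: Dict[str, Set[str]] = defaultdict(set)
    for provided, name in pairs:
        resolved[provided].add(" ".join(name.split()).lower())
    return {k: v for k, v in resolved.items() if len(v) > 1}


if __name__ == "__main__":
    # Toy pairs; "Aus bus" and "Aus cus" are placeholder names.
    pairs = [
        ("Felis concolor", "Puma concolor"),
        ("Felis concolor", "Puma  concolor"),
        ("Aus", "Aus bus"),
        ("Aus", "Aus cus"),
    ]
    for name, targets in ambiguous_names(pairs).items():
        print(name, sorted(targets))
```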

To help better understand where bias and errors are introduced, only versioned data and code are used as input: the datasets ([2]), name maps ([3]) and integration software ([6]) are versioned so that the integration process can be reproduced if needed. This way, the steps taken to compile an integrated data record can be traced, and the sources of bias and errors can be found more easily.

Contents
--------

README:
this file

citations.csv.gz:
contains data citations in a gzipped comma-separated values format.

interactions.csv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format.

citations.tsv.gz:
contains data citations in a gzipped tab-separated values format.

interactions.tsv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format.

interactions.nq.gz:
contains species interactions expressed in the Resource Description Framework (RDF) in a gzipped N-Quads format.

dwca-by-study.zip:
contains species interaction data as a Darwin Core Archive aggregated by study, using a custom occurrence-level association extension.

dwca.zip:
contains species interaction data as a Darwin Core Archive using a custom occurrence-level association extension.

neo4j-graphdb.zip:
contains a Neo4j v2.3.12 graph database snapshot with a graph representation of the species interaction data.

taxonCache.tsv.gz:
contains hierarchies and identifiers associated with names from naming schemes in a gzipped tab-separated values format.

taxonMap.tsv.gz:
describes how names in existing datasets were mapped into existing naming schemes in a gzipped tab-separated values format.
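The gzipped tabular products can be processed without unpacking them to disk. A minimal sketch that streams a gzipped file and counts records per interaction type, assuming a header column named `interactionTypeName` (an assumption; verify against the actual header). Tab-separated files are read with quoting disabled so that stray quote characters inside values are kept literally; for the .csv.gz variants pass `delimiter=","`, which switches back to standard CSV quoting.

```python
import csv
import gzip
from collections import Counter
from typing import Dict, Iterator


def iter_rows(path: str, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield rows of a gzipped delimited file as dicts keyed by header name."""
    # Some text fields can be long; raise the csv module's field size cap
    # (2**31 - 1 fits a C long on all common platforms).
    csv.field_size_limit(2**31 - 1)
    quoting = csv.QUOTE_NONE if delimiter == "\t" else csv.QUOTE_MINIMAL
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as f:
        yield from csv.DictReader(f, delimiter=delimiter, quoting=quoting)


def count_interaction_types(path: str,
                            column: str = "interactionTypeName") -> Counter:
    """Count records per interaction type; `column` is an assumed header name."""
    return Counter(row.get(column) for row in iter_rows(path))


if __name__ == "__main__":
    import sys
    # Usage: python count_types.py interactions.tsv.gz
    if len(sys.argv) > 1:
        for kind, n in count_interaction_types(sys.argv[1]).most_common(10):
            print(f"{n}\t{kind}")
```

Streaming row by row keeps memory use flat, which matters for interactions.tsv.gz at roughly 0.8 GB compressed.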

Note that each data file has a computed content hash in an associated .sha256 file.
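A minimal sketch for checking a download against its .sha256 file. It assumes the .sha256 file holds the hex digest as its first token; the listed size of 65 bytes is consistent with a 64-character hex digest plus a newline. Whether a digest such as interactions.tsv.sha256 covers the gzipped file or its uncompressed content is not stated here, so the `decompress` flag lets you check either.

```python
import gzip
import hashlib


def sha256_hex(path: str, decompress: bool = False,
               chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file's bytes, or of its gunzipped bytes if decompress=True."""
    opener = gzip.open if decompress else open
    digest = hashlib.sha256()
    with opener(path, "rb") as f:
        # Read in chunks so multi-gigabyte files are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_hex(sha256_path: str) -> str:
    """Read the digest from a .sha256 file (first whitespace-separated token)."""
    with open(sha256_path, "r", encoding="ascii") as f:
        return f.read().split()[0].lower()


def verify(data_path: str, sha256_path: str, decompress: bool = False) -> bool:
    """True if the computed digest equals the one recorded in the .sha256 file."""
    return sha256_hex(data_path, decompress) == expected_hex(sha256_path)
```

For example, `verify("interactions.tsv.gz", "interactions.tsv.sha256")` checks the compressed bytes; if that returns False, `decompress=True` checks the uncompressed content instead.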

References
--------

[1] Poelen, J. H., Simons, J. D. and Mungall, C. J. (2014) Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. doi: 10.1016/j.ecoinf.2014.08.005.

[2] Poelen, J. H. (2020) Global Biotic Interactions: Elton Dataset Cache. Zenodo. doi: 10.5281/ZENODO.3950557.

[3] Poelen, J. H. (2021) Global Biotic Interactions: Taxon Graph (Version 0.3.28) [Data set]. Zenodo. doi: 10.5281/zenodo.4451472.

[4] Hortal, J. et al. (2015) Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), pp.523–549. doi: 10.1146/annurev-ecolsys-112414-054400.

[5] Cains, M. et al. (2017) IVMOOC 2017 - Gap Analysis of GloBI: Identifying Research and Data Sharing Opportunities for Species Interactions. Zenodo. doi: 10.5281/ZENODO.814978.

[6] Poelen, J. et al. (2020) globalbioticinteractions/globalbioticinteractions v0.19.0. Zenodo. doi: 10.5281/ZENODO.3946991.

Content References
--------

hash://sha256/f98ab483fd127db52e5fe8da6911b9ed18ecd370fd49ee04d2999a43591e1d14

hash://sha256/6386e4ae3bdfa6f7c292e8e3fa208f7ae4c946280c3d948f63edd8915109516a

hash://sha256/c33e53377c2c402d972d9e19138e1b926f0f387135cb3bf03c29758807c94f5c

hash://sha256/568bfdd3d16a1ddc6b0458ed4a3c41d19e6b09124d69ac004264f327349c2f42

hash://sha256/0d7fe46e7054bc7b4c07722c552820cbb9014c1f04b1eddd76980018a2a46013

hash://sha256/8c01f5fe5ef8010ff16479235bbdedcdfca44c556dcce8f1707332cf2a59d724

hash://sha256/cfd15dbb5a8cbeb2b19f5302a75fb1fdb48410a37a2fe307b7de222e88266598

hash://sha256/67a28be22823c3b3c3fe116a23fe69c33baa6f341dd63f7d136d1dd485270ed2

hash://sha256/de14dd3c92165a97688d34fe35e80d53a7c5d7b6aa81e0bf51dbcd77f33a5419

hash://sha256/f5334f79b02f8e1486c28a320716b3d5b887acf999334e52ca12f1522549b930

hash://sha256/373c1397c3bcd3bb527839c7e333e053142e2c5da2a5e7c43ffa5948215db638
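The content references above are hash URIs of the form hash://sha256/<hex digest>. A minimal sketch for checking which local files match one of them, assuming each digest is computed over the raw bytes of a file and written in lowercase hex as above; this section does not say which files the references identify.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

HASH_URI = re.compile(r"^hash://sha256/([0-9a-f]{64})$")


def parse_hash_uri(uri: str) -> Optional[str]:
    """Return the hex digest from a hash://sha256/... URI, or None if malformed."""
    m = HASH_URI.match(uri.strip())
    return m.group(1) if m else None


def match_files(uris: Iterable[str], paths: Iterable[Path]) -> Dict[str, str]:
    """Map each file path to the hash URI it matches; unmatched files are omitted."""
    wanted = {d for d in map(parse_hash_uri, uris) if d}
    matches: Dict[str, str] = {}
    for path in paths:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        if h.hexdigest() in wanted:
            matches[str(path)] = f"hash://sha256/{h.hexdigest()}"
    return matches
```

For example, with the URIs above saved one per line in a text file, `match_files(Path("uris.txt").read_text().split(), Path(".").glob("*"))` reports which downloaded files carry one of these content identifiers.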

 

Files (8.7 GB)

Name                      Size      MD5
citations.csv.gz          12.2 MB   131c634e787ed5912ee44b3efae8dd22
citations.csv.sha256      65 Bytes  610a85ff1f534b433020dfaf129001f3
citations.tsv.gz          12.2 MB   50ba623bc34b6ef1f51d114d8d4aefa8
citations.tsv.sha256      65 Bytes  da8ea90b0d9f4f14463f32db9ef91700
dwca-by-study.zip         113.4 MB  9d48e3d1f84d71a955a7d272cad743ce
dwca-by-study.zip.sha256  65 Bytes  e1db542fc0ecc76709e8c2f949b3ed63
dwca.zip                  205.1 MB  d4b63461eaa3c48341c4c6ee894d2229
dwca.zip.sha256           65 Bytes  1daaf89e598857258bd828bd83784249
interactions.csv.gz       817.6 MB  9502f706649ff4933841bc4eceeff4d1
interactions.csv.sha256   65 Bytes  0a8e4ee3587329febf4f3b5aaa52952e
interactions.nq.gz        3.8 GB    6cce9ee2ba9a34beabfb1bdbb0edf924
interactions.nq.sha256    65 Bytes  4ba300de297b18c6395c707a9ff29a0f
interactions.tsv.gz       816.8 MB  22a39a16dc39cf8ab5ca25ef30de2b41
interactions.tsv.sha256   65 Bytes  9633b914f28587b49973f4fede0f1a22
neo4j-graphdb.zip         2.8 GB    8e02a23c307c2e2ebc28d569d34b7532
neo4j-graphdb.zip.sha256  65 Bytes  76a027502aa88ae2e38fa748fd17ee1b
README                    6.0 kB    e986c9d4f3fa38a40e193eb487b7f230
README.sha256             65 Bytes  9f4b7cc46314b8ee61d663d9fb21cedb
taxonCache.tsv.gz         80.6 MB   7d75966472b6caac19d3379a065ca9f0
taxonCache.tsv.sha256     65 Bytes  11eb299986fd8f6817adf18fb01b8f19
taxonMap.tsv.gz           58.6 MB   6375d044a9e3ef9817fc6ffdfa7c6a02
taxonMap.tsv.sha256       65 Bytes  be9cf36147b478ffa428063f8829da86