Dataset Open Access

20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements

Poelen, Jorrit

This supplementary data publication contains:

links-globi-wd-ott.tsv.gz: aggregate list of taxon graphs from Open Tree of Life Taxonomy (OTT), GloBI and Wikidata. This tab-separated, two-column table describes how taxonomic identifiers (e.g., NCBI:9606) map into OTT, GloBI and Wikidata. For instance, the line "NCBI:9689{tab}WD:Q140" indicates that Wikidata links its lion (Panthera leo, https://www.wikidata.org/wiki/Q140) to NCBI's lion (Panthera leo, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9689).
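
A minimal Python sketch for loading this file into a lookup table follows. It assumes the file has no header row, that the first column holds the source identifier and the second the OTT, GloBI or Wikidata identifier (as in the "NCBI:9689{tab}WD:Q140" example), and that Wikidata identifiers carry the prefix "WD:". The function name load_links is hypothetical.

```python
import csv
import gzip
from collections import defaultdict
from typing import Dict, List


def load_links(path: str, target_prefix: str = "WD:") -> Dict[str, List[str]]:
    """Map each identifier in column 1 to the column-2 identifiers that start
    with target_prefix, e.g. "NCBI:9689" -> ["WD:Q140"].

    Assumptions: two tab-separated columns, no header row, and Wikidata ids
    written with the "WD:" prefix as in the example line.
    """
    links: Dict[str, List[str]] = defaultdict(list)
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text, as plain TSV does.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            source_id, target_id = row[0], row[1]
            if target_id.startswith(target_prefix):
                links[source_id].append(target_id)
    return dict(links)
```

Given the example line above, load_links("links-globi-wd-ott.tsv.gz")["NCBI:9689"] should contain "WD:Q140". Passing target_prefix="" keeps the links into OTT and GloBI as well. The whole table is held in memory, which is worth keeping in mind for a 77.6 MB compressed file.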

wikidata-taxon-info20171227.tsv.gz: a terse, five-column, tab-separated file of taxon items extracted from the Wikidata dump of 2017-12-27 (WikiData. (2018). Wikidata dump 2017-12-27 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1211767). The columns contain the following:

  1. wikidata taxon item id (e.g., Q140 or https://www.wikidata.org/wiki/Q140)
  2. scientific name of the taxon item (e.g., Panthera leo, Mammalia)
  3. rank id of the taxon item (e.g., Q7432 species or https://www.wikidata.org/wiki/Q7432). To retrieve a full list of Wikidata taxon rank ids and their common names, you can query Wikidata with SPARQL (e.g., Nomer's WikidataTaxonRankLoader).
  4. parent ids of the taxon item, separated by pipes "|" when there are multiple parents. Note that some taxon items do have multiple parents (e.g., https://www.wikidata.org/wiki/Q774014).
  5. external taxonomic identifiers that the taxon item links to (e.g., "ITIS:162532|EOL:8266|GBIF:2960|WORMS:125440"). If multiple are present, they are separated by pipes "|". Only a selection of taxonomic schemes was used, namely: NCBI, GBIF, ITIS, WORMS, FISHBASE, IF (Index Fungorum) and EOL.
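
The five columns can be parsed with a short Python sketch like the one below. The record type WikidataTaxon and the functions parse_line, read_taxa and index_by_external_id are hypothetical names. The sketch assumes one taxon item per line with no header row, and that ids may appear either as bare Q-numbers or as full https://www.wikidata.org/wiki/ URLs, which it normalizes to the bare form.

```python
import gzip
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


def _qid(value: str) -> str:
    # Accept "Q140" or "https://www.wikidata.org/wiki/Q140"; keep the last path segment.
    return value.rsplit("/", 1)[-1]


def _split_pipes(value: str) -> List[str]:
    # Pipe-separated column; an empty column yields an empty list.
    return [part for part in value.split("|") if part]


@dataclass
class WikidataTaxon:
    item_id: str                                           # column 1, e.g. "Q140"
    name: str                                              # column 2, e.g. "Panthera leo"
    rank_id: str                                           # column 3, e.g. "Q7432" (species)
    parent_ids: List[str] = field(default_factory=list)    # column 4, may hold several ids
    external_ids: List[str] = field(default_factory=list)  # column 5, e.g. "GBIF:2960"


def parse_line(line: str) -> WikidataTaxon:
    cols = line.rstrip("\r\n").split("\t")
    cols += [""] * (5 - len(cols))  # assumption: trailing empty columns may be missing
    return WikidataTaxon(
        item_id=_qid(cols[0]),
        name=cols[1],
        rank_id=_qid(cols[2]),
        parent_ids=[_qid(p) for p in _split_pipes(cols[3])],
        external_ids=_split_pipes(cols[4]),
    )


def read_taxa(path: str) -> Iterator[WikidataTaxon]:
    # Stream the gzipped file line by line instead of loading it all at once.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)


def index_by_external_id(taxa: Iterable[WikidataTaxon]) -> Dict[str, List[str]]:
    """Map each external id (e.g. "GBIF:2960") to the Wikidata items that cite it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for taxon in taxa:
        for external_id in taxon.external_ids:
            index[external_id].append(taxon.item_id)
    return dict(index)
```

For example, index_by_external_id(read_taxa("wikidata-taxon-info20171227.tsv.gz")) would build a lookup from identifiers such as GBIF:2960 to Wikidata item ids. Values are lists because nothing in the format guarantees that an external id is cited by only one item.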

The datasets can be recreated by scripts in https://github.com/bio-guoda/guoda-datasets/tree/master/wikidata or https://doi.org/10.5281/zenodo.1428949 .

Files (130.7 MB):

  - links-globi-wd-ott.tsv.gz (77.6 MB), md5:b9ef1826b8994a135226511f3442f1ee
  - wikidata-taxon-info20171227.tsv.gz (53.1 MB), md5:5c71baf1a0f96146731e4e1bcf00fc72
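
To check that a download is intact, its MD5 digest can be compared with the checksum listed for each file. Below is a minimal Python sketch using the standard-library hashlib module. The names md5_of_file, verify and EXPECTED_MD5 are hypothetical, and the sketch assumes the files keep their original names after download.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums as listed for the files of this record.
EXPECTED_MD5 = {
    "links-globi-wd-ott.tsv.gz": "b9ef1826b8994a135226511f3442f1ee",
    "wikidata-taxon-info20171227.tsv.gz": "5c71baf1a0f96146731e4e1bcf00fc72",
}


def verify(path: str) -> bool:
    """True if the file's digest matches the listed one; KeyError for unlisted names."""
    return md5_of_file(path) == EXPECTED_MD5[os.path.basename(path)]
```

Reading in chunks keeps memory use flat regardless of file size, so the same function works for both the 77.6 MB and the 53.1 MB file.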