Dataset Open Access

20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements

Poelen, Jorrit

This supplementary data publication contains:

links-globi-wd-ott.tsv.gz: aggregate list of taxon graphs from Open Tree of Life Taxonomy (OTT), GloBI and Wikidata. This tab-separated, two-column table describes the taxonomic identifiers (e.g., NCBI:9606) that map into OTT, GloBI and Wikidata. For instance, the line "NCBI:9689{tab}WD:Q140" indicates that Wikidata links its lion (Panthera leo) to NCBI's lion (Panthera leo).
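As a minimal sketch of how the two-column link table could be read, the Python below groups the second column by the first. It assumes the file has no header row, is UTF-8 encoded, and that each line holds exactly one tab; the function names (read_links, index_links, load_links) are hypothetical helpers, not part of any published tooling.

```python
import gzip
import io
from collections import defaultdict


def read_links(lines):
    """Yield (first_id, second_id) pairs from tab-separated lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # assumption: rows without exactly two columns are skipped
        yield parts[0], parts[1]


def index_links(pairs):
    """Group second-column identifiers by their first-column identifier."""
    index = defaultdict(set)
    for first_id, second_id in pairs:
        index[first_id].add(second_id)
    return index


def load_links(path):
    """Read a gzip-compressed link table, e.g. links-globi-wd-ott.tsv.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return index_links(read_links(handle))


# In-memory sample using the line quoted in the description above.
sample = io.StringIO("NCBI:9689\tWD:Q140\n")
links = index_links(read_links(sample))
print(links["NCBI:9689"])
```

With the downloaded file in the working directory, load_links("links-globi-wd-ott.tsv.gz") would build the same kind of index for the full table.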

wikidata-taxon-info20171227.tsv.gz: a terse five-column, tab-separated file of taxon objects extracted from WikiData. (2018). Wikidata dump 2017-12-27 [Data set]. Zenodo. The columns contain the following:

  1. Wikidata taxon item id (e.g., Q140)
  2. scientific name of the taxon item (e.g., Panthera leo, Mammalia)
  3. rank id of the taxon item (e.g., Q7432 for species). To retrieve a full list of Wikidata taxon rank ids and their common names, you can query Wikidata with SPARQL (e.g., see Nomer's WikidataTaxonRankLoader).
  4. parent ids of the taxon item, using pipes "|" as separators if there are multiple parents. Please note that some taxon items have multiple parents.
  5. external taxonomic identifiers that the taxon item links to (e.g., "ITIS:162532|EOL:8266|GBIF:2960|WORMS:125440"). If multiple are present, pipes "|" are used to separate the links. Only a selection of taxonomic schemes was used, namely: NCBI, GBIF, ITIS, WORMS, FISHBASE, IF (Index Fungorum) and EOL.
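The five-column layout described above can be parsed along the following lines. This is a sketch under stated assumptions: no header row, empty columns represented as empty strings, and a made-up sample row whose parent ids (Q_PARENT_A, Q_PARENT_B) are placeholders rather than real Wikidata items. The names TaxonInfo, split_pipes and parse_taxon_line are hypothetical.

```python
from typing import List, NamedTuple


class TaxonInfo(NamedTuple):
    item_id: str             # column 1: Wikidata taxon item id
    name: str                # column 2: scientific name
    rank_id: str             # column 3: Wikidata rank item id
    parent_ids: List[str]    # column 4: pipe-separated parent ids
    external_ids: List[str]  # column 5: pipe-separated external identifiers


def split_pipes(value):
    """Split a pipe-separated column; an empty column gives an empty list."""
    return [part for part in value.split("|") if part]


def parse_taxon_line(line):
    """Parse one tab-separated line into a TaxonInfo record."""
    # Strip only the newline so that trailing empty columns are preserved.
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 5:
        raise ValueError("expected 5 columns, got %d" % len(columns))
    item_id, name, rank_id, parents, externals = columns
    return TaxonInfo(item_id, name, rank_id,
                     split_pipes(parents), split_pipes(externals))


# Made-up row for illustration; the parent ids are placeholders.
example = "Q140\tPanthera leo\tQ7432\tQ_PARENT_A|Q_PARENT_B\tNCBI:9689\n"
record = parse_taxon_line(example)
print(record.name, record.parent_ids, record.external_ids)
```

Returning lists for columns 4 and 5 keeps taxon items with multiple parents or multiple external links in the same shape as those with only one.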

The datasets can be recreated by scripts in or .

Files (130.7 MB)
File sizes: 77.6 MB and 53.1 MB