Dataset Open Access

20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements

Poelen, Jorrit


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Poelen, Jorrit</dc:creator>
  <dc:date>2018-04-06</dc:date>
  <dc:description>This supplementary data publication contains:

links-globi-wd-ott.tsv.gz: an aggregate list of taxon graphs from the Open Tree of Life Taxonomy (OTT), GloBI and Wikidata. This tab-separated, two-column table describes the taxonomic identifiers (e.g., NCBI:9606) that map into OTT, GloBI and Wikidata. For instance, the line "NCBI:9689{tab}WD:Q140" indicates that Wikidata links its lion (Panthera leo, https://www.wikidata.org/wiki/Q140) to NCBI's lion (Panthera leo, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&amp;id=9689).

wikidata-taxon-info20171227.tsv.gz: a terse, five-column, tab-separated file of taxon objects extracted from WikiData. (2018). Wikidata dump 2017-12-27 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1211767 . The columns contain the following:


	wikidata taxon item id (e.g., Q140 or https://www.wikidata.org/wiki/Q140)
	scientific name of taxon item id (e.g., Panthera leo, Mammalia)
	rank id of the taxon item id (e.g., Q7432 species or https://www.wikidata.org/wiki/Q7432). To retrieve a full list of Wikidata taxon rank ids and their common names, you can use SPARQL to query Wikidata (e.g., Nomer's WikidataTaxonRankLoader).
	parent ids of the taxon item id, separated by pipes "|" when there are multiple parents. Please note that some taxon items have multiple parents (e.g., https://www.wikidata.org/wiki/Q774014).
	external taxonomic identifiers that the taxon item links to (e.g., "ITIS:162532|EOL:8266|GBIF:2960|WORMS:125440"). If multiple are present, pipes "|" separate the links. Only a selection of taxonomic schemes was used, namely: NCBI, GBIF, ITIS, WORMS, FISHBASE, IF (Index Fungorum) and EOL.


The datasets can be recreated by scripts in https://github.com/bio-guoda/guoda-datasets/tree/master/wikidata or https://doi.org/10.5281/zenodo.1428949 .</dc:description>
  <dc:identifier>https://zenodo.org/record/1213477</dc:identifier>
  <dc:identifier>10.5281/zenodo.1213477</dc:identifier>
  <dc:identifier>oai:zenodo.org:1213477</dc:identifier>
  <dc:relation>doi:10.5281/zenodo.1213476</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:title>20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
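
The two file layouts described in the record above can be read with a few lines of Python. The following is a minimal sketch and not part of the published dataset or its scripts: the function and field names are hypothetical, the files are assumed to be UTF-8 encoded with no header row, and the sample parent ids used in the usage note are placeholders rather than real Wikidata items.

```python
import gzip
from typing import Iterator, List, NamedTuple, Tuple


class TaxonInfo(NamedTuple):
    """One row of the five-column taxon info table (field names are hypothetical)."""
    item_id: str             # Wikidata taxon item id, e.g. Q140
    name: str                # scientific name, e.g. Panthera leo
    rank_id: str             # Wikidata rank item id, e.g. Q7432
    parent_ids: List[str]    # zero or more parent item ids
    external_ids: List[str]  # zero or more ids such as NCBI:9689


def split_pipes(field: str) -> List[str]:
    """Split a pipe-separated field, dropping empty entries."""
    return [part for part in field.split("|") if part]


def parse_link_line(line: str) -> Tuple[str, str]:
    """Parse one line of the two-column links table into a pair of ids."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) != 2:
        raise ValueError("expected 2 tab-separated columns, got %d" % len(columns))
    return columns[0], columns[1]


def parse_taxon_line(line: str) -> TaxonInfo:
    """Parse one line of the five-column taxon info table."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) != 5:
        raise ValueError("expected 5 tab-separated columns, got %d" % len(columns))
    item_id, name, rank_id, parents, external = columns
    return TaxonInfo(item_id, name, rank_id,
                     split_pipes(parents), split_pipes(external))


def read_taxon_info(path: str) -> Iterator[TaxonInfo]:
    """Stream rows from a gzipped taxon info file (UTF-8 is an assumption)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield parse_taxon_line(line)
```

For example, `parse_link_line("NCBI:9689\tWD:Q140\n")` yields the pair `("NCBI:9689", "WD:Q140")`, and a synthetic row such as `"Q140\tPanthera leo\tQ7432\tQ1|Q2\tITIS:162532|EOL:8266\n"` parses into a `TaxonInfo` with two parent ids and two external ids. Streaming line by line keeps memory use low for large files.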