Dataset Open Access

20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements

Poelen, Jorrit


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/5b7e2a31-01bd-4c04-9e2d-84c8d689798d/links-globi-wd-ott.tsv.gz"
      }, 
      "checksum": "md5:b9ef1826b8994a135226511f3442f1ee", 
      "bucket": "5b7e2a31-01bd-4c04-9e2d-84c8d689798d", 
      "key": "links-globi-wd-ott.tsv.gz", 
      "type": "gz", 
      "size": 77571089
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/5b7e2a31-01bd-4c04-9e2d-84c8d689798d/wikidata-taxon-info20171227.tsv.gz"
      }, 
      "checksum": "md5:5c71baf1a0f96146731e4e1bcf00fc72", 
      "bucket": "5b7e2a31-01bd-4c04-9e2d-84c8d689798d", 
      "key": "wikidata-taxon-info20171227.tsv.gz", 
      "type": "gz", 
      "size": 53115231
    }
  ], 
  "owners": [
    7292
  ], 
  "doi": "10.5281/zenodo.1213477", 
  "stats": {
    "version_unique_downloads": 35.0, 
    "unique_views": 186.0, 
    "views": 196.0, 
    "version_views": 196.0, 
    "unique_downloads": 35.0, 
    "version_unique_views": 186.0, 
    "volume": 2475398282.0, 
    "version_downloads": 42.0, 
    "downloads": 42.0, 
    "version_volume": 2475398282.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.1213477", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.1213476", 
    "bucket": "https://zenodo.org/api/files/5b7e2a31-01bd-4c04-9e2d-84c8d689798d", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.1213476.svg", 
    "html": "https://zenodo.org/record/1213477", 
    "latest_html": "https://zenodo.org/record/1213477", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.1213477.svg", 
    "latest": "https://zenodo.org/api/records/1213477"
  }, 
  "conceptdoi": "10.5281/zenodo.1213476", 
  "created": "2018-04-06T05:05:58.836885+00:00", 
  "updated": "2020-01-24T19:24:57.479445+00:00", 
  "conceptrecid": "1213476", 
  "revision": 8, 
  "id": 1213477, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.1213477", 
    "description": "<p>This supplementary data publication contains:</p>\n\n<p><strong>links-globi-wd-ott.tsv.gz:</strong>&nbsp;aggregate list of taxon graphs from Open Tree of Life Taxonomy (OTT), GloBI and Wikidata. This tab-separated, two-column table describes the taxonomic identifiers&nbsp;(e.g., NCBI:9606) that map into OTT, GloBI and Wikidata. For instance, the line &quot;NCBI:9689{tab}WD:Q140&quot; indicates that Wikidata links its lion (<em>Panthera leo</em>,&nbsp;https://www.wikidata.org/wiki/Q140)&nbsp;to NCBI&#39;s lion (<em>Panthera leo</em>, https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&amp;id=9689).</p>\n\n<p><strong>wikidata-taxon-info20171227.tsv.gz:&nbsp;</strong>a terse five-column file in tab-separated format of taxon objects extracted from&nbsp;WikiData. (2018). Wikidata dump 2017-12-27 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1211767 . The columns contain the following:</p>\n\n<ol>\n\t<li>Wikidata taxon item id (e.g., Q140 or https://www.wikidata.org/wiki/Q140)</li>\n\t<li>scientific name of the taxon item id (e.g., Panthera leo, Mammalia)</li>\n\t<li>rank id of the taxon item id (e.g., Q7432 species or https://www.wikidata.org/wiki/Q7432). To retrieve a full list of Wikidata taxon rank ids and their common names, you can use SPARQL to query Wikidata (e.g.,&nbsp;<a href=\"https://github.com/globalbioticinteractions/nomer/blob/c3a1f5a2ebfb87ffc67e3bace19b82d96c0d25e8/nomer/src/main/java/org/globalbioticinteractions/nomer/util/WikidataTaxonRankLoader.java\">Nomer&#39;s WikidataTaxonRankLoader</a>&nbsp;).&nbsp;</li>\n\t<li>parent ids of the taxon item id, using pipes &quot;|&quot; as separators if there are multiple parents.&nbsp;Please note that some taxon items have multiple parents (e.g.,&nbsp;https://www.wikidata.org/wiki/Q774014).</li>\n\t<li>external taxonomic identifiers that the taxon item links to (e.g., &quot;ITIS:162532|EOL:8266|GBIF:2960|WORMS:125440&quot;). If multiple are present, pipes &quot;|&quot; are used to separate the links. Only a selection of taxonomic schemes was used, namely: NCBI, GBIF, ITIS, WORMS, FISHBASE, IF (Index Fungorum) and EOL.</li>\n</ol>\n\n<p>The datasets can be recreated by scripts in&nbsp;https://github.com/bio-guoda/guoda-datasets/tree/master/wikidata or <a href=\"https://doi.org/10.5281/zenodo.1428949\">https://doi.org/10.5281/zenodo.1428949</a>&nbsp;.</p>", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "title": "20 GB in 10 minutes: Data linking across major biodiversity databases: Data supplements", 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "1213476"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "1213477"
          }
        }
      ]
    }, 
    "version": "0.1", 
    "publication_date": "2018-04-06", 
    "creators": [
      {
        "name": "Poelen, Jorrit"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.1213476", 
        "relation": "isVersionOf"
      }
    ]
  }
}
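
The description in the record above specifies two file layouts: a two-column link table and a five-column taxon table whose last two columns are pipe-separated. The following is a minimal Python sketch of how such lines might be parsed. It is not the author's tooling. The names TaxonInfo, split_pipes, parse_taxon_info, parse_link and iter_lines are hypothetical, the sample row at the bottom is invented for illustration (only Q140, Q7432, NCBI:9689 and WD:Q140 come from the description), and the sketch assumes the files have no header line and store the short form of each id.

```python
import gzip
from typing import Dict, Iterator, List, NamedTuple, Tuple


class TaxonInfo(NamedTuple):
    """One row of the five-column taxon table (hypothetical container)."""
    item_id: str                        # column 1: Wikidata taxon item id
    name: str                           # column 2: scientific name
    rank_id: str                        # column 3: Wikidata rank item id
    parent_ids: List[str]               # column 4: pipe-separated parent ids
    external_ids: Dict[str, List[str]]  # column 5: scheme -> local ids


def split_pipes(field: str) -> List[str]:
    """Split a pipe-separated field, dropping empty parts."""
    return [part for part in field.split("|") if part]


def parse_taxon_info(line: str) -> TaxonInfo:
    """Parse one line of the five-column table described in the record."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 5:
        raise ValueError(f"expected 5 tab-separated columns, got {len(cols)}")
    item_id, name, rank_id, parents, links = cols
    external: Dict[str, List[str]] = {}
    for link in split_pipes(links):
        # "NCBI:9689" -> scheme "NCBI", local id "9689"
        scheme, _, local_id = link.partition(":")
        external.setdefault(scheme, []).append(local_id)
    return TaxonInfo(item_id, name, rank_id, split_pipes(parents), external)


def parse_link(line: str) -> Tuple[str, str]:
    """Parse one line of the two-column link table, e.g. NCBI:9689<tab>WD:Q140."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 2:
        raise ValueError(f"expected 2 tab-separated columns, got {len(cols)}")
    return cols[0], cols[1]


def iter_lines(path: str) -> Iterator[str]:
    """Stream lines from a gzip-compressed text file without unpacking it."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from handle


if __name__ == "__main__":
    # Invented sample row: the parent ids and the GBIF id are placeholders.
    sample = "Q140\tPanthera leo\tQ7432\tQ_PARENT_A|Q_PARENT_B\tNCBI:9689|GBIF:0000\n"
    print(parse_taxon_info(sample))
    print(parse_link("NCBI:9689\tWD:Q140\n"))
```

To process a downloaded file, the same parsers could be applied line by line, for example with parse_link over iter_lines("links-globi-wd-ott.tsv.gz"). Streaming keeps memory use low, since the compressed files listed in the record are tens of megabytes each.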