There is a newer version of the record available.

Published May 21, 2018 | Version 0.3.2
Dataset Open

Global Biotic Interactions: Taxon Graph

Authors/Creators

  • 1. 400 Perkins St Apt 104, Oakland, CA 94610

Description


Global Biotic Interactions: Taxon Cache and Taxon Map

Global Biotic Interactions (GloBI) provides access to existing species interaction datasets (Poelen et al. 2014, http://globalbioticinteractions.org). As part of the dataset integration and aggregation, a best effort is made to resolve, match and link taxonomic names and associated vernacular/common names, hierarchies and thumbnails. 

The data archives included in this publication contain established taxonomic links (taxonMap.tsv.gz) and taxonomic information (taxonCache.tsv.gz) that GloBI retrieved and integrated from taxonomic name sources and web services associated with http://itis.gov, http://globalnames.org, http://eol.org and other open data services.

While GloBI is not a naming authority and the primary goal of the name matching process is to detect incorrect or outdated names, the archives may serve as an example of how to publish denormalized taxonomic records and their interrelationships in a pragmatic way.

For related discussion threads, see https://github.com/jhpoelen/eol-globi-data/issues/145 , https://github.com/jhpoelen/eol-globi-data/issues/274 , https://github.com/jhpoelen/eol-globi-data/issues/70 and https://github.com/EOL/tramea/issues/10 .

Files
  
  README 
      this file
  
  taxonCache.tsv.gz.md5
     md5 hash of taxonCache.tsv.gz

  taxonCache.tsv.gz 
     Taxonomic names, ids, hierarchies, common names and thumbnails associated with taxa known to GloBI. Accessed at https://depot.globalbioticinteractions.org/datasets/org/globalbioticinteractions/taxon/0.3.2/taxon-0.3.2.zip on 21 May 2018.
 
  taxonCacheFirst10.tsv
      Header and 10 following lines from taxonCache.tsv.gz.
 
  taxonMap.tsv.gz.md5
      md5 hash of taxonMap.tsv.gz

  taxonMap.tsv.gz 
      Links between taxon name and ids across various taxon providers. Accessed at https://depot.globalbioticinteractions.org/datasets/org/globalbioticinteractions/taxon/0.3.2/taxon-0.3.2.zip on 21 May 2018. 

  taxonMapFirst10.tsv
      Header and 10 following lines from taxonMap.tsv.gz.
 
  prefixes.tsv
      Term prefixes and their associated uri schemes. 
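
The .md5 files listed above can be used to check that a downloaded archive is intact. The following is a minimal Python sketch, not part of the published archive. It assumes that each .md5 file starts with the hex digest as its first whitespace-separated token (the layout produced by the md5sum tool); the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the expected digest.

    Assumption: the first whitespace-separated token in the file is the hash.
    """
    return Path(md5_path).read_text().split()[0].lower()


def verify(archive_path, md5_path):
    """Return True if the archive's digest matches the one recorded in md5_path."""
    return md5_of_file(archive_path) == expected_md5(md5_path)
```

For example, verify("taxonCache.tsv.gz", "taxonCache.tsv.gz.md5") should return True for an undamaged download, provided the assumed file layout holds.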

Column Descriptions

  taxonCache.tsv.gz 

    1 | id
    2 | name
    3 | rank
    4 | commonNames
    5 | path
    6 | pathIds 
    7 | pathNames
    8 | externalUrl
    9 | thumbnailUrl
 
  taxonMap.tsv.gz

    1 | providedTaxonId
    2 | providedTaxonName
    3 | resolvedTaxonId
    4 | resolvedTaxonName
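
To illustrate how the columns above might be used, here is a minimal Python sketch that streams a gzipped TSV file and looks up the resolved taxa linked to a provided name. It assumes the first line of each archive is a header carrying exactly the column names listed above, and that fields are tab-separated and unquoted. The function names are hypothetical and not part of GloBI tooling.

```python
import csv
import gzip


def read_tsv_gz(path):
    """Yield one dict per data row, keyed by the column names in the header line."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data (an assumption).
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


def resolve_name(taxon_map_path, provided_name):
    """Return the set of (resolvedTaxonId, resolvedTaxonName) pairs for a provided name."""
    return {
        (row["resolvedTaxonId"], row["resolvedTaxonName"])
        for row in read_tsv_gz(taxon_map_path)
        if row["providedTaxonName"] == provided_name
    }
```

Because the rows are streamed, the full archive does not need to fit in memory. The same read_tsv_gz function could be applied to taxonCache.tsv.gz to fetch the rank or path of a resolved id.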

References

Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.

Updates

org.globalbioticinteractions.taxon v0.3, 2018-03-02

This taxon archive version was created by taking GloBI taxon v0.2 (Jan 2018) and appending a semi-automatically created WikiData taxon mapping and taxon cache.

org.globalbioticinteractions.taxon v0.3.1, 2018-04-05

This taxon archive version was created by taking GloBI taxon v0.2 (Jan 2018) and appending an automatically created WikiData taxon mapping and taxon cache using Apache Spark scripts at https://github.com/bio-guoda/guoda-datasets/tree/master/wikidata .

org.globalbioticinteractions.taxon v0.3.2, 2018-05-21

This taxon archive version includes the following:

1. all lines in taxonMap.tsv.gz v0.3.1 that passed all validate-term-link tests defined in nomer v0.0.7 (see https://doi.org/10.5281/zenodo.1249964 or https://github.com/globalbioticinteractions/nomer/releases/tag/0.0.7).

2. all lines in taxonCache.tsv.gz v0.3.1 that passed all validate-term tests defined in nomer v0.0.7.

3. all lines in 1. that did *not* pass the validate-term test were re-resolved using the nomer v0.0.7 commands "append globi-enrich" and "append globi-globalnames". Only SAME_AS and SYNONYM_OF matches were used to generate new entries for taxonCache and taxonMap.

4. in addition, elton v0.4.5 (see https://doi.org/10.5281/zenodo.1212599 or https://github.com/globalbioticinteractions/elton/releases/tag/0.4.5) was used to generate an up-to-date names list by running the "update" and "names" commands on 18-19 May 2018. Of the resulting names, only id/name pairs that were unknown to the taxon graph were resolved using the "append globi-enrich" and "append globi-globalnames" commands of nomer v0.0.7. Only matches classified as SAME_AS and SYNONYM_OF were used to generate new entries for taxonCache and taxonMap.

5. the updated versions of taxonMap.tsv.gz and taxonCache.tsv.gz were produced by appending the results of 1., 2., 3. and 4., removing duplicate lines and sorting the result.

6. finally, the resulting taxonMap.tsv.gz and taxonCache.tsv.gz files were validated using the nomer v0.0.7 validate-term-link and validate-term commands, respectively. The result indicated that all lines (other than the header) passed the validation tests.

Please note that nomer and elton rely on web-accessible APIs such as taxonomy resolution services and data portals. This dependence on external services that are only accessible over the web might make the results difficult to reproduce because of network outages, server failures, upgrades, downgrades, data loss and/or abandonment of informatics projects or datasets.
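
The append, de-duplicate and sort operation described in step 5 can be sketched in Python as follows. This is an illustration only and not the script used to build the archives. The function name is hypothetical, and keeping the header as the first line and sorting in plain string order are assumptions.

```python
def merge_tsv_lines(header, *line_sources):
    """Combine data lines from several sources, drop duplicates and sort them.

    Assumption: the header is kept as the first line and excluded from sorting.
    """
    unique = set()
    for source in line_sources:
        for line in source:
            line = line.rstrip("\n")
            # Skip blank lines and repeated header lines from individual sources.
            if line and line != header:
                unique.add(line)
    return [header] + sorted(unique)
```

Each element of line_sources can be any iterable of text lines, such as an open file handle or a list of strings.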

Files (118.2 MB)

  md5 checksum | size
  60209249a6a501510672f2cce2391a88 | 1.5 kB
  4874181826c562adf6079402aae84301 | 5.4 kB
  f924518462964a1454db116dffcc8604 | 96.9 MB
  a27c721d89d9d2794a3a23bddda61382 | 52 Bytes
  bacb3ad87b12ccfedf02431358813edc | 2.6 kB
  e21aff86f789bd1d5abd56552a344edf | 21.2 MB
  6df99b74126012f650ff19d0b6455168 | 50 Bytes
  e1d333f415db567cd7a62f64f6260f63 | 950 Bytes
