Published July 2, 2022 | Version v0.1.26
Software Open

INCATools/ontology-access-kit: v0.1.26

  • 1. Lawrence Berkeley National Laboratory
  • 2. Harvard Medical School
  • 3. EMBL
  • 4. semanticly Ltd

Description

Summary

This release has two major enhancements:

  1. The output options for the different commands have been harmonized, and RDF is now an option for most of them (as well as obo, yaml, csv, and obojson, plus specialized models like sssom where appropriate)
  2. A new fill-table command is added

Command Line outputs

Fixes #147

Started unifying the command structure; see #145. The term-mappings command has been deleted: use mappings instead.

This also streamlines the formatting for most commands

These are the types; not all types apply to all commands:

OBO_FORMAT = "obo"
RDF_FORMAT = "rdf"
MD_FORMAT = "md"
OBOJSON_FORMAT = "obojson"
CSV_FORMAT = "csv"
JSON_FORMAT = "json"
JSONL_FORMAT = "jsonl"
YAML_FORMAT = "yaml"
INFO_FORMAT = "info"
SSSOM_FORMAT = "sssom"
OWLFUN_FORMAT = "ofn"

Examples:

$ runoak -i bioportal:ceph annotate "tentacle squid head" -O rdf

@prefix ns1: <https://w3id.org/linkml/text_annotator/> .
@prefix ns2: <http://w3id.org/sssom/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[] a <http://www.w3.org/ns/oa#Annotation> ;
    ns2:object_id "CEPH:0000256" ;
    ns2:object_label "tentacle" ;
    ns2:object_source "https://data.bioontology.org/ontologies/CEPH" ;
    ns1:match_type "PREF" ;
    ns1:subject_end 8 ;
    ns1:subject_label "TENTACLE" ;
    ns1:subject_start 1 .

$ echo wikidata:Q42482 | wikidata labels -O rdf -
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://www.wikidata.org/entity/Q42482> rdfs:label "Iron Maiden" .

$ echo wikidata:Q42482 | wikidata ancestors -p t -
wikidata:Q42482 ! Iron Maiden
wikidata:Q215380 ! musical group
wikidata:Q105756498 ! type of musical ensemble/group
wikidata:Q24027515 ! fifth-order class
wikidata:Q24027526 ! fixed order metaclass of higher order
wikidata:Q56816954 ! heavy metal band
wikidata:Q19361238 ! Wikidata metaclass
wikidata:Q23958852 ! variable-order class
wikidata:Q24017465 ! third-order class
wikidata:Q24027474 ! fourth-order class

$ echo wikidata:Q42482 | wikidata ancestors -p t - -O csv
id      label
wikidata:Q42482 Iron Maiden
wikidata:Q215380        musical group
wikidata:Q105756498     type of musical ensemble/group
wikidata:Q24027515      fifth-order class
wikidata:Q24027526      fixed order metaclass of higher order
wikidata:Q56816954      heavy metal band
wikidata:Q24017465      third-order class
wikidata:Q24027474      fourth-order class
wikidata:Q19361238      Wikidata metaclass
wikidata:Q23958852      variable-order class

$ echo wikidata:Q42482 | wikidata ancestors -p t - -O ofn
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q42482> "Iron Maiden" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q105756498> "type of musical ensemble/group" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q215380> "musical group" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q19361238> "Wikidata metaclass" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q23958852> "variable-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24017465> "third-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24027474> "fourth-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24027515> "fifth-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24027526> "fixed order metaclass of higher order" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q56816954> "heavy metal band" )
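
To make the relationship between these outputs concrete, here is a minimal Python sketch that renders the same (id, label) pairs in three of the styles shown above. It does not use OAK's own writer classes: the helper names render and expand, and the hard-coded Wikidata prefix, are assumptions made purely for illustration.

```python
# Illustrative only: these helpers are hypothetical and are not part of oaklib.
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def expand(curie: str) -> str:
    """Expand a wikidata: CURIE to a full URI (only this one prefix is handled)."""
    prefix, local_id = curie.split(":", 1)
    if prefix != "wikidata":
        raise ValueError(f"unknown prefix: {prefix}")
    return WIKIDATA_PREFIX + local_id


def render(pairs, output_type="info"):
    """Render (id, label) pairs as text in the 'info', 'csv', or 'ofn' style."""
    if output_type == "info":
        lines = [f"{curie} ! {label}" for curie, label in pairs]
    elif output_type == "csv":
        # Assumption: the sample output above looks tab-separated with a header row.
        lines = ["id\tlabel"] + [f"{curie}\t{label}" for curie, label in pairs]
    elif output_type == "ofn":
        lines = [
            f'AnnotationAssertion( rdfs:label <{expand(curie)}> "{label}" )'
            for curie, label in pairs
        ]
    else:
        raise ValueError(f"unsupported output type: {output_type}")
    return "\n".join(lines)


pairs = [("wikidata:Q42482", "Iron Maiden"), ("wikidata:Q215380", "musical group")]
for output_type in ("info", "csv", "ofn"):
    print(render(pairs, output_type))
```

The point of the sketch is that the query result stays the same and only the final serialization step depends on the -O option.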

fill-table command

Fixes #154

This adds a new command, fill-table (which is backed by a separate Python utility module; no logic is interwoven with the CLI).

Fills missing values in a table of ontology elements

Given a TSV with a populated ID column, and unpopulated columns for definition, label, mappings, ancestors, this will iterate through each row filling in each missing value by performing ontology lookups.

In some cases, this can also perform reverse lookups; i.e., given a table with labels populated and blank IDs, it will fill in the IDs.

In the most basic scenario, you have a table with two columns 'id' and 'label'. These are the "conventional" column headers for a table of ontology elements (see later for configuration when you don't follow conventions)

Example:

runoak -i cl.owl.ttl fill-table my-table.tsv

(any implementation can be used)

The same command will work for the reverse scenario, where labels are populated but IDs are not.

By default this will throw an error if a lookup is not successful; this can be relaxed

Relaxed:

runoak -i cl.owl.ttl fill-table --allow-missing my-table.tsv

In this case missing values that cannot be populated will remain empty

To explicitly populate a value:

runoak -i cl.owl.ttl fill-table --missing-value-token NO_DATA my-table.tsv
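
The behaviour described above (forward lookup, reverse lookup, and the three ways of handling a failed lookup) can be sketched in plain Python. This is not the oaklib implementation: fill_rows is a hypothetical helper, a small dict of toy data stands in for the ontology, and the precedence between the two options (the token wins over leaving the cell empty) is an assumption.

```python
# Illustrative only: LABELS is toy data standing in for an ontology lookup,
# and fill_rows is a hypothetical helper, not the oaklib implementation.
LABELS = {"CL:0000540": "neuron", "CL:0000000": "cell"}


def fill_rows(rows, labels, allow_missing=False, missing_value_token=None):
    """Fill a blank 'label' from 'id', or a blank 'id' from 'label'."""
    ids_by_label = {label: id_ for id_, label in labels.items()}
    filled = []
    for row in rows:
        row = dict(row)  # work on a copy so the input rows are not mutated
        if row.get("id") and not row.get("label"):
            target, value = "label", labels.get(row["id"])  # forward lookup
        elif row.get("label") and not row.get("id"):
            target, value = "id", ids_by_label.get(row["label"])  # reverse lookup
        else:
            filled.append(row)  # nothing to fill, or nothing to look up from
            continue
        if value is None:
            if missing_value_token is not None:
                value = missing_value_token  # like --missing-value-token
            elif allow_missing:
                value = ""  # like --allow-missing: the cell stays empty
            else:
                raise ValueError(f"lookup failed for row: {row}")  # default
        row[target] = value
        filled.append(row)
    return filled


rows = [{"id": "CL:0000540", "label": ""}, {"id": "", "label": "cell"}]
for row in fill_rows(rows, LABELS):
    print(row["id"], row["label"], sep="\t")
```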

Currently the following columns are recognized:

  • id -- the unique identifier of the element
  • label -- the rdfs:label of the element
  • definition -- the definition of the element
  • mappings -- mappings for the element
  • ancestors -- ancestors for the element (this can be parameterized)

The metadata inference procedure also works when you have denormalized TSV files with columns such as "foo_id" and "foo_name". These will be recognized as an implicit normalized label relation between the id and name of a foo element.
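
As a rough illustration of this kind of inference, the sketch below pairs each "<stem>_id" column with a "<stem>_name" column and emits a label relation. The function name infer_label_relations is hypothetical, and treating "_label" as a second recognized suffix is an assumption; the actual rules used by fill-table may differ. The dictionary keys follow the ones used with the --relation argument.

```python
# Illustrative only: infer_label_relations is a hypothetical helper and the
# suffix list is an assumption, not the rule set used by fill-table.
LABEL_SUFFIXES = ("_name", "_label")


def infer_label_relations(columns):
    """Pair each '<stem>_id' column with a matching '<stem>_name'/'<stem>_label'."""
    present = set(columns)
    relations = []
    for column in columns:
        if not column.endswith("_id"):
            continue
        stem = column[: -len("_id")]
        for suffix in LABEL_SUFFIXES:
            dependent = stem + suffix
            if dependent in present:
                relations.append(
                    {
                        "primary_key": column,
                        "dependent_column": dependent,
                        "relation": "label",
                    }
                )
    return relations


print(infer_label_relations(["foo_id", "foo_name", "notes"]))
```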

You can be more explicit in one of two ways:

  1. Pass in a YAML structure (on command line or in a YAML file) listing relations
  2. Pass in a LinkML data definitions YAML file

For the first method, you can pass in multiple relations using the --relation arg. For example, given a TSV with columns cl_identifier and cl_display_label you can say:

Example:

runoak -i cl.owl.ttl fill-table \
  --relation "{primary_key: cl_identifier, dependent_column: cl_display_label, relation: label}" \
  my-table.tsv

You can also specify this in a YAML file

For the second method, you need to specify a LinkML schema with a class where (1) at least one field is annotated as being an identifier and (2) one or more slots have slot_uri elements mapping them to standard metadata elements such as rdfs:label.

For example, my-schema.yaml:

        classes:
          Person:
            attributes:
              id:
                identifier: true
              name:
                slot_uri: rdfs:label
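
To show how such a schema could be turned into the same kind of relation used by --relation, here is a minimal sketch. It works on the schema as a plain Python dict (the structure a YAML parser would return for the file above) rather than through the LinkML libraries. The helper relations_from_schema and the SLOT_URI_TO_RELATION table are hypothetical, and only the rdfs:label mapping mentioned above is included.

```python
# Illustrative only: the schema above as a plain dict; relations_from_schema
# is a hypothetical helper, not part of oaklib or LinkML.
SCHEMA = {
    "classes": {
        "Person": {
            "attributes": {
                "id": {"identifier": True},
                "name": {"slot_uri": "rdfs:label"},
            }
        }
    }
}

# Assumption: only rdfs:label is mapped here; other slot URIs are ignored.
SLOT_URI_TO_RELATION = {"rdfs:label": "label"}


def relations_from_schema(schema, class_name):
    """Derive relations from the identifier slot and any slot_uri-mapped slots."""
    attributes = schema["classes"][class_name]["attributes"]
    identifiers = [
        name for name, spec in attributes.items() if (spec or {}).get("identifier")
    ]
    if not identifiers:
        raise ValueError(f"{class_name} has no identifier slot")
    primary_key = identifiers[0]  # use the first identifier slot found
    relations = []
    for name, spec in attributes.items():
        relation = SLOT_URI_TO_RELATION.get((spec or {}).get("slot_uri"))
        if relation is not None:
            relations.append(
                {
                    "primary_key": primary_key,
                    "dependent_column": name,
                    "relation": relation,
                }
            )
    return relations


print(relations_from_schema(SCHEMA, "Person"))
```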

This is a powerful command with many ways of configuring it. We will add separate docs for this soon; for now, please file an issue on GitHub with any questions.

  • TODO: allow for an option that will perform fuzzy matches of labels
  • TODO: reverse lookup is not provided for all fields, such as definitions
  • TODO: add an option to detect inconsistencies
  • TODO: add logic for obsoletion/replaced by
  • TODO: use most optimized method for whichever backend

What's Changed

Full Changelog: https://github.com/INCATools/ontology-access-kit/compare/v0.1.25...v0.1.26
