INCATools/ontology-access-kit: v0.1.26
Creators
- 1. Lawrence Berkeley National Laboratory
- 2. Harvard Medical School
- 3. EMBL
- 4. semanticly Ltd
Description
Summary
This release has two major enhancements:
- The output options for the different commands have been harmonized, and RDF is now an option for most (as well as obo, yaml, csv, obojson, and specialized models like sssom where appropriate)
- A new fill-table command is added
Fixes #147
Started unifying command structure, see #145. The term-mappings command is deleted: use mappings instead.
This also streamlines the formatting for most commands
These are the types; not all types apply to all commands:
OBO_FORMAT = "obo"
RDF_FORMAT = "rdf"
MD_FORMAT = "md"
OBOJSON_FORMAT = "obojson"
CSV_FORMAT = "csv"
JSON_FORMAT = "json"
JSONL_FORMAT = "jsonl"
YAML_FORMAT = "yaml"
INFO_FORMAT = "info"
SSSOM_FORMAT = "sssom"
OWLFUN_FORMAT = "ofn"
Examples:
$ runoak -i bioportal:ceph annotate "tentacle squid head" -O rdf
@prefix ns1: <https://w3id.org/linkml/text_annotator/> .
@prefix ns2: <http://w3id.org/sssom/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[] a <http://www.w3.org/ns/oa#Annotation> ;
ns2:object_id "CEPH:0000256" ;
ns2:object_label "tentacle" ;
ns2:object_source "https://data.bioontology.org/ontologies/CEPH" ;
ns1:match_type "PREF" ;
ns1:subject_end 8 ;
ns1:subject_label "TENTACLE" ;
ns1:subject_start 1 .
$ echo wikidata:Q42482 | wikidata labels -O rdf -
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://www.wikidata.org/entity/Q42482> rdfs:label "Iron Maiden" .
$ echo wikidata:Q42482 | wikidata ancestors -p t -
wikidata:Q42482 ! Iron Maiden
wikidata:Q215380 ! musical group
wikidata:Q105756498 ! type of musical ensemble/group
wikidata:Q24027515 ! fifth-order class
wikidata:Q24027526 ! fixed order metaclass of higher order
wikidata:Q56816954 ! heavy metal band
wikidata:Q19361238 ! Wikidata metaclass
wikidata:Q23958852 ! variable-order class
wikidata:Q24017465 ! third-order class
wikidata:Q24027474 ! fourth-order class
$ echo wikidata:Q42482 | wikidata ancestors -p t - -O csv
id label
wikidata:Q42482 Iron Maiden
wikidata:Q215380 musical group
wikidata:Q105756498 type of musical ensemble/group
wikidata:Q24027515 fifth-order class
wikidata:Q24027526 fixed order metaclass of higher order
wikidata:Q56816954 heavy metal band
wikidata:Q24017465 third-order class
wikidata:Q24027474 fourth-order class
wikidata:Q19361238 Wikidata metaclass
wikidata:Q23958852 variable-order class
$ echo wikidata:Q42482 | wikidata ancestors -p t - -O ofn
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q42482> "Iron Maiden" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q105756498> "type of musical ensemble/group" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q215380> "musical group" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q19361238> "Wikidata metaclass" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q23958852> "variable-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24017465> "third-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24027474> "fourth-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24027515> "fifth-order class" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q24027526> "fixed order metaclass of higher order" )
AnnotationAssertion( rdfs:label <http://www.wikidata.org/entity/Q56816954> "heavy metal band" )
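The idea behind the harmonized -O option, one set of results rendered by interchangeable writers, can be illustrated with a minimal Python sketch. This is not OAK's implementation: the function names, the WRITERS registry, and the tab delimiter for the table output are all assumptions made for illustration. Only the "id ! label" layout and the id/label header are taken from the examples above.

```python
import csv
import io

# Two records copied from the example output above.
RECORDS = [
    ("wikidata:Q42482", "Iron Maiden"),
    ("wikidata:Q215380", "musical group"),
]


def write_info(records):
    """Render records in the 'id ! label' layout shown in the default output."""
    return "".join(f"{curie} ! {label}\n" for curie, label in records)


def write_table(records, delimiter="\t"):
    """Render records as a delimited table with an id/label header.

    The tab delimiter is an assumption, not something stated in these notes.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["id", "label"])
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical registry keyed by the value passed to -O.
WRITERS = {"info": write_info, "csv": write_table}


def render(records, output_type="info"):
    """Pick a writer by output type and render the same records with it."""
    return WRITERS[output_type](records)


if __name__ == "__main__":
    print(render(RECORDS, "info"), end="")
    print(render(RECORDS, "csv"), end="")
```

Keeping the writers behind a single lookup means a command only produces records and never needs to know which format was requested.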
fill-table command
Fixes #154
This adds a new command, fill-table (which is backed by a separate Python utility module; no logic is interwoven with the CLI).
Fills missing values in a table of ontology elements
Given a TSV with a populated ID column, and unpopulated columns for definition, label, mappings, ancestors, this will iterate through each row filling in each missing value by performing ontology lookups.
In some cases, this can also perform reverse lookups; i.e., given a table with labels populated and blank IDs, it fills in the IDs.
In the most basic scenario, you have a table with two columns 'id' and 'label'. These are the "conventional" column headers for a table of ontology elements (see later for configuration when you don't follow conventions)
Example:
runoak -i cl.owl.ttl fill-table my-table.tsv
(any implementation can be used)
The same command will work for the reverse scenario - when you have labels populated, but IDs are not populated
By default this will throw an error if a lookup is not successful; this can be relaxed
Relaxed:
runoak -i cl.owl.ttl fill-table --allow-missing my-table.tsv
In this case missing values that cannot be populated will remain empty
To explicitly populate a value:
runoak -i cl.owl.ttl fill-table --missing-value-token NO_DATA my-table.tsv
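The behaviour described above (forward lookup, reverse lookup, and the three ways of handling a failed lookup) can be sketched in plain Python. This is a toy illustration, not the OAK utility module: the dictionaries stand in for an ontology backend, the function names are hypothetical, and treating a supplied token as a replacement for the error is an assumption about how --missing-value-token behaves.

```python
import csv
import io

# Toy stand-in for an ontology: id -> label. A real run would query a backend.
LABELS = {
    "CL:0000540": "neuron",
    "CL:0000000": "cell",
}
IDS_BY_LABEL = {label: curie for curie, label in LABELS.items()}


def fill_rows(rows, allow_missing=False, missing_value_token=None):
    """Fill blank 'id' or 'label' cells in place using the toy lookup tables."""
    for row in rows:
        if row["id"] and not row["label"]:
            target, value = "label", LABELS.get(row["id"])  # forward lookup
        elif row["label"] and not row["id"]:
            target, value = "id", IDS_BY_LABEL.get(row["label"])  # reverse lookup
        else:
            continue  # nothing to fill, or nothing to look up from
        if value is None:
            if missing_value_token is not None:
                value = missing_value_token  # assumed: token replaces the error
            elif allow_missing:
                value = ""  # relaxed mode: leave the cell empty
            else:
                raise ValueError(f"lookup failed for row {row}")
        row[target] = value
    return rows


def fill_tsv(text, **kwargs):
    """Parse TSV text with an id/label header and fill its missing cells."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return fill_rows(rows, **kwargs)


if __name__ == "__main__":
    table = "id\tlabel\nCL:0000540\t\n\tcell\n"
    for row in fill_tsv(table):
        print(row["id"], row["label"], sep="\t")
```

The default strict mode raises on the first failed lookup, which mirrors the "throw an error" behaviour described above; the two keyword arguments correspond to the relaxed variants.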
Currently the following columns are recognized:
- id -- the unique identifier of the element
- label -- the rdfs:label of the element
- definition -- the definition of the element
- mappings -- mappings for the element
- ancestors -- ancestors for the element (this can be parameterized)
The metadata inference procedure will also work for when you have denormalized TSV files with columns such as "foo_id" and "foo_name". This will be recognized as an implicit normalized label relation between id and name of a foo element.
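As a rough illustration of this kind of inference, the sketch below pairs each "<prefix>_id" column with a matching "<prefix>_name" column. The function name and the exact matching rule are assumptions, and the real procedure may recognise more patterns; the keys of the returned dictionaries follow the relation structure used with --relation in these notes.

```python
def infer_label_relations(columns):
    """Pair each '<prefix>_id' column with a '<prefix>_name' column, if present.

    Hypothetical helper: only the foo_id / foo_name convention is handled.
    """
    relations = []
    for col in columns:
        if col.endswith("_id"):
            prefix = col[: -len("_id")]
            name_col = f"{prefix}_name"
            if name_col in columns:
                relations.append(
                    {
                        "primary_key": col,
                        "dependent_column": name_col,
                        "relation": "label",
                    }
                )
    return relations


if __name__ == "__main__":
    print(infer_label_relations(["foo_id", "foo_name", "notes"]))
```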
You can be more explicit in one of two ways:
- Pass in a YAML structure (on command line or in a YAML file) listing relations
- Pass in a LinkML data definitions YAML file
For the first method, you can pass in multiple relations using the --relation arg. For example, given a TSV with columns cl_identifier and cl_display_label you can say:
Example:
runoak -i cl.owl.ttl fill-table \
--relation "{primary_key: cl_identifier, dependent_column: cl_display_label, relation: label}"
You can also specify this in a YAML file
For the second method, you need to specify a LinkML schema with a class where (1) at least one field is annotated as being an identifier and (2) one or more slots have slot_uri elements mapping them to standard metadata elements such as rdfs:label.
For example, my-schema.yaml:
classes:
  Person:
    attributes:
      id:
        identifier: true
      name:
        slot_uri: rdfs:label
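To make the two schema requirements concrete, here is a minimal sketch of how id/label relations might be derived from such a schema. It is not how OAK or LinkML does it: the schema is written as a Python dict mirroring my-schema.yaml to avoid a YAML dependency, the function name is hypothetical, and the slot_uri-to-relation table is an assumption that covers only rdfs:label.

```python
# Dict mirror of my-schema.yaml above.
SCHEMA = {
    "classes": {
        "Person": {
            "attributes": {
                "id": {"identifier": True},
                "name": {"slot_uri": "rdfs:label"},
            }
        }
    }
}

# Assumed mapping; only rdfs:label is mentioned in these notes.
SLOT_URI_TO_RELATION = {"rdfs:label": "label"}


def relations_from_schema(schema, class_name):
    """Derive primary-key/dependent-column relations for one class."""
    attrs = schema["classes"][class_name]["attributes"]
    id_columns = [name for name, spec in attrs.items() if spec.get("identifier")]
    if not id_columns:
        raise ValueError(f"class {class_name} has no identifier attribute")
    primary_key = id_columns[0]
    return [
        {
            "primary_key": primary_key,
            "dependent_column": name,
            "relation": SLOT_URI_TO_RELATION[spec["slot_uri"]],
        }
        for name, spec in attrs.items()
        if spec.get("slot_uri") in SLOT_URI_TO_RELATION
    ]


if __name__ == "__main__":
    print(relations_from_schema(SCHEMA, "Person"))
```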
This is a powerful command with many ways of configuring it. We will add separate docs for this soon; for now, please file an issue on GitHub with any questions.
- TODO: allow for an option that will perform fuzzy matches of labels
- TODO: reverse lookup is not provided for all fields, such as definitions
- TODO: add an option to detect inconsistencies
- TODO: add logic for obsoletion/replaced-by
- TODO: use most optimized method for whichever backend
- unpinned sssom in dependencies by @hrshdhgd in https://github.com/INCATools/ontology-access-kit/pull/151
- Added RDF streaming writer and allow it as option for most RDF commands. Fixes #147 by @cmungall in https://github.com/INCATools/ontology-access-kit/pull/153
- Added fill-table command for populating missing values by ontology lookup. by @cmungall in https://github.com/INCATools/ontology-access-kit/pull/155
Full Changelog: https://github.com/INCATools/ontology-access-kit/compare/v0.1.25...v0.1.26
Files
INCATools/ontology-access-kit-v0.1.26.zip (5.1 MB)
md5:6f6daa92d521aa212e819e517b918224
Additional details
Related works
- Is supplement to
- https://github.com/INCATools/ontology-access-kit/tree/v0.1.26 (URL)