pronto
A Python frontend to ontologies.
🗺️ Overview
Pronto is a Python library to parse, browse, create, and export ontologies, supporting several ontology languages and formats. It implements the specifications of the Open Biomedical Ontologies 1.4 in the form of a safe, high-level interface. If you're only interested in parsing OBO or OBO Graphs documents, you may wish to consider fastobo instead.
🏳️ Supported Languages
🔧 Installing
Installing with pip is the easiest:
# pip install pronto # if you have admin rights
$ pip install pronto --user # install it in a user-site directory
There is also a conda recipe in the bioconda channel:
$ conda install -c bioconda pronto
Finally, a development version can be installed from GitHub using setuptools, provided you have the right dependencies installed already:
$ git clone https://github.com/althonos/pronto
$ cd pronto
# python setup.py install
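Whichever method you used, a quick way to check that the package is installed is to ask the standard library for its version. This is a minimal sketch using importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical and not part of pronto.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pronto") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


print(installed_version())
```

A None result means the install did not land in the interpreter you are running, which is common when mixing pip --user, conda, and virtual environments.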
💡 Examples
If you're only reading ontologies, you'll only use the Ontology class, which is the main entry point.
>>> from pronto import Ontology
It can be instantiated from a path to an ontology in one of the supported formats, even if the file is compressed:
>>> go = Ontology("tests/data/go.obo.gz")
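One common way a loader can accept compressed files transparently is to sniff the first bytes of the file for the gzip magic number (1f 8b) before decoding it. The sketch below shows that technique with the standard library only; open_text is a hypothetical helper, and this is not a description of pronto's own internals.

```python
import gzip
from typing import TextIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text(path: str, encoding: str = "utf-8") -> TextIO:
    """Open `path` as text, decompressing it first if it is gzip-compressed."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        # gzip.open in text mode decompresses and decodes on the fly
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

Checking the content rather than the .gz extension means a mislabelled file is still handled correctly.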
Loading a file from a persistent URL is also supported, although you may also want to use the Ontology.from_obo_library method if you're using persistent URLs a lot:
>>> cl = Ontology("http://purl.obolibrary.org/obo/cl.obo")
>>> stato = Ontology.from_obo_library("stato.owl")
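OBO Library persistent URLs share a common prefix followed by the ontology file name, as in the cl.obo URL above. Assuming Ontology.from_obo_library resolves its argument against that same prefix (check the API reference to confirm), the URL it fetches can be sketched as follows; obo_library_url is a hypothetical helper, not part of pronto.

```python
OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/"


def obo_library_url(filename: str) -> str:
    """Build the OBO Library persistent URL for an ontology file name."""
    return OBO_PURL_PREFIX + filename


print(obo_library_url("stato.owl"))  # http://purl.obolibrary.org/obo/stato.owl
```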
🏷️ Get a term by accession
Ontology objects can be used as mappings to access any entity they contain from their identifier in compact form:
>>> cl['CL:0002116']
Term('CL:0002116', name='B220-low CD38-positive unswitched memory B cell')
🖊️ Create a new term from scratch
We can load an ontology, and edit it locally. Here, we add a new protein class to the Protein Ontology.
>>> pr = Ontology.from_obo_library("pr.obo")
>>> brh = pr.create_term("PR:XXXXXXXX")
>>> brh.name = "Bacteriorhodopsin"
>>> brh.superclasses().add(pr["PR:000001094"]) # is a rhodopsin-like G-protein
>>> brh.disjoint_from.add(pr["PR:000036194"]) # disjoint from eukaryotic proteins
✏️ Convert an OWL ontology to OBO format
The Ontology.dump method can be used to serialize an ontology to any of the supported formats (currently OBO and OBO JSON):
>>> edam = Ontology("http://edamontology.org/EDAM.owl")
>>> with open("edam.obo", "wb") as f:
...     edam.dump(f, format="obo")
🌿 Find ontology terms without subclasses
The terms method of Ontology instances can be used to iterate over all the terms in the ontology (including the ones that are imported). We can then use the is_leaf method of Term objects to check if the term is a leaf in the class inclusion graph.
>>> ms = Ontology("ms.obo")
>>> for term in ms.terms():
...     if term.is_leaf():
...         print(term.id)
MS:0000000
MS:1000001
...
📖 API Reference
A complete API reference can be found in the online documentation, or directly from the command line using pydoc:
$ pydoc pronto.Ontology
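The same pages are available inside a Python session with help(pronto.Ontology). To capture a page as a string instead, for example to search it, the standard pydoc.render_doc function can be used with the plain-text renderer; api_doc is a hypothetical helper name.

```python
import pydoc


def api_doc(dotted_name: str) -> str:
    """Return the plain-text pydoc page for an importable object."""
    return pydoc.render_doc(dotted_name, renderer=pydoc.plaintext)


# Works for any importable name, e.g. api_doc("pronto.Ontology") once installed.
print(api_doc("json.dumps").splitlines()[0])
```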
📜 License
This library is provided under the open-source MIT license. Please cite this library if you are using it in a scientific context using the following DOI: 10.5281/zenodo.595572
📒 Changelog
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- frozendict with immutabledict (#90).
- fastobo dependency to v0.9.0 to support inline comments.
- Term.subclasses and Term.superclasses.
- Term.equivalent_to setter crashing with a NameError.
- Entity.synonyms setter not extracting synonym data.
- anti_symmetric clauses in OBO typedefs.
- ResourceWarning when creating Ontology from file-handles not mapping to a filesystem location.
- threads parameter to Ontology constructor to control the number of threads used by parsers supporting multithreading (OBO and OBO JSON at the moment).
- is_a pseudo-relationship since subclasses/superclasses is now to be handled by the owner Ontology.
- Term.subclasses() and Term.superclasses(). (#84)
- fastobo to v0.8, which reduces the memory footprint of identifiers and improves parser speed.
- Xref instantiation.
- idspace clauses in their headers.
- TermSet.
- nanoset dependency, which was no longer useful in Python 3.8 and caused issues with multithreading when processing OBO frames in parallel.
- Synonym.xrefs now returns a mutable set that can be used to add Xref to the synonym directly.
- Ontology to outlive all of the Terms created from it.
- Term.id property missing a return type annotation.
- Term.equivalent_to not returning a TermSet but a set of strings.
- SubclassesIterator and SuperclassesIterator to make both use the internal subclassing cache.
- Term.is_leaf use internal subclassing cache to make it run in constant time.
- TermSet.subclasses and TermSet.superclasses methods to query all Term.
- TermSet class to the top-level pronto module.
- Ontology class.
- Term.consider, Term.union_of and Term.intersection_of.
- cache keyword argument for the Ontology.
- SuperclassesIterator.to_set being named to_self because of a typo.
- fastobo-backed serializer.
- with_self to disable reflexivity of Term.subclasses and Term.superclasses iterators.
- TermSet class which stores a set of terms efficiently while providing some useful shortcuts to access the underlying data.
- Term.subclasses and Term.superclasses to a dedicated iterator class in the pronto.logic submodule.
- contexter requirement.
- Synonym.type setter leading to a potential bug when the given type is None.
- mypy.
- fastobo serializer crashing on namespace clauses because of a type issue.
- fastobo parsers using data version clauses as format version clauses.
- fastobo to v0.7.0, switching the parser implementation to use multithreading in order to speed up parsing.
- FastoboParser.
- OboSerializer occasionally missing lines between term and typedef frames.
- Entity properties.
- RdfXMLParser crashing on entities with rdf:label elements without literal content.
- pronto.serializers module not being embedded in Wheel distribution.
- Entity.add_synonym method to create a new synonym and add it to an entity.
- @roundrepr now adds a minimal docstring to the generated __repr__ method.
- Ontology caches subclassing relationships to greatly improve performance of Term.subclasses.
- Entity subclasses now store their id directly to improve performance.
- Term.subclasses and Term.superclasses use collections.deque instead of queue.Queue as a LIFO structure since thread-safety is not needed.
- chardet result is now used even when prediction confidence is under 100% to detect encoding of the handle passed to Ontology.
- SynonymType comparison implementation.
- Synonym.type getter crashing on type not being None.
- RdfXMLParser crashing on synonymtypedefs without scope specifiers.
- FastoboSerializer.
- Ontology.create_term and Ontology.create_relationship not raising ValueError when given an identifier already in the knowledge graph.
- BaseSerializer.dump to remove encoding argument.
- __slots__ in Entity in non-typechecking runtime.
- fastobo requirement to v0.6.0.
- oboInOwl:consider annotation in RdfXMLParser.
- oboInOwl:savedBy annotation in RdfXMLParser.
- subsetdef and synonymtypedef as annotation properties in RdfXMLParser.
- doap:Version instead of owl:VersionIri for specification of ontology data version.
- PropertyValue classes, based on the lexicographic order of their serialization.
- Ontology.dump and Ontology.dumps methods to serialize an ontology in obo or obojson format.
- Metadata not storing optional description of ID spaces if any.
- RelationshipData.equivalent_to_chain.
- networkx in Term.subclasses.
- fastobo-derived parsers will not create a new entity if one exists in the graph of dependencies already.
- pronto.warnings and the complete warnings hierarchy.
- OwlXMLParser.
- RelationshipData.synonyms attribute.
- 3.6.
- fastobo.
- Ontology.__getitem__ can also access entities from imports.
- Term, Relationship, Xref, SynonymType compare only based on their ID.
- Subset, Definition compare only based on their textual value.
- fastobo.
- mypy type hints.
- __debug__ mode.
- repr implementation that should roundtrip most of the time.
- Term.rchildren and Term.rparents and stop making direction assumptions on relationships.

Name | Size
---|---
althonos/pronto-v2.2.3.zip (md5:111c98705e5af3096f4c2aab758637be) | 837.0 kB