A Python frontend to ontologies.
Pronto is a Python library to parse, browse, create, and export ontologies, supporting several ontology languages and formats. It implements the specifications of the Open Biomedical Ontologies 1.4 in the form of a safe high-level interface. If you’re only interested in parsing OBO or OBO Graphs documents, you may wish to consider
🏳️ Supported Languages
Installing with pip is the easiest:
# pip install pronto          # if you have the admin rights
$ pip install pronto --user   # install it in a user-site directory
There is also a conda recipe in the bioconda channel:
$ conda install -c bioconda pronto
Finally, a development version can be installed from GitHub using setuptools, provided you have the right dependencies installed already:
$ git clone https://github.com/althonos/pronto
$ cd pronto
# python setup.py install
If you’re only reading ontologies, you’ll only use the Ontology class, which is the main entry point.
>>> from pronto import Ontology
It can be instantiated from a path to an ontology in one of the supported formats, even if the file is compressed:
>>> go = Ontology("tests/data/go.obo.gz")
Loading a file from a persistent URL is also supported, although you may also want to use the Ontology.from_obo_library method if you’re using persistent URLs a lot:
>>> cl = Ontology("http://purl.obolibrary.org/obo/cl.obo")
>>> stato = Ontology.from_obo_library("stato.owl")
🏷️ Get a term by accession
Ontology objects can be used as mappings to access any entity they contain from their identifier in compact form:
>>> cl['CL:0002116']
Term('CL:0002116', name='B220-low CD38-positive unswitched memory B cell')
🔎 Get a term by alternate ID
Retrieval of an entity by any alternate ID it was declared from is possible using the same syntax:
>>> hp = Ontology.from_obo_library("hp.obo")
>>> 'HP:0001198' in hp['HP:0009882'].alternate_ids
True
>>> hp['HP:0001198']
Term('HP:0009882', name='Short distal phalanx of finger')
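The lookup behaviour shown above, where an alternate ID resolves to the entity that declared it, can be pictured as a two-step dictionary lookup. The sketch below is a plain-Python illustration only, not pronto's implementation: `TinyOntology`, `add_term`, and the alias table are hypothetical names introduced for this example.

```python
class TinyOntology:
    """Toy mapping that resolves primary and alternate IDs.

    Hypothetical illustration; this is not how pronto is implemented.
    """

    def __init__(self):
        self._terms = {}    # primary ID -> term name
        self._aliases = {}  # alternate ID -> primary ID

    def add_term(self, term_id, name, alternate_ids=()):
        """Register a term and every alternate ID it declares."""
        self._terms[term_id] = name
        for alt in alternate_ids:
            self._aliases[alt] = term_id

    def __getitem__(self, key):
        # Translate an alternate ID to its primary ID, then look the term up.
        primary = self._aliases.get(key, key)
        return primary, self._terms[primary]

    def __contains__(self, key):
        return key in self._terms or key in self._aliases


hp = TinyOntology()
hp.add_term(
    "HP:0009882",
    "Short distal phalanx of finger",
    alternate_ids=["HP:0001198"],
)
resolved = hp["HP:0001198"]  # resolves to the primary ID and its name
```

Keeping aliases in a separate table means the term itself is stored once, and both identifiers return the same entry.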
🖊️ Create a new term from scratch
We can load an ontology, and edit it locally. Here, we add a new protein class to the Protein Ontology.
>>> pr = Ontology.from_obo_library("pr.obo")
>>> brh = pr.create_term("PR:XXXXXXXX")
>>> brh.name = "Bacteriorhodopsin"
>>> brh.superclasses().add(pr["PR:000001094"])  # is a rhodopsin-like G-protein
>>> brh.disjoint_from.add(pr["PR:000036194"])   # disjoint from eukaryotic proteins
✏️ Convert an OWL ontology to OBO format
The Ontology.dump method can be used to serialize an ontology to any of the supported formats (currently OBO and OBO JSON):
>>> edam = Ontology("http://edamontology.org/EDAM.owl")
>>> with open("edam.obo", "wb") as f:
...     edam.dump(f, format="obo")
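To give a rough idea of what an OBO serialization looks like, here is a minimal sketch that writes term frames to a binary handle, mirroring the "wb" mode used in the example above. It is not pronto's serializer: `dump_obo` and `toy_terms` are hypothetical names, and only the `id`, `name`, and `is_a` clauses are covered.

```python
import io


def dump_obo(terms, handle):
    """Write a mapping of terms as OBO term frames to a binary file handle.

    `terms` maps a term ID to a (name, list of superclass IDs) pair.
    Hypothetical helper for illustration; not part of pronto.
    """
    lines = ["format-version: 1.4", ""]
    for term_id, (name, superclasses) in terms.items():
        lines.append("[Term]")
        lines.append(f"id: {term_id}")
        lines.append(f"name: {name}")
        for parent in superclasses:
            lines.append(f"is_a: {parent}")
        lines.append("")  # blank line between frames
    # The handle is binary, so the text is encoded before writing.
    handle.write("\n".join(lines).encode("utf-8"))


# Toy data, assumed for the example.
toy_terms = {
    "TST:001": ("root", []),
    "TST:002": ("child", ["TST:001"]),
}

buffer = io.BytesIO()
dump_obo(toy_terms, buffer)
obo_text = buffer.getvalue().decode("utf-8")
```

Writing to an `io.BytesIO` buffer instead of a file on disk is a convenient way to inspect the output without touching the filesystem.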
🌿 Find ontology terms without subclasses
The terms method of Ontology instances can be used to iterate over all the terms in the ontology (including the ones that are imported). We can then use the is_leaf method of Term objects to check whether the term is a leaf in the class inclusion graph.
>>> ms = Ontology("ms.obo")
>>> for term in ms.terms():
...     if term.is_leaf():
...         print(term.id)
MS:0000000
MS:1000001
...
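Conceptually, a leaf is a term that no other term names as a superclass. The following sketch shows that idea on a toy class inclusion graph stored in a plain dictionary. It is an assumption-laden illustration rather than pronto's algorithm: `find_leaves` and the `TST:` identifiers are hypothetical.

```python
def find_leaves(superclasses):
    """Return the sorted IDs of terms that have no subclass.

    `superclasses` maps each term ID to the IDs of its direct superclasses.
    Hypothetical helper for illustration; not part of pronto.
    """
    has_subclass = set()
    for parents in superclasses.values():
        # Every listed parent has at least one subclass, so it is not a leaf.
        has_subclass.update(parents)
    return sorted(term for term in superclasses if term not in has_subclass)


# Toy graph: TST:001 is the root, TST:003 and TST:004 have no subclasses.
toy_graph = {
    "TST:001": [],
    "TST:002": ["TST:001"],
    "TST:003": ["TST:001"],
    "TST:004": ["TST:002"],
}

leaves = find_leaves(toy_graph)
```

This version does a single pass over the direct superclass links, so it never needs to walk the full hierarchy.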
📖 API Reference
A complete API reference can be found in the online documentation, or directly from the command line using pydoc:
$ pydoc pronto.Ontology
__all__ attribute to all modules of the data model.
TermSet with shortcut attributes and proxying of actual
Relationship.superproperties methods to add, remove, clear and iterate over the subproperties and superproperties of a
Ontology.synonym_types method to count (via
SizedIterator) and iterate over the synonym types of an ontology and all of its imports.
Ontology.get_synonym_type method to retrieve a single synonym type by ID from an ontology or one of its imports.
Relationship now return a
Term.subclasses describing the performance of the previous algorithm.
AttributeError with the setter of the
Entity.annotations return a mutable set and add a setter.
Term.relationship erroneously updating the
v0.9.0 to support inline comments.
Term.equivalent_to setter crashing with a
Entity.synonyms setter not extracting synonym data.
anti_symmetric clauses in OBO typedefs.
Ontology from file-handles not mapping to a filesystem location.
Ontology constructor to control the number of threads used by parsers supporting multithreading (OBO and OBO JSON at the moment).
is_a pseudo-relationship since subclasses/superclasses is now to be handled by the owner
v0.8, which reduces the memory footprint of identifiers and improves the parser speed.
idspace clauses in their headers.
nanoset dependency, which was not useful anymore in Python 3.8 and caused issues with multithreading when processing OBO frames in parallel.
Synonym.xrefs now returns a mutable set that can be used to add
Xref to the synonym directly.
Ontology to outlive all of the
Terms created from it.
Term.id property missing a return type annotation.
Term.equivalent_to not returning a
TermSet but a set of strings.
SuperclassesIterator to make both use the internal subclassing cache.
Term.is_leaf use internal subclassing cache to make it run in constant time.
TermSet.superclasses methods to query all
TermSet class to the top-level
cache keyword argument for the
to_self because of a typo.
with_self to disable reflexivity of
TermSet class which stores a set of terms efficiently while providing some useful shortcuts to access the underlying data.
Term.superclasses to a dedicated iterator class in the
Synonym.type setter leading to a potential bug when the given
fastobo serializer crashing on namespace clauses because of a type issue.
fastobo parsers using data version clauses as format version clauses.
v0.7.0, switching parser implementation to use multi-threading in order to speed up the parsing process.
OboSerializer occasionally missing lines between term and typedef frames.
RdfXMLParser crashing on entities with
rdf:label elements without literal content.
pronto.serializers module not being embedded in Wheel distribution.
Entity.add_synonym method to create a new synonym and add it to an entity.
@roundrepr now adds a minimal docstring to the generated
Ontology caches subclassing relationships to greatly improve performance of
Entity subclasses now store their
id directly to improve performance.
queue.Queue as a LIFO structure since thread-safety is not needed.
chardet result is now used even when prediction confidence is under 100% to detect encoding of the handle passed to
Synonym.type getter crashing on
RdfXMLParser crashing on synonymtypedefs without scope specifiers.
ValueError when given an identifier already in the knowledge graph.
Entity in non-typechecking runtime.
synonymtypedef as annotation properties in
owl:VersionIri for specification of ontology data version.
PropertyValue classes, based on the lexicographic order of their serialization.
Ontology.dumps methods to serialize an ontology in obo or obojson format.
Metadata not storing optional description of ID spaces if any.
fastobo-derived parsers will not create a new entity if one exists in the graph of dependencies already.
pronto.warnings and the complete warnings hierarchy.
Ontology.__getitem__ can also access entities from imports.
SynonymType compare only based on their ID.
Definition compare only based on their textual value.
repr implementation that should roundtrip most of the time.
Term.rparents and stop making direction assumptions on relationships.