althonos/pronto: 2.2.3
pronto
A Python frontend to ontologies.
🗺️ Overview
Pronto is a Python library to parse, browse, create, and export ontologies, supporting several ontology languages and formats. It implements the specifications of the Open Biomedical Ontologies 1.4 in the form of a safe high-level interface. If you’re only interested in parsing OBO or OBO Graphs documents, you may wish to consider fastobo instead.
🏳️ Supported Languages
- Open Biomedical Ontologies 1.4. Because this format is fairly new, not all OBO ontologies can be parsed at the moment. See the OBO Foundry roadmap listing the compliant ontologies, and don’t hesitate to contact their developers to push adoption forward.
- OBO Graphs in JSON format. The format is not yet stabilized, so the results may change from file to file.
- Ontology Web Language 2 in RDF/XML format. OWL2 ontologies are reverse translated to OBO using the mapping defined in the OBO 1.4 Semantics.
🔧 Installing
Installing with pip is the easiest:
# pip install pronto # if you have the admin rights
$ pip install pronto --user # install it in a user-site directory
There is also a conda recipe in the bioconda channel:
$ conda install -c bioconda pronto
Finally, a development version can be installed from GitHub using setuptools, provided you have the right dependencies installed already:
$ git clone https://github.com/althonos/pronto
$ cd pronto
# python setup.py install
💡 Examples
If you’re only reading ontologies, you’ll only use the Ontology class, which is the main entry point.
>>> from pronto import Ontology
It can be instantiated from a path to an ontology in one of the supported formats, even if the file is compressed:
>>> go = Ontology("tests/data/go.obo.gz")
Loading a file from a persistent URL is also supported, although you may also want to use the Ontology.from_obo_library method if you’re using persistent URLs a lot:
>>> cl = Ontology("http://purl.obolibrary.org/obo/cl.obo")
>>> stato = Ontology.from_obo_library("stato.owl")
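To illustrate how a loader can accept compressed and uncompressed files alike, here is a minimal standard-library sketch that sniffs the gzip magic bytes of a buffer before reading it. This is not pronto's actual implementation; the helper name `open_maybe_gzip` and the toy OBO content are assumptions made for the example.

```python
import gzip
import io

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(raw: bytes) -> io.BufferedIOBase:
    """Return a binary file object over `raw`, decompressing it if gzipped.

    Hypothetical helper for illustration only, not part of pronto.
    """
    if raw[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(raw))
    return io.BytesIO(raw)


# Toy OBO document, plain and compressed.
obo_text = b"format-version: 1.4\n\n[Term]\nid: TST:001\nname: example\n"
compressed = gzip.compress(obo_text)

# Both inputs yield the same decoded content.
assert open_maybe_gzip(obo_text).read() == obo_text
assert open_maybe_gzip(compressed).read() == obo_text
```

Sniffing the content rather than trusting the file extension means a misnamed file is still read correctly.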
🏷️ Get a term by accession
Ontology objects can be used as mappings to access any entity they contain from their identifier in compact form:
>>> cl['CL:0002116']
Term('CL:0002116', name='B220-low CD38-positive unswitched memory B cell')
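Compact identifiers such as CL:0002116 correspond to persistent URLs under the OBO Library. The sketch below converts between the two forms using only the standard library. It assumes the usual OBO Foundry convention (prefix and local identifier joined by an underscore); the function names are hypothetical and are not part of pronto.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def compact_to_purl(identifier: str) -> str:
    """Turn a compact identifier like 'CL:0002116' into an OBO persistent URL."""
    prefix, sep, local = identifier.partition(":")
    if not (sep and prefix and local):
        raise ValueError(f"not a compact identifier: {identifier!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local}"


def purl_to_compact(url: str) -> str:
    """Turn an OBO persistent URL back into its compact identifier."""
    if not url.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO persistent URL: {url!r}")
    # Split on the last underscore so prefixes containing '_' survive.
    prefix, sep, local = url[len(OBO_PURL_BASE):].rpartition("_")
    if not (sep and prefix and local):
        raise ValueError(f"cannot split prefix and local id: {url!r}")
    return f"{prefix}:{local}"


print(compact_to_purl("CL:0002116"))
# http://purl.obolibrary.org/obo/CL_0002116
print(purl_to_compact("http://purl.obolibrary.org/obo/CL_0002116"))
# CL:0002116
```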
🖊️ Create a new term from scratch
We can load an ontology, and edit it locally. Here, we add a new protein class to the Protein Ontology.
>>> pr = Ontology.from_obo_library("pr.obo")
>>> brh = pr.create_term("PR:XXXXXXXX")
>>> brh.name = "Bacteriorhodopsin"
>>> brh.superclasses().add(pr["PR:000001094"]) # is a rhodopsin-like G-protein
>>> brh.disjoint_from.add(pr["PR:000036194"]) # disjoint from eukaryotic proteins
✏️ Convert an OWL ontology to OBO format
The Ontology.dump method can be used to serialize an ontology to any of the supported formats (currently OBO and OBO JSON):
>>> edam = Ontology("http://edamontology.org/EDAM.owl")
>>> with open("edam.obo", "wb") as f:
...     edam.dump(f, format="obo")
🌿 Find ontology terms without subclasses
The terms method of Ontology instances can be used to iterate over all the terms in the ontology (including the ones that are imported). We can then use the is_leaf method of Term objects to check whether the term is a leaf in the class inclusion graph.
>>> ms = Ontology("ms.obo")
>>> for term in ms.terms():
...     if term.is_leaf():
...         print(term.id)
MS:0000000
MS:1000001
...
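For readers unfamiliar with the term, a leaf of the class inclusion graph is a term that no other term declares as a superclass. The plain-Python sketch below shows the idea on a toy hierarchy. It does not use pronto and is not how pronto implements is_leaf; the identifiers and the `find_leaves` function are made up for the example.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each term maps to its direct superclasses.
SUPERCLASSES: Dict[str, Set[str]] = {
    "TST:001": set(),
    "TST:002": {"TST:001"},
    "TST:003": {"TST:001"},
    "TST:004": {"TST:002"},
}


def find_leaves(superclasses: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted identifiers of the terms that have no subclass."""
    # Any term listed as someone's superclass has at least one subclass.
    has_subclass: Set[str] = set()
    for parents in superclasses.values():
        has_subclass.update(parents)
    return sorted(term for term in superclasses if term not in has_subclass)


print(find_leaves(SUPERCLASSES))
# ['TST:003', 'TST:004']
```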
📖 API Reference
A complete API reference can be found in the online documentation, or directly from the command line using pydoc:
$ pydoc pronto.Ontology
📜 License
This library is provided under the open-source MIT license. Please cite this library if you are using it in a scientific context using the following DOI: 10.5281/zenodo.595572
📒 Changelog
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- 2.2.3 - 2020-07-31
  - Changed:
    - Replaced frozendict with immutabledict (#90).
    - Bumped fastobo dependency to v0.9.0 to support inline comments.
    - Parsers will now process their imports in parallel using a thread pool.
  - Fixed:
    - Argument type checking in view layer is now disabled during the parsing phase to reduce overhead.
- 2.2.2 - 2020-07-18
  - Added:
    - Extraction of basic relationships from RDF/XML documents.
  - Fixed:
    - Erroneous type annotations on Term.subclasses and Term.superclasses.
    - Bug with Term.equivalent_to setter crashing with a NameError.
    - Bug with Entity.synonyms setter not extracting synonym data.
- 2.2.1 - 2020-06-17
  - Fixed:
    - Extraction of subclasses/superclasses hierarchy from nested imports.
    - Serialization of OBO frames not being done in order.
    - Parsing issue with anti_symmetric clauses in OBO typedefs.
    - Xrefs not being extracted when declared as axioms in RDF/XML documents.
    - ResourceWarning when creating Ontology from file-handles not mapping to a filesystem location.
- 2.2.0 - 2020-06-17
  - Added:
    - threads parameter to Ontology constructor to control the number of threads used by parsers supporting multithreading (OBO and OBO JSON at the moment).
    - Deprecation warnings for suspected uses of the is_a pseudo-relationship, since subclasses/superclasses are now to be handled by the owner Ontology.
    - Support for subclass/superclass edition directly from the objects returned by Term.subclasses() and Term.superclasses(). (#84)
  - Changed:
    - Updated fastobo to v0.8, which reduces the memory footprint of identifiers and improves the parser speed.
    - Improved OBO parser performance using threading plus zero-copy validation of identifiers on Xref instantiation.
    - Improved performance in debug mode by having the typechecker only extract the wrapped function signature once.
  - Fixed:
    - OBO parser crashing on files containing idspace clauses in their headers.
    - Reference management issue with binary operations of TermSet.
  - Removed:
    - nanoset dependency, which was not useful anymore in Python 3.8 and caused issues with multithreading when processing OBO frames in parallel.
- 2.1.0 - 2020-03-23
  - Added:
  - Changed:
    - Synonym.xrefs now returns a mutable set that can be used to add Xref to the synonym directly.
  - Fixed:
- 2.0.1 - 2020-02-19
  - Fixed:
    - Internal handling of ontology data forcing an Ontology to outlive all of the Terms created from it.
    - Term.id property missing a return type annotation.
    - Term.equivalent_to not returning a TermSet but a set of strings.
  - Changed:
    - Refactored implementation of SubclassesIterator and SuperclassesIterator to make both use the internal subclassing cache.
    - Make Term.is_leaf use internal subclassing cache to make it run in constant time.
- 2.0.0 - 2020-02-14
  - Added:
    - TermSet.subclasses and TermSet.superclasses methods to query all the subclasses / superclasses of all Term.
    - TermSet class to the top-level pronto module.
    - Dynamic management of subclassing cache for the Ontology class.
    - Setters for Term.consider, Term.union_of and Term.intersection_of.
  - Removed:
    - cache keyword argument for the Ontology.
  - Fixed:
    - SuperclassesIterator.to_set being named to_self because of a typo.
    - Several bugs affecting the fastobo-backed serializer.
- 1.2.0 - 2020-02-10
  - Added:
    - Parameter with_self to disable reflexivity of Term.subclasses and Term.superclasses iterators.
    - TermSet class which stores a set of terms efficiently while providing some useful shortcuts to access the underlying data.
  - Changed:
    - Moved code of Term.subclasses and Term.superclasses to a dedicated iterator class in the pronto.logic submodule.
    - Dropped contexter requirement.
  - Fixed:
    - Fix a typo in Synonym.type setter leading to a potential bug when the given type is None.
    - Fix miscellaneous bugs found with mypy.
    - fastobo serializer crashing on namespace clauses because of a type issue.
    - fastobo parsers using data version clauses as format version clauses.
- 1.1.5 - 2020-01-25
  - Changed:
    - Bumped fastobo to v0.7.0, switching parser implementation to use multi-threading in order to speed up the parsing process.
- 1.1.4 - 2020-01-21
  - Added:
    - Explicit support for Python 3.8.
    - Support for Windows-style line endings (#53)
- 1.1.3 - 2019-11-10
  - Fixed:
    - Handling of some clauses in FastoboParser.
    - OboSerializer occasionally missing lines between term and typedef frames.
  - Added:
    - Missing docstrings to some Entity properties.
- 1.1.2 - 2019-10-30
  - Fixed:
    - RdfXMLParser crashing on entities with rdf:label elements without literal content.
- 1.1.1 - 2019-10-29
  - Fixed:
    - pronto.serializers module not being embedded in Wheel distribution.
- 1.1.0 - 2019-10-24
  - Added:
    - Entity.add_synonym method to create a new synonym and add it to an entity.
    - @roundrepr now adds a minimal docstring to the generated __repr__ method.
    - Ontology caches subclassing relationships to greatly improve performance of Term.subclasses.
  - Changed:
    - Entity subclasses now store their id directly to improve performance.
    - Term.subclasses and Term.superclasses use collections.deque instead of queue.Queue as a LIFO structure since thread-safety is not needed.
    - chardet result is now used even when prediction confidence is under 100% to detect encoding of the handle passed to Ontology.
  - Fixed:
    - SynonymType comparison implementation.
    - Synonym.type getter crashing on type not being None.
    - RdfXMLParser crashing on synonymtypedefs without scope specifiers.
- 1.0.0 - 2019-10-11
  - Fixed:
    - Issues with typedef serialization in FastoboSerializer.
    - Ontology.create_term and Ontology.create_relationship not raising ValueError when given an identifier already in the knowledge graph.
    - Signature of BaseSerializer.dump to remove encoding argument.
    - Missing __slots__ in Entity in non-typechecking runtime.
  - Changed:
    - Bumped fastobo requirement to v0.6.0.
- 1.0.0-alpha.3 - 2019-10-10
  - Added:
    - Extraction of oboInOwl:consider annotation in RdfXMLParser.
    - Extraction of oboInOwl:savedBy annotation in RdfXMLParser.
    - Extraction of subsetdef and synonymtypedef as annotation properties in RdfXMLParser.
    - Support for doap:Version instead of owl:VersionIri for specification of ontology data version.
    - Proper comparison of PropertyValue classes, based on the lexicographic order of their serialization.
    - Ontology.dump and Ontology.dumps methods to serialize an ontology in obo or obojson format.
  - Fixed:
    - Metadata not storing optional description of ID spaces if any.
    - Wrong type hints in RelationshipData.equivalent_to_chain.
  - Changed:
    - Added type checking to some more property setters.
    - Avoid using networkx in Term.subclasses.
    - fastobo-derived parsers will not create a new entity if one exists in the graph of dependencies already.
    - Exposed pronto.warnings and the complete warnings hierarchy.
- 1.0.0-alpha.2 - 2019-10-03
  - Added:
    - Support for extraction of relationships from OWL/XML files to OwlXMLParser.
  - Fixed:
    - Type hints of RelationshipData.synonyms attribute.
- 1.0.0-alpha.1 - 2019-10-02
  - Changed:
    - Dropped support for Python earlier than 3.6.
    - Brand new data model that follows the OBO 1.4 object model.
    - Partial OWL XML parser implementation using the OBO 1.4 semantics.
    - New OBO parser implementation based on fastobo.
    - Imports are properly separated from the top-level ontology.
    - Ontology.__getitem__ can also access entities from imports.
    - Term, Relationship, Xref, SynonymType compare only based on their ID.
    - Subset, Definition compare only based on their textual value.
  - Added:
    - Support for OBO JSON parser based on fastobo.
    - Provisional mypy type hints.
    - Type checking for most properties in __debug__ mode.
    - Proper repr implementation that should roundtrip most of the time.
    - Detection of file format and encoding based on buffer content.
  - Removed:
    - OBO and JSON serialization support (for now).
    - Term.rchildren and Term.rparents and stop making direction assumptions on relationships.
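The 1.2.0 entry above mentions a with_self parameter controlling the reflexivity of the subclass and superclass iterators. The plain-Python sketch below shows what that flag means on a toy hierarchy: a breadth-first walk that either includes or skips the starting term. It is only an illustration and not pronto's iterator class; the data and the `iter_subclasses` function are hypothetical.

```python
from collections import deque
from typing import Dict, Iterator, Set

# Hypothetical toy hierarchy: each term maps to its direct subclasses.
SUBCLASSES: Dict[str, Set[str]] = {
    "TST:001": {"TST:002", "TST:003"},
    "TST:002": {"TST:004"},
    "TST:003": set(),
    "TST:004": set(),
}


def iter_subclasses(term: str, with_self: bool = True) -> Iterator[str]:
    """Yield the transitive subclasses of `term`, breadth-first.

    With `with_self=True` the walk is reflexive: `term` itself comes first.
    """
    seen = {term}
    frontier = deque([term])
    if with_self:
        yield term
    while frontier:
        current = frontier.popleft()
        # Sort the children so the output order is deterministic.
        for child in sorted(SUBCLASSES.get(current, ())):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
                yield child


print(list(iter_subclasses("TST:001")))
# ['TST:001', 'TST:002', 'TST:003', 'TST:004']
print(list(iter_subclasses("TST:001", with_self=False)))
# ['TST:002', 'TST:003', 'TST:004']
```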
Files
- althonos/pronto-v2.2.3.zip (837.0 kB), md5:111c98705e5af3096f4c2aab758637be
Additional details
Related works
- Cites
  - Poster: 10.7490/f1000research.1117405.1 (DOI)
  - Working paper: http://owlcollab.github.io/oboformat/doc/obo-syntax.html (URL)
- Is cited by
  - Report: 10.5281/zenodo.3492231 (DOI)
- Is supplement to
  - Software: https://github.com/althonos/pronto/tree/v2.2.3 (URL)
- Is supplemented by
  - Software documentation: https://pronto.readthedocs.io/en/v2.2.3/ (URL)