althonos/pronto: v2.3.1
pronto
A Python frontend to ontologies.
Overview
Pronto is a Python library to parse, browse, create, and export ontologies, supporting several ontology languages and formats. It implements the specifications of the Open Biomedical Ontologies 1.4 in the form of a safe high-level interface. If you’re only interested in parsing OBO or OBO Graphs documents, you may wish to consider fastobo instead.
Supported Languages
- Open Biomedical Ontologies 1.4. Because this format is fairly new, not all OBO ontologies can be parsed at the moment. See the OBO Foundry roadmap listing the compliant ontologies, and don’t hesitate to contact their developers to push adoption forward.
- OBO Graphs in JSON format. The format is not yet stabilized, so the results may change from file to file.
- Ontology Web Language 2 in RDF/XML format. OWL2 ontologies are reverse translated to OBO using the mapping defined in the OBO 1.4 Semantics.
Installing
Installing with pip is the easiest:
# pip install pronto # if you have the admin rights
$ pip install pronto --user # install it in a user-site directory
There is also a conda recipe in the bioconda channel:
$ conda install -c bioconda pronto
Finally, a development version can be installed from GitHub using setuptools, provided you have the right dependencies installed already:
$ git clone https://github.com/althonos/pronto
$ cd pronto
# python setup.py install
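After installing, one quick way to confirm which release is visible in the current environment is to query the package metadata. The sketch below relies only on the standard library (Python 3.8 or later); the helper name `installed_version` is an illustrative choice and not part of pronto.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "pronto") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version is not None else "pronto is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether an install step is still needed.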
Examples
If you’re only reading ontologies, you’ll only use the Ontology class, which is the main entry point.
>>> from pronto import Ontology
It can be instantiated from a path to an ontology in one of the supported formats, even if the file is compressed:
>>> go = Ontology("tests/data/go.obo.gz")
Loading a file from a persistent URL is also supported, although you may also want to use the Ontology.from_obo_library method if you’re using persistent URLs a lot:
>>> cl = Ontology("http://purl.obolibrary.org/obo/cl.obo")
>>> stato = Ontology.from_obo_library("stato.owl")
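When a script loads many OBO Library files by URL, building the persistent URL in one place avoids typos. The sketch below assumes that every file lives under the same base URL as the `cl.obo` example above; `OBO_PURL_BASE` and `obo_purl` are hypothetical names introduced here, not part of pronto.

```python
# Base URL copied from the cl.obo example; assumed to hold for other files too.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def obo_purl(filename: str) -> str:
    """Build a persistent URL for an OBO Library file such as "cl.obo"."""
    if not filename or "/" in filename:
        raise ValueError(f"expected a bare file name, got {filename!r}")
    return OBO_PURL_BASE + filename


if __name__ == "__main__":
    print(obo_purl("cl.obo"))
```

The resulting string can be passed to the `Ontology` constructor in the same way as the literal URL shown above.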
Get a term by accession
Ontology objects can be used as mappings to access any entity they contain from their identifier in compact form:
>>> cl['CL:0002116']
Term('CL:0002116', name='B220-low CD38-positive unswitched memory B cell')
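Because the lookup uses mapping syntax, a small wrapper can turn a missing identifier into `None` rather than an exception. This is a minimal sketch that assumes an unknown identifier raises `KeyError`, as mappings conventionally do; `find_entity` is a hypothetical helper, and a plain `dict` stands in for a loaded ontology so the example runs without downloading anything.

```python
from typing import Any, Mapping, Optional


def find_entity(ontology: Mapping[str, Any], accession: str) -> Optional[Any]:
    """Return the entity stored under `accession`, or None if it is unknown."""
    try:
        return ontology[accession]
    except KeyError:
        return None


if __name__ == "__main__":
    # Stand-in for a loaded ontology: a plain dict keyed by compact identifiers.
    demo = {"CL:0002116": "B220-low CD38-positive unswitched memory B cell"}
    print(find_entity(demo, "CL:0002116"))
    print(find_entity(demo, "CL:9999999"))
```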
Get a term by alternate ID
An entity can also be retrieved by any alternate ID it was declared with, using the same syntax:
>>> hp = Ontology.from_obo_library("hp.obo")
>>> 'HP:0001198' in hp['HP:0009882'].alternate_ids
True
>>> hp['HP:0001198']
Term('HP:0009882', name='Short distal phalanx of finger')
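The behaviour above can be pictured as an index that maps every alternate ID back to its primary ID. The sketch below is a conceptual illustration in plain Python, not pronto's actual implementation; `build_alt_index` is a hypothetical name, and the single record reuses the identifiers from the example above.

```python
from typing import Dict, Iterable, Tuple


def build_alt_index(records: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, str]:
    """Map every primary and alternate ID to the primary ID that owns it."""
    index: Dict[str, str] = {}
    for primary, alternates in records:
        index[primary] = primary
        for alt in alternates:
            # Keep the first owner if an alternate ID is declared twice.
            index.setdefault(alt, primary)
    return index


if __name__ == "__main__":
    index = build_alt_index([("HP:0009882", ["HP:0001198"])])
    print(index["HP:0001198"])  # resolves to the primary ID
```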
Create a new term from scratch
We can load an ontology, and edit it locally. Here, we add a new protein class to the Protein Ontology.
>>> pr = Ontology.from_obo_library("pr.obo")
>>> brh = pr.create_term("PR:XXXXXXXX")
>>> brh.name = "Bacteriorhodopsin"
>>> brh.superclasses().add(pr["PR:000001094"]) # is a rhodopsin-like G-protein
>>> brh.disjoint_from.add(pr["PR:000036194"]) # disjoint from eukaryotic proteins
Convert an OWL ontology to OBO format
The Ontology.dump method can be used to serialize an ontology to any of the supported formats (currently OBO and OBO JSON):
>>> edam = Ontology("http://edamontology.org/EDAM.owl")
>>> with open("edam.obo", "wb") as f:
...     edam.dump(f, format="obo")
Find ontology terms without subclasses
The terms method of Ontology instances can be used to iterate over all the terms in the ontology (including the ones that are imported). We can then use the is_leaf method of Term objects to check whether the term is a leaf in the class inclusion graph.
>>> ms = Ontology("ms.obo")
>>> for term in ms.terms():
...     if term.is_leaf():
...         print(term.id)
MS:0000000
MS:1000001
...
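To collect the leaf identifiers instead of printing them, the same loop can be wrapped in a function. This is a minimal sketch that only relies on the `id` attribute and `is_leaf()` method shown above; `leaf_ids` and the `DemoTerm` stand-in class are hypothetical names used so the example runs without an ontology file.

```python
from typing import Any, Iterable, List


def leaf_ids(terms: Iterable[Any]) -> List[str]:
    """Return the sorted IDs of all terms that report themselves as leaves."""
    return sorted(term.id for term in terms if term.is_leaf())


class DemoTerm:
    """Tiny stand-in exposing the same `id` / `is_leaf()` surface as a term."""

    def __init__(self, id: str, children: Iterable[str] = ()) -> None:
        self.id = id
        self.children = list(children)

    def is_leaf(self) -> bool:
        # A term is a leaf when nothing is declared as its subclass.
        return not self.children


if __name__ == "__main__":
    terms = [DemoTerm("X:1", ["X:2", "X:3"]), DemoTerm("X:3"), DemoTerm("X:2")]
    print(leaf_ids(terms))
```

With a loaded ontology, the iterable returned by `ms.terms()` could be passed in place of the stand-in list.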
API Reference
A complete API reference can be found in the online documentation, or directly from the command line using pydoc:
$ pydoc pronto.Ontology
License
This library is provided under the open-source MIT license. Please cite this library if you are using it in a scientific context using the following DOI: 10.5281/zenodo.595572
Changelog
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- 2.3.1 - 2020-09-21
  - Fixed:
    - pronto.entity package not being included in source distribution.
- 2.3.0 - 2020-09-21
  - Added:
    - Retrieval of entities via their alternate IDs on the source Ontology.
    - Direct edition of entity relationships via the Relationships view.
    - __all__ attribute to all modules of the data model.
    - RelationshipSet container like TermSet with shortcut attributes and proxying of actual Relationship instances.
    - Relationship.subproperties and Relationship.superproperties methods to add, remove, clear and iterate over the subproperties and superproperties of a Relationship instance.
    - Ontology.synonym_types method to count (via SizedIterator) and iterate over the synonym types of an ontology and all of its imports.
    - Ontology.get_synonym_type method to retrieve a single synonym type by ID from an ontology or one of its imports.
  - Changed:
    - Management of sub-properties / super-properties is now consistent with the management of subclasses / superclasses.
    - consider, disjoint_from, disjoint_over, equivalent_to, replaced_by, transitive_over and union_of properties of Relationship now return a RelationshipSet.
  - Fixed:
    - Outdated documentation in Term.subclasses describing the performances of the previous algorithm.
    - Possible AttributeError with the setter of the Entity.synonyms property.
    - Issue with synonym types declared in imported ontologies not being usable with synonyms of the actual ontology.
    - Various type annotations not updated since version 2.2.2.
- 2.2.4 - 2020-09-11
  - Changed:
    - Make Entity.annotations return a mutable set and add a setter.
  - Fixed:
    - Term.relationship erroneously updating the Ontology._lineage cache.
    - Unneeded code in pronto.serializers._fastobo handling is_a clauses.
- 2.2.3 - 2020-07-31
  - Changed:
    - Replaced frozendict with immutabledict (#90).
    - Bumped fastobo dependency to v0.9.0 to support inline comments.
    - Parsers will now process their imports in parallel using a thread pool.
  - Fixed:
    - Argument type checking in view layer is now disabled during the parsing phase to reduce overhead.
- 2.2.2 - 2020-07-18
  - Added:
    - Extraction of basic relationships from RDF/XML documents.
  - Fixed:
    - Erroneous type annotations on Term.subclasses and Term.superclasses.
    - Bug with Term.equivalent_to setter crashing with a NameError.
    - Bug with Entity.synonyms setter not extracting synonym data.
- 2.2.1 - 2020-06-17
  - Fixed:
    - Extraction of subclasses/superclasses hierarchy from nested imports.
    - Serialization of OBO frames not being done in order.
    - Parsing issue with anti_symmetric clauses in OBO typedefs.
    - Xrefs not being extracted when declared as axioms in RDF/XML documents.
    - ResourceWarning when creating Ontology from file-handles not mapping to a filesystem location.
- 2.2.0 - 2020-06-17
  - Added:
    - threads parameter to Ontology constructor to control the number of threads used by parsers supporting multithreading (OBO and OBO JSON at the moment).
    - Deprecation warnings for suspected uses of the is_a pseudo-relationship since subclasses/superclasses is now to be handled by the owner Ontology.
    - Support for subclass/superclass edition directly from the objects returned by Term.subclasses() and Term.superclasses(). (#84)
  - Changed:
    - Updated fastobo to v0.8, which reduces the memory footprint of identifiers and improves the parser speed.
    - Improved OBO parser performance using threading plus zero-copy validation of identifiers on Xref instantiation.
    - Improved performance in debug mode by having the typechecker only extract the wrapped function signature once.
  - Fixed:
    - OBO parser crashing on files containing idspace clauses in their headers.
    - Reference management issue with binary operations of TermSet.
  - Removed:
    - nanoset dependency, which was not useful anymore in Python 3.8 and caused issues with multithreading when processing OBO frames in parallel.
- 2.1.0 - 2020-03-23
  - Added:
  - Changed:
    - Synonym.xrefs now returns a mutable set that can be used to add Xref to the synonym directly.
  - Fixed:
- 2.0.1 - 2020-02-19
  - Fixed:
    - Internal handling of ontology data forcing an Ontology to outlive all of the Terms created from it.
    - Term.id property missing a return type annotation.
    - Term.equivalent_to not returning a TermSet but a set of strings.
  - Changed:
    - Refactored implementation of SubclassesIterator and SuperclassesIterator to make both use the internal subclassing cache.
    - Make Term.is_leaf use internal subclassing cache to make it run in constant time.
- 2.0.0 - 2020-02-14
  - Added:
    - TermSet.subclasses and TermSet.superclasses methods to query all the subclasses / superclasses of all Term.
    - TermSet class to the top-level pronto module.
    - Dynamic management of subclassing cache for the Ontology class.
    - Setters for Term.consider, Term.union_of and Term.intersection_of.
  - Removed:
    - cache keyword argument for the Ontology.
  - Fixed:
    - SuperclassesIterator.to_set being named to_self because of a typo.
    - Several bugs affecting the fastobo-backed serializer.
- 1.2.0 - 2020-02-10
  - Added:
    - Parameter with_self to disable reflexivity of Term.subclasses and Term.superclasses iterators.
    - TermSet class which stores a set of terms efficiently while providing some useful shortcuts to access the underlying data.
  - Changed:
    - Moved code of Term.subclasses and Term.superclasses to a dedicated iterator class in the pronto.logic submodule.
    - Dropped contexter requirement.
  - Fixed:
    - Fix a typo in Synonym.type setter leading to a potential bug when the given type is None.
    - Fix miscellaneous bugs found with mypy.
    - fastobo serializer crashing on namespace clauses because of a type issue.
    - fastobo parsers using data version clauses as format version clauses.
- 1.1.5 - 2020-01-25
  - Changed:
    - Bumped fastobo to v0.7.0, switching parser implementation to use multi-threading in order to speed up the parser process.
- 1.1.4 - 2020-01-21
  - Added:
    - Explicit support for Python 3.8.
    - Support for Windows-style line endings (#53)
- 1.1.3 - 2019-11-10
  - Fixed:
    - Handling of some clauses in FastoboParser.
    - OboSerializer occasionally missing lines between term and typedef frames.
  - Added:
    - Missing docstrings to some Entity properties.
- 1.1.2 - 2019-10-30
  - Fixed:
    - RdfXMLParser crashing on entities with rdf:label elements without literal content.
- 1.1.1 - 2019-10-29
  - Fixed:
    - pronto.serializers module not being embedded in Wheel distribution.
- 1.1.0 - 2019-10-24
  - Added:
    - Entity.add_synonym method to create a new synonym and add it to an entity.
    - @roundrepr now adds a minimal docstring to the generated __repr__ method.
    - Ontology caches subclassing relationships to greatly improve performance of Term.subclasses.
  - Changed:
    - Entity subclasses now store their id directly to improve performance.
    - Term.subclasses and Term.superclasses use collections.deque instead of queue.Queue as a LIFO structure since thread-safety is not needed.
    - chardet result is now used even when prediction confidence is under 100% to detect encoding of the handle passed to Ontology.
  - Fixed:
    - SynonymType comparison implementation.
    - Synonym.type getter crashing on type not being None.
    - RdfXMLParser crashing on synonymtypedefs without scope specifiers.
- 1.0.0 - 2019-10-11
  - Fixed:
    - Issues with typedef serialization in FastoboSerializer.
    - Ontology.create_term and Ontology.create_relationship not raising ValueError when given an identifier already in the knowledge graph.
    - Signature of BaseSerializer.dump to remove encoding argument.
    - Missing __slots__ in Entity in non-typechecking runtime.
  - Changed:
    - Bumped fastobo requirement to v0.6.0.
- 1.0.0-alpha.3 - 2019-10-10
  - Added:
    - Extraction of oboInOwl:consider annotation in RdfXMLParser.
    - Extraction of oboInOwl:savedBy annotation in RdfXMLParser.
    - Extraction of subsetdef and synonymtypedef as annotation properties in RdfXMLParser.
    - Support for doap:Version instead of owl:VersionIri for specification of ontology data version.
    - Proper comparison of PropertyValue classes, based on the lexicographic order of their serialization.
    - Ontology.dump and Ontology.dumps methods to serialize an ontology in obo or obojson format.
  - Fixed:
    - Metadata not storing optional description of ID spaces if any.
    - Wrong type hints in RelationshipData.equivalent_to_chain.
  - Changed:
    - Added type checking to some more property setters.
    - Avoid using networkx in Term.subclasses.
    - fastobo-derived parsers will not create a new entity if one exists in the graph of dependencies already.
    - Exposed pronto.warnings and the complete warnings hierarchy.
- 1.0.0-alpha.2 - 2019-10-03
  - Added:
    - Support for extraction of relationships from OWL/XML files to OwlXMLParser.
  - Fixed:
    - Type hints of RelationshipData.synonyms attribute.
- 1.0.0-alpha.1 - 2019-10-02
  - Changed:
    - Dropped support for Python earlier than 3.6.
    - Brand new data model that follows the OBO 1.4 object model.
    - Partial OWL XML parser implementation using the OBO 1.4 semantics.
    - New OBO parser implementation based on fastobo.
    - Imports are properly separated from the top-level ontology.
    - Ontology.__getitem__ can also access entities from imports.
    - Term, Relationship, Xref, SynonymType compare only based on their ID.
    - Subset, Definition compare only based on their textual value.
  - Added:
    - Support for OBO JSON parser based on fastobo.
    - Provisional mypy type hints.
    - Type checking for most properties in __debug__ mode.
    - Proper repr implementation that should roundtrip most of the time.
    - Detection of file format and encoding based on buffer content.
  - Removed:
    - OBO and JSON serialization support (for now).
    - Term.rchildren and Term.rparents and stop making direction assumptions on relationships.
Files (877.9 kB)
Name | Size |
---|---|
althonos/pronto-v2.3.1.zip (md5:e1eae2559ece69dc2feed4ab23a80211) | 877.9 kB |