althonos/pyrodigal: 0.4.3
Description
Pyrodigal
Python interface to Prodigal, an ORF finder for genomes, progenomes and metagenomes.
Overview
Pyrodigal is a Python module that provides bindings to Prodigal using Cython. It directly interacts with the Prodigal internals, which has the following advantages:
- single dependency: Pyrodigal is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the Prodigal binary being present on the end-user machine.
- no intermediate files: everything happens in memory, in a Python object you fully control, so you don't have to manually import and export sequences to pass to the Prodigal CLI.
- no input formatting: sequences are manipulated directly as strings, which removes the need to format your input as FASTA for Prodigal.
- lower memory usage: Pyrodigal is slightly more conservative when it comes to using memory, which can help process very large sequences.
Features
The library now features everything needed to run Prodigal in single or metagenomic mode. It is still missing some features of the CLI:
- ✔️ Metagenomic mode
- ✔️ Single mode
- ❌ External training file support (-t flag)
- ❌ Region masking (-m flag)
Memory
Unlike the Prodigal command line, Pyrodigal tries to be conservative about memory usage. Most allocations are lazy, and some functions reallocate their results to exact-sized arrays when possible. As a result, Pyrodigal uses about 30% less memory, at the cost of some extra overhead.
Thread-safety
pyrodigal.Pyrodigal instances are thread-safe, and use an internal lock to prevent parallel calls to their methods from overwriting the internal buffers. However, a better way to process sequences in parallel is to use a consumer/worker pattern, with one Pyrodigal instance in each worker. Using a pool that spawns Pyrodigal instances on the fly is also fine, but prevents recycling memory:
import multiprocessing.pool
from pyrodigal import Pyrodigal

with multiprocessing.pool.ThreadPool() as pool:
    pool.map(lambda s: Pyrodigal(meta=True).find_genes(s), sequences)
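The consumer/worker pattern with one instance per worker could look roughly like the sketch below. It is not taken from Pyrodigal itself: `find_genes_parallel`, `make_finder` and `StubFinder` are hypothetical names, and the stub only stands in for a real finder so that the example runs without Pyrodigal installed. The only assumption about the finder is that it exposes a `find_genes(sequence)` method.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # holds one finder per worker thread


def _init_worker(make_finder):
    # Called once when each worker thread starts.
    _local.finder = make_finder()


def _find(sequence):
    # Reuse the finder owned by the current thread.
    return _local.finder.find_genes(sequence)


def find_genes_parallel(sequences, make_finder, workers=4):
    """Process sequences in a thread pool, one finder per worker thread."""
    with ThreadPoolExecutor(
        max_workers=workers, initializer=_init_worker, initargs=(make_finder,)
    ) as pool:
        return list(pool.map(_find, sequences))


class StubFinder:
    """Hypothetical stand-in so the sketch runs without Pyrodigal installed."""

    created = 0
    _lock = threading.Lock()

    def __init__(self):
        with StubFinder._lock:
            StubFinder.created += 1

    def find_genes(self, sequence):
        return len(sequence)  # placeholder result, not real gene prediction


results = find_genes_parallel(["ATG", "ATGAAA", ""], StubFinder, workers=2)
print(results, "finders created:", StubFinder.created)
```

With Pyrodigal installed, `make_finder` would presumably be something like `lambda: pyrodigal.Pyrodigal(meta=True)`, so that each worker thread keeps and reuses its own instance instead of creating one per sequence.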
Installing
Pyrodigal can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 Unix and Windows platforms, as well as the code required to compile from source with Cython:
$ pip install --user pyrodigal
Otherwise, Pyrodigal is also available as a Bioconda package:
$ conda install -c bioconda pyrodigal
Example
Using Biopython, load a sequence from a GenBank file, use Prodigal to find all genes it contains, and print the proteins in FASTA format:
import textwrap
import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")
p = pyrodigal.Pyrodigal(meta=True)
for i, gene in enumerate(p.find_genes(str(record.seq))):
    # translate each predicted gene, not the whole record
    print(f">{record.id}_{i+1}")
    print(textwrap.fill(gene.translate()))
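The FASTA formatting done in the loop above can be factored into a small helper. This is only a sketch: `format_fasta` is a hypothetical name, not part of Pyrodigal, and it assumes each protein is already available as a plain string (for instance the value returned by `Gene.translate`). The record ID and sequences in the usage line are made up.

```python
import textwrap


def format_fasta(record_id, proteins, width=70):
    """Return FASTA text with one numbered entry per protein string."""
    lines = []
    for i, protein in enumerate(proteins, start=1):
        lines.append(f">{record_id}_{i}")  # header: record ID plus gene index
        lines.append(textwrap.fill(protein, width=width))  # wrap the sequence
    return "\n".join(lines)


# Made-up ID and sequences, for illustration only.
print(format_fasta("example_record", ["MKT" * 30, "MSL"]))
```

Returning a string rather than printing directly makes it easy to write the result to a file or to check it in a test.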
To use Pyrodigal in single mode, you must explicitly call Pyrodigal.train with the sequence you want to use for training before trying to find genes:
p = pyrodigal.Pyrodigal()
p.train(str(record.seq))
genes = p.find_genes(str(record.seq))
License
This library, like the original Prodigal software, is provided under the GNU General Public License v3.0.
Changelog
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- v0.4.3 - 2021-03-01
  - Fixed:
    - Buffer overflow when running in meta mode on a sequence too small to have any dynamic programming nodes.
- v0.4.2 - 2021-02-07
  - Fixed:
    - Buffer overflow coming from the node array, caused by an incorrect estimation of the node count from the sequence length.
- v0.4.1 - 2021-01-07
  - Removed:
    - Python 3.5 from the project metadata (the code was only compatible with Python 3.6+ already because of f-strings).
  - Fixed:
    - Broken linking of static libprodigal against the _pyrodigal extension on some OSX environments (bioconda/bioconda-recipes#25568).
- v0.4.0 - 2021-01-06
  - Added:
    - Option to change the translation table to any allowed number in Gene.translate.
  - Changed:
    - trans_table keyword argument to Pyrodigal.train has been renamed to translation_table.
- v0.3.2 - 2020-11-27
  - Fixed:
    - Broken compilation of PyPy wheels in Travis-CI.
- v0.3.1 - 2020-11-27
  - Added:
    - Link to Zenodo record in README.md.
    - Typing :: Typed classifier to the PyPI metadata.
    - Explicit support for Python 3.9.
  - Changed:
    - Streamlined compilation process when building from source distribution.
- v0.3.0 - 2020-09-07
  - Added:
    - Thread-safety for all Pyrodigal methods.
  - Fixed:
    - Reduced total amount of memory used to allocate dynamic programming nodes for a given sequence.
- v0.2.4 - 2020-09-04
  - Added:
    - Precompiled wheels for Windows x86-64 platform.
  - Changed:
    - Compilation of large Prodigal/training.c file is now done in chunks and uses static const to reduce build time.
- v0.2.3 - 2020-08-09
  - Fixed:
    - Buffer overflow issue with Pyrodigal in closed=False mode.
- v0.2.2 - 2020-07-14
  - Added:
    - Access to the translation table of a Gene object.
- v0.2.1 - 2020-05-29
  - Fixed:
    - Memory issues causing PyPy to crash when using Pyrodigal in single mode.
- v0.2.0 - 2020-05-28
  - Added:
    - Support for Prodigal's single mode.
- v0.1.1 - 2020-04-30
  - Added:
    - Distribution of CPython wheels for ManyLinux2010 and OSX platforms.
- v0.1.0 - 2020-04-27
  - Initial release.
Files
Name | Size | Checksum |
---|---|---|
althonos/pyrodigal-v0.4.3.zip | 151.9 kB | md5:96accb1e827708e8443b19c8ed734c94 |
Additional details
Related works
- Cites
- Journal article: 10.1186/1471-2105-11-119 (DOI)
- Is supplement to
- Software: https://github.com/althonos/pyrodigal/tree/v0.4.3 (URL)