
Published March 1, 2021 | Version v0.4.3
Software | Open Access

althonos/pyrodigal: 0.4.3

  • EMBL, @zellerlab

Description

🔥 Pyrodigal

Python interface to Prodigal, an ORF finder for genomes, progenomes and metagenomes.

๐Ÿ—บ๏ธ Overview

Pyrodigal is a Python module that provides bindings to Prodigal using Cython. It interacts directly with the Prodigal internals, which brings the following advantages:

  • single dependency: Pyrodigal is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the Prodigal binary being present on the end-user machine.
  • no intermediate files: everything happens in memory, in a Python object you fully control, so you don’t have to manually import and export sequences to pass to the Prodigal CLI.
  • no input formatting: sequences are manipulated directly as strings, which removes the need to format your input as FASTA for Prodigal.
  • lower memory usage: Pyrodigal is slightly more conservative when it comes to using memory, which can help process very large sequences.
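Because Pyrodigal works on plain sequence strings, FASTA data you already hold in memory only needs its headers split off before it can be handed to find_genes. The stdlib-only sketch below does that; parse_fasta is a hypothetical helper, not part of Pyrodigal, and it deliberately ignores edge cases such as ";" comment lines.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA text (hypothetical helper)."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""  # first word of the header
            chunks = []
        else:
            chunks.append(line.upper())  # sequence lines are concatenated
    if name is not None:
        yield name, "".join(chunks)


fasta_text = """>contig_1 example
ATGAAACGC
ATTAGCTAA
>contig_2
ATGGCCTGA
"""

# each value is a plain str, the input type find_genes works on
records = dict(parse_fasta(io.StringIO(fasta_text)))
```

With Pyrodigal installed, each value could then be passed straight to an instance, for example p.find_genes(records["contig_1"]), without writing any file to disk.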

📋 Features

The library now features everything needed to run Prodigal in single or metagenomic mode, but some features of the CLI are still missing:

  • โœ”๏ธ Metagenomic mode
  • โœ”๏ธ Single mode
  • โŒ External training file support (-t flag)
  • โŒ Region masking (-m flag)

๐Ÿ Memory

Unlike the Prodigal command line, Pyrodigal attempts to be conservative about memory usage. Most allocations are lazy, and some functions reallocate their results to exact-sized arrays when possible. As a result, Pyrodigal uses about 30% less memory, at the cost of some additional overhead.
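The two strategies described above, lazy allocation and shrinking results to their exact size, can be illustrated with a small stdlib sketch. NodeBuffer is a hypothetical class and not Pyrodigal's actual Cython/C code: it reserves nothing until first use, allocates an upper-bound capacity (for instance one estimated from the sequence length), and finally returns an exact-sized copy of the filled portion.

```python
from array import array
from typing import Optional


class NodeBuffer:
    """Hypothetical buffer: lazy allocation, then an exact-sized result."""

    def __init__(self) -> None:
        self._data: Optional[array] = None  # no memory reserved until first use
        self._count = 0

    def reserve(self, capacity: int) -> None:
        # allocate lazily, and only reallocate when the current array is too small
        if self._data is None or len(self._data) < capacity:
            self._data = array("l", [0]) * capacity
        self._count = 0  # reset the fill count so the buffer can be recycled

    def push(self, value: int) -> None:
        # refuse to write past the end instead of overflowing the buffer
        if self._data is None or self._count >= len(self._data):
            raise IndexError("capacity exceeded; the size estimate was too small")
        self._data[self._count] = value
        self._count += 1

    def to_exact(self) -> array:
        # copy only the filled prefix, so the result has no unused slots
        if self._data is None:
            return array("l")
        return self._data[: self._count]


buf = NodeBuffer()
buf.reserve(capacity=8)  # hypothetical upper-bound estimate
for position in (3, 17, 42):
    buf.push(position)
result = buf.to_exact()
```

The overhead mentioned in the text corresponds here to the extra copy made by to_exact: the oversized array can be freed, but the copy itself costs time.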

🧶 Thread-safety

pyrodigal.Pyrodigal instances are thread-safe: they use an internal lock to prevent parallel calls to their methods from overwriting the internal buffers. However, a better way to process sequences in parallel is a consumer/worker pattern with one Pyrodigal instance in each worker. Using a pool that spawns Pyrodigal instances on the fly is also fine, but it prevents memory from being recycled:

import multiprocessing.pool
from pyrodigal import Pyrodigal
with multiprocessing.pool.ThreadPool() as pool:
    results = pool.map(lambda s: Pyrodigal(meta=True).find_genes(s), sequences)  # sequences: list of str
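The consumer/worker pattern recommended above can be sketched with the standard library alone. To keep the sketch runnable without Pyrodigal, GeneFinder is a hypothetical stand-in whose find_genes simply returns the sequence length; it also mimics the internal lock described in the text. In real code each worker would instead create a pyrodigal.Pyrodigal(meta=True) once and reuse it for every sequence it consumes.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple


class GeneFinder:
    """Hypothetical stand-in for a Pyrodigal instance with an internal lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._buffer: List[str] = []  # internal state reused across calls

    def find_genes(self, sequence: str) -> int:
        with self._lock:  # serialize calls so the shared buffer is not clobbered
            self._buffer.clear()
            self._buffer.extend(sequence)
            return len(self._buffer)


Task = Optional[Tuple[str, str]]


def worker(tasks: "queue.Queue[Task]", results: Dict[str, int]) -> None:
    finder = GeneFinder()  # one instance per worker, recycled for every task
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        name, sequence = task
        results[name] = finder.find_genes(sequence)
        tasks.task_done()


sequences = {"seq1": "ATGAAATAA", "seq2": "ATGTGA", "seq3": "ATGCCCGGGTAG"}
tasks: "queue.Queue[Task]" = queue.Queue()
results: Dict[str, int] = {}
threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(2)]
for t in threads:
    t.start()
for item in sequences.items():
    tasks.put(item)
for _ in threads:
    tasks.put(None)  # one sentinel per worker
for t in threads:
    t.join()
```

Compared with the ThreadPool one-liner, each worker allocates its finder only once, so whatever internal buffers it holds can be reused across sequences instead of being rebuilt for every call.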

🔧 Installing

Pyrodigal can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 Unix and Windows platforms, as well as the code required to compile from source with Cython:

$ pip install --user pyrodigal

Otherwise, Pyrodigal is also available as a Bioconda package:

$ conda install -c bioconda pyrodigal

💡 Example

Using Biopython, load a sequence from a GenBank file, use Prodigal to find all genes it contains, and print the proteins in FASTA format:

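import textwrap
import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")
p = pyrodigal.Pyrodigal(meta=True)

# print each predicted gene's protein translation as a FASTA record
for i, gene in enumerate(p.find_genes(str(record.seq))):
    print(f">{record.id}_{i+1}")
    print(textwrap.fill(gene.translate()))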
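If you would rather collect the output than print it, the FASTA formatting step can be factored into a small helper. format_fasta is hypothetical and relies only on textwrap; the 60-column default is an assumption (textwrap.fill on its own wraps at 70 columns).

```python
import textwrap
from typing import Iterable


def format_fasta(prefix: str, proteins: Iterable[str], width: int = 60) -> str:
    """Return proteins as FASTA records named <prefix>_1, <prefix>_2, ... (hypothetical helper)."""
    lines = []
    for i, protein in enumerate(proteins, start=1):
        lines.append(f">{prefix}_{i}")
        # long residue strings have no spaces, so textwrap breaks them at `width`
        lines.append(textwrap.fill(protein, width=width))
    return "".join(line + "\n" for line in lines)


# with Pyrodigal, proteins would be (gene.translate() for gene in genes)
text = format_fasta("contig_1", ["MKR", "M" * 65])
```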

To use Pyrodigal in single mode, you must explicitly call Pyrodigal.train with the sequence you want to use for training before trying to find genes:

import pyrodigal

p = pyrodigal.Pyrodigal()  # single mode: meta defaults to False
p.train(str(record.seq))  # training must happen before find_genes
genes = p.find_genes(str(record.seq))

📜 License

This library, like the original Prodigal software, is provided under the GNU General Public License v3.0.

📒 Changelog

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

  • v0.4.3 - 2021-03-01
    • Fixed:
      • Buffer overflow when running in meta mode on a sequence too small to have any dynamic programming nodes.
  • v0.4.2 - 2021-02-07
    • Fixed:
      • Buffer overflow coming from the node array, caused by an incorrect estimation of the node count from the sequence length.
  • v0.4.1 - 2021-01-07
    • Removed:
      • Python 3.5 from the project metadata (the code was only compatible with Python 3.6+ already because of f-strings).
    • Fixed:

  • v0.4.0 - 2021-01-06
    • Added:
      • Option to change the translation table to any allowed number in Gene.translate
    • Changed:
      • trans_table keyword argument to Pyrodigal.train has been renamed to translation_table.
  • v0.3.2 - 2020-11-27
    • Fixed:
      • Broken compilation of PyPy wheels in Travis-CI.
  • v0.3.1 - 2020-11-27
    • Added:
      • Link to Zenodo record in README.md
      • Typing :: Typed classifier to the PyPI metadata.
      • Explicit support for Python 3.9.
    • Changed:
      • Streamlined compilation process when building from source distribution.
  • v0.3.0 - 2020-09-07
    • Added:
      • Thread-safety for all Pyrodigal methods.
    • Fixed:
      • Reduced total amount of memory used to allocate dynamic programming nodes for a given sequence.
  • v0.2.4 - 2020-09-04
    • Added:
      • Precompiled wheels for Windows x86-64 platform.
    • Changed:
      • Compilation of large Prodigal/training.c file is now done in chunks and uses static const to reduce build time.
  • v0.2.3 - 2020-08-09
    • Fixed:
      • Buffer overflow issue with Pyrodigal in closed=False mode.
  • v0.2.2 - 2020-07-14
    • Added:
      • Access to the translation table of a Gene object.
  • v0.2.1 - 2020-05-29
    • Fixed:
      • Memory issues causing PyPy to crash when using Pyrodigal in single mode.
  • v0.2.0 - 2020-05-28
    • Added:
      • Support for Prodigal’s single mode.
  • v0.1.1 - 2020-04-30
    • Added:
      • Distribution of CPython wheels for ManyLinux2010 and OSX platforms.
  • v0.1.0 - 2020-04-27
    • Initial release.

 

Files

althonos/pyrodigal-v0.4.3.zip (151.9 kB, md5:96accb1e827708e8443b19c8ed734c94)

Additional details

Related works

Cites
Journal article: 10.1186/1471-2105-11-119 (DOI)
Is supplement to
Software: https://github.com/althonos/pyrodigal/tree/v0.4.3 (URL)