cblaster: a Python package for identifying clustered sequence homologs in NCBI BLAST databases
Description
Identifying clusters of co-located, homologous genes is a commonplace procedure in comparative genomics, for example when looking for gene clusters encoding the production of secondary metabolites. cblaster is a Python package that facilitates the identification of such gene clusters across publicly available Basic Local Alignment Search Tool (BLAST) databases hosted by the National Center for Biotechnology Information (NCBI). Given either a FASTA file of query sequences or a collection of valid NCBI sequence identifiers, cblaster can perform both local searches (using the DIAMOND protein aligner) and remote searches (using the NCBI's public BLAST API), and it retrieves and parses the results. It leverages the NCBI's Identical Protein Groups (IPG) resource to retrieve the genomic context of each BLAST hit, then groups hits that are co-located on the same genomic scaffold according to user-defined thresholds for intergenic distance and the number of conserved sequences. cblaster then produces a human-readable report of its results. cblaster provides a simple command line interface with sensible default options, and also exposes several public methods and classes that can be used directly in Python code. It is installable from PyPI via pip (https://pypi.org/project/cblaster), and its source code is freely available on GitHub (https://www.github.com/gamcil/cblaster) under the MIT license.
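To make the grouping step concrete, here is a minimal sketch of how hits might be clustered once their genomic coordinates are known. It assumes each hit has already been mapped to a scaffold and a start/end position (the role IPG plays above). The `Hit` class, the `find_clusters` function, and the `max_gap` and `min_unique` parameters are hypothetical illustrations, not cblaster's actual API or defaults.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit placed on a genomic scaffold."""
    query: str     # query sequence this subject protein matched
    scaffold: str  # accession of the scaffold the protein is encoded on
    start: int     # start coordinate on the scaffold
    end: int       # end coordinate on the scaffold


def find_clusters(hits: List[Hit], max_gap: int = 20000, min_unique: int = 3) -> List[List[Hit]]:
    """Group hits that are co-located on the same scaffold.

    Hits are chained while the gap to the current cluster's end is at most
    max_gap; a cluster is kept only if it contains hits to at least
    min_unique distinct query sequences. Values here are illustrative only.
    """
    # Bucket hits by scaffold, since clusters cannot span scaffolds.
    by_scaffold: Dict[str, List[Hit]] = {}
    for hit in hits:
        by_scaffold.setdefault(hit.scaffold, []).append(hit)

    def keep(cluster: List[Hit]) -> bool:
        # Count distinct queries, not raw hits (paralogs count once).
        return len({h.query for h in cluster}) >= min_unique

    clusters: List[List[Hit]] = []
    for _scaffold, group in sorted(by_scaffold.items()):
        group.sort(key=lambda h: h.start)
        current = [group[0]]
        current_end = group[0].end
        for hit in group[1:]:
            if hit.start - current_end <= max_gap:
                # Close enough to the previous hit: extend the cluster.
                current.append(hit)
                current_end = max(current_end, hit.end)
            else:
                # Gap too large: close the current cluster, start a new one.
                if keep(current):
                    clusters.append(current)
                current = [hit]
                current_end = hit.end
        if keep(current):
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    example = [
        Hit("A", "s1", 0, 1000),
        Hit("B", "s1", 1500, 2500),
        Hit("C", "s1", 3000, 4000),
        Hit("A", "s1", 100000, 101000),
        Hit("A", "s2", 0, 500),
    ]
    for cluster in find_clusters(example, max_gap=5000, min_unique=3):
        print(cluster[0].scaffold, [h.query for h in cluster])
```

With `max_gap=5000`, the distant fourth hit on `s1` and the lone hit on `s2` fall below the distinct-query threshold and are discarded, so only the A/B/C block on `s1` is reported. Counting distinct queries rather than raw hits keeps a run of paralogous hits to a single query from qualifying as a cluster on its own.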
Files
Total size: 25.0 kB (md5:8bd052e315ff7829698dce6608b09111)