Published August 9, 2025 | Version v0.2
Software Open

bpucker/bHLH_annotator: bHLH_annotator

  • 1. Data Science in Biomedicine, TU Braunschweig
  • 2. Molecular Plant Biology, Institute of Cellular and Molecular Botany (IZMB), University of Bonn

Description

Reference:

Thoben C. and Pucker B. (2023). Automatic annotation of the bHLH gene family in plants. BMC Genomics 24, 780. doi: https://doi.org/10.1186/s12864-023-09877-2

Motivation

The bHLH_annotator allows the automatic identification and functional annotation of the bHLH transcription factor family in novel plant sequence data sets. Coding sequences or peptide sequences derived from a de novo genome or transcriptome assembly can be analyzed with this pipeline. Candidates are annotated with a phylogenetic approach based on a bait collection of bHLHs and outgroup sequences (non-bHLHs with a high sequence similarity to bHLHs).

Workflow

For the identification of initial bHLH candidates (step 1), two search options are available. The BLAST option (default) identifies candidates based on sequence similarity to the bait collection; it is recommended if bHLHs that have lost the domain should also be identified. The HMMER option selects candidates that harbour the HMM motif of the bait collection. This can include highly specialised candidates that are not represented in the bait collection. The initial candidates are then filtered based on their phylogenetic relationship to the bHLH and outgroup baits (step 2). The functional annotation of the candidates is assigned by identifying orthologous reference sequences (step 4); by default, annotated A. thaliana bHLHs serve as references. Further, bHLH-specific characteristics are analyzed: presence of the bHLH domain (step 5), DNA-binding properties (step 5), and subfamily-specific motifs (step 6). A phylogenetic tree is constructed with A. thaliana bHLHs to allow a detailed investigation against the background of a well-studied species (step 7). For large datasets such as de novo transcriptome assemblies, the collapse option is recommended (steps 8 and 9); it collapses paralogous groups by defining a representative candidate for each group. The parallel option is also recommended to reduce the pipeline runtime and the memory consumption during classification. The data files used in each step can be customised by the user to suit their own research purpose.
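The filtering idea behind step 2 can be illustrated with a minimal Python sketch. It is not the pipeline's implementation: the function name `classify_candidates`, the bait identifiers, and the toy distance table are hypothetical, and the table only stands in for distances that the pipeline would derive from a phylogenetic tree. A candidate is kept when its closest bait is a bHLH bait rather than an outgroup bait.

```python
from typing import Dict, List, Set, Tuple


def nearest_distance(row: Dict[str, float], baits: Set[str]) -> float:
    """Return the smallest distance to any bait in `baits` (inf if none)."""
    values = [dist for bait, dist in row.items() if bait in baits]
    return min(values) if values else float("inf")


def classify_candidates(
    distances: Dict[str, Dict[str, float]],
    ingroup_baits: Set[str],
    outgroup_baits: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split candidates into (kept, rejected) by their nearest bait.

    A candidate is kept only if it is strictly closer to an ingroup bait
    than to any outgroup bait; ties and candidates without any bait
    distance are rejected (a conservative choice made for this sketch).
    """
    kept: List[str] = []
    rejected: List[str] = []
    for candidate in sorted(distances):
        row = distances[candidate]
        d_in = nearest_distance(row, ingroup_baits)
        d_out = nearest_distance(row, outgroup_baits)
        if d_in < d_out:
            kept.append(candidate)
        else:
            rejected.append(candidate)
    return kept, rejected


# Hypothetical toy data: candidate -> {bait: distance}
toy_distances = {
    "cand1": {"bHLH_bait_1": 0.2, "outgroup_bait_1": 0.9},
    "cand2": {"bHLH_bait_1": 0.8, "outgroup_bait_1": 0.3},
    "cand3": {},
}
kept, rejected = classify_candidates(
    toy_distances,
    ingroup_baits={"bHLH_bait_1"},
    outgroup_baits={"outgroup_bait_1"},
)
print("kept:", kept)
print("rejected:", rejected)
```

In this toy example only `cand1` is retained; `cand2` is closer to the outgroup bait and `cand3` has no bait distances at all.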

Usage

The bHLH_annotator is also available on the BioInfToolServer. A full description of the setup and usage of the pipeline is available at https://github.com/bpucker/bHLH_annotator. A more detailed description of the pipeline and the bait collection can be found here: https://doi.org/10.1186/s12864-023-09877-2

Files

bpucker/bHLH_annotator-v0.2.zip

Files (3.9 MB)

md5:f70111a70d45483c55b6baf1d31f1b80
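The MD5 checksum listed above can be used to confirm that the downloaded archive is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `archive_is_intact` are illustrative, and the local file name under which the archive is saved is an assumption.

```python
import hashlib
from pathlib import Path

# Checksum as listed for the archive in this record.
EXPECTED_MD5 = "f70111a70d45483c55b6baf1d31f1b80"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved as `bHLH_annotator-v0.2.zip` in the working directory, `archive_is_intact("bHLH_annotator-v0.2.zip")` should return `True` for an undamaged download.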
