SigmoID overview

    The SigmoID front end is a GUI application written in Xojo, which gives it the expected look and usability on all three supported platforms (OS X, Linux and Windows). The GUI provides an interface to selected programmes from the HMMER, MEME Suite and TransTerm HP packages, which perform the actual searches. Sequence logos are calculated by WebLogo, which is written in Python; hence Python (version 2.7.x) and BioPython (version 1.64 and above) are required. Processing of nhmmer, mast and TransTerm HP output, sequence format conversions and adding regulatory sites to genome annotation are implemented as separate Python scripts. The scripts are called from the GUI, but can easily be used separately and integrated into an annotation pipeline if desired. Detailed installation instructions are provided within the distributions for each platform. The source code for the whole SigmoID application is available under the GPL 2.0 license.

    SigmoID allows you to:
    - get binding site data from specialised databases;
    - visualise binding site alignments with sequence logos;
    - extend, shorten and mask alignments;
    - create optimised hmm profiles from alignments;
    - search bacterial genomes with calibrated (and uncalibrated) hmm profiles;
    - add annotation of promoters and transcription factor binding sites to GenBank-formatted genome files;
    - edit genome annotation.

This version includes 80 calibrated profiles (for 5 sigma factors and 36 TFs) optimised for the enterobacterial phytopathogens Pectobacterium spp. and Dickeya dadantii. The efficiency of these profiles will be lower for other bacteria, but with threshold adjustment they may be usable for many enterobacteria.

The search for binding sites is done by nhmmer, which is expected to be installed in the default location.

Adding annotations to GenBank files is done by the HmmGen.py script, which can also be used on its own. BioPython (version 1.64 and up) is required.

At the moment SigmoID is known to work on OS X (10.8-10.11), Windows (Vista, 7 and 8) and Ubuntu (12.04 and 14.04). It may also work on other Linux distributions, provided the required libraries are installed.

Installation

    SigmoID relies heavily on Python and will have severely limited functionality without it. The various Python scripts included with SigmoID require Python version 2.7 and will not work with version 3. Linux and OS X systems usually have Python preinstalled, while Windows users can download it from python.org. Please note that on Windows you need to modify the system PATH environment variable to include the path to Python. The easiest way to do this is to select the "Add python.exe to path" option in the installer (it's not selected by default!). You can check that Python is installed correctly by typing 'python' at the command prompt, which should launch the Python interpreter.
    Biopython (v. 1.64+) should be installed on top of Python 2.7. You can download the distribution and read the installation details for your system at biopython.org. Please make sure you download and install the correct version for your system and Python version.
    Depending on your system set-up, MEME may require additional Python and Perl modules. Please check the Log pane of SigmoID's window for error messages and install missing modules if necessary. MEME can be called from SigmoID in two ways: as a simple converter of aligned sequences to MEME format (via the Convert to MEME menu command) or for finding binding sites within unaligned sequences (via the Find Sites with MEME menu command). The second option generates html output via a Perl script and relies on template files that MEME expects to find in certain locations; hence it is only likely to work when MEME is installed on the user's system and the path to the system version is appropriately set in SigmoID preferences.
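    If you'd rather script this check than launch the interpreter by hand, the short sketch below reports whether the running interpreter is Python 2.7 and whether Biopython 1.64 or later can be imported. It is not part of SigmoID, and the helper names (version_tuple, meets_minimum, check_environment) are hypothetical; it relies only on sys.version_info and Biopython's Bio.__version__ attribute.

```python
import re
import sys


def version_tuple(version):
    """Turn a dotted version string such as '1.64' or '2.7.18' into a tuple of ints.

    Parsing stops at the first component without leading digits (e.g. 'dev0').
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """True if the dotted version string `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


def check_environment():
    """Return (python_is_2_7, biopython_is_at_least_1_64) for the running interpreter."""
    python_ok = sys.version_info[:2] == (2, 7)
    try:
        import Bio  # Biopython reports its version as Bio.__version__
        biopython_ok = meets_minimum(Bio.__version__, "1.64")
    except ImportError:
        biopython_ok = False
    return python_ok, biopython_ok


if __name__ == "__main__":
    print("Python 2.7: %s, Biopython >= 1.64: %s" % check_environment())
```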
    There are two Linux distributions of SigmoID, for 32-bit and 64-bit systems. Both may require installation of additional libraries. The 32-bit version depends on WebKit version 1 for displaying help and database search results. WebKit1 is actually included with the 32-bit Linux SigmoID distribution; please see the SigmoID.sh file for the correct command to launch SigmoID with included WebKit1 libraries.

Supported file formats

    SigmoID can open two types of sequence files: those with genome sequences and those with TFBS/promoter sequences.
    SigmoID expects annotated genome sequences in GenBank format. Only GenBank files can be opened in SigmoID via the command from the File menu. The current version of the genome browser can only properly open files with a single accession per file. You can still perform an nhmmer (but not MAST) search of a genome split into several separate accessions, but please don't try to open such GenBank files in the genome browser.
    SigmoID can also work with unannotated genome sequences in fasta format. An unannotated genome in fasta format can't be opened directly in the genome browser, but can be selected as a target for an nhmmer/MAST search. Of course, the filtering options of the post-processing script (HmmGen.py) that rely on the feature table of .gbk files will not work with genome sequences in fasta format.
    TFBS/promoter sequences can be in either fasta or the special SigmoID profile format (with the .sig extension). TFBS/promoter sequences in fasta files should be aligned and of the same length. If the sequences vary in length, SigmoID displays an error message and doesn't show the sequence logo; the sequences, however, are still loaded (for viewing and/or for aligning them properly with MEME).
    SigmoID profile format files (.sig extension) are virtual folders containing several separate files: the actual binding site sequences in fasta format, calibrated hmm and MEME profiles, as well as two text files with the profile description and search engine/post-processing options. The contents of the files within the .sig virtual folder can be viewed in the main SigmoID window via commands from the View menu. SigmoID has two hidden menu commands for converting a real folder to a .sig file and vice versa; holding down the 'Alt' key while selecting the File menu reveals them.
    Please note that due to a current Xojo limitation SigmoID can't handle virtual folders properly on 64-bit machines, so the 64-bit Linux version converts all .sig files to real folders. These can only be accessed via the toolbar of the main window and can't be opened via any menu command. Therefore, on a 64-bit Linux machine, please move any .sig folders you want to open to the current profile folder and use the leftmost toolbar button of the main window to open them.
    SigmoID can save genome files in standard GenBank format, export an unannotated genome sequence in fasta format and export the feature table in the Sequin table format required by the NCBI tbl2asn program. The appropriate commands are located in the File menu.
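    The equal-length rule can be checked before a file is loaded. The sketch below is a hypothetical helper, not SigmoID's own code: it parses fasta text with a minimal reader and, as an assumption, takes the length of the first sequence as the reference.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:  # lines before the first header are ignored
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_alignment(records):
    """Return the common sequence length or raise ValueError naming the outliers.

    The first sequence is taken as the reference length (an assumption).
    """
    if not records:
        raise ValueError("no sequences found")
    expected = len(records[0][1])
    outliers = [name for name, seq in records if len(seq) != expected]
    if outliers:
        raise ValueError("expected length %d, but these differ: %s"
                         % (expected, ", ".join(outliers)))
    return expected
```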
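    For orientation, the Sequin feature table read by tbl2asn is a tab-delimited text format: a ">Feature" line naming the sequence, then one line per feature (start, stop, feature key), each followed by its qualifier lines indented by three tabs. The sketch below renders such a table; the function name, the example sequence identifier and the LexA site are hypothetical, and SigmoID's own exporter may differ in details.

```python
def feature_table(seq_id, features):
    """Render features as an NCBI five-column feature table (.tbl) string.

    Each feature is (start, stop, key, qualifiers) with 1-based coordinates;
    for a minus-strand feature pass start > stop, as the format expects.
    qualifiers is a list of (name, value) pairs.
    """
    lines = [">Feature %s" % seq_id]
    for start, stop, key, qualifiers in features:
        lines.append("%d\t%d\t%s" % (start, stop, key))
        for name, value in qualifiers:
            # Qualifier lines start with three empty columns.
            lines.append("\t\t\t%s\t%s" % (name, value))
    return "\n".join(lines) + "\n"


# Hypothetical example: one protein_bind site with its bound_moiety qualifier.
print(feature_table("contig1", [(1200, 1219, "protein_bind", [("bound_moiety", "LexA")])]))
```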

Interface

Windows

Main Window

    The main window opens at SigmoID launch and is split into two major interface elements: the topmost Viewer and the Log pane behind it. The Log displays informative messages (including errors) and also shows textual output from some of the included command line programmes and Python scripts. The Viewer is hidden at SigmoID launch and opens once binding site data is loaded. It displays the sequence logo by default, but can be switched to display other information for the loaded data via the View menu. What the Viewer can show depends on the type of file opened: only the logo and the sequences can be shown for fasta files, while all options are available for .sig files. The Viewer can be used to edit binding site sequences, which has two consequences: the sequence logo is recalculated and all profile settings are discarded since they become invalid (this is also reflected in the Profile Wizard window). Please note that other types of information (settings, description, hmm profile, etc.) should not be edited here, as this will have no effect. Only changes made via the Profile Wizard window can be saved (in a .sig file) and reused.
    The sequence logo displayed by the Viewer is interactive and allows part(s) of the alignment to be selected. A single area can be selected by dragging the mouse across the logo; additional area(s) can then be added by holding down the "Shift" key and dragging again. Such a selection has two possible uses. First, you can save the sequences corresponding to the selected area of the logo in a new fasta file (via the Save Profile Selection... command from the File menu). Second, you can launch an nhmmer search with the selected parts of the alignment masked. Masking happens by default if you initiate a search while part of the alignment is selected: SigmoID invokes the alimask programme from the HMMER package to produce a masked hmm profile, which is then used to search the target genome. You can set masking options in the nhmmer configuration window; please refer to the HMMER User Guide for details.
    The toolbar contains buttons for a few of the most used functions. The leftmost "Load Alignment" button allows you to open binding site data from either fasta or .sig files. The .sig files provided with SigmoID can be chosen from the drop-down menu. User files can be opened by simply pressing this button (on Linux or Windows) or by choosing More... at the very bottom of the drop-down list (OS X).
    The next "Search" button launches an nhmmer search with the currently loaded profile. The raw search results appear in the Log pane. If a .sig file is opened, the post-processing Python script is launched by default and the Genome Browser window is opened to display the search results. For non-calibrated profiles (opened from fasta files) the post-processing script has to be launched separately by pressing the third "PostProcess" toolbar button. Please note that the original GenBank file is never modified: SigmoID will ask where to save a new file in the same (GenBank) format with the additions it makes.
    The fourth "Terminators" toolbar button searches for terminators. This function uses TransTerm HP, performs the necessary format conversions and adds the terminators to the genome annotation. As TransTerm takes some time to run, the results may take a couple of minutes to appear, depending on the available processing power.
    The fifth "Palindromise" button does a simple thing: it reverse-complements the currently loaded binding site sequences, adds them to the loaded data and recalculates the sequence logo. This function is only meaningful for sites known to be palindromic and is especially useful when only a few sequences are available. When searching with a palindromised profile, the "Palindromic" check box should be checked. This function should not be used in combination with MEME (since MEME itself does a similar thing). Please also avoid using this function before saving a calibrated profile via the Profile Wizard window, as checking the "Palindromic" check box in the Profile Wizard does exactly the same thing (and you'll end up with every sequence duplicated).
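    As an illustration of the first use, the sketch below extracts the alignment columns covered by one or more selected logo areas. It assumes 1-based inclusive column ranges, and that several areas are joined left to right with overlaps merged; how SigmoID itself combines several areas isn't stated here, and selected_columns is a hypothetical name.

```python
def selected_columns(records, ranges):
    """Keep only the alignment columns covered by the selected logo areas.

    records: list of (name, sequence) pairs of equal length.
    ranges: list of (first, last) 1-based inclusive column numbers (assumption).
    Overlapping areas are merged and columns are kept in left-to-right order.
    """
    columns = sorted(set(c for first, last in ranges for c in range(first - 1, last)))
    return [(name, "".join(seq[c] for c in columns)) for name, seq in records]
```

For example, selecting columns 2-3 and 6-7 of the site ACGTACGT yields CGCG, whichever area was selected first.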
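    The operation itself is simple, as the sketch below shows. The "_rc" suffix added to the names of the new sequences is an assumption, not necessarily what SigmoID does. Applying it twice (which is effectively what combining the button with the Profile Wizard's Palindromic option does) doubles the data again, which is why the two should not be combined.

```python
# Complement table for DNA plus gaps; any other character is mapped to N (assumption).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "-": "-"}


def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown characters become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def palindromise(records):
    """Return the sites followed by their reverse complements.

    records: list of (name, sequence). The '_rc' name suffix is hypothetical.
    """
    return list(records) + [(name + "_rc", reverse_complement(seq)) for name, seq in records]
```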
    All genome search commands are also available from the Genome menu.
    The last toolbar button, "Settings", currently allows you to set the paths to command line programmes and key scripts used by the GUI.

Genome Browser Window

    This window opens after a search for binding sites and can be used to quickly skim through the sites just found. Alternatively, the browser can be used to view an existing GenBank file independently of any search function. The window is split into three viewers, which display the feature map (top), the actual sequence with six-frame translation (middle) and search results (bottom). The feature map is interactive: you can select either a feature by clicking on it or part of the displayed genome fragment by dragging across it. The selected sequence can be copied to the clipboard, used as a query to launch database searches, edited or deleted. The corresponding commands are located in the contextual menu (brought up by right-clicking in the feature map); you can also double-click a feature to open its editor. Please note that currently no format checks are done in this window: be careful to adhere to the GenBank format! Double-clicking outside any feature centres the feature map at the clicked coordinate.
    Database searches can be launched via the contextual menu. Depending on the current selection, the menu will contain commands to search (with BLAST) against the nr database or (with hmmer/BLAST) against the SwissProt/UniProt/CDD databases. Since NCBI servers are overloaded most of the time, hmmer searches of SwissProt/UniProt usually run much faster.
    Search results are displayed at the bottom of the window, which in essence is a very simple web browser. Rudimentary navigation (Back/Forward/Reload) is possible via the contextual menu.
    You can manually resize the top and bottom parts of this window by dragging the separator (the line with three dots above the browser pane) up or down.
    The toolbar at the top of this window can be used to navigate the results of the last hmmer/MAST search (the arrow control on the left); the keyboard 'left arrow' and 'right arrow' keys can also be used for navigation. The hit sequences can be saved to a text file (in fasta format) via the corresponding command from the Genome menu. The check box to the right of this control can be used to exclude undesired hits when saving them. The toolbar also allows you to zoom the feature map in/out (the rightmost control with +/- signs) or to search within the genome. The "smart" search field can distinguish three types of queries (sequence, coordinate or feature text) and performs the search according to the query type. You can move to the next search result via the Control-G shortcut (Command-G on a Mac) or the corresponding command from the Genome menu.
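    Query-type detection of this kind can be pictured as a few simple rules. The sketch below is a guess at such rules, not the Genome Browser's actual logic: digits (optionally with thousands separators) are read as a coordinate, runs of A/C/G/T/N of at least min_sequence_length characters as a sequence, and anything else as feature text. The cut-off of six bases is an arbitrary assumption.

```python
import re


def classify_query(query, min_sequence_length=6):
    """Guess whether a search string is a coordinate, a DNA sequence or feature text.

    The rules and the min_sequence_length cut-off are assumptions for illustration.
    """
    text = query.strip()
    if re.match(r"\d[\d,]*\Z", text):
        return "coordinate"
    if len(text) >= min_sequence_length and re.match(r"[ACGTNacgtn]+\Z", text):
        return "sequence"
    return "feature text"
```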

Database Windows

    The RegPrecise and RegulonDB windows provide access to the corresponding databases with regulatory information. These windows have similar organisation and behaviour. Most of each window is occupied by the regulator list. Since RegulonDB contains information only for E. coli, its regulators are displayed straight away, while in the case of RegPrecise you have to choose a species first (from the popup above the list). The top of the RegulonDB window allows you to switch between transcription and sigma factor binding sites and to filter the sites according to the evidence confidence level.
    Clicking a regulator in the list activates the action buttons at the bottom of the window. The leftmost button (with an "i" letter) connects to the database and displays the information on the regulator in a new window.
    The Check TF Presence button, located to the right of the info button, can be used to verify the presence of the transcription factor in the currently opened genome. The button is disabled if no genome is opened. This button connects to the corresponding database to get the amino acid sequence of the regulator and then launches a tfastx search against the opened genome. The three topmost hits of this search are displayed in the Log pane of the main window. Since similarity levels vary greatly between genomes, it's up to the user to assess the significance of the hits. A reciprocal check can be helpful here: copy the coordinate of the topmost hit, find the corresponding ORF in the genome browser and launch a phmmer search against SwissProt/UniProt to see if the original TF and its obvious orthologues come up as the top hits.
    Please note that the path to the genome file should not contain spaces! This is due to the way options are treated by tfastx.
    The Regulon Logo button is located in the lower right of the windows and can be used to load the binding site data for the currently selected regulator and display its logo in the main window. The RegPrecise window contains an additional button to display the logo of the binding sites from the corresponding regulog. Depending on the number of sites available for the regulator and their diversity, either button may be more useful.

Profile Wizard Window

    This window allows you to enter the settings required to make a calibrated profile. The top left of the window contains search thresholds, of which only the nhmmer gathering threshold is strictly required, as it is used by default by nhmmer. Choosing the right threshold can be simplified by the Find Minimal Score command, which finds the minimal score for binding sites in the training set. Although the other two nhmmer cutoffs (and the mast p-value threshold) are not required for saving the calibrated .sig file, they are still desirable.
    Entering the correct post-processing options in the top right of this window is critical for making correct additions to the genome annotation. The Palindromic site check box sets the corresponding options when running MEME and MAST and filters overlapping results produced for palindromic sites by nhmmer. Checking the Use next locus_tag option will pick up both the locus_tag and gene qualifiers from the following gene when possible (these qualifiers won't be added if the binding site is located between divergently transcribed genes). The Ignore sites within ORFs option can significantly reduce the number of non-specific hits for "noisy" profiles. This option, however, should be used with caution, as it may also remove some of the specific hits, especially for repressors. The text entered in the protein name field will be used as the value of the required bound_moiety qualifier when adding sites to the genome annotation.
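    To illustrate the overlap filtering mentioned for the Palindromic option: a palindromic site may be reported by nhmmer once on each strand, with nearly identical coordinates. The sketch below greedily keeps the highest-scoring hit and drops any hit that overlaps one already kept. This is an assumed rule for illustration; the actual filter in HmmGen.py is not described here, and drop_overlapping is a hypothetical name.

```python
def drop_overlapping(hits):
    """Greedily keep the best-scoring hits, discarding any hit overlapping a kept one.

    hits: list of (start, end, score) with 1-based inclusive coordinates, start <= end.
    Returns the kept hits sorted by start coordinate.
    """
    kept = []
    for start, end, score in sorted(hits, key=lambda h: h[2], reverse=True):
        # Two intervals are disjoint if one ends before the other starts.
        if all(end < k_start or start > k_end for k_start, k_end, _ in kept):
            kept.append((start, end, score))
    return sorted(kept)
```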
    Profile description should be entered in the text box in the lower part of the window. It is required to activate the Save... button and is expected to include the information on the data source(s) and description of the profile construction procedure.
    If an existing .sig file is opened, all settings from this file are entered in the fields of the Profile Wizard. However, these values will be erased if you edit the alignment sequences. If you want to prevent this, press the Lock button located in the lower right of this window.

Command Configuration Windows

These simple windows are opened in most cases before launching command line utilities (nhmmer, meme, mast, TransTerm HP) and Python scripts to allow changing some of the options. The options are hopefully self-explanatory, but an explanation is provided in most cases via help tags: hold the mouse pointer over an option for a second to see this help.

Web Browser and Help Windows

The minimalistic web browser windows are used to display information from the RegPrecise and RegulonDB databases, as well as SigmoID help. If these seem inconvenient, a link can be copied (via the contextual menu) and opened in a browser of your choice.

SigmoID Preferences Window

    This window can be opened via the Preferences... command located in the SigmoID menu on OS X or in the Edit menu on Windows/Linux, or by pressing the rightmost toolbar button in the main window. The buttons in the top part of this window switch between three preference panels. The panels allow you to:
    1) Set the paths to executable files (nhmmer, meme, mast, etc.) used by SigmoID. This can be useful if SigmoID can't find some of the required programmes or if you would like to use the ones already installed on your system. You can also reset all paths to their defaults (pointing to the files distributed with SigmoID) with the button in the bottom left of this window.
    2) Set the databases searched by BLAST and optionally restrict the searches to a smaller taxonomic group to speed them up. This panel also allows you to switch between two result formats that can be output by the HMMER web server: the full graphics-rich html format (default) and a simple text format. The HMMER search pages have changed a lot recently, and some versions could not be displayed properly by the default browser engines used by SigmoID on Windows and 32-bit Linux. Switch to the text format if you have problems with the default html format.
    3) Switch from the folder of calibrated profiles provided with SigmoID to an alternative one. Only the profiles from this folder will be accessible via the leftmost toolbar button in the main window, and only these profiles can be used by the Scan Genome function.

Menu Reference

File Menu

    This menu contains the standard open and save commands, separated into three groups.
    The topmost group contains commands related to alignments/profiles, and the next one contains commands related to genome files.
   
    Open Profile...
    Displays an Open File dialog where you can select a profile/alignment file from your local disk. SigmoID can open files in its own format (.sig) or text files in fasta format. The file should have one of the following extensions: .sig, .fasta, .fas, .fsa, .fa.  

    Save Profile As...
    becomes enabled if the binding site sequences are changed. Rather than saving the changes directly, this command opens the Profile Wizard window, which allows you to enter new profile settings and save the alignment in a .sig file. If you want to save just the sequences in fasta format, please use the next command.

    Save Profile Selection...
    saves (in fasta format) the part of the profile corresponding to the currently selected part of the sequence logo.

    Save Logo Picture...
    does what it says and does it in PNG format.

    Close
    Closes the current window. The main window can't be closed with this command.

    Open Genome...
    Displays an Open File dialog where you can select a genome file from your local disk. The file should be in the GenBank format and have the .gb or .gbk extension.
 
    Save Genome...
    Saves the file currently opened in the Genome Browser window with the same filename.

    Save Genome As...
    Saves the genome currently opened in the Genome Browser window with a different filename.

    Export Sequence...
    Exports the contents of the current genome as a plain text file in fasta format. This discards the feature table.

    Export Feature Table...
    Exports the feature table in the GenBank Sequin table format. The resulting .tbl file can be used to prepare a GenBank submission with the help of tbl2asn.

    Quit
    Closes all SigmoID windows and exits SigmoID completely. If you select this option with unsaved profile or genome, SigmoID will first ask you to save the changes.


Edit Menu

    Undo
    Undoes the last editing action done in the currently active text field. Unfortunately, Undo is not available for changes made to genome files.

    Cut
    Copies the selected text to the clipboard and deletes it from the original position.

    Copy
    Copies the selected text to the clipboard. In Genome Browser window this command copies the nucleotide sequence.

    Copy Protein Sequence
    If a CDS is currently selected in the Genome Browser, this command copies its amino acid sequence to the clipboard.
 
    Paste
    Pastes the text copied to the clipboard with the Cut or Copy command at the current cursor location.

    Clear
    Deletes the selected text.

    Select All
    Selects all text in the current text field.

    Preferences
    Opens the Preferences window to change personal preferences for SigmoID, such as the paths to executable files. On OS X this command is located in the SigmoID menu.


View Menu

This menu changes the information displayed in the main window and in the Genome Browser. Only the last command, View Details, relates to the Genome Browser; the remaining commands relate to the main window and switch the type of information displayed in the topmost Viewer pane. These commands allow you to view the contents of all components of a .sig file, which is actually a virtual folder containing six text files. The sequences can be edited directly in the Viewer, while the rest of the information can only be edited via the Profile Wizard.

    Logo
    Displays sequence logo for the sequences in the currently opened .sig or fasta file (or downloaded from the RegPrecise or RegulonDB databases). Currently, the logo is calculated using the original T. Schneider (1986) formula without small sample correction.
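    In that formula the height of position l is R(l) = 2 - H(l) bits, where H(l) = -sum over the four bases b of f(b,l) log2 f(b,l), and f(b,l) is the frequency of base b at that position; the small sample correction e(n) is the term that is left out. The sketch below computes these heights. Skipping gaps and ambiguous bases when counting is an assumption, and WebLogo's own handling may differ.

```python
import math


def column_information(column):
    """Information content (bits) of one alignment column, without small sample correction.

    Only A, C, G and T are counted; gaps and ambiguous bases are skipped (assumption).
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    n = float(len(bases))
    entropy = 0.0
    for base in "ACGT":
        f = bases.count(base) / n
        if f > 0:
            entropy -= f * math.log(f, 2)
    return 2.0 - entropy


def logo_heights(sequences):
    """Per-position information content for equal-length aligned sequences."""
    return [column_information("".join(col)) for col in zip(*sequences)]
```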

    Sequences
    Shows the actual nucleotide sequences for the loaded binding site data. You can edit the sequences in this view, but if you wish to use the edited data further, please switch to the Logo view, as this finalises your editing and recalculates the logo and the hmm profile.

    Profile Info
    The description of the profile as given by its author. Available only for data from .sig files.

    Hmm Profile
    The calibrated hmm profile produced by hmmbuild when creating the .sig file. Available only for data from .sig files.

    MEME data
    The same sequences in MEME format. These are used for MAST searches. Available only for data from .sig files.

    Profile Settings
    Various settings, including profile calibration thresholds and post-processing options. Available only for data from .sig files.

    Hide Viewer
    Hides/unhides the Viewer in the main window to give less/more room to the Log pane.

    View Details
    Shows/hides the sequence display with six-frame translation in the Genome Browser window.


Profile Menu

    Extend Binding Sites...
    Opens a small window where you can specify the left and right extension limits, as well as the genome file to search. This command finds every sequence from the currently opened binding site data in the genome sequence and adds the specified number of bases to the left and to the right. The results are written to the log pane.
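    The extension step can be sketched as follows. This is a hypothetical helper, not SigmoID's implementation: it searches the forward strand first and then the reverse complement, uses only the first match, and clips the flanks at the ends of the sequence (a circular chromosome is not wrapped). Searching the reverse-complemented genome keeps "left" and "right" relative to the site's own orientation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown characters become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def extend_site(genome, site, left, right):
    """Return the site with `left`/`right` flanking bases, or None if it isn't found.

    Only the first match is used; the forward strand is searched before the reverse one.
    """
    genome, site = genome.upper(), site.upper()
    for strand_seq in (genome, reverse_complement(genome)):
        pos = strand_seq.find(site)
        if pos != -1:
            start = max(0, pos - left)  # clip at the sequence start
            return strand_seq[start:pos + len(site) + right]  # slicing clips at the end
    return None
```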

    Convert to Stockholm
    Converts the current profile to the minimal Stockholm format (as required by hmmbuild) and outputs the results to the Log pane.
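    A minimal Stockholm alignment consists of the "# STOCKHOLM 1.0" header, one "name sequence" line per sequence and a closing "//" line. The sketch below produces that layout from (name, sequence) pairs. Keeping only the first word of each fasta header (Stockholm names can't contain spaces) is an assumption, names must be non-empty, and SigmoID's own output may differ in spacing.

```python
def to_stockholm(records):
    """Render aligned (name, sequence) pairs as a minimal Stockholm alignment."""
    ids = [name.split()[0] for name, _ in records]  # first word of each header
    width = max(len(i) for i in ids)
    lines = ["# STOCKHOLM 1.0", ""]
    for seq_id, (_, seq) in zip(ids, records):
        lines.append("%s %s" % (seq_id.ljust(width), seq))
    lines.append("//")
    return "\n".join(lines) + "\n"
```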

    Convert to Hmm
    Runs hmmbuild from the HMMER package to create an hmm profile that can be used as input for nhmmer.

    Convert to MEME
    Runs MEME with the currently loaded binding site sequences and outputs the results as plain text to the Log pane. These results can be used as input by MAST (in fact, when running MAST with uncalibrated data, MEME is run first in exactly the same way).

    Find Sites with MEME...
    Shows a window allowing you to configure MEME parameters. For this command MEME is configured to produce results in html format, hence they are displayed in the Web Browser window. This command may be useful when dealing with unaligned data, e.g. from RegulonDB.
    The command is currently not available in SigmoID for Windows. Please use the Convert to MEME command, which runs the same program but outputs plain text to the Log pane of the main window.

    Profile Wizard...
    Opens the Profile Wizard window, which allows you to enter calibrated profile settings and then save the profile as a .sig file.


Regulon menu

    The first two items of this menu provide access to databases with regulon information. The RegPrecise database contains high quality information on binding sites for many bacteria while the RegulonDB is a specialised E. coli regulon database. While the information from RegulonDB in most cases requires additional steps to be usable, it has data on regulators not present in RegPrecise. When using the data from RegulonDB for genomes other than E. coli, it might be worthwhile to check for the presence of the TF orthologue in the studied genome. This can be done with the Check TF Presence command.

    RegPrecise...
    Opens the window providing access to the RegPrecise database.

    RegulonDB...
    Opens the window providing access to the RegulonDB database.

    Regulon Info
    Opens the RegPrecise or RegulonDB web page with info for the regulon currently selected in one of the database windows.

    Show Logo
    Shows the logo of the currently selected regulator's binding sites in the main SigmoID window. For RegPrecise data, the result can usually be used directly for an nhmmer/MAST search. The binding sites in RegulonDB are often misaligned, in which case the Find Sites with MEME command from the Profile menu may (or may not) be useful.

    Check TF Presence
    This command gets the regulator protein sequence from RegulonDB and runs a tfastx search against the currently open genome. The top three tfastx search results are displayed in the Log pane of the main SigmoID window. We recommend a reciprocal check: use the coordinates of the best hit to locate it in the current genome and run a phmmer search against the SwissProt database, which (in the case of orthology) should bring up the original regulator as the best E. coli hit. At the moment this command is not available for RegPrecise, since there's no straightforward way to get regulator sequences from this database.

    Find Minimal Score
    May be helpful when determining search thresholds. This command can only be issued right after an nhmmer search, while its results are displayed in the Genome Browser window. It simply compares the current hits to the training set (the original binding sites opened in the main window), outputs the lowest score among the training set and lists the missed sites. The lowest specific and the highest unspecific scores found are also entered into the Profile Wizard window as the nhmmer trusted and noise thresholds, respectively. If the noise threshold is lower than the trusted one, their mean is entered in this window as the gathering threshold (otherwise the value of the trusted threshold is entered here). The gathering threshold is the one that will actually be used for further searches. Depending on the original data and the actual genome, this simple approach may fail to choose the right values, so you still have to verify thoroughly that these scores are the ones you want! Also note that this command can't find sites with redundant bases or gaps.
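    The threshold logic described above can be sketched as follows. The sketch assumes that hits are matched to the training set by exact sequence (consistent with the note that sites with redundant bases or gaps can't be found) and that hit sequences are reported in the same orientation as the training sites; suggest_thresholds and its input layout are hypothetical.

```python
def suggest_thresholds(hits, training_sites):
    """Derive trusted, noise and gathering cutoffs from one nhmmer run.

    hits: list of (sequence, bit_score) pairs for all reported hits.
    training_sites: iterable of the known binding site sequences.
    """
    training = set(s.upper() for s in training_sites)
    specific = [score for seq, score in hits if seq.upper() in training]
    unspecific = [score for seq, score in hits if seq.upper() not in training]
    missed = sorted(training - set(seq.upper() for seq, _ in hits))

    trusted = min(specific) if specific else None    # lowest specific score
    noise = max(unspecific) if unspecific else None  # highest unspecific score
    if trusted is not None and noise is not None and noise < trusted:
        gathering = (trusted + noise) / 2.0  # clean separation: take the mean
    else:
        gathering = trusted  # overlap (or no unspecific hits): fall back to trusted
    return {"trusted": trusted, "noise": noise, "gathering": gathering, "missed": missed}
```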
   


Genome Menu

    This menu collects genome-related commands and is mostly oriented towards various ways of searching for regulatory information in the currently opened genome.

    nhmmer Search...
    Opens the window allowing you to configure and launch nhmmer, which is the primary search engine in SigmoID. This function is enabled if binding site data is loaded (and the sequence logo of the site is displayed in the main window). To launch the search, you have to choose a file with the genome sequence in GenBank format and choose the cutoff score (which is critical for getting correct results). If a calibrated profile is loaded, the correct cutoff will already be chosen. The raw search results appear in the Log pane. If you are sure that the cutoff is correct, you may check the "Add annotation to the genome" check box, which will run the HmmGen.py Python script to filter the nhmmer results, add the binding sites to the genome annotation and open the updated genome sequence in the Genome Browser window. If a calibrated profile (.sig file) is opened, this script is launched by default. For non-calibrated profiles (opened from fasta files) the post-processing script has to be launched separately with the Annotate Current Sites command.
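    If you want to reproduce such a search outside the GUI, a command line can be assembled as in the sketch below. --tblout, --cut_ga and -T are standard nhmmer options (a tabular output file, the profile's gathering cutoff, and an explicit bit-score reporting threshold); the exact options SigmoID passes are not documented here, and the file names are placeholders. --cut_ga only works if the query profile carries a GA line, as a calibrated profile would, so pass an explicit cutoff for an uncalibrated one.

```python
def nhmmer_command(query, target, tblout, cutoff=None, nhmmer="nhmmer"):
    """Build an nhmmer argument list (no shell quoting needed).

    With cutoff=None the profile's own gathering threshold is used (--cut_ga),
    which requires a GA line in the HMM; otherwise -T sets a bit-score threshold.
    """
    cmd = [nhmmer, "--tblout", tblout]
    if cutoff is None:
        cmd.append("--cut_ga")
    else:
        cmd.extend(["-T", str(cutoff)])
    cmd.extend([query, target])
    return cmd
```

    The returned list can be passed directly to subprocess.call(), e.g. subprocess.call(nhmmer_command("profile.hmm", "genome.fasta", "hits.tbl")).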

    Annotate Current Sites...
    Opens the window allowing you to configure and launch the Python script (HmmGen.py) that filters nhmmer results, adds the binding sites to the genome annotation and opens the updated genome sequence in the Genome Browser window. Using this command separately from the nhmmer search may be convenient for unoptimised profiles when deciding on the correct search options. This command doesn't modify the original GenBank file, but writes a new one (asking where to save it) with the additions it makes.

    MAST Search...
    Opens the window allowing you to launch MAST from the MEME Suite package. Please note that, compared to nhmmer, MAST wasn't extensively tested within SigmoID. The configuration window provides minimal options (basically just one, the p-value cutoff), but allows you to enter additional options if desired. These will be appended to the end of the MAST command line. If a non-calibrated profile is opened, MEME is run before MAST to convert the binding site sequences to the required format. The raw search results (and, if MEME was run, its output as well) appear in the Log pane. The checkboxes at the bottom of this window instruct SigmoID to show the results filtered by the post-processing script (MastGen.py) in the Genome Browser window, with or without modifying the annotation.

    Terminator Search...
    Opens a window for configuring and launching TransTerm HP to search for terminators. This command performs the necessary format conversions and adds the terminators to the genome annotation (with the TermGen.py script). As TransTerm HP takes some time to run, the results may take a couple of minutes to appear, depending on the available processing power.

    Scan Genome...
    This command is designed to perform a full genome scan with all available calibrated profiles and minimal user interaction. You can select the desired profiles (or use all of them) and choose whether a terminator search will be performed. After you press the Run button, SigmoID runs nhmmer followed by the HmmGen.py script for every checked profile, using preconfigured settings, and then performs the terminator search. The results are written in GenBank format to a file you specify. With many profiles this function may take a while to run.
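
    Conceptually, the scan is a chain in which each step annotates the file produced by the previous one. The Python 3 sketch below shows only that chaining. annotate and add_terminators are hypothetical callables standing in for the nhmmer + HmmGen.py and TransTerm HP + TermGen.py steps, and the intermediate file names are made up for illustration.

```python
def scan_genome(genome_in, profiles, annotate, add_terminators=None):
    """Apply every checked profile in turn, feeding each output into the next step.

    annotate(profile, infile, outfile) and add_terminators(infile, outfile) are
    hypothetical callables supplied by the caller.
    """
    current = genome_in
    for i, profile in enumerate(profiles):
        outfile = "step%d_%s.gb" % (i, profile)
        annotate(profile, current, outfile)
        current = outfile  # the next profile annotates the already annotated file
    if add_terminators is not None:
        outfile = "step%d_terminators.gb" % len(profiles)
        add_terminators(current, outfile)
        current = outfile
    return current
```

    Passing a function that only records its arguments is a convenient way to check the order of the steps before wiring in real programs.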

    Save Checked Sites...
    Saves the hits from the last search currently displayed in the Genome Browser to a text file in fasta format. The checkbox to the right of the navigation arrows in the Genome Browser toolbar can be used to exclude unwanted hits from the saved file.
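
    The fasta output itself is simple. The Python 3 sketch below represents the displayed hits as hypothetical (name, sequence, checked) tuples, with checked mirroring the toolbar checkbox. SigmoID's own naming of the saved hits is not reproduced here.

```python
def hits_to_fasta(hits):
    """Format (name, sequence, checked) tuples as fasta, skipping unchecked hits."""
    lines = []
    for name, seq, checked in hits:
        if not checked:
            continue  # unchecked in the Genome Browser: left out of the file
        lines.append(">" + name)
        lines.append(seq)
    return "\n".join(lines) + "\n" if lines else ""
```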

    List Regulons...
    Outputs to the Log pane of the main window either a single regulon (controlled by the specified regulator) or all regulons currently annotated in the genome. Each regulon is output as a sequential list of the operons/divergons controlled by the regulator. For the purpose of this command, an operon is defined as the genes between a binding site and the nearest terminator or long intergenic gap. Two divergently transcribed operons are output as a single divergon with the regulator site in the middle. The settings window opened by this command lets you choose the criteria for the beginning and end of an operon.
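
    The operon definition above can be sketched for the plus strand in Python 3. This is an illustration under stated assumptions, not OperOn.py itself. Genes are (name, start, end) tuples, terminators are single positions, and the 300 bp gap threshold is an arbitrary example value; OperOn.py exposes comparable settings through its -g, -i and -t options. Minus-strand operons would be handled the same way, walking towards lower coordinates.

```python
def operon_after_site(site_end, genes, terminators=(), max_gap=300):
    """Collect plus-strand genes forming an operon downstream of a binding site.

    The operon ends at the first terminator between two genes, or where the
    gap to the next gene exceeds max_gap (an assumed example threshold).
    """
    operon = []
    prev_end = site_end
    for name, start, end in sorted(genes, key=lambda g: g[1]):
        if end <= site_end:
            continue  # gene lies upstream of the site
        if start - prev_end > max_gap:
            break  # long intergenic gap closes the operon
        if any(prev_end < t < start for t in terminators):
            break  # terminator before this gene closes the operon
        operon.append(name)
        prev_end = end
    return operon


genes = [("a", 100, 400), ("b", 420, 900), ("c", 950, 1500), ("d", 2000, 2500)]
```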

    Find
    This command simply puts the cursor into the search field of the Genome Browser window located in its top right corner. The search field can distinguish three types of queries (sequence, coordinate or feature text) and performs the search according to the type of query. Type what to search for and press "Enter" on the keyboard to initiate the search. The position of the first occurrence of the query in the genome will be highlighted.
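
    One way to tell the three query types apart is sketched below in Python 3. The rules are a hypothetical heuristic: digits and commas mean a coordinate, only A/C/G/T/N means a sequence, and anything else is feature text. SigmoID's actual rules are not documented here, and a short gene name made only of those letters would be misread as a sequence.

```python
import re


def classify_query(query):
    """Guess whether a search string is a coordinate, a sequence or feature text."""
    q = query.strip()
    if re.fullmatch(r"[0-9,]+", q):
        return "coordinate"
    if re.fullmatch(r"[ACGTNacgtn]+", q):
        return "sequence"
    return "feature text"
```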

    Find Again
    Highlights the next occurrence of the previous search query in the genome.

    Add Plot...
    This command can be used to plot either RNA-seq coverage data produced by samtools (see the instructions in "Using SigmoID to view RNA-seq coverage data" below) or simple numerical data (e.g. %GC). Up to four overlapping graphs can be shown in the plot area; just use the Add Plot… command repeatedly. All graphs are shown in the same plot area, which makes it possible, for example, to compare RNA-seq data for two conditions and for both strands. Each plot is currently scaled separately, so the maximum values plotted (shown on the left and on the right) should be taken into account when comparing plots.

    Remove Plots
    Removes all plots currently displayed.

    Merge Plot Data...
    This is an auxiliary command that merges two data files produced by the samtools depth command. It is required to display RNA-seq data properly when following the instructions in "Using SigmoID to view RNA-seq coverage data" below.
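
    Merging amounts to adding up the read counts position by position. The Python 3 sketch below does this for the three-column output of samtools depth (reference, position, depth). It is an illustration, not the Merge Plot Data... code, and the exact file layout SigmoID expects for plotting is not described here. Positions absent from one file are treated as zero, because samtools depth skips zero-coverage positions unless it is run with -a.

```python
from collections import defaultdict


def merge_depth(*depth_texts):
    """Sum per-position read counts from several samtools depth outputs."""
    totals = defaultdict(int)
    for text in depth_texts:
        for line in text.splitlines():
            if not line.strip():
                continue
            # each line: reference<TAB>position<TAB>depth
            ref, pos, depth = line.split("\t")[:3]
            totals[(ref, int(pos))] += int(depth)
    # sorted by reference name, then by position
    return [(ref, pos, totals[(ref, pos)]) for ref, pos in sorted(totals)]
```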

Window Menu

    Lists all currently opened SigmoID windows. Selecting a window from this list brings it to the front.



Help Menu

    About SigmoID
    Displays a window with information about SigmoID including the current version and a brief list of credits.

    SigmoID Help
    Opens the Help Viewer window.

    HMMER User Guide
    Opens the HMMER User Guide from the HMMER package distribution in the default PDF viewer.

    Hmmer.org
    Opens the HMMER website in the web browser.

    MEME Suite Web Portal
    Opens the main MEME Suite website in the web browser window.   


Using SigmoID to view RNA-seq coverage data

    SigmoID can display graphs of RNA-seq coverage, which can be very helpful for verifying regulatory sequences and operon boundaries. At present, SigmoID cannot compute coverage itself; it only loads and displays precomputed read count values. These can be produced in many different ways, one of which (not necessarily the best one) is described below. This approach requires bowtie2 for read mapping and samtools for processing the resulting files.

    The commands to produce the required files for paired-end reads are given below. The switches -p 8 and -@ 8 run the bowtie2 and samtools tasks on eight processor cores; adjust them for your system.

    The commands below assume that the genome file is called 'genome.fasta' and RNA-seq data are in the files 'read1.fastq' and 'read2.fastq'.

1. Index your genome file:

bowtie2-build genome.fasta genome_index

2. Map the reads to your sequence:

bowtie2 -x genome_index -p 8 --very-sensitive-local --no-mixed --no-discordant -1 read1.fastq -2 read2.fastq -S mapped.sam

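3. Convert the SAM file to BAM, then sort and index it (recent samtools syntax; older 0.1.x releases used a different form of samtools sort):

samtools view -bS -@ 8 mapped.sam | samtools sort -@ 8 -o mapped.bam -
samtools index mapped.bam mapped.bai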

4. Remove reads with mapping quality below 2 (which map to more than one location):

samtools view -b -q 2 -@ 8 mapped.bam > mapped2.bam

5. Split the BAM file into reads mapping to the sense and antisense strands. Because the two mates of a pair map to opposite strands and samtools cannot extract both with a single flag filter, samtools has to be run four times:

samtools view -b -@ 8 -f 99 mapped2.bam > sense1.bam
samtools view -b -@ 8 -f 147 mapped2.bam > sense2.bam
samtools view -b -@ 8 -f 83 mapped2.bam > antisense1.bam
samtools view -b -@ 8 -f 163 mapped2.bam > antisense2.bam

(Note: the -f values also include bits 0x1 (read paired) and 0x2 (read mapped in proper pair), which exclude unmapped and improperly paired reads. This is not strictly required in this particular case, but does no harm.)

6. Count the reads:

samtools depth sense1.bam > sense1.depth
samtools depth sense2.bam > sense2.depth
samtools depth antisense1.bam > antisense1.depth
samtools depth antisense2.bam > antisense2.depth

7. Open the genome file in SigmoID and use the Merge Plot Data... command from the Genome menu to combine sense1.depth with sense2.depth, and antisense1.depth with antisense2.depth.

8. Load the combined read count data for the sense strand using the Add Plot... command from the Genome menu, then load the data for the antisense strand via the same menu command. Repeat for another sample.
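
    The four -f values used in step 5 can be checked by decoding the SAM FLAG bits, as in the Python 3 sketch below (bit meanings from the SAM format specification). It shows that 99 and 147 are the two mates of pairs whose first read maps to the forward strand, while 83 and 163 are the mates of pairs whose first read maps to the reverse strand.

```python
# Bit values of the SAM FLAG field
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM flag value."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if flag & bit]


for flag in (99, 147, 83, 163):
    print(flag, decode_flag(flag))
```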

Python Scripts

    The scripts described below process the output of various search programmes, perform format conversions and add features to the genome annotation. They are called by the SigmoID GUI when necessary, but can also be used separately if desired. Type the following command in a terminal to get help on command-line usage:
  
   python <path_to_the_script> -h


HmmGen.py

    SigmoID processes nhmmer results (the table of hits) with the HmmGen.py script, which adds the corresponding feature annotations to the GenBank file being searched and saves the result in a new file. Some useful options are provided to make annotation more convenient; you can find them in the "HmmGen Settings" window, which pops up after clicking the "Postprocess" button in the main window.
    To run the script, enter the appropriate threshold (either bit score or E-value). By default SigmoID chooses the same value that was used to run nhmmer, but you can increase the bit score or decrease the E-value to reduce the number of hits without re-running nhmmer.
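
    Re-filtering without re-running nhmmer works because the table of hits already contains the score and E-value of every reported hit, so the cutoff can only be made stricter. Below is a minimal Python 3 sketch, assuming the standard nhmmer --tblout column layout (target name, accession, query name, accession, hmmfrom, hmm to, alifrom, ali to, envfrom, env to, sq len, strand, E-value, score, bias, description); it is not the HmmGen.py code. For minus-strand hits, alifrom is larger than ali to.

```python
def parse_tblout(text):
    """Parse nhmmer --tblout text into dicts; comment lines start with '#'."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 15 whitespace-separated fields, then a free-text target description
        f = line.split(None, 15)
        hits.append({
            "target": f[0], "query": f[2],
            "start": int(f[6]), "end": int(f[7]),  # alifrom / ali to
            "strand": f[11], "evalue": float(f[12]), "score": float(f[13]),
        })
    return hits


def filter_hits(hits, score=None, evalue=None):
    """Keep hits at or above a bit-score cutoff and/or at or below an E-value cutoff."""
    return [h for h in hits
            if (score is None or h["score"] >= score)
            and (evalue is None or h["evalue"] <= evalue)]
```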
    To filter out all intragenic hits, check the "Consider intergenic regions only" box. nhmmer reports hits on both strands, and in the case of palindromic sites there will be two hits with the same coordinates and identical (or very close) scores. To remove one of the duplicate sites, check the "Palindromic site" checkbox.
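
    Both filters can be sketched in Python 3 as follows. This assumes hits are dicts with start, end and score keys (as a parser for the nhmmer table might produce) and CDS locations are (start, end) pairs. Treating only exactly identical coordinates as duplicates is a simplification, and HmmGen.py's own criteria may differ.

```python
def remove_palindromic_duplicates(hits):
    """Keep the best-scoring hit per location when a site is reported on both strands."""
    best = {}
    for h in hits:
        # minus-strand hits may have start > end, so normalise the coordinates
        key = (min(h["start"], h["end"]), max(h["start"], h["end"]))
        if key not in best or h["score"] > best[key]["score"]:
            best[key] = h
    return list(best.values())


def is_intergenic(hit, cds_intervals):
    """True if the hit does not overlap any (start, end) CDS interval."""
    lo, hi = sorted((hit["start"], hit["end"]))
    return all(hi < s or lo > e for s, e in cds_intervals)
```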
    This script can also add 'locus_tag' and 'gene' qualifiers to the feature being annotated, but please note that GenBank will object to such additions if you later decide to submit this sequence to the database. If you are certain you want this, check the "Add qualifier" box.
    Choose the feature type ("promoter" or "protein_bind") from the "Feature to add:" box (or type in any valid feature type). The window also lets you configure one qualifier for this feature. The qualifier name can be typed in, but in most cases it should remain 'protein_bind'. A valid protein name should be entered in the field to the right.
    Pressing the Run button asks for the name of the file in which to save the genome sequence with the modified annotation. If the "Show hits in genome browser" box is checked, you will see the results in the browser window. The script also appends a detailed text report to the Log pane.

MastGen.py

    This script adds features to a GenBank file according to MAST results. SigmoID calls it after a MAST search when one of the checkboxes at the bottom of the MAST Search window is checked.

usage: 
MastGen <report_file>  <input_file> <output_file> [options]

positional arguments:
  report_file           path to MAST report file produced with -tblout option.
  input_file            path to input GenBank file.
  output_file           path to output GenBank file.

optional arguments:
  -h, --help            show this help message and exit
  -L <integer>, --length <integer>
                        final feature's length in genbank file
  -q [<key#"value"> [<key#"value"> ...]], --qual [<key#"value"> [<key#"value"> ...]]
                        add this qualifier to each annotated feature.
  -p, --palindromic     filter palindromic sites.
  -n, --name            don't pick 'locus_tag' and 'gene' qualifiers from the
                        next CDS feature.
  -V <float or integer>, --pval <float or integer>
                        threshold E-Value.
  -S <float or integer>, --score <float or integer>
                        threshold Bit Score.
  -i, --insert          don't add features inside CDS
  -d, --duplicate       no duplicate features with the same location and the
                        same protein_bind qualifier value
  -v, --version         show program's version number and exit
  -f <"feature key">, --feature <"feature key">
                        feature key to add (promoter, protein_bind etc.)

TermGen.py

    This script adds terminators to a GenBank file according to TransTerm HP results.

usage: 
TermGen <input_file> <output_file> [options]

positional arguments:
  input_file            path to input GenBank file.
  output_file           path to output GenBank file.

optional arguments:
  -h, --help            show this help message and exit
  -o <path>, --output <path>
                        redirects TransTerm HP output file to directory given
  -C <integer>, --confidence <integer>
                        threshold Score.
  --minstem <integer>   Stem must be n nucleotides long
  --minloop <integer>   Loop portion of the hairpin must be at least n long
  --maxlen <integer>    Total extent of hairpin <= n NT long
  --maxloop <integer>   The loop portion can be no longer than n
  -v, --version         show program's version number and exit

ptt_converter.py

    This script converts a GenBank file into the .ptt file format.

usage: 
Genbank to PTT converter <input_file> 

positional arguments:
  input_file     path to input Genbank file.

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

OperOn.py

    This script finds putative operons between regulator binding sites and/or terminators/long intergenic gaps.

usage: 
OperOn <input_file> [options]

positional arguments:
  input_file            path to input GenBank file.

optional arguments:
  -h, --help            show this help message and exit
  -g <int>, --gap <int>
                        minimal gap between operons
  -i <int>, --indent <int>
                        maximal distance from binding site to the first
                        downstream CDS
  -t, --terminator      terminators are regarded as operon separator
  -r <name of regulator>, --regulator <name of regulator>
                        only specified regulators are considered
  -p, --palindromic     treat all binding sites as palindromic
  -s, --strict          operon stops on first terminator (if -t is set)
  -v, --version         show program's version number and exit

gbk2tbl.py

    This script converts a GenBank file into the .tbl file format. The resulting table is written to stdout.

usage: 
Genbank to .tbl converter <input_file> [options]

positional arguments:
  input_file            path to input GenBank file.

optional arguments:
  -h, --help            show this help message and exit
  -f, --fasta           creates fasta from genbank file.
  -p PREFIX, --prefix PREFIX
                        sequencing centre prefix.
  -t, --translation     adds translation qualifier to CDS features in .tbl
  -v, --version         show program's version number and exit
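
    For reference, the target format is NCBI's five-column feature table: a >Feature line followed by tab-separated start, stop and feature key lines, with qualifiers on lines that begin with three tabs. The Python 3 sketch below renders features in that layout. It is not gbk2tbl.py, the (key, start, end, strand, qualifiers) tuples are a hypothetical input structure, and the -p and -t options are not modelled.

```python
def feature_table(seq_id, features):
    """Render features as an NCBI five-column feature table (.tbl).

    Coordinates are 1-based and inclusive with start <= end; minus-strand
    features are written with the start and stop columns swapped.
    """
    lines = [">Feature " + seq_id]
    for key, start, end, strand, qualifiers in features:
        a, b = (start, end) if strand == "+" else (end, start)
        lines.append("%d\t%d\t%s" % (a, b, key))
        for qkey, value in qualifiers:
            lines.append("\t\t\t%s\t%s" % (qkey, value))
    return "\n".join(lines) + "\n"
```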