Published March 19, 2024 | Version 03.12.2024
Software Open

GapClosure program to patch gaps in genome assemblies

  • 1. University of Pretoria
  • 2. Centre for Bioinformatics and Computational Biology

Description

Program GapClosure 03.12.2024
Author: O. Reva (oleg.reva@up.ac.za)
Last modified: March 12, 2024
Assisted with OpenAI ChatGPT4

python run.py - show command prompt menu
python run.py [-arguments]
python run.py -h / -H / --help - show this help
python run.py -v / -V / --version - show version

Arguments:
       -t:       # <string> generic subword shared by the subject sequence and gaps file names.
       -r:       # <file name> name of the reference file.
       -s:       # <file name> name of the subject sequence file or a unique marker subword;
                 # If a marker is specified, the subject sequence file name must contain 
                 # both the marker and the generic subword.
       -g:       # <file name> name of the gaps file or a unique marker subword;
                 # If a marker is specified, the gaps file name must contain 
                 # both the marker and the generic subword.
                 # This file is the output of Mauve -> Tools -> Export -> Export Gaps.
       -m:       # <integer> Minimal length of gaps, 10 by default.
       -p:       # <yes/no> Shows a plot created by matplotlib.
                 # Library matplotlib must be installed.
       -o:       # <file name> /optional/ Name of the output sequence file. 
                 # If not specified, a generic name will be used.
       -x:       # <folder name> 'input' by default. All input files must be placed here.
       -y:       # <folder name> 'output' by default. Output files are stored here. 
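
As an illustration of how the arguments above fit together, the following sketch assembles a command line for the bundled example files. It is not part of the program: the helper build_command is hypothetical, and it assumes each flag is followed by its value as a separate token, which the help text does not state explicitly. Check the output of python run.py -h before relying on this form.

```python
import sys

def build_command(reference, subject, gaps, min_gap=10, plot="no",
                  in_dir="input", out_dir="output", output=None):
    """Assemble an argument list for run.py (hypothetical helper).

    Assumption: flags and values are passed as separate tokens, e.g. -r FILE.
    """
    cmd = [sys.executable, "run.py",
           "-r", reference,
           "-s", subject,
           "-g", gaps,
           "-m", str(min_gap),
           "-p", plot,
           "-x", in_dir,
           "-y", out_dir]
    if output is not None:
        # -o is optional; without it the program picks a generic name
        cmd += ["-o", output]
    return cmd

cmd = build_command("ID003.reference.fa", "IB011.subject.fa", "IB011.gaps.txt")
# Show the command without executing it
print("python " + " ".join(cmd[1:]))
```

The list can be handed to subprocess.run(cmd, check=True) from the folder that contains run.py.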

Currently, the 'input' folder contains three example files:
ID003.reference.fa - reference sequence file;
IB011.subject.fa - subject sequence file;
IB011.gaps.txt - text file containing a list of gaps in sequence IB011
compared to sequence ID003.
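
The marker-based file lookup described for -s and -g can be sketched as follows. This is a minimal illustration and not the program's own code: the function resolve_by_marker is hypothetical, and treating "IB011" as the generic subword (-t) with "subject" and "gaps" as markers is an assumption based on the example file names.

```python
def resolve_by_marker(file_names, marker, generic):
    """Return the single file name containing both the marker and the generic subword."""
    matches = [name for name in file_names if marker in name and generic in name]
    if len(matches) != 1:
        # The marker is expected to be unique, so anything else is an error
        raise ValueError(
            f"expected exactly one file with {marker!r} and {generic!r}, found {matches}"
        )
    return matches[0]

# Names of the example files shipped in the 'input' folder
names = ["ID003.reference.fa", "IB011.subject.fa", "IB011.gaps.txt"]

subject_file = resolve_by_marker(names, "subject", "IB011")
gaps_file = resolve_by_marker(names, "gaps", "IB011")
print(subject_file, gaps_file)
```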

Mauve (https://darlinglab.org/mauve/download.html) was used to align whole-genome 
sequences of S. aureus isolates against each other with File -> Align with 
progressiveMauve. The reference sequence must be placed first in the list, followed 
by the subject sequence. After the alignment is complete, use Tools -> Export -> 
Export Gaps to save the gaps to the file that this program takes as input.

The program uses the gaps file to identify locations of insertions in the reference 
sequence that are absent in the subject sequence and fills these gaps in the subject 
sequence by patching them from the reference sequence. In the next step, the patches 
must be verified by mapping the initial DNA reads against the resulting sequence stored 
in the 'output' folder.
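
The patching step can be pictured with the sketch below. It is an assumption-laden illustration, not the author's implementation: the function patch_gaps and the gap tuple layout (insertion point in the subject, start and end in the reference, 0-based, end exclusive) are hypothetical, and parsing of the actual Mauve gaps file is left out.

```python
def patch_gaps(subject, reference, gaps, min_len=10):
    """Insert reference segments into the subject at the listed gap positions.

    gaps: iterable of (subject_pos, ref_start, ref_end) tuples (hypothetical layout).
    Gaps shorter than min_len are skipped, mirroring the -m argument.
    """
    patched = subject
    # Process from right to left so earlier subject coordinates stay valid
    for subject_pos, ref_start, ref_end in sorted(gaps, reverse=True):
        if ref_end - ref_start < min_len:
            continue
        segment = reference[ref_start:ref_end]
        patched = patched[:subject_pos] + segment + patched[subject_pos:]
    return patched

# Toy sequences: the subject lacks a 12-base and a 2-base stretch of the reference
reference = "ACGT" + "C" * 12 + "TTGA" + "GG" + "AT"
subject = "ACGT" + "TTGA" + "AT"
gaps = [(4, 4, 16), (8, 20, 22)]

patched = patch_gaps(subject, reference, gaps)
print(patched)
```

Only the 12-base gap is filled here because the 2-base gap is below the default minimum length of 10. As noted above, patched regions still need to be verified by read mapping.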

This program was first used in the paper "Staphylococcus aureus associated with 
post-operative wound infections in Western Kenya reveals genomic hotspots for 
pathogen evolution" by Mogoi et al., 2024.

Files

Files (1.3 MB)

md5:b3545bc3324e72c6a67159d532455a24

Additional details

Software

Programming language
Python