A survey on gel images analysis software tools

One of the most important sources of information for molecular biologists is the gel image generated by gel electrophoresis in ISSR-PCR, SDS-PAGE and RAPD-PCR experiments. DNA and protein gel images are obtained through gel electrophoresis separation of DNA and protein fragments. Polymorphic bands are separated according to the sizes of the negatively charged DNA fragments, which run from the negative cathode toward the positive anode. Each gel image has several vertical lanes; each lane corresponds to one sample and contains a number of horizontal bands. The images produced by gel electrophoresis are sometimes difficult to interpret, so software tools are needed to analyze them and to support biologists, who draw their conclusions from the results these tools generate. In this article, we present a survey of commercial and non-commercial software tools used for analyzing gel images. We also develop a novel software tool for processing and analyzing gel electrophoresis images: it computes the molecular weights, saves them as an Excel sheet, clusters the bands by molecular weight using the k-means algorithm, applies band matching with a tolerance value entered by the user, determines the similarities between samples, draws the corresponding phylogenetic tree, saves a report of the experiment as a PDF, and prints this report. The software gives the biologist manual, automatic and semi-automatic processing.


INTRODUCTION
Gel electrophoresis (GE) is an essential technique in molecular biology experiments, used for separating DNA [1,2]. The separation is based on fragment size. In more detail, DNA fragments run from the negative cathode toward the positive anode and are separated by size: smaller fragments migrate faster through the gel and occupy its lower positions, while larger fragments appear near the top, because the larger a fragment is, the less easily it passes through the small pores of the gel. For this reason, large DNA fragments move more slowly than small ones. Gel electrophoresis has many applications in genetics, microbiology and molecular biology, including analyzing the material extracted from samples more accurately, comparing samples with each other or with a standard sample known as a marker, determining plant pathogenic types, and specifying a genotype combined with a specific bacterium. Gel images (DNA and protein) are obtained through the separation process carried out by gel electrophoresis. A gel image consists of vertical columns called lanes, where every lane represents a sample, and horizontal fragments called bands, which are sorted within each lane by molecular weight [4]. Most generated gel images are of low quality because of factors such as the buffer chamber temperature, reorientation angle, agarose type, time and field strength [5]. Image quality can affect the accuracy of the information extracted from these images. Thus, the most important step is enhancing the uploaded gel image before any further processing.
This paper is organized as follows. Section 2 covers related work and presents the current gel image analysis software tools. Section 3 is the discussion, in which the capabilities of our novel software are described. Section 4 is the conclusion, in which we compare our software with similar non-commercial software and list the capabilities missing from our software that will be added in the next version.

RELATED WORK
A few software tools have been developed to analyze gel electrophoresis images, but most of them are commercial, some do not meet all of the user's requirements, and the free tools are very complex for users and offer few options. Examples of such tools are GelAnalyzer, GelQuant and ImageJ [6].
In contrast, our software is free and can generate a dendrogram based on molecular weights.
Some tools, such as ImageJ and ClusterVis [7], perform most of their tasks manually. Others, such as PyElph [8], do not allow users to add or delete lanes and bands manually. In contrast, our software supports manual, automatic and semi-automatic operation. It is more accurate in lane and band detection than PyElph, which was developed mainly for educational use and is not accurate in detecting lanes and bands. Another example, GelClust [9], is written in C#, like our software, and does everything our software can do, except that GelClust does not let the user enter the molecular weights of the ladder, does not show the molecular weights of the unknown bands, does not save the weights as an Excel sheet, and does not generate or print reports.
The last program, developed in Egypt, is Image Analyzer [10]. It was built with MATLAB, has a poor GUI, and is very large. Our software, on the other hand, was developed in C# with a smart GUI that guides users from the first step of loading the gel image to the last step of generating the phylogenetic tree. Unlike Image Analyzer, it is very small, so anyone can download it and install it on the Windows operating system.

DISCUSSION
Our novel software is a new entry in the field of gel image analysis software, developed in C# for the Windows operating system. Its window contains a home page that describes all capabilities of the software; a sidebar with two options, "Manual processing" and "Automatic processing"; a header with icons to upload a gel image, save the image, save the experiment to the database, save the results to the PC as an Excel sheet or PDF, and show old experiments stored in the database; and two pages for manual and automatic processing. After choosing the processing type from the sidebar, the user can upload the gel image. If the user chooses Manual processing from the sidebar, the manual processing page is activated and the gel image is processed in the following steps:
• Uploading the gel image and cropping extra areas from it to keep the region of interest, which will be processed in the next steps; the enhancement process (grayscale conversion, complement, contrast adjustment, gamma correction and some other filters to remove noise) is executed automatically.
• Choosing lane detection from the Detect group box and performing the detection using right mouse-click.
• After lane detection is finished, correcting the lanes using left mouse-click.
• Choosing band detection from the Detect group box and performing the detection using right mouse-click.
• After band detection is finished, correcting the bands using left mouse-click.
• Entering the molecular weights of the ladder or marker.
• Checking the weights, computing the unknown weights and displaying them in a DataGridView.
• Generating the phylogenetic tree based on the extracted weights.
• Saving the molecular weights of the unknown bands as an Excel sheet and printing the report.
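The automatic enhancement mentioned in the first step above (grayscale conversion, complement, gamma correction) can be illustrated with a minimal Python sketch. The source does not give the formulas or filter parameters, so the luminance weights, the default gamma value and the function names below are assumptions chosen for illustration; they are not the software's actual C# implementation.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to 8-bit gray levels.

    Assumption: ITU-R BT.601 luminance weights; the source does not say
    which grayscale formula the software uses.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def complement(gray_image):
    """Invert an 8-bit grayscale image (dark becomes bright and vice versa)."""
    return [[255 - p for p in row] for row in gray_image]


def gamma_correct(gray_image, gamma=0.5):
    """Apply gamma correction through a 256-entry lookup table.

    gamma < 1 brightens mid-tones, gamma > 1 darkens them.
    The default of 0.5 is an illustrative assumption.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in gray_image]


def enhance(rgb_image, gamma=0.5):
    """Chain the three steps: grayscale, complement, gamma correction."""
    return gamma_correct(complement(to_gray(rgb_image)), gamma)
```

Contrast adjustment and noise filtering are left out of the sketch because the text does not specify which filters are applied.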
If the user instead chooses Automatic processing from the sidebar, the automatic processing page is activated and the gel image is processed in the following steps:
• Uploading the gel image and cropping extra areas from it to keep the region of interest, which will be processed in the next steps; the enhancement process (grayscale conversion, complement, contrast adjustment, gamma correction and some other filters to remove noise) is executed automatically. The image is then converted to a binary image, and the user controls the threshold value through a track bar with a range from 0 to 255.
• Inserting the molecular weights of the marker or uploading an existing ladder, and checking them, as they must be entered in descending order; the user can save the current ladder on the PC for later use.
• Computing the unknown weights, displaying them in a DataGridView and detecting whether a band is a primer dimer or not.
• Generating the phylogenetic tree based on the extracted unknown weights.

Journal of Intelligent Systems and Internet of Things (JISIoT)
• Saving the experiment results as a PDF, which can be printed at any time as a report.
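Three of the automatic steps above (binarization with a user-chosen threshold, the descending-order check on the ladder, and the computation of unknown weights) can be sketched in Python as follows. The source does not state how the unknown weights are computed; the sketch assumes the common approach of interpolating log10(molecular weight) linearly against migration distance between neighboring ladder bands. All function and parameter names are hypothetical.

```python
import math


def binarize(gray_image, threshold):
    """Convert an 8-bit grayscale image to 0/1 using a threshold in 0..255."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be in 0..255")
    return [[1 if p >= threshold else 0 for p in row] for row in gray_image]


def is_descending(ladder_weights):
    """Check that ladder weights are entered in strictly descending order."""
    return all(a > b for a, b in zip(ladder_weights, ladder_weights[1:]))


def estimate_weight(position, ladder_positions, ladder_weights):
    """Estimate the weight of a band at a given row position.

    ladder_positions: row coordinates of the ladder bands, top to bottom.
    ladder_weights:   their known weights, largest (top) to smallest.
    Assumption: log10(weight) is linear in position between ladder bands;
    positions outside the ladder are extrapolated from the nearest segment.
    """
    if len(ladder_positions) != len(ladder_weights) or len(ladder_weights) < 2:
        raise ValueError("need at least two ladder bands with matching lengths")
    if not is_descending(ladder_weights):
        raise ValueError("ladder weights must be in descending order")
    if any(a >= b for a, b in zip(ladder_positions, ladder_positions[1:])):
        raise ValueError("ladder positions must be strictly increasing")

    # Find the ladder segment that brackets the position (or the last one).
    i = 0
    while i < len(ladder_positions) - 2 and position > ladder_positions[i + 1]:
        i += 1
    y0, y1 = ladder_positions[i], ladder_positions[i + 1]
    w0 = math.log10(ladder_weights[i])
    w1 = math.log10(ladder_weights[i + 1])
    t = (position - y0) / (y1 - y0)
    return 10 ** (w0 + t * (w1 - w0))
```

For example, with ladder bands at rows 10, 20 and 30 and weights 1000, 100 and 10, a band at row 15 falls halfway between the first two ladder bands on the logarithmic scale.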
As noted above, a gel image generated by gel electrophoresis consists of lanes, each representing a sample, and horizontal fragments called bands, which are sorted within each lane by molecular weight, as shown in figure 1 and figure 2. The workflow applied to process the gel images is summarized in figure 3.
• The first stage is pre-processing. As noted above, the quality of gel images produced by gel electrophoresis in ISSR-PCR, SDS-PAGE and RAPD-PCR experiments is poor, so the analysis begins with a pre-processing stage. In this stage the software crops the gel image as required and then enhances it to facilitate the analysis.
In the second stage, the lanes of the gel image are detected. The common idea of lane detection methods is the construction of a 'vertical densitometric curve' or 'histogram', as shown in figure 4, by averaging the pixel values on the same vertical line. In the densitometric curve, the local minima correspond to the gaps between the lanes, and this fact is used to detect the lanes of the image.
• The last step is the comparison of the similarity among the different lanes. Our software calculates the similarity matrix using two methods. The first is band-based: the number of matching bands between two lanes, found in the previous step by the band matching algorithm, is divided by the total number of bands in both lanes. The second method is the Euclidean distance.
Our software then draws the phylogenetic tree using two methods. The first implements the UPGMA clustering algorithm. The second draws the tree based on the band matching algorithm applied in the previous step.
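The densitometric-curve idea for lane detection described above can be sketched in Python. This is an illustration under stated assumptions, not the software's implementation: the image is taken to be a grayscale matrix with bright lanes on a dark background, the optional moving-average smoothing and its radius are assumptions, and the function names are hypothetical.

```python
def column_profile(gray_image):
    """Vertical densitometric curve: mean pixel value of every column."""
    n_rows = len(gray_image)
    return [sum(col) / n_rows for col in zip(*gray_image)]


def smooth(profile, radius=1):
    """Moving average to suppress noise (radius=0 leaves the curve as is)."""
    out = []
    for i in range(len(profile)):
        window = profile[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def local_minima(profile):
    """Indices of interior local minima; a flat valley reports its center."""
    minima = []
    n = len(profile)
    i = 1
    while i < n - 1:
        if profile[i] < profile[i - 1]:
            j = i
            while j < n - 1 and profile[j + 1] == profile[i]:
                j += 1  # walk along a flat valley floor
            if j < n - 1 and profile[j + 1] > profile[i]:
                minima.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return minima


def detect_lanes(gray_image, radius=1):
    """Split the image into lanes at the local minima of the curve.

    Returns (left, right) column index pairs, one pair per lane.
    """
    profile = smooth(column_profile(gray_image), radius)
    cuts = [0] + local_minima(profile) + [len(profile) - 1]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]
```

On real images a larger smoothing radius is usually needed so that noise does not create spurious minima, at the cost of blurring very narrow gaps between lanes.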
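The band-based similarity and the UPGMA clustering described above can be sketched in Python as follows. Several details are assumptions, because the text does not fix them: bands are matched greedily one-to-one within the user tolerance, "the total number of bands in both lanes" is read as the number of distinct bands across the two lanes (so identical lanes score 1), and the distance passed to UPGMA is taken as 1 minus the similarity. The function names are hypothetical.

```python
def count_matches(lane_a, lane_b, tolerance):
    """Greedy one-to-one matching of band weights within +/- tolerance."""
    unused = sorted(lane_b)
    matches = 0
    for w in sorted(lane_a):
        for k, v in enumerate(unused):
            if abs(w - v) <= tolerance:
                del unused[k]  # each band may be matched only once
                matches += 1
                break
    return matches


def similarity(lane_a, lane_b, tolerance):
    """Matching bands divided by the distinct bands in both lanes."""
    m = count_matches(lane_a, lane_b, tolerance)
    distinct = len(lane_a) + len(lane_b) - m
    return m / distinct if distinct else 1.0


def upgma(names, dist):
    """UPGMA (average linkage) on a symmetric distance matrix.

    Returns the tree as nested tuples of sample names.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average keeps every sample equally weighted.
            d[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


def build_tree(names, lanes, tolerance):
    """Similarity matrix -> distance matrix -> UPGMA tree."""
    n = len(lanes)
    dist = [[1.0 - similarity(lanes[i], lanes[j], tolerance) for j in range(n)]
            for i in range(n)]
    return upgma(names, dist)
```

The Euclidean-distance variant mentioned in the text would only change how the distance matrix is filled; the clustering step stays the same.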

CONCLUSION
We have reviewed the existing gel image analysis software tools and compared them with our new software, which is developed mainly for computing the weights of the unknown bands in gel images, clustering the samples, drawing the phylogenetic tree and printing a report. Based on the results, we found that our software is superior to many current gel image analysis tools. Students and researchers working in genetics and molecular biology can use it. It is free software with a smart GUI that guides the user, it offers real-time adjustment through a track bar for manual corrections, and it is small enough to be downloaded easily and installed on Windows. However, this first version still lacks some capabilities, which will be added in the next version of our software. The missing capabilities that could be added in future versions are:
(1) More clustering algorithms.
(2) Different algorithms for image processing.
(3) A database for saving all experiments and displaying them at any time as needed.
(4) A mobile application version.
Furthermore, the main capability specific to our software is grouping the bands by molecular weight using the k-means algorithm and labeling each band in the gel image with its group number.
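As an illustration of the band grouping mentioned above, the following Python sketch runs k-means on one-dimensional molecular weights and returns a group number for each band. The number of groups, the deterministic initialization from the sorted weights and the function name are assumptions; the source does not describe how the software configures k-means.

```python
def kmeans_1d(weights, k, max_iter=100):
    """Group band weights into k clusters with Lloyd's k-means algorithm.

    Returns (labels, centers); labels are 1-based group numbers in the
    same order as the input weights.
    """
    if not 1 <= k <= len(weights):
        raise ValueError("k must be between 1 and the number of bands")
    ordered = sorted(weights)
    # Assumption: start from evenly spaced picks in the sorted weights,
    # which makes the result reproducible (no random initialization).
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each band joins its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(w - centers[c]))
                      for w in weights]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [w for w, lab in zip(weights, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return [lab + 1 for lab in labels], centers
```

The returned group numbers could then be drawn next to each band in the gel image, as the text describes.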