Unsupervised Local Spatial Mixture Segmentation of Underwater Objects in Sonar Images

In this paper, we focus on the segmentation of sonar images to achieve underwater object detection and classification. Our goal is to achieve accurate segmentation of the object's highlight and shadow regions. We target a robust solution that can manage different seabed backgrounds. Segmentation of sonar images is a challenging task. Speckle noise and intensity inhomogeneity may cause false detections and complex seabed textures, such as sand ripples and seagrass, often leading to false segmentation. In this paper, we propose our local spatial mixture (LSM) method for image segmentation of sidescan deployed sonar systems of any type. This new method estimates pixel labels in sonar images by incorporating the possible spatial correlation between neighboring pixels for improved segmentation. LSM modifies the expectation–maximization algorithm by adding an intermediate step (I-step) between the expectation (E-step) and maximization (M-step) steps. To combat intensity inhomogeneity, we employ a new initialization algorithm, one whose thresholds are set automatically to achieve and maintain robustness in various underwater environments. Using multiple evaluation indexes that measure the geometrical features of the segmented objects, we tested LSM using synthetic and real sonar images, one of which is obtained from our own sea experiment. Our results show that LSM achieves improved segmentation performance over the state-of-the-art methods of four different approaches; LSM is also robust to different seabed textures and intensity inhomogeneity. We share the sonar images from our sea experiments.

of pretrained object detectors, is proposed in [2].In this paper, we offer local spatial mixture (LSM): a method that produces accurate segmentation of the object's borders and of the image's background.LSM is aimed specifically for images produced by a side-mounted sonar system, such as sidescan sonar, mutibeam, and SAS.Image segmentation is the second step of the process of automatic detection and classification (ADAC) and its quality determines to a great extent the performance of the classifier.In this respect, LSM is a blind segmentation method that does not require a priori knowledge of the shape or size of the object.LSM is also robust to the types of both sonar system and seabed.
Sonar image segmentation is an important step in the ADAC chain.A typical ADAC scheme is presented in Fig. 1.The ADAC process works as follows: first, an ROI is detected from the original image.Then, segmentation is performed for the ROI.The segmentation results are used to extract the geometrical features of the detected objects.In the final step, a classification is performed based on the extracted features.In this paper, we focus on the segmentation part of the ADAC process for sonar images.We perceive segmentation as the key to effective ADAC performance.Three types of regions are identified in sonar images: highlight, shadow, and background [3].The highlight originates from reflections of the acoustic wave from the object; shadow regions are created due to the lack of acoustic reverberation behind the object; and background originates from seafloor reverberation.An example of all three regions combined in a single sonar image is presented in Fig. 2. The background regions are identified by rectangles, and the highlight and shadow regions are identified by red lines.Our goal is to accurately identify the pixels in the image that relate either to the highlight or to the shadow regions of an object, and which are separated from the background.That is, in contrast to pure segmentation, we only care about the correct labeling of pixels in relation to the object.
Sonar image segmentation is mostly challenging due to strong reflections from the seafloor and the inhomogeneity of the water column.These phenomena can cause a false segmentation of the image background as an object highlight/shadow.Another challenge is posed by complex seabed textures, such as sand ripples, seagrass, and boulders.Boulders and seagrass may increase the areas in the image with false segmentation.Furthermore, the echoes of the object and sand ripples may blend together, causing a brutal distortion of the geometrical features of the object's shadow.These distortions are because traditional segmentation techniques for (optical) natural images cannot be directly employed for sonar images.Supervised and unsupervised methods are the main approaches in sonar image segmentation.The former uses a database of labeled examples of the target object, whereas no prior training set is available for the latter.Given a reliably labeled database, supervised segmentation will produce better results than unsupervised segmentation.However, since sonar images tend to vary significantly, according to the angle and distance of the observed object, a reliable database is available only for a very limited set of objects.This problem is often exacerbated, due to the existence of rocks and the varying structure of the sea surface, which may produce false detections when using a limited database.In this paper, we focus on unsupervised image segmentation.We assume that we know the parameters of the sonar system (i.e., its carrier frequency and overlooking angle) as well as the depth of the water.We also assume that we have an upper bound for the size of the object and thus can calculate the expected size of the object in the sonar image.Still, we assume no training set for the object of interest.In other words, the actual shape and structure of the target in the sonar image are not known.Under this setting, we perform unsupervised sonar image segmentation.
Our LSM segmentation algorithm is targeted to be computationally inexpensive and to show robustness to intensity inhomogeneity.LSM first performs an initial segmentation of the sonar image into three clusters: highlight, shadow, and background.Then, segmentation is performed using the expectationmaximization (EM) algorithm [4].To achieve robustness to different backgrounds, we have solved the complex analysis for a generalized mixture distribution model.The result is a vast improvement in segmentation compared to the commonly used Gaussian mixture model.To avoid local minima points during the EM process, we issue a novel initialization procedure based on the detection of local shadow and local highlight regions.
The main contributions of this paper are summarized as follows.
1) A mixture model-based method to significantly improve segmentation accuracy compared to four benchmarks that represent the state-of-the-art methods: Dempster-Shafer's theory, Markov random field (MRF), graph cut, and fuzzy logic.2) A novel intermediate step that utilizes the expected spatial dependence of the image's pixels.The result is a new softhard method, which evaluates the posterior.3) A novel initialization procedure with an automatic selection of thresholds that handles the intensity inhomogeneity of sonar images and ultimately reduces the risk of false segmentation and local minima points.4) A statistical analysis of the segmentation results and analysis of real sonar images using novel quantitative measures.We investigate LSM performance in both numerical simulations and using real sonar images.For the former, we implemented a simulator to produce sonar images.For the latter, we use the EdgeTech database, which includes multiple sonar images from different marine environments [5].The results show that, at a cost of a slight complexity increase, our algorithm exceeds the state-of-the-art methods of four different approaches in terms of false segmentation, object shape measures, and object geometrical features.
This paper is organized as follows.In Section II, we survey the state-of-the-art methods of image segmentation.In Section III, the system model is presented.Section IV contains the details of LSM, whereas numerical and experimental results are presented in Section V. Finally, concluding remarks are provided in Section VI.The notations used in this paper are summarized in Table I.

II. RELATED WORK
We identify three main approaches for image segmentation.The first is the fuzzy-based segmentation, the second is the mixture-model MRF, and the third is the active contour (AC) method.Regarding the former, fuzzy C-means (FCM) is a widely used algorithm in image segmentation [6] and [7].FCM performs segmentation via nonlinear optimization, which is based on an objective function.The MRF approach usually involves the estimation of the posterior probability to manage the spatial dependencies between neighboring pixels, such that outliers with respect to their neighborhood are excluded [8].The AC model has been applied successfully to image segmentation [9], and is considered to be robust to noise and to generate smoothed and accurate contours.The segmentation problem can also be solved using graph theory.For example, in graph cut image segmentation, by minimizing the Gibbs energy function, a label is assigned to each node in the graph's representation of the image [10].While fuzzy-based and AC-based segmentation work well for certain types of images, they require manual adaptation for a different seabed structure.To obtain robustness to background types and to combat nonhomogeneity, we find mixture model-based segmentation to be a better fit.Mixture-models MRF-based segmentation is less sensitive to noise and outliers in sonar image than FCM, AC, and graph cut.This is mainly since MRF utilizes the local spatial interactions between neighboring pixels; as a result, the label of the pixel is less influenced by outliers.On the contrary, in FCM, the spatial correlation between neighboring pixels is not taken into account; in AC, the gradient method leads to low segmentation accuracy; and graph cut based segmentation is very sensitive to initialization.To handle the nonhomogeneity in sonar images and to avoid trapping to local minima, a manual adaptation of crucial parameters is needed in fuzzy-based segmentation, whereas in AC, manual initialization is required [11].Also, the approach of parametric shape templates [12] in AC is severely degraded in sonar data, and should be adapted manually to generate satisfactory results.In contrast, employing a mixture model of distributions, such as the gamma distribution, allows flexibility in segmenting sonar data.This is because, by setting its parameters appropriately, the representation through gamma distribution captures a verity of distributions [13].The flexibility of MRF-based segmentation, such as the MRF in [8], in handling noise in different regions in the sonar image is reflected by the prior distribution that varies for each pixel corresponding to each label and is set according to the local spatial information.
Due to its importance and numerous applications, the problem of sonar image segmentation has been widely studied.Recently, the AC method has been applied successfully for image segmentation [14] and [15].Two main approaches exist for AC methods: edge based and region based.Edge-based methods utilize image gradients to identify the object's boundaries [16] and [17].In sonar image segmentation, these are related to the boundaries of the highlight and shadows that characterize an object.The region-based method models the background and object regions statistically and finds an optimum, where the model best fits the image [18].Another interesting approach involves the integration of the image's geometric properties.This is performed in the gradient vector flow (GVF) algorithm, which extends the gradient vector of an edge map, thereby suppressing noise.Wang et al. [19] proposed normally biased GVF to combat weak edge preserving and to achieve noise robustness.
The main limitation of the above-mentioned algorithms is their sensitivity to the initialization step.In particular, if the AC is initialized without comprising a part of the object, the algorithm will diverge.Recently, Fandos and Zoubir [20] proposed a new polygonal AC algorithm for sonar imagery that utilizes the likelihood function of the object region and background region.The initial contour is drawn from the results of the Markovian segmentation.Li et al. [21] proposed the region scalable fitting (RSF) model that utilizes local information in the cost function to deal with intensity inhomogeneity.This local information is based on a weighted average of pixel intensities in a Gaussian window in the object region and background region.However, since the cost function of the RSF model is nonconvex, the final segmentation results are very sensitive to the initial selection of the segments.Javed et al. [22] proposed a method that improves RSF in terms of accuracy and efficiency.The model is built on local entropy and local variance as inputs to fuzzy inference engine to extract proper weights to be used in the cost function.Yang et al. [15] proposed a nonlocal means-based despeckling filter as a pretreatment to improve robustness to multiplicative noise for sidescan sonar images.The authors also integrated an edge-driven constraint into the RSF model to avoid convergence to local minima and to reduce complexity.Jing et al. [23] proposed a novel global minimization AC model (GMACM), which is a convex version of the RSF.GMACM has been applied for oil slick segmentation in synthetic aperture radar (SAR) images with satisfactory results.However, GMACM only utilizes the local mean information, which is not sufficient to accurately model an image with intensity inhomogeneity; hence, its performance is limited.Recently, Song et al. [24] proposed a globally statistical AC model (GSACM) that takes into account not only the local average, but also the local variance of the image's intensities.GSACM is more robust to the initial selection of object and background boundaries since the algorithm solves an unconstrained convex cost function, in contrast to the GMACM, which omits the constrained term when minimizing the cost function.
In [11], Fandos et al. apply a high complexity graph theory based method aimed for a synthetic aperture sonar image.The method splits the image into two groups of pixels, one assigned to the shadow region and the other to the background.The iterative conditional modes (ICM) method is used for the segmentation of the highlight region.Celik and Tjahjadi [25] propose an unsupervised multiscale segmentation method for seabed mapping using sonar imagery.A multiresolution representation of the image is obtained by applying the wavelet transform.A feature set is then computed for each pixel and Kmeans clustering is used.For initialization, the particle swarm optimization method is used in [26], whereas in [27], initialization is based on the fractal segmentation.Maximum likelihood (ML) was used to estimate the distribution parameters of the Markov model, and the iterative conditional estimation algorithm serves for final segmentation.
Finite mixture models have also been widely used in image segmentation to incorporate spatial correlation between pixels [28] and [29].Zhang et al. [30] substituted the prior probability of the pixel label with an estimate of the MRF distribution.The diffused EM (DEM) algorithm [31] uses an anisotropic diffusion step between the E-step and the M-step of the EM algorithm.Recently, the MRF and Dempster-Shafer theory (DST) have been utilized to handle spatial correlation among neighboring pixels [32] in sonar images.In DST-based clustering, pixel labels are used as evidence.Then, according to Dempster's rule, all possible combinations of hypotheses are considered for decisionmaking for pixel i.The work by Nguyen and Wu [8], which represents the state-of-the-art method in MRFs-based segmentation, incorporated the spatial correlation among neighboring pixels by using a novel smoothing prior.This new model was applied in the MRF distribution to form the log-likelihood function initialized via the K-means algorithm.Yet, a limitation of this method is that a single parameter controls the smoothing prior, which may limit the tolerance to noise.Wu and Chung [33] proposed a compound MRF model to represent the boundary of an object.Analyses conducted on simulated images, as well as on real clinical images, reveal an improvement in segmentation and accurate object boundary results.Pan et al. [34] proposed a new mixture model, based on the Student's t-distribution [35] and the MRF.The main advantage of the Student's t-distribution is that it is heavily tailed, and hence provides a much more robust approach than the standard Gaussian mixture model (GMM).In [36], at a high computational cost, Nikou et al. used local spatial interactions between neighboring pixels.The probability of pixel labels is modeled using the Dirichlet compound multinomial density function.To impose smoothness, the Gauss MRF is utilized on the Dirichlet parameters.
While the above surveyed sonar segmentation methods perform well in simple underwater environments, they fail to segment the image in complex seabed textures, such as sand ripples or seagrass.This is mostly due to the problem of intensity inhomogeneity, but also due to nonaccurate segmentation initialization.Considering this challenge, we offer a novel robust initialization algorithm and a new model for incorporating the spatial information among neighboring pixels.

III. SYSTEM MODEL
This paper focuses on the segmentation of sonar images produced from sonar systems mounted on autonomous underwater vehicles (AUV).The goal of image segmentation is to accurately estimate the pixel labels.Let A be a two-dimensional (2-D) sonar image of size N with A i , i = 1, 2, ..., N , being the intensity level of the ith pixel.Let L be a set of labels for the image pixels with |L| = L, where | • | stands for the dimension of a group.We consider three labels L = {S, B, H}, where S, B, and H stand for shadow, background, and highlight, respectively.The result of the segmentation process is an assignment l i ∈ L for each pixel i.Let e l denote an L × 1 vector whose lth component is one and the rest are zero.Also, let I i be an indicator vector with I i = (I i,1 , ..., I i,L ) T ∈ {e 1 , ..., e L }.I i is a discrete random variable with a probability function defined by Prob(I i = e l ) = π i,l ∀ i, l whose distribution is given as ( We use the gamma distribution with parameters α l and μ l suitable for sonar images [37] such that where θ l = [α l , μ l ] T and (•) is the gamma function, and we have (3) The set of parameters is defined by ≡ (π 1 , ..., π N , θ 1 , ..., θ L ), where π i = (π i,1 , ..., π i,L ), i = 1, . . ., N .Using ( 1) and ( 3), we get the likelihood as follows [4]: Denote the complete data set by y T = (A T , I T 1 , ..., I T N ), where A is an N × 1 vector containing all pixel intensities.Then, assuming the components of A are independent identically distributed, as well as the components of {I i }, the conditional distribution of the complete data set is (5) Note that, while traditionally the prior does not depend on specific pixels, in (5), the priors π i,l are pixel dependent.We use this formalization to allow incorporating intensity inhomogeneity of sonar images.Also, the likelihood function in (5) does not incorporate local correlations between neighboring pixels.Hence, applying the ML algorithm directly on (5) does not utilize all available information.

IV. PROPOSED LSM METHOD
In this section, we present the details of our LSM segmentation method.LSM is composed of two main stages: the first is the initialization stage and the second is the clustering stage.The clustering stage is based on a modification of the EM algorithm and is initialized by the first stage as a good starting point to avoid trapping to a local minimum.We start with the initialization of the segmentation procedure and follow with a full description of LSM.

A. Initialization Method
Due to the frequent speckle noise in sonar images, it is crucial to produce good initial clusters to improve the convergence rate as well as the final segmentation results.Referring to the pseudocode in Algorithm 1, our initialization process is composed of three stages.In the first stage, the shadow segmentation process is performed.In the second stage, we identify the highlight-related samples.Finally, the remaining pixels are assigned the background label and the prior is initialized with π (1)   i = (0, 1, 0).To accelerate the processing time of the shadow and highlight segmentations, we process the integral image.
1) Integral Image: An integral image is a representation of the original image, which allows for a very fast computation of rectangular arrays [38].Let u i and v i denote the horizontal and vertical axes of the ith pixel in image A, respectively.Let K (u,v) {( j) : u j ≤ u, v j ≤ v} be the set of all pixels located above and to the left of the pixel at location (u, v).The integral image of A is given by Fig. 3(a) illustrates the process of a windowing architecture.The upper and lower red windows contribute to the calculation of the background, whereas the yellow window contributes to the calculation of the shadow and highlight.Using the integral image, the mean value of pixels inside a window is easily calculated and complexity improves.In Fig. 3(b), the blue region stands for the original image and the sum of all pixels in the gray region is A u i , v i .Let {a, b, c, d} be the four corners of a window [see Fig. 3(b)], then the sum of all pixels inside this window is given by: 2) Shadow Segmentation: Two maps are calculated based on the integral image, namely the background map and the shadow map.The background value of pixel i in the background map is given by where {a, b, c, d} and {e, f, g, h} are the four corners of the upper and lower windows, respectively [see Fig. 3(a)].The above-mentioned value is calculated using a split-window template composed of three equally sized squared windows with a size of γ 2 b .The size of the window is set according to the size of the object of interest such that the object is contained within the middle window.The parameter γ b is selected such that an object of interest is fully contained within, and as a result, it does not contribute to the estimation of the background.Therefore, the exact size of the object is not required, but only its bounds.Let the center of this template be in a (u i , v i ) position.The background map is the mean pixel value while accounting for the top and bottom windows only.We avoid the middle window since the object of interest may be covered by this window.These windows are stacked vertically and not horizontally since the highlight of the object, in sonar imagery, lies from left/right (depends on the sonar system) of the shadow region.
The shadow map S i corresponds to the mean value of the pixels at a squared window located at (u i , v i ) with a size of γ 2 Parameter γ s is chosen to be small enough to keep the object's boundaries.Our experiments reveal that γ s = 5 gives the best results.The intensity of pixels in a shadow region is lower than the intensity of the background.Thus, shadow segmentation is extracted by identifying pixels with a local mean intensity that is lower than the mean background intensity.For shadow segmentation, we develop a new initialization method that best fits sonar images.Our method is inspired by Williams [1].However, the work in [1] deals only with object detection, where the features of the object are of less interest, whereas to achieve the additional aim of object classification, our work also segments the object's regions.More specifically, in contrast to [1], in our case, the size of the window for calculating the shadow map should be small to maintain accurate segmentation along the object's boundaries.Moreover, we automatically select the threshold parameter according to the image's intensity distribution to allow for accurate segmentation in all of the object's regions.The label of the ith pixel is initialized as a shadow label if B i s > β s S i , where β s is an automatically set threshold parameter, as we will explain further.In this case, the prior for pixel i is replaced with π (1)   i = (1, 0, 0). 3) Highlight Segmentation: In sonar images, the background's intensity level is mostly dependent on the structure of the seabed.A sonar image may contain different types of seabed structures, such as sand or boulders.In such a case, the variations in the background's intensity level are high and the task of highlight segmentation is challenging.Considering this challenge, we apply the kth-order statistics [39].Let R γ s (u,v) and R γ b (u,v) be the set of all pixels in a squared window with sizes γ 2 s and γ 2 b , respectively, centered at (u, v) excluding the kth pixels of the largest intensities in each row of the window.The highlight segmentation is calculated using a small rectangle centered in the pixel of interest.
Formally, The highlight map H i is given by and the background map is A j (10) where i 0 is an arbitrarily located pixel in the upper righthand corner of the original image.We label pixel i as highlight if , where β h is an automatically set threshold parameter, as explained further.The prior is replaced with π (1)   i = (0, 0, 1).Note that unlike B i s , which is calculated for each pixel in the image, B i 0 h is constant to preserve the object's highlight boundaries.
4) Automatic Setting of Threshold Parameters: We now describe a method to automatically set the threshold parameters β s and β h .This is required to increase the robustness of our algorithm to different seabed environments.Using a fast approach with reasonable segmentation results, we utilize the ICM method [40] to extract an initial partition of the image into three clusters, namely highlight, shadow, and background, denoted by C H , C S , and C B , respectively.Next, we calculate intensity thresholds F s and F h , defined as the intensities for which the number of pixels with false segmentation is minimized.The intensity threshold F s is determined by solving where I is an indicator function that is unity if its argument is true, and zero otherwise.Similarly, F h is given by solving Since the thresholds β s and β h stand for the gap between the local shadow's mean and background and local highlight's mean and background, we set the threshold parameters β s and β h as the ratio between the intensity thresholds and the average of the shadow or background clusters.Formally The initialization process is described in Fig. 4, and its pseudocode is given in Algorithm 1.

B. Image Segmentation Using Modified EM
Our LSM sonar image segmentation algorithm is similar in its structure to the DEM method [31].Yet, while DEM showed satisfactory results for natural images, the results for sonar images severely degrade (see [37,Figs. 4 and 5]).To fit the case of sonar images, our LSM method replaces the DEM's diffusion step with an intermediate step, which takes into account the local intensity inhomogeneity in sonar images to achieve more accurate segmentation results.The key idea is to achieve robustness to different seabed structures by allowing the labeling of each pixel to be influenced by the states of its neighbors.While MRF-based methods utilize the spatial dependence among neighboring pixels via the smoothing prior according to some model, our approach is based on utilizing both the local texture and local intensity of the neighborhood.
1) Gamma Mixture Model for Spatially Independent EM: The basic EM consists of a conditional expectation (E-step) of the log likelihood and its maximization (M-step).The E-step is given by where (m) are the estimates of the parameter sets at the mth iteration.Based on (5) and ( 15), the conditional expectation is given by Algorithm 1: Proposed Initialization Algorithm.

Input: The original image A;
Output: The pixel labels; Begin 1: Calculate A using (6); 2: Calculate B i 0 h using (10); 3: Set the thresholds β s and β h using ( 13) and ( 14 The expression for the posterior v (m)  i,l is given by [4] where D is a normalization factor given by l=1 L π (m) i,l φ(A i |θ (m) l ).In the M-step, the parameter set (m) is updated by with the constraints l l=1 π i,l = 1 for i = 1, ..., N .This maximization problem is solved using a Lagrange multiplier η and by maximizing the following quantity: The solution of ∂ϒ/∂π i,l = 0 yields the following equation: From (17), we can get and using ( 20) and ( 21), we get the update Fig. 5.The second-order neighborhood configuration of pixel i in N i .This local neighborhood contains eight pixels.
The update for the parameter μ l is the solution of the equation ∂ϒ/∂μ l = 0 and is given by the closed form expression The estimation of α l is set by solving ∂ϒ/∂α l = 0, i.e., ln α (m+1) where ψ(•) is the digamma function.Since (24) has no closed solution, we adopt the following approximation [41]: Substituting (25) into the left-hand side of (24) leads to an approximate closed solution for α (m+1) l , as the positive root of the equivalent quadratic equation.

2) Intermediate
Step: In LSM, we utilize the expected spatial dependence between the image's pixels through an I-step performed in-between the E-step and the M-step.To that end, we already assign labels to pixels during the I-step such that in the M-step, we update the set parameters (m) based on the assigned labels.The labeling of pixels in the I-step is performed by i,l (26) where v (m)  i,l is obtained during the E-step.Let the neighboring group of a pixel i be N i .Note that N i includes a set of pixels such that i ∈ N i and ∀ i ∈ N i , i ∈ N i .In our application, the second-order neighborhood system is chosen [20].The neighborhood of pixel i is defined by 6 , A i,7 , A i,8 }.Fig. 5 shows such a neighborhood configuration.Let l N i be the number of pixels in N i with an assigned label equal to l and τ and δ be the incidence threshold and intensity threshold, respectively.Let O be the group of outlier pixels.Then, pixel i ∈ O if the following two conditions hold: Due to the intensity inhomogeneity of sonar images, we give special treatment to pixels, which are identified as outliers.For each identified outlier, we enforce the smooth posterior to maintain the maximum value of the label that corresponds with the majority of its pixel's neighborhood.Labeling in the I-step is performed by introducing a smoothed posterior, v (m+1) i,l whose value in the (m + 1)th iteration is and the weight (m+1) i,l of each ith pixel for each label l is The parameter σ l is the standard deviation of the pixels belonging to label l and is calculated using the parameter sets θ l as follows: Note that in (29), to exclude local outliers and to keep the boundaries of the object, higher weight is given to the majority labels of the local neighborhood as well as to the label that suits the statistics of the pixel, i.e., i,l → 1 as N l i → |N i |, which reflects the local label information.Moreover, as the statistics of A i suits the model of the lth label, i,l gets higher value.From (29), we observed that a priority is given to a label l in (m)   i,l whose N l i value is high.This means that the smooth prior of the ith pixel seeks a homogeneous environment.
3) Connecting the I-step and the M-step: The connection between the I-step and the M-step is performed as follows.First, posterior v (m)  i,l is substituted by v (m+1) i,l from (27) in the update of the prior probability π i,l in the M-step.Next, the parameter set is updated through a hard decision on the smooth posterior.This is done to improve the convergence rate.Formally, at the beginning of the M-step, we replace the posterior v (m)  i,l with where the assigned label for pixel i at iteration number (m + 1) is given by Algorithm 2: LSM Segmentation Algorithm.Input: The original image A; Output: Pixel labels {l i }; Begin 1: Initialize the priors {π (1)  i }, ∀ i, using algorithm (1); 2: Set {v (1)  i,l } ← {π (1)  i }; 3: Calculate {θ (1)  l }, ∀ l ∈ L; 4: Set m = 2; 5: while m < M max do 6: Calculate the posterior probabilities {v (m)  i,l } using (17); 7: Perform a hard decision on {v (m)  i,l } and get pixels label { l (m) i }; 8: Calculate {v (m+1) i,l } and {ω (m+1) i,l } using ( 27)-(32); } and update prior probabilities using ( 22); 10: Calculate gamma distribution parameters using (23) and ( 24) with {v (m) i,l } ← {ω (m+1) Stop the loop and output the pixel labels {l To summarize the operation of our segmentation algorithm, we illustrate a single cycle of the segmentation process in Fig. 6.The algorithm iterates until the number of maximum iteration steps M is reached or the results converge.Let U (m) be the log of the likelihood given in (5).Then, the latter is determined when |U (m) − U (m−1) | < , where is a user-defined parameter.The pseodocode for the process is given in Algorithm 2.

V. EXPERIMENTAL AND COMPARATIVE STUDIES
In this section, we explore the performance of LSM, as well as those of four state-of-the-art methods that represent different segmentation approaches and that yielded the best results for the segmentation of sonar images (to achieve object detection and classification).The first benchmark is the E-DS-M [32], which uses the evidence theory in the EM's I-step; the second is the Nguyen method in [8], which employs an improved formalization for the smoothing prior in the MRF model; the third benchmark is the robust fuzzy thresholding technique [42], which utilizes soft clustering via fuzzy membership degree; and the fourth benchmark is the graph cut algorithm [10], which uses a graph representation of the image.An efficient implementation of the graph cut method for sonar image segmentation [43] has been utilized.In this implementation, the initial seeds are chosen after the ICM segmentation results.The implementation of the graph cut algorithm, for sonar imagery, was performed according to [11].In this implementation, the highlight region was segmented according to the ICM method, and the graph cut algorithm splits the image into two groups of pixels, one assigned to the shadow and the second to the background.
To study the effect of the seabed texture on the proposed segmentation method, three types of seabed textures are analyzed: sand, seagrass, and sand ripples.The proposed initialization algorithm is compared with the most common approaches, namely the K-means [44] algorithm and the four-region [45] method.We test performance over both simulated sonar images and real sonar images.To verify the performances of LSM as blind segmentation method, we analyze the segmentation results on different kinds of object's shapes.This includes a synthetic cylindrical object and arbitrary geometry of real underwater objects.
Unless otherwise mentioned, the following setting is considered: the compared methods are initialized with a fourregion, with a window of 5 × 5 pixels.This window has been demonstrated to produce satisfactory results [45].The parameters of the proposed initialization algorithm are set with: γ b = 50, γ s = 5, and k = 3 , which give the best initialization results.The threshold parameters τ and δ are set to 4 and 1.5, respectively.The maximum number of iterations is set to 50 and the convergence threshold is set to = 0.001.To choose the parameters of the benchmark methods, we perform a statistical analysis.For the E-DS-M, we choose γ 1 = 0.1 and γ 2 = 1.5 as the best fit, as obtained in the numerical analysis of [32].For the Nguyen method, β = 12 is used [8] and the neighborhood ∂ i is a square window with a size of 3 × 3 pixels.For the fuzzy thresholding method, the best results are obtained using a median filter with a neighborhood block with a size of 3 × 3 and a fuzziness index IEEE JOURNAL OF OCEANIC ENGINEERING of 2 [42].As a good compromise [43], the parameters h and v of the graph cut implementation were set with 0.1

A. Quantitative Measures
The performance analysis of a sonar image segmentation is more than the counting of erroneous pixel identification and should involve determining the shape and formation of the segmented objects.Yet, we have not yet found a proper definition for such measures.Considering this need, in this section, we formalize our quality measures for the segmentation of underwater images containing a submerged object.As statistical quantitative measures, we use the misclassification ratio (MCR) [46] and the probabilistic rand (PR) index [47].MCR is defined as the number of misclassified pixels normalized by the total number of pixels and is given by The PR index measures the similarity between each pixel pair in the image and is given by where p i j = Prob(G i = G j ).Clearly, segmentation improves, the higher the PR is the lower the MCR is.
To allow for a quantitative assessment of the segmented object, we determine three shape measures, namely, the position fit ς p , the shape fit ς s , and the size fit ς n [48].The position fit is given by where c u is the difference in the horizontal direction of the center of the object (in region c) between the segmentation results and the ground-truth map, c v is the difference in the vertical direction, and C H and C S are the highlight and shadow clusters, respectively.Explicit expressions for c u and c v are given by where N c u and N c v are the number of rows and columns in the image, u c S i and v c S i are the average of the abscissas and ordinates of the object (in region c), and u c G i and v c G i represent the groundtruth map averages.The size-fit quality measure is given by where N c f and N c g are the number of pixels of the object at region c in the segmentation results map and the ground-truth map, respectively.Shape fit is given by where N c f ∩g is the number of pixels in the intersection between the results map and the ground-truth map in the object region c and N c f ∪g is the union between the two maps.Clearly, segmentation results improve as ς p , ς s , and ς n increase.
To further examine segmentation performance, we consider the geometrical features of the segmented object.Specifically, we consider the perimeter P of the object, area ratio R area , and solidity Sol.The area ratio is defined as the ratio between the areas of the shadow and the highlight regions, and solidity is the ratio between the area of the object and the area of its convex hull.Based on these geometrical features, we define the following evaluation indices: ) ) where P S f and P S g are the segmented and ground-truth perimeters of the object's shadow, respectively, P H f and P H g are the segmented and ground-truth perimeters of the object's highlight, respectively, Sol S f and Sol S g are the segmented and ground-truth solidities of the object's shadow, respectively, Sol H f and Sol H g are the segmented and ground-truth solidities of the object's highlight, respectively, and R area, f and R area,g are the segmented and ground-truth area ratios of the object, respectively.In general, segmentation improves, the lower the values of κ perim , κ area , and κ sol are.

B. Results on the Synthetic Images
An example of the synthetic sonar images used in our simulations is shown in Fig. 9(a).The figure contains three images of sand, seagrass, and sand ripples.Sand and seagrass seabed textures are generated according to the models in [49], whereas the sand ripples texture is generated according to [50].The object region and background are synthesized separately.The object's intensity level is modeled by a gamma distribution with μ s = 120 and σ s = 10 for the shadow region and μ h = 1 and σ h = 0.1 for the highlight region.Similar to the model proposed in [37], the object and background are superimposed as follows: where χ i,h , χ i,b , and χ i,s stand for the intensity level of the ith pixel in the highlight, background, and shadow regions, respectively.The size of the synthetic images is 300 × 300.A cylinder structure is considered as the object in all synthetic images.To test performance statistically, we performed 1000 Monte Carlo simulation runs.In each run, a new background image was realized, each of which was corrupted by a speckle noise with an STD of 0.3.

1) Parameters Study of γ and δ:
To comment on the sensitivity of our algorithm to parameter choice, in this section, the effect of the two parameters γ and δ is analyzed.These two parameters control the size of the outliers group O. Specifically, a small value of γ increases the number of outliers, whereas a large value increases false segmentation.In contrast, δ controls the relative intensity threshold for outlier declaration.We study these two parameters in the synthetic seagrass image presented in Fig. 9(a).In Fig. 7, we show the MCR (33) values calculated for different values of γ and δ.To set the optimal thresholds, a 2-D search for the threshold combination of the incidence threshold and intensity threshold is required.Fortunately, l N i ∈ {1, ..., 8} such that the complexity of this operation is low.Hence, we calculate the MCR(γ, τ j ) values ∀ τ j = {1, ..., 8}, and select those γ and τ values that lead to minimum MCR.We observe that the best MCR results are obtained for γ = 4 and δ = 1.5.
2) Results for the Proposed Initialization Algorithm: The initialization results of the benchmark methods, K-means, fourregion, and our LSM are presented in Fig. 8 for the seagrass (upper row) and sand ripples (lower row) examples.We notice that the object's shadow is almost fully blended with shadow pixels.Similarly, due to the high-intensity inhomogeneity that exists in the seagrass image, the results presented in Fig. 8(c) show false segmentation for four-region initialization.However, the results also show that the borders of the highlight and shadow of the cylindrical object are preserved.The results of Fig. 8(e) show that the proposed initialization method achieves a clean segmentation performance.For the sand ripple background, the results presented in Fig. 8(g)-(j) show a much more noisy segmentation for all four initialization methods.Note that the results of the proposed method in Fig. 8(e) and (j) for pixels near the edges are mostly labeled as the background.This is because, due to the windowing, fewer pixels are included in B i s near the edges of the image, and thus the local background is estimated on less pixels than windows located far from the edges.As a result, near the edges, less pixels pass the shadow and the highlight tests.
Fig. 8(c) shows that four-region is better than K-means in terms of false shadow segmentation.However, clearly the best segmentation result is obtained when using the LSM.In particular, the object's borders are more accurate and the false segmentations rate is low.
3) Quality Analysis: Segmentation results for benchmark methods and those of our LSM are illustrated in Fig. 9(b)-(d), respectively.The assigned pixels to the labels {S, B, H} are shown in different colors: black for shadow, brown for background, and yellow for highlight.While all three methods segment the image well, segmentation performance degrades significantly the more complex the background is.For the seagrass image in Fig. 9(a), we observe many false segmentations, produced by the Nguyen and fuzzy thresholding methods.This includes an enlargement of the object's shadow region.In contrast, E-DS-M has a cleaner background region compared to the Nguyen method.However, the borders of the highlight and shadow regions are corrupted.This phenomena does not occur with our LSM method, which produces clean segmentation, and the object's highlight and shadow are more accurate, compared to the benchmark methods.A similar effect is observed for the sand ripples image in Fig. 9(a).The problem of false segmentation that appears when operating the Nguyen method is more acute, and the object's shadow region blends in with the sand ripples.The performances of E-DS-M, graph cut, and our LSM are cleaner than the Nguyen, and these methods keep the object's borders in the highlight region.However, in the E-DS-M and graph cut methods, the object's shadow borders are less accurate than the ones produced by the LSM.
4) Quantitative Analysis: a) MCR and PR Analysis: We start by showing the performance in terms of MCR (33) and 1-PR, where PR is given in (34).The MCR and PR measures are indicators of the image segmentation's accuracy.The results are shown in Fig. 10 along with the 95% confidence intervals.All methods, except for the fuzzy thresholding, have an average MCR of less than 1% and an average PR of above 98% for the sand image, which is relatively homogeneous.However, for the complex sand ripples and seagrass images, which have strong intensity inhomogeneity, the performances of the compared methods severely degrade, whereas those of our LSM are not affected.LSM's small confidence interval shows its robustness to different background realizations.We therefore conclude that LSM is specifically targeted to a complex seabed environment.
b) Analysis of Shape Measures: Since for object classification, the shape of the detected object is of interest, in our analysis, we also consider the shape, size, and position of the segmented highlight and shadow regions.Table II shows the shape measures, ς n (38), ς s (39), and ς p (35) for the three compared schemes.The values in the table are mean values and     (39), ς n (38), AND ς p (35) FOR SYNTHETIC SONAR IMAGES WITH SPECKLE NOISE (STD = 0.3) Results show that for complex seabed texture, such as sand ripple and seagrass, LSM achieves the best results.the 95% confidence intervals (in brackets).From Table II, we observe that the average ς p for all methods is almost not affected by the background type.We also observe some inconsistencies between the performance of the five methods for different backgrounds, in terms of the other measures.For example, E-DS-M performs poorly for the seagrass image, whereas those of the Nguyen method reduces for the sand ripples image.However, the proposed method shows robust performance with an average of ς n , ς s , and ς p above 0.98, 0.88, and 0.99, respectively, for all images.This suggests that the proposed algorithm also outperforms the compared algorithms in terms of the shape of the segmented object.We report that similar trends are obtained for higher speckle noise.c) Analysis of Geometrical Featuresa: The object's geometrical properties are usually of vast importance in the object classification process.In this section, we analyze our segmentation method performances and the four benchmark methods, in terms of the geometrical features of the segmented highlight and shadow regions.Table III shows the segmentation performance in terms of our geometrical quality measures, κ perim (40), κ area (42), and κ sol (41).From the results, we conclude that the perimeter of the object in the compared methods is mostly corrupted for seagrass and sand ripples backgrounds.Also in this case, LSM achieves the best performance.Furthermore, the LSM's relatively small confidence intervals and the consistency of the results for the different background types show that the LSM is robust for the seabed environment.

C. Results on Real Sonar Images
To complete our investigation, we now discuss results obtained for a set of real sonar images shown in Fig. 12(a).The three images were produced by an EdgeTech 4125 sidescan sonar and we refer to them as body, fish, and bicycle.The size of the images is 230 × 217, 250 × 385, and 350 × 500 pixels, respectively.
1) Initialization Results: We next explore the tradeoff between the miss segmentation and false segmentation of the  highlights and shadows over the real sonar images at the output of the initialization process.Let FS h and FS s be the false segmentation indices in the highlight and shadow regions, respectively, such that where C H and C S are the clusters of highlight and shadow in the segmentation results, respectively, l i is the label of pixel i in the segmentation results, and G i is the label of pixel i in the ground-truth map G. Similarly, miss segmentations MS h and MS s are defined as the average number of pixels that belong to the highlight and shadow clusters in G, respectively, and were not correctly assigned in the segmentation results.Formally where C g,h and C g,s are the highlight and shadow clusters in the ground-truth map G, respectively.Fig. 11(a) and (b) shows the false segmentation and miss segmentation values of the initialization results for the highlight and shadow regions of the body image.The curves in Fig. 11 are generated by computing ( 44)-( 47) from the results of the proposed initialization method for each intensity threshold F on the curves.The automatic process for setting the threshold levels determine the intensity thresholds F h and F s to be 1.07 and 1.15, respectively, and is represented by the vertical lines in Fig. 11(a) and (b).We observe that our process for automatic determination of the thresholds determines the intensity thresholds for both the highlight and shadow regions around the crossing point of the curves, i.e., our method produces a favorable tradeoff between the false segmentation and the miss segmentation results.
2) Segmentation Results: The segmentation results are presented in Fig. 12.For the bicycle image, we observe obvious false segmentation areas when using the Nguyen, E-DS-M and fuzzy thresholding benchmarks, i.e., a large region is incorrectly labeled as highlight.For the body image, the shadow of the object was accurately segmented by all methods, except for the graph cut.However, using the Nguyen method, the highlight of the object is corrupted.Although it still exists, the highlight corruption is less apparent in using the E-DS-M; yet, a much cleaner and accurate segmentation is achieved by the LSM.Conditions of high intensity inhomogeneity can be clearly seen in the fish image.This inhomogeneity causes a serious degradation in the segmentation performance of all methods.However, compared to the benchmark methods, the LSM and graph cut methods produce cleaner results with no obvious false segmentation.The results in Fig. 12 show that the segmentation performance of graph cut is cleaner, i.e., fewer background-based false negative exists.However, for ADAC's main task of object classification, the objective of the segmentation part is not just to provide a clean segmented image, but also to keep the object's formation and avoid distortions to its highlight and shadow regions.This is important since, later, the segmented highlight and shadow areas are used as extracted features for the classification.In that context, the results in Fig. 12 show that graph cut significantly distorts the highlight and shadow regions of the objects, whereas LSM provides an accurate object-related segmentation.

D. Sea Trial Results
To comment on the performance of LSM in a realistic sea environment of complex seabed and over different sonar systems, we performed three sea experiments.In these experiments, we used a 400-kHz multibeam sonar EM 2040 mounted on the RV Bat-Galim research ship and two Kraken-made SAS systems mounted on the sides of our own A18 5.5 m Eca Robotics Inc., AUV [51].A picture of the SAS system onboard our AUV shown in Fig. 13(b).The first trial included the multibeam system scanning an area of depth 25 m and detecting two targets of truncated metallic cones whose picture is shown in Fig. 13(a).The second experiment involved the multibeam system scanning an area of extreme complex seabed that includes rocks, sand, and reef at depth of 20 m.The third experiment included the AUV scanning with its SAS as deserted gas well as depth of 1000 m.The three produced images are of complex background and include diverse targets.To allow reproducibility, we share these original three sonar images in [52].
The resulting original multibeam sonar image from the first experiment is shown in Fig. 14(a).The image includes two targets (shown in the middle of the image), a diverse background of sand (right side), rocks covered with sediments (middle), exposed rock (left upper corner), and a crack (left-hand side).Segmentation results for LSM and the other four benchmark methods are shown below the original image.From the segmentation results, we observe that the shadow of the object is clearer than the highlight.Hence, only pixels that relate to the shadow of the objects are segmented well.Comparing the segmentation performance, we conclude that also for the sonar image generated in our experiment, LSM outperforms the four benchmark methods in terms of segmentation accuracy of shadow and background regions, as well as in preserving the shape of the segmented object.
The sonar image obtained in the second trial is shown in Fig. 14(b).The image includes a highly complex background of rocks, boulders, and sand ripples.To test segmentation performance, we included two synthetic targets shown in the middle and right lower side of Fig. 14(b).One target is placed on a rocky area, whereas the other is placed on sand ripples.We observe that all methods segmented successfully the object on the rocky area, whereas the segmentation results for the object on the sand ripple were satisfactory only for graph cut and LSM.The segmentation results in Fig. 14(b) revealing that LSM obtained the most clean background.
The sonar image obtained in the third trial is shown in Fig. 14(c).In column (c) of Fig. 14, we introduce the segmentation results of LSM and the benchmark for the SAS image.In the middle of the figure, we see the sonar reflection from the gas well.This structure includes a 4-m-high metal pole whose highlight intensity is extremely high and whose shadow is very long.The result is an image with a complex distribution for the highlight and shadow regions.However, we observe that all methods segment successfully the shadow region of the object, strong intensity inhomogeneity in the original image causes a severe degradation in the segmentation results of the highlight and background regions.In the segmentation results of fuzzy thresholding, the highlight region of the object is merged with the background.Also here, LSM achieves the most clean results while keeping the object's boundaries.

E. Comparison of Running Time
Since the I-step is performed as part of the EM's main loop, it does not influence the complexity of the whole process and Fig. 15.Running time for the five compared segmentation methods for different sized sonar images.The maximum number of iterations is 50 and the convergence threshold is set to = 0.001.The results show that the graph cut method is faster than the other algorithms.the complexity of the LSM is similar to the basic unconstrained EM algorithm, namely, O(M max NL).This is similar to the complexity of the Nguyen method, whereas the complexity of the E-DS-M is O(M max NL 2 ).
To demonstrate the running time of the LSM and compare it with that of the benchmark methods, in Fig. 15, we compare the average running time of the five methods, as a function of the sonar image size.All experiments were performed using a MATLAB R2011a installed onboard an Intel(R) Core(TM) i7-3770 CPU @3.4 GHz, 8GB RAM processor.We observe that for an image of 150 × 150 pixels and above, the E-DS-M is time consuming and cannot be used for real-time applications.We also observe that the Nguyen method is slightly faster than our LSM.The proposed algorithm is time consuming compared to the graph cut and fuzzy thresholding methods.However, the performance and accurate results of the LSM, as shown above, compensates for this drawback.

VI. CONCLUSION
In this paper, we introduced the LSM: a new, unsupervised segmentation algorithm used for object detection and classification in sonar images.LSM aims to overcome the problem of intensity inhomogeneity, which exists in complex seabed textures, such as sand ripples or seagrass, to accurately label the pixels related to the object's highlight and shadow.Our algorithm is built around a simple, yet effective model, which incorporates the expected correlation among neighboring pixels using an intermediate step between the E-step and the M-step of the EM algorithm.To achieve robustness to different seabed structures, we also proposed an automatic process for setting the segmentation thresholds used during the algorithm's initialization process.We measured LSM performance using novel quality measures that examine not only the label's assignment result, but also the segmentation quality in terms of the segmented shape, structure, and area.Our numerical and experimental results showed that, compared to benchmark segmentation methods, the LSM produces accurate segmentation for complex seabed structures, even in the presence of significant speckle noise.Since the main challenge in sonar imagery is still the noisy background, future research on joint image denoising and segmentation is planned.

Fig. 1 .
Fig. 1.Scheme of an ADAC system.Region of interest (ROI) is first detected.Segmentation is then extracted for each ROI.Based on the segmentation map, geometrical features of the detected objects within the ROI are measured.Using these features, the classifier assigns a class to each object.The contributions of this paper are focused on image segmentation.

Fig. 2 .
Fig. 2. Complex seabed texture in a sidescan sonar image.Examples of shadow and highlight regions are represented by red lines.Examples of background are represented by rectangles.

Fig. 3 .
Fig. 3. Schematic diagrams of the windows template for background, shadow and highlight calculation, and the integral image.(a) Upper and lower red windows contribute to the calculation of the background B i s , and the yellow window contribute to the calculation of the shadow map S i and the highlight map H i .(b) Illustration of the integral image.The blue window refers to the original image, and the sum of all pixels in the gray window is A u i , v i .

Fig. 4 .
Fig. 4. Flowchart of the proposed initialization algorithm.Shadow and highlight regions are detected by local windows.The thresholds β s and β h are set automatically.Pixels that are labeled as neither shadow nor highlight are labeled as background.

Fig. 6 .
Fig. 6.Single cycle of the proposed segmentation algorithm.The intermediate step is inserted between the E-step and M-step of the EM algorithm so as to utilize local spatial information.

Fig. 8 .
Fig. 8.Comparison of the initialization methods for the segmentation of the seagrass and sand ripple images: (a) Original seagrass image.(b)-(e) Provide results with seagrass image of K-means, four-region, ICM, and our LSM, respectively.(e) Original sand ripples image.(g)-(j) Provide results with sand ripples image of K-means, four-region, ICM, and LSM, respectively.Results show that our LSM produces the cleanest and most accurate results.

Fig. 9 .
Fig. 9. Final segmentation results for the synthetic sonar images.Each column corresponds to a different sonar image.Synthetic sonar images of cylindrical objects and final segmentation results of Nguyen, E-DS-M, graph cut, fuzzy thresholding, and LSM (from top to bottom, respectively).(a)-(c) For sand, seagrass, and sand ripples background, respectively.The colors in the segmentation maps represent: black for the shadow region, yellow for the highlight region, and brown for the background region.The results show the most accurate segmentation result for our LSM.

Fig. 10 .
Fig.10.PR(34) and MCR(33) values for the synthetic sonar images with speckle noise (STD = 0.3).The 95% confidence interval is introduced on each bar in the figure.(a)Results for the MCR.(b) Results for the (1-PR).The results show that the LSM consistently generates better MCR and PR than the comparison algorithms.

Fig. 11 .
Fig. 11.False segmentation and miss segmentation values [from (44) to (47)] for the body image versus the intensity threshold F. The results of automatic selection of thresholds are F h = 1.07 and F s = 1.15 and are marked with black lines.(a) Results for the highlight region.(b) Results for the shadow region.The results show that our automatic procedure for choosing the threshold achieves a favorable tradeoff.

Fig. 12 .
Fig. 12. Final segmentation results for real sonar images (original image courtesy of EdgeTech [5]).The color in the segmentation maps represent: black for the shadow region, yellow for the highlight region, and brown for the background region.Real sonar images and final segmentation results of the Nguyen, E-DS-M, graph cut, fuzzy thresholding, and LSM (from top to bottom, respectively).(a)-(c) For bicycle, body, and fish, respectively.The best segmentation results are obtained by the LSM.

Fig. 13 .
Fig. 13.Photographs of the targets considered in the sea trial study and the Eca Robotics AUV A18.(a) Two targets were used: one of a cone shape (right-hand side of the figure) and other of the tube shape (left-hand side of the figure).(b) A18 AUV and its SAS system.Picture taken during our sea experiment

TABLE I LIST
OF MAJOR NOTATIONS

TABLE II COMPARISON
OF THE GENERAL SHAPE PROPERTIES ς s