A novel optimized neutrosophic k-means using genetic algorithm for skin lesion detection in dermoscopy images

This paper implemented a new skin lesion detection method based on the genetic algorithm (GA) for optimizing the neutrosophic set (NS) operation to reduce the indeterminacy in dermoscopy images. Then, k-means clustering is applied to segment the skin lesion regions. Therefore, the proposed method is called optimized neutrosophic k-means (ONKM). On the training image set, an initial value of α in the α-mean operation of the NS is used with the GA to determine the optimized α value. The Jaccard index is used as the fitness function during the optimization process.
The GA found the optimal α in the α-mean operation of the NS to be α_optimal = 0.0014, which achieved the best performance using fivefold cross-validation. Afterward, the dermoscopy images are transformed into the neutrosophic domain via three memberships, namely true, indeterminate, and false, using α_optimal. The proposed ONKM method is then carried out to segment the dermoscopy images. Different random subsets of 50 images from the training set of the ISIC 2016 challenge dataset are used during the fivefold cross-validation to train the proposed system and determine α_optimal.
Several evaluation metrics, namely the Dice coefficient, specificity, sensitivity, and accuracy, are measured to evaluate the performance on the test images of the proposed ONKM method with α_optimal = 0.0014 compared to the k-means and γ-k-means methods. The results showed the dominance of the ONKM method, with 99.29 ± 1.61% average accuracy, compared with the k-means and γ-k-means methods.


Introduction
Dermoscopy is a noninvasive diagnostic technique for the in vivo observation of pigmented skin lesions through imaging of the skin surface and subsurface structures [1]. Skin lesion detection/segmentation for dermoscopy image diagnosis remains challenging owing to the complexity and variability of the skin lesion structures and the presence of artifacts. To overcome such uncertainty and indeterminacy, fuzzy set theory has been applied for effective segmentation. A segmentation method using the fuzzy c-means (FCM) clustering approach has been implemented to outline malignant skin regions [2].
To address the inability of FCM to manage uncertain data, an integrated fuzzy c-means and neutrosophic set (NS) clustering method has been designed [3]. Likewise, a neutrosophic method has been applied to image segmentation using the NS with defined operations, called γ-k-means clustering [4]. To reduce the indeterminacy, two operations, namely α-mean and β-enhancement, were used for efficient clustering. Consequently, it is essential to find the optimal values of these NS operations, which can be stated as an optimization problem. Several optimization techniques have been applied to solve optimization problems in skin cancer segmentation and other applications.
Among the preceding studies, none has optimized the NS operations to improve segmentation. Consequently, the main contribution of the present work is to determine the optimal value of α used in the α-mean operation of the NS in [4]. The GA is used to solve this α optimization problem. The resulting α_optimal is used when mapping the dermoscopic images into the NS space. The mapped skin lesion images are segmented using k-means clustering. Finally, the performance of the proposed optimized NS for skin lesion segmentation is compared with the NS-based k-means as well as the k-means method in terms of several evaluation metrics. This evaluation was applied to the skin cancer ISIC 2016 Challenge dataset [5].

Methodology
The k-means clustering algorithm is widely used in different scenarios due to its simplicity and efficiency. K-means is computationally faster than other clustering algorithms when handling a large number of variables. Because almost all skin lesions are approximately circular, the proposed approach uses k-means clustering. Nevertheless, the existence of some artifacts, including lighting reflections, air bubbles, and dark hair covering the lesions, makes it necessary to reduce the indeterminacy of the image using the NS prior to the clustering process, in order to improve the performance of k-means when segmenting skin lesion images.
This study implemented a skin lesion detection procedure for dermoscopy images based on the optimized neutrosophic k-means (ONKM). A GA is used to optimize the α value in the NS, with maximization of the Jaccard index (JAC) as the fitness (objective) function. The dermoscopy images are then mapped into the NS domain and processed by the optimized α-mean filter using α_optimal, which reduces the indeterminacy of the image. Subsequently, the image is segmented using the k-means method. Finally, the skin lesion detection is completed by a morphological operation.

Neutrosophic image
Neutrosophy can be used efficiently to describe the indeterminacy/uncertainty in information. In the NS, every event is considered independently through membership sets that have specific degrees of truth (T), indeterminacy (I), and falsity (F). These membership functions are used to map the input image into the NS domain, producing the NS image (S_NS). Thus, a pixel S(x, y) of the image is described in the NS domain as S_NS(x, y) = S(t, i, f) = {T(x, y), I(x, y), F(x, y)}, giving its true, indeterminate, and false degrees of belonging to the bright pixel set. Let A(x, y) represent the intensity value of the pixel (x, y), and let Ā(x, y) denote its local mean value. The membership functions can be expressed as in [4], where A(x, y) is the intensity value at the pixel (x, y), Ā(x, y) is its local mean value, and δ(x, y) represents the absolute value |A(x, y) − Ā(x, y)|. Thus, the value of I(x, y) is used to measure the indeterminacy of S_NS(x, y). Typically, the NS image entropy is defined as the sum of the entropies of the three sets T, I, and F, which reflects the distribution of the elements in the NS domain; here E_T, E_I, and E_F are the entropies of the three subsets, and p_T(i), p_I(i), and p_F(i) represent the probabilities of the elements in the three membership functions. In addition, variations in T and F influence the distribution of the elements and the entropy of I, so that T and F are correlated with I.
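The mapping described in this subsection can be illustrated with a short sketch. It assumes the min-max normalized form commonly used in the NS image literature (T as the normalized local mean, I as the normalized absolute difference δ, and F = 1 − T); the displayed equations are not reproduced in this text, so this form, the function names, the edge-replicating padding, the histogram-based entropy, and the base-2 logarithm are all assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np

def local_mean(img, w=3):
    """Mean over each pixel's w x w neighbourhood (w odd, edges replicated)."""
    img = np.asarray(img, dtype=float)
    pad = w // 2
    padded = np.pad(img, pad, mode="edge")
    rows, cols = img.shape
    out = np.zeros((rows, cols), dtype=float)
    # Sum the w*w shifted copies of the padded image, then divide.
    for dy in range(w):
        for dx in range(w):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (w * w)

def min_max(a):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)

def to_neutrosophic(gray, w=3):
    """Return the (T, I, F) membership maps of a grayscale image (assumed form)."""
    gray = np.asarray(gray, dtype=float)
    mean = local_mean(gray, w)
    T = min_max(mean)                  # truth: normalised local mean
    I = min_max(np.abs(gray - mean))   # indeterminacy: normalised delta
    F = 1.0 - T                        # falsity
    return T, I, F

def subset_entropy(member, bins=256):
    """Shannon entropy (base 2, an assumption) of one membership map."""
    hist, _ = np.histogram(member, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def ns_entropy(T, I, F):
    """NS image entropy as the sum of the three subset entropies."""
    return subset_entropy(T) + subset_entropy(I) + subset_entropy(F)
```

Under these assumptions, a perfectly uniform image yields I = 0 everywhere and a zero NS entropy, which matches the reading of I as a measure of local indeterminacy.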

α-Mean for neutrosophic image
The local mean operation for a gray-level image H is defined in [4]. The α-mean operation for the neutrosophic image S_NS is defined in terms of T̄(α), Ī(α), and F̄(α), which are expressed as in [4], where 'b' stands for the size of the average filter, which is set as b = 3 to generate the NS image, I is the indeterminate subset, and δ̄_T(x, y) = |T̄(x, y) − T̿(x, y)| represents the absolute value of the difference between the mean intensity T̄(x, y) and its own local mean T̿(x, y). Thus, the entropy of I is increased by obtaining a uniform distribution of the elements, where the α value in the α-mean operation is optimized in the current work using the GA.
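A minimal sketch of an α-mean step is given here for illustration. It assumes the form usually attributed to [4]: pixels whose indeterminacy I is at least α have T and F replaced by their b × b local means, and the new indeterminacy is the min-max normalized δ̄_T. Whether δ̄_T is computed from the α-mean-updated T (as done here) or from the plain local mean, as well as the function names and the edge padding, are assumptions.

```python
import numpy as np

def box_mean(a, b=3):
    """b x b local mean (b odd) with edge replication."""
    a = np.asarray(a, dtype=float)
    pad = b // 2
    padded = np.pad(a, pad, mode="edge")
    rows, cols = a.shape
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(b):
        for dx in range(b):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (b * b)

def alpha_mean(T, I, F, alpha, b=3):
    """One alpha-mean pass over the (T, I, F) maps (assumed form)."""
    T = np.asarray(T, dtype=float)
    I = np.asarray(I, dtype=float)
    F = np.asarray(F, dtype=float)
    keep = I < alpha                       # low-indeterminacy pixels stay as-is
    T_alpha = np.where(keep, T, box_mean(T, b))
    F_alpha = np.where(keep, F, box_mean(F, b))
    # New indeterminacy: normalised |T_alpha - local mean of T_alpha|.
    delta = np.abs(T_alpha - box_mean(T_alpha, b))
    span = delta.max() - delta.min()
    I_alpha = (delta - delta.min()) / span if span > 0 else np.zeros_like(delta)
    return T_alpha, I_alpha, F_alpha
```

With a very small α such as the reported 0.0014, almost every pixel has I ≥ α, so the operation behaves close to a mean filter applied to T and F over the whole image.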

Optimization in α-mean using genetic algorithm
In this work, the value of α is optimized using the GA, which is considered one of the efficient optimization algorithms for different problems [6]. The procedure of the GA for optimization is summarized in the Algorithm listing. Throughout the optimization process, the JAC is used as the fitness function; it is a statistical measure that uses the union '∪' and intersection '∩' operators of any two sets. This fitness JAC is given by JAC(Y, Q) = |Ar_Y ∩ Ar_Q| / |Ar_Y ∪ Ar_Q|, where Ar_Y and Ar_Q are the skin lesion region segmented automatically by the proposed ONKM method and the ground truth skin lesion region, respectively. Figure 1 illustrates the flowchart of the ONKM skin lesion detection algorithm used to obtain α_optimal during the training phase, where the step 'NS for each image' is detailed in Fig. 2, which shows the γ-k-means skin lesion detection algorithm.
During the testing phase, α_optimal is used directly (without any need for further GA optimization) to map the test image into the NS domain; afterward, the mapped image is segmented using the k-means clustering process.
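As an illustration of the fitness measure, the Jaccard index of two binary masks can be computed as the ratio of the intersection area to the union area. The function name and the convention of returning 1.0 when both masks are empty are assumptions of this sketch.

```python
import numpy as np

def jaccard(segmented, ground_truth):
    """Jaccard index |Y ∩ Q| / |Y ∪ Q| of two binary masks."""
    y = np.asarray(segmented, dtype=bool)
    q = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(y, q).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(y, q).sum() / union)
```

For example, a segmentation covering two pixels, one of which is the single ground truth pixel, gives an intersection of 1 and a union of 2, hence JAC = 0.5.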

K-Means clustering using optimized α-mean
K-means is a clustering technique that groups the data/objects into K groups [7]. K-means aims to minimize an objective function O defined in [4], in which Z_j and d_j are the center and the number of pixels of the jth cluster, respectively, and q is the total number of clusters. In the k-means algorithm, O is minimized under a corresponding condition, where, in the dataset W = {w_i, i = 1, 2, ..., n}, w_i is a sample in the d-dimensional space, and C = {C_1, C_2, ..., C_q} represents the partition that satisfies W = ∪_{i=1}^{q} C_i. In the current work, this k-means clustering for the optimized NS is applied to the subset T.
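The clustering step on the subset T can be sketched as a one-dimensional k-means over the T values, followed by selecting the cluster with the lowest center as the lesion candidate. This is only an illustration: the number of clusters (q = 2), the evenly spaced initial centers, and the function names are assumptions, since the text does not state them.

```python
import numpy as np

def kmeans_1d(values, q=2, iters=100):
    """Plain k-means on scalar values; returns (labels, centers)."""
    x = np.asarray(values, dtype=float).ravel()
    centers = np.linspace(x.min(), x.max(), q)   # assumed deterministic init
    for _ in range(iters):
        # Assignment step: nearest center for every value.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each center becomes the mean of its members.
        new_centers = centers.copy()
        for j in range(q):
            members = x[labels == j]
            if members.size > 0:
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return labels, centers

def lesion_candidates(T, q=2):
    """Boolean mask of pixels in the cluster with the lowest T center."""
    T = np.asarray(T, dtype=float)
    labels, centers = kmeans_1d(T, q)
    return (labels == np.argmin(centers)).reshape(T.shape)
```

The update step sets each center to the mean of its assigned values, which is the standard way k-means decreases the within-cluster sum of squared distances.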

Proposed optimized neutrosophic k-means (ONKM)
The ONKM method consists of two main phases, namely training and testing. In the training phase, α_optimal is determined using the GA. Initially, a random initial value of α is assumed to start the optimization process. This initial value is used to transform the input training images into the NS domain using Eqs. (1)-(4). During the optimization process, the indeterminacy of S_NS is decreased until it reaches its minimum value with the highest JAC at a certain α (defined as α_optimal) on the subset T. At this α_optimal, the JAC value and the entropy of the indeterminate subset I no longer change. This α_optimal is then used to map the training images into the NS space for further segmentation using the k-means clustering method during the GA optimization process. Finally, α_optimal is used directly in the test phase to map the test images into the optimized NS domain for further segmentation using k-means clustering on the mapped test images.
In the main steps of the proposed ONKM algorithm, the pixels in the dermoscopy images are grouped into different groups based on the pixels' T values. The cluster that has the lowest T value is taken to contain the lesion candidate pixels, according to the lesions' intensity features. The Dice coefficient, specificity, sensitivity, and accuracy [8] are calculated as performance metrics to assess the proposed ONKM skin lesion detection algorithm.
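For reference, the four evaluation metrics can be computed from the pixel-wise confusion counts between a segmented mask and the ground truth mask. The sketch below uses the standard definitions; the function name and the choice to return NaN when a denominator is zero are assumptions.

```python
import numpy as np

def segmentation_metrics(segmented, ground_truth):
    """Dice, sensitivity, specificity, and accuracy of a binary mask."""
    y = np.asarray(segmented, dtype=bool)
    q = np.asarray(ground_truth, dtype=bool)
    tp = int(np.sum(y & q))      # lesion pixels correctly detected
    tn = int(np.sum(~y & ~q))    # background pixels correctly rejected
    fp = int(np.sum(y & ~q))     # background marked as lesion
    fn = int(np.sum(~y & q))     # lesion pixels missed

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }
```

Because dermoscopy images usually contain many more background pixels than lesion pixels, accuracy and specificity can be high even when the overlap is modest, which is why Dice and sensitivity are reported alongside them.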

Experimental results and discussion
Dermoscopic images of skin lesions from the International Skin Imaging Collaboration (ISIC) archive [5] are employed in the current experiments to test the performance of the proposed method. Different random subsets of 50 images from the training set of the ISIC 2016 challenge dataset are used during the fivefold cross-validation to train the proposed system and determine α_optimal using the GA, and the remaining 1229 images are used during the testing phase with the obtained α_optimal; only the test results are included in the performance evaluation analysis. Figure 3 displays the resultant images at the different stages of the proposed method, including the NS-transformed images.

Genetic algorithm-based NS α-means optimization
The GA is configured to maximize the JAC by minimizing F(Y, Q) = 1 − JAC(Y, Q) at each iteration until the fitness target of 10^−4 is reached over the training dataset. The iteration and convergence process of the GA in Fig. 4 shows the best and mean fitness values for the dermoscopic training set of skin images. Moreover, the GA iteration results over generations are reported in Table 1, where the first column 'Generation' represents the generation number, the second column 'f-count' represents the cumulative number of fitness function evaluations, and the last column 'Best F(x)' represents the best fitness function value in each generation. The GA converged to the maximum JAC value, achieving the best fitness function at α = 0.0014 over the ten training images. In the test phase, α_optimal is used to optimize the NS before k-means segmentation.
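The fitness F(Y, Q) = 1 − JAC(Y, Q) can be sketched as a function of a candidate α evaluated over a set of training images. How the per-image values are combined is not stated in the text; averaging over the images is an assumption here, and `segment` is a hypothetical callable standing in for the NS mapping plus k-means pipeline.

```python
import numpy as np

def jaccard(segmented, ground_truth):
    """Jaccard index of two binary masks (1.0 if both are empty)."""
    y = np.asarray(segmented, dtype=bool)
    q = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(y, q).sum()
    return 1.0 if union == 0 else float(np.logical_and(y, q).sum() / union)

def ga_fitness(alpha, images, truths, segment):
    """Mean of 1 - JAC over the training images for one candidate alpha.

    `segment(image, alpha)` is a hypothetical stand-in that must return a
    binary lesion mask; the averaging over images is an assumption.
    """
    scores = [jaccard(segment(img, alpha), gt)
              for img, gt in zip(images, truths)]
    return 1.0 - float(np.mean(scores))
```

A value of 0 corresponds to perfect overlap on every training image, so a minimizing optimizer driving this value toward a small target is equivalent to maximizing the JAC.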

Detection results of the ONKM proposed method
Figure 5 illustrates the ONKM detection results alongside the corresponding ground truth images. Figure 5d shows the detected boundaries marked in blue, which agree closely with the ground truth. Figure 5 also illustrates the efficiency of the ONKM in detecting skin lesion regions of different sizes, shapes, and colors, even in the presence of hair and other artifacts and without any preprocessing step, owing to the ability of the ONKM to reduce the indeterminacy using α_optimal = 0.0014.

Evaluation
Five sets of 50 randomly selected training images are used to tune the parameters and determine the optimal parameter α_optimal of the proposed method using fivefold cross-validation. The mean and standard deviation (SD) of the measured evaluation metrics of the ONKM detection results over the 1229 test images are reported in Table 2 using fivefold cross-validation. The optimal value used is α_optimal = 0.0014, which achieved the best metric values as reported in Table 2. It achieved 99.29% average accuracy for the ONKM skin lesion region detection, with a standard deviation of 1.61%, compared to the ground truth images. These experimental results established the ability of the ONKM to detect skin lesions of different geometries. Furthermore, Fig. 6 illustrates the evaluation metric results of randomly selected skin lesion images using the proposed method.

Comparative study with γ -k-means and k-means
A comparative study using the evaluation metrics is carried out to compare the ONKM segmentation method with k-means without NS or GA, and with the work in [4] using γ-k-means clustering with α = 0.01 and α = 0.85 without optimization. The detection results of the selected test images are displayed in Fig. 7 for samples having different shape, size, and skin surface smoothness/roughness. Figure 7(a1-a5), (b1-b5), (c1-c5), (d1-d5), and (e1-e5) illustrate the original dermoscopic images and the images segmented using the ONKM method, γ-k-means with α = 0.85, γ-k-means with α = 0.01 [4], and the k-means algorithm, respectively. The red contours on the images represent the ground truth, while the blue contours are the lesions detected using the corresponding automated segmentation method. Figure 7 shows that the ONKM method follows the ground truth regions more precisely in the different cases than the k-means and γ-k-means methods. The comparative results of the evaluation metrics for ten images using the proposed ONKM, γ-k-means, and k-means are shown in Figs. 8 through 11. In these figures, the X-axis represents the image name, while the Y-axis represents the measured value of the accuracy, Dice, sensitivity, and specificity, respectively.
Figures 8 through 11 illustrate the superiority of the ONKM over the γ-k-means [4] and k-means methods, which results from reducing the indeterminate information more efficiently through the use of the optimized NS with k-means. Table 3 reports the mean and standard deviation of several performance metrics (Figs. 9, 10, 11) for the k-means, γ-k-means, and proposed ONKM methods. The results showed the superiority of the proposed method in detecting the skin lesion compared to the reported results of the other algorithms.
The foregoing results and comparative studies showed the superiority of the proposed ONKM using α_optimal = 0.0014 over the k-means (without NS) and γ-k-means methods using α = 0.85 [4] and α = 0.01, owing to the reduction in indeterminacy achieved by the optimized NS operation when segmenting the skin lesion in the ISIC dermoscopic image dataset. Furthermore, using α_optimal in the ONKM provided better results without the β-enhancement that is used in the γ-k-means method. A comparison against the top recorded state-of-the-art (SOTA) participants in the lesion segmentation task (Part 1) of the ISIC 2016 challenge is reported in Table 4 [9, 10] using the same dataset, showing the superiority of the proposed method. From the preceding results and comparative studies, the proposed ONKM approach has overall superior performance compared to the images segmented using γ-k-means with α = 0.85 and α = 0.01 and the k-means algorithm, as well as the top recorded state-of-the-art ISIC 2016 challenge results. However, there are some cases in which the proposed method has deteriorated performance, as illustrated in Fig. 12.
Figure 12 illustrates that the deteriorated performance is due to the existence of dark and/or differently colored regions in the lesion. In addition, the black frame that exists in the image (ISIC_0000276) may also affect the results. Consequently, it is recommended to study such cases in future work in order to improve the performance of the proposed method.

Conclusion
In this work, a novel skin lesion detection/segmentation method is implemented based on a genetic algorithm that optimizes the value of α in the α-mean operation of the neutrosophic set prior to k-means clustering of the dermoscopy images. The proposed ONKM method found the optimal value α_optimal = 0.0014, which achieved the highest JAC value (fitness function) during the GA optimization process, where 50 images are selected randomly from a public dataset (ISIC 2016) to train the proposed method, while 850 images are used in the test process to evaluate the proposed ONKM method. The skin lesion images are mapped into the NS domain to reduce the indeterminacy and uncertainty during the segmentation process. Then, the mapped image is segmented using the k-means clustering method, where the skin lesion is recognized by its intensity and morphological features. Four evaluation metrics are calculated for a comparative study of the proposed ONKM (α_optimal = 0.0014) against the k-means (without NS and α) and γ-k-means (α = 0.85 [4] and α = 0.01) methods. The comparative results established the superiority of the ONKM method, with 99.3% average accuracy, over the accuracies achieved by the k-means and γ-k-means methods for detecting/segmenting skin lesions of different color, size, shape, skin surface roughness, and uniformity. Given these results, the proposed approach can be compared with deep learning-based methods in future work.

Algorithm:
Genetic Algorithm
Generate a random population of n candidate solutions
Calculate the fitness function of these solutions
Create a new population:
    Select two parent chromosomes from the population according to their fitness
    Cross over the parents to form new offspring
    Mutate the new offspring
    Allocate the new offspring in the new population
Use the newly generated population for another iteration
If the end condition is satisfied, stop and return the best solution
End if
Repeat the preceding steps
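The listed steps can be illustrated with a small real-coded GA that searches a scalar such as α in [0, 1] for the minimum of a fitness function. This is a sketch under stated assumptions, not the authors' configuration: tournament selection, blend crossover, Gaussian mutation, elitism, and all parameter values and names are illustrative choices, and the quadratic fitness in the usage note is a toy stand-in for 1 − JAC.

```python
import random

def genetic_minimize(fitness, low=0.0, high=1.0, pop_size=20,
                     generations=30, mutation_rate=0.2, seed=0):
    """Minimise `fitness` over [low, high]; returns (best, best_history)."""
    rng = random.Random(seed)
    # Generate a random initial population.
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = min(population, key=fitness)
        history.append(fitness(elite))
        new_population = [elite]          # elitism: keep the current best
        while len(new_population) < pop_size:
            # Selection: tournament of three, lower fitness wins.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # Crossover: random blend of the two parents.
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2
            # Mutation: occasional Gaussian perturbation.
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (high - low))
            # Allocate the offspring, clipped to the search range.
            new_population.append(min(max(child, low), high))
        population = new_population
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

A call such as `genetic_minimize(lambda a: (a - 0.3) ** 2)` returns a value in [0, 1] together with the best fitness per generation; because the best individual is always carried over, that history never increases. A stopping rule based on a fitness target could be added by breaking out of the loop once the best fitness falls below the target.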

Fig. 1
Fig. 1 Flowchart of the training phase of the proposed approach ONKM skin lesion detection

Fig. 2
Fig. 2 Flowchart of γ-k-means skin lesion detection algorithm (NS for each image)
Flowchart steps: set a range of α from 0 to 1; calculate the NS for the input images using the initial α; use the initial α to map the input images onto the NS set; group the pixels using k-means; repeat the previous three steps, applying the GA to search for the optimal α within the specified range that provides the highest JAC value; save the optimal α; apply the NS to the testing image using the optimal α without using the GA; map the test image onto the optimized NS set; group/segment the pixels using k-means.

Fig. 3 Fig. 4
Fig. 3 Output of the proposed steps: a original (ISIC_0000185) image, b after preprocessing, c after RGB to gray conversion, d-f first NS conversion operators, where d true image, e false image, f undetermined image, g first NS conversion image, h-j last NS conversion operators, where h true image, i false image, j undetermined image, k last NS conversion image, l after k-means clustering, m segmented lesion (ROI), n comparison between target image and segmented image

Fig. 5
Fig. 5 Detection results: a number of dermoscopic skin image, b original skin lesion image, c ground truth image, d ONKM lesion detection results

Fig. 10
Fig. 10 Segmentation evaluation sensitivity metric of the ten test images using the ONKM, γ -k-means with different α = 0.85 and α = 0.01 as well as the k-means

Fig. 12
Fig. 12 Deteriorated performance segmentation cases, where (a1-a3): original dermoscopic test images, (b1-b3): segmented images using proposed approach ONKM, (c1-c3): segmented images using γ-k-means (α = 0.85), (d1-d3): segmented images using γ-k-means (α = 0.01), and (e1-e3): segmented images using k-means algorithm

Table 2
Mean and standard deviation of the metrics obtained through fivefold cross-validation of the proposed ONKM (GA + NS + k-means) with reference to ground truth boundaries

Fig. 6
Fig. 6 Performance metrics of 10 selected test images

Table 4
Performance metrics study against the top evaluation results for the state-of-the-art segmentation tasks on the ISIC 2016 dataset