Computer Aided Diagnosis using Margin and Posterior Acoustic Features for Breast Ultrasound Images

Breast cancer is the most commonly diagnosed cancer among females worldwide. Computer-aided diagnosis (CAD) has been developed to assist radiologists in detecting and evaluating nodules, with the aims of improving diagnostic accuracy, avoiding unnecessary biopsies, reducing patient anxiety and controlling costs. This research proposes a CAD method for breast ultrasound images based on margin and posterior acoustic features. It consists of preprocessing, segmentation using active contours without edges (ACWE) and morphological operations, feature extraction and classification. Texture and geometry analyses are used to characterise the margin and posterior acoustic features of the nodules. Support vector machines (SVM) provided better performance than the multilayer perceptron (MLP). The proposed method achieved an accuracy of 91.35%, a sensitivity of 92.00%, a specificity of 89.66%, a PPV of 95.83%, an NPV of 81.26% and a Kappa of 0.7915. These results indicate that the developed CAD has the potential to be implemented for the diagnosis of breast cancer using ultrasound images.


Introduction
The International Agency for Research on Cancer (IARC) released Globocan 2012, which provides contemporary estimates of the incidence, prevalence and mortality of 28 major types of cancer in 184 countries worldwide. An estimated 6.7 million females were diagnosed with cancer in 2012 [1]. Breast cancer is the most commonly diagnosed cancer among females in the vast majority (140 of 184) of countries. It accounts for a quarter of all cancer cases and 15% of all cancer deaths among females, and an estimated 53% of cases occurred in developing countries. Males can also develop breast cancer, at a frequency of about 1% worldwide [2], [3]. In Indonesia, breast cancer accounts for more than two-fifths (41.7%) of the estimated 5-year prevalent cancer cases among females [4].
Early detection, accurate diagnosis and treatment of breast cancer are the most effective ways to reduce the mortality rate [2]. Mammography screening for breast cancer has limitations in dense breasts and exhibits a low negative predictive value, which causes many patients with benign nodules to undergo unnecessary biopsies. Ultrasound (US) is the most important alternative to mammography. Compared with mammography, breast US imaging involves no radiation, is more convenient, safer, cheaper and faster, and is more sensitive for detecting abnormalities in dense breasts. However, it is highly operator-dependent and relies on the radiologist's experience, which may result in inconsistent interpretation.
Analysis of US image characteristics based on the Breast Imaging Reporting and Data System (BIRADS) can assist in distinguishing benign from malignant breast nodules. The BIRADS descriptors used to assess malignancy include shape, margin, echogenicity, orientation and posterior acoustic features [5]. Computer-aided diagnosis (CAD) is a system developed by combining the roles of radiologists and computers. It transforms the visual features and characteristics of nodules into mathematical models based on classification schemes. CAD enhances diagnostic accuracy, helps avoid unnecessary biopsies, reduces anxiety and controls costs. Several researchers have developed breast ultrasound CAD systems for malignant or benign classification based on the BIRADS categories of echogenicity and shape. Huang et al. [6] developed a CAD system that uses shape-based morphological features and a support vector machine (SVM) classifier to identify US breast nodules as malignant or benign. Chen et al. [7] used six practical texture features obtained through principal component analysis (PCA) to classify breast lesions as benign or malignant tumors. Chen et al. [8] used seven morphological features and a multilayer feed-forward neural network (MFNN) to distinguish benign from malignant lesions. Wibawanto et al. [9] combined the gray level run length matrix (GLRLM) and the gray level co-occurrence matrix (GLCM) to improve the classification of ultrasound images into cystic and non-cystic lesions.
This paper proposes a CAD method for classifying breast nodules in ultrasound images based on margin and posterior acoustic features. The method consists of four steps: preprocessing, segmentation, feature extraction using fractal, histogram, GLCM, GLRLM and geometric features, and classification.

Research Method
The developed method consists of four main stages. The first stage is pre-processing, which crops the images to the region of interest (RoI), converts them to grayscale and reduces noise using an adaptive median filter. The second stage is segmentation, in which active contours without edges (ACWE) is used to obtain the nodule contour and separate the nodule from the background. The third stage is feature extraction; the extracted features consist of fractal, histogram, GLCM, GLRLM and geometric features. The final stage classifies the nodules as benign or malignant. The block diagram of this research is shown in Figure 1.

Pre-processing
The adaptive median filter is widely used as an alternative to the standard median filter. Instead of using a fixed window, it adapts the window size for each pixel, and its output is either the original pixel value or the median of the pixels in the window. The algorithm of the filter is described as follows [10]:
Level I: if Zmin < Zmed < Zmax, go to Level II; otherwise, increase the window size. If the window size ≤ Smax, repeat Level I; otherwise, output Zmed.
Level II: if Zmin < Zxy < Zmax, output Zxy; otherwise, output Zmed.
Here Smax is the maximum window size, Zmin is the minimum intensity in the window, Zmax is the maximum intensity in the window, Zmed is the median intensity in the window and Zxy is the intensity at the centre pixel (x, y). Because Level II leaves a pixel unchanged unless it is an extreme (impulse) value, the adaptive median filter preserves image sharpness better than the standard median filter while still removing impulse noise.
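The two levels can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in this research: it assumes a 2-D grayscale array, starts every pixel with a 3 × 3 window, pads the image border by reflection, and uses an illustrative default of `s_max = 7` (which must be odd); the function name `adaptive_median_filter` is hypothetical.

```python
import numpy as np


def adaptive_median_filter(img, s_max=7):
    """Adaptive median filter (Level I / Level II) for a 2-D grayscale image."""
    img = np.asarray(img, dtype=float)
    pad = s_max // 2
    # Reflective padding so every window up to s_max x s_max fits (assumption).
    padded = np.pad(img, pad, mode="reflect")
    out = img.copy()
    rows, cols = img.shape
    for x in range(rows):
        for y in range(cols):
            z_xy = img[x, y]
            cx, cy = x + pad, y + pad
            size = 3
            while True:
                h = size // 2
                window = padded[cx - h:cx + h + 1, cy - h:cy + h + 1]
                z_min, z_med, z_max = window.min(), np.median(window), window.max()
                if z_min < z_med < z_max:
                    # Level II: keep the pixel unless it is itself an extreme value.
                    out[x, y] = z_xy if z_min < z_xy < z_max else z_med
                    break
                size += 2  # Level I failed: enlarge the window
                if size > s_max:
                    out[x, y] = z_med  # window limit reached: fall back to the median
                    break
    return out


# Usage: a horizontal gradient with one impulse ("salt") pixel.
clean = np.tile(np.arange(10) * 10.0, (10, 1))
noisy = clean.copy()
noisy[5, 5] = 255.0
filtered = adaptive_median_filter(noisy)
```

On the gradient, interior pixels pass Level II and are returned unchanged, while the impulse at (5, 5) fails Level II and is replaced by the window median.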

Segmentation
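Chan and Vese [11] proposed the active contours without edges (ACWE) segmentation method. It improves on edge-based active contour models because the curve evolution is not driven by the image gradient; instead, it uses the intensity information inside and outside the regions, not only at the boundaries. For an image I defined on a domain Ω and a level set function φ whose zero level is the contour, the energy is defined as follows:
$$F(u,v,\phi)=\mu\int_\Omega \delta(\phi)\,|\nabla\phi|\,dx\,dy+\nu\int_\Omega H(\phi)\,dx\,dy+\lambda_1\int_\Omega |I-u|^2\,H(\phi)\,dx\,dy+\lambda_2\int_\Omega |I-v|^2\,\big(1-H(\phi)\big)\,dx\,dy$$
where δ is the Dirac function, H is the Heaviside function, μ, ν, λ1 and λ2 are weighting parameters, and u and v are the mean intensities inside and outside the contour, updated on each iteration as:
$$u=\frac{\int_\Omega I\,H(\phi)\,dx\,dy}{\int_\Omega H(\phi)\,dx\,dy},\qquad v=\frac{\int_\Omega I\,\big(1-H(\phi)\big)\,dx\,dy}{\int_\Omega \big(1-H(\phi)\big)\,dx\,dy}$$
The evolution equation is given by:
$$\frac{\partial\phi}{\partial t}=\delta(\phi)\left[\mu\,\operatorname{div}\!\left(\frac{\nabla\phi}{|\nabla\phi|}\right)-\nu-\lambda_1(I-u)^2+\lambda_2(I-v)^2\right]$$
ACWE combined with morphological operations achieved the best performance for breast image segmentation [12]. Morphological processing modifies the spatial form or structure of objects using set (logical) operations. It operates on two pixel arrays: an image A and a structuring element (kernel) B [13]. The basic morphological operations are dilation and erosion. The dilation of a grayscale image is defined as follows:
$$(A\oplus B)(x,y)=\max_{(s,t)\in D_B}\{A(x-s,\,y-t)+B(s,t)\}$$
while the erosion of a grayscale image is formulated as:
$$(A\ominus B)(x,y)=\min_{(s,t)\in D_B}\{A(x+s,\,y+t)-B(s,t)\}$$
where D_B is the domain of B.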
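The level-set evolution and a morphological clean-up can be sketched as follows. This is a plain NumPy/SciPy illustration under stated assumptions, not the configuration used in this study: the image is scaled to [0, 1], H and δ are the common arctan-regularised pair with ε = 1, the parameters (μ = 0.1, ν = 0, λ1 = λ2 = 1, time step 0.5, 300 iterations) are arbitrary defaults, and the clean-up (a 3 × 3 binary opening followed by keeping the largest connected component) is one plausible morphological step. The function names `acwe` and `clean_mask` are hypothetical.

```python
import numpy as np
from scipy import ndimage


def heaviside(phi, eps=1.0):
    # Regularised Heaviside function H_eps(phi)
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))


def dirac(phi, eps=1.0):
    # Derivative of H_eps: a smoothed Dirac delta
    return (eps / np.pi) / (eps ** 2 + phi ** 2)


def curvature(phi):
    # div(grad(phi) / |grad(phi)|) with central differences
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    return np.gradient(gx / norm, axis=1) + np.gradient(gy / norm, axis=0)


def acwe(image, phi0, mu=0.1, nu=0.0, lam1=1.0, lam2=1.0, dt=0.5, n_iter=300):
    """Chan-Vese level-set evolution; returns the mask where phi > 0."""
    img = np.asarray(image, dtype=float)
    phi = np.asarray(phi0, dtype=float).copy()
    for _ in range(n_iter):
        H = heaviside(phi)
        u = (img * H).sum() / (H.sum() + 1e-8)              # mean inside
        v = (img * (1 - H)).sum() / ((1 - H).sum() + 1e-8)  # mean outside
        force = mu * curvature(phi) - nu - lam1 * (img - u) ** 2 + lam2 * (img - v) ** 2
        phi += dt * dirac(phi) * force
    return phi > 0


def clean_mask(mask):
    """Binary opening, then keep only the largest connected region."""
    opened = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    labels, n = ndimage.label(opened)
    if n == 0:
        return opened
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # ignore the background label
    return labels == sizes.argmax()


# Usage: a bright disk (the "nodule") plus a small bright artefact.
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 <= 100
image = disk.astype(float)
image[4:7, 4:7] = 1.0
phi0 = np.where((np.abs(yy - 32) <= 13) & (np.abs(xx - 32) <= 13), 1.0, -1.0)
nodule = clean_mask(acwe(image, phi0))
```

Because ACWE is region-based, the bright artefact can also end up inside the contour even though the initial mask did not cover it; that is the kind of small false region the morphological step is meant to remove.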

Feature Extraction
Feature extraction provides meaningful image descriptions for the next step of image analysis. This research uses texture and geometric features, which are very useful for image characterisation. Texture refers to the repetition of basic pixel patterns, called texels, whose placement can be periodic, quasi-periodic or random. Texture can be described as fine, smooth, granulated, rippled, regular, irregular or linear [14]. Several textural features have been widely used for analysing ultrasound images [7], [9], [3], [15], [16]. Texture analysis methods are generally classified into three categories: statistical, structural and spectral approaches. In this study, texture analysis is conducted using statistical and spectral approaches, in addition to shape analysis using geometric features. The statistical approach consists of first-order features based on the histogram, second-order features based on the gray level co-occurrence matrix (GLCM) and third-order features based on the gray level run length matrix (GLRLM), while the spectral approach uses the fractal dimension.

Histogram
The histogram is a simple estimate of the image probability density function (PDF). The following histogram features are used for texture analysis [17]:
1) Mean is the average brightness of the objects in the image. It is calculated using (8):
$$m=\sum_{i=0}^{L} i\,p(i) \tag{8}$$
Here i denotes a gray level of the image, p(i) denotes its probability, and L denotes the highest gray level of the image.
2) Standard deviation measures the image contrast and is given by (9):
$$\sigma=\sqrt{\sum_{i=0}^{L}(i-m)^2\,p(i)} \tag{9}$$
3) Skewness measures the asymmetry of the intensity distribution around the mean and is calculated by (10):
$$\text{Skewness}=\frac{1}{\sigma^3}\sum_{i=0}^{L}(i-m)^3\,p(i) \tag{10}$$
4) Energy measures how concentrated the pixel intensities are across the gray levels and is formulated in (11):
$$\text{Energy}=\sum_{i=0}^{L}p(i)^2 \tag{11}$$
5) Entropy indicates the complexity of the image and can be written as (12):
$$\text{Entropy}=-\sum_{i=0}^{L}p(i)\log_2 p(i) \tag{12}$$
6) Smoothness measures the fineness or roughness of the image intensity and can be estimated by (13):
$$R=1-\frac{1}{1+\sigma^2} \tag{13}$$
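Equations (8) to (13) can be computed directly from the normalised histogram. The sketch below assumes an integer-valued image with `levels` gray levels (so L = levels − 1) and reports entropy in bits. For smoothness it divides σ² by L² before applying (13), a common normalisation that keeps R in [0, 1); this normalisation is an assumption. The name `histogram_features` is hypothetical.

```python
import numpy as np


def histogram_features(img, levels=256):
    """First-order (histogram) texture features of an integer-valued image."""
    g = np.clip(np.asarray(img, dtype=int).ravel(), 0, levels - 1)
    p = np.bincount(g, minlength=levels) / g.size    # p(i), i = 0..L
    i = np.arange(levels)
    L = levels - 1
    m = np.sum(i * p)                                 # (8) mean
    var = np.sum((i - m) ** 2 * p)
    sigma = np.sqrt(var)                              # (9) standard deviation
    skew = np.sum((i - m) ** 3 * p) / sigma ** 3 if sigma > 0 else 0.0  # (10)
    energy = np.sum(p ** 2)                           # (11)
    nz = p[p > 0]                                     # skip 0 * log(0) terms
    entropy = -np.sum(nz * np.log2(nz))               # (12), in bits
    smoothness = 1.0 - 1.0 / (1.0 + var / L ** 2)     # (13), sigma^2 normalised by L^2
    return {"mean": m, "std": sigma, "skewness": skew,
            "energy": energy, "entropy": entropy, "smoothness": smoothness}


# Usage: half black, half white pixels.
features = histogram_features(np.array([[0, 255], [255, 0]]))
```

For this two-level image the mean and standard deviation are both 127.5, skewness is 0, energy is 0.5 and entropy is exactly 1 bit.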

Gray Level Co-Occurrence Matrix
GLCM describes texture through second-order statistics of the spatial patterns of gray levels [18]. The spatial patterns are defined in terms of the distance and angle between pixel pairs. The following GLCM features are used in this research [19]: 1) Angular Second Moment (ASM) measures the homogeneity (uniformity) of the image.
2) Contrast measures the gray-level variation between a pixel and its neighbour.
3) Inverse Difference Moment (IDM) is related to local homogeneity.
5) Correlation measures the linear dependence between the gray levels of pixel pairs.
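A minimal sketch of the co-occurrence matrix and the features named above follows, using the standard Haralick-style definitions; these are assumed, since the exact formulas follow [19]. It builds a symmetric, normalised GLCM for one distance and angle. The angle-to-offset mapping in `OFFSETS` is one common convention, and the names `glcm` and `glcm_features` are hypothetical.

```python
import numpy as np

# (row, column) step per angle in degrees; one common convention (assumption).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(img, levels, distance=1, angle=0, symmetric=True):
    """Normalised gray level co-occurrence matrix for one (distance, angle)."""
    img = np.asarray(img, dtype=int)
    dr, dc = (distance * k for k in OFFSETS[angle])
    rows, cols = img.shape
    P = np.zeros((levels, levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1
    if symmetric:
        P = P + P.T  # count each pixel pair in both directions
    return P / P.sum()


def glcm_features(P):
    """ASM, contrast, IDM and correlation of a normalised GLCM P(i, j)."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    # Correlation is undefined for a constant image; 1.0 is used by convention.
    corr = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j) if sd_i * sd_j > 0 else 1.0
    return {
        "asm": np.sum(P ** 2),
        "contrast": np.sum((i - j) ** 2 * P),
        "idm": np.sum(P / (1.0 + (i - j) ** 2)),
        "correlation": corr,
    }


# Usage: a small 4-level test image, horizontal neighbours at distance 1.
sample = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]])
features = glcm_features(glcm(sample, levels=4))
```

In practice the features are often averaged over the four angles to reduce directional sensitivity; whether this research did so is not stated.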

Gray Level Run Length Matrix
The GLRLM is a two-dimensional matrix in which each element p(i, j | θ) is the number of runs of length j with gray level i in direction θ [19]. Seven GLRLM features are used in this research: short run emphasis (SRE), long run emphasis (LRE), gray level non-uniformity (GLN), run length non-uniformity (RLN), run percentage (RP), low gray level run emphasis (LGRE) and high gray level run emphasis (HGRE). These features are calculated using the equations given in [20].
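The run-length matrix and the seven features can be sketched as below. Only the horizontal direction (θ = 0°) is shown. Gray levels are indexed from 1 in LGRE and HGRE to avoid division by zero, and the formulas are the commonly used definitions rather than a transcription of [20]. The names `glrlm_horizontal` and `glrlm_features` are hypothetical.

```python
import numpy as np


def glrlm_horizontal(img, levels):
    """Run-length matrix for theta = 0: row = gray level i, column j-1 = run length j."""
    img = np.asarray(img, dtype=int)
    P = np.zeros((levels, img.shape[1]))
    for row in img:
        value, length = row[0], 1
        for pixel in row[1:]:
            if pixel == value:
                length += 1                    # the current run continues
            else:
                P[value, length - 1] += 1      # close the finished run
                value, length = pixel, 1
        P[value, length - 1] += 1              # close the last run in the row
    return P


def glrlm_features(P, n_pixels):
    """Seven run-length features of P; n_pixels is the number of image pixels."""
    n_runs = P.sum()
    i = np.arange(1, P.shape[0] + 1)[:, None]  # gray levels counted from 1
    j = np.arange(1, P.shape[1] + 1)[None, :]  # run lengths
    return {
        "SRE": np.sum(P / j ** 2) / n_runs,
        "LRE": np.sum(P * j ** 2) / n_runs,
        "GLN": np.sum(P.sum(axis=1) ** 2) / n_runs,
        "RLN": np.sum(P.sum(axis=0) ** 2) / n_runs,
        "RP": n_runs / n_pixels,
        "LGRE": np.sum(P / i ** 2) / n_runs,
        "HGRE": np.sum(P * i ** 2) / n_runs,
    }


# Usage: two rows give three runs in total.
sample = np.array([[0, 0, 1], [1, 1, 1]])
features = glrlm_features(glrlm_horizontal(sample, levels=2), sample.size)
```

Other directions follow the same pattern by scanning columns or diagonals instead of rows.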

Geometry Features
A geometric feature is constructed from a set of geometric elements such as points, lines, curves or surfaces [21]. This research uses seven geometric features: convexity, solidity, compactness, circularity, dispersion, Fourier descriptor and aspect ratio. Convexity measures how convex the object is relative to its convex hull. Solidity is the ratio of the object area to the area of the convex hull surrounding the object. Compactness is an irregularity index. Circularity measures how closely the object shape resembles a circle. Dispersion measures the irregularity of an object as the ratio of the length of its major chord to its area. The Fourier descriptor is an average value that describes the condition of the lesion edge. Aspect ratio is the ratio between the width and height of the object. The definition of each feature follows [22].
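Several of these shape features can be computed from a binary nodule mask. The sketch below covers convexity, solidity, compactness, circularity and aspect ratio only; dispersion and the Fourier descriptor are omitted. The formulas are common textbook choices assumed for illustration (convexity = hull perimeter / object perimeter, compactness = P²/(4πA), circularity = 4πA/P², aspect ratio = bounding-box width / height) and may differ from those in [22]. The perimeter is approximated by counting object/background pixel edges. `geometry_features` is a hypothetical name.

```python
import numpy as np
from scipy.spatial import ConvexHull


def geometry_features(mask):
    """Shape features of a binary mask (subset; definitions are assumptions)."""
    mask = np.asarray(mask, dtype=bool)
    area = float(mask.sum())
    # Perimeter = number of pixel edges between object and background.
    padded = np.pad(mask, 1).astype(int)
    perimeter = float(np.abs(np.diff(padded, axis=0)).sum()
                      + np.abs(np.diff(padded, axis=1)).sum())
    # Convex hull of the pixel corners, so a filled rectangle has solidity 1.
    rows, cols = np.nonzero(mask)
    corners = np.concatenate([np.column_stack([rows + dr, cols + dc])
                              for dr in (0, 1) for dc in (0, 1)])
    hull = ConvexHull(np.unique(corners, axis=0))
    hull_area, hull_perimeter = hull.volume, hull.area  # in 2-D: volume = area, area = perimeter
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return {
        "convexity": hull_perimeter / perimeter,
        "solidity": area / hull_area,
        "compactness": perimeter ** 2 / (4 * np.pi * area),
        "circularity": 4 * np.pi * area / perimeter ** 2,
        "aspect_ratio": width / height,
    }


# Usage: a filled 10 x 10 square.
square = np.zeros((20, 20), dtype=bool)
square[5:15, 3:13] = True
features = geometry_features(square)
```

A filled square gives solidity and convexity of 1; irregular, lobulated margins lower both values, which is why they help separate circumscribed from non-circumscribed nodules.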

Classification
This research compares two common classification approaches: the multilayer perceptron (MLP) and the support vector machine (SVM). MLP is a nonlinear supervised classifier that extends the perceptron neural network. It consists of an input layer, one or more hidden layers and an output layer, and each connection has a weight that is updated during training with the backpropagation algorithm. SVM is a supervised learning classifier that separates two classes with the optimal hyperplane, i.e. the one that maximises the distance between the hyperplane and the support vectors [24]. Using kernel functions, SVM can also learn nonlinear decision boundaries, and it has good generalisation capability for correctly classifying samples that were not used for training.
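A comparison of this kind can be sketched with scikit-learn. The settings are assumptions, not those of this research: features are standardised, the SVM uses an RBF kernel with C = 1, the MLP has one hidden layer of 10 units, and 3-fold stratified cross-validation produces out-of-fold predictions. The usage data are synthetic, and `compare_classifiers` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, seed=0):
    """Out-of-fold accuracy of an SVM and an MLP under 3-fold cross-validation."""
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "MLP": make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                                           random_state=seed)),
    }
    results = {}
    for name, model in models.items():
        y_pred = cross_val_predict(model, X, y, cv=cv)  # each sample predicted once
        results[name] = float(np.mean(y_pred == y))
    return results


# Usage with synthetic, well-separated feature vectors (not the study's data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (52, 5)), rng.normal(3.0, 1.0, (52, 5))])
y = np.array([0] * 52 + [1] * 52)
print(compare_classifiers(X, y))
```

Wrapping the scaler in the pipeline means it is refitted on each training fold, so no information from the test fold leaks into the standardisation.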

Results and Analysis
The data used in this study consist of 104 pathologically proven breast ultrasound images. Based on the margin, nodules can be classified as circumscribed or not circumscribed. Based on the posterior acoustic features, nodules can be classified as showing posterior enhancement, no posterior features or posterior shadowing. Examples of each nodule class are shown in Figure 2.
A rectangular RoI containing the breast nodule to be analysed is manually selected by the radiologists. Speckle noise frequently occurs during ultrasound image acquisition [23], and several ultrasound images also contain labels and markers added by the radiologists, so both must be reduced. Nugroho et al. [15] reduced the labels and markers on breast ultrasound images using a median filter; however, this blurred the image, especially at the edges of the nodule. Rahmawaty et al. [24] conducted a similar study comparing the median filter and the adaptive median filter. Both filters can remove markers and labels, but the adaptive median filter produces images with less blur. The results of the preprocessing are shown in Figure 3. Segmentation with the ACWE method is then conducted to separate the nodule area from the background. This method requires an initial mask and runs for a set number of iterations; both the initial mask and the number of iterations significantly affect the segmentation results, as shown in Figure 4. Morphological operations are then used to remove small regions of the segmentation image that are detected as nodule but are not part of it, so that only the nodule area remains. Adding morphological operations improves the segmentation performance.
This is indicated by the increased similarity between the segmentation results and the gold standard. The nodule margin is characterised using the geometric features of the nodule and a comparison of its texture with that of the surrounding area, while the posterior acoustic characteristic is extracted by comparing the textures of the nodule, the posterior area and the area at the bottom side of the nodule. The nodules are then classified based on their margin and posterior acoustic characteristics. This research compares two common classification approaches, MLP and SVM, using 3-fold cross-validation: all of the data are randomly divided into three groups, and each group serves once as the test set while the other two are used for training. The performance of each classification approach is listed in Table 1, which shows that SVM performs better than MLP. SVM achieves an accuracy of 91.35%, a sensitivity of 92.00%, a specificity of 89.66%, a PPV of 95.83%, an NPV of 81.26% and a Kappa of 0.7915. Unlike that of MLP, the complexity of SVM does not depend on the dimensionality of the dataset. SVM is also based on structural risk minimisation, which favours better generalisation and hence better classification. The Kappa value of 0.7915 indicates substantial agreement between the radiological diagnosis and the proposed method.
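The reported measures are standard functions of the confusion matrix. The sketch below computes them, treating malignant as the positive class and Kappa as Cohen's kappa (both assumptions). The counts in the usage line are hypothetical and are not the results of this study; `diagnostic_metrics` is a hypothetical name.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, PPV, NPV and Cohen's kappa from counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n  # observed agreement p_o
    # Chance agreement p_e from predicted and actual class totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "kappa": (accuracy - p_e) / (1 - p_e),
    }


# Hypothetical counts: 50 malignant and 50 benign nodules.
metrics = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
print({name: round(value, 4) for name, value in metrics.items()})
```

Kappa corrects the raw accuracy for the agreement expected by chance, which is why it is lower than accuracy whenever the class totals are unbalanced or the classifier favours one class.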

Conclusion
This research proposes a computer-aided diagnosis method for breast ultrasound images based on margin and posterior acoustic features. It consists of preprocessing, segmentation, feature extraction and classification. Performance in differentiating between malignant and benign nodules is strongly affected by how accurately the system extracts the nodule characteristics. SVM provides better performance than MLP: the proposed method achieves an accuracy of 91.35%, a sensitivity of 92.00%, a specificity of 89.66%, a PPV of 95.83%, an NPV of 81.26% and a Kappa of 0.7915. These results are expected to help radiologists diagnose breast nodules on ultrasound images. In future work, feature selection can be applied to improve the performance of the proposed method.