Multi-Level Feature Extraction and Classification for X-Ray Medical Images

ABSTRACT


INTRODUCTION
The production and relatively straightforward management of digital visual content have been increasingly in demand in recent years. In the medical domain specifically, the continuous production of medical images such as X-rays, Computed Tomography (CT) scans, and Magnetic Resonance Imaging (MRI) scans contributes a substantial number of images daily. For example, the Department of Radiology at the University Hospital of Geneva produced 12,000 to 15,000 images daily in 2002 [1]. The number of images produced and stored daily by this department continued to increase, to 50,000 images in 2007 [2] and 114,000 images in 2009 [3]. Essentially, these images reveal critical information about visually inaccessible body parts, which is essential for medical diagnosis, medical education, and medical studies.
Therefore, effective techniques to navigate and search substantial numbers of medical images accurately are necessary. The conventional image retrieval system depends on keyword search, in which keywords or annotated image descriptions are manually assigned for indexing purposes. Relevant images are subsequently retrieved using this indexing system, an approach known as Text-Based Image Retrieval (TBIR). However, the TBIR method becomes impractical in the presence of thousands or even millions of images in the database, because the process of entering metadata for each of these images is costly and time-consuming [4]. Consequently, rather than depending on TBIR, the Content-Based Image Retrieval (CBIR) method is adopted, in which the image retrieval process depends on features extracted from the image itself (the visual content of an image). Specifically, low-level features such as color, texture, and shape are used as feature vectors, which are automatically extracted in the process of searching for images similar to the query image. Accordingly, this technique is less time-consuming than techniques that depend on text for indexing and retrieval [5]. However, CBIR does not interpret data in the same way that a human does, and it is impractical for a system to interpret images as a human perceives them. This limitation is known as the semantic gap [6], which is defined as the difference between how a human perceives an image based on high-level semantic concepts and how a computer classifies an image based on low-level features. In practice, CBIR cannot be achieved based on simple independent visual features alone. Various medical image classification methods using machine learning have been developed to reduce the semantic gap.
With that, this study formulated an effective classification system for X-ray medical images based on multi-level feature extraction, feature reduction, and multi-classification techniques. The evaluation of this integration was performed using the ImageCLEF2005 database. Attempts to utilize global or local features with either a Support Vector Machine (SVM) classifier or a k-Nearest Neighbor (k-NN) classifier for X-ray medical images were made in various related studies, as summarized in Table 1 [7]-[9]. For this study, the evaluation was based on the correctness rate. The correctness rate, as shown in Equation 1, is the number of correctly classified images divided by the total number of images:

Correctness Rate = (Number of correctly classified images) / (Total number of images) (1)
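Equation 1 is straightforward; a minimal sketch of the evaluation metric used throughout the experiments is shown below (the function name is illustrative, not from the paper):

```python
def correctness_rate(n_correct, n_total):
    """Equation 1: number of correctly classified images
    divided by the total number of images."""
    return n_correct / n_total

# e.g. 9,500 of 10,000 test images classified correctly -> 0.95
rate = correctness_rate(9500, 10000)
```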

ANALYSIS AND PROPOSED SOLUTION
Realistically, reducing the semantic gap is a challenge because the visual features of images do not represent high-level semantic concepts, and users prefer text-based queries over querying by image content. This has instigated further studies to develop effective medical image classification methods. However, building a semantic model for classifying images and enhancing retrieval performance is a complex process.
Conversely, the results obtained from previous studies classifying X-ray medical images using global or local features with either SVM or k-NN classifiers, as shown in Table 1, are not regarded as the finest solutions to the problem of reducing the semantic gap. These results vary considerably from one another. For example, referring to Table 1, the RWTH-i6 team achieved an error rate of 12.6% while the Montreal team achieved an error rate of 55.0% on the same dataset. Meanwhile, Mueen [10] combined global, local, and pixel feature extraction for X-ray medical image classification and annotation using both the SVM and k-NN classifiers. The results of this combined feature extraction on 57 classes (ImageCLEF2005 database) revealed that SVM outperformed k-NN in most classes (specifically, 48 classes) while k-NN outperformed SVM in the remaining nine classes only. Accordingly, SVM was chosen for annotation, and three hierarchical levels of image annotation were applied to reduce the semantic gap.
Apart from that, in another study on 4,937 X-ray medical images, Fesharaki & Pourghassem [11] achieved an accuracy rate of 82.8% using shape feature extraction and a Bayesian classifier. Conversely, Ghofrani [12] achieved a higher accuracy rate (90.8%) using shape and edge feature extraction as well as an SVM classifier on a dataset of 1,169 X-ray medical images. The accuracy rate increased to 94.2% with the integration of shape and texture feature extraction and an SVM classifier (rather than a neural classification technique) on a dataset of 4,402 X-ray medical images [13]. Zare [14] utilized Gray Level Co-occurrence Matrix (GLCM), Canny, pixel, Bag of Words (BoW), and LBP feature extraction as well as SVM and k-NN classifiers, where SVM achieved the higher accuracy rate (90%) based on the ImageCLEF2007 database.
In conclusion, there is a need to utilize an effective classification that integrates multi-level feature extraction (global and local features) and multi-classification techniques for X-ray medical image classification.

METHODOLOGY
This present study proposed a framework to classify X-ray medical images based on multi-level feature extraction using the ImageCLEF2005 database. In this study, the development of the proposed framework was based on feature extraction, combination and selection, and classification, which are specifically discussed in the following sections.

Feature Extraction
This study extracted, combined, and utilized various features to explore different aspects of X-ray medical images. As presented in Table 1, several feature extractions were utilized, where global feature and local feature were considered in certain studies. Meanwhile, for this study, the following feature extraction algorithms were considered: (1) global feature, (2) local feature, (3) pixel feature, and (4) speeded up robust features (SURF).
In particular, global features were extracted from each image by applying shape and texture feature techniques, which generated 282 features: 130 dimensions of shape features and 152 dimensions of texture features. The local features, on the other hand, were extracted by segmenting the input image into four non-overlapping blocks of pixels, resulting in the extraction of 282 dimensions from each patch. The pixel feature was extracted after resizing each image to 15 x 15 pixels, which generated 225 features. The SURF technique subsequently extracted 150 features from each image.
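The segmentation into four non-overlapping blocks can be sketched as follows. This is a minimal illustration assuming a 2 x 2 grid of equal blocks, which the paper does not state explicitly; the function name is hypothetical:

```python
import numpy as np

def four_blocks(img):
    """Split a grayscale image into four equal non-overlapping blocks
    (top-left, top-right, bottom-left, bottom-right); local features
    would then be extracted from each block separately."""
    h, w = img.shape
    h2, w2 = h // 2, w // 2
    return [img[:h2, :w2], img[:h2, w2:2*w2],
            img[h2:2*h2, :w2], img[h2:2*h2, w2:2*w2]]

img = np.arange(100 * 100).reshape(100, 100)
blocks = four_blocks(img)  # four 50 x 50 patches
```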

Texture Feature
Essentially, texture features describe the underlying structural arrangement of the surfaces in the input image. Two types of texture features were used: (1) the Gray Level Co-occurrence Matrix (GLCM) and (2) the Wavelet Transform (WT). The GLCM was first introduced by [15]. It is mainly utilized to compute second-order texture characteristics and solves categorization problems efficiently. For an N x N image with pixel gray levels 0, 1, 2, ..., (G - 1), the GLCM is a G x G matrix in which each element represents the joint occurrence of a pair of intensity levels separated by a given distance d (the distance between each pair of pixels) at a given orientation angle [16].
In order to obtain enhanced outputs, several co-occurrence matrices must be considered, one for each relative location, offering various texture features or similar features at various scales. Several texture measures of the GLCM can be computed directly [15], [23]. Generally, θ is quantized into four directions: 0°, 45°, 90°, and 135°.
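The GLCM construction for one offset can be sketched in Python (a minimal illustration; the paper itself used MATLAB, and the gray-level quantization is assumed). With 22 statistics per matrix and the four standard directions, this yields the 88 GLCM features used later:

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Gray Level Co-occurrence Matrix for offset (dx, dy):
    P[i, j] counts pixel pairs where img[r, c] == i and
    img[r + dy, c + dx] == j, normalized to joint probabilities."""
    P = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                P[img[r, c], img[r2, c2]] += 1
    return P / P.sum()

# The four standard directions at distance d = 1, as (dx, dy) offsets:
# 0° -> (1, 0), 45° -> (1, -1), 90° -> (0, -1), 135° -> (-1, -1)
img = np.array([[0, 0, 1],
                [0, 1, 1],
                [1, 1, 2]])
P0 = glcm(img, 1, 0, levels=3)  # GLCM in the 0° direction
```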

[Equations 2-23, garbled in the source, define the twenty-two texture statistics computed from the GLCM in terms of the marginal distributions px and py, the means of px and py, the standard deviations of px and py, and the entropies of px and py; for example, Maximum Correlation Coefficient = (second largest eigenvalue of Q)^0.5 (Equation 20).]

Meanwhile, one of the most commonly used methods for multi-resolution image description and analysis is the WT. It offers an efficient set of tools for various applications such as image or signal compression, object detection, image enhancement, and noise removal. Wavelets are functions satisfying a linear combination of various translation and scaling operations of a wave function. This study used the wavelet transform, specifically the Haar wavelet, to extract texture features. This first known wavelet is the simplest wavelet basis and was utilized for orthonormal wavelet transforms with compact support [24]. Equation 24 represents the Haar function in terms of a step function.

The Haar wavelet was applied in this study since it is the most efficient technique for calculating the feature vector [25]. It was applied four times to divide the input image into 16 sub-images, as illustrated in Figure 2. Each image I was initially resized to 100 x 100 pixels. The Haar wavelet was then applied to each image, dividing it into four sub-images: LL, LH, HL, and HH. In the LL sub-image, low frequencies are present in both the horizontal and vertical directions. In the LH sub-image, low frequencies are present in the horizontal direction while high frequencies are present in the vertical direction. Conversely, in the HL sub-image, high frequencies are present in the horizontal direction while low frequencies are present in the vertical direction, and in the HH sub-image, high frequencies are present in both directions. Following that, the Haar wavelet was applied to the LL sub-image to obtain four new second-level sub-images, and the same process was repeated twice more to obtain the third- and fourth-level sub-images, as illustrated in Figure 2. Additionally, four features were computed for each sub-image: (1) entropy, (2) energy, (3) mean, and (4) standard deviation, giving 64 features in total from all sub-images. Figure 3 illustrates a sample image, obtained from the ImageCLEF2005 database, used as input for the Haar WT. The sub-images obtained after applying the second, third, and fourth Haar wavelet are depicted in Figure 4. The Haar wavelet was applied four times, dividing the input image into 16 sub-images, to capture the most information about the image; applying the Haar wavelet a fifth time yields an LL sub-image equal to zero, so only four levels were used.
Figure 4. The obtained sub-images after applying the second, third, and fourth Haar wavelet
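One level of the Haar decomposition and the four per-sub-image statistics can be sketched as follows. This is a minimal illustration: the unnormalized average/difference form of the Haar transform and the entropy normalization are assumptions, since the paper does not specify them:

```python
import numpy as np

def haar_level(img):
    """One level of the 2-D Haar transform: average/difference pairs of
    columns, then of rows, giving the LL, LH, HL, and HH sub-images
    (unnormalized average/difference form)."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0   # column averages
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0   # column differences
    LL = (a[0::2, :] + a[1::2, :]) / 2.0
    LH = (d[0::2, :] + d[1::2, :]) / 2.0
    HL = (a[0::2, :] - a[1::2, :]) / 2.0
    HH = (d[0::2, :] - d[1::2, :]) / 2.0
    return LL, LH, HL, HH

def sub_image_features(s):
    """The four statistics per sub-image: entropy, energy, mean,
    standard deviation (entropy over normalized |coefficient| mass)."""
    p = np.abs(s).ravel()
    p = p / p.sum() if p.sum() > 0 else np.full(p.size, 1.0 / p.size)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return entropy, np.sum(s ** 2), np.mean(s), np.std(s)

img = np.random.default_rng(0).random((100, 100))
LL, LH, HL, HH = haar_level(img)
feats = [f for s in (LL, LH, HL, HH) for f in sub_image_features(s)]
```

Repeating the decomposition on LL four times yields 16 sub-images; with four statistics each, this reproduces the 64 wavelet features described above.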

Shape Feature
The shape feature offers geometrical information about an image object, which does not vary with changes in orientation, scale, and location. For this process, the shape information of an image was explored based on edges. Thus, edge histogram techniques and the SURF technique were applied in this study to extract the shape features of images. An edge histogram was utilized to explore the shape feature of each image. In particular, both a gradient histogram and an edge orientation histogram were applied. The first edge histogram technique was utilized to extract 50 features from each image while the second was utilized with a Canny filter to extract 80 features from each image [26].
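A gradient-based edge orientation histogram of the kind described above can be sketched as follows. This is a minimal illustration under assumptions the paper does not state: magnitude-weighted orientation bins over [0°, 180°), with the 50-bin size taken from the first edge histogram's feature count:

```python
import numpy as np

def edge_orientation_histogram(img, bins):
    """Histogram of gradient orientations, weighted by gradient
    magnitude (one common edge-histogram formulation)."""
    gy, gx = np.gradient(img.astype(float))        # row/column gradients
    mag = np.hypot(gx, gy)                         # edge strength
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # orientation in [0, 180)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 180.0), weights=mag)
    return hist / (hist.sum() + 1e-12)             # normalized descriptor

img = np.random.default_rng(1).random((64, 64))
h = edge_orientation_histogram(img, bins=50)       # 50-dimensional feature
```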
The SURF technique has a scale and rotation invariance property, which facilitates object identification regardless of how the image is resized or rotated around a certain axis [27]. Realistically, variance occurs because not all information can be captured from a specific recording. Invariance is an essential image property since it makes similarity measurement between the features of two images possible even when the imaging conditions cannot be duplicated. Thus, the SURF technique was applied to extract 150 features from each image.

Combination and Selection
The combined feature refers to the combination of the global feature, local feature, pixel feature, and SURF into one vector. Figure 5 depicts the overall process of feature extraction as well as combination and selection. In order to extract pixel features, images were resized to 15 x 15 pixels, which contributed a vector of 225 pixel features. The global features refer to the shape and texture features, which were extracted from the whole image; the resulting combined vector contained 282 features, specifically 130 features from the edge histograms, 64 features from the WT, and 88 features from the GLCM. Conversely, the local features were extracted by segmenting the image into four non-overlapping patches, each sharing the same 282 features; this led to 1,128 features combined into one local feature vector. Meanwhile, 150 features were obtained from the SURF. As a result, the overall feature vector dimensionality for each image equals 1,785 features. Given the substantial number of features involved, a dimensionality reduction technique must be applied to decrease the feature vector size. The most commonly used dimensionality reduction technique is principal component analysis (PCA) [28]. This simple technique effectively decreases the dimensionality of data. With the application of this technique, the feature vectors were reduced from 1,785 to 25, 50, and 100 features in order to study and choose the configuration with the optimal precision outcomes.
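The combination and reduction steps can be sketched as follows, with random arrays standing in for the real extracted features (an illustration only; the paper used MATLAB, and this sketch implements PCA directly via SVD rather than a library call):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the centred data onto its top-k principal components,
    computed via SVD of the centred data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
n = 200                                  # stand-in number of images
pixel_f  = rng.random((n, 225))          # 15 x 15 pixel values
global_f = rng.random((n, 282))          # edge histograms + WT + GLCM
local_f  = rng.random((n, 1128))         # 4 patches x 282 features
surf_f   = rng.random((n, 150))          # SURF features

combined = rng.random((0, 0))            # placeholder to keep names explicit
combined = np.hstack([pixel_f, global_f, local_f, surf_f])  # 1,785 features
reduced = pca_reduce(combined, 50)       # the PC2 setting (50 features)
```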

Classification
Image classification is the main aspect of this present study with respect to its objectives. Four distinct features were initially extracted from the input image: global features, local features, pixel features, and SURF. These extracted features were then combined into one feature vector, and PCA was performed to decrease its dimensionality. The developed image classification system was evaluated using the ImageCLEF2005 database [29], which was segmented into a training set and a testing set. The training set was categorized into 57 pre-defined classes.

EVALUATION
A series of experiments was conducted to evaluate the performance of the proposed method and to validate its significance for X-ray medical images. The implementation of the proposed method included feature extraction, feature combination and reduction, and X-ray medical image classification using the SVM and k-NN classifiers, whose performance was evaluated based on the resulting accuracy rates. Four experiments were conducted; their specific methods and settings, along with the obtained results, are described and discussed in the following sub-sections.

Experiment 1 - Feature Reduction
This experiment was conducted to evaluate and investigate the performance of our proposed system after reducing the number of features using PCA, which was essential in determining the accuracy rate of the system. The feature vectors were reduced from 1,785 features to 25, 50, and 100 features, termed PC1, PC2, and PC3, respectively. The accuracy rates with optimal feature reduction were obtained with and without a threshold for both dataset splits (80:20 and 90:10 training-to-testing ratios) and were subsequently compared.
PCA is competent and effective in reducing the dimensionality of data. Both SVM (with an RBF kernel) and k-NN (with k = 1) were employed at each evaluation stage in this experiment. Table 5 presents the results obtained using the k-NN classifier while Table 6 presents the results obtained using the SVM classifier. PC2 (50 features) achieved the highest accuracy rate; specifically, PC2 obtained the highest percentages, with and without the threshold, using both classifiers. Based on this experiment, PC2 was used for the subsequent experiments. It should be noted that a sufficient number of features is essential for discrimination and a high accuracy rate: too few features can lead to a low accuracy rate, as an inadequate number of features impairs discrimination among the features of different images. Nevertheless, a high number of features does not guarantee a high accuracy rate either, due to the high occurrence of common features, which likewise impairs discrimination among the features of other images. Consequently, PC2 achieved the highest accuracy rate, above both PC1 (25 features) and PC3 (100 features).

Experiment 2 - Feature Combination
This experiment aimed to investigate the performance of each of the four feature extractions individually (global feature, local feature, pixel feature, and SURF) and of the combination of all four. The outcome of this experiment was crucial in terms of accuracy rate and indexing, and the results were compared with those of related previous studies in terms of feature sets.

Most medical content-based image retrieval systems utilize global features. The main advantage of global features is computation speed: feature extraction and similarity matching are computationally fast. However, they may fail to identify pertinent visual characteristics. The classification process for global features includes two phases, training and testing. In the training phase of this study, global features were extracted from all training images and the classifier was subsequently trained on these extracted features to create a model. To classify the test images, features were first extracted in the same way as in the training phase, and the model was then utilized to classify the test images.

Local features, in contrast, are inherently robust against translation. In this experiment, local features were extracted from four square sub-images, taken from the original image after dividing it into four blocks. The same classification process that was applied for global features was subsequently applied for local features, except that local features were extracted from each sub-image.

Pixel value comparison is also an effective approach for finding similar images in a database. For most applications, however, this approach is not feasible because the difference between the pixels of one image and another is not evident.
However, pixel value comparison is feasible for identifying a single specific object of equal size located at the same position (the same row and column of the image matrix) between images with small resolutions. For this study, Experiment 2 also utilized pixel information.
The SURF is a feature descriptor and a scale- and rotation-invariant detector. Scale and rotation invariance denotes that an object can be identified even when it is scaled in size or rotated. SURF was also applied in this experiment, but it was not utilized as one of the local features given that extracting these features per patch is a time-consuming process. In the training phase, all images were resized to 100 x 100 pixels, and the resulting large feature vector containing 1,785 features was reduced to 25, 50, and 100 features using PCA. Based on the results of Experiment 1, PC2 (50 features) was used for Experiment 2.
For model generation, the SVM and k-NN classifiers were compared. The SVM is widely used for statistical learning and classification. Primarily, the SVM deals with binary classification problems; two multi-class approaches are in common use, namely the one-against-one approach and the one-against-all approach. The one-against-all approach was considered for this experiment because it is computationally faster than the alternative. Accordingly, the RBF kernel was applied with g = 0.0625 and a trade-off between the training error and margin of c = 8; these values were obtained from an empirical study. The second most widely used classification method is the k-NN (k = 1), which was used for further comparisons (the parameters of SVM and k-NN are discussed further in the Classification and Parameters section). Results were calculated after randomly sampling the dataset 10 times in order to produce reliable results.
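The model generation step can be sketched with scikit-learn as a stand-in for the LIBSVM/MATLAB setup actually used (an illustration under assumptions: synthetic clustered data replaces the real feature vectors, and the one-against-all wrapper plus the stated c = 8 and g = 0.0625 are mapped onto `C` and `gamma`):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for the 50-dimensional PCA-reduced feature vectors:
# three well-separated classes of 20 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, size=(20, 50)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 20)

# One-against-all SVM with RBF kernel, c = 8, g = 0.0625 (as in the text).
svm = OneVsRestClassifier(SVC(kernel="rbf", C=8, gamma=0.0625)).fit(X, y)
# k-NN with k = 1 and the default Euclidean distance, for comparison.
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
```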
Table 7 and Table 8 report the correctness rates of the different feature sets using the SVM and k-NN classifiers, respectively. It can be observed in Table 8 that, in the XMIAR prototype, the combination of all four feature sets with the SVM classifier achieved the highest accuracy rate (95.368%) under the second evaluation setting (90% training images and 10% testing images) without a threshold. The combined feature set contained pixel information, global features (shape and texture), local features (shape and texture), and SURF. The SVM with combined features therefore outperformed the SVM applied to each feature set separately: (1) the global feature set, (2) the local feature set, (3) the pixel value set, and (4) the SURF set. The comparison of these individual feature sets also revealed that pixel features outperformed both global and local features across all evaluation sets, while local features provided higher accuracy rates than global features across all evaluation sets. Meanwhile, the combined features achieved a correctness rate of 95.368% with the SVM and 99.202% with the k-NN (90:10, without a threshold). In practice, different image features reflect different attributes, which explains why the combined features provided higher correctness rates. Figure 6 shows the accuracy rate for each class using both the SVM and k-NN classifiers with the 90:10 evaluation set without a threshold. The SVM classified images more efficiently than the k-NN for various classes, such as classes 15, 23, 29, 37, and 51, while the k-NN outperformed the SVM for other classes, such as classes 21 and 44. Both classifiers achieved almost identical accuracy rates for classes 50 and 52.
For the remaining classes, the SVM and k-NN classifiers provided convergent results. Certain classes had a substantial number of training images, such as class 12, while others had few training images, such as classes 51, 52, and 55, with only eight samples each, as shown in Table 4. The k-NN classifier performed more efficiently when the objects in images contrasted distinctly with the backgrounds and when all gray pixels were in one part of the image. The results of this study revealed an improvement over the results obtained in previous related studies using the same dataset. In fact, the proposed method provided a higher accuracy rate than the winning team of the ImageCLEFmed2005 task, RWTH-i6. Additionally, the proposed method provided higher accuracy rates than previous related studies [10]-[14], [28].

Classification and Parameters
The two main classifiers in this study were the SVM with an RBF kernel and the k-NN with the Euclidean distance metric to locate the nearest neighbors. This experiment was therefore conducted to compare the SVM classifier with the k-NN classifier using different parameters, and to identify the optimal parameters for each classifier together with the respective accuracy rates. The k-NN was considered for this comparison due to its popularity and the classification performance shown in previous related studies. Moreover, compared to the SVM, the implementation of k-NN is simpler because there is no offline training. For this study, the SVM was applied using the Library for Support Vector Machines (LIBSVM) [29]. LIBSVM is integrated software for support vector classification that supports multi-class classification; its main features include effective multi-class classification, cross-validation for model selection, and various kernels. For this experiment, the k-NN was examined with different values of k, and a comparison was conducted for the SVM using RBF kernels with different parameter values.
It should be noted that this experiment was an empirical trial-and-error study to select the optimal kernel function. This empirical study revealed that the values obtained using k-NN (k = 1) and SVM (-t = 2, -c = 8, -g = 0.0600) were optimal for the classification of the ImageCLEF2005 database. For the SVM parameters, -t represents the type of kernel, -c refers to the trade-off between the training error and margin, and -g denotes gamma, which controls how far the influence of a single training example reaches: low values of -g indicate far while high values of -g indicate close.
According to the previous experiment, the dataset was divided into 90:10 to calculate the results obtained in this particular experiment. Similar feature vectors of global, local, pixel, and SURF were utilized with the application of PCA.
As illustrated in Figure 6, the results revealed that when c equals 8, improved classification was achieved for the SVM classifier. Moreover, the default value of gamma (g) is obtained as 1 / (number of features). Given that both training images and testing images contained 50 features per image, the default value equals 0.02. The conducted empirical study revealed that an increased value of gamma provided higher stability and accuracy rate.
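The default-gamma arithmetic can be checked directly; LIBSVM's documented default gamma of 1 / (number of features) is consistent with the 0.02 reported here for 50 PCA-reduced features, while the empirically chosen -g = 0.0600 is larger:

```python
# LIBSVM default: gamma = 1 / (number of features).
n_features = 50
g_default = 1.0 / n_features   # 0.02, as stated in the text
g_used = 0.06                  # larger value chosen empirically
```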
The results using k-NN revealed that the highest accuracy rate was achieved when k = 1, which is, in fact, the default value of k. A comparison between this value and other values (k = 2, k = 3) is shown in Figures 7 and 8. One drawback of using the SVM is the time required for offline training: for this study, using an Intel(R) i7-4500U processor with 8 GB of RAM and MATLAB 2012a for coding, the SVM took approximately five hours to train while the k-NN performed almost instantaneously.

DISCUSSION
These experiments validated the significance of classifying X-ray medical images for meaningful image retrieval. Experiment 1 was conducted to obtain the optimal number of features for the subsequent experiments using PCA; its outcome was that PC2 (50 features) achieved the highest accuracy rate for both classifiers. Meanwhile, the results from Experiment 2 distinctly revealed that combined features yielded a higher accuracy rate than any single feature. Given the complexity of medical images, this study revealed that utilizing all available features was the optimal approach to enhance retrieval and classification performance. Typically, the classification of an image depends on low-level features while the annotation depends on the accuracy rate of the classification.
Additionally, Experiment 3 examined the classification techniques and their parameters. Based on the optimal performances of both the SVM and k-NN classifiers, these two classifiers were utilized in the proposed system, where the SVM and k-NN are combined for better accuracy results.