A New Approach of Content Based Image Retrieval Using Color and Texture Features

The dramatic development of


INTRODUCTION
Image retrieval is currently an active topic in digital image processing and in the data mining community [1]. The number of digital images is growing rapidly with advances in multimedia and information technology. The storage and transmission challenges are tackled by different image compression techniques [1]. Depending on the method, the properties of an image may be lost or preserved by compression. Common image compression methods include run-length encoding, entropy encoding and transform encoding [2,3]. Retrieving images is more challenging than storing them. Therefore, it has been studied by researchers worldwide from a wide range of disciplines in computer vision [4,1].
Two main approaches are used to search and retrieve similar images from a large database [5,6]. The first is text-based image retrieval (TBIR), also called concept-based image search, which relies on textual information about an image supplied manually by humans. In this process, images are described by their captions or background information. However, images are not always annotated, and the same image may be annotated differently by different observers [7]. As a result, a TBIR system is subjective, incomplete, inconsistent, expensive, tedious and time consuming [5]. Content-based image retrieval (CBIR) is considered in order to overcome these limitations. Here the visual content of images, such as color, texture and shape, is extracted in order to index the images automatically [8,9].
CBIR research started in the early 1990s and has continued throughout the last three decades [10,8]. It provides an automated system that retrieves images based on their features. CBIR has been defined as a system that searches the database for images similar to a given query image. Generally, such a system extracts the features of the database images and generates an image feature vector for each of them. It then extracts the features of the query image and builds its feature vector. Finally, the distance between two images is measured as the distance between their feature vectors [5,11,12,1].
Today images are an integral part of communication in the digital world. In a CBIR system, images are searched according to their visual features [11]. Many computer applications rely heavily on image searching, matching and retrieval. Applications of image searching include logo searching, signature matching, document matching and so on [5,11,13].
Usually an image searching technique uses a single feature (e.g. color or texture) or multiple features with a single model, which gives lower accuracy [5]. We have used two features (color and texture) with multiple models to obtain better accuracy. The time complexity of our system does not differ greatly from that of other existing systems.
The rest of the paper is organized as follows. Section 2 reviews the literature related to this paper. The methodology is discussed in detail in section 3. Distance measurements are presented in section 4, and experimental results are reported in section 5. The conclusion of this work and possible future studies in this field are highlighted in section 6. Finally, all sources are listed in the references.

RELATED WORKS
Extensive research has been conducted on image retrieval over the last few decades. Arthi and Vijayaraghavan [5] proposed a technique for retrieving images from a database on the basis of color models, using the Color Co-occurrence Matrix (CCM). They computed the CCM for each pixel of an image from the Hue Saturation Value (HSV) of the pixel and then compared it with the CCM of the images in the database.
Zhao and Grosky [14] proposed a method for content-based image retrieval that determines the latent correlation between low-level visual features and high-level semantics and integrates them into a unified vector space model.
In 2004, Grosky [15] proposed an image retrieval technique based on visual contents, which include color, texture, shape and spatial constraints, although these were limited. Such retrieval is an integral part of multimedia information systems. In the past few years, CBIR has seen a great deal of emphasis in the context of multimedia databases.
Byrne [11], on the other hand, proposed a technique that applies Natural Language Processing and CBIR to the practical problem of finding database images to answer user queries. In 2014, Sigla and Garg [10] described another CBIR technique based on the combination of color features and Gabor wavelet transformed features. They analyzed both query and database images using color moments and the autocorrelogram, which they applied to UINT8 images. The Gabor wavelet was used to determine the mean squared energy, and for score matching they used the well-known Support Vector Machine (SVM) classifier.
In 2015, Husain and Akbar [16] proposed a method for retrieving images from a database using a Back Propagation Neural Network (BPNN) classifier consisting of an input layer, a hidden layer and an output layer. They evaluated some common feature sets to classify images and to identify the features relevant to users, selecting 50 nodes for the hidden layer on the basis of experiment. They chose half of the images for training and the rest for testing.
In 2016, Ali et al. [17] proposed an image retrieval approach based on histograms of triangles, which adds spatial information to the inverted index of Bag of Features (BoF). They divided each image into two and into four triangles separately in order to compute the histograms of triangles.
Also in 2016, Mehmood et al. [18] introduced an approach for retrieving images using local and global histograms of visual words in combination. They computed the global histogram of visual words over the whole image, whereas the local histogram of visual words was constructed over a local rectangular region of the image.
In our proposed method, we have used the HSV histogram, the autocorrelogram and color moments to extract the color features, and the Wavelet and Gabor wavelet transformations to extract the texture features of both the database and the query images. This combined use of color and texture features gives our proposed technique better accuracy than the previous studies in this field.

METHODOLOGY
Here we present a technique that uses multiple features with multiple models to search for similar images in a vast collection of digital image datasets. This section describes the whole strategy of our system.

Solution System Architecture
Our CBIR system integrates several components. We have used color moments, two color models, the HSV histogram and the color autocorrelogram as color feature descriptors [12]. The color space is first quantized into discrete levels, the number of pixels at each level is counted, and the HSV histogram is computed from these counts [19]. After that, the Wavelet Transformation and the Gabor Wavelet Transformation are used as texture features [20].

HSV histogram
We have worked with the HSV color space of the database images [5,21]. The HSV planes are shown in Fig. 2.
In order to decrease the number of colors, we have quantized the color space into several bins. Smith [22] designed a system to quantize the color space into 166 colors, while Li [23] used 72 colors. In 2013, Kaur and Banga [24] proposed 15 non-uniform colors, which are used in our work [10]. The RGB color channels are first transformed to HSV, and the database images and the query image are then quantized in HSV color space into 8×2×2 equal bins. Finally, using the HSV histogram, we have produced a 1×32 vector.
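The quantization into 8×2×2 equal bins can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `hsv_histogram`, the use of the standard-library `colorsys.rgb_to_hsv` conversion, the bin ordering and the normalization by pixel count are all assumptions made for the example.

```python
import colorsys


def hsv_histogram(pixels, bins=(8, 2, 2)):
    """Return a normalised HSV histogram for (R, G, B) pixels with values 0..255.

    Assumption: equal-width bins and normalisation by the pixel count;
    the paper only states that 8x2x2 equal bins give a 1x32 vector.
    """
    h_bins, s_bins, v_bins = bins
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns H, S, V in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # min() keeps the boundary value 1.0 inside the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    if count:
        hist = [x / count for x in hist]
    return hist


if __name__ == "__main__":
    demo = [(255, 0, 0), (255, 0, 0), (0, 0, 0), (0, 0, 255)]
    print(len(hsv_histogram(demo)))  # length of the feature vector
```

With the default bins the function always returns 32 values, matching the 1×32 vector described above.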

Autocorrelogram
A correlogram is a correlation statistic that is generally used for data analysis, especially of image data. In time series analysis, the correlogram is known as the autocorrelation plot. For an image, the correlogram is defined for a pair of levels $(g_i, g_j)$ [12,1]. The autocorrelogram [25,22] captures the spatial correlation of identical levels only.
The probability that the pixels p1 and p2 at distance d have the same level $g_i$ is expressed by Eq. (2). Generally, the distance measure is the L1 norm between autocorrelograms, and the image in uint8 form is used to extract it. Finally, a 1×64 feature vector is created containing the color autocorrelogram.
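For reference, the correlogram and the autocorrelogram are commonly written as below. This is the standard form from the color correlogram literature and is given here as an assumption about the notation intended; the symbols $\gamma$, $\alpha$, $I$ and $I_{g}$ (the set of pixels of image $I$ whose level is $g$) are introduced for illustration, and $|p_1 - p_2|$ is usually taken to be the chessboard ($L_\infty$) distance between pixel positions.

```latex
% Correlogram: probability that a pixel at distance d from a pixel of
% level g_i has level g_j.
\gamma^{(d)}_{g_i, g_j}(I) =
  \Pr_{p_1 \in I_{g_i},\; p_2 \in I}
  \left[\, p_2 \in I_{g_j} \;\middle|\; |p_1 - p_2| = d \,\right]

% Autocorrelogram: the same statistic restricted to identical levels.
\alpha^{(d)}_{g_i}(I) = \gamma^{(d)}_{g_i, g_i}(I)
```

Under this reading, a 1×64 vector corresponds to one autocorrelogram value per quantized level for 64 levels at a single distance, although the number of levels and distances actually used is not stated.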

Color moment
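We have used the first-order and second-order color moments, namely the mean and the variance. We analyzed the images, extracted the first and second moments from each of the R, G and B channels, and produced a 1×6 vector. The mathematical definitions of the two moments [11,21,20] are as follows:
$$\mu_c = \frac{1}{N}\sum_{j=1}^{N} p_{c,j}, \qquad \sigma_c^2 = \frac{1}{N}\sum_{j=1}^{N}\left(p_{c,j} - \mu_c\right)^2$$
where $p_{c,j}$ is the value of channel $c \in \{R, G, B\}$ at pixel $j$ and $N$ is the number of pixels in the image.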
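The two color moments can be sketched in Python as follows. The function name `color_moments`, the ordering of the six values (three means followed by three variances) and the use of the population variance (division by the number of pixels) are assumptions made for this example.

```python
def color_moments(pixels):
    """Return [mean_R, mean_G, mean_B, var_R, var_G, var_B] for (R, G, B) pixels.

    Assumption: population variance (divide by N) and means-then-variances
    ordering; the paper only states that a 1x6 vector is produced.
    """
    pixels = list(pixels)
    n = len(pixels)
    if n == 0:
        raise ValueError("image has no pixels")
    # First-order moment: per-channel mean.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    # Second-order moment: per-channel variance around that mean.
    variances = [sum((p[c] - means[c]) ** 2 for p in pixels) / n for c in range(3)]
    return means + variances


if __name__ == "__main__":
    print(color_moments([(0, 10, 20), (2, 10, 40)]))
```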

Texture Feature
Texture can be described by measures such as the mean coefficient, standard coefficient, energy, amplitude, contrast and correlation [19]. We have used a combination of Wavelet and Gabor Wavelet Transformation feature vectors to describe the texture features of images.

Wavelet transformation
Wavelet transformation is one of the texture feature models of an image, and we have applied it to measure wavelet coefficients such as the mean coefficient and the standard coefficient [26,20]. First, the RGB color space is converted to gray scale, and all converted gray-scale images are resized to a specific height and width. Then the coefficients at different levels are calculated using the Discrete Wavelet Transformation. Next, the mean and standard coefficients are calculated from the coefficients obtained in the previous step. Finally, the wavelet feature vector is built from these wavelet coefficients. The result is a 1×20 texture feature vector containing the first two moments of the wavelet coefficients.
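The steps above can be sketched in Python as follows. Several choices here are assumptions, since the text does not specify them: the Haar wavelet, three decomposition levels (which happen to give 10 sub-bands and therefore 2 × 10 = 20 values), the use of the plain mean and population standard deviation of each sub-band, and the helper names `haar_level`, `mean_std` and `wavelet_features`. The input is assumed to be an already converted and resized gray-scale image given as a list of rows.

```python
import math


def haar_level(img):
    """One level of the orthonormal 2-D Haar transform on 2x2 blocks.

    Returns (LL, LH, HL, HH); img must have an even number of rows and columns.
    """
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((a + b + d + e) / 2.0)  # approximation
            row_lh.append((a + b - d - e) / 2.0)  # difference between rows
            row_hl.append((a - b + d - e) / 2.0)  # difference between columns
            row_hh.append((a - b - d + e) / 2.0)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def mean_std(band):
    """Mean and population standard deviation of all coefficients in a sub-band."""
    values = [v for row in band for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [mean, math.sqrt(var)]


def wavelet_features(gray, levels=3):
    """Return 2 * (3 * levels + 1) values: (mean, std) for every sub-band."""
    step = 2 ** levels
    if not gray or len(gray) % step or len(gray[0]) % step:
        raise ValueError("image sides must be non-zero multiples of 2**levels")
    features = []
    approx = gray
    for _ in range(levels):
        approx, lh, hl, hh = haar_level(approx)
        for band in (lh, hl, hh):
            features += mean_std(band)
    features += mean_std(approx)  # final approximation band
    return features


if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]
    print(len(wavelet_features(flat)))
```

A real implementation would more likely call a wavelet library; the pure-Python version is used here only to keep the example self-contained.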

Gabor wavelet transformation
The Gabor filter is generally used for edge detection in images. In an image retrieval system, the Gabor filter is used to extract texture features such as the mean amplitude and the mean squared energy from an image. Gabor filters represent frequency and orientation in a way similar to the human visual system. For this reason, a texture feature vector is generated for each scale and orientation. In our solution system, we have used 0.4 as the radius and 10 as the sharpness for measuring the median filter, with six filter orientations and five wavelet scales. We have also used the two-dimensional Gabor function [19] and its Fourier Transformation, which are mathematically defined by equations 3 and 4 respectively [27], where $x' = x\cos\theta - y\sin\theta$, $y' = -x\sin\theta + y\cos\theta$, $F$ represents the radial frequency of the Gabor function, and $\sigma_x$ and $\sigma_y$ are the space constants that define the Gaussian envelope along the x- and y-axes.
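For reference, a commonly used form of the two-dimensional Gabor function and of its Fourier transform is sketched below. It is an assumption that equations 3 and 4 take this form. The symbols $x'$, $y'$, $F$, $\sigma_x$ and $\sigma_y$ are as defined above, while $u$, $v$, $\sigma_u$ and $\sigma_v$ are introduced here, and the transform is written for the orientation $\theta = 0$.

```latex
% Two-dimensional Gabor function: Gaussian envelope times a complex sinusoid.
g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}
  \exp\!\left[-\frac{1}{2}\left(\frac{x'^2}{\sigma_x^2}
  + \frac{y'^2}{\sigma_y^2}\right)\right]
  \exp\!\left(2\pi j F x'\right)

% Fourier transform for \theta = 0: a Gaussian centred on the frequency F.
G(u, v) = \exp\!\left[-\frac{1}{2}\left(\frac{(u - F)^2}{\sigma_u^2}
  + \frac{v^2}{\sigma_v^2}\right)\right],
\qquad \sigma_u = \frac{1}{2\pi\sigma_x}, \quad \sigma_v = \frac{1}{2\pi\sigma_y}
```

In this form each choice of scale (through $F$, $\sigma_x$, $\sigma_y$) and orientation (through $\theta$) gives one filter, so six orientations and five scales would give thirty filter responses from which amplitude and energy statistics can be taken.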
After combining the wavelet transformation and Gabor wavelet filtering, we have generated a 1×88 texture feature vector for each database image and for the query image. These texture feature vectors have been combined with the color feature vector to produce the image feature vector. A 1×190 feature vector is thus generated, where one column is used as an index of each image.

Ranking Similar Images
The images are retrieved using a distance matrix. For each query image, the corresponding distance matrix is calculated row by row: the Manhattan or Euclidean distance between the query feature vector and the feature vector of each dataset image is computed and stored with the corresponding image name. The image names are then sorted according to their distance values. Finally, at most twenty images are retrieved, as determined by our user interface.
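The ranking step can be sketched in Python as follows. The names `rank_images` and `manhattan`, the dictionary layout of the database and the tie-breaking by image name are assumptions made for this example; only the limit of twenty results and the choice of distance come from the description above.

```python
def manhattan(a, b):
    """City-block distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_images(query_vec, database, distance=manhattan, top_k=20):
    """Return up to top_k (image_name, distance) pairs, nearest first.

    database maps an image name to its feature vector (hypothetical layout).
    """
    scored = [(distance(query_vec, vec), name) for name, vec in database.items()]
    scored.sort()  # ascending distance; ties are broken by image name
    return [(name, dist) for dist, name in scored[:top_k]]


if __name__ == "__main__":
    db = {"a.jpg": [0, 0], "b.jpg": [3, 4], "c.jpg": [1, 1]}
    for name, dist in rank_images([0, 0], db, top_k=2):
        print(name, dist)
```

Passing a different function as `distance` switches the ranking to another metric without changing the sorting logic.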

Distance Vector Based Approach
The distance vector based approach measures the distance between the feature vectors of a database image and the test image. Of the available distance formulae, the Euclidean and Manhattan distances are given below.

Euclidean distance
If the two vectors are $A = (x_1, x_2, \ldots, x_n)$ and $B = (y_1, y_2, \ldots, y_n)$, then the Euclidean distance is defined as [8]:
$$d_E(A, B) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (5)$$
The Euclidean distance is always non-negative; zero indicates identical points, and larger values indicate less similarity.

Manhattan distance
The Manhattan distance between two points $x$ and $y$ in $k$ dimensions, measured along axes at right angles, is defined as follows:
$$d_M(x, y) = \sum_{i=1}^{k} |x_i - y_i| \qquad (6)$$
The Manhattan distance is also known as the city block distance. It is always non-negative; zero indicates identical points, and larger values indicate less similarity.
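Both distance measures can be sketched in a few lines of Python. The function names `euclidean_distance` and `manhattan_distance` and the length check are choices made for this example.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of the absolute differences of two vectors (city block distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))
    print(manhattan_distance([0, 0], [3, 4]))
```

For the vectors (0, 0) and (3, 4) the Euclidean distance is 5 and the Manhattan distance is 7, which illustrates that the Manhattan distance is never smaller than the Euclidean distance for the same pair of vectors.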

EXPERIMENTAL RESULTS
We have implemented our proposed method in MATLAB, following the process introduced by Bajaj et al. [28] based on color and texture feature extraction, which is more efficient than other systems. We have plotted precision, recall and F-score curves in order to compare the performance of the system, and we report better accuracy than the other systems.

Dataset and Classification of Images
We selected ten categories of digital images from the Wang dataset [29,30], with 100 images in each category and 1000 images in total in our experiment [20]. The image categories are shown in Table 1.

Image Retrieval Result
The retrieval results of our implemented system for each type of query image are displayed in Fig. 3 to Fig. 12, which show the similar-image search results when an image from each class is used as the query image.

Performance Evaluation
Both objective and subjective performance evaluation have become a crucial part of the image retrieval process. Generally, the performance of a CBIR system is evaluated using precision and recall.

Accuracy evaluation
The accuracy depends on precision, which is defined as the ratio of the number of relevant images retrieved to the total number of retrieved images. Recall, on the other hand, measures the ability to retrieve all relevant images from the database. It is the ratio of the number of relevant images retrieved to the total number of relevant images in the database [5]. In statistical analysis, the F-score is a measure of test accuracy in binary classification. Precision, recall and the F-score are defined as follows [31]:

$$\text{Precision} = \frac{\text{Retrieved relevant images}}{\text{Total retrieved images}} \qquad (7)$$
$$\text{Recall} = \frac{\text{Retrieved relevant images}}{\text{Total relevant images in the database}} \qquad (8)$$
$$\text{F-score} = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (9)$$
Generally, the F-score is best at a value of 1 and worst at 0. The accuracy of our solution system for each type of query image, using the Manhattan and Euclidean distances, is shown in Table 4.
Fig. 13. Precision, recall, and F-score matrix in a surface using Manhattan distance
Fig. 11. Precision, recall, and F-score matrix in a surface using Euclidean distance
Based on Table 4, a stem diagram of accuracy using the Manhattan and Euclidean distances is drawn in Fig. 15.
Fig. 15. Accuracy of our system using both Manhattan and Euclidean distance
Table 5 compares the accuracy of our system with other existing systems for each type of query image [13]. In each case, our solution system provides better accuracy than the others.
The aim of our solution system is to increase accuracy. The comparison of our solution system with other existing systems, using the Manhattan distance, is shown in Fig. 16 [20].
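The precision, recall and F-score measures can be sketched in Python as follows. The function names are choices made for this example, and the counts in the usage lines are hypothetical: 17 relevant images among 20 retrieved, for a category that holds 100 relevant images, which matches the dataset layout and the retrieval limit described earlier but is not a reported result.

```python
def precision(relevant_retrieved, total_retrieved):
    """Fraction of the retrieved images that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved, total_relevant):
    """Fraction of the relevant images in the database that were retrieved."""
    return relevant_retrieved / total_relevant


def f_score(p, r):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical query: 17 of 20 retrieved images are relevant,
    # and the category contains 100 relevant images.
    p = precision(17, 20)
    r = recall(17, 100)
    print(p, r, f_score(p, r))
```

Because at most twenty images are retrieved while each category holds 100 images, recall cannot exceed 0.2 in this setting, which is why recall values stay low even when precision is high.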

Retrieval time evaluation
Image retrieval time is the time taken to search the dataset for similar images. Most systems take considerable time for a simple query image search in a large-scale database. The elapsed time for image searching in our image retrieval system is shown in Table 6 [20]. Our proposed system takes slightly more time than a system using multiple features with a single model, as shown in Fig. 17. The additional time cost of our proposed system is therefore small, while it provides high accuracy in similarity search [20].

Performance comparison
The performance is measured on the basis of average precision, and a comparative evaluation against previous works is given in Table 7.

Summary of our Achievements
We have developed a similar-image retrieval system that is more efficient and effective than other existing systems. Our large-scale image retrieval system achieves satisfactory precision by incorporating discriminative low-level feature extraction techniques. Our approach can also be combined with several post-processing techniques to improve the image retrieval performance to a great extent. Our developed system provides better accuracy and works well on a large-scale dataset. We have implemented a system that works properly on 10 different classes of images with 1,000 database images. We have obtained 87.5% accuracy, with average precision and recall of 0.875 and 0.175 respectively, which is superior to other existing systems. Also, the time complexity of our system does not differ greatly from that of other existing systems.

Future Works
In future, we will try to work with an image dataset of 10,000 database images. On the basis of our research, we expect that if shape feature extraction, the Contourlet transform in place of the wavelet transform (the wavelet takes image features from two directions, whereas the contourlet takes image features multi-directionally), and the Local Tetra Pattern (LTrP) are included in our developed system, the accuracy will exceed 87.5%.