Texture Classification of Sea Turtle Shell Based on Color Features: Color Histograms and Chromaticity Moments

A collaborative system for cataloging sea turtle activity that supports picture and video content demands automated solutions for data classification and analysis. This work assumes that the color characteristics of the carapace are sufficient to classify each species of sea turtle, unlike the traditional method, in which sea turtles are classified manually by counting their shell scales and examining the shape of their head. In particular, the aim of this study is to compare two color-based feature extraction techniques, Color Histograms and Chromaticity Moments, combined with two classification methods, K-nearest neighbors (KNN) and Support Vector Machine (SVM), identifying which combination is most effective for classifying the five species of sea turtles found along the Brazilian coast. The results showed that Chromaticity Moments combined with the KNN classifier gave quantitatively better results for most species, with a global accuracy of 0.74 and an accuracy of 100% for the Leatherback sea turtle, while the Color Histograms descriptor proved less precise regardless of the classifier. This work demonstrates that it is possible to use a statistical approach to assist a specialist in identifying sea turtle species.


Introduction
Among the seven species of sea turtles found worldwide, five occur in Brazil, and all of them are on the red list of threatened species of the IUCN (International Union for Conservation of Nature) [1]. Sea turtles have a long and complex life cycle: adulthood begins only at 25 years of age, which complicates the development of the species [2], and the situation may get worse because of the effects of climate change [3]. However, the extinction of sea turtles has been combated by environmental organizations such as Tamar-ICMBio [4].
In Brazil, information about sea turtles began to be collected in the early 1980s by identifying the coastal areas of reproduction [5]. As a natural consequence of research activity over the years, such areas have seen the emergence of research centers, as well as the development of environmental education projects focused on sea turtle conservation. The most prominent example is the Tamar-ICMBio project [4], a federal research program applied to marine wildlife preservation. Since 1982, the project has devoted efforts to collecting sea turtle data by tagging females, analyzing the spawning phases and mapping the life cycle of these animals using geolocation information. To manage this data, the SITAMAR system was developed [6]. SITAMAR is a restricted-access information system which has remarkably improved the storage, querying, and analysis of the gathered data.
Despite the progress achieved with SITAMAR, past experience showed that the participation of coastal communities is essential to assist biologists by providing them with information about the turtles' daily routine, which has motivated the Tamar-ICMBio leaders to make the system open-access in the near future [7]. This collaborative aspect of the system, however, implies continued and fast growth of the database, which, in conjunction with the variety of turtle species and the limited number of taxonomists or experts on species classification, poses a big challenge for future studies, as [8] points out.
Current collaborative systems are intrinsically related to mobile technologies that enable users, among other things, to share text, geolocation, activities, and image content. SITAMAR is expected to provide a mobile interface that supports on-demand uploading of picture and video content of turtle activity, which demands automated solutions for the organization and analysis of this ever-increasing database. This work focuses in particular on the problem of turtle species classification based on computer vision techniques. Indeed, considerable effort has gone into studies that automate the classification of animals using machine learning methods.
Usually, biology experts classify turtles manually by counting their shell scales and examining the shape of their head [10]. Nonetheless, [8] and [11] showed that computer vision techniques are suitable for classifying animal species with distinct patterns of color and texture on their bodies. Therefore, scientists working on the protection of marine turtles have been trying to extract unique features for identifying the species to which an individual animal belongs [12,13,14]. Other works present keys to identify sea turtles from photographs, a step towards a non-invasive identification method [15,12,16]. However, [14] shows that color texture features outperform the shape feature for the recognition of sea turtle species. In this work, we investigated two color-based descriptors for automated species classification: a) color histogram features [17]; and b) chromaticity moments [18]. Although they differ from the human-based classification criteria, these features were adopted because of the distinctive shell color patterns observed for sea turtle species, as depicted in Fig. 1.
Samples of the five sea turtle species found along the Brazilian coast, in their natural environment [19], are presented in Figure 1. Beside each turtle picture, the figure also shows a rectangular patch cropped from the shell region of the respective turtle, as well as the RGB histogram of that patch. The histogram for the green turtle, for example, spreads over a wide color range in the three channels (RGB), while the Leatherback turtle histogram tends to concentrate at low values (dark colors). The Loggerhead turtle (or Caretta Caretta) is reddish: the peaks of the green and blue channels are very close and lie at lower values than the peak of the red channel, which is more central in the histogram. The Hawksbill turtle (or Eretmochelys Imbricata) also has dark colors, but not as dark as the Leatherback, while the Olive Ridley turtle (or Lepidochelys Olivacea) has more centralized peaks in its channels.
Different studies have argued for supervised machine learning techniques in this task, such as Artificial Neural Networks (ANN) [14,20,21], Naive Bayes [22,23], Linear Discriminant Analysis (LDA) [24], Decision Trees [25], KNN [23] and Support Vector Machines (SVM) [23]. All these works have shown that automated species identification achieves higher accuracy than human classification. The main contribution of this study is the evaluation of the previously mentioned color descriptors in conjunction with the K-Nearest Neighbor (KNN) [26] and Support Vector Machine (SVM) [27] techniques for sea turtle species classification. The image dataset, which is composed solely of patches of turtle shell, was assembled specifically for the experiments conducted here, and thus can be viewed as another contribution of our research.
The image dataset used in the experiments was assembled specifically for this study. First, we collected 15 photos of turtles (three per species) from public online repositories; the photos are RGB images with an average resolution of 800 × 600 pixels. For each image, six square 80 × 80 texture patches were manually cropped from the region enclosing the turtle shell (Figure 2), totalling 90 texture patches (18 per sea turtle species).

Color features
The classification task addressed in this work relies fully on the color information present in sea turtle shells. Such information must be encoded as an n-dimensional feature vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, i.e., a real-valued vector representation suitable for classification tools.
The following sections present the theoretical background of color feature extraction by means of color histogram features and chromaticity moments, the two color descriptors explored in this study.

Color histogram features
The first feature extractor analyzed consists of five statistics extracted from color distributions. Let h be a histogram for a single color channel of an M × N image. The probability mass function can be obtained by normalizing h by the total number of pixels of the source image. In mathematical notation, we have

$$p(l) = \frac{h(l)}{MN},$$

where l denotes an intensity level, and L denotes the number of possible intensity levels (here we adopted L = 255). The statistical measures used as features, namely mean, variance, kurtosis, energy, and entropy, can be computed directly from the probability mass function.
The mean, the most intuitive measure, is the average intensity level of the image, while the variance quantifies how much the color distribution spreads around the mean. Kurtosis is also a measure of dispersion and quantifies how flat (or peaky) the histogram is. Energy evaluates the textural uniformity of the image, with higher energy values associated with images with few colors. Finally, we use the entropy to measure the degree of disorder of the image data [17].
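As an illustration of the five statistics described above, the following is a minimal sketch rather than the implementation used in the experiments. The function names are hypothetical, the formulas are the textbook definitions (the kurtosis normalization and the base-2 logarithm in the entropy are assumptions), and the per-channel ordering of the stacked vector is likewise an assumption.

```python
import math
from collections import Counter


def histogram_statistics(channel_values):
    """Return (mean, variance, kurtosis, energy, entropy) for one color channel.

    channel_values: iterable of integer intensity levels, one per pixel.
    """
    values = list(channel_values)
    total = len(values)
    # Probability mass function: histogram normalized by the number of pixels.
    pmf = {level: count / total for level, count in Counter(values).items()}

    mean = sum(level * p for level, p in pmf.items())
    variance = sum((level - mean) ** 2 * p for level, p in pmf.items())
    # Assumed convention: fourth central moment over squared variance,
    # defined as 0 for a constant channel to avoid dividing by zero.
    fourth = sum((level - mean) ** 4 * p for level, p in pmf.items())
    kurtosis = fourth / variance ** 2 if variance > 0 else 0.0
    energy = sum(p ** 2 for p in pmf.values())
    # Assumed base-2 logarithm; only non-zero probabilities are stored in pmf.
    entropy = -sum(p * math.log2(p) for p in pmf.values())
    return mean, variance, kurtosis, energy, entropy


def color_histogram_features(pixels):
    """Stack the five statistics of the R, G and B channels (15 features)."""
    pixels = list(pixels)
    features = []
    for channel in range(3):
        features.extend(histogram_statistics(p[channel] for p in pixels))
    return features
```

Calling `color_histogram_features` on the flattened list of (R, G, B) pixels of an 80 × 80 patch would yield one 15-element vector per patch.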
In this study, the statistical measures are computed independently for the R, G, and B channels, so each measure gives rise to three elements in the feature vector, totaling 15 features.

Chromaticity moments
[18] introduced chromaticity moments as an efficient descriptor for color texture classification. The chromaticity information of an image is described by the occurrence of certain colors in the CIE-XYZ space (chromaticity diagram), as well as by the number of occurrences (pixels) of each color (chromaticity distribution). [18] makes use of mathematical moments to summarize both the chromaticity diagram and the distribution, which characterizes the concept of chromaticity moments.
For a more formal definition, consider an M × N image I. The chromaticity diagram for I is defined as follows:

$$T(x, y) = \begin{cases} 1, & \text{if some pixel of } I \text{ has chromaticity } (x, y), \\ 0, & \text{otherwise.} \end{cases}$$

The respective chromaticity distribution is a 2D histogram given by:

$$D(x, y) = k, \quad \text{where } k \text{ is the number of pixels of } I \text{ with chromaticity } (x, y).$$

Figure 3 shows how T and D are distributed. T(x, y) and D(x, y) can be viewed as functions over which the respective moments of order (m + l) can be calculated:

$$M^T_{m,l} = \sum_{x=0}^{X_S} \sum_{y=0}^{Y_S} x^m y^l \, T(x, y), \qquad M^D_{m,l} = \sum_{x=0}^{X_S} \sum_{y=0}^{Y_S} x^m y^l \, D(x, y),$$

where $X_S = Y_S = 100$, following the normalization adopted by [18]. A set of moments can be obtained by assigning different values to (m, l), and adding or removing a moment can affect the classification performance. This work adopts the five combinations (0, 0), (1, 0), (2, 0), (0, 1), (0, 2) for each moment definition, which leads to the following feature vector:

$$\mathbf{x} = (M^T_{0,0}, M^T_{1,0}, M^T_{2,0}, M^T_{0,1}, M^T_{0,2}, M^D_{0,0}, M^D_{1,0}, M^D_{2,0}, M^D_{0,1}, M^D_{0,2})^T.$$
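The computation of the chromaticity moments can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the RGB to CIE XYZ conversion uses the common linear matrix without gamma correction (the conversion actually used in the experiments is not specified here), chromaticities are rounded to the nearest bin of a 0 to 100 grid, and black pixels, whose chromaticity is undefined, are skipped.

```python
def chromaticity_moments(pixels, size=100,
                         orders=((0, 0), (1, 0), (2, 0), (0, 1), (0, 2))):
    """Return the moments of T followed by the moments of D (10 features).

    pixels: iterable of (R, G, B) tuples.
    size:   X_S = Y_S, the resolution of the discretized chromaticity plane.
    """
    distribution = {}  # D(x, y): number of pixels per chromaticity bin
    for r, g, b in pixels:
        # Assumed linear RGB -> CIE XYZ conversion (no gamma correction).
        X = 0.412453 * r + 0.357580 * g + 0.180423 * b
        Y = 0.212671 * r + 0.715160 * g + 0.072169 * b
        Z = 0.019334 * r + 0.119193 * g + 0.950227 * b
        total = X + Y + Z
        if total == 0:
            continue  # black pixel: chromaticity undefined, skipped here
        x = round(size * X / total)
        y = round(size * Y / total)
        distribution[(x, y)] = distribution.get((x, y), 0) + 1

    # T(x, y) is 1 wherever D(x, y) > 0, so its moments ignore the counts.
    moments_t = [sum(x ** m * y ** l for (x, y) in distribution)
                 for m, l in orders]
    moments_d = [sum(x ** m * y ** l * count
                     for (x, y), count in distribution.items())
                 for m, l in orders]
    return moments_t + moments_d
```

Storing only the occupied bins in a dictionary keeps the sketch short; a dense 101 × 101 array would be an equally valid representation of T and D.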

Classification
This work is based on a supervised classification approach. Supervised classification aims to assign a label to an unknown object given a training dataset consisting of known (or labeled) objects. Such objects are represented by a proper set of features embedded in a multi-dimensional space, the feature space. In this space, a proper choice of features makes objects of distinct classes occupy regions that are as disjoint as possible, while objects of the same class tend to be restricted to compact regions. This fact is exploited by different machine learning algorithms such as Support Vector Machines (SVM) [27] and K-nearest neighbors (KNN) [26], the two classification techniques investigated in this work. The general workflow described above is shown in Figure 4.
In a broader view, SVMs set linear decision boundaries (hyperplanes) as far as possible from the example points [27], which comprise the features extracted from the training dataset. For non-linear classification, the kernel trick can be employed, so that the training data is mapped into a higher-dimensional feature space where it can be more easily separated linearly. KNN, in turn, is a simple non-linear classification technique based on a voting scheme. Basically, a new object is labeled with the majority vote of its K nearest neighbors, that is, the subset of the training objects (example points) containing the K most similar elements according to a distance metric in the feature space.
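The KNN voting scheme described above can be sketched in a few lines. This is a minimal illustration with a hypothetical function name, assuming the Euclidean distance and feature vectors given as plain sequences of numbers; it does not cover the SVM.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort the training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    # Count the labels of the k closest points and return the most frequent.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

When two labels receive the same number of votes, this sketch returns the one whose neighbor is closest to the query, which is one of several reasonable tie-breaking choices.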
Figure 4: Supervised classification process used in this study.

Experimental Results And Discussion
All experiments were carried out in Python using the OpenCV library version 2.4 [29].
The color histogram features and the chromaticity moments were tested with the KNN and SVM classifiers. For KNN, K = 3 was used, with the Euclidean distance as the distance metric. For SVM, the linear kernel was employed. Two cross-validation techniques, leave-one-out and k-fold [30] (k = 10), were used to assess the performance of each descriptor-classifier combination, preserving class proportions.
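The two validation schemes can be organized as in the sketch below, which assumes a simple round-robin assignment to keep class proportions similar across folds and does not shuffle the samples; the function names are hypothetical, and the exact splitting procedure used in the experiments may differ. Leave-one-out is obtained as the special case where k equals the number of samples.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        # Deal each class round-robin so every fold gets a similar share.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices); k == len(labels) is leave-one-out."""
    folds = stratified_folds(labels, k)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test
```

With 90 patches (18 per species) and k = 10, each test fold holds 9 patches, with one or two patches from every species.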
The results of those experiments were summarized in confusion matrices. From those matrices, three performance measures were calculated: precision, recall, and F-measure. The F-measure is the harmonic mean of precision and recall, with the best values close to 1 [31].
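For reference, the three measures can be derived from a confusion matrix as in the following sketch. The function name is hypothetical, and the orientation of the matrix (rows as true classes, columns as predicted classes) is an assumption.

```python
def per_class_metrics(confusion, class_index):
    """Precision, recall and F-measure for one class.

    confusion[i][j] counts samples of true class i predicted as class j
    (this row/column orientation is an assumption).
    """
    true_positive = confusion[class_index][class_index]
    predicted = sum(row[class_index] for row in confusion)
    actual = sum(confusion[class_index])
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For a made-up two-class matrix `[[18, 0], [6, 12]]`, class 0 has all 18 of its samples recovered (recall 1.0) but also receives 6 samples of the other class, so its precision is 18/24 = 0.75.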
The values obtained for these measures are illustrated by means of network and bar graphs. Also, the class and feature extractor names were shortened to facilitate the classification analyses, as shown in Table 1.
Table 2 and Table 3 show the precision and recall results for each sea turtle species. Lepido is clearly the class with the most problems and the greatest variation in precision, with a worst case of 0.23, which shows how difficult it is to represent this class in terms of texture. In contrast, Dermo and Eretmo show good results, reaching 0.95 and 0.91 in precision and 1 and 0.89 in recall, respectively.
Figure 5a shows a large difference between the classifiers. It is easy to notice that the best performance was reached by the KNN classifier, since almost all values are closer to the edge of the graph. For the class Dermo in particular, the classification achieved good results. In contrast, the behavior of the other classes changed considerably with the combination of descriptor and classifier. The class Chelonia performed well with Chromaticity Moments and KNN, but had its worst performance when the classifier was changed, going from 0.69 to 0.22, which shows that the choice of classifier is important for the methods proposed in this work.
Another divergence was observed for the class Lepido, which proved to be the most difficult to classify. This class had its best performance with Chromaticity Moments and KNN, but when the same descriptor was combined with the SVM classifier, the F-measure fell drastically, reaching 0.1. This class causes the SVM classifier to behave very poorly, generating erroneous predictions.
The class Eretmo behaved quite differently from the others, since its best case was achieved with the Color Histograms method, regardless of the classifier. This shows that the textures of this class contain information that is best captured by the Color Histograms method.
Finally, Figure 5b shows the results for the k-fold cross-validation method. It is important to note that, for the SVM classifier, it revealed large changes in relation to the leave-one-out analysis. The class Lepido showed the greatest improvement, especially with the Chromaticity Moments descriptor. Also, as expected, the graph for the KNN classifier shows a significant drop in the values, since not all the training images of each class were available as in the leave-one-out analysis. The large variation in the SVM chart may be due to the classifier being less confused when fewer images of the class Lepido are available, leading to a small increase in the F-measure values.
The best accumulated precision (Figure 6) was 3.81, obtained with Chromaticity Moments and the KNN classifier. This combination was also the only one for which the precision values of all classes were above 0.6, or 60%. That is, the precision of this combination inspires greater confidence, since the error rate is at most 40%. Only for the Caretta species did this combination not reach the best result, losing to Chromaticity Moments combined with the SVM classifier.

Conclusion
The Color Histograms descriptor combined with SVM presented more problems, both because it represents the characteristics of the images in a more basic form and because the classifier is not very precise in separating the presented characteristics and classes. The Chromaticity Moments method combined with the SVM classifier also displayed classification failures due to the lower accuracy of the classifier.
The results of Chromaticity Moments combined with the KNN classifier were clearly superior, probably due to the more precise class separation provided by the classifier. The results obtained were satisfactory, especially for the species Dermochelys Coriacea, Eretmochelys Imbricata and Chelonia Mydas. Although the classes Caretta Caretta and Lepidochelys Olivacea presented lower performance, this was still an improvement over the Color Histograms method.
The results presented in this paper demonstrate that it is possible to use a statistical approach to pattern recognition for this task, but there is still room for improvement, such as the use of Deep Convolutional Neural Networks [32]. The database created for this work will be released for public use, together with these results, enabling the research community to propose different systems and techniques and, at the same time, facilitating the comparison of results. Hence, researchers and taxonomists can use these results to save time in choosing an approach when developing an automatic classification system for sea turtle species.

Figure 6: Accumulated Precision for Leave-One-Out

Table 1: Abbreviations used in the classification analyses.

Table 2: Precision and Recall for each combination of feature descriptor and classifier under K-Fold cross-validation.

Table 3: Precision and Recall for each combination of feature descriptor and classifier under Leave-One-Out cross-validation.