Comparative study of ensemble deep learning models to determine the classification of turtle species

ABSTRACT


INTRODUCTION
Turtles are reptiles that are easily recognized by their distinctive body shape, particularly the head and the carapace, or dorsal (back) shell [1]. There are three main groups of turtles: land turtles, aquatic turtles, and marine turtles. Marine turtles are also known as sea turtles. There are seven species of sea turtles in the world [2]-[5], six of which are found in Indonesia: green turtles (Chelonia mydas), hawksbill turtles (Eretmochelys imbricata), olive ridley turtles (Lepidochelys olivacea), flatback turtles (Natator depressus), leatherback turtles (Dermochelys coriacea), and loggerhead turtles (Caretta caretta) [6]. According to data from the Bengkulu Province Marine and Fisheries Service, only four of these species visit the Bengkulu coast: green turtles, hawksbill turtles, loggerhead turtles, and olive ridley turtles.
Comput Sci Inf Technol ISSN: 2722-3221 Comparative study of ensemble deep learning models to … (Ruvita Faurina) 25
Sea turtles are currently threatened with extinction and are listed as endangered reptiles on the International Union for Conservation of Nature (IUCN) Red List and in Appendix I of the Convention on International Trade in Endangered Species (CITES), which covers species threatened with extinction [7]-[9]. The endangered status of sea turtles is caused by threats from humans and animal predators. Humans and predators take turtle eggs as a source of protein, and turtle shells are used as accessories in traditional rituals [10]. Members of coastal and other communities are often mistaken about, or unable to distinguish, the species of turtles they find on the coast, because the species look very similar. This similarity also hinders the reporting of turtle findings to conservation authorities. In addition, reports of turtle findings are still handled manually, so handling and rescuing the turtles takes a long time. This delay hinders conservationists from making semi-natural nests for sea turtles, which increases mortality and the number of eggs that fail to hatch. Therefore, to reduce illegal fishing and assist in the conservation of sea turtles, a technology that can classify turtle species is needed. Deep learning is a recent and popular classification technology that can handle very large volumes of data. One of its benefits is transfer learning, in which a model learned for one task can be applied to other tasks with limited data [11]-[13]. Deep learning, particularly convolutional neural networks (CNNs), which are inspired by the mammalian visual cortex, can learn and evaluate a large number of features on its own, including some not previously identified by experts [14].
Few previous studies on turtle classification systems can be found. Among the related studies, Liu et al. [15] classified turtles with the CNN-based models LeNet, AlexNet, VGG16, VGG16-TL, InceptionV3, and InceptionV3-TL (where TL denotes transfer learning), obtaining average accuracies of 65.2%, 80.6%, 84.4%, 91.4%, 87.2%, and 96.4%, respectively. Paixao et al. [16] developed a texture-based classification system for five species of sea turtles found on the coast of Brazil, using k-nearest neighbors (KNN) and a support vector machine (SVM) with color histogram and chromaticity moment features. KNN was reported to outperform SVM, with a global accuracy of 0.74. Yussof et al. [17] developed a sea turtle identification system using transfer learning with the AlexNet CNN and an SVM, on a dataset sourced from the Biodiversity Research Center, Academia Sinica, Taiwan; the highest accuracy of that system was 62.9%. Dunbar et al. [18] studied the practical use of photographic identification (PID) to identify sea turtles, with case studies on Reunion Island (France), Roatan (Honduras), and the Republic of Maldives. Their results show that PID can be an effective and efficient method for gathering information about animals.
Unlike the studies mentioned above, this study trains its models by transfer learning from suitable pre-trained models, a well-known and successful approach [19], and then combines the transfer learning models into what are known as "deep learning ensembles" [20]. The pre-trained models used are VGG-16, ResNet-50, InceptionV3, DenseNet201, and ResNet-152 [21]. The training results show that each CNN model generalizes differently on the dataset. Based on these observations, the three most successful CNN models, ResNet-50, InceptionV3, and DenseNet201, were selected for the ensemble method. The classification results of the selected CNN models are combined by average voting to produce the final classification output, and this ensemble method yielded satisfactory classification results. Therefore, this study proposes an ensemble of three transfer learning models to strengthen the final decision, and it examines the use of original and augmented data in the models.

METHOD
This study follows the cross-industry standard process for data mining (CRISP-DM) method. CRISP-DM was developed in 1996 by a consortium of companies that included Daimler-Benz (later DaimlerChrysler), SPSS, and NCR. CRISP-DM can be used as a general problem-solving strategy for a business or research unit [22]-[24]. The flow of this method is shown in Figure 1.
CRISP-DM begins with the business understanding phase, which determines the direction of the research. The data understanding phase then identifies the data needed to meet the business goals. Next, the data preparation phase improves data quality so that the data suit the planned modeling. In the modeling phase, the model is built and then trained on the prepared data. The evaluation phase assesses the model produced in each iteration. The last phase is deployment, in which the model is implemented on the desired platform.

Business understanding
Members of coastal and other communities often cannot distinguish the species of turtles they find on the coast. Because reports of turtle findings are still made manually, handling and rescuing the turtles also takes a long time. This problem hinders conservation parties from making semi-natural nests, which results in increased mortality and eggs that fail to hatch. To reduce threats and aid conservation, a technology capable of classifying turtle species is required. Deep learning is an emerging and popular classification technology that can classify images or videos. Its advantage is transfer learning: a model learned for one task can be reapplied to another task that may have limited data. Transfer learning performance can be improved further by combining several transfer learning models into what is known as an "ensemble deep learning model." The ensemble deep learning model developed in this study can be implemented in a web- or mobile-based system to support classification and reporting when members of the community find turtles. This system is expected to help the community and conservation organizations protect turtles by providing a turtle classification and reporting system that can be accessed via cellphones or personal computers.

Data understanding
This study required an analysis of data needs, and data were collected in three ways: literature study, observation, and interviews. The dataset consists of turtle images in four classes, corresponding to turtle species validated by experts: green turtles, hawksbill turtles, olive ridley turtles, and loggerhead turtles. Figure 2 shows example sea turtle images from the dataset. The image dataset consists of a primary dataset and a secondary dataset taken from a public source. The primary dataset of 654 images was collected by the research team at "Konservasi Penyu Alun Utara", located in Pekik Nyaring Village, Central Bengkulu Regency, Bengkulu Province, Indonesia, while the secondary dataset of 850 images was taken from Smaranjit Ghose's public dataset on Kaggle [25]. The composition of the dataset is shown in Table 1.

Data preparation
In the data preparation stage, the research team resized, augmented, and split the collected data. The images were resized to 224×224 px and augmented with rotation, noise, brightness, and blur techniques, and the dataset was then divided into three parts: training data, validation data, and test data. Each augmentation uses a random value within a range: rotation from -40° to 40°, noise from 1% to 5%, brightness from -25% to +25%, and blur from 1 to 5 px. The distribution of the dataset after processing is shown in Table 2.
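The random-range augmentation and three-way split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it only draws the augmentation parameters (applying them to pixels would need an imaging library), it assumes rotation is in degrees and blur is an integer radius in pixels, and the 70/15/15 split ratio and the names `sample_augmentation` and `split_dataset` are hypothetical; the actual distribution is reported in Table 2.

```python
import random

# Ranges taken from the text. Treating rotation as degrees and blur as an
# integer radius in pixels are assumptions.
AUG_RANGES = {
    "rotation_deg": (-40.0, 40.0),
    "noise_pct": (1.0, 5.0),
    "brightness_pct": (-25.0, 25.0),
    "blur_px": (1, 5),
}


def sample_augmentation(rng):
    """Draw one random set of augmentation parameters within AUG_RANGES."""
    params = {}
    for name, (low, high) in AUG_RANGES.items():
        if isinstance(low, int):
            # Integer range, both ends inclusive.
            params[name] = rng.randint(low, high)
        else:
            # Continuous range sampled uniformly.
            params[name] = rng.uniform(low, high)
    return params


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into train, validation and test lists.

    The 70/15/15 default is hypothetical, not the split used in the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```

Drawing a fresh parameter set per image, as here, means each augmented copy differs from the others; using a seeded `random.Random` keeps the split reproducible across runs.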

Modeling
The ensemble deep learning in this study uses the average voting strategy. Average voting averages the class probabilities predicted for each data point: the ensemble classifier takes the average of the predictions of all models and uses it to make the final prediction. At this stage, the ensemble deep learning is simulated by tuning the parameters to produce the best model. The parameters to be set in the training process are the input, batch size, number of epochs, sea turtle dataset, hyperparameters, and weight evaluation. The same parameter settings are applied to five transfer learning models: InceptionV3, DenseNet, VGG16, ResNet50, and ResNet152, and the three of the five models that perform best are selected for the ensemble model. The design stages of the sea turtle classification model are detailed in Figure 3.
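Average voting as described above can be sketched in a few lines. This is a minimal illustration assuming each model outputs a softmax probability vector per image; the function name `average_voting`, the example probabilities, and the class order are hypothetical and not taken from the paper.

```python
def average_voting(model_probs):
    """Soft (average) voting.

    model_probs[m][i][c] is the probability that model m assigns to class c
    for sample i (e.g. each model's softmax output). Returns the per-sample
    averaged probabilities and the index of the winning class for each sample.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = [
        [sum(model_probs[m][i][c] for m in range(n_models)) / n_models
         for c in range(n_classes)]
        for i in range(n_samples)
    ]
    # Final prediction: the class with the highest mean probability.
    predictions = [max(range(n_classes), key=row.__getitem__) for row in averaged]
    return averaged, predictions


# Hypothetical outputs of three models for one image; the class order is
# assumed to be [green, hawksbill, olive ridley, loggerhead].
probs = [
    [[0.6, 0.2, 0.1, 0.1]],
    [[0.3, 0.5, 0.1, 0.1]],
    [[0.5, 0.3, 0.1, 0.1]],
]
avg, pred = average_voting(probs)
```

Here the second model favors class 1, but the mean probability of class 0 (about 0.47) exceeds that of class 1 (about 0.33), so the ensemble predicts class 0. Averaging probabilities rather than counting hard votes lets a confident model outweigh a hesitant one.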
As Figure 3 shows, the models are trained on the training dataset and validated using transfer learning with the InceptionV3, DenseNet, VGG16, ResNet50, and ResNet152 architectures, and a test dataset is used to evaluate the performance of each architecture's output model. The three best models, chosen from their validation performance and their evaluation on the test dataset, are combined by average voting into the final ensemble model of the turtle classification system. During training, the pre-trained models are initialized with weights trained on the ImageNet dataset; only their feature extraction layers are kept, and the last dense layer is replaced with a fully connected layer for the sea turtle classifier. Training is evaluated by the loss and accuracy on the training and validation datasets, as well as by precision, recall, and F1 score.
Five well-known CNN architectures (InceptionV3, DenseNet, VGG16, ResNet50, and ResNet152) were trained with a batch size of 16 and a learning rate of 0.0001, all for the same number of epochs (300). A list of callbacks was used to save the accuracy of the model during training. Adam was used as the optimizer to minimize the categorical cross-entropy loss, and the softmax activation function was used in the last layer for classification. Early stopping was used to prevent overfitting. All experiments were carried out on a Windows-based PC with 4 GB of RAM, a 4 GB hard drive, an Intel Core i5 processor, and a 256-bit Nvidia graphics card. The software was developed in Python with the Keras library.
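The interaction between the 300-epoch budget and early stopping can be illustrated with a small simulation. This sketch assumes the common rule of stopping once the validation loss has not improved for a fixed number of epochs; the paper does not report the monitored quantity or the patience, so `patience=10` and the function name `early_stopping_epoch` are hypothetical. In Keras, `tf.keras.callbacks.EarlyStopping` (with its `monitor` and `patience` arguments) applies a comparable rule during `fit`.

```python
def early_stopping_epoch(val_losses, max_epochs=300, patience=10):
    """Simulate early stopping on a sequence of per-epoch validation losses.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs, or after `max_epochs`. Returns (stop_epoch, best_epoch),
    both 1-based. patience=10 is hypothetical; the paper does not state it.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return stop_epoch, best_epoch


# Hypothetical curve: the loss improves for 25 epochs and then plateaus.
losses = [1.0 / e for e in range(1, 26)] + [0.5] * 50
```

With this made-up curve, training stops at epoch 35, ten epochs after the best epoch (25), far below the 300-epoch budget; the budget only matters if the validation loss keeps improving.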

Evaluation
The performance of the classification model is evaluated by precision, recall, accuracy, and F1 score. These metrics are calculated from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts of the confusion matrix using (1)-(4), where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative predictions, respectively [26]-[28].
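The standard definitions behind these metrics are precision = TP/(TP + FP), recall = TP/(TP + FN), accuracy = (TP + TN)/(TP + TN + FP + FN), and F1 = 2 × precision × recall/(precision + recall). For a four-class problem, each class can be treated one-vs-rest to obtain its own TP, TN, FP, and FN. The sketch below computes the metrics this way; combining the per-class scores by macro averaging is an assumption, since the paper does not state how they were aggregated, and the function names are hypothetical.

```python
def binary_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def macro_metrics(confusion):
    """Macro-averaged metrics for a multiclass confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    Each class is treated one-vs-rest to obtain its TP, TN, FP and FN.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                        # true class k, predicted otherwise
        tn = total - tp - fp - fn
        per_class.append(binary_metrics(tp, tn, fp, fn))
    macro = {name: sum(m[name] for m in per_class) / n
             for name in ("precision", "recall", "f1")}
    # Overall accuracy: correctly classified samples over all samples.
    macro["accuracy"] = sum(confusion[k][k] for k in range(n)) / total
    return macro
```

Under macro averaging every species counts equally regardless of how many test images it has, which suits a dataset whose classes may be unevenly represented.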

RESULTS AND DISCUSSION
Table 3 shows the evaluation metrics of the trained InceptionV3, DenseNet, VGG16, ResNet50, and ResNet152 models. The three best models are InceptionV3, DenseNet201, and VGG16. Examples of the training and validation loss and accuracy curves of DenseNet201 are shown in Figure 4. Among the ensembles, the best is VGG16-DenseNet201, with an accuracy, precision, recall, and F1 score of 0.74, 0.75, 0.74, and 0.76, respectively. The ensemble model performs better than the individual models.
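Comparing ensembles such as VGG16-DenseNet201 implies scoring several subsets of the selected models. The sketch below shows one way to do this, assuming each subset is combined by average voting and ranked by test accuracy; the function name `best_ensemble` and the toy probabilities for the hypothetical models "a", "b", and "c" are illustrative only and are not results from this study.

```python
from itertools import combinations


def best_ensemble(model_probs, labels, sizes=(2, 3)):
    """Score every subset of the candidate models by average-voting accuracy.

    model_probs maps a model name to a list of per-sample class-probability
    lists. Returns (best_subset, best_accuracy); ties keep the first subset found.
    """
    best_subset, best_acc = None, -1.0
    names = sorted(model_probs)
    for size in sizes:
        for subset in combinations(names, size):
            correct = 0
            for i, label in enumerate(labels):
                n_classes = len(model_probs[subset[0]][i])
                # Average the subset's probabilities for sample i.
                avg = [sum(model_probs[m][i][c] for m in subset) / len(subset)
                       for c in range(n_classes)]
                pred = max(range(n_classes), key=avg.__getitem__)
                correct += pred == label
            acc = correct / len(labels)
            if acc > best_acc:
                best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Toy data for three hypothetical models, two classes, three test images.
toy_probs = {
    "a": [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]],
    "b": [[0.4, 0.6], [0.2, 0.8], [0.7, 0.3]],
    "c": [[0.3, 0.7], [0.3, 0.7], [0.3, 0.7]],
}
toy_labels = [0, 1, 0]
```

With this toy data, models "a" and "b" each misclassify one image, yet their pair classifies all three correctly, which illustrates how two complementary models can form a stronger ensemble than either one alone.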


ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 4, No. 1, March 2023: 24-32 28
The three most successful of the five CNN models were chosen for the ensemble approach.

Figure 3. Deep learning model
The loss and accuracy curves of DenseNet201 (Figures 4 and 5) show that the model approaches convergence during training after about 25 epochs. The other models also begin to converge at around 25 epochs.

Table 3. Evaluation metrics for the trained models of the sea turtle classifier