Earprint recognition using deep learning technique

The earprint has attracted growing interest for recognition systems. It refers to the shape of the ear, which is unique to each person. It is a strong biometric pattern and can be used effectively for authentication. In this paper, an efficient deep learning (DL) model for earprint recognition is designed. The model is named deep earprint learning (DEL). It is a deep network carefully designed for segmented and normalized ear patterns. The IIT Delhi ear database (IITDED) version 1.0 is used in this study. The best accuracy obtained by the proposed DEL is 94%.


INTRODUCTION
Recognizing people is one of the most important fields in security systems. It starts at an early stage of human life. Initially, individuals were recognized by their gender, name, age and nationality. This was later developed further, and specific documents were established for each person in order to provide a clear identity. Examples of these documents are passports and identity documents (IDs). Classical recognition systems that rely on ID cards, passwords and personal identification numbers (PINs) are not sufficient for reliable identification because they can easily be forged, forgotten, misplaced, stolen, or shared [1]. On the other hand, biometric characteristics can be used to recognize individuals electronically and automatically [2]. Generally, biometric characteristics can be classified into physiological biometrics and behavioural biometrics. Physiological biometrics refer to the physiological characteristics of a person's body. Behavioural biometrics refer to the characteristics of a person's behaviour [3]. Physiological characteristics are often more reliable and accurate than behavioural characteristics, as human behaviour may be influenced by states such as tension or sickness [3]. Examples of physiological biometrics are the iris, fingerprint, face and earprint, and examples of behavioural biometrics are voice, gait and signature [4,5]. The earprint is a type of physiological biometric. It principally refers to the shape of the outer ear. It differs between individuals, including twins and identical twins. Moreover, ear shapes differ between the left and right ears [6]. Figure 1 shows the various earprint features.
The aim of this paper is to propose a DL model for earprint recognition. This model is called deep earprint learning (DEL). It is trained with Adam optimization while the best parameters of the convolution and pooling layers are determined. Adam gives the lowest error compared with the other training optimization methods examined for the DEL network, namely stochastic gradient descent with momentum (SGDM) and root mean square propagation (RMSProp) [7,8]. The remaining sections are distributed as follows: section 2 provides the literature review of this paper, section 3 describes the DEL method, section 4 discusses the results and section 5 declares the conclusion.
A limited number of studies in the literature have considered the earprint for recognition. In 2015, automatic recognition systems based on an ensemble of local and global earprint features were explored, and a promising performance was reported when both local and global earprint features were considered [9]. In 2016, a new feature extraction approach was illustrated for ear geometry recognition. In this approach, both the minimum and maximum ear height lines were employed, and three ratio-based features were then highlighted to enhance robustness to scale [10]. In the same year, sparse-coding-induced decision making was employed with the earprint. It was shown that fusing both the residual and coefficient components can obtain better performance [11]. In 2018, different deep convolutional neural network models were combined and the effect of ear image quality was analyzed in depth [12]. In the same year, a framework of earprint recognition was described for light field imaging. A new lenslet light field ear database (LLFEDB) was presented and the richer spatio-angular features were utilized [13]. In 2019, a multi-modal biometric recognition method was explained, where the earprint and the finger knuckle print (FKP) were used. The local binary pattern (LBP) and feature level fusion (FLF) techniques were exploited in that study [14]. In the same year, a new approach for a single earprint was proposed. It consists of three phases: providing a normalization process, applying novel Eigenears and utilizing a nearest neighbour classifier [15]. In this paper, exploiting a DL technique for earprint recognition is considered. Therefore, a DEL technique is proposed and evaluated.

RESEARCH METHOD
In this work, we propose the DEL network. It is a DL technique and a type of convolutional neural network (CNN). It is designed to accept earprint patterns. First, the DEL network is trained with various earprint patterns acquired from different persons. Then, the DEL network is tested on new earprint patterns that have not been seen before. The training stage produces useful values (weights). These values are stored in order to be used later in the testing stage. Figure 2 shows the general DEL framework structure. The DEL network consists of multiple layers. These are: the earprint input, convolution, rectified linear unit (ReLU), pooling, fully connected (FC), softmax and classification layers. The architecture of the DEL network is given in Figure 3. The input layer is adapted for the earprint patterns. It accepts grayscale two-dimensional (2D) images. Thus, each earprint image E involves only one channel.
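Regarding the convolution layer, the input image E is transformed into a group of feature maps. Each feature map is obtained by convolving the input image with a kernel weight matrix; the convolution process of the convolution layer follows [16]. The ReLU layer then keeps positive values unchanged and sets negative values to zero:

$$y_{a,b,z} = \max(0,\; x_{a,b,z})$$

where $x_{a,b,z}$ is the convolution output at row $a$ and column $b$ of channel $z$, and $y_{a,b,z}$ is a ReLU output. Subsequently, the pooling layer further decreases the sizes of the previous channels. The following equation represents the pooling computation [18]:

$$q_{a,b,z} = \operatorname{ope}_{(i,j) \in R_{a,b}} \; y_{i,j,z}$$

where $q_{a,b,z}$ is a pooling output, $0 \le a < h$, $h$ is the pooled channel height, $0 \le b < w$, $w$ is the pooled channel width, $0 \le z < c^{l} = c^{l-1}$ (the number of channels is unchanged by pooling), $y$ is the channel matrix being pooled, ope is the maximum operation, and $R_{a,b}$ is the sub-channel of height ph and width pw associated with the output position $(a, b)$. Next, the FC layer maps the pooling neurons to the subjects to be recognized. The following equation demonstrates the FC operation [19]:

$$f_r = \sum_{a=0}^{n_1^{l-1}-1} \; \sum_{b=0}^{n_2^{l-1}-1} \; \sum_{z=0}^{n_3^{l-1}-1} w_{a,b,z,r} \, (O_z)_{a,b}, \qquad r = 1, \dots, m$$

where $f_r$ is an FC output, $n_1^{l-1}$ is the prior channel height, $n_2^{l-1}$ is the prior channel width, $n_3^{l-1}$ is the number of prior channels, $w_{a,b,z,r}$ is a connection weight between the FC and pooling layers, $O_z$ is the vector/vectors of pooling layer outputs, and $m$ is the number of required classes. The softmax transfer function can then be computed as follows [20,21]:

$$b_r = \frac{e^{f_r}}{\sum_{k=1}^{m} e^{f_k}}$$

where $b_r$ is a softmax output. Finally, the classification layer is required. The following equation can be considered [2]:

$$D_r = \begin{cases} 1, & b_r = \max \\ 0, & \text{otherwise} \end{cases} \qquad r = 1, \dots, m$$

where $D_r$ is a classification output, max is the maximum $b_r$ value and $m$ is the number of classes.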
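The layer sequence described above can be illustrated with a small forward pass. The following is only a minimal sketch in plain Python, not the authors' implementation: the function names, the toy image size, the random weights, the stride-1 convolution and the bias terms are all assumptions made for illustration, and the convolution is written in the cross-correlation form commonly used by DL frameworks.

```python
import math
import random

def conv2d_valid(image, kernels, biases):
    """Stride-1 convolution with padding 0 (cross-correlation form) of a 2D image."""
    rows, cols = len(image), len(image[0])
    maps = []
    for kernel, bias in zip(kernels, biases):
        kh, kw = len(kernel), len(kernel[0])
        maps.append([
            [bias + sum(kernel[i][j] * image[u + i][v + j]
                        for i in range(kh) for j in range(kw))
             for v in range(cols - kw + 1)]
            for u in range(rows - kh + 1)
        ])
    return maps

def relu(maps):
    """Keep positive values, set negative values to zero."""
    return [[[max(0.0, x) for x in row] for row in fmap] for fmap in maps]

def max_pool(maps, ph, pw, stride):
    """Maximum over each ph x pw sub-channel, moving by `stride` (padding 0)."""
    pooled = []
    for fmap in maps:
        rows, cols = len(fmap), len(fmap[0])
        pooled.append([
            [max(fmap[a + i][b + j] for i in range(ph) for j in range(pw))
             for b in range(0, cols - pw + 1, stride)]
            for a in range(0, rows - ph + 1, stride)
        ])
    return pooled

def fully_connected(maps, weights, biases):
    """Flatten all pooled channels and compute one weighted sum per class."""
    flat = [x for fmap in maps for row in fmap for x in row]
    return [b + sum(w * x for w, x in zip(w_row, flat))
            for w_row, b in zip(weights, biases)]

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(probs):
    """Index of the class with the maximum softmax output."""
    return max(range(len(probs)), key=probs.__getitem__)

def rand():
    return random.uniform(-1.0, 1.0)

# Toy example (hypothetical sizes): 12x8 image, two 3x3 filters, 4 classes.
random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(12)]
kernels = [[[rand() for _ in range(3)] for _ in range(3)] for _ in range(2)]
pooled = max_pool(relu(conv2d_valid(image, kernels, [0.0, 0.0])), ph=2, pw=2, stride=2)
n_features = sum(len(row) for fmap in pooled for row in fmap)
fc_weights = [[rand() for _ in range(n_features)] for _ in range(4)]
probs = softmax(fully_connected(pooled, fc_weights, [0.0] * 4))
label = classify(probs)
```

With trained weights in place of the random ones, `label` would be the index of the recognized subject; here it only demonstrates how data flows through the layers.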

RESULTS AND ANALYSIS
The earprint images are taken from IITDED version 1.0. This database consists of touchless earprint images. It was collected from students and staff at IIT Delhi in India between October 2006 and June 2007. The earprint images were captured in an indoor environment with a simple imaging setup. A total of 221 persons aged between 14 and 58 years participated, each with multiple image samples (at least three earprints). All earprint images have a resolution of 272×204 pixels and are in JPEG format. Furthermore, segmented earprint images are also available within the same database, each with a size of 50×180 pixels [22,23]. The segmented earprint images of IITDED version 1.0 are employed in this paper, with an input image size of 180×50 pixels. Two thirds of the earprint samples were used in the training stage, whereas 100 evaluation cases were used in the testing stage. The training parameters were set as follows: adaptive moment estimation (Adam) optimizer, a maximum of 50 epochs, an initial learning rate of 0.0001, a gradient moving-average decay rate of 0.9, a squared-gradient moving-average decay rate of 0.999 and a denominator offset of $10^{-8}$.
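To make the role of the listed training parameters concrete, the following is a minimal sketch of one Adam update in its commonly published form with bias correction. It is an illustration under assumptions, not the training code used in this study: the function name `adam_step`, the `state` dictionary and the toy quadratic loss are hypothetical, and only the default values (learning rate 0.0001, decay rates 0.9 and 0.999, offset 1e-8) are taken from the text.

```python
import math

def adam_step(params, grads, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over a flat list of parameters.

    beta1: decay rate of the gradient moving average
    beta2: decay rate of the squared-gradient moving average
    eps:   denominator offset
    """
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected second moment
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated

# Toy usage (assumed loss): minimize (w - 3)^2 for 50 update steps.
state = {"t": 0, "m": [0.0], "v": [0.0]}
w = [0.0]
for _ in range(50):
    grad = [2.0 * (w[0] - 3.0)]
    w = adam_step(w, grad, state)
```

Because the update is normalized by the root of the squared-gradient average, each step here moves the parameter by roughly the learning rate, which is why a small rate such as 0.0001 changes the weights only gradually.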
To determine the best DEL network parameters, many experiments were executed and evaluated. Table 1 shows various DEL network experiments with Adam optimization to determine the best parameters of the convolution and pooling layers. In this table, the convolution layer and pooling layer parameters are evaluated by changing a single parameter while fixing the values of all the remaining parameters. The DEL network performance is evaluated by the obtained accuracy. It can be observed that the best convolution layer parameters are recorded for a filter size of 13×13, 10 filters and a padding of 0. Furthermore, it can be seen that the best pooling layer parameters are reported for a pooling type of maximum, a filter size of 5×5, a stride of 10 and a padding of 0. With these parameters the DEL accuracy reached its highest value of 94%. Tuning any parameter to a value slightly above or below the recorded one decreased the accuracy.
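As a worked example of what these parameters imply, the sketch below computes the feature-map sizes for a 180×50 input. It assumes a convolution stride of 1 (the stride of the convolution layer is not stated in the text) and the usual floor-based size formula; `output_size` is a hypothetical helper name.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

# Convolution: 13x13 filters, padding 0, stride 1 (assumed).
conv_shape = (output_size(180, 13), output_size(50, 13))

# Max pooling: 5x5 window, stride 10, padding 0.
pool_shape = tuple(output_size(n, 5, stride=10) for n in conv_shape)

# Inputs to the FC layer: pooled positions times the 10 filters.
fc_inputs = pool_shape[0] * pool_shape[1] * 10

print(conv_shape, pool_shape, fc_inputs)  # (168, 38) (17, 4) 680
```

Under these assumptions the pooling stride of 10 is larger than the 5×5 window, so the pooling layer skips part of each feature map and reduces it to a compact 17×4 grid per filter.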
The training curves of the DEL network using the best obtained parameters are given in Figure 4. Moreover, different training optimization methods were examined for the DEL network, as given in Table 2. These are: stochastic gradient descent with momentum (SGDM), root mean square propagation (RMSProp) and Adam. Adam optimization attained the best accuracy of 94%, compared with SGDM and RMSProp, which attained 71% and 93%, respectively, as seen in Figure 4. It is worth mentioning that the effect of enlarging the DEL architecture by adding more than one convolution, ReLU and pooling layer was also investigated. With two sequential groups of convolution, ReLU and pooling layers followed by the FC, softmax and classification layers, the accuracy decreased to 82%. Also, with convolution_1, ReLU_1, convolution_2, ReLU_2, pooling, FC, softmax and classification layers, the accuracy decreased to 83%. These accuracies are well below the best attained accuracy. Thus, increasing the complexity of the DEL network architecture is not worthwhile. The DEL network has been compared with state-of-the-art methods, as shown in Table 3. From Table 3, it can be seen that our proposed method achieved the best accuracy of 94% over state-of-the-art methods after applying their proposed architectures to our data. That is, the CNN architecture proposed in [24] achieved 63% and the deep finger texture learning (DFTL) architecture in [25] achieved 72%.
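For readers comparing the three optimizers, the two alternatives to Adam can be sketched in their textbook forms as follows. This is a hedged illustration only: the function names, the momentum of 0.9, the RMSProp decay of 0.9, the learning rate and the toy quadratic loss are assumptions, since the SGDM and RMSProp settings used in the experiments are not given in the text.

```python
import math

def sgdm_step(w, grad, velocity, lr, momentum=0.9):
    """SGD with momentum: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def rmsprop_step(w, grad, sq_avg, lr, rho=0.9, eps=1e-8):
    """RMSProp: scale the step by a running root mean square of gradients."""
    sq_avg = rho * sq_avg + (1 - rho) * grad * grad
    return w - lr * grad / (math.sqrt(sq_avg) + eps), sq_avg

def loss_grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

# Run both optimizers for 100 steps from the same starting point.
w_sgdm, vel = 0.0, 0.0
w_rms, sq = 0.0, 0.0
for _ in range(100):
    w_sgdm, vel = sgdm_step(w_sgdm, loss_grad(w_sgdm), vel, lr=0.01)
    w_rms, sq = rmsprop_step(w_rms, loss_grad(w_rms), sq, lr=0.01)
```

SGDM takes steps proportional to the gradient magnitude, whereas RMSProp (like Adam) normalizes the step size per parameter, so the two behave differently at the same learning rate. The toy loss says nothing about which method suits the DEL network; that comparison is the one reported in Table 2.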

CONCLUSION
In this study, we proposed an efficient DEL network model. This network is able to recognize different earprint patterns. The suggested method was investigated and evaluated for different DL parameters. It achieved a best recognition accuracy of 94%, which can be considered a promising performance. In addition, the DEL network outperformed the other state-of-the-art networks that were applied to the same data.