Research and Implementation of FacialNet Based on a Convolutional Neural Network

Deep learning, artificial intelligence, and other cutting-edge technologies are steadily being integrated into daily life; even the small vending machines found everywhere now support facial payment. The detection and recognition of face images is no longer out of reach, but the analysis and recognition of face attributes (gender, age, race, etc.) is still not fully mature. To improve the accuracy of face attribute recognition, this paper designs a face attribute recognition model. The feature extraction part uses an eight-layer convolutional neural network, followed by two fully connected modules that serve as the classifiers for gender recognition and age recognition. The experimental results show that the model exploits the advantages of convolutional neural networks to predict the gender and age of a face more accurately.


I. INTRODUCTION
Since 2012, deep learning has developed rapidly. It has not only excelled in object detection and image classification but has also been widely applied to face recognition, fundamentally changing existing approaches. In this study, we design a model, FacialNet, that classifies faces by gender and age. Compared with similar models, it achieves significantly better gender prediction accuracy; however, its age classification accuracy is lower than that of other models. Since FacialNet is an end-to-end model in which the two classifiers share one feature extraction network, we speculate that the lower age accuracy stems from the weak correlation between the gender and age tasks, whereas other models use two independent networks to predict gender and age. The model was trained on a computer equipped with an RTX 2080 GPU with the following hyperparameters: 100 epochs, an initial learning rate of 0.01, and a batch size of 25. The optimizer is SGD with momentum 0.9 and weight decay 0.01.
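The training setup above can be collected into a PyTorch optimizer configuration. This is a minimal sketch using the stated hyperparameters; the placeholder model stands in for FacialNet, whose architecture is described in Section III.

```python
import torch

# Hyperparameters as reported in the text.
EPOCHS = 100
BATCH_SIZE = 25
LR = 0.01
MOMENTUM = 0.9
WEIGHT_DECAY = 0.01

# Placeholder module standing in for FacialNet (see Section III).
model = torch.nn.Linear(10, 2)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=LR,
    momentum=MOMENTUM,
    weight_decay=WEIGHT_DECAY,
)
```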

II. RELATED WORK
Scholars at home and abroad have studied this topic extensively. Shi Xuechao [3] proposed a face gender recognition algorithm combining an adjustable supervision function with multi-feature fusion. The algorithm can be regarded as an improvement on the traditional convolutional neural network: it fuses multi-layer feature information by combining the second and fourth convolutional layers with the fifth, which enriches the feature information carried from the input layer to the fully connected layer and thereby improves the final classification accuracy. In addition, it introduces a supervision function with an adjustment mechanism that effectively guides the network to reduce intra-class distances while enlarging the distances between samples of different classes. Zhang Zhihua [4] and colleagues, in order to close the gap between automatic face recognition and gender estimation, used deep convolutional neural networks for recognition and made notable progress. They analyzed the data set and its labels, built a simple neural network, and obtained good results in gender classification. Although the Adience data set is very challenging, the simple structure of the CNN they designed allowed this method to outperform previous techniques.

III. NETWORK MODEL DESIGN
To study gender and age recognition on the Adience face data set, this paper designs a network model with eight convolutional layers to extract facial features, named FacialNet. The first four layers extract lower-level features, and the last four layers extract higher-level features. The eight-layer convolutional stack not only performs the feature extraction needed for classification but also keeps the amount of computation modest. The network structure is shown in Figure 5. The first convolutional layer has 32 kernels of size 3×3; at equal receptive field size, small kernels reduce the computation and parameter count during training, so small kernels are used throughout this paper. The second layer has 64 kernels. The choice of pooling type mainly weighs two sources of error: average pooling reduces the error caused by the limited neighborhood size and preserves more contextual information, while max pooling reduces the error introduced by the convolution process and preserves more texture detail. Because face attribute extraction needs to retain texture information, max pooling with a 3×3 window is chosen. The third, fourth, and fifth layers each have 128 kernels; the third and fourth use 3×3 kernels, and the fifth uses 1×1 kernels. The sixth layer has 256 kernels of size 3×3, and the seventh and eighth layers each have 512 kernels, with the seventh using 3×3 kernels and the eighth using 1×1 kernels.
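The layer description above can be sketched in PyTorch. This is a minimal reconstruction, not the paper's exact code: the padding scheme, the placement of the pooling layers, the global pooling before the heads, the fully connected sizes, and the eight Adience age groups are all assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch, k):
    """Conv -> ReLU block; padding keeps 3x3 layers size-preserving."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
        nn.ReLU(inplace=True),
    )

class FacialNet(nn.Module):
    def __init__(self, num_age_classes=8):  # 8 Adience age groups (assumption)
        super().__init__()
        self.features = nn.Sequential(
            conv(1, 32, 3),                # layer 1: 32 kernels, 3x3
            conv(32, 64, 3),               # layer 2: 64 kernels, 3x3
            nn.MaxPool2d(3, stride=2),     # 3x3 max pooling
            conv(64, 128, 3),              # layer 3: 128 kernels, 3x3
            conv(128, 128, 3),             # layer 4: 128 kernels, 3x3
            conv(128, 128, 1),             # layer 5: 128 kernels, 1x1
            nn.MaxPool2d(3, stride=2),
            conv(128, 256, 3),             # layer 6: 256 kernels, 3x3
            conv(256, 512, 3),             # layer 7: 512 kernels, 3x3
            conv(512, 512, 1),             # layer 8: 512 kernels, 1x1
            nn.AdaptiveAvgPool2d(1),       # assumption: pool to 1x1 before the heads
        )
        # Two fully connected heads sharing the feature extractor.
        self.gender_head = nn.Linear(512, 2)
        self.age_head = nn.Linear(512, num_age_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.gender_head(f), self.age_head(f)
```

On a grayscale 75×75 input (the size used in the data preprocessing), a batch of N images yields gender logits of shape (N, 2) and age logits of shape (N, 8).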
Because the Sigmoid function suffers from saturation, the activation function used in this model is the ReLU function. This paper uses the learning rate adjustment function LearningRateScheduler to monitor changes in the loss value and adjust the learning rate accordingly, and it tests the effect of different initial learning rates on the final result on a small-scale data set. The tested range runs from 0.1 to 0.001, divided into several test values. The results show that a learning rate of 0.002 gives the highest accuracy and a very low loss, so the initial learning rate in this paper is set to 2e-3. The detailed structure of the model is shown in Table 1, Table 2, and Table 3.
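LearningRateScheduler is a Keras callback; since the model is implemented in PyTorch (Section III-C), the loss-monitoring behaviour described above can be sketched with `ReduceLROnPlateau`, the closest PyTorch analogue. This is an assumption, not the paper's exact mechanism; the factor and patience values are illustrative.

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder standing in for FacialNet
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3, momentum=0.9)

# Reduce the learning rate when the monitored loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)

# Inside the training loop, step with the monitored loss:
# scheduler.step(val_loss)
```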
The loss function is the cross-entropy loss. For the binary case:

$$L = -\left[\, y \log p + (1 - y)\log(1 - p) \,\right] \qquad (1)$$

where $y$ is the sample label and $p$ is the predicted probability that the sample is positive. The multi-class case is an extension of the binary one:

$$L = -\sum_{c=1}^{M} y_c \log(p_c) \qquad (2)$$

where $M$ is the number of classes, $y_c$ is an indicator variable (1 if the observed sample belongs to class $c$, 0 otherwise), and $p_c$ is the predicted probability that the observed sample belongs to class $c$.
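The two loss formulas can be checked numerically. A minimal pure-Python sketch, which also verifies that the multi-class form reduces to the binary form when M = 2:

```python
import math

def binary_cross_entropy(y, p):
    """L = -(y*log p + (1-y)*log(1-p)); y is the 0/1 label, p = P(positive)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_cross_entropy(y, p):
    """L = -sum_c y_c * log(p_c) over M classes; y is a one-hot indicator."""
    return -sum(yc * math.log(pc) for yc, pc in zip(y, p))

# A confident correct prediction costs little...
low = binary_cross_entropy(1, 0.9)
# ...and the multi-class form gives the same value for M = 2.
same = categorical_cross_entropy([0, 1], [0.1, 0.9])
assert abs(low - same) < 1e-12
```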

A. Data preprocessing
This experiment uses the Adience data set, which largely reflects real-world camera conditions. Duplicate data and images with no detectable face are removed to obtain the preprocessed experimental face images, and image data augmentation is used to increase the amount of data during training and validation. The data distribution is shown in Figure 2 and Figure 3. The experiment uses a 75×75 image size for face detection. First, the face region is segmented from each picture, then the face is aligned, and the image is converted to grayscale: color contributes little to recognizing face attributes but would lead training to learn irrelevant factors. The training and validation face images are then augmented and passed to the model for learning. Some images from the Adience data set after preprocessing are shown in Figure 4.

B. Image data enhancement
This paper uses image data augmentation to increase the amount of image data. This method not only lets the limited image data generate more equivalent images but also helps prevent overfitting during model training. The effect of the final image augmentation is shown in Figure 5.

C. Training configuration
The model is implemented in PyTorch and trained on an RTX 2080. The TensorBoard tool is used to visualize parameters during training. The Adience data set is used for face attribute recognition, and a purpose-built CNN model solves the gender and age recognition problems. The model framework and experimental procedure are elaborated below.
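Since the two classifiers share one feature extraction network, each training step optimizes both heads jointly. A minimal sketch of one such step, with a stand-in extractor in place of the real eight-layer network; summing the two cross-entropy losses with equal weight is an assumption, as the text does not state how they are combined.

```python
import torch
import torch.nn as nn

# Stand-in shared extractor and the two classifier heads.
feature = nn.Sequential(nn.Flatten(), nn.Linear(75 * 75, 32))
gender_head, age_head = nn.Linear(32, 2), nn.Linear(32, 8)

params = (list(feature.parameters())
          + list(gender_head.parameters())
          + list(age_head.parameters()))
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Synthetic batch standing in for preprocessed 75x75 grayscale faces.
images = torch.randn(4, 1, 75, 75)
gender_labels = torch.randint(0, 2, (4,))
age_labels = torch.randint(0, 8, (4,))

# Joint step: one shared forward pass, summed losses (assumed weighting).
f = feature(images)
loss = criterion(gender_head(f), gender_labels) + criterion(age_head(f), age_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```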

D. Training results
In the gender recognition experiment, the first method trains the model on the original data set and reaches an accuracy of 88.6% on the Adience test set. The second method applies data augmentation to the original data set; after training, the validation accuracy is 90.3%. In the age recognition experiment, the first method reaches 68.1% and the second reaches 72.8% on the validation set. The comparison is shown in Table 4. Table 4 Comparison of gender and age recognition effect on Adience

Methods                                   Gender accuracy   Age accuracy
Best model in literature [3]              ---               54.1%
Best model in literature [4]              88.2%             ---
FacialNet (original data)                 88.6%             68.1%
FacialNet (with data augmentation)        90.3%             72.8%

The trend curve of gender recognition accuracy during training is shown in Figure 6, and the trend of the loss function during training is shown in Figure 8.

V. CONCLUSION
The comparison above shows that this model's gender prediction accuracy improves slightly over other models, but a gap remains relative to the best dedicated age models. By integrating the various methods, good overall results were obtained. In future work, we will use larger and more complex face data sets to build more efficient and accurate models, further extending the application scenarios and accuracy of face recognition.