Regenerating face images from multi-spectral palm images using multiple fusion methods

This paper establishes a relationship between multi-spectral palm images and a face image based on multiple fusion methods. The first fusion method is feature fusion between different multi-spectral palm images, using the CASIA multi-spectral database. The second is score fusion between the two parts of an output face image. In our method, both the right and left hands are used, and each hand produces a significant part of a face image by means of a Multi-Layer Perceptron (MLP) network. The second fusion step then reconstructs the full-face image so that its appearance can be examined. This topology achieved an interesting Equal Error Rate (EER) of 1.99%.

The employed objectives can be stated as follows: firstly, image preprocessing steps were applied to extract and enhance the hand images. These steps comprised cropping, morphological operations, adding top-hat characteristics and applying an unsharp filter. Secondly, feature fusion between the multi-spectral images combined their features, where Haar wavelet fusion based on the mean rule was used. Thirdly, a wavelet transform was applied to the enhanced and fused image. Fourthly, MLP neural networks were trained, with the right hand predicting the inner face image and the left hand predicting the outer face image. Fifthly, score fusion was utilized to assemble the face image according to the maximum or adding rule. Sixthly, the same processing was followed to test the MLPs on different data. Finally, the last decision was taken; Figures 1 and 2 demonstrate the block diagrams of our proposed method. This new topology increases the security and effectiveness of the biometric system. This paper is organized as follows: the first section contains the introduction, prior work and the proposed method. The second section covers hand image extraction followed by enhancement. The third section concerns the two types of fusion, feature and score. The fourth section describes the MLP artificial neural networks. The fifth section presents the results and discussions, and the final section is the conclusion.

Image Extraction and Enhancement
Data extraction and analysis are considered among the most critical and essential problems in Image Processing (IP) and Artificial Intelligence (AI). In this paper, preprocessing steps are adopted to extract, enhance and normalize the data so that it is prepared for the MLP networks. The CASIA multi-spectral palm image database is employed to build a relationship with the ORL face image database. The multi-spectral palm images consist of 7200 jpg images from 100 different people. All these images are 8-bit grayscale, captured under six electromagnetic spectra: 460 nm, 630 nm, 700 nm, 850 nm, 940 nm and white light, respectively. This valuable database includes images of both right and left hands. A CCD camera is positioned below the hand, together with several spectrum lights. There were no pegs or restricted positions for the palm in the device, although it had a uniform background colour. Two sessions were organized to capture the images, and each session captured three snapshots for each spectrum. The interval between the two sessions was more than one month [14].
A number of morphological operations were adopted after the image elimination to reduce noise and maintain the hand image. To begin with, the cropped image was an 8-bit grayscale image $f(x,y): \mathbb{Z}^2 \to [0,255]$, which needed to be converted to a binary image defined as $b(x,y): \mathbb{Z}^2 \to \{0,1\}$ [15]. Consequently, to convert the image from 8-bit grayscale to a binary image, a threshold $T$ was applied as in (1):

$$b(x,y) = \begin{cases} 1, & f(x,y) \geq T \\ 0, & \text{otherwise} \end{cases} \quad (1)$$

For the binary image $b(x,y)$ and a structuring element $h(x,y)$, the erosion $\ominus$ and dilation $\oplus$ are denoted as [16]:

$$b \ominus h = \{z \mid h_z \subseteq b\} \quad (2)$$
$$b \oplus h = \{z \mid h_z \cap b \neq \emptyset\} \quad (3)$$

Small white objects should be removed from the binary image, while holding the very large area. An area-opening morphological operation may solve this issue, see (4):

$$b_o(x,y) = \mathrm{open}(b(x,y), A) \quad (4)$$

where $A$ is a specific area; objects smaller than $A$ are removed.
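The thresholding and area-opening steps above can be sketched as follows. This is a minimal illustration, not the authors' code; the threshold `t` and minimum area `min_area` are hypothetical values chosen for the example.

```python
import numpy as np
from scipy import ndimage

def extract_hand_mask(gray, t=40, min_area=100):
    """Binarize an 8-bit palm image (Eq. 1), then clean it with a
    morphological opening and keep only regions of area >= min_area (Eq. 4)."""
    binary = (gray >= t).astype(np.uint8)                      # Eq. (1)
    opened = ndimage.binary_opening(binary, structure=np.ones((3, 3)))
    labels, n = ndimage.label(opened)                          # connected components
    if n == 0:
        return opened.astype(np.uint8)
    areas = ndimage.sum(opened, labels, index=np.arange(1, n + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(areas >= min_area))
    return keep.astype(np.uint8)
```

Applied to a frame containing a bright hand region and small noise specks, the large hand area survives while the specks are discarded.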
Nevertheless, some small objects connected to the largest hand area will still remain, which will not be deleted by (4). To remove these objects, a complement image $\hat{b}(x,y): \mathbb{Z}^2 \to \{1,0\}$ was produced from the last operation. A majority morphological operation was then performed to clear the unexpected noise by setting pixels to 1 if the majority of their neighbours are ones [17,18]. Hence, the palm image with fingers is easily created by combining the original image with the complement image as described in (5):

$$g(x,y) = f(x,y)\,\hat{b}(x,y) + s \quad (5)$$

where $g(x,y)$ is the newly created image, $s$ is a small scalar value and $\hat{b}(x,y)$ is the resulting binary image after (4). An example of a hand image before and after the morphological processing is given in Figure 3.

Extracting the top-hat details of the image, followed by adding them to the resulting image $g(x,y)$, appeared to give a high-quality enhancement. Initially, a structuring element is created as a disk shape of ones [18]:

$$SE = \mathrm{disk}(R) \quad (6)$$

where $R$ is the radius in pixels. Consequently, the top-hat details are isolated from the image as shown in (7) [19]:

$$T_h(x,y) = g(x,y) - (g \circ SE)(x,y) \quad (7)$$

where $T_h(x,y)$ is the image after the top-hat filter and $SE$ is the structuring element. Hence, these top-hat features are added to the hand image according to (8) [18]:

$$e(x,y) = g(x,y) + T_h(x,y) \quad (8)$$

Then, an unsharp filter is applied to enhance the details of the edges, so an enhanced image is produced as follows [20]:

$$u(x,y) = e(x,y) + \lambda\, c(x,y) \quad (9)$$

where $\lambda$ is a positive scale factor and $c(x,y)$ is the correction signal, calculated as the output of a highpass filter [20]. An example of the image enhancement processing is given in Figure 4.
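The top-hat boost and unsharp filtering can be sketched as below. This is an illustrative reading, not the authors' implementation; the disk radius, the gain `k` and the Gaussian `sigma` used for the highpass correction signal are assumed example values.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    # Eq. (6): disk-shaped structuring element of ones with the given radius
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

def enhance_hand(img, radius=3, k=0.5, sigma=1.0):
    """Top-hat detail boost (Eqs. 7-8) followed by unsharp masking (Eq. 9).
    The highpass correction signal is approximated by subtracting a
    Gaussian-blurred copy of the image."""
    img = img.astype(float)
    tophat = ndimage.white_tophat(img, footprint=disk(radius))   # Eq. (7)
    boosted = img + tophat                                       # Eq. (8)
    highpass = boosted - ndimage.gaussian_filter(boosted, sigma) # correction c(x,y)
    return boosted + k * highpass                                # Eq. (9)
```

On a flat background, a small bright detail is amplified by both stages, which is the intended effect of the enhancement chain.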

Fusion
Fusion between different biometric feature acquisitions can be considered a significant method of increasing the capability and security of such systems [21]. Two fusion methods are used: feature fusion, based on the wavelet transform with the mean rule, to extract the hand features from the multi-spectral images; and score fusion, to assemble the final face image.

Feature Fusion
Fusion between images is considered an interesting technique for increasing the efficiency of any biometric system. Collecting information from multiple images is a problem which can be solved by fusion: merging two images into one provides more data in a single image [22].
Multi-spectral hand images have many characteristics, from palm, fingers and hand geometry to veins, lines and small patterns. Fusing each pair of multi-spectral image types retains their combined information. In this paper, the 460 nm image spectra were combined with the 630 nm image spectra. Similarly, 700 nm images were merged with 850 nm images, whilst 940 nm images were fused with the white light images. This feature fusion is implemented by wavelet fusion with the Haar wavelet, 4 levels and the mean rule for both the approximation and detail parts. See Figure 5 (a).
According to this fusion, four coefficient bands are generated for each image after the wavelet transform (one approximation and three details). For two input images $Im_1$ and $Im_2$, the fused image is obtained by averaging the corresponding coefficients and applying the inverse transform:

$$Im_{fused} = \mathrm{IDWT}(\mathrm{AV}(\mathrm{DWT}(Im_1), \mathrm{DWT}(Im_2))) \quad (10)$$

where IDWT is the inverse of the 2D wavelet transform and AV is the average.
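The mean-rule wavelet fusion can be sketched with a hand-rolled Haar transform. This is a simplified single-level version (the paper uses four levels) and is not the authors' code; it only illustrates the average-then-invert idea.

```python
import numpy as np

def haar2d(img):
    # one level of the 2D Haar transform: approximation + 3 detail bands
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    LL = (a + b + c + d) / 4   # approximation
    LH = (a + b - c - d) / 4   # horizontal details
    HL = (a - b + c - d) / 4   # vertical details
    HH = (a - b - c + d) / 4   # diagonal details
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    # exact inverse of haar2d above
    H, W = LL.shape
    out = np.empty((2 * H, 2 * W))
    out[0::2, 0::2] = LL + LH + HL + HH
    out[0::2, 1::2] = LL + LH - HL - HH
    out[1::2, 0::2] = LL - LH + HL - HH
    out[1::2, 1::2] = LL - LH - HL + HH
    return out

def fuse_mean(img1, img2):
    """Mean-rule fusion of two spectra: average each coefficient band
    of the two decompositions, then apply the inverse transform."""
    bands1, bands2 = haar2d(img1), haar2d(img2)
    fused = [(u + v) / 2 for u, v in zip(bands1, bands2)]
    return ihaar2d(*fused)
```

Because the Haar transform is linear, the single-level mean rule reduces to a pixel-wise average; with multiple levels and separate rules for approximation and detail bands, the fused result would differ from a plain average.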

Score Fusion
Score matching level fusion is a method used extensively with multiple biometric models [23]. It is performed after the authentication processing, where each output is calculated individually and, subsequently, a combined scoring level is applied [24]. The ORL database of face images is used in this paper. This database was produced at the AT&T Laboratories in Cambridge through the collaboration of three groups (Speech, Vision and Robotics). It consists of 400 images from 40 people, and each person has different expressions [25]. The critical problem in our work is generating face images using score level fusion. Afterwards, a decision is taken according to the clearest and most distinct image. To explain in more detail, two essential parts of a face image are intended to be predicted by using MLP neural networks. The first is the middle part of a face image, which mainly consists of the eyes, nose and mouth; the right-hand images are used to predict this middle part. The second is the outer part of a face, which commonly contains the ears, hair and lower jaw; the left-hand images are used to predict this boundary part. Subsequently, a score fusion is performed using the maximum or adding rule to construct the face image. Figure 5 (b) shows the idea of the score fusion level. According to this technique, the security of the system is increased because two hand images are required to produce the face image.
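The maximum and adding rules for combining the two predicted face parts can be sketched as below. This is an illustrative sketch, assuming each MLP's output is placed on a full-size canvas (zeros outside its own region) with pixel values in [0,1]; the function name and clipping are my assumptions, not from the paper.

```python
import numpy as np

def fuse_face(inner, outer, rule="max"):
    """Combine the predicted inner (eyes/nose/mouth) and outer
    (ears/hair/jaw) face parts into one image using the maximum
    or adding rule named in the paper."""
    if rule == "max":
        return np.maximum(inner, outer)
    if rule == "add":
        return np.clip(inner + outer, 0.0, 1.0)  # keep the [0,1] range
    raise ValueError("rule must be 'max' or 'add'")
```

Since the two regions barely overlap, both rules mostly paste each part into its own area; they differ only where the predictions overlap.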

Neural Network
The Artificial Neural Network (ANN) is one of the most popular techniques in Artificial Intelligence (AI). In recent years, it has become widespread in different fields; biometric verification, identification and classification are examples of these fields. There are two main types of ANN, supervised and unsupervised, and each type attempts to simulate a significant task of the human brain. Moreover, there are two essential stages in any ANN: the learning stage and the testing stage. In the first stage, the network learns the inputs and generates specific weights to manage the problem. In the second stage, the ANN deals with inputs which have not been seen before [26]. In our work, supervised MLP neural networks are investigated and adapted to achieve these tasks.
First of all, the input data of each image need to be prepared for the MLP network. Thus, each input image is segmented into matrices of different sizes: 5×5, 7×7, …, 13×13 pixels. This ensures different overlaps between the matrices. A coefficient of variance is calculated for each segment, as illustrated in (12)-(14) [27]:

$$Mean = \frac{1}{n}\sum_{i=1}^{n} seg_i \quad (12)$$
$$SD = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (seg_i - Mean)^2} \quad (13)$$
$$CV = \frac{SD}{Mean} \quad (14)$$

where $n$ is the number of pixels in each segment, $seg$ is the matrix of 5×5, 7×7, 9×9, 11×11 or 13×13 pixels, $Mean$ is the average, $SD$ is the standard deviation and $CV$ is the coefficient of variance. The advantages of using the CV are: it is dimensionless; all values are positive; the differences are given as small ratio values (which avoids overloading the ANNs in the next stage); the variances within the same vector type can be calculated (valuable for the same target in the training stage); and the variances between different vector types can be determined (useful for the fusion stage between two different types). The second step is arranging the CV values into a one-dimensional vector for each image. The final input preparation is mapping the input data into the [0,1] range, as shown in (15) [28]:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \quad (15)$$
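The feature preparation can be sketched for a single block size. This is a simplified, non-overlapping reading of the segmentation (the paper combines several block sizes, which yields overlaps); the min-max normalization for Eq. (15) is a common interpretation, assumed here rather than quoted from the paper.

```python
import numpy as np

def cv_feature_vector(img, size=5):
    """Eqs. (12)-(15): split the image into size×size blocks, compute the
    coefficient of variance CV = SD / Mean per block, flatten to a vector,
    then map the vector into [0,1] by min-max normalization."""
    H, W = img.shape
    cvs = []
    for i in range(0, H - size + 1, size):
        for j in range(0, W - size + 1, size):
            seg = img[i:i + size, j:j + size].astype(float)
            mean = seg.mean()                           # Eq. (12)
            sd = seg.std()                              # Eq. (13)
            cvs.append(sd / mean if mean > 0 else 0.0)  # Eq. (14)
    v = np.array(cvs)
    lo, hi = v.min(), v.max()
    return (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)  # Eq. (15)
```

The resulting vector is what the MLP would consume: one bounded, dimensionless value per segment instead of raw pixel intensities.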

Results and Discussions
The performance of the proposed method is examined and compared with other work. The databases are collected and organized into groups. An input group of 4020 multi-spectral images is used in the ANN training stage and another input group of 804 multi-spectral images is utilized in the ANN testing stage. In addition, each of the two groups has been separated into two further groups: the left-hand and right-hand groups. In the training stage, each hand group contained 2010 images; in the testing stage, each hand group consisted of 402 images. Both training and testing stages of the MLP network attempted to predict a clear and easily recognized part of a facial image, where each individual part has its own MLP. All training was accomplished with the Scaled Conjugate Gradient (SCG) algorithm, which is described in [29]. Examples of training curves are given in Figure 6. For the regression test, both trainings attained 45 degrees, or a Regression (R) equal to 1, between the MLP outputs and targets; see Figure 7. Analysing Figure 7, non-linear relationships were established between two biometrics: the right and left hands with the inner and outer parts of a face, respectively. This relationship is the basis for predicting a full face image from inputs which have never been seen before. Predicted parts of some face images are shown in Figures 8 (a and b), whilst the combinations of each two parts are displayed in Figure 8. Figures 9 (a, b and c) give examples of unclear face images. As mentioned before, using the two main face parts increases the security of the system. Moreover, producing a clear face image will assist any inexperienced individual to easily authorize or identify a certain person.
In the final decision process, the image which is the clearest and nearest to the specific vector is considered as 'true', and the image which is the most distorted and furthest from the specific information is considered as 'false'. Thus, two types of classification were determined. Hereafter, three main parameters are calculated: the False Acceptance Rate (FAR), the False Rejection Rate (FRR) and the EER. The first two parameters are calculated according to the following equations:

$$FAR = \frac{\text{number of falsely accepted images}}{\text{total number of impostor images}} \times 100\%$$
$$FRR = \frac{\text{number of falsely rejected images}}{\text{total number of genuine images}} \times 100\%$$

The EER parameter is calculated from the equality between the FAR and FRR over different threshold values; see Figure 10.

A point shared with [7,13] is that both of these works also concentrated on regenerating full details of face images. In addition, all of the works reported in Table 1 used the ORL face image database. From this table it can be seen that simple statistics were used with ANN techniques in [7,13], where the strength level of those systems is high. In this study, two fusion methods and two multi-spectral hand images (right and left) have been employed to regenerate full face details. Therefore, the strength level is very high, as spoofing the suggested system is very difficult. The EER values were recorded as 10% for [7], and as 6.43% and 2.86% for [13]. In the proposed approach, the EER value was recorded as 1.99%. So, our work appears to give more accurate results than the other studies.

Table 1. Comparison with related works:

Work                 | Method                           | Strength level | EER
[7]                  | Simple statistics with the MLP   | High           | 10%
Al-Nima et al. [13]  | Simple statistics with the BPN   | High           | 6.43%
Al-Nima et al. [13]  | Simple statistics with the DFN   | High           | 2.86%
Proposed approach    | Two fusion methods with the MLP  | Very-High      | 1.99%
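The FAR/FRR sweep and the EER point where the two rates meet can be sketched as follows. This is a generic evaluation sketch, not the authors' code; it assumes higher scores mean better matches and approximates the EER at the threshold where |FAR - FRR| is smallest.

```python
import numpy as np

def far_frr_eer(genuine, impostor):
    """Sweep a decision threshold over the match scores and return the
    approximate EER, where FAR is the fraction of impostor attempts
    accepted and FRR is the fraction of genuine attempts rejected."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, None
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuines wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

With perfectly separable genuine and impostor scores the sweep finds a threshold where both error rates vanish, giving an EER of zero; real score distributions overlap, which is why the paper reports a non-zero 1.99%.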

Conclusion
A strategy to predict a face image with high-level security was presented in this paper, where two non-linear relationships were established: firstly, a relationship between the right-hand features and the middle part of the facial image, which in general contains the eyes, nose and mouth; secondly, a relationship between the left-hand features and the boundary part of the face image. Two levels of fusion were examined: the feature level and the score level. The feature level is used to combine and enhance the multi-spectral hand characteristics, whilst the score level is used on the face images to collect and reconstruct clear details of a face.
The suggested structure confirmed its efficiency and robustness. The performance of the overall technique was benchmarked at an EER of 1.99% during the testing stage, where full face details were reconstructed. In addition, the proposed system increases the anti-spoofing, strength and security levels, because two multi-spectral images of the two hands (left and right) are required to regenerate all the face details.