Key-frame extraction based video watermarking using speeded up robust features and discrete cosine transform

ABSTRACT


INTRODUCTION
Recent developments in multimedia technologies and the transmission of digital video over networks show how significant digital video is becoming as a broadcasting, communication, and entertainment medium. It becomes more difficult to transmit and store raw video as video quality improves, even though storage capacity grows. Due to the development of video compression, almost all digital videos are now transferred over networks in a compressed format. Different codecs, including H.261, H.262, H.263, and H.264, are available to encode raw videos. New-generation H.264/AVC video codecs are widely used due to their great compression efficiency and strong network compatibility.
Initially, watermarks were incorporated into all frames of the video using the same image watermarking approach. However, such techniques require more time for embedding and extraction of watermarks, and they fail to address the challenges caused by the temporal dimension of a video sequence. It has been observed that video quality degrades if image watermarking techniques are applied directly to it. The key-frame based approach addresses these problems. In this approach, key-frames are extracted from the video for embedding and extraction of the watermark. Because only the key-frames are processed, this approach reduces the time complexity of the watermarking process and helps to improve the visual quality of the watermarked video. The work presented in this paper is motivated by the need for a practical video watermarking scheme that authenticates H.264/AVC-based compressed video with high imperceptibility and robustness and protects copyright efficiently. The proposed video watermarking scheme provides several advantages over existing techniques. In this paper, a Pearson correlation coefficient (PCC) based shot detection technique is introduced, and a key-frame is extracted from each shot using a statistical measure called entropy.
The paper is organized as follows: Section 2 reviews the existing work. Section 3 presents the proposed video watermarking scheme. Section 4 presents an experimental study of the quality and efficiency of the proposed scheme. Section 5 presents the conclusions.

RELATED WORK
Various video watermarking schemes have been developed in recent years, in which the main idea is to insert secret information (a watermark) into selected key-frames of the video by introducing changes that are acceptable in terms of accuracy and usually invisible to the legitimate user. Video watermarking techniques are divided into two schemes based on the embedding domain: frequency domain (FD) and spatial domain (SD). In SD, the watermark is inserted into the intensity (pixel) values of the frame, whereas in FD the watermark information is inserted into the frequency coefficients of the video frame instead of directly modifying pixel values. As a result, the FD approach provides higher robustness and less visible distortion than the SD approach. Most research in the field of video watermarking focuses on the FD approach. However, it has higher computational complexity than the SD approach.
In image watermarking schemes, watermark information is generally inserted into the image itself, either in blocks or in a region of interest, whereas in video watermarking schemes the watermark is inserted in different ways: i) frame by frame [1], [2], ii) region based [3]-[5], and iii) key-frame based [2], [6]. In the frame-by-frame approach, watermark information is inserted in every frame of the video. This approach is very effective and robust against frame attacks such as frame dropping, inserting, and swapping. However, it is impractical: it requires more time for embedding and extraction of watermark information and also increases the size of the video. To overcome these challenges, many researchers have used the region-based approach for embedding and extraction of the watermark. In this approach, robust regions or moving blocks are detected within a host frame for embedding the watermark. These algorithms have been shown to provide security, high imperceptibility, and robustness against common attacks. The main disadvantage is that the watermark's accuracy depends on the locations of the motion segments or regions retrieved during the extraction process. To improve correctness and reduce time complexity, the third approach was introduced, wherein representative frames are selected from each shot or scene of the video sequence for embedding and extraction of the watermark. This methodology greatly reduces the time required for the watermarking process. Moreover, it avoids frame redundancy and improves the stability and robustness of the watermarking technique.
In [7], Li et al. presented a semi-fragile video watermarking scheme for compressed-domain videos. The numerical relationship among discrete cosine transform (DCT) non-zero coefficients was considered as an authentication code. In this scheme, the frame number was first converted into an 18-bit watermark sequence, and then the generated code was embedded into a 4×4 sub-block that contains at least three non-zero DCT coefficients. This watermarking technique shows good transparency and tamper detection. Himeur et al. developed a chaotic encryption-based video watermarking scheme [8], in which the key-frames were extracted using the gradient magnitude similarity deviation (GMSD) technique for embedding and extraction of watermarks. A blind and secure watermark embedding and extraction technique was adopted using discrete wavelet transform (DWT) and singular value decomposition (SVD). The drawbacks of this system were that it fails to recover the watermark if the watermarked key-frame is lost from the video sequence, and that it provides weak resistance to geometric distortion. In [9], a semi-blind video watermarking scheme using speeded up robust features (SURF) and visual cryptography was introduced. It uses a shot detection technique based on the histogram difference of consecutive frames. This technique provides robustness against different signal processing and geometric distortions. However, this scheme needs to store the generated shares securely. Sethuraman et al. [10] introduced a key-frame based watermarking technique wherein the structural similarity index metric-absolute difference metric (SSIM-AMD) techniques were adopted for the identification of non-redundant frames. Then, the entropy-AMD method was used to select a key-frame. Furthermore, DWT was applied to decompose the key-frame into subbands. To avoid false-positive attacks, the principal component of the watermark image block was computed and embedded into the middle band of the DWT. The strength of the watermark was decided by calculating the scaling factor using the ant colony optimization (ACO) technique. It was observed that this scheme was robust against video processing and false-positive attacks. It provides high performance in terms of imperceptibility and robustness.

Comput Sci Inf Technol ISSN:2722-3221  Key-frame extraction based video watermarking using … (Kapre Bhagyashri S.) 87
In [11], the authors designed a semi-fragile video authentication technique using DWT and DCT, wherein robust features were extracted and used to generate a content-based authentication code (CBAC). That code is scrambled using Arnold's transform to generate a quick response code. Thereafter, the quick response code is embedded into the middle-frequency sub-band of the DWT and extracted blindly, without using the original video information. This technique performs well in terms of the perceptual quality of the watermarked video and in discriminating between intentional and unintentional manipulations. To detect and localise tampered areas, the authors of [12] developed a chromatic DCT-based video watermarking approach. In this scheme, tamper detection was done using different features of the H.264/AVC coding standard. The experimental results show that the developed technique detects spatial attacks and helps localise tampered regions.
A hyper-chaotic Lorenz based video watermarking approach was developed in [13], in which watermark embedding and detection were performed by extracting specific frames from the host video's non-motion frames. Then, watermark embedding was done using the 3D-DWT. The developed approach fails to resist temporal attacks. Furthermore, this approach can detect tampered areas but is not able to localise them. In [14], the authors introduced a fragile video watermarking approach wherein a content-based authentication code is generated using the Arnold transform. This approach was able to detect and localise tampered areas. In [15], the authors developed a video authentication technique using audio and video features. The authentication code was generated using both the video and audio content of an MP4 clip. Then, the generated code was embedded in the subtitles. This approach allows frame addition and removal to be detected. Another approach for authentication of the MP4 format was designed in [16], wherein an encrypted hash value of the audio data was inserted into the synchronisation content of the MP4 file. The designed approach was robust against compression and able to detect tampered areas.
A new video authentication technique based on the generation of watermark images was developed in [17]. The watermark images were embedded into all video frames using the DWT. During the extraction process, the embedded watermark was extracted and analysed to detect spatial attacks. Thereafter, a binary sequence was generated from all the extracted watermark images and used to determine the type of temporal attack, such as frame removal, addition, or re-ordering, and to localise tampered frames. In [18], the authors developed a semi-fragile video watermarking approach for tampered area detection and localization. The watermark was embedded in the low-frequency components of the P and B frames of the video. The developed approach was robust and imperceptible, but it requires the original information during the extraction process. Aditya et al. [19] designed a video watermarking scheme for tamper detection. In this approach, three transformations (DWT, DCT, and SVD) were used to improve robustness. Both the host and watermark videos were transformed using the DWT and DCT successively. Then, SVD was applied to the original video and, in the same way, to the watermark to obtain the singular values. The singular values of the watermark were embedded into the singular values of the host video with some embedding strength. Although the designed approach was robust, there was a possibility of false detection.
The literature review on video watermarking systems revealed that: i) embedding watermarks in all frames of the video consumes more time; ii) the watermark block embedded in the key-frame can withstand frame-dropping attacks; and iii) these strategies fail to achieve a good balance of resilience and imperceptibility.
In this paper, we have developed a video watermarking technique that provides a trade-off between robustness and imperceptibility. Video shot boundaries are detected using PCC, and the key-frames are extracted using a statistical measure called entropy. The frame with the maximum entropy value is extracted, since it provides the most information compared to the other frames of the same shot. Then, SURF feature-based square regions are extracted for embedding the watermark. SURF feature points are commonly invariant to rotation, scaling, and translation, so they naturally fit the requirements of geometrically robust image watermarking. A DCT-based embedding technique is introduced to achieve high robustness and imperceptibility.

PROPOSED SCHEME
An H.264 video watermarking system is explained in this section. First, the input video is divided into frames, and shot boundaries are detected using the PCC technique. Then, the entropy value of each frame of a shot is calculated, and the frame with the maximum entropy value is selected as the key-frame of that shot. The proposed shot boundary detection and key-frame selection algorithms reduce the watermark embedding time and provide robustness against distortion, noise, illumination changes, object motion, and camera operations such as zoom-in and zoom-out. A resilient SURF feature-based watermarking approach in the DCT domain is applied to each key-frame of the video. During the watermark embedding process, SURF feature points are detected in each key-frame. These detected key-points are robust to various geometric and image transformations, such as scaling, rotation, blurring, and JPEG compression. The detected feature points are used to generate non-overlapping square regions for embedding and extraction of watermarks. The detailed watermarking process is explained in the following subsections.

Shot boundary detection and key-frame extraction
The process of the proposed shot boundary detection using PCC is explained in this section. Initially, the host video is decomposed into frames, and each frame is divided into R, G, and B channels. In the proposed shot boundary detection, the first frame of each channel of the host video is considered as the first frame (FF) of the first shot. The PCC between the first frame FF and each successive frame (Fi) is measured using (1) for the red, green, and blue channels of the respective shot. If the measured PCC value of FF and Fi is greater than the threshold for each channel, then that frame is added to the current shot; otherwise, it is considered as the first frame of the next shot. The same process is applied to each frame of the host video. The proposed shot boundary detection algorithm is described in Algorithm 1. To achieve high accuracy in shot boundary detection, appropriate threshold values must be chosen. For the PCC of the red, green, and blue channels, the threshold values are computed from the mean and variance of PCR, PCG, and PCB using (2) to (4). The PCC between two images P and Q is calculated in terms of covariance as given in (1), where μ_P and σ_P are the mean and standard deviation of P, respectively, and μ_Q and σ_Q are the mean and standard deviation of Q, respectively.

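$$\rho_{cc}(P, Q) = \frac{\operatorname{cov}(P, Q)}{\sigma_P \, \sigma_Q} = \frac{E\left[(P - \mu_P)(Q - \mu_Q)\right]}{\sigma_P \, \sigma_Q} \tag{1}$$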
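The per-channel PCC test for shot boundaries can be illustrated with a minimal sketch. It assumes frames are available as H×W×3 RGB arrays and uses fixed, hypothetical thresholds of 0.8 per channel; the paper instead derives the thresholds from the mean and variance of the channel PCC values through (2) to (4), which are not reproduced here. The function names `pcc` and `detect_shots` are illustrative only.

```python
import numpy as np

def pcc(p, q):
    """Pearson correlation coefficient between two equally sized images."""
    p = np.asarray(p, dtype=np.float64).ravel()
    q = np.asarray(q, dtype=np.float64).ravel()
    dp = p - p.mean()
    dq = q - q.mean()
    denom = np.sqrt((dp ** 2).sum() * (dq ** 2).sum())
    if denom == 0.0:
        # At least one image is constant: treat identical images as fully correlated.
        return 1.0 if np.array_equal(p, q) else 0.0
    return float((dp * dq).sum() / denom)

def detect_shots(frames, thresholds=(0.8, 0.8, 0.8)):
    """Group frames into shots; returns (start, end) index pairs, end inclusive.

    The thresholds are placeholders (assumption), one per R, G, B channel.
    """
    shots = []
    start = 0  # index of the first frame (FF) of the current shot
    for i in range(1, len(frames)):
        first = frames[start]
        same_shot = all(
            pcc(first[..., c], frames[i][..., c]) > thresholds[c]
            for c in range(3)
        )
        if not same_shot:
            shots.append((start, i - 1))
            start = i  # this frame becomes the first frame of the next shot
    if len(frames) > 0:
        shots.append((start, len(frames) - 1))
    return shots
```

Because the PCC subtracts the mean and divides by the standard deviations, a uniform brightness offset between two frames leaves the coefficient at 1, which is consistent with the robustness to illumination changes claimed for the method.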
The value selected for the threshold parameter in this research is 1.2. Then, a key-frame is selected from each shot using a statistical entropy measure. Entropy measures the randomness of an image and can be used to characterise it [8]. The entropy value of each frame of a shot is measured, and the frame with the highest entropy value is chosen as the key-frame of the shot. The same procedure is repeated for each shot to obtain all the key-frames. Equation (5) is used to determine the entropy value.

Initially, the host video is preprocessed into number of frames (𝐹
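$$E(F_{i,j}) = -\sum_{k} p_k \log_2 p_k \tag{5}$$

where p contains the normalized histogram counts of the image and $F_{i,j}$ is the i-th frame of the j-th shot.

$$K_j = \arg\max_{i} E(F_{i,j}) \tag{6}$$

where $K_j$ is the key-frame of the j-th shot.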
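The entropy-based key-frame selection can be sketched as follows. This is a minimal illustration that assumes each frame is an 8-bit grayscale array and that entropy is computed from a 256-bin histogram; the source does not state which channel or bin count is used, and the function names are hypothetical.

```python
import numpy as np

def frame_entropy(gray):
    """Shannon entropy (in bits) of an 8-bit grayscale frame."""
    hist = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()   # normalized histogram counts
    p = p[p > 0]            # empty bins contribute nothing to the sum
    return float(-(p * np.log2(p)).sum())

def select_key_frame(shot_frames):
    """Return the index of the maximum-entropy frame of a shot and all entropies."""
    entropies = [frame_entropy(f) for f in shot_frames]
    return int(np.argmax(entropies)), entropies
```

A flat frame has entropy 0, while a frame whose pixels are spread evenly over more gray levels has higher entropy, so the selected frame is the one with the richest intensity distribution in its shot.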

Generation of invariant regions using SURF
In the proposed video watermarking approach, SURF feature-based invariant regions are detected for embedding and extraction of the watermark. Initially, SURF feature points are extracted, and each feature point is considered as the centre of a circular region with radius r. Each circular region is converted into a square region using its circumscribed square, that is, the square surrounding the circle such that the circle touches the midpoints of the four sides of the square. The side length of the square is therefore equal to the diameter of the circle. Figures 1(a) to 1(d) show the generation of the square regions.
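The conversion of feature points into non-overlapping circumscribed squares can be sketched as below. The sketch assumes the feature points have already been detected and are given as hypothetical (x, y, strength) tuples. It also assumes a greedy rule that keeps the strongest points first and discards squares that cross the frame border; the source does not specify how overlaps are resolved. A side of 32 corresponds to a radius r of 16.

```python
def circumscribed_squares(keypoints, frame_shape, side=32):
    """Select non-overlapping side x side squares centred on feature points.

    keypoints   : iterable of (x, y, strength) tuples (hypothetical format)
    frame_shape : (height, width) of the Y component
    Returns a list of (top, left) corners, strongest feature point first.
    """
    height, width = frame_shape
    half = side // 2  # circle radius r; the square side equals the diameter
    chosen = []
    for x, y, _ in sorted(keypoints, key=lambda k: k[2], reverse=True):
        top = int(round(y)) - half
        left = int(round(x)) - half
        # Skip squares that do not fit entirely inside the frame.
        if top < 0 or left < 0 or top + side > height or left + side > width:
            continue
        # Two axis-aligned squares of equal size overlap only if they are
        # closer than one side length along both axes.
        overlaps = any(
            abs(top - t) < side and abs(left - l) < side for t, l in chosen
        )
        if not overlaps:
            chosen.append((top, left))
    return chosen
```

Each returned corner identifies one 32×32 region that can then be split into sixteen 8×8 sub-blocks for embedding.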

Watermark embedding
In this section, the detailed watermark embedding procedure is explained:
Step 1: Shot boundaries are detected in the input video using PCC, and then the key-frame of each shot is selected using the entropy measure.
Step 2: Each key-frame is converted from the RGB color space to the YCbCr color space, and the Y component is used for further processing.
Step 3: SURF feature points are detected in the Y component of each key-frame of the video.
Step 4: The detected feature points are used to generate non-overlapping circumscribed square regions of size 32×32.
Step 5: Each non-overlapping square region of a key-frame is further divided into non-overlapping sub-blocks of size 8×8.
Step 6: DCT is applied to each 8×8 sub-block to obtain the mid-frequency coefficients.
Step 7: Two uncorrelated pseudorandom (PN) sequences, Psudo0 and Psudo1, are generated using a secret key. The Psudo0 sequence is used to embed watermark bit 0, and the Psudo1 sequence is used to embed watermark bit 1. The size of each of the two PN sequences must be equal to the number of mid-frequency elements of an 8×8 DCT-transformed sub-block.
Step 8: The generated Psudo0 and Psudo1 sequences are embedded, with the decimal sequence s generated using a secret prime number p, into the mid-frequency coefficients of the DCT using (7). The terms of (7) are the mid-frequency coefficient, the two pseudorandom sequences, the watermark W, and the scaling factor.
Step 9: Inverse DCT is applied, and then the original square region is replaced with the watermarked one.
Step 10: The above embedding operation is repeated until all the invariant regions are watermarked.
Step 11: Steps 2 to 9 are applied to each key-frame of a shot, and finally the watermarked video is generated.
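Steps 6 to 9 can be illustrated for a single 8×8 sub-block with the sketch below. Since equation (7) is not reproduced in this text, the sketch assumes the classic additive rule in which a scaled PN sequence is added to the mid-frequency DCT coefficients, with one sequence per bit value. The mid-band definition (anti-diagonals 3 to 6), the ±1 sequences drawn from a seeded generator, and the value of `alpha` are all assumptions, and the decimal sequence s of the proposed scheme is not modelled.

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix, so that DCT @ block @ DCT.T is the 2-D DCT.
_k = np.arange(N).reshape(-1, 1)
_n = np.arange(N).reshape(1, -1)
DCT = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * _n + 1) * _k / (2 * N))
DCT[0, :] = np.sqrt(1.0 / N)

# Assumed mid-frequency band: coefficients on anti-diagonals 3..6 (22 entries).
_u, _v = np.indices((N, N))
MID = (_u + _v >= 3) & (_u + _v <= 6)

def pn_sequences(key, length):
    """Two +/-1 pseudorandom sequences derived from a secret integer key."""
    rng = np.random.default_rng(key)
    p0 = rng.choice([-1.0, 1.0], size=length)
    p1 = rng.choice([-1.0, 1.0], size=length)
    return p0, p1

def embed_bit(block, bit, key, alpha=2.0):
    """Embed one watermark bit into an 8x8 block and return the modified block."""
    coeffs = DCT @ np.asarray(block, dtype=np.float64) @ DCT.T   # forward DCT
    p0, p1 = pn_sequences(key, int(MID.sum()))
    coeffs[MID] += alpha * (p1 if bit else p0)                   # additive embedding
    return DCT.T @ coeffs @ DCT                                  # inverse DCT
```

A larger `alpha` makes the embedded sequence easier to detect after attacks but lowers the PSNR of the watermarked frame, which is the robustness and imperceptibility trade-off the scheme has to balance.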

Watermark extraction
We employ a correlation-based watermark detection approach to recover the watermark. The following steps are applied to the watermarked video to extract the watermark.
Step 1: Shot boundaries are detected in the watermarked video using PCC, and then the entropy measure is used to select the watermarked key-frames.
Step 2: Each watermarked key-frame is converted from the RGB color space to the YCbCr color space, and the Y component is used for further processing.
Step 3: SURF feature points are extracted from the Y component.
Step 4: The detected feature points are used to generate non-overlapping circumscribed square regions of size 32×32.
Step 5: Each non-overlapping square region (32×32) of a key-frame is further divided into non-overlapping sub-blocks of size 8×8.
Step 6: DCT is applied to each 8×8 sub-block to obtain the watermarked mid-frequency coefficients.
Step 7: The s-sequence is formed using the same prime number used in watermark embedding, and it is correlated with the watermarked mid-frequency DCT coefficient values using (8). If Corr is greater than the threshold T, then the watermark bit is 0; otherwise, it is 1. Equation (9) is used to detect the watermark. Here, IW' is the DCT coefficient of the watermarked frame, s is the decimal sequence generated using the prime number, T is the threshold (which is decided based on the need to minimize error), and Corr is the correlation value.
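The thresholded correlation test of Step 7 can be sketched as follows. Equations (8) and (9) are not reproduced in this text, so the sketch assumes a normalized (Pearson-style) correlation and a hypothetical threshold of 0.5. It also replaces the decimal sequence s with a ±1 sequence regenerated from a secret integer key, which is a stand-in rather than the sequence used in the proposed scheme.

```python
import numpy as np

def key_sequence(key, length):
    """Regenerate the +/-1 reference sequence from the secret key (stand-in for s)."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=length)

def correlation(a, b):
    """Normalized correlation between two equally long vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def detect_bit(mid_coeffs, key, threshold=0.5):
    """Return (bit, corr): bit is 0 if corr exceeds the threshold, otherwise 1."""
    mid_coeffs = np.asarray(mid_coeffs, dtype=np.float64).ravel()
    s = key_sequence(key, mid_coeffs.size)
    corr = correlation(mid_coeffs, s)
    return (0 if corr > threshold else 1), corr
```

Detection needs only the key and the watermarked coefficients, not the original frame, which is what makes this kind of correlation detector blind. Pooling the mid-frequency coefficients of all sub-blocks in a region before correlating gives a longer vector and a more stable decision than a single 8×8 block.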

EXPERIMENTAL RESULTS
The proposed video watermarking scheme is implemented and evaluated in MATLAB 2021a on a machine with an i3 processor. The performance of the proposed scheme is measured on six standard videos (Bowing, Coastguard, Silent, Salesman, News, and Foreman) in terms of imperceptibility and robustness. The perceptual quality of the watermarked video is measured using the peak signal-to-noise ratio (PSNR). In terms of imperceptibility, a greater PSNR suggests better performance. The PSNR of a watermarked image is calculated using (11).
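As an illustration of the PSNR measure, the sketch below uses the standard definition, 10·log10(peak² / MSE), with a peak value of 255 for 8-bit frames. Equation (11) is not reproduced in this text, so it is an assumption that the paper uses this same form.

```python
import numpy as np

def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    o = np.asarray(original, dtype=np.float64)
    w = np.asarray(watermarked, dtype=np.float64)
    mse = np.mean((o - w) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

def average_psnr(original_frames, watermarked_frames):
    """Mean PSNR over corresponding frame pairs of two videos."""
    values = [psnr(o, w) for o, w in zip(original_frames, watermarked_frames)]
    return float(np.mean(values))
```

For example, a frame in which every pixel differs by exactly one gray level has an MSE of 1 and therefore a PSNR of about 48.13 dB.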
The robustness of the watermark is determined by its resistance to attempts to remove the watermark content using various types of image or signal processing attacks. It is measured using normalized cross correlation (NCC), which is evaluated using (17).
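A minimal sketch of the NCC measure is given below. Several NCC definitions are in use; this one assumes the form that divides the inner product of the original and extracted watermarks by the product of their Euclidean norms, which may differ from the exact form of (17).

```python
import numpy as np

def ncc(original_wm, extracted_wm):
    """Normalized cross correlation between original and extracted watermarks."""
    w = np.asarray(original_wm, dtype=np.float64).ravel()
    e = np.asarray(extracted_wm, dtype=np.float64).ravel()
    denom = np.sqrt((w * w).sum()) * np.sqrt((e * e).sum())
    return float((w * e).sum() / denom) if denom > 0 else 0.0
```

Under this definition, an extracted watermark identical to the original gives an NCC of 1, and the value decreases as more bits are corrupted by an attack.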

Imperceptibility and robustness analysis
To evaluate the invisibility of the proposed video watermarking approach, we use PSNR as the evaluation parameter, comparing the original video and the watermarked video. Table 1 shows the average PSNR values of the watermarked videos. We report the average PSNR over all frames in the host video to eliminate the effect of randomness. From Table 1, the average PSNR across all the watermarked videos is above 62.85 dB, which indicates good imperceptibility of the watermarked videos. The NCC values of a content-based watermark extracted from a watermarked video frame following various attacks are shown in Figure 2. The proposed approach is shown to be resistant to image processing attacks such as Gaussian noise, salt and pepper noise, Poisson noise, blurring, brightening, frame averaging, and swapping. The NCC value of the detected watermark is found to be between 0.9 and 1.

Figure 2. NCC values of watermark after applying various attacks
The proposed entropy-based key-frame extraction scheme is evaluated using the precision and recall measures. In this research, the detected key-frames are compared with the ground truth, which was generated manually by five human observers after watching the videos. The similarity between the key-frames detected by the proposed scheme and the ground truth is then measured. The recall and precision are computed using (12) and (13):

$$\text{Recall} = \frac{TP}{TP + FN} \tag{12}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{13}$$

where a true positive (TP) means that the frame extracted by the proposed scheme and the frame chosen by the human observers are the same, a false positive (FP) means that a key-frame extracted by the proposed scheme was not chosen by the human observers, and a false negative (FN) means that a key-frame chosen by the human observers was not extracted by the proposed scheme.

The results of key-frame detection by the proposed scheme for the 'silent' video sequence are given in Figure 3, wherein two different shots of the 'silent' video are shown: one is indicated by a green box and the other by a red box. Figure 4 shows the key-frame extracted from each shot of Figure 3 using the entropy measure.
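The precision and recall computation from TP, FP, and FN can be sketched as follows. The sketch assumes that detected and ground-truth key-frames are identified by frame index and that only exact index matches count as true positives; the source does not say whether nearby frames are also accepted as matches.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of detected key-frame indices against ground truth."""
    detected = set(detected)
    ground_truth = set(ground_truth)
    tp = len(detected & ground_truth)   # found by both scheme and observers
    fp = len(detected - ground_truth)   # found by the scheme only
    fn = len(ground_truth - detected)   # found by the observers only
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For instance, if the scheme returns frames {12, 87, 150} and the observers chose {12, 87, 201, 240}, then TP = 2, FP = 1, and FN = 2, giving a precision of 2/3 and a recall of 1/2.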

Comparative analysis

Comparative analysis of proposed key-frame extraction
In this section, the results of the proposed scheme are compared with existing video watermarking approaches in the literature. We have compared the results of our entropy-based key-frame extraction with the GMSD [8] scheme. Table 2 shows the average recall and precision values for both schemes. The recall and precision values obtained by the proposed method are 95%, which indicates that the proposed entropy-based key-frame extraction scheme outperforms GMSD [8]. The experiments show that the proposed key-frame extraction strategy is simpler, segments shots precisely, and extracts key-frames from each shot with strong robustness.

Comparative analysis of proposed video watermarking scheme in terms of imperceptibility and robustness
To evaluate the performance of the proposed scheme, its results are compared with those of the related video watermarking schemes in [8]-[10], which were discussed in the related work. Table 3 compares the proposed scheme with these recent schemes under different kinds of attacks, such as blurring, brightening, Gaussian noise, salt and pepper noise, median filtering, frame dropping, and frame averaging. It is observed that the proposed scheme outperforms the above-mentioned existing schemes. From Table 3, the average NCC value of the proposed scheme is higher than 99% for almost all the attacks.

The PSNR values obtained by the proposed algorithm are compared with those of the existing schemes in Figure 5. In human visual perception, a PSNR value above 50 dB is considered to indicate good visual quality of the watermarked video [9]. Figure 5 shows the average PSNR values of the proposed scheme and the existing schemes [2], [8]-[10]. The proposed scheme achieves a PSNR value above 62.8 dB, which is better than the existing schemes [2], [8], and [10]. The exception is [9], in which the authors considered only the boundary blocks of the selected key-frames for embedding and extraction of the watermark.

Figure 5. PSNR of watermarked video of the proposed scheme compared with existing methodologies

CONCLUSION
In this research, we present a blind, efficient, and secure video watermarking scheme for H.264/AVC videos. First, we developed a novel and efficient shot detection technique using PCC. PCC is used because it is robust against object motion, camera operation, and illumination changes. As embedding a watermark in every frame is time-consuming, we proposed an entropy-based key-frame extraction algorithm, which selects key-frames from each shot of the video. From each key-frame, SURF feature point-based square regions are extracted for watermark embedding and extraction. These extracted regions are robust to various geometric and photographic transformations, such as scaling, rotation, blurring, and JPEG compression. Further, a DCT-based watermark embedding algorithm is used to improve the imperceptibility and robustness of the proposed watermarking scheme. Moreover, the embedded watermark is extracted blindly using the extraction function. Promising results have been achieved with the proposed approach in terms of imperceptibility, security, and resilience.

Figure 1. SURF feature based region extraction: (a) circumscribed square of a circle, (b) SURF feature points, (c) circular region with radius r, and (d) non-overlapping circumscribed squares of the circles

Figure 4. Extracted key-frames using entropy from each shot of Figure 3

Table 1. The average PSNR values of the watermarked videos

Table 2. Comparative analysis of recall and precision values of GMSD [8] and the proposed scheme for the 'silent' video