High Quality Video Assessment Using Salient Features

This paper presents an efficient modified HEVC video compression technique, based on high-quality saliency features, for the assessment of high-quality videos. To create an efficient saliency map, we extract a global temporal alignment component and a robust spatial component. To obtain a high-quality saliency map, we combine spatial and temporal saliency features for different macroblocks in association with transformed residuals. We generate reconstructed video of high quality after compression on the SFU dataset. In our experiments, the proposed model outperforms the existing techniques it is compared with in terms of saliency map detection, PSNR and reconstruction quality.


Introduction
The growth in consumer demand for ultra-high-definition (UHD) devices such as smartphones, iPads, MacBook Pros, laptops, HDTV (high-definition television) and UHDTV (ultra-high-definition television) has made 2K/4K/8K video immensely popular in the entertainment world because of its high visual quality and richer color. UHD video has become a common requirement in entertainment, medicine, photography, satellite imaging, HDTV, stereoscopic video processing, face recognition and other fields. However, high-quality UHD video brings several problems, such as high storage requirements, the limited battery power of high-definition devices, long encoding times and high computational complexity. Over the past decades, the evolution of digital video compression methods has therefore transformed the way visual data is created, linked and transferred.
Video compression methods combine spatial compression with temporal motion compensation to reduce the quantity of data substantially [1]. Recent video compression methods offer essential features such as coding efficiency, channel robustness and application flexibility. A broadcast HD video consists of compressed audio and video data, synchronization data, and error detection and control signals. MPEG-2 [2], H.263 [3], AVC, SVC [4,5] and H.264/AVC [6] are some of the existing video coding standards. However, they suffer from problems such as degraded quality and coding efficiency, high bitrate and high computational complexity.
In [7], a distributed MapReduce technique was presented to speed up the encoding process through scheduling and segmentation of the video encoding task. In [8], a keypoint encoding technique was presented to identify keypoints and achieve efficient feature extraction at lower bitrates. The bitrate reduction is satisfactory for a single scene but can become complex for videos with multiple scenes. In [9], adaptive scheduling was adopted for heterogeneous devices to achieve real-time coding using a combination of parallel CPU and GPU cores. However, heterogeneous devices involve high computational complexity and an optimization problem. In [10], a real-time H.264 encoding technique was presented for HD (high-definition) videos to enhance the video quality after efficient compression. However, it is difficult to select suitable features for embedded systems, because different scenarios require different features.
Higher bitrate, optimization problems, high computational complexity, suitable feature selection, long coding time and degraded coding efficiency are problems that often occur in existing techniques. There is therefore a need for an efficient technique that improves coding efficiency to a large extent without compromising video quality. In this paper, we focus on the extraction of high-quality saliency features to achieve high compression efficiency for the assessment of high-quality videos. To create an efficient saliency map, we extract a global temporal alignment component and a robust spatial component. Small changes between neighboring frames may not be sufficient to define salient regions; hence, motion estimation and detection are critical in our saliency model. The saliency map can be determined precisely using HEVC (High Efficiency Video Coding) in association with INTRA or INTER processing blocks of different sizes. To obtain a high-quality saliency map, we combine spatial and temporal saliency features for different macroblocks in association with transformed residuals. Saliency features help to achieve high-quality compression. In our experiments, the resulting saliency model outperforms the existing techniques it is compared with.
This paper is organized as follows. Section 2 describes the video encoding issues and how our proposed model can eliminate them. Section 3 describes our proposed methodology. Section 4 presents the experimental results and their evaluation, and Section 5 concludes the paper.

Video Encoding Issues
Video compression is an essential need of the modern era because of the enormous growth in HD and UHD devices that require ultra-high-definition (UHD) quality. However, several issues are associated with HD and UHD videos, such as high storage requirements, the limited battery power of high-definition devices, long encoding times, optimization problems, degraded coding efficiency and high computational complexity. In the past decade, many researchers have done significant work to reduce these issues. A brief review of related work in the field of video compression is presented below.
In [11], a no-reference SQA (Subjective Quality Assessment) approach was adopted to assess and improve the quality of video encoding with the help of human eye traversal. In this approach, an approximation of smooth eye traversal is computed from distance, angle and pupil-size features using HEVC (High Efficiency Video Coding). However, QMET (quality metric based on eye traversal) could be much more flexible if an eye-tracking simulator were employed in association with it. In [12], a 3D HEVC scheme was introduced to reduce high computational complexity using online learning. In the encoding process, online learning is used to tune the two probabilistic models of FMA (Fast Mode Assignment), which helps to reduce complexity. However, this scheme can marginally affect the video quality while reducing the computational complexity. In [13], a verified HEVC testing scheme was presented to evaluate video quality. To obtain better efficiency and bitrate savings, a MOS-based BD-rate measurement was presented in association with the HEVC approach. However, it increases computational complexity. In [14], an asymmetric compressed stereoscopic technique was adopted to evaluate quality assessment and rate-distortion on 3D videos. In this approach, a combination of asymmetric transform coding and mixed resolution is used to obtain asymmetric compression with better quality. However, the process is highly complex to implement in real time. In [15], a quality assessment for streaming videos was presented using an estimated QoE (Quality of Experience) measurement. However, integrating the QoE model with an adaptive streaming decision-making engine for optimal playback control is very challenging. In [16], a depth quality assessment approach based on no-reference edge misalignment error was presented for texture-plus-depth (T+D) images. However, it is difficult to assess depth quality completely in a no-reference fashion.

In [17], a weighted fixation density based approach was presented to describe quality assessment using a visual saliency map to obtain high-quality compression. However, using a shuffling method, this approach only partially eliminates the central bias problem. In [18], a novel dynamic feature selection (DFS) model was proposed to obtain high-quality visual features for assessing video quality, which can improve the quality of visual saliency maps. However, the measured background feature density and the computed reconstruction error are very high. In [19], emotion intensity was incorporated along with emotional object detection to predict gaze density and improve the quality of visual saliency maps. However, this model failed to define the relationship between emotion and visual saliency.

This section has described many techniques related to quality assessment for video saliency maps. However, every technique has its own drawbacks. The main drawbacks are high storage requirements, high computational complexity, limited flexibility, low coding efficiency, the central bias problem, reconstruction error and low-quality visual saliency maps. To overcome these issues, we propose an efficient modified HEVC video compression technique based on high-quality saliency features for the assessment of high-quality videos.

Quality Saliency Features based Video Encoding
The recent popularity of high-definition devices has changed the visual world through their realistic visual power and true color, and the availability of high-end devices has increased the demand for high-resolution videos. However, high-definition videos consume large amounts of storage space and bandwidth. To counter these drawbacks, we present a video encoding technique based on quality saliency features that uses the HEVC (High Efficiency Video Coding) architecture to obtain high-quality compression and reconstructed frames. This technique can be used in medicine, photography, satellite imaging, HDTV, stereoscopic video processing, face recognition and video compression to estimate saliency and compress high-definition videos. Our video compression provides fast computation for large training databases such as the SFU dataset [20]. The HEVC architecture, working in synchronization with saliency features, can provide high compression efficiency; it works well with large datasets, helps to achieve lower bitrates and can reduce computation time.
Very few techniques can effectively estimate saliency features, compress a high-definition video and still provide high-definition visual quality. To detect saliency precisely and to provide high-quality video after compression, our encoding technique therefore builds on quality saliency features within the HEVC architecture.

Saliency Aware Video Compression
Our proposed saliency-aware video compression technique rests on the following principles.
a. During compression, regions of low saliency are coded at lower perceptual quality and regions of high saliency at higher quality, so that quality is concentrated where viewers are likely to look.
b. Encoding should preserve the saliency of different regions. If a region is highly salient initially, its saliency may increase after compression; such a region remains one of high quality that attracts a large number of viewers.
c. If a region has low saliency initially, its saliency may decrease further after compression; such a region is of lower quality and is likely to attract fewer viewers.

Detection of Saliency Regions
Many factors make video compression effective, such as motion prediction and motion compensation, transformation, quantization, and entropy coding of the estimated residuals and motion vectors. A saliency map can therefore be an effective way to guide precise, quality-aware compression. A salient region is a region that attracts the human eye with the highest probability; it may be distinguished by color, high resolution or motion. To create a saliency map, we extract a global temporal alignment component and a robust spatial component. Small changes between neighboring frames may not be sufficient to define salient regions; hence, motion estimation and detection are critical in perceptual coding. The saliency map can be determined precisely using HEVC (High Efficiency Video Coding) in association with INTRA or INTER processing blocks of different sizes to obtain a high compression rate.

The IKN Saliency Model
A number of conventional models exist for defining saliency in the video encoding field. Here, we use the Itti-Koch-Niebur (IKN) model [21] for saliency map estimation because of its wide popularity and low computational complexity. The model explores the input image or frame through a number of independent feature channels (mediums), each of which handles one low-level visual feature of the image: intensity, color or orientation contrast. Nine spatial scales are produced using dyadic Gaussian pyramids, which progressively low-pass filter and down-sample the image, giving image-size reduction factors that range from 1:1 (scale zero) to 1:256 (scale eight) [21].

IJEECS Vol. 7, No. 3, September 2017: 761-772, ISSN: 2502-4752
A "center-surround" method is adopted to compute the contrast for every feature channel. The center-surround operation is defined as the difference between fine and coarse scales. Here, the "center" is a pixel at scale $c \in \{2, 3, 4\}$ and the "surround" is the corresponding pixel at scale $s = c + \delta$, where $\delta \in \{3, 4\}$ [21]. The across-scale difference is obtained by interpolating the coarser map to the finer scale and performing point-by-point subtraction. A normalization operator is used to combine the extracted contrast features into a "conspicuity map" for every feature channel. The same normalization operator is then used to combine all the conspicuity maps into the final saliency map, after resizing to scale four.
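The center-surround step can be illustrated with a minimal, dependency-free Python sketch. It is only an illustration under stated assumptions: a 2x2 box average stands in for the Gaussian low-pass filter, nearest-neighbour interpolation stands in for the interpolation used in [21], and the function names are hypothetical. The small 16x16 example uses scales that fit the frame rather than the nine scales of the full model.

```python
def downsample(img):
    """Halve both dimensions by averaging 2x2 neighbourhoods (low-pass + decimate)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels):
    """Return [scale 0, scale 1, ...]; scale k is reduced by a factor of 2**k."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def upsample_to(img, h, w):
    """Nearest-neighbour interpolation of a coarse map to h x w."""
    src_h, src_w = len(img), len(img[0])
    return [[img[y * src_h // h][x * src_w // w] for x in range(w)] for y in range(h)]


def center_surround(pyramid, c, delta):
    """Absolute difference between centre scale c and surround scale c + delta."""
    center = pyramid[c]
    h, w = len(center), len(center[0])
    surround = upsample_to(pyramid[c + delta], h, w)
    return [[abs(center[y][x] - surround[y][x]) for x in range(w)] for y in range(h)]


# A dark 16x16 frame with one bright pixel: contrast is high only near that pixel.
frame = [[0.0] * 16 for _ in range(16)]
frame[8][8] = 16.0
pyr = build_pyramid(frame, 5)
contrast = center_surround(pyr, c=1, delta=2)
print(contrast[4][4])  # 3.75
```

A uniform frame produces zero contrast everywhere, which is the behaviour the center-surround operator is meant to have: only local differences from the surroundings contribute to conspicuity.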
Motion and flicker channels are added to the IKN model to make it suitable for videos [22]. The flicker channel is built by creating a Gaussian pyramid of the difference between the present and previous frames. The motion channel is built from intensity pyramids using the difference between spatially shifted versions of the present and previous frames [22]. The motion and flicker conspicuity maps are created using the same center-surround method as for the intensity, orientation and color channels, and are combined with the spatial conspicuity maps to form the final saliency map.

Rate Distortion Optimization
The H.264/AVC and HEVC video coding standards support multiple block coding modes, including INTER and INTRA modes with different block sizes [23]. To select the coding mode of each macroblock (MB), we use the RDO (Rate Distortion Optimization) method, which minimizes a Lagrangian cost function over the coding modes in association with HEVC coding [23][24]:

$$J(m \mid Q) = D(m \mid Q) + \lambda_R \, R(m \mid Q)$$

where $D(m \mid Q)$ is the MSE (Mean Squared Error) and $R(m \mid Q)$ is the bit rate of the current macroblock for coding mode $m$ with quantization step size $Q$. Here, $\lambda_R$ is the Lagrange multiplier, which quantifies the tradeoff between distortion and rate [24]. The selected coding mode is the one for which the Lagrangian cost function [25] is minimized; an accurate rate-distortion model is therefore essential. In HEVC coding, $\lambda_R$ is expressed as a function of the quantization parameter $QP$.
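As a hedged illustration of rate-distortion optimized mode selection, the sketch below picks the mode with the smallest cost "distortion plus multiplier times rate". The multiplier formula 0.85 · 2^((QP − 12)/3) is the relation commonly used in H.264/AVC reference encoders and is an assumption here, not necessarily the expression used in this paper. The mode labels and the distortion and rate figures are hypothetical.

```python
def lagrange_multiplier(qp):
    """Assumed H.264-style mapping from QP to the rate multiplier."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost: distortion (MSE) plus lambda times rate (bits)."""
    return distortion + lam * rate_bits


def select_mode(candidates, qp):
    """candidates maps a mode label to (mse, bits); return the cheapest mode."""
    lam = lagrange_multiplier(qp)
    return min(candidates,
               key=lambda mode: rd_cost(candidates[mode][0], candidates[mode][1], lam))


# Hypothetical measurements for one macroblock under two coding modes.
modes = {"INTER16x16": (10.0, 20), "INTRA4x4": (4.0, 40)}
print(select_mode(modes, qp=12))  # rate is expensive: the cheaper-to-code mode wins
print(select_mode(modes, qp=0))   # rate is cheap: the lower-distortion mode wins
```

The two calls show the tradeoff the multiplier controls: at a higher QP the multiplier grows, so modes that spend fewer bits are preferred even if their distortion is larger.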

Quality Saliency Estimation for a Video
Our model evaluates the saliency map using two components: a spatial saliency component and a temporal saliency component. The spatial component is a convex approximation of the IKN spatial saliency, whereas the temporal component predicts saliency using global motion compensation, which eliminates camera motion, to obtain high compression. The prediction of the quality saliency map for a low-resolution video is achieved in four parts, as follows.

a. Convex Approximation to Spatial IKN Saliency
To create the saliency map of a video frame, the IKN saliency model uses the video frame content in the normalized frequency range 0 1. Here, we present a convex approximation method to find the IKN saliency map, in which the block DCT is used to predict the saliency of a block. This is accomplished by recapturing the portion of the video frame at the block position that lies in the normalized frequency range 0 1. In our model, windowing and spectral down-sampling are used to extract a block from a video frame.

IJEECS, ISSN: 2502-4752, High Quality Video Assessment Using Salient Features (K. Bhanu Rekha)

To extract a block of the desired size from the video frame, a windowing method is used, and the 2-D DCT of the block is then evaluated. A second DCT is considered that covers a specified frequency band. The Wiener-filtered DCT coefficients of the block are obtained from the 2-D DCT coefficients of the block and the Wiener filter DCT coefficients, and the energy of the Wiener-filtered coefficients gives the un-normalized spatial saliency of the block (Equation 4). For a known video resolution and block size, the Wiener filter coefficients can be pre-computed. When a macroblock has multiple color channels, the energies of all channels are added together. Equation 4 is applied to all macroblocks in a frame to evaluate the spatial saliency features of that frame, and the resulting map is then normalized to obtain the normalized spatial saliency of each block. For known block and image dimensions, some of the filter coefficients may be zero. Equation (4) therefore represents an un-normalized spatial saliency that is monotonically non-decreasing (either constant or increasing) in the total DCT energy of a block [26].
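The idea of measuring block saliency as the energy of weighted DCT coefficients can be sketched as follows. This is a minimal illustration, not the paper's filter: the weights returned by `band_weights` are hypothetical (they only drop the DC term), the normalization to [0, 1] is an assumed min-max scaling, and all function names are invented for the example.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    basis = [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
              * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i in range(n)] for k in range(n)]
    # 1-D DCT along every row, then along every column.
    rows = [[sum(basis[k][i] * row[i] for i in range(n)) for k in range(n)]
            for row in block]
    return [[sum(basis[k][r] * rows[r][c] for r in range(n)) for c in range(n)]
            for k in range(n)]


def band_weights(n):
    """Hypothetical filter weights: drop the DC term, keep every AC coefficient."""
    return [[0.0 if j == 0 and l == 0 else 1.0 for l in range(n)] for j in range(n)]


def spatial_saliency(block, weights):
    """Un-normalized saliency: energy of the weighted DCT coefficients."""
    coeffs = dct_2d(block)
    n = len(block)
    return sum((weights[j][l] * coeffs[j][l]) ** 2
               for j in range(n) for l in range(n))


def normalize(values):
    """Min-max normalization of per-block saliency values to [0, 1] (assumed range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


flat = [[3.0] * 4 for _ in range(4)]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
weights = band_weights(4)
scores = [spatial_saliency(b, weights) for b in (flat, checker)]
print(normalize(scores))  # the textured block is the more salient one
```

Because the weights are fixed for a given block size, they can be computed once and reused for every macroblock, and non-negative weights make the score non-decreasing in the energy of each coefficient.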

b. Global Motion-Compensated Temporal Saliency
Temporal saliency and spatial saliency are different aspects of video encoding. Object motion is one of the most powerful and essential features in video processing [27]. Many conventional techniques use local motion contrast to detect temporal saliency for visual attention [22]: an object that moves significantly relative to its surroundings is considered powerful and attention-grabbing in a visual system.
The performance of the IKN saliency model degrades in the presence of camera motion, because the apparent motion of the background competes with the object motion of the foreground and can easily confuse any saliency model [22,28]. To overcome this drawback, we remove the camera motion before evaluating the saliency map. An efficient compressed-domain global motion prediction technique [29] is used, in which the motion field of the previous frame serves as an approximation of the motion field of the present frame. Global motion compensation is then used to subtract the global motion from the motion field, which yields a global-motion-compensated motion vector for each block. For every macroblock, the magnitude of this compensated motion vector is taken as its motion saliency. To obtain the spatio-temporal saliency of a macroblock, we combine its spatial saliency and motion saliency using a fusion scheme based on coherent normalization [30].

c. Macroblock QP Selection
Consider the quantization parameter of the present video frame, which is calculated by a suitable rate control technique for frames, together with the saliency of each macroblock and the average saliency over all macroblocks of the present frame. The quantization parameter of each macroblock of the present frame is defined in terms of these quantities.
Here, a sigmoid function with constant parameters is used to compute the saliency-dependent adjustment; in our model, these constants are set to fixed values. Equation (7) provides the quantization parameter of each macroblock. In our model, the quantization step size is defined in terms of the quantization parameter, with a term that is defined separately for each coding mode. To obtain the optimum block coding mode, we introduce a saliency distortion term $D_S$ into the cost function, which becomes

$$J = D + \lambda_R R + \lambda_S D_S$$

where $D$ is the distortion, $R$ is the bit rate, $\lambda_R$ is the rate multiplier and $\lambda_S$ is the Lagrangian multiplier associated with the saliency distortion. The saliency distortion is the absolute change between the saliency of the uncompressed macroblock and that of the coded macroblock for a given coding mode and quantization step size:

$$D_S = \left| S(\tilde{X}) - S(X) \right|$$

where $\tilde{X}$ is the macroblock coded with the given coding mode and quantization step size, $X$ is the macroblock in uncompressed form, and $S(\cdot)$ denotes macroblock saliency. Compression changes the magnitude or direction of the motion of different regions only at extremely low bitrates, so the difference in motion saliency is negligible in comparison with the difference in spatial saliency. With the help of equation (6), the macroblock saliency distortion can therefore be approximated by the spatial saliency distortion weighted by the macroblock motion saliency (equations (12) and (13)). Consequently, in regions where saliency is very high, the saliency distortion will also be high. Equation (12) makes the saliency of the coded macroblock easy to estimate, which removes the chicken-and-egg problem that arises because the saliency of a compressed block is only available after the frame has been coded. Section 3.1 lists the principles adopted in our model, under which the saliency of highly salient regions may increase after compression. These conditions are expressed as follows.

Here, the user-defined threshold is set in our model to a value that is assumed to correspond to good quality. Similarly, the second condition states that the saliency of less salient regions may decrease after compression. Note that reducing the saliency distortion with the help of equation (12) also helps to preserve the saliency of less salient regions.
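The temporal saliency of part b and the QP selection of part c can be sketched together in a few lines of Python. This is a sketch under stated assumptions rather than the paper's formulation: the sigmoid shape, the constants `max_offset` and `steepness`, and the function names are hypothetical, and the step-size relation 2^((QP − 4)/6) is the nominal H.264/HEVC relation, used here only for illustration.

```python
import math


def motion_saliency(mv, global_mv):
    """Magnitude of a macroblock motion vector after subtracting global (camera) motion."""
    return math.hypot(mv[0] - global_mv[0], mv[1] - global_mv[1])


def macroblock_qp(qp_frame, s_mb, s_mean, max_offset=6.0, steepness=8.0):
    """Lower the frame QP for salient macroblocks and raise it for non-salient ones.

    max_offset and steepness are hypothetical constants, not the paper's values.
    """
    # Sigmoid-shaped term in (-1, 1): negative when s_mb exceeds the frame average.
    shift = 2.0 / (1.0 + math.exp(steepness * (s_mb - s_mean))) - 1.0
    qp = int(round(qp_frame + max_offset * shift))
    return max(0, min(51, qp))  # keep QP inside the usual 0..51 range


def quant_step(qp):
    """Nominal step size: doubles for every increase of 6 in QP."""
    return 2.0 ** ((qp - 4) / 6.0)


# A macroblock moving against a panning camera, in a frame coded at QP 30.
print(motion_saliency(mv=(5, 4), global_mv=(2, 0)))  # 5.0
for saliency in (0.1, 0.5, 0.9):
    qp = macroblock_qp(30, saliency, s_mean=0.5)
    print(saliency, qp, quant_step(qp))
```

A macroblock whose saliency equals the frame average keeps the frame QP, so the average bit budget set by rate control is disturbed as little as possible while bits are shifted toward salient regions.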

d. Statistical Modeling of Transformed Residuals
Here, a zero-mean Laplace probability density function with parameter $\lambda$ is used to model the marginal density of the transformed residuals:

$$f(x) = \frac{\lambda}{2} e^{-\lambda |x|}$$

where the relationship between the standard deviation $\sigma$ and $\lambda$ is

$$\lambda = \frac{\sqrt{2}}{\sigma}$$

In our model, a transform residual coefficient is treated as a Laplacian random variable with parameter $\lambda$. According to equation (12), the macroblock saliency distortion is computed as the spatial saliency distortion weighted by the macroblock motion saliency. The energy of the Wiener-filtered DCT of a macroblock is the approximation of its spatial saliency. The spatial saliency distortion of a block caused by quantization is predicted using a model of the quantization noise [31]: the Wiener-filtered DCT energy of the quantization noise is taken as the spatial saliency distortion. The final saliency map of our proposed method, which helps to obtain high compression and visual quality, is defined on this basis.
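The Laplacian model of the transformed residuals can be checked numerically with a short sketch. It assumes the standard zero-mean Laplace density with parameter equal to the square root of two divided by the standard deviation; the function names and the sample residuals are hypothetical.

```python
import math


def laplace_pdf(x, lam):
    """Zero-mean Laplace density with parameter lam."""
    return 0.5 * lam * math.exp(-lam * abs(x))


def lambda_from_std(sigma):
    """Laplace parameter implied by a standard deviation sigma."""
    return math.sqrt(2.0) / sigma


def estimate_lambda(residuals):
    """Fit the parameter from residual samples, assuming their mean is zero."""
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return lambda_from_std(sigma)


# Hypothetical transform residual coefficients of one macroblock.
residuals = [3.0, -1.0, 0.0, 2.0, -4.0, 1.0, -1.0, 0.0]
lam = estimate_lambda(residuals)
print(lam, laplace_pdf(0.0, lam))
```

A larger spread of residuals gives a smaller parameter and therefore a flatter density, which is the case in which quantization removes more energy from the block.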

Performance Evaluation
We evaluate our results on the same datasets as used in [20,32] to compare the performance of our saliency model with the conventional techniques described in the related work. Our saliency model was trained on large datasets such as the SFU dataset [20] and the HEVC video_database [32]; here, we present experimental results only for the SFU dataset [20]. The test results demonstrate that our proposed model outperforms most of the conventional approaches in terms of PSNR, feature extraction, compression rate and the quality of the predicted saliency map. We tested our proposed model with the HEVC and H.264/AVC coding standards. Our experimental results show large improvements in accuracy, bitrate and compression ratio, and the proposed model requires very little execution time to achieve efficient video compression. The proposed model was implemented on a 64-bit Windows 10 system with 16 GB RAM and an Intel(R) Core(TM) processor. We compared our model with the existing techniques Itti [21], Surprise [33], Judd [34], PQFT [35], Rudoy [36], Fang [37] and OBDL [38].

Implementation Details
We carried out extensive experiments on the large SFU video dataset [20] and the HEVC video_database [32]. UHD video has become a common requirement in entertainment, medicine, photography, satellite imaging, HDTV, stereoscopic video processing, face recognition and other fields because of its high visual quality and true color, so there is a massive demand for high-resolution videos in the video processing field. However, storage capacity and bandwidth are limited. There is therefore a need for an effective compression technique that avoids data loss and precisely maintains high video quality.
In this paper, we focus on an efficient modified HEVC video compression technique based on high-quality saliency features for the assessment of high-quality videos. To create an efficient saliency map, we extract a global temporal alignment component and a robust spatial component. Small changes between neighboring frames may not be sufficient to define salient regions; hence, motion estimation and detection are critical in our saliency model. The saliency map can be determined precisely using HEVC (High Efficiency Video Coding) in association with INTRA or INTER processing blocks of different sizes to obtain a high compression rate. To obtain a high-quality saliency map, we combine spatial and temporal saliency features for different macroblocks in association with transformed residuals. To measure the performance of the proposed modified HEVC model, 12 raw videos were taken from the SFU test dataset [20], and all 12 were used for the PSNR computation. All experiments were carried out in MATLAB 2016b.

Comparative Study
We compare our experimental results with several state-of-the-art techniques: Itti [21], Surprise [33], Judd [34], PQFT [35], Rudoy [36], Fang [37] and OBDL [38]. All the raw videos use YUV 4:2:0 sampling and are compressed to high quality. The videos include content such as sports events, video conferencing, surveillance and video games. All 12 raw videos in the SFU dataset have the same resolution but different numbers of frames. Tables 1 and 2 show the PSNR (Peak Signal-to-Noise Ratio) and the video frame reconstruction quality comparison results, respectively, for our proposed method with HM 36.0 on the SFU dataset. With our proposed method, the PSNR values for all three channels (Y, U and V) are high compared to the existing techniques. Similarly, the frame reconstruction quality for all three channels is high compared to existing methods, which indicates the high quality of our reconstructed videos. The average PSNR using our proposed method is 30.739 for the Y channel, 36.488 for the U channel and 37.35 for the V channel, which is higher than that of the existing techniques. However, the quality of each video differs according to its frame rate and dimensions. These results demonstrate the advantage of our proposed method over existing state-of-the-art techniques in terms of PSNR and reconstruction quality, and show that it provides a high-quality saliency map for obtaining a high compression rate.
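For reference, per-channel PSNR for a YUV 4:2:0 frame can be computed as in the sketch below. It assumes 8-bit samples (peak value 255) and planes stored as lists of rows, with the U and V planes at half the luma resolution in each dimension; the function names and the tiny test frame are hypothetical and are not taken from the SFU dataset.

```python
import math


def mse(ref, rec):
    """Mean squared error between two planes of equal size."""
    total, count = 0.0, 0
    for row_ref, row_rec in zip(ref, rec):
        for a, b in zip(row_ref, row_rec):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical planes."""
    err = mse(ref, rec)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / err)


def psnr_yuv(ref_planes, rec_planes):
    """PSNR of each channel; planes are dicts with keys 'Y', 'U' and 'V'."""
    return {name: psnr(ref_planes[name], rec_planes[name]) for name in ("Y", "U", "V")}


# A 4x4 luma plane with 2x2 chroma planes (4:2:0), and a slightly distorted copy.
reference = {"Y": [[100] * 4 for _ in range(4)],
             "U": [[128] * 2 for _ in range(2)],
             "V": [[128] * 2 for _ in range(2)]}
decoded = {"Y": [[101] * 4 for _ in range(4)],
           "U": [[128] * 2 for _ in range(2)],
           "V": [[130] * 2 for _ in range(2)]}
print(psnr_yuv(reference, decoded))
```

Averaging these per-frame values over all frames of a sequence, and then over all sequences, gives channel averages of the kind reported in the comparison.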
Experimental results demonstrate that our proposed technique outperforms the existing techniques in terms of PSNR and reconstruction quality, which indicates that it produces a high-quality reconstructed saliency map for high-quality compression. Table 3 compares the performance metrics of our proposed system with those of the existing techniques for all 12 videos in the SFU dataset. The results show that our proposed model also outperforms the other state-of-the-art techniques in terms of AUC (Area Under Curve) and NSS (Normalized Scanpath Saliency); the average AUC and NSS over all 12 videos are compared with those of the existing techniques in Table 3. These results confirm a high-quality reconstructed saliency map after efficient compression. Figure 1 shows a graphical comparison of our proposed method and the existing techniques in terms of AUC and NSS. This performance is obtained by combining spatial and temporal saliency features for different macroblocks in association with transformed residuals.
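The two saliency metrics can be illustrated with a small sketch. It assumes common definitions: NSS is the mean of the standardized saliency values at fixated pixels, and AUC is the probability that a fixated pixel scores higher than a non-fixated one, with ties counted as one half. The exact AUC variant used in the experiments is not specified here, and the function names and toy data are hypothetical.

```python
import math


def nss(saliency, fixations):
    """Mean z-scored saliency at fixation points given as (row, col) pairs."""
    values = [v for row in saliency for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return 0.0
    return sum((saliency[r][c] - mean) / std for r, c in fixations) / len(fixations)


def auc(saliency, fixations):
    """Area under the ROC curve: fixated pixels are positives, the rest negatives."""
    fixated = set(fixations)
    positives = [saliency[r][c] for r, c in fixated]
    negatives = [saliency[r][c]
                 for r in range(len(saliency)) for c in range(len(saliency[0]))
                 if (r, c) not in fixated]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy 3x3 saliency map with two recorded fixations on its brightest pixels.
saliency_map = [[0.1, 0.2, 0.1],
                [0.2, 0.9, 0.3],
                [0.1, 0.3, 0.8]]
gaze = [(1, 1), (2, 2)]
print(nss(saliency_map, gaze), auc(saliency_map, gaze))
```

An AUC of 0.5 corresponds to chance and an NSS of zero to a map that is no better than its own average at the fixated locations, so higher values of both indicate better agreement with human gaze.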
The average AUC and NSS of our proposed method over all 12 videos of the SFU dataset are presented in Table 3. The average PSNR of our proposed method for the Y, U and V channels is 30.739, 36.488 and 37.35 respectively, as presented in Table 1. Similarly, the frame reconstruction quality of our proposed method for all 12 videos is high compared to existing techniques, and the reconstruction quality for 4 of the videos is presented in Table 2. These results indicate that our model outperforms the state-of-the-art techniques it was compared with. In future, this model can be used in medicine, photography, satellite imaging, HDTV, stereoscopic video processing, face recognition and video coding.

Figure 1. Comparison of our proposed model with existing state-of-the-art techniques

Table 1. PSNR Comparison Results for Y, U, and V Channels Using Our Proposed and Existing Methods for HM 36.0 Considering SFU Dataset

Table 2. Frame Reconstruction Quality Comparison Results between Our Proposed Method and Existing Method
In Table 2, we present frames reconstructed from the input frames using our proposed technique and the existing HEVC technique for 4 of the 12 videos: Bus, Stefan, Soccer and Foreman. Different frames are used for the 4 videos: frame 12 for Bus, frame 22 for Stefan, and frame 7 for Foreman and Soccer. Table 2 shows that the frames reconstructed with our proposed modified HEVC method are of higher quality than those reconstructed with the existing HEVC method.

Table 3. Comparison of Performance Metrics with Existing Techniques for All Videos in SFU Dataset with Our Proposed System