IMPROVING PSNR AND PROCESSING SPEED FOR HEVC USING HYBRID PSO FOR INTRA FRAME PREDICTION

High Efficiency Video Coding (HEVC) is a video coding standard that significantly increases the coding efficiency of its predecessor, H.264/Advanced Video Coding. However, HEVC comes with considerably higher computational complexity. In this paper, a coding unit partitioning pattern optimization method based on particle swarm optimization (PSO) is proposed to reduce the computational complexity of hierarchical quadtree-based coding unit partitioning. The coding unit partitioning pattern that exhaustive partitioning would otherwise search, and the rate distortion cost, are used as the particle (chromosome) and the fitness function of the PSO, respectively. To reduce the computational time, a cellular automata (CA) rule-based limit is used to find the best possible modes of operation. Compared to current state-of-the-art algorithms, this scheme is computationally simple and achieves superior reconstructed video quality (a 12% increase in PSNR compared to existing methods) at lower computational complexity (overall delay reduced by 40%), improving bandwidth usage and reducing errors.


INTRODUCTION
Nowadays, video distribution for different purposes is growing rapidly over the Internet with the aid of convenient communication networks and smartphones. Furthermore, video consumers increasingly demand high definition (HD) and ultra-high definition (UHD) videos to experience better visual quality. Accordingly, the delivery of HD/UHD video to smartphone users over the Internet is becoming a popular trend. However, the data volume of HD/UHD video is huge because of the higher video resolution and frame rate. The data size of a 10-second video with 3840 × 2160 resolution at a frame rate of 60 frames per second reaches almost 15 GB. Because of this, the delivery of HD/UHD video demands a larger amount of network bandwidth and data storage compared to lower-resolution standard definition (SD) videos.
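The 15 GB figure can be reproduced with a short calculation. The sketch below assumes uncompressed video at 24 bits per pixel (8 bits per colour component, no chroma subsampling); this bit depth is an assumption, as the text does not state it, and the function name is purely illustrative.

```python
def raw_video_bytes(width, height, fps, seconds, bits_per_pixel=24):
    """Size in bytes of uncompressed video.

    bits_per_pixel=24 is an assumed format (8-bit samples, no chroma
    subsampling); 4:2:0 subsampling would use 12 bits per pixel instead.
    """
    total_pixels = width * height * fps * seconds
    return total_pixels * bits_per_pixel // 8


size = raw_video_bytes(3840, 2160, 60, 10)
print(f"{size / 1e9:.2f} GB")  # decimal gigabytes
```

Under the same assumptions, 4:2:0 subsampling (12 bits per pixel) would halve this size, which is still far beyond what typical networks can deliver uncompressed.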
To save network resources and storage, an efficient compression technique is vitally important. The Joint Collaborative Team on Video Coding (JCT-VC), the collaborative project group of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), has developed a highly efficient video coding standard called High Efficiency Video Coding (HEVC)/H.265 [1] as a solution to the problem of increased video resolution. ITU-T and ISO/IEC are the main standardization bodies, which have standardized all of HEVC's predecessor standards over many years. These predecessors used a 16 × 16 macroblock as the basic processing unit. Each frame is split into macroblocks. Each macroblock comprises one 16 × 16 block of luma samples to represent the brightness and two 8 × 8 blocks of chroma samples to represent the colour in the 4:2:0 chrominance subsampling format. Thus, the macroblock is the largest block size available to signal the prediction information of intra-frame or inter-frame prediction in previous video coding standards. However, typical HD and UHD videos have many frame regions larger than the macroblock, and those regions can share the same motion information. If the macroblock is used as the basic processing unit for typical HD and UHD videos, a large number of bits is needed to signal the prediction information. Similarly, such content calls for a transform block size larger than the macroblock size.
Consequently, HEVC supports a larger block size as a basic processing unit, called the coding tree unit (CTU), for intra-frame or inter-frame prediction and transform coding. Although a large block size is suitable for high-resolution video, it is not a good choice for low-resolution video. To be compatible with both high- and low-resolution videos, HEVC can flexibly partition the video frame into several square CTUs of $2^L \times 2^L$ samples, where L ∈ {4, 5, 6}. The encoder chooses a reasonable value of L for the intended application to obtain the best trade-off between coding performance and cost, such as memory storage, encoding time, and delay. However, using a larger block to decide between intra mode and inter mode at the prediction stage cannot guarantee good RD performance for that stage. To achieve better coding efficiency, HEVC introduced another basic processing unit, called the coding unit (CU), and a flexible quadtree partitioning from CTU to CU. Thus, the CU size can be 64 × 64, 32 × 32, 16 × 16, and 8 × 8 at depth 0, depth 1, depth 2, and depth 3, respectively. To determine the CU size or depth, HEVC starts a trial encoding which includes two main functions, the RD cost calculation and the RD cost comparison, performed in top-down and bottom-up manner, respectively, as mentioned in Section I. In the top-down RD cost calculation of a 64 × 64 CTU, the RD costs for all 85 possible CUs are calculated in a preorder traversal of the quadtree, if the maximum CU depth is 3. In detail, there are 1, 4, 16, and 64 CUs at depth 0, depth 1, depth 2, and depth 3, respectively, and the total number of CUs is $\sum_{i=0}^{3} 4^{i} = 85$. After calculating the RD costs for the four child CUs of each parent CU, HEVC proceeds to the RD cost comparison to decide whether the parent CU is split or not, by comparing the RD costs of the split and non-split states of the parent CU.
Then, HEVC switches back to the RD cost calculation or performs the comparison again, depending on the position of the parent CU. Thus, there are 85 calculations and 21 comparisons in the top-down RD cost calculation and bottom-up RD cost comparison of a 64 × 64 CTU, respectively.
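The counts of 85 calculations and 21 comparisons follow directly from the quadtree geometry. The sketch below tabulates them for a 64 × 64 CTU with maximum depth 3; the function names are illustrative and not taken from any HEVC software.

```python
def cu_table(ctu_size=64, max_depth=3):
    """Return (depth, cu_size, cu_count) for each quadtree depth."""
    return [(d, ctu_size >> d, 4 ** d) for d in range(max_depth + 1)]


def rd_workload(ctu_size=64, max_depth=3):
    """Count RD cost calculations and split/non-split comparisons."""
    table = cu_table(ctu_size, max_depth)
    # One RD cost calculation per CU at every depth.
    calculations = sum(count for _, _, count in table)
    # One comparison per CU that can still be split (all but the deepest).
    comparisons = sum(count for d, _, count in table if d < max_depth)
    return calculations, comparisons


for depth, size, count in cu_table():
    print(f"depth {depth}: {count:2d} CUs of {size} x {size}")
print(rd_workload())
```

Every CU above the deepest level is a potential parent, which is why the number of comparisons equals 1 + 4 + 16.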
After finally comparing the root CU at depth 0 with its four child CUs at depth 1, the best CU quadtree structure of a CTU, the one with the minimum RD cost, is chosen among 83,522 possible quadtree structures. As an example, the CU partitioning patterns of the frame with picture order count (POC) 40 of the sequence "Blowing Bubbles" can be found by an exhaustive RDO search of HM version 16.5 (HM16.5). The trial encoding of HEVC finds the best CU partition structure of each CTU after an exhaustive RDO search. Therefore, choosing an optimal CU partitioning structure can be modelled as an optimization problem, and a solution can be found by a suitable optimization tool that searches the space of possible CU partition solutions. For a small search space, conventional exhaustive techniques are adequate to find the solution [20]. However, techniques based on artificial intelligence (AI) are efficient for a huge search space, and PSO is one of the AI techniques that can search for a good solution efficiently.
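The figure of 83,522 candidate structures can be checked with a simple recurrence: a CU is either left unsplit (one pattern) or split into four children that are each partitioned independently. The sketch below is one way to count the patterns under that reading; the function name is illustrative, and the recurrence is offered as a consistency check rather than as the formula used by the authors.

```python
def count_partition_patterns(max_depth):
    """Number of distinct quadtree split patterns for one CTU.

    patterns(0) = 1                      (no split allowed)
    patterns(d) = patterns(d - 1)**4 + 1 (split into 4 subtrees, or not)
    """
    patterns = 1
    for _ in range(max_depth):
        patterns = patterns ** 4 + 1
    return patterns


for depth in (1, 2, 3):
    print(depth, count_partition_patterns(depth))
```

The count grows from 17 at depth 2 to 83,522 at depth 3, which illustrates why exhaustive search quickly becomes expensive as the allowed depth increases.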
The major objectives of this text are:
• To design a reduced-complexity intra-frame predictor using soft computing
• To optimize the time complexity of this predictor using cellular automata rules
• To integrate the predictor, with its reduced time and computational complexity, into the HEVC encoding and decoding process.
The next section describes various techniques for improving HEVC performance followed by the proposed predictors. This text concludes with some interesting observations about our results and some recommendations that can be researched to further optimize the prediction performance.

DUAL TREE COMPLEX WAVELET TRANSFORM
The dual-tree discrete wavelet transform (DT-DWT) is an advanced design of the discrete wavelet transform (DWT). Unlike the DWT, the DT-DWT offers better shift invariance and directional selectivity [21,22]. The DT-DWT is also known as a complex transform, since it includes the real and imaginary parts of six oriented wavelet coefficients. Figure 1 shows the filter bank tree structure of the DT-DWT. As shown in the figure, the top tree R generates the real parts of the DT-DWT coefficients and the bottom tree I generates the imaginary parts. Here * denotes a convolution operation and ↓2 denotes downsampling by 2. lfr and hfr are the low-pass and high-pass filters, which form a Hilbert transform pair to ensure the perfect reconstruction of the discrete wavelet transform. With the filter bank tree structure, the following wavelet sub-bands, oriented at ±75°, ±15° and ±45°, are produced.
From the above equations, φ(x) and φ(y) represent the low-pass filter along the first and second dimensions. Similarly, ψ(x) and ψ(y) represent the high-pass filter along the first and second dimensions. The LH and HL sub-bands are oriented in the vertical and horizontal directions, respectively. The HH sub-band is simultaneously oriented along the +45° and −45° diagonal directions; its two components are denoted HHp (positive orientation) and HHn (negative orientation). Similarly, the two LH sub-bands and two HL sub-bands are denoted LHp, LHn, HLp and HLn. With these six oriented sub-bands, the best mode is estimated among the 35 intra prediction modes.

RELATED WORK
HEVC achieves a bit rate saving of almost 50% at the same visual quality compared to H.264/AVC. Thus, HEVC has become a popular video codec. Successive standards such as H.264 and H.265 (HEVC) are iterations that improve coding efficiency by adding computationally optimized algorithms to inter-frame and intra-frame prediction. Researchers from different fields, including video processing specialists, mathematicians, and signal processing experts, to name a few, have ventured into this field in order to further optimize the efficiency of HEVC processing. For instance, the work in [2] adds watermarking capabilities to HEVC by matrix encoding in the DCT (discrete cosine transform) block of HEVC. That paper achieves data hiding with minimum distortion in the output video. This indicates that HEVC has some inherent redundancies which can be reduced in order to further optimize the video encoding/decoding performance. These redundancies are in terms of inter-frame and intra-frame prediction coefficient values. This basic property serves as the motivation for the work in this paper. A motion density-based scheme with unequal error protection (UEP) is proposed in [3], wherein it is seen that motion density schemes outperform the existing inter-frame and intra-frame prediction schemes of HEVC. This performance is evaluated in terms of the capability of the algorithm to find important frames in the input video. A higher value indicates better performance for the system. The approach proposed in [3] outperforms other coding unit (CU) based strategies by more than 10%. This approach can be used to identify the best quality frames, or key-frames. These key-frames serve as the baseline for inter and intra prediction in HEVC. Another approach similar to [2], but directed towards the H.265 codec, is proposed in [4], wherein the synchronization error is reduced after two-stage re-compression in the H.265 codec.
This approach uses spatial texture analysis to find the most suitable embedding blocks. These blocks are then used in representation mode in order to find the best pixels for watermarking. The identified pixels contain some level of redundancy, and thus can be reduced (compressed) without any significant loss in video quality. This can improve the frame rate and the efficiency of the HEVC system when operating in the H.265 mode. It is seen that the proposed algorithm performs well in the presence of noise, and thereby can be used for further redundancy reduction in HEVC.
The work in [5] is inspired by the approaches in [2,4], and uses the concept of just noticeable distortion (JND). A good quality video can be encoded and decoded using the JND concept. The identified JND points in the video frames can reduce the size of HEVC data by more than 13% on average, and up to 39% for certain video sequences. The mean opinion score (MOS) was evaluated for different videos, and it is observed that the approach in [5] has similar performance to the original HEVC algorithm in terms of visual quality, but produces a smaller compressed video. A similar work is proposed in [6], wherein the concept of the classical secretary problem (CSP) is used in the rough-mode-decision module of HEVC. Moreover, the CSP is modified using a dynamic stopping criterion that further enhances the performance by reducing the encoding delay and marginally improving the bit-rate performance. It uses the concept of mode reduction with the help of redundancy evaluation. A similar concept is proposed in [2], [4] and [5], and it is also the basis for this research.
HEVC can be extended to 3D videos. The concept of a fast depth map for intra-mode selection in 3D videos is given in [7], wherein the depth is analysed from the different dimensions of the 3D video. This depth map is used for prediction of intra-mode redundancies, and finally a compressed video is obtained. Various depth modelling modes are proposed in [7]; some of them also use tensor features for homogeneity detection. Due to the use of depth maps, there is a large reduction in encoding delay, which further improves the encoding and decoding performance for 3D videos. The approach in [7] can further utilize deep learning methods such as the deep neural networks proposed in [8] to optimize its performance. It is observed from [8] that deep network models like convolutional neural networks (CNNs) can be trained with different videos to identify the redundancies in them. The trained model can then be applied to new videos to reduce their redundancies with minimal computational complexity and improved bandwidth. The authors further observe that specialized models like IPCNN can be trained specifically to reduce the intra-frame redundancies in order to optimize the quality of service (QoS) for HEVC. An approach that can be facilitated by CNNs is proposed in [9], wherein metrics like rate distortion are evaluated to reduce the complexity of the encoding and decoding process. The authors use texture homogeneity between inter-frames and spatio-temporal correlation between intra-frames in order to reduce the encoding time by more than 70% compared to normal HEVC. Though the results seem promising, researchers are advised to perform due diligence before using this research in their applications. The work proposes the development of a fast coding unit and a fast prediction unit in order to improve the efficiency of the HEVC system. While most of these research models improve the performance of lossy HEVC, the work in [10] addresses lossless HEVC using context-based angular and planar intra predictions.
It also applies redundancy reduction to HEVC videos by identifying redundant edges, textures, colours, and other parameters between neighbouring pixels. It uses pixel-level processing for edge and texture redundancy optimization without increasing the computational complexity. Because only redundant edges and textures are removed, the resulting video is completely lossless. It can achieve a performance improvement of up to 10% when compared with other standard HEVC models.
Another 3D video optimization algorithm is described in [11], which uses dynamically configurable depth maps similar to [7]. In [11] the depth maps are not generated using tensors or hyper-planes; instead, the concept of Rough Mode Decision (RMD) is used. It is known that RMD is inherited from the texture maps rather than the depth maps. RMD affects the block distortion and the rate distortion of HEVC, and thus can be used for better HEVC performance. The proposed work achieves a 0.1% improvement in Bjøntegaard delta rate (BD-rate), which indicates that the compression performance is high when compared to normal HEVC encoding. A method similar to [11] is given in [12], wherein techniques like bipartition modes, intra-picture skip, and DC-only are used to optimise depth map processing. That work indicates that depth map processing to identify redundancies using these approaches can reduce the encoding delay by more than 20%. The authors also propose that reductions in texture and depth can be combined to further improve HEVC performance.
A fast and adaptive mode decision algorithm for HEVC is presented in [13], which uses coding unit partitioning for early termination of the intra-prediction process. The work in [13] forms the basis of our work; this paper also utilizes a mode reduction technique similar to [13] for better HEVC performance. The authors reduce the number of modes from 35 to 11, at a Bjøntegaard delta rate change of 1.7%, while reducing the average delay by more than 50%. This gives a large gain in overall video coding performance. The work in [13] also utilizes CU partitioning based on the number of coding bits, which further helps in improving the system performance. This work can be further improved by adding RD cost as a measure for mode reduction, as proposed in [14]. RD cost can be an early prediction metric for reducing the number of intra modes in HEVC. Due to the inclusion of RD cost in the evaluation of mode reduction, a performance improvement of more than 25% can be expected when compared to the usual HEVC system, which can be further improved by adding machine learning mechanisms like the one proposed in [15] for adaptive CU size decisions. The work in [15] proposes the use of complexity classification for training the machine learning model. This complexity classification method uses parameters like CU size, CU partitions and rate distortion to train a support vector machine (SVM). The SVM solves a 2-class classification problem, classifying each intra-frame as required or non-required. All the non-required frames are dropped, and the compressed video is finally obtained with minimal complexity. The proposed ML algorithm reduces the complexity by more than 60%, thereby speeding up the entire process of HEVC compression and decompression. The approach in [15] can be further modified using the techniques mentioned in [16].
From the review done in [16], we can observe that dynamic support vector machines (DSVMs), which can be destroyed and recreated for every inter-frame and intra-frame model prediction, are the best option for HEVC encoding. These models must be integrated with existing HEVC approaches to further improve their efficiency. The SVM models can also be used as the final flat layer for the CNN models described in [17]. This replacement can enhance the performance of the existing CNN models by more than 20%, and also reduce the complexity of processing HEVC videos. Moreover, the CNNs can be replaced by deep CNNs, as proposed in [18], to further reduce the intra-mode redundancies. These redundancies are easily analysed by deep CNNs, and thereby can be further reduced with the help of models like GoogLeNet or VGGNet. A combination of layers such as convolutional, ReLU, convolutional, ReLU, convolutional, max pooling, fully connected, ReLU and finally fully connected can be used for better prediction performance. Another experimental work is described in [19], wherein a sum of absolute differences (SAD) unit is proposed to compress ultra HD 8K videos. This can be used as future work for deep CNN models. The next section describes our proposed PSO-based approach for HEVC processing.

PROPOSED PSO COMBINED WITH CELLULAR AUTOMATA MODEL FOR INTRA FRAME PREDICTION
PSO is applied inside the intra-frame prediction process in order to optimize the PSNR at the decoding side. This model requires a certain amount of delay for the first one or two searches, but this is compensated as more frames are processed. Due to this self-learning nature of the algorithm, it can be integrated inside the intra-frame prediction block of HEVC. The CA technique further optimizes the performance of the existing PSO. It does so by reducing the randomized search space of the PSO via CA rules. CA is applied to PSO with the help of the following rules.
• Let the structure for the 64 × 64 CTU block be defined as follows,
Figure 3. Block division process
• Here 'a' is the main block, 'b0 … b3' are the divided blocks, 'c0 … c15' are the subdivided blocks, and finally 'd0 … d63' are the 64 smallest (8 × 8) blocks of the CTU
• Let us call this combination a particle, and in each solution generate random particles for operation
• Considering the 3-level depth as shown in the figure, the particle will have 21 bits, as follows,
P = a0, b0, b1, b2, b3, c0, c1, c2, …, c15
where,
a0 = 0 when the CU is not split, else a0 = 1
bi = 0 when a0 = 0 or the CU is not split, else bi = 1
ci = 0 when a0 = 0, the corresponding b = 0, or the CU is not split, else ci = 1
• Here a, b, and c represent the splitting decisions for depth 0, depth 1, and depth 2, respectively
• The possible values for a are 0 (non-splitting) and 1 (splitting). The possible values for b are null if a is 0, and otherwise 0 (non-splitting) or 1 (splitting). The possible values for c are null if its corresponding parent b is 0, and otherwise 0 (non-splitting) or 1 (splitting).
It should be noted that the proposed data structure is composed of a group of dependent genes. Therefore, the total number of possible CU partitioning patterns P is calculated as a function of d, where d ∈ {1, 2, 3} is the maximum CU depth and mod is the modulo operation for finding the remainder.
If the maximum CU depth is 2 or 3, the total number of possible partitioning patterns is only 17 (as shown in the following figure) or 83,522, respectively, even though there are five genes or 21 genes to represent the CU partitioning pattern of a 64 × 64 CTU.
o If the fitness value is better than the best fitness value (pBest) in history, then set the current value as the new pBest
o Choose the particle with the best fitness value of all the particles as the gBest
o For each particle follow the given steps,
  • Evaluate the velocity of the particle using the following equation,
    v = v + C1 * random(CA_LIMIT_PBEST) * (pBest - currentFitness) + C2 * random(CA_LIMIT_GBEST) * (gBest - currentFitness)
  • Update the position of the particle using the following equation,
    presentParticle = presentParticle + v
  • At the end of the last iteration, use the particle with the gBest fitness value as the intra-frame prediction particle.
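The particle encoding and the search loop can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: `random(CA_LIMIT_...)` is read as a uniform draw between 0 and the CA limit; the additive position update is replaced by the standard binary PSO rule (a sigmoid of the velocity gives the probability of a bit being 1), since adding a real-valued velocity to a bit is not well defined; and `toy_rd_cost` is a hypothetical stand-in for the encoder's RD cost. All function and parameter names are illustrative.

```python
import math
import random

N_GENES = 21  # a0, b0..b3, c0..c15 (split flags for depths 0, 1, 2)


def repair(bits):
    """Zero every gene whose parent CU is not split (dependent genes)."""
    p = list(bits)
    if p[0] == 0:
        return [0] * N_GENES
    for i in range(4):                      # b_i is stored at index 1 + i
        if p[1 + i] == 0:
            for j in range(4):              # its children c at 5 + 4i + j
                p[5 + 4 * i + j] = 0
    return p


def leaf_sizes(p):
    """Decode a repaired particle into the sizes of its leaf CUs."""
    if p[0] == 0:
        return [64]
    sizes = []
    for i in range(4):
        if p[1 + i] == 0:
            sizes.append(32)
            continue
        for j in range(4):
            sizes.extend([8] * 4 if p[5 + 4 * i + j] else [16])
    return sizes


# Hypothetical fitness: distance to an arbitrary target pattern.
TARGET = repair([1, 1, 0, 0, 1] + [1, 0, 0, 0] + [0] * 8 + [0, 0, 1, 1])


def toy_rd_cost(p):
    """Stand-in for the RD cost; lower is better."""
    return sum(a != b for a, b in zip(p, TARGET))


def pso(cost, n_particles=20, n_iter=50, c1=2.0, c2=2.0,
        ca_limit_pbest=1.0, ca_limit_gbest=1.0, seed=0):
    rng = random.Random(seed)
    pos = [repair([rng.randint(0, 1) for _ in range(N_GENES)])
           for _ in range(n_particles)]
    vel = [[0.0] * N_GENES for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(N_GENES):
                # Random factors are capped by the (assumed) CA limits.
                r1 = rng.uniform(0, ca_limit_pbest)
                r2 = rng.uniform(0, ca_limit_gbest)
                v = (vel[k][d] + c1 * r1 * (pbest[k][d] - pos[k][d])
                     + c2 * r2 * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-4.0, min(4.0, v))
                prob_one = 1.0 / (1.0 + math.exp(-vel[k][d]))
                pos[k][d] = 1 if rng.random() < prob_one else 0
            pos[k] = repair(pos[k])
            c = cost(pos[k])
            if c < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[k][:], c
    return gbest, gbest_cost


best, best_cost = pso(toy_rd_cost)
print(best_cost, leaf_sizes(best))
```

The `repair` step is what keeps the search inside the valid partitioning patterns instead of all 2^21 bit strings; lowering the two CA limits shrinks the random step applied towards pBest and gBest, which is one way to read the search-space reduction described above.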
The best particle is placed in the output stream of HEVC as the encoded block. Once the particle selection is done, a dual-tree complex wavelet transform block is used in order to reduce the number of modes of the system from 35 to 8. Initially, the Dual Tree Discrete Wavelet Transform (DT-DWT) [25] is applied to the optimum block selected by the PSO algorithm. By applying this transform, six oriented wavelet sub-bands are generated. Among the sub-bands, the two LH and two HL sub-bands are used to find the direction or angle of the texture in a block. The polarity of the texture angle is estimated with the two HH sub-bands. With the direction and the angle of the texture, a mode is determined that is close to the actual best mode. For best mode selection, the four modes around this determined mode (Modedeter), together with the DC and planar modes, are considered as the final candidate list. This candidate list is forwarded to the Rate Distortion Optimization (RDO) process. With the RDO, the mode with the minimum rate distortion cost is selected as the best mode (Modebest). Based on the selected Modebest, the encoder encodes the video frames. The decoder decodes them, and the output of the decoder is analysed for performance.
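The candidate list construction and the final RDO choice can be sketched as follows. HEVC numbers its intra modes as 0 (planar), 1 (DC) and 2 to 34 (angular). The text does not spell out which modes count as being "around" Modedeter or whether Modedeter itself is included, so the offsets ±1 and ±2, the inclusion of Modedeter, and the clipping at the ends of the angular range are all assumptions of this sketch; the resulting list therefore need not match the eight-mode list of [25]. The `rd_cost` callable is a hypothetical stand-in for the encoder's RD cost evaluation.

```python
PLANAR, DC = 0, 1
ANGULAR_MIN, ANGULAR_MAX = 2, 34


def candidate_modes(mode_deter, offsets=(-2, -1, 1, 2)):
    """Planar, DC, the determined mode, and its angular neighbours.

    The offsets are an assumption; neighbours outside 2..34 are dropped.
    """
    if not ANGULAR_MIN <= mode_deter <= ANGULAR_MAX:
        raise ValueError("mode_deter must be an angular mode (2..34)")
    candidates = [PLANAR, DC, mode_deter]
    for offset in offsets:
        mode = mode_deter + offset
        if ANGULAR_MIN <= mode <= ANGULAR_MAX and mode not in candidates:
            candidates.append(mode)
    return candidates


def best_mode(candidates, rd_cost):
    """Pick the candidate with the minimum RD cost (Modebest)."""
    return min(candidates, key=rd_cost)


# Usage with a made-up cost that happens to favour mode 11.
modes = candidate_modes(10)
print(modes, best_mode(modes, lambda m: abs(m - 11)))
```

Evaluating the RD cost on a handful of candidates instead of all 35 modes is where the mode reduction saves encoding time.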
The flow diagram for the proposed system is shown in Figure 5. From the figure we can observe that the output of the PSO and CA system is given to the DT-CWT based HEVC encoder, thereby hybridizing the system with the existing high efficiency encoder. Moreover, the output of the decoder is used to evaluate PSNR and delay. These outputs and their analysis are presented in the next section.
Combining the proposed intra-frame prediction model with the dual-tree complex wavelet transform (DT-CWT) improves the overall effectiveness of the system. The combined model reduces the search space and also reduces the number of modes needed for encoding, thereby giving a two-level advantage to the system under test.
Figure 5. Overall flow of the system
Usually, rate distortion optimization (RDO) is done in the DT-CWT stage. In the PSO stage of this work, RDO is not performed; only the RD cost is evaluated. Readers should not confuse the two processes: PSO only uses the RD cost to select the best blocks for encoding/decoding, while the DT-CWT stage uses the RD cost for optimization with the help of mode reduction.

RESULT EVALUATION
In this paper, an HEVC intra prediction algorithm using PSO, as well as PSO with CA, is proposed.
The results are compared with [25], which uses the dual-tree complex wavelet transform for intra prediction. Simulation is carried out in Java using the NetBeans IDE. Delay and PSNR values were compared for different videos across HEVC DT-CWT [25], HEVC with PSO, and HEVC with PSO+CA. Table 1 shows the delay results obtained for the videos with these algorithms. The author of [25] proposes an approach that reduces the modes from 35 to 8 and then selects an optimum mode. We propose PSO with CA to reduce the time consumed in the decision-making process, followed by the dual-tree complex wavelet transform to reduce the computational complexity. Similarly, a comparison of PSNR for these algorithms was performed, and the results are tabulated in Table 2. From the results we can observe that the proposed algorithm reduces the delay and improves the PSNR of the existing dual-tree based HEVC system. We also evaluated the average values of PSNR and delay for the algorithms, and observed the following results. The proposed algorithms are further compared with state-of-the-art algorithms in Table 4 for different test video sequences. From the comparison table, it is clear that both proposed algorithms, HEVC processing using PSO and using PSO with CA, provide better video quality and a reduction in time complexity.
Table 4. Comparison among state-of-the-art algorithms
As shown in Figure 6, the proposed PSO and PSO with CA provide almost the same video quality, which is better than that of the existing algorithm using the dual-tree complex wavelet transform. As shown in Figure 7, the proposed PSO with CA algorithm provides the greater reduction in time complexity. The conclusion and some interesting observations from these results are presented in the next section.
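For reference, PSNR between an original and a decoded frame is conventionally computed from the mean squared error as 10·log10(MAX² / MSE). The sketch below assumes 8-bit samples (MAX = 255) passed as flat sequences of pixel values; the helper names are illustrative, and the text does not state how the reported averages and percentages were derived.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB for two equal-length sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


def percent_change(baseline, proposed):
    """Relative change of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline
```

A higher PSNR means the decoded frame is closer to the original; `percent_change` gives a positive value for a PSNR gain and a negative value for a delay reduction.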

CONCLUSION AND FUTURE WORK
The increased computational complexity of HEVC is a major problem for power-constrained devices and real-time applications, especially with high-resolution videos. Therefore, it is highly desirable to optimize the encoding process to reduce computational complexity while maintaining the coding efficiency of HEVC. A fast intra prediction algorithm using PSO with CA is proposed in this paper. Experiments are conducted on various test videos, and the results are evaluated based on encoding time and peak signal-to-noise ratio (PSNR). The proposed PSO+CA based HEVC is faster than the existing HEVC algorithm, reducing the overall delay by 40%. It also outperforms the existing method by 12% in terms of PSNR. The results of the comparative experiments demonstrate that the proposed algorithm can effectively reduce the computational complexity of the HEVC encoder while maintaining good video quality.
These advantages stem from the extensive intra-frame prediction phase, wherein most of the mapping and calculations are done. Another reason for the large gain in performance is the lightweight execution phase.
As the literature suggests, there are many other avenues to explore in CU early termination, mode reduction and fast intra prediction. In future, many of these methods can be combined or, if needed, one method may be replaced by a new one, and the resulting gains in encoding time can be explored. Convolutional neural network models and SVM-based machine learning approaches can also be applied in order to reduce time complexity. Algorithms similar to the proposed intra prediction algorithm can be developed for fast inter prediction, resulting in less coding time and reduced complexity. Future research can also reduce the computational complexity of the quadtree structure: the division of CTUs into CUs and PUs, for both intra and inter coding, can be improved to obtain a much larger reduction in encoding time together with better bit rate and PSNR. The aim should be to reduce the overall complexity of the HEVC encoder so that it suits hand-held devices as well as transmission with limited computing resources.
In future, this work can be further improved by evaluating its performance on higher bit rate videos. These videos are more complex to map, and thus multiple pre-execution steps might be needed before the required level of efficiency is achieved. Moreover, in order to optimize the performance further, researchers can use quantum computing for processing, develop quantum computational layers, evaluate their performance, and apply them to the proposed approach.