Prediction mode based reference line synthesis for intra prediction of video coding

Intra prediction is a significant coding tool that allows a high level of video compression to be achieved in the current state-of-the-art video coding standard, High Efficiency Video Coding (HEVC), and in the joint exploration model (JEM) developed by the Joint Video Exploration Team (JVET) of ITU-T VCEG and ISO/IEC MPEG for the next generation video coding standard. In intra prediction, the top and left lines adjacent to the current coding block in the neighboring reconstructed blocks are selected as the reference lines. However, it has been observed that the adjacent reference line might not always provide optimal prediction due to quantization noise and object occlusions caused by straight lines. In this paper, we propose the synthesis of another reference line by integrating multiple lines in the neighboring reconstructed blocks based on the prediction mode. The synthesized line and the reconstructed adjacent line compete in the rate distortion optimization process, where the line that yields the minimum cost is finally selected. The proposed method is implemented on top of JEM 3.0, and the experimental results show that luma BD rate gains of −0.29% (average) and −1.15% (maximum) in the all intra condition, and of −0.15% (average) and −0.59% (maximum) in the random access condition, can be achieved over all the test sequences.


Introduction
Intra prediction is a key technique for achieving high compression performance by reducing spatial redundancy among samples. It is a significant coding tool in both the HEVC test model (HM) for the current state-of-the-art video coding standard, High Efficiency Video Coding (HEVC) [1], and the joint exploration model (JEM) [2] developed on top of HM16.6 by the Joint Video Exploration Team (JVET) [3] of ITU-T VCEG and ISO/IEC MPEG for the next generation video coding standard. In HEVC intra prediction [4], there are three main steps: reference sample generation, target sample prediction, and the filtering process. The residual between the target sample and the corresponding predicted sample is then transformed, quantized and further coded into the bitstream. Intra pictures account for a large portion of the bitstream, and, because they serve as references for the following inter pictures, their quality also affects the coding performance of those inter pictures. Thus, improvement of intra prediction contributes to coding performance for both still pictures and motion pictures in video sequences.
The intra prediction structure in JEM is inherited and modified from the one in HM. Besides keeping the DC mode and the planar mode, the number of angular intra modes in JEM has been extended from 33 to 65, and the denser prediction angles bring about higher prediction accuracy. Other modifications in the intra prediction of JEM can be found in the algorithm description document [6]. However, there is no change between HM and JEM in the selection of the reference line for intra prediction. The assumption of intra prediction is that the prediction accuracy is inversely proportional to the spatial distance between the current sample and the reference sample. However, even if the nearest reference line has the strongest spatial correlation to the samples in the current block, this reference line lies at the farthest position from its own reference line in the neighboring block. In other words, when the quantization step is large, the quantization error of the residual becomes large, and hence the adjacent reference line has probably suffered the most degradation due to the large quantization step in the previous prediction process. Moreover, when there is a thin object, e.g. a stick or a straight line, occluding the background, a more appropriate partitioning structure can be obtained if another synthesized line is available as the reference line. Therefore, in this paper, we propose to synthesize another reference line by integrating multiple lines in the reconstructed neighboring blocks based on the prediction mode. The synthesized reference line and the adjacent reference line compete in the rate distortion optimization (RDO) process, and the line that yields the minimum RDO cost is finally selected. Experimental results show that, on average, luma BD rate gains of -0.29% and -0.15% can be achieved in all intra and random access conditions, respectively, and the maximum gain reaches -1.15% and -0.59% in all intra and random access conditions, respectively.
The remainder of the paper is organized as follows. Related work is introduced in the next section, and the proposed method is described in section 3, followed by a description of the experimental results in section 4. Finally, we briefly conclude this paper in the last section.

Related work
Since the release of HEVC, there have been a number of studies on the improvement of intra prediction. Y. Chen et al. viewed the image signal as a 2D non-separable Markov model and proposed the use of three-tap extrapolation filters as a replacement for the pixel-copying prediction mode [7]. However, this kind of recursive extrapolation approach reduced the parallelism of the coding process. Later, F. Kamisli proposed the adoption of a 2D Markov process for both the intra prediction and the transform step in order to achieve improved coding gain [8]. Furthermore, a sparse coding scheme was proposed by L.F. Lucas et al. to generate the prediction using sparse linear predictors, with geometric transformation also considered in the generation of the prediction [9]. The schemes mentioned above challenged the traditional structure of intra prediction, and explored the statistical, geometric, and sparse characteristics of image content in a quest for more accurate prediction. However, a major concern is the difficulty of fully supporting parallel computing, which has become very popular in hardware implementation.
On the other hand, there are also several studies on intra prediction that follow the structure in HEVC. X. Chao et al. proposed a short distance intra coding method that splits a coding unit (CU) into non-square prediction units (PUs), so that the shortened spatial distance between the current sample and the reference sample brings higher prediction accuracy [10]. The idea has been absorbed into a new partition structure in JEM, called quad tree plus binary tree (QTBT) [6]. In addition, X. Qi et al. proposed an intra prediction method based on inpainting algorithms and vector prediction [11]. Based on the assumption that repetitive or similar patches may exist over a natural image, a vector is generated by using template matching in a reconstructed search region on the top and left of the current block. However, the reconstruction of a search region leads to greater memory access in the implementation. Moreover, J. Li et al. proposed trying multiple reference lines in the RDO process and selecting the best one [12]. However, each line is simply tried as the reference line without any further processing. Furthermore, E. Wige et al. proposed a pixel-wise prediction based on original samples for lossless coding [13]. The causal samples of the current sample are reconstructed and grouped into several patterns as the reference in prediction. However, the dependency between samples reduces the parallelism of the entire prediction.
X. Chao, X. Qi, J. Li, and E. Wige et al. explored different reference samples or reference lines in intra prediction [10]-[13]. Our proposed method shares a similar intention but differs substantially in its essence. We propose to further synthesize a reference line to provide an alternative when the nearest reference line fails to provide a good prediction. In addition, the angular information is utilized to robustly synthesize the reference line. As far as we know, this is the first time a reference line for intra prediction has been synthesized, and the addition of the synthesized reference line brings average luma BD rate gains of -0.29% and -0.15% in all intra and random access conditions, respectively.

Overview
An illustration of the intra prediction modes in JEM is shown in Figure 1 (a), where red dotted arrows represent the added finer prediction angles. A two-step intra mode candidate construction for RDO is performed. In the first step, the coarse 35 modes are checked based on the sum of absolute transformed differences (SATD) of the prediction residual to select the best modes. In the second step, the neighboring finer angles of the selected modes are further checked by SATD to update the best modes. Next, the first most probable modes (MPMs) that have been derived from the top and left neighboring PUs are combined with the selected modes to form the candidate list of intra prediction modes in the RDO process. In addition, as shown in Figure 1 (b), the reference line selection in both HM and JEM is the same.
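The two-step candidate construction described above can be sketched as follows. This is a minimal illustration rather than the JEM implementation: the function name, the `satd` callback, the `keep` parameter, and the assumption that the 35 coarse modes are Planar (0), DC (1), and the even-indexed angular modes 2, 4, ..., 66 are all hypothetical choices made for the example.

```python
from typing import Callable, Iterable, List


def build_candidate_list(satd: Callable[[int], float],
                         mpms: Iterable[int],
                         keep: int = 3) -> List[int]:
    """Coarse-to-fine intra mode candidate construction (illustrative sketch)."""
    # Step 1: check the 35 coarse modes. Assumption: Planar, DC, and every
    # second angular index of the 65-angle set.
    coarse = [0, 1] + list(range(2, 67, 2))
    best = sorted(coarse, key=lambda m: (satd(m), m))[:keep]

    # Step 2: check the finer angular neighbours of the selected modes
    # and update the list of best modes.
    pool = set(best)
    for m in best:
        if m >= 2:  # only angular modes have finer neighbours
            pool.update(n for n in (m - 1, m + 1) if 2 <= n <= 66)
    best = sorted(pool, key=lambda m: (satd(m), m))[:keep]

    # Step 3: merge the most probable modes from the neighbouring PUs.
    for m in mpms:
        if m not in best:
            best.append(m)
    return best


if __name__ == "__main__":
    # Toy cost: pretend mode 37 (a finer angle) fits the block best.
    print(build_candidate_list(lambda m: abs(m - 37), mpms=[0, 36]))
```

In this toy run the finer mode 37 is not in the coarse set and only enters the list through the second step, which is the purpose of the refinement.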
However, when there is a thin object, e.g. a stick or a straight line, occluding the background or there is large quantization error in the adjacent reference line, a more appropriate partitioning structure can be obtained if another synthesized line is available as the reference line.
Therefore, in order to enhance the robustness and accuracy of intra prediction, we propose to synthesize another reference line by using multiple lines in the neighboring block based on the prediction mode. The basic flow of the proposed method is shown in Figure 2, where the bold boxes indicate the main parts of our proposal. The synthesized reference line is checked in the RDO process after the construction of the candidate list of intra prediction modes, and a flag at the prediction unit (PU) level is added to the bitstream to indicate to the decoder which reference line is finally adopted in the prediction. Please note that the proposed method is applied only to the luma component.
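The competition between the two reference lines can be pictured with the short sketch below. It is an illustration under stated assumptions, not the encoder's actual RDO loop: `rd_cost` is a hypothetical callback that returns the rate distortion cost of coding the PU with a given mode and reference line (assumed to already include the bits of the PU-level flag), and the tie-breaking rule in favour of the adjacent line is also an assumption.

```python
from typing import Callable, Iterable, Tuple


def choose_mode_and_line(candidates: Iterable[int],
                         rd_cost: Callable[[int, bool], float]
                         ) -> Tuple[int, bool, float]:
    """Return (mode, use_synthesized_line, cost) with the minimum RD cost."""
    best = None
    for mode in candidates:
        # Try the adjacent line first, then the synthesized line.
        for use_synth in (False, True):
            cost = rd_cost(mode, use_synth)
            # Strict '<' keeps the adjacent line when the costs are equal.
            if best is None or cost < best[2]:
                best = (mode, use_synth, cost)
    if best is None:
        raise ValueError("candidate list is empty")
    return best


if __name__ == "__main__":
    # Made-up costs, purely to exercise the selection logic.
    toy = {(18, False): 10.0, (18, True): 8.5,
           (50, False): 9.0, (50, True): 9.0}
    mode, flag, cost = choose_mode_and_line([18, 50], lambda m, s: toy[(m, s)])
    print(mode, flag, cost)  # the flag is what would be signalled per PU
```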

Prediction Mode-based Reference Line Synthesis
Multiple lines in the neighboring block are used in the reference line synthesis, and the number of lines is limited to four in order to reduce memory access. Another reason for using four lines is that the correlation between the reference lines and the current coding block drops drastically if more distant lines are utilized.
Let $S_T$ and $S_L$ denote the top and the left synthesized reference lines of the current PU, respectively, and let $m$ denote the prediction mode. We define three categories of prediction mode: horizontal mode ($1 < m \le 34$), vertical mode ($34 < m \le 66$), and non-angular mode ($m = 0$, Planar, or $m = 1$, DC). Given a prediction mode $m$, we first check its category. As shown in Figure 3, if $m$ belongs to the horizontal mode, the synthesis of $S_L$ follows the direction of $m$ and the synthesis of $S_T$ follows the direction of $m = 50$. Symmetrically, if $m$ belongs to the vertical mode, only the synthesis of $S_T$ follows the direction of $m$, and $S_L$ is synthesized by following the direction of $m = 18$. Moreover, if $m$ is a non-angular mode, the left and top reference lines are synthesized by following the directions of $m = 18$ and $m = 50$, respectively.
Next, the specific synthesis method is explained. Let $S_T[x]$ and $S_L[y]$ denote the samples of the synthesized top and left lines, and let $R_T^k[x]$ and $R_L^k[y]$ denote the samples of the $k$-th reconstructed top and left lines, where $k = 0$ is the line adjacent to the current block. Each sample in the synthesized line is calculated by averaging the four corresponding samples in the four reference lines, as shown in Figure 3. Therefore, the synthesis of the top reference line, $S_T$, is written as

$$S_T[x] = \frac{1}{4}\sum_{k=0}^{3} R_T^k[x], \tag{2}$$

where $x \in [0, W-1]$ and $W$ is the length of the top reference line. Similarly, the left reference line, $S_L$, can be synthesized as

$$S_L[y] = \frac{1}{4}\sum_{k=0}^{3} R_L^k[y], \tag{3}$$

where $y \in [0, H-1]$ and $H$ is the length of the left reference line.
Since the prediction direction is available before the reference line is synthesized, it is important to synthesize the reference line along that direction; otherwise, uncorrelated samples are involved in the synthesis, resulting in larger prediction errors. In order to calculate an appropriate shift for each reference line, we propose to reuse the modified sample prediction scheme. As shown in Figure 4, a displacement value $d \in [-32, 32]$ for each reference line is obtained at 1/32 pixel accuracy from a look-up table according to the given prediction mode $m$, and $d$ is further converted to a per-line sample offset $\delta_k$ by using

$$\delta_k = \left\lfloor \frac{k \cdot d}{32} \right\rfloor, \tag{4}$$

where $k \in [0, 3]$ is the distance from the reference line to the current block. Consequently, when the prediction direction is considered in the reference line synthesis, eq. (2) and eq. (3) are modified as

$$S_T[x] = \frac{1}{4}\sum_{k=0}^{3} R_T^k[x + \delta_k], \tag{5}$$

$$S_L[y] = \frac{1}{4}\sum_{k=0}^{3} R_L^k[y + \delta_k]. \tag{6}$$

Please note that eq. (4) outputs an integer offset according to the displacement value $d$ and the distance $k$. A fractional linear interpolation between two neighboring samples was also tested but was not adopted, because the interpolation increases the computational complexity in both the encoder and the decoder with negligible improvement in coding gain.
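The mode categorization and the direction-aware averaging described in this section can be sketched in a few lines. This is a minimal sketch under several assumptions that are not stated in the text: the function names are hypothetical, the displacement is passed in directly instead of being read from the look-up table, the integer offset uses a floor (arithmetic right shift by 5), the average is rounded to the nearest integer, and each stored line carries `pad` extra samples on both sides so that shifted reads stay in range.

```python
from typing import List, Sequence, Tuple

HOR_IDX = 18  # pure horizontal direction named in the text
VER_IDX = 50  # pure vertical direction named in the text


def line_directions(mode: int) -> Tuple[int, int]:
    """Return (left_direction, top_direction) used for the synthesis."""
    if 1 < mode <= 34:        # horizontal category: left line follows the mode
        return mode, VER_IDX
    if 34 < mode <= 66:       # vertical category: top line follows the mode
        return HOR_IDX, mode
    return HOR_IDX, VER_IDX   # Planar (0) or DC (1)


def sample_offset(d: int, k: int) -> int:
    """Integer offset of the line at distance k for displacement d (1/32 units).

    Assumption: floor rounding via an arithmetic right shift.
    """
    return (k * d) >> 5


def synthesize_line(lines: Sequence[Sequence[int]], d: int,
                    length: int, pad: int = 3) -> List[int]:
    """Average four reference lines along the direction given by d.

    lines[k][pad + i] is sample i of the line at distance k (k = 0 is adjacent).
    """
    out = []
    for i in range(length):
        total = sum(lines[k][pad + i + sample_offset(d, k)] for k in range(4))
        out.append((total + 2) >> 2)  # assumed rounding to nearest
    return out


if __name__ == "__main__":
    pad, length = 3, 8
    # Toy data: an edge that moves one sample per line (a 45-degree edge).
    lines = [[100 if j >= pad + 4 + k else 0 for j in range(length + 2 * pad)]
             for k in range(4)]
    print(synthesize_line(lines, d=32, length=length))  # follows the edge
    print(synthesize_line(lines, d=0, length=length))   # ignores the direction
```

In the toy data, averaging along the edge direction keeps the edge sharp, whereas averaging without the shift smears it over four samples, which illustrates why the synthesis follows the prediction direction.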

Experimental Result
The proposed method is implemented on top of JEM 3.0. Following the common test condition [14] in JVET, the proposed method is tested on all 24 sequences (class A1: 4096x2160; class A2: 3840x2160; class B: 1920x1080; class C: 832x480; class D: 416x240; class E: 1280x720) in all intra and random access conditions. The summary of the BD rate [15] and runtime is shown in Table 1 and Table 2. Table 1 shows the results in the all intra condition, where an average luma BD rate gain of -0.29% and a maximum of -1.15% are achieved with an 87% average runtime increase on the encoder side. According to the results in the random access condition shown in Table 2, an average luma BD rate gain of -0.15% and a maximum of -0.59% are achieved with a 19% average runtime increase on the encoder side. In the all intra condition, there is a 1% decoder runtime increase, and there is no decoder runtime increase in the random access condition. From the two tables, it can be seen that 'TrafficFlow' in class A2 and 'BQTerrace' in class B are the sequences most improved by the proposed method. The main reason is that both sequences contain many thin, straight lines as textures, and hence the synthesized reference line can provide a better prediction than the adjacent reference line. Furthermore, there is almost no improvement for the sequence 'ToddlerFountain' in class A1 because of its complicated content, in which sprayed water splashes in various directions.
Finally, a subjective comparison is made between the original picture, the picture coded by JEM 3.0, and the picture coded by the proposed method. The 'TrafficFlow' sequence is selected and encoded at a QP value of 37 in the all intra condition. One picture is selected and a cropped part (at a resolution of 300x200) is compared in Figure 5, where it is observed that the visual quality is also improved by the proposed method.

Conclusion
This paper proposed the synthesis of a reference line using multiple lines in the neighboring blocks for intra prediction based on the prediction mode. The proposed method was implemented on top of JEM 3.0. Over all the test sequences, average luma BD rate gains of -0.29% and -0.15% were achieved in all intra and random access conditions, respectively, and the maximum gain reached -1.15% and -0.59% in all intra and random access conditions, respectively. In the near future, we plan to further improve the coding performance and to design fast algorithms for reducing the encoder runtime.