Temporal error concealment for fisheye video sequences based on equisolid re-projection

Wide-angle video sequences obtained by fisheye cameras exhibit characteristics that do not comply well with standard image and video processing techniques such as error concealment. This paper introduces a temporal error concealment technique designed for the inherent characteristics of equisolid fisheye video sequences by applying a re-projection into the equisolid domain after conducting part of the error concealment in the perspective domain. Combining this technique with conventional decoder motion vector estimation achieves average gains of 0.71 dB over pure decoder motion vector estimation for the test sequences used. Maximum gains reach 2.04 dB for selected frames.


INTRODUCTION
Video surveillance, automotive, and also outdoor applications often make use of very wide fields of view (FOV) of 180 degrees and beyond. To capture such ultra wide-angle video sequences with a single camera, fisheye lenses [1] based on projection functions quite different from the pinhole model are employed. Many applications require the immediate coding of the obtained fisheye videos using a block-based hybrid video codec [2,3], for instance. Subsequently transmitting the coded data from the camera to a receiver over error-prone channels may cause losses that the receiver side may want to conceal to reconstruct the visual quality to a certain degree.
Countless error concealment techniques can be found in the literature, classified into three categories, namely spatial, temporal, and spatio-temporal techniques. Spatial error concealment techniques [4] rely only on information available in the video frame to be reconstructed. Temporal error concealment approaches like decoder motion vector estimation (DMVE) [5] exploit correlations within the temporal neighborhood of the distorted frame. The third category comprises spatio-temporal error concealment techniques which try to suitably combine spatial and temporal approaches [6]. Further improvement can be achieved by adding post-processing steps like denoising [7]. In this paper, we consider temporal error concealment for fisheye video sequences. More specifically, we propose an adapted DMVE technique designed for equisolid fisheye data as depicted in Fig. 1. Block-matching error concealment techniques [8] are based on a translational motion model and are thus well suited to rectilinear video data. For fisheye videos, however, the translational model is not a suitable assumption as they do not comply with the pinhole model. This was partly investigated in [9], where it was shown that interframe video coding and, consequently, traditional motion estimation work better in the perspective domain. As this observation can be extended towards block-matching error concealment methods like DMVE, our equisolid temporal error concealment (E-TEC) technique is based on a transform into the perspective domain to exploit its better suitability to the translational motion model. Following the motion search in the perspective domain, E-TEC employs a re-projection into the equisolid domain, where the actual concealment is conducted. E-TEC thus adapts the motion estimation technique described in [10] for use in temporal error concealment.

DECODER MOTION VECTOR ESTIMATION
In the following, the principle of DMVE [5] is briefly outlined. Fig. 2 visualizes the approach. Given a video frame $s_\tau[m, n]$ at time $t = \tau$ containing a block loss, an error-concealed frame $\hat{s}_\tau[m, n]$ is obtained by:
$$\hat{s}_\tau[m, n] = \begin{cases} s_{\tau-1}[m + \Delta m,\, n + \Delta n] & \text{for } (m, n) \in \mathcal{L} \\ s_\tau[m, n] & \text{otherwise} \end{cases} \tag{1}$$
Here, $\mathcal{L}$ describes the area of the lost block and $(\Delta m, \Delta n)$ denotes the motion vector used for concealing this block. The optimum motion vector is selected from a set of motion vector candidates $(\Delta m_i, \Delta n_i)$ defined within a certain search range and based on an error criterion such as the sum of squared differences (SSD):
$$\mathrm{SSD}_i = \sum_{(m, n) \in \mathcal{D}} \left( s_\tau[m, n] - s_{\tau-1}[m + \Delta m_i,\, n + \Delta n_i] \right)^2 \tag{2}$$
$\mathcal{D}$ describes a decision area around the lost block, excluding the loss area $\mathcal{L}$. In Fig. 2, $\mathcal{D}$ comprises the area within the yellow block without the area of the red block. Minimizing $\mathrm{SSD}_i$ yields the motion vector to be used for concealing $\mathcal{L}$:
$$(\Delta m, \Delta n) = \operatorname*{argmin}_{(\Delta m_i, \Delta n_i)} \mathrm{SSD}_i \tag{3}$$
After obtaining the motion vector, the lost block can be substituted and thereby concealed by copying the corresponding block shifted by the motion vector from the reference frame $s_{\tau-1}[m, n]$ into the distorted frame as defined in (1). Just like conventional block-based motion estimation methods [11], DMVE relies on a translational motion model as this describes the predominant kind of motion in a typical video sequence. Since fisheye images are not based on a perspective projection function, they exhibit characteristics for which this model no longer holds true. We hence propose taking into account the projection function of fisheye images and introduce an adapted temporal error concealment method.
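The DMVE principle outlined above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation evaluated in this paper: the function name `dmve_conceal`, the list-of-rows frame layout, the sign convention of the motion vector, and the rule of skipping candidates that would read outside the reference frame are all assumptions made for the example.

```python
def dmve_conceal(cur, ref, top, left, block=16, border=8, search=4):
    """Conceal one lost block of `cur` by DMVE using reference frame `ref`.

    cur, ref  : 2-D luminance frames as lists of rows (equal size)
    top, left : upper-left corner of the lost block (loss area L)
    border    : width of the decision area D around L (D excludes L)
    search    : integer-pixel search range in every direction
    Assumes D lies completely inside the frame.
    """
    h, w = len(cur), len(cur[0])
    # Decision area D: ring of known pixels around the lost block.
    d_area = [(m, n)
              for m in range(top - border, top + block + border)
              for n in range(left - border, left + block + border)
              if not (top <= m < top + block and left <= n < left + block)]
    best, best_ssd = (0, 0), float("inf")
    for dm in range(-search, search + 1):
        for dn in range(-search, search + 1):
            # Assumption: skip candidates reading outside the reference frame.
            if not (0 <= top - border + dm
                    and top + block + border + dm <= h
                    and 0 <= left - border + dn
                    and left + block + border + dn <= w):
                continue
            # SSD between D in the current frame and the shifted reference.
            ssd = sum((cur[m][n] - ref[m + dm][n + dn]) ** 2
                      for m, n in d_area)
            if ssd < best_ssd:
                best, best_ssd = (dm, dn), ssd
    dm, dn = best
    out = [row[:] for row in cur]
    # Copy the shifted reference block into the loss area L.
    for m in range(top, top + block):
        for n in range(left, left + block):
            out[m][n] = ref[m + dm][n + dn]
    return out, best
```

For a purely translational scene, the returned motion vector equals the true shift and the loss area is restored exactly; for fisheye content the best translational match generally leaves a residual, which motivates the adaptation proposed next.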

TEMPORAL ERROR CONCEALMENT VIA EQUISOLID RE-PROJECTION
The different projection functions and resulting image characteristics of perspective images and equisolid fisheye images become evident in Fig. 3. While the left image is obtained using the pinhole model, i.e., perspective projection,
$$r_p = f \tan\theta \tag{4}$$
the right one is based on equisolid projection [1]:
$$r_e = 2f \sin\left(\frac{\theta}{2}\right) \tag{5}$$
In both cases, $\theta$ is the incident angle of light and $f$ denotes the focal length. $r_p$ and $r_e$ describe the distance to the image center in the perspective and equisolid image, respectively. Using polar coordinates $(r_p, \varphi_p)$ and $(r_e, \varphi_e)$, $r_p$ and $r_e$ represent the radius, while $\varphi_p$ and $\varphi_e$ denote the angle. Evidently, equisolid projection allows a much larger FOV, but the resulting image no longer follows the rules of projective geometry and straight lines are mapped onto arcs. As a consequence, image processing techniques based on a translational motion model must be considered suboptimal as concluded in [9]. We hence propose an equisolid temporal error concealment (E-TEC) technique based on DMVE which conducts the motion vector search in the perspective domain [10]. Since projecting the entire equisolid image into the perspective domain is practically infeasible due to the vast number of pixels this would result in, we instead manipulate the image coordinates $(r_e, \varphi_e)$ in a suitable fashion. We thus use
$$r_p = f \tan\left( 2 \arcsin\left( \frac{r_e}{2f} \right) \right) \tag{6}$$
to back-project the image coordinates into the perspective domain P. Since the translational model holds here, the addition of the motion vector candidate $(\Delta m_i, \Delta n_i)$ is conducted in this domain using a Cartesian representation. Afterwards, the now shifted polar coordinates $(r'_p, \varphi'_p)$ are re-projected into the equisolid domain E via
$$r'_e = 2f \sin\left( \frac{1}{2} \arctan\left( \frac{r'_p}{f} \right) \right) \tag{7}$$
and subsequently applied to a suitably upsampled and interpolated version of the reference frame to extract the corresponding pixel values. Note that neither (6) nor (7) changes the angle in any way, so that $\varphi_p = \varphi_e$ and $\varphi'_e = \varphi'_p$.
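The coordinate manipulation described above can be written down compactly. The following sketch assumes that coordinates are given relative to the image center and that the focal length `f` is expressed in pixels. The function names are hypothetical, and raising an error for incident angles of 90 degrees and beyond is a choice made for this example only.

```python
import math

def equisolid_to_perspective(r_e, f):
    """Back-project an equisolid radius r_e to the perspective radius r_p."""
    theta = 2.0 * math.asin(r_e / (2.0 * f))   # invert r_e = 2 f sin(theta / 2)
    if theta >= math.pi / 2:
        # Such points lie at or beyond infinity in the perspective domain.
        raise ValueError("incident angle of 90 degrees or more")
    return f * math.tan(theta)                  # r_p = f tan(theta)

def perspective_to_equisolid(r_p, f):
    """Re-project a perspective radius r_p to the equisolid radius r_e."""
    return 2.0 * f * math.sin(0.5 * math.atan(r_p / f))

def shift_in_perspective(x_e, y_e, dx, dy, f):
    """Translate the equisolid point (x_e, y_e) by (dx, dy) in the
    perspective domain and return the resulting equisolid point."""
    r_e = math.hypot(x_e, y_e)
    phi = math.atan2(y_e, x_e)                  # angle unchanged by projection
    r_p = equisolid_to_perspective(r_e, f)
    x_p = r_p * math.cos(phi) + dx              # translation in Cartesian form
    y_p = r_p * math.sin(phi) + dy
    r_e_new = perspective_to_equisolid(math.hypot(x_p, y_p), f)
    phi_new = math.atan2(y_p, x_p)              # angle unchanged by re-projection
    return r_e_new * math.cos(phi_new), r_e_new * math.sin(phi_new)
```

With `f = 100`, an equisolid radius of 100 pixels corresponds to an incident angle of 60 degrees and hence to a perspective radius of about 173.2 pixels, which illustrates how quickly the perspective image grows towards the periphery.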
The upsampling and interpolation of the reference frame is necessary since the Cartesian coordinates corresponding to $(r'_e, \varphi'_e)$ are in general no longer integer-valued. To preserve a certain degree of accuracy, a suitable upsampling factor must thus be chosen, e.g., a factor of 8 for eighth-pixel accuracy.
Apart from the additional projections described above, the principle of the motion vector search, including the minimization of the SSD based on a decision area $\mathcal{D}$ around the lost block, is the same as for regular DMVE. To minimize the SSD of the decision area $\mathcal{D}$ and thus determine the motion vector $(\Delta m, \Delta n)$, all image coordinates $(r_e, \varphi_e) \in \mathcal{D}$ are projected into the perspective domain, where the motion vector candidate $(\Delta m_i, \Delta n_i)$ is added. The resulting shifted coordinates are subsequently re-projected into the equisolid domain, where they can be applied to the reference frame, thus extracting the pixel values to be compared to $\mathcal{D}$. Repeating this for all motion vector candidates within the search range finally yields $(\Delta m, \Delta n)$. Having determined the motion vector $(\Delta m, \Delta n)$, the block to be used for concealing the loss area is obtained by projecting all image coordinates $(r_e, \varphi_e) \in \mathcal{L}$ into the perspective domain, adding the motion vector, and re-projecting the shifted coordinates into the equisolid domain. Applying the image coordinates thus obtained to the upsampled reference frame then yields the block used for concealing the area $\mathcal{L}$ of the regarded lost block.
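The complete search and concealment procedure can be sketched as follows. This is a simplified illustration under several assumptions rather than the implementation evaluated here: bilinear interpolation stands in for the upsampled and interpolated reference frame, the focal length `f` and the image center `(cy, cx)` are given in pixels, candidates for which any coordinate leaves the frame or reaches an incident angle of 90 degrees are skipped, and all names are hypothetical.

```python
import math

def warp(x, y, dx, dy, f):
    """Shift equisolid point (x, y), relative to the image center, by
    (dx, dy) in the perspective domain; return the new equisolid point."""
    theta = 2.0 * math.asin(min(math.hypot(x, y) / (2.0 * f), 1.0))
    if theta >= math.pi / 2:            # no perspective image available
        return None
    r_p, phi = f * math.tan(theta), math.atan2(y, x)
    xp, yp = r_p * math.cos(phi) + dx, r_p * math.sin(phi) + dy
    r_e = 2.0 * f * math.sin(0.5 * math.atan(math.hypot(xp, yp) / f))
    phi = math.atan2(yp, xp)
    return r_e * math.cos(phi), r_e * math.sin(phi)

def sample(ref, y, x):
    """Bilinear interpolation (stand-in for the upsampled reference)."""
    h, w = len(ref), len(ref[0])
    if not (0 <= y <= h - 1 and 0 <= x <= w - 1):
        return None
    y0, x0 = min(int(y), h - 2), min(int(x), w - 2)
    ay, ax = y - y0, x - x0
    return ((1 - ay) * (1 - ax) * ref[y0][x0] + (1 - ay) * ax * ref[y0][x0 + 1]
            + ay * (1 - ax) * ref[y0 + 1][x0] + ay * ax * ref[y0 + 1][x0 + 1])

def fetch(ref, coords, dm, dn, f, cy, cx):
    """Reference values for the pixels in `coords` under candidate (dm, dn)."""
    vals = []
    for m, n in coords:
        p = warp(n - cx, m - cy, dn, dm, f)
        v = None if p is None else sample(ref, p[1] + cy, p[0] + cx)
        if v is None:
            return None                 # candidate not usable for this block
        vals.append(v)
    return vals

def etec_conceal(cur, ref, top, left, f, cy, cx, block=16, border=8, search=4):
    rows = range(top - border, top + block + border)
    cols = range(left - border, left + block + border)
    lost = lambda m, n: top <= m < top + block and left <= n < left + block
    d_area = [(m, n) for m in rows for n in cols if not lost(m, n)]
    l_area = [(m, n) for m in rows for n in cols if lost(m, n)]
    best, best_ssd = None, float("inf")
    for dm in range(-search, search + 1):
        for dn in range(-search, search + 1):
            vals = fetch(ref, d_area, dm, dn, f, cy, cx)
            if vals is None:
                continue
            ssd = sum((cur[m][n] - v) ** 2 for (m, n), v in zip(d_area, vals))
            if ssd < best_ssd:
                best, best_ssd = (dm, dn), ssd
    out = [row[:] for row in cur]
    patch = None if best is None else fetch(ref, l_area, best[0], best[1], f, cy, cx)
    if patch is not None:               # otherwise leave the block unconcealed
        for (m, n), v in zip(l_area, patch):
            out[m][n] = v
    return out, best, best_ssd
```

Note that only coordinates pass through the two projections; pixel values are touched solely when sampling the reference frame. Returning the minimum SSD alongside the concealed frame makes it straightforward to compare the result against that of a conventional DMVE search.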
To implement the presented E-TEC method, we directly build upon conventional DMVE and thereby create a hybrid equisolid temporal error concealment (HE-TEC) technique. HE-TEC allows DMVE as an optional technique for the concealment of blocks where our E-TEC method meets its limits. One such limiting factor is the inverse tangent function in (7). As we define an integer-pixel search range in the perspective domain, employing (7) leads to a shortened search range in the equisolid domain. Since the original search range in the perspective domain is able to cover a larger range of motion, DMVE may outperform E-TEC in the case of fast motion or for lost blocks in the periphery of the fisheye image, as it is not inhibited by a shortened search range. This is especially true when nearing the 180 degree boundary of the fisheye image as these coordinates are located near infinity in the perspective domain. Any search range is consequently re-projected onto a very small area or even a single point in the equisolid domain according to the inverse tangent in (7), hence being unable to capture any kind of motion between frames. For the proposed HE-TEC, we thus incorporate an SSD-based decision between pure DMVE and our equisolid re-projection variant E-TEC. HE-TEC is schematically depicted in Fig. 4, where $S_\tau$, $S_{\tau-1}$, and $\hat{S}_\tau$ denote the lossy signal to be concealed, the reference frame, and the error-concealed signal, respectively. The blue box denotes the conventional DMVE approach. The light red box describes the proposed equisolid re-projection approach E-TEC.
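The shortening of the search range can be quantified from the two projection functions $r_p = f \tan\theta$ and $r_e = 2f \sin(\theta/2)$. As an illustration, consider a purely radial displacement that is small compared to $f$; the local scale factor between the two domains then follows from the chain rule:

```latex
\frac{\mathrm{d} r_e}{\mathrm{d} r_p}
  = \frac{\mathrm{d} r_e / \mathrm{d}\theta}{\mathrm{d} r_p / \mathrm{d}\theta}
  = \frac{f \cos(\theta/2)}{f / \cos^2\theta}
  = \cos\!\left(\frac{\theta}{2}\right) \cos^2\theta
% theta =  0 deg:  1
% theta = 60 deg:  cos(30 deg) * cos^2(60 deg) = 0.866 * 0.25   ~ 0.217
% theta = 80 deg:  cos(40 deg) * cos^2(80 deg) = 0.766 * 0.0302 ~ 0.023
```

A one-pixel radial step in the perspective domain thus corresponds to roughly 0.22 pixels in the equisolid domain at an incident angle of 60 degrees and to only about 0.02 pixels at 80 degrees. The factor vanishes as $\theta$ approaches 90 degrees, which is the collapse of the search range described above.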
While only the projection of the image coordinates into the perspective domain P, the motion vector addition, and the re-projection of the translated image coordinates into the equisolid domain E are explicitly visualized, any other necessary processing steps like the motion vector search based on SSD minimization, the upsampling of the reference frame, as well as the concealment block extraction are also part of E-TEC. Note that the projections do not require any information about actual pixel values as they work solely with image coordinates. In the following, HE-TEC is compared against pure DMVE.

SIMULATION SETUP AND RESULTS
To test our HE-TEC approach, we generated synthetic fisheye video sequences using Blender [12] and, to that end, made use of several object models from [13] to create realistic scenes.
Table 2. Average luminance PSNR results. In addition, the overall maximum gain achieved for a selected frame is given.
The Blender setting for the camera was panoramic fisheye using equisolid projection. The FOV was set to 185 degrees, the focal length to 1.8 mm, and the sensor size to 5.2 mm by 5.2 mm, so that the entire circular fisheye can be captured. Fig. 1 shows exemplary frames of our synthetic fisheye sequences. Street employs a moving camera and static objects and the contained motion is mostly translational. Room, on the other hand, uses a static camera and various moving objects so that there is no global motion.
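The chosen sensor size is consistent with the equisolid model: inserting the focal length and half the FOV, i.e., a maximum incident angle of 92.5 degrees, into $r_e = 2f \sin(\theta/2)$ gives the diameter of the circular fisheye image on the sensor. The symbol $\theta_{\max}$ is introduced here for illustration only.

```latex
2\, r_{e,\max}
  = 2 \cdot 2f \sin\!\left(\frac{\theta_{\max}}{2}\right)
  = 4 \cdot 1.8\,\mathrm{mm} \cdot \sin(46.25^\circ)
  \approx 7.2\,\mathrm{mm} \cdot 0.7224
  \approx 5.20\,\mathrm{mm}
```

The image circle therefore just fits the 5.2 mm by 5.2 mm sensor.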
To further test HE-TEC on real-world video sequences, we used four traffic sequences which all contain global translational motion. For each real-world sequence, an example frame is depicted in Fig. 5. Regarding the FOV and focal length, we assume the same values as used for the synthetic sequences. Only the sensor size was changed to 4.6 mm by 2.9 mm, as the real-world sequences evidently consist of full-frame fisheye images which fill the entire sensor area. The sensor size was estimated by searching for the maximum radius that was mapped onto the image plane along with the assumption that 5.2 mm is enough to represent the entire circular fisheye. Further information on the test sequences used is given in Table 1.
In all of the tests conducted, a fixed integer-pixel search range of 128 pixels in every direction was used for both regular DMVE and our HE-TEC technique, so that 257×257 motion vector candidates were evaluated for each lost block. For HE-TEC, the re-projected image coordinates were applied to a reference frame upsampled by a factor of 8 using cubic convolution interpolation. The proposed HE-TEC technique was evaluated for multiple isolated block losses of 16×16 pixels throughout all tests conducted. The decision area $\mathcal{D}$ around each lost block was set to a width of 8 pixels so that the union $\mathcal{D} \cup \mathcal{L}$ forms an area of 32×32 pixels. Table 2 summarizes the average luminance PSNR results calculated for the loss areas as well as the average gains obtained for each sequence. Additionally, the maximum gains achieved for selected frames of each sequence are given. For Street, average gains amount to 1.07 dB with an overall maximum of 2.04 dB. This result shows that equisolid re-projection is a suitable means for an improved motion search in fisheye sequences. Not surprisingly, there is a much lower gain for the static camera sequence Room since DMVE is able to achieve perfect signal reconstruction for most of the lost blocks and HE-TEC cannot improve on that. Nonetheless, small gains can be achieved if lost blocks contain parts of the few moving objects within this sequence.
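Since the PSNR values are calculated for the loss areas only, the measure can be sketched as a masked PSNR. The function below is an illustration under assumptions: the name `psnr_loss_area`, the Boolean mask layout, and the 8-bit peak value of 255 are not taken from the evaluation itself, and how perfectly reconstructed blocks enter the reported averages is not specified here.

```python
import math

def psnr_loss_area(original, concealed, loss_mask, peak=255.0):
    """Luminance PSNR in dB, evaluated only where loss_mask is True.

    original, concealed : 2-D luminance frames as lists of rows
    loss_mask           : same layout, True for pixels of lost blocks
    peak                : assumed peak value (255 for 8-bit video)
    """
    sq_err, count = 0.0, 0
    for row_o, row_c, row_m in zip(original, concealed, loss_mask):
        for o, c, is_lost in zip(row_o, row_c, row_m):
            if is_lost:
                sq_err += (o - c) ** 2
                count += 1
    if count == 0:
        raise ValueError("loss mask is empty")
    mse = sq_err / count
    # Perfect reconstruction yields an infinite PSNR.
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Restricting the measure to the loss areas prevents the large, undistorted remainder of each frame from masking differences between the concealment methods.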
In terms of real-world fisheye sequences, average gains of around 0.7 dB are obtained, showing that the proposed HE-TEC technique also works on non-synthetic sequences. Although the assumption of an equisolid projection is certainly not an accurate one, it is good enough to achieve an improved concealment result. Adapting HE-TEC to calibrated projection information should further increase the gain. As mentioned earlier, the search range is still a limiting factor of HE-TEC, so that an adaptation considering the equisolid projection should also potentially increase the obtained gains.
A zoomed-in visual example is given in Fig. 6. Panel (b) shows the error pattern, (c) the concealment results obtained by DMVE, and (d) the HE-TEC results. In (e), the HE-TEC results are additionally overlaid with a decision mask. Red color denotes those lost blocks for which error concealment was done using the projection-based E-TEC approach, while blue color denotes blocks for which conventional DMVE was chosen. Although the overall visual impression seems very similar for both DMVE and HE-TEC, differences along curved shapes can be made out upon closer inspection. Here, the equisolid re-projection technique is able to achieve better reconstruction results that add up to an improved image quality. These visual results are representative of all frames tested, synthetic and real-world data alike.
When evaluating the HE-TEC results for the real-world sequences, it was observed that on average, 75 % of all lost blocks were concealed via E-TEC, i.e., via equisolid re-projection, while conventional DMVE was chosen for only 25 % of the blocks. For Street and Room, E-TEC was employed for 71 % and 96 % of all lost blocks, respectively. It is quite evident that our equisolid re-projection technique is chosen for most of the lost blocks, thus substantiating the PSNR results. Since the majority of lost blocks is concealed by the introduced E-TEC approach, its implementation as a stand-alone error concealment method is also conceivable, rendering the blue box as well as the SSD-based decision in Fig. 4 obsolete.

CONCLUSION
In this paper, we introduced a temporal error concealment technique for fisheye video sequences via equisolid re-projection. Based on the knowledge that the translational motion model does not hold for fisheye videos due to the different underlying projection function, we employed suitable projections from the equisolid to the perspective domain and vice versa in order to conceal the lost blocks with the help of a reference frame. Furthermore, a hybrid technique combining this approach with conventional DMVE was proposed and evaluated. Average gains in luminance PSNR amounted to 0.71 dB across the synthetic and real-world sequences tested, letting us conclude that exploiting knowledge about optics that differ from the conventional pinhole model is a suitable means for improving image reconstruction quality.
The development of the proposed E-TEC method as a stand-alone technique is work in progress. A major point of interest to that end is the suitable handling of the peripheral parts of circular fisheye frames, i.e., those parts where the FOV gets close to and surpasses the 180 degree boundary. Current work also investigates spatio-temporal error concealment for fisheye video sequences as well as optimizations with regard to the motion search. Of particular interest here is reducing the number of motion vector candidates to evaluate.