Long-Term Robust Tracking with Failure Recovery

This article presents a new algorithm for long-term tracking of moving objects. We address several potential difficulties, first through a comparative study of methods that measure the difference and the similarity between the template and the source image. In the second part, an improvement of the best method allows us to follow the target robustly. This method also effectively handles geometric deformation, partial occlusion and recovery after the target leaves the field of view. The originality of our algorithm lies in a new model that does not depend on a probabilistic process and does not require prior data-based detection. Experimental results on several difficult video sequences show performance advantages over many recent trackers. The developed algorithm can be employed in several applications such as video surveillance, active vision or industrial visual servoing.

confidence map is analyzed to estimate the most likely location of the object. Finally, the classifier is updated in an unsupervised manner using randomly selected patches [2]. Another approach [3] is to precisely locate the target object in each image in order to prevent tracking errors. Based on a structured SVM framework, it addresses the limitation of previous trackers, such as [4,5], which separate locating the target (labeling samples in a search region) and updating the model into two distinct steps. This dichotomy introduces additional labeling errors into the model update because the sample chosen by the classifier may not correspond to the best-estimated object location.
The optical flow approach in [6] improves a motion-tracking algorithm for several objects with a refined localization. Another improvement is a long-term algorithm that allows tracking with failure recovery. After an object is selected in the first frame, it is tracked forward and backward in time, and the distance between these two trajectories is calculated. If the distance is greater than a threshold, tracking is likely to have failed; in that case, the detector returns the most recent object model and resets the tracker. The major problem of the optical flow approach remains the change of illumination [7]. An algorithm for object tracking via prototypes is presented in [8]. The authors of [9] present a robust visual tracker with an improved subspace representation model. In [10] the authors discuss an algorithm that unifies the product rule and the weighted sum rule in an adaptive framework according to a defined feature distance. In [11] the authors introduce a hybrid tracking algorithm through analysis and experiments on sports video software. A real-time tracking algorithm based on a particle filter with Gaussian weighting is presented in [12]. In this work we propose a versatile and generic perception system (NSSD_DT) based on an active perception strategy (synoptic diagram in Figure 1).

Figure 1. Synoptic Diagram
First, we studied difference and similarity measures (SAD, SSD, NSSD, CC, NCC) for the matching method on an evaluation bench in order to choose the most effective one in terms of recognition. The study led to the selection of the normalized square difference function (NSSD).
Then we dealt with the problem of visual object tracking, which aims to locate a target over time. In particular, we focused on difficult scenarios in which the object undergoes significant deformations and occlusions, or leaves the field of view. To achieve this objective, we propose a robust method based on a sentinel similarity index: the template is updated when the similarity reaches this index. Our tracker is able to detect tracking failure and to recover after failure. This addresses the deformation and occlusion problems and ensures robust tracking in the long term.

Matching Method
The matching model is adopted to detect small parts of an image that correspond to a template image. This technique is widely used in object detection fields such as vehicle tracking, robotics, medical imaging and in industry as part of quality control. The crucial point is to adopt an appropriate "measure" to quantify similarity or matching. However, this method has a high computational cost, since the matching process involves moving the template to all possible positions in a larger source image and calculating a numerical index indicating how well the template corresponds to the image at that position. This problem is therefore treated as an optimization problem.
The measure of correspondence between two images is a metric that indicates the degree of resemblance or dissimilarity between them. This metric may increase or decrease with the degree of similarity. When the metric is explicitly defined as a measure of mismatch, it is a quantity that increases with the degree of dissimilarity.
By sliding, we mean moving the patch one pixel at a time (left to right, top to bottom). At each location, a metric is calculated that represents how "good" or "bad" the match is at that location. For each location of the template over the source image, we store the metric in the result matrix (R). Each location (x, y) in R contains the match metric. The image below is the result R of sliding the patch with the NSSD metric. The brightest locations indicate the highest matches. The location marked by the red circle is the one with the highest value. Thus, that location (the rectangle formed by that point as a corner, with width and height equal to those of the patch image) is considered the match.
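As an illustration of the sliding procedure, the following minimal Python sketch builds the result matrix R for small grayscale images stored as lists of rows. The function names (nssd_similarity, match_template) are hypothetical. The score is assumed to be one minus the normalized sum of squared differences, clamped at 0, so that 1 denotes a perfect match and brighter means better; the exact normalization used by NSSD_DT may differ.

```python
from math import sqrt


def nssd_similarity(patch, template):
    """Similarity in [0, 1] between two equally sized patches.

    Assumption: score = 1 - SSD / sqrt(sum(patch^2) * sum(template^2)),
    clamped at 0, so that 1.0 means the two patches are identical.
    """
    ssd = 0.0
    energy_patch = 0.0
    energy_template = 0.0
    for row_p, row_t in zip(patch, template):
        for p, t in zip(row_p, row_t):
            ssd += (p - t) ** 2
            energy_patch += p * p
            energy_template += t * t
    denom = sqrt(energy_patch * energy_template)
    if denom == 0.0:
        # Degenerate case: at least one patch is entirely zero.
        return 1.0 if ssd == 0.0 else 0.0
    return max(0.0, 1.0 - ssd / denom)


def match_template(source, template):
    """Slide the template over the source one pixel at a time.

    Returns R as a list of rows, where R[y][x] is the score obtained
    with the template's top-left corner placed at (x, y).
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(source), len(source[0])
    result = []
    for y in range(sh - th + 1):          # top to bottom
        row = []
        for x in range(sw - tw + 1):      # left to right
            patch = [source[y + dy][x:x + tw] for dy in range(th)]
            row.append(nssd_similarity(patch, template))
        result.append(row)
    return result
```

For a source of size W x H and a template of size w x h, R has (W - w + 1) x (H - h + 1) entries, which is why the exhaustive search is costly for large images.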

Techniques Related to Matching Model
In matching, the source object can be rotated, occluded or shown at another scale. Techniques that provide a distinct model for each scale and orientation work around the rigidity of the model. However, they are too expensive, especially for large models. The idea is to be robust and to compensate for these deficiencies as much as possible, with more flexibility and at an optimized cost.
The proposed approach begins with a metric study of two methods, Normalized Cross Correlation and Normalized Square Difference. Afterwards, we propose an improvement of the method selected according to the aforementioned criteria.

Normalized Cross Correlation Method and Normalized Square Difference Method

Normalized Cross Correlation Method
The cross-correlation function is an operator that acts on two functions, f(x, y) and g(x, y), each corresponding to an image. This operator equals 1 when the two functions are identical and tends towards -1 when the functions differ. In 2D, to measure the relative displacement of two images along the x and y image axes, a correlation algorithm (1), (2) uses this operator, taking as functions f and g portions of the reference and deformed images, respectively. The algorithm searches for the displacement values dx and dy such that g(x + dx, y + dy) maximizes the correlation operator with f. These values are retained as the best estimate of the displacement of image g with respect to image f. In practice, the algorithm applies this procedure to a series of sub-images that are parts of the reference image. It calculates the correlation function between a reference sub-image and a deformed sub-image. The deformed sub-image giving the greatest cross-correlation with the reference is retained as the best one and thus makes it possible to estimate the displacement at this stage. For match values varying between 0.999 and 0.430, a good correspondence of 98% is obtained, hence better matching robustness.
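The displacement search described above can be sketched as follows. This is a minimal illustration, assuming the zero-mean form of the normalized cross-correlation (which ranges from -1 to 1) and an exhaustive search over a window of plus or minus max_shift pixels. The names ncc, crop and estimate_displacement are hypothetical, and images are plain lists of rows.

```python
from math import sqrt


def ncc(f, g):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    fv = [v for row in f for v in row]
    gv = [v for row in g for v in row]
    mean_f = sum(fv) / len(fv)
    mean_g = sum(gv) / len(gv)
    num = sum((a - mean_f) * (b - mean_g) for a, b in zip(fv, gv))
    den = sqrt(sum((a - mean_f) ** 2 for a in fv)
               * sum((b - mean_g) ** 2 for b in gv))
    # A constant patch has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def crop(image, x, y, w, h):
    """Return the w x h patch whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


def estimate_displacement(reference, deformed, x0, y0, w, h, max_shift):
    """Find (dx, dy) maximizing the NCC between the reference patch at
    (x0, y0) and the deformed patch at (x0 + dx, y0 + dy).

    Returns (dx, dy, score).
    """
    f = crop(reference, x0, y0, w, h)
    height, width = len(deformed), len(deformed[0])
    best = (0, 0, float("-inf"))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            x, y = x0 + dx, y0 + dy
            # Skip candidate positions that fall outside the deformed image.
            if x < 0 or y < 0 or x + w > width or y + h > height:
                continue
            score = ncc(f, crop(deformed, x, y, w, h))
            if score > best[2]:
                best = (dx, dy, score)
    return best
```

Subtracting the mean of each patch makes the score insensitive to a uniform brightness offset, at the price of the extra multiplications, division and square root per candidate position.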

Comparison of matching methods
For the normalized cross-correlation method, the concordance values are compared to zero (the ideal value); for the correlation, normalized correlation, normalized difference and normalized square difference methods, the matching values are compared to 1 (the ideal value) [13].
After testing the six methods, the best results are given by the normalized correlation method and the normalized square difference method; the performance evaluation curves for both methods are shown above. For the two selected methods, by fixing the model and modifying the source, we obtain the curves in Figures 3 and 4, which show how the correspondence value varies with the source. Among the similarity measures, the NSSD method has a lower computational cost, since it only involves a subtraction of pixels between the model and the original image followed by a squaring operation. In addition, it takes less time to search an image for the area corresponding to the model. NCC, on the other hand, is computationally heavier than SSD because it involves a multiplication, a division and a square root operation.

Basic algorithms: NSSD
The algorithm below describes matching based on the normalized squared difference method.
The Normalized sum squared difference algorithm
1: Load an input image (source) and an image patch (template)
2: Perform the template matching procedure using the matching function
4: Locate the position with the highest likelihood of a match
5: Draw a rectangle around the area corresponding to the highest match
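The last two steps (locating the most likely position and deriving the rectangle to draw) can be sketched as follows. This is only an illustration: it assumes that the result matrix R has already been computed and is given as a list of rows in which a higher value means a better match, and locate_best_match is a hypothetical name.

```python
def locate_best_match(result, template_width, template_height):
    """Return (x, y, width, height, score) of the best match in R.

    (x, y) is the top-left corner of the rectangle to draw; its size is
    that of the template. Assumption: higher scores are better matches.
    """
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(result):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, template_width, template_height, best_score


def rectangle_corners(x, y, width, height):
    """Top-left and bottom-right corners, as most drawing APIs expect."""
    return (x, y), (x + width, y + height)
```

If the metric were a pure dissimilarity (lower is better), the comparison would simply be reversed and the minimum of R retained instead.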

Proposed Approach

The NSSD_DT Algorithm
The proposed matching model NSSD_DT is based on updating the template according to a recognition sentinel. The update is triggered by each change of the source in its geometric shape, scale, rotation or occlusion. Tracking begins with an original template. At the first change of the source that exceeds the sentinel index, the update is performed by substituting the new template for the old one.
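The update rule can be illustrated with the following minimal sketch. It assumes that, for each frame, a matching step has already produced the best match score and the corresponding image patch, and that the update is triggered when the score falls to or below the sentinel value. The names update_template and track are hypothetical, and the sketch does not model recovery after the target leaves the field of view.

```python
SENTINEL = 0.5  # sentinel value reported in this paper


def update_template(template, best_patch, score, sentinel=SENTINEL):
    """Return (template, updated).

    Assumption: when the match score drops to the sentinel, the patch
    found at the best location replaces the old template.
    """
    if score <= sentinel:
        return best_patch, True
    return template, False


def track(initial_template, observations, sentinel=SENTINEL):
    """Run the update rule over a sequence of per-frame observations.

    observations: iterable of (score, best_patch) pairs, one per frame,
    as produced by a hypothetical matching step.
    Returns the final template and the indices of the frames where the
    template was replaced.
    """
    template = initial_template
    update_frames = []
    for frame_index, (score, best_patch) in enumerate(observations):
        template, updated = update_template(
            template, best_patch, score, sentinel
        )
        if updated:
            update_frames.append(frame_index)
    return template, update_frames
```

In a complete tracker the score of each frame would depend on the current template, so the match value would rise again right after each substitution.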

Metric of the NSSD_DT algorithm
For efficient tracking without interruption, we have defined a "sentinel coefficient" with a monitoring value of 0.5000. In this section, we examine our NSSD_DT algorithm in two parts. The first part uses a real-time video source under constant conditions, with a single target (a racing car) and a plain background; the results obtained serve as a reference for the second part. In the second part, we examine NSSD_DT with videos of variable resolution and a variable number of

Metric test with real-time video camera
Test conditions: a single target with a constant background. This section includes (1) the vertical multi-scale tracking test, (2) the inclined multi-scale tracking test, (3) the tracking with source rotation test, (4) the tracking with geometrical distortion test and (5) the tracking with occlusion test.

Vertical Multi-scale Tracking
With a size step of 20°, the match value falls to the sentinel value (0.5000). The template is then substituted in order to approach the optimal match value (0.9000).

Inclined multi-scale Tracking
The robustness measurement of the algorithm continues up to 18% of the moving object.
Figure 6. Inclined multi-scale Tracking
Figure 7. Tracking with source rotation
Table 4. Tracking with geometrical distortion
Figure 9. Tracking with occlusion

Comparative Correspondence Curve of the Two Algorithms NSSD and NSSD_DT
The values in Table 4 represent the responses of the two algorithms, NSSD and NSSD_DT, tested by applying the same change to the same sequence at 20 fps and a resolution of 720 x 1080 with occlusions.
Figure 10. Video test for the two algorithms NSSD and NSSD_DT
Figure 11. Divergence between the two algorithms NSSD and NSSD_DT

Evaluation of the Results on Video Sequences
In this section, in order to evaluate our approach, we ran our NSSD_DT algorithm, implemented in C++, on 8 different videos at 20 fps: different resolutions (v1), rotation of the source and change of scale (v2, v3, v4), and resumption of tracking of the race car after it leaves the field of view (v5).

Tracking with fixed template and different resolutions of the source
With a template of fixed resolution 1080x1920, we track the targets at different resolutions. Table 7 summarizes the results, each being the average of 10 values recorded for a given resolution. The average recognition rate is 84%.

Test with Rotation of the Source
The test is applied to three industrial videos: v2 to track the madeleine, v3 to track the car body on the assembly line and v4 to track the red pen held by the robot arm. Table 8 summarizes the measurements and gives an average recognition rate of 87%.

Partial Occlusion and Field-of-View Exit Test
The evaluation of our algorithm is repeated 5 times on this sequence. The following table summarizes the average recognition rate and the average execution time.
Figure 16. Tracking with scale variation and recovery after leaving the field of view

Recap of Results
Across the various robustness tests of our tracker, we obtain an average overall recognition rate of 84%.

Discussion
The principle of tracking is based on vigilant monitoring of the similarity index. If the latter reaches the predefined nominal value (i.e. the sentinel value), an immediate update of the template is triggered so that tracking continues with a similarity closer to the ideal.
This means that the value of the sentinel index must be chosen carefully: a value close to 1 weakens the robustness of the tracking, while a value well below 0.5 puts the accuracy of target marking at risk. A series of tests for a given subject makes it possible to choose this index. Generally, a value near 0.5 gives satisfactory results.

Conclusion
In this work, we have dealt with the problem of visual object tracking, where the goal is to locate a target over time. In particular, we have focused on difficult scenarios in which the object leaves the field of view or undergoes significant deformations and occlusions. To achieve this objective, we proposed a robust method based on a sentinel response index that identifies the state of the object; the template is updated when the response reaches this index. Our tracker is able to detect tracking failure and to recover after failure. It therefore handles the deformation and occlusion problems and ensures robust tracking in the long term.