Using a Rao-Blackwellised Particle Filter to Track 3D Arm Motion Based on a Hierarchical Limb Model

To improve the efficiency of 3D human tracking, we present an algorithm for tracking 3D arm motion. First, the Hierarchy Limb Model (HLM) is proposed based on the human 3D skeleton model. Second, via graph decomposition, the arm motion state space modeled by the HLM is decomposed into two low-dimensional subspaces: root nodes and leaf nodes. Finally, a Rao-Blackwellised Particle Filter is used to estimate the 3D arm motion. Experimental results show that our algorithm improves computational efficiency.


I. INTRODUCTION
Although human 3D tracking has attracted the attention of many researchers because of its wide range of applications, it remains a challenging task because the computational complexity grows exponentially with the number of degrees of freedom.
Moeslund et al. [1] [2] classified existing pose estimation algorithms into learning-based and model-based algorithms. The model-based category builds the human motion model from prior knowledge about the human body and human motion constraints, and uses stochastic sampling techniques in model-based analysis-by-synthesis to obtain the optimal estimate within the Bayesian network framework. As a nonlinear filtering algorithm based on the Bayesian estimation framework, the particle filter [3] has been widely applied [4] [5] to human 3D motion estimation. Deutscher et al. [6] proposed the annealed particle filter to track human 3D motion. Markov Chain Monte Carlo [7] [8] has been used to address the particle degeneracy problem. Recently, the structure graphical model [9] has been used to facilitate the estimation of human 3D motion. Based on the structure graphical model, Wei [13] proposed a decentralized framework and, facilitated by graph decomposition, derived a novel Bayesian conditional density propagation rule.
Although these algorithms have achieved their goals, they can track only learned motions and cannot track arbitrary motion in natural scenes. Moreover, learning a general probabilistic model in the full space is very difficult because of the high dimensionality and the huge amount of training data needed to account for motion complexity. Because of its restrictive dependency requirements, the Rao-Blackwellised Particle Filter (RBPF) cannot be directly used for 3D human tracking.
To address these problems, this paper proposes a 3D arm motion tracking algorithm using a Rao-Blackwellised Particle Filter based on the Hierarchy Limb Model. The Hierarchy Limb Model divides the arm motion space into two parts: root variables and leaf variables. This decomposition allows the Rao-Blackwellised Particle Filter to be used to track arm motion. As a result, our algorithm improves computational efficiency because the search space has lower dimensionality and fewer particles are needed.
The paper is organized as follows. Section II describes the Hierarchy Limb Model. The 3D arm tracking algorithm based on RBPF is proposed in Section III. Experimental results and analysis are presented in Section IV, and Section V concludes the paper.

II. HIERARCHY LIMB MODEL
The arm can be represented by an arm graphical model, as shown in Fig. 1 (a). Each circle node corresponds to a part of the right arm, such as the right upper arm or the right lower arm. The square nodes are the observations associated with the circle nodes. The undirected links represent physical constraints among different parts of the right arm. The directed link from a part's state to its associated observation represents the local observation likelihood. To describe the motion of an articulated object, we accommodate the state dynamics with a dynamical graphical model, as shown in Fig. 1 (b), which contains two consecutive time frames. The directed links between consecutive states represent the dynamic transition from time t-1 to time t.
We model the arm as two cylinders connected at revolute joints, and denote the state of each part of the arm at time $t$ by $x_{i,t}$, where $i=1,2$ is the index of the two parts of the arm. The arm motion state space is 12-dimensional: 6D for the global (shoulder) position and orientation, 3D for the elbow, denoted $x_{1,t}$, and 3D for the wrist, denoted $x_{2,t}$. According to the characteristics of arm motion, the motion of any node of the arm interacts only with its child nodes. For example, the motion of the lower arm is constrained only by the motion of the corresponding upper arm, not by any other limb. Fig. 2 (a) shows the decomposition result for the right arm in Fig. 1 (b), and Fig. 2 (b) is the associated moral graph obtained via the separation theorem and the characteristics of the dynamic Markov network [13]. Using the arm hierarchy model, the problem of tracking right arm motion can be formulated as the prediction of $x$ at time $t$.

III. 3D ARM TRACKING WITH RBPF
By the HLM, arm motion can be decomposed into two parts: $x_{1,t}$ (upper arm) and $x_{2,t}$ (lower arm). We denote $x_{1,t}$ as the root variable $R_t$ and $x_{2,t}$ as the leaf variable $L_t$. In the algorithm, we marginalize the lower arm pose variable by using the motion speed correlation matrix $V$. The strategy of the Rao-Blackwellised Particle Filter is to partition the full state space into two parts, $R_t$ (root variable) and $L_t$ (leaf variable), so that the posterior can be written as Equation (1). Given the speed correlation matrix $V$ and the image measurements $Z_{1:t}$, Equation (1) can be approximated by Equation (2). In Equation (2), the integral part is handled by propagating the root variables $R_t$ with the state transition model, as in the standard Particle Filter. The leaf variables $L_t$ can be predicted analytically using Kalman filter prediction. The image likelihood model, $P(Z_t \mid R_t, L_t)$, is computed in the full state space, including both root and leaf variables.
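For reference, the generic Rao-Blackwellised factorization behind this kind of partition can be written as below. This is the textbook form, given here as an assumption; it is not necessarily identical to the authors' Equation (1) and Equation (2). The particle count $N$, the weights $w_t^{(i)}$, and the per-particle Gaussian parameters $\mu_t^{(i)}$, $\Sigma_t^{(i)}$ are notation introduced only for illustration.

```latex
\begin{align}
P(R_{1:t}, L_t \mid Z_{1:t})
  &= P(L_t \mid R_{1:t}, Z_{1:t}) \, P(R_{1:t} \mid Z_{1:t}) \\
  &\approx \sum_{i=1}^{N} w_t^{(i)} \,
     \delta\!\left(R_{1:t} - R_{1:t}^{(i)}\right)
     \mathcal{N}\!\left(L_t;\ \mu_t^{(i)},\ \Sigma_t^{(i)}\right)
\end{align}
```

In this form only the root trajectory is sampled, while each particle carries a Gaussian over the leaf variable that a Kalman filter can maintain in closed form.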

A. State Transition Model
In the particle filter theoretical framework, the state transition model, by which particles are generated, is described by Equation (3), where $v_t$ is drawn from a Gaussian distribution whose mean is a 3×1 vector, defined as the motion speed of the root variables, and whose covariance is a 3×3 diagonal matrix. Using a hard prior, we can eliminate the particles that correspond to implausible arm poses and thereby reduce the search space.
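A minimal sketch of such a propagation step is given below. It assumes additive dynamics (the previous root state plus Gaussian noise centred on the motion speed) and a box-shaped range of plausible values as the hard prior; both are illustrative assumptions, as are the function name `propagate_root` and its parameters.

```python
import numpy as np


def propagate_root(particles, speed, sigma, lower, upper, rng):
    """Propagate root particles and apply a hard prior.

    ASSUMPTIONS (not from the paper): additive dynamics
    R_t = R_{t-1} + v_t with v_t ~ N(speed, diag(sigma**2)),
    and a box [lower, upper] as the range of plausible poses.
    """
    particles = np.asarray(particles, dtype=float)
    # Gaussian noise with a 3x1 mean and a diagonal covariance.
    noise = rng.normal(loc=speed, scale=sigma, size=particles.shape)
    proposed = particles + noise
    # Hard prior: discard particles outside the plausible range.
    plausible = np.all((proposed >= lower) & (proposed <= upper), axis=1)
    return proposed[plausible], plausible


rng = np.random.default_rng(0)
previous = np.zeros((500, 3))
kept, mask = propagate_root(
    previous,
    speed=np.array([0.1, 0.0, -0.1]),
    sigma=np.array([0.05, 0.05, 0.05]),
    lower=np.array([-1.0, -0.05, -1.0]),
    upper=np.array([1.0, 0.05, 1.0]),
    rng=rng,
)
```

Discarding implausible particles before the likelihood is evaluated is what shrinks the search space; in practice the rejected particles would usually be resampled rather than simply dropped.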

B. Prediction for the Leaf Variables
The leaf variables can be predicted with the Kalman Filter, using the motion speed correlation matrix $V$, while the root variables are propagated. Conditioned on the root variables and the correlation matrix $V$, the leaf variable $L_t$ is described by Equation (4), where the two noise terms represent the process noise and the measurement noise, respectively, and $V$ is the motion speed correlation matrix. The leaf variables are then predicted by Equation (5).
In Equation (5), $i$ is the particle index. In Equation (4) and Equation (5), $A$ and $H$ are assumed to be diagonal matrices.
The Kalman updates are then computed from Equation (4) and Equation (5).
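A generic Kalman prediction and update step consistent with this description might look as follows. The way $V$ enters the prediction (as a control-like term `V @ root_speed`) is an assumption made for illustration, and the names `kalman_predict`, `kalman_update`, `proc_cov` and `meas_cov` are hypothetical; only the standard Kalman recursions themselves are established.

```python
import numpy as np


def kalman_predict(mean, cov, A, proc_cov, V, root_speed):
    """Predict the leaf state for one particle.

    ASSUMPTION: the root motion speed drives the leaf through V,
    i.e. mean_pred = A @ mean + V @ root_speed.
    """
    mean_pred = A @ mean + V @ root_speed
    cov_pred = A @ cov @ A.T + proc_cov
    return mean_pred, cov_pred


def kalman_update(mean_pred, cov_pred, z, H, meas_cov):
    """Standard Kalman measurement update."""
    innovation = z - H @ mean_pred
    S = H @ cov_pred @ H.T + meas_cov          # innovation covariance
    K = cov_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    mean = mean_pred + K @ innovation
    cov = (np.eye(len(mean_pred)) - K @ H) @ cov_pred
    return mean, cov


A = np.eye(3)             # diagonal, as assumed in the text
H = np.eye(3)             # diagonal, as assumed in the text
V = 0.5 * np.eye(3)       # hypothetical speed correlation matrix
mean_pred, cov_pred = kalman_predict(
    np.zeros(3), np.eye(3), A, 0.01 * np.eye(3), V, np.array([1.0, 0.0, 0.0])
)
mean_post, cov_post = kalman_update(
    mean_pred, cov_pred, np.array([1.0, 0.0, 0.0]), H, 0.01 * np.eye(3)
)
```

Because each root particle carries its own leaf mean and covariance, this step would be run once per particle index $i$.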

C. Image Likelihood
Within the particle filter theoretical framework, the observation likelihood model represents how well the human appearance model matches the features extracted from the image. In this section, color distribution and image edge information are used to calculate the matching similarity between the human appearance model and the features extracted from the image.
Color distributions are used as target models because they are robust against non-rigidity, rotation and partial occlusion. The weighted color histogram, which consists of m=8×8×8=512 bins, is calculated in HSV color space to reduce the effect of illumination.
The weighted color histogram is computed over the projection quadrilateral $D_r$ of the limb shape, where $\delta(x)$ is the delta function, $S_{D_r}$ is the area of $D_r$, and $C$ is the normalization constant. The Bhattacharyya distance is used to calculate the similarity between two weighted color histograms.
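The histogram comparison can be sketched as below. This is a minimal illustration, assuming the pixels inside the projected limb region are already available as HSV triples scaled to [0, 1) and that per-pixel weights are supplied by the caller; the paper's exact weighting is not reproduced here. The distance uses one common form, $d = \sqrt{1 - \rho}$, where $\rho$ is the Bhattacharyya coefficient. Function names are hypothetical.

```python
import numpy as np


def weighted_hsv_histogram(hsv_pixels, weights, bins=8):
    """Normalized weighted histogram with bins**3 cells (512 for bins=8).

    ASSUMPTION: hsv_pixels is an (n, 3) array scaled to [0, 1) and
    weights is an (n,) array of non-negative per-pixel weights.
    """
    hsv_pixels = np.asarray(hsv_pixels, dtype=float)
    idx = np.clip((hsv_pixels * bins).astype(int), 0, bins - 1)
    flat = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
    hist = np.bincount(flat, weights=weights, minlength=bins ** 3)
    total = hist.sum()
    return hist / total if total > 0 else hist


def bhattacharyya_distance(p, q):
    """d = sqrt(1 - rho), with rho = sum_u sqrt(p_u * q_u)."""
    rho = np.sum(np.sqrt(p * q))
    return np.sqrt(max(0.0, 1.0 - rho))


rng = np.random.default_rng(1)
pixels = rng.random((1000, 3))
p = weighted_hsv_histogram(pixels, np.ones(1000))
same = bhattacharyya_distance(p, p)

a = weighted_hsv_histogram(np.array([[0.0, 0.0, 0.0]]), np.array([1.0]))
b = weighted_hsv_histogram(np.array([[0.99, 0.99, 0.99]]), np.array([1.0]))
disjoint = bhattacharyya_distance(a, b)
```

Identical histograms give a distance close to 0 and histograms with no shared bins give 1, so a likelihood can be built as a decreasing function of this distance.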

IV. EXPERIMENTAL RESULT AND ANALYSIS
We conducted experiments on tracking right arm motion using the HumanEva data sets [14]. The experiment uses color video of right arm motion captured from the front to reduce self-occlusion. Table I compares the mean error (Mean) and the error variance (Std.) between the ground truth and the predicted value of the right lower arm in the X, Y and Z directions, using our algorithm with different particle counts. The mean error and the error variance are defined by Equation (8) and Equation (9).

A. Experimental Result
In Equation (8) and Equation (9), the number of frames in the test video is denoted by $T$, with $T=796$; $x_t$ is the predicted value and $X_t$ is the ground truth at frame $t$. Table I shows that the mean error and error variance between the prediction and the ground truth do not change noticeably as the particle count per limb increases. We therefore conclude that, within the tested range, the particle count per limb does not noticeably affect the tracking result of our algorithm. Fig. 3 shows the tracking results of 3D arm motion obtained by our algorithm; there is no evident difference between the tracking results of our algorithm and the real pose of the arm.
The standard particle filter generates $N^K$ combinations of particles over the whole state space, corresponding to $N^K$ motion states, so its computational complexity is $E(N^K)$. In our experiment, $K=2$, $N_1=200$, $N_2=N_1/10$, and $N=200$. The time cost of HLMRBPF for tracking one frame is 5908 ms, while the standard Particle Filter (SPF) needs 14768 ms. Table II compares the mean error (Mean) and the error variance (Std.) between the ground truth and the predicted values of the two algorithms in the X, Y and Z directions.
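As a rough check on these figures, the arithmetic can be reproduced in a few lines. The combination count $N^K$ and the timings are taken from the text; the additive per-limb particle total `n1 + n2` is an assumption made for illustration, and the function and variable names are hypothetical.

```python
def compare_costs(k=2, n=200, n1=200, n2=None, t_hlm_ms=5908, t_spf_ms=14768):
    """Reproduce the particle-count and timing arithmetic from the text."""
    if n2 is None:
        n2 = n1 // 10            # N2 = N1 / 10, as stated in the text
    spf_combinations = n ** k    # standard particle filter: N^K joint patterns
    hlm_particles = n1 + n2      # ASSUMPTION: per-limb particle counts simply add
    speedup = t_spf_ms / t_hlm_ms
    return spf_combinations, hlm_particles, speedup


spf, hlm, speedup = compare_costs()
print(spf, hlm, round(speedup, 2))   # 40000 220 2.5
```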

V. CONCLUSIONS
This paper proposes a fast algorithm for tracking 3D arm motion. Based on the HLM, the algorithm converts the global optimal search over the whole state space into a top-down search based on the joints, while the dimension of the state space remains unchanged. During tracking, the particle count is reduced by the per-joint prediction of HLMRBPF. The experiment shows that the tracking result of our algorithm is not evidently different from that of the standard particle filter.

Fig. 3. 3D animation comparison between the tracking result of our algorithm and the ground truth: (a) 3D animation of the values tracked by our algorithm; (b) 3D animation of the ground truth.