Motion Detection and Clustering Using PCA and NN in Color Image Sequence

This paper presents a motion detection method based on Principal Component Analysis (PCA). The method detects and tracks moving objects in a sequence of images by segmenting the sequence according to motion. The idea of extracting significant information from a large amount of data is adopted to provide an effective method for tracking moving objects in video. The principal components do not all carry the same amount of significant information, and the nature of the motion (that is, of the information) is responsible for this difference. The proposed algorithm identifies the nature of the motion and chooses the appropriate components to obtain the best segmentation.


Introduction
The detection and tracking of objects in an image sequence is an issue that arises in many image processing applications, such as traffic monitoring systems [1] and surveillance systems [2]. The difficulty is accentuated in unconstrained environments, where the monitoring system must adapt to the high variability of the objects as well as to motion detection problems. Motion detection plays a fundamental role in tracking algorithms, insofar as almost all of them begin with it. Since PCA (Principal Component Analysis) is generally used for extracting significant information from large datasets, this study adopts the concept of PCA to provide an effective method for tracking moving objects in video. The proposed algorithm uses principal component analysis to detect moving areas and moving objects; it is a region segmentation of an image sequence according to motion. Sometimes the movement is so slow that a detection algorithm fails to detect it. The proposed motion detection algorithm is a recursive algorithm that, for each group of ten frames, detects the area where the pixels are in motion, whatever the nature of the motion, fast or slow.
The problem is to segment motion in image sequences acquired with a stationary camera. This can be solved by thresholding difference images [3]. However, the choice of the threshold results from a post hoc analysis of the results by an outside observer, and for a very noisy sequence the same threshold is no longer appropriate. Several other methods have been proposed for segmenting a moving object in a sequence of images, including segmentation by background subtraction. Since the background is a set of static pixels, it can be removed from a given image in order to extract the moving pixels. The techniques differ mainly in the type of background model used [4][5][6][7][8][9][10]. A background image without moving objects [11][12] is needed as reference information. Most of the time, however, it is not possible to obtain a sequence of images without moving objects, for example in traffic monitoring. Methods for generating a background model from a sequence of images with moving objects are presented in [13][14][15][16]. Another approach is to pose the motion detection problem in a Bayesian framework, in terms of energy minimization. The energy is then minimized with a minimum-cut ("graph cuts") algorithm [17]. In this paper, a brief overview of previous related approaches is presented first. Then, we propose a motion segmentation algorithm that successfully detects both fast and slow motions. Finally, performance evaluation, analysis, and discussion are carried out.

Our Approach
Because PCA is generally used for extracting significant information from large datasets, this approach uses it to detect the trace of a moving object by converting the video from the image domain to the principal component (PC) domain. Our algorithm is recursive: each time, it determines the moving area by applying principal component analysis to ten images. For a sequence of many frames, the recursive algorithm takes successive subsequences of 10 frames and calls the PCA-based motion detection algorithm on each one to segment the movement in the scene, so that the algorithm can run in real time. The segmentation of the movement depends on its nature, fast or slow, so the proposed algorithm must make this distinction. For this, we use distinguishing criteria extracted from the optical flow of the movement. The fast/slow classification is performed by a trained neural network. The block diagram of the algorithm is presented in Figure 1.
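The recursive scheme described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `split_into_subsequences` and `process_sequence` are hypothetical, the windows are taken as non-overlapping, an incomplete trailing window is dropped, and the classifier and the segmenter are passed in as placeholder callables.

```python
def split_into_subsequences(frames, size=10):
    """Yield consecutive, non-overlapping windows of `size` frames.

    Assumption: a trailing window with fewer than `size` frames is dropped.
    """
    for start in range(0, len(frames) - size + 1, size):
        yield frames[start:start + size]


def process_sequence(frames, classify_fast, segment, size=10):
    """Segment motion window by window.

    classify_fast(window) -> True for fast motion (hypothetical callable).
    segment(window, n_components) -> motion mask (hypothetical callable).
    """
    results = []
    for window in split_into_subsequences(frames, size):
        # Three principal components for fast motion, two for slow motion.
        n_components = 3 if classify_fast(window) else 2
        results.append(segment(window, n_components))
    return results
```

With these assumptions, a sequence of 131 frames yields 13 complete windows, and the last frame is left unprocessed.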

Optical Flow
The optical flow is a visual displacement field that explains variations in a moving image in terms of the displacement of image points. Calculating the apparent movement between two images amounts to estimating the parameters of a transformation acting on the image points: to each image pixel (x, y, t) we associate a vector (u, v), for all pixels (x, y) belonging to the image. From the velocity matrices u and v, we compute measures that are used later as criteria to distinguish the nature of the motion, slow or fast. Figure 2 shows the optical flow between two images with movement. The vector field corresponds to the displacement of the pixels of the image. An overview of different methods for calculating the optical flow is presented in [18].
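As background, many optical flow methods start from the brightness constancy assumption, which leads to the constraint below on the velocity (u, v) at each pixel. The symbols I, I_x, I_y and I_t (the image intensity and its partial derivatives) are notation assumed here and do not appear in the original text.

```latex
I(x + u,\, y + v,\, t + 1) = I(x, y, t)
\quad\Longrightarrow\quad
I_x\, u + I_y\, v + I_t \approx 0
```

This single equation has two unknowns per pixel, so practical methods add a further constraint, such as smoothness of the flow field or constancy of the flow within a small neighborhood.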

Classification Criterion
Using the velocity matrices u and v, we calculate the following measures: the sum of absolute differences, the Euclidean distance, the Hausdorff distance, and the difference of energies. These measures are used as criteria for classifying motion as slow or fast. They are fed into a neural network, which is our tool for classifying the movements in the scene. The quantities thus calculated measure the movement of the image pixels, and the movement is considered fast when each of them exceeds a certain value. The sum of absolute differences is obtained by calculating the absolute difference of velocities at each pixel and summing these differences to create a measure of movement in an image. The Euclidean distance is the usual distance in a Cartesian space. The Hausdorff distance is a topological tool that measures the distance between two subsets of an underlying metric space. The difference of energies is the difference between the sums of the squares of the elements of each velocity matrix. We choose twenty frames containing fast motions and twenty containing slow motions. Figure 3 shows the measures calculated for the series of fast movements followed by the series of slow movements. We note that fast movements yield larger values of the measures than slow movements, so we can obtain the critical values that distinguish the nature of the movement. This task is performed by a trained neural network introduced into our algorithm, which is able, thanks to these criteria, to classify a motion as fast or slow.
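The four measures can be sketched with NumPy as below. The text does not fully specify how each measure is formed, so this is one plausible reading rather than the authors' code: the differences are taken between the matrices u and v, and for the Hausdorff distance each matrix row is treated as one point of a set. The function name `motion_criteria` is hypothetical.

```python
import numpy as np


def motion_criteria(u, v):
    """Return [SAD, Euclidean, Hausdorff, energy difference] for matrices u, v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)

    # Sum of absolute differences between the two velocity matrices.
    sad = np.abs(u - v).sum()
    # Euclidean distance between the matrices seen as flat vectors.
    euclidean = np.sqrt(((u - v) ** 2).sum())
    # Difference of the sums of squares of each matrix.
    energy_difference = (u ** 2).sum() - (v ** 2).sum()

    # Assumption: rows are points; pairwise[i, j] = ||u[i] - v[j]||.
    pairwise = np.linalg.norm(u[:, None, :] - v[None, :, :], axis=2)
    # Symmetric Hausdorff distance: larger of the two directed distances.
    hausdorff = max(pairwise.min(axis=1).max(), pairwise.min(axis=0).max())

    return np.array([sad, euclidean, hausdorff, energy_difference])
```

The returned vector of four values is the kind of input the classifier described next would receive.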

Neural Network
The network has nine neurons and one hidden layer. It takes as inputs the classification criteria measured previously and, according to these inputs, classifies a motion as slow or fast. Figure 4 shows our neural network, whose output is a two-dimensional variable that we convert to a Boolean variable: 1 indicates that the motion is fast and 0 indicates that it is slow. We train the network with a training set of forty pairs of slow and fast images. For twenty pairs of images taken for validation, nineteen movements are classified correctly, that is to say an error rate of 0.05. As shown in Figure 5, the best validation performance of the neural network during training is obtained at epoch 10, where the mean squared error is 8.8191e-08.
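The conversion of the two-dimensional output into a Boolean can be illustrated with a forward pass through a single hidden layer. This is only a sketch: the function name `classify_motion`, the tanh activation, the layer sizes, and the convention that output index 1 means "fast" are all assumptions, and the weights would come from training rather than being set by hand.

```python
import numpy as np


def classify_motion(criteria, w_hidden, b_hidden, w_out, b_out):
    """Return True (1, fast) or False (0, slow) for one vector of criteria."""
    x = np.asarray(criteria, dtype=float)
    # Single hidden layer (tanh activation is an assumption).
    hidden = np.tanh(w_hidden @ x + b_hidden)
    # Two-dimensional output, one score per class.
    output = w_out @ hidden + b_out
    # Assumption: index 1 stands for "fast", index 0 for "slow".
    return bool(np.argmax(output) == 1)
```

Taking the index of the larger output component is one simple way to obtain the Boolean; thresholding a single output would be an alternative.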

Principal Component Analysis
The video data are initially represented by a function defined in a three-dimensional space: two spatial dimensions (x, y) and a time dimension (t). Each point in this space is assigned a gray level (or a color component vector), that of the point (x, y) at time t. The various semantic entities (background, moving objects) are then subsets of points in this space. To identify them, the points should be aggregated into classes with common characteristics. The number of points to consider is very large, especially if we want to consider more than two frames to detect moving objects. This is why the approach of building a background model is so common: the only points to be considered are those of the current frame, while the background model is supposed to summarize all past observations. We believe it is best to keep a less synthetic knowledge of the past, because the relevant information to extract is not always the same. We therefore intend to select a representation space suited to the sequence itself rather than to each of its frames, one that takes the movement into account without changing the initial information. With a view to using data analysis techniques, the sequence is no longer considered as a function but as a set of individuals: the pixels that we see when we look at the sequence. To avoid a detailed analysis, we do not follow the objects; instead, we consider fixed positions on the image surface. For each pixel, we retain multiple gray-level values over time. Each pixel is thus an individual characterized by a set of parameters, and individuals are located in a p-dimensional space. To stay within reasonable, almost real-time processing times, we must reduce the mass of information. To preserve the information that best discriminates the points and allows classes to be built, we chose to use principal component analysis.
We consider a set of 10 observations, which are realizations of a p-dimensional random variable, where p is the size of an image of the scene, that is, the number of pixels in the image. These observations are arranged in a data matrix X, which contains p rows and 10 columns, each column being formed by one observation. PCA is a statistical technique that aims to simplify a dataset by expressing it in a new coordinate system such that the greatest variances are observed on the first coordinates. This reduces the dimensionality of the search space by keeping only the first dimensions of the projection space obtained [19]. If we call C the variance-covariance matrix associated with X, then the principal axis directions are given by the eigenvectors of C, as in Equation (1).
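A standard way to write the eigenvector relation for C, which Equation (1) presumably expresses, is given below. The symbols v_k (eigenvectors) and \lambda_k (eigenvalues) are notation assumed here.

```latex
C\, v_k = \lambda_k\, v_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge 0
```

Because a variance-covariance matrix is symmetric and positive semidefinite, its eigenvalues are real and non-negative, and each \lambda_k equals the variance of the data along the axis v_k.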
The axis along which the largest variance is observed is defined by the eigenvector associated with the largest eigenvalue in absolute value. Figure 6 represents the steps of the principal component analysis. The proposed motion detection algorithm creates a data matrix from a set of images and applies PCA [20] to obtain a basis of eigenvectors, of which we keep only those that best explain the variance of the image set. Each image is then projected into the reduced-dimension space. We consider a subsequence of ten frames, where each frame corresponds to one dimension of the data representation space, and apply PCA in order to represent the data in a low-dimensional space in which the points belonging to a coherent movement are close together. The method must be as insensitive as possible to the various conditions in which acquisitions are made, so we focus on the gray-level variations rather than on the gray level of the pixel itself. From the start, we can remove one dimension of the data representation space by filling the data matrix with the time derivatives at each point, which places us in a nine-dimensional space [21] (Y is the data matrix represented in this space). Moving areas appear clearly when we project the sequence of 10 frames onto the first three principal axes: the difference between a static area and a moving area is accentuated on these axes. According to the eigenvalue histogram and to the principal components, for a slow motion, which carries less information than a fast motion, it is sufficient to consider only the first two principal axes, which show the moving areas. Figure 7 compares the eigenvalue histograms for a slow and a fast motion. The x axis represents the nine axes and the y axis represents the eigenvalues. As we can see, for a slow motion the information is concentrated almost entirely in the first component, so the variance of this component is much greater than that of the others.
For a fast motion, the first component still has the greatest variance, but the following components have variances close to it. We therefore superpose the first three components to build the segmentation mask, onto which we project our images, for a fast motion, and only the first two for a slow motion.
Figure 7. Eigenvalue histograms for slow-motion frames (right) and fast-motion frames (left).
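The PCA step on a ten-frame subsequence can be sketched with NumPy as follows. Several choices here are assumptions rather than details from the paper: time derivatives are approximated by first-order differences between consecutive frames, the components are superposed by summing the absolute values of the projections, and the mask is obtained with a hypothetical `threshold` parameter. The function names are also hypothetical.

```python
import numpy as np


def pca_on_time_derivatives(frames, n_components):
    """Project each pixel's temporal-derivative profile onto the leading axes.

    frames: array of shape (10, H, W) holding gray levels.
    Returns (scores, eigenvalues); scores has shape (H * W, n_components).
    """
    frames = np.asarray(frames, dtype=float)
    # Assumption: time derivative = difference of consecutive frames -> (9, H, W).
    derivatives = np.diff(frames, axis=0)
    # Data matrix Y: one row per pixel, one column per derivative frame.
    Y = derivatives.reshape(derivatives.shape[0], -1).T
    Y_centered = Y - Y.mean(axis=0)
    # 9 x 9 variance-covariance matrix of the columns.
    C = np.cov(Y_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; reorder to largest first.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    scores = Y_centered @ eigenvectors[:, :n_components]
    return scores, eigenvalues


def motion_mask(frames, fast, threshold):
    """Boolean (H, W) mask: three components for fast motion, two for slow."""
    n_components = 3 if fast else 2
    scores, _ = pca_on_time_derivatives(frames, n_components)
    # Assumption: superpose components by summing absolute projections.
    strength = np.abs(scores).sum(axis=1)
    return (strength > threshold).reshape(np.asarray(frames).shape[1:])
```

Working on the 9 x 9 covariance matrix keeps the eigen-decomposition cheap regardless of image size, which matters for the near real-time goal stated in the text.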

Experimental results and discussion
The experiment is designed to test the proposed method for detecting and tracking the moving objects (the man, the chair, the phone) in an image sequence of a conference room, whether the motion is fast or slow. The tested sequence contains 131 frames in which a man enters the scene, sits on the chair, makes a phone call, and then leaves the scene. The sequence is segmented and the movements are detected. The result of the motion extraction is shown in Figure 8. The x axis represents the frame number and the y axis indicates the number of motion pixels. Two significant peaks can be clearly observed in the graph; these peaks correspond to the man entering and leaving, when there is a large area of motion pixels. When the man is talking on the phone, the motion is very slow and there is a small number of motion pixels. The experimental results indicate that the proposed method can track the moving object successfully whether the motion is slow or fast. Although noise causes some pseudo-motion events in the video images, the proposed method can still track the target of interest and

Conclusion
This paper presents a new method for motion detection and tracking based on the concept of extracting significant information through Principal Component Analysis. The information provided by the principal components depends on the type of motion. The method uses a neural network to classify the nature of the motion and, according to this classification, uses the appropriate principal components to segment the motion. The proposed method was tested on image sequences and gives satisfactory results.