Framework for Contextual Outlier Identification using Multivariate Analysis approach and Unsupervised Learning

ABSTRACT


INTRODUCTION
The adoption of video surveillance systems has been increasing at a rapid pace owing to growing security concerns [1]. At present, various video processing techniques have significantly benefited computer vision strategies [2], [3]. Although existing systems are capable of capturing high-definition video and transmitting high-definition video frames over wireless links, they also suffer from some significant pitfalls [4], [5]. The most prominent issue in existing systems is the identification of outliers, which may appear as objects of different shapes and forms in the scene with respect to its context [6]. An outlier here refers to the presence of objects or events that have a very low probability of occurring in the given context of the scene. Constructing a framework to identify such events or objects is quite challenging, especially in an unsupervised manner, and hence it has drawn the attention of the research community. Existing approaches use spatial and temporal factors [7], [8], optical flow [9]-[11], background models [12], histogram-based methods [13], [14], behavior-based template matching [15], [16], etc. From a practical implementation viewpoint, both feature extraction and framework construction are significant for efficient identification of outliers in given video frames. Therefore, the proposed system offers a framework that utilizes an unsupervised learning mechanism to develop an automatic system for detecting abnormal events. We found that the feature-based attributes utilized in existing research work are not capable of representing the complex contextual behaviour of video frames.
Existing systems capture gradient-related information well, which is effective for resisting changes in illumination and appearance. However, we have also seen that video frames are likely to exhibit more than the characteristics of gradients, and hence such systems may ignore a voluminous amount of significant contextual information. Existing systems are also found to depend less on the data, such that it is not feasible for them to use task-specific information in the dataset. The proposed study is motivated by the advancement in the usage of features for identification problems [17], [18] and thereby presents a framework that uses the matrix decomposition principle to optimize the learning process for videos. We find that the proposed system is capable of capturing more relevant information of higher complexity and can therefore harness much of the task-related information present in the dataset. This mechanism is used for extracting features. In that context, the proposed mechanism can offer better performance than existing systems. Another significant contribution of the proposed system is its use of probability theory to compute the level of outliers at the pixel level using a block-based transformation process. To avoid detecting too many local values, the proposed system appends both temporal and spatial data, which bear more contextual information. It is to be noted that the proposed system uses an unsupervised learning approach that is completely free from human intervention with respect to both feature-based learning and framework-based learning. The study outcome shows better performance than the existing system and offers better computational performance during identification.
Section 1.1 discusses the existing literature on detection schemes used for outlier localization in video surveillance systems, followed by a discussion of the research problems in Section 1.2 and the proposed solution in Section 1.3. Section 2 discusses the algorithm implementation for accomplishing the proposed research goals, followed by the result analysis in Section 3. Finally, concluding remarks are provided in Section 4.

Background
This section discusses existing techniques for the identification of significant events in the form of outliers. Dutta et al. [19] presented a framework using sparse coding for saliency detection as well as identification of outliers. A saliency-based approach was also used by Jang and Park [20] for identifying potholes from grayscale images. Wang et al. [21] used localized histograms for analyzing crowded scenes with a supervised learning approach. Zhou and Torre [22] adopted a spatial as well as temporal scheme for analyzing human poses using a three-dimensional capturing model. Fu et al. [23] presented a technique for identifying possible outliers framed from the annotation of the video. Li and Haupt [24] investigated the problems associated with localizing outliers in larger samples of noisy data. Xue et al. [25] introduced a technique where outlier detection is carried out by emphasizing the estimation of the foreground under a sparsity constraint. Zhou et al. [26] used a low-rank representation for identifying contiguous outliers. Gopalan et al. [27] used a learning-based methodology, followed by feature extraction from a pixel hierarchy and a particle filter, to identify outliers from traffic data considering lane markings. Abnormal behavior detection over facial data was investigated by Yang and Bhanu [28]. Ni et al. [29] used principal component analysis along with a mining-based approach in order to identify a peculiar pattern of age from social videos. Ammar and Lashkar [30] presented a technique that diagnoses the typical pattern of a sleep disease from optical video flow. Identification of outliers was also carried out by Choi and Choi [31] to assist in fire-resistive applications. Feris et al. [32] presented a correction technique for lighting conditions that significantly assists in show-based outlier detection. Jayasuganthi et al. [33] modeled a uniform background using a Gaussian algorithm followed by segmentation, and used the k-means algorithm for video surveillance. Liu et al. [34] used a sparse collaborative model for detecting outliers in a given video. Maurya and Toshniwal [35] used a supervised learning algorithm to train on data gathered from a nuclear power plant in order to identify a set of outliers. Pang et al. [36] presented a study where feature extraction as well as data clustering is adopted to detect outliers in public scene images. A similar clustering methodology was adopted by Pritch et al. [37] on video data to identify abnormal events. The use of time series to analyze its effect on outlier detection was seen in the work of Teng et al. [38]. Bayat et al. [39] discussed goal detection in soccer using an event detection mechanism and achieved better accuracy with fewer detection failures. A forgery detection model for mobile-recorded and surveillance videos was presented by Staffy et al. [40]; this model was found able to identify tampering irrespective of video format. Teddy et al. [41] analyzed the performance of automatic number plate identification on an Android smartphone and found effective recognition at a processing time of 0.98 s. Therefore, it can be seen that various techniques have evolved in recent times for solving the problems associated with outlier detection. All the existing studies have significant advantages as well as contributions. However, they are also associated with significant loopholes that need to be addressed. The next section briefs the problems identified from the existing literature.

Research Problem
The significant research problems are as follows: a. Existing outlier detection techniques have been constructed around a particular pattern of an object without considering the actual context of the scene. b. The use of supervised learning increases the accuracy of identifying an abnormal event, but at the cost of computational complexity. c. The use of prior information about the object and its types narrows existing systems to a specific research environment, and they become incompatible when that environment changes. d. The extent of false positives is higher in conventional techniques, even where sparse coding has been carried out in order to perform outlier detection. Therefore, the problem statement of the proposed study can be stated as "To design a framework that offers more contextual scene analysis to enhance the precision level of identification of outliers." The next section discusses the proposed solution.

Proposed Solution
The proposed study is a continuation of our prior work [42], [43], where the present solution aims to develop a framework for video surveillance systems that is capable of identifying outliers in a given set of captured video frames. The primary consideration of this paper is that the existence of outliers is never instantaneous; an outlier normally exists in the given scene with respect to both a time factor and a spatial factor, and both factors can be regarded as contextual factors. This can be empirically represented by Equation (1). In that expression, the variable $A$ represents an aggregation of local attributes, where $A_{spat}$ and $A_{time}$ represent the spatial and temporal attributes associated with the local feature. The idea of the proposed system is to compute the local attribute $a$ ($a \in A$) for all the pixels in order to develop a local attribute $Z_i$ that is localized at the centroidal position for a given pixel. This process results in the generation of the histogram $\pi(Z_i)$, where $\pi$ represents the negative coefficient. Therefore, applying probability, the identification of the outliers $\mathrm{prob}(A)$ can be empirically expressed as Equation (2). In Equation (2), $z_k = (x_k - x_o, y_k - y_o, t_k - t_o)$ can be considered to represent the position-based association of $Z_k$ with $Z_o$. The expression can further be split in the form of $\mathrm{prob}(z_k \mid \phi_i, \phi_j)$ to represent probabilistic selection for a position with time-based and spatial-based attributes, where the variable $\phi$ represents the dictionary. Also, in the same expression, $\mathrm{prob}(Z_k = \phi_j)$ is equivalent to the correlational factor existing between $X_k$ and dictionary $\phi_j$, as shown in Equation (3). Although the computation of $\mathrm{prob}(A)$ is a slightly complex process owing to its dependency on the histogram factors $\pi$ for all given values of $A$, it also makes it possible to compute such attributes once and reuse the histogram attribute $\pi$ for all the local attributes across different aggregates $A$ of pixels.
The histogram factor of the negative coefficient $\pi$ is empirically computed as shown in Equation (4). One interesting observation is that this histogram attribute $\pi$ is approximately equivalent to local attributes that offer a significant amount of granularity by harnessing minute information from the given multimedia file (i.e., video), which gives better accuracy when performing outlier identification. The next step of the study is the learning operation, where the initial learning process is applied to the selection probabilities $\mathrm{prob}(\pi_k \mid \phi_i, \phi_j)$. Applying the learning operation is not challenging here, as it can be executed directly over the training data. The second learning process concerns the identification of the cut-off $\mu$. The prime idea implemented in this part of the study is that if all the aggregated points are observed as time-based chronological data, then the frequency of occurrence of outliers is significantly lower than that of other data points. Therefore, while monitoring, the objects of the aggregates are tested against the decision-making condition shown in Equation (4). If the condition is found to be valid, then the system decides that the monitored object is an outlier. To learn the cut-off $\mu$ from the training data, a fixed outlier probability could be selected according to the requirement of the user. We consider that a smaller value of this probability will not lead to efficient detection, as it has higher chances of generating false positives because only a small number of outliers will be ignored. At the same time, a higher value of this probability should also be rejected, as the outcome may not be practical.
This problem is avoided by using a sparsity-based learning approach to compute $\mathrm{prob}(A)$ for all the aggregates in the training dataset, and then setting the cut-off $\mu$ to a value such that the proportion of aggregates satisfying the logical condition $\mathrm{prob}(A) < \mu$ equals a specific probability, i.e., $\mathrm{prob}(a)$, where $a \in A$.
An analytical research methodology is applied to carry out the implementation of the proposed study. Figure 1 highlights the schematic diagram of the proposed system for identifying any form of abnormal event as an outlier in the video frames. The proposed system is designed in two steps: (a) unsupervised training and (b) validation. At the preliminary level, the system uses local-level features that are characterized by more sophisticated patterns. With the aid of probability theory, the proposed system assesses the level of outliers existing in the given data using a blocking operation. An algorithm for extracting blocks is designed that takes a video frame as input and yields the extracted blocks as output. These blocks assist in further extraction of local-level information, thereby constructing a good number of features with the aid of the next algorithm for sparsity-based learning. Both the spatial and the time-related information captured from the trained blocks are used to detect outliers. Finally, an algorithm for outlier detection is formulated. The significant contributions of the proposed system are as follows:
a. Construction of an analytical framework for feature extraction from a given video set. b. Incorporation of a blocking operation for further granularity in the feature extraction process. c. Use of a dictionary to assist in a better identification process. A significant aspect of the study contribution is that the proposed system is capable of solving multivariate problems that arise in video surveillance systems. Therefore, the context of the scene is understood well, and the detailed information is captured by the proposed framework. The proposed framework is mainly intended for abnormal object behavior in any environment. The next section discusses the algorithm implementation, followed by a discussion of the outcome obtained from the study.
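The cut-off learning described in this section, where $\mu$ is chosen so that a fixed proportion of training aggregates satisfies prob(A) < μ, can be read as taking an empirical quantile of the training probabilities. The following is a minimal sketch under that reading; the function names `learn_cutoff` and `is_outlier` and the parameter `outlier_fraction` are hypothetical and do not come from the paper.

```python
import numpy as np

def learn_cutoff(train_probs, outlier_fraction):
    """Return mu such that about `outlier_fraction` of the training
    probabilities fall below it (an empirical quantile; assumed reading)."""
    probs = np.asarray(train_probs, dtype=float)
    return float(np.quantile(probs, outlier_fraction))

def is_outlier(prob_a, mu):
    """Decision rule stated in the text: outlier when prob(A) < mu."""
    return prob_a < mu

# Synthetic training probabilities, for illustration only.
train_probs = np.linspace(0.01, 1.0, 100)
mu = learn_cutoff(train_probs, outlier_fraction=0.05)
flags = is_outlier(train_probs, mu)   # boolean array, True for outliers
```

A quantile keeps the rule free of labels, which matches the unsupervised setting: only the desired proportion of flagged aggregates has to be chosen.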

ALGORITHM IMPLEMENTATION
The prime purpose of the proposed algorithm is to perform a precise identification of the outliers in the video frames. This research aim is carried out through three different algorithms, which are responsible for extracting blocks, applying sparsity to implement the learning strategy, and performing outlier detection. The algorithms are constructed in sequence and are therefore described in that order. The steps involved in algorithm-1 are as follows. Algorithm-1 initially takes the video frames f as input (Line-1), which is followed by a number of operations in the consecutive steps for block computation. The size of the frame f is then mapped into three variables: the number of rows nr, the number of columns nc, and the index k (Line-2). The next step is to convert the pixel elements into columnar form in order to divide the frame into distinct 5x5 blocks B (Line-3). Two variables, $n_{rd}$ and $n_{cd}$, hold the number of rows and columns for the dictionary respectively, along with the size of one block, obs (Line-4). The dictionary is then created from $n_{rd}$, $n_{cd}$, and the number of training frames divided by the block size. For all the sizes of the dictionary-based blocks (Line-6), all the frames are considered and further divided into distinct 5x5 blocks. The dictionary-based blocks are computed as shown in Line-5 to obtain $B_{dict}$, i.e., the dictionary-based block. A loop is created as shown in Line-6 over all sizes of $B_{dict}$ to compute the vector $v = B_{dict}(j)$, where $m_1$ represents the mean value of $B_{dict}$ (Line-7); this finally leads to the generation of the extracted blocks $B_{dict}$ as the outcome.
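The block extraction step can be sketched as follows. This is an illustrative NumPy version, not the authors' Matlab code: the function name `extract_blocks` is hypothetical, cropping the frame to a multiple of the block size is an assumption, and subtracting the block mean is one possible reading of how the mean value is used.

```python
import numpy as np

def extract_blocks(frame, block=5):
    """Split a 2-D grayscale frame into distinct block x block tiles.

    Returns (cols, grid): cols has shape (block*block, n_blocks), one
    column per tile, with each tile's mean subtracted (an assumption);
    grid is (tiles down, tiles across).
    """
    nr, nc = frame.shape
    nrd, ncd = nr // block, nc // block
    # Crop so the frame is an exact multiple of the block size (assumption).
    cropped = frame[:nrd * block, :ncd * block].astype(float)
    # (nrd, block, ncd, block) -> (nrd, ncd, block, block)
    tiles = cropped.reshape(nrd, block, ncd, block).swapaxes(1, 2)
    # One row per tile, then transpose to columnar form.
    cols = tiles.reshape(nrd * ncd, block * block).T
    cols = cols - cols.mean(axis=0, keepdims=True)
    return cols, (nrd, ncd)

# Usage on a synthetic 10x10 frame: four 5x5 tiles, 25 values each.
frame = np.arange(100).reshape(10, 10)
cols, grid = extract_blocks(frame)
```

Keeping the grid shape alongside the columns makes it possible to map per-block scores back onto the image later.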
After the blocks are extracted from the given frames, the proposed system implements a form of matrix decomposition to perform multivariate analysis using a sparsity-based learning process. The steps of algorithm-2 for sparsity-based learning and of algorithm-3 for outlier detection are described next. Algorithm-2 first initializes the matrix index k to set the dimension of the dictionary to which the multivariate analysis is applied (Line-1). A structure is maintained for the dictionary, followed by the formation of a loop as shown in Line-3. A matrix Dt is created for storing all the dictionary-related values (Line-4), followed by the creation of a super-index X (Line-6) that maintains all the multivariate matrices of Dt. The next step is to apply a function ϕ that performs sparse matrix decomposition using linear algebra (Line-8) over the matrix index k and the super-index X. As the outcome of this decomposition is always positive, the resulting matrix is easier to compute. The algorithm generates multiple features, e.g., X (matrix with mixed signs), A (basis matrix), and Y (coefficient matrix). The dictionary produced by this algorithm in Line-8 is reused for identifying the outliers in the test frames. After the features have been extracted by the multivariate analysis used in the learning phase, the next part of the algorithm implementation is the identification of the outliers.
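The paper does not name the exact decomposition, so the sketch below swaps in a standard technique as a stand-in: non-negative matrix factorization with an L1 penalty on the coefficients, solved by multiplicative updates. It assumes a non-negative input matrix (for example raw pixel intensities); the names `sparse_nmf`, `lam`, and `n_iter` are hypothetical, and `lam` only plays the role of a sparse regularization weight.

```python
import numpy as np

def sparse_nmf(X, k, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative X (d x n) as A (d x k) @ Y (k x n).

    Minimises 0.5 * ||X - A Y||_F^2 + lam * sum(Y) with multiplicative
    updates, which keep every entry of A and Y non-negative.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    A = rng.random((d, k)) + eps   # basis matrix (dictionary)
    Y = rng.random((k, n)) + eps   # coefficient matrix
    for _ in range(n_iter):
        # The L1 weight enters the denominator and shrinks Y towards zero.
        Y *= (A.T @ X) / (A.T @ A @ Y + lam + eps)
        A *= (X @ Y.T) / (A @ Y @ Y.T + eps)
    return A, Y

# Usage on synthetic data: 25-dimensional columns, as for 5x5 blocks.
X = np.random.default_rng(1).random((25, 40))
A, Y = sparse_nmf(X, k=5)
```

Multiplicative updates are used here because they need no step size and preserve the sign constraint automatically; a suitable `lam` depends on the scale of the data.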
The input to algorithm-3 for outlier detection is the dictionary formed by the prior algorithm (Line-1). All the images, as well as the ground truth images, are considered in this phase. This is followed by the first algorithm for block extraction (Line-2), where similar steps are carried out, e.g., creation of a dictionary of 5x5 block size, reading the test frame, dividing the frame into 5x5 blocks, computation of the dictionary coefficients ($n_{rd}$, $n_{cd}$, and obs), computation of the mean of $B_{dict}$, and estimation of the dictionary-based vector $v$ (Line-3). A new matrix $M_{dict}$ is formulated as the product of the dictionary over all sizes of $B_{dict}$ and the super-index (Line-4). The distance between the two linear points, $M_{dict}$ and the normalized vector $a$, is computed using the probability prob, followed by computation of the final probability $E_{prob}$ (Line-6). The algorithm then estimates the minimum value of $E_{prob}$, followed by computation of the final feature Dim, which transforms the columnar information into the form of an image (for highlighting the outliers). Finally, using the information restored from the ground truth image, the system checks whether the final feature Dim is more than 1 (Line-8). Ground truth images play a significant role in outlier detection: first, the ground truth (GT) images are read and converted to binary images. This is followed by applying another function Ω, which performs outlier detection using the ground truth image GT, the binary image bw for the condition mapping with Dim>1, and the final feature Dim. The function Ω is designed in the following steps: first, the region-based properties of the ground truth image GT are estimated based on the centroidal factor of the region; this is followed by concatenation with the centroid and then by a morphological dilation operation in order to obtain the Dim and bw values.
Implementation of the function Ω leads to the formation of the probability map Probmap, which is checked against its cut-off value in order to identify the outlier objects (Line-11 to Line-13). This calculation is followed by estimation of accuracy performance in later stages. A closer look at the algorithm formation shows that the proposed system uses an unsupervised learning algorithm in order to address the uncertainty associated with detecting the patterns exhibited by anomalies. By applying multivariate analysis over the matrix decomposition, the proposed algorithm ensures that positive elements always depict the case of non-outliers, and the presence of any non-positive element is the only possible case of an outlier. Therefore, the proposed system is easier to implement and requires fewer iterations to reach convergence. Another significant advantage of this algorithm is its clustering characteristic, which automatically clusters the columnar data of the vector v. In short, the proposed system harnesses the potential of sparsity-based matrix decomposition using multivariate analysis to obtain unique numerical features that carry enough information to assist in the identification of outliers. The matrix holds enough information without causing any degradation of system performance. Another distinctive property of this algorithm is that its fewer variable dependencies also reduce resource dependencies and give faster computational time.
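The detection stage can be illustrated with a simplified sketch: each test block is scored by its distance to the closest dictionary column, the scores are reshaped onto the block grid, and the result is thresholded. This is an assumed simplification rather than the authors' procedure; the Euclidean distance, the column normalisation, and the names `outlier_map` and `cutoff` are all hypothetical choices, and the ground truth handling through Ω is omitted.

```python
import numpy as np

def outlier_map(dictionary, test_cols, grid_shape, cutoff):
    """Flag blocks whose nearest dictionary atom is farther than `cutoff`.

    dictionary: (d, k) learned atoms; test_cols: (d, n) test blocks in
    columnar form; grid_shape: (rows, cols) with rows * cols == n.
    """
    def unit(M):
        # Normalise columns so the distance reflects pattern, not scale.
        return M / (np.linalg.norm(M, axis=0, keepdims=True) + 1e-12)

    D, V = unit(dictionary), unit(test_cols)
    # Pairwise Euclidean distances, shape (k, n).
    dist = np.linalg.norm(D[:, :, None] - V[:, None, :], axis=0)
    score = dist.min(axis=0)                  # distance to the closest atom
    score_img = score.reshape(grid_shape)     # back onto the block grid
    return score_img, score_img > cutoff

# Toy usage: two atoms; the second test block matches neither of them.
atoms = np.eye(4)[:, :2]
blocks = np.stack([3.0 * np.eye(4)[:, 0], np.eye(4)[:, 2]], axis=1)
score_img, mask = outlier_map(atoms, blocks, (1, 2), cutoff=0.5)
```

In this toy case the first block is a scaled copy of an atom and scores close to zero, while the second is orthogonal to both atoms and is flagged.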
The next section discusses the results obtained.

RESULT ANALYSIS
The algorithms for achieving the research goal of outlier detection were implemented in Matlab using the UCSD pedestrian dataset. The dataset consists of approximately 6800 image sequences in which various pedestrians are walking in different directions on the given path. The anomalous object is an individual cycling along the pedestrian path. The test images are four times the number of training images and come with ground truth images. The visual outcomes of the unsupervised training operation are shown in Table 1. Table 1 shows sample visual outcomes of the 1st, 50th, and 100th frames for the 1st, 10th, 20th, and 30th training datasets. The complete training of 6800 images, including feature extraction, took approximately 0.7621 seconds on a Core i5 processor. During training, all the features of any mobile objects are extracted and passed to the next phase of the algorithm implementation, i.e., testing. The complete training is carried out with a sparse regularization parameter of 10 and a block size of 5x5. The challenging situation is to ensure detection of the anomalous object, i.e., a cyclist, a person moving with a cart, etc. This is challenging because the dataset has different numbers of objects (persons) moving at different speeds, in different directions, and with different mobility patterns, from which no specific pattern can be drawn. Therefore, this case study closely matches a real-time scenario. The proposed system solves this problem by using the blocking operation, which allows explicit extraction of a feature from specific blocks of the image. This operation assists in the formulation of a multivariate analysis of the different coefficients extracted from the object in such a linear pattern that the decomposed matrix, as well as the original matrices, have only positive elements.
This concept used in training has one dominant advantage: any anomalous object identified during training is maintained in a different matrix, which bears a separate index of negative elements. Hence, the system proposes a supermatrix in which one matrix holds only non-anomalous information, while the other holds only the indices of the cell positions that map to anomalous information. This concept of unsupervised training not only trains faster but also offers enhanced accuracy with 90% memory efficiency, as the matrix only stores the indices of anomalous objects. For better inference of the study outcome, the proposed system is also compared with one of the most relevant studies, by Cong et al. [44], as shown in Figure 2. Cong et al. [44] addressed a similar problem, i.e., detection of anomalies in video to assist an event detection system, but used a segmentation-based approach on a similar database. We hypothetically compare the theoretical outcomes of the existing system with the proposed system with respect to conventional accuracy parameters, e.g., recall, precision, specificity, and F1-Score, and find that the proposed system offers better accuracy in the identification of outliers than the existing system.
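For reference, the four accuracy parameters named above follow directly from the counts of true positives (tp), false positives (fp), true negatives (tn), and false negatives (fn). The sketch below uses the standard definitions; the function name `detection_metrics` is hypothetical, and the counts in the usage line are invented for illustration and are not results from this study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return recall, precision, specificity and F1 from confusion counts.

    A ratio with a zero denominator is reported as 0.0 (a convention
    chosen here, not taken from the paper).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }

# Illustrative counts only.
metrics = detection_metrics(tp=8, fp=2, tn=85, fn=5)
```

Reporting specificity alongside recall matters for this task because outliers are rare, so a detector can score well on one while performing poorly on the other.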

CONCLUSION
This paper presents a framework that emphasizes the contextual information of a scene. As a scene can have multiple heterogeneous contexts, we apply multivariate analysis to perform matrix decomposition. The presented technique extracts the contextual features of the scene and thereby significantly assists in identifying outliers. Algorithms are designed for the blocking operation, for unsupervised learning using a sparsity factor, and finally for identification of the outlier objects.