Online Reweighted Least Squares Robust PCA

This letter deals with the problem known as robust principal component analysis (RPCA), that is, the decomposition of a data matrix into the sum of a low-rank matrix component and a sparse matrix component. After expressing the low-rank component in factorized form, we develop a novel online RPCA algorithm that is based entirely on reweighted least squares recursions and is suitable for sequential data processing. The proposed algorithm is fast and memory optimal and, as corroborated by indicative empirical results on simulated data and a video processing application, competitive with the state of the art in terms of estimation performance.

It is well known that in RPCA the data signal is commonly decomposed as the sum of a low-rank term, a sparse term and a statistically modeled noise term, i.e., Y = L + S + E (1), with Y, L, S, E ∈ R^{m×n}. The columns of matrix L span a low-dimensional subspace and only a few of the entries of S have non-negligible values. The entries of E are usually assumed to be zero-mean Gaussian. The classical approach to obtaining both L and S from Y is to solve a minimization problem whose objective function consists of a data proximity term (based on the statistics of E) and two regularization terms that promote low-rankness and sparsity on L and S, respectively. In most cases, the data matrix Y is processed as a whole, leading to batch iterative decomposition algorithms. Nevertheless, in modern big data applications, data sequences may be long or data may arrive in a streaming fashion, as, e.g., in video processing applications. Clearly, in such cases a batch processing algorithm would be prone to fail, due to high computational complexity, latency and large memory demands. It should be noted that, although batch solvers for RPCA have been extensively investigated (see [2] and the references therein), studies on online RPCA schemes are rather limited. In this regard, the algorithms presented in [15] and [16] (which, among others, come with complete correctness guarantees [2]) can be considered the state of the art. Other online RPCA schemes, such as those presented in [17], [18] and [19], albeit fast and memory efficient, show mediocre to poor performance [2], [16].
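To make the model concrete, the following minimal NumPy sketch generates a matrix obeying Y = L + S + E. The dimensions, outlier density and noise level used here are illustrative choices, not those of the experiments reported later in the letter:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 50, 200, 5  # illustrative ambient dimensions and true rank

# Low-rank term L: product of two thin Gaussian factors
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Sparse term S: about 5% of the entries carry large outliers
S = np.zeros((m, n))
mask = rng.random((m, n)) < 0.05
S[mask] = rng.uniform(-10, 10, size=mask.sum())

# Noise term E: i.i.d. zero-mean Gaussian entries
E = 0.01 * rng.standard_normal((m, n))

Y = L + S + E  # observed data matrix
```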
Capitalizing on our previous work on low-rank matrix factorization [20], [21], we present a new online RPCA algorithm that is based entirely on reweighted least squares (LS) time recursions. In a batch setting, iteratively reweighted least squares (IRLS) has been successfully utilized in [22] to promote sparsity and extended in [20] for low-rank matrix factorization. In the present letter, we suitably combine these two schemes in a unique RPCA solver. More specifically, after expressing the matrix L as the product of two low-rank matrices, we define a new objective function, in which the matrix factors of L are penalized via their reweighted Frobenius norms as in [20] and the sparse component S is penalized using reweighted LS as in [22]. The respective minimization problem is then solved via alternating IRLS, leading to a batch RPCA algorithm that comprises efficient matrix-matrix operations only. Moving one step further, we reformulate the initial objective function described above by adjusting it to the online scenario that allows the processing of data on a matrix column-by-matrix column basis. The solution of this minimization problem gives rise to an online reweighted LS RPCA algorithm, in which all sought quantities are computed time-recursively. The proposed algorithm is very fast and turns out to be memory optimal, i.e., has the minimum possible memory requirements. Indicative empirical results verify the high computational efficiency of the new algorithm and demonstrate that it may be competitive, in terms of estimation performance, to the state-of-the-art NORST algorithm [15], on both simulated data and real video foreground-background separation tasks.
Notations: Matrices are denoted by bold capital letters, e.g., X. We adopt Matlab notation for matrix entries, matrix rows and matrix columns, i.e., X(i, j) is the ijth element of X and X(i, :), X(:, j) are the ith row and jth column of X, respectively. The symbols ⊙ and ⊘ are used for entry-wise multiplication and division, respectively; transposition is denoted by (·)^T; ‖·‖_2 and ‖·‖_F denote the ℓ2 and the Frobenius norms; σ_i(X) is the ith singular value of matrix X; and 1 is the all-ones vector or matrix, depending on the context. Finally, R_+ is the set of non-negative real numbers.

II. PROBLEM FORMULATION AND BATCH IRLS RPCA
Using a matrix factorization (MF) representation for the low-rank component L, i.e., L = UV^T, (1) is rewritten as Y = UV^T + S + E (2), where U ∈ R^{m×d}, V ∈ R^{n×d} and d ≪ m, n. Note that, due to the last inequality, MF implicitly imposes a low rank on L, although d may still be much greater than its true rank. In (2), the columns of U span a low-dimensional subspace in which the columns of L lie, and V contains the representation coefficients in this subspace. The estimation of U, V, S given Y is an ill-posed problem that can only be tackled by imposing certain constraints on the sought matrix variables. In this regard, we propose the minimization problem in (3), which falls entirely within the IRLS framework, with λ, μ being regularization parameters. The weighting matrix D is a diagonal matrix defined as in (4) [20], while the ijth element of the weight matrix W is given in (5), where a very small positive constant provides protection against singularities. The first term in the objective function of (3) is a model fitting term, the second term promotes the low-rankness of L = UV^T and the third term induces sparsity on S. While the latter is very well known from the sparse vector recovery literature [22], the low-rank-imposing properties of the second term have not yet been rigorously established. In this vein, an enlightening result is presented and proved next [23].
Proof: The function f(x) = x^p with 0 < p < 1 is concave and non-decreasing on [0, ∞), with f(0) = 0. Then, by an appropriate definition, Theorem 1 and Corollary 1 of [24] directly yield (6). Notice that (6) generalizes the expressions for the variational form of the nuclear norm [25], which are obtained by setting p = 1 in (6) (recall that the nuclear norm is the Schatten-1 norm). While the nuclear norm is equivalent to the ℓ1 norm of the singular values of L, the Schatten-p norm is equivalent to the ℓp norm with 0 < p < 1 and is thus more effective in imposing low rank. If, now, the small positive constant is neglected in (4), we easily get (7). From (6) and (7) we see that the second term in (3) is an upper bound of the Schatten-1/2 norm and is thus expected to impose low rank on L, as desired. An alternative interpretation is given in [20] by recognizing (7) as the ℓ1,2 norm of the matrix [U; V]. Due to the column-sparsity-promoting property of the ℓ1,2 norm [27], low rank may be enforced on L via the joint elimination of corresponding columns of U and V [20].
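For the reader's convenience, the p = 1 instance of (6) is the well-known variational characterization of the nuclear norm [25] (a standard identity, stated here as a reference point):

\[
\|L\|_{*} \;=\; \min_{U, V \,:\, L = U V^T} \; \tfrac{1}{2}\left(\|U\|_F^2 + \|V\|_F^2\right),
\]

which makes explicit how penalizing the (squared) Frobenius norms of the factors acts as a surrogate for penalizing the singular values of L.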
In our problem, both sparsity and low-rankness may be imposed via IRLS with the aid of the weighting matrices D and W. In this regard, by minimizing the objective function in (3) alternately with respect to U, V and S (with D and W computed from the estimates of U, V and S at the previous iteration k), we end up with the batch IRLS RPCA algorithm tabulated as Algorithm 1. In analogy to our previous work [20], Algorithm 1 can also be derived following a block successive upper bound minimization (BSUM) procedure [28] with the matrices U, V and S as blocks. The upper bound functions minimized for each block satisfy the conditions of BSUM, and minimization leads in all three cases to closed-form solutions. Thus, according to Theorem 1 of [28], the resulting algorithm converges to stationary points of the objective function of (3).

III. ONLINE REWEIGHTED LS RPCA ALGORITHM
As mentioned previously, in many applications (e.g., online video processing) data may become available sequentially, on a frame-by-frame or block-by-block basis. In our setting, this means that we may need to process the columns of the data matrix Y sequentially, in an online fashion. In such an online scenario, given Y(:, n) at time n, we must be able to estimate S(:, n), update the fixed-size (m × d) subspace matrix estimate U_{n−1} to U_n and compute V(n, :), the new row of V. In addition, a "forgetting" mechanism should be included to account for possible time variations of the data subspace. Based on the above, we define, for each time iteration n, a minimization problem over U_n, V(n, :) and S(:, n), given in (8). Finally, U_n, the subspace matrix estimate at time n, is obtained from (8) as follows. By defining T_n = (Y_n − S_n)Ξ_n V_n and P_n = V_n^T Ξ_n V_n, we easily derive the efficient recursive equations T_n = ξT_{n−1} + (Y(:, n) − S(:, n))V(n, :) and P_n = ξP_{n−1} + V(n, :)^T V(n, :).
By properly arranging the above equations and suitably adjusting the time indexes of the various quantities, we obtain the online RPCA algorithm tabulated as Algorithm 2.
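The time-recursive structure can be sketched as follows. The T_n and P_n recursions are those given above; the inner v/s updates use a plain ridge LS step and an ℓ1-type IRLS weight, which are assumptions standing in for the letter's exact steps, and the function name and parameter defaults are illustrative:

```python
import numpy as np

def online_rls_rpca_step(y, U, T, P, xi=0.99, lam=1e-2, mu=1.0,
                         eta=1e-3, K=3):
    """One time iteration of an online reweighted-LS RPCA recursion.

    T and P should be initialized to a full-rank pair (e.g., T = U0,
    P = I) so the recursion does not start from a degenerate rank-1
    subspace estimate.
    """
    d = U.shape[1]
    s = np.zeros_like(y)
    w = np.ones_like(y)
    for _ in range(K):  # K inner reweighting passes
        # Representation coefficients of the current column (ridge LS)
        v = np.linalg.solve(U.T @ U + lam * np.eye(d), U.T @ (y - s))
        # Reweighted LS estimate of the sparse vector
        s = (y - U @ v) / (1.0 + mu * w)
        w = 1.0 / (np.abs(s) + eta)
    # Exponentially weighted recursions from the text
    T = xi * T + np.outer(y - s, v)
    P = xi * P + np.outer(v, v)
    # Subspace update; the small ridge keeps the inverse well posed
    U = T @ np.linalg.inv(P + lam * np.eye(d))
    return U, T, P, v, s
```

Only T (m × d), P (d × d) and the current column are kept in memory, which is what makes this style of recursion memory optimal.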
The main features of the proposed online reweighted LS RPCA algorithm are summarized in the following remarks.
1) To estimate the sparse vector S(:, n), the first two steps of Algorithm 2 are repeated K times for each time iteration n. As verified in our experiments, a value of K = 2 or 3 is sufficient in most cases.
2) To compute D_n we do not need to store the whole matrix V_n (whose size increases with time); it can be shown that the squared Euclidean norms of the columns of V_n required in (4) can be updated recursively.
3) The per-iteration cost is dominated by the operations required to compute V(n, :) and U_n. According to the information provided in [2], the proposed scheme can be considered one of the fastest online RPCA algorithms.
4) As verified in our experiments, the new online algorithm converges fast starting from a random subspace initialization, i.e., 'good' initial subspace estimates are not necessary for the algorithm to work properly, as they are, e.g., in [15] and [16].
5) The algorithm uses two regularization parameters, λ and μ, which can be selected via cross-validation. We have observed, though, that (possibly due to its IRLS nature) the proposed scheme has low sensitivity to λ and μ.
The effectiveness and high computational efficiency of the proposed algorithm are corroborated in the next section, where some indicative empirical results are presented.

IV. NUMERICAL SIMULATIONS
The proposed algorithm is compared with the state-of-the-art NORST algorithm, reported in [15]. As is shown in [2], NORST outperforms other state-of-the-art algorithms, while it also enjoys theoretical guarantees regarding its performance. NORST's Matlab code has been provided by the authors of [15], whom we would like to thank.

A. Simulated Data Experiment
In this experiment we aim to test the recovery accuracy of the proposed algorithm for both the low-rank and the sparse components. The simulated low-rank component L ∈ R_+^{800×5000} is constructed from factors whose entries take values from a uniform distribution on the interval [0, 1]. By construction, the first 2000 columns of L lie in a 10-dimensional subspace and its last 3000 columns in an 8-dimensional subspace. We simulate a sparse (outlier) matrix S ∈ R_+^{800×5000} having 5% nonzero entries, randomly distributed in the matrix, which take values from a uniform distribution on the interval [0, 20]. Independent, identically distributed zero-mean Gaussian noise with variance σ² = 10^{−2} is also added to the data, which thus obey the model given in (1). The tested online algorithms process the resulting data matrix Y column by column.
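The simulated data can be reproduced along the following lines. The letter does not spell out the factor construction; drawing the factor entries uniformly on [0, 1] is one plausible (assumed) choice that yields the stated non-negativity and subspace dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n1, n2 = 800, 2000, 3000  # dimensions from the experiment

# Factor entries drawn uniformly on [0, 1] (an assumed construction):
# the first 2000 columns of L lie in a 10-dimensional subspace,
# the last 3000 in an 8-dimensional one.
L = np.hstack([
    rng.uniform(0.0, 1.0, (m, 10)) @ rng.uniform(0.0, 1.0, (10, n1)),
    rng.uniform(0.0, 1.0, (m, 8)) @ rng.uniform(0.0, 1.0, (8, n2)),
])

# Sparse outlier matrix: 5% nonzero entries, uniform on [0, 20]
S = np.zeros((m, n1 + n2))
mask = rng.random(S.shape) < 0.05
S[mask] = rng.uniform(0.0, 20.0, size=mask.sum())

# i.i.d. zero-mean Gaussian noise with variance 1e-2
E = np.sqrt(1e-2) * rng.standard_normal(S.shape)

Y = L + S + E  # data matrix processed column by column
```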
The proposed algorithm is initialized randomly and its parameters are selected as λ = 200 and μ = 0.5. The forgetting factor ξ is set to 0.99. The NORST algorithm is initialized as described in [15]. The algorithms are assessed in terms of the normalized mean squared error (NMSE) between the true and the estimated columns of both the low-rank and the sparse components. These measures are computed as averages over 100 independent realizations of the algorithms and are depicted in Fig. 1. We observe that the proposed online reweighted LS RPCA algorithm attains a lower NMSE than NORST in the recovery of both the low-rank and the sparse terms of the data matrix. Both algorithms are able to detect the change of the low-rank data subspace dimension at time iteration 2000, but again Algorithm 2 converges to a lower NMSE. It should be noted that Algorithm 2 starts with an overestimate (d = 20) of the data subspace dimension and, as it evolves sequentially, reduces it first to 10 and, after time iteration 2000, to 8, thus correctly estimating the true ranks of L1 and L2. In addition, in this specific experiment, Algorithm 2 processes on average 658 matrix columns/sec and NORST 151 columns/sec, which verifies the computational efficiency of the proposed method.
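The per-column NMSE used for the curves of Fig. 1 can be computed with a helper along these lines (a straightforward sketch; averaging over the 100 independent realizations is left to the caller):

```python
import numpy as np

def column_nmse(X_true, X_est):
    """Per-column normalized MSE: ||x_n - xhat_n||^2 / ||x_n||^2.

    Returns one NMSE value per column; the small floor on the
    denominator only guards against all-zero columns.
    """
    num = ((X_true - X_est) ** 2).sum(axis=0)
    den = np.maximum((X_true ** 2).sum(axis=0), 1e-12)
    return num / den
```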

B. Real Data Experiments
In this section we validate the effectiveness of the proposed algorithm on the video foreground-background separation problem. To this end, two popular datasets are utilized, namely the Meeting Room dataset and the Lobby dataset, described next. In this experiment, the convex batch PCP algorithm of [1] is also used for comparison purposes. It should be noted that the parameters of the NORST algorithm are set as suggested by the authors in [15], and the regularization parameter of PCP has been fine-tuned. Moreover, the subspace matrix of NORST is initialized by adopting the same scheme as in [15]. For the proposed online reweighted LS RPCA algorithm, we initialize the rank of the background to 5, the low-rank regularization parameter λ is set to 640 and the sparsity-promoting parameter μ is fixed to 0.1. Additionally, ξ = 0.99 and K = 3 inner iterations are performed to estimate the sparse foreground frames. The same settings are used for the experiments conducted on both real datasets.
1) Meeting Room Dataset: This video dataset consists of 2964 grayscale images (frames) of spatial resolution 64 × 80. Since the first 1755 video frames contain no "outliers," we consider only the last 1209 frames. In this part of the video, after the first 400 frames, a man wearing a white shirt enters the room. Since the color of the curtain in the room is the same as the color of the man's shirt, the foreground-background separation task becomes quite challenging. As can be observed in Fig. 2, the proposed reweighted LS RPCA algorithm offers background estimation results comparable to those of NORST, even though it has been randomly initialized. Both algorithms outperform PCP which, being a batch algorithm, fails to capture the dynamic changes of the background. In addition, Algorithm 2 processes 344 frames/sec and is hence much faster than NORST, which achieves a rate of 16 frames/sec.
2) Lobby Dataset: This dataset consists of 1555 frames of 128 × 160 pixels. Background estimation results of NORST, Algorithm 2 and PCP for three frames of this dataset are shown in Fig. 3. We see that, especially in video frames 350 and 610, Algorithm 2 is able to remove foreground information more reliably than NORST and PCP. Moreover, it processes the video sequence much faster, at a rate of 277 frames/sec, compared to the 15 frames/sec of NORST. These runtime results indicate that the proposed algorithm might be used for near-real-time video processing.

V. CONCLUSION
In this letter a novel online RPCA algorithm has been presented. The main characteristic of the proposed algorithm is that both the factors of the low-rank component and the sparse component of the data matrix are effectively updated via a reweighted LS optimization mechanism. The new online estimation algorithm exhibits high computational and memory efficiency without compromising performance. A theoretical convergence and correctness analysis of the algorithm is under current investigation.