Block sparse vector recovery via weighted generalized range space property

In block sparse vector recovery problems, we are interested in finding the vector with the fewest active blocks that best describes the observation. The convex relaxation of that problem, typically used to reduce complexity, is strictly equivalent to the original problem only when certain conditions are met, such as the Restricted Isometry Property, the Null Space Characterization, and the Block Mutual Coherence. In practice, those conditions may not be satisfied, which implies that solving the relaxed problem may not retrieve the block sparsest solution. In this paper, we propose a weighted approach which, in the noise-free case and under certain conditions, guarantees that the solution of the relaxed problem has the same support as the sparsest block vector. The weights can be obtained from a low resolution estimate of the group sparse signal.


I. INTRODUCTION
Compressed Sensing (CS) and Sparse Signal Recovery arise in many signal processing applications, including biomedical imaging [1], [5], [13], [15], [19] and radar [3], [4], [11], [14], [20]. In sparse signal recovery, we are interested in finding the best possible representation of the observation vector using a vector with the smallest number of non-zero entries. Mathematically, this can be represented as
(PL0) $\min_{x} \|x\|_0 \quad \text{s.t.} \quad y = Ax$, (1)
where the $\ell_0$-norm, $\|\cdot\|_0$, represents the number of non-zero entries in a vector, $x \in \mathbb{R}^{n \times 1}$ is the minimization variable, $A \in \mathbb{R}^{r \times n}$ is the dictionary matrix, and $y \in \mathbb{R}^{r \times 1}$ is the observation vector. It has been shown that (1) is an NP-hard problem [16]. To tackle the complexity associated with the $\ell_0$-norm problem, a relaxed convex $\ell_1$-norm approximation is often used to find a sparse solution. The relaxed problem can be written as
(PL1) $\min_{x} \|x\|_1 \quad \text{s.t.} \quad y = Ax$. (2)
The problems (PL0) and (PL1) are said to be strictly equivalent if they both have a unique solution and the two solutions coincide [21]. When the problem (PL0) has multiple solutions, and the solution of (PL1) coincides with one of the solutions of (PL0), we say that (PL0) and (PL1) are equivalent [21]. (PL0) and (PL1) are strictly equivalent when certain conditions are met, such as the Restricted Isometry Property (RIP) [6], the Null Space Property (NSP) [7], the Mutual Coherence [8], or the Range Space Property (RSP) of order K [21]. In practice, however, those conditions may not be met, which means that the least $\ell_1$-norm solution and the least $\ell_0$-norm solution are not the same. In [1], a weighted approach is proposed for recovering the support of the sparsest solution in cases in which the dictionary matrix exhibits high coherence. Also, [12] discusses the optimal choice of the weights for the weighted approach, using the location of the support of the signal, such that the minimum number of measurements is needed for exact recovery.
Work supported in part by NSF under grant NSF ECCS 1408437.
In some applications, it is known in advance that the non-zero entries of the underlying sparse vector occur in groups, a property known as block sparsity. In block sparse signal recovery problems, we are interested in finding the vector with the least number of non-zero blocks that explains the observed vector. If we let $m$ represent the number of groups in $x$, the block sparsest vector estimation problem can be written as [10]
(PG0) $\min_{x} \sum_{i=1}^{m} I(\|x_i\|_2 > 0) \quad \text{s.t.} \quad y = Ax$, (3)
where $x_i$ is the $i$-th block of $x$ and $I(\cdot)$ is the indicator function. (PG0) is hard to solve, so its convex relaxation is often considered, which consists of finding a vector with the smallest sum of the $\ell_2$-norms of the blocks, i.e.,
(PG1) $\min_{x} \sum_{i=1}^{m} \|x_i\|_2 \quad \text{s.t.} \quad y = Ax$. (4)
In general, (3) and (4) are not equivalent. Several works have provided conditions for strict equivalence between (PG0) and (PG1). Those include the generalization of the RIP condition [10], the Null Space Characterization of [18], and the generalization of the Mutual Coherence [9]. In [2], a generalized RSP (GRSP) is proposed for group sparse under-determined systems, where a set of necessary and sufficient conditions for a sparse vector $x$ to be a solution to the problem in (4) is provided.
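To make the two block objectives concrete, the following minimal sketch evaluates the number of active blocks (the objective of (PG0)) and the sum of block $\ell_2$-norms (the objective of (PG1)) for a given vector. It assumes equal-size consecutive blocks of length k; the function names are illustrative and do not come from the paper.

```python
import numpy as np

def block_norms(x, k):
    """l2-norm of each consecutive length-k block of x (equal block size assumed)."""
    x = np.asarray(x, dtype=float)
    if x.size % k != 0:
        raise ValueError("len(x) must be a multiple of the block size k")
    return np.linalg.norm(x.reshape(-1, k), axis=1)

def active_block_count(x, k, tol=1e-12):
    """Objective of (PG0): number of blocks with non-zero l2-norm."""
    return int(np.sum(block_norms(x, k) > tol))

def mixed_l2_l1_norm(x, k):
    """Objective of (PG1): sum of the block l2-norms."""
    return float(np.sum(block_norms(x, k)))

# Three blocks of size 2; the block norms are 5, 0 and 1.
x = np.array([3.0, 4.0, 0.0, 0.0, 0.0, 1.0])
n_active = active_block_count(x, 2)
pg1_value = mixed_l2_l1_norm(x, 2)
```

The tolerance `tol` is an implementation choice for floating-point data, not part of the problem definition.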
In this paper, we propose a weighted approach to address the cases in which strict equivalence conditions may not be satisfied. We show that by multiplying the sensing matrix by a diagonal matrix $W$, we transform the problem into one that satisfies the GRSP, and we provide the conditions under which the solution of the transformed problem has the same support as the sparsest block vector. By multiplying by the weighting matrix $W$, the range space of $A$ is rotated in such a way that solving (PG1) favors the underlying block sparse vector.
The paper is organized as follows. In Section II, we discuss the background theory related to the proposed approach. Section III introduces the proposed approach for the noise-free and noisy observation cases, Section IV presents simulation results, and Section V provides concluding remarks.

II. BACKGROUND THEORY
The GRSP was proposed in [2]. The conditions for equivalence between (PG0) and (PG1) are stated in the following theorem.
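Theorem 1: Let $x_s$ be the block sparsest solution of (PG0). Then $x_s$ is also a unique solution to problem (PG1) if and only if there is a vector $u^* \in \mathcal{R}(A^T)$ such that
$u^*_{S_i} = \dfrac{x_{s_i}}{\|x_{s_i}\|_2}$ for $i \in S$, and $\|u^*_{\bar{S}_i}\|_2 < 1$ for $i \in \bar{S}$, (5)
where $\mathcal{R}(\cdot)$ represents the range space of a matrix, $S$ is the set of indices of the active blocks of $x_s$, and $\bar{S}$ is its complement. A sufficient condition for (6) to have a unique solution was also provided in [2].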
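The range space condition can be probed numerically. The sketch below builds one candidate certificate $u = A^T v$ by taking the minimum-norm $v$ that matches the normalized active blocks of $x_s$, then checks whether the non-active blocks of $u$ have $\ell_2$-norm below one. This is an illustrative check, not the procedure of [2]: the choice of the minimum-norm $v$, the equal block size k, and the function name are assumptions. A True result exhibits a valid certificate; a False result is inconclusive when other choices of $v$ exist.

```python
import numpy as np

def grsp_certificate(A, x_s, k, tol=1e-10):
    """Try to certify the range space condition with the minimum-norm v.

    Returns (ok, u) where u = A^T v and ok is True when u matches the
    normalized active blocks of x_s and has norm < 1 on the other blocks.
    """
    A = np.asarray(A, dtype=float)
    blocks = np.asarray(x_s, dtype=float).reshape(-1, k)
    norms = np.linalg.norm(blocks, axis=1)
    active = norms > tol                      # block-level support S
    cols = np.repeat(active, k)               # column mask for A_S
    u_S = (blocks[active] / norms[active, None]).ravel()
    # Minimum-norm solution of A_S^T v = u_S (assumed choice of v).
    v, *_ = np.linalg.lstsq(A[:, cols].T, u_S, rcond=None)
    u = A.T @ v
    on_support_ok = np.allclose(u[cols], u_S, atol=1e-8)
    off_norms = np.linalg.norm(u.reshape(-1, k)[~active], axis=1)
    return bool(on_support_ok and np.all(off_norms < 1.0)), u

x_s = np.array([3.0, 4.0, 0.0, 0.0])          # one active block of size 2
ok_easy, u_easy = grsp_certificate(np.eye(4), x_s, 2)
# Here the second block of columns is a scaled copy of the first one.
A_hard = np.array([[1.0, 0.0, 2.0, 0.0],
                   [0.0, 1.0, 0.0, 2.0]])
ok_hard, u_hard = grsp_certificate(A_hard, x_s, 2)
```

In the second example $A_S^T$ is square and invertible, so $v$ is unique and the failed check means that no certificate exists for this $x_s$.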

III. THE PROPOSED APPROACH
A. Noise Free Group Sparse Vectors
By substituting $t_i = \|x_i\|_2$, (PG1) can be recast as a Second Order Cone Program (SOCP) [10], i.e.,
(PG2) $\min_{x,\,t} \sum_{i=1}^{m} t_i \quad \text{s.t.} \quad y = Ax, \quad \|x_i\|_2 \le t_i, \; i = 1, \dots, m$. (6)
One can see that the problems in (4) and (6) are equivalent in the sense that the optimal solution of both problems is the same. Suppose that the strict equivalence conditions for (PG0) and (PG1) do not hold. Below, we show that by multiplying the sensing matrix by a diagonal matrix $W$, we transform the problem into one that satisfies the conditions in Theorem 1. We provide a sufficient condition for the weighted problem to satisfy the GRSP. In the following, $S$ represents the support of $x_s$ (i.e., the indices of the active groups), and $\bar{S} = \{1, 2, \dots, m\} \setminus S$ is the complement of $S$.
Theorem 2: If condition (7) holds for all $i \in \bar{S}$, then $x_s$ is the solution to the problem in (4), where $A_{\bar{S}_i}$, for $i \in \bar{S}$, is the collection of columns in $A$ associated with the $i$-th non-active block of $x_s$, $A_S$ is the concatenation of the blocks in $A$ that are associated with the active blocks of $x_s$, and $u$ is a vector with $u_{S_i} = x_{s_i} / \|x_{s_i}\|_2$ for $i \in S$ and $\|u_{\bar{S}_i}\|_2 < 1$ for $i \in \bar{S}$.
Proof: We prove the contrapositive of the theorem, i.e., we show that if the GRSP conditions of Theorem 1 are not satisfied, then the condition of the theorem is violated for some $i \in \bar{S}$. Let $A^T v = u$, and suppose that the first condition of (5) is satisfied, but the second condition is not. Then we have
$A_S^T v = u_S, \quad \|A_{\bar{S}_i}^T v\|_2 \ge 1$ for some $i \in \bar{S}$. (8)
On solving the first relation for $v$ and substituting the solution in (8), we find that the condition of the theorem is violated for that $i$.
Revisiting the original problem in (3), we have
$\min_{x} \sum_{i=1}^{m} I(\|x_i\|_2 > 0) \quad \text{s.t.} \quad y = Ax$. (10)
Let $x = Wq$, where $W$ is a diagonal weight matrix with the structure $W = w \otimes I_{k \times k}$, where $w$ is a diagonal matrix that contains the weights, $\otimes$ represents the Kronecker product, and $I_{k \times k}$ is the identity matrix of size $k \times k$, with $k$ representing the group size. Based on its structure, $W$ assigns the same weight to all elements of a block. Then, (10) can be rewritten as
$\min_{q} \sum_{i=1}^{m} I(\|w_i q_i\|_2 > 0) \quad \text{s.t.} \quad y = AWq$. (11)
Since $W$ is non-zero at the support of $x$, (11) can be rewritten as
$\min_{q} \sum_{i=1}^{m} I(\|q_i\|_2 > 0) \quad \text{s.t.} \quad y = AWq$. (12)
The problem in (12) is NP-hard. Its convex relaxation can be solved instead, by replacing the indicator function with the sum of the energies of the active groups of $q$, i.e.,
$\min_{q} \sum_{i=1}^{m} \|q_i\|_2 \quad \text{s.t.} \quad y = AWq$. (13)
Suppose that the block sparsest solution of (PG1) does not satisfy the GRSP conditions in Theorem 1. We can manipulate $W$ such that the solution to (13) satisfies the conditions for a vector that has the same support as the sparsest block sparse solution of (PG1).
To make the support of the solution of (12) coincide with that of the sparsest vector $x$, we have to choose $W$ such that condition (14) holds, where $W_{\bar{S}_i}$ contains the diagonal elements of $W$ that are associated with $i \in \bar{S}$, and $W_S$ is a diagonal matrix composed of the diagonal sub-matrices $W_{S_i}$ for $i \in S$. It is easy to show that by assigning high values to the $w_i$ that correspond to active groups, and low values to the $w_i$ that correspond to non-active groups, (14) is satisfied, and the support of the solution of (13) is the same as the support of the sparsest solution.
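The block-structured weight matrix and the change of variables $x = Wq$ can be sketched in a few lines of NumPy. The weights, the random matrix A and the vector q below are arbitrary illustrative values, and the function name is hypothetical; only the structure $W = w \otimes I_{k \times k}$ comes from the text.

```python
import numpy as np

def block_weight_matrix(w, k):
    """Build W = diag(w) kron I_k, giving every entry of block i the weight w[i]."""
    return np.kron(np.diag(np.asarray(w, dtype=float)), np.eye(k))

k = 2
w = [2.0, 0.1, 1.0]                 # illustrative: high weight = likely active block
W = block_weight_matrix(w, k)       # 6 x 6 diagonal matrix

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))     # stand-in sensing matrix
q = np.array([1.0, -1.0, 0.0, 0.0, 0.5, 0.0])
x = W @ q                           # original variable recovered from q

# The weighted problem uses the matrix A W in place of A: y = A x = (A W) q.
y = A @ x
```

Because all weights are non-zero, x and q share the same support, which is why the weighted problem can be solved in q and mapped back to x.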

B. The Noisy Case
In the case of noisy observations, we choose to trade off the sparsity of the solution against the fitting error. The minimization problem can be written as
$\min_{x} \; h \sum_{i=1}^{m} \|x_i\|_2 + \|y - Ax\|_2$, (15)
where $h$ weights the sparsity term against the fitting error. By setting $t_i = \|x_i\|_2$, $\forall i \in \{1, 2, \dots, m\}$, and $v = y - Ax$, we can write (15) as the constrained problem (16), whose dual is (17). Now, we provide the conditions for the sparsest block sparse vector to be the solution to (15). Those conditions are stated in the following theorem. In the following, we assume that the problem in (15) has a unique solution, and that Slater's condition and the strict complementary slackness condition are satisfied.
Theorem 3: $x^*$ is a solution to the problem $\min_{x} \; h \sum_{i=1}^{m} \|x_i\|_2 + \|y - Ax\|_2$ if and only if there is a $u^* \in \mathcal{R}(A^T)$ such that
$u_i^* = h \dfrac{x_i^*}{\|x_i^*\|_2}$ if $x_i^* \neq 0$, and $\|u_i^*\|_2 < h$ if $x_i^* = 0$. (18)
Proof: First, we prove necessity, i.e., if $x^*$ is a solution to (15), then there is $u^* \in \mathcal{R}(A^T)$ such that
$u_i^* = h \dfrac{x_i^*}{\|x_i^*\|_2}$ if $x_i^* \neq 0$, and $\|u_i^*\|_2 < h$ if $x_i^* = 0$. (19)
Consider the non-zero blocks of $x^*$ in (15). Differentiating the objective with respect to a non-zero block $x_i$, and equating to zero, we get
$h \dfrac{x_i^*}{\|x_i^*\|_2} - A_i^T \dfrac{y - Ax^*}{\|y - Ax^*\|_2} = 0$,
which implies that $A_i^T \dfrac{y - Ax^*}{\|y - Ax^*\|_2} = h \dfrac{x_i^*}{\|x_i^*\|_2}$.
For the zero blocks in $x^*$, let $t^*$, $v^*$, and $x^*$ be the solution of (16), and $\alpha^*$, $\lambda_1^*$, and $\lambda_2^*$ be the solution of (17). From the strict complementarity property, we should have $t_i^* + \lambda_{1i}^* > 0$, which implies $\lambda_{1i}^* > 0$ when $t_i^* = \|x_i^*\|_2 = 0$. From the second and third constraints of (17), we obtain (23), and from Slater's condition, we obtain (24). We can see that $\alpha^* = \dfrac{y - Ax^*}{\|y - Ax^*\|_2}$ is a solution to (24), since it satisfies (24) and the constraints in (17). So, indeed, there is a vector $u^* \in \mathcal{R}(A^T)$ that satisfies the conditions in (19).
Now, we prove sufficiency, i.e., if there is a $u^*$ that satisfies the conditions in (19), then $x^*$ is the solution to (15). Assume that $\hat{x} \neq x^*$ is the solution to (15); then (25) should hold. From the necessity part of this theorem, if $\hat{x}$ is a solution to (15), then there should be $\hat{\alpha} = \dfrac{y - A\hat{x}}{\|y - A\hat{x}\|_2}$ such that the corresponding conditions hold for $\hat{x}$. Assume that $\hat{\alpha}$, $\hat{\lambda}_1$, and $\hat{\lambda}_2$ are the dual solution set. The dual problem should attain its maximum at $\hat{\alpha}$, $\hat{\lambda}_1$, and $\hat{\lambda}_2$, which leads to (29) and contradicts the first assumption (25).
For the weighted problem, the problem can be restated as
$\min_{q} \; h \sum_{i=1}^{m} \|q_i\|_2 + \|y - AWq\|_2$, (30)
where $W = w \otimes I_{k \times k}$. The conditions of Theorem 3 can be easily modified to include the weights as follows:
$w_i A_i^T \alpha = h \dfrac{q_i}{\|q_i\|_2}$ if $q_i \neq 0$, and $w_i \|A_i^T \alpha\|_2 < h$ if $q_i = 0$, (31)
where $w_i$ is the $i$-th element of $w$, which represents the weight associated with the $i$-th block. Next, we provide a theorem showing that, if $w_i$ is less than a specific value, the corresponding group will be non-active.
Theorem 4: If $w_i < \dfrac{h}{\sqrt{\lambda_{\max}(A_i A_i^T)}}$, then $q_i = 0$, where $\lambda_{\max}(G)$ is the largest eigenvalue of the matrix $G$.
Proof: According to the second condition in (31), $w_i \|A_i^T \alpha\|_2 < h$ when $q_i = 0$, which can be rewritten as
$w_i^2 \, \alpha^T (A_i A_i^T) \alpha < h^2$. (32)
Since $\|\alpha\|_2 = 1$, we have $\alpha^T (A_i A_i^T) \alpha \in [\lambda_{\min}, \lambda_{\max}]$, and the maximum achievable value is $\lambda_{\max}$. On substituting $\lambda_{\max}$ in the above equation, we get
$w_i < \dfrac{h}{\sqrt{\lambda_{\max}(A_i A_i^T)}}$. (33)
We can see from Theorem 4 that assigning low values to the weights that correspond to non-active blocks guarantees that these blocks will be non-active in the estimated vector. For instance, if we assign low values to the $w_i$ associated with the non-active blocks, such that (33) is satisfied for all non-active blocks, and assign high values to the $w_i$ associated with active blocks, such that (33) is not satisfied, solving (15) will retrieve a vector with the same support as the underlying sparse vector. Since we do not know the real support of the underlying block sparse vector, we propose to use a low resolution estimate to construct the weighting matrix.
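The weight bound discussed above can be evaluated per block. The sketch below relies on the inequality $\|A_i^T \alpha\|_2 \le \sqrt{\lambda_{\max}(A_i A_i^T)}$ for any unit-norm $\alpha$, so a weight below $h / \sqrt{\lambda_{\max}(A_i A_i^T)}$ forces $w_i \|A_i^T \alpha\|_2 < h$. The unit norm of $\alpha$, the equal block size k, the example matrix and the function name are assumptions made for illustration.

```python
import numpy as np

def weight_thresholds(A, k, h):
    """Per-block bound h / sqrt(lambda_max(A_i A_i^T)) for blocks of k columns."""
    A = np.asarray(A, dtype=float)
    m = A.shape[1] // k
    thr = np.empty(m)
    for i in range(m):
        A_i = A[:, i * k:(i + 1) * k]
        # The spectral norm of A_i equals sqrt(lambda_max(A_i A_i^T)).
        thr[i] = h / np.linalg.norm(A_i, 2)
    return thr

A = np.array([[1.0, 0.0, 3.0, 0.0],
              [0.0, 1.0, 0.0, 3.0]])
h = 0.6
thr = weight_thresholds(A, 2, h)

# Sanity check with a random unit-norm alpha and weights just below the bound.
rng = np.random.default_rng(1)
alpha = rng.standard_normal(2)
alpha /= np.linalg.norm(alpha)
w = 0.99 * thr
lhs = np.array([w[i] * np.linalg.norm(A[:, 2 * i:2 * i + 2].T @ alpha)
                for i in range(2)])
```

Blocks whose columns have larger spectral norm get a smaller bound, so they need a lower weight to be kept non-active.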

IV. SIMULATION RESULTS
To evaluate the performance of the proposed approach, we test it on Synthetic Aperture Radar (SAR) imaging. To simulate the block sparsity scenario, we consider the case in which the target is composed of two adjacent pixels in the scene. We adopt the system that was used in [3] to simulate the sensing matrix $A$, and, following the assumption that the reflectivity of the targets does not depend on the observation angle, the system can be described as a linear system of the form $y = Ax$. The simulation parameters that were used to construct the sensing matrix $A$ are shown in Table 1. The ground patch used in the simulation is 40 m wide and 60 m long, and the scene is uniformly sampled on a grid with spacing 0.5 m. The distance between the antenna and the ground patch center is 1050 m. The diagonal weighting matrix $W$ used in the proposed approach is constructed by assigning to the diagonal of $W$ a low resolution estimate based on Spatial Frequency Interpolation [17]. Fig. 1-(a) shows two targets inside the scene of interest, while Fig. 1-(b) shows the spatial frequency interpolation. It is clear from Fig. 1-(b) that the spatial frequency interpolation provides a rough estimate of the target locations. Fig. 1-(c) shows the estimated targets using the estimate in Fig. 1-(b) as a weighting matrix in the proposed approach, while Fig. 1-(d) shows the estimate of the non-weighted approach. One can see from Fig. 1-(c) and Fig. 1-(d) that the non-weighted approach fails to estimate the actual targets, while the weighted approach estimates the target locations correctly.
Next, we conduct Monte Carlo simulations to test the performance of the weighted approach as compared to the non-weighted approach, both in the noise-free case and under additive white Gaussian noise. 100 Monte Carlo trials are performed. In each trial, $n$ block sources are randomly distributed around the scene, and a low resolution estimate ($w$) is constructed based on the estimation result of spatial frequency interpolation, to be used in the weighted approach. The performance metric is the success rate; we claim success when the indices of the $n$ largest blocks of the estimated source coincide with the active group indices of the actual vector. Fig. 2 shows the performance of the proposed approach versus the non-weighted approach as the number of active blocks in the actual source increases, for a block size of 3. One can see that the non-weighted approach degrades rapidly as the number of active blocks increases, while the proposed approach shows significantly better performance. Fig. 3 shows the performance of the proposed approach and that of the non-weighted approach for different SNRs, for a block size of 2. One can see that the performance of the non-weighted approach is poor even at high SNR, while the proposed approach shows good performance for SNR above 5 dB.
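The success criterion described above can be written as a short function. This is a sketch under the assumption of equal-size consecutive blocks of length k; the function names and the example vector are illustrative and are not taken from the simulation code of the paper.

```python
import numpy as np

def support_recovered(x_hat, true_blocks, k):
    """True when the n largest blocks of x_hat are exactly the true active blocks."""
    norms = np.linalg.norm(np.asarray(x_hat, dtype=float).reshape(-1, k), axis=1)
    n = len(true_blocks)
    top = np.argsort(norms)[::-1][:n]          # indices of the n largest blocks
    return set(top.tolist()) == {int(b) for b in true_blocks}

def success_rate(estimates, supports, k):
    """Fraction of trials in which the block support is recovered."""
    hits = [support_recovered(x, s, k) for x, s in zip(estimates, supports)]
    return float(np.mean(hits))

# Four blocks of size 2; blocks 1 and 3 carry most of the energy.
x_hat = np.array([0.1, 0.0, 2.0, 1.0, 0.0, 0.0, 0.5, 3.0])
hit = support_recovered(x_hat, [1, 3], 2)
miss = support_recovered(x_hat, [0, 1], 2)
rate = success_rate([x_hat, x_hat], [[1, 3], [0, 1]], 2)
```

Ties between block norms are resolved by the sort order, which is an implementation detail that the success criterion in the text does not specify.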

V. CONCLUSIONS
In this paper, a weighted approach has been proposed to solve for the block sparsest vector in scenarios in which the strict equivalence conditions may not hold. Simulation results have shown improved performance as compared to the non-weighted approach.