Nonparametric simultaneous sparse recovery: An application to source localization

We consider the multichannel sparse recovery problem, where the objective is to recover jointly sparse unknown signal vectors from multiple measurement vectors that are different linear combinations of the same known elementary vectors. Many popular greedy or convex algorithms perform poorly under non-Gaussian heavy-tailed noise or in the face of outliers. In this paper, we propose using mixed ℓ_{p,q} norms on the data fidelity (residual matrix) term together with the conventional ℓ_{0,2}-norm constraint on the signal matrix to promote row-sparsity. We devise a greedy pursuit algorithm based on the simultaneous normalized iterative hard thresholding (SNIHT) algorithm. Simulation studies highlight the effectiveness of the proposed approaches in coping with different noise environments (i.i.d., row i.i.d., etc.) and outliers. The usefulness of the methods is illustrated in a source localization application with sensor arrays.


INTRODUCTION
In the multiple measurement vector (MMV) model, a single measurement matrix is utilized to obtain multiple measurement vectors, i.e., y_i = Φx_i + e_i, i = 1, ..., Q, where Φ is an M × N measurement matrix and the e_i are the (unobserved) random noise vectors. Typically there are more column vectors φ_i than row vectors φ_(j), i.e., M < N. It is still possible to recover the unknown signal vectors x_i, i = 1, ..., Q, by assuming that the signals are sparse, i.e., some of their elements are zero. In matrix form, the MMV model reads

Y = ΦX + E,    (1)

where Y ∈ C^{M×Q}, X ∈ C^{N×Q} and E ∈ C^{M×Q} collect the measurement, the signal and the error vectors, respectively. When Q = 1, the model reduces to the standard compressed sensing (CS) model [5]. Rather than recovering the sparse/compressible target signals x_i separately using standard CS reconstruction algorithms, one attempts to recover all signals simultaneously (jointly). The key assumption for the success of this scheme is that the target signals are all predominantly supported on a common support set, i.e., the signal matrix X is K-rowsparse. Joint estimation can lead both to computational advantages and to increased reconstruction accuracy; see [1]–[6]. The objective of the multichannel sparse recovery problem is to find a row-sparse approximation of the signal matrix X based on knowledge of Y, the measurement matrix Φ and the sparsity level K. Such a problem arises, for example, when multichannel recordings of a sparse signal are contaminated by additive noise. Applications are many and include electroencephalography and magnetoencephalography (EEG/MEG) [5] and direction-of-arrival (DOA) estimation of sources in array processing [7].
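As a concrete illustration, the MMV model Y = ΦX + E with a K-rowsparse X can be synthesized as follows (a minimal NumPy sketch; the dimensions and noise level are arbitrary choices for illustration, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, Q, K = 32, 128, 8, 4  # measurements, atoms, channels, row-sparsity

# M x N complex measurement matrix
Phi = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# K-rowsparse N x Q signal matrix: only K rows are non-zero,
# so all Q channels share the same support
X = np.zeros((N, Q), dtype=complex)
support = rng.choice(N, size=K, replace=False)
X[support] = rng.standard_normal((K, Q)) + 1j * rng.standard_normal((K, Q))

# Noisy multiple measurement vectors
E = 0.01 * (rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q)))
Y = Phi @ X + E
```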
Most greedy CS reconstruction algorithms have been extended for solving MMV problems. These methods, such as the simultaneous normalized iterative hard thresholding (SNIHT) algorithm [6], are guaranteed to perform very well provided that suitable conditions (e.g., incoherence of Φ and non-impulsive noise) are met. The derived (worst-case) recovery bounds depend linearly on ‖E‖_2, so the methods are not guaranteed to provide accurate reconstruction/approximation under heavy-tailed non-Gaussian noise. In this paper, we consider different mixed ℓ_{p,q} norms on the data fidelity (residual matrix) term and devise a greedy SNIHT algorithm for obtaining a sparse solution. We focus on mixed ℓ_1 norms, which provide robust solutions. As will be shown in the sequel, these methods are based on spatial signs [8] of the residuals and are hence nonparametric in nature.
The paper is organized as follows. In Section 2 we formulate a mixed-norm constrained objective function for the MMV problem and motivate the usage of the ℓ_1-norm or the mixed ℓ_{2,1}- and ℓ_{1,2}-norms. In Section 3 we derive the greedy SNIHT algorithm for the problem. Section 4 provides simulation examples illustrating the improved accuracy of the proposed methods in various noise conditions and signal-to-noise ratio (SNR) settings. Our examples illustrate that joint sparse recovery is superior to applying standard sparse reconstruction methods to each channel individually. Finally, the effectiveness of the methods is illustrated in a source localization application with sensor arrays in Section 5.
By A_Γ (resp. A_(Γ)), we denote the M × K (resp. K × N) matrix restricted to the columns (resp. rows) of an M × N matrix A indexed by the set Γ. The row-support of X ∈ C^{N×Q} is the index set of rows containing non-zero elements: rsupp(X) = {i ∈ {1, ..., N} : x_ij ≠ 0 for some j}.
For p, q ∈ [1, ∞), the mixed ℓ_{p,q} norm [9] of X ∈ C^{N×Q} is defined by

‖X‖_{p,q} = ( Σ_{i=1}^N ( Σ_{j=1}^Q |x_ij|^p )^{q/p} )^{1/q} = ( Σ_{i=1}^N ‖x_(i)‖_p^q )^{1/q},

where x_(i) denotes the ith row vector of X. The mixed norms generalize the usual matrix p-norms: if p = q, then ‖X‖_{p,p} = ‖X‖_p. The ℓ_2-norm ‖·‖_2 is called the Frobenius norm and will be denoted shortly as ‖·‖. In the same spirit, the usual Euclidean norm on vectors is denoted shortly as ‖·‖. The row-ℓ_0 quasi-norm of a signal matrix X is the number of non-zero rows, i.e., ‖X‖_0 = |rsupp(X)|, where |Γ| denotes the cardinality of the set Γ. The matrix X is then said to be K-rowsparse (or shortly, sparse) if ‖X‖_0 ≤ K. We use H_K(·) to denote the hard thresholding operator: for a matrix X ∈ C^{N×Q}, H_K(X) retains the K rows of X that possess the largest ℓ_2-norms and sets the elements of the other rows to zero. We use the shorthand notation X|_Γ for the sparsified version of X in which the entries in the rows indexed by the set Γ remain unchanged while all other rows have all entries set to 0.
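These definitions translate directly into code. The NumPy sketch below (function names are our own) implements the mixed ℓ_{p,q} norm, the hard thresholding operator H_K and the row sparsification X|_Γ:

```python
import numpy as np

def mixed_norm(X, p, q):
    """l_{p,q} norm: l_p norm within each row, then l_q norm of the row norms."""
    row_norms = np.sum(np.abs(X) ** p, axis=1) ** (1.0 / p)
    return np.sum(row_norms ** q) ** (1.0 / q)

def hard_threshold(X, K):
    """H_K: keep the K rows with largest l_2-norms, zero out the rest."""
    row_norms = np.linalg.norm(X, axis=1)
    keep = np.argsort(row_norms)[-K:]  # indices of the K largest row norms
    out = np.zeros_like(X)
    out[keep] = X[keep]
    return out

def restrict_rows(X, Gamma):
    """X|_Gamma: keep the rows indexed by Gamma, zero out all others."""
    out = np.zeros_like(X)
    out[list(Gamma)] = X[list(Gamma)]
    return out
```

For p = q = 2, `mixed_norm` coincides with the Frobenius norm, which is a convenient sanity check.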

ROBUST MIXED NORM MINIMIZATION
Our objective is now to recover the K-rowsparse signal matrix X in the model (1).
For this purpose, we consider the following constrained optimization problem:

min_X ‖Y − ΦX‖_{p,q}^q subject to ‖X‖_0 ≤ K.    (P_{p,q})

For p = q, the problem reduces to conventional ℓ_p-norm minimization of the residual matrix R = Y − ΦX under the row-sparsity constraint on X. The well-known problem with ℓ_2-norm minimization is that it gives very small weight to small residuals and strong weight to large residuals, implying that even a single large outlier can have a large influence on the obtained solution. For robustness, one should utilize ℓ_1 in the mixed norms, since it gives larger weight to small residuals and less weight to large residuals. In this paper we consider (P_{p,q}) in the cases where p = 1 and/or q = 1. The problem (P_{p,q}) is combinatorial (NP-hard). Hence suboptimal reduced-complexity reconstruction algorithms have been proposed. These can be roughly divided into two classes: convex-relaxation algorithms (e.g., [2, 7, 9]) and greedy pursuit algorithms (e.g., [1, 6]). In this paper we devise a greedy simultaneous NIHT (SNIHT) algorithm for the problems (P_{1,1}) and (P_{2,1}). The case (P_{1,2}) is excluded due to lack of space, but our approach and discussion extend straightforwardly to this mixed ℓ_1 norm as well.
In the (P_{1,1}) problem, one aims to minimize ‖Y − ΦX‖_{1,1} = Σ_{i,j} |y_ij − φ_(i) x_j| under the sparsity constraint, so the solution can be viewed as a sparse multivariate least absolute deviation (LAD) regression estimator. LAD regression (in the real-valued overdetermined linear regression setting) is well known to offer a robust solution with bounded influence function. In the complex case, this approach can be considered optimal when the error terms e_ij are i.i.d. with (circular) complex generalized Gaussian (GG) distribution [10, Example 4] with exponent s = 1/2. It is important to realize that minimization of ℓ_p-norms in (P_{p,p}) implicitly assumes i.i.d.'ness of the error terms. Since the measurement matrix Y is in many applications a space × time matrix, as in medical imaging or sensor array applications, the i.i.d. assumption on the error terms in time/space is often not valid. Indeed, the benefit of the mixed ℓ_1-norms, such as the ℓ_{2,1} and ℓ_{1,2} norms considered here, is that they introduce couplings [9] between the coefficients and offer robustness in the case of dependent heavy-tailed errors or outliers. When the error terms have dependencies in time and/or space, ℓ_{2,1} and ℓ_{1,2} minimization can thus offer advantages over the ℓ_1- or ℓ_2-norm approaches. As will be shown later, the usage of the ℓ_1-norm or the mixed ℓ_1-norms leads to nonparametric approaches that are based on the concept of the spatial sign function [8], which in the scalar case (x ∈ C) is defined as

sign(x) = x/|x| for x ≠ 0, and sign(x) = 0 for x = 0.    (2)

In the vector case, sign(x) = x/‖x‖ for x ≠ 0, and sign(x) = 0 for x = 0.

MIXED NORM SNIHT ALGORITHM
Iterative hard thresholding (IHT) is a projected gradient descent method that is known to offer an efficient and scalable solution for the K-sparse approximation problem [11]. The normalized IHT (NIHT) method updates the estimate of X by taking steps towards the direction of the negative gradient, followed by projection onto the constraint space. In our multichannel sparse recovery problem, at the (n+1)th iteration the SNIHT update is

X^{n+1} = H_K( X^n + µ^{n+1} Φ^H ψ_{p,q}(Y − ΦX^n) ),

where ψ_{p,q}(X) = ∇_{X*} ‖X‖_{p,q}^q is the complex matrix derivative [12] with respect to (w.r.t.) X* and µ^{n+1} > 0 is the stepsize for the current iteration. For the ℓ_2- and ℓ_1-norms the derivatives are easily shown to be ψ_{2,2}(X) = X and ψ_{1,1}(X) = sign(X), respectively, where the notation sign(X) refers to element-wise application of the spatial sign function (2), i.e., [sign(X)]_ij = sign(x_ij). For the (2, 1) mixed norm, we obtain [ψ_{2,1}(X)]_(i) = sign(x_(i)), that is, the spatial sign function is applied row-wise. For the (1, 2) mixed norm, ψ_{1,2}(X) is a matrix in which the spatial sign function is applied column-wise. Table 1 provides the pseudo-code of the greedy SNIHT algorithm for the problem (P_{p,q}), which we call the SNIHT(p, q) algorithm for short. Note that SNIHT(2, 2) corresponds to the conventional SNIHT studied in [6], and in [11] for the Q = 1 case.
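Putting the pieces together, a simplified SNIHT(p, q) iteration can be sketched as follows. For readability this sketch uses a constant stepsize µ rather than the adaptive stepsize update of Step 4 (described next); the helper names, iteration count and value of µ are our own illustrative choices:

```python
import numpy as np

def sign_mat(X, mode):
    """psi_{p,q}: identity ('22'), element-wise sign ('11'), row-wise sign ('21')."""
    if mode == "22":
        return X
    if mode == "11":
        mag = np.abs(X)
    elif mode == "21":
        mag = np.linalg.norm(X, axis=1, keepdims=True)
    else:
        raise ValueError(mode)
    return np.where(mag > 0, X, 0) / np.where(mag > 0, mag, 1)

def hard_threshold(X, K):
    """H_K: keep the K rows with largest l_2-norms."""
    keep = np.argsort(np.linalg.norm(X, axis=1))[-K:]
    out = np.zeros_like(X)
    out[keep] = X[keep]
    return out

def sniht(Y, Phi, K, mode="11", mu=0.5, n_iter=200):
    """Simplified SNIHT(p, q) iteration with a fixed stepsize (illustrative)."""
    X = np.zeros((Phi.shape[1], Y.shape[1]), dtype=complex)
    for _ in range(n_iter):
        G = Phi.conj().T @ sign_mat(Y - Phi @ X, mode)  # (negative) gradient direction
        X = hard_threshold(X + mu * G, K)               # gradient step + projection
    return X
```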
We now describe the CompStepsize function, which computes the stepsize update µ^{n+1} in Step 4. Following the approach in [11], and assuming that we have identified the correct support at the nth iteration, we may look for a stepsize update µ^{n+1} as the minimizer of ‖Y − ΦX‖_{p,q}^q along the gradient ascent direction X^n + µ G^n|_{Γ^n}. Thus we find µ > 0 as the minimizer of the convex function

L(µ) = ‖R^n − µB‖_{p,q}^q,    (3)

where R^n = Y − ΦX^n and B = Φ_{Γ^n} G_{(Γ^n)}. When p = q, this reduces to a simple linear regression estimation problem, min_µ ‖r − µb‖_p^p, where the response is r = vec(R^n) and the predictor is b = vec(B). Thus, when using p = q = 2 as in conventional SNIHT [6], the minimizer of (3) is easily found to be

µ^{n+1} = ‖G_{(Γ^n)}‖² / ‖Φ_{Γ^n} G_{(Γ^n)}‖².

However, for the robust estimators that we are interested in, i.e., when using (p, q) = (1, 1) or (p, q) = (2, 1), a minimizer of (3) cannot be found in closed form. In the (p, q) = (1, 1) case, it is easy to show that the solution µ verifies the fixed-point (FP) equation µ = H(µ), where

H(µ) = ( Σ_{i,j} Re(b*_ij r^n_ij)/|r_ij| ) / ( Σ_{i,j} |b_ij|²/|r_ij| )

and R = R^n − µB = (r_ij) also depends on the unknown µ. Then, instead of choosing the next update µ^{n+1} as the minimizer of (3), which could be found by running the FP iterations µ^{i+1} = H(µ^i) for i = 0, 1, ... until convergence (with initial value µ^0 > 0), we use a 1-step FP iterate, which corresponds to a single iteration with the initial value given by the previous stepsize µ^n. In other words, in Step 4 we set µ^{n+1} = H(µ^n). In simulation studies we noticed that this 1-step FP iterate gave a very good approximation of the true solution (within 3-decimal accuracy). In the case (p, q) = (2, 1), it is easy to show that the solution µ verifies the FP equation µ = H⋆(µ), where

H⋆(µ) = ( Σ_i Re(b_(i) (r^n_(i))^H)/‖r_(i)‖ ) / ( Σ_i ‖b_(i)‖²/‖r_(i)‖ ),

and the same approach, i.e., µ^{n+1} = H⋆(µ^n), is used for computing the stepsize update.
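A sketch of the 1-step FP stepsize for the (1, 1) case follows; the formula for H(µ) is our reconstruction of the fixed-point map above, and the small ε is a numerical safeguard of ours against an exactly zero residual:

```python
import numpy as np

def H_step(mu, Rn, B, eps=1e-12):
    """One fixed-point step mu -> H(mu) for the (p, q) = (1, 1) stepsize.

    Rn is the current residual matrix R^n = Y - Phi X^n and B = Phi_Gamma G_(Gamma)
    is the update direction mapped through Phi, as in the text.
    """
    R = Rn - mu * B                       # residual at the candidate stepsize
    w = 1.0 / np.maximum(np.abs(R), eps)  # IRLS-type weights 1/|r_ij|
    num = np.sum(w * np.real(np.conj(B) * Rn))
    den = np.sum(w * np.abs(B) ** 2)
    return num / den
```

A convenient sanity check: if R^n = c·B exactly, the weights cancel and H(µ) returns c for any µ ≠ c.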

SIMULATION STUDIES
Next we illustrate the usefulness of the methods in a variety of noise environments and SNR levels. Also, the effect of the number of measurement vectors Q on the recovery probability will be illustrated. The elements of Φ are drawn from the CN(0, 1) distribution and the columns are normalized to unit norm. The coefficients of the K non-zero row vectors of X have equal amplitudes, σ_x = |x_ij| = 10 ∀i ∈ Γ, j = 1, ..., Q, and uniform phases, i.e., Arg(x_ij) ∼ Unif(0, 2π). The support Γ = supp(X) is randomly chosen from {1, ..., N} without replacement for each trial. We define the (generalized) signal-to-noise ratio (SNR) as SNR(σ) = 10 log_10(σ_x²/σ²), which depends on the scale parameter σ of the error distribution.
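The data generation of this setup can be reproduced along these lines (a NumPy sketch; the function name is ours, and the Gaussian noise case shown here is only one of the noise models studied):

```python
import numpy as np

def make_trial(M=256, N=512, K=8, Q=16, sigma_x=10.0, snr_db=20.0, rng=None):
    """Generate (Phi, X, Y) for one Monte Carlo trial of the simulation setup."""
    rng = np.random.default_rng() if rng is None else rng
    # CN(0,1) measurement matrix with unit-norm columns
    Phi = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    Phi /= np.linalg.norm(Phi, axis=0)
    # K-rowsparse signal: equal amplitudes sigma_x, uniform phases
    support = rng.choice(N, size=K, replace=False)
    X = np.zeros((N, Q), dtype=complex)
    X[support] = sigma_x * np.exp(1j * rng.uniform(0, 2 * np.pi, size=(K, Q)))
    # i.i.d. circular Gaussian noise with scale set via SNR = 10 log10(sigma_x^2/sigma^2)
    sigma = sigma_x * 10 ** (-snr_db / 20)
    E = sigma * (rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q))) / np.sqrt(2)
    return Phi, X, Phi @ X + E
```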
As performance measures of sparse signal recovery, we use both the (observed) mean squared error, MSE = (1/L) Σ_{ℓ=1}^L ‖X̂^[ℓ] − X^[ℓ]‖², and the (observed) probability of exact recovery, PER = (1/L) Σ_{ℓ=1}^L I(Γ̂^[ℓ] = Γ^[ℓ]), where X̂^[ℓ] and Γ̂^[ℓ] denote the estimate of the K-sparse signal X^[ℓ] and of the signal support Γ^[ℓ] for the ℓth Monte Carlo (MC) trial, respectively. In all simulation settings described below, the number of MC trials is L = 1000, the length of the signal is N = 512, the number of measurements is M = 256, and the sparsity level is K = 8. The number of measurement vectors is Q = 16 unless otherwise stated.
I.i.d. Gaussian noise. Figure 1 shows the signal reconstruction performance in terms of MSE as a function of SNR in i.i.d. circular Gaussian noise, e_ij ∼ CN(0, σ²). As expected, the conventional SNIHT(2, 2) has the best performance, but SNIHT(2, 1) suffers only a negligible 0.07 dB loss, whereas SNIHT(1, 1) suffers a 1.07 dB performance loss. Note that SNR = 4 dB is the cut-off at which all methods still had a full PER rate (= 1). Below 4 dB the PER rate declines, reaching 0 at SNR = 0 dB for all of the methods.
Student's t_ν-distributed noise. In our second setup, the noise has an i.i.d. circular Student's t_ν distribution and we vary the degrees of freedom ν, ranging from mildly heavy-tailed down to the Cauchy distribution (ν = 1). SNIHT(1, 1) has the best performance, retaining a low MSE for all values of ν. This is in deep contrast to SNIHT(2, 2), whose MSE starts to grow exponentially at ν ≤ 3, reaching sky-high levels in Cauchy noise (ν = 1). The PER rates in Table 1 further illustrate the remarkable performance of the robust SNIHT(1, 1), now at a lower SNR = 10 dB: SNIHT(1, 1) is able to maintain full PER rates for all values of ν, whereas SNIHT(2, 2) fails completely for ν < 4.

The usefulness of joint recovery becomes more pronounced at low SNRs, where multiple measurements can dramatically improve the recovery ability by exploiting the joint information. This is illustrated in our third simulation setup, where the d.o.f. is fixed at ν = 3 and the SNR is 10 dB. Figure 3 depicts the PER rates for increasing Q. As can be seen, when using the SNIHT(1, 1) method the PER rate increases as a function of Q from a poor 14% (when Q = 2) to near full 100% recovery (when Q = 6). Again, SNIHT(2, 1) is slightly behind SNIHT(1, 1) in performance. Conventional SNIHT(2, 2) is drastically behind the robust methods, reaching at best a 96.6% rate when Q = 18. This is again in deep contrast with the near 100% PER obtained by the SNIHT(1, 1) method with only Q = 6 measurement vectors.

APPLICATIONS TO SOURCE LOCALIZATION
Consider a sensor array consisting of M sensors that receives K narrowband incoherent far-field plane-wave point sources (M > K). At discrete time t, the array output (snapshot) y(t) ∈ C^M is a weighted linear combination of the signal waveforms x(t) = (x_1(t), ..., x_K(t))^⊤ corrupted by additive noise e(t) ∈ C^M,

y(t) = A(θ)x(t) + e(t),

where A = A(θ) is the M × K steering matrix parametrized by the vector θ = (θ_1, ..., θ_K)^⊤ of (distinct) unknown direction-of-arrivals (DOAs) of the sources. Each column vector a(θ_i), called a steering vector, represents a point in the known array manifold a(θ). The objective of sensor array source localization is to find the DOAs of the sources, i.e., to identify the steering matrix A(θ) parametrized by θ. We assume that the number of sources K is known.
As in [7], we cast the source localization problem as a multichannel sparse recovery problem as follows. We construct an overcomplete M × N steering matrix A(θ̃), where θ̃ = (θ̃_1, ..., θ̃_N)^⊤ represents a sampling grid of all source locations of interest. Suppose that θ̃ contains the true DOAs θ_i, i = 1, ..., K. In this case the measurement matrix Y = (y(t_1) ··· y(t_Q)) ∈ C^{M×Q}, consisting of snapshots at time instants t_1, ..., t_Q, can be exactly modelled via the MMV model (1), in which the signal matrix X ∈ C^{N×Q} is a K-rowsparse matrix whose K non-zero row vectors correspond to the source signal sequences {x_j(t_i)}_{i=1}^Q, j = 1, ..., K. Thus, identifying the source locations is equivalent to identifying the support Γ = supp(X), since any i ∈ Γ maps to a DOA θ̃_i in the grid. Note that in the multichannel sparse recovery framework the steering matrix A is known, and we can use the SNIHT methods to identify the support.
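For the half-wavelength uniform linear array used in the next section, the overcomplete steering matrix A(θ̃) over a DOA grid can be constructed as below (a NumPy sketch; the function name is ours, and the array size and grid spacing mirror the experiment that follows):

```python
import numpy as np

def ula_steering_matrix(M, grid_deg):
    """Overcomplete steering matrix for a half-wavelength ULA.

    Column n is a(theta_n) = (1, e^{-j pi sin(theta_n)}, ...,
    e^{-j pi (M-1) sin(theta_n)})^T for grid angle theta_n (in degrees).
    """
    theta = np.deg2rad(np.asarray(grid_deg, dtype=float))
    m = np.arange(M)[:, None]  # sensor index 0..M-1
    return np.exp(-1j * np.pi * m * np.sin(theta)[None, :])

# Grid of candidate DOAs on [-90, 90] with 2-degree spacing
grid = np.arange(-90, 91, 2)
A = ula_steering_matrix(20, grid)  # 20 x 91 dictionary
```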
Next we illustrate the effectiveness of the approach via a simulation study. We assume that K = 2 independent (spatially and temporally) complex circular Gaussian source signals of equal power σ_x² arrive at a uniform linear array (ULA) of M = 20 sensors with half-wavelength inter-element spacing from DOAs θ_1 = 0° and θ_2 = 8°. In this case, the array manifold is a(θ) = (1, e^{−jπ sin θ}, ..., e^{−jπ(M−1) sin θ})^⊤. The noise matrix E ∈ C^{M×Q} has i.i.d. row vectors, each row vector e_(i) having a complex Q-variate inverse Gaussian compound Gaussian (IG-CG) distribution [13] with shape parameter λ = 0.1 and covariance matrix Cov(e_(i)) = I_Q. Note that the covariance matrix of a snapshot is Cov(y(t_i)) = σ_x² A(θ)A(θ)^H + I_M, so we may use the popular MUSIC method to localize the sources; in other words, we search for the K = 2 peaks of the MUSIC pseudospectrum on the grid. We use a uniform grid θ̃ on [−90°, 90°] with 2° spacing, thus containing the true DOAs. In Step 1 of the SNIHT(p, q) algorithm, we take Γ^0 as the locations of the K largest peaks of the row norms of Φ^H ψ_{p,q}(Y), rather than simply the indices of the K largest row norms.
We then identify the support (which gives the DOA estimates) for all the methods over 1000 MC trials and compute the PER rates and the relative frequencies of the DOA estimates on the grid. A full PER rate (= 1) implies that the identified support located the true DOAs correctly in all MC trials. Such a case is shown in the upper plot of Figure 4 for SNIHT(1, 1) and SNIHT(2, 1) when the number of snapshots is Q = 50 and the SNR is −10 dB. The PER rates of SNIHT(2, 2) and MUSIC were considerably lower, 0.81 and 0.73, respectively. Next we keep the other parameters fixed but decrease the SNR to −20 dB. In this case the MUSIC method fails completely and yields a nearly uniform frequency over the grid, as illustrated in the lower plot of Figure 4. Note that the proposed robust methods, SNIHT(1, 1) and SNIHT(2, 1), provide high peaks at the correct DOAs. The PER rates of SNIHT(2, 1), SNIHT(1, 1), SNIHT(2, 2) and MUSIC were 0.70, 0.64, 0.11 and 0.05, respectively. Hence the mixed ℓ_1-norm method SNIHT(2, 1) has the best recovery performance. Note that a random guess would offer a PER rate of 1/90 ≈ 0.011, so the performance of MUSIC is not much better here. Thus the robust methods can offer considerable improvements in performance when the measurement environment is challenging (low SNR, small Q).