Enhanced Lasso Recovery on Graph

This work aims at recovering signals that are sparse on graphs. Compressed sensing offers techniques for recovering signals from a few linear measurements, and graph Fourier analysis provides a representation of signals on graphs. In this paper, we leverage these two frameworks to introduce a new Lasso recovery algorithm on graphs. More precisely, we present a non-convex, non-smooth algorithm that outperforms the standard convex Lasso technique. We carry out numerical experiments on three benchmark graph datasets.


Sparse Representation on Graphs
The goal of this work is to reconstruct signals on graphs that are assumed to be sparse in the graph Fourier representation. In this context, we deal with two main concepts, graphs and sparsity, which have gathered a lot of attention in recent years with the emergence of Compressed Sensing and Big Data. Let us briefly introduce these two concepts in the rest of this section.
A graph/network is a powerful tool to represent complex high-dimensional datasets, in the sense that a graph structures data with respect to their similarities. Graphs have become increasingly popular in applications such as search engines, social networks, airline routes, 3D geometric shapes, human brain connectivity, etc. Mathematics offers strong theoretical tools to analyze graphs through Harmonic Analysis and Spectral Graph Theory. An essential graph analysis tool is the graph Laplacian operator, which is the discrete approximation of the continuous Laplace-Beltrami operator on smooth manifolds. It is known that the eigenvectors of the Laplace-Beltrami operator provide a local parametrization of the manifold [1]. Equivalently, the eigenvectors of the graph Laplacian, also called graph Fourier modes, provide a representation of the graph. Given a graph (V, E, W), with V, E and W being respectively the set of n nodes, the set of edges and the similarity/adjacency matrix, the (unnormalized) graph Laplacian operator is defined as

  L = D − W,

where D is the diagonal degree matrix s.t. D_ii = Σ_j W_ij. L is symmetric and positive-semidefinite, i.e. its eigenvalues λ_i are all nonnegative. The graph Fourier modes are given by the eigenvectors {u_i}_{i=1..n} of L and can be represented by the orthogonal matrix U = (u_1, ..., u_n) ∈ R^{n×n} s.t. U^⊤ U = I. The graph Fourier basis U acts as a basis to represent, analyze and process signals on graphs. For example, one can represent a function f : V → R by f̂ = U^⊤ f, where f̂ is its Fourier transform.

In this paper, we consider three well-known graphs. First, the synthetic LFR graph, introduced in [2] to study community graphs; here, the number of nodes is n = 1,000, the number of communities is 10 and the degree of community overlap is µ = 0.4. Second, we consider a coarse version (for computational speedup) of the benchmark MNIST dataset of NYU [3], with n = 1,176 nodes and 10 classes. Last, we use a coarse version of the well-known 20newsgroups dataset of CMU [4], with n = 1,432 nodes and 20 classes. All three dataset graphs are illustrated on Figure 1 with their graph Laplacian spectra.

Sparse recovery is currently one of the most studied topics in signal processing. The main goal is to reconstruct signals that are assumed to be sparse in some basis representation. For example, in medical imaging, one objective of sparsity is to speed up MRI acquisition by reconstructing an image in the Fourier basis given a small number of Fourier samples. This problem can be generalized to finding the solution of an underdetermined linear system of equations, which is generally ill-posed, under the constraint that the solution is sparse. Solving this problem directly is however impracticable because it is an NP-hard combinatorial problem. But Candès, Romberg, Tao and Donoho showed in [5,6] that, using an ℓ1 relaxation, and under some conditions on the linear operator, known as the Restricted Isometry Property (RIP), and on the measurements, known as the incoherence property, there exists a tight convex relaxation of the NP-hard problem that is easily tractable. However, it has recently been observed that the ℓ1 relaxation technique can be improved with reweighted ℓ1 [7], ℓp with p < 1 [8], difference of convex functions ℓ1 − ℓ2 [9] and the smoothed ℓ1/ℓ2 ratio [10]. These recent works suggest that non-convex relaxations may outperform the original ℓ1 sparse recovery. In this work, we follow this line of research and introduce a new non-convex algorithm for sparse recovery on graphs. Specifically, our goal is to improve Lasso problems on graphs.
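As a concrete illustration of the graph Fourier construction described above, here is a minimal Python/NumPy sketch (not the code used for the experiments in this paper; the 4-cycle toy graph is our own example):

```python
import numpy as np

def graph_fourier_basis(W):
    """Unnormalized graph Laplacian L = D - W and its eigendecomposition.

    W : (n, n) symmetric nonnegative similarity/adjacency matrix.
    Returns the eigenvalues (ascending) and the orthonormal eigenvectors U,
    i.e. the graph Fourier modes.
    """
    D = np.diag(W.sum(axis=1))       # diagonal degree matrix, D_ii = sum_j W_ij
    L = D - W
    lam, U = np.linalg.eigh(L)       # L symmetric PSD -> real, nonnegative spectrum
    return lam, U

# Toy graph: a 4-cycle.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
lam, U = graph_fourier_basis(W)

f = np.random.randn(4)               # a signal on the nodes
f_hat = U.T @ f                      # graph Fourier transform
f_rec = U @ f_hat                    # inverse transform recovers f
```

Since U is orthogonal, the forward and inverse graph Fourier transforms are simply multiplications by U^⊤ and U.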

Enhanced Sparsity
Starting from the standard ℓ1 problem for sparse recovery,

  min_x ‖x‖_1  s.t.  Ux = f_0,

where x is a sparse signal to be recovered, U is the graph Fourier basis and f_0 are the given measurements, we propose the following enhanced recovery model:

  min_x ‖x‖_1  s.t.  Ux = f_0  and  ‖x‖_2 = 1.

The new additional constraint, i.e. the ℓ2 unit sphere, is a non-convex set that is essential here for enhancing sparse recovery. Basically, it forces the solution to be at the intersection of the ℓ1-ball and the ℓ2-sphere, which are precisely the locations of sparse points in the Euclidean domain, see Figure 2. Observe now that the new constrained ℓ1 optimization problem is equivalent to

  min_x ‖x‖_1/‖x‖_2  s.t.  Ux = f_0.  (1)

The equivalence comes from the fact that the ratio F(x) = ‖x‖_1/‖x‖_2 is a zero-homogeneous function, i.e. F(αx) = F(x) for all α > 0. This means that if x^⋆ is a solution, then so is αx^⋆ for all α > 0, in particular for the specific value of α such that x^⋆ belongs to the unit sphere, ‖x^⋆‖_2 = 1. Figure 2 compares geometrically the standard ℓ1 model and the new ratio model ℓ1/ℓ2. At first glance, both models promote sparsity and the new model does not appear to bring anything new but a more complex problem. However, this figure is only a simple illustration, and one must remember that the recovery performance also depends on the incoherence property and on the number of observed measurements. In this context, the major motivation for going beyond convexity in the recent works [7,8,9] is precisely to improve sparse recovery with a smaller number of measurements than the standard approach. We will see that the newly proposed model holds this property.
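The zero-homogeneity of the ratio, and the fact that it is smaller for sparser vectors, can be checked numerically; a small sketch with toy vectors of our own choosing:

```python
import numpy as np

def l1_l2_ratio(x):
    """The sparsity-promoting ratio F(x) = ||x||_1 / ||x||_2."""
    return np.abs(x).sum() / np.linalg.norm(x)

x_sparse = np.array([3.0, 0.0, 0.0, 0.0])   # 1-sparse vector
x_dense = np.array([1.0, 1.0, 1.0, 1.0])    # flat (maximally dense) vector

# Zero-homogeneity: F(alpha * x) = F(x) for any alpha > 0.
same = np.isclose(l1_l2_ratio(x_sparse), l1_l2_ratio(5.0 * x_sparse))

# The ratio equals 1 for a 1-sparse vector and sqrt(n) for a flat vector,
# so minimizing it drives the solution toward sparsity.
r_sparse = l1_l2_ratio(x_sparse)   # 1.0
r_dense = l1_l2_ratio(x_dense)     # sqrt(4) = 2.0
```

In general 1 ≤ ‖x‖_1/‖x‖_2 ≤ √n, with the lower bound attained exactly on 1-sparse vectors.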

Optimization
We consider a different version of (1) that is robust to noise:

  min_x ‖x‖_1/‖x‖_2 + (λ/2)‖Ux − f_0‖_2^2.  (2)

Problem (2) is a non-smooth and non-convex optimization problem. The non-smooth ℓ1 part of the problem can be handled quite efficiently with techniques introduced in Compressed Sensing, such as the Alternating Direction Method of Multipliers (ADMM) [11] or Uzawa-type primal-dual techniques [12]. However, the non-convex part is more challenging. For general non-convex problems, it is difficult to design an algorithm that is fast, accurate, robust and also guaranteed to converge, or at least to satisfy the monotonicity property. Monotonicity means that the energy is guaranteed to decrease at each iteration, although the problem is non-convex. In this situation, most non-convex algorithms only find solutions that are critical points or local minimizers, and rarely global minimizers.
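The workhorse behind the ℓ1 techniques mentioned above is the proximal operator of the ℓ1 norm, the classical soft-thresholding operator; a minimal sketch (toy input values are our own):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1: shrinks each entry toward zero by t,
    setting entries with magnitude below t exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([2.0, -0.3, 0.5, -1.5])
y = soft_threshold(x, 1.0)   # entries with |x_i| <= 1 vanish; others shrink by 1
```

This closed form is what makes the ℓ1 part cheap inside ADMM or primal-dual iterations; the difficulty of problem (2) lies entirely in the non-convex ratio.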

Proximal Forward-Backward Splitting Algorithm
We develop in this section an algorithm for the ratio optimization problem (2). A related numerical scheme was introduced in [13] in the different context of data clustering. Let T(x) = ‖x‖_1, B(x) = ‖x‖_2 and F(x) = (λ/2)‖Ux − f_0‖_2^2, such that we want to solve

  min_x T(x)/B(x) + F(x).

Let us consider a semi-implicit gradient flow for this problem:

  (x^{k+1} − x^k)/τ_k ∈ −(B_k ∂T(x^{k+1}) − T_k ∂B(x^k))/B_k^2 − ∂F(x^{k+1}),

where ∂ stands for the subdifferentials of T and B (which is not unique for ℓ1 but is for ℓ2 away from zero), τ_k is the time step, and the notations T_k = T(x^k) and B_k = B(x^k) are used. This provides the optimality condition

  x^{k+1} − x^k + (τ_k/B_k) ∂T(x^{k+1}) − (τ_k T_k/B_k^2) ∂B(x^k) + τ_k ∂F(x^{k+1}) ∋ 0,  (3)

which leads to a two-step iterative scheme:

  Step 1:  y^k = x^k + τ_k (T_k/B_k^2) ∂B(x^k),
  Step 2:  x^{k+1} = argmin_x (τ_k/B_k) T(x) + τ_k F(x) + (1/2)‖x − y^k‖_2^2.

The second step is the proximal operator [12,14] of the convex function (τ_k/B_k) T + τ_k F. Overall, we have designed a proximal forward-backward splitting algorithm to solve (2), the solution being given by the limit of the iterates x^k. In the next section, we will show that the proposed iterative algorithm is (almost) monotonic, i.e. its energy is guaranteed to decrease at each iteration.
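To make the two-step scheme concrete, here is a small self-contained sketch under a simplifying assumption made for illustration only: the sensing matrix is taken to be the identity, so that Step 2 reduces to a closed-form soft-thresholding. The zero-iterate guard is ours, and the step size follows the experimental choice τ_k = B_k discussed later in the paper:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ratio_fb_splitting(f0, lam=1.0, tau0=1.0, iters=200):
    """Forward-backward iterations for min ||x||_1/||x||_2 + lam/2 ||x - f0||^2.

    Illustrative simplification: the sensing matrix is the identity, so the
    proximal step (Step 2) has a closed form via soft-thresholding.
    """
    x = f0.copy()
    for _ in range(iters):
        T, B = np.abs(x).sum(), np.linalg.norm(x)
        tau = tau0 * B                       # experimental step size tau_k = B_k
        # Step 1: explicit move along the subgradient of B, i.e. x/||x||_2.
        y = x + tau * (T / B**2) * (x / B)
        # Step 2: prox of (tau/B)||.||_1 + (tau*lam/2)||. - f0||^2 at y,
        # solved entrywise in closed form.
        x = soft_threshold((y + tau * lam * f0) / (1.0 + tau * lam),
                           (tau / B) / (1.0 + tau * lam))
        if np.linalg.norm(x) == 0:           # guard: the ratio is undefined at 0
            x = f0.copy()
            break
    return x
```

On a nearly sparse input the iterates quickly collapse the small entries to zero while preserving the dominant one, which is exactly the enhanced-sparsity behavior the ratio model is designed for.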

Monotonicity
We show the following quasi-monotonicity result, stated in (5): the energy decreases at each iteration up to a factor involving B_{k+1}/B_k.

Proof. Define the convex functions G_k and F_k as in (6) and (7), and observe that G_k(x^k) = F_k(x^k), for later use. We recall the general definition of the subdifferential ∂E of a convex function E:

  E(x_1) ≥ E(x_2) + ⟨s, x_1 − x_2⟩  for all s ∈ ∂E(x_2).  (8)

We plug x_1 = x^{k+1}, x_2 = x^k and E = G_k into (8), and observe that the first step of the algorithm then gives (10). Let us now plug x_1 = x^k, x_2 = x^{k+1} and E = F_k into (8), which gives (11). Notice that the optimality condition (3) reads x^{k+1} − y^k + ∂F_k(x^{k+1}) ∋ 0, and thus y^k − x^{k+1} ∈ ∂F_k(x^{k+1}). This implies that (11) may be written as (12). Adding (10) and (12), and using the definitions (6) and (7), the resulting inequality can be rewritten as (5), which is the desired result. ✷

Notes. Observe that close to the steady-state solution we have B_{k+1}/B_k → 1 as k → ∞, so the quasi-monotonicity tends to a true monotonicity property. Second, note that if we had access to the quantity B_{k+1} (or a good estimate of it), then we could set τ_k = (B_k/B_{k+1}) τ_0, which would imply E_Tot(x^{k+1}) ≤ E_Tot(x^k), where E_Tot = E + F, and thus unconditional monotonicity for any τ_0.

Enhanced Lasso on Graphs
The Algorithm. The standard Lasso problem on a graph is

  min_x ‖x‖_1 + (λ/2)‖Ux − f_0‖_2^2,

where U is the sensing matrix, here the graph Fourier modes. The function f_0 is the signal measured on the graph. It is generated as f_0 = U(x_0 + n), where x_0 is a pure sparse signal with 5% of non-zero entries uniformly chosen in [−1, 1], and n is Gaussian noise with standard deviation σ = 0.1. The goal is to recover the sparse signal x_0. We recall that the proposed enhanced Lasso problem on a graph is

  min_x ‖x‖_1/‖x‖_2 + (λ/2)‖Ux − f_0‖_2^2.

We use the proximal forward-backward splitting algorithm introduced in Section 3.1 to solve it. That is,

  Step 1:  y^k = x^k + τ_k (T_k/B_k^2) x^k/‖x^k‖_2,
  Step 2:  x^{k+1} = argmin_x (τ_k/B_k)‖x‖_1 + (τ_k λ/2)‖Ux − f_0‖_2^2 + (1/2)‖x − y^k‖_2^2.

We may write the second step as a saddle-point problem min_x max_p ⟨p, x⟩ − F^⋆(p) + G(x), where F^⋆ is the barrier (indicator) function of the ℓ∞ ball of radius τ_k/B_k, i.e. F^⋆(p) = 0 if ‖p‖_∞ ≤ τ_k/B_k and +∞ otherwise, the convex conjugate of (τ_k/B_k)‖·‖_1. Note that G(x) is uniformly convex, so we can apply the accelerated primal-dual algorithm of [14], which alternates a proximal ascent step in p with a proximal descent step in x. The scheme converges quickly, with rate O(1/n^2), provided that σ_0 = η_0 = 1. The first inner proximal problem (a projection onto the ℓ∞ ball) has an analytical solution, and the second inner proximal problem also has a closed-form solution. As the two proximal operators are fast to compute, so is the overall algorithm. In fact, solving the non-convex ratio problem (2) for sparse recovery can be seen as solving the standard Lasso problem with the addition of a convex quadratic term (1/2)‖x − y^k‖_2^2, updating y^k each time the monotonicity condition (5) is satisfied. We summarize the algorithm here.
Note: the time step τ k = B k was chosen experimentally, and is the subject of future study.
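For intuition on the standard Lasso baseline: since the graph Fourier basis U is orthogonal, the baseline problem decouples and is solved in closed form by soft-thresholding the Fourier coefficients. A minimal sketch with a random orthogonal stand-in for U (the matrix, seed and the choice λ = 4 are our own, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Random orthogonal stand-in for the graph Fourier basis U.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Synthetic data as in the text: x0 has 5% nonzero entries uniform in [-1, 1],
# and f0 = U(x0 + noise) with Gaussian noise of standard deviation 0.1.
x0 = np.zeros(n)
support = rng.choice(n, size=n // 20, replace=False)
x0[support] = rng.uniform(-1.0, 1.0, size=support.size)
f0 = U @ (x0 + 0.1 * rng.standard_normal(n))

# Because U^T U = I, min ||x||_1 + lam/2 ||Ux - f0||_2^2 decouples into
# entrywise problems solved by soft-thresholding U^T f0 at level 1/lam.
lam = 4.0
c = U.T @ f0
x_lasso = np.sign(c) * np.maximum(np.abs(c) - 1.0 / lam, 0.0)

rel_err = np.linalg.norm(x_lasso - x0) / np.linalg.norm(x0)
```

The enhanced model replaces the ℓ1 term with the ratio, which destroys this one-shot closed form but is handled by the forward-backward iterations above.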
Numerical Experiments. We compare the standard Lasso and the enhanced Lasso on graphs. We test on the LFR, MNIST and 20NEWS graphs. The value of the parameter λ, which balances the sparsity term and the fidelity term, is chosen to minimize the recovery error, defined as ‖x − x_0‖_2/‖x‖_2, for all models and all graphs. The results are reported in Table 1 and Figure 3. Overall, the proposed enhanced Lasso model performs better than the standard one, but it is 2-3 times slower.

Enhanced Lasso-Inpainting on Graphs
The Algorithm. In this section, we add a layer of difficulty by removing a set of observed measurements in f_0. In other words, we do not observe the whole function f_0 but only a portion of it. This problem is equivalent to a Lasso-Inpainting problem. For this, a diagonal selector matrix R is added to the linear operator U, such that R_ii = 1 if i ∈ Ω_obs, Ω_obs being the set of observed measurements, and R_ii = 0 otherwise. The formulation is thus

  min_x ‖x‖_1 + (λ/2)‖R(Ux − f_0)‖_2^2,

and the enhanced Lasso-Inpainting is naturally

  min_x ‖x‖_1/‖x‖_2 + (λ/2)‖R(Ux − f_0)‖_2^2.

Numerical Experiments. We compare the standard Lasso-Inpainting and the enhanced Lasso-Inpainting on graphs. We test on the LFR, MNIST and 20NEWS graphs. We remove 40% of the measurements of f_0 with R. The value of the parameter λ is again chosen to minimize the recovery error, defined as ‖x − x_0‖_2/‖x‖_2, for all models and all graphs. The results are reported in Table 2 and Figure 4.
Overall, the proposed enhanced Lasso-Inpainting model also performs better than the standard one, but it is 2-3 times slower.
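The selector matrix R used in the inpainting models can be sketched as follows (the random mask, data and seed are a toy example of our own):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Observe ~60% of the measurements: R_ii = 1 on observed indices, else 0.
observed = rng.random(n) > 0.4
R = np.diag(observed.astype(float))

U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # stand-in Fourier basis
f0 = rng.standard_normal(n)

def fidelity(x, lam=1.0):
    """Data term lam/2 ||R(Ux - f0)||_2^2 of the Lasso-Inpainting model:
    unobserved entries of f0 simply drop out of the energy."""
    return 0.5 * lam * np.linalg.norm(R @ (U @ x - f0)) ** 2
```

Since R zeroes the unobserved rows, the fidelity term only penalizes mismatch on Ω_obs, which is what makes the problem an inpainting problem.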

Conclusion
A new sparse recovery algorithm for Lasso-type problems on graphs has been introduced. Numerical experiments have shown improvements over the standard ℓ1 algorithms. This result leverages the recent idea of going beyond ℓ1 convexity and exploring non-convex, non-smooth techniques to find better sparse solutions. In this context, the closest works to ours are (i) the difference of convex (DC) functions approach [9] and (ii) the smoothed ℓ1/ℓ2 technique [10]. We would like to explore in future work the relationship between our model and these models. In particular, a direct application of the Dinkelbach technique [15] reveals that minimizing the ratio is equivalent to minimizing the DC model ℓ1 − αℓ2, with α being the minimum value of the ratio ℓ1/ℓ2. As a result, an interesting question is whether this α value, which is automatically learned with the proposed algorithm, can provide satisfying solutions for a range of sparse problems. Finally, we would like to compare our exact ℓ1/ℓ2 ratio technique, which has a weak monotonicity property, with the smoothed ratio technique of [10], which has a strong monotonicity feature.

Figure 3: Standard Lasso vs Proposed Lasso on three graphs.