Parameter estimation in spike and slab variational inference for blind image deconvolution

Most current state of the art blind image deconvolution methods model the underlying image (either in the image or filter space) using sparsity promoting priors and perform inference, that is, image, blur, and parameter estimation using variational approximation. In this paper we propose the use of the spike-and-slab prior model in the filter space and a variational posterior approximation more expressive than mean field. The spike-and-slab prior model, which is the "gold-standard" in sparse machine learning, has the ability to selectively shrink irrelevant variables while relevant variables are mildly regularized. This allows to discard irrelevant information while preserving important features for the estimation of the blur which results in more precise and less noisy blur kernel estimates. In this paper we present a variational inference algorithm for estimating the blur in the filter space, which is both more efficient than MCMC and more accurate than the standard mean field variational approximation. The parameters of the prior model are automatically estimated together with the blur. Once the blur is estimated, a non-blind image restoration algorithm is used to obtain the sharp image. We prove the efficacy of our method on both synthetically generated and real images.


I. INTRODUCTION
In image deconvolution, it is usual to model the degradation suffered by an original image x as [1] y = Hx + n, where, using matrix-vector notation, y is the observed image, n represents the noise and the matrix H is the convolution matrix whose row elements are obtained from the blur kernel h. In blind image deconvolution (BID) both the original image x and the blur h are unknown and both have to be estimated from the available data, that is, the degraded observed image. Notice that the inverse problem in (1) is underdetermined since for estimating the original image and the blur we have more unknowns than available observations. It is also an ill-posed problem in the sense of Hadamard, i.e., small variations in the observed data can lead to large variations in the solution and, even more, small variations in the estimated blur result in large variations in the restored image. To obtain good image and blur estimates, prior knowledge about the unknowns and sound estimation procedures are needed. BID methods can be formulated in either the image or the filter space. In the image space, the restored image and blur are estimated directly from the observed image. In the This work was supported in part by the Ministerio de Economía y Competitividad under contracts TIN2013-43880-R and DPI2016-77869-C2-2-R, and the US Department of Energy grant DE-NA0002520. filter space, however, the observed image is first filtered with different high-pass filters to obtain a set of sparse pseudoobservations. Several authors have discussed the advantages of using one space over the other [1]- [3]. While filter space methods have the advantage of having access to more pseudoobservations (one for each used filter) to estimate the blur, they appear to be more noise-sensitive. In the filter space, a non-blind deconvolution algorithm is always used to recover the sharp image after blur estimation. Image space approaches can estimate the image and blur simultaneously [4] although they usually first estimate the blur from the observed image and then they resort to a non-blind deconvolution algorithm to recover the final sharp image since images suited to blur estimation tend to have many steep edges and not many details.
Variational Bayes (VB) is a sound and well-grounded probabilistic estimation approach to BID problems. VB BID methods, like the vast majority of state-of-the-art BID methods, rely on the use of sparsity promoting image models. See [1], [5] for reviews on variational BID methods and image models. Since the work [6], where a mixture of Gaussians (MoG) is used to impose sparsity and VB is utilized to perform inference, the interest on sparse priors has increased. Babacan et al. [7] propose a general Bayesian framework based on Super Gaussian (SG) and Scale Mixture of Gaussian (SMG) priors for BID. The framework includes, among others, [6] as a special case. Notice also that popular sparse prior models, such as TV, p , MoG, and Student-t [8] are included in the proposed framework. The work in [7] has been extended in [9] to handle Huber Super Gaussian (HSG) priors, which solve the problem of lack of differentiability around zero of most common SG priors. It is interesting to note that while most of the BID research concentrates on image modeling, work has also been carried out to force the posterior blur distribution to be a member of a particular class of probability distributions, see, for instance, [10].
As indicated in [11], soft-sparse or shrinkage priors such as the Laplace and other related SMG priors may not be ideal sparsity-promoting priors. They assign zero probability mass to events we may be interested in assigning a probability greater than zero. For instance, to obtain better blur estimates in BID and get rid of noise that compromises the estimation procedure we may want to assign non-zero probability to a zero output in the filter space. Priors that combine Bernoulli and continuous distributions are starting to be used to better approximate the 0 penalization [12], [13]. Such is the case of spike-and-slab priors [14], also named Bernoulli-Gaussian priors [13] since they consist of a mixture of a Bernoulli and a Gaussian distribution. These priors constitute the gold standard in sparse machine learning, having the ability to selectively shrink irrelevant variables, while mildly regularizing the relevant ones [15]. Applications of this prior include variable selection [16], [17], denoising [11], [18], inpainting [11], unsupervised latent variable models [19], hyper-spectral image fusion [20] and sparse signal recovery [13]. Notice that the spike-and-slab prior can be approximated as the mixture of two Gaussians, one very peaky (the spike) and another with very high variance (the slab) [21] but this is still a mixture of two continuous distributions. According to [22] spike-and-slab models are more effective than other sparse priors (Laplacian or Student-t priors, for instance) in enforcing sparsity. The degree of sparsity can also be directly adjusted by modifying the weight of the spike in the mixture.
Unfortunately, due to the form of the prior, Bayesian inference for spike and slab models is a very challenging task. The exact posterior can not be calculated. Since classical mean field variational inference removes essential dependencies in the posterior distribution approximation, until recently, costly Monte-Carlo Markov Chain (MCMC) sampling was the usual way to perform inference. The work by Titsias et al. [11], proposes an alternative VB inference model to approximate the posterior distribution using a simple and efficient algorithm. Instead of using a unimodal variational distribution, the authors propose an alternative approximation that more accurately matches the combinatorial nature of the posterior distribution over the spike-and-slab weights (see also [22] for the use of Expectation Propagation for posterior approximation).
Note that spike-and-slab models depend on a series of parameters that need to be set or estimated. The estimation of the model parameters is a very important issue in BID since they dramatically affect the quality of the final result. Usually, their values are chosen by trial-and-error, using a grid search on a set of selected values, generalized cross-validation (GCV) [23], discrepancy principle [24], or measures such as the whiteness measure [25]. However, when using the VB approach, the parameter estimation can be performed within the framework [4], [7], [9].
In this paper we formulate the BID problem in the filter space and introduce a spike-and-slab prior to model our knowledge on the original image in that space. This allows us to distinguish relevant observations for blur estimation from noisy ones, discarding the latter, and make the blur estimation more robust and accurate. We also derive a method to estimate the prior model parameters automatically from the observed data within the VB framework. The VB approach we propose in this paper handles inference in a much more efficient way than MCMC, and is more accurate than the standard mean field variational approximation.
The rest of the paper is organized as follows. Section II describes the proposed model for blur estimation. In Sec. III, Bayesian inference is performed and, in Sec. IV, a BID algorithm is synthesized. The performance of the proposed method is assessed in Section V. Finally, Section VI concludes the paper.

II. BAYESIAN MODELING OF THE BLUR ESTIMATION PROBLEM
The BID problem is formulated here in the filter space. We create L pseudo-observations y γ by applying high-pass filters {f γ } L γ=1 to the blurred and noisy image y obtaining with x γ = F γ x and F γ is the convolution matrix associated with the filter f γ . In this work a linear convolution is assumed. The size of each pseudo-observation is N .
Assuming that the pseudo-observations are independent, and denoting where β γ , is the inverse of the noise variance, we can write where p(y γ |h, x γ , β γ ) = N (y γ |Hx γ , β γ I).
Notice that {x γ } L γ=1 should be sparse since they represent high-pass filtered instances of the original image. Therefore, to impose sparsity on the solution, we define a spike-and-slab prior on the value of each pixel x γi of x γ , where δ(·) denotes the Dirac delta function and N (x γi |0, α −1 γ ) denotes a zero mean Gaussian distribution onx γi with variance α −1 γ . Observe that this is a truly sparse prior: x γi is exactly zero with probability 1 − π γ .
Let us now write, following [11], x γi as the product of a Gaussian zero-mean random variablex γi ∼ N (x γi |0, α −1 γ ) and a Bernoulli random variable s γi ∼ π sγi γ (1 − π γ ) 1−sγi , that is, and redefine the prior on the two components of x γi , as where s γi ∈ {0, 1}. We use the notationx and utilize flat priors on the hyperparameters α Γ and π Γ as well as on the blur h. Other distributions can be used on the hyperparameters, such us gamma distributions [4], if any information about their value is available. Notice that, since the size of the blur is much smaller than the size of the image, enough data is available to obtain a precise estimation, even if no information on the prior form of the blur is available, which supports the use of an improper non-informative priors.
With all the above and denoting the whole set of unknowns by Θ = {α Γ , π Γ , h,x Γ , s Γ }, we can write the joint distribution as

III. VARIATIONAL INFERENCE FOR BLUR ESTIMATION
Since p(Θ|y Γ ) cannot be calculated in closed form, the standard mean field approximation [26] that factorizes q(x γ , s γ ) = q(x γ )q(s γ ) could be used. However this is a unimodal distribution [11] and, therefore, not a good approximation of the true posterior distribution. Since the pairs {x γi , s γi } are strongly correlated (recall that x γi = s γixγi ), we treat them as a unit, hence we use the factorization and utilize the following mean field approximation The distribution approximating the posterior p(Θ|y Γ ) is found by minimizing the Kullback-Leibler divergence. In absence of any other knowledge, we assume that q(α Γ ), q(π Γ ) and q(h) are degenerate distributions and that the blur is non-negative and its coefficients add up to one. We now present how inference on q(x γi , s γi ) and q(h) is performed.
A. Obtaining q(x γi , s γi ) Using the Kullback-Leibler criterion and the mean field approximation in (9), we have where Z is the partition function, θ ∈ Θ, and Θ θ denotes Θ with θ removed. To compute the explicit expression for the above posterior we separate the derivations for q(s γi ) and q(x γi |s γi ) obtaining where u γi = ln q(s γi = 1) − ln q(s γi = 0) = ln where h i denotes the ith column of H and with h 2 = h T i h i , ∀i, since spatially invariant blur is assumed.

B. Obtaining q(h)
Notice that we assume a degenerate distribution on q(h), which leads to obtaining the point estimate of h,ĥ, aŝ constrained to h i ≥ 0, i h i = 1. This estimation problem can be efficiently solved using the ADMM method in [9].

IV. BLIND DECONVOLUTION ALGORITHM
The blur estimation algorithm iterates on the estimation of each one of the unknowns from their respective distributions given the current estimate of the rest of the unknowns which leads to the estimation procedure summarized in Algorithm 1.
Following [6], [7], [9], we perform kernel estimation using a multiscale approach. This allows us to obtain a good kernel approximation at coarse scales, where it is easier to estimate it, and provide a good starting point to finer levels by upsampling the kernel estimated at the previous scale.
Notice that whereas this algorithm does not provide an estimate of the image since it works on the filtered images, not on the image itself, it provides an estimate of the blur. Once the estimate of the blur,ĥ, has been obtained, a nonblind deconvolution algorithm is used to obtain an estimate of the original sharp image. In this paper we obtain an estimate of the original image by solving the problem using the fast iterative method in [27], [9].

V. EXPERIMENTAL RESULTS
We have tested the proposed method on a set of 4 test images with 6 different blur kernels. Each original image in the upper row of Fig. 1 was convolved with each one of the blur kernels in the lower row of Fig. 1 and Gaussian noise of variance 10 −4 was added to obtain a set of 24 degraded images.
The proposed method was compared with the BID method in [9]. Following [9], The proposed algorithm is initialized as follows. Initial blur at the lowest scale is a 3 × 3 Gaussian kernel with variance 0.5. For the rest of the scales, the kernel estimated at the previous lower scale is upsampled by a factor √ 2 in each direction. The precision parameters β γ were set to 5000, ∀γ, its real value according to the noise in the image (notice that β γ = β/ f γ 2 ). At each scale, µ Γ is initialized at the current pseudo-observation y Γ and ω γi , ∀γ∀i, is drawn from a Gaussian distribution with mean 0.5 and standard deviation 0.01, π γ = 0.5, ∀γ, and α Γ is obtained from the previous scale.
Some restored images and their corresponding blur estimates are shown in Fig. 2. The proposed method obtains restorations with fewer noticeable artifacts. Color images were   Table I. The data suggest that the proposed method performs better than the method in [9] for most of the images and blur kernels tested, providing higher PSNR and similar SSIM values. However, as in many other BID methods, good results depend on a good blur estimation at each scale and small variations or a poor estimation at a lower scale may ruin the final result. The proposed method which needs about 400 seconds to run on an i7-5500U CPU @ 2.40GHz with 16GB RAM, is slower than the method in [9] which needs about 50 seconds. The reason is that the image update in Eq. (18) is performed pixelwise, preventing the use of the FFT to speed up the computations and that the method in [9] does not automatically estimate the parameters so several runs may be needed to adjust the method's parameters. Figure 3a shows a real blurred and noisy image borrowed from [28]. Noticeable artifacts appear on the deconvolved image with the method in [9] (Fig. 3b), while the result by the proposed method, depicted in Fig. 3c, is sharper, less noisy and presents fewer artifacts. The estimated PSF (see the inset) is more accurate and not as noisy as the one obtained with the method in [9].

VI. CONCLUSION
We have presented a new BID method formulated in the filter space. The novelty of the proposed model lies in the introduction of the spike-and-slab prior on the high-pass filtered image. A convenient reparametrization of the spikeand-slab prior makes VB inference possible and a sensible factorization provides a better approximation of the posterior. This leads to an efficient algorithm that accurately estimates the blur kernel due to the ability of such priors to shrink irrelevant information. All the prior model parameters are estimated along with the blur within the VB framework. Automatic estimation of the noise parameter, as well as the use of other hyperprior distributions on the parameters, is under study. Empirical experimentation provides sufficient proof of the competitiveness of the proposed method.