A Novel Approach for Solving MPI for Multi-Target ToF Imaging Using Subdivision-Based Nested Compressed Sensing

Time-of-Flight (ToF) cameras have become a popular imaging modality for macroscopic scene detection. In particular, amplitude-modulated continuous wave (AMCW) ToF cameras use the phase difference between sent and received signals for object depth reconstruction. However, various sources of multipath reflections exist in practice, causing each ToF pixel to erroneously receive a superposition of multiple reflections instead of a single bounce. This leads to distortions in the phase difference and consequently, errors in the depth maps. Compressed Sensing methods have emerged as an effective approach to solve this multipath interference (MPI) problem. However, it has two major disadvantages—large sensing matrix size leading to high computational load, and a high mutual coherence causing reconstruction failure. This paper introduces a subdivision-based nested compressed sensing algorithm that aims to alleviate these known disadvantages. Measurements at multiple modulation frequencies are used to isolate the $k$ interfering signals in the time domain. Simulation results are presented with a noise performance analysis and results based on real multitarget measurement data are also discussed.


I. INTRODUCTION
There is an increasing demand for high-resolution remote sensing systems across different parts of the wavelength spectrum.The Time-of-Flight (ToF) camera is one such remote sensing system for macroscopic scene detection that has gained popularity in the recent years and is being used in a wide range of applications such as vehicle monitoring and obstacle detection in the automotive industry, safe patient monitoring for healthcare, 3D scanning, and human-machine interactions [1], [2], [3].Specifically, the amplitude-modulated continuous wave (AMCW) ToF cameras have had a wide commercial impact due to their small size and high signalto-noise (SNR) ratio [4].
In AMCW ToF cameras, the intensity of the emitted sinusoidal signal is modulated at one or several frequencies of the order of 10-100 MHz.In an ideal case, this signal is reflected by a sparse number of objects in the scene, causing specific time-shifts or phase-shifts in the signal, corresponding to the object depths.The pixels of the ToF camera capture these shifts in phase, which can then be scaled by the modulation frequency to obtain the object depths.
ToF systems work on the assumption that each pixel receives a signal reflected by a single object.However, in practice, the signal received by each pixel corresponds to the superposition of signals from multiple scatterers, leading to multipath interference (MPI).As a result, the detected phase shift is distorted, which ultimately leads to an inaccurate depth estimation.Therefore, in ToF systems, depth resolution improvement equates to better multipath interference mitigation.
A flurry of research work has appeared on different ways of tackling this MPI problem.Some of these research directions include sensor modification [5], [6], spectral estimation [7], [8] and deep-learning based approaches [9].Under the umbrella of spectral estimation, research on compressed sensing (CS) based optimization has shown promising results [10], [11], [12].However, the main challenges have been the computational load due to the size of the sensing matrix, and its high coherence, leading to a failure in the l1 norm minimization.
The aim of this paper is to tackle the MPI problem using a new subdivision-based nested CS approach.It aims to use multiple submatrices constructed from a main structured sensing matrix.This serves to reduce the coherence as well as the computational load of the optimizations.The depth-regions of interest can then be identified based on the variance of the sub-estimates.Re-applying CS on the combination of these interest areas can provide better estimates with a much lower computational cost.The performance of the algorithm is tested on a simulated multi-target scene, as well as on real ToF data.
The paper is structured as follows.Section II discusses the ToF operation and generation of the measurements at different modulation frequencies.Section III describes the CS formulation in detail.Section IV discusses the CS specific challenges encountered in ToF MPI mitigation and introduces the subdivision-based CS algorithm to overcome these problems.Section V provides results based on simulations as well as real data, demonstrating the effectiveness of the proposed method.Section VI discusses the conclusions drawn and the scope for future work.

II. PROBLEM STATEMENT
In this work, the focus is placed on a multi-frequency AMCW ToF system.In this section, the mathematical formulation of the frequency-domain sensing model is described.
The scene response function in the phase domain is given by where δ(ϕ − ϕ k ) is a Dirac delta function centered at ϕ k , which is the phase shift that the transmitted signal undergoes when following the path k ∈ [1, P ].P represents the number of paths which is assumed to be low or 'sparse'.a k is the attenuation factor due to non-unit reflectance of target.The received signal at each pixel can then be expressed as a convolution of the transmitted periodic illumination signal s(ϕ), and the scene response e(ϕ), i.e., As described in [11], the final measurement at each pixel is obtained by a cross-correlation of r(ϕ) with the PMD control signal p A−B , yielding The per-path amplitude and the phase terms in (4) are the only two parameters of interest and can be determined by 2P + 1 measurements [11].If these two measurements have a phase shift of π/2, they can be used to construct the Fourier coefficients of the scene response function as follows: where j represents the index specifying the frequency in the multi-frequency acquisition mode.Considering f 0 as the base operating frequency, the data acquisition is carried out for different frequencies given by f j = jf 0 , j = 1, ..., M .Therefore (5) can be simplified to give Thus, (6) provides a simplified expression of the measurements at every pixel in terms of discrete time and frequency samples.The aim is to retrieve the terms a k and t k , ∀k ≤ P , that can correctly explain y.The matrix pencil method, a robust variant of Prony's method, is a standard way to achieve this goal [16].Compressed Sensing (CS) based approaches may improve the depth resolution by promoting sparse solutions in a finer grid.However, existing CS algorithms pose certain challenges that are discussed later.

III. COMPRESSED SENSING APPROACH
In practice, the number of objects of interest that reflect the illumination signal are sparse in comparison to the ambient dimension in time/depth domain.Therefore, it is logical to follow a CS-based approach for ToF depth resolution improvement.
A general CS problem is expressed as where ⃗ y ∈ C J containing the measurements in frequency domain, A A A ∈ C J×K , J ≪ K, is the sensing matrix, and ⃗ x ∈ C K is an s-sparse reflectivity vector constructed from the sparse reflectivities of the objects in the scene.⃗ e is white Gaussian noise in the measurement ⃗ y. ( 6) can be expanded into this general CS formulation, giving . . .
The sensing matrix A A A resembles a horizontal discrete fourier transform (DFT) matrix, oversampled in time.Such a sensing matrix construction aligns with the structured CS approach [13], and is different from the traditionally used random sensing matrices.The sparse reflectivity vector ⃗ x consists of the real-valued reflectivities a(k) from each object in the scene.Based on this framework, the solution for ⃗ x is given by where η is the upper bound on the norm of the measurement error, i.e., ∥⃗ e∥ 2 ≤ η.For the ToF application, if the location of the non-zero reflectivities can be determined accurately, the corresponding columns of A A A can be identified as the active columns, thereby identifying the correct 'times of flight', t k .These time instances then directly provide the depth information for each object, since In order to get an improved depth resolution using this CS formulation, the ToF camera must acquire a sufficient number of measurements at different modulation frequencies.Theoretically, given a sparsity s and a finely-resolved depth grid of length N , the relation [14] m ≥ 2s ln(N ) gives the number of multi-frequency measurements required.

IV. SUBDIVISION BASED ITERATIVE SOFT THRESHOLDING ALGORITHM
The use of popular CS algorithms for the depth resolution improvement in ToF systems poses certain challenges.In the CS formulation described in (9), a depth resolution improvement is possible only when CS algorithms support a finelyspaced time (or depth) grid.However, this is challenging due to the following reasons: 1) Given a constant depth to be covered, a finely-spaced time grid increases the size of the sensing matrix, leading to a large computational load.2) With a finely-spaced time grid, the degree of similarity between adjacent columns, which is measured by their normalized inner product, increases, and, as a result, the coherence of the sensing matrix increases, leading to an ill-posed CS problem.In such a scenario, conventional CS methods fail to provide a good estimate.In order to solve these issues, a sub-division-based nested CS algorithm is introduced.A different algorithm based on this idea was applied for radar range resolution improvement in [15].

A. Subdivision-based Nested CS
It must be noted that in order to get (real-valued) reflectivities using complex measurement vector ⃗ y, a symmetric version of the DFT matrix is used as the sensing matrix A A A, i.e., the time grid is always centered around 0. Based on this sensing matrix structure, the subdivision-based nested CS algorithm is outlined in Algorithm 1 and is described as follows: 1) Determine the factor k sub = K J that dictates the difference between the number of rows and columns of A A A.

2) Divide A A A into k sensing matrices A sub
A sub A sub that cover different parts of the time or depth grid, while maintaining a symmetric structure.Thus, where, mid represents the column index for time 0, and i = 1, ...k sub .Due to this subdivision, each A sub A sub A sub (i) has a lower coherence and a k sub -times smaller column dimension compared to A A A, thereby addressing both the challenges previously described.
3) The measurement vector ⃗ y may now be represented as ⃗ y = A sub A sub A sub (i)⃗ x sub (i), i = 1, ..., k sub .These k subproblems can be directly solved using a greedy CS method or basis-pursuit minimization.

4) The columns of each sub-matrix A sub
A sub A sub (i) capture a different part of the depth grid.Since the detected scene is sparse, only a few of the ⃗ x sub (i)s will have sharp peaks representing the on-grid targets for the specific sub-matrices.k sub /2 of the ⃗ x sub (i)s having the highest standard deviation values are selected, since high standard deviation (or variance) corresponds to the CS results with the sharpest peaks.
5) The fine-grid locations corresponding to these peaks are used to construct the final support S. 6) The final estimate is given by solving a normal CS problem, as expressed in (8), such that where A S A S A S consists of the columns of A A A identified by the support set S.  The performance of the algorithm was tested on synthetic measurements from 3 point targets at depths of 0.92, 1.84 and 2.4 m respectively.Following the parameters of the real measurement setup, 53 frequency samples were considered ranging  from 0 to 179.214 MHz, with steps of 3.514 MHz.Since a centered measurement vector is considered, as described in Section IV-A, the frequency grid ranges from −179.214MHz to 179.214 MHz.The measurements corresponding to the negative frequencies are complex conjugates of the ones at the positive frequencies.Therefore, the centered measurement vector ⃗ y consists of 103 measurements.For all simulations, a depth of 3m was considered with a grid resolution of 8mm.Fig. 1 shows the 3 peaks on the depth grid and the reconstruction results from subdivision-based nested CS for SNR= ∞ and SNR=30.As expected, the amplitudes of the detected peaks reduce for the noisy case, however, the positions are detected accurately.Using the Iterative Soft-Thresholding Algorithm (ISTA) as the CS reconstruction method, every instance of the subdivision algorithm has a computation time of 0.004 sec, while the standard ISTA requires 0.13 sec.This improvement greatly affects the computation time, specially when the a large number of ToF pixels are considered.

B. Performance Analysis
The matrix-pencil method [16] has been used as the basis for performance comparison.It was observed that the matrixpencil method performed better for lower values of the pencil parameter L, J 2 ≤ L ≤ J.This agrees with the conditions for a noise-robust pencil method in [16].Fig. 2 shows the MSE vs SNR plots for the depth images of the 3 targets obtained using the subdivision-based nested CS method (in red) and the matrix-pencil method (in blue), for an average of 5 noisy measurements.The error reduces with increase in SNR for both methods and becomes constant at around 20 dB.For all 3 cases, the results from the CS method has a lower error as compared to the MPI method.This is expected, since the synthetic scene is quite sparse.The small irregularities in the low SNR region arise due to a small number of measurements used for the average.To analyze the performance of the proposed algorithm for different levels of sparsity and different numbers of measurements in the presence of noise, phase transition diagrams are constructed.In Fig 4, m K denotes the ratio of the number of available measurements to the number of range cells, and s m denotes the ratio of sparsity to the number of measurements.For each plot, the ratios m K and s m are varied from 0.05 to 1 in steps of 0.05 and for every point m K , s m , 20 iterations of the proposed method are executed.The success or failure of each iteration is determined by calculating the averaged meansquared error between the ground truth and the reconstruction results.An error of 0.001 or less is considered a success.The averaged rate of success is then shown in the corresponding position in the phase transition diagram.Since the sensing matrix is always horizontal (K ≫ m), the number of columns (K) considered for the phase transition diagrams must correspond to the maximum number of available measurements (m).Figs show the phase transition plots for the nested CS method for SNR = ∞, 80, 40, and 20 dB.

C. Results on Multi-Target Measurement Data
This section discusses the results obtained using real ToF data.The experimental setup, shown in Fig. 5, consists of 3 targets-a glossy translucent target (T1) at 0.87 m, a diffusive translucent target (T2) at 1.87 m, and an opaque white placard target (T3) at 2.408 m depth.Table .I shows the estimated depths of each target from both methods.Since the position in depth is decided by the location of the amplitude peaks, the sparsity-inducing methods perform better as they are capable of identifying prominent data points in noisy environments.The matrix pencil method shows much higher sensitivity to noise and fails with real data.Specially in the case of T3, the same amplitude was detected for a large number of inaccurate points in the depth grid, leading to an incorrect estimate.Overall, the estimation accuracy of the CS based method is much better than that of the matrix pencil method adopted in prior works to solve the MPI problem.

VI. CONCLUSION
A new subdivision-based nested CS algorithm was proposed with an aim to improve on the disadvantages of existing CS methods applied to ToF MPI mitigation.Simulation results were presented along with a performance analysis.A 3-target setup was used to test the performance of the algorithm for both simulated and real-world measurements.The proposed algorithm is flexible and can be further explored using different types of CS methods.A more detailed comparative study based on different CS methods can be performed in future.The effect of the pencil parameter on the performance of the matrix pencil method can also be further explored.

Fig. 3 :
Fig. 3: Depth Image results from Subdivision-based CS and Matrix Pencil methods.

TABLE I :
Target position estimates from the Subdivisionbased CS method and the matrix pencil method.