Pseudo-Passive Indoor ToF Sensing exploiting Visible Light Communication Sources

The need for active illumination is one of the fundamental downsides of state-of-the-art Time-of-Flight (ToF) cameras and translates into large power consumption as compared to passive imaging modalities. Recently, developments in visible light communications (VLC) have allowed the lighting infrastructure to provide both illumination and communication services in indoor environments. In this paper, we propose exploiting visible light sources as opportunity illuminators for indoor ToF sensing. This allows for drastically reducing the power consumption of the ToF camera, as the need for illumination modules is eliminated. We study the feasibility of this idea using an off-the-shelf VLC module, model the emitted light signal, and study its autocorrelation properties. We show that the prominence of two dominant frequencies, arising from the underlying clock signal and coding scheme, enables CW-ToF operation. Simulations carried out using real signals from the VLC module showed successful depth estimation, up to an offset, using standard methods for CW-ToF depth estimation such as the four-phases algorithm.


I. INTRODUCTION
In recent years, 3D Time-of-Flight (ToF) imaging has attracted the interest of researchers from industry and academia due to its wide range of applications, which includes mobile robotics, indoor sensing, autonomous driving, and human-machine interaction [1]. ToF cameras make use of a light source that illuminates the scene in combination with an array of ToF pixels. This makes it possible to recover the time delay of the back-scattered optical signal that returns from the scene. Since the speed of light is a known constant, depth information can readily be obtained. In the last few years, with the rapid progress of solid-state devices, illumination infrastructure has seen a rapid transition from conventional (e.g., incandescent and halogen) lamps to LEDs, due to their inherent benefits such as low cost, long lifetime, power efficiency, low heat generation, and high bandwidth, enabling switching at tens of megahertz. This transformation has brought about a profound development in optical wireless communication, namely Visible Light Communication (VLC), which uses the lighting infrastructure as an optical wireless link. Moreover, this has revolutionised existing wireless communication models and opened up new avenues for different technologies. State-of-the-art lighting solutions provide illumination and communication functionalities simultaneously within a single module. These developments have led to the rise of LED-based VLC in the domestic domain [2]. In this context, the VLC infrastructure has created a strong push for pseudo-passive ToF sensing that takes advantage of VLC sources as opportunity illuminators. Active 3D imaging is a well-studied problem [3]-[5]. Nevertheless, a challenging problem in this domain is the power-hungry illumination source, which compromises its applicability. For instance, the illumination system used in [6] for wide-area ToF imaging emits 91 W of optical power.
To overcome this problem, we propose an alternative approach that eliminates the illumination unit from the ToF camera system, thus enabling pseudo-passive ToF 3D sensing. The method exploits the existing VLC infrastructure [7] to illuminate the scene with modulated light and an asynchronous ToF camera to reconstruct the depth information. In this work, the OpenVLC1.3 module with a white LED is used for our experiments [8]; it is a low-cost open-source platform that features a bandwidth of 1 MHz and supports a throughput of 400 kbps. To the best of the authors' knowledge, this is the first paper that explores using the VLC infrastructure as a drop-in replacement for the illumination unit of ToF depth cameras.

II. BACKGROUND AND RELATED WORK
ToF technology is commonly classified into pulsed ToF and continuous-wave (CW) ToF systems, which estimate distance by measuring the flight time or the phase shift, respectively. In 1977, the Stanford Research Institute (SRI) developed the first ToF camera [9], which did not receive much attention due to the limited sampling rates of the detectors. Later, in 1997, phase-based ToF cameras built on charge-coupled devices (CCDs) were pioneered by Prof. R. Schwarte at the University of Siegen, Germany [10]. Since then, ToF imaging technology has become ubiquitous in a wide range of 3D imaging applications. A prominent core technology for CW-ToF cameras is the Photonic Mixer Device (PMD). These devices follow a phase-stepping algorithm to extract depth [11] from raw images and are widely adopted due to their well-known processing pipeline and open design [12]. In [11], [13]-[15], the authors discuss the operating principle of lock-in ToF cameras, their advantages, limitations, and pixel structure, and provide solutions to technical problems that arise when a PMD operates in the presence of ambient light. Since sequential acquisition produces motion artifacts and limits the frame rate, multi-tap and multi-aperture systems have been proposed to acquire all raw data in a single shot [16]. In [17]-[20], the authors developed passive systems based on aperture-masking interferometry and photometric stereo (PS) imaging. The former can only recover a limited number of hidden sources using low-coherence interferometry, while the latter requires an appropriate light footprint. Recently, an interferometric technique has been proposed in [21] that uses the photon bunching signature demonstrated by Hanbury Brown and Twiss [22]. This approach uses thermal light sources for passive 3D sensing, but requires conditioning the light before illuminating the scene.
The VLC infrastructure can be shared for illumination, communication, and ToF depth sensing, as reported in [23]. In an attempt to address the power constraint in a novel way, passive ToF imaging aims to exploit existing light sources for 3D scene reconstruction.
The foregoing passive ToF alternatives [17], [21] suffer from severe limitations and are far from being operative. Differently, in this work we exploit the fortunate fact that VLC infrastructure is mostly found in homes, office environments, industrial areas, and vehicles, i.e., attractive application scenarios for ToF cameras [24]. Taking advantage of this, we propose using the VLC infrastructure, turning indoor background light into optical signals for the ToF camera and eliminating the need for an optical filter to discard background light.

III. PROPOSED METHODOLOGY
The recently-developed VLC infrastructure uses modulated light signals for communication; we rely on these same light sources to illuminate the scene, thus enabling pseudo-passive ToF sensing. The VLC source and the ToF camera are independent. In addition, signal attenuation is not considered. Here, the VLC emitted signal interacts with the scene response function (SRF) and, hence, the corresponding reflected signal is observed at the ToF sensor. The VLC module emits a Manchester-coded on-off-keying (MC-OOK) modulated signal. This translates a 0 bit and a 1 bit as 'HIGH' during the first and second half of the symbol period, respectively, as enunciated in (2). For a bit sequence b_i ∈ {0, 1}, the emitted signal can be formulated as:

p_th(t) = Σ_i q_{b_i}(t − iT),     (1)

where, for m ∈ {0, 1},

q_0(t) = A for 0 ≤ t < T/2, and 0 for T/2 ≤ t < T,
q_1(t) = 0 for 0 ≤ t < T/2, and A for T/2 ≤ t < T.     (2)

Here, q_m(t) is the pulse waveform, A is the pulse amplitude, and T represents a bit period. The effectively-emitted light signal, p(t), however, does not exactly coincide with the p_th(t) in (1) due to the low-pass effect of the LEDs. The theoretic signal model uses the convolution between the input signal p_th(t) and the impulse response function h_LED(t) of the LED to generate the output signal:

p(t) = (p_th * h_LED)(t),     (3)
where the impulse response function is given by:

h_LED(t) = (1/T_RC) e^(−t/T_RC) σ(t),     (4)

where T_RC = RC is the time constant and the unit step function σ(t) is 1 for t ≥ 0 and 0 for t < 0. The time constant is determined by simply exciting the system with a unit step function. In the general case of multiple bounces of light per ToF pixel, let Γ_j denote the reflective component of the j-th target, located at a distance equivalent to the time of flight t_j at the speed of light, ∀j. Then the SRF can be represented as a weighted sum of Dirac delta functions:

h(t) = Σ_j Γ_j δ(t − t_j).     (5)

The reflected signal then follows:

r(t) = Σ_j Γ_j p(t − t_j).     (6)

Note that (6) is simply the convolution of the emitted light signal with the SRF, i.e., r(t) = (p * h)(t). The SRF is defined by means of the shift-invariant kernel h(t, τ) = h(t − τ), τ being the shift variable.
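As a minimal numerical sketch of the signal model in (1)-(6): the helper functions below are illustrative assumptions (names, sampling rate, and the explicit per-target loop are not part of the OpenVLC platform); only the R and C values follow the fit reported in Sec. IV.

```python
import numpy as np

def mc_ook_waveform(bits, T, fs, amplitude=1.0):
    """MC-OOK signal p_th(t) of (1)-(2): a 0 bit is HIGH during the
    first half of the bit period T, a 1 bit during the second half."""
    n_half = int(round(T * fs / 2))        # samples per half bit period
    chunks = []
    for b in bits:
        high = amplitude * np.ones(n_half)
        low = np.zeros(n_half)
        chunks.append(np.concatenate([high, low] if b == 0 else [low, high]))
    return np.concatenate(chunks)

def led_lowpass(p_th, fs, R=2e3, C=0.8e-9):
    """First-order RC model h_LED(t) = (1/T_RC) exp(-t/T_RC) sigma(t)
    of (4), applied by discrete convolution; R, C as fitted in Sec. IV."""
    T_RC = R * C
    t = np.arange(0, 10 * T_RC, 1 / fs)    # truncate after 10 time constants
    h_led = np.exp(-t / T_RC)
    h_led /= h_led.sum()                   # unit DC gain in discrete form
    return np.convolve(p_th, h_led)[: len(p_th)]

def reflect(p, fs, gammas, depths, c=3e8):
    """Reflected signal r(t) = sum_j Gamma_j p(t - t_j) of (5)-(6),
    with round-trip times of flight t_j = 2 d_j / c."""
    r = np.zeros_like(p)
    for gamma, d in zip(gammas, depths):
        delay = int(round(2 * d / c * fs)) # round-trip delay in samples
        r[delay:] += gamma * p[: len(p) - delay]
    return r
```

For instance, `led_lowpass(mc_ook_waveform([0, 1, 1, 0], T=2.5e-6, fs=50e6), fs=50e6)` emulates the light emitted while transmitting four bits at the module's 400 kbps throughput.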
The modulated signal is generated by driving the (low-pass) light emitter with binary signals of the type given by (1) and (2), and the scene-dependent signal received by each ToF pixel follows (p * h)(t), with p(t) exhibiting the low-pass transitions introduced by the LED. Coherently with related work on CW-ToF, let us assume that the demodulation control signal (DCS) used to synchronize the ToF camera is a perfect sinusoid M_k(t) = M_A sin(ωt + ϕ_k), with ϕ_k = 2π(k − 1)/N. Here, M_A is the magnitude of the demodulation signal, ω = 2πf, f is the desired frequency, and ϕ_k is the k-th phase shift, for k ∈ [1, N]. In the demodulation process, the measurements can be modeled as N samples of the cross-correlation function R(ϕ_k) between the received light signal and the demodulation function, as reported in [12]. In practice, the cross-correlation integral is computed over one exposure time, typically in the ms range. In the discrete case, the measurements obtained for the different phase shifts ϕ_k are given by:

R(ϕ_k) = Σ_n r[n] M_k[n],     (7)

where r[n] is the reflected signal in the discrete domain and M_k[n] is the discretized DCS. To simplify the depth calculus, the N = 4 samples of the cross-correlation function are uniformly distributed, so that the phase offsets are 0°, 90°, 180°, and 270°, respectively. Following the four-phases algorithm, which is a standard tool for phase retrieval in CW-ToF [12], the phase shift induced by a single target can be retrieved from these four measurements as:

ϕ = arctan( (R(ϕ_4) − R(ϕ_2)) / (R(ϕ_1) − R(ϕ_3)) ).     (8)

Finally, the depth can be readily obtained as:

d = cϕ / (4πf),     (9)

where c denotes the speed of light. Empirical data presented in the next section show that the emitted light signal p(t) contains two distinct peaks in the frequency domain. We exploit this to enable depth estimation from a few measurements of the shape in (6) acquired at either of these dominant frequencies.
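The four-phases pipeline of (7)-(9) can be sketched as follows; this is a minimal illustration assuming an ideal sinusoidal DCS uniformly sampled at rate fs, not the camera's actual demodulation hardware.

```python
import numpy as np

def four_phase_depth(r, fs, f, M_A=1.0, c=3e8):
    """Four-phases depth estimation of (7)-(9): correlate r[n] with
    sinusoidal DCSs at offsets 0, 90, 180, 270 degrees, then recover
    the phase shift and map it to depth."""
    n = np.arange(len(r))
    R = []
    for k in range(4):
        phi_k = 2 * np.pi * k / 4                           # 0, 90, 180, 270 deg
        M_k = M_A * np.sin(2 * np.pi * f * n / fs + phi_k)  # discretized DCS
        R.append(np.dot(r, M_k) / len(r))                   # samples of (7)
    phi = np.arctan2(R[3] - R[1], R[0] - R[2])              # four-phases formula (8)
    phi %= 2 * np.pi                                        # wrap to [0, 2*pi)
    return c * phi / (4 * np.pi * f)                        # depth from (9)
```

As in the paper, the unknown initial phase offset means the estimate is only valid up to a constant, removed in practice by subtracting the estimate obtained at zero depth.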
IV. RESULTS AND DISCUSSION
We make use of a real VLC optical signal, p(t), acquired empirically (Fig. 1a) to carry out simulations that show the feasibility of the methodology outlined in Section III. The optical signal in Fig. 1a was recorded with a Thorlabs PDA100A-EC photodiode coupled to a Tektronix MSO 4034 oscilloscope (350 MHz bandwidth); its autocorrelation function is presented in Fig. 1b. The capacitive nature of the LED is clearly visible in the inset. The frequency analysis of this signal unveiled two dominant frequencies: f_1 = 367438 Hz and f_2 = 122438 Hz. The first is related to the underlying clocking, while the second practically coincides with one third of the first and is due to the MC-OOK. From the empirical data, a parametric RC model was fit with parameters R = 2 kΩ and C = 0.8 nF. The model instantiated with these values accurately mimics the measured signal responses, as shown in Fig. 2 (cf. insets in Fig. 1). For depth estimation, scene-related delays were emulated via a moving window on the prerecorded optical signal, and the phase and depth were computed with (8) and (9). Since the initial phase offset is unknown, the phase obtained at zero depth is subtracted from the value obtained from (9). Depth estimation results for the two dominant frequencies are shown in Fig. 3. The obtained accuracy is around 99% of the range, with the lower frequency limiting the distance resolution. For f_1 (Fig. 3a) and f_2 (Fig. 3b), depth estimation is feasible up to 10 m with errors below 0.6 mm (Fig. 3c). For f_1, depth is attainable up to 35 m with an error below 14 mm (Fig. 3f). For f_2, the depth error is below 0.21 mm up to 30 m. Divergence for large depth values may be due to drift of the dominant frequencies over time.
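The spectral analysis used to locate the dominant frequencies can be emulated on synthetic data with a simple FFT sketch; the peak-picking rule and guard interval below are illustrative assumptions, and the values reported above were obtained from the recorded OpenVLC signal, which is not reproduced here.

```python
import numpy as np

def dominant_frequencies(signal, fs, n_peaks=2):
    """Return the n_peaks strongest positive-frequency components of a
    real signal, skipping bins adjacent to an already-selected peak to
    avoid counting spectral leakage as a separate peak."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    guard = 2 * fs / len(signal)           # two-bin guard around each peak
    peaks = []
    for idx in np.argsort(spectrum)[::-1]: # bins by decreasing magnitude
        if all(abs(freqs[idx] - p) > guard for p in peaks):
            peaks.append(freqs[idx])
        if len(peaks) == n_peaks:
            break
    return sorted(peaks, reverse=True)
```

The frequency resolution is fs / len(signal), so a sufficiently long record is needed to separate f_1 from f_2 and their harmonics.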

V. CONCLUSION
We have proposed the novel idea of making use of the existing VLC infrastructure to attain pseudo-passive ToF 3D imaging in indoor environments. The feasibility of this idea is demonstrated via simulations using real optical signals from an off-the-shelf VLC module. Analysis of the empirical optical signals showed two prominent peaks in the frequency domain. Crucially, these frequencies do not depend on the data being transmitted. Consequently, we have proposed adopting a CW-ToF pipeline at either of these two dominant frequencies. In our simulations, we have emulated the scene-related delays via a moving window on a prerecorded optical signal during data transmission. Simulation results showed that accurate depth estimation is possible at both short and long ranges. In future work, we will demonstrate our approach using real measurements from a ToF camera.