Time-domain realisation of model-based rendering for 2.5D local wave field synthesis using spatial bandwidth-limitation

Wave Field Synthesis aims at a physically accurate synthesis of a desired sound field inside an extended listening area. This area is surrounded by loudspeakers individually driven by their respective driving signals. Recently, the authors have published an approach for so-called Local Wave Field Synthesis which enhances the reproduction accuracy in a limited region by applying a spatial bandwidth limitation in the circular/spherical harmonics domain to the desired sound field. This paper presents an efficient time-domain realisation of the mentioned approach for 2.5-dimensional synthesis scenarios. It focuses on the model-based rendering of virtual plane waves and point sources. As an outcome, the parametric representation of the driving signals for both source types allows for the reproduction of time-varying acoustic scenarios. This also includes an adaptation to the tracked position of a moving listener. The realisation is compared with conventional Wave Field Synthesis regarding the spatial structure and spectral properties of the reproduced sound field. The results confirm the findings of the prior publication, that the reproduction accuracy can be locally improved with Local Wave Field Synthesis.


I. INTRODUCTION
Sound Field Synthesis (SFS) techniques aim to synthesize a desired sound field in a physically accurate manner within an extended listening area.Wave Field Synthesis (WFS) [1] and Near-Field-Compensated Higher Order Ambisonics (NFC-HOA) [2] are well established representatives of this methods.In SFS, the rendering of acoustic scenarios can be classified in two fundamental principles: (i) data-based rendering reproduces a scene acquired via Sound Field Analysis (SFA) techniques.(ii) model-based rendering uses mathematical models for virtual sources which are fed by (dry) source signals.An acoustic scene is typically composed of multiple virtual sources.Frequently used models are point sources and plane waves.
The theory of SFS assumes a continuous distribution of acoustic sources placed around the listening area to reproduce the desired sound field.In practice, a limited number (up to hundreds) of individually driven loudspeakers placed at discrete positions approximates this distribution.The synthesis accuracy is mainly limited by spatial aliasing artefacts which are introduced to the reproduced sound field due to the finite resolution of this discretisation.For some applications, an accurate reproduction is not necessary in the whole extended area.For instance, the listener's position is restricted to a smaller region of interest.Here, techniques for Local Sound Field Synthesis (LSFS) are useful as they aim at a more accurate synthesis within a prioritised area which is smaller than the area surrounded by the loudspeaker array.Several approaches [3]- [7] for LSFS have been proposed within the past decade.Recently, the authors have published a method for Local Wave Field Synthesis (LWFS) [8] which expands the desired sound field into circular/spherical harmonics around a chosen expansion centre.Spatial aliasing is reduced by truncating the expansion and reproducing the resulting sound field with conventional WFS.This paper presents a parametric time-domain realisation of the mentioned approach focusing on the model-based rendering of plane waves and point sources.The parametric implementation allows for time-varying aspects, such as moving sources or a moving expansion centre.Latter enables the adaptation to a changing listening position.The realisation is compared with conventional WFS regarding the spatial structure and spectral properties of the reproduced sound field.

II. NOMENCLATURE
A position vector x in the three-dimensional, right-hand coordinate system is defined by its cartesian [x, y, z] T or its cylindrical representation [ρ cos φ, ρ sin φ, z] T .Within this treatise components denoted with a subscript * , e.g.x * , belong to the respective vector x * .The scalar product of two vectors is written as •|• .The radial frequency ω = 2πf is defined by the temporal frequency f .The imaginary unit is denoted by j.

A. Problem Statement
The fundamental task in LSFS is to reproduce a desired (aka.virtual) sound field P (x, ω) within a defined listening region Ω l ⊆ Ω (cf.Fig. 1).In 2 1 /2-dimensional (2.5D) scenarios [9, Sec.2.3], reproduction is restricted to the horizontal plane, i.e. z = 0, and Ω l and Ω are hence twodimensional areas.For the special case where Ω l = Ω, approaches are usually referred to as conventional SFS.A distribution of loudspeakers is positioned along the boundary ∂Ω as so-called secondary sources (loudspeaker symbols).Each secondary source is oriented along the inward pointing boundary normal n 0 (x 0 ) = [cos φ n 0 , sin φ n 0 , 0] T .The sound field emitted by an individual secondary source is commonly modelled by a monopole point source.It is given by the three-dimensional free-field Green's function [10,Eq.(8.41)]G(x|x 0 , ω) with x 0 ∈ ∂Ω.Each individual secondary source is driven by its respective driving signal D(x 0 , ω) and the resulting superposition of all secondary sources constitutes the reproduced sound field.The driving signals have to be chosen such that the reproduced and the desired sound field coincide within.Mathematically, this is subsumed by A suitably chosen differential line segment for the integration along the boundary ∂Ω is denoted by dl(x 0 ).

B. Wave Field Synthesis
WFS is based on a high-frequency approximation of the Helmholtz Integral Equation (HIE).For 3D scenarios, the driving function is generally described by [11, Eq. ( 10)] where ∇ x 0 denotes the gradient w.r.t.x 0 .The selection criterion a P (x 0 ) activates only the secondary sources, whose normal points in the direction of propagation of the virtual sound field at x 0 .For a virtual plane wave with the propagation direction n pw (φ pw ) = [cos φ pw , sin φ pw , 0] T , this criterion is given as The 2.5D WFS driving function for a plane wave reads [12, Eq. (2.177)] , where x ref denotes the reference point at which the amplitude of the reproduced and the desired plane wave coincide.The geometry-dependent weighting factor and delay are subsumed under w(x 0 , φ pw , x ref ) and τ (x 0 , φ pw ) = x 0 |n pw /c, respectively.

C. Continuous Driving Function
Under free-field conditions, any two-dimensional (zindependent) sound pressure field P (x, ω) that is source-free in the domain of interest may be expanded about the expansion centre x c into [10, Eq. (4.49)] with the regular circular expansion coefficients Pµ and the µ-th order cylindrical Bessel function J µ .The vector x = x − x c describes a position in a shifted frame with the expansion centre x c as its origin.In order to limit the spatial bandwidth of P (x, ω) the summation in ( 5) is bound to a finite order of ±M .
While in [8] the driving function is directly computed from the truncated circular expansion, an intermediate representation is dφ pw (6) with the plane wave expansion coefficients [13, Eq. (4.91)] computed from the circular expansion coefficients.Since the integration in ( 1) is a linear operation, the driving function D(x 0 , ω) to reproduce P (x, ω) can be computed by superposing the driving functions necessary to reproduce each individual plane wave.Hence, the plane wave in ( 6) is substituted with the right-hand side of (4) in order to get × P (φ pw , x c , ω)e −j ω c x 0 |n pw dφ pw as the LWFS driving function.Here, x ref = x c has been chosen.The inverse Fourier transform of (8) yields the continuous time-domain driving signal with δ denoting the Dirac delta distribution.The convolution operator w.r.t to time is denoted by * t [14, p. 169].The wellknown geometry independent WFS pre-filter h pre (t) is the inverse Fourier transform of j ω c .

D. Temporal and Spatial Discretisation
The temporal sampling with a sample period T s is incorporated by the substitution (t) → (nT s ) → [n] with n ∈ Z. Further, the plane wave expansion in ( 9) is approximated by a sum over N pw equidistant samples on the unit circle, i.e. φ pw → 2πm /N pw with m ∈ Z ∧ 0 ≤ m < N pw .For brevity, the notation (•) m = (•)[ 2πm /N pw ] is introduced.The discretised time-domain plane wave expansion coefficients of the bandlimited sound field N pw µm (10) result from the inverse Fourier transform of ( 7) and subsequent discretisation.In order to avoid additional spatial aliasing due to discretisation of the plane wave expansion, N pw has to to be significantly greater than 2M + 1.Hence, the equation states a truncated Inverse Discrete Fourier Transform (IDFT) w.r.t. to the variable pair (µ,m) weighted by N pw .As the resulting time-domain plane wave coefficients are expected to be real-valued, the equation can be efficiently implemented by an Inverse Fast Fourier Transform (IFFT) for a conjugatesymmetric input using only the circular coefficients for nonnegative µ together with a suitable zero-padding.The discrete driving signal reads with * n as the discrete convolution operator w.r.t n.As the delay τ m (x 0 ) is generally not an integer multiple of the sample period T s , fractional delay (FD) filters [15] have to be applied for interpolation.For more information on FD filters in the context of a similar approach for LWFS the reader is referred to [16].Within this publication, it is assumed that the FD filter is designed such that its introduced distortions are negligible within the observed frequency range.The realisation of ( 10) and ( 11) is summarised in Fig. 2: The WFS pre-filter h pre [n] can be directly applied to the dry source signal s[n].The result is then filtered by the circular expansion coefficients, followed by the mentioned IFFT.The individual plane wave signals pm are buffered in delaylines allowing to request differently delayed and weighted versions for each secondary source.As the remaining task, parametric expressions of the circular expansion coefficients pµ [x c , n] have to be provided for the plane wave and the point source.

A. Plane Wave
The circular expansion coefficients of a plane wave are given in the time-frequency domain as [17, Eq. (2.41)] with n s = [cos φ s , sin φ s , 0] T defining the propagation direction.The coefficients are transferred to the discrete time domain using the inverse Fourier transform and subsequent discretisation.Similar to (11), a FD interpolation is necessary, which can be directly applied to the source signal s[n].

B. Point Source
A point source is a three-dimensional sound field and cannot be fully described by a circular expansion.However, an approximative conversion formula [8, Eq. (32)] can be applied to its spherical expansion coefficients [18,   to express the point source in terms of circular expansion coefficients The sound source position x s is hereby restricted to the horizontal plane.The |µ|-th order spherical Hankel function [18, Eq. (2.1.76)] of second kind is denoted by h |µ| .The timefrequency dependent part of the coefficients can be expressed in the Laplace-domain (jω → s) as [19,Eq. (3.20)] with σ (|µ|) l being the l-th zero of a |µ|-th order polynomial [19,Eq. (3.21)].A direct implementation of (13) as a Infinite Impulse Response (IIR) filter, e.g.via the bilinear transform, is not possible due to stability problems caused by the pole of order |µ| at s = 0. Also an Finite Impulse Response (FIR) implementation, e.g.via the frequency-sampling method, is challenging as the pole induces very high amplitudes for low frequencies.
In the context of SFA using spherical microphone arrays [20], Linkwitz-Riley (LR) [21] highpass filters were used to stabilise the spherical Hankel functions.For each mode µ, a filter of order 2η ≥ |µ| was used to compensate the pole at s = 0.As filters of different orders and with different cut-off frequencies where applied to individual modes, phase match between the modes had to be preserved by a cascade of (M − 1) LR allpass filters which mimic the phase distortions introduced to the other modes.While this was feasible for a low M (≈ 4) which is common for spherical microphone arrays, model-based SFS may require much higher M .Hence, computational complexity is reduced by applying the same highpass filter to all modes, whereas 2η ≥ M .As a drawback, this results in a strong highpass-characteristic of the reproduced sound field.As conventional WFS is aliasing-free at low frequencies, a frequency crossover is used to combine the WFS and LWFS driving functions to with LP 2η f c (ω) and HP 2η f c (ω) denoting a 2η-th order transientimperfect LR crossover pair with the joined allpass characteristic The highpass HP 2η f c (ω) is combined with the spherical Hankel function for each mode µ in order to explicitly remove its poles and establish stability.The cut-off (or crossover) frequency f c may be chosen according to the aliasing frequency, i.e. the frequency up to which WFS does not introduce significant aliasing to the reproduced sound field.A discrete-time implementation is achieved via the bilinear transform together with a frequency pre-warping at f c .
As an optional processing step, the allpass characteristic of the crossover pair is compensated by applying its inverse to the driving function such that holds.Since the inverse of an allpass generally results in an unstable filter, the so-called backward filtering approach is utilised: The inverse of an allpass filter is equivalent to its conjugate complex, which corresponds to a time inversion of the allpass' impulse response.Latter behaviour can also be achieved by time-inverting the allpass' input signal and timeinverting the resulting output signal, again.An online, blockbased approach for backward filtering is given in [22].

V. EVALUATION
For the evaluation, a circular array of 1.5 metre radius consisting of 56 equi-angular spaced loudspeakers is used, cf.black dots in Fig. 3.The array is centred at the coordinates' origin and is located in the horizontal plane.A plane wave with a propagation direction of n pw = [0, −1, 0] T and a point source located at x s = [0, 2.5, 0] T m define the desired sound fields.As the source signal, a broadband impulse emitted at t = 0 is chosen.The sample rate 1 /T s and the speed of sound c are set to 44.1 kHz and 343 m /s, respectively.For LWFS, M and N pw are set to 27 and 1024, respectively.For the synthesis of the point source, the crossover frequency f c is set to 1.2kHz and a LR crossover with order 2η = 28 is used.For WFS, x ref is kept constant at [0, 0, 0] T m.
The sound field plots shown in Fig. 3 are used to analyse in how far LWFS is capable to reconstruct the spatial structure of the sound field.Especially, an accurate synthesis of the first wave front with a correct propagation direction is desired as this is likely to trigger humans' precedence effect [23] to correctly localise the sound source.It can be seen for conventional WFS, that the mentioned wave front is reconstructed well over the whole listening area, but followed by aliasing components, which is characteristic for this method.For a virtual plane wave, LWFS yields good results near the expansion centre, which agrees with evaluations done in [8].For the point source, both methods yield the correct structure near the expansion centre.Without the allpass compensation of the LR crossover, significant phase distortions behind the first wave front can be observed, cf.third column in Fig. 3.
The magnitude spectra of the reproduced sound fields evaluated for different expansion centres are depicted in Fig. 4. The results agree with the prior observations from the sound field plots: WFS introduces significant aliasing artefacts above ≈ 1 kHz.For LWFS, only minor fluctuations at high frequencies can be identified.Note, that a comparison between (15) and (17) for the synthesis of a point source is obsolete, as the optional allpass compensation does not effect the magnitude spectrum.For all methods, deviations at low frequencies are caused by the high-frequency approximations included in the theory of WFS, which also builds the basis for the LWFS approaches.Contrary to LWFS, the reference position x ref in WFS is independent from x c , which results in an amplitude offset for evaluation positions other than x ref .

VI. CONLUSION
This paper presents a time-domain realisation of a Local Wave Field Synthesis approach using spatially bandwidthlimited sound fields.A parametric implementation for modelbased rendering targeting plane waves and point sources as virtual source models is given.The main building blocks are their circular expansion coefficients, an IFFT to convert this representation into a plane wave expansion and a bank of delay lines to synthesise individual plane waves.While the coefficients for a plane wave can be provided straightforwardly, two approaches have been presented to overcome stability problems for the IIR implementation of a point source.For both approaches, a frequency crossover between the conventional WFS and LWFS driving signal was suggested.The presented realisation allows for the synthesis of time-varying scenarios, as scene parameters like source positions or the circular expansion centre can be changed.Future work includes the consideration of other crossover approaches for the synthesis of the point source.Also the artefact-free handling of IIR filter coefficient changes has to be addressed.

Fig. 2 :
Fig. 2: The block-diagram shows the signal flow for the time-domain realisation of model-based rendering in LWFS.The vectors w m and τ m subsume all w m (x 0 , x c ) and τ m (x 0 ), respectively.

Fig. 3 :
Fig. 3: The plots qualitatively show time-snapshots of the reproduced sound fields using different SFS approaches (columns, reference shown in brackets).The colour encodes the logarithmically scaled magnitude ranging from -60 (white) to 0 dB (dark blue).The desired sound field is a plane wave (PW) and a point source (PS).The expansion centre x c for the LWFS approaches is set to [0, 0, 0] T m (top) and [−0.75, 0.75, 0] T m (bottom).It is indicated by the red cross and is added to the WFS plots for comparison.The time in each snapshot is set to t = |x s −x c | /c.The green line indicates of the wave front of the desired sound field as the ground truth.

Fig. 4 :
Fig. 4: Magnitude spectrum of the reproduced sound field at x c for different SFS techniques.The spectra are normalised to the spectrum of the desired sound field, i.e. plane wave (PW) and point source (PS).The remaining parametrisation is the same as for Fig. 3.