Non-iterative Filter Bank Phase (Re)Construction

Signal reconstruction from magnitude-only measurements presents a long-standing problem in signal processing. In this contribution, we propose a phase (re)construction method for filter banks with uniform decimation and controlled frequency variation. The suggested procedure extends the recently introduced phase-gradient heap integration and relies on a phase-magnitude relationship for filter bank coefficients obtained from Gaussian filters. Admissible filter banks are modeled as the discretization of certain generalized translation-invariant systems, for which we derive the phase-magnitude relationship explicitly. The implementation for discrete signals is described and the performance of the algorithm is evaluated on a range of real and synthetic signals.


I. INTRODUCTION
In this contribution, we suggest a direct method for the construction of time-frequency phase information from magnitude-only measurements with respect to a collection of analysis filters. In Fourier-based signal analysis, phase information is crucial for signal reconstruction from filter bank (FB) coefficients. Two variants of the phase reconstruction problem are most prominent: (a) Due to limitations in the measurement/analysis process, only magnitude measurements can be obtained or the phase is involuntarily lost in some processing step. (b) In many processing applications, the phase of the analysis coefficients before processing is known. However, after coefficient modification, the known phase is often invalid and has to be adjusted. While the first instance is common in optics and medical imaging, where phase retrieval has been an active problem for several decades [1], the second instance is arguably more important in audio signal processing. It arises in applications such as source separation and denoising [2], timestretching/pitch shifting [3], speech synthesis [4] and missing data inpainting [5], to name a few.
In order to reconstruct a signal from representation coefficients, it is necessary for the underlying representation to be invertible. For linear systems, invertibility is essentially equivalent to the frame property. Moreover, it has been shown that for a generic phase retrieval algorithm to have any hope of providing reliable solutions, a certain overcompleteness is This work was supported in part by the Austrian Science Fund (FWF) START-project FLAME ("Frames and Linear Operators for Acoustical Modeling and Parameter Estimation"; Y 551-N13) and MERLIN (I 3067-N30). This is the Author's Accepted Manuscript version of work presented at EU-SIPCO17. It is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. The published version is available at: https://ieeexplore.ieee.org/ abstract/document/8081342 strictly necessary [6]. For such redundant, invertible linear systems, a number of iterative phase retrieval schemes have been proposed, the most important of which is the Griffin-Lim algorithm (GLA) [7]. In particular, the recent fast GLA (fGLA) [8] provides good results with reasonable computational performance. Generally, all iterative phase reconstruction algorithms require a significant number of rather costly iterations, see also [9] for an alternative to fGLA . For the particular case of the short-time Fourier transform (STFT), specialized methods have been presented [10]- [13]. Here, we introduce an extension of phase gradient heap integration (PGHI), see [13], where a more exhaustive overview and comparison of previous phase reconstruction schemes is given. PGHI uses the phase-magnitude relationship of the STFT with a Gaussian window [14] to compute the phase gradient from the magnitude coefficients and generate a phase estimate by integration.
We derive a generalization of the essential equations provided in [13], valid for certain generalized translation-invariant (GTI) systems [15]. Although we are not able to exactly determine the phase gradient solely from known information, our evaluation shows that the resulting approximation achieves excellent results.
Notation: In this manuscript, we consider continuous or discrete signals of finite energy, i.e. s ∈ L 2 (R) or s ∈ 2 (Z). By T x , M ξ and D γ we denote the translation, modulation and dilation operators given by T x s = s(· − x), M ξ s = e 2πiξ(·) s, and D γ s = γ −1/2 s(·/γ) and their analogue on 2 (Z). Without subscript, T denotes the time-weighting operator Ts = (·)s.

VARIATION AND THE DERIVED FILTER BANKS
A generalized translation-invariant (GTI) system on L 2 (R) is a collection of functions {g i } i∈I ⊂ L 2 (R), for some index set I, together with all their translations on the real line, i.e. {T x g i } x∈R,i∈I . Here, we only consider I = R, identified with frequency, and g ξ := M ξ D γ(ξ) g, where g ∈ L 2 (R) is the prototype function and γ : R → R + is a continuous function of the frequency variable ξ determining the frequency-bandwidth relationship. We define The analysis coefficients of a function s with respect to G(γ, g) are defined through the inner products c s (x, ξ) := V G(γ,g) s(x, ξ) := s, g x,ξ = s * g ξ (−·)(x), (2) arXiv:2202.07498v1 [cs.SD] 15 Feb 2022 for all x, ξ ∈ R. The final equality shows that c s (·, ξ) is a filtering of s with the filter g ξ (−·). The complex-valued function V G(γ,g) s can be described by its magnitude and phase as for all x, ξ ∈ R, where M G(γ,g) s := |V G(γ,g) s| is the magnitude and φ s G(γ,g) is the phase of V G(γ,g) s. Let {g k } k∈Z ⊂ L 2 (R) be a collection of functions and a ∈ R + a decimation factor. The system {g n,k } n,k∈Z with g n,k = T na g k is a filter bank (FB). The analysis coefficients of s with respect to {g n,k } n,k∈Z are The frame property guarantees the stable invertibility of the coefficient mapping by means of a dual frame For FBs with uniform decimation, various efficient methods exist for computing the dual frame or at least the synthesis operation (4), see [16]- [18]. Clearly, if ξ : Z → R is an increasing function, the filter bank is a sampling of G(γ, g) and

III. PHASE-MAGNITUDE RELATIONSHIPS FOR GAUSSIAN GTI SYSTEMS
Assume that g ∈ C 1 and γ ∈ C 1 . A straightforward calculation using and analogous for ∂ ∂ξ log V G(γ,g) s , show that the equalities provided in (7) hold for all g ∈ L 2 (R)∩C 1 (R). The derivation steps here are analogous to [19]. Now, if we set g = g 0 := 2 1/4 e −π(·) 2 , then the equality g 0 = −2πTg 0 yields (8) and if γ is constant, we obtain the phase-magnitude relationship for the STFT [14].

IV. APPLICATION TO FILTER BANK PHASE (RE)CONSTRUCTION
Given a FB G(γ • ξ, g 0 , a) as per (5), the results of the previous section can be used to obtain a phase estimate φ s G(γ,g) from the magnitude measurements |c s [n, k]| = M G(γ,g0) s(na, ξ(k)). To that end, PGHI [13] is adapted to cope with the more general filter bank structure. Before a phase estimate can be constructed, we have to compute an estimate of the phase derivative from the given magnitude. Assuming that only |c s |, ξ(·) and γ(ξ(·)) are known, we have where ∆ n and ∆ k are discrete differentiation schemes. Since the sampling step in time is uniform and equals the decimation factor a, we use simple centered differences for ∆ n , i.e.
The sampling step in frequency is variable, depending on ξ(·).
Hence, weighted centered differences are used: .
(12) Note that we omit the terms depending on V G(γ,T 2 g) s. Although this introduces additional inaccuracies, our results have shown that the contribution of those terms is minor and their omission has little adverse effect.
The integration of (∆ φ,x,s Integration in time direction is again straightforward, see (9), while integration in frequency direction takes the channel distance into account (10). The purpose of the heap integration algorithm, described in the next section, is the initialization of the integration and the adaptive selection of the integration path, i.e. when to use (9) or (10).

V. IMPLEMENTATION AND ANALYSIS
In practice, we work with sampled signals and digital filters in 2 (Z). The signal s and the prototype filter g are assumed to be samples of smooth and localized functions, such that the procedure described above still provides a valid estimate of the phase of c s . Moreover, we only have to consider a limited number of frequency channels k ∈ K := {0, . . . , K − 1} and, if s is finitely supported, time positions n ∈ N := {0, . . . , N − 1}, for which to compute φ s G(γ,g0) (na, ξ(k)). Algorithm 1 (FBPGHI), a modified version of PGHI, is used to compute the phase estimate φ s G(γ,g0) . If N is very large, or s has infinite support, RTPGHI [20] can be similarly adapted.
The algorithm introduces several possible sources of inaccuracy. In addition to the errors present in PGHI, we (a) approximate γ (ξ(k)) by a weighted centered difference only involving γ(ξ(k)), γ(ξ(k ± 1)), where available. (b) disregard the real or imaginary part of in (8), since there is no straightforward way to obtain them from known information. Therefore, the accuracy of the algorithm rests on γ not to vary too quickly. In this case, the derivative of γ is approximated well by the finite difference scheme and moreover, the factor γ (ξ)/γ(ξ) is expected to be small, such that the missing term has little influence on the result.

VI. EVALUATION
To demonstrate the performance of the proposed algorithm, we applied it to a number of real and synthetic audio signals, using several different filter bank configurations. Moreover, we compare our results to the results provided by the established, iterative fast Griffin-Lim algorithm (fGLA). Phase reconstruction algorithms are usually not expected to recover the original phase exactly and, typically, the reconstruction quality cannot be easily judged by simply comparing the waveforms of the original and reconstructed signals. In [7], Griffin and Lim have proposed to use the spectral difference E spec (s,s) = 20 log 10 |cs|−|cs| 2 cs 2 , to measure the distortion of the phaserestored signals. Despite some flaws, E spec usually provides a useful indicator of the restoration quality. Figure 1(r) shows a typical example of the phase difference between the original representation phase and the phase obtained with FBPGHI. See [13] for more details.
Evaluation setup: For the evaluation, we selected seven signals, sampled at ξ s = 44.1 kHz each: , l ∈ ξ s , where echirp 1 and echirp 2 are real-valued, constant amplitude chirps with exponential frequency modulation and center frequency increasing from 500 Hz to 15 kHz, resp. decreasing from 18 kHz to 3 kHz. • s 3 is ξ s samples of white noise. 1) Adapted to the ERB scale [23], bins = 1 and bw = 2, for the full frequency range, see also [18]. 2) Adapted to the ERB scale, but bins = 4, bw = 1/2. The latter 2 scales have no particular perceptual relevance and were chosen merely for demonstration purposes. For now, our method only considers uniform decimation by a. The chosen decimation factors and resulting redundancies are shown in Table I. Quantitative evaluation: Table II lists the spectral difference in dB of the solution provided by the proposed algorithm for all combinations of signals and filter banks. It can be seen that, despite considerably different redundancies, the algorithm Algorithm 1: Phase Gradient Heap Integration -FBPGHI Input: Magnitude |c s | of FB coefficients, estimates ∆ φ,x,s G(γ,g) and ∆ φ,ξ,s G(γ,g) of the time and frequency phase derivative, relative tolerance tol . Output: Phase estimate φ s G(γ,g0) . 1 abstol ← tol · max (c s [n, k]); 2 Create set I = {(n, k) ∈ K × N : c s [n, k] > abstol }; 3 Assign random values to φ s (n, k) for k / ∈ I; 4 Construct a self-sorting max heap [21] for (n, k) pairs; 5 while I is not ∅ do 6 if heap is empty then 7 Move (k m , n m ) = arg max  performs similar for all considered FBs in terms of spectral difference. The possible exception to this rule is the ERB-scale FB(1) with only 1 filter per ERB, which performs worse in almost all cases. Also of note are the large values of E spec for the noise s 3 , which is consistent with the evaluations in [13].
Comparison with iterative methods: In [8], [13], it was shown that, for Gabor transforms, fGLA performs comparably or better than other iterative schemes in terms of spectral difference. Since we have no reason to assume that the situation changes in the filter bank setting, we consider fGLA as reference algorithm. In Figure 2, we provide some examples as to how FBPGHI compares to fGLA iterations for the signals s 4 to s 7 and FB (2). Between 30 and 80 fGLA steps are necessary to achieve the same E spec as FBPGHI and any meaningful improvement requires a large number of additional Figure 1: Filter bank spectrogram (l) and difference between original and FBPGHI-restored phase (r) for an excerpt of s 4 . The displayed phase difference is the difference between the phase angles in radians, divided by π.

FB
(1)   fGLA steps 1 . Note that every step of fGLA requires 1 application each of FB synthesis and analysis. Therefore, every iteration has considerable computational cost, while FBPGHI is very efficient, see also [13] for more details. In the same contribution, it was shown that the initialization of fGLA with PGHI provided a significant quality boost over both methods. We expect the same for FBPGHI.
Perceptual performance: Informal listening of the signals restored by the proposed FBPGHI algorithm or 80 fGLA iterations, for signals s 1 through s 7 and FBs (1) to (5) led to the conclusion that both methods performed comparably and without significant artifacts on all signals 1 . No clear performance gap between the algorithms could be detected, with the exception of FB(1) which produced clearly audible artifacts for both FBPGHI and fGLA, albeit on different signals. We attribute these artifacts to the poor frequency resolution of FB(1).

VII. CONCLUSIONS AND OUTLOOK
We have provided an extension of the recent PGHI algorithm for phase reconstruction to filter banks with controlled frequency variation and uniform decimation. Experiments have shown that the algorithm performs competitively in terms of an established objective error measure and also perceptually.
A significant drawback of the proposed method is the required redundancy, in particular for filter banks with highly  varying filter bandwidths, see Table I FBs (3), (5). Therefore, a logical next step will be the combination of the heap integration method with nonuniform decimation. Such a scheme will enable the selection of an appropriate sampling step for each frequency channel. Therefore, it can be expected that redundancy is significantly reduced without meaningful impact to the restoration quality. However, the adaptation of the heap integration method to a truly nonuniform sampling grid requires significant work.
In [25], the authors propose an improved phase vocoder based on PGHI for time-stretching and pitch-shifting. A similar application of FBPGHI is conceivable and might possibly further improve the quality of the achieved effect. Future theoretic work could be concerned with finding an appropriate estimate including the neglected terms in (8) as well as estimates for the error if the used filters differ from the Gaussian. Preliminary results 1 have shown promising results for Blackman filters.