On the feasibility of personal audio systems over a network of distributed loudspeakers

Personal audio reproduction systems deal with the creation of personal sound zones within a room without the necessity of using headphones. These systems use an array of loudspeakers and design the required filters at each loudspeaker in order to render the desired audio signal to each person in the room as free of interferences as possible. There are very interesting proposals in the literature that make use of circular or linear arrays, but in this paper we study the problem considering a network of distributed loudspeakers controlled by a set of acoustic nodes, which can exchange information through a network. We state the model of such a distributed system by considering the electro-acoustic paths between the loudspeakers and each microphone, and try to provide a minimum signal-to-interference-and-noise ratio (SINR) to each zone, but constraining the emitted power of the loudspeakers to a maximum value (avoiding annoying feedback effects). We make use of optimization techniques to decide if, given a distribution of the loudspeakers and a location of the personal sound zones within the room, the system will be feasible. Simulations are done to support the use of the proposed optimization techniques.


I. INTRODUCTION
The aim of a personal audio system (PAS) is to create multiple audio zones inside a room by means of distributed loudspeakers. Recently, Betlehem et al. [1] presented an interesting overview of the major challenges that multizone sound control in a room has to deal with. One of the first approaches to this problem is based on the maximization of a contrast function between the bright zone (where the sound is presented) and the dark or silent zone, which converts the multizone problem into an optimization problem [2]. In this paper we deal with the case of creating two sound zones inside a room, where a different sound has to be rendered at each zone. From an alternative point of view, this problem is a generalization of the classical crosstalk cancellation problem [3].
The loudspeakers that provide the personal audio zones usually follow some geometric pattern and are fixed in their position inside the room. In this way, a careful design of the system can be done during a setup stage in order to provide the best quality of experience. In the case of wireless acoustic sensor networks (WASN) [4], where the loudspeakers do not follow any pattern and the acoustic nodes can change their location, the setup stage is even more critical, and it becomes necessary to develop tools that help to analyse the performance of a WASN prior to its use. In this paper we propose an algorithm that analyses the feasibility of a WASN to provide a certain signal-to-interference-and-noise ratio (SINR) at two different zones while keeping the sound power emitted by the loudspeakers as low as possible [5], [6]. Instead of maximizing the SINR functions as the contrast function in [2], the optimization consists of providing at least a certain SINR level at each zone while constraining the loudspeakers to emit with the minimum achievable power. If the algorithm is evaluated for a certain range of desired SINRs, the maximum SINR for which the system is feasible can be used as a 'performance measure' of the deployed WASN to provide two different bright zones. Moreover, although the algorithm is proposed for only two sound zones in this paper, it could be easily extended to several zones provided that each zone is covered by one acoustic node.
The outline of the paper is as follows: Section II states the model of the signals involved in the PAS when a WASN is used. In Section III the proposed algorithm is explained, whereas in Section IV simulation results based on real channels from a WASN are shown to support its benefit. Finally, Section V summarizes the main conclusions.

II. MODEL AND DESCRIPTION
A wireless acoustic network formed by two nodes is considered, as in Fig. 1, where each node is equipped with an array of $N$ loudspeakers and one microphone. The WASN is deployed in order to create different sound zones at the locations of the microphones, in such a way that the dominant sound at the position of the first node should be $s_1(n)$, whereas at the position of the second node the dominant sound should be $s_2(n)$.
The signal recorded at the $m$th microphone ($m = 1, 2$) at discrete time $n$ is given by: where $(\cdot)^T$ stands for transposition, the symbol $*$ denotes discrete-time convolution, $\mathbf{c}_{ml,k}$ models the $[L_c \times 1]$ electro-acoustic path between the $l$th loudspeaker of node $k$ and the $m$th microphone, $L_c$ is the maximum number of samples of any $\mathbf{c}_{ml,k}$, and $z_m(n)$ is the electro-acoustic noise. In Fig. 1, the $l$th loudspeaker of node $k$ is fed by the signal $v_{l,k}(n)$, given by: where $g_{ik,l}$ stands for the $l$th element of the vector $\mathbf{g}_{ik}$ and $p_{ik}$ represents a power term associated with the whole array of node $k$ that affects only signal $s_i(n)$. The elements of $\mathbf{g}_{ik}$ are control parameters associated with every loudspeaker of node $k$ that also affect only the signal $s_i(n)$. The use of both parameters, $p_{ik}$ and $\mathbf{g}_{ik}$, will be explained later. Substituting (2) into (1), the model of the signals recorded at each microphone is stated as: where the convolutions between the acoustic paths $\mathbf{c}_{ml,k}$ and the signals $s_i(n)$ are calculated using vectors
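The signal model described above can be sketched numerically. This is a minimal sketch, not the authors' implementation: it assumes that the loudspeaker feed is $v_{l,k}(n) = \sum_i \sqrt{p_{ik}}\, g_{ik,l}\, s_i(n)$ (the square root on the power term is an assumption), and the function name and array layout are hypothetical.

```python
import numpy as np

def microphone_signals(c, g, p, s, noise_std=0.0, rng=None):
    """Simulate the microphone signals x_m(n) of a WASN (illustrative only).

    c: (M, K, N, Lc) paths c[m, k, l] from loudspeaker l of node k to microphone m
    g: (I, K, N)     'effort' weights g[i, k, l] for signal i at node k
    p: (I, K)        non-negative power terms p[i, k]
    s: (I, T)        source signals s_i(n)
    """
    rng = np.random.default_rng() if rng is None else rng
    M, K, N, _ = c.shape
    T = s.shape[1]
    # Assumed feed: v[k, l, n] = sum_i sqrt(p[i, k]) * g[i, k, l] * s[i, n]
    v = np.einsum("ik,ikl,it->klt", np.sqrt(p), g, s)
    x = np.zeros((M, T))
    for m in range(M):
        for k in range(K):
            for l in range(N):
                # Discrete-time convolution of the path with the loudspeaker feed
                x[m] += np.convolve(c[m, k, l], v[k, l])[:T]
        # Additive electro-acoustic noise z_m(n), modelled here as white Gaussian
        x[m] += noise_std * rng.standard_normal(T)
    return x
```

With measured impulse responses in `c`, the same call would give the signals at both microphones for any candidate choice of `g` and `p`.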

III. ANALYSIS OF THE WASN
Since the goal of the system is to render signal $s_1(n)$ at the location of the first microphone and signal $s_2(n)$ at the location of the second one, we define, for each $x_m(n)$, the desired signal as: and the interference signal as: with $m = 1, 2$, $i = 1, 2$ and $i \neq m$.
In order to obtain a compact form of the above expressions, let us define the matrix containing the $N$ electro-acoustic paths between the array of loudspeakers of node $k$ and microphone $m$ as: Therefore, the desired and interference signals for $m = 1, 2$, $i \neq m$, can be expressed as: It is straightforward from (4) and (8)-(9) that $x_m(n) = x_{m,D}(n) + x_{m,I}(n) + z_m(n)$; thus the signal-to-interference-and-noise ratio (SINR) can be expressed for each microphone signal as: where $\sigma_m^2$, $\sigma_i^2$ and $\sigma_{Z_m}^2$ are the average powers of $s_m(n)$, $s_i(n)$ and $z_m(n)$, respectively. From this point on, we will assume the following conditions:
(i) Both signals $s_1(n)$ and $s_2(n)$ have the same average power, $\sigma_1^2 = \sigma_2^2$. This is assumed for the sake of clarity, but it is also a common condition when testing audio systems with white noise signals.
(ii) We define a new noise power term normalized with respect to the signal power:
(iii) The electro-acoustic paths of node $k$ are assumed to be uncorrelated with the electro-acoustic paths of node $j$. As can be seen from Fig. 1, the two arrays of loudspeakers are considered to be deployed over different, separated areas, thus the coupling or cross-correlation between them can be neglected. Therefore, the cross terms $\mathbf{C}_{m,k}^T \mathbf{C}_{m,j}$ in (10) with $k \neq j$ will be discarded.
Under assumptions (i)-(iii), and denoting $\mathbf{R}_{m,k} = \mathbf{C}_{m,k}^T \mathbf{C}_{m,k}$, the final expression of the SINR at the location of each microphone $m$ is given by: Matrix $\mathbf{R}_{m,k}$ in (11) can be considered as the correlation matrix of the acoustic channels between the loudspeakers of node $k$ and the $m$th microphone. In physical terms, $\mathbf{R}_{m,k}$ is related to the amount of acoustic coupling between the acoustic channels in $\mathbf{C}_{m,k}$ [7]. In this sense, the vectors $\mathbf{g}_{mk}$ could be seen as 'effort' vectors that help the system to deal with a given acoustic coupling. On the other hand, if the $s_i(n)$ were single-frequency signals, $\mathbf{g}_{mk}$ could be seen as the set of coefficients needed to build the signals $v_{l,k}(n)$ in (2) in order to maximize the 'acoustic contrast' [1], [8], which in our case would be defined as the SINR in (11).
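To make the role of $\mathbf{R}_{m,k}$, $\mathbf{g}_{mk}$ and $p_{mk}$ concrete, the following sketch evaluates a two-zone SINR. It assumes that (11) is a ratio of power-weighted quadratic forms, $\sum_k p_{mk}\,\mathbf{g}_{mk}^T \mathbf{R}_{m,k}\,\mathbf{g}_{mk}$ over $\sum_k p_{ik}\,\mathbf{g}_{ik}^T \mathbf{R}_{m,k}\,\mathbf{g}_{ik}$ plus the normalized noise power, with $\mathbf{R}_{m,k} = \mathbf{C}_{m,k}^T \mathbf{C}_{m,k}$. That form is inferred from the surrounding description, and the function name is hypothetical.

```python
import numpy as np

def sinr(C, g, p, noise):
    """Two-zone SINR under the assumed quadratic-form expression.

    C[m][k]: (Lc, N) channel matrix from node k to microphone m
    g[i][k]: (N,) 'effort' vector for signal i at node k
    p[i][k]: power term for signal i at node k
    noise[m]: normalized noise power at microphone m
    """
    def quad(m, i, k):
        # g_ik^T R_mk g_ik with R_mk = C_mk^T C_mk
        R = C[m][k].T @ C[m][k]
        return float(g[i][k] @ R @ g[i][k])

    out = []
    for m in range(2):
        i = 1 - m  # index of the interfering signal
        desired = sum(p[m][k] * quad(m, m, k) for k in range(2))
        interference = sum(p[i][k] * quad(m, i, k) for k in range(2))
        out.append(desired / (interference + noise[m]))
    return np.array(out)
```

Raising a power term for one signal increases the numerator at its own microphone but also the denominator at the other one, which is the coupling that the power control has to resolve.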
Regarding the power terms $p_{mk}$ in (11), they can be considered as control parameters used to minimize the power of the sound emitted by each node. In physical terms, the system depicted in Fig. 1 could present a dangerous behaviour if there were no limits on the emitted sound level: node $k$ could increase its power $p_{mk}$ in order to improve the term related to $s_m(n)$ at microphone $m$, but this would also increase the amount of interference received at the other microphone, creating a loop that could make the system unstable.
Therefore, both parameters $\mathbf{g}_{mk}$ and $p_{mk}$ are used in the following algorithm in order to analyse the feasibility of the WASN system to render two different sound zones. Moreover, the proposed algorithm also gives the maximum SINR that could be obtained by the WASN using only these 'effort' and power control parameters, which could be seen as a 'performance' measure of the deployed WASN.

A. Constrained power minimization algorithm
The constrained power minimization algorithm [5], [6] tries to minimize the total sound power emitted by the WASN loudspeakers subject to the SINR at each microphone being no smaller than a required SINR threshold: where $\gamma_m$ is the required SINR at the $m$th microphone. Notice that the solution to (12) will be considered feasible only if the power terms $p_{mk}$ are non-negative. The minimum power in (12) will be obtained for the lowest achievable SINR [5], that is, when $\mathrm{SINR}_m = \gamma_m$. Under this condition, we can express: the block diagonal matrix $\mathbf{D}$ as: with the block anti-diagonal matrix $\mathbf{F}$ as: with and vector $\mathbf{u} = [\gamma_1 \sigma_{Z_1}^2, \gamma_2 \sigma_{Z_2}^2]^T$.
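Read directly from the description above, problem (12) presumably takes the following form. This is a reconstruction rather than the original typesetting: $a_{mi,k}$ is a hypothetical shorthand, and the equality in the second line additionally assumes that the SINR in (11) is a ratio of power-weighted quadratic forms in the vectors $\mathbf{g}_{ik}$.

```latex
\begin{aligned}
&\min_{\{p_{mk}\}} \; \sum_{m=1}^{2}\sum_{k=1}^{2} p_{mk}
\quad \text{subject to} \quad \mathrm{SINR}_m \ge \gamma_m, \quad m = 1, 2, \\
&\text{and, at } \mathrm{SINR}_m = \gamma_m: \quad
\sum_{k=1}^{2} a_{mm,k}\, p_{mk}
= \gamma_m \Big( \sum_{k=1}^{2} a_{mi,k}\, p_{ik} + \sigma_{Z_m}^2 \Big),
\quad i \neq m, \\
&\text{with } a_{mi,k} = \mathbf{g}_{ik}^T \mathbf{R}_{m,k}\, \mathbf{g}_{ik}.
\end{aligned}
```

Under these assumptions the two equalities are linear in the four power terms, with the desired-signal gains on one side and the interference gains and noise on the other, which matches the block diagonal and block anti-diagonal structure described in the text.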

B. Feasibility of the solution
For a given set of vectors $\mathbf{g}_{mk}$ and required values $\gamma_m$, $\mathbf{p}$ is a feasible solution of (19) if it is non-negative. This can be verified by means of the Perron-Frobenius theorem, which states some conditions on non-negative and irreducible matrices (see chapter 8 of [9] for a full description). One of the practical properties extracted from the theorem is [9]: Assume the $[n \times n]$ matrix $\mathbf{A}$ is non-negative and irreducible. For a given non-negative and non-trivial vector $\mathbf{c}$ and any constant $s$, the necessary and sufficient condition to obtain a non-negative and non-trivial solution $\mathbf{x}$ to $(s\mathbf{I} - \mathbf{A})\mathbf{x} = \mathbf{c}$ is $s > \rho(\mathbf{A})$, where $\rho(\mathbf{A})$ is the spectral radius of $\mathbf{A}$, defined as the maximum absolute value of the eigenvalues of $\mathbf{A}$.
In order to apply this property to our problem, (19) can be expressed as $(\mathbf{I} - \mathbf{D}^{\dagger}\mathbf{F})\mathbf{p} = \mathbf{D}^{\dagger}\mathbf{u}$, and, consequently, the necessary and sufficient condition for (19) to be feasible is:
$$\rho(\mathbf{D}^{\dagger}\mathbf{F}) < 1. \tag{20}$$

C. Solution based on a virtual WASN [5]
The condition in (20) assures that there exists a non-negative vector $\mathbf{p}$ that minimizes (12) for a given set of vectors $\mathbf{g}_{mk}$ and a required SINR $\gamma_m$. However, the range of achievable SINRs depends on $\mathbf{g}_{mk}$ as much as on the power terms. Moreover, the maximization of (11) cannot be done at each zone separately, since each vector $\mathbf{g}_{mk}$ affects both zones: for instance, $\mathbf{g}_{1k}$ appears in the numerator of $\mathrm{SINR}_1$ and in the denominator of $\mathrm{SINR}_2$. The algorithm proposed in the following tries to separate this mutual influence of the vectors $\mathbf{g}_{mk}$ by making use of a virtual system [5], which is a dual version of the real one. In this way, an iterative algorithm can be proposed such that at each iteration the vectors $\mathbf{g}_{mk}$ are calculated to maximize the SINR, and afterwards the vector $\mathbf{p}$ is calculated to minimize the power.
Let us define a virtual WASN, dual of the one depicted in Fig. 1, where, for each node $k$, the microphone has been substituted by a loudspeaker and the $N$ loudspeakers have been substituted by $N$ microphones, while the acoustic channels remain exactly the same. That is, the virtual WASN would have the same representation of the channels as the one in Fig. 1, but with the arrows reversed. This type of virtual system has been widely used in mobile communication systems [5], [6], [10] due to an interesting property: if the respective constrained power minimization problem is formulated for the virtual WASN and it has a feasible solution, the constrained power minimization problem for the real WASN is also feasible, and the vectors $\mathbf{g}_{mk}$ that maximize the SINRs of the virtual WASN also maximize the SINRs of the real one [5], [11].
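The feasibility test based on the spectral radius can be sketched as follows. This is only an illustration under stated assumptions: the layout of $\mathbf{D}$ (block diagonal, $2 \times 4$), of $\mathbf{F}$ (block anti-diagonal, with the thresholds $\gamma_m$ folded in) and the ordering $\mathbf{p} = [p_{11}, p_{12}, p_{21}, p_{22}]^T$ are guesses consistent with the text, and the array `a` of gains $a_{mi,k} = \mathbf{g}_{ik}^T \mathbf{R}_{m,k}\, \mathbf{g}_{ik}$ and the function name are hypothetical.

```python
import numpy as np

def is_feasible(a, gamma):
    """Check the assumed condition rho(pinv(D) @ F) < 1.

    a[m, i, k]: gain at microphone m of signal i emitted by node k
    gamma[m]:   required SINR (linear scale) at microphone m
    Returns (feasible, spectral_radius).
    """
    a = np.asarray(a, dtype=float)
    D = np.zeros((2, 4))
    F = np.zeros((2, 4))
    for m in range(2):
        i = 1 - m
        D[m, 2 * m:2 * m + 2] = a[m, m]             # desired-signal gains
        F[m, 2 * i:2 * i + 2] = gamma[m] * a[m, i]  # scaled interference gains
    # D is not square, so the Moore-Penrose pseudo-inverse is used
    B = np.linalg.pinv(D) @ F
    rho = float(np.max(np.abs(np.linalg.eigvals(B))))
    return rho < 1.0, rho
```

Sweeping `gamma` upwards until the returned radius reaches one would give, for fixed vectors $\mathbf{g}_{mk}$, an estimate of the largest SINR for which the power control problem remains feasible.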
Let us define the SINR of the virtual WASN for the $m$th signal as [5]: where the parameters $\tilde{p}_{mk}$, $\tilde{p}_{ik}$ are the corresponding virtual powers and: The parameter $\alpha_m$ is the result of considering a noise of unit power [5] recorded by the virtual microphones of each node. Let us also define the equivalent constrained power minimization problem of (12) for the virtual WASN as: and, finally, let us also state the equivalent equations to (13)-(19) for the virtual WASN as: which can be expressed as: where $\tilde{\mathbf{F}}$ is also an anti-diagonal matrix defined as: the vector $\tilde{\mathbf{p}}$ is defined as: and the vector $\tilde{\mathbf{u}}$ is defined as $\tilde{\mathbf{u}} = [\gamma_1 \alpha_1, \gamma_2 \alpha_2]^T$.
As stated before, the benefit of the virtual system is that it can be solved iteratively. Once the solution is reached and checked to be feasible, the resulting vectors $\mathbf{g}_{mk}$ also maximize the real SINRs, assuring a feasible solution to (11). On the other hand, the resulting vector of virtual parameters $\tilde{\mathbf{p}}$ has no meaning in the real WASN, so the real vector $\mathbf{p}$ has to be computed separately from $\tilde{\mathbf{p}}$, although both vectors are computed at each iteration. For this purpose we use the minimum norm solution of (19) and (24) given by the Jacobi iterative method [12], which obtains the same value as the direct solution to (19), but iteratively. The Jacobi update equation for (19) is expressed as: and for (24) it is expressed as: where $N_{it}$ is the iteration step. The final iterative algorithm to solve the constrained power minimization of (12) is given in Algorithm 1. Convergence of the algorithm is defined as reaching $\mathrm{SINR}_m = \gamma_m$ for $m = 1, 2$ within a maximum of 50 iterations. If this equality is not achieved within 50 iterations, the algorithm stops and the WASN system is considered unfeasible.
Update real powers $\mathbf{p}$ by means of (28).

IV. SIMULATION RESULTS
The WASNs have been deployed in the listening room located at the laboratory of Audio and Communication Signal Processing of the Institute of Telecommunications and Multimedia Applications (iTEAM) (see http://www.iteam.upv.es/group/gtac.html). The room is 9.07 m long by 4.45 m wide by 2.69 m high, its ceiling and walls are isolated, and it presents a reverberation time of $T_{60} = 0.15$ s. The sampling frequency is $f_s = 11025$ Hz, the noise power is $\sigma_{z_m} = 10^{-6}$, and the acoustic paths have been estimated using chirp signals such that the channel vectors have $L_c = 1600$ coefficients.
Before presenting the results, a discussion on the channel estimation process must be provided. The Bluetooth connection of any device can be linked to only one adapter at a time, so the $N$ channels associated with one node have to be estimated one by one. A procedure to estimate the four channels associated with a WASN with two devices and only one loudspeaker per node can be found in [13]. This is the procedure that has been followed in this experiment, estimating in sequential order the channels associated with loudspeakers #1 of both nodes, afterwards the channels associated with loudspeakers #2, and so on. Once every channel is estimated, the random delay introduced by the Bluetooth adapter [13] is manually compensated to match the corresponding geometric arrangement of Fig. 2 for each configuration.
The values of the maximum SINR achieved by the WASNs and the resulting spectral radius when $\gamma_1 = \gamma_2$ are shown in Table 1. The number of iterations until convergence was 20 for WASN #1 and 29 for WASN #2. It can be seen that the second configuration provides a higher SINR, which is consistent with a more widely spread distribution of the loudspeakers. In Fig. 3, the evolution of the SINR (11) at each iteration is shown for the case of unfeasible systems, $\gamma_m = 3$ dB in WASN #1 (bottom) and $\gamma_m = 8$ dB in WASN #2 (top). For WASN #2, the value of the radius $\rho = 0.95 < 1$ indicates that the system is feasible according to the Perron-Frobenius theorem. However, the system is declared unfeasible because the algorithm does not converge within the maximum of 50 iterations. Indeed, it can be noticed how the SINRs approach the required value of $\gamma_m = 8$ dB, although they show an oscillating behaviour. On the other hand, the value of the radius when the required SINR is $\gamma_m = 3$ dB in WASN #1 is $\rho = 1.25$ (see bottom of Fig. 3), which indicates that the system is theoretically unfeasible. In this case, the SINRs present a steady behaviour once their maximum achievable values have been reached.
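The interplay between the spectral radius and the 50-iteration cap can be illustrated with a generic fixed-point (Jacobi-type) update $\mathbf{p} \leftarrow \mathbf{B}\mathbf{p} + \mathbf{c}$, where $\mathbf{B}$ plays the role of $\mathbf{D}^{\dagger}\mathbf{F}$ and $\mathbf{c}$ that of $\mathbf{D}^{\dagger}\mathbf{u}$. This is a simplified sketch with a hypothetical function name and stopping rule: it keeps the vectors $\mathbf{g}_{mk}$ fixed and stops on the relative change of $\mathbf{p}$, whereas the algorithm in the paper also updates $\mathbf{g}_{mk}$ at each iteration and tests $\mathrm{SINR}_m = \gamma_m$.

```python
import numpy as np

def fixed_point_powers(B, c, max_iter=50, tol=1e-6):
    """Iterate p <- B p + c starting from zero.

    Returns (p, iterations_used, converged). The error of this update
    shrinks roughly like rho(B)**t, so a radius close to one can exhaust
    the iteration budget even though a fixed point exists.
    """
    B = np.asarray(B, dtype=float)
    c = np.asarray(c, dtype=float)
    p = np.zeros_like(c)
    for it in range(1, max_iter + 1):
        p_next = B @ p + c
        change = np.linalg.norm(p_next - p)
        if change <= tol * max(np.linalg.norm(p_next), 1e-30):
            return p_next, it, True
        p = p_next
    return p, max_iter, False
```

For example, a matrix with spectral radius 0.5 settles well within the budget, while one with radius 0.95 is still far from its fixed point after 50 iterations, in line with the behaviour reported for WASN #2.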

V. CONCLUSION
We have presented an algorithm able to analyse the feasibility of a WASN to provide a certain SINR at the two zones of a PAS while keeping the power emitted by the loudspeakers as low as possible. The maximum achievable SINR for a certain loudspeaker distribution can be seen as a performance measure of the capacity of the WASN to provide the PAS.

G. Piñero*, C. Botella†, M. de Diego*, M. Ferrer*, A. González*

Table 1. Performance results of the WASNs.