Mutual Information of Wireless Channels and Block-Jacobi Ergodic Operators

Shannon’s mutual information of a random multiple antenna and multipath time varying channel is studied in the general case where the process constructed from the channel coefficients is an ergodic and stationary process which is assumed to be available at the receiver. From this viewpoint, the channel can also be represented by an ergodic self-adjoint block-Jacobi operator, which is close in many aspects to a block version of a random Schrödinger operator. The mutual information is then related to the so-called density of states of this operator. In this paper, it is shown that under the weakest assumptions on the channel, the mutual information can be expressed in terms of a matrix-valued stochastic process coupled with the channel process. This allows numerical approximations of the mutual information in this general setting. Moreover, assuming further that the channel coefficient process is a Markov process, a representation for the mutual information offset in the large signal to noise ratio regime is obtained in terms of another related Markov process. This generalizes previous results from Levy et al. It is also illustrated how mutual information expressions that are closely related to those predicted by random matrix theory can be recovered in the large dimensional regime.


Introduction
In order to introduce the problem that we shall tackle in this paper, we consider the example of a wireless communication model on a time and frequency selective channel that is described by the equation
\[ y_n = \sum_{\ell=0}^{L} c_{n,\ell}\, s_{n-\ell} + v_n, \tag{1} \]
where $L$ is the channel degree, where the complex numbers $s_n$, $y_n$ and $v_n$ represent respectively the transmitted signal, the received signal, and the additive noise at the moment $n$, and where the vector $C_n = [c_{n,0}, \ldots, c_{n,L}]^T \in \mathbb{C}^{L+1}$ represents the channel's impulse response at the moment $n$.
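In a mobile environment, the sequence $(C_n)$ is often modeled as a random ergodic process such that $\mathbb{E}\|C_0\|^2 < \infty$ (here we take $\|\cdot\|$ as the Euclidean norm). Assuming that this process is available at the receiver site, our purpose is to study Shannon's mutual information of this channel under the generic ergodicity assumption. By stacking $n-m+1$ elements of the received signal, where $m, n \in \mathbb{Z}$ and $m \le n$, we get the vector model
\[ \begin{bmatrix} y_m \\ \vdots \\ y_n \end{bmatrix} = B_{m,n} \begin{bmatrix} s_{m-L} \\ \vdots \\ s_n \end{bmatrix} + \begin{bmatrix} v_m \\ \vdots \\ v_n \end{bmatrix}, \qquad B_{m,n} := \begin{bmatrix} c_{m,L} & \cdots & c_{m,0} & & \\ & \ddots & & \ddots & \\ & & c_{n,L} & \cdots & c_{n,0} \end{bmatrix}. \]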
Let $\rho > 0$ be a parameter that represents the Signal to Noise Ratio (SNR). Considering the matrix/vector model above, and under some standard assumptions on the statistics of the processes $(s_n)$ and $(v_n)$ (see below), this mutual information is written as
\[ I_\rho = \operatorname*{aslim}_{n-m\to\infty} \frac{\log\det(\rho B_{m,n} B_{m,n}^* + I_{n-m+1})}{n-m+1} = \lim_{n-m\to\infty} \frac{\mathbb{E}\log\det(\rho B_{m,n} B_{m,n}^* + I_{n-m+1})}{n-m+1}, \tag{2} \]
where $B_{m,n}^*$ is the matrix adjoint of $B_{m,n}$, and where the existence and the equality of both the limits above ("aslim" stands for the almost sure limit) are essentially due to the ergodicity of $(C_n)$.
The natural mathematical framework for studying this limit is provided by the ergodic operator theory in the Hilbert space $\ell^2(\mathbb{Z})$, to which a very rich literature has been devoted in the field of statistical physics [25]. In our situation, $B_{m,n}$ is a finite rank truncation of the operator $B$ represented by the doubly infinite matrix
\[ B = \begin{bmatrix} \ddots & & \ddots & & \\ & c_{n,L} & \cdots & c_{n,0} & \\ & & \ddots & & \ddots \end{bmatrix}. \]
Thanks to the ergodicity of (C n ), it is known that the spectral measure (or eigenvalue distribution) of the matrix B m,n B * m,n converges narrowly in the almost sure sense to a deterministic probability measure called the density of states of the self-adjoint operator BB * , where B * is the adjoint of B. This convergence leads to the convergences in (2).
In statistical physics, the study of the density of states has focused most frequently on the Jacobi (or tridiagonal) ergodic operators which are associated to the so-called discrete Schrödinger equation in a random environment. In this framework, the Herbert-Jones-Thouless formula [7,25] provides a means of characterizing the density of states of an ergodic Jacobi operator, in connection with the so-called Lyapounov exponent associated with a certain sequence of matrices.
In the context of the wireless communications that is of interest here, it turns out that the use of the Thouless formula is possible when one considers BB * as a block-Jacobi operator. This idea was developed by Levy et al. in [19]. The expression of the mutual information that was obtained in [19] was also used to perform a large SNR asymptotic analysis so as to obtain bounds on the mutual information in this regime.
In this paper, we take another route to calculate the mutual information. The expression we obtain for I ρ in Theorem 1 below involves an ergodic process which is coupled with the channel process, and appears to be more tractable than the expression based on the top Lyapounov exponent provided in [19]. We moreover exploit the obtained expression for I ρ to study two asymptotic regimes: we first consider the large SNR regime in a Markovian setting, and obtain an exact representation for the constant term in the expansion of I ρ for large ρ. We also consider a regime where the dimensions of the blocks of our block-Jacobi operator converge to infinity; the expression of the mutual information that we recover is then closely related to what is obtained from random matrix theory [17,12]. In the context of the example described by Equation (1), this asymptotic regime amounts to L converging to infinity. Beyond this example, the large dimensional analysis can also be used to analyze the behavior of the mutual information of time and frequency selective channels in the framework of the massive Multiple Input Multiple Output (MIMO) systems ( [22]), which are destined to play a dominant role in the future wireless cellular techniques/standards.
Organisation of the paper. In Section 2, after stating precisely our communication model and our standing assumption, we provide our main result (Theorem 1). We then consider the large SNR regime in a Markovian setting (Theorem 2) along with some cases where the assumptions for this theorem to hold true are satisfied. In Section 3 we illustrate Theorems 1 and 2 with numerical experiments. There we also state our result on the large dimensional regime, which is related with one of the channel models considered in this section. The next sections are devoted to the proofs.

The model
The model herein is well-suited for the block-Jacobi formalism that we use in the remainder. Given two positive integers $N$ and $K$, we consider the wireless transmission model
\[ Y_n = F_n S_{n-1} + G_n S_n + V_n, \tag{3} \]
with $n \in \mathbb{Z}$ and where:
- $(Y_n)_{n\in\mathbb{Z}}$ represents the $\mathbb{C}^N$-valued sequence of received signals.
- $(S_n)_{n\in\mathbb{Z}}$ is the $\mathbb{C}^K$-valued sequence of transmitted information symbols.
- $(F_n, G_n)_{n\in\mathbb{Z}}$ with $F_n, G_n \in \mathbb{C}^{N\times K}$ is a matrix representation of a random wireless channel.
- $(V_n)_{n\in\mathbb{Z}}$ is the additive noise.
Let us first give a few examples which fit with this transmission model.
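The multipath single antenna fading channel. The channel described by Equation (1) is a particular case of this model. When $L > 0$, we put $N := K := L$ and
\[ Y_n := [y_{nL}, \ldots, y_{nL+L-1}]^T, \quad S_n := [s_{nL}, \ldots, s_{nL+L-1}]^T, \quad V_n := [v_{nL}, \ldots, v_{nL+L-1}]^T, \tag{4} \]
and $F_n, G_n \in \mathbb{C}^{L\times L}$ are the upper triangular and lower triangular matrices defined as
\[ F_n := \begin{bmatrix} c_{nL,L} & c_{nL,L-1} & \cdots & c_{nL,1} \\ & c_{nL+1,L} & \cdots & c_{nL+1,2} \\ & & \ddots & \vdots \\ & & & c_{nL+L-1,L} \end{bmatrix}, \qquad G_n := \begin{bmatrix} c_{nL,0} & & & \\ c_{nL+1,1} & c_{nL+1,0} & & \\ \vdots & & \ddots & \\ c_{nL+L-1,L-1} & \cdots & c_{nL+L-1,1} & c_{nL+L-1,0} \end{bmatrix}. \tag{5} \]
When $L = 0$, we set instead $N := K := 1$, $Y_n := y_n$, $S_n := s_n$, $V_n := v_n$, $F_n := 0$, and $G_n := c_{n,0}$. In the multiple antenna variant of this model, the channel coefficients $c_{n,\ell}$ are $R \times T$ matrices, where $R$, resp. $T$, is the number of antennas at the receiver, resp. transmitter. In this case, the $N \times K$ matrices $F_n$ and $G_n$ given by Eq. (5) when $L > 0$ are block triangular matrices with $N := RL$ and $K := TL$.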
The Wyner multi-cell model. Another instance of the transmission model introduced above is a generalization of the so-called Wyner multi-cell model considered in [14,30], where the index $n$ now represents the space instead of representing the time. Assume that the Base Stations (BS) of a wireless cellular network are arranged on a line, and that each BS receives in a given frequency slot the signals of the $L + 1$ users which are not too far from this BS. Alternatively, each user is also seen by $L + 1$ BS. In this setting, the signal $y_n$ received by the BS $n$ is described by Eq. (1) (where the time parameter is now omitted), where $s_n$ is the signal emitted by User $n$, and where $c_{n,\ell}$ is the uplink channel carrying the signal of User $n - \ell$ to BS $n$.
Other domains than the time or the space domain, such as the frequency domain, can also be covered, see e.g. [29], which deals with a time and frequency selective model. Moreover, this could even address different connected domains as the Doppler-Delay (connected via the so-called Zak transform), as in [4,3], which lead to modulation schemes that are considered as interesting candidates for the fifth generation (5G) wireless systems, as reflected in the references [13,6].

General assumptions
The purpose of this work is to study Shannon's mutual information between $(S_n)$ and $(Y_n)$ when the channel is known at the receiver. To this end, we consider the usual setting where:
- The information sequence $(S_n)_{n\in\mathbb{Z}}$ is random i.i.d. with law $\mathcal{CN}(0, I_K)$.
- The noise $(V_n)_{n\in\mathbb{Z}}$ is i.i.d. with law $\mathcal{CN}(0, \rho^{-1} I_N)$ for some $\rho > 0$ that scales with the SNR.
- The random sequences $(S_n)_{n\in\mathbb{Z}}$, $(F_n, G_n)_{n\in\mathbb{Z}}$, and $(V_n)_{n\in\mathbb{Z}}$ are independent.
Here and in the following, i.i.d. means "independent and identically distributed", and $\mathcal{CN}(0, \Sigma)$ stands for the law of a centered complex Gaussian circularly symmetric vector with covariance matrix $\Sigma$. We also make the following assumptions on the process $(F_n, G_n)_{n\in\mathbb{Z}}$ representing the channel:
Assumption 1. The process $(F_n, G_n)_{n\in\mathbb{Z}}$ is a stationary and ergodic process. Moreover,
\[ \mathbb{E}\|F_0\|^2 + \mathbb{E}\|G_0\|^2 < \infty. \tag{6} \]
Note that the moment assumption (6) does not depend on the specific choice of the norm on the space of $N \times K$ complex matrices. In the remainder, we choose $\|\cdot\|$ to be the spectral norm.
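Let us make precise the assumptions of stationarity and ergodicity. In the following we set for convenience $E := \mathbb{C}^{N\times K} \times \mathbb{C}^{N\times K}$ and consider the measure space $\Omega := E^{\mathbb{Z}}$ equipped with its Borel σ-field $\mathcal{F} := \mathcal{B}(E)^{\otimes\mathbb{Z}}$. An element of $\Omega$ reads $\omega = (\ldots, (F_{-1}, G_{-1}), (F_0, G_0), (F_1, G_1), \ldots)$, and we let $T$ denote the shift on $\Omega$, which translates the coordinate index by one. The assumption that $(F_n, G_n)_{n\in\mathbb{Z}}$ is an ergodic stationary process means that the shift $T$, seen as a measurable map from $(\Omega, \mathcal{F})$ to itself, is a measure preserving and ergodic transformation with respect to the probability distribution of the process $(F_n, G_n)_{n\in\mathbb{Z}}$. A fairly general stationary and ergodic model is provided by the following example.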
Example 1. In the single antenna and single path ($L = 0$) fading channel case, the autoregressive (AR) statistical model has been considered as a realistic model for representing the Doppler effect induced by the mobility of the communicating devices. This model reads
\[ c_{n,0} = \sum_{\ell=1}^{M} a_\ell\, c_{n-\ell,0} + u_n, \]
where $M > 0$ is the order of the AR channel process, $(u_n)_{n\in\mathbb{Z}}$ is an i.i.d. driving process, and $(a_1, \ldots, a_M)$ are the constant AR filter coefficients, which can be tuned to meet a required Doppler spectral density (see, e.g., [2]). In the multipath case, this model can be generalized to account for the presence of a power delay profile and the presence of correlations between the channel taps in addition to the Doppler effect. In this case, the channel impulse response vector $C_n = [c_{n,0}, \ldots, c_{n,L}]^T$ is written as
\[ C_n = \sum_{\ell=1}^{M} A_\ell\, C_{n-\ell} + U_n, \tag{9} \]
where $\{A_1, \ldots, A_M\}$ is a collection of deterministic $(L + 1) \times (L + 1)$ matrices, and where $(U_n)_{n\in\mathbb{Z}}$ is a $\mathbb{C}^{L+1}$-valued i.i.d. driving process. If the polynomial $\det(I - \sum_{\ell=1}^{M} z^\ell A_\ell)$ does not vanish in the closed unit disc, it is well known that there exists a stationary and ergodic process whose law is characterized by (9), see e.g. [15,23], leading to a stationary and ergodic process $(F_n, G_n)_{n\in\mathbb{Z}}$ by recalling the construction of $F_n$ and $G_n$ given by Equation (5).

Mutual information and statement of the main result
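In order to define the mutual information of the channel described by (3), define for any $m, n \in \mathbb{Z}$, $m \le n$, the random matrix of size $(n - m + 1)N \times (n - m + 2)K$,
\[ H_{m,n} := \begin{bmatrix} F_m & G_m & & & \\ & F_{m+1} & G_{m+1} & & \\ & & \ddots & \ddots & \\ & & & F_n & G_n \end{bmatrix}. \tag{10} \]
For any fixed $\rho > 0$, let $I_\rho$ be given by
\[ I_\rho := \operatorname*{aslim}_{n-m\to\infty} \frac{\log\det(I + \rho H_{m,n} H_{m,n}^*)}{(n-m+1)N} = \lim_{n-m\to\infty} \frac{\mathbb{E}\log\det(I + \rho H_{m,n} H_{m,n}^*)}{(n-m+1)N}. \tag{11} \]
As we shall briefly explain below, these two limits exist, are finite and equal, and do not depend on the way $n - m \to \infty$, due to Assumption 1. It is well known that $I_\rho$ represents the required mutual information per component of our wireless channel, provided the input $S_n$ is as in Section 2.2, see [10]. The purpose of this paper is to study this quantity.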

Remark 1.
In the Wyner multicell model introduced above, where the BS collaborate while the users do not, I ρ represents the sum mutual information per component.
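Denoting by $\mathcal{H}_K^{++}$, resp. $\mathcal{H}_K^{+}$, the cone of the Hermitian positive definite, resp. semidefinite, $K \times K$ matrices, we show that one can construct a stationary $\mathcal{H}_K^{++}$-valued process $(W_n)_{n\in\mathbb{Z}}$ defined recursively and coupled with $(F_n, G_n)_{n\in\mathbb{Z}}$ which allows a rather simple formula for the mutual information $I_\rho$.
Theorem 1 (Mutual information of an ergodic channel). If Assumption 1 holds true, then:
(a) There exists a unique stationary $\mathcal{H}_K^{++}$-valued process $(W_n)_{n\in\mathbb{Z}}$ satisfying
\[ W_n = \big( I + \rho\, G_n^* ( I + \rho F_n W_{n-1} F_n^* )^{-1} G_n \big)^{-1}. \tag{12} \]
In particular, the process $(W_n)$ is ergodic.
(b) We have the representation for the mutual information per component:
\[ I_\rho = \frac{1}{N}\, \mathbb{E} \log\det( I + \rho F_0 W_{-1} F_0^* ) - \frac{1}{N}\, \mathbb{E} \log\det W_0. \tag{13} \]
(c) Given any matrix $X_{-1} \in \mathcal{H}_K^{+}$, if one defines a process $(X_n)_{n\in\mathbb{N}}$ by setting
\[ X_n = \big( I + \rho\, G_n^* ( I + \rho F_n X_{n-1} F_n^* )^{-1} G_n \big)^{-1} \]
for all $n \ge 0$, then we have
\[ I_\rho = \lim_{n\to\infty} \frac{1}{nN} \sum_{\ell=0}^{n-1} \big( \log\det( I + \rho F_\ell X_{\ell-1} F_\ell^* ) - \log\det X_\ell \big) \quad \text{a.s.} \]
The proof of Theorem 1 is provided in Section 4.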
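As a sanity check of Theorem 1(c), the following minimal sketch treats the scalar case N = K = 1 with a constant (hence trivially stationary) channel. It assumes that the recursion of Theorem 1(c) takes the form X_n = (I + ρ G_n^* (I + ρ F_n X_{n-1} F_n^*)^{-1} G_n)^{-1}, which is the form suggested by the map ψ_{F,G} used in the uniqueness proof; the function names are hypothetical. For a constant channel the fixed point of this recursion solves a quadratic equation, which gives the limit log((a + sqrt(a² − 4b²))/2) with a = 1 + ρ(|f|² + |g|²) and b = ρ|f||g|.

```python
import math


def theorem1c_estimate(f, g, rho, x_init=1.0):
    """Scalar (N = K = 1) running average in the spirit of Theorem 1(c).

    f[l], g[l] play the roles of F_l and G_l. The recursion form is an
    assumption: X_l = 1 / (1 + rho*|g_l|^2 / (1 + rho*|f_l|^2 * X_{l-1})).
    """
    x_prev = x_init
    total = 0.0
    for fl, gl in zip(f, g):
        d = 1.0 + rho * abs(fl) ** 2 * x_prev        # I + rho F X F^*
        x = 1.0 / (1.0 + rho * abs(gl) ** 2 / d)     # next X
        total += math.log(d) - math.log(x)           # summand of the average
        x_prev = x
    return total / len(f)


def constant_channel_limit(f, g, rho):
    """Closed-form limit when F_n = f and G_n = g for all n (scalar case)."""
    a = 1.0 + rho * (abs(f) ** 2 + abs(g) ** 2)
    b = rho * abs(f) * abs(g)
    # a^2 - 4 b^2 = (1 + rho(|f|-|g|)^2)(1 + rho(|f|+|g|)^2) > 0
    return math.log((a + math.sqrt(a * a - 4.0 * b * b)) / 2.0)


if __name__ == "__main__":
    f0, g0, rho0 = 0.6 + 0.3j, 1.0 + 0.0j, 10.0
    print(theorem1c_estimate([f0] * 10000, [g0] * 10000, rho0))
    print(constant_channel_limit(f0, g0, rho0))
```

The running average approaches the closed form at rate O(1/n), since only the first few summands are affected by the arbitrary initial value.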
Remark 2. As we will illustrate in Section 3, Theorem 1(c) yields an estimator for I ρ that is less costly numerically than the naive one, due to the dimension of the involved matrices.
Remark 3. The proof of Theorem 1 reveals that the moment assumption (6) can be weakened to The second moment assumption (6) is here to ensure that the received signal power is finite. in [18] in the particular case where N = 1 and where the process (F n , G n ) is i.i.d.

Connection to block-Jacobi operators and previous results
Recall Eq. (10). Due to Assumption 1, it is well known, see [25], that there exists a deterministic probability measure $\mu$ that can be defined by the fact that for each bounded and continuous function $f$ on $[0, \infty)$,
\[ \frac{1}{(n-m+1)N} \operatorname{Tr} f(H_{m,n} H_{m,n}^*) \xrightarrow[n-m\to\infty]{\text{a.s.}} \int f \, d\mu \]
(here, $f$ is of course extended by functional calculus to the semi-definite positive matrices). The measure $\mu$ is intimately connected with the so-called ergodic self-adjoint block-Jacobi (or block-tridiagonal) operator $HH^*$, where $H$ is the random linear operator acting on the Hilbert space $\ell^2(\mathbb{Z})$, and defined by its doubly-infinite matrix representation in the canonical basis $(e_k)_{k\in\mathbb{Z}}$ of this space as
\[ H = \begin{bmatrix} \ddots & \ddots & & & \\ & F_n & G_n & & \\ & & F_{n+1} & G_{n+1} & \\ & & & \ddots & \ddots \end{bmatrix}. \tag{18} \]
The random positive self-adjoint operator $HH^*$ is an ergodic operator in the sense of [25, Page 33] (see also [12]), and the measure $\mu$ is called its density of states. Recalling (11), it holds that
\[ I_\rho = \int \log(1 + \rho\lambda)\, \mu(d\lambda), \]
where this limit is finite, due to the moment assumption (6) and a standard uniform integrability argument.
As said in the introduction, the Herbert-Jones-Thouless formula [7,25] provides a means of characterizing the density of states of an ergodic Jacobi operator. In [19], Levy et al. develop a version of this formula that is well suited to the block-Jacobi setting of HH * .
In this paper, we rather identify I ρ by considering the resolvents of certain random operators built from the process (F n , G n ) n∈Z instead of using the Herbert-Jones-Thouless formula. The expression we obtain for I ρ involves the ergodic process (W n ) which is coupled with the process (F n , G n ) n∈Z by Eq. (12). This approach is developed in Section 4.

The Markovian case and large SNR regime
First, under additional assumptions on the process $(F_n, G_n)$, we obtain a description for the constant term (or mutual information offset) in the large SNR regime. Indeed, it often happens that there exists a real number $\kappa_\infty$ such that the mutual information admits the expansion
\[ I_\rho = \frac{\min(K, N)}{N} \log\rho + \kappa_\infty + o(1) \]
as $\rho \to \infty$, see e.g. [20]. Our next task is to prove that this expansion indeed holds true and to derive an expression for the offset $\kappa_\infty$ when the process $(F_n, G_n)_{n\in\mathbb{Z}}$ is further assumed to be a Markov process satisfying some regularity and moment assumptions.
Namely, consider for any $n \in \mathbb{Z}$ the σ-field $\mathcal{F}_n := \sigma((F_k, G_k) : k \le n)$ and assume there exists a transition kernel $P : E \times \mathcal{B}(E) \to [0, 1]$ such that, for any $n \in \mathbb{Z}$ and any bounded Borel function $f : E \to \mathbb{R}$,
\[ \mathbb{E}\big[ f(F_{n+1}, G_{n+1}) \,\big|\, \mathcal{F}_n \big] = P f((F_n, G_n)), \qquad P f((F, G)) := \int f(x)\, P((F, G), dx). \]
Besides $P f((F, G))$, we use the common notations from the Markov chains literature and also write $P((F, G), A) := P \mathbf{1}_A((F, G))$ for any Borel set $A \in \mathcal{B}(E)$; the iterated kernel $P^n$ stands for the Markov kernel defined inductively by $P^n f := P(P^{n-1} f)$ with the convention that $P^0 f := f$; given any probability measure $\eta$ on $E$, we let $\eta P$ be the probability measure on $E$ defined as
\[ \eta P(A) := \int P(x, A)\, \eta(dx), \qquad A \in \mathcal{B}(E). \]
The following assumption is formulated in the context where $N > K$. We denote as $\mathcal{M}(E)$ the space of Borel probability measures on the space $E$. Given a matrix $A$, the notations $\Pi_A$ and $\Pi_A^\perp$ refer respectively to the orthogonal projector on the column space $\operatorname{span}(A)$ of $A$, and to the orthogonal projector on $\operatorname{span}(A)^\perp$.
Assumption 2. The process $(F_n, G_n)_{n\in\mathbb{Z}}$ is a Markov process with transition kernel $P$ associated with a unique invariant probability measure $\theta \in \mathcal{M}(E)$, namely satisfying $\theta P = \theta$. Moreover, (a) $P$ is Feller, namely, if $f : E \to \mathbb{R}$ is continuous and bounded, then so is $P f$.
Remark 5. Since a Markov chain (F n , G n ) n∈Z associated with a unique invariant probability measure is automatically ergodic, we see that Assumption 2 is stronger than Assumption 1 and thus Theorem 1 applies in this setting.
Remark 6. If one assumes $(F_n, G_n)_{n\in\mathbb{Z}}$ is a sequence of i.i.d. random variables with law $\theta$ having a density on $E$, then it satisfies Assumption 2 (and hence Assumption 1) provided that the moment conditions Assumption 2(b)-(c) are satisfied. We also provide more sophisticated examples where Assumption 2 holds in Section 2.5.
Theorem 2 (The Markov case). Let N > K. Then, under Assumption 2, the following hold true: (a) There exists a unique stationary process (Z n ) n∈Z on H ++ K satisfying (b) We have, as ρ → ∞, where log det(Z 0 + F * 1 F 1 ) is integrable, and (c) Given any X −1 ∈ H ++ K , if we consider the process (X n ) n∈N defined recursively by then we have, in probability, The proof of Theorem 2 is provided in Section 5.
Remark 8 (The case N ≤ K). In the statement of Theorem 2, it is assumed that N > K. Let us say a few words about the case where N < K. In this case, assuming that (F n , G n−1 ) is a Markov chain, there is an analogue ( Z n ) of the process (Z n ) satisfying the recursion and adapting Assumption 2 to this new setting, we can show that I ρ = log ρ +κ ∞ + o(1), wherẽ This result can be obtained by adapting the proof of Theorem 2 in a straightforward manner. The case K = N is somehow singular and requires a specific treatment that will not be undertaken in this paper; see also the end of Section 5.1.2 for further explanations.

Remark 9.
In the case where K = 1, N > 1, and the process (F n , G n ) n∈Z is i.i.d., we recover [18,Th. 2], where this result is obtained with the help of the theory of Harris Markov chains.

Examples where Assumption 2 is verified
In Proposition 3 below, the Markov property of the process (F n , G n ) n∈Z is obvious, while in Proposition 4, it can be easily checked from Equation (5). Moreover, in both propositions, it is well known that the Markov process (F n , G n ) n∈Z is an ergodic process satisfying Assumptions 2-(a) and 2-(b) [23]. We shall focus on Assumptions 2-(c) and 2-(d).
Proposition 3 (AR-model). For $N > K$, assume $(F_n, G_n)$ is the multidimensional ergodic AR process defined by the recursion
\[ \begin{bmatrix} F_n \\ G_n \end{bmatrix} = A \begin{bmatrix} F_{n-1} \\ G_{n-1} \end{bmatrix} + \begin{bmatrix} U_n \\ V_n \end{bmatrix}, \]
where $A \in \mathbb{C}^{2N\times 2N}$ is a deterministic matrix whose eigenvalue spectrum belongs to the open unit disk, and where $(U_n, V_n)_{n\in\mathbb{Z}}$ is an i.i.d. process on $E$ such that $\mathbb{E}\|U_0\|^2 + \mathbb{E}\|V_0\|^2 < \infty$. If the entries of the matrix $\begin{bmatrix} U_n \\ V_n \end{bmatrix}$ are independent with their distributions being absolutely continuous with respect to the Lebesgue measure on $\mathbb{C}$, then Assumption 2-(d) is verified. If, furthermore, the densities of the elements of $U_n$ and $V_n$ are bounded, then Assumption 2-(c) is verified.
Our second example is a particular multi-antenna version of the AR channel model of Example 1. This model is general enough to capture the Doppler effect, the correlations within each matrix tap of the channel impulse response, as well as the power profile of these taps.
Proposition 4 (MIMO multipath fading channel). Given three positive integers L, R, and T such that R > T , let (C n ) n∈Z be the C (L+1)R×T -valued random process described by the iterative model where the {H } L =0 are deterministic R × R matrices whose spectra lie in the open unit disk, and where (U n ) n∈Z is an i.i.d. matrix process such that E U 0 2 < ∞. Let F n and G n be the LR × LT matrices defined as in (5) with C n = c T n,0 · · · c T n,L T , the c n, 's being R × T matrices. If the entries of U n are independent with their distributions being absolutely continuous with respect to the Lebesgue measure on C, then Assumption 2-(d) is verified on the Markov process (F n , G n ) n∈Z . If, furthermore, the densities of the elements of U n are bounded, then, Assumption 2-(c) is verified.
Propositions 3 and 4 are proven in Section 5.4.

Numerical illustrations
We consider here a multiple antenna version of the multipath channel described in the introduction, see Equations (4)-(5). We assume the channel coefficient matrices $c_{n,\ell}$ satisfy the AR model $c_{n,\ell} = \alpha\, c_{n-1,\ell} + \sqrt{1 - \alpha^2}\, a_\ell\, u_{n,\ell}$. Here the AR coefficient $\alpha$ takes the form $\alpha = \exp(-f_d)$. The parameter $f_d$ represents the Doppler frequency, since it is proportional to the inverse of the effective support of the autocorrelation function of a channel tap (channel coherence time). For $n \in \mathbb{Z}$ and $\ell \in \{0, \ldots, L\}$, the $u_{n,\ell}$'s are i.i.d. $R \times T$ random matrices with i.i.d. $\mathcal{CN}(0, T^{-1})$ entries; the real vector $a = [a_0, \ldots, a_L]$ is a multipath amplitude profile vector such that $\|a\| = 1$; as is well known, the vector $[a_0^2, \ldots, a_L^2]$ represents the so-called power delay profile.
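To make the normalization concrete, here is a small sketch (helper names are hypothetical) that builds a unit-norm amplitude profile and iterates the variance recursion implied by the AR model above. Assuming the driving entries have variance 1/T, each entry of the tap c_{n,ℓ} has stationary variance a_ℓ²/T regardless of α, so the vector of squared amplitudes does act as a power delay profile. The exponential shape with decay 0.4 is only an example profile.

```python
import math


def amplitude_profile(L, decay=0.4):
    """Unit-norm amplitude vector a = [a_0, ..., a_L] with a_l proportional to exp(-decay*l)."""
    raw = [math.exp(-decay * l) for l in range(L + 1)]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]


def stationary_entry_variance(alpha, a_l, T, steps=2000):
    """Iterate v <- alpha^2 v + (1 - alpha^2) a_l^2 / T, starting from v = 0.

    This is the variance recursion of one entry of the tap c_{n,l} when the
    driving entries are independent with variance 1/T.
    """
    v = 0.0
    for _ in range(steps):
        v = alpha ** 2 * v + (1.0 - alpha ** 2) * a_l ** 2 / T
    return v


if __name__ == "__main__":
    a = amplitude_profile(L=2)
    alpha = math.exp(-0.05)  # alpha = exp(-f_d) with f_d = 0.05 (example value)
    for l, a_l in enumerate(a):
        print(l, stationary_entry_variance(alpha, a_l, T=2), a_l ** 2 / 2)
```

The factor sqrt(1 − α²) in the model is what keeps the stationary tap power independent of the Doppler parameter.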
Illustration of Theorem 1. We choose an exponential profile of the form $a_\ell \propto \exp(-0.4\ell)$. We start by comparing the mutual information estimates $\hat I_{m,n}$ of $I_\rho$ that naturally come with (11), namely by taking empirical averages of
\[ \frac{\log\det(I + \rho H_{m,n} H_{m,n}^*)}{(n-m+1)N} \]
for several realizations of $H_{m,n}$, with those coming with Theorem 1(c), namely
\[ \hat I^{\mathrm{Th1}}_n := \frac{1}{nN} \sum_{\ell=0}^{n-1} \big( \log\det( I + \rho F_\ell X_{\ell-1} F_\ell^* ) - \log\det X_\ell \big), \]
where, for any $n \in \mathbb{N}$, $X_n$ is given by the recursion of Theorem 1(c).
Figure 2 shows that the dispersion parameters associated with these estimates are still significant for $n$ as large as 80. We note that in the setting of this figure, the matrix $H_{1,n} H_{1,n}^* \in \mathbb{C}^{nRL\times nRL}$ is a $480 \times 480$ matrix when $n = 80$. On the other hand, the mutual information estimates $\hat I^{\mathrm{Th1}}_n$ provided by Theorem 1 require far fewer numerical computations since they involve the inversions of $RL \times RL = 6 \times 6$ matrices.
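In the scalar case N = K = 1, the matrix I + ρ H H^* is tridiagonal, so the log-determinant behind the naive estimate can be read off its successive pivots, while the estimate of Theorem 1(c) needs only a scalar recursion. The sketch below compares the two on one simulated realization of a two-tap AR(1) channel. It assumes the recursion form X_n = (I + ρ G_n^* (I + ρ F_n X_{n-1} F_n^*)^{-1} G_n)^{-1}, the initialization X_{-1} = 1, and the row layout [F_i G_i] for H; all function names are hypothetical, and the experiment reported in the text uses 6 × 6 blocks rather than scalars.

```python
import math
import random


def simulate_two_tap_channel(n, alpha, a, rng):
    """Return (f, g) with f[i] = c_{i,1} and g[i] = c_{i,0} (scalar taps, T = 1)."""
    s = math.sqrt(0.5)  # per-component std so that each draw is CN(0, 1)

    def cn():
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    taps = [al * cn() for al in a]  # start from the stationary law
    f, g = [], []
    for _ in range(n):
        taps = [alpha * c + math.sqrt(1.0 - alpha ** 2) * al * cn()
                for c, al in zip(taps, a)]
        g.append(taps[0])
        f.append(taps[1])
    return f, g


def naive_estimate(f, g, rho):
    """(1/n) log det(I + rho H H^*) via the pivots of the tridiagonal matrix."""
    total, pivot = 0.0, None
    for i in range(len(f)):
        diag = 1.0 + rho * (abs(f[i]) ** 2 + abs(g[i]) ** 2)
        if pivot is None:
            pivot = diag
        else:
            # squared modulus of the off-diagonal entry rho * g_{i-1} * conj(f_i)
            off2 = (rho * abs(g[i - 1]) * abs(f[i])) ** 2
            pivot = diag - off2 / pivot
        total += math.log(pivot)
    return total / len(f)


def recursive_estimate(f, g, rho, x_init=1.0):
    """Running average built from the assumed scalar recursion."""
    total, x = 0.0, x_init
    for fi, gi in zip(f, g):
        d = 1.0 + rho * abs(fi) ** 2 * x
        x = 1.0 / (1.0 + rho * abs(gi) ** 2 / d)
        total += math.log(d) - math.log(x)
    return total / len(f)


if __name__ == "__main__":
    rng = random.Random(1)
    f, g = simulate_two_tap_channel(400, math.exp(-0.05), [0.8, 0.6], rng)
    print(naive_estimate(f, g, 10.0), recursive_estimate(f, g, 10.0))
```

In this scalar sketch and with this initialization, the two quantities agree up to rounding on a single realization, so the benefit of the recursion lies in its cost per step (fixed-size operations instead of a determinant that grows with n), not in a different value.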
The large random matrix regime. Next, we consider the asymptotic regime where both N and K converge to infinity at the same pace. For a large class of processes (F n , G n ), it happens that in this regime, the Density of States of the operator HH * (which should now be indexed by K, N ) converges to a probability measure encountered in the field of large random matrix theory; see [17] for "Wigner analogues" of our model, and [12] for models closer to those of this paper. One important feature of this probability measure is that it depends on the probability law of the channel process only through its first and second order statistics.
We illustrate herein this phenomenon on an instance of the MIMO frequency and time selective channel described at the beginning of this section. We observe that in this applicative setting, the regime of convergence of N, K → ∞ at the same rate embeds the case where R and T are fixed while L → ∞, the case where L is fixed while R, T → ∞ at the same pace, as well as the intermediate cases. For the simplicity of the presentation, we assume that the numbers of antennas R and T are equal (note that N = K = RL in this case), and moreover, set the AR coefficient α = 0. If we let N → ∞, we get the following result: Proposition 5 (large dimensional regime). Within the specific model described above, assume the vector a, which depends on L, satisfies a = 1 for every L, and that (which is trivially satisfied if L is fixed). Then, lim N →∞ To prove this proposition, we shall show that I ρ converges as This is the element of the family of the celebrated Marchenko-Pastur distributions which is the limiting spectral measure of XX * when X is a square random matrix with iid elements. We provide a proof in Section 6 which is based on Theorem 1. More sophisticated channel models can be considered, including non centered models or models with correlations along the time index n, and for which one can prove similar asymptotics, see [12]. Note also that in the context of the large random matrix theory, a similar model where L is fixed and R, T → ∞ at the same rate has been considered in [24].
We illustrate this result on an example, represented in Figure 3. As an instance of the statistical channel model used in the statement of Proposition 5, we assume a generalized Wyner model as described in the introduction of this paper. We fix R and T to equal values, and we consider the regime where the network of Base Stations becomes denser and denser, making L converge to infinity. By densifying the network, the number of users occupying a frequency slot will grow linearly with the number of BS. The number of interferers will grow as well. Yet, provided the BS are connected through a high rate backbone to a central processing unit which is able to perform a joint processing, the overall network capacity will grow linearly with L. To be more specific, we assume that the channel power gain when the mobile is at the distance d to the BS is where D > 0 is a parameter that has the dimension of a distance. If the BS are regularly spaced, and if there are L Base Stations per D units of distance, then one channel model approaching this power decay behavior is the setting where the a 's are given by The quantity R×lim L→∞ I ρ , where the limit is given by Proposition 5, thus represents the ergodic mutual information per user. Figure 3 shows that the predictions of Proposition 5 fit with the values provided by Theorem 1 for L as small as one.
Illustration of Theorem 2. Finally, we illustrate the asymptotic behavior of $I_\rho$ in the high SNR regime as predicted by Theorem 2. In this experiment, we consider a more general model than the one described above where we replace the centered channel coefficient matrix $c_{n,\ell}$ of the model by where $d_{n,\ell}$ is a deterministic matrix with entries and where the nonnegative number $K_R$ plays the role of the so-called Rice factor. We take again $a_\ell \propto \exp(-0.4\ell)$ and $\alpha = \exp(-f_d)$ as in the first paragraph of the section. The high SNR behavior of $I_\rho$ is illustrated by Figure 4. Keeping the same channel model, the behavior of $\kappa_\infty$ in terms of the Doppler frequency $f_d$ and the Rice factor is illustrated by Figure 5. This figure shows that the impact of $f_d$ is marginal. Regarding $K_R$, the channel randomness has a beneficial effect on the mutual information for our model, assuming of course that the channel is perfectly known at the receiver.

Proof of Theorem 1

In this section, we let Assumption 1 hold true.

Preparation
The idea behind the proof of Theorem 1 is to show that I ρ can be given an expression that involves the resolvents of infinite block-Jacobi matrices and to manipulate these resolvents to obtain the recursion formula for W n . We denote for any m, n ∈ Z∪{±∞} by H m,n the operator on 2 := 2 (Z) defined as the truncation of H, defined in (18), having the bi-infinite matrix representation where the remaining entries are set to zero. Recalling the definition of the random matrix H m,n already provided in (10) for finite m, n ∈ Z, we thus identify this matrix with the associated finite rank operator acting on 2 for which we use the same notation.
Let us now introduce a convenient notation: If one considers an operator on 2 with block- We shall prove that the sequence (W n ) indeed satisfies the statements of Theorem 1. To do so, we will use in a key fashion the following Schur complement identities: where the ×'s can be made explicit in terms of A, B, C, D but are not of interest for our purpose.

Proof of Theorem 1(a)
We first show that W n defined in (43) indeed satisfies the recursive equations (12), that is we prove the existence part of Theorem 1(a).

Existence
Proof of Theorem 1(a); existence. Introduce the truncation of H −∞,n defined by deleting the rightmost non-zero column, so that Recalling W n 's definition (43), the Schur's complement formula (45) then provides where we introduced Here the identity = is shown in, e.g., [12,Lemma 7.2].
By similarly expressing H −∞,n in terms of H −∞,n−1 and F n , the same computation further yields and thus we obtain with (48) the identity

Uniqueness
Next, we establish the uniqueness of the process (W n ) n∈Z satisfying the recursive relations (12) within the class of stationary processes, to complete the proof of Theorem 1(a). The proof relies on a contraction argument with the distance on H ++ m for m being a positive integer: dist : which is the geodesic distance associated with the Riemannian metric g X (A, B) := Tr(X −1 AX −1 B) on the convex cone H ++ m ; we refer e.g. to [5, §1.2] or [21, §3] for further information. Convergence in dist is equivalent to convergence in the Euclidean norm. It has the following invariance properties: for any X, Y ∈ H ++ m and any m × m complex invertible matrix A, Moreover, for any S ∈ H + m , we have according to [5,Prop. 1.6], where λ min (S) is the smallest eigenvalue of S. We also have the following result, which will be the key to prove the uniqueness of the process: Lemma 6. Given two positive integers k and n such that n ≥ k, let X, Y ∈ H ++ k , S ∈ H ++ n , and A ∈ C n×k . Then, Proof. Define in H ++ n the two matrices Let (B ) be a sequence of matrices in C n×n such that B is invertible for each ∈ N, and such that B → A 0 as → ∞ (such a sequence is guaranteed to exist by the density of the set of invertible matrices in C n×n ). Using the first identity in (53) and Inequality (54), and observing that dist(X, Y ) = dist(X , Y ), we get that Making → ∞, and recalling that the geodesic and the Euclidean topologies are equivalent, we obtain the result.
Proof of Theorem 1(a); uniqueness. To prove the uniqueness, we assume that N ≥ K for simplicity, since the case N < K can be treated in a similar manner. If one introduces, for any F, G ∈ C N ×K , the mapping ψ F,G : then (12) reads W n = ψ Fn,Gn (W n−1 ). This mapping can be written as where we set τ A;S (X) := AXA * + S and ι(X) := X −1 with a small notational abuse related to the fact that, e.g., the two functions ι used in (59) are not the same in general. Using Lemma 6 together with the invariance of dist with respect to the inversion, we obtain for any W, where for the first inequality we used that G * (I + ρF W F * ) −1 G ≤ G 2 for any W ∈ H + K , and that the function x → x/(x + 1) is increasing. Now, let (W n ) n∈Z be any stationary process on H ++ K satisfying W n = ψ Fn,Gn (W n−1 ) a.s. for every n ∈ Z. If we let n ≥ 0, then we have from (61) a.s. that and, iterating, we obtain By the ergodicity of (G n ) n∈Z , we have and thus we have proven that dist(W n , W n ) → 0 a.s. as n → ∞. Finally, since for any M −tuple of integers (m 1 , . . . , m M ) and similarly for W n , by letting n → ∞ this yields that the finite-dimensional distributions of the two stationary processes (W n ) n∈Z and (W n ) n∈Z are the same, and consequently these two processes have the same distribution.
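The contraction used in this proof can be checked numerically in the scalar case N = K = 1, where the geodesic distance on (0, ∞) associated with the metric g_X(A, B) = Tr(X^{-1} A X^{-1} B) reduces to |log x − log y|. The sketch below assumes the map ψ_{F,G}(W) = (I + ρ G^* (I + ρ F W F^*)^{-1} G)^{-1} and tests the bound dist(ψ(W), ψ(W')) ≤ ρ|G|²/(1 + ρ|G|²) · dist(W, W') on random inputs; the function names and sampling ranges are hypothetical.

```python
import math
import random


def psi(w, f, g, rho):
    """Scalar version of the assumed map W -> (1 + rho|g|^2 / (1 + rho|f|^2 W))^{-1}."""
    return 1.0 / (1.0 + rho * abs(g) ** 2 / (1.0 + rho * abs(f) ** 2 * w))


def dist(x, y):
    """Geodesic distance on (0, inf) for the metric dx^2 / x^2."""
    return abs(math.log(x) - math.log(y))


def contraction_holds(trials=1000, seed=0, tol=1e-12):
    """Check the contraction bound on randomly drawn (f, g, rho, w, w')."""
    rng = random.Random(seed)
    for _ in range(trials):
        f = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        g = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        rho = rng.uniform(0.1, 100.0)
        w1, w2 = rng.uniform(1e-3, 10.0), rng.uniform(1e-3, 10.0)
        factor = rho * abs(g) ** 2 / (1.0 + rho * abs(g) ** 2)
        lhs = dist(psi(w1, f, g, rho), psi(w2, f, g, rho))
        if lhs > factor * dist(w1, w2) + tol:
            return False
    return True


if __name__ == "__main__":
    print(contraction_holds())
```

Since the factor ρ|G|²/(1 + ρ|G|²) is strictly less than one, composing the maps along the channel realization drives the products of these factors to zero, which is the mechanism behind the uniqueness argument.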

Proof of Theorem 1(b)
We start with the following lemma.
Lemma 7. For any fixed n ∈ Z and ρ > 0, we have Proof. Denote by K ⊂ 2 the subspace of sequences with finite support. Clearly, for any fixed n ∈ Z and fixed event ω ∈ Ω, we have for all x ∈ K, where → denotes the strong convergence in 2 . Now K is a common core for the set of operators {H * m,n H m,n : m ∈ {n, n − 1, n − 2, . . .}} and H * −∞,n H −∞,n , see e.g. [16,§III.5.3] or [27, Chap. VIII] for this notion. As a consequence, the convergence also holds in the strong resolvent sense, see [27,§VIII], and thus for every x ∈ 2 and ρ > 0, from which (66) follows by definition (43) of W n .

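Proof of Theorem 1(b). We start by writing
$$H_{m,n} H_{m,n}^* = \begin{bmatrix} H_{m,n-1} H_{m,n-1}^* & H_{m,n-1} P^* \\ P H_{m,n-1}^* & F_n F_n^* + G_n G_n^* \end{bmatrix}$$
with $P := [\, 0 \ \cdots \ 0 \ F_n \,]$, and use Schur's complement formula (44) to obtain
$$\begin{aligned}
\log\det(I + \rho H_{m,n} H_{m,n}^*)
&= \log\det \begin{bmatrix} I + \rho H_{m,n-1} H_{m,n-1}^* & \rho H_{m,n-1} P^* \\ \rho P H_{m,n-1}^* & I + \rho F_n F_n^* + \rho G_n G_n^* \end{bmatrix} \\
&= \log\det(I + \rho H_{m,n-1} H_{m,n-1}^*) + \log\det\Big( I + \rho F_n F_n^* + \rho G_n G_n^* - \rho^2 P H_{m,n-1}^* (I + \rho H_{m,n-1} H_{m,n-1}^*)^{-1} H_{m,n-1} P^* \Big) \\
&= \log\det(I + \rho H_{m,n-1} H_{m,n-1}^*) + \log\det\Big( I + \rho F_n F_n^* + \rho G_n G_n^* + \rho P \big[ (I + \rho H_{m,n-1}^* H_{m,n-1})^{-1} - I \big] P^* \Big) \\
&= \log\det(I + \rho H_{m,n-1} H_{m,n-1}^*) + \log\det\Big( I + \rho G_n G_n^* + \rho P (I + \rho H_{m,n-1}^* H_{m,n-1})^{-1} P^* \Big),
\end{aligned}$$
where the last equality uses $P P^* = F_n F_n^*$. By iterating this manipulation after replacing H m,n−i by H m,n−i−1 at the i th step, if we set for any m ≤ i ≤ n with the convention that H m,m−1 := 0, we have Next, Lemma 7 yields . Thus, by the moment assumption (6), we obtain from (73) and dominated convergence that where the equality follows from the stationarity of the process (F n , G n ) n∈Z . The stationarity further provides that Eξ m,i only depends on i − m and thus, for any fixed n, we obtain by Cesàro summation (see [26,Page 16]) that By taking n = 0 in the recursive relation (12), we moreover see that which proves (13).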
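The telescoping produced by the repeated use of Schur's complement can be verified exactly on a finite block in the scalar case N = K = 1. The sketch below makes several assumptions that are not in the text: H is taken as the bidiagonal matrix whose i-th row carries (f_i, g_i), the increments are written as log(1 + ρ|g_i|² + ρ|f_i|² w_{i−1}) with w updated by the scalar form of the recursion and initial value 1 (the scalar reading of the convention H m,m−1 := 0), and the reference value of log det(I + ρHH*) is computed independently through the three-term determinant recursion for tridiagonal matrices.

```python
import math
import random

def cgauss(rng):
    """Standard circularly symmetric complex Gaussian sample."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)

def sum_of_increments(fs, gs, rho):
    """Sum of log(1 + rho|g_i|^2 + rho|f_i|^2 w_{i-1}) along the scalar recursion."""
    w, total = 1.0, 0.0   # w = 1 corresponds to the empty matrix convention
    for f, g in zip(fs, gs):
        a, b = rho * abs(f) ** 2, rho * abs(g) ** 2
        total += math.log(1.0 + b + a * w)
        w = 1.0 / (1.0 + b / (1.0 + a * w))
    return total

def logdet_direct(fs, gs, rho):
    """log det(I + rho H H^*) for bidiagonal H, via the tridiagonal recursion."""
    d_prev, d_curr = 0.0, 1.0
    for i in range(len(fs)):
        diag = 1.0 + rho * (abs(fs[i]) ** 2 + abs(gs[i]) ** 2)
        # Squared modulus of the off-diagonal entry rho * g_{i-1} * conj(f_i).
        off2 = (rho * abs(gs[i - 1]) * abs(fs[i])) ** 2 if i > 0 else 0.0
        d_prev, d_curr = d_curr, diag * d_curr - off2 * d_prev
    return math.log(d_curr)

rng = random.Random(3)
fs = [cgauss(rng) for _ in range(40)]
gs = [cgauss(rng) for _ in range(40)]
lhs = logdet_direct(fs, gs, rho=5.0)
rhs = sum_of_increments(fs, gs, rho=5.0)
```

The two quantities `lhs` and `rhs` coincide up to rounding for any realization, which is the finite-size identity behind the Cesàro argument.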

Proof of Theorem 1(c)
Proof of Theorem 1(c). Since the process (F n , G n ) n∈Z is assumed to be ergodic, and so is (W n ) n∈Z by construction, we have a.s.
Next, by the same argument and with the same notation as in the proof of the uniqueness of W n provided in Section 4.2.2, we have dist(X n , W n ) → 0 a.s. as n → ∞. Thus, as a Cesàro average. Since Lemma 6 also yields we similarly have and the result follows from this convergence along with (77).
This completes the proof of Theorem 1.
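Theorem 1(c) suggests a simple simulation-based estimator: run the recursion from an arbitrary initial value and average the increments. The sketch below is a scalar illustration (N = K = 1) under assumptions that are not part of the paper: the channel coefficients follow a hypothetical stationary Gauss-Markov (AR(1)) model with parameter `alpha`, the increment is taken as log(1 + ρ|g_n|² + ρ|f_n|² x_{n−1}), and the function name `ergodic_average` is introduced here for illustration only.

```python
import math
import random

def cgauss(rng):
    """Standard circularly symmetric complex Gaussian sample."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)

def ergodic_average(rho, n, x0, alpha=0.9, seed=1):
    """Cesaro average of the increments along the recursion started at x0 > 0."""
    rng = random.Random(seed)
    innov = math.sqrt(1.0 - alpha ** 2)
    f, g = cgauss(rng), cgauss(rng)        # stationary initial channel state
    x, total = x0, 0.0
    for _ in range(n):
        # Hypothetical AR(1) fading: unit variance is preserved at each step.
        f = alpha * f + innov * cgauss(rng)
        g = alpha * g + innov * cgauss(rng)
        a, b = rho * abs(f) ** 2, rho * abs(g) ** 2
        total += math.log(1.0 + b + a * x)
        x = 1.0 / (1.0 + b / (1.0 + a * x))
    return total / n

est_from_one = ergodic_average(rho=10.0, n=20000, x0=1.0)
est_from_small = ergodic_average(rho=10.0, n=20000, x0=1e-3)
```

Both runs use the same channel realization and differ only in the initial value, so the closeness of `est_from_one` and `est_from_small` reflects the fact that the initialization is forgotten; the common value is an estimate, in nats, of the limit for this particular toy channel.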

Proof of Theorem 2
Assume from now on that N > K and that Assumption 2 holds true.

Preparation
To obtain an expansion of the type I ρ = (K/N ) log ρ + κ ∞ + o(1) as ρ → ∞, it is more convenient to work with the new variables: Indeed, it follows from the identity (13) of Theorem 1 and the stationarity of (W n ) n∈Z that which is the starting point of the asymptotic analysis as γ → 0. With this expression at hand, we would like to take the limit γ → 0 and identify the limit To study this limiting case, we start from the recursive equation (12), which reads for these new variables where, for any γ ≥ 0 and F, G ∈ C N ×K , we define h γ,F,G : Note that if γ > 0 then h γ,F,G (Z) ∈ H ++ K . The same holds true when γ = 0, which is now allowed, as soon as G has full rank. We now observe that one can extend this mapping to the whole of H + K .
Extension of the mapping h γ,F,G to H + K
Assume that F ∈ C N ×K has full rank, namely rank(F ) = K. By setting T := (F * F ) 1/2 and U := F (F * F ) −1/2 , we have the polar decomposition F = U T where U ∈ C N ×K is an isometry matrix and T ∈ H ++ K . By completing U so as to obtain an N × N unitary matrix [U U ⊥ ] and setting Π ⊥ F := U ⊥ (U ⊥ ) * = I − F (F * F ) −1 F * , which is the orthogonal projection onto the orthogonal complement of the linear span of the columns of F , we can write where for the second equality we used the matrix identity (I + AB) −1 = B −1 (I + A −1 B −1 ) −1 A −1 with A := T Z −1/2 and B := Z −1/2 T for any Z 1/2 ∈ H + K satisfying (Z 1/2 ) 2 = Z. Note that the alternative expression (87) for h γ,F,G (Z) makes sense even when Z ∈ H + K is not invertible, provided that F has full rank. Moreover, since two Hermitian square roots of Z ∈ H + K are identical up to the multiplication by a unitary matrix, the right hand side of (87) does not depend on the choice for Z 1/2 . In the following, we choose Z → Z 1/2 so that it is continuous (for the operator norm). Thus, by taking the right hand side of (87) as the definition of h γ,F,G (Z) in this case, we properly extend h γ,F,G to a mapping H + K → H + K which is continuous, and that we continue to denote by h γ,F,G . An important property of h 0,F,G we use in what follows is: When γ > 0, if (F n , G n , Z γ,n ) n∈Z denotes the Markov process defined by Z γ,n = h γ,Fn,Gn (Z γ,n−1 ) with (F n , G n ) n∈Z the Markov process with transition kernel P , then by the definition of Z γ,n in (81) and by Theorem 1, it follows that Q γ has a unique invariant measure, that we denote by π γ . The strategy of the proof of Theorem 2 is to show that Q 0 also has a unique invariant measure π 0 , which will yield the existence of the process Z n := Z 0,n , and we also show that π γ → π 0 narrowly as γ → 0 and that one can legitimately take the limit γ → 0 in (83), so as to obtain . 
It turns out that when N = K one may lose the uniqueness of the invariant measure for Q 0 , which puts this setting out of reach of our current approach.
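The extension of h 0,F,G to singular arguments can be made concrete in the smallest admissible case K = 1, N = 2. The sketch below rests on an assumption: it takes h 0,F,G (Z) = G*(I + F Z⁻¹ F*)⁻¹ G, a form chosen here because it is consistent with the bound h 0,F,G (Z) ≤ ‖G‖² and with the term G* Π ⊥ F G mentioned in the text. For K = 1 the matrix inversion lemma turns this into ‖G‖² − |F*G|²/(z + ‖F‖²), which is well defined at z = 0. The vectors `f` and `g` are arbitrary test data.

```python
def inner(u, v):
    """Hermitian inner product u^* v of two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def h0(f, g, z):
    """K = 1 case of Z -> G^*(I + F Z^{-1} F^*)^{-1} G (assumed form), valid for z >= 0."""
    norm_f2 = inner(f, f).real
    norm_g2 = inner(g, g).real
    return norm_g2 - abs(inner(f, g)) ** 2 / (z + norm_f2)

# Hypothetical test vectors in C^2 (N = 2, K = 1).
f = [1.0 + 1.0j, 0.5 + 0.0j]
g = [0.3 + 0.0j, -1.0j]

norm_g2 = inner(g, g).real
at_zero = h0(f, g, 0.0)   # equals G^* Pi_F^perp G, the energy of g orthogonal to f
values = [h0(f, g, z) for z in (0.0, 0.1, 1.0, 10.0, 1000.0)]
```

Here `at_zero` is strictly positive because `g` is not collinear with `f`, so a single step started from z = 0 already lands in the positive cone; the list `values` increases with z and stays below ‖G‖², matching the monotonicity and the bound used in the tightness argument.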

Existence and uniqueness of the invariant measure of Q 0
The key to prove the existence of an invariant measure for Q 0 is the following result.
is a tight subset of M(H ++ K ).
Proof. Let us fix ε > 0. We first prove there exists η > 0 such that, for any ξ ∈ C , where we recall that λ min (Z) is the smallest eigenvalue of Z ∈ H + K . To do so, observe from (85) that if Z ∈ H ++ K then so does h 0,F,G (Z) as soon as G has full rank, which is true θ-a.s. due to Assumption 2(d). We claim that this assumption further yields that, that for all (F, G, Z) satisfying rank(Z) < K, we have Q 0 ((F, G, Z), rank(Z) > rank(Z)) = 1, namely at each step of the process the rank of the random matrix Z increases Q 0 ((F, G, Z), ·)-a.s. To prove this, we start from Q 0 ((F, G, Z), rank(Z) ≤ rank(Z)) = P ((F, G), rank(h 0,F,G (Z)) ≤ rank(Z)). (91) Recalling (87), we have rank(h 0,F,G (Z) − G * Π ⊥ F G) = rank(Z) as soon as F * G is invertible. Using Assumption 2(d) in conjunction with the general fact that rank(A + B) ≤ rank(A) implies that the column spans of these matrices satisfy span(B) ⊂ span(A) for any A, B ∈ H + K , this yields for θ-a.e. (F, G). Next, we will use repeatedly that, for two matrices A and B we have span(A) ⊂ span(B) if and only if span(CAD) ⊂ span(CBD) for all invertible matrices C and D. If we let Z ⊥ ∈ C K×K be any matrix such that span(Z ⊥ ) = span(Z) ⊥ , we have: provided that F and G have full rank. Therefore, together with Assumption 2(d), we obtain for θ-a.e. (F, G), and our claim follows. As a consequence, Z has full rank (θ ⊗ δ 0 )Q K 0 ((F, G), ·)-a.s. and thus there exists η > 0 such that Next, we use that Z → h 0,F,G (Z) and Z → λ min (Z) are non-decreasing on H + K , see Lemma 8, so that for any ζ ∈ M(E × H + K ) satisfying ζ(· × H + K ) = θ(·) and any n ≥ K, we have which finally proves (90).
Finally, let C > 0 be such that θ( G 2 > C) < ε and consider the compact subset K of H ++ K given by It follows from (87) that h 0,F,G (Z) ≤ G 2 for any (F, G) ∈ E such that F has full rank and any Z ∈ H + K . This provides, for any ζ ∈ M(E × H + K ) satisfying ζ(· × H + K ) = θ(·) and any n ≥ K, and thus ξ(K) ≥ 1 − 2ε for any ξ ∈ C . The proof of the lemma is therefore complete.
In the remainder, C b (S) denotes the set of continuous and bounded functions on the metric space S. Lemma 10. For any γ ≥ 0 the kernel Q γ maps C b (E × H ++ K ) to itself. Proof. Let f : E × H ++ K → R be a bounded and continuous function, and note from the definition of Q γ that Q γ f is clearly bounded. To show it is continuous, let (F k , G k , Z k ) k≥1 be a sequence converging to (F 0 , G 0 , Z 0 ) in E × H ++ K as k → ∞. If we set g k (F, G) := f (F, G, h γ,F,G (Z k )) and µ k (·) := P ((F k , G k ), ·), then this amounts to show that g k dµ k → g 0 dµ 0 as k → ∞. Since P is Feller by Assumption 2(a), we have the narrow convergence µ k → µ 0 . Since (F, G) → h γ,F,G (Z) is continuous on E for any Z ∈ H ++ K we have g 0 ∈ C b (H ++ K ) and that g k → g 0 locally uniformly on E. Together with the tightness of (µ k ) and that sup k∈N g k ∞ < ∞, we obtain g k dµ k → g 0 dµ 0 and the proof of the lemma is complete.
Corollary 11. Q 0 has an invariant measure in M(E × H ++ K ). Proof. Let ζ := θ ⊗ δ 0 so that by Lemma 9 we have ζQ n 0 ∈ M(E × H ++ K ) for every n ≥ K and ζQ n 0 → π narrowly as n → ∞ for some π ∈ M(E × H ++ K ), possibly up to the extraction of a subsequence. If we set, for any n > K, then we also have the narrow convergence ζQ 0,n → π. Next, given any Since Q 0 f ∈ C b (E × H ++ K ) according to Lemma 10, by taking the limit n → ∞ we obtain πf = πQ 0 f and thus π is an invariant measure for Q 0 .
Lemma 12. If Q 0 has an invariant distribution, then it is unique.
Recalling (86) for γ = 0, and keeping in mind that Assumption 2(d) yields that Z n ∈ H ++ K a.s. and that F n has full rank a.s. for every n ∈ N, we have Dealing with the terms τ (F * n Fn) 1/2 ,I and τ G * n Fn(F * n Fn) −1/2 , G * n Π ⊥ Fn Gn by Lemma 6 and Inequality (54) respectively, we get which implies that, for any n ≥ 1, By Hölder's inequality, we have By dominated convergence, the rightmost term of these inequalities converges to zero as n → ∞, and thus n−1 i=0 ξ i → 0 in probability. It thus follows from (103) that dist(Z π 1 n , Z π 2 n ) → 0 in probability, which concludes the proof.

The last step for the proof of Theorem 2
Proof of Theorem 2. First, Corollary 11 and Lemma 12 show that Q 0 has a unique invariant measure, that we denote by π 0 , and moreover that π 0 ∈ M(E × H ++ K ). Kolmogorov's existence theorem then yields that there exists a unique stationary Markov process (F n , G n , Z n ) n∈Z on E × H ++ K with transition kernel Q 0 , which is in particular ergodic. Moreover, (Z n ) n∈Z satisfies the equation (24) by definition of Q 0 , which proves part (a) of the theorem.
To establish Theorem 2-(c), we follow the same strategy as in the proof of Theorem 1-(c): Since the Markov chain (F n , G n , Z n ) n∈Z is ergodic, we have By using the same line of argument as in the proof of Lemma 12, we obtain with a bound similar to (103) and the arguments below that dist(X n , Z n ) → 0 in probability. This implies in turn that dist(X n + F * n+1 F n+1 , Z n + F * n+1 F n+1 ) ≤ dist(X n , Z n ) → 0, and thus, that log det(X n + F * n+1 F n+1 ) − log det(Z n + F * n+1 F n+1 ) → 0 in probability. As a consequence, part (c) is obtained by taking a Cesàro average and (111).

Proofs for Section 2.5
We shall need the following result, which follows from the fact that the zero set of a non-zero polynomial of d variables has zero measure for the Lebesgue measure of R d .
Lemma 13. Let X be a random complex n×n matrix whose distribution is absolutely continuous with respect to the Lebesgue measure on $\mathbb{C}^{n\times n} \simeq \mathbb{R}^{2n^2}$. Then, P(rank(X) = n) = 1.
We also need in this paragraph the following notation. Given a positive integer n, we set [n] := {0, . . . , n − 1}. Given a matrix X ∈ C m×n and two sets of indices J 1 ⊂ [m] and J 2 ⊂ [n], we denote by X J1,J2 the |J 1 | × |J 2 | submatrix of X obtained by keeping the rows of X whose indices belong to J 1 and the columns of X whose indices belong to J 2 . We also write for convenience X J1,· := X J1,[n] and X ·,J2 := X [m],J2 . Finally, we write log − (x) = min(log x, 0) and log + (x) = max(log x, 0).
Proof of Proposition 3. We start with Assumption 2-(d). Using that (U n , V n ) and (F k , G k ) k≤n−1 are independent, it is enough to show that for any B, D ∈ C N ×K , Since U n has a density (for Lebesgue), then for any invertible matrix S ∈ C K×K , we see that S(U J,· n + B J,· ) has a density. Since Lemma 13 yields that the random matrix (V J,· n + D J,· ) is invertible a.s (it has a density), the square matrix (V J,· n + D J,· ) * (U J,· n + B J,· ) has a density. Recall that the convolution between an absolutely continuous probability and any probability measure is absolutely continuous. Thus, since (U J,· n , V J,· n ) and (U J c ,· n , V J c ,· n ) are independent, the matrix within the determinant at the right hand side of (114) has a density. Using Lemma 13 again, we obtain (112).
For any v ∈ C K \ {0}, the vector w := (U n + B)v is a random vector whose elements are independent and have probability densities. It results that for any matrix C ∈ C N ×K , we have Π ⊥ C w = 0 a.s. Thus, P Π ⊥ Vn+D (U n + B)v = 0 = 0 by the Fubini-Tonelli theorem, and (113) is obtained.
We now establish Assumption 2-(c). Write F n = [f 0 n · · · f K−1 n ], where f k n is the k th column of the matrix F n . For k ∈ [K − 1], let J k = {k + 1, . . . , K − 1}. Applying, e.g., a Gram-Schmidt process to the successive columns f 0 n , . . . , f K−1 n , setting F ·,∅ n = 0 ∈ C N , and using the obvious inequality log + x ≤ x for x > 0, we get that where In the remainder of the proof, "conditional" refers to a conditioning on (F n−1 , G n−1 , u k+1 n , . . . , u K−1 n ). All the bounds are constants that only depend on the bound on the densities of the elements of U n .
The vector f k n can be written as f k n = d k n−1 + u k n , where d k n−1 is (F n−1 , G n−1 )-measurable, and where u k n is the k th column of U n . By the assumptions on (U n ), the elements of f k n are conditionally independent and have bounded densities. If k < K − 1, make a (F n−1 , G n−1 , u k+1 n , . . . , u K−1 n )measurable choice of a unit-norm vector p k which is orthogonal to the subspace span F ·,J k n , otherwise, take p k as an arbitrary constant unit-norm vector. Since | log − (·)| is a nonincreasing function, T has unit-norm, it has at least one element, say p k 0 , such that |p k get that the conditional density of p k 0 f k n,0 is bounded, and by doing a simple calculation involving density convolutions, we finally obtain that p k , f k n has a bounded conditional density. Now, it is easy to see that if X is a complex random variable with a density bounded by a constant C , which completes the proof.
To prove Proposition 4, we first need the following lemma.

Lemma 14.
Given any positive integers m, n, r satisfying r ≤ n ≤ m, let X be a m × n matrix where Y is a r × n matrix, and assume that rank(Y ) = r. Then Proof. The formula Π X = X(X * X) −1 X * yields Π Performing a singular value decomposition, with Λ the diagonal r × r matrix of singular values and V 2 satisfying span(V 2 ) = ker Y , and using Schur's complement formula (45), we obtain This expression shows that Π which is the required result.
Proof of Proposition 4. Let us prove that Assumption 2-(d) holds. The recursive equation (32) satisfied by (C n ) n∈Z yields, for any ∈ [L − 1] and k ∈ [L], where U n =: u T n,0 · · · u T n,L T , the u n, 's being R × T matrices. Notice that the c nL−1,k and the u nL+ −i,k terms in the rightmost term above are respectively (F n−1 , G n−1 )-measurable and independent from (F n−1 , G n−1 ). Plugging these equations in the expressions for F n and G n , we obtain and where the matrices B n−1 and D n−1 are (F n−1 , G n−1 )-measurable random matrices which are block-upper triangular and block-lower triangular respectively, with R × T blocks (the exact expressions of these matrices are irrelevant). Furthermore, the matrices Q n and S n are independent of (F n−1 , G n−1 ). Thus, the proposition will be proven if we show that for all constant block-upper triangular matrices B ∈ C LR×LT and all constant block-lower triangular matrices D ∈ C LR×LT with R × T blocks, : v i = 0}. An inspection of (120) reveals that for a random vector a which is independent from u nL+k,L . With this at hand, we see that Since Π ⊥ Sn+D and u nL+k,L are independent and u nL+k,L v k has a density, (123) follows from (124). To complete the proof of that Assumption 2-(d) holds true, we now turn to the proof of (124). We use the equivalence (Π ⊥ Sn+D ) ·,J = 0 ⇔ (Π Sn+D ) J ,J = I. Let us write where Y = (S n + D) J ,· ∈ C R×LT , and set  is a square upper block-triangular matrix with T × T blocks. Moreover, the th diagonal block of this matrix is the sum of u [T ],· nL+ ,L and a (F n−1 , G n−1 , u nL , . . . , u nL+ −1 )-measurable term that we denote by d n, . 
Now, since (1 + ‖F n ‖²)I > F * n F n ≥ (F J,· n ) * F J,· n (131) in the Hermitian semidefinite ordering, it holds that LT log(1 + ‖F n ‖²) > log det(F * n F n ) ≥ log det((F J,· n ) * F J,· n ), and thus E| log det(F * n F n )| < E| log det((F J,· n ) * F J,· n )| + LT E‖F n ‖² ≤ E| log det((F J,· n ) * F J,· n )| + C, where C < ∞ since Assumption 2-(b) holds. Moreover, and the summands in this last expression can be dealt with as in the last part of the proof of Proposition 3. The main distinctive feature of the proof here is that, when we deal with the ℓ th summand and manipulate the conditional densities, we need to condition on (F n−1 , G n−1 , u nL , . . . , u nL+ℓ−1 ). This concludes the proof of Proposition 4.

Proof of Proposition 5
The expression of Shannon's mutual information given by Theorem 1 provides a means of recovering the large random matrix regime when K, N → ∞ with K/N → γ ∈ (0, ∞) in a general setting. We present a general result, then we particularize it to the setting of Proposition 5: Lemma 15. Under Assumption 1, if we introduce for any m ≤ n, where O(1/M ) is uniform in K, N .
As an illustration, we now prove Proposition 5 as an easy consequence of this lemma and well known results from random matrix theory.
Proof of Proposition 5. Observe from (5) and the assumptions made on the process (C n ) n∈Z that, for any M ≥ 1, the (M + 1)N × (M + 1)N matrix H 0,M is a square matrix having independent entries with a doubly stochastic variance profile, and that the maximum of these variances for a given N is of order O(1/N ). It is well known in random matrix theory that when N → ∞, the empirical spectral measure of H 0,M H * 0,M converges narrowly a.s. to the Marchenko-Pastur distribution $\mu_{\mathrm{MP}}(d\lambda) = (2\pi)^{-1} \sqrt{4/\lambda - 1}\; \mathbb{1}_{[0,4]}(\lambda)\, d\lambda$; see [9,28,11]. Making a standard moment control, we therefore obtain, for every fixed M ≥ 1, One can compute, see e.g. [28,Th. 2.53] or [11,Th. 4.1], that this limiting integral coincides with the right hand side of (37). Letting M → ∞, the proposition follows from Lemma 15.
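The limiting integral of log(1 + ρλ) against the Marchenko-Pastur distribution above can be evaluated numerically and compared with the classical closed form 2 log((1 + √(1 + 4ρ))/2) − (√(1 + 4ρ) − 1)²/(4ρ) from random matrix theory. The sketch below is a sanity check only; the closed form is quoted as a known formula for the unit-ratio Marchenko-Pastur law and is not claimed to be the exact expression written in (37). The substitution λ = 4 sin²θ, an implementation choice made here, removes the singularity of the density at the origin.

```python
import math

def mp_log_integral(rho, n=2000):
    """Midpoint-rule value of the integral of log(1 + rho*x) against the MP law on [0, 4]."""
    # With x = 4 sin^2(theta), the MP density times dx becomes (4/pi) cos^2(theta) dtheta.
    h = (math.pi / 2.0) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        total += math.log(1.0 + 4.0 * rho * math.sin(theta) ** 2) * math.cos(theta) ** 2
    return 4.0 / math.pi * total * h

def mp_log_closed_form(rho):
    """Classical closed form for the same integral (unit aspect ratio)."""
    s = math.sqrt(1.0 + 4.0 * rho)
    return 2.0 * math.log((1.0 + s) / 2.0) - (s - 1.0) ** 2 / (4.0 * rho)

checks = [(rho, mp_log_integral(rho), mp_log_closed_form(rho)) for rho in (0.1, 1.0, 10.0)]
```

Each triple in `checks` holds the signal to noise ratio, the numerical integral, and the closed-form value in nats; the last two agree to high accuracy.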
We finally turn to the proof of the lemma.
Proof of Lemma 15. Using the notations of Theorem 1, we set ξ n := log det (I + ρ F n W n−1 F * n ) − log det W n and check, similarly as in (76), that ξ n = log det (I + ρ G n G * n + ρ F n W n−1 F * n ) .
If we set for convenience
$$\widetilde V_n := \rho\, G_n^* (I + V_n)^{-1} G_n, \qquad V_n := \rho\, F_n W_{n-1} F_n^*, \tag{140}$$
then we have the relation $V_n = \rho\, F_n (I + \widetilde V_{n-1})^{-1} F_n^*$ and we moreover see that $\xi_n$ equals $\log\det(I + V_n + \rho\, G_n G_n^*) = \log\det\big(I + \rho\, F_n (I + \widetilde V_{n-1})^{-1} F_n^* + \rho\, G_n G_n^*\big)$ = log det

Conclusion
Shannon's mutual information of an ergodic wireless channel has been studied in this paper under the weakest assumptions on the channel. The general capacity result has been used to perform the high SNR and the high dimensional analyses. Future research directions along the lines of this paper include the high SNR analysis when the numbers of components at the receiver and at the transmitter are equal. This analysis requires tools different from those used in Section 5 of this paper, which rely heavily on Assumption 2-(d). Another research direction is to thoroughly quantify the impact of the parameters of a given statistical channel model on the mutual information obtained by Theorems 1 and 2. In this respect, attention can be devoted to the Doppler shift, as in the recent paper [8] and the references therein. Finally, transmission schemes with partial channel knowledge at the receiver, or scenarios with different delay constraints, deserve particular attention.