Uplink-downlink duality for integer-forcing

Consider a MIMO uplink channel with channel matrix $\mathbf{H}$ and a MIMO downlink channel with channel matrix $\mathbf{H}^T$. It is well known that any rate tuple that is achievable on the uplink is also achievable on the downlink under the same total power constraint, i.e., there is an uplink-downlink duality relationship. In this paper, we consider the integer-forcing strategy, in which users steer the channel towards an integer-valued effective channel matrix so that the receiver(s) can decode integer-linear combinations of the transmitted codewords. Recent efforts have demonstrated the benefits of this strategy for uplink, downlink, and interference alignment scenarios. Here, we establish that uplink-downlink duality holds for integer-forcing. Specifically, in the uplink, $L$ transmitters communicate over channel matrix $\mathbf{H}$ to an $L$-antenna receiver with target integer matrix $\mathbf{A}$. In the downlink, an $L$-antenna transmitter communicates over channel matrix $\mathbf{H}^T$ to $L$ single-antenna receivers with target integer matrix $\mathbf{A}^T$. We show that any computation rate tuple that is achievable in the uplink is achievable for the same total power in the downlink and vice versa.


I. INTRODUCTION
The capacity region of the Gaussian MIMO MAC is well known [1, Sec. 10.1] and can be attained via joint maximum likelihood (ML) decoding. While joint ML decoding is optimal, its implementation complexity grows exponentially with the number of users. This has led to considerable interest in linear receiver architectures [2]-[4], which rely only on single-user decoding algorithms. A conventional linear receiver consists of a linear equalizer that generates an estimate of each user's codeword followed by parallel single-user decoders. However, even with optimal minimum mean-squared error (MMSE) estimation, linear receivers fall short of the MIMO MAC sum capacity. This gap can be closed via successive interference cancellation (SIC), provided that the transmitters operate at one of the corner points of the capacity region. The full capacity region can be attained via SIC combined with either time sharing [5], [6] or rate splitting [7].
Although the Gaussian MIMO BC is non-degraded, its capacity region can be established via its uplink-downlink duality with the Gaussian MIMO MAC [8]-[12]. Specifically, uplink-downlink duality refers to the fact that any rate tuple that is attainable on the Gaussian MIMO MAC with channel matrix $\mathbf{H}$ is also attainable on the "dual" Gaussian MIMO BC with channel matrix $\mathbf{H}^T$ using the same sum power and vice versa. The capacity region is attained via dirty-paper coding [13], which requires high implementation complexity at the transmitter. As with the MIMO MAC, significant effort has gone towards characterizing the performance of linear transmitter architectures for the MIMO BC (see, for instance, [14], [15]), which are suboptimal in general. As demonstrated by [10], linear transceiver architectures also satisfy uplink-downlink duality. That is, given equalization and beamforming matrices for a MIMO MAC, we can achieve the same rate tuple on the dual MIMO BC with the same sum power by exchanging the roles of the equalization and beamforming matrices.
Integer-forcing is a variation on conventional linear transceiver architectures that can attain significantly higher sum rates. Rather than using the equalization and beamforming matrices to separate users' codewords, an integer-forcing transceiver employs them to create an integer-valued effective channel matrix. The single-user decoders are then used to recover integer-linear combinations of the codewords. By selecting an integer-valued effective matrix that closely approximates the channel matrix, this transceiver can reduce the effective noise variances seen by the decoders, leading to higher rates. In the MIMO MAC, these integer-linear combinations are solved for the original codewords [16]. In the MIMO BC, the transmitter applies the inverse linear transform to its messages prior to encoding, so that each integer-linear combination corresponds to the desired message of that user [17], [18]. Here, we demonstrate that integer-forcing transceivers satisfy uplink-downlink duality for the sum rate in the sense of [10].

W. He and B. Nazer are with the Department of Electrical and Computer Engineering, Boston University, Boston, MA. Emails: whe02@bu.edu, bobak@bu.edu. S. Shamai (Shitz) is with the EE Department, Technion, Haifa, Israel. Email: sshlomo@ee.technion.ac.il.
At a high level, this means that the sum rate achievable for decoding the integer-linear combinations with integer coefficient matrix $\mathbf{A}$ over a MIMO MAC with channel matrix $\mathbf{H}$ is also achievable for decoding the integer-linear combinations with integer coefficient matrix $\mathbf{A}^T$ over a MIMO BC with channel matrix $\mathbf{H}^T$ by exchanging the roles of the equalization and beamforming matrices.
Although integer-forcing offers an advantage over conventional linear transceivers, it does not attain the sum capacity in general. For the MIMO MAC, recent work [19] developed a successive integer-forcing receiver that attains the sum capacity, including the corner points available to an MMSE-SIC receiver as well as other rate tuples on the dominant face (without the aid of time sharing or rate splitting). This leads to the natural question as to whether there is an equivalent of dirty-paper coding for the integer-forcing MIMO BC transmitter and whether it satisfies an uplink-downlink duality relationship with successive integer-forcing. Here, we answer both questions in the affirmative. That is, we propose a dirty-paper integer-forcing transmitter for the MIMO BC, building on the lattice-based dirty-paper scheme from [20]. We then show that the notion of uplink-downlink duality proposed above continues to hold between successive integer-forcing for the MIMO MAC and dirty-paper integer-forcing for the MIMO BC.
We also present two applications of uplink-downlink duality: a constant-gap optimality result for downlink integer-forcing and an iterative algorithm for optimizing beamforming and equalization matrices. To motivate the first application, prior work [21], [22] established that integer-forcing can operate within a constant gap of the MIMO MAC capacity using only "digital" successive cancellation. Using duality, we demonstrate that integer-forcing can operate within a constant gap of the MIMO BC capacity using only "digital" dirty-paper coding. For the second application, it is well known that simultaneously optimizing beamforming and equalization matrices corresponds to a non-convex problem. However, for both the MIMO MAC and BC, finding the optimal equalization matrix for a fixed beamforming matrix has a closed-form solution. Therefore, a natural algorithm is to iterate between a problem and its dual, updating the equalization matrix at every iteration. For example, the maxSINR algorithm [23] relies on a variation of this idea to identify good interference alignment solutions. Here, we propose an iterative algorithm for optimizing the beamforming and equalization matrices used in integer-forcing. Recent follow-up work has used a variation of our algorithm to identify good integer-forcing interference alignment solutions [24].

Fig. 1. Summary of the relationships between uplink and downlink integer-forcing strategies, with the contributions of this paper in blue. [Figure: the downlink (MIMO BC) strategies shown are the integer-forcing transmitter [17], [18] and dirty-paper integer-forcing, each connected by a "Duality" link.]

A. Related Work
Prior work on integer-forcing [16]-[19] has focused on the important special case where all codewords have the same effective power. This constraint is implicitly imposed by the original compute-and-forward framework [25]. In order to establish uplink-downlink duality, we need the flexibility to allocate power unequally across codewords. We will thus employ the expanded compute-and-forward framework [22], which can handle unequal powers. Our achievability results draw upon capacity-achieving nested lattice codes, whose existence has been shown in a series of recent works [26]-[31]. We refer interested readers to the textbook of Zamir for a detailed history as well as a comprehensive treatment of lattice codes [32].
For the sake of notational simplicity, we will state all of our results for real-valued channels. Analogous results can be obtained for complex-valued channels via real-valued decompositions. Recent efforts have shown that compute-and-forward can also be realized for more general algebraic structures [33]. For instance, building lattices from the Eisenstein integers yields better approximations for complex numbers on average, and can increase the average performance of compute-and-forward [34].
Here, we will assume that full channel state information (CSI) is available to the transmitter and receiver, in order to optimize the beamforming matrices and power allocations. However, CSI may not always be available, especially at the transmitter. The original integer-forcing paper [16] numerically demonstrated the performance gains over conventional linear receivers in terms of outage rates. It also established that, if each antenna encodes an independent data stream, then integer-forcing attains the optimal diversity-multiplexing tradeoff [35]. Subsequently, it was shown that if the transmitter mixes the data streams using a space-time code with a non-vanishing determinant, then integer-forcing operates within a constant gap of the capacity [36]. Recent work has also studied the advantages of a random precoding matrix on the outage probability for integer-forcing [37].
Integer-forcing can also serve as a framework for distributed source coding, which can be viewed as the "dual" of integer-forcing channel coding in a certain sense. See [38], [39] for further details. Very recent work has also established uplink-downlink duality for compression-based strategies for cloud radio access networks [40].
Finally, we note that there is a rich body of work on lattice-aided reduction [41]-[47] for MIMO channels. For instance, in the uplink version of this strategy, each transmitter employs a lattice-based constellation (such as QAM). The decoder steers the channel to a full-rank integer matrix using equalization, makes hard estimates of the resulting integer-linear combinations of lattice symbols, and then applies the inverse integer matrix to obtain estimates of the emitted symbols. Roughly speaking, integer-forcing can be viewed as lattice-aided reduction that operates on the codeword, rather than symbol, level. This in turn allows us to write explicit achievable rate expressions for integer-forcing, whereas rates for lattice-aided reduction must be evaluated numerically.

B. Paper Organization
The remainder of this paper is organized as follows. In Section II, we give problem statements for the uplink and downlink. Next, in Section III, we give a high-level overview of our duality results. Section IV provides background results from nested lattice coding that will be needed for our achievability scheme. We review successive integer-forcing for the MIMO MAC in Section V. Afterwards, in Section VI, we propose a dirty-paper integer-forcing strategy for the MIMO BC and analyze its achievable rates. Section VII formally establishes an uplink-downlink duality relationship for integer-forcing (both with and without successive cancellation and dirty-paper coding). In Section VIII, we propose an iterative algorithm that uses uplink-downlink duality for optimizing the integer-forcing beamforming, equalization, and integer matrices. We provide simulations in Section IX, and Section X concludes the paper.

C. Notation
We will make use of the following notation. Column vectors will be denoted by boldface, lowercase font (e.g., $\mathbf{a} \in \mathbb{Z}^L$) and matrices with boldface, uppercase font (e.g., $\mathbf{A} \in \mathbb{Z}^{L \times L}$). Let $\mathbf{a}[i]$ denote the $i$th coordinate of the vector $\mathbf{a}$. We will use $\|\mathbf{a}\|$ to represent the $\ell_2$-norm of $\mathbf{a}$, $\mathrm{Tr}(\mathbf{A})$ to represent the trace of $\mathbf{A}$, and $\mathrm{eig}(\mathbf{A})$ to denote the set of eigenvalues (i.e., spectrum) of $\mathbf{A}$. We will also use $\mathrm{diag}(\mathbf{a})$ to denote the diagonal matrix formed by placing the elements of $\mathbf{a}$ along the diagonal. All logarithms are taken to base 2 and we define $\log^+(x) \triangleq \max(0, \log x)$. We denote the identity matrix by $\mathbf{I}$, the all-ones column vector of length $k$ by $\mathbf{1}_k$, and the all-zeros column vector of length $k$ by $\mathbf{0}_k$.
We will work with both the real field $\mathbb{R}$ and prime-sized finite fields $\mathbb{Z}_p = \{0, 1, \ldots, p-1\}$ where $p$ is prime. We will denote addition and summation over the reals by $+$ and $\sum$, respectively. For a prime-sized finite field, we will use $\oplus$ and $\bigoplus$ to denote addition and summation, respectively. Define $[a] \bmod p$ to be the modulo-$p$ reduction of $a$. For vectors and matrices, the modulo-$p$ operation is taken elementwise and denoted by $[\mathbf{a}] \bmod p$ and $[\mathbf{A}] \bmod p$, respectively. Taking a linear combination over a prime-sized finite field can be linked to taking a linear combination over the reals as follows,
$$q_1 w_1 \oplus q_2 w_2 = [q_1 w_1 + q_2 w_2] \bmod p.$$
Note that, on the left-hand side, $q_1, q_2, w_1, w_2$ are elements of the finite field whereas, on the right-hand side, they are elements of the integers under the natural mapping. Finally, subscripts "u" and "d" will be used to denote variables associated with the uplink and downlink, respectively.
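The link between finite-field and real-valued linear combinations can be checked with a few lines of code. The sketch below is only an illustration: the function names `field_lincomb` and `real_lincomb_mod` and the values of `p`, `q`, and `w` are hypothetical choices, not taken from the paper.

```python
def field_lincomb(q, w, p):
    """Linear combination over Z_p: every product and sum is reduced mod p."""
    acc = 0
    for qi, wi in zip(q, w):
        acc = (acc + (qi * wi) % p) % p  # field multiplication, then field addition
    return acc


def real_lincomb_mod(q, w, p):
    """Same combination computed over the integers, reduced mod p once at the end."""
    return sum(qi * wi for qi, wi in zip(q, w)) % p


# Illustrative values (assumptions for this sketch only).
p = 7
q = [3, 5]
w = [4, 6]
lhs = field_lincomb(q, w, p)
rhs = real_lincomb_mod(q, w, p)

# Exhaustive check of the identity for two-term combinations over Z_p.
identity_holds = all(
    field_lincomb([q1, q2], [w1, w2], p) == real_lincomb_mod([q1, q2], [w1, w2], p)
    for q1 in range(p) for q2 in range(p)
    for w1 in range(p) for w2 in range(p)
)
```

Reducing once at the end is what lets integer-linear combinations of lattice codewords be read as finite-field combinations of messages later in the paper.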

II. PROBLEM STATEMENT
We now give problem statements for the uplink and downlink. See Figure 2 for a block diagram. We focus on real-valued channels, and note that our results are directly applicable to complex-valued channels by using a real-valued decomposition as in [16].

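Fig. 2. Block diagram of the uplink and downlink channel models. We say that the channels are duals of each other if their channel matrices satisfy $\mathbf{H}_d = \mathbf{H}_u^T$.

Uplink Channel. The uplink channel (i.e., MIMO MAC) consists of $L$ transmitters and a single $N$-antenna receiver. The $\ell$th transmitter is equipped with $M_\ell$ transmit antennas. It has a message $w_{u,\ell}$ that is drawn independently and uniformly from $\{1, 2, \ldots, 2^{nR_{u,\ell}}\}$ and an encoder $\mathcal{E}_{u,\ell} : \{1, 2, \ldots, 2^{nR_{u,\ell}}\} \to \mathbb{R}^{M_\ell \times n}$ that maps this message into a channel input $\mathbf{X}_{u,\ell} = \mathcal{E}_{u,\ell}(w_{u,\ell})$ of blocklength $n$. It will often be convenient to work with the concatenation of the channel inputs
$$\mathbf{X}_u = \begin{bmatrix} \mathbf{X}_{u,1} \\ \vdots \\ \mathbf{X}_{u,L} \end{bmatrix},$$
which is of dimension $M \times n$ where $M = \sum_\ell M_\ell$ denotes the total number of transmit antennas. The transmitters must satisfy a total power constraint $\mathbb{E}[\mathrm{Tr}(\mathbf{X}_u \mathbf{X}_u^T)] \le n P_{\text{total}}$ (see footnote 2). The receiver observes a noisy linear combination of the emitted signals,
$$\mathbf{Y}_u = \sum_{\ell=1}^{L} \mathbf{H}_{u,\ell} \mathbf{X}_{u,\ell} + \mathbf{Z}_u,$$
where $\mathbf{H}_{u,\ell} \in \mathbb{R}^{N \times M_\ell}$ is the channel matrix from the $\ell$th transmitter to the receiver and the additive noise $\mathbf{Z}_u \in \mathbb{R}^{N \times n}$ is elementwise i.i.d. Gaussian with mean zero and variance one. We denote the concatenated channel matrices by
$$\mathbf{H}_u = \begin{bmatrix} \mathbf{H}_{u,1} & \cdots & \mathbf{H}_{u,L} \end{bmatrix},$$
which lets us concisely write the channel output as
$$\mathbf{Y}_u = \mathbf{H}_u \mathbf{X}_u + \mathbf{Z}_u.$$
This channel output is sent through a decoder $\mathcal{D}_u : \mathbb{R}^{N \times n} \to \{1, 2, \ldots, 2^{nR_{u,1}}\} \times \cdots \times \{1, 2, \ldots, 2^{nR_{u,L}}\}$ that produces estimates of the messages, $(\hat{w}_{u,1}, \ldots, \hat{w}_{u,L}) = \mathcal{D}_u(\mathbf{Y}_u)$.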
Footnote 2: Conventional MAC models impose a power constraint on each user individually. However, it is well known that uplink-downlink duality can be established only if we are free to reallocate the power across transmitters [9]-[11]. Note also that we use an expected power constraint rather than a hard power constraint of the form $\mathrm{Tr}(\mathbf{X}_u \mathbf{X}_u^T) \le n P_{\text{total}}$. In order to impose a hard power constraint, we would first need to show that our nested lattice ensemble, taken from [31], is also good for covering in the sense of [30]. Alternatively, we could keep only a constant fraction of each codebook, throwing out the codewords with the highest powers. This would result in codebooks that meet hard power constraints and achieve the same rates, at the cost of disrupting the symmetry of the encoding scheme.
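Overall, we say that the uplink rates $R_{u,1}, \ldots, R_{u,L}$ are achievable if, for any $\epsilon > 0$ and $n$ large enough, there exist encoders and a decoder such that $\mathbb{P}\big(\bigcup_{\ell=1}^{L} \{\hat{w}_{u,\ell} \neq w_{u,\ell}\}\big) < \epsilon$. The uplink capacity region is the closure of the set of all achievable rates.

Downlink Channel. The downlink channel model mirrors the uplink channel model. There is a single $N$-antenna transmitter and $L$ receivers. Let $M_\ell$ represent the number of antennas at the $\ell$th receiver and let $M = \sum_\ell M_\ell$ be the total number of receive antennas. The transmitter has $L$ messages: the $\ell$th message $w_{d,\ell}$ is drawn independently and uniformly from $\{1, 2, \ldots, 2^{nR_{d,\ell}}\}$ and is intended for the $\ell$th receiver. The transmitter uses an encoder $\mathcal{E}_d : \{1, 2, \ldots, 2^{nR_{d,1}}\} \times \cdots \times \{1, 2, \ldots, 2^{nR_{d,L}}\} \to \mathbb{R}^{N \times n}$ to map these messages into a channel input $\mathbf{X}_d = \mathcal{E}_d(w_{d,1}, \ldots, w_{d,L})$ where $n$ represents the blocklength. This channel input must satisfy a total power constraint $\mathbb{E}[\mathrm{Tr}(\mathbf{X}_d \mathbf{X}_d^T)] \le n P_{\text{total}}$. For $m = 1, \ldots, L$, the channel output observed by the $m$th receiver is
$$\mathbf{Y}_{d,m} = \mathbf{H}_{d,m} \mathbf{X}_d + \mathbf{Z}_{d,m},$$
where $\mathbf{H}_{d,m} \in \mathbb{R}^{M_m \times N}$ is the channel matrix from the transmitter to the $m$th receiver and the additive noise $\mathbf{Z}_{d,m} \in \mathbb{R}^{M_m \times n}$ is elementwise i.i.d. Gaussian with mean zero and variance one. Finally, it will often be useful to work with the following concatenated matrices,
$$\mathbf{Y}_d = \begin{bmatrix} \mathbf{Y}_{d,1} \\ \vdots \\ \mathbf{Y}_{d,L} \end{bmatrix}, \quad \mathbf{H}_d = \begin{bmatrix} \mathbf{H}_{d,1} \\ \vdots \\ \mathbf{H}_{d,L} \end{bmatrix}, \quad \mathbf{Z}_d = \begin{bmatrix} \mathbf{Z}_{d,1} \\ \vdots \\ \mathbf{Z}_{d,L} \end{bmatrix},$$
which enable us to compactly write the downlink channel output as
$$\mathbf{Y}_d = \mathbf{H}_d \mathbf{X}_d + \mathbf{Z}_d.$$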

III. OVERVIEW OF MAIN RESULTS
We now give a high-level overview of our main results. To put our results in context, we begin by stating the capacity regions for the uplink and downlink. We then give a quick summary of the rates achievable via conventional linear architectures and their uplink-downlink duality relationships. Finally, we overview our integer-forcing architectures for the uplink and downlink and state our uplink-downlink duality results.

A. Capacity Regions
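Uplink Channel. The uplink (i.e., MIMO MAC) capacity region $\mathcal{C}_u$ is the set of rate tuples $(R_{u,1}, \ldots, R_{u,L})$ satisfying
$$\sum_{\ell \in \mathcal{S}} R_{u,\ell} \le \frac{1}{2} \log\det\Big(\mathbf{I} + \sum_{\ell \in \mathcal{S}} \mathbf{H}_{u,\ell} \mathbf{K}_\ell \mathbf{H}_{u,\ell}^T\Big) \quad \text{for all } \mathcal{S} \subseteq \{1, 2, \ldots, L\}$$
for some positive semi-definite matrices $\mathbf{K}_1, \ldots, \mathbf{K}_L$ satisfying the sum power constraint $\sum_{\ell=1}^{L} \mathrm{Tr}(\mathbf{K}_\ell) \le P_{\text{total}}$. It can be attained with i.i.d. Gaussian encoding and simultaneous joint typicality decoding. Alternatively, it can be attained with i.i.d. Gaussian encoding, successive interference cancellation decoding, and time sharing [5], [6] or rate splitting [7]. See [48, §9.2.1] for more details.

Downlink Channel. As shown by [12], the downlink (i.e., MIMO BC) capacity region $\mathcal{C}_d$ is the convex hull of the set of rate tuples $(R_{d,1}, \ldots, R_{d,L})$ satisfying
$$R_{d,\theta(\ell)} \le \frac{1}{2} \log \frac{\det\Big(\mathbf{I} + \mathbf{H}_{d,\theta(\ell)} \big(\sum_{i=1}^{\ell} \mathbf{K}_{\theta(i)}\big) \mathbf{H}_{d,\theta(\ell)}^T\Big)}{\det\Big(\mathbf{I} + \mathbf{H}_{d,\theta(\ell)} \big(\sum_{i=1}^{\ell-1} \mathbf{K}_{\theta(i)}\big) \mathbf{H}_{d,\theta(\ell)}^T\Big)}, \quad \ell = 1, \ldots, L,$$
for some permutation $\theta$ of $\{1, 2, \ldots, L\}$ and positive semi-definite matrices $\mathbf{K}_1, \ldots, \mathbf{K}_L$ satisfying the sum power constraint $\sum_{\ell=1}^{L} \mathrm{Tr}(\mathbf{K}_\ell) \le P_{\text{total}}$. It can be attained using dirty-paper coding at the transmitter, joint typicality decoding at the receivers, and time sharing. See [12] or [48, §9.6.4] for more details.

Uplink-Downlink Duality. It can be argued that the uplink and downlink capacity regions described above are equal to one another, $\mathcal{C}_u = \mathcal{C}_d$. This was first shown for the sum capacity [9]-[11] and then for the full capacity region [12].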

B. Conventional Linear Architectures
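We begin with a summary of classical linear uplink and downlink architectures and their duality relationship.

Uplink Channel. The $\ell$th transmitter has a codeword $\mathbf{s}_{u,\ell} \in \mathbb{R}^n$ with expected power $\frac{1}{n}\mathbb{E}\|\mathbf{s}_{u,\ell}\|^2 = P_{u,\ell}$. It uses a beamforming vector $\mathbf{c}_{u,\ell} \in \mathbb{R}^{M_\ell}$ to generate its channel input
$$\mathbf{X}_{u,\ell} = \mathbf{c}_{u,\ell}\,\mathbf{s}_{u,\ell}^T.$$
Collecting the beamforming vectors into the matrix
$$\mathbf{C}_u = \begin{bmatrix} \mathbf{c}_{u,1} & & \\ & \ddots & \\ & & \mathbf{c}_{u,L} \end{bmatrix} \in \mathbb{R}^{M \times L}$$
and the codewords into the matrix
$$\mathbf{S}_u = \begin{bmatrix} \mathbf{s}_{u,1}^T \\ \vdots \\ \mathbf{s}_{u,L}^T \end{bmatrix} \in \mathbb{R}^{L \times n},$$
we can write the beamforming operation as $\mathbf{X}_u = \mathbf{C}_u \mathbf{S}_u$. To recover the $m$th codeword, the receiver uses an equalization vector $\mathbf{b}_{u,m} \in \mathbb{R}^N$ to obtain the effective channel output
$$\tilde{\mathbf{y}}_{u,m}^T = \mathbf{b}_{u,m}^T \mathbf{Y}_u = \mathbf{b}_{u,m}^T \mathbf{H}_{u,m} \mathbf{c}_{u,m} \mathbf{s}_{u,m}^T + \sum_{\ell \neq m} \mathbf{b}_{u,m}^T \mathbf{H}_{u,\ell} \mathbf{c}_{u,\ell} \mathbf{s}_{u,\ell}^T + \mathbf{b}_{u,m}^T \mathbf{Z}_u,$$
which is fed into a single-user decoder. By employing i.i.d. Gaussian codewords, the transmitters can achieve the following rates:
$$R_{u,m} = \frac{1}{2} \log\left(1 + \frac{P_{u,m} (\mathbf{b}_{u,m}^T \mathbf{H}_{u,m} \mathbf{c}_{u,m})^2}{\|\mathbf{b}_{u,m}\|^2 + \sum_{\ell \neq m} P_{u,\ell} (\mathbf{b}_{u,m}^T \mathbf{H}_{u,\ell} \mathbf{c}_{u,\ell})^2}\right).$$

Downlink Channel. The transmitter has a codeword $\mathbf{s}_{d,\ell} \in \mathbb{R}^n$ intended for the $\ell$th receiver with expected power $\frac{1}{n}\mathbb{E}\|\mathbf{s}_{d,\ell}\|^2 = P_{d,\ell}$ and applies a beamforming matrix $\mathbf{B}_d = [\mathbf{b}_{d,1} \cdots \mathbf{b}_{d,L}] \in \mathbb{R}^{N \times L}$ to create its channel input
$$\mathbf{X}_d = \mathbf{B}_d \mathbf{S}_d, \qquad \mathbf{S}_d = \begin{bmatrix} \mathbf{s}_{d,1}^T \\ \vdots \\ \mathbf{s}_{d,L}^T \end{bmatrix}.$$
The $m$th receiver uses an equalization vector $\mathbf{c}_{d,m} \in \mathbb{R}^{M_m}$ to form an effective channel output
$$\tilde{\mathbf{y}}_{d,m}^T = \mathbf{c}_{d,m}^T \mathbf{Y}_{d,m} = \mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{b}_{d,m} \mathbf{s}_{d,m}^T + \sum_{\ell \neq m} \mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{b}_{d,\ell} \mathbf{s}_{d,\ell}^T + \mathbf{c}_{d,m}^T \mathbf{Z}_{d,m}.$$
Using i.i.d. Gaussian codewords, we can achieve the following rates:
$$R_{d,m} = \frac{1}{2} \log\left(1 + \frac{P_{d,m} (\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{b}_{d,m})^2}{\|\mathbf{c}_{d,m}\|^2 + \sum_{\ell \neq m} P_{d,\ell} (\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{b}_{d,\ell})^2}\right).$$

Uplink-Downlink Duality. We can now state the uplink-downlink duality relationship for conventional linear architectures. Define the uplink and downlink equalization matrices
$$\mathbf{B}_u = \begin{bmatrix} \mathbf{b}_{u,1}^T \\ \vdots \\ \mathbf{b}_{u,L}^T \end{bmatrix} \in \mathbb{R}^{L \times N}, \qquad \mathbf{C}_d = \begin{bmatrix} \mathbf{c}_{d,1}^T & & \\ & \ddots & \\ & & \mathbf{c}_{d,L}^T \end{bmatrix} \in \mathbb{R}^{L \times M},$$
respectively. Also, define the uplink and downlink power matrices,
$$\mathbf{P}_u = \mathrm{diag}(P_{u,1}, \ldots, P_{u,L}), \qquad \mathbf{P}_d = \mathrm{diag}(P_{d,1}, \ldots, P_{d,L}),$$
respectively. The following theorem encapsulates the uplink-downlink result of [10] in our notation.

Theorem 1 ([10]): For a given uplink channel matrix $\mathbf{H}_u$ and (diagonal) power matrix $\mathbf{P}_u$ that meets the total power constraint $\mathrm{Tr}(\mathbf{C}_u^T \mathbf{C}_u \mathbf{P}_u) = P_{\text{total}}$, let $R_{u,1}, \ldots, R_{u,L}$ be a rate tuple that is achievable with equalization matrix $\mathbf{B}_u$ and precoding matrix $\mathbf{C}_u$. Then, for the downlink channel matrix $\mathbf{H}_d = \mathbf{H}_u^T$, there exists a unique (diagonal) power matrix $\mathbf{P}_d$ with total power usage $\mathrm{Tr}(\mathbf{B}_d^T \mathbf{B}_d \mathbf{P}_d) = P_{\text{total}}$, such that the rate tuple $R_{d,\ell} = R_{u,\ell}$ for $\ell = 1, \ldots, L$ is achievable using equalization matrix $\mathbf{C}_d = \mathbf{C}_u^T$ and precoding matrix $\mathbf{B}_d = \mathbf{B}_u^T$. The same relationship can be established starting from an achievable rate tuple for the downlink and going to the uplink.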
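The power reallocation in Theorem 1 can be illustrated numerically. The following is a minimal sketch, assuming two single-antenna users (so each beamforming vector is the scalar 1) and a two-antenna receiver; the channel vectors `h`, equalizers `b`, and powers `P_u` are arbitrary illustrative values, and the helper `solve` is a small hypothetical Gaussian-elimination routine. The downlink powers are found by solving the linear system obtained from setting each downlink SINR equal to its uplink counterpart.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


# Illustrative setup (assumed values): h[l] is the channel vector of user l,
# b[m] is the uplink equalization vector for codeword m (a row of B_u).
h = [[1.0, 0.5], [0.3, 1.2]]
b = [[1.0, -0.2], [-0.1, 1.0]]
P_u = [2.0, 3.0]
L = 2

g = [[dot(b[m], h[l]) ** 2 for l in range(L)] for m in range(L)]  # (b_m^T h_l)^2
noise = [dot(b[m], b[m]) for m in range(L)]                       # ||b_m||^2

sinr_u = [
    P_u[m] * g[m][m]
    / (noise[m] + sum(P_u[l] * g[m][l] for l in range(L) if l != m))
    for m in range(L)
]

# Downlink with H_d = H_u^T and B_d = B_u^T: stream l is beamformed with b[l],
# receiver m sees channel h[m] and unit-variance noise, so the interference
# gain from stream l at receiver m is g[l][m].
A = [[g[m][m] if l == m else -sinr_u[m] * g[l][m] for l in range(L)]
     for m in range(L)]
P_d = solve(A, sinr_u)

sinr_d = [
    P_d[m] * g[m][m]
    / (1.0 + sum(P_d[l] * g[l][m] for l in range(L) if l != m))
    for m in range(L)
]

total_u = sum(P_u)                                    # Tr(C_u^T C_u P_u) with c = 1
total_d = sum(P_d[m] * noise[m] for m in range(L))    # Tr(B_d^T B_d P_d)
```

In this example the downlink powers differ from the uplink powers, yet the per-user SINRs match and the two totals agree up to numerical precision, which is the content of the theorem for this small instance.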
In other words, any rates that are achievable on an uplink channel can be achieved on a downlink channel with a transposed channel matrix by exchanging the roles of the equalization and beamforming matrices (and transposing them) as well as reallocating the powers.

SIC and DPC. It is well known that the performance of linear architectures can be enhanced for the uplink and downlink via successive interference cancellation (SIC) [5], [6] and dirty-paper coding (DPC) [8]-[13], respectively. In fact, if the power, beamforming, and equalization parameters are optimized, these strategies can attain the MIMO MAC and BC capacity regions. (Note that SIC only attains the corner points of the MAC capacity region and requires time sharing or rate splitting [7] to trace out the full region.) For the uplink, each decoder can cancel out the contributions of codewords recovered in prior decoding steps. For notational simplicity, let us assume that decoding occurs in ascending order, leading to the following effective channel in the $m$th decoding step:
$$\tilde{\mathbf{y}}_{u,m}^{\text{SIC},T} = \mathbf{b}_{u,m}^T \Big(\mathbf{Y}_u - \sum_{\ell=1}^{m-1} \mathbf{H}_{u,\ell} \mathbf{c}_{u,\ell} \hat{\mathbf{s}}_{u,\ell}^T\Big) = \mathbf{b}_{u,m}^T \mathbf{H}_{u,m} \mathbf{c}_{u,m} \mathbf{s}_{u,m}^T + \sum_{\ell=m+1}^{L} \mathbf{b}_{u,m}^T \mathbf{H}_{u,\ell} \mathbf{c}_{u,\ell} \mathbf{s}_{u,\ell}^T + \mathbf{b}_{u,m}^T \mathbf{Z}_u,$$
where the last step assumes that the first $m-1$ codewords have been correctly decoded, $\hat{\mathbf{s}}_{u,\ell} = \mathbf{s}_{u,\ell}$ for $\ell = 1, \ldots, m-1$.
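For i.i.d. Gaussian codebooks, this decoding strategy achieves the rate tuple
$$R_{u,m}^{\text{SIC}} = \frac{1}{2} \log\left(1 + \frac{P_{u,m} (\mathbf{b}_{u,m}^T \mathbf{H}_{u,m} \mathbf{c}_{u,m})^2}{\|\mathbf{b}_{u,m}\|^2 + \sum_{\ell > m} P_{u,\ell} (\mathbf{b}_{u,m}^T \mathbf{H}_{u,\ell} \mathbf{c}_{u,\ell})^2}\right).$$
For the downlink, DPC enables the transmitter to encode the $m$th codeword so that the interference from codewords $m+1, \ldots, L$ is nulled out at the $m$th receiver. As in the uplink, other cancellation orders are possible, but we have chosen this fixed order for notational simplicity. For i.i.d. Gaussian codebooks, the following rates are achievable:
$$R_{d,m}^{\text{DPC}} = \frac{1}{2} \log\left(1 + \frac{P_{d,m} (\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{b}_{d,m})^2}{\|\mathbf{c}_{d,m}\|^2 + \sum_{\ell < m} P_{d,\ell} (\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{b}_{d,\ell})^2}\right).$$
Finally, uplink-downlink duality continues to hold between SIC and DPC.

Theorem 2 ([10]): For a given uplink channel matrix $\mathbf{H}_u$ and (diagonal) power matrix $\mathbf{P}_u$ that meets the total power constraint $\mathrm{Tr}(\mathbf{C}_u^T \mathbf{C}_u \mathbf{P}_u) = P_{\text{total}}$, let $R_{u,1}^{\text{SIC}}, \ldots, R_{u,L}^{\text{SIC}}$ be an SIC rate tuple that is achievable with equalization matrix $\mathbf{B}_u$ and precoding matrix $\mathbf{C}_u$. Then, for the downlink channel matrix $\mathbf{H}_d = \mathbf{H}_u^T$, there exists a unique (diagonal) power matrix $\mathbf{P}_d$ with total power usage $\mathrm{Tr}(\mathbf{B}_d^T \mathbf{B}_d \mathbf{P}_d) = P_{\text{total}}$, such that the rate tuple $R_{d,\ell}^{\text{DPC}} = R_{u,\ell}^{\text{SIC}}$ for $\ell = 1, \ldots, L$ is achievable using equalization matrix $\mathbf{C}_d = \mathbf{C}_u^T$ and precoding matrix $\mathbf{B}_d = \mathbf{B}_u^T$. The same relationship can be established starting from an achievable rate tuple for the downlink and going to the uplink.

Remark 1: In some cases, it may be desirable to employ rate splitting [7] at the transmitter(s). This can be viewed as creating virtual transmitters (in the uplink) or virtual receivers (in the downlink). In this setting, uplink-downlink duality continues to hold so long as the uplink and downlink users are split into virtual users in the same fashion.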
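The gain from successive cancellation on the uplink can be seen in a small numerical example. The sketch below assumes two single-antenna users, a two-antenna receiver, fixed powers, and MMSE equalizers (chosen here as the inverse interference-plus-noise covariance applied to the user's channel vector); all numerical values and helper names are illustrative assumptions. For these fixed powers, the SIC sum rate should match $\frac{1}{2}\log\det(\mathbf{I} + \sum_\ell P_\ell \mathbf{h}_\ell \mathbf{h}_\ell^T)$, while the purely linear receiver falls short.

```python
import math


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def det2(K):
    return K[0][0] * K[1][1] - K[0][1] * K[1][0]


def inv2(K):
    d = det2(K)
    return [[K[1][1] / d, -K[0][1] / d], [-K[1][0] / d, K[0][0] / d]]


def matvec(K, v):
    return [dot(row, v) for row in K]


# Illustrative channel vectors and powers (assumed values).
h = [[1.0, 0.5], [0.3, 1.2]]
P = [2.0, 3.0]


def covariance(users):
    """I + sum over the given users of P_l * h_l h_l^T (2 x 2)."""
    K = [[1.0, 0.0], [0.0, 1.0]]
    for l in users:
        for i in range(2):
            for j in range(2):
                K[i][j] += P[l] * h[l][i] * h[l][j]
    return K


def mmse_sinr(m, interferers):
    """SINR of user m with an MMSE equalizer against the listed interferers."""
    b = matvec(inv2(covariance(interferers)), h[m])
    signal = P[m] * dot(b, h[m]) ** 2
    noise_plus_interference = dot(b, b) + sum(
        P[l] * dot(b, h[l]) ** 2 for l in interferers
    )
    return signal / noise_plus_interference


def rate(sinr):
    return 0.5 * math.log2(1.0 + sinr)


# Linear receiver: each user sees the other user as interference.
rate_linear = [rate(mmse_sinr(0, [1])), rate(mmse_sinr(1, [0]))]

# SIC in ascending order: user 0 is decoded first (interference from user 1),
# then cancelled, so user 1 is decoded interference-free.
rate_sic = [rate(mmse_sinr(0, [1])), rate(mmse_sinr(1, []))]

sum_rate_fixed_powers = 0.5 * math.log2(det2(covariance([0, 1])))
```

The comparison is for one fixed power allocation only; attaining the sum capacity additionally requires optimizing the powers under the total power constraint.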

C. Integer-Forcing Linear Architectures
The linear architectures discussed above (without SIC or DPC) fall short of achieving the MIMO MAC or BC capacity, due to noise amplification from the equalization step (which worsens as the condition number of the channel matrix increases). Integer-forcing linear architectures can substantially reduce this rate loss by allowing the single-user decoders to target integer-linear combinations of codewords, rather than individual codewords. These linear combinations can then be solved for the desired codewords. By carefully selecting the integer coefficients to match the interference presented by the channel, we can reduce the noise amplification caused by the equalization step.
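Uplink Channel. The operations at the transmitters are the same as in the conventional linear architecture, except that we use a nested lattice codebook to ensure that integer-linear combinations of codewords are themselves codewords. The goal is to recover $L$ integer-linear combinations of the form $\mathbf{a}_{u,1}^T \mathbf{S}_u, \ldots, \mathbf{a}_{u,L}^T \mathbf{S}_u$ where the $\mathbf{a}_{u,m}^T$ are the rows of a full-rank integer matrix $\mathbf{A}_u \in \mathbb{Z}^{L \times L}$, i.e.,
$$\mathbf{A}_u = \begin{bmatrix} \mathbf{a}_{u,1}^T \\ \vdots \\ \mathbf{a}_{u,L}^T \end{bmatrix}.$$
To recover the $m$th linear combination $\mathbf{a}_{u,m}^T \mathbf{S}_u$, the receiver applies an equalization vector $\mathbf{b}_{u,m} \in \mathbb{R}^N$ to form the effective channel output
$$\tilde{\mathbf{y}}_{u,m}^T = \mathbf{b}_{u,m}^T \mathbf{Y}_u = \mathbf{a}_{u,m}^T \mathbf{S}_u + (\mathbf{b}_{u,m}^T \mathbf{H}_u \mathbf{C}_u - \mathbf{a}_{u,m}^T) \mathbf{S}_u + \mathbf{b}_{u,m}^T \mathbf{Z}_u.$$
We define the effective noise variance as
$$\sigma_{u,m}^2 = \|\mathbf{b}_{u,m}\|^2 + \big\|(\mathbf{b}_{u,m}^T \mathbf{H}_u \mathbf{C}_u - \mathbf{a}_{u,m}^T) \mathbf{P}_u^{1/2}\big\|^2.$$
Using the algebraic successive cancellation technique introduced by Ordentlich et al. in [21], we can assign each effective noise variance to a single transmitter. This technique will be discussed in detail in Section V. Here, we assume that the identity permutation is admissible according to Definition 2 (which can always be satisfied by reindexing). It follows from [22, Lemma 10] that the following rates are achievable:
$$R_{u,\ell} = \frac{1}{2} \log^+\left(\frac{P_{u,\ell}}{\sigma_{u,\ell}^2}\right).$$
To recover the codewords, the receiver now just applies the inverse of the integer matrix. As argued in [16], this inverse can be performed over the finite field from which the messages and nested lattice codes are drawn.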
Remark 2: Although it is not immediately obvious, any rate tuple that is achievable via a conventional linear architecture is also achievable via an integer-forcing linear architecture by using the same beamforming matrix, setting the integer matrix to be the identity matrix, and scaling the equalization vectors by the appropriate MMSE coefficient [16, Lemma 3]. While [16] only establishes this for the uplink without SIC, this can be directly generalized to the SIC case as well as the downlink with or without DPC.

Downlink Channel. We use the same encoding operations at the transmitter as in a conventional linear architecture. As in the uplink, we employ a nested lattice codebook to ensure that the codebook is closed under integer-linear combinations. As first proposed by Hong and Caire [17], [18], we can also apply a precoding step over the finite field in order to "pre-invert" the linear combinations before mapping the messages to codewords. This step ensures that each receiver, upon recovering its integer-linear combination of codewords, can also obtain its desired message.
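The $m$th receiver aims to make an estimate of an integer-linear combination $\mathbf{a}_{d,m}^T \mathbf{S}_d$ of the lattice codewords, which, due to the inverse operation, corresponds to an estimate of its desired message. To do so, it uses an equalization vector $\mathbf{c}_{d,m} \in \mathbb{R}^{M_m}$ to form the effective channel output
$$\tilde{\mathbf{y}}_{d,m}^T = \mathbf{c}_{d,m}^T \mathbf{Y}_{d,m} = \mathbf{a}_{d,m}^T \mathbf{S}_d + (\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{B}_d - \mathbf{a}_{d,m}^T) \mathbf{S}_d + \mathbf{c}_{d,m}^T \mathbf{Z}_{d,m}.$$
We define the effective noise variance as
$$\sigma_{d,m}^2 = \|\mathbf{c}_{d,m}\|^2 + \big\|(\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{B}_d - \mathbf{a}_{d,m}^T) \mathbf{P}_d^{1/2}\big\|^2.$$
In order to assign each effective power $P_{d,m}$ to a unique receiver, we will introduce an algebraic precoding technique in Section VI. Here, we assume that the identity permutation is admissible according to Definition 3 (which can always be satisfied by reindexing). As we will show in Theorem 8, the following rates are achievable:
$$R_{d,\ell} = \frac{1}{2} \log^+\left(\frac{P_{d,\ell}}{\sigma_{d,\ell}^2}\right).$$

Uplink-Downlink Duality. The following theorem establishes uplink-downlink duality for integer-forcing in terms of the sum rate.

Theorem 3: For a given uplink channel matrix $\mathbf{H}_u$, integer matrix $\mathbf{A}_u$, and (diagonal) power matrix $\mathbf{P}_u$ that meets the total power constraint $\mathrm{Tr}(\mathbf{C}_u^T \mathbf{C}_u \mathbf{P}_u) = P_{\text{total}}$, let $R_{u,1}, \ldots, R_{u,L}$ be a rate tuple that is achievable via integer-forcing with equalization matrix $\mathbf{B}_u$ and precoding matrix $\mathbf{C}_u$. Then, for the downlink channel matrix $\mathbf{H}_d = \mathbf{H}_u^T$ and integer matrix $\mathbf{A}_d = \mathbf{A}_u^T$, there exists a unique (diagonal) power matrix $\mathbf{P}_d$ with total power usage $\mathrm{Tr}(\mathbf{B}_d^T \mathbf{B}_d \mathbf{P}_d) = P_{\text{total}}$, such that the sum rate $\sum_\ell R_{d,\ell} = \sum_\ell R_{u,\ell}$ is achievable via integer-forcing using equalization matrix $\mathbf{C}_d = \mathbf{C}_u^T$ and precoding matrix $\mathbf{B}_d = \mathbf{B}_u^T$. The same relationship can be established starting from an achievable rate tuple for the downlink and going to the uplink.

A full proof of this duality theorem will be given in Section VII.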
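To make the role of the integer coefficients concrete, the sketch below evaluates an uplink effective noise variance of the form $\|\mathbf{b}\|^2 + \sum_\ell P_\ell\,((\mathbf{b}^T\mathbf{H})_\ell - a_\ell)^2$ and a rate of the form $\frac{1}{2}\log^+(P_\ell/\sigma^2)$. These expressions, the restriction to single-antenna users with identity beamforming, and all numerical values are assumptions made for this illustration; the function names are hypothetical. The equalizer is held fixed rather than optimized, so the comparison only shows how matching the integer vector to the channel removes the mismatch term.

```python
import math


def effective_noise_variance(b, H, a, P):
    """||b||^2 + sum_l P[l] * ((b^T H)[l] - a[l])^2 for single-antenna users."""
    n_rx = len(b)
    n_users = len(a)
    bH = [sum(b[i] * H[i][l] for i in range(n_rx)) for l in range(n_users)]
    mismatch = sum(P[l] * (bH[l] - a[l]) ** 2 for l in range(n_users))
    return sum(x * x for x in b) + mismatch


def log_plus_rate(power, sigma2):
    """0.5 * log2^+(power / sigma2), in bits per channel use."""
    return max(0.0, 0.5 * math.log2(power / sigma2))


# Illustrative channel, powers, and equalizer (assumed values).
H = [[2.0, 1.0], [1.0, 1.0]]
P = [4.0, 4.0]
b = [1.0, 0.0]

# Target the combination a = [2, 1], which matches b^T H exactly.
sigma2_if = effective_noise_variance(b, H, [2, 1], P)
rate_if = log_plus_rate(P[0], sigma2_if)

# Target the individual codeword a = [1, 0] with the same equalizer.
sigma2_single = effective_noise_variance(b, H, [1, 0], P)
rate_single = log_plus_rate(P[0], sigma2_single)
```

In practice both the equalizer and the integer vector would be optimized jointly; the point here is only that the mismatch term vanishes when the integer vector equals the equalized channel.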
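Remark 3: Note that Theorem 3 only establishes duality for the sum rate whereas, for conventional linear architectures, Theorem 1 establishes duality for the individual rates. This stems from the fact that, for a conventional linear architecture, the $m$th effective noise variance and $m$th effective power are always linked to the rates $R_{u,m}$ and $R_{d,m}$. However, for uplink integer-forcing, the effective noise variance for the $m$th linear combination may not correspond to $R_{u,m}$. Similarly, for downlink integer-forcing, the $m$th effective power may not correspond to $R_{d,m}$. While we can always find permutations that connect effective noise variances and powers to rates, these permutations may differ between the uplink and downlink, which is an obstacle for establishing duality for individual rates.

SIC and DPC. The performance of integer-forcing architectures can also be improved by SIC and DPC techniques. For the uplink, each decoder can use recovered integer-linear combinations as side information to improve its effective channel [19]. Let $\mathbf{R}_u$ be a lower unitriangular matrix (i.e., lower triangular with ones along the diagonal). At the $m$th decoder, we assume that $\mathbf{a}_{u,1}^T \mathbf{S}_u, \ldots, \mathbf{a}_{u,m-1}^T \mathbf{S}_u$ have been recovered correctly and form the following effective channel
$$\tilde{\mathbf{y}}_{u,m}^{\text{SIC},T} = \mathbf{b}_{u,m}^T \mathbf{Y}_u - \sum_{\ell=1}^{m-1} r_{u,m,\ell}\, \mathbf{a}_{u,\ell}^T \mathbf{S}_u = \mathbf{a}_{u,m}^T \mathbf{S}_u + \mathbf{z}_{\text{eff},u,m}^T$$
where
$$\mathbf{z}_{\text{eff},u,m}^T = (\mathbf{b}_{u,m}^T \mathbf{H}_u \mathbf{C}_u - \mathbf{r}_{u,m}^T \mathbf{A}_u) \mathbf{S}_u + \mathbf{b}_{u,m}^T \mathbf{Z}_u,$$
$\mathbf{r}_{u,m}^T$ is the $m$th row of $\mathbf{R}_u$, and $r_{u,m,\ell}$ is the $(m, \ell)$th entry. The effective noise variance is defined to be
$$\sigma_{\text{SIC},u,m}^2 = \|\mathbf{b}_{u,m}\|^2 + \big\|(\mathbf{b}_{u,m}^T \mathbf{H}_u \mathbf{C}_u - \mathbf{r}_{u,m}^T \mathbf{A}_u) \mathbf{P}_u^{1/2}\big\|^2.$$
Assuming that the identity permutation is admissible according to Definition 2, it follows from [22, Theorem 5] that the following rates are achievable:
$$R_{u,\ell}^{\text{SIC}} = \frac{1}{2} \log^+\left(\frac{P_{u,\ell}}{\sigma_{\text{SIC},u,\ell}^2}\right).$$

In Section VI-B, we introduce a dirty-paper integer-forcing technique, based on the nested lattice DPC technique from [20], [29]. Let $\mathbf{R}_d$ be an upper unitriangular matrix. At a high level, the nested lattice codewords $\mathbf{s}_{d,1}, \ldots, \mathbf{s}_{d,L}$ are mapped into dirty-paper codewords $\mathbf{s}_{\text{DPC},1}, \ldots, \mathbf{s}_{\text{DPC},L}$ with the property that the $m$th message can be recovered from $\mathbf{a}_{d,m}^T \mathbf{R}_d \mathbf{S}_{\text{DPC}}$ where
$$\mathbf{S}_{\text{DPC}} = \begin{bmatrix} \mathbf{s}_{\text{DPC},1}^T \\ \vdots \\ \mathbf{s}_{\text{DPC},L}^T \end{bmatrix}.$$
The dirty-paper codewords have the same expected power as the nested lattice codewords, and the encoder generates its channel input by applying the beamforming matrix to the dirty-paper codewords, $\mathbf{X}_d = \mathbf{B}_d \mathbf{S}_{\text{DPC}}$. The $m$th decoder uses its equalization vector to generate an effective channel output
$$\tilde{\mathbf{y}}_{d,m}^{\text{DPC},T} = \mathbf{c}_{d,m}^T \mathbf{Y}_{d,m} = \mathbf{a}_{d,m}^T \mathbf{R}_d \mathbf{S}_{\text{DPC}} + \mathbf{z}_{\text{eff},d,m}^T$$
where
$$\mathbf{z}_{\text{eff},d,m}^T = (\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{B}_d - \mathbf{a}_{d,m}^T \mathbf{R}_d) \mathbf{S}_{\text{DPC}} + \mathbf{c}_{d,m}^T \mathbf{Z}_{d,m}.$$
The effective noise variance is defined to be
$$\sigma_{\text{DPC},d,m}^2 = \|\mathbf{c}_{d,m}\|^2 + \big\|(\mathbf{c}_{d,m}^T \mathbf{H}_{d,m} \mathbf{B}_d - \mathbf{a}_{d,m}^T \mathbf{R}_d) \mathbf{P}_d^{1/2}\big\|^2.$$
Assume that the identity permutation is admissible according to Definition 3. We will show in Theorem 10 that the following rates are achievable:
$$R_{d,\ell}^{\text{DPC}} = \frac{1}{2} \log^+\left(\frac{P_{d,\ell}}{\sigma_{\text{DPC},d,\ell}^2}\right).$$
The following theorem establishes uplink-downlink duality in terms of the sum rates of successive and dirty-paper integer-forcing.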
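Theorem 4: For a given uplink channel matrix $\mathbf{H}_u$, integer matrix $\mathbf{A}_u$, and (diagonal) power matrix $\mathbf{P}_u$ that meets the total power constraint $\mathrm{Tr}(\mathbf{C}_u^T \mathbf{C}_u \mathbf{P}_u) = P_{\text{total}}$, let $R_{u,1}^{\text{SIC}}, \ldots, R_{u,L}^{\text{SIC}}$ be a rate tuple that is achievable via successive integer-forcing with equalization matrix $\mathbf{B}_u$, precoding matrix $\mathbf{C}_u$, and (lower unitriangular) successive cancellation matrix $\mathbf{R}_u$. Then, for the downlink channel matrix $\mathbf{H}_d = \mathbf{H}_u^T$ and integer matrix $\mathbf{A}_d = \mathbf{A}_u^T$, there exists a unique (diagonal) power matrix $\mathbf{P}_d$ with total power usage $\mathrm{Tr}(\mathbf{B}_d^T \mathbf{B}_d \mathbf{P}_d) = P_{\text{total}}$, such that the sum rate $\sum_\ell R_{d,\ell}^{\text{DPC}} = \sum_\ell R_{u,\ell}^{\text{SIC}}$ is achievable via dirty-paper integer-forcing using equalization matrix $\mathbf{C}_d = \mathbf{C}_u^T$, precoding matrix $\mathbf{B}_d = \mathbf{B}_u^T$, and (upper unitriangular) dirty-paper matrix $\mathbf{R}_d = \mathbf{R}_u^T$. The same relationship can be established starting from an achievable rate tuple for the downlink and going to the uplink.

Remark 4: As noted in Remark 1, uplink-downlink duality continues to hold for conventional linear architectures under rate splitting. The same is true for integer-forcing architectures, but in terms of the sum rate.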

IV. NESTED LATTICE CODES
Below, we review some basic lattice definitions as well as nested lattice code constructions that we will need for our achievability results.See [32] for a thorough introduction to lattice codes.

A. Lattice Definitions
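A lattice $\Lambda$ is a discrete additive subgroup of $\mathbb{R}^n$. The nearest neighbor quantizer associated to $\Lambda$ is defined as
$$Q_\Lambda(\mathbf{x}) = \arg\min_{\boldsymbol{\lambda} \in \Lambda} \|\mathbf{x} - \boldsymbol{\lambda}\|$$
(with ties broken in a systematic fashion). The fundamental Voronoi region $\mathcal{V}$ of $\Lambda$ is the set of all points in $\mathbb{R}^n$ that quantize to $\mathbf{0}$. We define the second moment of $\Lambda$ as
$$\sigma^2(\Lambda) = \frac{1}{n}\, \frac{1}{\mathrm{Vol}(\mathcal{V})} \int_{\mathcal{V}} \|\mathbf{x}\|^2 \, d\mathbf{x},$$
where $\mathrm{Vol}(\mathcal{V})$ denotes the volume of $\mathcal{V}$.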
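We also define the modulo operation with respect to $\Lambda$ as
$$[\mathbf{x}] \bmod \Lambda = \mathbf{x} - Q_\Lambda(\mathbf{x})$$
and note that it satisfies a distributive law, $\big[a\,[\mathbf{x}] \bmod \Lambda + b\,\mathbf{y}\big] \bmod \Lambda = [a\mathbf{x} + b\mathbf{y}] \bmod \Lambda$ for all $a, b \in \mathbb{Z}$ and $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$.

Lemma 1 (Crypto Lemma): Let $\mathbf{x}$ be a random vector over $\mathbb{R}^n$ and $\mathbf{d}$ be an independent random vector drawn uniformly over the Voronoi region $\mathcal{V}$ of the lattice $\Lambda$. The modulo sum $[\mathbf{x} + \mathbf{d}] \bmod \Lambda$ is independent of $\mathbf{x}$ and uniform over $\mathcal{V}$.

See [32, Ch. 4.1] for a full proof.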
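The quantizer, the modulo operation, and the distributive law can be tried out on the simplest lattice. The sketch below assumes the scaled integer lattice $q\mathbb{Z}^n$ with ties rounded upward in each coordinate; the helper names and sample points are illustrative. Exact rational arithmetic is used so that the equality can be checked without floating-point tolerance.

```python
from fractions import Fraction
import math


def quantize(x, q):
    """Nearest point of the lattice q*Z^n, coordinate-wise, ties rounded up."""
    return [q * math.floor(xi / q + Fraction(1, 2)) for xi in x]


def mod_lattice(x, q):
    """[x] mod Lambda = x - Q_Lambda(x); each coordinate lands in [-q/2, q/2)."""
    return [xi - zi for xi, zi in zip(x, quantize(x, q))]


def lincomb(a, x, b, y):
    return [a * xi + b * yi for xi, yi in zip(x, y)]


# Illustrative lattice scale, integer coefficients, and points (assumed values).
q = Fraction(2)
a, b = 3, -2
x = [Fraction(7, 3), Fraction(-5, 2)]
y = [Fraction(11, 4), Fraction(1, 5)]

# Left-hand side: reduce x first, then combine and reduce again.
lhs = mod_lattice(lincomb(a, mod_lattice(x, q), b, y), q)
# Right-hand side: combine over the reals, then reduce once.
rhs = mod_lattice(lincomb(a, x, b, y), q)
```

The law relies on the coefficient `a` being an integer: reducing `x` changes it by a lattice point, and an integer multiple of a lattice point is again a lattice point, so the final reduction removes it.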
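The lattice $\Lambda_C$ is said to be nested in the lattice $\Lambda_F$ if $\Lambda_C \subset \Lambda_F$. In this case, $\Lambda_C$ is called the coarse lattice and $\Lambda_F$ the fine lattice. A nested lattice codebook $\mathcal{L} = \Lambda_F \cap \mathcal{V}_C$ consists of all fine lattice points that fall in the fundamental Voronoi region $\mathcal{V}_C$ of the coarse lattice. Note that nested lattices satisfy the following quantization property:
$$\big[Q_{\Lambda_F}(\mathbf{x})\big] \bmod \Lambda_C = \big[Q_{\Lambda_F}([\mathbf{x}] \bmod \Lambda_C)\big] \bmod \Lambda_C.$$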
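A one-dimensional toy example can make the nested codebook tangible. The sketch below assumes the fine lattice $\mathbb{Z}$ and the coarse lattice $4\mathbb{Z}$ (illustrative choices), enumerates the resulting codebook, and checks numerically that quantizing to the fine lattice and then reducing modulo the coarse lattice gives the same result whether or not the input is first reduced modulo the coarse lattice. Helper names are hypothetical.

```python
from fractions import Fraction
import math


def quantize(x, q):
    """Nearest point of the one-dimensional lattice q*Z, ties rounded up."""
    return q * math.floor(x / q + Fraction(1, 2))


def mod_lattice(x, q):
    """[x] mod (q*Z), a value in [-q/2, q/2)."""
    return x - quantize(x, q)


# Illustrative nested pair: coarse lattice 4Z inside fine lattice Z.
Q_FINE = Fraction(1)
Q_COARSE = Fraction(4)

# Codebook: fine lattice points that fall in the coarse Voronoi region.
codebook = sorted({mod_lattice(Q_FINE * k, Q_COARSE) for k in range(-8, 9)})

# Quantize-then-reduce versus reduce-quantize-reduce on a grid of sample points.
samples = [Fraction(k, 7) for k in range(-70, 71)]
property_holds = all(
    mod_lattice(quantize(x, Q_FINE), Q_COARSE)
    == mod_lattice(quantize(mod_lattice(x, Q_COARSE), Q_FINE), Q_COARSE)
    for x in samples
)

# The codebook is closed under addition modulo the coarse lattice.
closed = all(
    mod_lattice(u + v, Q_COARSE) in codebook for u in codebook for v in codebook
)
```

With these choices the codebook has four points, i.e., two bits per dimension, which matches the ratio of the coarse and fine cell volumes.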

B. Nested Lattice Codes and Properties
Our encoding strategies rely on the existence of good nested lattice codebooks.Below, we describe the nested lattice ensemble as well as properties that are central to our achievability proofs.Our notation closely follows that from [22,§IV], which contains a more detailed exposition.
Recall that $n$ denotes the blocklength of our coding scheme. Let $p$ represent a prime number and $\mathbb{Z}_p$ the finite field of size $p$. We will also need integer-valued parameters The construction begins with the generator matrix of a linear code $G \in \mathbb{Z}_p^{k_F \times n}$. For $\ell = 1, \ldots, L$, define $G_{C,\ell}$ and $G_{F,\ell}$ to be the submatrices consisting of the first $k_{C,\ell}$ and $k_{F,\ell}$ rows of $G$, respectively. Let denote the resulting linear codebooks. For $\gamma > 0$ to be specified later, define the mapping $φ(w) \triangleq \gamma p^{-1} w$ from $\mathbb{Z}_p$ to $\mathbb{R}$. We also define the inverse mapping $\bar{φ}(\kappa) \triangleq [\gamma^{-1} p \kappa] \bmod p$, which is only defined on the domain $\gamma p^{-1} \mathbb{Z}$.
Both of these mappings are taken elementwise when applied to vectors and will be used to go back and forth between linear codebooks and lattices.We now generate L coarse lattices and L fine lattices as follows: By construction, these lattices are nested according to the order for which the parameters k C,ℓ and k F,ℓ are increasing.Define Λ C and Λ F to be the coarsest and finest lattices in the ensemble, respectively.Let V C,ℓ and V F,ℓ denote the Voronoi regions of Λ C,ℓ and Λ F,ℓ , respectively.Finally, we take the elements of the fine lattice Λ F,ℓ that fall in the Voronoi region of the coarse lattice Λ C,ℓ to be the nested lattice codebook for the ℓ th user.The theorem below summarizes results from [31] that demonstrate that this nested lattice construction exhibits good shaping and noise tolerance properties.
Theorem 5 ( [31, Theorem 2]): For ℓ = 1, . . ., L, select parameters P ℓ > 0 and 0 < σ 2 eff,ℓ < P ℓ .Then, for any ǫ > 0 and n and p large enough, there are parameters γ, k C,ℓ , and k F,ℓ and a generator matrix G ∈ Z kF×n p such that, for ℓ = 1, . . ., L (a) the submatrices G C,ℓ and G F,ℓ are full rank.(b) the coarse lattices Λ C,ℓ have second moments close to their power constraints (c) the lattices can tolerate the desired level of effective noise.Let z 0 , z 1 , . . ., z L be independent noise vectors where z 0 ∼ N (0, I) and z ℓ ∼ Unif(V C,ℓ ).For any eff,m , any fine lattice point λ ∈ Λ F,m can recover from z eff with high probability, Similarly, if β 2 0 + L ℓ=1 β 2 ℓ P ℓ ≤ P m , any coarse lattice point λ ∈ Λ C,m can recover from z eff with high probability, (d) the rates of the nested lattice codes satisfy Finally, it can be argued that we can label lattice codewords so that integer-linear combinations of codewords correspond to linear combinations of the messages over Z p .We recall the definition of a linear labeling from [33].
Definition 1: We say that a mapping ϕ : where q ℓ = [a ℓ ] mod p.
Consider the mapping that sets ϕ(λ) to be the last k components of the unique vector v ∈ Z kF p satisfying φ(λ) = G T v. From [22, Theorem 10], ϕ is a linear labeling.We also define the inverse map φ φ G T 0 kC w , which satisfies ϕ( φ(w)) = w.
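The scalar mappings used to move between Z_p and the real line can be sketched as follows, assuming the forward map w ↦ γ p^{-1} w and the inverse map κ ↦ [γ^{-1} p κ] mod p described in the construction above. The function names and the use of exact rationals are illustrative choices, not part of the original construction.

```python
from fractions import Fraction


def phi(w, gamma, p):
    """Map a vector over Z_p to real points: w -> gamma * w / p (elementwise)."""
    return [Fraction(gamma) * Fraction(wi, p) for wi in w]


def phi_bar(kappa, gamma, p):
    """Inverse map [p * kappa / gamma] mod p, defined only on (gamma / p) * Z."""
    out = []
    for k in kappa:
        t = Fraction(k) * p / Fraction(gamma)
        if t.denominator != 1:
            raise ValueError("point is outside the domain (gamma / p) * Z")
        out.append(int(t) % p)
    return out
```

Note that shifting a point by any integer multiple of γ leaves the output of `phi_bar` unchanged, which is the sense in which the inverse map discards the component that lies in γZ^n.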

C. Intuition via Signal Levels
We now develop some intuition by describing the linear labeling of our nested lattice construction in terms of "signal levels" over $\mathbb{Z}_p^k$. See Figure 7 for an illustration. Each codeword from the $\ell$th nested lattice codebook can be expressed as an element from the $\ell$th fine lattice, $\lambda_{F,\ell} \in \Lambda_{F,\ell}$, modulo the $\ell$th coarse lattice, $[\lambda_{F,\ell}] \bmod \Lambda_{C,\ell} \in \mathcal{L}_\ell$. We can thus write the linear label of any codeword in $\mathcal{L}_\ell$ as $ϕ([\lambda_{F,\ell}] \bmod \Lambda_{C,\ell}) = ϕ(\lambda_{F,\ell}) \ominus ϕ(Q_{\Lambda_{C,\ell}}(\lambda_{F,\ell}))$, where $\ominus$ denotes the subtraction operation over $\mathbb{Z}_p$. From Definition 1(a), we know that $ϕ(Q_{\Lambda_{C,\ell}}(\lambda_{F,\ell}))$ only occupies the top $k_{C,\ell} - k_C$ elements of the vector corresponding to the linear label. Similarly, we know that $ϕ(\lambda_{F,\ell})$ only occupies the top $k_{F,\ell} - k_C$ elements of the vector. Therefore, the first $k_{C,\ell} - k_C$ elements are determined by the shaping operation $\bmod\ \Lambda_{C,\ell}$ and can be interpreted as enforcing the power constraint $P_\ell$. The next $k_{F,\ell} - k_{C,\ell}$ elements are occupied by information symbols and the final $k_F - k_{F,\ell}$ elements are zero, which can be interpreted as enforcing the noise tolerance threshold $\sigma^2_{\text{eff},\ell}$.
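The three-segment structure of a linear label can be sketched as follows. The helper names and the dictionary layout are hypothetical, and the shaping symbols are passed in as given values because in the actual scheme they are determined by the shaping operation rather than chosen freely.

```python
def signal_level_layout(k_C, k_F, k_C_l, k_F_l):
    """Segment lengths of a length-(k_F - k_C) label for user l."""
    if not (k_C <= k_C_l <= k_F_l <= k_F):
        raise ValueError("need k_C <= k_C_l <= k_F_l <= k_F")
    return {
        "shaping": k_C_l - k_C,        # top levels, tied to the power constraint
        "information": k_F_l - k_C_l,  # levels that carry message symbols
        "zeros": k_F - k_F_l,          # bottom levels, left empty for noise tolerance
    }


def build_label(shaping, message, k_C, k_F, k_C_l, k_F_l):
    """Stack shaping symbols, message symbols, and zero-padding into one label."""
    layout = signal_level_layout(k_C, k_F, k_C_l, k_F_l)
    if len(shaping) != layout["shaping"] or len(message) != layout["information"]:
        raise ValueError("segment lengths do not match the layout")
    return list(shaping) + list(message) + [0] * layout["zeros"]
```

A user with a larger power budget has a smaller k_{C,ℓ} and therefore fewer shaping levels, while a user that must tolerate more noise has a smaller k_{F,ℓ} and therefore more zero levels at the bottom.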

V. UPLINK INTEGER-FORCING ARCHITECTURE
Our uplink coding scheme is taken from [22,Section VI].Below, we summarize the encoding and decoding operations for successive integer-forcing in order to highlight the similarities between the uplink and downlink integer-forcing schemes.
We begin by selecting a power allocation $P_u = \mathrm{diag}(P_{u,1}, \ldots, P_{u,L})$ for the codewords and a beamforming matrix $C_u$ satisfying (12). Note that, in order to meet the total power constraint with equality, we require that $\mathrm{Tr}(C_u^T C_u P_u) = P_{\text{total}}$. We also select a full-rank integer matrix $A_u \in \mathbb{Z}^{L \times L}$, an equalization matrix $B_u$, and an $L \times L$ lower unitriangular successive cancellation matrix $R_u$. These choices specify the effective noise variances $\sigma^2_{u,\text{SIC},m}$ from (46).
Remark 5: The uplink integer-forcing strategy without SIC is equivalent to setting $R_u = I$. We omit a full description of this special case for the sake of brevity. The achievable rates are given in Corollary 7.
The structure of the integer matrix A u determines which codewords can be cancelled out in each decoding step.In order to keep our notation manageable, we assume that A u is selected so that the m th user can be associated with the m th effective noise variance.The following definition describes when this is possible.
Definition 2: We say that the identity permutation is admissible for the uplink if (a) the effective noise variances are in increasing order, $\sigma^2_{u,\text{SIC},1} \leq \cdots \leq \sigma^2_{u,\text{SIC},L}$, and (b) the leading principal submatrices of $A_u$ are full rank.
The first condition can always be met by reordering the rows of $A_u$, $B_u$, and $R_u$. Note that $R_u$ will also need to be modified, since it will not remain lower unitriangular under row permutation. The second condition always holds up to a column permutation on $A_u$, which corresponds to reindexing the users, and hence the powers and beamforming vectors. For notational convenience, we assume going forward that the rows and columns of $A_u$ have been reindexed so that these two conditions are satisfied.
Furthermore, for our decoding procedure, we will need to triangularize $A_u$ over $\mathbb{Z}_p$ in the following sense. We need a lower unitriangular matrix $\bar{\mathbf{L}} \in \mathbb{Z}_p^{L \times L}$ such that $\bar{A} = [\bar{\mathbf{L}} A_u] \bmod p$ is upper triangular. First, note that the condition in Definition 2(b) is equivalent to the condition that there exists a lower unitriangular matrix $\mathbf{L} \in \mathbb{R}^{L \times L}$ such that $\mathbf{L} A_u$ is upper triangular. Given the existence of such an $\mathbf{L}$, it follows from [21, Appendix A] that, for $p$ large enough, we can always find an appropriate $\bar{\mathbf{L}}$. It also follows that $\bar{\mathbf{L}}$ has a lower unitriangular inverse $\bar{\mathbf{L}}^{(\text{inv})}$ over $\mathbb{Z}_p$.
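The triangularization over Z_p can be sketched with Gaussian elimination without row exchanges. This is a minimal illustration rather than the procedure of [21, Appendix A]: the function name is hypothetical, p is assumed prime, and the routine simply fails if a leading principal submatrix is singular modulo p.

```python
def triangularize_mod_p(A, p):
    """Return (Lbar, U) with Lbar lower unitriangular and U = Lbar*A mod p upper triangular."""
    n = len(A)
    U = [[a % p for a in row] for row in A]
    Lbar = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for j in range(n):
        pivot = U[j][j]
        if pivot == 0:
            raise ValueError("a leading principal submatrix is singular mod p")
        pivot_inv = pow(pivot, p - 2, p)  # Fermat inverse, valid for prime p
        for i in range(j + 1, n):
            f = (U[i][j] * pivot_inv) % p
            if f:
                # Subtract f times row j from row i in both matrices, so the
                # invariant U = Lbar * A (mod p) is preserved at every step.
                U[i] = [(u - f * v) % p for u, v in zip(U[i], U[j])]
                Lbar[i] = [(l - f * m) % p for l, m in zip(Lbar[i], Lbar[j])]
    return Lbar, U
```

Because every row operation adds a multiple of an earlier row to a later row, the accumulated matrix stays lower unitriangular, which is why no row exchanges are permitted in this sketch.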
Using the linear labeling ϕ, we can show that each nested lattice codebook L ℓ is isomorphic to the vector space Z kF,ℓ−kC,ℓ p .Each user will take the p-ary expansion of its message index w u,ℓ to obtain a message vector w u,ℓ ∈ Z kF,ℓ−kC,ℓ p .The intermediate goal of the receiver is to recover L linear combinations of the form where q u,m,ℓ = [a u,m,ℓ ] mod p, a u,m,ℓ is the (m, ℓ) th entry of A u , and wu,ℓ ∈ w u,ℓ with That is, the receiver attempts to recover L linear combinations of cosets of the messages.As discussed in [22], the flexibility to choose e above seems to be necessary in order to permit unequal power allocation across the users via nested lattice codes.We now state the encoding and decoding steps used in the successive integer-forcing architecture.We select an ensemble of good nested lattices Λ C,1 , . . ., Λ C,L , Λ F,1 , . . ., Λ F,L with parameters P u,1 , . . ., P u,L and σ 2 u,SIC,1 , . . ., σ 2 u,SIC,L using Theorem 5.
Encoding: The ℓ th transmitter starts by taking the p-ary expansion of its message index w u,ℓ to obtain the message vector w u,ℓ ∈ Z kF,ℓ−kC,ℓ p . It then uses the inverse linear labeling to map this to a lattice point and dithers it to produce the codeword where the dither vector d u,ℓ is drawn independently and uniformly over V C,ℓ .Thus, by the Crypto Lemma and Theorem 5(b), s u,ℓ is independent of λ u,ℓ and has expected power close to P u,ℓ .Finally, the ℓ th transmitter uses its beamforming vector c u,ℓ to produce its channel input Decoding: The receiver attempts to recover linear combinations of the form (69) one-by-one via successive cancellation and then solve them to obtain estimates of the message vectors.As an intermediate step, the receiver will attempt to decode certain integer-linear combinations of the lattice codewords, i.e., where λu,ℓ λ u,ℓ − Q ΛC,ℓ (λ u,ℓ + d u,ℓ ).The linear labels of these integer-linear combinations correspond to the desired linear combinations, ϕ(µ u,m ) = u u,m .It will also attempt to recover integer-linear combinations of the dithered codewords, i.e., The main obstacle is that, in order to decode the m th integer-linear combination, the receiver must first cancel out the first m − 1 codewords using the prior m − 1 linear combinations.This is accomplished via the algebraic SIC technique from [21].Define where lm,i is the (m, i) th entry of L and ām,ℓ is the (m, ℓ) th entry of Ā defined above.Note that ν u,m ∈ Λ F,m and, given ν 1 , . . ., ν m , we can recover µ u,m : where l(inv) m,i is the (m, i) th entry of L(inv) .For the m th decoding step, we assume that the receiver has already successfully recovered the previous m − 1 integer-linear combinations, i.e., μu,1 = µ u,1 , . . ., μu,m−1 = µ u,m−1 and tu,1 = t u,1 , . . 
., $\hat{t}_{u,m-1} = t_{u,m-1}$. The receiver uses this side information to form the effective channel output The receiver then removes the dithers, nulls out the lattice codewords corresponding to the first $m - 1$ users, and quantizes onto the $m$th fine lattice, where the second step follows from [22, §VI]. It then forms an estimate of its desired linear combination Afterwards, it attempts to recover the integer combination of the dithered codewords, Finally, if all $L$ linear combinations have been recovered correctly, we can solve the linear combinations to recover the original messages.
Theorem 6 ([22, Lemma 13]): Choose a power allocation $P_u = \mathrm{diag}(P_{u,1}, \ldots, P_{u,L})$, beamforming matrix $C_u \in \mathbb{R}^{M \times L}$ satisfying (12), channel matrix $H_u \in \mathbb{R}^{N \times M}$, full-rank integer matrix $A_u \in \mathbb{Z}^{L \times L}$, equalization vectors $b_{u,m} \in \mathbb{R}^N$, and lower unitriangular successive cancellation matrix $R_u = [r_{u,1} \cdots r_{u,L}]^T \in \mathbb{R}^{L \times L}$. Assume, without loss of generality, that the identity permutation is admissible for the uplink according to Definition 2. Then, the following rates are achievable for $m = 1, \ldots, L$.
It is well-known that i.i.d.Gaussian encoding combined with SIC decoding can attain the corner points of the capacity region [5], [6].It turns out that nested lattice encoding combined with successive integer-forcing can attain the same corner points, and can sometimes also attain rate tuples on the interior of the sum-capacity dominant face [19,Theorem 2].See [19,Theorem 2], [22,Theorem 5] for more details.
As noted in Remark 5, integer-forcing without SIC is equivalent to setting $R_u = I$.
Corollary 7: Set $R_u = I$. Choose a power allocation $P_u = \mathrm{diag}(P_{u,1}, \ldots, P_{u,L})$, beamforming matrix $C_u \in \mathbb{R}^{M \times L}$ satisfying (12), channel matrix $H_u \in \mathbb{R}^{N \times M}$, full-rank integer matrix $A_u \in \mathbb{Z}^{L \times L}$, and equalization vectors $b_{u,m} \in \mathbb{R}^N$. Assume, without loss of generality, that the identity permutation is admissible for the uplink according to Definition 2. Then, the following rates are achievable
Remark 6: If we do not wish to reindex the transmitters or linear combinations, the achievable rates can be expressed as follows. Let $\theta$ be a permutation that places the effective noise variances in increasing order. Also, let $\pi$ be a permutation such that the leading principal submatrices of $\Theta_\theta A_u \Theta_\pi$ are full rank, where $\Theta_\theta$ and $\Theta_\pi$ are the permutation matrices corresponding to $\theta$ and $\pi$, respectively. Then, the rates $R_{u,\pi(m)} = \frac{1}{2} \log^+ (P_{u,\pi(m)} / \sigma^2_{u,\text{eff},\theta(m)})$, $m = 1, \ldots, L$ are achievable.
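The rate expression in the remark above can be evaluated with a few lines of code. In the sketch below, indices are zero-based, the logarithm is taken to base 2 (an assumption about units), log+ is taken to mean max(0, log), and the column permutation π is supplied by the caller because it depends on the rank structure of the integer matrix. All function names are hypothetical.

```python
import math


def log_plus(x):
    """log^+(x) = max(0, log2(x)); base 2 is an assumption about the rate units."""
    return max(0.0, math.log2(x))


def noise_order(noise_vars):
    """Permutation theta that lists the effective noise variances in increasing order."""
    return sorted(range(len(noise_vars)), key=lambda m: noise_vars[m])


def integer_forcing_rates(powers, noise_vars, pi, theta):
    """Rates R[pi[m]] = 0.5 * log^+(P[pi[m]] / sigma2[theta[m]]) for m = 0..L-1."""
    rates = [0.0] * len(powers)
    for m in range(len(powers)):
        rates[pi[m]] = 0.5 * log_plus(powers[pi[m]] / noise_vars[theta[m]])
    return rates
```

For example, with powers (4, 16), noise variances (1, 0.25), and π equal to the identity, θ swaps the two noise variances and both users obtain the rate 0.5 · log2(16) = 2.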

VI. DOWNLINK INTEGER-FORCING ARCHITECTURE
The key idea underlying downlink integer-forcing is the fact that the transmitter can pre-invert the linear combinations prior to encoding.This technique, which was first proposed by Hong and Caire [17], [18], allows each receiver to decode any integer-linear combination of the codewords in order to reduce the effective noise but still recover its desired message.These papers focused on the important special case where all users employ the same fine and coarse lattices, and thus have equal powers and must tolerate the worst effective noise across receivers.Below, we generalize their strategy to allow for unequal powers and a unique effective noise variance associated to each receiver.If the transmit antennas operate at different power levels, it is not possible to simply invert the linear combinations at the transmitter.Instead, for each symbol, we will need to apply the inverse of a submatrix that only includes the participating transmitters.We will show that this integer-forcing beamforming strategy can operate within a constant gap of the downlink sum-capacity.
Afterwards, we will introduce a dirty-paper integer-forcing scheme, building on the lattice-based dirty-paper strategy from [20], [29].We will argue that this dirty-paper integer-forcing strategy can achieve the downlink sum-capacity.
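The pre-inversion idea at the start of this section can be illustrated in the equal-power special case, where every codeword participates in every symbol. In the sketch below, Q = [A_d] mod p is assumed to be invertible over Z_p with p prime, the transmitter solves QV = W for the pre-inverted symbols V, and a receiver that decodes the m-th integer-linear combination then sees the m-th row of W directly. The function names are hypothetical, and the unequal-power case treated in this section, which requires submatrix inverses, is not covered.

```python
def pre_invert(Q, W, p):
    """Solve Q V = W over Z_p by Gauss-Jordan elimination (p prime); rows of W are messages."""
    n = len(Q)
    M = [[q % p for q in Q[i]] + [w % p for w in W[i]] for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col]), None)
        if piv is None:
            raise ValueError("Q is not full rank over Z_p")
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)  # Fermat inverse, valid for prime p
        M[col] = [(x * inv) % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def mat_mul_mod(Q, V, p):
    """Linear combinations that the receivers would decode: Q V mod p."""
    return [[sum(q * v for q, v in zip(row, col)) % p for col in zip(*V)] for row in Q]
```

Applying `mat_mul_mod(Q, pre_invert(Q, W, p), p)` returns W, which reflects the fact that each receiver is free to decode whichever integer combination minimizes its effective noise while still ending up with its own message.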

A. Integer-Forcing Beamforming
We begin by choosing a power allocation $P_d = \mathrm{diag}(P_{d,1}, \ldots, P_{d,L})$ for the codewords and a full-rank integer matrix $A_d \in \mathbb{Z}^{L \times L}$. We also select a beamforming matrix $B_d \in \mathbb{R}^{N \times L}$ and equalization vectors $c_{d,m} \in \mathbb{R}^{M_m}$, $m = 1, \ldots, L$. To meet the total power constraint, we need that $\mathrm{Tr}(B_d^T B_d P_d) \leq P_{\text{total}}$. Taken together, these choices specify the effective noise variances $\sigma^2_{d,\text{eff},m}$ from (40). As in the uplink case, the structure of the integer matrix $A_d$ will determine the order in which interference cancellation is possible via dirty-paper precoding. To simplify our notation, we will assume that $A_d$ is selected so that the $m$th user can be associated with the $m$th power $P_{d,m}$. We specify when this is possible below.
The first condition can be satisfied by reindexing the transmit antennas, which corresponds to reordering the powers, and permuting the columns of $A_d$ and $B_d$. The second condition can always be satisfied by reindexing the receivers, which corresponds to reordering the equalization vectors and permuting the rows of $A_d$. To keep our notation manageable, we assume below that the rows and columns of $A_d$ have been permuted so that Definition 3 holds.
We now describe the encoding and decoding steps used in the integer-forcing beamforming architecture.Using the parameters P d,1 , . . ., P d,L and σ 2 d,eff,1 , . . ., σ 2 d,eff,L , we pick a good ensemble of nested lattices Λ C,1 , . . . ,Λ C,L , Λ F,1 , . . . ,Λ F,L via Theorem 5. We will assume that the prime p used in the lattice construction is large enough so that Q ] mod p is full rank over Z p for m = 1, . . ., L. It is always possible to choose such a prime, as argued in [22,Lemmas 3,4].Encoding: Take the p-ary expansion of each message w d,ℓ to obtain the message vector w ℓ ∈ Z kF,ℓ−kC,ℓ p for ℓ = 1, . . ., L. These vectors are then zero-padded to obtain We now proceed to pre-invert the linear combinations symbol-by-symbol.Recall that the notation w[i] refers to the i th entry of the vector w.
In this regime, all codewords have sufficient power to participate, meaning that we can simply apply the inverse, Note that the L th codeword does not have sufficient power to control any other entries.Therefore, we set , apply the inverse linear labeling to obtain a fine lattice point and then generate our dithered codeword where the dither vector d d,L is drawn independently and uniformly over V C,L .This codeword will contribute interference of the form to the remaining signal levels.
In this regime, only the first m codewords have sufficient power to participate.Thus, we cancel out the interference caused by codewords m + 1, . . ., L and apply the inverse of the m th leading principal submatrix, Note that the m th codeword does not have sufficient power to control any other entries.Therefore, we set , apply the inverse linear labeling to obtain a fine lattice point and then generate our dithered codeword where the dither vector d d,m is drawn independently and uniformly over V C,m .This codeword will contribute digital interference of the form to the remaining signal levels.
After all signal levels have been set, we stack the dithered codewords and apply the beamforming matrix to create the channel input
Decoding: The goal of each receiver is to decode its message vector $w_{d,\ell}$. As a first step, it will make an estimate of the following integer-linear combination of the lattice codewords, where $a_{d,m,\ell}$ is the $(m, \ell)$th entry of $A_d$ and $\tilde{\lambda}_{d,\ell} \triangleq \lambda_{d,\ell} - Q_{\Lambda_{C,\ell}}(\lambda_{d,\ell} + d_{d,\ell})$. It forms its estimate by equalizing its observation, removing the dither vectors, quantizing onto the $m$th fine lattice, and taking the modulus with respect to the coarsest lattice, The linear label of this estimate can be viewed as an estimate of the desired message along with zero-padding, for some $\tilde{e}_{d,m} \in \mathbb{Z}_p^{k_{C,m} - k_C}$. As we will argue below, if $\hat{\mu}_{d,m} = \mu_{d,m}$, then $\hat{w}_{d,m} = w_{d,m}$.
Theorem 8: Choose a power allocation $P_d = \mathrm{diag}(P_{d,1}, \ldots, P_{d,L})$, beamforming matrix $B_d \in \mathbb{R}^{N \times L}$, channel matrices $H_{d,m} \in \mathbb{R}^{M_m \times N}$, full-rank integer matrix $A_d \in \mathbb{Z}^{L \times L}$, and equalization vectors $c_{d,m} \in \mathbb{R}^{M_m}$. Assume, without loss of generality, that the identity permutation is admissible for the downlink according to Definition 3. Then, the following rates are achievable for $m = 1, \ldots, L$.
Proof: By the Crypto Lemma, each dithered codeword $s_{d,\ell}$ is uniformly distributed over $\mathcal{V}_{C,\ell}$ and independent of the other dithered codewords. Thus, by Theorem 5(b), we have that $\frac{1}{n} \mathbb{E}\|s_{d,\ell}\|^2 \leq P_{d,\ell}$, which guarantees that the power constraint is met. At the receiver side, we need to argue that $\hat{\mu}_{d,m} = \mu_{d,m}$ with high probability and, if so, $\hat{w}_{d,m} = w_{d,m}$. We begin by examining the linear labeling of $\mu_{d,m}$, Now, we examine the $i$th symbol of this linear label for k = and, using (60), From Theorem 5(c), we know that, since $\mu_{d,m} \in \Lambda_{F,m}$, the quantization step can tolerate noise with effective variance $\sigma^2_{d,\text{eff},m}$, which implies that $\mathbb{P}(\hat{\mu}_{d,m} \neq \mu_{d,m}) < \epsilon$. From Theorem 5(d), we know that the rate satisfies Finally, following the steps in [22, Appendix H], we can show that good fixed dither vectors exist.
Remark 7: If we do not wish to reindex the transmit antennas or receivers, the achievable rates can be expressed as follows.Let θ be a permutation that places the codeword powers in decreasing order.Also, let π be a permutation such that the leading principal submatrices of Θ π A d Θ θ are full rank where Θ π and Θ θ are the permutation matrices corresponding to π and θ, respectively.Then, the rates R d,π(m) = 1 2 log + (P d,θ(m) /σ 2 d,eff,π(m) ), m = 1, . . ., L are achievable.
For the uplink channel, it is known that integer-forcing without SIC can operate within a constant gap of the sum-capacity [21, Theorem 3], [22, Theorem 4]. The theorem below establishes the dual result for the downlink channel.
Theorem 9: For any channel matrix $H_d \in \mathbb{R}^{M \times N}$ and total power constraint $P_{\text{total}}$, there is a choice of the power allocation $P_d$, integer matrix $A_d$, beamforming matrix $B_d$, and equalization vectors $c_{d,m}$ such that the integer-forcing beamforming architecture can operate within a constant gap of the downlink sum-capacity, where $K \succeq 0$ means that the matrix $K$ must be positive semidefinite.
The proof makes use of uplink-downlink duality and is deferred to Appendix A.

B. Dirty-Paper Integer-Forcing
The integer-forcing transmitter described above carefully cancels out interference between receivers in the digital domain.Here, we argue that the performance can be further enhanced via dirty-paper coding in the analog domain.
Prior work [20], [29] demonstrated that nested lattice codes are an ideal building block for dirty-paper strategies, and it serves as the inspiration for the scheme proposed below.
As before, we select a power allocation P d = diag(P d,1 , . . ., P d,L ) and a full-rank integer-matrix A d ∈ Z L×L .We also choose a beamforming matrix B d ∈ R N ×L that meets the power constraint, Tr(B T d B d P d ) ≤ P total , as well as equalization vectors c d,m ∈ R Mm , m = 1, . . . ,L. Finally, we choose an upper unitriangular matrix R d ∈ R L×L that specifies the coefficients used in the dirty-paper cancellation process.These choices determine the effective noise variances σ 2 d,DPC,m from (53).To streamline our notation, we assume below that the transmit antennas and receivers have been reindexed so that Definition 3 is satisfied.

Encoding:
The encoding steps are identical from (90) to (94) for the initialization step.We also define the L th dirty-paper codeword as For the rest of the signal levels, we proceed by induction for m = 1, . . ., L − 1, assuming that v d,ℓ , λ d,ℓ , s d,ℓ , s DPC,ℓ , e d,ℓ have been set for ℓ = m + 1, . . ., L.
The induction steps are the same as before from (95) to (97).We then map the dithered codeword to a dirty-paper codeword, This dirty-paper codeword will contribute digital interference of the form to the remaining signal levels.After all signal levels have been set, we stack the dirty-paper codewords and apply the beamforming matrix to create the channel input Decoding: The decoding steps at each receiver are identical to those in (101) to (104) except that we define the lattice points in the integer-linear combination (101) by We now establish the achievable rates for dirty-paper integer-forcing.
Theorem 10: Choose a power allocation P d = diag(P d,1 , . . ., P d,L ), upper unitriangular dirty-paper matrix R d ∈ R L×L , beamforming matrix B d ∈ R N ×L , channel matrices H d,m ∈ R Mm×N , full-rank integer matrix A d ∈ Z L×L , and equalization vectors c d,m ∈ R Mm .Assume, without loss of generality, that the identity permutation is admissible for the downlink according to Definition 3.Then, the following rates are achievable for m = 1, . . ., L.
As part of the proof, we will need the following lemma.
where $z_{d,\text{DPC},m}$ is defined in (53).
For the uplink channel, it is known that successive integer-forcing can attain the sum-capacity [19, Theorem 2], [22, Theorem 5]. The theorem below establishes the dual result for the downlink channel.
Theorem 11: For any channel matrix $H_d \in \mathbb{R}^{M \times N}$ and total power constraint $P_{\text{total}}$, there is a choice of the power allocation $P_d$, dirty-paper matrix $R_d$, beamforming matrix $B_d$, integer matrix $A_d$, and equalization vectors $c_{d,m}$ such that the dirty-paper integer-forcing architecture can attain the downlink sum-capacity, where $K \succeq 0$ means that the matrix $K$ must be positive semidefinite.
The proof makes use of uplink-downlink duality and is deferred to Appendix B.

VII. UPLINK-DOWNLINK DUALITY
As discussed in Section III-A, the uplink and downlink capacity regions are duals of one another [9]- [12].Furthermore, for conventional linear architectures, we can achieve the same rate tuple on dual uplink and downlink channels by exchanging the roles of the beamforming and equalization matrices (and transposing them) [10].(See Theorems 1 and 2 above for a precise statement in our notation.)For integer-forcing architectures, we can establish a similar form of uplink-downlink duality, but only for the sum rate.Let denote the ℓ th effective SINR for the uplink and let denote the ℓ th effective SINR for the downlink.Our uplink-downlink duality results stem from showing that if the effective SINRs β u,ℓ can be established on the uplink, then the effective SINRs β d,ℓ = β u,ℓ can be established on the downlink, and vice versa.Unfortunately, this does not immediately translate to duality of the achievable rate tuples, since the rates R SIC u,ℓ = 1 2 log(β u,ℓ ) and R DPC d,ℓ = 1 2 log(β d,ℓ ) are only achievable within our integer-forcing framework if the identity permutation is admissible for both the uplink and downlink.
Theorems 3 and 4 establish uplink-downlink duality by exchanging the roles of the pre- and post-processing matrices at the transmitters and receivers. As noted in Remark 3, for this form of duality, the admissibility of the identity permutation is not preserved in general, even with the freedom to reindex the transmitters and receivers. However, we can always find permutations $\pi_u$ and $\pi_d$ such that the rates $R^{\text{SIC}}_{u,\ell} = \frac{1}{2} \log(P_{u,\ell} / \sigma^2_{u,\text{SIC},\pi_u(\ell)})$ and $R^{\text{DPC}}_{d,\ell} = \frac{1}{2} \log(P_{d,\pi_d(\ell)} / \sigma^2_{d,\text{DPC},\ell})$ are achievable via integer-forcing. Therefore, the duality of the effective SINRs allows us to establish sum-rate duality, $\sum_\ell R^{\text{DPC}}_{d,\ell} = \sum_\ell R^{\text{SIC}}_{u,\ell}$.
We now recall the following basic results for non-negative matrices. A vector or a matrix is non-negative (i.e., $F \geq 0$) if all its entries are non-negative. A vector or a matrix is positive (i.e., $F > 0$) if all its entries are positive. A square matrix $F$ is a Z-matrix if all its off-diagonal elements are non-positive. An M-matrix is a Z-matrix with eigenvalues whose real parts are positive. The following lemma is a special case of [49, Theorem 1].
Lemma 3 ( [49, Theorem 1]): Let F be a square Z-matrix.The following statements are equivalent: (a) F is a non-singular M-matrix.(b) F has a non-negative inverse.That is, F −1 exists and F −1 ≥ 0. (c) There exists x ≥ 0 satisfying Fx > 0. (d) Every real eigenvalue of F is positive.
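To see the equivalent conditions of Lemma 3 at work, the sketch below builds a small Z-matrix of the form I − diag(β)M for a non-negative matrix M and checks condition (b) by computing the exact inverse. The numerical values, the 2 × 2 size, and the helper names are illustrative assumptions and do not correspond to any matrix defined in this paper.

```python
from fractions import Fraction


def is_z_matrix(F):
    """True if every off-diagonal entry is non-positive."""
    n = len(F)
    return all(F[i][j] <= 0 for i in range(n) for j in range(n) if i != j)


def inverse(F):
    """Exact Gauss-Jordan inverse over the rationals; returns None if F is singular."""
    n = len(F)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(F)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def has_nonnegative_inverse(F):
    """Condition (b): the inverse exists and all of its entries are non-negative."""
    inv = inverse(F)
    return inv is not None and all(v >= 0 for row in inv for v in row)


def sinr_matrix(beta, M):
    """Build I - diag(beta) * M, a Z-matrix whenever M and beta are non-negative."""
    n = len(beta)
    return [[Fraction(int(i == j)) - Fraction(beta[i]) * Fraction(M[i][j])
             for j in range(n)] for i in range(n)]
```

With M having off-diagonal entries 1/4 and β = (1, 2), the vector x = (1, 1) gives Fx = (3/4, 1/2) > 0, so condition (c) holds and the inverse is non-negative. Raising the targets to β = (8, 8) makes the inverse negative, which corresponds to an infeasible set of SINR targets.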
The lemma below establishes uplink-downlink duality for the effective SINRs.Lemma 4: Select a (diagonal) power matrix P u , beamforming matrix C u satisfying (12) and Tr(C T u C u P u ) = P total , channel matrix H u , full-rank integer matrix A u , equalization matrix B u , and lower unitriangular successive cancellation matrix R u .Let β u,ℓ , ℓ = 1, . . ., L denote the effective uplink SINRs and assume that β u,ℓ > 0, ℓ = 1, . . ., L. Then, for dirty-paper matrix Proof: Our proof is inspired by the approach of [10].We begin by defining vector notation for the powers and effective SINRs, Let cu,ℓ denote the ℓ th column of C u (with zero-padding included) and āu,ℓ denote the ℓ th column of ]).It can be verified that the relations can be equivalently expressed as We now repeat this process for the downlink.Let bd,ℓ denote the ℓ th column of B d and rd,ℓ denote the ]).It can be verified that the relations can be equivalently expressed as By assumption, we have that Furthermore, since M u and M d are non-negative, we have that (I − diag(β u )M u ) and (I−diag(β d )M d ) are Z-matrices.Since, by assumption, β u > 0, we have that J u β u > 0 and thus (I−diag(β u )M u ) satisfies condition (c) of Lemma 3.This implies that every real eigenvalue of (I − diag(β u )M u ) is positive.Setting which implies that all real eigenvalues of (I − diag(β d )M d ) are also positive, satisfying condition (d) of Lemma 3.This implies that the inverse (I − diag(β d )M d ) −1 exists and is non-negative.Combining this with the fact that G d β d ≥ 0, we know that there exists a non-negative power vector that attains the desired effective downlink SINRs.It remains to show that the total downlink power consumption is equal to the total uplink power consumption.
The total downlink power consumption can be written as We now demonstrate these quantities are equal: where (a) uses the fact that $G_d = G_u$ since $C_d^T = C_u$.
Proof of Theorems 3 and 4: Theorem 3 is a special case of Theorem 4 with $R_d = R_u^T = I$, so it suffices to establish Theorem 4. Recall that we have assumed that the identity permutation is admissible for the uplink, so the rates $R^{\text{SIC}}_{u,\ell} = \frac{1}{2} \log(\beta_{u,\ell})$ are achievable. Assume, without loss of generality, that all uplink powers $P_{u,\ell}$ are positive, which implies that all effective uplink SINRs $\beta_{u,\ell}$ are positive as well. From Lemma 4, we know that we can establish effective downlink SINRs $\beta_{d,\ell} = \beta_{u,\ell}$ with the same total power consumption. Finally, it follows from (137) that a sum rate satisfying $\sum_\ell R^{\text{DPC}}_{d,\ell} = \sum_\ell R^{\text{SIC}}_{u,\ell}$ is achievable on the downlink.
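The sum-rate step can be spelled out as follows. This worked derivation assumes, as the rate expressions in this section suggest, that the effective SINRs are the ratios $\beta_{u,\ell} = P_{u,\ell}/\sigma^2_{u,\text{SIC},\ell}$ and $\beta_{d,\ell} = P_{d,\ell}/\sigma^2_{d,\text{DPC},\ell}$, and that every ratio inside a logarithm is at least one so that no rate is clipped at zero. Since a permutation only reorders the terms of a sum, pairing the powers with permuted noise variances leaves the sum rate unchanged:

```latex
\begin{align}
\sum_{\ell=1}^{L} \frac{1}{2}\log\!\left(\frac{P_{u,\ell}}{\sigma^2_{u,\mathrm{SIC},\pi_u(\ell)}}\right)
  &= \frac{1}{2}\sum_{\ell=1}^{L}\log P_{u,\ell}
     - \frac{1}{2}\sum_{\ell=1}^{L}\log \sigma^2_{u,\mathrm{SIC},\pi_u(\ell)} \\
  &= \frac{1}{2}\sum_{\ell=1}^{L}\log P_{u,\ell}
     - \frac{1}{2}\sum_{\ell=1}^{L}\log \sigma^2_{u,\mathrm{SIC},\ell}
   = \sum_{\ell=1}^{L} \frac{1}{2}\log \beta_{u,\ell} \\
  &= \sum_{\ell=1}^{L} \frac{1}{2}\log \beta_{d,\ell}
   = \sum_{\ell=1}^{L} \frac{1}{2}\log\!\left(\frac{P_{d,\pi_d(\ell)}}{\sigma^2_{d,\mathrm{DPC},\ell}}\right).
\end{align}
```

The first two lines use only the uplink definitions, the step $\beta_{u,\ell} = \beta_{d,\ell}$ is the SINR duality of Lemma 4, and the last equality repeats the permutation argument on the downlink side.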

VIII. ITERATIVE OPTIMIZATION VIA DUALITY
In this section, we present an iterative optimization algorithm for the non-convex problem of optimizing the beamforming, equalization, and successive cancellation (or dirty-paper) matrices in order to maximize the sum rate.Our algorithm exploits integer-forcing duality to converge to a local optimum.We also explore algorithms for optimizing the integer matrix.We first present our algorithm for the uplink channel, and afterwards state the modifications needed to use it on a downlink channel.

A. Uplink Optimization
For a given uplink channel matrix $H_u$ and total power constraint $P_{\text{total}}$, our task is to maximize the sum rate by selecting the (diagonal) power allocation matrix $P_u$, beamforming matrix $C_u$ satisfying (12), full-rank integer matrix $A_u$, equalization matrix $B_u$, and (lower unitriangular) successive cancellation matrix $R_u$. This corresponds to the following optimization problem:

IX. NUMERICAL RESULTS
We now provide simulation results for our integer-forcing architecture and compare its performance to that of zero-forcing as well as capacity bounds.Owing to uplink-downlink duality, we can simultaneously plot the sum rate for the uplink and downlink channel.Consider a basestation with N = 4 antennas and L = 4 single-antenna users, M ℓ = 1, ℓ = 1, . . ., 4. For simplicity, we will state our notation in terms of the uplink channel.We draw the channel matrix H u elementwise i.i.d.N (0, 1).In Figure 8, we have plotted the average MIMO MAC sum capacity from (9) (corresponding to the average MIMO BC sum capacity from (10)) with respect to SNR using the dual decomposition approach from [53] to find the waterfilling solution.We have also plotted the average sum rate for uplink integer-forcing from Corollary 7 (corresponding to the average sum rate for downlink integer-forcing from Theorem 8).The integer matrix A u is chosen using the LLL algorithm to approximate the successive minima of the lattice G T Z L where G is defined in (155) with the initial choice of C u = I.Afterwards, we iteratively optimize B u , C u , and P u using Algorithm 1 while holding R u = I fixed.
Figure 8 also plots the average sum rate for uplink integer-forcing with SIC from Theorem 6 (corresponding to the average sum rate for downlink integer-forcing DPC from Theorem 10). We select the integer matrix $A_u$ as above. The matrices $B_u$, $C_u$, $P_u$, and $R_u$ are optimized via Algorithm 1. For comparison, we have plotted the average sum rate of uplink zero-forcing from (17) (corresponding to the average sum rate of downlink zero-forcing from (22)). The matrices $B_u$, $C_u$, and $P_u$ are iteratively optimized using Algorithm 1 while holding $A_u = I$ and $R_u = I$ fixed. We have also plotted the average sum rate of uplink zero-forcing with SIC from (27) (corresponding to the average sum rate of downlink zero-forcing with DPC from (28)). For each trial, we select the best among all possible decoding orders in terms of sum rate. The beamforming, equalization, and power matrices are iteratively optimized as above.
Overall, we find that integer-forcing with and without SIC and zero-forcing with SIC operate very close to the capacity bound.With this in mind, one potential advantage of integer-forcing without SIC is that it does not require analog successive cancellation in between decoding steps, which can be difficult to implement in practice.Note that while integer-forcing with SIC and zero-forcing with SIC can reach the sum capacity, this requires optimal power allocation, which is not guaranteed by Algorithm 1.
Finally, in Figure 9, we have plotted the peak-to-average power ratio (PAPR) for all of the strategies described above in the context of the downlink.Specifically, we plot the highest power across the transmit antennas divided by the average power.Note that integer-forcing performs similarly to zero-forcing with DPC and that integer-forcing with DPC has a lower PAPR than zero-forcing with DPC.
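The PAPR metric described above, the highest per-antenna power divided by the average per-antenna power, can be computed as in the following sketch. It assumes that the codewords are independent with powers given by the diagonal of the power matrix, so that the power at antenna i is the weighted sum of the squared entries in row i of the beamforming matrix. The function names are hypothetical, and the ratio is returned on a linear scale rather than in dB.

```python
def antenna_powers(B, P):
    """Per-antenna transmit power: sum over codewords of B[i][l]**2 * P[l]."""
    return [sum(b * b * p for b, p in zip(row, P)) for row in B]


def papr(powers):
    """Highest power across the antennas divided by the average power."""
    average = sum(powers) / len(powers)
    return max(powers) / average
```

Under these assumptions the per-antenna powers sum to the trace expression used for the total power constraint, so a PAPR of one corresponds to the total power being spread evenly across the transmit antennas.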

X. CONCLUSION
In this paper, we established an uplink-downlink duality relationship for integer-forcing.In the process, we extended prior work on downlink integer-forcing to allow for unequal powers, unequal rates, and dirty-paper coding.Using the duality relationship, we developed an iterative algorithm for the non-convex problem of optimizing the beamforming and equalization matrices.We also demonstrated that downlink integer-forcing can operate within a constant gap of the MIMO BC sum capacity without the use of DPC.
An interesting direction for future work is utilizing uplink-downlink duality to optimize integer-forcing architectures for more complicated Gaussian networks. For instance, recent work [24] has utilized uplink-downlink duality as a building block for optimizing the beamforming and equalization matrices used in integer-forcing interference alignment [54].

APPENDIX A PROOF OF THEOREM 9
It is well-known [9]-[11] that the sum capacity of the MIMO BC is equal to that of the dual MIMO MAC,

APPENDIX B PROOF OF THEOREM 11
It is well-known [9]-[11] that the sum capacity of the MIMO BC is equal to that of the dual MIMO MAC,

… and B. Nazer were supported by NSF grants CCF-1253918 and CCF-1302600. The work of S. Shamai has been supported by the Israel Science Foundation (ISF), by the European Commission in the Framework of the FP7 Network of Excellence in Wireless COMmunications (NEWCOM#), and by the S. and N. Grand Research Fund. This work was presented at the 2014 Communication Theory Workshop, the 2014 IEEE International Symposium on Information Theory, the 15th IEEE International Symposium on Signal Processing Advances in Wireless Communications in 2014, and the 53rd Allerton Conference on Communications, Control, and Computing in 2015.

Fig. 3. Block diagram of the integer-forcing uplink architecture. Each message vector $w_{u,\ell}$ is encoded into a dithered lattice codeword $s_{u,\ell}$ and mapped to a channel input $X_{u,\ell} = c_{u,\ell} s_{u,\ell}^T$. For $m = 1, \ldots, L$, the receiver uses an equalized channel output $\tilde{y}_{u,m} = b_{u,m}^T Y_u$ to make an estimate $\hat{u}_{u,m}$ of the linear combination $u_{u,m}$. The SISO decoders are potentially enhanced by successive cancellation (illustrated with green arrows). Finally, the linear combinations are inverted to recover estimates $\hat{w}_{u,\ell}$ of the message vectors.

Fig. 4. Block diagram of the effective channel induced by the integer-forcing uplink architecture. The $m$th decoder observes an integer-linear combination of the codewords plus effective noise, $\sum_{\ell} a_{u,m,\ell} s_{u,\ell} + z_{u,\mathrm{eff},m}$, from which it makes an estimate of the linear combination $u_{u,m}$ with coefficients $q_{u,m,\ell} = [a_{u,m,\ell}] \bmod p$. (If the decoders use successive interference cancellation, then $z_{u,\mathrm{eff},m}$ is replaced with $z_{u,\mathrm{SIC},m}$.) Finally, it applies the inverse of the matrix $Q_u = \{q_{u,m,\ell}\}$ over $\mathbb{Z}_p$ to estimate the message.

Fig. 5. Block diagram of the integer-forcing downlink architecture. The encoder applies the inverse of $Q_d = [A_d] \bmod p$ over $\mathbb{Z}_p$ to the message vectors $w_{d,1}, \ldots, w_{d,L}$ and then maps the results to dithered lattice codewords $s_{d,1}, \ldots, s_{d,L}$. The SISO encoders are possibly enhanced with dirty-paper coding (illustrated by green arrows). The channel input is formed by beamforming these codewords, $X_d = \sum_{\ell} b_{d,\ell} s_{d,\ell}^T$. The $m$th decoder uses an equalized channel output $\tilde{y}_{d,m} = c_{d,m}^T Y_{d,m}$ to make an estimate of an integer-linear combination of the lattice codewords, which, due to the inverse operation, corresponds to an estimate of its desired message.
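The inverse of $Q_d = [A_d] \bmod p$ over $\mathbb{Z}_p$ mentioned in the caption above can be illustrated with a small sketch. The code below is not the paper's implementation; it is a generic Gauss-Jordan elimination over a prime field, and the function name, the example matrix, and the choice $p = 5$ are hypothetical. It assumes that $p$ is prime and that the matrix is invertible modulo $p$.

```python
def inverse_mod_p(A, p):
    """Invert the square integer matrix A over Z_p (p prime) by Gauss-Jordan."""
    n = len(A)
    # Build the augmented matrix [A mod p | I].
    M = [[a % p for a in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible over Z_p")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse, Python 3.8+
        M[col] = [(x * inv) % p for x in M[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(x - factor * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


# Illustrative integer matrix and prime, chosen only for this example.
A_d = [[2, 1],
       [1, 1]]
p = 5
Q_inv = inverse_mod_p(A_d, p)
print(Q_inv)  # [[1, 4], [4, 2]]
```

Multiplying `A_d` by `Q_inv` and reducing modulo 5 gives the identity matrix, which is the property the encoder relies on when it pre-inverts the integer-linear combinations.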

Fig. 6. Block diagram of the effective channel induced by the integer-forcing downlink architecture. The $m$th decoder observes an integer-linear combination of the codewords plus effective noise, $\sum_{\ell} a_{d,m,\ell} s_{d,\ell} + z_{d,\mathrm{eff},m}$. (If the encoders use dirty-paper coding, then $z_{d,\mathrm{eff},m}$ is replaced with $z_{d,\mathrm{DPC},m}$.) Since the encoder applied the inverse of $Q_d = [A_d] \bmod p$ over $\mathbb{Z}_p$ to the message vectors prior to mapping them to lattice codewords, the $m$th integer-linear combination corresponds to the $m$th message.
there exists a unique (diagonal) power matrix $P_d$ with total power usage $\mathrm{Tr}(B_d^T B_d P_d) = P_{\mathrm{total}}$, such that the same sum rate $\sum_{\ell} R^{\mathrm{DPC}}_{d,\ell} = \sum_{\ell} R^{\mathrm{SIC}}_{u,\ell}$ is achievable via dirty-paper integer-forcing using equalization matrix $C_d = C_u^T$, precoding matrix $B_d = B_u^T$, and dirty-paper matrix $R_d = R_u^T$. The same relationship can be established starting from an achievable rate tuple for the downlink and going to the uplink. See Section VII for the proof.

Fig. 7. Illustration of the linear labeling of the $\ell$th nested lattice codebook. The first $k_{C,\ell} - k_C$ elements of the linear label are "don't care" entries (denoted by the $*$ symbol) and correspond to the mod $\Lambda_{C,\ell}$ operation. The next $k_{F,\ell} - k_{C,\ell}$ elements are free to carry information symbols (denoted by blue circles). The last $k_F - k_{F,\ell}$ elements are zero (denoted by black circles).

Definition 3:
We say that the identity permutation is admissible for the downlink if (a) the powers are in decreasing order, $P_{d,1} \geq \cdots \geq P_{d,L}$, and (b) the leading principal submatrices of $A_d$ are full rank, $\mathrm{rank}(A_d^{[1:m]}) = m$ for $m = 1, \ldots, L$.
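The two conditions of Definition 3 can be checked mechanically. The sketch below is only an illustration under stated assumptions: $A_d^{[1:m]}$ is interpreted as the top-left $m \times m$ block of $A_d$, ranks are computed exactly with rational Gaussian elimination, and the function names and example inputs are hypothetical.

```python
from fractions import Fraction


def rank(M):
    """Rank of a rational matrix via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def identity_permutation_admissible(powers, A):
    """Check conditions (a) and (b) of Definition 3."""
    # (a) powers in decreasing (non-increasing) order.
    decreasing = all(powers[i] >= powers[i + 1] for i in range(len(powers) - 1))
    # (b) every leading principal submatrix (top-left m x m block) has rank m.
    full_rank = all(
        rank([row[:m] for row in A[:m]]) == m for m in range(1, len(A) + 1)
    )
    return decreasing and full_rank


# Illustrative inputs only.
A_ok = [[1, 1, 0],
        [0, 1, 1],
        [1, 0, 1]]
A_bad = [[0, 1],
         [1, 0]]  # full rank overall, but the 1 x 1 leading block is zero
print(identity_permutation_admissible([3, 2, 1], A_ok))   # True
print(identity_permutation_admissible([2, 1], A_bad))     # False
```

The second example shows that an invertible integer matrix can still fail condition (b), which is why the definition is phrased in terms of leading principal submatrices rather than the rank of the whole matrix.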

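Lemma 2:
Let $r_{d,m}^T$ be the $m$th row of the upper unitriangular matrix $R_d$ used in the dirty-paper encoding process. We have that $r_{d,m}^T S_{\mathrm{DPC}} = \tilde{\lambda}_{d,m}^T + d_{d,m}^T$.

$$
\begin{aligned}
r_{d,m}^T S_{\mathrm{DPC}}
&\overset{(a)}{=} \Big[\lambda_{d,m}^T + d_{d,m}^T - \sum_{\ell=m+1}^{L} r_{d,m,\ell}\, s_{\mathrm{DPC},\ell}^T\Big] \bmod \Lambda_{C,m} + \sum_{\ell=m+1}^{L} r_{d,m,\ell}\, s_{\mathrm{DPC},\ell}^T \\
&= \lambda_{d,m}^T + d_{d,m}^T - \sum_{\ell=m+1}^{L} r_{d,m,\ell}\, s_{\mathrm{DPC},\ell}^T - Q_{\Lambda_{C,m}}\Big(\lambda_{d,m}^T + d_{d,m}^T - \sum_{\ell=m+1}^{L} r_{d,m,\ell}\, s_{\mathrm{DPC},\ell}^T\Big) + \sum_{\ell=m+1}^{L} r_{d,m,\ell}\, s_{\mathrm{DPC},\ell}^T \\
&\overset{(b)}{=} \tilde{\lambda}_{d,m}^T + d_{d,m}^T
\end{aligned}
$$

where step (a) uses the fact that $R_d$ is upper unitriangular and step (b) uses (127).

Proof of Theorem 10: We can show that the expected power constraint holds using the argument from the beginning of the proof of Theorem 8. We can also follow the steps in the proof of Theorem 8 to establish that $u_{d,m}[i] = w_{d,m}[i]$ for $k_{C,m} - k_C + 1 \leq i \leq k \cdots$ and that $\mu_m \in \Lambda_{F,m}$. It remains to show that $\hat{\mu}_{d,m} = \mu_{d,m}$ with probability at least $1 - \epsilon$. First, we can rewrite the $m$th effective channel output as

$$
\tilde{y}_{d,m}^T = a_{d,m}^T R_d S_{\mathrm{DPC}} + z_{d,\mathrm{DPC},m}^T = \sum_{\ell} a_{d,m,\ell}\big(\tilde{\lambda}_{d,\ell}^T + d_{d,\ell}^T\big) + z_{d,\mathrm{DPC},m}^T
$$

… and equalization matrix $C_d = C_u^T$, there exists a unique (diagonal) power matrix $P_d$ with total power usage $\mathrm{Tr}(B_d^T B_d P_d) = P_{\mathrm{total}}$, that yields effective downlink SINRs $\beta_{d,\ell} = \beta_{u,\ell}$, $\ell = 1, \ldots, L$. The same relationship can be established starting from effective SINRs for the downlink and going to the uplink.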
]) and $J_d$ as the $L \times L$ matrix with $(m, \ell)$th entry $[J_d]_{m,\ell} = b_{d,m,\ell}^2$, where $b_{d,m,\ell}$ is the $(m, \ell)$th entry of $B_d$. The total uplink power consumption can be written as

$$\max_{\substack{A_u, B_u, C_u, P_u, R_u \\ \mathrm{Tr}(C_u^T C_u P_u) \leq P_{\mathrm{total}}}} \cdots$$

Algorithm 2 Iterative Downlink Optimization via Duality
Given $H_d$ and $P_{\mathrm{total}}$.
Set initial parameters $A_d$, $B_d$, $C_d$, $P_d$, and $R_d$.
Calculate initial downlink SINRs $\beta_d$.
while $\beta_d$ not converged do
    Optimize $C_d$ using (154).
    Create virtual uplink channel with $A_u = A_d^T$, $B_u = B_d^T$, $C_u = C_d^T$, and $R_u = R_d^T$.
    Solve for $\rho_u$ using (140) and set $P_u = \mathrm{diag}(\rho_u)$.
    Optimize $B_u$ using (149) and $R_u$ using (148).
    for $m = 2$ to $L$ do
        if $\sigma^2_{u,\mathrm{SIC},m} < \sigma^2_{u,\mathrm{SIC},m-1}$ then
            Set $r_{u,m}$ using (150).
        end if
    end for
    Update $B_d = B_u^T$ and $R_d = R_u^T$.
    Solve for $\rho_d$ using (143) and set $P_d = \mathrm{diag}(\rho_d)$.
    Update $\beta_d$.
end while
Output $A_d$, $B_d$, $C_d$, $R_d$, $P_d$, and $\beta_d$.

Fig. 8. Average sum rate for integer-forcing and zero-forcing architectures on either an uplink or downlink channel with N = 4 basestation antennas and L = 4 single-antenna users.

Fig. 9. Peak-to-average power ratio for integer-forcing and zero-forcing architectures for a downlink channel with N = 4 basestation antennas and L = 4 single-antenna users.
$$\cdots\; I + H_u K H_u^T \qquad (157)$$

where $H_u = H_d^T$. Select a covariance matrix $K_{\mathrm{opt}}$ that attains the MIMO MAC sum capacity. Next, select a power allocation $P_u$ and a beamforming matrix $C_u$ satisfying $C_u P_u C_u^T = K_{\mathrm{opt}}$. From [22, Theorem 4], there exists an integer matrix $A_u$ such that integer-forcing via Corollary 7 attains the sum rate

$$\sum_{\ell=1}^{L} R_{u,\ell} = \frac{1}{2} \log\det\left(I + H_u C_u P_u C_u^T H_u^T\right) \cdots$$

… equalization matrix $B_u$ from (149). From Theorem 3, we can attain the same sum rate on the downlink by using $A_d = A_u^T$, $B_d = B_u^T$, and $C_d = C_u^T$ as well as solving for the downlink power vector $\rho_d$ using (143) and setting $P_d = \mathrm{diag}(\rho_d)$.
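The proof only requires some pair $(C_u, P_u)$ with $C_u P_u C_u^T = K_{\mathrm{opt}}$ and does not say how to pick it. As one possible illustration, and not necessarily the choice the authors have in mind, the sketch below uses an $LDL^T$ factorization, which yields a unit lower triangular $C_u$ and a positive diagonal $P_u$. It assumes that the covariance matrix is symmetric positive definite; the function names and the example matrix are hypothetical, and the example matrix is not an actual capacity-achieving covariance.

```python
def ldl_factor(K):
    """Factor a symmetric positive definite K as C * diag(P) * C^T.

    Returns (C, P), where C is unit lower triangular and P is the list
    of positive diagonal entries.
    """
    n = len(K)
    C = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [0.0] * n
    for j in range(n):
        P[j] = K[j][j] - sum(C[j][k] ** 2 * P[k] for k in range(j))
        if P[j] <= 0:
            raise ValueError("K is not positive definite")
        for i in range(j + 1, n):
            partial = sum(C[i][k] * C[j][k] * P[k] for k in range(j))
            C[i][j] = (K[i][j] - partial) / P[j]
    return C, P


def reconstruct(C, P):
    """Compute C * diag(P) * C^T to check the factorization."""
    n = len(P)
    return [[sum(C[i][k] * P[k] * C[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Illustrative covariance matrix only.
K_opt = [[4.0, 2.0],
         [2.0, 3.0]]
C_u, P_u = ldl_factor(K_opt)
print(C_u)  # [[1.0, 0.0], [0.5, 1.0]]
print(P_u)  # [4.0, 2.0]
```

By the cyclic property of the trace, $\mathrm{Tr}(C_u^T C_u P_u) = \mathrm{Tr}(K_{\mathrm{opt}})$, so any such factorization uses the same total power as the covariance matrix it reproduces. An eigendecomposition would serve equally well.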
$$\cdots\; I + H_u K H_u^T \qquad (160)$$

where $H_u = H_d^T$. Select a covariance matrix $K_{\mathrm{opt}}$ that attains the MIMO MAC sum capacity. Next, select a power allocation $P_u$ and a beamforming matrix $C_u$ satisfying $C_u P_u C_u^T = K_{\mathrm{opt}}$. From [22, Theorem 5], there exists an integer matrix $A_u$ such that integer-forcing with SIC via Theorem 6 attains the sum rate

$$\sum_{\ell=1}^{L} R^{\mathrm{SIC}}_{u,\ell} = \frac{1}{2} \log\det\left(I + H_u C_u P_u C_u^T H_u^T\right) = \frac{1}{2} \log\det\left(I + H_u K_{\mathrm{opt}} H_u^T\right) \qquad (162)$$

using the optimal successive cancellation and equalization matrices $R_u$ and $B_u$ from (148) and (149), respectively. From Theorem 4, we can attain the same sum rate on the downlink by using $A_d = A_u^T$, $B_d = B_u^T$, $C_d = C_u^T$, and $R_d = R_u^T$ as well as solving for the downlink power vector $\rho_d$ using (143) and setting $P_d = \mathrm{diag}(\rho_d)$.
… $M_m \times N$ is the channel matrix from the transmitter to the $m$th receiver and the noise $Z_{d,m} \in \mathbb{R}^{M_m \times n}$ is elementwise i.i.d. Gaussian with mean zero and variance one. The receiver passes its channel output through a decoder $D_{d,m} : \mathbb{R}^{M_m \times n} \to \{1, 2, \ldots, 2^{nR_{d,m}}\}$ in order to get an estimate $\hat{w}_{d,m} = D_{d,m}(Y_{d,m})$ of its desired message. Overall, we say that the downlink rates $R_{d,1}, \ldots, R_{d,L}$ are achievable if, for any $\epsilon > 0$ and $n$ large enough, there exist an encoder and decoders such that $\mathbb{P}\big(\bigcup_{\ell=1}^{L} \{\hat{w}_{d,\ell} \neq w_{d,\ell}\}\big) < \epsilon$. The downlink capacity region is the closure of the set of all achievable rates.