Adaptive Coding and Channel Shaping Through Reconfigurable Intelligent Surfaces: An Information-Theoretic Analysis

A communication link aided by a reconfigurable intelligent surface (RIS) is studied in which the transmitter can control the state of the RIS via a finite-rate control link. Channel state information (CSI) is acquired at the receiver based on pilot-assisted channel estimation, and it may or may not be shared with the transmitter. Considering quasi-static fading channels with imperfect CSI, capacity-achieving signalling is shown to implement joint encoding of the transmitted signal and of the response of the RIS. This demonstrates the information-theoretic optimality of RIS-based modulation, or "single-RF MIMO" systems. In addition, a novel signalling strategy based on separate layered encoding that enables practical successive cancellation-type decoding at the receiver is proposed. Numerical experiments show that the conventional scheme that fixes the reflection pattern of the RIS, irrespective of the transmitted information, so as to maximize the achievable rate is strictly suboptimal, and is outperformed by the proposed adaptive coding strategies at all practical signal-to-noise ratio (SNR) levels.


I. INTRODUCTION
In the context of wireless communications, a reconfigurable intelligent surface (RIS) usually acts as an "anomalous mirror" or a "focusing lens" that can be configured to reflect or refract impinging radio waves towards arbitrary angles by applying appropriate phase shifts to the incident signals [1], [2]. Due to these desirable properties, RISs are being considered for future wireless networks as means to shape the wireless propagation channel for signal, interference, security, and scattering engineering [3]- [7].
Most prior work, to be reviewed below, proposed to use the RIS as a fixed passive beamformer in order to control the SNR levels at the receivers. However, by altering the amplitude or phase of the incident signal, the RIS reflection pattern can also be jointly encoded with the transmitted signals as a function of the information message, thus enlarging the modulation space. One instantiation of this idea is the "single-RF MIMO" system introduced in [8] that encodes multiple information streams using the RIS reflection pattern and a single radio frequency (RF) chain [9].
While practical RIS-based modulation schemes exist [8]- [13], their information-theoretic properties have not been studied. This paper addresses this knowledge gap by studying the capacity of RIS-aided communication links in which a single-RF transmitter can control the state of an RIS via a finite-rate control link (see Fig. 1). The optimal configuration of the RIS requires knowledge of the CSI. The acquisition of CSI is made complicated by the fact that the RIS is a nearly-passive device, and hence it cannot process and transmit pilot signals. To account for this practical constraint, in this paper, the information-theoretic analysis is based on a model in which the CSI is estimated at the receiver via pilot-assisted transmission [14], and it may or may not be shared with the transmitter.
Related Work: The optimization of a fixed RIS reflection pattern has been studied in various scenarios. A comprehensive survey of the state-of-the-art is available in [1], and we mention here some representative examples. Algorithms for jointly optimizing precoding at the transmitter and beamforming at the RIS were proposed for point-to-point Multiple-Input Single-Output (MISO) systems in [15], and for Multiple-Input Multiple-Output (MIMO) systems in [16], [17]. RIS-based passive beamforming was compared to conventional relaying methods such as amplify-and-forward and decode-and-forward in [2].
Acquiring CSI is crucial for RIS-aided communication. Channel estimation schemes were proposed in [14], [18], in which RIS training patterns are designed under the constraint of discrete phase shifts. The overhead required for channel estimation was studied in [19], and an overhead-aware resource allocation framework was developed. Channel estimation based on statistical CSI is used in [20] to reduce the channel training overhead.
Schemes for encoding information in the configuration of the RIS have been recently presented. In [10]- [12], information is encoded in the reflection patterns of the RIS by setting the amplitude of each reflecting element to be 0 or 1. In [13], the receiver antenna for which the SNR is maximized encodes the information bits using index modulation [21]. The strategies above are extended in [8] by implementing phase-shift keying (PSK) and quadrature amplitude modulation (QAM) at each element, and by using two independent data streams to control the RIS.
Main Contributions: This work provides an information-theoretic analysis of the RIS-aided system illustrated in Fig. 1, which consists of a single-RF transmitter and a receiver with antennas. CSI is assumed to be acquired at the receiver via pilot-based transmission, and it may or may not be shared with the transmitter. We first derive the capacity for any RIS control rate, and prove that jointly encoding data onto the transmitted signals and RIS reflection pattern is generally necessary to achieve the maximum information rate. We explicitly characterize the performance gain of joint encoding in the high-SNR regime. Then, we propose an achievable scheme based on layered encoding and successive cancellation decoding (SCD) that enables RIS-based modulation, while supporting standard separate encoding and decoding strategies. Numerical experiments demonstrate that, for SNR levels of practical interest and for a sufficiently fast RIS control link, capacity-achieving joint encoding provides significant gain over the max-SNR approach, which fixes the reflection pattern. However, joint encoding is shown to require a more accurate channel estimation compared to the max-SNR scheme, and is hence mostly desirable for long channel coherence blocks. The results in this paper were partially presented in [22], which only considers perfect CSI at the transmitter and receiver.
Organization: The rest of the paper is organized as follows. In Section II, we present an information-theoretic model for an RIS-aided quasi-static fading channel with imperfect CSI obtained via channel estimation. In Section III, we derive the capacity and we compare it to the rates achieved by two standard suboptimal signalling schemes: a max-SNR scheme that does not encode information in the RIS reflection pattern, and an RIS-based signalling scheme that modulates the reflection pattern uniformly and has no beamforming gain. In

II. SYSTEM MODEL
We consider the system depicted in Fig. 1 in which a single-RF transmitter communicates with a receiver equipped with antennas over a quasi-static fading channel in the presence of an RIS that comprises nearly-passive reconfigurable elements. The reconfigurable elements are spaced half of the wavelength apart, so that the mutual coupling or channel correlation effects can be ignored as a first-order approximation [23]. We explore the potential improvement in capacity that can be obtained when the transmitter can encode its message ∈ [2 ] of rate [bits/symbol] not only into a codeword of symbols sent on the wireless link to the receiver, but also in the reflection pattern of the RIS. The reflection pattern is controlled through a rate-limited control link, and is defined by the phase shifts that each of the RIS elements applies to the impinging wireless signal.
As illustrated in Fig. 2, the fading coefficients are assumed to remain constant for a coherence interval of symbol periods, after which they change to new independent values.
The coding slot of symbols hence contains / coherence blocks, which is taken to be an integer. The codeword transmitted in a coding slot has symbols from a constellation S of = |S| points. The constellation S is assumed to have an average power of one, i.e., The phase shift applied by each element of the RIS is chosen from a finite set A of The channel from the transmitter to the RIS in the th coherence block, ∈ [ / ], is denoted by the vector g( ) ∈ ℂ ×1 , and the channel from the RIS to the receiving antennas is denoted by the matrix H( ) ∈ ℂ × . In order to support multiple information streams with a single RF chain, the transmitter and RIS are expected to be placed such that there is a strong line-of-sight between them [9], [13]. Therefore, we assume that the elements of the channel vector g( ) have random phases and unit amplitude, as illustrated in Fig. 1. In contrast, the reflected signal is assumed to undergo a multi-path channel before being received, and hence the elements of the matrix H( ) are independent and identically distributed (i.i.d.) as CN(0, 1). Moreover, as in, e.g., [8], [13], we assume that the direct link between transmitter and receiver is blocked, so that the propagation from transmitter to receiver occurs solely through the reflected signal from the RIS. During the th coherence block, the fraction of the codeword consisting of symbols transmitted in the th sub-block, ∈ [ℓ], is denoted The phase shifts applied by the RIS in the th sub-block are denoted by the vector with θ , ( ) ∈ A being the phase shift applied by the th RIS element, ∈ [ ]. Finally, we denote the signal received by the antennas for the th transmitted symbol by y , ( ) ∈ ℂ ×1 , The overall received signal matrix Y ( ) = (y ,1 ( ), . . . , y , ( )) ∈ ℂ × in the th sub-block can hence be written as where the matrix H( ) H( ) diag(g( )), whose elements are i.i.d. CN(0, 1), combines the channels g( ) and H( ); the scalar ( ) > 0 denotes the power gain applied to the transmitted signal s ( ), which is subject to the power constraint for some > 0; and the matrix Z ( ) ∈ ℂ × , whose elements are i.i.d. as CN(0, 1), denotes the additive white Gaussian noise at the receiving antennas. It is worth noting that the product H( ) θ ( ) in (4) can be viewed as an augmented channel, shaped by the RIS for increasing the capacity.
Since the message is encoded onto both transmitted symbols s ( ) and phase shifts θ ( ), ∈ [ℓ], ∈ [ / ], we denote the effective channel input as With this notation, the channel (4) can be restated as At first glance, the channel (7) resembles a standard multiple-antenna wireless communication link [24]. In (7), however, the input matrix X ( ) is rank-one and is chosen from the finite set As a special case, for a fixed RIS reflection pattern θ = θ for all ∈ [ℓ], i.e., when the same phase shift vector is used for the entire coherence block, the channel input is chosen from the subset In the present paper, we study the impact of imperfect CSI on the achievable rates. In order to characterize the joint distribution of channel estimation and output signal, we vectorize the channel matrix H( ) and output Y ( ) in (7) as and respectively, where we have defined the vector z ( ) vec(Z ( )) ∈ ℂ ×1 , and, for any matrix X, the matrix X ⊗ is defined as the Kronecker product

A. Training and Channel Estimation
As illustrated in Fig. 3, we focus our attention on transmission schemes in which, for each coherence block ∈ [ / ], the first ≥ 0 sub-blocks are used to transmit pilot symbols known to the receiver. That is, we havē where we have defined matrix 1: As for the transmitter, we assume that either it has no access to the CSI or that it has access to the receiver's CSI via a feedback channel.
The transmission power can vary between the training and information transmission phases.
Accordingly, the power gain ( ) in (4) has two levels The power constraint (5) can hence be restated as Therefore, the vectorized channel output during the training phase is Based on the pilot symbols 1: , the receiver estimates the channel vector h( ) using the minimum mean-square error (MMSE) estimator, which yields ĥ( ) = h ( )|y 1: ( ) as the estimate of h( ) from the observations y 1: ( ). Since vectors h( ) and y 1: ( ) are jointly Gaussian distributed, the MMSE estimator can be computed as the linear MMSE estimator [25], i.e., ĥ and the estimation error is a Gaussian random vector whose covariance matrix is In order to assess how channel estimation affects the achievable performance, we shall also consider as a benchmark the case of perfect CSI, which corresponds to the case study in which the vector ĥ( ) = h( ) is available to both the transmitter and receiver as side information without any training ( = 0).
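As a rough illustration of the linear MMSE step, the following Python sketch treats a scalar simplification of the training model, y = p h + z with h and z distributed as CN(0, 1), rather than the vectorized model of this section. The pilot value p, the function names, and the number of trials are assumptions made only for this example. For this scalar model the linear MMSE estimate is p* y / (|p|^2 + 1) and the error variance is 1 / (1 + |p|^2), which the Monte Carlo average should approach.

```python
import math
import random


def cn(variance, rng):
    """Draw one circularly symmetric complex Gaussian sample CN(0, variance)."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def lmmse_scalar(y, pilot):
    """Linear MMSE estimate of h from y = pilot * h + z, with h, z ~ CN(0, 1)."""
    return pilot.conjugate() * y / (abs(pilot) ** 2 + 1.0)


def empirical_mse(pilot, num_trials=20000, seed=0):
    """Monte Carlo estimate of E|h - h_hat|^2 for the scalar training model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trials):
        h = cn(1.0, rng)
        y = pilot * h + cn(1.0, rng)
        total += abs(h - lmmse_scalar(y, pilot)) ** 2
    return total / num_trials


# Hypothetical pilot with power 10 (i.e., 10 dB above the unit noise power).
pilot = complex(math.sqrt(10.0), 0.0)
theory = 1.0 / (1.0 + abs(pilot) ** 2)  # closed-form error variance, scalar case
print(empirical_mse(pilot), theory)
```

In the vector setting of the paper the same structure holds, with the scalar gain replaced by a matrix and the error variance replaced by the error covariance matrix.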

B. Channel Encoding
As discussed, in each coherence block, the transmitter selects the ℓ − data sub-blocks based on the information message and the channel estimateĥ( ), if available. The vectorized channel output in (11), received over the ℓ − data sub-blocks, can be expressed as where the supremum is taken over all joint encoding and decoding schemes. The number of sub-blocks used for training 0 ≤ ≤ ℓ, pilot symbols 1: , and power-amplifier gain > 0 can all be optimized to increase the achievable rate.

III. CHANNEL CAPACITY
In this section, we derive the capacity ( , , 1: ) defined in (23) and we prove that the conventional scheme that does not encode information in the RIS reflection pattern is strictly suboptimal. More specifically, this result is proved in the high-SNR regime by characterizing the gain of the proposed joint encoding. For finite values of the SNR, on the other hand, the performance gain is evaluated in Section VI via numerical experiments.
Most works on RIS-aided systems consider Gaussian codebooks for the transmitted signal s ( ). This implies that the resulting achievable rates are formulated in the standard form log 2 (1 + SNR), even in the presence of imperfect CSI by using standard bounds [26]. In contrast, as described in Section II, we focus our attention on the more practical model in which the transmitted symbols and the RIS elements' phase response take values from finite sets. As a result, standard capacity expressions of the form log 2 (1 + SNR) are not applicable, and standard techniques for bounding the capacity under imperfect CSI cannot be used. Specifically, lower bounding the capacity by modeling the residual channel estimation noise as Gaussian [27], [28] does not hold for finite input constellations [29]. Therefore, the expressions for the capacity and achievable rates that we present in this section are more complex, and require the following definitions.

Definition 1:
The cumulant-generating function (CGF) of a random variable u is defined The value of the CGF for = 1 is denoted by (u) 1 (u). Definition 2: The CGF of a random variable u conditioned on a random vector x is defined The value of the conditional CGF for = 1 is denoted by (u|x) 1 (u|x). We now derive the capacity for the general case with imperfect CSI available at both the transmitter and receiver. In particular, the capacity is formulated in the form of an optimization problem with respect to the encoding distribution X|ĥ ( |ˆ ) of the effective inputs in (21) given the channel estimate ĥ. To this end, we define the covariance matrix of the received signal y( ) in (22) conditioned on the channel estimate ĥ( ) and the input X( ) as where, for any matrix X, we have defined the positive semidefinite matrix Γ(X) as We also define the decomposition where (X) is a square root matrix of Γ(X).
Proposition 1: When the MMSE estimateĥ( ) in (19) is available at both the receiver and transmitter, the capacity of the channel (22) is given as where random variable u is defined as with independent random vectors z ∼ CN(0, (ℓ− ) ) andĥ ∼ CN(0, − MMSE ), and random matrices X 1 , X 2 ∼ X|ĥ ( |ˆ ) that are conditionally independent givenĥ. Furthermore, for ≥ , we have the high-SNR limit which, for a given cardinality = |S| of the signal constellation, is maximized if the amplitude shift keying (ASK) modulation is used, i.e., where the factor 3/[3 + 4( 2 − 1)] ensures a unit average power constraint. In this case, the high-SNR limit is Proof: See Appendix A.
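The normalization factor quoted in Proposition 1 can be checked numerically. The sketch below assumes that the ASK points are sigma, 3 sigma, ..., (2M − 1) sigma, as suggested by the 4-ASK set used in the numerical section, and that the lost symbol in the factor is the constellation size M, so that sigma^2 = 3/[3 + 4(M^2 − 1)]. The function names are hypothetical.

```python
import math


def ask_constellation(size):
    """ASK points sigma * (1, 3, ..., 2*size - 1), with sigma^2 = 3 / (3 + 4 * (size^2 - 1))."""
    sigma = math.sqrt(3.0 / (3.0 + 4.0 * (size ** 2 - 1)))
    return [sigma * (2 * i - 1) for i in range(1, size + 1)]


def average_power(points):
    """Average power of a constellation with equiprobable points."""
    return sum(abs(p) ** 2 for p in points) / len(points)


for size in (2, 4, 8):
    # Each average should equal one up to floating-point rounding.
    print(size, average_power(ask_constellation(size)))
```

The check works because the sum of the squares of the first M odd integers is M(4M^2 − 1)/3, so the unnormalized average power is [3 + 4(M^2 − 1)]/3.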
Achieving the capacity in (29) generally requires joint encoding over the codeword symbols s ( ) and RIS reflection variables θ ( ), for all data sub-blocks = + 1, . . . , ℓ, ∈ [ / ], as well as joint decoding of the message at the receiver based on the information encoded over both s ( ) and θ ( ). In (29), this is specified in the optimization over the distribution X|ĥ ( |ˆ ) of the input X( ) = (X +1 ( ), . . . ,X ℓ ( )) in (21), which, by (6), is a function of both s ( ) and θ ( ). However, the high-SNR asymptotic limit in (31) implies that, in the high-SNR regime, capacity is achieved by using independent random codebooks with uniform distribution for the codeword symbols s and the RIS reflection pattern θ, and perfect channel estimation can be obtained by using ≥ pilot sub-blocks.
At a computational level, problem (29) is convex (see Appendix A), and hence it can be solved by using convex optimization tools. Moreover, calculating (u|X 1 , z,ĥ) in (29) involves evaluating the expectation over the random vectors z andĥ, and over the random matrices X 1 and X 2 . Since z andĥ are continuous random vectors, the former expectation may be estimated via an empirical average, while the second requires summing over |C| ℓ− terms.
The following two corollaries formulate the capacity under the assumption of imperfect CSI available only at the receiver, and under the assumption of perfect CSI available at both the transmitter and receiver, respectively. Corollary 1: When the MMSE estimateĥ( ) in (19) is available only at the receiver, the capacity of the channel (22) is given as where the random variable u is defined as in (30) with independent random vectors z ∼ , and independent random matrices X 1 , X 2 ∼ Proof: It follows from the proof of Proposition 1 with the caveat that, since the channel estimateĥ is available only at the receiver, the optimal input distribution X ( ) is uniform.
Proof: It follows from the proof of Proposition 1 by setting = 0 and Γ MMSE = 0, since the channel vector h is known to both the receiver and transmitter without requiring any training.

A. Max-SNR Approach
Having observed that achieving the capacity generally requires joint encoding of data over the codeword symbols and the RIS reflection pattern, we now consider the standard approach in which the reflection pattern of the RIS is fixed for all data sub-blocks = + 1, . . . , ℓ, of the fading block , irrespective of the message , i.e., θ ( ) = θ( ). We denote the fixed RIS reflection pattern by (ˆ ) to emphasize that it is chosen based on the channel estimate ˆ to maximize the achievable rate, and we have the following result.
The limit in (39) implies that, in the high-SNR regime, the rate of the max-SNR scheme is limited to (ℓ − ) log 2 ( )/ℓ. This is because, in each coherence block, the information data is modulated solely onto the (ℓ− ) codeword symbols, which are selected from a constellation S of points. By comparing (39) with (31), we evince that, for any phase response set A of distinct phases, modulating the RIS reflection pattern can be used to increase the achievable rate by additional (ℓ − ) log 2 ( )/( ℓ) bits per symbol as compared to the max-SNR scheme. However, note that the max-SNR scheme can achieve the high-SNR rate (39) by fixing the RIS reflection pattern irrespective of CSI and estimating only the effective channel from the transmitter to the receiver. Therefore, the max-SNR approach requires only ≥ 1 pilot symbols to achieve the high-SNR limit in (39), whereas joint encoding achieves the limit in (31) with ≥ pilot symbols. For finite values of the SNR, the achievable rate in (38) can be computed by combining convex optimization tools for the inner minimization problem and global optimization tools for the minimization over the set of discrete phase shifts. The corresponding performance loss is evaluated in Section VI via numerical experiments.
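The trade-off between pilot overhead and RIS modulation gain in the high-SNR regime can be sketched numerically. The code below is only an illustration under stated assumptions: the max-SNR limit is taken as (ℓ − τ) log2(M)/ℓ, with τ the number of pilot sub-blocks and M the constellation size; the additional term is read as (ℓ − τ) K log2(A)/(m ℓ), with K RIS elements, A phases, and m symbols per sub-block; and joint encoding is assumed to need K pilot sub-blocks against one for max-SNR. The symbol names and the form of the additional term are inferred from the garbled expressions in the text, not taken verbatim from it.

```python
import math


def max_snr_limit(num_subblocks, num_pilots, constellation_size):
    """High-SNR limit of the fixed-pattern scheme, in bits per symbol."""
    data = num_subblocks - num_pilots
    return data * math.log2(constellation_size) / num_subblocks


def ris_modulation_gain(num_subblocks, num_pilots, num_elements, num_phases, control_rate):
    """Assumed additional high-SNR rate from modulating the RIS, in bits per symbol."""
    data = num_subblocks - num_pilots
    return data * num_elements * math.log2(num_phases) / (control_rate * num_subblocks)


# Hypothetical setting: K = 4 elements, A = 2 phases, m = 1, M = 4 (ASK).
K, A, m, M = 4, 2, 1, 4
for ell in range(4, 9):
    fixed = max_snr_limit(ell, 1, M)  # one pilot sub-block assumed sufficient
    joint = max_snr_limit(ell, K, M) + ris_modulation_gain(ell, K, K, A, m)
    print(ell, round(fixed, 3), round(joint, 3))
```

With the Fig. 4 parameters (ℓ = 4, two pilot sub-blocks, 4-ASK), `max_snr_limit(4, 2, 4)` evaluates to 1 bit per symbol. For short coherence blocks the larger pilot overhead assumed for joint encoding dominates, which agrees qualitatively with the discussion of short coherence blocks in the numerical section.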
The rates achieved for imperfect CSI available only at the receiver and for perfect CSI available at both the transmitter and receiver are given in the following two corollaries, respectively.
Corollary 3: When the MMSE estimateĥ in (19) is available only at the receiver, a transmission scheme in which the phase shift vector is kept fixed achieves the rate where the random variable u is defined as in (30) with independent random vectors z ∼ CN(0, (ℓ− ) ) andĥ ∼ CN(0, − MMSE ), and independent random matrices X 1 , X 2 ∼ Proof: It follows from the proof of Proposition 2 with the caveat that, since the channel estimateĥ is available only at the receiver, the optimal input distribution X ( ) is uniform.
Proof: It follows from the proof of Proposition 2 by setting = 0 and Γ MMSE = 0, since the channel vector h is known to both receiver and transmitter without requiring any training.

IV. LAYERED ENCODING
As discussed, achieving the capacity in (29) requires jointly encoding the message over the phase shift vector θ ( ) and the transmitted signal s ( ), while performing optimal, i.e., maximum-likelihood joint decoding at the receiver. This may be infeasible in some communication networks. Therefore, in this section, we propose a strategy based on layered encoding and successive cancellation decoding (SCD) that uses only standard separate encoding and decoding procedures, while still benefiting from the modulation of information onto the state of the RIS so as to enhance the achievable rate compared with the max-SNR scheme.
To this end, the message is split into two sub-messages, or layers, 1 and 2 , such that By averaging the first columns of the received signal matrix Y ( ) in (7), we obtain where we have defined random vector z ( ) ∼ CN(0, ). The receiver decodes layer 1 based on the received matrix Ȳ( ) (ȳ +1 ( ), . . . ,ȳ ℓ ( )), which, from (44), can be expressed as where we have defined the matrix Z( ) (z +1 ( ), . . . ,z ℓ ( )) ∈ ℂ ×(ℓ− ) , whose elements are i.i.d. with distribution CN(0, 1), and the phase shift matrix which is selected from the set By direct inspection of (45), we evince that it depends only on the RIS phase shifts, and hence layer 1 can be separately decoded. Once layer 1 is decoded, the receiver reconstructs the phase shift vectors θ ( ), which are then used to decode layer 2 . This strategy achieves the rate detailed in Proposition 3.
Note that the layered encoding scheme does not require CSI at the transmitter (CSIT) since both layers are encoded independently from the channel estimateĥ. The rate achieved by the proposed layered strategy in the case of perfect CSI is derived in the following corollary.
Proof: It follows from the proof of Proposition 3 by setting = 0 and Γ MMSE = 0 since the channel vector h is known to both the receiver and transmitter without requiring any training.

V. LOWER BOUNDS
As discussed in the previous sections, calculating the capacity and achievable rates typically requires the evaluation of expectations over Gaussian random vectors and over discrete random matrices whose size increases exponentially with ℓ − . This makes the evaluation numerically difficult for long coherence blocks. Furthermore, unlike the Gaussian vectors that have a known distribution, the input distribution of the random matrices needs to be numerically optimized. This implies that the standard method for estimating the expectations via empirical averages cannot be applied to the discrete random matrices, and hence estimating the expectations from a small number of samples requires methods such as the Monte Carlo gradient estimation [31]. In this section, we take a different approach and present lower bounds on the capacity and achievable rates that require summing over a fixed number of terms that does not increase with the number of sub-blocks ℓ, which simplifies the exact calculation of the bounds.

A. Lower Bounds for Optimal Signalling and Max-SNR
Proposition 4: When the MMSE estimateĥ in (19) is available at both the receiver and transmitter, the capacity in Proposition 1 and the rate achieved by the max-SNR scheme in Proof: See Appendix C.
As detailed in Appendix C, the lower bounds in Proposition 4 correspond to rates achievable when the sub-blocksX ∈ C, = + 1, . . . , ℓ, are decoded separately. This is in contrast to the optimal strategy presented in Proposition 1 that jointly decodes all data sub-blocks inputs (X +1 , . . . ,X ℓ ) ∈ C ℓ− from the channel outputs y +1 , . . . , y ℓ . The key computational advantage of the lower bounds is that evaluating the expectations over the discrete random matrices X 1 and X 2 defined in Proposition 1 requires summing over |C| ℓ− terms, whereas evaluating the expectations in the lower bound (60) requires summing over |C| terms, which is exponentially smaller.
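The computational saving can be made concrete with a short count. The sketch below assumes that the per-sub-block input set C has at most A^K M^m elements (A phases, K RIS elements, M constellation points, m symbols per sub-block); this bound and the parameter names are assumptions for illustration, while the two term counts, |C|^(ℓ−τ) for joint decoding and |C| for the lower bound, follow the text.

```python
def input_set_size_bound(num_phases, num_elements, constellation_size, control_rate):
    """Assumed upper bound A^K * M^m on the number of per-sub-block inputs."""
    return (num_phases ** num_elements) * (constellation_size ** control_rate)


def terms_joint_decoding(set_size, num_data_subblocks):
    """Number of terms when all data sub-blocks are treated jointly."""
    return set_size ** num_data_subblocks


def terms_lower_bound(set_size):
    """Number of terms when each data sub-block is treated separately."""
    return set_size


# Hypothetical setting: A = 2, K = 2, M = 4, m = 1.
size = input_set_size_bound(2, 2, 4, 1)
for data_subblocks in (1, 2, 4, 6):
    print(data_subblocks, terms_joint_decoding(size, data_subblocks), terms_lower_bound(size))
```

For this setting the joint sum already has more than sixteen million terms with six data sub-blocks, while the lower bound stays at sixteen terms.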
The corresponding lower bounds on capacity and rate achieved by the max-SNR scheme under the assumptions of imperfect CSI available only at the receiver and perfect CSI available at both the transmitter and receiver, are formulated, respectively, in the following two corollaries.
The random variable u in (62) and (63) is defined as in (30)  Proof: It follows from the proof of Proposition 4 with the caveat that, since the channel estimateĥ is available only at the receiver, the optimal input distribution X ( ) is uniform.
Proof: It follows from the proof of Proposition 4 by setting = 0 and Γ MMSE = 0, since the channel vector h is known to both the receiver and transmitter without requiring any training.

B. Lower Bounds for Layered Encoding
Similar to Proposition 4, we derive a lower bound on the rate achieved by the layered-encoding scheme introduced in Section IV.

VI. NUMERICAL RESULTS
In this section, we illustrate and discuss numerical examples with the main aims of (i) comparing the capacity achieved by the proposed joint encoding scheme with the achievable rates attained by the max-SNR and the layered encoding schemes, and (ii) assessing the impact of imperfect CSI. For the phase response set, we consider uniformly spaced phases in the set A {0, 2 / , . . . , 2 ( − 1)/ }, whereas, for the input constellation, we consider ASK, which was shown to maximize capacity in the high-SNR regime (Proposition 1), and PSK modulations. In addition, we set an equal power for training and data sub-blocks, i.e., = = √ , and optimize the channel estimation by testing all pilot symbols 1: ∈ C 1× that satisfy the power constraint in (14). Moreover, the empirical average over Gaussian random vectors, e.g.,ĥ and z in Proposition 1, is evaluated via a Monte Carlo method, and the optimal input distributions, e.g., X|ĥ ( |ˆ ) in Proposition 1, are numerically calculated using the fmincon function in MATLAB. We limit our investigation to small number of RIS elements in order to perform numerical optimization without requiring excessive computing power. Based on the high-SNR analysis in Proposition 1, we can conclude that the capacity increases linearly with the number of elements for sufficiently high SNR and a sufficiently long coherence block. We postpone the numerical analysis with larger to future work.
On the role of the SNR level. In Fig. 4, we plot the rate as a function of the average power , with ℓ = 4 sub-blocks of which = 2 sub-blocks are used for channel estimation, = 2 receive antennas, = 2 RIS elements, = 2 available phase shifts, a symbol-to-RIS control rate = 1, and input constellation given by the 4-ASK S = { , 3 , 5 , 7 } with For very low SNR, i.e., less than −20dB, it is observed that the max-SNR approach is close to being optimal, and hence, in this regime, encoding information in the RIS reflection pattern does not increase the rate. For larger SNR levels of practical interest, however, joint encoding provides significant gain over the max-SNR scheme.
It is also observed that CSIT is unnecessary for very low or very high SNR levels. This is because, at low SNR, the channel estimate is poor and cannot be applied for beamforming, whereas, at high SNR, beamforming, which is used to increase SNR, is unnecessary. In addition, the lower bounds presented in Section V are shown to be close to the achievable rates. Note that the gap to the lower bounds increases for small number of pilot symbols < , i.e., when channel estimation is poor, even for high-SNR.
[Fig. 5: rate [bit per channel use]; curves: optimal signalling w/ perfect CSI, optimal signalling w/ imperfect CSI, uniform signalling (no CSIT), max-SNR w/ perfect CSI, max-SNR w/ imperfect CSI, max-SNR w/o CSIT.]
Optimal number of pilot symbols. In Fig. 5, joint encoding is shown to require a more accurate channel estimation compared to the max-SNR scheme with CSIT, for which allocating = 1 pilot is optimal. Comparing the penalty of channel estimation between the joint encoding strategy and the max-SNR scheme, in addition, we observe that the gap is larger for joint encoding since a higher percentage of the coherence block is used to obtain a sufficient channel estimation accuracy.
As seen in Fig. 5, the capacity-achieving joint encoding strategy requires a better channel estimation compared to the max-SNR scheme. However, for short coherence blocks, acquiring sufficiently good channel estimation might not be feasible and the gain of joint encoding is expected to decrease. This is illustrated in Fig. 6, where we plot the lower bounds on the rate as a function of the number of sub-blocks ℓ with = 2 receive antennas, = 4 RIS elements, = 2 available phase shifts, a symbol-to-RIS control rate = 1, an average power constraint of = 10 dB, and an input constellation given by 4-ASK. For each value of ℓ, the lower bounds are optimized over = 0, . . . , ℓ−1. For fast-changing channels, the gain of joint encoding is shown to be low. Moreover, without CSIT, the max-SNR scheme is optimal for ℓ ≤ 2.
On the number of receive antennas. In Fig. 7 available phase shifts, a symbol-to-RIS control rate = 2, and input constellation given by 4-ASK or QPSK S = {±1, ± }. For layered encoding, we set = 1 pilot, which was seen to maximize the rate in this experiment. It is observed that, for sufficiently high SNR, the layered-encoding scheme improves over the max-SNR approach. Note that, in the high-SNR regime, as apparent from the limits in (39) and (54), layered encoding achieves a higher rate when log 2 ( ) > log 2 ( ). In addition, while PSK outperforms ASK when used with the max-SNR and layered-encoding schemes, the opposite is true with joint encoding in the high-SNR regime. In fact, as discussed in Proposition 1, in the high-SNR regime, out of all finite input sets S with the same size, ASK achieves the maximum capacity.
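One way to see why ASK can outperform PSK under joint encoding at high SNR is to count the distinct effective inputs that each constellation produces. The sketch below assumes that the effective input in a sub-block is the rank-one product of the RIS phase response exp(jθ) and the transmitted symbols, which is an interpretation of the rank-one input described in Section II rather than a reproduction of its equation. It uses K = 2 elements, the two phases {0, π}, and one symbol per sub-block; all names are hypothetical.

```python
import cmath
import itertools
import math


def distinct_inputs(phases, constellation, num_elements, symbols_per_subblock=1):
    """Count distinct matrices with entries exp(j*theta_k) * s_q (assumed input form)."""
    seen = set()
    for theta in itertools.product(phases, repeat=num_elements):
        response = [cmath.exp(1j * t) for t in theta]
        for symbols in itertools.product(constellation, repeat=symbols_per_subblock):
            # Round so that numerically equal entries map to the same key.
            key = tuple(
                (round((r * s).real, 9), round((r * s).imag, 9))
                for r in response
                for s in symbols
            )
            seen.add(key)
    return len(seen)


phases = [0.0, math.pi]
sigma = 1.0 / math.sqrt(21.0)  # unit average power for 4-ASK
ask4 = [sigma * a for a in (1, 3, 5, 7)]
qpsk = [1, -1, 1j, -1j]

for name, points in (("4-ASK", ask4), ("QPSK", qpsk)):
    count = distinct_inputs(phases, points, num_elements=2)
    print(name, count, math.log2(count))
```

With QPSK, a common sign flip of the RIS phases can be undone by flipping the sign of the symbol, so different (phase, symbol) pairs collapse onto the same input; with positive ASK amplitudes no such ambiguity arises.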
On the RIS control rate. The gain of using the state of the RIS as a medium for conveying information is expected to decrease as the rate of the control link from the transmitter to the RIS decreases. This is illustrated in Fig. 9, where we plot the rate with perfect CSI at both transmitter and receiver as a function of the RIS control rate factor , with = 2 receive antennas, = 2 RIS elements, = 2 available phase shifts, an average power constraint of = 40 dB, and an input constellation 2-ASK. Note that the performance of the layered-encoding scheme improves from = 1 to = 2 since, for = 1, the transmitted symbol in each sub-block is used as a pilot, and hence only the first layer carries information. It is observed that, while, for = 1, joint encoding achieves three times the rate of max-SNR, the gain reduces to a factor of 1.3 for = 7.
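The factors quoted for Fig. 9 are consistent with a simple high-SNR calculation. The sketch below assumes perfect CSI (no pilot overhead), a max-SNR limit of log2(M) bits per symbol, and a joint-encoding limit of log2(M) + K log2(A)/m, where m denotes the symbol-to-RIS control rate factor; the form of the second term is inferred from the text and the function name is hypothetical.

```python
import math


def high_snr_ratio(constellation_size, num_elements, num_phases, control_rate):
    """Ratio of the assumed joint-encoding limit to the max-SNR limit at high SNR."""
    base = math.log2(constellation_size)
    extra = num_elements * math.log2(num_phases) / control_rate
    return (base + extra) / base


# Fig. 9 setting: 2-ASK, K = 2 RIS elements, A = 2 phases.
for m in (1, 2, 7):
    print(m, round(high_snr_ratio(2, 2, 2, m), 2))
```

For m = 1 the ratio is 3, and for m = 7 it is 9/7, or about 1.3, matching the gains reported above at 40 dB.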

VII. CONCLUSIONS
In this work, we have studied the capacity of an RIS-aided system. We focused on a fundamental model with one transmitter and one receiver, where the CSI is acquired through pilot-assisted channel estimation. The common approach of using the RIS as a passive beamformer to maximize the achievable rate was shown to be generally suboptimal in terms of the achievable rate for finite input constellations, especially for slow-changing channels.
Instead, the capacity-achieving scheme was proved to jointly encode information in the RIS reflection pattern as well as in the transmitted signal. While the scheme was shown to require a more accurate channel estimation compared to the max-SNR approach, the gain of encoding information in the reflection pattern of the RIS was demonstrated to be significant for a sufficiently high RIS control rate. In addition, a suboptimal, yet practical, strategy based on separate layered encoding and successive cancellation decoding was demonstrated to outperform passive beamforming for sufficiently high SNR levels, and motivates RIS-based modulation design [8]- [13] for single-RF MIMO communication.
Among related problems left open by this study, we mention the design of low-complexity joint encoding and decoding strategies that approach capacity, the derivation of the capacity for noisy RIS [32] and for RIS with mutual coupling [23], and extensions to RIS systems with multiple users/surfaces [33] or with security constraints [34]. Another related problem is finding the optimal input distribution for a slowly fading channel with CSI only at the receiver [35].

A. Proof of Proposition 1
The model in (22) can be viewed as a standard channel with input X, output y, and known CSI ĥ. This is because the transmitter directly controls the states of the RIS θ ( ) and the transmitted symbols s ( ) for ∈ [ℓ] and ∈ [ / ]. Therefore, it follows from the channel coding theorem [36, Ch. 7], [37, Ch. 7.4.1], that the ergodic capacity can be expressed as The mutual information (X; y|ĥ) in (68) can be written as (X; y|ĥ) = ℎ(y|ĥ) − ℎ(y|ĥ, X).
In addition, the conditional probability density function of the output y given the estimate ĥ and input X is where the covariance matrix Γ( ) is defined in (27). Therefore, the conditional differential entropy ℎ(y|ĥ, X) is given as with z ∼ CN(0, (ℓ− ) ) and where we have defined the scalar Overall, by subtracting (70) from (71) and applying the conditional CGF definition in (25), we get (29). Note that the mutual information (X; y|ĥ) is a concave function of X|ĥ ( |ˆ ) for fixed y|ĥ,X ( |ˆ , ) [36, Theorem 2.7.4]. Therefore, problem (68) can be solved using convex optimization tools.

B. Proof of Proposition 3
The channel in (45) is equivalent to a point-to-point Gaussian multiple-input multiple-output (MIMO) channel with PSK input Q. Therefore, for layer 1 , the following rate is .
By applying the conditional CGF definition in (25) to the achievable rate in (74) with the aid of (76), we get (49).