Semantically-Secured Message-Key Trade-off over Wiretap Channels with Random Parameters

We study the trade-off between secret message (SM) and secret key (SK) rates, simultaneously achievable over a state-dependent (SD) wiretap channel (WTC) with non-causal channel state information (CSI) at the encoder. This model subsumes other instances of CSI availability as special cases, and calls for efficient utilization of the state sequence for both reliability and security purposes. An inner bound on the semantic-security (SS) SM-SK capacity region is derived based on a superposition coding scheme inspired by a past work of the authors. The region is shown to attain capacity for a certain class of SD-WTCs. SS is established by virtue of two versions of the strong soft-covering lemma. The derived region yields an improvement upon the previously best known SM-SK trade-off result reported by Prabhakaran et al., and, to the best of our knowledge, upon all other existing lower bounds for either SM or SK for this setup, even if the semantic security requirement is relaxed to weak secrecy. It is demonstrated that our region can be strictly larger than those reported in the preceding works.


A. Background
Physical layer security (PLS) [1]- [3], rooted in information-theoretic (IT) principles, is an approach to provably secure communication that dates back to Wyner's celebrated 1975 paper on the wiretap channel (WTC) [4]. By harnessing randomness from the noisy communication channel and combining it with proper physical layer coding, PLS guarantees protection against computationally-unlimited eavesdroppers, with no requirement that the legitimate parties share a secret key (SK) in advance. Two fundamental questions in the field of PLS regard finding the best achievable transmission rate of a secret message (SM) over a noisy channel, and the highest attainable SK rate that distributed parties can agree upon based on correlated observations. The base model for SM transmission is Wyner's WTC [4], where two legitimate parties communicate over a noisy channel in the presence of an eavesdropper. The secrecy capacity of the degraded WTC was derived in [4], and the result was extended to the general case by Csiszár and Körner [5]. The security analyses in both [4] and [5] relied on evaluating particular conditional entropy terms, named equivocation. This technique has been widely adopted in the IT community ever since.
Recently, distribution approximation arguments emerged as the tool of choice for proving security. This approach relies on a soft-covering lemma (SCL) that originated in another 1975 paper by Wyner [6]. The SCL states that the distribution induced by randomly selecting a codeword from an appropriately chosen codebook and passing it through a memoryless channel will be asymptotically indistinguishable from the distribution of random noise. The SCL was further developed over the years and stricter proximity measures between distributions were achieved [7]- [10]. Based on these more advanced versions, one can make the channel output observed by the eavesdropper in the WTC seem like noise and, in particular, be approximately independent of the confidential data. This, in turn, implies IT security. Notably, [11] and [12] focused on tight soft-covering exponents with respect to relative entropy and total variation, respectively.
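The soft-covering statement can be made concrete with a toy numerical illustration (not part of the cited works). The Python sketch below draws a random codebook of about $2^{nR}$ codewords i.i.d. Bernoulli(1/2) and passes a uniformly chosen codeword through n uses of a binary symmetric channel with crossover probability p. It then computes the total variation distance between the induced output PMF and the i.i.d. output PMF, which is uniform on $\{0,1\}^n$ in this case. The channel, the input PMF, the small block length and the function names are assumptions, chosen so that all $2^n$ outputs can be enumerated. At such small n the asymptotic guarantee of the lemma is only suggested, not proven.

```python
import itertools
import random


def bsc_word_prob(y, x, p):
    """P(y | x) for n uses of a binary symmetric channel with crossover p."""
    flips = sum(a != b for a, b in zip(x, y))
    return p ** flips * (1 - p) ** (len(x) - flips)


def soft_covering_tv(n, rate, p, seed=0):
    """Total variation between the output PMF induced by a uniformly chosen
    codeword of a random codebook (about 2^(n*rate) codewords drawn i.i.d.
    Bernoulli(1/2)) and the i.i.d. output PMF, which is uniform on {0,1}^n here."""
    rng = random.Random(seed)
    size = max(1, round(2 ** (n * rate)))
    codebook = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(size)]
    tv = 0.0
    for y in itertools.product((0, 1), repeat=n):
        induced = sum(bsc_word_prob(y, x, p) for x in codebook) / size
        tv += abs(induced - 2.0 ** (-n))
    return tv / 2


if __name__ == "__main__":
    # I(X;Y) = 1 - h(0.2) is roughly 0.28 bits, so rates 0.4 and 0.8 exceed it.
    for rate in (0.0, 0.4, 0.8):
        print(rate, round(soft_covering_tv(10, rate, 0.2), 3))
```

With a single codeword the output is concentrated around that codeword and the distance is large. As the codebook rate grows past the mutual information, the averaged output PMF spreads out and the distance shrinks.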
The study of SK agreement was pioneered by Maurer [13], and, independently, by Ahlswede and Csiszár [14], who studied the achievable SK rates based on correlated observations at the terminals that can communicate via a noiseless and rate unlimited public link. The SK capacity when only one-way public communication is allowed was characterized in [14]. This result was generalized in [15] to the case where the public link has finite capacity.
The optimal random coding scheme for these cases is a combination of superposition coding and Wyner-Ziv coding [16]. If the encoder controls its source (rather than just observing it), this source becomes a channel input and the setup evolves to a WTC. This is a special case of the SK channel-type model that was also studied in [14].

B. Model and Contributions
A more general framework to consider is the state-dependent (SD) WTC with non-causal encoder channel state information (CSI). This model combines the WTC and the Gelfand and Pinsker (GP) channel [17], and is therefore sometimes referred to as the GP-WTC. The dependence of the channel's transition probability on the state sequence accounts for the possible availability of correlated sources at the terminals. The similarity between the SM transmission and the SK agreement tasks makes their integration in a single model natural. Adhering to the most general framework, we study the SM-SK rate pairs that are simultaneously achievable over a SD-WTC with non-causal encoder CSI.
The scenario where there is only a SM was studied in [18], where an achievable SM rate formula was established.
This result was improved in [19] based on a novel superposition coding scheme. SK agreement over the GP-WTC was the focus of [22], and, more recently, of [23] (see also references therein). The combined model was considered by Prabhakaran et al. [24], who derived a benchmark inner bound on the SM-SK capacity region. The result from [24] is optimal for several classes of SD-WTCs.
We propose a novel superposition coding scheme for the combined model that subsumes all the aforementioned achievability results as special cases. Specifically, [18], [19], [22]- [24], as well as all the other existing inner bounds (on SM transmission, SK agreement or both) that are known to the authors, are captured. Furthermore, our inner bound is shown to achieve strictly higher rates than each of these previous results.
The coding scheme used herein is inspired by [19]. Namely, an over-populated superposition codebook that encodes the entire confidential message in its outer layer is utilized. Using the redundancies in the inner and outer layers, the transmission is correlated with the state sequence by means of the likelihood encoder [25]. Although the redundancy indices are chosen as part of the encoding process, we show that their true distribution is close to uniform. Consequently, as long as a certain redundancy index is kept secret (along with the confidential message), it may be declared as a SK. The security analysis is based on constructing the inner codebook such that it is better observable by the eavesdropper, making the inner layer index decodable by him/her. This enhances the secrecy resources that the legitimate parties can extract from the outer layer, which they use to secure the SM and part of the redundancy index of the outer layer. The latter is declared as the SK.
Our results are derived under the strict metric of semantic-security (SS). The SS criterion is a cryptographic gold standard that was adapted to the WTC framework (of computationally unbounded adversaries with a noisy observation) in [26]. As was shown in [26], SS is equivalent to a negligible mutual information (MI) between the confidential information (in our case, the SM-SK pair) and the eavesdropper's observations, when maximized over all possible message distributions. The proof of SS relies on the strong SCL for superposition codes. Since the past secrecy results from [18], [22]-[24] were derived under the weak secrecy metric (i.e., a vanishing normalized MI with respect to a uniformly distributed message-key pair), our achievability result outperforms those schemes not only in terms of the achievable rate pairs, but also in the upgraded sense of security.

C. Organization
This paper is organized as follows. Section II establishes notation and definitions and sets up the SD-WTC problem. Section III states our main result: an inner bound on the SM-SK optimal trade-off region. In Section IV our inner bound is shown to be tight for a certain class of channels. In Section V we discuss past results captured within the considered framework, and illustrate the improvement our result yields. The proof of the main result is the content of Section VI. Finally, Section VII summarizes the main achievements and outlines the main insights emerging from this work.

A. Preliminaries
We use the following notation. As is customary, $\mathbb{N}$ is the set of natural numbers, while $\mathbb{R}$ denotes the set of real numbers. Given two real numbers a, b, we denote by $[a : b]$ the set of integers $\{n \in \mathbb{N} : a \leq n \leq b\}$. Calligraphic letters denote sets, e.g., $\mathcal{X}$, while $|\mathcal{X}|$ stands for the cardinality of $\mathcal{X}$. $\mathcal{X}^n$ denotes the n-fold Cartesian product of $\mathcal{X}$. An element of $\mathcal{X}^n$ is denoted by $x^n = (x_1, x_2, \ldots, x_n)$; whenever the dimension n is clear from the context, vectors (or sequences) are denoted by boldface letters, e.g., $\mathbf{x}$.
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, where Ω is the sample space, $\mathcal{F}$ is the σ-algebra and $\mathbb{P}$ is the probability measure. Random variables over $(\Omega, \mathcal{F}, \mathbb{P})$ are denoted by uppercase letters, e.g., X, with conventions for random vectors similar to those for deterministic sequences. The probability of an event $A \in \mathcal{F}$ is denoted by $\mathbb{P}(A)$, while $\mathbb{P}(A|B)$ denotes the conditional probability of A given B. We use $\mathbb{1}_A$ to denote the indicator function of $A \in \mathcal{F}$.
The set of all probability mass functions (PMFs) on a finite set $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$. PMFs are denoted by letters such as p or q, with a subscript that identifies the random variable and its possible conditioning. For example, for two discrete correlated random variables X and Y over the same probability space, we use $p_X$, $p_{X,Y}$ and $p_{X|Y}$ to denote, respectively, the marginal PMF of X, the joint PMF of (X, Y) and the conditional PMF of X given Y. In particular, $p_{X|Y}: \mathcal{Y} \to \mathcal{P}(\mathcal{X})$ represents the stochastic matrix whose elements are given by $p_{X|Y}(x|y) = \mathbb{P}(X = x \mid Y = y)$. Expressions such as $p_{X,Y} = p_X p_{Y|X}$ are to be understood as $p_{X,Y}(x, y) = p_X(x)\, p_{Y|X}(y|x)$, for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$. Accordingly, when three random variables X, Y and Z satisfy $p_{X|Y,Z} = p_{X|Y}$, they form a Markov chain, which is denoted by $X - Y - Z$. Any PMF $q \in \mathcal{P}(\mathcal{X})$ gives rise to a probability measure on $(\mathcal{X}, 2^{\mathcal{X}})$, which we denote by $\mathbb{P}_q$; accordingly, $\mathbb{P}_q(A) = \sum_{x \in A} q(x)$ for every $A \subseteq \mathcal{X}$. We use $\mathbb{E}_q$ to denote an expectation taken with respect to $\mathbb{P}_q$. Similarly, we use $H_q$ and $I_q$ to indicate that an entropy or a mutual information term is calculated with respect to the PMF q. For a random vector $X^n$, if the entries of $X^n$ are drawn in an independent and identically distributed (i.i.d.) manner according to $p_X$, then for every $\mathbf{x} \in \mathcal{X}^n$ we have $p_{X^n}(\mathbf{x}) = \prod_{i=1}^n p_X(x_i)$ and we write $p_{X^n}(\mathbf{x}) = p^n_X(\mathbf{x})$.
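Similarly, if for every $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$ we have $p_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n p_{Y|X}(y_i|x_i)$, then we write $p_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) = p^n_{Y|X}(\mathbf{y}|\mathbf{x})$. The conditional product PMF $p^n_{Y|X}$ given a specific sequence $\mathbf{x} \in \mathcal{X}^n$ is denoted by $p^n_{Y|X=\mathbf{x}}$. The empirical PMF $\nu_{\mathbf{x}}$ of a sequence $\mathbf{x} \in \mathcal{X}^n$ is
$$\nu_{\mathbf{x}}(x) \triangleq \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{\{x_i = x\}}, \quad x \in \mathcal{X}.$$
We use $\mathcal{T}^n_\epsilon(p_X)$ to denote the set of letter-typical sequences of length n with respect to the PMF $p_X$ and the non-negative number ε, i.e., we have
$$\mathcal{T}^n_\epsilon(p_X) = \Big\{ \mathbf{x} \in \mathcal{X}^n : \big|\nu_{\mathbf{x}}(x) - p_X(x)\big| \leq \epsilon\, p_X(x),\ \forall x \in \mathcal{X} \Big\}.$$
Definition 1 (Total Variation) Let $(\Omega, \mathcal{F})$ be a measurable space and µ and ν be two probability measures on that space. The total variation between µ and ν is
$$\|\mu - \nu\|_{\mathrm{TV}} = \sup_{A \in \mathcal{F}} \big|\mu(A) - \nu(A)\big|. \quad (3a)$$
If the sample space Ω is countable, $p, q \in \mathcal{P}(\Omega)$ and $\mathbb{P}_p$ and $\mathbb{P}_q$ are the probability measures induced by p and q, respectively, then (3a) reduces to
$$\|\mathbb{P}_p - \mathbb{P}_q\|_{\mathrm{TV}} = \frac{1}{2} \sum_{\omega \in \Omega} \big|p(\omega) - q(\omega)\big|. \quad (3b)$$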
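The empirical PMF, the letter-typicality test and the countable-space formula for total variation translate directly into code. The Python sketch below assumes that PMFs are dicts from letters to probabilities, and it uses the letter-typicality condition $|\nu_{\mathbf{x}}(x) - p_X(x)| \leq \epsilon\, p_X(x)$ for every letter x (our assumed form of the definition). The function names are illustrative only.

```python
from collections import Counter


def empirical_pmf(x, alphabet):
    """nu_x(a): fraction of positions i with x_i = a, for each letter a."""
    counts = Counter(x)
    return {a: counts[a] / len(x) for a in alphabet}


def is_letter_typical(x, p, eps):
    """Assumed membership test for T_eps^n(p): |nu_x(a) - p(a)| <= eps * p(a) for all a."""
    if any(letter not in p for letter in x):
        return False  # a letter outside the alphabet of p can never be typical
    nu = empirical_pmf(x, p)
    return all(abs(nu[a] - p[a]) <= eps * p[a] for a in p)


def total_variation(p, q):
    """(1/2) * sum_w |p(w) - q(w)| for PMFs given as dicts on a countable space."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)
```

Note that letters with $p_X(x) = 0$ must not appear at all in a typical sequence, since the allowed deviation $\epsilon\, p_X(x)$ is then zero.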

B. Problem Setup
We study the SD-WTC with non-causal encoder CSI, for which we establish a novel achievable region of semantically secured message-key rate pairs.
Let $\mathcal{S}$, $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be finite sets. The $(\mathcal{S}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}, W_S, W_{Y,Z|S,X})$ discrete and memoryless (DM) SD-WTC with non-causal encoder CSI is shown in Fig. 1. A state sequence $\mathbf{s} \in \mathcal{S}^n$ is sampled in an i.i.d. manner according to $W_S$ and revealed in a non-causal fashion to the sender. Independently of the observation of $\mathbf{s}$, the sender chooses a message m from the set $[1 : 2^{nR_M}]$ and maps the pair $(\mathbf{s}, m)$ onto a channel input sequence $\mathbf{x} \in \mathcal{X}^n$ and a key index $k \in [1 : 2^{nR_K}]$ (the mapping may be random). The sequence $\mathbf{x}$ is transmitted over the SD-WTC with transition probability $W_{Y,Z|S,X}: \mathcal{S} \times \mathcal{X} \to \mathcal{P}(\mathcal{Y} \times \mathcal{Z})$. The output sequences $\mathbf{y} \in \mathcal{Y}^n$ and $\mathbf{z} \in \mathcal{Z}^n$ are observed by the receiver and the eavesdropper, respectively. Based on $\mathbf{y}$, the receiver produces the pair $(\hat{m}, \hat{k})$, its estimate of (m, k). The eavesdropper tries to glean whatever it can about the message-key pair from $\mathbf{z}$.

Definition 2 (Code) An $(n, R_M, R_K)$-code $c_n$ for the SD-WTC with non-causal encoder CSI, with a message set $\mathcal{M}_n \triangleq [1 : 2^{nR_M}]$ and a key set $\mathcal{K}_n \triangleq [1 : 2^{nR_K}]$, is a pair of functions $(f_n, \phi_n)$ such that
1) $f_n: \mathcal{M}_n \times \mathcal{S}^n \to \mathcal{P}(\mathcal{K}_n \times \mathcal{X}^n)$ is a stochastic encoder;
2) $\phi_n: \mathcal{Y}^n \to \mathcal{M}_n \times \mathcal{K}_n$ is the decoding function.
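For any message distribution $p_M \in \mathcal{P}(\mathcal{M}_n)$ and any $(n, R_M, R_K)$-code $c_n$, the induced joint PMF is
$$p^{(c_n)}(m, \mathbf{s}, k, \mathbf{x}, \mathbf{y}, \mathbf{z}, \hat{m}, \hat{k}) = p_M(m)\, W^n_S(\mathbf{s})\, f_n(k, \mathbf{x}|m, \mathbf{s})\, W^n_{Y,Z|S,X}(\mathbf{y}, \mathbf{z}|\mathbf{s}, \mathbf{x})\, \mathbb{1}_{\{(\hat{m}, \hat{k}) = \phi_n(\mathbf{y})\}}. \quad (4)$$
The probability measure induced by $p^{(c_n)}$ is $\mathbb{P}_{p^{(c_n)}}$. The performance of $c_n$ is evaluated in terms of its rate pair $(R_M, R_K)$, its maximal decoding error probability, the key uniformity and independence metric, and the SS metric.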
and the subscript $p^{(c_n)}$ denotes that the underlying PMF is (4).
where, for any $m \in \mathcal{M}_n$, $\delta_m(c_n) \triangleq$ … $p^{(c_n)}$, and $p^{(U)}_{K_n}$ is the uniform PMF over $\mathcal{K}_n$.

Definition 5 (Information Leakage and SS Metric) The information leakage to the eavesdropper under the $(n, R_M, R_K)$-code $c_n$ and the message PMF $p_M \in \mathcal{P}(\mathcal{M}_n)$ is $\ell(p_M, c_n) \triangleq I_{p^{(c_n)}}(M, K; \mathbf{Z})$, where the subscript $p^{(c_n)}$ denotes that the MI is taken with respect to (4). The SS metric with respect to $c_n$ is the maximal leakage $\max_{p_M \in \mathcal{P}(\mathcal{M}_n)} \ell(p_M, c_n)$.

Definition 6 (Achievability) A pair $(R_M, R_K) \in \mathbb{R}^2_+$ is called an achievable SS message-key rate pair for the SD-WTC with non-causal encoder CSI if, for every ε > 0 and sufficiently large n, there exists an $(n, R_M, R_K)$-code $c_n$ whose maximal error probability, key uniformity and independence metric, and SS metric are all at most ε.

Definition 7 (SS-Capacity) The SS message-key capacity region $\mathcal{C}_{\mathrm{Sem}}$ of the SD-WTC with non-causal encoder CSI is the convex closure of the set of all achievable SS message-key rate pairs.
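For a fixed code, the maximization over $p_M$ in the SS metric is a channel-capacity computation: the code, the state and the channel together induce a channel from the message to the eavesdropper's observation. The Python sketch below covers only a simplified case with no key ($R_K = 0$) and an induced channel that is given as a small table `channel[m][z]`. This is an assumption, since for realistic block lengths that table is far too large to enumerate. It evaluates $\max_{p_M} I(M; \mathbf{Z})$ with the standard Blahut-Arimoto iteration, which is our choice of algorithm and not a step taken from the paper.

```python
import math


def max_leakage(channel, iters=1000, tol=1e-12):
    """Blahut-Arimoto estimate of max over p_M of I(M;Z) in bits.

    channel[m][z] is the probability that the eavesdropper observes z when
    message m is sent. Returns (leakage, message PMF after the last update)."""
    num_m, num_z = len(channel), len(channel[0])
    p = [1.0 / num_m] * num_m
    lower = 0.0
    for _ in range(iters):
        # Output PMF induced by the current message distribution.
        q = [sum(p[m] * channel[m][z] for m in range(num_m)) for z in range(num_z)]
        # Relative entropy D(p_{Z|M=m} || q) in bits for every message m.
        div = [sum(w * math.log2(w / q[z]) for z, w in enumerate(row) if w > 0)
               for row in channel]
        weights = [p[m] * 2 ** div[m] for m in range(num_m)]
        total = sum(weights)
        # log2(total) and max(div) sandwich the maximal leakage.
        lower, upper = math.log2(total), max(div)
        p = [w / total for w in weights]
        if upper - lower < tol:
            break
    return lower, p
```

A channel whose rows are identical leaks nothing for every message PMF, so the SS metric is zero; a binary symmetric induced channel with crossover 0.1 leaks $1 - h(0.1) \approx 0.531$ bits at its worst-case (uniform) message PMF.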

III. MAIN RESULT
The main result of this work is a novel inner bound on the SS message-key capacity region of the SD-WTC with non-causal encoder CSI. Our achievable region is at least as good as the best known achievability results for the considered problem, and is strictly larger in some cases. To state our main result, let $\mathcal{U}$ and $\mathcal{V}$ be finite sets and, for any $q_{U,V,X|S}: \mathcal{S} \to \mathcal{P}(\mathcal{U} \times \mathcal{V} \times \mathcal{X})$, define $\mathcal{R}_A(q_{U,V,X|S})$ as the set of all rate pairs $(R_M, R_K) \in \mathbb{R}^2_+$ satisfying the inequalities in (9), where the MI terms are calculated with respect to the joint PMF $W_S q_{U,V,X|S} W_{Y,Z|S,X}$, under which $(U, V) - (S, X) - (Y, Z)$ forms a Markov chain.
Theorem 1 (SS Message-Key Capacity Inner Bound) The following inclusion holds:
$$\mathcal{R}_A \triangleq \bigcup_{q_{U,V,X|S}} \mathcal{R}_A(q_{U,V,X|S}) \subseteq \mathcal{C}_{\mathrm{Sem}}.$$
The proof of Theorem 1 is given in Section VI, and is based on a secured superposition coding scheme. An over-populated two-layered superposition codebook is constructed (independently of the state sequence), in which the entire secret message is encoded in the outer layer; no data is carried by the inner layer. The likelihood encoder [25] uses the redundancies in the inner and outer codebooks to correlate the transmitted codewords with the observed state sequence. In doing so, the encoder declares part of the correlation index from the outer layer as the key. The inner layer is designed to utilize the part of the channel that is better observable by the eavesdropper. This saturates the eavesdropper with redundant information and leaves him/her with insufficient resources to extract any information on the SM-SK pair from the outer layer. The legitimate decoder, on the other hand, decodes both layers of the codebook and declares the appropriate indices as the decoded message-key pair.
Remark 3 (Interpretation of Theorem 1) To get some intuitive understanding of the result of Theorem 1, we examine $\mathcal{R}_A(q_{U,V,X|S})$ from two different perspectives: when the joint PMF $W_S q_{U,V,X|S} W_{Y,Z|S,X}$ is such that $I(U;Y) \geq I(U;S)$, and when the opposite inequality holds.
When $I(U;Y) \geq I(U;S)$, the third rate bound in $\mathcal{R}_A(q_{U,V,X|S})$ becomes redundant and the dominating bounds are (11a) and (11b). The right-hand side (RHS) of (11a) is the total rate of reliable (secured and unsecured) communication that our superposition codebook supports. This clearly bounds the rate of the SM that may be transmitted. For (11b), the MI difference on the RHS is the total rate of secrecy resources that are produced by the outer layer of the codebook.

Since the security of our SM-SK pair comes entirely from that outer layer, this MI difference is an upper bound on the sum of rates.
For the opposite case, if $I(U;Y) < I(U;S)$, then the second inequality in $\mathcal{R}_A$ is inactive and we are left with (12a) and (12b). While the interpretation of (12a) remains as before, to understand (12b) consider the following. Rewriting the bounds on $R_M + R_K$ from (9), it is evident that maximizing only over joint PMFs satisfying $I(U;Z) \geq \max\{I(U;Y), I(U;S)\}$ attains optimality.
Indeed, if the opposite inequality holds, one could always choose $\tilde{V} = (U, V)$ and $\tilde{U} = \emptyset$ to achieve higher rates.
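The regimes in Remark 3 hinge on comparing $I(U;Y)$, $I(U;S)$ and $I(U;Z)$ under the joint PMF $W_S q_{U,V,X|S} W_{Y,Z|S,X}$. Given that joint PMF as a finite table, these terms can be evaluated with a short helper such as the Python sketch below. The dictionary representation, the variable names and the small numerical tolerance in the comparisons are assumptions made for illustration.

```python
import math
from collections import defaultdict


def entropy(joint, names, keep):
    """Entropy in bits of the variables in keep, from a joint PMF {outcome tuple: prob}."""
    idx = [names.index(v) for v in keep]
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def mutual_information(joint, names, a, b, given=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), in bits."""
    a, b, c = list(a), list(b), list(given)
    return (entropy(joint, names, a + c) + entropy(joint, names, b + c)
            - entropy(joint, names, a + b + c) - entropy(joint, names, c))


def inner_layer_regime(joint, names, tol=1e-12):
    """Check the two inner-layer conditions discussed in Remark 3."""
    i_uy = mutual_information(joint, names, ["U"], ["Y"])
    i_us = mutual_information(joint, names, ["U"], ["S"])
    i_uz = mutual_information(joint, names, ["U"], ["Z"])
    return {
        "I(U;Y) >= I(U;S)": i_uy >= i_us - tol,
        "I(U;Z) >= max{I(U;Y), I(U;S)}": i_uz >= max(i_uy, i_us) - tol,
    }
```

For example, if S is a uniform bit and U = Y = Z = S, all three terms equal 1 bit and both conditions hold with equality.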

Adapting Theorem 1 to the Rate-Equivocation Framework
A confidential transmission of a SM requires channel resources for both reliability and security. The lesser of the two resources, therefore, limits the feasible transmission rates. The main focus of this paper is utilization of the residual secrecy resources that the SD-WTC offers. However, if secrecy is the lesser resource, the superior capability of the channel to support reliable communication may be utilized by considering a Rate-Equivocation framework.
Theorem 1 naturally extends to an inner bound on the rate-equivocation region of the considered SD-WTC [5], [27]. Equivocation represents the portion of the message that can be secured from the eavesdropper. Intuitively, it answers the question of how much information the eavesdropper lacks for decoding the entire message. The rate-equivocation framework enables communicating at rates higher than the secrecy capacity, as long as full secrecy is forfeited. Since equivocation has added value over full secrecy only when the channel offers more resources for reliable communication than for security, for simplicity we assume $R_K = 0$.
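As a toy illustration of equivocation (a hypothetical example, not taken from the paper), suppose a uniform 2-bit message is sent over n = 2 channel uses and the eavesdropper learns exactly its first bit. Then $H(M|\mathbf{Z}) = 1$ bit and the normalized equivocation is 0.5 bit per channel use. The Python sketch computes $\frac{1}{n} H(M|\mathbf{Z})$ from any tabulated joint PMF of $(M, \mathbf{Z})$; the function name and the dict representation are assumptions.

```python
import math
from collections import defaultdict


def equivocation_rate(joint_mz, n):
    """(1/n) * H(M|Z) in bits, for a joint PMF given as {(m, z): probability}."""
    p_z = defaultdict(float)
    for (_, z), prob in joint_mz.items():
        p_z[z] += prob
    h = -sum(prob * math.log2(prob / p_z[z])
             for (_, z), prob in joint_mz.items() if prob > 0)
    return h / n


# Uniform 2-bit message; the eavesdropper observes only the first bit.
leaky = {((b1, b2), b1): 0.25 for b1 in (0, 1) for b2 in (0, 1)}
print(equivocation_rate(leaky, 2))  # 0.5
```

When $\mathbf{Z}$ is independent of M, the equivocation equals the full message rate, which corresponds to perfect secrecy.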
Formally, the equivocation rate of an (n, R)-code $c_n$ is $\frac{1}{n} H_{p^{(c_n)}}(M|\mathbf{Z})$, where $p^{(c_n)}$ is the PMF from (4) and M is a uniformly distributed message. The achievability of a rate-equivocation pair $(R, R_E) \in \mathbb{R}^2_+$ requires the existence of a sequence of (n, R)-codes $\{c_n\}_{n \in \mathbb{N}}$ with a vanishing error probability and an equivocation rate satisfying $\liminf_{n \to \infty} \frac{1}{n} H_{p^{(c_n)}}(M|\mathbf{Z}) \geq R_E$. An adaptation of the arguments from the proof of Theorem 1 (see Section VI) shows that any rate-equivocation pair $(R, R_E)$ satisfying (14) for some PMF $q_{U,V,X|S}$ that induces a joint distribution $W_S q_{U,V,X|S} W_{Y,Z|S,X}$ is achievable.
To prove this inner bound, we follow the derivation from Section VI, while replacing the message M therein with a pair of uniformly distributed messages $\tilde{M} \triangleq (M, M')$ of rates $R_E$ and $R - R_E$, respectively; the total rate of communication is R. To ensure that the distribution approximation arguments from Lemma 1 and the error probability analysis hold, it suffices that Inequalities (36) and (41) … To satisfy the equivocation requirement, the security analysis is only carried out with respect to M. Therefore, Inequality (54) is replaced by … We conclude by noting that securing M implies the desired equivocation for $\tilde{M}$: … where the multi-letter MI and entropy terms above are taken with respect to the distribution induced by the extracted (reliable and secure) sequence of codes. Applying Fourier-Motzkin elimination to remove $R_1$ and $R_2$ produces (14).

Fig. 2. The SD less-noisy-eavesdropper WTC with a key.
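Fourier-Motzkin elimination removes an auxiliary rate by pairing every inequality in which it has a positive coefficient with every inequality in which it has a negative one. A minimal Python sketch of a single elimination step on inequalities $\mathbf{a} \cdot \mathbf{x} \leq b$ follows, using exact rational arithmetic. It is a generic illustration with hypothetical names, not the derivation of (14), and it does not prune redundant inequalities beyond trivially true ones.

```python
from fractions import Fraction
from itertools import product


def fm_eliminate(rows, var):
    """One Fourier-Motzkin step: eliminate x[var] from the rows a . x <= b.

    rows is a list of (a, b) with a a tuple of Fractions and b a Fraction.
    Redundant (non-trivial) rows are not pruned."""
    pos = [r for r in rows if r[0][var] > 0]
    neg = [r for r in rows if r[0][var] < 0]
    out = [r for r in rows if r[0][var] == 0]
    for (ap, bp), (an, bn) in product(pos, neg):
        cp, cn = ap[var], -an[var]  # both strictly positive
        # cn * (row from pos) + cp * (row from neg) has a zero coefficient on x[var].
        a = tuple(cn * x + cp * y for x, y in zip(ap, an))
        out.append((a, cn * bp + cp * bn))
    # Drop trivially true rows of the form 0 <= b with b >= 0.
    return [(a, b) for a, b in out if any(a) or b < 0]


F = Fraction
# Variables (R, R1): -R1 <= 0, R - R1 <= 1, R1 <= 2.
system = [((F(0), F(-1)), F(0)), ((F(1), F(-1)), F(1)), ((F(0), F(1)), F(2))]
print(fm_eliminate(system, 1))  # [((Fraction(1, 1), Fraction(0, 1)), Fraction(3, 1))]
```

Eliminating $R_1$ from $-R_1 \leq 0$, $R - R_1 \leq 1$ and $R_1 \leq 2$ leaves the single inequality $R \leq 3$. Repeating the step once per auxiliary rate is how rates such as $R_1$ and $R_2$ are removed.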

IV. TIGHT SECRECY CAPACITY RESULTS
An operationally appealing special case of the considered SD-WTC is the following. Assume that $W_{Y,Z|S,X}$ is such that the eavesdropper's channel is less noisy than the main channel, but that the legitimate parties share a SK $\mathbf{L} \sim W^n_L$ (independent of the state sequence $\mathbf{S} \sim W^n_S$), using which they secure the confidential data. The setup is illustrated in Fig. 2.
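Formally, let $\mathcal{L}$, $\mathcal{S}$, $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be the alphabets of the key, the state, the channel input and the two channel outputs, respectively. The considered instance is the $(\tilde{\mathcal{S}}, \mathcal{X}, \tilde{\mathcal{Y}}, \mathcal{Z}, W_{\tilde{S}}, W_{\tilde{Y},Z|\tilde{S},X})$ SD-WTC with $\tilde{\mathcal{S}} = \mathcal{L} \times \mathcal{S}$, $W_{\tilde{S}} = W_L W_S$ and $\tilde{\mathcal{Y}} = \mathcal{L} \times \mathcal{Y}$, whose channel transition matrix factors as
$$W_{\tilde{Y},Z|\tilde{S},X}\big((\tilde{l}, y), z \,\big|\, (l, s), x\big) = \mathbb{1}_{\{\tilde{l} = l\}}\, W_{Y,Z|S,X}(y, z|s, x),$$
where $W_{Y,Z|S,X}$ is such that Z is less noisy than Y. A less noisy Z means that $I(U;Y) \leq I(U;Z)$ for every random variable U for which $U - (S, X) - (Y, Z)$ forms a Markov chain. We refer to this special case as the SD less-noisy-eavesdropper WTC with a key.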
Theorem 1 applies here since the above case is a certain instance of a SD-WTC with non-causal encoder CSI. As subsequently shown, the obtained inner bound is tight, thus characterizing the SS SM-SK secrecy capacity region of the SD less-noisy-eavesdropper WTC with a key. The following corollary states the result.

Corollary 1 (SM-SK Capacity Region) The SS SM-SK capacity region of the SD less-noisy-eavesdropper WTC with a key is the set of all SM-SK rate pairs
where the MI terms in (18a) are taken with respect to the joint PMF $W_S q_{U,X|S} W_{Y|S,X}$.
The proof of Corollary 1 is relegated to Appendix A. Note that while (18a) bounds the total communication rate as a function only of the communication channel, (18b) bounds the total secrecy rate depending solely on the secret source.
A direct consequence of Corollary 1 is that when no SK is to be established between the legitimate parties, i.e., $R_K = 0$, the best attainable SM rate is
$$\min\Big\{ \max_{q_{U,X|S}} \big[I(U;Y) - I(U;S)\big],\ H(L) \Big\}. \quad (19)$$
A simple separation-based coding scheme achieves the secrecy capacity from (19). Namely, using a capacity-achieving error-correcting code, the channel is effectively converted into a reliable bit-pipe. Each of the legitimate parties compresses L into an (approximately) uniform random variable, which is then used to encrypt the SM via a one-time pad. The encrypted message is transmitted over the reliable bit-pipe. Therefore, the achievable SM rate equals the minimum between the capacity of the channel, $\max_{q_{U,X|S}} I(U;Y) - I(U;S)$, and the rate of the key, $H(L)$.
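The separation-based scheme can be illustrated with a short Python sketch. It assumes that compressing L has already produced exactly uniform key bits and that the capacity-achieving code delivers an error-free bit-pipe, so only the one-time pad and the resulting rate $\min\{\text{capacity}, H(L)\}$ are modeled. The channel capacity is passed in as a number rather than computed, and all names are illustrative.

```python
import math
import secrets
from itertools import product


def binary_entropy(lam):
    """H(L) in bits for L ~ Ber(lam)."""
    if lam in (0.0, 1.0):
        return 0.0
    return -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)


def separation_sm_rate(channel_capacity, key_entropy):
    """SM rate of the separation scheme: limited by the bit-pipe and by the key."""
    return min(channel_capacity, key_entropy)


def one_time_pad(bits, key_bits):
    """XOR every bit with a fresh key bit; applying it twice recovers the input."""
    if len(key_bits) < len(bits):
        raise ValueError("need at least as many key bits as message bits")
    return [b ^ k for b, k in zip(bits, key_bits)]


def ciphertext_counts(message):
    """How often each ciphertext occurs when the key ranges over all bit strings."""
    counts = {}
    for key in product((0, 1), repeat=len(message)):
        c = tuple(one_time_pad(message, list(key)))
        counts[c] = counts.get(c, 0) + 1
    return counts


if __name__ == "__main__":
    key = [secrets.randbelow(2) for _ in range(8)]  # stands in for the compressed source L
    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    assert one_time_pad(one_time_pad(msg, key), key) == msg
    print(separation_sm_rate(0.5, binary_entropy(0.11)))
```

Enumerating all keys for a fixed message yields each ciphertext exactly once, which is why the padded message is independent of the SM when the key is uniform.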
While this scheme may seem very natural, to the best of our knowledge, none of the past achievability results for the SD-WTC with non-causal CSI prior to [19] attain its performance. In Section V-A1, a special case of this setup is used to demonstrate the improvement of our result over the previous benchmark achievable SM-SK region for the SD-WTC from [24].

V. PREVIOUS RESULTS AS SPECIAL CASES
We compare the result of Theorem 1 to those from related past works. The previously best known inner bound on the SM-SK trade-off region attainable over the considered SD-WTC is [24, Theorem 1]. The next subsection restates this inner bound and shows that Theorem 1 can strictly outperform it. Afterwards, we provide a comparison to the best past achievability results for only SM transmission [19] or only SK agreement [23]. The achievability result from [19] captures the previous lower bounds on the secrecy capacity of the SD-WTC from [18], [28], [29].
The SK achievability results from [23] subsume previous lower bounds on the SK generation rate, such as [22], [30]. Regarding the relations among the three benchmarks that we use to evaluate the performance of Theorem 1, we note that while [19] recovers [24] when there is only a SM ($R_K = 0$), [23] and [24] do not imply one another.

Remark 5 Another result on SK generation over SD-WTCs with non-causal CSI is found in [31]. Following the steps of the proof of [31, Theorem 1], it appears that another constraint was assumed without being explicitly stated. Following the notations from [31], the missing constraint seems to be …, which would assure decodability of the inner code layer by the legitimate receiver without relying on the outer layer. Taking the additional constraint into consideration, our inner bound from Theorem 1 recovers the amended Theorem 1 from [31] as follows.
We use $(\tilde{U}, \tilde{V}, X, S, \tilde{Y}, Z)$ to denote the inner layer, the outer layer, the channel input, the encoder CSI, and the observations of the legitimate receiver and the eavesdropper, respectively, in Theorem 1 of [31]. These were originally denoted, respectively, by W, U, X, S, Y and Ž. To adjust our model to that of [31], we identify … 2) Φ independent of $(\tilde{U}, \tilde{V}, X, S, \tilde{Y}, Z)$ with maximal entropy, i.e., such that $H(\Phi) = C_P$.
With respect to the above, substituting (U, V, X, Y, Z, S) into (9) and maximizing only over distributions that satisfy To conclude the discussion of [31,Theorem 1] (in its original form), Appendix B provides a specific example that shows the rates from that achievability formula to be exceeding the SK capacity.

A. SM-SK Trade-off Region
The result of Theorem 1 recovers the previously best known achievable SM-SK trade-off region over the SD-WTC with non-causal encoder CSI [24]. In [24, Theorem 1] the following region was established: … where, for any $q_U \in \mathcal{P}(\mathcal{U})$ and $q_{V,X|U,S}: \mathcal{U} \times \mathcal{S} \to \mathcal{P}(\mathcal{V} \times \mathcal{X})$, … and the MI terms are taken with respect to $W_S q_U q_{V,X|U,S} W_{Y,Z|S,X}$, i.e., U and S are independent and $(U, V) - (S, X) - (Y, Z)$ forms a Markov chain. First note that Theorem 1 recovers $\mathcal{R}_{\mathrm{PER}}$ by restricting U to be independent of S in $\mathcal{R}_A$. This is because, for an independent pair (U, S), we have $I(U;S) = 0$, while $I(U,V;Y) \geq I(V;Y|U)$ always holds. Consequently, the third rate bound in $\mathcal{R}_A$ becomes redundant and $\mathcal{R}_{\mathrm{PER}}$ is recovered.
The result from [24] was derived under the weak secrecy metric (i.e., a vanishing normalized MI 1 n I(M, K; Z) between the SM-SK pair and the eavesdropper's observation sequence, where the message is assumed to be uniform).

1) Achieving Strictly Higher Rates: Since [24, Theorem 1] allows only inner layer coding random variables U that are independent of the state, Gelfand-Pinsker coding [17], which generally requires correlating U with S, is not supported in the inner layer. Instead, only Shannon's strategies coding [32], which operates with independent U and S, is allowed. The latter is optimal if the encoder observes the state causally, but is generally sub-optimal when non-causal encoder CSI is available. To demonstrate the improvement of Theorem 1 over [24], we exploit the aforementioned limitation of the scheme therein, along with the observation that it is beneficial to transmit the inner layer of the code over any part of the considered SD-WTC that is better observable by the eavesdropper. Consider the SD-WTC illustrated in Fig. 3, whose transition probability $W_{Y,Z|S,X}$, key $L \sim W_L$ and state $S \sim W_S$ are defined by the three parameters $\lambda, \epsilon, \sigma \in (0, 0.5)$ as follows:
• L, S and E are independent random variables with $L \sim \mathrm{Ber}(\lambda)$, $E \sim \mathrm{Ber}(\epsilon)$ and … The joint distribution of (L, S, E) is denoted by $W_{L,S,E} = W_L W_S W_E$.
• The Memory with Stuck-at-Faults (MSAF) [33] is a deterministic SD channel, driven by a ternary state S. The binary input and output symbols X and G, respectively, are related through the function $g: \mathcal{S} \times \mathcal{X} \to \mathcal{G}$ given by …
• $Z = (S, X)$, i.e., the eavesdropper noiselessly observes the transmitted symbol X and the state random variable S.
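To make the stuck-at behavior concrete, the following Python sketch implements an MSAF output map and the eavesdropper's observation $Z = (S, X)$. The definition of g is not reproduced in this section, so the state labeling below (states 0 and 1 are cells stuck at that value, state 2 is a healthy cell) is a hypothetical convention. The key L and the noise E of Fig. 3 are not modeled.

```python
def msaf_output(s, x):
    """Hypothetical MSAF convention: s = 0 or s = 1 means the cell is stuck at
    that value, s = 2 means the cell is healthy and stores the input bit x."""
    if s not in (0, 1, 2) or x not in (0, 1):
        raise ValueError("state must be ternary and input must be binary")
    return x if s == 2 else s


def observe(states, inputs):
    """Return the MSAF outputs G and the eavesdropper's observation Z = (S, X)."""
    g = [msaf_output(s, x) for s, x in zip(states, inputs)]
    z = list(zip(states, inputs))
    return g, z
```

Because the eavesdropper sees both S and X, nothing written into the memory is hidden from it; the legitimate receiver, by contrast, sees only G.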
As stated in the following proposition, $\mathcal{R}_{\mathrm{PER}}(\lambda, \epsilon, \sigma)$ is strictly below capacity.

Proposition 1 is proven in
where, for any $q_{U,V,X|S}: \mathcal{S} \to \mathcal{P}(\mathcal{U} \times \mathcal{V} \times \mathcal{X})$, … and the MI terms are taken with respect to $W_S q_{U,V,X|S} W_{Y,Z|S,X}$.
$\mathcal{R}_{\mathrm{GCP}}$ is the restriction of $\mathcal{R}_A$ from Theorem 1 to the $R_M$ axis of the $(R_M, R_K)$-plane, i.e., to $R_K = 0$. The main difference between the coding scheme from [19] and our superposition code is the additional index $k \in \mathcal{K}_n$ in the outer layer of the codebook (which also encodes the SM $m \in \mathcal{M}_n$). Along with the other redundancy indices, k is used to correlate the transmission with the observed state sequence via the likelihood encoder [25]. Based on distribution approximation arguments, we show that K is approximately independent of the message M and approximately uniform. The pair (M, K) is known to the transmitter and is reliably decoded by the receiver. Finally, by securing K along with M in our analysis, it is established as a SK.
The intuition behind the SK construction is that, unlike the message, the key does not have to be independent of the state sequence, nor is it chosen by the user. Therefore, the padding that ensures the correlation with the state sequence is a valid key, as long as it is secured.
Since any portion of the SM rate can be reallocated to a SK, (27b) with $R_M$ replaced by $R_M + R_K$ is also an achievable SM-SK trade-off region. $\mathcal{R}_A$ outperforms $\mathcal{R}_{\mathrm{GCP}}$, e.g., in settings where an external random source $\mathbf{L} \sim W^n_L$ is observed by both legitimate parties but not by the eavesdropper, while the capacity of the communication channel is zero (say, Y = Z = 0). For such a setup, the legitimate parties may use the random source to generate a SK of rate H(L). While Theorem 1 supports this strategy, $\mathcal{R}_{\mathrm{GCP}}$ vanishes in …

C. SK Agreement over SD-WTCs
In [23], two achievability schemes were proposed for SK agreement over a WTC when the terminals have access to correlated sources. The corresponding lower bounds do not imply one another. The difference between them is that [23, Theorem 2] is based on source and channel separation, while [23, Theorem 3] relies on joint coding.
The setup in [23] consists of three correlated sources $S_x$, $S_y$ and $S_z$ that are observed by the encoder, the decoder and the eavesdropper, respectively, and a SD-WTC in which the triple $(S_x, S_y, S_z)$ plays the role of the state. Our general framework is defined through the state distribution $W_S$ and the SD-WTC $W_{\tilde{Y},\tilde{Z}|S,X}$. Setting $S = S_x$, $\tilde{Y} = (S_y, Y)$ and $\tilde{Z} = (S_z, Z)$ recovers the model from [23] (see Remark 1).
The first scheme, from [23, Theorem 2], operates under the assumption that the SD-WTC decomposes as $W_{(S_y,Y),(S_z,Z)|S_x,X} = W_{S_y,S_z|S_x} W_{Y,Z|X}$ into a product of two WTCs, one being independent of the state (given the input), while the other depends only on it. Thus, the legitimate receiver (respectively, the eavesdropper) observes not only the output Y (respectively, Z) of the WTC $W_{Y,Z|X}$, but also $S_y$ (respectively, $S_z$), a noisy version of the state sequence drawn according to the corresponding conditional marginal of $W_{S_y,S_z|S_x}$. This scheme shows that the SK capacity $C_{\mathrm{SK}}$ is lower bounded by … where the maximization is over all $q_{\tilde{V}|S_x} q_{\tilde{U}|\tilde{V}}: \mathcal{S}_x \to \mathcal{P}(\tilde{\mathcal{V}} \times \tilde{\mathcal{U}})$ and $q_{Q,T} q_{X|T} \in \mathcal{P}(\mathcal{Q} \times \mathcal{T} \times \mathcal{X})$ that give rise to a joint PMF $W_{S_x,S_y,S_z} q_{\tilde{V}|S_x} q_{\tilde{U}|\tilde{V}} \times q_{Q,T} q_{X|T} W_{Y,Z|X}$ satisfying $I(\tilde{U}; S_x|S_y) \leq I(Q;Y)$ and $I(\tilde{V}; S_x|S_y) \leq I(T;Y)$.
where the maximization is over all $q_{\tilde{V},X|S_x} q_{\tilde{U}|\tilde{V}}: \mathcal{S}_x \to \mathcal{P}(\tilde{\mathcal{V}} \times \mathcal{X} \times \tilde{\mathcal{U}})$ that give rise to a joint PMF … recovers (29). It was shown in [23] that, in some cases, the separation-based scheme achieves strictly higher rates than the joint coding scheme, i.e., that $R^{(\mathrm{Separate})}_{\mathrm{BPS}} > R^{(\mathrm{Joint})}_{\mathrm{BPS}}$. As Theorem 1 captures both these results, it unifies the two schemes from [23] and, in particular, it outperforms $R^{(\mathrm{Joint})}_{\mathrm{BPS}}$. Furthermore, since the results from [23] were derived under the weak secrecy metric, Theorem 1 also upgrades them to SS.

VI. PROOF OF THEOREM 1
The subsequently presented proof follows lines similar to those from the proof of [19,Theorem 1]. Several claims herein are recovered from corresponding assertions in [19] by identifying the index j in [19] with the pair (j, k) in our scheme. The proofs of such claims are omitted, and the reader is referred to [19].
Fix ε > 0 and a conditional PMF $q_{U,V,X|S}: \mathcal{S} \to \mathcal{P}(\mathcal{U} \times \mathcal{V} \times \mathcal{X})$. For any $n \in \mathbb{N}$, let $p_M \in \mathcal{P}(\mathcal{M}_n)$ be the message distribution. We first show that for any $(R_M, R_K) \in \mathcal{R}_A(q_{U,V,X|S})$ there exists a SS sequence of $(n, R_M, R_K)$-codes with a key distribution that is approximately uniform conditioned on any message, and a vanishing average error probability. We then use the expurgation technique [34, Theorem 7.7.1] to ensure a vanishing maximal error probability. This is done without harming the SS and the statistical properties of the key, since they hold for each message in the original message set.
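The expurgation step can be sketched as follows: sort the messages by their error probabilities under the good code and keep the better half. Since fewer than half of the messages can have error probability above twice the average, every retained message has error probability at most twice the average, while the rate decreases by only $1/n$. The Python sketch below is a generic illustration of this argument with hypothetical names; it does not model SS or key uniformity, which, as noted above, are preserved because they hold per message.

```python
def expurgate(error_probs):
    """Indices of the better half (rounded up) of the messages.

    If the average of error_probs is at most eps, Markov's inequality shows that
    fewer than half of the messages have error probability above 2 * eps, so
    every kept message has error probability at most 2 * eps."""
    order = sorted(range(len(error_probs)), key=lambda m: error_probs[m])
    return sorted(order[: (len(error_probs) + 1) // 2])


errors = [0.0, 0.4, 0.1, 0.3]
kept = expurgate(errors)
print(kept)  # [0, 2]
```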
Codebook C n : We use a superposition codebook where the outer layer carries both the SM and the SK. The codebook is constructed independently of S, but has sufficient redundancy to enable correlating the transmission with it.
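Define the index sets $\mathcal{I}_n \triangleq [1 : 2^{nR_1}]$ and $\mathcal{J}_n \triangleq [1 : 2^{nR_2}]$. Let $\mathsf{B}^{(n)}_U \triangleq \{\mathbf{U}(i)\}_{i \in \mathcal{I}_n}$ be a random inner layer codebook, which is a set of random vectors of length n that are i.i.d. according to $q^n_U$. An outcome of $\mathsf{B}^{(n)}_U$ is denoted by $\mathcal{B}_U \triangleq \{\mathbf{u}(i)\}_{i \in \mathcal{I}_n}$. To describe the outer layer codebook, fix $\mathcal{B}_U$ and, for every $i \in \mathcal{I}_n$, let $\mathsf{B}^{(n)}_V(i) \triangleq \{\mathbf{V}(i,j,k,m)\}_{(j,k,m) \in \mathcal{J}_n \times \mathcal{K}_n \times \mathcal{M}_n}$ be a set of random vectors of length n that are conditionally independent given $\mathcal{B}_U$, each distributed according to $q^n_{V|U=\mathbf{u}(i)}$. We also set $\mathsf{B}^{(n)}_V = \{\mathsf{B}^{(n)}_V(i)\}_{i \in \mathcal{I}_n}$ and denote its realizations by $\mathcal{B}_V$. Finally, a random superposition codebook is denoted by $\mathsf{B}_n \triangleq \{\mathsf{B}^{(n)}_U, \mathsf{B}^{(n)}_V\}$, while $\mathcal{B}_n \triangleq \{\mathcal{B}_U, \mathcal{B}_V\}$ denotes a fixed codebook.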
Let $\mathfrak{B}_n$ be the set of all possible outcomes of $\mathsf{B}_n$. The above codebook construction induces a PMF $\mu\in\mathcal{P}(\mathfrak{B}_n)$ over the codebook ensemble. For every $\mathcal{B}_n\in\mathfrak{B}_n$, we have … The encoder and decoder are described next for any superposition codebook $\mathcal{B}_n\in\mathfrak{B}_n$.
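Encoder $f^{(\mathcal{B}_n)}_n$: The encoding function is based on the likelihood encoder [25], which allows us to approximate the induced joint distribution by a simple distribution that we use for the analysis. Given $m\in\mathcal{M}_n$ and $\mathbf{s}\in\mathcal{S}^n$, the encoder randomly chooses $(i,j,k)\in\mathcal{I}_n\times\mathcal{J}_n\times\mathcal{K}_n$ according to
$$f_{\mathrm{LE}}(i,j,k|m,\mathbf{s})=\frac{q^n_{S|U,V}\big(\mathbf{s}\,\big|\,\mathbf{u}(i),\mathbf{v}(i,j,k,m)\big)}{\sum_{(i',j',k')\in\mathcal{I}_n\times\mathcal{J}_n\times\mathcal{K}_n}q^n_{S|U,V}\big(\mathbf{s}\,\big|\,\mathbf{u}(i'),\mathbf{v}(i',j',k',m)\big)},$$
where $q_{S|U,V}$ is the conditional marginal of $q_{S,U,V}$, defined by $q_{S,U,V}(s,u,v)\triangleq\sum_{x\in\mathcal{X}}W_S(s)\,q_{U,V,X|S}(u,v,x|s)$ for every $(s,u,v)\in\mathcal{S}\times\mathcal{U}\times\mathcal{V}$. The encoder declares the chosen index $k\in\mathcal{K}_n$ as the key. The channel input sequence is generated by feeding the chosen $\mathbf{u}$- and $\mathbf{v}$-codewords, along with the state sequence, into the DM channel $q^n_{X|U,V,S}$, i.e., it is sampled from the random vector $\mathbf{X}\sim q^n_{X|U=\mathbf{u}(i),V=\mathbf{v}(i,j,k,m),S=\mathbf{s}}$. Accordingly, the (stochastic) encoding function $f_n:\mathcal{M}_n\times\mathcal{S}^n\to\mathcal{P}(\mathcal{K}_n\times\mathcal{X}^n)$ is given by
$$f^{(\mathcal{B}_n)}_n(k,\mathbf{x}|m,\mathbf{s})=\sum_{(i,j)\in\mathcal{I}_n\times\mathcal{J}_n}f_{\mathrm{LE}}(i,j,k|m,\mathbf{s})\,q^n_{X|U,V,S}\big(\mathbf{x}\,\big|\,\mathbf{u}(i),\mathbf{v}(i,j,k,m),\mathbf{s}\big).$$

Decoder $\phi^{(\mathcal{B}_n)}_n$: Upon observing $\mathbf{y}\in\mathcal{Y}^n$, the decoder searches for a unique tuple $(\hat i,\hat j,\hat k,\hat m)\in\mathcal{I}_n\times\mathcal{J}_n\times\mathcal{K}_n\times\mathcal{M}_n$ such that $\big(\mathbf{u}(\hat i),\mathbf{v}(\hat i,\hat j,\hat k,\hat m),\mathbf{y}\big)\in\mathcal{T}^n_{\epsilon}(q_{U,V,Y})$. If such a unique quadruple is found, then set $\phi^{(\mathcal{B}_n)}_n(\mathbf{y})=(\hat m,\hat k)$.

For any message distribution $p_M\in\mathcal{P}(\mathcal{M}_n)$ and codebook $\mathcal{B}_n\in\mathfrak{B}_n$, the induced joint distribution $p^{(\mathcal{B}_n)}$ over $\mathcal{M}_n\times\mathcal{S}^n\times\mathcal{I}_n\times\mathcal{J}_n\times\mathcal{K}_n\times\mathcal{U}^n\times\mathcal{V}^n\times\mathcal{X}^n\times\mathcal{Y}^n\times\mathcal{Z}^n\times\mathcal{M}_n\times\mathcal{K}_n$ is
$$p^{(\mathcal{B}_n)}(m,\mathbf{s},i,j,k,\mathbf{u},\mathbf{v},\mathbf{x},\mathbf{y},\mathbf{z},\hat m,\hat k)=p_M(m)\,W^n_S(\mathbf{s})\,f_{\mathrm{LE}}(i,j,k|m,\mathbf{s})\,\mathbb{1}_{\{\mathbf{u}=\mathbf{u}(i),\,\mathbf{v}=\mathbf{v}(i,j,k,m)\}}\,q^n_{X|U,V,S}(\mathbf{x}|\mathbf{u},\mathbf{v},\mathbf{s})\,W^n_{Y,Z|S,X}(\mathbf{y},\mathbf{z}|\mathbf{s},\mathbf{x})\,\mathbb{1}_{\{(\hat m,\hat k)=\phi^{(\mathcal{B}_n)}_n(\mathbf{y})\}}.$$
If $p_M=p^{(U)}_{\mathcal{M}_n}$, i.e., the message distribution is uniform, we write $\bar p^{(\mathcal{B}_n)}$ instead of $p^{(\mathcal{B}_n)}$.

Approximating Distribution: We now show that, with high probability, $p^{(\mathcal{B}_n)}$ is close in total variation to another distribution $\pi^{(\mathcal{B}_n)}$, which lends itself to simpler reliability and security analyses. For any $p_M\in\mathcal{P}(\mathcal{M}_n)$ and $\mathcal{B}_n\in\mathfrak{B}_n$, let
$$\pi^{(\mathcal{B}_n)}(m,\mathbf{s},i,j,k,\mathbf{u},\mathbf{v},\mathbf{x},\mathbf{y},\mathbf{z},\hat m,\hat k)=\frac{p_M(m)}{|\mathcal{I}_n||\mathcal{J}_n||\mathcal{K}_n|}\,\mathbb{1}_{\{\mathbf{u}=\mathbf{u}(i),\,\mathbf{v}=\mathbf{v}(i,j,k,m)\}}\,q^n_{S|U,V}(\mathbf{s}|\mathbf{u},\mathbf{v})\,q^n_{X|U,V,S}(\mathbf{x}|\mathbf{u},\mathbf{v},\mathbf{s})\,W^n_{Y,Z|S,X}(\mathbf{y},\mathbf{z}|\mathbf{s},\mathbf{x})\,\mathbb{1}_{\{(\hat m,\hat k)=\phi^{(\mathcal{B}_n)}_n(\mathbf{y})\}}.\tag{35}$$
As before, $\bar\pi^{(\mathcal{B}_n)}$ stands for $\pi^{(\mathcal{B}_n)}$ when $p_M=p^{(U)}_{\mathcal{M}_n}$. The following lemma states sufficient conditions for $\pi^{(\mathcal{B}_n)}$ to be a good approximation (in total variation) of $p^{(\mathcal{B}_n)}$ with double-exponential certainty.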
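The likelihood encoder described above picks $(i,j,k)$ with probability proportional to $q^n_{S|U,V}\big(\mathbf{s}\,|\,\mathbf{u}(i),\mathbf{v}(i,j,k,m)\big)$ and then declares $k$ as the key. Below is a minimal sketch for a toy codebook with a fixed message. It assumes the standard likelihood-encoder rule. The function names, the dictionary layout of the codebook, and the PMF `q` in the usage note are all hypothetical and are not taken from the paper.

```python
import random

def likelihood_probs(s, codebook, q_s_given_uv):
    """Likelihood-encoder probabilities for one fixed message m.

    codebook maps an index triple (i, j, k) to a pair (u, v) of length-n tuples,
    i.e., (u(i), v(i, j, k, m)) for the message being sent.
    q_s_given_uv maps a symbol pair (u, v) to a PMF {s: q_{S|U,V}(s|u, v)}.
    """
    scores = {}
    for idx, (u, v) in codebook.items():
        score = 1.0
        for s_t, u_t, v_t in zip(s, u, v):
            score *= q_s_given_uv[(u_t, v_t)][s_t]  # memoryless product q^n_{S|U,V}
        scores[idx] = score
    total = sum(scores.values())  # assumed positive: some codeword pair explains s
    return {idx: sc / total for idx, sc in scores.items()}

def likelihood_encode(s, codebook, q_s_given_uv, rng):
    """Sample (i, j, k) from the likelihood encoder; the last index is the key."""
    probs = likelihood_probs(s, codebook, q_s_given_uv)
    indices = list(probs)
    chosen = rng.choices(indices, weights=[probs[idx] for idx in indices], k=1)[0]
    return chosen, chosen[2]
```

Usage note: with a hypothetical `q` under which the state tends to equal the $V$-symbol, codeword pairs that "explain" $\mathbf{s}$ well are exponentially more likely to be selected. This is how the transmission becomes correlated with the state even though the codebook is generated independently of it.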

Lemma 1 (Sufficient Conditions for Approximation): If the rates satisfy (36), then there exist $\alpha_1,\alpha_2>0$ such that (37) holds for any $n$ large enough. In particular, (38) also holds for any such $n$, where $\xi_S=\min_{s\in\mathrm{supp}(W_S)}W_S(s)>0$. The subscript $\mu$ in $\mathbb{P}_\mu$ and $\mathbb{E}_\mu$ indicates that the probability measure and the expectation are taken with respect to the random codebook $\mathsf{B}_n\sim\mu$.
Lemma 1 essentially restates [19, Lemma 7], with the index $j$ therein replaced here by the pair $(j,k)$. Its proof relies on the strong SCL for superposition codes and some basic properties of total variation. Due to the similarity to [19, Lemma 7], the proof is omitted and the reader is referred to [19].
Lemma 1 is key for analyzing the performance of the proposed code. The reliability analysis that is presented next exploits the convergence of the expected value from (38) to show that the average error probability can be made arbitrarily small. The expurgation method [34,Theorem 7.7.1] is used in a later stage of this proof to upgrade to a vanishing maximal error probability.
Average Error Probability Analysis: The average error probability³ associated with a codebook $\mathcal{B}_n$ is
$$\bar e(\mathcal{B}_n)\triangleq\mathbb{P}_{\bar p^{(\mathcal{B}_n)}}\big((\hat M,\hat K)\neq(M,K)\big).$$
Our next step is to establish that the expected value of $\bar e(\mathsf{B}_n)$ over the codebook ensemble is approximately the same under $\bar p$ and $\bar\pi$. Then, the expected average error probability under $\bar\pi$ is analyzed and shown to converge to zero as $n\to\infty$. Due to the simple structure of $\bar\pi$, this analysis requires nothing but standard typicality arguments.
To this end, we use the following two lemmas.

Lemma 2 (Average Error Probability Under $\bar p$ and $\bar\pi$): The following relation holds: … where $\mathbb{E}_\mu\big\|\bar p^{(\mathsf{B}_n)}-\bar\pi^{(\mathsf{B}_n)}\big\|_{TV}$ is a shorthand for …

The proof of Lemma 2 is found in the Average Error Probability Analysis part of Section VI-B in [19]. Lemma 3 is also proven in the same reference by standard typicality decoding arguments. We stress that the conditions in (41) ensure reliable decoding of the four indices $(i,j,k,m)$ and, in particular, of the SM-SK pair $(m,k)$.
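Lemma 2 rests on an elementary fact: two distributions that are close in total variation assign nearly the same probability to every event, including the decoding-error event. Below is a small brute-force check of that fact. It assumes the normalization $\|p-q\|_{TV}=\frac{1}{2}\sum_x|p(x)-q(x)|$, under which the largest gap over all events equals the TV distance exactly. The helper names and the toy PMFs are hypothetical.

```python
from itertools import chain, combinations

def tv_distance(p, q):
    """Total variation distance, normalized as half the L1 distance."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def max_event_gap(p, q):
    """max over all events A of |p(A) - q(A)|, by enumerating every subset."""
    support = sorted(set(p) | set(q))
    events = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(p.get(x, 0.0) - q.get(x, 0.0) for x in event)) for event in events)

p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.4, "b": 0.4, "c": 0.2}
print(tv_distance(p, q), max_event_gap(p, q))  # both equal 0.1 up to rounding
```

Applied to the event $\{(\hat M,\hat K)\neq(M,K)\}$, this shows that the error probability under $\bar p^{(\mathcal{B}_n)}$ exceeds the one under $\bar\pi^{(\mathcal{B}_n)}$ by at most their TV distance. Averaging over $\mathsf{B}_n\sim\mu$ then gives a relation of the type stated in Lemma 2.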
Combining the claims of Lemmas 2-3 with (38) from Lemma 1, we have that, as long as (41) and (36) are satisfied, $\lim_{n\to\infty}\mathbb{E}_\mu\,\bar e(\mathsf{B}_n)=0$.

Key Analysis: The structure of $\pi^{(\mathcal{B}_n)}$ from (35) implies that for any $\mathcal{B}_n\in\mathfrak{B}_n$ and $m\in\mathcal{M}_n$ we have $\pi^{(\mathcal{B}_n)}_{K|M=m}=p^{(U)}_{\mathcal{K}_n}$. Adopting the same abuse of notation we used for the reliability analysis, we use Lemma 1 to upper bound the probability that $\delta(\mathsf{B}_n)$ does not decay exponentially fast to zero as $n$ grows. Therefore, assuming (36) holds, there exist $\eta_1,\eta_2>0$ such that … where (a) is by (37) from Lemma 1. We proceed with the security analysis.
Security Analysis: This part mainly deals with analyzing the SS metric under the distribution π (Bn) . The following lemma explains the reason for doing so. It states that if SS is attained for a codebook B n ∈ B n under π (Bn) then it is also attained under p (Bn) .
Lemma 4 (SS for Induced vs. Approximating Distribution): Let $\mathcal{B}_n\in\mathfrak{B}_n$ and $\beta_1>0$ be such that (45) holds for all $p_M\in\mathcal{P}(\mathcal{M}_n)$ and $n$ sufficiently large (independent of $p_M$). Then, there exists $\beta_2>0$ such that for all $p_M\in\mathcal{P}(\mathcal{M}_n)$ and large enough values of $n$ (independent of $p_M$), we have … where the subscripts $p^{(\mathcal{B}_n)}$ and $\pi^{(\mathcal{B}_n)}$ indicate that a mutual information term is calculated with respect to the corresponding PMF.
The proof of Lemma 4 extends that of [19,Lemma 8], and is provided in Appendix D.
The hypothesis from (45) … This implies that any codebook for which the RHS of (47) is small is SS.
The following two lemmas state conditions under which the RHS of (51) vanishes exponentially fast in $n$ with probability double-exponentially close to 1.

Lemma 5 (Total Variation Dominates Relative Entropy)
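Let $\mathcal{X}$ and $\mathcal{Y}$ be finite sets, and for any $n\in\mathbb{N}$ let $p_{\mathbf{X}}\in\mathcal{P}(\mathcal{X}^n)$, $p_{\mathbf{Y}|\mathbf{X}}:\mathcal{X}^n\to\mathcal{P}(\mathcal{Y}^n)$ and $q_{Y|X}:\mathcal{X}\to\mathcal{P}(\mathcal{Y})$. If $p_{\mathbf{Y}|\mathbf{X}=\mathbf{x}}\ll q^n_{Y|X=\mathbf{x}}$ for all $\mathbf{x}\in\mathcal{X}^n$, i.e., $p_{\mathbf{Y}|\mathbf{X}=\mathbf{x}}$ is absolutely continuous with respect to $q^n_{Y|X=\mathbf{x}}$, then … where $\xi_{Y|X}=\min_{(x,y)\in\mathcal{X}\times\mathcal{Y}:\,q_{Y|X}(y|x)>0}q_{Y|X}(y|x)$.

Lemma 5 is [19, Lemma 9] and its proof is omitted. It is readily verified that $\pi$ … for any sufficiently large $n$, then there exists $\zeta_2>0$ for which … as $n$ grows.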
Lemma 6 (Sufficient Conditions for SS): If the rate tuple $(R_M,R_K,R_1,R_2)\in\mathbb{R}^4_+$ satisfies (36a) and … then there exist $\gamma_1,\gamma_2>0$ such that, for $n$ sufficiently large, …

Combining the lemma with (53), we deduce that there exist $\tau_1,\tau_2>0$ such that … for any sufficiently large $n$.
Code Extraction: The above derivation shows that if (36), (41) and (54) are simultaneously satisfied, then … and, for sufficiently large $n$, we also have … The Selection Lemma [9, Lemma 5] implies the existence of a sequence of superposition codebooks $\{\mathcal{B}_n\}_{n\in\mathbb{N}}$ (an outcome of the random codebook sequence $\{\mathsf{B}_n\}_{n\in\mathbb{N}}$) for which … Since the indicator functions in (58b)-(58c) take only the values 0 and 1, we have that, for any $n$ large enough, … On account of (57a) and (59), $\{\mathcal{B}_n\}_{n\in\mathbb{N}}$ is SS, satisfies the target key statistics, and is reliable with respect to the average error probability.
Our last step is to upgrade $\{\mathcal{B}_n\}_{n\in\mathbb{N}}$ to have a small maximal error probability. This is a standard step that uses the expurgation technique (see, e.g., [34, Theorem 7.7.1]). Namely, once the average error probability is pushed below $\frac{\epsilon}{2}$, Markov's inequality guarantees that at least half of the messages in $\mathcal{M}_n$ have a probability of error of at most $\epsilon$. Throwing away the rest of the messages ensures a maximal error probability of at most $\epsilon$, while reducing the rate by at most $\frac{1}{n}$, which is negligible. Discarding those messages does not harm the SS or the key uniformity and independence metric, thus producing a new sequence of codes that satisfies (8). Applying Fourier-Motzkin elimination to (36), (41) and (54) shows that any SM-SK rate pair $(R_M,R_K)\in\mathcal{R}_A(q_{U,V,X|S})$ is achievable, which concludes the proof.
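The expurgation step amounts to Markov's inequality. If the average error is at most $\epsilon/2$, then fewer than half of the messages can have an error exceeding $\epsilon$. Keeping only the good messages therefore costs at most one bit, i.e., a rate loss of at most $1/n$. The sketch below keeps every message with error at most $\epsilon$, which gives the same guarantee as keeping "the best half". The helpers `expurgate` and `rate_loss` and the per-message error values are hypothetical.

```python
import math

def expurgate(error_probs, eps):
    """Keep only the messages whose error probability is at most eps.

    error_probs maps each message to its error probability (illustrative input).
    If the average error is at most eps/2, Markov's inequality guarantees that
    fewer than half of the messages exceed eps, so at least half are kept.
    """
    average = sum(error_probs.values()) / len(error_probs)
    if average > eps / 2:
        raise ValueError("average error probability must be at most eps/2")
    return {m: e for m, e in error_probs.items() if e <= eps}

def rate_loss(n, num_before, num_after):
    """Rate loss in bits per channel use caused by shrinking the message set."""
    return (math.log2(num_before) - math.log2(num_after)) / n

errors = {0: 0.0, 1: 0.05, 2: 0.3, 3: 0.02, 4: 0.25, 5: 0.0, 6: 0.01, 7: 0.1}
kept = expurgate(errors, eps=0.2)
print(sorted(kept), rate_loss(10, len(errors), len(kept)))
```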

VII. SUMMARY AND CONCLUDING REMARKS
We studied the trade-off between the SM and SK rates that are simultaneously achievable over a SD-WTC with non-causal encoder CSI. This model subsumes all other instances of CSI availability as special cases. An inner bound on the SS message-key capacity region was derived based on a superposition coding scheme, the likelihood encoder and soft-covering arguments inspired by [19].
We presented a class of SD-WTCs for which our inner bound achieves capacity, and demonstrated that for this class, the previously best known SM-SK trade-off region by Prabhakaran et al. [24] is strictly sub-optimal.
Furthermore, we showed that the inner bound derived here recovers the best lower bounds on either the SM [19] or the SK [23] rate achievable over the considered SD-WTC. Our derivations ensure SS, thus upgrading the security standard of most past results, which were derived under the weak secrecy metric.
As the message-key capacity region for this setup remains an open problem, finding good outer bounds is of particular interest. Extensions to multiple terminals, action-dependent states [35], and source reconstruction models should be examined as well.

A $q_{U,X|S,L}$ induces a joint distribution over $\mathcal{L}\times\mathcal{S}\times\mathcal{U}\times\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$ that is given by … We now proceed with the direct and the converse proofs.
where, similarly to the above, (a) is implied by the independence of $(S,U,Y,Z)$ and $L$. Finally, due to (61a), any joint distribution that produces a non-zero achievable region satisfies $I(U;Y)-I(U;S)\geq0$; hence, the term … Maximizing over all $q_{U,X|S}$ concludes the proof.
Converse: To get (18a), notice that the secret communication rate of the setup cannot exceed the total reliable communication rate. Therefore, an upper bound on the secrecy capacity is given by the GP channel capacity formula [17]: … where, for each $q_{U,X|S}$, the underlying joint PMF is … where (a) follows because $L$ and $S$ are independent (see (60)), while (b) follows by recasting $(L,U)$ as $U$.
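The GP capacity formula maximizes $I(U;Y)-I(U;S)$ over $q_{U,X|S}$. This is the same expression that appears later as $C_{\mathrm{GP}}(W_{Y|S,X})$ in Appendix C. The sketch below evaluates only the objective for one fixed joint PMF over $(s,u,y)$ and does not perform the maximization. The helper names and the two toy distributions in the test are hypothetical.

```python
import math
from collections import defaultdict

def mutual_information(joint, left, right):
    """I(left; right) in bits.

    joint maps outcome tuples to probabilities; left and right are functions
    that extract the two variables of interest from an outcome.
    """
    p_l, p_r, p_lr = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        l, r = left(outcome), right(outcome)
        p_l[l] += p
        p_r[r] += p
        p_lr[(l, r)] += p
    return sum(p * math.log2(p / (p_l[l] * p_r[r])) for (l, r), p in p_lr.items() if p > 0)

def gp_rate(joint):
    """GP objective I(U;Y) - I(U;S) for a joint PMF over outcomes (s, u, y)."""
    i_uy = mutual_information(joint, lambda o: o[1], lambda o: o[2])
    i_us = mutual_information(joint, lambda o: o[1], lambda o: o[0])
    return i_uy - i_us
```

The subtracted term $I(U;S)$ is the price paid for correlating the auxiliary codeword with the state. If $U$ simply copies the state, the objective collapses to zero even over a noiseless channel.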
For the bound on $R_M+R_K$ from (18b), we enhance the channel by allowing the encoder to control both the state $S$ and the secret source $L$, yet constraining the pair to the original statistics $W_S\times W_L$. The obtained channel is equivalent to a WTC $W_{\tilde Y,\tilde Z|\tilde X}$ with input $\tilde X=(L,S,X)$ and outputs $\tilde Y=(L,Y)$ and $\tilde Z=Z$ at the legitimate receiver and the eavesdropper, respectively. Therefore, the channel secrecy capacity [5] is an upper bound on the sum of rates⁴; thus, … Now, note that for any $q_{U,\tilde X}=q_{U,(L,S,X)}$ we have …

We first restate [31, Theorem 1] in the notation of this work. This theorem stipulates the following lower bound on the SK capacity $C_{\mathrm{SK}}$ of the SD-WTC with non-causal encoder CSI⁵: … where the maximization is over all conditional PMFs $q_{U|V}:\mathcal{V}\to\mathcal{P}(\mathcal{U})$ and $q_{V,X|S}:\mathcal{S}\to\mathcal{P}(\mathcal{V}\times\mathcal{X})$ satisfying … All the above MI terms are taken with respect to the appropriate marginals of $W_S\,q_{U|V}\,q_{V,X|S}\,W_{Y,Z|S,X}$, where … Now consider the following setup.
• Let $A$, $B$ and $Q$ be three i.i.d. $\mathrm{Ber}\big(\frac{1}{2}\big)$ random variables. Also, let $A^n$, $B^n$ and $Q^n$ be three $n$-fold random vectors whose coordinates are i.i.d. copies of $A$, $B$ and $Q$, respectively. $T^n$ represents the output sequence of the deterministic and memoryless channel $T=t(A,B,Q)$ from (67) when it is fed by $A^n$, $B^n$ and $Q^n$.

⁴ For the WTC without state, no additional secrecy may be extracted in the form of a key [14, Channel Model].
⁵ [31, Theorem 1] also incorporates a public communication link into the setup. We restate the theorem assuming that the public communication rate is zero.
• Let $f_n$ be the stochastic encoder and $\Psi^n$ be the sequence that $f_n$ produces and transmits over a private binary bit-pipe to the legitimate receiver.
• The encoder observes $(A^n,B^n)$ non-causally and determines the bit-pipe transmission $\Psi^n$.
• The eavesdropper observes $A^n\oplus_n B^n$, where $\oplus_n$ stands for bit-wise addition modulo 2 (at each time instance, the eavesdropper observes $A_i\oplus B_i$).

Thus, at each channel use $i\in[1:n]$, the encoder observes two fair coin tosses, $A_i$ and $B_i$. The decoder observes only one of them, namely $T_i$, chosen at random using a third fair coin $Q_i$. The decoder knows which coin it observes, but the encoder does not. A private bit-pipe from the encoder to the decoder enables the transmission of a single noiseless bit each time the coins are flipped. The legitimate parties wish to agree upon a key that is kept secret from the eavesdropper, who observes only the modulo-2 sum $A_i\oplus B_i$ of the two coins each time they are flipped.
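The coin-toss setup above is easy to simulate, which helps to see who observes what. This is a sketch under an explicit assumption: the definition of $t$ in (67) is not reproduced here, so the sketch uses the hypothetical convention that $Q=0$ reveals coin $A$ and $Q=1$ reveals coin $B$. The private bit-pipe $\Psi^n$ is not modeled, and the function names are illustrative.

```python
import random

def t(a, b, q):
    """Assumed (hypothetical) convention for T = t(A, B, Q): Q = 0 reveals A, Q = 1 reveals B."""
    return a if q == 0 else b

def simulate(n, seed=0):
    """Draw n uses of the coin-toss setup (the private bit-pipe is not modeled)."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n)]
    b = [rng.randint(0, 1) for _ in range(n)]
    q = [rng.randint(0, 1) for _ in range(n)]
    return {
        "encoder": (a, b),                                    # S = (A, B), seen non-causally
        "decoder": ([t(*abq) for abq in zip(a, b, q)], q),    # (T, Q) observed by the decoder
        "eavesdropper": [ai ^ bi for ai, bi in zip(a, b)],    # Z = A xor B
    }

sample = simulate(5, seed=3)
print(sample["decoder"], sample["eavesdropper"])
```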
Denoting the SK generated by the legitimate parties by $K_n$, the induced joint PMF of the system is
$$q_{A^n,B^n,Q^n,T^n,\Psi^n,K_n}(a^n,b^n,q^n,t^n,\psi^n,k_n)=f_n(k_n,\psi^n|a^n,b^n)\,2^{-3n}\prod_{i=1}^n\mathbb{1}_{\{t_i=t(a_i,b_i,q_i)\}}.\tag{68}$$
To comply with our notation, we identify $S=(A,B)$, $X=\Psi$, $Y=(T,Q,\Psi)$ and $Z=A\oplus B$, while also denoting by $\tilde Y\triangleq(T,Q)$ the output-CSI pair observed by the decoder.
A valid choice of random variables for (66) is … which achieves $R_{\mathrm{Zib}}=2$. Hence, by showing that the SK capacity of the proposed setup is strictly less than 2, we contradict the achievability of $R_{\mathrm{Zib}}$ from [31, Theorem 1] as the SK rate for this setup. We do so by showing that the vanishing average error probability and the weak secrecy of the SK, used in the definition of achievability in [31], cannot coexist in this setup while a SK rate of 2 is attained.
Consider a sequence of codes $\{c_n\}_{n\in\mathbb{N}}$ achieving $R_{\mathrm{Zib}}=2$ for the above setup. Then there exists a sequence $\{\epsilon_n\}_{n\in\mathbb{N}}$, with $\lim_{n\to\infty}\epsilon_n=0$, such that … where:
• (69a) follows by the definition of SK rate achievability⁶;
• (69b) holds because the alphabet of $\Psi^n$ has size $2^n$ and the uniform distribution maximizes discrete entropy;
• (69c) is Fano's inequality, which applies due to the requirement of vanishing decoding error;
• (69d) is the weak secrecy requirement.
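Steps (69b) and (69c) use two standard entropy facts. First, any PMF on $2^n$ symbols (such as the law of $\Psi^n$) has entropy of at most $n$ bits. Second, Fano's inequality bounds $H(K|\hat K)$ by $h(P_e)+P_e\log_2(|\mathcal{K}|-1)$. For a rate-2 key the alphabet has $2^{2n}$ elements, so the Fano term is at most $1+2nP_e$, which is $o(n)$ when $P_e\to0$. The numerical sketch below uses hypothetical helper names and parameter values.

```python
import math
import random

def entropy_bits(pmf):
    """Shannon entropy in bits of a PMF given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def fano_bound(pe, alphabet_size):
    """Fano's inequality: H(K | K_hat) <= h(pe) + pe * log2(|K| - 1), in bits."""
    return entropy_bits([pe, 1 - pe]) + pe * math.log2(alphabet_size - 1)

# (69b): a PMF on 2**n symbols has entropy at most n bits
n = 8
rng = random.Random(0)
weights = [rng.random() for _ in range(2 ** n)]
total = sum(weights)
pmf = [w / total for w in weights]
print(entropy_bits(pmf) <= n)

# (69c): a rate-2 key over a blocklength lives in an alphabet of size 2**(2 * blocklength)
blocklength, pe = 20, 0.01
print(fano_bound(pe, 2 ** (2 * blocklength)) / blocklength)  # per-use equivocation allowance
```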
Lemma 7: For the considered setup, the SK capacity is upper bounded by 2 bits per channel use.

Lemma 7 follows because the considered setup without an eavesdropper (i.e., when $Z=0$) falls within the framework of the common randomness (CR) problem in Model i from [36].
Proof: Theorem 4.1 in [36] shows that the CR capacity is upper bounded by … where $R$ is the rate of the communication link between the transmitter and the receiver. Evaluating the RHS of (71) for the considered setup shows that it equals 2 (CR bits per channel use). This upper bound remains valid when a security requirement is introduced, since such a requirement can only reduce the admissible rates.
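The evaluation of (71) can be illustrated by exact enumeration. Since (71) is not reproduced above, the sketch assumes the usual one-way CR form. Under that form, a rate $I(U;S)$ is attainable by an auxiliary $U$ with $U-S-\tilde Y$ whenever $I(U;S)-I(U;\tilde Y)\le R$, and every $U$ satisfies $I(U;S)\le H(S)=2$ bits. The choice $U=S$ then meets the constraint with $R=1$ bit and attains 2. The sketch also assumes the same hypothetical convention for $t$ ($Q=0$ reveals $A$, $Q=1$ reveals $B$), and its helper names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product

def t(a, b, q):
    """Hypothetical convention: Q = 0 reveals coin A, Q = 1 reveals coin B."""
    return a if q == 0 else b

def mutual_information(pairs):
    """I(X;Y) in bits from a dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pairs.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pairs.items() if p > 0)

# Joint PMF of the encoder's state S = (A, B) and the decoder's observation (T, Q)
s_vs_ytilde = defaultdict(float)
for a, b, q in product((0, 1), repeat=3):
    s_vs_ytilde[((a, b), (t(a, b, q), q))] += 1 / 8

# Choice U = S, so I(U;S) = H(S), computed here as I(S;S)
s_vs_s = {((a, b), (a, b)): 1 / 4 for a, b in product((0, 1), repeat=2)}

i_us = mutual_information(s_vs_s)        # 2 bits
i_uy = mutual_information(s_vs_ytilde)   # 1 bit
R = 1                                    # one noiseless bit per coin flip
print(i_us, i_uy, i_us - i_uy <= R)
```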
Lemma 7 guarantees the existence of a sequence $\{\epsilon'_n\}_{n\in\mathbb{N}}$, with $\lim_{n\to\infty}\epsilon'_n=0$, such that the following condition may be added to the set (69): …

Another technical lemma we need is stated next; its proof is relegated to Appendix E.
where (a) uses (74), (b) follows by the chain rule and because $Z^n$ is deterministically defined by $(A^n,B^n)$, and (c) holds since $A^n$, $B^n$ and $Z^n=A^n\oplus_n B^n$ are all i.i.d. $\mathrm{Ber}\big(\frac{1}{2}\big)$ sequences and $A^n$ and $B^n$ are independent. Having (75), we conclude with … where (a) uses (69a) and (75). Evidently, (76) contradicts (69d).
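The entropy facts invoked in steps (b) and (c) can be checked on a single coordinate, since all sequences are i.i.d. The exact expression in (75) is not reproduced above, so this sketch only verifies the underlying identities. With $A,B$ i.i.d. $\mathrm{Ber}(1/2)$ and $Z=A\oplus B$, we have $H(A,B)=2$ and $H(Z)=1$. Because $Z$ is a function of $(A,B)$, this gives $H(A,B|Z)=H(A,B)-H(Z)=1$, so $H(A^n,B^n|Z^n)=n$. In addition, $Z$ is independent of $A$ alone. The helper names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def entropy_bits(pmf):
    """Shannon entropy in bits of a PMF given as a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, f):
    """Push a joint PMF forward through the function f."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[f(outcome)] += p
    return out

# One coordinate: A, B i.i.d. Ber(1/2) and Z = A xor B
joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
h_ab = entropy_bits(marginal(joint, lambda o: (o[0], o[1])))   # H(A,B)
h_z = entropy_bits(marginal(joint, lambda o: o[2]))           # H(Z)
h_ab_given_z = h_ab - h_z   # valid because Z is a deterministic function of (A, B)
i_az = (entropy_bits(marginal(joint, lambda o: o[0])) + h_z
        - entropy_bits(marginal(joint, lambda o: (o[0], o[2]))))  # I(A;Z)
print(h_ab, h_z, h_ab_given_z, i_az)
```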

APPENDIX C PROOF OF PROPOSITION 1
Fix $\sigma\in(0,0.5)$ and set … to $[0,0.5]$, respectively. It is readily verified that $\epsilon,\lambda\in(0,0.5)$. By virtue of (26), the inner bound from Theorem 1 attains the SM capacity, which is given by (see (19)) … where $C_{\mathrm{GP}}(W_{Y|S,X})=\max_{q_{U,X|S}}\big[I(U;Y)-I(U;S)\big]$ is the GP capacity of the SD channel $W_{Y|S,X}$ with state distribution $W_S$. By the corollary to Theorem 2 from [37], we find that $C_{\mathrm{GP}}(W_{Y|S,X})=(1-\sigma)(1-\epsilon)$. As …, we obtain … and, therefore, … The achievability of (80) may also be verified directly from Theorem 1 by substituting $U=G$, $V=(U,L)$ and $X\sim\mathrm{Ber}\big(\frac{1}{2}\big)$ independent of $(S,L)$ into (9).

We now show that $R_{\mathrm{PER}}(\lambda,\epsilon,\sigma)<1-\frac{1}{2}\big[\sigma+h\big(\frac{\sigma}{2}\big)\big]$. Fix a joint distribution … to evaluate the region from (21b) with $R_K=0$, and $S$ and … Note that the independence of $(L,S)$ and $U$ is a restriction on the feasible joint distributions in (21a). Now, assume, in contradiction, that evaluating (21b) with respect to $q$ produces a rate that is at least as high as … where (a) uses the Markov relation $V-(S,U,X)-Y$, which follows because $Y=y\big(E,g(S,X)\big)$ and $E$ is independent of $(S,U,V,X)$ under the distribution from (81).
On account of (82b), the single inequality in (83) must hold with equality. For this to happen, the following three conditions must hold.
1) Removing the conditioning from the first (positive) term does not change its value, i.e., … This implies that $L$ is independent of $(U,Y)$.
2) The second (negative) term is zero, i.e., … where (a) relies on the independence of $E$ and $(L,U,V)$. The last equality in (86) implies that there exists a (deterministic) function $\ell:\mathcal{U}\times\mathcal{V}\to\mathcal{L}$ such that $L=\ell(U,V)$.
3) Expanding the third (negative) term with respect to $E$ in a manner similar to point 2), we obtain $I(V;S,X|U,Y,E=1)=I(V;S,X|U,E=1)=I(V;S,X|U)=0$, which establishes $V-U-(S,X)$ as a Markov chain.
Since $S$ and $U$ are independent under $q$ from (81), the Markov relation from point 3) further implies that $S$ is independent of the pair $(U,V)$. This means that the inability of the scheme from [24, Theorem 1] to support GP coding in the inner layer implies that GP coding is not supported at all.
We proceed to analyze (82a) under the above deductions. Consider … The expression on the RHS of (89) is the capacity of the MSAF with causal encoder knowledge of the state sequence (cf., e.g., [38, p. 5469]). However, causal CSI is useless for the MSAF encoder, as demonstrated in Section V-A of [38]. Without any CSI at the MSAF encoder, the channel is equivalent to a binary symmetric channel with crossover probability $\frac{\sigma}{2}$ (see (23)), whose capacity equals $1-h\big(\frac{\sigma}{2}\big)$. We conclude with … where (a) is because $\sigma<h\big(\frac{\sigma}{2}\big)$ for any $\sigma\in(0,0.5)$. This contradicts (82a).
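The closing step relies on the capacity $1-h(\sigma/2)$ of a BSC with crossover probability $\sigma/2$ and on the inequality $\sigma<h(\sigma/2)$ for $\sigma\in(0,0.5)$. The inequality holds because $f(\sigma)=h(\sigma/2)-\sigma$ is concave, with $f(0)=0$ and $f(0.5)=h(0.25)-0.5>0$. The sketch below spot-checks it on a grid. The grid resolution and helper names are illustrative choices.

```python
import math

def h(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip):
    """Capacity of a binary symmetric channel with the given crossover probability."""
    return 1.0 - h(flip)

grid = [k / 1000 for k in range(1, 500)]        # sigma in (0, 0.5)
gaps = [h(s / 2) - s for s in grid]             # positive exactly when sigma < h(sigma/2)
print(min(gaps) > 0, bsc_capacity(0.25))
```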

APPENDIX D PROOF OF LEMMA 4
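Fix $p_M\in\mathcal{P}(\mathcal{M}_n)$ and, for simplicity of notation, abbreviate $p^{(\mathcal{B}_n)}$ and $\pi^{(\mathcal{B}_n)}$ as $p$ and $\pi$, respectively. Consider:
$$\cdots\leq\|p_{M,K}-\pi_{M,K}\|_{TV}\log\frac{|\mathcal{M}_n|\cdot|\mathcal{K}_n|}{\|p_{M,K}-\pi_{M,K}\|_{TV}}+\|p_{\mathbf{Z}}-\pi_{\mathbf{Z}}\|_{TV}\log\frac{|\mathcal{Z}^n|}{\|p_{\mathbf{Z}}-\pi_{\mathbf{Z}}\|_{TV}}+\|p_{M,K,\mathbf{Z}}-\pi_{M,K,\mathbf{Z}}\|_{TV}\log\frac{|\mathcal{M}_n|\cdot|\mathcal{K}_n|\cdot|\mathcal{Z}^n|}{\|p_{M,K,\mathbf{Z}}-\pi_{M,K,\mathbf{Z}}\|_{TV}}$$
The bound on the RHS of (93) is uniform in $p_M\in\mathcal{P}(\mathcal{M}_n)$ and decays exponentially fast to zero as $n$ grows. The result of Lemma 4 follows by maximizing both sides of (93) over all message distributions.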
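Each term in (93) comes from the continuity of entropy in the distance between distributions. The terms are applied to $H(M,K)$, $H(\mathbf{Z})$ and $H(M,K,\mathbf{Z})$, whose combination gives $I(M,K;\mathbf{Z})$. The sketch below numerically sanity-checks the L1 form of this bound from Cover and Thomas (Theorem 17.3.3): if $\theta=\|p-q\|_1\le\frac{1}{2}$, then $|H(p)-H(q)|\le\theta\log_2(|\mathcal{X}|/\theta)$. The normalization of $\|\cdot\|_{TV}$ used in (93) is not restated in this section, and the constants depend on it, so this is an illustration of the type of bound rather than a verbatim check of (93). Helper names are hypothetical.

```python
import math
import random

def entropy_bits(p):
    """Shannon entropy in bits of a PMF given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def l1_distance(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def continuity_bound(p, q):
    """theta * log2(|alphabet| / theta) with theta = ||p - q||_1, stated for theta <= 1/2."""
    theta = l1_distance(p, q)
    if theta == 0:
        return 0.0
    if theta > 0.5:
        raise ValueError("the bound is stated for L1 distance at most 1/2")
    return theta * math.log2(len(p) / theta)

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
results = []
for _ in range(200):
    size = rng.randint(2, 16)
    p = normalize([rng.random() for _ in range(size)])
    r = normalize([rng.random() for _ in range(size)])
    lam = rng.uniform(0.0, 0.25)                      # keeps ||p - q||_1 <= 2 * lam <= 1/2
    q = [(1 - lam) * x + lam * y for x, y in zip(p, r)]
    gap = abs(entropy_bits(p) - entropy_bits(q))
    results.append(gap <= continuity_bound(p, q) + 1e-12)
print(all(results))
```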
We next show that … which, when combined with (95), gives $n-2n\epsilon_n\leq I(K_n;\tilde Y^n)\leq n-\frac{1}{2}H(A^n,B^n|K_n)$.
This further implies (73), as required. Thus, to complete the proof of Lemma 8 it suffices to show that (96) holds.
Consider the following steps: … where (a) and (b) follow since $Q_i$ is independent of $(K_n,A_i,B_i,\tilde Y^{i-1})$ for every $i\in[1:n]$, as evident from (68); (c) holds because $A_i,B_i\sim\mathrm{Ber}\big(\frac{1}{2}\big)$; (d) follows since $\tilde Y^{i-1}$ is a deterministic function of $(Q^{i-1},A^{i-1},B^{i-1})$; and (e) uses the independence of $Q^n$ and $(K_n,A^n,B^n)$ (see (68)).