Computable Upper Bounds on the Capacity of Finite-State Channels

We consider the use of the well-known dual capacity bounding technique for deriving upper bounds on the capacity of indecomposable finite-state channels (FSCs) with finite input and output alphabets. In this technique, capacity upper bounds are obtained by choosing suitable test distributions on the sequence of channel outputs. We propose test distributions that arise from certain graphical structures called Q-graphs. As we show in this paper, the advantage of this choice of test distribution is that, for the important classes of unifilar and input-driven FSCs, the resulting upper bounds can be formulated as a dynamic programming (DP) problem, which makes the bounds tractable. We illustrate this for several examples of FSCs, where we are able to solve the associated DP problems explicitly to obtain capacity upper bounds that either match or beat the best previously reported bounds. For instance, for the classical trapdoor channel, we improve the best known upper bound of 0.661 (due to Lutz (2014)) to 0.5849, shrinking the gap to the best known lower bound of 0.572, all bounds being in units of bits per channel use.


I. INTRODUCTION
A finite-state channel (FSC) is a mathematical model for a discrete-time channel in which the channel output depends statistically on both the channel input and an underlying channel state, the latter taking values in a finite set. This model can represent a channel with memory since it allows the channel output to depend on past inputs and outputs via the channel state. In this paper, we investigate two important classes of FSCs, namely, unifilar and input-driven FSCs.
Finding a computable characterization of the capacity of these fundamental channels is a long-standing open problem in information theory. The investigation of FSCs dates back to classical works from the 1950s [2]-[4]. Besides their theoretical importance, these channels appear in many practical applications such as wireless communication [5], [6], and magnetic recording [7]. Except for a few special cases where a closed-form single-letter capacity formula can be obtained, for general FSCs, only a multi-letter capacity formula exists [4], [8]. This paper advances the research on FSCs by providing a new technique to derive simple, analytical upper bounds on their capacity. For instance, consider the trapdoor channel (Fig. 1) that was introduced by David Blackwell in 1961 [9]. While its zero-error capacity [10], [11] and its feedback capacity [12] are known exactly, its channel capacity (without feedback and allowing a vanishingly small error probability) is still unknown. The best lower and upper bounds known are from [13] and [14], respectively:

0.572 ≤ C ≤ 0.661,

where the capacity is measured in bits per channel use. In this work, we will show a novel upper bound, C ≤ log_2(3/2) ≈ 0.5849, that improves significantly upon the previous best upper bound. We will establish a general technique by which such specific bounds are relatively easy to obtain.

(This work was supported in part by the DFG via the German Israeli Project Cooperation (DIP), in part by the Israel Science Foundation (ISF), in part by the Cyber Center at Ben-Gurion University of the Negev, and in part by the WIN consortium via the Israel Ministry of Economy and Science. The work of O. Sabag has been partially supported by the ISEF postdoctoral fellowship. The work of S. Shamai has also been supported by the European Union's Horizon 2020 Research and Innovation Programme, grant agreement no. 694630. The work of N. Kashyap)
Our upper bounds are based on a known technique called the dual capacity bounding technique, attributed to Topsøe [15]; see [16, p. 147, Problem 1]. This technique was used in [17]-[22] to obtain upper bounds on channel capacity in various contexts. In this technique, an upper bound on capacity is obtained by specifying a test distribution on the channel output process. The resulting bound is tight if the chosen test distribution is equal to the output distribution induced by the capacity-achieving input distribution. For an FSC, this output distribution is, in general, not i.i.d. As a result, it is important to develop a systematic means of specifying a test distribution that has memory but which gives rise to a computable upper bound.
A standard choice of test distribution for channels with memory is a Markov distribution of some finite order [19], [20], [22]. However, we will use test distributions that belong to a more general class of finite-state processes. The distributions we consider are defined by a (strongly) connected directed graph on finitely many nodes, in which each edge is labeled by a symbol from the channel output alphabet in such a way that the outgoing edges from any given node get distinct labels. For each node of the graph, we specify a probability distribution on the set of its outgoing edges. Then, walks on the graph starting from some distinguished initial node form a random process over the channel output alphabet. Following [23], we call the underlying labeled directed graph a Q-graph. Note that the random process defined in this manner is a finite-state process, but it need not be Markov of any fixed order. On the other hand, it is easy to see that any Markov process of fixed order, say m, over a finite alphabet A can be defined on a certain Q-graph with |A|^m nodes, the set of nodes being in one-to-one correspondence with the set of strings of length m over A. As we will demonstrate, there is utility in going from the class of Markov test distributions to the more general class of test distributions defined on Q-graphs. For the specific case of the dicode erasure channel, we will show that a Q-graph on 3 nodes yields an output distribution that outperforms all Markov distributions of order up to 2.
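The correspondence between order-m Markov processes and Q-graphs on |A|^m nodes can be made concrete with a few lines of code. The sketch below is illustrative only: the function name `markov_q_graph` is our own, and the graph is represented simply as the transition map φ on (node, symbol) pairs.

```python
import itertools

def markov_q_graph(alphabet, m):
    """Build the Q-graph whose nodes are all length-m strings over `alphabet`:
    from node (a_1, ..., a_m), the edge labeled y goes to (a_2, ..., a_m, y)."""
    nodes = list(itertools.product(alphabet, repeat=m))
    phi = {(q, y): q[1:] + (y,) for q in nodes for y in alphabet}
    return nodes, phi

nodes, phi = markov_q_graph(['a', 'b'], 2)
assert len(nodes) == 4                      # |A|^m nodes

# Walking the graph along an output sequence tracks the last m symbols,
# exactly the memory an order-m Markov process needs:
q = ('a', 'a')                              # distinguished initial node q0
for y in ['b', 'a', 'b']:
    q = phi[(q, y)]
assert q == ('a', 'b')                      # the last two symbols emitted
```

A general Q-graph is obtained by allowing φ to be any map with distinct labels on outgoing edges, which is strictly more general than the Markov construction above.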
For an FSC, the dual capacity upper bound obtained from a given test distribution is, in general, a multi-letter expression. One of the main theoretical contributions of our paper is showing that, for any test distribution defined on a Q-graph, the evaluation of this multi-letter expression can be formulated as an infinite-horizon average-reward dynamic programming (DP) problem. This formulation immediately gives us numerical as well as analytical tools to compute the multi-letter expression, thus yielding an explicit upper bound on capacity. Indeed, a well-known approach to handling DP optimization problems is by solving the associated Bellman equationsee e.g., [25]. Computer-based simulations of the dynamic program provide important insights into the solution of this equation.
In this paper, we use Q-graph based test distributions to bound from above the capacity of several well-known FSCs, namely, the trapdoor [9], Ising [26], Previous Output is STate (POST) [27], and dicode erasure [28] channels. For each of these channels, we use the insights gained from numerical methods to arrive at an explicit analytical solution to the corresponding average-reward DP problem. In this manner, we obtain upper bounds on the capacities of these channels.
The relationship between channel capacity and DP was first observed in Tatikonda's thesis [29], where it was shown that the feedback capacity of a class of FSCs can be formulated as a DP problem. This approach was further developed in [12], [30], [31], and yielded several new feedback capacity results for FSCs [12], [32]- [36]. However, in the case of capacity without feedback, except for the POST channel [27], exact results are known only for certain FSCs with strict symmetry conditions, all with an i.i.d. capacity-achieving input distribution [37]- [39]. The remainder of this paper is organized as follows. Section II introduces our notation and defines the model. Section III introduces the dual capacity upper bound, gives some background on Q-graphs, and states our main result. Section IV gives a brief review of infinite-horizon DP and introduces the DP formulation of the dual capacity upper bound for FSCs. Section V presents our bounds on capacity for several specific FSCs. Finally, our conclusion appears in Section VI. To preserve the flow of the presentation, most of the proofs are given in the appendices.

II. NOTATION AND MODEL DEFINITION
In this section, we introduce our notation and define our FSC model.

A. Notation
Throughout this paper, we use the following notation. The set of natural numbers (which does not include 0) is denoted by N, while R denotes the set of real numbers. Random variables are denoted by capital letters and their realizations by lower-case letters, e.g., X and x, respectively. Calligraphic letters denote sets, e.g., X. We use the notation X^n to denote the random vector (X_1, X_2, ..., X_n) and x^n to denote the realization of such a random vector. For a real number α ∈ [0, 1], we define ᾱ = 1 − α. The binary entropy function is denoted by H_b(α) = −α log_2 α − ᾱ log_2 ᾱ. The probability mass function (pmf) of X is denoted by P_X, the conditional probability of X given Y is denoted by P_{X|Y}, and the joint distribution of X and Y is denoted by P_{X,Y}. The probability Pr[X = x] is denoted by P_X(x). When the random variable is clear from the context, we write it in shorthand as P(x). For a conditional pmf P_{Y|X}, P_{Y|X} ≻ 0 denotes that P_{Y|X}(y|x) > 0 for all x ∈ X and y ∈ Y.
Let P_Y and R_Y be two discrete probability measures on the same probability space. Then, P_Y ≪ R_Y denotes that P_Y is absolutely continuous with respect to R_Y. The relative entropy between P_Y and R_Y is denoted by D(P_Y ‖ R_Y). The conditional relative entropy is defined as D(P_{Y|X} ‖ R_Y | P_X) = E_X{D(P_{Y|X}(·|X) ‖ R_Y)}, where E_X{·} denotes the expectation operator over X.
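The two divergence quantities above are straightforward to compute for finite alphabets. A minimal sketch (helper names `kl` and `conditional_kl` are our own; pmfs are plain lists):

```python
from math import log2

def kl(p, r):
    """D(P || R) in bits; requires P << R (p[i] > 0 implies r[i] > 0)."""
    return sum(pi * log2(pi / ri) for pi, ri in zip(p, r) if pi > 0)

def conditional_kl(p_y_given_x, r_y, p_x):
    """D(P_{Y|X} || R_Y | P_X) = E_X[ D(P_{Y|X}(.|x) || R_Y) ]."""
    return sum(px * kl(py, r_y) for px, py in zip(p_x, p_y_given_x))

# D(P || P) = 0, and a point mass against a uniform pmf gives exactly 1 bit:
assert kl([0.5, 0.5], [0.5, 0.5]) == 0.0
assert kl([1.0, 0.0], [0.5, 0.5]) == 1.0
```

The convention `if pi > 0` implements the usual 0·log 0 = 0, and absolute continuity (≪) is exactly the condition that keeps every ratio in the sum finite.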

B. FSCs
We consider the standard finite-state channel, described in Fig. 2. The channel is defined with finite input and output alphabets X and Y, respectively, and a finite set of states S. The input, output and state at time t are denoted by x_t, y_t and s_t, respectively. The defining property of an FSC is that, given x_t and s_{t-1}, the pair (s_t, y_t) is conditionally independent of all previous inputs, outputs and states, as well as of the message m to be transmitted. To be precise,

P(s_t, y_t | x^t, y^{t-1}, s^{t-1}, m) = P_{S⁺,Y|X,S}(s_t, y_t | x_t, s_{t-1}),     (1)

where S denotes the channel state at the beginning of the transmission and S⁺ represents the channel state at the end of the transmission. In particular, the transition probability kernel P_{S⁺,Y|X,S} is time-invariant, i.e., it does not depend on t. Furthermore, if there is no feedback, the conditional probability P_{S^t,Y^t|X^t,S_0} decomposes as

P(s^t, y^t | x^t, s_0) = ∏_{i=1}^{t} P(s_i, y_i | x_i, s_{i-1}).

The following definition presents the indecomposability property of FSCs.
Definition 1 (Indecomposable FSC [8]). An FSC is indecomposable if, for every ε > 0, there exists an n_0 such that, for all n ≥ n_0,

|P(s_n | x^n, s_0) − P(s_n | x^n, s'_0)| ≤ ε

for any channel states s_n, s_0, s'_0, and any input sequence x^n.

Loosely speaking, for an indecomposable FSC, the effect of the initial channel state becomes negligible as time evolves. An alternative characterization of indecomposability [8, Theorem 4.6.3] is that for some n and each input sequence x^n ∈ X^n, there is a choice of state s_n at time n (s_n may depend on x^n) such that P(s_n | x^n, s_0) > 0 for all initial states s_0.
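The alternative characterization is easy to check mechanically for a small channel, since it only involves the supports of the state transitions. A sketch, assuming our own helper `jointly_reachable` and using the trapdoor channel (defined later in Section V-A), whose positive-probability next states are {s} when x = s and {x, s} otherwise:

```python
from itertools import product

def jointly_reachable(support, states, inputs, n):
    """Check: for each input sequence x^n there is an s_n with
    P(s_n | x^n, s_0) > 0 for every initial state s_0.
    `support(x, s)` returns the set of next states with positive probability."""
    for xs in product(inputs, repeat=n):
        sets = []
        for s0 in states:
            reach = {s0}
            for x in xs:
                reach = {sp for s in reach for sp in support(x, s)}
            sets.append(reach)
        if not set.intersection(*sets):
            return False
    return True

# Trapdoor channel: s' = x XOR y XOR s with y in {x, s} equally likely,
# so the positive-probability next states are {s} if x == s, else {x, s}.
trapdoor = lambda x, s: {s} if x == s else {x, s}
assert jointly_reachable(trapdoor, [0, 1], [0, 1], n=1)
```

Already n = 1 works here: whatever the input x, the state x itself is reachable with positive probability from both initial states.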
The capacity of an indecomposable channel is presented in the following theorem.

Theorem 1 ([8]). The capacity of an indecomposable FSC is given by

C = lim_{n→∞} (1/n) max_{P_{X^n}} I(X^n; Y^n | s_0),

for any s_0 ∈ S.
Throughout this paper, the capacity (and bounds on it) are measured in bits per channel use. We investigate the following two important classes of FSCs:

1) Unifilar FSCs: For these channels, the state evolution is given by a deterministic function. Specifically, (1) is simplified by the requirement that

s_t = f(x_t, y_t, s_{t-1}),

where f : X × Y × S → S. Since the channel state can be computed recursively, we may use s_t = f_t(x^t, y^t, s_0) to denote t applications of f(·).

2) Input-driven FSCs: For these channels, the channel state does not depend on past outputs. Specifically,

P(s_t | x_t, y_t, s_{t-1}) = P(s_t | x_t, s_{t-1}).

Note that this definition generalizes that of FSCs with input-dependent states [40], in which the next state is a deterministic function of the input and the previous state.

III. MAIN RESULT VIA DUAL CAPACITY FORMULA
In this section, we present the dual capacity upper bound, Q-graphs and our main result.

A. Dual capacity upper bound
The dual capacity upper bound [15], [16] is a simple upper bound on channel capacity that has been utilized in many works [17]-[22]. For any memoryless channel, P_{Y|X}, and test distribution, R_Y, the dual capacity upper bound is given by

C ≤ max_{x∈X} D(P_{Y|X=x} ‖ R_Y).     (6)

The proof follows from the following steps: for any input distribution P_X,

I(X; Y) = D(P_{Y|X} ‖ P_Y | P_X)
        = D(P_{Y|X} ‖ R_Y | P_X) − D(P_Y ‖ R_Y)
        ≤ max_{x∈X} D(P_{Y|X=x} ‖ R_Y),

where the inequality follows from the non-negativity of relative entropy and from bounding the average by the maximum. The bound is tight if R_Y is equal to the output distribution, P*_Y, induced by an optimal (i.e., capacity-achieving) input distribution.
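The tightness statement can be verified numerically on a memoryless example. A minimal sketch for the binary symmetric channel (helper names are our own): with the uniform test distribution, which is the output distribution induced by the capacity-achieving uniform input, the dual bound max_x D(P_{Y|X=x} ‖ R_Y) evaluates exactly to the BSC capacity 1 − H_b(p).

```python
from math import log2

def kl(p, r):
    return sum(pi * log2(pi / ri) for pi, ri in zip(p, r) if pi > 0)

def dual_upper_bound(channel_rows, r_y):
    """Dual capacity bound for a memoryless channel:
    C <= max_x D(P_{Y|X=x} || R_Y)."""
    return max(kl(row, r_y) for row in channel_rows)

# BSC with crossover 0.1; the uniform test distribution is the
# capacity-achieving output distribution, so the bound is tight.
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
bound = dual_upper_bound(bsc, [0.5, 0.5])
capacity = 1 - (-p * log2(p) - (1 - p) * log2(1 - p))
assert abs(bound - capacity) < 1e-12
```

A mismatched R_Y would only increase the bound, which is the reason the choice of test distribution matters so much in the sequel.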
For FSCs, where the aim is to maximize the n-letter mutual information I(X^n; Y^n), one may replace the test distribution in (6) with R_{Y^n} and obtain

(1/n) max_{P_{X^n}} I(X^n; Y^n) ≤ (1/n) max_{x^n ∈ X^n} D(P_{Y^n|X^n=x^n} ‖ R_{Y^n}).     (7)

Again, this bound is tight when R_{Y^n} = P*_{Y^n}, the output distribution induced by an input distribution that maximizes I(X^n; Y^n). Naturally, the choice of the test distribution will affect the tightness of the bound, and we would like to choose test distributions that are close, in some sense, to P*_{Y^n}. The output distribution P*_{Y^n} is, in general, not i.i.d. A common choice of a test distribution is a Markov distribution of some finite order [19], [20], [22], but here we use an extension of this notion. The mathematical structure needed to define this extension is called a Q-graph, which is presented in the next subsection.

B. The Q-graph
A Q-graph, introduced in [23], is a directed and (strongly) connected graph on a finite set of nodes Q, in which each node has |Y| outgoing edges with distinct labels. Due to the distinct labeling, the graph defines a mapping φ : Q×Y → Q, where φ(q, y) is the unique node pointed to by the edge from q labeled with y. Further, given a distinguished initial node q 0 ∈ Q, we also have a well-defined mapping Φ q0 : Y * → Q, where Y * is the set of all finite-length sequences over Y. Indeed, Φ q0 (y t ) is the node reached by walking along the unique directed path of length t starting from q 0 and labeled by y t = (y 1 , y 2 , . . . , y t ). We will often drop the subscript from Φ q0 for notational convenience, whenever there is no ambiguity in doing so.
Fix a Q-graph on the set of nodes Q, with a distinguished initial node q_0. A graph-based test distribution, R_{Y|Q}, is a collection of probability distributions R_{Y|Q=q} on Y, defined for each q ∈ Q. This defines a test distribution on channel output sequences as follows:

R_{Y^n}(y^n) = ∏_{t=1}^{n} R_{Y|Q}(y_t | q_{t-1}),     (8)

where q_{t-1} = Φ(y^{t-1}) for t > 1. It can be noted from (8) that, since |Q| < ∞, the induced process is a finite-state process.
A special case of a Q-graph is a kth-order Markov Q-graph, which is defined on the set of nodes Q = Y k , and for each node q = (y 1 , y 2 , . . . , y k ), the outgoing edge labeled y ∈ Y goes to the node (y 2 , . . . , y k , y). For instance, Fig. 3 shows a Markov Q-graph with Y = {0, 1} and k = 1. Note that test distributions R Y |Q on a kth-order Markov Q-graph correspond to kth-order stationary Markov processes.
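The product form R(y^n) = ∏_t R(y_t | q_{t-1}) is simple to evaluate once φ and the per-node pmfs are fixed. A minimal sketch on a first-order Markov Q-graph over Y = {0, 1} (the pmf values in `R` are arbitrary illustrative numbers, and `r_yn` is our own helper):

```python
from itertools import product

# 1st-order Markov Q-graph on Y = {0, 1}: the node is the last output
# symbol, so phi(q, y) = y.
phi = {(q, y): y for q in (0, 1) for y in (0, 1)}
R = {0: [0.7, 0.3], 1: [0.4, 0.6]}          # R_{Y|Q=q}, one pmf per node

def r_yn(yn, q0=0):
    """Graph-based test distribution: R(y^n) = prod_t R(y_t | q_{t-1})."""
    prob, q = 1.0, q0
    for y in yn:
        prob *= R[q][y]
        q = phi[(q, y)]                      # q_t = Phi(y^t)
    return prob

# The per-node pmfs induce a valid probability distribution on Y^n:
total = sum(r_yn(yn) for yn in product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```

Replacing `phi` by any other deterministic labeled-graph transition (for instance, the 3-node graph used later for the dicode erasure channel) changes only the bookkeeping of q, not the product formula.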
Q-graph-based test distributions grant us an added layer of generality over Markov distributions of finite order. There is value to this added generality, as we will see in Section V-C. Moreover, the dual capacity upper bound obtained from any such test distribution is actually computable (at least numerically) for certain classes of FSCs. Indeed, our main result is that, for unifilar and input-driven FSCs, the dual capacity upper bound obtained from any Q-graph based test distribution can be formulated as a DP problem, and hence, is computable.

C. Summary of main results
In this section we summarize our main contributions for unifilar and input-driven FSCs. Specifically, our main contributions are as follows:

• In Section IV, we derive the duality upper bounds for unifilar FSCs and input-driven FSCs in Theorem 3 and Theorem 5, respectively. The duality bounds hold for any choice of a graph-based test distribution and are given by multi-letter expressions, i.e., they depend on a limiting blocklength.

• In Section IV, we show the computability of the bounds by formulating them as DPs. Specifically, when the FSC is unifilar, we show that the dual capacity upper bound in Theorem 3 can be formulated as a dynamic program with P(S × Q) being the state space and X being the action space.

• Similarly, if the channel is an input-driven FSC, then the dual capacity upper bound in Theorem 5 can be formulated as a dynamic program with P(S) × P(Q) being the state space and X being the action space.

• In Section V, we apply the developed framework to several examples and derive novel upper bounds on the capacity of the well-known trapdoor and Ising channels that outperform previously reported upper bounds. Further, we provide an alternative converse proof for the capacity of the POST channel.

• Lastly, in Section V, we demonstrate the superiority of graph-based test distributions over simple Markovian test distributions by comparing the corresponding duality upper bounds for the dicode erasure channel (DEC).

In the next section, we introduce the DP framework and formally define the DP formulations stated above. The DP formulations are useful as we can then use known DP algorithms to numerically compute upper bounds on capacity. Moreover, the numerical results can sometimes be converted to explicit analytical upper bounds, as we do for the examples presented in Section V.
The dual capacity bounding technique has been utilized in several works, e.g., for amplitude-constrained additive white Gaussian noise channels [20], [41]. In [42], [43], the authors derive bounds on the capacity of channels with memory and provide numerical methods to approximate the bounds. Our work is closest in spirit to that in [19] and [21], in which the dual capacity bounding technique is applied to binary-input memoryless channels with a runlength constrained input and to single-tap binary-input Gaussian channels with intersymbol interference. Using Markov test distributions, the authors of [19] and [21] are able to derive, in some specific cases, explicit expressions for the resulting upper bounds on channel capacity.
The main novelty in our work is the DP formulation of the dual capacity upper bound and the use of graph-based test distributions. On the one hand, our formulation is restricted to channels with finite input, output, and state alphabets, but on the other hand, it allows us to use the powerful machinery of DP to at least numerically evaluate the bounds for a large class of FSCs. In some cases, as we will see in Section V, we are even able to convert the numerically evaluated bounds to analytical expressions.

IV. UPPER BOUNDS VIA DP
In this section, we first introduce DP and the Bellman equation. Then, for a fixed graph-based test distribution, we present a DP formulation of the dual capacity upper bound for unifilar and input-driven FSCs. Additionally, we present a simplified DP formulation for the case of unifilar input-driven FSCs, where the state evolves according to s t = f (x t , s t−1 ).

A. DP and the Bellman equation
Here we introduce a formulation for a deterministic average-reward dynamic program. Each DP problem is defined by a quintuple (Z, U, F, P_Z, g). Each action, u_t, takes a value in a compact subset U of a Borel space. We consider a discrete-time dynamical system that evolves according to

z_t = F(z_{t-1}, u_t),  t ∈ N,

where each DP state, z_t, takes values in a Borel space Z. The initial state z_0 is drawn according to the distribution P_Z. The action u_t is selected by a deterministic function μ_t that maps the initial DP state, z_0, into actions. Specifically, given a policy π = {μ_1, μ_2, ...}, actions are generated according to u_t = μ_t(z_0). Accordingly, in this setup, the only randomness is in z_0.
Given a bounded reward function, g : Z × U → R, we aim to maximize the average reward. The average reward for a policy π is defined by

ρ_π = liminf_{n→∞} (1/n) E_π[ Σ_{t=0}^{n-1} g(Z_t, μ_{t+1}(z_0)) ],

where the subscript π indicates that actions are generated by the policy π = (μ_1, μ_2, ...). The optimal average reward is given by

ρ* = sup_π ρ_π.

The following theorem, an immediate consequence of Theorem 6.1 in [25], encapsulates the Bellman equation, which provides a sufficient condition for the optimality of an average reward and a policy.
Theorem 2 (Bellman equation). Given a DP problem as above, if a scalar ρ ∈ R and a bounded function h : Z → R satisfy

ρ + h(z) = sup_{u∈U} [ g(z, u) + h(F(z, u)) ]  for all z ∈ Z,

then ρ = ρ*, the optimal average reward.

Numerical methods for solving the DP problem, such as policy iteration and value iteration, provide very important insights into the solution of the Bellman equation. One may use the approximate solution obtained by these algorithms to generate a conjecture for the exact solution, and use the Bellman equation to verify its optimality.
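As an illustration of the numerical route, the sketch below runs relative value iteration (a standard variant of the value iteration mentioned above) on a toy deterministic DP of our own construction, in which the action directly chooses the next state and staying in state 1 earns reward 2 per step, so ρ* = 2.

```python
def relative_value_iteration(states, actions, F, g, iters=200):
    """Numerically solve rho + h(z) = max_u [g(z,u) + h(F(z,u))] by
    iterating the Bellman operator and renormalizing at a reference state."""
    h = {z: 0.0 for z in states}
    ref = states[0]
    for _ in range(iters):
        Th = {z: max(g(z, u) + h[F(z, u)] for u in actions) for z in states}
        rho = Th[ref] - h[ref]
        h = {z: Th[z] - Th[ref] for z in states}
    return rho, h

# Toy deterministic DP: F(z, u) = u, rewards tabulated below.
g = lambda z, u: {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2}[(z, u)]
rho, h = relative_value_iteration([0, 1], [0, 1], F=lambda z, u: u, g=g)
assert abs(rho - 2.0) < 1e-9
```

The converged pair (ρ, h) satisfies the Bellman equation of Theorem 2 exactly in this discrete example; in the continuous-state DPs of this paper, the same iteration supplies the conjectured form of h that is then verified analytically.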

B. A DP formulation for unifilar FSCs
In this section we introduce a DP formulation of the dual upper bound on the capacity of unifilar FSCs. First, let us present a definition that extends the idea of channel indecomposability (see Definition 1 and its subsequent paragraph) to the notion of a channel and a graph-based test distribution being jointly indecomposable.
Definition 2 (Joint indecomposability). Fix an FSC and a graph-based test distribution on channel output sequences. If for some n ∈ N and each input sequence x n , there exists a choice of s n and q n such that P (s n , q n |x n , s 0 , q 0 ) > 0, for all s 0 , q 0 , then the FSC and test distribution are jointly indecomposable. Note that s n and q n above are allowed to depend on x n .
The following theorem presents an upper bound on the capacity of unifilar FSCs, which is a simplification of the dual capacity upper bound for FSCs when choosing graph-based test distributions on channel outputs.

TABLE I: DP formulation for unifilar FSCs (upper bound on capacity; DP state: Eq. (11); DP state evolution: Eq. (40)).

Theorem 3. For any unifilar FSC and a graph-based test distribution R_{Y|Q} ≻ 0 that are jointly indecomposable, the channel capacity is bounded as follows for any (s_0, q_0).

Remark 1.
In the statement of Theorem 3, the condition that R_{Y|Q} be strictly positive is imposed to simplify the presentation of the result. Indeed, without this restriction, the bound may be infinite, which is still a valid upper bound. However, more importantly, R_{Y|Q} ≻ 0 ensures that the condition P_{Y_t | Y^{t-1}=y^{t-1}, X^t=x^t} ≪ R_{Y_t | Q_{t-1}=q_{t-1}} holds for any q_0, x^t and y^{t-1}, where q_{t-1} = Φ_{q_0}(y^{t-1}). The latter condition ensures that the upper bound does not depend on the choice of the initial state. This point will be addressed precisely in the proof of Theorem 3; see Appendix A-B.
The proof of Theorem 3 is given in Appendix A. We now present the DP formulation of the upper bound in Theorem 3.
Throughout the derivations, a Q-graph and a test distribution, R_{Y|Q}, are fixed. At each time t, let the action be the current channel input, u_t ≜ x_t, which takes values in X. The DP state, z_{t-1}, is defined as it appears in Theorem 3. The reward function is defined by (13). The above formulation is summarized in Table I. We show in Appendix C-A, as part of the proof of Theorem 4, that this constitutes a valid DP. The infinite-horizon average reward of this DP is given by (12). The following theorem summarizes the relation between the upper bound in Theorem 3 and ρ*.
Theorem 4 (DP formulation of the upper bound). The upper bound in Theorem 3 is equal to the optimal average reward in (12). That is, the capacity is upper bounded by ρ * , the optimal average reward of the DP defined above.
The proof of Theorem 4 is given in Appendix C-A.

Special Case: We consider here a special case of the upper bound in Theorem 3 for which the DP formulation simplifies significantly; in particular, it will be shown that both the DP state and the action are discrete. Assume that the channel state evolves according to s_t = f(x_t, s_{t-1}), and use a kth-order Markov Q-graph. For this case, for any s_0 ∈ S, the upper bound in Theorem 3 is simplified to an expression in which R_{Y_t | Y_{t-k}^{t-1}} denotes a kth-order Markov distribution. The simplification follows directly by considering q = y_{t-k}^{t-1} (corresponding to a kth-order Markov Q-graph) in Theorem 3, and observing that z_{t-1}(q, s) can be written as the product term within the parentheses.
For this special case, the DP formulation is the same as that for the unifilar FSC, but the DP state simplifies to z_{t-1} ≜ (x_{t-k}^{t-1}, s_{t-k-1}). Specifically, from (13) and the assumption that s_t = f(x_t, s_{t-1}), it follows that the reward is a function of (x_{t-k}^{t}, s_{t-k-1}). Thus, it is a function of the previous DP state z_{t-1} and the action x_t.
Note that in this formulation the DP state and the action take values from a finite set. Consequently, the numerical evaluation and the subsequent analytical derivation of the solution to the Bellman equation become more tractable.

C. A DP formulation for input-driven FSCs
The following theorem presents an upper bound on the capacity of an input-driven FSC, which is a simplification of the dual upper bound for FSCs when choosing graph-based test distributions.
Theorem 5. For an input-driven FSC and a graph-based test distribution R_{Y|Q} ≻ 0 that are jointly indecomposable, the channel capacity is bounded as follows for any (s_0, q_0).

The proof of Theorem 5 is given in Appendix B. Remark 1 applies also to Theorem 5.

TABLE II: DP formulation for input-driven FSCs (DP state: Eqs. (14)-(15); action: u_t = x_t; reward function: Eq. (16); DP state evolution: Eqs. (44)-(45)).

We now present the DP formulation of the upper bound in Theorem 5; it will be shown that this formulation satisfies the DP properties. Throughout the derivations, a Q-graph and a test distribution, R_{Y|Q}, are fixed. At each time t, let the action be the current channel input u_t ≜ x_t. The DP state is defined as in (14) and (15). The reward function is defined by (16). The above formulation is summarized in Table II. Assuming that it is a valid DP, this formulation implies that the infinite-horizon average reward is given by (17). The following theorem provides the relation between the upper bound in Theorem 5 and ρ*.
Theorem 6 (DP formulation of the upper bound). The upper bound in Theorem 5 is equal to the optimal average reward in (17). That is, the capacity is upper bounded by ρ * , the average reward of the defined DP.
The proof of Theorem 6 is given in Appendix C-B.

V. CAPACITY UPPER BOUNDS FOR SPECIFIC FSCS

In Section IV we presented several DP formulations in which the action space is discrete, while the DP state space might be either discrete or continuous, depending on the channel state evolution and the choice of the test distribution. In the case where both the DP state space and the action space are discrete, numerical methods (such as policy iteration and value iteration) yield sufficient insight to solve the Bellman equation and to extract the function h and the optimal reward ρ analytically. Accordingly, in this case, analytic upper bounds can be easily derived. In general, however, there is no systematic way of analytically determining an h and a ρ that satisfy the Bellman equation. Nevertheless, in this section we present single-letter upper bounds on the capacity of several channels, derived by solving a DP problem with a continuous DP state space while using the insights gained from the numerical methods.
In addition, one of the main challenges here is to choose Q-graphs that will result in tight bounds. To this end, following [44], we create a pool of all valid Q-graphs up to a fixed number of nodes and choose the Q-graphs that result in the best upper bounds. A particular choice of a Q-graph is the kth-order Markov Q-graph which, as will be shown, in some cases provides very good upper bounds.

A. The Trapdoor Channel
The trapdoor channel was introduced by David Blackwell in 1961 [9]. Its operation proceeds as follows: at each time t, the channel input, x_t, is transmitted through the channel while the channel state is s_{t-1}. The channel output, y_t, is equal to s_{t-1} or to x_t, each with probability 1/2. The new channel state is evaluated according to s_t = x_t ⊕ y_t ⊕ s_{t-1}, where ⊕ denotes the XOR operation. Accordingly, the trapdoor channel is a unifilar FSC. An illustration of the trapdoor channel appears in Fig. 1.
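One channel use can be sketched in a few lines; the function name `trapdoor_step` is our own. The invariant checked below is exactly the unifilar update s_t = x_t ⊕ y_t ⊕ s_{t-1}: the "ball" that stays in the channel is the one not emitted.

```python
import random

def trapdoor_step(x, s, rng=random):
    """One use of the trapdoor channel: the output is the previous state or
    the current input with equal probability; the new state is x XOR y XOR s."""
    y = s if rng.random() < 0.5 else x
    return y, x ^ y ^ s

s = 0
for x in [1, 1, 0, 1]:
    s_prev = s
    y, s = trapdoor_step(x, s_prev)
    assert y in (x, s_prev)            # output is either input or old state
    assert s == x ^ y ^ s_prev         # unifilar state update
```

Note that when x = s the step is deterministic (y = x and the state is unchanged), which is the source of the channel's memory.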
The zero-error capacity of the trapdoor channel was found by Ahlswede et al. [10], [11] and is equal to 0.5 bits per channel use. Furthermore, the feedback capacity of this channel was found in [12] to be C_FB = log_2 φ, where φ = (1+√5)/2 is the golden ratio. However, the trapdoor channel was originally introduced as a channel without feedback, and the capacity of this channel in the absence of feedback is still open. The best known lower and upper bounds obtained so far in the literature imply that

0.572 ≤ C_trapdoor ≤ 0.661,     (18)

where the lower bound is derived in [13], and the upper bound is derived in [14]. In the following theorem, we introduce a novel upper bound on the capacity of the trapdoor channel that significantly improves the upper bound in (18).

Theorem 7. The capacity of the trapdoor channel is upper-bounded by

C_trapdoor ≤ log_2(3/2).

The value of log_2(3/2) is approximately 0.5849; combined with the lower bound in (18), our new bounds read 0.572 ≤ C_trapdoor ≤ 0.5849.
The proof of Theorem 7 is presented in Appendix D. It involves analytically solving the Bellman equation (Theorem 2) corresponding to the DP formulation of the bound in Theorem 3 obtained from a graph-based test distribution defined on the Q-graph in Fig. 3. The chosen test distribution, the function h, and the optimal average reward ρ * that are used to solve the Bellman equation are given in the appendix.

B. The Ising Channel
The Ising channel was introduced as an information-theoretic channel by Berger and Bonomi in 1990 [26]. Resembling the well-known physical Ising model, it models a channel with intersymbol interference. The channel operates as follows. At each time t, the channel input, x_t, is transmitted through the channel while the channel state is s_{t-1}. The channel output, y_t, is equal to s_{t-1} or to x_t, each with probability 0.5. The new channel state is s_t = x_t, and therefore the channel is both unifilar and input-driven.
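The transition rule differs from the trapdoor channel only in its state update, which makes the input-driven property visible in code. A minimal sketch (the helper name `ising_step` is our own):

```python
import random

def ising_step(x, s, rng=random):
    """Ising channel: the output equals the current input or the previous
    state with probability 1/2 each; the new state is the input itself."""
    y = x if rng.random() < 0.5 else s
    return y, x                           # input-driven: s_t = x_t

s = 0
for x in [1, 1, 0]:
    y, s_new = ising_step(x, s)
    assert y in (x, s) and s_new == x     # state ignores the output
    s = s_new
```

Since the returned state is x regardless of y, the state sequence is a deterministic function of the inputs alone, which is exactly what the input-driven definition of Section II-B requires.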
The feedback capacity of the Ising channel was shown in [32] to be approximately 0.5755. In the absence of feedback, the capacity is still unknown; the best known lower and upper bounds were recently derived in [45]. In the following theorem, we introduce a novel upper bound on the capacity of the Ising channel that improves upon the upper bound in (19).

Theorem 8. The capacity of the Ising channel is upper-bounded by
where the minimum is over all (a, b, c, d) ∈ (0, 1)^4 that satisfy:

Evaluation of the bound shows that it is approximately equal to 0.5482. Thus, the lower bound in [45] is almost tight. The proof of Theorem 8 is presented in Appendix E, and follows from analytically solving the Bellman equation (Theorem 2) while using a graph-based test distribution that is structured on a Markov Q-graph with k = 3. The upper bound in Theorem 3 can also be evaluated for Markov Q-graphs of higher order. However, given the elegant expression obtained by using Markov Q-graphs of order k = 3 and the minor improvement seen when k is increased, we present only the case of k = 3.

C. The Dicode Erasure Channel
The main objective of this example is to demonstrate that the notion of Q-graphs can indeed be useful in a search for good bounds. We will show that for a simple channel known as the dicode erasure channel (DEC), a small Q-graph outperforms all Markov test distributions up to order 2.
The DEC has been investigated in [23], [28], [46] and is a simplified version of the well-known dicode channel with additive white Gaussian noise (AWGN) used as a model in magnetic recording [47]. Specifically, in response to the input sequence (x_t), the DEC with parameter ε ∈ [0, 1] produces the output sequence (y_t), where

y_t = x_t − x_{t-1} with probability 1 − ε, and y_t = ? (an erasure) with probability ε,

with erasures occurring independently over time. The channel state is the previous input, i.e., s_{t-1} = x_{t-1}, so that the channel is both unifilar and input-driven. The feedback capacity of the DEC was derived in [23]. However, in the absence of feedback, the problem of determining the capacity is still open. In the following theorem, we present an upper bound on the DEC capacity.
The proof of Theorem 9 is given in Appendix F. The bound is obtained by analytically solving the Bellman equation (Theorem 2) corresponding to the DP formulation of the bound in Theorem 3 obtained from a graph-based test distribution defined on the Q-graph in Fig. 4. Surprisingly, although the feedback capacity optimization problem is very different (the optimization is done over input distributions that are conditioned on past outputs), our upper bound coincides with the DEC feedback capacity. Of course, the feedback capacity is always an upper bound on the capacity without feedback, so the dual capacity method does not yield a better upper bound for this channel.
Nonetheless, our approach serves to illustrate another point. Fig. 5 compares the upper bound in Theorem 9 with those obtained by optimizing over first- and second-order Markov test distributions. Since the output alphabet of the DEC is of size 4 (Y = {−1, 0, 1, ?}), the Markov Q-graphs of order k = 1 and k = 2 have 4 and 16 nodes, respectively. Thus, the dual capacity bound obtained using the 3-node Q-graph depicted in Fig. 4 outperforms those obtained from Markov Q-graphs of larger size. Of course, it is possible that higher-order Markov test distributions may yield bounds that improve upon that in Theorem 9, but the problem of optimizing over such test distributions is considerably more complex than that of optimizing over test distributions defined on the Q-graph in Fig. 4. Indeed, exploiting the symmetry between the states Q = 1 and Q = 2 in the latter Q-graph, the optimization over test distributions R_{Y|Q} defined on this graph involves only two free parameters. For the purpose of comparison, we present below a lower bound on the DEC capacity that is obtained by considering first-order Markov input processes.

Theorem 10 ( [28], Ch. 4). The capacity of the DEC with erasure probability ǫ ∈ [0, 1] is lower-bounded by the maximum mutual information rate obtained from first-order Markov input processes, which is given by
The lower bound above is not explicitly stated in [28, Ch. 4], but it can be inferred from the derivations there. For completeness, an alternative proof of this result appears in Appendix G.

D. The POST Channel
The POST channel was introduced in [27] as an example of a channel in which the previous output serves as the next channel state. The channel inputs and outputs are related as follows: at each time t, if x_t = y_{t−1}, then y_t = x_t; otherwise, y_t = x_t ⊕ z_t, where z_t is distributed according to Bern(p). Accordingly, as illustrated in Fig. 6, when y_{t−1} = 0, the channel behaves as a Z channel with parameter p ∈ [0, 1], and when y_{t−1} = 1, it behaves as an S channel with the same parameter p. Here, the new channel state is the channel output, i.e., s_t = y_t, and therefore the POST channel is a unifilar FSC.
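The channel law just described can be simulated in a few lines; the sketch below (function names our own) implements one channel use and a full run.

```python
import random

def post_step(x_t, y_prev, p, rng):
    """One use of the POST(p) channel: if x_t == y_{t-1}, the input
    passes unchanged; otherwise it is XORed with z_t ~ Bern(p)."""
    if x_t == y_prev:
        return x_t
    z_t = 1 if rng.random() < p else 0
    return x_t ^ z_t

def post_simulate(x, p, y0=0, seed=0):
    """Run a binary input sequence through the POST channel; the state
    is the previous output (s_t = y_t), so the channel is unifilar."""
    rng = random.Random(seed)
    y_prev, out = y0, []
    for x_t in x:
        y_t = post_step(x_t, y_prev, p, rng)
        out.append(y_t)
        y_prev = y_t
    return out
```

When y_{t−1} = 0, an input x_t = 0 passes cleanly while x_t = 1 is flipped with probability p, i.e., a Z channel, matching Fig. 6.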
The capacity of the POST channel was found in [27]. Here we give an alternative proof of the converse, i.e., an upper bound matching the capacity expression given in [27].  The bound is proved by solving the DP formulation of the upper bound in Theorem 3 obtained from a graph-based test distribution defined on the Markov Q-graph depicted in Fig. 3. The proof is given in Appendix H.

VI. CONCLUSIONS
In this paper, upper bounds on the capacity of FSCs were derived. First, we used the dual capacity bounding technique with graph-based test distributions to derive a multi-letter upper bound on the capacity. We then showed that this upper bound can be formulated as a DP problem, and therefore the bound is computable. For several channels, we were able to solve the DP problem explicitly, and we presented several results, including novel upper bounds on the capacity of the trapdoor and Ising channels. Further, our results for the DEC demonstrate the value of introducing Q-graphs and the accompanying graph-based distributions as a generalization of Markov distributions. An interesting direction for future research is to address the complexity of finding a good Q-graph by using efficient reinforcement learning tools to evaluate the dynamic program [48]. Such an integrated approach to computing upper bounds is not limited to the channels studied in this paper, and should apply to any channel that admits a duality bound, e.g., channels with feedback [49] and channels with constrained inputs [19].

APPENDIX A DERIVATION OF THE UPPER BOUND - PROOF OF THEOREM 3
We provide here a complete proof of the upper bound in Theorem 3. We start with the relative entropy term in the bound on I(X^n; Y^n) in (7). For any initial pair (s_0, q_0), consider

D(P_{Y^n | X^n = x^n, s_0} ‖ R_{Y^n | Q_0 = q_0})
  = Σ_{y^n} P(y^n | x^n, s_0) log_2 [ P(y^n | x^n, s_0) / R(y^n | q_0) ]
  = Σ_{y^n} P(y^n | x^n, s_0) Σ_{j=1}^{n} log_2 [ P(y_j | x^n, y^{j−1}, s_0) / R(y_j | y^{j−1}, q_0) ]
  (a)= Σ_{j=1}^{n} Σ_{y^{j−1}} P(y^{j−1} | x^{j−1}, s_0) Σ_{y_j} P(y_j | x^j, y^{j−1}, s_0) log_2 [ P(y_j | x^j, y^{j−1}, s_0) / R(y_j | y^{j−1}, q_0) ],

where (a) follows by exchanging the order of summation and computing the marginal distributions, (b) follows by identifying the relative entropy, (c) follows from the fact that the pair (q_{j−1}, s_{j−1}) is a deterministic function of (x^{j−1}, y^{j−1}, s_0, q_0), (d) follows from the unifilar property and the fact that q_{j−1} is a deterministic function of (y^{j−1}, q_0), and, finally, (e) follows since the divergence does not depend on y^{j−1}. By taking the maximum over x^n and dividing the term in (22) by n, we obtain the bound in Theorem 3 by way of (7); the existence of the limit, for any (s_0, q_0), is shown next.
Let us define the quantity c(x^n, s_0, q_0) as in (24). We will argue that lim_{n→∞} max_{x^n} c(x^n, s_0, q_0) exists for any (s_0, q_0) and that, in fact, the limit does not depend on the particular choice of (s_0, q_0). To this end, define

\underline{C}_n = max_{x^n} min_{s_0, q_0} c(x^n, s_0, q_0),   (25)
\overline{C}_n = max_{x^n} max_{s_0, q_0} c(x^n, s_0, q_0).   (26)

For any fixed choice of (s_0, q_0), we clearly have \underline{C}_n ≤ max_{x^n} c(x^n, s_0, q_0) ≤ \overline{C}_n. In Appendix A-A, we show that lim_{n→∞} \underline{C}_n exists, and in Appendix A-B, we show that this limit in fact equals lim_{n→∞} \overline{C}_n. The desired conclusion then follows by a sandwich argument.

A. Existence of lim_n \underline{C}_n
We want to show that lim_{n→∞} \underline{C}_n exists. The basic idea of the proof is to show that the sequence n\underline{C}_n is super-additive: a sequence (a_n) is super-additive if, for any two positive integers m, k, it satisfies a_{m+k} ≥ a_m + a_k. By Fekete's lemma [50], for such a sequence, the limit lim_{n→∞} a_n/n exists and equals sup_n a_n/n. Let m and k be two positive integers such that m + k = n. Let x̂^m and x̂^k be the input sequences that achieve the maxima defining \underline{C}_m and \underline{C}_k, respectively, and let x̂^n be their concatenation. Since, in general, x̂^n is not necessarily the input sequence that achieves n\underline{C}_n, we have the inequality in (27), where (a) follows from min_t [f(t) + g(t)] ≥ min_t f(t) + min_t g(t). We now show that the second term in (27) is at least k\underline{C}_k; here, (a) follows from the Markov chain established in Lemma 4 (see Appendix A-C). Furthermore, since taking the minimum over s_0 and q_0 in (27) does not affect the inequality, we conclude that n\underline{C}_n ≥ m\underline{C}_m + k\underline{C}_k. Therefore, n\underline{C}_n is indeed a super-additive sequence, which implies that the limit lim_n \underline{C}_n exists.
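Fekete's lemma is easy to illustrate numerically. The toy sequence below is our own construction (not from the paper); it is super-additive by design, so a_n/n converges to its supremum:

```python
import math

c = 0.5849  # stand-in constant for the limiting rate
a = lambda n: n * c - math.log2(n + 1)

# Super-additivity: a_{m+k} >= a_m + a_k reduces here to
# log2((m+1)*(k+1)) >= log2(m+k+1), i.e., m*k >= 0.
for m in range(1, 40):
    for k in range(1, 40):
        assert a(m + k) >= a(m) + a(k) - 1e-12

# a_n / n climbs toward its supremum c, as Fekete's lemma predicts.
ratios = [a(n) / n for n in (10, 100, 1000, 10000)]
assert all(r1 <= r2 for r1, r2 in zip(ratios, ratios[1:]))
assert ratios[-1] < c
```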

B. Equality of limits
The following lemma is the main result of this section.

Lemma 1. If an FSC and a graph-based test distribution with R_{Y|Q} ≻ 0 are jointly indecomposable, then lim_{n→∞} \underline{C}_n = lim_{n→∞} \overline{C}_n.
Before providing the proof of Lemma 1, we present a technical result.

Lemma 2. Let Y ∈ Y and Z ∈ Z be two arbitrary random variables such that, for any z ∈ Z, P_{Y|Z=z} ≪ R_Y. Then,

D(P_Y ‖ R_Y) ≤ D(P_{Y|Z} ‖ R_Y | P_Z) ≤ D(P_Y ‖ R_Y) + log_2 |Z|.

Proof of Lemma 2. We bound the difference as follows: D(P_{Y|Z} ‖ R_Y | P_Z) − D(P_Y ‖ R_Y) = I(Y; Z), which is nonnegative and at most log_2 |Z|.

Proof of Lemma 1. This proof follows the main idea of Gallager's proof in [8, Theorem 4.6.4].
From (22), (25) and (26), we work with the normalized divergences for each n. Let x^n and (s_0, q_0) be the input sequence, the initial state, and the initial node that maximize D(P_{Y^n|X^n=x^n, s_0} ‖ R_{Y^n|q_0}), and let (s̃_0, q̃_0) denote the initial state and node that minimize this divergence for the same input sequence x^n. Then, by the definitions of \underline{C}_n and \overline{C}_n, it follows that

n(\overline{C}_n − \underline{C}_n) ≤ D(P_{Y^n|X^n=x^n, s_0} ‖ R_{Y^n|q_0}) − D(P_{Y^n|X^n=x^n, s̃_0} ‖ R_{Y^n|q̃_0}).

Let m and k be two positive integers such that m + k = n. Using the chain rule for relative entropy, we can split the divergence as in (30). Now, the condition R_{Y|Q} ≻ 0 assumed in the statement of the lemma ensures that each per-letter divergence is bounded by a finite constant M_1 for any 1 ≤ i ≤ n. As a consequence, the first relative entropy term in (30) is bounded by kM_1. Furthermore, by Lemma 2, the second relative entropy term in (30) changes by at most log_2(|S|) when conditioning on S_k. In a similar manner, n\underline{C}_n can be written as in (30), this time with (s̃_0, q̃_0) in place of (s_0, q_0); the first term is lower-bounded by 0, and here, too, by Lemma 2, conditioning on S_k changes the second term by at most log_2(|S|). In the resulting bound, (a) follows since Q_k = Φ(Y^k), and (b) follows by observing that the conditioning on Y^k can be dropped due to the channel definition, and since R(y^n_{k+1} | y^k, q_k, q_0) = R(y^n_{k+1} | q_k, q_0). Here, too, there exists a finite integer M_2 bounding the remaining terms. To further upper-bound the difference, we use Lemma 3, which implies that d_k tends to zero as k grows. Accordingly, for any ǫ > 0, we can choose k so that d_k ≤ ǫ, and for such a k the difference \overline{C}_n − \underline{C}_n vanishes as n → ∞ up to a term proportional to ǫ. Since ǫ > 0 is arbitrary and \overline{C}_n ≥ \underline{C}_n, the proof is complete.

Lemma 3. Consider an FSC and a graph-based test distribution that are jointly indecomposable. Then, for any ǫ > 0, there exists an N such that, for all n ≥ N,

|P(s_n, q_n | x^n, s_0, q_0) − P(s_n, q_n | x^n, s̃_0, q̃_0)| ≤ ǫ

for all s_n, q_n, x^n, s̃_0, q̃_0, s_0, and q_0.
Proof of Lemma 3. Since the FSC and the graph-based test distribution are jointly indecomposable, by Definition 2 there exist a fixed n and, for each input sequence x^n, a choice of (s_n, q_n) such that P(s_n, q_n | x^n, s_0, q_0) > 0 for all (s_0, q_0).
In [8, Theorem 4.6.3], Gallager provides a sufficient condition for an FSC to be indecomposable, i.e., for property (2) to hold. Following his proof with an appropriate modification, we obtain that condition (34) is sufficient for condition (33) to hold. In particular, the modification consists of replacing the state s_n by the pair (s_n, q_n), and the initial state s_0 by the pair (s_0, q_0).
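For small alphabets, the modified condition can be checked by brute force. The sketch below is our own formulation, not the paper's: `P_y(y, x, s)` returns the channel probability P(y | x, s), `f` is the unifilar state update, and `phi` is the Q-graph update.

```python
from itertools import product

def jointly_indecomposable(S, Q, X, Y, P_y, f, phi, n_max=4):
    """Brute-force check: is there an n <= n_max such that, for every
    input sequence x^n, some pair (s_n, q_n) has positive probability
    starting from every initial pair (s_0, q_0)?"""
    pairs = list(product(S, Q))
    for n in range(1, n_max + 1):
        for xs in product(X, repeat=n):
            # reach[p0] = set of (s, q) pairs reachable after xs from p0
            reach = {p0: {p0} for p0 in pairs}
            for x in xs:
                for p0 in pairs:
                    reach[p0] = {(f(x, s, y), phi(q, y))
                                 for (s, q) in reach[p0]
                                 for y in Y if P_y(y, x, s) > 0}
            if not set.intersection(*reach.values()):
                break  # this input sequence fails; try a larger n
        else:
            return True  # every x^n admits a common reachable pair
    return False
```

A single-state channel passes trivially, while a channel whose state never changes (s_t = s_0 for all t) fails for every n.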

C. Proof of the Markov chains
We now show the Markov chains that were required in the proof.
Lemma 4. For any FSC, the following Markov chains hold for t ≥ m + 1:

P(s_t, q_t | x^t, q_m, s_m, s_0, q_0) = P(s_t, q_t | x^t_{m+1}, q_m, s_m),
P(s_m, q_m | x^{t−1}, s_0, q_0) = P(s_m, q_m | x^m, s_0, q_0).

Proof of Lemma 4. For the first Markov chain, consider the decomposition in (35), where (a) follows since q_t is a deterministic function of q_m and the output sequence y^t_{m+1}. Further, P(y^t_{m+1}, s^t_{m+1} | x^t, q_m, s_m, s_0, q_0) can be factored as in (36), where (a) follows by the chain rule and (b) follows by the definition of an FSC. From (36), we observe that P(y^t_{m+1}, s^t_{m+1} | x^t, q_m, s_m, s_0, q_0) does not depend on (x^m, s_0, q_0), and therefore, from (35), neither does P(s_t, q_t | x^t, q_m, s_m, s_0, q_0).
For the second Markov chain, P(y^m, s_m | x^{t−1}, s_0, q_0) can be factored as in (38), where (a) follows by the chain rule and (b) follows by the definition of an FSC. From (38), we observe that P(y^m, s_m | x^{t−1}, s_0, q_0) does not depend on x^{t−1}_{m+1}, and therefore, from (37), neither does P(s_m, q_m | x^{t−1}, s_0, q_0).

APPENDIX B UPPER BOUND FOR THE INPUT-DRIVEN FSC (THEOREM 5)
Proof. The proof follows the same main steps as the proof of Theorem 3, but here we consider input-driven FSCs. Let us find an expression equivalent to the conditioned version of the relative entropy term in (7). For any initial pair (s_0, q_0), we have a chain of equalities in which step (a) follows by computing the marginal distributions, exchanging the order of the summations, and identifying the relative entropy, step (b) follows from the fact that q_{j−1} is a deterministic function of (y^{j−1}, q_0), step (c) follows from the input-driven FSC law, and step (d) follows since the divergence does not depend on y^{j−1}. Therefore, for any (s_0, q_0), we conclude the bound stated in Theorem 5, where (a) follows from the dual upper bound for FSCs and (39), and z_{j−1} and γ_{j−1} are defined accordingly. The proofs of the existence of the limit and of the independence from the initial state are omitted, as they follow the same steps taken for the unifilar FSC in Appendix A.

APPENDIX C DP FORMULATIONS (THEOREMS 4 AND 6)

A. DP formulation for unifilar FSCs (Theorem 4)
In this section, we prove Theorem 4 on the formulation of the upper bound in Theorem 3 as a dynamic program. The proof has three technical parts: the first two verify that the DP is well defined, and the last relates the average reward of the DP to the upper bound in Theorem 3. These are summarized in the following lemma.

Lemma 5. 1) The reward is a time-invariant function of the DP state and action.
2) The DP state is a deterministic function of the previous DP state and action.

3) The limit and the maximization in the upper bound in Theorem 3 can be exchanged, where c(x^n, s_0, q_0) is the quantity defined in (24).

Since we showed that the upper bound is independent of the initial state, we conclude from the third item that C ≤ ρ*.
Proof of Lemma 5. 1) Recall the definition of the reward function. For a fixed FSC and test distribution, it is readily seen that the reward is a function only of the previous DP state z_{t−1} and the action u_t = x_t.
2) Let us first derive a recursive relation between z_t, evaluated at the coordinates (q_t, s_t), and the previous DP state z_{t−1}, as in (40), where (a) follows from the Markov chain in Lemma 4, and (b) follows from the channel and the Q-graph definitions. From (40), it is clear that z_t is a function of z_{t−1} and the action x_t.
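The recursion in item 2) can be written out generically. In the sketch below (our notation, not the paper's), `P_y(y, x, s)` is the channel law, `f` the unifilar state update, and `phi` the Q-graph update; z is a dict over (q, s) pairs.

```python
def next_dp_state(z, x, Y, P_y, f, phi):
    """One step of the DP-state recursion for a unifilar FSC:
    given z[(q, s)] = P(q_{t-1}, s_{t-1} | x^{t-1}, s_0, q_0) and the
    action x = x_t, return the updated pmf over (q_t, s_t)."""
    z_next = {}
    for (q_prev, s_prev), w in z.items():
        if w == 0.0:
            continue
        for y in Y:
            p = P_y(y, x, s_prev)          # P(y_t | x_t, s_{t-1})
            if p > 0.0:
                key = (phi(q_prev, y), f(x, s_prev, y))
                z_next[key] = z_next.get(key, 0.0) + w * p
    return z_next
```

Since x_t is a deterministic action, no normalization is needed: the entries of z_next automatically sum to one.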
3) The main idea is to prove the equality by establishing the two corresponding inequalities. The first inequality is shown by a chain in which (a) follows from Fekete's lemma (see Appendix A-A, where it is shown that the sequence n\underline{C}_n is super-additive). We now show the reverse inequality. Using the notation and the main result of Appendix A-A, the existence of lim_{n→∞} \underline{C}_n implies that, for any ǫ > 0, there exists an N(ǫ) such that (42) holds for all k > N(ǫ). Fix k > N(ǫ), and let x̂^k be the input sequence that achieves the maximum. Define x̂^∞ = {x̂_t}_{t=1}^∞ as the infinite sequence composed of repeated concatenations of the sequence x̂^k, and consider the following chain of inequalities, in which (a) follows by considering the sequence x̂^∞, which is not necessarily the input sequence that achieves the maximum, (b) follows since k is fixed and the divergence is bounded, so the residual term vanishes when n is rounded down to k⌊n/k⌋, (c) follows from taking the minimum at the beginning of each kth block, i.e., min_t Σ_i f_i(t) ≥ Σ_i min_t f_i(t), together with the fact that x̂^∞ is a repetition of the same sequence x̂^k, and, finally, (d) follows from (42).
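When the Bellman equation of Theorem 2 cannot be solved analytically, the optimal average reward ρ* of such a DP can be approximated numerically, for instance by relative value iteration over a discretized state space. The sketch below is a generic illustration for deterministic dynamics z' = F(z, u), not the paper's procedure:

```python
def relative_value_iteration(states, actions, F, g, iters=1000):
    """Estimate the optimal average reward rho* of an average-reward DP
    with deterministic dynamics F(z, u) and per-stage reward g(z, u),
    by iterating the Bellman operator and re-centering at a reference
    state so that the relative value function h stays bounded."""
    h = {z: 0.0 for z in states}
    ref = states[0]
    rho = 0.0
    for _ in range(iters):
        h_new = {z: max(g(z, u) + h[F(z, u)] for u in actions)
                 for z in states}
        rho = h_new[ref]                      # current estimate of rho*
        h = {z: v - rho for z, v in h_new.items()}
    return rho
```

On a toy two-state DP where staying in the current state earns reward 1, the iteration returns ρ* = 1, as expected.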

B. DP formulation for input-driven FSCs (Theorem 6)
In this section, we prove Theorem 6 on the formulation of the upper bound in Theorem 5 as a dynamic program. Similarly to Lemma 5, the proof consists of three technical parts that are summarized in the following lemma.
Here, also, the upper bound is independent of the initial state. Therefore, we can conclude from the third item that C ≤ ρ * .
Proof of Lemma 6. 1) Recall the definition of the reward function in (16). Since z_{t−1} = (β_{t−1}, γ_{t−1}), this item follows directly from that definition. 2) Let us first derive a recursive relation between z_t = (β_t, γ_t) and the previous DP state z_{t−1}. In particular, β_t is computed as in (44), where (a) follows from the Markov chain in Lemma 4, and (c) follows from the channel characteristics and the Q-graph definition. Furthermore, γ_t is computed as in (45), where (a) follows from the Markov chain S_{t−1} − (X^{t−1}, S_0) − X_t and the input-driven FSC definition in (4). From (44) and (45), it is clear that β_t and γ_t are functions of the previous DP state z_{t−1} and the action x_t.
3) The proof of this item is omitted as it follows from the same steps taken for unifilar FSCs in Appendix C-A.

APPENDIX D TRAPDOOR CHANNEL - PROOF OF THEOREM 7
Proof. The proof is based on the Markov Q-graph from Fig. 3 and on the following optimized graph-based test distribution. Since the trapdoor channel is a unifilar FSC, we define z as a pmf on Q × S that corresponds to the DP state in Section IV-B. In particular, z consists of four elements, indexed as z_{q,s} with z_{q,s} = P(q, s); to simplify notation, we exploit in the calculation below the relation among these four components. Recall that, to solve the Bellman equation (Theorem 2), one should identify a scalar ρ and a function h : Z → R such that the Bellman equation holds. In the following, we show that ρ* = log_2(3/2), and that the function

h*(z) = z_{1,0}, if z_{0,1} ≤ z_{1,0},
h*(z) = z_{0,1}, if z_{0,1} > z_{1,0},

satisfies the Bellman equation.
We now verify that the assumption made in (47) holds. The relevant expression is nonnegative for all z_{1,0} ≥ z_{0,1}, and therefore, in this region, u = 0 is indeed the optimal action. Similarly, it can be verified that, for all z_{1,0} < z_{0,1}, u = 1 is the optimal action. We therefore conclude that ρ* = log_2(3/2) is indeed the optimal average reward.

APPENDIX E ISING CHANNEL - PROOF OF THEOREM 8
Proof. The proof is based on a Markov Q-graph with k = 3. Recall that, for the Ising channel, the state evolves as s_t = x_t; therefore, we can use the simplified DP formulation presented in Section IV-B. The proof of the bound is based on the following graph-based test distribution. According to the DP formulation, the next DP state is computed as F(z, u) = [z_1, z_2, z_3, u], and the reward function is defined accordingly. According to Theorem 2, if we identify a scalar ρ and a bounded function h(z) such that the Bellman equation (48) holds for all z ∈ Z, then ρ = ρ*. In the following, we exhibit such a ρ and show that the function h*(z) defined below solves (48).
Let us assume that the optimal policy, under the constraints given in (20), is given by (51), where ⊕ denotes the XOR operation. The policy in (51) is obtained by optimizing the DP and extracting the relation between the optimal policy and the DP state. Assuming (51), it can be verified that (48) is satisfied with the above choices of ρ* and h*(z). Here, we verify this only for z = [0, 0, 0, 0]; the verification for the other states is similar. The left-hand side of the Bellman equation is evaluated directly, while the right-hand side equals g(0, 0, 0, 0, 1) + h(0, 0, 0, 1), where (a) follows from (51); therefore, the Bellman equation holds for z = [0, 0, 0, 0]. It remains to verify that the policy suggested in (51) is indeed optimal under the constraints given in (20). Here, too, we verify this only for z = [0, 0, 0, 0], as the verification for the other states can be done similarly.

APPENDIX F DEC - PROOF OF THEOREM 9
Proof. The proof is based on the Q-graph depicted in Fig. 4 and on the following graph-based test distribution, where p ∈ [0, 1], the rows correspond to Q = 1, 2, 3, and the columns correspond to Y = −1, 0, 1, ?, in that order. Note that some entries of the test distribution are equal to zero, and therefore the condition R_{Y|Q} ≻ 0 in Theorem 3 does not hold. However, it can easily be verified that the condition in Remark 1 holds; this is mainly because, when Q = 1, the previous state must equal 0, and when Q = 2, the previous state must equal 1. We omit the details of this verification.
Using the above choice of a test distribution, one can show that the Bellman equation holds. However, since the upper bound is exactly equal to the feedback capacity, and C ≤ C FB (where C FB denotes the feedback capacity), we will not provide here the proof that the Bellman equation holds. It will only be shown that the resultant upper bound expression in Theorem 9 is equal to the feedback capacity [23].
The feedback capacity of the DEC is given by the maximization of G(p, ǫ) over p. Straightforward calculation shows that the derivative of G(p, ǫ) with respect to p equals zero iff (53) holds. Therefore, C_FB = G(p*, ǫ), where p* = arg max_{p∈[0,1]} G(p, ǫ).
Using simple algebra, it can be further verified that (53) holds iff 2p = p ǫ . Hence, p * is the solution p of the equation 2p = p ǫ .

APPENDIX G DEC - PROOF OF THEOREM 10
Proof. The basic idea of the lower-bound proof is to consider input sequences restricted to a first-order Markov process, as in (54). In the following, we denote by P_markov the set of all distributions satisfying (54). The capacity of the DEC is then lower-bounded as

C_DEC ≥ lim_{n→∞} max_{P(x^n) ∈ P_markov} (1/n) I(X^n; Y^n | S_0 = s_0),   (55)

for any s_0 ∈ S. Based on the channel symmetry, we consider the symmetric first-order Markov input distribution with parameter a ∈ [0, 1]. In the following, we evaluate the mutual information in (55) explicitly, where (a) follows from the Markov chain Y_i − (X_i, X_{i−1}) − (X^{i−2}, X^n_{i+1}, Y^{i−1}) and the channel law. To find H(Y_i | Y^{i−1}, S_0 = s_0), we calculate the probabilities P(y_i | y^{i−1}, s_0). First, let us find the distribution P(x_i = 0 | y^i, s_0) for any possible output sequence y^i; we will show that this distribution induces the graph depicted in Fig. 7. For any output sequence y^{i−1},

P(x_i = 0 | y_i = −1, y^{i−1}, s_0) = 1,   (57)
P(x_i = 0 | y_i = 1, y^{i−1}, s_0) = 0,   (58)

where (57) follows since the channel output is y_i = −1 iff x_i = 0, and (58) follows since the channel output is y_i = 1 iff x_i = 1. Further,

P(x_i = 0 | y_i = 0, y^{i−1}, s_0) = [ Σ_{x_{i−1}} P(x_{i−1} | y^{i−1}, s_0) P(x_i = 0 | x_{i−1}) P_{Y|X,S}(0 | 0, x_{i−1}) ] / P(y_i = 0 | y^{i−1}, s_0).   (59)

Based on (57)-(60), we now show that the probability P(x_i = 0 | y^i, s_0) induces the graph depicted in Fig. 7. Equations (57) and (58) imply that, for any possible output sequence y^{i−1}, if y_i = 1 or y_i = −1, then P(x_i = 0 | y^i) equals 0 or 1, respectively. Therefore, each node of the graph in Fig. 7 has an outgoing edge labeled y = −1 to Q = A_0 and an outgoing edge labeled y = 1 to Q = B_0. Equation (59) implies that each node of the graph has a self-loop labeled y = 0. Finally, (60) implies that, if the current output is y_i = ?, then there is an outgoing edge labeled y = ? to the next node of the graph, as depicted in Fig. 7.
Note that the induced graph contains an infinite number of nodes.
To conclude, given an initial node q_0, there exists a unique mapping Φ_{q_0} : Y^i → Q from an output sequence y^i to a node of the induced graph. Therefore, the equality P(x_i = 0 | y^i, s_0) = P(x_i = 0 | q_i, s_0) holds, where q_i = Φ_{q_0}(y^i). Accordingly, using (57)-(60), it follows that, for q ∈ N ∪ {0}, P(x_i = 0 | y^i, s_0) = α_q, where

α_q = (1 + (2a − 1)^q) / 2.

We then calculate P(y_i | y^{i−1}, s_0) for any possible output sequence y^i. To find the stationary distribution induced by the graph, we first calculate the transition probability P_{Q|Q^−} as

P(q_t | q_{t−1}) = Σ_{x_{t−1}, x_t, y_t} P(x_{t−1} | q_{t−1}) P(x_t | x_{t−1}) P(y_t | x_t, x_{t−1}) P(q_t | q_{t−1}, y_t).
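The closed form α_q = (1 + (2a − 1)^q)/2 can be verified against the posterior recursion: after y = −1 the posterior P(x_i = 0 | y^i) equals 1, each erasure maps α to (1 − a) + (2a − 1)α under the symmetric Markov input, and y = 0 leaves the posterior unchanged. A quick numerical check (the parameter value is our choice):

```python
a = 0.3  # P(x_i = x_{i-1}), the Markov input parameter

def alpha(q):
    return (1 + (2 * a - 1) ** q) / 2

# Start at node A_0 (after y = -1 we know x_i = 0, posterior 1 = alpha(0));
# each erasure advances one node along the induced graph.
post = alpha(0)
for q in range(1, 12):
    post = (1 - a) + (2 * a - 1) * post  # posterior update on y = '?'
    assert abs(post - alpha(q)) < 1e-12

# An output y = 0 reveals x_i = x_{i-1}, scaling both hypotheses by the
# same factor a, so the posterior (and hence the node) is unchanged.
```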
Based on the graph symmetry and simple algebra, it follows that the stationary distribution induced by the transition probability P_{Q|Q^−} has a geometric form in the node index q ∈ N ∪ {0}, with a normalizing constant k ∈ [0, 1]. Recalling that the entries of the stationary distribution must sum to 1, and using, in step (a), the formula for a geometric series with common ratio ǫ/(1 − aǭ), we obtain

k = āǭ / (2(1 − aǭ)).
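The closed form for k can be sanity-checked numerically, assuming (as the symmetry argument suggests) that the two node families A_q and B_q each carry stationary mass k·(ǫ/(1 − aǭ))^q; the parameter values below are our choice.

```python
a, eps = 0.3, 0.4              # Markov input parameter, erasure probability
abar, ebar = 1 - a, 1 - eps

r = eps / (1 - a * ebar)       # common ratio of the geometric series
k = abar * ebar / (2 * (1 - a * ebar))

# Two symmetric node families, each with mass k * r**q, must sum to 1.
total = 2 * k * sum(r ** q for q in range(10000))
assert abs(total - 1.0) < 1e-9

# Equivalently, normalization forces k = (1 - r)/2, and indeed
# (1 - r)/2 = (1 - a*ebar - eps)/(2*(1 - a*ebar)) = abar*ebar/(2*(1 - a*ebar)).
assert abs(k - (1 - r) / 2) < 1e-12
```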

APPENDIX H POST CHANNEL - PROOF OF THEOREM 11
Proof. The proof is based on the Markov Q-graph depicted in Fig. 3 and the following optimized graph-based test distribution. Define z as the pmf on Q that corresponds to the DP state. To simplify the notation, we denote K ≜ (1 + p̄ p^{p/p̄})^{−1}, where p̄ = 1 − p. Further, since the vector z consists of only two components that sum to one, we take the DP state to be the first component alone and denote it by z.
According to the DP formulation, the next DP state is computed as