On Calculating Sakrison's Rate Distortion Function for Classes of Parameterized Sources

Abstract-Sakrison extended Shannon's notion of the rate distortion function to parameterized classes of sources by taking a minimax approach and defining a measure of the minimum rate required for information reconstruction subject to a prescribed fidelity level D. Unfortunately, calculation of Sakrison's rate distortion function may be very difficult because analytic solutions do not generally exist and there has been a lack of a constructive method for finding the rate. However, an approach presented in this correspondence may be used to calculate an approximation to Sakrison's rate distortion function for classes of sources with a finite, discrete input space and a continuous parameter space. The approach gives rise to an algorithm which is shown to be convergent, and numerical examples are studied.


Laurence B. Wolfe

Index Terms-Rate distortion function, source matching, relative entropy.

I. INTRODUCTION
Sakrison in [6] extended the notion of a rate distortion function to classes of sources. Shannon [7] had previously defined the rate distortion function R_θ(D) for an individual source θ, which measures the minimum amount of information that must be preserved by any code to allow reproduction of the compressed data with average distortion less than or equal to a given D. According to Sakrison [6], the rate distortion function R_Λ(D) for a class of sources with compact parameter space Λ may be defined as the supremum over all rate distortion functions R_θ(D) in the class. However, calculation of R_Λ(D) for a continuous Λ is not generally tractable because any solution would involve an uncountable number of integrals.
Apparently, Sakrison's R_Λ(D) can be approximated for continuous Λ by performing an exhaustive search after selecting, for some M, a finite set of parameters {θ_i}_{i=1}^M to represent the class of sources with finite, discrete input spaces, where θ_i ∈ Λ. In fact, it will be shown later in this correspondence that under some general conditions and a specific value of M, there exists an optimal set {θ_i*}_{i=1}^M yielding the most accurate approximation to Sakrison's R_Λ(D). However, finding the {θ_i*}_{i=1}^M is not trivial because it would involve computation of an infinite number of integrals, and there is a lack of a general computational method. Therefore, we seek a procedure for selecting a set {θ_j}_{j=1}^N which contains a subset that accurately approximates {θ_i*}_{i=1}^M for some M ≤ N. We present such a procedure in this correspondence and employ an approach which selects {θ_j}_{j=1}^N from Λ by using relative entropy to group, within N subclasses, all sources in Λ whose entropies are within a previously assigned level of similarity. In this correspondence, preliminary information is reviewed in Section II and an approach for calculating Sakrison's [6] rate distortion function is presented in Section III. An example is studied in Section IV, and conclusions are in Section V.

II. PRELIMINARIES
The notation follows Sakrison [6], and H(Q; q) is the relative entropy, which measures the similarity (or difference) between probability measures Q and q over the reproduction space.
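For concreteness, the relative entropy between two discrete distributions can be computed directly. The following is a minimal sketch; the function name and the use of natural logarithms are our illustrative choices, with the argument order matching the H(Q; q) notation above.

```python
import math

def relative_entropy(q, p):
    """H(q; p): relative entropy (Kullback-Leibler divergence) in nats
    between discrete distributions given as equal-length lists.
    Assumes p is nonzero wherever q is nonzero."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:
            total += qi * math.log(qi / pi)
    return total
```

Note that relative entropy is not symmetric in its arguments, which is why the correspondence speaks of a measure of similarity rather than a distance.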
Apparently, analytic solutions to (1) do not exist in general, and a minimization technique must be used to find a solution. One such technique was given by Blahut [1] and is briefly reviewed here because it is used in Sections III and IV. Blahut's algorithm formulates an associated minimization problem in terms of a Lagrange multiplier s, the slope of the rate distortion function, where Q(D) is a compact space of channel transition matrices within which the minimum is achieved. Using this parametric form, Blahut formulated a double minimum problem by defining F(p^θ, Q_{k|j}, q), with the result that

1) R_θ(D) = sD + min_{Q, q} F(p^θ, Q_{k|j}, q);

2) for a fixed Q_{k|j} in Q(D), F(p^θ, Q_{k|j}, q) is minimized by q_k = Σ_j p_j^θ Q_{k|j};

3) for a fixed q, F(p^θ, Q_{k|j}, q) is minimized by Q_{k|j} = q_k exp(s d_{jk}) / Σ_{k′} q_{k′} exp(s d_{jk′}).
The results show that Blahut's calculation of Shannon's R_θ(D) must depend on a parameter s. However, in order to extend Shannon's rate distortion function to a class of sources Λ, Sakrison [6] considered the mutual information I(p^θ, Q) for each θ ∈ Λ and defined the extended rate distortion function for a class of sources Λ as

R_Λ(D) = sup_{θ∈Λ} inf_{Q∈Q(D)} I(p^θ, Q).   (11)

Sakrison [6] also proved that if the parameter space Λ is compact, then

R_Λ(D) = inf_{Q∈Q(D)} sup_{θ∈Λ} I(p^θ, Q).   (12)

Apparently, (11) and (12) show that calculating R_Λ(D) is equivalent to solving a minimax problem. Although such problems are generally difficult to solve, the well-known minimax theorem as given in [4] shows that a solution may be obtained by finding a least favorable distribution and the corresponding Bayes risk. Using (12), consider a Bayes problem with a given prior distribution π(θ) over Λ and a weighted mutual information function

T(π, Q) = ∫_Λ I(p^θ, Q) dπ(θ).   (13)

Following a source-matching approach presented by Davisson et al. in [3] and combining (12) and (13), it can be shown that a solution is given by

R_Λ(D) = sup_{π∈Ξ} inf_{Q∈Q(D)} T(π, Q),   (14)

where Ξ is the class of all prior distributions defined on Λ. In concept, an uncountable number of solutions of (11), (12), and (14) would be required, one for each θ. Furthermore, analytic solutions for R_Λ(D) are not generally available, and calculating R_Λ(D) may be very difficult because of the lack of a general computational approach. However, an approach presented in the next section may be used to calculate an approximation to R_Λ(D).

III. AN APPROACH FOR CALCULATING R_Λ(D)
The approach comes from the recognition that Sakrison's rate distortion function R_Λ(D) for a continuous class of sources Λ can be approximated by performing an exhaustive search after selecting, for some M, a finite set of parameters {θ_i}_{i=1}^M to represent the class, where θ_i ∈ Λ. We consider only classes with a finite, discrete source space A of size J. Under certain conditions, specified later, it is shown in this section that for a specific value of M, there exists a set {θ_i*}_{i=1}^M which yields the most accurate approximation to Sakrison's R_Λ(D). However, finding the {θ_i*}_{i=1}^M is not trivial because of the lack of a general computational method. Therefore, we seek a procedure for selecting a set {θ_j}_{j=1}^N which contains a subset that accurately approximates {θ_i*}_{i=1}^M for some M ≤ N. We present such a procedure in this section. The procedure uses relative entropy (i.e., Kullback distance, cross entropy, discrimination information) as in [8] to group, within subclasses, all sources in a continuous parameter class whose entropies are within a previously assigned level of similarity. A finite set of N representative sources is then selected from the N subclasses, namely {θ_j}_{j=1}^N, which is subsequently used to derive an approximation to Sakrison's rate distortion function.
We begin by describing an algorithm adapted from [8] to partition a continuous class of sources Λ into N subclasses and select a set of N representatives, one from each subclass. An error threshold ε_SP can be adjusted in the algorithm such that N can assume any positive integer value. For simplicity of discussion, the algorithm will be given assuming that each parameter θ ∈ Λ lies on the real line in the closed interval [a, b]. Furthermore, we assume that H(θ; θ′) varies smoothly with θ, for all θ ∈ Λ. We also assume that Λ is compact; if it is not, we add the hull to the space for purposes of computation and subsequently restrict our choice to only those sources which lie in Λ for inclusion in the finite set of subclasses. Also, the algorithm can readily be adapted to higher dimensions under similar, general conditions.
The Source Partition algorithm (Algorithm SP) given below groups all sources into N subclasses using relative entropy to measure the similarity between any two sources. Thus the algorithm produces a finite set of N parameters {θ_n}_{n=1}^N, which partitions [a, b] into N subclasses {S_n}_{n=1}^N, where S_n = [θ_{n-1}, θ_n]. The {S_n}_{n=1}^N are chosen such that the relative entropy between any two sources in a subclass is less than a given tolerance ε_SP. Clearly, the number of subclasses produced, N, can assume any positive integer value by appropriately varying ε_SP.

Algorithm SP:
1) Set ε_SP = an assigned error tolerance; θ_1 = a and N = 1.
2) Increase θ beyond θ_N until H(θ_N; θ) reaches ε_SP; set θ_{N+1} = θ and N = N + 1.
3) If θ_N > b, output the {θ_n}_{n=1}^N.
4) If θ_N < b, go to step 2.
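The partition step can be sketched in code. The Bernoulli(θ) class, the march step size, and the function names below are our illustrative assumptions, not the correspondence's; the algorithm adapted from [8] may locate the boundaries differently, but the thresholding idea is the same.

```python
import math

def kl_bernoulli(t1, t2):
    # Relative entropy H(t1; t2) in nats between Bernoulli(t1) and Bernoulli(t2).
    return (t1 * math.log(t1 / t2)
            + (1.0 - t1) * math.log((1.0 - t1) / (1.0 - t2)))

def source_partition(a, b, eps_sp, step=1e-4):
    """Sketch of Algorithm SP for a hypothetical Bernoulli(theta) class
    on [a, b]: march across the interval and start a new subclass
    boundary whenever the relative entropy from the current boundary
    reaches eps_sp."""
    boundaries = [a]
    theta = a
    while theta < b:
        theta = min(theta + step, b)
        if kl_bernoulli(boundaries[-1], theta) >= eps_sp:
            boundaries.append(theta)
    if boundaries[-1] < b:
        boundaries.append(b)
    return boundaries
```

As the text observes, shrinking ε_SP produces more subclass boundaries, so N can be driven to any positive integer value.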
When Λ and Q(D) are compact and both the input and output spaces are discrete, the infimum and supremum in (14) can actually be replaced by the minimum and maximum. Furthermore, if the class of sources Λ is finite, the minimax problem may be solved by calculating R_θ(D) for each θ ∈ Λ. R_Λ(D) may then be determined by an exhaustive search over all θ ∈ Λ.
Unfortunately, this approach cannot be directly applied to finding Sakrison's R_Λ(D) when the class of sources Λ is continuous, since an exhaustive search over an uncountable set of sources is not possible. However, the validity of approximating Sakrison's R_Λ(D) with a finite representative set is established in the following theorem.

Theorem 1: Let Λ be a continuous, compact parameter space for a class of sources with input space A of size J. Further, assume that each parameter θ ∈ Λ lies on the real line in the closed interval [a, b] and that H(θ; θ′) varies smoothly with θ, for all θ ∈ Λ. For a given distortion measure d, there exists a finite set of sources {θ_i* ∈ Λ}_{i=1}^M, where M ≤ J, which gives rise to Sakrison's rate distortion function R_Λ(D).

Proof: Let Q* be the channel given by solution of (11), thereby generating Sakrison's rate distortion function R_Λ(D). However, Q* along with all θ ∈ Λ gives rise to a new class of sources W with probability distributions given by

W_k^θ = Σ_j Q*_{k|j} p_j^θ.   (15)

Therefore, W is a class of discrete sources with parameter given by θ ∈ Λ. Let H(W^θ) denote the average length of the best code for W^θ and define the redundancy between any two codes as

r(W^{θ_i}, W^{θ_j}) = |H(W^{θ_i}) - H(W^{θ_j})|.   (16)

The problem becomes one of minimizing the maximum redundancy over W by finding a source best matched to the class of sources.
However, this is a source-matching problem which, as shown in [3], is solved by finding the channel capacity between the parameter space Λ and the output space A. The solution of this source-matching problem also produces an associated least favorable distribution π* over Λ. The validity of utilizing a finite set {θ_i ∈ Λ}_{i=1}^M chosen from an uncountable number of sources can now be seen for the source-matching problem by applying [5, Corollary 3, p. 96], which states that for a finite output space A there is a distribution π* over Λ that assigns a nonzero probability to only a minimal number of sources {θ_i ∈ Λ}_{i=1}^M, and π* gives rise to the channel capacity between Λ and A. Furthermore, M can be no larger than the size of the output set, i.e., M ≤ J.
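The channel capacity invoked here can be computed numerically. Below is a minimal sketch of the standard Blahut-Arimoto capacity iteration (the function name, tolerances, and stopping rule based on the usual capacity bounds are ours); it also returns the capacity-achieving prior, which plays the role of the least favorable π*. On a toy channel whose third input row is redundant, the returned prior puts essentially zero mass on that input, illustrating the corollary cited above.

```python
import math

def blahut_arimoto_capacity(W, tol=1e-9, max_iter=100000):
    """Blahut-Arimoto iteration for the capacity (in nats) of a discrete
    channel.  W[i] is the output distribution of candidate input i.
    Returns (capacity, capacity-achieving prior pi*)."""
    M, J = len(W), len(W[0])
    pi = [1.0 / M] * M
    for _ in range(max_iter):
        # Output distribution induced by the current prior.
        q = [sum(pi[i] * W[i][k] for i in range(M)) for k in range(J)]
        # d[i] = D(W_i || q), the relative entropy of row i from q.
        d = [sum(w * math.log(w / qk) for w, qk in zip(W[i], q) if w > 0.0)
             for i in range(M)]
        lower = sum(p * di for p, di in zip(pi, d))   # I(pi; W), a lower bound on C
        if max(d) - lower < tol:                      # max(d) is an upper bound on C
            return lower, pi
        # Multiplicative update toward the capacity-achieving prior.
        pi = [p * math.exp(di) for p, di in zip(pi, d)]
        s = sum(pi)
        pi = [p / s for p in pi]
    return lower, pi
```

The design choice of stopping on the gap between the two classical capacity bounds gives a certified accuracy rather than a fixed iteration count.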
Observe that there exists a one-to-one correspondence between each W^θ and some p^θ, and that W maps onto the class of all {p^θ}.
By construction, the quantization and source-matching problems correspond directly and solution of one implies solution of the other.

QED
The suggested approach may now be summarized as follows.

Algorithm 1:
1) Apply Algorithm SP to produce N representative sources {θ_j}_{j=1}^N, which contains a subset that approximates {θ_i*}_{i=1}^M, the optimal set of M source representatives, where M ≤ J ≤ N.
2) Calculate Shannon's rate distortion function for each θ_j ∈ {θ_j}_{j=1}^N and find the maximum by exhaustive search, which is the approximation to Sakrison's rate distortion function R_Λ(D).

One of the most well-known ways of calculating Shannon's rate distortion function as in step 2 is by using Blahut's algorithm [1]. Before showing convergence of the approach, we observe that there may be a difficulty in directly applying Blahut's rate distortion algorithm [1] to step 2. Blahut's algorithm uses a Lagrange multiplier s, which is the slope of the rate distortion function, to determine the information rate required at distortion level D for a channel transition matrix Q. Therefore, a direct application of Blahut's algorithm to a set of representative sources will produce a comparison based upon slope s. Unfortunately, the respective rate distortion functions at slope s may actually have dissimilar distortions D, thereby rendering the comparison invalid. That is, two different sources may require two different slopes s in order to produce the same distortion D.
Therefore, application of Blahut's algorithm to this problem must be modified to ensure that the rate distortion functions of any two sources are compared only at the same level D.
A modification in using Blahut's algorithm can apparently be made by selecting the slope s_j as a function of D for each source θ_j in the set of N representative sources {θ_j}_{j=1}^N. There are simple techniques available to approximate s for a particular value of D. An example of one such method is obtainable from [10, p. 95] in the case of the difference distortion measure d_{jk} = ρ(k - j), for some measure ρ.
Employing this or a similar method applicable to the given distortion measure d enables calculation of R_{θ_j}(D) for each j = 1, ..., N, which gives rise to the set {R_{θ_j}(D)}_{j=1}^N. Taking the maximum over this set determines the sought-after approximation to Sakrison's rate distortion function R_Λ(D).
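The two-step procedure above, Blahut's alternating minimization at a fixed slope, then a per-source selection of s so that every source is compared at the same distortion level D, can be sketched as follows. The bisection on s, the iteration counts, and the function names are our illustrative choices (the correspondence leaves the slope-selection method open, citing [10]); natural logarithms and a finite distortion matrix d_jk are assumed.

```python
import math

def blahut_rd(p, dist, s, n_iter=400):
    """Blahut's alternating minimization at a fixed slope parameter s <= 0.
    p: source distribution over J letters; dist[j][k]: distortion d_jk.
    Returns (R, D) in nats at the point of slope s on the R(D) curve."""
    J, K = len(p), len(dist[0])
    q = [1.0 / K] * K                          # output distribution
    A = [[math.exp(s * dist[j][k]) for k in range(K)] for j in range(J)]
    for _ in range(n_iter):
        # Channel minimizing F for the current output distribution q.
        Q = []
        for j in range(J):
            row = [A[j][k] * q[k] for k in range(K)]
            z = sum(row)
            Q.append([x / z for x in row])
        # Output distribution minimizing F for the current channel Q.
        q = [sum(p[j] * Q[j][k] for j in range(J)) for k in range(K)]
    D = sum(p[j] * Q[j][k] * dist[j][k] for j in range(J) for k in range(K))
    R = sum(p[j] * Q[j][k] * math.log(Q[j][k] / q[k])
            for j in range(J) for k in range(K) if Q[j][k] > 0.0)
    return R, D

def rate_at_distortion(p, dist, D_target, s_lo=-60.0, s_hi=0.0, iters=50):
    """Bisect on the slope s so that every source is evaluated at the
    same distortion level D_target, as required for a valid comparison."""
    for _ in range(iters):
        s = 0.5 * (s_lo + s_hi)
        _, D = blahut_rd(p, dist, s)
        if D > D_target:
            s_hi = s            # distortion too large: make s more negative
        else:
            s_lo = s
    return blahut_rd(p, dist, 0.5 * (s_lo + s_hi))[0]

def approx_sakrison_rd(sources, dist, D_target):
    """Step 2 of Algorithm 1: exhaustive maximum over the representatives."""
    return max(rate_at_distortion(p, dist, D_target) for p in sources)
```

For a binary source under Hamming distortion this reproduces the classical closed form R(D) = H(p) - h(D), which makes the sketch easy to sanity-check.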
It is clear that any set of N representative sources {θ_j}_{j=1}^N, M ≤ J ≤ N, produced by Algorithm SP does not necessarily contain the optimal M sources. However, since relative entropy is used as a measure of similarity to construct the subclasses, the jth optimal source must be in some subclass represented by θ_k, for some k. Therefore, the set {θ_j}_{j=1}^N contains a subset which approximates the optimal {θ_i*}_{i=1}^M that determines Sakrison's rate distortion function R_Λ(D). Clearly, the error depends upon the relative entropy threshold ε_SP used to determine the subclasses. Thus for each increasing value of N, {θ_j}_{j=1}^N must contain a subset which more closely approximates the optimal {θ_i*}_{i=1}^M because of our assumption that H(θ; θ′) varies smoothly with θ, for all θ ∈ Λ. Therefore, even though only M sources are required to represent the class of sources Λ, it is apparent that the accuracy of the approximation increases with smaller ε_SP and correspondingly larger N, as shown in the following theorem.
Theorem 2: Let R_Λ^N(D) be the rate distortion function for a compact class Λ that is determined by exhaustive search over the set {R_{θ_j}(D)}_{j=1}^N produced when N ≥ M representative sources are selected by Algorithm SP. Further, assume that each parameter θ ∈ Λ lies on the real line in the closed interval [a, b] and that H(θ; θ′) varies smoothly with θ, for all θ ∈ Λ. The sequence {R_Λ^N(D)} converges in the limit to R_Λ(D) as N → ∞.

An approximation to Sakrison's rate distortion function R_Λ(D) could in concept be calculated with any finite set of representative sources {θ_j}_{j=1}^N produced by Algorithm SP. In fact, R_Λ^N(D), the rate distortion function calculated using these N representative sources, must converge to Sakrison's rate distortion function R_Λ(D).

Proof: It suffices to show that R_Λ^{N+1}(D) ≥ R_Λ^N(D) for all N, because R_Λ(D) achieves the supremum over all R_θ(D) in the compact space Λ. Consider the approximation produced by N representative sources {θ_j}_{j=1}^N. Let θ_s ∈ {θ_j}_{j=1}^N produce the smallest mutual information.

Case 1: There exists a θ_r ∈ {θ_j}_{j=1}^{N+1} that produces the greatest mutual information, such that θ_r ∉ {θ_j}_{j=1}^N and where the mutual information produced by θ_r is greater than or equal to that produced by θ_s. Thus the following equation is true:

R_Λ^N(D) = sup_{π∈Ξ} inf_{Q∈Q(D)} T(π, Q).

The second inequality is a result of (14).

Case 2: There does not exist a θ_r ∈ {θ_j}_{j=1}^{N+1} that produces the greatest mutual information, such that θ_r ∉ {θ_j}_{j=1}^N and where the mutual information produced by θ_r is greater than that of θ_s.