The Moments of Matched and Mismatched Hidden Markov Models

An algorithm for computing the moments of matched and mismatched hidden Markov models (HMMs) from their defining parameters is presented. The algorithm is of general interest because it is an extension of the usual forward-backward linear recursion. The algorithm computes the joint moments of the posterior likelihood functions (i.e., the scores) by a multilinear recursion involving the joint moments of the random variables associated with the hidden states of the Markov chain. Examples comparing the first two theoretical moments to simulation results are presented. They are of independent interest because they indicate that the distributions of the posterior likelihood function scores for matched and mismatched models are asymptotically log-normal in important special cases and, therefore, are characterized asymptotically by the first two moments alone. One example discusses the effect of a noisy discrete communication channel on a suboptimal classification method based on the distributions of scores rather than on maximum likelihood classification.

In speech applications, HMMs are used to characterize the time variation of the short-term spectra of spoken words. For a discrete symbol HMM, the posterior likelihood of an observation sequence $O^T = (O(1), \dots, O(T))$ under the $i$-th model is

$$f_i(O^T) = P\bigl(O^T \mid O^T \in \mathrm{HMM}(i)\bigr), \qquad i = 1, \dots, p, \tag{1}$$

where $O^T \in \mathrm{HMM}(i)$ denotes the hypothesis that the observation sequence $O^T$ is a realization of HMM($i$). The maximum of the $p$ computed posterior likelihoods is assumed to identify, or classify, the original signal $s(t)$.
In practice, some kind of tie-breaking rule must be defined and some threshold must be set to identify signals for which HMMs have not been trained.
The likelihood function (1) can be computed with only $n^2 T$ multiplications (where $n$ is the number of states in the Markov chain) by using the forward-backward algorithm (reference 2). The misclassification rate (or false alarm rate) of the system depicted in figure 1 can be estimated by simulation after training is completed.
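To make the multiplication count concrete, the following is a minimal sketch of the forward pass (the half of the forward-backward algorithm that suffices for the likelihood itself), written in Python with NumPy; the two-state, three-symbol model at the bottom is hypothetical and for illustration only.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Compute f(O^T) = P(O^T | HMM) with the forward recursion.

    pi  : (n,)   initial state probabilities
    A   : (n, n) transition matrix, A[i, j] = P(state j | state i)
    B   : (n, N) emission matrix, B[j, m] = P(symbol m | state j)
    obs : sequence of T symbol indices

    Cost is O(n^2 T) multiplications, as noted in the text.
    """
    alpha = pi * B[:, obs[0]]          # alpha_1(j) = pi_j * b_j(O(1))
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(O(t))
        # (In practice alpha is rescaled each step to avoid underflow for large T.)
    return alpha.sum()                 # f(O^T) = sum_j alpha_T(j)

# Hypothetical model, for illustration only.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.2, 0.8]])
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])
print(forward_likelihood(pi, A, B, [0, 2, 1, 2]))
```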
Alternatively, the misclassification rate of signal $i$ as signal $j$ can be determined from the conditional cumulative distribution functions

$$F_{ij}(x) = P\bigl(f_j(O^T) \le x \mid O^T \in \mathrm{HMM}(i)\bigr). \tag{2}$$

If $F_{ij}(x)$ is differentiable with derivative $F'_{ij}(x)$, then the moments can be written equivalently as the Riemann integral

$$E\bigl[(f_j(O^T))^k \mid O^T \in \mathrm{HMM}(i)\bigr] = \int_0^\infty x^k\, F'_{ij}(x)\, dx. \tag{3}$$

The moments depend on the length $T$ of the observation sequence, and they quantify how matched and mismatched HMMs impact the performance of the system depicted in figure 1.
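Before developing the exact algorithm of section II, note that $F_{ij}(x)$ and the moments in equation (3) can be estimated by brute-force simulation, as mentioned above. A minimal sketch, reusing `forward_likelihood` from the previous example (the helper names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_observations(pi, A, B, T):
    """Draw one observation sequence of length T from a discrete HMM."""
    n, N = B.shape
    s = rng.choice(n, p=pi)
    obs = []
    for _ in range(T):
        obs.append(rng.choice(N, p=B[s]))  # emit a symbol from the current state
        s = rng.choice(n, p=A[s])          # then make a Markov transition
    return obs

def empirical_moment(hmm_i, hmm_j, T, k, trials=10_000):
    """Monte Carlo estimate of E[(f_j(O^T))^k | O^T in HMM(i)].

    Each hmm argument is a (pi, A, B) parameter tuple; sequences are drawn
    from HMM(i) and scored under HMM(j).
    """
    scores = np.array([
        forward_likelihood(*hmm_j, sample_observations(*hmm_i, T))
        for _ in range(trials)
    ])
    return (scores ** k).mean()
```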

II. THE MOMENT ALGORITHM
The reader is assumed to be familiar with such first principles of HMMs as given in reference 2. It is not, however, necessary to read this section to understand the examples provided in section III.

A. FINITE SYMBOL HMMs
Let HMM($u$) be a hidden Markov process with $n(u)$ states, $u = 1, \dots, p$. The assumption that the training phase is completed means that the parameters $\lambda_u = (\pi_u, A_u, B_u)$ are known.
Substituting equation (4) into equation (3) gives

$$E\bigl[(f_j(O^T))^k \mid O^T \in \mathrm{HMM}(i)\bigr] = \sum_{O^T} f_i(O^T)\,\bigl(f_j(O^T)\bigr)^k, \tag{5}$$

where the sum runs over all possible observation sequences of length $T$. It is clear from equation (5) that direct evaluation requires computational effort that grows exponentially with $T$. We now derive a recursion for equation (5) that requires computational effort that grows only linearly with $T$. The recursion is derived for a more general expression that contains equation (5) as a special case.
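For very small alphabet size $N$ and length $T$, equation (5) can be evaluated by direct enumeration, which makes the exponential growth explicit and provides a check on the recursion. A sketch, reusing `forward_likelihood` from the earlier example:

```python
from itertools import product

def moment_by_enumeration(hmm_i, hmm_j, T, k):
    """E[(f_j(O^T))^k | O^T in HMM(i)] by direct evaluation of equation (5).

    Sums over all N^T observation sequences, so cost grows exponentially
    with T; this is exactly the blow-up the recursion avoids.
    """
    N = hmm_i[2].shape[1]  # alphabet size, from the emission matrix B
    total = 0.0
    for obs in product(range(N), repeat=T):
        fi = forward_likelihood(*hmm_i, obs)  # weight P(O^T | HMM(i))
        fj = forward_likelihood(*hmm_j, obs)  # score under HMM(j)
        total += fi * fj ** k
    return total
```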

Equation (6) generalizes equation (5) by allowing a different state-dependent factor at each time step $v = 1, \dots, T$; the application of equation (6) to compute any moment from equation (5) follows by specializing these factors.

The recursion is verified directly for $T = 1$. For $T = 2$, expanding equation (12) and substituting the initial term from equation (10) reproduces the direct evaluation of the moment, so the recursion is verified for $T = 2$.
Once the array $\Gamma$ of joint symbol moments has been computed and stored for a given value of $k$, the recursion (12) can be computed for any length $T$ of the observation sequence.
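Since the intermediate equations (6) through (12) are not reproduced here, the following sketch should be read as one concrete realization of the multilinear idea rather than as the paper's exact equation (12): it carries one state index for HMM($i$) and $k$ state indices for HMM($j$), and the array `Gamma` of joint symbol moments plays the role of the stored quantity just described.

```python
import numpy as np

def moment_by_recursion(hmm_i, hmm_j, T, k):
    """E[(f_j(O^T))^k | O^T in HMM(i)] in time linear in T.

    A sketch only: alpha carries one state axis for HMM(i) and k axes
    for HMM(j); Gamma holds the joint symbol moments reused at each step.
    """
    pi_i, A_i, B_i = hmm_i
    pi_j, A_j, B_j = hmm_j

    # Gamma[s, r_1, ..., r_k] = sum_m B_i[s, m] * prod_l B_j[r_l, m]
    G = B_i                                   # shape (n_i, N)
    for _ in range(k):
        G = G[..., None, :] * B_j             # append one HMM(j) state axis
    Gamma = G.sum(axis=-1)                    # shape (n_i, n_j, ..., n_j)

    # Initial term: outer product of initial probabilities, times Gamma.
    P = pi_i
    for _ in range(k):
        P = np.multiply.outer(P, pi_j)
    alpha = P * Gamma

    # Each step contracts every state axis with its transition matrix
    # (the axis cycles back to its original position after k+1 contractions),
    # then multiplies by Gamma; cost per step is independent of T.
    for _ in range(T - 1):
        alpha = np.tensordot(alpha, A_i, axes=([0], [0]))
        for _ in range(k):
            alpha = np.tensordot(alpha, A_j, axes=([0], [0]))
        alpha = alpha * Gamma
    return alpha.sum()
```

For small problems, `moment_by_recursion` can be checked against `moment_by_enumeration` from the earlier sketch; the two agree to machine precision while the recursion's cost grows only linearly with $T$.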
For each of the $N^k$ sets of indices $\{j(v)\}$ in equation (12), it is possible to exploit symmetry to reduce the required storage and multiplication counts.

B. CONTINUOUS SYMBOL HMMs

The posterior likelihood function $f_u(O^T)$ is now a probability density function for continuous symbol HMMs, as opposed to a simple probability (see equation (1)) for discrete symbol HMMs. Thus, for real vectors $A$ and $B$ with $A < B$, we have

$$P\bigl(A \le O^T \le B \mid O^T \in \mathrm{HMM}(u)\bigr) = \int_A^B f_u(O^T)\, dO^T,$$

where $O^T \in \mathrm{HMM}(u)$ denotes the hypothesis that $O^T$ is a realization of HMM($u$) and $dO^T = dx_1 \cdots dx_T$.
The conditional cumulative distribution functions $F_{ij}(x)$ are defined by equation (2) as before.
The moments are now given by the $T$-fold integral

$$E\bigl[(f_j(O^T))^k \mid O^T \in \mathrm{HMM}(i)\bigr] = \underbrace{\int \cdots \int}_{T\text{-fold}} f_i(O^T)\,\bigl(f_j(O^T)\bigr)^k\, dO^T, \tag{23}$$

which is the continuous analog of equation (5). It is clear from equation (23) that, as in the discrete case, direct evaluation is impractical for large $T$.
The forward-backward algorithm for computing the posterior likelihood function for continuous HMMs is modified as follows:

$$f_u(O^T) = \sum_{j(u)=1}^{n(u)} \alpha_T\bigl(j(u)\bigr),$$

where $\alpha_T(j(u))$ is computed exactly as given by the recursions (8) and (9), with the only difference being that $b_{j(u)}(O(t))$ is evaluated from equation (27) in place of equation (13).
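The primary new cost in the continuous case is the integral evaluation required by equation (27), which replaces the finite sum of equation (13). Assuming scalar Gaussian emission densities (an assumption made here for illustration; this excerpt does not fix the density family), one entry of the joint symbol moment array can be evaluated numerically:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def gamma_entry_continuous(mu_i, sig_i, mus_j, sigs_j):
    """One entry of the joint-moment array for scalar Gaussian emissions:

        integral of b_i(x) * prod_l b_{j,l}(x) dx,

    the continuous analog of the finite sum over symbols; this is the
    kind of integral evaluation equation (27) calls for.
    """
    def integrand(x):
        val = norm.pdf(x, mu_i, sig_i)
        for mu, sig in zip(mus_j, sigs_j):
            val *= norm.pdf(x, mu, sig)
        return val
    val, _err = quad(integrand, -np.inf, np.inf)
    return val

# Example: one HMM(i) state with N(0,1) emission against k = 2 HMM(j) states.
print(gamma_entry_continuous(0.0, 1.0, mus_j=[0.5, -0.2], sigs_j=[1.0, 0.8]))
```

Once these entries are tabulated, the moment recursion proceeds exactly as in the discrete case.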
The remarks in the preceding section concerning storage, multiplication counts, and symmetry properties all apply for continuous symbol HMMs. The primary difference is that equation (27) requires an integral evaluation instead of a finite sum as in equation (13). This evaluation increases the initial computational overhead, but once equation (27) is computed, the algorithm (12) proceeds exactly as before.

III. EXAMPLES

How long, for example, must the observation sequence be to guarantee that maximum posterior probability classification (described in the Introduction) will be 98 percent reliable? We will give what may best be described as a semiempirical answer to this question.

For observation sequences containing symbols that HMM(3) cannot emit,

$$f_3(O^T) = 0.$$
Other forbidden symbol sequences may also be noticed. It will be seen that these facts make $f_3(O^T)$ a powerful discriminator against ergodic observation sequences.
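The mechanism is simple to demonstrate: if some symbol has zero emission probability in every state of HMM(3), then any observation sequence containing that symbol scores exactly zero. A sketch with a hypothetical model, reusing `forward_likelihood` from the first example:

```python
import numpy as np

# Hypothetical HMM(3): symbol 2 has zero emission probability in every state,
# so any sequence containing symbol 2 is "forbidden" and scores exactly zero.
pi3 = np.array([0.5, 0.5])
A3  = np.array([[0.9, 0.1],
                [0.1, 0.9]])
B3  = np.array([[0.7, 0.3, 0.0],
                [0.2, 0.8, 0.0]])

print(forward_likelihood(pi3, A3, B3, [0, 1, 0]))  # positive score
print(forward_likelihood(pi3, A3, B3, [0, 2, 1]))  # exactly 0.0
```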
To summarize briefly, this example will show that short observation sequences of quasi-stationary and transient signals can be classified reliably. Table 7 gives two estimates of the mean and standard deviation of $\log dF_{23}(x)$. Consequently, a tradeoff exists between short $T$ and long $T$.
The total misclassification rate can be expressed as the sum of the misclassification rate due to forbidden symbol sequences and the misclassification rate due to the noise-induced shift in the statistics of the nonzero values of the posterior likelihood function. We examine the total misclassification rate for HMM(4), which is defined to be the HMM equivalent of HMM(3) observed through the noisy channel. It is clear that no significant difference between $\log dF_{34}(x)$ and $\log dF_{33}(x)$ is evident. Therefore, the misclassification rate due to the noise-induced shift is negligible.
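The two components of this decomposition can be estimated by simulation. The sketch below (helper names and the threshold rule are assumptions for illustration, not the paper's classification method) counts sequences that are forbidden (score exactly zero) separately from sequences whose nonzero log scores fall below a classification threshold:

```python
import numpy as np

def misclassification_components(hmm_true, hmm_ref, T, threshold, trials=10_000):
    """Split the total misclassification rate into its two parts:
    forbidden sequences (score exactly zero under the reference model)
    and noise-induced shift (nonzero log score below the threshold).
    Reuses sample_observations and forward_likelihood from earlier sketches.
    """
    forbidden = shifted = 0
    for _ in range(trials):
        obs = sample_observations(*hmm_true, T)
        score = forward_likelihood(*hmm_ref, obs)
        if score == 0.0:
            forbidden += 1
        elif np.log(score) < threshold:
            shifted += 1
    return forbidden / trials, shifted / trials
```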