Published September 1, 2020 | Version 1
Journal article | Open Access

AN ANALYSIS OF CONCEPTS AND TECHNIQUES CONCERNING THE USE OF HIDDEN MARKOV MODELS FOR SEQUENTIAL DATA

  • 1. Sardar Patel Institute of Technology, Mumbai, India
  • 1. AM Publications

Description

A large part of machine learning is concerned with creating parameterized statistical models of systems from data points drawn from some underlying distribution, subject to an inductive bias. In probabilistic models, we model the joint probability of the input and the output, or the conditional probability of the output given the input (and vice versa, using Bayes' theorem). To elaborate, in a classification task a generative model determines the probability of the input given a particular output class, while a discriminative model finds the probability of the output class given the input. For a large class of models, such as standard ANNs, we assume the individual data points to be independent and identically distributed. However, for data such as rainfall measurements across successive days, there is an inherent sequential structure that this assumption fails to capture. Hidden Markov Models (HMMs) are used to model sequential data such as time series, where the successive data points in an observed sequence are temporally related rather than independent: they form a series of measurements that evolve with respect to time, the independent variable. HMMs are able to exploit this structure and are used for a variety of inherently sequential tasks such as speech recognition, language modelling and time series analysis. In this paper, we review the mathematics behind building and training an HMM: latent variables, Markov chains, the forward-backward algorithm and the Viterbi algorithm. We also cover Lagrange multipliers and the expectation-maximization algorithm, and see how they are applied for optimization in an HMM. Finally, we review some applications of HMMs. [3]
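For reference, the generative-to-discriminative inversion mentioned above is just Bayes' theorem; the display below is a standard statement of it for a classification task, with the notation (input x, class y) chosen here purely for illustration:

```latex
% Bayes' theorem relating the generative quantities p(x | y) and p(y)
% to the discriminative quantity p(y | x) in a classification task.
\[
  p(y \mid x) \;=\; \frac{p(x \mid y)\, p(y)}{p(x)},
  \qquad\text{where}\qquad
  p(x) = \sum_{y'} p(x \mid y')\, p(y').
\]
```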

Notes

In the scientific community, it has often been observed that probabilistic models are more reasonable than deterministic models.[6] Many phenomena that we observe are not just random phenomena to be described by random variables; they are also evolving with time. A stochastic process is a family of random variables that are functions of time. A Markov chain is a stochastic process satisfying the Markov assumption: the random variable at time step t depends only on the random variable at the previous time step, so the distribution of the random variable at time step t given all the preceding random variables reduces to its distribution given only the random variable at time step (t-1), independently of the random variables before that.[1]

Returning to our sequence modelling problem, we relax the assumption that the observations are independent and identically distributed and model the joint probability distribution of the input observations for a sequence as $p(x_1, \ldots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_1, \ldots, x_{t-1})$. Under the Markov assumption, this equation reduces to $p(x_1, \ldots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1})$. We further assume that the conditional distribution of each random variable given the previous one does not change over time steps, which is equivalent to a stationary time series. Such a homogeneous Markov model simplifies the mathematics and allows us to model a large number of sequences. If the random variable (at every time step) takes on K states, then the conditional distribution of the random variable at time step t has (K-1) free parameters for each of the K states of the random variable at time step (t-1); it is (K-1) rather than K because we are modelling a probability distribution whose values must sum to 1. There are therefore K(K-1) free parameters. If we condition on more of the preceding random variables in the conditional distribution, the number of free parameters grows exponentially.

We must now understand latent variables. These are variables in our model that are not directly observed but are inferred from the values of other, observable variables that can be measured. Latent variables may represent an abstract concept or a part of physical reality. In latent variable models, the observed variable responses are assumed to depend on the latent variables. By introducing latent variables we can build sequence models that generalize beyond the Markov assumption while consisting of a relatively limited number of parameters. In the Hidden Markov Model, for each observation x at time step t we introduce a latent variable z at time step t that is responsible for producing that observation, and we make the hidden variables a Markov chain. The distribution of each observed variable depends only on the latent variable at that time step, and the corresponding conditional distribution determines the observed variable's value. We introduce a matrix A, the transition matrix of the Markov chain of latent variables, and a matrix B, which holds the emission probabilities of the observations given the latent variable: A(i,j) is the probability of the latent variable going from state i to state j between consecutive time steps, and B(j,k) is the probability of observing symbol k when the latent variable is in state j.
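Before moving on to the emission distribution, the plain Markov chain factorization and the K(K-1) parameter count above can be made concrete with a minimal sketch; the numbers and the variable names (p_x1, trans) below are invented for illustration and are not from the paper:

```python
import numpy as np

# A first-order Markov chain over an observed variable with K = 3 states.
K = 3

# p(x_1): distribution of the first observation.
p_x1 = np.array([0.5, 0.3, 0.2])

# Transition matrix: trans[i, j] = p(x_t = j | x_{t-1} = i).
# Each row sums to 1, so each row has K - 1 = 2 free parameters,
# giving K * (K - 1) = 6 free parameters in total.
trans = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.3, 0.3, 0.4]])

def markov_joint(seq):
    """p(x_1, ..., x_T) = p(x_1) * prod_t p(x_t | x_{t-1})."""
    p = p_x1[seq[0]]
    for prev, cur in zip(seq[:-1], seq[1:]):
        p *= trans[prev, cur]
    return p

# Probability of one particular observation sequence.
print(markov_joint([0, 1, 1, 2]))
```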
In our homogeneous model, the emission distribution is the same across time steps, and we have parameterized it using the matrix B; HMMs can also have other emission distributions, such as a mixture of Gaussians.[1] We let π denote the initial distribution of the states at the first time step. Now that we have defined our model, we will demonstrate how to find the probability of an observation sequence, which is the likelihood of the data given the parameters. This likelihood is also our objective function during optimization, which we try to maximize. We will then see how to find the most likely sequence of hidden states given an observation sequence, which is useful once training is done.
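As a rough sketch of those two computations, the code below uses assumed toy values for A, B and π (not taken from the paper): the forward recursion sums over all hidden paths to obtain the sequence likelihood, and the Viterbi recursion recovers the most probable hidden state sequence:

```python
import numpy as np

# Assumed toy parameters: K = 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])                     # initial state distribution
A = np.array([[0.7, 0.3],                     # A[i, j] = P(z_t = j | z_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],                # B[j, k] = P(x_t = k | z_t = j)
              [0.1, 0.3, 0.6]])

def forward_likelihood(obs):
    """Forward algorithm: p(x_1, ..., x_T), summing over all hidden paths."""
    alpha = pi * B[:, obs[0]]                 # alpha_1(j) = pi_j * B[j, x_1]
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]         # alpha_t(j) = sum_i alpha_{t-1}(i) A[i, j] B[j, x_t]
    return alpha.sum()

def viterbi(obs):
    """Most likely hidden state sequence given the observations (log domain)."""
    T, K = len(obs), len(pi)
    delta = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)   # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))   # best predecessor of the state chosen at time t
    return path[::-1]

obs = [0, 1, 2, 2]
print(forward_likelihood(obs))                # likelihood of the observation sequence
print(viterbi(obs))                           # most probable hidden state path
```

The forward pass here is the first half of the forward-backward algorithm mentioned in the abstract; the backward pass and the EM (Baum-Welch) parameter updates discussed in the paper are not shown.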

Files

02.AUCS10083.pdf (241.9 kB)
md5:a5b7b7cc0822d2a41f2571fa98c9fc87

Additional details

Related works

Cites
Journal article: 10.26562/irjcs.2020.v0708.002 (DOI)
Is cited by
Journal article: http://www.irjcs.com/volumes/Vol7/iss-8/02.AUCS10083.pdf (URL)

References

  • 1. Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
  • 2. Daniel Jurafsky and James H. Martin, Speech and Language Processing, Pearson Education, 2009.
  • 3. Marcin Pietrzykowski and Wojciech Sałabun, Applications of Hidden Markov Model: State-of-the-Art, 2014.
  • 4. L. R. Rabiner and B. H. Juang, An Introduction to Hidden Markov Models, IEEE ASSP Magazine, 1986.
  • 5. Edwin K. P. Chong and Stanislaw Zak, An Introduction to Optimization, Wiley, 2013.
  • 6. J. Medhi, Stochastic Processes, New Age International Publishers, 2017.