Hidden Markov Model

According to L. R. Rabiner [1], a hidden Markov model is a doubly embedded stochastic process with an underlying stochastic process that is not observable (it is hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations.

Basically, a hidden Markov model (HMM) is one in which you observe a sequence of emissions, but do not know the sequence of states the model went through to generate them. We analyze hidden Markov models to recover the sequence of states from the observed data.

Sounds confusing, right?

Markov Model and Markov Chain

To understand an HMM, we first need to understand what a Markov model is. A Markov model is a stochastic model used to describe a randomly changing system, under the assumption that the future state depends only on the current state, not on the sequence of states that came before it. The simplest Markov model is the Markov chain, in which the states are directly observable and the next state depends only on the current one.

Example of a Markov chain
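As a sketch, the traffic chain from the example below can be simulated in a few lines of Python. The transition numbers come from the transition-probability section later in this post; the function name and seed are just illustrative.

```python
import random

# Two-state Markov chain for the traffic example.
# transitions[today][tomorrow] = P(tomorrow's state | today's state)
transitions = {
    "traffic":    {"traffic": 0.55, "no_traffic": 0.45},
    "no_traffic": {"traffic": 0.65, "no_traffic": 0.35},
}

def simulate(start, days, rng=random.Random(0)):
    """Walk the chain: each day's state depends only on the previous day."""
    state, path = start, [start]
    for _ in range(days - 1):
        probs = transitions[state]
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
        path.append(state)
    return path

print(simulate("traffic", 5))
```

Note that the simulation never looks further back than one day: that is the Markov property in action.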

Let’s work through an example to understand what an HMM and a Markov chain are.

Rob participates in a game in which a prize is handed out whenever a dart hits the bullseye. The prizes are kept in 3 different boxes (red, green, and blue) behind a screen. The person handing out the prizes picks a box depending on whether he was stuck in traffic in the morning or not; these are the states. Since Rob knows the general trend of traffic in the area, he can model it as a Markov chain. But he has no definite information about the traffic on any given day, since he cannot observe it directly; the states are hidden from him. He does know that the prize will come from the green, red, or blue box; these are the observations. Since he cannot directly observe the correlation between the states and the observations, the system is an HMM.

Representation of a Hidden Markov Model

Transition Probabilities

The transition probability is the probability of moving from one state to another. In this case, if today we had traffic, there is a 55% chance that there will be traffic tomorrow and a 45% chance that there won’t be traffic tomorrow. Similarly, if we didn’t have traffic today, there is a 35% chance that there won’t be traffic tomorrow and a 65% chance that there will be traffic tomorrow.
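These four numbers can be arranged as a 2×2 transition matrix, whose rows must each sum to 1. A small sketch, using only the probabilities stated above, also shows how the matrix answers multi-step questions:

```python
# Transition probabilities from the example.
# Rows = today's state, columns = tomorrow's state, order: [traffic, no_traffic].
A = [
    [0.55, 0.45],  # today: traffic
    [0.65, 0.35],  # today: no traffic
]

# Each row is a probability distribution, so it must sum to 1.
for row in A:
    assert abs(sum(row) - 1.0) < 1e-9

# Probability of traffic two days from now, given traffic today:
# sum over tomorrow's state of P(tomorrow | today) * P(day after | tomorrow).
p_traffic_in_2 = A[0][0] * A[0][0] + A[0][1] * A[1][0]
print(round(p_traffic_in_2, 4))  # 0.595
```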

Emission Probabilities

Emission probabilities connect the hidden states to the observations: for each state, they give the probability of producing each possible observation. Here, the probabilities linking the traffic states to the box colours are the emission probabilities.

Representation of an HMM in Python
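One plain-Python way to write the example model down is as a set of dictionaries. The transition numbers are the ones given above; the emission and initial probabilities below are illustrative placeholders, not values from the example:

```python
states = ["traffic", "no_traffic"]        # hidden states
observations = ["red", "green", "blue"]   # which box the prize came from

# Transition probabilities P(state tomorrow | state today), from the example.
transition_probs = {
    "traffic":    {"traffic": 0.55, "no_traffic": 0.45},
    "no_traffic": {"traffic": 0.65, "no_traffic": 0.35},
}

# Emission probabilities P(box | state); these numbers are made up.
emission_probs = {
    "traffic":    {"red": 0.5, "green": 0.3, "blue": 0.2},
    "no_traffic": {"red": 0.1, "green": 0.4, "blue": 0.5},
}

# Initial distribution over hidden states (assumed uniform here).
initial_probs = {"traffic": 0.5, "no_traffic": 0.5}

# Sanity check: every distribution sums to 1.
for dist in [initial_probs, *transition_probs.values(), *emission_probs.values()]:
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

Libraries such as hmmlearn use the same three ingredients (initial, transition, and emission probabilities), just stored as arrays.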

Assumptions about Hidden Markov Models

Markov Assumption

Since an HMM is an augmented form of the Markov model, the following assumption holds:

The future state is going to be dependent only on the current state. It is expressed as follows mathematically:

P(q_i = a | q_1 … q_{i−1}) = P(q_i = a | q_{i−1})

Output independence

The probability of an output observation o_i depends only on the state q_i that produced it, and not on any other states or observations. It is expressed as follows:

P(o_i | q_1 … q_i … q_T, o_1 … o_i … o_T) = P(o_i | q_i)

Where is a Hidden Markov Model used?

  • Analysis of biological sequences, particularly DNA[2]
  • Time series analysis
  • Handwriting recognition
  • Speech Recognition[1]

To get insights into HMM algorithms, please refer to the detailed post here.
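As a small taste of those algorithms, here is a minimal Viterbi decoding sketch for the traffic/box example: given a sequence of observed box colours, it recovers the most likely sequence of hidden traffic states. Only the transition numbers come from the example above; the emission and initial probabilities are illustrative placeholders.

```python
states = ["traffic", "no_traffic"]
start_p = {"traffic": 0.5, "no_traffic": 0.5}
trans_p = {
    "traffic":    {"traffic": 0.55, "no_traffic": 0.45},
    "no_traffic": {"traffic": 0.65, "no_traffic": 0.35},
}
emit_p = {  # made-up emission probabilities P(box | state)
    "traffic":    {"red": 0.5, "green": 0.3, "blue": 0.2},
    "no_traffic": {"red": 0.1, "green": 0.4, "blue": 0.5},
}

def viterbi(obs):
    """Most likely hidden-state path for an observation sequence.
    Probabilities multiply directly, so for long sequences you
    would switch to log probabilities to avoid underflow."""
    # V[t][s] = best probability of any path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((ps, V[t - 1][ps] * trans_p[ps][s]) for ps in states),
                key=lambda x: x[1],
            )
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

print(viterbi(["red", "blue"]))  # ['traffic', 'no_traffic']
```

With these (assumed) emissions, a red prize points towards a traffic day and a blue prize towards a traffic-free one, which is exactly what the decoder recovers.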

Advantages of HMM

  • HMM taggers, used in NLP, are very simple to train (you just need to compile counts from the training corpus).
  • They perform relatively well (over 90% accuracy on named entities).
  • They eliminate the label bias problem.
  • Because each HMM uses only positive data, they scale well: new words can be added without affecting learned HMMs.
  • The model can be initialized close to something believed to be correct.

Disadvantages of HMM

  • In order to define a joint probability over observation and label sequences, an HMM needs to enumerate all possible observation sequences.
  • It is not practical to represent multiple overlapping features and long-term dependencies.
  • The number of parameters to be estimated is huge, so a large data set is needed for training.
  • HMM training maximizes the probability of the observations for examples belonging to a class, but since it uses only positive data, it does not minimize the probability of observations from other classes.
  • It adopts the Markov assumption, which does not map well to many real-world domains.


[1] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989, doi: 10.1109/5.18626.
[2] What is a hidden Markov model? Nat Biotechnol 22, 1315–1316 (2004). https://doi.org/10.1038/nbt1004-1315
