The Viterbi algorithm
Introduction to the Viterbi algorithm
The backward and forward algorithms can be used to compute $P(O|\lambda)$. In this notebook we are interested in computing the most likely path given a sequence $O$ and a hidden Markov model $\lambda$. The Viterbi algorithm gives us a way to do so. The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states, also called the Viterbi path, that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM) [2]. The algorithm uses a maximum operation instead of the sum. The operation of Viterbi's algorithm can be visualized by means of a trellis diagram [2]. It is essentially the shortest path through this trellis.
Given a state sequence $Q=q_1q_2,\cdots,q_T$ and an observation sequence $O=O_1O_2,\cdots,O_T$ we dfine the variable $\delta_t(i)$ as the probability of the highest probability path at time $t$ that accounts for the first $t$ observations and ends in $S_i$ [1]:
$$\delta_t(i) = max_{Q}p(q_1q_2,\cdots,q_t=S_i,O_1O_2,\cdots,O_t|\lambda)$$
Then we can recursively calculate $\delta_{t+1}(i)$ and the optimal path can be read by backtracking from $T$ , choosing the most probable at each instant. The algorithm is as follows [1]:
- Initialize
$$\delta_1(i) = \pi_i b_i(O_1)$$
This is initialization is the same as in the forward algorithm. To retrieve the state sequence we also need to keep track of the argument which maximized for each $t$ and $j$. We therefore use the array $\psi$, and in the initialization step the first $\psi$ variable of every state will be equal to 0 because no specific argument coming from the initial probability maximized the value of the first state.
$$\psi_1(i) = 0$$
- Recurse
$$\delta_t(j) = max_{i=1}^{N}\delta_{t-1}(i)a_{ij}b_j(O_t)$$
$$\psi_t(j) = argmax_{i=1}^{N}\delta_{t-1}(i)a_{ij}$$
- Termination
$$p^{*} = max_i \delta_T(i)$$
$$q^{*}_T = arg max_i \delta_T(i)$$
- Path backtracking
$$q_t{*} = \psi_{t+1}(q^{*}_{t+1}), t= T-1, T-2, \cdots, 1$$
$\psi_t (j)$ keeps track of the state that maximizes $\delta_t(j)$ at time $t-1$, that is, the best previous state. The Viterbi algorithm has the same complexity with the forward phase, where instead of the sum, we take the maximum at each step [1].
Let's see an example applying the Viterbi algorithm. The example is taken from [2]. Some coding hints from Implement Viterbi Algorithm in Hidden Markov Model using Python and R have also been used.
import numpy as np
obs_to_idx = {'normal':0, 'cold': 1, 'dizzy':2}
# state to index map
state_to_idx = {'Healthy':0, 'Fever':1}
pi = np.array([0.6, 0.4])
# transition probabilities
A = np.array([[0.7, 0.3],
[0.4, 0.6]])
# emission probabilties
B = np.array([[0.5, 0.4, 0.1],
[0.1, 0.3, 0.6]])
o = ['normal', 'cold', 'dizzy']
delta = np.zeros(shape=(len(o), A.shape[0]))
previous = np.zeros((len(o)-1, A.shape[0]))
for st in state_to_idx:
state_idx = state_to_idx[st]
delta[0][state_idx] = pi[state_idx] * B[state_idx][obs_to_idx[o[0]]]
print(delta)
for t in range(1, len(o)):
obs_idx = obs_to_idx[o[t]]
for i in state_to_idx:
i_st_idx = state_to_idx[i]
probs=[]
for j in state_to_idx:
j_st_idx = state_to_idx[j]
probs.append(delta[t - 1][j_st_idx]*A[j_st_idx][i_st_idx]*B[i_st_idx][obs_idx])
# This is our most probable state given previous state at time t (1)
previous[t-1, i_st_idx] = np.argmax(probs)
delta[t, i_st_idx] = np.max(probs)
# Path Array
S = np.zeros(len(o))
# Find the most probable last hidden state
last_state = np.argmax(delta[len(o) - 1, :])
S[0] = last_state
backtrack_index = 1
for i in range(len(o) - 2, -1, -1):
S[backtrack_index] = previous[i, int(last_state)]
last_state = previous[i, int(last_state)]
backtrack_index += 1
# Flip the path array since we were backtracking
S = np.flip(S, axis=0)
# Convert numeric values to actual hidden states
path = []
for s in S:
if s == 0:
path.append("Healthy")
else:
path.append("Fever")
print("Path is ", path)
Another way to compute the $\delta$ matrix is the following more Pythonic way, taken from Implement Viterbi Algorithm in Hidden Markov Model using Python and R. Note the use of the log function.
delta = np.zeros(shape=(len(o), A.shape[0]))
previous = np.zeros((len(o)-1, A.shape[0]))
for st in state_to_idx:
state_idx = state_to_idx[st]
delta[0, :] = np.log(pi * B[:, obs_to_idx[o[0]]])
print(delta)
for t in range(1, len(o)):
obs_idx = obs_to_idx[o[t]]
for i in state_to_idx:
i_st_idx = state_to_idx[i]
# Same as Forward Probability
probability = delta[t - 1] + np.log(A[:, i_st_idx]) + np.log(B[i_st_idx, obs_idx])
# This is our most probable state given previous state at time t (1)
previous[t - 1, i_st_idx] = np.argmax(probability)
# This is the probability of the most probable state (2)
delta[t, i_st_idx] = np.max(probability)
# Path Array
S = np.zeros(len(o))
# Find the most probable last hidden state
last_state = np.argmax(delta[len(o) - 1, :])
S[0] = last_state
backtrack_index = 1
for i in range(len(o) - 2, -1, -1):
S[backtrack_index] = previous[i, int(last_state)]
last_state = previous[i, int(last_state)]
backtrack_index += 1
# Flip the path array since we were backtracking
S = np.flip(S, axis=0)
# Convert numeric values to actual hidden states
path = []
for s in S:
if s == 0:
path.append("Healthy")
else:
path.append("Fever")
print("Path is ", path)
The following video provides a motivation behind the use of the Viterbi algorithm
from IPython.display import YouTubeVideo
YouTubeVideo('MPeedE6Odj0', width=800, height=300)
The following video provides nice description of the Viterbi algorithm
from IPython.display import YouTubeVideo
YouTubeVideo('s9dU3sFeE40', width=800, height=300)
- Ethem Alpaydin,
Introduction To Machine Learning, Second Edition
, MIT Press. - Viterbi algorithm, Wikipedia.