Backward Algorithm
A brief introduction to the backward algorithm
The backward algorithm is the complement of the forward algorithm. Let's introduce the backward variable $\beta_t(i)$. This is the probability of observing the partial sequence $O_{t+1},\cdots,O_T$ given that the system is in state $S_i$ at time $t$ [1]. This can be written as
$$\beta_t(i) = P(O_{t+1},\cdots,O_T | q_t=S_i, \lambda)$$
The backward algorithm computes this recursively:
- Initialize
$$\beta_T(i) = 1$$
since nothing remains to be observed after time $T$.
- Recurse
$$\beta_t(i) = P(O_{t+1},\cdots,O_T | q_t=S_i, \lambda)$$
which, by conditioning on the state $S_j$ occupied at time $t+1$, can be written as [1]
$$\beta_{t}(i) = \sum_{j=1}^{N}a_{i,j}b_j(O_{t+1})\beta_{t+1}(j)$$
Let's implement this as we did for the forward algorithm.
Assume a system with two states $S=\{S_0, S_1 \}$. Further, assume that the observation symbols come from the set $V=\{a, b, c\}$. Also let's assume the following HMM:
$$\boldsymbol{\pi}=\begin{bmatrix}0.6 & 0.4 \end{bmatrix}$$
$$\mathbf{A}=\begin{bmatrix}0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$$
$$\mathbf{B}=\begin{bmatrix} 0.5 & 0.4 & 0.1 \\ 0.1 & 0.3 & 0.6\end{bmatrix}$$
Assume the observed sequence is $O=(a, b, c)$. We introduce the $\beta$ matrix:
$$ \beta =\begin{bmatrix}0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$
First we initialize
$$ \beta =\begin{bmatrix}0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}$$
Then we use the recursion formula
$$\beta_{t}(i) = \sum_{j=1}^{N}a_{i,j}b_j(O_{t+1})\beta_{t+1}(j)$$
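As a worked step, for state $S_0$ at time $t=2$ (with $T=3$ and $O_3=c$) the recursion gives
$$\beta_{2}(0) = a_{0,0}b_0(c)\beta_{3}(0) + a_{0,1}b_1(c)\beta_{3}(1) = 0.7\cdot 0.1\cdot 1 + 0.3\cdot 0.6\cdot 1 = 0.25$$
and similarly $\beta_{2}(1) = 0.4\cdot 0.1 + 0.6\cdot 0.6 = 0.4$.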
import numpy as np

# observation symbol to index map
obs_to_idx = {'a': 0, 'b': 1, 'c': 2}
# state to index map
state_to_idx = {'S0': 0, 'S1': 1}
# initial state probabilities
pi = np.array([0.6, 0.4])
# transition probabilities
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# emission probabilities
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
# observed sequence
o = ['a', 'b', 'c']

beta = np.zeros(shape=(len(o), A.shape[0]))
# initialization: beta_T(i) = 1 for every state
beta[len(o) - 1] = np.ones(A.shape[0])

# start at the position one before the end and
# move back one step at a time until t = 0
for t in range(len(o) - 2, -1, -1):
    for i in range(A.shape[0]):
        # beta[t + 1] * B[:, obs_to_idx[o[t + 1]]] is the element-wise
        # product beta_{t+1}(j) * b_j(O_{t+1}) over the states j;
        # the dot product with A[i, :] then sums over j
        beta[t, i] = (beta[t + 1] * B[:, obs_to_idx[o[t + 1]]]).dot(A[i, :])

print(beta)
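For the HMM and the sequence above this prints, row by row (i.e. $\beta_1, \beta_2, \beta_3$),
$$\beta = \begin{bmatrix}0.106 & 0.112 \\ 0.25 & 0.4 \\ 1 & 1\end{bmatrix}$$
where the middle row matches the worked step above.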
Overall the forward and backward algorithms can be used to compute $P(O|\lambda)$. Indeed, using the forward algorithm we have:
$$P(O|\lambda) = \sum_{i=1}^{N}\alpha_{T}(i)$$
whilst using the backward algorithm
$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i b_i(O_1)\beta_{1}(i)$$
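As a quick sanity check, here is a minimal sketch computing $P(O|\lambda)$ both ways for the example above. It reuses pi, A, B, o, obs_to_idx and beta from the snippet above; the alpha recursion is the standard forward pass discussed in the forward-algorithm section.

# forward pass: alpha[t, i] holds the forward variable for state i at time t
alpha = np.zeros(shape=(len(o), A.shape[0]))
alpha[0] = pi * B[:, obs_to_idx[o[0]]]
for t in range(1, len(o)):
    for i in range(A.shape[0]):
        alpha[t, i] = alpha[t - 1].dot(A[:, i]) * B[i, obs_to_idx[o[t]]]

# P(O|lambda) from the forward variables: sum_i alpha_T(i)
p_forward = alpha[-1].sum()
# P(O|lambda) from the backward variables: sum_i pi_i b_i(O_1) beta_1(i)
p_backward = (pi * B[:, obs_to_idx[o[0]]] * beta[0]).sum()
print(p_forward, p_backward)  # both equal 0.03628 for this example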
The following video nicely explains the motivation behind the backward algorithm.
from IPython.display import YouTubeVideo
YouTubeVideo('EbxLWGw2zJ4', width=800, height=300)
The following video nicely explains both the forward and backward algorithms.
from IPython.display import YouTubeVideo
YouTubeVideo('gYma8Gw38Os', width=800, height=300)
References
1. Ethem Alpaydin, Introduction to Machine Learning, Second Edition, MIT Press.
2. Forward–backward algorithm.