The backward algorithm is the complement of the forward algorithm. Let's introduce the backward variable $\beta_t(i)$. This is the probability of observing the partial sequence $O_{t+1},\cdots,O_T$ given that the system is in state $S_i$ at time $t$ [1]. This can be written as

$$\beta_t(i) = P(O_{t+1},\cdots,O_T | q_t=S_i, \lambda)$$

The backward algorithm computes this recursively

  • Initialize

$$\beta_T(i) = 1, \quad 1 \le i \le N$$

There are no observations left after time $T$, so the probability of the empty remaining sequence is $1$.

  • Recurse

For $t = T-1, \ldots, 1$, conditioning on the state at time $t+1$ gives, see [1],

$$\beta_{t}(i) = \sum_{j=1}^{N}a_{i,j}b_j(O_{t+1})\beta_{t+1}(j)$$

Let's implement this as we did for the forward algorithm.

Assume a system with two states $S=\{S_0, S_1 \}$. Further, assume that the observations are drawn from the set $V=\{a, b, c\}$. Also let's assume the following HMM:

$$\boldsymbol{\pi}=\begin{bmatrix}0.6 & 0.4 \end{bmatrix}$$

$$\mathbf{A}=\begin{bmatrix}0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$$

$$\mathbf{B}=\begin{bmatrix} 0.5 & 0.4 & 0.1 \\ 0.1 & 0.3 & 0.6\end{bmatrix}$$

Assume the observation sequence $O = (a, b, c)$. We introduce the $\beta$ matrix, with one row per time step and one column per state:

$$ \beta =\begin{bmatrix}0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$

First we initialize

$$ \beta =\begin{bmatrix}0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}$$

Then use the recursion formula

$$\beta_{t}(i) = \sum_{j=1}^{N}a_{i,j}b_j(O_{t+1})\beta_{t+1}(j)$$
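For example, with 1-based time indexing the second-row entries work out to

$$\beta_2(0) = a_{0,0}b_0(O_3)\beta_3(0) + a_{0,1}b_1(O_3)\beta_3(1) = 0.7\cdot 0.1\cdot 1 + 0.3\cdot 0.6\cdot 1 = 0.25$$

$$\beta_2(1) = 0.4\cdot 0.1\cdot 1 + 0.6\cdot 0.6\cdot 1 = 0.4$$

where $O_3 = c$, so $b_0(O_3) = 0.1$ and $b_1(O_3) = 0.6$. These are exactly the values in the second row of the output below.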

import numpy as np

# observation symbol to index map
obs_to_idx = {'a': 0, 'b': 1, 'c': 2}

# state to index map
state_to_idx = {'S0': 0, 'S1': 1}

# initial state probabilities
pi = np.array([0.6, 0.4])

# transition probabilities
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# emission probabilities
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# observation sequence
o = ['a', 'b', 'c']

# one row per time step, one column per state
beta = np.zeros(shape=(len(o), A.shape[0]))

# initialization: beta_T(i) = 1 for all states
beta[len(o) - 1] = np.ones(A.shape[0])

# recursion: start one step before the end and move
# backwards one step at a time until t = 0 is processed
for t in range(len(o) - 2, -1, -1):
    for i in range(A.shape[0]):
        # beta[t + 1] * B[:, obs_to_idx[o[t + 1]]] is an element-wise
        # product giving b_j(O_{t+1}) * beta_{t+1}(j) for each j;
        # the dot product with A[i, :] then sums over j
        beta[t, i] = (beta[t + 1] * B[:, obs_to_idx[o[t + 1]]]).dot(A[i, :])
print(beta)
[[0.106 0.112]
 [0.25  0.4  ]
 [1.    1.   ]]

Overall the forward and backward algorithms can be used to compute $P(O|\lambda)$. Indeed using the forward algorithm we have:

$$P(O|\lambda) = \sum_{i=1}^{N}\alpha_{T}(i)$$

whilst using the backward algorithm

$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i b_i(O_1)\beta_{1}(i)$$
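As a sanity check we can verify numerically that the two expressions agree. Below is a minimal sketch that reuses pi, A, B, o, obs_to_idx and the beta matrix computed above; the forward pass is recomputed inline using the standard forward recursion.

# P(O|lambda) from the backward pass:
# sum_i pi_i * b_i(O_1) * beta_1(i)
p_backward = (pi * B[:, obs_to_idx[o[0]]] * beta[0]).sum()

# P(O|lambda) from the forward pass:
# alpha_1(i) = pi_i * b_i(O_1)
# alpha_{t+1}(j) = (sum_i alpha_t(i) * a_{i,j}) * b_j(O_{t+1})
alpha = pi * B[:, obs_to_idx[o[0]]]
for t in range(1, len(o)):
    alpha = alpha.dot(A) * B[:, obs_to_idx[o[t]]]
p_forward = alpha.sum()

# both evaluate to approximately 0.03628
print(p_forward, p_backward)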

The following video nicely explains the motivation behind the backward algorithm.

from IPython.display import YouTubeVideo
YouTubeVideo('EbxLWGw2zJ4', width=800, height=300)

The following video nicely explains both the forward and backward algorithms.

from IPython.display import YouTubeVideo
YouTubeVideo('gYma8Gw38Os', width=800, height=300)
  1. Ethem Alpaydin, Introduction to Machine Learning, Second Edition, MIT Press.
  2. Forward–backward algorithm.