next up previous
Next: Calculating (j) Up: Three Problems for HMMs Previous: Three Problems for HMMs

Solving Problem 1: The Evaluation Problem

One method is to use the law of total probability and condition on all possible state sequences, I, of length T: P(O|$ \lambda$) = $ \sum_{I}^{}$P(O| I,$ \lambda$P(I|$ \lambda$). The probability of a given sequence of states is simple: P(I|$ \lambda$) = $ \pi_{I_{1}}^{}$×aI1, I2×...×aIT - 1, IT. The probability of a given observation sequence given a state sequence is also simple: P(O| I,$ \lambda$) = bI1(O1)×...×bIT(OT). But the number of state sequences is exponential, so this is a poor method.

The standard approach is to use the forward-backward procedure, which takes advantage of the Markov property, namely that the path to a state is irrelevant to the future behavior of the process after being in that state. First we define

$\displaystyle \alpha_{t}^{}$(j) = P(O1,..., Ot, It = j|$\displaystyle \lambda$),

which is the probability of the process producing the first t observations and of ending up in state j at time t. P(O|$ \lambda$) is easily calculated from $ \alpha_{T}^{}$.

To see this, we first apply the chain rule and get $ \alpha_{t}^{}$(j) = P(It = j|$ \lambda$P(O1,..., Ot| It = j,$ \lambda$). Next we note that the law of total probability tells us that

P(O1,... Ot|$\displaystyle \lambda$) = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ P(O_{1},\ldots ,O_{t}\vert I_{t}=j,\lambda )\times P(I_{t}=j\vert\lambda )}\right.$P(O1,..., Ot| It = j,$\displaystyle \lambda$P(It = j|$\displaystyle \lambda$)$\displaystyle \left.\vphantom{ P(O_{1},\ldots ,O_{t}\vert I_{t}=j,\lambda )\times P(I_{t}=j\vert\lambda )}\right]$ = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \alpha_{t}^{}$(j)

which becomes P(O|$ \lambda$) = $ \sum_{j\in Q}^{}$$ \alpha_{T}^{}$(j) when we want the entire observation considered.

Moverover, $ \alpha_{T}^{}$ is easily calculated iteratively:

1. $\displaystyle \alpha_{1}^{}$(i) = $\displaystyle \pi_{i}^{}$×bi(O1)
2. $\displaystyle \alpha_{t+1}^{}$(i) = $\displaystyle \left[\vphantom{ \sum _{j\in Q}\alpha _{t}(j)\times a_{j,i}}\right.$$\displaystyle \sum_{j\in Q}^{}$$\displaystyle \alpha_{t}^{}$(jaj, i$\displaystyle \left.\vphantom{ \sum _{j\in Q}\alpha _{t}(j)\times a_{j,i}}\right]$×bi(Ot + 1)

Notice that calculating P(O|$ \lambda$) requires on the order of N×T calculations using $ \alpha_{T}^{}$ as compared to requiring an exponential number of calculations.

$ \alpha_{t}^{}$ represents the forward part, and $ \beta_{t}^{}$ represents the backward part. Let

$\displaystyle \beta_{t}^{}$(j) = P(Ot + 1, Ot + 2,..., OT| It = j,$\displaystyle \lambda$),

which is the probability of the process being in state j at time t and also of producing the observations from time t + 1 to T.

We can also use $ \beta$ to determine P(O|$ \lambda$).

P(O1,..., OT|$\displaystyle \lambda$) = P(O1|$\displaystyle \lambda$P(O2,..., OT| O1,$\displaystyle \lambda$).

First we calculate P(O1|$ \lambda$):

P(O1|$\displaystyle \lambda$) = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ P(O_{1}\vert I_{1}=j,\lambda )\times P(I_{1}=j\vert\lambda )}\right.$P(O1| I1 = j,$\displaystyle \lambda$P(I1 = j|$\displaystyle \lambda$)$\displaystyle \left.\vphantom{ P(O_{1}\vert I_{1}=j,\lambda )\times P(I_{1}=j\vert\lambda )}\right]$ = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ b_{j}(O_{1})\times \pi _{j}}\right.$bj(O1$\displaystyle \pi_{j}^{}$$\displaystyle \left.\vphantom{ b_{j}(O_{1})\times \pi _{j}}\right]$.

Next, we calculate P(O2,..., OT| O1,$ \lambda$):

P(O2,..., OT| O1,$\displaystyle \lambda$) = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ P(O_{2},\ldots ,O_{T}\vert O_{1},I_{1}=j,\lambda )\times P(I_{1}=j\vert\lambda )}\right.$P(O2,..., OT| O1, I1 = j,$\displaystyle \lambda$P(I1 = j|$\displaystyle \lambda$)$\displaystyle \left.\vphantom{ P(O_{2},\ldots ,O_{T}\vert O_{1},I_{1}=j,\lambda )\times P(I_{1}=j\vert\lambda )}\right]$.

But once we condition on I1 = j then we can ignore the conditions O1 because knowing the obsevation produced at time 1 gives us no more information once we know the state at time 1. Thus, we get the simplification

P(O2,..., OT| O1,$\displaystyle \lambda$) = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ P(O_{2},\ldots ,O_{T}\vert I_{1}=j,\lambda )\times P(I_{1}=j\vert\lambda )}\right.$P(O2,..., OT| I1 = j,$\displaystyle \lambda$P(I1 = j|$\displaystyle \lambda$)$\displaystyle \left.\vphantom{ P(O_{2},\ldots ,O_{T}\vert I_{1}=j,\lambda )\times P(I_{1}=j\vert\lambda )}\right]$ = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ \beta _{1}(j)\times \pi _{j}}\right.$$\displaystyle \beta_{1}^{}$(j$\displaystyle \pi_{j}^{}$$\displaystyle \left.\vphantom{ \beta _{1}(j)\times \pi _{j}}\right]$.

Putting these together, we get

P(O1,..., OT|$\displaystyle \lambda$) = $\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ b_{j}(O_{1})\times \pi _{j}}\right.$bj(O1$\displaystyle \pi_{j}^{}$$\displaystyle \left.\vphantom{ b_{j}(O_{1})\times \pi _{j}}\right]$×$\displaystyle \sum_{j\in Q}^{}$$\displaystyle \left[\vphantom{ \beta _{1}(j)\times \pi _{j}}\right.$$\displaystyle \beta_{1}^{}$(j$\displaystyle \pi_{j}^{}$$\displaystyle \left.\vphantom{ \beta _{1}(j)\times \pi _{j}}\right]$.

$ \beta_{t}^{}$ is also calculated easily.

1. $\displaystyle \beta_{T}^{}$(i) = 1
2. $\displaystyle \beta_{t}^{}$(i) = $\displaystyle \sum_{j\in Q}^{}$ai, j×bj(Ot + 1$\displaystyle \beta_{t+1}^{}$(j)

There are a few useful quantities that we can calculate with $ \alpha$ and $ \beta$ that we now note.



Subsections
next up previous
Next: Calculating (j) Up: Three Problems for HMMs Previous: Three Problems for HMMs
Webmaster: Jim Schmolze, schmolze@eecs.tufts.edu.  [Get postscript version]