Sadly, there is no known analytic method for finding the model parameters that maximize the likelihood of the observations. Therefore, iterative techniques such as Baum-Welch or gradient descent must be used. We examine only the Baum-Welch method.
Training is particularly difficult since we are given only a set of observation sequences that the process actually produced. We are not given the associated state transitions that occurred. Given those actual state transitions, the model adjustment would be simpler. But without those hidden variables, we must make an educated guess as to the state transitions that occurred.
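To see why known state transitions would make the adjustment simpler, here is a minimal sketch, assuming we were handed the state paths themselves. The function name `estimate_transitions` and the toy paths are hypothetical, chosen only for illustration. With the states visible, the maximum likelihood transition probabilities are just normalized counts.

```python
from collections import Counter


def estimate_transitions(state_paths, num_states):
    """Estimate transition probabilities from fully observed state paths."""
    counts = Counter()
    for path in state_paths:
        # Count each consecutive (from_state, to_state) pair.
        for i, j in zip(path, path[1:]):
            counts[(i, j)] += 1

    A = []
    for i in range(num_states):
        total = sum(counts[(i, j)] for j in range(num_states))
        # A state that is never left gets a row of zeros (an arbitrary choice here).
        A.append([counts[(i, j)] / total if total else 0.0
                  for j in range(num_states)])
    return A


# Hypothetical toy data: two observed state paths over states 0 and 1.
paths = [[0, 0, 1, 0], [1, 1, 0]]
A_hat = estimate_transitions(paths, num_states=2)
print(A_hat)
```

Baum-Welch replaces these hard counts with expected counts, since the paths are hidden.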
If we sum $\gamma_t(i)$ over $t$, we get the expected number of times that state $i$ is visited, or equivalently, the number of transitions made from state $i$, if we exclude the last time point. Thus, we get the following.
$\sum_{t=1}^{T-1} \gamma_t(i)$ = the expected number of transitions made from state $i$.
$\sum_{t=1}^{T-1} \xi_t(i, j)$ = the expected number of transitions made from state $i$ to state $j$.
We now have tools for counting state transitions for model adjustment.
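The counting can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes the usual forward and backward variables, a discrete observation alphabet indexed from 0, and no scaling, so it is only suitable for short sequences. The names `forward`, `backward`, `expected_counts`, and the toy model are hypothetical.

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(O_1..O_t, state i at time t)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
            for j in range(N)
        ])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(O_{t+1}..O_T | state i at time t)."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
    return beta


def expected_counts(pi, A, B, obs):
    """Return gamma, xi, and the two expected transition counts."""
    N, T = len(pi), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    prob = sum(alpha[T - 1])  # P(O | model), unscaled
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # Sum over t = 1..T-1 (indices 0..T-2), excluding the last time point.
    from_i = [sum(gamma[t][i] for t in range(T - 1)) for i in range(N)]
    i_to_j = [[sum(xi[t][i][j] for t in range(T - 1)) for j in range(N)]
              for i in range(N)]
    return gamma, xi, from_i, i_to_j


# Hypothetical two-state, two-symbol model and a short observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
gamma, xi, from_i, i_to_j = expected_counts(pi, A, B, [0, 1, 1])
print(from_i, i_to_j)
```

A quick sanity check is that the expected transitions out of state i, summed over destination states j, equal the expected number of transitions made from state i.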
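The Baum-Welch reestimation formulas are as follows.
$$\bar{\pi}_i = \gamma_1(i)$$
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\bar{b}_j(k) = \frac{\sum_{t=1,\; O_t = k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
In the numerator of $\bar{b}_j(k)$, we only include those $t$ such that (i.e., s.t.) $O_t = k$, where $k$ is the observation being examined.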
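A single reestimation step can be sketched as below, assuming the standard form of the update: initial probabilities from the first-time-step occupancies, transitions as a ratio of expected counts, and emissions restricted in the numerator to the time points where symbol k was observed. The function name `reestimate` and the hand-built `gamma` and `xi` values are hypothetical. In practice they would come from the forward-backward computation.

```python
def reestimate(gamma, xi, obs, num_symbols):
    """One Baum-Welch update from gamma[t][i] and xi[t][i][j]."""
    T, N = len(obs), len(gamma[0])

    # Initial state probabilities: expected occupancy at the first time point.
    new_pi = list(gamma[0])

    # Transitions: expected i -> j transitions over expected transitions from i.
    new_A = []
    for i in range(N):
        denom = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([sum(xi[t][i][j] for t in range(T - 1)) / denom
                      for j in range(N)])

    # Emissions: the numerator keeps only those t with obs[t] == k.
    new_B = []
    for j in range(N):
        denom = sum(gamma[t][j] for t in range(T))
        new_B.append([sum(gamma[t][j] for t in range(T) if obs[t] == k) / denom
                      for k in range(num_symbols)])
    return new_pi, new_A, new_B


# Hypothetical, internally consistent values for T = 2, N = 2.
gamma = [[0.5, 0.5], [0.25, 0.75]]
xi = [[[0.125, 0.375], [0.125, 0.375]]]
new_pi, new_A, new_B = reestimate(gamma, xi, obs=[0, 1], num_symbols=2)
print(new_pi, new_A, new_B)
```

The sketch does not guard against a state with zero expected occupancy, which would make a denominator zero.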
When we have a set of $L$ observation sequences, $O^{1}, \ldots, O^{L}$, we perform a similar estimation but over all the sequences at once. I will add superscripts to all the appropriate symbols to refer to the different observation sequences.
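One common way to do this, shown here only as a sketch and not necessarily the formulation intended above, is to pool the expected counts from every sequence before normalizing. The code assumes each sequence has its own `gamma` and `xi`, already normalized by that sequence's probability. The function name `reestimate_multi` and the toy values are hypothetical.

```python
def reestimate_multi(gammas, xis, sequences, num_symbols):
    """Pooled update: gammas[l][t][i], xis[l][t][i][j], sequences[l][t]."""
    L, N = len(sequences), len(gammas[0][0])

    # Average the first-time-step occupancies over the L sequences.
    new_pi = [sum(g[0][i] for g in gammas) / L for i in range(N)]

    # Transitions: sum expected counts over all sequences, then normalize.
    new_A = []
    for i in range(N):
        denom = sum(g[t][i] for g in gammas for t in range(len(g) - 1))
        new_A.append([sum(x[t][i][j] for x in xis for t in range(len(x))) / denom
                      for j in range(N)])

    # Emissions: numerator keeps only time points where symbol k was observed.
    new_B = []
    for j in range(N):
        denom = sum(g[t][j] for g in gammas for t in range(len(g)))
        new_B.append([
            sum(g[t][j]
                for g, obs in zip(gammas, sequences)
                for t in range(len(obs)) if obs[t] == k) / denom
            for k in range(num_symbols)
        ])
    return new_pi, new_A, new_B


# Hypothetical per-sequence quantities for two sequences of length 2.
gammas = [[[0.5, 0.5], [0.25, 0.75]], [[1.0, 0.0], [0.5, 0.5]]]
xis = [[[[0.125, 0.375], [0.125, 0.375]]], [[[0.5, 0.5], [0.0, 0.0]]]]
sequences = [[0, 1], [1, 1]]
print(reestimate_multi(gammas, xis, sequences, num_symbols=2))
```

Because the sums run over sequences of possibly different lengths, longer sequences contribute more expected counts under this pooling.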
(I stopped writing here.)