Local alignment and gaps

Local alignment: Given strings S1 and S2 find A (a substring of S1) and B (a substring of S2) with the best similarity value V(A, B) among all such substrings. Call this value v*.

Local suffix alignment: Assuming V(empty,empty) = 0, let v(i, j) denote the best similarity value among (possibly empty) suffixes of S1[1..i] and S2[1..j].

Theorem: v* = max[ v(i, j) | 1 <= i <= n, 1 <= j <= m] and if this maximum is attained at (i*,j*) then A and B with the bext local suffix alignment for (i*,j*) give the best overall local alignment.

Recurrence: v(i, 0) = 0, v(0, j) = 0, and v(i, j) = max[ 0, v(i-1, j) + s(S1(i),_), v(i, j-1) + s(_,S2(j)), v(i-1, j-1) + s(S1(i), S2(j) ) ]

Definition: A gap in one string of an alignment is a consecutive run of spaces that is bounded on the left and right by letters in the alphabet (or the beginning or end of the string)

Affine gap weight model: Score matches and mismatches as before, but score each gap with cost Wg + qWs, where q is the number of spaces in the gap. For example, in FASTA the default values are Wg = 10 and Ws = 2.

Other possibilities are to use two different affine gap costs, one for short gaps and one for long gaps, or to use a logarithmic gap cost, Wg + log q

Recurrence for affine gap weights: Define E(i, j) to be the value of the best alignment with a gap at the end of S1[1..i], F(i, j) to be the value of the best alignment with a gap at the end of S2[1..j], and G(i, j) to be the value of the best alignment that doesn't end with a gap. Initialize V(i, 0) = E(i, 0) = -Wg - iWs and V(0, j) = F(0, j) = -Wg - jWs (or 0 if gaps at the beginnings of alignments have no cost)

V(i, j) = max[ E(i, j), F(i, j), G(i, j) ]

E(i, j) = max[ E(i, j-1), V(i, j-1) - Wg ] - Ws

F(i, j) = max[ E(i-1, j), V(i-1, j) - Wg ] - Ws

G(i, j) = G(i-1, j-1) + s(S1(i), S2(j) )

If gaps at the ends of alignments have no cost then the solution is found from the maximum value in the last row or column, otherwise the solution is found at V(n, m).