Definition: R(x) is the position of the rightmost occurrence of x in P (or zero)
Example: P = actca, so R(a) = 5, R(c) = 4, R(g) = 0, R(t) = 3
T = actgactaactca P = actca
Bad character shift rule: If the first mismatch is found at position i (when scanning the pattern right to left) and T(k) is the text character that mismatches P(i), then shift the pattern right by max(1, i - R(T(k))
In the above example, i = k = 4 and T(k) = g, so the shift is 4 - R(g) = 4
Extended bad character shift rule: shift so that the next
occurrence of T(k) in P (scanning left) is matched to T(k)
Preprocessing requires space proportional to the pattern length rather than
the alphabet size
In the above example, the first shift results in
T = actgactaactca P = actca
in either case, but now the first rule would only give a second shift of max(1, 4-5) = 1
Good suffix rule: If t is the longest suffix of P that matches T in the current position, then P can be shifted so that the previous occurrence of t in P matches T. In fact, it can be required that the character before the previous occurrence of t be different from the character before the occurrence of t as a suffix. If no such previous occurrence of t exists, then the following cases apply:
Note that these cases can be applied even if the length of t is n (a match of P was found in T)
More formally:
L(i) is the largest k < n so that P[i..n] matches a suffix of P[1..k] (or zero if none exists)
L'(i) is the largest k < n so that P[i..n] matches a suffix of P[1..k] and P(i-1) did not match the character before the suffix (or zero if there's no such j)
Nj(P) is the length of the longest suffix of P[1..j] that is also a suffix of P Note: Nj(P) = Zn-j+1(Pr)
l'(i) is the length of the longest suffix of P[i..n] that is also a prefix of P (or zero if none exists)
Theorem 2.2.2: L(i) is the largest j < n so that Nj(P)
>= |P[i..n]| = n - i + 1
L'(i) is the largest j < n so that Nj(P) = |P[i..n]| = n - i +
1
Given this theorem and the above note, one can calculate all the Nj(P) values using the Z (fundamental preprocessing) algorithm, then go through these values in order of j filling in a table of L'(i) = j, where i = n - Nj(P) + 1.
Theorem 2.2.4: l'(i) is the largest j <= |P[i..n]| = n - i + 1 so that Nj(P) = j
Shift due to good suffix rule: If a mismatch occurs at P(i-1) and L'(i) > 0, shift right by n - L'(i)
If L'(i) = 0, shift right by n - l'(i)
When an occurrence of P is found, shift right by n - l'(2)
If P(n) mismatches, shift right by 1