Comparison of exact matching algorithms
Notation: n is the (total) length of the pattern(s), m is the
length of the text
- Z algorithm: linear time, at most n matches and
n mismatches
- Boyer-Moore
- Using only the bad-character shift rule, the average case on random strings
is sublinear
- Preprocessing can be done in linear time using the Z algorithm
- Linear if the pattern doesn't occur in the text (at most 3m character
comparisons)
- If the pattern does occur the worst-case time is proportional to m*n,
but Galil's modified version runs in linear time
- The Apostolico-Giancarlo variation has a 2m bound on the number of character
comparisons
- Knuth-Morris-Pratt
- Preprocessing can be done in linear time using the Z algorithm
- The number of character comparisons is at most 2m
- A real-time version exists, assuming the alphabet size is finite
- Aho-Corasick: Preprocessing is linear in
the total length of the patterns and search is linear in the length of the
text plus the number of matches found (since more than one match can be found
at each location)