String Searching
Authors: Benjamin Qi, Siyong Huang
Prerequisites
Knuth-Morris-Pratt and Z Algorithms (and a few more related topics).
| Resources | |||
|---|---|---|---|
| CPC | String Matching, KMP, Tries | ||
| CP2 | |||
Single String
KMP
Knuth-Morris-Pratt, or KMP, is a linear-time string matching algorithm that works by matching prefixes. Specifically, for every prefix of a given string, it computes the length of the longest proper prefix of that prefix that is also one of its suffixes.
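The prefix computation described above can be sketched roughly as follows. This is a minimal illustration rather than a reference implementation: the names `prefix_function` and `find_occurrences` are hypothetical, and the matching helper assumes the separator character `#` appears in neither the pattern nor the text.

```cpp
#include <string>
#include <vector>

// pi[i] = length of the longest proper prefix of s[0..i]
// that is also a suffix of s[0..i].
std::vector<int> prefix_function(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> pi(n, 0);
    for (int i = 1; i < n; ++i) {
        int j = pi[i - 1];
        // Fall back to shorter borders until s[i] can extend one.
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];
        if (s[i] == s[j]) ++j;
        pi[i] = j;
    }
    return pi;
}

// Starting indices of every occurrence of pattern in text.
// Assumption: '#' occurs in neither string.
std::vector<int> find_occurrences(const std::string& text,
                                  const std::string& pattern) {
    std::vector<int> result;
    if (pattern.empty()) return result;
    std::string combined = pattern + '#' + text;
    std::vector<int> pi = prefix_function(combined);
    int m = (int)pattern.size();
    for (int i = m + 1; i < (int)combined.size(); ++i) {
        // A border of length m means the pattern ends at this position.
        if (pi[i] == m) result.push_back(i - 2 * m);
    }
    return result;
}
```

For example, `prefix_function("abacaba")` gives `0 0 1 0 1 2 3`, and `find_occurrences("abababa", "aba")` gives the starting indices `0 2 4`.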
| Resources | |||
|---|---|---|---|
| cp-algo | |||
| PAPS | |||
| GFG | |||
| TC | |||
| Source | Difficulty | Tags | Solution |
|---|---|---|---|
| Kattis | Easy | Strings | Sketch |
| POI | Easy | Strings, prefix functions | External Sol |
| Baltic OI | Normal | | View Solution |
| POJ | Hard | Strings | Sketch |
| CEOI | Hard | | View Solution |
Z Algorithm
The Z-Algorithm is another linear-time string matching algorithm. Unlike KMP, it computes, for each suffix of a string, the length of the longest common prefix of that suffix and the whole string.
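A minimal sketch of this computation is shown below. The function name `z_function` is hypothetical, and the sketch follows the common convention of leaving `z[0]` as `0`, which is an assumption and not something fixed by the text above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// z[i] = length of the longest common prefix of s and the suffix s[i..].
// Convention assumed here: z[0] is left as 0.
std::vector<int> z_function(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> z(n, 0);
    // [l, r) is the match segment with the largest right end found so far.
    int l = 0, r = 0;
    for (int i = 1; i < n; ++i) {
        // Reuse the value already known for the mirrored position i - l.
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        // Extend the match character by character.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
    }
    return z;
}
```

For example, `z_function("aaabaab")` gives `0 2 1 0 2 1 0`.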
| Resources | |||
|---|---|---|---|
| cp-algo | |||
| CPH | |||
| CF | |||
Palindromes
Manacher
Manacher's Algorithm is functionally similar to the Z-Algorithm and computes information about palindromes. In linear time, it determines the longest palindrome centered at each character.
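One way to sketch this is shown below. It uses the common trick of interleaving a separator so that even-length palindromes also get a center; that transformation, the names `manacher_radii` and `longest_palindrome`, and the assumption that `#` does not occur in the input are all choices made for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Builds t = "#s0#s1#...#" and returns p, where p[i] is the radius of the
// longest palindrome centered at t[i]. That radius equals the length of the
// corresponding palindrome in s. Assumption: '#' does not occur in s.
std::vector<int> manacher_radii(const std::string& s) {
    std::string t = "#";
    for (char ch : s) {
        t += ch;
        t += '#';
    }
    int n = (int)t.size();
    std::vector<int> p(n, 0);
    int c = 0, r = 0;  // center and right end of the rightmost palindrome
    for (int i = 0; i < n; ++i) {
        // Start from the mirrored position's radius, clipped to the boundary.
        if (i < r) p[i] = std::min(r - i, p[2 * c - i]);
        while (i - p[i] - 1 >= 0 && i + p[i] + 1 < n &&
               t[i - p[i] - 1] == t[i + p[i] + 1]) {
            ++p[i];
        }
        if (i + p[i] > r) {
            c = i;
            r = i + p[i];
        }
    }
    return p;
}

// Returns the leftmost longest palindromic substring of s.
std::string longest_palindrome(const std::string& s) {
    std::vector<int> p = manacher_radii(s);
    int best = 0;
    for (int i = 0; i < (int)p.size(); ++i) {
        if (p[i] > p[best]) best = i;
    }
    // Position 2k in t is the '#' just before s[k].
    return s.substr((best - p[best]) / 2, p[best]);
}
```

Odd positions of the transformed string correspond to characters of the original string, and even positions correspond to the gaps between them.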
| Resources | |||
|---|---|---|---|
| HR | |||
| CF | shorter code | ||
| cp-algo | |||
Don't Forget!
Palindromic Tree
A Palindromic Tree is a tree-like data structure that behaves similarly to KMP. Unlike KMP, in which the only empty state is $0$, the Palindromic Tree has two empty states: length $0$ and length $-1$. This is because appending a character to both ends of a palindrome increases its length by $2$, meaning a single-character palindrome must have been created from a palindrome of length $-1$.
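The two empty states can be seen in the rough sketch below, where node `0` has length -1 and node `1` has length 0. The struct and member names are hypothetical, and the sketch assumes the input consists only of lowercase letters `a` to `z`.

```cpp
#include <array>
#include <string>
#include <vector>

// Assumption: all characters are lowercase 'a'..'z'.
struct PalindromicTree {
    struct Node {
        int len;                   // length of this palindrome
        int link;                  // longest proper palindromic suffix
        std::array<int, 26> next;  // next[c]: palindrome c + this + c
    };
    std::vector<Node> nodes;
    std::string s;
    int last;  // node of the longest palindromic suffix of s

    static Node make_node(int len, int link) {
        Node node;
        node.len = len;
        node.link = link;
        node.next.fill(-1);
        return node;
    }

    PalindromicTree() {
        nodes.push_back(make_node(-1, 0));  // node 0: empty state of length -1
        nodes.push_back(make_node(0, 0));   // node 1: empty state of length 0
        last = 1;
    }

    // Follow suffix links from v until the palindrome can be extended
    // on both sides by s[i]. Node 0 always qualifies, so this terminates.
    int get_link(int v, int i) const {
        while (true) {
            int j = i - nodes[v].len - 1;
            if (j >= 0 && s[j] == s[i]) return v;
            v = nodes[v].link;
        }
    }

    void add(char ch) {
        s.push_back(ch);
        int i = (int)s.size() - 1;
        int c = ch - 'a';
        int cur = get_link(last, i);
        if (nodes[cur].next[c] == -1) {
            int link = 1;  // a single character links to the length-0 state
            if (nodes[cur].len != -1) {
                link = nodes[get_link(nodes[cur].link, i)].next[c];
            }
            nodes.push_back(make_node(nodes[cur].len + 2, link));
            nodes[cur].next[c] = (int)nodes.size() - 1;
        }
        last = nodes[cur].next[c];
    }

    // Every node other than the two empty states is a distinct palindrome.
    int distinct_palindromes() const { return (int)nodes.size() - 2; }
};
```

Adding the characters of `abacaba` one at a time creates seven non-empty nodes: `a`, `b`, `aba`, `c`, `aca`, `bacab`, and `abacaba`.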
| Resources | |||
|---|---|---|---|
| CF | |||
| adilet.org | |||
Multiple Strings
Tries
A trie is a tree-like data structure that stores strings. Each node represents a string, and each edge is labeled with a character. The root is the empty string, and the string at every node is spelled by the characters along the path from the root to that node. This means that every prefix of a string is an ancestor of that string's node.
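A minimal array-based sketch of this structure follows. The struct and method names are hypothetical, and the alphabet is assumed to be lowercase `a` to `z`.

```cpp
#include <array>
#include <string>
#include <vector>

// Assumption: all strings consist of lowercase 'a'..'z'.
struct Trie {
    struct Node {
        std::array<int, 26> child;  // child[c] = node index, or -1 if absent
        bool is_word = false;       // true if an inserted string ends here
        Node() { child.fill(-1); }
    };
    std::vector<Node> nodes;

    Trie() : nodes(1) {}  // node 0 is the root (the empty string)

    void insert(const std::string& word) {
        int v = 0;
        for (char ch : word) {
            int c = ch - 'a';
            if (nodes[v].child[c] == -1) {
                nodes[v].child[c] = (int)nodes.size();
                nodes.emplace_back();
            }
            v = nodes[v].child[c];
        }
        nodes[v].is_word = true;
    }

    // Node reached by following word from the root, or -1 if the path ends.
    int walk(const std::string& word) const {
        int v = 0;
        for (char ch : word) {
            v = nodes[v].child[ch - 'a'];
            if (v == -1) return -1;
        }
        return v;
    }

    bool contains(const std::string& word) const {
        int v = walk(word);
        return v != -1 && nodes[v].is_word;
    }

    bool has_prefix(const std::string& prefix) const {
        return walk(prefix) != -1;
    }
};
```

After inserting `car` and `cart`, the node for `car` is an ancestor of the node for `cart`, so `has_prefix("ca")` is true while `contains("ca")` is false.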
| Resources | |||
|---|---|---|---|
| CPH | |||
| CF | |||
| PAPS | |||
Aho-Corasick
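Aho-Corasick combines a trie with KMP. It is essentially a trie in which every node also stores an entry of KMP's "fail" array, pointing to the longest proper suffix of that node's string that is also present in the trie.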
Warning!
Build the entire trie first, and then run a BFS to construct the fail array.
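The order of operations in the warning above can be sketched as follows: all patterns are inserted first, and only then does `build` run the BFS. The struct and member names are hypothetical, the alphabet is assumed to be lowercase `a` to `z`, and missing edges are filled in with fail transitions, which is one common variant rather than the only way to do it.

```cpp
#include <array>
#include <queue>
#include <string>
#include <vector>

// Assumption: patterns and text consist of lowercase 'a'..'z'.
struct AhoCorasick {
    struct Node {
        std::array<int, 26> next;
        int fail = 0;   // longest proper suffix that is also a trie node
        int count = 0;  // after build(): patterns that are suffixes of this node
        Node() { next.fill(-1); }
    };
    std::vector<Node> nodes;

    AhoCorasick() : nodes(1) {}

    // Step 1: insert every pattern before calling build().
    void insert(const std::string& pattern) {
        int v = 0;
        for (char ch : pattern) {
            int c = ch - 'a';
            if (nodes[v].next[c] == -1) {
                nodes[v].next[c] = (int)nodes.size();
                nodes.emplace_back();
            }
            v = nodes[v].next[c];
        }
        ++nodes[v].count;
    }

    // Step 2: BFS from the root, so shallower nodes are finished first.
    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c) {
            int u = nodes[0].next[c];
            if (u == -1) nodes[0].next[c] = 0;
            else q.push(u);  // depth-1 nodes fail to the root
        }
        while (!q.empty()) {
            int v = q.front();
            q.pop();
            int f = nodes[v].fail;
            nodes[v].count += nodes[f].count;
            for (int c = 0; c < 26; ++c) {
                int u = nodes[v].next[c];
                if (u == -1) {
                    nodes[v].next[c] = nodes[f].next[c];
                } else {
                    nodes[u].fail = nodes[f].next[c];
                    q.push(u);
                }
            }
        }
    }

    // Total number of pattern occurrences in text (call after build()).
    long long count_matches(const std::string& text) const {
        long long total = 0;
        int v = 0;
        for (char ch : text) {
            v = nodes[v].next[ch - 'a'];
            total += nodes[v].count;
        }
        return total;
    }
};
```

With the patterns `he`, `she`, `his`, and `hers`, the text `ushers` contains three occurrences in total (`she`, `he`, and `hers`).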
| Resources | |||
|---|---|---|---|
| cp-algo | |||
| CF | |||
| GFG | |||
| Source | Difficulty | Tags | Solution |
|---|---|---|---|
| Gold | Normal | Strings | External Sol |
| CF | Normal | Strings | Check CF |
This section is not complete.
- 1731 Word Combinations -> trie
- 1753 String Matching -> string search
- 1732 Finding Borders -> string search
- 1733 Finding Periods -> string search
- 1110 Minimal Rotation -> string search
- 1111 Longest Palindrome -> string search
- 1112 Required Substring -> string search