Chapter 105

Home

KMP Pattern Matching

KMP finds all occurrences of pattern P in text T in O(N+M) — no backtracking of the text pointer. The key is the LPS (Longest Prefix Suffix) array, which tells KMP how far to shift the pattern on a mismatch.

KMP Algorithm

C++ — KMP

vector build_lps(string& pat) {
    int m = pat.size();
    vector lps(m, 0);
    for (int i = 1, len = 0; i < m; ) {
        if (pat[i] == pat[len]) { lps[i++] = ++len; }
        else if (len) len = lps[len - 1];
        else i++;
    }
    return lps;
}

vector kmp(string& text, string& pat) {
    auto lps = build_lps(pat);
    vector matches;

    for (int i = 0, j = 0; i < text.size(); ) {
        if (text[i] == pat[j]) { i++; j++; }
        if (j == pat.size()) {
            matches.push_back(i - j);
            j = lps[j - 1];
        } else if (i < text.size() && text[i] != pat[j]) {
            if (j) j = lps[j - 1];
            else i++;
        }
    }
    return matches;
}

// O(N+M) — each character in text is compared at most twice.

💡 KMP vs built-in find

C++ string::find is O(N×M) worst case. Rust's str::find uses two-way algorithm (O(N+M) average). Use KMP when worst-case O(N+M) is critical.

Key Takeaways