Chapter 103

Home

Suffix Arrays and LCP

A suffix array lists all suffixes of a string in lexicographic order. With the LCP array (longest common prefix between adjacent suffixes), it solves pattern matching, unique substrings, and longest repeated substring in O(N log N) or O(N).

Suffix Array Construction (O(N log N))

C++ — suffix array (sorting-based)

vector build_sa(string& s) {
    int n = s.size();
    vector sa(n), rank(n), tmp(n);

    for (int i = 0; i < n; i++) { sa[i] = i; rank[i] = s[i]; }

    for (int k = 1; k < n; k *= 2) {
        auto cmp = [&](int i, int j) {
            if (rank[i] != rank[j]) return rank[i] < rank[j];
            int ri = i + k < n ? rank[i + k] : -1;
            int rj = j + k < n ? rank[j + k] : -1;
            return ri < rj;
        };
        sort(sa.begin(), sa.end(), cmp);

        tmp[sa[0]] = 0;
        for (int i = 1; i < n; i++)
            tmp[sa[i]] = tmp[sa[i-1]] + cmp(sa[i-1], sa[i]);
        rank.swap(tmp);
    }
    return sa;
}

// O(N log² N) — good enough for most CP (N ≤ 10⁵).
// Use SA-IS for O(N) when N ≥ 10⁶.

💡 KMP vs Suffix Array

KMP finds one pattern in O(N+M). Suffix array finds all patterns, counts occurrences, and handles many string problems. Use suffix array when you need multiple queries on the same text.

Key Takeaways