Chapter 108

Home

Refactoring: Double Hashing

Single-hash Rabin-Karp can have collisions (different strings, same hash). Double hashing uses two different moduli (usually 10⁹+7 and 10⁹+9). The probability of both hashes colliding simultaneously is negligible.

Before: Single Hash (O(NM) worst case)

C++ — vulnerable to collisions

const long long MOD = 1e9 + 7;
// One hash, one modulus — ~1/MOD collision probability per comparison

After: Double Hash (negligible collision risk)

C++ — double hash structure

struct DoubleHash {
    const long long M1 = 1e9 + 7, M2 = 1e9 + 9, B = 911382323;
    vector> pow, h;

    DoubleHash(string& s) {
        int n = s.size();
        pow.resize(n + 1, {1, 1});
        h.resize(n + 1, {0, 0});
        for (int i = 0; i < n; i++) {
            pow[i+1].first = pow[i].first * B % M1;
            pow[i+1].second = pow[i].second * B % M2;
            h[i+1].first = (h[i].first * B + s[i]) % M1;
            h[i+1].second = (h[i].second * B + s[i]) % M2;
        }
    }

    pair get(int l, int r) {
        long long v1 = (h[r+1].first - h[l].first * pow[r-l+1].first % M1 + M1) % M1;
        long long v2 = (h[r+1].second - h[l].second * pow[r-l+1].second % M2 + M2) % M2;
        return {v1, v2};
    }

    bool equal(int l1, int r1, int l2, int r2) {
        return get(l1, r1) == get(l2, r2);
    }
};

// Collision probability: 1/(M1×M2) ≈ 10⁻¹⁸ — practically impossible.

Key Takeaways