Chapter 58

Home

Edit Distance

Edit distance (Levenshtein distance) measures how many single-character operations (insert, delete, replace) are needed to turn string A into string B. It's used in spell check, DNA alignment, and plagiarism detection.

The Recurrence

dp[i][j] = edit distance between s[0..i) and t[0..j)

if s[i-1] == t[j-1]:   dp[i][j] = dp[i-1][j-1]      // match, no cost
else:                   dp[i][j] = 1 + min(
    dp[i-1][j],       // delete from s
    dp[i][j-1],       // insert into s
    dp[i-1][j-1]      // replace in s
)

C++ — edit distance

int edit_distance(string& s, string& t) {
    int n = s.size(), m = t.size();
    vector> dp(n + 1, vector(m + 1));

    // Base cases: empty strings
    for (int i = 0; i <= n; i++) dp[i][0] = i;  // delete all
    for (int j = 0; j <= m; j++) dp[0][j] = j;  // insert all

    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            if (s[i - 1] == t[j - 1])
                dp[i][j] = dp[i - 1][j - 1];
            else
                dp[i][j] = 1 + min({dp[i - 1][j],
                                    dp[i][j - 1],
                                    dp[i - 1][j - 1]});
        }
    }
    return dp[n][m];
}

Rust — edit distance

fn edit_distance(s: &str, t: &str) -> usize {
    let (n, m) = (s.len(), t.len());
    let s = s.as_bytes();
    let t = t.as_bytes();
    let mut dp = vec![vec![0; m + 1]; n + 1];

    for i in 0..=n { dp[i][0] = i; }
    for j in 0..=m { dp[0][j] = j; }

    for i in 1..=n {
        for j in 1..=m {
            dp[i][j] = if s[i - 1] == t[j - 1] {
                dp[i - 1][j - 1]
            } else {
                1 + dp[i - 1][j].min(
                    dp[i][j - 1].min(dp[i - 1][j - 1]))
            };
        }
    }
    dp[n][m]
}

Key Takeaways

Practice