Chapter 91

Refactoring: Fast Matrix Power

Naive matrix multiplication is O(K³). For Fibonacci (K=2) that's fine. For K=100, it's 10⁶ operations per multiply. Cache-efficient loop ordering (i-k-j) gives 2-3× speedup. Skip zeros for sparse matrices.

Before: Standard i-j-k Loops

C++ — cache-unfriendly loop order

for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)  // B[k][j] is not contiguous

After: Cache-Friendly i-k-j Ordering

C++ — i-k-j with zero-skip

Matrix mul(Matrix& A, Matrix& B) {
    int n = A.size();
    Matrix C(n, vector(n, 0));
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            if (A[i][k] == 0) continue;  // skip zeros
            for (int j = 0; j < n; j++)
                C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % MOD;
        }
    }
    return C;
}
// i-k-j: B[k][j] accessed row-wise → cache hits
// Zero-skip: avoids unnecessary multiplications

Key Takeaways

→Loop order i-k-j is cache-friendlier than i-j-k.
→Skip zeros for sparse matrices (companion matrices are very sparse).