Chapter 49

Home

Refactoring: Priority Queue Huffman

The core Huffman operation is "find the two smallest frequencies, merge them." Scanning the list each time is O(N) per merge — O(N²) total. A min-heap (priority queue) makes it O(log N) per merge — O(N log N) total.

Before: Linear Scan for Two Smallest

Maintain a list of nodes. Each iteration, scan the entire list to find the two with the smallest frequency, remove them, merge, and add the parent back. O(N) per merge × N merges = O(N²).

C++ — O(N²) linear scan

Node* build_huffman_naive(vector<Node*>& nodes) {
    while (nodes.size() > 1) {
        // Find index of smallest
        int i1 = 0, i2 = 1;
        if (nodes[i1]->freq > nodes[i2]->freq) swap(i1, i2);

        for (int i = 2; i < nodes.size(); i++) {
            if (nodes[i]->freq < nodes[i1]->freq) {
                i2 = i1;
                i1 = i;
            } else if (nodes[i]->freq < nodes[i2]->freq) {
                i2 = i;
            }
        }

        // Merge
        Node* parent = new Node('\0', nodes[i1]->freq + nodes[i2]->freq);
        parent->left = nodes[i1];
        parent->right = nodes[i2];

        // Remove and add back
        nodes.erase(nodes.begin() + max(i1, i2));
        nodes.erase(nodes.begin() + min(i1, i2));
        nodes.push_back(parent);
    }
    return nodes[0];
}

// N = 1000:    ~500K operations
// N = 10⁵:     ~5 billion operations  →  too slow

🚨 O(N²) with vector erasure

erase() on a vector is O(N) — it shifts elements. Combined with the O(N) scan for the two smallest, each merge is O(N). For real inputs (N = 10⁵ symbols), this is unusable. The priority queue avoids all scanning and shifting.

After: Min-Heap (Priority Queue)

Push all nodes into a min-heap. Pop the two smallest (O(log N)), merge them, push the parent back (O(log N)). N merges → O(N log N).

C++ — O(N log N) with priority_queue

Node* build_huffman(map<char, int>& freq) {
    priority_queue<Node*, vector<Node*>, Cmp> pq;

    for (auto& [ch, f] : freq)
        pq.push(new Node(ch, f));

    while (pq.size() > 1) {
        Node* left = pq.top(); pq.pop();   // O(log N)
        Node* right = pq.top(); pq.pop();  // O(log N)

        Node* parent = new Node('\0', left->freq + right->freq);
        parent->left = left;
        parent->right = right;

        pq.push(parent);                   // O(log N)
    }
    return pq.top();
}

// N = 10⁵:  ~1 million heap ops  →  ~0.02 seconds

Rust — O(N log N) with BinaryHeap

use std::collections::BinaryHeap;
use std::cmp::Reverse;

fn build_huffman(freq: &HashMap<char, usize>) -> Node {
    let mut pq: BinaryHeap<Reverse<Box<Node>>> = BinaryHeap::new();

    for (&ch, &f) in freq {
        pq.push(Reverse(Box::new(Node::leaf(ch, f))));
    }

    while pq.len() > 1 {
        let left = pq.pop().unwrap().0;   // O(log N)
        let right = pq.pop().unwrap().0;  // O(log N)

        let parent = Box::new(Node::internal(
            left.freq + right.freq, left, right));

        pq.push(Reverse(parent));          // O(log N)
    }
    pq.pop().unwrap().0
}

💡 The heap pattern in competitive programming

Whenever a problem says "repeatedly find the smallest/largest" or "merge the smallest two," reach for a priority queue. It's the single most common optimisation for greedy algorithms — used in Huffman, Dijkstra, Prim's MST, and running median (Chapter 35).

Key Takeaways

Practice