Chapter 49
HomeThe core Huffman operation is "find the two smallest frequencies, merge them." Scanning the list each time is O(N) per merge — O(N²) total. A min-heap (priority queue) makes it O(log N) per merge — O(N log N) total.
Maintain a list of nodes. Each iteration, scan the entire list to find the two with the smallest frequency, remove them, merge, and add the parent back. O(N) per merge × N merges = O(N²).
Node* build_huffman_naive(vector<Node*>& nodes) {
while (nodes.size() > 1) {
// Find index of smallest
int i1 = 0, i2 = 1;
if (nodes[i1]->freq > nodes[i2]->freq) swap(i1, i2);
for (int i = 2; i < nodes.size(); i++) {
if (nodes[i]->freq < nodes[i1]->freq) {
i2 = i1;
i1 = i;
} else if (nodes[i]->freq < nodes[i2]->freq) {
i2 = i;
}
}
// Merge
Node* parent = new Node('\0', nodes[i1]->freq + nodes[i2]->freq);
parent->left = nodes[i1];
parent->right = nodes[i2];
// Remove and add back
nodes.erase(nodes.begin() + max(i1, i2));
nodes.erase(nodes.begin() + min(i1, i2));
nodes.push_back(parent);
}
return nodes[0];
}
// N = 1000: ~500K operations
// N = 10⁵: ~5 billion operations → too slow
🚨 O(N²) with vector erasure
erase() on a vector is O(N) — it shifts elements. Combined with
the O(N) scan for the two smallest, each merge is O(N). For real inputs
(N = 10⁵ symbols), this is unusable. The priority queue avoids all scanning
and shifting.
Push all nodes into a min-heap. Pop the two smallest (O(log N)), merge them, push the parent back (O(log N)). N merges → O(N log N).
Node* build_huffman(map<char, int>& freq) {
priority_queue<Node*, vector<Node*>, Cmp> pq;
for (auto& [ch, f] : freq)
pq.push(new Node(ch, f));
while (pq.size() > 1) {
Node* left = pq.top(); pq.pop(); // O(log N)
Node* right = pq.top(); pq.pop(); // O(log N)
Node* parent = new Node('\0', left->freq + right->freq);
parent->left = left;
parent->right = right;
pq.push(parent); // O(log N)
}
return pq.top();
}
// N = 10⁵: ~1 million heap ops → ~0.02 seconds
use std::collections::BinaryHeap;
use std::cmp::Reverse;
fn build_huffman(freq: &HashMap<char, usize>) -> Node {
let mut pq: BinaryHeap<Reverse<Box<Node>>> = BinaryHeap::new();
for (&ch, &f) in freq {
pq.push(Reverse(Box::new(Node::leaf(ch, f))));
}
while pq.len() > 1 {
let left = pq.pop().unwrap().0; // O(log N)
let right = pq.pop().unwrap().0; // O(log N)
let parent = Box::new(Node::internal(
left.freq + right.freq, left, right));
pq.push(Reverse(parent)); // O(log N)
}
pq.pop().unwrap().0
}
💡 The heap pattern in competitive programming
Whenever a problem says "repeatedly find the smallest/largest" or "merge the smallest two," reach for a priority queue. It's the single most common optimisation for greedy algorithms — used in Huffman, Dijkstra, Prim's MST, and running median (Chapter 35).