Chapter 35

Home

Refactoring: Running Median

After each number in a stream, output the median so far. The naive approach re-sorts every time — O(N²). With two heaps, it's O(N log N) and one of the most elegant patterns in competitive programming.

The Problem

Given a stream of numbers, after each insertion, output the current median. If there's an even count, output the lower median (the smaller of the two middle values).

Stream: 5  15  1  3
After 5:   median = 5
After 15:  median = 5   (sorted: [5, 15])
After 1:   median = 5   (sorted: [1, 5, 15])
After 3:   median = 3   (sorted: [1, 3, 5, 15], even → lower = 3)

Before: Insert and Re-sort

The naive approach: maintain a sorted list. Each insertion requires finding the right position (O(N) binary search) then shifting elements (O(N)). That's O(N²) total.

C++ — O(N²) insert-and-shift

vector<double> running_median_naive(vector<int>& stream) {
    vector<int> sorted;
    vector<double> medians;

    for (int x : stream) {
        // Find insertion point (O(log N) — binary search)
        auto pos = lower_bound(sorted.begin(), sorted.end(), x);
        // Shift elements right (O(N))
        sorted.insert(pos, x);

        int n = sorted.size();
        if (n % 2 == 1) {
            medians.push_back(sorted[n / 2]);
        } else {
            medians.push_back(sorted[n / 2 - 1]);  // lower median
        }
    }
    return medians;
}

Rust — O(N²) insert-and-shift

fn running_median_naive(stream: &[i32]) -> Vec<f64> {
    let mut sorted: Vec<i32> = Vec::new();
    let mut medians = Vec::new();

    for &x in stream {
        let pos = sorted.binary_search(&x).unwrap_or_else(|e| e);
        sorted.insert(pos, x);  // O(N) shift

        let n = sorted.len();
        if n % 2 == 1 {
            medians.push(sorted[n / 2] as f64);
        } else {
            medians.push(sorted[n / 2 - 1] as f64);  // lower
        }
    }
    medians
}

🚨 O(N²) is the cost of shifting

vector::insert / Vec::insert shifts every element after the insertion point right by one. For a stream of 10⁵ elements, that's ~5 × 10⁹ memory moves. A heap-based approach avoids shifting entirely.

After: Two-Heap Median

The insight: the median divides the data into two halves. A max-heap stores the lower half (largest on top), and a min-heap stores the upper half (smallest on top). The median is always at the top of one of the heaps.

Lower Half Max-Heap (top = largest) [1] [3] [5] ← → Upper Half Min-Heap (top = smallest) [15] [20] Median = max-heap.top() = 5

C++ — O(N log N) two-heap

vector<double> running_median(vector<int>& stream) {
    priority_queue<int> lower;  // max-heap
    priority_queue<int, vector<int>, greater<int>> upper;  // min-heap
    vector<double> medians;

    for (int x : stream) {
        // Insert: always push to lower first
        lower.push(x);

        // Balance: lower's largest must ≤ upper's smallest
        if (!upper.empty() && lower.top() > upper.top()) {
            upper.push(lower.top());
            lower.pop();
        }

        // Size balance: lower can have at most one extra element
        if (lower.size() > upper.size() + 1) {
            upper.push(lower.top());
            lower.pop();
        } else if (upper.size() > lower.size()) {
            lower.push(upper.top());
            upper.pop();
        }

        // Median is top of lower (or average if equal size — we output lower)
        medians.push_back(lower.top());
    }
    return medians;
}

Rust — O(N log N) two-heap

use std::collections::BinaryHeap;
use std::cmp::Reverse;

fn running_median(stream: &[i32]) -> Vec<f64> {
    let mut lower: BinaryHeap<i32> = BinaryHeap::new();  // max-heap
    let mut upper: BinaryHeap<Reverse<i32>> = BinaryHeap::new();  // min-heap
    let mut medians = Vec::new();

    for &x in stream {
        lower.push(x);

        // Ensure lower's top ≤ upper's top
        if let Some(&Reverse(top_upper)) = upper.peek() {
            if lower.peek().unwrap() > &top_upper {
                upper.push(Reverse(*lower.peek().unwrap()));
                lower.pop();
            }
        }

        // Balance sizes
        if lower.len() > upper.len() + 1 {
            upper.push(Reverse(*lower.peek().unwrap()));
            lower.pop();
        } else if upper.len() > lower.len() {
            lower.push(upper.peek().unwrap().0);
            upper.pop();
        }

        medians.push(*lower.peek().unwrap() as f64);
    }
    medians
}

💡 Why this works

The lower heap contains the smaller half, the upper heap the larger half. lower.top() is the largest element of the lower half — the median when counts are odd, or the lower median when even. Balancing the sizes ensures the median is always accessible in O(1).

Key Takeaways

Practice