Find Median from Data Stream
Design a data structure that supports the following operations:
add_num(int num): Adds a number to the current data stream.
find_median(): Returns the median of all numbers added so far.
Median is the middle value in a sorted list. If the list is even-sized, the median is the average of the two middle elements.
Example
Input: ["MedianFinder", "add_num", "add_num", "add_num", "find_median", "add_num", "find_median"] [[], [5], [15], [1], [], [3], []] Output: [None, None, None, None, 5.0, None, 4.0]
Two heaps solution
Here we use two heaps (priority queues) to keep track of the lower and upper halves of the data.
A max heap for the lower half of the numbers and a min heap for the upper half.
The size difference between the heaps is at most 1, and All elements in small ≤ all elements in large, we can directly access the median using just the tops (roots) of these two heaps.
Algorithm
Initialize two heaps:
maxHeap for the lower half (all elements ≤ median).
minHeap for the upper half (all elements > median).
For each incoming number num:
If num is less than or equal to the root of maxHeap, add it to maxHeap. Otherwise, add it to minHeap.
If the size difference between the heaps exceeds 1, move the root of the larger heap to the smaller heap.
To find the median:
If both heaps have the same size, return the average of their roots.
If maxHeap has one more element, return its root.
If minHeap has one more element, return its root.
Let's try to understand this with an example data stream.
Insert 5:
maxHeap:[5]
minHeap:[]Median:
5(root of maxHeap)
Insert 15:
15 > 5, so add to minHeap.
maxHeap:[5]
minHeap:[15]Median:
(5 + 15) / 2 = 10
Insert 1:
1 ≤ 5, so add to maxHeap.
maxHeap:[5, 1]
minHeap:[15]Size difference =
2, so move5(root of maxHeap) tominHeap.
maxHeap:[1]
minHeap:[5, 15]Median:
5(root ofminHeap, since it has one more element)
Insert 3:
3 > 1, so add tominHeap.
maxHeap:[1]
minHeap:[3, 5, 15]Size difference =
2, so move3(root ofminHeap) tomaxHeap.
maxHeap:[3, 1]
minHeap:[5, 15]Median:
(3 + 5) / 2 = 4
Insert 2:
2 ≤ 3, so add tomaxHeap.
maxHeap:[3, 1, 2]
minHeap:[5, 15]Size difference =
1, no rebalancing needed.Median:
3(root ofmaxHeap)
Insert 8:
8 > 3, so add to minHeap.
maxHeap:[3, 1, 2]
minHeap:[5, 15, 8]Median:
(3 + 5) / 2 = 4
Insert 7:
7 > 3, so add tominHeap.
maxHeap:[3, 1, 2]
minHeap:[5, 7, 8, 15]Size difference =
1, no rebalancing needed.Median:
5(root ofminHeap)
import heapq
class MedianFinder:
def __init__(self):
self.small = []
self.large = []
heapq.heapify(self.small)
heapq.heapify(self.large)
def add_num(self, num: int) -> None:
if self.large and num > self.large[0]:
heapq.heappush(self.large, num)
else:
heapq.heappush(self.small, -num)
if len(self.small) > len(self.large) + 1:
el = -heapq.heappop(self.small)
heapq.heappush(self.large, el)
if len(self.large) > len(self.small):
el = heapq.heappop(self.large)
heapq.heappush(self.small, -el)
def find_median(self) -> float:
len1, len2 = len(self.small), len(self. large)
if len1 == len2:
return ((-self.small[0]) + self.large[0])/2
else:
return -self.small[0]Time Complexity: Insertion: O(log n) due to heap operations (push/pop).
Median Access: O(1) since we only need the roots.
Space Complexity: O(n) to store all elements in the two heaps.