Priority Queue
What is a priority queue and how to implement it. Plus priority queues use cases and the heapsort algorithm.
Hi Friends,
Welcome to the 63rd issue of the Polymathic Engineer newsletter.
This week, we will focus on priority queues. Most developers use this data structure in their daily jobs, but only a few of them know how it works under the hood.
I still remember being asked this question during an interview many years ago, and being unable to answer was a bit embarrassing.
This will be the outline:
Introduction to priority queues
The heap data structure
How to implement a priority queue using a heap
Priority queues use cases
How to use a heap to sort an array
Posts that made me think
Introduction to priority queues
A priority queue is an abstract data type representing elements as key-value pairs, with the key indicating the priority.
Unlike ordinary queues or arrays that follow a strict first-in-first-out (FIFO) ordering, priority queues organize elements such that the one with the highest (or lowest) priority is always at the front.
Priority queues can be implemented differently, for example, using arrays or binary search trees. However, a heap data structure is the most common and efficient way to implement them.
The heap data structure
A heap is a specialized tree-based data structure that satisfies the heap property. In a max-heap, the property says that the value of a node (except the root) is, at most, the value of its parent. In a min-heap, the property says that the value of a node is at least the value of its parent.
A heap can be visualized as a nearly complete binary tree. This is a tree where every level, except possibly the last, is filled, and all nodes are as far left as possible.
The cool thing is that a heap can be efficiently represented as an array, where each element corresponds to a node in the tree. Assuming that the array is 0-indexed, the following rules apply for each node with index i:
parent(i) = floor(i/2) -1
left_child(i) = 2i +1
right_child(i) = 2i +2
Heaps support several fundamental operations, each critical to the utility of the heap data structure. Let's review these operations assuming that we are dealing with a max-heap.
The Max-Heapify operation is essential for maintaining the max-heap property. It adjusts a heap when a node's children satisfy the heap property, but the node does not. The easiest way to implement it is using recursion.
Each step calculates the largest value among a node and its children. If one of the children is larger than the node, the two are swapped, and the procedure is repeated on the children.
The function runs in O(log n) time, where n is the number of elements in the heap.
def max_heapify(self, i):
left = 2 * i + 1
right = 2 * i + 2
largest = i
if left < self.length and self.array[left] > self.array[largest]:
largest = left
if right < self.length and self.array[right] > self.array[largest]:
largest = right
if largest != i:
self.array[i], self.array[largest] = self.array[largest], self.array[i]
self.max_heapify(largest)
The Build-Max-Heap procedure builds a max-heap from an unordered array. The idea is to execute the Max-Heapify procedure over all the non-leaf nodes.
The function runs in linear time, efficiently creating a heap structure from raw data.
def build_max_heap(self, array):
self.array = array
self.length = len(array)
for i in range((self.length-1) // 2, -1, -1):
self.max_heapify(i)
Implementing priority queues using heaps
Once you have a heap data structure, it’s straightforward to implement the most common priority queue operations.
To extract the maximum element from a priority queue, save the root of the heap and swap it with the rightmost leaf. Then, you execute Max-Heapify on the new root to preserve the max-heap property and return the saved value.
The function runs in constant time.
def extract_max(self):
if self.length < 1:
return None
max_value = self.array[0]
self.array[0] = self.array[self.length - 1]
self.length -= 1
self.max_heapify(0)
return max_value
To insert an element into a priority queue, insert it as the rightmost leaf and bring it up on the tree until the heap property is valid again. The function runs in O(log n) time, where n is the number of elements in the heap.
def insert(self, value):
self.array.append(value)
self.length += 1
i = len(self.array) - 1
while i > 0 and self.array[i] > self.array[(i-1) // 2]:
self.array[i],self.array[(i-1)//2] = self.array[(i-1)//2],self.array[i]
i = (i-1) // 2
Priority queue use cases
Priority queues are extremely useful for many applications that need dynamic data prioritization:
Job Scheduling. In computing environments, priority queues can manage tasks or jobs by prioritizing them based on importance or urgency.
Event Simulation. Priority queues are critical in event-driven simulations that process events in a specific order determined by time or another criterion.
Algorithm Optimization: Many algorithms, including Dijkstra's and Prim's for graph processing, leverage priority queues to enhance efficiency and performance.
Heapsort
Another well-known use case of priority queues is sorting an array using the heapsort algorithm. The main idea is to build a max heap and repeatedly extract the maximum value to construct the sorted array. The sorted array can then be constructed in place by swapping the max element with the rightmost leaf.
def heap_sort(self, array):
if not array:
return []
self.build_max_heap(array)
for i in range(self.length - 1, 0, -1):
self.heap[0], self.heap[i] = self.heap[i], self.heap[0]
self.length -= 1
self.max_heapify(0)
return array
Posts that made me think
Studying the source code of the libraries you use the most is a great way to understand how they actually work and become a better developer. Unit tests are a good starting point when available.
I had an experience similar to Adrian. Dedication and a bit of passion are all you need at the beginning. However, I also started enjoying software engineering after becoming good.
Most of the best product features are there because engineers found clever ways to solve problems. Link
Love the reminder/review of fundamentals.
parent(i) = floor(i/2) -1
should be
parent(i) = floor((i - 1) / 2)