Solving Problems by Sorting
How to solve coding problems more efficiently by sorting the input data.
Hi Friends,
Welcome to the 89th issue of the Polymathic Engineer newsletter.
This week, we will see how sorting the input data can be useful to solve many coding problems more efficiently.
The outline is as follows:
Sorting as a preprocessing step
Sweep line algorithms
Scheduling Events
Being hands-on is the best way to build technical skills. CodeCrafters is an excellent platform that lets you build exciting stuff, such as your version of Redis, HTTP server, and even Git from scratch.
Understanding how such things work under the hood sets you apart. Sign up, and become a better software engineer.
Sorting as a preprocessing step
When solving coding interview problems, it may be easy to find straightforward brute-force solutions with a time complexity of O(n²).
However, most interviewers won't be happy with such solutions because they only work for small datasets and quickly stop working as the input size grows.
This is where sorting comes to the rescue, helping to reduce the time complexity of your algorithms from O(n²) to O(n log n) or even O(n) in some cases.
The reason is that having sorted data allows more efficient comparisons and searches without nested loops. Moreover, many problems become simpler when you assume that related elements are next to each other in the input.
Let me give you a quick example to show what I mean. Imagine you need to check if an array contains duplicated elements. The brute force approach compares every element with every other element, resulting in O(n²) time complexity.
def hasDuplicate(arr):
n = len(arr)
for i in range(n):
for j in range(i + 1, n):
if arr[i] == arr[j]:
return True
return False
If you sort the array first, you can compare adjacent elements in a single pass through the sorted array.
def hasDuplicate(arr):
sorted_arr = sorted(arr)
for i in range(1, len(sorted_arr)):
if sorted_arr[i] == sorted_arr[i-1]:
return True
return False
The sorted version reduced the time complexity to O(n log n), dominated by the sorting step. The subsequent linear scan through the sorted array is just O(n).
This simple example shows how sorting can transform problems, allowing you to come up with more efficient solutions. In the following sections, we'll check some classes of algorithms that leverage sorting to achieve better time complexities.
Sweep line algorithms
Sweep line algorithms are a clever way to solve problems by treating them as a series of events you can process in order.
For example, consider you are given a shop and the arriving and leaving times of all the customers in a day. How could you calculate the maximum number of customers who were in the shop at the same time?
The brute force approach checks every possible time interval, counting customers for each. However, that's slow and inefficient since you need a nested loop to compare the intervals related to each pair of customers.
A better approach is to use a sweep line algorithm. First, you sort the events and map them on a line representing the time. Then, you process the events in order according to their times.
While going through the events, you keep a counter. The counter increases for each arrival and decreases for each leaving. The largest value of the counter answers the problem.
def max_customers(arrivals, departures):
events = [(time, 1) for time in arrivals] + [(time, -1) for time in departures]
events.sort() # Sort all events by time
max_count = 0
current_count = 0
for _, change in events:
current_count += change
max_count = max(max_count, current_count)
return max_count
The sweep line part takes only O(n) time, but the overall algorithm takes O(nlogn) because of the sorting step.
If you want to practice, a similar problem on LeetCode is Meeting Rooms II. Given an array of meeting time intervals, the problems asks you to return the minimum number of conference rooms required.
Here, instead of customers, you’re dealing with meetings, and instead of finding the maximum number of simultaneous customers, you need to find the maximum number of simultaneous meetings (which is equals the number of rooms needed).
Scheduling Events
A lot of scheduling events problems can be solved by sorting the input data and then using a greedy approach to build a solution. Greedy algorithms always make a choice that looks the best at the moment and never do a step back.
As an example, suppose you have a list of events with their start and end times and you have to find the maximum number of events you can attend.
The brute force approach is to check all the possible combinations of events. But once again this solution as a quadratic time complexity in the number of events.
The optimal approach consists in sorting the events and selecting the best possible event to attend in a greedy way. But how should you sort the events?
A first possibility is to is to sort the events according to their lengths and then choose as much short events as possible. However, this strategy does not always work, since a short event can overlap with two non-overlapping longer events.
Another idea is to sort the events according to their starting times and always select the next possible event that begins as early as possible. However, the right way to do it is sorting the events by their end time.
The reason is that if you always pick the event that ends earliest, you maximize your chances of being able to attend more events afterward. The code below first sort the events by their end times. Then it goes through the sorted list, greedily selecting events that don't overlap with the previously selected event.
def max_events(events):
# Sort events by end time
events.sort(key=lambda x: x[1])
count = 0
last_end_time = 0
for start, end in events:
if start >= last_end_time:
count += 1
last_end_time = end
return count
The time complexity is O(n log n), because of the sorting step. The subsequent pass through the events is just O(n), where n is the number of events.
A similar LeetCode problem to practice is Non-overlapping Intervals.
Food for thoughts
Never stop upskilling and focus on every day becoming a better developer. Embrace a growth mindset and constantly look for opportunities to learn and networking. The demand for skilled developers will always be there.
Interesting Reading
Here are some interesting articles I read this week:
Thanks for the mention, Fernando!
I like how you approached the topic. At least for me, I learn better by seeing real world applications of an algorithm than seeing a theoretical leetcode problem wihtout any real world application.
My favorite sorting algorithm, bucket sort came to my mind. 😃
Bucket sort almost sounds like bending reality because you create buckets first and then try to fit your input into what you know is already sorted.
It's kind of removing the problem at step 1) - and creating a different problem.