# Sorting by Counting

### How to properly implement the counting sort algorithm, and when to use it to optimize performance.

Hi Friends,

Welcome to the 4th issue of the Polymathic Engineer newsletter.

This week we will cover the following topics:

- Comparison vs non-comparison based sorting methods
- The basics of counting sort
- How to count key frequencies
- How to use key frequencies to sort
- Asymptotic complexity & additional considerations
- Solution to the coding challenge of last week
- Technical threads of the week
- An interesting tweet I read this week

### Comparison vs Non-Comparison based sorting methods

Most sorting algorithms for arrays work by comparing the elements and placing them into their sorted positions in different steps. For this reason, these algorithms are known as comparison-based.

Examples are bubble sort, selection sort, insertion sort, merge sort, quick sort, and heap sort. The best time complexity this category of sorting algorithms can obtain is O(nlogn), where n is the number of elements to sort.

Another category of sorting algorithms adopts different approaches than comparing the elements. The advantage of these algorithms is that they can run in O(n) if certain conditions are met.

In this issue, we will look at a famous non-comparison sorting algorithm: counting sort.

### The basics of counting sort

Counting sort is a non-comparison-based sorting algorithm that can run in linear time. The assumption is that the elements are sorted according to integer keys within a finite range.

Its basic idea is that the frequency of the keys can be used to determine the position of the elements once they're sorted.

Let's assume that the range of keys is [0,k] and consider an element with key *i*. If we knew that the number of elements with key <= i is m, then:

- the sorted index of the last element with key *i* would be m-1
- the sorted index of the 2nd last element with key *i* would be m-2
- and so on

An example can help to clarify. If the keys in the input array are [1,1,2,4,3,0,1,2,3,1], there are seven keys <= 2. So the sorted position of the last 2 will be 6, and the sorted position of the second last 2 will be 5. This can be easily verified by checking the sorted output array: [0,1,1,1,1,2,2,3,3,4].

So, the first step of the counting sort is computing the frequencies of all the keys.

### How to count key frequencies

The frequency of each key can be computed using a bookkeeping array B. The size of B is k+1, and B is initialized with all 0's.

Since the range of the keys is [0,k], it's possible to iterate over the input array and use the keys as indexes for B.

During the iteration, the elements of B indexed by the keys get incremented. At the end of the iteration, each element B[i] stores the frequency of the key with index i.
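As a sketch, assuming integer keys in the range [0,k], this counting step could look like the following (the function name is illustrative):

```cpp
#include <vector>

// Count how many times each key in [0, k] appears in the input.
std::vector<int> countFrequencies(const std::vector<int>& keys, int k) {
    std::vector<int> counts(k + 1, 0);  // bookkeeping array B, initialized to 0
    for (int key : keys) {
        ++counts[key];                  // the key itself is the index into B
    }
    return counts;
}
```

After this pass, `counts[i]` holds the frequency of key `i`.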

### How to use key frequencies to sort

As previously explained, the sorted position of the last element having key <= *i* is the sum of the frequencies of all keys <= *i* (minus 1 because arrays are 0-indexed).

So the cumulative sum of the bookkeeping array gives at each index the sorted position of the last element with key equal to that index.

The bookkeeping array B can be modified in place to store the cumulative sum of its elements. Let's call the modified bookkeeping array B'.

To get a sorted output array, it's now possible to iterate *backward* over the input array and use each key as an index for B' to obtain its sorted position.

Note that B'[i] is decreased before copying an element having key *i* to its sorted position. This keeps the invariant that B'[i] represents the sorted position (minus 1) of the next remaining input element having key *i*.

Here is the implementation of the algorithm in C++.
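A minimal sketch of the three steps described above (counting, cumulative sum, backward placement) could look like this:

```cpp
#include <vector>

// Stable counting sort for integer keys in [0, k].
std::vector<int> countingSort(const std::vector<int>& input, int k) {
    // Step 1: count the frequency of each key.
    std::vector<int> counts(k + 1, 0);
    for (int key : input) ++counts[key];

    // Step 2: cumulative sum, so counts[i] becomes the number of
    // elements with key <= i (this is the array B' from the text).
    for (int i = 1; i <= k; ++i) counts[i] += counts[i - 1];

    // Step 3: iterate backward over the input and place each element.
    // counts[key] is decremented first, so it always holds the sorted
    // position (plus 1) of the next remaining element with that key.
    std::vector<int> output(input.size());
    for (int i = static_cast<int>(input.size()) - 1; i >= 0; --i) {
        output[--counts[input[i]]] = input[i];
    }
    return output;
}
```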

Note that the iteration through the input array could also be done in the forward direction. The advantage of the backward iteration is that the resulting sorting algorithm is stable.

### Asymptotic complexity

The **time complexity** is O(n+k), needed to iterate through both the input and the bookkeeping array.

The **space complexity** is O(k) to store the bookkeeping array.

Usually, the number of items to be sorted is similar to the number of values those items' keys can take.

In those cases, k is O(n) and the time complexity of the whole algorithm is O(n). But this is not always valid: for example, the input [1e10,1,2,4,3,0,1,2,3,1] has only 10 elements, yet it would require a bookkeeping array of size 1e10+1.

### Additional considerations

Another property of the counting sort is that it is **stable**. So elements with the same key appear in the output array in the same order as in the input array.

Finally, it was assumed that all the keys were non-negative integers included in a range between 0 and some maximum value k. Actually, this is not necessary.

Counting sort can be applied to any integer range of keys. The trick is to find the minimum key and subtract it from every key, so that the frequency of the minimum key is stored at index 0 of the bookkeeping array.
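A sketch of that offset trick, assuming keys can be any integers (including negative ones):

```cpp
#include <algorithm>
#include <vector>

// Counting sort for arbitrary integer keys: shift every key by the
// minimum so that the smallest key maps to index 0 of the bookkeeping array.
std::vector<int> countingSortAnyRange(const std::vector<int>& input) {
    if (input.empty()) return {};
    auto [minIt, maxIt] = std::minmax_element(input.begin(), input.end());
    int minKey = *minIt;
    int range = *maxIt - minKey;  // plays the role of k for the shifted keys

    std::vector<int> counts(range + 1, 0);
    for (int key : input) ++counts[key - minKey];
    for (int i = 1; i <= range; ++i) counts[i] += counts[i - 1];

    std::vector<int> output(input.size());
    for (int i = static_cast<int>(input.size()) - 1; i >= 0; --i) {
        output[--counts[input[i] - minKey]] = input[i];
    }
    return output;
}
```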

### Coding challenge

The coding challenge of last week was Find Players With Zero or One Losses.

This problem gives as input a list of pairs [winner, loser] indicating the results of some matches. It then asks to find a list of all players that have **not** lost any matches and all that have lost exactly **one** match.

The problem is an excellent example of the importance of familiarity with the hash table data structure. The idea is to keep a hash map to count the number of lost matches for the losers and a hash set to keep track of the winners.

The list of all players that have not lost any matches is given by those that appear as winners but never as losers.

The list of all players that have lost exactly one match is all the losers with frequency 1.

Here is the solution to the problem in C#.
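The original C# snippet isn't reproduced here; a sketch of the same idea in C++ could look like the following. Note that, as a small simplification, a single hash map tracks both winners (with a loss count of 0) and losers:

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// For each match [winner, loser], count the losses per player, then
// report players with 0 losses and players with exactly 1 loss.
std::vector<std::vector<int>> findWinners(const std::vector<std::vector<int>>& matches) {
    std::unordered_map<int, int> losses;  // player -> number of lost matches
    for (const auto& match : matches) {
        losses.try_emplace(match[0], 0);  // track the winner if unseen
        ++losses[match[1]];               // count the loss for the loser
    }

    std::vector<std::vector<int>> answer(2);
    for (const auto& [player, lost] : losses) {
        if (lost == 0) answer[0].push_back(player);
        else if (lost == 1) answer[1].push_back(player);
    }

    // The problem asks for both lists in increasing order.
    std::sort(answer[0].begin(), answer[0].end());
    std::sort(answer[1].begin(), answer[1].end());
    return answer;
}
```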

The last sorting step is needed because the problem asks to return the two output lists in **increasing** order. With this step, the time complexity is O(nlogn); without it, it would be O(n). The space complexity is O(n) to store the hash set and hash map.

For the next week, I challenge you to solve the problem h-index.

### Technical threads of the week

A quick introduction to the counting sort algorithm.

Consistent hashing in the context of distributed systems.

### An interesting tweet of this week

This is such good advice. Getting a promotion is not easy, but there is one thing that is important to understand.

Asking for a promotion is not a one-shot discussion but a series of continuing conversations with your manager.

It is crucial to clarify why you want a promotion and focus the discussions on what it would take to get to the next level.