# Sorting by Counting

### How to properly implement the counting sort algorithm, and when to use it to optimize performance.

Hi Friends,

Welcome to the 4th issue of the Polymathic Engineer newsletter.

This week we will cover the following topics:

- Comparison vs non-comparison based sorting methods
- The basics of counting sort
- How to count key frequencies
- How to use key frequencies to sort
- Asymptotic complexity & additional considerations
- Solution to the coding challenge of last week
- Technical threads of the week
- An interesting tweet I read this week

### Comparison vs Non-Comparison based sorting methods

Most sorting algorithms for arrays work by comparing the elements and placing them into their sorted positions in different steps. For this reason, these algorithms are known as comparison-based.

Examples are bubble sort, selection sort, insertion sort, merge sort, quick sort, and heap sort. The best time complexity this category of sorting algorithms can obtain is O(nlogn), where n is the number of elements to sort.

Another category of sorting algorithms adopts different approaches than comparing the elements. The advantage of these algorithms is that they can run in O(n) if certain conditions are met.

In this issue, we will look at a famous non-comparison sorting algorithm: counting sort.

### The basics of counting sort

Counting sort is a non-comparison-based sorting algorithm that can run in linear time. The assumption is that the elements are sorted according to integer keys within a finite range.

Its basic idea is that the frequency of the keys can be used to determine the position of the elements once they're sorted.

Let's assume that the range of keys is [0,k] and consider an element with key *i*. If we knew that the number of elements with key <= i is m, then:

- the sorted index of the last element with key *i* would be m-1
- the sorted index of the 2nd last element with key *i* would be m-2
- and so on

An example can help to clarify. If the keys in the input array are [1,1,2,4,3,0,1,2,3,1], there are seven keys <= 2. So the sorted position of the last 2 will be 6, and the sorted position of the second last 2 will be 5. This can be easily verified by checking the sorted output array: [0,1,1,1,1,2,2,3,3,4].

So, the first step of the counting sort is computing the frequencies of all the keys.

### How to count key frequencies

The frequency of each key can be computed using a bookkeeping array B. The size of B is k+1, and B is initialized with all 0's.

Since the range of the keys is [0,k], it's possible to iterate over the input array and use the keys as indexes for B.

During the iteration, the elements of B indexed by the keys get incremented. At the end of the iteration, each element B[i] stores the frequency of the key with index i.
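As a sketch, assuming integer keys in the range [0,k], this counting step could look like the following (the function name is illustrative):

```cpp
#include <vector>

// Count how many times each key in [0, k] appears in the input.
std::vector<int> countFrequencies(const std::vector<int>& keys, int k) {
    std::vector<int> counts(k + 1, 0);  // bookkeeping array B, initialized to 0
    for (int key : keys) {
        ++counts[key];                  // the key itself is the index into B
    }
    return counts;
}
```

After this pass, `counts[i]` holds the frequency of key `i`.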

### How to use key frequencies to sort

As previously explained, the sorted position of the last element having key <= *i* is the sum of the frequencies of all keys <= *i* (minus 1 because arrays are 0-indexed).

So the cumulative sum of the bookkeeping array gives at each index the sorted position of the last element with key equal to that index.

The bookkeeping array B can be modified in place to store the cumulative sum of its elements. Let's call the modified bookkeeping array B'.

To get a sorted output array, it's now possible to iterate *backward* over the input array and use each key as an index for B' to obtain its sorted position.

Note that B'[i] is decreased before copying an element having key *i* to its sorted position. This keeps the invariant that B'[i] represents the sorted position (minus 1) of the next remaining input element having key *i*.

Here is the implementation of the algorithm in C++.
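A minimal sketch of the three steps described above (counting, cumulative sum, backward placement) could look like this:

```cpp
#include <vector>

// Stable counting sort for integer keys in [0, k].
std::vector<int> countingSort(const std::vector<int>& input, int k) {
    // Step 1: count the frequency of each key.
    std::vector<int> counts(k + 1, 0);
    for (int key : input) ++counts[key];

    // Step 2: cumulative sum, so counts[i] becomes the number of
    // elements with key <= i (this is the array B' from the text).
    for (int i = 1; i <= k; ++i) counts[i] += counts[i - 1];

    // Step 3: iterate backward over the input and place each element.
    // counts[key] is decremented first, so it always holds the sorted
    // position (plus 1) of the next remaining element with that key.
    std::vector<int> output(input.size());
    for (int i = static_cast<int>(input.size()) - 1; i >= 0; --i) {
        output[--counts[input[i]]] = input[i];
    }
    return output;
}
```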

Note that the iteration through the input array could also be done in the forward direction. The advantage of the backward iteration is that the resulting sorting algorithm is stable.

### Asymptotic complexity

The **time complexity** is O(n+k), needed to iterate through both the input and the bookkeeping array.

The **space complexity** is O(k) to store the bookkeeping array.

Usually, the number of items to be sorted is similar to the number of values those items' keys can take.

In those cases, k is O(n) and the time complexity of the whole algorithm is O(n). But this is not always valid: for example, the input [1e10,1,2,4,3,0,1,2,3,1] has only 10 elements, yet it would require a bookkeeping array of size 1e10+1.

### Additional considerations

Another property of the counting sort is that it is **stable**. So elements with the same key appear in the output array in the same order as in the input array.

Finally, it was assumed that all the keys were non-negative integers included in a range between 0 and some maximum value k. Actually, this is not necessary.

Counting sort can be applied to any integer range of keys. The trick is to find the minimum key and subtract it from every key, so that the frequency of the minimum key is stored at index 0 of the bookkeeping array.
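A sketch of that offset trick, assuming keys can be any integers (including negative ones):

```cpp
#include <algorithm>
#include <vector>

// Counting sort for arbitrary integer keys: shift every key by the
// minimum so that the smallest key maps to index 0 of the bookkeeping array.
std::vector<int> countingSortAnyRange(const std::vector<int>& input) {
    if (input.empty()) return {};
    auto [minIt, maxIt] = std::minmax_element(input.begin(), input.end());
    int minKey = *minIt;
    int range = *maxIt - minKey;  // plays the role of k for the shifted keys

    std::vector<int> counts(range + 1, 0);
    for (int key : input) ++counts[key - minKey];
    for (int i = 1; i <= range; ++i) counts[i] += counts[i - 1];

    std::vector<int> output(input.size());
    for (int i = static_cast<int>(input.size()) - 1; i >= 0; --i) {
        output[--counts[input[i] - minKey]] = input[i];
    }
    return output;
}
```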

### Coding challenge

The coding challenge of last week was Find Players With Zero or One Losses.

This problem gives as input a list of pairs [winner, loser] indicating the results of some matches. It then asks to find a list of all players that have **not** lost any matches and all that have lost exactly **one** match.

The problem is an excellent example of the importance of familiarity with the hash table data structure. The idea is to keep a hash map to count the number of lost matches for the losers and a hash set to keep track of the winners.

The list of all players that have not lost any matches is given by those that appear as winners but never as losers.

The list of all players that have lost exactly one match is all the losers with frequency 1.

Here is the solution to the problem in C#.
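The original C# snippet isn't reproduced here; a sketch of the same idea in C++ could look like the following. Note that, as a small simplification, a single hash map tracks both winners (with a loss count of 0) and losers:

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// For each match [winner, loser], count the losses per player, then
// report players with 0 losses and players with exactly 1 loss.
std::vector<std::vector<int>> findWinners(const std::vector<std::vector<int>>& matches) {
    std::unordered_map<int, int> losses;  // player -> number of lost matches
    for (const auto& match : matches) {
        losses.try_emplace(match[0], 0);  // track the winner if unseen
        ++losses[match[1]];               // count the loss for the loser
    }

    std::vector<std::vector<int>> answer(2);
    for (const auto& [player, lost] : losses) {
        if (lost == 0) answer[0].push_back(player);
        else if (lost == 1) answer[1].push_back(player);
    }

    // The problem asks for both lists in increasing order.
    std::sort(answer[0].begin(), answer[0].end());
    std::sort(answer[1].begin(), answer[1].end());
    return answer;
}
```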

The last sorting step is needed because the problem asks to return the two output lists in **increasing** order. With this step, the time complexity is O(nlogn); without it, it would be O(n). The space complexity is O(n) to store the hash set and hash map.

For the next week, I challenge you to solve the problem h-index.

### Technical threads of the week

A quick introduction to the counting sort algorithm.

Consistent hashing in the context of distributed systems.

### An interesting tweet of this week

This is such good advice. Getting a promotion is not easy, but there is one thing that is important to understand.

Asking for a promotion is not a one-shot discussion but a series of continuing conversations with your manager.

It is crucial to clarify why you want a promotion and focus the discussions on what it would take to get to the next level.