The Polymathic Engineer

Sorting by Counting

How to properly implement the counting sort algorithm, and when to use it to optimize performance.

Franco Fernando
Dec 5, 2022

Hi Friends,

Welcome to the 4th issue of the Polymathic Engineer newsletter.

This week we will cover the following topics:

  • Comparison vs Non-Comparison based sorting methods

  • The basics of counting sort

  • How to count keys’ frequencies

  • How to use keys’ frequencies to sort

  • Asymptotic complexity & additional considerations

  • Solution to the coding challenge of last week

  • Technical threads of the week

  • An interesting tweet I read this week

Comparison vs Non-Comparison based sorting methods

Most sorting algorithms for arrays work by comparing elements and progressively moving them into their sorted positions. For this reason, these algorithms are known as comparison-based.

Examples are bubble sort, selection sort, insertion sort, merge sort, quick sort, and heap sort. The best time complexity this category of sorting algorithms can achieve is O(n log n), where n is the number of elements to sort.

Another category of sorting algorithms does not rely on comparing elements at all. The advantage of these algorithms is that they can run in O(n) time if certain conditions are met.

In this issue, we will look at a famous non-comparison sorting algorithm: counting sort.

The basics of counting sort

Counting sort is a non-comparison-based sorting algorithm that can run in linear time. The assumption is that the elements are sorted according to integer keys within a finite range.

Its basic idea is that the frequency of the keys can be used to determine the position of the elements once they're sorted.

Let's assume that the range of keys is [0,k] and consider an element with key i. If we knew that the number of elements with key <= i is m, then:

  • the sorted index of the last element with key i would be m-1

  • the sorted index of the 2nd last element with key i would be m-2

  • and so on

An example can help to clarify. If the keys in the input array are [1,1,2,4,3,0,1,2,3,1], there are seven keys <= 2. So the sorted position of the last 2 will be 6, and the sorted position of the second last 2 will be 5. This can be easily verified by checking the sorted output array: [0,1,1,1,1,2,2,3,3,4].

So, the first step of the counting sort is computing the frequencies of all the keys.

How to count keys’ frequencies

The frequency of each key can be computed using a bookkeeping array B. The size of B is k+1, and B is initialized with all 0's.

Since the range of the keys is [0,k], it's possible to iterate over the input array and use the keys as indexes for B.

During the iteration, the elements of B indexed by the keys get incremented. At the end of the iteration, each element B[i] stores the frequency of the key with index i.
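
A minimal sketch of this step, assuming the keys are stored in a std::vector<int> and the maximum key k is known:

```cpp
#include <vector>

// Build the bookkeeping array B for keys in [0, k]:
// after the loop, B[i] holds the frequency of key i.
std::vector<int> countFrequencies(const std::vector<int>& keys, int k) {
    std::vector<int> B(k + 1, 0);
    for (int key : keys) ++B[key];
    return B;
}
```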

How to use keys’ frequencies to sort

As previously explained, the sorted position of the last element with key i is the sum of the frequencies of all keys <= i, minus 1 because arrays are 0-indexed.

So the cumulative sums of the bookkeeping array give, at each index, one plus the sorted position of the last element whose key equals that index.

The bookkeeping array B can be modified in place to store the cumulative sum of its elements. Let's call the modified bookkeeping array B'.

To get a sorted output array, it's now possible to iterate backward over the input array and use each key as an index for B' to obtain its sorted position.

Note that B'[i] is decreased before copying an element having key i to its sorted position. This keeps the invariant that B'[i] represents the sorted position (minus 1) of the next remaining input element having key i.

Here is an implementation of the algorithm in C++.
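
The sketch below assumes the elements are plain integers that act as their own keys and that the maximum key k is known in advance:

```cpp
#include <vector>

// Counting sort for integer keys in [0, k], following the steps described above.
std::vector<int> countingSort(const std::vector<int>& input, int k) {
    std::vector<int> B(k + 1, 0);              // bookkeeping array
    for (int key : input) ++B[key];            // 1. count key frequencies
    for (int i = 1; i <= k; ++i)
        B[i] += B[i - 1];                      // 2. cumulative sums: B[i] = number of keys <= i
    std::vector<int> output(input.size());
    for (int i = static_cast<int>(input.size()) - 1; i >= 0; --i) {
        --B[input[i]];                         // 3. decrement first: B[key] is now the sorted index
        output[B[input[i]]] = input[i];        //    backward iteration keeps the sort stable
    }
    return output;
}
```

With the earlier example, countingSort({1,1,2,4,3,0,1,2,3,1}, 4) returns {0,1,1,1,1,2,2,3,3,4}.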

Note that the iteration through the input array could also be done in the forward direction. The advantage of the backward iteration is that the resulting sorting algorithm is stable.

Asymptotic complexity

The time complexity is O(n+k) to iterate through both the input and bookkeeping array.

The space complexity is O(k) to store the bookkeeping array.

Usually, the number of items to sort is of the same order as the number of values their keys can take.

In those cases, k is O(n), and the time complexity of the whole algorithm is O(n). This is not always true, though: for an input like [1e10,1,2,4,3,0,1,2,3,1], k is far larger than n, and counting sort becomes impractical.

Additional considerations

As mentioned above, counting sort implemented this way is stable: elements with the same key appear in the output array in the same order as in the input array.

Finally, it was assumed that all the keys were non-negative integers in a range between 0 and some maximum value k. Actually, this is not necessary.

Counting sort can be applied to any integer range of keys. The trick is to find the minimum key and shift every key by that minimum, so that the frequency of the minimum key is stored at index 0.
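
A minimal sketch of this variant, assuming the elements are plain integers sorted by their own value:

```cpp
#include <algorithm>
#include <vector>

// Counting sort for keys in an arbitrary integer range: every key is shifted
// by the minimum value so that the minimum maps to index 0 of the count array.
void countingSortAnyRange(std::vector<int>& a) {
    if (a.empty()) return;
    const auto [minIt, maxIt] = std::minmax_element(a.begin(), a.end());
    const int lo = *minIt;
    std::vector<int> count(*maxIt - lo + 1, 0);
    for (int x : a) ++count[x - lo];                     // frequencies of the shifted keys
    int idx = 0;
    for (int key = 0; key < static_cast<int>(count.size()); ++key)
        while (count[key]-- > 0) a[idx++] = key + lo;    // write keys back in increasing order
}
```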

Coding challenge

The coding challenge of last week was Find Players With Zero or One Losses.

This problem gives as input a list of pairs [winner, loser] indicating the results of some matches. It then asks to find a list of all players that have not lost any matches and all that have lost exactly one match.

The problem is an excellent example of the importance of familiarity with the hash table data structure. The idea is to keep a hash map to count the number of lost matches for the losers and a hash set to keep track of the winners.

The list of players that have not lost any matches contains those who appear as winners but never as losers.

The list of players that have lost exactly one match contains the losers with frequency 1.

Here is a solution to the problem.
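
A minimal sketch of that approach in C++, consistent with the earlier snippets (the function name findWinners and the container choices are illustrative):

```cpp
#include <algorithm>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// For each match [winner, loser]: track winners in a hash set and count losses in a hash map,
// then split the players into "never lost" and "lost exactly once".
std::vector<std::vector<int>> findWinners(const std::vector<std::vector<int>>& matches) {
    std::unordered_set<int> winners;
    std::unordered_map<int, int> losses;   // player -> number of lost matches
    for (const auto& m : matches) {
        winners.insert(m[0]);
        ++losses[m[1]];
    }
    std::vector<int> zeroLosses, oneLoss;
    for (int w : winners)
        if (losses.find(w) == losses.end()) zeroLosses.push_back(w);  // winners that never lost
    for (const auto& [player, lost] : losses)
        if (lost == 1) oneLoss.push_back(player);                     // exactly one loss
    std::sort(zeroLosses.begin(), zeroLosses.end());                  // final sorting step
    std::sort(oneLoss.begin(), oneLoss.end());
    return {zeroLosses, oneLoss};
}
```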

The last sorting step is needed because the problem asks to return the two output lists in increasing order. With this step, the time complexity is O(n log n); without it, it would be O(n). The space complexity is O(n) to store the hash set and hash map.

For next week, I challenge you to solve the H-Index problem.

Technical threads of the week

A quick introduction to the counting sort algorithm.

Fernando 🇮🇹🇨🇭 @Franc0Fernand0: “Counting sort is an algorithm capable to stable sort an array in linear time. How it works and when to apply it: {1/9} ↓”

Consistent hashing in the context of distributed systems.

Fernando 🇮🇹🇨🇭 @Franc0Fernand0: “Consistent hashing is a common technique used in distributed systems. When it's functional and how it works: {1/9}”

An interesting tweet of this week

Taylor Poindexter @engineering_bae: “Want to be promoted? 1. Tell your manager 2. Ask them for a bullet point list of things they’d need to see from you to feel comfortable promoting you 3. Continually check in w/ them to see how you’re progressing against that checklist This will increase your chances of success.”

This is such good advice. Getting a promotion is not easy, but there is one thing that is important to understand.

Asking for a promotion is not a one-shot discussion but a series of continuing conversations with your manager.

It is crucial to clarify why you want a promotion and focus the discussions on what it would take to get to the next level.

Thanks for reading The Polymathic Engineer! Subscribe for free to receive new posts and support my work.
