Hashing in Coding Interviews
How to effectively use sets and hash tables to solve coding interview problems.
Hi Friends,
Welcome to the 120th issue of the Polymathic Engineer newsletter.
Hash tables and sets are fundamental data structures that software engineers must master for coding interviews. These structures are versatile and efficient, and they prove invaluable in a wide range of algorithmic problems.
In this issue, we will discuss the two main scenarios in which to use them. The outline will be as follows:
A recap on hash tables and sets
Checking for existence
Counting elements
Project-based learning is the best way to develop technical skills. CodeCrafters is an excellent platform for practicing exciting projects, such as building your version of Redis, Kafka, DNS server, SQLite, or Git from scratch.
Sign up, and become a better software engineer.
A recap on hash tables and sets
The core mechanism of both hash tables and sets is a hash function, which converts keys into integers used as array indices. The output integer is always less than a fixed value, and the function is deterministic: the same input is always converted to the same integer.
For instance, a basic hash function for strings might compute the sum of the ASCII values of its characters and apply a modulo operation with the array size. While this example is simplistic, it illustrates the underlying concept.
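A minimal sketch of such a function might look like this (the table size of 10 is an arbitrary assumption for the example):

def simple_hash(key: str, table_size: int = 10) -> int:
    # Sum the ASCII values of the characters, then wrap the result
    # into a valid array index with a modulo operation.
    return sum(ord(char) for char in key) % table_size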
In this way, the hash function allows mapping keys to values of the data type stored in an array. Hash tables are exactly this combination of hash functions and arrays. They combine the efficiency of array access with the flexibility of using various data types as indices. This makes them particularly useful for scenarios requiring quick data retrieval based on unique identifiers.
The following operations are all O(1) for a hash table:
Add a key and map it to a value
Delete a key if it exists
Check if a key exists
Iterating over the keys and values of a hash map takes O(n). However, the iteration doesn't necessarily follow any order: there are many implementations, and the behavior of the built-in types is language-dependent.
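For instance, here is how these operations look with Python's built-in dict (the keys and values are just placeholders):

ages = {}
ages["alice"] = 30         # add a key and map it to a value, O(1)
exists = "alice" in ages   # check if a key exists, O(1)
del ages["alice"]          # delete a key if it exists, O(1)

# Iterating over all entries takes O(n); the order is
# implementation-dependent (Python's dict preserves insertion order).
ages["bob"] = 25
for name, age in ages.items():
    print(name, age)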
Hash sets are another data structure that is similar to hash tables. The difference is that sets do not map their keys to anything. They are just collections that store unique elements, and only care about whether they are present or not. As a result, sets don't track the frequency of the elements. If you add the same item to a set ten times, it will only be added the first time.
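For instance:

seen = set()
for _ in range(10):
    seen.add("x")   # adding the same element repeatedly has no effect
print(len(seen))    # prints 1: the set stores "x" only once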
An important thing to note is that hash functions don't guarantee that different keys map to different array indices. When two keys map to the same position, there is a collision. Collisions must be handled; otherwise, keys will get overwritten and values will be lost.
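A common strategy is separate chaining, where each array slot holds a list of key-value pairs. Here is a minimal sketch of the idea, reusing the toy string hash from above (the class and method names are illustrative, not a standard API):

class ChainedHashTable:
    def __init__(self, size: int = 10):
        # Each slot (bucket) holds a list of (key, value) pairs.
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: str) -> int:
        return sum(ord(char) for char in key) % self.size

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # update the existing key
                return
        bucket.append((key, value))        # colliding keys share a bucket

    def get(self, key: str):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)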
The primary advantage of hash sets and tables in coding interviews is their ability to improve algorithm time complexity significantly. In many cases, they can optimize the solution by an O(n) factor.
However, it's important to know the trade-offs associated with these structures. When the input is small, they can be less efficient than arrays because they need to compute hash functions and deal with collisions. Besides that, they also use more memory.
Checking for existence
The most common application of hash tables and sets is determining if an element exists in O(1) time. This capability can significantly improve the time complexity of algorithms, often reducing it from O(n^2) to O(n).
A practical example clarifies this concept. In the Two Sum problem, we are given an array of integers and a target sum, and we need to return the indices of two numbers that add up to the target.
A naive nested-loop approach has O(n^2) time complexity. However, we can optimize it using a hash table. The strategy is to iterate through the array once, storing each number and its index in a hash table.
For each number, we check if its complement (target - number) is already in the hash table. If it is, we've found our pair. This approach reduces the time complexity to O(n) because hash table operations are O(1).
def twoSum(self, nums: List[int], target: int) -> List[int]:
    dic = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in dic:  # this lookup is O(1)!
            return [i, dic[complement]]
        dic[num] = i
    return [-1, -1]
Another example is finding the first character to appear twice in a string. Instead of using nested loops to compare each character with every other character, we can use a set.
We iterate through the string once, adding each character to the set. If we encounter a character already in the set, we've found our answer. This approach also improves the time complexity from O(n^2) to O(n).
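A possible implementation looks like this (the function name is mine, for illustration):

def first_repeated_char(s: str) -> str:
    seen = set()
    for char in s:
        if char in seen:   # O(1) membership check
            return char
        seen.add(char)
    return ""              # no character appears twice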
As a rule of thumb, if your algorithm does something like if __ in __, consider using a hash map or a set to store the elements so that these checks run in O(1). However, when implementing such solutions, you need to consider the space-time trade-off.
While improving time complexity, we introduce additional space complexity using a hash table or set. In most cases, this trade-off is acceptable, especially when dealing with large inputs.
Counting elements
The other common pattern where hash tables excel is counting and keeping track of the frequency of elements. Whenever you need to count occurrences of anything, a hash table mapping keys to integers should be your first consideration.
This pattern is particularly useful when dealing with constraints involving multiple elements. For instance, consider a problem where you need to find the length of the longest substring that contains at most k distinct characters.
This problem involves substrings and has a constraint on the substrings (at most k distinct characters), which suggests a sliding window approach. The challenge is to check the constraint efficiently.
The brute-force approach checks the entire window at every step, which takes O(n) time per check. With a hash map that tracks character counts, we can check the constraint in O(1).
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
    counts = {}
    left = 0
    result = 0
    for right in range(len(s)):
        # Expand the window and count the new character
        counts[s[right]] = counts.get(s[right], 0) + 1
        # Shrink the window while it has more than k distinct characters
        while len(counts) > k:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]
            left += 1
        result = max(result, right - left + 1)
    return result
Another classic example that demonstrates this pattern is Subarray Sum Equals K. In this problem, we need to find the number of subarrays whose sum is equal to a given value k. Here's an efficient solution using a hash table:
def subarraySum(self, nums: List[int], k: int) -> int:
    count = {0: 1}  # the empty prefix has sum 0
    curr_sum = 0
    result = 0
    for num in nums:
        curr_sum += num
        # Each earlier prefix with sum (curr_sum - k) starts a subarray summing to k
        if curr_sum - k in count:
            result += count[curr_sum - k]
        count[curr_sum] = count.get(curr_sum, 0) + 1
    return result
This solution uses a hash table to count the frequency of prefix sums. The key insight is that when the current sum minus k exists in the hash table, every earlier prefix with that sum marks the start of a subarray summing to k. For example, with nums = [1, 1, 1] and k = 2, the prefix sums are 1, 2, and 3; both 2 - 2 = 0 and 3 - 2 = 1 are found in the table along the way, giving the answer 2. This approach lets us solve the problem in linear time.
Interesting Reading
Some interesting articles I read in the past few days: