Space Complexity
How to analyze the space complexity of an algorithm: time and space trade-offs and recursion.
Welcome to the 106th issue of the Polymathic Engineer newsletter.
When discussing algorithm efficiency, we often focus on their time complexity, which tells us how fast they run. However, another crucial aspect to take into account is how much memory an algorithm uses.
This is known as space complexity, and it's critical when dealing with limited memory environments or processing large amounts of data. This week, we will talk about it.
This is the outline:
How to measure space complexity
Time and space trade-offs
What about recursion?
Project-based learning is the best way to develop technical skills. CodeCrafters is an excellent platform for practicing exciting projects, such as building your own version of Redis, Kafka, an HTTP server, SQLite, or Git from scratch.
Sign up, and become a better software engineer.
How to measure space complexity
Space complexity refers to the amount of memory an algorithm consumes relative to its input size.
The key question is: "If there are N data elements, how many units of memory will the algorithm consume?"
Consider a simple example, like a function that calculates the sum of digits for a given number. Here's one way to implement it:
def sum_of_digits(number):
    # Convert the number to a string, then split it into digit characters.
    digits = list(str(number))
    total = 0
    for digit in digits:
        total += int(digit)
    return total
This function converts the input number to a string and then splits it into a list of individual digit characters. Even though the function returns a single number, it allocates additional memory proportional to the number of digits in the input.
Just as we use Big O notation to describe time complexity, we can apply the same idea to space complexity. The function above has O(N) space complexity, where N is the number of digits, because the digits list grows linearly with the size of the input.
As an additional example, let’s see an alternative function that calculates the sum of digits:
def sum_of_digits(number):
    total = 0
    while number > 0:
        digit = number % 10  # extract the last digit
        total += digit
        number //= 10        # drop the last digit
    return total
Here, the space complexity is O(1) because it uses the same amount of extra space regardless of the input size.
It's worth keeping in mind that when analyzing space complexity, we usually count only the extra memory the algorithm allocates, without including the input or the output. Such extra space is also known as auxiliary space.
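To make the distinction concrete, here is a minimal sketch (the function name is just for illustration). It returns a list of N elements, but since the output isn't counted, its auxiliary space is O(1):
def double_all(array):
    # The result list is the function's output, so it doesn't count as
    # auxiliary space; only the loop variables do. That makes the auxiliary
    # space O(1), even though the output takes O(N) memory.
    result = []
    for item in array:
        result.append(item * 2)
    return result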
Time and space trade-offs
In algorithm design, we often encounter situations where we need to balance between time efficiency and memory usage. We can make an algorithm faster by using more memory or reducing memory usage at the cost of increased running time.
Consider a classic example, like a function checking for duplicate values in a list.
Here are three different approaches:
Nested loops (Time: O(N²), Space: O(1))
def has_duplicate_value(array):
    for i in range(len(array)):
        for j in range(len(array)):
            if i != j and array[i] == array[j]:
                return True
    return False
This approach uses no extra space but is slow for large arrays.
Using a hash table (Time: O(N), Space: O(N))
def has_duplicate_value(array):
    seen = set()
    for item in array:
        if item in seen:
            return True
        seen.add(item)
    return False
This version is much faster but uses additional memory proportional to the input size.
Sorting the array (Time: O(N log N), Space: O(log N) or O(1) with an in-place sorting algorithm; note that the version below uses Python's sorted(), which copies the array and therefore needs O(N) extra space)
def has_duplicate_value(array):
    sorted_array = sorted(array)  # sorted() builds a new list: O(N) extra space
    for i in range(1, len(sorted_array)):
        if sorted_array[i] == sorted_array[i - 1]:
            return True
    return False
Each solution has a different trade-off between time and space efficiency. In cases like this, there is no absolute best solution. The best choice depends on the requirements of your application, such as available memory, acceptable response times, and the size of the data you're working with.
If memory is severely limited, you might opt for the nested loops approach despite its slower speed. The hash table method could be ideal if you need the fastest possible performance and have memory to spare. The sorting approach is a good compromise in scenarios where both time and space are critical.
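If you want to get a feel for these trade-offs on real data, a quick, informal timing sketch like the one below can help. It assumes the three versions above have been renamed has_duplicate_nested, has_duplicate_set, and has_duplicate_sorted so they can coexist in one module:
import time

# Assumes the three versions above were renamed as described.
# A list with no duplicates forces each version to do its full work.
data = list(range(5_000))

for fn in (has_duplicate_nested, has_duplicate_set, has_duplicate_sorted):
    start = time.perf_counter()
    fn(data)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {elapsed:.4f}s")
On a typical machine, the nested-loop version will be orders of magnitude slower than the other two as the list grows.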
What about recursion?
Recursion is a powerful programming technique, but it comes with a cost in terms of space complexity. Let's understand why and how recursion affects memory usage.
Consider a simple recursive function that calculates the sum of numbers from 1 to n:
def sum_to_n(n):
    if n == 1:
        return 1
    return n + sum_to_n(n - 1)
Even if this function doesn't seem to use any extra space, each recursive call adds a new frame to the call stack, consuming memory.
For example, if we call sum_to_n(5), the call stack will look like this at its peak:
sum_to_n(1)
sum_to_n(2)
sum_to_n(3)
sum_to_n(4)
sum_to_n(5)
Since the call stack grows linearly with the input size, the space complexity of our recursive function is O(N), where N is the input number.
The extra space required by each stack frame can lead to unexpected issues. For instance, on many systems, calling sum_to_n(50000) results in a maximum recursion depth exceeded error.
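In CPython, for instance, the interpreter enforces a hard cap on recursion depth (1000 frames by default), which you can inspect with the standard sys module:
import sys

# CPython refuses to grow the call stack past this limit, which is why
# sum_to_n(50000) fails long before it reaches 50000 frames.
print(sys.getrecursionlimit())  # typically prints 1000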
Such a problem does not exist with an iterative version, which has O(1) space complexity regardless of the input size:
def sum_to_n_iterative(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total
This doesn't mean recursion should always be avoided. Many algorithms, like Depth First Search, QuickSort, or MergeSort, use recursion effectively. However, it's crucial to be aware of the space implications, especially when dealing with large inputs.
Some programming languages and compilers optimize tail recursion, eliminating the extra stack frames. However, this isn't universally supported; CPython, for example, does not optimize tail calls.
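To see what a tail-recursive version would look like, here is a sketch that carries the running total in an accumulator; the function name and default argument are just for illustration. In a language with tail-call optimization it could run in O(1) stack space, but CPython still allocates one frame per call:
def sum_to_n_tail(n, acc=0):
    # Tail call: the recursive invocation is the last thing the function
    # does, so an optimizing runtime could reuse the current stack frame.
    # CPython does no such optimization, so this is still O(N) space.
    if n == 0:
        return acc
    return sum_to_n_tail(n - 1, acc + n)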