Hi Friends,
Welcome to the 35rd issue of the Polymathic Engineer newsletter. This week I want to focus on one of the most important data structures for a software engineer: binary search trees (BSTs).
The outline will be as follow:
what are binary search trees and what’s their strength
relevant balanced BSTs and their practical use cases
interesting tweets
What are binary Search Trees
Binary Search Trees (BSTs) are among the most performant data structures. There are data structures that allow fast search or flexible update, but BSTs are the only structure supporting both fast search and flexible updates.
For example, a doubly-linked lists supports insertion and deletion in O(1) time but search takes linear time in the worse case. On the other hand, sorted arrays support binary search and queries in logarithmic time, but at the price of linear-time update.
A BST is a binary tree having nodes labeled with a key and organized according to a property. The property is that for each node with key k:
all the nodes in the left subtree have keys < k
all the nodes in the right subtree have keys > k
The BST labelling and property make it possible to identify each key's location uniquely and support all the most crucial operations efficiently.
Searching, adding, or removing a node are all O(h) operations, where h is the tree's height. Let’s check how they can be implemented.
To search for a key k, you need to start at the root. Unless the root contains k, you go left if the root key is < k, and right otherwise. You can then proceed recursively on the subtrees until either you find k or there are no more nodes.
Let’s now consider the insertion of a key k. There is exactly one place where to insert k: either the node found after a successful search or the terminal node reached after an unsuccessful search.
So inserting a key is nothing more than a variant of searching for a key.
Deleting a key k is also an operation based on a search, but it is trickier. There are 3 possibilities:
k has no children and can be deleted
k has 1 child that need to be linked to the parent of k
k has 2 children and need to be first swapped with its successor and then deleted. The deletion is trivial because after the swapping k has either no children or one child.
The BST property makes it easy also other operations. For example, the minimum key is the left-most descendant of the root, while the maximum key is the right-most descendant of the root.
Moreover all the keys can be easily traversed in sorted order with an in-order traversal.
A BST performs best when all the paths from the root to the leaves have the same length. Indeed, the complexity of the main operations in a balanced BST with N nodes becomes O(N).
Some types of BST use procedures to keep the tree balanced after inserting or deleting a key. Such procedures are usually called rotations. To learn more about rotations I strongly suggest reading the book Algorithms by Robert Sedgewick.
Most used balanced BSTs
AVL tree. Each node has associated a value representing the difference in height between its left and right subtrees. After an operation, if these difference become more than 1, rotations are performed to balance the tree. AVL trees are mainly used in every situation where frequent insertions in the tree are required. For example they are used in the memory management subsystem of the Linux kernel to search memory regions of processes during preemption.
Red-black tree. Each node is either red or black. Root and leaves are black.
If a node is red, both its children are black. Every path from a given node to any leaf must have the same number of black nodes. Red-black tree are well known since they are used in various implementations of associative data structures (i.e. C++ map, set, Java HashMap). In addition they are also used to implement the CPU process scheduler in Linux and in computational geometry for reducing time complexity of the algorithms.
Splay tree. Recently accessed nodes are quick to access again. After an insertion or search, an action called splaying rearrange the tree (using rotations) so that the involved element is placed as root. Splay trees are commonly used to implement caches, in garbage collectors, and in data compression algorithms.
B Tree. It contains multiple nodes which keep data in sorted order. Each node has n keys (stored in ascending order) and n+1 children. The keys separate the ranges of keys stored in each sub-tree. B trees are mainly used in database indexing to speed up the search and in file systems to implement directories.
Interesting Tweets
Math knowledge is not only helpful to be a better developer, but also a better human. You can appreciate it much more about the world knowing math. If you don’t love yet math, I suggest reading a funny novel: The Parrot's Theorem by Denis Guedj.
The more you practice, the better you get at coding. But there is another fun fact: the more you read existing code the faster you learn. Reading code is as important as writing code. Some GitHub repositories are a goldmine for learning. Link
Usually only a small fraction of those applicants for a job take the time to tailor their CV to the position. If you do this you can highly increase your chances to get an interview.
Summer Break
I’m on a summer break in Italy with my family, so the next week the Polymathic engineer newsletter will have a break. I’ll be anyway back soon with more exciting and interesting content. Stay awesome!