Traverse Trees
Hi Friends,
Welcome to the 19th issue of the Polymathic Engineer newsletter.
Today we will talk about:
All different ways to traverse binary trees
What is a database storage engine
Coding challenge
Two interesting tweets
Binary Trees traversal
A binary tree is a computer science data structure consisting of nodes. Each node is an abstract data type storing both data and pointers to other nodes. In a binary tree, a node points to at most two nodes, the left and right children.
There are two special kinds of nodes in a binary tree:
the one at the top, called the root
the ones without children, called leaves
All the nodes in the tree are accessible starting from the root. The depth of a node is how far it is from the root node.
The most important thing to remember is that binary trees are recursive structures. Each node is a subtree's root, and this subtree can be treated as a new tree. This property allows recursively traversing a binary tree using Depth-first search (DFS).
DFS starts at the root and traverses as far down the tree as possible in one direction. When it reaches a leaf node, the algorithm goes backward, considering the other direction. Recursion makes this operation of moving back up the tree simple and elegant.
There are 3 ways to perform DFS according to the order to visit a node and its children. Visit here means executing some logic on the node data (i.e., printing it).
Preorder traversal visit first a node and then its children. Inorder traversal visits first the left child, the node, and finally the right child. Postorder traversal visits first the children and then the node.
Switching between the above traversals requires only changing the position of the recursive calls.
An alternative way to traverse a binary tree is using Breadth-first search (BFS). BFS starts at the root and traverses all nodes at a depth level before moving on to the next one. The standard way to implement BFS is using a queue, thanks to its FIFO way of working.
Database storage engines
A Database Management System DBMS is not a monolithic block but has several components.
There's a component to authenticate clients and handle their requests/responses. Other components allow instead to parse and optimize queries, perform monitoring, and so on.
A storage engine is the core component of a Database Management System, which is responsible for:
executing create/read/update/delete operations on the disk
implement indexing using specialized data structures
The underlying data structure has a significant impact on the database performance.
There are two main data structures: B-trees and LMS-trees. B-trees give better read performance, while LSM trees speed up writing operations. As usual, there is no perfect solution, but it all depends on the tradeoffs to implement.
Storage engines can be simple key-value stores or complex entities supporting ACID transactions. Building a custom DBMS using an existing engine is possible, given its modularity nature. Conversely, some databases allow swapping their storage engine.
For example, MySQL and MongoDB have native functionalities for changing the storage engine. Another possible strategy is to fork open-source databases, replacing their engines manually. For example, at Instagram, they forked Cassandra to use the RocksDB engine.
Popular storage engines using B+ trees are MyISAM and InnoDB.
InnoDB is the default MySQL storage engine, replacing MyISAM in 2009. Both engines provide fast reads, but InnoDB offers also support for transactions and foreign keys, and more performant concurrent writes.
Popular storage engines using LSM trees are LevelDB and RocksDB.
LevelDB is a key-value store introduced by Google and offers key-value pairs storage in arbitrary byte arrays, forward and backward iteration, support for batching writes, and a compression library.
Facebook introduced RocksDB as a fork of LevelDB. The initial idea was to optimize performance by efficiently exploiting many CPU cores. But many other features have been added like transactions, bloom filters, merge operators, and geospatial support.
Coding challenge
The coding challenge for the next week is Permutations in string. Given two strings s1 and s2, the problem asks to find if s1 is a permutation of s2.
The brute force solution would be to find all permutations of s1 and compare them to all possible substrings s2 to see if they’re equal. The approach is not feasible for large strings because its time complexity would be factorial in the length of the string.
The key observation to find a better solution is that a string is a permutation of another string if both use the same characters with the same frequencies. With this in mind, the approach become straightforward:
count the frequencies of the letters in s1
run a sliding windows in s2 with the size of s1 and count the frequencies of the letters in the windows
check if the frequencies of the letters in s1 and in the window are equal
Here is a possible implementation:
The time complexity is O(NA), where N is the size of s2 and A is the size of the alphabet. The space complexity is O(A).
Interesting Tweets
I was a former C++ embedded software engineer. My professional path shows that switching software development area is not a problem if you are good at what you do. Caring always about quality is one the most important points.
Having productive people on the team who bring positivity and support is priceless. Even if I love my working environment and teammates, I still miss some of those people I worked with in the past.