How distributed databases deal with replication lag. Plus, reversing a linked list in a few logical steps.
Welcome to the 11th issue of the Polymathic Engineer newsletter.
This past week, our newsletter reached a huge milestone: we are now a community of more than 1K people. Thanks to each one of you readers for making this possible.
I'm committed to improving this newsletter as much as possible and providing more value to you. So if you have any feedback, ideas to improve the content, or topics you would like me to write about, feel free to contact me.
You can reply to this mail anytime.
Today we will talk about:
what replication lag is and how distributed databases deal with it
how to reverse a linked list
two interesting tweets
Replication Lag and Data Consistency
Read replicas are an effective technique to scale a cluster of database servers, especially in the case of read-heavy workloads.
A practical configuration uses a primary database for write operations and multiple read replicas. But this setup has a drawback known as replication lag.
The replicas can be seconds or minutes behind the primary server and may send stale data back to clients. This creates consistency problems that lead to unpredictability at the application level.
There are several possible approaches to solving those consistency issues:
• Tight Consistency
• Causal Consistency
• Monotonic Read Consistency
Tight Consistency
All the replicas must be up to date with the primary server before any other operation is allowed. This approach is rarely used in practice since it cancels out the performance benefits of using replicas.
A less rigid variant is to execute only the read requests requiring strict consistency on the primary server, while all other read requests that allow more freedom are sent to the replicas.
Causal Consistency
Each replica is guaranteed to be up to date with at least a specific point in time. This way, read requests that depend on a write can be directed to a replica that has seen that write.
A way to implement this is with global transaction identifiers (GTIDs). Each write transaction on the primary database has a GTID associated with it, and the GTIDs are sent to the replicas along with the changes.
Read requests specifying a certain GTID are routed only to replicas that have applied the changes up to at least that GTID.
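The routing logic above can be sketched as follows. The `Replica` and `Router` classes and the GTID values are simplified illustrations, not a real database driver:

```python
# Sketch of causal-consistency read routing with GTIDs.
# Each replica tracks the highest GTID it has applied; a read that
# depends on a write is routed only to a replica that has seen it.

class Replica:
    def __init__(self, name, applied_gtid):
        self.name = name
        self.applied_gtid = applied_gtid  # highest GTID applied so far

class Router:
    def __init__(self, replicas):
        self.replicas = replicas

    def route_read(self, required_gtid):
        """Return a replica that has applied at least required_gtid,
        or None if every replica is still lagging behind it."""
        for replica in self.replicas:
            if replica.applied_gtid >= required_gtid:
                return replica
        return None

replicas = [Replica("replica-1", applied_gtid=42),
            Replica("replica-2", applied_gtid=40)]
router = Router(replicas)

# A client that just wrote transaction 41 asks for a replica
# that has already seen that write.
chosen = router.route_read(required_gtid=41)
print(chosen.name)  # replica-1 (applied GTID 42 >= 41)
```

When no replica has caught up yet, a real router would typically fall back to the primary or wait; here it simply returns `None`.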
Monotonic Read Consistency
This approach allows multiple sequential reads to follow a consistent timeline. Each subsequent read request goes to a replica that is at least as up-to-date as the one that served the last read.
A way to implement this is using UUIDs. The approach assumes that a proxy layer exists between the application and the replicas.
Each read request is sent to this proxy layer together with a UUID, and read requests with the same UUID are routed to the same read replica.
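A minimal sketch of that proxy routing, assuming the UUID identifies a client session and hashing it picks a replica deterministically (replica names are made up for illustration):

```python
import hashlib
import uuid

# Hypothetical replica pool behind the proxy layer.
REPLICAS = ["replica-1", "replica-2", "replica-3"]

def pick_replica(session_uuid: str) -> str:
    """Deterministically map a session UUID to one replica, so that
    all reads from the same session hit the same server."""
    digest = hashlib.sha256(session_uuid.encode()).hexdigest()
    return REPLICAS[int(digest, 16) % len(REPLICAS)]

session = str(uuid.uuid4())
# The same UUID always lands on the same replica, which gives
# monotonic reads for that session.
assert pick_replica(session) == pick_replica(session)
```

Because routing is a pure function of the UUID, the proxy needs no per-session state; the trade-off is that a session stays pinned to its replica even if that replica falls further behind.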
The Shopify engineering blog has a great post about the replication lag problem.
Reversing a linked list
Reversing a linked list sounds like a tricky problem, but it has a simple and elegant solution. The trick is to think one step at a time about what you need to reverse a single node.
Let's call Curr the node to reverse.
To reverse Curr, you must set its next pointer to the previous node, so a Prev pointer is necessary to track the previous node. But if you make Curr point to the previous node, there is a problem: you lose the reference to the node after Curr. So a temporary Next pointer is necessary to keep a reference to that node. Once you've done this, you can safely make Curr point to Prev.
The only remaining thing is preparing for the next step by:
- updating Prev to Curr
- updating Curr to Next
The time complexity of this solution is O(N) for a list of N nodes. The space complexity is O(1) since the solution only uses two additional pointers.
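The steps above can be sketched in a few lines of Python (the `ListNode` class is a minimal stand-in for whatever node type you're given):

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def reverse_list(head):
    """Iteratively reverse a linked list in O(N) time and O(1) space."""
    prev = None
    curr = head
    while curr:
        nxt = curr.next    # keep a reference to the node after curr
        curr.next = prev   # reverse the pointer
        prev = curr        # prepare for the next step: Prev -> Curr
        curr = nxt         # prepare for the next step: Curr -> Next
    return prev            # prev ends up as the new head

def to_list(head):
    """Helper to read the list values out for inspection."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

head = ListNode(1, ListNode(2, ListNode(3)))
print(to_list(reverse_list(head)))  # [3, 2, 1]
```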
Most linked list problems don't need any trick to find an optimal solution. The key is understanding how to move pointers around, and the more you practice, the better you get at it.
Thanks for reading The Polymathic Engineer Newsletter! Subscribe for free to receive new posts and support my work.
The coding challenge of the last week was Best Time to Buy and Sell Stock.
The problem asks to find the maximum profit you can achieve from a single transaction.
Prices are stored in an array of integers sorted by time, so the selling price must have an index j greater than the buying price's index i.
The brute force solution would be to try each selling price for each buying price and keep their maximum difference. The time complexity of this solution is O(N^2). But that's a lot of unnecessary work.
A greedy approach leads to the optimal solution in a single O(N) pass. While iterating through the array, keep track of the minimum buying price seen so far, and for each price, compute the profit of selling at it after buying at that minimum.
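The greedy pass can be sketched as follows (the example prices are illustrative, not from the original problem statement):

```python
def max_profit(prices):
    """Single pass: track the cheapest buy so far and try
    selling at each later price."""
    best = 0
    min_price = float("inf")
    for price in prices:
        min_price = min(min_price, price)      # cheapest buy seen so far
        best = max(best, price - min_price)    # profit if we sell today
    return best

print(max_profit([7, 1, 5, 3, 6, 4]))  # 5: buy at 1, sell at 6
```

If prices only fall, no profitable transaction exists and the function returns 0.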
The coding challenge for the next week is Valid Parentheses.
Louie is totally right here. Being yourself always pays off long-term, even in the career game. Instead of pretending to be someone else, a better strategy is to be your best self.
Everybody should normalize being rejected in interviews. It's not a judgment of who we are, just a way to assess whether we're the best fit for the role among the applicants. And the interview is as much your chance to find out if the company is a fit for you as it is theirs to figure out if you're a fit for them.