Scaling the data storage layer in system design
A quick journey through replication, partitioning, and NoSQL databases.
Hi Friends,
Welcome to the 148th issue of the Polymathic Engineer newsletter.
Scaling a stateless application is relatively straightforward: add more servers, place them behind a load balancer, and the system handles more traffic. Scaling the application layer horizontally is not a big issue. However, there is a catch.
You can have as many application servers as you want, but they will all use the same database. And that database runs on a single machine. As your application handles more load, the number of requests it sends to the database will also rise. Eventually, that single database server reaches its capacity, and performance starts to degrade.
Your monitoring dashboards show that the application servers are working correctly, but the database CPU is at full load, which means that queries are being held up. The database becomes the bottleneck of the system.
Unlike application servers that can be easily replicated, databases are stateful. They hold your data, and that data needs to be consistent, available, and durable. It is a lot harder to scale a database while keeping these properties than it is to scale a stateless component.
In this article, we will talk about the main ways to scale the storage layer of your app. Each approach has its pros and cons that need to be understood in order to make the best architectural choices.
Here is the outline:
Replication
Partitioning
NoSQL
Building APIs is not that easy if you need them to evolve safely and work across multiple programming languages.
Protobuf is the most popular schema definition language, but it has some gaps for preventing breaking changes and maintaining a safe API development workflow.
In this 1-hour interactive online workshop organized by Buf, you can learn:
How Protobuf makes building APIs safer and easier
Best practices for designing APIs for real-world uses
How to use Protobuf in data pipelines, event streaming platforms, and more
Register for the workshop for free, and get started with Protobuf APIs
Replication
The most common way to increase the read capacity of a database is to create replicas using a leader-follower topology.
It works like this: clients only send write operations (like updates, inserts, and deletes) to the leader. The leader writes these changes down in a write-ahead log. The replicas, or followers, then connect to the leader, read its logs, and save them locally. Each log entry has a sequence number. This allows the replicas to disconnect and reconnect at any time, resuming from where they left off by sending the last sequence number they processed to the leader.
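The protocol above can be sketched in a few lines of Python. This is a toy model, not any real database's API: the `Leader` and `Follower` classes and their methods are illustrative names, and the "log" is just an in-memory list of sequence-numbered entries.

```python
# Toy sketch of leader-follower replication with a sequence-numbered log.

class Leader:
    def __init__(self):
        self.log = []  # list of (seq, operation) entries

    def write(self, operation):
        seq = len(self.log) + 1           # monotonically increasing sequence number
        self.log.append((seq, operation))
        return seq

    def entries_after(self, last_seq):
        # Followers resume from the last sequence number they processed.
        return self.log[last_seq:]

class Follower:
    def __init__(self):
        self.last_seq = 0
        self.applied = []

    def sync(self, leader):
        for seq, op in leader.entries_after(self.last_seq):
            self.applied.append(op)       # apply the operation locally
            self.last_seq = seq

leader = Leader()
leader.write("INSERT a")
leader.write("UPDATE b")

follower = Follower()
follower.sync(leader)   # catches up through seq 2

leader.write("DELETE c")
follower.sync(leader)   # reconnects and resumes: fetches only seq 3
print(follower.applied)  # ['INSERT a', 'UPDATE b', 'DELETE c']
```

The sequence number is what makes disconnection harmless: the follower only ever asks for entries it hasn't seen yet.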
Adding read-only replicas and putting them behind a load balancer is an effective way to increase the read throughput of a database. If one replica becomes unhealthy or is no longer reachable, the load balancer can take it out of the pool immediately.
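The load balancer's job here is simple enough to sketch directly; the replica names and health flags below are made up for illustration.

```python
import random

# Each entry stands in for a read replica behind the load balancer.
replicas = [
    {"addr": "replica-1", "healthy": True},
    {"addr": "replica-2", "healthy": True},
    {"addr": "replica-3", "healthy": False},  # failed its health check
]

def route_read():
    # Reads are only routed to replicas still in the healthy pool;
    # an unhealthy replica is effectively removed immediately.
    pool = [r["addr"] for r in replicas if r["healthy"]]
    return random.choice(pool)
```

A real load balancer would also run the periodic health checks that flip the `healthy` flag, but the routing decision itself is just this filter plus a pick.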
Replication also makes the database more available. When the leader fails, a replica can be reconfigured to take its place. The failover mechanism must be able to detect when the leader has failed, designate a follower as the new leader, reconfigure the other replicas to follow it, and make sure that client requests are directed to the new leader.
Another benefit of replication is workload isolation. You can use specific replicas to execute some tasks, such as expensive analytics queries that run regularly. This way, such tasks can’t affect the leader or any other replicas.
Data replication between the leader and followers can happen in different ways:
Asynchronous: As soon as the leader receives a write, it sends it to all the replicas and responds to the client right away, without waiting for acknowledgments. This keeps response times low, but it's not fault-tolerant: if the leader crashes after acknowledging the client but before the update reaches any follower, the write is lost.
Synchronous: The leader waits for all replicas to acknowledge the write before responding to the client. This slows things down, since a single slow replica stretches the response time of every request. Even worse, the data store becomes unavailable if any replica is down. This method doesn't work at scale: the more followers there are, the more likely it is that at least one of them is slow or unavailable.
Semi-synchronous: In practice, relational databases allow mixing synchronous and asynchronous replication. For example, in PostgreSQL, individual followers can be configured to receive updates synchronously rather than asynchronously, which is the default.
A common approach is to make one of your replicas synchronous, so that it can serve as an up-to-date backup of the leader. When the leader crashes, you can switch to the synchronous follower without losing any data. All the other replicas are asynchronous to avoid availability issues. Managed databases, such as RDS or Azure SQL Database, come with read replicas and automatic failover built in.
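The semi-synchronous setup can be sketched as follows. Again, these are toy classes with made-up names, not PostgreSQL's actual mechanism: the point is only that the client's acknowledgment waits for the one synchronous replica and nothing else.

```python
class Replica:
    def __init__(self, name, synchronous=False):
        self.name = name
        self.synchronous = synchronous
        self.data = []

class SemiSyncLeader:
    def __init__(self, replicas):
        self.replicas = replicas
        self.data = []
        self.pending = []  # writes not yet shipped to async replicas

    def write(self, op):
        self.data.append(op)
        # Block until the designated synchronous replica has the write,
        # so it is always an up-to-date backup of the leader.
        for r in self.replicas:
            if r.synchronous:
                r.data.append(op)
        self.pending.append(op)
        return "ack"  # the client sees the ack at this point

    def ship_async(self):
        # Asynchronous replicas catch up later, off the client's critical path.
        for op in self.pending:
            for r in self.replicas:
                if not r.synchronous:
                    r.data.append(op)
        self.pending = []

sync_replica = Replica("replica-1", synchronous=True)
async_replica = Replica("replica-2")
leader = SemiSyncLeader([sync_replica, async_replica])

leader.write("INSERT x")
# At ack time the sync replica already has the write; the async one lags.
```

If the leader dies right after the ack, `replica-1` can take over with no data loss, while `replica-2` being slow or down never delays a client.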
Replication has two main limitations that are important to understand. First, scaling out reads doesn't help with writes. There is only one leader that all writes go to, which limits write-heavy workloads. Second, the whole dataset must fit on a single machine. You could get around this by moving some tables to databases running on different machines, but that only postpones the problem. The way to overcome both limitations is partitioning.
Partitioning
Partitioning lets you scale a database to accommodate both read and write operations. Relational databases don’t usually support it out of the box, but it is possible to add it at the application layer. However, this is pretty challenging to do and adds significant complexity to the system.
First, you need to figure out how to split the data between the database instances and adjust it when one part gets too big or too hot. After the data is divided, queries that cover more than one partition need to be broken into sub-queries, and their responses have to be combined. Think of aggregations or joins across partitions. You also need to set up a distributed transaction protocol, such as two-phase commit (2PC), to allow atomic transactions between partitions.
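A minimal sketch of what application-level partitioning looks like, under the assumption of simple hash-based partitioning with in-memory dicts standing in for separate database instances (the function names are made up):

```python
import zlib

NUM_PARTITIONS = 4
# Each dict stands in for a separate database instance.
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def partition_for(key: str) -> int:
    # Hash-based partitioning: a deterministic hash spreads keys
    # evenly across instances.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def put(key, value):
    partitions[partition_for(key)][key] = value

def get(key):
    # A single-key read touches exactly one partition.
    return partitions[partition_for(key)].get(key)

def count_all():
    # A cross-partition aggregation: scatter the sub-query to every
    # partition, then gather and combine the partial results.
    return sum(len(p) for p in partitions)

for i in range(10):
    put(f"user:{i}", {"id": i})

print(get("user:3"))  # {'id': 3}
print(count_all())    # 10
```

Even this toy version hints at the hidden work: rebalancing when a partition gets hot means moving keys, and anything transactional across two partitions needs a protocol like 2PC on top.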
If you add to all that the requirement of replication, it becomes clear how partitioning at the application layer can become overwhelming.
Taking a step back, the fundamental problem is that traditional relational databases were designed under the assumption that they would fit on a single machine. That's why they can offer features like ACID transactions and joins, which are hard to scale.
When relational databases were first built, disk space was expensive, so normalizing the data to reduce its footprint on disk was critical, even if that meant using joins to reassemble it at query time. Things have changed: these days, storage is cheap, but CPU time isn't.
This is why big tech companies started making their own data storage systems. These systems were built from the ground up to be highly available and scalable, and became known as NoSQL.
NoSQL
The first generation of data stores built by big tech companies didn't support SQL and had far fewer features than standard relational databases, which is how they came to be called "NoSQL." These days, however, the name doesn't make much sense: NoSQL stores have added features that used to exist only in relational databases, and many support SQL dialects.
While relational databases use strong consistency models, such as strict serializability, NoSQL stores use more relaxed models, such as eventual or causal consistency, to achieve high availability. Additionally, NoSQL stores typically don't support joins and rely on denormalized data, often stored as key-value pairs or documents, such as JSON.
A key-value store maps a sequence of bytes (the key) to another sequence of bytes (the value). A document store maps a key to a document, which can be hierarchical but doesn't have a strict schema. The main difference is that a document store parses and indexes the documents, so they can be queried based on their contents.
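The distinction is easy to see in a toy sketch, with plain dicts standing in for both stores (the keys, documents, and the `find_by` helper are made up for illustration):

```python
# Key-value store: the value is opaque bytes, so the only possible
# lookup is by key.
kv = {}
kv["user:1"] = b'{"name": "Ada", "city": "London"}'

# Document store: the value is a structured document whose fields
# the store can parse and index.
docs = {
    "user:1": {"name": "Ada", "city": "London"},
    "user:2": {"name": "Alan", "city": "Manchester"},
}

def find_by(field, value):
    # Content-based lookup: something a key-value store cannot offer,
    # because it never looks inside the value.
    return [k for k, d in docs.items() if d.get(field) == value]

print(find_by("city", "London"))  # ['user:1']
```

In a real document store this scan would be served by an index on the field rather than a full pass over the data, but the capability is the same: queries on document contents, not just keys.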
Because NoSQL stores partition data natively to scale, they either don't support transactions at all or support them only within a single partition. On the other hand, since the data is stored in denormalized form, there's less need for transactions or joins to begin with.
Even though NoSQL stores don’t use relational data models, you can still use them to model relational data. But if you try to use a NoSQL store like a relational database, it will make things worse. To get the most out of a NoSQL data store, you need to plan ahead for how you will handle the data and tailor the store to those needs.
People often think that NoSQL data stores are more flexible than relational databases since they can scale quickly without having to model the data first. That is not true at all. NoSQL needs a lot more thought about how the data is modeled. Since NoSQL storages are tightly tied to the access patterns, they are not as flexible as relational databases.