Hi Friends,
Welcome to the 48th edition of the Polymathic Engineer newsletter, the first reserved for paid subscribers. Thanks for your trust and support. I hope you enjoy reading it.
This time, we will focus on one of the most popular NoSQL data stores: Apache Cassandra.
The outline will be as follows:
introduction
architecture
data model
partition and replication
automation and scalability
trade-offs
how to set up and use a cluster
Introduction
Cassandra is a popular NoSQL data store that was originally developed at Facebook and incorporates architectural ideas from Google's Bigtable and Amazon's Dynamo. It is built for scale, and some of its features only work on a multi-node cluster.
The largest Cassandra clusters have tens of thousands of nodes and store petabytes of data. Its users include many big tech companies, such as Apple, Netflix, Uber, and Meta.
Architecture
The first thing to keep in mind is that Cassandra has a decentralized (masterless) architecture in which all nodes in a cluster perform the same functions. Clients can connect to any node, and the node they connect to becomes the coordinator for that client's session.
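To make the idea of "any node can coordinate" concrete, here is a minimal toy model in Python. It is a sketch under simplifying assumptions, not Cassandra's actual implementation: the names `Node`, `ToyCluster`, and `replicas_for` are hypothetical, and the placement rule (hash the key, then take the next few nodes in order) is a stand-in for Cassandra's real partitioner and replication strategy. The point it illustrates is that no node is special: whichever node a client contacts forwards the request to the nodes that hold the data.

```python
import hashlib
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical peer: every node runs the same logic and can coordinate."""
    name: str
    store: dict = field(default_factory=dict)


class ToyCluster:
    """Toy masterless cluster (an illustration, not Cassandra's real code)."""

    def __init__(self, names, replication_factor=2):
        self.nodes = [Node(n) for n in names]
        self.rf = replication_factor

    def replicas_for(self, key):
        # Assumed placement rule: hash the key to a starting node, then take
        # the next `rf` nodes in order, like walking a simplified ring.
        digest = hashlib.sha256(key.encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        # The coordinator is simply the node the client contacted; it forwards
        # the write to the replicas, whether or not it is one of them.
        replicas = self.replicas_for(key)
        for replica in replicas:
            replica.store[key] = value
        return [r.name for r in replicas]

    def read(self, coordinator, key):
        # Any coordinator asks the same replicas, so every node gives the same answer.
        for replica in self.replicas_for(key):
            if key in replica.store:
                return replica.store[key]
        return None


cluster = ToyCluster(["n1", "n2", "n3", "n4"])
writer = random.choice(cluster.nodes)  # the client may pick any node
cluster.write(writer, "user:42", "Ada")
reader = random.choice(cluster.nodes)  # possibly a different node
print(cluster.read(reader, "user:42"))  # prints "Ada"
```

Because placement depends only on the key, the read succeeds no matter which two nodes the client happens to pick as coordinators. A master-based design would instead route every request through one designated node, which is exactly the bottleneck and single point of failure this symmetric design avoids.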