The Polymathic Engineer

The RAFT Consensus Algorithm

How Raft solves the consensus problem in distributed systems: a step-by-step breakdown that every engineer can understand.

Franco Fernando
May 09, 2025

Hi Friends,

Welcome to the 121st issue of the Polymathic Engineer newsletter. This week, we will talk about the consensus problem and the Raft algorithm.

Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failures of some of its members. Because of this, they play a key role in building reliable, large-scale software systems.

Raft is a fairly recent algorithm for distributed consensus, first published in 2014. It has become extremely popular because it was designed to be easy to understand and implement.

The outline is as follows:

  • The consensus problem

  • Replicated state machines

  • The Raft algorithm (basics)

  • Leader election

  • Log replication

  • Safety



What is distributed consensus

In real life, consensus means that everyone in a group agrees on something.

An example of such an agreement is when a group of friends chooses a place to eat dinner. On a larger scale, when people in a country choose a government, that's also an agreement.

Of course, the methods people choose to reach an agreement differ in each situation, but their purpose is the same.

In distributed systems, the word consensus means the same thing. A system is made up of independent processes that must agree on certain decisions to reach a common goal.

For example, a group of processes may want to agree on questions such as:

  • Which process is the leader, with special powers such as accessing a shared resource or assigning work to others?

  • Has a message been successfully committed to a distributed queue?

  • Does a process hold a lease or not?

  • What is the value stored for a given key in a data store?

Finding such an agreement between processes is known as the consensus problem, and it is at the heart of building reliable and fault-tolerant distributed systems.

What these scenarios have in common is that the processes must share a consistent view of some system state. Every process stores a local copy of the state, and the state is mutable. When an external client changes the state on one process, the change must be copied to all the others.

This would be easy to implement if there were no failures and the network were reliable. However, in real-world distributed systems, messages between processes can be delayed, lost, or delivered out of order, and individual processes can crash, restart, or do the wrong thing.

These challenges make it difficult to determine whether a process has really failed or is just slow to respond, and to ensure that all non-faulty processes eventually reach agreement.
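To make the "failed or just slow" ambiguity concrete, here is a minimal Python sketch of a timeout-based check. The `Peer` class, the `suspected` function, and the timestamps are all hypothetical names and values chosen for illustration; they are not part of Raft itself.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    last_heartbeat: float  # time (in seconds) when the last heartbeat arrived


def suspected(peer: Peer, now: float, timeout: float) -> bool:
    """Return True if no heartbeat arrived within `timeout` seconds."""
    return now - peer.last_heartbeat > timeout


# Hypothetical scenario: p1 crashed at t=10, p2 is alive but its
# heartbeats are stuck in a slow network since t=10.
crashed = Peer("p1", last_heartbeat=10.0)
slow = Peer("p2", last_heartbeat=10.0)

now = 12.0
print(suspected(crashed, now, timeout=1.0))  # True
print(suspected(slow, now, timeout=1.0))     # True, although p2 is still running
```

From the observer's point of view the two peers look identical, so a timeout can only raise a suspicion, never prove a crash.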

In general, a consensus algorithm needs to exhibit three properties:

  1. Termination: every non-faulty process eventually decides on a value;

  2. Agreement: every non-faulty process decides on the same value;

  3. Integrity: the value that is decided on was proposed by one of the processes.
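The three properties can be phrased as simple checks over the outcome of one consensus round. The sketch below is only an illustration: `check_consensus` is a hypothetical helper, and it inspects a snapshot taken at some point in time, so its "termination" flag only says whether every process has decided so far.

```python
def check_consensus(proposed: set, decisions: dict) -> dict:
    """Check a snapshot of one consensus round.

    `proposed` is the set of values that processes proposed.
    `decisions` maps each non-faulty process to its decided value,
    or to None if it has not decided yet.
    """
    decided = [v for v in decisions.values() if v is not None]
    return {
        # every non-faulty process has decided
        "termination": len(decided) == len(decisions),
        # no two processes decided differently
        "agreement": len(set(decided)) <= 1,
        # every decided value was actually proposed
        "integrity": all(v in proposed for v in decided),
    }


good = check_consensus({"a", "b"}, {"p1": "a", "p2": "a", "p3": "a"})
bad = check_consensus({"a", "b"}, {"p1": "a", "p2": "b", "p3": None})
print(good)  # all three properties hold
print(bad)   # termination and agreement are violated
```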

Raft is a consensus algorithm that satisfies these properties while providing the strongest consistency guarantee possible: to the clients, the state appears to be stored on a single process, even though it is replicated.

Replicated state machines

Imagine a group of friends who play a board game, but instead of sitting together around the same table, they are in different rooms. To keep the game going, each player has an identical copy of the game board and communicates every move to the others.

If everyone follows the same rules and applies the moves in the same order, all the game boards should stay in sync.

Raft is based on a similar principle known as state machine replication. The main idea is that a single process (the leader) broadcasts operations that change its state to other processes (the followers). If the followers execute the same sequence of operations as the leader, then each follower will end up in the same state.

This mechanism is called state machine replication because each process is modeled as a state machine that transitions from one state to another after executing an operation on some input.

If the state machines are deterministic and receive exactly the same inputs in the same order, they all end up in the same state.
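A minimal Python sketch can illustrate the idea, assuming a toy key-value store as the state machine. The `KVStateMachine` class and the tuple format of the operations are hypothetical choices for this example, not something Raft prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class KVStateMachine:
    """A deterministic state machine: the same operations in the same order give the same state."""
    state: dict = field(default_factory=dict)

    def apply(self, op: tuple) -> None:
        kind, key, *rest = op
        if kind == "set":
            self.state[key] = rest[0]
        elif kind == "delete":
            self.state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {kind}")


# The sequence of operations the leader broadcasts to its followers.
log = [("set", "x", 1), ("set", "y", 2), ("set", "x", 3), ("delete", "y")]

leader, follower, reordered = KVStateMachine(), KVStateMachine(), KVStateMachine()
for op in log:
    leader.apply(op)
for op in log:
    follower.apply(op)
for op in reversed(log):  # same operations, different order
    reordered.apply(op)

print(leader.state == follower.state)   # True
print(leader.state == reordered.state)  # False
```

The replica that applied the operations in a different order diverges, which is why agreeing on the order of operations matters as much as agreeing on the operations themselves.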

© 2025 Franco Fernando