

Discover more from The Polymathic Engineer
Hi Friends,
Welcome to the 38th edition of the Polymathic Engineer newsletter.
This week we will talk about a though, but important topic in Distributed System: Leader Election. I hope this won’t be too hard to digest ;-)
The outline will be as follows:
Introduction to the leader election problem
Raft algorithm
Paxos algorithm
The leader election problem
There are many use cases in distributed systems where a node needs to act as a leader having special powers.
For example the leader could be the node in charge of accessing shared resources, assigning work to other nodes or taking care of write operations.
But electing a leader in a distributed environment is not a trivial task, and a good leader election algorithm needs to guarantee two properties. First at most one leader shall be active at any given time. Second, a leader shall be elected even if failures occur.
There are many leader election algorithms, and all are based on the following assumptions:
Node and Network Failures: nodes can fail (crash and restart) and the network can lose, reorder, or duplicate messages
No Single Permanent Leader: any node can technically become a leader
No Byzantine Failures: consensus algorithms do not handle Byzantine failures, where nodes may act arbitrarily after being compromised. It assumes that nodes either follow the algorithm or fail by crashing
Stable Storage: once a leader is chosen, it is durably recorded and can be remembered even if nodes crash and recover.
The most popular ones are Raft and Paxos. Let’s explore how they works.
Raft
Raft is a leader election algorithm that makes two assumptions:
time is divided into slots of arbitrary length called election terms. Those terms are numbered using logical timestamps (e.g. consecutive integers).
at a given time each node can be in one of 3 states: the leader state, the follower state, the candidate state
This algorithm can be implemented as a state machine.
A node in the follower state recognizes another node as a leader. It expects to receive from the leader a periodic heartbeat with the term when it was elected. If this is not received within a timeout the node moves to the candidate state.
A node in the candidate state increments the current term and starts a new election. It sends to all other nodes a request to vote for it, specifying the current term. It also votes for another candidate using a first comes first served policy.
At the beginning all nodes are followers. They don't receive any heartbeat and move to candidates. From a candidate point of view, an election finishes for 3 possible events:
The candidate is voted by the majority of the other nodes and becomes the leader
The candidate receives an heartbeat from another node that won the election. If the term specified in the heartbeat is greater than the candidate current term, the other node is recognized as a leader and the candidate moves to follower state. Otherwise, the candidate keeps staying in the same state.
The election term time slot finishes without a candidate receiving the majority of the votes. In this case a new election is started randomly choosing its duration within an interval. This reduces the probability of another election without a winner.
Paxos
Paxos is a general consensus protocol that can be used for asynchronous leader election. It is more complex than Raft. Nodes are broken down into 3 main roles.
Proposer: initiates the consensus process by proposing a leader to the acceptors
Acceptor: accept proposals from the proposer node and decide whether or not they’ll accept a certain proposal
Learner: are informed of the outcome of the leader election process but do not actively participate in it
The algorithm operates in multiple rounds, each led by a unique proposer that suggests a value to be agreed upon. Acceptors vote on whether to accept the proposed value or not.
A value is considered chosen if it is accepted by a majority of the acceptors, thus ensuring that even if some nodes fail, consensus can still be reached.
Paxos guarantees safety by ensuring that only one unique leader is chosen, and it achieves liveness by continuously enabling proposers to make new proposals until one is accepted by the majority.
Final consideration
In practice, you’ll need to implement a leader election algorithm from scratch, except you need a solution with no external dependencies. A popular solution is to use a coordination service like Zookeeper. As an alternative you can use any fault-tolerant key-value data store supporting compare and swap operations.
Interesting Tweets
Supporting my family and giving to my kids the best possible life is one of the main motivations to do my best every day. But always keeping in mind that what they truly need is a father who participates in their activities.
Having a blog is an effective way to show that you are are passionate, know things well enough, and can effectively communicate in writing. The easiest way to start is to write posts about technical knowledge that would have helped you in the past.
Data structures and algorithms are not about passing tech interviews, but about becoming better engineers.