/ræft/
noun … “Simplified consensus algorithm for distributed systems.”
Raft is a fault-tolerant Consensus algorithm designed to manage a replicated log in a Distributed System. Raft ensures that multiple nodes agree on the same sequence of state changes, providing strong consistency while avoiding much of the complexity associated with other consensus protocols such as Paxos. It is widely used in distributed databases, configuration services, and fault-tolerant systems.
Key characteristics of Raft include:
- Leader-based approach: one node acts as a leader, coordinating log replication and client requests.
- Log replication: the leader appends commands to its log and ensures follower nodes replicate the same entries in order.
- Election and fault tolerance: if the leader fails, a new leader is elected from among the followers, with randomized election timeouts preventing split votes (see the sketch after this list).
- Safety: all committed entries are guaranteed to be durable and consistent across all non-faulty nodes.
- Simplicity: Raft separates leader election, log replication, and safety mechanisms to make understanding and implementation more straightforward than Paxos.
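To make the election mechanism concrete, here is a minimal Python sketch of randomized election timeouts. The class and helper names are hypothetical, not from any real Raft library; the 150-300 ms range follows the values suggested in the Raft paper, and a real implementation would also track terms and vote requests.
import random
import time

ELECTION_TIMEOUT_MIN = 0.150   # seconds; the Raft paper suggests 150-300 ms
ELECTION_TIMEOUT_MAX = 0.300

def random_election_timeout():
    # Each node draws its own timeout, so followers rarely time out
    # at the same moment and split votes become unlikely.
    return random.uniform(ELECTION_TIMEOUT_MIN, ELECTION_TIMEOUT_MAX)

class Follower:
    def __init__(self, name):
        self.name = name
        self.deadline = time.monotonic() + random_election_timeout()

    def on_heartbeat(self):
        # A heartbeat (an empty AppendEntries) from the leader resets the clock.
        self.deadline = time.monotonic() + random_election_timeout()

    def should_start_election(self):
        # With no heartbeat before the deadline, this follower becomes
        # a candidate and requests votes for a new term.
        return time.monotonic() > self.deadline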
Workflow example: In a distributed key-value store using Raft, a client submits a write operation. The current leader appends the operation to its log, then sends AppendEntries RPCs to the follower nodes. Once a majority of the cluster, counting the leader itself, has stored the entry, it is considered committed, and the leader applies it to its local state machine. Followers apply the entry once they learn it is committed. If the leader crashes, a new leader is elected and resumes log replication without violating consistency.
# Simplified Raft log replication (illustrative Python sketch)
leader = "Node1"                  # this node coordinates replication
followers = ["Node2", "Node3"]
leader_log = []
acks = set()

def send_append_entries(follower, entry):
    # Stand-in for the AppendEntries RPC; here every follower
    # accepts and acknowledges the entry.
    acks.add(follower)

def majority_acknowledged(followers, entry):
    # The leader counts as one vote in a cluster of len(followers) + 1.
    return len(acks) + 1 > (len(followers) + 1) // 2

entry = "Set X = 42"
leader_log.append(entry)          # the leader appends to its own log first
for follower in followers:
    send_append_entries(follower, entry)
if majority_acknowledged(followers, entry):
    print("committed:", entry)    # leader applies the entry to its state machine
# All nodes eventually apply the committed entry
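The sketch above stops at the commit decision. As a rough complement, here is how a follower might apply committed entries in order; commit_index and last_applied correspond to the per-node state described in the Raft paper, while the string-based log and toy state machine are hypothetical simplifications.
# Sketch of a follower applying committed entries in log order
# (hypothetical structures; real Raft logs also store terms).
log = ["Set X = 42", "Set Y = 7"]   # replicated entries, in order
commit_index = 1                    # highest index known to be committed
last_applied = -1                   # highest index applied locally
state = {}

def apply(entry, state):
    # Toy state machine: parse "Set <key> = <value>" commands.
    _, key, _, value = entry.split()
    state[key] = int(value)

while last_applied < commit_index:
    last_applied += 1
    apply(log[last_applied], state)

print(state)                        # {'X': 42, 'Y': 7}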
Conceptually, Raft is like a conductor leading an orchestra: the leader ensures all musicians follow the same sheet of music in sync. If the conductor is unavailable, the orchestra quickly elects a new conductor to continue performing without missing a beat.
See Consensus, Paxos, Distributed Systems, Replication, CAP Theorem.