/ræft/

noun … “Simplified consensus algorithm for distributed systems.”

Raft is a fault-tolerant Consensus algorithm for managing a replicated log in a Distributed System. It ensures that multiple nodes agree on the same sequence of state changes, providing strong consistency, and it was designed to be easier to understand than other consensus protocols such as Paxos. It is widely used in distributed databases, configuration services, and other fault-tolerant systems.

Key characteristics of Raft include:

  • Leader-based approach: one node acts as a leader, coordinating log replication and client requests.
  • Log replication: the leader appends commands to its log and ensures follower nodes replicate the same entries in order.
  • Election and fault tolerance: if the leader fails, followers stop receiving its heartbeats and one of them starts an election for a new term; randomized election timeouts make it unlikely that several candidates split the vote.
  • Safety: once an entry is committed, it is never lost or overwritten. Every future leader's log contains it, and no two nodes apply different commands at the same log index.
  • Simplicity: Raft separates leader election, log replication, and safety into distinct subproblems, which makes it easier to understand and implement than Paxos.
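The election and safety bullets above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the timeout range, the `VoterState` class, and the `handle_request_vote` function are hypothetical names, and networking, persistence, and stepping down from leader or candidate are omitted. It shows the two ideas the bullets rely on: randomized election timeouts, and a vote-granting rule that only lets a candidate with an up-to-date log win.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed timeout range in milliseconds (the Raft paper gives 150-300 ms as an example).
ELECTION_TIMEOUT_MS = (150, 300)


def random_election_timeout(rng: random.Random) -> int:
    """Pick a randomized election timeout so that, usually, one follower
    times out and becomes a candidate before the others, avoiding split votes."""
    low, high = ELECTION_TIMEOUT_MS
    return rng.randint(low, high)


@dataclass
class VoterState:
    current_term: int = 0
    voted_for: Optional[str] = None
    log_terms: List[int] = field(default_factory=list)  # term of each log entry, in order

    def last_log(self) -> Tuple[int, int]:
        """(term, index) of the last log entry, or (0, 0) for an empty log."""
        if not self.log_terms:
            return (0, 0)
        return (self.log_terms[-1], len(self.log_terms))


def handle_request_vote(node: VoterState, term: int, candidate: str,
                        last_log_term: int, last_log_index: int) -> Tuple[int, bool]:
    """Simplified RequestVote handler; returns (current_term, vote_granted)."""
    if term < node.current_term:
        return node.current_term, False          # reject a candidate from an old term
    if term > node.current_term:
        node.current_term = term                 # adopt the newer term
        node.voted_for = None                    # and clear the vote from the old term
    # Election restriction: the candidate's log must be at least as up to date,
    # comparing last-entry term first and log length second (tuple comparison).
    up_to_date = (last_log_term, last_log_index) >= node.last_log()
    if node.voted_for in (None, candidate) and up_to_date:
        node.voted_for = candidate               # at most one vote per term
        return node.current_term, True
    return node.current_term, False
```

For example, a voter with `log_terms=[1, 1, 2]` grants its term-3 vote to the first up-to-date candidate that asks and refuses every other candidate in that term. Because a winner needs votes from a majority, and each voter refuses candidates with less complete logs, an elected leader's log already contains every committed entry.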

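Workflow example: In a distributed key-value store using Raft, a client submits a write operation to the leader. The leader appends the operation to its log and sends AppendEntries requests to the followers. Once the entry is stored on a majority of the cluster (the leader's own copy counts, so in a three-node cluster a single follower acknowledgment suffices), the entry is committed; the leader applies it to its local state machine and replies to the client. Followers learn the new commit index from subsequent AppendEntries requests and then apply the entry themselves. If the leader crashes, a new leader is elected and resumes log replication. Because only a candidate whose log contains every committed entry can win an election, consistency is preserved.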
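The commit rule in this workflow can be made concrete with a short sketch. It assumes the leader tracks, for each follower, the highest log index known to be replicated there (called `matchIndex` in the Raft paper); the function name `advance_commit_index` and its parameters are hypothetical. The new commit index is the highest index stored on a majority, subject to Raft's rule that a leader only commits entries from its own term by counting replicas.

```python
from typing import Dict, List


def advance_commit_index(leader_last_index: int, match_index: Dict[str, int],
                         log_terms: List[int], current_term: int,
                         commit_index: int) -> int:
    """Return the leader's new commit index (log indices are 1-based).

    match_index maps each follower to the highest index known to be replicated
    on it; the leader's own last index also counts toward the majority.
    """
    replicated = sorted([leader_last_index, *match_index.values()], reverse=True)
    # In a descending list of n values, at least n // 2 + 1 nodes (a majority)
    # have replicated up to the value at position n // 2, and no higher index
    # is held by a majority.
    candidate = replicated[len(replicated) // 2]
    # A leader only commits by counting replicas for entries from its current term.
    # Terms never decrease along a log, so checking the candidate entry suffices.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index


# Three-node cluster from the workflow: Node1 leads, Node2 is caught up, Node3 lags.
print(advance_commit_index(3, {"Node2": 3, "Node3": 1}, [1, 2, 2], 2, 1))  # 3
# Index 2 is on a majority but was written in term 1, so it is not committed yet.
print(advance_commit_index(3, {"Node2": 2, "Node3": 0}, [1, 1, 2], 2, 0))  # 0
```

The term check matters after leader changes: an older entry that happens to reach a majority is only committed indirectly, once an entry from the new leader's term is committed after it.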

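# Simplified Raft log replication (networking stubbed out for illustration)
leader = "Node1"
followers = ["Node2", "Node3"]
cluster_size = 1 + len(followers)   # the leader is part of the cluster
leader_log = []
commit_index = 0                    # 1-based index of the last committed entry

def send_append_entries(follower, entry):
    # Stub: a real leader sends an AppendEntries RPC and returns the follower's reply
    return True

entry = "Set X = 42"
leader_log.append(entry)            # the leader appends to its own log first
acks = 1                            # the leader's own copy counts toward the majority
for follower in followers:
    if send_append_entries(follower, entry):
        acks += 1
if acks > cluster_size // 2:        # stored on a majority of the cluster
    commit_index = len(leader_log)  # the entry is now committed
# All nodes eventually apply the committed entry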
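The replication loop above shows only the leader's side. A follower's side might look like the following simplified AppendEntries handler. It assumes 1-based log indices, an in-memory log, and a hypothetical "Set KEY = VALUE" command format for the toy key-value store; the `Follower` class is illustrative, and persistence, voting state, and redirecting clients to the leader are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (term, command); the command format is a hypothetical example


@dataclass
class Follower:
    current_term: int = 0
    log: List[Entry] = field(default_factory=list)  # position i holds log index i + 1
    commit_index: int = 0
    last_applied: int = 0
    state: Dict[str, str] = field(default_factory=dict)  # toy key-value state machine

    def append_entries(self, term: int, prev_log_index: int, prev_log_term: int,
                       entries: List[Entry], leader_commit: int) -> Tuple[int, bool]:
        """Simplified AppendEntries handler; returns (current_term, success)."""
        if term < self.current_term:
            return self.current_term, False   # reject a leader from an old term
        self.current_term = term
        # Consistency check: our log must hold an entry at prev_log_index
        # whose term equals prev_log_term (index 0 means "before the first entry").
        if prev_log_index > len(self.log):
            return self.current_term, False
        if prev_log_index > 0 and self.log[prev_log_index - 1][0] != prev_log_term:
            return self.current_term, False
        # Append new entries; on a term conflict, drop our entry and all that follow it.
        for offset, entry in enumerate(entries):
            pos = prev_log_index + offset     # 0-based list position for this entry
            if pos < len(self.log):
                if self.log[pos][0] == entry[0]:
                    continue                  # already stored
                del self.log[pos:]
            self.log.append(entry)
        # Advance the commit index, never past the last entry this request vouched for.
        last_new = prev_log_index + len(entries)
        self.commit_index = max(self.commit_index, min(leader_commit, last_new))
        self.apply_committed()
        return self.current_term, True

    def apply_committed(self) -> None:
        """Apply committed entries to the state machine in log order."""
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            _, command = self.log[self.last_applied - 1]
            _, key, _, value = command.split()  # e.g. "Set X = 42"
            self.state[key] = value
```

Rejecting a request whose `prev_log_index`/`prev_log_term` does not match is what lets the leader find the point where the two logs agree: in a full implementation, the leader would retry with an earlier `prev_log_index` until the check passes, and the follower would then overwrite any uncommitted conflicting entries.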

Conceptually, Raft is like a conductor leading an orchestra: the leader ensures all musicians follow the same sheet of music in sync. If the conductor becomes unavailable, the orchestra elects a new conductor and resumes playing after only a brief pause.

See Consensus, Paxos, Distributed Systems, Replication, CAP Theorem.