/kənˈsɛnsəs/

noun … “Agreement among distributed nodes.”

Consensus is the process by which multiple nodes in a Distributed System agree on a single value or state despite node crashes, message delays, or lost messages. Consensus ensures that all non-faulty nodes reach the same decision, which is crucial for maintaining data integrity, coordinating actions, and implementing replicated state machines. It underpins critical operations in databases, blockchain networks, and fault-tolerant services.

Key characteristics of Consensus include:

  • Fault tolerance: the ability to reach agreement even if some nodes crash (crash failures) or behave arbitrarily or maliciously (Byzantine failures), up to a limit that depends on the protocol.
  • Agreement: all non-faulty nodes choose the same value.
  • Validity: if all non-faulty nodes propose the same value, that value must be chosen.
  • Termination: every non-faulty node must eventually reach a decision.
  • Consistency under partitions: during a network split, quorum-based algorithms let only a side that holds a quorum decide; the other side stalls rather than diverging.
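
The agreement and validity properties above can be written as simple checks over the outcome of a single consensus run. The following is a minimal sketch, not part of any real library: the helper names `check_agreement` and `check_validity` and the dictionary layout (node id mapped to a value, with `None` for a node that crashed before deciding) are assumptions made for illustration. Termination is a liveness property and cannot be confirmed from a finite snapshot like this one.

```python
def check_agreement(decisions):
    """True if every node that decided chose the same value."""
    decided = {v for v in decisions.values() if v is not None}
    return len(decided) <= 1


def check_validity(proposals, decisions):
    """True if a unanimous proposal is the only value any node decided."""
    proposed = set(proposals.values())
    if len(proposed) != 1:
        return True  # the property only constrains the unanimous case
    (only_value,) = proposed
    return all(v == only_value for v in decisions.values() if v is not None)


# Hypothetical run: three nodes propose "A"; n3 crashes before deciding.
proposals = {"n1": "A", "n2": "A", "n3": "A"}
decisions = {"n1": "A", "n2": "A", "n3": None}

print(check_agreement(decisions), check_validity(proposals, decisions))
```

A run in which two live nodes decide different values would fail `check_agreement`, which is exactly the divergent state a consensus protocol is meant to rule out.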

Popular consensus algorithms include Paxos, Raft, and Practical Byzantine Fault Tolerance (PBFT). Paxos and Raft target crash failures: a value is committed once a majority quorum of nodes accepts it, and Raft (like the common Multi-Paxos variant) routes proposals through an elected leader. PBFT addresses Byzantine failures, where nodes may behave arbitrarily or maliciously, using three message phases (pre-prepare, prepare, commit) to ensure correctness.
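
The difference between the two failure models shows up in how many faulty nodes a cluster can absorb. Majority-quorum protocols such as Paxos and Raft need n ≥ 2f + 1 nodes to tolerate f crash failures, while PBFT needs n ≥ 3f + 1 nodes to tolerate f Byzantine failures. The sketch below just turns those bounds into arithmetic; the function names are hypothetical.

```python
def majority_quorum(n):
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def max_crash_faults(n):
    """Largest f with n >= 2f + 1 (crash-tolerant, majority quorums)."""
    return (n - 1) // 2


def max_byzantine_faults(n):
    """Largest f with n >= 3f + 1 (Byzantine-tolerant, as in PBFT)."""
    return (n - 1) // 3


for n in (3, 4, 5, 7):
    print(n, majority_quorum(n), max_crash_faults(n), max_byzantine_faults(n))
```

With five nodes, for instance, a crash-tolerant cluster survives two failures, while a Byzantine-tolerant one survives only one.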

Workflow example: In a replicated key-value store, multiple nodes maintain copies of the same dataset. When a client requests a write, the system uses a consensus algorithm to ensure that all replicas agree on the new value. As long as a quorum of replicas remains reachable, the update is committed even if some nodes crash or messages are delayed; otherwise it is rejected. Either way, non-faulty replicas never end up in divergent states.
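
The write path described above can be sketched as a two-step exchange: replicas first stage the write, and it is committed only if a majority acknowledges it. This is an illustrative toy under stated assumptions, not Raft or Paxos: the `Replica` class and `replicated_write` function are hypothetical, everything runs in one process, and leader election, log ordering, recovery, and concurrent writers are all ignored.

```python
class Replica:
    """Toy in-memory replica; `alive=False` simulates a crashed node."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.store = {}    # committed key-value pairs
        self.pending = {}  # staged writes awaiting commit or abort

    def prepare(self, key, value):
        """Stage the write; a crashed replica never acknowledges."""
        if not self.alive:
            return False
        self.pending[key] = value
        return True

    def commit(self, key):
        """Make a staged write visible."""
        if self.alive and key in self.pending:
            self.store[key] = self.pending.pop(key)

    def abort(self, key):
        """Discard a staged write."""
        self.pending.pop(key, None)


def replicated_write(replicas, key, value):
    """Commit only if a strict majority of replicas acknowledged."""
    acks = [r for r in replicas if r.prepare(key, value)]
    if len(acks) >= len(replicas) // 2 + 1:
        for r in acks:
            r.commit(key)
        return True
    for r in acks:
        r.abort(key)  # no quorum: roll back so no replica diverges
    return False


cluster = [Replica("n1"), Replica("n2"), Replica("n3", alive=False)]
print(replicated_write(cluster, "x", 1))
```

With one of three replicas down the write still commits on the two live nodes; with two down, the single remaining replica cannot form a majority and the write is rolled back.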

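# Example: simplified majority vote (illustrative only, not the full Raft protocol)
from collections import Counter

proposals = ["A", "B", "A"]
votes = Counter(proposals)                  # tally how often each value was proposed
value, count = votes.most_common(1)[0]      # most frequently proposed value and its tally
# Decide only if a strict majority of proposers backs the value
chosen_value = value if count > len(proposals) // 2 else None
print("Consensus value: " + str(chosen_value))
# Output: Consensus value: A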

Conceptually, Consensus is like a committee voting on a resolution with some members absent or unreliable. The group follows rules to ensure that all honest members end up in agreement, even under uncertainty or partial communication.

See Distributed Systems, CAP Theorem, Paxos, Raft, Replication.