Availability

/əˌveɪləˈbɪləti/

noun … “System responds to requests, even under failure.”

Availability is the property of a Distributed System that ensures every request receives a response, even when individual nodes fail or the network misbehaves. In the context of the CAP Theorem, the term is precise: every request received by a non-failing node must result in a non-error response, so the system keeps serving reads and writes even during network partitions, although the returned data may not reflect the latest global state. High availability is a cornerstone of fault-tolerant services, web applications, and cloud platforms.
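The CAP sense of availability can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not a real database: Replica, write, available_read, and consistent_read are hypothetical names, a partition is simulated with a boolean flag, and versions are plain integers supplied by the caller. An availability-first read answers from local state even when that state is stale. The consistency-first alternative here simply refuses while partitioned, which is a simplification of what real consistent systems do.

```python
# Hypothetical two-replica model for illustration only; not a real database API.

class Replica:
    """A replica holding one versioned value per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}            # key -> (version, value)
        self.partitioned = False  # True while cut off from the other replicas

    def apply(self, key, version, value):
        # Keep only the newest version seen for each key
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


def write(replicas, key, version, value):
    # Deliver the write to every replica the writer can still reach
    for replica in replicas:
        if not replica.partitioned:
            replica.apply(key, version, value)


def available_read(replica, key):
    # Availability first: always answer from local state, even during a partition
    entry = replica.data.get(key)
    return None if entry is None else entry[1]


def consistent_read(replica, key):
    # Consistency first (simplified): refuse rather than risk a stale value
    if replica.partitioned:
        raise RuntimeError(replica.name + " is partitioned; refusing a possibly stale read")
    return available_read(replica, key)


a, b = Replica("A"), Replica("B")
write([a, b], "x", 1, "old")
b.partitioned = True            # simulate a partition that isolates B
write([a, b], "x", 2, "new")    # only A receives version 2
print(available_read(a, "x"))   # new
print(available_read(b, "x"))   # old (stale, but the request is answered)
try:
    consistent_read(b, "x")
except RuntimeError as err:
    print("error:", err)
```

Replica B stays available by returning "old", which is exactly the stale-but-answered behavior the CAP definition permits.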

Key characteristics of Availability include:

Continuous responsiveness: the system aims to answer every request without indefinite delays.
Redundancy: multiple nodes or replicas handle requests, so failures of individual nodes do not prevent service.
Graceful degradation: the system may reduce functionality under heavy load or partial failure but remains operational.
Tradeoff with consistency: during partitions, maintaining availability may require returning data that is temporarily inconsistent.
Monitoring and recovery: automated health checks, failover, and load balancing help sustain availability in production.
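Redundancy, health checks, and failover from the characteristics above can be combined in a small routing sketch. Everything here is hypothetical and simplified: Node and LoadBalancer are illustrative classes, a health check just reads a flag instead of probing the network, and the balancer uses plain round-robin. When no node passes its check, the sketch fails fast with an explicit error rather than hanging.

```python
import itertools

# Hypothetical classes for illustration only; not a real load-balancer API.

class Node:
    """A service node whose health is modeled as a simple flag."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def health_check(self):
        # A real check would probe an endpoint; this one just reads the flag
        return self.healthy

    def handle(self, request):
        return self.name + " handled " + request


class LoadBalancer:
    """Round-robin balancer that skips nodes failing their health check."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._ring = itertools.cycle(nodes)

    def route(self, request):
        # Try each node at most once for this request (failover)
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node.health_check():
                return node.handle(request)
        # Fail fast with an explicit error instead of hanging
        return "503 Service Unavailable: no healthy nodes"


nodes = [Node("Node1"), Node("Node2"), Node("Node3")]
lb = LoadBalancer(nodes)
nodes[1].healthy = False  # Node2 fails its health check
for i in range(4):
    print(lb.route("req" + str(i)))
# Node1 handled req0
# Node3 handled req1
# Node1 handled req2
# Node3 handled req3
```

Round-robin is used only because it is the simplest policy to read; other selection policies would slot into route without changing the failover loop.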

Workflow example: In a replicated key-value store with three nodes, if one node fails, the remaining two nodes continue accepting reads and writes. Clients may receive slightly outdated values, but service is uninterrupted. Load balancers route requests to the surviving nodes, and replication ensures those nodes already hold copies of the data, so the system stays responsive while the failed node recovers.
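The workflow above can be simulated with a toy three-node store. This sketch assumes synchronous writes to every live replica, a global version counter, and a manual sync step standing in for real repair mechanisms; ReplicatedKV and its methods are hypothetical names. Staleness appears on a node that rejoins before it has caught up; asynchronous replication between live nodes, another common source of stale reads, is not modeled.

```python
# Hypothetical toy store for illustration only; not a real key-value database.

class KVNode:
    """One replica: a name, an up/down flag, and versioned key-value pairs."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}  # key -> (version, value)


class ReplicatedKV:
    """Writes go to every live replica; reads go to one named replica."""

    def __init__(self, names):
        self.nodes = {name: KVNode(name) for name in names}
        self.version = 0  # global write counter used to order versions

    def live(self):
        return [n for n in self.nodes.values() if n.up]

    def put(self, key, value):
        live = self.live()
        if not live:
            raise RuntimeError("no live replicas")
        self.version += 1
        for node in live:  # a failed replica simply misses this write
            node.store[key] = (self.version, value)

    def get(self, key, name):
        # Read from one named replica so divergence between replicas is visible
        node = self.nodes[name]
        if not node.up:
            raise RuntimeError(name + " is down")
        entry = node.store.get(key)
        return None if entry is None else entry[1]

    def fail(self, name):
        self.nodes[name].up = False

    def restart(self, name):
        # The node rejoins with whatever data it had before it failed
        self.nodes[name].up = True

    def sync(self, name):
        # Catch up by copying newer versions from the other live replicas
        node = self.nodes[name]
        for peer in self.live():
            if peer is node:
                continue
            for key, (ver, val) in peer.store.items():
                if key not in node.store or node.store[key][0] < ver:
                    node.store[key] = (ver, val)


kv = ReplicatedKV(["Node1", "Node2", "Node3"])
kv.put("x", "v1")
kv.fail("Node2")
kv.put("x", "v2")            # still accepted by Node1 and Node3
print(kv.get("x", "Node3"))  # v2
kv.restart("Node2")
print(kv.get("x", "Node2"))  # v1 (stale until it syncs)
kv.sync("Node2")
print(kv.get("x", "Node2"))  # v2
```

Because writes still succeed while Node2 is down, the store stays available; the price is that Node2 can briefly serve an outdated value after it returns.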

# Example: simplified availability check (Python)
def respond(message):
    # Stand-in for sending a reply to the client
    print(message)

nodes = ["Node1", "Node2", "Node3"]
failed_node = "Node2"
# Route requests only to nodes that are still up
available_nodes = [n for n in nodes if n != failed_node]
for node in available_nodes:
    respond("Request handled by " + node)
# Output:
# Request handled by Node1
# Request handled by Node3

Conceptually, Availability is like a 24/7 convenience store with multiple entrances: even if one entrance is blocked, customers can still access the store through other doors, keeping service continuous.

See Distributed Systems, CAP Theorem, Partition Tolerance, Consistency, Replication.

Software

Computing

Principle