CAP Theorem

Engineering definition of the CAP theorem covering consistency, availability, partition tolerance, CP/AP tradeoffs, stale data, conflicts and validation.

Definition

The CAP theorem states that during a network partition, a distributed data system cannot simultaneously provide strong consistency and full availability for all partitioned clients.

The CAP theorem is used when reasoning about replicated databases, distributed services, control-plane stores, caches, collaborative systems and edge platforms. A useful engineering review states the consistency model, availability promise, partition behavior, rejected operations, stale-read risk, conflict-resolution rule, recovery process, user impact and validation evidence. Partition tolerance is not usually an optional feature of a real distributed system; partitions are part of the failure model.

The practical question is what the system does when replicas cannot communicate.

The common phrase “choose two” is too loose for engineering use. A real distributed system should assume partitions can occur. The useful design choice is whether partitioned clients may continue receiving successful responses that could later conflict, or whether some requests are rejected to preserve a stronger consistency rule.

Terms

Consistency in CAP usually refers to a strong single-copy view, often close to linearizable behavior. If a write is accepted, later reads should not see an older conflicting state.

Availability means every request to a non-failed node receives a non-error response. It does not mean low latency or a correct business outcome.

Partition tolerance means the system has a defined behavior when messages between groups of nodes are delayed or lost. For design purposes it is treated as given rather than as a choice:

P=1

In a networked system, partition behavior is part of the fault model, not an optional checkbox.

Partition Geometry

For:

N

replicas, a majority quorum is:

Q=\left\lfloor\frac{N}{2}\right\rfloor+1

If a partition creates two sides:

n_1+n_2=N

then a quorum-preserving CP write can be accepted only on a side where:

n_i\geq Q

With a fixed majority quorum, 2Q > N, so the two sides cannot both satisfy the write rule at the same time.
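
The quorum rule above can be checked mechanically. The following is a minimal sketch; the function names majority_quorum and sides_with_quorum are illustrative and not taken from any particular system.

```python
def majority_quorum(n: int) -> int:
    """Smallest replica count that is a strict majority of n replicas."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n // 2 + 1


def sides_with_quorum(side_sizes: list[int]) -> list[bool]:
    """For each partition side, report whether it still holds a majority."""
    quorum = majority_quorum(sum(side_sizes))
    return [size >= quorum for size in side_sizes]


print(majority_quorum(7))             # 4
print(sides_with_quorum([4, 3]))      # [True, False]
print(sides_with_quorum([2, 2]))      # [False, False]
```

The last call shows that an even split of four replicas leaves no side able to accept quorum writes, which is one reason odd cluster sizes are commonly preferred.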

CP Behavior

A CP design preserves the stronger consistency rule during a partition by refusing some operations. A side without quorum may reject writes, serve only stale-safe reads, enter read-only mode or route users to a degraded response.

For:

N=5

the majority quorum is:

Q=3

If the partition is:

n_1=3,\quad n_2=2

only the side with three replicas can accept quorum writes. The two-replica side sacrifices availability for those writes.
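
A sketch of the write gate on each side of that partition, under simplifying assumptions: the class CPReplica, its reachable field and the QuorumUnavailable exception are hypothetical, and the code models only the accept-or-reject decision, not replication or leader election.

```python
from dataclasses import dataclass, field


class QuorumUnavailable(Exception):
    """Raised when a write cannot reach a majority of replicas."""


@dataclass
class CPReplica:
    cluster_size: int
    reachable: int  # replicas reachable from this node, including itself
    log: list[str] = field(default_factory=list)

    def write(self, command: str) -> None:
        quorum = self.cluster_size // 2 + 1
        if self.reachable < quorum:
            # Reject explicitly instead of reporting a success that may conflict.
            raise QuorumUnavailable(f"need {quorum}, reach {self.reachable}")
        self.log.append(command)


majority_side = CPReplica(cluster_size=5, reachable=3)
minority_side = CPReplica(cluster_size=5, reachable=2)

majority_side.write("set x=1")  # accepted: 3 >= 3
try:
    minority_side.write("set x=2")
except QuorumUnavailable as exc:
    print("rejected:", exc)  # rejected: need 3, reach 2
```

The explicit exception matters: a client on the minority side learns that the write did not happen and can retry, queue or surface the error, rather than assuming success.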

AP Behavior

An AP design keeps accepting operations on reachable replicas during a partition. This improves local availability but creates reconciliation work after communication returns.

If both sides accept writes for the same logical item:

accept(n_1)=1,\quad accept(n_2)=1

then the system needs a conflict rule, merge rule, compensation rule or operator decision after the partition heals.

CRDTs are one way to make selected data structures converge under AP behavior. They do not make every business invariant safe.
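
As an illustration of convergence under AP behavior, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names are illustrative. Each replica increments only its own slot, and merge takes the per-replica maximum, so merging in any order gives the same result.

```python
class GCounter:
    """Grow-only counter: one non-decreasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum is commutative, associative and idempotent.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)  # accepted on one side of the partition
b.increment(2)  # accepted on the other side
a.merge(b)      # partition heals; state is exchanged in both directions
b.merge(a)
print(a.value(), b.value())  # 5 5
```

The counter converges, but it cannot enforce a rule such as "stock never drops below zero". An invariant of that kind still needs coordination or compensation.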

Staleness and Conflict Exposure

If the partition lasts:

T_p

repair and propagation take:

T_r

and clients can observe old data until propagation finishes, a simple stale-exposure window is:

T_s=T_p+T_r

For overlapping writes at rate:

\lambda_w

and probability:

p_c

that a write conflicts with a concurrent write on the other side of the partition, a first-order screen for the expected conflict count is:

E[C]=\lambda_w T_p p_c

For:

\lambda_w=12\ \text{writes/s},\quad T_p=45\ \text{s},\quad p_c=0.02

the expected conflict count is:

E[C]=12\cdot45\cdot0.02=10.8

This is neither a prediction nor a guarantee. It is a planning screen for whether automatic merge, user reconciliation or operation rejection is credible.
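
The two screens above are simple enough to keep as small helpers in a review notebook. This is a sketch with illustrative function names. The 15-second repair time is an assumed value for demonstration only and does not come from the example above.

```python
def stale_window_s(partition_s: float, repair_s: float) -> float:
    """T_s = T_p + T_r: how long clients may observe old data."""
    return partition_s + repair_s


def expected_conflicts(write_rate: float, partition_s: float, p_conflict: float) -> float:
    """E[C] = lambda_w * T_p * p_c: a planning screen, not a forecast."""
    if not 0.0 <= p_conflict <= 1.0:
        raise ValueError("p_conflict must be a probability")
    return write_rate * partition_s * p_conflict


# Values from the worked example: 12 writes/s, 45 s partition, p_c = 0.02.
print(f"{expected_conflicts(12, 45, 0.02):.1f}")  # 10.8

# Assumed repair time of 15 s, for illustration only.
print(stale_window_s(45, 15))  # 60
```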

Boundary With Consensus

Consensus algorithms are usually used to implement CP behavior for a replicated state machine or metadata service. They can preserve one agreed command order, but they may reject or delay work when quorum is unavailable.

CAP is not a replacement for consensus, quorum or leader election. It is a way to reason about the product promise when those mechanisms face partitions. A system can use consensus internally and still expose stale reads from caches if the read path is not covered by the same consistency contract.

Boundary With Availability Engineering

Availability in CAP is narrower than site reliability availability. A service can return a fast HTTP success while violating the consistency requirement. It can also preserve consistency by returning a clear error, which is unavailable in the CAP sense but may be operationally safer.

The right choice depends on the consequences of an error. A collaborative note may prefer availability and merge. A bank transfer, scarce inventory command, actuator authority transfer or schema migration may need consistency and explicit rejection during a partition.

Validation

Validation should include network partitions, asymmetric partitions, delayed healing, stale reads, conflicting writes, duplicate submissions, retry behavior, cache behavior, failover interaction, client timeout behavior, reconciliation, operator procedures and metrics that distinguish accepted, rejected, stale and merged operations.

Useful evidence includes partition test traces, quorum state, read consistency checks, write rejection counts, stale-read age, conflict count, merge outcome, reconciliation time, user-visible error behavior, retry rate, data-loss checks and post-heal invariant validation.
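
One way to keep accepted, rejected, stale and merged operations distinguishable in test output is to tally them by explicit outcome rather than by HTTP status. This is a minimal sketch; the Outcome enum, the summarize helper and the sample observations are made up for illustration.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    STALE = "stale"
    MERGED = "merged"


def summarize(outcomes: list[Outcome]) -> dict[str, int]:
    """Count each outcome, reporting zero for outcomes never observed."""
    counts = Counter(outcomes)
    return {outcome.value: counts[outcome] for outcome in Outcome}


# Illustrative observations from a hypothetical partition test run.
observed = [
    Outcome.ACCEPTED,
    Outcome.ACCEPTED,
    Outcome.REJECTED,
    Outcome.STALE,
    Outcome.MERGED,
]
print(summarize(observed))
# {'accepted': 2, 'rejected': 1, 'stale': 1, 'merged': 1}
```

Reporting zero counts explicitly makes it visible when a test run never exercised a path, for example when no write was ever rejected.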

Failure Modes

Common failure modes include treating CAP as a slogan, claiming partition tolerance without testing partitions, hiding rejected writes as success, serving stale cache data while claiming strong consistency, merging non-mergeable business operations, using timestamps as a conflict policy without causal evidence, and ignoring the user impact of read-only or degraded modes.
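
On the timestamp point: wall-clock ordering cannot tell whether two writes were concurrent. A version vector is one common source of causal evidence. The sketch below uses an illustrative function name and represents a vector as a plain dict from node ID to counter; it classifies two versions as ordered, equal or concurrent.

```python
def compare_versions(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors; missing entries count as zero."""
    nodes = a.keys() | b.keys()
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither saw the other: a real conflict
    if a_ahead:
        return "a_after_b"   # a causally includes b; a can replace b
    if b_ahead:
        return "b_after_a"
    return "equal"


print(compare_versions({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))  # a_after_b
print(compare_versions({"n1": 2}, {"n2": 1}))                    # concurrent
```

Only the "concurrent" case needs a merge rule, compensation or an operator decision. A last-writer-wins policy based on timestamps would silently discard one of the two writes in that case.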

The engineering value of CAP is not the acronym. It is the forced question: during a partition, which operations are allowed to succeed, which are rejected, how stale can observations become, and what evidence proves that recovery preserves the invariants that matter?
