Glossary term
Two-Phase Commit
Engineering definition of two-phase commit covering prepare and commit phases, atomic decision, coordinator failure, blocking, latency and validation.
Definition
Two-phase commit is an atomic commit protocol in which a coordinator first asks participants to prepare, then issues a final commit or abort decision.
Two-phase commit appears in distributed databases, transaction managers, storage systems and tightly coupled service architectures when multiple participants must make one all-or-nothing decision. A useful design states the coordinator, participant set, durable prepare record, decision log, timeout behavior, lock-holding period, recovery rule, blocking risk, operational limits and validation evidence.
The purpose of two-phase commit is to make several participants reach one all-or-nothing transaction outcome.
Two-phase commit is useful when the system needs atomic visibility across multiple durable resources. It is also expensive and operationally strict: participants may hold locks while waiting, the coordinator decision must be durable, and some failures leave participants in an in-doubt state until recovery completes.
Transaction Model
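Let the distributed transaction be $\mathcal{T}$,
and let the participant set be $P = \{p_1, p_2, \dots, p_n\}$.
The atomicity requirement is that every participant eventually reaches the same final decision:
$$d_i = d_j$$
for all $i, j \in \{1, \dots, n\}$,
where $d_i \in \{\text{commit}, \text{abort}\}$ is the final decision applied by participant $p_i$.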
Phase 1: Prepare
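The coordinator sends a prepare request. Each participant decides whether it can commit locally. If it votes yes, it must force enough state to durable storage to commit later even after a crash:
$$v_i = \text{yes} \implies \text{the prepare record of } p_i \text{ is on durable storage before } v_i \text{ is sent}$$
where $v_i$ is the vote of participant $p_i$.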
A yes vote is a promise. After a participant votes yes, it should not unilaterally abort unless the protocol and recovery rules explicitly allow it.
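A minimal Python sketch of the participant side of prepare, assuming a hypothetical `Participant` class that keeps its prepare record as JSON lines in a local file. What it illustrates is ordering: the record is flushed and `os.fsync`-ed before the yes vote is returned, while a participant that cannot commit votes no without writing anything. A real resource manager would also force the redo and undo information it needs to finish either way.

```python
import json
import os
import tempfile


class Participant:
    """Hypothetical participant: forces a prepare record to disk before voting yes."""

    def __init__(self, name: str, log_path: str) -> None:
        self.name = name
        self.log_path = log_path
        self.state = "active"

    def _force_log(self, record: dict) -> None:
        # Append one JSON line, flush Python's buffer, then fsync the file contents.
        # A production log would also fsync the parent directory after creating the file.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def prepare(self, txn_id: str, can_commit: bool) -> str:
        if not can_commit:
            # Before voting yes, a local abort is always allowed.
            self.state = "aborted"
            return "no"
        # The durable prepare record must exist BEFORE the yes vote leaves this node.
        self._force_log({"txn": txn_id, "participant": self.name, "state": "prepared"})
        self.state = "prepared"
        return "yes"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Participant("p1", os.path.join(d, "p1.log"))
        print(p.prepare("t1", can_commit=True))  # yes
```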
Phase 2: Decision
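The coordinator commits only if every participant voted yes. Writing $v_i$ for the vote of participant $p_i$ and $D$ for the global decision:
$$D = \text{commit} \iff v_i = \text{yes for all } i = 1, \dots, n$$
If any participant votes no, fails to vote before the prepare timeout, or cannot guarantee durability, the coordinator decides abort:
$$\exists\, i : v_i \neq \text{yes} \implies D = \text{abort}$$
The coordinator records the final decision durably before notifying any participant:
$$\text{log}_c(D) \prec \text{send}(D, p_i) \quad \text{for every participant } p_i$$
where $\prec$ means "completes before".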
Participants then commit or abort their local work and release resources.
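The decision rule and its ordering constraint can be sketched in Python as follows. `decide` and `run_decision_phase` are hypothetical names; a missing entry in `votes` stands for a participant that timed out, and the `log_decision` and `send` callbacks stand in for a forced write to the coordinator's decision log and for the network. The only ordering the sketch enforces is the one the protocol requires: the decision is logged before any participant is told.

```python
from typing import Callable, Dict, List, Optional


def decide(participants: List[str], votes: Dict[str, Optional[str]]) -> str:
    """Commit only if every participant voted 'yes'.

    A 'no' vote, or a missing vote (the participant timed out), forces abort.
    """
    if all(votes.get(p) == "yes" for p in participants):
        return "commit"
    return "abort"


def run_decision_phase(
    txn_id: str,
    participants: List[str],
    votes: Dict[str, Optional[str]],
    log_decision: Callable[[str, str], None],
    send: Callable[[str, str, str], None],
) -> str:
    decision = decide(participants, votes)
    # Durable decision record first: recovery depends on finding it after a crash.
    log_decision(txn_id, decision)
    # Only then notify participants.
    for p in participants:
        send(p, txn_id, decision)
    return decision


if __name__ == "__main__":
    events = []
    d = run_decision_phase(
        "t1", ["a", "b"], {"a": "yes", "b": "yes"},
        log_decision=lambda t, dec: events.append(("log", t, dec)),
        send=lambda p, t, dec: events.append(("send", p, dec)),
    )
    print(d, events[0][0])  # commit log
```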
Blocking Behavior
Two-phase commit can block. If a participant has entered prepared state and then loses contact with the coordinator before receiving the final decision, it may not know whether the global decision was commit or abort.
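The in-doubt condition for participant $p_i$, with vote $v_i$ and global decision $D$, is:
$$\text{in-doubt}_i \iff v_i = \text{yes} \;\wedge\; p_i \text{ has not yet learned } D$$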
During this period, the participant may need to retain locks, undo records, redo records or reserved resources. The practical risk is not only latency; it is operational stall and resource retention.
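The asymmetry between a participant that has not voted and one that has voted yes can be sketched as two small state functions. The state names and the `coordinator_decision` argument are hypothetical, and the sketch assumes for simplicity that the only way out of doubt is to learn the coordinator's logged decision, which is exactly why a prepared participant blocks while the coordinator is unreachable.

```python
from typing import Optional


def on_coordinator_timeout(state: str) -> str:
    """Participant reaction when the coordinator stops responding."""
    if state == "active":
        # No yes vote was sent, so the coordinator cannot decide commit: abort is safe.
        return "aborted"
    if state == "prepared":
        # Voted yes: the global decision may already be commit, so it must not guess.
        return "in_doubt"
    return state  # "committed" and "aborted" are final


def resolve(state: str, coordinator_decision: Optional[str]) -> str:
    """Leave in-doubt only once the coordinator's logged decision is known."""
    if state != "in_doubt":
        return state
    if coordinator_decision is None:
        return "in_doubt"  # still blocked; locks and undo/redo records stay held
    return "committed" if coordinator_decision == "commit" else "aborted"
```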
Latency Cost
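A simplified latency screen is:
$$T_{2PC} \approx T_{\text{prepare}} + T_{\text{log},p} + T_{\text{log},c} + T_{\text{decision}}$$
where $T_{\text{prepare}}$ and $T_{\text{decision}}$ include network round trips and coordination overhead, while $T_{\text{log},p}$ and $T_{\text{log},c}$ represent durable logging on participants and coordinator.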
For:
and:
the screened commit latency is:
before application processing and queueing delay.
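The screen is a plain sum, and a short Python sketch makes one refinement visible: when prepare requests go out in parallel, the prepare phase (round trip plus forced participant log write) is set by the slowest participant, not the average one. The function names and all input values below are hypothetical illustrations, not measurements.

```python
from typing import List


def screened_commit_latency_ms(t_prepare: float, t_log_p: float,
                               t_log_c: float, t_decision: float) -> float:
    """T_2PC ~ T_prepare + T_log,p + T_log,c + T_decision (all in milliseconds)."""
    return t_prepare + t_log_p + t_log_c + t_decision


def parallel_prepare_ms(rtt_ms: List[float], log_p_ms: List[float]) -> float:
    """Prepare phase with parallel fan-out: the coordinator waits for the
    slowest participant's round trip plus its forced log write."""
    return max(rtt + log for rtt, log in zip(rtt_ms, log_p_ms))


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(screened_commit_latency_ms(2.0, 1.0, 1.0, 2.0))         # 6.0
    print(parallel_prepare_ms([1.0, 1.0, 3.0], [1.0, 0.5, 0.5]))  # 3.5
```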
Availability Screen
If the coordinator has availability $A_c$
and each required participant $p_i$ has availability $A_{p_i}$,
a rough independent availability screen is:
$$A_{2PC} \approx A_c \prod_{i=1}^{n} A_{p_i}$$
For one coordinator and four participants each at:
the transaction availability screen is:
This is only a first-pass screen. Correlated failures, network partitions, storage stalls and overload can dominate the result.
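The independent-failure screen is a product, which a few lines of Python can evaluate (`math.prod` needs Python 3.8 or later). The function name and the 99.9% figures are hypothetical, and the independence assumption is exactly what correlated failures break.

```python
import math
from typing import Sequence


def availability_screen(a_coordinator: float, a_participants: Sequence[float]) -> float:
    """A_2PC ~ A_c * product of A_p_i, assuming independent failures."""
    return a_coordinator * math.prod(a_participants)


if __name__ == "__main__":
    # Hypothetical: coordinator and every participant at 99.9% availability.
    for n in (1, 4, 8):
        print(n, round(availability_screen(0.999, [0.999] * n), 4))
```

Each added participant multiplies in another factor below one, so the screen only falls as the participant set grows.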
Lock Holding
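Prepared participants often hold locks until the final decision is known. If local work time is $T_{\text{work}}$
and two-phase commit adds $T_{2PC}$,
then minimum lock holding is approximately:
$$T_{\text{lock}} \approx T_{\text{work}} + T_{2PC}$$
If coordinator recovery takes $T_{\text{recover}}$,
an in-doubt participant may hold resources for:
$$T_{\text{lock,in-doubt}} \approx T_{\text{work}} + T_{2PC} + T_{\text{recover}}$$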
which can create lock contention, deadlock symptoms, queue growth and timeout cascades.
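A small Python sketch of the two lock-hold estimates, with one added step that the text does not state: by Little's law, the average number of transactions holding locks at once equals the arrival rate times the average hold time. That is why a long coordinator recovery turns directly into contention and queue growth. The function names and numbers are hypothetical.

```python
def lock_hold_ms(t_work: float, t_2pc: float, t_recover: float = 0.0) -> float:
    """Lock hold time: local work + 2PC overhead (+ coordinator recovery if in doubt)."""
    return t_work + t_2pc + t_recover


def concurrent_lock_holders(arrivals_per_s: float, hold_ms: float) -> float:
    """Little's law: average concurrent holders = arrival rate * average hold time."""
    return arrivals_per_s * hold_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical: 10 ms local work, 6 ms of 2PC overhead, 200 transactions/s.
    normal = lock_hold_ms(10.0, 6.0)
    in_doubt = lock_hold_ms(10.0, 6.0, t_recover=30_000.0)  # 30 s coordinator recovery
    print(normal, concurrent_lock_holders(200.0, normal))  # 16.0 3.2
    print(in_doubt)  # 30016.0
```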
Boundary With Saga and Outbox
Two-phase commit tries to preserve one atomic transaction across participants. A saga commits local steps and uses compensating actions for later failure. A transactional outbox commits local state and event intent together, then publishes later.
The choice is not stylistic. Use two-phase commit only when participants can support durable prepare, blocking is acceptable, and atomic visibility is required. Prefer saga or outbox-style patterns when services own independent data, long-running locks are unacceptable, or compensation and eventual consistency are acceptable engineering tradeoffs.
Validation
Validation should include participant crash before prepare, participant crash after prepare, coordinator crash before decision, coordinator crash after decision log, delayed decision messages, duplicate decisions, lost acknowledgements, recovery from in-doubt state, timeout behavior, lock retention, partitioned coordinator, storage-log failure and operator recovery.
Useful evidence includes coordinator logs, participant prepare records, final decision replay, lock-hold distributions, in-doubt transaction count, recovery time, abort reason, timeout count, queue depth, downstream retry rate and proof that participants never split between commit and abort for the same transaction.
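Part of that evidence can be checked mechanically. The Python sketch below assumes a hypothetical record format `(txn_id, participant, state)` exported from participant logs in write order. It keeps the last state per participant, flags any transaction whose participants finished with both commit and abort, and lists transactions still in doubt. It checks observed outcomes only and does not prove the protocol correct.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (txn_id, participant, state)


def audit(records: Iterable[Record]) -> Tuple[Set[str], Set[str]]:
    """Return (split transactions, in-doubt transactions).

    state is one of 'prepared', 'committed', 'aborted'; records are assumed to be
    in log order, so a later record for the same (txn, participant) wins.
    """
    latest: Dict[Tuple[str, str], str] = {}
    for txn, participant, state in records:
        latest[(txn, participant)] = state

    outcomes: Dict[str, Set[str]] = defaultdict(set)
    in_doubt: Set[str] = set()
    for (txn, _participant), state in latest.items():
        if state == "prepared":
            in_doubt.add(txn)  # voted yes, final decision not yet applied
        else:
            outcomes[txn].add(state)

    split = {txn for txn, seen in outcomes.items() if len(seen) > 1}
    return split, in_doubt
```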
Failure Modes
Common failure modes include assuming a participant can prepare when it cannot durably recover, losing the coordinator decision log, timing out a prepared participant into a unilateral abort, holding locks across slow external calls, letting in-doubt transactions accumulate, retrying ambiguous commits without idempotency, treating 2PC as a general microservice pattern, and validating only the happy path where all participants answer immediately.
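One of those failure modes, retrying ambiguous commits without idempotency, has a compact guard: apply each transaction's decision at most once, keyed by transaction id, and treat a repeated identical decision as a no-op. The `DecisionApplier` class is a hypothetical Python sketch that keeps its applied map in memory. A real participant would persist that map together with the local commit so the guard survives restarts.

```python
from typing import Dict


class DecisionApplier:
    """Hypothetical participant-side guard: each txn decision takes effect once."""

    def __init__(self) -> None:
        self.applied: Dict[str, str] = {}  # txn_id -> "commit" | "abort"
        self.side_effects = 0              # counts real commits, for illustration

    def apply(self, txn_id: str, decision: str) -> str:
        prior = self.applied.get(txn_id)
        if prior == decision:
            return "duplicate-ignored"     # redelivered decision: no new effect
        if prior is not None:
            # Two different decisions for one transaction means atomicity is broken.
            raise RuntimeError(f"conflicting decision for {txn_id}: {prior} then {decision}")
        self.applied[txn_id] = decision
        if decision == "commit":
            self.side_effects += 1
        return "applied"


if __name__ == "__main__":
    a = DecisionApplier()
    print(a.apply("t1", "commit"), a.apply("t1", "commit"), a.side_effects)
    # applied duplicate-ignored 1
```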
Two-phase commit is strongest when the participant set is small, tightly controlled and operationally prepared for blocking recovery. It is weakest when stretched across unreliable services, human-scale workflows, heterogeneous stores or networks where partitions and long pauses are normal.