Glossary term
Deadlock
Engineering definition of deadlock covering wait-for graphs, circular wait, resource ordering, detection, recovery and validation evidence.
Definition
phenomenonA deadlock is a condition in which tasks, threads or processes cannot make progress because each is waiting for a resource or event held by another member of the same waiting cycle.
Deadlock appears in operating systems, concurrent services, embedded firmware, distributed protocols, control gateways and resource schedulers when mutual exclusion, hold-and-wait, no preemption and circular wait combine. A useful analysis states the resources, owners, waiters, wait-for cycle, recovery behavior, timeout limits, watchdog behavior, state consistency and validation evidence.
A deadlock is a condition in which tasks, threads or processes cannot make progress because each is waiting for a resource or event held by another member of the same waiting cycle. The system may be busy, idle or partially responsive, but the blocked tasks cannot complete without external intervention or recovery logic.
Deadlocks appear in operating systems, concurrent services, embedded firmware, distributed protocols, control gateways and resource schedulers. They are correctness and availability failures, not only performance problems. A slow lock eventually releases; a true deadlock does not resolve by waiting under the same state.
Wait-For Graph
A deadlock can be represented with a wait-for graph:
where vertices are tasks, threads or processes, and an edge:
means task T_i is waiting for a resource held by task T_j.
A deadlock exists when the wait-for graph contains a directed cycle:
No task in the cycle can proceed because each required release depends on another blocked task.
Necessary Conditions
Classic deadlock requires four conditions to hold together:
- mutual exclusion: at least one resource cannot be shared at the same time;
- hold and wait: a task holds one resource while waiting for another;
- no preemption: the system cannot safely force release of the held resource;
- circular wait: tasks form a waiting cycle.
Breaking any one condition can prevent that class of deadlock. Engineering designs usually target circular wait with resource ordering, or hold-and-wait with acquire-all-or-release behavior.
Resource Ordering
If resources have a strict global order:
and every task acquires resources only in increasing order, then circular wait through those resources is prevented.
For a task requesting resources:
the rule requires:
before acquiring R_b while holding R_a. If a task needs the reverse order, it should release, reorder, or use a design that avoids holding both.
Detection and Recovery
Detection scans the wait-for graph for cycles. If cycle count is:
then:
indicates at least one deadlock candidate. Recovery may abort one task, roll back a transaction, release a resource, restart a component, trip a watchdog, enter degraded mode or move to a safe state.
Recovery time should fit the system objective:
If recovery leaves state inconsistent, the deadlock handling may create a second failure.
Worked Example
Three tasks hold and wait for three resources:
| Task | Holds | Waits for |
|---|---|---|
T_1 | R_A | R_B |
T_2 | R_B | R_C |
T_3 | R_C | R_A |
The wait-for edges are:
Together:
This directed cycle is a deadlock.
Suppose the detection interval is:
and rollback plus restart takes:
The allowed degraded recovery time is:
The recovery screen is:
The timing passes, but the release case still needs state-consistency evidence after rollback.
Prevention
Prevention techniques include global resource ordering, lock hierarchy checks, try-lock with release on failure, bounded wait with safe rollback, single-owner resource design, message passing, transactional isolation, lock-free data structures and eliminating nested resource acquisition.
Timeouts alone do not prove absence of deadlock. A timeout can break a wait, but it can also leave partial state, duplicate work or inconsistent ownership. A timeout policy must define what is rolled back, what is retried, what is cancelled and what evidence proves recovery.
Relationship To Neighbor Terms
Lock contention is waiting on a lock that eventually releases. Deadlock is cyclic waiting that cannot resolve under the same state. Race conditions are timing-dependent correctness failures caused by insufficient ordering; deadlock can happen even when locking is added to prevent races. Watchdog timers can detect stalled execution, but they are recovery mechanisms, not proof that deadlock cannot occur.
Validation Evidence
Validation should include lock-order review, static analysis where available, runtime deadlock detection, fault injection, cancellation tests, timeout tests, watchdog reset tests, rollback tests, retained diagnostic context and state-invariant checks after recovery. Evidence should identify the resources, owners, waiters, blocked stack traces and recovery path.
A useful deadlock test does not only wait for a timeout. It creates the risky acquisition pattern, proves the cycle is detected or prevented, and verifies that the system returns to a consistent state.
Common Mistakes
The most common mistake is treating every long wait as a deadlock. A slow dependency, saturated queue or contended lock may eventually release. Another mistake is relying on a watchdog reset without preserving enough context to identify the cycle. A third is adding locks to remove a race condition without checking resource ordering. A fourth is breaking a deadlock by killing work but not validating rollback.
A strong deadlock review states the resource graph, ordering rule, detection method, recovery action, timeout budget, safe-state behavior and validation evidence.