Glossary term

Lock Contention

Engineering definition of lock contention covering mutex wait, critical-section saturation, shared-state bottlenecks, tail latency and validation evidence.

Branch: Computer Engineering
Glossary type: phenomenon
Content: Glossary term
Updated: Jun 26, 2026
Revision: v1.0.0 · reviewed

Definition

Lock contention is the performance and latency degradation that occurs when multiple threads, tasks or processes wait for the same mutex, critical section or serialized resource.

Lock contention appears in operating systems, concurrent services, embedded firmware, real-time software, caches, queues, shared counters and transactional code when shared-state protection becomes a serialized bottleneck. A useful analysis states the critical-section duration, entry rate, lock utilization, wait-time distribution, owner behavior, priority interaction, cancellation behavior and validation evidence.

A lock may protect correctness, but under high demand it can become the resource that controls throughput and tail latency.

The phenomenon appears in operating systems, concurrent services, embedded firmware, real-time software, shared counters, caches, queues, schedulers and transactional code. It is not the same as a race condition. A lock can remove a race while creating a throughput bottleneck.

Critical-Section Demand

If each request enters a critical section:

m

times and the request arrival rate is:

\lambda

then critical-section entry demand is:

\lambda_{cs}=m\lambda

This demand should be compared with the rate at which the lock can serve critical sections.

Lock Capacity

If mean lock hold time is:

t_{cs}

then the first-pass service rate of the serialized lock is:

\displaystyle \mu_{lock}=\frac{1}{t_{cs}}

Lock utilization is:

\displaystyle \rho_{lock}=\frac{\lambda_{cs}}{\mu_{lock}}=\lambda_{cs}t_{cs}

A necessary stability condition is:

\rho_{lock}<1

As utilization approaches one, small changes in hold time or arrival rate can create large waiting-time increases.

Wait-Time Screen

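A simplified queueing screen for mean lock wait, based on a single-server M/M/1 model (Poisson arrivals, exponentially distributed hold times, first-come first-served), is: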

\displaystyle W_q\approx\frac{\rho_{lock}t_{cs}}{1-\rho_{lock}}

This model is not a substitute for profiling. Real locks have scheduler effects, priority interactions, cache coherence, preemption, interrupt masking, spinning, blocking and tail distributions. The screen is still useful because it shows why a lock with a short average hold time can dominate p99 latency when it is frequently entered.
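The screen can be written as a few lines of Python. This is a minimal sketch under the same simplifying assumptions as the formula; the function name `lock_wait_screen` and the sample inputs are illustrative and do not come from any library.

```python
import math


def lock_wait_screen(arrival_rate_per_s, entries_per_request, hold_time_s):
    """Return (utilization, mean_wait_s) for the simplified lock-wait screen."""
    entry_rate = entries_per_request * arrival_rate_per_s  # lambda_cs = m * lambda
    utilization = entry_rate * hold_time_s                 # rho_lock = lambda_cs * t_cs
    if utilization >= 1.0:
        # Demand meets or exceeds lock capacity: the screen predicts unbounded wait.
        return utilization, math.inf
    mean_wait_s = utilization * hold_time_s / (1.0 - utilization)
    return utilization, mean_wait_s


if __name__ == "__main__":
    # Illustrative inputs: 5000 requests/s, 2 lock entries per request.
    for hold_us in (35, 60, 95):
        rho, wait_s = lock_wait_screen(5000, 2, hold_us * 1e-6)
        print(f"t_cs={hold_us} us  rho={rho:.2f}  W_q={wait_s * 1e6:.1f} us")
```

Sweeping the hold time this way shows how steeply the predicted wait rises as utilization nears one. The result is a planning estimate to compare against measured wait times, not a replacement for them.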

Serial Fraction

Lock contention also limits parallel speedup. If the serialized fraction of a workload is:

f_s

then an Amdahl-style upper bound on speedup, approached only as the number of workers grows without limit, is:

\displaystyle S_{max}\approx\frac{1}{f_s}

For a request path with total time:

T_{req}

and serialized lock time:

T_{lock}=mt_{cs}

the lock serial fraction is:

\displaystyle f_s=\frac{T_{lock}}{T_{req}}

Adding worker threads cannot remove this serial fraction. It may increase contention if every worker reaches the same hot lock more often.
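The bound can be checked for a finite worker count with the standard Amdahl formula. The sketch below is illustrative: the function names and the 2 ms request time are assumptions, and it ignores the extra contention cost that real workers add.

```python
def lock_serial_fraction(entries_per_request, hold_time_s, request_time_s):
    """f_s = T_lock / T_req, with T_lock = m * t_cs."""
    return entries_per_request * hold_time_s / request_time_s


def amdahl_speedup(serial_fraction, workers):
    """Amdahl speedup for a given serial fraction and worker count."""
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # Hypothetical request: 2 lock entries of 60 us each in a 2 ms request.
    f_s = lock_serial_fraction(2, 60e-6, 2e-3)
    print(f"f_s={f_s:.3f}  bound={1.0 / f_s:.1f}x")
    for workers in (1, 4, 8, 32, 128):
        print(f"workers={workers:4d}  speedup={amdahl_speedup(f_s, workers):.2f}x")
```

Under these assumed numbers the serial fraction is 0.06, so no worker count can exceed a speedup of about 16.7, and most of that ceiling is already consumed at a few dozen workers.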

Worked Example

A service receives:

\lambda=5000\ \text{requests/s}

Each request enters the same lock:

m=2

times. Critical-section demand is:

\lambda_{cs}=2(5000)=10000\ \text{entries/s}

The measured hold time is:

t_{cs}=60\ \mu\text{s}=60\times10^{-6}\ \text{s}

Lock capacity is:

\displaystyle \mu_{lock}=\frac{1}{60\times10^{-6}}=16666.7\ \text{entries/s}

Utilization is:

\rho_{lock}=10000(60\times10^{-6})=0.60

The mean wait screen is:

\displaystyle W_q\approx\frac{0.60(60\ \mu\text{s})}{1-0.60}=90\ \mu\text{s}

Now suppose a logging change increases hold time to:

t_{cs}=95\ \mu\text{s}

Then:

\rho_{lock}=10000(95\times10^{-6})=0.95

and:

\displaystyle W_q\approx\frac{0.95(95\ \mu\text{s})}{1-0.95}=1805\ \mu\text{s}

A 35 µs increase in hold time raises the mean lock wait from 90 µs to about 1.8 ms in this simplified screen, roughly a 20-fold increase. The tail can be worse.

If the critical section is reduced to:

t_{cs}=35\ \mu\text{s}

then:

\rho_{lock}=10000(35\times10^{-6})=0.35

and:

\displaystyle W_q\approx\frac{0.35(35\ \mu\text{s})}{1-0.35}=18.8\ \mu\text{s}
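The 60 µs case can be cross-checked with a small simulation, which also shows how far the tail sits above the mean. The sketch below assumes exponentially distributed hold times and arrival gaps and first-come first-served ordering, and uses the Lindley recursion for a single-server queue. The function names, seed and sample count are illustrative choices.

```python
import random


def simulate_lock_waits(entry_rate_per_s, mean_hold_s, entries, seed=1):
    """Simulate FIFO waits for one lock using the Lindley recursion."""
    rng = random.Random(seed)
    waits = []
    wait = 0.0  # the first entry finds the lock free
    for _ in range(entries):
        waits.append(wait)
        hold = rng.expovariate(1.0 / mean_hold_s)   # hold time of this entry
        gap = rng.expovariate(entry_rate_per_s)     # time until the next entry arrives
        # Next wait = leftover work after the gap, never negative.
        wait = max(0.0, wait + hold - gap)
    return waits


def percentile(sorted_values, fraction):
    """Nearest-rank style percentile on an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    waits = simulate_lock_waits(10000, 60e-6, 200_000)
    mean_wait = sum(waits) / len(waits)
    p99_wait = percentile(sorted(waits), 0.99)
    print(f"mean wait ~ {mean_wait * 1e6:.0f} us, p99 wait ~ {p99_wait * 1e6:.0f} us")
```

The simulated mean should land near the 90 µs screen value, while the p99 wait is several times larger. A simulation like this only tests the model against its own assumptions; measured wait distributions from the real lock remain the evidence that matters.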

Causes

Common causes include one global mutex, shared counters, coarse-grained caches, single-writer queues, logging inside a lock, memory allocation while locked, I/O while locked, lock ordering mistakes, retry bursts, thread pools with too many workers, and priority inversion. A lock can also look contended because the thread that owns it is blocked on storage, the network, a page fault or garbage collection.

Controls

Mitigations include reducing critical-section duration, moving I/O out of the lock, sharding protected state, using per-thread or per-core counters, batching updates, using read-copy-update patterns, replacing shared mutation with message passing, changing lock scope, adding admission control or reducing worker concurrency.

Not every lock should be removed. Locks often preserve important invariants. The engineering question is whether the invariant needs one global serialized path, or whether it can be partitioned, staged, batched or validated another way.
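Sharding protected state, one of the mitigations listed above, can be sketched with a counter split across several locks. The class name `ShardedCounter` and the shard count are hypothetical. In CPython the global interpreter lock limits the parallel benefit, so the sketch shows the structure rather than a measured speedup.

```python
import itertools
import threading


class ShardedCounter:
    """Counter split into shards so threads rarely contend on the same lock."""

    def __init__(self, shards=16):
        self._locks = [threading.Lock() for _ in range(shards)]
        self._counts = [0] * shards
        self._local = threading.local()
        self._next_shard = itertools.count()

    def _shard_index(self):
        # Each thread is assigned a shard on first use and keeps it.
        index = getattr(self._local, "index", None)
        if index is None:
            index = next(self._next_shard) % len(self._locks)
            self._local.index = index
        return index

    def increment(self, amount=1):
        index = self._shard_index()
        with self._locks[index]:  # short critical section on one shard only
            self._counts[index] += amount

    def total(self):
        # Reads shard by shard; this is not an atomic snapshot across shards.
        running = 0
        for index, lock in enumerate(self._locks):
            with lock:
                running += self._counts[index]
        return running


if __name__ == "__main__":
    counter = ShardedCounter()

    def worker():
        for _ in range(1000):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print(counter.total())
```

The trade-off is visible in `total()`: writes become cheap and mostly uncontended, while a read must visit every shard and only sees an exact value once writers are quiescent. Whether that is acceptable depends on the invariant the original lock protected.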

Relationship To Neighbor Terms

Thread pool saturation concerns worker slots. Lock contention concerns a serialized synchronization resource that workers wait on. Cache false sharing can cause a similar loss of scaling, even in lock-free code, but it is a memory-layout problem, not a mutex-wait problem. Queue backpressure controls upstream work when queues grow. Timeout budgets define how much lock wait can fit before work becomes useless.

Validation Evidence

Validation should measure lock hold time, lock wait time, lock acquisition count, owner stack traces, waiter stack traces, p95 and p99 latency, runnable queue length, context switches, CPU spin time, blocked time, priority inversion, retry rate and behavior under burst traffic. Tests should include normal load, peak load, degraded dependency, logging enabled, cancellation, timeout and recovery.

The evidence should identify the specific lock and state protected. A report that says “lock contention is high” but cannot name the lock, owner path, hold-time distribution and protected invariant is not ready for a release decision.
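Hold time and wait time for a named lock can be collected with a thin wrapper. This is a minimal sketch, assuming Python's `threading.Lock` and `time.perf_counter`; the class name `TimedLock` and its fields are hypothetical, and production profilers gather the same data with far less overhead.

```python
import threading
import time


class TimedLock:
    """Lock wrapper that records per-acquisition wait and hold times."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.wait_s = []  # time from request to acquisition
        self.hold_s = []  # time from acquisition to release
        self._acquired_at = 0.0

    def __enter__(self):
        requested = time.perf_counter()
        self._lock.acquire()
        # Samples are recorded while the lock is held, so the lists are safe.
        self._acquired_at = time.perf_counter()
        self.wait_s.append(self._acquired_at - requested)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.hold_s.append(time.perf_counter() - self._acquired_at)
        self._lock.release()
        return False  # never swallow exceptions from the critical section

    def report(self):
        """Summarize samples; call only after worker threads have finished."""
        return {
            "lock": self.name,
            "acquisitions": len(self.wait_s),
            "max_wait_s": max(self.wait_s, default=0.0),
            "max_hold_s": max(self.hold_s, default=0.0),
        }


if __name__ == "__main__":
    lock = TimedLock("demo_cache_lock")

    def worker():
        for _ in range(50):
            with lock:
                time.sleep(0.0001)  # stand-in for protected work

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print(lock.report())
```

The recording itself runs inside the critical section and slightly lengthens the hold time, so the numbers are best treated as relative evidence. Keeping the full sample lists, rather than only averages, makes it possible to report the hold-time and wait-time distributions.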

Common Mistakes

The most common mistake is adding more threads to a workload limited by one hot lock. Another is measuring only CPU utilization while threads spend time waiting. A third is removing a lock without replacing the correctness invariant it protected. A fourth is optimizing average hold time while ignoring rare long hold events that dominate p99 latency.
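The fourth mistake can be illustrated with the Pollaczek-Khinchine mean-wait formula for a single-server queue with Poisson arrivals and general hold times, W_q = λ E[S²] / (2(1 − ρ)). This is a different screen from the one used earlier in this entry, although it reduces to the same expression when hold times are exponential. The function name and the two sample hold-time profiles below are hypothetical.

```python
import math


def pk_mean_wait(entry_rate_per_s, hold_samples_s):
    """Mean wait from the Pollaczek-Khinchine formula, using sampled hold times."""
    count = len(hold_samples_s)
    mean_hold = sum(hold_samples_s) / count
    second_moment = sum(h * h for h in hold_samples_s) / count
    utilization = entry_rate_per_s * mean_hold
    if utilization >= 1.0:
        return math.inf
    return entry_rate_per_s * second_moment / (2.0 * (1.0 - utilization))


if __name__ == "__main__":
    entry_rate = 10000  # entries/s, illustrative
    steady = [59.5e-6] * 100            # every hold takes 59.5 us
    spiky = [50e-6] * 99 + [1000e-6]    # same 59.5 us mean, one 1 ms hold in 100
    for label, samples in (("steady", steady), ("spiky", spiky)):
        wait_s = pk_mean_wait(entry_rate, samples)
        print(f"{label}: mean wait ~ {wait_s * 1e6:.1f} us")
```

Both profiles have the same mean hold time, yet the one with a rare 1 ms hold yields a mean wait more than three times larger under these assumptions, because the formula depends on the second moment of the hold time and not only its average.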

A strong lock-contention review states entry rate, hold time, wait time, owner path, protected invariant, mitigation, retest evidence and residual concurrency risk.
