Glossary term

Futex

Engineering definition of a futex covering user-space atomic fast paths, kernel wait/wake slow paths, contention cost, lost-wakeup avoidance and validation.

Branch: Computer Engineering
Glossary type: concept
Content: Glossary term
Updated: Jun 26, 2026
Revision: v1.0.0 · reviewed

Definition

concept

Futex-style primitives appear in operating systems, language runtimes, pthread-style locks and high-performance concurrent libraries when entering the kernel for every lock operation would be too expensive. A useful design states the memory word, expected value check, atomic operation, wait queue key, wake rule, timeout behavior, cancellation behavior, memory-ordering contract, contention metric and validation evidence.

A futex is a kernel-assisted synchronization primitive in which the uncontended fast path is handled with atomic operations in user space and the contended slow path uses kernel wait and wake operations. It is not a complete mutex by itself. It is a low-level mechanism that runtimes can use to build mutexes, condition variables, semaphores and other blocking primitives.

The engineering idea is simple: do not enter the kernel when no one needs to sleep. Use an atomic operation on a shared memory word for the common case. If the word shows contention, ask the kernel to block the thread until another thread changes the state and wakes waiters.

State Word

Let the futex word be:

u

and let the value expected by a waiting thread be:

v_{exp}

A wait operation is valid only if:

u=v_{exp}

at the moment the kernel checks the word. This expected-value check is what prevents a common lost-wakeup race. If another thread changed the state before the waiter actually slept, the wait should fail or return immediately rather than sleeping on stale information.

Fast and Slow Paths

The user-space path uses an atomic operation such as compare-and-swap, exchange or fetch-add. Let the cost of that path be:

C_{user}

and the cost of entering the kernel and returning be:

C_{sys}

The design target is:

C_{user}\ll C_{sys}

When contention probability is:

P_{slow}

the approximate expected acquisition cost is:

E[C]\approx (1-P_{slow})C_{user}+P_{slow}C_{sys}

Futex-style designs work best when most operations stay on the user-space path.

Wait and Wake

A wait operation usually associates the thread with a wait queue keyed by the address of the futex word. A wake operation releases one or more waiters. If the number of waiters woken is:

N_{wake}

and only:

N_{useful}

can acquire the resource, then extra wakeups are:

N_{extra}=N_{wake}-N_{useful}

Waking too many threads can create a thundering-herd effect. Waking too few can leave capacity unused or delay progress.

Memory Ordering

The futex word only coordinates sleeping and waking. It does not automatically define every memory-ordering rule for the protected data. The runtime must pair the atomic state transitions with acquire and release semantics appropriate for the language and processor.

For protected state:

S_p

a successful lock acquisition should observe writes made before the prior unlock:

S_{unlock}\rightarrow S_{lock}

If the atomic operations are too weak, the wait/wake protocol may look correct while readers still observe stale or reordered data.

Timeout and Cancellation

Blocking on a futex should be integrated with timeout and cancellation rules. Let the caller’s remaining timeout budget be:

B_t

and the observed slow-path wait be:

T_{wait}

The call remains useful only while:

T_{wait}\leq B_t

If timeout or cancellation occurs, the caller must know whether it owns the resource, whether it must retry the state check, and whether any wake responsibility remains.

Relationship To Higher-Level Locks

A mutex can use a futex internally, but a futex does not provide owner tracking, recursive-lock rules, priority inheritance, robust owner-death handling or lock-order diagnostics unless the runtime builds those features. A condition variable can use futex-style waiting, but the protected predicate and associated mutex are still separate design obligations.

This distinction matters during debugging. Seeing futex wait time in a trace does not automatically identify the high-level bug. The root cause may be a long mutex hold, a missed condition signal, unfair wake policy, priority inversion, thread-pool saturation or a slow owner blocked in I/O.

Failure Modes

Common failure modes include waiting on the wrong word, changing the state without a matching wake, waking before publishing state, using weak memory ordering, waking too many waiters, ignoring spurious wakeups, losing timeout ownership, priority inversion in the higher-level lock, and treating futex wait time as the cause rather than as evidence of contention.

The expected-value check should be reviewed carefully. If user space checks the word and then enters the kernel without an atomic check-and-sleep protocol, a wake can occur between those steps and the thread can sleep forever.

Worked Check

Suppose an uncontended atomic fast path costs:

C_{user}=60\ \text{ns}

and a contended kernel slow path costs:

C_{sys}=2.8\ \mu\text{s}

If slow-path probability is:

P_{slow}=0.04

then:

E[C]\approx 0.96(60\ \text{ns})+0.04(2800\ \text{ns})=169.6\ \text{ns}

If a burst wakes:

N_{wake}=8

threads but only one can acquire the lock, then:

N_{extra}=8-1=7

The cost model may still look good on average, but the wake policy can create short bursts of runnable work and tail latency.

Validation Evidence

Useful evidence includes fast-path hit ratio, futex wait counts, wake counts, timeout counts, spurious wake handling, owner stack traces from the higher-level lock, memory-ordering review, contention histograms, priority-inversion checks, cancellation tests and p99 latency under burst load.

A strong release claim states which high-level primitive uses the futex, what memory word encodes the state, how lost wakeups are prevented, how many waiters are woken, and how slow-path evidence maps to user-visible latency or deadline risk.

REF

Disciplines