Glossary term

Request Coalescing

Engineering definition of request coalescing covering single-flight work, duplicate request suppression, cache stampede control, waiters and validation evidence.

Definition

Request coalescing is a method that combines identical in-flight requests so one leader performs the expensive work while duplicate callers wait, reuse the result or receive a controlled fallback.

Request coalescing is used in distributed services, caches, API gateways, message consumers and data systems to suppress duplicate work during cache misses, retries, startup, recovery and thundering-herd events. A useful design defines the coalescing key, payload equivalence rule, leader election, waiter timeout, cancellation behavior, error propagation, stale response rule, origin protection and validation evidence.

Request coalescing is also called single-flight work or request collapsing.

The method is useful when many callers ask for the same thing at nearly the same time: a cache key expires, a service recovers, a popular object is missing, a token must be refreshed, or clients retry after an outage. Without coalescing, every caller may trigger the same database query, render, remote call or regeneration job.

Coalescing Key

The design starts with a coalescing key:

K_c

Only requests that are equivalent for the protected work should share a key. The key may include tenant, authorization scope, object id, query parameters, version, locale, feature flag or payload fingerprint.

Key granularity is a trade-off. If the key is too broad, requests that are not semantically identical share one result, and coalescing can leak data or return the wrong response. If the key is too narrow, equivalent requests land on different keys and duplicate work remains.
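A minimal sketch of one way to build such a key, assuming the protected result depends only on tenant, authorization scope, object id, query parameters and a version tag. The function name `coalescing_key` and the field names are hypothetical and chosen for illustration; a real service would include every input that changes the result.

```python
import hashlib
import json


def coalescing_key(tenant, auth_scope, object_id, params, version="v1"):
    """Build a deterministic key from every field that affects the result."""
    # Serialize with sorted keys so that equivalent requests with a
    # different parameter order map to the same key.
    canonical = json.dumps(
        {
            "tenant": tenant,
            "scope": auth_scope,
            "object": object_id,
            "params": params,
            "version": version,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same request, different parameter order: these share a key.
same_a = coalescing_key("acme", "read:reports", "r-17", {"page": 1, "lang": "en"})
same_b = coalescing_key("acme", "read:reports", "r-17", {"lang": "en", "page": 1})

# Different tenant: this must not share a key with the requests above.
other_tenant = coalescing_key("globex", "read:reports", "r-17", {"page": 1, "lang": "en"})
```

Hashing a canonical serialization keeps the key a fixed size. Including tenant and scope in the hashed fields is what prevents one caller from receiving a result computed for another.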

Leader and Waiters

For:

N

identical in-flight requests, request coalescing elects or records one leader:

N_{origin}=1

and the remaining requests become waiters:

N_{wait}=N-1

The leader performs the origin work. Waiters either wait for the same result, receive stale data, receive a degraded response, or time out according to the service contract.
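The leader and waiter roles can be sketched with threads as follows. This is an illustrative in-process version, not a reference implementation: the class name `SingleFlight`, the method `do` and the demo function `load_from_origin` are assumptions. The first caller for a key becomes the leader, later callers wait on an event, and the leader's result or error is shared with all of them.

```python
import threading
import time


class SingleFlight:
    """Run fn once per key while a call is in flight; share the outcome."""

    def __init__(self):
        # Guards only the in-flight map; never held during origin work.
        self._lock = threading.Lock()
        self._calls = {}  # key -> record of the in-flight call

    def do(self, key, fn, wait_timeout=None):
        with self._lock:
            call = self._calls.get(key)
            is_leader = call is None
            if is_leader:
                call = {"done": threading.Event(), "result": None, "error": None}
                self._calls[key] = call

        if is_leader:
            try:
                call["result"] = fn()
            except Exception as exc:
                call["error"] = exc  # shared with waiters below
            finally:
                with self._lock:
                    del self._calls[key]  # later arrivals start a new flight
                call["done"].set()
        elif not call["done"].wait(wait_timeout):
            # The waiter deadline expired before the leader finished.
            raise TimeoutError(f"waiter timed out on key {key!r}")

        if call["error"] is not None:
            raise call["error"]
        return call["result"]


origin_calls = 0


def load_from_origin():
    global origin_calls
    origin_calls += 1   # only a leader runs this function
    time.sleep(0.3)     # simulated expensive regeneration
    return "fresh-value"


flight = SingleFlight()
n_callers = 20
results = []
start = threading.Barrier(n_callers)  # release all callers together


def caller():
    start.wait()
    results.append(flight.do("hot-key", load_from_origin, wait_timeout=2.0))


threads = [threading.Thread(target=caller) for _ in range(n_callers)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"callers={len(results)} origin_calls={origin_calls}")
```

In this sketch a waiter that times out raises `TimeoutError`. A production design would map that to stale data or a degraded response, as the service contract requires.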

Work Reduction

Without coalescing, origin work count is:

N_{origin,raw}=N

With single-flight coalescing for one key and one regeneration window:

N_{origin,coalesced}=1

The work-reduction factor is:

\displaystyle R_c=\frac{N_{origin,raw}}{N_{origin,coalesced}}=N

This large reduction applies only to truly duplicate work. It does not reduce unique requests or independent keys.

Waiter Deadline

Waiters cannot wait forever. If regeneration time is:

T_{regen}

and a waiter has remaining deadline:

T_{waiter}

then waiting is acceptable only if:

T_{regen}+T_{return}+T_{margin}\leq T_{waiter}

If the inequality fails, the waiter should receive stale data, a degraded response, rejection or a separate fallback rather than silently timing out.
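The deadline rule above can be expressed as a small decision function. This is a sketch under stated assumptions: the names `WaiterAction` and `waiter_action` are hypothetical, all times are in seconds, and the choice between stale data and rejection is reduced to a single `has_stale` flag.

```python
from enum import Enum


class WaiterAction(Enum):
    WAIT = "wait"
    SERVE_STALE = "serve_stale"
    REJECT = "reject"


def waiter_action(t_regen, t_return, t_margin, t_waiter, has_stale):
    """Apply T_regen + T_return + T_margin <= T_waiter (all in seconds)."""
    if t_regen + t_return + t_margin <= t_waiter:
        return WaiterAction.WAIT
    # The waiter cannot afford the wait: choose an explicit fallback
    # instead of letting the request time out silently.
    return WaiterAction.SERVE_STALE if has_stale else WaiterAction.REJECT


# Illustrative values only; the 0.10 / 0.05 split is an assumption.
fits = waiter_action(0.40, 0.10, 0.05, 1.0, has_stale=True)
too_slow = waiter_action(0.95, 0.10, 0.05, 1.0, has_stale=True)
no_stale = waiter_action(0.95, 0.10, 0.05, 1.0, has_stale=False)
```

Using an estimate such as the regeneration p99 for `t_regen` makes the check conservative, because a waiter that joins after the leader has started usually waits for less than the full regeneration time.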

Worked Example

A hot cache key expires. The number of callers that request the same key during the burst is:

N=6000

They arrive inside:

\Delta t=0.5\ \text{s}

Without coalescing, origin request rate is:

\displaystyle \lambda_{origin,raw}=\frac{6000}{0.5}=12000\ \text{requests/s}

With coalescing for that key, origin work count during the regeneration window becomes:

N_{origin,coalesced}=1

If regeneration takes:

T_{regen}=0.40\ \text{s}

the origin work rate for that key, even if regenerations ran back to back, is at most:

\displaystyle \lambda_{origin,coalesced}=\frac{1}{0.40}=2.5\ \text{requests/s}

The number of waiters is:

N_{wait}=6000-1=5999

The work-reduction factor is:

\displaystyle R_c=\frac{6000}{1}=6000

Now check waiter timing. Suppose waiter remaining deadline is:

T_{waiter}=1.0\ \text{s}

Return and margin allowance is:

T_{return}+T_{margin}=0.15\ \text{s}

The wait margin is:

M_T=1.0-0.40-0.15=0.45\ \text{s}

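The coalesced design passes this deadline check with 0.45 s to spare. The check still holds, with no spare time, when regeneration takes:

0.85\ \text{s}

because:

0.85+0.15=1.0\ \text{s}

uses the whole waiter deadline. If regeneration p99 rose above that value, waiters would need a fallback.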
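The arithmetic of this example can be reproduced with a short script. The variable names are illustrative; the input values are the ones given in the example.

```python
n_callers = 6000
burst_window_s = 0.5
t_regen_s = 0.40
t_waiter_s = 1.0
t_return_plus_margin_s = 0.15

raw_rate = n_callers / burst_window_s    # origin requests/s without coalescing
coalesced_rate = 1 / t_regen_s           # one regeneration per T_regen at most
n_wait = n_callers - 1                   # every caller except the leader
reduction = n_callers / 1                # work-reduction factor R_c

# Spare time for a waiter, and the largest regeneration time that still fits.
wait_margin_s = t_waiter_s - t_regen_s - t_return_plus_margin_s
max_regen_s = t_waiter_s - t_return_plus_margin_s

print(f"raw rate:        {raw_rate:.0f} requests/s")
print(f"coalesced rate:  {coalesced_rate:.1f} requests/s")
print(f"waiters:         {n_wait}")
print(f"reduction:       {reduction:.0f}x")
print(f"wait margin:     {wait_margin_s:.2f} s")
print(f"max regen time:  {max_regen_s:.2f} s")
```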

Boundary With Idempotency

Idempotency keys suppress duplicate side effects across retries of a logical operation. Request coalescing suppresses duplicate in-flight work for equivalent requests. The two can be used together but answer different questions.

For a read or cache regeneration, coalescing is often enough. For a state-changing command, coalescing without idempotency can be unsafe because callers may not be equivalent or the command outcome may need per-client acknowledgement.

Boundary With Thundering Herd

A thundering herd is the synchronized-arrival phenomenon. Request coalescing is one mitigation. It is strongest when many requests share the same key or origin work.

If the herd is made of many unique keys, coalescing has little effect. Then jitter, admission control, load shedding, bulkheads and queue limits become more important.

Validation Evidence

Useful evidence includes coalescing-key cardinality, leader count, waiter count, origin request rate, waiter latency, timeout rate, stale-response count, error propagation, cache freshness and fallback behavior.

Validation should reproduce a cache stampede or reconnect wave. The test should show that origin work is bounded, waiters receive a defined result, and failures do not release every waiter into immediate retry.
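One way to keep a leader failure from releasing every waiter into immediate retry is to give each waiter a randomized delay. The sketch below uses full-jitter exponential backoff, which is a swapped-in technique and not something this entry prescribes. The function name `waiter_retry_delay` and the default values are assumptions.

```python
import random


def waiter_retry_delay(attempt, base_s=0.05, cap_s=2.0, rng=random):
    """Full jitter: uniform delay in [0, min(cap_s, base_s * 2**attempt)]."""
    upper = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, upper)


# Seeded only so the sketch is repeatable; production code would not seed.
rng = random.Random(7)

# Delays for 5999 waiters on their third attempt (attempt index 2),
# so each delay falls between 0 and 0.05 * 4 = 0.2 seconds.
delays = [waiter_retry_delay(attempt=2, rng=rng) for _ in range(5999)]
upper_bound_s = 0.05 * (2 ** 2)
```

A validation test can assert that retries after a forced leader failure are spread across the window and that origin work stays bounded while they drain.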

Common Mistakes

Do not coalesce requests with different authorization, tenant, payload or consistency requirements. Do not let waiters exceed their timeout budget. Do not return a leader error to every waiter without considering fallback or retry behavior. Do not create one global coalescing lock that becomes a single point of failure.

A good request-coalescing design states the key, equivalence rule, leader behavior, waiter behavior, timeout rule, fallback path, error propagation and validation evidence before relying on it to control a stampede.
