Glossary term

Thundering Herd

Engineering definition of thundering herd covering synchronized clients, cache stampedes, retry waves, jittered backoff and overload containment.

Definition

phenomenon

A thundering herd is a failure pattern in which many clients, workers or waiting tasks act at nearly the same time and overwhelm a shared resource.

Thundering-herd behavior appears in distributed services, caches, locks, message queues, operating systems, network recovery, worker pools and telemetry gateways. It can be triggered by cache expiry, service recovery, lock release, broadcast wakeup, synchronized retries, batch schedules, leader failover, cron jobs or client reconnects. A useful review states the synchronized trigger, number of actors, burst window, protected resource, spreading mechanism, admission rule and validation evidence.

The average request rate may look acceptable before and after the event, but the synchronized burst can exceed capacity by orders of magnitude.

The pattern appears in distributed services, caches, locks, operating systems, message queues, packet systems, telemetry gateways and recovery workflows. Common triggers include cache expiry, service restart, lock release, broadcast wakeup, synchronized retries, batch schedules, leader failover, DNS or connection recovery and clients reconnecting after an outage.

Burst Window

If N clients or tasks act inside a burst window Δt, then the burst arrival rate is:

\displaystyle \lambda_{burst}=\frac{N}{\Delta t}

This rate, not the long-term average, controls whether queues, locks, caches, databases or network paths survive the event.
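The burst-rate formula can be turned into a first screen in a few lines. This is a minimal Python sketch that assumes the burst window is given in seconds and that all N actors fall inside it. The function name `burst_rate` is hypothetical. It shows that the same 6000 actions become very different loads depending on how tightly they cluster.

```python
def burst_rate(n_actors: int, window_s: float) -> float:
    """Return the burst arrival rate N / Δt in actions per second."""
    if window_s <= 0:
        raise ValueError("burst window must be positive")
    return n_actors / window_s


# The same 6000 actions, compressed into progressively wider windows.
for window in (0.5, 5.0, 60.0):
    print(f"window={window:>5} s -> {burst_rate(6000, window):8.1f} requests/s")
```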

Capacity Condition

For protected capacity C, the burst is safe only if:

\lambda_{burst}\leq C

Capacity margin during the burst is:

M_C=C-\lambda_{burst}

When M_C is strongly negative, the system needs spreading, admission control, load shedding, bulkhead isolation or single-flight behavior before release.
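The safety condition and margin can be written directly as code. This is a hedged sketch with hypothetical helper names. The numbers are taken from the worked example later in this entry, and rates are assumed to be in requests per second.

```python
def capacity_margin(capacity: float, burst: float) -> float:
    """M_C = C - λ_burst; a negative value is the overload the burst brings."""
    return capacity - burst


def burst_is_safe(capacity: float, burst: float) -> bool:
    """First-order safety condition λ_burst <= C."""
    return burst <= capacity


def overload_factor(capacity: float, burst: float) -> float:
    """Ratio λ_burst / C; values above 1 mean the burst exceeds capacity."""
    return burst / capacity


# Example: a 12 000 requests/s burst against 900 requests/s of capacity.
margin = capacity_margin(900.0, 12_000.0)
print(f"margin={margin:.0f} requests/s, safe={burst_is_safe(900.0, 12_000.0)}, "
      f"factor={overload_factor(900.0, 12_000.0):.2f}x")
```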

Jittered Spreading

If the same N actors are spread randomly across a jitter window W_j, the average spread arrival rate is:

\displaystyle \lambda_{jitter}=\frac{N}{W_j}

This is a first screen. Real arrivals are random, so validation should still check tails and clustering. Jitter reduces synchronization; it does not create capacity.
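The gap between the average N / W_j and what a resource actually sees can be checked with a quick simulation. This minimal sketch assumes each actor picks an independent, uniformly random start time inside the window and counts arrivals in 1-second bins. The bin size, the seed and the function names are assumptions for illustration. The busiest bin always sits at or above the mean, which is why tails and clustering still need validation.

```python
import math
import random


def spread_arrivals(n_actors: int, jitter_window_s: float, seed: int = 0) -> list[float]:
    """Give each actor an independent, uniformly random start time in [0, W_j)."""
    rng = random.Random(seed)
    return [rng.random() * jitter_window_s for _ in range(n_actors)]


def per_bin_counts(arrivals: list[float], jitter_window_s: float, bin_s: float = 1.0) -> list[int]:
    """Count arrivals per time bin so clustering inside the window becomes visible."""
    n_bins = math.ceil(jitter_window_s / bin_s)
    counts = [0] * n_bins
    for t in arrivals:
        counts[min(int(t / bin_s), n_bins - 1)] += 1
    return counts


arrivals = spread_arrivals(6000, 20.0)
counts = per_bin_counts(arrivals, 20.0)
print(f"mean {6000 / 20.0:.0f}/s, busiest second {max(counts)}/s, quietest second {min(counts)}/s")
```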

Cache Stampede

A cache stampede is a common thundering-herd case. Many clients observe the same expired or missing cached value and all regenerate it at once. If regeneration cost is high, the origin store or computation tier can overload.

Single-flight or request coalescing changes the effective regeneration count from:

N_{regen}=N

to:

N_{regen}=1

for one key and one regeneration window, while other clients wait, receive stale data or get a controlled degraded response.
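Single-flight can be sketched with a lock and one event per in-flight key. The design follows the spirit of Go's `golang.org/x/sync/singleflight` package, but the Python class below is a hypothetical illustration rather than any library's API. The first caller for a key becomes the leader and runs the regeneration. Concurrent callers wait on the same result, and errors are shared with them. The demo releases 50 threads at once with a `threading.Barrier` so that they all miss the same key together.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional


class _Call:
    """One in-flight regeneration that concurrent callers wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Coalesce concurrent loads of the same key into a single call to `fn`."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._calls[key] = call
        if is_leader:
            try:
                call.value = fn()              # only the leader hits the origin
            except BaseException as exc:       # followers see the same failure
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]       # the next miss starts a fresh flight
                call.done.set()
        else:
            call.done.wait()                   # followers block instead of regenerating
        if call.error is not None:
            raise call.error
        return call.value


flight = SingleFlight()
regen_calls = 0
results = []
start = threading.Barrier(50)


def regenerate() -> str:
    global regen_calls
    regen_calls += 1
    time.sleep(0.3)                            # stand-in for an expensive origin query
    return "fresh-value"


def client() -> None:
    start.wait()                               # all clients miss the key at the same moment
    results.append(flight.do("user:42", regenerate))


threads = [threading.Thread(target=client) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{len(results)} clients served by {regen_calls} regeneration(s)")
```

A real cache would also decide what followers receive on timeout, for example stale data or a degraded response, as the text above describes.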

Worked Example

After a short dependency outage, a client fleet retries together. The number of clients is:

N=6000

Most retries arrive inside:

\Delta t=0.5\ \text{s}

The burst rate is:

\displaystyle \lambda_{burst}=\frac{6000}{0.5}=12000\ \text{requests/s}

The recovering service can process:

C=900\ \text{requests/s}

Burst overload is:

O=12000-900=11100\ \text{requests/s}

The service is not merely busy. It is exposed to more than thirteen times its recovery capacity:

\displaystyle \frac{12000}{900}\approx 13.33

Now spread retries across:

W_j=20\ \text{s}

The average jittered retry rate is:

\displaystyle \lambda_{jitter}=\frac{6000}{20}=300\ \text{requests/s}

If normal traffic during recovery is:

\lambda_0=500\ \text{requests/s}

then total average demand is:

\lambda_{total}=500+300=800\ \text{requests/s}

Capacity margin becomes:

M_C=900-800=100\ \text{requests/s}

The jittered design is still not proof of safety, but it changes the recovery problem from an impossible synchronized burst to a load level that can be tested and protected.
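The worked example's arithmetic can be reproduced in a short script so the same review can be rerun with other fleet sizes. This is a sketch with a hypothetical function name. The `min_jitter_window_s` field is an extra derived quantity that the example does not state. Rearranging λ_0 + N/W_j ≤ C gives W_j ≥ N/(C − λ_0) = 15 s, which leaves zero average margin. The 20 s window used above is what provides the 100 requests/s of headroom.

```python
def herd_report(n: int, burst_window_s: float, capacity: float,
                jitter_window_s: float, baseline_rate: float) -> dict[str, float]:
    """Recompute the worked example; all rates are averages in requests/s."""
    burst = n / burst_window_s
    jitter = n / jitter_window_s
    total = baseline_rate + jitter
    return {
        "burst_rate": burst,                       # λ_burst = N / Δt
        "overload": burst - capacity,              # O = λ_burst - C
        "overload_factor": burst / capacity,       # λ_burst / C
        "jitter_rate": jitter,                     # λ_jitter = N / W_j
        "total_rate": total,                       # λ_0 + λ_jitter
        "jittered_margin": capacity - total,       # M_C with jitter
        # Smallest W_j that keeps average demand at or below C (assumes λ_0 < C).
        "min_jitter_window_s": n / (capacity - baseline_rate),
    }


report = herd_report(n=6000, burst_window_s=0.5, capacity=900.0,
                     jitter_window_s=20.0, baseline_rate=500.0)
for name, value in report.items():
    print(f"{name:>20}: {value:,.2f}")
```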

Boundary With Retry Storm

A retry storm is about repeated attempts multiplying load during failure. A thundering herd is about many actors acting at the same time. The two often combine: deterministic retry delays create synchronized waves, and each wave can be amplified by retry count.

A retry budget limits the amount of retry work. Jitter, admission control and load shedding control when that work arrives and whether it should be accepted.
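The split between "how much" and "when" can be seen by pairing the two mechanisms. The sketch below uses the "full jitter" backoff variant, which draws a delay uniformly from zero up to a capped exponential ceiling. It gates each retry behind a simple token budget. The budget policy is hypothetical: each first attempt earns 0.1 tokens and each retry costs one. The base delay, cap and attempt number are also illustrative assumptions.

```python
import random


def full_jitter_delay(attempt: int, rng: random.Random,
                      base_s: float = 0.5, cap_s: float = 30.0) -> float:
    """'Full jitter' backoff: a delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


class RetryBudget:
    """Hypothetical retry budget: each first attempt earns `ratio` tokens, each retry costs one."""

    def __init__(self, ratio: float = 0.1, initial_tokens: float = 10.0,
                 max_tokens: float = 100.0) -> None:
        self.ratio = ratio
        self.tokens = initial_tokens
        self.max_tokens = max_tokens

    def record_request(self) -> None:
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_spend(self) -> bool:
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


rng = random.Random(7)
budget = RetryBudget()
retry_delays = []
for _ in range(6000):                          # 6000 clients whose first call just failed
    budget.record_request()
    if budget.try_spend():                     # the budget decides whether to retry
        retry_delays.append(full_jitter_delay(attempt=3, rng=rng))  # jitter decides when
print(f"{len(retry_delays)} of 6000 failures retried, "
      f"delays spread across {min(retry_delays):.2f}-{max(retry_delays):.2f} s")
```

Without the budget, all 6000 clients would retry. Without jitter, the retries that are allowed would still land together.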

Boundary With Backpressure

Backpressure reacts to downstream pressure and asks producers to slow or reshape work. A thundering herd may arrive too fast for cooperative backpressure to take effect unless producers already implement jitter, retry budgets, queue limits or admission rules.

The practical design should avoid waking every producer at the same instant. Stagger cache refresh, randomize retry delays, cap reconnection attempts, use single-flight regeneration, and protect shared resources with explicit limits.
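Staggered cache refresh is often implemented by adding random jitter to each entry's TTL. This minimal sketch assumes a hypothetical batch of 10 000 keys written at the same moment with a 300 s base TTL and a ±10% spread. It compares the worst single second of expiries with and without jitter. The function names and parameters are illustrative, not taken from any cache library.

```python
import random
from collections import Counter


def jittered_ttl(base_ttl_s: float, jitter_fraction: float, rng: random.Random) -> float:
    """TTL drawn uniformly from [base * (1 - f), base * (1 + f)]."""
    spread = base_ttl_s * jitter_fraction
    return rng.uniform(base_ttl_s - spread, base_ttl_s + spread)


def worst_second(expiries: list[float]) -> int:
    """Largest number of keys expiring within the same wall-clock second."""
    return max(Counter(int(t) for t in expiries).values())


rng = random.Random(42)
written_at = 1_000.0                           # all keys written by one batch job
fixed = [written_at + 300.0 for _ in range(10_000)]
jittered = [written_at + jittered_ttl(300.0, 0.1, rng) for _ in range(10_000)]

print("fixed TTL, worst second:   ", worst_second(fixed))
print("jittered TTL, worst second:", worst_second(jittered))
```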

Validation Evidence

Useful evidence includes burst-size estimates, retry-delay distribution, cache-expiry distribution, reconnect rate, queue depth, lock wait, origin-store load, admitted rate, shed rate, latency percentiles and recovery time.

Validation should deliberately synchronize clients or workers to reproduce the bad case. A steady load test can miss the failure entirely. The acceptance test should show bounded queue growth and recovery without manual traffic draining.
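A deliberately synchronized test can be sketched with `threading.Barrier`, which releases every client at the same instant. In this sketch, a hypothetical admission limiter built on a non-blocking `BoundedSemaphore` sheds whatever exceeds a fixed concurrency limit. The limit of 20, the 200 clients and the 50 ms of simulated work are arbitrary assumptions. A real acceptance test would target the actual service and record queue depth, latency and recovery time. The check shown here is that peak in-flight work stays bounded by the limit.

```python
import threading
import time


class AdmissionLimiter:
    """Admit at most `limit` concurrent requests; shed the rest immediately."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.admitted = 0
        self.shed = 0
        self.in_flight = 0
        self.peak_in_flight = 0

    def handle(self, work_s: float) -> bool:
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.shed += 1                 # explicit, immediate rejection
            return False
        try:
            with self._lock:
                self.admitted += 1
                self.in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            time.sleep(work_s)                 # stand-in for the protected resource
        finally:
            with self._lock:
                self.in_flight -= 1
            self._slots.release()
        return True


def synchronized_burst(n_clients: int, limiter: AdmissionLimiter, work_s: float = 0.05) -> None:
    """Release every client at the same instant to reproduce the herd on purpose."""
    barrier = threading.Barrier(n_clients)

    def client() -> None:
        barrier.wait()
        limiter.handle(work_s)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


limiter = AdmissionLimiter(limit=20)
synchronized_burst(200, limiter)
print(f"admitted={limiter.admitted} shed={limiter.shed} peak_in_flight={limiter.peak_in_flight}")
```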

Common Mistakes

Do not validate only average traffic. Do not set identical retry delays across clients. Do not let all cache keys expire at the same absolute time. Do not wake all workers for one unit of work. Do not assume autoscaling can react inside a subsecond herd.
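The "one unit of work, one wakeup" rule maps directly onto condition variables. This sketch uses a hypothetical work queue built on `threading.Condition`. `notify()` wakes one idle worker per item, while `notify_all()` wakes the whole pool so that every worker races for a single item. The pool size, the sleeps used to let workers go idle, and the `None` shutdown sentinel are assumptions for the demo. With `notify()`, at most one wakeup is counted per item. With `notify_all()`, the count is typically close to the pool size.

```python
import threading
import time
from collections import deque


class WorkQueue:
    """Work queue whose put() wakes one waiter, or every waiter if broadcast=True."""

    def __init__(self, broadcast: bool = False) -> None:
        self._cond = threading.Condition()
        self._items: deque = deque()
        self._broadcast = broadcast
        self.wakeups = 0                          # times a waiting worker was woken

    def put(self, item) -> None:
        with self._cond:
            self._items.append(item)
            if self._broadcast:
                self._cond.notify_all()           # herd: every idle worker races for one item
            else:
                self._cond.notify()               # wake exactly one idle worker

    def get(self):
        with self._cond:
            while not self._items:                # losers of the race go back to waiting
                self._cond.wait()
                self.wakeups += 1
            return self._items.popleft()


def wakeups_for_one_item(broadcast: bool, n_workers: int = 32) -> int:
    q = WorkQueue(broadcast)

    def worker() -> None:
        while q.get() is not None:                # None is the shutdown sentinel
            pass

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    time.sleep(0.2)                               # give the pool time to go idle
    q.put("job")
    time.sleep(0.2)                               # let woken workers settle
    woken = q.wakeups
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return woken


one_waiter = wakeups_for_one_item(broadcast=False)
all_waiters = wakeups_for_one_item(broadcast=True)
print("notify():    ", one_waiter)
print("notify_all():", all_waiters)
```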

A good thundering-herd review states the synchronized trigger, actor count, burst window, protected resource, spreading method, admission or shedding behavior, and evidence that recovery remains bounded.


See also