Glossary term
Software Circuit Breaker
Engineering definition of software circuit breaker covering open, half-open and closed states, retry containment, cooldown and validation.
Definition
A software circuit breaker is a resilience mechanism that temporarily stops or limits calls to a failing dependency so the caller and dependency do not amplify the failure.
Software circuit breakers are used in distributed services, APIs, edge gateways, control platforms and operational systems to bound retry storms and dependency overload. They usually move between closed, open and half-open states based on failure ratio, timeout rate, latency, rejection rate, cooldown and recovery probes. A software circuit breaker is distinct from an electrical circuit breaker even though both interrupt harmful flow.
The usual states are closed, open and half-open. Closed means calls are allowed. Open means most calls are blocked, rejected, failed fast or routed to a degraded response. Half-open means a small number of probe calls are allowed to test whether the dependency has recovered.
Trip Condition
For a monitoring window with $N$ calls, of which $F$ failed or timed out, the observed failure ratio is:
$$r = \frac{F}{N}$$
A simple trip rule is:
$$\text{open if } N \ge N_{\min} \text{ and } r \ge r_{\text{trip}}$$
The minimum-window condition prevents one or two early failures from opening the breaker during harmless noise.
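The trip rule can be sketched as a small predicate. This is a minimal illustration, not a reference implementation: the function name `should_trip` and the parameters `min_calls` and `trip_ratio` are hypothetical, and the sketch assumes the threshold comparison is inclusive.

```python
def should_trip(calls: int, failures: int, min_calls: int, trip_ratio: float) -> bool:
    """Return True when the window is large enough and the failure ratio meets the threshold.

    All names are illustrative assumptions; `failures` counts failed or timed-out calls.
    """
    if calls <= 0 or calls < min_calls:
        # Too few samples: early failures are treated as noise, not as a trip signal.
        return False
    return failures / calls >= trip_ratio


if __name__ == "__main__":
    print(should_trip(calls=2, failures=2, min_calls=20, trip_ratio=0.5))
    print(should_trip(calls=100, failures=60, min_calls=20, trip_ratio=0.5))
```

With a minimum window of 20 calls, two failures out of two calls do not open the breaker, while 60 failures out of 100 calls do.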
Retry Containment
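Retries can multiply load during partial failure. If the original request rate is $\lambda$ and each request can retry twice, with each attempt failing independently with probability $p$, then a simple expected-attempt multiplier is:
$$M = 1 + p + p^2$$
Effective dependency arrival rate is:
$$\lambda_{\text{eff}} = \lambda M = \lambda \left(1 + p + p^2\right)$$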
Opening the circuit breaker reduces dependency pressure by failing fast or using a degraded response instead of continuing to send retries into an overloaded dependency.
The breaker should be coordinated with retry policy. If callers retry the fast failure immediately, the dependency may be protected but the caller tier can still saturate its own workers, queues or network sockets. The release design should state whether rejected calls are retried, cached, queued, shed or surfaced to the user.
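The retry multiplier discussed in this section can be sketched in a few lines. The sketch assumes each attempt fails independently with the same probability and that every failure is retried until the retry budget is exhausted. The names `expected_attempts` and `effective_rate` are hypothetical.

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected attempts per request: 1 + p + p^2 + ... + p^max_retries.

    Assumes independent attempts with identical failure probability (a simplification).
    """
    return sum(p_fail ** k for k in range(max_retries + 1))


def effective_rate(request_rate: float, p_fail: float, max_retries: int) -> float:
    """Arrival rate seen by the dependency once retries are included."""
    return request_rate * expected_attempts(p_fail, max_retries)


if __name__ == "__main__":
    # Illustrative values only: 100 requests/s, 50% failure probability, two retries.
    print(effective_rate(request_rate=100.0, p_fail=0.5, max_retries=2))
```

With no failures the multiplier is 1. With certain failure and two retries it reaches 3, which is the worst case for this policy.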
Half-Open Recovery
After a cooldown period $T_{\text{cool}}$, the breaker may allow $k$ probe calls. If at least $k_{\min}$ of the probes succeed, the breaker can close. If probes fail, it should reopen and extend or repeat cooldown according to the release policy.
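The recovery path can be sketched as a small state machine. This is one possible policy under stated assumptions, not a definitive design: any failed probe reopens the breaker and restarts the same cooldown, the clock is injected so the logic can be tested, and the class name `HalfOpenBreaker` and its parameters are hypothetical.

```python
import enum
import time
from typing import Callable


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class HalfOpenBreaker:
    """Models only the open -> half-open -> closed/open path (trip logic lives elsewhere)."""

    def __init__(self, cooldown_s: float, probe_count: int, min_probe_successes: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s
        self.probe_count = probe_count
        self.min_probe_successes = min_probe_successes
        self.clock = clock
        self.state = State.CLOSED
        self.opened_at = 0.0
        self.probes_sent = 0
        self.probe_successes = 0

    def trip(self) -> None:
        """Open the breaker and start (or restart) the cooldown."""
        self.state = State.OPEN
        self.opened_at = self.clock()

    def allow_call(self) -> bool:
        if self.state is State.CLOSED:
            return True
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.cooldown_s:
                return False  # still cooling down: fail fast
            # Cooldown elapsed: start a fresh probe round.
            self.state = State.HALF_OPEN
            self.probes_sent = 0
            self.probe_successes = 0
        # Half-open: admit at most probe_count probe calls.
        if self.probes_sent < self.probe_count:
            self.probes_sent += 1
            return True
        return False

    def record_probe(self, success: bool) -> None:
        if self.state is not State.HALF_OPEN:
            return
        if not success:
            self.trip()  # assumed policy: any failed probe reopens the breaker
            return
        self.probe_successes += 1
        if self.probe_successes >= self.min_probe_successes:
            self.state = State.CLOSED
```

A production breaker would also need thread safety and a policy for extending the cooldown after repeated failed probe rounds. Both are left out here to keep the sketch short.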
Worked Example
A service observes:
dependency calls, with:
The failure ratio is:
The trip threshold is:
and:
Since:
and:
the breaker should open.
Original request rate is:
With two retries and:
the expected-attempt multiplier is:
so dependency attempt rate would be:
If dependency capacity is:
the overload is:
Now open the breaker and allow:
half-open probes every:
Probe load is:
This protects the dependency while still collecting recovery evidence.
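The same chain of calculations can be sketched with illustrative numbers. Every value below is a hypothetical assumption chosen so the arithmetic is easy to follow. None of them are figures from the example above.

```python
# All inputs are hypothetical illustration values.
calls = 200              # calls observed in the window
failures = 100           # failed or timed-out calls
min_calls = 50           # minimum window size before the rule applies
trip_ratio = 0.4         # trip threshold on the failure ratio

failure_ratio = failures / calls
breaker_opens = calls >= min_calls and failure_ratio >= trip_ratio

request_rate = 1000.0    # original requests per second
p_fail = 0.5             # per-attempt failure probability
multiplier = 1 + p_fail + p_fail ** 2      # two retries
attempt_rate = request_rate * multiplier   # attempts per second at the dependency

capacity = 1200.0        # attempts per second the dependency can absorb
overload = attempt_rate - capacity

probes = 5               # half-open probes per interval
probe_interval_s = 10.0  # seconds between probe rounds
probe_rate = probes / probe_interval_s     # probes per second while recovering

if __name__ == "__main__":
    print(f"failure ratio {failure_ratio}, opens: {breaker_opens}")
    print(f"multiplier {multiplier}, attempt rate {attempt_rate}/s, overload {overload}/s")
    print(f"probe load {probe_rate}/s")
```

Under these assumed inputs, retries push the dependency well past its capacity, while the probe traffic after opening is a small fraction of one request per second.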
Boundary With Load Shedding
Load shedding rejects or drops work to preserve the system boundary. A software circuit breaker is more specific: it rejects or limits work because a named dependency is unhealthy. The two mechanisms often work together. The breaker can stop calls to the failing dependency, while load shedding protects the caller from excessive queued work after fast failures.
The distinction matters for telemetry. A spike in breaker-open rejections means dependency protection is active. A spike in load-shed requests means the caller itself is protecting capacity. Both should be visible in dashboards and post-incident analysis.
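One way to keep the two signals separate in telemetry is to tag every rejection with its reason. The sketch below is an assumption-laden illustration: the `admit` function, the queue-depth shedding rule and the choice to check caller capacity before breaker state are all hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Rejection(Enum):
    LOAD_SHED = "load_shed"          # caller is protecting its own capacity
    BREAKER_OPEN = "breaker_open"    # a named dependency is being protected


rejections: Counter = Counter()


def admit(breaker_open: bool, queue_depth: int, max_queue_depth: int) -> Optional[Rejection]:
    """Return None when the call is admitted, otherwise the counted rejection reason."""
    if queue_depth >= max_queue_depth:
        reason = Rejection.LOAD_SHED
    elif breaker_open:
        reason = Rejection.BREAKER_OPEN
    else:
        return None
    rejections[reason] += 1  # separate counters feed separate dashboard series
    return reason
```

Separate counters let a dashboard show whether an incident was dominated by dependency protection or by caller-side shedding.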
Degraded Response
Opening a software circuit breaker is not automatically a user-visible outage. The caller may return cached data, queue noncritical work, reject unsafe commands, serve read-only mode, shed low-priority traffic, use a backup dependency or enter degraded mode.
The degraded response must be honest. Returning stale data, accepting commands without authorization or hiding a failed write can be worse than failing fast.
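An honest degraded read can be sketched as a fallback that labels stale data and refuses to serve data past a staleness limit. The names `Response`, `DependencyUnavailable`, `read_with_fallback` and `max_stale_s` are hypothetical, and the cache is reduced to a value plus its age for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Response:
    value: Any
    stale: bool      # surfaced to the consumer instead of being hidden
    age_s: float     # age of the data in seconds


class DependencyUnavailable(Exception):
    """Raised when neither a live call nor an acceptable cached value is available."""


def read_with_fallback(breaker_open: bool, fetch: Callable[[], Any],
                       cached_value: Optional[Any], cached_age_s: float,
                       max_stale_s: float) -> Response:
    if not breaker_open:
        return Response(value=fetch(), stale=False, age_s=0.0)
    if cached_value is not None and cached_age_s <= max_stale_s:
        # Degraded but honest: the response says it is stale and how old it is.
        return Response(value=cached_value, stale=True, age_s=cached_age_s)
    # Failing fast is preferred over returning unsafe stale data.
    raise DependencyUnavailable("breaker open and no sufficiently fresh cached value")
```

This sketch covers reads only. Writes and commands would need their own policy, since accepting them silently while the breaker is open risks hiding a failed write.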
Validation Evidence
Useful evidence includes fault injection, threshold tests, timeout logs, retry counters, queue depth, open/half-open/closed state transitions, probe results, degraded-response tests, client behavior, alert behavior and recovery timing.
The breaker should be tested under load. A breaker that opens correctly in a unit test may still fail if clients retry the fast failure aggressively or if every instance probes at the same instant.
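One common way to keep instances from probing at the same instant is to add random jitter to each instance's cooldown. The sketch below assumes a uniform jitter proportional to the base cooldown. The function name `next_probe_delay` and the `jitter_fraction` parameter are hypothetical.

```python
import random


def next_probe_delay(base_cooldown_s: float, jitter_fraction: float,
                     rng: random.Random) -> float:
    """Return the base cooldown plus a uniform random extension.

    The result lies in [base, base * (1 + jitter_fraction)], so no instance
    probes earlier than the base cooldown.
    """
    return base_cooldown_s + rng.uniform(0.0, jitter_fraction * base_cooldown_s)


if __name__ == "__main__":
    # Each simulated instance has its own generator, as separate processes would.
    delays = [next_probe_delay(10.0, 0.2, random.Random(seed)) for seed in range(5)]
    print(sorted(delays))
```

Under load testing, the spread of first-probe times across instances is a useful check that the jitter is actually being applied.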
Common Mistakes
Do not use a software circuit breaker as a substitute for capacity planning. Do not set thresholds without a minimum sample size. Do not let half-open probes synchronize across thousands of clients. Do not return unsafe stale data. Do not confuse software dependency protection with an electrical circuit breaker.
A good breaker design states the protected dependency, failure definition, minimum window, trip threshold, cooldown, probe policy, degraded response, client retry behavior and validation evidence.
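The design items listed above can be captured as a single validated record, so that an incomplete policy fails at construction time. This is a sketch under assumptions: the field names and validation bounds are hypothetical, and the free-text fields stand in for whatever structured policy a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerPolicy:
    protected_dependency: str     # the named dependency being protected
    failure_definition: str       # e.g. which errors and timeouts count as failures
    min_window_calls: int         # minimum sample size before the trip rule applies
    trip_ratio: float             # failure-ratio threshold
    cooldown_s: float             # time spent open before probing
    probe_count: int              # probes admitted per half-open round
    degraded_response: str        # what callers receive while the breaker is open
    client_retry_behavior: str    # how callers treat fast failures

    def __post_init__(self) -> None:
        if not self.protected_dependency:
            raise ValueError("protected_dependency must be named")
        if self.min_window_calls < 1:
            raise ValueError("min_window_calls must be at least 1")
        if not 0.0 < self.trip_ratio <= 1.0:
            raise ValueError("trip_ratio must be in (0, 1]")
        if self.cooldown_s <= 0:
            raise ValueError("cooldown_s must be positive")
        if self.probe_count < 1:
            raise ValueError("probe_count must be at least 1")
```

Validation evidence is not a field here because it is produced by testing the policy rather than configured alongside it.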