Glossary term
Single Point of Failure
Definition
A single point of failure is one element, dependency or decision point whose failure can defeat the required system function despite other redundancy or safeguards.
Single points of failure appear in physical equipment, control systems, power distribution, networks, software services, protection architectures and maintenance processes. They may be a component, shared utility, voter, controller, database, final element, communication path, operator action, bypass permit, procedure or site condition. Finding them requires tracing the required function through sensors, logic, power, actuation, people, data and recovery paths.
A single point of failure can exist even when the visible architecture looks redundant.
Examples include one power feed serving duplicated controllers, one database behind replicated services, one voter behind three sensor channels, one final valve behind two trip signals, one network route for both primary and backup traffic, or one maintenance action that can disable every channel.
Functional Boundary
A single point of failure is defined against a required function, not against a component list. The same item may be critical for one function and irrelevant for another.
The boundary should state:
and identify each element that must work for:
If one element is mandatory in every success path, it is a single point of failure for that function.
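The mandatory-element test can be sketched as a set intersection. This is a minimal illustration, assuming every success path for the function has already been enumerated as a set of required elements; the element names are hypothetical.

```python
def single_points_of_failure(success_paths):
    """Return the elements that appear in every success path."""
    path_sets = [set(path) for path in success_paths]
    if not path_sets:
        return set()
    return set.intersection(*path_sets)


# Hypothetical trip function: two independent sensor/logic channels
# that both act through the same final valve.
paths = [
    {"sensor_a", "logic_a", "power_feed_1", "trip_valve"},
    {"sensor_b", "logic_b", "power_feed_2", "trip_valve"},
]

spofs = single_points_of_failure(paths)
print(sorted(spofs))  # ['trip_valve']
```

The result is only as complete as the path list. An element missing from the enumeration, such as a shared utility or a manual step, cannot be flagged.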
Series Availability
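For a function that requires $n$ independent elements in series, each with availability $A_i$, system availability can be approximated as:
$$A_{sys} \approx \prod_{i=1}^{n} A_i$$
The unavailability contribution of one mandatory element is:
$$U_i = 1 - A_i$$
When one element $k$ has much lower availability than the rest, it can dominate:
$$A_{sys} \approx A_k \quad \text{when } A_i \approx 1 \text{ for all } i \neq k$$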
This is why adding redundant sensors may not improve the function if one unprotected actuator, power supply or communication gateway remains mandatory.
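The effect can be checked numerically. The sketch below assumes independent failures and uses made-up availability figures for a sensor, a logic solver and an actuator; none of the numbers come from a real system.

```python
import math


def series_availability(availabilities):
    """Availability of independent elements that must all work."""
    return math.prod(availabilities)


def parallel_availability(availability, n):
    """Availability of n independent redundant elements where any one suffices."""
    return 1.0 - (1.0 - availability) ** n


# Assumed illustrative figures, not measured data.
sensor, logic, actuator = 0.999, 0.9995, 0.98

baseline = series_availability([sensor, logic, actuator])
with_redundant_sensors = series_availability(
    [parallel_availability(sensor, 2), logic, actuator]
)

print(f"single sensor:     {baseline:.5f}")
print(f"redundant sensors: {with_redundant_sensors:.5f}")
print(f"actuator alone:    {actuator:.5f}")
```

With these assumed figures, duplicating the sensor raises the result by about 0.001, and the function still cannot exceed the availability of the single actuator.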
Redundancy With a Shared Voter
For three replicated channels, each with availability $A_c$, a 2-out-of-3 function under independent replica failures has availability:
$$A_{2oo3} = 3A_c^2 - 2A_c^3$$
If all three channels feed one required voter or shared controller with availability $A_v$, the system availability becomes:
$$A_{sys} = A_v \left(3A_c^2 - 2A_c^3\right)$$
The voter can therefore become the single point of failure. The design may still be acceptable, but the reliability claim must include the voter.
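A short sketch can confirm the 2-out-of-3 expression and show the effect of the voter. It assumes independent channel failures and a voter that is required in every success path; the function names are illustrative only. The closed form is cross-checked by enumerating all eight channel states.

```python
from itertools import product


def two_out_of_three(a):
    """Closed form for a 2-out-of-3 vote over independent channels."""
    return 3 * a**2 - 2 * a**3


def k_out_of_n_enumerated(a, k, n):
    """Sum the probability of every channel state with at least k channels up."""
    total = 0.0
    for states in product([True, False], repeat=n):
        if sum(states) >= k:
            probability = 1.0
            for up in states:
                probability *= a if up else (1.0 - a)
            total += probability
    return total


def voted_system(a_channel, a_voter):
    """2-out-of-3 channels in series with one mandatory voter."""
    return a_voter * two_out_of_three(a_channel)


# The enumeration and the closed form should agree.
assert abs(two_out_of_three(0.9) - k_out_of_n_enumerated(0.9, 2, 3)) < 1e-12
print(voted_system(0.9, 1.0), voted_system(0.9, 0.95))
```

Because the voter term multiplies the whole result, the system can never be more available than the voter itself, however many channels are added.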
Worked Example
Three service replicas each have availability:
The independent 2-out-of-3 availability is:
so:
Now add a shared dependency with:
The real availability is:
The replicated service looks highly available by itself, but the shared dependency pulls the whole function below the availability of any individual replica:
The engineering conclusion is not “replication failed”. It is that the shared dependency must be removed, replicated, diversified, monitored, bypassed safely or explicitly accepted.
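The same reasoning can be reproduced with assumed figures. The values below (0.99 per replica, 0.98 for the shared dependency) are illustrative assumptions chosen for this sketch, not figures from the example above. The sketch also estimates how available the shared dependency would need to be before the function matches a single replica, and what duplicating the dependency would achieve if the two copies failed independently.

```python
def two_out_of_three(a):
    """2-out-of-3 availability under independent replica failures."""
    return 3 * a**2 - 2 * a**3


# Assumed illustrative values.
replica = 0.99
shared_dependency = 0.98

replicated_only = two_out_of_three(replica)
system = shared_dependency * replicated_only

# Dependency availability needed for the system to match one replica.
break_even = replica / replicated_only

# Option: duplicate the dependency (either copy suffices, independent failures).
duplicated_dependency = 1.0 - (1.0 - shared_dependency) ** 2
system_with_duplicate = duplicated_dependency * replicated_only

print(f"replicas alone:            {replicated_only:.6f}")
print(f"with shared dependency:    {system:.6f}")
print(f"break-even dependency:     {break_even:.6f}")
print(f"with duplicated dependency:{system_with_duplicate:.6f}")
```

Under these assumptions the function falls below 0.99 until the dependency itself is better than roughly 0.9903, which is why the dependency, not the replica count, is the item to act on.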
Physical and Operational Examples
Single points of failure are not limited to electronics. A relief path may depend on one blocked discharge header. A safety interlock may depend on one final contactor. A redundant pump set may depend on one suction strainer. A communication service may depend on one fiber route. A maintenance program may depend on one manual isolation step that can leave both channels unavailable.
The item can also be a process state. If startup requires bypassing every protection path, the startup procedure itself may create a temporary single point of failure.
Relationship to Common-Cause Failure
Common-cause failure describes one cause defeating multiple elements. A single point of failure describes one required element or dependency whose failure defeats the function. They often appear together, but they are not the same.
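The review question for a single point is: which one required element or dependency, if it fails, defeats the function?
The review question for common cause is: which one cause can defeat multiple elements at once?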
Both questions are needed before claiming redundancy, independent protection-layer credit or high service availability.
Validation Evidence
Useful evidence includes architecture diagrams, dependency maps, failure-mode reviews, power and network one-line diagrams, cause-and-effect matrices, trip tests, failover tests, disaster-recovery tests, route-diversity records, bypass logs and maintenance procedures.
Validation should exercise the failure, not just inspect the drawing. Pull the route, remove the voter, fail the controller, isolate the utility, disable the primary service, simulate bad data, or rehearse the manual action where this can be done safely.
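Before a live test, a model-level rehearsal can help decide which failures to inject. The sketch below assumes the function can be represented as a directed graph from a demand source to a required outcome; the topology and node names are hypothetical. It fails each intermediate node in turn and checks whether the outcome is still reachable. It does not replace exercising the real failure, because the model only contains the dependencies someone remembered to draw.

```python
from collections import deque


def function_completes(graph, source, target, failed):
    """Breadth-first search from source to target, skipping the failed node."""
    if failed in (source, target):
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def inject_single_failures(graph, source, target):
    """Return the intermediate nodes whose individual failure defeats the function."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs}
    return sorted(
        n for n in nodes - {source, target}
        if not function_completes(graph, source, target, n)
    )


# Hypothetical topology: primary and backup traffic share one gateway.
graph = {
    "client": ["switch_a", "switch_b"],
    "switch_a": ["gateway"],
    "switch_b": ["gateway"],
    "gateway": ["service"],
}

print(inject_single_failures(graph, "client", "service"))  # ['gateway']
```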
Common Mistakes
Do not count replicas without tracing their shared dependencies. Do not assume cloud, network, PLC, relay, valve or operator redundancy unless the required function can still complete after one failure. Do not hide a single point of failure inside a human procedure, shared credential, spare-part policy, test fixture or maintenance lockout.
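Tracing shared dependencies can be sketched as a transitive walk over a dependency map followed by an intersection across replicas. The map below is hypothetical and assumes each listed dependency is strictly required by the item that names it.

```python
def transitive_dependencies(item, depends_on, seen=None):
    """Collect every direct and indirect dependency of an item."""
    seen = set() if seen is None else seen
    for dep in depends_on.get(item, ()):
        if dep not in seen:
            seen.add(dep)
            transitive_dependencies(dep, depends_on, seen)
    return seen


def shared_dependencies(replicas, depends_on):
    """Return the dependencies that every replica relies on."""
    dep_sets = [transitive_dependencies(r, depends_on) for r in replicas]
    return set.intersection(*dep_sets) if dep_sets else set()


# Hypothetical dependency map for three service replicas.
depends_on = {
    "replica_1": ["db_primary", "rack_a_power"],
    "replica_2": ["db_primary", "rack_b_power"],
    "replica_3": ["db_primary", "rack_b_power"],
    "db_primary": ["storage_array"],
    "rack_a_power": ["utility_feed"],
    "rack_b_power": ["utility_feed"],
}

shared = shared_dependencies(["replica_1", "replica_2", "replica_3"], depends_on)
print(sorted(shared))  # ['db_primary', 'storage_array', 'utility_feed']
```

The indirect items are the useful finding here: the storage array and the utility feed do not appear in any replica's own dependency list, yet every replica relies on them.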
A good single-point-of-failure review produces design actions: diversify the dependency, add a safe degraded mode, improve diagnostics, create independent routes, protect the final element, validate failover or document why the residual risk is accepted.