Single Point of Failure

Engineering definition of single point of failure covering redundancy limits, shared dependencies, voter availability and resilience validation.

Definition

A single point of failure is one element, dependency or decision point whose failure can defeat the required system function despite other redundancy or safeguards.

Single points of failure appear in physical equipment, control systems, power distribution, networks, software services, protection architectures and maintenance processes. They may be a component, shared utility, voter, controller, database, final element, communication path, operator action, bypass permit, procedure or site condition. Finding them requires tracing the required function through sensors, logic, power, actuation, people, data and recovery paths.

A single point of failure can exist even when the visible architecture looks redundant.

Examples include one power feed serving duplicated controllers, one database behind replicated services, one voter behind three sensor channels, one final valve behind two trip signals, one network route for both primary and backup traffic, or one maintenance action that can disable every channel.

Functional Boundary

A single point of failure is defined against a required function, not against a component list. The same item may be critical for one function and irrelevant for another.

The boundary should state:

F=\text{required function}

and identify each element that must work for:

F=\text{available}

If one element is mandatory in every success path, it is a single point of failure for that function.
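The "mandatory in every success path" test can be sketched as a set intersection. This is a minimal illustration, assuming the success paths for the function have already been enumerated by hand. The element names and the `single_points_of_failure` helper are hypothetical and do not come from any standard or tool.

```python
def single_points_of_failure(success_paths):
    """Return the elements that appear in every success path of one function."""
    if not success_paths:
        return set()
    return set.intersection(*(set(path) for path in success_paths))


# Hypothetical trip function: two sensor channels, one logic solver, one valve.
trip_paths = [
    {"sensor_a", "logic_solver", "trip_valve"},
    {"sensor_b", "logic_solver", "trip_valve"},
]

spofs = single_points_of_failure(trip_paths)
print(sorted(spofs))  # ['logic_solver', 'trip_valve']
```

The sensors drop out of the result because each one is missing from at least one success path. The result is only as good as the path list: a dependency left out of the enumeration, such as a shared power feed, will not be reported.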

Series Availability

For a function that requires several elements in series, and assuming the elements fail independently, system availability can be approximated as:

A_{sys}=\prod_{k=1}^{n}A_k

The unavailability contribution of one mandatory element is:

U_k=1-A_k

When one mandatory element has much lower availability than the rest, its unavailability can dominate the system unavailability:

U_{sys}\approx U_{spof}

This is why adding redundant sensors may not improve the function if one unprotected actuator, power supply or communication gateway remains mandatory.
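The dominance effect can be checked numerically. The sketch below assumes independent failures and uses made-up availability figures. The element names and the `series_availability` helper are illustrative only.

```python
import math


def series_availability(availabilities):
    """Product of element availabilities for a series (all-required) function."""
    return math.prod(availabilities)


# Assumed figures: a well-protected sensor group, logic and actuator,
# plus one unprotected communication gateway.
elements = {
    "sensor_voting": 0.9999,
    "logic": 0.9995,
    "gateway": 0.98,
    "actuator": 0.999,
}

a_sys = series_availability(elements.values())
u_sys = 1 - a_sys

# The weakest mandatory element and its share of total unavailability.
weakest = min(elements, key=elements.get)
u_weakest = 1 - elements[weakest]

print(f"A_sys = {a_sys:.6f}, U_sys = {u_sys:.6f}")
print(f"{weakest} contributes about {u_weakest / u_sys:.0%} of U_sys")
```

With these assumed numbers the gateway accounts for most of the system unavailability, so improving the sensor group further would change the result very little.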

Redundancy With a Shared Voter

For three replicated channels, each with availability A, the availability of a 2-out-of-3 function under independent replica failures is:

A_{2of3}=3A^2(1-A)+A^3

If all three channels feed one required voter or shared controller with availability A_v, the system availability becomes:

A_{sys}=A_vA_{2of3}

The voter can therefore become the single point of failure. The design may still be acceptable, but the reliability claim must include the voter.

Worked Example

Three service replicas each have availability:

A=0.985

The independent 2-out-of-3 availability is:

A_{2of3}=3(0.985)^2(1-0.985)+(0.985)^3

so:

A_{2of3}=0.99933175

Now add a shared dependency with:

A_v=0.98

The availability of the complete function, including the shared dependency, is:

A_{sys}=0.98(0.99933175)=0.979345

The replicated service looks highly available by itself, but the shared dependency pulls the whole function below the availability of any individual replica:

0.979345<0.985

The engineering conclusion is not “replication failed”. It is that the shared dependency must be removed, replicated, diversified, monitored, bypassed safely or explicitly accepted.
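The arithmetic in this example can be reproduced with a short script. It is a sketch under the same assumptions as the text (independent replica failures, one shared dependency in series). The function names are hypothetical.

```python
def two_out_of_three(a):
    """Availability of a 2-out-of-3 group of independent, identical replicas."""
    return 3 * a**2 * (1 - a) + a**3


def with_shared_dependency(a_replica, a_shared):
    """2-out-of-3 group in series with one mandatory shared dependency."""
    return a_shared * two_out_of_three(a_replica)


a_2of3 = two_out_of_three(0.985)
a_sys = with_shared_dependency(0.985, 0.98)

print(f"A_2of3 = {a_2of3:.8f}")  # A_2of3 = 0.99933175
print(f"A_sys  = {a_sys:.6f}")   # A_sys  = 0.979345
print(a_sys < 0.985)             # True
```

Changing `a_shared` shows how quickly the shared dependency sets the ceiling: the system availability can never exceed it, whatever the replica count.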

Physical and Operational Examples

Single points of failure are not limited to electronics. A relief path may depend on one blocked discharge header. A safety interlock may depend on one final contactor. A redundant pump set may depend on one suction strainer. A communication service may depend on one fiber route. A maintenance program may depend on one manual isolation step that can leave both channels unavailable.

The item can also be a process state. If startup requires bypassing every protection path, the startup procedure itself may create a temporary single point of failure.

Relationship to Common-Cause Failure

Common-cause failure describes one cause defeating multiple elements. A single point of failure describes one required element or dependency whose failure defeats the function. They often appear together, but they are not the same.

The review question for a single point is:

\text{Can the function succeed if this item fails?}

The review question for common cause is:

\text{Can one cause make several items fail together?}

Both questions are needed before claiming redundancy, independent protection-layer credit or high service availability.
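The two review questions can be asked of the same small model. The sketch below is illustrative only: the channel layout, the 2-out-of-3 success rule and the cause list are assumptions made for the example, and all names are hypothetical.

```python
# Each channel lists the items it needs in order to work (assumed layout).
CHANNELS = {
    "channel_1": {"sensor_1", "power_feed_a", "cabinet_1"},
    "channel_2": {"sensor_2", "power_feed_a", "cabinet_1"},
    "channel_3": {"sensor_3", "power_feed_b", "cabinet_2"},
}
REQUIRED_HEALTHY = 2  # 2-out-of-3 success rule

# Each cause lists the items it can fail together (assumed examples).
CAUSES = {
    "miscalibrated_test_gauge": {"sensor_1", "sensor_2", "sensor_3"},
    "feed_a_breaker_trip": {"power_feed_a"},
}


def function_succeeds(failed_items):
    """True if enough channels remain with none of their items failed."""
    healthy = [name for name, needs in CHANNELS.items() if not needs & failed_items]
    return len(healthy) >= REQUIRED_HEALTHY


all_items = set().union(*CHANNELS.values())

# Question 1: can the function succeed if this item fails?
single_points = {item for item in all_items if not function_succeeds({item})}

# Question 2: can one cause make several items fail together and defeat the function?
common_causes = {
    cause
    for cause, items in CAUSES.items()
    if len(items) > 1 and not function_succeeds(items)
}

print(sorted(single_points))  # ['cabinet_1', 'power_feed_a']
print(sorted(common_causes))  # ['miscalibrated_test_gauge']
```

In this model no single sensor is a single point of failure, yet one calibration error defeats the function, which is why the single-point check alone does not justify a redundancy claim.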

Validation Evidence

Useful evidence includes architecture diagrams, dependency maps, failure-mode reviews, power and network one-line diagrams, cause-and-effect matrices, trip tests, failover tests, disaster-recovery tests, route-diversity records, bypass logs and maintenance procedures.

Validation should exercise the failure, not just inspect the drawing. Pull the route, remove the voter, fail the controller, isolate the utility, disable the primary service, simulate bad data, or rehearse the manual action where this can be done safely.

Common Mistakes

Do not count replicas without tracing their shared dependencies. Do not assume cloud, network, PLC, relay, valve or operator redundancy unless the required function can still complete after one failure. Do not hide a single point of failure inside a human procedure, shared credential, spare-part policy, test fixture or maintenance lockout.

A good single-point-of-failure review produces design actions: diversify the dependency, add a safe degraded mode, improve diagnostics, create independent routes, protect the final element, validate failover or document why the residual risk is accepted.
