Safety Interlock Bypass Management Case Study
Safety interlock bypass case study for exposure accounting, demand probability, stop-time margin, proof testing, RPN, and release evidence.
A safety interlock bypass is not a minor control-system state. It deliberately removes or weakens a protection function so maintenance, troubleshooting, setup, cleaning, or commissioning can be performed. That state may be necessary, but it must be engineered, visible, time-limited, authorized, and restored with evidence.
This case study follows an automated packaging cell where a guard-door interlock is bypassed during maintenance and remains bypassed after the line returns to production. The near miss does not involve a failed robot, a failed drive, or a failed sensor. It comes from a weak operating state: the control system allowed a credited safeguard to be unavailable without making the residual risk obvious enough to stop production.
The purpose is to show how an automation engineer should connect bypass status, exposure time, demand probability, stop-time margin, human response, proof-test evidence, and release criteria before returning a system to automatic operation.
Case Context
The system is a robotic case-packing and palletizing cell. A fenced area contains a six-axis robot, powered conveyor, carton erector, and pneumatic pusher. Operators normally clear minor jams from outside the guarded space. Maintenance technicians enter the cell through an interlocked gate after the robot and conveyor are brought to a safe state.
The credited interlock is gate switch GS-204. When the gate is opened in automatic mode, the safety controller removes servo enable, commands safe torque off on the robot and conveyor drives, vents the pusher air circuit through a monitored dump valve, and requires a deliberate reset outside the cell.
| Item | Value or condition |
|---|---|
| Normal production rate | $38\ \text{cases/min}$ |
| Guarded-cell access gate | GS-204 |
| Safety-controller scan and input filter | $40\ \text{ms}$ |
| Safety relay and logic delay | $35\ \text{ms}$ |
| Drive safe-torque-off delay | $55\ \text{ms}$ |
| Measured mechanical run-down | $370\ \text{ms}$ |
| Distance from gate threshold to closest hazard zone | $1.20\ \text{m}$ |
| Assumed approach speed for screening | $1.6\ \text{m/s}$ |
| Planned bypass duration | $0.75\ \text{h}$ |
| Maximum authorized bypass duration | $2.0\ \text{h}$ |
| Actual bypass duration found in logs | $9.5\ \text{h}$ |
| Jam-clearing demand rate while running | $0.18\ \text{demands/h}$ |
The numbers are simplified for a teaching case. A formal machinery-safety design would use the applicable legal and site-specific safety standard, validated stopping-time measurement, reach geometry, access frequency, performance-level or safety-integrity target, and documented risk assessment.
Event Sequence
Maintenance replaces a damaged guard-door switch bracket and aligns the gate actuator. To test repeatability, the technician requests a temporary bypass. The bypass is permitted for a controlled maintenance state: reduced speed, cell empty, maintenance owner present, visible bypass indication, no production release, and restoration before the next shift.
The sequence is:
- A supervisor authorizes a bypass for `GS-204` from 07:20 to 08:05.
- The technician aligns the actuator and confirms that the gate switch changes state.
- A production interruption on another line pulls the technician away before full interlock proof testing.
- The cell is restarted at reduced rate for a short trial.
- The line is then returned to full automatic production after a shift handover.
- At 16:45, an operator opens the gate to clear a crushed carton while the robot is between picks.
- The robot does not stop from gate opening because the bypass bit is still active.
- A second operator presses an emergency stop. No injury occurs.
- Engineering holds the line and downloads controller, HMI, alarm, and safety-controller logs.
The unsafe condition is not only that the gate was bypassed. The deeper failure is that production could continue while a credited protection function was unavailable, the bypass was not automatically expired, and the shift handover did not make the degraded state impossible to miss.
Bypass Exposure Accounting
The first calculation is simple: how long was the protection unavailable compared with the approved condition?
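Actual bypass duration:
$$t_{\text{actual}} = 9.5\ \text{h}$$
Planned duration:
$$t_{\text{plan}} = 0.75\ \text{h}$$
Maximum authorized duration:
$$t_{\text{max}} = 2.0\ \text{h}$$
The overrun relative to the permit is:
$$t_{\text{actual}} - t_{\text{max}} = 9.5 - 2.0 = 7.5\ \text{h}$$
The exposure multiplier relative to the planned maintenance task is:
$$\frac{t_{\text{actual}}}{t_{\text{plan}}} = \frac{9.5}{0.75} \approx 12.7$$
The exposure multiplier relative to the maximum authorized permit is:
$$\frac{t_{\text{actual}}}{t_{\text{max}}} = \frac{9.5}{2.0} = 4.75$$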
This result changes the engineering interpretation. A short, controlled maintenance bypass is a degraded state. A bypass left active through full production is a different operating mode and must be treated as an uncontrolled safety-function defeat.
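The exposure comparison can be reproduced with a few lines of Python. This is a minimal sketch that uses the durations from the case-context table; the function name `exposure_summary` and the dictionary keys are illustrative assumptions, not part of any site tool.

```python
def exposure_summary(actual_h: float, planned_h: float, max_authorized_h: float) -> dict:
    """Compare the actual bypass time with the planned and permitted durations."""
    return {
        # Hours the bypass stayed active beyond the maximum permit duration.
        "overrun_h": max(0.0, actual_h - max_authorized_h),
        # How many times longer than the planned maintenance task.
        "multiple_of_planned": actual_h / planned_h,
        # How many times longer than the maximum authorized permit.
        "multiple_of_permit": actual_h / max_authorized_h,
    }


# Values taken from the case-context table (hours).
summary = exposure_summary(actual_h=9.5, planned_h=0.75, max_authorized_h=2.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With the case values this gives an overrun of 7.5 h, about 12.7 times the planned duration, and 4.75 times the permit limit.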
Demand Probability During the Bypass
Exposure time matters because the interlock is only challenged when the protected condition occurs. Here, the relevant demand is an access or jam-clearing event that should have caused the gate interlock to stop hazardous motion.
Assume a simplified Poisson model for jam-clearing access demand while the line is running. The probability of at least one demand in time $t$ is:
$$P(N \ge 1) = 1 - e^{-\lambda t}$$
where:
- $\lambda=0.18\ \text{demands/h}$ is the observed demand rate during this product run;
- $t$ is the time the interlock is unavailable while production continues.
For the planned maintenance bypass:
$$P(N \ge 1) = 1 - e^{-(0.18)(0.75)} = 1 - e^{-0.135} \approx 0.126$$
So the planned window had about a $12.6\%$ chance of at least one demand, assuming the line was running during that entire window. In the actual maintenance plan, production should not have been released, so this is already conservative for the planned state.
For the maximum permit duration:
$$P(N \ge 1) = 1 - e^{-(0.18)(2.0)} = 1 - e^{-0.36} \approx 0.302$$
For the actual bypass duration:
$$P(N \ge 1) = 1 - e^{-(0.18)(9.5)} = 1 - e^{-1.71} \approx 0.819$$
The actual condition had about an $81.9\%$ chance of at least one demand if the demand-rate estimate is representative. The exact number is less important than the order of magnitude: leaving the bypass active through production converted a controlled maintenance condition into a highly likely challenge of an unavailable safety function.
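The three probabilities can be checked with a short script. This sketch assumes the same simplified Poisson model with a constant demand rate; the function name `prob_at_least_one_demand` is chosen here for illustration.

```python
import math


def prob_at_least_one_demand(rate_per_h: float, hours: float) -> float:
    """Poisson probability of one or more demands in the given window."""
    return 1.0 - math.exp(-rate_per_h * hours)


DEMAND_RATE_PER_H = 0.18  # observed jam-clearing demand rate from the case data

windows_h = {"planned": 0.75, "max_permit": 2.0, "actual": 9.5}
probabilities = {
    label: prob_at_least_one_demand(DEMAND_RATE_PER_H, hours)
    for label, hours in windows_h.items()
}

for label, p in probabilities.items():
    print(f"{label}: {p:.1%}")
```

The model treats demands as independent and the rate as constant over the shift. If jams cluster after a product change or a restart, the real probability over a short window could be higher than this screen suggests.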
Stop-Time Screening
The second calculation checks whether the normal interlock response provides enough distance margin for the simplified access geometry.
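The measured stop response after repair is the sum of the scan and input-filter time, the relay and logic delay, the drive safe-torque-off delay, and the mechanical run-down:
$$t_{\text{stop}} = t_{\text{scan}} + t_{\text{logic}} + t_{\text{STO}} + t_{\text{run-down}}$$
Substitute the measured and configured values:
$$t_{\text{stop}} = 40 + 35 + 55 + 370 = 500\ \text{ms} = 0.50\ \text{s}$$
Using an approach speed of $1.6\ \text{m/s}$, the simplified intrusion distance during stopping is:
$$d_{\text{stop}} = 1.6\ \text{m/s} \times 0.50\ \text{s} = 0.80\ \text{m}$$
The available distance from the gate threshold to the closest hazard zone is:
$$d_{\text{avail}} = 1.20\ \text{m}$$
The simplified margin is:
$$d_{\text{avail}} - d_{\text{stop}} = 1.20 - 0.80 = 0.40\ \text{m}$$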
With the interlock restored and the measured stopping time valid, the screen shows positive margin. That does not replace a formal safety-distance calculation, but it supports the engineering decision that the gate interlock can be credited only when the full chain is active and proof-tested.
During the bypass, the automatic stop is unavailable. The near miss depended on another operator seeing the event and pressing an emergency stop. The event review estimates that this manual response took about $1.8\ \text{s}$ from gate opening to emergency-stop actuation.
The distance a person could travel in that time is:
$$d_{\text{manual}} = 1.6\ \text{m/s} \times 1.8\ \text{s} = 2.88\ \text{m}$$
The margin relative to the same $1.20\ \text{m}$ access distance is:
$$1.20 - 2.88 = -1.68\ \text{m}$$
This negative margin is the engineering reason that “someone will notice and hit stop” cannot be credited as an equivalent safeguard. Human response may reduce consequence in some cases, but it is not the same protection layer as an engineered interlock with verified stopping performance.
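Both screens, the restored interlock and the manual emergency stop, use the same arithmetic and can be compared side by side. The sketch below assumes the simplified geometry of this case (constant approach speed, straight-line distance); the function names are illustrative and this is not a formal safety-distance calculation.

```python
def total_response_s(delays_ms) -> float:
    """Sum the chain delays (milliseconds) and return seconds."""
    return sum(delays_ms) / 1000.0


def distance_margin_m(available_m: float, approach_speed_m_s: float, response_s: float) -> float:
    """Available distance minus the distance travelled during the response time."""
    return available_m - approach_speed_m_s * response_s


AVAILABLE_M = 1.20        # gate threshold to closest hazard zone
APPROACH_SPEED_M_S = 1.6  # screening assumption from the case data

# Scan/filter, relay/logic, drive STO, mechanical run-down (ms).
interlock_response_s = total_response_s([40, 35, 55, 370])
interlock_margin_m = distance_margin_m(AVAILABLE_M, APPROACH_SPEED_M_S, interlock_response_s)

# Manual response estimated in the event review: gate opening to e-stop actuation.
manual_margin_m = distance_margin_m(AVAILABLE_M, APPROACH_SPEED_M_S, 1.8)

print(f"interlock response: {interlock_response_s:.2f} s, margin: {interlock_margin_m:+.2f} m")
print(f"manual e-stop margin: {manual_margin_m:+.2f} m")
```

The manual figure counts only the time to press the emergency stop. It does not add the stopping time that follows actuation, so the real shortfall would be larger than the value printed here.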
Evidence from Logs and Field Checks
The engineering review should separate direct evidence from assumptions.
| Evidence | Finding | Engineering meaning |
|---|---|---|
| Safety-controller event log | GS-204 bypass active from 07:20 to 16:50 | protection unavailable for the full production interval |
| HMI alarm history | one low-priority bypass banner acknowledged at 07:21 | status was visible once but not persistent enough |
| Production log | full-rate mode restored at 10:15 | bypass condition crossed from maintenance into production |
| Shift handover note | "gate switch aligned, trial OK" | no explicit proof-test or bypass-removed statement |
| Gate proof test after hold | switch input, logic output, drive STO, and air dump all operated | hardware chain can work when not bypassed |
| Stop-time measurement | $0.50\ \text{s}$ total measured response | restored chain has positive distance margin in the simplified screen |
| Permit record | permit expired at 09:20 without automatic escalation | administrative control did not force restoration |
The review does not need a speculative root cause to make the hold decision. The logs already show that the operating state violated the permit, the HMI did not force attention, and production release was possible with the interlock unavailable.
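The durations in the review can be derived directly from the logged timestamps rather than taken from memory. The following sketch assumes all timestamps fall on the same calendar day and uses a hypothetical helper, `hours_between`; the clock times are the ones listed in the evidence table.

```python
from datetime import datetime


def hours_between(start_hhmm: str, end_hhmm: str) -> float:
    """Elapsed hours between two same-day clock times given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end_hhmm, fmt) - datetime.strptime(start_hhmm, fmt)
    return delta.total_seconds() / 3600.0


# Timestamps from the safety-controller log, permit record, and production log.
bypass_active_h = hours_between("07:20", "16:50")       # bypass bit set to bypass bit cleared
beyond_permit_h = hours_between("09:20", "16:50")       # permit expiry to bypass cleared
full_rate_bypassed_h = hours_between("10:15", "16:50")  # full-rate mode with bypass active

print(f"bypass active: {bypass_active_h:.2f} h")
print(f"beyond permit expiry: {beyond_permit_h:.2f} h")
print(f"full-rate production while bypassed: {full_rate_bypassed_h:.2f} h")
```

Computing the intervals from the raw logs keeps the direct evidence separate from assumptions such as the demand rate.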
Failure Modes
The bypass event exposes several failure modes:
| Failure mode | Effect | Evidence needed |
|---|---|---|
| Bypass left active after maintenance | guard opening no longer stops hazardous motion | safety-controller bypass log and gate challenge test |
| Bypass indication acknowledged and forgotten | operator believes the cell is normal | HMI alarm history and display-state review |
| Permit expiry not enforced | degraded state continues beyond risk assessment | permit timestamp and controller comparison |
| Shift handover omits bypass state | next crew inherits hidden risk | handover log and supervisor interview |
| Reduced-speed maintenance state becomes full production | risk basis no longer matches actual operation | mode-history and production-rate trend |
| Proof test incomplete before release | interlock restoration assumed, not demonstrated | commissioning or maintenance test record |
These are automation and management-system failures as much as component failures. Replacing the switch bracket fixes only one part of the chain. The line should remain held until the bypass workflow itself is corrected.
Risk-Priority Screening
Use RPN only as a screening tool. It does not prove compliance or safety, but it helps compare the before-and-after control state.
Initial ratings:
- severity $S=9$: potential serious injury in a robot cell;
- occurrence $O=4$: bypasses occur occasionally during maintenance and setup;
- detection $D=6$: active bypass can be missed after acknowledgement and shift handover.
Initial RPN:
$$\text{RPN}_{\text{initial}} = S \times O \times D = 9 \times 4 \times 6 = 216$$
Corrective controls reduce occurrence and improve detection:
- bypass tied to a specific permit, owner, safety function, and mode;
- automatic expiry with escalation and production inhibit;
- persistent HMI indication that cannot be cleared while the bypass remains active;
- reduced-speed and hold-to-run requirements during authorized bypass;
- full proof test required before automatic production release;
- shift handover field that cannot be closed with active safety bypasses.
Revised screening ratings:
- severity $S=9$: the credible consequence is still serious;
- occurrence $O=2$: bypass use is less frequent and more constrained;
- detection $D=2$: active bypass is persistent, alarmed, and blocks production release.
Revised RPN:
$$\text{RPN}_{\text{revised}} = S \times O \times D = 9 \times 2 \times 2 = 36$$
The improvement factor is:
$$\frac{216}{36} = 6$$
The important point is not the exact RPN. The important point is that the corrective action does not pretend severity disappeared. It reduces the chance that a bypass enters production and makes the degraded state much easier to detect before exposure accumulates.
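The before-and-after screening is a simple product of the three ratings. A minimal sketch, using the ratings stated in this section and an illustrative `rpn` helper:

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number as the product of the three screening ratings."""
    return severity * occurrence * detection


initial_rpn = rpn(severity=9, occurrence=4, detection=6)
revised_rpn = rpn(severity=9, occurrence=2, detection=2)
improvement_factor = initial_rpn / revised_rpn

print(f"initial RPN: {initial_rpn}")
print(f"revised RPN: {revised_rpn}")
print(f"improvement factor: {improvement_factor:.1f}")
```

Severity is held at 9 in both calls, so the whole reduction comes from the occurrence and detection ratings.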
Corrective Engineering Decision
The line should not restart just because the gate switch now changes state. The release decision should require evidence at three levels: device, logic, and operating workflow.
Required corrective actions:
- Restore `GS-204` and remove all bypass bits from production logic.
- Perform a full chain proof test: gate input, safety-controller logic, robot STO, conveyor stop, pneumatic dump, reset, and fault indication.
- Measure and record stopping time under the actual machine configuration.
- Lock production mode when any credited safety interlock is bypassed unless a formally approved reduced-risk mode is selected.
- Add automatic bypass expiry with escalation before expiry and production inhibit after expiry.
- Make bypass status persistent on HMI, stack light, maintenance panel, and shift handover report.
- Require an owner, reason, start time, expiry time, affected safety function, compensating controls, and restoration test for every bypass.
- Review the event in FMEA or risk-review format so that similar guard, light-curtain, enabling-device, and emergency-stop bypasses are controlled consistently.
The release recommendation is:
Do not release the cell to automatic production until the bypass has been removed, the complete safety function has passed proof testing, stop-time evidence supports the access geometry, active-bypass states inhibit production release, and handover displays make any remaining degraded state unavoidable.
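The release recommendation can be expressed as an explicit checklist that returns every unmet condition instead of a single yes or no. This is a sketch under stated assumptions: the `ReleaseEvidence` fields and the `release_blockers` function are hypothetical names that mirror the five conditions in the recommendation, and the as-found values reflect the state described in this case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseEvidence:
    bypass_removed: bool
    proof_test_passed: bool
    stop_time_margin_m: float
    active_bypass_inhibits_production: bool
    handover_shows_degraded_state: bool


def release_blockers(ev: ReleaseEvidence) -> list:
    """Return the reasons the cell may not be released; an empty list means release is supported."""
    blockers = []
    if not ev.bypass_removed:
        blockers.append("bypass still active")
    if not ev.proof_test_passed:
        blockers.append("full-chain proof test not passed")
    if ev.stop_time_margin_m <= 0:
        blockers.append("stop-time margin not positive")
    if not ev.active_bypass_inhibits_production:
        blockers.append("active bypass does not inhibit production")
    if not ev.handover_shows_degraded_state:
        blockers.append("handover does not display degraded state")
    return blockers


# State at the time of the near miss, using the 0.40 m margin from the stop-time screen.
as_found = ReleaseEvidence(False, False, 0.40, False, False)
# State after all corrective actions are verified.
corrected = ReleaseEvidence(True, True, 0.40, True, True)

print(release_blockers(as_found))
print(release_blockers(corrected))
```

Returning the list of blockers, rather than a boolean, keeps the hold decision traceable to specific missing evidence.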
What Good Bypass Management Looks Like
Good bypass management treats bypasses as controlled operating states, not informal maintenance conveniences.
A defensible bypass record should state:
- the safety function being bypassed;
- the hazard and consequence controlled by that function;
- why the bypass is technically necessary;
- the allowed mode, speed, staffing, and area access;
- the start time, expiry time, and owner;
- compensating controls and their limits;
- who may approve extension;
- what proof test is required before release;
- what evidence is stored after restoration.
The control system should support the workflow. If the HMI only shows a small acknowledged alarm, the design is relying on memory. Better designs use persistent visual status, event logging, forced handover visibility, bypass-summary screens, automatic expiry, mode restrictions, and production inhibits for bypasses on credited safeguards.
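One way to picture this support is a bypass record that carries its own expiry and allowed mode, with a mode-permission check that refuses production while any bypass is active. The sketch below is illustrative only: `BypassRecord`, `mode_permitted`, and the mode names `AUTO` and `MAINT_REDUCED_SPEED` are assumptions, the calendar date is a placeholder, and real enforcement would belong in validated safety logic rather than a script.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BypassRecord:
    safety_function: str
    owner: str
    reason: str
    start: datetime
    expiry: datetime
    allowed_mode: str  # the only operating mode permitted while this bypass is active

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expiry


def mode_permitted(active_bypasses, requested_mode: str, now: datetime) -> bool:
    """Allow a mode only if every active bypass is unexpired and permits that mode."""
    for bypass in active_bypasses:
        if bypass.is_expired(now) or requested_mode != bypass.allowed_mode:
            return False
    return True


day = (2024, 1, 1)  # placeholder date; only the clock times come from the case
gs204 = BypassRecord(
    safety_function="GS-204 gate interlock",
    owner="maintenance technician",
    reason="switch bracket replacement and actuator alignment",
    start=datetime(*day, 7, 20),
    expiry=datetime(*day, 9, 20),  # maximum authorized duration of 2.0 h
    allowed_mode="MAINT_REDUCED_SPEED",
)

print(mode_permitted([gs204], "MAINT_REDUCED_SPEED", datetime(*day, 8, 0)))
print(mode_permitted([gs204], "AUTO", datetime(*day, 10, 15)))
print(mode_permitted([], "AUTO", datetime(*day, 17, 30)))
```

Under this rule, the 10:15 request for full-rate automatic production would have been refused twice over: the bypass had expired and automatic mode was never its allowed mode.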
Lessons for Automation Engineers
Safety interlock bypass management is a design problem, not only a procedure problem. Procedures are necessary, but the automation system must make degraded states visible, bounded, and difficult to normalize.
Transferable lessons:
- A bypass changes the safety architecture. Treat it as a temporary operating mode with its own risk basis.
- Exposure time must be measured. A forgotten bypass is not equivalent to a planned short maintenance task.
- Demand probability can rise quickly when production continues with a safety function unavailable.
- Manual emergency response is not an equivalent replacement for a verified interlock chain.
- Stop-time evidence must cover the full chain from sensor to safe state, not only the input bit.
- Bypass alarms must remain visible until the condition is cleared.
- Shift handover should include active bypasses, inhibited safeguards, open permits, and restoration tests.
- RPN can help prioritize corrective actions, but release requires physical proof testing and operational evidence.
The engineering habit is to ask a blunt question before restart:
If the bypass were still active, would the system make that fact impossible to miss and impossible to carry into production without deliberate approval?
If the answer is no, the bypass management system is not yet engineered well enough.