Safety Interlock Bypass Management Case Study

A safety interlock bypass case study covering exposure accounting, demand probability, stop-time margin, proof testing, RPN screening, and release evidence.

A safety interlock bypass is not a minor control-system state. It deliberately removes or weakens a protection function so maintenance, troubleshooting, setup, cleaning, or commissioning can be performed. That state may be necessary, but it must be engineered, visible, time-limited, authorized, and restored with evidence.

This case study follows an automated packaging cell where a guard-door interlock is bypassed during maintenance and remains bypassed after the line returns to production. The near miss does not involve a failed robot, a failed drive, or a failed sensor. It comes from a weak operating state: the control system allowed a credited safeguard to be unavailable without making the residual risk obvious enough to stop production.

The purpose is to show how an automation engineer should connect bypass status, exposure time, demand probability, stop-time margin, human response, proof-test evidence, and release criteria before returning a system to automatic operation.

Case Context

The system is a robotic case-packing and palletizing cell. A fenced area contains a six-axis robot, powered conveyor, carton erector, and pneumatic pusher. Operators normally clear minor jams from outside the guarded space. Maintenance technicians enter the cell through an interlocked gate after the robot and conveyor are brought to a safe state.

The credited interlock is gate switch GS-204. When the gate is opened in automatic mode, the safety controller removes servo enable, commands safe torque off on the robot and conveyor drives, vents the pusher air circuit through a monitored dump valve, and requires a deliberate reset outside the cell.
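
The trip-and-latch behavior described above can be sketched as a toy model. This is an illustration only: a real implementation runs in a safety-rated controller, and the class name, the `bypass_active` flag, and the output names are hypothetical labels chosen for this example, not taken from the cell's actual program.

```python
class GateInterlock:
    """Toy model of the GS-204 trip latch (illustrative, not safety-rated code)."""

    def __init__(self, bypass_active=False):
        self.bypass_active = bypass_active  # hypothetical bypass bit
        self.tripped = False

    def scan(self, gate_closed):
        # An open gate latches the trip unless the bypass bit masks the input.
        if not gate_closed and not self.bypass_active:
            self.tripped = True
        return self.outputs()

    def reset(self, gate_closed):
        # Deliberate reset from outside the cell; refused while the gate is open.
        if gate_closed:
            self.tripped = False
        return self.outputs()

    def outputs(self):
        run_permitted = not self.tripped
        return {
            "servo_enable": run_permitted,
            "sto_released": run_permitted,   # False means safe torque off is commanded
            "pusher_air_on": run_permitted,  # False means the dump valve vents the circuit
        }


normal = GateInterlock()
print(normal.scan(gate_closed=False))    # trip latched, all outputs False
print(normal.scan(gate_closed=True))     # still latched until a deliberate reset

bypassed = GateInterlock(bypass_active=True)
print(bypassed.scan(gate_closed=False))  # outputs stay True: the gate no longer stops motion
```

The last call shows why a bypass is a change to the safety architecture: the gate input is still read, but it can no longer reach the outputs.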

| Item | Value or condition |
|---|---|
| Normal production rate | 38\ \text{cases/min} |
| Guarded-cell access gate | GS-204 |
| Safety-controller scan and input filter | 40\ \text{ms} |
| Safety relay and logic delay | 35\ \text{ms} |
| Drive safe-torque-off delay | 55\ \text{ms} |
| Measured mechanical run-down | 370\ \text{ms} |
| Distance from gate threshold to closest hazard zone | 1.20\ \text{m} |
| Assumed approach speed for screening | 1.6\ \text{m/s} |
| Planned bypass duration | 0.75\ \text{h} |
| Maximum authorized bypass duration | 2.0\ \text{h} |
| Actual bypass duration found in logs | 9.5\ \text{h} |
| Jam-clearing demand rate while running | 0.18\ \text{demands/h} |

The numbers are simplified for a teaching case. A formal machinery-safety design would use the applicable legal and site-specific safety standard, validated stopping-time measurement, reach geometry, access frequency, performance-level or safety-integrity target, and documented risk assessment.

Event Sequence

Maintenance replaces a damaged guard-door switch bracket and aligns the gate actuator. To test repeatability, the technician requests a temporary bypass. The bypass is permitted for a controlled maintenance state: reduced speed, cell empty, maintenance owner present, visible bypass indication, no production release, and restoration before the next shift.

The sequence is:

  1. A supervisor authorizes a bypass for GS-204 from 07:20 to 08:05.
  2. The technician aligns the actuator and confirms that the gate switch changes state.
  3. A production interruption on another line pulls the technician away before full interlock proof testing.
  4. The cell is restarted at reduced rate for a short trial.
  5. The line is then returned to full automatic production after a shift handover.
  6. At 16:45, an operator opens the gate to clear a crushed carton while the robot is between picks.
  7. The robot does not stop when the gate opens because the bypass bit is still active.
  8. A second operator presses an emergency stop. No injury occurs.
  9. Engineering holds the line and downloads controller, HMI, alarm, and safety-controller logs.

The unsafe condition is not only that the gate was bypassed. The deeper failure is that production could continue while a credited protection function was unavailable, the bypass was not automatically expired, and the shift handover did not make the degraded state impossible to miss.

Bypass Exposure Accounting

The first calculation is simple: how long was the protection unavailable compared with the approved condition?

Actual bypass duration:

t_{actual}=16{:}50-07{:}20=9.5\ \text{h}

Planned duration:

t_{planned}=0.75\ \text{h}

Maximum authorized duration:

t_{permit}=2.0\ \text{h}

The overrun relative to the permit is:

t_{overrun}=t_{actual}-t_{permit}=9.5-2.0=7.5\ \text{h}

The exposure multiplier relative to the planned maintenance task is:

\displaystyle \frac{t_{actual}}{t_{planned}}=\frac{9.5}{0.75}=12.7

The exposure multiplier relative to the maximum authorized permit is:

\displaystyle \frac{t_{actual}}{t_{permit}}=\frac{9.5}{2.0}=4.75

This result changes the engineering interpretation. A short, controlled maintenance bypass is a degraded state. A bypass left active through full production is a different operating mode and must be treated as an uncontrolled safety-function defeat.
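
The exposure accounting above is easy to reproduce from log timestamps. The following is a minimal sketch, assuming both timestamps fall on the same day and use an `HH:MM` format; the function name and dictionary keys are illustrative.

```python
from datetime import datetime


def bypass_exposure(start, end, planned_h, permit_h):
    """Return actual bypass duration, permit overrun, and exposure multipliers."""
    fmt = "%H:%M"  # assumes same-day timestamps
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    actual_h = delta.total_seconds() / 3600.0
    return {
        "actual_h": actual_h,
        "overrun_h": max(0.0, actual_h - permit_h),
        "vs_planned": actual_h / planned_h,
        "vs_permit": actual_h / permit_h,
    }


exposure = bypass_exposure("07:20", "16:50", planned_h=0.75, permit_h=2.0)
for key, value in exposure.items():
    print(f"{key}: {value:.2f}")
```

Computing the ratios from the controller log rather than from the permit form keeps the review anchored to what the system actually did.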

Demand Probability During the Bypass

Exposure time matters because the interlock is only challenged when the protected condition occurs. Here, the relevant demand is an access or jam-clearing event that should have caused the gate interlock to stop hazardous motion.

Assume a simplified Poisson model for jam-clearing access demand while the line is running:

P(\text{at least one demand})=1-e^{-\lambda t}

where:

  • \lambda=0.18\ \text{demands/h} is the observed demand rate during this product run;
  • t is the time the interlock is unavailable while production continues.

For the planned maintenance bypass:

P_{planned}=1-e^{-0.18(0.75)}
P_{planned}=1-e^{-0.135}=0.126

So the planned window had about a 12.6\% chance of at least one demand, assuming the line was running during that entire window. In the actual maintenance plan, production should not have been released, so this is already conservative for the planned state.

For the maximum permit duration:

P_{permit}=1-e^{-0.18(2.0)}
P_{permit}=1-e^{-0.36}=0.302

For the actual bypass duration:

P_{actual}=1-e^{-0.18(9.5)}
P_{actual}=1-e^{-1.71}=0.819

The actual condition had about an 81.9\% chance of at least one demand if the demand-rate estimate is representative. Treating the full 9.5\ \text{h} as running time overstates the figure somewhat, because the line did not return to full-rate production until 10:15, but the conclusion does not depend on the exact number. Leaving the bypass active through production converted a controlled maintenance condition into a highly likely challenge of an unavailable safety function.
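
The three probabilities can be checked with a few lines of code. This sketch applies the same simplified Poisson model with the case-study demand rate; the function name is illustrative.

```python
import math


def p_at_least_one_demand(rate_per_h, hours):
    """Probability of one or more demands in the window under a Poisson model."""
    return 1.0 - math.exp(-rate_per_h * hours)


DEMAND_RATE = 0.18  # demands per hour, from the case data

for label, hours in [("planned", 0.75), ("permit", 2.0), ("actual", 9.5)]:
    print(f"{label}: {p_at_least_one_demand(DEMAND_RATE, hours):.3f}")
```

Rerunning the function with a different rate or window is a quick way to test how sensitive the conclusion is to the demand-rate estimate.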

Stop-Time Screening

The second calculation checks whether the normal interlock response provides enough distance margin for the simplified access geometry.

The measured stop response after repair is:

T_{stop}=T_{input}+T_{logic}+T_{drive}+T_{run-down}

Substitute the measured and configured values:

T_{stop}=0.040+0.035+0.055+0.370=0.500\ \text{s}

Using an approach speed of 1.6\ \text{m/s}, the simplified intrusion distance during stopping is:

S=vT_{stop}
S=1.6(0.500)=0.80\ \text{m}

The available distance from the gate threshold to the closest hazard zone is:

S_{available}=1.20\ \text{m}

The simplified margin is:

S_{margin}=S_{available}-S=1.20-0.80=0.40\ \text{m}

With the interlock restored and the measured stopping time valid, the screen shows positive margin. That does not replace a formal safety-distance calculation, but it supports the engineering decision that the gate interlock can be credited only when the full chain is active and proof-tested.

During the bypass, the automatic stop is unavailable. The near miss depended on another operator seeing the event and pressing an emergency stop. The event review estimates that this manual response took about 1.8\ \text{s} from gate opening to emergency-stop actuation.

The distance a person could travel in that time is:

S_{manual}=1.6(1.8)=2.88\ \text{m}

The margin relative to the same 1.20\ \text{m} access distance is:

S_{manual\ margin}=1.20-2.88=-1.68\ \text{m}

This negative margin is the engineering reason that “someone will notice and hit stop” cannot be credited as an equivalent safeguard. Human response may reduce consequence in some cases, but it is not the same protection layer as an engineered interlock with verified stopping performance.
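
The two margins can be compared side by side. The sketch below uses the same simplified screen (distance equals approach speed times response time) and is not a substitute for a formal safety-distance calculation; the function and variable names are illustrative.

```python
def intrusion_margin(available_m, approach_m_s, response_s):
    """Distance left between the person and the hazard when motion stops."""
    return available_m - approach_m_s * response_s


AVAILABLE_M = 1.20   # gate threshold to closest hazard zone
APPROACH_M_S = 1.6   # screening approach speed

# Interlock chain: input filter + logic + drive STO + mechanical run-down.
chain_s = 0.040 + 0.035 + 0.055 + 0.370
interlock_margin = intrusion_margin(AVAILABLE_M, APPROACH_M_S, chain_s)

# Manual response estimated in the event review.
manual_margin = intrusion_margin(AVAILABLE_M, APPROACH_M_S, 1.8)

print(f"interlock margin: {interlock_margin:+.2f} m")
print(f"manual margin:    {manual_margin:+.2f} m")
```

A negative result means the person could reach the hazard zone before motion stops under the stated assumptions.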

Evidence from Logs and Field Checks

The engineering review should separate direct evidence from assumptions.

| Evidence | Finding | Engineering meaning |
|---|---|---|
| Safety-controller event log | GS-204 bypass active from 07:20 to 16:50 | protection unavailable for the full production interval |
| HMI alarm history | one low-priority bypass banner acknowledged at 07:21 | status was visible once but not persistent enough |
| Production log | full-rate mode restored at 10:15 | bypass condition crossed from maintenance into production |
| Shift handover note | “gate switch aligned, trial OK” | no explicit proof-test or bypass-removed statement |
| Gate proof test after hold | switch input, logic output, drive STO, and air dump all operated | hardware chain can work when not bypassed |
| Stop-time measurement | 0.50\ \text{s} total measured response | restored chain has positive distance margin in the simplified screen |
| Permit record | permit expired at 09:20 without automatic escalation | administrative control did not force restoration |

The review does not need a speculative root cause to make the hold decision. The logs already show that the operating state violated the permit, the HMI did not force attention, and production release was possible with the interlock unavailable.
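
The cross-check between log sources can be expressed as a short script. This is a minimal sketch using the timestamps from the evidence table, assuming same-day `HH:MM` values; the helper and key names are illustrative and do not correspond to any particular historian or HMI export format.

```python
from datetime import datetime


def minutes_between(start, end):
    """Whole minutes from start to end for same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


bypass_start, bypass_end = "07:20", "16:50"  # safety-controller event log
permit_expiry = "09:20"                      # permit record
full_rate_start = "10:15"                    # production log

findings = {
    "bypass_minutes_past_permit": max(0, minutes_between(permit_expiry, bypass_end)),
    "bypass_minutes_at_full_rate": max(0, minutes_between(full_rate_start, bypass_end)),
    "full_rate_started_under_bypass": (
        minutes_between(bypass_start, full_rate_start) >= 0
        and minutes_between(full_rate_start, bypass_end) > 0
    ),
}

for name, value in findings.items():
    print(f"{name}: {value}")
```

Each finding comes from comparing two independent records, which is what makes it direct evidence rather than an assumption.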

Failure Modes

The bypass event exposes several failure modes:

| Failure mode | Effect | Evidence needed |
|---|---|---|
| Bypass left active after maintenance | guard opening no longer stops hazardous motion | safety-controller bypass log and gate challenge test |
| Bypass indication acknowledged and forgotten | operator believes the cell is normal | HMI alarm history and display-state review |
| Permit expiry not enforced | degraded state continues beyond risk assessment | permit timestamp and controller comparison |
| Shift handover omits bypass state | next crew inherits hidden risk | handover log and supervisor interview |
| Reduced-speed maintenance state becomes full production | risk basis no longer matches actual operation | mode-history and production-rate trend |
| Proof test incomplete before release | interlock restoration assumed, not demonstrated | commissioning or maintenance test record |

These are automation and management-system failures as much as component failures. Replacing the switch bracket fixes only one part of the chain. The line should remain held until the bypass workflow itself is corrected.

Risk-Priority Screening

Use the risk priority number (RPN) only as a screening tool. It does not prove compliance or safety, but it helps compare the before-and-after control state.

Initial ratings:

  • severity S=9: potential serious injury in a robot cell;
  • occurrence O=4: bypasses occur occasionally during maintenance and setup;
  • detection D=6: active bypass can be missed after acknowledgement and shift handover.

Initial RPN:

RPN_{initial}=SOD=9(4)(6)=216

Corrective controls reduce occurrence and improve detection:

  • bypass tied to a specific permit, owner, safety function, and mode;
  • automatic expiry with escalation and production inhibit;
  • persistent HMI indication that cannot be cleared while the bypass remains active;
  • reduced-speed and hold-to-run requirements during authorized bypass;
  • full proof test required before automatic production release;
  • shift handover field that cannot be closed with active safety bypasses.

Revised screening ratings:

  • severity S=9: the credible consequence is still serious;
  • occurrence O=2: bypass use is less frequent and more constrained;
  • detection D=2: active bypass is persistent, alarmed, and blocks production release.

Revised RPN:

RPN_{revised}=9(2)(2)=36

The improvement factor is:

\displaystyle \frac{216}{36}=6

The important point is not the exact RPN. The important point is that the corrective action does not pretend severity disappeared. It reduces the chance that a bypass enters production and makes the degraded state much easier to detect before exposure accumulates.
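
The before-and-after comparison can be reproduced with a small helper. This sketch assumes the common 1 to 10 rating scale for severity, occurrence, and detection, which the case data does not state explicitly; the function name is illustrative.

```python
def rpn(severity, occurrence, detection):
    """Risk priority number as the product of three screening ratings."""
    ratings = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, value in ratings:
        # Assumed 1-10 scale; adjust if the site FMEA procedure differs.
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating {value} is outside the assumed 1-10 scale")
    return severity * occurrence * detection


rpn_initial = rpn(9, 4, 6)
rpn_revised = rpn(9, 2, 2)
print(rpn_initial, rpn_revised, rpn_initial / rpn_revised)
```

Severity is passed unchanged in both calls, which mirrors the point that the corrective actions target occurrence and detection only.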

Corrective Engineering Decision

The line should not restart just because the gate switch now changes state. The release decision should require evidence at three levels: device, logic, and operating workflow.

Required corrective actions:

  1. Restore GS-204 and remove all bypass bits from production logic.
  2. Perform a full chain proof test: gate input, safety-controller logic, robot STO, conveyor stop, pneumatic dump, reset, and fault indication.
  3. Measure and record stopping time under the actual machine configuration.
  4. Lock production mode when any credited safety interlock is bypassed unless a formally approved reduced-risk mode is selected.
  5. Add automatic bypass expiry with escalation before expiry and production inhibit after expiry.
  6. Make bypass status persistent on HMI, stack light, maintenance panel, and shift handover report.
  7. Require an owner, reason, start time, expiry time, affected safety function, compensating controls, and restoration test for every bypass.
  8. Review the event in FMEA or risk-review format so that similar guard, light-curtain, enabling-device, and emergency-stop bypasses are controlled consistently.

The release recommendation is:

Do not release the cell to automatic production until the bypass has been removed, the complete safety function has passed proof testing, stop-time evidence supports the access geometry, active-bypass states inhibit production release, and handover displays make any remaining degraded state unavoidable.
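
The release recommendation can be read as a checklist in which every item must be true. The sketch below encodes it that way; the class, the field names, and the example snapshot are hypothetical and only illustrate the all-conditions-required logic.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReleaseEvidence:
    bypass_removed: bool
    proof_test_passed: bool
    stop_time_margin_positive: bool
    bypass_inhibits_production: bool
    handover_shows_degraded_state: bool


def release_blockers(evidence):
    """Names of the release criteria that are not yet satisfied."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


def may_release(evidence):
    return not release_blockers(evidence)


# Hypothetical snapshot: hardware restored, workflow controls not yet changed.
hold_state = ReleaseEvidence(
    bypass_removed=True,
    proof_test_passed=True,
    stop_time_margin_positive=True,
    bypass_inhibits_production=False,
    handover_shows_degraded_state=False,
)
print(may_release(hold_state))
print(release_blockers(hold_state))
```

Listing the blockers by name, rather than returning a single pass or fail, keeps the hold decision traceable to specific missing evidence.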

What Good Bypass Management Looks Like

Good bypass management treats bypasses as controlled operating states, not informal maintenance conveniences.

A defensible bypass record should state:

  • the safety function being bypassed;
  • the hazard and consequence controlled by that function;
  • why the bypass is technically necessary;
  • the allowed mode, speed, staffing, and area access;
  • the start time, expiry time, and owner;
  • compensating controls and their limits;
  • who may approve extension;
  • what proof test is required before release;
  • what evidence is stored after restoration.

The control system should support the workflow. If the HMI only shows a small acknowledged alarm, the design is relying on memory. Better designs use persistent visual status, event logging, forced handover visibility, bypass-summary screens, automatic expiry, mode restrictions, and production inhibits for bypasses on credited safeguards.
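
A bypass record with automatic expiry and a mode restriction might be modeled as follows. This is a sketch under stated assumptions: the field names, the mode strings, and the placeholder date are hypothetical, and a real production inhibit would be enforced in the safety controller rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BypassRecord:
    safety_function: str
    owner: str
    reason: str
    allowed_mode: str
    start: datetime
    expiry: datetime
    restored_and_proof_tested: bool = False

    def is_active(self):
        # A bypass stays active until restoration is proven, not merely assumed.
        return not self.restored_and_proof_tested

    def is_expired(self, now):
        return self.is_active() and now >= self.expiry


def mode_permitted(records, requested_mode, now):
    """Allow a mode only if every active bypass is unexpired and permits it."""
    for record in records:
        if not record.is_active():
            continue
        if record.is_expired(now) or requested_mode != record.allowed_mode:
            return False
    return True


day = datetime(2024, 1, 1)  # placeholder date; the case gives times only
gs204 = BypassRecord(
    safety_function="GS-204 gate interlock",
    owner="maintenance technician",
    reason="switch bracket replacement and actuator alignment",
    allowed_mode="maintenance_reduced_speed",
    start=day.replace(hour=7, minute=20),
    expiry=day.replace(hour=9, minute=20),
)
print(mode_permitted([gs204], "maintenance_reduced_speed", day.replace(hour=8)))
print(mode_permitted([gs204], "automatic", day.replace(hour=10, minute=15)))
```

With a check like this in the mode-selection path, the 10:15 return to full-rate production in the case would have been refused until the record was closed with a restoration test.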

Lessons for Automation Engineers

Safety interlock bypass management is a design problem, not only a procedure problem. Procedures are necessary, but the automation system must make degraded states visible, bounded, and difficult to normalize.

Transferable lessons:

  • A bypass changes the safety architecture. Treat it as a temporary operating mode with its own risk basis.
  • Exposure time must be measured. A forgotten bypass is not equivalent to a planned short maintenance task.
  • Demand probability can rise quickly when production continues with a safety function unavailable.
  • Manual emergency response is not an equivalent replacement for a verified interlock chain.
  • Stop-time evidence must cover the full chain from sensor to safe state, not only the input bit.
  • Bypass alarms must remain visible until the condition is cleared.
  • Shift handover should include active bypasses, inhibited safeguards, open permits, and restoration tests.
  • RPN can help prioritize corrective actions, but release requires physical proof testing and operational evidence.

The engineering habit is to ask a blunt question before restart:

If the bypass were still active, would the system make that fact impossible to miss and impossible to carry into production without deliberate approval?

If the answer is no, the bypass management system is not yet engineered well enough.
