Observer-Based Fault Detection Project
Automation and control project for designing a model-based residual monitor with state observer prediction, residual thresholds, fault-injection validation, action boundaries, and commissioning evidence.
This project produces a commissioning package for an observer-based fault-detection monitor. The goal is not to add an alarm because a model is available. The goal is to prove that a model-based residual can detect a meaningful fault early enough, with an acceptable false-alarm rate, and with a response action that is safe for the controlled plant.
The example uses a jacketed tank temperature loop, but the workflow applies to motor drives, motion axes, pumps, HVAC loops, robotics, battery systems, thermal equipment, and other controlled systems where a model prediction can be compared with measured behaviour.
Project Objective
Design and validate a residual monitor for a controlled plant. The monitor compares a one-step model prediction with the measured output and raises an advisory alarm when the mismatch is too large for too long.
The project deliverable is an engineering package containing:
- control-loop boundary and monitored fault definition;
- discrete plant model and units;
- observer update rule;
- residual definition;
- threshold calculation;
- persistence rule for alarms;
- action boundary and fail-safe behaviour;
- fault-injection validation matrix;
- commissioning checklist and handover record.
The monitor is advisory in this project. It may alert operators, freeze setpoint ramps, and request inspection. It may not directly trip the plant or override credited protection without a separate safety case.
System Boundary
The controlled system is a jacketed mixing tank temperature loop.
| Item | Project value |
|---|---|
| Controlled variable | Tank outlet temperature deviation, x in deg C from nominal. |
| Measurement | RTD transmitter reading, y in deg C deviation. |
| Manipulated variable | Steam-valve command deviation, u in percent open from nominal. |
| Existing controller | Digital PI controller. |
| Monitor role | Independent residual alarm running beside the controller. |
| Sample period | $T_s = 10\ \text{s}$. |
| Candidate faults | Valve leakage, valve stiction, transmitter bias, missing steam-pressure disturbance, model drift. |
| Excluded functions | Safety trip, high-temperature interlock, final-element shutdown. |
The monitor does not replace the controller, operator, interlock, or process-safety system. It adds evidence that the measured plant is no longer behaving like the model used for normal control.
Acceptance Criteria
Use these project criteria.
| Requirement | Acceptance value |
|---|---|
| Detect persistent $2.0\ ^\circ\text{C}$ equivalent output mismatch | within $30\ \text{s}$ |
| Nuisance alarm rate during normal validation run | no alarm in $8\ \text{h}$ |
| Alarm action | advisory plus setpoint-ramp freeze |
| Automatic trip authority | not permitted in this release |
| Required validation modes | normal ramps, load disturbance, sensor bias, actuator fault, communication delay |
| Handover evidence | trend plots, residual statistics, fault-injection results, action-boundary signoff |
These values are not universal. A reactor, boiler, flight-control surface, medical device, or power-system protection function would need stronger evidence, independent fallback, and formal safety review.
Discrete Model
Use a scalar deviation model:

$$x_{k+1} = a\,x_k + b\,u_k + w_k$$

$$y_k = x_k + v_k$$
where:
- x_k is the modelled temperature deviation at sample k;
- u_k is the steam-valve command deviation;
- y_k is the measured temperature deviation;
- w_k is unmodelled process variation;
- v_k is measurement noise.
Use commissioned model parameters:

$$a = 0.92, \qquad b = 0.18$$
The value a=0.92 means the deviation decays slowly from one 10 s sample to the next when valve command is unchanged. The value b=0.18 means a positive valve-command deviation increases predicted temperature deviation at the next sample.
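The meaning of a and b can be checked numerically. The sketch below is a minimal illustration, assuming the noise-free form of the scalar model (next deviation equals a times the current deviation plus b times the valve deviation). The function names and the reading of b as deg C per percent of valve deviation are assumptions drawn from the units in the system-boundary table, not part of the commissioned package.

```python
import math

A = 0.92    # state coefficient per sample (from the text)
B = 0.18    # input coefficient, assumed deg C per percent valve deviation
TS = 10.0   # sample period in seconds (from the text)


def step(x, u, a=A, b=B):
    """Advance the noise-free deviation model by one sample."""
    return a * x + b * u


def time_constant(a=A, ts=TS):
    """Equivalent continuous first-order time constant in seconds."""
    return -ts / math.log(a)


def steady_state_gain(a=A, b=B):
    """Final temperature deviation per unit of sustained valve deviation."""
    return b / (1.0 - a)


if __name__ == "__main__":
    x = 1.0
    for _ in range(3):
        x = step(x, 0.0)          # valve command unchanged
    print(f"deviation after 3 samples: {x:.4f} deg C")
    print(f"time constant: {time_constant():.1f} s")
    print(f"steady-state gain: {steady_state_gain():.2f} deg C per percent")
```

With these parameters the equivalent time constant is about 120 s and a sustained 1 percent valve deviation settles near 2.25 deg C, which is a quick sanity check against step-test trends before the model is used for residual generation.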
Engineering Comment
This is a local model. It is valid only near the operating point used for identification, with similar flow rate, jacket condition, steam pressure, product properties, transmitter damping, and controller scan timing. If the plant changes operating mode, the monitor should either use a different model or declare that residual interpretation is not valid.
Observer and Residual
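Predict the next state before using the new measurement:

$$\hat{x}_{k|k-1} = a\,\hat{x}_{k-1|k-1} + b\,u_{k-1}$$

Define the pre-fit residual:

$$r_k = y_k - \hat{x}_{k|k-1}$$

Apply a simple observer correction, where $L$ is the observer gain:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + L\,r_k$$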
Use:
The residual is computed before correction. That matters because the correction can partially hide a persistent mismatch after the observer adapts.
Engineering Comment
A high observer gain follows measurements quickly but can hide faults and amplify noise. A low observer gain keeps the model independent for longer but may be slow to track legitimate operating changes. The gain must be validated against the alarm purpose, not chosen only for a smooth estimate.
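The predict, residual, correct sequence and the gain trade-off described above can be sketched as a small class. This is an illustration under stated assumptions: the class name `ScalarObserver` is hypothetical, and the gain values 0.3 and 0.9 are placeholders chosen only to show the effect, not commissioned values.

```python
from dataclasses import dataclass


@dataclass
class ScalarObserver:
    a: float = 0.92      # from the text
    b: float = 0.18      # from the text
    gain: float = 0.3    # ASSUMED placeholder, not a commissioned value
    x_hat: float = 0.0   # current state estimate (deg C deviation)

    def update(self, u_prev: float, y: float) -> float:
        """Run one cycle and return the pre-fit residual."""
        x_pred = self.a * self.x_hat + self.b * u_prev   # predict
        residual = y - x_pred                            # residual before correction
        self.x_hat = x_pred + self.gain * residual       # correct
        return residual


if __name__ == "__main__":
    # A constant 2.0 deg C measured mismatch with the valve at nominal.
    for gain in (0.3, 0.9):
        obs = ScalarObserver(gain=gain)
        residuals = [obs.update(0.0, 2.0) for _ in range(3)]
        print(gain, [round(r, 3) for r in residuals])
```

In this noise-free case the second residual is about 1.45 deg C with the lower gain and about 0.34 deg C with the higher gain, so the aggressive observer absorbs most of the mismatch after a single sample.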
Threshold Calculation
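Estimate normal one-step residual uncertainty from independent model and sensor terms:

$$\sigma_r = \sqrt{\sigma_w^2 + \sigma_v^2}$$

where $\sigma_w$ is the standard deviation of the model (process) contribution and $\sigma_v$ is the standard deviation of the sensor contribution, both in deg C.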
Use commissioning estimates:
Then:
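Set a three-sigma residual threshold, with $\sigma_r$ the normal one-step residual standard deviation:

$$r_{\text{th}} = 3\,\sigma_r$$

Alarm condition:

$$|r_k| > r_{\text{th}}$$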
Persistence rule:
Raise an advisory alarm only after three consecutive threshold exceedances with valid data quality.
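The persistence rule can be sketched as a counter. This is a minimal illustration with two labelled assumptions: the class name `PersistenceAlarm` is hypothetical, and a sample with invalid data quality is taken to break the run and reset the counter, which the project would need to confirm in its data-quality gating rules.

```python
class PersistenceAlarm:
    """Count consecutive valid threshold exceedances."""

    def __init__(self, threshold: float, required: int = 3):
        self.threshold = threshold   # residual threshold in deg C
        self.required = required     # consecutive exceedances needed
        self.count = 0

    def update(self, residual: float, data_valid: bool = True) -> bool:
        """Return True when the advisory alarm condition is met."""
        if not data_valid:
            self.count = 0           # ASSUMPTION: invalid data resets the run
            return False
        if abs(residual) > self.threshold:
            self.count += 1
        else:
            self.count = 0
        return self.count >= self.required


if __name__ == "__main__":
    alarm = PersistenceAlarm(threshold=1.0)   # placeholder threshold
    for r in (1.5, 1.6, 1.7):
        print(alarm.update(r))
```

The threshold of 1.0 deg C in the usage lines is a placeholder; the commissioned threshold comes from the three-sigma calculation.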
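For independent Gaussian residuals, a single two-sided three-sigma exceedance has probability about:

$$p_1 \approx 0.0027$$

Three consecutive exceedances would have approximate probability:

$$p_3 = p_1^3 \approx 1.97 \times 10^{-8}$$

At one sample every 10 s, there are:

$$N = \frac{86400\ \text{s}}{10\ \text{s}} = 8640$$

samples per day. The independent approximation would give:

$$N\,p_3 \approx 1.7 \times 10^{-4}$$

false three-sample sequences per day.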
Engineering Comment
This calculation is only a screening estimate. Real residuals are correlated because process dynamics, filtering, controller action, and model mismatch carry memory. The nuisance-alarm requirement must be validated with real operating data, not accepted from the Gaussian approximation alone.
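The screening arithmetic can be reproduced for other sigma multiples, persistence counts, or observation windows. The sketch below assumes independent Gaussian residuals, exactly the idealisation the comment above warns about, so its output is a lower bound on scrutiny and not validation evidence. The function names are hypothetical.

```python
import math


def two_sided_tail(k_sigma: float) -> float:
    """Probability that a standard Gaussian sample exceeds +/- k_sigma."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def expected_false_runs(k_sigma: float = 3.0, consecutive: int = 3,
                        sample_period_s: float = 10.0,
                        window_s: float = 86400.0) -> float:
    """Approximate count of false runs in a window, assuming independence."""
    samples = window_s / sample_period_s
    return samples * two_sided_tail(k_sigma) ** consecutive


if __name__ == "__main__":
    print(f"single-sample probability: {two_sided_tail(3.0):.5f}")
    print(f"false runs per day: {expected_false_runs():.2e}")
    print(f"false runs per 8 h run: {expected_false_runs(window_s=8 * 3600):.2e}")
```

Under this idealisation the 8 h validation run would almost never alarm, which is why a clean 8 h run is necessary evidence but cannot by itself confirm the Gaussian estimate.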
Worked Fault-Injection Example
At one sample before the fault:
The valve command deviation is:
Predict the next state:
The measured temperature deviation is:
Residual:
Compare with the threshold:
The first threshold exceedance is recorded.
Now update the observer:
At the next sample, keep:
Predict:
Measured value:
Residual:
This is a second exceedance. If the next valid sample also exceeds the threshold, the monitor raises the advisory alarm.
Engineering Comment
The positive residual means the process is hotter than the model predicts from the commanded valve position. Possible causes include valve leakage, valve position feedback error, unmeasured steam-pressure disturbance, exothermic heat release, transmitter bias, or a model used outside its range. The detector identifies abnormal behaviour. It does not prove the root cause by itself.
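The predict, measure, compare, correct cycle of the worked example can be replayed in simulation. The sketch below is not the worked example's data: it injects a hypothetical +2.0 deg C measurement offset on a noise-free loop held at nominal, and the observer gain (0.3) and threshold (1.0 deg C) are placeholders chosen for illustration. The function name is also hypothetical.

```python
def inject_sensor_bias(bias=2.0, fault_sample=5, n_samples=12,
                       a=0.92, b=0.18, gain=0.3, threshold=1.0, required=3):
    """Simulate a measurement offset and return (residuals, alarm_sample).

    alarm_sample is the index of the first sample that completes the
    required run of consecutive exceedances, or None if none occurs.
    """
    x_true = 0.0    # plant deviation, held at nominal
    x_hat = 0.0     # observer estimate
    u_prev = 0.0    # valve command deviation, held at nominal
    count = 0
    alarm_sample = None
    residuals = []
    for k in range(n_samples):
        x_true = a * x_true + b * u_prev
        y = x_true + (bias if k >= fault_sample else 0.0)  # injected offset
        x_pred = a * x_hat + b * u_prev                    # predict
        r = y - x_pred                                     # pre-fit residual
        x_hat = x_pred + gain * r                          # correct
        residuals.append(r)
        count = count + 1 if abs(r) > threshold else 0
        if count >= required and alarm_sample is None:
            alarm_sample = k
    return residuals, alarm_sample


if __name__ == "__main__":
    residuals, alarm_sample = inject_sensor_bias()
    print([round(r, 3) for r in residuals[5:9]], alarm_sample)
```

With these placeholder values the run completes on the third faulty sample, and the residual drops below the placeholder threshold on the sample after that because the observer has absorbed part of the offset. Raising the gain to 0.9 in the same sketch produces no alarm at all, which is the failure mode the gain validation is meant to catch.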
Fault Classification Logic
Use residual sign, auxiliary evidence, and plant context to guide response.
| Evidence | Likely interpretation | Follow-up |
|---|---|---|
| Positive residual during low steam command | More heat enters than model predicts. | Check valve leakage, bypass, steam pressure, exotherm, sensor bias. |
| Negative residual during high steam command | Less heat enters than model predicts. | Check valve stiction, steam supply, fouling, actuator travel, process flow. |
| Residual changes sign with setpoint moves | Model gain or delay mismatch. | Re-identify model or schedule model by operating point. |
| Residual appears with communication delay | Timing or data-alignment fault. | Check timestamps, scan order, jitter, historian alignment. |
| Residual only on one transmitter | Measurement-chain issue. | Compare independent sensor, calibration, wiring, damping, grounding. |
The alarm message should not say “valve failed” unless the logic has evidence for valve failure. A defensible message is:
Temperature residual exceeds validated model threshold. Freeze setpoint ramp and inspect valve position, transmitter, steam pressure, and recent operating-mode changes.
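The first two rows of the table can be expressed as a small lookup that returns follow-up hints without naming a root cause. This is a sketch only: the function name, the command bands that define "low" and "high" steam command (here -5 and +5 percent deviation), and the fallback hint are assumptions, while the alarm text is the message given above.

```python
ALARM_TEXT = (
    "Temperature residual exceeds validated model threshold. "
    "Freeze setpoint ramp and inspect valve position, transmitter, "
    "steam pressure, and recent operating-mode changes."
)


def follow_up_hint(residual, u_dev, threshold, low_cmd=-5.0, high_cmd=5.0):
    """Return a follow-up hint, or None if the residual is within threshold.

    low_cmd and high_cmd are ASSUMED bands on valve-command deviation.
    """
    if abs(residual) <= threshold:
        return None
    if residual > 0 and u_dev <= low_cmd:
        return ("More heat enters than model predicts: check valve leakage, "
                "bypass, steam pressure, exotherm, sensor bias.")
    if residual < 0 and u_dev >= high_cmd:
        return ("Less heat enters than model predicts: check valve stiction, "
                "steam supply, fouling, actuator travel, process flow.")
    # Assumed fallback for cases the two rows do not cover.
    return "Residual outside threshold: review timing, transmitter, and model range."


if __name__ == "__main__":
    print(ALARM_TEXT)
    print(follow_up_hint(residual=1.4, u_dev=-8.0, threshold=1.0))
```

The hint is attached to the advisory alarm as guidance for inspection; the alarm text itself stays neutral about the cause.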
Action Boundary
The first release uses this action boundary.
| Monitor state | Allowed action |
|---|---|
| Residual below threshold | No action; continue monitoring. |
| One or two valid exceedances | Log diagnostic pre-alarm; no operator alarm. |
| Three consecutive valid exceedances | Advisory alarm and setpoint-ramp freeze. |
| Alarm plus high-temperature limit approach | Escalate to operator and existing protection procedures. |
| Invalid data quality | Suppress residual alarm and raise data-quality diagnostic. |
The monitor does not close the steam valve, reset the controller, bypass the PI loop, or inhibit a safety interlock. Those actions require separate hazard analysis, fail-safe design, and validation.
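The action-boundary table can be encoded so that the monitor has no code path to an action outside it. The sketch below is one possible encoding: the enum and function names are hypothetical, the action strings are the table entries, and the `near_high_temp_limit` flag is assumed to come from existing plant logic outside the monitor.

```python
from enum import Enum


class MonitorState(Enum):
    NORMAL = "No action; continue monitoring."
    PRE_ALARM = "Log diagnostic pre-alarm; no operator alarm."
    ADVISORY = "Advisory alarm and setpoint-ramp freeze."
    ESCALATE = "Escalate to operator and existing protection procedures."
    INVALID_DATA = "Suppress residual alarm and raise data-quality diagnostic."


def monitor_state(consecutive_exceedances: int, data_valid: bool,
                  near_high_temp_limit: bool = False) -> MonitorState:
    """Map monitor inputs to the single allowed action for this release."""
    if not data_valid:
        return MonitorState.INVALID_DATA
    if consecutive_exceedances >= 3:
        if near_high_temp_limit:
            return MonitorState.ESCALATE
        return MonitorState.ADVISORY
    if consecutive_exceedances >= 1:
        return MonitorState.PRE_ALARM
    return MonitorState.NORMAL


if __name__ == "__main__":
    state = monitor_state(3, data_valid=True)
    print(state.name, "->", state.value)
```

None of the states maps to a valve, controller, or interlock command, which mirrors the boundary stated above.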
Validation Matrix
Run validation with the same controller scan, historian, transmitter damping, and communication path used in operation.
| Test | Injection or condition | Expected result |
|---|---|---|
| Normal steady operation | No injected fault for $8\ \text{h}$. | No advisory alarm; residual statistics within validated range. |
| Normal setpoint ramp | Approved production ramp. | No alarm if model scheduling is valid; ramp freeze not triggered. |
| Load disturbance | Known feed-temperature step. | Residual transient documented; no persistent false alarm after expected recovery. |
| Sensor bias | Add $+2.0\ ^\circ\text{C}$ test offset. | Alarm within $30\ \text{s}$; message identifies measurement or process mismatch. |
| Valve leakage | Simulate or test extra heat input at low command. | Positive residual alarm within requirement. |
| Valve stiction | Hold valve position despite command change. | Residual direction and valve-position evidence agree. |
| Timestamp delay | Shift measurement by one sample. | Data-quality or timing diagnostic prevents false fault conclusion. |
| Sensor dropout | Freeze measured value. | Residual monitor suppresses fault alarm and reports invalid input. |
| Model range violation | Operate outside identified flow range. | Monitor declares model invalid or switches to approved model. |
Each validation record should state date, configuration version, model version, controller version, test condition, pass/fail result, raw trend reference, and reviewer.
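The record fields listed above map naturally onto a fixed structure, which makes incomplete records easy to flag before handover. The sketch below is one hypothetical layout; the class name, field names, and the example values are placeholders, not project data.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class ValidationRecord:
    test_date: date
    configuration_version: str
    model_version: str
    controller_version: str
    test_condition: str
    passed: bool
    raw_trend_reference: str
    reviewer: str

    def missing_fields(self):
        """Return the names of text fields that were left empty."""
        return [
            f.name for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    record = ValidationRecord(
        test_date=date(2024, 1, 15),
        configuration_version="cfg-example",
        model_version="model-example",
        controller_version="ctl-example",
        test_condition="Sensor bias test offset",
        passed=True,
        raw_trend_reference="",
        reviewer="reviewer-example",
    )
    print(record.missing_fields())
```

A record with any missing field would be returned to the tester instead of entering the handover package.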
Risk Review
Before the monitor, the high-temperature mismatch failure mode is difficult to detect early from the operator screen alone.
Use a simplified risk-priority screen:
| Condition | Severity | Occurrence | Detection | RPN |
|---|---|---|---|---|
| Before monitor | 8 | 4 | 6 | $8 \times 4 \times 6 = 192$ |
| After validated monitor | 8 | 3 | 2 | $8 \times 3 \times 2 = 48$ |
The severity is not reduced because the process consequence has not changed. The monitor improves detection and may reduce occurrence by catching degrading valve or sensor behaviour earlier. If the alarm is ignored, bypassed, or poorly validated, the claimed risk reduction is not credible.
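The risk-priority arithmetic in the table can be checked with a short function. The sketch assumes each rating sits on a 1 to 10 scale, which is a common convention but is not stated in this project; the function name is hypothetical.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number as the product of three ratings."""
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:    # ASSUMED 1 to 10 rating scale
            raise ValueError("ratings are assumed to lie between 1 and 10")
    return severity * occurrence * detection


if __name__ == "__main__":
    before = rpn(8, 4, 6)
    after = rpn(8, 3, 2)
    print(before, after, f"{1 - after / before:.0%} reduction")
```

The reduction comes entirely from the occurrence and detection ratings, so it holds only while the alarm is validated, enabled, and acted upon.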
Commissioning Checklist
The project is complete only when these items are in the handover package:
- model identification data and operating range;
- observer equation, parameters, sample time, and implementation version;
- residual threshold calculation and validation data;
- persistence rule and data-quality gating;
- alarm text, priority, owner, and response procedure;
- action-boundary signoff showing what the monitor may and may not do;
- fault-injection validation results;
- nuisance-alarm review on normal operation;
- change-control rule for model, sensor, controller, or process modifications;
- rollback procedure if the monitor creates operational risk.
Common Mistakes
- Treating a residual alarm as root-cause diagnosis.
- Validating only with simulated faults and no normal-operation nuisance-alarm run.
- Choosing thresholds from clean commissioning data, then using them during noisy production.
- Ignoring timestamp alignment, controller scan order, filtering, and communication jitter.
- Allowing an advisory monitor to perform control or protection actions without a safety case.
- Updating the observer so aggressively that persistent faults are absorbed into the estimate.
- Failing to disable or reschedule the model outside its identified operating range.
Final Deliverable
The final deliverable is a short model-based fault-detection package that an operations, controls, and reliability review can accept:
- a clear monitored fault definition;
- a traceable model boundary;
- an observer and residual calculation;
- a threshold with units and validation evidence;
- an alarm persistence rule;
- an action boundary that preserves existing safeguards;
- fault-injection and nuisance-alarm results;
- handover instructions for operation, maintenance, and change control.
An observer-based detector is useful when it turns model disagreement into earlier, better-controlled engineering action. It is unsafe when the residual is treated as a magic fault label or when the detector is allowed to act beyond the evidence used to validate it.