Exercise set
Medical Device Usability, Critical Task, Use-Error, and Alarm Response Exercises
Solved medical-device usability exercises for critical-task success, use-error reduction, alarm response, false alarms, training and release gates.
These exercises focus on usability validation for medical devices: representative users, critical tasks, use errors, alarm response, false alarm burden, training, task time, residual use-error risk and release gates. They are engineering evidence exercises, not clinical guidance or regulatory advice.
Intended-use claim coverage and clinical/post-market evidence are handled in companion specialist exercise sets.
How to use these exercises
Use the set as a usability release review for representative users and critical tasks. Exercises 1 to 5 check critical-task success, unresolved failures, user-group balance, task-time tails and environment-specific success. Exercises 6 to 11 review use-error reduction, residual RPN, training, retest coverage, label comprehension and workload margin. Exercises 12 to 17 check alarm response, false alarm burden, missed alarms, false-to-true alarm ratio, evidence completion and residual concerns. Exercise 18 combines these usability gates into a release decision.
Before calculating, state the user group, use environment, training condition, critical task, alarm condition, success criterion and residual risk decision. A good average success rate is not enough if a safety-critical task fails for a target user group or if alarm response has a long tail. The engineering comment below each exercise identifies whether the result calls for mitigation, retest, claim narrowing, training change or hold.
Release Evidence Notes
Usability evidence should identify critical tasks, user groups, use environment, observed use errors, close calls, training condition, residual risk and whether mitigations were retested.
Critical-task success should be interpreted task by task. A high overall success rate can hide a safety-critical failure.
Alarm evidence should include detection, comprehension, response time, false alarm burden and workflow interruption.
The evidence package should separate task performance, use-error mitigation and alarm behavior. Task performance asks whether representative users can complete the intended workflow. Use-error mitigation asks whether observed errors were reduced and retested. Alarm behavior asks whether users notice, understand and respond without unacceptable false-alarm burden. A release decision needs all three streams.
Engineering Boundary Notes
These calculations do not replace a full usability engineering process, formative/summative protocol design, human-factors review, clinical workflow analysis or regulatory judgment. They are screening exercises for usability release.
The main boundary is representativeness. User groups, environments, training, lighting, noise, workload and workflow interruptions must match intended use. The second boundary is task criticality: noncritical success cannot compensate for unresolved failures on tasks that protect patient safety or device effectiveness.
Common Release Mistakes
- averaging critical and noncritical tasks into one success number;
- using trained users to represent novice users without justification;
- counting a warning as mitigation without proving the user notices and acts;
- ignoring false alarms and alarm fatigue;
- closing a use error without retesting the mitigation.
Another common mistake is treating training as a universal mitigation. Training only works if it is available, retained, repeatable and realistic for the intended users. If a mitigation depends on training that users will not actually receive, the use error remains open.
Do not treat alarms as binary signals only. Alarm audibility, visibility, prioritization, false alarm rate, missed alarms, workflow context and response-time tail all affect whether the alarm supports safe use.
Scenario Map
| Scenario | Exercises | Primary check | Engineering decision |
|---|---|---|---|
| Critical task performance | 1, 2, 3, 4, 5 | success rate, failures, group coverage and task time | Decide whether tasks can be released. |
| Use-error mitigation | 6, 7, 8, 9, 10, 11 | error reduction, residual risk, training and retest | Decide whether mitigations are effective. |
| Alarm usability | 12, 13, 14, 15, 16, 17 | response time, false alarms, missed alarms and evidence completion | Decide whether alarms support safe use. |
| Release gate | 18 | all-of usability release | Decide whether usability validation can close. |
Exercise 1: Critical-Task Success Rate
A usability validation has 120 critical-task attempts and 112 successes. Compute success rate.
Solution
$$\text{Success rate} = \frac{112}{120} \approx 0.933 = 93.3\%$$
Engineering Comment
Task-level failures should be reviewed individually. A critical task may need zero unresolved failures.
Plausibility Check
Eight failures out of one hundred twenty leaves success below ninety-five percent.
Exercise 2: Critical-Task Failure Count Gate
A release rule allows no more than 2 unresolved critical-task failures. The study has 8 failures, 5 mitigated and 3 unresolved. Does it pass?
Solution
It fails because the unresolved count exceeds the allowance:
$$N_{\text{unresolved}} = 3 > 2$$
Engineering Comment
Mitigated failures should still be retested with representative users.
Plausibility Check
Three unresolved failures exceed a two-failure allowance by one.
Exercise 3: Representative User Group Balance
The study includes 10 novice users, 14 trained users and 6 maintenance users. Target minimum is 8 users per group. Which groups pass?
Solution
Novice and trained users pass:
$$10 \ge 8, \qquad 14 \ge 8$$
Maintenance users fail:
$$6 < 8$$
Engineering Comment
Maintenance usability can affect cleaning, setup, calibration and service safety.
Plausibility Check
Only the smallest group is below the minimum.
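The group-coverage check can be sketched in a few lines of Python. This is a minimal illustration using the counts from this exercise; the function name `groups_below_minimum` and the dictionary labels are hypothetical and not part of any study protocol.

```python
def groups_below_minimum(group_sizes, minimum):
    """Return the names of user groups whose size is below the minimum."""
    return [name for name, size in group_sizes.items() if size < minimum]


# Counts from Exercise 3; the keys are illustrative labels only.
study = {"novice": 10, "trained": 14, "maintenance": 6}
short_groups = groups_below_minimum(study, minimum=8)
print(short_groups)  # ['maintenance']
```

Returning the failing group names, rather than a single pass/fail flag, keeps the result tied to the group that needs additional participants.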
Exercise 4: Task Time Margin
A critical alarm task must be completed within 90 seconds. Median observed time is 68 seconds and 95th percentile is 104 seconds. Which statistic fails the limit?
Solution
Median passes:
$$68\ \text{s} \le 90\ \text{s}$$
95th percentile fails:
$$104\ \text{s} > 90\ \text{s}$$
Engineering Comment
Tail performance matters for alarm response and critical tasks.
Plausibility Check
Most users can be fast while a minority still exceed the limit.
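When raw task times are available, the median and tail checks can be computed together. The sketch below assumes a small, made-up list of task times (not data from this exercise) and uses the inclusive percentile method from Python's `statistics` module; a real study would fix the percentile method in its protocol. The function name `time_gate` is hypothetical.

```python
import statistics


def time_gate(times_s, limit_s):
    """Compare the median and 95th percentile of task times with a limit."""
    median = statistics.median(times_s)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(times_s, n=100, method="inclusive")[94]
    return {
        "median_s": median,
        "p95_s": p95,
        "median_ok": median <= limit_s,
        "p95_ok": p95 <= limit_s,
    }


# Hypothetical task times in seconds, chosen so that the tail exceeds the limit.
times = [50, 55, 60, 62, 65, 68, 70, 72, 75, 80, 85, 110]
result = time_gate(times, limit_s=90)
print(result["median_ok"], result["p95_ok"])  # True False
```

The same check applies to alarm response times, where the slowest representative users usually determine whether the gate passes.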
Exercise 5: Use Environment Success Split
Home-use success is 48 of 52 tasks. Clinic success is 64 of 68 tasks. Compute both success rates.
Solution
Home:
$$\frac{48}{52} \approx 0.923 = 92.3\%$$
Clinic:
$$\frac{64}{68} \approx 0.941 = 94.1\%$$
Engineering Comment
Environment-specific results can show where labeling, lighting, noise or workflow affects use.
Plausibility Check
Both rates are high but below one hundred percent.
Exercise 6: Use-Error Reduction
Before mitigation, 14 use errors occur in 80 attempts. After mitigation, 5 occur in 80 attempts. Compute relative reduction.
Solution
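Initial rate:
$$r_0 = \frac{14}{80} = 0.175 = 17.5\%$$
Final rate:
$$r_1 = \frac{5}{80} = 0.0625 = 6.25\%$$
Reduction:
$$\frac{r_0 - r_1}{r_0} = \frac{0.175 - 0.0625}{0.175} \approx 0.643 = 64.3\%$$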
Engineering Comment
Reduction is encouraging, but residual errors must be evaluated by severity.
Plausibility Check
The error count falls by more than half, so reduction above sixty percent is plausible.
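The relative-reduction calculation can be expressed as a small helper. This is a sketch under the assumption that the before and after attempts are comparable; the function name `relative_reduction` is illustrative.

```python
def relative_reduction(errors_before, attempts_before, errors_after, attempts_after):
    """Relative reduction in use-error rate between two test rounds."""
    if errors_before == 0:
        raise ValueError("relative reduction is undefined when the initial rate is zero")
    rate_before = errors_before / attempts_before
    rate_after = errors_after / attempts_after
    return (rate_before - rate_after) / rate_before


# Counts from Exercise 6.
reduction = relative_reduction(14, 80, 5, 80)
print(f"{reduction:.1%}")  # 64.3%
```

Working with rates rather than raw counts keeps the result valid when the two rounds have different numbers of attempts.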
Exercise 7: Residual Use-Error RPN
A residual setup error has severity 7, occurrence 3 and detection 4. Compute RPN.
Solution
$$\text{RPN} = 7 \times 3 \times 4 = 84$$
Engineering Comment
RPN is a screen. A severe use error may require mitigation even if RPN is moderate.
Plausibility Check
For these scores, the product of three single-digit values is a two-digit value.
Exercise 8: Training Effectiveness
Before training, task success is 18 of 25. After training, success is 23 of 25. Compute improvement in percentage points.
Solution
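Before:
$$\frac{18}{25} = 0.72 = 72\%$$
After:
$$\frac{23}{25} = 0.92 = 92\%$$
Improvement:
$$92\% - 72\% = 20\ \text{percentage points}$$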
Engineering Comment
Training can be a mitigation only if training is realistic, repeatable and available to intended users.
Plausibility Check
Five additional successes out of twenty-five users equals twenty percentage points.
Exercise 9: Retest Coverage
There are 5 mitigated use errors. Retesting covers 4. Compute retest coverage.
Solution
$$\text{Retest coverage} = \frac{4}{5} = 0.80 = 80\%$$
Engineering Comment
An untested mitigation should remain open unless justified by risk.
Plausibility Check
One missing retest out of five leaves eighty percent coverage.
Exercise 10: Label Comprehension Rate
Twenty-four users interpret a warning label. Twenty-one interpret it correctly. Compute comprehension rate.
Solution
$$\text{Comprehension rate} = \frac{21}{24} = 0.875 = 87.5\%$$
Engineering Comment
A warning that is not understood by representative users may not be an effective risk control.
Plausibility Check
Three incorrect interpretations out of twenty-four leave less than ninety percent.
Exercise 11: Workload Score Margin
Acceptable workload score is at most 45. Observed mean score is 41 with uncertainty allowance 6. Compute guarded score and margin.
Solution
Guarded score:
$$41 + 6 = 47$$
Margin:
$$45 - 47 = -2$$
Engineering Comment
Nominal workload passes, but guarded workload fails.
Plausibility Check
Adding uncertainty can turn a small apparent margin negative.
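The guarded comparison for an "at most" limit can be written as a short helper. This is a sketch that assumes the uncertainty allowance is simply added to the observed mean; the function name `guarded_margin` is hypothetical.

```python
def guarded_margin(limit, observed_mean, allowance):
    """Return (guarded score, margin) for an upper limit; a negative margin fails."""
    guarded = observed_mean + allowance
    return guarded, limit - guarded


# Values from Exercise 11.
guarded, margin = guarded_margin(limit=45, observed_mean=41, allowance=6)
print(guarded, margin)  # 47 -2
```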
Exercise 12: Alarm Response Time
Required alarm response is 60 seconds. Mean response is 44 seconds and 95th percentile is 71 seconds. Does the alarm response pass?
Solution
Mean passes:
$$44\ \text{s} \le 60\ \text{s}$$
95th percentile fails:
$$71\ \text{s} > 60\ \text{s}$$
The response gate fails if it uses the 95th percentile.
Engineering Comment
Alarm response should protect slower but representative users, not only average users.
Plausibility Check
The long-tail response exceeds the requirement.
Exercise 13: False Alarm Burden
A device produces 18 false alarms over 72 monitored hours. Compute false alarm rate.
Solution
$$\text{False alarm rate} = \frac{18}{72\ \text{h}} = 0.25/\text{h}$$
Engineering Comment
False alarms can create alarm fatigue and reduce response reliability to true alarms.
Plausibility Check
Eighteen alarms over three days is one every four hours.
Exercise 14: Missed Alarm Fraction
During simulation, 2 of 30 true alarm events are missed by users. Compute missed alarm fraction.
Solution
$$\text{Missed alarm fraction} = \frac{2}{30} \approx 0.067 = 6.7\%$$
Engineering Comment
Missed alarms should be linked to audibility, visibility, workload and workflow placement.
Plausibility Check
Two misses out of thirty is less than ten percent.
Exercise 15: Alarm Benefit-Risk Screen
True alarms are 30 and false alarms are 18. Compute false-to-true alarm ratio.
Solution
$$\text{False-to-true ratio} = \frac{18}{30} = 0.6$$
Engineering Comment
A high false-to-true ratio may weaken user trust even if sensitivity is acceptable.
Plausibility Check
False alarms are a little over half of true alarms.
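The false alarm rate and the false-to-true ratio are often reviewed side by side. The sketch below combines the counts from Exercises 13 and 15, assuming they describe the same monitoring period, which the exercises do not state explicitly. The function name `alarm_burden` is illustrative.

```python
def alarm_burden(false_alarms, true_alarms, monitored_hours):
    """Return (false alarms per hour, false-to-true alarm ratio)."""
    rate_per_hour = false_alarms / monitored_hours
    false_to_true = false_alarms / true_alarms
    return rate_per_hour, false_to_true


# 18 false alarms over 72 h (Exercise 13) and 30 true alarms (Exercise 15).
rate, ratio = alarm_burden(false_alarms=18, true_alarms=30, monitored_hours=72)
print(rate, ratio)  # 0.25 0.6
```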
Exercise 16: Usability Evidence Completion
The release package requires task list, user groups, environment, raw observations, use-error log, mitigation list, retest evidence, training condition, alarm response and residual risk decision. Eight of ten records are complete. Compute completion.
Solution
$$\text{Completion} = \frac{8}{10} = 0.80 = 80\%$$
Engineering Comment
Missing use-error logs, retest evidence or residual risk decisions should block release.
Plausibility Check
Eight of ten is exactly eighty percent.
Exercise 17: Residual Concern Count
There are 6 residual usability concerns. Four are low risk and two are medium risk. A release rule allows no medium or high residual concern without explicit sign-off. Does it pass automatically?
Solution
No. Medium concerns exist:
$$N_{\text{medium}} = 2 > 0$$
Engineering Comment
The release package needs explicit sign-off or additional mitigation for medium concerns.
Plausibility Check
Any medium concern violates a rule allowing only low concerns.
Exercise 18: Usability Release Gate
A release gate requires critical-task success above $95\%$, all user groups at or above the minimum, no more than 2 unresolved critical-task failures, alarm 95th percentile below 60 seconds, false alarm rate below $0.2/\text{h}$ and evidence completion above $90\%$. Current values are $93.3\%$, maintenance group below minimum, 3 unresolved failures, 71 seconds, $0.25/\text{h}$ and $80\%$. Decide release status.
Solution
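All listed thresholds fail:
- critical-task success: $93.3\% < 95\%$;
- user groups: maintenance group below the minimum;
- unresolved critical-task failures: $3 > 2$;
- alarm 95th percentile: $71\ \text{s} > 60\ \text{s}$;
- false alarm rate: $0.25/\text{h} > 0.2/\text{h}$;
- evidence completion: $80\% < 90\%$.
Release status: hold.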
Engineering Comment
The usability package should hold for user coverage, critical tasks, alarm response, false alarms and evidence completion.
Plausibility Check
Multiple independent usability barriers fail, so release is not defensible.
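An all-of gate like this one can be sketched as a table of named checks, where a single failure is enough to hold the release. The code below is a minimal illustration using the values stated in this exercise and the group sizes from Exercise 3; the function name `release_gate` and the criterion labels are hypothetical, and the sketch returns only "accept" or "hold", not the intermediate outcomes a real review might choose.

```python
def release_gate(success_rate, smallest_group, group_minimum,
                 unresolved_failures, alarm_p95_s,
                 false_alarm_rate_per_h, evidence_completion):
    """Return ("accept" or "hold", list of failed criteria) for an all-of gate."""
    checks = {
        "critical-task success above 95%": success_rate > 0.95,
        "all user groups at or above minimum": smallest_group >= group_minimum,
        "no more than 2 unresolved critical-task failures": unresolved_failures <= 2,
        "alarm 95th percentile below 60 s": alarm_p95_s < 60,
        "false alarm rate below 0.2 per hour": false_alarm_rate_per_h < 0.2,
        "evidence completion above 90%": evidence_completion > 0.90,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return ("accept" if not failed else "hold"), failed


# Values from Exercise 18; smallest group and minimum come from Exercise 3.
status, failed = release_gate(
    success_rate=112 / 120, smallest_group=6, group_minimum=8,
    unresolved_failures=3, alarm_p95_s=71,
    false_alarm_rate_per_h=18 / 72, evidence_completion=8 / 10,
)
print(status)       # hold
print(len(failed))  # 6
```

Listing the failed criteria by name makes the hold decision traceable to the specific evidence that must be mitigated, retested or completed.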
Validation Package Checklist
- Critical tasks are evaluated separately from noncritical tasks.
- User groups and environments match the intended use.
- Use-error mitigations are retested and residual concerns are risk-ranked.
- Alarm response, missed alarms and false alarm burden are included in release evidence.
- Training condition, label comprehension and workload assumptions are documented.
- Medium or high residual concerns have explicit sign-off or additional mitigation.
- Release status states accept, retest mitigation, revise interface, narrow claim or hold.