Exercise set

Medical Device Usability, Critical Task, Use-Error, and Alarm Response Exercises

Solved medical-device usability exercises for critical-task success, use-error reduction, alarm response, false alarms, training and release gates.

These exercises focus on usability validation for medical devices: representative users, critical tasks, use errors, alarm response, false alarm burden, training, task time, residual use-error risk and release gates. They are engineering evidence exercises, not clinical guidance or regulatory advice.

Intended-use claim coverage and clinical/post-market evidence are handled in companion specialist exercise sets.

How to use these exercises

Use the set as a usability release review for representative users and critical tasks. Exercises 1 to 5 check critical-task success, unresolved failures, user-group balance, task-time tails and environment-specific success. Exercises 6 to 11 review use-error reduction, residual RPN, training, retest coverage, label comprehension and workload margin. Exercises 12 to 17 check alarm response, false alarm burden, missed alarms, false-to-true alarm ratio, evidence completion and residual concerns. Exercise 18 combines these usability gates into a release decision.

Before calculating, state the user group, use environment, training condition, critical task, alarm condition, success criterion and residual risk decision. A good average success rate is not enough if a safety-critical task fails for a target user group or if alarm response has a long tail. The engineering comment below each exercise identifies whether the result calls for mitigation, retest, claim narrowing, training change or hold.

Release Evidence Notes

Usability evidence should identify critical tasks, user groups, use environment, observed use errors, close calls, training condition, residual risk and whether mitigations were retested.

Critical-task success should be interpreted task by task. A high overall success rate can hide a safety-critical failure.

Alarm evidence should include detection, comprehension, response time, false alarm burden and workflow interruption.

The evidence package should separate task performance, use-error mitigation and alarm behavior. Task performance asks whether representative users can complete the intended workflow. Use-error mitigation asks whether observed errors were reduced and retested. Alarm behavior asks whether users notice, understand and respond without unacceptable false-alarm burden. A release decision needs all three streams.

Engineering Boundary Notes

These calculations do not replace a full usability engineering process, formative/summative protocol design, human-factors review, clinical workflow analysis or regulatory judgment. They are screening exercises for usability release.

The main boundary is representativeness. User groups, environments, training, lighting, noise, workload and workflow interruptions must match intended use. The second boundary is task criticality: noncritical success cannot compensate for unresolved failures on tasks that protect patient safety or device effectiveness.

Common Release Mistakes

  • averaging critical and noncritical tasks into one success number;
  • using trained users to represent novice users without justification;
  • counting a warning as mitigation without proving the user notices and acts;
  • ignoring false alarms and alarm fatigue;
  • closing a use error without retesting the mitigation.

Another common mistake is treating training as a universal mitigation. Training only works if it is available, retained, repeatable and realistic for the intended users. If a mitigation depends on training that users will not actually receive, the use error remains open.

Do not treat alarms as binary signals only. Alarm audibility, visibility, prioritization, false alarm rate, missed alarms, workflow context and response-time tail all affect whether the alarm supports safe use.

Scenario Map

Scenario | Exercises | Primary check | Engineering decision
Critical task performance | 1, 2, 3, 4, 5 | success rate, failures, group coverage and task time | Decide whether tasks can be released.
Use-error mitigation | 6, 7, 8, 9, 10, 11 | error reduction, residual risk, training and retest | Decide whether mitigations are effective.
Alarm usability | 12, 13, 14, 15, 16, 17 | response time, false alarms, missed alarms and evidence completion | Decide whether alarms support safe use.
Release gate | 18 | all-of usability release | Decide whether usability validation can close.

Exercise 1: Critical-Task Success Rate

A usability validation has 120 critical-task attempts and 112 successes. Compute success rate.

Solution

S=\dfrac{112}{120}=93.3\%

Engineering Comment

Task-level failures should be reviewed individually. A critical task may need zero unresolved failures.

Plausibility Check

Eight failures out of one hundred twenty leaves success below ninety-five percent.
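The same calculation can be sketched in Python. This is a minimal illustration, not a validated tool; the function name `success_rate` is an assumption introduced here, and the inputs are the counts from Exercise 1.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Return the fraction of attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must lie between 0 and attempts")
    return successes / attempts


# Counts from Exercise 1: 112 successes in 120 critical-task attempts.
rate = success_rate(112, 120)
failures = 120 - 112
print(f"success rate = {rate:.1%}, failures to review = {failures}")
```

Reporting the failure count next to the rate keeps the individual failures visible for task-level review.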

Exercise 2: Critical-Task Failure Count Gate

A release rule allows no more than 2 unresolved critical-task failures. The study has 8 failures, 5 mitigated and 3 unresolved. Does it pass?

Solution

It fails because:

3>2

Engineering Comment

Mitigated failures should still be retested with representative users.

Plausibility Check

Three unresolved failures exceed a two-failure allowance by one.

Exercise 3: Representative User Group Balance

The study includes 10 novice users, 14 trained users and 6 maintenance users. Target minimum is 8 users per group. Which groups pass?

Solution

Novice and trained users pass:

10\ge8,\quad 14\ge8

Maintenance users fail:

6<8

Engineering Comment

Maintenance usability can affect cleaning, setup, calibration and service safety.

Plausibility Check

Only the smallest group is below the minimum.
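A short sketch of the group-size check, assuming the study records participant counts per group in a dictionary. The helper name `groups_below_minimum` is hypothetical; the counts and the minimum of 8 come from the exercise.

```python
def groups_below_minimum(group_sizes: dict[str, int], minimum: int) -> list[str]:
    """Return the names of user groups with fewer participants than the minimum."""
    return [name for name, count in group_sizes.items() if count < minimum]


# Group sizes from Exercise 3.
sizes = {"novice": 10, "trained": 14, "maintenance": 6}
short_groups = groups_below_minimum(sizes, minimum=8)
print("groups below minimum:", short_groups)
```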

Exercise 4: Task Time Margin

A critical alarm task must be completed within 90 seconds. Median observed time is 68 seconds and 95th percentile is 104 seconds. Which statistic fails the limit?

Solution

Median passes:

68<90

95th percentile fails:

104>90

Engineering Comment

Tail performance matters for alarm response and critical tasks.

Plausibility Check

Most users can be fast while a minority still exceed the limit.
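The comparison of a central statistic and a tail statistic against one limit can be sketched as below. The function name `tail_gate` is an assumption, and "within 90 seconds" is read here as less than or equal to the limit.

```python
def tail_gate(median_s: float, p95_s: float, limit_s: float) -> dict[str, bool]:
    """Check the median and the 95th percentile against the same time limit."""
    return {
        "median_pass": median_s <= limit_s,
        "p95_pass": p95_s <= limit_s,
    }


# Statistics from Exercise 4: median 68 s, 95th percentile 104 s, limit 90 s.
result = tail_gate(median_s=68, p95_s=104, limit_s=90)
print(result)
```

Keeping the two results separate avoids reporting a pass on the median while the tail still exceeds the limit.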

Exercise 5: Use Environment Success Split

Home-use success is 48 of 52 tasks. Clinic success is 64 of 68 tasks. Compute both success rates.

Solution

Home:

S_H=\dfrac{48}{52}=92.3\%

Clinic:

S_C=\dfrac{64}{68}=94.1\%

Engineering Comment

Environment-specific results can show where labeling, lighting, noise or workflow affects use.

Plausibility Check

Each environment has four failures; the smaller home-use sample therefore gives the lower success rate.

Exercise 6: Use-Error Reduction

Before mitigation, 14 use errors occur in 80 attempts. After mitigation, 5 occur in 80 attempts. Compute relative reduction.

Solution

Initial rate:

r_1=\dfrac{14}{80}=17.5\%

Final rate:

r_2=\dfrac{5}{80}=6.25\%

Reduction:

R=\dfrac{17.5-6.25}{17.5}=64.3\%

Engineering Comment

Reduction is encouraging, but residual errors must be evaluated by severity.

Plausibility Check

The error count falls from 14 to 5 over the same number of attempts, a drop of 9 out of 14, so a reduction near sixty-four percent is consistent.
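A minimal sketch of the relative-reduction formula, assuming error counts and attempt counts are available before and after mitigation. The function name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(errors_before: int, attempts_before: int,
                       errors_after: int, attempts_after: int) -> float:
    """Return the relative reduction in use-error rate after a mitigation."""
    rate_before = errors_before / attempts_before
    rate_after = errors_after / attempts_after
    if rate_before == 0:
        raise ValueError("no errors before mitigation; reduction is undefined")
    return (rate_before - rate_after) / rate_before


# Counts from Exercise 6: 14 of 80 before, 5 of 80 after.
reduction = relative_reduction(14, 80, 5, 80)
print(f"relative reduction = {reduction:.1%}")
```

Working from rates rather than raw counts keeps the formula valid when the two rounds have different numbers of attempts.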

Exercise 7: Residual Use-Error RPN

A residual setup error has severity 7, occurrence 3 and detection 4. Compute RPN.

Solution

RPN=7(3)(4)=84

Engineering Comment

RPN is a screen. A severe use error may require mitigation even if RPN is moderate.

Plausibility Check

Seven times three is twenty-one, and twenty-one times four is eighty-four.
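The point that severity can matter independently of RPN can be sketched as a two-flag screen. The thresholds `rpn_limit=100` and `severity_floor=7` are hypothetical values chosen for illustration and do not come from the exercise; a real risk procedure would define its own.

```python
def screen_use_error(severity: int, occurrence: int, detection: int,
                     rpn_limit: int = 100, severity_floor: int = 7) -> dict:
    """Screen a residual use error by RPN and, separately, by severity alone.

    rpn_limit and severity_floor are hypothetical thresholds, not source values.
    """
    rpn = severity * occurrence * detection
    return {
        "rpn": rpn,
        "rpn_flag": rpn >= rpn_limit,                 # flagged by the RPN screen
        "severity_flag": severity >= severity_floor,  # flagged by severity alone
    }


# Scores from Exercise 7: severity 7, occurrence 3, detection 4.
screen = screen_use_error(7, 3, 4)
print(screen)
```

Under these assumed thresholds the error passes the RPN screen but is still flagged on severity, which is the situation the engineering comment describes.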

Exercise 8: Training Effectiveness

Before training, task success is 18 of 25. After training, success is 23 of 25. Compute improvement in percentage points.

Solution

Before:

S_1=\dfrac{18}{25}=72\%

After:

S_2=\dfrac{23}{25}=92\%

Improvement:

\Delta S=92\%-72\%=20\ \text{percentage points}

Engineering Comment

Training can be a mitigation only if training is realistic, repeatable and available to intended users.

Plausibility Check

Five additional successes out of twenty-five attempts equals twenty percentage points.

Exercise 9: Retest Coverage

There are 5 mitigated use errors. Retesting covers 4. Compute retest coverage.

Solution

C=\dfrac{4}{5}=80\%

Engineering Comment

An untested mitigation should remain open unless justified by risk.

Plausibility Check

One missing retest out of five leaves eighty percent coverage.

Exercise 10: Label Comprehension Rate

Twenty-four users interpret a warning label. Twenty-one interpret it correctly. Compute comprehension rate.

Solution

C=\dfrac{21}{24}=87.5\%

Engineering Comment

A warning that is not understood by representative users may not be an effective risk control.

Plausibility Check

Three incorrect interpretations out of twenty-four leave less than ninety percent.

Exercise 11: Workload Score Margin

Acceptable workload score is at most 45. Observed mean score is 41 with uncertainty allowance 6. Compute guarded score and margin.

Solution

Guarded score:

W_g=41+6=47

Margin:

M=45-47=-2

Engineering Comment

Nominal workload passes, but guarded workload fails.

Plausibility Check

Adding uncertainty can turn a small apparent margin negative.
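A small sketch of the guarded-margin calculation, assuming the uncertainty allowance is simply added to the observed mean as in the solution. The function name `guarded_margin` is an assumption.

```python
def guarded_margin(limit: float, observed: float, allowance: float) -> tuple[float, float]:
    """Return the guarded score and its margin against an upper limit."""
    guarded = observed + allowance
    return guarded, limit - guarded


# Values from Exercise 11: limit 45, observed mean 41, allowance 6.
guarded, margin = guarded_margin(limit=45, observed=41, allowance=6)
nominal_margin = 45 - 41
print(f"nominal margin = {nominal_margin}, guarded score = {guarded}, guarded margin = {margin}")
```

A negative guarded margin alongside a positive nominal margin shows that the pass depends on ignoring uncertainty.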

Exercise 12: Alarm Response Time

Required alarm response is 60 seconds. Mean response is 44 seconds and 95th percentile is 71 seconds. Does the alarm response pass?

Solution

Mean passes:

44<60

95th percentile fails:

71>60

The response gate fails if it uses the 95th percentile.

Engineering Comment

Alarm response should protect slower but representative users, not only average users.

Plausibility Check

The 95th-percentile response exceeds the 60-second requirement by 11 seconds.

Exercise 13: False Alarm Burden

A device produces 18 false alarms over 72 monitored hours. Compute false alarm rate.

Solution

r=\dfrac{18}{72}=0.25\ \text{false alarms/h}

Engineering Comment

False alarms can create alarm fatigue and reduce response reliability to true alarms.

Plausibility Check

Eighteen alarms over three days is one every four hours.
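The rate and the equivalent interval between false alarms can be sketched together. The function name `false_alarm_rate` is introduced for illustration; the counts are those of Exercise 13.

```python
def false_alarm_rate(false_alarms: int, monitored_hours: float) -> float:
    """Return false alarms per monitored hour."""
    if monitored_hours <= 0:
        raise ValueError("monitored_hours must be positive")
    return false_alarms / monitored_hours


# Counts from Exercise 13: 18 false alarms over 72 monitored hours.
rate_per_hour = false_alarm_rate(18, 72)
hours_between = 1 / rate_per_hour
print(f"{rate_per_hour:.2f} false alarms/h, about one every {hours_between:.0f} h")
```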

Exercise 14: Missed Alarm Fraction

During simulation, 2 of 30 true alarm events are missed by users. Compute missed alarm fraction.

Solution

f=\dfrac{2}{30}=6.7\%

Engineering Comment

Missed alarms should be linked to audibility, visibility, workload and workflow placement.

Plausibility Check

Two misses out of thirty is less than ten percent.

Exercise 15: Alarm Benefit-Risk Screen

True alarms are 30 and false alarms are 18. Compute false-to-true alarm ratio.

Solution

R=\dfrac{18}{30}=0.60

Engineering Comment

A high false-to-true ratio may weaken user trust even if sensitivity is acceptable.

Plausibility Check

False alarms are a little over half of true alarms.

Exercise 16: Usability Evidence Completion

The release package requires task list, user groups, environment, raw observations, use-error log, mitigation list, retest evidence, training condition, alarm response and residual risk decision. Eight of ten records are complete. Compute completion.

Solution

C=\dfrac{8}{10}=80\%

Engineering Comment

Missing use-error logs, retest evidence or residual risk decisions should block release.

Plausibility Check

Eight of ten is exactly eighty percent.

Exercise 17: Residual Concern Count

There are 6 residual usability concerns. Four are low risk and two are medium risk. A release rule allows no medium or high residual concern without explicit sign-off. Does it pass automatically?

Solution

No. Medium concerns exist:

N_{medium}=2>0

Engineering Comment

The release package needs explicit sign-off or additional mitigation for medium concerns.

Plausibility Check

Any medium concern without explicit sign-off blocks automatic release.
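The sign-off rule can be sketched as a filter over a concern list. The record layout (`id`, `level`, `signed_off`) and the identifiers are hypothetical; the exercise gives only the counts, and sign-off is assumed absent because the question asks whether release passes automatically.

```python
def blocking_concerns(concerns: list[dict]) -> list[dict]:
    """Return concerns rated medium or high that lack explicit sign-off."""
    return [
        c for c in concerns
        if c["level"] in {"medium", "high"} and not c["signed_off"]
    ]


# Counts from Exercise 17: four low and two medium concerns (identifiers are illustrative).
concerns = (
    [{"id": f"low-{i}", "level": "low", "signed_off": False} for i in range(1, 5)]
    + [{"id": f"medium-{i}", "level": "medium", "signed_off": False} for i in range(1, 3)]
)
blockers = blocking_concerns(concerns)
passes_automatically = len(blockers) == 0
print(f"{len(blockers)} blocking concerns, passes automatically: {passes_automatically}")
```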

Exercise 18: Usability Release Gate

A release gate requires critical-task success above 95\%, all user groups at or above the minimum size, no more than 2 unresolved critical-task failures, alarm 95th-percentile response below 60 seconds, false alarm rate below 0.2/\text{h} and evidence completion above 90\%. Current values are 93.3\%, maintenance group below minimum, 3 unresolved failures, 71 seconds, 0.25/\text{h} and 80\%. Decide release status.

Solution

All listed thresholds fail:

93.3\%<95\%
6<8\ \text{(maintenance group)}
3>2
71>60
0.25>0.2
80\%<90\%

Release status:

\text{hold}

Engineering Comment

The usability package should hold for user coverage, critical tasks, alarm response, false alarms and evidence completion.

Plausibility Check

Multiple independent usability barriers fail, so release is not defensible.
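The all-of gate can be sketched as a set of named boolean checks. The dictionary keys and the function name `release_gate` are assumptions for illustration; the thresholds and current values come from Exercises 1, 2, 3, 12, 13, 16 and 18, with the group check read as "at least 8 per group" as in Exercise 3.

```python
def release_gate(values: dict) -> dict[str, bool]:
    """Evaluate each usability gate separately; True means the gate passes."""
    return {
        "critical_task_success": values["success_rate"] > 0.95,
        "user_group_minimum": all(
            n >= values["group_minimum"] for n in values["group_sizes"].values()
        ),
        "unresolved_failures": values["unresolved_failures"] <= 2,
        "alarm_p95": values["alarm_p95_s"] < 60,
        "false_alarm_rate": values["false_alarm_rate_per_h"] < 0.2,
        "evidence_completion": values["evidence_completion"] > 0.90,
    }


# Current values carried over from the earlier exercises.
current = {
    "success_rate": 112 / 120,
    "group_sizes": {"novice": 10, "trained": 14, "maintenance": 6},
    "group_minimum": 8,
    "unresolved_failures": 3,
    "alarm_p95_s": 71,
    "false_alarm_rate_per_h": 18 / 72,
    "evidence_completion": 8 / 10,
}

checks = release_gate(current)
failed = [name for name, passed in checks.items() if not passed]
status = "release" if not failed else "hold"
print(status, failed)
```

Returning the list of failed gates, rather than a single boolean, shows which evidence stream has to be reworked before the gate is reviewed again.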

Validation Package Checklist

  • Critical tasks are evaluated separately from noncritical tasks.
  • User groups and environments match the intended use.
  • Use-error mitigations are retested and residual concerns are risk-ranked.
  • Alarm response, missed alarms and false alarm burden are included in release evidence.
  • Training condition, label comprehension and workload assumptions are documented.
  • Medium or high residual concerns have explicit sign-off or additional mitigation.
  • Release status states accept, retest mitigation, revise interface, narrow claim or hold.