Exercise set

Usability Validation, Use-Error, and Interface Release Exercises

Worked usability exercises for use-error risk, validation confidence, SUS, scenario coverage, interface layout and release gates.

These exercises focus on usability validation and interface release evidence: use-error probability, critical-task success, scenario coverage, confidence screens, SUS scoring, target acquisition, control spacing, error recovery and release gates. They are generic industrial and systems-engineering usability exercises, not medical-device-specific clinical validation.

Operator workload, physical ergonomics, handoffs, fatigue, alarm burden and field performance are handled in the companion specialist exercise set. This page stays on whether the interface, workflow and validation evidence control use error before release.

Release Evidence Notes

Usability evidence should name the intended user, task, interface state, operating context, success criterion, critical-error definition, sample boundary and release action. A favorable score is weak if critical tasks, abnormal scenarios, representative users or residual use errors are missing.

Engineering Boundary Notes

These examples use simplified rates, proportions, scores and confidence screens. Real usability engineering should use protocol design, representative users, realistic tasks, moderator controls, error taxonomy, task observations, residual-risk review and post-release monitoring.

Common Release Mistakes

  • reporting completion rate without separating critical and noncritical tasks;
  • treating zero observed use errors as proof of zero risk;
  • averaging user groups when one intended group is missing;
  • using SUS or satisfaction scores as a substitute for task success;
  • checking a screen visually without target size, spacing and mode-confusion evidence;
  • accepting a mitigation before retesting the changed interaction.

Scenario Map

Scenario | Exercises | Main calculation | Release decision
Use-error risk | 1, 4, 12, 16, 17 | Expected escapes, RPN change, mode confusion, confirmation error and field-entry error | Add design controls or retest when residual error remains credible.
Validation evidence | 2, 3, 5, 6, 7, 13, 14 | Success rate, completion gate, confidence bound, user and scenario coverage, recovery and assistance | Release only when evidence covers representative critical tasks.
Interface layout | 8, 9, 10, 11, 15 | SUS score, movement time, critical-control spacing, label comprehension and alarm message actionability | Redesign controls, wording or layout when interaction evidence fails.
Integrated release | 18 | Combined release blockers | Hold release when any critical usability gate fails.

Validation Package Checklist

  • intended user groups and representative participant counts;
  • critical task list, success rule and critical-error definition;
  • scenario, environment and operating-mode coverage;
  • use-error taxonomy, recovery path and residual-risk action;
  • interface controls, labels, target sizes, spacing and mode visibility;
  • release decision tied to evidence, not average preference alone.

Exercise 1: Expected Use-Error Escapes

A task is performed 9000 times per month. The observed use-error probability is 0.004 and the design detects 70\% of errors before harm. Estimate monthly undetected use-error escapes.

Solution

N_e=9000(0.004)(1-0.70)=10.8

Engineering Comment

Even a low per-task probability can produce frequent escapes when exposure is high. The release action should address the interaction, not only user reminders.

Plausibility Check

Four errors per thousand over nine thousand tasks gives thirty-six errors before detection; thirty percent of those escape, or about eleven per month.
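The escape estimate can be sketched as a small function so that exposure, error probability and detection fraction can be varied. This is a minimal sketch, not a validated risk model; the function name expected_escapes and its parameter names are assumptions introduced for illustration.

```python
def expected_escapes(tasks_per_month: float, p_error: float,
                     detection_fraction: float) -> float:
    """Expected undetected use errors per month.

    Hypothetical helper: exposure x error probability x undetected share.
    """
    if not (0.0 <= p_error <= 1.0 and 0.0 <= detection_fraction <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return tasks_per_month * p_error * (1.0 - detection_fraction)


# Values from Exercise 1: 9000 tasks, p = 0.004, 70% detected before harm.
escapes = expected_escapes(9000, 0.004, 0.70)
print(round(escapes, 1))  # 10.8
```

Rerunning the function with a higher detection fraction shows how much a design control must improve detection before escapes become rare at this exposure.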

Exercise 2: Critical-Task Success Rate

A critical task is completed successfully by 57 of 60 participants. Compute the success rate.

Solution

p=\dfrac{57}{60}=95.0\%

Engineering Comment

The percentage is not the whole decision. The three failures need severity, root cause and mitigation review.

Plausibility Check

Three failures out of sixty is exactly five percent failure.

Exercise 3: Validation Completion Gate

A validation protocol requires at least 94\% successful completion across representative attempts. The test has 185 successful attempts out of 200. Does it pass?

Solution

C=\dfrac{185}{200}=92.5\%

Since 92.5\%<94\%, the gate fails.

Engineering Comment

The team should identify the failed scenarios, apply a design control and retest. Averaging the result with easier tasks is not a valid closure path.

Plausibility Check

Fifteen failures in two hundred attempts is more than the allowed twelve failures for a ninety-four percent gate.
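A completion gate of this kind can be checked with exact rational arithmetic, which avoids floating-point surprises when a result sits right at the threshold. The sketch below is illustrative only; passes_minimum_gate and max_allowed_failures are hypothetical helper names, not part of any protocol.

```python
import math
from fractions import Fraction


def passes_minimum_gate(successes: int, attempts: int, gate: Fraction) -> bool:
    """True when the observed success proportion meets or exceeds the gate."""
    if attempts <= 0 or not 0 <= successes <= attempts:
        raise ValueError("need attempts > 0 and 0 <= successes <= attempts")
    return Fraction(successes, attempts) >= gate


def max_allowed_failures(attempts: int, gate: Fraction) -> int:
    """Largest failure count that still satisfies the gate."""
    return math.floor(attempts * (1 - gate))


GATE = Fraction(94, 100)  # the 94% gate from Exercise 3
print(passes_minimum_gate(185, 200, GATE))  # False
print(max_allowed_failures(200, GATE))      # 12
```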

Exercise 4: RPN Before and After a Design Control

A use error has severity 8, occurrence 5 and detection 6. A redesign reduces occurrence to 2 and detection rating to 3. Compute old and new RPN.

Solution

RPN_{old}=8(5)(6)=240
RPN_{new}=8(2)(3)=48

Engineering Comment

The RPN reduction is meaningful only if the redesigned interaction was verified and validated. Severity remains high, so residual risk still needs review.

Plausibility Check

Occurrence and detection both improve, so RPN should drop sharply.
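The before-and-after comparison can be sketched as follows. The 1 to 10 rating scale enforced here is an assumption (the exercise does not state its scale), and rpn is a hypothetical helper name.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number as the product of three ratings.

    Assumes each rating is on a 1-10 scale; adjust the check if the
    project uses a different scale.
    """
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating {value} is outside 1-10")
    return severity * occurrence * detection


rpn_old = rpn(8, 5, 6)
rpn_new = rpn(8, 2, 3)
reduction = 1 - rpn_new / rpn_old

print(rpn_old, rpn_new)      # 240 48
print(f"{reduction:.0%}")    # 80%
```

The severity rating is passed unchanged to both calls, which makes it visible that the redesign lowers the score without lowering the consequence of the error.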

Exercise 5: Zero Critical Errors Confidence Bound

A validation test observes zero critical use errors in 75 independent attempts. Use the rule of three to estimate a 95\% upper bound on critical-error probability.

Solution

p_{upper}\approx\dfrac{3}{75}=0.040

Engineering Comment

Zero observed errors does not prove zero risk. If a four percent upper bound is too high for the task severity, more evidence or redesign is required.

Plausibility Check

More attempts would lower the bound; seventy-five attempts gives a few percent.
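The rule of three approximates the exact one-sided bound for zero observed events in n independent attempts, which solves (1 - p)^n = 1 - confidence. The sketch below compares the two; the function names are illustrative assumptions, and the independence of attempts is taken from the exercise statement.

```python
def rule_of_three(n: int) -> float:
    """Approximate 95% upper bound on event probability after zero events."""
    return 3.0 / n


def exact_zero_event_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound: solves (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n = 75
approx = rule_of_three(n)
exact = exact_zero_event_bound(n)
print(f"rule of three: {approx:.4f}")
print(f"exact bound:   {exact:.4f}")
```

For 75 attempts the exact bound is slightly below the rule-of-three value, so the approximation errs on the conservative side at this sample size.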

Exercise 6: Representative User Coverage

A study requires 12 novice users, 12 experienced users and 8 supervisors. It includes 13, 10 and 8. Which group fails?

Solution

Experienced users fail:

10<12

Novice users and supervisors pass:

13\ge12,\qquad 8\ge8

Engineering Comment

Total participant count cannot compensate for a missing intended user group.

Plausibility Check

Only the experienced-user count is below its planned minimum.
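A per-group check makes it hard for a total headcount to hide a shortfall. The following is a minimal sketch; the dictionary keys and the helper name group_shortfalls are assumptions chosen for illustration.

```python
def group_shortfalls(required: dict[str, int],
                     enrolled: dict[str, int]) -> dict[str, int]:
    """Return each under-recruited group with the number of missing participants."""
    return {
        group: need - enrolled.get(group, 0)
        for group, need in required.items()
        if enrolled.get(group, 0) < need
    }


# Counts from Exercise 6.
required = {"novice": 12, "experienced": 12, "supervisor": 8}
enrolled = {"novice": 13, "experienced": 10, "supervisor": 8}

shortfalls = group_shortfalls(required, enrolled)
print(shortfalls)  # {'experienced': 2}
```

The extra novice participant does not appear in the result, because surplus in one group is never credited against another.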

Exercise 7: High-Risk Scenario Coverage

A validation plan identifies 22 high-risk scenarios. Testing covers 19. Compute coverage and compare with a 90\% gate.

Solution

C=\dfrac{19}{22}=86.4\%

The plan fails the 90\% gate.

Engineering Comment

Uncovered scenarios should be tested, justified out of scope or removed from the claim. They should not disappear into an average score.

Plausibility Check

Three missing scenarios out of twenty-two is more than ten percent.
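Tracking scenarios as a set, rather than as a count, keeps the uncovered items visible by name. In the sketch below the scenario identifiers (HR-01 to HR-22) and the choice of which three are untested are purely hypothetical; only the counts 22 and 19 and the 90% gate come from the exercise.

```python
def coverage_report(planned: set[str], tested: set[str],
                    gate: float = 0.90) -> tuple[float, bool, list[str]]:
    """Return (coverage, gate_passed, sorted list of uncovered scenarios)."""
    uncovered = sorted(planned - tested)
    coverage = len(planned & tested) / len(planned)
    return coverage, coverage >= gate, uncovered


# Hypothetical identifiers for the 22 planned high-risk scenarios.
planned = {f"HR-{i:02d}" for i in range(1, 23)}
# Hypothetical choice of the three scenarios that were not tested.
tested = planned - {"HR-05", "HR-14", "HR-21"}

coverage, passed, uncovered = coverage_report(planned, tested)
print(f"{coverage:.1%}", passed)  # 86.4% False
print(uncovered)                  # ['HR-05', 'HR-14', 'HR-21']
```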

Exercise 8: SUS Score and Lower Confidence Bound

A study reports mean SUS score \bar{x}=77.1, sample standard deviation s=5.31 and n=12. Use t=1.80 for a one-sided screen. Compute the lower confidence bound.

Solution

SE=\dfrac{5.31}{\sqrt{12}}=1.53
LCB=77.1-1.80(1.53)=74.3

Engineering Comment

SUS can support usability evidence, but it cannot replace critical-task success and observed use-error review.

Plausibility Check

The bound is a few points below the mean because the sample is small but variability is moderate.
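The lower-bound screen can be sketched in a few lines. The t value is passed in explicitly, as in the exercise, rather than looked up from a distribution; lower_confidence_bound is a hypothetical helper name.

```python
import math


def lower_confidence_bound(mean: float, sd: float, n: int,
                           t_value: float) -> float:
    """One-sided lower bound on the mean: mean - t * sd / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = sd / math.sqrt(n)
    return mean - t_value * standard_error


# Values from Exercise 8.
lcb = lower_confidence_bound(mean=77.1, sd=5.31, n=12, t_value=1.80)
print(round(lcb, 1))  # 74.3
```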

Exercise 9: Touch Target Acquisition Time

A critical touchscreen action uses a target width W=12\ \text{mm} at movement distance D=180\ \text{mm}. Use MT=0.12+0.13\log_2(D/W+1). Compute movement time.

Solution

ID=\log_2\left(\dfrac{180}{12}+1\right)=\log_2(16)=4.0
MT=0.12+0.13(4.0)=0.64\ \text{s}

Engineering Comment

Small targets far from the current pointer or finger location increase time and reduce tolerance to gloves, vibration and awkward posture.

Plausibility Check

An index of difficulty of four bits gives a movement time well below one second for a single target, but repeated actions can add up and consume the available time margin.
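The movement-time formula from the exercise can be sketched as below to compare candidate target widths. The coefficients 0.12 and 0.13 are the ones given in the exercise; the 24 mm alternative is a hypothetical comparison value, and movement_time is an illustrative name.

```python
import math


def movement_time(distance_mm: float, width_mm: float,
                  a: float = 0.12, b: float = 0.13) -> float:
    """Movement time in seconds: a + b * log2(D / W + 1)."""
    if distance_mm < 0 or width_mm <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance_mm / width_mm + 1)
    return a + b * index_of_difficulty


mt_12mm = movement_time(180, 12)  # target from Exercise 9
mt_24mm = movement_time(180, 24)  # hypothetical wider target
print(f"{mt_12mm:.2f} s")  # 0.64 s
print(f"{mt_24mm:.2f} s")  # 0.52 s
```

Doubling the width lowers the index of difficulty by a little under one bit, so the time saving per action is modest; the main benefit of a larger target is usually tolerance to gloves and vibration rather than speed.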

Exercise 10: Critical-Control Spacing Gate

Two adjacent critical controls are 18\ \text{mm} and 20\ \text{mm} wide. A gloved-use allowance is 10\ \text{mm} and the required neutral gap is 8\ \text{mm}. Current center spacing is 48\ \text{mm}. Does it pass?

Solution

Required center spacing is:

S_{req}=\dfrac{18}{2}+\dfrac{20}{2}+10+8=37\ \text{mm}

Since:

48>37

the spacing screen passes.

Engineering Comment

Spacing is not only aesthetics. Critical controls should account for fingers, gloves, vibration, posture and accidental activation risk.

Plausibility Check

The required spacing is the sum of the half-widths (19 mm) plus the glove allowance and neutral gap (18 mm), giving 37 mm; forty-eight millimeters is above it.
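The spacing screen can be sketched as a function that returns the required center-to-center distance, so the margin is explicit. The function and parameter names are assumptions for illustration; the rule itself is the one used in the solution.

```python
def required_center_spacing(width_a_mm: float, width_b_mm: float,
                            glove_allowance_mm: float,
                            neutral_gap_mm: float) -> float:
    """Half of each control width plus glove allowance plus neutral gap."""
    return (width_a_mm / 2 + width_b_mm / 2
            + glove_allowance_mm + neutral_gap_mm)


# Values from Exercise 10.
required = required_center_spacing(18, 20, glove_allowance_mm=10,
                                   neutral_gap_mm=8)
actual = 48
margin = actual - required

print(required)        # 37.0
print(margin)          # 11.0
print(margin >= 0)     # True
```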

Exercise 11: Label Comprehension Pass Rate

A label comprehension test has 44 correct interpretations out of 50 representative users. The gate is 90\%. Does it pass?

Solution

C=\dfrac{44}{50}=88.0\%

The label fails the gate.

Engineering Comment

Ambiguous labels create use errors even when the button layout is correct. The label should be rewritten and retested.

Plausibility Check

Six misunderstandings in fifty users is more than one in ten.

Exercise 12: Mode-Confusion Residual Risk

A hidden mode creates 16 wrong-action events per 10{,}000 tasks. A mode indicator is expected to reduce events by 75\%. Estimate residual wrong-action events.

Solution

N_r=16(1-0.75)=4\ \text{events per }10{,}000\text{ tasks}

Engineering Comment

Four residual events may still be unacceptable for severe tasks. A mode indicator alone may not be enough; interlocks, confirmation or task redesign may also be needed.

Plausibility Check

A seventy-five percent reduction leaves one quarter of the original events.

Exercise 13: Error Recovery Success

In a simulated-use test, users recover from 27 of 30 noncritical errors without assistance. Compute recovery rate.

Solution

R=\dfrac{27}{30}=90.0\%

Engineering Comment

Recovery evidence is useful when errors are expected, but the design should still prevent critical errors where recovery is unlikely or too late.

Plausibility Check

Three unrecovered errors in thirty attempts gives ten percent failure.

Exercise 14: Assistance Request Rate

During validation, 18 assistance requests occur in 120 task attempts. The release screen allows at most 10\%. Does it pass?

Solution

A=\dfrac{18}{120}=15.0\%

The screen fails.

Engineering Comment

Frequent assistance requests indicate that users cannot complete the task independently under the tested conditions.

Plausibility Check

Twelve requests would equal ten percent, so eighteen is clearly above the limit.

Exercise 15: Alarm Message Actionability

An interface review samples 80 alarm messages. Users identify the correct next action for 66 messages. Compute actionability rate and compare with an 85\% gate.

Solution

A=\dfrac{66}{80}=82.5\%

The gate fails.

Engineering Comment

Alarm wording is part of the interface. If users cannot identify the action, the alarm adds workload without reliable control.

Plausibility Check

Fourteen unclear messages out of eighty is more than fifteen percent.

Exercise 16: Confirmation Dialog False Acceptance

A confirmation dialog is shown 500 times in a trial. Users accept it incorrectly 7 times. The false-acceptance gate is 1\%. Does it pass?

Solution

p=\dfrac{7}{500}=1.4\%

Since 1.4\%>1.0\%, it fails.

Engineering Comment

Confirmation dialogs often become automatic. A severe action may need differentiated wording, physical separation, delay, interlock or undo path.

Plausibility Check

Five false acceptances would equal one percent; seven is above that.

Exercise 17: Form Field Error Reduction

A redesigned data-entry screen reduces field errors from 36 errors in 600 entries to 14 errors in 600 entries. Compute relative reduction.

Solution

R=\dfrac{36-14}{36}=61.1\%

Engineering Comment

The reduction supports the redesign, but the remaining errors need field type, severity and detectability review.

Plausibility Check

The error count falls by twenty-two out of thirty-six, a little over sixty percent.
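The solution compares raw counts, which works because both screens were tested with 600 entries. A rate-based version, sketched below, gives the same answer here and also handles unequal sample sizes. The function name and signature are assumptions for illustration.

```python
def relative_reduction(errors_before: int, entries_before: int,
                       errors_after: int, entries_after: int) -> float:
    """Relative reduction in error rate between two designs."""
    rate_before = errors_before / entries_before
    rate_after = errors_after / entries_after
    if rate_before == 0:
        raise ValueError("baseline error rate is zero; reduction is undefined")
    return (rate_before - rate_after) / rate_before


# Values from Exercise 17.
reduction = relative_reduction(36, 600, 14, 600)
print(f"{reduction:.1%}")  # 61.1%
```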

Exercise 18: Usability Release Gate

A release requires four gates: critical-task success at least 95\%, high-risk scenario coverage at least 90\%, label comprehension at least 90\% and no unresolved severe use-error cause. Results are 96\%, 92\%, 88\% and no unresolved severe cause. Does it release?

Solution

The label comprehension gate fails:

88\%<90\%

The release fails.

Engineering Comment

Usability release should not average independent gates. A weak label can still drive the next critical use error.

Plausibility Check

Three gates pass, but one explicit gate fails, so the integrated decision is hold.
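The integrated decision can be sketched as an all-gates-must-pass check that reports which gates block release instead of averaging them. The Gate class and release_decision function are hypothetical names; the thresholds and results are those from the exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    passed: bool


def release_decision(gates: list[Gate]) -> tuple[str, list[str]]:
    """Return ('release' or 'hold', names of failed gates)."""
    blockers = [gate.name for gate in gates if not gate.passed]
    return ("hold" if blockers else "release", blockers)


# Results from Exercise 18, each compared against its own threshold.
gates = [
    Gate("critical-task success >= 95%", 96 >= 95),
    Gate("high-risk scenario coverage >= 90%", 92 >= 90),
    Gate("label comprehension >= 90%", 88 >= 90),
    Gate("no unresolved severe use-error cause", True),
]

decision, blockers = release_decision(gates)
print(decision)   # hold
print(blockers)   # ['label comprehension >= 90%']
```

Because each gate is evaluated independently, a strong result on one gate cannot offset a failure on another.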
