Exercise set

Medical Device Clinical Evidence, Performance, and Postmarket Validation Exercises

Solved medical-device evidence exercises for sensitivity, specificity, confidence width, dataset boundary, complaint rate, trend triggers and release gates.

These exercises focus on clinical and post-market evidence as engineering validation signals. They cover sensitivity, specificity, confidence-width screens, dataset boundary, subgroup coverage, data age, complaint rate, trend triggers, field feedback and release gates. They are not clinical guidance or regulatory advice.

Diagnostic workflow threshold exercises remain a separate diagnostic-systems resource. This page focuses on whether evidence is strong enough to support a medical-device claim and lifecycle release decision.

How to use these exercises

Use the set as an engineering evidence review for a medical-device claim. Exercises 1 to 6 check core performance metrics, confidence width and threshold margin. Exercises 7 to 11 test whether the dataset still matches intended users, environments, software version, missing-data assumptions and workflow latency. Exercises 12 to 17 convert field data into complaint rate, trend signals, severity, recurrence and post-market evidence completion. Exercise 18 combines the gates into a lifecycle release decision.

Before calculating, state the claim, intended population, clinical or operational endpoint, comparator, software version, use environment and evidence source. A sensitivity value, complaint rate or dataset count is not release evidence unless it is tied to the claim boundary. The engineering comment below each exercise identifies the missing bridge, denominator, subgroup or lifecycle control that must be resolved before release.

Release Evidence Notes

Clinical evidence should state the claim, endpoint, dataset boundary, inclusion and exclusion conditions, comparator, subgroup coverage, performance metric, confidence uncertainty and residual limitations.

Post-market evidence should state denominator, exposure basis, complaint type, severity, trend rule, investigation boundary and corrective-action trigger.

Software version, dataset age and subgroup coverage should be treated as release boundaries, not as footnotes. If the device algorithm, intended environment or target population changes, the evidence package should show whether the old data still bridge to the current claim.

The evidence package should separate performance evidence, claim coverage and lifecycle feedback. Performance evidence answers how the device behaved in the evaluated data. Claim coverage answers whether those data represent the intended use. Lifecycle feedback answers whether field experience is consistent with the risk file and validation assumptions. A release decision needs all three; one strong metric cannot compensate for a broken claim boundary.

Engineering Boundary Notes

These calculations do not replace clinical evaluation, biostatistical analysis, regulatory submission work, ethics review, medical judgment or post-market surveillance procedures. They are simplified engineering screens for evidence quality.

The main boundary is representativeness. Aggregate performance can look acceptable while a subgroup, environment, device version or workflow tail fails the actual claim. The second boundary is time: field trend signals, complaint coding, software updates and data age can make previously valid evidence incomplete for the current release candidate.

Common Release Mistakes

  • reporting sensitivity without confidence or dataset boundary;
  • accepting a dataset that does not match intended use;
  • hiding subgroup underperformance inside aggregate results;
  • calculating complaint rate without exposure denominator;
  • waiting for severe events when trend triggers already require investigation.

Another common mistake is treating software-version changes as administrative history. If an algorithm, signal-processing path, UI workflow or intended environment changes, the evidence package should either bridge the change or narrow the claim. Version mismatch is an engineering validation issue, not a formatting issue.

Do not close post-market review with incomplete denominators or incomplete coding. Complaint counts without exposure, severity, recurrence and investigation status can either understate risk or exaggerate noise. The release record should show how field evidence was converted into action, hold, CAPA review or continued monitoring.

Scenario Map

Scenario | Exercises | Primary check | Engineering decision
Performance metrics | 1, 2, 3, 4, 5, 6 | sensitivity, specificity, PPV, NPV, threshold margin and confidence width | Decide whether performance evidence supports the claim.
Dataset boundary | 7, 8, 9, 10, 11 | subgroup, environment, data age, missing data and latency | Decide whether the evidence matches intended use.
Post-market evidence | 12, 13, 14, 15, 16, 17 | complaint rate, trend, severity, recurrence and evidence completion | Decide whether lifecycle feedback is acceptable.
Release gate | 18 | all-of clinical/post-market release | Decide whether the evidence package can close.

Exercise 1: Diagnostic Sensitivity

A dataset has 180 positive reference cases. The device correctly identifies 166. Compute sensitivity.

Solution

Se=\dfrac{166}{180}=92.2\%

Engineering Comment

Sensitivity should be tied to claim scope, disease state, comparator and confidence interval.

Plausibility Check

Fourteen misses out of one hundred eighty leaves sensitivity a little above ninety percent.

Exercise 2: Diagnostic Specificity

A dataset has 220 negative reference cases. The device correctly identifies 205. Compute specificity.

Solution

Sp=\dfrac{205}{220}=93.2\%

Engineering Comment

Specificity affects false positives, workflow burden and downstream confirmatory testing.

Plausibility Check

Fifteen false positives out of two hundred twenty leaves specificity above ninety percent.

Exercise 3: Positive Predictive Value

There are 166 true positives and 15 false positives. Compute PPV.

Solution

PPV=\dfrac{166}{166+15}=91.7\%

Engineering Comment

PPV depends on dataset prevalence; it should not be generalized without context.

Plausibility Check

False positives are small relative to true positives, so PPV is high.
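
The prevalence dependence noted in the engineering comment can be illustrated with a minimal sketch. It takes sensitivity and specificity from the counts in Exercises 1 and 2 and recomputes PPV at a chosen prevalence using Bayes' rule. The function name `ppv_at` and the 5% prevalence are hypothetical, chosen only for illustration; the 45% figure is the share of positives in the combined 400 reference cases.

```python
SE = 166 / 180   # sensitivity from Exercise 1
SP = 205 / 220   # specificity from Exercise 2

def ppv_at(prevalence: float, se: float = SE, sp: float = SP) -> float:
    """PPV from Bayes' rule for a given prevalence (hypothetical helper)."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Dataset prevalence: 180 positives out of 180 + 220 reference cases.
dataset_ppv = ppv_at(180 / 400)
# Illustrative lower prevalence (assumption, not from the exercise).
low_prev_ppv = ppv_at(0.05)

print(f"PPV at dataset prevalence: {dataset_ppv:.1%}")
print(f"PPV at 5% prevalence:      {low_prev_ppv:.1%}")
```

At the dataset prevalence the sketch reproduces the 91.7% above; at the assumed 5% prevalence the same device yields a PPV well below one half, which is why a dataset PPV should not be carried into a different population without context.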

Exercise 4: Negative Predictive Value

There are 205 true negatives and 14 false negatives. Compute NPV.

Solution

NPV=\dfrac{205}{205+14}=93.6\%

Engineering Comment

False negatives may carry higher clinical risk, so NPV must be interpreted with intended use.

Plausibility Check

Fourteen misses among more than two hundred negative outputs leaves NPV above ninety percent.
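
The four metrics in Exercises 1 to 4 come from one confusion matrix, so they can be computed together. The following is a minimal sketch; the function name `diagnostic_metrics` is a hypothetical helper, and the counts are those stated in the exercises.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# 180 positive reference cases (166 detected), 220 negative (205 cleared).
metrics = diagnostic_metrics(tp=166, fn=14, tn=205, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Keeping the four counts together makes it harder to report one metric without the denominators behind the others.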

Exercise 5: Confidence Half-Width Screen

Observed sensitivity is 92.2% with approximate 95% confidence half-width 4.0 percentage points. The release rule requires lower bound above 88%. Compute lower bound.

Solution

LB=92.2\%-4.0\%=88.2\%

Engineering Comment

The evidence barely clears the lower-bound screen.

Plausibility Check

Subtracting four points from just over ninety-two leaves just over eighty-eight.
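
The exercise does not say how the 4.0-point half-width was obtained. As a rough cross-check, the sketch below assumes a normal-approximation (Wald) interval on the Exercise 1 counts; that choice is an assumption, and Wilson or exact intervals are often preferred for proportions near 0 or 1. The function name `wald_half_width` is hypothetical.

```python
import math

def wald_half_width(successes: int, n: int, z: float = 1.96) -> float:
    """Normal-approximation half-width for a binomial proportion."""
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)

p_hat = 166 / 180                      # observed sensitivity
half_width = wald_half_width(166, 180)
lower_bound = p_hat - half_width

print(f"half-width: {half_width:.1%}, lower bound: {lower_bound:.1%}")
```

Under this assumption the half-width comes out close to four points, so the stated screen value is plausible for 180 positive cases. A different interval method would shift the lower bound slightly, which matters when the margin over 88% is this thin.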

Exercise 6: Threshold Margin

Performance threshold is sensitivity at least 90%. Observed sensitivity is 92.2%. Compute nominal margin.

Solution

M=92.2\%-90.0\%=2.2\ \text{points}

Engineering Comment

Nominal margin is small and should be interpreted with confidence bounds and subgroup results.

Plausibility Check

The observed result is only slightly above the threshold.

Exercise 7: Subgroup Coverage

The claim includes adult, pediatric and geriatric subgroups. Dataset counts are 260, 28 and 112. Minimum subgroup count is 50. Which subgroup fails?

Solution

Pediatric subgroup fails:

28<50

Engineering Comment

Aggregate performance cannot support a subgroup claim with weak subgroup coverage.

Plausibility Check

Only one subgroup is below the minimum count.
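
A subgroup screen of this kind is easy to automate so that a weak subgroup is listed by name rather than hidden in a total. A minimal sketch, using the counts and minimum from the exercise:

```python
MIN_COUNT = 50  # minimum subgroup count from the exercise

subgroup_counts = {"adult": 260, "pediatric": 28, "geriatric": 112}

# Collect every subgroup that falls below the minimum.
failing = [name for name, n in subgroup_counts.items() if n < MIN_COUNT]

print(failing)  # ['pediatric']
```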

Exercise 8: Environment Match

Clinical evidence includes hospital and clinic settings. Intended use also includes home use. Compute environment coverage if there are three intended settings.

Solution

C=\dfrac{2}{3}=66.7\%

Engineering Comment

Home-use claim should be narrowed or supported with additional evidence.

Plausibility Check

Two of three settings gives two-thirds coverage.
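
Treating intended and evidenced settings as sets gives both the coverage fraction and the list of uncovered settings, which is the part the claim decision needs. A minimal sketch with the settings named in the exercise:

```python
intended = {"hospital", "clinic", "home"}
evidenced = {"hospital", "clinic"}

# Fraction of intended settings that the evidence covers.
coverage = len(intended & evidenced) / len(intended)
# Settings that need additional evidence or a narrowed claim.
uncovered = sorted(intended - evidenced)

print(f"coverage: {coverage:.1%}, uncovered: {uncovered}")
```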

Exercise 9: Missing Data Fraction

A validation dataset planned 420 records. Complete analyzable records are 396. Compute missing-data fraction.

Solution

Missing records:

N=420-396=24

Fraction:

f=\dfrac{24}{420}=5.7\%

Engineering Comment

Missing data should be checked for bias, not only counted.

Plausibility Check

Twenty-four out of four hundred twenty is a little under six percent.

Exercise 10: Data-Age Validation

Evidence uses software version 3.1. Current release candidate is 3.4. There have been 3 algorithm-affecting changes since 3.1. A release rule allows at most one without bridging evidence. Does it pass?

Solution

It fails:

3>1

Engineering Comment

Clinical or performance evidence may need bridging validation after algorithm changes.

Plausibility Check

Three relevant changes exceed a one-change allowance.

Exercise 11: Decision Latency

A clinical workflow claim requires result availability within 120 seconds. Median latency is 82 seconds and 95th percentile is 138 seconds. Which statistic fails?

Solution

Median passes:

82<120

95th percentile fails:

138>120

Engineering Comment

Workflow claims often depend on tail latency, not median latency.

Plausibility Check

Most cases can be fast while the tail violates the claim.
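
The check can be written so that every reported latency statistic is compared with the limit, not only the median. This is a minimal sketch; requiring all statistics to pass is an assumption about how the claim is worded, and `latency_check` is a hypothetical helper.

```python
LIMIT_S = 120  # claim: result available within 120 seconds

def latency_check(stats_s: dict[str, float], limit_s: float) -> dict[str, bool]:
    """Map each latency statistic to True when it meets the limit."""
    return {name: value <= limit_s for name, value in stats_s.items()}

result = latency_check({"median": 82, "p95": 138}, LIMIT_S)
claim_supported = all(result.values())

print(result, claim_supported)
```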

Exercise 12: Complaint Rate

There are 36 relevant complaints over 180000 device uses. Compute complaint rate per 10000 uses.

Solution

r=\dfrac{36}{180000}(10000)=2.0\ \text{complaints/10000 uses}

Engineering Comment

The denominator must reflect actual exposure, not shipped units alone.

Plausibility Check

Thirty-six complaints over one hundred eighty thousand uses is one per five thousand uses, which is two per ten thousand.

Exercise 13: Complaint Trend Trigger

The alert limit is 1.5 complaints per 10000 uses. Current rate is 2.0. Compute exceedance ratio.

Solution

R=\dfrac{2.0}{1.5}=1.33

Engineering Comment

The rate is one third above the alert limit and should trigger investigation under this rule.

Plausibility Check

Two is larger than one and a half by one third.
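
Exercises 12 and 13 chain naturally: normalize the complaint count by exposure, then compare the rate with the alert limit. A minimal sketch using the exercise values; the function name `complaints_per_10k` is hypothetical.

```python
ALERT_LIMIT = 1.5  # complaints per 10,000 uses, from Exercise 13

def complaints_per_10k(complaints: int, uses: int) -> float:
    """Complaint rate per 10,000 uses; rejects a missing denominator."""
    if uses <= 0:
        raise ValueError("exposure denominator must be positive")
    return complaints * 10_000 / uses

rate = complaints_per_10k(36, 180_000)
exceedance = rate / ALERT_LIMIT
investigate = rate > ALERT_LIMIT

print(f"rate={rate:.1f}, exceedance={exceedance:.2f}, investigate={investigate}")
```

Raising an error on a non-positive denominator reflects the point made in Exercise 12: a complaint count without exposure is not a rate.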

Exercise 14: Severe Complaint Fraction

Of 36 complaints, 4 are severe. Compute severe fraction.

Solution

f=\dfrac{4}{36}=11.1\%

Engineering Comment

Severity can trigger escalation even if total complaint count is low.

Plausibility Check

Four is one ninth of thirty-six.

Exercise 15: Recurrence Interval

Three similar complaints occur in 45 days. Compute average recurrence interval.

Solution

T=\dfrac{45}{3}=15\ \text{d/complaint}

Engineering Comment

Recurring similar complaints suggest a systematic issue rather than random field noise.

Plausibility Check

Three events evenly spread over forty-five days gives one every fifteen days.

Exercise 16: Post-Market Evidence Completion

The post-market review requires exposure denominator, complaint coding, severity review, trend chart, returned device analysis, software version, user group, environment, CAPA decision and management sign-off. Seven of ten records are complete. Compute completion.

Solution

C=\dfrac{7}{10}=70\%

Engineering Comment

Post-market evidence is not complete enough to close a trend if CAPA decision or denominator is missing.

Plausibility Check

Seven of ten is seventy percent.
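
A completion percentage alone does not show which records are missing. The sketch below tracks the ten required records by name. The exercise does not say which three are incomplete, so the `complete` set is a hypothetical assignment for illustration, and treating the exposure denominator and CAPA decision as blocking items is an assumption drawn from the engineering comment.

```python
REQUIRED = [
    "exposure denominator", "complaint coding", "severity review",
    "trend chart", "returned device analysis", "software version",
    "user group", "environment", "CAPA decision", "management sign-off",
]
# Assumption: items that block closure regardless of overall percentage.
BLOCKING = {"exposure denominator", "CAPA decision"}

# Hypothetical status: seven of ten complete, as in the exercise.
complete = {
    "exposure denominator", "complaint coding", "severity review",
    "trend chart", "software version", "user group", "environment",
}

missing = [item for item in REQUIRED if item not in complete]
completion = (len(REQUIRED) - len(missing)) / len(REQUIRED)
blocking_missing = [item for item in missing if item in BLOCKING]

print(f"completion: {completion:.0%}")
print("missing:", missing)
print("blocking:", blocking_missing)
```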

Exercise 17: Field Signal RPN

A recurring false-negative complaint mode has severity 9, occurrence 3 and detection 4. Compute RPN.

Solution

RPN=9(3)(4)=108

Engineering Comment

RPN should not hide high severity; false-negative signals often need escalation even with moderate occurrence.

Plausibility Check

High severity with moderate occurrence and detection produces a three-digit RPN.
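
One way to keep a high severity score visible is to report it alongside the RPN product instead of relying on the product alone. A minimal sketch; the escalation threshold of 9 is a hypothetical value chosen for illustration, not a rule from the exercise.

```python
SEVERITY_ESCALATION = 9  # hypothetical threshold, not from the exercise

def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number as the product of the three scores."""
    return severity * occurrence * detection

severity, occurrence, detection = 9, 3, 4
score = rpn(severity, occurrence, detection)
# Escalate on severity alone, independent of the product.
escalate = severity >= SEVERITY_ESCALATION

print(f"RPN={score}, escalate on severity={escalate}")
```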

Exercise 18: Clinical Evidence Release Gate

A release gate requires sensitivity lower bound above 88%, subgroup counts all above 50, environment coverage 100%, no algorithm-change gap, complaint rate below 1.5 per 10000 uses and post-market evidence completion above 90%. Current values are lower bound 88.2%, pediatric count 28, environment coverage 66.7%, algorithm gap fail, complaint rate 2.0 and completion 70%. Decide release status.

Solution

Sensitivity lower bound passes:

88.2\%>88\%

Subgroup count, environment coverage, complaint rate and evidence completion fail their numeric limits, and the algorithm-change gap is given as a fail:

28<50
66.7\%<100\%
2.0>1.5
70\%<90\%

Release status:

\text{hold}

Engineering Comment

The evidence package should hold because performance margin alone does not close subgroup, environment, software-change and post-market gaps.

Plausibility Check

One passing metric cannot release a claim when multiple evidence boundaries fail.
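
The all-of logic of the gate can be sketched as a set of named boolean checks, so the release record shows which gates failed and not only the final status. The values are those given in the exercise; the gate labels are hypothetical names.

```python
# Each gate maps to True (pass) or False (fail), using the exercise values.
gates = {
    "sensitivity lower bound > 88%": 88.2 > 88.0,
    "all subgroup counts > 50": 28 > 50,        # pediatric count
    "environment coverage = 100%": 66.7 >= 100.0,
    "no algorithm-change gap": False,           # stated as a fail
    "complaint rate < 1.5 per 10k": 2.0 < 1.5,
    "post-market completion > 90%": 70.0 > 90.0,
}

failed = [name for name, passed in gates.items() if not passed]
status = "release" if not failed else "hold"

print(status)
for name in failed:
    print("failed:", name)
```

Because the gate is an all-of rule, a single failed entry is enough to hold; here five of six fail.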

Validation Package Checklist

  • Performance metrics include sensitivity, specificity, predictive values and confidence bounds where relevant.
  • Dataset boundary matches intended users, environments, software version and workflow.
  • Missing data and subgroup coverage are visible.
  • Post-market complaint trends use exposure denominators, severity and recurrence rules before closure.
  • Bridging evidence is stated for algorithm, workflow, population or environment changes.
  • Field signals are linked to risk controls, CAPA decisions or continued monitoring.
  • Release status states accept, narrow claim, bridge evidence, investigate trend or hold.

See also