Exercise set
Medical Device Clinical Evidence, Performance, and Postmarket Validation Exercises
Solved medical-device evidence exercises for sensitivity, specificity, confidence width, dataset boundary, complaint rate, trend triggers and release gates.
These exercises focus on clinical and post-market evidence as engineering validation signals. They cover sensitivity, specificity, confidence-width screens, dataset boundary, subgroup coverage, data age, complaint rate, trend triggers, field feedback and release gates. They are not clinical guidance or regulatory advice.
Diagnostic workflow threshold exercises remain a separate diagnostic-systems resource. This page focuses on whether evidence is strong enough to support a medical-device claim and lifecycle release decision.
How to use these exercises
Use the set as an engineering evidence review for a medical-device claim. Exercises 1 to 6 check core performance metrics, confidence width and threshold margin. Exercises 7 to 11 test whether the dataset still matches intended users, environments, software version, missing-data assumptions and workflow latency. Exercises 12 to 17 convert field data into complaint rate, trend signals, severity, recurrence and post-market evidence completion. Exercise 18 combines the gates into a lifecycle release decision.
Before calculating, state the claim, intended population, clinical or operational endpoint, comparator, software version, use environment and evidence source. A sensitivity value, complaint rate or dataset count is not release evidence unless it is tied to the claim boundary. The engineering comment below each exercise identifies the missing bridge, denominator, subgroup or lifecycle control that must be resolved before release.
Release Evidence Notes
Clinical evidence should state the claim, endpoint, dataset boundary, inclusion and exclusion conditions, comparator, subgroup coverage, performance metric, confidence uncertainty and residual limitations.
Post-market evidence should state denominator, exposure basis, complaint type, severity, trend rule, investigation boundary and corrective-action trigger.
Software version, dataset age and subgroup coverage should be treated as release boundaries, not as footnotes. If the device algorithm, intended environment or target population changes, the evidence package should show whether the old data still bridge to the current claim.
The evidence package should separate performance evidence, claim coverage and lifecycle feedback. Performance evidence answers how the device behaved in the evaluated data. Claim coverage answers whether those data represent the intended use. Lifecycle feedback answers whether field experience is consistent with the risk file and validation assumptions. A release decision needs all three; one strong metric cannot compensate for a broken claim boundary.
Engineering Boundary Notes
These calculations do not replace clinical evaluation, biostatistical analysis, regulatory submission work, ethics review, medical judgment or post-market surveillance procedures. They are simplified engineering screens for evidence quality.
The main boundary is representativeness. Aggregate performance can look acceptable while a subgroup, environment, device version or workflow tail fails the actual claim. The second boundary is time: field trend signals, complaint coding, software updates and data age can make previously valid evidence incomplete for the current release candidate.
Common Release Mistakes
- reporting sensitivity without confidence or dataset boundary;
- accepting a dataset that does not match intended use;
- hiding subgroup underperformance inside aggregate results;
- calculating complaint rate without exposure denominator;
- waiting for severe events when trend triggers already require investigation.
Another common mistake is treating software-version changes as administrative history. If an algorithm, signal-processing path, UI workflow or intended environment changes, the evidence package should either bridge the change or narrow the claim. Version mismatch is an engineering validation issue, not a formatting issue.
Do not close post-market review with incomplete denominators or incomplete coding. Complaint counts without exposure, severity, recurrence and investigation status can either understate risk or exaggerate noise. The release record should show how field evidence was converted into action, hold, CAPA review or continued monitoring.
Scenario Map
| Scenario | Exercises | Primary check | Engineering decision |
|---|---|---|---|
| Performance metrics | 1, 2, 3, 4, 5, 6 | sensitivity, specificity, PPV, NPV, threshold margin and confidence width | Decide whether performance evidence supports the claim. |
| Dataset boundary | 7, 8, 9, 10, 11 | subgroup, environment, data age, missing data and latency | Decide whether the evidence matches intended use. |
| Post-market evidence | 12, 13, 14, 15, 16, 17 | complaint rate, trend, severity, recurrence and evidence completion | Decide whether lifecycle feedback is acceptable. |
| Release gate | 18 | all clinical and post-market gates must pass | Decide whether the evidence package can close. |
Exercise 1: Diagnostic Sensitivity
A dataset has 180 positive reference cases. The device correctly identifies 166. Compute sensitivity.
Solution
$$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{166}{180} \approx 0.922 = 92.2\%$$
Engineering Comment
Sensitivity should be tied to claim scope, disease state, comparator and confidence interval.
Plausibility Check
Fourteen misses out of one hundred eighty leaves sensitivity a little above ninety percent.
Exercise 2: Diagnostic Specificity
A dataset has 220 negative reference cases. The device correctly identifies 205. Compute specificity.
Solution
$$\text{Specificity} = \frac{TN}{TN + FP} = \frac{205}{220} \approx 0.932 = 93.2\%$$
Engineering Comment
Specificity affects false positives, workflow burden and downstream confirmatory testing.
Plausibility Check
Fifteen false positives out of two hundred twenty leaves specificity above ninety percent.
Exercise 3: Positive Predictive Value
There are 166 true positives and 15 false positives. Compute PPV.
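Solution
$$\text{PPV} = \frac{TP}{TP + FP} = \frac{166}{166 + 15} = \frac{166}{181} \approx 0.917 = 91.7\%$$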
Engineering Comment
PPV depends on dataset prevalence; it should not be generalized without context.
Plausibility Check
False positives are small relative to true positives, so PPV is high.
Exercise 4: Negative Predictive Value
There are 205 true negatives and 14 false negatives. Compute NPV.
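Solution
$$\text{NPV} = \frac{TN}{TN + FN} = \frac{205}{205 + 14} = \frac{205}{219} \approx 0.936 = 93.6\%$$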
Engineering Comment
False negatives may carry higher clinical risk, so NPV must be interpreted with intended use.
Plausibility Check
Fourteen misses among more than two hundred negative outputs leaves NPV above ninety percent.
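The four metrics in Exercises 1 to 4 all come from the same 2x2 table, so they can be cross-checked in one place. The following is a minimal sketch; the function name `diagnostic_metrics` and the variable names are illustrative and not part of the exercise set.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as fractions of 1."""
    return {
        "sensitivity": tp / (tp + fn),  # detected positives / all reference positives
        "specificity": tn / (tn + fp),  # detected negatives / all reference negatives
        "ppv": tp / (tp + fp),          # true positives / all positive outputs
        "npv": tn / (tn + fn),          # true negatives / all negative outputs
    }


# Counts from Exercises 1 to 4: 166 TP, 14 FN, 205 TN, 15 FP.
metrics = diagnostic_metrics(tp=166, fn=14, tn=205, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

PPV and NPV computed this way reflect the prevalence in this dataset (180 positives among 400 cases) and would shift in a population with a different prevalence.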
Exercise 5: Confidence Half-Width Screen
Observed sensitivity is 92.2% with an approximate 95% confidence half-width of 4.0 percentage points. The release rule requires the lower bound to be above 88%. Compute the lower bound.
Solution
$$\text{Lower bound} = 92.2\% - 4.0\% = 88.2\% > 88\%$$
Engineering Comment
The evidence barely clears the lower-bound screen.
Plausibility Check
Subtracting four points from just over ninety-two leaves just over eighty-eight.
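The exercise gives the half-width without naming the interval method. As a sketch only, and assuming a normal-approximation (Wald) interval with z = 1.96, the 166 of 180 result from Exercise 1 reproduces a half-width close to the stated 4.0 points. The function name `wald_lower_bound` is illustrative.

```python
import math


def wald_lower_bound(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (half_width, lower_bound) for a proportion, both as fractions.

    Assumption: normal-approximation (Wald) interval. The exercise does not
    state which interval method produced its 4.0-point half-width.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return half_width, p - half_width


half_width, lower = wald_lower_bound(successes=166, n=180)
passes_screen = lower > 0.88  # release rule from Exercise 5
print(f"half-width {half_width:.1%}, lower bound {lower:.1%}, pass={passes_screen}")
```

With these inputs the half-width is about 3.9 points and the lower bound about 88.3%. Because the margin over 88% is this thin, a different interval method could move the bound across the line, so the method should be fixed in the analysis plan before the screen is applied.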
Exercise 6: Threshold Margin
The performance threshold is sensitivity of at least 90%. Observed sensitivity is 92.2%. Compute the nominal margin.
Solution
$$\text{Margin} = 92.2\% - 90.0\% = 2.2 \text{ percentage points}$$
Engineering Comment
Nominal margin is small and should be interpreted with confidence bounds and subgroup results.
Plausibility Check
The observed result is only slightly above the threshold.
Exercise 7: Subgroup Coverage
The claim includes adult, pediatric and geriatric subgroups. Dataset counts are 260, 28 and 112. Minimum subgroup count is 50. Which subgroup fails?
Solution
Adult (260) and geriatric (112) counts meet the minimum. Pediatric subgroup fails:
$$28 < 50$$
Engineering Comment
Aggregate performance cannot support a subgroup claim with weak subgroup coverage.
Plausibility Check
Only one subgroup is below the minimum count.
Exercise 8: Environment Match
Clinical evidence includes hospital and clinic settings. Intended use also includes home use. Compute environment coverage if there are three intended settings.
Solution
$$\text{Environment coverage} = \frac{2}{3} \approx 0.667 = 66.7\%$$
Engineering Comment
Home-use claim should be narrowed or supported with additional evidence.
Plausibility Check
Two of three settings gives two-thirds coverage.
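The subgroup and environment checks in Exercises 7 and 8 are both set comparisons between what the claim covers and what the dataset contains. A minimal sketch follows, with illustrative function and variable names.

```python
def failing_subgroups(counts: dict[str, int], minimum: int) -> list[str]:
    """Return subgroups whose record count is below the minimum."""
    return [name for name, n in counts.items() if n < minimum]


def coverage_fraction(intended: set[str], evidenced: set[str]) -> float:
    """Fraction of intended settings that appear in the evidence."""
    return len(intended & evidenced) / len(intended)


# Exercise 7: claimed subgroups and their dataset counts.
subgroup_counts = {"adult": 260, "pediatric": 28, "geriatric": 112}
weak = failing_subgroups(subgroup_counts, minimum=50)

# Exercise 8: intended settings versus settings with clinical evidence.
intended_settings = {"hospital", "clinic", "home"}
evidenced_settings = {"hospital", "clinic"}
coverage = coverage_fraction(intended_settings, evidenced_settings)
uncovered = sorted(intended_settings - evidenced_settings)

print(f"weak subgroups: {weak}")
print(f"environment coverage: {coverage:.1%}, uncovered: {uncovered}")
```

Listing the uncovered settings alongside the fraction keeps the result actionable: it names the part of the claim that has to be narrowed or supported.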
Exercise 9: Missing Data Fraction
A validation dataset planned 420 records. Complete analyzable records are 396. Compute missing-data fraction.
Solution
Missing records:
$$420 - 396 = 24$$
Fraction:
$$\frac{24}{420} \approx 0.057 = 5.7\%$$
Engineering Comment
Missing data should be checked for bias, not only counted.
Plausibility Check
Twenty-four out of about four hundred is a few percent.
Exercise 10: Data-Age Validation
Evidence uses software version 3.1. Current release candidate is 3.4. There have been 3 algorithm-affecting changes since 3.1. A release rule allows at most one without bridging evidence. Does it pass?
Solution
It fails:
$$3 > 1$$
Engineering Comment
Clinical or performance evidence may need bridging validation after algorithm changes.
Plausibility Check
Three relevant changes exceed a one-change allowance.
Exercise 11: Decision Latency
A clinical workflow claim requires result availability within 120 seconds. Median latency is 82 seconds and 95th percentile is 138 seconds. Which statistic fails?
Solution
Median passes:
$$82 \text{ s} \le 120 \text{ s}$$
95th percentile fails:
$$138 \text{ s} > 120 \text{ s}$$
Engineering Comment
Workflow claims often depend on tail latency, not median latency.
Plausibility Check
Most cases can be fast while the tail violates the claim.
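To illustrate how a median can pass while the tail fails, the sketch below computes both statistics from raw latencies. The sample values are hypothetical, chosen so that the summary matches Exercise 11, and the nearest-rank percentile is one of several common percentile definitions.

```python
import statistics


def nearest_rank_percentile(values: list[float], percent: int) -> float:
    """Nearest-rank percentile: smallest value with at least percent% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling of percent*n/100
    return ordered[rank - 1]


LIMIT_S = 120  # claim from Exercise 11: result available within 120 seconds

# Hypothetical latency sample in seconds (not from the exercise set).
latencies_s = [60, 64, 68, 70, 72, 75, 78, 80, 81, 82,
               82, 84, 86, 90, 95, 101, 110, 124, 138, 145]

median_s = statistics.median(latencies_s)
p95_s = nearest_rank_percentile(latencies_s, 95)
stats = {"median": median_s, "p95": p95_s}
failing = [name for name, value in stats.items() if value > LIMIT_S]

print(f"median={median_s} s, p95={p95_s} s, failing={failing}")
```

In this sample three of twenty results exceed the limit, which the median hides completely. Whichever percentile definition is used, it should be stated alongside the latency claim.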
Exercise 12: Complaint Rate
There are 36 relevant complaints over 180000 device uses. Compute complaint rate per 10000 uses.
Solution
$$\text{Complaint rate} = \frac{36}{180000} \times 10000 = 2.0 \text{ per } 10000 \text{ uses}$$
Engineering Comment
The denominator must reflect actual exposure, not shipped units alone.
Plausibility Check
Thirty-six complaints over one hundred eighty thousand uses is a very low fraction.
Exercise 13: Complaint Trend Trigger
The alert limit is 1.5 complaints per 10000 uses. Current rate is 2.0. Compute exceedance ratio.
Solution
$$\text{Exceedance ratio} = \frac{2.0}{1.5} \approx 1.33$$
Engineering Comment
The rate is one third above the alert limit and should trigger investigation under this rule.
Plausibility Check
Two is larger than one and a half by one third.
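Exercises 12 and 13 chain together: the complaint rate needs an exposure denominator, and the trend rule compares that rate with the alert limit. A minimal sketch, with illustrative function names:

```python
def rate_per_10k(complaints: int, uses: int) -> float:
    """Complaint rate per 10,000 device uses."""
    if uses <= 0:
        # A rate without a real exposure denominator is not meaningful.
        raise ValueError("exposure denominator must be positive")
    return complaints * 10_000 / uses


def exceedance_ratio(rate: float, alert_limit: float) -> float:
    """Ratio of the observed rate to the alert limit; above 1.0 means the limit is exceeded."""
    return rate / alert_limit


rate = rate_per_10k(complaints=36, uses=180_000)      # Exercise 12
ratio = exceedance_ratio(rate, alert_limit=1.5)       # Exercise 13
investigate = ratio > 1.0

print(f"rate={rate:.1f} per 10k uses, ratio={ratio:.2f}, investigate={investigate}")
```

Rejecting a zero or negative denominator makes the missing-exposure case an explicit error rather than a silently misleading number.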
Exercise 14: Severe Complaint Fraction
Of 36 complaints, 4 are severe. Compute severe fraction.
Solution
$$\text{Severe fraction} = \frac{4}{36} \approx 0.111 = 11.1\%$$
Engineering Comment
Severity can trigger escalation even if total complaint count is low.
Plausibility Check
Four is one ninth of thirty-six.
Exercise 15: Recurrence Interval
Three similar complaints occur in 45 days. Compute average recurrence interval.
Solution
$$\text{Average recurrence interval} = \frac{45 \text{ days}}{3} = 15 \text{ days per complaint}$$
Engineering Comment
Recurring similar complaints suggest a systematic issue rather than random field noise.
Plausibility Check
Three events evenly spread over forty-five days gives one every fifteen days.
Exercise 16: Post-Market Evidence Completion
The post-market review requires exposure denominator, complaint coding, severity review, trend chart, returned device analysis, software version, user group, environment, CAPA decision and management sign-off. Seven of ten records are complete. Compute completion.
Solution
$$\text{Completion} = \frac{7}{10} = 70\%$$
Engineering Comment
Post-market evidence is not complete enough to close a trend if CAPA decision or denominator is missing.
Plausibility Check
Seven of ten is seventy percent.
Exercise 17: Field Signal RPN
A recurring false-negative complaint mode has severity 9, occurrence 3 and detection 4. Compute RPN.
Solution
$$\text{RPN} = S \times O \times D = 9 \times 3 \times 4 = 108$$
Engineering Comment
RPN should not hide high severity; false-negative signals often need escalation even with moderate occurrence.
Plausibility Check
High severity with moderate occurrence and detection produces a three-digit RPN.
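One way to keep a multiplied score from hiding high severity is to test severity separately from the RPN. The sketch below does that. The thresholds `severity_floor` and `rpn_limit` are hypothetical placeholders, not values from the exercise or from any standard.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number as the product of the three ratings."""
    return severity * occurrence * detection


def needs_escalation(severity: int, rpn_value: int,
                     severity_floor: int = 9, rpn_limit: int = 100) -> bool:
    """Escalate on high severity alone, or on a high RPN.

    severity_floor and rpn_limit are hypothetical thresholds for illustration.
    """
    return severity >= severity_floor or rpn_value >= rpn_limit


score = rpn(severity=9, occurrence=3, detection=4)  # Exercise 17 ratings
escalate = needs_escalation(severity=9, rpn_value=score)

print(f"RPN={score}, escalate={escalate}")
```

With the severity override, a severity-9 mode escalates even if low occurrence and detection ratings pull its RPN well below the limit.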
Exercise 18: Clinical Evidence Release Gate
A release gate requires sensitivity lower bound above 88%, subgroup counts all above 50, environment coverage 100%, no algorithm-change gap, complaint rate below 1.5 per 10000 uses and post-market evidence completion above 90%. Current values are lower bound 88.2%, pediatric count 28, environment coverage 66.7%, algorithm gap fail, complaint rate 2.0 and completion 70%. Decide release status.
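Solution
Sensitivity lower bound passes. Subgroup count, environment coverage, algorithm gap, complaint rate and evidence completion fail:
$$
\begin{aligned}
88.2\% &> 88\% && \text{(pass)} \\
28 &< 50 && \text{(fail)} \\
66.7\% &< 100\% && \text{(fail)} \\
3 \text{ changes} &> 1 \text{ allowed} && \text{(fail)} \\
2.0 &> 1.5 && \text{(fail)} \\
70\% &< 90\% && \text{(fail)}
\end{aligned}
$$
Release status: hold. One of six gates passes, and the gate requires all six.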
Engineering Comment
The evidence package should hold because performance margin alone does not close subgroup, environment, software-change and post-market gaps.
Plausibility Check
One passing metric cannot release a claim when multiple evidence boundaries fail.
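The all-of logic of the release gate can be sketched as a set of named checks in which any single failure holds the release. The class name, field names and status strings below are illustrative; the limits are those stated in Exercise 18.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    lower_bound_pct: float
    min_subgroup_count: int
    environment_coverage_pct: float
    algorithm_gap_closed: bool
    complaint_rate_per_10k: float
    completion_pct: float


def evaluate_gates(e: Evidence) -> dict[str, bool]:
    """Map each gate name to True (pass) or False (fail), using the Exercise 18 limits."""
    return {
        "sensitivity lower bound": e.lower_bound_pct > 88.0,
        "subgroup counts": e.min_subgroup_count > 50,  # "above 50" as worded in Exercise 18
        "environment coverage": e.environment_coverage_pct >= 100.0,
        "algorithm-change gap": e.algorithm_gap_closed,
        "complaint rate": e.complaint_rate_per_10k < 1.5,
        "post-market completion": e.completion_pct > 90.0,
    }


def release_status(e: Evidence) -> tuple[str, list[str]]:
    """All gates must pass; otherwise hold and report the failing gates."""
    failed = [name for name, ok in evaluate_gates(e).items() if not ok]
    return ("release" if not failed else "hold"), failed


current = Evidence(
    lower_bound_pct=88.2,
    min_subgroup_count=28,          # pediatric subgroup
    environment_coverage_pct=66.7,
    algorithm_gap_closed=False,
    complaint_rate_per_10k=2.0,
    completion_pct=70.0,
)
status, failed = release_status(current)
print(status, failed)
```

Returning the failing gate names with the status keeps the hold decision traceable to the specific evidence gaps that have to be closed.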
Validation Package Checklist
- Performance metrics include sensitivity, specificity, predictive values and confidence bounds where relevant.
- Dataset boundary matches intended users, environments, software version and workflow.
- Missing data and subgroup coverage are visible.
- Post-market complaint trends use exposure denominators, severity and recurrence rules before closure.
- Bridging evidence is stated for algorithm, workflow, population or environment changes.
- Field signals are linked to risk controls, CAPA decisions or continued monitoring.
- Release status states accept, narrow claim, bridge evidence, investigate trend or hold.