Engineering Statistical Inference, Process Capability, and DOE Exercises

Worked engineering-statistics exercises for confidence intervals, sample size, DOE, regression, capability, control charts, guard bands and release.

Branch: Mathematical Engineering
Content: Exercise set
Updated: Jul 03, 2026
Revision: v1.0.0 · reviewed

These exercises practise engineering statistics as release evidence for measurements, experiments and processes. They cover confidence intervals, sample size, design-change comparison, factorial effects, regression margin, model drift, Monte Carlo precision, process capability, control charts, guard bands, binomial defect bounds, statistical power, tolerance bounds, gage variation, subgrouping, false rejection, field-sample representativeness and release gates.

The focus is narrower than reliability life data. Here the central question is whether measured samples, experiments and process evidence support a design, process or acceptance decision without hiding uncertainty, bias or sampling weakness.

How to Use These Exercises

For each calculation, define:

the measured characteristic, unit, lot, run, operator or scenario boundary;
the statistical model and what engineering claim it supports;
the consequence of false acceptance and false rejection;
the required confidence, coverage, power or guard band;
the release action when the bound is weaker than the point estimate.

The common mistake is reporting a sample average, p-value or capability index without connecting it to the release decision and sampling boundary.

Release Evidence Notes

Sampling evidence should preserve representativeness. A large sample can be weak when it excludes the high-temperature condition, worst operator, marginal supplier lot or field configuration that controls release.

Confidence evidence should state what is bounded. A confidence interval for a mean is not a tolerance bound for future units, and a process capability index is not a reliability claim.

Process evidence should separate common-cause control from specification compliance. A stable process can be off target; an on-spec sample can come from an unstable process.

Guard-band evidence should account for measurement uncertainty and decision risk. Tightening the acceptance limit can reduce false accepts while increasing false rejects.

Engineering Boundary Notes

These exercises use simplified statistical formulas. Real engineering release may require distribution checks, nonparametric methods, mixed-effects models, randomization, independence review, measurement-system analysis, outlier policy, missing-data treatment and subject-matter judgement.

A statistical pass does not override a physical failure mode. If test data pass statistically but failures cluster by mechanism, the mechanism controls the release decision.

Scenario Map

Scenario	Exercises	Primary calculation	Engineering decision
Estimation and comparison	1-6, 16	confidence intervals, sample size, two-sample comparison, DOE, regression and power	Decide whether measured evidence supports a change or estimate.
Process and measurement release	7-12, 14-15	model drift, Monte Carlo precision, capability, control charts, guard bands, defect bounds and gage contribution	Decide whether production or inspection can release.
Population coverage	13, 17-18	tolerance bounds, representativeness and weighted release gates	Decide whether future units are covered by the evidence.

Exercise 1: Confidence Interval for Sensor Bias

A calibration check measures sensor bias on:

n=16

units. The sample mean bias is:

\bar{x}=0.42\ \text{mm}

and sample standard deviation is:

s=0.28\ \text{mm}

Use $t=2.13$ for a two-sided $95\%$ confidence interval.

Solution

Standard error:

SE=\dfrac{s}{\sqrt{n}}=\dfrac{0.28}{4}=0.070\ \text{mm}

Half-width:

h=tSE=2.13(0.070)=0.149\ \text{mm}

Confidence interval:

0.42\pm0.149=[0.271,\ 0.569]\ \text{mm}

Engineering Comment

The interval estimates mean bias, not worst-case individual error. A guard band or tolerance bound may still be needed for acceptance decisions.

Plausibility Check

With sixteen samples the standard error is one quarter of the sample standard deviation, so a half-width near $0.15$ mm is plausible.
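The interval arithmetic in this exercise can be reproduced with a short Python sketch. The function name `mean_confidence_interval` is hypothetical, and the critical value is passed in as given by the exercise rather than computed from a t-distribution.

```python
import math

def mean_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper, half_width) for a two-sided t-interval on the mean."""
    se = s / math.sqrt(n)        # standard error of the mean
    half_width = t_crit * se     # interval half-width
    return mean - half_width, mean + half_width, half_width

# Exercise 1 inputs; t_crit = 2.13 is the value supplied in the exercise.
lower, upper, h = mean_confidence_interval(mean=0.42, s=0.28, n=16, t_crit=2.13)
print(f"{lower:.3f} to {upper:.3f} mm (half-width {h:.3f} mm)")
```

Passing the critical value explicitly keeps the sketch free of external libraries; a statistics package could supply it for other sample sizes.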

Exercise 2: Sample Size for a Mean Estimate

An engineer wants mean force estimated within:

E=1.5\ \text{N}

at approximately $95\%$ confidence. Historical standard deviation is:

\sigma=5.0\ \text{N}

Use:

n=\left(\dfrac{1.96\sigma}{E}\right)^2

Solution

Sample size:

n=\left(\dfrac{1.96(5.0)}{1.5}\right)^2=42.7

Round up:

n=43

Engineering Comment

The sample should represent lots, operators and environmental states. A large sample from one easy condition is not strong evidence for release.

Plausibility Check

The desired error is much smaller than the standard deviation, so dozens of samples are expected.
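A minimal Python sketch of the sample-size formula follows. The function name `sample_size_for_mean` is an illustrative assumption, and the calculation uses the same known-sigma normal approximation as the exercise.

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) does not exceed the margin."""
    return math.ceil((z * sigma / margin) ** 2)

# Exercise 2 inputs: sigma = 5.0 N, target margin E = 1.5 N.
n_required = sample_size_for_mean(sigma=5.0, margin=1.5)
print(n_required)
```

Because the required size scales with the inverse square of the margin, halving the margin roughly quadruples the sample.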

Exercise 3: Two-Test Comparison for a Design Change

Old and revised parts are tested. The mean strength values are:

\bar{x}_1=118.2\ \text{MPa},\qquad \bar{x}_2=124.6\ \text{MPa}

The standard error of the difference is:

SE_\Delta=2.1\ \text{MPa}

Use $t=2.0$ for a simplified $95\%$ interval. Calculate the difference interval.

Solution

Difference:

\Delta=124.6-118.2=6.4\ \text{MPa}

Half-width:

h=2.0(2.1)=4.2\ \text{MPa}

Interval:

\Delta=[2.2,\ 10.6]\ \text{MPa}

Engineering Comment

The interval excludes zero, so the change appears beneficial under this test. Release still needs a failure-mode review and a check that the tested samples are representative.

Plausibility Check

The observed difference is about three standard errors, so a positive interval is reasonable.

Exercise 4: Two-Factor DOE Main Effects

A $2^2$ screening experiment measures yield for factors $A$ and $B$:

A	B	Yield
low	low	$82$
high	low	$88$
low	high	$85$
high	high	$95$

Calculate main effects for $A$ and $B$.

Solution

Mean at high $A$:

\bar{y}_{A+}=\dfrac{88+95}{2}=91.5

Mean at low $A$:

\bar{y}_{A-}=\dfrac{82+85}{2}=83.5

Effect of $A$:

E_A=91.5-83.5=8.0

Effect of $B$:

E_B=\dfrac{85+95}{2}-\dfrac{82+88}{2}=90.0-85.0=5.0

Engineering Comment

Factor $A$ has the larger main effect, but interaction and replication should be checked before final process changes.

Plausibility Check

The best run is high-high, and high $A$ improves both low and high $B$ cases, so a positive $A$ effect is expected.

Exercise 5: Interaction Effect in a Screening DOE

Use the DOE data from Exercise 4. Calculate the interaction effect:

E_{AB}=\dfrac{(y_{++}+y_{--})-(y_{+-}+y_{-+})}{2}

Solution

Substitute:

E_{AB}=\dfrac{(95+82)-(88+85)}{2}

E_{AB}=\dfrac{177-173}{2}=2.0

Engineering Comment

The interaction is smaller than the main effects but not zero. Confirmation runs should check whether the high-high condition is stable and repeatable.

Plausibility Check

The high-high yield is slightly better than the main effects alone would predict, so a small positive interaction is plausible.
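The main and interaction effects from Exercises 4 and 5 can be computed together with coded factor levels. The sketch below assumes the usual coding of low as -1 and high as +1; the helper name `effect` is hypothetical.

```python
# Coded levels: -1 = low, +1 = high. Yields are from the Exercise 4 table.
runs = [
    (-1, -1, 82.0),
    (+1, -1, 88.0),
    (-1, +1, 85.0),
    (+1, +1, 95.0),
]

def effect(contrast):
    """Contrast-weighted sum of yields divided by half the number of runs."""
    total = sum(contrast(a, b) * y for a, b, y in runs)
    return total / (len(runs) / 2)

effect_a = effect(lambda a, b: a)        # main effect of A
effect_b = effect(lambda a, b: b)        # main effect of B
effect_ab = effect(lambda a, b: a * b)   # AB interaction
print(effect_a, effect_b, effect_ab)
```

The interaction contrast is the product of the two coded columns, which reproduces the formula used in Exercise 5.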

Exercise 6: Regression Prediction with Engineering Margin

A regression predicts temperature rise:

\hat{T}=62.0^\circ\text{C}

at the release load. Prediction standard error is:

s_p=3.5^\circ\text{C}

Use a guarded prediction:

T_g=\hat{T}+2s_p

The limit is:

70^\circ\text{C}

Check release.

Solution

Guarded prediction:

T_g=62.0+2(3.5)=69.0^\circ\text{C}

Margin:

M=70.0-69.0=1.0^\circ\text{C}

The release passes with narrow guarded margin.

Engineering Comment

Regression evidence should include residual checks and the range of data used to fit the model. Extrapolation can invalidate the margin.

Plausibility Check

The nominal value is $8^\circ\text{C}$ below the limit, and the guard consumes $7^\circ\text{C}$ of that margin.

Exercise 7: Residual Z-Score for Model Drift

A digital model predicts:

\hat{x}=48.0

The measured value is:

x=52.4

The residual standard deviation from validation is:

\sigma_r=1.6

Calculate the residual z-score and compare it with a drift trigger of $|z|>2.5$.

Solution

Residual:

r=52.4-48.0=4.4

Z-score:

z=\dfrac{4.4}{1.6}=2.75

The drift trigger is exceeded.

Engineering Comment

Model drift should trigger an engineering review of sensors, operating state, configuration and physics assumptions, not only a statistical alert.

Plausibility Check

The residual is almost three times the validation standard deviation, so a trigger is expected.

Exercise 8: Monte Carlo Failure Probability Precision

A Monte Carlo simulation has:

N=50{,}000

runs and observes:

k=140

failures. Estimate failure probability and approximate standard error:

SE=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{N}}

Solution

Failure probability:

\hat{p}=\dfrac{140}{50{,}000}=0.0028

Standard error:

SE=\sqrt{\dfrac{0.0028(0.9972)}{50{,}000}}=0.000236

Engineering Comment

Rare-event simulations need enough failures to make the estimate stable. Input distributions and dependence assumptions may dominate the numerical standard error.

Plausibility Check

One hundred forty failures is enough for a small standard error but not enough to ignore modelling assumptions.
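A short Python sketch of the estimate and its standard error is given here. The function name `failure_probability` is an assumption for illustration, and the relative standard error is an added diagnostic rather than part of the exercise.

```python
import math

def failure_probability(k, n):
    """Point estimate and binomial standard error for k failures in n runs."""
    p_hat = k / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, se

# Exercise 8 inputs: 140 failures in 50,000 runs.
p_hat, se = failure_probability(k=140, n=50_000)
relative_se = se / p_hat   # for rare events this is close to 1 / sqrt(k)
print(f"p = {p_hat:.4f}, SE = {se:.6f}, relative SE = {relative_se:.3f}")
```

The relative standard error depends mainly on the number of observed failures, which is why rare-event runs are sized by failure count rather than by total runs.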

Exercise 9: Process Capability and Off-Center Mean

A process has specification:

10.00\pm0.30\ \text{mm}

Measured mean is:

\mu=10.08\ \text{mm}

and standard deviation is:

\sigma=0.055\ \text{mm}

Calculate $C_p$ and $C_{pk}$.

Solution

Specification width:

USL-LSL=10.30-9.70=0.60

Capability:

C_p=\dfrac{0.60}{6(0.055)}=1.82

Upper-side capability:

C_{pk,U}=\dfrac{10.30-10.08}{3(0.055)}=1.33

Lower-side capability:

C_{pk,L}=\dfrac{10.08-9.70}{3(0.055)}=2.30

Therefore:

C_{pk}=1.33

Engineering Comment

The process has good spread capability but is shifted toward the upper limit. Centering can improve release margin without reducing variation.

Plausibility Check

$C_p$ is larger than $C_{pk}$ because the mean is off center.
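The capability indices can be checked with a small Python sketch. The function name `capability` is hypothetical, and the second call simply re-evaluates the same spread with the mean moved to nominal to illustrate the centering effect.

```python
def capability(lsl, usl, mean, sigma):
    """Return (Cp, upper-side index, lower-side index, Cpk)."""
    cp = (usl - lsl) / (6.0 * sigma)
    cpu = (usl - mean) / (3.0 * sigma)
    cpl = (mean - lsl) / (3.0 * sigma)
    return cp, cpu, cpl, min(cpu, cpl)

# Exercise 9 inputs: specification 10.00 +/- 0.30 mm.
cp, cpu, cpl, cpk = capability(lsl=9.70, usl=10.30, mean=10.08, sigma=0.055)

# Same spread with the mean at nominal: Cpk rises to equal Cp.
_, _, _, cpk_centered = capability(lsl=9.70, usl=10.30, mean=10.00, sigma=0.055)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}, centered Cpk = {cpk_centered:.2f}")
```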

Exercise 10: X-Bar Control Chart Signal

A process has historical mean:

\mu=50.0

and standard deviation:

\sigma=1.2

Subgroup size is:

n=4

Calculate three-sigma control limits for subgroup means. A new subgroup mean is $52.1$. Check the signal.

Solution

Standard error:

SE=\dfrac{1.2}{\sqrt{4}}=0.6

Control limits:

UCL=50.0+3(0.6)=51.8

LCL=50.0-3(0.6)=48.2

The subgroup mean $52.1$ exceeds $51.8$, so it signals.

Engineering Comment

A control-chart signal is a process stability issue. It should not be averaged away just because individual units are still inside specification.

Plausibility Check

The subgroup mean is slightly above the upper control limit, so the signal is narrow but real.
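The limit calculation and signal check can be sketched in Python as follows. The function name `xbar_limits` is an assumption, and only the single-point rule beyond the three-sigma limits is checked; run rules are not included.

```python
import math

def xbar_limits(mu, sigma, n, k=3.0):
    """Lower and upper k-sigma control limits for subgroup means of size n."""
    se = sigma / math.sqrt(n)
    return mu - k * se, mu + k * se

# Exercise 10 inputs.
lcl, ucl = xbar_limits(mu=50.0, sigma=1.2, n=4)
subgroup_mean = 52.1
signal = subgroup_mean > ucl or subgroup_mean < lcl
print(f"LCL = {lcl:.1f}, UCL = {ucl:.1f}, signal = {signal}")
```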

Exercise 11: Guard-Band Acceptance with Measurement Uncertainty

A dimension must not exceed:

L=25.00\ \text{mm}

Expanded measurement uncertainty is:

U=0.08\ \text{mm}

The acceptance rule is:

x_m+U\le L

A measured part has:

x_m=24.95\ \text{mm}

Check acceptance.

Solution

Guarded value:

x_g=24.95+0.08=25.03\ \text{mm}

Since:

25.03>25.00

the part is not accepted under the guard-band rule.

Engineering Comment

Guard bands reduce false acceptance risk but can increase false rejects. The rule should match product risk and measurement capability.

Plausibility Check

The nominal measurement is only $0.05$ mm below the limit, smaller than the uncertainty.

Exercise 12: Binomial Defect-Rate Upper Bound

A supplier lot sample tests:

n=120

parts and finds:

k=2

defects. Use the simple point estimate and an approximate one-sided upper bound:

p_U=\hat{p}+1.64\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}

Solution

Point defect rate:

\hat{p}=\dfrac{2}{120}=0.0167

Standard error:

SE=\sqrt{\dfrac{0.0167(0.9833)}{120}}=0.0117

Upper bound:

p_U=0.0167+1.64(0.0117)=0.0359

Therefore:

p_U=3.6\%

Engineering Comment

The point defect rate looks low, but the upper bound is about twice as high because only $120$ parts were sampled and defects were observed.

Plausibility Check

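Two defects in $120$ is about $1.7\%$, and the approximate bound roughly doubles it. With so few defects the normal approximation is rough, and an exact binomial bound would be higher.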
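As a cross-check on the normal approximation, the sketch below compares it with an exact one-sided binomial (Clopper-Pearson) bound found by bisection. The exact bound is not part of the exercise; it is swapped in here for comparison, and the function names are hypothetical. With unrounded intermediates the approximate bound evaluates to about $0.0358$, which still rounds to $3.6\%$.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def wald_upper(k, n, z=1.64):
    """Normal-approximation upper bound used in the exercise."""
    p_hat = k / n
    return p_hat + z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def exact_upper(k, n, confidence=0.95, tol=1e-10):
    """Exact one-sided upper bound: the p at which P(X <= k) = 1 - confidence."""
    if k >= n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > alpha:
            lo = mid   # CDF still above alpha, so the bound lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_wald = wald_upper(k=2, n=120)
p_exact = exact_upper(k=2, n=120)
print(f"approximate: {p_wald:.4f}, exact: {p_exact:.4f}")
```

The exact bound comes out above the approximate one for these inputs, so the simple formula should not be read as conservative when defect counts are small.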

Exercise 13: One-Sided Tolerance Bound for Release

A sample has:

\bar{x}=94.0\ \text{N},\qquad s=1.6\ \text{N}

For the selected confidence and coverage, use one-sided factor:

k=2.10

Upper tolerance bound is:

UTB=\bar{x}+ks

The release limit is $100$ N. Check release.

Solution

Tolerance bound:

UTB=94.0+2.10(1.6)=97.36\ \text{N}

Margin:

M=100-97.36=2.64\ \text{N}

The release passes.

Engineering Comment

A tolerance bound addresses population coverage. It is wider than a confidence interval for the mean because future units must be covered.

Plausibility Check

The mean is far enough below the limit that adding a few standard deviations still passes.

Exercise 14: Measurement Variation and Apparent Capability

A measured process standard deviation is:

s_{obs}=0.060\ \text{mm}

Gage R&R standard deviation is:

s_g=0.035\ \text{mm}

Estimate process-only standard deviation:

s_p=\sqrt{s_{obs}^2-s_g^2}

Solution

Process standard deviation:

s_p=\sqrt{0.060^2-0.035^2}

s_p=\sqrt{0.003600-0.001225}=0.0487\ \text{mm}

Engineering Comment

Measurement error inflates observed spread. Correcting the estimate helps diagnosis, but inspection decisions still need the actual measurement uncertainty.

Plausibility Check

Removing measurement variation should reduce the standard deviation from $0.060$ mm to a smaller value.
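The variance subtraction can be sketched in Python with a guard for the case where the gage variance is not smaller than the observed variance. The function name `process_sigma` and the gage variance share are illustrative additions.

```python
import math

def process_sigma(s_obs, s_gage):
    """Process-only standard deviation after removing gage variance."""
    variance = s_obs**2 - s_gage**2
    if variance <= 0.0:
        raise ValueError("gage variation is not smaller than observed variation")
    return math.sqrt(variance)

# Exercise 14 inputs in mm.
s_p = process_sigma(s_obs=0.060, s_gage=0.035)
gage_variance_share = 0.035**2 / 0.060**2   # fraction of observed variance from the gage
print(f"s_p = {s_p:.4f} mm, gage share of variance = {gage_variance_share:.2f}")
```

Variances subtract, not standard deviations, which is why the corrected value is much larger than $0.060-0.035$.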

Exercise 15: False-Rejection Rate from Guard Band

Assume true part values are normally distributed with:

\mu=24.82\ \text{mm},\qquad \sigma=0.05\ \text{mm}

Specification limit is $25.00$ mm, but the guarded acceptance limit is:

24.90\ \text{mm}

Estimate the probability a conforming part exceeds the guarded limit using:

z=\dfrac{24.90-24.82}{0.05}

and $P(Z>1.6)=0.0548$.

Solution

Z-score:

z=1.6

False-rejection probability under this simplified model:

P=5.48\%

Engineering Comment

Guard bands reduce false acceptance risk but may reject good product. The economic and safety trade-off should be explicit.

Plausibility Check

The guarded limit is $1.6$ standard deviations above the process mean, so a few percent tail probability is plausible.
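The tail probability can be computed directly from the complementary error function instead of a table. In this sketch the function name `upper_tail` is hypothetical, and the second probability is an added check that the share of parts beyond the specification limit itself is negligible under the same normal model (well under $0.1\%$), so almost all rejected parts are conforming.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma = 24.82, 0.05        # Exercise 15 process parameters in mm
guarded_limit = 24.90
spec_limit = 25.00

p_above_guard = upper_tail((guarded_limit - mu) / sigma)
p_above_spec = upper_tail((spec_limit - mu) / sigma)
p_false_reject = p_above_guard - p_above_spec   # conforming but rejected
print(f"{p_above_guard:.4f} {p_false_reject:.4f}")
```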

Exercise 16: Statistical Power for a Design-Change Test

A test should detect a true improvement of:

\Delta=4.0

with standard deviation:

\sigma=5.0

and equal sample sizes:

n=25

per group. Approximate signal-to-noise for the difference:

Z=\dfrac{\Delta}{\sigma\sqrt{2/n}}

Solution

Standard error of difference:

SE=5.0\sqrt{\dfrac{2}{25}}=1.414

Signal-to-noise:

Z=\dfrac{4.0}{1.414}=2.83

Engineering Comment

A larger $Z$ supports good power, but final power depends on alpha level, one-sided versus two-sided test and the practical decision threshold.

Plausibility Check

Twenty-five per group reduces noise enough that a $4$ unit change is nearly three standard errors.
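To connect the signal-to-noise value to power, the sketch below uses the normal approximation that power is roughly $\Phi(Z-z_{crit})$. The alpha level of $0.05$ and the function name `approx_power` are assumptions not given in the exercise, and the small contribution from the opposite tail in the two-sided case is ignored.

```python
import math
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05, two_sided=True):
    """Normal-approximation power for a two-sample comparison of means."""
    se = sigma * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z_effect = delta / se                       # signal-to-noise from the exercise
    tail = alpha / 2.0 if two_sided else alpha
    z_crit = NormalDist().inv_cdf(1.0 - tail)
    return NormalDist().cdf(z_effect - z_crit)

# Exercise 16 inputs with an assumed alpha of 0.05.
power_two_sided = approx_power(delta=4.0, sigma=5.0, n_per_group=25)
power_one_sided = approx_power(delta=4.0, sigma=5.0, n_per_group=25, two_sided=False)
print(f"{power_two_sided:.2f} {power_one_sided:.2f}")
```

Under these assumptions the two-sided power is close to $0.8$ and the one-sided power is higher, which illustrates why the test direction must be fixed before quoting power.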

Exercise 17: Field-Sample Representativeness Weight

A validation plan requires coverage of three operating states:

State	Required share	Sampled share
normal	$50\%$	$70\%$
hot	$30\%$	$20\%$
vibration	$20\%$	$10\%$

Use the minimum sampled-to-required ratio as a representativeness score.

Solution

Ratios:

R_n=\dfrac{70}{50}=1.40

R_h=\dfrac{20}{30}=0.667

R_v=\dfrac{10}{20}=0.50

The representativeness score is:

R=0.50

Engineering Comment

Over-sampling normal cases does not compensate for under-sampling vibration if vibration controls failure or performance.

Plausibility Check

The vibration state has only half of the required coverage, so the score should be $0.50$ .
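The minimum-ratio score can be sketched in Python so that the limiting state is reported along with the score. The variable names are illustrative, and the shares are taken from the exercise table.

```python
# Required and sampled shares from the Exercise 17 table.
required = {"normal": 0.50, "hot": 0.30, "vibration": 0.20}
sampled = {"normal": 0.70, "hot": 0.20, "vibration": 0.10}

# Sampled-to-required ratio for each operating state.
ratios = {state: sampled[state] / required[state] for state in required}

# The score is the smallest ratio; the state that sets it limits release.
limiting_state = min(ratios, key=ratios.get)
score = ratios[limiting_state]
print(limiting_state, round(score, 2))
```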

Exercise 18: Statistical Release Gate

A statistical release package has five gates:

Gate	Weight	Result
sampling representativeness	$0.25$	$0.88$
confidence or tolerance bound	$0.25$	$0.94$
measurement-system evidence	$0.20$	$0.91$
process stability and capability	$0.20$	$0.93$
decision-risk documentation	$0.10$	$0.96$

The weighted release threshold is:

S\ge 0.92

and sampling representativeness may not be below $0.90$ . Calculate the decision.

Solution

Weighted score:

\begin{aligned} S&=0.25(0.88)+0.25(0.94)+0.20(0.91)+0.20(0.93)+0.10(0.96)\\ &=0.220+0.235+0.182+0.186+0.096\\ &=0.919 \end{aligned}

The score is:

91.9\%

The score fails the $92\%$ threshold, and sampling also fails its floor. Release is held.

Engineering Comment

Statistical strength cannot be rescued by averaging if the sample misses the state that controls the engineering claim.

Plausibility Check

Several gates are good, but sampling representativeness, which shares the largest weight, is below its floor and pulls the score just under threshold.
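The gate logic can be sketched in Python with the weighted score and the sampling floor evaluated as separate conditions. The constant and variable names are hypothetical; the weights, results and limits come from the exercise.

```python
# (gate name, weight, result) from the Exercise 18 table.
gates = [
    ("sampling representativeness", 0.25, 0.88),
    ("confidence or tolerance bound", 0.25, 0.94),
    ("measurement-system evidence", 0.20, 0.91),
    ("process stability and capability", 0.20, 0.93),
    ("decision-risk documentation", 0.10, 0.96),
]
THRESHOLD = 0.92        # minimum weighted score
SAMPLING_FLOOR = 0.90   # minimum result for the sampling gate

total_weight = sum(weight for _, weight, _ in gates)
score = sum(weight * result for _, weight, result in gates)
sampling_result = next(r for name, _, r in gates if name == "sampling representativeness")

score_ok = score >= THRESHOLD
floor_ok = sampling_result >= SAMPLING_FLOOR
release = score_ok and floor_ok   # both conditions must hold
print(f"score = {score:.3f}, score_ok = {score_ok}, floor_ok = {floor_ok}, release = {release}")
```

Keeping the floor as its own condition means a high weighted score cannot mask a failed sampling gate.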

Validation Package Checklist

Sampling plan names units, lots, operators, environments, time windows and exclusion rules.
Confidence intervals, tolerance bounds and prediction intervals are not used interchangeably.
DOE evidence includes factor levels, randomization, replication and interaction review.
Regression evidence includes residual checks and no unguarded extrapolation.
Capability and control-chart evidence separate stability from specification compliance.
Guard bands state false-accept and false-reject consequences.
Release decisions document the action when new data move the bound.

Common Release Mistakes

Treating sample mean as population guarantee.
Reporting a p-value without a practical engineering effect size.
Using a confidence interval for the mean when the claim is about future units.
Accepting $C_p$ while ignoring off-center mean and $C_{pk}$.
Averaging normal-condition samples to hide weak hot or vibration coverage.
Removing measurement uncertainty from acceptance decisions after using it to diagnose process spread.
Closing release when a guard-band, tolerance-bound or representativeness floor fails.
