Exercise set

Reliability, Availability, Redundancy, and Proof-Test Exercises

Solved reliability exercises for failure rates, MTBF, availability, redundancy, Weibull models, proof-test coverage, common cause and release gates.

These exercises practise reliability evidence for industrial systems: failure-rate estimates, MTBF, mission reliability, availability, series systems, parallel redundancy, common-cause limits, Weibull models, zero-failure evidence, proof-test coverage and release gates.

The goal is to make the reliability claim reproducible. A number is weak if the exposure basis is hidden, failure definitions are mixed, redundant channels are not independent, common support systems dominate, or proof tests do not cover latent failure modes.

Assume simplified screening models unless an exercise states otherwise. Real release work should also check duty cycle, censoring, maintenance resets, environmental severity, configuration control, diagnostic delay, repair quality, common-cause mechanisms and the consequence of each failed function.

Release Evidence Notes

Reliability evidence should name the asset, function, operating mode, exposure basis and failure definition. Calendar months, operating hours, starts, cycles, missions and demand events cannot be mixed without normalization.

Availability evidence should separate inherent availability from operational availability. A result built on high MTBF and low MTTR can still be misleading if logistics delay, spare shortage, permit delay or restart validation is left out of the calculation.

Redundancy evidence should prove independence. Two channels that share power, software, cooling, calibration, maintenance procedure or environment may have much less benefit than a simple parallel formula suggests.

Proof-test evidence should map test cases to latent failure modes and required safety or service functions. Counting test steps is not enough if the steps do not expose the hidden failures that matter.

Engineering Boundary Notes

This page covers reliability modelling, availability screening, redundancy architecture and proof-test release. Maintenance interval decisions belong in the companion maintenance interval and condition-monitoring exercise set. Spare reorder points, stockout risk and repairable pools belong in the companion critical spare-parts exercise set.

The boundary is still operational engineering, not pure statistics. Use the mathematical reliability life-data set when the central question is confidence bounds, censored life data or statistical demonstration theory.

Scenario Map

| Scenario | Exercises | Primary check | Engineering decision |
|---|---|---|---|
| Failure rate and availability | 1-5, 18 | MTBF, mission reliability, MTTR and series availability | Decide whether field evidence supports service use. |
| Redundancy architecture | 6-8, 15-16 | One-out-of-two, two-out-of-three, shared controller and beta-factor limits | Decide whether redundancy is real or overstated. |
| Weibull and reliability evidence | 9-11, 17 | Weibull survival, B10 life, zero-failure lower bound and allocation margin | Validate or restrict reliability claims. |
| Proof-test release | 12-14, 18 | Coverage, latent failure exposure and residual dangerous rate | Decide whether release needs more testing or restriction. |

Exercise 1: Failure Rate from Exposure

A fleet accumulates:

T=18500\ \text{operating h}

and records:

n_f=5

functional failures. Estimate the constant failure rate.

Solution

For a first screening estimate:

\lambda=\dfrac{n_f}{T}

Substitute:

\lambda=\dfrac{5}{18500}=2.70\times10^{-4}\ \text{h}^{-1}

Engineering Comment

This estimate is only meaningful for the stated failure definition. Do not combine nuisance alarms, planned shutdowns and loss-of-function failures unless the release question treats them as the same event.

Plausibility Check

Five failures in about eighteen thousand hours gives roughly one failure every few thousand hours, so a rate near 10^{-4}\ \text{h}^{-1} is plausible.
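
The screening estimate can be reproduced with a few lines of Python. This is a minimal sketch that assumes a constant failure rate; the function name `failure_rate` is illustrative and not part of the exercise set.

```python
def failure_rate(failures: int, exposure_hours: float) -> float:
    """Point estimate of a constant failure rate, in failures per operating hour."""
    if exposure_hours <= 0:
        raise ValueError("exposure must be positive")
    if failures < 0:
        raise ValueError("failure count cannot be negative")
    return failures / exposure_hours


# Values from Exercise 1: 5 functional failures in 18500 operating hours.
lam = failure_rate(failures=5, exposure_hours=18_500)
print(f"lambda = {lam:.2e} per hour")
```

Keeping the exposure argument explicit in hours makes it harder to mix calendar time, cycles or demands into the same estimate by accident.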

Exercise 2: MTBF from Failure Rate

Using the failure rate:

\lambda=2.70\times10^{-4}\ \text{h}^{-1}

estimate MTBF.

Solution

For a constant-rate model:

MTBF=\dfrac{1}{\lambda}

Therefore:

MTBF=\dfrac{1}{2.70\times10^{-4}}=3704\ \text{h}

Engineering Comment

MTBF is an average exposure measure, not a guarantee that an individual unit will survive for that duration. It should be tied to environment and configuration.

Plausibility Check

The reciprocal of 2.7\times10^{-4} is a little below 4000, so 3704 hours is consistent.

Exercise 3: Mission Reliability from MTBF

Assume constant failure rate and:

MTBF=3704\ \text{h}

Find reliability for a:

t=500\ \text{h}

mission.

Solution

For the exponential model:

R(t)=e^{-t/MTBF}

Thus:

R(500)=e^{-500/3704}=e^{-0.135}=0.874

Engineering Comment

This is a mission reliability statement, not an availability statement. It says the function survives the mission without failure under the constant-rate assumption.

Plausibility Check

The mission is much shorter than MTBF, so reliability should be high, but not extremely close to one.
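
A short Python sketch of the exponential model, assuming a constant failure rate as the exercise does. The helper name `mission_reliability` is an assumption for illustration.

```python
import math


def mission_reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of completing the mission without failure (exponential model)."""
    if mtbf_hours <= 0 or mission_hours < 0:
        raise ValueError("MTBF must be positive and mission time non-negative")
    return math.exp(-mission_hours / mtbf_hours)


r_500 = mission_reliability(500, 3704)    # the 500 h mission from Exercise 3
r_mtbf = mission_reliability(3704, 3704)  # survival to exactly one MTBF
print(f"R(500 h) = {r_500:.3f}, R(MTBF) = {r_mtbf:.3f}")
```

Evaluating the same function at one full MTBF returns e^{-1}, roughly 37\%, which is why MTBF should not be read as a guaranteed life.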

Exercise 4: Availability from MTBF and MTTR

An asset has:

MTBF=3704\ \text{h},\qquad MTTR=8\ \text{h}

Estimate inherent availability.

Solution

Use:

A=\dfrac{MTBF}{MTBF+MTTR}

Substitute:

A=\dfrac{3704}{3704+8}=0.9978

So:

A=99.78\%

Engineering Comment

This excludes logistics and waiting time. If spares, permits or restart testing dominate downtime, operational availability will be lower.

Plausibility Check

Repair time is tiny compared with MTBF, so availability close to 99.8\% is reasonable.
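
The gap between inherent and operational availability can be sketched in Python. The `delay_hours` parameter and the 40 h waiting time are hypothetical values chosen only for illustration. A full operational availability calculation would normally also include preventive maintenance downtime, which this simplified sketch ignores.

```python
def availability(mtbf_hours: float, mttr_hours: float, delay_hours: float = 0.0) -> float:
    """Steady-state availability; delay_hours adds waiting time per failure."""
    downtime = mttr_hours + delay_hours
    return mtbf_hours / (mtbf_hours + downtime)


a_inherent = availability(3704, 8)                     # repair time only
a_with_delay = availability(3704, 8, delay_hours=40)   # hypothetical 40 h logistics delay
print(f"inherent = {a_inherent:.4f}, with delay = {a_with_delay:.4f}")
```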

Exercise 5: Series Availability

Three required subsystems have availabilities:

A_1=0.996,\quad A_2=0.992,\quad A_3=0.985

Estimate system availability if all three are required.

Solution

For required series functions:

A_{sys}=A_1A_2A_3

So:

A_{sys}=0.996(0.992)(0.985)=0.9732

Engineering Comment

High component availability can still produce a weaker system when many required elements are chained. The lowest-availability element usually deserves first review.

Plausibility Check

Multiplying three values below one should lower the result by several percentage points, so about 97.3\% is plausible.
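
A minimal Python sketch of the series product, which also reports the weakest element. The dictionary keys are labels invented for this example.

```python
import math


def series_availability(availabilities) -> float:
    """Availability of a chain in which every element is required."""
    values = list(availabilities)
    if not values:
        raise ValueError("at least one element is required")
    if any(not 0.0 <= a <= 1.0 for a in values):
        raise ValueError("availabilities must lie between 0 and 1")
    return math.prod(values)


subsystems = {"A1": 0.996, "A2": 0.992, "A3": 0.985}
a_sys = series_availability(subsystems.values())
weakest = min(subsystems, key=subsystems.get)  # element with the lowest availability
print(f"A_sys = {a_sys:.4f}, review first: {weakest}")
```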

Exercise 6: One-Out-of-Two Redundancy

Two independent channels each have mission reliability:

R_c=0.93

The function succeeds if at least one channel succeeds. Estimate function reliability.

Solution

The failure probability of one channel is:

Q_c=1-R_c=0.07

Both fail with probability:

Q_{both}=Q_c^2=0.07^2=0.0049

Reliability is:

R_{sys}=1-Q_{both}=0.9951

Engineering Comment

The independence assumption is the main risk. Shared firmware, wiring, calibration or environmental exposure can invalidate the simple result.

Plausibility Check

Each channel is imperfect, but requiring both to fail makes the system failure probability below one percent.

Exercise 7: Two-Out-of-Three Redundancy

Three independent sensors each have mission reliability:

R=0.94

The vote succeeds if at least two sensors work. Estimate reliability.

Solution

The success probability is:

R_{2oo3}=3R^2(1-R)+R^3

Substitute:

R_{2oo3}=3(0.94)^2(0.06)+(0.94)^3=0.9896

Engineering Comment

Voting improves random failure tolerance, but it can create other risks: common calibration bias, frozen data, voting logic faults and proof-test coverage gaps.

Plausibility Check

The result should be higher than one sensor at 94\% and lower than perfect reliability, so about 99\% is credible.
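
Both the one-out-of-two and two-out-of-three results are special cases of a k-out-of-n binomial sum. The sketch below assumes identical, fully independent channels, which is exactly the assumption the engineering comments warn about. The function name `k_out_of_n` is illustrative.

```python
from math import comb


def k_out_of_n(k: int, n: int, r: float) -> float:
    """Reliability of n identical independent channels when at least k must work."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r_1oo2 = k_out_of_n(1, 2, 0.93)  # Exercise 6 inputs
r_2oo3 = k_out_of_n(2, 3, 0.94)  # Exercise 7 inputs
print(f"1oo2 = {r_1oo2:.4f}, 2oo3 = {r_2oo3:.4f}")
```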

Exercise 8: Redundancy with a Shared Controller

Two redundant pumps each have availability:

A_p=0.965

A shared controller required by both pumps has availability:

A_c=0.982

Estimate system availability.

Solution

Availability of at least one pump is:

A_{pair}=1-(1-A_p)^2
A_{pair}=1-(0.035)^2=0.9988

With the controller in series:

A_{sys}=A_cA_{pair}=0.982(0.9988)=0.9808

Engineering Comment

The shared controller caps performance. Redundant field equipment cannot overcome a common required element with weaker availability.

Plausibility Check

The pump pair is nearly 99.9\%, but multiplying by a 98.2\% controller pulls the system near 98.1\%.
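
The capping effect of the shared controller can be checked in Python. This sketch assumes independent pumps and adds a hypothetical third pump, which is not part of the exercise, to show that extra field redundancy cannot lift the system above the controller availability.

```python
def shared_controller_availability(a_pump: float, a_controller: float, n_pumps: int = 2) -> float:
    """Parallel pumps (at least one needed) in series with one shared controller."""
    a_pumps = 1.0 - (1.0 - a_pump) ** n_pumps
    return a_controller * a_pumps


a_two = shared_controller_availability(0.965, 0.982)
a_three = shared_controller_availability(0.965, 0.982, n_pumps=3)  # hypothetical extra pump
print(f"two pumps = {a_two:.4f}, three pumps = {a_three:.4f}, controller = 0.9820")
```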

Exercise 9: Weibull Mission Reliability

A component has Weibull parameters:

\beta=1.7,\qquad \eta=6200\ \text{h}

Estimate reliability at:

t=2500\ \text{h}

Solution

Use:

R(t)=e^{-(t/\eta)^\beta}

Substitute:

R(2500)=e^{-(2500/6200)^{1.7}}=e^{-0.2135}=0.808

Engineering Comment

Because \beta>1, the failure rate increases with age. Release should check whether the planned mission approaches a wear-out region.

Plausibility Check

The time is below half the scale parameter, so reliability above 80\% is reasonable.
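
A small Python sketch of the two-parameter Weibull survival function, together with the hazard rate h(t)=(\beta/\eta)(t/\eta)^{\beta-1} to make the wear-out trend visible. The function names and the 500 h comparison point are assumptions for illustration.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull survival probability at time t."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate of the two-parameter Weibull model."""
    return (beta / eta) * (t / eta) ** (beta - 1)


r_2500 = weibull_reliability(2500, beta=1.7, eta=6200)
h_early = weibull_hazard(500, beta=1.7, eta=6200)   # hypothetical early-life point
h_late = weibull_hazard(2500, beta=1.7, eta=6200)
print(f"R(2500 h) = {r_2500:.3f}")
print(f"hazard ratio late/early = {h_late / h_early:.2f}")
```

A hazard ratio above one confirms that, with a shape parameter greater than one, the failure rate at the end of the mission is higher than near the start.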

Exercise 10: Weibull B10 Life

For the same Weibull model:

\beta=1.7,\qquad \eta=6200\ \text{h}

find the time by which 10\% have failed.

Solution

B10 life means:

R(t)=0.90

Solve:

t=\eta[-\ln(0.90)]^{1/\beta}

Substitute:

t=6200[-\ln(0.90)]^{1/1.7}\approx1650\ \text{h}

Engineering Comment

B10 is useful for replacement and warranty screening, but only if the Weibull fit is based on comparable duty and censoring assumptions.

Plausibility Check

B10 should be well below the scale parameter because only 10\% failures are allowed.
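
The B-life inversion can be sketched in Python and checked by substituting the result back into the survival function. The function name `weibull_b_life` is an assumption, not a library call.

```python
import math


def weibull_b_life(fraction_failed: float, beta: float, eta: float) -> float:
    """Time by which the given fraction of the population has failed."""
    if not 0.0 < fraction_failed < 1.0:
        raise ValueError("fraction_failed must be strictly between 0 and 1")
    return eta * (-math.log(1.0 - fraction_failed)) ** (1.0 / beta)


b10 = weibull_b_life(0.10, beta=1.7, eta=6200)
# Round-trip check: reliability at the B10 time should come back as 0.90.
r_at_b10 = math.exp(-((b10 / 6200) ** 1.7))
print(f"B10 = {b10:.0f} h, R(B10) = {r_at_b10:.2f}")
```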

Exercise 11: Zero-Failure Lower Reliability Bound

A test runs:

n=32

units for the full mission with zero failures. Use:

R_L=\alpha^{1/n}

with:

\alpha=0.05

Estimate the one-sided 95\% lower bound.

Solution

Compute:

R_L=0.05^{1/32}=0.911

Engineering Comment

Zero failures does not prove perfect reliability. The result supports only a lower-bound claim for the tested mission and conditions.

Plausibility Check

Thirty-two clean missions are useful, but not enough to demonstrate 99\% reliability.
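
The zero-failure bound and its inverse, the number of clean missions needed to demonstrate a target, can be sketched in Python. The function names are illustrative. The sketch assumes each unit completes one full mission under the tested conditions.

```python
import math


def zero_failure_lower_bound(n_units: int, alpha: float = 0.05) -> float:
    """One-sided lower reliability bound after n_units missions with no failures."""
    return alpha ** (1.0 / n_units)


def units_for_zero_failure_demo(target_reliability: float, alpha: float = 0.05) -> int:
    """Smallest zero-failure sample size whose lower bound reaches the target."""
    return math.ceil(math.log(alpha) / math.log(target_reliability))


r_lower = zero_failure_lower_bound(32)
n_for_99 = units_for_zero_failure_demo(0.99)
print(f"R_L with 32 units = {r_lower:.3f}")
print(f"units needed to show 0.99 at 95% confidence = {n_for_99}")
```

The second function shows why 32 clean missions fall short of a 99\% claim: the required zero-failure sample is roughly an order of magnitude larger.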

Exercise 12: Proof-Test Coverage

A latent protective function has:

N=28

identified failure cases. The proof-test procedure covers:

N_c=24

cases. The release rule requires 90\% coverage. Check release.

Solution

Coverage is:

C=\dfrac{N_c}{N}=\dfrac{24}{28}=0.857

Since:

85.7\%<90\%

the proof-test package fails.

Engineering Comment

Coverage should be mapped to failure modes and requirements. Missing high-consequence cases cannot be excused by many low-value tests.

Plausibility Check

Four of twenty-eight cases are uncovered, or one seventh, so coverage near 86\% is expected.
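
Coverage is easier to audit when it is computed from a mapping of failure cases rather than a raw count. The Python sketch below uses invented case identifiers (`FC-01` to `FC-28`) and assumes the first 24 are covered. The exercise does not say which four cases are missing.

```python
def proof_test_coverage(failure_cases, covered_cases):
    """Return (coverage fraction, sorted list of uncovered failure cases)."""
    cases = set(failure_cases)
    if not cases:
        raise ValueError("no failure cases defined")
    covered = cases & set(covered_cases)  # ignore tests that map to no known case
    uncovered = sorted(cases - covered)
    return len(covered) / len(cases), uncovered


all_cases = [f"FC-{i:02d}" for i in range(1, 29)]  # 28 hypothetical identifiers
tested = all_cases[:24]                            # assumed covered subset
coverage, gaps = proof_test_coverage(all_cases, tested)
release_ok = coverage >= 0.90
print(f"coverage = {coverage:.1%}, release = {release_ok}, gaps = {gaps}")
```

Returning the uncovered cases alongside the fraction keeps attention on which latent failures remain untested, not only on the percentage.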

Exercise 13: Proof-Test Interval and Average Exposure

A latent dangerous failure rate is:

\lambda_D=1.2\times10^{-5}\ \text{h}^{-1}

The proof-test interval is:

T=720\ \text{h}

Use the simplified low-demand approximation:

PFD_{avg}\approx\dfrac{\lambda_D T}{2}

Estimate average probability of failure on demand.

Solution

Substitute:

PFD_{avg}=\dfrac{(1.2\times10^{-5})(720)}{2}=0.00432

Engineering Comment

This simplified formula assumes failures are latent, demand is rare and proof tests restore the function. Poor test coverage or repair delay would increase risk.

Plausibility Check

The product \lambda_D T is below one percent, and half of it is about 0.4\%.
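
The simplified formula can be compared in Python with the interval average of 1-e^{-\lambda_D t}, which is the unavailability of an exponential latent failure when the proof test is perfect. Both function names are assumptions for this sketch.

```python
import math


def pfd_avg_simplified(lambda_d: float, interval_h: float) -> float:
    """Low-demand screening approximation: lambda_D * T / 2."""
    return lambda_d * interval_h / 2.0


def pfd_avg_exponential(lambda_d: float, interval_h: float) -> float:
    """Average of 1 - exp(-lambda_D * t) over one proof-test interval."""
    x = lambda_d * interval_h
    # 1 - (1 - e^{-x}) / x, written with expm1 for accuracy at small x
    return 1.0 + math.expm1(-x) / x


simplified = pfd_avg_simplified(1.2e-5, 720)
exact = pfd_avg_exponential(1.2e-5, 720)
print(f"simplified = {simplified:.5f}, exponential average = {exact:.5f}")
```

For a small product \lambda_D T the two values agree closely and the simplified form is slightly conservative. The gap grows as the proof-test interval lengthens.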

Exercise 14: Diagnostic Coverage Residual Rate

A diagnostic monitors a failure mode with raw dangerous failure rate:

\lambda_D=4.0\times10^{-5}\ \text{h}^{-1}

Diagnostic coverage is:

DC=0.85

Estimate the undetected residual rate.

Solution

Residual dangerous undetected rate is:

\lambda_{DU}=\lambda_D(1-DC)

Thus:

\lambda_{DU}=4.0\times10^{-5}(1-0.85)=6.0\times10^{-6}\ \text{h}^{-1}

Engineering Comment

Coverage claims should be supported by fault-injection, proof-test or field evidence. A diagnostic that detects only easy failures may not control the dominant risk.

Plausibility Check

15\% of the raw rate remains, so the residual rate should be much smaller than the original.
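
A brief Python sketch of the residual-rate calculation. The second call uses a hypothetical, weaker coverage claim of 0.60 to show how sensitive the residual rate is to the diagnostic coverage figure.

```python
def undetected_rate(lambda_d: float, diagnostic_coverage: float) -> float:
    """Dangerous undetected failure rate left after diagnostics."""
    if not 0.0 <= diagnostic_coverage <= 1.0:
        raise ValueError("diagnostic coverage must lie between 0 and 1")
    return lambda_d * (1.0 - diagnostic_coverage)


lam_du = undetected_rate(4.0e-5, 0.85)         # Exercise 14 inputs
lam_du_low_dc = undetected_rate(4.0e-5, 0.60)  # hypothetical weaker coverage claim
print(f"DC 0.85: {lam_du:.1e} per hour, DC 0.60: {lam_du_low_dc:.1e} per hour")
```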

Exercise 15: Beta-Factor Common-Cause Limit

Two redundant channels each have a total dangerous failure probability:

q=0.025

A beta-factor estimate assigns:

\beta=0.12

as the common-cause fraction. Estimate the common-cause failure probability contribution:

q_{cc}=\beta q

Solution

Substitute:

q_{cc}=0.12(0.025)=0.0030

Engineering Comment

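Common-cause risk often dominates the theoretical benefit of redundancy. Here even the naive independent estimate for both channels failing, q^2=6.25\times10^{-4}, is about five times smaller than the common-cause contribution, so the common-cause term sets the floor. Separation, diversity, independent power and independent testing are engineering controls, not algebraic assumptions.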

Plausibility Check

Twelve percent of 2.5\% is 0.3\%, so the result is plausible.
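
The limit that common cause places on a redundant pair can be sketched in Python. This uses the common simple beta-factor approximation for a one-out-of-two pair, Q \approx ((1-\beta)q)^2+\beta q, which goes one step beyond what the exercise computes. The function name is illustrative.

```python
def beta_factor_1oo2(q: float, beta: float):
    """Return (common-cause part, approximate 1oo2 failure probability)."""
    q_cc = beta * q            # failures shared by both channels
    q_ind = (1.0 - beta) * q   # remaining independent part per channel
    return q_cc, q_ind**2 + q_cc


q_cc, q_pair = beta_factor_1oo2(q=0.025, beta=0.12)
q_naive = 0.025**2  # both-fail probability if the channels were fully independent
print(f"q_cc = {q_cc:.4f}, pair with common cause = {q_pair:.6f}, naive = {q_naive:.6f}")
```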

Exercise 16: Standby Switch Reliability

A standby unit has mission reliability:

R_s=0.96

The automatic transfer switch has reliability:

R_{sw}=0.97

Estimate the effective standby success probability.

Solution

Both the standby unit and switch must work:

R_{eff}=R_sR_{sw}

Therefore:

R_{eff}=0.96(0.97)=0.9312

Engineering Comment

Standby redundancy needs switching and detection evidence. A healthy standby asset does not help if transfer logic fails on demand.

Plausibility Check

Multiplying two values below one should reduce the result below both inputs, so 93.1\% is reasonable.

Exercise 17: Reliability Allocation Margin

A system target is:

R_{sys}\geq0.94

Three required functions have allocated reliabilities:

R_1=0.985,\quad R_2=0.975,\quad R_3=0.980

Check the allocation margin.

Solution

For series-required functions:

R_{calc}=R_1R_2R_3

So:

R_{calc}=0.985(0.975)(0.980)=0.941

Margin is:

M=0.941-0.94=0.001

Engineering Comment

The allocation barely passes. Small modelling errors, common-cause effects or untracked interfaces could consume the margin.

Plausibility Check

Three values near 98\% multiply to a value near 94\%, so the tight pass is credible.
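
A Python sketch of the margin check, with one hypothetical sensitivity case in which the second function falls 0.002 short of its allocation. The function name and the degraded value are assumptions for illustration.

```python
import math


def allocation_margin(allocations, target: float):
    """Return (series reliability, margin against the system target)."""
    r_calc = math.prod(allocations)
    return r_calc, r_calc - target


r_calc, margin = allocation_margin([0.985, 0.975, 0.980], target=0.94)
# Hypothetical shortfall: the second function delivers 0.973 instead of 0.975.
r_degraded, margin_degraded = allocation_margin([0.985, 0.973, 0.980], target=0.94)
print(f"allocated: R = {r_calc:.4f}, margin = {margin:+.4f}")
print(f"degraded:  R = {r_degraded:.4f}, margin = {margin_degraded:+.4f}")
```

A shortfall of this size on a single function is enough to turn the margin negative, which illustrates how thin the allocation is.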

Exercise 18: Reliability Release Gate

A release package has these results:

| Gate | Requirement | Current result |
|---|---|---|
| mission reliability lower bound | at least 0.90 | 0.911 |
| inherent availability | at least 99.5\% | 99.78\% |
| proof-test coverage | at least 90\% | 85.7\% |
| common-cause action closure | 100\% | 100\% |

Decide whether to release.

Solution

Check each gate:

0.911\geq0.90\quad \text{pass}
99.78\%\geq99.5\%\quad \text{pass}
85.7\%<90\%\quad \text{fail}
100\%\geq100\%\quad \text{pass}

The package is not releasable because proof-test coverage fails.

Engineering Comment

Reliability release should not average unrelated gates. A latent failure test gap can block release even when mission reliability and availability look acceptable.

Plausibility Check

One hard gate fails. The correct decision is hold, restrict or add proof-test evidence.
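
The gate logic can be expressed as a small Python check in which every gate must pass on its own and nothing is averaged. The tuple layout and the function name `evaluate_release` are assumptions for this sketch. All values are written as fractions.

```python
# (gate name, current result, minimum requirement), all as fractions
gates = [
    ("mission reliability lower bound", 0.911, 0.90),
    ("inherent availability", 0.9978, 0.995),
    ("proof-test coverage", 0.857, 0.90),
    ("common-cause action closure", 1.00, 1.00),
]


def evaluate_release(gate_list):
    """Release only if every hard gate meets its own requirement."""
    failed = [name for name, result, requirement in gate_list if result < requirement]
    return len(failed) == 0, failed


releasable, failed_gates = evaluate_release(gates)
print(f"releasable = {releasable}, failed gates = {failed_gates}")
```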

Validation Package Checklist

A strong reliability, availability and proof-test solution should check:

  • whether exposure basis and failure definition are explicit;
  • whether MTBF and failure rate use comparable operating data;
  • whether mission reliability is separated from availability;
  • whether series and parallel formulas match the real functional architecture;
  • whether redundancy assumptions include common-cause and switching failures;
  • whether Weibull parameters come from comparable life data;
  • whether zero-failure evidence is stated as a confidence bound;
  • whether proof tests cover latent failure modes, not only procedure steps;
  • whether all failed hard gates are resolved before release.

Common Release Mistakes

Common mistakes include treating MTBF as a guaranteed lifetime, mixing exposure bases, quoting availability while excluding logistics downtime, applying independent-channel formulas when channels share support systems, assuming standby redundancy without switch evidence, using Weibull parameters outside the fitted regime, treating zero failures as proof of perfect reliability, counting proof-test steps instead of covered failure modes, and releasing by averaging gates instead of fixing the failed gate.
