Reproducibility

Engineering definition of reproducibility covering between-condition precision, operator and fixture effects, variance components, Gage R&R use and validation evidence.

Definition

Reproducibility is the closeness of agreement among measurement results when defined conditions such as operator, instrument, site, fixture, day or method are allowed to vary.

In engineering measurement, reproducibility captures between-condition variation. It is broader than repeatability, which holds conditions as constant as practical. Reproducibility is central to Gage R&R studies, laboratory comparisons, field measurement validation, biomedical imaging consistency, quality release and uncertainty budgets.

The conditions that vary may include operator, instrument, fixture, laboratory, site, day, software version, part setup, patient positioning or test method. A reproducibility statement is only meaningful when it names which of these conditions were actually varied.

Repeatability asks, “Do I get the same result when I repeat the measurement under the same conditions?” Reproducibility asks, “Do I still get compatible results when realistic conditions change?”

Engineering Meaning

For condition (j), such as operator or site, the condition mean is:

\displaystyle \bar{x}_j=\frac{1}{n_j}\sum_{i=1}^{n_j}x_{ij}

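For balanced data, where every condition has the same number of readings, the overall mean is the mean of the (m) condition means: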

\displaystyle \bar{x}=\frac{1}{m}\sum_{j=1}^{m}\bar{x}_j

A simple between-condition standard deviation for balanced data is:

\displaystyle s_b=\sqrt{\frac{\sum_{j=1}^{m}(\bar{x}_j-\bar{x})^2}{m-1}}

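where (m) is the number of conditions, (n_j) is the number of readings under condition (j), and (s_b) estimates how much the condition means move relative to each other. Because each condition mean is itself an average of noisy readings, (s_b) also carries a share of repeatability noise, which shrinks as the number of readings per condition grows.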
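The three formulas above can be sketched in a few lines of Python. This is a minimal illustration for balanced data only; the function name `between_condition_sd` and the readings are hypothetical and are not taken from any study.

```python
import math
from statistics import mean


def between_condition_sd(readings_by_condition):
    """Return (condition_means, overall_mean, s_b) for balanced data.

    readings_by_condition maps a condition label (operator, site, ...)
    to the list of readings taken under that condition.
    """
    condition_means = {c: mean(v) for c, v in readings_by_condition.items()}
    m = len(condition_means)
    if m < 2:
        raise ValueError("need at least two conditions to estimate s_b")
    # Overall mean taken as the mean of the condition means (balanced case).
    overall = mean(list(condition_means.values()))
    ss = sum((xb - overall) ** 2 for xb in condition_means.values())
    s_b = math.sqrt(ss / (m - 1))
    return condition_means, overall, s_b


# Hypothetical readings in mm: three operators, three repeats each.
readings = {
    "A": [10.01, 10.02, 10.03],
    "B": [10.04, 10.05, 10.06],
    "C": [10.00, 10.01, 10.02],
}
means, overall, s_b = between_condition_sd(readings)
print(f"overall mean = {overall:.4f} mm, s_b = {s_b:.4f} mm")
```

For unbalanced data the mean of condition means and the mean of all readings differ, so a variance-components method would be a better fit than this screen.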

Study Design

Reproducibility is only as strong as the study design. The varied condition must match the risk being tested. If the risk is operator technique, the study must include multiple operators. If the risk is site-to-site transfer, it must include sites or laboratories. If the risk is fixture rebuilding, the part should be removed, refixtured and measured again rather than left in place.

Randomization matters because time drift can masquerade as operator difference. A good study avoids measuring all parts with one operator first and another operator later unless time order is part of the intended comparison.
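One way to keep time drift from lining up with operator is to randomize the run order before the study starts. The sketch below builds a fully randomized plan for a crossed operator-by-part study; the function name, labels and seed are assumptions for illustration, and a real study may prefer to randomize within blocks if full randomization is impractical.

```python
import random
from itertools import product


def randomized_run_order(operators, parts, repeats, seed=None):
    """Return every operator/part/repeat combination in random order."""
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    runs = [
        {"operator": op, "part": part, "repeat": rep}
        for op, part, rep in product(operators, parts, range(1, repeats + 1))
    ]
    rng.shuffle(runs)
    return runs


# Hypothetical study: 3 operators, 4 parts, 2 repeats each = 24 runs.
plan = randomized_run_order(["A", "B", "C"], ["P1", "P2", "P3", "P4"],
                            repeats=2, seed=42)
for run_number, run in enumerate(plan[:5], start=1):
    print(run_number, run)
```

Recording the run number alongside each reading also makes it possible to check afterwards whether a drift over time is present.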

Worked Operator Example

Three operators measure the same bore using the same gage and method. Their average readings are:

25.010,\ 25.016,\ 25.004\ \text{mm}

The overall mean is:

\bar{x}=25.010\ \text{mm}

The between-operator standard deviation is:

s_b=0.0060\ \text{mm}

If short-term repeatability is (s_r=0.0030\ \text{mm}), a simple combined reproducibility screen is:

s_R=\sqrt{s_r^2+s_b^2}

so:

s_R=\sqrt{0.0030^2+0.0060^2}=0.0067\ \text{mm}
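The arithmetic in this example can be checked with a short script. It uses only the operator averages and the repeatability value quoted above; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

operator_means_mm = [25.010, 25.016, 25.004]  # averages from the example
s_r_mm = 0.0030                               # short-term repeatability

overall_mm = mean(operator_means_mm)
s_b_mm = stdev(operator_means_mm)       # sample SD, divisor m - 1
s_R_mm = math.hypot(s_r_mm, s_b_mm)     # sqrt(s_r^2 + s_b^2)

print(f"{overall_mm:.3f} {s_b_mm:.4f} {s_R_mm:.4f}")
# prints: 25.010 0.0060 0.0067
```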

Percent of Tolerance

For a tolerance width (T), the between-condition measurement spread can be screened as:

\displaystyle \%T_R=\frac{6s_R}{T}\times 100\%

If (T=0.100\ \text{mm}):

\displaystyle \%T_R=\frac{6(0.0067)}{0.100}\times 100\%=40.2\%

This is high for many production-release contexts. The result suggests that operator, fixture or method differences may influence pass/fail decisions.
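The percent-of-tolerance screen is a one-line calculation. The helper below is a sketch with a hypothetical name; the coverage multiplier of 6 follows the formula in this section and is exposed as a parameter only because some procedures use a different multiplier.

```python
def percent_of_tolerance(s_R, tolerance_width, k=6.0):
    """Return the k-sigma measurement spread as a percentage of tolerance width."""
    if tolerance_width <= 0:
        raise ValueError("tolerance width must be positive")
    return 100.0 * k * s_R / tolerance_width


pct = percent_of_tolerance(s_R=0.0067, tolerance_width=0.100)
print(f"{pct:.1f}%")  # prints: 40.2%
```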

Relation to Repeatability

Repeatability is usually a component of reproducibility, not a substitute for it. A measurement setup can repeat well for one operator and still reproduce poorly across operators, shifts, sites or installation states.

This is why a bench repeatability result should not be used alone to release a field measurement method, clinical measurement workflow or production inspection process.

Relation to Gage R&R

Gage R&R combines repeatability and reproducibility to estimate measurement-system variation. In a manufacturing study, reproducibility often captures operator-to-operator or fixture-to-fixture differences. In laboratory and field work, it may capture site, day, sample preparation, software version or method differences.

The important point is the study design. If the factor is not varied, the study cannot estimate its reproducibility effect.
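For a balanced crossed study in which every operator measures every part several times, the variance components are commonly estimated from a two-way ANOVA with interaction. The sketch below follows that standard textbook approach rather than any specific procedure from this entry. It treats reproducibility as the operator component plus the operator-by-part interaction, truncates negative estimates to zero and does not pool a non-significant interaction. The function name and data are hypothetical.

```python
from statistics import mean


def gage_rr_components(data):
    """Variance components for a balanced crossed Gage R&R study.

    data[i][j] is the list of repeat readings for part i and operator j.
    Requires at least 2 parts, 2 operators and 2 repeats per cell.
    """
    p, o, r = len(data), len(data[0]), len(data[0][0])
    grand = mean([x for row in data for cell in row for x in cell])
    part_means = [mean([x for cell in row for x in cell]) for row in data]
    oper_means = [mean([x for row in data for x in row[j]]) for j in range(o)]
    cell_means = [[mean(cell) for cell in row] for row in data]

    ss_part = o * r * sum((pm - grand) ** 2 for pm in part_means)
    ss_oper = p * r * sum((om - grand) ** 2 for om in oper_means)
    ss_inter = r * sum(
        (cell_means[i][j] - part_means[i] - oper_means[j] + grand) ** 2
        for i in range(p) for j in range(o)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(p) for j in range(o) for x in data[i][j]
    )

    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    var_repeat = ms_error
    var_inter = max(0.0, (ms_inter - ms_error) / r)
    var_oper = max(0.0, (ms_oper - ms_inter) / (p * r))
    var_part = max(0.0, (ms_part - ms_inter) / (o * r))
    return {
        "repeatability": var_repeat,
        "reproducibility": var_oper + var_inter,
        "part": var_part,
    }


# Hypothetical data in mm: 2 parts x 2 operators x 2 repeats.
study = [
    [[10.0, 10.2], [10.4, 10.6]],  # part 1: operator A, operator B
    [[11.0, 11.2], [11.4, 11.6]],  # part 2: operator A, operator B
]
components = gage_rr_components(study)
print(components)
```

A study this small is only for checking the arithmetic by hand. Note that an operator factor appears in the result only because two operators were included, which is the point made above: a factor that is not varied cannot be estimated.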

Intermediate Precision

Some engineering records use intermediate precision for variation within one organization over different days, operators, instruments or setups. It is narrower than full interlaboratory reproducibility but broader than pure repeatability.

That distinction should be stated in validation records. A method may be reproducible inside one plant but not across suppliers. A clinical or field method may reproduce across trained staff at one site but fail after software, protocol or fixture changes elsewhere.

Evidence for Validation

Useful reproducibility evidence states which conditions varied, how many repeats were made under each condition, whether specimens were randomized, how operators were trained, whether fixtures were reset, what software version was used and whether environmental conditions were representative.
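One possible way to keep this evidence together is a small structured record with a completeness check. The sketch below is purely illustrative: the class name, field names and the checks are assumptions, not a prescribed validation format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReproducibilityEvidence:
    """Hypothetical record of what a reproducibility study actually covered."""
    conditions_varied: list[str] = field(default_factory=list)
    repeats_per_condition: int = 0
    specimens_randomized: bool = False
    operator_training: str = ""
    fixtures_reset: bool = False
    software_version: str = ""
    environment_representative: bool = False

    def gaps(self) -> list[str]:
        """Return the evidence items that are missing or not satisfied."""
        found = []
        if not self.conditions_varied:
            found.append("no varied conditions recorded")
        if self.repeats_per_condition < 2:
            found.append("fewer than two repeats per condition")
        if not self.specimens_randomized:
            found.append("specimen order not randomized")
        if not self.operator_training:
            found.append("operator training not described")
        if not self.fixtures_reset:
            found.append("fixtures not reset between measurements")
        if not self.software_version:
            found.append("software version not recorded")
        if not self.environment_representative:
            found.append("environment not confirmed as representative")
        return found


record = ReproducibilityEvidence(
    conditions_varied=["operator", "day"],
    repeats_per_condition=3,
    specimens_randomized=True,
    operator_training="trained to the written work instruction",
    fixtures_reset=False,
    software_version="",
    environment_representative=True,
)
print(record.gaps())
```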

For biomedical imaging, reproducibility may depend on patient positioning, scanner protocol, reconstruction kernel, segmentation method and reader. For wind-tunnel or field measurements, it may depend on setup rebuild, model installation, reference instrumentation and data-reduction settings.

Limits and Common Mistakes

Common mistakes include calling a single-operator repeatability test reproducibility, pooling data without identifying condition effects, changing several factors at once without recording them, and using reproducibility from an easy artifact to justify a harder real measurement.

Another mistake is ignoring reproducibility because the average bias is small. If different operators produce opposite errors, the mean can look acceptable while individual decisions remain unreliable. A strong reproducibility statement says which conditions were varied, how much variation they introduced and how that variation affects the uncertainty budget or release rule.
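The opposite-error case is easy to show with numbers. In this sketch all values are hypothetical: two operators have equal and opposite biases, so the average bias is zero, yet they reach different decisions on the same part near the upper limit.

```python
from statistics import mean

upper_limit_mm = 25.050
true_value_mm = 25.045                       # part is genuinely in tolerance
operator_bias_mm = {"A": +0.008, "B": -0.008}

mean_bias_mm = mean(list(operator_bias_mm.values()))
decisions = {
    op: "accept" if true_value_mm + bias <= upper_limit_mm else "reject"
    for op, bias in operator_bias_mm.items()
}

print(f"mean bias = {mean_bias_mm:.3f} mm")
print(decisions)
```

Operator A rejects a good part while operator B accepts it, even though the pooled bias looks negligible.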
