KEYWORDS
• Six Sigma • Quality management system • Process capability • SQC • Analytical run • Frequency of QC
KEY POINTS
• The traditional error model provides the basis for development of a scientific quality management system (QMS) that adheres to the Deming Plan-Do-Check-Act process for objective decision making based on data.
• Incorporation of Six Sigma concepts and metrics provides a QMS that supports analytical quality management with tools for specifying quality goals, judging the acceptability of performance of examination procedures, designing statistical quality control (SQC) procedures to detect medically important errors, and evaluating quality from external quality assessment and proficiency testing surveys.
• Design of risk-based SQC procedures is practical using the traditional criterion of achieving 90% detection of the critical medically important systematic errors (Pedc, the probability of detection of medically important systematic errors), along with the documented relationship between Pedc and the Parvin measure, the maximum expected number of final unreliable test results, which can be understood as the maximum number of unreliable final test results that might be reported in an analytical run.
• The number of patient samples in an analytical run bracketed by quality control (QC) events, or the frequency of QC, can be optimized from information on Pedc to minimize the risk of reporting erroneous test results.
• Practical tools for daily quality management are the strength of the error model and show why the uncertainty model has yet to be widely accepted in medical laboratories.
a Department of Pathology and Laboratory Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI 53705, USA; b Westgard QC, Inc, Madison, WI 53717, USA
* Corresponding author. Department of Pathology and Laboratory Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI 53705.
E-mail address: james@westgard.com
INTRODUCTION
Metrology challenges some of the traditional concepts and practices that have been
developed for quality management in medical laboratories, as discussed in some of
the earlier articles in this issue (See Theodorsson’s article, “Uncertainty in
Measurement and Total Error – Tools for Coping with Diagnostic Uncertainty”; and
James O. Westgard and Sten A. Westgard’s article, “Measuring Analytical Quality:
Total Analytical Error vs Measurement Uncertainty”, in this issue) as well as other recent
articles in the literature.1–6 Nevertheless, medical laboratories now depend on practices that have evolved from the initial establishment of statistical quality control (SQC) and the evolution to Total Quality Management and Six Sigma quality management. The theoretic rigor of metrology and the practical needs in medical laboratories must be reconciled, and this will likely require that the uncertainty model of metrology and the error model, now considered traditional practice, coexist for the foreseeable future.
First and foremost, the concept of total analytical error (TAE) and the related quality goal
in the form of allowable total error (ATE) seem to be contentious issues, even though they
are well-established concepts with more than 40 years of widespread application in
medical laboratories.7,8 The practice of making a single measurement to report a test
result is almost unique to medical laboratories, as opposed to metrology laboratories.
That practice means that any test result may be in error because of both random error (imprecision, SD [standard deviation], or CV [coefficient of variation]) and systematic error (trueness, bias) and that a measure of accuracy, such as TAE, is necessary.2 Likewise,
quality goals in the form of ATE are needed for proficiency testing (PT) and external quality
assessment (EQA) programs in which only a single measurement is generally performed
on proficiency samples. Such ATE goals are also useful for validating method performance, selecting SQC procedures, prioritizing controls for risk-based quality control (QC) plans, and measuring and monitoring the quality achieved over time and distance.
Many tools and techniques are available to support the traditional quality management practices, as part of a Six Sigma quality management system (QMS),9,10
whereas metrology tools and techniques are often more suitable for manufacturers
of medical devices. Metrology emphasizes the use of reference methods and reference materials to provide a traceability chain that should provide comparability of
test results (See Armbruster’s article, “Metrological Traceability of Assays and
Comparability of Patient Test Results”, in this issue). The measure of quality of the
traceability chain is the associated measurement uncertainty (MU), which is estimated
from the components or sources of variation in the process. A bottom-up methodology is appropriate for use by manufacturers, but is too complicated for most applications in medical laboratories. A top-down methodology using intermediate-term precision data from routine SQC is recommended for medical laboratories by ISO (International Organization for Standardization) 15189.11
For quality management in medical laboratories, a Six Sigma QMS is recommended that follows the Deming Plan-Do-Check-Act cycle to implement a scientific management process, as shown in Fig. 1 (plan, steps 1–2; do, steps 3–4; check, steps 5–9; act, steps 10–12):
Plan: quality goals are the starting point in step 1 and ATE is the most common
and useful format. The selection of an analytical examination procedure in step 2
should consider traceability and harmonization, along with the reference
Six Sigma QMS and Design of Risk-based SQC 3
Fig. 1. Six Sigma QMS. CAPA, corrective and preventive action; CQI, continuous quality
improvement; EQA, external quality assessment; FRACAS, failure reporting and corrective
action system; MU, measurement uncertainty; PT, proficiency testing.
A sigma-metric is calculated to compare the quality required for intended use to the performance observed for the examination procedure:

Sigma-metric = (ATE − Biasobs)/SDobs

where ATE describes the tolerance limits and Biasobs and SDobs represent the observed trueness and imprecision of the examination procedure. Concentration units are preferred when a single reference material is analyzed following a protocol to estimate precision by the SD and bias by comparison of the observed mean with an assigned reference value. Alternatively, all terms may be in percentages:

Sigma-metric = (%ATE − %Biasobs)/%CVobs

Percentage figures are more reliable over a range of concentrations when the estimate of precision comes from a replication experiment or SQC data and the estimate of bias comes from a comparison of methods experiment or PT/EQA samples.
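A minimal sketch of this calculation in Python (the function name and the example numbers are ours, not the article's):

```python
def sigma_metric(ate: float, bias_obs: float, sd_obs: float) -> float:
    """Sigma-metric = (ATE - |Bias_obs|) / SD_obs; works identically
    whether all terms are in concentration units or in percentages."""
    return (ate - abs(bias_obs)) / sd_obs

# Example in percentage terms: ATE of 10%, observed bias 2%, observed CV 2%
print(sigma_metric(10.0, 2.0, 2.0))  # 4.0
```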
This sigma-metric is a process capability index whose use is well established for industrial process management.12–14 For example, the industrial process capability index Cpk is related to the sigma-metric as follows:

Cpk = (tolerance specification − off-centerness)/(3 × variation)

If the tolerance specification is ATE, off-centerness is bias, and variation is SD, then:

Cpk = (ATE − Biasobs)/(3 × SDobs) = Sigma-metric/3

For industrial production, the minimum Cpk that is acceptable is 1.0, which corresponds with 3 sigma. The ideal or desirable Cpk is 2.0, which corresponds with 6 sigma.
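The conversion between the two indices is a one-liner; a small sketch (our naming):

```python
def cpk_from_sigma(sigma: float) -> float:
    """Cpk = Sigma-metric / 3 when tolerance = ATE and off-centerness = bias."""
    return sigma / 3.0

assert cpk_from_sigma(3.0) == 1.0  # minimum acceptable for industrial production
assert cpk_from_sigma(6.0) == 2.0  # ideal or desirable
print("Cpk for a 4-sigma process:", cpk_from_sigma(4.0))
```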
Given that process capability indices have been used for many years in production
operations (long before the introduction of Six Sigma concepts), the use of a sigma-
metric as a process capability index provides an established approach for managing
the production of laboratory testing processes. This metric corresponds with the long-
term sigma in the Motorola Six Sigma methodology,15 which was introduced in the
1990s, rather than the short-term sigma, which accommodates a systematic shift or
bias equivalent to 1.5*SD. The long-term sigma accounts exactly for the bias observed for the process and thus is the preferred measure for an analytical examination process.
The origin of the sigma-metric as a process capability index shows that a quality requirement in the form of an ATE corresponds with the tolerance limits used in industrial production and that the off-centerness, or bias, of the production process is an important consideration for managing quality, along with the variation or precision of the process. Industrial tolerance limits apply to the individual products being produced, just as ATE applies to the individual test results being produced in a medical laboratory.
Given that the sigma-metric is such a critical measure of quality, there are some concerns about the best way to make the calculation. The established practice is to calculate sigma for each of the important medical decision concentrations (for example, 6.5% hemoglobin [Hb] for use of HbA1c for diagnosis and 7.0% Hb as the treatment goal) and then select the most demanding sigma value or average the sigmas to guide the SQC design.
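As a sketch of that practice in Python (the ATE, bias, and CV values below are hypothetical placeholders, not figures from the article):

```python
def sigma_metric(ate_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Sigma-metric in percentage terms: (%ATE - |%Bias|) / %CV."""
    return (ate_pct - abs(bias_pct)) / cv_pct

# Hypothetical HbA1c performance at the two medical decision concentrations
sigmas = {
    "6.5% Hb (diagnosis)": sigma_metric(6.0, 1.0, 1.5),
    "7.0% Hb (treatment goal)": sigma_metric(6.0, 0.5, 1.2),
}
# Guide the SQC design by the most demanding (lowest) sigma
worst = min(sigmas.values())
print(f"worst-case sigma = {worst:.2f}")
```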
An alternative form of calculation has been proposed in the form of a patient-weighted sigma.16 The argument here is that a patient-weighted sigma better represents the best single value for the observed patient population, rather than using sigma-metrics that are calculated for critical medical decision concentrations. The patient-weighted sigma is a much more complicated calculation and requires knowledge of the expected patient measurand distribution, the precision and bias profiles for the examination procedure, as well as the quality requirements throughout the observed patient range. The calculation is sufficiently complex and unique that it has been patented17 and cannot be readily applied without proprietary software.
The application of patient-weighted sigmas was shown in a study of HbA1c methods to assess the risk and reliability of 6 different examination procedures commonly used in central laboratory testing.18 Data for high and low controls (at 5% Hb and 10% Hb) were also provided, along with regression statistics for comparison with the NGSP (National Glycohemoglobin Standardization Program) reference method, thus permitting calculation of point estimates of sigma for HbA1c at the medical decision concentration of 6.5% Hb. Comparison of the point estimates and the patient-weighted sigmas shows small differences for 5 of the 6 methods studied (2.48 point estimate vs 2.29 patient-weighted sigma; 1.54 vs 1.43; 3.97 vs 3.90; 0.49 vs 0.36; 2.07 vs 2.36). For 1 method, the patient-weighted sigma was 1.57 versus the point estimate of 2.80; for this method, the bias for normal samples (low concentrations) was very high and likely reduced the patient-weighted sigma because of the preponderance of normal samples in a typical patient distribution. Thus, it is not clear that this more complicated patient-weighted sigma provides any practical improvement compared with the simple point estimates at medically important decision concentrations.
6 Westgard & Westgard
The subtraction of bias may render sigmas so low that the chemist/technologist will attempt to use multiple control rules to manage assay quality, which will invariably increase the probability of false rejections. It is our belief that there is little that a laboratory system can do with bias once inter-instrument variation is minimized by using similar analytic systems and occasionally employing a conversion factor to make a test on one analyzer resemble the same test on a different analyzer.20
This practice assumes that reference ranges and critical decision levels have been developed internally and neglects the trend toward evidence-based medical treatment guidelines (eg, HbA1c) that are applied globally or nationally across laboratories and across examination procedures. Another serious problem is that the decision about the acceptability of an examination procedure must consider bias. When bias is large and sigma is low, the preferred decision should be to improve method performance or select another method, rather than to assume that those biases are not important as long as daily operation is stable.
The authors have long advocated the use of the sigma-metric as a predictor of risk. Sigma is inherently risk based and can readily be converted to the expected defect rates or expressed in terms of defects per million. It is therefore very useful for evaluating the performance of an examination procedure as well as for selecting/designing SQC procedures to detect medically important errors. To select appropriate SQC procedures, the rejection characteristics of different control rules and different numbers of control measurements can be described by power curves, or power function graphs.21 Given a specification for ATE, the size of the medically important systematic error (ΔSEcrit) can be calculated as follows22:

ΔSEcrit = [(ATE − Biasobs)/SDobs] − 1.65

where 1.65 is the z-value for a 1-sided confidence limit, which allows a 5% risk of a medically important error.
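A sketch of these two risk calculations (the defect-rate conversion uses the long-term, one-sided normal tail, consistent with the article's preference for long-term sigma; the function names and example values are ours):

```python
from statistics import NormalDist

def critical_se(ate: float, bias_obs: float, sd_obs: float) -> float:
    """dSE_crit = (ATE - Bias_obs)/SD_obs - 1.65; the first term is the
    sigma-metric, and 1.65 allows a 5% one-sided risk."""
    return (ate - bias_obs) / sd_obs - 1.65

def defects_per_million(sigma: float) -> float:
    """Long-term defect rate implied by the sigma-metric (dominant tail)."""
    return (1.0 - NormalDist().cdf(sigma)) * 1e6

# A 4-sigma process, eg, ATE 10%, observed bias 2%, observed CV 2%
print(round(critical_se(10.0, 2.0, 2.0), 2))   # 2.35
print(round(defects_per_million(4.0)))         # ~32 DPM
```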
To select SQC procedures, the calculated critical errors can be drawn over power curves.23 With the introduction of Six Sigma concepts, the expression (ATE − Biasobs)/SDobs can be replaced by the sigma-metric, thereby simplifying the calculation:

ΔSEcrit = Sigma-metric − 1.65

This relationship allows power function graphs to be rescaled in terms of the sigma-metric and makes it quicker and easier to select appropriate control rules and numbers of control measurements. An example is shown in Fig. 2 for a 4-sigma testing process.
For goals in selecting an appropriate SQC procedure, the authors recommend achieving a probability of error detection (Ped) of 0.90, or a 90% chance, for the critical SE, while maintaining the probability of false rejection (Pfr) as low as possible. In the example in Fig. 2, that could be accomplished by selection of the 13s/22s/R4s/41s multirule procedure with 4 control measurements per run. As shown in the key, Ped is 0.91, which corresponds with the intersection of the vertical line and the top power curve. Pfr is 0.03, which corresponds with the y-intercept of that power curve.
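The Ped and Pfr values read from Fig. 2 can be sanity-checked by simulation. The sketch below is a simplified reading of the 13s/22s/R4s/41s multirule: all rules are applied within a single run of 4 z-scores at one control level, with no carryover across runs or levels (published power functions count violations across runs and levels, so exact agreement is not expected). The critical shift for a 4-sigma process is taken as 4 − 1.65 = 2.35 SD:

```python
import random

def multirule_reject(z):
    """Apply the 1-3s, 2-2s, R-4s, and 4-1s rules to one run of control
    z-scores (simplified: no carryover across runs or control levels)."""
    if any(abs(v) > 3 for v in z):                        # 1-3s violation
        return True
    for a, b in zip(z, z[1:]):                            # 2-2s: consecutive pair
        if (a > 2 and b > 2) or (a < -2 and b < -2):
            return True
    if max(z) > 2 and min(z) < -2:                        # R-4s: range within run
        return True
    if all(v > 1 for v in z) or all(v < -1 for v in z):   # 4-1s: all same side
        return True
    return False

def rejection_rate(shift, n=4, trials=50_000, seed=1):
    """Fraction of runs rejected when controls are N(shift, 1)."""
    rng = random.Random(seed)
    rejected = sum(
        multirule_reject([rng.gauss(shift, 1) for _ in range(n)])
        for _ in range(trials)
    )
    return rejected / trials

pfr = rejection_rate(0.0)    # stable process: false rejections
ped = rejection_rate(2.35)   # critical systematic error for a 4-sigma process
print(f"Pfr ~ {pfr:.3f}, Ped ~ {ped:.3f}")
```

With this simplified within-run reading, the simulation lands near the Ped of about 0.9 and the low Pfr cited for Fig. 2.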
A limitation of this design approach is the lack of a specification for the size of the analytical run or the frequency of QC. Parvin24 developed a complementary design approach based on the estimation of the number of unreliable, defective, or erroneous test results that might be accepted and reported in an analytical run with an undetected error condition. This parameter has been termed the maximum expected number of unreliable final test results [MaxE(NUF)24; pronounced max-enough]. The approach was described in detail in a previous issue.25 More recently, Yago and Alcover26 showed how to integrate the classic Westgard approach with the Parvin approach to improve the SQC design process. They provide nomograms that show the relationship between sigma, various control rules, and the maximum expected number of erroneous results. They also show that the maximum expected number of erroneous results is related to the Ped for the critical-sized systematic error (Pedc).
Fig. 3 shows this relationship and permits the estimation of the maximum number of unreliable or erroneous results from the probability of error detection, or vice versa:
The plot shows that the relationships between these performance measures are very similar for all rules considered, especially for rules with the same N value, so that one can estimate the value of one from the other with a relatively small margin of error, almost regardless of the QC procedure considered. To obtain an average of less than 1 unacceptable patient result as a consequence of an out-of-control condition, the plot shows that a QC procedure with a Pedc of about 0.8 is needed, assuming that 100 patient samples are analyzed between QC events.26
This last statement suggests a goal of less than or equal to 1 erroneous result per
run, which is also consistent with Parvin’s24 recommendation and examples. This
goal can be used to assess run size in terms of the number of patient samples for
Fig. 3. Relationship between MaxE(NUF) on the y-axis versus Pedc on the x-axis for families of SQC
procedures with Ns from 1 to 3. (From Yago M, Alcover S. Selecting statistical procedures for
quality control planning based on risk management. Clin Chem 2016;62:963. Published May
19, 2016. Reproduced with permission from the American Association for Clinical Chemistry.)
The selection of SQC procedures has been done classically with the objective of having 90% detection of the critical systematic errors, while maintaining false rejections as low as possible. Fig. 3 shows that values for MaxE(NUF) in the range of 0.3 to 0.4 are obtained when simple QC procedures with Pedc = 0.90 are used to control an analytical process. This means that up to about 300 patient samples could be analyzed between QC events for reporting less than 1 erroneous result during an out-of-control condition, which represents a typical daily workload for many tests performed in a small or medium-sized laboratory. Therefore, use of Pedc = 0.90 as a general criterion appears to be quite reasonable for selecting QC procedures that reduce the probability of harming the patient to a level that could be considered acceptable for many tests conducted in a laboratory.
In this example, the size of the analytical run is calculated as 100/MaxE(NUF); for
example, 100/0.3 would be 333 and 100/0.4 would be 250, which Yago and Alcover26
summarize as “about 300 patient samples.”
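That arithmetic can be captured directly (a sketch; the 100-sample baseline comes from the Yago and Alcover assumption of 100 patient samples between QC events):

```python
def run_size(max_enuf: float, baseline: int = 100) -> int:
    """Patient samples per bracketed run, scaled so that no more than ~1
    erroneous result is expected during an undetected error condition."""
    return round(baseline / max_enuf)

print(run_size(0.3))  # 333
print(run_size(0.4))  # 250
```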
As another example, consider the 4-sigma process for which Pedc has been determined in Fig. 2. Use of a 13s/22s/R4s/41s multirule with N = 4 provides a Pedc of 0.91, which means that performance is equivalent to the example given earlier and the size of the analytical run could be 300 patient samples. Remember, this means both the QC events at the beginning and end of the bracket require a multirule procedure with 4 control measurements. If the end-of-bracket QC event is out of control, then the quality of all 300 patient samples must be reassessed and the necessary patient samples repeated. To determine when a problem occurred during the run, it would be helpful to space the second set of 4 controls throughout the run, every 75 patient samples, rather than waiting until the end of the run. That would provide additional information about when the problem began and which patient samples need to be repeated, and might also provide earlier detection.
To see the effect of Pedc on the length of the analytical run for this 4-sigma testing process, consider the use of a 13s/2of32s/R4s/31s multirule with N = 3, which provides a Pedc of 0.84. From Fig. 3, approximately 125 patient samples (100/0.80) would be appropriate to maintain a goal of less than or equal to 1 erroneous result being reported. If a 13s rule with N = 3 were used, Pedc would be 0.54, MaxE(NUF) about 3.0, and the analytical run should be reduced to about 30 to 35 samples for a bracketed continuous production operation. That would mean a total of 6 controls for 30 to 35 samples, which would be expensive but necessary to achieve the goal of less than or equal to 1 erroneous result per run.
Table 1 provides some estimates for the size of an analytical run as a function of Pedc, based on the estimation of MaxE(NUF) from Fig. 3. The number of patient samples is calculated as 100/MaxE(NUF), based on the goal of producing less than or equal to 1 erroneous result in a reported analytical run. These estimates are approximate because they depend on interpolation of the graphical results and assume that the general relationship between Pedc and MaxE(NUF) also holds for multirule QC procedures and Ns as high as 4.
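Since Table 1's numeric rows are not reproduced in this text, the mapping it tabulates can be sketched from the anchor values quoted in the worked examples (Pedc 0.90 → MaxE(NUF) ≈ 0.35, the midpoint of the 0.3–0.4 range; 0.84 → ≈0.80; 0.54 → ≈3.0); other rows would require interpolating Fig. 3:

```python
# (Pedc, MaxE(NUF)) anchor points taken from the worked examples in the text
anchors = {0.90: 0.35, 0.84: 0.80, 0.54: 3.0}

rows = [
    (pedc, max_enuf, round(100 / max_enuf))  # run size for <=1 erroneous result
    for pedc, max_enuf in sorted(anchors.items(), reverse=True)
]
for pedc, max_enuf, n in rows:
    print(f"Pedc {pedc:.2f}  MaxE(NUF) {max_enuf:.2f}  ~{n} samples/run")
```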
Table 1
Relationship between probability of error detection for critical sized systematic error, maximum expected number of final unreliable test results, and quality control frequency, expressed as the number of patient samples in an analytical run bracketed by control events. Values are interpolated from the relationship between probability of error detection for critical sized systematic error and maximum expected number of final unreliable test results, as documented by Yago and Alcover
Data from Yago M, Alcover S. Selecting statistical procedures for quality control planning based on
risk management. Clin Chem 2016;62:959–65.
SUMMARY
The design of risk-based SQC procedures shows the importance and practical value of the error model for quality management in medical laboratories. Metrology strangely ignores the uncertainty caused by the instability of an examination process as well as the uncertainty in the detection of medically important errors by SQC procedures. Selection of an adequate SQC procedure is an important task in developing QC plans that ensure the analytical quality of laboratory test results.
Our design approach, now considered traditional or classic, focuses on achieving 90% detection of a critical systematic error by selecting appropriate control rules and the number of control measurements, while also aiming for a low probability of false rejections. This design approach lacks a parameter for determining the frequency of QC or the size of an analytical run. Parvin24 recommends the calculation of a parameter that represents the maximum expected number of unreliable final test results that may be reported in a run with an undetected error condition. This number of possible erroneous patient test results directly relates to the risk of patient harm, which makes this parameter particularly useful for risk-based SQC. This parameter is called MaxE(NUF) and represents a concept that is difficult to understand and difficult to calculate. Its practical usefulness has also been limited by the need for specialized computer support for calculation.27
REFERENCES
1. Sandberg S, Fraser CG, Horvath AR, et al. Defining analytical performance specifications: consensus statement from the 1st strategic conference of the European Federation of Clinical Chemistry and Laboratory Medicine. Clin Chem Lab Med 2015;53(6):833–5.
2. Westgard JO. Useful measures and models for analytical quality management in medical laboratories. Clin Chem Lab Med 2016;54:223–33.
3. Panteghini M, Sandberg S. Total error vs. measurement uncertainty: the match continues. Clin Chem Lab Med 2016;54:195–6.
4. Oosterhuis WP, Theodorsson E. Total error vs. measurement uncertainty: revolution or evolution. Clin Chem Lab Med 2016;54:235–9.
5. Farrance I, Badrick T, Sikaris KA. Uncertainty in measurement and total error – are they so incompatible? Clin Chem Lab Med 2016;54:1309–11.
6. Kallner A. Is the combination of trueness and precision in one expression meaningful? On the use of total error and uncertainty in clinical chemistry. Clin Chem Lab Med 2016;54:1291–7.
7. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974;20:825–33.
8. Westgard JO, Westgard SA. Assessing quality on the sigma-scale from proficiency testing and external quality assessment surveys. Clin Chem Lab Med 2015;53:1531–5.
9. Westgard JO, Westgard SA. Quality control review: implementing a scientifically based quality control system. Ann Clin Biochem 2016;53:32–50.
10. Westgard JO, Westgard SA. Basic quality management systems. Madison (WI): Westgard QC; 2014.
11. ISO 15189. Medical laboratories – particular requirements for quality and competence. Geneva (Switzerland): ISO; 2012.
12. Westgard JO, Burnett RW. Precision requirements for cost-effective operation of analytical processes. Clin Chem 1990;36:1629–32.
13. Chesher D, Burnett L. Equivalence of critical error calculations and process capability index Cpk. Clin Chem 1997;43:1100–1.
14. Bais R. Use of capability index to improve laboratory analytical performance. Clin Biochem Rev 2008;29(Suppl 1):S27–31.
15. Harry M, Schroder R. Six sigma: the breakthrough management strategy revolutionizing the world's top corporations. New York: Currency; 2000.
16. Kuchipudi L, Yundt-Pacheco J, Parvin CA. Computing a patient-based sigma metric [Abstract]. Clin Chem 2010;56:A35.
17. Yundt-Pacheco J, Parvin CA. System and method to determine sigma of a clinical diagnostic process. US Patent 8,589,081 B2. 2013.
18. Woolworth A, Korpi-Steiner N, Miller JJ, et al. Utilization of assay performance characteristics to estimate hemoglobin A1c result reliability. Clin Chem 2014;60:1073–9.
19. Coskun A, Serteser J, Kilercik M, et al. A new approach for calculating the sigma metric in clinical laboratories. Accred Qual Assur 2015;20:147–52.
20. Cembrowski GS, Cervinski MA. Demystifying reference sample quality control [Editorial]. Clin Chem 2016;62:907–9.
21. Westgard JO, Falk H, Groth T. Influence of a between-run component of variation, choice of control limits, and shape of error distribution on the performance characteristics of rules for internal quality control. Clin Chem 1979;25:394–400.
22. Westgard JO, Barry PL. Cost-effective quality control: managing the quality and productivity of analytical processes. Washington, DC: AACC Press; 1986.
23. Koch DD, Oryall JJ, Quam EF, et al. Selection of medically useful quality-control procedures for individual tests done in a multitest analytical system. Clin Chem 1990;36:230–3.
24. Parvin CA. Assessing the impact of the frequency of quality control testing on the quality of reported patient results. Clin Chem 2008;54:2049–54.
25. Yundt-Pacheco J, Parvin CA. Validating the performance of QC procedures. Clin Lab Med 2013;33:75–88.
26. Yago M, Alcover S. Selecting statistical procedures for quality control planning based on risk management. Clin Chem 2016;62:959–65.
27. Parvin CA, Yundt-Pacheco J. System and method for analyzing a QC strategy for releasing results. US Patent 8,938,409 B2. 2015.