
Measure Phase

Six Sigma Statistics



Welcome to Measure

Process Discovery

Six Sigma Statistics

Basic Statistics

Descriptive Statistics

Normal Distribution

Assessing Normality

Special Cause / Common Cause

Graphing Techniques

Measurement System Analysis

Process Capability

Wrap Up & Action Items

OSSS LSS Green Belt v9.1 - Measure Phase 2 © OpenSourceSixSigma, LLC


Purpose of Basic Statistics

The purpose of Basic Statistics is to:


• Provide a numerical summary of the data being analyzed.
– Data (n)
• Factual information organized for analysis.
• Numerical or other information represented in a form suitable for processing by
computer
• Values from scientific experiments.
• Provide the basis for making inferences about the future.
• Provide the foundation for assessing process capability.
• Provide a common language to be used throughout an organization to
describe processes.

Relax… it won’t be that bad!

Statistical Notation – Cheat Sheet

Symbol          Meaning
Σ               Summation
s               The Standard Deviation of sample data
σ               The Standard Deviation of population data
s²              The variance of sample data
σ²              The variance of population data
R               The range of data
R-bar           The average range of data
k               Multi-purpose notation, i.e. # of subgroups, # of classes
|x|             The absolute value of some term
>, <            Greater than, less than
≥, ≤            Greater than or equal to, less than or equal to
Xi              An individual value, an observation
X1              A particular (1st) individual value
∀               For each, all, individual values
X-bar           The mean, average of sample data
X-double-bar    The grand mean, grand average
µ               The mean of population data
p               A proportion of sample data
P               A proportion of population data
n               Sample size
N               Population size



Parameters vs. Statistics

Population: All the items that have the “property of interest” under study.

Frame: An identifiable subset of the population.

Sample: A significantly smaller subset of the population used to make an inference.

[Figure: a Population containing several Samples]

Population Parameters:
– Arithmetic descriptions of a population
– µ, σ, P, σ², N

Sample Statistics:
– Arithmetic descriptions of a sample
– X-bar, s, p, s², n
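
The distinction between parameters and statistics can be illustrated with a small simulation. This is only a sketch: the population below is hypothetical (10,000 simulated part lengths), and the sample size of 30 is an arbitrary assumption, not something taken from the course data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: N = 10,000 part lengths centered near 5.000.
population = [random.gauss(5.000, 0.010) for _ in range(10_000)]
N = len(population)
mu = statistics.fmean(population)      # population parameter (mean)
sigma = statistics.pstdev(population)  # population parameter (std dev)

# A sample is a much smaller subset used to make an inference.
sample = random.sample(population, 30)
n = len(sample)
x_bar = statistics.fmean(sample)       # sample statistic (mean)
s = statistics.stdev(sample)           # sample statistic (std dev)

print(f"Population: N={N}, mu={mu:.4f}, sigma={sigma:.4f}")
print(f"Sample:     n={n}, X-bar={x_bar:.4f}, s={s:.4f}")
```

The sample statistics will typically land close to, but not exactly on, the population parameters, which is why they are treated as estimates.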



Types of Data

Attribute Data (Qualitative)
– Is always binary; there are only two possible values (0, 1)
  • Yes, No
  • Go, No go
  • Pass/Fail

Variable Data (Quantitative)
– Discrete (Count) Data
  • Can be categorized in a classification and is based on counts.
    – Number of defects
    – Number of defective units
    – Number of customer returns
– Continuous Data
  • Can be measured on a continuum; it has decimal subdivisions that are
    meaningful.
    – Time
    – Money
    – Pressure
    – Conveyor Speed
    – Material feed rate



Discrete Variables

Discrete Variable: Possible values for the variable

• The number of defective needles in boxes of 100 diabetic syringes:
  0, 1, 2, …, 100
• The number of individuals in groups of 30 with a Type A personality:
  0, 1, 2, …, 30
• The number of surveys returned out of 300 mailed in a customer
  satisfaction study: 0, 1, 2, …, 300
• The number of employees in 100 having finished high school or obtained
  a GED: 0, 1, 2, …, 100
• The number of times you need to flip a coin before a head appears for
  the first time: 1, 2, 3, …
  (Note: there is no upper limit because you might need to flip forever
  before the first head appears.)



Continuous Variables

Continuous Variable: Possible Values for the Variable

• The length of prison time served for individuals convicted of first
  degree murder: All the real numbers between a and b, where a is the
  smallest amount of time served and b is the largest.
• The household income for households with incomes less than or equal to
  $30,000: All the real numbers between a and $30,000, where a is the
  smallest household income in the population.
• The blood glucose reading for those individuals having glucose readings
  equal to or greater than 200: All real numbers between 200 and b, where
  b is the largest glucose reading in all such individuals.



Definitions of Scaled Data

• Understanding the nature of data and how to represent it can affect the
types of statistical tests possible.

• Nominal Scale – data consists of names, labels, or categories. It cannot
be arranged in an ordering scheme. No arithmetic operations are
performed on nominal data.

• Ordinal Scale – data is arranged in some order, but differences between
data values either cannot be determined or are meaningless.

• Interval Scale – data can be arranged in an ordering scheme, and
differences between data values are meaningful and can be interpreted.

• Ratio Scale – data can be ranked, and all arithmetic operations
including division can be performed (division by zero is, of course,
excluded). Ratio level data has an absolute zero, and a value of
zero indicates a complete absence of the characteristic of interest.



Nominal Scale

Qualitative Variable: Possible nominal level data values for the variable

• Blood Types: A, B, AB, O
• State of Residence: Alabama, …, Wyoming
• Country of Birth: United States, China, other

Time to weigh in!


Ordinal Scale

Qualitative Variable: Possible Ordinal level data values

• Automobile Sizes: Subcompact, compact, intermediate, full size, luxury
• Product rating: Poor, good, excellent
• Baseball team classification: Class A, Class AA, Class AAA, Major League



Interval Scale

Interval Variable: Possible Scores

• IQ scores of students in BlackBelt Training: 100…
  (The difference between scores is measurable and has meaning, but a
  difference of 20 points between 100 and 120 does not indicate that one
  student is 1.2 times more intelligent.)



Ratio Scale

Ratio Variable: Possible Scores

• Grams of fat consumed per adult in the United States: 0…
  (If person A consumes 25 grams of fat and person B consumes 50 grams,
  we can say that person B consumes twice as much fat as person A. If a
  person C consumes zero grams of fat per day, we can say there is a
  complete absence of fat consumed on that day. Note that a ratio is
  interpretable and an absolute zero exists.)



Converting Attribute Data to Continuous Data

• Continuous Data is always more desirable

• In many cases Attribute Data can be converted to Continuous

• Which is more useful?


– 15 scratches or Total scratch length of 9.25”
– 22 foreign materials or 2.5 fm/square inch
– 200 defects or 25 defects/hour
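
One way to read the comparisons above is as a raw count divided by an exposure (time, area, and so on). The sketch below is illustrative only: the helper name `rate` is hypothetical, and the 8-hour window is an assumption chosen so that 200 defects works out to the 25 defects/hour quoted above; the slide does not state the window.

```python
def rate(count, exposure):
    """Return count per unit of exposure (hours, square inches, ...)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return count / exposure

# ASSUMPTION: the 200 defects were observed over 8 hours.
defects_per_hour = rate(200, 8)
print(f"{defects_per_hour} defects/hour")
```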



Descriptive Statistics

Measures of Location (central tendency)


– Mean
– Median
– Mode

Measures of Variation (dispersion)


– Range
– Interquartile Range
– Standard deviation
– Variance



Descriptive Statistics

Open the MINITAB™ Project “Measure Data Sets.mpj” and select the
worksheet “basicstatistics.mtw”



Measures of Location

Mean is:
• Commonly referred to as the average.
• The arithmetic balance point of a distribution of data.

Stat>Basic Statistics>Display Descriptive Statistics…>Graphs…>Histogram of data, with normal curve

Sample: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

[Figure: Histogram (with Normal Curve) of Data. Horizontal axis: Data, 4.97 to 5.02. Vertical axis: Frequency. Mean 5.000, StDev 0.01007, N 200.]

Descriptive Statistics: Data

Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
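
The SE Mean column in the output above is not defined on the slide. Assuming it is the usual standard error of the mean, StDev divided by the square root of N, the reported figure can be checked with a few lines. The values are taken from the histogram legend (StDev 0.01007, N 200).

```python
import math

# Values read from the Minitab output above.
stdev = 0.01007  # sample standard deviation (StDev)
n = 200          # number of observations (N)

# Standard error of the mean: s / sqrt(n).
se_mean = stdev / math.sqrt(n)
print(f"SE Mean = {se_mean:.6f}")
```

Under that assumption the result rounds to the 0.000712 shown in the table.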



Measures of Location

Median is:
• The mid-point, or 50th percentile, of a distribution of data.
• Arrange the data from low to high, or high to low.
– It is the single middle value in the ordered list if there is an odd
number of observations
– It is the average of the two middle values in the ordered list if there
are an even number of observations

[Figure: Histogram (with Normal Curve) of Data. Horizontal axis: Data, 4.97 to 5.02. Vertical axis: Frequency. Mean 5.000, StDev 0.01007, N 200.]

Descriptive Statistics: Data

Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
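
The odd and even cases in the Median definition can be sketched with Python's standard `statistics` module. The two data sets are hypothetical cycle times, not values from the course worksheet.

```python
import statistics

# Hypothetical cycle times in minutes.
odd_count = [12, 15, 11, 19, 14]        # 5 observations
even_count = [12, 15, 11, 19, 14, 13]   # 6 observations

# Odd n: the single middle value of the ordered list (11, 12, 14, 15, 19).
median_odd = statistics.median(odd_count)

# Even n: the average of the two middle values (13 and 14).
median_even = statistics.median(even_count)

# The mean is the arithmetic balance point: sum of values divided by n.
mean_odd = statistics.mean(odd_count)

print(median_odd, median_even, mean_odd)
```

Note that the value 19 pulls the mean above the median, a first hint of why the median is preferred for skewed data.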



Measures of Location

Trimmed Mean is a:
Compromise between the Mean and Median.
• The Trimmed Mean is calculated by eliminating a specified percentage
of the smallest and largest observations from the data set and then
calculating the average of the remaining observations
• Useful for data with potential extreme values.

Stat>Basic Statistics>Display Descriptive Statistics…>Statistics…>Trimmed Mean

Descriptive Statistics: Data

Variable  N    N*  Mean    SE Mean   TrMean  StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  4.9999  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
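
The Trimmed Mean calculation described above can be sketched as follows. The function name `trimmed_mean`, the default of 5% per tail, and the choice to truncate the number of dropped observations to a whole number are all assumptions for illustration; MINITAB's exact rounding rule may differ. The data set is hypothetical.

```python
def trimmed_mean(values, percent=5.0):
    """Mean after dropping `percent` % of observations from each tail."""
    data = sorted(values)
    k = int(len(data) * percent / 100)  # observations dropped per tail
    if 2 * k >= len(data):
        raise ValueError("trim percentage removes all observations")
    trimmed = data[k:len(data) - k]
    return sum(trimmed) / len(trimmed)

# Hypothetical data with one extreme value (100).
data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]

plain_mean = sum(data) / len(data)
tr_mean = trimmed_mean(data, percent=10)  # drops 10 and 100

print(plain_mean, tr_mean)
```

Here the extreme value drags the ordinary mean to 22.6, while the 10% trimmed mean stays at 14.5.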



Measures of Location

Mode is:
The most frequently occurring value in a distribution of data.

Mode = 5

[Figure: Histogram (with Normal Curve) of Data. Horizontal axis: Data, 4.97 to 5.02. Vertical axis: Frequency. Mean 5.000, StDev 0.01007, N 200.]
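
Finding the most frequently occurring value can be sketched with the standard `statistics` module. The readings are hypothetical, chosen so that 5.00 is the Mode as on the slide; the second call shows that a data set may have more than one Mode.

```python
import statistics

# Hypothetical readings; 5.00 occurs three times.
readings = [4.99, 5.00, 5.01, 5.00, 4.98, 5.00, 5.01]
mode = statistics.mode(readings)

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mode, modes)
```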



Measures of Variation

Range is the:
Difference between the largest observation and the smallest
observation in the data set.
• A small range would indicate a small amount of variability and a large
range a large amount of variability.

Descriptive Statistics: Data

Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200

Interquartile Range is the:
Difference between the 75th percentile and the 25th percentile.

Use Range or Interquartile Range when the data distribution is Skewed.
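
Range and Interquartile Range can be computed as in the sketch below. The data set is hypothetical, and the quartiles rely on the default method of `statistics.quantiles`; other software may use a different quartile convention, so small differences are possible.

```python
import statistics

# Hypothetical, already-sorted data (11 observations).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# n=4 splits the data into quartiles and returns Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile Range: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Applied to the MINITAB output above, the same arithmetic gives a Range of 5.02 - 4.97 = 0.05 and an Interquartile Range of 5.01 - 4.99 = 0.02.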


Measures of Variation

Standard Deviation is:
Equivalent of the average deviation of values from the Mean for a
distribution of data.
A “unit of measure” for distances from the Mean.
Use when data are symmetrical.

Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Descriptive Statistics: Data

Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200

Cannot calculate population Standard Deviation because this is sample data.
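
The difference between the sample and population Standard Deviation comes down to the divisor: n - 1 for a sample, N for a population. A minimal sketch with the standard `statistics` module, using a small hypothetical data set:

```python
import statistics

# Hypothetical data: mean is 5, sum of squared deviations is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the data as a sample: divide by n - 1 before the square root.
s = statistics.stdev(data)

# Treating the data as the entire population: divide by N.
sigma = statistics.pstdev(data)

print(f"s = {s:.4f}, sigma = {sigma:.4f}")
```

The sample value is always a little larger than the population value for the same numbers, and the gap shrinks as the number of observations grows.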



Measures of Variation

Variance is the:
Average squared deviation of each individual data point from the
Mean.

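Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$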
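
To make the "average squared deviation" idea concrete, the sketch below computes the variance step by step rather than calling a library. The function names and the five-value data set are hypothetical; the sample version uses the conventional n - 1 divisor.

```python
import math

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [1, 2, 3, 4, 5]  # mean 3, squared deviations 4 + 1 + 0 + 1 + 4 = 10

var_pop = population_variance(data)
var_sample = sample_variance(data)

# The Standard Deviation is the square root of the variance.
std_sample = math.sqrt(var_sample)

print(var_pop, var_sample, std_sample)
```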



Normal Distribution

The Normal Distribution is the most recognized distribution in statistics.

What are the characteristics of a Normal Distribution?
– Only random error is present
– Process free of assignable cause
– Process free of drifts and shifts

So what is present when the data is Non-normal?



The Normal Curve

The normal curve is a smooth, symmetrical, bell-shaped curve, generated
by the density function.

It is the most useful continuous probability model as many naturally
occurring measurements such as heights, weights, etc. are approximately
Normally Distributed.



Normal Distribution

Each combination of Mean and Standard Deviation generates a unique
normal curve:

“Standard” Normal Distribution

– Has a μ = 0, and σ = 1

– Data from any Normal Distribution can be made to fit the standard
  Normal by converting raw scores to standard scores.

– Z-scores measure how many Standard Deviations from the mean a
  particular data-value lies.
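
Converting a raw score to a standard score can be sketched as below, using the usual definition Z = (x - mean) / standard deviation. The function name `z_score` is hypothetical; the Mean (5.000) and StDev (0.01007) are the values reported for the course data earlier, and 5.02 is that data set's maximum.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev

# How far is the largest observation (5.02) from the mean?
z = z_score(5.02, mean=5.000, std_dev=0.01007)
print(f"z = {z:.2f}")
```

A value at the mean has a Z-score of 0, and values below the mean have negative Z-scores.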



Normal Distribution

The area under the curve between any 2 points represents the
proportion of the distribution between those points.

The area between the Mean and any other point depends upon the
Standard Deviation.

[Figure: Normal curve with the points μ and x marked on the horizontal axis]

Convert any raw score to a Z-score using the formula:

$Z = \frac{x - \mu}{\sigma}$

Refer to a set of Standard Normal Tables to find the proportion
between μ and x.
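
As an alternative to looking up a printed table, the proportion between μ and x can be computed from the cumulative distribution function. This is a sketch using `statistics.NormalDist` from the Python standard library; the function name `area_between_mean_and` and the example values are assumptions for illustration.

```python
from statistics import NormalDist

def area_between_mean_and(x, mean, std_dev):
    """Proportion of a Normal Distribution lying between the mean and x."""
    dist = NormalDist(mu=mean, sigma=std_dev)
    # cdf(mean) is exactly 0.5 for a Normal Distribution.
    return abs(dist.cdf(x) - 0.5)

# One standard deviation above the mean (Z = 1).
area = area_between_mean_and(5.01, mean=5.00, std_dev=0.01)
print(f"Proportion between mean and x: {area:.4f}")
```

Doubling the Z = 1 result gives the proportion within one Standard Deviation on either side of the mean.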



The Empirical Rule

The Empirical Rule…

[Figure: Normal curve with the horizontal axis marked from -6 to +6 standard deviations]

68.27 % of the data will fall within +/- 1 standard deviation
95.45 % of the data will fall within +/- 2 standard deviations
99.73 % of the data will fall within +/- 3 standard deviations
99.9937 % of the data will fall within +/- 4 standard deviations
99.999943 % of the data will fall within +/- 5 standard deviations
99.9999998 % of the data will fall within +/- 6 standard deviations
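
The percentages above can be reproduced for a Normal Distribution from the error function, since the proportion within +/- k standard deviations equals erf(k / √2). The helper name `proportion_within` is hypothetical; this is a check on the table, not part of the course material.

```python
import math

def proportion_within(k):
    """Proportion of a Normal Distribution within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in range(1, 7):
    print(f"+/- {k} standard deviations: {proportion_within(k) * 100:.7f} %")
```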
