Walpole,Probability and Statistics for Engineers & Scientists, 8th e., Pearson Edu.
Section 1
Introduction and Data Collection
Section Goals
After completing this section, you should be able to:
Present data
Improve processes
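Key Definitions
[Figure: a population (items a through z) and a sample selected from it]
A sample is a portion of the population selected for analysis. Measures computed from sample data are called statistics.
The two branches of statistics are descriptive statistics and inferential statistics.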
Descriptive Statistics
Collect data (e.g., survey)
Present data
Characterize data
Inferential Statistics
Estimation
Hypothesis testing
To satisfy curiosity
Data Sources
Primary: data collection by observation, survey, or experimentation
Secondary: data compilation from print or electronic sources
Types of Samples
Nonprobability samples: judgement, quota, chunk, convenience
Probability samples: simple random, systematic, stratified, cluster
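Systematic Samples
[Figure: the population is divided into groups of k items; one item is selected at random from the first group, then every kth item thereafter]
Stratified Samples
[Figure: the population is divided into 4 strata, and a sample is selected from each stratum]
Cluster Samples
[Figure: the population is divided into 16 clusters; randomly selected clusters form the sample]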
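Comparing Sampling Methods
Simple random and systematic samples: simple to use, but may not be a good representation of the population's underlying characteristics.
Stratified sample: ensures representation of individuals across the entire population.
Cluster sample: often more cost-effective, but typically less efficient, so a larger sample is needed for the same precision.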
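The four probability sampling methods can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-design tool. The population (64 numbered items), the seed, and the helper names are hypothetical. The strata and cluster counts simply mirror the figures above.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every subset of size n has the same chance of being selected."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start within the first group of k = N // n items, then every kth item."""
    k = len(frame) // n
    start = rng.randrange(k)              # random position inside the first group
    return frame[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample drawn separately from each stratum."""
    return {name: rng.sample(items, n_per_stratum) for name, items in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters; every item in a chosen cluster is included."""
    chosen = rng.sample(range(len(clusters)), n_clusters)
    return [item for c in chosen for item in clusters[c]]

rng = random.Random(42)                                       # fixed seed for repeatable runs
frame = list(range(1, 65))                                    # hypothetical population of 64 items
strata = {f"S{s}": frame[s * 16:(s + 1) * 16] for s in range(4)}   # 4 strata
clusters = [frame[c * 4:(c + 1) * 4] for c in range(16)]           # 16 clusters

srs = simple_random_sample(frame, 8, rng)
sys_s = systematic_sample(frame, 8, rng)
strat = stratified_sample(strata, 2, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_s, strat, clus, sep="\n")
```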
Types of Data
Categorical data (defined categories). Examples: marital status, political party, eye color.
Numerical data:
Discrete (counted items). Examples: number of children, defects per hour.
Continuous (measured characteristics). Examples: weight, voltage.
Types of Survey Errors
Coverage error: items excluded from the frame.
Nonresponse error: reduced by following up on nonresponses.
Sampling error: random differences from sample to sample.
Measurement error: caused by bad or leading questions.
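Sampling error can be illustrated by simulation. The sketch below draws several samples of the same size from one population and shows that the sample means differ at random from the population mean. The normal population (mean 50, standard deviation 10), the sample size of 25, and the seed are all hypothetical choices.

```python
import random
from statistics import mean

rng = random.Random(1)                                       # fixed seed for repeatability
population = [rng.gauss(50, 10) for _ in range(10_000)]      # hypothetical population
mu = mean(population)

# Draw five samples of the same size and compare each sample mean with mu.
sample_means = [mean(rng.sample(population, 25)) for _ in range(5)]
errors = [m - mu for m in sample_means]
for m, e in zip(sample_means, errors):
    print(f"sample mean = {m:6.2f}   sampling error = {e:+.2f}")
```

Larger samples shrink these random differences, but they never remove them entirely.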
Section 2
Numerical Descriptive Measures
Section Goals
After completing this section, you should be able to:
Summary Measures: Describing Data Numerically
Central tendency: arithmetic mean, median, mode, geometric mean
Quartiles
Variation: range, interquartile range, variance, standard deviation, coefficient of variation
Shape: skewness
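Measures of Central Tendency
Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of the ranked values
Mode: most frequently observed value
Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$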
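Arithmetic Mean
The arithmetic mean is the sum of the observed values divided by the sample size, n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$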
Arithmetic Mean (continued)
The mean is affected by extreme values (outliers):
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
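Median
In ranked data, the median is the middle value: 50% of the values lie above it and 50% below. Unlike the mean, it is not affected by extreme values:
Data 1, 2, 3, 4, 5: Median = 3
Data 1, 2, 3, 4, 10: Median = 3
The median is located at position (n + 1)/2 of the ranked data. If n is even, the median is the average of the two middle values. Note that (n + 1)/2 is not the value of the median, only the position of the median in the ranked data.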
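The mean/median contrast above is easy to check with Python's `statistics` module. The helper `median_position` is a hypothetical name introduced for illustration. It returns the 1-based position (n + 1)/2 in the ranked data, not the median itself.

```python
from statistics import mean, median

def median_position(n):
    """1-based position of the median in ranked data: (n + 1) / 2."""
    return (n + 1) / 2

data1 = [1, 2, 3, 4, 5]
data2 = [1, 2, 3, 4, 10]      # same data, with 5 replaced by an extreme value

print(mean(data1), median(data1))   # mean 3, median 3
print(mean(data2), median(data2))   # mean 4, median 3 (median unaffected)
print(median_position(5))           # 3.0 -> the 3rd ranked value
print(median([1, 2, 3, 4]))         # 2.5 -> even n: average of the two middle values
```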
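Mode
The mode is the most frequently observed value. A data set may have no mode, or more than one mode.
[Figure: two dot plots, one with Mode = 9 (axis 0 to 14) and one with no mode (axis 0 to 6)]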
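Here is a small sketch of finding the mode(s). The data in the figure did not survive extraction, so the example lists are hypothetical. The sketch assumes "no mode" means every value occurs exactly once, and it returns all tied values when there is more than one mode.

```python
from collections import Counter

def modes(values):
    """Most frequently observed value(s); an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                      # every value occurs once: nothing is "most frequent"
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 3, 5, 9, 9, 9, 12]))   # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))    # [] -> no mode
print(modes([1, 1, 2, 2, 3]))          # [1, 2] -> two modes
```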
Review Example: Summary Statistics
House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Mean: sum = $3,000,000, so mean = $3,000,000/5 = $600,000
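The remaining summary statistics for the house-price example can be computed with the standard library, as a quick sketch. Only the mean appears in the text above. The median, mode, and range printed below are derived from the same five listed prices.

```python
from statistics import mean, median, mode

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": mean(prices),                  # sum / n
    "median": median(prices),              # middle of the ranked values
    "mode": mode(prices),                  # most frequent value
    "range": max(prices) - min(prices),    # largest - smallest
}
for name, value in summary.items():
    print(f"{name:>6}: ${value:,.0f}")
```

The single $2,000,000 house pulls the mean ($600,000) well above the median ($300,000). In data like this, the median describes a "typical" price better.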
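Geometric Mean
The geometric mean is the nth root of the product of n values. It is used to measure the rate of change of a variable over time:
$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$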
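The geometric mean formula can be applied directly with `math.prod`. The investment scenario below is a hypothetical illustration: a 50% fall followed by a 100% rise, expressed as growth factors 0.5 and 2.0. `statistics.geometric_mean` (Python 3.8+) is shown as a cross-check.

```python
import math
from statistics import geometric_mean

def geo_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical: value falls 50% in year 1, then rises 100% in year 2.
growth_factors = [0.5, 2.0]                 # 1 + rate of change for each year
print(geo_mean(growth_factors))             # 1.0 -> average rate of change is 0%
print(geometric_mean(growth_factors))       # same result from the standard library
print(sum(growth_factors) / 2 - 1)          # 0.25 -> arithmetic mean suggests +25%, misleadingly
```

The investment ends where it started, which the geometric mean (0% average change) reflects correctly.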
Quartiles
Quartiles split the ranked data into four segments, each containing 25% of the observations: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%.
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
Q2 is the same as the median (50% are smaller, 50% are larger).
Only 25% of the observations are greater than the third quartile, Q3.
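Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where n is the number of observed values:
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4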
Quartiles: Example (n = 9)
Q1 is in position (9+1)/4 = 2.5 of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5.
Q1 and Q3 are measures of noncentral location; Q2, the median, is a measure of central tendency.
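The position rule above can be sketched as follows. The function locates position q(n + 1)/4 in the ranked data and linearly interpolates for fractional positions, so 2.5 gives the value halfway between the 2nd and 3rd values. The nine data values are hypothetical, since the original data did not survive. They were chosen so that the 2nd and 3rd ranked values are 12 and 13. Other conventions exist (some texts round fractional positions, and NumPy's default percentile method differs). Python's `statistics.quantiles` with its default `"exclusive"` method uses the same (n + 1) positions.

```python
from statistics import quantiles

def quartile(data, q):
    """Quartile q (1, 2 or 3) at 1-based position q(n + 1)/4 of the ranked data."""
    ranked = sorted(data)
    n = len(ranked)
    pos = q * (n + 1) / 4
    pos = min(max(pos, 1), n)          # clamp to the ends for very small n
    lower = int(pos)                   # whole part of the position
    frac = pos - lower
    if frac == 0:
        return ranked[lower - 1]
    # Linear interpolation between the neighbouring ranked values.
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # hypothetical ranked data, n = 9
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))   # 12.5 16 19.5
print(quantiles(data, n=4))                                      # [12.5, 16.0, 19.5]
```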
Measures of Variation
Variation: range, interquartile range, variance, standard deviation, coefficient of variation
[Figure: two distributions with the same center but different variation]
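Range
The range is the difference between the largest and the smallest observations: Range = X_largest - X_smallest.
Example: for data running from 1 to 14, Range = 14 - 1 = 13.
The range ignores the way in which the data are distributed. Two data sets whose values run from 7 to 12 both have Range = 12 - 7 = 5, even if the values in between are spread very differently.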
The range is also sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
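The two range calculations can be reproduced with a one-line helper. `value_range` is a hypothetical name, used to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

base = [1] * 11 + [2] * 8 + [3] * 4 + [4]    # the shared first 24 values
with_5 = base + [5]
with_outlier = base + [120]                  # one extreme value replaces the 5

print(value_range(with_5))        # 4
print(value_range(with_outlier))  # 119
```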
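Interquartile Range
The interquartile range (IQR) measures the spread of the middle 50% of the data: IQR = Q3 - Q1. It is not influenced by extreme values.
Example: minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, maximum = 70, with 25% of the data between each pair of consecutive values.
Interquartile range = 57 - 30 = 27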
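To see why the IQR resists outliers, here is a sketch applied to the two 25-value data sets from the range example. It uses `statistics.quantiles` with `method="exclusive"`, which places quartiles at the (n + 1)-based positions used in this section.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range Q3 - Q1, quartiles at (n + 1)-based positions."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
print(iqr(base + [5]), iqr(base + [120]))   # 1.5 1.5 (the range is 4 vs. 119)
```

Replacing 5 with 120 changes the range from 4 to 119, but the IQR stays at 1.5.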
Variance
Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$$
where
$\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
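Standard Deviation
The sample standard deviation is the square root of the sample variance and has the same units as the original data:
$$S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$$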
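Calculation Example: Sample Standard Deviation
Sample data (Xi): 10, 12, 14, 15, 17, 18, 18, 24
n = 8, mean $\bar{X}$ = 16
$$S = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} = 4.3095$$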
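Both variance formulas from the previous pages can be checked on the calculation example. The definitional and computational (shortcut) forms should agree. The helper names are hypothetical.

```python
import math

def sample_variance(xs):
    """Definitional formula: sum of squared deviations from the mean over n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Computational formula: (n * sum(x^2) - (sum x)^2) / (n(n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

data = [10, 12, 14, 15, 17, 18, 18, 24]
s2 = sample_variance(data)
print(s2, sample_variance_shortcut(data))   # both 130/7, about 18.571
print(round(math.sqrt(s2), 4))              # 4.3095
```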
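Comparing Standard Deviations
A small standard deviation indicates that the values cluster closely around the mean; a large one indicates that they are widely spread.
[Figure: dot plots of Data A, Data B, and Data C on an axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570
All three data sets have the same mean but different standard deviations.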
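Coefficient of Variation
The coefficient of variation measures variation relative to the mean and is always expressed as a percentage:
$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$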
Comparing Coefficients of Variation
Stock A: average price last year = $50, standard deviation = $5
$$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
Stock B: average price last year = $100, standard deviation = $5
$$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
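The stock comparison reduces to a one-line formula. Here is a sketch with a hypothetical helper name:

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / mean) * 100%: spread relative to the mean."""
    return 100.0 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B
print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")   # CV_A = 10%, CV_B = 5%
```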
Shape of a Distribution
Measures of shape describe how the data are distributed: symmetric or skewed.
Left-skewed: Mean < Median
Symmetric: Mean = Median
Right-skewed: Median < Mean
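The mean-versus-median comparison above can be coded as a rough shape check. This is a rule of thumb only. Moment-based skewness coefficients can disagree with it for some data sets. The house prices come from the review example, and the function name is hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    """Classify shape by comparing the mean with the median (rule of thumb)."""
    m, md = mean(values), median(values)
    if m < md:
        return "left-skewed"
    if m > md:
        return "right-skewed"
    return "symmetric"

print(skew_direction([2_000_000, 500_000, 300_000, 100_000, 100_000]))  # right-skewed
print(skew_direction([1, 2, 3, 4, 5]))                                   # symmetric
```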
Population Mean
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
where
μ = population mean
N = population size
$X_i$ = ith value of the variable X
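Population Variance
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$$
where
μ = population mean
N = population size
$X_i$ = ith value of the variable X
The population standard deviation is the square root of the population variance:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$$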
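The only difference between the population and sample formulas is the divisor: N versus n - 1. The sketch below assumes, hypothetically, that the eight values from the standard deviation example are an entire population. It then compares `statistics.pvariance` (divides by N) with `statistics.variance` (divides by n - 1).

```python
from statistics import pvariance, pstdev, variance

# Hypothetical: treat these 8 values as a complete population (N = 8).
values = [10, 12, 14, 15, 17, 18, 18, 24]
sigma2 = pvariance(values)    # sum of squared deviations / N
s2 = variance(values)         # sum of squared deviations / (n - 1)
print(sigma2, s2)             # 16.25 vs. about 18.571
print(pstdev(values))         # sigma = sqrt(16.25), about 4.031
```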
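The Empirical Rule
For a bell-shaped distribution, approximately 68% of the values lie within one standard deviation of the mean (μ ± 1σ), approximately 95% lie within μ ± 2σ, and approximately 99.7% lie within μ ± 3σ.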
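A quick simulation makes the empirical rule concrete. The sketch draws values from a normal (bell-shaped) distribution with hypothetical parameters μ = 100 and σ = 15. It then counts the share of values within 1, 2, and 3 standard deviations of the mean.

```python
import random

rng = random.Random(0)                        # fixed seed for repeatability
mu, sigma = 100.0, 15.0                       # hypothetical bell-shaped population
xs = [rng.gauss(mu, sigma) for _ in range(100_000)]

# Percentage of values within k standard deviations of the mean.
shares = {k: 100 * sum(abs(x - mu) <= k * sigma for x in xs) / len(xs) for k in (1, 2, 3)}
for k, expected in ((1, 68), (2, 95), (3, 99.7)):
    print(f"within mu ± {k} sigma: {shares[k]:.1f}% (empirical rule: about {expected}%)")
```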
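Covariance
The sample covariance measures the direction of the linear relationship between two numerical variables:
$$\operatorname{cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$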
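Interpreting Covariance
cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions
cov(X,Y) = 0: X and Y have no linear relationship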
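Coefficient of Correlation
The coefficient of correlation measures the relative strength of the linear relationship between two numerical variables:
$$r = \frac{\operatorname{cov}(X,Y)}{S_X S_Y} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$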
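Features of the Correlation Coefficient, r
Unit free
Ranges between -1 and +1: the closer r is to -1 or +1, the stronger the linear relationship, and r = 0 indicates no linear relationship.
[Figure: scatter plots of Y versus X illustrating r = -1, r = -0.6, r = 0, r = +1, r = +0.3, and r = 0]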
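The covariance and correlation formulas translate directly into Python. The helper names are hypothetical. The paired data are the nine (X, cost per day) pairs from the scatter-diagram example in Section 3, whose first-column label did not survive extraction. Python 3.10+ also offers `statistics.covariance` and `statistics.correlation`.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of (X - Xbar)(Y - Ybar) over n - 1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y); unit free and between -1 and +1."""
    sx = math.sqrt(covariance(xs, xs))   # cov(X, X) is the sample variance of X
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

x = [23, 26, 29, 33, 38, 42, 50, 55, 60]
y = [125, 140, 146, 160, 167, 170, 188, 195, 200]   # cost per day
print(covariance(x, y) > 0, round(correlation(x, y), 3))
```

A positive covariance and an r close to +1 indicate a strong positive linear relationship.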
Section 3
Presenting Data
Section Goals
After completing this section, you should be able to construct a histogram and present data in tables and graphs, including:
Ordered arrays and stem-and-leaf displays
Frequency distributions and histograms
Bar charts and pie charts
Contingency tables
Organizing Numerical Data
Ordered array and stem-and-leaf display
Frequency distributions and cumulative distributions, displayed as a histogram, polygon, or ogive
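Stem-and-Leaf Diagram
Example
Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Use the 10s digit as the stem and the 1s digit as the leaf:
21 is shown as stem 2, leaf 1
38 is shown as stem 3, leaf 8
Example (continued)
Stem | Leaves
2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1
For larger values, use the 100s digit as the stem and the 10s digit as the leaf: 1224 becomes stem 12, leaf 2.
Stem | Leaves
6 | 1 3 6
7 | 2 2 5 8
8 | 3 4 6 6 9 9
9 | 1 3 3 6 8
10 | 3 5 6
11 | 4 7
12 | 2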
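A stem-and-leaf display can be built by splitting each value into its leading digits (stem) and the next digit (leaf). This sketch uses a hypothetical `leaf_unit` parameter to choose which digit becomes the leaf, and it drops any lower digits. Some texts round instead of dropping them.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted leaves.

    leaf_unit=1:  21 -> stem 2, leaf 1.
    leaf_unit=10: the 10s digit is the leaf, so 1224 -> stem 12, leaf 2.
    """
    display = defaultdict(list)
    for v in sorted(values):
        scaled = v // leaf_unit            # drop digits below the leaf digit
        display[scaled // 10].append(scaled % 10)
    return dict(display)

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(map(str, leaves)))
```

This prints `2 | 1 4 4 6 7 7`, `3 | 0 2 8`, and `4 | 1`, which matches the display above.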
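Class Intervals and Class Boundaries
Find the range: 58 - 12 = 46
Frequency Distribution
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100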
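Here is a sketch of building the frequency distribution from raw data. The original 20 data values are not in the text, so the list below is hypothetical. It was chosen to span 12 to 58 and to reproduce the class counts in the table. Each class is half-open ("lower but less than upper"), and the midpoint is the average of the two boundaries.

```python
def frequency_distribution(values, boundaries):
    """Rows of (class label, midpoint, frequency, relative frequency, percentage)."""
    n = len(values)
    rows = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        f = sum(lo <= v < hi for v in values)          # half-open class [lo, hi)
        rows.append((f"{lo} but less than {hi}", (lo + hi) / 2, f, f / n, 100 * f / n))
    return rows

# Hypothetical data: 20 values from 12 to 58, matching the class counts above.
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
rows = frequency_distribution(data, [10, 20, 30, 40, 50, 60])
for label, mid, f, rel, pct in rows:
    print(f"{label:<20} midpoint={mid:<5} f={f}  rel={rel:.2f}  {pct:.0f}%")
```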
Histogram Example
Class                  Class Midpoint   Frequency
10 but less than 20    15               3
20 but less than 30    25               6
30 but less than 40    35               5
40 but less than 50    45               4
50 but less than 60    55               2
[Figure: histogram of frequency against class midpoints (15, 25, 35, 45, 55), with no gaps between bars]
Histograms in Excel
1. Select Tools/Data Analysis.
Frequency Polygon
[Figure: frequency polygon plotting the class frequencies (3, 6, 5, 4, 2) against the class midpoints (15, 25, 35, 45, 55)]
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class.)
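Cumulative Distribution
Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20    3           15           3                      15
20 but less than 30    6           30           9                      45
30 but less than 40    5           25           14                     70
40 but less than 50    4           20           18                     90
50 but less than 60    2           10           20                     100
Total                  20          100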
Ogive (Cumulative Percentage Polygon)
Class boundary   Cumulative percentage of observations less than the boundary
10               0
20               15
30               45
40               70
50               90
60               100
[Figure: ogive plotting these cumulative percentages against the class boundaries]
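The ogive points follow from a running total of the class frequencies. The sketch below uses the frequencies from the cumulative distribution table. `itertools.accumulate` gives the cumulative counts, and a leading 0 represents "less than the first boundary".

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]            # class counts from the table
n = sum(frequencies)

cumulative = [0] + list(accumulate(frequencies))         # counts below each boundary
ogive = [(b, 100 * c / n) for b, c in zip(boundaries, cumulative)]
print(ogive)   # [(10, 0.0), (20, 15.0), (30, 45.0), (40, 70.0), (50, 90.0), (60, 100.0)]
```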
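Scatter Diagrams
Scatter diagrams are used to examine possible relationships between two numerical variables; each point is one (X, Y) pair.
X      Cost per day
23     125
26     140
29     146
33     160
38     167
42     170
50     188
55     195
60     200
Scatter Diagrams in Excel
2. Select the XY (Scatter) option, then click Next.
3. When prompted, enter the data range, desired legend, and desired destination to complete the scatter diagram.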
Tabulating and Graphing Categorical Data
Tabulating data: summary table
Graphing data: bar charts, pie charts, Pareto diagram
Summary Table
Investment Category   Amount (in thousands $)   Percentage (%)
Stocks                46.5                      42.27
Bonds                 32.0                      29.09
CD                    15.5                      14.09
Savings               16.0                      14.55
Total                 110.0                     100.0
Pie Chart
[Figure: pie chart of the investment portfolio: Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.
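Both the summary-table percentages and the pie-chart labels come from dividing each amount by the total. Here is a sketch reproducing both columns:

```python
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}   # thousands of $
total = sum(amounts.values())

percent = {k: round(100 * v / total, 2) for k, v in amounts.items()}   # table column
pie = {k: round(100 * v / total) for k, v in amounts.items()}          # nearest percent
print(total, percent, pie)
```

Rounding each slice separately can, in general, make the labels sum to 99% or 101%. Here they happen to sum to 100%.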
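Pareto Diagram
A Pareto diagram is a vertical bar chart in which the categories are shown in descending order of frequency, combined with a line graph of the cumulative percentage.
[Figure: Pareto diagram with bars for the % invested in each category (Stocks, Bonds, Savings, CD), left axis 0% to 40%, and a line graph of the cumulative % invested, right axis 0% to 100%]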
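The two series plotted in the Pareto diagram can be computed as follows. Categories are sorted by amount, largest first, and the cumulative line is a running sum of the bar percentages.

```python
from itertools import accumulate

amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

ordered = sorted(amounts, key=amounts.get, reverse=True)   # descending by amount
bars = [100 * amounts[c] / total for c in ordered]         # % invested (bars)
line = list(accumulate(bars))                              # cumulative % invested (line)
for c, b, cum in zip(ordered, bars, line):
    print(f"{c:<8} {b:6.2f}%  cumulative {cum:6.2f}%")
```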
Contingency Table
Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32.0         44           19.0         95
CD                    15.5         20           13.5         49
Savings               16.0         28           7.0          51
Total                 110.0        147          67.0         324
[Figure: side-by-side bar chart comparing the amounts invested by Investor A, Investor B, and Investor C in each category]
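The margins of a contingency table are row and column sums. This sketch stores the table as a nested dictionary (a hypothetical layout) and recomputes the totals shown above.

```python
table = {   # rows = investment category, columns = investor
    "Stocks":  {"A": 46.5, "B": 55, "C": 27.5},
    "Bonds":   {"A": 32.0, "B": 44, "C": 19.0},
    "CD":      {"A": 15.5, "B": 20, "C": 13.5},
    "Savings": {"A": 16.0, "B": 28, "C": 7.0},
}
row_totals = {cat: sum(cols.values()) for cat, cols in table.items()}
col_totals = {inv: sum(cols[inv] for cols in table.values()) for inv in ("A", "B", "C")}
grand_total = sum(row_totals.values())
print(row_totals, col_totals, grand_total)
```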
Chart Junk
Bad presentation: [Figure: minimum wage shown with decorative graphics: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
Good presentation: [Figure: simple chart of the minimum wage ($0 to $4) for 1960, 1970, 1980, and 1990]
No Relative Basis
Bad presentation: [Figure: bar chart of A's received by students in each class (FR, SO, JR, SR), with frequency (0 to 300) on the vertical axis]
Good presentation: [Figure: the same data with percentage (0% to 30%) on the vertical axis]
Good Presentation
[Figure: two charts of quarterly sales for Q1 through Q4 drawn on different vertical scales]
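Good Presentations
[Figure: two charts of monthly sales for January through June (J, F, M, A, M, J), one with vertical-axis labels from 36 to 45 and the other with a vertical axis starting at 0 (0 to 60)]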