Ch01 Intro Stat&DataAnalysis

Chapter 1
Introduction to Statistics and Data Analysis
Walpole, Probability and Statistics for Engineers & Scientists, 8th ed., Pearson Edu.
Section 1
Introduction and Data Collection
Section Goals
After completing this chapter, you should be able to:
Explain key definitions:
Population vs. Sample
Primary vs. Secondary Data
Parameter vs. Statistic
Descriptive vs. Inferential Statistics
Describe key data collection methods
Describe different sampling methods
Probability Samples vs. Nonprobability Samples
Numerical descriptive measures
Presenting data
Why a Manager Needs to Know about Statistics
To know how to:
properly present information
draw conclusions about populations based on sample information
improve processes
obtain reliable forecasts
Key Definitions
A population (universe) is the collection of all items or things under consideration
A sample is a portion of the population selected for analysis
A parameter is a summary measure that describes a characteristic of the population
A statistic is a summary measure computed from a sample to describe a characteristic of the population
Population vs. Sample

[Figure: a population of items labeled a through y, with a sample (g, i, n, o, r, y) drawn from it]
Measures used to describe the population are called parameters
Measures computed from sample data are called statistics
Two Branches of Statistics
Descriptive statistics
Collecting, summarizing, and describing data
Inferential statistics
Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
Collect data
e.g., Survey
Present data
e.g., Tables and graphs
Characterize data
e.g., Sample mean = $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
Estimation
e.g., Estimate the population mean weight using the sample mean weight
Hypothesis testing
e.g., Test the claim that the population mean weight is 120 pounds
Drawing conclusions and/or making decisions concerning a population based on sample results.
Why We Need Data
To provide input to survey
To provide input to study
To measure performance of service or production process
To evaluate conformance to standards
To assist in formulating alternative courses of action
To satisfy curiosity
Data Sources
Primary: Data Collection
Observation
Survey
Experimentation
Secondary: Data Compilation
Print or Electronic
Reasons for Drawing a Sample
Less time consuming than a census
Less costly to administer than a census
Less cumbersome and more practical to administer than a census of the targeted population
Types of Samples Used
Nonprobability Sample
Items included are chosen without regard to their probability of occurrence
Probability Sample
Items in the sample are chosen on the basis of known probabilities
Types of Samples Used (continued)
Non-Probability Samples: Judgement, Quota, Chunk, Convenience
Probability Samples: Simple Random, Stratified, Systematic, Cluster
Probability Sampling
Items in the sample are chosen based on known probabilities
Probability Samples: Simple Random, Systematic, Stratified, Cluster
Simple Random Samples
Every individual or item from the frame has an equal chance of being selected
Selection may be with replacement or without replacement
Samples obtained from table of random numbers or computer random number generators
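As a minimal sketch of drawing a simple random sample with a computer random number generator, the following uses Python's standard `random` module. The frame of 64 numbered individuals and the seed are assumptions made for illustration, not values from the slides.

```python
import random

# Hypothetical frame: 64 numbered individuals (illustrative only)
frame = list(range(1, 65))
rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Without replacement: no individual can be selected twice
sample_without = rng.sample(frame, k=8)

# With replacement: the same individual may be selected more than once
sample_with = rng.choices(frame, k=8)

print(sample_without)
print(sample_with)
```

In both cases every individual in the frame has the same chance of being chosen on each draw.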
Systematic Samples
Decide on sample size: n
Divide frame of N individuals into groups of k individuals: k = N/n
Randomly select one individual from the 1st group
Select every kth individual thereafter
Example: N = 64, n = 8, k = 8
[Figure: frame of 64 individuals divided into 8 groups of 8, with the first group marked]
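The steps above can be sketched in a few lines of Python. The function name `systematic_sample` and the numbered frame are hypothetical, and the sketch assumes N is an exact multiple of n, as in the N = 64, n = 8 example.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick one random item from the first group of k, then every kth item after it."""
    k = len(frame) // n          # group size, k = N/n
    start = rng.randrange(k)     # random position within the first group
    return frame[start::k][:n]

frame = list(range(1, 65))       # N = 64 individuals, numbered 1..64
sample = systematic_sample(frame, n=8, rng=random.Random(0))
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined.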
Stratified Samples
Divide population into two or more subgroups (called strata) according to some common characteristic
A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
Samples from subgroups are combined into one
[Figure: population divided into 4 strata, with a sample drawn from each]
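A rough sketch of proportional stratified sampling follows. The strata names and sizes are invented for illustration, and `stratified_sample` is a hypothetical helper rather than a library function.

```python
import random

def stratified_sample(strata, n, rng=random):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(items) for items in strata.values())
    combined = []
    for name, items in strata.items():
        size = round(n * len(items) / total)   # proportional allocation
        combined.extend(rng.sample(items, size))
    return combined

# Hypothetical population of 100 split into 4 strata (sizes are illustrative)
strata = {
    "A": [f"A{i}" for i in range(40)],
    "B": [f"B{i}" for i in range(30)],
    "C": [f"C{i}" for i in range(20)],
    "D": [f"D{i}" for i in range(10)],
}
sample = stratified_sample(strata, n=10, rng=random.Random(1))
print(sample)
```

With these sizes the allocation is 4, 3, 2, and 1. For other sizes, rounding may make the combined sample slightly larger or smaller than n, so a real design would need a rule for adjusting the allocation.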
Cluster Samples
Population is divided into several clusters, each representative of the population
A simple random sample of clusters is selected
All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique
[Figure: population divided into 16 clusters, with randomly selected clusters forming the sample]
Advantages and Disadvantages
Simple random sample and systematic sample
Simple to use
May not be a good representation of the population's underlying characteristics
Stratified sample
Ensures representation of individuals across the entire population
Cluster sample
More cost effective
Less efficient (need larger sample to acquire the same level of precision)
Types of Data
Categorical (defined categories)
Examples: Marital Status, Political Party, Eye Color
Numerical
Discrete (counted items)
Examples: Number of Children, Defects per hour
Continuous (measured characteristics)
Examples: Weight, Voltage
Types of Survey Errors
Coverage error or selection bias
Exists if some groups are excluded from the frame and have no chance of being selected
Non response error or bias
People who do not respond may be different from those who do respond
Sampling error
Variation from sample to sample will always exist
Measurement error
Due to weaknesses in question design, respondent error, and interviewer's effects on the respondent
Types of Survey Errors (continued)
Coverage error: Excluded from frame
Non response error: Follow up on nonresponses
Sampling error: Random differences from sample to sample
Measurement error: Bad or leading question
Section 2
Numerical Descriptive Measures
Section Goals
After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a set of data
Find the range, variance, standard deviation, and coefficient of variation and know what these values mean
Compute and explain the correlation coefficient
Use numerical measures along with graphs, charts, and tables to describe data
Summary Measures
Describing Data Numerically
Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean
Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Shape: Skewness
Measures of Central Tendency: Overview
Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Arithmetic Mean
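The arithmetic mean (mean) is the most common measure of central tendency
For a sample of size n:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
where n is the sample size and $X_1, \ldots, X_n$ are the observed values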
Arithmetic Mean (continued)
The most common measure of central tendency
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)
Values 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Values 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Median
In an ordered array, the median is the middle number (50% above, 50% below)
[Figure: two dot plots on a 0 to 10 axis, each with Median = 3]
Not affected by extreme values
Finding the Median
The location of the median:
Median position = $\frac{n+1}{2}$ position in the ordered data
If the number of values is odd, the median is the middle number
If the number of values is even, the median is the average of the two middle numbers
Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data
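The position rule can be sketched as a small function. The name `median_by_position` is hypothetical. The first call uses the nine-value ordered array from the quartile example later in this section, and the second uses an invented even-sized list.

```python
def median_by_position(values):
    """Median located at position (n + 1) / 2 of the ranked data (1-based)."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value
        return ranked[int(position) - 1]
    # Even n: the position falls halfway between two ranks, so average them
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

odd_median = median_by_position([11, 12, 13, 16, 16, 17, 18, 21, 22])  # position 5
even_median = median_by_position([1, 2, 3, 4])                          # position 2.5
print(odd_median, even_median)
```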
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: two dot plots; on a 0 to 14 axis, Mode = 9; on a 0 to 6 axis, No Mode]
Review Example
Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example: Summary Statistics
Sum = $3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
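The three summary statistics for the house prices can be checked with Python's standard `statistics` module. This is only a sketch of one way to do the arithmetic.

```python
import statistics

# House prices from the review example, in dollars
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum of values / number of values
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value
print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often preferred when outliers are present.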
Which measure of location is the best?
Mean is generally used, unless extreme values (outliers) exist
Then median is often used, since the median is not sensitive to extreme values.
Example: Median home prices may be reported for a region, since the median is less sensitive to outliers
Geometric Mean
Used to measure the rate of change of a variable over time
$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
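As a small illustration, the geometric mean of a few per-period growth factors gives an average rate of change. The growth factors below are invented for the example. `statistics.geometric_mean` is part of the standard library from Python 3.8, and the direct formula is shown alongside it as a cross-check.

```python
import statistics

# Hypothetical growth factors for three periods (+10%, -5%, +20%); illustrative only
growth_factors = [1.10, 0.95, 1.20]

# Geometric mean: nth root of the product of the n values
g = statistics.geometric_mean(growth_factors)

# Same quantity computed directly from the formula
product = 1.0
for x in growth_factors:
    product *= x
g_direct = product ** (1 / len(growth_factors))

average_rate = g - 1   # average rate of change per period
print(round(g, 4), round(average_rate, 4))
```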
Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values per segment
[Figure: ranked data split into four 25% segments by Q1, Q2, and Q3]
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Only 25% of the observations are greater than the third quartile, Q3
Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
Quartiles
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values, so Q1 = 12.5
Q1 and Q3 are measures of noncentral location
Q2 = median, a measure of central tendency
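The position formulas can be turned into a short sketch. The helper names are hypothetical. The use of linear interpolation for any fractional position is an assumption (the example above only covers the halfway case), and the sketch assumes at least three observations so every position falls inside the data.

```python
def value_at_position(ranked, position):
    """Value at a 1-based position in ranked data, interpolating between neighbours."""
    lower_index = int(position)            # rank at or just below the position
    fraction = position - lower_index
    lower = ranked[lower_index - 1]
    if fraction == 0:
        return lower
    upper = ranked[lower_index]
    return lower + fraction * (upper - lower)

def quartiles(values):
    """Q1, Q2, Q3 from the (n+1)/4, (n+1)/2 and 3(n+1)/4 positions."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = value_at_position(ranked, (n + 1) / 4)
    q2 = value_at_position(ranked, (n + 1) / 2)
    q3 = value_at_position(ranked, 3 * (n + 1) / 4)
    return q1, q2, q3

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

Other software may use different quartile conventions, so results can differ slightly from this rule on the same data.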
Measures of Variation
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center, different variation]
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
Range = $X_{largest} - X_{smallest}$
Example:
[Figure: dot plot on a 0 to 14 axis with smallest value 1 and largest value 14]
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 axis with differently distributed values; Range = 12 - 7 = 5 for both]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Interquartile Range
Can eliminate some outlier problems by using the interquartile range
Eliminate some high- and low-valued observations and calculate the range from the remaining values
Interquartile range = 3rd quartile - 1st quartile = Q3 - Q1
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(25% of the values fall in each of the four segments)
Interquartile range = 57 - 30 = 27
Variance
Average (approximately) of squared deviations of values from the mean
Sample variance:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$
Where
$\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Sample standard deviation:
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
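Calculation Example: Sample Standard Deviation
Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8
Mean = $\bar{X}$ = 16
$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n-1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8-1}}$
$= \sqrt{\frac{130}{7}} = 4.3095$
A measure of the average scatter around the mean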
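The calculation can be reproduced step by step so that each intermediate quantity (mean, sum of squared deviations, variance, standard deviation) can be inspected. This is a sketch using the eight sample values from the example, with the standard library's `statistics.stdev` as a cross-check.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n

# Squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)

sample_variance = sum_of_squares / (n - 1)   # divide by n - 1 for a sample
sample_std = math.sqrt(sample_variance)

# Cross-check against the standard library's sample standard deviation
library_std = statistics.stdev(data)
print(mean, sum_of_squares, round(sample_std, 4), round(library_std, 4))
```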
Measuring variation
Small standard deviation
Large standard deviation
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, each with Mean = 15.5]
Data A: S = 3.338
Data B: S = 0.926
Data C: S = 4.570
Advantages of Variance and Standard Deviation
Each value in the data set is used in the calculation
Values far from the mean are given extra weight (because deviations from the mean are squared)
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Can be used to compare two or more sets of data measured in different units
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
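The comparison of the two stocks reduces to one line of arithmetic each. The function name `coefficient_of_variation` is a hypothetical helper used only for this sketch.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative variation as a percentage: (S / mean) * 100."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B
print(cv_a, cv_b)
```

Because the units cancel in the ratio, the same function works for data measured in any unit.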
Shape of a Distribution
Describes how data is distributed
Measures of shape
Symmetric or skewed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
Population Summary Measures
Population summary measures are called parameters
The population mean is the sum of the values in the population divided by the population size, N
$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Where
$\mu$ = population mean
N = population size
Population Variance
Average of squared deviations of values from the mean
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Where
$\mu$ = population mean
N = population size
Population Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
The Empirical Rule
If the data distribution is bell-shaped, then the interval:
$\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
The Empirical Rule (continued)
$\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
$\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
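The three percentages can be reproduced numerically under the assumption that "bell-shaped" means exactly normal. The sketch below uses `statistics.NormalDist` from the Python standard library, and `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

Real data that are only roughly bell-shaped will match these shares approximately, not exactly.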
The Sample Covariance
The sample covariance measures the strength of the linear relationship between two variables (called bivariate data)
The sample covariance:
$cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
Only concerned with the strength of the relationship
No causal effect is implied
Interpreting Covariance
Covariance between two random variables:
cov(X,Y) > 0
X and Y tend to move in the same direction
cov(X,Y) < 0
X and Y tend to move in opposite directions
cov(X,Y) = 0
X and Y have no linear relationship (zero covariance alone does not imply that X and Y are independent)
Coefficient of Correlation
Measures the relative strength of the linear relationship between two variables
Sample coefficient of correlation:
$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{cov(X, Y)}{S_X S_Y}$
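As a sketch of how the covariance and correlation formulas fit together, the following borrows the volume and cost pairs from the Scatter Diagram Example in Section 3. The function names are hypothetical helpers, not library calls.

```python
import math

def sample_covariance(xs, ys):
    """Sum of products of paired deviations, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

def correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y); unit free."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]          # volume per day
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]   # cost per day
cov_xy = sample_covariance(volume, cost)
r = correlation(volume, cost)
print(round(cov_xy, 2), round(r, 3))
```

The covariance depends on the units of both variables, while r does not, which is why r is the easier of the two to interpret.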
Features of
Correlation Coefficient, r
Unit free
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship
Scatter Plots of Data with Various Correlation Coefficients
[Figure: scatter plots of Y against X with panels r = -1, r = -.6, r = 0, r = +1, r = +.3, r = 0]
Section 3
Presenting Data
Chapter Goals
After completing this chapter, you should be able to:
Create an ordered array and a stem-and-leaf display
Construct and interpret a frequency distribution, polygon, and ogive
Construct a histogram
Create and interpret bar charts, pie charts, and scatter diagrams
Present and interpret category data in bar charts and pie charts
Describe appropriate and inappropriate ways to display data graphically
Organizing and Presenting Data Graphically
Data in raw form are usually not easy to use for decision making
Some type of organization is needed
Table
Graph
Techniques reviewed here:
Ordered Array
Stem-and-Leaf Display
Frequency Distributions and Histograms
Bar charts and pie charts
Contingency tables
Tables and Charts for Numerical Data
Ordered Array
Stem-and-Leaf Display
Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
The Ordered Array
A sorted list of data:
Shows range (min to max)
Provides some signals about variability within the range
May help identify outliers (unusual observations)
If the data set is large, the ordered array is less useful
The Ordered Array (continued)
Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem-and-Leaf Diagram
A simple way to see distribution details in a data set
METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)
Example
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Here, use the 10s digit for the stem unit:
21 is shown as: Stem 2 | Leaf 1
38 is shown as: Stem 3 | Leaf 8
Example (continued)
Data in ordered array:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Completed stem-and-leaf diagram:
Stem | Leaves
2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1
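A stem-and-leaf display with the 10s digit as the stem can be sketched as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by their 10s digit (stem); the units digit is the leaf."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```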
Using other stem units
Using the 100s digit as the stem:
Round off the 10s digit to form the leaves
613 would become: Stem 6 | Leaf 1
776 would become: Stem 7 | Leaf 8
...
1224 becomes: Stem 12 | Leaf 2
Using other stem units (continued)
Using the 100s digit as the stem:
Data:
613, 632, 658, 717,
722, 750, 776, 827,
841, 859, 863, 891,
894, 906, 928, 933,
955, 982, 1034,
1047, 1056, 1140,
1169, 1224
The completed stem-and-leaf display:
Stem | Leaves
6 | 1 3 6
7 | 2 2 5 8
8 | 3 4 6 6 9 9
9 | 1 3 3 6 8
10 | 3 5 6
11 | 4 7
12 | 2
Tabulating Numerical Data: Frequency Distributions
What is a Frequency Distribution?
A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data falls) and the corresponding frequencies with which data falls within each grouping or category
Why Use Frequency Distributions?
A frequency distribution is a way to summarize data
The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data
Class Intervals and Class Boundaries
Each class grouping has the same width
Determine the width of each interval by
$\text{Width of interval} = \frac{\text{range}}{\text{number of desired class groupings}}$
Use at least 5 but no more than 15 groupings
Class boundaries never overlap
Round up the interval width to get desirable endpoints
Frequency Distribution Example
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15
20 but less than 30   6           .30                  30
30 but less than 40   5           .25                  25
40 but less than 50   4           .20                  20
50 but less than 60   2           .10                  10
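The counting step can be sketched directly from the temperature data and the class boundaries given above. Variable names are illustrative, and each class includes its lower boundary but not its upper one ("10 but less than 20").

```python
temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
                32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10, 20, 30, 40, 50, 60]

rows = []
for lower, upper in zip(boundaries, boundaries[1:]):
    # A value belongs to the class if lower <= value < upper
    frequency = sum(1 for t in temperatures if lower <= t < upper)
    relative = frequency / len(temperatures)
    rows.append((lower, upper, frequency, relative))

for lower, upper, frequency, relative in rows:
    print(f"{lower} but less than {upper}: {frequency}  {relative:.2f}  {relative:.0%}")

frequencies = [row[2] for row in rows]
```

Using half-open classes guarantees that boundaries never overlap, so every observation is counted exactly once.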
Graphing Numerical Data: The Histogram
A graph of the data in a frequency distribution is called a histogram
The class boundaries (or class midpoints) are shown on the horizontal axis
The vertical axis is either frequency, relative frequency, or percentage
Bars of the appropriate heights are used to represent the number of observations within each class
Histogram Example
Class                 Class Midpoint   Frequency
10 but less than 20   15               3
20 but less than 30   25               6
30 but less than 40   35               5
40 but less than 50   45               4
50 but less than 60   55               2
[Figure: histogram of frequency against class midpoints (no gaps between bars)]
Histograms in Excel
1. Select Tools/Data Analysis
How Many Class Intervals?
Many (Narrow class intervals)
may yield a very jagged distribution with gaps from empty classes
can give a poor indication of how frequency varies across classes
Few (Wide class intervals)
may compress variation too much and yield a blocky distribution
can obscure important patterns of variation.
(X axis labels are upper class endpoints)
Graphing Numerical Data: The Frequency Polygon
Class                 Class Midpoint   Frequency
10 but less than 20   15               3
20 but less than 30   25               6
30 but less than 40   35               5
40 but less than 50   45               4
50 but less than 60   55               2
[Figure: frequency polygon plotted at the class midpoints]
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
Tabulating Numerical Data: Cumulative Frequency
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20   3           15           3                      15
20 but less than 30   6           30           9                      45
30 but less than 40   5           25           14                     70
40 but less than 50   4           20           18                     90
50 but less than 60   2           10           20                     100
Total                 20          100
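The cumulative columns are running totals of the class frequencies. A minimal sketch using `itertools.accumulate` from the standard library:

```python
from itertools import accumulate

# Class frequencies for the five temperature classes (10-20, ..., 50-60)
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

cumulative_frequencies = list(accumulate(frequencies))
cumulative_percentages = [100 * c / total for c in cumulative_frequencies]
print(cumulative_frequencies)
print(cumulative_percentages)
```

The last cumulative frequency always equals the number of observations, and the last cumulative percentage is always 100.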
Graphing Cumulative Frequencies: The Ogive (Cumulative % Polygon)
Class                 Upper class boundary   Cumulative Percentage
Less than 10          10                     0
10 but less than 20   20                     15
20 but less than 30   30                     45
30 but less than 40   40                     70
40 but less than 50   50                     90
50 but less than 60   60                     100
[Figure: ogive plotted against Class Boundaries (Not Midpoints)]
Scatter Diagrams
Scatter Diagrams are used for bivariate numerical data
Bivariate data consists of paired observations taken from two numerical variables
The Scatter Diagram:
one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Scatter Diagram Example
Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200
Scatter Diagrams in Excel
1. Select the chart wizard
2. Select XY(Scatter) option, then click Next
3. When prompted, enter the data range, desired legend, and desired destination to complete the scatter diagram
Tables and Charts for Categorical Data
Tabulating Data: Summary Table
Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
The Summary Table
Summarize data by category
Example: Current Investment Portfolio
Investment Type   Amount (in thousands $)   Percentage (%)
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110.0                     100.0
(Variables are Categorical)
Bar and Pie Charts
Bar charts and Pie charts are often used for qualitative (category) data
Height of bar or size of pie slice shows the frequency or percentage for each category
Bar Chart Example
[Figure: bar chart of the Current Investment Portfolio by investment type (Stocks 46.5, Bonds 32.0, CD 15.5, Savings 16.0, in thousands $; Total 110.0)]
Pie Chart Example
[Figure: pie chart of the Current Investment Portfolio: Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent
Pareto Diagram
Used to portray categorical data
A bar chart, where categories are shown in descending order of frequency
A cumulative polygon is often shown in the same graph
Used to separate the vital few from the trivial many
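The two series on a Pareto diagram (bars in descending order and the cumulative line) can be computed as in the sketch below. It uses the amounts from the Current Investment Portfolio summary table earlier in this section. The variable names are illustrative, and no plotting is attempted.

```python
from itertools import accumulate

# Amounts in thousands of dollars, from the investment portfolio summary table
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(portfolio.values())

# Categories in descending order of amount, as on a Pareto diagram's bars
ordered = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)
categories = [category for category, _ in ordered]
percentages = [100 * amount / total for _, amount in ordered]

# Running total of the percentages gives the cumulative polygon
cumulative = list(accumulate(percentages))

for category, pct, cum in zip(categories, percentages, cumulative):
    print(f"{category:8s} {pct:6.2f}% {cum:7.2f}%")
```

Sorting first is what distinguishes a Pareto diagram from an ordinary bar chart: the largest categories appear first, so the cumulative line rises fastest at the start.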
Pareto Diagram Example
[Figure: Pareto diagram with categories in the order Stocks, Bonds, Savings, CD; bar graph (axis 0% to 45%): % invested in each category; line graph (axis 0% to 100%): cumulative % invested]
Tabulating and Graphing Multivariate Categorical Data
Contingency Table for Investment Choices ($1000s)
Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32.0         44           19.0         95
CD                    15.5         20           13.5         49
Savings               16.0         28           7.0          51
Total                 110.0        147          67.0         324
(Individual values could also be expressed as percentages of the overall total, percentages of the row totals, or percentages of the column totals)
Tabulating and Graphing Multivariate Categorical Data (continued)
Side by side bar charts
[Figure: "Comparing Investors", side-by-side bar chart for Investor A, Investor B, and Investor C across Savings, CD, Bonds, and Stocks; value axis from 0 to 60]
Side-by-Side Chart Example
Sales by quarter for three sales territories:
Principles of Graphical Excellence
Present data in a way that provides substance, statistics and design
Communicate complex ideas with clarity, precision and efficiency
Give the largest number of ideas in the most efficient manner
Excellence almost always involves several dimensions
Tell the truth about the data
Errors in Presenting Data
Using chart junk
Failing to provide a relative basis in comparing data between groups
Compressing or distorting the vertical axis
Providing no zero point on the vertical axis
Chart Junk
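Bad Presentation
[Figure: "Minimum Wage" graphic labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
Good Presentation
[Figure: "Minimum Wage" chart with vertical axis in $ from 0 to 4 and horizontal axis 1960, 1970, 1980, 1990]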
No Relative Basis
Bad Presentation
[Figure: "A's received by students", vertical axis Freq. from 0 to 300, categories FR, SO, JR, SR]
Good Presentation
[Figure: "A's received by students", vertical axis % from 0% to 30%, categories FR, SO, JR, SR]
FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior
Compressing Vertical Axis

Bad Presentation
[Figure: "Quarterly Sales" for Q1 to Q4, vertical axis from 0 to 200]
Good Presentation
[Figure: "Quarterly Sales" for Q1 to Q4, vertical axis from 0 to 50]
No Zero Point On Vertical Axis

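Bad Presentation
[Figure: "Monthly Sales" in $ for J F M A M J, vertical axis ticks 36, 39, 42, 45 with no zero point]
Good Presentations
[Figure: "Monthly Sales" in $ for J F M A M J, shown two ways: one with axis ticks 36, 39, 42, 45 and one with axis ticks 0, 20, 40, 60]
Graphing the first six months of sales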
