Chapter 1

Defining and Collecting Data

Copyright 2015, 2012, 2009 Pearson Education, Inc.

Chapter 1, Slide 1

Learning Objectives
In this chapter you learn to:

Understand the types of variables used in statistics
Know the different measurement scales
Know how to collect data
Know the different ways to collect a sample
Understand the types of survey errors

Chapter 1, Slide 2

Types of Variables
DCOVA

Categorical (qualitative) variables have values that can only be placed into categories, such as yes and no.

Numerical (quantitative) variables have values that represent a counted or measured quantity.

  Discrete variables arise from a counting process.

  Continuous variables arise from a measuring process.

Chapter 1, Slide 3

Developing Operational Definitions Is Crucial To Avoid Confusion / Errors

DCOVA

An operational definition is a clear and precise statement that provides a common understanding of meaning.

In the absence of an operational definition, miscommunications and errors are likely to occur.

Arriving at operational definition(s) is a key part of the Define step of DCOVA.

Chapter 1, Slide 4

Operational Definitions Of Terms


DCOVA
VARIABLE
A characteristic of an item or individual.
DATA
The set of individual values associated with a variable.
STATISTICS
The methods that help transform data into useful
information for decision makers.

Chapter 1, Slide 5

Types of Variables
DCOVA
Variables

  Categorical (defined categories)
    Examples: Marital Status, Political Party, Eye Color

  Numerical
    Discrete (counted items)
      Examples: Number of Children, Defects per hour
    Continuous (measured characteristics)
      Examples: Weight, Voltage

Chapter 1, Slide 6

Levels of Measurement
DCOVA
A nominal scale classifies data into distinct categories in which no ranking is implied.

Categorical Variable               Categories
Do you have a Facebook profile?    Yes, No
Type of investment                 Growth, Value, Other
Cellular Provider                  AT&T, Sprint, Verizon, Other, None

Chapter 1, Slide 7

Levels of Measurement (cont.)


DCOVA
An ordinal scale classifies data into distinct categories in which ranking is implied.

Categorical Variable              Ordered Categories
Student class designation         Freshman, Sophomore, Junior, Senior
Product satisfaction              Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor's bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                    A, B, C, D, F
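
Because an ordinal scale implies a ranking, its categories can be compared with one another, although the distance between two categories is not defined. The following is a minimal Python sketch using the product satisfaction categories listed above. The names SATISFACTION_LEVELS, RANK, and is_more_satisfied are illustrative and do not come from the slides.

```python
# Ordered categories from the product satisfaction example, lowest to highest.
SATISFACTION_LEVELS = [
    "Very unsatisfied",
    "Fairly unsatisfied",
    "Neutral",
    "Fairly satisfied",
    "Very satisfied",
]

# Map each category to its rank (its position in the ordering).
RANK = {level: position for position, level in enumerate(SATISFACTION_LEVELS)}


def is_more_satisfied(a: str, b: str) -> bool:
    """Return True if response a ranks above response b."""
    return RANK[a] > RANK[b]


print(is_more_satisfied("Neutral", "Fairly unsatisfied"))  # True
```

The ranks support comparisons only. Subtracting two ranks would suggest an interval scale, which these categories do not have.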

Chapter 1, Slide 8

Levels of Measurement (cont.)


DCOVA

An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.

A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
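
One common illustration of the distinction (an assumption here, not taken from the slides) is temperature: degrees Celsius behave as an interval scale, while kelvins have a true zero and behave as a ratio scale. The sketch below shows that differences agree on both scales but ratios do not. The function name celsius_to_kelvin and the two sample temperatures are hypothetical.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvins, a scale with a true zero point."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales and agree with each other.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only on the ratio scale:
# 20 degrees C is not "twice as hot" as 10 degrees C.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference_c, round(difference_k, 2))
print(ratio_c, round(ratio_k, 3))
```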

Chapter 1, Slide 9

Interval and Ratio Scales


DCOVA

Chapter 1, Slide 10

Establishing A Business Objective Focuses Data Collection

DCOVA

Examples Of Business Objectives:

A marketing research analyst needs to assess the effectiveness of a new television advertisement.

A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.

An operations manager wants to monitor a manufacturing process to find out whether the quality of the product being manufactured is conforming to company standards.

An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.

Chapter 1, Slide 11

Collecting Data Correctly Is A Critical Task

DCOVA

Need to avoid data flawed by biases, ambiguities, or other types of errors.

Results from flawed data will be suspect or in error.

Even the most sophisticated statistical methods are not very useful when the data is flawed.

Chapter 1, Slide 12

Sources of Data
DCOVA

Primary Sources: The data collector is the one using the data for analysis
  Data from a political survey
  Data collected from an experiment
  Observed data

Secondary Sources: The person performing data analysis is not the data collector
  Analyzing census data
  Examining data from print journals or data published on the internet.

Chapter 1, Slide 13

Sources of data fall into five categories

DCOVA

Data distributed by an organization or an individual

The outcomes of a designed experiment

The responses from a survey

The results of conducting an observational study

Data collected by ongoing business activities


Chapter 1, Slide 14

Examples Of Data Distributed By Organizations or Individuals

DCOVA

Financial data on a company provided by investment services.

Industry or market data from market research firms and trade associations.

Stock prices, weather conditions, and sports statistics in daily newspapers.

Chapter 1, Slide 15

Examples of Data From A Designed Experiment

DCOVA

Consumer testing of different versions of a product to help determine which product should be pursued further.

Material testing to determine which supplier's material should be used in a product.

Market testing on alternative product promotions to determine which promotion to use more broadly.

Chapter 1, Slide 16

Examples of Survey Data

DCOVA

A survey asking people which laundry detergent has the best stain-removing abilities.

Political polls of registered voters during political campaigns.

People being surveyed to determine their satisfaction with a recent product or service experience.

Chapter 1, Slide 17

Examples of Data Collected From Observational Studies

DCOVA

Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.

Measuring the time it takes for customers to be served in a fast food establishment.

Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified.

Chapter 1, Slide 18

Examples of Data Collected From Ongoing Business Activities

DCOVA

A bank studies years of financial transactions to help it identify patterns of fraud.

Economists utilize data on searches done via Google to help forecast future economic conditions.

Marketing companies use tracking data to evaluate the effectiveness of a web site.

Chapter 1, Slide 19

Data Is Collected From Either A Population or A Sample

DCOVA

POPULATION
A population consists of all the items or individuals about which you want to draw a conclusion. The population is the large group.

SAMPLE
A sample is the portion of a population selected for analysis. The sample is the small group.

Chapter 1, Slide 20

Population vs. Sample

DCOVA

Population
  All the items or individuals about which you want to draw conclusion(s)

Sample
  A portion of the population of items or individuals

Chapter 1, Slide 21

Collecting Data Via Sampling Is Used When Selecting A Sample Is

DCOVA

Less time consuming than selecting every item in the population.

Less costly than selecting every item in the population.

Less cumbersome and more practical than analyzing the entire population.

Chapter 1, Slide 22

Things To Consider / Deal With In Potential Sources Of Data

DCOVA
Is the source of data structured or unstructured?

How is electronic data formatted?

How is data encoded?

Chapter 1, Slide 23

Structured Data Follows An Organizing Principle & Unstructured Data Does Not

DCOVA

A stock ticker provides structured data:
  The stock ticker repeatedly reports a company name, the number of shares last traded, the bid price, and the percent change in the stock price.

Due to their inherent structure, data from tables and forms are structured data.

E-mails from five people concerning stock trades are an example of unstructured data.
  In these e-mails you cannot count on the information being shared in a specific order or format.

This book will deal almost exclusively with structured data.

Chapter 1, Slide 24

Almost All Of The Methods In This Book Deal With Structured Data

DCOVA

Some of the methods in Chapter 17 involve unstructured data.

For many of the questions you might want to answer, the starting point will be tabular data.

To deal with unstructured data, you will probably need to seek out help with more advanced methods / techniques.

Chapter 1, Slide 25

Data Can Be Formatted and / or Encoded In More Than One Way

DCOVA

Some electronic formats are more readily usable than others.

Different encodings can impact the precision of numerical variables and can also impact data compatibility.

As you identify and choose sources of data you need to consider / deal with these issues.

Chapter 1, Slide 26

Data Cleaning Is Often A Necessary Activity When Collecting Data

DCOVA

Irregularities are often found in the data:
  Typographical or data entry errors
  Values that are impossible or undefined
  Missing values
  Outliers

When found, these irregularities should be reviewed / addressed

Both Excel & Minitab can be used to address irregularities
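
The slides name Excel and Minitab as tools for this step. As a rough illustration of the same checks in Python, the sketch below flags missing values, impossible values, and outliers in a small made-up list of ages. The data, the plausible range of 0 to 120, and the 1.5 x IQR outlier rule are all assumptions introduced for this example, not part of the source.

```python
import statistics

# Hypothetical raw responses for an "age" variable; None marks a missing value.
raw_ages = [34, 29, None, 41, -5, 38, 230, 33, 36, 31, 95]

# Positions of missing values.
missing = [i for i, v in enumerate(raw_ages) if v is None]

# Values outside an assumed plausible range (0 to 120) are treated as impossible.
impossible = [v for v in raw_ages if v is not None and not (0 <= v <= 120)]

# Values that are present and plausible.
valid = [v for v in raw_ages if v is not None and 0 <= v <= 120]

# Flag outliers with the 1.5 * IQR rule (an assumption; other rules exist).
q1, _, q3 = statistics.quantiles(valid, n=4)
iqr = q3 - q1
outliers = [v for v in valid if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print("missing positions:", missing)
print("impossible values:", impossible)
print("outliers:", outliers)
```

Flagged values still need human review: an outlier such as 95 may be a genuine observation, whereas -5 is almost certainly a data entry error.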

Chapter 1, Slide 27

After Collection It Is Often Helpful To Recode Some Variables

DCOVA

Recoding a variable can either supplement or replace the original variable.

Recoding a categorical variable involves redefining categories.

Recoding a quantitative variable involves changing this variable into a categorical variable.

When recoding, be sure that the new categories are mutually exclusive (categories do not overlap) and collectively exhaustive (categories cover all possible values).
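
A short sketch of recoding a quantitative variable into a categorical one is given below. The age boundaries and the function name recode_age are hypothetical. The point is that the chained comparisons put every non-negative age into exactly one category, which is what mutually exclusive and collectively exhaustive require.

```python
def recode_age(age: float) -> str:
    """Recode a numerical age into one of four categories.

    The boundaries are hypothetical. Each age falls into exactly one
    category (mutually exclusive), and every non-negative age is covered
    (collectively exhaustive).
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return "Under 18"
    if age < 35:
        return "18 to 34"
    if age < 65:
        return "35 to 64"
    return "65 and over"


ages = [17, 18, 34.5, 35, 64, 65, 90]

# Keeping the original list alongside the new one supplements the variable
# rather than replacing it.
age_groups = [recode_age(a) for a in ages]
print(age_groups)
```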

Chapter 1, Slide 28

A Sampling Process Begins With A Sampling Frame

DCOVA

The sampling frame is a listing of items that make up the population

Frames are data sources such as population lists, directories, or maps

Inaccurate or biased results can result if a frame excludes certain portions of the population

Using different frames to generate data can lead to dissimilar conclusions

Chapter 1, Slide 29

Types of Samples

DCOVA

Samples

  Non-Probability Samples
    Judgment
    Convenience

  Probability Samples
    Simple Random
    Stratified
    Systematic
    Cluster

Chapter 1, Slide 30

Types of Samples:
Nonprobability Sample

DCOVA

In a nonprobability sample, items included are chosen without regard to their probability of occurrence.

  In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.

  In a judgment sample, you get the opinions of preselected experts in the subject matter.

Chapter 1, Slide 31

Types of Samples:
Probability Sample

DCOVA

In a probability sample, items in the sample are chosen on the basis of known probabilities.

Probability Samples
  Simple Random
  Systematic
  Stratified
  Cluster

Chapter 1, Slide 32

Probability Sample:
Simple Random Sample

DCOVA

Every individual or item from the frame has an equal chance of being selected

Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual is not returned to the frame).

Samples are obtained from a table of random numbers or from computer random number generators.
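
As a sketch of the computer-generated option, Python's standard random module can draw a simple random sample either way. The frame of item numbers 1 to 850, the sample size of 10, and the seed are assumptions chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical frame: item numbers 1 through 850.
frame = list(range(1, 851))

# Without replacement: a selected item is not returned to the frame,
# so it can appear at most once in the sample.
without_replacement = rng.sample(frame, k=10)

# With replacement: a selected item is returned to the frame
# and may be drawn again.
with_replacement = rng.choices(frame, k=10)

print(without_replacement)
print(with_replacement)
```

Here random.sample never repeats an item, while random.choices can, which mirrors the two selection schemes described above.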

Chapter 1, Slide 33

Selecting a Simple Random Sample Using A Random Number Table

DCOVA

Sampling frame for a population with 850 items:

Item Name    Item #
Bev R.       001
Ulan X.      002
...          ...
Joann P.     849
Paul F.      850

Portion of a random number table:

49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

The first 5 items in a simple random sample, reading the table three digits at a time:

Item # 492
Item # 808
Item # 892 -- does not exist, so ignore
Item # 435
Item # 779
Item # 002
Chapter 1, Slide 34

Probability Sample:
Systematic Sample

DCOVA

Decide on sample size: n

Divide frame of N individuals into groups of k individuals: k = N/n

Randomly select one individual from the 1st group

Select every kth individual thereafter

Example: N = 40, n = 4, k = 10
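
The steps above can be sketched in Python using the slide's numbers (N = 40, n = 4, k = 10). The function name systematic_sample and the seed are illustrative, and the sketch assumes N is a whole multiple of n.

```python
import random


def systematic_sample(frame: list, n: int, rng: random.Random) -> list:
    """Pick a random start in the first group of k, then every k-th item."""
    k = len(frame) // n        # group size, k = N / n (assumes N divisible by n)
    start = rng.randrange(k)   # random position within the first group
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 41))     # N = 40 individuals, numbered 1 to 40
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)
```

Whatever the random start, the selected individuals are always exactly k = 10 positions apart.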

Chapter 1, Slide 35

Probability Sample:
Stratified Sample

DCOVA

Divide population into two or more subgroups (called strata) according to some common characteristic

A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes

Samples from subgroups are combined into one

This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.

Figure: population divided into 4 strata
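
A minimal sketch of proportional stratified sampling follows. The four strata, their sizes, the total sample size of 50, and the name stratified_sample are all hypothetical. Rounding each stratum's share is a simplification that happens to give an exact total here but may not in general.

```python
import random


def stratified_sample(strata: dict[str, list], total_n: int, rng: random.Random) -> list:
    """Draw a simple random sample from each stratum, sized in proportion to it."""
    population_size = sum(len(items) for items in strata.values())
    combined = []
    for items in strata.values():
        # Proportional allocation; rounding can shift the total slightly.
        stratum_n = round(total_n * len(items) / population_size)
        combined.extend(rng.sample(items, stratum_n))
    return combined


# Hypothetical population of 1,000 items divided into 4 strata.
strata = {
    "A": [f"A{i}" for i in range(400)],
    "B": [f"B{i}" for i in range(300)],
    "C": [f"C{i}" for i in range(200)],
    "D": [f"D{i}" for i in range(100)],
}

sample = stratified_sample(strata, total_n=50, rng=random.Random(1))
counts = {name: sum(1 for s in sample if s.startswith(name)) for name in strata}
print(counts)  # {'A': 20, 'B': 15, 'C': 10, 'D': 5}
```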

Chapter 1, Slide 36

Probability Sample:
Cluster Sample

DCOVA

Population is divided into several clusters, each representative of the population

A simple random sample of clusters is selected

All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique

A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.

Figure: population divided into 16 clusters, with randomly selected clusters forming the sample
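
The two-stage idea can be sketched as below. The 16 clusters match the figure, but the cluster size of 10, the choice of 4 clusters, the within-cluster sample size of 3, and the seed are assumptions for illustration only.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 160 items split into 16 clusters of 10.
clusters = {c: [f"cluster{c}-item{i}" for i in range(10)] for c in range(16)}

# Stage 1: a simple random sample of clusters.
chosen_clusters = rng.sample(sorted(clusters), k=4)

# Stage 2, option (a): use every item in the selected clusters.
sample_all = [item for c in chosen_clusters for item in clusters[c]]

# Stage 2, option (b): draw a simple random sample within each selected cluster.
sample_within = [
    item for c in chosen_clusters for item in rng.sample(clusters[c], k=3)
]

print(chosen_clusters)
print(len(sample_all), len(sample_within))  # 40 12
```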

Chapter 1, Slide 37

Probability Sample:
Comparing Sampling Methods

DCOVA

Simple random sample and systematic sample
  Simple to use
  May not be a good representation of the population's underlying characteristics

Stratified sample
  Ensures representation of individuals across the entire population

Cluster sample
  More cost effective
  Less efficient (need larger sample to acquire the same level of precision)

Chapter 1, Slide 38

Evaluating Survey Worthiness


DCOVA

What is the purpose of the survey?

Is the survey based on a probability sample?

Coverage error: is the frame appropriate?

Nonresponse error: follow up on nonresponses

Measurement error: good questions elicit good responses

Sampling error: always exists

Chapter 1, Slide 39

Types of Survey Errors

DCOVA

Coverage error or selection bias
  Exists if some groups are excluded from the frame and have no chance of being selected

Nonresponse error or bias
  People who do not respond may be different from those who do respond

Sampling error
  Variation from sample to sample will always exist

Measurement error
  Due to weaknesses in question design and / or respondent error

Chapter 1, Slide 40

Types of Survey Errors (continued)

DCOVA

Coverage error: Excluded from frame
Nonresponse error: Follow up on nonresponses
Sampling error: Random differences from sample to sample
Measurement error: Bad or leading question
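
Sampling error can be seen directly by drawing several samples from the same population and comparing their means. In the sketch below, the population (the integers 1 to 1000), the sample size of 30, the number of samples, and the seed are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(2015)  # fixed seed so the sketch is repeatable

# Hypothetical population: the integers 1 to 1000, whose mean is 500.5.
population = list(range(1, 1001))
population_mean = statistics.mean(population)

# Draw five simple random samples of 30 and record each sample mean.
sample_means = [statistics.mean(rng.sample(population, k=30)) for _ in range(5)]

# Sampling error: the gap between each sample mean and the population mean.
sampling_errors = [m - population_mean for m in sample_means]

for m, e in zip(sample_means, sampling_errors):
    print(f"sample mean = {m:.2f}, sampling error = {e:+.2f}")
```

The sample means differ from one another even though every sample comes from the same frame by the same method, which is why sampling error always exists.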

Chapter 1, Slide 41

Chapter Summary
In this chapter we have discussed:

The types of variables used in statistics
The different measurement scales
How to collect data
The different ways to collect a sample
The types of survey errors

Chapter 1, Slide 42