Math 103
Statistics and Probability
Basic Concepts of Statistics
CJD
Statistics
Statistics (as data): specific numbers that have been observed.
Statistics (as a discipline): the observation, presentation, analysis, and interpretation of chance outcomes.
Descriptive Statistics: methods concerned with collecting and describing data to yield meaningful information.
Inferential Statistics: methods concerned with analysis of a subset of data to predict or infer about the entire set of data.
Descriptive Statistics
[Figure: Dollar - Peso Rates, the peso value of $1 by Year, 1997 to 2007; vertical axis from 0.00 to 60.00. Source: http://bsp.gov.ph]
Observations
Experiment: any process that generates a set of data.
Observations: the recorded information that results from an experiment.
Independent Variables: data held constant to determine the values of the dependent variables.
Dependent Variables: data that result from an experiment when the independent variables are fixed.
Types of Data
Quantitative Data (Numerical)
- Discrete (countable), ex. counts, salary, test scores
- Continuous (no gaps), ex. weight, time, force, distance, volume
Qualitative Data (Categorical)
- ex. blood type, gender, yes/no, car model, profession
4 Scales of Data
Nominal
- Summary: Categories only. Data cannot be arranged in an ordered sequence.
- Example: Blood Types; Yes/No; Baseball Players: 5 infielders, 10 outfielders, 15 pitchers
- Explanation: Categories or names only.
Ordinal
- Summary: Categories are ordered, but differences cannot be determined or are meaningless.
- Example: Types of Autos: 5 compact, 15 mid-size, 20 sport utility
- Explanation: An order is determined by compact, mid, sport utility.
Interval
- Summary: Differences between values can be found, but there may be no inherent starting point; ratios are meaningless.
- Example: Year; Seasonal temperatures: 50°F, 75°F, 100°F
- Explanation: 90°F is not twice as hot as 45°F.
Ratio
- Summary: Like interval, but with an inherent starting point. Ratios are meaningful.
- Example: Prices; Weights: 50 kilos, 70 kilos, 80 kilos
- Explanation: 100 kilos is twice as heavy as 50 kilos.
Where the data is from
Population
- Totality of all observations with which we are concerned
- Parameter: a characteristic of a population
- Census: collection of data from every element of the population
Sample
- Subset of the population
- Statistic: a characteristic of a sample
Samples
Why Sample?
- The population may be too big to observe, ex. all Filipino citizens
- Costs may be prohibitive, ex. surveys may be expensive
- Experiments may be destructive, ex. light bulb life
Biased sampling procedures consistently overestimate or underestimate some characteristic of the population.
Use inferential statistics to generalize information about the population based on information obtained from the sample.
Example: A researcher wants to find out the average weight of 3,000 students in a college. How big must the sample be to have a 5% margin of error?
Slovin's Formula
\[ n = \frac{N}{1 + N e^{2}} \]
N = population size
n = sample size
e = margin of error
\[ n = \frac{3000}{1 + 3000(0.05)^{2}} = \frac{3000}{8.5} \approx 353 \]
When used: when nothing is known about the population. Otherwise, more accurate formulas are available.
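The calculation above can be sketched in Python. The function name `slovin_sample_size` is hypothetical, and rounding up to the next whole unit is an assumption made here; the slide only shows the rounded result of 353.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Return n = N / (1 + N * e^2), rounded up to a whole unit (assumed)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# The slide's example: N = 3000 students, e = 5%.
print(slovin_sample_size(3000, 0.05))  # 353
```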
Sampling Methods
Simple Random Sample
- Eliminates the possibility of bias
- Choose the sample so that every subset of n observations from the population has the same chance of being selected.
- Use random numbers from mechanical devices, tables, or computers
Systematic Sampling: selects every k-th element, with the starting point chosen at random
Stratified Random Sampling: partition the population and select proportional random samples from each subpopulation
Cluster Sampling: perform simple random sampling only on randomly selected subpopulations
Simple Random Sample Example
Class of 270 students.
Want a simple random sample of 10 students.
ROW
0 00157 37071 79553 31062 42411 79371 25506 69135
1 38354 03533 95514 03091 75324 40182 17302 64224
2 59785 46030 63753 53067 79710 52555 72307 10223
3 27475 10484 24616 13466 41618 08551 18314 57700
4 28966 35427 09495 11567 56534 60365 02736 32700
5 98879 34072 04189 31672 33357 53191 09807 85796
1. Number the units: Students numbered 001 to 270.
2. Choose a starting point: Row 3, 2nd column (10484)
3. Read off consecutive numbers: (3-digit labels here)
104, 842, 461, 613, 466, 416, 180, 855, 118, 314, 577, 002, 896, ...
4. If a number corresponds to a label, select that unit. If not, skip it. Continue until the desired sample size is obtained.
Or use a computer to generate random numbers from 1 to 270.
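The table-reading steps above can be sketched in Python. The function name `sample_from_table` and its parameters are hypothetical, and skipping a label that repeats is an assumption (sampling without replacement) that the slide does not state.

```python
# Rows 0 to 5 of the random digit table from the slide.
RANDOM_DIGIT_TABLE = [
    "00157 37071 79553 31062 42411 79371 25506 69135",
    "38354 03533 95514 03091 75324 40182 17302 64224",
    "59785 46030 63753 53067 79710 52555 72307 10223",
    "27475 10484 24616 13466 41618 08551 18314 57700",
    "28966 35427 09495 11567 56534 60365 02736 32700",
    "98879 34072 04189 31672 33357 53191 09807 85796",
]


def sample_from_table(table, start_row, start_col, population_size,
                      sample_size, label_width=3):
    """Read consecutive fixed-width labels and keep those that match a unit."""
    # Flatten the table from the starting row, then skip to the starting column.
    blocks = [block for row in table[start_row:] for block in row.split()]
    digits = "".join(blocks[start_col:])
    chosen = []
    for pos in range(0, len(digits) - label_width + 1, label_width):
        label = int(digits[pos:pos + label_width])
        # Keep labels 1..population_size; skip repeats (assumed).
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen  # may be shorter than sample_size if the table runs out


# Row 3, 2nd column (zero-based column index 1), 270 students, sample of 10.
sample = sample_from_table(RANDOM_DIGIT_TABLE, 3, 1, 270, 10)
print(sample)  # [104, 180, 118, 2, 94, 156, 204, 189, 191, 98]
```

For the computer-generated alternative, the standard library call `random.sample(range(1, 271), 10)` draws 10 distinct labels from 1 to 270.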
Systematic Sampling
Order the population of units in some way, select one of the first k units at random, and then select every k-th unit thereafter.
College survey: Order the list of rooms starting at the top floor of the 1st undergrad dorm. Pick one of the first 11 rooms at random => room 3, then pick every 11th room after that.
Note: often a good alternative to random sampling, but can lead to a biased sample.
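A minimal Python sketch of the procedure follows. The function name `systematic_sample` and the list of 100 rooms are hypothetical; the slide gives only k = 11 and a starting room of 3.

```python
import random


def systematic_sample(units, k, start):
    """Take the unit at 1-based position `start`, then every k-th unit after it."""
    if not 1 <= start <= k:
        raise ValueError("start must be one of the first k positions")
    return units[start - 1::k]


# Hypothetical ordered list of 100 rooms, numbered 1 to 100.
rooms = list(range(1, 101))

# The slide's example: k = 11 and the random start happened to be room 3.
picked = systematic_sample(rooms, k=11, start=3)
print(picked)  # [3, 14, 25, 36, 47, 58, 69, 80, 91]

# In practice the start would be drawn at random from the first k units.
random_start = random.randint(1, 11)
print(systematic_sample(rooms, k=11, start=random_start))
```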
Stratified Random Sampling
Divide the population of units into groups (called strata) and take a simple random sample from each of the strata.
College survey: Two strata = undergrad & graduate dorms. Take a simple random sample of 15 rooms from each of the strata for a total of 30 rooms.
Ideal: stratify so that there is little variability in responses within each of the strata.
Stratified Proportional Allocation
Example:
Suppose 38 students in a class were classified based on place of birth. 20 are from NCR, 8 from Luzon (other than NCR), 6 from the Visayas, and 4 from Mindanao. If a sample of 10 is to be made, how many from each classification should be selected?
Solution:
NCR: 20 * (10/38) = 5.26 → 5
Luzon: 8 * (10/38) = 2.11 → 2
Visayas: 6 * (10/38) = 1.58 → 2
Mindanao: 4 * (10/38) = 1.05 → 1
Total = 5 + 2 + 2 + 1 = 10
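The allocation above can be sketched in Python. The function name `proportional_allocation` is hypothetical. Rounding each share to the nearest whole number matches this example, but in general the rounded shares are not guaranteed to add up to the requested sample size, so the total should be checked.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate sample_size across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {
        name: round(size * sample_size / total)
        for name, size in strata_sizes.items()
    }


# The slide's example: 38 students classified by place of birth, sample of 10.
class_by_birthplace = {"NCR": 20, "Luzon": 8, "Visayas": 6, "Mindanao": 4}
allocation = proportional_allocation(class_by_birthplace, 10)
print(allocation)  # {'NCR': 5, 'Luzon': 2, 'Visayas': 2, 'Mindanao': 1}
print(sum(allocation.values()))  # 10
```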
Cluster Sampling
Divide the population of units into groups (called clusters), take a random sample of clusters, and measure only those items in these clusters.
College survey: Each floor of each dorm is a cluster. Take a random sample of 5 floors; all rooms on those floors are surveyed.
Advantage: need only a list of the clusters instead of a list of all individuals.
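A minimal Python sketch of the college survey follows. The dorm layout (12 floors with 8 rooms each), the room labels, and the function name `cluster_sample` are all hypothetical; the slide specifies only that 5 floors are drawn and every room on them is surveyed.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Draw n_clusters cluster names at random and return all of their units."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical layout: 12 floors (the clusters), 8 rooms on each floor.
floors = {
    f"floor-{f}": [f"room-{f}-{r}" for r in range(1, 9)]
    for f in range(1, 13)
}

# A seeded generator is used here only to make the sketch repeatable.
chosen_floors, surveyed_rooms = cluster_sample(floors, 5, random.Random(0))
print(chosen_floors)
print(len(surveyed_rooms))  # 40
```

Only the list of floor names is needed to draw the sample, which is the advantage the slide points out.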
Summation
Data: x_1 = 7, x_2 = 3, x_3 = 4, x_4 = 6, x_5 = 20; y_1 = 1, y_2 = 3, y_3 = 2, y_4 = -1, y_5 = 0
\[ \sum x = \text{sum of all } x \text{ data values} = 40 \]
\[ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 40 \]
\[ \sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 = 49 + 9 + 16 + 36 + 400 = 510 \]
\[ \sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5 = 15 \]
\[ \sum_{i=2}^{5} x_i y_i = 9 + 8 - 6 + 0 = 11 \]
\[ \sum_{i=1}^{2} \sum_{j=1}^{3} x_i y_j = \sum_{i=1}^{2} (x_i + 3x_i + 2x_i) = (7 + 21 + 14) + (3 + 9 + 6) = 60 \]
\[ \left( \sum_{i=1}^{5} 3x_i \right)^2 = (120)^2 = 14400 \]
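The sums on this slide can be checked with a short Python sketch. The variable names are hypothetical; note that Python lists are zero-based, so x_1 is `x[0]`.

```python
x = [7, 3, 4, 6, 20]   # x_1 .. x_5
y = [1, 3, 2, -1, 0]   # y_1 .. y_5

sum_x = sum(x)                                             # sum of x_i, i = 1..5
sum_x_squared = sum(v ** 2 for v in x)                     # sum of x_i^2
sum_of_index = sum(range(1, 6))                            # sum of i, i = 1..5
sum_xy_from_2 = sum(a * b for a, b in zip(x[1:], y[1:]))   # sum of x_i*y_i, i = 2..5
double_sum = sum(x[i] * y[j] for i in range(2) for j in range(3))  # i = 1..2, j = 1..3
square_of_sum = sum(3 * v for v in x) ** 2                 # (sum of 3*x_i)^2

print(sum_x, sum_x_squared, sum_of_index)        # 40 510 15
print(sum_xy_from_2, double_sum, square_of_sum)  # 11 60 14400
```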
Summation Theorems

\[ \sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i \]
\[ \sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i \]
\[ \sum_{i=1}^{n} c = cn \]
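As a numerical illustration (not a proof), the three theorems can be checked in Python on the x and y data from the Summation slide. The constant c = 3 is an arbitrary value chosen here for the check.

```python
x = [7, 3, 4, 6, 20]
y = [1, 3, 2, -1, 0]
c = 3            # arbitrary constant, assumed for illustration
n = len(x)

# Theorem 1: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i.
additivity_holds = sum(a + b for a, b in zip(x, y)) == sum(x) + sum(y)

# Theorem 2: a constant factor can be moved outside the summation.
constant_factor_holds = sum(c * a for a in x) == c * sum(x)

# Theorem 3: summing the constant c over n terms gives c * n.
constant_sum_holds = sum(c for _ in range(n)) == c * n

print(additivity_holds, constant_factor_holds, constant_sum_holds)  # True True True
```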
Multidimensional Data
# of students x_ij   1st Room (j=1)   2nd Room (j=2)   3rd Room (j=3)   4th Room (j=4)
1st Floor (i=1)      x_11 = 40        x_12 = 28        x_13 = 45        x_14 = 30
2nd Floor (i=2)      x_21 = 42        x_22 = 31        x_23 = 41        x_24 = 25
\[
\begin{aligned}
\sum_{i=1}^{2} \sum_{j=1}^{4} x_{ij} &= \sum_{i=1}^{2} (x_{i1} + x_{i2} + x_{i3} + x_{i4}) \\
&= (x_{11} + x_{12} + x_{13} + x_{14}) + (x_{21} + x_{22} + x_{23} + x_{24}) \\
&= (40 + 28 + 45 + 30) + (42 + 31 + 41 + 25) = 282
\end{aligned}
\]
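The double sum over floors and rooms can be sketched in Python with a nested list, one inner list per floor. The variable names are hypothetical.

```python
# students[i][j] holds x_ij, shifted to zero-based indices:
# students[0] is the 1st floor, students[1] is the 2nd floor.
students = [
    [40, 28, 45, 30],
    [42, 31, 41, 25],
]

# Inner sum over rooms (j) for each floor, then outer sum over floors (i).
floor_totals = [sum(rooms) for rooms in students]
total_students = sum(floor_totals)

print(floor_totals)    # [143, 139]
print(total_students)  # 282
```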
Exercise

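If
\[ \sum_{i=1}^{5} x_i = 50 \quad \text{and} \quad \sum_{i=1}^{5} x_i^2 = 300, \quad \text{evaluate} \quad \sum_{i=1}^{5} (x_i - 3)^2 \]
Solution:
\[
\begin{aligned}
\sum_{i=1}^{5} (x_i - 3)^2 &= \sum_{i=1}^{5} (x_i^2 - 6x_i + 9) \\
&= \sum_{i=1}^{5} x_i^2 - 6 \sum_{i=1}^{5} x_i + \sum_{i=1}^{5} 9 \\
&= 300 - 6(50) + 9(5) = 45
\end{aligned}
\]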
End
