
MATHEM4 1

Basic Statistics

THE NATURE OF DATA

…values for different individuals. Heights of people, grades on a test, time it takes for a bus to arrive at the bus stop, hair color and others are some examples …

…measurements, genders and survey responses) that have been collected.

…experiments or surveys, used as a basis for making calculations or drawing conclusions (Microsoft Encarta 2006).

Parameter and Statistic

Parameter: a measurement describing some characteristic of a population (N).

…elected to the presidency, he received 39.82% of the 1,865,908 votes cast. If we consider the collection of all those votes to be the population being considered, …

Statistic: a measurement describing some characteristic of a sample (n).

…surveyed executives, it was found that 45% of them would not hire anyone whose job application contained a typographical error. The figure of 45% is …

…this published result: “Twenty-three percent of people polled believed that there are too many polls.”

…method of analysis

…methods of planning experiments or observations, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions …

…deals with the methods of collecting, organizing and summarizing the data in such a way that valid conclusions can be drawn from them (Khazanie, …

UNIT I. INTRODUCTION TO STATISTICS (4 hours)


1.1. Definition of Terms 0.30
1.2. Descriptive and Inferential 0.30
1.3. Types and Level of Measurements of Data 0.40
1.4. Population and Sample 0.50
1.5. Random Sampling Techniques 0.50
1.6. Determining Sample Size 1.00
1.7. Methods of Collection of Data 0.25
1.8. Frequency Distribution 0.50
1.9. Methods of Presenting Data 0.25
Definition: STATISTICS: a branch of mathematics that deals with the analysis and interpretation of numerical data in
terms of samples and populations (Microsoft® Encarta® 2008)

The word statistics is derived from the Latin word status (meaning “state”). Early uses of statistics involved
compilations of data and graphs describing various aspects of a state or country.

Statistical investigations and analyses of data fall into two broad categories:
Because nominal data lack any ordering or numerical significance, they cannot be used for calculations. Numbers are sometimes assigned to the different categories (especially when data are computerized), but these numbers have no real computational significance and any average calculated with them is meaningless.

QUALITATIVE

(1) The “little guy” is back in the stock market in a big way and, as it turns out, is not so likely to be a guy. Women
have rushed to buy stocks during the past two years, more than men. They constitute 57% of all new
shareholders (Time Magazine).

(2) Religious affiliations of college students

QUANTITATIVE
DISCRETE

(1) The numbers of eggs that hens lay are discrete data because they represent counts.

(2) The number of children born in a family

CONTINUOUS

(1) The amounts of milk that cows produce are continuous data because they are measurements that can assume any value over a continuous span. During a given time interval, a cow might yield an amount of milk that can be any value between 0 gallons and 5 gallons. It would be possible to get 2.343115 gallons, because the cow is not restricted to the discrete amounts of 0, 1, 2, 3, 4, or 5 gallons.

(2) Daily dietary intake (µg) of selenium in wheat products

FOUR LEVELS OF MEASUREMENT


Another common way of classifying data is to use four levels of measurement: nominal, ordinal, interval, and ratio. In
applying statistics to real problems, the level of measurement of the data is an important factor in determining which
procedure to use. Never do computations and never use statistical methods with data that are NOT appropriate.

For example, it would not make sense to compute an average of social security numbers, because those numbers are data
that are used for identification; they don’t represent measurements or counts of anything.
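To make this rule concrete, here is a minimal Python sketch, assuming a simplified rule of thumb for the four levels: counting is meaningful at every level, ordering from the ordinal level up, differences and averages from the interval level up, and ratios only at the ratio level. The names `Level`, `MIN_LEVEL` and `is_meaningful` are hypothetical, chosen only for this illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels of measurement, from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Hypothetical rule of thumb: the lowest level at which each operation is meaningful.
MIN_LEVEL = {
    "count": Level.NOMINAL,        # tally how many fall in each category
    "order": Level.ORDINAL,        # rank values, e.g. "A is higher than B"
    "difference": Level.INTERVAL,  # subtract two values
    "mean": Level.INTERVAL,        # averaging needs meaningful differences
    "ratio": Level.RATIO,          # "twice as much" needs a natural zero
}


def is_meaningful(operation: str, level: Level) -> bool:
    """Return True if `operation` makes sense for data measured at `level`."""
    return level >= MIN_LEVEL[operation]


print(is_meaningful("mean", Level.NOMINAL))    # False: e.g. averaging social security numbers
print(is_meaningful("ratio", Level.INTERVAL))  # False: e.g. 0°F is not a natural zero
print(is_meaningful("ratio", Level.RATIO))     # True: e.g. 4 carats is twice 2 carats
```

Because the levels are ordered, each operation only needs its lowest permissible level; every higher level inherits it.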

(1) The NOMINAL LEVEL OF MEASUREMENT is characterized by data that consist of names, labels, or categories
only. The data cannot be arranged in an ordering scheme (such as low or high).

Examples:

(a) Survey responses of yes, no, and undecided


Ordinal data provide information about relative comparisons, but not the magnitudes of the differences. They should not be used for calculations.

(b) Blood types of people living in a certain community

(2) Data at the ORDINAL LEVEL OF MEASUREMENT can be arranged in some order, but differences between
data values cannot be determined or are meaningless.

The following are examples of sample data at the ordinal level of measurement:

Course Grades: A college professor assigns grades of A, B, C, D or F. These grades can be arranged in order, but we
can’t determine differences between the grades. Thus we know, for example, that A is higher than B (so there is an
ordering), but we cannot subtract B from A (so the difference cannot be found).

Rankings: Based on several criteria, a magazine ranks cities according to their “livability.” Those rankings (first, second,
third, and so on) determine an ordering. However, the differences between rankings are meaningless. For example, a
difference of “second minus first” might suggest 2 – 1 = 1, but this difference of 1 is meaningless because it is not an exact
quantity that can be compared to other such differences. The difference between the first city and the second city is not
the same as the difference between the second city and the third city. Using the magazine rankings, the difference
between Baguio City and Tagaytay City cannot be quantitatively compared to the difference between Metro Manila and
Metro Cebu.

(3) The INTERVAL LEVEL OF MEASUREMENT is like the ordinal level, with the additional property that the
difference between any two data values is meaningful. However, there is no natural zero starting point (where none of
the quantity is present) and ratios of data values are not meaningful.

The following are examples of data at the interval level of measurement.

…and 1776, 1492. (Time did not begin in the year 0, so the year 0 is arbitrary instead of being a natural zero starting point.)

…distance between the two values). However, there is no natural starting point. The value 0°F might seem like a starting point, but it is arbitrary and do…

(4) The RATIO LEVEL OF MEASUREMENT is the highest level; it is the interval level modified to include the
natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences
and ratios are both meaningful.

The following are examples of data at the ratio level of measurement. Note the presence of the natural zero value, and
the use of meaningful ratios of “twice” and “three times.”

Weights: Weights (in carats) of diamond engagement rings (0 does represent no weight, and 4 carats is twice
as heavy as 2 carats).

Prices: Prices of college textbooks (Php 0.00 does represent no cost, and a Php 900.00 book is three times as
costly as a Php 300.00 book).

Length: A tilapia 8 inches long is twice as long as a 4-inch tilapia.

Exercise: The following statements describe different data associated with a state senator. For each data entry, indicate the
corresponding level of measurement.

(1) The senator’s name is Carah Bao.
(2) The senator is 58 years old.
(3) The years in which the senator was elected to the senate are 1963, 1969, 1981, and 1994.
(4) Her total taxable income last year was Php 278,317.19.
(5) The senator sponsored a bill to protect water rights. Out of 1100 voters in her district, 400 said they strongly
favored the bill, 300 said they favored the bill, 200 said they were neutral, 150 said they did not favor the bill and 50 said
they strongly did not favor the bill.
(6) The senator is married now.
(7) However, the senator was previously divorced in 1965 and again in 1982.
(8) A leading news magazine claims the senator is ranked seventh for her voting record on bills regarding public
education.

Population (N), Sample (n), Data and Variables

One of the goals of a statistical investigation is to explore the characteristics of a large group of items on the basis of a
few. Sometimes it is physically, economically, or for some other reason almost impossible to examine each item in a
group under study. In such a situation the only recourse is to examine a sub-collection of items from this group. In
statistics we commonly use the terms population and sample.

Example 1:
Suppose an ornithologist is interested in investigating migration patterns of birds in the Northern Hemisphere. Then all
the birds in the Northern Hemisphere will represent the population of interest to him. His choice of the population
restricts him, for it does not include birds that are native to Australia and do not migrate to the Northern Hemisphere.

Definition:
A population (N) is the complete collection of all elements (scores, people, measurements, and so on) to be studied. The collection is co…

Example 2:
Every ten years the Bureau of Census conducts a survey of the entire population of the Philippines accounting for every
person regarding sex, age, and other characteristics. The last such survey was carried out in 2000. In this case the entire
population of the Philippines is the population in the statistical sense.

A population can be finite or infinite and is made up of study units.



The terms population and sample are relative. A collection that constitutes a population in one context may well be a
sample in another context.

For instance, if we wish to learn how people in Greater Manila Area (GMA) feel about a certain national issue, then all
the residents of Greater Manila Area (GMA) would constitute the population of interest. However, assuming that Greater
Manila Area (GMA) represents a cross section of the Philippine population, if we use the responses from these residents
to understand the feelings about the issue among all the Filipino residents, then the residents of Greater Manila Area
(GMA) would represent a sample.

RANDOM SAMPLING TECHNIQUES

When we have a specific objective, and we want to collect the data and do the analysis that will help us meet that
objective, we typically get our data from two common sources: observational studies (such as polls) and experiments
(such as using a treatment to improve hair growth).

…population) in a particular city, we do not have access to those persons who do not have a telephone.

…among all men with cholesterol levels above a specified value; however, short of sampling all men in the community, only those men who for some reason …

…only that part of it that is available.

…to conduct a complete census of the population by collecting data for every study unit in it.

…inconceivable and impossible to investigate every crab individually. The only way to make any kind of educated guess about their behavior would be by e…

…It would not be practical to test all the bulbs, because the bulbs that are tested will never reach the market. So we might pick 50 of these bulbs to test.

In an observational study, we observe and measure specific characteristics, but we don’t attempt to modify the subjects
being studied. In an experiment, we apply some treatment and then proceed to observe its effect on the subjects.

SAMPLE SIZE
An important consideration in conducting research is the size of your sample. It must be large enough so that erratic
behavior of very small samples will not produce misleading results. Repetition of a study or an experiment is called
replication.
A large sample is not necessarily a good sample. Although it is important to have a sample that is sufficiently large, it is
more important to have a sample in which the elements have been chosen in an appropriate way, such as random
selection.

Use a sample size large enough so that we can see the true nature of any effects or phenomena, and obtain the sample
using an appropriate method, such as one based on randomness.

RANDOMIZATION
One of the worst mistakes is to collect data in a way that is inappropriate. We cannot overstress this very important
point:

Data carelessly collected may be so completely useless that no amount of statistical torturing can salvage them.

COMMON METHODS OF SAMPLING

In a random sample, members of the population are selected in such a way that each has an equal chance of being
selected. Sampling is a process or procedure which involves taking a part of a population, making observations on these
representatives and then generalizing the findings to the bigger population (Ary, Jacob and Razavieh, 1981).

Probability Sampling – a random sampling technique in which each element in a population has an equal chance of being
selected.

Non-probability Sampling – a non-random sampling technique in which the elements in a population do not have an equal
chance of being selected.

PROBABILITY SAMPLING

1.) Simple Random Sampling – entails giving all elements an equal chance of being included in the sample. No one
from the population is excluded from the pool. This is implemented if the population is homogeneous.

2.) Systematic Sampling – This is sampling at a regular interval. It can be undertaken when the population has the same
features that would make simple random sampling appropriate. The key here is to select some starting point and then
select every kth (such as every 50th) element in the population.

Determine the sample size (n).
Determine the interval (I) using I = N/n.

3.) Stratified Sampling – entails subdividing the population according to a certain characteristic, then selecting the
samples from every subgroup or stratum. This is resorted to when it is important to get responses per subgroup or stratum.
It is useful if there is a need to differentiate the characteristics of a heterogeneous population and the elements are
geographically concentrated in a given area.

4.) Cluster Sampling – entails random selection of groups in a population who could serve as the respondents of the
study. This is best applied to a population with homogeneous characteristics that is geographically dispersed in
different parts of the country. The population area is divided into sections (or clusters); some of those clusters are
randomly selected, and all the members or elements of the selected clusters are chosen.
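The four probability sampling techniques can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names are hypothetical, the systematic version rounds I = N/n down to a whole number, and the stratified version draws the same number of elements from every stratum for simplicity (drawing in proportion to stratum size is another common choice).

```python
import random


def simple_random_sample(population, n, rng=random):
    """Simple random sampling: every element has the same chance; none is excluded."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng=random):
    """Systematic sampling: random start, then every I-th element, with I = N // n."""
    interval = len(population) // n      # I = N/n, rounded down to a whole step
    start = rng.randrange(interval)      # random starting point within the first interval
    return [population[start + k * interval] for k in range(n)]


def stratified_sample(strata, n_per_stratum, rng=random):
    """Stratified sampling: a simple random sample drawn from every stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng=random):
    """Cluster sampling: randomly choose whole clusters, then take all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(2024)                 # seeded so a run can be repeated
students = list(range(1, 1001))           # hypothetical population: ID numbers 1..1000
print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 50, rng)[:5])
```

Passing a seeded `random.Random` object as `rng` makes a draw repeatable, which helps when the selection procedure has to be documented.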

SAMPLING STRATEGIES APPROPRIATE TO PARTICULAR FEATURES OF THE POPULATION

Personal Attributes   Geographical Spread   Sampling Strategies
Homogeneous           Concentrated          Simple Random or Systematic
Homogeneous           Dispersed             1) Cluster Sampling; 2) Simple Random or Systematic
Heterogeneous         Concentrated          1) Stratified Sampling; 2) Simple Random or Systematic
Heterogeneous         Dispersed             1) Stratified; 2) Cluster; 3) Simple Random or Systematic

Determination of Sample Size (n) Provided that the Population Size (N) Is Known

Slovin’s Formula

$$n \ge \frac{N}{1 + Ne^2}$$

where N = population size
      n = sample size
      e = margin of error (0.10, 0.05, or 0.01)

Lynch et al. Formula

$$n \ge \frac{NZ^2 p(1 - p)}{Nd^2 + Z^2 p(1 - p)}$$

where N = population size
      n = sample size
      Z = value of the normal variable for a reliability level:
          Z = 1.645 (90% reliability in obtaining the sample size)
          Z = 1.96 (95% reliability in obtaining the sample size)
          Z = 2.575 (99% reliability in obtaining the sample size)
      p = 0.50 (proportion of getting a good sample)
      (1 – p) = 0.50 (proportion of getting a poor sample)
      d = 0.01, 0.025, 0.05, or 0.10 (choice of sampling error)
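Both formulas are easy to evaluate directly. The sketch below assumes a hypothetical population of N = 1000 and rounds the result up, since each formula gives the smallest acceptable sample size (n ≥ …). The function names `slovin`, `lynch` and `round_up` are illustrative only.

```python
import math


def round_up(x: float) -> int:
    """Round up to a whole respondent, first removing tiny floating-point noise."""
    return math.ceil(round(x, 9))


def slovin(N: int, e: float) -> int:
    """Slovin's formula: n >= N / (1 + N e^2)."""
    return round_up(N / (1 + N * e ** 2))


def lynch(N: int, Z: float, d: float, p: float = 0.50) -> int:
    """Lynch et al. formula: n >= N Z^2 p(1-p) / (N d^2 + Z^2 p(1-p))."""
    zpq = Z ** 2 * p * (1 - p)
    return round_up(N * zpq / (N * d ** 2 + zpq))


print(slovin(1000, e=0.05))          # 286
print(lynch(1000, Z=1.96, d=0.05))   # 278
```

Rounding up rather than to the nearest whole number keeps the sample at or above the computed minimum.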

NON-PROBABILITY SAMPLING

1) Accidental/Convenience Sampling – Simply use results that are readily available or accessible. Usually the first
person who comes along who typifies a unit of analysis serves as the respondent of the study.

2) Purposive Sampling – Implemented with the researcher defining a criterion or set of criteria for determining the
respondents of the study. It is the researcher’s judgment that becomes the basis for selecting an element or
group that will serve as the unit of analysis. It is useful in qualitative or exploratory studies. The objective is not
to have many respondents but to make sure that the person who would be interviewed will provide a wealth of
information. The aim is not to quantify but to characterize an event being studied.

3) Quota Sampling – Similar to stratified sampling except that the selection of the elements per stratum is not done
through random sampling. Quota sampling entails grouping elements according to certain characteristics and ensuring
that each group is represented. Quota sampling is helpful if the sampling frame is not available per group or stratum. It
refines the application of convenience sampling since there is conscious intent on the part of the researcher to view the
probable differences of every stratum or group with regard to the critical variables of the study.

4) Snowball or Referral Sampling – Involves having a respondent refer other people who are in a position to
answer some of the questions of the researcher. This is particularly helpful in the study of highly sensitive
topics where the identity of respondents is difficult to divulge or may even be unknown to many. In other
words, if the sampling frame cannot be provided and the topic has security implications, a researcher could
obtain referrals from the first respondent to other respondents who may be willing to talk.

A sampling error is the difference between a sample result and the true population result; such an error results from
chance sample fluctuations.
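Sampling error can be seen by drawing several random samples from the same population and comparing each sample mean with the population mean. The sketch below uses a hypothetical population of 500 randomly generated heights; the numbers carry no meaning beyond the illustration, and `sampling_error` is a hypothetical helper name.

```python
import random
import statistics


def sampling_error(population, n, rng=random):
    """Sample mean minus population mean for one simple random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.mean(sample) - statistics.mean(population)


rng = random.Random(7)
population = [rng.randint(150, 190) for _ in range(500)]   # hypothetical heights in cm
errors = [sampling_error(population, 30, rng) for _ in range(5)]
print([round(err, 2) for err in errors])   # a different error for each sample
```

Because each sample contains different elements, the error changes from sample to sample purely by chance; if the whole population is taken as the "sample", the error is exactly zero.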

A nonsampling error occurs when the sample data are incorrectly collected, recorded, or analyzed (such as by selecting a
biased sample, using a defective measurement instrument, or copying the data incorrectly).

METHODS OF PRESENTATION OF DATA

Statistical data collected should be arranged in such a manner that will allow a reader to distinguish their essential
features. Depending on the type of information and the objectives of the person presenting the information, data may be
presented using one or a combination of three forms: TEXTUAL, TABULAR, and GRAPHICAL.

TEXTUAL FORM – The textual or paragraph form is utilized when the data to be presented are purely qualitative or
when very few numbers are involved. This method is, generally, not desirable when too many figures are involved as the
reader may fail to grasp the significance of certain quantitative relationships, but it becomes an effective device when the
objective is to call the reader’s attention to some data that require special emphasis.

Example:

…s or time periods are chronologically arranged on the horizontal axis and the relevant values are indicated on the vertical axis. Variations in the data are …

From a newspaper report, it was gathered that China has a population of 707 million, India has 505 million, US has
207 million, USSR (before the break-up) has 245 million, and Indonesia has 125 million. More than half of the
world’s people, about 2.1 billion, live in Asia; 456 million in Europe, 354 million in North America, 195 million in South
America, and 20 million in Oceania. Shanghai has 10,820,000; Tokyo has 8,841,000; New York has 7,895,000; and
Moscow has 7,050,000.

TABULAR FORM – A more effective device for presenting data because the data are presented in a more concise and
systematic manner. People who want to make some comparisons and draw relationships usually find tabular arrangement
more convenient and understandable than the textual presentation. The data are presented through tables consisting of
vertical columns and horizontal rows with headings describing these rows and columns.

Example:

Continent/Region   Population      Country     Population    Cities     Population
Asia               2,100,000,000   China       707,000,000   Shanghai   10,820,000
                                   India       505,000,000   Tokyo       8,841,000
                                   Indonesia   125,000,000
North America        354,000,000   USA         207,000,000   New York    7,895,000
Europe               465,000,000   USSR        245,000,000   Moscow      7,050,000
South America        195,000,000
Oceania               20,000,000

GRAPHICAL OR PICTORIAL FORM – Among the different methods of presenting data, the graph or chart is
perhaps the most effective device for attracting people’s attention. Readers who look for comparisons and trends may
skip statistical tables but may pause to examine graphs. A graph has a great advantage over a table because it conveys
quantitative values and comparisons more readily.
Figure: graph of Population by Year
Figure: graph of Count by Average Number of Cigarette Smoke Per Day
Figure: graph of Count by Academic Performance
…as well as chronological comparisons may be shown graphically by means of a bar graph. A graph essentially consists of bars or rectangles which are drawn …

Figure: bar graph of Count by Academic Performance (Excellent (Mostly 93–99); Good or Above Average (Mostly 87–92); Average (Mostly 81–86); Below Average (Mostly 75–80)), with separate bars for Gender/Sex (Female, Male)
Figure: bar graph of Count by Academic Performance (Excellent (Mostly 93–99); Good or Above Average (Mostly 87–92); Average (Mostly 81–86); Below Average (Mostly 75–80)), with data labels 4, 7, 28 and 11
…ng the relative magnitudes of the component parts of a whole. It is constructed by dividing a circle (pie) into sectors, each sector having a size proportional …

Figure: pie chart of Academic Performance (Excellent (Mostly 93–99); Good or Above Average (Mostly 87–92); Average (Mostly 81–86); Below Average (Mostly 75–80)), with data labels 4, 11, 7 and 28
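The sector sizes follow from simple arithmetic: assuming each sector's central angle is proportional to its share of the total, the angle is (count / total) × 360°. The sketch below applies this to the four data labels that survive from the pie chart (4, 7, 11 and 28); which category each count belongs to is not recoverable from the figure, so category names are left out. `sector_angles` is a hypothetical helper name.

```python
def sector_angles(counts):
    """Central angle (degrees) of each pie sector: count / total * 360."""
    total = sum(counts)
    return [count * 360 / total for count in counts]


print(sector_angles([4, 7, 11, 28]))   # [28.8, 50.4, 79.2, 201.6]
```

The angles always add up to 360°, so the whole circle represents the whole group.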

…total. Each bar, representing 100%, is subdivided so that the length of each part corresponds to the proportion or percentage of the total number of cases …

Figure: stacked bar graph of Count by Academic Performance (Excellent (Mostly 93–99); Good or Above Average (Mostly 87–92); Average (Mostly 81–86); Below Average (Mostly 75–80)), each bar subdivided by Gender/Sex (Female, Male)

Figure: horizontal stacked bar graph of the same data (horizontal axis: Count, 0 to 30)
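For a component bar representing 100%, each part's length is its percentage of the total. This minimal sketch reuses the four counts that appear as data labels in the charts above purely as an illustration, and returns where each part starts and ends along the bar; `bar_segments` is a hypothetical name.

```python
from itertools import accumulate


def bar_segments(counts):
    """Split a 100% bar into consecutive parts whose lengths are proportional to counts."""
    total = sum(counts)
    shares = [count * 100 / total for count in counts]   # percentage of the whole
    ends = list(accumulate(shares))                       # running total along the bar
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))


print(bar_segments([4, 7, 11, 28]))
# [(0.0, 8.0), (8.0, 22.0), (22.0, 44.0), (44.0, 100.0)]
```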
…for attracting attention since it employs pictures or symbols which are normally drawn of the same size and in rows. Large figures are generally shown …

Statistical Maps – Statistical maps are used to present quantitative data which describe or classify geographical areas.

Summarizing Data with Frequency Tables

Frequency Distribution – A tabular arrangement of data showing its classification or grouping according to magnitude or
size.

Variations of Frequency Distribution Table

Frequency Distribution Table
   - Relative Frequency Distribution Table
   - Cumulative Frequency Distribution Table
        - “Less Than” Cumulative Frequency Distribution Table
        - “Greater Than” Cumulative Frequency Distribution Table

When working with large data sets, it is generally helpful to organize and summarize the data by constructing a
frequency table.

Components of a Frequency Distribution

…of values, along with frequencies (or counts) of the number of values that fall into each class. The frequency for a particular class is the …

Class limits – the highest (upper limit) and the lowest (lower limit) values that can go into each class.
Lower class limits – the smallest numbers that can belong to the different classes.
Upper class limits – the largest numbers that can belong to the different classes.

Class boundaries – the “true” class limits, defined by lower and upper boundaries. Class boundaries are numbers used to separate classes, but without the gaps …

Class midpoint – the average of the lower and upper limits or boundaries of each class.

Class width – the range of values used in defining a class; simply the length or width of a class. It is the difference between two consecutive lower class limits or …
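The relationships among limits, boundaries, midpoint and width can be computed directly. This sketch assumes whole-number data (such as ages in years), so each boundary is taken half a unit below the lower limit or half a unit above the upper limit; `class_properties` is a hypothetical helper.

```python
def class_properties(lower, upper, unit=1):
    """Boundaries, midpoint and width of the class with limits lower-upper.

    `unit` is the smallest unit of measurement (1 for ages in whole years);
    boundaries are assumed to lie half a unit outside the class limits.
    """
    half = unit / 2
    return {
        "lower_boundary": lower - half,
        "upper_boundary": upper + half,
        "midpoint": (lower + upper) / 2,
        "width": upper - lower + unit,   # same as the gap between consecutive lower limits
    }


print(class_properties(21, 25))
# {'lower_boundary': 20.5, 'upper_boundary': 25.5, 'midpoint': 23.0, 'width': 5}
```

The upper boundary of one class equals the lower boundary of the next (25.5 for 21–25 and 26–30), which is what removes the gaps between class limits.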

Constructing Frequency Tables


The main reason for constructing a frequency table is to use it for constructing a graph that effectively shows the
distribution of the data (for example, a histogram).

Suggested Steps in Constructing a Frequency Distribution

1) Array the given raw data in ascending or descending order, if necessary.
2) From the arrayed data, identify the highest value (HV) and the lowest value (LV).
3) Determine the range using the formula:
   Range = Highest Value – Lowest Value, that is, $R = HV - LV$
4) Determine the class interval width/size (i) using the formula:
   $$i = \frac{\text{Range}}{\text{tentative number of classes}}$$
   Sturges Rule: $K = 1 + 3.322 \log n$
   where: K = tentative number of classes to use
          n = total number of cases in an observation
          log = common logarithm (base 10)
5) Group the arrayed data into appropriate classes using convenient and easy-to-read class limits. Start the first class with a lower limit either equal …
6) Set up the class boundaries, if necessary.
7) Count or tally the number of observations into the appropriate class intervals.

Keep the following guidelines in mind when constructing a frequency table.
(a) The classes must be mutually exclusive. In other words, each of the original values must belong to only one class.
(b) Include all classes, even if the frequency is zero.
(c) Try to use the same width for all classes. Sometimes open-ended intervals, such as “65 years or older,” are impossible to avoid.
(d) Select convenient numbers for class limits. Round up to use fewer decimal places or use numbers relevant to the situation.
(e) … classes. Some instructors make use of the Sturges Rule in determining the number of classes.
(f) The sum of the class frequencies must equal the number of original data values.
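The range, Sturges Rule and class-interval formulas can be checked numerically. The sketch below uses the highest and lowest ages from the Oscar-winner example that follows (HV = 80, LV = 21, n = 72); the function names are hypothetical.

```python
import math


def sturges_classes(n: int) -> float:
    """Tentative number of classes by the Sturges Rule: K = 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)


def class_interval(hv: float, lv: float, n: int) -> float:
    """Tentative class width i = Range / K, with Range = HV - LV."""
    return (hv - lv) / sturges_classes(n)


print(round(sturges_classes(72), 2))          # 7.17
print(round(class_interval(80, 21, 72), 2))   # 8.23
```

With these values, K ≈ 7.17 and i ≈ 8.23, which would normally be rounded to convenient numbers. The worked example below uses classes of width 5 instead; the Sturges value is only a tentative starting point.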

Example:
The article “Ages of Oscar-winning Best Actors and Actresses” (Mathematics Teacher magazine) by Richard Brown and
Gretchen Davis presents the results for recent winners from each category.

Actors: 32 37 36 32 51 53 33 61 35 45 55 69
76 37 42 40 32 60 38 56 48 48 40 43
62 43 42 44 41 56 39 46 31 47 45 60
Actresses: 50 44 35 80 26 28 41 21 61 38 49 33
74 30 33 41 31 35 41 42 37 26 34 34
35 26 61 60 34 24 30 37 31 27 39 34
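Tallying can be done with a short loop. The sketch below counts the ages listed above into twelve classes of width 5 starting at 21, the same classes used in the tally that follows; `frequency_table` is a hypothetical helper, and whole-number ages are assumed.

```python
actors = [32, 37, 36, 32, 51, 53, 33, 61, 35, 45, 55, 69,
          76, 37, 42, 40, 32, 60, 38, 56, 48, 48, 40, 43,
          62, 43, 42, 44, 41, 56, 39, 46, 31, 47, 45, 60]
actresses = [50, 44, 35, 80, 26, 28, 41, 21, 61, 38, 49, 33,
             74, 30, 33, 41, 31, 35, 41, 42, 37, 26, 34, 34,
             35, 26, 61, 60, 34, 24, 30, 37, 31, 27, 39, 34]


def frequency_table(data, first_lower, width, n_classes):
    """Count whole-number values falling in each class lower..lower + width - 1."""
    table = []
    for k in range(n_classes):
        lower = first_lower + k * width
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        table.append((f"{lower}–{upper}", freq))
    return table


ages = actors + actresses
for label, freq in frequency_table(ages, 21, 5, 12):
    print(f"{label:>6} {'|' * freq:<18} {freq}")   # tally marks, then the count
print("n =", len(ages))
```

Run on the ages exactly as listed above, this sketch gives the frequencies 2, 7, 17, 11, 13, 6, 3, 5, 4, 1, 1, 2 (n = 72). These differ from several rows of the printed tally (for example 26–30, 31–35 and 36–40), so either the listed data or the tally should be rechecked before the tables below are used.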

Age      Tally                 Frequency
21–25    ||                        2
26–30    |||||                     5
31–35    |||||-|||||-|||||        15
36–40    |||||-|||||-||||         14
41–45    |||||-|||||-|||          13
46–50    |||||-||                  7
51–55    |||                       3
56–60    |||                       3
61–65    |||||-||                  7
66–70                              0
71–75    |                         1
76–80    ||                        2
                              n = 72

Relative Frequency Table

An important variation of the basic frequency table uses relative frequencies (rf), which are easily found by dividing each class frequency by the sum of all frequencies. A relative frequency table includes the same class limits as a frequency table, but relative frequencies are used instead of actual frequencies.

$$\text{relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}} \times 100\%$$

Another variation of the standard frequency table is used when cumulative totals are desired. The cumulative frequency for a class is the sum of the frequencies for th…

Standard Frequency Table, Relative Frequency Table and Cumulative Frequency Table
(f = frequency; rf = relative frequency; <cf and >cf = “less than” and “greater than” cumulative frequencies)

Age       f     rf        <cf    <cf%       >cf    >cf%
21–25     2     2.78%      2      2.78%      72    100.00%
26–30     5     6.94%      7      9.72%      70     97.22%
31–35    15    20.83%     22     30.56%      65     90.28%
36–40    14    19.44%     36     50.00%      50     69.44%
41–45    13    18.06%     49     68.06%      36     50.00%
46–50     7     9.72%     56     77.78%      23     31.94%
51–55     3     4.17%     59     81.94%      16     22.22%
56–60     3     4.17%     62     86.11%      13     18.06%
61–65     7     9.72%     69     95.83%      10     13.89%
66–70     0     0.00%     69     95.83%       3      4.17%
71–75     1     1.39%     70     97.22%       3      4.17%
76–80     2     2.78%     72    100.00%       2      2.78%
N = 72        100.00%
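The relative and cumulative columns can be generated from the frequency column alone. This sketch takes the f column of the table above as given, rounds percentages to two decimals as the table does, and prints one row per class; the variable names are illustrative.

```python
from itertools import accumulate

freqs = [2, 5, 15, 14, 13, 7, 3, 3, 7, 0, 1, 2]   # f column of the table above
n = sum(freqs)                                     # 72

rf = [round(f * 100 / n, 2) for f in freqs]             # relative frequency, %
less_cf = list(accumulate(freqs))                        # "<cf": this class plus all lower classes
greater_cf = list(accumulate(reversed(freqs)))[::-1]     # ">cf": this class plus all higher classes
less_pct = [round(c * 100 / n, 2) for c in less_cf]
greater_pct = [round(c * 100 / n, 2) for c in greater_cf]

for k in range(len(freqs)):
    lower = 21 + 5 * k
    print(f"{lower}–{lower + 4}  {freqs[k]:>2}  {rf[k]:>6.2f}%  "
          f"{less_cf[k]:>2}  {less_pct[k]:>6.2f}%  {greater_cf[k]:>2}  {greater_pct[k]:>6.2f}%")
```

The last "<cf" value and the first ">cf" value both equal n, which is a quick check that no class was dropped.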
