Basic Statistics
cal measurement describing some characteristic of a population (N)
values for different individuals. Heights of people, grades on a test, time it takes for a bus to arrive at the bus stop, hair color and others are some exam
surveyed executives, it was found that 45% of them would not hire anyone whose job application contained a typographical error. The figure of 45% is
this published result: “Twenty-three percent of people polled believed that there are too many polls.”
method of analysis
ethods of planning experiments or observations, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing co
als with the methods of collecting, organizing and summarizing the data in such a way that valid conclusions can be drawn from them (Khazanie,
The word statistics is derived from the Latin word status (meaning “state”). Early uses of statistics involved
compilations of data and graphs describing various aspects of a state or country.
Statistical investigations and analyses of data fall into two broad categories:
Because nominal data lack any ordering or numerical significance, they cannot be used for calculations. Numbers are sometimes assigned to the different categories (especially when data are computerized), but these numbers have no real computational significance and any average calculated with them is meaningless.
QUALITATIVE
(1) The “little guy” is back in the stock market in a big way and, as it turns out, is not so likely to be a guy. Women
have rushed to buy stocks during the past two years, more than men. They constitute 57% of all new
shareholders (Time Magazine).
QUANTITATIVE
DISCRETE
(1) The numbers of eggs that hens lay are discrete data because they represent counts.
(2) The number of children born in a family
CONTINUOUS
(1) The amounts of milk that cows produce are continuous data because they are measurements that can assume any value over a continuous span. During a given time interval, a cow might yield an amount of milk that can be any value between 0 gallons and 5 gallons. It would be possible to get 2.343115 gallons, because the cow is not restricted to the discrete amounts of 0, 1, 2, 3, 4, or 5 gallons.
For example, it would not make sense to compute an average of social security numbers, because those numbers are data
that are used for identification; they don’t represent measurements or counts of anything.
(1) The NOMINAL LEVEL OF MEASUREMENT is characterized by data that consist of names, labels, or categories
only. The data cannot be arranged in an ordering scheme (such as low or high)
Examples:
(2) Data at the ORDINAL LEVEL OF MEASUREMENT can be arranged in some order, but differences between
data values cannot be determined or are meaningless.
The following are examples of sample data at the ordinal level of measurement
Course Grades: A college professor assigns grades of A, B, C, D or F. These grades can be arranged in order, but we
can’t determine differences between the grades. Thus we know, for example, that A is higher than B (so there is an
ordering), but we cannot subtract B from A (so the difference cannot be found).
Rankings: Based on several criteria, a magazine ranks cities according to their “livability.” Those rankings (first, second,
third, and so on) determine an ordering. However, the differences between rankings are meaningless. For example, a
difference of “second minus first” might suggest 2 – 1 = 1, but this difference of 1 is meaningless because it is not an exact
quantity that can be compared to other such differences. The difference between the first city and the second city is not
the same as the difference between the second city and the third city. Using the magazine rankings, the difference
between Baguio City and Tagaytay City cannot be quantitatively compared to the difference between Metro Manila and
Metro Cebu.
(3) The INTERVAL LEVEL OF MEASUREMENT is like the ordinal level, with the additional property that the
difference between any two data values is meaningful. However, there is no natural zero starting point (where none of
the quantity is present) and ratios of data values are not meaningful.
(4) The RATIO LEVEL OF MEASUREMENT is the highest level. It is the interval level modified to include a
natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences
and ratios are both meaningful.
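The difference between the interval and ratio levels can be checked numerically. The sketch below is only an illustration: the temperatures (100°F, 50°F, 75°F, 25°F) are assumed values chosen for the example, and the conversion constants (1 carat = 0.2 gram, the Fahrenheit-to-Celsius formula) are standard unit conversions rather than anything taken from these notes. A ratio that is meaningful should not change when the unit changes.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

CARAT_IN_GRAMS = 0.2  # standard metric carat

# Ratio level: weights have a natural zero, so "twice as heavy"
# stays true whatever unit is used.
ratio_carats = 4 / 2
ratio_grams = (4 * CARAT_IN_GRAMS) / (2 * CARAT_IN_GRAMS)

# Interval level: 0 degrees F is an arbitrary zero, so the ratio of two
# temperatures changes when the scale changes.
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

# Differences are still meaningful: equal gaps in Fahrenheit
# remain equal gaps in Celsius.
diff_1 = fahrenheit_to_celsius(100) - fahrenheit_to_celsius(50)
diff_2 = fahrenheit_to_celsius(75) - fahrenheit_to_celsius(25)

print("carat ratio:", ratio_carats, "gram ratio:", ratio_grams)
print("Fahrenheit ratio:", ratio_f, "Celsius ratio:", round(ratio_c, 3))
print("Celsius differences:", round(diff_1, 3), round(diff_2, 3))
```

The weight ratio is the same in both units, while the temperature ratio is not, which is why "twice as hot" is not a meaningful statement for data at the interval level.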
stance between the two values). However, there is no natural starting point. The value of 0°F might seem like a starting point, but it is arbitrary and do
and 1776, 1492. (Time did not begin in the year 0, so the year 0 is arbitrary instead of being a natural zero starting point.)
MATHEM4 4
Basic Statistics
The following are examples of data at the ratio level of measurement. Note the presence of the natural zero value, and
the use of meaningful ratios of “twice” and “three times.”
Weights: Weights (in carats) of diamond engagement rings (0 does represent no weight, and 4 carats is twice
as heavy as 2 carats)
Prices: Prices of college textbooks (Php 0.00 does represent no cost, and a Php 900.00 book is three times as
costly as a Php 300.00 book)
Exercise: The following data describe the different data associated with a state senator. For each data entry, indicate the
corresponding level of measurement.
One of the goals of a statistical investigation is to explore the characteristics of a large group of items on the basis of a
few. Sometimes it is physically, economically, or for some other reason almost impossible to examine each item in a
group under study. In such situations the only recourse is to examine a sub-collection of items from this group. In
statistics we commonly use the terms population and sample.
Definition:
A population (N) is the complete collection of all elements (scores, people, measurements, and so on) to be studied. The collection is co
Example 1:
Suppose an ornithologist is interested in investigating migration patterns of birds in the Northern Hemisphere. Then all
the birds in the Northern Hemisphere will represent the population of interest to him. His choice of the population
restricts him, for it does not include birds that are native to Australia and do not migrate to the Northern Hemisphere.
Example 2:
Every ten years the Bureau of Census conducts a survey of the entire population of the Philippines accounting for every
person regarding sex, age, and other characteristics. The last such survey was carried out in 2000. In this case the entire
population of the Philippines is the population in the statistical sense.
The terms population and sample are relative. A collection that constitutes a population in one context may well be a
sample in another context.
For instance, if we wish to learn how people in Greater Manila Area (GMA) feel about a certain national issue, then all
the residents of Greater Manila Area (GMA) would constitute the population of interest. However, assuming that Greater
Manila Area (GMA) represents a cross section of the Philippine population, if we use the response from these residents
to understand the feelings about the issue among all the Filipino residents, then the residents of Greater Manila Area
(GMA) would represent a sample.
When we have a specific objective, and we want to collect the data and do the analysis that will help us to meet that
objective, we typically get our data from two common sources: observational studies (such as polls) and experiments
(such as using a treatment to improve hair growth).
population) in a particular city, we do not have access to those persons who do not have a telephone.
among all men with cholesterol levels above a specified value; however short of sampling all men in the community, only those men who for some rea
to conduct a complete census of the population by collecting data for every study unit in it.
onceivable and impossible to investigate every crab individually. The only way to make any kind of educated guess about their behavior would be by e
. It would not be practical to test all the bulbs, because the bulbs that are tested will never reach the market. So we might pick 50 of these bulbs to test.
In an observational study, we observe and measure specific characteristics, but we don’t attempt to modify the subjects
being studied. In an experiment, we apply some treatment and then proceed to observe its effect on the subjects.
SAMPLE SIZE
An important consideration in conducting research is the size of your sample. It must be large enough so that the erratic
behavior of very small samples will not produce misleading results. Repetition of a study or an experiment is called
replication.
A large sample is not necessarily a good sample. Although it is important to have a sample that is sufficiently large, it is
more important to have a sample in which the elements have been chosen in an appropriate way, such as random
selection.
Use a sample size large enough so that we can see the true nature of any effects or phenomena, and obtain the sample
using an appropriate method, such as one based on randomness.
RANDOMIZATION
One of the worst mistakes is to collect data in a way that is inappropriate. We cannot overstress this very important
point:
Data carelessly collected may be so completely useless that no amount of statistical torturing can salvage them.
In a random sample, members of the population are selected in such a way that each has an equal chance of being
selected. Sampling is a process or procedure which involves taking a part of a population, making observations on these
representatives, and then generalizing the findings to the bigger population (Ary, Jacob and Razavieh, 1981).
Probability Sampling – a random sampling technique in which each element in a population has an equal chance of being
selected.
Non-probability Sampling – a non-random sampling technique in which the elements in a population do not have an equal
chance of being selected.
PROBABILITY SAMPLING
1.) Simple Random Sampling – All elements are given an equal chance of being included in the sample. No one
from the population is excluded from the pool. This is implemented if the population is homogeneous.
2.) Systematic Sampling – Elements are selected at regular intervals. This can be undertaken when the population
has the features that would also make simple random sampling applicable. The key here is to select some
starting point and then select every kth (such as every 50th) element in the population. The sampling interval is
I = N/n
where N is the population size and n is the sample size.
3.) Stratified Sampling – entails subdividing the population according to a certain characteristic, then selecting the
samples from every subgroup or stratum. This is resorted to when it is important to get responses per subgroup or stratum.
It is useful if there is a need to differentiate the characteristics of a heterogeneous population and the elements are
geographically concentrated in a given area.
4.) Cluster Sampling – entails random selection of groups in a population that could serve as the respondents of the
study. This is best applied to a population with homogeneous characteristics that is geographically dispersed in
different parts of the country. The population area is divided into sections (or clusters), some of those clusters are
randomly selected, and all the members or elements from the selected clusters are chosen.
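The four probability sampling techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the population of 100 numbered elements, the two strata, the four clusters, the sample sizes, and all function names are hypothetical, and the stratified version simply draws the same number of elements from each stratum.

```python
import random

def simple_random_sample(population, n, rng):
    """Every element has the same chance of being drawn."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start, then every k-th element, with interval k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(2024)            # fixed seed so the run is repeatable
population = list(range(1, 101))     # hypothetical population, N = 100

# Hypothetical strata and clusters, for illustration only
strata = {"stratum_1": population[:60], "stratum_2": population[60:]}
clusters = {"A": population[:25], "B": population[25:50],
            "C": population[50:75], "D": population[75:]}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)

print("simple random:", sorted(srs))
print("systematic   :", systematic)
print("stratified   :", stratified)
print("cluster size :", len(clustered))
```

With N = 100 and n = 10 the systematic sample uses the interval I = N/n = 10, so consecutive selected elements are always 10 positions apart.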
Determination of Sample Size (n), provided that the Population Size (N) is known

$$n \ge \frac{N}{1 + Ne^{2}}$$

N = population size
n = sample size
e = margin of error (0.10, 0.05, or 0.01)

$$n \ge \frac{N Z^{2} p(1 - p)}{N d^{2} + Z^{2} p(1 - p)}$$

Z = value of the normal variable for a reliability level
Z = 1.645 (90% reliability in obtaining the sample size)
Z = 1.96 (95% reliability in obtaining the sample size)
Z = 2.575 (99% reliability in obtaining the sample size)
p = 0.50 (proportion of getting a good sample)
(1 – p) = 0.50 (proportion of getting a poor sample)
d = 0.01, 0.025, 0.05, or 0.10 (choice of sampling error)
N = population size
n = sample size
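As a worked illustration of the two sample size formulas, the sketch below evaluates both for an assumed population of N = 1000. The population size, the function names, and the choice to round up to the next whole number (so that the inequality n ≥ ... holds) are assumptions made for the example.

```python
import math

def sample_size_margin(N, e):
    """n >= N / (1 + N e^2), rounded up to a whole number."""
    return math.ceil(N / (1 + N * e ** 2))

def sample_size_proportion(N, Z=1.96, p=0.50, d=0.05):
    """n >= N Z^2 p(1 - p) / (N d^2 + Z^2 p(1 - p)), rounded up."""
    numerator = N * Z ** 2 * p * (1 - p)
    denominator = N * d ** 2 + Z ** 2 * p * (1 - p)
    return math.ceil(numerator / denominator)

N = 1000  # assumed population size, for illustration only

n_margin = sample_size_margin(N, e=0.05)
n_proportion = sample_size_proportion(N, Z=1.96, p=0.50, d=0.05)

print(n_margin)      # 286
print(n_proportion)  # 278
```

For the first formula, 1000 / (1 + 1000 × 0.0025) = 285.7, which rounds up to 286. For the second, 960.4 / 3.4604 = 277.5, which rounds up to 278.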
NON-PROBABILITY SAMPLING
1) Accidental/Convenience Sampling – Simply use results that are readily available or accessible. Usually the first
person who comes along who typifies a unit of analysis serves as the respondent of the study.
2) Purposive Sampling – Implemented with the researcher defining a criterion or set of criteria for determining the
respondents of the study. It is the researcher’s judgment that becomes the basis for selecting an element or
group that will serve as the unit of analysis. It is useful in qualitative or exploratory studies. The objective is not
to have many respondents but to make sure that the person who would be interviewed will provide a wealth of
information. The aim is not to quantify but to characterize an event being studied.
3) Quota Sampling – Similar to stratified sampling except that the selection of the elements per stratum is done
without the application of a random sampling strategy. Quota sampling entails grouping elements according to
certain characteristics and ensuring that each group is represented. Quota sampling is helpful if the sampling
frame is not available per group or stratum. It refines the application of convenience sampling since there is
conscious intent on the part of the researcher to view the probable differences of every stratum or group with
regard to the critical variables of the study.
4) Snowball or Referral Sampling – Involves having a respondent refer other people who are in a position to
answer some of the questions of the researcher. This is particularly helpful in the study of highly sensitive
topics where the identity of respondents is difficult to divulge or may even be unknown to many. In other
words, if the sampling frame cannot be provided and the topic has security implications, a researcher could
obtain referrals from the first respondent to the other respondents who may be willing to talk.
A sampling error is the difference between a sample result and the true population result; such an error results from
chance sample fluctuations.
A nonsampling error occurs when the sample data are incorrectly collected, recorded, or analyzed (such as by selecting a
biased sample, using a defective measurement instrument, or copying the data incorrectly).
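A small simulation can make the idea of sampling error concrete. The sketch below assumes a hypothetical population of the numbers 1 to 100 and repeatedly draws random samples of 10; each sampling error is the sample mean minus the population mean. The population, the sample size, and the seed are all assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)               # fixed seed so the run is repeatable
population = list(range(1, 101))     # hypothetical population, N = 100
population_mean = mean(population)   # true population result

# Sampling error = sample result - true population result
errors = []
for _ in range(5):
    sample = rng.sample(population, 10)
    errors.append(mean(sample) - population_mean)

# A complete census (n = N) leaves no room for chance fluctuation
census_error = mean(rng.sample(population, len(population))) - population_mean

print("population mean:", population_mean)
print("sampling errors:", [round(e, 2) for e in errors])
print("census error   :", census_error)
```

The errors differ from sample to sample purely by chance, while the census error is zero. A nonsampling error, such as a miscopied value, would remain even in a census.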
Statistical data collected should be arranged in such a manner that will allow a reader to distinguish their essential
features. Depending on the type of information and the objectives of the person presenting the information, data may be
presented using one or a combination of three forms: TEXTUAL, TABULAR, and GRAPHICAL.
TEXTUAL FORM – The textual or paragraph form is utilized when the data to be presented are purely qualitative or
when very few numbers are involved. This method is, generally, not desirable when too many figures are involved as the
reader may fail to grasp the significance of certain quantitative relationships, but it becomes an effective device when the
objective is to call the reader’s attention to some data that require special emphasis.
Example:
s or time periods are chronologically arranged on the horizontal axis and the relevant values are indicated on the vertical axis. Variations in the data are
From a newspaper report, it was gathered that China has a population of 707 million, India has 505 million, the US has
207 million, the USSR (before the break-up) has 245 million, and Indonesia has 125 million. More than half of the
world’s people, about 2.1 billion, live in Asia, 456 million in Europe, 354 million in North America, 195 million in South
America, and 20 million in Oceania. Shanghai has 10,820,000; Tokyo has 8,841,000; New York has 7,895,000; and
Moscow has 7,050,000.
TABULAR FORM – A more effective device for presenting data because the data are presented in a more concise and
systematic manner. People who want to make some comparisons and draw relationships usually find the tabular arrangement
more convenient and understandable than the textual presentation. The data are presented through tables consisting of
vertical columns and horizontal rows with headings describing these rows and columns.
Example:
GRAPHICAL OR PICTORIAL FORM – Among the different methods of presenting data, the graph or chart is
perhaps the most effective device for attracting people’s attention. Readers who look for comparisons and trends may
skip statistical tables but may pause to examine graphs. A graph has a great advantage over a table because it conveys
quantitative values and comparisons more readily.
[Figure: graph with axes labeled Year and Population]
[Figure: chart with axes labeled Count and Average Number of Cigarette Smoke Per Day]
[Figure: chart of Count by Academic Performance]
well as chronological comparisons may be shown graphically by means of a bar graph.
A graph essentially consists of bars or rectangles which are draw
[Figure: bar graph of Count (0 to 20) by Academic Performance, with legend Gender/Sex (Female, Male). Categories: Excellent (Mostly 93 - 99); Good or Above Average (Mostly 87 - 92); Average (Mostly 81 - 86); Below Average (Mostly 75 - 80)]
[Figure: bar chart of Count (0 to 30) by Academic Performance. Categories: Excellent (Mostly 93 - 99); Good or Above Average (Mostly 87 - 92); Average (Mostly 81 - 86); Below Average (Mostly 75 - 80). Bar labels shown: 4, 7, 28, 11]
ng the relative magnitudes of the component parts of a whole. It is constructed by dividing a circle (pie) into sectors, each sector having a size proportio
[Figure: pie chart of Academic Performance. Legend: Excellent (Mostly 93 - 99); Good or Above Average (Mostly 87 - 92); Average (Mostly 81 - 86); Below Average (Mostly 75 - 80). Sector labels shown: 4, 11, 7, 28]
total. Each bar, representing 100%, is subdivided so that the length of each part corresponds to the proportion or percentage of the total number of cas
[Figure: vertical bar chart of Count (0 to 30) by Academic Performance, with legend Gender/Sex (Female, Male). Categories: Excellent (Mostly 93 - 99); Good or Above Average (Mostly 87 - 92); Average (Mostly 81 - 86); Below Average (Mostly 75 - 80)]
[Figure: horizontal bar chart of Count (0 to 30) by Academic Performance, with legend Gender/Sex (Female, Male), same categories]
for attracting attention since it employs pictures or symbols which are normally drawn of the same size and in rows. Large figures are generally shown
Statistical Maps – Statistical maps are used to present quantitative data which describe or classify geographical areas.
Frequency Distribution – A tabular arrangement of data showing its classification or grouping according to magnitude or
size.
Frequency Distribution
Table
When working with large data sets, it is generally helpful to organize and summarize the data by constructing a
frequency table.
Components of a Frequency Distribution
of values, along with frequencies (or counts) of the number of values that fall into each class. The frequency for a particular class is the
of a class. It is the highest (upper limit) and the lowest (lower limit) values that can go into each class.
mallest numbers that can belong to the different classes
rgest numbers that can belong to the different classes.
TS) – “true” class limits defined by lower and upper boundaries. Class boundaries are numbers used to separate classes, but without the g
– the average of the lower and upper limits or boundaries of each class.
e of values used in defining a class. Simply the length or width of a class. It is the difference between two consecutive lower class limits o
or class limits.
$$\text{Class interval} = \frac{\text{Range}}{\text{Tentative number of classes}}$$
Round up to use fewer decimal places or use numbers relevant to the situation.
ses. Some instructors made use of the Sturges Rule in determining the number of classes.
Sturges Rule:
$$K = 1 + 3.322 \log n$$
the arrayed data into appropriate classes using convenient and easy to read class limits. Start the first class with a lower limit either equal
p the class boundaries if necessary
nt or tally the number of observations into the appropriate class intervals.
Example:
“Ages of Oscar-winning Best Actors and Actresses” (Mathematics Teacher magazine), by Richard Brown and
Gretchen Davis, presents the results for recent winners from each category.
Actors: 32 37 36 32 51 53 33 61 35 45 55 69
76 37 42 40 32 60 38 56 48 48 40 43
62 43 42 44 41 56 39 46 31 47 45 60
Actresses: 50 44 35 80 26 28 41 21 61 38 49 33
74 30 33 41 31 35 41 42 37 26 34 34
35 26 61 60 34 24 30 37 31 27 39 34
dard frequency table is used when cumulative totals are desired. The cumulative frequency for a class is the sum of the frequencies for th
||| 3 61–65
|||||-|| 7 66–
70 0
71–75 | 1
76–80 || 2
---------------
n = 72
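The construction of a frequency distribution with the Sturges Rule can be sketched in Python using the 36 actors' ages listed above. Several choices here are assumptions made for illustration: the logarithm is taken to base 10 (3.322 is approximately 1 / log10 2), K is rounded to the nearest whole number, the class interval is rounded up, and the first class starts at the smallest age. The resulting classes therefore need not match any table built with other class limits.

```python
import math

actor_ages = [32, 37, 36, 32, 51, 53, 33, 61, 35, 45, 55, 69,
              76, 37, 42, 40, 32, 60, 38, 56, 48, 48, 40, 43,
              62, 43, 42, 44, 41, 56, 39, 46, 31, 47, 45, 60]

n = len(actor_ages)
k = round(1 + 3.322 * math.log10(n))        # Sturges Rule (rounding is an assumption)
data_range = max(actor_ages) - min(actor_ages)
width = math.ceil(data_range / k)           # class interval, rounded up

lower = min(actor_ages)                     # lower limit of the first class
if lower + k * width - 1 < max(actor_ages):
    k += 1                                  # add a class if the largest value is not covered

table = []
cumulative = 0
for i in range(k):
    lo = lower + i * width                  # lower class limit
    hi = lo + width - 1                     # upper class limit
    mark = (lo + hi) / 2                    # class mark (midpoint)
    freq = sum(lo <= age <= hi for age in actor_ages)
    cumulative += freq                      # running total of frequencies
    table.append((lo, hi, mark, freq, cumulative))

print(f"n = {n}, K = {k}, class interval = {width}")
print("Class    Mark   f   cf")
for lo, hi, mark, freq, cf in table:
    print(f"{lo}-{hi}   {mark:5.1f}  {freq:2d}  {cf:3d}")
```

For these ages the rule gives K = 6 classes with a class interval of 8 (31-38, 39-46, and so on), and the last cumulative frequency equals n = 36, which is a quick check that every observation was tallied exactly once.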