By
A MODULE
For Exclusive Use of Graduate Students
TABLE OF CONTENTS
Title Page
Bibliography
Appendix
Module 1. Definition and Function of Statistics
sample size
measures.
In its plural form, statistics refers to any kind of data, both qualitative and quantitative;
in its singular form, it is a branch of knowledge that deals with the processes of collection,
presentation, analysis and interpretation of data obtained through surveys and
experiments. You will study statistics as a branch of scientific methodology. As such, its
essential purpose is to describe and draw inferences about the numerical properties of
Collection. Data can be collected through an interview, either formal or casual. A
formal interview entails the preparation of guide questions and/or an approved schedule for the
interviewees. A casual interview is done informally, without guide questions. Data can also be
collected with the use of survey questionnaires, standard or researcher-made tests, use of
Presentation of Data. Data can be presented in textual, tabular and graphical forms.
For example, a researcher was able to gather the following data on the expenditures at ABC
P2,117,680.75; 2012, P1,986,592; 2013, P1,876,458; and P974,697 in 2014. These can be
Textual
For the past five years, ABC Company has a total expenditure of P2,416,025 in 2010;
Tabular
Table 1. Total Expenditures
of ABC Co. from 2010-2014
______________________________
Year Expenditure
______________________________
2010 P2,416,025.00
2011 2,117,680.75
2012 1,986,592.00
2013 1,876,458.00
2014 974,697.00
______________________________
Graphical
a. Polygon (line)
b. Histogram (bar)
Note that statistical data are frequently arranged and presented in the form
of tables. These tables are designed to enable readers to grasp, with minimal effort, the
information they are intended to convey. In constructing tables for insertion in term papers, theses, or
2. the title should be precise, stating clearly what the table is all about
sequence
enable us to think about a problem in visual terms as a geometric image of a set of data. One
is the histogram, which is presented in the form of bars. Another is the polygon, which is in the form
of lines. Using the histogram or polygon, the reader can easily visualize intervals, and
comparisons that are important in the analysis and interpretation of data can easily be made. Graphics
answer questions such as: What is the increase...improvement...? Later, you will
Activity 1.
Enumerate at least five of them. Collect one type of data and present this using any
B. The following are a firm’s quarterly sales in thousands of pesos in each of the three major
markets, A, B and C.
Market
A B C
First Quarter 225 110 350
Second Quarter 175 180 290
Third Quarter 120 100 425
Fourth Quarter 210 220 510
statistics, the term generally refers to defined groups or aggregates of people, animals, objects,
materials, measurements, or events of any kind. The statistician’s concern is with properties
which are descriptive of the group. The entire group is called population. For example, we
refer to a population of children with IQ of 90 and above; factories involved in the production
In research, studying the entire population is often not practical. It is time-consuming and
expensive; thus, we resort to studying a sample of this population. In other words, we
select only a small group, called a sample, for actual testing. This sample is small but
representative of the population. By representative, we mean that the sample should bear the
the population is characterized to be 5’6” tall and above, both male and female, aggressive,
and 50 to 65 years of age, the selected sample must contain all those characteristics.
Selecting the members that make up the sample employs one of the following sampling
techniques.
Probability Sampling
This technique means that every member of the population has a chance of being
selected, and the selection of one member does not influence that of another. Sampling procedures
Random Sampling. This sampling technique may use the following procedures:
a) lottery - names of all members are put inside a box. The researcher just
b) use of the Table of Random Numbers - The researcher may want to select
members picked from the table from 0001 to 8,500 will comprise the
sample.
procedures as basis for choosing the sample. Say, if 10 will be chosen from
100 ÷ 10 = 10
From the result, get every 10th member. The sample of 10 are: 10, 20, 30,
40, 50, 60, 70, 80, 90, and 100. The procedure is to provide the population
d) Stratified Random Sampling - is used to pick sample members when the
the attitude of its clients on the benefits given by the company. The
company decides to pick a sample of 200 from 2,000 clients, but the clients
vary from one another in terms of age and monthly or yearly premium paid. It
is appropriate to subdivide the clients into subgroups or strata and then for
Population Sample
Other sampling procedures may be applied, such as cluster sampling, which is the successive
sampling of units. The same procedures as in stratified sampling are used, depending upon the
characteristics of the population and the purpose of the study. Finally, in all these procedures,
the researcher applies simple random sampling after all subgroups, clusters or units have been
formed.
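The systematic procedure described above (dividing the population size by the sample size and taking every k-th member) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the members are assumed to be numbered 1 to 100 as in the 100 ÷ 10 example, and simple random sampling is shown with the standard library's random module.

```python
import random


def systematic_sample(population, sample_size):
    """Take every k-th member, where k = population size // sample size."""
    members = list(population)
    k = len(members) // sample_size            # sampling interval, e.g. 100 // 10 = 10
    return members[k - 1::k][:sample_size]     # the 10th, 20th, ... member


def simple_random_sample(population, sample_size, seed=None):
    """Draw members at random without replacement (like a lottery)."""
    rng = random.Random(seed)                  # seed is optional, for repeatable draws
    return rng.sample(list(population), sample_size)


members = range(1, 101)                        # assumed numbering of 100 members
print(systematic_sample(members, 10))          # 10, 20, 30, ..., 100
print(simple_random_sample(members, 10, seed=1))
```

The systematic draw reproduces the sample 10, 20, ..., 100 given in the text; the random draw changes with the seed.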
Non-probability Sampling
This technique does not use random sampling. It is weak, yet for special purposes it
can be used with care in selecting samples by “replicating studies with different sample”
(Kerlinger, 1986: 119).
a) Quota Sampling – uses knowledge of strata of the population such as sex, race,
region, and so on as basis for selecting the sample members. Each of the strata is
Sample Size
Every researcher may ask himself a question like, “What is the right sample size?”
A rough-and-ready rule is “Use as large a sample as possible.” Another question that must be
asked is: How much error is likely to be incurred given the sample size? Below is a figure
that shows the relationship between sample size and error, the deviation from population
values. The curve shows that the smaller the sample, the larger the error, and the larger the sample
[Figure: error (vertical axis, small to large) plotted against size of sample (horizontal axis, small to large)]
and ideally it must provide the whole of the information about the population from which the
sample has been drawn. A sample to be representative of the population should consider two
important aspects:
Population Characteristics
The first can be answered by sampling techniques. The second is in choosing the sample size.
The researcher should consider the following observations in sampling (Srivastava, 1994: 52-53).
1. The larger the sample, the smaller the magnitude of sampling error.
3. When sample groups are to be subdivided into smaller groups, the researcher
should select a large enough sample so that the subgroups are of adequate size for
the purpose.
4. Subject availability and cost factors are legitimate considerations in determining
sample size.
the following formula (Slovin, 1960) may be used, considering a predetermined sampling error:
$$n = \frac{N}{1 + N e^{2}}$$ where
n = Sample size
N = Population size
e = Sampling error
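As a quick check of the formula, the sketch below computes n for a given N and e. The function name is hypothetical, the population of 2,000 and the 5% error are illustrative figures only, and rounding the result up to the next whole person is an assumption (a common convention, not something the text prescribes).

```python
import math


def slovin_sample_size(population_size, sampling_error):
    """n = N / (1 + N * e^2), rounded up to a whole number (assumed convention)."""
    n = population_size / (1 + population_size * sampling_error ** 2)
    return math.ceil(n)


# Illustrative values: N = 2,000 and e = 0.05
print(slovin_sample_size(2000, 0.05))   # 2000 / (1 + 2000 * 0.0025) = 333.3..., rounded up to 334
```

A smaller sampling error gives a larger required sample, which agrees with the curve of error against sample size.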
Activity 2.
N n for a n for b
1. 500 workers of GSIS ___ _____
Sampling Procedures:
a. using systematic sampling
B. From the above data, how would you classify them into subgroups?
Lesson 3. Measurement of Variables
A variable refers to a property whereby the members of a group or set differ from one
another. Members of a group may differ in sex, age, performance or height. These are variables.
Variables are labeled with the use of numbers. These values are called variates. For example,
Sex Value
Male 2
Female 1
Age Value
These values are important in statistics because these correspond to the scores that are
measured.
Classification of Variables
Variables can be classified as dependent or independent. Consider the expression y
Let’s consider two variables with functional relationship such as age and performance. Here,
performance is dependent on age. Therefore, age is the independent variable and performance
is the dependent variable. Give your own examples of pairs of variables which have functional
relationships.
Another classification is into continuous and discrete variables. A continuous variable may
take any value within a defined range of values. Examples of continuous variables are height
and weight. A discrete variable can take specific values only. For example, the size of a family,
Examples:
Color and brand of cars are nominal variables because you can only state that two cars have
the “same” or a “different” color or brand. You cannot say that blue is greater than red.
Height is treated as ordinal when people in the group are simply ranked from shortest to tallest.
Temperature (in degrees Celsius or Fahrenheit) is an interval variable; it does not have a true
zero point. Variables that have a true zero point, such as age, achievement, length and weight, are ratio
variables.
Statistical methods exist for the analysis of data composed of nominal, ordinal,
interval, and ratio variables. The kind of measurement a variable has is important in deciding
which statistical test to use. Variables are the stuff of which statistics is made. You
will understand variables better when you study inferential statistics – when
Measuring Variables
Some variables are easy to measure, for example, age in years. So we say that the two
samples are 15 and 12 years of age. In research, there are variables which need to be described
or defined by looking into some indicators before one can measure them. Consider, for instance,
productivity. What indicators will you use to measure productivity? Or how will you define
productivity? You may consider productivity in terms of the number of boxes/items sold
during the day. In this case, you can measure it using the following quantitative label/code
and description:
Boxes/items sold    Code    Description
1 - 5                1      Poor
6 - 10               2      Fair
11 - 15              3      Good
16 - 20              4      Very Good
21 and above         5      Excellent
Activity 3
1. age
2. color of the eyes
3. number of residents in a community
4. number of employees
5. type of building
B. Identify whether the following variables are nominal, ordinal, interval and
1. Race                      6. Efficiency
2. Religion                  7. Library collection
3. Intelligence quotient     8. Managerial skill
4. Productivity              9. Attitude
5. Performance              10. Weight
C. Choose pairs of variables with functional relationships, label them and then
E. Write down 10 variables in your own discipline and classify these according to
type and identify how you will measure each one.
MODULE 2. DESCRIPTIVE STATISTICS
Objectives: At the end of this module, the students should be able to:
polygon, and ogive to show the kind of distributions the data have.
3. Compare and contrast the different measures of location, dispersion, skewness, and
The most commonly used form of notation in statistics is summation notation. A set of
values, measurements or observations is denoted by x1, x2, x3, …, xn, where n denotes the total
number of observations represented by x. So, we say that the summation of the xi observations where
n = 5 is
$$x_1 + x_2 + x_3 + x_4 + x_5 = \sum_{i=1}^{5} x_i = \sum x_i$$
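Theorem 1. If every variate value in a group is multiplied by a constant number or factor, that
factor may be removed from under the summation sign and written outside as a factor. Thus,
$$\sum_{i=1}^{N} c\,x_i = c \sum_{i=1}^{N} x_i$$
Theorem 2. The summation of a constant c over N terms is N times the constant:
$$\sum_{i=1}^{N} c = c + c + \dots + c = Nc$$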
Theorem 3. The summation of the sum of any number of terms is the sum of the summations of
the separate terms:
$$\sum (x_i + y_i + z_i) = \sum x_i + \sum y_i + \sum z_i$$
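The three summation rules (a constant factor can be taken outside the sum, the sum of a constant is N times the constant, and the sum of a sum is the sum of the separate sums) can be checked numerically. The values of x, y and c below are hypothetical and chosen only for illustration.

```python
# Hypothetical values used only to illustrate the summation rules
x = [3, 8, 4]
y = [1, 5, 9]
c = 2
n = len(x)

# Rule 1: a constant factor can be moved outside the summation sign
rule_1_holds = sum(c * xi for xi in x) == c * sum(x)

# Rule 2: summing a constant n times gives n * c
rule_2_holds = sum(c for _ in range(n)) == n * c

# Rule 3: the sum of (x + y) equals the sum of x plus the sum of y
rule_3_holds = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

print(rule_1_holds, rule_2_holds, rule_3_holds)
```

Changing the numbers in x, y or c does not change the outcome, since the rules hold for any values.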
Activity 1. Given
x1 = 5 y1 = 2
x2 = 6 y2 = 3
x3 = 12 y3 = 7
x4 = 15 y4 = 10
Find
1. $\sum x_i$
2. $\sum y_i$
3. $\sum (x_i + y_i)$
4. $\sum (3x_i + y_i)$
5. $\sum (x_i - y_i)$
of occurrences of values, especially for voluminous data. This arrangement shows how many
times each score occurs. For voluminous data it is advisable to reduce the number of entries by
grouping the individual scores into any desired number of classes. Some values can be grouped together
at a given class interval. Consider the following set of data (IQ scores of 45 applicants):
91 88 106 115 96 90
95 105 103 96 97 87
101 100 89 90 96 84
99 86 90 84 100 112
105 94 90 89 98
91 90 92 88 104
84 89 107 91 100
Steps in making the frequency table:
2. Find the range by getting the difference between the highest and lowest values.
Range = 115 – 84 = 31
3. Divide the range by any desired interval (2,3,5,7,…). The answer should be equal
to or greater than 9 but not more than 20. 31 divided by 3 is approximately 10,
4. Now, what number is nearest the highest value, 115, that is divisible by 3? The
answer is 114. Therefore, the first group or class starts with a class interval 114-
116 and the last group should contain the lowest score (in this example, it is 84).
5. Make the class intervals and complete the information you want to include
in the table, such as the frequency, midpoint, limits (lower and upper limits), and
Many observations can be made about the distribution. The midpoint is the middle value of
every class interval. This midpoint can be used to represent the whole class, such as when you are
making the graph. The cumulative frequencies "less than" and "greater than" are obtained by adding the
frequencies successively from below and from above, respectively. Percentages can also be computed for
cumulative frequencies by dividing the corresponding frequency (e.g., 36/45 x 100) by the total
A histogram and ogive (graph for cum f< and cum f>) can be made as follows
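[Figure: histogram of the frequency distribution, with class midpoints on the horizontal axis]
[Figure: ogive of the cumulative frequencies (vertical scale 10 to 45), with class midpoints on the horizontal axis]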
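The steps for building the frequency table can be sketched in Python. This is an illustration under stated assumptions: the function name is hypothetical, a class width of 3 is used as in the worked steps, and the scores are the 39 values that are reproduced in this copy of the data set (the text speaks of 45 applicants). Each class starts at a lower limit divisible by the width, so the top class is 114-116 and the bottom class contains 84.

```python
scores = [
    91, 88, 106, 115, 96, 90,
    95, 105, 103, 96, 97, 87,
    101, 100, 89, 90, 96, 84,
    99, 86, 90, 84, 100, 112,
    105, 94, 90, 89, 98,
    91, 90, 92, 88, 104,
    84, 89, 107, 91, 100,
]


def frequency_table(values, width):
    """Group values into classes whose lower limits are divisible by `width`."""
    lowest_limit = min(values) - min(values) % width
    highest_limit = max(values) - max(values) % width
    rows = []
    for lower in range(lowest_limit, highest_limit + 1, width):
        upper = lower + width - 1
        frequency = sum(lower <= v <= upper for v in values)
        rows.append({"lower": lower, "upper": upper,
                     "midpoint": (lower + upper) / 2, "f": frequency})
    running = 0
    for row in rows:                  # "less than": add from the lowest class upward
        running += row["f"]
        row["cum_f_less"] = running
    running = 0
    for row in reversed(rows):        # "greater than": add from the highest class downward
        running += row["f"]
        row["cum_f_greater"] = running
    return rows


table = frequency_table(scores, 3)
for row in reversed(table):           # print the highest class first, as in the steps
    print(row["lower"], "-", row["upper"], row["midpoint"],
          row["f"], row["cum_f_less"], row["cum_f_greater"])
```

The midpoints and the two cumulative columns printed here are the values that would be plotted in the histogram and the ogive.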
Activity 2
A. For the following ungrouped or raw data, make the frequency table with
appropriate interval and then draw its graphical representations such as the histogram and
ogive.
2. What per cent of applicants are above average with IQ score of 96 and
above?
3. If you decide to pass a score of 95 and above, how many of the applicants
Date:
After the collection of data from a common source, individual observations are not
likely to have the same value. It is impractical to keep all the values in mind. What we need is
a single value that we may consider typical of the set of data as a whole. This single value is one
of the three most common measures, called measures of location or central tendency: the
Mode: The mode is the most frequently appearing value. In the measurements above,
the mode is 30, which appears three times. In some sets of data, only one mode is present
(unimodal); if there are two, the set is bimodal; if there are three or more, it is multimodal.
Median. The median is the middle value of an ordered array when the items are
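Where n = 9
Median position = (n + 1)/2 = (9 + 1)/2 = 10/2 = 5th value (counted from either the left or the right)
Median = 18
Where n = 8
Median position = n/2 = 8/2 = 4th value, counted from the left and from the right; the median is the average of the two
Counted from the left, the 4th number is 12; counted from the right, the 4th number is 15
Median = (12 + 15)/2 = 13.5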
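The position rule for the median can be written as a short function. The two data sets below are hypothetical (the original arrays are not reproduced in this copy); they were chosen only so that the 5th of nine values is 18 and the two middle values of eight are 12 and 15, as in the text.

```python
def median(values):
    """Middle value of the ordered data; the average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # the ((n + 1) / 2)-th value
    return (ordered[middle - 1] + ordered[middle]) / 2  # average of the two middle values


# Hypothetical data sets consistent with the positions described in the text
odd_set = [3, 7, 9, 12, 18, 20, 21, 25, 30]    # n = 9, the 5th value is 18
even_set = [2, 5, 9, 12, 15, 20, 22, 30]       # n = 8, middle values are 12 and 15

print(median(odd_set))
print(median(even_set))
```

Python's standard library offers the same calculation as statistics.median.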
Properties of the Median
2. The median is not affected by extreme values whereas the mean is.
3. The median can be used to characterize qualitative data that can be ranked, e.g. quality categories
such as
1 good
2 better
3 best
Mean: The most familiar measure of central tendency is the mean, sometimes called
Arithmetic mean or average. We find it by adding the values in a set of data and dividing the
For example:
$$\bar{x} = \frac{\sum x}{n}$$
$$\bar{x} = \frac{102}{6} = 17$$
1. For a given set of data, there is one and only one mean.
2. Since every value goes into its computation, it is affected by the magnitude of
each value.
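Given the following frequency curves, the mode, median and mean are located
[Figure 1: symmetrical curve; the mean, median and mode fall at the same point]
[Figure 2: skewed curve; from left to right: mean, median, mode]
[Figure 3: skewed curve; from left to right: mode, median, mean]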
A comparison of the mean, median and mode may be made when all these have been
calculated for the same frequency distribution. In Figure 1, the distribution is symmetrical. The
mean, median and mode coincide. In Figure 2, the distribution is negatively skewed. The
mean is less than the median and also less than the mode. In Figure 3, the distribution is
positively skewed, where the mean is greater than the median and the mode.
Activity 3
1. In the following sets of data, find the mode, median and mean
2. Locate the mode, median and mean for sets A and C using a frequency curve.
A 9 13 15 18 20
B 2 8 10 27 28
The two samples above have the same mean, 15; however, by inspection, the variation in sample B
is greater than that in sample A. Among the various measures used to describe variation are the
1. range
2. mean deviation
3. standard deviation
4. variance
Range. The range is the simplest measure of variation. It is taken as the difference between
Mean Deviation. The mean deviation is the sum of the absolute deviations of every
measurement from the mean, divided by the number of observations. Absolute value means
Sample B: 1, 4, 7, 10, 13
variation. Computing the mean deviation (MD) for samples B and C, we have,
        Sample B                        Sample C
  x    x - x̄    |x - x̄|         x    x - x̄    |x - x̄|
  1     -6        6              1     -15       15
  4     -3        3              5     -11       11
  7      0        0             20       4        4
 10      3        3             25       9        9
 13      6        6             29      13       13
             sum = 18                        sum = 52

$$MD = \frac{\sum |x - \bar{x}|}{n}$$
Sample B: MD = 18/5 = 3.6
Sample C: MD = 52/5 = 10.4
From the result above, the mean deviation for Sample C is 10.4; that is, its measurements
deviate from the mean by 10.4 on average. Sample B deviates by only 3.6 from the sample mean.
Therefore, we can say that the values in Sample C are more variable than those in Sample B.
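The mean deviation of the two samples can be verified with a few lines of Python. The function name is hypothetical; the data are the Sample B and Sample C values from the table above.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


sample_b = [1, 4, 7, 10, 13]
sample_c = [1, 5, 20, 25, 29]

print(mean_deviation(sample_b))   # 18 / 5
print(mean_deviation(sample_c))   # 52 / 5
```

The larger value for Sample C reflects its wider spread around the mean.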
In our computation of mean deviations, some deviations from the mean are
positive and others are negative. Because the sum of the deviations is always equal to 0,
we take the absolute values of the deviations. In the case of the variance and standard
deviation, we square the deviations to do away with negative values. Thus, in our
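  x     x - x̄    (x - x̄)²
___________________________
  1      -6        36
  4      -3         9
  7       0         0
 10       3         9
 13       6        36
sum = 35      sum = 90
x̄ = 35/5 = 7

$$s^2 \text{ (variance)} = \frac{\sum (x - \bar{x})^2}{n} = \frac{90}{5} = 18 \quad\text{or}\quad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{90}{4} = 22.5$$
$$s \text{ (standard deviation)} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{90}{5}} = 4.2 \quad\text{or}\quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{90}{4}} = 4.7$$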
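The two versions of the variance and standard deviation (dividing by n or by n - 1) correspond to the "population" and "sample" functions in Python's standard statistics module. The sketch below applies them to the values 1, 4, 7, 10, 13 used in the computation.

```python
import statistics

values = [1, 4, 7, 10, 13]

variance_n = statistics.pvariance(values)           # divides by n:     90 / 5
variance_n_minus_1 = statistics.variance(values)    # divides by n - 1: 90 / 4
sd_n = statistics.pstdev(values)                    # square root of 90 / 5
sd_n_minus_1 = statistics.stdev(values)             # square root of 90 / 4

print(variance_n, variance_n_minus_1)
print(round(sd_n, 1), round(sd_n_minus_1, 1))
```

Dividing by n - 1 always gives the slightly larger figure, and the gap narrows as the sample grows.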
The formulas above use N and N – 1. Which one should be used? Remember that we want
an unbiased estimate of the population variance. Because the deviations are taken from the sample
mean, only N – 1 of them are free to vary, and dividing by N – 1 corrects the tendency of the
sample variance to underestimate the population variance. So, for an unbiased estimate we always use the
We noted from our computations that when all the values in a set of data are located
near their mean, there is a small amount of variation or dispersion. A set of data in which
some values are located far from the mean has a large amount of dispersion. Expressing these
relationships in terms of the standard deviation, we say that the standard deviation is small when
the values are concentrated near the mean. When it is large, the values are dispersed widely
Activity 4
A. The following table gives the details of an investment consisting of the prices of
Lesson 5. Measures of Skewness and Kurtosis
Figure 1
Figure 2
Figure 3
SET B. Kurtosis of a Distribution
Figure 4
Figure 5
Figure 6
Skewness refers to the symmetry or asymmetry of the frequency distribution. Figure
1 is symmetrical, that is, half of its frequencies fall to the left of the mean and the other half to the
[Figure: horizontal axis marked from -3 to 3]
Figure 2 is negatively skewed, that is, more frequencies are concentrated at the right
end of the distribution. Figure 3 is positively skewed, that is, more frequencies are found at the left
end.
Kurtosis refers to the flatness or peakedness of the distribution. One kind, the normal
distribution, is called mesokurtic (see Figure 4). Another kind, in which the frequencies are
spread evenly from the left end to the right, is flat and is called platykurtic (see Figure 5). The third kind,
in which the frequencies are more concentrated at the middle, is more peaked and is called leptokurtic
To find out the symmetry and kurtosis of a given distribution, we use the concept of
moments. The term “moments” originates in mechanics (Ferguson, 1982: 72). The first four
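$$m_1 = \frac{\sum (x - \bar{x})}{N} = 0$$
$$m_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{N - 1}{N}\,s^2$$
$$m_3 = \frac{\sum (x - \bar{x})^3}{N}$$
$$m_4 = \frac{\sum (x - \bar{x})^4}{N}$$
$$g_1 = \frac{m_3}{m_2\sqrt{m_2}}$$ where
g1 = skewness
m3 = third moment
m2 = second moment
$$g_2 = \frac{m_4}{(m_2)^2} - 3$$ where
g2 = kurtosis
m4 = fourth moment
m2 = second moment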
If g1 = 0 and m3 = 0, the distribution is symmetrical.
If g2 = 0, it is a normal distribution.
If g2 < 0, it is platykurtic.
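A: 6 8 10 12 14

  x     x - x̄    (x - x̄)²    (x - x̄)³    (x - x̄)⁴
  6      -4        16          -64          256
  8      -2         4           -8           16
 10       0         0            0            0
 12       2         4            8           16
 14       4        16           64          256
sum: 50   0        40            0          544

$$\bar{x} = \frac{50}{5} = 10$$
$$m_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{40}{5} = 8$$
$$m_3 = \frac{\sum (x - \bar{x})^3}{N} = \frac{0}{5} = 0$$
$$g_1 = \frac{m_3}{m_2\sqrt{m_2}} = \frac{0}{8\sqrt{8}} = 0$$
g1 = 0, symmetrical
$$m_4 = \frac{\sum (x - \bar{x})^4}{N} = \frac{544}{5} = 108.8$$
$$g_2 = \frac{m_4}{(m_2)^2} - 3 = \frac{108.8}{8^2} - 3 = \frac{108.8}{64} - 3 = 1.7 - 3$$
g2 = -1.3, platykurtic.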
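The moment-based computation of skewness and kurtosis for set A (6, 8, 10, 12, 14) can be checked with the sketch below. The function names are hypothetical; the formulas are the ones used in the worked example, with every moment divided by N.

```python
import math


def central_moment(values, order):
    """Moment about the mean: sum((x - mean) ** order) / N."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """g1 = m3 / (m2 * sqrt(m2))."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / (m2 * math.sqrt(m2))


def kurtosis(values):
    """g2 = m4 / m2 ** 2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


set_a = [6, 8, 10, 12, 14]
print(central_moment(set_a, 2))       # second moment: 40 / 5
print(skewness(set_a))                # 0 for this symmetrical set
print(round(kurtosis(set_a), 1))      # negative, so the set is platykurtic
```

The same functions can be applied to the data sets in Activity 5.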
Activity 5
SET A.
9, 12, 18, 24, 6, 12, 10, 24, 28, 15, 36, 40, 27, 9, 10, 10,
SET B:
B. Decide whether the following are flat or peaked by computing the kurtosis
48, 56, 38, 20, 76, 84, 29, 37, 35, 58, 60, 64, 78, 100, 96
84, 80, 90, 90, 92, 78, 86, 72, 59, 60, 59, 54
75, 35, 45
Module 3. Inferential Statistics
Objectives: At the end of the module, the students should be able to:
a. test of relationship
b. test of proportions
hypothesis.
population parameters using sample data. The purpose of hypothesis testing is to help one
reach a decision about a population by examining the data contained in a sample from that
population.
difference. For example, the statement that 60% of the employees in a firm have had
one to two years of college. The null hypothesis is the statement that the proportion is 60%; symbolically,
H0 : p = 0.60 (null)
H1 : p ≠ 0.60 (alternate)
     p > 0.60
     p < 0.60
Another example is when a researcher wants to find out whether male workers have a higher work
performance (mean performance = 96) than female workers (mean performance = 84). The null hypothesis is
H0 : x̄1 = x̄2 where
x̄1 = mean of males
x̄2 = mean of females
2. Identify what test statistic is appropriate to use to test the null hypothesis.
tc > ttab (and likewise for r, F and χ²)
To accept: The value of the test statistic should be lower than the tabular
tc < ttab (and likewise for r, F and χ²)
[Figure: distribution curve on a horizontal axis from -3 to 3, marking the assumed tabular value (2.47) and the assumed computed value (3.29)]
The computed test statistic falls either in the region of rejection or in the region of acceptance, depending on its
value relative to the tabular value. In the above figure, the null hypothesis is rejected. Accept the
5. Decide on the level of significance. In most cases, the 5% (0.05) and
1% (0.01) levels are used, for both two-tailed and one-tailed tests. When do you decide
predetermined bias against one group or the researcher has no idea that one group is better than
the other (this is when testing differences between two groups); this is also called non-
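H0 : x̄1 = x̄2
H1 : x̄1 ≠ x̄2 (that is, x̄1 > x̄2 or x̄1 < x̄2)
[Figure: two-tailed test. The region of acceptance lies in the middle (1 - α = .95) and a region of rejection lies in each tail (α = .05, so α/2 = 0.025 per tail). Critical values: -1.96 and 1.96 at the .05 level (0.025 per tail); -2.58 and 2.58 at the .01 level (0.005 per tail).]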
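The critical values 1.96 and 2.58 in the figure are points on the standard normal curve that cut off α/2 in each tail. A minimal sketch, assuming Python 3.8 or later (for statistics.NormalDist) and a hypothetical function name, recovers them from α:

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """z value that cuts off alpha (one tail) or alpha / 2 (each of two tails)."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


print(round(critical_z(0.05), 2))                     # two-tailed, .05 level
print(round(critical_z(0.01), 2))                     # two-tailed, .01 level
print(round(critical_z(0.05, two_tailed=False), 3))   # one-tailed, .05 level
```

These values apply to the normal (z) distribution only; the tabular values for t, r, F and chi-square depend on the degrees of freedom.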
One-tailed: a one-directional test has a predetermined bias against one group.
For example, the researcher wants to test if boys have a higher performance in
H0 : x̄boys = x̄girls
Note that these values at the .05 and .01 levels vary from one statistic to another.
6. Reject or accept the null hypothesis given the degrees of freedom (df) and level of
significance.
7. Make your conclusion. Conclusions depend upon your decision on the null hypothesis.
When you reject the null hypothesis, your conclusion is based on the alternate
hypothesis, the accepted hypothesis. For example, if you rejected the null hypothesis:
H0 : x̄1 = x̄2
H1 : x̄1 ≠ x̄2
Then you accept the alternate hypothesis (H1); that means your conclusion is that there is a
Activity 1.
1. The quality control department of a food processing firm found out that the
mean net weight per package of cereal must not be less than 30 ounces. In a
2. One psychologist found out that the mean sociability index of sales
B. State some inferences that you know and then transform these into null form (using
symbols or descriptions).
Our concern here is with the problem of describing the degree or magnitude of the relation
between two variables. The statistic used is called the correlation coefficient, using the measure
type of variables.
Researchers may want to find out what relation exists between attitude
(x) and work performance (y) of employees. Two paired measurements can be computed
Xi    Yi
x1    y1
x2    y2
x3    y3
x4    y4
x5    y5
The formula is defined as
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$
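The raw-score formula for r translates directly into code. In the sketch below the function name is hypothetical and the five paired scores are made-up illustration data, not the wage and cost-of-living figures of the worked example.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores for illustration only
attitude = [1, 2, 3, 4, 5]
performance = [2, 3, 5, 4, 6]
print(pearson_r(attitude, performance))   # (5*69 - 15*20) / sqrt(50 * 50) = 45 / 50
```

A value close to +1 or -1 indicates a strong relationship; a value near 0 indicates little or no linear relationship.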
The degree of relationship between x and y may take values from –1 to +1. There is a
[Figure: four scatter diagrams of y against x: perfect positive relation (r = 1); perfect negative relation (r = –1); r between –1 and 1; r = 0]
A positive relationship between attitude (x) and work performance (y) can be stated and
interpreted as
a) The more positive the attitude a worker has, the better his
performance is (r is positive)
X y
+ + Positive r (High)
- -
+ -
- + Negative r
x wage
y cost of living
d. $\sum XY$, or 832
$$r = \frac{8(832) - 89(57)}{\sqrt{\left[8(1365) - (89)^2\right]\left[8(529) - (57)^2\right]}}$$
Therefore, the relationship is: the higher the salary of an employee, the higher his
cost of living is, and the lower the salary, the lower the cost of living.
When r is squared,
$$r^2 = (.91)^2 = 0.83$$
This result can also be interpreted as follows: 83% of the variation in cost of living is accounted for by wage and 17% is attributable to
Prediction
When two variables are highly correlated, prediction of one variable is possible from
a knowledge of the other. The presence of a nonzero correlation ( r ) between x and y implies
that if we know something about x, we know something about y and vice versa. If knowing x
implies some knowledge of y, a prediction of y from x is possible. The greater the value of the
correlation between x and y, the more accurate the prediction of one variable from the other.
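$$y' = b_{yx}\,x + a_{yx}$$ where
y' = predicted value of y
b = slope (the ratio AC/BC in the figure)
a = constant
$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$$
$$a_{yx} = \frac{\sum Y - b_{yx}\sum X}{N}$$
[Figure: regression line of y on x with triangle ABC; slope = AC/BC]
Similarly, $x' = b_{xy}\,y + a_{xy}$ where
x' = predicted value of x
b = slope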
$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$$
8 (1356_) – (89)
N = 57 –47.17
= 10.60 + 1.23
= 11.83 or 12
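Prediction with a fitted line can be sketched as follows. The function names and the five data pairs are hypothetical. The last line assumes a slope of 0.53, a constant of 1.23 and x = 20, which is one reading consistent with the figures 10.60 + 1.23 = 11.83 above; the underlying data are not reproduced in this copy, so treat those three numbers as an assumption.

```python
def fit_line(xs, ys):
    """Least-squares slope and constant for predicting y from x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    constant = (sum_y - slope * sum_x) / n
    return slope, constant


def predict(x, slope, constant):
    """Predicted value y' = slope * x + constant."""
    return slope * x + constant


# Hypothetical data for illustration only
slope, constant = fit_line([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(slope, constant)

# Assumed values (slope 0.53, constant 1.23, x = 20) matching 10.60 + 1.23 above
print(round(predict(20, 0.53, 1.23), 2))
```

Predictions are most trustworthy for x values inside the range of the data used to fit the line.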
Involves multiple correlation (R). The computation is tedious, so it is best left to the
computer. The last chapter will give you the output, the correlation matrix
Note that for every pair of variables, one variable is the independent variable and the
other is the dependent variable, the variable that is influenced by the other. In our example
about wage (x) and cost of living (y), y is the one influenced by wage, that is, the amount we
To test if the degree of relationship is significant, compare the computed r with the
tabular r given N – 1 degrees of freedom at the .05 or .01 level. For example, if r = 0.91 with df =
7, we compare it with the tabular r (.05) and r (.01) = 0.798. Since r computed = 0.91 > r tabular (or
H0 : rc = rt
H1 : rc ≠ rt
Our conclusion is that there is a significant relationship between wage and cost of living. We
can strongly say that cost of living is significantly influenced by the amount we earn.
Activity 2
A. Infer the relationships between pairs of variables as follows. Identify the independent
manufacturing expenses.
2. A firm wants to find out if sales volume is related to effective buying income.
3. A market analyst wants to find out if there exists a relation between traffic and
B. Compute the correlations of age and efficiency ratings of 20 assembly line employees.
X Y
44 61
44 41
45 89
43 76
40 79
52 67
43 73
46 94
53 96
43 77
51 60
50 78
61 74
47 82
62 70
34 70
51 60
48 67
51 72
57 80
C. From the result in B, predict the efficiency ratings for each of the following ages:
29 __________
70 __________
24 __________
Lesson 3. Test of Independence and Proportions
In many research situations, we may wish to compare a set of observed frequencies with
a set of theoretical (expected) frequencies. In such situations, the chi-square ($\chi^2$) is used and is defined by
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$ where,
o = observed frequency
e = expected frequency
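The chi-square formula can be applied with a short function. The function name and the counts are hypothetical: 200 respondents choosing between two brands, with 100 expected for each brand if there is no preference.

```python
def chi_square(observed, expected):
    """Sum of (o - e) ** 2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for illustration only
observed = [112, 88]
expected = [100, 100]
print(chi_square(observed, expected))   # 144/100 + 144/100
```

With two categories there is 1 degree of freedom, so the computed value would be compared with the tabular chi-square for df = 1.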
Activity 3
B. A certain business journal showed that 30% of Makati Investors are Chinese, 12%
Chinese 56
American 41
Japanese 60
Filipino 25
Others 18
Total = 200
C. The following contingency table shows a relation between pass and fail on ratings
of job performance of 100 employees. Test the hypothesis that job performance is
Rating
__________________________________________
Below Average: Average: Above Average: Total
__________________________________________
Pass : 11 25 35 71
Fail : 15 7 7 29
_____________________________________________
Total 26 32 42 100
distributed to 200 households. After its use, they were asked which brand they
Computation:
 e     o     o - e     (o - e)²     (o - e)²/e
__________________________________
Since the computed χ² = 2.88 is less than the tabular value at the .05 level with df = 1,
the null hypothesis is accepted. It can be concluded that no difference exists in the
Test of Independence
In test of independence, two variables are involved. These are usually nominal
variables. The question is whether the two variables are independent of each other. The
Variable : A1 : A2 : Total
________________________________________________
B1 20 10 30
B2 30 40 70
_________________________________________________
Total : 50 50 100
For example, 200 males and females were asked if they smoke. The results are shown in the
following table (expected frequencies in parentheses).
              Response
Sex  :    yes           no            Total
____________________________________
M         96 (66.6)     24 (53.4)     120
F         15 (44.4)     65 (35.6)      80
_____________________________________
Total     111           89            200
To compute the chi-square (χ²), we first have to compute the expected frequencies as follows:
 O     E     O - E     (O - E)²     (O - E)²/E
df=1. The null hypothesis is rejected. Therefore, it can be safely concluded that smoking is
associated with, or dependent on, sex; that is, males tend to smoke more than females.
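The expected frequencies and the chi-square for a contingency table can be computed as sketched below. The function name is hypothetical. The cell counts used are 96 and 24 for males and 15 and 65 for females, which assumes that the male "no" count is 24 so that the rows add up to 120 and 80 and the columns to 111 and 89.

```python
def chi_square_independence(table):
    """Chi-square, expected frequencies and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency of a cell = row total * column total / grand total
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]
    chi_square = sum((o - e) ** 2 / e
                     for observed_row, expected_row in zip(table, expected)
                     for o, e in zip(observed_row, expected_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, expected, df


# Rows: male, female; columns: yes, no (male "no" count assumed to be 24)
smoking = [[96, 24], [15, 65]]
chi_square, expected, df = chi_square_independence(smoking)
print(expected)               # expected frequencies, e.g. 120 * 111 / 200 for males answering yes
print(round(chi_square, 1), df)
```

The computed value is far above the tabular chi-square of 3.84 for df = 1 at the .05 level, which is consistent with rejecting the null hypothesis.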
Two independent groups with N1 and N2 cases and means x̄1 and x̄2 are assumed to be
drawn from normally distributed populations with equal variances. If these assumptions are
warranted, the difference between the sample means can be tested for significance with the t-test, defined by
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$ where
$S_{\bar{X}_1 - \bar{X}_2}$ is the standard error of the difference between means.
Steps:
1. Compute the respective means for the two groups and then get their difference.
2. Compute the sum of squared deviations from the mean within each group and combine them to obtain the
pooled variance ($S^2$). The formula is given by
$$S^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N_1 + N_2 - 2}$$
3. With the known pooled variance ($S^2$), compute the standard error of the difference
between means
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{S^2}{N_1} + \frac{S^2}{N_2}}$$
The value of t is then
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$
The null hypothesis being tested here is that there is no significant difference
H0 : x̄1 - x̄2 = 0 or x̄1 = x̄2
The degrees of freedom (df) for this test are N1 + N2 - 2, tested for significance at the .05 or .01 level.
Example 1. A sociologist wants to find out if the employees of two government agencies differ
in their perception of the positive effects of the liberalized or global economy brought about by
the APEC meeting. A test was administered on their degree of perception. The table below
Sample A    Sample B
   24          30
   30          28
   36          26
   20          45
   42          20
   38          15
   25          21
   27          27
   48          20
   40          32
               30
               21
___________________________
H0 : x̄A = x̄B
H1 : x̄A ≠ x̄B
df = N1 + N2 - 2 = 10 + 12 - 2 = 20
Computation:
1. x̄A = 330/10 = 33; x̄B = 315/12 = 26.25
2.
    A     (x - x̄A)²       B     (x - x̄B)²
   24        81           30       14.06
   30         9           28        3.06
   36         9           26        0.06
   20       169           45      351.56
   42        81           20       39.06
   38        25           15      126.56
   25        64           21       27.56
   27        36           27        0.56
   48       225           20       39.06
   40        49           32       33.06
                          30       14.06
                          21       27.56
_______________________________________________
  sum       748                   676.22
3. $$S^2 = \frac{748 + 676.22}{10 + 12 - 2} = \frac{1424.22}{20} = 71.21$$
4. $$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{71.21}{10} + \frac{71.21}{12}} = \sqrt{7.12 + 5.93} = 3.612$$
5. $$t = \frac{33 - 26.25}{3.612} = 1.869$$
Tabular t (.05) with df = 20 is 2.086 > computed t = 1.869
Decision: Accept the null hypothesis since t-computed is less than t-tabular (1.869 < 2.086)
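The pooled-variance t-test can be reproduced with the sketch below. The function name is hypothetical. The data are the values used in the computation table of Example 1, that is, ten Sample A scores beginning with 24 (sum 330) and twelve Sample B scores (sum 315); the code keeps full precision, so the result agrees with the hand computation only up to rounding.

```python
import math


def pooled_t(sample_1, sample_2):
    """t for two independent groups, using the pooled variance."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)   # sum of squared deviations, group 1
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)   # sum of squared deviations, group 2
    df = n1 + n2 - 2
    pooled_variance = (ss_1 + ss_2) / df
    standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (mean_1 - mean_2) / standard_error, df


sample_a = [24, 30, 36, 20, 42, 38, 25, 27, 48, 40]
sample_b = [30, 28, 26, 45, 20, 15, 21, 27, 20, 32, 30, 21]

t_value, df = pooled_t(sample_a, sample_b)
print(round(t_value, 2), df)
```

Since the computed t is below the tabular value of 2.086 for df = 20 at the .05 level, the null hypothesis is not rejected, as in the decision above.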
The t-test for correlated groups applies to interval or ratio scores from the same sample or group
exposed to two different conditions. These conditions may be:
b. Developing parallel test to see reliability of the test through time or validity
__________________________________
O1 ---------> X ---------> O2
O2 - O1 = difference
__________________________________
The null hypothesis being tested here is that there is no significant gain or
improvement after a treatment is given. This treatment can be time, a certain new drug,
a new program, training, or any strategy to affect or influence attitude or any physical
characteristic in the sample. The post-test score will show this change. If found that the
Thus, the null hypothesis is stated:
$H_0: \bar{X}_1 = \bar{X}_2$ or $\bar{X}_1 - \bar{X}_2 = \bar{D} = 0$
$H_1: \bar{X}_1 \neq \bar{X}_2$ or $\bar{X}_1 - \bar{X}_2 = \bar{D} \neq 0$
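$$t = \frac{\sum D}{\sqrt{\dfrac{N \sum D^2 - (\sum D)^2}{N - 1}}} \quad \text{or} \quad t = \frac{\bar{D}}{S_e}$$
where $S_e^2 = S_D^2 / N$ and $S_D^2 = \sum (D - \bar{D})^2 / (N - 1)$.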
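The computational formula for t and the definitions of $S_e$ and $S_D$ are algebraically connected. The derivation below is a sketch that assumes the alternative form intended by the text is $t = \bar{D} / S_e$, with $\bar{D} = \sum D / N$.

```latex
\begin{align*}
S_D^2 &= \frac{\sum (D - \bar{D})^2}{N - 1}
       = \frac{\sum D^2 - (\sum D)^2 / N}{N - 1}
       = \frac{N \sum D^2 - (\sum D)^2}{N (N - 1)} \\
S_e^2 &= \frac{S_D^2}{N}
       = \frac{N \sum D^2 - (\sum D)^2}{N^2 (N - 1)} \\
t &= \frac{\bar{D}}{S_e}
   = \frac{\sum D / N}{\dfrac{1}{N} \sqrt{\dfrac{N \sum D^2 - (\sum D)^2}{N - 1}}}
   = \frac{\sum D}{\sqrt{\dfrac{N \sum D^2 - (\sum D)^2}{N - 1}}}
\end{align*}
```

Under that assumption, both forms give the same value of t; the first is more convenient when only $\sum D$ and $\sum D^2$ are available.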
Example 2. A company manager wants to find out if work efficiency can be improved
efficiency and pre-tested this on 20 rank-and-file employees, after which the TQM was
applied. After six months, he administered the same test to see if there is a significant
Pre-test    Post-test     D      D²
   80          90         10     100
   86          88          2       4
   78          90         12     144
   80          86          6      36
   90          92          2       4
   92          95          3       9
   87          88          1       1
   75          80          5      25
   78          80          2       4
   80          84          4      16
   84          80         -4      16
   84          80         -4      16
   92          90         -2       4
   90          85         -5      25
   87          84         -3       9
   80          86          6      36
   82          80         -2       4
   85          80         -5      25
   84          89          5      25
___________________________________
                    ΣD = 31    ΣD² = 507
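$$t = \frac{31}{\sqrt{\dfrac{20(507) - (31)^2}{20 - 1}}} = \frac{31}{21.98} = 1.41$$
The tabular t at the .05 level with $df = 19$ is 2.093, so the computed t of 1.41 is not significant.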
Conclusion: There is no significant difference between the pre-test and post-test scores; that
is, the TQM did not bring about a significant change in the work performance of
employees.
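The t for Example 2 can be computed from the totals reported under the table (N = 20, ΣD = 31, ΣD² = 507). The sketch below uses only the Python standard library; the function names are illustrative, and the short `demo` list is hypothetical data included only to show that the sums-based formula and the mean-over-standard-error form agree.

```python
import math

def paired_t_from_sums(n, sum_d, sum_d2):
    """t for correlated groups from N, the sum of D, and the sum of D squared."""
    return sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))

def paired_t_from_differences(diffs):
    """Same statistic computed as mean(D) divided by its standard error."""
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Totals reported in Example 2
t_example = paired_t_from_sums(20, 31, 507)
print(f"t = {t_example:.2f}, df = {20 - 1}")

# Hypothetical differences, used only to compare the two forms
demo = [10, 2, 12, 6, -4, -5, 3]
t_sums = paired_t_from_sums(len(demo), sum(demo), sum(d * d for d in demo))
t_diffs = paired_t_from_differences(demo)
print(math.isclose(t_sums, t_diffs))
```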
Analysis of Variance: One-Way and Two-Way
data into different parts. It is used to test the significance of differences between means
being comprised of n members. We may wish to test the effectiveness of the effects of
management styles. The problem of testing the significance of the differences between a
A. One-way classification
Environmental Conditions
NA NB NC
$H_0: \bar{X}_A = \bar{X}_B = \bar{X}_C$
$H_1$: at least one of $\bar{X}_A$, $\bar{X}_B$, $\bar{X}_C$ differs from the others
B. Two-way classification
Environmental Conditions
Data Analysis
Data for one-way ANOVA are analyzed with the F-test for k groups, where k is equal to or
greater than three (3). A summary table for analysis is made such as the following:
For two-way classification, the F-test is also used and analysis is made with the following
summary table:
The null hypotheses tested for two-way ANOVA consist of the following:
A. No Interaction Effects
[Figure: scores (vertical axis, 20 to 80) plotted against Methods 1, 2, and 3 (horizontal axis), with one line for each of Age groups A, B, and C]
B. No Interaction Effects
[Figure: scores (vertical axis, 0 to 80) plotted against Methods 1, 2, and 3 (horizontal axis), with one line for each of the High, Average, and Low groups]
When significant differences are found between groups in a one-way ANOVA, say with k = 4,
there is a need to find out which pairs of group means differ significantly. Thus, an a posteriori
comparison may be used:
______________________________________
Groups
         1       2       3       4
1                X12     X13     X14
2                        X23     X24
3                                X34
________________________________________
Comparison F
________________________________________
1,2
1,3
1,4
2,3
2,4
3,4
_______________________________________
Using the Scheffé Method, the following steps should be done:
1. Calculate the F-ratio for each pair of means using the within-group variance estimate, $S_w^2$:
$$F = \frac{(\bar{X}_1 - \bar{X}_2)^2}{S_w^2 / n_1 + S_w^2 / n_2}$$
2. Consult the table of F and obtain the value of F required at the .05 or .01 level, for $df_1 = k - 1$
and $df_2 = n - k$.
4. Compare the values of F and F'. To be significant at the required level, F must be greater than F'.
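The pairwise F-ratio in step 1 can be sketched in a few lines of Python. The function name `scheffe_pair_f` is hypothetical. The inputs are taken from Example 3 for illustration only: groups A and B have means 5.38 and 8.40 with 8 and 5 members, and the within-group sum of squares is 63.07 on 16 degrees of freedom.

```python
def scheffe_pair_f(mean1, mean2, n1, n2, within_var):
    """F-ratio for one pair of means, using the within-group variance estimate."""
    return (mean1 - mean2) ** 2 / (within_var / n1 + within_var / n2)

# Within-group variance estimate = within sum of squares / within df
within_var = 63.07 / 16

f_ab = scheffe_pair_f(5.38, 8.40, 8, 5, within_var)
print(f"F for the pair (A, B) = {f_ab:.2f}")
```

The same function can be called once for each pair of groups to fill a comparison table.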
Example 3. Scores in productivity for three groups of salesmen exposed to three different methods of training
Group
_______________________________________________________
                A         B         C
                5         9         1
                7        11         3
                6         8         4
                3         7         5
                9         7         1
                7                   4
                4
                2
_______________________________________________________
n               8         5         6       N = 19
Ti             43        42        18       T = 103
Mean         5.38      8.40      3.00       T²/N = 558.37
ΣX²ij         269       364        68       ΣΣX²ij = 701
Ti²/ni     231.13    352.80     54.00       Σ(Ti²/ni) = 637.93
_______________________________________________________
Sum of Squares
______________________________________________
Between     637.93 - 558.37 = 79.56
Within      701 - 637.93 = 63.07
______________________________________________
Total       701 - 558.37 = 142.63
F = (79.56 / 2) / (63.07 / 16) = 39.78 / 3.942 = 10.09
The null hypothesis tested, that there are no significant differences between
groups of salesmen exposed to three different methods of training, is rejected since the
computed F-ratio = 10.09 is greater than the tabular F-ratio = 6.23 at the .01 level with df = 2/16.
Therefore, it can be concluded that exposure to the different training programs has
significantly affected the productivity of salesmen; that is, Group B got the highest mean, followed by
Group A and then Group C.
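The sums of squares and the F-ratio for Example 3 can be recomputed directly from the raw scores in the group table. The following is a minimal sketch in plain Python; the function name `one_way_anova` and its return layout are illustrative choices, and the computation follows the T²/N correction-term approach used in the table.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n_total                       # T^2 / N
    sum_sq_scores = sum(x * x for g in groups for x in g)         # sum of all X^2
    sum_group_terms = sum(sum(g) ** 2 / len(g) for g in groups)   # sum of Ti^2 / ni
    ss_between = sum_group_terms - correction
    ss_within = sum_sq_scores - sum_group_terms
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

group_a = [5, 7, 6, 3, 9, 7, 4, 2]
group_b = [9, 11, 8, 7, 7]
group_c = [1, 3, 4, 5, 1, 4]

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova([group_a, group_b, group_c])
print(f"SS between = {ss_b:.2f}, SS within = {ss_w:.2f}")
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

Working from the raw scores rather than from rounded intermediate totals makes it easy to trace any discrepancy back to a single group total.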
To find out which of the pairs differed significantly, the Scheffé Method is
used:
Table for Comparison of Pairs
Using Scheffé Method
______________________________
Pairs         F
______________________________
1,2         7.18*
1,3         2.07 ns
2,3         3.75*
________________________________
* p < .05
The required tabular F (.05) = 3.63 with df = 2/16. Therefore, the only pairs whose F is found
higher than 3.63 are pairs 1,2 and 2,3. This shows that no significant difference exists between
Groups 1 and 3 but significant differences are found between Groups 1 and 2 and between Groups 2 and 3.
This is also true of three-way classification, in which three independent variables are analyzed
at the same time. Insofar as computations are concerned, it is recommended that
they be done on the computer using SPSS (Statistical Package for
the Social Sciences). The next chapter will give you knowledge about interpreting and
analyzing computer printouts using the different test statistics useful in research.
Objectives: At the end of this module, the students should be able to:
1. Program data collected from surveys and experiments, ready for encoding in the
computer.
2. Interpret the results of computer printouts for both descriptive and inferential test
statistics.
You have learned the different types of variables—how to define them in order to be
encoding in the computer are very important. The following tables show the program for
1. To what extent is the acceptability of the new training program of selected
2. Are there significant differences in the level of acceptability when respondents are
a. sex
b. age
c. educational attainment
d. position
3. Is there a significant relationship between employees’ acceptability level and the
variables above?
Steps:
1. Identify the variables used, as found in the research problems. These
are:
Sex (Code)
Male - 1
Female - 2
Educational Attainment (Score)
High School Graduate - 1
College Undergraduate - 2
College Graduate - 3
With MA Units - 4
MA degree holder - 5
Position (Score/Code)
Rank and File - 1
Supervisory - 2
Managerial - 3
2. Enter the score in a table for each corresponding respondent for all
variables.
Respondent No.   Sex   Age   Educ’l Attainment   Position   Score in Acceptability
1 1 36 1 1 89
2 1 28 3 2 72
3 2 25 1 1 90
4 2 36 3 1 76
5 2 29 3 3 74
6 1 45 4 3 86
7 2 39 2 1 84
8 2 45 5 2 90
9 2 52 4 2 42
10 1 56 1 1 92
11 1 58 4 3 68
12 2 61 4 3 89
13 1 48 3 1 80
14 1 39 5 2 84
15 2 53 1 1 64
16 1 27 2 1 87
17 1 60 5 2 60
18 2 50 5 3 78
19 2 42 2 1 58
20 2 35 3 2 69
21 2 30 2 1 46
22 1 54 4 3 78
23 1 50 2 1 90
24 2 48 5 3 65
25 2 40 3 2 86
____________________________________________________
3. Encode all entries made (as in the above table ) in the computer.
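As an illustration of the encoding step, the coded table can be held in a small Python structure before any analysis is run. This is only a sketch; the module itself recommends SPSS, and the names `Respondent`, `rows`, and `records` are assumptions made for this example. The last lines compute the overall mean acceptability score, which is the statistic called for by research problem 1.

```python
from collections import namedtuple
from statistics import mean

Respondent = namedtuple("Respondent", "sex age education position score")

# One tuple per respondent, in the column order of the table:
# (sex, age, educational attainment, position, acceptability score)
rows = [
    (1, 36, 1, 1, 89), (1, 28, 3, 2, 72), (2, 25, 1, 1, 90), (2, 36, 3, 1, 76),
    (2, 29, 3, 3, 74), (1, 45, 4, 3, 86), (2, 39, 2, 1, 84), (2, 45, 5, 2, 90),
    (2, 52, 4, 2, 42), (1, 56, 1, 1, 92), (1, 58, 4, 3, 68), (2, 61, 4, 3, 89),
    (1, 48, 3, 1, 80), (1, 39, 5, 2, 84), (2, 53, 1, 1, 64), (1, 27, 2, 1, 87),
    (1, 60, 5, 2, 60), (2, 50, 5, 3, 78), (2, 42, 2, 1, 58), (2, 35, 3, 2, 69),
    (2, 30, 2, 1, 46), (1, 54, 4, 3, 78), (1, 50, 2, 1, 90), (2, 48, 5, 3, 65),
    (2, 40, 3, 2, 86),
]
records = [Respondent(*row) for row in rows]

# Problem 1: overall mean acceptability score
overall_mean = mean(r.score for r in records)
print(f"N = {len(records)}, mean acceptability = {overall_mean:.2f}")
```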
Problem # 1 → Mean
Problem # 2 → Group Mean
   t-test → a) Sex – 1, 2
   t-test → b) Age – 1 (39 and below), 2 (40 and above)
   ANOVA → c) Education – 1 to 5
   ANOVA → d) Position – 1 to 3
Problem # 3 → Correlation Matrix
Printout
B. t-test for differences between sex and score for acceptability level
education
differences in position
F. Correlation Matrix for all Variables (sex, age, education, position and level of
acceptability)
The following are the tables taken from the printouts based on the program made for
statistical analysis. What you should do is to present the data, using the different ways of
Total : 25
___________________________________
Table 2. Mean acceptability Level
by Age
__________________________________
Age : f : Mean : SD : Se
Group
___________________________________
2 (40 yrs.
& above) : 15 75.07 14.86 3.84
___________________________________
Total : 25
___________________________________
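The entries of a table like this one (f, Mean, SD, Se) can be reproduced from the encoded data. The sketch below uses the Python standard library in place of the SPSS printout. It assumes the age grouping from the analysis program (1 = 39 and below, 2 = 40 and above) and that SD is the sample standard deviation with N - 1 in the denominator; the names `age_score`, `age_group`, and `summary` are illustrative.

```python
import math
from statistics import mean, stdev

# (age, acceptability score) for the 25 respondents in the data table
age_score = [
    (36, 89), (28, 72), (25, 90), (36, 76), (29, 74), (45, 86), (39, 84),
    (45, 90), (52, 42), (56, 92), (58, 68), (61, 89), (48, 80), (39, 84),
    (53, 64), (27, 87), (60, 60), (50, 78), (42, 58), (35, 69), (30, 46),
    (54, 78), (50, 90), (48, 65), (40, 86),
]

def age_group(age):
    """Recode age into group 1 (39 and below) or group 2 (40 and above)."""
    return 1 if age <= 39 else 2

groups = {1: [], 2: []}
for age, score in age_score:
    groups[age_group(age)].append(score)

# For each group: frequency, mean, standard deviation, standard error
summary = {}
for code, scores in groups.items():
    sd = stdev(scores)
    summary[code] = (len(scores), mean(scores), sd, sd / math.sqrt(len(scores)))

for code, (f, m, sd, se) in sorted(summary.items()):
    print(f"Group {code}: f = {f}, mean = {m:.2f}, SD = {sd:.2f}, Se = {se:.2f}")
```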
Table 5. Result of t-test for Sex
Differences on Level of Acceptability
____________________________________________
Statistics : Grp 1 (Female) : Grp 2 (Male)
____________________________________________
n (No. of Cases) 11 14
x (Mean) 80.54 72.21
x1 – x2 8.32
____________________________________________
t-value (df = 23; .01) = 1.52 NS
____________________________________________
NS: p > .05, since t (tabular; .05) = 2.069
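The t-value in this table can be checked against the encoded data. The sketch below splits the acceptability scores by sex code (1 or 2, as entered in the data table) and applies the pooled-variance t-test, which is assumed to be the test behind the printout. It uses only the Python standard library, and the variable names are illustrative.

```python
import math
from statistics import mean, variance

# Acceptability scores split by sex code, taken from the data table
code_1 = [89, 72, 86, 92, 68, 80, 84, 87, 60, 78, 90]
code_2 = [90, 76, 74, 84, 90, 42, 89, 64, 78, 58, 69, 46, 65, 86]

n1, n2 = len(code_1), len(code_2)
df = n1 + n2 - 2

# Pooled variance from the two sample variances
pooled_var = ((n1 - 1) * variance(code_1) + (n2 - 1) * variance(code_2)) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_value = (mean(code_1) - mean(code_2)) / se_diff

print(f"n1 = {n1}, n2 = {n2}, df = {df}")
print(f"means = {mean(code_1):.2f}, {mean(code_2):.2f}, t = {t_value:.2f}")
```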
Activity 4.
Dept. A Dept. B
(Production) (Accounting)
90 86 90 87
96 78 85 86
86 70 80 84
85 80 84 80
80 82 90 76
78 81 90 75
95 96 87 78
92 89 80 89
84 94 84 90
80 99 83 90
Year Months
2006 Jan Feb Mar April May June July Aug Sept. Oct. Nov. Dec.
2007 4.5 3.6 5.8 9.2 8.7 6.9 7.2 7.5 8.2 8.4 9.2 9.4
2008 5.3 6.9 12.5 12.2 10.6 10.8 10.9 10.8 11.5 12.7 12.9 13.5