
STATISTICS
WITH COMPUTER APPLICATION
By
DR. HIPOLITO P. PALCON

BSE, MA in Mathematics
MA in Measurement & Evaluation
Doctor of Philosophy in Sociology & Anthropology
(UP Diliman)
Cell : 0921-4603232
A MODULE
For Exclusive Use of Graduate Students
TABLE OF CONTENTS
Title Page
Module 1. Introduction to Basic Ideas in Statistics
Lesson 1. Statistic Defined

Lesson 2. Measurement of Variable
Module 2. Descriptive Statistics
Lesson 1. Statistical Notation

Lesson 2. Frequency Distribution
Lesson 3. Measures of location or Central Tendency
Lesson 4. Measures of Dispersion or Variability
Lesson 5. Measures of Skewness and Kurtosis
Module 3. Inferential Statistics
Lesson 1. Testing Hypothesis

Lesson 2. Test of Relations: Predictions and Correlations
Lesson 3. Test of Independence and Proportion:
Analysis of Frequencies
Lesson 4. Test of Significant Differences between Means:
t-test and F-test
Module 4. Analysis of Computer Outputs
Lesson 1. Preparing the Program for Computer Encoding

Lesson 2. Interpreting Results of Computer Outputs
Bibliography
Appendix
Module 1. Definition and Function of Statistics
Objectives: At the end of the lesson, students are expected to:
1. Define what statistics is.
2. Differentiate a population from a sample, use the different sampling procedures, and identify the best method of determining the sample size.
3. Enumerate the different kinds of variables, their characteristics and measures.
Lesson 1. Statistic Defined
"Statistics" in plural form refers to any kind of data, both qualitative and quantitative; in singular form, it is a branch of knowledge which deals with the processes of collection, presentation, analysis and interpretation of data obtained by conducting surveys and experiments. You will study statistics as a branch of scientific methodology. As such, its essential purpose is to describe and draw inferences about the numerical properties of a population from a given sample of that population.
Collection. Data can be collected through an interview, formal or casual. A formal interview entails preparation of guide questions and/or an approved schedule for the interviewees. A casual interview is done informally, without guide questions. Data can also be collected through survey questionnaires, standard tests or researcher-made tests, documents, and various techniques of observation.
Presentation of Data. Data can be presented in textual, tabular and graphical forms. For example, a researcher was able to gather the following data on the expenditures of ABC Company for the past five years: P2,416,025 in 2010; P2,117,680.75 in 2011; P1,986,592 in 2012; P1,876,458 in 2013; and P974,697 in 2014. These can be presented in textual, tabular and graphical forms.
Textual
For the past five years, ABC Company had total expenditures of P2,416,025 in 2010; P2,117,680.75 in 2011; P1,986,592 in 2012; P1,876,458 in 2013; and P974,697 in 2014.
Tabular
Table 1. Total Expenditures
of ABC Co. from 2010- 2014
______________________________
Year Expenditure
______________________________
2010 P2,416,025.00
2011 2,117,680.75
2012 1,986,592.00
2013 1,876,458.00
2014 974,697.00
______________________________
Graphical
a. Polygon (line)
b. Histogram (bar)
c. Ogive (cumulative < or > )
Note that statistical data are frequently arranged and presented in the form of tables. These tables are designed to enable readers to grasp with minimal effort the information they are intended to convey. In constructing tables for insertion in term papers, theses, or manuscripts, the following points should be kept in mind:
1. Every table should be self-explanatory.
2. The title should be precise, stating clearly what the table is all about.
3. Columns of numbers should be appropriately labeled and arranged in a logical sequence.
4. The information contained in a table may be partitioned by the insertion of horizontal and/or vertical lines.
5. Tables should be appropriately numbered and inserted in the text close to where they are first mentioned.
Graphic representation is often of great help in enabling us to comprehend the features of a frequency distribution and to make comparisons. A graph, as a geometric image of a set of data, enables us to think about a problem in visual terms. One kind of graph is the histogram, which is presented in the form of bars. Another is the polygon, which is in the form of lines. Using the histogram or polygon, the reader can easily visualize the intervals and make the comparisons that are important in the analysis and interpretation of data. Graphs answer questions such as: What is the increase? What is the improvement? Later, you will learn how to make tables of statistical results.
Activity 1.
A. What statistical data do you have in your office or company?
Enumerate at least five of them. Collect one type of data and present this using any
form of presentation appropriate to the kind of data.
B. The following are a firm's quarterly sales, in thousands of pesos, in each of its three major markets, A, B and C.
                        Market
Quarter            A       B       C
First Quarter      225     110     350
Second Quarter     175     180     290
Third Quarter      120     100     425
Fourth Quarter     210     220     510
Present these data in tabular and graphical forms.
Lesson 2. Population and Sample
In everyday language, the term population refers to groups or aggregates of people. In statistics, the term generally refers to defined groups or aggregates of people, animals, objects, materials, measurements, or events of any kind. The statistician's concern is with properties which are descriptive of the group. The entire group is called the population. For example, we may refer to a population of children with an IQ of 90 and above, of factories involved in the production of t-shirts, or of rank-and-file employees in the government.
In research, the use of the entire population is often not practical: it is time-consuming and expensive. Thus, we resort to studying a sample of this population. In other words, we select only a small group, called a sample, for actual testing. This sample is small but representative of the population. By representative, we mean that the sample should bear whatever characteristics the population has. For example, if the population is characterized as 5'6" tall and above, both male and female, aggressive, and 50 to 65 years of age, the selected sample must have all those characteristics. Selecting the members that make up the sample employs one of the following sampling techniques.
Probability Sampling
This technique means that every member of the population has a chance of being selected, and that the selection of one member does not influence the selection of another. Sampling procedures such as the following are used:
Random Sampling. This sampling technique may use the following procedures:
a) Lottery - the names of all members are put inside a box. The researcher then picks as many names as are needed to make up the sample.
b) Use of the Table of Random Numbers - the researcher may want to select, say, 100 students from a population of 8,500 students. Any four-digit numbers from 0001 to 8500 picked from the table will comprise the sample.
c) Systematic Random Sampling - a procedure which uses a fixed interval as the basis for choosing the sample. Say, if 10 will be chosen from a population of 100, one may use this procedure:
100 ÷ 10 = 10
From the result, get every 10th member. The sample of 10 consists of members 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100. The procedure is to divide the population size by the desired sample size to get the interval.
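The random-number and systematic procedures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, members are assumed to be numbered from 1 to N, and the pseudo-random generator stands in for the lottery box or the Table of Random Numbers. The systematic version starts at the k-th member, as in the text; starting at a randomly chosen member between 1 and k is a common variant.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick member numbers 1..population_size without repetition.

    Stands in for the lottery or the Table of Random Numbers.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


def systematic_sample(population_size, sample_size):
    """Take every k-th member, where k = population size / sample size."""
    interval = population_size // sample_size
    members = list(range(interval, population_size + 1, interval))
    return members[:sample_size]


# 10 members out of 100: the interval is 100 / 10 = 10
print(systematic_sample(100, 10))
# prints [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# 100 students out of 8,500, identified by their numbers
students = simple_random_sample(8500, 100, seed=1)
print(len(students))
```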
d) Stratified Random Sampling - is used to pick sample members when the researcher has reason to believe that the population is composed of distinct subgroups or strata. These subgroups or strata are defined by characteristics or variables of the population which may influence the results of the study. For instance, a certain insurance company decides to study the attitude of its clients toward the benefits given by the company. The company decides to pick a sample of 200 from 2,000 clients, but the clients vary from one another in terms of age and monthly or yearly premium paid. It is appropriate to subdivide the clients into subgroups or strata, and then apply the simple random sampling procedure to each subgroup.
Subgroup            Population Size     Sample Size
(age group)         N                   n (10% of N)
___________________________________________________
12 yrs & below      156                 16
13-18               178                 18
19-23               235                 23
24-29               386                 39
30-39               420                 42
40 and above        620                 62
___________________________________________________
TOTAL               2,000               200
Other sampling procedures may be applied, such as cluster sampling, the successive sampling of units. The same procedures as in stratified sampling are used, depending upon the characteristics of the population and the purpose of the study. Finally, in all these procedures, the researcher uses simple random sampling after all subgroups, clusters or units have been formed.
Non-probability Sampling
This technique does not use random sampling. It is weak, yet for special purposes it can be used, with care in selecting samples and by “replicating studies with different sample” (Kerlinger, 1986:119).
a) Quota Sampling – uses knowledge of the strata of the population, such as sex, race, region, and so on, as the basis for selecting the sample members. The members of each stratum share the same characteristics, which differ from those of the other strata.
b) Purposive Sampling – is characterized by the use of judgment and a deliberate effort to obtain representative samples by including typical groups in the sample (e.g. company managers, presidents, children with intellectual disabilities), presumably because there are only a few members.
c) Accidental Sampling – the weakest form of sampling, which takes whatever samples are at hand because of convenience.
Sample Size
Every researcher may ask, “What is the right sample size?” A rough-and-ready rule is “Use as large a sample as possible.” Another question that must be asked is: how much error is likely, given the sample size? Below is a figure that shows the relationship between sample size and error, the deviation from population values. The curve shows that the smaller the sample, the larger the error, and the larger the sample, the smaller the error (Kerlinger, 1986).
[Figure: curve of error (vertical axis, small to large) against size of sample (horizontal axis, small to large)]
Selecting a Representative Sample
A good sample must be as nearly representative of the entire population as possible
and ideally it must provide the whole of the information about the population from which the
sample has been drawn. A sample to be representative of the population should consider two
important aspects:
Population Characteristics
Adequacy of sample size
The first is addressed by the sampling technique; the second, by the choice of sample size. The researcher should consider the following observations in sampling (Srivastava, 1994: 52-53):
1. The larger the sample, the smaller the magnitude of sampling error.
2. Survey-type studies probably should have larger samples than are needed in experimental studies.
3. When sample groups are to be subdivided into smaller groups, the researcher should select a large enough sample so that the subgroups are of adequate size for the purpose.
4. Subject availability and cost factors are legitimate considerations in determining sample size.
5. In survey-type studies, a sample of 10-20 percent is usually recommended for a population of not less than 100.
After subgroups according to population characteristics have been made, the following formula (Slovin, 1960) may be used, given a predetermined sampling error (.05 or .01):
$n = \frac{N}{1 + N e^{2}}$
where
n = sample size
N = population size
e = sampling error
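As a quick check of the formula, the sketch below computes n for a given N and e. The function name is hypothetical, and rounding the result up to the next whole person is an assumption of this sketch; the module does not say how to round.

```python
import math


def sample_size(population_size, sampling_error):
    """Sample size n = N / (1 + N * e**2), rounded up (rounding is an assumption)."""
    n = population_size / (1 + population_size * sampling_error ** 2)
    return math.ceil(n)


# A population of 2,000 with a sampling error of .05:
# 2000 / (1 + 2000 * 0.0025) = 2000 / 6 = 333.33...
print(sample_size(2000, 0.05))  # prints 334
```

With the stricter sampling error of .01, the same population requires a much larger sample, which agrees with the observation that larger samples carry smaller error.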
Activity 2.
A. The following are population sizes. Determine the members of the sample using the indicated sampling procedures:
   N                                          n for a     n for b
1. 500 workers of GSIS                        ___         _____
2. 3,560 students of ABC School               ___         _____
3. 10,000 factory workers of ABC Co.          ___         _____
4. 5,420 farmers of Nueva Ecija               ___         _____
5. 1,275 members of Women Federation          ___         _____
Sampling Procedures:
a. using systematic sampling
b. using the Table of Random Numbers
B. From the above data, how would you classify them into subgroups?
C. Determine the sample size using Slovin's formula.
Lesson 3. Measurement of Variables
A variable refers to a property whereby the members of a group or set differ from one another. Members of a group may differ in sex, age, performance, or height. These are variables. The categories of a variable are labeled with numbers. These values are called variates. For example,
Sex                                Value
Male                               2
Female                             1
Age                                Value
6 & below (Pre-school age)         1
7-12 (Elem age)                    2
13-16 (High School age)            3
These values are important in statistics because they correspond to the scores that are measured.
Classification of Variables
Variables can be classified as dependent and independent. Consider the expression y = f(x). Here, the value of y depends on the value of x. Therefore,
y → dependent variable
x → independent variable
Let us consider two variables with a functional relationship, such as age and performance. Here, performance is dependent on age. Therefore, age is the independent variable and performance is the dependent variable. Give your own examples of pairs of variables which have functional relationships.
Another classification is continuous and discrete variables. A continuous variable may take any value within a defined range of values. Examples of continuous variables are height and weight. A discrete variable can take specific values only, for example, the size of a family, the value shown in rolling a die, and the number of persons responding to a questionnaire.
Another classification of variables is important to the statistician. This classification is based on differences in the type of information which different operations of classification or measurement yield. The types are the following:
Variable      Statements that can be made
Nominal       equality or difference
Ordinal       greater than or less than (rank ordering)
Interval      equality of intervals, with an arbitrarily defined zero point
Ratio         equality of ratios, with a true zero point
Examples:
Color and brand of cars are nominal variables because you can only make statements of “same” or “different” color or brand. You cannot say that blue is greater than red.
Height is ordinal when the people in a group are simply ranked from shortest to tallest.
Temperature in degrees Celsius is an interval variable: its zero point is arbitrary, not a true zero. Variables that have a true zero point, such as age, length and weight, are ratio variables.
Statistical methods exist for the analysis of data composed of nominal, ordinal, interval, and ratio variables. The kind of measurement a variable has is important in deciding which statistical test to use. Variables are the stuff of which statistics is made. You will understand variables better when you study inferential statistics, when deciding which statistic should be used to test the interrelationships of variables.
Measuring Variables
Some variables are easy to measure, for example, age in years. So we say that two members of a sample are 15 and 12 years of age. In research, there are variables which need to be described or defined by looking into some indicators before one can measure them. Consider, for instance, productivity. What indicators will you use to measure productivity? How will you define productivity? You may consider productivity in terms of the number of boxes or items sold during the day. In this case, you can measure it using the following quantitative label/code and description:
No. of Items       Value/Score     Description
1 – 5              1               Poor
6 – 10             2               Fair
11 – 15            3               Good
16 – 20            4               Very Good
21 and above       5               Excellent
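The coding scheme in the table can be written as a small lookup, which is how such a variable would be encoded before computer analysis. This is a sketch only: the function and constant names are hypothetical, and rejecting counts below 1 is an assumption, since the table does not cover zero items.

```python
# Upper bound of each band, with its score and description, taken from the table
BANDS = [
    (5, 1, "Poor"),
    (10, 2, "Fair"),
    (15, 3, "Good"),
    (20, 4, "Very Good"),
]


def productivity_score(items_sold):
    """Return (score, description) for the number of items sold in a day."""
    if items_sold < 1:
        # Assumption: the table starts at 1 item, so smaller counts are not coded
        raise ValueError("the coding table starts at 1 item")
    for upper_bound, score, description in BANDS:
        if items_sold <= upper_bound:
            return score, description
    return 5, "Excellent"  # 21 and above


print(productivity_score(12))  # prints (3, 'Good')
```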
Activity 3
A. Identify what type of variables are the following – continuous or discrete?
1. age
2. color of the eyes
3. number of residents in a community
4. number of employees
5. type of building
B. Identify whether the following variables are nominal, ordinal, interval or ratio, according to how you measure each one:
1. Race                       6. Efficiency
2. Religion                   7. Library collection
3. Intelligence quotient      8. Managerial skill
4. Productivity               9. Attitude
5. Performance                10. Weight
C. Choose pairs of variables with functional relationships, label them and then
identify what type of variable each one is.
D. Define the following variables, measure them and then describe.
1. attitude towards work

2. fluency in the use of English language
3. attendance in meeting
4. efficient secretary
5. saleable cars in 2012
E. Write down 10 variables in your own discipline and classify these according to
type and identify how you will measure each one.
MODULE 2. DESCRIPTIVE STATISTICS
Objectives: At the end of this module, the students should be able to:
1. Draw the graphical representation of frequencies with the use of the histogram, polygon, and ogive to show the kind of distribution the data have.
2. Compute measures of location, dispersion, skewness and kurtosis of given sets of data using the theorems of summation notation.
3. Compare and contrast the different measures of location, dispersion, skewness, and kurtosis with a graphical representation of the distribution or curve.
Lesson 1. Statistical Notation
The most commonly used form of notation in statistics is summation notation. A set of values, measurements or observations is denoted by x1, x2, x3, …, xn, where n denotes the total number of observations of the variable x. So, the summation of the observations xi where n = 5 is
$x_1 + x_2 + x_3 + x_4 + x_5 = \sum_{i=1}^{5} x_i = \sum x_i$
Rules for Summation Notation
Theorem 1. If every variate value in a group is multiplied by a constant number or factor, that factor may be removed from under the summation sign and written outside as a factor. Thus,
$\sum c x_i = c x_1 + c x_2 + c x_3 + c x_4 = c (x_1 + x_2 + x_3 + x_4) = c \sum x_i$
Theorem 2. The summation of a constant over N terms is equal to Nc. Thus,
$\sum c = c + c + \dots + c = Nc$
Theorem 3. The summation of the sum of any number of terms is the sum of the summations of these terms taken separately. Thus,
$\sum (x_i + y_i + z_i) = \sum x_i + \sum y_i + \sum z_i$
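The three theorems can be checked numerically. The short sketch below uses made-up values for x, y, z and c (they are not from the module) and compares the left-hand and right-hand sides of each theorem.

```python
# Hypothetical values chosen only for illustration
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
z = [10, 20, 30, 40]
c = 3

# Theorem 1: a constant factor can be moved outside the summation sign
theorem1_left = sum(c * xi for xi in x)
theorem1_right = c * sum(x)

# Theorem 2: summing a constant over N terms gives N times the constant
theorem2_left = sum(c for _ in x)
theorem2_right = len(x) * c

# Theorem 3: the summation of a sum is the sum of the separate summations
theorem3_left = sum(xi + yi + zi for xi, yi, zi in zip(x, y, z))
theorem3_right = sum(x) + sum(y) + sum(z)

print(theorem1_left, theorem1_right)  # prints 60 60
print(theorem2_left, theorem2_right)  # prints 12 12
print(theorem3_left, theorem3_right)  # prints 136 136
```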
Activity 1. Given
x1 = 5 y1 = 2
x2 = 6 y2 = 3
x3 = 12 y3 = 7
x4 = 15 y4 = 10
Find
1. $\sum x_i$
2. $\sum y_i$
3. $\sum (x_i + y_i)$
4. $\sum (3 x_i + y_i)$
5. $\sum (x_i - y_i)$
Lesson 2. Frequency Distribution
A frequency distribution is an arrangement of data that shows how frequently each value occurs. It is especially useful for voluminous data. When there are many different scores, it is advisable to reduce their number by grouping the individual scores into a smaller number of classes, each covering a given class interval. Consider the following set of data (IQ scores of 45 applicants):
91 88 106 115 96 90
86 99 102 112 102 89
95 105 103 96 97 87
101 100 89 90 96 84
99 86 90 84 100 112
105 94 90 89 98
91 90 92 88 104
84 89 107 91 100
Steps in making the frequency table:
1. Find n (sample size): n = 45.
2. Find the range by getting the difference between the highest and lowest values.
Range = 115 – 84 = 31
3. Divide the range by any desired class width (2, 3, 5, 7, …). The quotient, which approximates the number of classes, should be at least 9 but not more than 20. Here 31 divided by 3 is approximately 10, an acceptable number of groups or classes. Therefore, the class width is 3.
4. Now, what number nearest the highest value, 115, is divisible by 3? The answer is 114. Therefore, the first group or class is 114-116, and the last class should contain the lowest score (in this example, 84).
5. List the class intervals and complete the information you want to include in the table, such as the frequency, midpoint, class limits (lower and upper), and cumulative frequency (less than and greater than).
Class        Tally         Frequency   Midpoint   Class limits      cum f<   cum f>
__________________________________________________________________________________
114 – 116    /             1           115        113.5 – 116.5     45       1
111 – 113    //            2           112        110.5 – 113.5     44       3
108 – 110                  0           109        107.5 – 110.5     42       3
105 – 107    ////          4           106        104.5 – 107.5     42       7
102 – 104    ////          4           103        101.5 – 104.5     38       11
99 – 101     /////-/       6           100        98.5 – 101.5      34       17
96 – 98      /////         5           97         95.5 – 98.5       28       22
93 – 95      //            2           94         92.5 – 95.5       23       24
90 – 92      /////-////    9           91         89.5 – 92.5       21       33
87 – 89      /////-//      7           88         86.5 – 89.5       12       40
84 – 86      /////         5           85         83.5 – 86.5       5        45
__________________________________________________________________________________
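The steps for building the table can also be carried out by computer. The sketch below is one possible way to do it, not the module's own program: it takes the 45 listed IQ scores and a class width of 3, and it assumes the top class starts at the largest multiple of the width that does not exceed the highest score (114 for these data). All names are hypothetical.

```python
scores = [
    91, 88, 106, 115, 96, 90,
    86, 99, 102, 112, 102, 89,
    95, 105, 103, 96, 97, 87,
    101, 100, 89, 90, 96, 84,
    99, 86, 90, 84, 100, 112,
    105, 94, 90, 89, 98,
    91, 90, 92, 88, 104,
    84, 89, 107, 91, 100,
]
width = 3

n = len(scores)                          # Step 1: sample size
data_range = max(scores) - min(scores)   # Step 2: highest minus lowest value

# Step 4: lower end of the top class (assumed rule: largest multiple of the
# width not above the highest score)
lower = (max(scores) // width) * width

# Step 5: build the classes from the top down until the lowest score is covered
classes = []
while lower + width - 1 >= min(scores):
    upper = lower + width - 1
    frequency = sum(lower <= s <= upper for s in scores)
    classes.append((lower, upper, frequency))
    lower -= width

# Cumulative frequency "less than": add the frequencies from the lowest class upward
cum_less = []
running_total = 0
for _, _, frequency in reversed(classes):
    running_total += frequency
    cum_less.append(running_total)
cum_less.reverse()

print("n =", n, " range =", data_range, " classes =", len(classes))
for (low, high, frequency), cum in zip(classes, cum_less):
    midpoint = (low + high) / 2
    print(f"{low}-{high}  f={frequency}  midpoint={midpoint:g}  "
          f"limits={low - 0.5}-{high + 0.5}  cum f<={cum}")
```

Counting by program is a useful check on a hand tally, since tally marks are easy to miscount in voluminous data.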
Several observations can be made about the distribution. The midpoint is the middle value of each class interval. It can be used to represent the whole class, such as when making the graph. The cumulative frequency less than (cum f<) is obtained by adding the frequencies from the lowest class upward; the cumulative frequency greater than (cum f>), by adding the frequencies from the highest class downward. Percentages can also be computed for cumulative frequencies by dividing the cumulative frequency by the total number of observations (n) and multiplying by 100 (e.g. 36/45 x 100 = 80%).
A histogram and an ogive (graph for cum f< and cum f>) can be made as follows:
[Figure: a histogram; horizontal axis: class midpoints from 85 to 115; vertical axis: frequency]
[Figure: an ogive; horizontal axis: class midpoints from 85 to 115; vertical axis: cumulative frequency up to 45]
Activity 2
A. For the following ungrouped or raw data, make the frequency table with an appropriate interval and then draw its graphical representations, such as the histogram and ogive.
B. Answer the following from the graphs:
1. How many applicants got an IQ score below 90?
2. What percent of applicants are above average, with an IQ score of 96 and above?
3. If you decide to pass a score of 95 and above, how many of the applicants will you consider passing?
Data:
99 100 89 87 88 99 89 105 106 105 112

87 86 79 90 94 92 90 108 109 111 110
79 85 87 96 90 83 80 84 97 90 92
120 115 97 90 92 91 90 113 114 118 116
108 90 76 77 79 80 81 89 89 96 94
86 88 91 79 88 95 97 101 107 118 100
84 88 97 91
Lesson 3. Measure of Location or Central Tendency
After the collection of data from a common source, individual observations are not likely to have the same value. It is impractical to keep all the values in mind. What we need is a single value that we may consider typical of the set of data as a whole. This single value is one of the three most common measures of location or central tendency: the mean, the median and the mode. Consider the following values or observations:
14, 9, 26, 30, 15, 12, 8, 21, 30, 20, 30
Mode. The mode is the most frequently appearing value. In the measurements above, the mode is 30, which appears three times. A set of data with only one mode is unimodal; with two modes, bimodal; and with three or more, multimodal.
Median. The median is the middle value of an ordered array, when the items are arranged in ascending or descending order of magnitude.
When n is an odd number
Example: 15, 20, 7, 12, 18, 25, 30, 21, 9
Arranged in ascending order: 7, 9, 12, 15, 18, 20, 21, 25, 30
where n = 9
Position of the median = (9 + 1)/2 = 10/2 = 5th value (counted from either the left or the right)
Median = 18
When n is an even number
Example: 8, 7, 12, 10, 15, 24, 18, 21
Arranged in ascending order: 7, 8, 10, 12, 15, 18, 21, 24
Position = 8/2 = 4th value (counted from the left and from the right; take the average of the two)
Counted from the left, the 4th value is 12; counted from the right, the 4th value is 15.
Median = (12 + 15)/2 = 27/2 = 13.5
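The two cases above, n odd and n even, can be expressed as a short function. This is a minimal sketch with a hypothetical function name; it sorts the values and then either takes the middle value or averages the two middle values.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([15, 20, 7, 12, 18, 25, 30, 21, 9]))  # prints 18
print(median([8, 7, 12, 10, 15, 24, 18, 21]))      # prints 13.5
```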
Properties of the Median
1. The median always exists in a set of numerical data.
2. The median is not affected by extreme values, whereas the mean is.
3. The median can be used to characterize qualitative data that can be ranked, e.g. quality categories such as
1 good
2 better
3 best
The median is 2, “better.”
Mean. The most familiar measure of central tendency is the mean, sometimes called the arithmetic mean or average. We find it by adding the values in a set of data and dividing the total by the number of values that were added.
For example:
xi = 8, 12, 21, 14, 18, 27
$\bar{x} = \frac{\sum x_i}{n}$
$\bar{x} = 100/6 \approx 16.67$
Properties of the Mean
1. For a given set of data, there is one and only one mean.
2. Since every value goes into its computation, it is affected by the magnitude of each value.
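The mean and the mode can be computed in the same way. The sketch below uses hypothetical function names; the six values of the mean example add up to 100, and the mode is found by counting how often each value appears. The function returns a list because a set of data may have more than one mode.

```python
from collections import Counter


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def modes(values):
    """All values that appear most often (one value if the data are unimodal)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(sum([8, 12, 21, 14, 18, 27]))                        # prints 100
print(round(mean([8, 12, 21, 14, 18, 27]), 2))             # prints 16.67
print(modes([14, 9, 26, 30, 15, 12, 8, 21, 30, 20, 30]))   # prints [30]
```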
Comparing Mode, Median and Mean
Given the following frequency curves, the mode, median and mean are located as shown:
[Fig. 1. Symmetrical curve: Mean = Median = Mode]
[Fig. 2. Negatively skewed curve: Mean is less than the Median, which is less than the Mode]
[Fig. 3. Positively skewed curve: Mean is greater than the Median, which is greater than the Mode]
A comparison of the mean, median and mode may be made when all these have been calculated for the same frequency distribution. In Figure 1, the distribution is symmetrical. The mean, median and mode coincide. In Figure 2, the distribution is negatively skewed. The mean is less than the median and also less than the mode. In Figure 3, the distribution is positively skewed, where the mean is greater than the median and the mode.
Activity 3
1. In the following sets of data, find the mode, median and mean
Set A. 17, 24, 18, 12, 36, 42, 18, 24
Set B. 18, 21, 76, 35, 42, 58, 39, 45, 50
Set C. 1, 14, 12, 18, 17, 81, 90, 100
2. Locate the mode, median and mean for sets A and C using a frequency curve.
Lesson 4. Measure of Dispersion or Variability
Of utmost concern to the statistician is the variation in measurements, such as the work performance of individuals in a given sample.
Consider the following measurements for two groups.
A 9 13 15 18 20
B 2 8 10 27 28
The two samples above have the same mean, 15; however, by inspection, the variation in sample B is greater than the variation in sample A. Among the various measures used to describe variation are the
1. range
2. mean deviation
3. standard deviation
4. variance
Range. The range is the simplest measure of variation. It is taken as the difference between the largest and smallest measurements or values. In our samples above,
Range for Sample A = 20 – 9 = 11
Range for Sample B = 28 – 2 = 26
Mean Deviation. The mean deviation is the sum of the absolute deviations of every measurement from the mean, divided by the number of observations. Absolute value means without regard to algebraic sign.
Consider the following:
Sample A: 10, 10, 10, 10, 10
Sample B: 1, 4, 7, 10, 13
Sample C: 1, 5, 20, 25, 29
By inspection, Sample C is more variable than Sample B. Obviously, Sample A has no variation. Computing the mean deviation (MD) for Samples B and C, we have:
Sample B                              Sample C
x      x − x̄     |x − x̄|              x      x − x̄     |x − x̄|
1      -6        6                    1      -15       15
4      -3        3                    5      -11       11
7      0         0                    20     4         4
10     3         3                    25     9         9
13     6         6                    29     13        13
x̄ = 35/5 = 7                          x̄ = 80/5 = 16
$MD = \frac{\sum |x - \bar{x}|}{n}$
Sample B: MD = 18/5 = 3.6
Sample C: MD = 52/5 = 10.4
From the results above, the mean deviation for Sample C is 10.4, which means that its measurements deviate by 10.4 on average from the mean. Sample B deviates by only 3.6 from its sample mean. Therefore, we can say that the values in Sample C are more variable than those in B.
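The mean deviation of the three samples can be verified with a few lines of code. This is a sketch with a hypothetical function name that follows the definition directly: average the absolute deviations from the mean.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(value - mean) for value in values) / n


sample_a = [10, 10, 10, 10, 10]
sample_b = [1, 4, 7, 10, 13]
sample_c = [1, 5, 20, 25, 29]

print(mean_deviation(sample_a))  # prints 0.0
print(mean_deviation(sample_b))  # prints 3.6
print(mean_deviation(sample_c))  # prints 10.4
```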
Standard Deviation and Variance
In our computation of mean deviations, some deviations from the mean are positive and others are negative, so that their sum is always 0. To get around this, we took the absolute values of the deviations. In the case of the variance and standard deviation, we square the deviations instead to do away with negative values. Thus, for our Sample B, the variance and standard deviation are:
x      x − x̄     (x − x̄)²
___________________________
1      -6        36
4      -3        9
7      0         0
10     3         9
13     6         36
Σx = 35, x̄ = 35/5 = 7, Σ(x − x̄)² = 90
Variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n} = 90/5 = 18$ or $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = 90/4 = 22.5$
Standard deviation:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{18} \approx 4.2$ or $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{22.5} \approx 4.7$
The formulas above use n and n – 1. Which should we use? When the data are a sample and we want an unbiased estimate of the population variance, we divide by n – 1, because the deviations are measured from the sample mean rather than from the population mean, which uses up one degree of freedom. The correction matters most for small samples; when n is big enough, that is, greater than 25, dividing by n or by n – 1 gives nearly the same result.
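The two versions of the variance and standard deviation can be compared side by side. The sketch below uses hypothetical function names and a `sample` flag (an assumption of this sketch) to switch between dividing by n – 1 and dividing by n, applied to Sample B.

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample=True) or by n (sample=False)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((value - mean) ** 2 for value in values)
    return sum_of_squares / (n - 1 if sample else n)


def standard_deviation(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


sample_b = [1, 4, 7, 10, 13]

print(variance(sample_b, sample=False))                      # prints 18.0
print(variance(sample_b, sample=True))                       # prints 22.5
print(round(standard_deviation(sample_b, sample=False), 1))  # prints 4.2
print(round(standard_deviation(sample_b, sample=True), 1))   # prints 4.7
```

Python's standard library offers the same pair of choices: `statistics.pvariance` and `statistics.pstdev` divide by n, while `statistics.variance` and `statistics.stdev` divide by n – 1.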
We noted from our computations that when all the values in a set of data are located near their mean, there is a small amount of variation or dispersion, and that a set of data in which some values are located far from the mean has a large amount of dispersion. Expressing these relationships in terms of the standard deviation, we say that the standard deviation is small when the values are concentrated near the mean. When it is large, the values are dispersed widely about the mean.
Activity 4
A. The following table gives the details of an investment consisting of the prices of
stock per share in 10 months:
Month Stock A price/share Stock B price/share
January P 9.00 P 9.00
February 21.00 21.00
March 20.00 15.00
April 18.00 12.00
May 15.00 10.00
June 15.00 11.00
July 16.00 12.50
August 20.00 15.00
September 20.00 16.00
October 10.00 14.00
B. Compute the
a) Range of Stock A and B
b) Mean Deviation of Stock A and B
c) Standard Deviation of Stock A and B
d) Variance of Stock A and B
C. Which is the more stable stock to invest in? Why?
Lesson 5. Measures of Skewness and Kurtosis
Consider the following distributions.
[SET A. Skewness of a Distribution: Figures 1, 2 and 3]
[SET B. Kurtosis of a Distribution: Figures 4, 5 and 6]
Skewness refers to the symmetry or asymmetry of the frequency distribution. Figure 1 is symmetrical; that is, half of its frequencies fall to the left of the mean and the other half to the right. It follows the properties of the normal curve.
[Figure: normal curve with horizontal axis from -3 to 3; 50% of the area lies on each side of 0]
Figure 2 is negatively skewed; that is, more frequencies are concentrated at the right end of the distribution. Figure 3 is positively skewed; that is, more frequencies are found at the left end.
Kurtosis refers to the flatness or peakedness of the distribution. A distribution that is as peaked as the normal curve is called mesokurtic (see Figure 4). One whose frequencies are more evenly spread from the left end to the right is flat and is called platykurtic (see Figure 5). One whose frequencies are more concentrated at the middle is more peaked and is called leptokurtic (see Figure 6).
To find out the skewness and kurtosis of a given distribution, we use the concept of moments. The term “moments” originates in mechanics (Ferguson, 1982: 72). The first four moments about the mean are computed first:
$m_1 = \frac{\sum (x - \bar{x})}{N} = 0$
$m_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{N - 1}{N} s^2$
$m_3 = \frac{\sum (x - \bar{x})^3}{N}$
$m_4 = \frac{\sum (x - \bar{x})^4}{N}$
To measure the skewness of data, we use the third moment (m3):
$g_1 = \frac{m_3}{m_2 \sqrt{m_2}}$, where
g1 = skewness
m3 = third moment
m2 = second moment
To measure kurtosis, the fourth moment is used:
$g_2 = \frac{m_4}{m_2^2} - 3$, where
g2 = kurtosis
m4 = fourth moment
m2 = second moment
To interpret the results:
If g1 = 0 (that is, m3 = 0), the distribution is symmetrical.
If g1 ≠ 0, it is asymmetrical: negatively skewed if g1 is negative, positively skewed if g1 is positive.
If g2 = 0, the distribution is mesokurtic, as peaked as the normal distribution;
if g2 > 0, it is leptokurtic; and
if g2 < 0, it is platykurtic.
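To illustrate, let us find g1 and g2 for this set of measurements:
A: 6 8 10 12 14
x      x − x̄     (x − x̄)²     (x − x̄)³     (x − x̄)⁴
6      -4        16           -64          256
8      -2        4            -8           16
10     0         0            0            0
12     2         4            8            16
14     4         16           64           256
Σx = 50, Σ(x − x̄)² = 40, Σ(x − x̄)³ = 0, Σ(x − x̄)⁴ = 544
$\bar{x} = 50/5 = 10$
$m_2 = \frac{\sum (x - \bar{x})^2}{N} = 40/5 = 8$
$m_3 = \frac{\sum (x - \bar{x})^3}{N} = 0/5 = 0$
$g_1 = \frac{m_3}{m_2 \sqrt{m_2}} = \frac{0}{8 \sqrt{8}} = 0$
g1 = 0, so the distribution is symmetrical.
$m_4 = \frac{\sum (x - \bar{x})^4}{N} = 544/5 = 108.8$
$g_2 = \frac{m_4}{m_2^2} - 3 = \frac{108.8}{8^2} - 3 = \frac{108.8}{64} - 3 = 1.7 - 3 = -1.3$
g2 = -1.3, so the distribution is platykurtic.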
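The same computation can be sketched in code. The function names below are hypothetical; each moment is the sum of the deviations from the mean raised to the given power, divided by N, and g1 and g2 are then formed from m2, m3 and m4 as in the formulas of this lesson.

```python
import math


def central_moment(values, power):
    """Sum of (x - mean)**power divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((value - mean) ** power for value in values) / n


def skewness(values):
    """g1 = m3 / (m2 * sqrt(m2))."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / (m2 * math.sqrt(m2))


def kurtosis(values):
    """g2 = m4 / m2**2 - 3."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / m2 ** 2 - 3


set_a = [6, 8, 10, 12, 14]

print(central_moment(set_a, 2))    # prints 8.0
print(central_moment(set_a, 4))    # prints 108.8
print(skewness(set_a))             # prints 0.0
print(round(kurtosis(set_a), 1))   # prints -1.3
```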
Activity 5
A. Compute the skewness of the following sets:
SET A.
9, 12, 18, 24, 6, 12, 10, 24, 28, 15, 36, 40, 27, 9, 10, 10,
24, 20, 19, 20
SET B:
9, 10, 12, 8, 10, 7, 10, 14, 10, 28, 30, 36
B. Decide whether the following set is flat or peaked by computing the kurtosis:
48, 56, 38, 20, 76, 84, 29, 37, 35, 58, 60, 64, 78, 100, 96
84, 80, 90, 90, 92, 78, 86, 72, 59, 60, 59, 54
75, 35, 45
Module 3. Inferential Statistics
Objectives: At the end of the module, the students should be able to:
1. Conduct tests of hypotheses, including
a. tests of relationship
b. tests of proportions
2. State the hypotheses of a given research problem using the null hypothesis.
3. Follow the steps in hypothesis testing in solving problems.
Lesson 1. Hypothesis Testing
A hypothesis is a conjectural or speculative statement about the magnitudes of the
population parameters using sample data. The purpose of hypothesis testing is to help one
reach a decision about a population by examining the data contained in a sample from that
population.
Steps in Hypothesis Testing
1. Statement of the hypothesis.
The hypothesis is stated in null form, that is, as an unbiased statement of no relation or difference. For example, consider the statement that 60% of the employees in a firm have had one or two years of college. The null hypothesis is the statement that the proportion is 60%; symbolically,
$H_0 : p = 0.60$ (null)
$H_1 : p \neq 0.60$ (alternate, two-tailed), or
$p > 0.60$ or $p < 0.60$ (alternate, one-tailed)
Another example is when a researcher wants to find out whether male workers have a higher work performance (mean performance = 96) than female workers (mean performance = 84). The null hypothesis is
$H_0 : \bar{x}_1 = \bar{x}_2$, where
$\bar{x}_1$ = mean of males
$\bar{x}_2$ = mean of females
2. Identify the test statistic appropriate for testing the null hypothesis.
3. Compute the test statistic.
4. Decide whether to accept or reject the null hypothesis.
To reject: the computed value of the test statistic should be greater than the tabular value of that test statistic,
$t_c > t_{tab}$ (likewise for r, F, and $\chi^2$).
To accept: the computed value of the test statistic should be lower than the tabular value,
$t_c < t_{tab}$ (likewise for r, F, and $\chi^2$).
[Figure: normal curve over a scale from -3 to 3, with an assumed tabular value of 2.47 and an assumed computed value of 3.29 lying beyond it, in the region of rejection.]
The computed value falls either in the region of rejection or in the region of acceptance, depending on how it compares with the tabular value. In the above figure, the null hypothesis is rejected because the computed value (3.29) exceeds the tabular value (2.47). Accept the alternate hypothesis that there is a significant difference.
5. Decide on the level of significance. In most cases, the 5% (0.05) and 1% (0.01) levels are used, for both two-tailed and one-tailed tests. When do you decide that the test is two-tailed or one-tailed?
Two-tailed (non-directional) test: used when the conjectural statement has no predetermined bias against one group, or the researcher has no idea that one group is better than the other (this is when testing differences between two groups). For example,
$H_0 : \bar{x}_1 = \bar{x}_2$
$H_1 : \bar{x}_1 \neq \bar{x}_2$
[Figure: normal curve with a central region of acceptance ($1 - \alpha = 0.95$) and a region of rejection in each tail ($\alpha/2 = 0.025$). The critical values are -1.96 and 1.96 at $\alpha = 0.05$, and -2.58 and 2.58 at $\alpha = 0.01$ ($\alpha/2 = 0.005$ in each tail).]
One-tailed (directional) test: used when there is a predetermined bias against one group. For example, the researcher wants to test if boys have a higher performance in Mathematics than girls.
$H_0 : \bar{x}_{boys} = \bar{x}_{girls}$
$H_1 : \bar{x}_{boys} > \bar{x}_{girls}$ (or $\bar{x}_{boys} < \bar{x}_{girls}$ for the opposite direction)
At $\alpha = 0.05$ ($1 - \alpha = 0.95$), the critical value is 1.645; at $\alpha = 0.01$, it is 2.33.
Note that these values at the 0.05 and 0.01 levels vary from one test statistic to another.
6. Reject or accept the null hypothesis given the degrees of freedom (df) and level of
significance.
7. Make your conclusion. Conclusions depend upon your decision on the null hypothesis. When you reject the null hypothesis, your conclusion is based on the alternate hypothesis, the accepted hypothesis. For example, if you rejected the null hypothesis
$H_0 : \bar{x}_1 = \bar{x}_2$
$H_1 : \bar{x}_1 \neq \bar{x}_2$
then you accept the alternate hypothesis ($H_1$), which means your conclusion is that there is a significant difference between the means of boys and girls.
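The critical values quoted in step 5 (1.96, 2.58, 1.645 and 2.33) are points on the standard normal curve, so they can be reproduced rather than looked up. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper name `z_critical` is made up for this illustration, and the values apply to z only, not to t, F or chi-square.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical z for a given significance level.

    A two-tailed test splits alpha between both tails;
    a one-tailed test puts all of alpha in one tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01):
    two = z_critical(alpha, two_tailed=True)
    one = z_critical(alpha, two_tailed=False)
    print(f"alpha = {alpha}: two-tailed z = {two:.3f}, one-tailed z = {one:.3f}")
```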
Activity 1.
A. State the Ho and H1 of the following problems:
1. The quality control department of a food processing firm found out that the
mean net weight per package of cereal must not be less than 30 ounces. In a
survey of 15 packages, the mean was 18 ounces.
2. One psychologist found out that the mean sociability index of sales
representatives is 48.6 while that of accountants of the same company is 32.8.
B. State some inferences that you know and then transform these into null form (using
symbols or descriptions).
Lesson 2. Test of Relationship
The study of paired measurements is closely related to correlation and prediction. Our concern here is with the problem of describing the degree or magnitude of the relation between two variables. The statistic used is the correlation coefficient, measured here by the Pearson product-moment correlation coefficient, or r. This statistic applies to interval-ratio variables.
Researchers may want to find out what relation exists between attitude (x) and work performance (y) of employees. For such paired measurements, the relation can be computed using the simple Pearson r.
Xi    Yi
x1    y1
x2    y2
x3    y3
x4    y4
x5    y5
The formula is defined as
$r = \frac{N \sum xy - \sum x \sum y}{\sqrt{[N \sum x^2 - (\sum x)^2][N \sum y^2 - (\sum y)^2]}}$
The degree of relationship between x and y may take values from -1 to +1. There is a perfect negative relation when r = -1 and a perfect positive relation when r = +1.
The scatter diagrams show degrees of relation between x and y.
[Figure: four scatter diagrams of y against x, showing a perfect positive relation (r = 1), a perfect negative relation (r = -1), a relation with r between -1 and 1, and no relation (r = 0).]
A positive relationship between attitude (x) and work performance (y) can be stated and interpreted as
a) The more positive the attitude a worker has, the better is his performance (r is positive).
b) The more negative the attitude, the less he performs.
x    y
+    +    Positive r (high)
-    -
+    -    Negative r
-    +
Consider the following paired measurements taken from a survey of 8 employees.
x = wage
y = cost of living
x (in thousands)    x^2    y (in thousands)    y^2    xy
8                   64     6                   36     48
4                   16     5                   25     20
6                   36     3                   9      18
12                  144    10                  100    120
24                  576    15                  225    360
18                  324    9                   81     162
14                  196    7                   49     98
3                   9      2                   4      6
Compute the following:
a. $\sum x = 89$
b. $\sum x^2 = 1365$
c. $\sum y = 57$
d. $\sum y^2 = 529$
e. $\sum xy = 832$
$r = \frac{8(832) - 89(57)}{\sqrt{[8(1365) - (89)^2][8(529) - (57)^2]}} = \frac{1583}{\sqrt{(2999)(983)}}$
r = 0.92 (highly correlated)
Therefore, the relationship is: the higher the salary of an employee, the higher his cost of living is, and the lower the salary, the lower is the cost of living.
When r is squared,
$r^2 = (0.92)^2 = 0.85$ (explained variance)
and $1 - r^2 = 1 - 0.85 = 0.15$ (unexplained variance).
This result can be interpreted as follows: 85% of the variation in cost of living is explained by wage, and 15% is attributable to factors other than wage.
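As a check on the hand computation, the raw-score formula for r can be coded directly. The following is a minimal Python sketch, not the SPSS procedure used later in the module; the function name `pearson_r` and the variable names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from raw scores, using the sums in the formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Wage (x) and cost of living (y), in thousands, for the 8 employees
wage = [8, 4, 6, 12, 24, 18, 14, 3]
cost_of_living = [6, 5, 3, 10, 15, 9, 7, 2]

r = pearson_r(wage, cost_of_living)
print(f"r = {r:.2f}, explained variance r^2 = {r * r:.2f}")
```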
Prediction
When two variables are highly correlated, prediction of one variable is possible from a knowledge of the other. The presence of a nonzero correlation (r) between x and y implies that if we know something about x, we know something about y, and vice versa. If knowing x implies some knowledge of y, a prediction of y from x is possible. The greater the value of the correlation between x and y, the more accurate the prediction of one variable from the other.
The Linear Regression of y on x
Using the idea of the equation of a straight line given by
y = bx + a, where
b = the slope of the line
a = constant (the y-intercept)
[Figure: straight line through points A and B, with right-angle vertex C; slope = AC/BC.]
The regression line for predicting y from x is given by
$y' = b_{yx}\,x + a_{yx}$, where
y' = predicted value of y
$b_{yx}$ = slope
$a_{yx}$ = constant
The values of $b_{yx}$ and $a_{yx}$ may be calculated as follows:
$b_{yx} = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - (\sum x)^2}$
$a_{yx} = \frac{\sum y - b_{yx} \sum x}{N}$
Similarly, predicting x from a knowledge of y is calculated by the following:
$x' = b_{xy}\,y + a_{xy}$, where
x' = predicted value of x
$b_{xy}$ = slope
$a_{xy}$ = constant (where the line intercepts the x-axis)
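$b_{xy} = \frac{N \sum xy - \sum x \sum y}{N \sum y^2 - (\sum y)^2}$
$a_{xy} = \frac{\sum x - b_{xy} \sum y}{N}$
For the wage and cost-of-living data, the regression of y on x gives
$b_{yx} = \frac{8(832) - 89(57)}{8(1365) - (89)^2} = \frac{1583}{2999} = 0.53$
$a_{yx} = \frac{\sum y - b_{yx} \sum x}{N} = \frac{57 - 0.53(89)}{8} = \frac{57 - 47.17}{8} = 1.23$
Therefore, y' = 0.53x + 1.23
If we want to predict y given x = 20, then
y' = 0.53 (20) + 1.23
= 10.60 + 1.23
= 11.83 or 12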
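The slope and constant of the regression of y on x can be computed from the same sums. This is a small Python sketch with an illustrative function name (`fit_line`). It does not round the slope to 0.53 before computing the constant, so the constant it prints differs slightly in the second decimal from the hand-worked 1.23, while the prediction still rounds to 12.

```python
def fit_line(xs, ys):
    """Slope b and constant a of the regression line y' = b*x + a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    constant = (sum_y - slope * sum_x) / n
    return slope, constant

# Wage (x) and cost of living (y), in thousands
wage = [8, 4, 6, 12, 24, 18, 14, 3]
cost_of_living = [6, 5, 3, 10, 15, 9, 7, 2]

b_yx, a_yx = fit_line(wage, cost_of_living)
predicted = b_yx * 20 + a_yx  # predicted cost of living at a wage of 20
print(f"y' = {b_yx:.2f}x + {a_yx:.2f}")
print(f"predicted y at x = 20: {predicted:.2f}")
```

Swapping the two arguments, `fit_line(cost_of_living, wage)`, gives the line for predicting x from y.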
Computation of correlations involving three or more variables involves multiple correlation (R). This is a tedious process of computation, so it is best left to the computer. The last chapter will give you the output, the correlation matrix, and how to analyze these results.
Note that for every pair of variables, one variable is the independent variable and the other is the dependent variable, that is, the variable that is influenced by the other. In our example about wage (x) and cost of living (y), y is the one influenced by wage; the amount we spend is always attributed to or influenced by the amount we earn.
To test if the degree of relationship is significant, compare the computed r with the tabular r given N - 2 degrees of freedom at the .05 or .01 level. For example, for the wage and cost-of-living data (N = 8, df = 6), the tabular values are r (.05) = 0.707 and r (.01) = 0.834. Since the computed r is greater than both tabular (or critical) values, we reject the null hypothesis,
$H_0 : \rho = 0$
$H_1 : \rho \neq 0$
Our conclusion is that there is a significant relationship between wage and cost of living. We can strongly say that cost of living is significantly influenced by the amount we earn.
Activity 2
A. Infer the relationships between pairs of variables as follows. Identify the independent
variable (IV) and dependent variable (DV).
1. A study conducted to analyze the relationship between production and
manufacturing expenses.
2. A firm wants to find out if sales volume is related to effective buying income.
3. A market analyst wants to find out if there exists a relation between traffic and
market sales strategy.
B. Compute the correlations of age and efficiency ratings of 20 assembly line employees.
X Y
44 61
44 41
45 89
43 76
40 79
52 67
43 73
46 94
53 96
43 77
51 60
50 78
61 74
47 82
62 70
34 70
51 60
48 67
51 72
57 80
C. From the result in B, predict the efficiency ratings for each of the following ages:
Age Efficiency Rating
29 __________
70 __________
24 __________
Lesson 3. Test of Independence and Proportions
In many research situations, we may wish to compare a set of observed frequencies with a set of theoretical frequencies. In such situations, the chi-square ($\chi^2$) is used and is defined by
$\chi^2 = \sum \frac{(o - e)^2}{e}$, where
o = observed frequency
e = expected frequency
Activity 3
A. In tossing a coin 300 times, the following results are obtained:
No. of Heads = 90 and No. of Tails = 210
Is the coin balanced or biased? Why or why not?
B. A certain business journal showed that 30% of Makati investors are Chinese, 12% Americans, 42% Japanese, 9% other nationalities, and only 3% Filipinos. A survey of 200 investors in Makati showed the following results:
Chinese 56
American 41
Japanese 60
Filipino 25
Others 18
Total = 200
Test the hypothesis that the observed (surveyed) frequencies are equal to the expected frequencies.
C. The following contingency table shows the relation between examination results (pass or fail) and ratings of job performance of 100 employees. Test the hypothesis that job performance is independent of examination results.
Rating
__________________________________________
Result : Below Average : Average : Above Average : Total
__________________________________________
Pass :   11              25        35              71
Fail :   15              7         7               29
__________________________________________
Total    26              32        42              100
Consider a market research study. Two brands of soap, A and B, were distributed to 200 households. After using them, the households were asked which brand they preferred. The results show 112 preferred A and 88 preferred B.
Ho: There is no difference in consumer preference for the two brands of soap, A and B; that is, a 50:50 split exists.
H1: One brand of soap is significantly preferred over the other.
              Brand A    Brand B    Total
Preference    112        88         200
Computation:
Brand    e      o      o - e    (o - e)^2    (o - e)^2 / e
__________________________________
A        100    112    12       144          1.44
B        100    88     -12      144          1.44
__________________________________
$\chi^2$ = 1.44 + 1.44 = 2.88
$\chi^2$ (tabular) with df = 1: 3.84 at .05, 6.64 at .01
Since the computed $\chi^2$ = 2.88 is less than the tabular value at the .05 level with df = 1, the null hypothesis is accepted. It can be concluded that no difference exists in consumer preference between brands A and B.
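The chi-square computation for the soap example can be written as a few lines of code. This is a minimal Python sketch; `chi_square` is an illustrative helper name, and the tabular value 3.84 is the one quoted in the example for df = 1 at the .05 level.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [112, 88]   # households preferring Brand A and Brand B
expected = [100, 100]  # a 50:50 split of 200 households under the null hypothesis

chi2 = chi_square(observed, expected)
tabular_05 = 3.84  # tabular chi-square, df = 1, .05 level (from the example)

print(f"chi-square = {chi2:.2f}")
print("reject Ho" if chi2 > tabular_05 else "accept Ho")
```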
Test of Independence
In a test of independence, two variables are involved. These are usually nominal variables. The question is whether the two variables are independent of each other. The data are arranged in a table called a contingency table.
Variable :   A1 :   A2 :   Total
________________________________________________
B1           20     10     30
B2           30     40     70
________________________________________________
Total :      50     50     100
For example, 200 males and females were asked if they were smoking. The results are shown in the following table (expected frequencies in parentheses).
             Response
Sex :    yes          no           Total
____________________________________
M        96 (66.6)    24 (53.4)    120
F        15 (44.4)    65 (35.6)    80
____________________________________
Total    111          89           200
Ho: Smoking is independent of sex
H1: Smoking is related or associated to sex
To compute the chi-square ($\chi^2$), we first compute the expected frequencies as follows:
O     E                        O - E    (O - E)^2    (O - E)^2 / E
96    120(111)/200 = 66.6      29.4     864.36       12.98
24    120(89)/200 = 53.4       -29.4    864.36       16.19
15    80(111)/200 = 44.4       -29.4    864.36       19.46
65    80(89)/200 = 35.6        29.4     864.36       24.28
--------------
$\chi^2$ = 72.91
The computed $\chi^2$ value is very much greater than the tabular value of $\chi^2$ = 6.64 at the .01 level with df = 1. The null hypothesis is rejected. Therefore, it can be safely concluded that smoking is associated with, or dependent on, sex; that is, males tend to smoke more than females.
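For a contingency table, each expected frequency is (row total x column total) / grand total, as in the computation above. The Python sketch below applies this to the smoking example; the function names are illustrative, and the observed counts are those used in the computation table (96, 24, 15, 65).

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Chi-square statistic for a contingency table of observed counts."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )

# Rows: male, female; columns: yes, no
smoking = [[96, 24], [15, 65]]

chi2 = chi_square_independence(smoking)
df = (len(smoking) - 1) * (len(smoking[0]) - 1)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The same function works for larger tables, such as the 2 x 3 table in Activity 3, where df = (rows - 1)(columns - 1).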
Lesson 4. Test of Differences Between Means
Tests of significance may be applied to the difference between means of
a. two independent samples
b. the same sample under two different conditions
c. three or more independent samples or groups
All these tests can only be applied to interval-ratio scales or variables.
Significance of the Difference Between Means of Two Independent Samples
Two independent groups with N1 and N2 cases and means $\bar{x}_1$ and $\bar{x}_2$ are assumed to be drawn from normally distributed populations with equal variances. If these assumptions are warranted, then the sample means can be tested for a significant difference using the statistic t (Student's t),
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$, where
$\bar{x}_1$ = mean of the first group
$\bar{x}_2$ = mean of the second group
$s_{\bar{x}_1 - \bar{x}_2}$ = standard error of the difference between the two means
Steps:
1. Compute the respective means for the two groups and then get their difference.
2. Compute the sums of squared deviations for the two groups and add them together to obtain the pooled variance ($s^2$). The formula is given by
$s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{N_1 + N_2 - 2}$
3. With the known pooled variance ($s^2$), compute the standard error of the difference between means:
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s^2}{N_1} + \frac{s^2}{N_2}}$
4. Now, you are ready to compute t:
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
The null hypothesis being tested here is that there is no significant difference between the two means,
$H_0 : \bar{x}_1 - \bar{x}_2 = 0$ or $\bar{x}_1 = \bar{x}_2$
$H_1 : \bar{x}_1 \neq \bar{x}_2$ for a non-directional or two-tailed test
$\bar{x}_1 > \bar{x}_2$ or $\bar{x}_1 < \bar{x}_2$ for a directional or one-tailed test.
The degrees of freedom (df) for this test are N1 + N2 - 2, tested for significance at the .05 or .01 level.
Let's consider an illustration.
Example 1. A sociologist wants to find out if the employees in two government agencies differ in their perception of the positive effects of the liberalized or global economy brought about by the APEC meeting. A test was administered on their degree of perception. The table below shows the scores of both groups:
Sample A    Sample B
24          30
30          28
36          26
20          45
42          20
38          15
25          21
27          27
48          20
40          32
            30
            21
___________________________
$H_0 : \bar{x}_A = \bar{x}_B$
$H_1 : \bar{x}_A \neq \bar{x}_B$
df = N1 + N2 - 2 = 10 + 12 - 2 = 20
Level of significance $\alpha$ = 0.05
Computation:
1. $\bar{x}_A$ = 330/10 = 33; $\bar{x}_B$ = 315/12 = 26.25
2.
A      (x - x̄A)^2    B      (x - x̄B)^2
24     81            30     14.06
30     9             28     3.06
36     9             26     0.06
20     169           45     351.56
42     81            20     39.06
38     25            15     126.56
25     64            21     27.56
27     36            27     0.56
48     225           20     39.06
40     49            32     33.06
                     30     14.06
                     21     27.56
_______________________________________________
Sum    748                  676.22
3. $s^2 = \frac{748 + 676.22}{10 + 12 - 2} = 1424.22/20 = 71.21$
4. $s_{\bar{x}_A - \bar{x}_B} = \sqrt{71.21/10 + 71.21/12} = \sqrt{7.12 + 5.93} = 3.612$
5. t = (33 - 26.25)/3.612 = 1.869
t tabular (.05) with df = 20 is 2.086 > t = 1.869
Decision: Accept the null hypothesis since the computed t is less than the tabular t (1.869 < 2.086).
Conclusion: Employees of both government agencies perceive the effect of liberalization of the Philippine economy equally. Although Sample B has a smaller mean than Sample A, this difference is not significant.
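The four steps for the independent-samples t-test can be followed in code. This is a minimal Python sketch with an illustrative function name (`pooled_t`); the scores are those listed in the computation table of Example 1, and 2.086 is the tabular t quoted there for df = 20 at the .05 level.

```python
from math import sqrt

def pooled_t(sample_1, sample_2):
    """Student's t for two independent samples, assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    # Sums of squared deviations from each group's own mean
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    pooled_variance = (ss_1 + ss_2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (mean_1 - mean_2) / standard_error, n1 + n2 - 2

sample_a = [24, 30, 36, 20, 42, 38, 25, 27, 48, 40]
sample_b = [30, 28, 26, 45, 20, 15, 21, 27, 20, 32, 30, 21]

t, df = pooled_t(sample_a, sample_b)
t_tabular = 2.086  # two-tailed, .05 level, df = 20 (from the example)

print(f"t = {t:.2f}, df = {df}")
print("reject Ho" if abs(t) > t_tabular else "accept Ho")
```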
t-test for Correlated Groups
The t-test for correlated groups applies to interval-ratio scores from the same sample or group exposed to two different conditions. These conditions may be:
a. pre-test and post-test comparison to see significant improvement, increase, or decrease after a treatment is given.
b. developing a parallel test to see the reliability of the test through time, or the validity of a test in terms of content, administered to the same group.
__________________________________
O1 ---------> X ---------> O2
pre-test treatment post-test
O2 - O1 = difference
__________________________________
The null hypothesis being tested here is that there is no significant gain or improvement after a treatment is given. This treatment can be time, a certain new drug, a new program, training, or any strategy to affect or influence attitude or any physical characteristic in the sample. The post-test score will show this change. If the difference is found to be significant, this change can be attributed to the treatment.
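Thus, the null hypothesis is stated:
$H_0 : \bar{x}_1 = \bar{x}_2$ or $\bar{x}_1 - \bar{x}_2 = \bar{D} = 0$
$H_1 : \bar{x}_1 \neq \bar{x}_2$ or $\bar{x}_1 - \bar{x}_2 = \bar{D} \neq 0$
The formula for t is given by either of these two:
$t = \frac{\sum D}{\sqrt{\frac{N \sum D^2 - (\sum D)^2}{N - 1}}}$ or
$t = \frac{\bar{D}}{s_e}$, where
$\bar{D}$ = mean difference
$s_e$ = standard error
$s_e^2 = s_D^2 / N$ and $s_D^2 = \frac{\sum (D - \bar{D})^2}{N - 1}$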
Example 2. A company manager wants to find out if work efficiency can be improved by applying Total Quality Management (TQM). He devised an instrument to measure work efficiency and administered it as a pre-test to 20 rank-and-file employees before TQM was applied. After six months, he administered the same test to see if there was a significant improvement. The following are the pre-test and post-test scores.
Pre-test Post-test D D^2
80 90 10 100
86 88 2 4
78 90 12 144
80 86 6 36
90 92 2 4
92 95 3 9
87 88 1 1
75 80 5 25
78 80 2 4
80 84 4 16
84 80 -4 16
84 80 -4 16
92 90 -2 4
90 85 -5 25
87 84 -3 9
80 86 6 36
82 80 -2 4
85 80 -5 25
84 89 5 25
___________________________________
$\sum D = 31$, $\sum D^2 = 507$
$t = \frac{31}{\sqrt{\frac{20(507) - (31)^2}{20 - 1}}}$
t = 31/21.98 = 1.41
t tabular (.05) = 2.093 > t = 1.41
Decision: Accept the null hypothesis.
Conclusion: There is no significant difference between the pre-test and post-test scores; that is, the TQM did not bring about a significant change in the work performance of employees.
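The first formula for the correlated-groups t needs only the sum of the differences, the sum of their squares, and N. The sketch below is a minimal Python version with illustrative function names; it is run on the totals reported in Example 2 (sum of D = 31, sum of D squared = 507, N = 20), and 2.093 is the tabular t quoted there.

```python
from math import sqrt

def paired_t_from_sums(sum_d, sum_d2, n):
    """t = sum(D) / sqrt((N * sum(D^2) - (sum D)^2) / (N - 1))."""
    return sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))

def paired_t(pre_scores, post_scores):
    """Same statistic, starting from paired pre-test and post-test scores."""
    differences = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return paired_t_from_sums(
        sum(differences), sum(d * d for d in differences), len(differences)
    )

t = paired_t_from_sums(sum_d=31, sum_d2=507, n=20)
t_tabular = 2.093  # two-tailed, .05 level, df = 19 (from the example)

print(f"t = {t:.2f}")
print("reject Ho" if abs(t) > t_tabular else "accept Ho")
```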
Analysis of Variance: One-Way and Two-Way
Analysis of variance is a method of dividing the variation observed in experimental data into different parts. It is used to test the significance of differences between means of three or more populations, applying a different treatment to each of k samples, each comprised of n members. We may wish to test the effectiveness of k treatments such as methods of instruction, methods of training, exposure to different programs, different dosages of drugs, and exposure to managers using different management styles. The problem of testing the significance of the differences between a number of means results from experiments that study variation in a dependent variable with variation in an independent variable.
Some Experimental Designs
A. One-way classification
Environmental Conditions
Group A Group B Group C
(Restricted) (Free) (Combination)
NA NB NC
$H_0 : \bar{x}_A = \bar{x}_B = \bar{x}_C$
$H_1$ : not all of $\bar{x}_A$, $\bar{x}_B$, $\bar{x}_C$ are equal
B. Two-way classification
Environmental Conditions
Position : Group 1 : Group 2 : Group 3 : Total

---------------------------------------------------------------------------------
Clerical cell – 1 cell – 2 cell – 3 R-1
_______________________________________________________
Supervisory cell – 4 cell – 5 cell – 6 R-2
_______________________________________________________
Managerial cell – 7 cell – 8 cell – 9 R-3
_______________________________________________________
TOTAL G-1 G-2 G-3 N
________________________________________________________
Data Analysis
Data for one-way ANOVA are analyzed with the use of the F-test for k groups, where k is equal to or greater than three (3). A summary table for analysis is made such as the following:
Summary Table for One-Way ANOVA
__________________________________________________________
Source of :       Sum of :    Degrees of :    Mean :      F
Variation         Squares     Freedom         Estimate
__________________________________________________________
Between Groups :              k - 1
Within Groups :               N - k
__________________________________________________________
TOTAL                         N - 1
__________________________________________________________
** p < 0.01
For two-way classification, the F-test is also used and analysis is made with the following summary table:
ANOVA
__________________________________________________________
Source of :       Sum of :    Degrees of :    Mean :      F
Variation         Squares     Freedom         Estimate
__________________________________________________________
Between Rows                  R - 1
Between Columns               C - 1
Interaction                   (R - 1)(C - 1)
Within Cells                  RC(n - 1)
__________________________________________________________
TOTAL                         nRC - 1
__________________________________________________________
The null hypotheses tested for two-way ANOVA consist of the following:
1. Ho1 : There are no significant differences between rows.
2. Ho2 : There are no significant differences between columns.
3. Ho3 : There is no significant interaction effect between rows and columns
under different conditions or treatment and groups.
Interaction effects can be understood through an illustrative example and the graphs of means shown below:
[Figure A. No Interaction Effects: mean scores (20 to 80) plotted against Methods 1, 2 and 3, with one line for each age group A, B and C.]
[Figure B. No Interaction Effects: mean scores (0 to 80) plotted against Methods 1, 2 and 3, with one line for each of the High, Average and Low groups.]
When there are significant differences found between groups in one-way ANOVA, say k = 4, then there is a need to find out which pairs of group means differ significantly. Thus, an a posteriori t-test, the Studentized Range or Newman-Keuls Method, the Tukey Method, or the Scheffé Method may be used. With four groups, there are six pairs of means to compare:
________________________________________
Comparison        F
________________________________________
1,2
1,3
1,4
2,3
2,4
3,4
________________________________________
Using the Scheffé Method, the following steps should be done:
1. Calculate the F-ratio between each pair of means using the within-group variance estimate, $s_w^2$:
$F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_w^2/n_1 + s_w^2/n_2}$
2. Consult the table of F and obtain the value of F required at the .05 or .01 level, for df1 = k - 1 and df2 = N - k.
3. Calculate the quantity F', which is F' = (k - 1)F, where F is the tabular value from step 2.
4. Compare the computed F with F'. To be significant at the required level, the computed F must be greater than or equal to F'.
Example 3. Scores in productivity for three groups of salesmen exposed to three different kinds of training program. Test the hypothesis:
Ho: There are no significant differences among the three methods of training program.
Group
_______________________________________
A     B     C
5     9     1
7     11    3
6     8     4
3     7     5
9     7     1
7           4
4
2
_______________________________________________________
n_j              8         5         6         N = 19
T_j              43        42        18        T = 103
x̄_j             5.38      8.40      3.00      T^2/N = 558.37
Σ x_ij^2         269       364       68        ΣΣ x_ij^2 = 701
T_j^2/n_j        231.13    352.80    54.00     Σ T_j^2/n_j = 637.93
_______________________________________________________
Sum of Squares
______________________________________________
Between    637.93 - 558.37 = 79.56
Within     701 - 637.93 = 63.07
______________________________________________
Total      701 - 558.37 = 142.63

ANOVA
__________________________________________________________
Source of :    Sum of :    df :    Mean :      F
Variation      Squares             Estimate
__________________________________________________________
Between        79.56       2       39.78       10.09 **
Within         63.07       16      3.94
__________________________________________________________
TOTAL          142.63      18
__________________________________________________________
** p < .01
The null hypothesis that there are no significant differences between groups of salesmen exposed to three different methods of training is rejected, since the computed F-ratio = 10.09 is greater than the tabular F-ratio = 6.23 at the .01 level with df = 2/16. Therefore, it can be concluded that exposure to the different training programs has significantly affected the productivity of salesmen; that is, Group B got the highest mean, followed by Group A and then Group C.
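The sums of squares and the F-ratio for a one-way classification can be computed directly from the raw scores, using the same quantities as the worksheet (T squared over N, the sum of all squared scores, and the sum of each group's squared total over its size). The following is a minimal Python sketch with an illustrative function name, run on the scores of Example 3.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n_total                      # T^2 / N
    sum_sq_scores = sum(x * x for g in groups for x in g)        # all squared scores
    sum_group_terms = sum(sum(g) ** 2 / len(g) for g in groups)  # sum of T_j^2 / n_j
    ss_between = sum_group_terms - correction
    ss_within = sum_sq_scores - sum_group_terms
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

group_a = [5, 7, 6, 3, 9, 7, 4, 2]
group_b = [9, 11, 8, 7, 7]
group_c = [1, 3, 4, 5, 1, 4]

ss_b, ss_w, df_b, df_w, f = one_way_anova([group_a, group_b, group_c])
print(f"SS between = {ss_b:.2f}, SS within = {ss_w:.2f}")
print(f"df = {df_b}/{df_w}, F = {f:.2f}")
```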
To find out which of the pairs differ significantly, the Scheffé Method is used:
Table for Comparison of Pairs Using Scheffé Method
______________________________
Pairs      F
______________________________
1,2        7.12 ns
1,3        4.93 ns
2,3        20.18 *
______________________________
* p < .05
Computation of F for each pair uses the formula
$F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_w^2/n_1 + s_w^2/n_2}$
Pairs
$F(1,2) = \frac{(8.40 - 5.38)^2}{3.94/5 + 3.94/8} = 7.12$
$F(1,3) = \frac{(5.38 - 3.00)^2}{3.94/8 + 3.94/6} = 4.93$
$F(2,3) = \frac{(8.40 - 3.00)^2}{3.94/5 + 3.94/6} = 20.18$
The required tabular F (.05) = 3.63 with df = 2/16, so F' = (k - 1)F = 2(3.63) = 7.26. The only pair whose F is higher than 7.26 is pair 2,3. This shows that a significant difference is found between Groups 2 and 3, while the differences between Groups 1 and 2 and between Groups 1 and 3 are not significant at the .05 level.
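The pairwise Scheffé comparison follows the four steps given earlier: compute F for each pair, then compare it with F' = (k - 1) times the tabular F. The Python sketch below uses the group means, group sizes and within-group variance estimate (3.94) of Example 3, and the tabular F of 3.63 quoted for the .05 level; the function and variable names are illustrative.

```python
def scheffe_f(mean_1, mean_2, n_1, n_2, within_variance):
    """F for one pair of means, using the within-group variance estimate."""
    return (mean_1 - mean_2) ** 2 / (within_variance / n_1 + within_variance / n_2)

means = {1: 5.38, 2: 8.40, 3: 3.00}  # Groups A, B and C
sizes = {1: 8, 2: 5, 3: 6}
within_variance = 3.94
k = 3
f_tabular = 3.63                      # .05 level, df = 2/16 (from the example)
f_prime = (k - 1) * f_tabular         # value the computed F must reach

results = {}
for a, b in [(1, 2), (1, 3), (2, 3)]:
    f = scheffe_f(means[a], means[b], sizes[a], sizes[b], within_variance)
    results[(a, b)] = f
    verdict = "significant" if f >= f_prime else "not significant"
    print(f"pair {a},{b}: F = {f:.2f} vs F' = {f_prime:.2f} -> {verdict}")
```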
Analysis of variance can also be two-way, with two independent variables analyzed simultaneously. This is also true of three-way classification, with three independent variables being analyzed at the same time. In so far as computations are concerned, it is recommended that computations be done on the computer using SPSS (Statistical Package for the Social Sciences). The next chapter will give you knowledge about interpreting and analyzing computer printouts using the different test statistics useful in research.
Module 4. Analysis of Computer Outputs
Objectives: At the end of this module, the students should be able to:
1. program data collected from surveys and experiments ready for encoding in the computer.
2. interpret results of computer printouts for both descriptive and inferential test statistics.
Lesson 1. Preparing the Program, the SPSS Program, for Encoding
You have learned the different types of variables and how to define them in order to be measurable. To a researcher and statistician, measuring, encoding, and labeling of variables for encoding in the computer are very important. The following tables show the program for computer encoding based on three research problems.
1. To what extent is the new training program acceptable to selected employees of ABC company?
2. Are there significant differences in the level of acceptability when respondents are grouped according to the following variables?
a. sex
b. age
c. educational attainment
d. position
3. Are there significant relationships between employees' acceptability level and the variables above?
Steps:
1. Identify the variables used in the research problems. These are:
Acceptability – measured in terms of a 5-point scale
4.45 - 5.00      100%
3.45 - 4.44      75%
2.45 - 3.44      50%
1.45 - 2.44      25%
1.44 and below   0%
Sex – measured using the following codes
Code
Male - 1
Female - 2
Age – measured in years (e.g. 36 years old)
Educational Attainment – measured using the following scores
Score
High School Graduate - 1
College Undergraduate - 2
College Graduate - 3
With MA Units - 4
MA degree holder - 5
Position – measured using the following codes
Score/Code
Rank and File - 1
Supervisory - 2
Managerial - 3
2. Enter the score in a table for each corresponding respondent for all variables.
Respondent Educ’l Scores in
No. Sex Age Attainment Position Acceptability
1 1 36 1 1 89
2 1 28 3 2 72
3 2 25 1 1 90
4 2 36 3 1 76
5 2 29 3 3 74
6 1 45 4 3 86
7 2 39 2 1 84
8 2 45 5 2 90
9 2 52 4 2 42
10 1 56 1 1 92
11 1 58 4 3 68
12 2 61 4 3 89
13 1 48 3 1 80
14 1 39 5 2 84
15 2 53 1 1 64
16 1 27 2 1 87
17 1 60 5 2 60
18 2 50 5 3 78
19 2 42 2 1 58
20 2 35 3 2 69
21 2 30 2 1 46
22 1 54 4 3 78
23 1 50 2 1 90
24 2 48 5 3 65
25 2 40 3 2 86
____________________________________________________
3. Encode all entries made (as in the above table) in the computer.
Open SPSS for Windows.
Command → Type in New Data
Input the Program → Type in the variable name
Program the appropriate test statistics (refer to the ANALYZE DATA menu).
Statistical tests (based on our research problems):
Problem #1 → Mean
Problem #2 → Group Mean
t-test → a) Sex – 1, 2
t-test → b) Age – 1 (39 and below), 2 (40 and above)
ANOVA → c) Education – 1 to 5
ANOVA → d) Position – 1 to 3
Problem #3 → Correlation Matrix
Printout
B. t-test for differences between sex and score for acceptability level
C. t-test for age differences in acceptability level
D. One-way ANOVA for level of acceptability in terms of differences in education
E. One-way ANOVA on the level of acceptability in terms of differences in position
F. Correlation matrix for all variables (sex, age, education, position and level of acceptability)
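The module relies on SPSS for these printouts. Purely as an illustration of what the "Group Mean" step computes, the sketch below groups the acceptability scores in the encoding table by sex code and reports the number of cases and the mean of each group. It is written in Python, and the names `records` and `group_means` are made up for this example; it is not SPSS syntax.

```python
from statistics import mean

# (sex code, acceptability score) for respondents 1 to 25 in the encoding table
records = [
    (1, 89), (1, 72), (2, 90), (2, 76), (2, 74),
    (1, 86), (2, 84), (2, 90), (2, 42), (1, 92),
    (1, 68), (2, 89), (1, 80), (1, 84), (2, 64),
    (1, 87), (1, 60), (2, 78), (2, 58), (2, 69),
    (2, 46), (1, 78), (1, 90), (2, 65), (2, 86),
]

def group_means(records):
    """Map each group code to (number of cases, mean score)."""
    groups = {}
    for code, score in records:
        groups.setdefault(code, []).append(score)
    return {code: (len(scores), mean(scores)) for code, scores in sorted(groups.items())}

summary = group_means(records)
for code, (count, average) in summary.items():
    print(f"sex code {code}: f = {count}, mean = {average:.2f}")
```

Replacing the first element of each pair with the age group, education, or position code gives the other group means in the same way.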
Lesson 2. Interpreting Results of Computer Outputs
The following are the tables taken from the printouts based on the program made for statistical analysis. What you should do is to present the data, using the different ways of presenting data you have learned (textual, tabular, and graphical).
Table 1. Mean Acceptability Level by Sex
___________________________________
Group :    f :    Mean :    SD :     Se
___________________________________
Female :   11     80.54     10.17    3.067
Male :     14     72.21     15.75    4.21
___________________________________
Total :    25
___________________________________
Table 2. Mean Acceptability Level by Age
___________________________________
Age Group :             f :    Mean :    SD :     Se
___________________________________
1 (39 yrs. & below) :   10     77.10     13.2     4.18
2 (40 yrs. & above) :   15     75.07     14.86    3.84
___________________________________
Total :                 25
___________________________________
Table 3. Mean Acceptability Level by Education
_______________________________________
Group :              f :    Mean
_______________________________________
1 (High School)      4      83.75
2 (Col Undergrad)    5      73.00
3 (Col Grad)         6      76.17
4 (W/MA/MS)          5      72.60
5 (W/Doctoral)       5      75.40
_______________________________________
Total :              25
_______________________________________
Table 4. Mean Acceptability Level by Position
_______________________________________
Group :              f :    Mean
_______________________________________
1 (Rank and File)    11     78.09
2 (Supervisory)      7      71.86
3 (Managerial)       7      76.86
_______________________________________
Total :              25
_______________________________________
Table 5. Result of t-test for Sex Differences on Level of Acceptability
____________________________________________
Statistics :        Grp 1 (Female) :    Grp 2 (Male)
____________________________________________
n (No. of Cases)    11                  14
x̄ (Mean)           80.54               72.21
x̄1 - x̄2           8.33
____________________________________________
t-value (df = 23) = 1.52 NS
____________________________________________
NS: p > .05 since t (tabular; .05) = 2.069
Activity 4.
A. Test the hypothesis that there are no differences in ratings of performance of employees coming from the different departments.
Dept. A          Dept. B
(Production)     (Accounting)
90    86         90    87
96    78         85    86
86    70         80    84
85    80         84    80
80    82         90    76
78    81         90    75
95    96         87    78
92    89         80    89
84    94         84    90
80    99         83    90
B. Test the hypothesis that power consumption significantly increased in 2008.
Power Consumption for Years 2006 and 2008, in thousands of KWH
Year    Jan    Feb    Mar     April    May     June    July    Aug     Sept.    Oct.    Nov.    Dec.
2006    4.5    3.6    5.8     9.2      8.7     6.9     7.2     7.5     8.2      8.4     9.2     9.4
2008    5.3    6.9    12.5    12.2     10.6    10.8    10.9    10.8    11.5     12.7    12.9    13.5

effort to obtain representative samples by including typical groups in the sample

(i.e. company manager, presidents, mentally – retarded children) presumably

because there are only few members.

c) Accident Sampling – the weakest form of sampling which considers available

samples at hand because of convenience.

the smaller the error (Kerlinger; 1986)

Selecting a Representative Sample

A good sample must be as nearly representative of the entire population as possible

Adequacy of sample size

2. 3,560 students of ABC School _ ___

3. 10,000 factory workers of ABC Co. _ ___

4. 5,420 farmers of Nueva Ecija _ ___

5 1,275 members of Women Federation _ ___