
Statistics

Statistics is the science of the


1. Collection
2. Organization
3. Interpretation of data.

Uses of Statistics


(1) Statistics helps in providing a better
understanding and exact description of a
phenomenon of nature. (Example: patients' reactions
towards novice nurses.)
(2) Statistics enables proper and efficient planning
of a statistical inquiry in any field of study. (Example:
is vitamin C effective in preventing colds?)



(3) Statistics helps in collecting appropriate
quantitative data.
(4) Statistics helps in presenting complex data in a
suitable tabular, diagrammatic and graphic form for an
easy and clear comprehension.
(5) Statistics helps in understanding the nature and
pattern of variability of a phenomenon through
quantitative observations. (What are the important
risk factors for heart disease, for bone cancer, for cot
(crib) death?)

(6) Statistics helps in drawing valid inferences about
the population parameters from the sample data, along
with a measure of their reliability.

Example:

4 out of 5 dentists recommend Dentyne
Almost 85% of lung cancers in men and 45% in
women are tobacco-related.
Condoms are effective 94% of the time.


Statistical analysis:
Purposes of statistical analysis
To summarize large data sets into an
understandable and meaningful form.
Ex.: "79.48% of all statistics are made up on
the spot."



It describes the data exactly by using
percentages and frequencies.
Ex.: In a room of 30 people there is about a 70%
chance that at least two people will
share the same birthday.
It helps to identify the causal
factors underlying complex phenomena by
using inferential statistics.
Data are collected and analyzed in order to
predict or make inferences about
situations that have not been measured in full.
It helps in making generalizations of the
results from the collected data.
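The shared-birthday figure quoted above can be checked directly. The sketch below is a minimal illustration, assuming 365 equally likely birthdays and ignoring leap years; the function name is hypothetical and not part of the original slides.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes every day is equally likely and ignores leap years.
    """
    p_all_different = 1.0
    for k in range(people):
        # The k-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


p30 = shared_birthday_probability(30)
print(f"{p30:.3f}")
```

Under these assumptions a room of 30 people gives a probability of roughly 0.71, and the probability first exceeds one half at 23 people.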


Levels / scales of measurement
Nominal Level:
The lowest level of measurement.
Data can be placed in two or more classes
or categories, which are descriptive in
form.
Nominal data are categorized with
different names for different groups, such as
gender, blood type, and marital status.
These categories are discrete and
non-continuous.
Numeric codes are assigned for
coding and describing the data;
for example, we may code males as 1 and
females as 2.

Example:
Sex        Male (boy), Female (girl)
Religion   Hindu, Muslim, Christian, etc.
Activity   Active, Non-active
Example: Census data of 678 villages:

Census year   Men      Women
1971          30021    37889
1981          61200    68256

The Ordinal Level Measurement

The second or ordinal level of measurement is
characterized by variables that are assessed
incrementally.
For example:
1. Pain can be measured as slight, moderate,
or intense.
2. Exercise can be measured in terms of
frequency that is, often, sometimes, or never.




The intervals between categories cannot be considered equal.
It shows the level of ranking.
There is no implication of equal distances
between groups on the scale.
Values of the variable can be rank-ordered from
highest to lowest.


A ranking of patient behaviors according to how
often they occur during a given period.
Example: always, mostly, sometimes, rarely, and
never.
Other variables that can be treated as ordinal data:
depression, skin turgor, helplessness, grief, stress,
confusion, nausea, self-esteem, hope.
Ordinal data can be presented graphically.


Example 1. Choice of screening among men and women (bivariate data):

[Figure: chart comparing men and women; vertical axis marked from 10 to 50.]
Example 2. Assessing nursing care: categories are
ranked in order from highest-quality care to very
poor care, on a scale from 1 to 5.

1  Care is of the highest quality; could not be better.
2  Care is very good for the most part; a few things could be improved.
3  Average care; no better or worse than could be expected.
4  Care is below average; some services are poor.
5  Very poor care; many services could be improved.

Interval or ratio data can be converted into
ordinal data.
Example: 81 to 100  Excellent
         61 to 80   Good
         41 to 60   Average
         21 to 40   Poor
         Below 21   Very poor
Data can be categorized in rank order, but the categories cannot be
added up or averaged to get a mean.
Percentiles and rank-order correlation can be computed.
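The conversion from a numeric score to an ordinal category can be sketched as a simple lookup. This is an illustrative example only; the function name and the sample scores are hypothetical, while the cut-off points are the ones listed above.

```python
def score_to_category(score):
    """Map a score on a 0-100 scale to the ordinal category from the example."""
    if score >= 81:
        return "Excellent"
    if score >= 61:
        return "Good"
    if score >= 41:
        return "Average"
    if score >= 21:
        return "Poor"
    return "Very poor"


# Hypothetical scores, used only to show the mapping.
sample_scores = [92, 75, 41, 40, 12]
categories = [score_to_category(s) for s in sample_scores]
print(categories)
```

Once converted, the categories keep their order but not their distances, which is why a mean of the categories is not meaningful.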



The Interval Level of Measurement:
This level is quantitative in nature.
Increments on the scale can be
measured, and they are equidistant.
The scale does not have an absolute or true zero.

Between a temperature of 97 and 100
degrees Fahrenheit, there are three equal
increments of one degree each.
The interval between 10 degrees F and
5 degrees F is 10 - 5 = 5 degrees.


Ratio Level of Measurement:
The highest level of data.
The ratio level has a meaningful zero point.
All arithmetic operations are permissible.
One can meaningfully add, subtract, multiply,
and divide numbers on a ratio scale.
All the statistical procedures suitable for interval
level data are also appropriate for ratio level data.
Obvious ratio scales include time, length, and
weight.




Ex.: Monthly income and occupational background of
respondents:

Monthly income   Farmer   Carpenter   Blacksmith   Total
4001-5000          10         5           1          16
3001-4000          70        35          32         137
2001-3000          15        30          47          92
2000 and less       5        10          20          35
Total             100        80         100         280






Measurement levels

Weight Ratio
Sex Nominal
Emotional status Ordinal
Body temperature Interval

TYPES OF STATISTICS
Descriptive statistics: describe what the data show.
Inferential statistics: used to reach conclusions that go beyond the data at hand.




1. Descriptive Statistics

Used to describe the basic features of the data
in a study.
Expressed as numbers, percentages, averages (mean), or
indications of variability and of the relationship
between two sets of data (correlation).
Measures that describe a population are called parameters;
measures that describe samples (or subsets) are called statistics.


Types of Descriptive statistics
The distribution : Frequency distribution,
percentage
Measures of central Tendency: mean,
median, mode.
Measures of Variability: range, SD ,
average deviation and inter quartile range
Measures of relationship between 2 or
more variables: correlation coefficient.

The Mean or average is probably the
most commonly used method of
describing central tendency.
The Median is the score found at the
exact middle of the set of values
The mode is the most frequently
occurring value in the set of scores.


3. Measures of variability:
These describe how spread out the values
are in a distribution.

Range: The distance between the
highest and lowest values in a
group of values is called the range.
Standard Deviation: This statistic
describes how values vary about
the mean of the distribution.



The interquartile range:
To determine the interquartile range,
the score at the 25th percentile is
subtracted from the score at the 75th
percentile.

The variance:
It is the average of the squared
differences of each score from the
mean.

4. Measures of Relationship: Correlations
are computed to determine how one
variable is related to another variable. The
correlation coefficient can vary between
-1.00 and +1.00. These two numbers represent
the extremes of a perfect relationship. A
correlation coefficient of +1.00 indicates a
perfect positive relationship, -1.00 indicates a
perfect negative relationship, and 0.0
indicates the absence of any relationship.

Calculated by:
Karl Pearson's coefficient of correlation
(or simple correlation): measures the
degree of relation between two
variables.
Charles Spearman's coefficient of
correlation: determines the extent to
which two sets of rankings are similar
or dissimilar.


Inferential statistics

For example: A Board of Examiners may want to
compare the performance of 1000 students who
completed an examination. Of these, 500 students are
girls and 500 students are boys. The 1000 students
represent our "population".
Inferential statistics can:
Provide more detailed information than descriptive
statistics.
Investigate differences between and among groups.
Yield insight into relationships between variables.
Reveal causes and effects and make predictions.
Generate convincing support for a given theory.
Help in applying the results to the population, through the level of
significance.
Be generally accepted, owing to their widespread use in business
and academia.


Two areas of statistical inference.
1.Estimation
2.Hypothesis testing
I step: Estimation of the standard error of the mean.
Sample means carry some error as
estimates of the population mean, and this error must be estimated in order to
generalize from the study results.
There are two types of errors:
Type I error: Accepting the experimental hypothesis (rejecting the null
hypothesis) when the null hypothesis is true.
Type II error: Accepting the null hypothesis when the
experimental hypothesis is true.

II step: Apply specified test:

Features of Null Hypothesis

III step : Probability:

Example
What is probability of getting heads when you flip a
coin?
What is probability of rolling a 4 with a die?
What is probability of not rolling a 4?

IV step Confidence levels:

Types of Inferential Statistics

Parametric: the major tests, used when the sample size is large and
the data are at the interval or ratio level.

Nonparametric (less powerful): used with small sample
sizes or when the data are at the nominal or ordinal level of
measurement.


Parametric tests
1. ANOVA
2. ANCOVA
Non-Parametric tests
1. Chi-square test
2. Mann-Whitney U test
3. Sign test
4. Median test
DESCRIPTIVE STATISTICS
Types of Descriptive statistics
1. The distribution : Frequency distribution,
percentage
2. Measures of central Tendency: mean,
median, mode.
3. Measures of Variability: range, SD ,
average deviation and inter quartile range
4. Measures of relationship between 2 or
more variables: correlation coefficient.


Frequency distribution:

Example:
Frequency distribution table (tells only how many
members of the sample fall into each category)

Percentage distribution table (the distribution of the
sample expressed per 100)


Frequency distribution - Univariate table

Frequency Distribution of Weights by Age
n = 100

Wt in kg     f      %
80-84       30     30
75-79       50     50
70-74       18     18
<70          2      2
Total      100    100
Frequency distribution - Bivariate table
Frequency Distribution of Population of Villagers by
Age and Sex
n = 6380

Age      Male    Female   Total
0-4       200      180      380
5-14      600      800     1400
15-44    1000     1200     2200
45-60     800      900     1700
60-70     300      400      700
Total    2900     3480     6380
[Figure: frequency distribution bar chart.]
Cumulative frequency distributions:

Cumulative frequency distributions are useful to
show what proportion of a dataset lies above or
below certain limits, e.g:

What percentage of this class scored the required
pass mark of 41%?

Marks     Frequency   Cumulative f   Cumulative %
91-100        1           50            100
81-90         9           49             98
71-80         9           40             80
61-70        13           31             62
51-60         7           18             36
41-50         6           11             22
31-40         4            5             10
21-30         0            1              2
11-20         0            1              2
1-10          1            1              2
N = 50
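The cumulative columns of the marks table can be rebuilt from the frequencies alone. The following is a minimal sketch, with the class intervals entered from lowest to highest; the variable and function names are illustrative.

```python
# Class intervals from the marks table, lowest first, with their frequencies.
classes = ["1-10", "11-20", "21-30", "31-40", "41-50",
           "51-60", "61-70", "71-80", "81-90", "91-100"]
frequencies = [1, 0, 0, 4, 6, 7, 13, 9, 9, 1]


def cumulative(values):
    """Return the running totals of a list of frequencies."""
    total = 0
    running = []
    for v in values:
        total += v
        running.append(total)
    return running


cum_f = cumulative(frequencies)
n = cum_f[-1]
cum_pct = [100 * c / n for c in cum_f]

# Students below the pass mark of 41 are those in classes up to 31-40.
below_pass = cum_f[classes.index("31-40")]
pct_passed = 100 * (n - below_pass) / n

for label, f, cf, cp in zip(classes, frequencies, cum_f, cum_pct):
    print(f"{label:>7} {f:>3} {cf:>3} {cp:>5.0f}")
print("Percentage scoring 41 or more:", pct_passed)
```

Reading the table this way, 5 of the 50 students fall below the pass mark, so 90% of the class scored 41 or more.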
Percentiles are points on a frequency
distribution below which a specified
percentage of cases in the distribution fall,
e.g., a person scoring at the 75th percentile
did better than 75% of those in the
distribution. The position of the Pth percentile
among n ordered values is:

\[ \frac{(n+1)\,P}{100} \]

Points to remember for frequency
distribution:
Arrange class intervals correctly.

No two class intervals should share the same
value (e.g., 10-14 years, 15-19 years, and so on).

Class intervals should be of the same size. If
the interval is four, it should remain four for
every class in the table.

MEASURES OF CENTRAL TENDENCY:
Mean
Median
Mode

Mean. The mean or average is probably the most
commonly used and simplest method of describing central
tendency. It is represented by x̄ (the sample mean).
To compute the mean, add up all the values and
divide by the number of values. For example, the mean or
average quiz score is determined by summing all the scores
and dividing by the number of students taking the exam.
For example, consider the test score values:

15, 20, 21, 20, 36, 15, 25, 15
The sum of these 8 values is 167, so the mean
is 167/8 = 20.875.
The mean is the sum of scores divided by the
number of cases/values/subjects.

\[ \text{Mean} = \frac{\text{sum of individual values}}{\text{number of observations}} \]

Method of calculation (method 1)

Wt in kg   Frequency (f)   Midpoint (x)    fx
65-69            6              67         402
60-64            5              62         310
55-59           10              57         570
50-54            8              52         416
45-49            9              47         423
40-44            8              42         336
35-39            7              37         259
Total        N = 53                   Σfx = 2716

\[ \bar{x} = \frac{\sum fx}{N} = \frac{2716}{53} = 51.245 \]
Procedure:
Systematically arrange the scores under
appropriate class intervals and count the frequency (f) of each.
Find the midpoint (x) of each class
interval.
Compute fx by multiplying each frequency
by its midpoint.

Find the total number of scores, n.

Find Σfx (i.e., the sum of all fx).

Compute the mean: x̄ = Σfx / n.


(or)


Computation of mean from grouped
data (method 2)

Scores   Frequency (f)    x'    fx'
65-69          1           5      5
60-64          3           4     12
55-59          4           3     12
50-54          7           2     14
45-49          9           1      9
40-44         11           0      0
35-39          8          -1     -8
30-34          4          -2     -8
25-29          2          -3     -6
20-24          1          -4     -4
Total      N = 50            Σfx' = 26

A = assumed mean or arbitrary reference
point. We take the class interval with the highest
frequency, here 40-44, and use its
midpoint, i.e., 42.

To calculate x', take the class interval with the highest frequency
as the 0 point; number the classes above it +1, +2, +3, ... and
those below it -1, -2, -3, etc.

i = class interval width.
f = frequency.
x' = deviation from the arbitrary reference point, in class-interval units.

\[ \bar{x} = A + \frac{\sum f x'}{N} \times i = 42 + \frac{26}{50} \times 5 = 42 + 2.6 = 44.6 \]
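As a cross-check, the assumed-mean calculation can be compared with the direct midpoint method on the same grouped data. This is a sketch only; the function names and the tuple layout (lower limit, upper limit, frequency) are assumptions made for the illustration.

```python
# (lower limit, upper limit, frequency) for each class of the method 2 table.
classes = [
    (20, 24, 1), (25, 29, 2), (30, 34, 4), (35, 39, 8), (40, 44, 11),
    (45, 49, 9), (50, 54, 7), (55, 59, 4), (60, 64, 3), (65, 69, 1),
]


def direct_mean(classes):
    """Method 1: sum of f * midpoint divided by N."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return sum_fx / n


def assumed_mean_method(classes, assumed_mean, width):
    """Method 2: A + (sum of f * x' / N) * i, with x' in class-width units."""
    n = sum(f for _, _, f in classes)
    sum_fx_dev = sum(
        f * (((lo + hi) / 2 - assumed_mean) / width) for lo, hi, f in classes
    )
    return assumed_mean + sum_fx_dev / n * width


print(direct_mean(classes))
print(assumed_mean_method(classes, assumed_mean=42, width=5))
```

Both methods give the same mean; the assumed-mean version simply keeps the hand arithmetic small.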

Uses of Mean

Needed to compute other statistics such as
the SD, t test, etc.

N.B.: The mean is distorted by extremely high
or low scores.

Median: It is the point in a distribution with
an equal number of items on either side of it.
Or

If the items are arranged in ascending or
descending order, the value of the central item in
that series is taken as the median. The median divides
the scores into two equal parts. The median is the
score found at the exact middle of the set of
values. One way to compute the median is to list
all scores in numerical order, and then locate the
score in the center of the sample. For example, if
there are 500 scores in the list, the median lies
midway between scores #250 and #251. If we order the 8 scores shown
above, we would get:
15, 15, 15, 20, 20, 21, 25, 36
There are 8 scores, and scores #4 and
#5 represent the halfway point.
Since both of these scores are 20,
the median is 20. If the two middle
scores had different values, you
would have to interpolate to
determine the median.

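For ungrouped data with an even number of items it can be calculated as

\[ \text{Median} = \frac{\text{value of the } \left(\tfrac{n}{2}\right)\text{th item} + \text{value of the } \left(\tfrac{n}{2} + 1\right)\text{th item}}{2} \]

For example, if the two middle items are 7 and 8, the median is (7 + 8) / 2 = 7.5.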

For grouped data:
n = 50
Score     f
65-69     1
60-64     3
55-59     4
50-54     7
45-49     9
40-44    11
35-39     8
30-34     4
25-29     2
20-24     1
Total    50

\[ Md = L + \frac{\frac{n}{2} - f}{F} \times i \]

L = Exact lower limit of the median class.

The median class is located by computing n/2 = 50/2 = 25,
which means the median falls somewhere
between the 25th and 26th items. So we
add up the frequencies of the class intervals
starting from the lowest class (the bottom rows of the
table above) and take the class
interval that contains the 25th and 26th items. Here
that is the 40-44 class interval,
so we take the exact lower limit of that class
interval.

L = 39.5
f = Total of all frequencies below
the median class,

i.e., 1 + 2 + 4 + 8 = 15


F = frequency of the median class = 11

i = class interval width = 5

n = total of all frequencies = 50

\[ Md = 39.5 + \frac{\frac{50}{2} - 15}{11} \times 5 = 39.5 + \frac{10 \times 5}{11} = 39.5 + \frac{50}{11} = 39.5 + 4.55 = 44.05 \]
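The interpolation for the grouped median can be sketched in a few lines. The function below is illustrative (its name and the tuple layout are assumptions); it expects the classes in ascending order with integer class limits, so that the exact lower limit is the stated lower limit minus 0.5.

```python
# (lower limit, upper limit, frequency), in ascending order of score.
classes = [
    (20, 24, 1), (25, 29, 2), (30, 34, 4), (35, 39, 8), (40, 44, 11),
    (45, 49, 9), (50, 54, 7), (55, 59, 4), (60, 64, 3), (65, 69, 1),
]


def grouped_median(classes):
    """Md = L + ((n/2 - f) / F) * i for classes with integer limits."""
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    below = 0  # f: cumulative frequency below the current class
    for lo, hi, freq in classes:
        if below + freq >= half:
            exact_lower_limit = lo - 0.5   # L
            width = hi - lo + 1            # i
            return exact_lower_limit + (half - below) / freq * width
        below += freq
    raise ValueError("no classes supplied")


print(round(grouped_median(classes), 2))
```

For the table above the median class is 40-44, with 15 scores below it, which reproduces the hand calculation.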

The median is computed when the exact midpoint
or 50% point of the distribution
is desired. Extreme scores disturb the
mean, whereas they do not disturb
the median.
Some statistical computations, such as
chi-square, require the median when the
scores are classified as above or below the
median.

The median represents the 50th
percentile of the distribution, as an
equal number of cases lie
above and below the 50th
percentile.
The median can't be used as freely as the
mean for further statistical
computations.


Mode:

It is the most frequently occurring
observation in a series.
Ex.: 6, 7, 8, 10, 12, 3, 2, 1, 8, 8.

The most frequently occurring value in this
example is 8.
It is defined as the value of the variable
that occurs most frequently.
It is calculated when the most typical value
is wanted.

The modal value is the highest bar in a
histogram.
Ex.: size of dress, size of shoes, average
wage.
Empirical relationship: Mo = 3 Md - 2 M (mode = 3 x median - 2 x mean).
Note: If the distribution is truly normal
(i.e., bell-shaped) the mean, median
and mode are all equal to each other.
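The three measures of central tendency can be checked with Python's standard `statistics` module. This is a small sketch using the eight quiz scores and the ten-value mode example given earlier in this section; the variable names are illustrative.

```python
import statistics

quiz_scores = [15, 20, 21, 20, 36, 15, 25, 15]
mode_example = [6, 7, 8, 10, 12, 3, 2, 1, 8, 8]

mean_score = statistics.mean(quiz_scores)      # sum of scores / number of scores
median_score = statistics.median(quiz_scores)  # average of the two middle scores
mode_score = statistics.mode(quiz_scores)      # most frequent score

print(mean_score, median_score, mode_score)
print(statistics.mode(mode_example))
```

For the quiz scores the mean (20.875), median (20) and mode (15) differ, which is what one expects from a distribution that is not symmetric.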

MEASURES OF VARIABILITY
Measures of central tendency (mean, median, and mode)
let us compare the quality or characteristics of a whole
group, or of two groups, by a single number. They do not tell us
much about the distribution of scores in a series or of
characteristics in a group. Hence measures of central
tendency provide an insufficient basis for the comparison of two
or more frequency distributions or sets of scores.


For example:
Test scores of group A (boys): 40, 38, 36,
17, 20, 19, 18, 3, 5, 4
Test scores of group B (girls): 19, 20, 22, 18, 21, 23, 17, 20,
22, 18
Both groups have a mean of 20, but the scores of group A are far more spread out.
Range (R)
Ex.: Diastolic BP of nine patients:
76, 80, 70, 90, 96, 94, 84, 98, 72
Range = 98 (highest score) - 70 (lowest
score) = 28
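A short sketch of the range calculation, applied to the two groups of test scores and to the diastolic BP readings listed above. The helper name is illustrative.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


group_a = [40, 38, 36, 17, 20, 19, 18, 3, 5, 4]
group_b = [19, 20, 22, 18, 21, 23, 17, 20, 22, 18]
diastolic_bp = [76, 80, 70, 90, 96, 94, 84, 98, 72]

for name, scores in (("A", group_a), ("B", group_b)):
    mean = sum(scores) / len(scores)
    print(f"Group {name}: mean = {mean}, range = {value_range(scores)}")

print("Diastolic BP range:", value_range(diastolic_bp))
```

The two groups share the same mean, yet their ranges are very different, which is the reason a measure of variability is reported alongside the average.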



The computation of this measure
of variability is recommended:

1. When we simply need to know the
highest and lowest scores or the total
spread.
2. When the group or distribution is too
small.
3. When we want to know about the
variability within the group
quickly.

4. When we require speed and ease in the
computation of a measure of variability; and
5. When the distribution of the scores of the
group is such that the computation of other
measures of variability is not much useful.

Quartile deviation (Q)

\[ Q = \frac{Q_3 - Q_1}{2} \]

Q_1 and Q_3 represent the 1st and 3rd
quartiles of the distribution under
consideration.

It is also called the semi-interquartile
range. The quartiles divide the ordered scores
into four equal parts: Q_1 is the point below which
25% of the scores fall, and Q_3 is the point below which
75% of the scores fall.


The use of this measure is recommended

1. When the distribution is skewed
containing a few very extreme scores
2. When the measure of central tendency
is available in the form of median
3. When the distribution is truncated
(irregular) or has some indeterminate
end values

4. When we have to determine the
concentration around the middle 50
percent of the cases

5. When the various percentiles and
quartiles have been already computed

Earnings of Two Employees

Earnings of Employee A/Day   Earnings of Employee B/Day
$200                         $200
$210                         $20
$190                         $400
$201                         $0
$199                         $390
$195                         $10
$205                         $200
$200                         $380
The mean, the median, and the mode of
each employee's daily earnings all equal
$200. Yet, there is significant difference
between the two sets of numbers. For
example, the daily earnings of Employee A
are much more consistent than those of
Employee B, which show great variation.
This example illustrates the need for
measures of variation or spread.
Employee A's earnings have considerably less
deviation than do Employee B's. The
variance is defined as the sum of the squared
deviations of n measurements from their
mean divided by (n - 1).

So, from the table of employee earnings, the
mean for Employee A is $200, and the
deviations from the mean are as follows:

0, 10, -10, 1, -1, -5, 5, 0

The squared deviations from the mean are,
therefore, the following:

0, 100, 100, 1, 1, 25, 25, 0

The sum of these squared deviations equals 252.
Dividing by (n - 1) yields 252/7, which equals 36.

The mean for Employee B is also $200, and the
squared deviations are the following:

0, 32,400, 40,000, 40,000, 36,100, 36,100, 0, 32,400

The sum of these squared deviations equals
217,000. Dividing by (n - 1) yields 217,000/7,
which equals 31,000.
Although they earned the same totals, there
is a marked difference in variance between
the daily earnings of the two employees.
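The variance comparison can be reproduced from the earnings table. The sketch below uses the (n - 1) definition of the sample variance given in the text; the function name is illustrative.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


employee_a = [200, 210, 190, 201, 199, 195, 205, 200]
employee_b = [200, 20, 400, 0, 390, 10, 200, 380]

print("Employee A variance:", sample_variance(employee_a))
print("Employee B variance:", sample_variance(employee_b))
```

Both employees average $200 a day, but the variance of Employee B's earnings is several hundred times larger than that of Employee A's.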


4. Standard deviation:
Definition:
It is the square root of the average of
the squares of the deviations of each
score from the mean.
Standard deviation is a statistical
measure of spread or variability. The
standard deviation is the root mean
square (RMS) deviation of the
values from their arithmetic mean.

Its symbol is σ (the Greek letter sigma).
Uses:
When we need the most reliable measure of
variability.
It is useful while calculating the correlation
coefficient and the SED (significance of the difference
between means).
It is useful when the distribution of the sample is
normal or near normal.
When the measure of central tendency is available in
the form of the mean.


FORMULA:

\[ S = \sqrt{\frac{\sum (X - M)^2}{n - 1}} \]

Σ = Sum of
X = Individual score
M = Mean of all scores
n = Sample size (number of scores)



Example 1: Standard Deviation, Method 1
To find the standard deviation of
1, 2, 3, 4, and 5.

Step 1: Calculate the mean and
deviations.

X    M    (X-M)   (X-M)^2
1    3     -2        4
2    3     -1        1
3    3      0        0
4    3      1        1
5    3      2        4

Step 2: Find the sum of (X-M)^2:
4 + 1 + 0 + 1 + 4 = 10
Step 3: N = 5, the total number of values. Find N - 1:
5 - 1 = 4
Step 4: Now find the standard deviation using the formula:

\[ S = \sqrt{\frac{10}{4}} = \sqrt{2.5} = 1.58113 \]
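The four steps can be written as a short function. This is a minimal sketch of the (N - 1) sample standard deviation described in the steps above; the function name is illustrative.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((X - M)^2) / (N - 1))."""
    n = len(values)
    mean = sum(values) / n                              # Step 1: mean
    sum_sq_dev = sum((x - mean) ** 2 for x in values)   # Step 2: sum of (X - M)^2
    return sqrt(sum_sq_dev / (n - 1))                   # Steps 3 and 4


print(round(sample_sd([1, 2, 3, 4, 5]), 5))
```

The same function can be used to check any of the hand-worked standard deviation exercises in this section.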









Example 2: Standard Deviation

Find the standard deviation of
the values 12, 7, 11 and 6.
Solution:
(i) Find the mean and deviations.
X = 12, 7, 11, 6

\[ M = \frac{12 + 7 + 11 + 6}{4} = \frac{36}{4} = 9 \]

(ii) Find the sum of (X - M)^2.

X      X - M          (X - M)^2
12     12 - 9 = 3        9
7      7 - 9 = -2        4
11     11 - 9 = 2        4
6      6 - 9 = -3        9
Total                   26

N = 4, the total number of values.
Then N - 1 = 4 - 1 = 3

(iii) Find the standard deviation using the formula.

\[ S = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = \sqrt{\frac{26}{3}} = \sqrt{8.67} = 2.94 \]

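Example 3: Standard Deviation
Find the standard deviation of the values 14, 11, 9, 7,
4 and 3.
Solution:
(i) Find the mean and deviations.
X = 14, 11, 9, 7, 4, 3

\[ M = \frac{14 + 11 + 9 + 7 + 4 + 3}{6} = \frac{48}{6} = 8 \]

(ii) Find the sum of (X - M)^2:
36 + 9 + 1 + 1 + 16 + 25 = 88

(iii) Find the standard deviation using the formula.

\[ S = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = \sqrt{\frac{88}{5}} = \sqrt{17.6} = 4.20 \]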

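Shortcut Formulae
A shortcut method of calculating variance
and standard deviation requires two
quantities: the sum of the values and the sum of the
squares of the values.
Σx = sum of the measures
Σx² = sum of the squares of the measures
For example, using these six measures: 3, 9,
1, 2, 5, and 4:

\[ \sum x = 3 + 9 + 1 + 2 + 5 + 4 = 24 \]
\[ \sum x^2 = 9 + 81 + 1 + 4 + 25 + 16 = 136 \]

The quantities are then
substituted into the shortcut formula:

\[ S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{136 - \frac{24^2}{6}}{5} = \frac{136 - 96}{5} = 8 \]

The variance and standard
deviation are now found as before: S² = 8 and S = √8 = 2.83.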
Standard Deviation Practice Problems
The following practice problems
give further practice with the standard
deviation.
1. Find the standard deviation of
the values 11, 3, 12 and 6.
Solution: S = 4.24

2. Find the standard deviation of the values 9, 11, 8,
7, 4 and 3.
Solution: S = 3.03
3. Find the standard deviation of the values 8, 9, 6,
12 and 5.
Solution: S = 2.74
Symbols:
σ denotes the standard deviation of a population.
S denotes the standard deviation of a sample.
S² denotes the variance.

Calculating SD with Excel
Enter the values in a column.
Click Data Analysis on the Tools menu.
Select Descriptive Statistics and click OK.
Click the Input Range icon.
Highlight all the values in the column.
Click OK.
Check whether labels are in the first row.
Check Summary Statistics.
The SD is calculated precisely, along with several other descriptive
statistics.
MEASURES OF RELATIONSHIP -
CORRELATION
Definition: The relationship (or association) between
two quantitatively measured (continuous) variables is
called correlation. An increase in stress, for example, may
be related to an increase in specific somatic symptoms.
The data can be represented by the ordered pairs (x,y)
where x is the independent, or explanatory, variable
and y is the dependent, or response, variable.
Examples of variables that may be correlated:

height and shoe size
SAT score and grade point average
number of cigarettes smoked per day and
lung capacity
- Dull children tend to be more neurotic
than bright children.
- Is there any relationship between the size
of the skull and the general intelligence of
individuals?
Note: Correlation tests establish the
association among variables but can't show a
cause-and-effect relationship.
Coefficient of Correlation:
LINEAR CORRELATION
The purpose of a LINEAR CORRELATION ANALYSIS
is to determine whether there is a relationship
between two sets of variables, which can be examined through scatter plots.
A scatter plot is a graph of the ordered pairs (x, y) of
numbers consisting of the independent variable, x,
and the dependent variable, y.

We may find that there are five kinds of
linear correlation. They are
A. Perfect positive correlation.
B. Moderately positive correlation.
C. Perfect negative correlation.
D. Moderately negative correlation.
E. Absolutely No correlation.
The relationship between the variables can
be easily visualized by using
SCATTER DIAGRAMS. They are

1. Perfect positive correlation:
E.g.: Height and weight; Age and Height; age and
weight.
It is very difficult to get a perfect positive correlation.
The x values are plotted on the abscissa and the y
values on the ordinate.

[Figure: scatter diagram of a perfect positive correlation.]

Ex.: Bivariate distribution. Here it should be noted that for
every increase of 2 units in the x variable there is a
corresponding increase of one unit in the y variable. The result
is a straight line running from the lower left of the scatter
diagram to the upper right. If this were a perfect positive
correlation, all of the points would fall on a straight
line. The more linear the data points, the closer the
relationship between the two variables.
Positive correlation: as x increases, y increases.
Perfect negative correlation:
Ex. 1: If the pressure in the lung increases, its
air volume decreases.

[Figure: scatter diagram of a perfect negative correlation.]

Example 2:
Notice that in this example, as the number of
parasites increases, the harvest of
unblemished apples decreases. If this were
a perfect negative correlation, all of the
points would fall on a line with a negative
slope. The more linear the data points, the
more negatively correlated are the two
variables.
Negative correlation: as x
increases, y decreases.

Moderately positive correlation: The correlation
ranges from 0 to 0.8. Here the points scatter around an
imaginary line which runs from lower left to upper
right.
Ex.: Temperature and pulse rate.

[Figure: scatter diagram of a moderately positive correlation, X against Y.]

Moderately negative correlation:
Ex.: Age and vital capacity in adults.

[Figure: scatter diagram of a moderately negative correlation, X against Y.]

Absolutely no correlation: Here the variables are
not related to one another.
X is completely independent of Y. Ex.: Height and
pulse rate, height and I.Q.

[Figure: scatter diagram showing no correlation, X against Y.]
Interpretation of Correlation Coefficient

Coefficient Range   Strength of Relationship
0.00 - 0.20         Very low
0.20 - 0.40         Low
0.40 - 0.60         Moderate
0.60 - 0.80         High moderate
0.80 - 1.00         Very high
Pearson's correlation coefficient, also
known as Karl Pearson's correlation
coefficient, is a
method of measuring correlation.
It was developed by Karl Pearson
and is therefore named after him.
Typically denoted by r, it is a measure of the
correlation (linear dependence) between
two variables X and Y, giving a value
between +1 and -1.

ADVANTAGES
It is known as the best method of measuring
the correlation, because it is based on the
method of covariance.

Pearson's correlation coefficient gives
information about the degree of correlation
as well as the direction of the correlation.

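Pearson product moment
correlation (Method 1)

\[ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}} \]

N = Number of values or elements
X = 1st score
Y = 2nd score
ΣXY = Sum of the products of 1st and
2nd scores
ΣX = Sum of 1st scores
ΣY = Sum of 2nd scores
ΣX² = Sum of squared 1st scores
ΣY² = Sum of squared 2nd scores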

Example 1: Knowledge scores in Test I and Test II

X     Y     X²     Y²     XY
19    16    361    256    304
18    15    324    225    270
15    11    225    121    165
15    14    225    196    210
13    12    169    144    156
12    10    144    100    120
12     9    144     81    108
10    10    100    100    100
 9     8     81     64     72
 7     5     49     25     35
ΣX = 130   ΣY = 110   ΣX² = 1822   ΣY² = 1312   ΣXY = 1540
\[ r = \frac{1540 - \frac{(130)(110)}{10}}{\sqrt{\left(1822 - \frac{(130)(130)}{10}\right)\left(1312 - \frac{(110)(110)}{10}\right)}} = \frac{1540 - 1430}{\sqrt{(1822 - 1690)(1312 - 1210)}} = \frac{110}{\sqrt{132 \times 102}} = \frac{110}{\sqrt{13464}} = \frac{110}{116.03} = 0.948 \]


A correlation greater than 0.9 is
generally described as a strong / very
high positive correlation; hence it is
inferred that those who
scored well in test I also
scored well in test II.
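The raw-score calculation can be sketched as a small function. This is an illustration, not the only way to compute r; the function name is hypothetical, and the ninth Y score is entered as 8, the value consistent with the stated totals ΣY = 110 and ΣY² = 1312.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from raw scores, using sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


test_1 = [19, 18, 15, 15, 13, 12, 12, 10, 9, 7]
test_2 = [16, 15, 11, 14, 12, 10, 9, 10, 8, 5]

print(round(pearson_r(test_1, test_2), 3))
```

The function returns a value between -1 and +1 for any two lists of equal length, provided neither list is constant.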
Bivariate Correlation coefficient:
Age & weight distribution

Example 2

Age (X)   Wt (Y)    X²     Y²     XY
1            6       1     36      6
1            7       1     49      7
2            9       4     81     18
2           11       4    121     22
3           13       9    169     39
3           12       9    144     36
4           13      16    169     52
4           14      16    196     56
5           15      25    225     75
5           15      25    225     75
ΣX = 30   ΣY = 115   ΣX² = 110   ΣY² = 1415   ΣXY = 386


\[ r = \frac{386 - \frac{30 \times 115}{10}}{\sqrt{\left(110 - \frac{(30)^2}{10}\right)\left(1415 - \frac{(115)^2}{10}\right)}} = \frac{386 - 345}{\sqrt{20 \times 92.5}} = \frac{41}{\sqrt{1850}} = \frac{41}{43} = 0.95 \]
Inference: Increase in age is positively
correlated with increase in weight
among children below five years.
From the deviations of the items from their means (Method 2)
Scores in English and Maths, n = 10

X     Y     x = (X - X̄)   y = (Y - Ȳ)    x²    y²    xy
19    16         6              5         36    25    30
18    15         5              4         25    16    20
15    11         2              0          4     0     0
15    14         2              3          4     9     6
13    12         0              1          0     1     0
12    10        -1             -1          1     1     1
12     9        -1             -2          1     4     2
10    10        -3             -1          9     1     3
 9     8        -4             -3         16     9    12
 7     5        -6             -6         36    36    36
ΣX = 130   ΣY = 110   Σx² = 132   Σy² = 102   Σxy = 110

\[ \bar{X} = \frac{130}{10} = 13, \qquad \bar{Y} = \frac{110}{10} = 11 \]

Formula:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{110}{\sqrt{132 \times 102}} = \frac{110}{\sqrt{13464}} = \frac{110}{116.034} = 0.948 \approx 0.95 \]
Rank Correlation Coefficients
Rank correlation is the study of relationships
between different rankings on the same set of items. A
rank correlation coefficient measures the
correspondence between two rankings and assesses its
significance.
Meaning: Spearman's rank correlation coefficient is a
technique which can be used to summarize the
strength and direction (negative or positive) of a
relationship between two variables.

Two of the more popular rank correlation statistics are
Spearman's rank correlation coefficient (Spearman's ρ)
Kendall's tau rank correlation coefficient (Kendall's τ)
An increasing rank correlation coefficient implies
increasing agreement between rankings. The coefficient
lies in the interval [-1, 1] and takes the value:
-1 if the disagreement between the two rankings is
perfect; one ranking is the reverse of the other.
0 if the rankings are completely independent.
1 if the agreement between the two rankings is perfect;
the two rankings are the same.

Spearman's rank correlation coefficient, or
Spearman's rho, is named after Charles Spearman and is
often denoted by the Greek letter ρ (rho).
Create a table from your data.
Rank the two data sets. Ranking is achieved by
giving the ranking '1' to the biggest number in a
column, '2' to the second biggest value and so on.
The smallest value in the column will get the
lowest ranking. This should be done for both sets
of measurements.
Tied scores are given the mean (average) rank. For
example, three tied scores may occupy three
positions (fifth, sixth and seventh) in a ranking
hierarchy of ten. The mean rank in this case is
calculated as (5 + 6 + 7) / 3 = 6.

Find the difference in the ranks (d): This is
the difference between the ranks of the two
values on each row of the table. The rank of
the second value is subtracted from the
rank of the first.
Square the differences (d²) to remove
negative values, and then sum them (Σd²).
Coefficient of correlation by the rank difference
method (Spearman's formula)
n = 10

Marks in 1st   Marks in 2nd   Rank in X   Rank in Y   d = R1 - R2    d²
test (X)       test (Y)       (R1)        (R2)
12             21              8           6             2           4
15             25              6.5         3.5           3           9
24             35              2           2             0           0
20             24              4.5         5            -0.5         0.25
 8             17             10           8             2           4
15             18              6.5         7            -0.5         0.25
21             25              3           3.5          -0.5         0.25
20             16              4.5         9.5          -5          25
11             16              9           9.5          -0.5         0.25
26             38              1           1             0           0
                                                          Σd² = 43.00

\[ r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 43}{10(100 - 1)} = 1 - \frac{258}{990} = 1 - 0.26 = 0.74 \]
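The ranking, tie-handling and rank-difference steps can be sketched as follows. The function names are illustrative; rank 1 goes to the largest value and tied values share the mean of the positions they occupy, as described in the procedure. Note that with ties the rank-difference formula is an approximation to a Pearson correlation computed on the ranks.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values get the mean of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1   # first place occupied by v
        tie_count = ordered.count(v)            # how many places v occupies
        ranks.append(first_position + (tie_count - 1) / 2)
    return ranks


def spearman_rho(xs, ys):
    """Rank-difference formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


marks_test_1 = [12, 15, 24, 20, 8, 15, 21, 20, 11, 26]
marks_test_2 = [21, 25, 35, 24, 17, 18, 25, 16, 16, 38]

print(average_ranks(marks_test_1))
print(average_ranks(marks_test_2))
print(round(spearman_rho(marks_test_1, marks_test_2), 2))
```

For the ten pairs of marks this reproduces the ranks in the table and a coefficient of about 0.74.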
Practice Problems for Correlation Co-efficient:
Calculate Sample Correlation Co-efficient:
X Values Y Values
3 4
2 3
1 3
3 4
2 3
5 2

Answer:
Sample Correlation co-efficient = -0.3241.
Calculate Sample Correlation Co-efficient:

X Values Y Values
5 2
5 4
2 8
9 2
3 8
2 6
7 4

Answer:
Sample Correlation co-efficient = -0.80468.
Normal Probability Distribution
Characteristics of the normal curve:
Properties of a normal distribution
A normal distribution is symmetric about its mean.
The highest point is at its mean.
The height of the curve decreases as one moves away
from the mean in either direction.
It is bell shaped. The central part of the curve bulges
upward (convex when viewed from above), and the curve bends
the other way in both tails.
It is a symmetrical distribution; the number of values on either
side of the mean is equal.
A normal distribution curve is unimodal (i.e., it
has only one mode).

The skewness of the curve is zero.
It is asymptotic (i.e., the tails theoretically never touch the base
line): no matter how far in either direction the curve extends, it
never meets the x axis, but it gets increasingly closer.
The curve is continuous, that is, there are no gaps or
holes. For each value of X, there is a corresponding value
of Y.
The mean, median, and mode are equal and are located
at the center of the distribution.
In some cases where the scores of individuals in a group
deviate seriously from the average, the curves
representing these distributions also deviate from the
shape of a normal curve. These deviations are described by skewness and
kurtosis.

The distribution is determined by the mean mu, and
the standard deviation sigma. The mean, mu controls
the centre and standard deviation, sigma controls the
spread.
The total area under a normal distribution curve is
equal to 1.00. The area under the parts of a normal
curve is distributed as follows:
1. About 68.3% of the area under a normal curve is
within one standard deviation (SD) of the mean
2. About 95.5% is within two SDs
3. About 99.7% is within three SDs
4. About 32% of observations lie outside the range mean ± 1 SD. The range
mean ± 2 SD contains 95.45% of observations (the normal range), and
4.55% lie outside these limits.
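The percentages quoted for one, two and three SDs can be checked with a small Python sketch. It relies on the standard identity that the area within k SDs of the mean equals erf(k / sqrt(2)); the function name `area_within` is illustrative.

```python
from math import erf, sqrt


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    inside = 100 * area_within(k)
    print(f"within {k} SD: {inside:.2f}%   outside: {100 - inside:.2f}%")
```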



The normal distribution helps us to predict,
in terms of probability, where cases will fall
within a distribution.

For example, what are the odds, given the
population parameter of human height that
someone will grow to more than eight feet?

Answer: likely less than a .025 probability.

Skew:
Positive skew:
Negative skew:
Kurtosis is the degree of peakedness of
a distribution. A normal distribution is
a mesokurtic distribution. A pure
leptokurtic distribution has a higher
peak than the normal distribution and
has heavier tails. A pure platykurtic
distribution has a lower peak than a
normal distribution and lighter tails.

Parametric & Nonparametric
Methods
Parametric tests are statistical methods which depend
on the parameters of populations or probability
distributions and are referred to as parametric
methods.
Parametric Test - Key features
Sample randomly selected
Sample homogeneous
Data at ratio or interval level
Parametric tests include:


Large sample (>30) z test
Small sample (<30) t test.
ANOVA
Regression
Correlation
Nonparametric methods
1 Methods used with qualitative data.
or:
2. Methods used with quantitative data
when no assumption can be made
about the population probability
distribution.

Non-Parametric Tests -- Key
features
Sample not homogeneous
Not normally distributed
Data is at ordinal and nominal levels
Nonparametric tests include:
Chi-squared test
Wilcoxon signed-rank test
Mann-Whitney test
Kruskal-Wallis test

Differences in Parametric and Non-
parametric Tests
1. Scales of Measurement
Example: A 100 kg person is twice as heavy as a person
weighing 50 kg (the ratio scale), and 10°C is 5°C
warmer than 5°C (but not twice as warm; this is the
interval scale, in which the zero on the scale is
not absolute but arbitrary).
2. Normal Distribution
Parametric statistics are used when the data are
normally distributed.

Example: If you measure the weights of
1,000 males and then graph the results
showing frequency of weights, you will likely
find a bell-shaped curve with most people
around the mean (average) weight at the
center of the curve, which tapers off at the
sides as frequency of extreme weights
decreases. This is called a normal
distribution.
Non-parametric statistics are used when no particular
distribution is assumed for the data (distribution free).

3. Equal Variances
Parametric statistics are used when the two sets of data
being compared have approximately equal variances.
Example: The variance is a measure of the spread of
values from the mean. Suppose you wish to test if the mean
weights of males and females differ, but the values of
males are scattered much more widely (and therefore have a higher
variance) than those of females.
Non-parametric tests are used when equal variances among
samples cannot be assumed; in that case a test such as the chi-square
test or the Mann-Whitney U-test should be employed to compare the two
sets of data.

Power of Test
Parametric tests are more powerful than
those of non-parametric statistics in making
conclusions.
If data violates one or more criteria of
parametric tests, then use a non-parametric
equivalent, even though it is less powerful,
at least the risk of error is less.

Tests of significance
Basic Concepts:
1. The standard error of the mean:
The standard error of the mean:
SEM is usually estimated by the sample estimate of the
population standard deviation (sample standard deviation)
divided by the square root of the sample size (assuming
statistical independence of the values in the sample):

Random Sampling Error = standard deviation/ square root
of the sample size

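2. Degrees of freedom
A single sample: df = n - 1
Two samples: df = n1 + n2 - 2
One-way ANOVA with g groups: df = g - 1 between groups and N - g within groups (N = total number of observations)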
3. Type I and II errors

Statistical decision    Null hypothesis H0 true    Null hypothesis H0 false
Reject H0               Type I error               Correct
Do not reject H0        Correct                    Type II error
4. Confidence Intervals

We can use what we know about the standard error of
the mean to calculate a range of values around the
sample mean that is likely to contain the mean of the
population.

This range is usually built so that, over repeated sampling,
it contains the population mean with a probability of .95,
i.e. with a 5% chance of error.

5. How Confident Are You?
Are you 100% sure?
Social scientists use 95% as a threshold to test
whether or not the results are a product of chance.
That is, we accept 1 chance in 20 of being wrong.
What do you MEAN?
We build a 95% confidence interval so that we can be 95%
confident that the population mean lies within that range.
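As a minimal sketch of how the standard error and a 95% confidence interval fit together, the Python below uses a made-up sample (the scores are hypothetical, not from these notes) and the large-sample multiplier 1.96. For small samples the multiplier would normally be taken from the t table instead. The names `sem` and `ci95` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def sem(sample):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def ci95(sample, z=1.96):
    """Approximate 95% confidence interval for the population mean."""
    m = mean(sample)
    half_width = z * sem(sample)
    return m - half_width, m + half_width


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data for illustration only
low, high = ci95(scores)
print(round(sem(scores), 3), round(low, 2), round(high, 2))
```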

6. Significance Level:
First, the difference between the results of the
experiment and the null hypothesis is determined.
Then, assuming the null hypothesis is true, the
probability of a difference that large or larger is computed.
Finally, this probability is compared to the significance
level (in practice, the calculated test statistic is compared with
the table value for the relevant degrees of freedom).
If the calculated probability is less than or equal to
the significance level, then the null hypothesis is
rejected and the outcome is said to be statistically
significant. Significance is usually set at the 0.05 level
(sometimes called the 5% level) or the 0.01 level (1%
level). The significance level can be defined as the
probability of committing a type I error.
Therefore, a significance level of 0.05 denotes a 5%
possibility of committing a type I error.
A significance level of 0.01 denotes a 1% possibility of
committing a type I error.

The significance level is used in hypothesis testing
7. What is the difference between a
probability value and the significance
level?
Odds ratios are widely used in medical
literature because:
They provide an estimate (with confidence
intervals) for the relationship between two
binary (yes/no) variables.

They enable us to examine the effects of other variables on
that relationship, using logistic regression.
They are useful in case-control studies.
The odds are a way of representing probability.
8. Two decision making rules of hypothesis testing

Rule one: If the p-value (calculated value) is less than or
equal to the significance level (table value) then reject the
null hypothesis and conclude that the research finding is
statistically significant.
Rule two: If the p-value is greater than the significance
level then you fail to reject the null hypothesis and
conclude that the finding is not statistically significant.

9. Two areas of statistical inference.
Estimation
Hypothesis testing
1.Estimation:
A. Types of estimation
B. Points to remember:
2. Hypothesis testing
10. Types of Statistical Hypotheses.
A) Null hypothesis.
B) Alternative hypothesis.
H0: P = 0.5
Ha: P ≠ 0.5



Steps of Hypothesis Testing
State the hypotheses. This involves stating the null and alternative
hypotheses. The hypotheses are stated in such a way that they are
mutually exclusive. That is, if one is true, the other must be false.
Formulate an analysis plan. The analysis plan describes how to use
sample data to evaluate the null hypothesis. The evaluation often
focuses around a single test statistic.
Analyze sample data. Find the value of the test statistic (mean score,
proportion, t-score, z-score, etc.) described in the analysis plan.
Interpret results. Apply the decision rule described in the analysis plan.
If the value of the test statistic is unlikely, based on the null hypothesis,
reject the null hypothesis. Describe the results with probability ( level
of significance ) Example: In a study designed to determine the effects
of primary care nursing as compared with functional team nursing on
patient satisfaction, a significant difference was found between the two
approaches to patient care. Higher rates of satisfaction were found
among patients exposed to primary care nursing ( t = 12.23, p < .05).

Parametric vs. Non-parametric Tests

                                  | Parametric                            | Non-parametric
Assumed distribution              | Normal                                | Any
Assumed variance                  | Homogeneous                           | Any
Typical data                      | Ratio or Interval                     | Ordinal or Nominal
Data set relationships            | Independent                           | Any
Usual central measure             | Mean                                  | Median
Benefits                          | Can draw more conclusions             | Simplicity; less affected by outliers

Tests
Choosing                          | Choosing a parametric test            | Choosing a non-parametric test
Correlation test                  | Pearson                               | Spearman
Independent measures, 2 groups    | Independent-measures t-test           | Mann-Whitney test
Independent measures, >2 groups   | One-way, independent-measures ANOVA   | Kruskal-Wallis test
Repeated measures, 2 conditions   | Matched-pair t-test                   | Wilcoxon test
Repeated measures, >2 conditions  | One-way, repeated measures ANOVA      | Friedman's test
Choosing an Appropriate Statistical Test

Columns: Goal | Measurement (from a normal distribution) | Rank, Score, or Measurement (from non-normal distribution) | Binomial (e.g. heads or tails)

Describe one group | Mean, SD | Median, interquartile range | Proportion
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's exact test (or chi-square for large samples)
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q test
Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients
Predict value from another measured variable | Simple regression | Nonparametric regression | Simple logistic regression
Predict value from several measured or binomial variables | Multiple regression | - | Multiple logistic regression
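PARAMETRIC TESTS FOR HYPOTHESIS TESTING
LARGE & INDEPENDENT SAMPLE
(2 GROUPS I.E. EXPERIMENTAL & CONTROL)

Z test
1. The significance of the difference between the means has to
be calculated.

Step - 1
Compute SED or $\sigma_D$, the standard error of the difference between means:

$\sigma_D = \sqrt{\sigma_{m_1}^2 + \sigma_{m_2}^2}$

where $\sigma_{m_1} = \sigma_1 / \sqrt{N_1}$ is the standard error of the mean
of the first sample and $\sigma_{m_2} = \sigma_2 / \sqrt{N_2}$ is the standard error of the mean
of the second sample.

Or directly:

$\sigma_D = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$

Step - 2
Compute the Z value:

$Z = \frac{m_1 - m_2}{\sigma_D} = \frac{\text{Difference between means}}{\text{Standard error of the difference between means}}$

Step - 3
Test the null hypothesis at the 0.05 and 0.01 levels of significance.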
Example 1: The teacher taught Group A by lecture cum demonstration
and Group B by the lecture method only. Which
method is more effective?

H0: There exists no significant difference between the means of the 2
samples.

        Group A   Group B
Mean    43        30
σ       8         7
N       65        65

$\sigma_D = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} = \sqrt{\frac{8 \times 8}{65} + \frac{7 \times 7}{65}} = \sqrt{\frac{64 + 49}{65}} = \sqrt{\frac{113}{65}} = 1.32$

$Z = \frac{m_1 - m_2}{\sigma_D} = \frac{43 - 30}{1.32} = \frac{13}{1.32} = 9.85$

The calculated Z value (9.85) is higher than the critical value at the 5% level, i.e. 1.96,
so the difference is significant and we reject the null
hypothesis, concluding that the lecture cum demonstration
method is more effective than the lecture method only.
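The three Z-test steps can be sketched in Python using the summary figures of Example 1. The function name `z_two_means` is an illustrative assumption. Because the sketch does not round the standard error to 1.32 first, it returns Z of about 9.86 rather than 9.85.

```python
from math import sqrt


def z_two_means(m1, sd1, n1, m2, sd2, n2):
    """Z statistic for two large independent samples, from summary figures."""
    sed = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # standard error of the difference
    return (m1 - m2) / sed, sed


z, sed = z_two_means(43, 8, 65, 30, 7, 65)
print(round(sed, 2), round(z, 2))
print("significant at 5% level:", abs(z) > 1.96)
```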
T-test
The Student's t-test (or simply t-test) was developed
by William Gosset ("Student") in 1908. A t-test is an
inferential statistical technique used to compare the
means of two groups, and it comes in at least 3
kinds. The report of the results of a t-test generally
includes the df, the t-value, and the probability level.

A t-test can be one-tailed or two-tailed.
One-tailed test: Used where there is some basis
(e.g. previous experimental observation) to predict
the direction of the difference, e.g. an expectation that
one group will score higher than the other. If the
hypothesis of the study is directional, a one-tailed
test is used.
Two-tailed test: Used where there is no basis to predict
the direction of the difference between the
groups; this is the test most frequently used. If the
hypothesis is non-directional, a two-tailed test is used.
Why are they called "tails"?
Note that HA states 'there is a difference .... '; it
does not state why there is a difference or whether
one group is greater or less than the other. If HA
had specified the nature of the
difference, this would have been a one-tailed
hypothesis. However, since HA does not specify the
nature of the difference, we can accept either
a reduction or an increase. This is therefore
a two-tailed hypothesis. For a variety of reasons
two-tailed hypotheses are safer than one-tailed.

Another classification:
Paired t-test:
Unpaired t-test:
Two-sample assuming equal variances
Two-sample assuming unequal variances
1.Unpaired t-test
2.Paired t-test
Samples are independent when there are two
separate groups such as an experimental group and a
control group.
Samples are dependent when the participants from
the two groups are paired in some manner. For
example, when the same participants are assessed on a
given characteristic before and after an
intervention

SIGNIFICANCE OF DIFFERENCE BETWEEN CORRELATED (WITHIN)
SAMPLE SINGLE GROUP
Formula

$t = \frac{\sum D}{\sqrt{\frac{N \sum D^2 - (\sum D)^2}{N - 1}}}$

df = n - 1

Example 2: Ten subjects were tested on an attitude
scale. They were made to read literature in order to
bring a change in their attitude. The attitude scale was then
re-administered. Check whether the literature could bring a
change in the attitude.

Null hypothesis: There is no significant
difference in the attitude score before and after reading
the literature.

Initial   Final   D    D^2
10        11      -1   1
9         7        2   4
9         8        1   1
8         9       -1   1
8         6        2   4
7         6        1   1
7         8       -1   1
5         4        1   1
4         3        1   1
4         4        0   0
N = 10    N = 10  Sum of D = 5   Sum of D^2 = 15

$t = \frac{5}{\sqrt{\frac{10 \times 15 - 5 \times 5}{9}}} = \frac{5}{\sqrt{\frac{125}{9}}} = \frac{5}{\sqrt{13.88}} = \frac{5}{3.73} = 1.34$

df = n - 1 = 10 - 1 = 9
The table value is 2.26 at the 5% level with 9 df. The calculated
value is lower than the table value, hence we accept the
null hypothesis: reading the literature has
not shown a significant difference in the attitude score
before and after reading it.
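The difference-score formula for a single group tested twice can be sketched in Python with the ten initial and final attitude scores. The function name `paired_t` is illustrative; D is taken as initial minus final, as in the worked table.

```python
from math import sqrt


def paired_t(first, second):
    """t for correlated scores: sum(D) / sqrt((N*sum(D^2) - sum(D)^2) / (N - 1))."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    t = sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1, sum_d, sum_d2


initial = [10, 9, 9, 8, 8, 7, 7, 5, 4, 4]
final = [11, 7, 8, 9, 6, 6, 8, 4, 3, 4]
t, df, sum_d, sum_d2 = paired_t(initial, final)
print(sum_d, sum_d2, round(t, 2), df)
```

The result (t of about 1.34 with 9 df) is below the table value of 2.26, matching the decision to retain the null hypothesis.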
t TEST SMALL & INDEPENDENT SAMPLE
(2 GROUPS i.e, EXPERIMENTAL &
CONTROL)
Example 3: Two groups of 10 students each got the
following scores on an attitude scale. Test the
significance of the difference between the means.

Group I 10, 9, 8, 7, 7, 8, 6, 5, 6, 4
Group II 9, 8, 6, 7, 8, 8, 11, 12, 6, 5
df = N1 + N2 - 2 = 10 + 10 - 2 = 18

Null hypothesis: There is no significant difference in the
attitude score between the two groups of 10 students.

In the table, X is the raw score, m the group mean and x = X - m the deviation from the mean.

Group I                        Group II
X1    m1   x1   x1^2           X2    m2   x2   x2^2
10    7    3    9              9     8    1    1
9     7    2    4              8     8    0    0
8     7    1    1              6     8   -2    4
7     7    0    0              7     8   -1    1
7     7    0    0              8     8    0    0
8     7    1    1              8     8    0    0
6     7   -1    1              11    8    3    9
5     7   -2    4              12    8    4    16
6     7   -1    1              6     8   -2    4
4     7   -3    9              5     8   -3    9
Total 70   Sum of x1^2 = 30    Total 80   Sum of x2^2 = 44
m1 = 70/10 = 7                 m2 = 80/10 = 8

1) Pooled SD: $\sigma = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{(N_1 - 1) + (N_2 - 1)}} = \sqrt{\frac{30 + 44}{9 + 9}} = \sqrt{\frac{74}{18}} = \sqrt{4.111} = 2.03$

2) SED or $\sigma_D = \sigma \sqrt{\frac{1}{N_1} + \frac{1}{N_2}} = 2.03 \sqrt{\frac{1}{10} + \frac{1}{10}} = 2.03 \sqrt{\frac{1}{5}} = 0.908$

3) $t = \frac{m_1 - m_2}{\sigma_D} = \frac{7 - 8}{0.908} = \frac{-1}{0.908} = -1.1$ (absolute value 1.1)

The table value is 2.10 at the 5% level with 18 df. The calculated
value is lower than the table value, hence we accept the
null hypothesis: there is no significant
difference in the attitude score between the two groups of
10 students.
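The pooled-SD calculation for two small independent groups can be sketched in Python with the Group I and Group II scores. The function name `pooled_t` is an illustrative assumption; the sketch carries unrounded intermediate values, so the standard error comes out as 0.907 rather than 0.908.

```python
from math import sqrt


def pooled_t(g1, g2):
    """t for two small independent samples using a pooled SD."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((v - m1) ** 2 for v in g1)    # sum of squared deviations, group 1
    ss2 = sum((v - m2) ** 2 for v in g2)    # sum of squared deviations, group 2
    pooled_sd = sqrt((ss1 + ss2) / ((n1 - 1) + (n2 - 1)))
    sed = pooled_sd * sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / sed, n1 + n2 - 2, pooled_sd, sed


group1 = [10, 9, 8, 7, 7, 8, 6, 5, 6, 4]
group2 = [9, 8, 6, 7, 8, 8, 11, 12, 6, 5]
t, df, pooled_sd, sed = pooled_t(group1, group2)
print(round(pooled_sd, 2), round(sed, 3), round(t, 2), df)
```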
SIGNIFICANCE OF DIFFERENCE
BETWEEN 2 MEANS FOR 2 SMALL BUT
INDEPENDENT SAMPLES

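First, calculate the pooled SD.
In small samples we calculate a single SD, called the pooled SD, for the
further calculation of SED or $\sigma_D$.

1) Pooled SD: $\sigma = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{(N_1 - 1) + (N_2 - 1)}}$, where $x_1$ and $x_2$ are the deviations of the scores from their own group means.

2) Calculate $\sigma_D = \sigma \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$

3) Calculate the t value: $t = \frac{m_1 - m_2}{\sigma_D}$

4) Test the null hypothesis H0 at a pre-established level of
significance (5% or 1% level), with df = N1 + N2 - 2.

5) Compare the calculated t value with the table value.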
Example 4: A language teacher divides the class into 2 groups. The
experimental group was given 2 hours of daily reading of newspapers &
magazines and the control group was not. After 6 months both
groups were given a vocabulary test. The scores obtained are:

Experimental Group: 115, 112, 109, 112, 137
Control Group: 110, 112, 95, 105, 111, 97, 112, 102

Null hypothesis: There is no significant difference in the vocabulary test scores between the
experimental group, who had 2 hours of daily reading of newspapers & magazines for 6
months, and the control group, who did not.

Group I                          Group II
X1    m1    x1    x1^2           X2    m2      x2      x2^2
115   117   -2    4              110   105.5    4.5    20.25
112   117   -5    25             112   105.5    6.5    42.25
109   117   -8    64             95    105.5   -10.5   110.25
112   117   -5    25             105   105.5   -0.5    0.25
137   117   20    400            111   105.5    5.5    30.25
-     -     -     -              97    105.5   -8.5    72.25
-     -     -     -              112   105.5    6.5    42.25
-     -     -     -              102   105.5   -3.5    12.25
585   Sum of x1^2 = 518          844   Sum of x2^2 = 330
m1 = 585 / 5 = 117               m2 = 844 / 8 = 105.5

Pooled SD: $\sigma = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{(N_1 - 1) + (N_2 - 1)}} = \sqrt{\frac{518 + 330}{(5 - 1) + (8 - 1)}} = \sqrt{\frac{848}{11}} = \sqrt{77.09} = 8.78$

$\sigma_D = \sigma \sqrt{\frac{1}{N_1} + \frac{1}{N_2}} = 8.78 \sqrt{\frac{1}{5} + \frac{1}{8}} = 8.78 \sqrt{0.200 + 0.125} = 8.78 \sqrt{0.325} = 8.78 \times 0.57 = 5.0$

$t = \frac{m_1 - m_2}{\sigma_D} = \frac{117 - 105.5}{5.0} = \frac{11.5}{5.0} = 2.3$

df = N1 + N2 - 2 = 5 + 8 - 2 = 11. The critical value at 5% is 1.80 (one-tailed; the two-tailed value is 2.20).

Inference: Our computed value (2.3) is higher than the critical value, so it is significant and we
reject the null hypothesis H0 and accept that the 1st method (daily reading) is good in increasing the
vocabulary among the students.
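SIGNIFICANCE OF DIFFERENCE BETWEEN CORRELATED SAMPLE OR
WITH THE SAME GROUP (PRE & POST TEST)

$\sigma_D = \sqrt{\sigma_{m_1}^2 + \sigma_{m_2}^2 - 2 r \sigma_{m_1} \sigma_{m_2}}$

Example 5: A teacher of mathematics gave a test in
multiplication to the 30 students of his class. Then he
induced a state of anxiety among them and the
achievement test was re-administered.
r = 0.82 (correlation between the initial and final test scores)

        Initial Test   Final Test
Mean    70             67
σ       6              5.8

$\sigma_{m_1} = \frac{\sigma_1}{\sqrt{N}} = \frac{6}{\sqrt{30}} = 1.09$

$\sigma_{m_2} = \frac{\sigma_2}{\sqrt{N}} = \frac{5.8}{\sqrt{30}} = 1.06$

SED or $\sigma_D = \sqrt{(1.09)^2 + (1.06)^2 - 2 \times 0.82 \times 1.09 \times 1.06} = \sqrt{1.19 + 1.12 - 1.89} = \sqrt{0.42} = 0.648$

$Z = \frac{m_1 - m_2}{\sigma_D} = \frac{70 - 67}{0.648} = \frac{3}{0.648} = 4.63$

The calculated Z (4.63) exceeds 1.96 (5% level) and 2.58 (1% level), so the difference between the two means is significant.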
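The correlated-sample standard error can be sketched in Python from the summary figures of Example 5. The sketch assumes that the value 0.82 is the correlation r between the initial and final scores and that the final-test SD is 5.8; the function name `z_correlated_means` is illustrative.

```python
from math import sqrt


def z_correlated_means(m1, sd1, m2, sd2, n, r):
    """Z for the difference between two correlated means (same group tested twice)."""
    se1 = sd1 / sqrt(n)                      # standard error of the first mean
    se2 = sd2 / sqrt(n)                      # standard error of the second mean
    sed = sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    return (m1 - m2) / sed, sed


z, sed = z_correlated_means(70, 6, 67, 5.8, 30, 0.82)
print(round(sed, 3), round(z, 2))
```

The subtraction of the 2r term is what distinguishes this from the independent-sample formula: the higher the correlation between the two testings, the smaller the standard error of the difference.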
Example 6: A random sample of 10 boys had the
following IQ: 70, 120, 110, 111, 88, 83, 95, 98, 107, 100. Do
these data support the assumption of a population
mean IQ of 100?

The t test is done to test the difference between the
sample mean and the population mean. It is worked out as
under:

x      x - mean   (x - mean)^2
70     -28.2      795.24
120     21.8      475.24
110     11.8      139.24
111     12.8      163.84
88     -10.2      104.04
83     -15.2      231.04
95      -3.2      10.24
98      -0.2      0.04
107      8.8      77.44
100      1.8      3.24
Total 982         1999.6

Sample mean: $\bar{x} = 982 / 10 = 98.2$

Sample SD: $S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1999.6}{9}} = 14.91$

$t = \frac{\bar{x} - \mu}{S / \sqrt{n}} = \frac{98.2 - 100}{14.91 / \sqrt{10}} = -0.3818$ (absolute value 0.3818)

df = n - 1 = 9

The t table value with df 9 at p 0.05 is 2.262. The calculated t
is less than the table value. Hence, accept the null hypothesis: the
difference from 100 can be attributed to chance.
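The one-sample calculation can be sketched in Python with the ten IQ scores. The function name `one_sample_t` is illustrative; `statistics.stdev` uses the n - 1 divisor, which matches the sample SD used in the example.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t for a sample mean against a hypothesised population mean mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


iq = [70, 120, 110, 111, 88, 83, 95, 98, 107, 100]
t, df = one_sample_t(iq, 100)
print(round(mean(iq), 1), round(stdev(iq), 2), round(t, 4), df)
```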

Example 7: The weight gains (pounds) of
experimental animals fed on diet A and diet B are given.
Diet A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25 (n = 12)
Diet B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22 (n = 15)
Test whether the two diets differ significantly as
regards their effect on increase in weight of
experimental animals.

The t test is done to test the difference between two sample
means.

Sample A                                Sample B
X1    X1 - mean1   (X1 - mean1)^2       X2    X2 - mean2   (X2 - mean2)^2
25    -3           9                    44    14           196
32     4           16                   34     4           16
30     2           4                    22    -8           64
34     6           36                   10   -20           400
24    -4           16                   47    17           289
14   -14           196                  31     1           1
32     4           16                   40    10           100
24    -4           16                   30     0           0
30     2           4                    32     2           4
31     3           9                    35     5           25
35     7           49                   18   -12           144
25    -3           9                    21    -9           81
-      -           -                    35     5           25
-      -           -                    29    -1           1
-      -           -                    22    -8           64
336 / 12 = 28      380                  450 / 15 = 30      1410

Sample variance: $S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = \frac{380}{11} = 34.5454$

Sample variance: $S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} = \frac{1410}{14} = 100.71$

Pooled SD: $S = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{380 + 1410}{12 + 15 - 2}} = \sqrt{71.6} = 8.46$

$t = \frac{\bar{x}_1 - \bar{x}_2}{S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{28 - 30}{8.46 \sqrt{\frac{1}{12} + \frac{1}{15}}} = \frac{-2}{3.28} = -0.61$ (absolute value 0.61)

df = n1 + n2 - 2 = 25

The t table value for df = 25
at P = 0.05 is 2.060.
The calculated t is less than the table value for df 25 at P =
0.05, so the difference is not significant. Accept the null hypothesis.
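The diet comparison can be sketched in Python from the raw weight gains. This sketch weights each sample variance by n - 1 when pooling, which is the same pooled form as the sums-of-squares formula used for small independent samples in these notes; it gives S of about 8.46 and t of about -0.61. Weighting the two variances by n1 and n2 instead would give 8.78 and t of about -0.59, and the conclusion is the same either way. The function name `pooled_variance_t` is illustrative.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance_t(a, b):
    """Two-sample t using sample variances pooled with (n - 1) weights."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    s = sqrt(pooled_var)
    t = (mean(a) - mean(b)) / (s * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, s


diet_a = [25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25]
diet_b = [44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22]
t, df, s = pooled_variance_t(diet_a, diet_b)
print(round(s, 2), round(t, 2), df)
```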
Example 8: A drug given to 12 volunteers showed the
following difference in systolic BP.

Volunteer     1    2    3    4    5    6    7    8    9    10   11   12
Before Drug   120  112  110  120  106  110  110  114  120  116  104  98
After Drug    125  114  118  119  109  110  108  115  125  120  110  98

Can you conclude that the drug, in general, is accompanied
by an increase in systolic BP?
For paired observations the paired t test is done.
The data are re-tabulated to facilitate the calculation of $\bar{d}$ (mean difference) and
S (SD).

Before Drug   After Drug   Difference   d - mean   (d - mean)^2
X1            X2           (d)
120           125           5            2.42      5.8564
112           114           2           -0.58      0.3364
110           118           8            5.42      29.3764
120           119          -1           -3.58      12.8164
106           109           3            0.42      0.1764
110           110           0           -2.58      6.6564
110           108          -2           -4.58      20.9764
114           115           1           -1.58      2.4964
120           125           5            2.42      5.8564
116           120           4            1.42      2.0164
104           110           6            3.42      11.6964
98            98            0           -2.58      6.6564
Total                       31                     104.92

$\bar{d} = 31 / 12 = 2.5833$

SD of sample: $S = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{104.92}{11}} = 3.088$

$t = \frac{\bar{d}}{S / \sqrt{n}} = \frac{2.5833}{3.088 / \sqrt{12}} = \frac{2.5833}{0.8915} = 2.898$

df = n - 1 = 11
The table value for df = 11 at P = 0.05 is 2.201.
The calculated t is more than the table value at
df = 11 with P = 0.05.
Hence reject the null hypothesis: the drug has a definite
influence on systolic blood pressure.
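The paired calculation can be sketched in Python from the before and after readings of the 12 volunteers. The function name `paired_t_from_differences` is illustrative; the difference is taken as after minus before, so a positive mean indicates a rise in systolic BP.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_from_differences(before, after):
    """Paired t: mean difference divided by its standard error."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1, d


before = [120, 112, 110, 120, 106, 110, 110, 114, 120, 116, 104, 98]
after = [125, 114, 118, 119, 109, 110, 108, 115, 125, 120, 110, 98]
t, df, d = paired_t_from_differences(before, after)
print(sum(d), round(mean(d), 4), round(stdev(d), 3), round(t, 3), df)
```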
One-way Analysis of Variance
(ANOVA)
Frequently in the study of nursing practice, more than two
means are of interest when assessing an independent variable.
Example: Nurse investigators may want to compare three
different patient groups (critical care patients, ambulatory
inpatients, and outpatients; sub groups of one independent
variable, i.e. type of patient) in terms of their level of satisfaction
with patient care.
One-way analysis of variance (ANOVA) is an extension of the t-
test that permits the investigator to simultaneously compare
more than two means.
ANOVA, unlike the t-test, uses variances to calculate a value that
reflects the differences among three or more means.
In this test an F statistic or ratio is calculated.
Analysis of Covariance (ANCOVA)
ANCOVA is an inferential statistical test that enables
investigators to adjust statistically for group differences that may
interfere with obtaining results that relate specifically to the
effects of the independent variable(s) on the dependent
variable(s).Usually there are two, three, or four factors
(independent variables) and a number of levels within each
variable (usually no more than ten).
Example: If there were three modes of delivering care
(primary nursing, functional team nursing, and modified primary
nursing) and both males and females were to be assessed, there
would be two independent variables (mode of delivering care,
sex) and one dependent variable (patient satisfaction).
Multivariable Analysis: Multivariate analysis refers to a
group of inferential statistical tests that enable the
investigator to examine multiple variables simultaneously.
Unlike other inferential statistical techniques, these tests
permit the investigator to examine several dependent or
independent variables simultaneously.
Example: A group of nurse investigators designed a study
to examine the effect of two forms of relaxation therapy on
levels of depression and anxiety among male and female
spinal-cord-injured young adults (paraplegics). Data
collected in relation to the two independent variables
(relaxation therapy, with two groups, and gender) and the
dependent variables (level of depression and level of
anxiety) were analyzed using a multivariate test.

The Chi Square Test ($\chi^2$)
The chi-square ($\chi^2$) test can be used to evaluate a relationship between two
nominal or ordinal variables. It is one example of a non-parametric test. The
$\chi^2$ test is used to test the association between two events in binomial or
multinomial samples. Chi square tests
can only be used on actual numbers and not on percentages, proportions, means,
etc. The Chi Square statistic compares the tallies or counts of categorical
responses between two (or more) independent groups or in a single group.

Example:
Smoking and cancer
Treatment and outcome of disease
Age and knowledge score
Social class and disease prevalence
Cholesterol and CAD
Weight and diabetes mellitus
BP and heart disease

There are two possibilities: either the two variables are independent (no
association) or they are dependent (they influence or affect each other).

This test can be used even in multinomial samples.

I. Incidence of filariasis and social class (very rich,
middle and poor).
II. Parity of the mother and weight of the baby (1st, 2nd, 3rd, 4th parity).
III. State of nutrition and IQ (<60%, 61-80%, 81-100%).

Death & survival among control & experimental
group.
E.g.: outcome of drug and placebo

Groups                 Died   Survived   Total
Control on placebo     10     25         35
Experimental on drug   5      60         65
Total                  15     85         100

Here we are seeing the association between
two classes (control, experimental) and two events
(death, survival). So this is called a 2x2 contingency
table or four-cell table.
Chi-square can be calculated even if there are more
than 2 classes or events (multinomial).


E.g.: Social class & leprosy

Social class   Leprosy positive   Leprosy negative   Total
Higher         4                  76                 80
Middle         20                 180                200
Low            60                 440                500
Total          84                 696                780
Uses:

The chi-square test can be used even if the sample is not
normally distributed and even if the sample size is
small, when it is not possible to calculate significance
through parametric tests like t
and z tests.

Example 100 boys & 60 girls were asked to select one of
five subjects. Do you think that the choice of subjects
is dependent upon the sex of students?

Chi Square Goodness of Fit (One
Sample Test)
This test allows us to compare a collection of
categorical data with some theoretical expected
distribution. This test is often used in genetics to
compare the results of a cross with the theoretical

Example
The opinions of 90 unmarried people and 100 married
people on child marriages were collected on an
attitude scale. Do the data indicate a significant
difference in opinion in terms of marital status?

1. Establish Hypotheses
Null Hypothesis: There is no difference of opinion
about child marriages between married and
unmarried persons.

Observed frequencies (expected frequencies in parentheses):

Marital status   Agree       Disagree    No opinion   Total
Unmarried        14 (19.4)   66 (62.5)   10 (8)       90
Married          27 (21.6)   66 (69.5)   7 (9)        100
Total            41          132         17           190

2. Calculate the expected value for each cell of the
table (column & row)

Formula:
Expected frequency = (row total x column total) / total frequencies

i) 90 x 41 / 190 = 19.4      ii) 100 x 41 / 190 = 21.6
iii) 90 x 132 / 190 = 62.5   iv) 100 x 132 / 190 = 69.5
v) 90 x 17 / 190 = 8         vi) 100 x 17 / 190 = 9

3. Calculate Chi-square statistic
Compute $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

fo    fe     fo-fe   (fo-fe)^2   (fo-fe)^2/fe
14    19.4   -5.4    29.16       1.503
66    62.5    3.5    12.25       0.196
10    8       2.0    4           0.500
27    21.6    5.4    29.16       1.350
66    69.5   -3.5    12.25       0.176
7     9      -2.0    4           0.444
Total 190  190                   $\chi^2$ = 4.17

4. CALCULATE DEGREES OF FREEDOM
The formula is df = (r - 1)(c - 1), where r is the number of rows and c the number of columns.
In our example df = (2 - 1)(3 - 1) = 2.

5. Check table / critical values at 0.05 and 0.01 level of
significance:

The computed value has to be compared with the table
value.

Critical values of $\chi^2$ from the table (df = 2):
0.05 level = 5.99
0.01 level = 9.210

6. Make inference
The computed $\chi^2$ value = 4.17 is lower than the critical
values at both levels of significance. When the computed
value is less than the table value it is not significant. So we accept the
null hypothesis: there is no significant
difference in attitudes towards child marriages between
married and unmarried persons.
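The expected-frequency and chi-square steps can be sketched in Python for a table of any size. The function name `chi_square_independence` is illustrative. The sketch keeps the expected frequencies unrounded, so for the marital status table it returns a chi-square of about 4.14, slightly different from a hand calculation that rounds each expected frequency to one decimal place; the conclusion is unchanged because the value stays below 5.99.

```python
def chi_square_independence(observed):
    """Chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: unmarried, married. Columns: agree, disagree, no opinion.
opinions = [[14, 66, 10],
            [27, 66, 7]]
chi2, df = chi_square_independence(opinions)
print(round(chi2, 2), df)
print("significant at 0.05:", chi2 > 5.99)
```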


2 x 2 Contingency Table: Chi-square ($\chi^2$) formula II

                          Variable 1
Variable 2    Data type 1   Data type 2   Totals
Category 1    a             b             a + b
Category 2    c             d             c + d
Total         a + c         b + d         a + b + c + d = N
Table: General notation for a 2 x 2
contingency table.

For a 2 x 2 contingency table the Chi Square statistic is
calculated by the formula:

$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$

Note: the four components of the
denominator are the four totals from the table
columns and rows.
Suppose you conducted a drug trial on a group of
animals and you hypothesized that the animals
receiving the drug would survive better than those
that did not receive the drug. You conduct the study
and collect the following data:
Ho: The survival of the animals is independent of
drug treatment.
Ha: The survival of the animals is associated with drug
treatment.


Table. Number of animals that survived a
treatment.
              Dead   Alive   Total
Treated       36     14      50
Not treated   30     25      55
Total         66     39      105
Applying the formula above we get:

$\chi^2 = \frac{105[(36)(25) - (14)(30)]^2}{(50)(55)(39)(66)} = 3.418$

Before we can proceed we need to know how many
degrees of freedom we have. When a comparison is
made between one sample and another, a simple rule
is that the degrees of freedom equal (number of
columns minus one) x (number of rows minus one)
not counting the totals for rows or columns. For our
data this gives (2-1) x (2-1) = 1.

We now have our chi square statistic ($\chi^2$ = 3.418), our
predetermined alpha level of significance (0.05), and
our degrees of freedom (df = 1). Since our
$\chi^2$ statistic
(3.418) did not exceed the critical value for the 0.05
probability level (3.841), we cannot reject the null
hypothesis that the survival of the animals is
independent of drug treatment (i.e. the data do not show an
effect of the drug on survival).

Probability level (alpha)
Df 0.5 0.10 0.05 0.02 0.01 0.001
1 0.455 2.706 3.841 5.412 6.635 10.827
2 1.386 4.605 5.991 7.824 9.210 13.815
3 2.366 6.251 7.815 9.837 11.345 16.268
4 3.357 7.779 9.488 11.668 13.277 18.465
5 4.351 9.236 11.070 13.388 15.086 20.517
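The 2 x 2 shortcut formula can be sketched in Python with the animal survival counts. The function name `chi_square_2x2` is illustrative, and the critical value 3.841 is the df = 1, alpha = 0.05 entry from the table of critical values.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table: N(ad - bc)^2 / product of the four marginal totals."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Treated: 36 dead, 14 alive. Not treated: 30 dead, 25 alive.
chi2 = chi_square_2x2(36, 14, 30, 25)
print(round(chi2, 3))
print("exceeds critical value 3.841:", chi2 > 3.841)
```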
300 cases of typhoid were admitted to a hospital in one
year. 150 cases were given ciprofloxacin and 150 cases
were given chloramphenicol. Which drug has the better
cure rate?
Null hypothesis: There is no significant difference in cure rates
between the 2 drugs.

Drugs             Cured     Not cured   Total
Ciprofloxacin     143 (A)   7 (B)       150
Chloramphenicol   137 (C)   13 (D)      150
Total             280       20          300

$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}$

$\chi^2 = \frac{300 (143 \times 13 - 7 \times 137)^2}{150 \times 150 \times 280 \times 20} = \frac{300 \times 810000}{126000000} = \frac{243000000}{126000000} = 1.929$

df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
At the 0.05 level of significance the critical value is 3.84
and the computed value (1.929) is less, so it is not
significant. Accept the null hypothesis.

The mothers of two hundred adolescents (some of
them were graduates and others were non
graduates) were asked whether they agree or
disagree with certain aspects of adolescent behavior.

Null hypothesis: The attitudes of mothers are independent of their
being graduates or non graduates.

                       Agree    Disagree   Total
Graduate mothers       38 (A)   12 (B)     50
Non graduate mothers   84 (C)   66 (D)     150
Total                  122      78         N = 200

$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} = \frac{200 (38 \times 66 - 84 \times 12)^2}{50 \times 150 \times 122 \times 78}$

$= \frac{200 (2508 - 1008)^2}{71370000} = \frac{200 \times (1500)^2}{71370000} = \frac{200 \times 2250000}{71370000} = \frac{450000000}{71370000} = 6.305$

df = (r - 1)(c - 1)
= (2 - 1)(2 - 1) = 1
Table value at the 0.05 level of significance = 3.841.
The computed value is higher than the
critical table value at the 0.05 level of
significance, so the difference is significant
and we reject the null hypothesis and say that
attitudes are influenced by the educational
status of the mother.


YATES FORMULA
Note: If any one of the self frequency in a (2x2)
contingency table less than 5, we use Yates
correction in the formula for calculating chi-
square test statistic. The corrected formula is.

) )( )( )( (
)
2
| (|
2
2
d b c a d c b a
N
bc ad N
x
+ + + +

=
Example:
The following data was obtained in an investigation in the
effect of vaccination of small pox
Vaccinat
ed
Non
vaccinated
Total
Attack by small
pox
3 (a) 12 (b) 15
Not attack by
small pox
8 (c) 5 (d) 13
11 (a+c) 11 (b+d) N = 28
Examine whether vaccination is effective in preventing
small pox.
Sol. Here, we want to test
H0: There is no association between attack of smallpox
and vaccination (i.e., vaccination is not effective).
H1: There is an association between attack of smallpox
and vaccination (i.e., vaccination is effective).
For testing association we can use the chi-square test.
Here a = 3, b = 12, c = 8, d = 5. The first frequency
(a = 3) is less than 5, hence we use Yates' correction, i.e.,

$$\begin{aligned}
\chi^2 &= \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)} \\
&= \frac{28\left(|3 \times 5 - 12 \times 8| - \frac{28}{2}\right)^2}{(3+12)(8+5)(3+8)(12+5)} \\
&= \frac{28\,[\,|15 - 96| - 14\,]^2}{15 \times 13 \times 11 \times 17} \\
&= \frac{28\,(81 - 14)^2}{36465} = \frac{28 \times 4489}{36465} \approx 3.45
\end{aligned}$$

Tabled value of $\chi^2$ at the 5% level with 1 df = 3.84
Inference: the calculated value is less than the tabled
value of $\chi^2$ at the 5% level with 1 df, hence we accept
the null hypothesis. These data do not show that vaccination
is effective in preventing smallpox.
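
The Yates-corrected calculation above can be reproduced as follows. This is a minimal sketch under the same formula; the function name `chi_square_yates` is illustrative and the critical value 3.84 is the tabled value quoted in the text.

```python
def chi_square_yates(a, b, c, d):
    """Yates-corrected chi-square for the 2x2 table [[a, b], [c, d]].

    Uses N(|ad - bc| - N/2)^2 / ((a+b)(c+d)(a+c)(b+d)).
    """
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Smallpox example: a = 3, b = 12, c = 8, d = 5.
chi2_corrected = chi_square_yates(3, 12, 8, 5)
print(round(chi2_corrected, 2))  # 3.45
print(chi2_corrected < 3.84)     # True, so the null hypothesis is not rejected
```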
Regression:
Once the correlation between two variables is understood,
it is often necessary to estimate or predict the value of
one character (variable, say Y) from the known value of the
other character (variable, say X), for example to estimate
height when weight is known. This is possible when the two
variables are linearly correlated. The variable to be
estimated (Y, i.e. height) is called the dependent variable
and the variable which is known (X, i.e. weight) is called
the independent variable. The estimate is made by means of
a regression line or equation.



Regression is the measure of the average relationship
between two or more variables in terms of the original units
of the data. The prediction or estimation of the most likely
values of one variable for specified values of the other is done
by using suitable equations involving the two variables. Such
equations are known as regression equations.
In linear regression the relationship between the two
variables X and Y is linear (i.e., a straight line of the type
X = a + bY or Y = a + bX). In order to estimate the best average
values of the two variables, two regression equations are
required. One equation is used for estimating the value of the X
variable for a given value of the Y variable, and the second
equation is used for estimating the value of the Y variable for a
given value of the X variable. Therefore, the two lines of
regression are:

(i) Regression equation of X on Y is

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

(ii) Regression equation of Y on X is

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

Where X = value of X; $\bar{X}$ = mean of X values
Y = value of Y; $\bar{Y}$ = mean of Y values
$\sigma_x$ = standard deviation of X values or series
$\sigma_y$ = standard deviation of Y values or series
r = correlation coefficient between X and Y

Regression Coefficients:

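The regression coefficient of Y on X is denoted by $b_{yx}$ and
the regression coefficient of X on Y is denoted by $b_{xy}$.
These are found by any of the following three formulae.

(i) If the correlation coefficient r is already calculated,
the regression coefficients are derived as

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

and

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

(ii) If means are already calculated, the regression
coefficients are derived by the least-squares method:

$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum xy}{\sum x^2}$$

$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{\sum xy}{\sum y^2}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviations from the means.

(iii) If means are not calculated, then a simple and
direct method using the raw values X and Y is adopted
to find the regression coefficients:

$$b_{yx} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

and

$$b_{xy} = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum Y^2 - \frac{(\sum Y)^2}{n}}$$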
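
The three formulae for the regression coefficients give the same values, which can be verified numerically. The sketch below uses a small hypothetical data set (the X and Y values are invented for illustration and do not come from the slides), and it assumes population standard deviations (dividing by n) in formula (i).

```python
import math

# Hypothetical paired observations, chosen only for illustration.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in X)
syy = sum((y - mean_y) ** 2 for y in Y)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Formula (i): from r and the standard deviations.
sd_x = math.sqrt(sxx / n)
sd_y = math.sqrt(syy / n)
r = sxy / math.sqrt(sxx * syy)
b_yx_i = r * sd_y / sd_x
b_xy_i = r * sd_x / sd_y

# Formula (ii): from deviations about the means.
b_yx_ii = sxy / sxx
b_xy_ii = sxy / syy

# Formula (iii): from raw sums, without computing the means.
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
b_yx_iii = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b_xy_iii = (sum_xy - sum_x * sum_y / n) / (sum_y2 - sum_y ** 2 / n)

print(round(b_yx_i, 3), round(b_yx_ii, 3), round(b_yx_iii, 3))
print(round(b_xy_i, 3), round(b_xy_ii, 3), round(b_xy_iii, 3))
```

For this data set every method gives b_yx = 0.6 and b_xy = 1.0, and their product equals r squared.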



EXAMPLE 1
The following results were obtained for the height and weight
of 1000 students:
$\bar{Y}$ = 170 cm; $\bar{X}$ = 60 kg; r = 0.6; $\sigma_y$ = 6.5 cm; $\sigma_x$ = 5 kg.
Anil weighs 45 kg. Sunil is 165 cm tall. Estimate the height of
Anil from his weight and the weight of Sunil from his
height.
SOLUTION

Here, Height = Y and Weight = X
$\bar{Y}$ = 170 cm, $\bar{X}$ = 60 kg, r = 0.6
$\sigma_y$ = 6.5 cm, $\sigma_x$ = 5 kg

(i) The regression equation of Y on X is

$$\begin{aligned}
Y - \bar{Y} &= r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \\
Y - 170 &= 0.6 \times \frac{6.5}{5}\,(X - 60) \\
Y - 170 &= 0.78\,(X - 60) \\
Y - 170 &= 0.78X - 46.8 \\
Y &= 0.78X + 123.2
\end{aligned}$$

When Anil's weight X = 45 kg,
his height Y will be = 0.78 x 45 + 123.2
= 35.1 + 123.2
= 158.3 cm

Required height of Anil = 158.3 cm.

(ii) The regression equation of X on Y is

$$\begin{aligned}
X - \bar{X} &= r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) \\
X - 60 &= 0.6 \times \frac{5}{6.5}\,(Y - 170) \\
X - 60 &= 0.46Y - 78.2 \\
X &= 0.46Y - 78.2 + 60 \\
X &= 0.46Y - 18.2
\end{aligned}$$

When Sunil's height Y = 165 cm,
his weight X will be
= 0.46 x 165 - 18.2
= 75.9 - 18.2
= 57.7 kg
Required weight of Sunil = 57.7 kg.
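
The two estimates in Example 1 can be reproduced from the summary statistics alone. This is a minimal sketch; the helper name `regression_line` is illustrative. It keeps the slope of X on Y unrounded (about 0.4615 rather than 0.46), so the intercept differs slightly from the hand calculation while the final estimate agrees to one decimal place.

```python
def regression_line(mean_dep, mean_indep, r, sd_dep, sd_indep):
    """Return (slope, intercept) for the regression of the dependent
    variable on the independent variable, from summary statistics."""
    slope = r * sd_dep / sd_indep
    intercept = mean_dep - slope * mean_indep
    return slope, intercept


# Y = height (cm), X = weight (kg), values taken from Example 1.
b_yx, a_yx = regression_line(170, 60, 0.6, 6.5, 5)   # Y on X
b_xy, a_xy = regression_line(60, 170, 0.6, 5, 6.5)   # X on Y

anil_height = a_yx + b_yx * 45     # Anil weighs 45 kg
sunil_weight = a_xy + b_xy * 165   # Sunil is 165 cm tall

print(round(anil_height, 1))   # 158.3
print(round(sunil_weight, 1))  # 57.7
```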



Mann-Whitney U Test
Nonparametric test, alternative to the two-sample t-test
Actual measurements are not used; the ranks of the
measurements are used instead
Data can be ranked from highest to lowest or
lowest to highest values
Calculate the Mann-Whitney U statistic, where R1 is the
sum of the ranks in sample 1:

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

Example of Mann-Whitney U test
Two tailed null hypothesis that there is no
difference between the heights of male and
female students
Ho: Male and female students are the same
height
HA: Male and female students are not the
same height

Heights of     Heights of      Ranks of        Ranks of
males (cm)     females (cm)    male heights    female heights
193            175             1               7
188            173             2               8
185            168             3               10
183            165             4               11
180            163             5               12
178            -               6               -
170            -               9               -
n1 = 7         n2 = 5          R1 = 30         R2 = 48

$$\begin{aligned}
U &= n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \\
U &= (7)(5) + \frac{(7)(8)}{2} - 30 \\
U &= 35 + 28 - 30 \\
U &= 33 \\
U' &= n_1 n_2 - U \\
U' &= (7)(5) - 33 \\
U' &= 2
\end{aligned}$$

$U_{0.05(2),7,5} = U_{0.05(2),5,7} = 30$
As 33 > 30, Ho is rejected
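
The ranking and the U calculation above can be sketched in Python as follows. The function name `mann_whitney_u` is illustrative; the sketch ranks from highest to lowest, as in the table, and assumes there are no tied values (true for these data).

```python
def mann_whitney_u(sample1, sample2):
    """Return (U, U') for two samples, ranking the pooled data from
    highest to lowest. Assumes no tied values."""
    pooled = sorted(sample1 + sample2, reverse=True)
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank[value] for value in sample1)  # sum of ranks in sample 1
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u_prime = n1 * n2 - u
    return u, u_prime


males = [193, 188, 185, 183, 180, 178, 170]
females = [175, 173, 168, 165, 163]

u, u_prime = mann_whitney_u(males, females)
print(u, u_prime)           # 33.0 2.0
print(max(u, u_prime) > 30)  # True: exceeds the critical value quoted in the text
```

The larger of U and U' is compared with the tabled critical value, which still has to be looked up for the given sample sizes.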


Research Designs to Appropriate Statistical
Analyses
----------------------------------------------------------------------
DESIGN                          STATISTICAL TEST
----------------------------------------------------------------------
EXPERIMENTAL DESIGN

1. Basic two-group design       a. t-test for independent means
                                   (interval or ratio data)
                                b. Mann-Whitney U test (ordinal data)
                                c. Chi-square test (nominal data)

2. Pre-test and post-test       a. t-test for dependent
   design                          (non-independent) means (interval)
                                b. Wilcoxon or sign test (ordinal)
                                c. McNemar test (nominal)

4. Covariance, or repeated      a. Repeated measures analysis of
   measures design                 variance OR analysis of
                                   covariance (interval)
                                b. Friedman's AOV by ranks (ordinal)
                                c. Cochran's Q (nominal)

5. Three or more groups         a. Analysis of variance (interval)
   design                       b. Kruskal-Wallis (ordinal)
                                c. Chi-square test for independent
                                   groups (nominal)

DESCRIPTIVE RESEARCH

6. One-group sample from a      a. One-group t-test (interval)
   known population             b. Kolmogorov-Smirnov test for
                                   goodness-of-fit (ordinal)
                                c. Chi-square goodness-of-fit
                                   test (nominal)
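
The design-to-test table above is essentially a lookup keyed by research design and level of measurement. A minimal sketch of that lookup is shown below; the key strings (such as "two-group" and "pre-post") and the function name `suggest_test` are illustrative labels, not terms from the slides.

```python
# Lookup of (design, level of measurement) -> test, copied from the table.
TEST_BY_DESIGN = {
    ("two-group", "interval"): "t-test for independent means",
    ("two-group", "ordinal"): "Mann-Whitney U test",
    ("two-group", "nominal"): "Chi-square test",
    ("pre-post", "interval"): "t-test for dependent means",
    ("pre-post", "ordinal"): "Wilcoxon or sign test",
    ("pre-post", "nominal"): "McNemar test",
    ("repeated-measures", "interval"): "Repeated measures ANOVA or ANCOVA",
    ("repeated-measures", "ordinal"): "Friedman's AOV by ranks",
    ("repeated-measures", "nominal"): "Cochran's Q",
    ("three-or-more-groups", "interval"): "Analysis of variance",
    ("three-or-more-groups", "ordinal"): "Kruskal-Wallis",
    ("three-or-more-groups", "nominal"): "Chi-square test for independent groups",
    ("one-group", "interval"): "One-group t-test",
    ("one-group", "ordinal"): "Kolmogorov-Smirnov goodness-of-fit test",
    ("one-group", "nominal"): "Chi-square goodness-of-fit test",
}


def suggest_test(design, level):
    """Return the test listed in the table for a design and data level."""
    return TEST_BY_DESIGN[(design, level)]


print(suggest_test("two-group", "ordinal"))  # Mann-Whitney U test
```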


Summary of Statistical Tests
t -test for independent means
t -test for dependent means
One group t-test
Analysis of variance (ANOVA)
Repeated measures analysis of
variance (RAOV)
Analysis of covariance (ANCOVA)

Categorical I.V and D.V
a. Chi-square
b. Cochran's Q
c. McNemar test
d. Lambda beta

When the data are scores (ordinal measurement), use
these methods. They can also be used with interval data
by converting the interval data to ranks.
Spearman's rank-order correlation
Kendall's tau
Kolmogorov-Smirnov test
Mann-Whitney U test
Wilcoxon matched-pairs test
Kruskal-Wallis test
Friedman analysis of variance by ranks
Relationship research questions require large sample
sizes, i.e., twenty or more per group. Smaller sample
sizes (10 to 15 per group) can be used, but the validity
of the results may be reduced.

1. Pearson's product-moment correlation
coefficient
2. Regression analysis
3. Multiple regression analysis
4. Multivariate multiple regression analysis

What is Excel?
Data are organized by worksheets, rows and columns
Worksheet limits are 256 columns and 65,536 rows (in Excel 2003 and earlier)
Cells contain data or formulas with relative or
absolute references to other cells
Direct manipulation of data and flexibility to move
data around (e.g. sorting, replacing, merging)
Opens many file types
Quite useful in prepping files for use in SPSS, SAS
or other programs


What is SAS?
A general purpose statistical package with a basic
programming capability utilizing scores of statistical and
mathematical functions in numerous modules
Can readily access data from a wide variety of sources,
perform data management, and present findings in a
variety of report and graph formats
Provides powerful tools for both specialized and
enterprise-wide analytical needs

What is SPSS? (Statistical Package for the Social
Sciences)
SPSS (originally, Statistical Package for the Social
Sciences) was released in its first version in 1968 after
being developed by Norman H. Nie and C. Hadlai Hull
It is used by market researchers, health researchers,
survey companies, government, education
researchers, marketing organizations and others.
Program functionality is broken into over a dozen
different modules which are sold individually
Most commonly used are Base, Regression Models, and
Advanced Models
Other modules can be installed to run more complex
analyses
SPSS data files include both the data and the
variable information (variable and value labels,
formats and missing values). SPSS is available in
several versions.

SPSS - Strengths
Easily opens data from other programs such as Excel
and SAS
Variable view screen allows for quick overview of file
contents and allows for easy modifications of names,
formats, labels, and variable order
Having all data information in a single file allows
sharing files on a project to be very easy
Point-and-click menus do not require memorizing
syntax for majority of procedures
Many procedures can be expanded beyond the menu
options in syntax
Split-file command allows all output to be replicated
for various groups through a single command
Journal file tracks all commands used for life of
program, with good resources to find code
accidentally deleted

SPSS - Weaknesses
Ease of doing data manipulation can sometimes lead
to mistakes as the program does not preclude
inappropriate modifications to the data
Matching feature requires exact match
Duplicate records generate warnings but can be marked in
file
Error logs are hard to interpret at times
Incompleteness of menus means some options are
only available via syntax
While the majority of output is saved as pivot tables,
allowing great flexibility in modifying tables, output
tables and graphs are generally not done as well as in
Excel and are harder to manipulate

What is analyzed in Statistics?
Descriptive statistics: Cross tabulation, Frequencies,
Descriptive, Explore, Descriptive Ratio Statistics
Bivariate statistics: Means, t-test, ANOVA,
Correlation , Nonparametric tests
Prediction for numerical outcomes: Linear regression
Prediction for identifying groups: Factor analysis,
cluster analysis

LISREL (statistics package used in structural equation
modeling)

Ideal for discrete data types
Test data, Likert scale item data
Data can be imported in various types
ASCII, Access, Excel, SAS, SPSS, etc.
Variable names have length restrictions
Data files then stored as system files for later use
Basic statistics (e.g. means and correlations) are
generated in an underlying program called
PRELIS
LISREL itself is used to confirm the structural
validity of a measurement model for any
assessment
Requires syntax and input matrices

HLM
Hierarchical Linear Modeling (HLM) is
becoming a more popular type of analysis,
namely in cohort trend modeling
Also allows you to look at variance component
estimates and regression models given a nested
sample of respondents
Students within countries within global regions on
personality variables
More tedious to set up analysis with fewer
available file types
Also requires more upfront work as multiple data files
are needed

What Program should be used?
Microsoft Excel is the most basic and accessible
spreadsheet program available today
It is most ideal for general data exploration, histograms, scatter
plots, etc.
Appearance of tables can be customized to meet APA standards
Allows for easy transition to other programs to complete analyses
and write reports
However, its heritage is not as a statistical analysis program
Certain statistical programs are designed for specific
analytic tasks
Balance the results against what will be presented
Choose wisely in the interests of efficiency and accuracy of results
Some output is good for looking at the data through basic
exploration and to generate basic tables, but not to present the data

Computers and Research
Computers are now used by researchers throughout
the research process: to conduct bibliographic
searches, to learn about funding opportunities
for research projects, to collect and store data of
all types, to maintain administrative records, etc.
Computers can be used by just about anyone:
doctors, policemen, pilots, scientists, nurses,
engineers, and recently even housewives.


Use of computers in research
1. Problem identification
2. Literature review
3. Research design
4. Data collection and analysis


PREPARING THE DATA FOR COMPUTER ANALYSIS AND
PRESENTATION


Computers have clearly created numerous opportunities for
researchers. The computer centers at universities are
particularly likely to have a variety of software packages
available to their users. Sophisticated programmes are
available for performing statistical analysis.
Computers are ideally suited for data analysis in large
research projects. Researchers are essentially concerned
with storing large amounts of data, retrieving them quickly
when required, and processing them with the aid of various
techniques. In all these operations, computers are very helpful.
Computers facilitate research work. Large volumes of data
can be processed and analyzed with greater ease and speed.
Moreover, the results obtained are generally correct and
reliable. In addition, designs, pictorial graphs and
reports are also developed with the help of computers.

Steps for preparing the data for analysis and
presentation
The data organization and coding
Storing the data in the computer
Selection of appropriate statistical measures
I. mean
II. median
III. standard deviation ,
IV. frequency distribution
V. percentages
VI. range
VII. Variance

Selection of appropriate software package
SPSS
SPSS is a highly flexible programme with a syntax
that is not technically oriented. It has a data entry
programme that can be used to create data files for
subsequent analysis. It can perform the most widely used
multivariate analyses, including multiple regression,
analysis of covariance, discriminant function analysis,
factor analysis, multivariate analysis of variance, logistic
regression, life table analysis, etc.

SAS
SAS is an integrated set of data management tools that includes a
complete programming language as well as modules for multiple
functions, including spreadsheets, project management, scheduling,
and mathematical, engineering and statistical applications.
Execution of the computer program
The computer may be operated to execute instructions.
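
The statistical measures listed in the steps above (mean, median, standard deviation, frequency distribution, percentages, range and variance) can all be computed with the Python standard library. The sketch below uses a small hypothetical list of scores invented for illustration, and it assumes the sample versions of the standard deviation and variance (dividing by n - 1).

```python
import statistics
from collections import Counter

# Hypothetical scores, used only to illustrate the measures.
scores = [2, 4, 4, 5, 5, 5, 7, 8]

mean = statistics.mean(scores)
median = statistics.median(scores)
std_dev = statistics.stdev(scores)       # sample standard deviation (n - 1)
variance = statistics.variance(scores)   # sample variance (n - 1)
value_range = max(scores) - min(scores)

# Frequency distribution and the percentage of cases at each value.
frequencies = Counter(scores)
percentages = {
    value: 100 * count / len(scores) for value, count in frequencies.items()
}

print(mean, median, value_range)
print(round(variance, 3), round(std_dev, 3))
print(dict(frequencies))
print(percentages)
```

Use `statistics.pstdev` and `statistics.pvariance` instead if the data represent the whole population rather than a sample.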
Limitations of computer based analysis
Computers are machines that only compute, they do not think.
As such, researchers should be fully aware about the following
limitations of computer-based analysis.
Computerized analysis requires setting of an elaborate system of
monitoring, collection and feeding of data. All these require
time, effort, and money. Hence, computer based analysis may
not prove economical in case of small projects.

Various items of detail that are not specifically fed to
the computer may be lost sight of.
The computer does not think; it can only execute the instructions of
a thinking person. If poor data or faulty programs are introduced
into the computer, the data analysis will not be worthwhile.

THE END
