MB0034 RESEARCH METHODOLOGY ROLL NO 510922802
The only way to minimize both types of error is to increase the sample
size, and this may or may not be feasible.
Hypothesis testing is the art of testing whether a variation between two
sample distributions can be explained by chance or not. In many
practical applications type I errors are more serious than type II errors,
and care is then focused on minimizing the occurrence of this error.
Suppose the probability of a type I error is set at 1%. Then, if the null
hypothesis is true, there is a 1% chance of wrongly concluding that the
observed variation is real. This probability is called the level of
significance. While 1% might be an acceptable level
of significance for one application, a different application can require a
very different level. For example, the standard goal of six sigma is to
achieve precision to 4.5 standard deviations above or below the mean.
This means that only 3.4 parts per million are allowed to be deficient in
a normally distributed process. The probability of type I error is
generally denoted with the Greek letter alpha, α.
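The 3.4 parts-per-million figure can be checked: it is the area of a normal distribution lying more than 4.5 standard deviations beyond the mean on one side. A minimal Python sketch using only the standard library (the helper name upper_tail_normal is our own, for illustration):

```python
import math

def upper_tail_normal(z: float) -> float:
    """P(Z > z) for a standard normal variable Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# One-sided tail area beyond 4.5 standard deviations, expressed per million.
ppm = upper_tail_normal(4.5) * 1_000_000
print(f"{ppm:.1f} parts per million")  # 3.4 parts per million
```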
To state it simply, a type I error can usually be interpreted as a false
alarm or under-active specificity. A type II error could be similarly
interpreted as an oversight, but is more akin to a lapse in attention or
under-active sensitivity. The probability of type II error is generally
denoted with the Greek letter beta, β.
c) There are two different types of tests that can be performed. A
one-tailed test looks for a change in one specified direction only (an
increase or a decrease in the parameter), whereas a two-tailed test
looks for any change in the parameter, whether an increase or a decrease.
We can perform the test at any significance level (usually 1%, 5% or 10%).
For example, performing the test at the 5% level means that there is a 5%
chance of wrongly rejecting H0 when it is true.
If we perform the test at the 5% level and decide to reject the null
hypothesis, we say "there is significant evidence at the 5% level to
suggest the hypothesis is false".
One-Tailed Test
We choose a critical region. In a one-tailed test, the critical region will
have just one part (the red area below). If our sample value lies in this
region, we reject the null hypothesis in favour of the alternative.
Suppose we are looking for a definite decrease. Then the critical region
will be to the left. Note, however, that in the one-tailed test the value of
the parameter can be as high as you like.
Example
Suppose we are given that X has a Poisson distribution and we want to
carry out a hypothesis test on the mean, λ, based upon a sample
observation of 3.
Suppose the hypotheses are:
H0: λ = 9
H1: λ < 9
We want to test if it is "reasonable" for the observed value of 3 to have
come from a Poisson distribution with parameter 9. So what is the
probability that a value as low as 3 has come from a Po(9)?
P(X ≤ 3) = 0.0212 (this has come from a Poisson table)
The probability is less than 0.05, so if the mean really were 9 there
would be less than a 5% chance of observing a value as low as 3. We
therefore reject the null hypothesis in favour of the alternative at the
5% level.
However, the probability is greater than 0.01, so we would not reject
the null hypothesis in favour of the alternative at the 1% level.
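The table lookup can also be reproduced directly by summing the Poisson probabilities P(X = i) = e^(−λ)λ^i / i! for i = 0 to 3. A minimal Python sketch (the function name poisson_cdf is illustrative, not taken from any library):

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam): sum of e^(-lam) * lam^i / i! for i = 0..k."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))

p_value = poisson_cdf(3, 9)  # P(X <= 3) when H0: lambda = 9 is true
print(f"P(X <= 3) = {p_value:.4f}")  # P(X <= 3) = 0.0212

# One-tailed test: reject H0 when the lower-tail probability falls below alpha.
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"at the {alpha:.0%} level: {decision}")
```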
Two-Tailed Test
In a two-tailed test, we are looking for either an increase or a decrease.
So, for example, H0 might be that the mean is equal to 9 (as before).
This time, however, H1 would be that the mean is not equal to 9. In this
case, therefore, the critical region has two parts, one in each tail of the distribution.
Example
Let's test the parameter p of a Binomial distribution at the 10% level.
Suppose a coin is tossed 10 times and we get 7 heads. We want to test
whether or not the coin is fair. If the coin is fair, p = 0.5. Put this as the
null hypothesis:
H0: p = 0.5
H1: p ≠ 0.5
Now, because the test is 2-tailed, the critical region has two parts. Half
of the critical region is to the right and half is to the left. So the critical
region contains both the top 5% of the distribution and the bottom 5%
of the distribution (since we are testing at the 10% level).
If H0 is true, X ~ Bin(10, 0.5).
If the null hypothesis is true, what is the probability that X is 7 or
above?
P(X ≥ 7) = 1 - P(X < 7) = 1 - P(X ≤ 6) = 1 - 0.8281 = 0.1719
Is this in the critical region? No, because the probability that X is at
least 7 is not less than 0.05 (5%), as it would need to be.
So there is not significant evidence at the 10% level to reject the null
hypothesis.
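The binomial tail probability can likewise be computed exactly rather than read from a table. In this sketch (the helper name binomial_upper_tail is hypothetical), the 10% significance level is split equally between the two tails, as described above:

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Bin(n, p), summed directly from the probability function."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

alpha = 0.10
p_upper = binomial_upper_tail(7, 10, 0.5)  # P(X >= 7) under H0: p = 0.5
# Two-tailed test: the 10% is split equally, so each tail of the critical region holds 5%.
in_critical_region = p_upper < alpha / 2
print(p_upper, in_critical_region)  # 0.171875 False
```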
d) There are two types of test data and consequently different types
of analysis. As the table below shows, parametric data has an
underlying normal distribution which allows for more conclusions to be
drawn as the shape can be mathematically described. Anything else is
non-parametric.
                                   Parametric                            Non-parametric
Assumed distribution               Normal                                Any
Assumed variance                   Homogeneous                           Any
Typical data                       Ratio or Interval                     Ordinal or Nominal
Data set relationships             Independent                           Any
Usual central measure              Mean                                  Median
Benefits                           Can draw more conclusions             Simplicity; less affected by outliers
Tests
Independent measures, 2 groups     Independent-measures t-test           Mann-Whitney test
Independent measures, >2 groups    One-way, independent-measures ANOVA   Kruskal-Wallis test
Repeated measures, 2 conditions    Matched-pair t-test                   Wilcoxon test
Repeated measures, >2 conditions   One-way, repeated-measures ANOVA      Friedman's test
As the table shows, there are different tests for parametric and non-
parametric data.
one thing causes another, you are saying that there is a direct line
between that one thing and the result. Cause means that an action will
always have a predictable reaction.
When you define correlation, the terms cause and correlation become
easier to understand. If you see a correlation between two things, you
can see that there is a relationship between those two things. One thing
doesn’t necessarily result in the other thing occurring, but it may
increase the likelihood that something will occur.
An example helps in understanding the difference between cause and
correlation. Consider the statement: “Violent video
games cause violent behavior.” Research on this matter does not
support this statement as worded, because of the word cause in
the sentence. Research has shown that violent video games may
influence violent behavior.
It also shows that a number of different factors may be responsible for a
person being violent, among them, poorer socioeconomic status,
mental illness, abusive childhoods, and bad parenting. You cannot say
violent video games are the cause of violence. In order to make the
above statement, you’d have to be able to prove that everyone who
ever played a violent video game subsequently exhibited violence.
Instead, what you can say, and what has been studied, is the
correlation between violent video games and violent behavior.
Researchers have shown that there is a connection/correlation there.
Such games may influence some people to act in more aggressive ways, but
they are not the sole factor and sometimes not even a factor in
predicting violence. Thus there’s a correlation there, which should be
considered, but no demonstrated cause. Plenty of people were violent
before the advent of video games, so if you’re deciding between
cause and correlation here, you must choose correlation.
In some ways, it can be almost impossible, except in extremely
controlled circumstances, to say that any one thing causes something else,
especially when you’re dealing with human health or behavior. You can,
in limited ways, make blanket cause/effect statements about some
things. For example, heating water to a certain temperature causes it to
boil. This is a specific cause/effect relationship that no one would
dispute.
Yet it can be helpful to understand the difference between cause and
correlation since we are often barraged with information about things
that may pose health risks to us. What researchers most often find is
that some things, for instance alcoholism and cancer, are
connected or co-related. Alcoholism may increase your risk of getting
cancer, but it does not, in and of itself, cause cancer.
When you hear about the causes of disease, it’s important to be
skeptical. Scientists define correlations all the time, and unfortunately,
the news media love to call these causes, since they then translate into a
much more dramatic story. Read or listen carefully for qualifying words
You should immediately see in the bivariate plot that the relationship
between the variables is a positive one (if you can't see that, review the
section on types of relationships) because if you were to fit a single
straight line through the dots it would have a positive slope or move up
from left to right. Since the correlation is nothing more than a
quantitative estimate of the relationship, we would expect a positive
correlation.
What does a "positive relationship" mean in this context? It means that,
in general, higher scores on one variable tend to be paired with higher
scores on the other and that lower scores on one variable tend to be
paired with lower scores on the other. You should confirm visually that
this is generally true in the plot above.
Calculating the Correlation
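Now we're ready to compute the correlation value. The formula for the
correlation is:

$$ r = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[N\sum x^{2} - \left(\sum x\right)^{2}\right]\left[N\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$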
We use the symbol r to stand for the correlation. Through the magic of
mathematics it turns out that r will always be between -1.0 and +1.0. If
the correlation is negative, we have a negative relationship; if it's
positive, the relationship is positive. You don't need to know how we
came up with this formula unless you want to be a statistician. But you
probably will need to know how the formula relates to real data -- how
you can use the formula to compute the correlation. Let's look at the
data we need for the formula. Here's the original data with the other
necessary columns:
Person   Height (x)   Self Esteem (y)   x*y   x*x    y*y
16       63           4                 252   3969   16
The first three columns are the same as in the table above. The next
three columns are simple computations based on the height and self
esteem data. The bottom row consists of the sum of each column. This
is all the information we need to compute the correlation. Here are the
values from the bottom row of the table (where N is 20 people) as they
are related to the symbols in the formula:
Now, when we plug these values into the formula given above, we get
the following (I show it here tediously, one step at a time):
So, the correlation for our twenty cases is .73, which is a fairly strong
positive relationship. I guess there is a relationship between height and
self esteem, at least in this made up data!
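Because only one row of the height and self esteem table survives here, the sketch below illustrates the column-sum formula on small made-up data rather than on the original twenty cases. It accumulates the same totals as the table's extra columns (Σx, Σy, Σxy, Σx², Σy²) and combines them with N; the function name pearson_r is our own:

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed from the column sums N, Σx, Σy, Σxy, Σx², Σy²."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of cases")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))  # total of the x*y column
    sum_x2 = sum(a * a for a in x)             # total of the x*x column
    sum_y2 = sum(b * b for b in y)             # total of the y*y column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical toy data, not the study's twenty cases.
x_demo = [1, 2, 3]
y_demo = [1, 3, 2]
print(pearson_r(x_demo, y_demo))  # 0.5
```

On Python 3.10 or later, statistics.correlation gives the same result and is a convenient cross-check.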
Testing the Significance of a Correlation
Once you've computed a correlation, you can determine the probability
that the observed correlation occurred by chance. That is, you can
conduct a significance test. Most often you are interested in
determining the probability that the correlation is a real one and not a
chance occurrence. In this case, you are testing the mutually exclusive
hypotheses:
Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0
The easiest way to test this hypothesis is to find a statistics book that
has a table of critical values of r. Most introductory statistics texts
would have a table like this. As in all hypothesis testing, you need to
first determine the significance level. Here, I'll use the common
significance level of alpha = .05. This means that I am conducting a test
where the odds of getting a correlation this large by chance alone, when
there is no real relationship, are no more than 5 out of 100. Before I
look up the critical value in a table I also
have to compute the degrees of freedom or df. The df is simply equal to
N-2 or, in this example, is 20-2 = 18. Finally, I have to decide whether I
am doing a one-tailed or two-tailed test. In this example, since I have no
strong prior theory to suggest whether the relationship between height
and self esteem would be positive or negative, I'll opt for the two-tailed
test. With these three pieces of information -- the significance level
(alpha = .05), degrees of freedom (df = 18), and type of test (two-tailed)
-- I can now test the significance of the correlation I found. When
I look up this value in the handy little table at the back of my statistics
book I find that the critical value is .4438. This means that if my
correlation is greater than .4438 or less than -.4438 (remember, this is
a two-tailed test) I can conclude that the odds are less than 5 out of 100
that this is a chance occurrence. Since my correlation of .73 is actually
quite a bit higher, I conclude that it is not a chance finding and that the
correlation is "statistically significant" (given the parameters of the
test). I can reject the null hypothesis and accept the alternative.
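Critical values of r in such tables are derived from the t distribution with N − 2 degrees of freedom, via r = t / √(t² + df). A minimal sketch, assuming the two-tailed critical t of 2.101 for df = 18 at alpha = .05 as printed in standard t tables (the helper names critical_r and t_statistic are illustrative):

```python
import math

def critical_r(t_crit: float, df: int) -> float:
    """Critical correlation matching a critical t value: r = t / sqrt(t^2 + df)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def t_statistic(r: float, df: int) -> float:
    """t statistic for testing H0: r = 0 with df = N - 2 degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

n = 20
df = n - 2        # 18
t_crit = 2.101    # assumed table value: two-tailed, alpha = .05, df = 18
r_observed = 0.73

r_crit = critical_r(t_crit, df)
print(f"critical r = {r_crit:.4f}")                # critical r = 0.4438
print(f"t = {t_statistic(r_observed, df):.2f}")    # t = 4.53
print("significant" if abs(r_observed) > r_crit else "not significant")  # significant
```

Comparing |r| with the critical r and comparing t with the critical t are equivalent ways of running the same test.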
The Correlation Matrix
All I've shown you so far is how to compute a correlation between two
variables. In most studies we have considerably more than two
variables. Let's say we have a study with 10 interval-level variables and
we want to estimate the relationships among all of them (i.e., between
all possible pairs of variables). In this instance, we have 45 unique
correlations to estimate (more later on how I knew that!). We could do
the above computations 45 times to obtain the correlations. Or we
could use just about any statistics program to automatically compute all
45 with a simple click of the mouse.
I used a simple statistics program to generate random data for 10
variables with 20 cases (i.e., persons) for each variable. Then, I told the
program to compute the correlations among these variables. Here's the
result:
          C1      C2      C3      C4      C5      C6      C7      C8      C9     C10
C1     1.000
C2     0.274   1.000
C3    -0.134  -0.269   1.000
C4     0.201  -0.153   0.075   1.000
C5    -0.129  -0.166   0.278  -0.011   1.000
C6    -0.095   0.280  -0.348  -0.378  -0.009   1.000
C7     0.171  -0.122   0.288   0.086   0.193   0.002   1.000
C8     0.219   0.242  -0.380  -0.227  -0.551   0.324  -0.082   1.000
C9     0.518   0.238   0.002   0.082  -0.015   0.304   0.347  -0.013   1.000
C10    0.299   0.568   0.165  -0.122  -0.106  -0.169   0.243   0.014   0.352   1.000
This type of table is called a correlation matrix. It lists the variable
names (C1-C10) down the first column and across the first row. The
diagonal of a correlation matrix (i.e., the numbers that go from the
upper left corner to the lower right) always consists of ones. That's
because these are the correlations between each variable and itself
(and a variable is always perfectly correlated with itself). This statistical
program only shows the lower triangle of the correlation matrix. In
every correlation matrix there are two triangles that are the values
below and to the left of the diagonal (lower triangle) and above and to
the right of the diagonal (upper triangle). There is no reason to print
both triangles because the two triangles of a correlation matrix are
always mirror images of each other (the correlation of variable x with
variable y is always equal to the correlation of variable y with variable
x). When a matrix has this mirror-image quality above and below the
diagonal we refer to it as a symmetric matrix. A correlation matrix is
always a symmetric matrix.
To locate the correlation for any pair of variables, find the value in the
table for the row and column intersection for those two variables. For
instance, to find the correlation between variables C5 and C2, I look for
where row C2 and column C5 intersect (in this case the cell is blank
because it falls in the upper triangle) and where row C5 and column C2
intersect; in the second case, I find that the correlation is -.166.
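Since the two triangles mirror each other, a program only needs to store the lower triangle and swap row and column when asked for an upper-triangle cell. A small Python sketch using the matrix values above (the function name corr is hypothetical):

```python
# Lower triangle of the correlation matrix above: row i holds the values for columns C1..Ci.
LOWER = [
    [1.000],
    [0.274, 1.000],
    [-0.134, -0.269, 1.000],
    [0.201, -0.153, 0.075, 1.000],
    [-0.129, -0.166, 0.278, -0.011, 1.000],
    [-0.095, 0.280, -0.348, -0.378, -0.009, 1.000],
    [0.171, -0.122, 0.288, 0.086, 0.193, 0.002, 1.000],
    [0.219, 0.242, -0.380, -0.227, -0.551, 0.324, -0.082, 1.000],
    [0.518, 0.238, 0.002, 0.082, -0.015, 0.304, 0.347, -0.013, 1.000],
    [0.299, 0.568, 0.165, -0.122, -0.106, -0.169, 0.243, 0.014, 0.352, 1.000],
]

def corr(a: str, b: str) -> float:
    """Return the correlation between two variables named like "C5" and "C2".

    The matrix is symmetric, so the larger index is always used as the row and
    the smaller as the column, which keeps every lookup in the stored lower triangle.
    """
    i, j = int(a[1:]) - 1, int(b[1:]) - 1
    return LOWER[max(i, j)][min(i, j)]

print(corr("C5", "C2"))  # -0.166
print(corr("C2", "C5"))  # -0.166 (same cell, reached through the mirror image)
```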
OK, so how did I know that there are 45 unique correlations when we
have 10 variables? There's a handy simple little formula that tells how
many pairs (i.e., correlations) there are for any number of variables:

$$ \text{number of pairs} = \frac{N(N-1)}{2} $$

With N = 10 variables, this gives 10 × 9 / 2 = 45.
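The pair-count formula is easy to check against a brute-force listing of every pair. A short Python sketch (n_pairs is an illustrative name):

```python
from itertools import combinations

def n_pairs(n: int) -> int:
    """Number of unique pairs (correlations) among n variables: N(N - 1) / 2."""
    return n * (n - 1) // 2

variables = [f"C{i}" for i in range(1, 11)]  # C1 .. C10
print(n_pairs(len(variables)))                # 45
print(len(list(combinations(variables, 2))))  # 45, counted by listing every pair
```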
time you are doing your research Cornell has only 20,000 students, and
those helping you are so fast at interviewing that together you can
interview at least 10 students per person per day, in addition to your
18 credit hours of course work. You will require 100 research assistants
for 20 days, and since you are paying them the minimum wage of $5.00 per
hour for ten hours ($50.00) per person per day, you will require
$100,000.00 just to complete the interviews; the analysis would be
impossible on top of that. You may decide to hire additional assistants
to help with the analysis at another $100,000.00, and so on, assuming you
have that amount in your account.
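The cost figure in this example follows from simple arithmetic on the numbers given. A short Python sketch that reproduces it (the variable names are our own):

```python
students = 20_000
interviews_per_assistant_per_day = 10
assistants = 100
wage_per_hour = 5.00
hours_per_day = 10

# Days needed for 100 assistants, each interviewing 10 students a day, to cover everyone.
days = students / (assistants * interviews_per_assistant_per_day)
# Each assistant earns $5.00 x 10 hours = $50.00 per day.
interview_cost = assistants * days * hours_per_day * wage_per_hour
print(f"{days:.0f} days, ${interview_cost:,.2f}")  # 20 days, $100,000.00
```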
As unrealistic as this example is, it does illustrate the very high cost of
a census. For the type of information desired, a small, wisely selected
sample of Cornell students can serve the purpose. You don't even have
to hire a single assistant. You can complete the interviews and analysis
on your own. Rarely does a circumstance require a census of the
population, and even more rarely does one justify the expense.
The time factor
A sample may provide you with needed information quickly. For
example, suppose you are a doctor and a contagious disease has broken out
in a village within your area of jurisdiction. It kills within hours, and
nobody knows what it is. You are required to conduct quick
tests to help save the situation. If you try a census of those affected,
they may be long dead when you arrive with your results. In such a
case just a few of those already infected could be used to provide the
required information.
The very large populations
Many populations about which inferences must be made are quite
large. For example, consider the population of high school seniors in
the United States of America, a group numbering 4,000,000. The
responsible agency in the government has to plan for how they will be
absorbed into the different departments and even the private sector.
The employers would like to have specific knowledge about the
students’ plans in order to make compatible plans to absorb them
during the coming year. But the sheer size of the population makes it
physically impossible to conduct a census. In such a case, selecting a
representative sample may be the only way to get the information
required from high school seniors.
The partly accessible populations
Some populations are so difficult to access that only a sample can be
used: for example, people in prison, aeroplanes that have crashed in the
deep sea, or presidents. The inaccessibility may be economic or
time-related. A particular study population may be so costly to reach,
like the population of planets, that only a sample can be used. In other
cases, the events making up a population may occur so rarely that only
sample information can be relied on, for example, natural disasters like
a flood that occurs every 100 years, or the flood that occurred in Noah’s
days, which has never occurred again.
6. Geographical Area of the Study and the Size of the Population: If the
area covered by a survey is very large and the size of the population is
quite large, multi-stage cluster sampling would be appropriate. But if
the area and the size of the population are small, single stage
probability sampling methods could be used.
7. Financial resources: If the available finance is limited, it may become
necessary to choose a less costly sampling plan like multistage cluster
sampling or even quota sampling as a compromise. However, if the
objectives of the study and the desired level of precision cannot be
attained within the stipulated budget, there is no alternative but to
give up the proposed survey. Where the finance is not a constraint, a
researcher can choose the most appropriate method of sampling that
fits the research objective and the nature of population.
8. Time Limitation: The time limit within which the research project
should be completed restricts the choice of a sampling method. Then,
as a compromise, it may become necessary to choose less time-consuming
methods like simple random sampling instead of stratified
sampling/sampling with probability proportional to size, or multi-stage
cluster sampling instead of single-stage sampling of elements. Of
course, precision has to be sacrificed to some extent.
The above criteria frequently conflict, and the researcher must balance
and blend them to obtain a good sampling plan. The chosen
plan thus represents an adaptation of the sampling theory to the
available facilities and resources. That is, it represents a compromise
between idealism and feasibility. One should use simple workable
methods instead of unduly elaborate and complicated techniques.
Q 5. Select any topic for research and explain how you will use
both secondary and primary sources to gather the required
information.
Ans.
After you have decided to assess the problems and needs of the
audience and to develop their profiles, the next step is to collect
information. The programme planners and producers will use the
information only when they are sure about its quality.
You should, therefore, be concerned not only about the type and the
amount of information but also about its quality. Some
key criteria for quality information are given below:
Accuracy or Validity: It should show the true situation. For this, plan
in advance, be clear and specific with regard to information needed,
simplify your samples and research methods, use more than one
method/ source for the same data and develop guidelines for analysis
of the data.
Relevance: It should be relevant to the information users. It should
reflect a commitment to the cause of the community. To achieve this,
engage the target population in the process of information collection,
and try to know in advance who needs what information and how it will be used.
Significance: It should be important. Researchers often collect a
lot of information that is irrelevant, unnecessary and insignificant for
the purpose.
Credibility: The information should be collected in a scientific manner
to be believable. Researchers should be objective while gathering,
analyzing and interpreting information, be transparent about the
methods used to obtain information and draw conclusions.
Timeliness: Information should be available in time to make necessary
decisions. There is little use in providing information after programming
has already made significant headway. For this, you should plan in
advance, use simple tools for collection and analysis, create a schedule
with deadlines and stick to it.
better response than business reply envelopes, although they are more
expensive since you also pay for the non-respondents. One important
area of question wording is the effect of the interrogation and assertion
question formats. The interrogation format asks a question directly,
whereas the assertion format asks subjects to indicate their level of
agreement or disagreement with a statement.
The prenotification letter should address five items (Walonick, 1993):
1. Briefly describe why the study is being done.
2. Identify the sponsors.
3. Explain why the person receiving the pre-letter was chosen.
Personal Delivery
The researcher or his assistant may deliver the questionnaires to the
potential respondents with a request to complete them at their
convenience. After a day or two he can collect the completed
questionnaires. This method combines the advantages of the personal
interview and the mail survey. Alternatively, the questionnaires may be delivered in
person and the completed questionnaires may be returned by mail by
the respondent.
News-stand Inserts
This method involves inserting the covering letter, questionnaire and
self-addressed reply-paid envelope into a random sample of news-stand
copies of a newspaper or magazine.
Advantages of Questionnaires
Disadvantages of Questionnaires
For computation purposes, the formula can be used in the form shown
below which allows the variance to be derived without first calculating
the mean:
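The idea behind such a computational form can be sketched in Python: accumulate Σx and Σx² in a single pass and never compute the mean first. This minimal version assumes the sample variance with an n − 1 divisor (a population variance would divide by n instead); the function name and data are hypothetical, and the result is cross-checked against the standard library's statistics.variance:

```python
import statistics
from typing import Sequence

def variance_one_pass(data: Sequence[float]) -> float:
    """Sample variance from running sums: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

sample = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data
print(variance_one_pass(sample))       # 4.571428571428571 (= 32/7)
print(statistics.variance(sample))     # same value, from the standard library
```

The shortcut is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large relative to their spread, which is why many software libraries prefer a two-pass or incremental method.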
Standard Deviation
Standard deviation is the square root of the variance:
The downside is that the use of absolute values makes the analytical
treatment of functions difficult, but this is a small price to pay for such
an acronym.
In situations where the median is a more stable measure of central
tendency, it is used in place of the mean.
The example below compares the standard deviation and the MAD for a
small sample which contains an anomalous extreme value. The
measures of central tendency for the sample are:
Mean      1.7
Median    1.5

Median Absolute Deviation    0.36
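The original sample is not reproduced here, so the sketch below uses a small hypothetical data set to show the same effect: one extreme value inflates the standard deviation, while the median absolute deviation does not move. No scaling constant is applied to the MAD (some texts multiply it by about 1.4826 to make it comparable with the standard deviation for normal data); the function name is our own:

```python
import statistics
from typing import Sequence

def median_absolute_deviation(data: Sequence[float]) -> float:
    """Median of the absolute deviations from the median (unscaled MAD)."""
    m = statistics.median(data)
    return statistics.median([abs(x - m) for x in data])

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # hypothetical anomalous extreme value

for name, sample in (("clean", clean), ("with outlier", with_outlier)):
    sd = statistics.stdev(sample)
    mad = median_absolute_deviation(sample)
    print(f"{name}: SD = {sd:.2f}, MAD = {mad}")
# clean: SD = 1.58, MAD = 1
# with outlier: SD = 43.62, MAD = 1
```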
about the spread of scores in a data set. Like central tendency, they
help you summarize a bunch of numbers with one or just a few
numbers.
Interview method
Interviewing is one of the prominent methods of data collection. It may
be defined as a two-way systematic conversation between an
investigator and an informant, initiated for obtaining information
relevant to a specific study. It involves not only conversation, but also
learning from the respondent’s gestures, facial expressions and pauses,
and his environment. Interviewing requires face-to-face contact or
contact over telephone and calls for interviewing skills. It is done by
using a structured schedule or an unstructured guide.
First, the greatest value of this method is the depth and detail of
information that can be secured. When used with well-conceived
schedules, an interview can obtain a great deal of information. It far
exceeds the mail survey in the amount and quality of data that can be secured.
Second, the interviewer can do more to improve the percentage of
responses and the quality of information received than is possible with
any other method. He can note the conditions of the interview situation
and adopt appropriate approaches to overcome such problems as the
respondent’s unwillingness, incorrect understanding of questions,
suspicion, etc.
Fourth, the interviewer can use special scoring devices, visual materials
and the like in order to improve the quality of interviewing.
• Models, if any
• Design of the study
• Methodology
• Method of data collection
• Sources of data
• Sampling plan
• Data collection instruments
• Field work
• Data processing and analysis plan
• Overview of the report
• Limitations of the study
• Results: findings and discussions
• Summary, conclusions and recommendations.