WWW.IGNOUASSIGNMENTS.IN
ASSIGNMENT SOLUTIONS GUIDE (2017-2018)
M.P.C.-6
Statistics in Psychology
Disclaimer/Special Note: These are just samples of the answers/solutions to some of the questions given in the assignments. These sample answers/solutions are prepared by private teachers/tutors/authors for the help and guidance of the student, to give an idea of how he/she can answer the questions given in the assignments. We do not claim 100% accuracy of these sample answers, as they are based on the knowledge and capability of the private teacher/tutor. Sample answers may be seen as a guide/help for reference while preparing the answers to the questions given in the assignment. As these solutions and answers are prepared by a private teacher/tutor, the chance of error or mistake cannot be denied. Any omission or error is highly regretted, though every care has been taken while preparing these sample answers/solutions. Please consult your own teacher/tutor before you prepare a particular answer, and for up-to-date and exact information, data and solutions, students must read and refer to the official study material provided by the university.
Note: All questions are compulsory.

Q. 1. Describe the major statistical techniques for organising the data with suitable examples and diagrams.
Ans. Organisation of Data: Techniques for organising data are:
1. Classification
2. Tabulation
3. Graphical Presentation
4. Diagrammatical Presentation

1. Classification
Classification means grouping similar data together. It is a summary of the individual scores, or of their ranges, for a variable. Once the data is collected, we must arrange it in a format so that conclusions can be drawn; hence classification is the next step after collection of data. In fact, classification gives a clearer picture of the data collected and classified, telling how frequently each value occurs.

A frequency distribution can be prepared with ungrouped data or grouped data.
1. Ungrouped Data Distribution: We list the scores from lowest to highest (or highest to lowest), placing a tally mark beside each score every time it occurs. The frequency is denoted by f.
2. Grouped Data Distribution: When the data set is large, with a wide range of scores, a grouped frequency distribution is constructed instead of listing individual scores, to give a clearer picture of the data. It organises the data into classes, which show the number of observations from the data set that fall into each class.

Construction of Frequency Distribution: We determine:
1. The range of the given data = the difference between the highest and lowest scores.
2. The number of intervals: a number of class intervals is decided upon, say anything between 5 and 30, and the data is divided among them.
3. The limits of each class interval: another factor determining the number of classes is the width/size of the class, which we call the class interval i.
Class intervals should be of uniform width, giving classes of the same size throughout the frequency distribution. The width should be divisible by 2, 3, 5, 10 or 20.
There are three methods to describe the class limits:
(i) Exclusive Method: Here, the upper limit of a class becomes the lower limit of the next class. The upper limit of each class is exclusive; e.g., the class 20-30 contains 20 and more than 20, but everything less than 30 and not 30 or more, and so on.
(ii) Inclusive Method: Here, the upper limit of a class is not the lower limit of the next class; the next lower limit is 1 (or 0.5) more than the previous upper limit, e.g., 20-29, 30-39, 40-49, etc.

(iii) True or Actual Class Method: Inclusive classes, say 20-29, 30-39, 40-49, etc., can be made continuous (as 29 to 30, 39 to 40, etc., are missing) by taking them as 19.5-29.5, 29.5-39.5, 39.5-49.5, etc.
Types of Frequency Distribution
Three of the several ways of arranging the frequencies of a data array, based on the requirement of analysis, are:
1. Relative Frequency Distribution: It indicates the proportion of the total number of cases observed at each
score value or interval of score values.
2. Cumulative Frequency Distribution: The number of observations less than (or more than) a particular value can be found by this method.
3. Cumulative Relative Frequency Distribution: It is score’s cumulative frequency as a proportion of the total
number of cases.
2. Tabulation
Tabulation is the process of presenting the classified data in the form of a table; a frequency distribution may be shown either as a table or as a graph. A table puts the data in a form fit for further analysis and makes it intelligible. The main components of a table are:
1. Table Number: When there is more than one table, mark each with a number for reference and identification; write this number at the top centre of the table.

2. Title of the Table: Every table should be given an appropriate title to describe its contents. The title should be clear, brief and self-explanatory. It is placed just after the table number, or below it.

3. Caption: Captions are brief, self-explanatory headings of columns. There may be sub-headings in addition. Captions are placed at the top middle of the columns.
4. Stub: Stubs are brief, self-explanatory headings for rows.

5. Body of the Table: Containing the numerical information or data in its cells, this is the real table. We enter the data according to the captions and stubs.
6. Head Note: Written at the extreme right below the title, it explains the units of measurement used in the table.

7. Footnote: A qualifying note written below the table, explaining certain points of the data not covered in the title, captions, stubs or head note.
8. Source of Data: The source from which the data is taken is written at the end of the table.
A skeleton table looks like this:

                         TITLE
                       (Head Note)
Stub Head   | Caption: Column Head I | Caption: Column Head II
            | Sub-head  | Sub-head   | Sub-head  | Sub-head
Stub        |
entries     |          MAIN BODY OF THE TABLE
Total       |
Footnote(s)
Source:
3. Graphical Presentation of Data
The purpose of preparing a frequency distribution is 'to look at' and understand the data in a systematic way. To understand it even better, graphic and/or diagrammatic forms are used. In a graph, frequencies are plotted on a pictorial platform formed of horizontal and vertical lines.
A graph has x- and y-axes on appropriate scales. The horizontal x-axis is called the abscissa, and the vertical y-axis the ordinate. The commonly used graphs (histogram, frequency polygon, frequency curve and cumulative frequency curve) are discussed below:

1. Histogram: It is a series of rectangles in a graph, most popularly used to present continuous distribution with
equal width on x-axis and corresponding frequency on y-axis, as height.
2. Frequency Polygon: The abscissa, originating at O and ending at x, and the ordinate, originating at O and ending at y, are prepared. The points corresponding to the given data are plotted on the graph and joined by straight lines to give the frequency polygon.
3. Frequency Curve: A smooth free-hand curve drawn through the frequency polygon is the frequency curve. The objective is to eliminate, as far as possible, the random and erratic fluctuations present in the data.
Cumulative Frequency Curve or Ogive: There are two types: the less-than and the more-than cumulative frequency distribution curves, or ogives.
1. 'Less than' Ogive: Less-than cumulative frequencies are plotted against the upper class boundaries of the respective classes. The curve slopes upward from left to right.
2. 'More than' Ogive: More-than cumulative frequencies are plotted against the lower class boundaries of the respective classes; a decreasing curve sloping downward from left to right is obtained.
4. Diagrammatic Presentation of Data
Visual forms like the Bar Diagram, Sub-divided Bar Diagram, Multiple Bar Diagram, Pie Diagram and Pictogram are simple graphic presentations.

1. Bar Diagram: It is most useful for categorical data. A bar is a thick line drawn from the frequency distribution table, representing the variable on the horizontal axis and the frequency on the vertical axis, so that the height of each bar corresponds to the frequency or value of the variable.
2. Sub-divided Bar Diagram: Sub-classifications of a phenomenon can be studied with a sub-divided bar diagram, using several shades. The portion given to each sub-class is the corresponding portion of the bar.
3. Multiple Bar Diagram: Two or more inter-related phenomena or variables are shown by it, with the bars of a set drawn side by side without any gap. Different colours or shades are used for the bars in a set.
4. Pie Diagram: An angular diagram taking 100% = 360º, with angular sectors corresponding to each category of data proportionately constructed and shaded.
Q. 2. Using ANOVA find out if significant difference exists between the scores obtained by the three groups of employees on Work Motivation.

Group A 12 12 14 43 34 45 54 34 45 65
Group B 67 45 32 32 23 23 23 12 43 56
Group C 43 45 34 33 21 11 16 27 26 32
Ans.

Group A            Group B            Group C
x1      x1²        x2      x2²        x3      x3²
12      144        67      4489       43      1849
12      144        45      2025       45      2025
14      196        32      1024       34      1156
43      1849       32      1024       33      1089
34      1156       23      529        21      441
45      2025       23      529        11      121
54      2916       12      144        16      256
34      1156       43      1849       27      729
45      2025       45      2025       26      676
67      4489       56      3136       32      1024
360     16,100     378     16,774     288     9,366

Correction Term Cx = (Σx)²/N = (360 + 378 + 288)²/30 = (1026)²/30 = 35089.2
SST = Σx² – Cx = (16,100 + 16,774 + 9,366) – 35089.2
SST = 7150.8
SSA = Σ[(Σx)²/n] – Cx
= (360)²/10 + (378)²/10 + (288)²/10 – 35089.2
= (12,960 + 14,288.4 + 8,294.4) – 35089.2 = 35,542.8 – 35089.2
SSA = 453.6
SSW = SST – SSA = 7150.8 – 453.6 = 6697.2
df(between) = k – 1 = 2; df(within) = N – k = 27
MSA = 453.6/2 = 226.8; MSW = 6697.2/27 = 248.04
F = MSA/MSW = 226.8/248.04 = 0.91
The critical value of F(2, 27) at the 0.05 level is 3.35. Since 0.91 < 3.35, the obtained F is not significant: there is no significant difference between the three groups on Work Motivation.
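The sums of squares and the F ratio can be checked with a short script, using the scores exactly as they appear in the worked table above:

```python
# Sketch: one-way ANOVA on the three Work Motivation groups, with only the
# standard library, using the values from the worked table.
groups = {
    "A": [12, 12, 14, 43, 34, 45, 54, 34, 45, 67],
    "B": [67, 45, 32, 32, 23, 23, 12, 43, 45, 56],
    "C": [43, 45, 34, 33, 21, 11, 16, 27, 26, 32],
}

all_scores = [s for g in groups.values() for s in g]
N = len(all_scores)
correction = sum(all_scores) ** 2 / N                        # (Σx)² / N

ss_total = sum(s * s for s in all_scores) - correction
ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
ss_within = ss_total - ss_between

df_between = len(groups) - 1                                 # k - 1 = 2
df_within = N - len(groups)                                  # N - k = 27
F = (ss_between / df_between) / (ss_within / df_within)

print(round(ss_total, 1), round(ss_between, 1), round(F, 2))
```

Since F is far below the 0.05 critical value of F(2, 27), the group difference is not significant.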

Q. 3. Define non-parametric statistics. Discuss assumptions, advantages and disadvantages of non-parametric statistics.
Ans. Non-parametric statistics cover techniques which do not rely on the data belonging to any particular distribution. They have:
1. Distribution-Free Methods: They do not assume a normally distributed population, and consist of statistical models, inference and statistical tests.
2. Non-parametric Statistics: The ranks of observations are used.
3. No Assumption of the Structure of a Model: The techniques do not assume that the structure of a model is fixed. The individual variables are typically assumed to come from parametric distributions, and assumptions about the types of connections among variables are also made. These techniques also include:
(a) Non-parametric Regression: the structure of the relationship is treated non-parametrically.
(b) Non-parametric Hierarchical Bayesian Models: based on the Dirichlet process, which permits the number of latent variables to grow as necessary to fit the data.
Assumptions of Non-parametric Statistics
Non-parametric statistics are based on the ranks of observations and do not depend upon any particular distribution of the population. They are applied in situations where we cannot meet the assumptions and conditions of parametric tests. Samples with a small number of items are treated with non-parametric statistics. They are based on a model that specifies only very general conditions. In the case of counted or ranked data, we make use of non-parametric statistics. Non-parametric statistics are also more user-friendly than parametric statistics.
ADVANTAGES OF NON-PARAMETRIC STATISTICS
When the sample size is small, a non-parametric test is often the only usable test, and when few assumptions can be met it is the more appropriate choice for a research investigation. Non-parametric tests are easier to learn and apply, and their interpretation is often more direct.
DISADVANTAGES OF NON-PARAMETRIC STATISTICAL TESTS
If all the assumptions of a parametric test are met, then using a non-parametric test is wasteful; this wastefulness can be expressed as the percentage power efficiency of the test. They also come in widely scattered, different formats.
SECTION-B
Answer the following questions in about 400 words (Wherever applicable) each.
Q. 1. Discuss type–I and type–II error and describe steps in setting up the level of significance.
Ans. Type–I Error: If we reject a true null hypothesis, we make a type I error; that is to say, a type I error happens if we conclude that the research hypothesis is supported by the study when in reality it is false.

Suppose a lenient probability level, say 20%, is the significance level cut-off. Then 20% of the time we shall be making a type I error.
At the 0.05 and 0.01 levels we risk 5% and 1% type I error respectively.
So we must know that the significance level we have set is the probability of a type I error, denoted by α.
Using a very low significance level, like 0.001, gives near surety of eliminating type I error. Often p < .0001 is set by researchers who do not want to take risks.
Type II Error
But in setting p < 0.0001 we run a different risk: the risk that the research hypothesis is true but the results are not extreme enough to reject the null hypothesis. A false null hypothesis not being rejected is the type II error, denoted by β; it makes the study results look inconclusive.
Steps in Setting Up the Level of Significance
1. Give proper statements of the null and alternative hypotheses.
2. Set the criteria for a decision.
3. Decide the α value, i.e., how unlikely a sample outcome around µ must be if the null hypothesis is true.
4. Define the critical region, composed of extreme sample values that are very unlikely if the null hypothesis is true. α decides the boundaries of the critical region. If the sample falls in the critical region, the null hypothesis is rejected.
5. We now obtain the calculated sample statistic using:

z = (x̄ – µ)/σx̄

where x̄ = sample mean,
µ = hypothesised population mean, and
σx̄ = standard error of the mean.
6. Make a decision and write down the decision rule. Use

z = Obtained Difference / Difference due to chance
The z-score so obtained is the test statistic; it is used to determine whether the result of a research study (i.e., the obtained difference) is more than what would be expected by chance alone. Suppose a manufacturer produces good-quality articles, from which a purchaser picks up a sample randomly, finds many defective articles and hence rejects the whole product lot. Here the manufacturer suffers a loss even though he produced good articles. This type I error is called the “producer's risk”.
Otherwise, if the consumer accepts the entire lot on the basis of a sample which is not really that good, the consumer suffers a loss, called the “consumer's risk”.
For practical purposes, other aspects are also considered while accepting or rejecting. The risks of both producer and consumer are compared, and then the type I and type II errors are fixed to reach a decision.
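The z computation in step 5 can be illustrated numerically. The values here (sample mean 105, hypothesised mean 100, σ = 15, n = 36) are hypothetical assumptions chosen only to show the arithmetic:

```python
# Sketch: the z statistic from step 5 on hypothetical numbers.
# x̄ = 105, µ = 100, σ = 15 and n = 36 are assumptions, not from the guide.
import math

x_bar, mu = 105, 100
sigma, n = 15, 36

se = sigma / math.sqrt(n)     # standard error of the mean, σ/√n
z = (x_bar - mu) / se
print(se, z)
```

Here z = 2.0 exceeds the two-tailed 0.05 critical value of 1.96, so H0 would be rejected at that level.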

Formulating Hypothesis and Stating Conclusions
1. We state the alternative hypothesis H1.
2. The opposite of H1, i.e., the null hypothesis H0, shall carry the equality sign.
3. If the sample evidence supports the alternative hypothesis, the null hypothesis is rejected, and the probability of making a wrong decision is then α, which can be manipulated to be as small as we want.
4. If the sample provides no sufficient evidence to support the alternative hypothesis, the conclusion is that the null hypothesis cannot be rejected on the basis of the sample. Then we must collect more information about the phenomenon under study.
Example: A defendant in a courtroom on trial for committing a crime is likely to use the logic of hypothesis
testing as follows:

1. Formulate appropriate null and alternate hypothesis for judging the guilt or innocence of the defendant.
2. Interpret type-I and type-II errors in this context.
3. If you are a defendant, would you like α to be small or large? Explain.
Ans.
1. A defendant is “innocent until proved guilty” under a judicial system. Hence, the court has to collect sufficient evidence to claim the defendant is guilty. So
H0 : Defendant is innocent
H1 : Defendant is guilty
2. The table below shows the four possible outcomes and errors.

                           Decision of Court
                      Defendant        Defendant
                      is Innocent      is Guilty
True      Defendant   Correct          Type-I
State of  is innocent decision         error
Nature    Defendant   Type-II          Correct
          is guilty   error            decision

3. The type-I error (convicting an innocent defendant) is the most serious, so α should be made small. However, which error is more serious may be debated: setting free a guilty defendant can cause a lot of harm to society, or injustice to the one who was wronged. In business, a type II error may be a source of lost opportunity, which may be serious.
Types of Errors for a Hypothesis Test
We want to decide whether to reject the null hypothesis H0 in favour of the alternative hypothesis H1. With a sampling-based hypothesis test it may not be easy to reach a correct decision. The four possible cases of decisions in hypothesis testing are as follows:
Table: Conclusions and Consequences for Testing a Hypothesis

                           Decision of Court
                      Defendant        Defendant
                      is Innocent      is Guilty
True      Defendant   Correct          Type-I
State of  is innocent decision         error
Nature    Defendant   Type-II          Correct
          is guilty   error            decision

We risk a type-I error only if the null hypothesis H0 is rejected, and a type-II error if we fail to reject H0. As α increases, β decreases; the two can simultaneously be made smaller only by increasing the sample size.
We avoid saying “accept H0” and instead say “fail to reject H0” or “not to reject H0”.
Also, β is difficult to calculate.
Q. 2. Compute Spearman’s rho for the following two sets of scores.

Data 1 12 11 14 16 21 23 43 34 12 14
Data 2 67 68 70 87 76 98 78 87 78 78

Ans. Ranks are assigned within each data set; tied scores share the mean of the ranks they would occupy:

Data 1 (x)  Data 2 (y)  Rx     Ry     D = Rx – Ry   D²
12          67          2.5    1      1.5           2.25
11          68          1      2      –1            1
14          70          4.5    3      1.5           2.25
16          87          6      8.5    –2.5          6.25
21          76          7      4      3             9
23          98          8      10     –2            4
43          78          10     6      4             16
34          87          9      8.5    0.5           0.25
12          78          2.5    6      –3.5          12.25
14          78          4.5    6      –1.5          2.25

N = 10, ΣD² = 55.5
ρ = 1 – (6ΣD²)/(n(n² – 1)) = 1 – (6 × 55.5)/(10 × 99) = 1 – 0.336
ρ = 0.664 (approximately, since the simple rank-difference formula ignores the correction for ties)
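The ranking and the rho computation can be checked with a short script, using only the standard library:

```python
# Sketch: Spearman's rho with tied ranks for the two data sets.
# Tied scores share the mean of the 1-based ranks they occupy.
def ranks(values):
    positions = {}
    for i, v in enumerate(sorted(values)):
        positions.setdefault(v, []).append(i + 1)
    return [sum(positions[v]) / len(positions[v]) for v in values]

x = [12, 11, 14, 16, 21, 23, 43, 34, 12, 14]
y = [67, 68, 70, 87, 76, 98, 78, 87, 78, 78]

rx, ry = ranks(x), ranks(y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(x)
rho = 1 - (6 * d2) / (n * (n ** 2 - 1))
print(round(d2, 2), round(rho, 3))
```

Note the simple D² formula is only approximate in the presence of ties; a tie-corrected Pearson correlation of the ranks would differ slightly.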

Q. 3. With the help of Mann Whitney ‘U’ test find if significant difference exists between the scores obtained on attitude towards health of individuals belonging to low Socio Economic Status and high Socio Economic Status.

Low Socio Economic Status 32 34 32 34 45 45 34 65 45 34 45 34
High Socio Economic Status 67 78 76 87 76 87 68 67 78 45 56 67

Ans.

All 24 scores are ranked together, tied scores sharing the mean of the ranks they occupy:

Data 1    R1        Data 2    R2
32        1.5       67        16
34        5         78        21.5
32        1.5       76        19.5
34        5         87        23.5
45        10        76        19.5
45        10        87        23.5
34        5         68        18
65        14        67        16
45        10        78        21.5
34        5         45        10
45        10        56        13
34        5         67        16
          ΣR1 = 82            ΣR2 = 218

U1 = N1N2 + N1(N1 + 1)/2 – ΣR1 = (12 × 12) + (12 × 13)/2 – 82 = 144 + 78 – 82 = 140
U2 = N1N2 + N2(N2 + 1)/2 – ΣR2 = 144 + 78 – 218 = 4
U = min(U1, U2) = 4
For N1 = N2 = 12, the critical value of U at the 0.05 level (two-tailed) is 37. Since 4 < 37, there is a significant difference between the attitude scores of the low and high socio-economic status groups.
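The combined ranking and the U computation can be checked with a short script:

```python
# Sketch: Mann-Whitney U for the two SES groups, ranking all 24 scores
# together (tied scores share the mean rank). Standard library only.
def ranks(values):
    positions = {}
    for i, v in enumerate(sorted(values)):
        positions.setdefault(v, []).append(i + 1)
    return [sum(positions[v]) / len(positions[v]) for v in values]

low = [32, 34, 32, 34, 45, 45, 34, 65, 45, 34, 45, 34]
high = [67, 78, 76, 87, 76, 87, 68, 67, 78, 45, 56, 67]

combined = ranks(low + high)
r1 = sum(combined[: len(low)])      # ΣR1 for the low-SES group
r2 = sum(combined[len(low):])       # ΣR2 for the high-SES group

n1, n2 = len(low), len(high)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
print(r1, r2, min(u1, u2))
```

As a check, ΣR1 + ΣR2 must equal 24 × 25 / 2 = 300.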
Q. 4. Describe partial correlation with suitable example.
Ans. Partial Correlation (rP): Let A and B be closely related. If a third variable (or more) influences one or both of them, we compute a partial correlation. If a third variable C is influencing A and B, the partial correlation can be considered as a correlation between two sets of residuals, written as rAB.C.
Formula and Example
Let us consider the case of anxiety (A) and academic achievement (B), controlled for intelligence (C), so that the correlation coefficient is:

rP = rAB.C = (rAB – rAC rBC) / √[(1 – rAC²)(1 – rBC²)]
Now consider the following data for 10 students.
Table: Data of Academic Achievement, Anxiety and Intelligence for 10 Subjects

Subject   Academic Achievement   Anxiety   Intelligence
1         15                     6         25
2         18                     3         29
3         13                     8         27
4         14                     6         24
5         19                     2         30
6         11                     3         21
7         17                     4         26
8         20                     4         31
9         10                     5         20
10        16                     7         25

We first compute the Pearson's product moment correlations between the three variables.
The correlation between anxiety (A) and academic achievement (B): rAB = –0.369
The correlation between intelligence (C) and academic achievement (B): rBC = 0.918
The correlation between anxiety (A) and intelligence (C): rAC = –0.245
Then, putting these values:

rAB.C = (rAB – rAC rBC) / √[(1 – rAC²)(1 – rBC²)]
= (–0.369 – (–0.245 × 0.918)) / √[(1 – (–0.245)²)(1 – (0.918)²)]
rAB.C = –0.375
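The computation above, and the t test that follows, can be checked with a short script:

```python
# Sketch: first-order partial correlation r_AB.C from the three zero-order
# correlations in the worked example (A = anxiety, B = achievement,
# C = intelligence), plus the t statistic for its significance.
import math

def partial_r(r_ab, r_ac, r_bc):
    """Correlation of A and B with C partialled out."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

r_ab = -0.369   # anxiety (A) vs. achievement (B)
r_bc = 0.918    # achievement (B) vs. intelligence (C)
r_ac = -0.245   # anxiety (A) vs. intelligence (C)

rp = partial_r(r_ab, r_ac, r_bc)
n, v = 10, 3                    # sample size, number of variables
t = rp * math.sqrt(n - v) / math.sqrt(1 - rp**2)
print(round(rp, 3), round(t, 2))
```

With df = n – v = 7, |t| is compared against the tabled critical value 2.36 at the 0.05 level.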

Fig. 1. Venn diagram explaining the partial correlation: the variance in Academic Achievement explained by Intelligence, alongside the variance shared by Academic Achievement and Anxiety that is not influenced by Intelligence.

Significance testing of partial correlation
Let us write:
H0 : ρP = 0
H1 : ρP ≠ 0
Using the t-distribution,

t = rP √(n – v) / √(1 – rP²)

where rP = rAB.C, the partial correlation computed on the sample,
n = sample size,
v = total number of variables employed in the analysis,
df = n – v.
Then

t = –0.375 × √(10 – 3) / √(1 – (–0.375)²) = –1.07

At df = 7, the critical value at the 0.05 significance level from the table is 2.36 > 1.07 (ignoring sign).
So we fail to reject H0: the partial correlation is not significant.
Q. 5. Compute Chi-square for the following data:

Age in Years       Emotional Intelligence Scores
                   High    Average    Low
26 to 30 years     45      56         65
31 to 35 years     57      65         75
36 to 40 years     23      29         10

Ans.
Age       High    Average    Low    Total
26-30     45      56         65     166
31-35     57      65         75     197
36-40     23      29         10     62
Total     125     150        150    425

Each expected frequency is Fe = (row total × column total)/N. For the first cell,
Fe = (166 × 125)/425 = 48.82
and its contribution is
(Fo – Fe)²/Fe = (45 – 48.82)²/48.82 = 0.30
Computing (Fo – Fe)²/Fe for all nine cells and summing,
χ² = Σ(Fo – Fe)²/Fe ≈ 11.87
df = (3 – 1)(3 – 1) = 4. The critical value at the 0.05 level for df = 4 is 9.49; since 11.87 > 9.49, the association between age and emotional intelligence scores is significant at the 0.05 level.
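The full chi-square computation over all nine cells can be checked with a short script:

```python
# Sketch: chi-square test of independence for the age × emotional-intelligence
# table; each expected frequency is (row total × column total) / N.
observed = [
    [45, 56, 65],   # 26-30 years
    [57, 65, 75],   # 31-35 years
    [23, 29, 10],   # 36-40 years
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n    # expected frequency
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 2), df)
```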
SECTION-C
Answer the following in about 50 words each.

Q. 1. Hypothesis testing.
Ans. Hypothesis Testing: Hypothesis testing is one of the most useful tools of inferential statistics. If a certain value is hypothesised to characterise a population of observations, the question remains whether that hypothesis is reasonable in the light of the sample. No particular value need be stated; rather, “what is the population value?” is the pertinent question. Hypothesis testing is also referred to as the statistical decision-making process.
Statement of Hypothesis
A statistical hypothesis is a statement, which may or may not be true, about a population parameter or a parameter of the probability distribution that we wish to validate.
We usually perform experiments with random samples rather than the entire population; the outcomes/inferences obtained are then generalised to the entire population. The observed results, however, could be due to a chance factor that needs to be ruled out.
This chance explanation is stated as the null hypothesis, denoted by H0. The null hypothesis is a statement of no difference. We can also state the null hypothesis as the claim that the two samples came from the same population, assuming that both samples have equal means and standard deviations and that the population is normally distributed.

As the null hypothesis is a testable proposition, there is a counter-proposition, the alternative hypothesis H1, which proposes:
1. The two samples belong to different populations.
2. Their means are estimates of two different parametric means of the respective populations.
3. There is a significant difference between their sample means.
H1 is not directly tested; rather, it is accepted or rejected according to the fate of H0.
Let p0 = the probability of the observed result arising if H0 is correct. If p0 is quite low, H0 is rejected and H1 is accepted; for a high value of p0, H0 is retained and H1 is rejected.

Level of Significance
If the probability of the observed results (or more extreme ones) arising by chance when the null hypothesis is correct is very low (usually p < 0.05), the results of the experiment are considered significant. p < 0.05 is the usual level of significance, which is sometimes taken even as p < 0.01.
If p exceeds the chosen level (i.e., p > 0.05), then H0 cannot be rejected and the results are not considered significant.
Whether to choose p < 0.05 or p < 0.01 depends upon the researcher. Rejecting the null hypothesis at 0.05 means that in 5 or fewer cases out of 100 the observed results could arise by chance.
One-Tail and Two-Tail Tests
Depending upon the statement of H1, a one-tail or two-tail test is chosen for judging statistical significance. A one-tail test is directional, specifying both the magnitude and the direction of the expected difference between the two statistics. In a two-tail test, the researcher is interested in whether one sample mean is significantly higher or lower than the other sample mean, without specifying the direction in advance.
Q. 2. Measures of Central Tendency
Ans. In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probabil-
ity distribution. It may also be called a center or location of the distribution. Colloquially, measures of central
tendency are often called averages. The most common measures of central tendency are the arithmetic mean, the

median and the mode. A central tendency can be calculated for either a finite set of values or for a theoretical
distribution, such as the normal distribution. Occasionally authors use central tendency to denote "the tendency of
quantitative data to cluster around some central value".
The central tendency of a distribution is typically contrasted with its dispersion or variability; dispersion and central tendency are the most often characterized properties of distributions. Analysts may judge whether data has a strong or a weak central tendency based on its dispersion.
Q. 3. Scatter diagram.
Ans. The scatter diagram is known by many names, such as scatter plot, scatter graph and correlation chart. This
diagram is drawn with two variables, usually the first variable is independent and the second variable is dependent
on the first variable.

[Figure: a scatter diagram, with the independent variable on the x-axis and the dependent variable on the y-axis.]
The scatter diagram is used to find the correlation between these two variables. This diagram helps you determine how closely the two variables are related. After determining the correlation between the variables, you can then predict the behavior of the dependent variable based on the measure of the independent variable. This chart is very useful when one variable is easy to measure and the other is not.
Q. 4. Tetrachoric Correlation.
Ans. Tetrachoric correlation is used to measure rater agreement for binary data; binary data is data with two possible answers, usually right or wrong. The tetrachoric correlation estimates what the correlation would be if it were measured on a continuous scale. It is used for a variety of reasons, including the analysis of scores in Item Response Theory (IRT) and converting comorbidity statistics to correlation coefficients. This type of correlation has the advantage that it is not affected by the number of rating levels or the marginal proportions for rating levels.
Q. 5. Multiple Correlation.
Ans. Multiple Correlation Coefficient (R)
The correlation of one variable with multiple other variables (e.g., B, C, D, ..., k), such as A related to B, C, D, ..., k, is the multiple correlation coefficient. For A with B and C it is written as RA.BC.
R can be obtained by:

RA.BC = √[(rAB² + rAC² – 2 rAB rAC rBC) / (1 – rBC²)]

where RA.BC = multiple correlation between A and the linear combination of B and C,
rAB = correlation between A and B,
rAC = correlation between A and C,
rBC = correlation between B and C.
Example: Let us take the following data table:

Subject   Academic Achievement   Anxiety   Intelligence
1         15                     6         25
2         18                     3         29
3         13                     8         27
4         14                     6         24
5         19                     2         30
6         11                     3         21
7         17                     4         26
8         20                     4         31
9         10                     5         20
10        16                     7         25

Correlation between A and B: rAB = –0.369
Correlation between A and C: rAC = 0.918
Correlation between B and C: rBC = –0.245
where A = Academic Achievement, B = Anxiety, C = Intelligence.
Then

RA.BC = √[(rAB² + rAC² – 2 rAB rAC rBC) / (1 – rBC²)]
= √[((–0.369)² + (0.918)² – 2 × (–0.369) × 0.918 × (–0.245)) / (1 – (–0.245)²)]
= √(0.813/0.94) = 0.929 ≈ 0.93
We also know that the square of the correlation coefficient = % of variance explained.

∴ R² = % of variance of academic achievement explained by the linear combination of intelligence and anxiety, i.e., R² = 0.929² = 0.865 = 86.5%.
So the linear combination of intelligence and anxiety explains 86.5% of the variance in academic achievement. The population value that R² estimates is denoted by ρ²; R² is an estimator of ρ², but it is not an unbiased estimator.
Let R 2 denote an adjusted R2 then,

b d
U

(1 − R 2 ) ( n − 1)
R 2 = 1 −
n
we n − k −1
u a
O

Where
H
w Th
N

R = Adjusted value of R
2

k = Predicted variables number


IG

w∴
n = Sample size

R 2 = 1 −
(1 − R 2 ) ( n − 1)
n − k −1

(1 − 0.865) (10 − 1)
= 1− = 0.826
10 − 2 − 1
which is smaller adjusted value.
The significance testing of R
Let H0 : ρ² = 0
and HA : ρ² ≠ 0.
For the F-distribution,

F = [(n – k – 1)R²] / [k(1 – R²)]
= [(10 – 2 – 1)(0.865)] / [2(1 – 0.865)]
= 22.4

with df(numerator) = k = 2 and df(denominator) = n – k – 1 = 10 – 2 – 1 = 7.
The table value F(2, 7) = 4.737 at the 0.05 level of significance, and 9.547 at the 0.01 level.
Calculated F = 22.4 > critical F.
∴ We reject H0.

Note that statistical packages often report the significance of R² and not of R̄², the adjusted R².
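R, the adjusted R² and the F ratio above can be checked with a short script:

```python
# Sketch: multiple correlation R, adjusted R², and the F test from the
# zero-order correlations in the worked example
# (A = achievement, B = anxiety, C = intelligence).
import math

r_ab, r_ac, r_bc = -0.369, 0.918, -0.245

R = math.sqrt((r_ab**2 + r_ac**2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc**2))
R2 = R ** 2

n, k = 10, 2                                    # sample size, predictors
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)   # adjusted R²
F = ((n - k - 1) * R2) / (k * (1 - R2))         # F with df = (k, n - k - 1)
print(round(R, 2), round(R2_adj, 3), round(F, 1))
```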
Q. 6. Regression equation.
Ans. Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.
Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables. A scatterplot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (i.e., the scatterplot does not indicate any increasing or decreasing trends), then fitting a linear regression model to the data probably will not provide a useful model. A valuable numerical measure of association between two variables is the correlation coefficient, which is a value between –1 and 1 indicating the strength of the association of the observed data for the two variables.
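Fitting such a linear model by least squares can be sketched in code. The height/weight figures below are hypothetical illustration data, not from the guide:

```python
# Sketch: least-squares fit of a line to hypothetical height/weight data.
xs = [150, 155, 160, 165, 170, 175, 180]   # heights (cm), explanatory X
ys = [50, 53, 57, 60, 64, 68, 71]          # weights (kg), dependent Y

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²; intercept a = ȳ - b x̄
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
print(round(a, 2), round(b, 3))
```

The fitted line Y = a + bX then predicts a weight for any height within the observed range.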


A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
Q. 7. Normal curve.
Ans. In probability theory, the normal (or Gaussian) distribution is a very common continuous probability
distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to
represent real-valued random variables whose distributions are not known. A random variable with a Gaussian
distribution is said to be normally distributed and is called a normal deviate.

The normal distribution is useful because of the central limit theorem. In its most general form, under some
conditions (which include finite variance), it states that averages of samples of observations of random variables
independently drawn from independent distributions converge in distribution to the normal, that is, become
normally distributed when the number of observations is sufficiently large. Physical quantities that are expected to be

the sum of many independent processes (such as measurement errors) often have distributions that are nearly
normal. Moreover, many results and methods (such as propagation of uncertainty and least squares parameter fitting)
can be derived analytically in explicit form when the relevant variables are normally distributed.
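The central limit theorem can be illustrated numerically. The sketch below, using only the standard library, draws many averages of uniform random numbers and checks that they cluster around the population mean with the predicted spread (the sample size, number of trials, and seed are arbitrary choices):

```python
import random
import statistics

random.seed(42)

# Draw many sample means of n uniform(0, 1) variates; by the central
# limit theorem the means should be approximately normal, centred on
# 0.5 with standard deviation sqrt(1/12 / n).
n, trials = 30, 10_000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

print(round(statistics.fmean(means), 3))  # close to 0.5
print(round(statistics.stdev(means), 3))  # close to (1/12/n) ** 0.5, i.e. about 0.053
```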
Q. 8. Degree of Freedom
Ans. In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic
that are free to vary.
The number of independent ways by which a dynamic system can move, without violating any constraint im-
posed on it, is called number of degrees of freedom. In other words, the number of degrees of freedom can be defined
as the minimum number of independent coordinates that can specify the position of the system completely.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of
independent pieces of information that go into the estimate of a parameter are called the degrees of freedom. In
general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go
into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself
(e.g. the sample variance has N − 1 degrees of freedom, since it is computed from N random scores minus the one
parameter estimated as an intermediate step, which is the sample mean).
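The N − 1 divisor for the sample variance can be checked directly. A small stdlib-only sketch with illustrative scores:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative scores; their mean is 5

# Sum of squared deviations from the sample mean.
mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)  # 32.0

# The sample variance divides by N - 1 degrees of freedom, because one
# parameter (the sample mean) was estimated from the same N scores.
sample_var = ss / (len(data) - 1)
print(sample_var)                 # 32 / 7 ≈ 4.571
print(statistics.variance(data))  # the stdlib also divides by N - 1
```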
Mathematically, degrees of freedom is the number of dimensions of the domain of a random vector, or essen-
tially the number of "free" components (how many components need to be known before the vector is fully deter-
mined).
The term is most often used in the context of linear models (linear regression, analysis of variance), where
certain random vectors are constrained to lie in linear subspaces, and the number of degrees of freedom is the

dimension of the subspace. The degrees of freedom are also commonly associated with the squared lengths (or "sum

of squares" of the coordinates) of such vectors, and the parameters of chi-squared and other distributions that arise
in associated statistical testing problems.

Q. 9. Measuring Skewness.

Ans. Skewness is a measure of symmetry or, more precisely, the lack of symmetry. A distribution, or data set, is

symmetric if it looks the same to the left and right of the center point.
For univariate data Y1, Y2, ..., YN, the formula for skewness is:
g1 = Σ_{i=1}^{N} (Yi − Ȳ)^3 / (N s^3)
where Ȳ is the mean, s is the standard deviation and N is the number of data points. Note that in computing the

skewness, the s is computed with N in the denominator rather than N - 1.

The above formula for skewness is referred to as the Fisher-Pearson coefficient of skewness. Many software

programs actually compute the adjusted Fisher-Pearson coefficient of skewness
G1 = g1 × √(N(N − 1)) / (N − 2)

This is an adjustment for sample size. The adjustment approaches 1 as N gets large. For reference, the adjustment
factor is 1.49 for N = 5, 1.19 for N = 10, 1.08 for N = 20, 1.05 for N = 30, and 1.02 for N = 100.
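Both coefficients above can be computed in a few lines. The sketch below reproduces the adjustment-factor table entry for N = 10 and shows that symmetric data have zero skewness (function names are ours):

```python
import math

def skewness_g1(y):
    """Fisher-Pearson coefficient of skewness (s uses N, not N - 1, in the denominator)."""
    n = len(y)
    mean = sum(y) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / n)  # note: divide by N
    return sum((v - mean) ** 3 for v in y) / (n * s ** 3)

def adjustment(n):
    """Sample-size factor that turns g1 into the adjusted coefficient G1."""
    return math.sqrt(n * (n - 1)) / (n - 2)

print(skewness_g1([1, 2, 3, 4, 5]))  # 0.0 — symmetric data have zero skewness
print(round(adjustment(10), 2))      # 1.19, matching the table above
```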

The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero.

Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data
that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed
right means that the right tail is long relative to the left tail. If the data are multi-modal, then this may affect the sign

of the skewness.
Q. 10. Kruskal Wallis Analysis of Variance.
Ans. Introduction to the Kruskal-Wallis ANOVA Test: The non-parametric counterpart of the above is the
Kruskal-Wallis analogue to ANOVA. It is computed with medians; the rest is similar to the above. So that:
H0: The population medians are equal.

H1: The population medians differ.


Relevant Background Information on Kruskal Wallis Anova Test
It is an extension of the Mann-Whitney 'U' Test. For two or more samples it is employed with ordinal (i.e., rank
order) data in a hypothesis testing situation involving a between-subjects design. For K = 2 it yields the same results
as those of the Mann-Whitney U test. If the result of the Kruskal-Wallis one-way analysis of variance is significant,
it indicates that there is a significant difference between at least two of the sample medians in the set of K medians.
So the researcher can conclude that there is a high likelihood that at least two samples represent populations with
different median values.
Then one of the following is true:
(i) The data are in a rank order format since this is the only format in which scores are available or
(ii) The data has been transformed into a rank order format from an interval/ratio format since the researcher has
reason to believe that one or more of the assumptions of the single-factor between-subjects analysis of variance
are saliently violated.

Obviously, some data information is sacrificed by using ranks, so some statisticians are reluctant to use it.
The assumptions hereby are:
1. Each sample has been randomly selected from respective population.
2. The K samples are independent of one another.
3. The dependent variable is a continuous random variable.
4. The underlying distributions are identical in shape, though not necessarily normal.
Even (3) is many a time violated. There are suggestions that the Kruskal-Wallis test statistic is not as affected by
violation of the homogeneity of variance assumption as the F distribution is.