
BASIC STATISTICS
(A Text Book for Intermediate Classes)
PART-II

By
GHULAM HUSSAIN KIANI
Ex. Associate Professor of Statistics,
MUHAMMAD SALEEM AKHTAR
Govt. Gordon College, Rawalpindi

MAJEED BOOK DEPOT, Head Office
22-Urdu Bazar, Lahore
Ph: 042-37311484


UNIQUE PUBLICATIONS
Aminpur Bazar, Faisalabad. Ph: 041-2643322
Al-Mustafa Plaza, 6th Road, Satellite Town, Rawalpindi. Ph: 051-4429949
Al-Hanif Plaza, Gujranwala. Ph: 055-3825612
212-C Sunny Center, University Road, Sargodha. Ph: 048-374OUg
All Rights Reserved
No part of this book may be reproduced or transmitted in any form or
by any means, electronic or mechanical, including photocopying, recording,
or any information storage and retrieval system, without written permission
of the authors or publishers.

REVISED EDITION 2011-2012

Publishers: Kashif Mukhtar
Majeed Book Depot
Urdu Bazar, Lahore.
Printers: Z.A. Printers, Lahore.
Copy: 500
Composing: Muhammad Khurshid Khan & Shahid Ayub Ghanna
Price: Rs. 225/-

PREFACE
Basic Statistics - Part II has been written to serve as the text for the students of Intermediate level class XII. It has been written according to the new syllabus approved by the Ministry of Education (Curriculum Wing), Government of Pakistan, Islamabad. The book will meet the requirements of all the Education Boards in Pakistan.

The students of M.A. Economics, M.Sc. Psychology, Business Administration and B.B.A., and the students of many other areas of social sciences can read their courses from this book. They can benefit from this book because the lessons in the book have been discussed in a simple and lucid manner.

For the students who do not have the class-room facility, this book is a good gift in their hands. The students of Allama Iqbal Open University who are taking up courses of BBA and B.A. will find this book of tremendous value. They can prepare their lessons from this book without intensive class-room lectures. The book is really 'basic', language-wise and otherwise, and anybody who is interested to learn the basic theory of statistics will find this book a beneficial guide.

The entire book has been written in a simple manner. Special attention has been given to theory of sampling, hypothesis testing and estimation. The theoretical concepts have been made clear with illustrative examples. Efforts have been made to keep the examples related to the situations of practical life so that greater interest is created.

We are extremely grateful to our colleagues who have done the arduous task of the proof reading. Nonetheless some misprints and omissions may still appear here and there in the book. We shall be grateful if such errors are brought to our notice for prompt correction.
[The remaining acknowledgements are illegible in the source. The legible fragments mention suggestions and healthy criticism, colleagues in Rawalpindi and Islamabad, the publishers Messrs. Majeed Book Depot, and Mr. Khurshid Khan, the computer operator.]
August 1, 2011
Ghulam Hussain Kiani
Muhammad Saleem Akhtar
CONTENTS

11.5.6. Known Probability
11.5.7. Non-Zero Probability
11.6. Probability and Non-Probability Sampling
11.6.1.
11.6.2. Sampling without Replacement
11.6.3. Combinations
11.6.4. Permutations
11.6.5. Simple Random Sample
11.6.6. Difference between Random Sample and Simple Random Sample
11.6.7. Selection of Simple Random Sample
11.7. Errors
11.7.1. Sampling Errors
11.7.2. Reducing the Sampling Errors
11.7.3. Non-Sampling Errors
11.8. Sampling Distributions
11.8.1.
11.8.2. Sampling Distribution of X̄
11.8.3. Sampling Distribution of s² and S²
11.8.4. Sampling Distribution of Difference between two Means
11.8.5. Proportion
11.8.6. Sampling Distribution of Proportion
11.8.7. Sampling Distribution of Difference between p̂₁ and p̂₂
Short Definitions
Multiple Choice Questions
Short Questions
Exercises
Chapter 12
Statistical Inference - Estimation
12.1. Introduction
12.2. Statistical Inference
12.2.1. Approaches of Statistical Inference
12.3.1. Point Estimator and Point Estimate
12.3.2. Point Estimation
12.3.3. Unbiasedness
12.4. Interval Estimation
12.4.1. Confidence Coefficient
12.5. [...] of Confidence Interval
12.5.1. Selection of Proper Confidence Interval
12.6.1. Meaning of the Confidence Interval
12.7. Confidence Interval Estimate for Population Mean μ - Population Normal (Small Sample)
12.8. Confidence Interval Estimate for the Difference between two Population Means (Large Samples)
12.9. Confidence Interval Estimate for the Difference between two Population Means - Populations Normal (Small Samples)
12.10. Confidence Interval for the Difference between two Population Means - Dependent Samples
12.11. Proportion
12.12. Confidence Interval Estimate for Population Proportion p (Large Sample)
12.13. Confidence Interval Estimate for the Difference between two Population Proportions (Large Samples)
Multiple Choice Questions
Short Questions
Exercises
Chapter 13
Statistical Inference - Testing of Hypotheses
13.1. Introduction
13.2. Statistical Hypothesis
13.2.1. Null Hypothesis
13.2.2. Alternative Hypothesis
13.2.4. Composite Hypothesis
13.2.5. Acceptance and Rejection of Null Hypothesis
13.2.6. Test Statistic
13.2.8. Two-Tailed Test
13.2.9. One-Tailed Test
13.3. Errors in Testing of Hypothesis
13.3.1. Type I Error
13.3.2. Type II Error
13.3.3. Relation between α and β
13.4. Level of Significance
13.5. Formulating H₀ and H₁ and Making Critical Region
13.6. General Procedure for Testing of Hypothesis
13.7. Hypothesis Testing - Population Mean μ, σ Known (Large Sample)
13.8. Hypothesis Testing - Population Mean μ, σ not Known
13.9. Hypothesis Testing - Population Mean μ, σ Known
13.10. Hypothesis Testing - Population Mean μ, σ Unknown, Normal Population (Small Sample)
13.11. Hypothesis Testing - Difference between two Population Means μ₁ - μ₂, σ₁² and σ₂² Known (Large Samples)
13.12. Hypothesis Testing - Difference between two Population Means μ₁ - μ₂, σ₁² and σ₂² Unknown (Large Samples)
13.13. Test about μ₁ - μ₂, σ₁² and σ₂² Known, Populations Normal
13.14. Test about μ₁ - μ₂, σ₁² and σ₂² not Known, Populations Normal (Small Samples)
13.15. Test about μ₁ - μ₂, Dependent Samples, Populations Normal
13.16. Test of Population Proportion p (Large Sample)
13.17. Test of Difference between two Population Proportions, p₁ - p₂ (Large Samples)
13.18. Choice of Proper Test-Statistic
Short Questions

Chapter 14
Regression and Correlation
14.1. Introduction
14.2. Mathematical Model or Equation
14.3. Non-Linear Model
14.4. Statistical Model
14.4.1. Independent and Dependent Variables
14.4.2. Cause and Effect Relation
14.5. Regression
14.5.1. Simple Linear Regression
14.5.2. Purpose of Regression Analysis
14.5.3. Scatter Diagram
14.6. Fitting a Linear Regression Line - the Method of Least Squares
14.6.1. Properties of the Regression Line
14.6.2. Regression Equation of X on Y
14.7. Introduction
14.8. Correlation
14.8.1. Measurement of Correlation
14.8.2. Perfect Positive Correlation
14.8.3. Perfect Negative Correlation
14.8.5. Scatter Diagrams
14.9. Correlation Coefficient for Sample Data
14.9.1. Causation in Correlation
14.9.2. Spurious Correlation
14.9.3. Change of Origin
14.9.5. Change of Origin and Scale
14.9.6. 'r' in a Linear Regression Relation
14.9.7. 'r' for Random Variables
14.10. Relation between byx, bxy and r
14.11. Properties of Correlation Coefficient r
Short Definitions
Multiple Choice Questions
Short Questions
Exercises
Chapter 15
Association
15.1.1. Notation for Attributes
15.1.2. One Attribute
15.1.3. Two Attributes
15.1.4. Positive and Negative Classes
15.1.5. Order of Classes
15.1.6. Ultimate Class Frequencies
15.1.7. Lower Order Frequencies in Terms of
15.1.8. Higher Order Frequencies into Lower Order Frequencies
15.2. Consistency
15.3. Independence of Attributes
15.3.1. Independence Defined
15.3.2. Another Definition of Independence
15.4. Coefficient of Association
15.5.1. Test of Independence
15.5.2. Direct Formula for Calculating χ² in 2 × 2 Contingency Table
15.6. Contingency Table of Higher Order
15.7. Limitations of χ²
15.8. Rank Correlation
Short Definitions
Short Questions
Exercises
Chapter 16
Time Series
16.1. Introduction
16.2. Purpose of Time Series
16.2.1. Graph of the Time Series
16.3. Components of a Time Series
16.3.2. Seasonal Variation
16.3.3. Cyclical Variations
16.3.4. Irregular Variations
16.4. Analysis of Time Series
16.5.1. The Method of Free-hand Curve
16.5.2. The Method of Semi-Averages
16.5.3. The Method of Moving Averages
16.5.4. Method of Least Squares
16.5.5. Fitting a Straight Line
16.5.6. Coding of the Time Periods
16.5.7. Change of Origin in Coding
16.6. Fitting of Second Degree Parabola
Short Definitions
Link with Time Series Components
Multiple Choice Questions
Short Questions

Chapter 17
17.1. Introduction to Computers
17.1.1. Computer Capabilities and its Uses
17.2. Computer History
17.3. Types of Computers
17.3.2.
17.3.3. Hybrid Computer
17.4.
17.4.1.
17.4.2. Minicomputers
17.4.3. Microcomputers
17.4.4. Super Computers
17.5.
17.6. Computer Hardware
17.6.1. Input Unit
17.6.2.
17.6.3. Secondary Storage
17.6.4.
17.7. Computer Software
17.7.1.
17.7.2. System Software
17.7.3. Application Software
17.8. Basic Idea of Writing and Running a Computer Program
17.8.1. Program Design
17.8.2.
17.8.3.
17.8.4. Documentation, Implementation and Maintenance
17.9.
17.9.1.
17.9.2. Binary Number System
17.9.3. Octal Number System
17.9.4. Hexadecimal Number System
17.10. Binary Number System as a Foundation of Computer
Multiple Choice Questions
Statistical Tables
Chapter 10

NORMAL DISTRIBUTION
10.1 INTRODUCTION
Historically, the discovery of normal distribution goes back to the seventeenth and eighteenth centuries and is associated with the names of De Moivre (1667 - 1754), Laplace (1749 - 1827) and Gauss (1777 - 1855). At that time, it received the attention of mathematicians and natural and physical scientists. Its application to biological data was pioneered at a later date by Sir Francis Galton (1822 - 1911). The normal distribution, also called the normal curve, is widely used in research in the biological, physical and social sciences. In practical life we quite often come across the distributions close to this distribution and hence the word "normal" is used for it. The word normal is not to be used as the opposite to the word abnormal. Normal distribution is also called 'mother of distributions' because various other distributions are generated from this distribution. This distribution makes the base for inferential statistics, a branch of statistics in which we draw conclusions about the populations on the basis of information gained from the sample study.

10.2 NORMAL DISTRIBUTION
Normal distribution was first described in 1733 by De Moivre as being the limiting form of the binomial density as the number of trials becomes infinite. This discovery did not get much attention and the distribution was "discovered" again by both Laplace and Gauss about a half century later. Both men dealt with problems of astronomy, and each derived the normal distribution as a distribution that seemingly described the behavior of errors in astronomical measurements. The distribution is often referred to as the "Gaussian" distribution.

One of the most important examples of a continuous probability distribution is the normal distribution, also called normal curve or Gaussian distribution. The curve is defined by the equation

$$ Y = f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}, \qquad -\infty < X < \infty \tag{10.1} $$

where, μ = mean of the distribution, a parameter.
σ = standard deviation of the distribution, a parameter.
π = a constant approximately equal to 3.14159
e = a constant approximately equal to 2.71828
X = abscissa, measurement or score marked on horizontal axis
Y = ordinate, height of curve corresponding to an assigned value of X
The total area bounded by the curve (10.1) and the X-axis is one. The area under the curve between two ordinates X = a and X = b, where a < b, represents the probability that X lies between a and b and this probability is denoted by P(a < X < b).

When the variable X is expressed in terms of standard units or standard normal variate Z = (X - μ)/σ, then equation (10.1) is replaced by the so-called standard form

$$ Y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}} \tag{10.2} $$

In this case we say that Z is normally distributed. The mean of standard normal variate Z is zero and its variance is one. The value of Z is zero when X = μ. A graph of this standardized normal curve is shown in Fig. 10.1. In this graph we have indicated the areas included between Z = -1 and +1, Z = -2 and +2, Z = -3 and +3, which are 68.27%, 95.45% and 99.73% respectively. The areas under this curve bounded by the ordinates at Z = 0 and any positive value of Z are given in the table. From this table the area between any two ordinates can be found by using the symmetry of the curve about Z = 0.

Fig. 10.1: Standard normal curve with the areas 68.27%, 95.45% and 99.73% marked between Z = ±1, Z = ±2 and Z = ±3.
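
The three areas quoted for Fig. 10.1 can be checked numerically. The following is a minimal Python sketch, not part of the original text. It relies only on the standard identity that the area between Z = -k and Z = +k equals erf(k/√2); the helper name `area_within` is illustrative.

```python
import math

def area_within(k: float) -> float:
    """Area under the standard normal curve between Z = -k and Z = +k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Print each area as a percentage, rounded as in the text.
    print(f"Z = -{k} to +{k}: {100 * area_within(k):.2f}%")
```

Rounded to two decimal places, the three percentages agree with the figures stated in the text.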
10.3 PROPERTIES OF THE NORMAL DISTRIBUTION
(1) It is symmetrical about ordinate at X = μ. It means that the central ordinate at X = μ divides the curve into two equal parts.
(2) The arithmetic mean, median and mode coincide.
(3) The lower and upper quartiles are equidistant from the mean and are at a distance of 0.6745 σ.
Q₁ = μ - 0.6745 σ
Q₃ = μ + 0.6745 σ
(4) The mean deviation is 0.7979 σ or 4/5 σ (approximately).
(5) The quartile deviation, which is equal to 0.6745 σ or 2/3 σ (approximately), is also called the probable error.
(6) The ordinate is highest at the mean μ.
(7) The normal curve has two points of inflection (the points where the curve changes its direction) which lie at a distance of one σ above the mean μ and one σ below the mean μ.
(8) The curve is asymptotic to the base line. It means that it continues to approach but never reaches the base line.
(9) In normal curve, if nth moment is odd, the value of this odd moment will always be zero. This is because the normal curve is symmetrical and for symmetrical distribution sum of the positive deviations from μ will always be equal to the sum of the negative deviations from μ, and thus will cancel out each other. If nth moment is even, the formula is

$$ \mu_n = \frac{n!\,\sigma^{n}}{2^{n/2}\,(n/2)!} \qquad \text{(where } n \text{ is even)} $$

It follows that μ₂ = σ², μ₃ = 0, μ₄ = 3σ⁴, β₁ = μ₃²/μ₂³ = 0, γ₁ = 0, i.e. skewness is zero, and β₂ = μ₄/μ₂² = 3, γ₂ = 0, i.e. normal curve has zero kurtosis. The normal distribution is also called mesokurtic.
(10) The total area under the normal curve is unity.
(11) Area properties of the normal distribution.
In a normal distribution:
μ ± 0.6745 σ covers 50% area
μ ± 1 σ covers 68.27% area
μ ± 2 σ covers 95.45% area
μ ± 3 σ covers 99.73% area
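
Property (9) can be illustrated with a short numerical check. The Python sketch below is an illustration added here, not part of the original text. It compares the even-moment formula with a simple midpoint-rule integration of (X - μ)ⁿ f(X). The function names and the choice σ = 2 are assumptions made only for the example.

```python
import math
from statistics import NormalDist

def even_moment_formula(n: int, sigma: float) -> float:
    """n-th moment about the mean of a normal distribution (zero when n is odd)."""
    if n % 2 == 1:
        return 0.0
    half = n // 2
    return math.factorial(n) * sigma ** n / (2 ** half * math.factorial(half))

def central_moment_numeric(n: int, sigma: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the n-th central moment over [-10*sigma, 10*sigma]."""
    dist = NormalDist(0.0, sigma)
    lo, hi = -10 * sigma, 10 * sigma
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += x ** n * dist.pdf(x) * dx
    return total

sigma = 2.0  # illustrative value
mu2 = even_moment_formula(2, sigma)
mu4 = even_moment_formula(4, sigma)
print("mu2 =", mu2, " mu4 =", mu4, " beta2 =", mu4 / mu2 ** 2)
print("numeric mu3 =", central_moment_numeric(3, sigma))
print("numeric mu4 =", central_moment_numeric(4, sigma))
```

The ratio β₂ = μ₄/μ₂² comes out as 3 whatever value of σ is chosen, which is the mesokurtic property stated above.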
10.4 STANDARD NORMAL DISTRIBUTION
The properties of the normal curve permit us to define a standardized distribution in terms of the variable Z defined as

$$ Z = \frac{X-\mu}{\sigma} $$

This is equivalent to measuring the distance of X from the mean μ using the standard deviation σ as the unit of measuring distance. The variable Z is termed as the standard normal variate and plays a very important role in statistics. The probability density function in terms of Z is

$$ p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}} $$

The mean of random variable Z is zero and its variance is unity. If we know the mean μ and the standard deviation σ, we can calculate Z corresponding to any value of X and the corresponding area from the central ordinate to the value of Z.
10.5 USE OF THE AREA TABLE
The table "areas under the standard normal curve" gives the areas for various values of Z. For example Z = -1 to 0 and 0 to +1 gives the area 0.34134 as shown in the following figure.

Fig. 10.2

As the curve is symmetrical, the same area table can be used for negative values of Z. The area from Z = 0 to Z = 1 is 0.34134, similarly the area between Z = -1 and Z = 0 is also 0.34134.

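
An entry of the area table can be reproduced in code. The sketch below is an added illustration, not the authors' method. It assumes the standard relation that the area between 0 and Z equals ½·erf(|Z|/√2); the function name `area_0_to_z` is hypothetical.

```python
import math

def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (same for -z by symmetry)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

print(round(area_0_to_z(1.0), 5))   # area from Z = 0 to Z = 1
print(round(area_0_to_z(-1.0), 5))  # area from Z = -1 to Z = 0
```

Taking the absolute value of z is what lets one table serve both positive and negative values, as the text describes.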
Example 10.1.
In a normal distribution mean is 100 and standard deviation is 10. Find:
(i) the mean deviation
(ii) the quartile deviation
(iii) the third and fourth moments about mean
(iv) moment ratios (β₁ and β₂)
(v) the lower and upper quartiles
(vi) the median and mode
(vii) the values of points of inflection
(viii) the value of the maximum ordinate correct to four places of decimal.
Solution:
Here, μ = 100, σ = 10, σ² = μ₂ = 100
(i) Mean deviation = 0.7979 σ = 0.7979 (10) = 7.979
(ii) Quartile deviation = 0.6745 σ = 0.6745 (10) = 6.745
(iii) Third moment about mean = μ₃ = 0, because all odd order moments about mean in a normal distribution are zero, i.e. μ₁ = μ₃ = μ₅ = ... = 0.
Fourth moment about mean = μ₄ = 3σ⁴ = 3(10)⁴ = 30000
(iv) β₁ = μ₃²/μ₂³ = 0/(100)³ = 0 and β₂ = μ₄/μ₂² = 30000/(100)² = 3
(v) Q₁ = μ - 0.6745 σ = 100 - 0.6745 (10) = 93.255
Q₃ = μ + 0.6745 σ = 100 + 0.6745 (10) = 106.745
(vi) Mean = Median = Mode = 100, because in a normal distribution the mean, median and mode coincide.
(vii) Normal distribution has two points of inflection which lie at a distance of one σ above the mean μ and one σ below the mean μ, i.e.
μ - σ = 100 - 10 = 90 and μ + σ = 100 + 10 = 110
(viii) Maximum ordinate = 1/(σ√(2π)) = 1/(10√(2π)) = 0.0399
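
The answers of Example 10.1 can be cross-checked with Python's standard library. This is a minimal sketch added for illustration. It assumes the known results that the mean deviation of a normal distribution is σ√(2/π) and that the quartiles are the 25% and 75% points; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma = 100.0, 10.0
dist = NormalDist(mu, sigma)

mean_deviation = sigma * math.sqrt(2 / math.pi)    # about 0.7979 * sigma
q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)    # lower and upper quartiles
quartile_deviation = (q3 - q1) / 2                 # about 0.6745 * sigma
inflection_points = (mu - sigma, mu + sigma)
max_ordinate = dist.pdf(mu)                        # height of the curve at the mean

print(f"Mean deviation     = {mean_deviation:.3f}")
print(f"Quartile deviation = {quartile_deviation:.3f}")
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}")
print(f"Points of inflection = {inflection_points}")
print(f"Maximum ordinate   = {max_ordinate:.4f}")
```

Small differences in the last decimal place can arise because the text uses the rounded constants 0.7979 and 0.6745.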
Example 10.2.
In a normal distribution, mean is zero and the standard deviation is 1. Write down its equation and find the value of the maximum ordinate correct to four places of decimal.
Solution:
The equation of the normal curve with mean μ and standard deviation σ is

$$ Y = f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}, \qquad -\infty < X < \infty $$

When μ = 0 and σ = 1, the equation of the normal curve will be

$$ Y = f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}X^{2}} $$

We know that the maximum ordinate is at X = μ and μ = 0, so the value of the maximum ordinate is

$$ Y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(0)^{2}} = \frac{1}{\sqrt{2\pi}}\, e^{0} = \frac{1}{\sqrt{2\pi}} \qquad (\text{since } e^{0} = 1) $$

$$ = \frac{1}{\sqrt{2(3.1416)}} = \frac{1}{2.5066} = 0.3989 $$

Example 10.3.
Find the area under the normal curve in each of the cases:
(i) Between Z = 0 and Z = 1.2
(ii) Between Z = -0.68 and Z = 0
(iii) Between Z = -0.46 and Z = 2.21
(iv) Between Z = 0.81 and Z = 1.94
(v) To the left of Z = -0.6
(vi) To the right of Z = -1.28
(vii) To the right of Z = 2.05 and to the left of Z = -1.44.
Solution:
(i) Between Z = 0 and Z = 1.2
Required area = Area between Z = 0 and Z = 1.2 = 0.3849

(ii) Between Z = -0.68 and Z = 0
Required area = Area between Z = -0.68 and Z = 0 = 0.2518

(iii) Between Z = -0.46 and Z = 2.21
Required area = (Area between Z = -0.46 and Z = 0) + (Area between Z = 0 and Z = 2.21)
= 0.1772 + 0.4864 = 0.6636

(iv) Between Z = 0.81 and Z = 1.94
Required area = (Area between Z = 0 and Z = 1.94) - (Area between Z = 0 and Z = 0.81)
= 0.4738 - 0.2910 = 0.1828

(v) To the left of Z = -0.6
Required area = (Area to left of Z = 0) - (Area between Z = -0.6 and Z = 0)
= 0.5 - 0.2258 = 0.2742

(vi) To the right of Z = -1.28
Required area = (Area between Z = -1.28 and Z = 0) + (Area to right of Z = 0)
= 0.3997 + 0.5 = 0.8997

(vii) To the right of Z = 2.05 and to the left of Z = -1.44
Required area = Total area - (Area between Z = -1.44 and Z = 0) - (Area between Z = 0 and Z = 2.05)
= 1 - 0.4251 - 0.4798 = 0.0951
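
The seven areas of Example 10.3 can be recomputed from the cumulative distribution function instead of the printed table. The sketch below is an added illustration using `statistics.NormalDist` from the Python standard library; the helper `area_between` and the dictionary keys are illustrative names.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two ordinates."""
    return Z.cdf(z_high) - Z.cdf(z_low)

areas = {
    "i": area_between(0, 1.2),
    "ii": area_between(-0.68, 0),
    "iii": area_between(-0.46, 2.21),
    "iv": area_between(0.81, 1.94),
    "v": Z.cdf(-0.6),                          # everything to the left of -0.6
    "vi": 1 - Z.cdf(-1.28),                    # everything to the right of -1.28
    "vii": (1 - Z.cdf(2.05)) + Z.cdf(-1.44),   # the two tails together
}

for part, area in areas.items():
    print(f"({part}) {area:.4f}")
```

The computed values should agree with the table-based answers to about three decimal places; the fourth place can differ because the printed table is itself rounded.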
Example 10.4.
Given a normal distribution with μ = 40 and σ = 6, find
(a) the area below 32 (b) the area above 27 (c) the area between 42 and 51.
Solution:
Here, μ = 40, σ = 6, Z = (X - μ)/σ = (X - 40)/6
(a) Z = (32 - 40)/6 = -8/6 = -1.33
P(X < 32) = P(Z < -1.33)
= P(-∞ < Z < 0) - P(-1.33 < Z < 0)
= 0.5 - 0.4082 = 0.0918

(b) Z = (27 - 40)/6 = -13/6 = -2.17
P(X > 27) = P(Z > -2.17)
= P(-2.17 < Z < 0) + P(0 < Z < ∞)
= 0.4850 + 0.5 = 0.9850

(c) Z₁ = (X₁ - 40)/6 = (42 - 40)/6 = 2/6 = 0.33
Z₂ = (X₂ - 40)/6 = (51 - 40)/6 = 11/6 = 1.83
P(42 < X < 51) = P(0.33 < Z < 1.83)
= P(0 ≤ Z < 1.83) - P(0 < Z ≤ 0.33)
= 0.4664 - 0.1293 = 0.3371
Example 10.5.
A newspaper stall sells an average of 400 papers per day. Assume that these sales are normally distributed with a standard deviation of 25. For each of the following probability questions use graphs with both X and Z axes and indicate the corresponding areas under the normal curve. What is the probability that on a given day:
(a) more than 420 papers will be sold? (b) at most 410 papers will be sold?
(c) less than 395 papers will be sold? (d) between 390 and 405 papers will be sold?
Solution:
Here, μ = 400, σ = 25, Z = (X - μ)/σ = (X - 400)/25

(a) Z = (420 - 400)/25 = 0.8
P(X > 420) = P(Z > 0.8)
= P(0 < Z < ∞) - P(0 < Z < 0.8)
= 0.5 - 0.2881 = 0.2119

(b) Z = (410 - 400)/25 = 0.4
P(X ≤ 410) = P(Z ≤ 0.4)
= P(-∞ < Z < 0) + P(0 < Z ≤ 0.4)
= 0.5 + 0.1554 = 0.6554

(c) Z = (395 - 400)/25 = -0.2
P(X < 395) = P(Z < -0.2)
= P(-∞ < Z < 0) - P(-0.2 < Z < 0)
= 0.5 - 0.0793 = 0.4207

(d) Z₁ = (X₁ - 400)/25 = (390 - 400)/25 = -0.4
Z₂ = (X₂ - 400)/25 = (405 - 400)/25 = 0.2
P(390 < X < 405) = P(-0.4 < Z < 0.2)
= P(-0.4 < Z < 0) + P(0 < Z < 0.2)
= 0.1554 + 0.0793 = 0.2347
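
The standardize-then-look-up procedure used in Example 10.5 can be written as one small function. This is a sketch added for illustration only. The function `prob_between` and its optional `low`/`high` arguments are assumptions of the example; an omitted bound is treated as -∞ or +∞.

```python
from statistics import NormalDist

def prob_between(mu, sigma, low=None, high=None):
    """P(low < X < high) for a normal X, found by converting each bound to a Z value."""
    std = NormalDist()  # standard normal curve
    lower = 0.0 if low is None else std.cdf((low - mu) / sigma)
    upper = 1.0 if high is None else std.cdf((high - mu) / sigma)
    return upper - lower

mu, sigma = 400, 25
p_a = prob_between(mu, sigma, low=420)             # more than 420 papers
p_b = prob_between(mu, sigma, high=410)            # at most 410 papers
p_c = prob_between(mu, sigma, high=395)            # less than 395 papers
p_d = prob_between(mu, sigma, low=390, high=405)   # between 390 and 405 papers

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"({label}) {p:.4f}")
```

Because X is continuous, "at most 410" and "less than 410" give the same probability, so one function serves both kinds of question.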
Example 10.6.
The heights of freshmen students at a military academy are normally distributed with a mean of 5 feet 10 inches and a standard deviation of 2 inches.
(a) What is the proportion of freshmen at the academy who are taller than 6 feet 3 inches?
(b) What is the proportion of freshmen who are less than 5 feet 7 inches?
(c) What is the proportion of freshmen between 5 feet 8 inches and 6 feet 0 inches?
Solution:
Expressing all heights in inches, μ = 70 and σ = 2.
(a) Z = (75 - 70)/2 = 2.5
P(X > 75) = P(Z > 2.5)
= P(0 < Z < ∞) - P(0 < Z < 2.5)
= 0.5 - 0.4938 = 0.0062 or 0.62%

(b) Z = (67 - 70)/2 = -1.5
P(X < 67) = P(Z < -1.5)
= P(-∞ < Z < 0) - P(-1.5 < Z < 0)
= 0.5 - 0.4332 = 0.0668 or 6.68%

(c) Z₁ = (X₁ - 70)/2 = (68 - 70)/2 = -1
Z₂ = (X₂ - 70)/2 = (72 - 70)/2 = +1
P(68 < X < 72) = P(-1 < Z < 1)
= P(-1 < Z < 0) + P(0 < Z < 1)
= 0.3413 + 0.3413 = 0.6826 or 68.26%

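
Example 10.6 converts heights from feet and inches to inches before standardizing. The sketch below is an added illustration that makes this conversion explicit; the helper `to_inches` and the variable names are hypothetical.

```python
from statistics import NormalDist

def to_inches(feet: int, inches: int) -> int:
    """Convert a height given in feet and inches to inches."""
    return 12 * feet + inches

heights = NormalDist(mu=to_inches(5, 10), sigma=2)  # mean 70 inches, s.d. 2 inches

taller_than_6ft3 = 1 - heights.cdf(to_inches(6, 3))
shorter_than_5ft7 = heights.cdf(to_inches(5, 7))
between_5ft8_and_6ft = heights.cdf(to_inches(6, 0)) - heights.cdf(to_inches(5, 8))

print(f"(a) {taller_than_6ft3:.4f}")
print(f"(b) {shorter_than_5ft7:.4f}")
print(f"(c) {between_5ft8_and_6ft:.4f}")
```

Results may differ from the table-based answers in the fourth decimal place, since the printed areas are rounded.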
10.6 NORMAL FREQUENCY DISTRIBUTION
Sometimes we have to convert the normal probability distribution into normal frequency distribution. When the probability distribution is multiplied with the total number of observations N, we get the normal frequency distribution. The normal probability distribution is

$$ f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}} $$

whereas the normal frequency distribution is

$$ Y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}} $$

For example, we know that the probability that the random variable X will fall in the interval μ - σ to μ + σ is 0.6827. The probability 0.6827 can be converted into the percentage of observations which lie between μ - σ and μ + σ. This percentage is 68.27%. If the total number of observations is 1000, the interval μ - σ to μ + σ will contain 683 observations, i.e. 0.6827 × 1000 = 683.
Example 10.7.
In an intelligence test administered on 1000 children, the average I.Q. was 42 and standard deviation 24.
(a) Find the number of children exceeding a score of 50.
(b) Find the number of children lying between the scores 30 and 54.
Solution:
Here, μ = 42, σ = 24, N = 1000, Z = (X - μ)/σ = (X - 42)/24

(a) Z = (50 - 42)/24 = 0.33
P(X > 50) = P(Z > 0.33)
= P(0 < Z < ∞) - P(0 < Z < 0.33)
= 0.5 - 0.1293 = 0.3707
Hence the expected number of children exceeding a score of 50
= N.P(X > 50) = 1000(0.3707) = 370.7 or 371 approximately.

(b) Z₁ = (X₁ - 42)/24 = (30 - 42)/24 = -0.5
Z₂ = (X₂ - 42)/24 = (54 - 42)/24 = +0.5
P(30 < X < 54) = P(-0.5 < Z < 0.5)
= P(-0.5 < Z < 0) + P(0 < Z < 0.5)
= 0.1915 + 0.1915 = 0.3830
Hence the expected number of children lying between the scores 30 and 54
= N.P(30 < X < 54) = 1000(0.3830) = 383.

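
Converting a probability into an expected frequency, as in Example 10.7, amounts to multiplying by N. The following sketch is an added illustration, not the authors' procedure. To mimic the use of a printed table it rounds each Z value to two decimal places; this rounding, the function name `expected_count` and its parameters are assumptions of the example.

```python
from statistics import NormalDist

def expected_count(n_total, mu, sigma, low=None, high=None, z_decimals=2):
    """Expected number of observations between low and high out of n_total."""
    std = NormalDist()

    def cdf_at(x):
        # Round Z as one would before looking it up in a printed area table.
        z = round((x - mu) / sigma, z_decimals)
        return std.cdf(z)

    lower = 0.0 if low is None else cdf_at(low)
    upper = 1.0 if high is None else cdf_at(high)
    return n_total * (upper - lower)

above_50 = expected_count(1000, 42, 24, low=50)
between_30_and_54 = expected_count(1000, 42, 24, low=30, high=54)
print(round(above_50), round(between_30_and_54))
```

Without the rounding of Z to two decimals, part (a) would use Z = 0.3333 and give a slightly smaller count, so the rounding choice matters when comparing with table-based answers.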
10.7 THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
The (continuous) normal distribution provides a close approximation to the (discrete) binomial distribution when n, the number of trials, is very large and p, the probability of a success on an individual trial, is close to 1/2. To provide a theoretical foundation for this argument, let us make the following statement, a proof of which can be found in most of the texts in mathematical statistics.

If X is a random variable having a binomial distribution with the parameters n and p, then Z = (X - np)/√(npq), where q = 1 - p, approaches the standard normal distribution when n approaches infinity. Strictly speaking, this statement applies when n approaches infinity, but the normal distribution is often used to approximate binomial probabilities even when n is fairly small. A good rule of thumb is to use this approximation only when np and nq are both equal to or greater than 5. The procedure to follow in using a normal approximation to the binomial is as follows:

Step 1. Compute μ = np and σ = √(npq).
Step 2. Apply a continuity correction factor to convert a discrete (binomial) random variable into a continuous (normal) random variable, so that the standardized normal Z transformation is

$$ Z = \frac{(X - \frac{1}{2}) - \mu}{\sigma} \quad \text{or} \quad Z = \frac{(X + \frac{1}{2}) - \mu}{\sigma} $$

Step 3. Use a standard normal table to determine probabilities corresponding to Z in order to obtain the binomial b(x; n, p). For example,

$$ P(X = a) = P\left[\frac{(a - \frac{1}{2}) - \mu}{\sigma} \le Z \le \frac{(a + \frac{1}{2}) - \mu}{\sigma}\right] $$

$$ P(X \le b) = P\left[Z \le \frac{(b + \frac{1}{2}) - \mu}{\sigma}\right] $$

$$ P(X \ge c) = P\left[Z \ge \frac{(c - \frac{1}{2}) - \mu}{\sigma}\right] $$

where a, b and c are some values of random variable X.
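
The three steps can be sketched in Python and compared with exact binomial probabilities. This is a minimal illustration under stated assumptions: the values n = 20 and p = 0.5 are hypothetical and chosen only because they satisfy the rule of thumb, and the function names are not from the original text.

```python
import math
from statistics import NormalDist

def binomial_cdf_exact(n: int, p: float, b: int) -> float:
    """Exact P(X <= b) for a binomial variable."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(b + 1))

def normal_approx_le(n: int, p: float, b: int) -> float:
    """P(X <= b) by the normal approximation with continuity correction."""
    mu = n * p                               # Step 1
    sigma = math.sqrt(n * p * (1 - p))
    z = (b + 0.5 - mu) / sigma               # Step 2
    return NormalDist().cdf(z)               # Step 3

def normal_approx_eq(n: int, p: float, a: int) -> float:
    """P(X = a) as the normal area between a - 1/2 and a + 1/2."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    std = NormalDist()
    return std.cdf((a + 0.5 - mu) / sigma) - std.cdf((a - 0.5 - mu) / sigma)

n, p = 20, 0.5                               # hypothetical illustration values
rule_of_thumb_ok = n * p >= 5 and n * (1 - p) >= 5

exact_le_8 = binomial_cdf_exact(n, p, 8)
approx_le_8 = normal_approx_le(n, p, 8)
exact_eq_10 = math.comb(n, 10) * p ** 10 * (1 - p) ** 10
approx_eq_10 = normal_approx_eq(n, p, 10)

print(f"P(X <= 8): exact {exact_le_8:.4f}, normal approx {approx_le_8:.4f}")
print(f"P(X = 10): exact {exact_eq_10:.4f}, normal approx {approx_eq_10:.4f}")
```

Leaving out the ±1/2 correction would treat the discrete value 8 as the point 8.0 on a continuous scale and noticeably understate P(X ≤ 8).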

10.8 INVERSE USE OF THE AREA TABLE
The area table of normal distribution is designed to give the areas for various values of the standard normal variate Z. But this same table can also be used to read the values of Z for a certain given area under the normal curve. This is called inverse use of the area table. Suppose there are 95% observations less than a certain point, say X (P₉₅). Clearly the area between μ and X is 0.45. From the area table we can read the value of Z corresponding to the area equal to 0.45, which is Z = 1.645.

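
The inverse look-up can also be done with an inverse cumulative distribution function. The sketch below is an added illustration using `statistics.NormalDist.inv_cdf` from the Python standard library. The helper `percentile_point` and the values μ = 40, σ = 4 are assumptions made for the example.

```python
from statistics import NormalDist

def percentile_point(mu: float, sigma: float, fraction_below: float) -> float:
    """Value of X that has the given fraction of the distribution below it."""
    z = NormalDist().inv_cdf(fraction_below)  # inverse use of the area table
    return mu + sigma * z

z95 = NormalDist().inv_cdf(0.95)
print(f"Z for 95% below: {z95:.3f}")

# Illustrative values: a point with 97% of the distribution below it.
p97 = percentile_point(40, 4, 0.97)
print(f"P97 for mu = 40, sigma = 4: {p97:.2f}")
```

A point with a given fraction above it can be found by passing one minus that fraction as `fraction_below`.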
Example 10.8.
A random variable X is normally distributed with mean = 40 and standard deviation = 4.
(i) Find a point that has 97% of the distribution below it.
(ii) Find a point that has 62.2% of the distribution below it.
(iii) Find a point that has 90% of the distribution above it.
(iv) Find two points containing the middle 98% area.
(v) Find two points containing the middle 95% area.
(vi) I.ind p2s, pss, p* and prr.
r
Basic Statistics Part-II
Solution:
Here, $\mu = 40$, $\sigma = 4$, $Z = \frac{X - \mu}{\sigma} = \frac{X - 40}{4}$
(i) $P_{97}$ is a point having 97 percent of the area below it. Area table shows this
point to be
$1.88 = \frac{X - 40}{4}$
$X - 40 = 4(1.88)$
$X = 40 + 7.52 = 47.52$
Thus, $P_{97} = 47.52$
(ii) $P_{62.2}$ is a point having 62.2 percent of the area below it. Area table shows this
point to be
$0.3108 = \frac{X - 40}{4}$
$X - 40 = 4(0.3108)$
$X = 40 + 1.2432 = 41.2432$
Thus, $P_{62.2} = 41.2432$
(iii) $P_{10}$ is a point having 90 % of the area above it. Area table shows this point to be
$-1.28 = \frac{X - 40}{4}$
$X - 40 = 4(-1.28)$
$X = 40 - 5.12 = 34.88$
(iv) 98 % area under the normal curve means that 0.49 area each to the
left and right of mean. From the area table, the value of Z for which
the area between 0 and Z is 0.49 is 2.33. Since $X_1$ will be on the left of
the mean, therefore $Z_1$ will be negative and $X_2$ is on the right of
the mean, therefore $Z_2$ will be positive.
$Z_1 = \frac{X_1 - 40}{4}$ and $Z_2 = \frac{X_2 - 40}{4}$
$-2.33 = \frac{X_1 - 40}{4}$ and $+2.33 = \frac{X_2 - 40}{4}$
$X_1 - 40 = 4(-2.33)$ and $X_2 - 40 = 4(2.33)$
$X_1 = 40 - 9.32 = 30.68$ and $X_2 = 40 + 9.32 = 49.32$
Hence two points are 30.68 and 49.32.
(v) 95 % area under the normal curve means that 0.4750 area each to the
left and right of mean. From the area table, the value of Z for which
the area between 0 and Z is 0.4750 is 1.96. Since $X_1$ will be on the left
of the mean, therefore $Z_1$ will be negative and $X_2$ is on the right of
the mean, therefore $Z_2$ will be positive.
$Z_1 = \frac{X_1 - 40}{4}$ and $Z_2 = \frac{X_2 - 40}{4}$
$-1.96 = \frac{X_1 - 40}{4}$ and $+1.96 = \frac{X_2 - 40}{4}$
$X_1 - 40 = 4(-1.96)$ and $X_2 - 40 = 4(1.96)$
$X_1 = 40 - 7.84 = 32.16$ and $X_2 = 40 + 7.84 = 47.84$
Hence two points are 32.16 and 47.84.


(vi) $P_{20}$ is a point having 20 percent of the area below it. Area table shows this
point to be
$-0.8415 = \frac{X - 40}{4}$
$X - 40 = 4(-0.8415)$
$X = 40 - 3.366 = 36.634$
Thus, $P_{20} = 36.634$.
$P_{80}$ is a point having 80 percent of the area below it. Area table shows this
point to be
$0.8415 = \frac{X - 40}{4}$
$X - 40 = 4(0.8415)$
$X = 40 + 3.366 = 43.366$
Thus, $P_{80} = 43.366$.
$P_{95}$ is a point having 95 percent of the area below it. Area table shows this
point to be
$1.645 = \frac{X - 40}{4}$
$X - 40 = 4(1.645)$
$X = 40 + 6.58 = 46.58$
Thus, $P_{95} = 46.58$.
$P_{99}$ is a point having 99 percent of the area below it. Area table shows this
point to be
$2.33 = \frac{X - 40}{4}$
$X - 40 = 4(2.33)$
$X = 40 + 9.32 = 49.32$
Thus, $P_{99} = 49.32$.
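The inverse use of the area table in this example can be checked numerically. The sketch below assumes Python's standard `statistics.NormalDist` class in place of the printed table; the helper names `point_below` and `middle_interval` are our own and do not come from the text.

```python
from statistics import NormalDist

X = NormalDist(mu=40, sigma=4)   # the distribution of the example above

def point_below(dist, area):
    """Value of X that has the given area (0 < area < 1) below it."""
    return dist.inv_cdf(area)

def middle_interval(dist, area):
    """Two points that contain the middle `area` of the distribution."""
    tail = (1.0 - area) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

p97 = point_below(X, 0.97)
p10 = point_below(X, 0.10)               # 90 % of the area lies above this point
low98, high98 = middle_interval(X, 0.98)
low95, high95 = middle_interval(X, 0.95)
print(round(p97, 2), round(p10, 2))
print(round(low98, 2), round(high98, 2))
print(round(low95, 2), round(high95, 2))
```

Because `inv_cdf` does not round Z to two decimal places as the table does, the results may differ from the worked answers by about 0.01 or 0.02.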

Example 10.9.
In a normal distribution 31 % of the items are under 54 and 8 % are over 76.
Find the mean and the standard deviation of the distribution.
Solution:
Let $\mu$ = Mean, $\sigma$ = Standard deviation of the normal distribution.
$Z = \frac{X - \mu}{\sigma}$, $Z_1 = \frac{X_1 - \mu}{\sigma} = \frac{54 - \mu}{\sigma}$, $Z_2 = \frac{X_2 - \mu}{\sigma} = \frac{76 - \mu}{\sigma}$
Since 31 % of the items are under 54, the area to the left of the ordinate at
X = 54 is 0.31. Therefore, the area between X = 54 and the mean $\mu$ is
0.5 - 0.31 = 0.19.
Then the corresponding value of $Z_1$ is 0.4958
$Z_1 = -0.4958 = \frac{54 - \mu}{\sigma}$
(We have taken $Z_1$ to be negative because it falls on the left of the mean ordinate)
$54 - \mu = -0.4958\sigma$ or $\mu - 0.4958\sigma = 54$ ...... (1)
Again it is given that 8 % of the items are over 76. Therefore, the area under the
normal curve between $\mu$ and 76 is 0.42 (or 42 %).
The corresponding value of $Z_2$ is 1.4053 i.e. $Z_2 = 1.4053 = \frac{76 - \mu}{\sigma}$
(We have taken $Z_2$ to be positive because it falls on the right of the mean
ordinate)
$76 - \mu = 1.4053\sigma$ or $\mu + 1.4053\sigma = 76$ ...... (2)
Solving equations (1) and (2), we get $1.9011\sigma = 22$ or $\sigma = 11.57$
Substituting $\sigma = 11.57$ in equation (1), we get
$\mu - 0.4958(11.57) = 54$ or $\mu = 54 + 5.7364$
$\mu = 59.7364$ or 59.74
Hence, Mean = 59.74 and S.D. = 11.57

Example 10.10.
In a normal distribution the lower quartile is 10 and the upper quartile is 22.
Find mean and standard deviation of the distribution.
Solution: Here, $Q_1 = 10$ and $Q_3 = 22$
The two quartiles are given by
$Q_1 = \mu - 0.6745\sigma$ and $Q_3 = \mu + 0.6745\sigma$
Substituting the values of $Q_1$ and $Q_3$, we get
$\mu - 0.6745\sigma = 10$ ...... (1) $\mu + 0.6745\sigma = 22$ ...... (2)
Solving equations (1) and (2), we get $2\mu = 32$ or $\mu = 16$
Substituting $\mu = 16$ in equation (1), we get
$16 - 0.6745\sigma = 10$ or $\sigma = 8.9$
Thus, the mean and standard deviation of the normal distribution are 16 and 8.9
respectively.
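Examples 10.9 and 10.10 both solve a pair of equations of the form $\mu + Z\sigma = X$. As a rough check, the sketch below does the same with a small helper, `mean_and_sd`, which is our own name and is not part of the text. It takes the Z values from `statistics.NormalDist` in place of the area table.

```python
from statistics import NormalDist

def mean_and_sd(x1, area_below_x1, x2, area_below_x2):
    """Solve mu + z1*sigma = x1 and mu + z2*sigma = x2 for mu and sigma."""
    std = NormalDist()                  # standard normal: mean 0, s.d. 1
    z1 = std.inv_cdf(area_below_x1)
    z2 = std.inv_cdf(area_below_x2)
    sigma = (x2 - x1) / (z2 - z1)       # subtract the two equations
    mu = x1 - z1 * sigma                # substitute back into the first
    return mu, sigma

# Example 10.9: 31 % under 54 and 8 % over 76 (so 92 % under 76)
mu_a, sigma_a = mean_and_sd(54, 0.31, 76, 0.92)
# Example 10.10: Q1 = 10 and Q3 = 22
mu_b, sigma_b = mean_and_sd(10, 0.25, 22, 0.75)
print(round(mu_a, 2), round(sigma_a, 2))
print(round(mu_b, 2), round(sigma_b, 2))
```

The helper expects areas below each point, so "8 % over 76" is entered as 0.92. The results should agree with the worked answers up to the rounding of the table values.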
SHORT DEFINITIONS
Normal Distribution
A normal distribution is a particular idealized, smooth, bell-shaped histogram with
all of the randomness removed. It represents an ideal data set that has lots of
numbers concentrated in the middle of the range and trails off symmetrically on
both sides. A data set follows a normal distribution if it resembles the smooth,
symmetric, bell-shaped normal curve, except for some randomness. The normal
distribution plays an important role in statistical theory and practice.
Standard Normal Distribution
The distribution of a normal random variable with mean zero and standard
deviation one is called a standard normal distribution.
or
A normal distribution that has a mean of zero and standard deviation of one is called
the standard normal distribution. If Z is the standard normal random variable, then
Z has the probability distribution $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, $-\infty < Z < \infty$.
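The density of the standard normal variable can be evaluated directly. The sketch below uses our own function names and a simple trapezoidal rule with an arbitrarily chosen number of steps. It computes a few ordinates and approximates the area within one standard deviation of the mean.

```python
import math

def standard_normal_pdf(z):
    """Ordinate f(z) of the standard normal curve."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def area_under_pdf(a, b, steps=10000):
    """Trapezoidal-rule area under f(z) between a and b."""
    h = (b - a) / steps
    total = 0.5 * (standard_normal_pdf(a) + standard_normal_pdf(b))
    for i in range(1, steps):
        total += standard_normal_pdf(a + i * h)
    return total * h

print(round(standard_normal_pdf(0.0), 4))    # maximum ordinate, at z = 0
print(round(standard_normal_pdf(1.0), 4))    # ordinate at a point of inflection
print(round(area_under_pdf(-1.0, 1.0), 4))   # area within one s.d. of the mean
```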

.
MULTIPLE CHOICE QUESTIONS
1. A normal distribution has the mean $\mu = 200$. If 70 percent of the area under the
curve lies to the left of 220, the area to the right of 220 is:
(a) 0.3 (b) 0.5
(c) 0.2 (d) 0.7
2. Given a normal distribution with $\mu = 100$ and $\sigma^2 = 100$, the area to the left of
100 is:
(a) one (b) equal to 0.5
(c) less than 0.5 (d) greater than 0.5
3. A random variable has a normal distribution with the mean $\mu = 400$. If 80
percent of the area under the curve lies to the left of 500, the area between 400
and 500 is:
(a) 0.5 (b) 0.2
(c) 0.3 (d) zero
4. In a normal distribution mean is 100 and standard deviation is 10. The values
of points of inflection are:
(a) 100 and 110 (b) 80 and 120
(c) 90 and 110 (d) none of the above
5. If X is a normal variate with mean 20 and variance 16. The respective values of
$\beta_1$ and $\beta_2$ are:
(a) 0and3^ (b) Sandl
(c) 0.5 and 1 (d) 3 ancl 3

6. A random variable X is normally distributed with $\mu = 70$ and $\sigma^2 = 25$. The third
moment about arithmetic mean is:
(a) zero (b) less than zero
(c) greater than zero (d) none of the above
7. if X.is N(t,,iS), the fourth cent.ai ' the above
-- -' ,";;";;l;, Bd "
.; . ^t-'i_'"
^'
(a/ ua (b) 75
L,u
l.L (c:\ Rrr (d) 100
-\
8. In a standard normal distribution, P(Z > mean) is:
(a) more than 0.5 (b) less than 0.5
(c) equal to 0.5 (d) difficult to tell
9. Given a standardized normal distribution (with a mean of zero and a standard
deviation of one), P(Z < variance) is equal to:
(a) 0.8413 (b) o.B.1lB !=e^1LX
i;\ j;fi7 r'/ s (cr) o'ooo, 'tLy=

'-g er= f
t'/
10 ancl X is N(10' 2Q)' thc. ttteatr of '
,.,
'o
1o):

l
(a) 50 (?) Go 6^
(c) ?o
*1b
.
lI. If X is a normal .anciom variablc

3
-:iJ,
'J = 7, rr'y = X - ?" then stanrlard cleviation
p = 50 arru
a.d Dr'dlrLrili-r
standar.cl dcviation

4
ofy is;'JV

,. iii J rL r r 9 [] ',1 ,f,,',i;;i;,, b-


t9 (b)
area tc the left of (p+3; for a nonnal distnbution
tr.;
\-/ (,r, o.lti
l'h.. is api_.roximately equal to:

a
:

t
0'il1
ic) 0.50

s
r3'
(a) o.tgtb / , '41*
distribution with * =
,5:: ::^ru&,obabliry

/
ffif ;1iral or a varue-srepr.e,.

: /L\ .,...,- /rfl-.f,-


s
(c) o6elb :=Et>; [:] '
a,V+' g-,[ ','''
, ^

tt p i
313i3
14. For a normal distribution with mean $\mu$ and standard deviation $\sigma$:
(a) Approximately 5 % of values are outside the range ($\mu - 2\sigma$) to ($\mu + 2\sigma$)
(b) Approximately 5 % of values are greater than ($\mu + 2\sigma$)
(c) Approximately 5 % of values are outside the range ($\mu - \sigma$) to ($\mu + \sigma$)
(d) Approximately 5 % of values are less than ($\mu - 2\sigma$)
15' - a";
The cl^stributio. a prop.e' probability distribution o1
'o.nal
random variable, the total-is a cohti,uous
area uncler it_," .u.r" ii*; i.,
.-. (a) equal to one ..ni i^" -"ll
lra'. 11 L!-o't'u' f:l"lP,,V distlibuti<.rn of a continuous ranclonr variablc, rht, v.Juc
J :t.:,undard deviation)is: *__,/
(.al zeto /L\ r^^--r
(c) greaterthanzerb-\ [:] HHf;;",.
*""_: '*:*-;J:::::::'

17. The value of e is approximately equal to:
(a) 2.7183 (b) 2.1783
(c) 2.8173 (d) 3.1416
18. The value of $\pi$ is approximately equal to:
(a) 3.4116 (b) 3.1416
(c) 3.1614 (d) 3.6416

m
'[h' tt normal probability distribution with mean np ancl variance npq nray be
-'r used
" to approximate the-binomial distributioil.Ifii bd';";;;;i-,lo;a
o
;;f"rru,

. c
{a) greater than b
-,/ (b) Iess than i
(.) equal to b (d) difficuit to teli
20. The parameters of the normal distribution are:
(a) $\mu$ and $\sigma^2$ (b) $\mu$ and $\sigma$
(c) np and nq (d) n and p
21. The median of a normal distribution corresponds to a value of Z is:
(a) 0 (b) 1
(c) 0.5 (d) -0.5
22. The mean and standard deviation of the standard normal distribution are
respectively:
(a) 0 and 1 (b) 1 and 0
(c) $\mu$ and $\sigma^2$ (d) $\pi$ and e
23. If a normal distribution with $\mu = 200$ has P(X > 225) = 0.1587, then P(X < 175) is
equal to:
(a) 0.3413 (b) 0.8413
(c) 0.1587 (d) 0.5000

p
24. Given a random variable X which is normally distributed with a mean and
variance both equal to 100. The value of mean deviation is approximately equal
to:
(a) 7 (b) 8
(c) 8.5 (d) 9
25. If X is a normal variate with mean 50 and standard deviation 3. The value of
quartile deviation is approximately equal to:
(a) 1 (b) 1.5
(c) 2 (d) 2.5
26. In normal probability distribution for a continuous random variable, the value
of mean deviation is approximately equal to:
(a) 2/3 (b) 2/3 $\sigma$
(c) 4/5 (d) 4/5 $\sigma$
27. In a normal distribution whose mean is $\mu$ and standard deviation $\sigma$, the value
of quartile deviation is approximately equal to:
(a) 4/5 (b) 4/5 $\sigma$
(c) 2/3 $\sigma$ (d) 2/3
28. In a normal distribution, the lower and upper quartiles are equidistant from
the mean and are at a distance of:
(a) 0.7979 (b) 0.7979 $\sigma$
(c) 0.6745 (d) 0.6745 $\sigma$
29. In a normal curve, the ordinate is highest at:
(a) mean (b) variance
(c) standard deviation (d) $Q_1$
30. The total area of the normal probability density function is equal to:
(a) 0 (b) 0.5
(c) 1 (d) 0.25
31. The normal curve is symmetrical and for symmetrical distribution, the values
of all odd order moments about mean will always be:
(a) 1 (b) 0.5
(c) 0.25 (d) 0
32. In a normal curve $\mu \pm 0.6745\sigma$ covers:
(a) 50 % area (b) 68.27 % area
(c) 95.45 % area (d) 99.73 % area
33. The skewness and kurtosis of the normal distribution are respectively:
(a) zero and zero (b) zero and one
(c) one and zero (d) one and one
34. In a normal curve, the highest point on the curve occurs at the mean, $\mu$, which
is also the:
(a) median and mode (b) geometric mean and harmonic mean
(c) lower and upper quartiles (d) variance and standard deviation
35. The normal probability density function curve is symmetrical about the mean,
$\mu$, i.e. the area to the right of the mean is the same as the area to the left of the
mean. This means that $P(X < \mu) = P(X > \mu)$ is equal to:
(a) 0 (b) 1
(c) 0.5 (d) 0.25


36. The shape of the normal curve depends upon the value of:
(a) standard deviation (b) $Q_1$
(c) mean deviation (d) quartile deviation
37. In a standard normal distribution, the value of mode is:
(a) equal to zero (b) less than zero
(c) greater than zero (d) exactly one
38. In a standard normal distribution, the area to the left of Z = 1 is:
(a) 0.6413 (b) 0.7413
(c) 0.8413 (d) 0.3413
39. The semi inter quartile range for a standard normal random variable Z is:
(a) 0.6745 (b) 0.6745 $\sigma$
(c) 0.7979 (d) 0.7979 $\sigma$
40. The lower and upper quartiles for a standardized normal variate are
respectively:
(a) $-0.6745\sigma$ and $0.6745\sigma$ (b) -0.6745 and 0.6745
(c) $-0.7979\sigma$ and $0.7979\sigma$ (d) -0.7979 and 0.7979

41. The value of the standard deviation $\sigma$ of a normal distribution is always:
(a) equal to zero (b) greater than zero
(c) less than zero (d) equal to 1
42. The maximum ordinate of a normal curve is at:
(a) $X = \mu$ (b) $X = \mu + \sigma$
(c) $X = \mu - 2\sigma$ (d) $X = \sigma^2$
43. If X ~ N(100, 64), then standard deviation $\sigma$ is:
(a) 100 (b) 64
(c) 8 (d) 100 - 64 = 36
44. The value of second moment about the mean in a normal distribution is 5. The
fourth moment about the mean in the distribution is:
(a) 5 (b) 15
(c) 25 (d) 75
45. Most of the area under the normal curve with parameters $\mu$ and $\sigma$ lies between:
(a) $\mu - 0.5\sigma$ and $\mu + 0.5\sigma$ (b) $\mu - \sigma$ and $\mu + \sigma$
(c) $\mu - 2\sigma$ and $\mu + 2\sigma$ (d) $\mu - 3\sigma$ and $\mu + 3\sigma$
46. If X is a normal random variable having mean $\mu$, then $E|X - \mu|$ is equal to:
(a) variance (b) standard deviation
(c) quartile deviation (d) mean deviation
47. If X is a normal random variable having mean $\mu$, then $E(X - \mu)^2$ is:
(a) o,2 (b) o"
(c) 3o'r (d) Fr
48. Which of the following is possible in normal distribution:
(a) o<0 ' (b) o=0
(c)l, o>0 (d) o>n

m
49. The range of normal distribution is:
(a) 0 to n (b) 0 to $\infty$
(c) -1 to +1 (d) $-\infty$ to $+\infty$
50. The range of standard normal distribution is:
(a) 0 to n (b) 0 to $\infty$
(c) 0 to k (d) $-\infty$ to $+\infty$

#log
51. In the normal distribution, the iralue of the maximum ordinate is equal to:

#(a) (b)
(d) b
(c)
# 3 . ft,
4(b) #..
'{3:rrh the inate at points of inflection of the normal curve is equal to:

9
9 (d) ;hr E\
a t
t
"__"-t\

53. $P(\mu - \sigma < X < \mu + \sigma)$ is equal to:
(a) 0.5000 (b) 0.6827
(c) 0.9545 (d) 0.9973
54. In a normal curve $\mu \pm 2\sigma$ covers:
(a) 50 % area (b) 68.27 % area
(c) 95.45 % area (d) 99.73 % area
55. If X is N($\mu$, $\sigma^2$), the percentage of the area contained within the limits $\mu \pm 3\sigma$
is:
(a) 50 % (b) 68.27 %
(c) 95.45 % (d) 99.73 %
56. The probabilily dcnsity function of the standard normal distribution is:
,, IJ .t2
1 !!_
(a) --+e 2o (b) -+e 4

w:ln oV:Z7re

1 -",'!2
.^ft- _22
(c) u (d)
1
e4
VStr ^JZ"
\
Basic Statlstlcs Part-II

67, The equation of the normal frequency distribution is:

-i(t,)' +.i(+)'
(a)
;h" (b)
vzTr

G) #'-*(?)' (d) +" j('"r'


m
6VZ7I

o
58. If X is N($\mu$, $\sigma^2$) and if Y = a + bX, then mean and variance of Y are respectively:
(a) $\mu$ and $\sigma^2$ (b) $a + \mu$ and $b\sigma^2$
(c) $a + b\mu$ and $\sigma^2$ (d) $a + b\mu$ and $b^2\sigma^2$
;;il, o
-f ,]'it,,,ili
x1.{,'',.11

p
6r) ;;; J,-.;ffiil variat., tr,u)i'pr ; .n.",',i,

s
(a) 0.0260 ".rmar
(b) 0,4760 ,_ 1'
"l ',---
^1 o L-

(c) 0,96
og (d)' 0.9760 ?t

(b) l
60. If Z is a standard normal variate, then $P(-2.575 \le Z \le +2.575)$ is equal to:
(a) 0.9951 (b) 0.99
(c) 0.4951 (d) 0.4949
61. If Z is a standard normal variate, then $P(-1.645 \le Z \le +1.645)$ is equal to:
(a) 0.90 (b) 0.95
(c) 0.98 (d) 0.99
62. If Z is a standard normal variate, then $P(-2.33 \le Z \le +2.33)$ is equal to:
(a) 0.4901 (b) 0.6827
(c) 0.9545 (d) 0.9802
63. In normal distribution:
(a) mean = median = mode (b) mean < median < mode
(c) mean > median > mode (d) mean $\ne$ median $\ne$ mode
64. In a normal distribution $Q_1 = 20$ and $Q_3 = 40$, then mean is equal to:
(a) 20 (b) 30
(c) 40 (d) 50

65. The value of maximum ordinate in standard normal distribution is equal to:

#
I
(a) ('0) ::
ttZre

(c) :L r/zo
(d)
1

s!2n
-
1. (a)   2. (b)   3. (c)   4. (c)   5. (a)   6. (a)   7. (b)   8. (c)
9. (a)   10. (b)  11. (a)  12. (d)  13. (d)  14. (a)  15. (a)  16. (c)
17. (a)  18. (b)  19. (a)  20. (b)  21. (a)  22. (a)  23. (c)  24. (b)
25. (c)  26. (d)  27. (c)  28. (d)  29. (a)  30. (c)  31. (d)  32. (a)
33. (a)  34. (a)  35. (c)  36. (a)  37. (a)  38. (c)  39. (a)  40. (b)
41. (b)  42. (a)  43. (c)  44. (d)  45. (d)  46. (d)  47. (a)  48. (c)
49. (d)  50. (d)  51. (d)  52. (b)  53. (b)  54. (c)  55. (d)  56. (c)
57. (d)  58. (d)  59. (c)  60. (b)  61. (a)  62. (d)  63. (a)  64. (b)
65. (c)

og
1. Guests at a large hotel stay for an average of 9 days with a standard deviation
of 2.4 days. Among 1000 guests how many can be expected to stay less than 7
days? Assume that length of stay is normally distributed.
Ans. 203

9 tiailiouioi G;;rffiil;ffi;ii;
2' A rnaehine which altotnutieally paeks potatocd into baga ir

t
hnow.n to opcrage
with a mcan of r0 kg, and

a
deviatioo

t
'tandard
fincl thc pcroenrage of bags weight morc i.s,

/: / s
Ang.60ot
3. The heights of large sample of men were found to be approximately normally
distributed with mean 67.56 inches and standard deviation 2.57 inches. Find
the height exceeded by 5 % of the men.
Ans. 71.79

:*,:flf.y'forh
rll8t'r'ibuted witlr standard dcvtation B,7E
rability of waltlng to go lnto
more than 20 minurec iB 0,0ggg: tr thc warring trme
Ir normdly
minutes, flnd thi iiai-*iitrog dmo,
Ans, ll,6l
5. If X is normally distributed with a mean of 4 and a standard deviation of 4,
find the probability that X is less than 6.
Ans. 0.6915
6. Find the probability that the value of a standard normal variable is less than 2.
Ans. 0.9772
7. Find the probability that the value of a standard normal variable exceeds 1.5.
Ans. 0.0668
8. The scores made by candidates in a certain test are normally distributed with
mean equal to 500 and standard deviation equal to 100. Find the probability
that a score will be greater than 700.
Ans. 0.0228
9. A manufacturer of pipe knows that the pipe lengths it produces vary in
diameter and that the diameters are normally distributed. The mean diameter
is 1 inch and the probability that a length of pipe will have a diameter
exceeding 1.1 inches is 0.1587. Find the variance of the diameters.
Ans. 0.01 (inches)$^2$
10. The mean wages of a certain group of workers working in a factory is Rs.285
with standard deviation of Rs.50. Find the percentage of workers who get
above Rs.200.
Ans. 95.54 %

l
it.a0 lnd.the fourth
i+' For a normal dietribution the firet moment.aboutof19the distribution'
11.

b
48. Find the etandard deviation

. =
;;;;u"rt oo i.

3
Ans.2
12. In a normal distribution $\mu = 163$ and $Q_3 = 171.094$. Find the standard
deviation.
Ans. 12
13. In a normal distribution the lower quartile is 10 and the standard deviation is
10. Find the mean of the distribution.
Ans. 16.745
14. A normal distribution has the mean $\mu = 85$, standard deviation $\sigma = 4.5$. Find
the value of $Q_3$.
Ans. 88.04
15. If X is a normal random variable with mean $\mu = 113.49$ and standard deviation
$\sigma = 20$. Find the value of $Q_1$.
Ans. 100
16. Define normal distribution.
17. Write down any five properties of normal distribution.
18. Write down the equation of the normal curve
(i) with mean $\mu$ and standard deviation $\sigma$
(ii) with mean 50 and standard deviation 10.
19. Define the normal probability density function.
20. Define the normal frequency distribution.
21. Define the standard normal distribution.
22. What is the relationship between the binomial distribution and the normal
distribution?
23. Describe the important properties of the normal distribution.
24. What is a standardized normal variate?
25. Write down the ordinates of the standard normal curve at
(i) Z = 1 (ii) Z = -1.
26. Explain why odd order moments about mean equal zero for the normal
distribution.
27. Explain why $\beta_1$ equals zero for the normal distribution.
28. The normal curve is defined by the equation
$f(X) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2}$
where $\mu$ = ____ $\sigma$ = ____ $\pi$ = ____ e = ____ X = ____
29. Define the points of inflection in a normal distribution.
30. Sketch the normal curve and then place the values for the means on the
respective X and Z scales. Verify that the area under the normal curve between
the mean and 2 standard deviations above and below it is 0.9544.
31. Sketch and verify the area under the normal curve between the mean and 3
standard deviations above and below it is 0.9973.
32. When is it appropriate to use a normal approximation to the binomial
distribution?
33. Write down the basic properties of the standard normal curve.
34. Complete the following table for the normal curve with parameters $\mu$ and $\sigma$.
Draw four graphs illustrating your results.

Given X-values                                  Corresponding Z-scores     Area between X-values
$\mu - 0.6745\sigma$ and $\mu + 0.6745\sigma$   -0.6745 and +0.6745        0.50
$\mu - 1\sigma$ and $\mu + 1\sigma$
$\mu - 2\sigma$ and $\mu + 2\sigma$
$\mu - 3\sigma$ and $\mu + 3\sigma$

EXERCISES
1. If the random variable Z has the standard normal distribution, find:
(i) P(Z < 1.46) (ii) P(Z > 1.46) (iii) P(Z < -1.48)
(iv) P(Z > -1.96) (v) P(0.65 < Z < 1.99) (vi) P(0 < Z < 1.15)
(vii) P(-1.32 < Z < 1.65) (viii) P(-1.25 < Z < 0).
Ans. (i) 0.9279 (ii) 0.0721 (iii) 0.0694 (iv) 0.9750 (v) 0.2345
(vi) 0.3749 (vii) 0.8571 (viii) 0.3944.
2. In a normal distribution, M.D. = 3.9895, then find standard deviation, quartile
deviation, second and fourth moments about mean of the normal distribution.
Ans. S.D. = 5, Q.D. = 3.3725, $\mu_2 = 25$, $\mu_4 = 1875$.
3. In a normal distribution with $\mu = 20$ and $\sigma = 5$. Find the area:
(iv) between 12 and 18 (v) between 30 and 42
Ans. (i) 0.4772 (ii) 0.8413 (iii) 0.0228 (iv) 0.2898 (v) 0.0228

b
d. The mean safes of all the different branches of a big cloth shop is Rs.10000 with

3 .
pencoDtage/proportion of shops the sales of which are between Rs.11000 and
R8.12000.

9 4
9
Ang.zlf.93 %.

t
tlXis a normal variate with mean I aryd standard deviation 3, find the probability

ta
s
Ans. t I o'iazftilo.zasz

/: /
6. In a normal distribution the mean is five and the variance is one. Write down its
equation. Also find the value of maximum ordinate correct to two places of
decimals.
Ans. $f(X) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(X - 5)^2}$, 0.40
7. The scores made by candidates in a certain test are normally distributed with
mean 500 and standard deviation 100. What percent of the candidates receive
scores (i) greater than 700 (ii) less than 400 (iii) between 400 and 600 (iv) which
differ from mean by more than 150.
Ans. (i) 2.28 % (ii) 15.87 % (iii) 68.26 % (iv) 13.36 %
8. If the average height of miniature poodles is 30 centimeters, with a standard
deviation of 4.1 centimeters, what percentage of miniature poodles exceeds 35
centimeters in height, assuming that the heights follow a normal distribution
and can be measured to any desired degree of accuracy?
Ans. 11.12 %

9. The diameters of bolts manufactured by a company are normally distributed
with mean 0.25 inches and standard deviation 0.02 inches. A bolt is considered
defective if its diameter is < 0.20 or > 0.28 inches. Find the percentage of
defective bolts manufactured by the company.
Ans. 7.3 %
,/
10.Lf the rveights of ball bearings are normally distributed with mean 0.6140
newtons and standard deviation 0.0025 newtons, determine the

om
. c
(ii) gneater than 0.617 newtons (iii) less than 0.608 newtons.
o"rfr Bs.o4% (ii) 11.b1 % o.szy{
(iii) - -' l

11. Let X be a normal random variable with mean = 16 and standard deviation = 5.
Determine: (i) P( between 11 and 21 ) (ii) P( at least 26 )
(iii) P( less than or equal to 6 ) (iv) P( at most 21 )
Ans. (i) 0.6826 (ii) 0.0228 (iii) 0.0228 (iv) 0.8413
12. In a certain examination 3000 students appeared. The average marks obtained
were 50 % and standard deviation was 5 %. How many students do you expect
who obtain: (i) More than 60 % marks (ii) Less than 40 % marks
(iii) Between 40 % and 60 % marks?
Ans. (i) 68 (ii) 68 (iii) 2864.

t9 l-/
t'g. A..r-L the mean height of soldiers poffi,.{inches with a variance of I inches.

a
How many soldiers in a regiment
,"Y { lO0[would you expect to be cjver six feet
errs/sg
s t " ' N
14. The mean life of stockings used by an army was 40 days, with a standard
deviation of 8 days. Assume the life of the stockings follows a normal
distribution. If 100000 pairs are issued, how many would need replacement
(i) before 35 days? (ii) after 46 days?
Ans. (i) 26600 (ii) 22660.
15. Given that $\mu = 300$ and $\sigma^2 = 100$. Find
(i) the area above 314 (ii) the two values that contain the middle 75 % area
(iii) $Q_1$ and $Q_3$ of the normal distribution.
Ans. (i) 0.0808 (ii) 288.5, 311.5 (iii) 293.255, 306.745
16. A random variable X is normally distributed with mean = 70 and S.D. = 5.
(i) Find a point that has 87.9 % of the distribution below it.
(ii) Find a point that has 81.7 % of the distribution above it.
(iii) Find two such points between which the central 70 % of the distribution lies.
(iv) Find two such points between which the central 90 % of the distribution lies.
Ans. (i) 75.85 (ii) 65.48 (iii) 64.815, 75.185 (iv) 61.775, 78.225
17. In a normal distribution $\mu = 40$ and $\sigma = 3.8$. Find:
(i) two points such that the curve has a 98 % chance of falling between them
(ii) the chance that a single observation will be less than 38.6.
Ans. (i) 31.146, 48.854 (ii) 0.3557
18. If X is N(24, 16), then find the:
(i) 33rd percentile (ii) 9th decile.
Ans. (i) 22.24 (ii) 29.12.
19. In a normal distribution 30 % of the values are under 50 and 10 % are over 70.
Find the mean and standard deviation of the distribution.
Ans. (55.81, 11.09)
20. Given a normal distribution with $\mu = 40$ and $\sigma = 6$, find the value X that has:
(i) 38 % of the area below it.
(ii) 5 % of the area above it.
Ans. (i) 38.167 (ii) 49.87
l
J" '

b
is

.
students whose average r'i"'ight 150
21.Assume that We have a large number of th,"
lbs. and that the weights u'" "o'*'llv d"t"'b"e

3
If y"
i* ;;;; lt""Y
;;,d io a ru. w$ I !+v *re
f9*l:;l,lii
n d a r ddEi: a tion of

4
ffi ,:l?.'il il JT.i#r ffi
s t a
. ./ !\ - o.'l\tn -r

99
t
t,lrlD. , r .vv .._
i ?1 \.\,
22. In a normal distribution 25 % of the items are under 60 and 15 % are over 90.
Find the mean and standard deviation of the distribution.
Ans. (71.82, 17.53)
23. In a normal distribution the lower and upper quartiles are 18 and 26
respectively. Find its mean and standard deviation. Also find mean deviation.
Ans. 22, 5.93, 4.73
24. In a normal distribution 7 % of the items are under 35 and 89 % are under 63.
What is the mean and standard deviation of the distribution?
Ans. 50.29, 10.33
25. The heights of a large sample of men were found to be approximately normally
distributed with mean 67.56 inches and standard deviation 2.57 inches. What
height is exceeded by 5 % of the men?
Ans. 71.79.
Chapter 11
SAMPLING AND SAMPLING DISTRIBUTIONS
11.1 INTRODUCTION
In our daily life it is quite often that we have to examine some given material.
We examine fruit before we purchase it, we make a small study of the material
whenever we have to purchase something. Even the children check the sweets,
pencils, bats, rubbers and other items when they have to purchase them. This
approach is applied in different fields of life. The products of the factories are
inspected to ensure the desired quality of the products. The medicines are
manufactured on commercial scale when their effects have been tested on the
patients. The different fertilizers are tested on agricultural plots and different foods
are tested on animals. Small dams are constructed in the laboratories to study the
life and other characteristics of the big dams before they are actually constructed.
Some colour may be applied on a wall, on a door or cloth etc., and the result of the
colour is observed before it is applied on large scale. Cement, steel and bricks are
examined before using them in different places. This process of inspection is very
wide and is commonly used on various occasions. But this job is never done on very
large scale. This process is carried out on a small scale. On the basis of this small
study, we form an opinion about the entire material under study.

/: /
11.2 POPULATION
The word population or statistical population is used for all the individuals or
objects on which we have to make some study. We may be interested to know the
quality of bulbs produced in a factory. The entire product of the factory in a certain
period is called a population. We may be interested in the level of education in
primary schools. All the children in the primary schools will make a population. The
population may contain living or non-living things. The entire lot of anything under
study is called population. All the fruit trees in a garden, all the patients in a
hospital and all the cattle in a cattle farm are examples of populations in different
studies.
11.2.1 FINITE POPULATION
A population is called finite if it is possible to count its individuals. It may also
be called a countable population. The number of vehicles crossing a bridge every day,
the number of deaths per year and the number of words in a book are finite
populations. The number of units in a finite population is denoted by N. Thus N is
the size of the population.
11.2.2 INFINITE POPULATION
Sometimes it is not possible to count the units contained in the population.
Such a population is called infinite or uncountable. Let us suppose that we want to
examine whether a coin is true or not. We shall toss it a very large number of times
to observe the number of heads. All the tosses will make an infinite or countably
infinite population. The number of germs in the body of a patient of malaria is
perhaps something which is uncountable.
11.2.3 TARGET AND SAMPLED POPULATION
Suppose we have to make a study about the problems of the families living in
rented houses in a certain big city. All the families living in rented houses is our
target population. The entire target population may not be considered for the
purpose of selecting a sample from the population. Some families may not be
interested to be included in the sample. We may ignore some part of the target
population to reduce the cost of study. The population out of which the sample is
selected is called sampled population or studied population.

l o
11.3 SAMPLE
Any part of the population is called a sample. A study of the sample enables us
to make some decisions about the properties of the population. The number of units
included in the sample is called the size of the sample and is denoted by n. A good
sample is that one which speaks about the qualities of the population. A sample
study leads us to make some inferences about the population measures. This process
is called sampling.
11.3.1 PARAMETER AND STATISTIC
Any measure of the population is called parameter and the word statistic is
used for any value calculated from the sample. The population mean $\mu$ is a
parameter and the sample mean $\bar{X}$ is a statistic. The sample mean $\bar{X}$ is used to
estimate the population mean $\mu$. Similarly the population variance $\sigma^2$ is a parameter
and the sample variance $S^2$ is a statistic. In general the symbol $\theta$ is used for a
parameter and the symbol $\hat{\theta}$ is used for a statistic. The value of the parameter is
mostly unknown and the sample statistic is used to make some inferences about the
unknown parameter.
11.3.2 SAMPLING FRACTION
If the size of the population is N and the size of the sample is n, the ratio $n/N$ is called the sampling fraction. If N = 100 and n = 10, the ratio $n/N = 10/100 = 1/10$. It means that on the average 10 units of the population will be represented by one unit in the sample. If the sampling fraction is multiplied by 100, we get the sampling fraction in percentage form. Thus $\frac{n}{N} \times 100 = \frac{10}{100} \times 100 = 10$. It means 10% of the population is included in the sample.
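The sampling fraction can be checked with a few lines of Python. This is only a minimal sketch: the function name `sampling_fraction` is an assumption made for illustration, and the values N = 100 and n = 10 are taken from the example above.

```python
from fractions import Fraction


def sampling_fraction(n, N):
    """Return the sampling fraction n/N as an exact fraction (hypothetical helper)."""
    if not 0 < n <= N:
        raise ValueError("the sample size must satisfy 0 < n <= N")
    return Fraction(n, N)


f = sampling_fraction(10, 100)
percent = f * 100   # sampling fraction in percentage form

print(f)        # 1/10
print(percent)  # 10
```

Exact fractions are used here so that 1/10 is shown as written in the text rather than as a rounded decimal.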
11.4 COMPLETE COUNT
If we collect information about all the individuals in the population, the study is called complete count or complete enumeration. The word census is also used for the entire population study. In statistical studies the complete count is usually avoided. If the size of the population is large, the complete count requires a lot of time and a lot of funds. The complete count is mostly difficult for various reasons. Suppose we want to make a study about the cattle in the cattle farms in our country. We are interested in the average cost of their food for a certain period. We want to link their cost of food with their sale price. This is of course an important study. It is very difficult to collect and maintain the information about each and every head of cattle in the farms. If at all we are able to do it, the study may not be of much use. The desired information can be obtained from a sample of cattle of reasonable size.
11.4.1 POPULATION CENSUS
A complete count of the human population is called population census. In Pakistan, the first population census was conducted in 1951 and the second was conducted in 1961. The third census of population could not be conducted in 1971 because of agitations in the then East Pakistan. It was conducted in 1972. The 4th census was conducted in 1981. The fifth population census was conducted in 1998. A lot of information is collected about the human population through the population census conducted regularly after every 10 years. The census reports give information about various characteristics of the population, e.g., the urban and rural population, the skilled and un-skilled labour force, the agricultural labour force and the industrial workers, level of education and illiteracy in the country, geographical distribution of the population, age and sex distribution of the population etc.
11.5 SAMPLE SURVEY
If it is not essential to conduct the complete enumeration, then a sample of some suitable size is selected from the population and the study is carried out on the sample. This study is called sample survey. Most of the research work is done through sample surveys. The opinion of the voters in favour of certain proposed election candidates is obtained through sample surveys.
11.5.1 ADVANTAGES OF SAMPLING
Sampling has some advantages over the complete count. These are:
(i) Need for Sampling
Sometimes there is a need for sampling. Suppose we want to inspect the eggs, the bullets, the missiles and the tires of some firm. The study may be such that the objects are destroyed during the process of inspection. Obviously, we cannot afford to destroy all the eggs and the bullets etc. We have to take care that the wastage should be minimum. This is possible only in sample study. Thus sampling is essential when the units under study are destroyed.
(ii) Saves Time and Cost
As the size of the sample is small as compared to the population, the time and cost involved in a sample study are much less than in the complete count. For complete count huge funds are required. There is always the problem of finances. A small sample can be studied in a limited time and the total cost of a sample study is very small. For complete count, we need a big team of supervisors and enumerators who are to be trained and they are to be paid properly for the work they do. Thus the sample study requires less time and less cost.
(iii) Reliability
If we collect the information about all the units of population, the collected information may be true. But we are never sure about it. We do not know whether the information is true or is completely false. Thus we cannot say anything with confidence about the quality of the information. We say that the reliability cannot be assessed. This is a very important advantage of sampling. Inference about the population parameters is possible only when the data is collected from a selected sample.
(iv) Sometimes the experiments are done on sample basis. The fertilizers, the seeds and the medicines are initially tested on samples and if found useful, then they are applied on large scale. Most of the research work is done on the samples.
(v) Sample data is also used to check the accuracy of the census data.
11.5.2 LIMITATIONS OF SAMPLING
Sometimes the information about each and every unit of the population is required. This is possible only through the complete enumeration because the sample will not serve the purpose. Some examples in which the sampling is not allowed are:
(i) To conduct the elections, we need a complete list of the voters. The candidates participating in the election will not accept the results prepared from a sample. With increase in literacy, the people may become statistically minded and they may become willing to accept the results prepared from the sample. In advanced countries the opinion polls are frequently conducted and unofficially the people accept the results of sample surveys.
(ii) Tax is collected from all the tax payers. A complete list of all the tax payers is required. The telephone, gas and electricity bills are sent to all the consumers. A complete list of the owners of land and property is always prepared to maintain the records. The position of stocks in factories requires complete entries of all the items in the stock.
11.5.3 SAMPLE DESIGN
In sample studies, we have to make a plan regarding the size of the sample, selection of the sample, collection of the sample data and preparation of the final results based on the sample study. The whole procedure involved is called the sample design. The term sample survey is used for a detailed study of the sample. In general, the term sample survey is used for any study conducted on the sample taken from some real world data.
11.5.4 SAMPLING FRAME
A complete list of all the units of the population is called the sampling frame. A unit of population is a relative term. If all the workers in a factory make a population, a single worker is a unit of the population. If all the factories in a country are being studied for some purpose, a single factory is a unit of the population of factories. The sampling frame contains all the units of the population. It is to be defined clearly as to which units are to be included in the frame. The frame provides a base for the selection of the sample.
11.5.5 EQUAL PROBABILITY
The term equal probability is frequently used in the theory of sampling. This term is quite often not understood correctly. It is thought to be close to 'equal' in meaning. It is not always true. Suppose there is a population of 50 (N = 50) students in a class. We select any one student. Every student has probability 1/50 of being selected. Then a second student is selected. Now there are 49 students in the population and every student has probability 1/49 of being selected. When the first student is selected, all the students have equal (1/50) chance of selection and when the second student is selected, again all the students have equal (1/49) chance of selection. But 1/50 is not equal to 1/49. Thus equal probability of selection refers to the probability at the time when an individual is selected from the remaining available units in the population. At the time of selecting a unit, the probability of selection is equal for all the available units. It is called equal probability of selection.
11.5.6 KNOWN PROBABILITY
In sampling theory the term known probability is used in random (probability) sampling. Let us explain it by taking an example. Suppose there are 300 workers in a certain factory out of which 200 are skilled and 100 are un-skilled. We have to select one sample (sub-sample) out of skilled workers and one sample out of un-skilled workers. When the first worker out of skilled workers is selected, each worker has a probability of selection equal to 1/200. Similarly when the first worker out of un-skilled workers is selected, each worker has a probability of selection equal to 1/100. Both these probabilities are known, though they are not equal.
11.5.7 NON-ZERO PROBABILITY
Suppose we have a population of 500 students out of which 50 are non-intelligent. We have decided to select an intelligent student from the population. The probability of selecting any particular intelligent student is 1/450, which is non-zero. In this example, we have decided to exclude the non-intelligent students from the population for the purpose of selecting a sample. Thus the probability of selecting a non-intelligent student is zero.
11.6 PROBABILITY AND NON-PROBABILITY SAMPLING
The term probability sampling is used when the selection of the sample is purely based on chance. The human mind has no control on the selection or non-selection of the units for the sample. Every unit of the population has known non-zero probability of being selected for the sample. The probability of selection may be equal or unequal but it should be non-zero and should be known. The probability sampling is also called random sampling (not simple random sampling). Some examples of random sampling are:
(i) Simple random sampling
(ii) Stratified random sampling
(iii) Systematic random sampling.
In non-probability sampling, the sample is not based on chance. It is rather determined by some person. We cannot assign to an element of population the probability of its being selected in the sample. Somebody may use his personal judgement in the selection of the sample. In this case the sampling is called judgement sampling. A drawback in non-probability sampling is that such a sample cannot be used to determine the error. No statistical method can be used to draw inference from this sample. But it should be remembered that judgement sampling becomes essential in some situations. Suppose we have to take a small sample from a big heap of coal. We cannot make a list of all the pieces of coal. The upper part of the heap will have perhaps big pieces of coal. We have to use our judgement in selecting a sample to have an idea about the quality of coal. The non-probability sampling is also called non-random sampling.
11.6.1 SAMPLING WITH REPLACEMENT
Sampling is called with replacement when a unit selected at random from the population is returned to the population and then a second element is selected at random. Whenever a unit is selected, the population contains all the same units. A unit may be selected more than once. There is no change at all in the size of the population at any stage. We can assume that a sample of any size can be selected from the given population of any size. This is only a theoretical concept and in practical situations the sample is not selected by using this scheme of selection. Suppose the population size N = 5 and sample size n = 2, and sampling is done with replacement. Out of 5 elements, the first element can be selected in 5 ways. The selected unit is returned to the main lot and now the second unit can also be selected in 5 ways. Thus in total there are 5 x 5 = 25 samples or pairs which are possible. Suppose a container contains 3 good bulbs denoted by G1, G2 and G3 and 2 defective bulbs denoted by D1 and D2. If any two bulbs are selected with replacement, there are 25 possible samples listed in Table 11.1.

Table 11.1
        G1      G2      G3      D1      D2
G1      G1G1    G1G2    G1G3    G1D1    G1D2
G2      G2G1    G2G2    G2G3    G2D1    G2D2
G3      G3G1    G3G2    G3G3    G3D1    G3D2
D1      D1G1    D1G2    D1G3    D1D1    D1D2
D2      D2G1    D2G2    D2G3    D2D1    D2D2

The number of possible samples is given by $N^n = 5^2 = 25$. The selected sample will be any one of the 25 samples. Each sample has equal probability 1/25 of selection. A sample selected in this manner is called simple random sample.
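The 25 ordered pairs obtained when two bulbs are drawn with replacement can be enumerated by machine. The following is a minimal sketch using Python's standard `itertools.product`; the labels G1, G2, G3, D1, D2 follow the bulb example, while the variable names are assumptions made for illustration.

```python
from itertools import product
from fractions import Fraction

bulbs = ["G1", "G2", "G3", "D1", "D2"]   # population, N = 5
n = 2                                    # sample size

# With replacement, every position of the sample may hold any of the N units.
samples = list(product(bulbs, repeat=n))

print(len(samples))               # 25, i.e. N ** n
print(samples[:5])                # first row of the table of samples
print(Fraction(1, len(samples)))  # 1/25, probability of each sample
```

A pair such as ("G1", "G1") appears in the list because a unit returned to the lot may be drawn again.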
11.6.2 SAMPLING WITHOUT REPLACEMENT
Sampling is called without replacement when a unit is selected at random from the population and it is not returned to the main lot. The first unit is selected out of a population of size N and the second unit is selected out of the remaining population of N - 1 units and so on. Thus the size of the population goes on decreasing as the sample size n increases. The sample size n cannot exceed the population size N. The unit once selected for a sample cannot be repeated in the same sample. Thus all the units of the sample are distinct from one another. A sample without replacement can be selected either by using the idea of permutations or combinations. Depending upon the situation, we write all possible permutations or combinations. If the different arrangements of the units are to be considered, then the permutations (arrangements) are written to get all possible samples. If the arrangement of units is of no interest, we write the combinations to get all possible samples.
g
11,6.8 COMBINATI()NS

o
Let us aBarn'conhrrl,-'l.n klt (population) of 5 bulbs with lJ goorl (C,, G, ancl G,)

bl
anel 2 defeetivs (D, and Ur) bulh,*. Suppose we harve to selecb two l:ulbs in any or.rler.

there are uC, =

3 .
,h= l0 possibk: corttbiintiotts ar sotrrltlee. 'lhese cotrtlttttnliotr.s

4
(samples) are listed as,Grr_1", GrG,, G,D,,G,l),,, L],D, GrDr, G,,lJ, GiD!, D!IJ,.
G,1G,1,

9
There are 10 possrbie sarnples and each of thern has probabiht,y of sr:li:ction

t9 r u, *
cqual to l/10, The eelesterl sattrpie will be {rny orre sf tliese 1b sarnpl,rs. fno sanrple

a
seleeted in thie rnanner ie also enlled simple l'nnelorir anrnplu. In g,rni:rul, thr. ntrrntror

s t
oIaanrplee by roirt bitrutirtttsia equul to i'r(: --=F!- -=

/: /
n!(N-n)! '

11.6.4 PERMUTATIONS
Each combination generates a number of arrangements (permutations). Thus in general the number of permutations is greater than the number of combinations. In the previous example of bulbs, if the order of the selected bulbs is to be considered then the number of samples by permutations is given by $^5P_2 = \frac{5!}{(5-2)!} = 20$. These samples are:
G1G2  G2G1  G1G3  G3G1  G2G3  G3G2  G1D1  D1G1  G1D2  D2G1
G2D1  D1G2  G2D2  D2G2  G3D1  D1G3  G3D2  D2G3  D1D2  D2D1
Each sample has probability of selection equal to 1/20. The selected sample, keeping in view the order of the bulbs, will be any one of these 20 samples. A sample selected in this manner is also called simple random sample because each sample has equal probability of being selected.
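The counts of 10 combinations and 20 permutations for the bulb example can be verified with Python's standard library. This is a small illustrative sketch, not part of the original text; the variable names are assumptions.

```python
from itertools import combinations, permutations
from math import comb, perm

bulbs = ["G1", "G2", "G3", "D1", "D2"]   # population, N = 5
n = 2                                    # sample size

# Order ignored: each unordered pair is one sample.
unordered = list(combinations(bulbs, n))
# Order considered: each arrangement of a pair is a separate sample.
ordered = list(permutations(bulbs, n))

print(len(unordered), comb(5, 2))   # 10 10
print(len(ordered), perm(5, 2))     # 20 20
```

Every unordered pair gives rise to 2! = 2 ordered pairs, which is why the second count is twice the first.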
11.6.5 SIMPLE RANDOM SAMPLE
Simple random sample (SRS) is a special case of a random sample. A sample is called simple random sample if each unit of the population has an equal chance of being selected for the sample. Whenever a unit is selected for the sample, the units of the population are equally likely to be selected. It must be noted that the probability of selecting the first element is not to be compared with the probability of selecting the second unit. When the first unit is selected, all the units of the population have the equal chance of selection which is 1/N. When the second unit is selected, all the remaining (N - 1) units of the population have 1/(N - 1) chance of selection.
Another way of defining a simple random sample is that if we consider all possible samples of size n, then each possible sample has equal probability of being selected.
If sampling is done with replacement, there are $N^n$ possible samples and each sample has probability of selection equal to $1/N^n$. If sampling is done without replacement with the help of combinations then there are $^NC_n$ possible samples and each sample has probability of selection equal to $1/^NC_n$. If samples are made with permutations, each sample has probability of selection equal to $1/^NP_n$. Strictly speaking, the sample selected without replacement is called simple random sample.
11.6.6 DIFFERENCE BETWEEN RANDOM SAMPLE AND SIMPLE RANDOM SAMPLE
If each unit of the population has known (equal or un-equal) probability of selection in the sample, the sample is called a random sample. If each unit of the population has equal probability of being selected for the sample, the sample obtained is called simple random sample.
11.6.7 SELECTION OF SIMPLE RANDOM SAMPLE
A simple random sample is usually selected without replacement. The following methods are used for the selection of a simple random sample:
(i) Lottery Method
This is an old classical method but it is a powerful technique and modern methods of selection are very close to this method. All the units of the population are numbered from 1 to N. This is called sampling frame. These numbers are written on the small slips of paper or the small round metallic balls. The paper slips or the metallic balls should be of the same size otherwise the selected sample will not be truly random. The slips or the balls are thoroughly mixed and a slip or ball is picked up. Again the population of slips is mixed and the next unit is selected. In this manner, the number of slips equal to the sample size n are selected. The units of the population which appear on the selected slips make the simple random sample. This method of selection is commonly used when the size of the population is small. For a large population there is a big heap of paper slips and it is difficult to mix the slips properly.
(ii) Using a Random Number Table
All the units of the population are numbered from 1 to N or from 0 to N - 1. We consult the random number table to take a simple random sample. Suppose the size of the population is 80 and we have to select a random sample of 8 units. The units of the population are numbered from 01 to 80. We read two-digit numbers from the table of random numbers. We can take a start from any columns or rows of the table. Let us consult the random number table given in this book. Two-digit numbers are taken from the table. Any number above 80 will be ignored and if any number is repeated, we shall not record it if sampling is done without replacement. Let us read the first two columns of the table. The random numbers from the table are 10, 37, 08, 12, 66, 31, 63 and 73. The two numbers 99 and 85 have not been recorded because the population does not contain these numbers. The units of the population whose numbers have been selected constitute the simple random sample. Let us suppose that the size of the population is 100. If the units are numbered from 001 to 100, we shall have to read 3-digit random numbers. From the first 3 columns of the random number table, the random numbers are 100, 375, 084, 990, 128 and so on. We find that most of the numbers are above 100 and we are wasting our time while reading the table. We can avoid it by numbering the units of the population from 00 to 99. In this way, we shall read 2-digit numbers from the table. Thus if N is 100, 1000 or 10000, the numbering is done from 00 to 99, 000 to 999 or 0000 to 9999.
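The reading rule described for the random number table (ignore numbers outside the frame, skip repeats, stop when n units are recorded) can be sketched in Python. The function name `select_from_table` and the list of two-digit readings below are assumptions made for illustration; the readings stand in for one column of a random number table and are not a substitute for the table in the book.

```python
def select_from_table(readings, N, n):
    """Pick n distinct unit numbers in 1..N from a sequence of table readings.

    Hypothetical helper: readings above N (or below 1) are ignored and
    repeated readings are skipped, as in sampling without replacement.
    """
    selected = []
    for value in readings:
        if value < 1 or value > N:
            continue          # the population does not contain this number
        if value in selected:
            continue          # already recorded, skip the repeat
        selected.append(value)
        if len(selected) == n:
            break
    return selected


# Illustrative two-digit readings (assumed, standing in for a table column).
readings = [10, 37, 8, 99, 12, 66, 31, 85, 63, 73]
print(select_from_table(readings, N=80, n=8))
# [10, 37, 8, 12, 66, 31, 63, 73]
```

If the readings run out before n units are recorded, the function simply returns the shorter list; in practice one would continue with the next columns of the table.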
(iii) Using the Computer
The facility of selecting a simple random sample is available on the computers. The computer is used for selecting a sample of prize-bond winners, a sample of Haj applicants, a sample of applicants for residential plots and for various other purposes.
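As one possible illustration of selection by computer, Python's standard `random.sample` draws a simple random sample without replacement. The population of 80 numbered units, the sample size of 8 and the seed value are all assumptions chosen for this sketch.

```python
import random

N = 80   # assumed population size, units numbered 1 to N
n = 8    # assumed sample size

rng = random.Random(2011)          # fixed seed only so the run can be repeated
frame = list(range(1, N + 1))      # the sampling frame: unit numbers 1..N

# random.sample selects n distinct units, i.e. sampling without replacement.
sample = rng.sample(frame, n)

print(sorted(sample))
```

Removing the seed gives a different sample on every run, which is what is wanted in an actual draw.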
11.7 ERRORS
Suppose we are interested in the value of a population parameter, the true value of which is $\theta$ but is unknown. The knowledge about $\theta$ can be obtained either from a sample data or from the population data. In both cases, there is a possibility of not reaching the true value of the parameter. The difference between the calculated value (from sample data or from population data) and the true value of the parameter is called error. Thus error is something which cannot be determined accurately if the population is large and the units of the population are to be measured. Suppose we are interested to find the total production of wheat in Pakistan in a certain year. Sufficient funds and time are at our disposal and we want to get the 'true' figure about production of wheat. The maximum we can do is that we contact all the farmers and suppose all the farmers give maximum cooperation and supply the information as honestly as possible. But the information supplied by the farmers will have errors in most of the cases. Thus we may not be able to identify the 'true' figure. In spite of all efforts, we shall be in darkness. The calculated or the observed figure may be good for all practical purposes but we can never claim that a true value of the parameter has been obtained. If the study of the units is based on 'counting', maybe we can get the true figure of the population parameter. There are two kinds of errors: (i) sampling errors or random errors (ii) non-sampling errors.
11.7.1 SAMPLING ERRORS
These are the errors which occur due to the nature of sampling. The sample selected from the population is one of all possible samples. Any value calculated from the sample is based on the sample data and is called sample statistic. The sample statistic may or may not be close to the population parameter. If the statistic is $\hat{\theta}$ and the true value of the population parameter is $\theta$, then the difference $\hat{\theta} - \theta$ is called sampling error. It is important to note that a statistic is a random variable and it may take any value. A particular example of sampling error is the difference between the sample mean $\bar{X}$ and the population mean $\mu$. Thus the sampling error is also a random variable. The population parameter is usually not known, therefore the sampling error is estimated from the sample data. The sampling error is due to the reason that only a certain part of the population goes to the sample. Obviously, a part of the population cannot give the true picture of the properties of the population. But one should not get the impression that a sample always gives the result which is full of errors. We can design a sample and collect the sample data in a manner so that the sampling errors are reduced. The sampling errors can be reduced by the following methods:
(i) by increasing the size of the sample (ii) by stratification.
(i) By Increasing the Size of the Sample
The sampling error can be reduced by increasing the sample size. If the sample size n is equal to the population size N, then the sampling error is zero.
(ii) By Stratification
When the population contains homogeneous units, a simple random sample is likely to be representative of the population. But if the population contains dissimilar units, a simple random sample may fail to be representative of all kinds of units in the population. To improve the result of the sample, the sample design is modified. The population is divided into different groups containing similar units. These groups are called strata. From each group (stratum), a sub-sample is selected in a random manner. Thus all the groups are represented in the sample and the sampling error is reduced. It is called stratified random sampling. The size of the sub-sample from each stratum is frequently in proportion to the size of the stratum. Suppose a population consists of 1000 students out of which 600 are intelligent and 400 are non-intelligent. We are assuming here that we do have this much information about the population. A stratified sample of size n = 100 is to be selected. The sizes of the strata are denoted by $N_1$ and $N_2$ respectively and the sizes of the samples from the strata may be denoted by $n_1$ and $n_2$. It is written as under:

Stratum No.   Size of stratum              Size of sample from each stratum
1             $N_1 = 600$                  $n_1 = \frac{n N_1}{N} = \frac{100 \times 600}{1000} = 60$
2             $N_2 = 400$                  $n_2 = \frac{n N_2}{N} = \frac{100 \times 400}{1000} = 40$
Total         $N_1 + N_2 = N = 1000$       $n_1 + n_2 = n = 100$

The size of the sample from each stratum has been calculated according to the size of the stratum. This is called proportional allocation. In the above sample design, the sampling fraction in the population is $\frac{n}{N} = \frac{100}{1000} = \frac{1}{10}$ and the sampling fraction in both the strata is also $\frac{1}{10}$. Thus this design is also called fixed sampling fraction. This modified sample design is frequently used in sample surveys. But this design requires some prior information about the units of the population. On the basis of this information, the population is divided into different strata. If the prior information is not available then the stratification is not applicable.
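The arithmetic of proportional allocation can be reproduced with a short Python sketch. The function name `proportional_allocation` and the stratum labels are assumptions made for illustration; the sizes 600 and 400 and the total sample size 100 come from the student example.

```python
from fractions import Fraction


def proportional_allocation(strata_sizes, n):
    """Return n_h = n * N_h / N for every stratum (hypothetical helper).

    The results are exact fractions; when they are not whole numbers
    they have to be rounded by the investigator.
    """
    N = sum(strata_sizes.values())
    return {name: Fraction(n * size, N) for name, size in strata_sizes.items()}


strata = {"intelligent": 600, "non-intelligent": 400}
allocation = proportional_allocation(strata, n=100)

for name, n_h in allocation.items():
    # Sampling fraction inside each stratum equals n/N = 1/10.
    print(name, n_h, n_h / strata[name])
# intelligent 60 1/10
# non-intelligent 40 1/10
```

The printed sampling fraction is the same in every stratum, which is the sense in which the design has a fixed sampling fraction.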
11.7.3 NON-SAMPLING ERRORS
There are certain sources of errors which occur both in the sample survey as well as in the complete enumeration. These errors are of common nature. Suppose we study each and every unit of the population. The population parameter under study is the population mean and the 'true' value of it is $\mu$ which is unknown. We hope to get the value of $\mu$ by a complete count of all the units of the population. We get a value called 'calculated' or 'observed' value of the population mean. This observed value may be denoted by $\mu_{cal}$. The difference between $\mu_{cal}$ and $\mu$ (true) is called non-sampling error. Even if we study the population units under ideal conditions, there may still be a difference between the observed value of the population mean and the true value of the population mean. Non-sampling errors may occur due to many reasons. Some of them are:
(i) The units of the population may not be defined properly. Suppose we have to carry out a study about skilled labour force in our country. Who is a skilled person? Some people do more than one job. Some do the secretarial as well as the technical jobs. Some are skilled but they are doing the job of an un-skilled worker. Thus it is important to clearly define the units of the population otherwise there will be non-sampling errors both in the population count and the sample study.
(ii) There may be poor response on the part of respondents. The people do not supply correct information about their income, children, their age and property etc. These errors are likely to be of higher magnitude in the population study than in the sample study. To reduce these errors, the respondents are to be ...
(iii) The things in human hands are likely to be mis-handled. The enumerators may be careless or they may not be able to maintain uniformity from place to place. The data may not be collected properly from the population or from the sample. These errors are likely to be more serious in the population data than in the sample data.
(iv) Another serious error is due to 'bias'. Bias means an error on the part of the enumerator or the respondent when the data is being collected. Bias may be intentional or un-intentional. An enumerator may not be capable of reporting the correct data. If he is asked to report about the condition of crops in different areas after heavy rains, his assessment may be biased due to lack of training or he may be inclined to give wrong reports. Bias is a serious error and cannot be reduced by increasing the sample size. Bias may be present in the sample study as well as the population study.
11.8 SAMPLING DISTRIBUTIONS
Suppose we have a finite population and we draw all possible simple random samples of size n without replacement or with replacement. For each sample we calculate some statistic (sample mean $\bar{X}$ or proportion $\hat{p}$ etc.). All possible values of the statistic make a probability distribution which is called the sampling distribution. The number of all possible samples is usually very large and obviously the number of statistics (any function of the sample) will be equal to the number of samples if one and only one statistic is calculated from each sample. In fact, in practical situations, the sampling distribution has a very large number of values. Some of the famous sampling distributions are:
(i) Binomial distribution, (ii) Normal distribution, (iii) t-distribution, (iv) Chi-square distribution, (v) F-distribution.
These distributions are called the derived distributions because they are derived from all possible samples.
11.8.1 STANDARD ERROR
The standard deviation of some statistic is called the standard error of that statistic. If the statistic is $\bar{X}$, the standard deviation of all possible values of $\bar{X}$ is called standard error of $\bar{X}$, which may be written as S.E.($\bar{X}$) or $\sigma_{\bar{X}}$. Similarly, if the sample statistic is proportion $\hat{p}$, the standard deviation of all possible values of $\hat{p}$ is called standard error of $\hat{p}$ and is denoted by $\sigma_{\hat{p}}$ or S.E.($\hat{p}$).
11.8.2 SAMPLING DISTRIBUTION OF X̄
The probability distribution of all possible values of X̄ calculated from all possible
simple random samples is called the sampling distribution of X̄. In brief, we shall call
it the distribution of X̄. The mean of this distribution is called the expected value
of X̄ and is written as E(X̄) or μ_X̄. The standard deviation (standard error) of this
distribution is denoted by S.E.(X̄) or σ_X̄ and the variance of X̄ is denoted by Var(X̄)
or σ²_X̄. The distribution of X̄ has some important properties as under:

(i) An important property of the distribution of X̄ is that it is a normal distribution
when the size of the sample is large. When the sample size n is more than 30, we call it
a large sample size. The shape of the population distribution does not matter. The
population may be normal or non-normal; the distribution of X̄ is normal for n > 30.
But this is true when the number of samples is very large.
As the distribution of the random variable X̄ is normal, X̄ can be transformed into the
standard normal variable Z where

    Z = (X̄ - μ) / (σ / √n)

The distribution of X̄ has the t-distribution when the population is normal and n ≤ 30.
Diagram (a) shows the normal distribution.

(ii) The mean of the distribution of X̄ is equal to the mean of the population. Thus

    E(X̄) = μ_X̄ = μ (population mean)

This relation is true for small as well as large sample sizes, in sampling without
replacement and with replacement.

(iii) The standard error (standard deviation) of X̄ is related to the standard deviation
of the population σ through the relations:

    S.E.(X̄) = σ_X̄ = σ / √n

This is true when the population is infinite, which means N is very large, or the
sampling is done with replacement from a finite or infinite population.

    S.E.(X̄) = σ_X̄ = (σ / √n) √[(N - n)/(N - 1)]

This is true when sampling is without replacement from a finite population. The above
two equations between σ_X̄ and σ are true both for small as well as large sample sizes.
Example 11.1
Draw all possible samples of size 2 without replacement from a population consisting
of 3, 6, 9, 12, 15. Form the sampling distribution of sample means and verify the
results:
(i) E(X̄) = μ    (ii) Var(X̄) = (σ²/n) [(N - n)/(N - 1)]
Solution:
We have population values 3, 6, 9, 12, 15, population size N = 5 and sample size n = 2.
Thus, the number of possible samples which can be drawn without replacement is
C(N, n) = C(5, 2) = 10.

Sample No.   Sample Values   Sample Mean (X̄)
    1           3, 6              4.5
    2           3, 9              6.0
    3           3, 12             7.5
    4           3, 15             9.0
    5           6, 9              7.5
    6           6, 12             9.0
    7           6, 15            10.5
    8           9, 12            10.5
    9           9, 15            12.0
   10          12, 15            13.5

The sampling distribution of the sample mean X̄ and its mean and variance are:

   X̄      f     f(X̄)     X̄ f(X̄)     X̄² f(X̄)
   4.5    1     1/10     4.5/10      20.25/10
   6.0    1     1/10     6.0/10      36.00/10
   7.5    2     2/10    15.0/10     112.50/10
   9.0    2     2/10    18.0/10     162.00/10
  10.5    2     2/10    21.0/10     220.50/10
  12.0    1     1/10    12.0/10     144.00/10
  13.5    1     1/10    13.5/10     182.25/10
  Total  10     1       90/10       877.5/10

E(X̄) = ΣX̄ f(X̄) = 90/10 = 9
Var(X̄) = ΣX̄² f(X̄) - [ΣX̄ f(X̄)]² = 877.5/10 - (90/10)² = 87.75 - 81 = 6.75

The mean and variance of the population are:

  X     3    6    9    12    15    ΣX = 45
  X²    9   36   81   144   225    ΣX² = 495

μ = ΣX/N = 45/5 = 9 and σ² = ΣX²/N - (ΣX/N)² = 495/5 - (45/5)² = 99 - 81 = 18

Verification:
(i) E(X̄) = μ = 9
(ii) Var(X̄) = (σ²/n) [(N - n)/(N - 1)] = (18/2) [(5 - 2)/(5 - 1)] = 6.75
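
The enumeration in the preceding example can be checked mechanically. The following is a minimal sketch, assuming Python with only the standard library; the variable names are illustrative and not part of the text. It lists every sample of size 2 drawn without replacement and compares the variance of the sample means with the finite population formula.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]
n = 2
N = len(population)

# Every sample of size n drawn without replacement (order ignored).
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Mean and variance of the sampling distribution of the sample mean.
e_xbar = sum(means) / len(means)
var_xbar = sum(m * m for m in means) / len(means) - e_xbar ** 2

# Population mean and variance (divisor N).
mu = Fraction(sum(population), N)
sigma2 = Fraction(sum(x * x for x in population), N) - mu ** 2

# Variance of the sample mean with the finite population correction.
var_formula = sigma2 / n * Fraction(N - n, N - 1)

print(len(samples), e_xbar, float(var_xbar), float(var_formula))
```

Exact fractions are used so that the comparison between the enumerated variance and the formula is not affected by rounding.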
Example 11.2
If random samples of size three are drawn without replacement from the population
consisting of four numbers 4, 5, 5, 7, find the sample mean X̄ for each sample and make
the sampling distribution of X̄. Calculate the mean and standard deviation of this
sampling distribution. Compare your calculations with the population parameters.
Solution:
We have population values 4, 5, 5, 7, population size N = 4 and sample size n = 3.
Thus, the number of possible samples which can be drawn without replacement is
C(N, n) = C(4, 3) = 4.

Sample No.   Sample Values   Sample Mean (X̄)
    1          4, 5, 5           14/3
    2          4, 5, 7           16/3
    3          4, 5, 7           16/3
    4          5, 5, 7           17/3

The sampling distribution of the sample mean X̄ and its mean and standard deviation
are:

   X̄      f     f(X̄)     X̄ f(X̄)     X̄² f(X̄)
  14/3    1     1/4      14/12       196/36
  16/3    2     2/4      32/12       512/36
  17/3    1     1/4      17/12       289/36
  Total   4     1        63/12       997/36

E(X̄) = ΣX̄ f(X̄) = 63/12 = 5.25
σ_X̄ = √{ΣX̄² f(X̄) - [ΣX̄ f(X̄)]²} = √[997/36 - (63/12)²] = 0.3632

The mean and standard deviation of the population are:

  X     4    5    5    7    ΣX = 21
  X²   16   25   25   49    ΣX² = 115

μ = ΣX/N = 21/4 = 5.25 and σ = √[ΣX²/N - (ΣX/N)²] = √[115/4 - (21/4)²] = 1.0897

Hence E(X̄) = μ = 5.25 and σ_X̄ = (σ/√n) √[(N - n)/(N - 1)] = (1.0897/√3) √(1/3) = 0.3632
Example 11.3
Draw all possible samples of size two with replacement from the population 2, 2, 8.
Show that the population mean is equal to the mean of means of all samples and the
population variance is twice the variance of sample means.
Solution:
We have population values 2, 2, 8, population size N = 3 and sample size n = 2.
Thus, the number of possible samples which can be drawn with replacement is
N^n = 3² = 9.

Sample No.   Sample Values   Sample Mean (X̄)
    1           2, 2               2
    2           2, 2               2
    3           2, 8               5
    4           2, 2               2
    5           2, 2               2
    6           2, 8               5
    7           8, 2               5
    8           8, 2               5
    9           8, 8               8

The sampling distribution of the sample mean X̄ and its mean and variance are:

   X̄     Tally    f    f(X̄)    X̄ f(X̄)    X̄² f(X̄)
   2     ||||     4    4/9      8/9        16/9
   5     ||||     4    4/9     20/9       100/9
   8     |        1    1/9      8/9        64/9
  Total           9    1       36/9       180/9

E(X̄) = ΣX̄ f(X̄) = 36/9 = 4
Var(X̄) = ΣX̄² f(X̄) - [ΣX̄ f(X̄)]² = 180/9 - (36/9)² = 20 - 16 = 4
2 Var(X̄) = 2(4) = 8

The mean and variance of the population are:

  X     2    2    8    ΣX = 12
  X²    4    4   64    ΣX² = 72

μ = ΣX/N = 12/3 = 4 and σ² = ΣX²/N - (ΣX/N)² = 72/3 - (12/3)² = 24 - 16 = 8

Hence E(X̄) = μ = 4 and σ² = 2 Var(X̄) = 8.
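
The with-replacement case can be enumerated in the same way. The sketch below is illustrative only and assumes Python's standard library; the names are not from the text. It treats each ordered pair as an equally likely sample and checks that Var(X̄) = σ²/n, which for n = 2 gives the "twice the variance" relation of the example above.

```python
from fractions import Fraction
from itertools import product

population = [2, 2, 8]
n = 2
N = len(population)

# With replacement, order matters: N**n equally likely ordered samples.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

e_xbar = sum(means) / len(means)
var_xbar = sum(m * m for m in means) / len(means) - e_xbar ** 2

# Population mean and variance (divisor N).
mu = Fraction(sum(population), N)
sigma2 = Fraction(sum(x * x for x in population), N) - mu ** 2

print(len(samples), e_xbar, var_xbar, sigma2)
```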
Example 11.4
A population has the values 10, 12, 14, 16, 18 and 20. Draw all possible samples of
size 2 without replacement and calculate the sample mean X̄ for each sample. Write the
sampling distribution of X̄. Find the following probabilities:
(i) X̄ will be greater than 16.
(ii) X̄ will differ from μ by less than 3 units.
(iii) Sampling error will be less than 2.
(iv) X̄ will be equal to μ.
Solution:
All possible samples of size 2 will be equal to 6C2 = 6!/(2! 4!) = 15.
The samples, their means and necessary calculations are as under:

Sample No.   Sample Values   Sample Mean (X̄)
    1          10, 12             11
    2          10, 14             12
    3          10, 16             13
    4          10, 18             14
    5          10, 20             15
    6          12, 14             13
    7          12, 16             14
    8          12, 18             15
    9          12, 20             16
   10          14, 16             15
   11          14, 18             16
   12          14, 20             17
   13          16, 18             17
   14          16, 20             18
   15          18, 20             19

Sampling Distribution of X̄

   X̄      f     f(X̄)
   11     1     1/15
   12     1     1/15
   13     2     2/15
   14     2     2/15
   15     3     3/15
   16     2     2/15
   17     2     2/15
   18     1     1/15
   19     1     1/15
  Total  15     1

Population mean μ = ΣX/N = 90/6 = 15

(i) P(X̄ > 16) = 2/15 + 1/15 + 1/15 = 4/15
(ii) X̄ will differ from μ by less than 3 units if X̄ is greater than 12 and is less
than 18.
Thus P[|X̄ - μ| < 3] = P(12 < X̄ < 18) = 2/15 + 2/15 + 3/15 + 2/15 + 2/15 = 11/15
(iii) The sampling error will be less than 2 if the random variable X̄ is greater than
13 and less than 17. Thus P(13 < X̄ < 17) = P(14 ≤ X̄ ≤ 16) = 2/15 + 3/15 + 2/15 = 7/15
(iv) P(X̄ = μ) = P(X̄ = 15) = 3/15
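
The four probabilities above can be read off a computed sampling distribution. This is a minimal sketch under the assumption that every sample of size 2 is equally likely; the helper name `prob` is hypothetical and not from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [10, 12, 14, 16, 18, 20]
samples = list(combinations(population, 2))
total = len(samples)

# Sampling distribution: sample mean -> probability.
counts = Counter(Fraction(sum(s), 2) for s in samples)
dist = {mean: Fraction(count, total) for mean, count in counts.items()}

mu = Fraction(sum(population), len(population))

def prob(event):
    """Probability that the sample mean satisfies the given condition."""
    return sum(p for mean, p in dist.items() if event(mean))

p_i = prob(lambda m: m > 16)             # (i)   mean greater than 16
p_ii = prob(lambda m: abs(m - mu) < 3)   # (ii)  differs from mu by less than 3
p_iii = prob(lambda m: abs(m - mu) < 2)  # (iii) sampling error less than 2
p_iv = prob(lambda m: m == mu)           # (iv)  mean equal to mu

print(p_i, p_ii, p_iii, p_iv)
```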
Example 11.5
Certain tubes produced by a company have a mean lifetime of 900 hours and a
standard deviation of 100 hours. The company sends out 2000 lots of 100 tubes each.
Compute the mean and standard deviation of the sampling distribution of the
sample mean X̄ if sampling is done: (i) with replacement (ii) without replacement.
Solution:
Here N = 2000, n = 100, μ = 900, σ = 100
(i) Sampling with replacement
μ_X̄ = μ = 900 and σ_X̄ = σ/√n = 100/√100 = 10
(ii) Sampling without replacement
μ_X̄ = μ = 900 and σ_X̄ = (σ/√n) √[(N - n)/(N - 1)] = (100/√100) √[(2000 - 100)/(2000 - 1)] = 9.75
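
The two standard error formulas used above differ only by the finite population correction. The sketch below shows one way to package them; the function name `standard_error_mean` and its optional `N` argument are assumptions made for illustration, not notation from the text.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    Leave N as None for sampling with replacement (or a very large
    population); pass the population size N for sampling without
    replacement from a finite population.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_with = standard_error_mean(100, 100)
se_without = standard_error_mean(100, 100, N=2000)
print(se_with, round(se_without, 2))
```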
11.8.3 SAMPLING DISTRIBUTION OF s² and S²
Suppose we draw all possible samples of size n from a finite population and
calculate the sample variance s² = Σ(X - X̄)²/(n - 1) for each sample. The mean of the
sampling distribution of s² is denoted by E(s²) or μ_s². It can be shown that if
sampling is with replacement, then E(s²) = μ_s² = σ². Thus s² is an unbiased estimator
of σ². The sample variance S² is defined as: S² = Σ(X - X̄)²/n. If samples are drawn with
replacement, it can be shown that:

    E(S²) = [(n - 1)/n] σ²

Thus S² is a biased estimator of σ². In case of sampling without replacement,
we have the following relations:

    E(s²) or μ_s² = [N/(N - 1)] σ²
    E(S²) or μ_S² = [(n - 1)/n] [N/(N - 1)] σ²

Example 11.6
A population consists of three numbers 10, 12, 14. Take all possible samples of
size two with replacement from this population. Find the mean and the unbiased
variance for each sample. Show that E(s²) = σ² where s² = Σ(X - X̄)²/(n - 1).
Solution:
We have population values 10, 12, 14, population size N = 3 and sample size
n = 2. Thus, the number of possible samples which can be drawn with replacement is
N^n = 3² = 9.

Sample No.   Sample Values   Sample Mean X̄ = ΣX/n    Sample Variance s² = Σ(X - X̄)²/(n - 1)
    1          10, 10        (10 + 10)/2 = 10        [(10 - 10)² + (10 - 10)²]/(2 - 1) = 0
    2          10, 12        (10 + 12)/2 = 11        [(10 - 11)² + (12 - 11)²]/(2 - 1) = 2
    3          10, 14        (10 + 14)/2 = 12        [(10 - 12)² + (14 - 12)²]/(2 - 1) = 8
    4          12, 10        (12 + 10)/2 = 11        [(12 - 11)² + (10 - 11)²]/(2 - 1) = 2
    5          12, 12        (12 + 12)/2 = 12        [(12 - 12)² + (12 - 12)²]/(2 - 1) = 0
    6          12, 14        (12 + 14)/2 = 13        [(12 - 13)² + (14 - 13)²]/(2 - 1) = 2
    7          14, 10        (14 + 10)/2 = 12        [(14 - 12)² + (10 - 12)²]/(2 - 1) = 8
    8          14, 12        (14 + 12)/2 = 13        [(14 - 13)² + (12 - 13)²]/(2 - 1) = 2
    9          14, 14        (14 + 14)/2 = 14        [(14 - 14)² + (14 - 14)²]/(2 - 1) = 0

The sampling distribution of the sample variance s² and its mean is:

   s²    Tally    f    f(s²)    s² f(s²)
   0     |||      3    3/9       0
   2     ||||     4    4/9       8/9
   8     ||       2    2/9      16/9
  Total           9    1        24/9

E(s²) = Σs² f(s²) = 24/9 = 2.67

The variance of the population is:

  X     10    12    14    ΣX = 36
  X²   100   144   196    ΣX² = 440

σ² = ΣX²/N - (ΣX/N)² = 440/3 - (36/3)² = 146.67 - 144 = 2.67

Hence E(s²) = σ² = 2.67
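
The claim that s² is unbiased while S² is biased under sampling with replacement can be checked by enumeration. The following is a sketch only, assuming Python's standard library; `sum_sq_dev` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

population = [10, 12, 14]
n = 2
N = len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

# All N**n ordered samples drawn with replacement.
samples = list(product(population, repeat=n))

e_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)  # divisor n - 1
e_S2 = sum(sum_sq_dev(s) / n for s in samples) / len(samples)        # divisor n

# Population variance (divisor N).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

print(e_s2, e_S2, sigma2)
```

With these values the divisor n - 1 reproduces σ² exactly, while the divisor n gives (n - 1)/n times σ².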


Example 11.7
A population consists of five values 4, 6, 8, 10, 12. Take all possible samples of
size two without replacement from this population and verify that

    E(S²) = [(n - 1)/n] [N/(N - 1)] σ²

Solution:
We have population values 4, 6, 8, 10, 12, population size N = 5 and sample
size n = 2. Thus, the number of possible samples which can be drawn without
replacement is C(N, n) = C(5, 2) = 10.

Sample No.   Sample Values   Sample Mean X̄ = ΣX/n   Sample Variance S² = Σ(X - X̄)²/n
    1           4, 6          (4 + 6)/2 = 5          [(4 - 5)² + (6 - 5)²]/2 = 1
    2           4, 8          (4 + 8)/2 = 6          [(4 - 6)² + (8 - 6)²]/2 = 4
    3           4, 10         (4 + 10)/2 = 7         [(4 - 7)² + (10 - 7)²]/2 = 9
    4           4, 12         (4 + 12)/2 = 8         [(4 - 8)² + (12 - 8)²]/2 = 16
    5           6, 8          (6 + 8)/2 = 7          [(6 - 7)² + (8 - 7)²]/2 = 1
    6           6, 10         (6 + 10)/2 = 8         [(6 - 8)² + (10 - 8)²]/2 = 4
    7           6, 12         (6 + 12)/2 = 9         [(6 - 9)² + (12 - 9)²]/2 = 9
    8           8, 10         (8 + 10)/2 = 9         [(8 - 9)² + (10 - 9)²]/2 = 1
    9           8, 12         (8 + 12)/2 = 10        [(8 - 10)² + (12 - 10)²]/2 = 4
   10          10, 12         (10 + 12)/2 = 11       [(10 - 11)² + (12 - 11)²]/2 = 1

The sampling distribution of the sample variance S² and its mean is:

   S²     f     f(S²)     S² f(S²)
   1      4     4/10       4/10
   4      3     3/10      12/10
   9      2     2/10      18/10
  16      1     1/10      16/10
  Total  10     1         50/10

E(S²) = μ_S² = ΣS² f(S²) = 50/10 = 5

The variance of the population is:

  X     4    6    8    10    12    ΣX = 40
  X²   16   36   64   100   144    ΣX² = 360

σ² = ΣX²/N - (ΣX/N)² = 360/5 - (40/5)² = 72 - 64 = 8

Hence E(S²) = [(n - 1)/n] [N/(N - 1)] σ² = (1/2)(5/4)(8) = 5
Example 11.8
A population of 10 numbers has a mean of 100 and a standard deviation of 10.
If samples of size 5 are drawn from this population, find the mean of the sampling
distribution of variances when sampling is done
(i) with replacement (ii) without replacement.
Solution:
Here N = 10, μ = 100, σ = 10, σ² = 100, n = 5
(i) Sampling with replacement
E(S²) = μ_S² = [(n - 1)/n] σ² = (4/5)(100) = 80
(ii) Sampling without replacement
E(S²) = μ_S² = [N/(N - 1)] [(n - 1)/n] σ² = (10/9)(4/5)(100) = 88.89
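
The relations for E(S²) used in the two preceding examples can be sketched as a small function and checked against a full enumeration. The function name `expected_S2` and its optional `N` argument are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

def expected_S2(sigma2, n, N=None):
    """E(S^2) for the divisor-n sample variance.

    Leave N as None for sampling with replacement; pass the population
    size N for sampling without replacement.
    """
    value = Fraction(n - 1, n) * sigma2
    if N is not None:
        value *= Fraction(N, N - 1)
    return value

# Enumeration check: samples of size 2 without replacement from 4, 6, 8, 10, 12.
population = [4, 6, 8, 10, 12]
n, N = 2, len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def S2(sample):
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / len(sample)

samples = list(combinations(population, n))
e_S2 = sum(S2(s) for s in samples) / len(samples)

print(e_S2, expected_S2(sigma2, n, N))
print(expected_S2(100, 5), float(expected_S2(100, 5, N=10)))
```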
11.8.4 SAMPLING DISTRIBUTION OF DIFFERENCE BETWEEN TWO MEANS
Suppose there is a population with mean μ1 and variance σ1². Another
population has the mean μ2 and variance σ2². All possible simple random samples of
size n1 are selected from the first population and the sample means X̄1 for each
sample are calculated. Similarly, all possible simple random samples of size n2 are
selected from the second population and the sample means X̄2 are calculated. The
difference (X̄1 - X̄2) is another random variable and its distribution is called the
sampling distribution of X̄1 - X̄2. Some properties of this distribution are:

(i) The mean of the distribution of X̄1 - X̄2 is equal to the difference μ1 - μ2. Thus

    E(X̄1 - X̄2) = μ_(X̄1 - X̄2) = μ1 - μ2

Similarly the distribution of X̄2 - X̄1 has the mean μ_(X̄2 - X̄1) = μ2 - μ1.
If μ1 = μ2 then E(X̄1 - X̄2) = 0.
The above relations are true for any type of population with any sample size,
small or large, and the samples may be drawn without replacement or with
replacement.

(ii) When samples are selected without replacement from a finite population,
the standard error of X̄1 - X̄2 has the following relation with σ1² and σ2²:

    S.E.(X̄1 - X̄2) = σ_(X̄1 - X̄2) = √{ (σ1²/n1) [(N1 - n1)/(N1 - 1)] + (σ2²/n2) [(N2 - n2)/(N2 - 1)] }

When samples are drawn with replacement or they are drawn from infinite
populations (N1 and N2 are very large), the relation becomes:

    S.E.(X̄1 - X̄2) = σ_(X̄1 - X̄2) = √(σ1²/n1 + σ2²/n2)

It may be noted that in practical life, N1 and N2 are usually very large and the
fractions (N1 - n1)/(N1 - 1) and (N2 - n2)/(N2 - 1) are almost equal to unity. Thus in
the subsequent chapter, we shall frequently use the relation
σ_(X̄1 - X̄2) = √(σ1²/n1 + σ2²/n2).

(iii) The sampling distribution of X̄1 - X̄2 is a normal distribution when n1 > 30 and
n2 > 30. The sample sizes n1 and n2 may be equal or unequal but both should be
large. The difference (X̄1 - X̄2) is a random variable with normal
distribution and the standard normal variable Z can be written as

    Z = [(X̄1 - X̄2) - (μ1 - μ2)] / √(σ1²/n1 + σ2²/n2)

The distribution of X̄1 - X̄2 has the t-distribution when both n1 and n2 are small
in size.
Example 11.9
Draw all possible random samples of size n1 = 2 without replacement from the
finite population 2, 2, 6. Similarly, draw all possible random samples of size n2 = 2
without replacement from the population 1, 1, 2, 4.
(i) Find the possible differences between the sample means of the two populations.
(ii) Construct the sampling distribution of X̄1 - X̄2 and compute its mean and
variance.
(iii) Verify that: E(X̄1 - X̄2) = μ1 - μ2 and
Var(X̄1 - X̄2) = (σ1²/n1) [(N1 - n1)/(N1 - 1)] + (σ2²/n2) [(N2 - n2)/(N2 - 1)]
Solution:
Population I: 2, 2, 6                     Population II: 1, 1, 2, 4
Population size N1 = 3                    Population size N2 = 4
Sample size n1 = 2                        Sample size n2 = 2
The number of possible samples which can be drawn without replacement is
C(N1, n1) = C(3, 2) = 3 from Population I and C(N2, n2) = C(4, 2) = 6 from Population II.

From Population I                          From Population II
Sample No.  Sample Values  Mean (X̄1)       Sample No.  Sample Values  Mean (X̄2)
    1          2, 2           2                1          1, 1           1.0
    2          2, 6           4                2          1, 2           1.5
    3          2, 6           4                3          1, 4           2.5
                                               4          1, 2           1.5
                                               5          1, 4           2.5
                                               6          2, 4           3.0

(i) The 18 possible differences X̄1 - X̄2 are shown in the following table.

   X̄2 \ X̄1      2       4       4
   1.0          1.0     3.0     3.0
   1.5          0.5     2.5     2.5
   2.5         -0.5     1.5     1.5
   1.5          0.5     2.5     2.5
   2.5         -0.5     1.5     1.5
   3.0         -1.0     1.0     1.0

(ii) The sampling distribution of differences between sample means X̄1 - X̄2 and its
mean and variance are computed below.

  X̄1 - X̄2 = d     f     f(d)     d f(d)     d² f(d)
     -1.0         1     1/18     -1/18       1.0/18
     -0.5         2     2/18     -1/18       0.5/18
      0.5         2     2/18      1/18       0.5/18
      1.0         3     3/18      3/18       3.0/18
      1.5         4     4/18      6/18       9.0/18
      2.5         4     4/18     10/18      25.0/18
      3.0         2     2/18      6/18      18.0/18
     Total       18     1        24/18      57/18

E(X̄1 - X̄2) = μ_d = Σd f(d) = 24/18 = 4/3
Var(X̄1 - X̄2) = Var(d) = Σd² f(d) - [Σd f(d)]² = 57/18 - (24/18)² = 25/18

(iii) The mean and variance of the first population are:

  X1     2    2    6    ΣX1 = 10
  X1²    4    4   36    ΣX1² = 44

μ1 = ΣX1/N1 = 10/3 and σ1² = ΣX1²/N1 - (ΣX1/N1)² = 44/3 - (10/3)² = 32/9

The mean and variance of the second population are:

  X2     1    1    2    4    ΣX2 = 8
  X2²    1    1    4   16    ΣX2² = 22

μ2 = ΣX2/N2 = 8/4 = 2 and σ2² = ΣX2²/N2 - (ΣX2/N2)² = 22/4 - (8/4)² = 3/2

μ1 - μ2 = 10/3 - 2 = 4/3

(σ1²/n1) [(N1 - n1)/(N1 - 1)] + (σ2²/n2) [(N2 - n2)/(N2 - 1)]
    = (32/18) [(3 - 2)/(3 - 1)] + (3/4) [(4 - 2)/(4 - 1)] = 8/9 + 1/2 = 25/18

Hence E(X̄1 - X̄2) = μ1 - μ2 = 4/3 and
Var(X̄1 - X̄2) = (σ1²/n1) [(N1 - n1)/(N1 - 1)] + (σ2²/n2) [(N2 - n2)/(N2 - 1)] = 25/18
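
The 18 differences and the verification above can be reproduced by pairing every sample mean from the first population with every sample mean from the second. This is a minimal sketch with hypothetical helper names (`sample_means`, `mean_var`), assuming each pair of samples is equally likely.

```python
from fractions import Fraction
from itertools import combinations, product

pop1, n1 = [2, 2, 6], 2
pop2, n2 = [1, 1, 2, 4], 2

def sample_means(pop, n):
    """Means of all samples of size n drawn without replacement."""
    return [Fraction(sum(s), n) for s in combinations(pop, n)]

def mean_var(values):
    """Mean and variance (divisor = number of values) of a list of Fractions."""
    m = sum(values) / len(values)
    return m, sum(v * v for v in values) / len(values) - m ** 2

# All 3 x 6 = 18 equally likely differences of sample means.
diffs = [a - b for a, b in product(sample_means(pop1, n1), sample_means(pop2, n2))]
e_d, var_d = mean_var(diffs)

mu1, s1 = mean_var([Fraction(x) for x in pop1])
mu2, s2 = mean_var([Fraction(x) for x in pop2])
N1, N2 = len(pop1), len(pop2)
var_formula = (s1 / n1 * Fraction(N1 - n1, N1 - 1)
               + s2 / n2 * Fraction(N2 - n2, N2 - 1))

print(e_d, var_d, var_formula)
```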
Example 11.10
Given N1 = 800, N2 = 600, n1 = 200, n2 = 124, μ1 = 1800, μ2 = 1600, σ1 = 200 and
σ2 = 124. Compute the mean and standard error of the sampling distribution of the
difference X̄1 - X̄2 if sampling is done (i) with replacement (ii) without replacement.
Solution:
(i) Sampling with replacement
μ_(X̄1 - X̄2) = μ1 - μ2 = 1800 - 1600 = 200
σ_(X̄1 - X̄2) = √(σ1²/n1 + σ2²/n2) = √[(200)²/200 + (124)²/124] = √324 = 18
(ii) Sampling without replacement
μ_(X̄1 - X̄2) = μ1 - μ2 = 1800 - 1600 = 200
σ_(X̄1 - X̄2) = √{ (σ1²/n1) [(N1 - n1)/(N1 - 1)] + (σ2²/n2) [(N2 - n2)/(N2 - 1)] }
    = √{ [(200)²/200] [(800 - 200)/(800 - 1)] + [(124)²/124] [(600 - 124)/(600 - 1)] } = 15.77
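
The standard error of the difference between two sample means can be computed with one function covering both sampling schemes. The sketch below is illustrative; the function name `se_diff_means` and its keyword arguments are assumptions, not notation from the text.

```python
import math

def se_diff_means(sigma1, n1, sigma2, n2, N1=None, N2=None):
    """Standard error of the difference between two sample means.

    Leave N1 and N2 as None for sampling with replacement; pass both
    population sizes for sampling without replacement.
    """
    v1 = sigma1 ** 2 / n1
    v2 = sigma2 ** 2 / n2
    if N1 is not None and N2 is not None:
        v1 *= (N1 - n1) / (N1 - 1)
        v2 *= (N2 - n2) / (N2 - 1)
    return math.sqrt(v1 + v2)

se_with = se_diff_means(200, 200, 124, 124)
se_without = se_diff_means(200, 200, 124, 124, N1=800, N2=600)
print(se_with, round(se_without, 2))
```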
11.8.5 PROPORTION
What is a proportion? Suppose there are 1000 students in a school out of which
600 are male and 400 are female. The ratio of 600 to the total is called the
proportion of males and is denoted by p. Thus proportion of males = p = 600/1000 = 0.6
and proportion of females = q = 400/1000 = 0.4.
Let us denote male by a success and female by a failure. If the male students are
assigned the number 1 and females are assigned the number 0, then the population
contains 600 ones and 400 zeros. This can be written as below in the form of a
probability distribution called the binomial distribution. Let us calculate the mean
of this distribution.

  Random Variable (X)     f(X)                  X f(X)
          0               400/1000 = 0.4        0
          1               600/1000 = 0.6        0.6

E(X) = Mean = ΣX f(X) = 0.6

Thus the proportion of the population called the binomial population is equal
to the mean of the population containing 0s and 1s.
11.8.6 SAMPLING DISTRIBUTION OF PROPORTION
Suppose there is a finite population in which the proportion of successes is p
and the proportion of failures is q. Suppose we draw all possible samples of size n
from the population and calculate the sample proportion p̂ for each sample. The
sampling distribution of p̂ has the following properties:

(i) The mean of the sampling distribution of p̂ is equal to the population
proportion p. Thus E(p̂) = μ_p̂ = p.
This relation is true in sampling with replacement and without replacement for
any sample size.

(ii) The standard error of p̂ is related to the population parameters p and q
through the equations:

    S.E.(p̂) = σ_p̂ = √{ (pq/n) [(N - n)/(N - 1)] }   (True for sampling without replacement)

and S.E.(p̂) = σ_p̂ = √(pq/n)   (True for sampling with replacement
                                   or when N is very large)

(iii) The shape of the distribution of p̂ is normal when n > 30. The value of Z can be
calculated from p̂, where Z = (p̂ - p) / √(pq/n).
Warning: note that when n is small, the distribution of p̂ is not the t-distribution.
Example 11.11
A population consists of five numbers 2, 5, 6, 7, 9. Take all possible samples of
size 3 from this population without replacement and compute the proportion of odd
numbers for each sample. Verify that: (i) μ_p̂ = p (ii) σ²_p̂ = (pq/n) [(N - n)/(N - 1)]
Solution:
We have population values 2, 5, 6, 7, 9, population size N = 5 and sample size
n = 3. Thus, the number of possible samples which can be drawn without
replacement is C(N, n) = C(5, 3) = 10. Let p̂ represent the proportion of odd numbers in
the sample.

Sample No.   Sample Values   Sample Proportion (p̂)
    1          2, 5, 6             1/3
    2          2, 5, 7             2/3
    3          2, 5, 9             2/3
    4          2, 6, 7             1/3
    5          2, 6, 9             1/3
    6          2, 7, 9             2/3
    7          5, 6, 7             2/3
    8          5, 6, 9             2/3
    9          5, 7, 9             3/3
   10          6, 7, 9             2/3

The sampling distribution of the sample proportion p̂ and its mean and
variance are:

   p̂     Tally       f     f(p̂)     p̂ f(p̂)     p̂² f(p̂)
  1/3    |||         3     3/10      3/30        3/90
  2/3    ||||| |     6     6/10     12/30       24/90
  3/3    |           1     1/10      3/30        9/90
  Total             10     1        18/30       36/90

μ_p̂ = Σp̂ f(p̂) = 18/30 = 0.6
σ²_p̂ = Σp̂² f(p̂) - [Σp̂ f(p̂)]² = 36/90 - (18/30)² = 0.40 - 0.36 = 0.04

Population proportion p = X/N = 3/5 = 0.6, q = 1 - p = 0.4,
where X represents the number of odd digits in the population.

(pq/n) [(N - n)/(N - 1)] = [(0.6)(0.4)/3] [(5 - 3)/(5 - 1)] = 0.04

Hence (i) μ_p̂ = p = 0.6
(ii) σ²_p̂ = (pq/n) [(N - n)/(N - 1)] = 0.04
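
The two results verified above can also be obtained by enumerating the ten samples directly. This is a sketch only, assuming Python's standard library; `is_odd` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 5, 6, 7, 9]
n, N = 3, len(population)

def is_odd(x):
    return x % 2 == 1

# Proportion of odd numbers in each sample drawn without replacement.
p_hats = [Fraction(sum(is_odd(x) for x in s), n)
          for s in combinations(population, n)]

e_p = sum(p_hats) / len(p_hats)
var_p = sum(v * v for v in p_hats) / len(p_hats) - e_p ** 2

# Population proportion of odd numbers and the finite population formula.
p = Fraction(sum(is_odd(x) for x in population), N)
q = 1 - p
var_formula = p * q / n * Fraction(N - n, N - 1)

print(e_p, var_p, var_formula)
```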
Example 11.12
A finite population contains 4 smokers denoted by S1, S2, S3 and S4 and 2 non-
smokers denoted by N1 and N2. Draw all possible random samples of size 2 without
replacement from the population and calculate the proportion of smokers p̂ in each
sample. Write the probability distribution (sampling distribution) of p̂ and find the
following probabilities:
(i) p̂ is more than 1/2 (ii) p̂ is equal to p (iii) p̂ = 1/2 (iv) that both are smokers.
Solution:
We have population values S1, S2, S3, S4, N1, N2, population size N = 6 and sample
size n = 2. Thus, the number of possible samples which can be drawn without
replacement is C(N, n) = C(6, 2) = 15.

Sample No.   Sample Values   Sample Proportion (p̂)
    1          S1, S2              2/2
    2          S1, S3              2/2
    3          S1, S4              2/2
    4          S1, N1              1/2
    5          S1, N2              1/2
    6          S2, S3              2/2
    7          S2, S4              2/2
    8          S2, N1              1/2
    9          S2, N2              1/2
   10          S3, S4              2/2
   11          S3, N1              1/2
   12          S3, N2              1/2
   13          S4, N1              1/2
   14          S4, N2              1/2
   15          N1, N2              0

The sampling distribution of the sample proportion p̂ is:

   p̂      f     f(p̂)
   0      1     1/15
  1/2     8     8/15
  2/2     6     6/15
  Total  15     1

Population proportion p = X/N = 4/6 = 2/3
(i) P(p̂ > 1/2) = 6/15        (ii) P(p̂ = p) = 0
(iii) P(p̂ = 1/2) = 8/15      (iv) P(both are smokers) = 6/15
Example 11.13
If samples of n = 200 observations are to be drawn from a large population
N = 2500 in which the population proportion is 20%, determine the expected mean
and standard deviation of the sampling distribution of proportions when sampling is
done (i) with replacement (ii) without replacement.
Solution:
Here N = 2500, n = 200, p = 0.20, q = 1 - p = 1 - 0.20 = 0.80
(i) Sampling with replacement
E(p̂) = p = 0.20 and S.E.(p̂) = √(pq/n) = √[(0.20)(0.80)/200] = 0.0283
(ii) Sampling without replacement
E(p̂) = p = 0.20 and S.E.(p̂) = √{ (pq/n) [(N - n)/(N - 1)] }
    = √{ [(0.20)(0.80)/200] [(2500 - 200)/(2500 - 1)] } = 0.0271
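
The standard error of a sample proportion follows the same pattern as that of the sample mean. The following sketch assumes a hypothetical function name `se_proportion` with an optional population size `N`; it is an illustration, not a definitive implementation.

```python
import math

def se_proportion(p, n, N=None):
    """Standard error of a sample proportion.

    Leave N as None for sampling with replacement (or very large N);
    pass N to apply the finite population correction.
    """
    var = p * (1 - p) / n
    if N is not None:
        var *= (N - n) / (N - 1)
    return math.sqrt(var)

se_with = se_proportion(0.20, 200)
se_without = se_proportion(0.20, 200, N=2500)
print(round(se_with, 4), round(se_without, 4))
```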
11.8.7 SAMPLING DISTRIBUTION OF DIFFERENCE BETWEEN p̂1 and p̂2
Suppose there are two populations with proportions p1 and p2 and all possible
simple random samples of sizes n1 and n2 are selected from the populations
respectively. The sample proportions calculated from the samples are p̂1 and p̂2. The
difference p̂1 - p̂2 is a random variable and its distribution is called the sampling
distribution of p̂1 - p̂2. The properties of this distribution are:

(i) The mean of the distribution of p̂1 - p̂2 is equal to the difference between p1
and p2. Thus μ_(p̂1 - p̂2) = E(p̂1 - p̂2) = p1 - p2.
This relation is true for any sample size and for sampling with and without
replacement.

(ii) The standard error of the distribution of (p̂1 - p̂2) has the following relation with
population parameters:

    S.E.(p̂1 - p̂2) = σ_(p̂1 - p̂2) = √{ (p1q1/n1) [(N1 - n1)/(N1 - 1)] + (p2q2/n2) [(N2 - n2)/(N2 - 1)] }
    (True for sampling without replacement)

and S.E.(p̂1 - p̂2) = σ_(p̂1 - p̂2) = √(p1q1/n1 + p2q2/n2)
    (True for sampling with replacement or when N is very large)

(iii) The distribution of p̂1 - p̂2 has the normal distribution when both n1 and n2 are
large in size. When n1 and n2 are small in size, the distribution of p̂1 - p̂2 does
not form any standard distribution. The random difference (p̂1 - p̂2) can be
transformed into the standard normal variable Z where

    Z = [(p̂1 - p̂2) - (p1 - p2)] / √(p1q1/n1 + p2q2/n2)
Example 11.14
Given the data: N1 = 6, n1 = 3, X1 = 3, N2 = 5, n2 = 2, X2 = 2.
Find E(p̂1 - p̂2) and Var(p̂1 - p̂2) if sampling is done
(i) with replacement (ii) without replacement
Solution:
Here N1 = 6, n1 = 3, X1 = 3, p1 = X1/N1 = 3/6 = 0.5, q1 = 1 - p1 = 0.5
     N2 = 5, n2 = 2, X2 = 2, p2 = X2/N2 = 2/5 = 0.4, q2 = 1 - p2 = 0.6
(i) Sampling with replacement
E(p̂1 - p̂2) = p1 - p2 = 0.5 - 0.4 = 0.1
Var(p̂1 - p̂2) = p1q1/n1 + p2q2/n2 = (0.5)(0.5)/3 + (0.4)(0.6)/2
    = 0.0833 + 0.12 = 0.2033
(ii) Sampling without replacement
E(p̂1 - p̂2) = p1 - p2 = 0.5 - 0.4 = 0.1
Var(p̂1 - p̂2) = (p1q1/n1) [(N1 - n1)/(N1 - 1)] + (p2q2/n2) [(N2 - n2)/(N2 - 1)]
    = [(0.5)(0.5)/3] [(6 - 3)/(6 - 1)] + [(0.4)(0.6)/2] [(5 - 2)/(5 - 1)]
    = 0.05 + 0.09 = 0.14
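
The variance of the difference between two sample proportions can be sketched in the same style. The function name `var_diff_proportions` is an assumption for illustration; exact fractions are used so the two cases can be compared without rounding.

```python
from fractions import Fraction

def var_diff_proportions(p1, n1, p2, n2, N1=None, N2=None):
    """Variance of the difference between two sample proportions.

    Leave N1 and N2 as None for sampling with replacement; pass both
    population sizes for sampling without replacement.
    """
    v1 = p1 * (1 - p1) / n1
    v2 = p2 * (1 - p2) / n2
    if N1 is not None and N2 is not None:
        v1 *= Fraction(N1 - n1, N1 - 1)
        v2 *= Fraction(N2 - n2, N2 - 1)
    return v1 + v2

p1, p2 = Fraction(3, 6), Fraction(2, 5)
var_with = var_diff_proportions(p1, 3, p2, 2)
var_without = var_diff_proportions(p1, 3, p2, 2, N1=6, N2=5)
print(float(p1 - p2), round(float(var_with), 4), float(var_without))
```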
DEFINITIONS
Population
A population is the total set of measurements of interest in a particular problem.
or
The population is a set of data that characterizes some phenomenon.
Finite Population
If a population has a finite number of elements, it is called a finite population. For
example, human population, number of chairs in a college.
Infinite Population
If a population has an infinite number of elements, it is called an infinite population.
For example, number of points on a line, number of stars in the sky.
Target Population
A population about which we want to get some information is called the target
population.
Sampled Population
A population from which a sample is drawn is called the sampled population.
Sample
A sample is a subset of data selected from a population.
or
A sample is a subset of the population that contains measurements obtained by an
experiment.
Random Sample
A sample selected by random sampling is called a random sample.
or
If a sample is selected from a population whose sampling units have known
probability of selection, equal or unequal, it is said to be a random sample.
Sampling
Sampling is the process of drawing a sample from the population.
Random Sampling
The process of selecting samples from a group on the basis of chance or luck is
called random sampling.
or
A method of selecting samples so that each sample of a given size in a population
has an equal chance of being selected.
Sampling Units
Sampling units are nonoverlapping collections of elements from the population.
or
The basic elements that constitute a population are known as sampling units.
Simple Random Sample
A simple random sample is one in which every item from a population has the same
chance of selection as any other item.
or
A sample selected in such a manner that each possible sample of a specified size has
an equal chance of being selected.
Simple Random Sampling
A procedure for selecting members from a population in such a manner that each
drawing gives every available member an equal chance of selection.
or
A method of selecting items from a population so that every possible sample of a
specified size has an equal chance of being selected.
Stratified Random Sampling
In stratified random sampling, the population is first divided into subgroups, called
strata, and a random sample is then taken from each stratum.
or
A stratified random sample is obtained by partitioning the sampling units in the
population into nonoverlapping subpopulations called strata. Random samples are
then selected from each stratum.
Parameter
A parameter is a numerical descriptive measure of a population.
or
A parameter is any measure which describes a population.
Statistic
A statistic is a quantity calculated from the observations in a sample.
or
A measure computed on the basis of sample data is termed a statistic.
Census
The study of all the data points in a population is called a census.
or
To study all the individual observations of the entire population is called a census.
Purposes of Sampling
(i) To obtain information about a population without examining each and
every unit of the population.
(ii) To find the reliability of estimates derived from the sample.
Advantages of Sampling
(i) Sampling is cheaper than a complete count.
(ii) The data are collected and analyzed more quickly.
(iii) Sampling saves time.
(iv) A higher quality of labour with better supervision can be employed due to
the reduced volume of material.
(v) A small fraction of the population sometimes gives comprehensive and detailed
results.
Sampling Design
A sampling design is a definite statistical plan which has all steps taken in the
selection of the sample and the method of estimation.
or
The sampling design specifies the method of collecting the sample.
Sampling Frame
A sampling frame is a list of all sampling units in the population.
or
A list of the sampling units for a study is called a sampling frame.
Probability Sampling
A probability sampling is one in which the sampling units are chosen on the basis of
known probabilities.
or
When each and every element of the population has a known probability of being
selected in the sample, the sampling is said to be probability sampling.
Non-Probability Sampling
A sampling is called non-probability sampling when the procedure of selecting
the elements from the population is not based on probability but personal judgement
is involved in selection.
Sampling with Replacement
If an object is selected from the population and is replaced before the next object
is selected, such a selection is known as sampling with replacement.
or
Sampling is said to be with replacement when we draw a sampling unit from a
population and return it to the population before the next unit is drawn. In sampling
with replacement, a unit can be chosen more than once in a sample.
Sampling without Replacement
Sampling without replacement is performed when an object is not replaced in the
population after it has been selected.
or
Sampling is said to be without replacement when we draw a sampling unit from a
population and do not return it to the population before the next unit is drawn. In
sampling without replacement a unit cannot be chosen more than once in a
sample.
Permutation
A permutation is an arrangement in which the order of the objects selected from a
specific pool of objects is important.
or
A permutation is an ordered arrangement of objects.
Combination
A combination is a collection of a group of objects without regard to order.
or
A combination is an arrangement of objects without regard to order.
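
The distinction between a permutation and a combination can be illustrated by counting selections of 2 objects from 4. This is a minimal sketch assuming Python 3.8 or later (for `math.perm` and `math.comb`); the item labels are hypothetical.

```python
import math
from itertools import combinations, permutations

items = ["A", "B", "C", "D"]
r = 2

n_perm = math.perm(len(items), r)   # ordered arrangements: AB and BA both count
n_comb = math.comb(len(items), r)   # unordered selections: AB and BA count once

ordered = list(permutations(items, r))
unordered = list(combinations(items, r))

print(n_perm, n_comb)
```

Each unordered selection of r objects corresponds to r! ordered arrangements, which is why the permutation count is r! times the combination count.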
Sampling Error
The sampling error is the difference between a population parameter and a sample
statistic.
or
The difference between a sample statistic and its corresponding population
parameter is called sampling error.
Non-Sampling Error
All types of error other than sampling error, such as measurement error, interviewer
error and processing error, are called non-sampling error.
or
Non-sampling error is introduced by bias, consciously or unconsciously, on the part of
the researcher. This is due to improper sample selection, improper questionnaires, etc.
Bias
The difference between the mean or expected value of a statistic and the value of the
parameter being estimated is called bias.
or
Bias means a systematic component of error which deprives a statistical result of its
representativeness.
Sampling Distribution
The distribution of all possible values that can be assumed by some statistic,
computed from samples of the same size randomly drawn from the same population,
is called the sampling distribution of that statistic.
or
A probability distribution consisting of all possible values of a sample statistic is
known as a sampling distribution.
Standard Error
The standard deviation of the sampling distribution for a statistic is called the
standard error.
or
The standard deviation of any estimator is called the standard error of the
estimator.
Sampling Distribution of the Mean
If we take all possible samples of a given size from a population and determine the
mean of each sample, the probability distribution of the sample means is called the
sampling distribution of the mean.
or
A probability distribution of all possible sample means of a given sample size is
known as the sampling distribution of the mean.
Central Limit Theorem
If all samples of a specified size are selected from any population, the sampling
distribution of the sample mean is approximately a normal distribution. This
approximation improves with larger samples.
or
If the sample size is large, the theoretical sampling distribution of the mean can be
approximated closely with a normal distribution.
Population Proportion
The fraction of values in a population which has a specific attribute is called the
population proportion.
Sample Proportion
A sample proportion is the fraction of items in a sample that has the attribute of
interest.

MULTIPLE-CHOICE QUESTIONS
1. Sample is a subset of:
(a) population (b) data
(c) set (d) distribution
2. List of all the units of the population is called:
(a) random sampling (b) bias
(c) sampling frame (d) probability sampling
3. Any calculation on the sample data is called:
(a) parameter (b) statistic
(c) $\bar{X}$ (d) error
4. Any measure of the population is called:
(a) finite (b) parameter
(c) without replacement (d) random
5. The difference between a statistic and the parameter is called:
(a) probability (b) sampling error
(c) random (d) non-random
6. Probability distribution of a statistic is called:
(a) sampling (b) parameter
(c) data (d) sampling distribution
7. Standard deviation of the sampling distribution of a statistic is called:
(a) serious error (b) dispersion
(c) standard error (d) difference
8. If we obtain a point estimate $\bar{X}$ for a population mean $\mu$, the difference between
$\bar{X}$ and $\mu$ is called:
(a) standard error (b) bias
(c) error of estimation (d) difficult to tell
9. A distribution formed by all possible values of a statistic is called:
(a) binomial distribution (b) hypergeometric distribution
(c) normal distribution (d) sampling distribution
10. In probability sampling, probability of selecting an item from the population is
known and is:
(a) equal to zero (b) non-zero
(c) equal to one (d) all of the above
11. A population about which we want to get some information is called:
(a) finite population (b) infinite population
(c) sampled population (d) target population
12. Study of population is called:
(a) parameter (b) statistic
(c) error (d) census
13. For making voters list in Pakistan we need:
(a) sampling error (b) standard error
(c) census (d) simple random sampling
14. Sampling based upon equal probability is called:
(a) probability sampling (b) systematic sampling
(c) simple random sampling (d) stratified random sampling
15. In sampling with replacement, an element can be chosen:
(a) less than once (b) more than once
(c) only once (d) difficult to tell
16. In sampling without replacement, an element can be chosen:
(a) less than once (b) more than once
(c) only once (d) difficult to tell
17. In sampling with replacement, the following is always true:
(a) $n = N$ (b) $n < N$
(c) $n > N$ (d) all of the above
18. Suppose a finite population has 6 items and 2 items are selected at random
without replacement, then all possible samples will be:
(a) 6 (b) 12
(c) 15 (d) 36
19. Suppose a finite population contains 7 items and 3 items are selected at
random without replacement, then all possible samples will be:
(a) 21 (b) 35
(c) 14 (d) 7
20. A population contains $N$ items and all possible samples of size $n$ are selected
without replacement. The possible number of samples will be:
(a) $N$ (b) [illegible]
(c) $\binom{N}{n}$ (d) $N^n$
21. Suppose a finite population contains 4 items and 2 items are selected at
random with replacement, then all possible samples will be:
(a) 6 (b) 16
(c) 8 (d) 4
22. A population contains 2 items and 4 items are selected at random with
replacement, then all possible samples will be:
(a) 16 (b) 8
(c) [illegible] (d) 4
23. Suppose a population has $N$ items and $n$ items are selected with replacement.
Number of all possible samples will be:
(a) $N^n$ (b) $\binom{N}{n}$
(c) $N$ (d) $n$
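
The counting rules behind the questions above can be checked with a short Python sketch. The function name `count_samples` is only an illustrative choice, and the sketch assumes the convention used in these questions: order is ignored when sampling without replacement and kept when sampling with replacement.

```python
from math import comb

def count_samples(N, n, replacement):
    """Number of possible samples of size n from a population of N items.

    Without replacement: combinations, N choose n (order ignored).
    With replacement: N**n ordered samples.
    """
    return N ** n if replacement else comb(N, n)

print(count_samples(6, 2, replacement=False))  # 15
print(count_samples(7, 3, replacement=False))  # 35
print(count_samples(4, 2, replacement=True))   # 16
```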
24. In random sampling, the probability of selecting an item from the population
is:
(a) unknown (b) known
(c) undecided (d) [illegible]
25. Random sampling is also called:
(a) probability sampling (b) non-probability sampling
(c) sampling error (d) random error
26. Non-random sampling is also called:
(a) biased sampling (b) non-probability sampling
(c) random sampling (d) representative sample
27. Sampling error can be reduced by:
(a) non-random sampling (b) increasing the population
(c) decreasing the sample size (d) increasing the sample size
28. If $N$ is the size of the population and $n$ is the size of the sample, then sampling
fraction is:
(a) [illegible] (b) [illegible]
(c) $n/N$ (d) [illegible]
29. The finite population correction factor is:
[options illegible]
30. In sampling with replacement, the standard error of $\bar{X}$ is equal to:
[options illegible]
31. In sampling with replacement, standard error of the sample proportion $\hat{p}$ is
equal to:
[options illegible]
33. If $p_1 = p_2 = p$ and $n_1 \ne n_2$, then $S.E.(\hat{p}_1 - \hat{p}_2)$ is equal to:
[options illegible]
34. The selection of cricket team for the world cup is called:
(a) random sampling (b) systematic sampling
(c) purposive sampling (d) cluster sampling
35. Random sampling is also called:
(a) probability sampling (b) judgment sampling
(c) quota sampling (d) sequential sampling
36. A complete list of all the sampling units is called:
(a) sampling design (b) sampling frame
(c) population frame (d) cluster
37. A plan for obtaining a sample from a population is called:
(a) population design (b) sampling design
(c) sampling frame (d) sampling distribution
38. If a survey is conducted by a sampling design, it is called:
(a) sample survey (b) population survey
(c) systematic survey (d) none of the above
39. The difference between the expected value of a statistic and the value of the
parameter being estimated is called:
(a) sampling error (b) non-sampling error
(c) standard error (d) bias
40. The standard deviation of any sampling distribution is called:
(a) standard error (b) non-sampling error
(c) type-I error (d) type-II error
41. The standard error increases when sample size is:
(a) increased (b) decreased
(c) fixed (d) more than 30
42. The mean of sampling distribution of means is equal to:
(a) $\bar{X}$ (b) $\mu$
(c) [illegible] (d) none of the above
43. The mean of the sample means is exactly equal to the:
(a) sample mean (b) population mean
(c) weighted mean (d) combined mean
44. (Sum of all sample means) / (Total number of samples) is equal to:
(a) $E(\bar{X})$ (b) $\mu$
(c) both (a) and (b) (d) none of the above
45. A sample which is free from bias is called:
(a) biased (b) unbiased
(c) positively biased (d) negatively biased
46. If $E(\bar{X}) = \mu$ then bias is:
(a) positive (b) negative
(c) zero (d) 100%
47. If $E(\bar{X}) = 10$ and $\mu = 10$ then bias is equal to:
(a) 0 (b) 10
(c) 20 (d) difficult to tell
48. If $\bar{X} = 10$ and $\mu = 12$ then sampling error is equal to:
(a) 10 (b) 22
(c) 12 (d) -2
49. The standard deviation of the distribution of sample means is equal to:
(a) [illegible] (b) [illegible]
(c) $\sigma/\sqrt{n}$ (d) $\sigma/n$
50. If $n = 25$, $\sigma^2 = 25$ and $\bar{X} = 25$, then standard error of $\bar{X}$ will be:
(a) 25 (b) 5
(c) 1 (d) 0
51. $S^2 = \dfrac{\sum (X - \bar{X})^2}{n}$ is called:
(a) unbiased sample variance (b) population variance
(c) biased sample variance (d) all of the above
52. $s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$ is called:
(a) unbiased sample variance (b) true variance
(c) biased sample variance (d) variance of means
53. If $E(S^2)$ = [illegible] and $\sigma^2$ = [illegible], then bias will be:
[options illegible]
54. In sampling without replacement, the standard error of sampling distribution
of sample proportion $\hat{p}$ is equal to:
[options illegible]
55. When sampling is done without replacement, $\sigma_{\bar{X}}$ is equal to:
[options illegible]
56. In case of sampling with replacement, $\sigma_{\hat{p}_1 - \hat{p}_2}$ is equal to:
[options illegible]
57. The distribution of the means of samples of size 4, taken from a population
with a standard deviation $\sigma$, has a standard deviation of:
(a) $\sigma$ (b) $\sigma/4$
(c) $\sigma/2$ (d) [illegible]
58. In sampling with replacement, $\sigma^2_{\bar{X}}$ is equal to:
[options illegible]
59. When sampling is done with or without replacement, $E(\hat{p}_1 - \hat{p}_2)$ is equal to:
(a) $\hat{p}_1 - \hat{p}_2$ (b) $p_1 - p_2$
(c) $p_1 + p_2$ (d) $p_1 p_2$
60. In case of sampling with replacement, $E(S^2)$ is equal to:
[options illegible]
61. In sampling without replacement, the expected value of $S^2$ is equal to:
[options illegible]
62. When sampling is done with replacement, then $\mu_{s^2}$ is equal to:
[options illegible]
63. In sampling without replacement, $\mu_{s^2}$ is equal to:
[options illegible]
64. When sampling is done with or without replacement, $\mu_{\bar{X}_1 - \bar{X}_2}$ is equal to:
(a) $\mu_1 - \mu_2$ (b) $\mu_1 + \mu_2$
(c) [illegible] (d) [illegible]
65. If $X$ represents the number of units having the specified characteristic and $n$ is
the size of the sample, then sample proportion $\hat{p}$ is equal to:
[options illegible]
66. If $X$ represents the number of units having the specified characteristic and $N$ is
the size of the population, then population proportion $p$ is equal to:
[options illegible]

1. (a) 2. (c) 3. (b) 4. (b) 5. (b) 6. (d) 7. (c) 8. (c)
9. (d) 10. (b) 11. (d) 12. (d) 13. (c) 14. (c) 15. (b) 16. (c)
17. (d) 18. (c) 19. (b) 20. (c) 21. (b) 22. (a) 23. (a) 24.
25. (a) 26. (b) 27. (d) 28. (c) 29. (c) 30. (c) 31. (d) 32.
33. (d) 34. (c) 35. (a) 36. (b) 37. (b) 38. (a) 39. (d) 40. (a)
41. (b) 42. (b) 43. (b) 44. (c) 45. (b) 46. (c) 47. (a) 48. (d)
49. (c) 50. (c) 51. (c) 52. (a) 53. (d) 54. (c) 55. (c) 56. (c)
57. (c) 58. (c) 59. (b) 60. (a) 61. (c) 62. (a) 63. (b) 64. (a)
65. (b) 66. (c)

SHORT QUESTIONS
1. Given $\mu = 6$ and $n = 80$. Find $\mu_{\bar{X}}$.
Ans. 6
2. Given $n = 36$ and $\sigma = 6$. Find $\sigma_{\bar{X}}$.
Ans. 1
3. Given $n = 25$ and $\sigma_{\bar{X}} = 5$. Find the value of $\sigma^2$.
Ans. 625
4. Given $\mu_1 = 10$ and $\mu_2 = 6$. Find $\mu_{\bar{X}_1 - \bar{X}_2}$.
Ans. 4
5. Given $n_1 = 30$, $n_2 = 25$, $\sigma_1^2 = 300$ and $\sigma_2^2 = 150$. Find $\sigma^2_{\bar{X}_1 - \bar{X}_2}$.
Ans. 16
6. Given $N = 300$, $n = 100$ and $\sigma^2 = 200$. If sampling is done without replacement,
then find the value of $\sigma_{\bar{X}}$.
Ans. 1.16
7. Given $N = 310$, $n = 100$ and $\sigma^2_{\bar{X}} = 35$. If sampling is done without replacement,
then find $\sigma^2$.
Ans. 5150
8. Given $N_1 = 3$, $n_1 = 2$, $N_2 = 4$, $n_2 = 2$, $\sigma_1^2 = 8/3$ and $\sigma_2^2 = 5/4$. If sampling is done
without replacement, then find the value of $\sigma^2_{\bar{X}_1 - \bar{X}_2}$.
Ans. 1.08
9. Given $N = 7$, $n = 2$ and $\sigma^2 = 16$. If sampling is done without replacement, then
find $E(S^2)$.
Ans. 9.33
10. Given $N = 7$, $n = 2$ and $\sigma^2 = 16$. If sampling is done without replacement, then
find $\mu_{s^2}$.
Ans. 18.67
11. Given $\mu = 6$, $n = 2$ and $\sigma^2 = 10.8$. Find $E(S^2)$.
Ans. 5.4
12. Given $\mu = 6$, $n = 2$ and $\sigma^2 = 10.8$. Find $E(s^2)$.
Ans. 10.8
13. Given $N = 7$, $n = 3$, $\mu_{\hat{p}} = 3/7$. Find the value of population proportion $p$.
Ans. 3/7
14. Given $N = 7$, $n = 3$ and $\mu_{\hat{p}} = 3/7$. If sampling is done without replacement, find
$\sigma^2_{\hat{p}}$.
Ans. 0.0544
15. Given $n = 5$ and $p = 0.5$. Find $\sigma^2_{\hat{p}}$.
Ans. 0.05
16. Given $p_1 = 2/3$, $n_1 = 2$, $p_2 = 1/2$ and $n_2 = 2$. Find $\mu_{\hat{p}_1 - \hat{p}_2}$.
Ans. 1/6
17. Given $p_1 = 2/3$, $n_1 = 2$, $p_2 = 1/2$ and $n_2 = 2$. Find $\sigma^2_{\hat{p}_1 - \hat{p}_2}$.
Ans. 0.24
18. Given $N_1 = 4$, $n_1 = 2$, $N_2 = 4$, $n_2 = 2$, $p_1 = 1/2$ and $p_2 = 1/4$. If sampling is done
without replacement, find $S.E.(\hat{p}_1 - \hat{p}_2)$.
Ans. 0.3819
19. What is the value of the finite population correction factor when $n$ = [illegible] and
$N = 125$?
Ans. 0.93
20. Differentiate between sampling with and without replacement.
21. Distinguish between probability and non-probability sampling.
22. Differentiate between parameter and statistic.
23. Distinguish between population and sample.
24. Distinguish between sampling and non-sampling errors.
25. Differentiate between simple random sampling and [illegible] sampling.
26. Explain the term sampling frame.
27. Define the standard error.
28. Distinguish between simple random sample and simple random sampling.
29. Explain the term sampling design.
30. Differentiate between finite and infinite populations.
31. Define the sampling distribution.
32. Differentiate between random sample and simple random sample.
33. Write down the advantages of sampling.
34. Write down the basic aims of sampling.
35. Define the terms sample and sampling.
36. Define the sampling distribution of means.
37. Describe the properties of the sampling distribution of sample means.
38. Define the sampling distribution of sample proportion and describe its
properties.
39. What is meant by bias?
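
Several of the numerical short questions use the same two variance formulas, with the factor $(N-n)/(N-1)$ applied when sampling is done without replacement. The following Python sketch shows one way to check such answers. The function names are illustrative assumptions, and the values passed in (for example N = 300, n = 100, $\sigma^2$ = 200) are chosen only to demonstrate the arithmetic.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction (N - n)/(N - 1), applied to a variance."""
    return (N - n) / (N - 1)

def var_of_mean(sigma2, n, N=None):
    """Variance of the sample mean; pass N for sampling without replacement."""
    v = sigma2 / n
    return v * fpc(N, n) if N is not None else v

def var_of_proportion(p, n, N=None):
    """Variance of the sample proportion; pass N for sampling without replacement."""
    v = p * (1 - p) / n
    return v * fpc(N, n) if N is not None else v

# standard error of the mean, without replacement
print(round(sqrt(var_of_mean(200, 100, N=300)), 2))  # 1.16
# variance of the sample proportion, without replacement
print(round(var_of_proportion(3 / 7, 3, N=7), 4))    # 0.0544
# variance of the sample proportion, with replacement
print(round(var_of_proportion(0.5, 5), 2))           # 0.05
```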
EXERCISES
1. A population consists of five numbers 3, 7, 11, 15 and 19. Take all possible
samples of size two without replacement from this population. Find the mean
and standard deviation of the sampling distribution of means.
Ans: $\mu_{\bar{X}} = 11$, $\sigma_{\bar{X}} = 3.46$
2. Take all possible samples of size 3 without replacement from the population 2,
6, 8, 12 and 14. Form the sampling distribution of the mean and find its mean and
variance. Verify that: $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \dfrac{\sigma^2}{n}\left(\dfrac{N-n}{N-1}\right)$
Ans: $\mu_{\bar{X}} = 8.4$, $\sigma^2_{\bar{X}} = 3.04$, $\mu = 8.4$, $\sigma^2 = 18.24$
3. Draw all possible samples of size two without replacement from the population
16, 18, 20 and 22. Calculate their means and prepare the frequency
distribution of the sample mean. Compute the mean and variance of the frequency
distribution of the mean and compare them with the population mean and variance.
Ans: $\mu_{\bar{X}} = 19$, $\sigma^2_{\bar{X}} = 1.67$, $\mu = 19$, $\sigma^2 = 5$
4. A population consists of four numbers 5, 6, 7 and 8. Take all possible samples
of size three without replacement from this population. Calculate the mean and
variance of the sample means and compare them with the population mean and
variance.
Ans: $\mu_{\bar{X}} = 6.5$, $\sigma^2_{\bar{X}} = 0.14$, $\mu = 6.5$, $\sigma^2 = 1.25$
5. A population consists of two elements 24 and 35. Take all possible samples of
size two with replacement and find their means. Make a sampling distribution
of sample means and find its mean and standard deviation. Verify that:
(i) $\mu_{\bar{X}} = \mu$ (ii) $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
Ans: $\mu_{\bar{X}} = 29.5$, $\sigma_{\bar{X}} = 3.89$, $\mu = 29.5$, $\sigma = 5.5$
6. A population consists of three values 20, 40 and 60.
(i) Take all possible samples of size 2 which can be drawn with replacement
from this population and find the means of these samples.
(ii) Make a frequency distribution of the sample mean and show that the
variance of this distribution is equal to the population variance divided by
the sample size.
Ans: $Var(\bar{X}) = 133.33$, $\sigma^2 = 266.67$
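
Exercises of this kind can be checked by listing every possible sample. The sketch below does so in Python for samples of size 3 drawn without replacement from the population 2, 6, 8, 12, 14. The helper names `mean` and `pvariance` are illustrative; exact fractions are used so the identity for $\sigma^2_{\bar{X}}$ can be compared without rounding error.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 6, 8, 12, 14]
n = 3                      # sample size
N = len(population)        # population size

def mean(values):
    values = [Fraction(v) for v in values]
    return sum(values) / len(values)

def pvariance(values):
    # variance with divisor equal to the number of values
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

# every sample of size n drawn without replacement (order ignored)
sample_means = [mean(s) for s in combinations(population, n)]

mu, sigma2 = mean(population), pvariance(population)
mu_xbar, var_xbar = mean(sample_means), pvariance(sample_means)

print(float(mu), float(sigma2))          # 8.4 18.24
print(float(mu_xbar), float(var_xbar))   # 8.4 3.04
print(var_xbar == sigma2 / n * Fraction(N - n, N - 1))  # True
```

Replacing `combinations(population, n)` with `itertools.product(population, repeat=n)` gives the with-replacement case, where the correction factor is dropped.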
7. A population consists of values 6, 8 and 12. Take all possible samples of size 2
with replacement from this population. Form the sampling distribution of the
sample mean and prove that the standard error of the mean is the square root
of the population variance divided by the sample size.
Ans: $S.E.(\bar{X}) = 1.76$, $\sigma^2 = 6.2222$
8. Take all possible samples of size 3 with replacement from [population illegible].
Then show that:
(i) population mean = mean of sample means
(ii) standard error = population S.D. / $\sqrt{n}$
9. If the mean and variance of a population are 6 and 2.16 respectively, what would be
the standard error of the mean if samples of size 4 are drawn with replacement?
10. The mean and variance of a population are 7 and 8.16 respectively. What would
be the standard error of the mean if samples of size [illegible] are drawn without
replacement from a population of size [illegible]?
Ans: [illegible]
11. What will be the mean and variance of sample means if (i) a sample of 36 is drawn
with replacement from the population [illegible] (ii) a sample of 4 is
drawn without replacement from the population given in (i)?
12. [Beginning illegible] ... samples each are obtained, what would be the expected mean
and standard deviation of the resulting sampling distribution of means if
sampling were done (i) with replacement (ii) without replacement?
Ans: (i) $\mu_{\bar{X}} = 66$, $\sigma_{\bar{X}} = 0.6$ (ii) $\mu_{\bar{X}} = 66$, $\sigma_{\bar{X}}$ = [illegible]
13. The heights of 1500 students are normally distributed with a mean of 68.0 inches
and a standard deviation of 2.5 inches. If 300 random samples of size 25 are
drawn from this population, determine the expected mean and standard
deviation of the sampling distribution of means if sampling is done
(i) with replacement (ii) without replacement.
Ans: (i) $\mu_{\bar{X}} = 68$, $\sigma_{\bar{X}} = 0.5$ (ii) $\mu_{\bar{X}} = 68$, $\sigma_{\bar{X}} = 0.5$
14. The Federal Bureau of Statistics collects data on earnings of industry workers.
The mean weekly earnings of workers in the industry is Rs. 1600. Suppose
that $n$ such workers are to be selected at random. Let $\bar{X}$ denote the mean
weekly salary of the workers chosen. Assuming a population standard
deviation of Rs. 200, find the mean $\mu_{\bar{X}}$ and the standard deviation $\sigma_{\bar{X}}$ of $\bar{X}$ for
(i) $n = 25$ (ii) $n = 100$ (iii) $n = 400$
Ans: (i) $\mu_{\bar{X}} = 1600$, $\sigma_{\bar{X}} = 40$ (ii) $\mu_{\bar{X}} = 1600$, $\sigma_{\bar{X}} = 20$ (iii) $\mu_{\bar{X}} = 1600$, $\sigma_{\bar{X}} = 10$
15. Suppose a random sample of size $n$ is taken from a population of size $N$.
(a) Assume $n = 1$. Compute $\sigma_{\bar{X}}$ if the sampling is done
(i) with replacement (ii) without replacement.
(b) Assume $n = N$. Compute $\sigma_{\bar{X}}$ when sampling is done without replacement.
Ans: (a) (i) $\sigma_{\bar{X}} = \sigma$ (ii) $\sigma_{\bar{X}} = \sigma$ (b) $\sigma_{\bar{X}} = 0$
16. A population consists of the three numbers 2, 4, 6. Consider all possible samples
of size two which can be drawn with replacement from this population. Find the
mean of the sampling distribution of variances.
Ans. $\mu_{S^2} = 1.3333$
17. A population of 7 numbers has a mean of 40 and a standard deviation of 3. If
samples of size 5 are drawn from this population and the variance $S^2 = \dfrac{\sum (X - \bar{X})^2}{n}$
of each sample is computed, find the mean of the sampling distribution of
variances if sampling is: (i) with replacement (ii) without replacement.
Ans. (i) 7.2 (ii) 8.4
18. A population consists of four values 4, 10, 14, 20. Take all possible samples of
size two without replacement from this population and verify that
$\mu_{S^2} = \left(\dfrac{N}{N-1}\right)\left(\dfrac{n-1}{n}\right)\sigma^2$
Ans. $\mu_{S^2} = 22.67$, $\sigma^2 = 34$
19. A population consists of three numbers 4, 6, 8. Take all possible samples of size
two with replacement from this population. Find the mean and the unbiased
variance for each sample. Show that (i) $\mu_{\bar{X}} = \mu$ and (ii) $\mu_{s^2} = \sigma^2$.
Ans. $\mu_{\bar{X}} = 6$, $\mu_{s^2} = 2.67$, $\mu = 6$, $\sigma^2 = 2.67$
20. A population consists of four values 4, 6, 8, 10. Take all possible samples of size
two without replacement from this population and verify that $E(s^2) = \left(\dfrac{N}{N-1}\right)\sigma^2$,
where $s^2 = \sum (X - \bar{X})^2/(n - 1)$.
Ans: $E(s^2) = 6.67$, $\sigma^2 = 5$
21. Let $\bar{X}_1$ represent the mean of a random sample of size $n_1 = 2$ with replacement
from a finite population consisting of values 4, 8. Similarly, let $\bar{X}_2$ represent the
mean of a random sample of size $n_2 = 2$ with replacement from another finite
population consisting of values 2, 4. Form a sampling distribution of $\bar{X}_1 - \bar{X}_2$.
Verify that: (i) $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$ (ii) $Var(\bar{X}_1 - \bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
Ans. $E(\bar{X}_1 - \bar{X}_2) = 3$, $Var(\bar{X}_1 - \bar{X}_2) = 2.5$, $\mu_1 = 6$, $\sigma_1^2 = 4$, $\mu_2 = 3$, $\sigma_2^2 = 1$
22. Draw all possible random samples of size $n_1 = 2$ without replacement from a
finite population consisting of 3, 6, 9. Similarly draw all possible random samples
of size $n_2 = 2$ without replacement from another finite population consisting of
2, 4, 6.
(a) Find the possible differences between the sample means of the two
populations.
(b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and
variance.
Verify that: (i) $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ (ii) $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \dfrac{\sigma_1^2}{n_1}\left(\dfrac{N_1 - n_1}{N_1 - 1}\right) + \dfrac{\sigma_2^2}{n_2}\left(\dfrac{N_2 - n_2}{N_2 - 1}\right)$
Ans: $\mu_{\bar{X}_1 - \bar{X}_2} = 2$, $\sigma^2_{\bar{X}_1 - \bar{X}_2} = 2.1667$, $\mu_1 = 6$, $\sigma_1^2 = 6$, $\mu_2 = 4$, $\sigma_2^2 = 2.67$
23. Given $\mu_1 = 4500$, $\mu_2 = 4000$, $\sigma_1 = 200$, $\sigma_2 = 250$, $N_1 = 400$, $N_2 = 300$, $n_1 = 100$
and $n_2 = 50$. Determine the mean and standard deviation of the sampling distribution
of the difference of the means if sampling is done
(i) with replacement (ii) without replacement
Ans: (i) 500 and 40.62 (ii) 500 and 36.69
24. Given $N_1 = 125$, $n_1 = 30$, $\mu_1 = 78$, $\sigma_1^2 = 150$, $N_2 = 200$, $n_2 = 50$, $\mu_2 = 85$ and
$\sigma_2^2 = 200$. Compute $E(\bar{X}_2 - \bar{X}_1)$ and $Var(\bar{X}_2 - \bar{X}_1)$ when sampling is done
(i) with replacement (ii) without replacement
Ans: (i) 7 and 9 (ii) 7 and 6.85
25. There are five digits in a population, i.e. [illegible]. Draw all possible
samples of size [illegible] without replacement and find the sample proportion ($\hat{p}$) of
[illegible] digits in each sample. Verify that:
(i) $E(\hat{p}) = p$ (population proportion) (ii) $S.E.(\hat{p}) = \sqrt{\dfrac{pq}{n}\left(\dfrac{N-n}{N-1}\right)}$
Ans: $E(\hat{p}) = 0.6$, $S.E.(\hat{p}) = 0.2$, $p = 0.6$
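
For the difference between two sample means, the sampling distribution can again be built by enumeration. The Python sketch below pairs every with-replacement sample of size 2 from the population 4, 8 with every such sample from the population 2, 4, and compares the results with $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$. The helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

pop1, pop2 = [4, 8], [2, 4]
n1 = n2 = 2

def mean(values):
    values = [Fraction(v) for v in values]
    return sum(values) / len(values)

def pvariance(values):
    # variance with divisor equal to the number of values
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

# ordered samples drawn with replacement from each population
means1 = [mean(s) for s in product(pop1, repeat=n1)]
means2 = [mean(s) for s in product(pop2, repeat=n2)]

# every pairing of a mean from population 1 with a mean from population 2
diffs = [a - b for a in means1 for b in means2]

print(float(mean(diffs)))       # 3.0
print(float(pvariance(diffs)))  # 2.5
```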


26. Draw all possible samples of size three without replacement from the
population [illegible], 4, [illegible] and 7. Calculate the proportion of odd numbers in each
sample and verify that: (i) $E(\hat{p}) = p$ (ii) $Var(\hat{p}) = \dfrac{pq}{n}\left(\dfrac{N-n}{N-1}\right)$
where $\hat{p}$ and $p$ are the proportions of odd numbers in the sample and the population,
respectively.
27. In a private lodge, there live five friends and their marital status is U, M, M, U,
M where U and M stand for unmarried and married respectively. Find the
proportion of married friends in the population. Take all possible samples of
two friends without replacement from this population and find the proportion
of married friends in each sample. Make the sampling distribution of the
sample proportion and verify that: (i) $\mu_{\hat{p}} = p$ (ii) $\sigma^2_{\hat{p}} = \dfrac{pq}{n}\left(\dfrac{N-n}{N-1}\right)$
Ans: $\mu_{\hat{p}} = 0.6$, $\sigma^2_{\hat{p}} = 0.09$, $p = 0.6$
28. (a) Draw all possible samples of two letters each, with replacement, from the
letters of the word "NEW".
(b) Find the proportion of the letter "E" in each sample.
(c) Make the sampling distribution of the proportions obtained in part (b).
(d) Find the mean and variance of the distribution.
(e) Verify that: (i) $\mu_{\hat{p}} = p$ (ii) $\sigma^2_{\hat{p}} = \dfrac{pq}{n}$
Ans. $\mu_{\hat{p}} = 1/3$, $\sigma^2_{\hat{p}} = 1/9$, $p = 1/3$
29. Suppose a random sample of size $n = 80$ is taken from a population of size
$N = 500$. The proportion in the population being calculated is 72.3%. Find the
mean and standard deviation of the distribution of sample proportions when
sampling is done (i) with replacement (ii) without replacement.
Ans: (i) $\mu_{\hat{p}} = 0.723$, $\sigma_{\hat{p}} = 0.050$ (ii) $\mu_{\hat{p}} = 0.723$, $\sigma_{\hat{p}} = 0.046$
30. Find the mean and standard deviation of the sampling distribution of
proportions for $n = 100$ and a population proportion of
(i) 20% (ii) 40% (iii) 50% (iv) 90%
Ans: (i) $\mu_{\hat{p}} = 0.20$, $\sigma_{\hat{p}} = 0.04$ (ii) $\mu_{\hat{p}} = 0.40$, $\sigma_{\hat{p}} = 0.05$
(iii) $\mu_{\hat{p}} = 0.50$, $\sigma_{\hat{p}} = 0.05$ (iv) $\mu_{\hat{p}} = 0.90$, $\sigma_{\hat{p}} = 0.03$
31. Let $\hat{p}_1$ represent the proportion of even numbers in a random sample of size
$n_1 = 2$ without replacement from a finite population consisting of values 4, 6,
[illegible]. Similarly, let $\hat{p}_2$ represent the proportion of even numbers in a random sample of
size $n_2 = 2$ without replacement from another finite population consisting of
values 2, 2, 5. Form a sampling distribution of $(\hat{p}_1 - \hat{p}_2)$. Verify that:
(i) $E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$ (ii) $Var(\hat{p}_1 - \hat{p}_2) = \dfrac{p_1 q_1}{n_1}\left(\dfrac{N_1 - n_1}{N_1 - 1}\right) + \dfrac{p_2 q_2}{n_2}\left(\dfrac{N_2 - n_2}{N_2 - 1}\right)$
Ans. $E(\hat{p}_1 - \hat{p}_2) = 0$, $Var(\hat{p}_1 - \hat{p}_2) = 1/9$, $p_1 = 2/3$, $p_2 = 2/3$
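
The sampling distribution of a sample proportion can be enumerated in the same way. As a sketch, the Python code below uses the marital-status population U, M, M, U, M, takes every sample of two friends without replacement, and compares the mean and variance of $\hat{p}$ with $p$ and $\dfrac{pq}{n}\left(\dfrac{N-n}{N-1}\right)$. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

status = ["U", "M", "M", "U", "M"]   # U = unmarried, M = married
N, n = len(status), 2

p = Fraction(status.count("M"), N)   # population proportion of married friends
q = 1 - p

# proportion of married friends in every sample of size n without replacement
props = [Fraction(s.count("M"), n) for s in combinations(status, n)]

mu_p = sum(props) / len(props)
var_p = sum((x - mu_p) ** 2 for x in props) / len(props)

print(float(mu_p), float(var_p))                  # 0.6 0.09
print(var_p == p * q / n * Fraction(N - n, N - 1))  # True
```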
32. Solve directly:
(i) Given $N = 1500$, $n = 30$, $\mu = 22.4$. Find $\mu_{\bar{X}}$.
(ii) Given $n = 25$, $\sigma_{\bar{X}} = 2.5$. Find $\sigma^2$.
(iii) Given $N = 310$, $n = 100$, $\mu_{\bar{X}} = 24000$, $\sigma^2 = 5000$. Find $\sigma^2_{\bar{X}}$ (without
replacement).
(iv) Given $\mu = 6$, $\sigma^2 = 8$, $n = 2$. Find $\mu_{\bar{X}}$ and $\mu_{S^2}$ when sampling is done with
replacement and $S^2 = \sum (X - \bar{X})^2/n$.
(v) Given $\mu_{\bar{X}} = 5$, $\mu_{S^2} = 4$, $n = 2$. Find $\mu$ and $\sigma^2$ when sampling is done with
replacement and $S^2 = \sum (X - \bar{X})^2/n$.
(vi) Given $N = 5$, $n = 2$, $\mu = 6$, $\sigma^2 = 8$. Find $\mu_{\bar{X}}$ and $\mu_{S^2}$ when sampling is done
without replacement and $S^2 = \sum (X - \bar{X})^2/n$.
(vii) Given $n_1 = 2$, $n_2 = 2$, $\mu_1 = 6$, $\mu_2 = 2$, $\sigma_1^2 = 2.67$, $\sigma_2^2 = 0.67$. Find $\mu_{\bar{X}_1 - \bar{X}_2}$
and $\sigma^2_{\bar{X}_1 - \bar{X}_2}$.
(viii) Given $n_1 = 49$, $n_2 = 36$, $\mu_{\bar{X}_1 - \bar{X}_2} = 0.3$, $\sigma_1^2 = 0.36$, $\sigma_2^2 = 0.16$. Find $\mu_1 - \mu_2$
and $\sigma_{\bar{X}_1 - \bar{X}_2}$.
(ix) Given $\mu_{\bar{X}_1 - \bar{X}_2} = 4$, $\mu_2 = 6$, $\sigma_2 = 2.25$, $N_1 = 30$, $N_2 = 25$, $n_1 = 4$, $n_2 = 4$,
$\sigma_{\bar{X}_1 - \bar{X}_2} = 6.25$. Find $\mu_1$ and $\sigma_1$ when sampling is done without replacement.
(x) Given $N = 5$, $n = 2$, $\mu_{\hat{p}} = 2/5$. Find $p$ and $\sigma^2_{\hat{p}}$ when sampling is done without
replacement.
(xi) Given $p_1 = 1/2$, $p_2 = 1/3$, $N_1 = 3$, $n_1 = 2$, $N_2 = 3$, $n_2 = 2$. Find $E(\hat{p}_1 - \hat{p}_2)$ and
$S.E.(\hat{p}_1 - \hat{p}_2)$ when sampling is done without replacement.
Ans. (i) 22.4 (ii) 156.25 (iii) 33.98 (iv) 6, 4 (v) 5, 8 (vi) 6, 5
(vii) 4, 1.67 (viii) 0.3, 0.1082 (ix) 10, 13.166 (x) 2/5, 0.09 (xi) 1/6, 0.3436
Chapter
12
STATISTICAL INFERENCE
ESTIMATION
12.1 INTRODUCTION
A person selected at random from a certain place shows the colour, habits and
language of the people of his area. He gives some information about all the people.
We say that he is the representative of all the people of his area. Whatever we gain
from this person is the inference about the lot of people among whom he has been
selected. From a particular person, we gain information about the main group of
people. The information travels from the particular individual to the general group
of people. To get better information about the people, we may select more than one
person out of them. We say that the sample size has been increased. A large sample
usually contains greater information about the population. In our daily life, if an
individual says something, we may not accept it. We may have some doubts about it.
But if many people say something, we feel like accepting whatever they say. The
same logic works in statistical reasoning. Every individual from a statistical
population speaks about the properties of the population. Every drop of water from
an ocean has certain characteristics which lead us to some conclusions about the
water in the ocean. A single drop of water or oil is sometimes sufficient to give a clear
picture about these liquids in the big containers. In statistical studies, if the
population contains individuals whose characteristics are similar, then a small
simple random sample is sufficient to give the required information about the
properties of the population. We decide about the sample size according to the
nature of the population and the nature of the study. The conclusions based on
sample data are related to probability.
12.2 STATISTICAL INFERENCE
The information gained from the sample data is used to reach some conclusions
about the characteristics of the population. This process is called statistical
inference. Statistical inference is based on the principles of the sampling theory.
There are two approaches of statistical inference, namely:
(i) Estimation of parameters (ii) Hypothesis testing or testing of hypothesis.
In this chapter we discuss estimation; hypothesis testing will be discussed
in the next chapter.
12.2.1 APPROACHES OF STATISTICAL INFERENCE
There are two approaches of statistical inference called estimation and testing
of hypothesis. These two approaches are completely different as far as their
conclusions are concerned, but both are based on the theory of probability and
the principles of sampling theory. They start with a sample and then follow
separate paths. Suppose we want to know the percentage of smokers in our
country. This is a problem of estimation. Now suppose a manufacturer of copper
wires of some size claims that the [illegible] of his wires is 10 kg. His claim is to be
tested with the help of some test. This is a problem which comes under
hypothesis testing.

12.3 ESTIMATION
Statistical inference about the unknown values of the population parameters is
called estimation of parameters. Suppose we are interested to know the average life
of tires of a certain firm. This means we want an estimate of something which is not
known to us. It is a problem of estimation. The minimum and maximum cholesterol
level of persons is also a problem of estimation. These estimates are provided by the
sample observations. Estimation of parameters is done by two methods, which are:
(i) Point estimation (ii) Interval estimation.
12.3.1 POINT ESTIMATOR AND POINT ESTIMATE
A point estimate is a value calculated from the sample data. A single value
calculated from a sample or samples is called a point estimate. Consider a simple
random sample of size 4 with values 10, 15, 20 and 25. The mean of this sample is
(10 + 15 + 20 + 25)/4 = 17.5. Thus the sample mean 17.5 is a point estimate of the
unknown population parameter $\mu$. Suppose we are interested to know the percentage of
children under 5 years who take tea regularly with breakfast. We have taken a
simple random sample of 100 children and 60 are found to be habitual of taking tea.
The sample proportion 60/100 = 0.6 is called the point estimate of the unknown
population proportion $p$. If the parameter is $\theta$, the specific value calculated from the
sample is called a point estimate of $\theta$. Suppose we have two samples of sizes $n_1$ and
$n_2$ and we have calculated their proportions $\hat{p}_1$ and $\hat{p}_2$. The difference $(\hat{p}_1 - \hat{p}_2)$ is also
a point estimate of the actual difference between the population parameters. This
unknown parameter may be denoted by $(p_1 - p_2)$.
The word estimator is used in general for a statistic. An estimator is based on the
sample and in general differs from sample to sample. It is a random variable with a
probability distribution called the sampling distribution. If the unknown population
parameter is denoted by $\theta$, its estimator is generally denoted by $\hat{\theta}$ (theta hat). If the
parameter is the population mean $\mu$, the sample mean $\bar{X} = \sum X/n$ (with any value) is
the estimator of $\mu$. The median and mode are also estimators of $\mu$. The sample proportion
$\hat{p}$ is an estimator of the population proportion $p$. The sample variances
$s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$ and $S^2 = \dfrac{\sum (X - \bar{X})^2}{n}$
are estimators of the population variance $\sigma^2$. We have already learnt in
chapter 11 that $E(s^2) = \sigma^2$ and $E(S^2) \ne \sigma^2$. At some later stage we shall call $s^2$ an
unbiased estimator of $\sigma^2$ and $S^2$ a biased estimator of $\sigma^2$.
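
The point estimates in the two examples above can be reproduced with Python's standard `statistics` module. This is only an illustrative sketch. Note that `variance` uses the divisor $n - 1$ (the text's $s^2$) while `pvariance` uses the divisor $n$ (the text's $S^2$); applying `pvariance` to a sample is a deliberate choice here to obtain $S^2$.

```python
from statistics import mean, pvariance, variance

sample = [10, 15, 20, 25]

x_bar = mean(sample)      # point estimate of the population mean
p_hat = 60 / 100          # 60 tea drinkers among 100 sampled children
s2 = variance(sample)     # divisor n - 1
S2 = pvariance(sample)    # divisor n

print(x_bar, p_hat)       # 17.5 0.6
print(round(s2, 2), S2)   # 41.67 31.25
```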
12.3.2 POINT ESTIMATION
Point estimation is a process of getting a single value from the sample as an
estimate of the unknown population parameter. A point estimate, in general, is not
equal to the population parameter. Point estimation is of great importance in
practical life. In our daily life it is quite common that we make use of point
estimates without referring to [illegible]. The percentage of
people living in rented houses is a problem of point estimation. The percentage of
bottles properly filled is provided by a point estimate. The percentage of babies who
are born with physical defects and the percentage of children who do not get
admission in the schools are the areas of point estimation. A serious drawback in
point estimation is that the amount of error cannot be calculated in point estimation.
12.3.3 UNBIASEDNESS
When a large number of random samples of a given size are taken and the
value of the estimator is calculated for each sample, the average of these values may
be equal to the population parameter. An estimator having this property is called an
unbiased estimator. In other words, the estimator $\hat{\theta}$ is called an unbiased estimator of
the parameter $\theta$ if $E(\hat{\theta}) = \theta$. The estimator is a statistic with a probability
distribution which is called the sampling distribution of the statistic. An estimator is
called an unbiased estimator of the population parameter if the mean of its sampling
distribution is equal to the parameter. The sample mean $\bar{X}$ is an unbiased estimator
of the population mean $\mu$. It means $E(\bar{X}) = \mu$. The sample proportion $\hat{p}$ is also an
unbiased estimator of the population proportion $p$ because $E(\hat{p}) = p$. The sample
variance $s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$ is an unbiased estimator of the population variance $\sigma^2$, but
$S^2 = \dfrac{\sum (X - \bar{X})^2}{n}$ is a biased estimator of $\sigma^2$, and the bias is equal to the difference
$E(S^2) - \sigma^2$.
Unbiasedness is one of the important properties of good point estimators. Other
properties of good point estimators are consistency, efficiency and sufficiency. These
properties will not be discussed in this book.
12.ts.4 ISiPCIltTANCE Oir Uh"Ei;',SrlDi{CSS
a tireJCl roie iii strtis'iic:ri ir:feicncc. l"'he next chapter i.s
Un[,'iaBeJne*os ]ri:1,'s
about the testing of irl"potht-'sis wirci'ein rve sliall icarn tirat the h;,rpothesis abor-rt the
populaticn parantttc:'i$ infacl th,: h5:';;ctliesis rl:oril'rire sarnplrng ciistnbution of the
estimator: of that pararncter. if"ivo int'cr that ilic ltoan of t,ire sarnpling distribution
ofX say 150, it means that tlie l1i*at1 cif l.he population aiso 150. This is due to
the unbiaseclness of X. tf X were a biaseiL estin:ator, sorro vcr';v irnportant tcsts of
hypotheses about p would not have hcen pcssible.
12.4 INTERVAL ESTIMATION
If a random interval is calculated so that it contains the unknown parameter with a known probability, then the interval is called a confidence interval estimate or simply a confidence interval for the parameter. The process of finding such intervals is called interval estimation. Interval estimation has gained a lot of importance in statistical inference. It is based on random sampling. Thus the confidence interval constructed is a random term because it is based on the sample data.
A drawback in point estimation is that it does not provide the estimate of error. No assurance is attached to the point estimate. A point estimate is a single value and it is wrong to think that a single value will be equal to the value of the unknown parameter. The interval estimate has some assurance of containing the population parameter. A confidence interval tells us with a known degree of confidence where the population parameter actually lies.
12.4.1 CONFIDENCE COEFFICIENT
The probability attached to the confidence interval is called the confidence coefficient or level of confidence. It is denoted by $1 - \alpha$. If $\alpha$ is specified as 0.05, then $1 - \alpha = 1 - 0.05 = 0.95$ or 95 %. We can speak of the confidence coefficient in terms of unity or in terms of percentage. The confidence coefficients which are commonly used are 90 %, 95 %, 98 % and 99 %.

To make a confidence interval estimate of the parameter $\theta$, we adopt the following procedure.
(i) We take a random sample of size n with observations $X_1, X_2, X_3, \ldots, X_n$ from a population with unknown parameter $\theta$.
(ii) The point estimator of $\theta$ is decided. Let it be $\hat{\theta}$. The point estimate, denoted by $\hat{\theta}_0$, is calculated from the sample.
(iii) The confidence coefficient is decided. Let it be $(1 - \alpha)$.
(iv) Let the interval be denoted by (L, U) where L is the lower limit and U is the upper limit.
(v) A certain procedure of calculating L and U is adopted such that the probability is $(1 - \alpha)$ that the interval (L, U) contains the parameter $\theta$. In symbols, we may write $P(L < \theta < U) = 1 - \alpha$ where $\alpha$ lies between 0 and 1. The interval (L, U) is called the $100(1 - \alpha)$ % confidence interval, and L and U are called the lower and upper confidence limits of the parameter $\theta$. L and U are random terms based on the sample data.
The lower limit L and the upper limit U are calculated from the point estimate $\hat{\theta}_0$. Thus, as a general rule,
$$L = \hat{\theta}_0 - k\,(\text{Standard error of } \hat{\theta}) \quad \text{and} \quad U = \hat{\theta}_0 + k\,(\text{Standard error of } \hat{\theta})$$


where k depends upon the shape of the sampling distribution of $\hat{\theta}$ and the confidence coefficient $1 - \alpha$. The estimator $\hat{\theta}$ may be the sample mean $\bar{X}$, the sample proportion $\hat{p}$, the difference between means $\bar{X}_1 - \bar{X}_2$ or the difference between proportions $\hat{p}_1 - \hat{p}_2$.
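The general rule for the limits can be written as a small helper. This is a minimal sketch: the function name and the numbers in the usage line (estimate 50, k = 1.96, standard error 2) are hypothetical and serve only to show the arithmetic.

```python
def confidence_limits(estimate, k, standard_error):
    """Return (L, U) = estimate -/+ k * standard_error.

    k comes from the sampling distribution (a Z or t table value);
    it is supplied by the caller, not computed here.
    """
    margin = k * standard_error
    return estimate - margin, estimate + margin

# Hypothetical values chosen only for illustration
lower, upper = confidence_limits(estimate=50.0, k=1.96, standard_error=2.0)
print(lower, upper)
```

Every interval in this chapter follows this pattern; only the estimate, the multiplier k and the standard error change from case to case.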
12.5.1 SELECTION OF PROPER CONFIDENCE INTERVAL
For making the confidence interval estimate for some parameter, we have to use the appropriate formula. Some intervals are based on the normal distribution and some are based on the t-distribution. It is in fact the sampling distribution of the statistic which decides the formula. If the sampling distribution of the statistic is a normal distribution, then the standard normal variate Z is used in the interval, and if the distribution is 't', then the random variable 't' is used in the formula. It is important to note that it is the sampling distribution which decides the proper formula. It is not the parent population which decides the interval, though the shape of the population distribution also plays its role in determining the proper interval. We have to examine the following points for making the confidence interval for the population mean $\mu$.

(i) Parent Population
What is the shape of the population which is sampled? Is it normal, approximately normal for practical purposes, or known to be non-normal?
(ii) Sample Size
The sample size n plays an important role in the statistical inference about $\mu$. When n > 30, it is called a large sample size. According to the central limit theorem, the sampling distribution of $\bar{X}$ tends to normality by increasing the sample size.
(iii) $\sigma$ is Known or Unknown
If $\sigma$ is known, the distribution of $\bar{X}$ can be assumed normal even if $n \le 30$, provided the population is normal. If $\sigma$ is unknown and $n \le 30$, the distribution of $\bar{X}$ is not assumed to be normal.
t-DISTRIBUTION
The sampling distribution of $\bar{X}$ forms the t-distribution under the following conditions:
(i) The simple random sample of small size is drawn from a normal population with mean $\mu$. This assumption is very important for any inference about $\bar{X}$.
(ii) The sample $X_1, X_2, X_3, \ldots, X_n$ is selected at random.
(iii) If there are two populations under consideration, both are normal with equal variances.
If $X_1, X_2, X_3, \ldots, X_n$ is a random sample of size n from a normal population with mean $\mu$ and variance $\sigma^2$, and $\bar{X}$ is the sample mean, then the random variable 't' is defined as
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
where s is the sample standard deviation defined by
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
The random variable 't' forms the t-distribution with (n - 1) degrees of freedom (d.f.).
The t-distribution is symmetrical about its mean zero like the normal distribution. The shape of the t-distribution changes by increasing the sample size. When the sample size is sufficiently large, the t-distribution tends to the normal distribution.

Tables are available from which we can read the t-values for given values of $\alpha$ and degrees of freedom. If $\alpha = 0.05$ and the degrees of freedom is 9, then from the t-table we read under column 0.05 and against 9 degrees of freedom. We get 1.833. It is written as $t_{0.05(9)} = 1.833$. Similarly, $t_{0.025(8)} = 2.306$.
Figure 12.1: t-distribution curve centred at t = 0, with the points $-t_{\alpha/2(\text{d.f.})}$ and $t_{\alpha/2(\text{d.f.})}$ marked on the t-scale.
12.6 CONFIDENCE INTERVAL ESTIMATE OF POPULATION MEAN $\mu$ (LARGE SAMPLE)
$\sigma$ Known
Let us consider a population (normal or non-normal) with mean $\mu$ which is unknown and standard deviation $\sigma$ which is assumed to be known. A simple random sample of size n is selected from the population and the sample mean $\bar{X}$ is calculated. When the sample size is large (n > 30), the sampling distribution of $\bar{X}$ is a normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard error $\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$. It is assumed here that the population is infinite or it is very large. The population may or may not be normal. The random variable $\bar{X}$ can be transformed into the standard normal variable Z where
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
The random variable Z can take any value between $-\infty$ and $+\infty$. Let us mark two points $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ on the Z-scale, where $\alpha$ lies between 0 and 1. $-Z_{\alpha/2}$ is a
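point on the left of which the area under the normal distribution of Z is $\alpha/2$, and $Z_{\alpha/2}$ cuts off an area $\alpha/2$ to the right. Thus the area of the normal curve between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is $1 - \alpha$, the total area under the normal curve being unity. Out of all possible values of Z, $100(1 - \alpha)$ % of the values occupy the space marked $1 - \alpha$. Thus the probability is $(1 - \alpha)$ that the random variable Z will take a value between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. This probability statement can be written in symbols as
$$P\left[-Z_{\alpha/2} < Z < Z_{\alpha/2}\right] = 1 - \alpha$$
Putting $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, we get
$$P\left[-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right] = 1 - \alpha$$
Without proof, we write the confidence interval for $\mu$ which is
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Figure 12.2: Standard normal curve with the points $-Z_{\alpha/2}$, 0 and $Z_{\alpha/2}$ marked on the Z-scale.
The lower $100(1 - \alpha)$ % confidence limit is $L = \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and the upper confidence limit is $U = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. The interval can also be written as $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ for $\mu$.
For the 95 % confidence interval for $\mu$, $\alpha = 0.05$, $\alpha/2 = 0.025$ and $Z_{0.025} = 1.96$. The interval is
$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$
Figure 12.3: Standard normal curve with an area of 0.4750 on each side of Z = 0, between Z = -1.96 and Z = 1.96.
For the 99 % confidence interval for $\mu$, $\alpha = 0.01$, $\alpha/2 = 0.005$ and $Z_{0.005} = 2.58$ (from the area table of the normal distribution). The interval is
$$\bar{X} - 2.58\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}$$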

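This interval is wider than the 95 % interval. A very wide interval, which gives the most probable confidence limits for $\mu$, is
$$\bar{X} - 3\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 3\frac{\sigma}{\sqrt{n}}$$
This interval is almost certain to contain the true value of $\mu$.
When the population is finite and sampling is done without replacement, the standard error of $\bar{X}$ is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$. When N is given, the confidence interval for $\mu$ would become
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$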

12.6.1 THE MEANING OF CONFIDENCE INTERVAL
Let us consider the 95 % confidence interval for $\mu$. The interval has $L = \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}$ and $U = \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$. The probability is 0.95 that the random interval (L, U) contains the unknown parameter $\mu$. This confidence interval means that if samples of size n were repeatedly taken from the population and if the interval $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ were calculated for each sample, then about 95 out of 100 such intervals would contain the true value of $\mu$. There are 5 % chances that we shall get an interval which will not cover the value of $\mu$. Suppose $\bar{X} = 80$, $\sigma = 24$ and $n = 36$. Let us calculate the 95 % confidence interval. We get
$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} = 80 - 1.96\left(\frac{24}{\sqrt{36}}\right) = 80 - 7.84 = 72.16$$
and
$$\bar{X} + 1.96\frac{\sigma}{\sqrt{n}} = 80 + 1.96\left(\frac{24}{\sqrt{36}}\right) = 80 + 7.84 = 87.84$$
Thus for this sample, the 95 % confidence interval for $\mu$ is (72.16 to 87.84). At this stage we cannot say that the probability is 0.95 that the interval (72.16, 87.84) contains $\mu$. We call it a 95 % confidence interval. Before tossing a die, we say that the probability is 1/6 that 4 will appear on the die. But when a die has been tossed and the face 4 has been observed, then we are not in a situation of probability. Now we cannot say that the probability of getting 4 on the die is 1/6. If 2 % of the drivers violate the traffic rules, then the probability is 2/100 = 0.02 that a driver will violate the traffic rules. But when a driver has made a mistake, it has no concern with the probability of 0.02. In such situations, when something


has happened or it has not happened, it is now ridiculous to talk about the probability of its happening. When Mr. A has died, we do not say that the probability of his survival is, say, 0.90. When a confidence interval has been constructed from the sample data, it is now something which has happened or which has been determined. The different possibilities are not involved now. The calculated interval is now not a random variable. It is the realized value of the random interval.
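The repeated-sampling meaning of a confidence interval can be illustrated by simulation. This is a sketch under stated assumptions: the population is taken to be normal with a true mean of 80 and $\sigma = 24$, samples have size 36, and the function name, trial count and seed are hypothetical choices.

```python
import math
import random

def coverage(mu=80.0, sigma=24.0, n=36, z=1.96, trials=10000, seed=7):
    """Fraction of simulated 95 % intervals that contain the true mean.

    mu, sigma, n, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(rate)  # expected to be close to 0.95
```

Each simulated interval either contains the true mean or it does not; the figure 0.95 describes the long-run proportion of intervals that do, not any single calculated interval.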
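Example 12.1.
(a) An electrical firm manufactures light bulbs that have a length of life with mean $\mu$ and a standard deviation of 40 hours. If a sample of 100 bulbs has an average life of 780 hours, find a 95 % confidence interval for the population mean of all bulbs produced by this firm.
(b) A random sample of size n = 400 is selected without replacement from a population of size N = 2000 with $\sigma = 4$, and the sample mean is found to be $\bar{X} = 80$. Construct a 90 % confidence interval for the true mean of the population.
Solution:
(a) A $100(1 - \alpha)$ % confidence interval for $\mu$ is
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Here $\bar{X} = 780$, $\sigma = 40$, n = 100, $1 - \alpha = 0.95$ or $\alpha = 0.05$ and $\alpha/2 = 0.025$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.025} = 1.96$.
Hence the 95 % confidence interval for $\mu$ is
$$780 - 1.96\left(\frac{40}{\sqrt{100}}\right) < \mu < 780 + 1.96\left(\frac{40}{\sqrt{100}}\right)$$
$$780 - 7.84 < \mu < 780 + 7.84$$
$$772.16 < \mu < 787.84$$
(b) A $100(1 - \alpha)$ % confidence interval for $\mu$ is
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$
Here n = 400, N = 2000, $\sigma = 4$, $\bar{X} = 80$, $1 - \alpha = 0.90$ or $\alpha = 0.10$ and $\alpha/2 = 0.05$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.05} = 1.645$.
Hence the 90 % confidence interval for $\mu$ is
$$80 - 1.645\left(\frac{4}{\sqrt{400}}\right)\sqrt{\frac{2000 - 400}{2000 - 1}} < \mu < 80 + 1.645\left(\frac{4}{\sqrt{400}}\right)\sqrt{\frac{2000 - 400}{2000 - 1}}$$
$$80 - 0.294 < \mu < 80 + 0.294$$
$$79.706 < \mu < 80.294$$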
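The two parts of Example 12.1 can be reproduced with a short function. This is a sketch, not the book's method: the function name is hypothetical, and the Z value is computed with the Python standard library instead of being read from the area table, so it carries a few more decimals than 1.96 or 1.645.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence, N=None):
    """Confidence interval for the mean with sigma known.

    If the population size N is given, the finite population
    correction sqrt((N - n) / (N - 1)) is applied.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return xbar - z * se, xbar + z * se

part_a = z_interval(780, 40, 100, 0.95)          # Example 12.1 (a)
part_b = z_interval(80, 4, 400, 0.90, N=2000)    # Example 12.1 (b)
print(part_a, part_b)
```

Rounded to the precision used in the solution, the results agree with the limits obtained by hand.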

$\sigma$ Unknown
In the previous article it was assumed that $\sigma$ is known. In practical situations, $\sigma$ is usually not known. When $\sigma$ is not known, we can replace it by the sample standard deviation s. In this case the confidence interval for $\mu$ is
$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
It is important to note that this interval estimate can be used only when n is large. The population may or may not be normal. When the population is finite, the interval for $\mu$ would become
$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$
This interval can be calculated when N is given.

Example 12.2.
The heights of a random sample of 50 college students showed a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters. Construct a 98 % confidence interval for the mean height of all college students.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu$ is
$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
Here $\bar{X} = 174.5$, s = 6.9, n = 50, $1 - \alpha = 0.98$ or $\alpha = 0.02$ and $\alpha/2 = 0.01$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.01} = 2.326$.
Hence the 98 % confidence interval for $\mu$ is
$$174.5 - 2.326\left(\frac{6.9}{\sqrt{50}}\right) < \mu < 174.5 + 2.326\left(\frac{6.9}{\sqrt{50}}\right)$$
$$174.5 - 2.27 < \mu < 174.5 + 2.27$$
$$172.23 < \mu < 176.77$$
Example 12.3.
The systolic blood pressure of 90 men has a mean of 128.9 mm of mercury and a standard deviation of 17 mm of mercury. Assuming that these are a random sample of blood pressures, calculate a 99 % confidence interval for the mean blood pressure in the population.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu$ is
$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$
Here $\bar{X} = 128.9$, s = 17, n = 90, $1 - \alpha = 0.99$ or $\alpha = 0.01$ and $\alpha/2 = 0.005$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$.
Hence the 99 % confidence interval for $\mu$ is
$$128.9 - 2.575\left(\frac{17}{\sqrt{90}}\right) < \mu < 128.9 + 2.575\left(\frac{17}{\sqrt{90}}\right)$$
$$128.9 - 4.61 < \mu < 128.9 + 4.61$$
$$124.29 < \mu < 133.51$$

12.7 CONFIDENCE INTERVAL ESTIMATE FOR POPULATION MEAN
$\sigma$ Known, Population Normal
Here we are stressing that the population is normal. If n is small and $\sigma$ is known, then the random variable Z can be used in the interval provided the population is normal. The confidence interval for $\mu$ is the same as for the large sample. For the convenience of students, the $100(1 - \alpha)$ % confidence interval for $\mu$ is written again here, i.e.,
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

This is an important case in which the random variable Z cannot be used. When n is small and the population is normal with unknown $\sigma$, the random variable
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has the t-distribution with (n - 1) degrees of freedom. Let us mark two points $-t_{\alpha/2(n-1)}$ and $t_{\alpha/2(n-1)}$ on the t-scale in the figure. Using tables of the t-distribution, we can find $-t_{\alpha/2(n-1)}$ and $t_{\alpha/2(n-1)}$. The area of the t-distribution between $-t_{\alpha/2(n-1)}$ and $t_{\alpha/2(n-1)}$ is $(1 - \alpha)$.
Figure 12.4: t-distribution curve with the points $-t_{\alpha/2(n-1)}$, t = 0 and $t_{\alpha/2(n-1)}$ marked on the t-scale.
Thus the probability is $(1 - \alpha)$ that the random variable 't' will fall between $-t_{\alpha/2(n-1)}$ and $t_{\alpha/2(n-1)}$. We can write the probability statement as:
$$P\left[-t_{\alpha/2(n-1)} < t < t_{\alpha/2(n-1)}\right] = 1 - \alpha$$
Putting the value of t, we have
$$P\left[-t_{\alpha/2(n-1)} < \frac{\bar{X} - \mu}{s/\sqrt{n}} < t_{\alpha/2(n-1)}\right] = 1 - \alpha$$
From this inequality within the brackets we can get the confidence interval for $\mu$. Thus the $100(1 - \alpha)$ % confidence interval for $\mu$ is
$$\bar{X} - t_{\alpha/2(n-1)}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2(n-1)}\frac{s}{\sqrt{n}}$$
Table 12.1 can be used to decide whether the random variable 'Z' or 't' is to be used in making the confidence interval for $\mu$.
Table 12.1.
                   n - Large      n - Small
σ known            Z              Z (normal population)
σ unknown          Z              t (normal population)
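The choice between 'Z' and 't' described in this section can be sketched as a small decision function. The function name is hypothetical, the threshold n > 30 for a large sample is the one used earlier in the chapter, and returning None for a small sample from a non-normal population is an assumption made here because that case is not treated in the text.

```python
def statistic_for_mean(n, sigma_known, population_normal):
    """Pick 'Z' or 't' for a confidence interval for the mean.

    Returns None for a small sample from a non-normal population,
    a case this sketch does not handle.
    """
    if n > 30:
        return "Z"            # large sample: Z whether or not sigma is known
    if not population_normal:
        return None           # small sample, non-normal population
    return "Z" if sigma_known else "t"

print(statistic_for_mean(100, sigma_known=False, population_normal=False))
print(statistic_for_mean(20, sigma_known=True, population_normal=True))
print(statistic_for_mean(20, sigma_known=False, population_normal=True))
```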

Example 12.4.
A random sample of n = 20 from a normal population gives the sample mean 140 and the sample standard deviation s = 8. Construct a 98 % confidence interval for the population mean.
Solution:
Here n is small, therefore the confidence interval is based on the t-distribution and a $100(1 - \alpha)$ % confidence interval for $\mu$ is
$$\bar{X} - t_{\alpha/2(n-1)}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2(n-1)}\frac{s}{\sqrt{n}}$$
Here $\bar{X} = 140$, s = 8, n = 20, $1 - \alpha = 0.98$ or $\alpha = 0.02$ and $\alpha/2 = 0.01$.
From the t-table, we have $t_{\alpha/2(n-1)} = t_{0.01(19)} = 2.539$.
Hence the 98 % confidence interval for $\mu$ is
$$140 - 2.539\left(\frac{8}{\sqrt{20}}\right) < \mu < 140 + 2.539\left(\frac{8}{\sqrt{20}}\right)$$
$$140 - 4.54 < \mu < 140 + 4.54$$
$$135.46 < \mu < 144.54$$
Example 12.5.
A machine is producing metal pieces that are cylindrical in shape. A sample of 9 pieces is taken and their diameters are 1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01 and 1.03 centimeters. Find a 99 % confidence interval for the mean diameter of pieces produced by this machine, assuming an approximately normal population.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu$ is
$$\bar{X} - t_{\alpha/2(n-1)}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2(n-1)}\frac{s}{\sqrt{n}}$$
Here $\sum X = 9.05$, $\sum X^2 = 9.1051$, n = 9, $\bar{X} = \frac{9.05}{9} = 1.0056$,
$$s^2 = \frac{1}{n - 1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right] = \frac{1}{8}\left[9.1051 - \frac{(9.05)^2}{9}\right] = 0.0006, \quad s = 0.0245,$$
$1 - \alpha = 0.99$ or $\alpha = 0.01$ and $\alpha/2 = 0.005$.
From the t-table, we have $t_{\alpha/2(n-1)} = t_{0.005(8)} = 3.355$.
Hence the 99 % confidence interval for $\mu$ is
$$1.0056 - 3.355\left(\frac{0.0245}{\sqrt{9}}\right) < \mu < 1.0056 + 3.355\left(\frac{0.0245}{\sqrt{9}}\right)$$
$$1.0056 - 0.0274 < \mu < 1.0056 + 0.0274$$
$$0.9782 < \mu < 1.0330$$
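The arithmetic of Example 12.5 can be checked directly from the raw diameters. This is a sketch under an assumption: the nine diameters are taken to be the eight legible values plus 1.01, which is the value implied by n = 9 and $\sum X = 9.05$. The table value 3.355 is entered by hand because the Python standard library has no t-distribution quantile function.

```python
import math
from statistics import mean, stdev

# Diameters in centimeters; the first value (1.01) is inferred from
# n = 9 and sum(X) = 9.05, so treat it as an assumption.
diameters = [1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01, 1.03]
t_value = 3.355  # t_{0.005(8)} read from the t-table

n = len(diameters)
xbar = mean(diameters)
s = stdev(diameters)  # sample standard deviation, divisor n - 1
margin = t_value * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(lower, upper)
```

Because the code does not round s to 0.0245 before multiplying, its limits may differ from the hand calculation in the fourth decimal place.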
12.8 CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS (LARGE SAMPLES)
$\sigma_1^2$ and $\sigma_2^2$ known
Consider two large populations with means $\mu_1$ and $\mu_2$ which are unknown and variances $\sigma_1^2$ and $\sigma_2^2$ which are assumed to be known. The populations may or may not be normal. Two independent random samples of sizes $n_1$ and $n_2$ are selected from the populations and the sample means $\bar{X}_1$ and $\bar{X}_2$ are calculated. The point estimator of the difference between $\mu_1$ and $\mu_2$ is given by the statistic $\bar{X}_1 - \bar{X}_2$. The statistic $\bar{X}_1 - \bar{X}_2$ is an unbiased estimator of $\mu_1 - \mu_2$ and has the normal distribution with mean $\mu_1 - \mu_2$ and standard error $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. The standard normal variable of $(\bar{X}_1 - \bar{X}_2)$ is
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
The probability is $(1 - \alpha)$ that the value of the random variable Z will fall between two selected points $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. We can write the probability statement as
$$P\left[-Z_{\alpha/2} < Z < Z_{\alpha/2}\right] = 1 - \alpha$$
or
$$P\left[-Z_{\alpha/2} < \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} < Z_{\alpha/2}\right] = 1 - \alpha$$

We can simplify this inequality to get the $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ which is
$$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
This interval estimate can be written as
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
If we want to get the confidence interval for $\mu_2 - \mu_1$, we shall use the interval in which the difference $(\bar{X}_2 - \bar{X}_1)$ is used. Thus the confidence interval for $\mu_2 - \mu_1$ is
$$(\bar{X}_2 - \bar{X}_1) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
In numerical questions, the values of $\bar{X}_1$ and $\bar{X}_2$ are usually positive but the difference $(\bar{X}_1 - \bar{X}_2)$ or $(\bar{X}_2 - \bar{X}_1)$ may be positive or negative. The confidence limits of $(\mu_1 - \mu_2)$ or $(\mu_2 - \mu_1)$ are sometimes negative.

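Example 12.6.
A random sample of size $n_1 = 25$ taken from a normal population with a standard deviation $\sigma_1 = 5$ has a mean $\bar{X}_1 = 80$. A second random sample of size $n_2 = 36$, taken from a different normal population with a standard deviation $\sigma_2 = 3$, has a mean $\bar{X}_2 = 75$. Find a 94 % confidence interval for $\mu_1 - \mu_2$.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Here $n_1 = 25$, $\bar{X}_1 = 80$, $\sigma_1 = 5$, $n_2 = 36$, $\bar{X}_2 = 75$, $\sigma_2 = 3$, $1 - \alpha = 0.94$ or $\alpha = 0.06$ and $\alpha/2 = 0.03$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.03} = 1.88$.
Hence the 94 % confidence interval for $\mu_1 - \mu_2$ is
$$(80 - 75) - 1.88\sqrt{\frac{25}{25} + \frac{9}{36}} < \mu_1 - \mu_2 < (80 - 75) + 1.88\sqrt{\frac{25}{25} + \frac{9}{36}}$$
$$5 - 2.1 < \mu_1 - \mu_2 < 5 + 2.1$$
$$2.9 < \mu_1 - \mu_2 < 7.1$$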

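$\sigma_1^2$ and $\sigma_2^2$ unknown
When the population variances $\sigma_1^2$ and $\sigma_2^2$ are not given, they are estimated by the sample variances $s_1^2$ and $s_2^2$. The populations may or may not be normal. The confidence interval for $(\mu_1 - \mu_2)$ becomes
$$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$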
Example 12.7.
Construct a 95 % confidence interval for the true difference between the average time between breakdowns of two kinds of devices, given that a random sample of 40 devices of type A on the average lasted 208 hours of continuous use between breakdowns with a standard deviation of 26 hours, and that a random sample of 50 devices of type B lasted on the average 192 hours with a standard deviation of 22 hours.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Here $n_1 = 40$, $\bar{X}_1 = 208$, $s_1 = 26$, $s_1^2 = 676$,
$n_2 = 50$, $\bar{X}_2 = 192$, $s_2 = 22$, $s_2^2 = 484$,
$1 - \alpha = 0.95$ or $\alpha = 0.05$ and $\alpha/2 = 0.025$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.025} = 1.96$.
Hence the 95 % confidence interval for $\mu_1 - \mu_2$ is
$$(208 - 192) - 1.96\sqrt{\frac{676}{40} + \frac{484}{50}} < \mu_1 - \mu_2 < (208 - 192) + 1.96\sqrt{\frac{676}{40} + \frac{484}{50}}$$
$$16 - 10.1 < \mu_1 - \mu_2 < 16 + 10.1$$
$$5.9 < \mu_1 - \mu_2 < 26.1$$
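Examples 12.6 and 12.7 use the same formula, once with known population variances and once with sample variances. The sketch below covers both; the function name is hypothetical and the Z value is computed with the standard library instead of being read from the table, so small differences in later decimals are expected.

```python
import math
from statistics import NormalDist

def two_mean_z_interval(x1, x2, var1, var2, n1, n2, confidence):
    """Large-sample interval for mu1 - mu2.

    var1 and var2 are the population variances when known,
    otherwise the sample variances.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

ex_12_6 = two_mean_z_interval(80, 75, 5**2, 3**2, 25, 36, 0.94)
ex_12_7 = two_mean_z_interval(208, 192, 26**2, 22**2, 40, 50, 0.95)
print(ex_12_6, ex_12_7)
```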


12.9 CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS, POPULATIONS NORMAL (SMALL SAMPLES)
$\sigma_1^2$ and $\sigma_2^2$ Known
When the populations are normal and their variances are known, the formula for the confidence interval for $(\mu_1 - \mu_2)$ for small samples is the same as for large samples. To keep the different confidence intervals in their proper order, the interval is highlighted in this section and is written here again. Thus the $100(1 - \alpha)$ % confidence interval for $(\mu_1 - \mu_2)$ is
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$\sigma_1^2$ and $\sigma_2^2$ Unknown
When $\sigma_1^2$ and $\sigma_2^2$ are not known, the random variable Z cannot be used. When random samples are drawn from normal populations, with small sample sizes, the statistic $(\bar{X}_1 - \bar{X}_2)$ has the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom. But we have to make another assumption that the variances of the populations are equal, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (say). The population variance $\sigma^2$, which is common for both the populations, can be estimated by a pooled estimator $s_p^2$ where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}$$
$s_1^2$ and $s_2^2$ are the unbiased sample variances. Thus, if independent samples of small sizes are drawn from the normal populations with $\sigma_1^2 = \sigma_2^2$, the statistic $(\bar{X}_1 - \bar{X}_2)$ has the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom, where
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
When $n_1 = n_2 = n$, we can write
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{2}{n}}}$$
Figure 12.5: t-distribution curve with the points $-t_{\alpha/2(\text{d.f.})}$, t = 0 and $t_{\alpha/2(\text{d.f.})}$ marked on the t-scale.
The probability is $(1 - \alpha)$ that the random variable t will fall between $-t_{\alpha/2(n_1 + n_2 - 2)}$ and $t_{\alpha/2(n_1 + n_2 - 2)}$. The probability statement for the random variable t is:
$$P\left[-t_{\alpha/2(n_1 + n_2 - 2)} < t < t_{\alpha/2(n_1 + n_2 - 2)}\right] = 1 - \alpha$$

Now, we directly write the $100(1 - \alpha)$ % confidence interval for $(\mu_1 - \mu_2)$ which is
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2(n_1 + n_2 - 2)}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Example 12.8.
The following summary statistics are recorded about the strength of two types of synthetic rubber.
Type I     $n_1 = 16$    $\bar{X}_1 = 15.3$    $s_1 = 4.4$
Type II    $n_2 = 9$     $\bar{X}_2 = 13.8$    $s_2 = 3.9$
Assume that the distributions of strengths for the two types of rubber are normal with equal variances. Compute a 99 % confidence interval for the difference $\mu_1 - \mu_2$.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2(\nu)}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2(\nu)}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Here $n_1 = 16$, $\bar{X}_1 = 15.3$, $s_1 = 4.4$, $s_1^2 = 19.36$,
$n_2 = 9$, $\bar{X}_2 = 13.8$, $s_2 = 3.9$, $s_2^2 = 15.21$,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(16 - 1)(19.36) + (9 - 1)(15.21)}{16 + 9 - 2} = \frac{290.4 + 121.68}{23} = \frac{412.08}{23} = 17.916, \quad s_p = \sqrt{17.916} = 4.233$$
$1 - \alpha = 0.99$ or $\alpha = 0.01$ and $\alpha/2 = 0.005$, $\nu = n_1 + n_2 - 2 = 16 + 9 - 2 = 23$.
From the t-table, we have $t_{\alpha/2(\nu)} = t_{0.005(23)} = 2.807$.
Hence the 99 % confidence interval for $\mu_1 - \mu_2$ is
$$(15.3 - 13.8) - (2.807)(4.233)\sqrt{\frac{1}{16} + \frac{1}{9}} < \mu_1 - \mu_2 < (15.3 - 13.8) + (2.807)(4.233)\sqrt{\frac{1}{16} + \frac{1}{9}}$$
$$1.5 - 4.95 < \mu_1 - \mu_2 < 1.5 + 4.95$$
$$-3.45 < \mu_1 - \mu_2 < 6.45$$
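The pooled-variance calculation of Example 12.8 can be sketched as follows. The function name is hypothetical, and the t-table value 2.807 is passed in by hand because the standard library does not provide t-distribution quantiles.

```python
import math

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_value):
    """Small-sample interval for mu1 - mu2 assuming equal variances.

    t_value is the table value t_{alpha/2} with n1 + n2 - 2 d.f.
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_value * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Example 12.8: t_{0.005(23)} = 2.807 from the t-table
lower, upper = pooled_t_interval(15.3, 4.4, 16, 13.8, 3.9, 9, t_value=2.807)
print(lower, upper)
```

The interval contains zero, so at this level of confidence the data do not rule out equal mean strengths for the two types of rubber.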



12.10 CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS, DEPENDENT SAMPLES (PAIRED OBSERVATIONS)
Suppose we give a test to a sample of students and the marks obtained by them are denoted by X where X takes the values $X_1, X_2, X_3, \ldots, X_n$. The students are given some extra coaching and again they are given a test of the same difficulty, and the marks obtained by them are denoted by Y where Y takes the values $Y_1, Y_2, Y_3, \ldots, Y_n$. The marks obtained in the first test are called 'before' and the marks obtained in the second test are called 'after' observations. These two sets of marks are in pairs like $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots, (X_n, Y_n)$ and are called paired observations. Obviously the Y values depend upon the X values, hence the term 'dependent samples'.
Let us write the paired observations in the following table and calculate the difference $d_i$ for each pair.
$X_i$      $Y_i$      $d_i = X_i - Y_i$
$X_1$      $Y_1$      $X_1 - Y_1 = d_1$
$X_2$      $Y_2$      $X_2 - Y_2 = d_2$
$X_3$      $Y_3$      $X_3 - Y_3 = d_3$
...        ...        ...
$X_n$      $Y_n$      $X_n - Y_n = d_n$

The mean of the 'd' values is denoted by $\bar{d}$ where $\bar{d} = \sum d_i / n$. We can think of a population of $X_i$ and $Y_i$ observations with means $\mu_1$ and $\mu_2$, and the population of random differences $d_i$ with mean $\mu_d$ and standard deviation $\sigma_d$. It is required to calculate the confidence interval for the mean $\mu_d$. The distribution of $\bar{d}$ has the t-distribution with (n - 1) degrees of freedom, with mean $\mu_d$ and standard error $\frac{\sigma_d}{\sqrt{n}}$. The random variable $\bar{d}$ can be transformed into the random variable t where
$$t = \frac{\bar{d} - \mu_d}{\sigma_d/\sqrt{n}}$$
The standard deviation $\sigma_d$ is unknown and is replaced by its sample estimate $s_d$ where
$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$
Thus
$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$$
The random variable t lies between $-t_{\alpha/2(n-1)}$ and $t_{\alpha/2(n-1)}$ with a probability of $(1 - \alpha)$. We can write the probability statement as
$$P\left[-t_{\alpha/2(n-1)} < t < t_{\alpha/2(n-1)}\right] = 1 - \alpha$$
or
$$P\left[-t_{\alpha/2(n-1)} < \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} < t_{\alpha/2(n-1)}\right] = 1 - \alpha$$
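The terms within the brackets can be written as:
$$P\left[\bar{d} - t_{\alpha/2(n-1)}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{\alpha/2(n-1)}\frac{s_d}{\sqrt{n}}\right] = 1 - \alpha$$
Thus the $100(1 - \alpha)$ % confidence interval for $\mu_d$ is
$$\bar{d} \pm t_{\alpha/2(n-1)}\frac{s_d}{\sqrt{n}}$$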
Example 12.9.
The following data give paired yields of two varieties of wheat. Each pair was planted in a different locality.
Locality      1     2     3     4     5
Variety I     40    25    37    43    46
Variety II    47    27    33    40    52
Compute a 95 % confidence interval for the mean difference between the yields of the two varieties, assuming the differences of yields to be approximately normally distributed.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_d = \mu_1 - \mu_2$ is
$$\bar{d} - t_{\alpha/2(n-1)}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{\alpha/2(n-1)}\frac{s_d}{\sqrt{n}}$$
The necessary calculations are given below:
$X_1$                  40    25    37    43    46
$X_2$                  47    27    33    40    52
$d_i = X_1 - X_2$      -7    -2     4     3    -6     $\sum d_i = -8$
$d_i^2$                49     4    16     9    36     $\sum d_i^2 = 114$
$$\bar{d} = \frac{\sum d_i}{n} = \frac{-8}{5} = -1.6$$
$$s_d^2 = \frac{1}{n - 1}\left[\sum d_i^2 - \frac{(\sum d_i)^2}{n}\right] = \frac{1}{4}\left[114 - \frac{(-8)^2}{5}\right] = \frac{1}{4}[101.2] = 25.3, \quad s_d = 5.03$$
$1 - \alpha = 0.95$ or $\alpha = 0.05$ and $\alpha/2 = 0.025$.
From the t-table, we have $t_{\alpha/2(n-1)} = t_{0.025(4)} = 2.776$.
Hence the 95 % confidence interval for $\mu_d = \mu_1 - \mu_2$ is
$$-1.6 - 2.776\left(\frac{5.03}{\sqrt{5}}\right) < \mu_d < -1.6 + 2.776\left(\frac{5.03}{\sqrt{5}}\right)$$
$$-1.6 - 6.24 < \mu_d < -1.6 + 6.24$$
$$-7.84 < \mu_d < 4.64$$
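The paired calculation of Example 12.9 can be reproduced from the yields themselves. This is a sketch: the yields are entered as read from the example (five localities), and the t-table value 2.776 for 4 degrees of freedom is supplied by hand.

```python
import math
from statistics import mean, stdev

# Paired yields as read from Example 12.9 (one pair per locality)
variety_1 = [40, 25, 37, 43, 46]
variety_2 = [47, 27, 33, 40, 52]
t_value = 2.776  # t_{0.025(4)} read from the t-table

d = [a - b for a, b in zip(variety_1, variety_2)]  # differences d_i
n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation of the differences
margin = t_value * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(d, d_bar, lower, upper)
```

Working with the differences turns the two dependent samples into a single sample, so the one-sample t interval applies.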

12.11 PROPORTION
Suppose a population is divided into two groups. The observations in the first group are called 'successes' and the observations of the second group are called 'failures'. For example, the people may be divided into literates and illiterates. The proportion of successes in the population is defined as
$$\text{proportion of successes} = \frac{\text{number of successes in the population}}{\text{total number of observations in the population}}$$
This proportion is denoted by p. The proportion of 'failures' is denoted by q, and q = 1 - p or q + p = 1. Let us see how q + p = 1. Let N denote the total number of observations in the population. We have
N = number of successes + number of failures. Dividing both sides by N,
$$\frac{N}{N} = \frac{\text{number of successes}}{N} + \frac{\text{number of failures}}{N}$$
$$1 = p + q$$
Suppose a random sample of size n is selected from the population. Let there be X successes in the sample. The ratio X/n is the sample proportion and is denoted by $\hat{p}$. Thus $\hat{p} = X/n$, where $\hat{p}$ is a random variable and X is also a random variable.
POINT ESTIMATE
The sample proportion $\hat{p}$ calculated from a sample is the point estimate of the population proportion p. The statistic $\hat{p}$ is an unbiased estimator of p. Hence $E(\hat{p}) = p$.
12.12 CONFIDENCE INTERVAL ESTIMATE FOR POPULATION PROPORTION p (LARGE SAMPLE)

Suppose a population proportion is p which is unknown. A random sample of size n ($n \ge 30$) is selected from the population and the sample proportion $\hat{p}$ is calculated. The statistic $\hat{p}$ is the estimator of p. The distribution of $\hat{p}$ is normal with mean $\mu_{\hat{p}} = p$ and standard error $\sqrt{\frac{pq}{n}}$. Thus the random variable $\hat{p}$ can be transformed into the random variable Z, where
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$$
When n is large, the terms p and q in the denominator can be replaced by their sample estimates $\hat{p}$ and $\hat{q}$. Thus
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}}$$

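We take two points on the Z-scale. These are $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. The area of the normal curve between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is $(1 - \alpha)$. The random variable Z will fall between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ with a probability of $(1 - \alpha)$. This statement can be expressed in the following form:
$$P\left[-Z_{\alpha/2} < Z < Z_{\alpha/2}\right] = 1 - \alpha \quad \text{or} \quad P\left[-Z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}} < Z_{\alpha/2}\right] = 1 - \alpha$$
Figure 12.6: Standard normal curve with the points $-Z_{\alpha/2}$, Z = 0 and $Z_{\alpha/2}$ marked on the Z-scale.
The terms within the brackets can be written as
$$P\left[\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right] = 1 - \alpha$$
Thus the limits of the $100(1 - \alpha)$ % confidence interval estimate for p are
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \quad \text{and} \quad \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
For the 95 % confidence interval we have $\alpha = 0.05$, $\alpha/2 = 0.025$, $Z_{0.025} = 1.96$ and $-Z_{0.025} = -1.96$. Thus the limits of the 95 % confidence interval for p are $\hat{p} - 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}}$ and $\hat{p} + 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}}$. For the most probable confidence limits we take Z = 3.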

Example 12.10.
A random sample of 200 persons from a city was interviewed and 50 of them were found to be literate. Calculate a 90 % confidence interval for the proportion of literate persons in the city. Also calculate a confidence interval for the proportion of illiterate persons in the city.
Solution:
A $100(1 - \alpha)$ % confidence interval for p (literate persons) is
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Here n = 200, X = 50 (number of literate persons), $\hat{p} = \frac{X}{n} = \frac{50}{200} = 0.25$, $\hat{q} = 1 - \hat{p} = 0.75$, $1 - \alpha = 0.90$ or $\alpha = 0.10$ and $\alpha/2 = 0.05$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.05} = 1.645$.
Hence the 90 % confidence interval for p (literate persons) is
$$0.25 - 1.645\sqrt{\frac{(0.25)(0.75)}{200}} < p < 0.25 + 1.645\sqrt{\frac{(0.25)(0.75)}{200}}$$
$$0.25 - 0.05 < p < 0.25 + 0.05$$
$$0.2 < p < 0.3$$
Also, a $100(1 - \alpha)$ % confidence interval for p (illiterate persons) is
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Here n = 200, X = 150 (number of illiterate persons), $\hat{p} = \frac{X}{n} = \frac{150}{200} = 0.75$, $\hat{q} = 1 - \hat{p} = 0.25$.
Hence the 90 % confidence interval for p (illiterate persons) is
$$0.75 - 1.645\sqrt{\frac{(0.75)(0.25)}{200}} < p < 0.75 + 1.645\sqrt{\frac{(0.75)(0.25)}{200}}$$
$$0.75 - 0.05 < p < 0.75 + 0.05$$
$$0.7 < p < 0.8$$
12.13 CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS (LARGE SAMPLES)
Suppose there are two populations having proportions $p_1$ and $p_2$ which are unknown. It is required to calculate the confidence interval for the difference $(p_1 - p_2)$. Two independent random samples of sizes $n_1$ and $n_2$ are selected from the populations and the sample proportions are calculated, which are $\hat{p}_1$ and $\hat{p}_2$ respectively. The statistic $(\hat{p}_1 - \hat{p}_2)$ is the estimator of the parameter $(p_1 - p_2)$. When $n_1$ and $n_2$ are large, the random variable $(\hat{p}_1 - \hat{p}_2)$ has the normal distribution with mean $(p_1 - p_2)$ and standard error $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. The standard normal random variable Z is written as
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
The probability is $(1 - \alpha)$ that the random variable Z will take on a value between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. This statement can be written as below:
$$P\left[-Z_{\alpha/2} < Z < Z_{\alpha/2}\right] = 1 - \alpha \quad \text{or} \quad P\left[-Z_{\alpha/2} < \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}} < Z_{\alpha/2}\right] = 1 - \alpha$$
Cha Statistical Inference Estimation 101
The terrns within the brnckcts can be written as:

u[,U,- fin - zi P rQr-;,=.t)r


PzQi
;; -Pz"(0,-0J+
Thris 100 (1 - a) % confidence interval estimate for (p, p,
- is

(ff, - $J - ,t\F.T< p - pc < (0,- 0, * ,;VT-Y


But the termB Pr, Qr, p, attd qr al'e for the populatione and are unknown. For
om
large sample sizes, they can he estirnaleel by their sample ostimatee which arc
ff,, ff,,

t . c
o
$, anrl Q, respucti'clv.'l'hus Lhc co,fidcnca lirnics for (p,- py) are:

p
,AN AA

s
* /P rQ r
'Ls'\ l.:- PzQ,r
Lower Limit (], - $r) - +

, ZtV? - Tg
-

o
f"-i-

l
A A

Upper Lirnir = (ii, -


b
6,rl
Example 12.11.
Consider two pain relieving drugs compared on two independent samples of 1000 individuals each. Suppose 750 of those individuals receiving drug I and 800 of those receiving drug II reported some pain relief. Construct a 90 % confidence interval for the difference between population proportions.
Solution:
A $100(1 - \alpha)$ % confidence interval for $p_1 - p_2$ is

$$(\hat p_1 - \hat p_2) - Z_{\alpha/2}\sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}} < p_1 - p_2 < (\hat p_1 - \hat p_2) + Z_{\alpha/2}\sqrt{\frac{\hat p_1 \hat q_1}{n_1} + \frac{\hat p_2 \hat q_2}{n_2}}$$

Here $n_1 = 1000$, $X_1 = 750$, $\hat p_1 = X_1/n_1 = 750/1000 = 0.75$, $\hat q_1 = 1 - \hat p_1 = 0.25$,
$n_2 = 1000$, $X_2 = 800$, $\hat p_2 = X_2/n_2 = 800/1000 = 0.80$, $\hat q_2 = 1 - \hat p_2 = 0.20$,
$1 - \alpha = 0.90$ or $\alpha = 0.10$ and $\alpha/2 = 0.05$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.05} = 1.645$.
Hence the 90 % confidence interval for $p_1 - p_2$ is

$$(0.75 - 0.80) - 1.645\sqrt{\frac{(0.75)(0.25)}{1000} + \frac{(0.80)(0.20)}{1000}} < p_1 - p_2 < (0.75 - 0.80) + 1.645\sqrt{\frac{(0.75)(0.25)}{1000} + \frac{(0.80)(0.20)}{1000}}$$

$-0.05 - 0.03 < p_1 - p_2 < -0.05 + 0.03$
$-0.08 < p_1 - p_2 < -0.02$
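As a check on Example 12.11, the confidence limits for the difference of two proportions can be sketched in Python. The helper name `proportion_diff_ci` is an illustrative assumption. The critical value is computed from the standard normal distribution rather than read from the table, so small rounding differences from the printed answer are expected.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (independent samples)."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # Z_(alpha/2)
    # Standard error with the sample proportions replacing the unknown p's
    std_error = sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Figures of Example 12.11: 750 of 1000 (drug I) and 800 of 1000 (drug II)
low, high = proportion_diff_ci(750, 1000, 800, 1000, 0.90)
print(round(low, 2), round(high, 2))
```

Both limits are negative, so the interval does not contain zero. This agrees with the example's conclusion that $p_1 - p_2$ lies between about -0.08 and -0.02.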
SHORT DEFINITIONS
Statistical Inference
The process by which decision makers reach conclusions about a population based on sample information collected from the population.
or
A statistical inference is a decision, estimate, prediction, or generalization about the population based on information contained in a sample.
Estimation
Estimation is the process by which we attempt to determine the value of a population parameter from sample information.
or
Estimation is a process by which we get information about an unknown population parameter by using sample values.
Estimate
An estimate is the numerical value calculated from sample data.
or
An estimate is the numerical value of the estimator.
Estimator
An estimator is a rule that tells how to calculate an estimate based on the measurements contained in a sample.
or
An estimator is a statistic that specifies how to use the sample data to estimate an unknown parameter of the population.
Point Estimate
A point estimate is a number representing an estimate of the population parameter, based on a sample.
or
A point estimate consists of a single sample statistic that is used to estimate the true population parameter.
Interval Estimate
An interval estimate is the range of values within which the value of the parameter is expected to lie.
or
An estimate expressed by a range of values within which the true value of the population parameter is believed to lie is referred to as an interval estimate.
Error of Estimation
The distance between an estimate and the estimated parameter is called the error of estimation.
Unbiased Estimator
An estimator is unbiased if its expected value is equal to the population parameter being estimated.
or
An estimator of a population parameter is said to be unbiased if the mean of its sampling distribution is equal to the parameter.
Biased Estimator
If the mean of the estimator is not equal to the population parameter, the estimator is said to be biased.
or
An estimator $\hat\theta$ is said to be biased if the expected value of the estimator is not equal to the population parameter being estimated, i.e. $E(\hat\theta) \neq \theta$.
Confidence Interval
A confidence interval is a range of values within which the population parameter is expected to occur.
or
An interval that estimates a population parameter within a range of possible values with a specified probability.
Confidence Limits
The two endpoints of a confidence interval are called confidence limits.
Level of Confidence
The probability that the population parameter is included within the confidence interval is called the level of confidence.
or
The probability of correctly accepting the null hypothesis, $(1 - \alpha)$, is called the level of confidence.
Degrees of Freedom
Degrees of freedom are the number of values that are free to vary after we have placed certain restrictions upon the data.
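The definitions of unbiased and biased estimators can be illustrated by listing every possible sample from a very small population and averaging the estimates exactly. The sketch below assumes a hypothetical population {2, 4, 6} and samples of size 2 drawn with replacement. None of these figures come from the text, and the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]   # hypothetical population, chosen only for illustration
n = 2                    # sample size, sampling with replacement

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def sample_variance(sample, divisor):
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


# All equally likely ordered samples of size n
samples = list(product(population, repeat=n))
count = len(samples)

# Mean of each estimator over its whole sampling distribution
mean_of_means = sum(sample_mean(s) for s in samples) / count
mean_var_n_minus_1 = sum(sample_variance(s, n - 1) for s in samples) / count
mean_var_n = sum(sample_variance(s, n) for s in samples) / count

print("population mean", mu, "| mean of sample means", mean_of_means)
print("population variance", sigma2)
print("mean of variances, divisor n-1:", mean_var_n_minus_1)
print("mean of variances, divisor n:  ", mean_var_n)
```

Under these assumptions the sample mean and the variance with divisor n - 1 average out exactly to the population values, so they are unbiased. The variance with divisor n averages to a smaller value, so it is biased.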

MULTIPLE CHOICE QUESTIONS

1. The process of making estimates about the population parameter from a sample is called:
   (a) statistical independence   (b) statistical inference
   (c) statistical hypothesis   (d) statistical decision
2. Statistical inference has two branches namely:
   (a) level of confidence and degrees of freedom
   (b) biased estimator and unbiased estimator
   (c) point estimate and interval estimate
   (d) estimation of parameter and testing of hypothesis
3. Estimation is of two types:
   (a) one sided and two sided   (b) type I and type II
   (c) point estimation and interval estimation
   (d) biased and unbiased
4. A formula or rule used for estimating the parameter is called:
   (a) estimation   (b) estimate
   (c) estimator   (d) interval estimate
5. Statistic is an estimator and its calculated value is called:
   (a) biased estimate   (b) estimation
   (c) interval estimate   (d) estimate
6. Estimate is the observed value of an:
   (a) unbiased estimator   (b) estimator
   (c) estimation   (d) interval estimation
7. The process of using sample data to estimate the values of unknown population parameters is called:
   (a) estimate   (b) estimator
   (c) estimation   (d) interval estimate
8. The numerical value which we determine from the sample for population parameter is called:
   (a) estimation   (b) estimate
   (c) estimator   (d) confidence coefficient
9. A single value used to estimate a population value is called:
   (a) interval estimate   (b) point estimate
   (c) confidence interval   (d) level of confidence
10. An interval calculated from the sample data and it is likely to contain the value of parameter with some probability is called:
   (a) interval estimate   (b) point estimate
   (c) level of confidence   (d) degrees of freedom
11. A range of values within which the population parameter is expected to occur is called:
   (a) confidence coefficient   (b) confidence interval
   (c) confidence limits   (d) level of significance
12. The end points of a confidence interval are called:
   (a) confidence coefficient   (b) confidence limits
   (c) error of estimation   (d) parameters
,u l;;
   (a) level of confidence   (b) confidence coefficient
   (c) both (a) and (b)   (d) confidence limits
14. If the mean of the estimator is not equal to the population parameter, the estimator is said to be:
   (a) unbiased   (b) biased
   (c) positively biased   (d) negatively biased
15. If $\hat\theta$ is the estimator of the parameter $\theta$, then $\hat\theta$ is called unbiased if:
   (a) $E(\hat\theta) > \theta$   (b) $E(\hat\theta) < \theta$
   (c) $E(\hat\theta) \neq \theta$   (d) $E(\hat\theta) = \theta$
16. Estimates given in the form of confidence intervals are called:
   (a) point estimates   (b) interval estimates
   (c) confidence limits   (d) degrees of freedom
17. $(1 - \alpha)$ is called:
   (a) critical value   (b) level of significance
   (c) level of confidence   (d) interval estimate
18. If $(1 - \alpha)$ is increased, the width of a confidence interval is:
   (a) decreased   (b) increased
   (c) constant   (d) same
19. By decreasing the sample size, the confidence interval becomes:
   (a) narrower   (b) wider
   (c) fixed   (d) all of the above
20. Confidence interval becomes narrow by increasing the:
   (a) sample size   (b) population size
   (c) level of confidence   (d) degrees of freedom
21. By increasing the sample size, the precision of confidence interval is:
   (a) increased   (b) decreased
   (c) same   (d) unchanged
22. The distance between an estimate and the estimated parameter is called:
   (a) sampling error   (b) error of estimation
   (c) bias   (d) standard error
23. The number of values that are free to vary after we have placed certain restrictions upon the data is called:
   (a) degrees of freedom   (b) confidence coefficient
   (c) number of parameters   (d) number of samples
24. A 95 % confidence interval for the mean of a population is such that:
   (a) It contains 95 % of the values in the population
   (b) There is a 95 % chance that it contains all the values in the population
   (c) There is a 95 % chance that it contains the mean of the population
   (d) There is a 95 % chance that it contains the standard deviation of the population
25. A confidence interval will be widened if:
   (a) The confidence level is increased and the sample size is reduced
   (b) The confidence level is increased and the sample size is increased
   (c) The confidence level is decreased and the sample size is increased
   (d) The confidence level is decreased and the sample size is decreased
26. A statistician calculates a 95 % confidence interval for $\mu$ when $\sigma$ is known. The confidence interval is Rs. 18000 to Rs. 22000, the amount of the sample mean $\bar X$ is:
   (a) Rs. 18000   (b) Rs. 20000
   (c) Rs. 22000   (d) Rs. 40000
27. If the population standard deviation $\sigma$ is known, the confidence interval for the population mean $\mu$ is based on:
   (a) the Poisson distribution   (b) the t-distribution
   (c) the $\chi^2$-distribution   (d) the normal distribution
28. If the population standard deviation $\sigma$ is unknown and the sample size is small i.e. n < 30, the confidence interval for the population mean $\mu$ is based on:
   (a) the t-distribution   (b) the normal distribution
   (c) the binomial distribution   (d) the hypergeometric distribution
29. The shape of the t-distribution depends upon the:
   (a) sample size   (b) population size
   (c) parameters   (d) degrees of freedom
30. If the population standard deviation $\sigma$ is doubled, the width of the confidence interval for the population mean $\mu$ (i.e. the upper limit of the confidence interval minus the lower limit of the confidence interval) will be:
   (a) divided by 2   (b) multiplied by $\sqrt{2}$
   (c) doubled   (d) decreased
31. The following statistics are unbiased estimators:
   (a) the sample mean   (b) the sample variance $s^2 = \dfrac{\sum (X - \bar X)^2}{n - 1}$
   (c) the sample proportion   (d) all the above
32. A statistic is an unbiased estimator of a parameter if:
   (a) E(statistic) = parameter   (b) E(mean) = variance
   (c) E(variance) = mean   (d) E(sample mean) = proportion
33. Which of the following is a biased estimator?
(b) 0=*
(c)-'=$ g' = I(Xit' (d)
34. If the observations are paired and the number of pairs is n, then degrees of freedom is equal to:
   (a) n   (b) n - 1
   (c) $n_1 + n_2 - 2$   (d) n/2
35. If $\alpha = 0.10$ and n = 15, $t_{\alpha/2}$ equals:
   (a) 1.761   (b) 1.753
   (c) 1.771   (d) 2.145
36. If $n_1 = 16$, $n_2 = 9$ and $\alpha = 0.01$, $t_{\alpha/2}$ equals:
   (a) 2.787   (b) 2.807
   (c) 2.797   (d) 2.767
37. In t-distribution for two independent samples $n_1 = n_2 = n$, then the degrees of freedom is equal to:
   (a) 2n - 1   (b) 2n - 2
   (c) 2n + 1   (d) n - 1
38. If $1 - \alpha = 0.90$, then the value of $Z_{\alpha/2}$ is:
   (a) 1.96   (b) 2.575
   (c) 1.645   (d) 2.326
39. If the population standard deviation $\sigma$ is known and the sample size n is less than or equal to or more than 30, the confidence interval for the population mean $\mu$ is:
   (a) $\bar X \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt n}$   (b) $\bar X \pm Z_{\alpha/2}\dfrac{s}{\sqrt n}$
   (c) $\bar X \pm t_{\alpha/2}\dfrac{s}{\sqrt n}$   (d) $\hat p \pm Z_{\alpha/2}\sqrt{\dfrac{\hat p \hat q}{n}}$
40. If the population standard deviation $\sigma$ is unknown and the sample size n is greater than 30, the confidence interval for the population mean $\mu$ is:
(a) X* (b) x*zo*
uvn (c) x*to* \-/ x*tn#
zVn (d) "- TV,
41. If the population standard deviation $\sigma$ is unknown and the sample size n is less than or equal to 30, the confidence interval for the population mean $\mu$ is:
s (b) x*t**\n
\-/ X+ Zs*
(a) 2 !n
2

sd
(c) X+to7 (d) b*tqor-z)su

m
2!n

42. A student calculates a 90 % confidence interval for population mean $\mu$ when population standard deviation $\sigma$ is unknown and n = 9. The confidence interval is -24.3 cents to 64.3 cents, the sample mean $\bar X$ is:
   (a) 40   (b) -24.3
   (c) 64.3   (d) 20
43. A 95 % confidence interval for population proportion p is 32.4 % to 47.6 %, the value of sample proportion $\hat p$ is:
   (a) 40 %   (b) 32.4 %
   (c) 47.6 %   (d) 80 %

4
44. If we have normal populations with known population standard deviations $\sigma_1$ and $\sigma_2$, the confidence interval estimate for the difference between two

9 ft) tx'-x,) t'r{i:.;,


population means $(\mu_1 - \mu_2)$ is:

t
Pvr/grqw4v.....-:---:.,;_-fz"-z r;:.----.

a
(a) (x, x,.,*fH*
t
// s
s :
tt p
45. If the population standard deviations $\sigma_1$ and $\sigma_2$ are unknown and sample sizes $n_1, n_2 > 30$, the $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is:

h(a) G,-x,) * z*\/H


Illr
n1, D2 / d\.,,
Il2 > -9!"'lfidence
30, the 100 (1
ulrs avv \r tTIz EE
C.+e (b) -x,l .
zv ';\.* rx'
E-.i
' n, .*,-r,) - r;\fffi (d) 1x'-x').';!{."2
46. If the sample size is large, the confidence interval estimate of a population proportion p is:
Xr: i*z!\;
(a) xtz;ft (b) /k_q

(c) (d) fi * zu{iffi


$ *,2s. 2
2
47. If $n_1, n_2 \le 30$, the confidence interval estimate for the difference of two population means $(\mu_1 - \mu_2)$ when population standard deviations $\sigma_1$, $\sigma_2$ are unknown but equal in case of pooled variates is:

(a) (Xr- X,) + Zs


/s? s;
f (b) (Xr - _ r r8,,,,1
Xr)
E=
/= * :1
z\v, Vr, nz

!,, otim
i) -,
(c) (d) .
- X2)
't*,t'\m 1x, - xrl ,; .

.c
(Xr +

48. The confidence interval estimate for the difference of two population means $\mu_1 - \mu_2 = \mu_d$ in case of paired observations, small sample ($n \le 30$), is:

s p
(b) r{*5i,- r)tt
Si

(d) lo
f2- g
b
(c) tg,"-,,!T&
.
d+ b+ts(n_z)sr,

fi.^ ^"4
3
49. If the sample size is large, the confidence interval estimate for the difference between two population proportions $p_1 - p_2$ is:

9 9 fi;.;"
(a) - 6, ze1 /+.
0,) +
i v nr
a t +tn, &) zg\ lry+*t
-, v
(0, - fi,r + nr'tn,

s t
(c)

:/ /
*zrvfis,+
1fi, - fi,) $r0,

p s
e. ht
r. t
(b)

(b)
2.
10.
(d)

11.
(a)
4.
L2'
3. 5. (c)

(b)
6.
14.
7.(c)

15.
(b)
8.
16.
13.
(d)

(c)
(b)

(b)
(c)

(d)
(b)

(b)

17. (c) 18. le.


(b) 20. 2t. (b) 22. 2s.
(a) 24. (a) (l)) (a) (c)

25. (a) 26. (b) 27. (d) 28. (a) 2e. (d) so. (c) 31. (d) 32. (a)
33. (d) 34. (b) 35. (a) 36. (b) 37. (b) 38. (c) 3e" (a) 40. (b)
4t. (b) 42. (d) 43. (a) 44. (b) 45. (c) 46. (b) 47" (c) 48. (tr)

49. (a)
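Several of the questions above (18 to 21, 25 and 30) turn on how the width of a confidence interval for the mean responds to the confidence level, the sample size and the population standard deviation. A minimal Python sketch, assuming σ is known so that the width is $2 Z_{\alpha/2}\,\sigma/\sqrt{n}$, makes the pattern visible. The numbers used are hypothetical and the function name `ci_width` is not from the text.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence):
    """Upper limit minus lower limit of the Z interval for a mean (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


base = ci_width(sigma=10, n=25, confidence=0.95)          # reference case
higher_level = ci_width(sigma=10, n=25, confidence=0.99)  # larger (1 - alpha)
larger_sample = ci_width(sigma=10, n=100, confidence=0.95)
double_sigma = ci_width(sigma=20, n=25, confidence=0.95)

print("reference width      :", round(base, 3))
print("99 % instead of 95 % :", round(higher_level, 3))
print("n = 100 instead of 25:", round(larger_sample, 3))
print("sigma doubled        :", round(double_sigma, 3))
```

Raising the confidence level widens the interval, quadrupling the sample size halves it, and doubling σ doubles it.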
SHORT QUESTIONS

1. Given n = 64, $\bar X = 42.7$, $\sigma = 8$ and $Z_{0.05} = 1.645$. Find the confidence interval for $\mu$.
Ans. $41.1 < \mu < 44.3$
2. Given n = 16, $\bar X = 52.5$, $\sigma = 10$ and $1 - \alpha = 0.90$. Compute the 90 % confidence interval for $\mu$.
Ans. $48.4 < \mu < 56.6$
3. Given N = 500, n = 100, $\bar X = 60$, $\sigma = 5$ and $Z_{0.025} = 1.96$. Find the 95 % confidence interval for $\mu$.
Ans. $59.1 < \mu < 60.9$
4. Given n = 144, $\bar X = 750$, S = 6 and $Z_{0.01} = 2.326$. Compute the 98 % confidence interval for $\mu$.
Ans. $748.837 < \mu < 751.163$
5. Given n = 16, $\bar X = 80$, s = 3 and $t_{0.01(15)} = 2.602$. Construct the 98 % confidence interval for population mean $\mu$.
Ans. $78.05 < \mu < 81.95$
6. Given $n_1 = 32$, $n_2 = 50$, $\bar X_1 = 125$, $\bar X_2 = 100$, $\sigma_1^2 = 16$, $\sigma_2^2 = 25$ and $Z_{0.005} = 2.575$. Find the 99 % confidence interval for the difference between population means $\mu_1 - \mu_2$.
Ans. $22.425 < \mu_1 - \mu_2 < 27.575$
7. Given $n_1 = 48$, $\bar X_1 = 90$, $S_1^2 = 12$, $n_2 = 72$, $\bar X_2 = 85$, $S_2^2 = 18$ and $Z_{0.025} = 1.96$. Find the 95 % confidence interval for $\mu_1 - \mu_2$.
Ans. $3.6 < \mu_1 - \mu_2 < 6.4$
8. Given the following summary statistics for independent random samples from two populations:
$n_1 = 16$, $\bar X_1 = 60$, $s_1^2 = 36$, $n_2 = 9$, $\bar X_2 = 50$, $s_2^2 = 25$, $s_p = 5.67$ and $t_{0.05(23)} = 1.714$.
Find the 90 % confidence interval for $\mu_1 - \mu_2$.
Ans. $5.95 < \mu_1 - \mu_2 < 14.05$
9. Given $\bar d = 3$, n = 9, $s_d = 3$ and $t_{0.025(8)} = 2.306$. Find the 95 % confidence interval for $\mu_d = \mu_1 - \mu_2$.
Ans. $0.694 < \mu_d < 5.306$
10. Given n = 500, $\hat p = 0.30$ and $Z_{0.03} = 1.88$. Find the 94 % confidence interval for the population proportion p.
Ans. $0.26 < p < 0.34$
11. Given $n_1 = 400$, $\hat p_1 = 0.8$, $n_2 = 400$, $\hat p_2 = 0.6$ and $Z_{0.04} = 1.75$. Find the 92 % confidence interval for the difference between population proportions.
Ans. $0.14 < p_1 - p_2 < 0.26$
12. Determine the critical value of Z in each of the following circumstances:
   (a) $1 - \alpha = 0.90$   (b) $1 - \alpha = 0.92$   (c) $1 - \alpha = 0.94$
   (d) $1 - \alpha = 0.96$   (e) $1 - \alpha = 0.98$   (f) $1 - \alpha = 0.99$
Ans. (a) 1.645 (b) 1.75 (c) 1.88 (d) 2.054 (e) 2.326 (f) 2.575
13. State the formula which is used to calculate a 95 % confidence interval for the population mean $\mu$, when the population standard deviation $\sigma$ is known.
Ans. $\bar X - 1.96\dfrac{\sigma}{\sqrt n} < \mu < \bar X + 1.96\dfrac{\sigma}{\sqrt n}$
14. State the formula which is used to calculate a 90 % confidence interval for the population mean $\mu$, when the population standard deviation $\sigma$ is unknown and n = 10.
Ans. $\bar X - 1.833\dfrac{s}{\sqrt{10}} < \mu < \bar X + 1.833\dfrac{s}{\sqrt{10}}$
15. State the formula which is used to calculate a 92 % confidence interval for the population proportion p, when n > 30.
Ans. $\hat p - 1.75\sqrt{\dfrac{\hat p \hat q}{n}} < p < \hat p + 1.75\sqrt{\dfrac{\hat p \hat q}{n}}$
16. Determine the critical value of t in each of the following circumstances:
   (a) $1 - \alpha = 0.95$, n = 12   (b) $1 - \alpha = 0.98$, $n_1 = 10$, $n_2 = 12$
   (c) $1 - \alpha = 0.90$, n = 16   (d) $1 - \alpha = 0.99$, $n_1 = 6$, $n_2 = 6$
Ans. (a) 2.201 (b) 2.528 (c) 1.753 (d) 3.169
17. If $\bar X = 100$, $\sigma = 8$ and n = 64, set up a 95 % confidence interval estimate of the population mean $\mu$.
Ans. $98.04 < \mu < 101.96$
18. State the formula which is used to calculate a 98 % confidence interval for the difference between two population means $\mu_1 - \mu_2$, when population variances are known with any sample size.
Ans. $(\bar X_1 - \bar X_2) - 2.326\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar X_1 - \bar X_2) + 2.326\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
19. State the formula which is used to calculate a 95 % confidence interval for the mean of a population of paired differences for n = 9.
Ans. $\bar d - 2.306\dfrac{s_d}{\sqrt 9} < \mu_d < \bar d + 2.306\dfrac{s_d}{\sqrt 9}$
20. Distinguish between point estimate and interval estimate.
21. Explain the terms estimate and estimator.
22. What is meant by estimation?
23. Explain what is meant by confidence interval.
24. Explain what is meant by unbiased estimator.
25. What do you mean by unbiased estimator? Give at least two examples.
26. Define the terms point estimate and interval estimate.
27. Write all the confidence intervals for population mean with small and large samples, population standard deviation being known and unknown.
28. What do you know about statistical inference?
29. Differentiate between biased estimator and unbiased estimator.
30. What is meant by unbiasedness?
31. What is the procedure followed in the construction of confidence interval?
32. Why is a confidence interval estimate of a parameter more useful than a point estimate?
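The critical values asked for in Short Question 12 and the interval of Short Question 17 can be reproduced with a few lines of Python. This is only a sketch: the function names `z_critical` and `mean_ci_known_sigma` are illustrative, and the standard library's normal distribution is used in place of the printed area table, so some values differ from the table in the third decimal place (for example the 99 % value comes out nearer 2.576 than the tabulated 2.575).

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Z_(alpha/2) for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci_known_sigma(x_bar, sigma, n, confidence):
    """Confidence interval for a population mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Critical values for the confidence levels listed in Short Question 12
for level in (0.90, 0.92, 0.94, 0.96, 0.98, 0.99):
    print(f"1 - alpha = {level:.2f}  Z = {z_critical(level):.3f}")

# Short Question 17: X-bar = 100, sigma = 8, n = 64, 95 % confidence
low, high = mean_ci_known_sigma(100, 8, 64, 0.95)
print(round(low, 2), round(high, 2))
```

The critical values of t in Short Question 16 are not covered here, because the Python standard library has no t-distribution. They would still have to be read from the t-table.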
EXERCISES
1. An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with a standard deviation of 42 hours. If a random sample of 49 bulbs has an average life of 800 hours, find a 95 % confidence interval for the population mean of all bulbs produced by this firm.
Ans. $788.24 < \mu < 811.76$
2. Find a 90 % confidence interval for the mean of a normal distribution if $\sigma = 2$ and a sample of size 8 gave the values 9, 10, 10, 13, 7, 11, 11, 17.
Ans. $9.84 < \mu < 12.16$

t
8. Arr aclvertising agency want* to estimate the lverage ineome of fpm+figs loeated
.
y
c
o
I

in n partieular area uf a lorv.ineome Bcctir:n o[ Iinrae.]ri. 'lhcro lIUO00j,rrniliee

s p
in this area, &nrl the agency eirooses a randorn s:il)1plr.ut I00. The\reen rneome
c,f tltese fanrilies is lts" 1tt()0 per mtinth, Cleirnput,.r a 95 o/o confidence interval f'or

g
thr: population meatr, if tlre polrulation sttndarri clevitrtion is known to be Rs,

l o
900,

b
Ans" 1762.79 < p < 18S7,?1

3 .
.1. 'Ihe population o1' ,ecore$ of l0-year oid children in a psychologicul peltbrrnance,-
test is knorvn to,llavc a standarcl clcviar"ionT-,2. If a rnnclorn snniple of size !0

9 4
sherws fl meaR of 16.9, find a 95 % eonfidence interVal fbr the mean score of the
population, assuming that thc population i.s normal.
Ans, 14,(i2 < 1r < I0,18,

t9
a
6. r\ trre r-rranul.irsturcr rvnnts ts ergtinrate the rnearlryergirt of the tires produced by

t
one of its plants. Hri takes a ran;lqm samplc c,i lOb tires proctueed at thi* plant

s
/: /
and finels thut thc sanrpic mean i$&S.1 lbs. anrl tffiinmple stonciard cleviation ie
" 0.1.2 lbs, Calculute a 95 % confirlcnec intervsl for tlie population urean.

s
Ans""at*.Cff6 < pt < 48.124

tt p
6. Cornpute a 90 ')/o corrfitlence interval f'or the po;;nlut,ion lnoln, ifn* 36,
*
5400 and X(X * X)r * tZ$ti.
h
IIX
Ans. 148,1155 < p dt51,G,l'5 'x
1, The heights of a rrinrtorn sanrplo uf 64 qolle ge s[udents slrowed a mean af L72
ce nl;irnetera urtcl a strrndnrd rlovirition of 6,]-i cerrtirnetcrs. Irinda92%confielence
interva}forthelneanheielitoffi}icollegesturlcnts
Ans. 170,5?8 < [ < 1?[].4?2
8. The hourly wages of 144 workcre of a largc fhctory wqre reeoreled, and the
sarnple mesn and standarci dcviation were found to be lill" ?8.59 and Ra,6.'i1
re$pectively. Ilind a "99 % eorrfiricrrec intervul f'or rh* mean wages of faetorl'
workcrs,
Ans.2?.08<p<24.06
9. A random sample of 9 cigarettes of a certain brand has an average nicotine content of 3.6 milligrams and a standard deviation of 0.9 milligrams. Construct a 95 % confidence interval for the true average nicotine content of this particular brand of cigarettes, assuming an approximate normal distribution.
Ans. $2.91 < \mu < 4.29$
10. A certain machine is used to produce items whose weights are assumed to be normally distributed. Suppose that the variability in the weight of the output of the machine is unknown. For a random sample of size 16, $\bar X$ is found to be 282 grams and s is 8 grams. Find a 95 % confidence interval for the true population mean.
Ans. $277.738 < \mu < 286.262$
11. A firm wants to estimate the mean lifetime of a particular kind of tool. From the previous experience, it is known that the lifetimes are normally distributed. It draws a random sample of four of these tools and finds that their lifetimes are 7.9, 9.3, 10.8 and 11.4 years. Calculate a 95 % confidence interval for the mean lifetime of this kind of tool.
Ans. $7.35 < \mu < 12.35$
12. A random sample of size $n_1 = 36$ taken from a normal population with a variance $\sigma_1^2 = 9$ has a mean $\bar X_1 = 75$. A second random sample of size $n_2 = 25$, taken from a different normal population with a variance $\sigma_2^2 = 25$, has a mean $\bar X_2 = 70$. Find a 98 % confidence interval for $\mu_1 - \mu_2$.
Ans. $2.4 < \mu_1 - \mu_2 < 7.6$
13. Two independent samples of 100 machinists and 100 carpenters are taken to estimate the difference between the weekly wages of the two categories of workers. The relevant data are given below:

               Sample mean wage    Population variance
   Machinists        345                  196
   Carpenters        340                  204

Compute 95 % and 99 % confidence limits for the true difference between the average wages for machinists and carpenters. Which interval is wider?
Ans. 95 % confidence interval: $1.08 < \mu_1 - \mu_2 < 8.92$
99 % confidence interval: $-0.15 < \mu_1 - \mu_2 < 10.15$. Therefore the 99 % interval is wider.
14. A standardized statistics test was given to 75 boys and 50 girls. The boys made an average grade of 82 with a variance of 64, while the girls made an average grade of 76 with a variance of 36. Find a 99 % confidence interval for $\mu_1 - \mu_2$, where $\mu_1$ stands for the mean score of all boys and $\mu_2$ stands for the mean score of all girls.
Ans. $2.77 < \mu_1 - \mu_2 < 9.23$
15. Two independent random samples of the diameters of tires are drawn, one from a batch of tires produced at plant A, and another from a batch of tires produced at plant B. The results are as follows:

   Sample     Sample size    Sample mean (inches)    Sample variance (inches)²
   Plant A       100               50.7                      0.09
   Plant B       100               50.3                      0.04

Calculate a 95 % confidence interval for the difference between the mean diameter of the entire batch produced at plant A and the mean diameter of the entire batch produced at plant B.
Ans. $0.33 < \mu_1 - \mu_2 < 0.47$
16. Suppose that for a random sample of size 8 from population I the sample mean and standard deviation were 14.9 and 4.17 respectively, while a random sample of size 5 from population II yielded a sample mean of 10.6 and a sample standard deviation of 3.62. Assuming that the populations are normally distributed with equal variances, compute a 90 % confidence interval for the difference between the population means.
Ans. $0.22 < \mu_1 - \mu_2 < 8.38$
17. The following summary statistics are recorded for independent random samples from two populations:

   Sample I     $n_1 = 9$    $\bar X_1 = 16.18$    $s_1 = 1.54$
   Sample II    $n_2 = 6$    $\bar X_2 = 4.22$     $s_2 = 1.37$

Assume that the populations are normal with identical standard deviations. Calculate a 95 % confidence interval for $\mu_1 - \mu_2$.
Ans. $10.28 < \mu_1 - \mu_2 < 13.64$
lE.It is claimed that a new diet will reclucc pu,'*un;, weight by S kiioerams on the
" of 4 men who were given this diet
average in a period of 8 weeks. The weights
were recorded before and after a 8 week period:

*-
t
Weights betbre

{ lVeights after
Compute a 90 % confidence interval for the mean difference in the weights.
Assume the distribution of we ights to bc' npproximately Rormal,
Ane. - 0.19 < p6 < 8,19

To compare two treatments, a matched-pair ,&Reriment was conducted with 12


1.9.
pairs of subjects, and the paircd differences lt tl"te response to treatment B from
*
ihe response to treatment A were recorded/2, 5, 6, 8, :- 6, 4, l'8' 12, L7, - I ' 16'
12. Construct a g5 % confidence interval for the mean difference of the responses
bo the two treatments.
t
Ans. - 0.98 < p, < 11'48 |

m
20. A random sample of 300 cigarette smokers is selected and 75 are found to have a preference for Gold Leaf. Find a 98 % confidence interval for the fraction of the population of cigarette smokers who prefer Gold Leaf.
Ans. $0.19 < p < 0.31$
21. A tire manufacturer draws a random sample of 160 tires produced by a new process and finds that 40 percent wear better than required specifications. Construct a 90 % confidence interval for the population proportion of the tires produced by the new process that will wear better than required specifications.
Ans. $0.34 < p < 0.46$
22. Find a 95 % confidence interval for p if 24 heads are obtained in 40 tosses of a coin.
Ans. $0.45 < p < 0.75$
23. In a random sample of 1000 homes in a certain city, it is found that 228 are heated by oil. Find a 99 % confidence interval for the proportion of homes in this city that are heated by oil.
Ans. $0.194 < p < 0.262$
24. A firm has factories in Karachi and Lahore. It picks a random sample of 100 workers from each factory. In Karachi, 32 percent say that they buy the firm's product; in Lahore, 27 percent. Construct a 95 % confidence interval for the difference between the proportion of workers in Karachi and Lahore who say they buy the firm's product.
Ans. $-0.08 < p_1 - p_2 < 0.18$
25. A poll is taken among the residents of a city and the surrounding semi-urban area to determine the feasibility of a proposal to construct a civic center. If 240 of 500 city residents favour the proposal and 120 of 200 semi-urbans favour it, find a 95 % confidence interval for the true difference in the fractions favouring the proposal to construct the civic center.
Ans. $-0.20 < p_1 - p_2 < -0.04$
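Exercise 13 asks which of the 95 % and 99 % intervals is wider. A short Python sketch of the interval for the difference of two means with known population variances shows the comparison. It assumes the figures of Exercise 13 as read here (machinists 345 and 196, carpenters 340 and 204, 100 workers each), the reading consistent with the printed answer. The function name `mean_diff_ci` is hypothetical. Critical values come from the standard normal distribution rather than the table, so the 99 % limits differ slightly from the printed answer, which uses 2.575.

```python
from math import sqrt
from statistics import NormalDist


def mean_diff_ci(x1, x2, var1, var2, n1, n2, confidence):
    """Confidence interval for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Assumed reading of Exercise 13: machinists first, carpenters second
ci95 = mean_diff_ci(345, 340, 196, 204, 100, 100, 0.95)
ci99 = mean_diff_ci(345, 340, 196, 204, 100, 100, 0.99)

print("95 %:", [round(v, 2) for v in ci95])
print("99 %:", [round(v, 2) for v in ci99])
print("99 % interval wider:", (ci99[1] - ci99[0]) > (ci95[1] - ci95[0]))
```

The higher confidence level uses a larger critical value with the same standard error, which is why the 99 % interval is the wider of the two.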
Chapter 13
STATISTICAL INFERENCE
TESTING OF HYPOTHESES

13.1 INTRODUCTION
Statistical inference consists of estimation of parameters and testing of hypotheses. Estimation has already been discussed in the previous chapter, and in this chapter our lesson is about the testing of hypotheses. Point estimation and interval estimation, as discussed earlier, have their own fields of application. Sometimes there is a situation in which the point estimation and the interval estimation are either not required or the estimation of parameters does not provide any inference. For example, the following situations require inference which is not possible by methods of estimation.
(i) The ingredients of a medicine have been changed to improve the effectiveness of the medicine. In this situation both the point estimation and the interval estimation fail to answer the question about the improvement of the medicine. In this case we have to take help from the sample data to decide whether or not the medicine has been improved.
(ii) A manufacturer of tires claims that the average life of his tires is at least 15000 kilometers. The life of tires is an important factor to settle the price of the tires. It is a big information if we prove with a reasonable amount of confidence that the life of the tires is not more than 15000 kilometers. The answer is not provided by a point estimate or by an interval estimate of the life of the tires. What we shall have to do is examine the claim of the manufacturer on the basis of the experiment conducted on the sample of the tires. A certain procedure will be adopted to reach some conclusion. This is what we shall call the test of hypothesis about the life of tires.
13.2 STATISTICAL HYPOTHESES
Any opinion or idea may be formed about the population under study. Consider the following statements: average consumption of sugar per month for a consumer is 1 kg; intelligent parents have intelligent children; tall fathers have tall sons; average life of the people of Pakistan is higher than that of India; proper greasing increases the life of ceiling fans; use of coffee increases chances of heart attack; one variety of seed is better than the other; a medicine of allergy gives relief to 80 % of the people; more than 25 % people are literate in our country; only 60 % people will go to the polling stations for voting. These statements are the questions in different fields of life and these questions are to be answered after proper
;il;;;;;;. d;.u q,r..tions ha*'e come up in the Proc:tt",:.frY"t$il:"i:
iil:"iJ"ii;;'",'il" hypotheses are generated i,ring various studies. when an
about the clistribution of a
assumption. is explained in the form of a statement
population o* popuin-tions, it is callerl a statistical hypothesis'
In sirnple words' a
statisticalhypothesisisastaterncntabouttheunknorvnvalueofthepopulation

m
parameter. The staletnent may be true or false'

o
13.2.1 NULL HYPOTHESIS
The hypothesis which is to be tested is called the null hypothesis. It is denoted by H0. It is a starting point in the investigations. A statement which we hope will be rejected is taken as a hypothesis. The modern approach is different. Today any hypothesis we wish to test is called the null hypothesis and is denoted by H0. In this book we shall follow the old convention. Any hypothesis will be called null hypothesis only when we hope to reject it. Thus the null hypothesis is framed for possible rejection. Tall fathers have tall children. We shall assume that tall fathers do not have tall children. This will be considered as the null hypothesis and will be denoted by H0. We hope that H0 will be rejected on the basis of sample data.
Use of coffee increases chances of heart attack. To start with we shall assume that heart attack has no link with the use of coffee. This will be taken as H0 and we hope it will be rejected by the sample data.

13.2.2 ALTERNATIVE HYPOTHESIS
A hypothesis which is accepted when the null hypothesis has been rejected is called the alternative hypothesis. It is denoted by H1 or Ha. Whatever we are expecting from the sample data is taken as the alternative hypothesis. "More than 25 % people are literate in our country". We are hoping to get this result from the sample. It will be taken as the alternative hypothesis H1 and the null hypothesis H0 will be that 25 % or less than that are literate. To be more specific, H0 will be "25 % or less are literate" and H1 will be "more than 25 % are literate". It is written as:
H0: p ≤ 0.25 (25 % or less)        H1: p > 0.25 (more than 25 %)
To keep the things simple, we can write H0 in the form of equality as H0: p = 0.25, but it is important to write H1 with proper direction of inequality. Thus we write H0: p = 0.25, H1: p > 0.25.
In this case H1 contains the inequality more than (>). We shall explain later that H1 may be written with inequality less than (<) or not equal (≠). In general, if the hypothesis about the population parameter θ is θ0, then H1 can be written in three different ways.
For H0: θ = θ0,        H1: θ ≠ θ0        H1: θ > θ0        H1: θ < θ0
But this is the simple approach which is allowed for the students. Another way of writing the above hypotheses H0 and H1 is
(a) H0: θ = θ0, H1: θ ≠ θ0        (b) H0: θ ≤ θ0, H1: θ > θ0        (c) H0: θ ≥ θ0, H1: θ < θ0
The alternative hypothesis H1 never contains the sign of equality. Thus H1 will not contain '=', '≤' or '≥' signs. The equality sign '=' and inequalities like '≤' and '≥' are used for writing H0.
13.2.3 SIMPLE HYPOTHESIS
If a hypothesis has a single value for the population parameter, it is called simple hypothesis. The breaking strength of copper wire is 10 kg. Here H0: μ = 10 kg has a single specified value. H0 is a simple hypothesis. Similarly μ1 - μ2 = 10 and p = 0.6 are simple hypotheses.
13.2.4 COMPOSITE HYPOTHESIS
The hypothesis is called composite if it specifies a range of values for the parameter. For example, the hypotheses (μ1 - μ2) > 10 and p ≤ 0.6 are composite.
13.2.5 ACCEPTANCE AND REJECTION OF NULL HYPOTHESIS
The given hypothesis is tested with the help of the sample data. A simple random sample has the full freedom of giving any value to its statistic. The sample is not aware of our plans. We decide about our hypothesis on the basis of the sample statistic. If the sample does not support the null hypothesis, we reject it on probability basis and accept the alternative hypothesis. If the sample does not oppose the hypothesis, the hypothesis is accepted. But here 'accept' does not mean the acceptance of null hypothesis but only means that the sample has not strongly opposed it. "Not opposed" does not mean that the sample has strongly supported the hypothesis. The support of the sample in favour of the hypothesis cannot be established. When the hypothesis is rejected, it is rejected with a high probability. Thus rejection of H0 is a strong decision and it leads us to the acceptance of H1. But acceptance of H1 is not like the acceptance of H0. The acceptance of null hypothesis does not give us a certain strong decision. It is a situation which may require some further investigations. At this stage, many factors are to be taken into account. The sample size and certain other things not yet discussed help us to do something more about the null hypothesis before it is finally accepted. Thus rejection is a decision but not necessarily true and acceptance is not a decision in any sense of the word.
There is a modern approach in which the terms rejection and acceptance are not used. This modern approach is beyond the level of this book. But it remains true in its place that acceptance of a null hypothesis is a weak decision whereas rejection is a strong evidence of the sample against the null hypothesis. When the null hypothesis is rejected, it means the sample has done some statistical work but when the null hypothesis is accepted, it means the sample is almost silent. This behaviour of the sample should not be used in favour of the null hypothesis.
13.2.6 TEST STATISTIC
A statistic is calculated from the sample. To begin with we assume that the hypothesis about the population parameter is true. We compare the value of the statistic with the hypothetical value of the parameter. If the difference between them is small, the hypothesis is accepted and if the difference between them is large, the hypothesis is rejected. A statistic on which the decision can be based whether to accept or reject a hypothesis is called test statistic. Some of the test statistics to be discussed in this book are 'Z', 't' and 'χ²' (chi-square).
13.2.7 ACCEPTANCE AND REJECTION REGIONS
The values of the test statistic which we think do not agree with the given hypothesis are called the critical region or rejection region. The values of the test statistic which support the hypothesis form the acceptance region. The rejection region is equal to α and the acceptance region is denoted by (1 - α). These two regions are separate from each other and both regions combined together make the complete sampling distribution of the statistic. These regions are separated by a value (or values), which is called critical value (or values).
When the rejection region is taken on both ends of the sampling distribution, the test is called two-sided test or two-tailed test. When we are using a two-sided test, half of the rejection region equal to α/2 is taken on the right side and the other half equal to α/2 is taken on the left side of the sampling distribution. Suppose the sampling distribution of the statistic is a normal distribution and we have to test the hypothesis H0: θ = θ0 against the alternative hypothesis H1: θ ≠ θ0 which is two-sided. H0 is rejected when the calculated value of Z is greater than Zα/2 or it is less than -Zα/2. Thus the critical region is Z > Zα/2 or Z < -Zα/2, which can also be written as |Z| > Zα/2. The acceptance region is -Zα/2 < Z < Zα/2. The critical region is shown in Fig. 13.1. When H0 is rejected, then H1 is accepted.

[Figure 13.1: Two-sided test. Rejection regions of area α/2 in each tail and acceptance region of area (1 - α) in the centre, separated by the lower critical value -Zα/2 and the upper critical value Zα/2 on either side of Z = 0.]


13.2.9 ONE-TAILED TEST
When the alternative hypothesis H1 is one-sided like θ > θ0 or θ < θ0, then the rejection region is taken only on one side of the sampling distribution. It is called one-tailed test or one-sided test. When H1 is one-sided to the right like θ > θ0, the entire rejection region equal to α is taken in the right end of the sampling distribution. The test is called one-sided to the right. The hypothesis H0 is rejected if the calculated value of a statistic, say Z, falls in the rejection region. The critical value is Zα which has the area equal to α to its right. The rejection region and acceptance region are shown in Fig. 13.2. The null hypothesis H0 is rejected when Z (calculated) > Zα.

[Figure 13.2: One-sided to the right. Acceptance region of area (1 - α) and rejection region of area α in the right tail beyond Zα.]

If the alternative hypothesis is one-sided to the left like θ < θ0, the entire rejection region equal to α is taken on the left tail of the sampling distribution. The test is called one-sided or one-tailed to the left. The critical value is -Zα which cuts off the area equal to α to its left. The critical region is Z < -Zα and is shown in Fig. 13.3.

[Figure 13.3: One-sided to the left. Rejection region of area α in the left tail below -Zα.]

For some important values of α, the critical values of Z for two-tailed and one-tailed tests are given below:

Critical values of Z
α              Two-sided test         One-sided to the right    One-sided to the left
0.10 (10 %)    -1.645 and +1.645      +1.282                    -1.282
0.05 (5 %)     -1.96  and +1.96       +1.645                    -1.645
0.02 (2 %)     -2.326 and +2.326      +2.054                    -2.054
0.01 (1 %)     -2.575 and +2.575      +2.326                    -2.326
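
The critical values in the table can be reproduced from the standard normal distribution. The following is a minimal sketch in Python using only the standard library; the function name critical_values is an illustrative choice and not part of the text.

```python
from statistics import NormalDist

def critical_values(alpha):
    """Return (two_sided, right, left) critical values of Z for a given alpha."""
    std_normal = NormalDist()                        # mean 0, standard deviation 1
    two_sided = std_normal.inv_cdf(1 - alpha / 2)    # area alpha/2 in each tail
    right = std_normal.inv_cdf(1 - alpha)            # area alpha in the right tail
    left = -right                                    # area alpha in the left tail
    return two_sided, right, left

for alpha in (0.10, 0.05, 0.02, 0.01):
    two_sided, right, left = critical_values(alpha)
    print(f"alpha = {alpha:.2f}: two-sided -{two_sided:.3f} and +{two_sided:.3f}, "
          f"right +{right:.3f}, left {left:.3f}")
```

The computed two-sided value for α = 0.01 is 2.576 to three decimals; the tabulated 2.575 comes from interpolation in a printed normal table, and the difference has no practical effect.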
13.3 ERRORS IN TESTING OF HYPOTHESIS
The null hypothesis H0 is accepted or rejected on the basis of the value of the test-statistic which is a function of the sample. The test statistic may land in acceptance region or rejection region. If the calculated value of test-statistic, say Z, is small (insignificant) i.e., Z is close to zero or we can say Z lies between -Zα/2 and Zα/2 in a two-sided alternative test (H1: θ ≠ θ0), the hypothesis is accepted. If the calculated value of the test-statistic Z is large (significant), H0 is rejected and H1 is accepted. In this rejection plan or acceptance plan, there is the possibility of making any one of the two errors which are called Type I and Type II errors.

13.3.1 TYPE I ERROR
The null hypothesis H0 may be true but it may be rejected. This is an error and is called Type I error. When H0 is true, the test-statistic, say Z, can take any value between -∞ to +∞. But we reject H0 when Z lies in the rejection region while the rejection region is also included in the interval -∞ to +∞. In a two-sided H1 (like θ ≠ θ0), the hypothesis is rejected when Z is less than -Zα/2 or Z is greater than Zα/2. When H0 is true, Z can fall in the rejection region with a probability equal to the size α of the rejection region. Thus it is possible that H0 is rejected while H0 is true. This is called Type I error. The probability is (1 - α) that H0 is accepted when H0 is true. It is called correct decision. We can say that Type I error has been committed when:
(i) an intelligent student is not promoted to the next class.
(ii) a good player is not allowed to play the match.
(iii) an innocent person is punished.
(iv) a driver is punished for no fault of his.
(v) a good worker is not paid his salary in time.
These are the examples from practical life. These examples are quoted to make a point clear to the students.
α (ALPHA)
The probability of making Type I error is denoted by α (alpha). When a null hypothesis is rejected, we may be wrong in rejecting it or we may be right in rejecting it. We do not know whether H0 is true or false. Whatever our decision will be, it will have the support of probability. A true hypothesis has some probability of rejection and this probability is denoted by α. This probability is also called the size of Type I error.
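
The meaning of α can be illustrated by simulation. The sketch below is only an illustration under assumed values (μ0 = 50, σ = 10, n = 36, none of which come from the text): it draws many samples from a population in which H0 is true, applies the two-sided Z-test to each, and reports the fraction of samples in which the true H0 is rejected. The function name and the seed are hypothetical choices.

```python
import random
from statistics import NormalDist

def type_one_error_rate(mu0=50.0, sigma=10.0, n=36, alpha=0.05,
                        trials=20000, seed=1):
    """Fraction of samples that reject H0: mu = mu0 when H0 is actually true."""
    rng = random.Random(seed)                        # fixed seed for repeatable runs
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Sample from a normal population whose mean really is mu0 (H0 true).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:                          # Z falls in the rejection region
            rejections += 1
    return rejections / trials

print(type_one_error_rate())
```

The printed fraction should lie close to the chosen α = 0.05, which is the sense in which α is the probability of rejecting a true hypothesis.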

13.3.2 TYPE II ERROR
The null hypothesis H0 may be false but it may be accepted. It is an error and is called Type II error. The value of the test-statistic may fall in the acceptance region when H0 is in fact false. Suppose the hypothesis being tested is H0: θ = θ0 and H0 is false and the true value of θ is θ1. If the difference between θ0 and θ1 is very large then the chance is very small that θ0 (wrong) will be accepted. In this case the true sampling distribution of the statistic will be quite away from the sampling distribution under H0. There will be hardly any test-statistic which will fall in the acceptance region of H0. When the true distribution of the test-statistic overlaps the acceptance region of H0, then H0 is accepted though H0 is false. If the difference between θ0 and θ1 is small, then there is a high chance of accepting H0. This action will be an error of Type II.
β (BETA)
The probability of making Type II error is denoted by β. Type II error is committed when H0 is accepted while H1 is true. The value of β can be calculated only when we happen to know the true value of the population parameter being tested.
13.3.3 RELATION BETWEEN α AND β
Suppose we have to test H0: μ = μ0 against the alternative H1: μ > μ0. A random sample of size n is selected from the population and the sample mean X̄ is calculated. The sample size n is large and therefore the sampling distribution of X̄ is normal with mean μ. To start with we assume that H0: μ = μ0 is true and X̄ has the distribution as shown on the left side of Fig. 13.4.

[Figure 13.4: Two sampling distributions of X̄, one under H0 (left) and one under H1 (right), with the areas (1 - α), α, β and (1 - β) marked.]

Fig. 13.4 has two sampling distributions of X̄; one is on the left side and the other is on the right side. When the null hypothesis H0: μ = μ0 is being tested, there are the following four possibilities.
(i) H0 is true and X̄ falls in the area marked (1 - α) in Fig. 13.4. The hypothesis H0 is accepted and this is called correct decision. Probability of this correct decision is (1 - α). We may or may not make this decision.
(ii) H0 is true and X̄ falls in the area marked α. This is the area of the distribution on the left side. Now H0 is true but it will be rejected because X̄ falls in the rejection region. This is an error of Type I and this error will be committed with the probability of α. We do not know whether we have committed an error or not.
(iii) H0 is false. The true value of μ is, say, μ1 and the true distribution of X̄ is the distribution on the right side in Fig. 13.4. Now suppose X̄ falls in the area marked (1 - β). This is outside the acceptance region of the distribution on the left side. Thus H0: μ = μ0 is rejected and the probability of this action is (1 - β). It is called correct decision when H0 is false. In fact, X̄ belongs to some distribution. When we take a hypothesis H0, this is an assumption about the mean of the distribution of X̄. If the true distribution of X̄ is on the right side, then some area of this distribution is falling on the acceptance region of the hypothetical distribution on the left side. This area is marked as β.
(iv) H0 is false and the value of X̄ falls in the area marked β. In this case H0 is accepted because X̄ has fallen in the acceptance region of the first distribution. Thus H0, being false, may be accepted with probability of β.
If the distribution on the right side is shifted to the right, β will decrease and if this distribution is shifted to the left, β will increase. Thus the value of β depends upon the true value of population mean μ. In a certain given situation when n is fixed the value of β increases when α is decreased. Thus if we want to decrease α, we shall do it at the risk of increasing β. α-error and β-error are also called α-risk and β-risk respectively. Which risk do we want to keep at minimum level? This depends upon the costs of committing α-error and β-error. Suppose we are hesitant of rejecting H0 when it is true, then we shall take α at a small level. In most of the tests, α is fixed at a small level like 0.01 (1 %) or 0.05 (5 %).
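
The trade-off between α and β for the right-tailed test of H0: μ = μ0 against H1: μ > μ0 can be worked out numerically once a true mean μ1 is assumed. The following is a minimal sketch; the function name beta_right_tailed and the numbers used (μ0 = 50, μ1 = 53, σ = 10, n = 25) are assumptions made for illustration and do not come from the text.

```python
from statistics import NormalDist

def beta_right_tailed(mu0, mu1, sigma, n, alpha):
    """P(accept H0: mu = mu0) when the true mean is mu1, for H1: mu > mu0."""
    se = sigma / n ** 0.5                               # standard error of the sample mean
    # H0 is accepted whenever the sample mean does not exceed this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta is the area of the true distribution (centred at mu1) below the cutoff.
    return NormalDist(mu1, se).cdf(cutoff)

for alpha in (0.05, 0.01):
    beta = beta_right_tailed(mu0=50, mu1=53, sigma=10, n=25, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")

# Moving the true mean further to the right makes beta smaller.
print(beta_right_tailed(mu0=50, mu1=56, sigma=10, n=25, alpha=0.05))
```

With these assumed values β is roughly 0.56 at α = 0.05 and roughly 0.80 at α = 0.01, so lowering α raises β when n is fixed, as stated above.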
The following table shows four possible decisions in a certain test of hypothesis.

                    H0 is True          H0 is False
H0 is Accepted      Correct decision    Type II error
H0 is Rejected      Type I error        Correct decision

When we test a hypothesis, our decision will fall in any one of the above four boxes. The four possible decisions in terms of probabilities are shown below in a tabular form.

                    H0 True     H0 False
H0 is Accepted      (1 - α)     β
H0 is Rejected      α           (1 - β)

It may be noted that α is an area in the right tail of the distribution under H0 and β is the area in the left tail of the distribution under H1. Thus α + β ≠ 1 in general. In some special case, and that too very rarely, α + β may be equal to 1. The value of α is kept small. Thus the probability is small that our decision will fall in the box marked α. But when our decision has fallen in the box marked α, it is a powerful decision against H0.
13.4 LEVEL OF SIGNIFICANCE
The α-risk is the probability of rejecting a true null hypothesis. It is also called the significance level or level of significance of the test. It is denoted by α and its level is usually 1 % or 5 %. The value of α is usually decided before the selection of the sample.
13.5 FORMULATING H0 AND H1 AND MAKING CRITICAL REGION
Now, when we have discussed different terms used in the testing of hypothesis, we are in a position to discuss a point which is quite confusing sometimes. The question is how to formulate the null hypothesis H0 and the alternative hypothesis H1. We elaborate this point here and we shall repeat here certain points already discussed in this chapter about framing of H0 and H1. Let us consider some cases.
(i) A machine has been producing components with mean length of 3 cm, which is the required standard. A new machinery has been installed and it is required to test the hypothesis that the mean length of the components is the same. It is obvious that in this case the H0 and H1 will be:
H0: μ = 3 cm        H1: μ ≠ 3 cm
H1 contains the inequality '≠' which means that the rejection region is taken in both ends of the sampling distribution. The test-statistic used is Z = (X̄ - μ0) / (σ/√n). The null hypothesis H0 is rejected if Z < -Zα/2 or Z > Zα/2. It is called two-tailed test with rejection region on both sides. H0 is rejected when the sample mean X̄ is sufficiently larger than 3 cm or sufficiently smaller than 3 cm.

[Figure: two-tailed test with rejection regions below -Zα/2 and above Zα/2, centred at Z = 0.]

(ii) Suppose that we want to test whether the mean μ of a normal distribution exceeds a specified value μ0. We set up the null and alternative hypotheses as follows:
H0: μ = μ0        H1: μ > μ0
The null hypothesis H0 and the alternative hypothesis H1 in this case can also be written as
H0: μ ≤ μ0        H1: μ > μ0
H1 is complement of H0 and the area of the distribution under H0 and H1 makes the complete distribution. In this case, the region of rejection is taken in the right tail of the distribution. The test-statistic is Z = (X̄ - μ0) / (σ/√n). The null hypothesis H0 is rejected when the calculated value of Z is greater than the critical value Zα.

[Figure 13.6: one-tailed test to the right with the rejection region above Zα.]

(iii) At least 60 % of the people are in favour of English as medium of instructions. The sampling distribution of the proportion p̂ is divided into two parts: (1) at least 60 % (2) less than 60 %. We have a serious doubt about the statement and we hope to disprove it. The proportion of the people p ≥ 0.6 is to be tested. The idea or suggestion of at least 60 % (p ≥ 0.6) will be rejected if the sample gives the result well below 60 %. The rejection region is decided by H1 which is one-sided to the left. Thus we frame H0 and H1 as:
H0: p ≥ 0.6        H1: p < 0.6
In this case the entire critical region lies in the left tail. If H1: p < 0.6 is true then the sample proportion p̂ should lie in the rejection region. The test statistic used here is Z = (p̂ - p) / √(pq/n), with p = 0.6 and q = 1 - p. The hypothesis H0 is rejected if Z < -Zα.

[Figure 13.7: one-tailed test to the left with the rejection region below -Zα; the centre Z = 0 corresponds to p = 0.6.]
Example 13.1.
Indicate the type of errors committed in the following cases:
(i) H0: μ = 500, H1: μ ≠ 500. H0 is rejected while H0 is true.
(ii) H0: μ = 500, H1: μ < 500. H0 is accepted while true value of μ = 600.
Answer:
(i) The hypothesis μ = 500 is true and it has been rejected. Type I error has been committed.
(ii) H0 is false and has been accepted. Type II error has been committed.
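
The reasoning used in this example is the same as reading off a box in the table of four possible decisions. As a small sketch (the function name classify_decision is hypothetical), the lookup can be written as:

```python
def classify_decision(h0_is_true, h0_rejected):
    """Name the outcome of a test from the truth of H0 and the decision taken."""
    if h0_is_true and h0_rejected:
        return "Type I error"        # a true H0 has been rejected
    if not h0_is_true and not h0_rejected:
        return "Type II error"       # a false H0 has been accepted
    return "Correct decision"

# Case (i): H0 is true and it is rejected.
print(classify_decision(h0_is_true=True, h0_rejected=True))
# Case (ii): H0 is false (true mean is 600, not 500) and it is accepted.
print(classify_decision(h0_is_true=False, h0_rejected=False))
```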

13.6 GENERAL PROCEDURE FOR TESTING OF HYPOTHESIS
Following are the main steps involved in the testing of a hypothesis about the population parameter.
1. Formulating Null hypothesis H0:
First of all we have to identify the problem and then we frame the hypothesis which we think shall be rejected. Suppose the population parameter is θ about which we have to frame the hypothesis. We specify a value θ0 for the unknown parameter. The null hypothesis H0 can be written in three ways as shown below:
(i) H0: θ = θ0        (ii) H0: θ ≤ θ0        (iii) H0: θ ≥ θ0
In some particular situation any one of the above three forms of H0 is taken. The important thing about H0 is that H0 always contains some form of an equality sign such as '=', '≥' or '≤'. As H0 always contains sign of equality of some type, some people always write H0 as H0: θ = θ0 and they do not write the inequality contained in H0.
Alternative hypothesis H1:
The alternative hypothesis H1 is the opposite or complement of H0. H0 and H1 combined together make the entire sampling distribution. Both H0 and H1 are equally important and they are to be defined properly and clearly. As H1 is complement of H0, therefore H1 stands decided when H0 has been fixed. For example, for each form of H0, the corresponding form of H1 is given below:
(i) If H0: θ = θ0 then H1: θ ≠ θ0
(ii) If H0: θ ≤ θ0 then H1: θ > θ0
(iii) If H0: θ ≥ θ0 then H1: θ < θ0
2. Level of significance α:
It is the probability of rejecting H0 when H0 is true. It is denoted by α. It makes the size of the critical region.
3. Test-statistic:
The test statistic depends upon the shape of the sampling distribution of the statistic. If the sampling distribution is a normal distribution, the test-statistic to be used is Z and, if it is a t-distribution, the test-statistic to be used is t.
4. Critical region:
Critical region or rejection region is decided by H1. The size of critical region is equal to α.
(i) If the alternative hypothesis is H1: θ ≠ θ0 the rejection region is taken in both ends of the sampling distribution. Each side has rejection region equal to α/2. It is called two-sided rejection region. The rejection regions are separated by the two critical values.
(ii) When H1 is θ > θ0, then rejection region of size α is taken only in the right side. It is called one-sided to the right. The rejection region is separated from the acceptance region by a critical value of test-statistic.
(iii) When H1 is θ < θ0, the rejection region of size α is taken only on the left side. It is called one-sided to the left.
5. Computations:
The relevant test-statistic is calculated from the sample data. The calculated value is to be compared with the tabulated value.
6. Conclusion:
If the calculated value of test-statistic lies in the rejection region, the null hypothesis H0 is rejected and H1 is accepted. If the calculated value of the test-statistic falls in the acceptance region, we say that H0 is accepted but it is not acceptance in the real sense of the word. The word acceptance only means that the sample has not provided sufficient information against the null hypothesis.

13.7 HYPOTHESIS TESTING - POPULATION MEAN μ, σ KNOWN (LARGE SAMPLE)
Suppose a population has the mean μ which is unknown and the standard deviation σ, which is known. A large sample of size n is selected from the population and the sample mean X̄ is calculated. We are required to test a hypothesis that the population mean μ has the specified value μ0. The steps of the procedure are listed below:
1. We frame the null hypothesis H0 and the alternative hypothesis H1. Three different forms of H0 and H1 are possible which are:
(a) H0: μ = μ0 and H1: μ ≠ μ0
(b) H0: μ ≤ μ0 and H1: μ > μ0
(c) H0: μ ≥ μ0 and H1: μ < μ0
2. Level of significance α is decided.
3. Test-statistic:
When sample size is large, the sampling distribution of X̄ has the normal distribution with mean μ and the standard error σ/√n. The population may or may not be normal. The test-statistic to be used is Z where
Z = (X̄ - μ0) / (σ/√n)
4. Critical region:
The critical region depends upon the alternative hypothesis. There are three possible rejection plans. We discuss all the three turn by turn.
(a) When H1 is μ ≠ μ0, the rejection region equal to α/2 in size is taken on both ends of the sampling distribution as shown in Fig. 13.8. The critical values of Z which separate the critical regions from the central acceptance region are -Zα/2 and Zα/2. The critical value -Zα/2 has the area on its left equal to α/2 and the critical value +Zα/2 has area on its right equal to α/2. H0 is rejected if the calculated value of Z lies in the rejection region. The rejection region is Z < -Zα/2 or Z > Zα/2. When α = 0.05, then -Zα/2 = -Z0.025 = -1.96 and Z0.025 = 1.96.

[Figure 13.8: two-sided rejection regions of area α/2 in each tail, centred at μ = μ0.]

(b) When H1 is μ > μ0, the rejection region equal to α is taken in the right end of the distribution as shown in Fig. 13.9. The test plan is called one-tailed to the right. The hypothesis is rejected when the calculated value of Z is greater than Zα, where Zα is the critical point on the right of which the area is equal to α.

[Figure 13.9: rejection region of area α in the right tail beyond Zα.]

(c) When H1 is μ < μ0, the rejection region equal to α is taken in the left end of the distribution as shown in Fig. 13.10. The rejection plan is called one-tailed to the left. The hypothesis is rejected when the calculated value of Z is less than the critical value -Zα, where -Zα is a critical point on the left of which the area is α. The rejection region is Z < -Zα.

[Figure 13.10: rejection region of area α in the left tail below -Zα.]

Corresponding to each null hypothesis, the alternative hypothesis and the rejection regions are given below:

Null hypothesis      Alternative hypothesis        Rejection region
(a) H0: μ = μ0       H1: μ ≠ μ0 (two-sided)        Z < -Zα/2 or Z > Zα/2
(b) H0: μ ≤ μ0       H1: μ > μ0 (one-sided)        Z > Zα
(c) H0: μ ≥ μ0       H1: μ < μ0 (one-sided)        Z < -Zα

5. Computations:
The value of Z is calculated by using the formula: Z = (X̄ - μ0) / (σ/√n)
6. Conclusion:
If the value of Z lies in the acceptance region, the hypothesis is accepted. But acceptance is just an indication that the sample data has failed to provide evidence against the null hypothesis. If the value of Z lies in the rejection region the hypothesis is rejected. When H0 is rejected at level α, the chance of having rejected a true null hypothesis is at most 100α %.
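
The six steps above can be collected into one small routine. The following is a minimal sketch, not a definitive implementation: the function name z_test, the labels "two-sided", "greater" and "less", and the numbers in the usage lines (X̄ = 52, μ0 = 50, σ = 8, n = 64) are all assumptions made for illustration.

```python
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, alternative):
    """Large-sample Z-test for a mean with sigma known.

    alternative is 'two-sided' (H1: mu != mu0), 'greater' (H1: mu > mu0)
    or 'less' (H1: mu < mu0). Returns (z, critical_value, reject_h0).
    """
    z = (xbar - mu0) / (sigma / n ** 0.5)            # step 5: computed test-statistic
    std_normal = NormalDist()
    if alternative == "two-sided":
        crit = std_normal.inv_cdf(1 - alpha / 2)     # alpha/2 in each tail
        reject = abs(z) > crit
    elif alternative == "greater":
        crit = std_normal.inv_cdf(1 - alpha)         # alpha in the right tail
        reject = z > crit
    elif alternative == "less":
        crit = std_normal.inv_cdf(1 - alpha)         # alpha in the left tail
        reject = z < -crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, crit, reject

# Hypothetical data: the computed Z is 2.0 in all three calls.
for alt in ("two-sided", "greater", "less"):
    z, crit, reject = z_test(xbar=52, mu0=50, sigma=8, n=64, alpha=0.05, alternative=alt)
    print(f"{alt}: Z = {z:.2f}, critical value = {crit:.3f}, reject H0 = {reject}")
```

The same computed Z leads to different conclusions under different alternatives, which is why H1 must be fixed before the critical region is marked.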
Example 13.2.
Past records show that the average score of students in statistics is 57 with standard deviation 10. A new method of teaching is employed and a random sample of 70 students is selected. The sample average is 60. Can we conclude on the basis of these results, at 5 % level of significance, that the average score has increased?
Solution:
1. Null hypothesis: H0: μ = 57        Alternative hypothesis: H1: μ > 57
2. Level of significance: α = 0.05
3. Test-statistic: Z = (X̄ - μ0) / (σ/√n)
4. Critical region: Z > 1.645. Here we use one-sided test to the right. The hypothesis H0: μ = 57 will be rejected if Z lies in the rejection region.
   (From the area table of normal distribution, we have Zα = Z0.05 = 1.645)
5. Computations: Here n = 70, X̄ = 60, σ = 10, and hence
   Z = (60 - 57) / (10/√70) = 3 / 1.195 = 2.51
6. Conclusion: Since the calculated value of Z = 2.51 falls in the critical region, we reject our null hypothesis H0: μ = 57 at 5 % level of significance and we may conclude that the average score has increased.

Example 13.3.
An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with a mean of 812 hours and a standard deviation of 40 hours. Test the hypothesis that μ = 812 hours against the alternative μ ≠ 812 hours if a random sample of 36 bulbs has an average life of 800 hours. Use a 5 % level of significance.
Solution:
1. Null hypothesis: H0: μ = 812        Alternative hypothesis: H1: μ ≠ 812
2. Level of significance: α = 0.05
3. Test-statistic: Z = (X̄ - μ0) / (σ/√n)
4. Critical region: |Z| > 1.96 (Z < -1.96 or Z > 1.96)
   (From the area table of normal distribution, we have Zα/2 = Z0.025 = 1.96)
5. Computations: Here n = 36, X̄ = 800, σ = 40, and hence
   Z = (800 - 812) / (40/√36) = (-12/40)(6) = -1.8
6. Conclusion: Since the calculated value of Z = -1.8 falls in the acceptance region, H0: μ = 812 is not rejected.
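
The arithmetic of Examples 13.2 and 13.3 can be checked with a few lines of code. This is only a verification sketch; the helper name z_statistic is an illustrative choice, and the critical values 1.645 and 1.96 are the tabulated ones used in the solutions.

```python
from math import sqrt

def z_statistic(xbar, mu0, sigma, n):
    """Z = (sample mean - hypothesised mean) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Example 13.2: one-sided to the right, reject H0 when Z > 1.645.
z_132 = z_statistic(xbar=60, mu0=57, sigma=10, n=70)
print(round(z_132, 2), "reject H0:", z_132 > 1.645)

# Example 13.3: two-sided, reject H0 when |Z| > 1.96.
z_133 = z_statistic(xbar=800, mu0=812, sigma=40, n=36)
print(round(z_133, 2), "reject H0:", abs(z_133) > 1.96)
```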

13.8 HYPOTHESIS TESTING - POPULATION MEAN μ, σ NOT KNOWN (LARGE SAMPLE)
This is an important case in which σ is not known. When sample size n is large, the population may be normal or not, the sampling distribution of X̄ has the normal distribution with mean μ and standard error σ/√n. But when σ is unknown, it is estimated by the sample standard deviation S and the estimated standard error is S/√n. The Z-statistic becomes Z = (X̄ - μ0) / (S/√n), where S² is the sample variance. The remaining procedure is exactly the same as discussed earlier. The only difference is that S is used in place of σ in the calculation of Z.

Example 13.4.
A home heating oil delivery company would like to estimate the annual usage for its customers who live in single-family homes. A sample of 100 customers indicated an average annual usage of 1103 gallons and a sample standard deviation of 327.8 gallons. At the 1 % level of significance, is there evidence that the average annual usage exceeds 1000 gallons per year?
Solution:
1. Null hypothesis: H0: μ ≤ 1000        Alternative hypothesis: H1: μ > 1000
2. Level of significance: α = 0.01
3. Test-statistic: Z = (X̄ - μ0) / (S/√n)
4. Critical region: Z > 2.326
   (From the area table of normal distribution, we have Zα = Z0.01 = 2.326)
5. Computations: Here n = 100, X̄ = 1103, S = 327.8, and hence
   Z = (1103 - 1000) / (327.8/√100) = (103/327.8)(10) = 3.14
6. Conclusion: Since the calculated value of Z = 3.14 falls in the critical region, we reject our null hypothesis H0: μ ≤ 1000 at 1 % level of significance and we may conclude that the average annual usage exceeds 1000 gallons per year.

Exomple 13.5,
A sample of 42 measurements was taken in order to test the null hypothesis
that the populatiori lnean equals 8.5 against the alternative that it is different from
8.b. The'sample mean and standard deviation were found to be 8.79 and' 1.27,
resp"ctiuely. Perform the hypothesis test using 0.01 as the level of significance.
Solution:
1. Null hypothesis: $H_0: \mu = 8.5$    Alternative hypothesis: $H_1: \mu \ne 8.5$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
4. Critical region: $|Z| > 2.575$ ($Z < -2.575$ and $Z > 2.575$)
   (From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$)
5. Computations: Here $n = 42$, $\bar{X} = 8.79$, $S = 1.27$, and hence
   $$Z = \frac{8.79 - 8.5}{1.27/\sqrt{42}} = 1.48$$
6. Conclusion: Since the calculated value of $Z = 1.48$ falls in the acceptance region, we accept our null hypothesis $H_0: \mu = 8.5$ at the 1% level of significance.
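
The two large-sample examples can be checked numerically. The following is only a minimal sketch using the Python standard library; the function names `z_statistic` and `z_critical` are illustrative assumptions, not part of the text, and the critical values come from `statistics.NormalDist` rather than from the printed area table, so they may differ from the table in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sample_sd, n):
    """Large-sample Z statistic, with S used in place of sigma."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


def z_critical(alpha, two_sided=False):
    """Upper critical value of the standard normal distribution."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


# Example 13.4: H1 is mu > 1000 (right-tailed), alpha = 0.01
z_134 = z_statistic(1103, 1000, 327.8, 100)
reject_134 = z_134 > z_critical(0.01)

# Example 13.5: H1 is mu != 8.5 (two-sided), alpha = 0.01
z_135 = z_statistic(8.79, 8.5, 1.27, 42)
reject_135 = abs(z_135) > z_critical(0.01, two_sided=True)

print(round(z_134, 2), reject_134)
print(round(z_135, 2), reject_135)
```

The first test rejects the null hypothesis and the second does not, in agreement with the two conclusions above.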

13.9 HYPOTHESIS TESTING - POPULATION MEAN $\mu$, $\sigma$ KNOWN - NORMAL POPULATION (SMALL SAMPLE)
Sometimes the hypothesis is about a population which is normal and whose standard deviation $\sigma$ is known. In this case the Z-test is used for both small and large sample sizes. Thus

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

The procedure for testing the population mean $\mu$ is the same as discussed earlier.

13.10 HYPOTHESIS TESTING - POPULATION MEAN $\mu$, $\sigma$ UNKNOWN - NORMAL POPULATION (SMALL SAMPLE)
When the standard deviation of the population is not known, it is estimated by the sample standard deviation s, where $s = \sqrt{\dfrac{1}{n-1}\sum (X - \bar{X})^2}$. The procedure runs as follows:
1. The different forms of hypotheses are
   (a) $H_0: \mu = \mu_0$ and $H_1: \mu \ne \mu_0$
   (b) $H_0: \mu \le \mu_0$ and $H_1: \mu > \mu_0$
   (c) $H_0: \mu \ge \mu_0$ and $H_1: \mu < \mu_0$
2. Level of significance $\alpha$ is decided.
3. Test-statistic:
   When the population is normal and the sample size n is small, the standardized sample mean has the t-distribution with $(n - 1)$ degrees of freedom. The test-statistic is
   $$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
4. Critical region:
   The critical region is based on the alternative hypothesis.
   (a) For the alternative hypothesis $H_1: \mu \ne \mu_0$, the rejection region is two-sided as shown in Fig. 13.11. The two critical values $-t_{\alpha/2\,(n-1)}$ and $t_{\alpha/2\,(n-1)}$ are seen from the t-table below $\alpha/2$ and against $(n - 1)$ degrees of freedom. The critical region is $t > t_{\alpha/2\,(n-1)}$ or $t < -t_{\alpha/2\,(n-1)}$.
       [Figure 13.11: rejection regions in both tails, beyond $-t_{\alpha/2\,(n-1)}$ and $t_{\alpha/2\,(n-1)}$]
   (b) When $H_1$ is $\mu > \mu_0$, the rejection region is taken on the extreme right side of the sampling distribution as shown in Fig. 13.12. The critical value $t_{\alpha\,(n-1)}$ is seen from the t-table below $\alpha$ and against $(n - 1)$ degrees of freedom. The critical region is $t > t_{\alpha\,(n-1)}$.
       [Figure 13.12: rejection region in the right tail, beyond $t_{\alpha\,(n-1)}$]
   (c) When $H_1$ is $\mu < \mu_0$, the entire rejection region is taken on the left side of the sampling distribution as shown in Fig. 13.13. The critical value $-t_{\alpha\,(n-1)}$ is seen from the t-table below $\alpha$ and against $(n - 1)$ degrees of freedom. The critical region is $t < -t_{\alpha\,(n-1)}$.
       [Figure 13.13: rejection region in the left tail, beyond $-t_{\alpha\,(n-1)}$]
5. Computations:
   The test-statistic t is calculated from the sample data, where $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$.
6. Conclusion:
   The null hypothesis $H_0$ is rejected in favour of $H_1$ when the value of t lies in the rejection region. $H_0$ is accepted when the value of t lies in the acceptance region.
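
The three forms of the critical region can be written as one decision rule. This is a hedged sketch only: the function name `in_rejection_region` and the labels "two-sided", "greater" and "less" are assumptions introduced for illustration. The critical value is supplied by the reader from the t-table; the values 2.131 and 1.753 used below are the usual table entries $t_{0.025\,(15)}$ and $t_{0.05\,(15)}$, and the statistics 2.5, 1.2 and -2.0 are made-up inputs.

```python
def in_rejection_region(statistic, critical_value, alternative):
    """Return True when the statistic lies in the rejection region.

    critical_value is the positive table value (t_alpha or t_alpha/2);
    alternative is "two-sided", "greater" or "less".
    """
    if alternative == "two-sided":   # form (a): H1 is mu != mu0
        return abs(statistic) > critical_value
    if alternative == "greater":     # form (b): H1 is mu > mu0
        return statistic > critical_value
    if alternative == "less":        # form (c): H1 is mu < mu0
        return statistic < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Illustrative statistics with 15 degrees of freedom
decision_a = in_rejection_region(2.5, 2.131, "two-sided")
decision_b = in_rejection_region(1.2, 1.753, "greater")
decision_c = in_rejection_region(-2.0, 1.753, "less")
print(decision_a, decision_b, decision_c)
```

Passing the positive table value and letting the function attach the sign keeps the caller from having to remember which tail applies.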

Example 13.6.
A manufacturing company making automobile tires claims that the average life of its product is 35000 miles. A random sample of 16 tires was selected, and it was found that the mean life was 34000 miles with a standard deviation s = 2000 miles. Test the hypothesis $H_0: \mu = 35000$ against the alternative $H_1: \mu < 35000$ at $\alpha = 0.05$.
Solution:
1. Null hypothesis: $H_0: \mu = 35000$    Alternative hypothesis: $H_1: \mu < 35000$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
4. Critical region: $t < -1.753$
   (From the t-table, we have $-t_{\alpha\,(n-1)} = -t_{0.05\,(15)} = -1.753$)
5. Computations: Here $n = 16$, $\bar{X} = 34000$, $s = 2000$, and hence
   $$t = \frac{34000 - 35000}{2000/\sqrt{16}} = \frac{-1000}{2000}(4) = -2$$
6. Conclusion: Since the calculated value of $t = -2$ falls in the critical region, we reject our null hypothesis $H_0: \mu = 35000$ at the 5% level of significance.

Example 13.7.
A random sample of 8 cigarettes of a certain brand has an average nicotine content of 4.2 milligrams and a standard deviation of 1.4 milligrams. Is this in line with the manufacturer's claim that the average nicotine content does not exceed 3.5 milligrams? Use the 1% level of significance and assume the distribution of nicotine contents to be normal.
Solution:
1. Null hypothesis: $H_0: \mu \le 3.5$    Alternative hypothesis: $H_1: \mu > 3.5$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
4. Critical region: $t > 2.998$
   (From the t-table, we have $t_{\alpha\,(n-1)} = t_{0.01\,(7)} = 2.998$)
5. Computations: Here $n = 8$, $\bar{X} = 4.2$, $s = 1.4$, and hence
   $$t = \frac{4.2 - 3.5}{1.4/\sqrt{8}} = \frac{0.7}{1.4}\sqrt{8} = 1.414$$
6. Conclusion: Since the calculated value of $t = 1.414$ falls in the acceptance region, we accept our null hypothesis $H_0: \mu \le 3.5$ at the 1% level of significance.
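
The arithmetic of the two small-sample examples can be reproduced as follows. This is a sketch under stated assumptions: `t_statistic` is a hypothetical helper name, and because the Python standard library has no t-distribution, the critical values 1.753 and 2.998 are simply copied from the t-table values quoted in the solutions.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic with (n - 1) degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Example 13.6: left-tailed test, table value t_0.05(15) = 1.753
t_136 = t_statistic(34000, 35000, 2000, 16)
reject_136 = t_136 < -1.753

# Example 13.7: right-tailed test, table value t_0.01(7) = 2.998
t_137 = t_statistic(4.2, 3.5, 1.4, 8)
reject_137 = t_137 > 2.998

print(round(t_136, 3), reject_136)
print(round(t_137, 3), reject_137)
```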

13.11 HYPOTHESIS TESTING - DIFFERENCE BETWEEN TWO POPULATION MEANS $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ KNOWN (LARGE SAMPLES)
Suppose there are two populations (normal or non-normal) with means $\mu_1$ and $\mu_2$ which are unknown and variances $\sigma_1^2$ and $\sigma_2^2$ which are known. Two large random samples of sizes $n_1$ and $n_2$ are selected from the populations and the sample means $\bar{X}_1$ and $\bar{X}_2$ are calculated. The difference $(\bar{X}_1 - \bar{X}_2)$ is a random variable and its distribution is normal with mean $\mu_1 - \mu_2$ and standard error $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
The procedure for testing the hypothesis $\mu_1 - \mu_2 = 0$ is explained below.
1. The null and the alternative hypotheses which are possible are
   (a) $H_0: \mu_1 - \mu_2 = 0$ (or $\mu_1 = \mu_2$) and $H_1: \mu_1 - \mu_2 \ne 0$ (or $\mu_1 \ne \mu_2$)
   (b) $H_0: \mu_1 - \mu_2 \le 0$ (or $\mu_1 \le \mu_2$) and $H_1: \mu_1 - \mu_2 > 0$ (or $\mu_1 > \mu_2$)
   (c) $H_0: \mu_1 - \mu_2 \ge 0$ (or $\mu_1 \ge \mu_2$) and $H_1: \mu_1 - \mu_2 < 0$ (or $\mu_1 < \mu_2$)
2. Level of significance $\alpha$ is decided.
3. Test-statistic:
   The distribution of $(\bar{X}_1 - \bar{X}_2)$ is normal, therefore the test-statistic to be used is Z, where
   $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
4. Critical region:
   For each alternative hypothesis $H_1$, there is a rejection plan as explained earlier.
5. Computations:
   The Z-statistic is calculated using the sample data, where
   $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
   Sometimes the null hypothesis states some difference between $\mu_1$ and $\mu_2$ and the difference is denoted by $\Delta$. In that case $H_0$ is $\mu_1 - \mu_2 = \Delta$ (say) and
   $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \Delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
6. Conclusion:
   The hypothesis is rejected if the calculated value of Z lies in the rejection region. If Z lies in the acceptance region, the hypothesis is accepted.
Example 13.8.
Suppose you wish to estimate the effects of a certain sleeping pill on men and women. Two samples are independently taken, and the relevant data are shown below:

                         Men                    Women
   Sample size           $n_1 = 36$             $n_2 = 64$
   Sample mean           $\bar{x}_1 = 8.75$     $\bar{x}_2 = 7.25$
   Population variance   $\sigma_1^2 = 9$       $\sigma_2^2 = 4$

Test the null hypothesis $H_0: \mu_1 = \mu_2$ against the alternative hypothesis $H_1: \mu_1 > \mu_2$ at $\alpha = 0.05$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 > \mu_2$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
4. Critical region: $Z > 1.645$
   (From the area table of normal distribution, we have $Z_\alpha = Z_{0.05} = 1.645$)
5. Computations: Here $n_1 = 36$, $\bar{X}_1 = 8.75$, $\sigma_1^2 = 9$, $n_2 = 64$, $\bar{X}_2 = 7.25$, $\sigma_2^2 = 4$, and hence
   $$Z = \frac{(8.75 - 7.25) - 0}{\sqrt{\dfrac{9}{36} + \dfrac{4}{64}}} = \frac{1.5}{0.559} = 2.683$$
6. Conclusion: Since the calculated value of $Z = 2.683$ falls in the critical region, we reject our null hypothesis $H_0: \mu_1 = \mu_2$ at the 5% level of significance.
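
As a numerical check of Example 13.8, the sketch below computes the two-sample Z statistic with known variances. The name `two_sample_z` and its `delta` argument (the hypothesized difference $\Delta$, zero by default) are assumptions made for this illustration. The critical value is taken from `statistics.NormalDist` instead of the printed table.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2, delta=0.0):
    """Z statistic for mu1 - mu2 = delta with known variances."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - delta) / standard_error


# Example 13.8: men (n1 = 36) versus women (n2 = 64), H1 is mu1 > mu2
z_138 = two_sample_z(8.75, 7.25, 9, 4, 36, 64)
critical_138 = NormalDist().inv_cdf(1 - 0.05)   # about 1.645
reject_138 = z_138 > critical_138

print(round(z_138, 3), reject_138)
```

The same function applies when the variances are replaced by large-sample estimates $S_1^2$ and $S_2^2$, since only the numbers passed in change.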
Example 13.9.
Two astronomers recorded observations on a certain star. The mean of 30 observations obtained by the first astronomer is 8.85 and the mean of 40 observations made by the second astronomer is 8.20. Past experience shows that each astronomer obtained readings with a variance of 1.2. Using $\alpha = 0.01$, can we say that the difference between the two results is significant?
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$    ($\because \sigma_1^2 = \sigma_2^2 = \sigma^2$)
4. Critical region: $|Z| > 2.575$ ($Z < -2.575$ and $Z > 2.575$)
   (From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$)
5. Computations: Here $n_1 = 30$, $\bar{X}_1 = 8.85$, $n_2 = 40$, $\bar{X}_2 = 8.20$, $\sigma^2 = 1.2$, $\sigma = 1.10$, and hence
   $$Z = \frac{(8.85 - 8.20) - 0}{1.10\sqrt{\dfrac{1}{30} + \dfrac{1}{40}}} = \frac{0.65}{0.27} = 2.407$$
6. Conclusion: Since the calculated value of $Z = 2.407$ falls in the acceptance region, we accept our null hypothesis $H_0: \mu_1 = \mu_2$ at the 1% level of significance. We may conclude that the difference between the two results is insignificant.
13.12 HYPOTHESIS TESTING - DIFFERENCE BETWEEN TWO POPULATION MEANS $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ UNKNOWN (LARGE SAMPLES)
When the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown, they are estimated by their sample variances $S_1^2$ and $S_2^2$ and the test-statistic to be used becomes

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

This formula is used only for large sample sizes, but the populations may or may not be normal. The procedure for testing $H_0$ is the same as explained earlier.
Example 13.10.
Suppose that two randomly selected samples yield the following information:

               Sample I              Sample II
   Size        $n_1 = 82$            $n_2 = 41$
   Mean        $\bar{X}_1 = 50$      $\bar{X}_2 = 55$
   Variance    $S_1^2 = 405$         $S_2^2 = 324$

Test the null hypothesis that the two population means are equal, that is, $H_0: \mu_1 = \mu_2$, against the alternative hypothesis $H_1: \mu_1 < \mu_2$ at $\alpha = 0.01$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 < \mu_2$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$
4. Critical region: $Z < -2.326$
   (From the area table of normal distribution, we have $-Z_\alpha = -Z_{0.01} = -2.326$)
5. Computations: Here $n_1 = 82$, $\bar{X}_1 = 50$, $S_1^2 = 405$, $n_2 = 41$, $\bar{X}_2 = 55$, $S_2^2 = 324$, and hence
   $$Z = \frac{(50 - 55) - 0}{\sqrt{\dfrac{405}{82} + \dfrac{324}{41}}} = \frac{-5}{3.58} = -1.40$$
6. Conclusion: Since the calculated value of $Z = -1.40$ falls in the acceptance region, we accept our null hypothesis $H_0: \mu_1 = \mu_2$ at the 1% level of significance.

13.13 TEST ABOUT $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ KNOWN, POPULATIONS NORMAL (SMALL SAMPLES)
In case of small sample sizes, we can use the Z-test for testing the difference between $\mu_1$ and $\mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are known and the populations are necessarily normal. The Z-test used is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
13.14 TEST ABOUT $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ NOT KNOWN, POPULATIONS NORMAL (SMALL SAMPLES)
This is a case which is different from the previous three cases. Here the conditions are that:
(i) the populations are normal,
(ii) $\sigma_1^2$ and $\sigma_2^2$ are unknown but assumed to be equal,
(iii) the sample sizes $n_1$ and $n_2$ are small and the samples are selected independently.
The variances $\sigma_1^2$ and $\sigma_2^2$ are unknown but $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The parameter $\sigma^2$ is estimated by the sample variances. The sample estimator of $\sigma^2$ is $s_p^2$, where

$$s_p^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$s_p^2$ is called the pooled estimator of the common population variance $\sigma^2$. The standardized difference of $(\bar{X}_1 - \bar{X}_2)$ has the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom, where

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The tabulated value of t for $n_1 + n_2 - 2$ degrees of freedom is seen from the t-table.
For $H_1: \mu_1 \ne \mu_2$, the critical values are $-t_{\alpha/2\,(n_1+n_2-2)}$ and $t_{\alpha/2\,(n_1+n_2-2)}$.
For $H_1: \mu_1 > \mu_2$, the critical value is $t_{\alpha\,(n_1+n_2-2)}$.
For $H_1: \mu_1 < \mu_2$, the critical value is $-t_{\alpha\,(n_1+n_2-2)}$.
The null hypothesis $H_0$ is rejected when the calculated value of t lies in the rejection region.

Example 13.11.

Two samples are randomly selected from two classes of students who have been taught by different methods. An examination is given and the results are shown as follows:

                  Class I               Class II
   Sample size    $n_1 = 8$             $n_2 = 10$
   Mean           $\bar{X}_1 = 95$      $\bar{X}_2 = 97$
   Variance       $s_1^2 = 47$          $s_2^2 = 30$

On the assumption that the test scores of the two classes of students have identical variances, determine whether the two different methods of teaching are equally effective at $\alpha = 0.01$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
4. Critical region: $|t| > 2.921$ ($t < -2.921$ and $t > 2.921$)
   (From the t-table, we have $t_{\alpha/2\,(n_1+n_2-2)} = t_{0.005\,(16)} = 2.921$)
5. Computations: Here $n_1 = 8$, $\bar{X}_1 = 95$, $s_1^2 = 47$, $n_2 = 10$, $\bar{X}_2 = 97$, $s_2^2 = 30$,
   $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(8 - 1)47 + (10 - 1)30}{8 + 10 - 2} = \frac{599}{16} = 37.4375,$$
   $s_p = 6.12$, and hence
   $$t = \frac{(95 - 97) - 0}{6.12\sqrt{\dfrac{1}{8} + \dfrac{1}{10}}} = \frac{-2}{2.9030} = -0.689$$
6. Conclusion: Since the calculated value of $t = -0.689$ falls in the acceptance region, we accept our null hypothesis $H_0: \mu_1 = \mu_2$ at the 1% level of significance. On the basis of the evidence, we may conclude that the two different methods of teaching are equally effective.
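
The pooled-variance computation of Example 13.11 can be sketched as below. The function names `pooled_variance` and `pooled_t` are hypothetical, chosen for this illustration, and the critical value 2.921 is copied from the t-table value quoted in the solution because the standard library does not provide t quantiles.

```python
from math import sqrt


def pooled_variance(var1, var2, n1, n2):
    """Pooled estimator of the common population variance."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t(mean1, mean2, var1, var2, n1, n2, delta=0.0):
    """t statistic with (n1 + n2 - 2) degrees of freedom."""
    sp = sqrt(pooled_variance(var1, var2, n1, n2))
    return ((mean1 - mean2) - delta) / (sp * sqrt(1 / n1 + 1 / n2))


# Example 13.11: Class I (n1 = 8) versus Class II (n2 = 10)
sp2_1311 = pooled_variance(47, 30, 8, 10)
t_1311 = pooled_t(95, 97, 47, 30, 8, 10)
reject_1311 = abs(t_1311) > 2.921   # table value t_0.005(16)

print(sp2_1311, round(t_1311, 3), reject_1311)
```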
13.15 TEST ABOUT $\mu_1 - \mu_2$, DEPENDENT SAMPLES, POPULATIONS NORMAL
Suppose there are two populations with means $\mu_1$ and $\mu_2$ which are unknown. Two random samples of sizes $n_1$ and $n_2$ are selected. It is further assumed that the samples are dependent. Suppose we record the blood pressures of a sample of some patients. The patients are given a treatment for some time and again their blood pressures are recorded. These two sets of observations are called dependent samples. The first set of observations is called the 'before' observations and the second set of observations is called the 'after' observations. These observations are in pairs. If $X_1, X_2, X_3, \ldots, X_n$ are the 'before' observations and $Y_1, Y_2, Y_3, \ldots, Y_n$ are the 'after' observations, then the paired observations are $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots, (X_n, Y_n)$. Let us find the differences between the paired values. Let the differences be $d_1 = X_1 - Y_1$, $d_2 = X_2 - Y_2$, $d_3 = X_3 - Y_3$, ..., $d_n = X_n - Y_n$.
The mean of the sample 'd' values is denoted by $\bar{d}$. Suppose the corresponding parameter of the difference between paired observations in the populations is denoted by $\mu_D$. The various steps of the procedure are:
1. Three different forms of null and alternative hypotheses are
   (a) $H_0: \mu_D = 0$ (or $\mu_1 = \mu_2$) and $H_1: \mu_D \ne 0$ (or $\mu_1 \ne \mu_2$)
   (b) $H_0: \mu_D \le 0$ (or $\mu_1 \le \mu_2$) and $H_1: \mu_D > 0$ (or $\mu_1 > \mu_2$)
   (c) $H_0: \mu_D \ge 0$ (or $\mu_1 \ge \mu_2$) and $H_1: \mu_D < 0$ (or $\mu_1 < \mu_2$)
   Sometimes we have to examine whether the differences of the paired observations in the population have some specified value, say $\Delta$. In that case $\mu_D = \Delta$.
2. Level of significance $\alpha$ is decided.
3. Test-statistic:
   The standardized $\bar{d}$ has the t-distribution with $(n - 1)$ degrees of freedom.
4. Critical region:
   Corresponding to each $H_1$, there is a critical region.
5. Computations: The test-statistic t is calculated, where
   $$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}$$
   and $d_0$ is the value of $\mu_D$ specified by $H_0$. When $H_0$ is $\mu_D = 0$, then $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$.
6. Conclusion:
   The hypothesis $\mu_D = 0$ is rejected if the calculated value of t lies in the rejection region.

Example 13.12.
Suppose that a shoe company wanted to test material for the soles of shoes. For each pair of shoes the new material was placed on one shoe and the old material was placed on the other shoe. After a given period of time a random sample of ten pairs of shoes was selected and the wear was measured on a ten-point scale with the following results:

   Pair number     1    2    3    4    5    6    7    8    9   10
   New material    2    4    5    7    7    5    9    8    8    7
   Old material    4    5    3    8    9    4    7    8    5    6
   Differences    -2   -1   +2   -1   -2   +1   +2    0   +3   +1

At the 0.05 level of significance, is there evidence that the average wear is higher for the new material than the old material?
Solution:
1. Null hypothesis: $H_0: \mu_{new} \le \mu_{old}$ or $\mu_D = \mu_{new} - \mu_{old} \le 0$
   Alternative hypothesis: $H_1: \mu_{new} > \mu_{old}$ or $\mu_D = \mu_{new} - \mu_{old} > 0$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}$
4. Critical region: $t > 1.833$
   (From the t-table, we have $t_{\alpha\,(n-1)} = t_{0.05\,(9)} = 1.833$)
5. Computations:
   Let $X_1$ = new material and $X_2$ = old material.
   The necessary calculations are given below:

   $X_1$               2    4    5    7    7    5    9    8    8    7
   $X_2$               4    5    3    8    9    4    7    8    5    6
   $d = X_1 - X_2$    -2   -1   +2   -1   -2   +1   +2    0   +3   +1
   $d^2$               4    1    4    1    4    1    4    0    9    1

   Here $n = 10$, $\sum d = 3$, $\sum d^2 = 29$, $\bar{d} = \dfrac{\sum d}{n} = \dfrac{3}{10} = 0.3$,
   $$s_d^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{9}\left[29 - \frac{(3)^2}{10}\right] = 3.1222,$$
   $s_d = 1.77$, and hence
   $$t = \frac{0.3 - 0}{1.77/\sqrt{10}} = \frac{0.3}{1.77}\sqrt{10} = 0.536$$
6. Conclusion: Since the calculated value of $t = 0.536$ falls in the acceptance region, we accept our null hypothesis $H_0: \mu_{new} \le \mu_{old}$ at the 5% level of significance. On the basis of the evidence, we may conclude that the average wear is not higher for the new material than the old material.
Example 13.13.
Two varieties of wheat are each planted in ten localities with differences in yield as follows: 2, 4, 2, 2, 3, 6, 2, 2, 4, 3. Test the hypothesis that the population mean difference is zero, using $\alpha = 0.01$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$ or $\mu_D = \mu_1 - \mu_2 = 0$
   Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$ or $\mu_D = \mu_1 - \mu_2 \ne 0$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $t = \dfrac{\bar{d} - d_0}{s_d/\sqrt{n}}$
4. Critical region: $|t| > 3.250$ ($t < -3.250$ and $t > 3.250$)
   (From the t-table, we have $t_{\alpha/2\,(n-1)} = t_{0.005\,(9)} = 3.250$)
5. Computations: Here $n = 10$, $\sum d = 30$, $\sum d^2 = 106$, $\bar{d} = \dfrac{\sum d}{n} = \dfrac{30}{10} = 3$,
   $$s_d^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{9}\left[106 - \frac{(30)^2}{10}\right] = 1.7778,$$
   $s_d = 1.33$, and hence
   $$t = \frac{3 - 0}{1.33/\sqrt{10}} = \frac{3}{1.33}\sqrt{10} = 7.133$$
6. Conclusion: Since the calculated value of $t = 7.133$ falls in the critical region, we reject our null hypothesis $H_0: \mu_1 = \mu_2$ at the 1% level of significance.
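
The paired-sample computations can be sketched in a few lines. The helper name `paired_t` is an assumption for illustration, and the critical values 1.833 and 3.250 are the t-table values quoted in the two solutions. Because the code carries $s_d$ at full precision instead of rounding it to two decimals, the second statistic comes out near 7.12 rather than 7.133; the decision is unaffected.

```python
from math import sqrt


def paired_t(differences, d0=0.0):
    """Return (d_bar, s_d squared, t) for paired differences."""
    n = len(differences)
    total = sum(differences)
    total_sq = sum(d * d for d in differences)
    d_bar = total / n
    var_d = (total_sq - total ** 2 / n) / (n - 1)
    t = (d_bar - d0) / (sqrt(var_d) / sqrt(n))
    return d_bar, var_d, t


# Example 13.12: new minus old material, right-tailed, t_0.05(9) = 1.833
d_1312 = [-2, -1, 2, -1, -2, 1, 2, 0, 3, 1]
dbar_1312, var_1312, t_1312 = paired_t(d_1312)
reject_1312 = t_1312 > 1.833

# Example 13.13: wheat yield differences, two-sided, t_0.005(9) = 3.250
d_1313 = [2, 4, 2, 2, 3, 6, 2, 2, 4, 3]
dbar_1313, var_1313, t_1313 = paired_t(d_1313)
reject_1313 = abs(t_1313) > 3.250

print(dbar_1312, round(var_1312, 4), round(t_1312, 3), reject_1312)
print(dbar_1313, round(var_1313, 4), round(t_1313, 3), reject_1313)
```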

13.16 TEST OF POPULATION PROPORTION (LARGE SAMPLE)
Let us consider a binomial population with a proportion p which is unknown, and we have to test a hypothesis about the unknown population parameter. A random sample of size n (n > 30) is selected from the population and the sample proportion $\hat{p}$ is calculated. When the sample size is large, the distribution of $\hat{p}$ is approximately normal with mean p and standard error $\sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$. The random variable Z can be calculated from $\hat{p}$. Thus

$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$

The random variable Z is used as the test statistic and the value of Z forms the basis for the acceptance or rejection of the null hypothesis about the population proportion. The procedure for testing p runs as below:
1. We frame a hypothesis about the population proportion p. Let us specify a value $p_0$ for the population parameter p. The null hypothesis $H_0$ and the alternative hypothesis $H_1$ can take any one of the following three forms:
   (a) $H_0: p = p_0$ and $H_1: p \ne p_0$
   (b) $H_0: p \le p_0$ and $H_1: p > p_0$
   (c) $H_0: p \ge p_0$ and $H_1: p < p_0$
2. Level of significance is decided. It is denoted by $\alpha$.
3. Test-statistic: The test-statistic used in this case is
   $$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}, \quad \text{where } q_0 = 1 - p_0$$
   The sample proportion $\hat{p}$ can also be written as $\hat{p} = \dfrac{X}{n}$, where X is the number of successes in the sample of size n. Putting $\hat{p} = \dfrac{X}{n}$ in the above formula for Z, we get
   $$Z = \frac{\dfrac{X}{n} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{X - n p_0}{\sqrt{n p_0 q_0}}$$
   Thus $Z = \dfrac{X - n p_0}{\sqrt{n p_0 q_0}}$ can also be used as the test-statistic for testing a population proportion.
4. Critical region:
   The critical region depends upon the alternative hypothesis $H_1$. The three forms of $H_1$ are:
   (a) $H_1$ is $p \ne p_0$. In this case the rejection region is taken in both ends of the sampling distribution. The rejection region on each side is equal to $\alpha/2$. The two critical values $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ separate the critical region from the acceptance region as shown in Fig. 13.14. $H_0$ is rejected when the calculated value of Z lies in the rejection region, that is, when $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$. The values between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ form the acceptance region. The test is called two-sided.
       [Figure 13.14: rejection regions in both tails, beyond $-Z_{\alpha/2}$ and $Z_{\alpha/2}$]
   (b) $H_1$ is $p > p_0$. In this case the rejection region is taken only in the right side of the sampling distribution. The test is called one-sided to the right. The critical value between the acceptance region and the rejection region is $Z_\alpha$ as shown in Fig. 13.15. The values above $Z_\alpha$ form the critical region and the values less than $Z_\alpha$ form the acceptance region, whereas $Z_\alpha$ itself is the critical value and should not be used for acceptance or rejection of $H_0$.
       [Figure 13.15: rejection region in the right tail, beyond $Z_\alpha$; acceptance region of area $(1 - \alpha)$]
   (c) When $H_1$ is $p < p_0$, the entire rejection region falls in the left side of the sampling distribution. The test is called one-sided to the left. The critical value $-Z_\alpha$ is a point between the critical region and the acceptance region as shown in Fig. 13.16. The values less than $-Z_\alpha$ form the critical region. $H_0$ is rejected when the Z value calculated from the sample data falls in the rejection region; otherwise the null hypothesis $H_0$ is accepted with the usual meaning of the term 'acceptance'. The rejection region is $Z < -Z_\alpha$.
       [Figure 13.16: rejection region in the left tail, beyond $-Z_\alpha$]
5. Computation    6. Conclusion

Example 13.14.
In a poll of 1000 voters selected at random from all the voters in a certain district, it is found that 518 voters are in favour of a particular candidate. Test the null hypothesis that the proportion of all the voters in the district who favour the candidate is equal to or less than 50 percent against the alternative that it is greater than 50 percent at $\alpha = 0.05$.
Solution:
1. Null hypothesis: $H_0: p \le 0.50$    Alternative hypothesis: $H_1: p > 0.50$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$
4. Critical region: $Z > 1.645$
   (From the area table of normal distribution, we have $Z_\alpha = Z_{0.05} = 1.645$)
5. Computations: Here $n = 1000$, $X = 518$, $\hat{p} = \dfrac{X}{n} = \dfrac{518}{1000} = 0.518$, $p_0 = 0.50$, $q_0 = 1 - p_0 = 0.50$, and hence
   $$Z = \frac{0.518 - 0.50}{\sqrt{\dfrac{(0.50)(0.50)}{1000}}} = \frac{0.018}{0.016} = 1.125$$
6. Conclusion: Since the calculated value of $Z = 1.125$ falls in the acceptance region, we accept our null hypothesis $H_0: p \le 0.50$ at the 5% level of significance.
Example 13.15.
At a certain college it is estimated that at most 25% of the students ride bicycles to class. Does this seem to be a valid estimate, if in a random sample of 90 college students, 28 are found to ride bicycles to class? Use the 5% level of significance.

Solution:
1. Null hypothesis: $H_0: p \le 0.25$    Alternative hypothesis: $H_1: p > 0.25$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$
4. Critical region: $Z > 1.645$
   (From the area table of normal distribution, we have $Z_\alpha = Z_{0.05} = 1.645$)
5. Computations: Here $n = 90$, $X = 28$, $\hat{p} = \dfrac{X}{n} = \dfrac{28}{90} = 0.31$, $p_0 = 0.25$, $q_0 = 1 - p_0 = 0.75$, and hence
   $$Z = \frac{0.31 - 0.25}{\sqrt{\dfrac{(0.25)(0.75)}{90}}} = \frac{0.06}{0.0456} = 1.32$$
6. Conclusion: Since the calculated value of $Z = 1.32$ falls in the acceptance region, we accept our null hypothesis $H_0: p \le 0.25$ at the 5% level of significance. On the basis of the evidence, we may conclude that at most 25% of the students ride bicycles to class.
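
Both forms of the proportion test-statistic can be checked with a short sketch. The function names `proportion_z` and `proportion_z_counts` are assumptions for this illustration. The code keeps full precision throughout, so it gives values of about 1.14 and 1.34 where the hand computations, which round intermediate quantities, give 1.125 and 1.32; both remain below the critical value 1.645.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Z computed from the sample proportion p-hat = X / n."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def proportion_z_counts(successes, n, p0):
    """Equivalent form of Z written in terms of the count X."""
    return (successes - n * p0) / sqrt(n * p0 * (1 - p0))


critical_right = NormalDist().inv_cdf(1 - 0.05)   # about 1.645

# Example 13.14: 518 of 1000 voters, H1 is p > 0.50
z_1314 = proportion_z(518, 1000, 0.50)
reject_1314 = z_1314 > critical_right

# Example 13.15: 28 of 90 students, H1 is p > 0.25
z_1315 = proportion_z(28, 90, 0.25)
reject_1315 = z_1315 > critical_right

print(round(z_1314, 3), reject_1314)
print(round(z_1315, 3), reject_1315)
```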

13.17 TEST OF DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS, $p_1 - p_2$ (LARGE SAMPLES)
Suppose there are two binomial populations with proportions $p_1$ and $p_2$ which are unknown. Two independent large random samples of sizes $n_1$ and $n_2$ are selected from the populations and the sample proportions $\hat{p}_1$ and $\hat{p}_2$ are calculated. The difference $(\hat{p}_1 - \hat{p}_2)$ is a random variable and has the normal distribution with mean $p_1 - p_2$ and standard error $\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$.
The procedure for testing the difference between $p_1$ and $p_2$ is given below:
1. Three forms of the hypotheses are as below:
   (a) $H_0: p_1 - p_2 = 0$ (or $p_1 = p_2$) and $H_1: p_1 - p_2 \ne 0$ (or $p_1 \ne p_2$)
   (b) $H_0: p_1 - p_2 \le 0$ (or $p_1 \le p_2$) and $H_1: p_1 - p_2 > 0$ (or $p_1 > p_2$)
   (c) $H_0: p_1 - p_2 \ge 0$ (or $p_1 \ge p_2$) and $H_1: p_1 - p_2 < 0$ (or $p_1 < p_2$)
2. Level of significance is decided and is denoted by $\alpha$.
3. Test-statistic:
   The random variable Z is used as the test statistic, where
   $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$
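   But Z as defined above is only in theory. In actual practice, when $H_0$ is $p_1 - p_2 = 0$ (or $p_1 = p_2$), the values of $p_1$, $q_1$, $p_2$ and $q_2$ are not known because these are all unknown parameters. When $H_0$ is $p_1 = p_2$, we assume that the common population proportion for both populations is $p_c$. This proportion $p_c$ is estimated by $\hat{p}_c$ by pooling the data from both samples. Thus
   $$\hat{p}_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{X_1 + X_2}{n_1 + n_2}, \quad \hat{q}_c = 1 - \hat{p}_c$$
   Thus the test-statistic used in actual practice is
   $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_c \hat{q}_c}{n_1} + \dfrac{\hat{p}_c \hat{q}_c}{n_2}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c \hat{q}_c \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
   When $H_0$ is $p_1 - p_2 = \Delta$ (say), then the test statistic used is
   $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - \Delta}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$$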
4. Critical region:
   The critical region depends upon the alternative hypothesis $H_1$. For the three forms of $H_1$, the rejection regions are:
   (a) When $H_1$ is $p_1 - p_2 \ne 0$ or $p_1 \ne p_2$, the rejection region is taken in both ends of the sampling distribution. The critical values are $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. The values greater than $Z_{\alpha/2}$ and less than $-Z_{\alpha/2}$ form the rejection region. The values which lie between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ form the acceptance region. $H_0$ is rejected if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$. When $H_0$ is $p_1 - p_2 = 0$, it does not make any difference whether we take $(\hat{p}_1 - \hat{p}_2)$ or $(\hat{p}_2 - \hat{p}_1)$ in the test-statistic.
   (b) When $H_1$ is $p_1 - p_2 > 0$ or $p_1 > p_2$, the entire rejection region is taken in the right side of the curve. It is called a one-tailed test to the right. The critical value is $Z_\alpha$, and if Z lies in the rejection region the hypothesis $(p_1 - p_2) \le 0$ (or $p_1 \le p_2$) is rejected and $H_1: p_1 > p_2$ is accepted. It is important to note that if $H_1$ is $p_2 > p_1$, then the difference $(\hat{p}_2 - \hat{p}_1)$ is used in the test statistic. Thus
       $$Z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}_c \hat{q}_c \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
       The rejection region is $Z > Z_\alpha$.
   (c) When $H_1$ is $(p_1 - p_2) < 0$ (or $p_1 < p_2$), the rejection region equal to $\alpha$ is taken in the extreme left side. The critical value is $-Z_\alpha$, and the hypothesis $H_0: (p_1 - p_2) \ge 0$ is rejected and $H_1: (p_1 - p_2) < 0$ is accepted. The critical region is $Z < -Z_\alpha$.
5. Computation    6. Conclusion
Example 13.16.
A cigarette-manufacturing firm distributes two brands of cigarettes. It is found that 56 of 200 smokers prefer brand 'A' and that 30 of 150 smokers prefer brand 'B'. Test the hypothesis at the 0.05 level of significance that brand 'A' outsells brand 'B' by 10% against the alternative hypothesis that the difference is less than 10%.
Solution:
1. Null hypothesis: $H_0: p_1 - p_2 \ge 0.10$
   Alternative hypothesis: $H_1: p_1 - p_2 < 0.10$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - \Delta}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$
4. Critical region: $Z < -1.645$
   (From the area table of normal distribution, we have $-Z_\alpha = -Z_{0.05} = -1.645$)
5. Computations: Here $n_1 = 200$, $X_1 = 56$ (number of smokers who prefer brand A), $n_2 = 150$, $X_2 = 30$ (number of smokers who prefer brand B),
   $\hat{p}_1 = \dfrac{56}{200} = 0.28$, $\hat{q}_1 = 1 - \hat{p}_1 = 0.72$, $\hat{p}_2 = \dfrac{30}{150} = 0.2$, $\hat{q}_2 = 1 - \hat{p}_2 = 0.8$, and hence
   $$Z = \frac{(0.28 - 0.2) - 0.10}{\sqrt{\dfrac{(0.28)(0.72)}{200} + \dfrac{(0.2)(0.8)}{150}}} = \frac{-0.02}{0.0455} = -0.44$$
6. Conclusion: Since the calculated value of $Z = -0.44$ falls in the acceptance region, we accept our null hypothesis $H_0: p_1 - p_2 \ge 0.10$ at the 5% level of significance and we may conclude that brand 'A' outsells brand 'B'.

Example 13.17.
A random sample of 150 high school students was asked whether they would turn to their fathers or their mothers for help with a home work assignment in Mathematics, and another random sample of 150 high school students was asked the same question with regard to a home work assignment in English. Use the results shown in the following table and the 0.01 level of significance to test whether or not there is a difference between the true proportions of high school students who turn to their fathers rather than their mothers for help in these two subjects.

             Mathematics    English
   Mother        59            85
   Father        91            65
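Solution:
1. Null hypothesis: $H_0: p_1 = p_2$ or $p_1 - p_2 = 0$
   Alternative hypothesis: $H_1: p_1 \ne p_2$ or $p_1 - p_2 \ne 0$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_c \hat{q}_c \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
4. Critical region: $|Z| > 2.575$ ($Z < -2.575$ and $Z > 2.575$)
   (From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$)
5. Computations: Here $n_1 = 150$, $X_1 = 91$, $n_2 = 150$, $X_2 = 65$, $\hat{p}_1 = \dfrac{X_1}{n_1} = \dfrac{91}{150}$, $\hat{p}_2 = \dfrac{X_2}{n_2} = \dfrac{65}{150}$,
   $$\hat{p}_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{150\left(\frac{91}{150}\right) + 150\left(\frac{65}{150}\right)}{150 + 150} = \frac{91 + 65}{300} = \frac{156}{300} = 0.52,$$
   $\hat{q}_c = 1 - \hat{p}_c = 1 - 0.52 = 0.48$, and hence
   $$Z = \frac{\left(\dfrac{91}{150} - \dfrac{65}{150}\right) - 0}{\sqrt{(0.52)(0.48)\left(\dfrac{1}{150} + \dfrac{1}{150}\right)}} = \frac{0.1733}{0.0577} = 3.003$$
6. Conclusion: Since the calculated value of $Z = 3.003$ falls in the critical region, we reject our null hypothesis $H_0: p_1 = p_2$ at the 1% level of significance and we may conclude that there is a difference between the true proportions of high school students.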
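
The two versions of the test-statistic for a difference of proportions (pooled when $H_0$ is $p_1 = p_2$, unpooled when $H_0$ specifies a difference $\Delta$) can be sketched as follows. The function names are hypothetical and introduced only for this illustration; the results are computed at full precision, so the pooled statistic is about 3.005 rather than the hand-rounded 3.003.

```python
from math import sqrt


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Z for H0: p1 = p2, using the pooled estimate p-hat-c."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def unpooled_two_proportion_z(x1, n1, x2, n2, delta):
    """Z for H0: p1 - p2 = delta, using separate sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p1 - p2) - delta) / standard_error


# Example 13.16: left-tailed test with delta = 0.10, critical value -1.645
z_1316 = unpooled_two_proportion_z(56, 200, 30, 150, 0.10)
reject_1316 = z_1316 < -1.645

# Example 13.17: two-sided test, critical value 2.575
z_1317 = pooled_two_proportion_z(91, 150, 65, 150)
reject_1317 = abs(z_1317) > 2.575

print(round(z_1316, 2), reject_1316)
print(round(z_1317, 3), reject_1317)
```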
13.18 CHOICE OF PROPER TEST-STATISTIC
In a given situation, we have to choose the proper test-statistic. For example, the population mean $\mu$ can be tested with the help of the Z-test and the t-test. The choice depends, along with other things, mainly upon the sample size, which plays a major role in the testing of hypotheses. The following table can be used for guidance in choosing the proper test-statistic.

                        n - Large    n - Small
   $\sigma$ - Known      Z-test       Z-test
   $\sigma$ - Unknown    Z-test       t-test
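
The table can be read as a simple rule. The sketch below is illustrative only: the function name `choose_test` is hypothetical, and the cutoff of 30 for a "large" sample is an assumption borrowed from the n > 30 condition stated for the proportion test, not a value given in the table itself.

```python
def choose_test(sigma_known, n, large_sample_cutoff=30):
    """Return "Z" or "t" following the guidance table.

    large_sample_cutoff is an assumed threshold for a large sample.
    """
    if sigma_known or n > large_sample_cutoff:
        return "Z"   # sigma known (any n), or sigma unknown with large n
    return "t"       # sigma unknown and n small


print(choose_test(sigma_known=True, n=12))
print(choose_test(sigma_known=False, n=100))
print(choose_test(sigma_known=False, n=16))
```

The t-test branch additionally presumes a normal population, as stated in Section 13.10.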
DEFINITIONS

Hypothesis
A statement about a population parameter developed for the purpose of testing.
or
A hypothesis is a statement which may or may not appear to be true after conclusion.

Hypothesis Testing
The objective of hypothesis testing is to check the validity of a statement about a population parameter.
or
A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement or not is called hypothesis testing.

Statistical Hypothesis
A statistical hypothesis is a statement about the numerical value of a population parameter.
or
A statistical hypothesis is a quantitative statement about a population.

Null Hypothesis
A null hypothesis is any hypothesis which is tested for possible rejection or acceptance under the assumption that it is true.
or
The null hypothesis is a statement about the value of a population parameter.

Alternative Hypothesis or Research Hypothesis
The alternative hypothesis is usually the hypothesis for which the researcher wants to gather supporting evidence.
or
A statement specifying that the population parameter is some value other than the one specified under the null hypothesis.

Simple Hypothesis
A hypothesis which specifies all values of the parameters of a distribution is called a simple hypothesis.
or
A hypothesis is said to be a simple hypothesis if the hypothesis uniquely specifies the distribution from which the sample is taken.

Composite Hypothesis
A hypothesis is said to be a composite hypothesis if it does not completely specify the probability distribution.
or
A hypothesis which does not specify all values of the parameters of a distribution is called a composite hypothesis.

Significance Level or Level of Significance
The probability of rejecting a true null hypothesis is called the significance level $\alpha$.
or
The probability of making a type I error is called the significance level of the hypothesis test and is denoted by $\alpha$ (alpha).

[Chapter 13] Statistical Inference: Testing of Hypotheses
Tests of Significance
A significance test is a statistical test laying down the procedure for deciding whether to accept or reject a statistical hypothesis.
Test Statistic
A statistic used as a basis for deciding whether the null hypothesis should be rejected is called test statistic.
or
The sample quantity on which the decision to support H₀ or H₁ is based is called the test statistic.
Rejection Region
The rejection region is the set of possible computed values of the test statistic for which the null hypothesis will be rejected.
or
The set of values for the test statistic that lead to rejection of the null hypothesis H₀.

Acceptance Region
The set of values for the test statistic that lead to acceptance of the null hypothesis is called acceptance region.
or
The portion of the area under a curve that includes those values of a statistic that lead to acceptance of the null hypothesis.
One-Tailed Test
A statistical test in which the critical region is at one end of the sampling distribution is called a one-tailed test.
or
A one-tailed test of hypothesis is one in which the alternative hypothesis is directional, and includes either the symbol "<" or ">".
Two-Tailed Test
A two-tailed test of hypothesis is one in which the alternative hypothesis does not specify departure from H₀ in a particular direction; such an alternative is written with the symbol "≠".
or
A statistical test in which the critical region is located at both ends of the sampling distribution is known as a two-tailed test.
Critical Value
The value which separates the rejection and acceptance regions is called the critical value of the test statistic.
or
The dividing point between the region where the null hypothesis is rejected and the region where it is accepted is said to be critical value.
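As a small illustration of the definitions above, critical values of a Z test can be computed rather than read from a table. The sketch below assumes the test statistic follows the standard normal distribution; the function name `z_critical` and its `tail` argument are illustrative choices and do not come from the text.

```python
from statistics import NormalDist


def z_critical(alpha, tail="two"):
    """Critical value(s) of a Z test at significance level alpha.

    tail is "two", "left" or "right" (labels chosen for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-c, c)
    if tail == "right":
        return z.inv_cdf(1 - alpha)   # all of alpha in the upper tail
    if tail == "left":
        return z.inv_cdf(alpha)       # all of alpha in the lower tail
    raise ValueError("tail must be 'two', 'left' or 'right'")


lower, upper = z_critical(0.05, "two")
print(round(upper, 2))                       # about 1.96
print(round(z_critical(0.05, "right"), 3))   # about 1.645
```

In a two-tailed test the rejection region is every value of Z below `lower` or above `upper`; the acceptance region is the interval between them.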

Type I Error
If we reject a true null hypothesis, the error is called a type I error.
or
Type I error is the rejection of H₀ when it is true.
Type II Error
If we accept a false null hypothesis, the error is called a type II error.
or
Acceptance of H₀ when it is false is known as type II error.
Power of a Test
The power of a test is the probability of rejecting the null hypothesis when it is false.
or
The power of a test is the probability that the test will lead to a rejection of the null hypothesis H₀ when, in fact, the alternative hypothesis H₁ is true.

Power Curve
A graph of the probability of rejecting H₀ for all possible values of the population parameter not satisfying the null hypothesis is known as power curve.
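The power curve can be made concrete with a short calculation. The sketch below assumes a right-tailed Z test of H₀: μ = μ₀ against H₁: μ > μ₀ with known σ; the numbers (μ₀ = 100, σ = 15, n = 36, α = 0.05) and the function name are illustrative assumptions, not values prescribed by the text.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(mu0, mu1, sigma, n, alpha):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu1."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)             # critical value of Z
    shift = (mu1 - mu0) / (sigma / sqrt(n))    # true mean in standard-error units
    return 1 - z.cdf(z_alpha - shift)


# Points of the power curve for several true values of the mean
for mu1 in (100, 102, 104, 106, 108):
    print(mu1, round(power_right_tailed_z(100, mu1, 15, 36, 0.05), 3))
```

At μ₁ = μ₀ the function returns α itself, the probability of a type I error; as μ₁ moves away from μ₀ the power rises towards 1.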

MULTIPLE - CHOICE QUESTIONS

1. A statement about a population developed for the purpose of testing is called:
(a) hypothesis (b) hypothesis testing
(c) level of significance (d) test-statistic

2. Any hypothesis which is tested for the purpose of rejection under the assumption that it is true is called:
(a) null hypothesis (b) alternative hypothesis
(c) statistical hypothesis (d) composite hypothesis
3. A statement about the value of a population parameter is called:
(a) null hypothesis (b) alternative hypothesis
(c) simple hypothesis (d) composite hypothesis
4. Any statement whose validity is tested on the basis of a sample is called:
(a) null hypothesis (b) alternative hypothesis
(c) statistical hypothesis (d) simple hypothesis
5. A quantitative statement about a population is called:
(a) research hypothesis (b) composite hypothesis
(c) simple hypothesis (d) statistical hypothesis
6. A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false is called:
(a) simple hypothesis (b) composite hypothesis
(c) statistical hypothesis (d) alternative hypothesis
7. The alternative hypothesis is also called:
(a) null hypothesis (b) statistical hypothesis
(c) research hypothesis (d) simple hypothesis
8. A hypothesis that specifies all the values of parameter is called:
(a) simple hypothesis (b) composite hypothesis
(c) statistical hypothesis (d) none of the above.
9. The hypothesis μ < 10 is a:
(a) simple hypothesis (b) composite hypothesis
(c) alternative hypothesis (d) difficult to tell.
10. If a hypothesis specifies the population distribution, it is called:
(a) simple hypothesis (b) composite hypothesis
(c) alternative hypothesis (d) none of the above.
11. The probability of rejecting the null hypothesis when it is true is called:
(a) level of confidence (b) level of significance
(c) power of the test (d) difficult to tell
12. The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected is said to be:
(a) critical region (b) critical value
(c) acceptance region (d) significant region
13. If the critical region is located equally in both sides of the sampling distribution of test-statistic, the test is called:
(a) one tailed (b) two tailed
(c) right tailed (d) left tailed
14. The choice of one-tailed test and two-tailed test depends upon:
(a) null hypothesis (b) alternative hypothesis
(c) none of these (d) composite hypothesis
15. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) test-statistic (b) population statistic
(c) both of these (d) none of the above
16. The test statistic is equal to:
(arffi
Sample - Population
(b) (Sample statistic - Parameter) / (Standard error of the statistic)
(c) (Sample mean - Population mean) / (Population standard deviation)
(d) (Statistic - E(Statistic)) / (Variance of the statistic)
17. 1 - α is also called:
(a) confidence coefficient (b) power of the test
(c) size of the test (d) level of significance
18. If H₀ is true and we reject it, it is called:
(a) type-I error (b) type-II error
(c) standard error (d) sampling error
19. The probability associated with committing type-I error is:
(a) β (b) α
(c) 1 - β (d) 1 - α
20. 1 - α is the probability associated with:
(a) type-I error (b) type-II error
(c) level of confidence (d) level of significance
21. Level of significance is also called:
(a) power of the test (b) size of the test
(c) level of confidence (d) confidence coefficient

22. The probability of rejecting H₀ when it is false is called:
(a) power of the test (b) size of the test
(c) level of confidence (d) confidence coefficient
23. In testing hypothesis α + β is always equal to:
(a) zero (b) one
(c) two (d) difficult to tell
24. The significance level is the risk of:
(a) rejecting H₀ when H₀ is correct (b) accepting H₀ when H₀ is correct
(c) rejecting H₁ when H₁ is correct (d) rejecting H₀ when H₁ is correct

25. An example of a two-sided alternative hypothesis is:
(a) H₁: μ < 0 (b) H₁: μ > 0
(c) H₁: μ ≥ 0 (d) H₁: μ ≠ 0
26. If the magnitude of calculated value of t is less than the tabulated value of t and H₁ is two-sided, we should:
(a) reject H₀ (b) accept H₁
(c) not reject H₀ (d) difficult to tell

27. Accepting a null hypothesis H₀:
(a) proves that H₀ is true (b) proves that H₀ is false
(c) implies that H₀ is likely to be true (d) proves that μ < 0.
28. The chance of rejecting a true hypothesis decreases when sample size is:
(a) decreased (b) increased
(c) constant (d) both (a) and (b)
29. The equality condition always appears in:
(a) null hypothesis (b) simple hypothesis
(c) alternative hypothesis (d) both (a) and (b)
30. Which hypothesis is always in an inequality form?
(a) null hypothesis (b) alternative hypothesis
(c) simple hypothesis (d) composite hypothesis
31. Which of the following is not composite hypothesis?
(a) μ > μ₀
(c) μ = μ₀
32. P(Type I error) is equal to:
(a) 1 - α
(c) α
33. P(Type II error) is equal to:
(a) α
(c) 1 - α
34. The power of the test is equal to:
(a) α
(c) 1 - α
35. The degree of confidence is equal to:
(a) α
(c) 1 - α
36. α/2 is called:
(a) one tailed significance level (b) two tailed significance level
(c) left tailed significance level (d) right tailed significance level
37. In an unpaired samples t-test with sample sizes n₁ = 11 and n₂ = 11, the value of tabulated t should be obtained for:
(a) 10 degrees of freedom (b) 21 degrees of freedom
(c) 22 degrees of freedom (d) 20 degrees of freedom
38. In analyzing the results of an experiment involving seven paired samples, tabulated t should be obtained for:
(a) 13 degrees of freedom (b) 6 degrees of freedom
(c) 12 degrees of freedom (d) 14 degrees of freedom
39. The purpose of statistical inference is:
(a) to collect sample data and use them to formulate hypotheses about a population
(b) to draw conclusions about populations and then collect sample data to support the conclusions
(c) to draw conclusions about populations from sample data
(d) to draw conclusions about the known value of population parameter.
40. Suppose that the null hypothesis is true and it is rejected; this is known as:
(a) a type-I error, and its probability is β
(b) a type-I error, and its probability is α
(c) a type-II error, and its probability is α
(d) a type-II error, and its probability is β
41. An advertising agency wants to test the hypothesis that the proportion of adults in Pakistan who read a Sunday Magazine is 25 percent. The null hypothesis is that the proportion reading the Sunday Magazine is:
(a) different from 25 % (b) equal to 25 %
(c) less than 25 % (d) more than 25 %

42. If the mean of a particular population is μ₀, Z = (X̄ - μ₀) / (σ/√n) is distributed:
(a) as a standard normal variable, if the population is non-normal
(b) as a standard normal variable, if the sample is large
(c) as a standard normal variable, if population is normal
(d) as the t-distribution with ν = n - 1 degrees of freedom

43. If μ₁ and μ₂ are means of two populations, Z = [(X̄₁ - X̄₂) - (μ₁ - μ₂)] / √(σ₁²/n₁ + σ₂²/n₂) is distributed:
(a) as a standard normal variable, if both samples are independent and less than 30
(b) as a standard normal variable, if both populations are normal
(c) as both (a) and (b) state
(d) as the t-distribution with n₁ + n₂ - 2 degrees of freedom

44. If the population proportion equals p₀, then Z = (p̂ - p₀) / √(p₀(1 - p₀)/n) is distributed:
(a) as a standard normal variable, if n > 30
(b) as a Poisson variable
(c) as the t-distribution with ν = n - 1 degrees of freedom
(d) as a χ²-distribution with ν degrees of freedom

45. Given H₀: μ = μ₀, H₁: μ ≠ μ₀, α = 0.05 and we reject H₀; the absolute value of the Z-statistic must have equalled or been beyond what value?
(a) 1.96 (b) 1.65
(c) 2.58 (d) 2.33
46. Given μ₀ = 130, X̄ = 150, σ = 25 and n = 4; what test statistic is appropriate?
(a) t (b) Z
(c) χ² (d) F
1. (a) 2. (a) 3. (a) 4. (c) 5. (d) 6. (d) 7. (c) 8. (a)
9. (b) 10. (a) 11. (b) 12. (b) 13. (b) 14. (b) 15. (a) 16. (b)
17. (a) 18. (a) 19. (b) 20. (c) 21. (b) 22. (a) 23. (d) 24. (a)
25. (d) 26. (c) 27. (c) 28. (b) 29. (d) 30. (b) 31. (c) 32. (c)
33. (b) 34. (d) 35. (c) 36. (b) 37. (d) 38. (b) 39. (c) 40. (b)
41. (b) 42. (c) 43. (b) 44. (a) 45. (c) 46. (b)
SHORT QUESTIONS
1. Given X̄ = 100, σ_X̄ = 16 and μ₀ = 90. Find Z.
Ans. 0.62
2. Given σ = 80, n = 625, μ₀ = 350 and X̄ = 356. Find Z.
Ans. 1.88
3. Given H₀: μ = 12, H₁: μ > 12, n = 64, X̄ = 15, σ = 10 and α = 0.05. Find Z and make the statistical decision.
4. Given H₀: μ = 150, n = 36, X̄ = 160, S = 60 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = 1, accept H₀

5. Given X̄ = 120, μ₀ = 100, s = 34.75 and n = 25. Find t.
Ans. 2.88
6. Given H₀: μ = μ₀, H₁: μ ≠ μ₀, α = 0.05, t = -2.08 and n = 26. Make the statistical decision.
Ans. reject H₀ and assert H₁
7. Given H₀: μ = 10, H₁: μ ≠ 10, n = 16, X̄ = 10.5, s = 0.75 and α = 0.05. Find t and make the statistical decision.
Ans. t = 2.67, reject H₀
8. Given σ₁² = 150, σ₂² = 180, n₁ = 30 and n₂ = 30. Find σ(X̄₁ - X̄₂).
Ans. 3.32
9. Given H₀: μ₁ = μ₂, X̄₁ = 6.53, X̄₂ = 4.44 and σ(X̄₁ - X̄₂) = 0.78. Find Z.
Ans. 2.68

10. Given X̄₁ = 26, X̄₂ = 18, σ(X̄₁ - X̄₂) = 3.41, H₀: μ₁ ≤ μ₂ and α = 0.05. Find Z and make the statistical decision.
Ans. Z = 2.35, reject H₀
11. Given H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, n₁ = 100, X̄₁ = 14, σ₁² = 4, n₂ = 150, X̄₂ = 11, σ₂² = 9 and α = 1 %. Find Z and make the statistical decision.
Ans. Z = 9.5, reject H₀
12. Given H₀: μ₁ ≥ μ₂, H₁: μ₁ < μ₂, n₁ = 60, X̄₁ = 75.6, S₁ = 25, n₂ = 40, X̄₂ = 89.2, S₂ = 30 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = -2.37, reject H₀
13. Given H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, X̄₁ = 15, X̄₂ = 23, sp = 11.48, n₁ = 19, n₂ = 23 and α = 0.05. Find t and make the statistical decision.
Ans. t = -2.25, reject H₀
14. Given s₁² = 1.43, s₂² = 5.21, n₁ = 10 and n₂ = 10. Find sp.
Ans. 1.82
15. Given X̄₁ = 84, X̄₂ = 77, n₁ = 31, n₂ = 41, H₀: μ₁ = μ₂ and s(X̄₁ - X̄₂) = 3.07. Find t.
Ans. 2.28
16. Given ΣX₁ = 671, ΣX₁² = 38275, n₁ = 12, ΣX₂ = 551, ΣX₂² = 31707 and n₂ = 10. Find s(X̄₁ - X̄₂).
Ans. 4.4
17. Given H₀: μ₂ - μ₁ = 10, H₁: μ₂ - μ₁ > 10, n₁ = 10, n₂ = 18, X̄₁ = 10, X̄₂ = 25, sp = 31.68 and α = 0.05. Find t and make the statistical decision.
Ans. t = 0.40, accept H₀
18. Given H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, n = 10, d̄ = -0.5, s_d = 3.44 and α = 0.05. Find t and make the statistical decision.
Ans. t = -0.46, accept H₀
19. Given H₀: p = 0.5, H₁: p ≠ 0.5, p̂ = 0.54, n = 1340 and α = 0.02. Find Z and make the statistical decision.
Ans. Z = 2.93, reject H₀
20. Given H₀: p ≥ 0.85, H₁: p < 0.85, n = 400, p̂ = 0.81 and α = 0.01. Find Z and make the statistical decision.
Ans. Z = -2.23, accept H₀

21. Given H₀: p₁ = p₂, H₁: p₁ ≠ p₂, p̂₁ = 0.30, p̂₂ = 0.25, n₁ = 1200, n₂ = 900 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = 2.53, reject H₀
22. Given H₀: p₁ - p₂ ≥ 0.10, H₁: p₁ - p₂ < 0.10, n₁ = 200, n₂ = 150, p̂₁ = 0.28, p̂₂ = 0.20 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = -0.44, accept H₀
23. Describe the procedure for testing hypothesis about mean of a normal population when population standard deviation is known.
24. Explain the general procedure for testing of hypothesis regarding the population mean when population standard deviation is unknown and the sample size is large.
25. Describe the procedure for testing hypothesis about mean of a normal population when population standard deviation is unknown and the sample size is small.
26. Describe the procedure for testing equality of means of two normal populations when population standard deviations are known and sample sizes are large or small.
27. Describe the procedure for testing equality of means of two normal populations when σ₁ = σ₂ but unknown for small samples.
28. Describe the procedure for testing hypothesis about two means with paired observations.
29. Explain the general procedure for testing of hypothesis regarding the population proportion p for a large sample.
30. Explain the general procedure for testing of hypothesis about the difference between two population proportions for large samples.
31. Distinguish between null hypothesis and alternative hypothesis.
32. Differentiate between type I error and type II error.
33. Differentiate between one-tailed test and two-tailed test.
34. What is meant by critical region?
35. Differentiate between simple hypothesis and composite hypothesis.
36. Differentiate between acceptance region and rejection region.
37. Define null hypothesis and describe the general procedure for its testing.
38. What is meant by test-statistic?
39. Explain the terms hypothesis and tests of hypothesis.
40. Explain the terms level of significance and tests of significance.
41. What is meant by a statistical hypothesis?
42. Explain the difference between one-sided and two-sided tests. When should each be used?
43. Explain with example the difference between acceptance region and rejection region.
44. What is meant by critical value?
45. Define the terms power of a test and power curve.
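The numerical short questions on Z all rest on one ratio: the difference between a sample estimate and its hypothesized value, divided by the standard error of the estimate. The sketch below is only an illustration of that arithmetic; the function names are assumptions of this example, and the figures are the values of Short Questions 2, 8 and 9 as read here (μ₀ = 350 in Question 2).

```python
from math import sqrt


def z_statistic(estimate, hypothesized, standard_error):
    """Z = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / standard_error


def se_difference_of_means(var1, n1, var2, n2):
    """Standard error of the difference of two means, population variances known."""
    return sqrt(var1 / n1 + var2 / n2)


# Short Question 2: sigma = 80, n = 625, mu0 = 350, sample mean = 356
z2 = z_statistic(356, 350, 80 / sqrt(625))

# Short Question 8: variances 150 and 180, both samples of size 30
se8 = se_difference_of_means(150, 30, 180, 30)

# Short Question 9: means 6.53 and 4.44, standard error 0.78, H0: mu1 = mu2
z9 = z_statistic(6.53 - 4.44, 0, 0.78)

print(round(z2, 3), round(se8, 2), round(z9, 2))  # 1.875 3.32 2.68
```

The value 1.875 for Question 2 agrees with the printed answer 1.88 after rounding to two decimal places.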

EXERCISES
1. A sample of 900 plants is found to have a mean of 34 cm. Can it be reasonably regarded as a random sample from a large population with mean 32 cm and standard deviation 23 cm? Use 5 % level of significance.
Ans. Z = 2.61, H₀: μ = 32, H₁: μ ≠ 32; reject H₀

2. Suppose that the variance of the IQs of the high school students in a certain city is 225. A random sample of 36 students has a mean IQ of 106. If the level of significance is chosen at 0.05, should we conclude that the IQs of the high school students in this city are higher than 100?
Ans. H₀: μ ≤ 100, H₁: μ > 100; Z = 2.4; reject H₀

3. Suppose that scores on an aptitude test used for determining admission to graduate study in statistics are known to be normally distributed with a mean of 500 and a population standard deviation of 100. If a random sample of 64 applicants from a college has a sample mean of 537, is there any evidence that their mean score is different from the mean expected of all applicants? Use α = 0.01.
Ans. H₀: μ = 500, H₁: μ ≠ 500, Z = 2.96; reject H₀

4. Let X ~ N(μ, 100) and X̄ be the mean of a random sample of 64 observations of X, given that X̄ = 15. Test H₀: μ = 12 against the alternative H₁: μ > 12. Use α = 0.05.
Ans. Z = 2.4; reject H₀
5. A random sample of 64 drinks from a soft-drink machine has an average content of 21.9 deciliters, with a standard deviation of 1.42 deciliters. Test the hypothesis that μ = 22.2 deciliters against the alternative hypothesis μ < 22.2, at the 5 % level of significance.
Ans. Z = -1.69; reject H₀
6. A random sample of 200 trucks were driven on the average 16300 miles a year with a sample standard deviation of 3100 miles. Test the null hypothesis that the average truck mileage in the population is 17000 miles a year against the alternative hypothesis that the average is less. Use the 5 % level of significance.
Ans. H₀: μ = 17000, H₁: μ < 17000, Z = -3.19; reject H₀
7. A manufacturer of detergent claims that the mean weight of a particular box of detergent is 3.25 pounds. A random sample of 64 boxes revealed a sample average of 3.238 pounds with a standard deviation of 0.117 pounds. Using the 1 % level of significance, is there evidence that the average weight of the boxes is different from 3.25 pounds?
Ans. H₀: μ = 3.25, H₁: μ ≠ 3.25, Z = -0.82; accept H₀
8. Past experience indicates that the time for high school seniors to complete a standardized test is a normal random variable with a mean of 35 minutes. If a random sample of 20 high school seniors took an average of 33.1 minutes to complete this test with a standard deviation of 4.3 minutes, test the hypothesis at the 1 % level of significance that μ = 35 minutes against the alternative that μ < 35 minutes.
Ans. t = -1.976; accept H₀
9. A random sample of 10 from a population gave X̄ = 20 and the sum of squares of deviations from the mean is 144. Test H₀: μ = 19.5 against H₁: μ > 19.5 at α = 0.05.
Ans. t = 0.395; accept H₀

10. Given the following information. What is your conclusion in testing each of the indicated null and alternative hypotheses?
(i)
(ii)
(iii)
Ans. (i) t = - ; accept H₀ (ii) t = ; accept H₀ (iii) t = 1.5; accept H₀

11. Suppose you wish to estimate the difference between the daily wages for machinists and carpenters. Two independent samples of 50 people each are respectively taken, and the relevant data are shown as follows:
                       Machinists   Carpenters
Sample size            50           50
Sample mean            172.5        170.0
Population variance    98           102
Should we reject the null hypothesis that the daily wages for machinists and carpenters are the same in favor of the alternative hypothesis that they are different at α = 0.05?
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = 1.25; accept H₀
12. A random sample of 100 workers in a large farm took an average of 14 minutes to complete a task. A random sample of 150 workers in another large farm took an average of 11 minutes to complete the task. Can it be assumed at 5 % level of significance that the average time taken by the workers in the two farms is same, if the standard deviations of all the workers of first farm and second farm are 2 minutes and 3 minutes respectively?
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = 9.49; reject H₀

13. A tire manufacturer wishes to test two types of tires. Fifty tires of type 'A' have a mean life of 24000 miles with S² = 6250000. Forty tires of type 'B' have a mean life of 26000 miles with S² = 9000000. Is there a significant difference between the two sample means? Use α = 0.05.
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = -3.38; reject H₀
14. A carpet manufacturer is studying differences between two of its major outlet stores. The company is particularly interested in the time it takes before customers receive carpeting that has been ordered from the plant. Data concerning a sample of delivery times for the most popular type of carpet are summarized as follows:
        A            B
X̄      34.3 days    43.7 days
S       2.4 days     3.1 days
n       41           31
At the 0.01 level of significance, is there evidence of a difference in the average delivery times for the two outlet stores?
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = -14; reject H₀

15. Two random samples taken independently from normal populations with an identical variance yield the following results:
            Sample I      Sample II
Size        n₁ = 10       n₂ = 18
Mean        X̄₁ = 10      X̄₂ = 25
Variance    s₁² = 1200    s₂² = 900
Test the hypothesis that the true difference between the population means is 10, that is, H₀: μ₂ - μ₁ = 10, against the alternative H₁: μ₂ - μ₁ > 10 at the 5 % level of significance.
Ans. t = 0.40; accept H₀
16. The means of two random samples of sizes 9 and 7 respectively are 196.42 and 198.82 respectively. The sums of the squares of the deviations from the mean are 26.94 and 18.73 respectively. Assume that the two samples are drawn from normal populations with identical variance. Test H₀: μ₁ = μ₂ against the alternative H₁: μ₁ < μ₂ at the 5 % level of significance.
Ans. t = -2.631; reject H₀

17. In an examination, a class of 18 students had a mean of 70 with s = 6. Another class of 21 had a mean of 77 with s = 8 in the same examination. Is there reason to believe that one class is significantly better than the other? Consider the students as samples from one population. Use a 5 % level of significance.
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, t = -3.05; reject H₀
18. The weights of 4 persons before they stopped smoking and 5 weeks after they stopped smoking are as follows:
Person    1      2      3      4
Before    148    176    153    118
After     154    176    150    120
Use the t-test for paired observations to test the hypothesis at 0.05 level of significance that giving up smoking has no effect on a person's weight.
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, t = -0.662; accept H₀

19. An experiment was performed with five hop plants. One half of each plant was pollinated and the other half was non-pollinated. The yield of the seed of each hop plant is tabulated as follows:
Pollinated        0.78    0.76    0.43    0.92    0.86
Non-pollinated    0.21    0.12    0.32    0.29    0.30
Determine at the 5 % level of significance whether the pollinated half of the plant gives a higher yield in seed than the non-pollinated half.
Ans. H₀: μ₁ ≤ μ₂, H₁: μ₁ > μ₂, t = 5.102; reject H₀

20. Let X designate the defective parts produced by an automatic machine. From a randomly selected sample of 50 parts, 10 are defective. Let p be the true proportion of all the parts that are defective; test the null hypothesis H₀: p = 0.1 against the alternative hypothesis H₁: p ≠ 0.1 at α = 0.01.
Ans. Z = 2.359; accept H₀
21. A coin is tossed 20 times resulting in 5 heads. Is this sufficient evidence to reject the hypothesis at the 5 % level of significance that the coin is balanced in favour of the alternative that heads occur less than 50 % of the times?
Ans. H₀: p = 0.5, H₁: p < 0.5, Z = -2.236; reject H₀
22. A random sample of 200 workers was selected from a population and 140 workers were found to be skilled. The factory owner claimed that at least 80 % workers were skilled in his factory. Is it possible to reject the claim of the factory owner at 5 % level of significance?
Ans. H₀: p ≥ 0.80, H₁: p < 0.80, Z = -3.534; reject H₀
23. An electric company claimed that at least 85 % of the parts which it supplied conformed to specifications. A sample of 400 parts was tested and 75 did not meet specifications. Can we accept the company's claim at 1 % level of significance?
Ans. H₀: p ≥ 0.85, H₁: p < 0.85, Z = -2.095; accept H₀
24. An expert is interested in the proportion of males and females in a population that have a certain minor blood disorder. In a random sample of 100 males, 31 are found to be afflicted, whereas only 24 of 100 females tested appear to have the disorder. Can we conclude at 1 % level of significance that the proportion of men in the population afflicted with this blood disorder is significantly greater than the proportion of women afflicted?
Ans. H₀: p₁ ≤ p₂, H₁: p₁ > p₂, Z = 1.109; accept H₀
25. Studies comparing the mathematical abilities of male and female students have produced conflicting conclusions. The distribution of grades earned in introductory statistics by a random sample of students at one institution is given below. Use these data to test the hypothesis that there is no difference in the population proportion of males and females that receive grade A. Let α = 0.05.
Grade      A     B     C     D     E     Total
Males      15    18    16    8     11    68
Females    20    16    19    12    15    82
Ans. H₀: p₁ = p₂, H₁: p₁ ≠ p₂, Z = -0.331; accept H₀
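The paired-observation exercises can be checked with a few lines of arithmetic. The sketch below is one way to compute the paired t statistic, t = d̄ / (s_d/√n) with n - 1 degrees of freedom, where d is the difference within each pair. The function name is an assumption of this example, and the data are the weights of Exercise 18 as read here (before: 148, 176, 153, 118; after: 154, 176, 150, 120).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and its degrees of freedom for two matched samples."""
    d = [a - b for a, b in zip(first, second)]   # within-pair differences
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                 # sample standard deviation, divisor n - 1
    t = d_bar / (s_d / sqrt(n))
    return t, n - 1


before = [148, 176, 153, 118]
after = [154, 176, 150, 120]
t, df = paired_t(before, after)
print(round(t, 3), df)   # -0.662 3
```

With 3 degrees of freedom the two-sided tabulated t at the 5 % level is 3.182, so a computed value of -0.662 falls in the acceptance region.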

Chapter 14
REGRESSION AND CORRELATION

14.1 INTRODUCTION

.
, ,,

t
rt

'l'lret'e iii'c sorr)e statistical tools with the help of which .l

we study a Sirfgle'

o
)
variable. llhe averages, the measures of dispersion, ih, moments etc. areial."f"tai

p
the^ f'requency di'-qrribution of a single variable, There are certain
toof. *iif, ilrt

s
1'o1
heip of which two or tnore than trvo variables or attributes are studied. What do
we

g
study when tliere ar:e tlvo or r'llore than trvo variatrles or attributes. In Cfr"pt"i

o
association, we slurll discuss tnutuai relationship between qualitative variables.

l
The
qualitative variables are aiso callerl attributes. The attributes are studied

b
by a

.
statistical, tool 12 (read as chi-square), In the present chapter and in the next

3
chapter we shall cliscuss the tools rvhich ,rud for the rtudy of two variables.
".* the level of this
4
Cases of more t,harr trvo variables are beyond book, and therefore
wili nr-rt be covered in this book. Theru u.L t*o different techniques whieh are used

9
for the study of two ot' lnorc tharr two variables. Thes, ur. regression and,

9
t
correlation. Both studv the behaviour of the variables but they differ-in their end

a
resulbs. Regression studies the relationship where {,epend,ence is necessari$

t
involved' One vilr'iai'rir: has the rlepentlence on a certain number of variablee.

/: / s
Regression can be used lbr preclicting the values of the variable which depends
upon
other variables' Correl;rtion attetnpts ta ',stucly the strength of tire mutual

s
relationship beiween two variablcs. In correlation rve asgume that the variables are
random arrd ricl.rcntlc,ce of'arr;' nature is not involved.

14.2 MATHEMATICAL MODEL OR EQUATION
Regression involves the study of equations. First we talk about some simple equations or models. The simplest mathematical model or equation is the equation of straight line.
Example 14.1.
Suppose a shop-keeper is selling pencils. He sells one pencil for Rs. 2. Table 14.1 gives the number of pencils sold and the sale price of the pencils.
Table 14.1.
Number of pencils sold    0    1    2    3    4    5
Sale price (Rs.)          0    2    4    6    8    10


Let us examine the two variables given in Table 14.1. For the sake of our convenience, we can give some names to the variables given in the table. Let X denote the number of pencils sold and S (S for sale) denote the amount realised by selling X pencils. Thus,
X    0    1    2    3    4    5
S    0    2    4    6    8    10

The information written above can be presented in some other forms as well. For example we can write an equation describing the above relation between X and S. It is very simple to write the equation. The algebraic equation connecting X and S is, S = 2X.
It is called mathematical equation or mathematical model in which S depends upon X. Here X is called independent variable and S is called dependent variable. There is exact relation between X and S. When 2 pencils are sold, the sale price is Rs. 4, neither less than 4 nor more than 4. The above model is called deterministic mathematical model because we can determine the value of S without any error by putting the value of X in the equation. The sale S is said to be function of X. This statement in symbolic form is written as: S = f(X)
It is read as 'S is function of X'. It means that S depends upon X and only X and no other element. The data in Table 14.1 can be presented in the form of a graph as shown in figure 14.1.

Figure 14.1

The main features of the graph in figure 14.1 are:
(i) The graph lies in the first quadrant because all the values of X and S are positive.
(ii) It is an exact straight line. But all graphs are not in the form of a straight line. It could be some curve also.
(iii) All the points (pairs of X and S) lie on the straight line.
(iv) The line passes through the origin.
(v) Take any point P on the line and draw a perpendicular line PQ which joins P with the X-axis. Let us find the ratio PQ/OQ. Here PQ = 6 units and OQ = 3 units. Thus PQ/OQ = 6/3 = 2 units. It is called the slope of the line and in general it is denoted by 'b'. The slope of the line is the same at all points on the line. The slope 'b' is equal to the change in the dependent variable (here S) for a unit change in X. The relation S = 2X is also called linear equation between X and S.
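The claim that the slope is the same at all points on the line can be verified directly from Table 14.1. The following sketch computes the ratio of the change in S to the change in X between each pair of neighbouring points; the variable and function names are illustrative only.

```python
# Data of Table 14.1: pencils sold (X) and sale price in rupees (S)
pencils = [0, 1, 2, 3, 4, 5]
sale = [0, 2, 4, 6, 8, 10]


def slope_between(x1, y1, x2, y2):
    """Change in the dependent variable per unit change in X between two points."""
    return (y2 - y1) / (x2 - x1)


slopes = [
    slope_between(pencils[i], sale[i], pencils[i + 1], sale[i + 1])
    for i in range(len(pencils) - 1)
]
print(slopes)   # [2.0, 2.0, 2.0, 2.0, 2.0]

# Every pair (X, S) satisfies the deterministic model S = 2X exactly
print(all(s == 2 * x for x, s in zip(pencils, sale)))   # True
```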

Example 14.2.
Suppose a carpenter wants to make some wooden toys for the small children. He has purchased some wood and some other material for Rs. 20. The cost of making each toy is Rs. 5. Table 14.2 gives the information about the number of toys made and the cost of the toys.
Table 14.2.
Number of Toys    0     1     2     3     4     5
Cost of Toys      20    25    30    35    40    45

Let X denote the number of toys and Y denote the cost of the toys. What is the algebraic relation between X and Y? When X = 0, Y = 20. This is called fixed or starting cost and it may be denoted by 'a'. For each additional toy, the cost is Rs. 5. Thus Y and X are connected through the following equation:
Y = 20 + 5X
It is called equation of straight line. It is also mathematical model of deterministic nature. Let us make the graph of the data in Table 14.2. Figure 14.2 is the graph of the data in Table 14.2.

/: /
s
tt p
h
Y*20+5X

i are

line. Figure 14,2


Let us note some important features of the graph obtained in Figure 14.2.
(i) The line AB does not pass through the origin. It passes through the point 'A' on Y-axis. The distance between A and the origin 'O' is called the 'intercept' and is usually denoted by 'a'.
(ii) Take any point P on the line and complete a triangle PQA as shown in the figure. Let us find the ratio between the perpendicular PQ and the base AQ of this triangle. The ratio is PQ/AQ = 5 units.
This ratio is denoted by 'b' in the equation of straight line. Thus the equation of straight line Y = 20 + 5X has the intercept a = 20 and slope b = 5. In general, when the values of intercept and slope are not known, we write the equation of straight line as Y = a + bX. It is also called linear equation between X and Y and the relation between X and Y is called linear. The equation Y = a + bX may also be called exact linear model between X and Y or simply linear model between X and Y. The value of Y can be determined completely when X is given. The relation Y = a + bX is, therefore, called the deterministic linear model between X and Y. In statistics, when we shall use the term 'linear model', we shall not mean a mathematical model as described above.

Another property of the exact linear model is that the first differences of the Y-variable are constant. The first differences of the Y-variable in Table 14.2 are calculated as below:

X    Y     First differences $\Delta Y$
0    20
           25 - 20 = 5
1    25
           30 - 25 = 5
2    30
           35 - 30 = 5
3    35
           40 - 35 = 5
4    40
           45 - 40 = 5
5    45

It means that when all the points of the pairs $(X_i, Y_i)$ lie on the straight line, the first differences $\Delta Y$ are exactly constant. We shall take help from this property later on. In a certain observed data, when the first differences are constant or almost constant, we shall consider the observed data to be close to a straight line and we would like to find the equation of that line.
14.3 NON-LINEAR MODEL
Let us consider an equation $Y = 10 + 5X^2$
By putting the values of X = 0, 1, 2, 3, 4 in this equation, we find the values of Y as given in Table 14.3 below. The first and second differences are calculated in Table 14.3.

Table 14.3.
X    Y     First differences $\Delta Y$    Second differences $\Delta^2 Y$
0    10
           15 - 10 = 5
1    15                                    15 - 5 = 10
           30 - 15 = 15
2    30                                    25 - 15 = 10
           55 - 30 = 25
3    55                                    35 - 25 = 10
           90 - 55 = 35
4    90

The second differences are exactly constant. The general quadratic equation or model is written as

$$Y = a + bX + cX^2 \qquad (c \neq 0)$$

It is also called second degree parabola or second degree curve. The graph of the data is shown below in Figure 14.3.

Figure 14.3 (graph of the curve $Y = 10 + 5X^2$)

Figure 14.3 is not a straight line. It is a curve or we say that the model $Y = 10 + 5X^2$ is non-linear. The students are advised to remember that if, in a certain observed data, the second differences are constant or almost constant, we find the second degree curve close to the observed data. We shall face this situation in time series.
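The difference tables for the linear and the quadratic model can be reproduced with a few lines of code. The following is a minimal Python sketch; the helper name `differences` is an illustrative choice and does not come from the text. It computes the first differences for Y = 20 + 5X (Table 14.2) and the first and second differences for Y = 10 + 5X^2 (Table 14.3).

```python
def differences(values):
    """Return the successive differences of a list of numbers."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Linear model Y = 20 + 5X for X = 0, 1, ..., 5 (Table 14.2)
linear_y = [20 + 5 * x for x in range(6)]
linear_first = differences(linear_y)

# Quadratic model Y = 10 + 5X^2 for X = 0, 1, ..., 4 (Table 14.3)
quadratic_y = [10 + 5 * x ** 2 for x in range(5)]
quadratic_first = differences(quadratic_y)
quadratic_second = differences(quadratic_first)

print(linear_first)      # [5, 5, 5, 5, 5]
print(quadratic_first)   # [5, 15, 25, 35]
print(quadratic_second)  # [10, 10, 10]
```

Constant first differences point to a straight line, while constant second differences point to a second degree curve.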
14.4 STATISTICAL MODEL
Statistical model is also a mathematical model but the difference is that a statistical model always contains an error term or random term in the mathematical equation. What is an error term? Let us take an example from practical life to explain this term.
Suppose there are 10 agricultural plots of land with the same fertility. It is assumed that the plots are similar in all respects, so that conditions remain as constant as possible from plot to plot. We decide to put 5 kg. of fertilizer in each plot. The yields of rice from each plot are recorded. Let X denote the amount of fertilizer and Y denote the yield of rice. For a single fixed value of X there are corresponding 10 figures of yields of rice. If 5 kg. of fertilizer is applied in a very large number of plots, then the yields will follow a normal distribution with some mean denoted by $\mu_{Y|5}$. The mean $\mu_{Y|5}$ is the mean of Y values when X is fixed at 5 kg. This mean is also denoted by E(Y). Some yields are above E(Y) and some are below E(Y). The difference between the yield Y and E(Y) is called the error term or the random term. It is also called the residual. It may be denoted by $e_i$. The yields of rice are a random variable with a certain probability distribution. The random errors are calculated from the random variable Y. A random variable calculated from another random variable is also a random variable. Thus the errors $e_i$ are a random variable and it is a well known fact that the $e_i$'s are normally distributed with mean zero. Thus $E(e_i) = 0$. Table 14.4 shows yields of rice $Y_i$ for a certain given value of X and the mean E(Y), and the errors $e_i$ are calculated as below:

Table 14.4.
Amount of          Yield of rice      Average         Error
fertilizer, X      (kgs.), Yi         E(Y)            ei = Yi - E(Y)
                   40                                 40 - 55 = -15
                   40                                 40 - 55 = -15
                   50                                 50 - 55 = -5
                   50                                 50 - 55 = -5
5 kg.              50                 E(Y) = 55 kg    50 - 55 = -5
                   60                                 60 - 55 = 5
                   60                                 60 - 55 = 5
                   60                                 60 - 55 = 5
                   70                                 70 - 55 = 15
                   70                                 70 - 55 = 15
                                                      Sum of ei = 0

In Table 14.4 we have taken only 10 values of $Y_i$. In actual practice the number of Y values is very large corresponding to a fixed value of X. In this table we see that there is no equation which links the X value with the given Y values.
There is in fact no relation of mathematical nature between X and individual values of $Y_i$. The individual values of Y cannot be determined by any mathematical equation. If we change the amount of fertilizer, we shall obtain another set of Y values (distribution of Y values) for the different yields of rice. Thus for each value of X, there is a normal distribution of Y values. This fact is illustrated in Figure 14.4.

Figure 14.4 (normal distributions of Y at successive values of X, with the line AB passing through the means E(Y))

On each value of X, there is a normal distribution with mean E(Y). The Y-values in the same distribution differ from their mean E(Y) and the difference is called error term. If a population data on two variables X and Y is under consideration, then a linear statistical model or equation can be written as:
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
where $\alpha$ is the intercept, $\beta$ is the slope of the line and $\varepsilon_i$ (epsilon) is the error term and it may take positive or negative values. The line AB in Figure 14.4 which passes through the E(Y)'s is called the regression line. The observed value $Y_i$ can also be written as
$$Y_i = E(Y_i) + \varepsilon_i$$
This equation contains a random term $\varepsilon_i$ on the right side. Thus the variable $Y_i$ is random because it depends on $\varepsilon_i$.
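The error column of Table 14.4 can be checked numerically. The short Python sketch below is only an illustration, with variable names chosen here for readability; it treats the average of the ten recorded yields as E(Y) and computes each error as the yield minus that average.

```python
# Yields of rice (kg) from the 10 plots in Table 14.4, all at X = 5 kg of fertilizer
yields = [40, 40, 50, 50, 50, 60, 60, 60, 70, 70]

mean_yield = sum(yields) / len(yields)       # plays the role of E(Y)
errors = [y - mean_yield for y in yields]    # e_i = Y_i - E(Y)

print(mean_yield)    # 55.0
print(errors)        # [-15.0, -15.0, -5.0, -5.0, -5.0, 5.0, 5.0, 5.0, 15.0, 15.0]
print(sum(errors))   # 0.0
```

The errors are positive for yields above the mean and negative for yields below it, and they add up to zero.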
14.4.1 INDEPENDENT AND DEPENDENT VARIABLES
The variable whose value is decided by the experimenter is called fixed variable or independent variable. It is also called regressor or predictor. The variable which is influenced by the independent variable is called dependent variable. It is also called regressand or predictand. This variable is of random nature and cannot be determined exactly for a given value of X. It is also called random variable.
14.4.2 CAUSE AND EFFECT RELATION
In a relation, in which one variable is independent and the other is dependent, some people use the terms 'cause' and 'effect'. In the previous example of production of rice for a given dosage of fertilizer, the amount of fertilizer is the 'cause' and 'production of rice' is the 'effect'. Thus in this regression relation, we can say that there is 'cause' and 'effect' relation between the variables. Some special food may be tested on poultry birds. The amount of food is 'cause' and the weight of the birds is an 'effect'. The 'effect' variable is also called the response variable. But there may be regression relation between two variables X and Y in which there is no cause and effect (causal) relationship between them. In some cases a change in X does cause a change in Y but it does not happen always. Sometimes the change in Y is not caused by change in X. The dependence of Y on X should not be interpreted as cause and effect relation between X and Y. In regression analysis the word dependence means that there is a distribution of Y values for a given single value of X. For a given height of 60 inches for men, there may be very large number of people with different weights. The distribution of these weights depends upon the fixed value of X. It is in this sense that the word dependence is used. Thus dependence does not mean response (effect) due to some cause. Some examples are discussed here to elaborate the idea.
(i) The sun rises and the shining sun increases the temperature. Let temperature be denoted by X. With increase in X, the ice on the mountains melts and the average thickness of ice Y decreases. It is possible that the thickness of ice decreases due to increase in temperature. But this is also possible that the thickness of ice is decreasing due to weight and hardening of ice. We may be regressing the thickness Y against the temperature X only whereas another important factor is being ignored. In this type of problem, more than one independent variable is used simultaneously to estimate the unknown parameters.
(ii) We may think that increase in the number of workers (X) is increasing the production of fans (Y) in the factory. The increase in Y may be due to change in the administration and some changes about the leave rules and other benefits.
In a regression relation there may or may not be a causal relation between X and Y. The cause and effect relation between two variables is also called causation. It is important to note that the statistical method of regression analysis is silent about the cause and effect relation between the variables. Sometimes it is not possible to identify as to which variable is 'cause' and which one is 'effect'. In fact, the answer is to be searched not in regression analysis but in some other area of relationship between the variables.
14.5 REGRESSION
Regression is concerned with the study of relationships among variables. The aim of regression (or regression analysis) is to make models for prediction and for making other inferences. Two variables or more than two variables may be treated in regression analysis. The word regression was first used by the scientist, Sir Francis Galton, who analyzed the heights of sons and the average heights of their parents. Galton concluded that the sons of very tall (or short) parents were generally taller (or shorter) than the average but not as tall (or short) as their parents. His work was published in 1885 under the title "Regression Towards Mediocrity in Hereditary Stature". According to his conclusion, "regression towards mediocrity" means that the sons' heights tended towards the average rather than take the extreme values. But now the word regression is used in a much broader sense. It is the statistical study of the relationship among variables.

14.5.1 SIMPLE LINEAR REGRESSION
Suppose we want to study the dependence of Y variable on a single independent variable X. The variable Y depends on X and is also subject to unaccountable errors. This study is covered by simple linear regression. For a population data the simple linear regression model is written as $Y_i = \alpha + \beta X_i + \varepsilon_i$ where $\alpha$ is the intercept, $\beta$ is the slope and $\varepsilon_i$ (epsilon) is the error term, and on sample basis the simple linear regression model is written as $Y_i = a + bX_i + e_i$ where 'a' is the intercept in the sample and is the estimate of the population parameter $\alpha$. The parameter $\beta$ is estimated by the sample value 'b' and $e_i$ is the error term in the equation.
14.5.2 PURPOSE OF REGRESSION ANALYSIS
There is no statistical problem when the parameters of the regression model are known. Statistical problem arises when some of the parameters are not known. The study of regression aims at:
(i) The regression models contain the unknown parameters. These parameters are estimated in regression analysis.
(ii) The value of the dependent variable can be predicted when the value of the independent variable is fixed.
(iii) Certain hypotheses about the parameters $\alpha$ and $\beta$ are tested. Confidence intervals for $\alpha$ and $\beta$ are constructed.
14.5.3 SCATTER DIAGRAM
Scatter diagram is a graphic picture of the sample data. Suppose a random sample of n pairs of observations has the values $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots, (X_n, Y_n)$. These points are plotted on a rectangular co-ordinate system taking independent variable on X-axis and the dependent variable on Y-axis. Whatever be the name of the independent variable, it is to be taken on X-axis. Suppose the plotted points are as shown in Figure 14.5 (a). Such a diagram is called scatter diagram. In this figure, we see that when X has a small value, Y is also small and when X takes a large value, Y also takes a large value. This is called direct or positive relationship between X and Y. The plotted points cluster around a straight line. It appears that if a straight line is drawn passing through the points, the line will be a good approximation for representing the original data. Suppose we draw a line AB to represent the scattered points. The line AB rises from left to the right and has positive slope. This line can be used to establish an approximate relation between the random variable Y and the independent variable X. It is a non-mathematical method in the sense that different persons may draw different lines. This line is called the regression line obtained by inspection or judgement.
Figure 14.5: (a) Positive and Linear; (b) Negative and Linear; (c) Negative Non-Linear; (d) No Relationship

Making a scatter diagram and drawing a line or curve is the primary investigation to assess the type of relationship between the variables. The knowledge gained from the scatter diagram can be used for further analysis of the data. In most of the cases the diagrams are not as simple as in Figure 14.5 (a). There are quite complicated diagrams and it is difficult to choose a proper mathematical model for representing the original data. The scatter diagram gives an indication of the appropriate model which should be used for further analysis with the help of method of least squares. Figure 14.5 (b) shows that the points in the scatter diagram are falling from the top left corner to the right. This is a relation called inverse or indirect. The points are in the neighbourhood of a certain line called the regression line.
As long as the scattered points show a closeness to a straight line of some direction, we draw a straight line to represent the sample data. But when the points do not lie around a straight line, we do not draw the regression line. Figure 14.5 (c) shows that the plotted points have a tendency to fall from left to right in the form of a curve. This is a relation called non-linear or curvilinear. This type of relations will not be discussed in this book.
Figure 14.5 (d) shows the points which apparently do not follow any pattern. If X takes a small value, Y may take a small or large value. There seems to be no sympathy between X and Y. Such a diagram suggests that there is no relationship between the two variables. But there is one point to be remembered that figures like 14.5 (d) sometimes have a relationship of circular nature.

14.6 FITTING A LINEAR REGRESSION LINE - THE METHOD OF LEAST SQUARES
The linear regression line is $Y = \alpha + \beta X$ which contains the parameters $\alpha$ and $\beta$. If we know the values of $\alpha$ and $\beta$, then this line is determined. But the values of $\alpha$ and $\beta$ are usually unknown and we have to find the regression line by estimating $\alpha$ and $\beta$ from the sample data. Suppose we have a random sample of n pairs of observations $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots, (X_n, Y_n)$ and we are required to find the regression line of Y on X. These observations are plotted in Figure 14.6. Let us draw a line AB passing through the plotted points. The values of the dependent variable which lie on the line are denoted by $\hat{Y}$. Thus the estimated regression line of sample data is $\hat{Y} = a + bX$ where 'a' and 'b' represent the estimates of the true values of $\alpha$ and $\beta$.

Figure 14.6 (observed points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_i, Y_i), \ldots, (X_n, Y_n)$ scattered around the line AB)
The difference between $Y_i$ and $\hat{Y}_i$ is called the error which is denoted by $e_i$. The sum of squares of errors (SSE) can be written as

$$SSE = (Y_1 - \hat{Y}_1)^2 + (Y_2 - \hat{Y}_2)^2 + \cdots + (Y_i - \hat{Y}_i)^2 + \cdots + (Y_n - \hat{Y}_n)^2$$
$$= e_1^2 + e_2^2 + \cdots + e_i^2 + \cdots + e_n^2 = \sum e_i^2$$

We have to find that line which is best fitting for the sample data. This best fitting line is obtained by using the principle of least squares. The principle of least squares is that "the best fitting line is that one for which the sum of squares of errors is minimum". This means we have to minimize
$$SSE = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2, \quad \text{but } \hat{Y}_i = a + bX_i \text{ and so } SSE = \sum (Y_i - a - bX_i)^2$$
Thus SSE is a function of 'a' and 'b'. Each line has some values of 'a' and 'b'. Those values of 'a' and 'b' are required for which SSE is minimum. The values of 'a' and 'b' are calculated from the following two equations called the normal equations:
$$\sum Y = na + b\sum X \quad \text{and} \quad \sum XY = a\sum X + b\sum X^2$$
Solving these normal equations simultaneously, we get the values of 'a' and 'b' which minimize SSE. We can calculate the values of 'a' and 'b' by using the formulas as below:

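$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$
If the numerator and denominator are multiplied with n, this is convenient for computational purposes. We get
$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} \quad \text{and} \quad a = \frac{(\sum X^2)(\sum Y) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}$$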
9
The slope 'b' is also called the regression coefficient of Y on X and is denoted by $b_{yx}$. Putting the values of 'a' and 'b' in the regression equation $\hat{Y} = a + bX$, we can find the Y values which lie on the regression line. The $\hat{Y}$ values are called the estimated values. The linear regression equation $\hat{Y} = a + bX$ can be used to estimate the values of the dependent variable when the value (or values) of X is known. The calculated values of 'a' and 'b' are the estimates of the unknown parameters $\alpha$ and $\beta$ and are used for inference about $\alpha$ and $\beta$. The inference about $\alpha$ and $\beta$ will not be discussed in this book.
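The least squares formulas can be turned into a small routine. The following Python sketch is an illustration only: the function name `fit_line` is an assumption made here, 'b' is computed from the computational formula, and 'a' is then recovered from the first normal equation. The sample data are the toy costs of Table 14.2, which lie exactly on Y = 20 + 5X.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the line Y-hat = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the computational formula
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept from the first normal equation: sum(Y) = n*a + b*sum(X)
    a = (sum_y - b * sum_x) / n
    return a, b

# Data of Table 14.2 (number of toys, cost of toys)
toys = [0, 1, 2, 3, 4, 5]
cost = [20, 25, 30, 35, 40, 45]

a, b = fit_line(toys, cost)
print(a, b)  # 20.0 5.0
```

Because these points lie exactly on a straight line, the fitted intercept and slope reproduce the deterministic model; with scattered data the same routine returns the best fitting line.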
How to Write Normal Equations
The derivation of the normal equations is beyond the level of this book. However the normal equations can be written directly as explained below:
We write the equation of straight line i.e.,
$$Y = a + bX \qquad (1)$$
We want to write the normal equation of 'a'. The coefficient of 'a' is 1 and the above equation is multiplied with 1 and then summation $\sum$ is applied. Thus we get
$$\sum Y = na + b\sum X \qquad (a + a + \cdots + a = \sum a = na)$$
This is called normal equation for 'a'. To find the normal equation for 'b' the equation (1) is multiplied with X which is the coefficient of b in equation (1) and then summation $\sum$ is applied. We get
$$\sum XY = a\sum X + b\sum X^2$$
This is called the normal equation for 'b'.
Example 14.3.
Construct an equation of the line of regression (using normal equations) of yield of rice on water from the data given in the following table which shows the amount of water applied in inches and the yield of rice in tons per acre in an experimental farm. Estimate the most probable yield of rice for 36 inches of water.

Water (X)            10     16     22     28     34     40     46
Yield of rice (Y)    2.25   2.85   2.95   3.15   3.40   3.80   4.00

Solution:
The regression line of yield of rice (Y) on water (X) is $\hat{Y} = a + bX$
The normal equations are: $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$
The necessary calculations are given below:

X      Y       XY       X^2
10     2.25    22.5     100
16     2.85    45.6     256
22     2.95    64.9     484
28     3.15    88.2     784
34     3.40    115.6    1156
40     3.80    152.0    1600
46     4.00    184.0    2116
Sum X = 196    Sum Y = 22.4    Sum XY = 672.8    Sum X^2 = 6496

Substituting the values in the normal equations, we have
22.4 = 7a + 196b ..... (1)
672.8 = 196a + 6496b ..... (2)
Solving these two equations, we multiply equation (1) by 28 and subtract from equation (2), we get
672.8 = 196a + 6496b
627.2 = 196a + 5488b
45.6 = 1008b or b = 45.6/1008 = 0.05 (approximately)
Substituting b = 0.05 in equation (1), we get
22.4 = 7a + 196(0.05) or 22.4 = 7a + 9.8 or 7a = 22.4 - 9.8
or 7a = 12.6 or a = 12.6/7 = 1.8
Hence the regression line of Y on X is $\hat{Y} = 1.8 + 0.05X$
To estimate the most probable yield of rice, we put X = 36 in the above equation, we get
$\hat{Y}$ = 1.8 + 0.05(36) = 1.8 + 1.8 = 3.6
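The arithmetic of Example 14.3 can be repeated by machine. The sketch below is a hedged illustration in Python, with variable names chosen here; it eliminates 'a' from the two normal equations and back-substitutes without rounding. Carrying more decimal places gives b of about 0.0452 and a of about 1.933, so the estimate at X = 36 comes out near 3.56; the figures 0.05, 1.8 and 3.6 in the worked solution result from rounding b to two decimal places before solving for a.

```python
# Data of Example 14.3: water applied (inches) and yield of rice (tons per acre)
water = [10, 16, 22, 28, 34, 40, 46]
rice = [2.25, 2.85, 2.95, 3.15, 3.40, 3.80, 4.00]

n = len(water)
sum_x = sum(water)
sum_y = sum(rice)
sum_xy = sum(x * y for x, y in zip(water, rice))
sum_x2 = sum(x * x for x in water)

# Eliminate 'a' from the two normal equations, then back-substitute.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

estimate_36 = a + b * 36
print(round(b, 4), round(a, 4), round(estimate_36, 2))  # 0.0452 1.9333 3.56
```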
Example 14.4.
Show that the sum of errors and sum of squares of errors are zero.

X    1    2    3    4    5
Y    0    1    2    3    4

Solution:
The equation of a least square line is $\hat{Y} = a + bX$
The normal equations are
$\sum Y = na + b\sum X$ and
$\sum XY = a\sum X + b\sum X^2$
The necessary calculations are given below:

X     Y     XY     X^2    Y-hat = X - 1    (Y - Y-hat)    (Y - Y-hat)^2
1     0     0      1      0                0              0
2     1     2      4      1                0              0
3     2     6      9      2                0              0
4     3     12     16     3                0              0
5     4     20     25     4                0              0
Sum X = 15   Sum Y = 10   Sum XY = 40   Sum X^2 = 55   Sum Y-hat = 10   Sum (Y - Y-hat) = 0   Sum (Y - Y-hat)^2 = 0

Substituting the values in the normal equations, we have
10 = 5a + 15b ..... (1)    40 = 15a + 55b ..... (2)
Solving these two equations, we multiply equation (1) by 3 and subtract from equation (2), we get
40 = 15a + 55b
30 = 15a + 45b
10 = 10b or b = 10/10 = 1
Substituting b = 1 in equation (1), we get
10 = 5a + 15(1) or 5a = 10 - 15 = -5 or a = -5/5 = -1
Hence the fitted least square line is $\hat{Y} = X - 1$
With this line every estimated value $\hat{Y}$ equals the observed Y, so the sum of errors and the sum of squares of errors are both zero, as the last two columns of the table show.
Another Form of Regression Equation
We know that the linear regression equation is
$$\hat{Y} = a + bX \qquad (1)$$
Normal equation for 'a' is
$$\sum Y = na + b\sum X$$
Dividing both sides by n, we get
$$\frac{\sum Y}{n} = \frac{na}{n} + \frac{b\sum X}{n} \quad \text{or} \quad \bar{Y} = a + b\bar{X} \qquad (2)$$
It means that the regression line passes through the point $(\bar{X}, \bar{Y})$.
Subtracting equation (2) from equation (1), we get
$$\hat{Y} - \bar{Y} = b(X - \bar{X})$$
which is another way of writing the regression equation of Y on X. From equation (2) we can write
$$a = \bar{Y} - b\bar{X}$$
The regression coefficient b may be written as $b_{yx}$, which means that the coefficient belongs to an equation in which X is the independent variable and Y is the dependent variable.
Calculation of Sum of Squares of Errors
The sum of squares of errors defined by $SSE = \sum (Y - \hat{Y})^2$ can also be calculated as below:
$$SSE = \sum Y^2 - a\sum Y - b\sum XY$$
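The agreement between the definition of SSE and the shortcut formula can be checked numerically. The Python sketch below is an illustration under stated assumptions: it reuses the data of Example 14.3 and computes 'a' and 'b' without rounding, because the shortcut holds exactly only for the unrounded least squares estimates. With the rounded values a = 1.8 and b = 0.05 the shortcut would give 73.81 - 40.32 - 33.64 = -0.15, a negative number that cannot be a sum of squares.

```python
# Data of Example 14.3
xs = [10, 16, 22, 28, 34, 40, 46]
ys = [2.25, 2.85, 2.95, 3.15, 3.40, 3.80, 4.00]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Unrounded least squares estimates
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# SSE from the definition and from the shortcut formula
sse_direct = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sse_shortcut = sum_y2 - a * sum_y - b * sum_xy

print(round(sse_direct, 4), round(sse_shortcut, 4))  # 0.0671 0.0671
```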
14.6.1 PROPERTIES OF THE REGRESSION LINE
The regression line $\hat{Y} = a + bX$ has the following properties:
(i) We know that $\bar{Y} = a + b\bar{X}$. This shows that the line passes through the means $\bar{X}$ and $\bar{Y}$.
(ii) The sum of errors is equal to zero. The regression equation is $\hat{Y} = a + bX$ and the sum of deviations of observed Y from estimated $\hat{Y}$ is
$$\sum (Y - \hat{Y}) = \sum (Y - a - bX) = \sum Y - na - b\sum X = 0 \qquad [\text{since } \sum Y = na + b\sum X]$$
When $\sum (Y - \hat{Y}) = 0$, it means that $\sum Y = \sum \hat{Y}$.
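Both properties can be verified on any data set. The following Python sketch uses five hypothetical pairs of values invented here purely for illustration (they do not come from the text); it fits the line and then checks that the line passes through the point of means and that the errors add up to zero, allowing for floating point rounding.

```python
# Hypothetical data, chosen only to illustrate the two properties
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

mean_x = sum_x / n
mean_y = sum_y / n
errors = [y - (a + b * x) for x, y in zip(xs, ys)]

# Property (i): the fitted line passes through (mean of X, mean of Y)
print(abs((a + b * mean_x) - mean_y) < 1e-12)  # True
# Property (ii): the errors add up to zero
print(abs(sum(errors)) < 1e-12)                # True
```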
14.6.2 REGRESSION EQUATION OF X ON Y
There are some special cases in which X and Y can be assumed independent variable turn by turn. This is possible when both X and Y are random variables and to estimate some Y-value, X is assumed as independent and to estimate some X-value, Y is taken as independent. When X and Y can be interchanged then the regression coefficient of Y on X is denoted by $b_{yx}$ and the intercept of Y on X is denoted by $a_{yx}$. When Y is the independent variable, the regression equation of X on Y can be written as:
$$\hat{X} = a_{xy} + b_{xy}Y$$
where $b_{xy}$ is the regression coefficient of X on Y. The normal equations for the regression of X on Y are
$$\sum X = na_{xy} + b_{xy}\sum Y \quad \text{and} \quad \sum XY = a_{xy}\sum Y + b_{xy}\sum Y^2$$
The regression coefficient $b_{xy}$ and the intercept $a_{xy}$ can be directly calculated by the relations
$$b_{xy} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2} \quad \text{and} \quad a_{xy} = \bar{X} - b_{xy}\bar{Y}$$
The regression equation of X on Y can be written as $\hat{X} - \bar{X} = b_{xy}(Y - \bar{Y})$
Also $\sum (X - \hat{X}) = 0$, $\sum X = \sum \hat{X}$ and $\bar{X} = a_{xy} + b_{xy}\bar{Y}$ or $a_{xy} = \bar{X} - b_{xy}\bar{Y}$
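Various Formulas for the Calculation of Regression Coefficients
Different forms of the formulas for regression coefficient of Y on X are:
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$
$$= \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{S_{xy}}{S_x^2}$$
where $S_{xy} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{n}$ and $S_x^2 = \dfrac{\sum (X - \bar{X})^2}{n}$.
When the calculations are to be reduced by change of origin, then we use $D_x$ and $D_y$ where $D_x = X - A$ and $D_y = Y - B$; A and B are some constants. $b_{yx}$ can be calculated as below:
$$b_{yx} = \frac{\sum D_xD_y - \dfrac{(\sum D_x)(\sum D_y)}{n}}{\sum D_x^2 - \dfrac{(\sum D_x)^2}{n}} = \frac{n\sum D_xD_y - (\sum D_x)(\sum D_y)}{n\sum D_x^2 - (\sum D_x)^2}$$
When change of origin and scale is used, then $U = \dfrac{X - A}{h}$ and $V = \dfrac{Y - B}{k}$ and
$$b_{yx} = \frac{\sum UV - \dfrac{(\sum U)(\sum V)}{n}}{\sum U^2 - \dfrac{(\sum U)^2}{n}} \times \frac{k}{h} = \frac{n\sum UV - (\sum U)(\sum V)}{n\sum U^2 - (\sum U)^2} \times \frac{k}{h}$$
The regression equation of Y on X can be written as:
$$\hat{Y} - \bar{Y} = \frac{S_{xy}}{S_x^2}(X - \bar{X})$$
Similarly, the formulas for the regression coefficient of X on Y are
$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum Y^2 - \dfrac{(\sum Y)^2}{n}}$$
$$= \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2} = \frac{S_{xy}}{S_y^2}$$
where $S_y^2 = \dfrac{\sum (Y - \bar{Y})^2}{n}$.
When the idea of change of origin is applied, then
$$b_{xy} = \frac{\sum D_xD_y - \dfrac{(\sum D_x)(\sum D_y)}{n}}{\sum D_y^2 - \dfrac{(\sum D_y)^2}{n}} = \frac{n\sum D_xD_y - (\sum D_x)(\sum D_y)}{n\sum D_y^2 - (\sum D_y)^2}$$
When change of origin and scale is used, then
$$b_{xy} = \frac{\sum UV - \dfrac{(\sum U)(\sum V)}{n}}{\sum V^2 - \dfrac{(\sum V)^2}{n}} \times \frac{h}{k} = \frac{n\sum UV - (\sum U)(\sum V)}{n\sum V^2 - (\sum V)^2} \times \frac{h}{k}$$
The regression equation of X on Y can be written as:
$$\hat{X} - \bar{X} = \frac{S_{xy}}{S_y^2}(Y - \bar{Y})$$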
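The effect of a change of origin and scale on the regression coefficient can be illustrated numerically. The Python sketch below is a hedged example: the helper name `slope`, the constants A = 28, B = 3.0 and the scale factors h = 6, k = 0.05 are assumptions chosen here for the data of Example 14.3, not values from the text. It shows that a change of origin alone leaves the coefficient of Y on X unchanged, while a change of scale requires multiplying the coded coefficient by k/h.

```python
def slope(us, vs):
    """Regression coefficient of the second variable on the first."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    su2 = sum(u * u for u in us)
    return (n * suv - su * sv) / (n * su2 - su ** 2)

# Data of Example 14.3
xs = [10, 16, 22, 28, 34, 40, 46]
ys = [2.25, 2.85, 2.95, 3.15, 3.40, 3.80, 4.00]

A, h = 28, 6        # assumed origin and scale for X
B, k = 3.0, 0.05    # assumed origin and scale for Y

dx = [x - A for x in xs]          # change of origin only
dy = [y - B for y in ys]
us = [(x - A) / h for x in xs]    # change of origin and scale
vs = [(y - B) / k for y in ys]

b_direct = slope(xs, ys)
b_origin = slope(dx, dy)
b_coded = slope(us, vs) * k / h

print(abs(b_direct - b_origin) < 1e-9)  # True
print(abs(b_direct - b_coded) < 1e-9)   # True
```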
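Example 14.5.
Compute the regression lines of X on Y and Y on X on the basis of the following information: $\sum X = 50$, $\sum Y = 60$, $\sum XY = 350$, $\bar{X} = 5$, $\bar{Y} = 6$, standard deviation of X = 2, standard deviation of Y = 3.
Solution: The necessary calculations are given below:
$$\bar{X} = \frac{\sum X}{n} \quad \text{or} \quad n = \frac{\sum X}{\bar{X}} = \frac{50}{5} = 10$$
$$S_{xy} = \frac{\sum XY}{n} - \bar{X}\bar{Y} = \frac{350}{10} - (5)(6) = 35 - 30 = 5$$
The regression coefficient of X on Y is
$$b_{xy} = \frac{S_{xy}}{S_y^2} = \frac{5}{(3)^2} = \frac{5}{9}$$
The regression coefficient of Y on X is
$$b_{yx} = \frac{S_{xy}}{S_x^2} = \frac{5}{(2)^2} = \frac{5}{4}$$
The regression line of X on Y is
$$\hat{X} - \bar{X} = b_{xy}(Y - \bar{Y})$$
$$\hat{X} - 5 = \frac{5}{9}(Y - 6) = \frac{5}{9}Y - \frac{30}{9}$$
$$\hat{X} = \frac{5}{9}Y + 5 - \frac{30}{9} = \frac{5}{9}Y + \frac{15}{9}$$
$$\hat{X} = 0.56Y + 1.67$$
The regression line of Y on X is
$$\hat{Y} - \bar{Y} = b_{yx}(X - \bar{X})$$
$$\hat{Y} - 6 = \frac{5}{4}(X - 5) = \frac{5}{4}X - \frac{25}{4}$$
$$\hat{Y} = \frac{5}{4}X + 6 - \frac{25}{4} = \frac{5}{4}X - \frac{1}{4}$$
$$\hat{Y} = 1.25X - 0.25$$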

Example 14.6.
For 8 observations on deposits (X) and loans (Y) the following data were obtained:
$\sum (X - 47) = 16$, $\sum (Y - 35) = 8$, $\sum (X - 47)^2 = 74$, $\sum (Y - 35)^2 = 68$, $\sum (X - 47)(Y - 35) = 0$, $\bar{X} = 49$, $\bar{Y} = 36$
(a) Compute the regression equation of X on Y and estimate most likely value of X when Y = 36.
(b) Compute the regression equation of Y on X and estimate most likely value of Y when X = 45.
Solution:
Here $\sum D_x = 16$, $\sum D_y = 8$, $\sum D_x^2 = 74$, $\sum D_y^2 = 68$, $\sum D_xD_y = 0$, $\bar{X} = 49$ and $\bar{Y} = 36$.
The regression coefficient of X on Y is
$$b_{xy} = \frac{\sum D_xD_y - \dfrac{(\sum D_x)(\sum D_y)}{n}}{\sum D_y^2 - \dfrac{(\sum D_y)^2}{n}} = \frac{0 - \dfrac{(16)(8)}{8}}{68 - \dfrac{(8)^2}{8}} = \frac{-16}{60} = -0.27$$
The regression coefficient of Y on X is
$$b_{yx} = \frac{\sum D_xD_y - \dfrac{(\sum D_x)(\sum D_y)}{n}}{\sum D_x^2 - \dfrac{(\sum D_x)^2}{n}} = \frac{0 - \dfrac{(16)(8)}{8}}{74 - \dfrac{(16)^2}{8}} = \frac{-16}{42} = -0.38$$
(a) The regression equation of X on Y is
$\hat{X} - \bar{X} = b_{xy}(Y - \bar{Y})$
$\hat{X} - 49 = -0.27(Y - 36)$
$\hat{X} - 49 = -0.27Y + 9.72$
$\hat{X} = -0.27Y + 9.72 + 49 = -0.27Y + 58.72$
Put Y = 36, we get
$\hat{X} = -0.27(36) + 58.72 = 49$
(b) The regression equation of Y on X is
$\hat{Y} - \bar{Y} = b_{yx}(X - \bar{X})$
$\hat{Y} - 36 = -0.38(X - 49)$
$\hat{Y} - 36 = -0.38X + 18.62$
$\hat{Y} = -0.38X + 18.62 + 36 = -0.38X + 54.62$
Put X = 45, we get
$\hat{Y} = -0.38(45) + 54.62 = 37.52$
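The calculations of Example 14.6 can be reproduced directly from the sums of deviations. The Python sketch below is an illustration with variable names chosen here; it takes the sum of squared deviations of Y as 68, the value used in the solution, and works with unrounded coefficients, which gives the same two-decimal results of 49 for X at Y = 36 and 37.52 for Y at X = 45.

```python
# Summary figures of Example 14.6 (deviations taken from A = 47 and B = 35)
n = 8
sum_dx, sum_dy = 16, 8
sum_dx2, sum_dy2 = 74, 68
sum_dxdy = 0

mean_x = 47 + sum_dx / n    # 49.0
mean_y = 35 + sum_dy / n    # 36.0

numerator = sum_dxdy - sum_dx * sum_dy / n          # -16.0
b_xy = numerator / (sum_dy2 - sum_dy ** 2 / n)      # X on Y: -16/60
b_yx = numerator / (sum_dx2 - sum_dx ** 2 / n)      # Y on X: -16/42

x_hat = mean_x + b_xy * (36 - mean_y)   # estimate of X when Y = 36
y_hat = mean_y + b_yx * (45 - mean_x)   # estimate of Y when X = 45

print(round(b_xy, 2), round(b_yx, 2))   # -0.27 -0.38
print(x_hat, round(y_hat, 2))           # 49.0 37.52
```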
Example 14.7.
Estimate the regression equation $\hat{Y} = a + bX$ for the following information on lot size (X) and number of man-hours of labor (Y) for ten recent production runs performed under similar production conditions:
$\sum X = 500$, $\sum Y = 1100$, $\sum XY = 61800$, $\sum X^2 = 28400$, $\sum Y^2 = 134660$.
Solution:
$$\bar{X} = \frac{\sum X}{n} = \frac{500}{10} = 50 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{n} = \frac{1100}{10} = 110$$
The least squares estimates a and b are calculated by using the formulas as
$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{10(61800) - (500)(1100)}{10(28400) - (500)^2} = \frac{68000}{34000} = 2$$
$$a = \bar{Y} - b\bar{X} = 110 - 2(50) = 110 - 100 = 10$$
Hence the estimated regression equation is $\hat{Y} = 10 + 2X$.
When X = 50, then $\hat{Y}$ = 10 + 2(50) = 10 + 100 = 110

Fitting a straight line to a set of data yields the following regression equation:
t
= Z+SX
(a) Interpret the meaning of the y intercept ,a'.
ft) Interpret the meaning of the slope 'b'.
(c) Predict the average value ofy for X = B.
(d) If the values of X range from 2 tn 25, should you use this model to predict
the average value ofy when X equals:
(0 sr
(ii) -gr (iii) 0? (iv) 24? (v) 26 ?
184
Basic Statistics Part'II !
solution: of !- is 2'
1,

(a) The Y intercept a = 2 means that when X = 0' the &-t'{-li::ige 'atrus
uniL cf x, the value of Y is
(b) The slope.b = 5 means that for each increase of one
) expected to increase on average by 5 units'
(c) ' When X = 3, th"nt = 2 + 5(3) = 17
(0 Yes (ii) No (iii) No (iv) Yes a."
"
m
(d)
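The answers to part (d) of Example 14.8 amount to a range check before prediction. The Python sketch below is only an illustration; the function name `predict` and the choice of returning `None` outside the observed range are assumptions made here, not part of the text.

```python
def predict(x, a=2.0, b=5.0, x_min=2.0, x_max=25.0):
    """Return Y-hat = a + bX, or None when X lies outside the observed range."""
    if not (x_min <= x <= x_max):
        return None   # prediction not justified outside the range of the data
    return a + b * x

for x in (3, -3, 0, 24, 26):
    print(x, predict(x))
# 3 17.0
# -3 None
# 0 None
# 24 122.0
# 26 None
```

Values of X inside the range 2 to 25 give a prediction, while -3, 0 and 26 are rejected because the fitted line has not been observed there.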
14.7 REGRESSION AND CORRELATION
Regression and correlation are two different techniques which are used for the analysis of the bivariate data. Both these techniques sometimes use the same sample data and there are certain calculations which are common in both these techniques. None of the two can take the place of the other. Both have their own area of application. In a certain situation, it may be that only regression is applicable and in some other situation, it may be that only correlation is applicable.
14.8 CORRELATION
Correlation is a technique which measures the strength or degree of relationship between two variables. Both the variables X and Y may be random or it may be that one variable is independent (non-random) and the other to be correlated is dependent. When the changes in one variable appear to be linked with the changes in the other variable, the two variables are said to be correlated. When the two variables are meaningfully related and both increase or both decrease simultaneously, then the correlation is termed as positive. If increase in any one variable is associated with decrease in the other variable, the correlation is termed as negative or inverse. Suppose marks in Mathematics are denoted by X and marks in Statistics are denoted by Y. If small values of X appear with small values of Y and large values of X come with large values of Y, then correlation is said to be positive. If X stands for marks in English and Y stands for marks in Mathematics, it is possible that small values of X appear with large values of Y. It is a case of negative correlation.
14.8.1 MEASUREMENT OF CORRELATION
The degree or level of correlation is measured with the help of correlation coefficient or coefficient of correlation. For population data, the correlation coefficient is denoted by $\rho$. The joint variation of X and Y is measured by the covariance of X and Y. The covariance of X and Y denoted by Cov(X, Y) is defined as:
$$\text{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$$
The Cov(X, Y) may be positive, negative or zero. The covariance is expressed in the product of the units in which X and Y are measured. When Cov(X, Y) is divided by $\sigma_x$ and $\sigma_y$, we get the correlation coefficient $\rho$. Thus
$$\rho = \frac{\text{Cov}(X, Y)}{\sigma_x \sigma_y}$$
It is a pure number and lies between -1 and +1. If $\rho = +1$, it is called perfect positive correlation. If $\rho = -1$, it is called perfect negative correlation. If X and Y are independent, there is no correlation between them and $\rho = 0$. For sample data the correlation coefficient denoted by 'r' is a measure of strength of the linear relation between X and Y variables, where 'r' is a pure number and lies between -1 and +1.
14.8.2 PERFECT POSITIVE CORRELATION
Consider the data in Table 14.5 on X and Y, where X is the number of litres of oil and Y is the distance travelled by a vehicle in kilometers.

Table 14.5
X     1     2     3     4     5
Y    20    40    60    80   100

Panel (a) of Figure 14.7 illustrates a perfect positive correlation between X and Y. Here Y increases by a fixed distance of 20 kilometers when X increases by one litre. Here X and Y are not defined as random variables.
14.8.3 PERFECT NEGATIVE CORRELATION
Consider the data in Table 14.6 on X and Y, where X is the number of study hours and Y is the number of sleeping hours of different students.

Table 14.6
X     2     4     6     8    10
Y    10     9     8     7     6

Panel (b) of Figure 14.7 depicts a perfect negative correlation between X and Y. There is a perfect negative relationship between X and Y, so that Y decreases by a fixed quantity of one hour as X increases by 2 hours. There is a definite, predictable decrease in Y as X increases by 2 units of time.
9
14.8.4 NO CORRELATION
Let us consider the data in Table 14.7 on X and Y, where X is the per capita income in ... of Rs. and Y is the crude death rate per 1000 of population in a country.

Table 14.7
X     1     2     3     4     5
Y    15    15    15    15    15

Panel (c) of Figure 14.7 shows no correlation between X and Y. The Y-variable is not showing any change as X changes.

[Figure 14.7: Panel (a) Perfect Positive Correlation; Panel (b) Perfect Negative Correlation; Panel (c) No Correlation]
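The three tables above can be checked numerically. The sketch below is an illustration only and is not part of the original text: the function name `pearson_r` is hypothetical, and it applies the raw-sums form of the product moment formula for sample data. The Y values for Table 14.6 are taken as 10, 9, 8, 7, 6, which is an assumption based on the fixed decrease of one hour described in the text. One point the sketch makes visible is that when Y is constant, as in Table 14.7, the denominator of the formula is zero, so r is strictly undefined; the function returns None in that case, which is in line with the text's reading that such data show no correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation from raw sums; returns None if X or Y is constant."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        return None  # zero variance in X or Y: r is undefined
    return numerator / denominator


# Table 14.5: litres of oil and distance travelled
r_positive = pearson_r([1, 2, 3, 4, 5], [20, 40, 60, 80, 100])
# Table 14.6: study hours and sleeping hours (Y values assumed as stated above)
r_negative = pearson_r([2, 4, 6, 8, 10], [10, 9, 8, 7, 6])
# Table 14.7: per capita income and a constant crude death rate
r_constant = pearson_r([1, 2, 3, 4, 5], [15, 15, 15, 15, 15])

print(r_positive, r_negative, r_constant)
```

Returning None for the constant case is a design choice of this sketch; raising an error would be an equally reasonable alternative.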
14.8.5 SCATTER DIAGRAMS
The diagrams in Figure 14.7 are of a theoretical nature, in which the plotted points lie exactly on straight lines. In practical life, the relation between the random variables X and Y is mostly not of the mathematical type shown in Figure 14.7. The data are in the form of pairs and are plotted in the form of scattered points, as shown in Figure 14.8.

[Figure 14.8: scatter diagrams, Panels (a) to (f)]

Panel (a) of Figure 14.8 shows that an increase in X is associated with an increase in Y. The scattered points are close to a straight line. The figure shows that there is positive correlation between X and Y. Panel (b) of Figure 14.8 shows downward movement in Y when X increases. This shows negative correlation between X and Y. Panels (c) and (d) also indicate positive and negative correlation, but the points are scattered away from some central line. Thus the correlation coefficient will have a small value. Panels (e) and (f) indicate that there is no relation between X and Y.
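14.9 CORRELATION COEFFICIENT FOR SAMPLE DATA
The correlation coefficient calculated for X and Y in sample data is denoted by $r_{xy}$. It can be calculated by using any one of the following formulas:

(i)
\[ r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_x S_y} \]
where $S_x$ and $S_y$ are the standard deviations of X and Y respectively and are given by
\[ S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \quad \text{and} \quad S_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}} \]
This formula is called Karl Pearson's product moment formula.

(ii)
\[ r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} \]

(iii)
\[ r_{xy} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} \]

(iv)
\[ r_{xy} = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n \sum X^2 - (\sum X)^2\right]\left[n \sum Y^2 - (\sum Y)^2\right]}} \]

(v)
\[ r_{xy} = \frac{\sum XY - n \bar{X} \bar{Y}}{\sqrt{\left[\sum X^2 - n \bar{X}^2\right]\left[\sum Y^2 - n \bar{Y}^2\right]}} \]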
Example 14.9.
A retail outlet for air conditioners believes that its weekly sales are dependent upon the average temperature during the week. It picks at random 6 weeks and finds that its sales are related to the average temperature in these weeks as follows:

Mean temperature (°F)              72    77    82    43    31    55
Sales (No. of air conditioners)     4     5     6     1     0     2

Calculate the correlation coefficient between the mean temperature and the retail outlet's sales.

Solution:
The necessary calculations are given below:

   X     Y    (X − X̄)   (Y − Ȳ)   (X − X̄)(Y − Ȳ)   (X − X̄)²   (Y − Ȳ)²
  72     4      +12        +1            12             144          1
  77     5      +17        +2            34             289          4
  82     6      +22        +3            66             484          9
  43     1      −17        −2            34             289          4
  31     0      −29        −3            87             841          9
  55     2       −5        −1             5              25          1
 ΣX     ΣY    Σ(X − X̄)  Σ(Y − Ȳ)  Σ(X − X̄)(Y − Ȳ)  Σ(X − X̄)²  Σ(Y − Ȳ)²
= 360   = 18     = 0        = 0          = 238          = 2072       = 28

\[ \bar{X} = \frac{\sum X}{n} = \frac{360}{6} = 60, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{18}{6} = 3 \]
\[ S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{2072}{6}} = 18.5831, \qquad S_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}} = \sqrt{\frac{28}{6}} = 2.1602 \]
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_x S_y} = \frac{238}{6(18.5831)(2.1602)} = \frac{238}{240.8595} = 0.99 \]
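The arithmetic of Example 14.9 can be reproduced in a few lines. The following is a minimal sketch, not the authors' own procedure: the function names are hypothetical, and it compares the deviation form (Karl Pearson's product moment formula) with the raw-sums form of the same coefficient to show that both give the same value.

```python
from math import sqrt


def r_from_deviations(x, y):
    """Product moment formula: sum of cross deviations over n * Sx * Sy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cross / (n * s_x * s_y)


def r_from_raw_sums(x, y):
    """Raw-sums form: [n*SumXY - SumX*SumY] over the root of the two bracketed terms."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


temperature = [72, 77, 82, 43, 31, 55]   # mean weekly temperature
sales = [4, 5, 6, 1, 0, 2]               # air conditioners sold

r_dev = r_from_deviations(temperature, sales)
r_raw = r_from_raw_sums(temperature, sales)
print(round(r_dev, 2), round(r_raw, 2))
```

The raw-sums form avoids computing the means first, which is why it is often preferred for hand calculation when the means are not whole numbers.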
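Example 14.10.
Compute the coefficient of correlation between the number of study hours (X) and the number of sleeping hours (Y) of different students from the following data:

Number of study hours (X)         2     4     6     8    10
Number of sleeping hours (Y)     10     9     8     7     6

Solution:
The necessary calculations are given below:

   X     Y    (X − X̄)   (Y − Ȳ)   (X − X̄)(Y − Ȳ)   (X − X̄)²   (Y − Ȳ)²
   2    10      −4         +2            −8              16          4
   4     9      −2         +1            −2               4          1
   6     8       0          0             0               0          0
   8     7      +2         −1            −2               4          1
  10     6      +4         −2            −8              16          4
 ΣX = 30  ΣY = 40               Σ(X − X̄)(Y − Ȳ) = −20   Σ(X − X̄)² = 40   Σ(Y − Ȳ)² = 10

\[ \bar{X} = \frac{\sum X}{n} = \frac{30}{5} = 6 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{n} = \frac{40}{5} = 8 \]
\[ r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{-20}{\sqrt{(40)(10)}} = \frac{-20}{20} = -1 \]
There is perfect negative correlation between the number of study hours and the number of sleeping hours.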
Example 14.11.
Compute and interpret the coefficient of correlation between the values of X and Y from the following table.

X     1     2     3     4     5
Y    20    40    60    80   100

Solution:
The necessary calculations are given below:

   X      Y      XY     X²      Y²
   1     20      20      1      400
   2     40      80      4     1600
   3     60     180      9     3600
   4     80     320     16     6400
   5    100     500     25    10000
ΣX = 15  ΣY = 300  ΣXY = 1100  ΣX² = 55  ΣY² = 22000

\[ r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n \sum X^2 - (\sum X)^2\right]\left[n \sum Y^2 - (\sum Y)^2\right]}} = \frac{5(1100) - (15)(300)}{\sqrt{\left[5(55) - (15)^2\right]\left[5(22000) - (300)^2\right]}} \]
\[ = \frac{5500 - 4500}{\sqrt{(50)(20000)}} = \frac{1000}{1000} = 1 \]
r = 1 means perfect positive correlation between X and Y.
Example 14.12.
The following are 5 pairs of values of two variables X and Y. Compute and interpret the coefficient of correlation between X and Y.

X    11    12    13    14    15
Y    15    14    13    12    16

Solution:
The necessary calculations are given below:

   X     Y     XY     X²     Y²
  11    15    165    121    225
  12    14    168    144    196
  13    13    169    169    169
  14    12    168    196    144
  15    16    240    225    256
ΣX = 65  ΣY = 70  ΣXY = 910  ΣX² = 855  ΣY² = 990

\[ \bar{X} = \frac{\sum X}{n} = \frac{65}{5} = 13 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{n} = \frac{70}{5} = 14 \]
\[ r = \frac{\sum XY - n \bar{X} \bar{Y}}{\sqrt{\left[\sum X^2 - n \bar{X}^2\right]\left[\sum Y^2 - n \bar{Y}^2\right]}} = \frac{910 - 5(13)(14)}{\sqrt{\left[855 - 5(13)^2\right]\left[990 - 5(14)^2\right]}} = \frac{910 - 910}{\sqrt{(10)(10)}} = 0 \]
So X and Y are uncorrelated.
Example 14.13.
From the following data, compute the coefficient of correlation between X and Y:

                                          X series    Y series
Number of items                              15          15
Arithmetic mean                              25          18
Sum of squares of deviations
from arithmetic mean                        136         138

Summation of products of deviations of X and Y series from their arithmetic means = 122.

Solution:
Here n = 15, X̄ = 25, Ȳ = 18, Σ(X − X̄)² = 136, Σ(Y − Ȳ)² = 138, Σ(X − X̄)(Y − Ȳ) = 122, and hence
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{122}{\sqrt{(136)(138)}} = \frac{122}{136.9964} = 0.89 \]
Example 14.14.
In order to find the correlation coefficient between two variables X and Y from 12 pairs of observations, the following results are given:
ΣX = 30, ΣY = 5, ΣX² = 670, ΣY² = 285, ΣXY = 344.
Later it was found that one particular set of observations, namely X = 12, Y = 6, was wrongly taken, the correct values being X = 21, Y = 16. Compute the correct value of the correlation coefficient r.

Solution:
The necessary calculations are given below:
Correct ΣX  = 30 − 12 + 21 = 39
Correct ΣX² = 670 − (12)² + (21)² = 967
Correct ΣY  = 5 − 6 + 16 = 15
Correct ΣY² = 285 − (6)² + (16)² = 505
Correct ΣXY = 344 − 12(6) + 21(16) = 608
Thus,
\[ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} = \frac{608 - \dfrac{(39)(15)}{12}}{\sqrt{\left[967 - \dfrac{(39)^2}{12}\right]\left[505 - \dfrac{(15)^2}{12}\right]}} \]
\[ = \frac{608 - 48.75}{\sqrt{(840.25)(486.25)}} = \frac{559.25}{639.1960} = 0.875 \]
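The correction step in Example 14.14 follows a simple pattern: remove the contribution of the wrong pair from every sum and add the contribution of the right pair. The sketch below illustrates this under the figures of the example; the dictionary keys and the helper name `correct_sums` are hypothetical choices made for this illustration.

```python
from math import sqrt


def correct_sums(sums, wrong, right):
    """Replace one wrongly recorded (x, y) pair in a set of raw sums."""
    (wx, wy), (rx, ry) = wrong, right
    return {
        "x": sums["x"] - wx + rx,
        "y": sums["y"] - wy + ry,
        "x2": sums["x2"] - wx ** 2 + rx ** 2,
        "y2": sums["y2"] - wy ** 2 + ry ** 2,
        "xy": sums["xy"] - wx * wy + rx * ry,
    }


n = 12
reported = {"x": 30, "y": 5, "x2": 670, "y2": 285, "xy": 344}
fixed = correct_sums(reported, wrong=(12, 6), right=(21, 16))

# Formula (iii): sums divided by n inside each bracket
numerator = fixed["xy"] - fixed["x"] * fixed["y"] / n
denominator = sqrt(
    (fixed["x2"] - fixed["x"] ** 2 / n) * (fixed["y2"] - fixed["y"] ** 2 / n)
)
r_corrected = numerator / denominator
print(fixed, round(r_corrected, 3))
```

The number of pairs n stays at 12 because one observation is replaced, not added or removed.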
14.9.1 CAUSATION IN CORRELATION
The use of the term 'causation' in correlation is not appropriate and should be avoided. In correlation analysis, there is no such thing as a cause and effect relation between X and Y. When both are random variables, no variable is under the control of the experimenter. Both variables change simultaneously. The forces of change are acting on both the variables at the same time. The marks of students in English and Urdu are interdependent and cannot be classified as 'cause' and 'effect'.
14.9.2 SPURIOUS CORRELATION
The numerical value of 'r' is to be interpreted carefully. Somebody may calculate 'r' between two variables which are not meaningfully related to each other. There is no sense in calculating the correlation coefficient between the number of telephone connections over a period of time and the number of accidents on the roads. The number of telephone connections can be correlated with the per capita income of the people, and the number of accidents may be correlated with variables like population, number of vehicles on the roads, the speed of vehicles etc. Any value of r between unrelated variables is called spurious correlation or non-sense correlation. The observed value of r should not be used blindly. We must examine whether or not there exists any mutual relationship between the variables.
14.9.3 CHANGE OF ORIGIN
The correlation coefficient $r_{xy}$ is not affected by change of origin. If a certain constant is added to the variable or subtracted from the variable, the correlation coefficient of the resulting variables is the same as that of X and Y. Let $D_x = X - A$, where A is a constant, and $D_y = Y - B$, where B is also a constant. It can be proved that the correlation coefficient $r_{xy}$ is equal to the correlation coefficient between $D_x$ and $D_y$, which may be denoted by $r_{D_x D_y}$. Thus
\[ r_{xy} = r_{D_x D_y} = \frac{\sum D_x D_y - \dfrac{(\sum D_x)(\sum D_y)}{n}}{\sqrt{\left[\sum D_x^2 - \dfrac{(\sum D_x)^2}{n}\right]\left[\sum D_y^2 - \dfrac{(\sum D_y)^2}{n}\right]}} \]
If both numerator and denominator are multiplied by n, we get
\[ r_{xy} = r_{D_x D_y} = \frac{n \sum D_x D_y - (\sum D_x)(\sum D_y)}{\sqrt{\left[n \sum D_x^2 - (\sum D_x)^2\right]\left[n \sum D_y^2 - (\sum D_y)^2\right]}} \]
Example 14.15.
The following figures show the imports and exports of a commodity, in lakhs of rupees, during the last five years:

Years      1996   1997   1998   1999   2000
Imports     82     78     75     80     95
Exports     70     74     78     75     80

Taking X = 80 and Y = 70 as origins, compute the coefficient of correlation between imports and exports.

Solution:
The necessary calculations are given below:

   X      Y    Dx = X − 80   Dy = Y − 70    DxDy     Dx²     Dy²
  82     70        +2             0            0       4       0
  78     74        −2             4           −8       4      16
  75     78        −5             8          −40      25      64
  80     75         0             5            0       0      25
  95     80       +15            10         +150     225     100
Total              10            27          102     258     205

\[ r = \frac{\sum D_x D_y - \dfrac{(\sum D_x)(\sum D_y)}{n}}{\sqrt{\left[\sum D_x^2 - \dfrac{(\sum D_x)^2}{n}\right]\left[\sum D_y^2 - \dfrac{(\sum D_y)^2}{n}\right]}} = \frac{102 - \dfrac{(10)(27)}{5}}{\sqrt{\left[258 - \dfrac{(10)^2}{5}\right]\left[205 - \dfrac{(27)^2}{5}\right]}} \]
\[ = \frac{48}{\sqrt{(238)(59.2)}} = \frac{48}{118.6996} = 0.40 \]
14.9.4 CHANGE OF SCALE
When X and Y are divided by some constant or they are multiplied by some constant, the operation is called change of scale. The value of $r_{xy}$ does not change by change of scale, provided the constants applied to X and Y have the same algebraic sign. Let $U = \dfrac{X}{h}$ and $V = \dfrac{Y}{k}$, where h and k are some constants. The correlation coefficient between U and V is denoted by $r_{uv}$. It can be proved that $r_{xy} = r_{uv}$. Thus,
\[ r_{xy} = r_{uv} = \frac{\sum UV - \dfrac{(\sum U)(\sum V)}{n}}{\sqrt{\left[\sum U^2 - \dfrac{(\sum U)^2}{n}\right]\left[\sum V^2 - \dfrac{(\sum V)^2}{n}\right]}} = \frac{n \sum UV - (\sum U)(\sum V)}{\sqrt{\left[n \sum U^2 - (\sum U)^2\right]\left[n \sum V^2 - (\sum V)^2\right]}} \]
The above formulas are also applicable when the change of scale is applied to only one variable and the other variable is not changed. When a variable is left as it is, it means the variable is multiplied by 1. It may be that one variable is multiplied and the other is divided by some constant. It is also a change of scale and the formulas will work if such changes are carried out.
14.9.5 CHANGE OF ORIGIN AND SCALE
Change of origin and scale may be applied simultaneously to X and Y. The value of 'r' does not change by change of origin and scale. Let $U = \dfrac{X - A}{h}$ and $V = \dfrac{Y - B}{k}$. Then
\[ r_{xy} = r_{uv} = \frac{n \sum UV - (\sum U)(\sum V)}{\sqrt{\left[n \sum U^2 - (\sum U)^2\right]\left[n \sum V^2 - (\sum V)^2\right]}} \]
If the numerical values of X and Y are very large, we can reduce the calculations by applying operations like change of origin, change of scale or change of origin and scale. The coefficient of correlation between U and V ($r_{uv}$) is the same thing as the correlation coefficient between X and Y ($r_{xy}$). But $r_{xy} = r_{uv}$ only when the divisors or multipliers like 'h' and 'k' have the same algebraic signs. If they have opposite signs, then $r_{xy} = -r_{uv}$.
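The invariance described above can be checked numerically. The sketch below is only an illustration: it reuses the imports and exports figures of Example 14.15, and the constants A = 80, B = 70, h = 5 and k = 2 (and k = −2 for the opposite-sign case) are arbitrary values chosen for this demonstration, not values from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample correlation coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


imports = [82, 78, 75, 80, 95]
exports = [70, 74, 78, 75, 80]

# Change of origin and scale with divisors of the same sign
u = [(x - 80) / 5 for x in imports]
v_same = [(y - 70) / 2 for y in exports]
# Same transformation, but with a divisor of opposite sign for Y
v_opposite = [(y - 70) / -2 for y in exports]

r_xy = pearson_r(imports, exports)
r_uv_same = pearson_r(u, v_same)
r_uv_opposite = pearson_r(u, v_opposite)
print(round(r_xy, 4), round(r_uv_same, 4), round(r_uv_opposite, 4))
```

With divisors of the same sign the coefficient is unchanged; with opposite signs only its sign flips, which is the relation $r_{xy} = -r_{uv}$ stated above.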
14.9.6 'r' IN A LINEAR REGRESSION RELATION
Suppose there is a certain linear regression relation between X and Y in which X is the independent variable and Y is the dependent variable. The correlation coefficient 'r' is calculated between X and Y. This value of 'r' measures only the strength of association between the two variables. This value of r cannot be used for any type of inference about the population correlation coefficient $\rho$. Suppose in a linear regression problem, we regress the production of fans (Y) against the number of workers (X). We have calculated 'r' between X and Y, which is 0.90. It means there is strong association between the number of workers and the production of fans.
14.9.7 'r' FOR RANDOM VARIABLES
If X and Y are both random variables and 'r' is calculated between X and Y, then the value of 'r' serves two purposes.
(i) It measures the strength of association between the two random variables.
(ii) It can be used for inference about $\rho$, the correlation coefficient in the population.
14.10 RELATION BETWEEN $b_{yx}$, $b_{xy}$ AND r
For a linear regression relation, let us write the formulas used for the calculation of r, $b_{yx}$ and $b_{xy}$:
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_x S_y} \]
\[ b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_x^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \]
\[ b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_y^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} \]
The term in the numerator is the same for all the three formulas and the denominators in all the three formulas are positive. Thus the algebraic sign of r, $b_{yx}$ and $b_{xy}$ will be decided by the numerator. If $b_{yx}$ is positive, then r and $b_{xy}$ will also be positive. If any one of the three terms is negative, the remaining will also have a negative sign. Thus all the three will be positive, negative or zero.
We can write $b_{yx}$ and $b_{xy}$ in terms of r, $S_x$ and $S_y$. We have,
\[ b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_x^2} \qquad (1) \]
The right hand side of equation (1) is multiplied and divided by $S_y$. Thus
\[ b_{yx} = \frac{S_y}{S_x} \cdot \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_x S_y} \qquad (2) \]
\[ b_{yx} = r \frac{S_y}{S_x} \]
In this relation $b_{yx}$ will have the same sign as that of r. Similarly, we can write
\[ b_{xy} = r \frac{S_x}{S_y} \]
The term $\dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{n}$ is called the sample covariance and is denoted by $S_{xy}$. From equation (1), we have
\[ b_{yx} = \frac{1}{S_x^2} \cdot \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n} = \frac{S_{xy}}{S_x^2} \quad \text{and} \quad b_{xy} = \frac{S_{xy}}{S_y^2} \]
The simple linear regression equation of Y on X is $\hat{Y} - \bar{Y} = b_{yx}(X - \bar{X})$. It can be written as
\[ \hat{Y} - \bar{Y} = r \frac{S_y}{S_x}(X - \bar{X}) \quad \text{or} \quad \hat{Y} - \bar{Y} = \frac{S_{xy}}{S_x^2}(X - \bar{X}) \]
Similarly the regression relation of X on Y can be written as
\[ \hat{X} - \bar{X} = r \frac{S_x}{S_y}(Y - \bar{Y}) \quad \text{or} \quad \hat{X} - \bar{X} = \frac{S_{xy}}{S_y^2}(Y - \bar{Y}) \]
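The relations of this section can be verified on actual numbers. The sketch below is an illustration under stated assumptions: it reuses the temperature and sales data of Example 14.9, and the variable names are chosen here for readability rather than taken from the text. It computes the sample covariance and standard deviations (all with divisor n, as in the text) and then checks that $b_{yx} = r S_y / S_x$, that $b_{xy} = r S_x / S_y$, and that the product of the two regression coefficients equals $r^2$.

```python
from math import sqrt

temperature = [72, 77, 82, 43, 31, 55]   # X
sales = [4, 5, 6, 1, 0, 2]               # Y
n = len(temperature)

mean_x = sum(temperature) / n
mean_y = sum(sales) / n

# Sample covariance and standard deviations, all with divisor n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, sales)) / n
s_x = sqrt(sum((x - mean_x) ** 2 for x in temperature) / n)
s_y = sqrt(sum((y - mean_y) ** 2 for y in sales) / n)

r = s_xy / (s_x * s_y)      # correlation coefficient
b_yx = s_xy / s_x ** 2      # regression coefficient of Y on X
b_xy = s_xy / s_y ** 2      # regression coefficient of X on Y

print(round(r, 4), round(b_yx, 4), round(b_xy, 4))
```

All three quantities share the numerator $S_{xy}$, so they necessarily carry the same sign, which is the point made in the text.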

14.11 PROPERTIES OF CORRELATION COEFFICIENT r
(1) We may speak of the correlation coefficient between 'X and Y' or between 'Y and X'. Both statements have the same meaning. Thus $r_{xy} = r_{yx}$. It is called the symmetric property of r.
(2) r is a pure number. If X and Y are measured in kilograms, r will not be in kilograms. It is free of the units of measurement.
(3) The value of r does not change by change of origin, change of scale or by change of origin and scale. Thus $r_{xy} = r_{uv}$ where $U = \dfrac{X - A}{h}$ and $V = \dfrac{Y - B}{k}$, and 'h' and 'k' have the same algebraic signs. If 'h' and 'k' have opposite signs, then $r_{xy}$ and $r_{uv}$ will have the same numerical values but with opposite signs. Thus $r_{xy} = -r_{uv}$.
(4) r lies between −1 and +1.
(5) In a linear regression relation, r is the geometric mean of the two regression coefficients. Thus
\[ r = \pm\sqrt{b_{yx} \times b_{xy}} \]
r will have the same sign as that of $b_{yx}$ and $b_{xy}$.
(6) If X and Y are independent random variables, then Cov(X, Y) = 0. But if Cov(X, Y) = 0, it does not mean that X and Y are definitely independent.
Example 14.16.
Compute and interpret the coefficient of correlation. Also show that the correlation coefficient is the square root of the product of the regression coefficients on the basis of the following information:
X = Amount of fertilizer in pounds per 100 square feet
Y = Yield of tomatoes in pounds
Σ(X − 35) = −100, Σ(X − 35)² = 3000, Σ(Y − 18) = 30,
Σ(Y − 18)² = 1106, Σ(X − 35)(Y − 18) = 900, n = 10.

Solution:
Here, ΣDx = −100, ΣDx² = 3000, ΣDy = 30, ΣDy² = 1106, ΣDxDy = 900, n = 10.
The correlation coefficient is
\[ r = \frac{n \sum D_x D_y - (\sum D_x)(\sum D_y)}{\sqrt{\left[n \sum D_x^2 - (\sum D_x)^2\right]\left[n \sum D_y^2 - (\sum D_y)^2\right]}} = \frac{10(900) - (-100)(30)}{\sqrt{\left[10(3000) - (-100)^2\right]\left[10(1106) - (30)^2\right]}} \]
\[ = \frac{12000}{\sqrt{(20000)(10160)}} = \frac{12000}{14254.82375} = 0.84 \]
There is a strong positive correlation between the amount of fertilizer and the yield of tomatoes.
The regression coefficient of Y on X is
\[ b_{yx} = \frac{n \sum D_x D_y - (\sum D_x)(\sum D_y)}{n \sum D_x^2 - (\sum D_x)^2} = \frac{10(900) - (-100)(30)}{10(3000) - (-100)^2} = \frac{12000}{20000} = 0.6 \]
The regression coefficient of X on Y is
\[ b_{xy} = \frac{n \sum D_x D_y - (\sum D_x)(\sum D_y)}{n \sum D_y^2 - (\sum D_y)^2} = \frac{10(900) - (-100)(30)}{10(1106) - (30)^2} = \frac{12000}{10160} = 1.18 \]
\[ \sqrt{b_{yx} \times b_{xy}} = \sqrt{(0.6)(1.18)} = 0.84 = r \]
Hence the correlation coefficient is the square root of the product of the regression coefficients.

Example 14.17.
The equations of two regression lines obtained from ten observations are:
10X = 5Y − 55 and 100Y = 200X + 1180
Compute and interpret the correlation coefficient r.

Solution:
The equations of the regression lines can be written as:
\[ X = \frac{5}{10} Y - \frac{55}{10} = 0.5Y - 5.5 \quad \text{and} \quad Y = \frac{200}{100} X + \frac{1180}{100} = 2X + 11.8 \]
Here, $b_{xy} = 0.5$, $b_{yx} = 2$ and
\[ r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{(2)(0.5)} = 1 \]
There is perfect positive correlation between X and Y.
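Example 14.18.
The following statistics have been computed:
X̄ = 14, Ȳ = 22, Sx = 6, Sy = 7, r = 0.72
Find the two regression equations.

Solution:
The regression equation of Y on X is
\[ \hat{Y} - \bar{Y} = r \frac{S_y}{S_x}(X - \bar{X}) \]
\[ \hat{Y} - 22 = 0.72\left(\frac{7}{6}\right)(X - 14) \]
\[ \hat{Y} - 22 = 0.84(X - 14) \]
\[ \hat{Y} - 22 = 0.84X - 11.76 \]
\[ \hat{Y} = 0.84X - 11.76 + 22 \]
\[ \hat{Y} = 0.84X + 10.24 \]
The regression equation of X on Y is
\[ \hat{X} - \bar{X} = r \frac{S_x}{S_y}(Y - \bar{Y}) \]
\[ \hat{X} - 14 = 0.72\left(\frac{6}{7}\right)(Y - 22) \]
\[ \hat{X} - 14 = 0.62(Y - 22) \]
\[ \hat{X} - 14 = 0.62Y - 13.64 \]
\[ \hat{X} = 0.62Y - 13.64 + 14 \]
\[ \hat{X} = 0.62Y + 0.36 \]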
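The steps of Example 14.18 amount to computing a slope as $r S_y / S_x$ (or $r S_x / S_y$) and then an intercept from the means. A minimal sketch follows; the function name `regression_lines` and its return layout are hypothetical choices for this illustration. Note that the code keeps the slope of X on Y unrounded (about 0.6171), so its intercept comes out near 0.42, whereas the worked solution rounds the slope to 0.62 first and therefore obtains 0.36.

```python
def regression_lines(mean_x, mean_y, s_x, s_y, r):
    """Return (intercept, slope) for Y on X and for X on Y from summary statistics."""
    b_yx = r * s_y / s_x                 # slope of Y on X
    a_yx = mean_y - b_yx * mean_x        # intercept of Y on X
    b_xy = r * s_x / s_y                 # slope of X on Y
    a_xy = mean_x - b_xy * mean_y        # intercept of X on Y
    return (a_yx, b_yx), (a_xy, b_xy)


(a_yx, b_yx), (a_xy, b_xy) = regression_lines(
    mean_x=14, mean_y=22, s_x=6, s_y=7, r=0.72
)
print(f"Y on X: Y = {b_yx:.2f}X + {a_yx:.2f}")
print(f"X on Y: X = {b_xy:.2f}Y + {a_xy:.2f}")
```

Rounding only at the final step, as the code does, is generally the safer habit; the small difference in the intercept of X on Y comes entirely from when the rounding is applied.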
SHORT DEFINITIONS

Regression
When we predict the value of the dependent variable with the help of one or more independent variables, it is known as regression.
or
Regression is a process by which we estimate the value of the dependent variable on the basis of one or more independent variables.

Regression Analysis
The technique used to develop the equation and provide the estimates is called regression analysis.

Linear Regression
Regression analysis involving one independent variable and one dependent variable, in which the relationship between the variables is approximated by a straight line, is known as linear regression.

Non-Linear Regression
Non-linear regression is a procedure for fitting data to equations which are non-linear.

Regression Line or Regression Equation
A straight line that best represents the relationship between two variables.
or
An equation that predicts the value of the dependent variable based on the value of one or more independent variables is called a regression equation.

Simple Regression Equation
A regression equation that includes one independent variable and one dependent variable.

Least Squares Method
The least squares method is the procedure used to develop the estimated regression equation.

Independent Variable
A variable that provides the basis for estimation is called the predictor or explanatory variable or independent variable.
or
The variable that is predicting or explaining the other variable is called the regressor or independent variable.

Dependent Variable
The variable that is being predicted or estimated is known as the dependent variable or regressand or predictand or response variable.

Regression Coefficient or Slope of Regression Line
The average change in the dependent variable for a unit change in the independent variable is called the regression coefficient. The regression coefficient may be positive or negative, depending on the relationship between the two variables.

Residual
The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is called the residual.

Scatter Diagram
A graph of the pairs of observations on two variables is called a scatter diagram.
or
A graphic device used to summarize visually the relationship between two variables.

Correlation
Correlation is a measure of the degree of linear association between the two variables.
or
Correlation measures the strength of a relationship between variables.

Correlation Analysis
A group of techniques to measure the strength of the association between two variables.

Positive Correlation
When the values of two variables move in the same direction, so that an increase or decrease in the value of one variable is associated with an increase or decrease in the value of the other variable, correlation is said to be positive.

Negative Correlation
When the values of two variables move in different directions, so that with an increase in the value of one variable the value of the other variable decreases, and with a decrease in the value of one variable the value of the other variable increases, correlation is said to be negative.

No Correlation
When two variables have zero correlation, it is called no correlation.

Curvilinear Correlation
When the correlation between two variables represents a curve that is not a straight line, the correlation is said to be curvilinear correlation.

Linear Correlation
Linear correlation is one where the ratio of variations in the related variables is constant.

Non-Linear Correlation
Non-linear correlation is one where the ratio of variations in the related variables is fluctuating.

Perfect Correlation
If the relationship between variables is such that with an increase or decrease in the value of one, the value of the other increases or decreases in a fixed proportion, correlation between them is said to be perfect correlation.

Perfect Positive Correlation
If both the series move in the same direction and the variations are proportionate, there would be perfect positive correlation between them.

Perfect Negative Correlation
If the two series move in reverse directions and the variations in their values are always proportionate, it is said to be perfect negative correlation.

Correlation Coefficient
A descriptive measure of the degree of linear relationship between X and Y is called the correlation coefficient.
or
A measure that expresses the extent to which two variables are related.

Aims of Regression and Correlation Analysis
(i) Regression analysis provides estimates of the dependent variable for given values of the independent variable.
(ii) Regression analysis provides measures of the errors that are likely to be involved in using the regression line to estimate the dependent variable.
(iii) Regression analysis provides an estimate of the effect on the mean value of Y of a unit change in X.
(iv) Correlation analysis provides estimates of how strong the relationship is between the two variables.

MULTIPLE - CHOICE QUESTIONS
1.  A process by which we estimate the value of dependent variable on the basis of one or more independent variables is called:
    (a) correlation                (b) regression
    (c) residual                   (d) slope
2.  The method of least squares dictates that we choose a regression line where the sum of the squares of deviations of the points from the line is:
    (a) maximum                    (b) minimum
    (c) zero                       (d) positive
3.  A relationship where the flow of the data points is best represented by a curve is called:
    (a) linear relationship        (b) nonlinear relationship
    (c) linear positive            (d) linear negative
4.  All data points falling along a straight line is called:
    (a) linear relationship        (b) nonlinear relationship
    (c) residual                   (d) scatter diagram
5.  The value we would predict for the dependent variable when the independent variables are all equal to zero is called:
    (a) slope                      (b) sum of residual
    (c) intercept                  (d) difficult to tell
6.  The predicted rate of response of the dependent variable to changes in the independent variable is called:
    (a) slope                      (b) intercept
    (c) error                      (d) regression equation
7.  The slope of the regression line of Y on X is also called the:
    (a) correlation coefficient of X on Y    (b) correlation coefficient of Y on X
    (c) regression coefficient of X on Y     (d) regression coefficient of Y on X
8.  In simple linear regression, the number of unknown constants are:
    (a) one                        (b) two
    (c) three                      (d) four
9.  In simple regression equation, the number of variables involved are:
    (a) 0                          (b) 1
    (c) 2                          (d) 3
10. If the value of any regression coefficient is zero, then two variables are:
    (a) qualitative                (b) correlation
    (c) dependent                  (d) independent
11. The straight line graph of the linear equation Y = a + bX, slope will be upward if:
    (a) b = 0                      (b) b < 0
    (c) b > 0                      (d) b ≠ 0
12. The straight line graph of the linear equation Y = a + bX, slope will be downward if:
    (a) b > 0                      (b) b < 0
    (c) b = 0                      (d) b ≠ 0
13. The straight line graph of the linear equation Y = a + bX, slope is horizontal if:
    (a) b = 0                      (b) b ≠ 0
    (c) b = 1                      (d) a = b
14. If regression line of Ŷ = 5, then value of regression coefficient of Y on X is:
    (a) 0                          (b) 0.5
    (c) 1                          (d) 5
15. If Y = 2 − 0.2X, then the value of Y intercept is equal to:
    (a) −0.2                       (b) 2
    (c) 0.2X                       (d) all of the above
16. If one regression coefficient is greater than one, then the other will be:
    (a) more than one              (b) equal to one
    (c) less than one              (d) equal to minus one
17. To determine the height of a person when his weight is given is:
    (a) correlation problem        (b) association problem
    (c) regression problem         (d) qualitative problem
18. The dependent variable is also called:
    (a) regressor                  (b) regressand
    (c) continuous variable        (d) independent
19. The dependent variable is also called:
    (a) regressand variable        (b) predictand variable
    (c) explained variable         (d) all of these
20. The independent variable is also called:
    (a) regressor                  (b) regressand
    (c) predictand                 (d) estimated
21. In the regression equation Y = a + bX, the Y is called:
    (a) independent variable       (b) dependent variable
    (c) continuous variable        (d) none of the above
22. In the regression equation X = a + bY, the X is called:
    (a) independent variable       (b) dependent variable
    (c) qualitative variable       (d) none of the above
23. In the regression equation Y = a + bX, a is called:
    (a) X-intercept                (b) Y-intercept
    (c) dependent variable         (d) none of the above
24. The regression equation always passes through:
    (a) (X, Y)                     (b) (a, b)
    (c) (X̄, Ȳ)                     (d) (X̄, Y)
25. The independent variable in a regression line is:
    (a) non-random variable        (b) random variable
    (c) qualitative variable       (d) none of the above
26. The graph showing the paired points of (Xi, Yi) is called:
    (a) scatter diagram            (b) histogram
    (c) historigram                (d) pie diagram
27. The graph [figure] represents the relationship that is:
    (a) linear                     (b) non linear
    (c) curvilinear                (d) no relation
28. The graph [figure] represents the relationship that is:
    (a) linear positive            (b) linear negative
    (c) non-linear                 (d) curvilinear
29. When regression line passes through the origin, then:
    (a) intercept is zero          (b) regression coefficient is zero
    (c) correlation is zero        (d) association is zero
30. When $b_{xy}$ is positive, then $b_{yx}$ will be:
    (a) negative                   (b) positive
    (c) zero                       (d) one
31. The correlation coefficient is the ______ of two regression coefficients:
    (a) geometric mean             (b) arithmetic mean
    (c) harmonic mean              (d) median
32. When two regression coefficients bear same algebraic signs, then correlation coefficient is:
    (a) positive                   (b) negative
    (c) according to two signs     (d) zero
33. It is possible that two regression coefficients have:
    (a) opposite signs             (b) same signs
    (c) no sign                    (d) difficult to tell
34. Regression coefficient is independent of:
    (a) units of measurement       (b) scale and origin
    (c) both (a) and (b)           (d) none of them
35. In the regression line Y = a + bX:
    (a) ΣY = ΣŶ                    (b) ΣX = ΣX̂
    (c) ΣX = ΣY                    (d) X = Y
36. In the regression line Y = a + bX, the following is always true:
    (a) Σ(X − X̂) = 0               (b) Σ(Y − Ŷ) = 0
    (c) Σ(X − X̂) = Σ(Y − Ŷ)        (d) Σ(Y − Ŷ)² = 0
37. The purpose of simple linear regression analysis is to:
    (a) predict one variable from another variable
    (b) replace points on a scatter diagram by a straight line
    (c) measure the degree to which two variables are linearly associated
    (d) obtain the expected value of the independent random variable for a given value of the dependent variable
38. The sum of the differences between the actual values of Y and its values obtained from the fitted regression line is always:
    (a) zero                       (b) positive
    (c) negative                   (d) minimum
39. If all the actual and estimated values of Y are same on the regression line, the sum of squares of error will be:
    (a) zero                       (b) minimum
    (c) maximum                    (d) unknown
40. $e_i = Y_i - \hat{Y}_i$ is called:
    (a) residual
    (b) difference between independent and dependent variables
    (c) difference between slope and intercept
    (d) sum of residual
41. A measure of the strength of the linear relationship that exists between two
variables is called:
(a) slope (b) intercept
(c) correlation coefficient (d) regression equation
42. When the ratio of variations in the related variables is constant, it is called:
(a) linear correlation (b) nonlinear correlation
(c) positive correlation (d) negative correlation
43. If both variables X and Y increase or decrease simultaneously, then the
coefficient of correlation will be:
(a) positive (b) negative
(c) zero (d) one
44. If the points on the scatter diagram indicate that as one variable increases the
other variable tends to decrease, the value of r will be:
(a) perfect positive (b) perfect negative
(c) negative (d) zero
45. If the points on the scatter diagram show no tendency either to increase
together or decrease together, the value of r will be close to:
(a) -1 (b) +1
(c) 0.5 (d) 0
46. If one item is fixed and unchangeable and the other item varies, the correlation
coefficient will be:
(a) positive (b) negative
(c) zero (d) undecided
47. In a scatter diagram, if most of the points lie in the first and third quadrants,
then the coefficient of correlation is:
(a) negative (b) positive
(c) zero (d) all of the above
48. If the two series move in reverse directions and the variations in their values
are always proportionate, it is said to be:
(a) negative correlation (b) positive correlation
(c) perfect negative correlation (d) perfect positive correlation
49. If both the series move in the same direction and the variations are in a fixed
proportion, correlation between them is said to be:
(a) perfect correlation (b) linear correlation
(c) nonlinear correlation (d) perfect positive correlation
50. The value of the coefficient of correlation r lies between:
(a) 0 and 1 (b) -1 and 0
(c) -1 and +1 (d) -0.5 and +0.5
51. If X is measured in hours and Y is measured in minutes, then the correlation
coefficient r has the unit:
(a) hours (b) minutes
(c) both (a) and (b) (d) no unit
204 Basic Statistics Part-II

52. If b_yx = b_xy = 1 and S_x = S_y, then r will be:
(a) 0 (b) -1
(c) 1 (d) difficult to calculate
53. The correlation coefficient between X and -X is:
(a) 0 (b) 0.5
(c) 1 (d) -1
54. If b_yx = b_xy = r_xy, then:
(a) S_x ≠ S_y (b) S_x = S_y
(c) S_x > S_y (d) S_x < S_y
55. If r_xy = 0.4, then r(2x, 2y) is equal to:
(a) 0.4 (b) 0.8
(c) 0 (d) 1
56. r_xx is equal to:
(a) 0 (b) -1
(c) 1 (d) 0.5
57. If r_xy = 0.75, then the correlation coefficient between u = 1.5X and v = 2Y is:
(a) 0 (b) 0.75
(c) -0.75 (d) 1.5
58. If b_yx = -2 and r_xy = -1, then b_xy is equal to:
(a) -1 (b) -2
(c) 0.5 (d) -0.5
59. If b_yx = 1.6 and b_xy = 0.4, then r_xy will be:
(a) [illegible] (b) 0.64
(c) [illegible] (d) -0.8
60. If b_yx = -0.8 and b_xy = -0.2, then r_yx is equal to:
(a) -0.2 (b) -0.4
(c) 0.4 (d) -0.8
61. If Y = 6 - X, then r will be:
(a) 0 (b) 1
(c) -1 (d) both (b) and (c)
62. If Y = X + 10, then r is equal to:
(a) 1 (b) -1
(c) 1/2 (d) difficult to tell
63. If Y = -10X and X = -0.1Y, then r is equal to:
(a) 0.1 (b) 1
(c) -1 (d) 10
64. If the figure +1 signifies a perfect positive correlation and the figure -1
signifies a perfect negative correlation, then the figure 0 signifies:
(a) a perfect correlation (b) uncorrelated variables
(c) not significant (d) weak correlation
65. A perfect positive correlation is signified by:
(a) 0 (b) -1
(c) +1 (d) -1 to +1
66. If a statistics professor tells his class: "All those who got 100 on the statistics
test got 20 on the mathematics test, and all those that got 100 on the
mathematics test got 20 on the statistics test", he is saying that the correlation
between the statistics test and the mathematics test is:
(a) negative (b) positive
(c) zero (d) difficult to tell
67. If Σ(X - X̄)(Y - Ȳ) is zero, the correlation is:
(a) weak negative (b) high positive
(c) high negative (d) none of the preceding
68. If r is negative, we know that:
(a) Σ(X - X̄)² and Σ(X - X̄)(Y - Ȳ) are negative
(b) Σ(Y - Ȳ)² and Σ(X - X̄)(Y - Ȳ) are negative
(c) Σ(X - X̄)(Y - Ȳ) is negative
(d) either Σ(X - X̄)² or Σ(Y - Ȳ)² is negative
9 5.
1. (b) 2. (b) 3. t
11. ta12.
4.
(b)
Answers
(a) (c) 6. (a) 7. (d) 8. (b)

le./s 2A.
10. (d) (c) (b) (a) L4. (a) 15. (b) 16. (c)

/ 36.
9. (c) 13.

:
L7. (c) 18. (b) (d) 21. (a) (b) 22. (b) 23. (b) 24. (c)

s
25. (a) 26. (a) 27. (a) 28. (b) 29. (a) 30. (b) 31. (a) 32. (c)
33. (b) 34. (c) 35. (b) 36. (b) 37. (a) 38. (a) 39. (a) 40. (a)
41. (c) 42. (a) 43. (a) 44. (c) 45. (d) 46. (c) 47. (b) 48. (c)
49. (d) 50. (c) 51. (d) 52. (c) 53. (d) 54. (b) 55. (a) 56. (c)
57. (b) 58. (d) 59. (c) 60. (b) 61. (c) 62. (a) 63. (c) 64. (b)
65. (c) 66. (a) 67. (d) 68. (c)
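
Several of the answers above (for example questions 53, 55 and 57) rest on one property: r does not change when either variable is multiplied by a positive constant, and the correlation of X with -X is -1. The following is only a minimal Python sketch for checking this numerically. The helper name `pearson_r` and the data are illustrative assumptions, not taken from the text.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data only (an assumption, not from the book).
X = [1, 2, 3, 4, 5]
Y = [2, 1, 4, 3, 6]

r_xy = pearson_r(X, Y)
# Rescale both variables by positive constants, as in u = 1.5X and v = 2Y.
r_scaled = pearson_r([1.5 * x for x in X], [2 * y for y in Y])
# Correlate X with its own negative.
r_negated = pearson_r(X, [-x for x in X])

print(r_xy, r_scaled, r_negated)
```

The first two printed values should agree up to rounding, and the third should be -1.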

SHORT QUESTIONS
1. Suppose that Y = 1 when X = 0, that Y = 2 when X = 1, and that Y = 3 when
X = 2. Find the least-squares estimate b.
Ans. 1.0
2. Suppose that Y = 1 when X = 0, that Y = 2 when X = 1, and that Y = 3 when
X = 2. Find the least-squares estimate a.
Ans. 1
3. Given X = 1, Y = 8 and b = 2. Find the value of intercept a.
Ans. 6
4. Given Y = 16, 18, 20 and X = 0, 1, 2. Find the value of Y intercept.
Ans. 16
5. Given Y = 6, 8, 10 and X = 0, 1, 2. Find the regression coefficient of Y on X.
Ans. 2
6. If X = 50, Y = 110 and a = 10. Find the value of b.
Ans. 2
7. Find the equation for the straight line whose intercept and slope are -3 and
2/3 respectively.
Ans. Y = (2/3)X - 3
8. Find the slope and Y intercept of the line whose equation is 3X - 5Y = 20.
Ans. 3/5 and -4
9. Find the equation of a regression line whose X and Y intercepts are 3 and -5.
Ans. 5X - 3Y = 15
10. If the regression lines of Y on X and X on Y are respectively given by 2X - 3Y
= 0 and 4Y - 5X = 8. Find the values of two regression coefficients of Y on X
and X on Y.
Ans. 2/3 and 4/5
11. If Y = 2 + 3X, and if the expected value of X is 10. Find the expected value of Y.
Ans. 32
12. If Y = 30 - 2X, and if the variance of X is 8. Find the variance of Y.
Ans. 32
13. Given the equation of the straight line Ŷ = a + bX, and the values of a = 45,
b = -10 and X = 3. Find the value of Ŷ.
Ans. 15
14. Given a_yx = Ȳ - b_yx X̄ and Ȳ = 1.87, b_yx = 0.25 and X̄ = 12.45. Find a_yx.
Ans. -1.24
15. Given Ŷ = 109.73 + 1.58X and X = 100. Find Ŷ.
Ans. 267.73
16. Given X̂ = 0.6 - 0.5Y and Y = 0.8. Find X̂.
Ans. 0.2
L7, In the equation $ = i


om
-X=30,andX=20. I,'ind Y.

t . c
o
Ans. 11.92

18. In the equatron'* = X * , *G-0,


s tf Y = p = r = 0.bb,
g
Dy 192.50, X 18.65,

l o
n
S" = 4.3, S, = 25, and Y 170. Fjncl X.

b
=

.
Ans.17.8
19. Suppose that Y = 1 when X = 0, that Y = 2 when X = 1, and that Y = 3 when
X = 2. In this case, find the sample correlation coefficient r.
Ans. 1
20. If Y = 10, 8, 6 and X = 2, 1, 0. Find the coefficient of correlation.
Ans. 1
21. Given Y = 10, 8, 6 and X = 0, 1, 2. Find the sample correlation coefficient.
Ans. -1

22. The equations of two regression lines obtained from ten observations are:
10X = 5Y - 55 and 100Y = 200X + 1100. Find the correlation coefficient
between X and Y.
Ans. 1
23' For 8 observations on clcposit.s (X);',nii i,r,rnu O), two regression equations are
estabiished rvhich ur" t =-ii.i ir- () r-cx ora i = bg.72 a.2|y. Fin.
correlation coefficient hctivccn rle posrts and ioans.
- the
&
Ans. -0.32
24. A set of data yields the following equation: Ŷ = 16 - 0.5X. Find the
average value of Y for X = 6.
Ans. 13
25. If b_yx = 1.6 and b_xy = 0.4, find the value of r_xy.
Ans. 0.8
26. If b_yx = -1.6 and b_xy = -0.4, find the value of r_xy.
Ans. -0.8
27. For two variables X and Y the regression equation of X on Y is X = 5Y - 7 and
the regression equation of Y on X is Y = 0.1X + 1.7. Find the coefficient of
correlation between X and Y.
Ans. 0.71
28. Given S_xy = 16 and S_x S_y = 81. Find r.
Ans. 0.20
29. Given b_yx = 0.82 and r_xy = 0.97. Find b_xy.
Ans. 1.15
30. Given b_xy = -1.4 and r_xy = -0.87. Find b_yx.
Ans. -0.54
31. Given r_xy = 0.8, S_x = 4, S_xy = 20. Find the standard deviation of Y.
Ans. 6.25
32. Given r_xy = -0.75, S_y = 5, Σ(X - X̄)(Y - Ȳ) = -15n. Find S_x.
Ans. 4
33. Given r = 0.605, Σ(X - X̄)(Y - Ȳ) = 24, S_x = 2.12 and S_y = 2.34. Find the number
of items.
Ans. 8
34. Given Σ(X - X̄)(Y - Ȳ) = 0, Σ(X - X̄)² = 10, Σ(Y - Ȳ)² = 10 and n = 5. Find the
coefficient of correlation.
Ans. 0
35. Interpret the meaning when:
(i) r = +1 (ii) r = -1 (iii) r = 0 (iv) r = -0.98 (v) r = 0.2 (vi) r = 2
Ans. (i) perfect positive correlation (ii) perfect negative correlation
(iii) no correlation (iv) high degree of negative correlation
(v) weak positive correlation
(vi) not possible because r lies between -1 and +1
36. Explain the difference between r = -0.80 and r = +0.80.
Ans. r = -0.80 indicates that the two variables have a strong negative relationship,
whereas r = +0.80 indicates that the two variables have a strong positive
relationship. The two coefficients indicate equally strong relationships.
37. What is meant by regression?
38. Explain the terms regressand and regressor.
39. Differentiate between linear regression and curvilinear regression.
40. Explain the difference between fixed variable and random variable.
41. Explain the terms regression and linear regression.
42. Write a short note on scatter diagram.
43. Define simple linear regression.
44. Define correlation.
45. Distinguish between positive and negative correlation.
46. Differentiate between perfect positive and perfect negative correlation.
47. Write down the properties of the correlation coefficient.
48. Sketch the scatter diagrams for the following terms:
(a) perfect positive linear correlation (b) perfect negative linear correlation
(c) strong positive linear correlation (d) strong negative linear correlation
49. Sketch the scatter diagrams for the following terms:
(a) no linear correlation (b) weak positive linear correlation
(c) weak negative linear correlation (d) positive correlation
50. Differentiate between linear and non-linear correlation.
51. Define the terms no correlation and curvilinear correlation.
52. Differentiate between regression and correlation.
53. Explain the meanings of correlation and perfect correlation.
54. What is meant by correlation coefficient and write its properties?
55. Write down the properties of the least squares regression line.
56. Discuss the method of least squares for fitting the regression lines of Y on X
and X on Y.
57. Write down the aims of regression and correlation analysis.
58. Define the terms regression analysis and correlation analysis.
59. What is meant by residual?
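
Many of the short questions above reduce to two computations: the least-squares estimates a and b, and the correlation coefficient taken as the geometric mean of the two regression coefficients with their common sign. The sketch below is one hedged way to check such answers in Python. The function names `fit_line` and `r_from_coefficients` are hypothetical, and the data are those of short questions 1, 2 and 19.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line ys = a + b * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b

def r_from_coefficients(b_yx, b_xy):
    """r = sqrt(b_yx * b_xy), carrying the common sign of the two coefficients."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)

X = [0, 1, 2]
Y = [1, 2, 3]

a_yx, b_yx = fit_line(X, Y)  # regression of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y
r = r_from_coefficients(b_yx, b_xy)

print(a_yx, b_yx, r)
print(r_from_coefficients(-1.6, -0.4))
```

The sign is taken from b_yx because both regression coefficients always share the sign of the covariance, which is why a pair such as -1.6 and -0.4 gives a negative r.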
EXERCISES
1. Compute the regression equation of Y on X from the following data using normal
equations.
X 25 30 40 50 65
Y 6 5 4 8 7
Ans. Ŷ = 3.774 + 0.053X
2. Determine the regression equation to the following data taking X as the
independent variable. Also find the difference between the actual values of Y
and the values obtained from the fitted line and show that Σ(Y - Ŷ) = 0.
X 5 10 15 20 25
Y 25 20 15 10 5
Ans. Ŷ = 30 - X
3. The following sample observations were randomly selected:
X 4 5 3 6 12
Y 4 6 5 7 8
Determine the value of Y when X is 7.
Ans. Ŷ = 3.72 + 0.38X; 6.38

4. Show that the sum of errors equals zero and the sum of squares of the errors is
equal to 1.1.
X 1 2 3 4 5
Y 1 1 2 2 4
Ans. Σ(Y - Ŷ) = 0, Σ(Y - Ŷ)² = 1.1
5. On the basis of figures recorded below for supply and price for five years,
construct a regression equation of price on supply. Compute from the equation
established the most likely price when supply is 92 units.
Year 1996 1997 1998 1999 2000
Supply 87 90 98 95 90
Price 132 125 115 123 140
Ans. Ŷ = 281.56 - 1.68X; 127

6. A market research firm wishes to develop a model to predict purchases of
tennis balls by city, based on the number of tennis courts in a city. A simple
random sample of 50 cities developed the following data:
X = number of courts in a city
Y = thousands of tennis balls sold in the city
X̄ = 235, Ȳ = 375, ΣXY = 4,435,650, ΣX² = 2,780,850
What is the equation of the estimated regression line that you would use to
predict Y from X?
Ans. Ŷ = 1.5X + 22.5
7. A university administrator studied the relationship between the cost of
operating an academic department and the total student-hours of teaching and
supervision undertaken in the department, for eleven departments for the most
recent academic year. The results are summarized below in a convenient form:
X = number of student-hours (in thousands), Y = cost (Rs. thousands),
'

3
n = 11, nXX?- (IX): = 16810000, XX = 704a,nIY2 (Iy)' = gg10000, Ey = 42ilb,
- (IX)(IY)
4
nEXY = 92.11-1500. Cornpute tfe two regression equations.
Ans.l, t
=1.0.1Y+239.6;
99+33/

t
=0.55X
I
8. For 9 observations on supply (X) and price (Y) the following data was obtained:
Σ(X - 90) = -25, Σ(X - 90)² = 301, Σ(Y - 127) = 12, Σ(Y - 127)² = 1006,
Σ(X - 90)(Y - 127) = -469. Obtain the line of regression of X on Y and estimate
the supply when the price is Rs. 125.
Ans. X̂ = -0.44Y + 143.6852; 88.69

h
9. Given the following information, estimate:
(i) the value of X when Y = 30 (ii) the value of Y when X = 55
The mean value of X = 54. The mean value of Y = 28. The regression coefficient
of X on Y = -0.2. The regression coefficient of Y on X = -1.5.
Ans. (i) X̂ = -0.2Y + 59.6; 53.6 (ii) Ŷ = -1.5X + 109; 26.5


10. firnrputing fronr it clata ser of (X, Y) valur-'s, thc l'ollirvirig surnnrirly rilutistrcl,
J
ryore rcrcordeil. n = 18, X = 1.2, Y = 5.1, S: = 1.l,lU, Si = g.tll; S,, = ?,111.
Construct thc regression lines of Y on X rlrrd X on }-.
nA
A*s.y = 0.164Xi-4gfl3; X 1.1.19 Y - 4.66
11. Fitting a straight line to a set of data yields the following regression equation:
Ŷ = 16 - 0.5X
(i) Interpret the meaning of the Y-intercept 'a'.
(ii) Interpret the meaning of the slope 'b'.
(iii) Predict the average value of Y for X = 6.
Ans. (i) The Y intercept a = 16 means that when X = 0, the average value of Y is 16.
(ii) The slope b = -0.5 means that for each increase of one unit of X, the value
of Y is expected to decrease on average by 0.5 unit. (iii) 13

o
12. The following statistics have been computed:
                                          X series   Y series
Number of items                              14         14
Arithmetic mean                              83         2.52
Sum of squares of deviations from mean      1544       5.6198
Summation of products of deviations of X and Y series from their arithmetic
means = 90.91.
(i) Compute the regression line of Y on X and estimate the value of Y when
X = 88.
(ii) Compute the regression line of X on Y and estimate the value of X when
Y = 3.25.
Ans. (i) Ŷ = 0.06X - 2.46; 2.82 (ii) X̂ = 16.18Y + 42.23; 94.815

/: /
13. Compute the regression coefficients in each of the following cases:
(i) n = 24, ΣX = 5402, ΣY = 4378, ΣX² = 1388656, ΣY² = 911032, ΣXY = 1118516
(ii) n = 10, Σ(X - X̄)² = 170, Σ(Y - Ȳ)² = 140, Σ(X - X̄)(Y - Ȳ) = 92
(iii) ΣDx = 12, ΣDy = -5, ΣDxDy = 390, ΣDx² = 2830, ΣDy² = 91, n = 10
(iv) Σ(X - X̄)(Y - Ȳ) = 148, S_x = 7.933, S_y = 16.627, n = 15
Ans. (i) b_yx = 0.77, b_xy = 1.18 (ii) b_yx = 0.54, b_xy = 0.66
(iii) b_yx = 0.14, b_xy = 4.47 (iv) b_yx = 0.16, b_xy = 0.04
14. The following data were obtained in a study of the relationship between the
weight and chest size of infants at birth.
Weight (kg)
Chest size (cm)
Compute and interpret the sample correlation coefficient.
Ans. 0.854. There is a strong positive relationship between the weight and chest
size.
15. A personnel officer is studying performances of job applicants on two tests
given when the applicant contacts the firm. The first test measures mental
ability; the second measures potential for success in the job. The test
results of a sample of six applicants are shown below:
Applicant
Mental ability (X)
Potential (Y)
Calculate the sample correlation coefficient.
Ans. 0.27

o
16. The production manager of a factory would like to develop a model to predict
performance time for a manual assembly task based upon the amount of time
spent in training. A sample of 5 recent employees was selected; the training
time in hours and the performance time in minutes are listed below:
Observation 1 2 3 4 5
Training time (hours) 27 24 12 22 18
Performance time (minutes) 19 16 12 17 15
Compute the coefficient of correlation between training time and performance
time.
Ans. 0.96
17. Compute and interpret the coefficient of correlation between the values of X
and Y from the following table.
Ans. r_xy = 0. So X and Y are uncorrelated.
18. Calculate the correlation coefficient between X and Y and the correlation
coefficient between X and Z.
X 2 4 6 8 10
Y 10 15 20 25 30
Z 40 36 32 28 24
Ans. r_xy = 1, r_xz = -1

19, From the folkrwing table, eoffiputc the ooeffiolcnt of oorrelation


by Karl 1
x 4 6 ? ?, I
Y I I 6 11 7
Artthmetie rflsrnu of x and v *u*iffi
I ,{ng, Mlnetng ohecr:vutiein e 10, r,* -.0.Ofl
20. Given the following information:
Number of pairs of observations of X and Y series = 15
Arithmetic mean of X series = 25
Standard deviation of X series = 3.01
Arithmetic mean of Y series = [illegible]
Standard deviation of Y series = 3.03
Sum of products of deviations from means of X and Y series = 122
Compute the coefficient of correlation between X and Y.
Ans. 0.89

p o
Letwcen X and Y frorn the firllowing data:

s
21. compute the c,:efficient '.f *,:rr*iutii:n

$eries og
d
Sum of deviarions ofX series

l
Sum of deviations ofY series
4

b
40
Sum of square of d'eviations of '\

3 . Y = g2

4
serir:s X and
Sum of the producl of deviatiotr;: trf

9
Number of pai'rs of observations ''rf
=
X nntl Y series 10

Ans' 0'704
t9 tw* variatet s x and Y is 0'8' The varrance

a
(r) Coefficient otcorrelation l:etrru*en

t
22. of Y variate.
l;';;;;;=
of X is zo-. rno rire rrtanclard deviation

/: / s
.(ii)I.fthecoefficrentelfcorrelaiicubet,weenXarrtlYis_0.?S,thestandard
- - - X) tV Y) = 15n' What will be the

s
d'eviation of Y series is 5 snd X/'X

tt p
standar-d deviation of X sr'rles'
(:&lt;uJat'r: the number of items for which
(iii). From t}re fill]owing inforrrati.-r]r,

h = tltl, Y) stanrlartl deviation of Y series = 8 and

Ans. (i) U.25 (ii) 4(iii) 10


y,
coef}ieient between two variables X arnd rhe
,,.
--- ,, .,uer tg find thc cr:rrclntiorr
iuttu*ing reur-rltr, were obtainecl:
* 1100, f,Xl = i,8400. EYa = 1[]4660, EXY = 61800'
R = 10, EX = 600, EY
otiohecking t}rat two partieular eetE
It wag hu*,,n.. t.iEeI dicuovet,cd at thc tirrt' f*f the
of observatrorlg, rrarnely X= nf,arrA'B;,i=
V; an4 1ft was wrongly taken'valu€
*
eorreer vtluen heing X ff? uod i.A;
t i r:t
'^a
iaa' Computa the eorrtlet
of the eorrCIlatiotl soeffieicnt'
AnE.0,794
Regression and Correlation
24. In order to.find the coefficient of correlation between variables X and Y
from 7 pairs of observations, the following results are giJen:
2X = 220,IY = 47.b6, XXy = lbg4.gg, XX2 = Zggg, Zyz g4l.162g.
=
Later it was found that one particular set of observations, namely X 84,
Y = 4.25 was wrongly taken, the correct values being X= 24,Y G.2b. Compute
=
^ and interpret the correct value ofthe coefficiend ofcirrelation.
=

m
. Ans. 0.9724. There is high degree of positive correlation between x and y.

o
25' Coppute and interpret the correlation coeffieient on the basis of the following

. c
informations:

t
X = maximurn temperature (c") during each day and y sales figures per day

--48, E(yo
=
of lemonade.
X(X - 17) =*2, E(X -L7)'= 218,f(Y-tb)
E (X -'17)(Y * 15) = {g{, n = 30.
s p -LS1l= 1866,

o g
Ans' 0.8695. There is strong positive correlaf,on between temperature and

l*
sale,
26. The following summary statistics were recorded:
n = 20, X̄ = 25, Ȳ = 35, Σ(X - X̄)² = 80, Σ(Y - Ȳ)² = 170, Σ(X - X̄)(Y - Ȳ) = -100
Show that the coefficient of correlation is the geometric mean of regression
coefficients.
Ans. r_xy = -0.86, b_yx = -1.25, b_xy = -0.59

0 ta
given from paired observations of two variables X dnd y:

s
8X- 10Y+ 66 = and 40X- tgy - 214 = O

/: /
s
tt p
h
28. Computed from a data set of (X, Y) values, the following summary statistics
were recorded:
n = 5, ΣX = 15, ΣY = 25, Σ(X - X̄)² = 10, Σ(Y - Ȳ)² = 26, Σ(X - X̄)(Y - Ȳ) = 13.
Compute the two regression equations. Also compute the value of correlation
coefficient.
Ans. Ŷ = 1.3X + 1.1; X̂ = 0.5Y + 0.5; r = 0.81
9tB
29. Thc followinH statiaties have bcen eomputcd:
fue
lue ft = 14,19, Y = 21,66, Sx = 6,T1, Sy
= 6,79, r = 0,661
Oonetruet the tws regrcseion equationa,
A.
Ane:t * 10.7G0 .r=
0,ZrJ7 X * = ?,289 + 0,662 y
30. The following information is given:
X̄ = 20, Ȳ = 40, S_x = 2, S_y = 4, r = 0.70
Predict the most probable value of Y when X = 25 and the most probable value of X
when Y = 30.
Ans. Ŷ = 1.4X + 12; 47; X̂ = 0.35Y + 6; 16.5

om
c
31. If the mean weight of 200 fathers is 140 pounds with standard deviation of 6
pounds and the mean weight of their youngest sons is 142 pounds with
standard deviation of 8 pounds. The coefficient of correlation between them is
0.9. Construct the two regression equations.
Ans. Ŷ = 1.2X - 26; X̂ = 0.675Y + 44.15


g s
o
feecling etuff during

l
8!, The mean rtronthly index numbere.of priece of animal the mean monthly
and

b
10
eertain ynur* i* tr*, tgo with clandard deviation

.
indcx nomUcru of prieuu of ftn*t Siu*,oate it 100
with sta'dard tlcviation 8'

3
eoneffuet thc rcgreesion
the eoeffieinni'oi-*Imelation untwE.o i[em being 0,8, I

4
eqttations ol'Y ou X and X on Y' I

9
AA
!

9
AnclY:: U,84X'ril3,? X = Y+90 t


Chapter 15

ASSOCIATION

15.1 VARIABLE AND ATTRIBUTE

There are 4 persons and their heights in inches are 55, 56, 72 and 74. Here
height is a characteristic and the figures 55, 56, 72 and 74 are the values of a
variable. These figures are the result of measurements. You know that
measurements generate a continuous variable. Thus the variable on heights is a
continuous variable.

Suppose we select 4 bulbs from a certain lot and inspect them. The lot contains
good as well as defective bulbs. The sample may contain 0, 1, 2, 3 or 4
defective bulbs. The values 0, 1, 2, 3 and 4 are the values of a discrete variable.

Out of the 4 persons whose heights are given above, 2 are tall with heights 72 and
74 inches and 2 are short with heights 55 and 56 inches. When we use the words
tall and short, no variable is under consideration. We do not make any
measurements. We only see who is tall and who is short. The quality of being tall
or short is not a variable; it is called an attribute.

Out of 4 bulbs, 2 are good and 2 are defective. Here also no variable is under
consideration. We only count the defective bulbs and the good bulbs. We examine
whether the quality of being defective is present in a bulb or not. The status of
the bulb is an attribute with two outcomes, good and defective. Thus an attribute
is a quality, and the data is collected to see how many objects possess the quality
of being defective and how many do not possess this quality. Other famous
examples of attributes are level of education, level of smoking, level of social
work, level of income, religion and colour etc.

The data on the attribute is the result of recording the presence and absence of a
certain quality (attribute) in the individuals. The data on the variables are called
quantitative data whereas the data on the attributes are called qualitative data or
count data. As the data on the variables is collected for the purpose of analysis of
data and for inference about the population parameters, similarly the data on the
attribute or attributes is collected for the purpose of analysis of data and for
testing of hypotheses about the attributes. We shall discuss hypothesis testing
about attributes in a subsequent topic in this Chapter.
15.1.1 NOTATION FOR ATTRIBUTES
For a single variable we use the symbol X and if there are two variables, we use
the symbols X and Y for them. When there is a single attribute like height, the word
'tall' may be denoted by A and 'short' may be denoted by α. If the tall and the short
persons are divided into 'intelligent' and 'non-intelligent' persons, then 'intelligent'
may be denoted by B and 'non-intelligent' may be denoted by β. The word attribute
is used for the main group like intelligence, and the sub-groups 'intelligent'
and 'non-intelligent' are also called attributes.
15.1.2 ONE ATTRIBUTE
Suppose that there are 100 individuals in a certain sample; the sample size is
denoted by n. These 100 individuals are divided into two mutually exclusive groups
on the basis of the attribute of height. Out of 100, 60 are tall and 40 are short. If
'tall' are denoted by A and 'short' are denoted by α, we can write:

A     α
60    40    n = 100

There are two groups and we say that there are two classes A and α, and the
class frequency under A is 60. It is written as (A) = 60. Similarly the number of
individuals under α is written as (α) = 40. Thus the attributes written within
brackets show their class frequencies. In this example the sample is divided into two
groups i.e. 'tall' and 'short'. The division of data into two groups is called
dichotomy, which means cutting into two. In this example a single attribute 'height'
divides the data in two groups. As only one attribute is involved, the data is called
one-way classification and we can make a small table as below:

One-Way Classification
A           α
60 = (A)    40 = (α)    n = 100

Clearly (A) + (α) = n

The symbols (A) and (α) are used to denote the frequency of individuals who
possess A and who do not possess A (α means not 'A'). It may be noted that the
symbol 'A' is not fixed for 'tall'. In some other discussion 'short' may be
denoted by A.
t t
15.1.3 TWO ATTRIBUTES
The tall and short persons may further be divided into intelligent and
non-intelligent persons. Intelligence may be denoted by B and β may be used for
non-intelligence. The following table shows different attributes and their combinations.
When two attributes are involved, the division of the sample as below is called
two-way classification.

Table 15.1.
Two-Way Classification
          A       α       Total
B        (AB)    (αB)     (B)
β        (Aβ)    (αβ)     (β)
Total    (A)     (α)      n

The column totals are denoted by (A) and (α) and the row totals are denoted by
(B) and (β). The above table contains 2 rows and 2 columns and is therefore called a
2 × 2 contingency table or 2 × 2 cross-tabulation, briefly written as 2 × 2 cross-table.
There may be more than two attributes. The symbols A, B, C are used for the
attributes and α, β, γ are used for the absence of the attributes A, B, C. Thus α
means not A, β means not B and γ means not C.
Suppose that out of 60 tall persons, 30 are intelligent and out of 40 short
persons, 20 are intelligent. We can write these frequencies in the following 2 × 2
contingency Table 15.2.

Table 15.2.
2 × 2 Contingency Table
          A            α            Total
B        (AB) = 30    (αB) = 20    (B) = 50
β        (Aβ) = 30    (αβ) = 20    (β) = 50
Total    (A) = 60     (α) = 40     n = 100

From Table 15.2 we can write some relations immediately.
(i) (A) + (α) = n (ii) (B) + (β) = n
(iii) (A) = (AB) + (Aβ) (iv) (α) = (αB) + (αβ)
(v) (B) = (AB) + (αB) (vi) (β) = (Aβ) + (αβ)
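
The six relations above can be checked mechanically from the four cell frequencies of Table 15.2. The following is a small Python sketch. The dictionary keys, in which the lower-case letters `a` and `b` stand for α and β, are a naming assumption made only for this illustration.

```python
# Cell frequencies of Table 15.2; "a" and "b" stand for alpha and beta.
cells = {"AB": 30, "aB": 20, "Ab": 30, "ab": 20}

# Column totals (relations iii and iv).
freq_A = cells["AB"] + cells["Ab"]
freq_alpha = cells["aB"] + cells["ab"]

# Row totals (relations v and vi).
freq_B = cells["AB"] + cells["aB"]
freq_beta = cells["Ab"] + cells["ab"]

# Grand total (relations i and ii must both give the same n).
n = sum(cells.values())

print(freq_A, freq_alpha, freq_B, freq_beta, n)
```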

t
15.1.4 POSITIVE AND NEGATIVE CLASSES
The classes A, B, AB are called positive classes because they contain all positive
attributes; the classes α, β, αβ are called negative classes because they have
negative attributes. The classes αB and Aβ contain both positive and negative
attributes; they are called mixed or contrary classes.
If we have three attributes A, B, C with their opponents or complements as α, β
and γ, then we can write the different class frequencies as below in Table 15.3.

Table 15.3.
               A                 α            Total
           C       γ         C       γ
B        (ABC)   (ABγ)     (αBC)   (αBγ)      (B)
β        (AβC)   (Aβγ)     (αβC)   (αβγ)      (β)
         (AC)    (Aγ)      (αC)    (αγ)
Total        (A)               (α)             n

In this table the positive classes are A, B, C, AB, AC, BC and ABC whereas the
negative classes are α, β, γ, αβ, αγ, βγ and αβγ. All other classes are mixed.
15.1.5 ORDER OF CLASSES
The order of the class depends upon the number of attributes going into that
class. If a certain class can give us information about only one attribute, it is called
a class of order one.
The classes A, α, B and β are classes of order one and the frequencies (A),
(α), (B) and (β) are the frequencies of order one. The classes AB, Aβ, αB and αβ are
the classes of order two and the frequencies (AB), (Aβ), (αB) and (αβ) are the
frequencies of order two. The classes ABC, ABγ, αBC, αBγ, AβC, Aβγ, αβC and αβγ
are the classes of order three and (ABC), (ABγ), ..., (αβγ) are the frequencies of order
three. The sample size n does not contain any attribute and is therefore called the
frequency of order zero.
s p
g
15.1.6 ULTIMATE CLASS FREQUENCIES
In a certain given situation, the ultimate class frequencies are the frequencies
with the highest order. For two attributes, the ultimate class frequencies are (AB),
(Aβ), (αB) and (αβ).
For three attributes, the ultimate class frequencies are of order 3, which are
(ABC), (ABγ), (αBC), (αBγ), (AβC), (Aβγ), (αβC) and (αβγ).
15.1.7 LOWER ORDER FREQUENCIES IN TERMS OF HIGHER ORDER
FREQUENCIES
Let us discuss the relation of the lower order frequencies in terms of higher
order frequencies. Let us consider the different cases.
(i) Single Attribute
n = (A) + (α)
(ii) Two Attributes
Let us consider the 2 × 2 contingency Table 15.2 for the frequencies of the two
attributes. Clearly
n = (A) + (α)             n = (B) + (β)
(A) = (AB) + (Aβ)         (α) = (αB) + (αβ)
(B) = (AB) + (αB)         (β) = (Aβ) + (αβ)
(iii) Three Attributes
We can use Table 15.3 to write lower order frequencies in terms of higher
order frequencies. Clearly
n = (A) + (α)             n = (B) + (β)
(A) = (AC) + (Aγ). But (AC) = (ABC) + (AβC) and (Aγ) = (ABγ) + (Aβγ)
Thus (A) = (ABC) + (AβC) + (ABγ) + (Aβγ)
Similarly (α) = (αC) + (αγ) = (αBC) + (αβC) + (αBγ) + (αβγ)
n = (A) + (α) = (ABC) + (AβC) + (ABγ) + (Aβγ) + (αBC) + (αβC) + (αBγ) + (αβγ)
(B) = (AB) + (αB) = (ABC) + (ABγ) + (αBC) + (αBγ)
(β) = (Aβ) + (αβ) = (AβC) + (Aβγ) + (αβC) + (αβγ)
and n = (ABC) + (ABγ) + (αBC) + (αBγ) + (AβC) + (Aβγ) + (αβC) + (αβγ)
With the help of Tables 15.2 and 15.3, we can easily write any lower order
frequency in terms of higher order frequencies.
15.1.8 HIGHER ORDER FREQUENCIES INTO LOWER ORDER
FREQUENCIES
Sometimes we have to express the frequency of a higher order into frequencies
of lower order. For this purpose we use the following operators. The frequency (A) is
written as n·A, as if n·A means A's out of n. Similarly the frequency (α) is written
as n·α, (AB) is written as n·AB and (ABC) is written as n·ABC.
We know (A) + (α) = n
Using the operators, we write
n·A + n·α = n          (1)
We assume that the rules of algebra are applicable on these operators. Dividing
equation (1) by n, we get
A + α = 1 or A = 1 - α and α = 1 - A
Similarly we can establish with the help of operators that
B = 1 - β and β = 1 - B
C = 1 - γ and γ = 1 - C
Example 15.1.
Express (AB) in terms of lower order frequencies with the help of operators.
Solution:
We write (AB) = n·AB
Putting A = 1 - α and B = 1 - β
(AB) = n(1 - α)(1 - β) = n[1 - β - α + αβ]
     = n - nβ - nα + nαβ
Writing the original symbols for nβ, nα and nαβ, we have
(AB) = n - (β) - (α) + (αβ) or (AB) = n - (α) - (β) + (αβ)
It is to be noted that the left hand side contains positive attributes and all
attributes on the right side are negative except n. Any attribute on the left
side does not appear on the right side in this type of relation.
Example 15.2.
Express (αβγ) in terms of lower order frequencies.
Solution:
(αβγ) can be written as n·αβγ. Thus (αβγ) = n·αβγ
Using the relations α = 1 - A, β = 1 - B, γ = 1 - C we get
(αβγ) = n(1 - A)(1 - B)(1 - C)
      = n - n·A - n·B - n·C + n·AB + n·AC + n·BC - n·ABC
(αβγ) = n - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)
The attributes on the left side are negative and all attributes on the right side
are positive except one frequency of order zero, that is n.
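
The operator expansion in Example 15.2 is an alternating sum over all subsets of the attributes A, B and C. As a hedged numerical check, the Python sketch below builds the positive class frequencies from a set of ultimate class frequencies and confirms that the expansion recovers the all-negative class frequency. The cell encoding and the helper name `positive_freq` are assumptions made for this illustration, and the numbers are illustrative.

```python
from itertools import combinations

# Illustrative ultimate class frequencies. Each key is (A, B, C) with
# 1 = attribute present and 0 = attribute absent (alpha, beta, gamma).
ultimate = {
    (1, 1, 1): 10, (1, 1, 0): 30, (0, 1, 1): 15, (0, 1, 0): 60,
    (1, 0, 1): 20, (1, 0, 0): 15, (0, 0, 1): 40, (0, 0, 0): 70,
}

def positive_freq(attrs):
    """Frequency of the positive class built from the attribute positions in attrs.

    An empty tuple gives n, the frequency of order zero.
    """
    return sum(f for cell, f in ultimate.items()
               if all(cell[i] == 1 for i in attrs))

# n(1 - A)(1 - B)(1 - C) expanded: subsets of size k enter with sign (-1)**k.
expansion = sum(
    (-1) ** k * positive_freq(subset)
    for k in range(4)
    for subset in combinations(range(3), k)
)

print(expansion, ultimate[(0, 0, 0)])
```

Both printed numbers should be equal, since the right side of the relation is just another way of counting the individuals that possess none of the three attributes.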
Example 15.3.
Given the following frequencies: n = 100, (AB) = 30, (A) = 40, (B) = 70. Calculate
all the remaining frequencies.
Solution:
We know (A) + (α) = n. Thus 40 + (α) = 100 or (α) = 60
We know (B) + (β) = n, hence 70 + (β) = 100 or (β) = 30
Also (B) = (AB) + (αB), hence 70 = 30 + (αB) or (αB) = 40
Also (AB) + (Aβ) = (A), hence 30 + (Aβ) = 40 or (Aβ) = 10
Also (β) = (Aβ) + (αβ), hence 30 = 10 + (αβ) or (αβ) = 20
These frequencies can be calculated very easily if the given frequencies are
substituted in the 2 × 2 contingency table. The unknown frequencies can be
calculated by simple addition or subtraction. Thus

          A              α              Total
B        (AB) = 30      (αB) = [40]    (B) = 70
β        (Aβ) = [10]    (αβ) = [20]    (β) = [30]
Total    (A) = 40       (α) = [60]     n = 100

The unknown frequencies within the rectangles have been calculated by simple
subtraction to complete the table.
Example 15.4.
There are three attributes and their ultimate class frequencies are:
(ABC) = 10   (ABγ) = 30   (αBC) = 15   (αBγ) = 60
(AβC) = 20   (Aβγ) = 15   (αβC) = 40   (αβγ) = 70
Calculate all the negative class frequencies of order one and order two.
Solution:
(α) = (αBC) + (αBγ) + (αβC) + (αβγ) = 15 + 60 + 40 + 70 = 185
(β) = (AβC) + (Aβγ) + (αβC) + (αβγ) = 20 + 15 + 40 + 70 = 145
(γ) = (ABγ) + (Aβγ) + (αBγ) + (αβγ) = 30 + 15 + 60 + 70 = 175
(αβ) = (αβC) + (αβγ) = 40 + 70 = 110
(αγ) = (αBγ) + (αβγ) = 60 + 70 = 130
(βγ) = (Aβγ) + (αβγ) = 15 + 70 = 85
These unknown frequencies can be calculated with the help of the following table.

                                                                Total
         (ABC) = 10   (ABγ) = 30   (αBC) = 15   (αBγ) = 60    (B) = 115
         (AβC) = 20   (Aβγ) = 15   (αβC) = 40   (αβγ) = 70    (β) = 145
         (AC) = 30    (Aγ) = 45    (αC) = 55    (αγ) = 130
Total    (A) = 75                  (α) = 185                  n = 260

Clearly (α) = 185, (β) = 145,
(γ) = 30 + 15 + 60 + 70 = 175, (αβ) = 40 + 70 = 110,
(αγ) = 60 + 70 = 130, (βγ) = 15 + 70 = 85
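The sums in Example 15.4 all follow one pattern: a class frequency of lower order is the total of every ultimate class frequency that agrees with it on the attributes it names. A minimal sketch of that pattern is given below. The dictionary layout and the function name `class_frequency` are illustrative assumptions, not notation from this book; `True` stands for a positive attribute (A, B, C) and `False` for a negative one (α, β, γ).

```python
# Ultimate class frequencies of Example 15.4, keyed by (A?, B?, C?).
# True = positive attribute (A, B, C); False = negative attribute (α, β, γ).
ultimate = {
    (True, True, True): 10,    # (ABC)
    (True, True, False): 30,   # (ABγ)
    (False, True, True): 15,   # (αBC)
    (False, True, False): 60,  # (αBγ)
    (True, False, True): 20,   # (AβC)
    (True, False, False): 15,  # (Aβγ)
    (False, False, True): 40,  # (αβC)
    (False, False, False): 70, # (αβγ)
}


def class_frequency(ultimate, a=None, b=None, c=None):
    """Add up the ultimate frequencies that match the fixed attributes.

    An argument left as None means that attribute is not restricted.
    """
    wanted = (a, b, c)
    return sum(
        freq
        for key, freq in ultimate.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


n = class_frequency(ultimate)                            # order zero
alpha = class_frequency(ultimate, a=False)               # (α)
beta = class_frequency(ultimate, b=False)                # (β)
gamma = class_frequency(ultimate, c=False)               # (γ)
alpha_beta = class_frequency(ultimate, a=False, b=False)   # (αβ)
alpha_gamma = class_frequency(ultimate, a=False, c=False)  # (αγ)
beta_gamma = class_frequency(ultimate, b=False, c=False)   # (βγ)

print(n, alpha, beta, gamma, alpha_beta, alpha_gamma, beta_gamma)
```

The same function gives positive class frequencies as well, for instance `class_frequency(ultimate, a=True)` for (A).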

15.2 CONSISTENCY
If the class frequencies are observed in a certain sample data and all class frequencies are recorded correctly then there will be no error in them and they will be called consistent. But sometimes the class frequencies are not recorded correctly and their column total and row total do not agree with the grand total. If there is some error in any class frequency, then we say that the frequencies are inconsistent. If one class frequency is wrong, it will affect some other frequencies as well. A simple test of consistency is that no frequency should be negative. If any frequency is negative, it means that there is inconsistency in the sample data. If the data is consistent, none of the ultimate class frequencies will be negative (a frequency may be zero).
Example 15.5.
Given the frequencies: n = 115, (B) = 45, (A) = 50 and (AB) = 50.
Check for consistency of the data.
Solution:
The data is called consistent if none of the ultimate class frequencies is negative.
Let us calculate some frequencies of order two.
We know (AB) + (Aβ) = (A)
Here (A) = 50 and (AB) = 50
Thus 50 = 50 + (Aβ) or (Aβ) = 0
It does not indicate inconsistency because some frequency can be zero.
We know (B) = (AB) + (αB)
45 = 50 + (αB) or (αB) = -5
The data is inconsistent. It means the given frequencies are wrong. If we make a table of (2 x 2), we get

             A              α             Total
B        (AB) = 50      (αB) = -5      (B) = 45
β        (Aβ) = 0       (αβ) = 70      (β) = 70
Total    (A) = 50       (α) = 65       n = 115

One frequency (αB) is negative in the table. Thus the sample data is inconsistent.
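The check carried out in Example 15.5 can be sketched in a few lines: derive the four ultimate class frequencies from n, (A), (B) and (AB) by subtraction, then report any that are negative. The function name `negative_classes_2x2` is an illustrative assumption, not terminology from this book.

```python
def negative_classes_2x2(n, a_total, b_total, ab):
    """Return the ultimate class frequencies of a 2 x 2 table that are negative.

    An empty result means the given frequencies pass the consistency test.
    """
    cells = {
        "(AB)": ab,
        "(Aβ)": a_total - ab,               # (A) - (AB)
        "(αB)": b_total - ab,               # (B) - (AB)
        "(αβ)": n - a_total - b_total + ab, # n - (A) - (B) + (AB)
    }
    return {name: freq for name, freq in cells.items() if freq < 0}


# Example 15.5: n = 115, (A) = 50, (B) = 45, (AB) = 50
bad = negative_classes_2x2(n=115, a_total=50, b_total=45, ab=50)
print(bad if bad else "consistent")

# Example 15.3: n = 100, (A) = 40, (B) = 70, (AB) = 30
ok = negative_classes_2x2(n=100, a_total=40, b_total=70, ab=30)
print(ok if ok else "consistent")
```

A zero frequency such as (Aβ) = 0 is not reported, in line with the remark that some frequency can be zero.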
Example 15.6.
In a certain big college, 600 students of Intermediate level were interviewed. They were asked to give their opinion about liking or disliking in the subjects of Mathematics, Statistics and Physics. The sample data sent by the enumerator was:
300 liked Mathematics, 350 liked Statistics, 340 liked Physics,
130 liked Mathematics and Statistics, 160 liked Mathematics and Physics, 180 liked Physics and Statistics,
100 liked all the three subjects. Examine the data for consistency.
Solution:
All the given frequencies can be written in the form of attributes. Let A, B, C denote liking Mathematics, Statistics and Physics respectively and α, β, γ are their opposites, the disliking of the subjects. We are given
n = 600   (A) = 300   (B) = 350   (C) = 340
(AB) = 130   (AC) = 160   (BC) = 180   (ABC) = 100
All the given frequencies are positive, we can therefore calculate a negative class frequency of order three which is (αβγ).
Now (αβγ) = n . αβγ = n(1 - A)(1 - B)(1 - C)
          = n - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)
          = 600 - 300 - 350 - 340 + 130 + 160 + 180 - 100 = -20
A negative frequency indicates that the sample data sent by the enumerator is incorrect (inconsistent).
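The expansion used in Example 15.6 is easy to evaluate mechanically. The sketch below simply codes the relation (αβγ) = n - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC); the function name and argument names are illustrative assumptions.

```python
def alpha_beta_gamma(n, a, b, c, ab, ac, bc, abc):
    """(αβγ) from the positive class frequencies of three attributes."""
    return n - a - b - c + ab + ac + bc - abc


# Figures sent by the enumerator in Example 15.6
freq = alpha_beta_gamma(n=600, a=300, b=350, c=340,
                        ab=130, ac=160, bc=180, abc=100)
print(freq)
if freq < 0:
    print("inconsistent: (αβγ) is negative")
```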
15.3 INDEPENDENCE OF ATTRIBUTES
Let us consider certain examples before we discuss the independence in a formal manner.
Example 15.7.
Consider the following sample data on the liking of males and females for fish.

                          Gender
                     Males    Females    Total
Like Fish              80        80       160
Do not like Fish       20        20        40
Total                 100       100       200

Discussion: There are 100 males out of which 80 like fish and out of 100 females 80 like fish. Males and females have the same liking for fish. We say that there is independence between gender and liking or disliking of fish. Another way of saying this is that there is no relation between the gender and liking for fish.
Example 15.8.
Consider the following sample data on smoking by adult males and adult females:

                          Gender
                     Males    Females    Total
Smokers                20         1        21
Non-smokers            80        99       179
Total                 100       100       200

There are 100 males out of which 20 are smokers and out of 100 females there is only one smoker. It means that the smokers in males are 20 times more than the smokers among females. Males have a strong relation with smoking. Thus males and smoking are strongly associated. We say that there is positive association between males and smoking. There are 99 females who are non-smokers as compared to 80 male non-smokers. Thus females are inclined towards non-smoking. The association between females and non-smoking is of positive type. There is only 1 female smoker as compared to 20 male smokers, thus there is negative association between females and smoking and there is also negative association between males and non-smoking. Thus in a certain contingency table, when there is positive association between two attributes, then in the same table there exists negative association between some other pairs of attributes. If there is positive association between A and B, then α and β are also positively associated. In this case there is negative association between A and β and between α and B.
The data in the Example 15.8 may be written as:

                     Males    Females
                       A         α       Total
Non-smokers, B         80        99       179
                      (AB)      (αB)
Smokers, β             20         1        21
                      (Aβ)      (αβ)
Total                 100       100       200

In this table 80 is less than 99 and 20 is greater than 1 (or 1 is less than 20). There is negative association between A and B and between α and β. There is positive association between A and β and between α and B. If the attributes in the one diagonal have positive association, then the attributes in the other diagonal have negative association.
15.3.1 DEFINITION OF INDEPENDENCE
We know that in probability, the two events A and B are called independent if the joint probability of A ∩ B is equal to the product of the marginal probabilities of A and B. Thus for independence of A and B
P(A ∩ B) = P(A) P(B)
The same logic applies for defining independence of attributes. The two attributes are called independent if the probability of (AB) is equal to the product of the probability of A and the probability of B. Consider a 2 x 2 contingency table as below:

             A         α        Total
B          (AB)      (αB)       (B)
β          (Aβ)      (αβ)       (β)
Total      (A)       (α)         n

If one individual is selected out of this table, then
P(AB) = (AB)/n,  P(A) = (A)/n,  P(B) = (B)/n
For independence P(AB) = P(A) . P(B)
(AB)/n = (A)/n x (B)/n  or  (AB) = (A)(B)/n
This is called rule or criterion of independence of two attributes A and B. The class frequency (AB) is called observed frequency and (A)(B)/n is called expected frequency when A and B are independent. For independence of A and B, the rule is (AB) = (A)(B)/n. But this rule is applicable only on the attributes A and B. Similarly for independence of other attributes, we have the rules:
(αB) = (α)(B)/n,  (Aβ) = (A)(β)/n  and  (αβ) = (α)(β)/n
When (AB) > (A)(B)/n, then there is positive association between A and B.
Positive association between A and B means that proportion of A's in B's is greater than the proportion of A's in β's.
When (AB) < (A)(B)/n, there is negative association between A and B.
Negative association between A and B means that proportion of A's in B's is less than the proportion of A's in β's.
It is important to note that if A and B are associated in a positive manner, then α and β are also associated in the positive manner and other pairs Aβ and αB will have negative association.
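The rule above compares an observed frequency (AB) with its expected frequency (A)(B)/n. A minimal sketch of that comparison follows; the function name `association` and its return format are illustrative assumptions. To avoid rounding problems the code compares (AB) x n with (A) x (B), which is the same test written without division.

```python
def association(ab, a_total, b_total, n):
    """Classify the association between A and B in a 2 x 2 table.

    Returns the kind of association and the expected frequency (A)(B)/n.
    """
    expected = a_total * b_total / n
    if ab * n > a_total * b_total:
        kind = "positive"
    elif ab * n < a_total * b_total:
        kind = "negative"
    else:
        kind = "independent"
    return kind, expected


# Example 15.7: A = males, B = like fish
fish = association(ab=80, a_total=100, b_total=160, n=200)

# Example 15.8: A = males, B = non-smokers
smoking = association(ab=80, a_total=100, b_total=179, n=200)

print(fish)
print(smoking)
```

Any of the other three pairs can be examined by passing its own observed frequency and marginal totals, for example (Aβ), (A) and (β).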
15.3.2 ANOTHER DEFINITION OF INDEPENDENCE
The two attributes A and B are called independent if the proportion of A's in B's is the same as in non B's (β's).
Proportion of A's in B's = (AB)/(B)      Proportion of A's in β's = (Aβ)/(β)
For independence these two proportions are equal.
Thus (AB)/(B) = (Aβ)/(β)      [if a/b = c/d, then a/b = c/d = (a + c)/(b + d)]
Therefore (AB)/(B) = (Aβ)/(β) = [(AB) + (Aβ)] / [(B) + (β)] = (A)/n
Thus (AB)/(B) = (A)/n  or  (AB) = (A)(B)/n
This is called a simple rule of independence between A and B.
If A and B are independent then all the other pairs in the table are also independent. But if there is positive association in the two pairs AB and αβ, then the other two pairs Aβ and αB will have negative association as explained earlier.
Let us consider the data of Example 15.7.

             A               α              Total
B        (AB) = 80       (αB) = 80       (B) = 160
β        (Aβ) = 20       (αβ) = 20       (β) = 40
Total    (A) = 100       (α) = 100       n = 200

Here (AB) = 80 and (A)(B)/n = 100 x 160 / 200 = 80
Thus (AB) = (A)(B)/n. Hence A and B are independent.
It also implies independence between A and β, α and B and α and β. Let us check another pair.
(αβ) = 20 and (α)(β)/n = 100 x 40 / 200 = 20
(αβ) = (α)(β)/n, there is independence between α and β.
The students may check the other classes. The independence in this table means that men and women have the same liking for fish.
Example 15.9.
Men and women go to a certain store for buying the articles. They make the payment in cash or purchase on credit (loan). Investigate if there is any relation between mode of payment and the sex of the customer. Given the data below:

                  Payment
Sex            Cash    Credit
Males           80       40
Females         20       60

Solution:
Let us write the table along with the symbols

                 Cash, A        Credit, α       Total
Males, B        (AB) = 80      (αB) = 40      (B) = 120
Females, β      (Aβ) = 20      (αβ) = 60      (β) = 80
Total           (A) = 100      (α) = 100      n = 200

(AB) = 80 and (A)(B)/n = 100 x 120 / 200 = 60, so (AB) > (A)(B)/n
There is positive association between A and B. It means that males make the cash payments with greater frequency than the females. If we check the pair (Aβ), we will find negative association.
(Aβ) = 20 and (A)(β)/n = 100 x 80 / 200 = 40, so (Aβ) < (A)(β)/n
Thus there is negative association between A and β. Females are less inclined to make the cash payment. It is also clear from the given data. Out of 120 males, 80 make the payment on cash. 80 out of 120 means that 80/120 x 100 = 66.7% males make cash payment. 20 out of 80 means that 20/80 x 100 = 25% females make the cash payment. Thus males and cash payment go together with high frequency and are called positively related or associated.
Example 15.10.
We wish to determine if there is any difference in the popularity of football between college educated males and non college educated males. A sample of 100 college educated males showed that 55 were football fans. A sample of 200 non college educated males revealed that 125 were football fans. Is there any evidence of a difference in football popularity between college educated and non college educated males?
Solution:
We put the data in the following table.

                         College educated    Non college
                              males          educated males
                                A                 α              Total
Football fans, B           (AB) = 55         (αB) = 125       (B) = 180
Not football fans, β       (Aβ) = 45         (αβ) = 75        (β) = 120
Total                      (A) = 100         (α) = 200        n = 300

Here (AB) = 55, (A)(B)/n = 100 x 180 / 300 = 60, so (AB) < (A)(B)/n
Thus there is negative association between A and B. College-educated males show less interest for football as compared to non college educated males. There is positive association between α and B. More of non college educated males are football fans as compared to college-educated males. Thus football is more popular among non college educated males. But here we are comparing only one observed frequency with the corresponding expected frequency. In Example 15.12, we shall compare all the observed frequencies with the corresponding expected frequencies. In Example 15.12 our inference will be different and we shall decide whether there is independence between the attributes or not.

15.4 COEFFICIENT OF ASSOCIATION
When it is desired to calculate the level of association, we can calculate coefficient of association denoted by Q, where
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
This is called Yule's coefficient of association. It lies between -1 and +1. It is explained in the same manner as the coefficient of correlation r between the two random variables X and Y.
If Q = -1 it is perfect negative association between the attributes on the top left corner in the 2 x 2 cross table.
If Q = 0 it means independence.
If Q = 1 it means perfect positive association between attributes.
Let us calculate Q from the data given in Example 15.10.
Q = (55 x 75 - 45 x 125) / (55 x 75 + 45 x 125) = (4125 - 5625) / (4125 + 5625) = -1500/9750 = -0.15
This indicates negative association between A and B. It is the same result as obtained earlier in Example 15.10.
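Yule's coefficient needs only the four cell frequencies of the 2 x 2 table. The short sketch below evaluates it for the football data of Example 15.10 and for the fish data of Example 15.7; the function name `yules_q` is an illustrative assumption.

```python
def yules_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association Q for a 2 x 2 table."""
    numerator = ab * alpha_beta - a_beta * alpha_b
    denominator = ab * alpha_beta + a_beta * alpha_b
    return numerator / denominator


# Example 15.10: (AB) = 55, (Aβ) = 45, (αB) = 125, (αβ) = 75
q_football = yules_q(ab=55, a_beta=45, alpha_b=125, alpha_beta=75)

# Example 15.7: (AB) = 80, (Aβ) = 20, (αB) = 80, (αβ) = 20
q_fish = yules_q(ab=80, a_beta=20, alpha_b=80, alpha_beta=20)

print(round(q_football, 2), q_fish)
```

The second value is zero, which agrees with the independence found for the fish data.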
15.5 χ²-DISTRIBUTION
Chi-square written as χ² is a statistic which has a positively skewed distribution as shown in the figure below. The value of χ² varies from 0 to ∞. χ² cannot take any negative value. The shape of the χ²-distribution depends upon the degrees of freedom which is calculated from the given sample. χ²-distribution can be used for various purposes. One of the applications of χ² is to test the independence between the attributes.
[Figure: positively skewed curve of the χ²-distribution]
15.5.1 TEST OF INDEPENDENCE
With the help of χ²-distribution, we can test whether the attributes are independent or there is association between them. The procedure of test is as below:
1. The null hypothesis H0 is framed.
   We assume that there is independence between the attributes.
   The alternative hypothesis H1 is that there is association between the attributes.
2. Level of significance α is decided.
3. Test-statistic used is χ² = Σ [ (fo - fe)² / fe ]
   where fo stands for observed frequency and fe stands for expected frequency calculated under the assumption that attributes are independent.
4. Computations:
   The χ²-statistic can be used to check the independence in a contingency table containing any number of columns and rows. Let us first explain the calculation of χ² on a 2 x 2 contingency table. The four cell frequencies can be arranged in the form of a table. It is only for our convenience that we write the frequencies in the form of a table having columns and rows.

   2 x 2 Contingency Table
                A         α        Total
   B          (AB)      (αB)       (B)
   β          (Aβ)      (αβ)       (β)
   Total      (A)       (α)         n

   Our null hypothesis is that the attributes are independent. If A and B are independent then the observed frequency (AB) is equal to (A)(B)/n. Using this approach, we calculate the expected frequencies for (AB), (αB), (Aβ) and (αβ). The expected frequencies are calculated under the assumption that null hypothesis H0 is true.
   Expected Frequencies Calculated
                A              α            Total
   B        (A)(B)/n       (α)(B)/n         (B)
   β        (A)(β)/n       (α)(β)/n         (β)
   Total      (A)            (α)             n

   It may be noted that the column and row totals in the table of observed frequencies are the same as in the table of expected frequencies. The observed frequencies are denoted by fo and the expected frequencies are denoted by fe. For the calculation of χ² we write the expected frequencies corresponding to their observed frequencies. The necessary calculations are done as shown in the following columns:

   Observed frequencies    Expected frequencies    (fo - fe)² / fe
           fo                      fe
          (AB)                  (A)(B)/n           [(AB) - (A)(B)/n]² / [(A)(B)/n]
          (Aβ)                  (A)(β)/n           [(Aβ) - (A)(β)/n]² / [(A)(β)/n]
          (αB)                  (α)(B)/n           [(αB) - (α)(B)/n]² / [(α)(B)/n]
          (αβ)                  (α)(β)/n           [(αβ) - (α)(β)/n]² / [(α)(β)/n]
            n                       n                      χ²

5. Critical region:
   The critical region in this test always lies in the right side of the distribution. It depends upon the level of significance α and the degrees of freedom. In test of independence, the degree of freedom is calculated as below:
   degrees of freedom (d.f.) = (r - 1)(c - 1)
   where r is the number of rows and c is the number of columns in the contingency table. The critical value of χ² is seen from the table of χ². For level of significance α, and degrees of freedom (r - 1)(c - 1), the table value is denoted by χ²α[(r - 1)(c - 1)]. In a χ²-table, under the column heading α and against d.f. = (r - 1)(c - 1) given in the left column, we read the value of χ²α(d.f.). When α = 0.05, d.f. = 1, then χ²0.05(1) = 3.841.
6. Conclusion:
   The hypothesis of independence is rejected if the calculated value of χ² lies in the rejection region. The rejection of hypothesis means that the attributes are associated.
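The six steps can be sketched as one short routine: compute each expected frequency as (row total x column total)/n, accumulate (fo - fe)²/fe, and compare the result with the table value. The function name `chi_square` is an illustrative assumption, and the small dictionary holds only the two 5% critical values quoted in this chapter (3.841 for 1 d.f. and 5.991 for 2 d.f.); it is not a full χ²-table.

```python
def chi_square(observed):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency under H0
            chi2 += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# 5% critical values quoted in the text, keyed by degrees of freedom
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991}

# Rows are B and β, columns are A and α.
fish = [[80, 80], [20, 20]]        # data of Example 15.7
football = [[55, 125], [45, 75]]   # data of Example 15.10

for name, table in (("fish", fish), ("football", football)):
    chi2, df = chi_square(table)
    decision = "reject H0" if chi2 > CRITICAL_5_PERCENT[df] else "accept H0"
    print(name, round(chi2, 4), df, decision)
```

Because the routine works from row and column totals, the same call handles tables with more than two rows or columns.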

Example 15.11.
Calculate χ² by using the data given in Example 15.7 to test the independence between the gender and liking for fish. Use α = 0.05.
Solution:
The data of Example 15.7 is reproduced here

                     Males    Females    Total
Like Fish              80        80       160
Do not like Fish       20        20        40
Total                 100       100       200

We write the hypotheses as below:
1. H0: There is independence between gender and liking for fish.
   H1: There is association between gender and liking for fish.
2. Level of significance α is given, α = 0.05.
3. Test-statistic used is χ² where χ² = Σ [ (fo - fe)² / fe ]
4. Computations:
   The given table of observed frequencies is written as:

             A               α              Total
   B     (AB) = 80       (αB) = 80       (B) = 160
   β     (Aβ) = 20       (αβ) = 20       (β) = 40
   Total (A) = 100       (α) = 100       n = 200

   The corresponding expected frequencies are calculated as below:

             A                                   α                                   Total
   B     (A)(B)/n = 100 x 160/200 = 80       (α)(B)/n = 100 x 160/200 = 80        160
   β     (A)(β)/n = 100 x 40/200 = 20        (α)(β)/n = 100 x 40/200 = 20         40
   Total 100                                 100                                  200

   The necessary columns are as below:

    fo      fe     fo - fe    (fo - fe)²    (fo - fe)² / fe
    80      80        0           0               0
    20      20        0           0               0
    80      80        0           0               0
    20      20        0           0               0
   200     200        0                        χ² = 0

   Here d.f. = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1
5. Critical region: χ² > χ²0.05(1) = 3.841
6. Conclusion: The calculated value of χ² = 0 which falls in the acceptance region. Thus hypothesis H0 of independence is accepted. When χ² = 0, it means perfect independence between the attributes. Males and females have exactly equal liking for eating fish.
Example 15.12.
Let us consider the data of Example 15.10 and calculate χ² to examine the independence between college education and liking for football.
Solution:
The data of Example 15.10 is written as below:

                         College educated    Non college
                              males          educated males
                                A                 α              Total
Football fans, B           (AB) = 55         (αB) = 125       (B) = 180
Not football fans, β       (Aβ) = 45         (αβ) = 75        (β) = 120
Total                      (A) = 100         (α) = 200        n = 300

1. We frame the hypotheses as:
   H0: There is independence between type of education and interest for football.
   H1: There is association between type of education and their liking for football.
2. Level of significance, α is decided. Let α = 0.05
3. Test-statistic used is χ² where χ² = Σ [ (fo - fe)² / fe ]
4. Computations:
   The expected frequencies under the assumption that H0 is true are calculated as below:
   (AB) = (A)(B)/n = 100 x 180/300 = 60       (Aβ) = (A)(β)/n = 100 x 120/300 = 40
   (αB) = (α)(B)/n = 200 x 180/300 = 120      (αβ) = (α)(β)/n = 200 x 120/300 = 80

   Calculation of χ²
    fo      fe     fo - fe    (fo - fe)²    (fo - fe)² / fe
    55      60       -5          25             0.4167
    45      40        5          25             0.6250
   125     120        5          25             0.2083
    75      80       -5          25             0.3125
   300     300        0                      χ² = 1.5625

5. Critical region: d.f. = (2 - 1)(2 - 1) = 1, so the critical region is χ² > χ²0.05(1) = 3.841
6. Conclusion: The calculated value of χ² = 1.5625 is less than the critical value. Thus the hypothesis of independence is accepted. It means that college-educated males and non college educated males have the same liking for football. This result is different from the result given in Example 15.10. In Example 15.10 only one observed frequency (AB) = 55 was compared with its corresponding expected frequency 100 x 180/300 = 60. The difference between 55 and 60 is not very large. They are very close. In χ² all the expected frequencies are compared with their observed frequencies. The χ²-test is a very powerful test for test of independence. We shall admit the result or conclusion based on the χ²-test. Thus H0 is accepted.
15.5.2 DIRECT FORMULA FOR CALCULATING χ² IN 2 x 2 CONTINGENCY TABLE
In a 2 x 2 contingency table the value of χ² can be calculated without calculating the expected frequencies. Suppose a 2 x 2 contingency table has four cell frequencies as distributed below:

                     1st Attribute          Total
2nd Attribute          a        b           a + b
                       c        d           c + d
Total                a + c    b + d     a + b + c + d

The value of χ² can be calculated directly by using the formula:
χ² = (a + b + c + d)(ad - bc)² / [(a + b)(c + d)(a + c)(b + d)]
The proof of this formula is beyond the level of this book.
Let us calculate χ² by using the above formula from the data given in Example 15.12. From the data given in Example 15.12, we have
a = 55, b = 125, c = 45 and d = 75
Thus χ² = (55 + 125 + 45 + 75)(55 x 75 - 125 x 45)² / [(55 + 125)(125 + 75)(45 + 75)(55 + 45)]
        = 300 x (-1500)² / [(180)(200)(120)(100)]
        = 675/432 = 1.5625
This answer is the same as calculated in Example 15.12.
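The direct formula translates into a one-line computation. The sketch below applies it to the four cell frequencies taken from Example 15.12; the function name `chi_square_2x2` is an illustrative assumption.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table by the direct formula (no expected frequencies)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# a = 55, b = 125, c = 45, d = 75 as in Example 15.12
value = chi_square_2x2(55, 125, 45, 75)
print(value)
```

The formula divides by the four marginal totals, so it cannot be used when any row or column total is zero.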

15.6 CONTINGENCY TABLE OF HIGHER ORDER
Sometimes a certain characteristic or attribute has more than two categories. For example when we are talking about heights of persons, the population or sample can be divided into four classes or categories like, very tall, tall, medium and short. In general if the attribute is A, then its different levels are denoted by A1, A2, ..., Ar if it has r categories. The same population or sample may also be divided according to another characteristic say B with its levels B1, B2, ..., Bc with c categories. The sample data on two attributes can be written in the form of two-way classification as below:

Table 15.4.
Two-way Classification
                                    Attribute B
Attribute A      B1        B2      ...    Bj      ...    Bc       Row Totals
A1             (A1B1)    (A1B2)    ...  (A1Bj)    ...  (A1Bc)       (A1)
A2             (A2B1)    (A2B2)    ...  (A2Bj)    ...  (A2Bc)       (A2)
...
Ai             (AiB1)    (AiB2)    ...  (AiBj)    ...  (AiBc)       (Ai)
...
Ar             (ArB1)    (ArB2)    ...  (ArBj)    ...  (ArBc)       (Ar)
Column Totals   (B1)      (B2)     ...   (Bj)     ...   (Bc)          n

Table 15.4 contains r rows and c columns, it is therefore called r x c contingency table. Each frequency in the table is called cell frequency. It is the extension of 2 x 2 contingency table and χ²-statistic is used to test the independence between the attributes given in the rows and columns.
The procedure is the same as explained earlier. For each observed frequency in the sample data, the corresponding expected frequency is calculated. It is calculated on the assumption that there is independence between the two characteristics. For each observed frequency (AiBj) the expected frequency is (Ai)(Bj)/n where (Ai) is the total of the row Ai and (Bj) is the total of the column Bj. For expected frequency E, a more general formula may be written as
E = (R x C)/n where R is the row total, C is the column total and n is the grand total.
χ² is calculated by the formula
χ² = Σ [ (fo - fe)² / fe ]
15.7 LIMITATIONS OF χ²
The χ²-test of independence gives very good results or conclusions when all the cell frequencies are very large. For small cell frequencies the test is not very reliable. χ²-test should not be used if any expected frequency is less than 5. If any expected frequency is less than 5, then something is done about it. One column containing the small frequency/frequencies is added to the adjacent column before calculating χ². Similarly if some row has expected frequencies less than 5, the entire row is added to the adjacent row by adding the corresponding cell frequencies. If we have the choice to reduce the number of rows or columns, we should choose that column or row which we think is least important in the given data and this column or row should be added to the adjacent column or row.
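The pooling rule can be sketched as two small helpers: one lists the cells whose expected frequency is below 5, the other adds one column into its neighbour. The function names are illustrative assumptions, and the figures are the observed frequencies of the 3 x 3 table used in Example 15.14 (rows are sons, columns are fathers).

```python
def expected_frequencies(observed):
    """Expected frequency (row total x column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_cells(observed, limit=5):
    """(row, column) positions whose expected frequency is below the limit."""
    expected = expected_frequencies(observed)
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, fe in enumerate(row)
        if fe < limit
    ]


def merge_columns(observed, src, dst):
    """Add column src into column dst, then drop column src."""
    merged = []
    for row in observed:
        new_row = list(row)
        new_row[dst] += new_row[src]
        del new_row[src]
        merged.append(new_row)
    return merged


observed = [[10, 35, 5], [150, 140, 15], [40, 95, 20]]
print(small_cells(observed))            # positions are 0-based

pooled = merge_columns(observed, src=2, dst=1)
print(pooled, small_cells(pooled))
```

Which column or row to pool remains a judgment about what is least important in the data; the code only carries out the addition once that choice is made.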
Example 15.13.
In a public opinion survey, 2000 persons were interviewed to give their opinion. The individuals interviewed are classified according to their attitude on a certain social scheme and according to sex. The data is given in the table below:

           Favour    Oppose    Undecided    Total
Men          600       320        280        1200
Women        450       280         70         800
Total       1050       600        350        2000

Calculate χ² to examine whether men and women differ in their opinion about the social scheme.
Solution:
1. The null hypothesis H0 is that there is independence between the sex and their attitude towards the social scheme.
   The alternative hypothesis H1 is that there is association between the two characteristics.
2. Level of significance: Let α = 0.05
3. Test-statistic: χ² = Σ [ (fo - fe)² / fe ]
4. Computations: Let A1 and A2 denote the rows and B1, B2 and B3 denote the columns. The given table can be written as:

              B1             B2             B3            Total
   A1         600            320            280        (A1) = 1200
   A2         450            280             70        (A2) = 800
   Total  (B1) = 1050    (B2) = 600     (B3) = 350      n = 2000

   Expected frequencies fe are calculated as below:

              B1                        B2                       B3                      Total
   A1     1050 x 1200/2000 = 630    600 x 1200/2000 = 360    350 x 1200/2000 = 210    (A1) = 1200
   A2     1050 x 800/2000 = 420     600 x 800/2000 = 240     350 x 800/2000 = 140     (A2) = 800
   Total  (B1) = 1050               (B2) = 600               (B3) = 350               2000 = n

   It is important to note that the column and row totals are equal in the original table and table of expected frequencies.

   χ²-Calculated
     fo      fe     (fo - fe)    (fo - fe)²    (fo - fe)² / fe
    600     630       -30           900             1.43
    450     420        30           900             2.14
    320     360       -40          1600             4.44
    280     240        40          1600             6.67
    280     210        70          4900            23.33
     70     140       -70          4900            35.00
   2000    2000         0                       χ² = 73.01

5. Region of rejection:
   d.f. = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
   χ² > χ²0.05(2) = 5.991
6. Conclusion: The calculated value of χ² is 73.01 and the critical value of χ² is 5.991. The χ² calculated from the sample data falls in the rejection region. Thus hypothesis of independence is rejected. It means that men and women have different opinions about the social scheme. Sex is associated with the attitude towards the social scheme.
Example 15.14.
Given the following table, calculate χ² to examine whether there is evidence of relationship between the intelligence levels of fathers and sons. Use α = 0.05

                                    Fathers
Sons                Very           Average      Non-           Total
                    Intelligent                 Intelligent
Very Intelligent        10            35             5            50
Average                150           140            15           305
Non-Intelligent         40            95            20           155
Total                  200           270            40           510

Solution:
1. The null hypothesis to be tested is that there is no relationship between the intelligence of fathers and sons.
   The alternative hypothesis is that there is relationship (association) between the intelligence level of fathers and sons.
2. Level of significance: α = 0.05
3. Test-statistic: χ² = Σ [ (fo - fe)² / fe ]
4. Computations:
   Let A1, A2, A3 be used for the rows and B1, B2, B3 be used for column headings.

   Table of expected frequencies calculated
              B1                        B2                        B3                      Total
   A1     200 x 50/510 = 19.6       270 x 50/510 = 26.5       40 x 50/510 = 3.9       (A1) = 50
   A2     200 x 305/510 = 119.6     270 x 305/510 = 161.5     40 x 305/510 = 23.9     (A2) = 305
   A3     200 x 155/510 = 60.8      270 x 155/510 = 82.0      40 x 155/510 = 12.2     (A3) = 155
   Total  (B1) = 200                (B2) = 270                (B3) = 40               n = 510


One expected frequency under the column B3 and against row A1 is 3.9 which is less than 5. This frequency cannot be used in the calculation of χ². Now we have two options (i) column B3 is added to column B2 (ii) Row A1 is added to row A2. But the total of column B3 is 40 which is minimum of all the column and row totals. It means column B3 is less important as compared to row A1. Thus column B3 is added to column B2. This is equivalent to combining a small sample data with a large sample data. Thus the tables of observed frequencies and the expected frequencies would become:

   Observed Frequencies
              B1          B2 + B3            Total
   A1         10        35 + 5 = 40            50
   A2        150       140 + 15 = 155         305
   A3         40        95 + 20 = 115         155
   Total     200            310               510

   Expected Frequencies
              B1          B2 + B3                  Total
   A1        19.6       26.5 + 3.9 = 30.4            50
   A2       119.6      161.5 + 23.9 = 185.4         305
   A3        60.8       82.0 + 12.2 = 94.2          155
   Total    200             310.0                   510

   χ²-Calculated
     fo      fe      (fo - fe)    (fo - fe)²    (fo - fe)² / fe
     10     19.6       -9.6         92.16            4.70
    150    119.6       30.4        924.16            7.73
     40     60.8      -20.8        432.64            7.12
     40     30.4        9.6         92.16            3.03
    155    185.4      -30.4        924.16            4.98
    115     94.2       20.8        432.64            4.59
    510    510          0                        χ² = 32.15

5. Region of rejection:
   d.f. = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
   χ² > χ²0.05(2) = 5.991
6. Conclusion: Since the calculated value of χ² = 32.15 is greater than the critical value of 5.991, the null hypothesis H0 is rejected and H1 is accepted. It means that there is relationship (association) between the intelligence levels of fathers and sons. Intelligent fathers have usually intelligent sons. This is what the sample data indicates through the χ² as test of independence.
15.8 RANK CORRELATION
We are often confronted with situations where the basic data are not available in numerical magnitudes but where the rankings can be developed and used to examine the relationship between data sets. To calculate the Spearman's rank correlation coefficient, we first rank the x's among themselves, giving rank 1 to the largest or smallest value, rank 2 to the second largest or second smallest, and so on; then we rank the y's similarly among themselves. The Spearman's rank correlation coefficient, rs, is given by the following formula
rs = 1 - 6Σd² / [n(n² - 1)]
where d = difference between the ranks for the paired observations,
      n = number of paired observations.
When there are tied observations, the mean rank is given to each observation in the set of ties. For example, if the fourth and fifth largest values of a variable are the same, we assign each the rank (4 + 5)/2 = 4.5, and if the sixth, seventh and eighth largest values of a variable are the same, we assign each the rank (6 + 7 + 8)/3 = 7.
The possible range of values for Spearman's rank correlation coefficient rs is -1 to +1. If rs = +1, there is perfect positive rank correlation and if rs = -1, there is perfect negative rank correlation. If X and Y are independent of each other, there is no relationship and thus the rank correlation coefficient rs = 0.
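The ranking rule and the formula for rs can be sketched as follows. The function names `average_ranks` and `spearman` are illustrative assumptions. Ranks are assigned from the smallest value upward, tied values share their mean rank, and the coefficient is computed with the 6Σd² formula exactly as stated above, including when ties are present.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: rs = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Mean ranks for tied values, as in the (4 + 5)/2 = 4.5 illustration
print(average_ranks([10, 20, 20, 30]))

# Rankings of Example 15.15 (ranks before and after mid-semester)
before = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
after = [6, 5, 8, 9, 3, 4, 10, 1, 7, 2]
print(round(spearman(before, after), 2))
```

When the data are already ranks, as in the second call, ranking them again leaves them unchanged, so the function can be used on raw scores and on ranks alike.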

Example 15.15.
The following were the "performance under stress" rankings of 10 honor students before and after mid-semester:

Student        A    B    C    D    E    F    G    H    I    J
Rank before    1    2    3    4    5    6    7    8    9   10
Rank after     6    5    8    9    3    4   10    1    7    2

Compute the Spearman's rank correlation coefficient for this data set.
Solution:

Rank before    Rank after      d       d²
     1              6         -5       25
     2              5         -3        9
     3              8         -5       25
     4              9         -5       25
     5              3         +2        4
     6              4         +2        4
     7             10         -3        9
     8              1         +7       49
     9              7         +2        4
    10              2         +8       64
                                    Σd² = 218

Spearman's rank correlation coefficient:
rs = 1 - 6Σd² / [n(n² - 1)] = 1 - 6(218) / [10(100 - 1)] = 1 - 1.32 = -0.32

9
Example 15.16.
A statistics instructor wants to know whether there is a correlation between students' midterm averages and their final examination scores. The instructor takes a random sample of nine students from previous statistics courses and obtains the following data:
(i) Compute the Spearman's rank correlation coefficient rs.
(ii) Interpret the value of rs obtained in part (i).

h
8El. tiunt
Midttintr ir'iri$l
avc!ldHrr it tpt t iritl sr]t.rt'rr ltrinlr rif iiln lt rif'
t
rl,. X=.Y de
(Yl t Y
7g ii i1
'l IJ I I
flrl 1l? il f, (l 0
ti (J
t{0 {J ri U 0
i*A
l, 7it f; $ U rl
rt? ?r T
I
" -.ti
4
l,f, ritt Ir ,i
I 1
riit firi t{ -i
I I
7,i -tB .l I rl tl
fli) it i, I :J -.,a ,l
liils a :i0
242 Basic Statistlcs Part-II
6rd2
r.=1"r("r-1) 6(20)
(i) = i-3(ffi1) = 1-0'17 = 0.83

(ii) The rank eorrelation coefficient, r* = 0.81i suggests that there is a strong
positive correlation between midterm average anci final-exarnination score in
Statistics courses.

Example 15.17.
The number of hours of study for an examination and the grades received by a random sample of 10 students are:

Number of hours studied, X    8   5   11  13  10  5   18  15  2   8
Grade in examination, Y       56  44  79  72  70  54  94  85  33  65

Compute and interpret the Spearman's rank correlation coefficient.

Solution:

Number of hours   Grade in          Rank of X   Rank of Y     d      d²
studied (X)       examination (Y)
      8                56              4.5          4        0.5    0.25
      5                44              2.5          2        0.5    0.25
     11                79              7            8       −1.0    1.00
     13                72              8            7        1.0    1.00
     10                70              6            6        0.0    0.00
      5                54              2.5          3       −0.5    0.25
     18                94             10           10        0.0    0.00
     15                85              9            9        0.0    0.00
      2                33              1            1        0.0    0.00
      8                65              4.5          5       −0.5    0.25
                                                                  Σd² = 3

Here d is the difference between the rank of X and the rank of Y.

Spearman's rank correlation coefficient:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(3)}{10(100 - 1)} = 1 - 0.02 = 0.98$$

r_s = 0.98, indicating strong positive correlation between the number of hours of study and the grade in examination.
SHORT DEFINITIONS

Attributes
If a characteristic which is being measured is of qualitative nature, it is called an attribute. Attributes are denoted by A, B, C, α, β, γ. The attributes A, B, C are the positive attributes and α, β, γ are the negative attributes.
The characteristic being studied, when non-numeric, is known as an attribute.

Variable
A variable is a phenomenon that varies from one individual or object to another.
A characteristic that can have different values is called a variable.

Independence
Two attributes A and B are said to be independent if (AB) = (A)(B)/n.

Association
If two attributes are not independent, they are associated.

Positive Association
Two attributes A and B are said to be positively associated if (AB) > (A)(B)/n.

Negative Association
Two attributes A and B are said to be negatively associated if (AB) < (A)(B)/n.

Contingency Table
A table showing the cross-tabulation or joint distribution of two variables is known as a contingency table.
A table used for classifying sample observations according to two or more identifiable characteristics is called a contingency table.

Rank Correlation
The rank correlation coefficient measures the relationship between two sets of rankings, that is, between the rankings of one variable and the rankings of the other variable.
A measure of association that indicates the degree of association between two variables measured at the ordinal level is called Spearman's coefficient of rank correlation.
A rank correlation coefficient is a measure of the degree of agreement between the rankings of two variables.

Properties of Spearman's Rank Correlation Coefficient
(i) The value of r_s is always between −1 and +1, i.e. −1 ≤ r_s ≤ +1.
(ii) r_s is positive when the ranks of the pairs of sample observations tend to increase together.
(iii) r_s = 0 when the ranks are not correlated.
(iv) r_s is negative when the ranks of one variable tend to decrease as the other variable's ranks increase.
(v) A value of +1 or −1 indicates perfect association between the rankings.

1. In order to carry out a χ²-test on data in a contingency table, the observed values in the table should be:
(a) close to the expected values (b) all greater than or equal to 5
(c) frequencies (d) quan…
2. The χ²-test should not be used if any expected frequency is:
(a) less than 10 (b) less than 5
(c) equal to 5 (d) more than 5
3. If (AB) = (A)(B)/n, the two attributes A and B are:
(a) independent (b) dependent
(c) correlated (d) quantitative
4. To calculate the level of association, we can calculate the coefficient of association. The coefficient of association always lies between:
(a) −1 and +1 (b) 0 and 1
(c) −1 and 0 (d) 0 and 5
5. If two attributes A and B are independent, then the coefficient of association is:
(a) −1 (b) +1
(c) 0 (d) 0.5
6. If (AB) < (A)(B)/n, the association between two attributes A and B is:
(a) negative (b) positive
(c) zero (d) symmetrical
7. Two attributes A and B are said to be positively associated if:
(a) (AB) = (A)(B)/n (b) (AB) < (A)(B)/n
(c) (AB) > (A)(B)/n (d) (AB) ≠ (A)(B)/n
8. If two attributes A and B have perfect positive association, the value of the coefficient of association is equal to:
(a) +1 (b) −1
(c) 0 (d) (r − 1)(c − 1)
9. The degrees of freedom for χ² are (r − 1)(c − 1) for a contingency table with r rows and c columns. So for a 2 × 2 contingency table there are:
(a) one degree of freedom (b) two degrees of freedom
(c) three degrees of freedom (d) four degrees of freedom
10. For an r × c contingency table, the number of degrees of freedom equals:
(a) rc (b) r + c
(c) (r − 1) + (c − 1) (d) (r − 1)(c − 1)
11. For a 3 × 3 contingency table, the number of cells in the table are:
(a) 3 (b) 6
(c) 9 (d) 4

12. The null hypothesis of independence between the variables is tested using the χ²-statistic, where the calculated χ² = Σ(O − E)²/E, if the degrees of freedom, (r − 1)(c − 1), are greater than:
(a) 1 (b) 2
(c) 3 (d) 4
13. The shape of the chi-square distribution depends upon:
(a) parameters (b) degrees of freedom
(c) number of cells (d) standard deviation
14. The total area under the curve of a chi-square distribution is:
(a) 1 (b) 0.5
(c) 0 to ∞ (d) −∞ to +∞
15. The chi-square curve ranges from:
(a) −∞ to +∞ (b) 0 to ∞
(c) −∞ to 0 (d) 0 to 1
16. The value of the chi-square statistic is always:
(a) … (b) zero
(c) non-negative (d) one
17, Lr isstilg inrlependeneqr irr rr 0 x Il ertnrinseney table, thc number
of degreer of
frspduur in 1!-distritrution is:
(a) 1 (b) s
(c) 't (d) 6
18. Given χ² = 5.8, df = 1, χ²_0.05(1) = 3.841, χ²_0.01(1) = 6.635, we make the following statistical decision:
(a) We reject H0 at α = 0.05 but not at α = 0.01.
(b) We reject H0 at α = 0.01.
(c) We fail to reject H0 at α = 0.05.
(d) We reject H0 at α = 0.01 but not at α = 0.05.
19. If χ² = 13.96, df = 4, χ²_0.05(4) = 9.488, χ²_0.01(4) = 13.277, we make the following statistical decision:
(a) We accept H0 at α = 0.01 and α = 0.05
(b) We reject H0 at α = 0.05 but not at α = 0.01
(c) We reject H0 at α = 0.01 but not at α = 0.05
(d) We reject H0 at α = 0.01 and α = 0.05

20. In converting the scores 18, 24, 12, 14, 22, 29 to ranks (assigning rank 1 to the highest score), the score of 12 has a corresponding rank of:
(a) 1 (b) 2
(c) 6 (d) 7
21. In converting the scores 8, 20, 14, 7, 11, 14, 3 to ranks (assigning rank 1 to the lowest score), the score of 14 has a corresponding rank of:
(a) 5 (b) 6
(c) 5.5 (d) 4.5
22. If a person ranks lowest on beauty and highest on intelligence and another person ranks highest on beauty and lowest on intelligence, the Spearman's coefficient of rank correlation is probably:
(a) zero (b) weak positive
(c) perfect positive (d) perfect negative
23. If Σd² is zero, the value of r_s is:
(a) 0.5 (b) 1
(c) −1 (d) 0

Answers:
1. (a)   2. (b)   3. (a)   4. (a)   5. (c)   6. (a)   7. (c)   8. (a)
9. (a)   10. (d)  11. (c)  12. (a)  13. (b)  14. (a)  15. (b)  16. (c)
17. (b)  18. (a)  19. (d)  20. (c)  21. (c)  22. (d)  23. (b)

SHORT QUESTIONS
1. Given n = 100, (A) = 40. Find (α).
Ans. 60
2. Given (AB) = 30, (A) = 40. Find (Aβ).
Ans. 10
3. Given (αBC) = 15, (αBγ) = 60, (αβC) = 40 and (αβγ) = 70. Find (α).
4. Given a = 55, b = 125, c = 45 and d = 75. Find χ².
Ans. 1.5625
6.
g s
Given (A) = 20, (B) = 13150, (AB) = 1/60 and n 260. Whether attributes
= A and

o
B are neg itively aseociatecl?

l
' '
b
Ans. Yes, (AB) .
.
. W
6. Given (A) = 304, (B) = 1024, (AB) = 256 and n = 1216. Show that attributes A and B are independent.
Ans. (AB) = (A)(B)/n = 256
7. Given (A) = 34000, (B) = 6000, (AB) = 5300 and n = 70000. Whether attributes A and B are positively associated?
Ans. Yes, (AB) > (A)(B)/n
E. Givcn (AB; = t60, (sB) * l()6, (AF) = 272, (eF) = lt3l, and n = 1660, Find thE

h
Ans. Q = (J.71
9. Given χ² = 20.178, df = 4 and α = 0.01. Find the table value of χ² and make the statistical decision.
Ans. χ²_0.01(4) = 13.277, reject H0
10. Given fo = 30, 75, 45, 30, 75, 45, fe = 52.5, 52.5, 37.5, 37.5, 60.0, 60.0, df = 2 and α = 0.05. Find χ² and make the statistical decision.
Ans. χ² = 29.786, reject H0
11. If the respective values of (fo − fe)²/fe are 0.125, 2.414, 3.214 and 8.929, find χ².
Ans. 14.68
12. Given fo = 25, fe = 15 and fo = 35, fe = 45. Find Σ(|fo − fe| − 0.5)²/fe.
Ans. 8.02
13. If the respective values of fo = 21, 38, 32, 29, 36, 25, 41, 23 and fe = 31.31, 27.69, 32.37, 28.63, 32.37, 28.63, 33.96, 30.04, then find χ².
Ans. 11.22
14. Given the pairs of ranks (4, 2), (1, 3), (2, 1), (5, 6), (6, 5), (3, 4). Find Σd².
Ans. 12
15. Given Σd² = 440 and n = 11. Find the value of r_s.
Ans. −1
16. Given Σd² = 99 and n = 10. Find the coefficient of rank correlation.
Ans. 0.4

17. Differentiate between attribute and variable.
18. Explain the consistency of the data.
19. When are two attributes said to be positively associated?
20. When are two attributes said to be negatively associated?
21. When are two attributes said to be associated?
22. Explain what is meant by independence of attributes.
23. What do you understand by association?
24. Explain the terms independence and association as applied to attributes.
25. Differentiate between positive association and negative association.
26. Define a contingency table.
27. Explain the positive and negative association.
28. What is meant by attribute?
29. Interpret the meaning of coefficient of association Q when:
(a) Q = −1 (b) Q = +1 (c) Q = 0
30. Explain the general procedure for test of independence between the attributes.
31. Write down the direct formula for calculating χ² in a 2 × 2 contingency table.
32. Define χ²-distribution.
33. Explain the coefficient of association.
34. Explain the terms positive and negative attributes.
35. Explain the term rank correlation.
36. Interpret the meaning when:
(a) r_s = +1 (b) r_s = −1 (c) r_s = 0
37. What is meant by Spearman's rank correlation coefficient?
38. Write down the properties of Spearman's rank correlation coefficient.
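Several of the short questions above rest on three small calculations: comparing (AB) with (A)(B)/n, the direct formula for χ² in a 2 × 2 table, and the coefficient of association Q. The sketch below shows one way to compute them. It assumes that Q is Yule's coefficient, Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)], and that the 2 × 2 cells are laid out as a, b in the first row and c, d in the second; the function names and the cell frequencies in the usage lines are illustrative choices, not taken from the text.

```python
def association_type(ab, a, b, n):
    """Compare (AB) with (A)(B)/n; multiplying through by n keeps the check in integers."""
    if ab * n > a * b:
        return "positive"
    if ab * n < a * b:
        return "negative"
    return "independent"


def chi_square_2x2(a, b, c, d):
    """Direct formula for a 2 x 2 table: n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Coefficient of association, assumed here to be Yule's Q."""
    same = ab * alpha_beta      # product of the (AB) and (alpha beta) frequencies
    cross = a_beta * alpha_b    # product of the (A beta) and (alpha B) frequencies
    return (same - cross) / (same + cross)


print(association_type(ab=256, a=304, b=1024, n=1216))
print(chi_square_2x2(55, 125, 45, 75))
print(round(yule_q(500, 100, 100, 400), 3))
```

Q is undefined when both products are zero, so a real implementation would need to guard against that case.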
EXERCISES

1. Compute all the remaining possible class frequencies from the following data:
(α) = 50, (B) = 70, (Aβ) = 20 and n = 100.
Ans. (A) = 50, (β) = 30, (AB) = 30, (αβ) = 10, (αB) = 40
2. Given that: (AB) = 150, (Aβ) = 250, (αB) = 260, (αβ) = 2340. Find the other frequencies and the value of n.
Ans. (A) = 400, (α) = 2600, (B) = 410, (β) = 2590, n = 3000
3. Given that: (A) = 304, (AB) = 256, (αβ) = 144, (αB) = 768, (Aβ) = 48. Show that attributes A and B are independent.
Ans. (B) = 1024, n = 1216, (A)(B)/n = 256. A and B are independent.
4. Determine whether attributes A and B are negatively associated, positively associated or independent.

.(i) (A)=w,.(n)=H,
o
(AB)=fi,r' =250
(ii) n = 154, 88,
bl
(AB) + (AP) = 35, (AF) = 2o

.
(P) =

3
(iir) (Ag) = 5300, (oB) = 799, (cr) = 36000, (A) = 34000.

4
Ans. (i) A and B are negatively associated (ii)'A and B are independent

9
(iii) A and B are positively associated

9srl
E. Test the independence by a simplest approach betrveen gender and intelligence'

t
? Gender

Intelligence a \n
s t Males Females# Total

/: /
Level of

Inteiligent \ r50 y$.'Si

s
tt p' ,
Non - intelligen

h
Ans. (AB) = 150, (A)(B)/n = 150. Independence between males and intelligence.
6. Test the independence by a simple approach between intelligence of fathers and sons.

                            Fathers
Sons               Intelligent   Not intelligent   Total
Intelligent            300            200           500
Not intelligent        100            400           500
Total                  400            600          1000

Ans. Positive association between intelligent fathers and intelligent sons.
7. Find coefficient of association from the following data:

                      Height of fathers
Height of sons        Tall      Short
Tall                  500       100
Short                 100       400

Ans. Q = 0.905, positive association between height of fathers and height of sons.
8. Test the independence by a simple approach between attack of disease and vaccination.

                 Vaccinated   Not vaccinated   Total
Attacked             50            500          550
Not attacked        350            500          850
Total               400           1000         1400

Ans. Negative association between vaccination and attack of disease.

4
9. From the following table, test the hypothesis that the flower colour is independent of flatness of leaf. Use α = 0.05.

                 Flat leaves   Lean leaves   Total
White flowers        99            36         135
Red flowers          20             5          25
Total               119            41         160

Ans. χ² = 0.494, accept H0.
10. In a locality, 300 persons were randomly selected and asked about their educational attainment. The results are given as follows:

                      Education
Sex        Middle   Secondary school   College
Male         30           45              75
Female       75           30              45

Can we say that education depends on sex? Use α = 0.05.
Ans. χ² = 29.786, reject H0.
11. Find chi-square (χ²) for the following table to examine the association between the subjects and their result. Use α = 0.05.
i\
Result \ '' ''
Subjects i Passed Failecl.

Mathematics

Statistics 2t0 190

om
c
4n
.
English

History

ot
220

p
Education
Ans. 2g2 = 8.959, accept Ho:
g s i

o
12. The following data show initial training program performance and a job rating

l
by a supervisor 12 months later for a sample of 400 employees of a telephone

b
.
:

company:

3
Training Program Performance

4
Job Rating Below average Average Above average

9
l

9
Below average

Average
t
a29
average
s t 60
/
Above 23

:/
Is there hny relationship hetween performance in the training program and job

s
rating? Use the 1 %"levelof significance.

tt p
Ans. X2 = 20.178, reject Ho:

h
13. Find the value of chi-square (X2) from the foilowing data and.test the hypothesis
that there is no relation between the level of intelligence and the social status.
Use o = 0.05.
Level of intelligence

Social status 1 Bri[iant I Intelligent Dr.ril

Upper r,riddle 20

Lower middlel 22

Ans. 12 = 35.163, reject Ho:


T

;.

[Chapter 15] Association 7E?


i
14. The data given beiow are the .o;;;;
I
with regard to total sales and return on equiti,:
Company A 1)
il C D E tr G H I I
,rl ii L
Saies Rank 8 D
.) 10 'I r c I 1 ,1 2 L2 t)

Return on Equit Rank

m
6 1 10 12 2 11 8 5 I .) rt
4

o
compute the rank correlation coef{icient for this set of ciata.

. c
Ans. r" = 0.78
15. A group of ten rt'orhers of a far:tory is ranked accorrling to their
two different iudges as follorvs:
ot cfficiencl, by

Name of worker l1 B C

s
D E
p
F' G H I J

g
l

i I
o
Judgement of Judge o 8 t)
i0 {

l
1 3 6

b
Judgement of Judge Ii I / '] o

.
I 1C 8 5 t f-)

(0
3
Compute the Spearman,s rank corr.ejatiorr coelTrcient.

4
(ii) Interpret the vah.re of ycur result.

9
Ans. (i) r* = 0.88 (ii) r, = C.88 Ineans thr: opinion of the tu'c ;udgcs u'it,ir regard rc, lire

t9
efficiency cf the worker:s shows great similarity.

a
16. The foilowiig table shorvs irow 10 sturlents, aruanged" in alphabeticai orcier,

s t
were ranked according to therr achievements in both laboratory and lecture

/: /
portions of a Statistics course. Compute and interpret the Spearrnan's rank
correlation coefficient:

s I

tt p
Laboratory B 3 2 I 10 4 tr 1 5

Lecture o 5. t0 1
I 8 7 2 o

h
4

Ans.r. = 0'85,'indicating that there is a marked relationship betx,een achievements


'in laboratory and lecture portions.
17. The following table shows the first two marks, denoted by X and Y respectively, of 8 students on two quizzes in Statistics.

Marks on first quiz (X)     50   65   75   105   125   140   170   195
Marks on second quiz (Y)    45   60   80    95   120   150   145   190

Compute and interpret the Spearman's rank correlation coefficient.
Ans. r_s = 0.98. There is a high degree of positive correlation between the marks of two quizzes.
18. A chemical reaction is tirned at severill different temperatures; the results are:

Temperature ( Fo ) 100 15C 200. 254 300 350


t- -,
Tirne needecl (seconds) 84ll 21L 164 69 9.) t7,
r

(i) Determinc the Spearman's rank correlation coefficient between the


vrrriablcs.

m
(ii) Comment on tile yp,!r:e of your result.
Ans. (i) r* = -l

c o
t .
(ii) -1, there is perfect negative'rank correlation betrveen the variables.
ro =

o
19. Consider the situation w-here a panel of 11 financial experts is requested to

p
examine financiai tlata from trvo corporatiotrs, corporation.A and corporation B.

s
Calcuiate and interpr:et the rank cc''rreiation coeffic,ient between financiai

g
strength scores for corporal,ions A and IJ"

l o
Panelist 1
I 2
.)
r) A a) 6 i I I 10 11

. b
Corporation A 4 5 .) D 4 5 ? 4 2 r)

3
()
Corporation B ,j 4 2 J 2 4 2 4 2

4
L)

9
Ans.4^ = 0.1, indicating weziir positive correlation between the' scores given

9
corporation A und t\osc given cor:poration B.

t
20. In a study between the amount of rainfall and the quantity of air pollution removed, the following data were collected:

Daily Rainfall, X                   4.3   4.5   5.9   5.6   6.1   5.2   3.8   2.1   7.5
(0.01 centimeter)
Particulate Removed, Y              126   121   116   118   114   118   132   141   108
(micrograms per cubic meter)

Calculate the rank correlation coefficient for the daily rainfall and amount of particulate removed.
Ans. r_s = −0.99
21. The ranks of the same 4 students in two subjects A and B were as follows:
(4, 3), (2, 4), (1, 2), (3, 1).
Two numbers within brackets denote the ranks of the students in A and B respectively. Calculate and interpret the Spearman's rank correlation coefficient.
Ans. r_s = 0. The ranks are uncorrelated.
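The χ² exercises above all follow the same routine: compute each expected frequency as (row total × column total)/n, sum (fo − fe)²/fe over the cells, and compare the result with the table value for (r − 1)(c − 1) degrees of freedom. The sketch below illustrates that routine on the sex-by-education frequencies 30, 45, 75 and 75, 30, 45. The function name is an assumption made here, and the table value 5.991 for df = 2 at α = 0.05 is entered by hand rather than computed.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence: (row total)(column total) / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


observed = [
    [30, 45, 75],   # Male: middle, secondary school, college
    [75, 30, 45],   # Female: middle, secondary school, college
]
chi2, df = chi_square_independence(observed)

TABLE_VALUE_DF2_ALPHA_05 = 5.991  # chi-square table value for df = 2, alpha = 0.05
decision = "reject H0" if chi2 > TABLE_VALUE_DF2_ALPHA_05 else "accept H0"
print(round(chi2, 3), df, decision)
```

The routine assumes every expected frequency is positive; tables with an empty row or column would need to be handled separately.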
Chapter 16
TIME SERIES

16.1 INTRODUCTION
A time series is a set of observations recorded at equal intervals of time. The observations are usually recorded according to some period of time. The enrolment of students in a certain college for a number of years is a yearly time series. The imports and exports of a country on monthly basis make an important monthly time series. The population of a country may be counted after every 10 years. The population figures at intervals of 10 years is called a decennial time series. The interval of the time series depends upon the nature of the observations. At what interval would the observations become practically meaningful and important? The population figures are usually not recorded on daily or weekly basis. For very small interval of time, the population data will not be a good piece of information. For prices of wheat, meat and other commodities, we do not use intervals of 5 or 10 years. The prices of fruits may be studied on daily or weekly basis and the prices of hard commodities may be studied on monthly or yearly basis.
16.2 PURPOSE OF TIME SERIES
The time series are constructed and maintained for some practical purposes. Some of their purposes are:
(i) The previous pattern of the time series enables us to determine or estimate future values of the time series. This is an important purpose of the time series.
(ii) We can apply a control on the time series. If the time series is increasing (like prices of meat), we may like to control the increase in the time series. The control can be applied if we have a detailed information about the time series.
16.2.1 GRAPH OF THE TIME SERIES
A graph of the time series plays an important role in the study of the time series. To make a graph, suppose the time periods are denoted by t1, t2, t3, ..., tk and the corresponding figures of the y-variable are denoted by y1, y2, y3, ..., yk. The times t1, t2, t3, ..., tk are taken on X-axis and the y-values y1, y2, y3, ..., yk are plotted against their respective time. The plotted points are joined together by straight lines to get a graph called the historigram. A time series is also called an historical series, where historical is from history. A graph of time series about production of firewood in Punjab is given in Figure 16.1.
Example 16.1.
Table 16.1. Production of Firewood in Punjab

Years                  1984-85  1985-86  1986-87  1987-88  1988-89  1989-90  1990-91  1991-92  1992-93  1993-94
Values (million Rs.)    18.6     22.6     38.1     40.9     41.4     40.1     46.6     60.7     57.2     53.4

The graph of the above data is shown in Fig. 16.1.

[Figure 16.1: Historigram of the production of firewood in Punjab; years 1984-85 to 1993-94 on the X-axis, value on the Y-axis.]

(Source: Page No. 108, Statistical Pocket Book of the Punjab, 1995)
16.3 COMPONENTS OF A TIME SERIES
A time series usually changes with the passage of time. There are many reasons which bring changes in the time series. These changes are called components or variations in a time series. The words movements or fluctuations are also used for these changes. These movements are:
(i) Secular trend (ii) Seasonal variations
(iii) Cyclical variations (iv) Irregular variations
16.3.1 SECULAR TREND
Secular trend is the regular component of the time series. The time series moves regularly in some direction over a long period of time. This regular movement of the series in some direction, upward or downward, is called the trend or secular trend of the time series. These movements may be slow or fast but they are systematic in nature. The changes follow some rule. These movements are free of sudden jumps and swings. The Y values fluctuate round an average and these fluctuations are called 'noise'. The trend is called linear if the change, increase or decrease, for a certain period is almost the same throughout the time series.
Figure 16.2 shows that there is upward growth of the time series. This upward movement is called the trend. As the graph of the observed time series is close to a straight line, we say that the trend is linear. Thus the smooth line AB in Figure 16.2 shows the linear trend. Trend is not always linear. Figure 16.3 shows that the overall movement of the time series is close to a curve. Here the trend is non-linear or curvilinear. The curve AB shows the trend in the form of a curve.
Example 16.2.
Table 16.2. Outdoor patients treated in hospitals in the Punjab

1984 1198r

og t9B7 I [r88 1e89 1990 I 19,!2 1999

No, of patiente 68? 676


bl
tis2 046 i000
749 1136 1160 L?,07

.
1101
(in thousande) .i

43
The graph of the data in Table 16.2 is shown in Figure 16.2.

[Figure 16.2: Graph showing the linear trend; years on the X-axis, number of patients on the Y-axis.]
Example 16.3.
Table 16.3. Sales of a Utility Store

Years   1986   1987   1988   1989   1990   1991   1992   1993   1994

The graph of the data in Table 16.3 is shown in Figure 16.3.

[Figure 16.3: Graph showing non-linear trend; years 1986 to 1994 on the X-axis, sales on the Y-axis.]

16.3.2 SEASONAL VARIATIONS
The element of increase or decrease in a time series which is purely due to changes in seasons is called seasonal variations. The changing seasons have their influence on the time series and the same influence is experienced every year in the corresponding seasons. These changes are regular in nature. A year may be divided into two, three, four or more than four seasons. The word season is a relative term and the division of the year into different seasons depends upon the variable under consideration. When we talk about the prices of warm clothing and winter clothing, a year is divided into two seasons i.e. winter and summer. The prices of winter clothing are high in winter and low in summer. The prices of cold drinks and ice-creams are low in winter and high in summer. The effect of the season prevails for the duration of the season and the time series may be influenced by the next coming season. If there is no seasonal effect, the time series remains stable in all seasons of the year. The effect of seasons is measured by seasonal indices. If the seasonal index for winter is 140%, it means that the level of the time variable shows an increase of 40% in winter as compared to the overall average for all the years. The seasonal variations are not necessarily linked with a season. The changes which take place regularly every year but have no concern with any season are also called seasonal variation. The increase in the prices of shoes on Eids and Xmas are examples of seasonal variations. The changes which are regularly repeated during one year are called seasonal variations.
Fig. 16.4 shows the seasonal variations in the quarterly prices of wheat (Maxi-Pak) during 1991 and 1992. The prices are low in quarters I and II in 1991 and 1992. Prices have increased in quarters III and IV and the increase has taken place in both the years.
Example 16.4.
Table 16.4. Quarterly Prices (Rs. per Quintal) of Wheat for 1991 and 1992

               Quarters
Years    I      II     III    IV
1991     307    300    323    350
1992     371    354    390    420

[Figure 16.4: Graph showing seasonal variations; quarters I to IV on the X-axis, price on the Y-axis.]

16.3.3 CYCLICAL VARIATIONS
These movements are in the form of waves. The waves are formed about the overall variation of the time series. In graph, these movements are like business cycles which pass through the stages of boom, recession, depression, recovery and back to prosperity. Fig. 16.5 shows the stages of cycles.

[Figure 16.5: Stages of a cycle, with boom and depression marked.]

The distance between the two booms is called a cycle and the time period involved between two booms is called span of the cycle. The span of the cycle is usually very long. The time series are influenced by the economic conditions in a country. Thus the business cycles, if any, in the economic activity are responsible for the cycles in the related time series. The effect of these changes is quite feeble and it is therefore difficult to measure these variations. The numerical magnitude of these variations is insignificant and they are mixed with the trend. It is difficult to separate them from the trend.

16.3.4 IRREGULAR VARIATIONS
Sometimes there are sudden movements in the time series. These movements are due to irregular causes like floods, strikes, epidemics, wars etc. The time series is disturbed by some unpredictable forces. The time series comes to its original position when the effect of irregular or random causes is over. These variations cannot be controlled. They are also called erratic or accidental variations.

Example 16.5.
Explain the following movements with respect to variations in time series.
(i) Increase in literacy rate in a country.
(ii) Increase in the prices of school uniform in the start of the academic year.
(iii) Ttie lipgil,lrir oi"l',V'ri rliiiiir't*rlil dltt l,q lit!! rr, !u[taHr o{'electrie aupply,

(iv) Non-availability of cement in the market due to depression in the country.
(v) Decrease in demand for ice in winter season.
(vi) Increase in the prices of ghee in the month of Ramazan.

Answers:
(i) Increase in literacy rate is a regular and gradual process. A high level of literacy cannot be achieved in a short period of time. It requires long time to make the buildings for schools and to train the teaching staff. This type of increase is called secular trend in a time series.
(ii) Increase in the prices of school uniforms takes place every year in the start of the academic year. It is not linked to any season, but it is called seasonal because it occurs every year.

irii)" FAll irr rirp rrlltai{6 6i iil ilrtrgUlrii' fria|iirg It, it sUddpn gar:igE,
lt
Lhf t)"Fe *f
isiof :r'trigul*r vill'itrf,rr:!1irr the time
Sfld iE UtUAlly

(iv) Non-availability of cement due to depression is comparable with movements of cyclical nature in the time series.
(v) People do not like ice during winter season. It is clearly something which is related to the winter season.
(vi) Increase in the prices of ghee in the month of Ramazan is a regular feature in our country. It happens every year in the month of Ramazan. As this change is regular, it is of seasonal nature. In some years, the increase may not take place regularly in the month of Ramazan. If any increase takes place only once in the month of Ramazan, it will not be called seasonal. It will be seasonal if it is repeated every year.

[Chapter 16] Time Series 261


" I.6.4 ANALYSIS OF TIME SERIES
. A time series is the combine cl effect of many forces. Sonre of these forces are very
powerfui and sonie are weak. The forces u.iirrg on a tinie series are called the
variations or fluctuations in a time series. They may also be termed as movqments
or components of a time series. The-time variable Y, is made up of other variables
which are'I, S, C and I where T stands for trend, S for seasonal, C is for cyclical and

m
I is forirregular" ?hus Y, can be written as: Y, =. TSCI which is called the

o
multiplicative urodel of the tirne series. The additive model of the tirne series is

c
.
writtenas: Y ::'l'*S+ C+ I

ot
Anyone of these two rnodels is used for the detailed stuciy of the changes in the

p
time series. Whieh model is better? It depends upon the assumptions about the given

s
tirne series. Usually the model Y = TSCI is used in tlie study of the time series.
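The two models can be compared on a single period with a small numerical sketch. The component values and function names below are hypothetical, chosen only for illustration: in the multiplicative form S, C and I are taken as ratios around 1, while in the additive form they are in the same units as T.

```python
from math import prod

def multiplicative(t, s, c, i):
    """Y = T * S * C * I, with S, C and I expressed as ratios around 1."""
    return prod([t, s, c, i])

def additive(t, s, c, i):
    """Y = T + S + C + I, with S, C and I in the same units as T."""
    return t + s + c + i

# Hypothetical component values for one period (not taken from the text).
y_mult = multiplicative(100.0, 1.10, 0.95, 1.02)
y_add = additive(100.0, 10.0, -5.0, 2.0)

# Removing the trend leaves the combined effect of the other three forces.
detrended_mult = y_mult / 100.0   # equals S * C * I
detrended_add = y_add - 100.0     # equals S + C + I
print(y_mult, y_add, detrended_mult, detrended_add)
```

Under the multiplicative model a component is removed by division, under the additive model by subtraction.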

Sometimes we are interested in the trend of the time series; sometimes our interest is in the seasonal forces. If we are interested in the trend of the time series, we have to remove the effects of the other forces. Similarly, the measurement of seasonal, cyclical and irregular variations is possible. A study regarding any component of the time series is called the analysis of time series. In this book we shall discuss the measurement of trend only. The measurement of seasonal, cyclical and irregular variations is beyond the scope of this book. A general statistical model for the time series can be written as:

$Y_t = f(t) + u_t, \quad t = 1, 2, 3, \ldots, k$

It is an old concept in which the observed time series is assumed to consist of two parts: a systematic part $f(t)$ and a random part $u_t$. The systematic part $f(t)$ of the time series is also called the 'signal'. The random sequence $u_t$ is also sometimes called the 'noise'.
According to this model, the time series consists of $f(t)$, a slowly moving function of time which may be considered a combination of trend and cyclical variations. The random term $u_t$ consists of all forces other than trend and cyclical variations. The measurement of $u_t$ is not possible because it is a combination of various actions on the time series.

16.5 MEASUREMENT OF SECULAR TREND
For the measurement of trend, we have to eliminate the short-term fluctuations from the time series. If we are interested in short-term fluctuations, the long-term trend is to be removed from the time series. The removal of the trend from the time series is possible only if the trend has been measured.
Secular trend is a smooth line or curve and is a continuous function of time. It can be measured by the following methods:
(i) The method of free-hand curve. (ii) The method of semi-averages.
(iii) The method of moving averages. (iv) The method of least squares.
16.5.1 THE METHOD OF FREE-HAND CURVE
We make a graph of the observed time series taking time on the X-axis and the variable on the Y-axis. The plotted points are joined together by straight lines to get a graph called the historigram. We carefully examine the shape of the graph. The graph of the original data may be in the neighbourhood of a straight line or it may be close to some curve. We draw a straight line or a free-hand curve passing through the plotted points such that the growth of the time series is indicated by the trend line or the trend curve. The line or curve drawn smooths out the short-term fluctuations and what remains represents the trend. We can read the trend values from the trend line or trend curve for all the time periods of the time series. The trend thus drawn can be extended to estimate the values of the time variable for some periods beyond the given time series. This is called forecasting. The future values can be estimated very easily.
MERITS
This method is very simple. It is applicable for linear and non-linear trends. It gives us a quick idea about the rise and fall of the time series. For very long time series, the graph of the original data enables us to decide about the application of more mathematical models for the measurement of trend. Monthly data for 5 years have 60 values. A graph of these values may suggest that the trend is linear for the first two years (24 values) and non-linear for the next 3 years. We accordingly apply the linear approach to the first 24 values and the curvilinear technique to the next 36 values.
DEMERITS
It is not mathematical in nature. Different persons may draw different trends. The method does not appeal to a common man because it seems to be something rough and crude.
Example 16.6.
Measure the trend by the method of free-hand curve from the data given in Table 16.5.
Table 16.5. Production of wheat in the Punjab.

Years      Production (million metric tons)
1981-82    8.6
1982-83    8.9
1983-84    7.6
1984-85    8.3
1985-86    10.4
1986-87    9.2
1987-88    9.2
1988-89    10.5
1989-90    10.5
1990-91    10.5

[Figure 16.6: Graph of the original data and the trend line. X-axis: Years, 1981-82 to 1990-91.]
We observe that the graph of the original data does not show any closeness to any type of curve. It appears to increase very slowly in a straight (linear) manner. Thus we draw a line AB as an approximation to the original graph. The line AB represents the trend line and from this line we read the trend values for the given years. The trend values are: 8, 8.3, 8.6, 8.9, 9.2, 9.5, 9.8, 10.1, 10.4, 10.7.
16.5.2 THE METHOD OF SEMI-AVERAGES
This method consists of dividing the time series into two equal or almost equal parts. If the number of values in the time series is odd, the middle value is omitted so that both parts of the series contain an equal number of values. The average for each part is calculated and is written against the middle period of the respective part. If a part contains an odd number of values, the average is written against the middle period of that part. Suppose there are 5 years in a part; the average is written against the 3rd year. When one half of the series contains an even number of years, say 4 years, the average is written against the centre of the 2nd and 3rd years. The calculated averages are plotted on the graph paper and are joined together by a straight line. This straight line measures the trend of the data. This method of finding trend is called the semi-averages method. We can read the trend values for all periods from the graph paper. The trend line can be extended on both sides to give the trend for the entire time series. The line can also be used for future prediction, but only for the near future.


MERITS
This method is very simple and does not require much calculation.
DEMERITS
The method is used only when the trend is linear or almost linear. For non-linear trend this method is not applicable. It is based on the calculation of averages, and the average is affected by extreme values. Thus if there is some very large or very small value in the time series, that extreme value should either be omitted or this method should not be applied. We can also write the equation of the trend line.
Example 16.7.
Measure the trend by the method of semi-averages by using the data of Table 16.1. Also write the equation of the trend line with origin at 1984-85.

Years      Values          Semi-totals   Semi-averages   Trend values ($\hat{Y}$)
           (million Rs.)
1984-85    18.6                                          28.664 - 3.656 = 25.008
1985-86    ...                                           32.32 - 3.656 = 28.664
1986-87    38.1            161.6         32.32           32.32
1987-88    ...                                           32.32 + 3.656 = 35.976
1988-89    ...                                           35.976 + 3.656 = 39.632
1989-90    40.1                                          39.632 + 3.656 = 43.288
1990-91    46.6                                          43.288 + 3.656 = 46.944
1991-92    60.7            253.0         50.60           50.60
1992-93    ...                                           50.60 + 3.656 = 54.256
1993-94    ...                                           54.256 + 3.656 = 57.912

Trend for 1991-92 = 50.60
Trend for 1986-87 = 32.32
Increase in trend in 5 years = 50.60 - 32.32 = 18.28
Increase in trend in 1 year = 18.28 / 5 = 3.656

The increase in trend for one year is 3.656. It is called the slope of the trend line and is denoted by 'b'. Thus b = 3.656. The trend for 1987-88 is calculated by adding 3.656 to 32.32, and similar calculations are done for the subsequent years. The trend for 1985-86 is less than the trend for 1986-87. Thus the trend for 1985-86 is 32.32 - 3.656 = 28.664. The trend for the year 1984-85 is 25.008. This is called the intercept because 1984-85 is the origin. The intercept is the value of Y when X = 0 and is denoted by 'a'. The equation of the trend line is $\hat{Y} = a + bX = 25.008 + 3.656X$ (1984-85 = 0), where $\hat{Y}$ shows the trend values. This equation can be used to calculate the trend values of the time series. It can also be used for forecasting the future values of the variable.
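The semi-averages procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `semi_average_trend` is hypothetical, the periods are coded 0, 1, 2, ... from the first period, and the sample series is made up so that its two halves average 32.32 and 50.60, the semi-averages of Example 16.7.

```python
def semi_average_trend(values):
    """Return (a, b, trend) for the semi-average line, origin at the first period."""
    n = len(values)
    half = n // 2                      # an odd-length series drops its middle value
    first, second = values[:half], values[n - half:]
    mean1 = sum(first) / half
    mean2 = sum(second) / half
    # The centres of the two parts are (n - half) periods apart.
    b = (mean2 - mean1) / (n - half)
    centre1 = (half - 1) / 2           # coded position of the first semi-average
    a = mean1 - b * centre1
    trend = [a + b * x for x in range(n)]
    return a, b, trend

# Made-up series: only the two half-averages match Example 16.7.
series = [32.32] * 5 + [50.60] * 5
a, b, trend = semi_average_trend(series)
print(round(a, 3), round(b, 3))
print([round(t, 3) for t in trend])
```

Because the line is fixed by the two semi-averages alone, any series with the same half-averages gives the same intercept and slope.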
[Figure 16.7: Graph showing production of firewood and the trend by the semi-averages method. Trend line: $\hat{Y} = 25.008 + 3.656X$ (1984-85 = 0). X-axis: Years, 1984-85 to 1993-94.]
16.5.3 THE METHOD OF MOVING AVERAGES
Suppose that there are n time periods denoted by $t_1, t_2, t_3, \ldots, t_n$ and the corresponding values of the Y variable are $Y_1, Y_2, Y_3, \ldots, Y_n$. First of all we have to decide the period of the moving averages. For short time series, we use a period of 3 or 4 values. For long time series, the period may be 7, 10 or more. For quarterly time series, we always calculate averages taking 4 quarters at a time. In monthly time series, 12-monthly moving averages are calculated. Suppose the given time series is in years and we have decided to calculate 3-year moving averages. The moving averages, denoted by $a_1, a_2, \ldots, a_{n-2}$, are calculated as below:

Years (t)    Variable (Y)    3-year moving averages
$t_1$        $Y_1$
$t_2$        $Y_2$           $(Y_1 + Y_2 + Y_3)/3 = a_1$
$t_3$        $Y_3$           $(Y_2 + Y_3 + Y_4)/3 = a_2$
$t_4$        $Y_4$           ...
...          ...             ...
$t_{n-1}$    $Y_{n-1}$       $(Y_{n-2} + Y_{n-1} + Y_n)/3 = a_{n-2}$
$t_n$        $Y_n$

The average of the first 3 values is $(Y_1 + Y_2 + Y_3)/3$ and is denoted by $a_1$. It is written against the middle year $t_2$. We leave the first value $Y_1$ and calculate the average of the next three values. This average is $a_2 = (Y_2 + Y_3 + Y_4)/3$ and is written against the middle year $t_3$. The process is carried on to calculate the remaining moving averages. 4-year moving averages are calculated as under:

Years (t)    Variable (Y)    4-year moving averages             4-year moving averages centred
$t_1$        $Y_1$
$t_2$        $Y_2$
                             $(Y_1 + Y_2 + Y_3 + Y_4)/4 = a_1$
$t_3$        $Y_3$                                              $(a_1 + a_2)/2 = A_1$
                             $(Y_2 + Y_3 + Y_4 + Y_5)/4 = a_2$
$t_4$        $Y_4$                                              $(a_2 + a_3)/2 = A_2$
                             $(Y_3 + Y_4 + Y_5 + Y_6)/4 = a_3$
$t_5$        ...

The first average is $a_1$, which is calculated as $a_1 = (Y_1 + Y_2 + Y_3 + Y_4)/4$. It is written against the middle of $t_2$ and $t_3$. The second average is $a_2$, which is calculated as $a_2 = (Y_2 + Y_3 + Y_4 + Y_5)/4$. It is written against the middle of $t_3$ and $t_4$. The two averages $a_1$ and $a_2$ are further averaged to get $A_1 = (a_1 + a_2)/2$, which refers to the centre of $t_3$ and is written against $t_3$. This is called centring of the 4-year moving averages. The process is continued till the end of the series to get the 4-year moving averages centred. The moving averages of some proper period smooth out the short-term fluctuations and the trend is measured by the moving averages.
MERITS
Moving averages can be used for measuring the trend of any time series. The method is applicable for linear as well as non-linear trends.
DEMERITS
The trend obtained by moving averages is, in general, neither a straight line nor some standard curve. For this reason the trend cannot be extended for forecasting the future values. Trend values are not available for some periods at the start and some periods at the end of the time series. The method is not applicable for short time series.
Example 16.8
Compute 5-year, 7-year and 9-year moving averages for the following data.

Year     1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000
Value    2     4     6     8     10    12    14    16    18    20    22

Solution:
The necessary calculations are given below:

                5-year moving      7-year moving      9-year moving
Year    Value   Total   Average    Total   Average    Total   Average
1990    2
1991    4
1992    6       30      6
1993    8       40      8          56      8
1994    10      50      10         70      10         90      10
1995    12      60      12         84      12         108     12
1996    14      70      14         98      14         126     14
1997    16      80      16         112     16
1998    18      90      18
1999    20
2000    22
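The totals and averages of Example 16.8 can be checked with a short Python sketch. The function name `moving_averages` is an illustrative choice, not something defined in the text; each average belongs to the middle year of its window.

```python
def moving_averages(values, period):
    """Return (totals, averages) for every window of `period` consecutive values."""
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    averages = [total / period for total in totals]
    return totals, averages

values = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]   # data of Example 16.8, 1990 to 2000
for period in (5, 7, 9):
    totals, averages = moving_averages(values, period)
    first_year = 1990 + period // 2                  # middle year of the first window
    print(period, first_year, totals, averages)
```

The longer the period, the more years at each end of the series are left without a trend value.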

Example 16.9
Compute 4-year moving averages centred for the following time series:

Year                          1995  1996  1997  1998  1999  2000  2001  2002
Production (million pounds)   80    90    92    83    87    96    100   110

Solution:
The necessary calculations are given below:

Col. 1   Col. 2             Col. 3         Col. 4           Col. 5                 Col. 6
Year     Production         4-year         4-year           2-values moving        4-year moving
         (million pounds)   moving total   moving average   total of Col. 4        average centred
1995     80
1996     90
                            345            86.25
1997     92                                                 174.25                 87.125
                            352            88.00
1998     83                                                 177.50                 88.750
                            358            89.50
1999     87                                                 181.00                 90.500
                            366            91.50
2000     96                                                 189.75                 94.875
                            393            98.25
2001     100
2002     110

Alternative Calculations:

Col. 1   Col. 2             Col. 3         Col. 4                 Col. 5
Year     Production         4-year         2-values moving        4-year moving average
         (million pounds)   moving total   total of Col. 3        centred (Col. 4 ÷ 8)
1995     80
1996     90
                            345
1997     92                                697                    87.125
                            352
1998     83                                710                    88.750
                            358
1999     87                                724                    90.500
                            366
2000     96                                759                    94.875
                            393
2001     100
2002     110
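The centring step of Example 16.9 can be reproduced with the following sketch. The helper name `centred_moving_averages` is hypothetical; it averages each adjacent pair of 4-year moving averages, which is the same as dividing the 2-values moving total of the 4-year totals by 8.

```python
def centred_moving_averages(values, period=4):
    """Centre an even-period moving average by averaging adjacent pairs."""
    averages = [sum(values[i:i + period]) / period
                for i in range(len(values) - period + 1)]
    return [(left + right) / 2 for left, right in zip(averages, averages[1:])]

production = [80, 90, 92, 83, 87, 96, 100, 110]     # data of Example 16.9, 1995 to 2002
centred = centred_moving_averages(production)

# The first centred value belongs to the third year of the series (1997).
for year, value in zip(range(1997, 1997 + len(centred)), centred):
    print(year, value)
```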

Example 16.10
The following data give the annual sales (in Rs. 000).

Year     1995  1996  1997  1998  1999  2000
Sales    50    60    75    73    80    85

Show by direct numerical calculation that the 2-year centred moving averages are equivalent to 3-year weighted moving averages with weights 1, 2, 1 respectively.
Solution:
The necessary calculations are given below:

Col. 1   Col. 2   Col. 3         Col. 4              Col. 5               Col. 6               Col. 7
Year     Sales    2-year         2-values moving     2-year moving        3-year weighted      3-year weighted
                  moving total   total of Col. 3     averages centred     moving total         moving averages
                                                     (Col. 4 ÷ 4)         (weights 1, 2, 1)    (weights 1, 2, 1)
1995     50
                  110
1996     60                      245                 61.25                245                  61.25
                  135
1997     75                      283                 70.75                283                  70.75
                  148
1998     73                      301                 75.25                301                  75.25
                  153
1999     80                      318                 79.50                318                  79.50
                  165
2000     85

Hence, 2-year centred moving averages are equivalent to 3-year weighted moving averages with weights 1, 2, 1 respectively.
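The equivalence shown in this example can be checked numerically with the sketch below. The function names are illustrative only. The 1997 sales figure of 75 is an inference from the 2-year moving totals 135 and 148 in the solution table (60 + 75 = 135 and 75 + 73 = 148), not a value read directly from the data.

```python
def centred_two_period(values):
    """2-period moving averages, centred by averaging adjacent pairs."""
    pair_means = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return [(p + q) / 2 for p, q in zip(pair_means, pair_means[1:])]

def weighted_three_period(values, weights=(1, 2, 1)):
    """3-period weighted moving averages with the given weights."""
    total_weight = sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + 3])) / total_weight
            for i in range(len(values) - 2)]

sales = [50, 60, 75, 73, 80, 85]   # 1995 to 2000; 75 is inferred, see above
print(centred_two_period(sales))
print(weighted_three_period(sales))
```

Algebraically, the mean of $(Y_1 + Y_2)/2$ and $(Y_2 + Y_3)/2$ is $(Y_1 + 2Y_2 + Y_3)/4$, which is why the two columns agree.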

16.5.4 METHOD OF LEAST SQUARES
This method has already been explained in detail in the chapter on regression analysis of this book. We examine the given time series to decide whether the trend is linear or not. If the difference between two adjacent values is almost the same in most of the cases, then the trend is linear and we fit a straight line to the time series.

16.5.5 FITTING A STRAIGHT LINE
The equation of a straight line is $Y = a + bX$, where 'a' and 'b' are unknowns to be determined. The variable X denotes time and Y denotes the dependent variable. Our job is to calculate the values of 'a' and 'b' from the given data. For this purpose we write the normal equations for 'a' and 'b', which are

$\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$

Putting in the values of the various summations and solving the equations simultaneously, we get the values of 'a' and 'b'. Putting these values in the equation $Y = a + bX$, we get the equation of the best fitted straight line. The fitted equation is written as $\hat{Y} = a + bX$, where $\hat{Y}$ denotes the trend values which make the best fitted line.
16.5.6 CODING OF THE TIME PERIODS
Time is the X-variable. If the given time series consists of years, the years themselves can be used as X, but this is never advisable. The years are assigned smaller values to simplify the calculations. This process is called coding. The initial period may be taken as 0, the next period as 1, and so on. Thus 0, 1, 2, 3, ... will be used as codes for the time periods, which may be years, quarters, months, weeks or any other period of time. It is further illustrated in the following columns.
Years   X       Years     X       Years   X   or  x       Years   X   or  x
1988    0       1988-89   0       1988    0       0       1951    0       0
1989    1       1989-90   1       1990    1       2       1961    1       10
1990    2       1990-91   2       1992    2       4       1971    2       20
1991    3       1991-92   3       1994    3       6       1981    3       30

Years   X       Years (unequal   X    or  x       Years and    X       Months     X
                interval)                         quarters
1988    0       1951             0        0       1988  I      0       January    0
1989    1       1961             1        10            II     1       February   1
1990    2       1972             2.1      21            III    2       March      2
*       3       1981             3.0      30            IV     3       April      3
1992    4                                         1989  I      4
                                                        II     5
* a missing value.
16.5.7 CHANGE OF ORIGIN IN CODING
The calculations can be further simplified if we use the coding such that $\sum X = 0$. When $\sum X = 0$, the normal equations are reduced to $\sum Y = na$ and $\sum XY = b\sum X^2$. From these equations we can very easily find the values of 'a' and 'b'. Taking $\sum X = 0$ becomes very important if we have to fit a 2nd degree curve or curves involving higher powers of X in the normal equations. To make $\sum X = 0$ for an odd number of periods, we take 0 against the middle period and the succeeding periods are taken as 1, 2, 3 and so on. The periods preceding the middle period are denoted by -1, -2, -3 and so on. If we have seven years, their codes will be -3, -2, -1, 0, 1, 2, 3 with $\sum X = 0$. For an even number of periods, we assume 0 in the centre of the two middle-most periods, and after 0 (we do not write 0) the codes will be 0.5, 1.5, 2.5 and so on. Before 0, the codes will be -0.5, -1.5, -2.5 and so on. The idea of change of origin is further explained in the following columns.
Odd no. of years      Even no. of years           Years with quarters     Months
Years    X            Years    X      or  x       Quarter    X
1989     -2           1989     -2.5       -5      I          -7           January
1990     -1           1990     -1.5       -3      II         -5           February
1991     0            1991     -0.5       -1      III        -3           March
1992     1            1992     0.5        1       IV         -1           April
1993     2            1993     1.5        3       I          1            May
                      1994     2.5        5       II         3            June
                                                  III        5
                                                  IV         7
$\sum X = 0$          $\sum X = 0$  $\sum x = 0$  $\sum X = 0$            $\sum X = 0$
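The coding rule that makes the codes add up to zero can be sketched as a small helper. The function `centred_codes` and its `double_even` flag are hypothetical names introduced for this illustration; the flag switches an even-length series from codes in whole-period units (-0.5, 0.5, ...) to codes in half-period units (-1, 1, ...).

```python
def centred_codes(n, double_even=False):
    """Codes for n equally spaced periods chosen so that they sum to zero."""
    middle = (n - 1) / 2
    codes = [i - middle for i in range(n)]
    if n % 2 == 0 and double_even:
        # Half-period units: -0.5 and 0.5 become -1 and 1, and so on.
        return [int(2 * c) for c in codes]
    if n % 2 == 1:
        return [int(c) for c in codes]
    return codes

print(centred_codes(7))                     # seven years
print(centred_codes(6))                     # six years, unit of X one year
print(centred_codes(6, double_even=True))   # six years, unit of X half a year
print(centred_codes(8, double_even=True))   # eight quarters
```

In every case the codes are symmetric about zero, so the sums of all odd powers of X vanish as well.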


Example 16.11
Fit a straight line with the help of the least squares method to the following data, taking the origin at the middle of the time period and the unit of measurement for X being one year.

Year                  1998  1999  2000  2001  2002
Sales (million Rs.)   28    32    40    44    56

Solution:
The equation of the straight line is $Y = a + bX$.
The normal equations are: $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$
The necessary calculations are given below:

Year     X      Y      XY      $X^2$
1998     -2     28     -56     4
1999     -1     32     -32     1
2000     0      40     0       0
2001     +1     44     +44     1
2002     +2     56     +112    4
Total    0      200    68      10

Since $\sum X = 0$, the normal equations become
$\sum Y = na$, so $a = \sum Y / n$, and $\sum XY = b\sum X^2$, or $b = \sum XY / \sum X^2$.
Substituting the values, we get
$a = 200/5 = 40$ and $b = 68/10 = 6.8$

Hence the fitted straight line is $\hat{Y} = 40 + 6.8X$ (origin at 2000, unit of X one year).
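As a cross-check, the two normal equations can be solved directly without assuming $\sum X = 0$. The sketch below uses a hypothetical helper `fit_line`; it reproduces a = 40 and b = 6.8 for the centred codes and shows that moving the origin to 1998 changes only the intercept.

```python
def fit_line(x, y):
    """Solve the two normal equations for a and b in Y = a + bX."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

sales = [28, 32, 40, 44, 56]                    # data of Example 16.11
a, b = fit_line([-2, -1, 0, 1, 2], sales)       # origin at 2000
a0, b0 = fit_line([0, 1, 2, 3, 4], sales)       # origin at 1998
print(a, b)
print(a0, b0)
```

The slope is the same under both codings; the intercept is the trend value at whichever year is coded 0.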


Example 16.12
Fit a linear trend equation by the method of least squares for the following time series, taking the origin at the middle of 1997 and 1998, the unit of X being half a year.

Year                 1995  1996  1997  1998  1999  2000
Profits (0000 Rs.)   4     6     7     5     8     12

Also compute the sum of the residuals and the sum of squares of the residuals.

Solution:
The equation of the linear trend is $\hat{Y} = a + bX$.
The normal equations are: $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$
The necessary calculations are given below:

Year     X      Y      XY      $X^2$    $\hat{Y} = 7 + 0.63X$    $(Y - \hat{Y})$    $(Y - \hat{Y})^2$
1995     -5     4      -20     25       3.85                     0.15               0.0225
1996     -3     6      -18     9        5.11                     0.89               0.7921
1997     -1     7      -7      1        6.37                     0.63               0.3969
1998     +1     5      +5      1        7.63                     -2.63              6.9169
1999     +3     8      +24     9        8.89                     -0.89              0.7921
2000     +5     12     +60     25       10.15                    1.85               3.4225
Total    0      42     44      70       42                       0                  12.343

Since $\sum X = 0$, the normal equations will be
$\sum Y = na$, or $a = \sum Y / n$, and $\sum XY = b\sum X^2$, or $b = \sum XY / \sum X^2$.
Substituting the values, we get
$a = 42/6 = 7$ and $b = 44/70 = 0.63$

Hence the fitted linear trend is $\hat{Y} = 7 + 0.63X$.
The trend values are computed from the above equation by substituting X = -5, -3, -1, 1, 3, 5, and are shown in the above table.
Hence the sum of residuals is $\sum (Y - \hat{Y}) = 0$ and the sum of squares of residuals is $\sum (Y - \hat{Y})^2 = 12.343$.
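The residual calculations of this example can be repeated with the unrounded slope. This is a sketch using plain Python lists; the variable names are illustrative. With b = 44/70 kept at full precision, the sum of squares of residuals comes to about 12.3429, which agrees with the 12.343 obtained above from the rounded slope 0.63.

```python
x = [-5, -3, -1, 1, 3, 5]          # half-year units, origin between 1997 and 1998
y = [4, 6, 7, 5, 8, 12]            # profits, data of Example 16.12

# These shortcut formulas are valid only because sum(x) == 0.
a = sum(y) / len(y)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

trend = [a + b * xi for xi in x]
residuals = [yi - ti for yi, ti in zip(y, trend)]
sum_residuals = sum(residuals)
sum_squares = sum(r * r for r in residuals)
print(a, round(b, 4), round(sum_residuals, 10), round(sum_squares, 4))
```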

Example 16.13
If the straight line fitted to the data for the years 1995 to 2000 (both inclusive) with origin at the middle of 1997 and 1998 is $\hat{Y} = 195 + 8.5X$, the unit of X being 1/2 year, determine the trend values for the years 1995 to 2000. Also determine the straight line by shifting the origin to 1995.
Solution:

Year     X      $\hat{Y} = 195 + 8.5X$      u
1995     -5     195 + 8.5(-5) = 152.5       0
1996     -3     195 + 8.5(-3) = 169.5       1
1997     -1     195 + 8.5(-1) = 186.5       2
1998     +1     195 + 8.5(+1) = 203.5       3
1999     +3     195 + 8.5(+3) = 220.5       4
2000     +5     195 + 8.5(+5) = 237.5       5

When we have to shift the origin to 1995, the years are coded on the new scale as 0, 1, 2, 3, 4 and 5. Let u denote these values. There is a relation between X and u; clearly $X = 2u - 5$. In the equation $\hat{Y} = 195 + 8.5X$, X is replaced by $2u - 5$ and we get the equation in which the origin is at 1995. Thus $\hat{Y} = 195 + 8.5(2u - 5) = 195 + 17u - 42.5 = 152.5 + 17u$.
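The substitution used to shift the origin can be written as a small helper. The function name `shift_origin` and its parameters are hypothetical; it rewrites $Y = a + bX$ in terms of a new code u whenever the old code is a linear function of the new one.

```python
def shift_origin(a, b, scale, offset):
    """Rewrite Y = a + bX in terms of u, where X = scale * u + offset."""
    return a + b * offset, b * scale

# X = 2u - 5 moves the origin from mid 1997-98 (half-year units) to 1995 (yearly units).
a_new, b_new = shift_origin(195, 8.5, scale=2, offset=-5)
trend = [a_new + b_new * u for u in range(6)]   # trend values for 1995 to 2000
print(a_new, b_new)
print(trend)
```

The trend values are unchanged by the shift; only the intercept and the unit of the slope differ.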

16.6 FITTING OF SECOND DEGREE PARABOLA
We fit a straight line to the given time series when the Y-values move in a linear manner, i.e. the increase or decrease per given time period is almost constant. If the increase or decrease does not show this pattern, we do not fit the straight line. In that case we fit a curve to the data. A simple curve to be discussed here is the second degree parabola:

$Y = a + bX + cX^2$

It is also called the second degree curve. For fitting this curve, three unknowns 'a', 'b' and 'c' are to be estimated from the given data. This is done by means of the following three normal equations:

$\sum Y = na + b\sum X + c\sum X^2$
$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$
$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4$

The solution of these equations requires a lot of computational work. To reduce the computations, we use codes for the time series such that $\sum X = \sum X^3 = 0$. When $\sum X$ and $\sum X^3$ are zero, the normal equations are reduced to

$\sum Y = na + c\sum X^2$ ...... (1)
$\sum XY = b\sum X^2$ ...... (2)
$\sum X^2 Y = a\sum X^2 + c\sum X^4$ ...... (3)

Equation (2) directly gives the value of 'b'. Equations (1) and (3) are solved simultaneously to get the values of 'a' and 'c'. Having calculated the values of 'a', 'b' and 'c', we put them in the equation and get the equation of the fitted curve, written as $\hat{Y} = a + bX + cX^2$. The origin must be mentioned with the fitted equation.

Example 16.14
Fit a second degree parabola to the following results for the years 1998 to 2002, both inclusive:
$\sum X = 0$, $\sum Y = 250$, $\sum XY = 600$, $\sum X^2 = 150$, $\sum X^3 = 0$, $\sum X^2 Y = 8200$, $\sum X^4 = 5500$.
Solution:
The equation of the second degree parabola is $Y = a + bX + cX^2$.
The normal equations are:

$\sum Y = na + b\sum X + c\sum X^2$
$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$
$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4$

Since $\sum X = \sum X^3 = 0$, the normal equations become

$\sum Y = na + c\sum X^2$
$\sum XY = b\sum X^2$
$\sum X^2 Y = a\sum X^2 + c\sum X^4$

Substituting the values, we get

$250 = 5a + 150c$ ...... (1)
$600 = 150b$ ...... (2)
$8200 = 150a + 5500c$ ...... (3)

From equation (2), we get $b = 600/150 = 4$.
To solve equations (1) and (3), we multiply equation (1) by 30 and subtract it from equation (3):

$8200 = 150a + 5500c$
$7500 = 150a + 4500c$
$700 = 1000c$, so $c = 700/1000 = 0.7$

Substituting $c = 0.7$ in equation (1), we get
$250 = 5a + 150(0.7)$, or $5a = 250 - 105 = 145$, or $a = 145/5 = 29$

Hence the fitted second degree parabola is $\hat{Y} = 29 + 4X + 0.7X^2$.
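The elimination carried out in this example can be sketched as a short function. The name `fit_parabola_centred` and its argument names are hypothetical; the function solves only the reduced normal equations and is therefore valid only when the coding makes $\sum X = \sum X^3 = 0$.

```python
def fit_parabola_centred(n, sum_y, sum_xy, sum_x2, sum_x2y, sum_x4):
    """Solve the reduced normal equations, valid when sum(X) = sum(X^3) = 0."""
    b = sum_xy / sum_x2
    # Eliminate a from equations (1) and (3) to obtain c, then back-substitute.
    c = (n * sum_x2y - sum_x2 * sum_y) / (n * sum_x4 - sum_x2 ** 2)
    a = (sum_y - c * sum_x2) / n
    return a, b, c

# Sums given in Example 16.14 (five years, 1998 to 2002).
a, b, c = fit_parabola_centred(n=5, sum_y=250, sum_xy=600,
                               sum_x2=150, sum_x2y=8200, sum_x4=5500)
print(round(a, 4), round(b, 4), round(c, 4))
```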


SHORT DEFINITIONS
Time Series
A time series is the measurement of a variable at regular intervals of time.
or
An arrangement of statistical data with respect to their time of occurrence is called a time series.
Histogram
A histogram is a vertical bar chart in which the rectangular bars are constructed at the boundaries of each class.
Historigram
A graph of a time series or historical series is called a historigram.
Signal
Signal is the sequence following a regular pattern of variations.
Noise
Noise is the sequence following an irregular pattern of variations.
Secular Trend
Secular trend in a time series is a long-term, smooth, underlying pattern of change from time to time in the series.
or
A long-term increase or decrease in a time series in which the rate of change is relatively constant is called secular trend.
Seasonal Variation
Seasonal variation is the repetitive pattern of variation occurring within a year in a time series.
or
A pattern that is repeated throughout a time series and has a recurrence period of at most one year is called seasonal variation.
Cyclical Variation
The cyclical variation of a time series is the wavelike or oscillating pattern about the trend that is attributable to business and economic conditions at the time. It is also known as a business cycle.
or
A pattern within the time series that repeats itself throughout the time series and has a recurrence period of more than one year.
Irregular Variation
Irregular variation in a time series is composed of changes that cannot be described as secular trend, cyclical variation, or seasonal variation.
or
Changes in the time series data that are unpredictable and cannot be associated with the secular trend, seasonal variation, or cyclical variation are called irregular variation.
Business Cycle
A business cycle has four stages:
(i) Prosperity or Boom: When production of a thing is maximum, this stage is called the prosperity stage.
(ii) Recession: When production of a thing is decreasing, this stage is called recession.
(iii) Depression: When the production of a thing is minimum, this stage is called depression.
(iv) Recovery: When the production of a thing is increasing towards prosperity, this stage is called the recovery stage.
Multiplicative Time Series Model
A model whereby the separate components of the time series are multiplied together to identify the actual time series value. When the four components of a time series are assumed to be present in a multiplicative form, the model is called a multiplicative time series model, that is $Y = TSCI$.
Additive Time Series Model
When the four components of a time series are assumed to be present in an additive form, that is $Y = T + S + C + I$, the model is called an additive time series model.
Analysis of Time Series
Analysis of time series is the study of the various components that are present in a time series. By analysis of time series, we mean concentrating on finding the effect of one component after eliminating the effects of the other three components from the time series data.
Graphic or Freehand Method
The given data are plotted on a graph paper and a trend line is fitted to the data just by inspection.
Semi-Average Method
The data for which the trend values are to be computed are divided into two equal parts. If there is an odd number of years, the middle year is left out and the two equal parts are formed. The average is computed for each part and is written against the midpoint of that part. These two averages are shown on the graph along with the original values, and a straight line is drawn through these two points which describes the trend.
Moving Average Method
A method of forecasting or smoothing a time series by averaging each successive group of data points is called the moving average method.
or
The successive averages of 'n' consecutive values in a time series are known as moving averages.
Method of Least Squares
The method of least squares is a widely used method of fitting a curve to data and is the most popular method of computing the secular trend of a time series. The method locates the trend value for the middle of the time period at the average for the data to which the trend line is being fitted. The objective of the least squares method is to minimize the sum of squares of residuals.
Residual
Residual is the difference between the actual value of the time series and the forecast value. It is also called the forecast error.
Principle of Least Squares
According to the principle of least squares, "the best or most plausible value of any observed quantity is that for which the sum of squares of residuals is least".
Forecasting
The process of predicting the magnitude that a variable will assume in future is called forecasting.
GIVEN THE FOLLOWING STATEMENTS, LINK THEM WITH THE MOVEMENTS OF THE TIME SERIES
1. Decrease in death rates in Pakistan.
Ans. Secular trend
2. Increase in prices of shoes for children before Eid.
Ans. Seasonal
3. Non-availability of transport due to heavy rains.
Ans. Irregular
4. Shortage of sugar due to strikes in sugar mills.
Ans. Irregular
5. Depression in business.
Ans. Cyclical
6. Decrease in prices of cold drinks in winter.
Ans. Seasonal
7. Demand for umbrellas during rainy season.
Ans. Seasonal
8. Increase in literacy rate in Pakistan.
Ans. Secular trend
9. Development in agricultural sector in Pakistan.
Ans. Secular trend
10. Increase in population in Pakistan.
Ans. Secular trend
11. Break in supply of fruits and vegetables due to heavy rains.
Ans. Irregular
12. Attrrnriance cf chiliii,:ir itr i;irc s,;irutl .lLrr LLl ti,"ir:.
Ans. Seasonal
13. Number of marriages during the months of April and October.
Ans. Seasonal
14. Number of vehicles on the roads in a city at school time.
Ans. Seasonal
15. Number of vehicles on the roads in a city at 2 P.M. in summer season.
Ans. Seasonal
16. Damage to the crops due to floods.
Ans. Irregular
17. Increase in stationery items in the start of the new academic session.
Ans. Seasonal
18. Fashion in the dress.
Ans. Secular trend
19. Number of vehicles on the roads.
Ans. Secular trend
20. Increase in the general level of prices.
Ans. Secular trend

MULTIPLE CHOICE QUESTIONS
1. The graph of time series is called:
(a) histogram (b) straight line
(c) historigram (d) ogive
2. An orderly set of data arranged in accordance with their time of occurrence is called:
(a) arithmetic series (b) harmonic series
(c) geometric series (d) time series
3. Secular trend is measured by the method of semi-averages when:
(a) time series based on yearly values (b) trend is linear
(c) time series consists of even number of values (d) none of them
4. Increase in the number of patients in the hospital due to heat stroke is:
(a) secular trend (b) irregular variation
(c) seasonal variation (d) cyclical variation

5. The systematic components of time series which follow a regular pattern of variations are called:
(a) signal (b) noise
(c) additive model (d) multiplicative model
6. The unsystematic sequence which follows an irregular pattern of variations is called:
(a) noise (b) signal
(c) linear (d) non-linear
7. In time series seasonal variations can occur within a period of:
(a) four years (b) three years
(c) one year (d) nine years
8. Wheat crops badly damaged on account of rains is:
(a) cyclical movement (b) random movement
(c) secular trend (d) seasonal movement
9. In a straight line equation Y = a + bX; a is the:
(a) X-intercept (b) slope
(c) Y-intercept (d) none of them
10. In a straight line equation Y = a + bX; b is the:
(a) Y-intercept (b) slope
(c) X-intercept (d) trend
11. A second degree parabola is fitted to the time series when the variations are:
(a) linear (b) non-linear
(c) upward (d) downward
12. If a straight line is fitted to the time series, then:
(a) $\sum Y = \sum \hat{Y}$ (b) $\sum Y < \sum \hat{Y}$
(c) $\sum Y > \sum \hat{Y}$ (d) $\sum (Y - \hat{Y})^2 = 0$
18. Mr:ving averasu rncthorl ie used for rneaeurement of trend when;
(b) trend ie non linear
p
(a) trencl is linenr

s
(e) lrc'n.l ir, t:trrvilrltcltt' (d) ngne of thcm

g
14, Wlten tire l,r:end i$ of rnlioRential typc, the merving avcragcs are to be eomputcd

o
lry uriing:
(a) ariihrnetiq: nlean

bl
(b) geometrie mcan

.
(o) harmonie mrran (d) wcightcd mcaR

3
16, The lorrg tr:r'iri tt'urld rif a tirne senen graph appcars to bc:

4
(u) r;trllglii =lir-r,-r (b) upward

9
(e) dorr,nrvtrrel (d) par:abolie eurve or third degrce eurve

9
16. Indicate which of the following is an example of seasonal variations:
(a) Death rate decreased due to advance in science
(b) The sale of air conditioners increases during summer
(c) Recovery in business
(d) Sudden causes by wars
17. The most commonly used mathematical method for measuring the trend is:
(a) moving average method (b) semi average method
(c) method of least squares (d) none of them
18. A trend is the better fitted trend for which the sum of squares of residuals is:
(a) maximum (b) minimum
(c) positive (d) negative
19. Decomposition of time series is called:
(a) historigram (b) analysis of time series
(c) histogram (d) detrending
20. The fire in a factory is an example of:
(a) secular trend (b) seasonal movements
(c) cyclical variations (d) irregular variations
21. Increased demand of admission in the subject of computer in Pakistan is:
(a) secular trend (b) cyclical trend
(c) seasonal trend (d) irregular trend
22. Damages due to floods, droughts, strikes, fires and political disturbances are:
(a) trend (b) seasonal
(c) cyclical (d) irregular
23. The general pattern of increase o.r decreast: itl ,:':r.',r:- l ,'.: r i.'
is shown by;
(a) seasonal trend (b) cyclical trend
(c) secular trend (d) irregular trend
24. In moving average method, we can not find the trend values of some:
(a) middle periods (b) end periods
(c) starting periods (d) starting and end periods
25. The best fitting trend is one in which the sum of squares of residuals is:
(a) negative (b) least
(c) zero (d) maximum
26. In fitting of a straight line, the value of slope b remains unchanged with the change of:
(a) scale (b) origin
(c) both origin and scale (d) none of them
27. Depression in business is:
(a) secular trend (b) cyclical
(c) seasonal (d) irregular
28. In fitting of straight line Σ(Y - Ŷ)² = 0 when:
(a) all the observed Y values lie on the line
(b) all the Y values are greater than the corresponding Ŷ values
(c) all the Y values are positive (d) none of them
29. Semi-averages method is used for measurement of trend when:
(a) trend is linear
(b) observed data contains yearly values
(c) the given time series contains odd number of values
(d) none of them
30. Moving averages:
(a) give the trend in a straight line (b) measure the seasonal variations
(c) smooth-out the time series (d) none of them
31. The rise and fall of a time series over periods longer than one year is called:
(a) secular trend (b) seasonal variation
(c) cyclical variation (d) irregular variation
32. A time series has:
(a) two components (b) three components
(c) four components (d) five components
33. The multiplicative time series model is:
(a) Y = T + S + C + I (b) Y = TSCI
(c) Y = a + bX (d) Y = a + bX + cX²
). .::,t .., it,,, .,ti ;'t ..,.i,: j ,.,,i'li:t: lirnC Srjrles iS:

:i1. . ,,,.' :i.iii'-::iiil'.r irtrl,'y't*:cn tilc actuai value of the time series and the forecasted
. ..'i.1, .

.
m
:.., ,ir,:r:. sum of residual il:
,:'.

o
tr,ir.rij-l rl r"esiciual (d) all of the above
,..:;.::;i::.i..{1:_

c
-:'i., ,',, ,1s,1 ;;:.: 1]:"' ;.:. li,;i.,atr.1 thror.rghout a time serieS and has a recurrence

.
1.1

t
:.;i:ri'..J.rl ,li r.i ;,.;Csi tno yeaf is called:

o
', . r ' ,t .^*-j',:l (b) irregular variation

' : -::..,.,.,,:,,,..'..1,i,. i'.,;i1..

s p
g
; ri.r:!-,t_, two stages (it)

o
,r-i:-;(a

l
i .r ,l''.:.;,-. ri;i_?r-';r four stages (d)

b
'r.:, r',, ! 1r'1 i.h* ptcil-ilf;ii,rn of a thing is iaaximum, this stage is called:

..r:.' l:li;:r: (b) recovery


3 .
4
, ,', ;.';,.;1 -,:t (d) depresSiOn
.;l:r ''
9
.,,,ii,.;ii,p:'l;,.iuci;on cla thing is minimum, this stage is called:

9
;l:';:ri'll.'':i (b)

t
recession
r'.:r.r{i!,:rjr.1r (d) depression
),'.. '. : 'r, i:1. 1',,1 ;;
ta
" .",-: ,.1 i. ihing is increasing tcwards prosperity this stage is

/: / s
11_rr I.r a.:.

i . i -,,r

s
' 'i (d) depression

tt p
i'i. of a thing is decreasin!, this stage is called as:
',r:',,fr,1 ,,i::, i.r..'.,:i ,.;rr..rlr1
ft) recovery

h
,.: , r,'r,si.;.;.i{,..,, (d) depression
j:. ';'ir: sr,riiight i:re rs fitteC tr: the time series when the movements in the time
.,.:'i'r,r :1l'C:
, i r:cnline ar'
,: linear (b)
!.. i:'r',:{.;1,.i' upward (d)
,i:1. ii';,rr hrror-l:ri iiincseries consisting of even number of years is coded, then each
,, .. ,, l.lai.ii-,,;.;i iS r-rirUal iO:
, . r:,:,,'",.;,L (b) one year rrr:

r:,, '-:i;:i, .:ii :rr{ il.,i (c1) two years


',
,t.:, r,, .,,,.,-,,,,,,.,-i-,r::'ri: j:,r.urzr,ilole is fitted to thg tirne series when the variations are:
!.. i:r:,1p1. (b) nonlinear
45. For odd number of years, formula to code the values of X by taking origin at centre is:
(a) X = year - average of years (b) X = year - first year
(c) X = year - last year (d) X = year - 1/2 average of years
46. For even number of years when origin is in the centre and the unit of X being one year, then X can be coded as:
(a) X = (year - average of years)/2 (b) X = year - average of years
(c) X = year - 0.5 average of years (d) X = average of years - year
47. For even number of years when origin is in the centre and the unit of X being half year, then X can be coded as:
(a) X = year - average of years (b) X = 2(year - average of years)
(c) X = (year - average of years)/2 (d) X = year - 1/2 average of years
48. In semi averages method, if the number of values is odd then we drop:
(a) first value (b) last value
(c) middle value (d) middle two values
49. The trend values in freehand curve method are obtained by:
(a) equation of straight line (b) graph
(c) second degree parabola (d) all of the above
50. ΣX = ΣX³ = 0, if origin is:
(a) at the end of time period (b) any where
(c) at the middle of time period (d) at the beginning of time period
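The coding rules in Questions 45 to 47, and the property ΣX = ΣX³ = 0 in Question 50, can be checked numerically. The following is only an illustrative sketch: the function name `code_years` and the sample years are assumptions made for the illustration and do not come from the text.

```python
def code_years(years, unit=1.0):
    """Code X with the origin at the centre of the time period.

    unit=1.0 measures X in years: X = year - average of years.
    unit=0.5 measures X in half years: X = 2(year - average of years).
    """
    centre = sum(years) / len(years)  # average of years
    return [(y - centre) / unit for y in years]

# Hypothetical sample periods, chosen only for illustration
odd = code_years([1996, 1997, 1998, 1999, 2000])
even_year = code_years([1997, 1998, 1999, 2000])
even_half = code_years([1997, 1998, 1999, 2000], unit=0.5)

print(odd)        # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(even_year)  # [-1.5, -0.5, 0.5, 1.5]
print(even_half)  # [-3.0, -1.0, 1.0, 3.0]
print(sum(odd), sum(x ** 3 for x in odd))  # 0.0 0.0
```

With the origin at the middle, the coded values are symmetric about zero, which is why the sums of odd powers of X vanish.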

1. (c)   2. (d)   3. (b)   4. (c)   5. (a)   6. (a)   7. (c)   8. (b)
9. (c)   10. (b)  11. (b)  12. (a)  13. (a)  14. (b)  15. (d)  16. (b)
17. (c)  18. (b)  19. (b)  20. (d)  21. (a)  22. (d)  23. (c)  24. (d)
25. (b)  26. (b)  27. (b)  28. (a)  29. (a)  30. (c)  31. (c)  32. (c)
33. (b)  34. (a)  35. (a)  36. (c)  37. (d)  38. (a)  39. (d)  40. (b)
41. (a)  42. (b)  43. (c)  44. (b)  45. (a)  46. (b)  47. (b)  48. (c)
49. (b)  50. (c)

SHORT QUESTIONS

1. Given Y = 127, 101, 130, 132, 126, 142, 138 and Ŷ = 116, 120, 124, 128, 132, 136, 140. Find e = (Y - Ŷ).
Ans. 11, -19, 6, 4, -6, 6, -2
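The residuals in Question 1 are simply the observed values minus the trend values. A minimal sketch of the arithmetic is given below; the variable names are illustrative assumptions, while the data are those of the question.

```python
y = [127, 101, 130, 132, 126, 142, 138]      # observed values Y
y_hat = [116, 120, 124, 128, 132, 136, 140]  # trend values

# Residual e = Y - trend value, taken pair by pair
residuals = [obs - fit for obs, fit in zip(y, y_hat)]

print(residuals)                      # [11, -19, 6, 4, -6, 6, -2]
print(sum(residuals))                 # 0
print(sum(e * e for e in residuals))  # 610
```

The residuals add to zero here, as expected for a least squares line, and their sum of squares is the quantity that the method of least squares makes as small as possible.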
2. Given Ŷ = 128 + 4X and X = -3, -2, -1, 0, 1, 2, 3. Find ΣŶ.
Ans. 896
3. Given EX = 0, XY = IXY = XX2 = 330. Dcterrnine the value of
-n"

Ans. b = 0.09
4. Given ΣX = 0, ΣY = 4112 and n = 10. Find the value of Y-intercept a.
Ans. a = 411.2
5. Given e = 1, -0.5, -0.5, 1, -1, 0.5, -0.5. Find sum of squares of residuals.
Ans. Σ(Y - Ŷ)² = 4
6. Given Y = 6, 8, 10, 12, 14, X = 0, 1, 2, 3, 4, and Ŷ = 6 + 2X. Compute the sum of residuals.
Ans. Σ(Y - Ŷ) = 0
7. If Y = 16, 18, 20, 22, 24, X = -2, -1, 0, 1, 2, and Ŷ = 20 + 2X. Compute the sum of squares of residuals.
Ans. Σ(Y - Ŷ)² = 0
8. Given ΣX = 0, ΣY = 245, ΣXY = 66, ΣX² = 28 and n = 7. Fit a linear trend.
Ans. Ŷ = 35 + 2.4X
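The linear trend in Question 8 follows from the two normal equations of the least squares line, ΣY = na + bΣX and ΣXY = aΣX + bΣX². The sketch below solves them from the given sums; the function name `fit_from_sums` is a hypothetical helper introduced only for this illustration.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve the normal equations of Y = a + bX for a and b."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Sums given in Question 8
a, b = fit_from_sums(n=7, sum_x=0, sum_y=245, sum_xy=66, sum_x2=28)
print(a, round(b, 1))  # 35.0 2.4
```

When ΣX = 0 the formulas reduce to a = ΣY/n and b = ΣXY/ΣX², which is why the origin is usually placed at the middle of the time period.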
9. Suppose that a corporation finds that a linear trend for its sales is Ŷ = 5 + 0.1X, where Y is the firm's monthly sales (in million rupees) and X is measured in months from January 1982. Based on this trend alone, what are the forecasted sales for the firm in February 1990?
Ans. If January 1982 is X = 0, then February 1990 is X = 97. Thus, the forecasted value of sales is Ŷ = 5 + 0.1(97) = 5 + 9.7 = 14.7 million rupees.
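The only step in Question 9 that needs care is counting the months from the origin. A small sketch of that count and of the forecast follows; the helper name `months_since` is an assumption made for illustration.

```python
def months_since(year, month, base_year=1982, base_month=1):
    """Number of months elapsed since the origin (January 1982 is X = 0)."""
    return (year - base_year) * 12 + (month - base_month)

x = months_since(1990, 2)  # February 1990
forecast = 5 + 0.1 * x     # trend line: 5 + 0.1X
print(x, round(forecast, 1))  # 97 14.7
```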

10. Suppose the following data represent the 7-year moving totals over the 11-year period 1990 to 2000:
56, 70, 84, 98, 112. Compute the 7-year moving averages.
Ans. 8, 10, 12, 14, 16

11. Given the following data:
Year    1987  1988  1989  1990  1991  1992
Value   207   210   216   213   220   218
Applying the method of semi-averages, the trend values are 209, 211, 213, 215, 217 and 219. Write the equation of the trend line taking 1987 as origin.
Ans. Ŷ = 209 + 2X
12. Suppose the least-squares trend line to an annual time series containing 10 observations from 1991 to 2000 on real total revenues (in million rupees) is Ŷ = 3 + 1.2X (1991 as origin). Interpret the Y intercept 'a' and slope 'b' in this linear trend model.
Ans. The Y intercept a = 3 is the fitted trend value reflecting the real total revenues during the origin 1991. The slope b = 1.2 indicates that the real total revenues are increasing at a rate of 1.2 million rupees per year.
13. Suppose that the least-squares trend line to an annual time series containing 15 observations from 1985 to 1999 on real net sales is Ŷ = 2.4 + 0.5X (1985 as base year). What is the fitted trend value for this time series on real net sales for the 5th year?
Ans. Ŷ = 2.4 + 0.5(4) = 2.4 + 2 = 4.4
14. Suppose that the least squares trend line to an annual time series containing 7 observations from 1987 to 1993 on real total profits (in thousand rupees) is Ŷ = 600 + 75X (1990 as origin). What is the fitted trend value for this time series on real total profits for the most recent recorded year?
Ans. Ŷ = 600 + 75(3) = 825
15. Suppose that the least squares trend line to an annual time series containing 7 observations from 1980 to 1986 on the world production of gold (in million ounces) is Ŷ = 128 + 4X (1983 as origin). What is the trend forecast for this time series on the world production of gold 3 years after the last recorded value on total production of gold?
Ans. Ŷ = 128 + 4(6) = 152

16. Distinguish between short term and long term variations.
17. Write down the various methods of measuring secular trend in a time series.
18. What are the different components of a time series?
19. Define a time series.
20. What is meant by analysis of time series?
21. Distinguish between regular and irregular fluctuations in time series.
22. Define cyclical movements.
23. Distinguish between additive model and multiplicative model of a time series.
24. Explain the term secular trend.
25. What is meant by seasonal variations?
26. What is meant by business cycle?
27. What do you understand by parabolic trend? How would you fit a parabola of the second degree to a time series to obtain trend values?
28. Differentiate between histogram and historigram.
29. Define the method of semi-averages.
30. Explain the method of least squares in a time series.
31. Differentiate between time series and analysis of time series.
32. Differentiate between signal and noise.
33. Discuss the meaning and purpose of moving averages.
34. Describe the different components of time series.
35. Distinguish between secular trend, seasonal variations and cyclical variations.
36. Define irregular variations.

EXERCISES
1. Plot the following data on a graph paper and draw the trend passing through the given data by free-hand drawing and read the trend values.
Years 198 I 1987 1988

m
Sales oo

o
(Lakhs of Rs.)

t . c
o Id
for the followirrg data.

ip
s
n
I'
1985 1988 1989 i990

g
I EATS 1986 I 987 199
(). t'

Values 874 tO24 168 1405 1664


l o
1958 2258

b
1t''

Ans. 787.5 , lA22.O, 1256.5, 1491.0,

3 .
tr-7hb.b, i 960.0, Zlgt'.5,

4
t =787.5 +234.F-tX (198b=0)

9
3. Plot the following data on a graph paper and compute trend values by the semi-averages method.

a t
l
1989 199C 1S9i 1992

t
I 1993
Ev{s
s
v ///

/: /
I

I
280 285 285 279 295

s
Ans.307:45,304.15,300.E5, 297.55,294.25,290.gi1, 287.6b,284..9b,281.0b

tt p
4. Applying the method of semi-averages for the following data, determine the trend values and also write the equation of the trend line.
Year    1987  1988  1989  1990  1991  1992
Value   207   210   216   213   220   218

Ans. 209, 211, 213, 215, 217, 219; Ŷ = 209 + 2X (1987 as origin)


5. Obtain the trend values for the following data by using the method of semi-averages. Construct a graph illustrating the results obtained.
Years 1981 1982 1983 1984 1985 1986 1987 1988 i989 1990
Consumption of cotton
706 854 886 815 aq1 761 805 797 /40 nq,7
(thousands of bales)

Ans. 837.76, 827.68, 817.60, 807.52, 797.44, 787.36, 777.28, 767.20, 757.12, 747.04

6. Compute 3-year and 5-year moving averages from the following data:
Year            1982  1983  1984  1985  1986  1987  1988  1989
Factory Sales   6.2   7.8   8.3   9.3   8.6   7.8   8.1   7.9
(in millions)
Ans: 7.43, 8.47, 8.73, 8.57, 8.17, 7.93 and 8.04, 8.36, 8.42, 8.34
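The moving averages asked for in Exercise 6 can be sketched as follows. The function name `moving_average` is an illustrative assumption, and the sales figures are the ones read from the exercise table (6.2, 7.8, 8.3, 9.3, 8.6, 7.8, 8.1, 7.9), which reproduce the printed answers.

```python
def moving_average(values, k):
    """k-period moving averages, rounded to two decimal places."""
    return [round(sum(values[i:i + k]) / k, 2)
            for i in range(len(values) - k + 1)]

sales = [6.2, 7.8, 8.3, 9.3, 8.6, 7.8, 8.1, 7.9]  # 1982 to 1989

ma3 = moving_average(sales, 3)
ma5 = moving_average(sales, 5)
print(ma3)  # [7.43, 8.47, 8.73, 8.57, 8.17, 7.93]
print(ma5)  # [8.04, 8.36, 8.42, 8.34]
```

A k-period moving average of n values yields n - k + 1 averages, so no trend value is obtained for some periods at the start and at the end of the series.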
7. The data given below represent the annual number of employees (in thousands) in an oil supply company for the years 1982 to 1991.
Year 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991

c o
t .
,

t?a t.77 1.90 i.82 1.65 1.73 1.88 2.00 2.08, 1.88
Number of employees

o
(in thousancis)

p
Compute 7-year moving averages to the data and plot the actual and trend values on the same graph paper.
Ans: 1.78, 1.82, 1.87, 1.86

8. Compute 5-yearly moving averages of students in a college as shown by the following figures:
Year                 1992  1993  1994  1995  1996  1997  1998  1999  2000
Number of students   1332  1397  1457  1592  1662  1805  1910  2027  2050
Ans. 1488.0, 1582.6, 1685.2, 1799.2, 1890.8

t
9. Compute 4-year centred moving average for the following time series:
Year 1993 1994 1995 1996 1997 1998 1999 2000

/: / s
Production aq1 34,1 349 AL7 /, 364 395 400 410
(in million kilograrns)

s
Ans: 343.125, 353.625, 366.375, 382.500

tt p
10. Compute four-quarterly moving averages of the data for 2000 to 2001. Compare the moving averages with the original data plotting both on the same graph paper.
Quarier
Year r
I ii TTT
t _t,t IV
2t\ aa 45
2000 20
O/l 30 36 48
2001
Ans:31.5, 32.5, 33.4, 34.1 :

11. Measure the secular trend by calculating 2-yearly moving average centred for the following time series:
Year    1994  1995  1996  1997  1998  1999  2000
Value   170   250   325   425   520   600   720

Ans: 248.75, 331.25, 423.75, 516.25, 610.00
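When the period of the moving average is even, the averages fall between two time points and have to be centred by averaging adjacent pairs. The sketch below applies this to Exercise 11; the function name `centred_moving_average` is an assumption, and the values are those read from the exercise table (170, 250, 325, 425, 520, 600, 720).

```python
def centred_moving_average(values, k):
    """k-period moving averages (k even), centred on the time points."""
    ma = [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]
    # Each centred value is the mean of two adjacent moving averages
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

values = [170, 250, 325, 425, 520, 600, 720]  # 1994 to 2000
centred = centred_moving_average(values, 2)
print(centred)  # [248.75, 331.25, 423.75, 516.25, 610.0]
```

The first centred value belongs to 1995 and the last to 1999, so the first and last years have no trend value.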


12. Compute 7-day moving averages for the following record of attendance:
Plot the actual and trend values on the same graph paper.
Ans. 46.14, 46.71, 47.00, 48.57, 47.71, 47.14, 45.14, 42.29

c
13. Fit a straight line Y = a + bX to the following data taking the origin at the middle of the time period and unit of measurement for X being one year.
Year                1996  1997  1998  1999  2000
Profits (000 Rs.)   50    70    80    120   160
Ans: Ŷ = 96 + 27X
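The straight line in Exercise 13 can be fitted step by step as sketched below. The variable names are illustrative, and the profits are the ones read from the exercise table (50, 70, 80, 120, 160). Because the origin is at the middle of the period, ΣX = 0 and the normal equations reduce to a = ΣY/n and b = ΣXY/ΣX².

```python
years = [1996, 1997, 1998, 1999, 2000]
profits = [50, 70, 80, 120, 160]  # in thousand rupees

middle = sum(years) / len(years)       # origin at the middle: 1998
x = [yr - middle for yr in years]      # coded X: -2, -1, 0, 1, 2

a = sum(profits) / len(profits)        # a = sum(Y) / n
b = (sum(xi * yi for xi, yi in zip(x, profits))
     / sum(xi * xi for xi in x))       # b = sum(XY) / sum(X^2)
trend = [a + b * xi for xi in x]

print(a, b)   # 96.0 27.0
print(trend)  # [42.0, 69.0, 96.0, 123.0, 150.0]
```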

l o
14. For the time series given below, relating to the world production of gold, compute the trend values for each year by fitting a straight line.
Year
Production (in million pounds)
Ans: Ŷ = 8 + 0.2X; 7.0, 7.4, 7.8, 8.2, 8.6, 9.0

a
15. A manufacturer of computers for the industrial market builds each unit to specifications after a firm order is received. The number of units scheduled for delivery over a 6-year period are listed below:
(i) Fit a linear trend equation by the method of least squares.
(ii) Determine the estimated value for 1990.
Year                          1982  1983  1984  1985  1986  1987
Units scheduled for           42    45    49    40    43    45
shipment in hundreds

Ans: (i) Ŷ = 44 + 0X (ii) 44


16. Using the method of least squares derive a linear trend to the following results for the years 1985 to 94 (both inclusive):
ΣX = 0, ΣY = 322, ΣXY = 1550, ΣX² = 330. Find out the trend values as well.
Ans. Ŷ = 32.2 + 4.7X; -10.1, -0.7, 8.7, 18.1, 27.5, 36.9, 46.3, 55.7, 65.1, 74.5.
17. Fit a linear trend to the following information for the years 1986 to 92 (both inclusive):
ΣX = 0, ΣY = 245, ΣX² = 28, ΣXY = 66. Also compute the trend values.
Ans. Ŷ = 35 + 2.4X; 27.8, 30.2, 32.6, 35.0, 37.4, 39.8, 42.2.
18. If the straight line fitted to the data for the years 1996 to 2002 (both inclusive) with origin at 1999 and unit of measurement for X being one year is Ŷ = 130 + 14X. Find the trend values corresponding to the years 1996 to 2002. What would be the equation of the straight line if the origin is shifted to 1996?
Ans: 88, 102, 116, 130, 144, 158, 172; Ŷ = 88 + 14X

m
19. If the straight line fitted to the data for the years 1993 to 2000; both inclusive, with origin at the middle of 1996 and 1997 is Ŷ = 34 + 2.6X, the unit of measurement for X being 1/2 year. Determine the trend values for the years 1993 to 2000. Also determine the straight line by shifting the origin to 1994.
Ans: 15.8, 21.0, 26.2, 31.4, 36.6, 41.8, 47.0, 52.2; Ŷ = 21 + 5.2u
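Shifting the origin of a trend line, as in Exercise 19, involves two steps: converting the slope to a rate per year, and then moving the intercept to the new origin. The sketch below is illustrative only; the function name `shift_origin` is an assumption, and the middle of 1996 and 1997 is written as 1996.5 on the year scale.

```python
def shift_origin(a, b, unit_years, old_origin, new_origin):
    """Return (a', b') of Y = a' + b'u, where u is in whole years
    measured from new_origin."""
    b_year = b / unit_years                       # slope per year
    a_new = a + b_year * (new_origin - old_origin)
    return a_new, b_year

# Trend 34 + 2.6X, X in half years, origin at the middle of 1996 and 1997
a_new, b_new = shift_origin(34, 2.6, 0.5, 1996.5, 1994)
print(round(a_new, 1), round(b_new, 1))  # 21.0 5.2

trend = [round(a_new + b_new * (yr - 1994), 1) for yr in range(1993, 2001)]
print(trend)  # [15.8, 21.0, 26.2, 31.4, 36.6, 41.8, 47.0, 52.2]
```

The same helper with a unit of one year covers Exercise 18, where only the intercept changes.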

p o
s
2S. If tire }iaear tlend in the da:ia I'ar the ;:.ears 1$*B tr',. IC03; Lroth inciusi-,,e, with

g
or"igin at thc niidrJlc o.l'p."^,rlt ,:riC :ZCC1 .; ,---"--;;;; io, il U"ln* on"
"ni ',*i,

o
,^

. )rear is Y = 4?.:5 * 5 3 X. uc';ermine ti:e I;i:end !in* i:y shifting the origin to the
. i{r:at i998 ar:rt hcnce iie,.r:r";llii.;s :r+ ;'er,,l ir;ri.r:..j
.!

bl
.
n
Ans:V =2t,+ 5.5 u; 2ii.ij;3i.3.:jT ri jli..l..!+.i" ,ii i-j

3
2L" Fii a second d-egree {Li:rve t* the f*liolv,l:l" l;su!ii. i,-ir the :/ears lgg3 to 2000

4
9
(both inclusir,"e).
IX=0, 2Y=i2C iKl- i8J,,'-:lf,r: 1fij) .'; .
9
t.,,1, ).),i= l), IXi

t
-- 5:.1i{.
A

a Iy=
Ans: Y = 22.25+ 3"5X + fi.2,5 X2

t
22, Fit a second degree parab*Ja tc iire fciici;i::g resuits for the years 1SBE to gE
IX=C, IXz= 1trO,

/: / s
IX3=0, IXa= lgbg, 4i0. IXy=60i. IX2y = 4ig7.

s
Also compute [he tnend vaiucs.

tt p
/1
Ans. t = 31.5? +.5.46 X r- lr.5: -i--': i-8.1i2. .tE.G5. :ji.:j: ;t, i);i, 26 {ig. Bi.f7. .tI.60,
7;j.L'-Z
44.i7,53.08, 62.55,

h
,

23. The fitted second degree parabola for the years 1996 to 2001; both inclusive, with origin at the middle of 1998 and 1999 is Ŷ = 80 + 10X - 2X², the unit of X being half year. What would be the equation of the second degree parabola by shifting the origin to the year 1996?
Ans: Ŷ = -20 + 60u - 8u² (1996 = 0)
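The answer to Exercise 23 comes from substituting X = 2u - 5 into the parabola, since X is in half years from the middle of 1998 and 1999 and u is in years from 1996. A minimal sketch of that substitution follows; the function name `shift_parabola` is a hypothetical helper.

```python
def shift_parabola(a, b, c, scale, offset):
    """Substitute X = scale*u + offset into a + bX + cX^2 and
    return the coefficients of the parabola in u."""
    a_new = a + b * offset + c * offset ** 2
    b_new = b * scale + 2 * c * scale * offset
    c_new = c * scale ** 2
    return a_new, b_new, c_new

# X = 2u - 5: two half years per year, and 1996 lies 5 half years
# before the middle of 1998 and 1999
coeffs = shift_parabola(80, 10, -2, scale=2, offset=-5)
print(coeffs)  # (-20, 60, -8)
```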
Chapter 17
ORIENTATION OF COMPUTERS

17.1 INTRODUCTION TO COMPUTERS

A computer is an electronic machine that accepts information, stores it until the information is needed, processes the information according to the instructions provided by the user, and finally returns the results to the user. The computer can store and manipulate large quantities of data at very high speed, but a computer cannot think. A computer makes decisions based on simple comparisons such as one number being larger than another. Although the computer can help to solve a tremendous variety of problems, it is simply a machine. It cannot solve problems on its own.

ta
17.1.1 COMPUTER CHARACTERISTICS AND ITS USES
Computer is a machine capable of performing tasks brilliantly. Following are some of the computer characteristics.
(i) Speed: Computer is a very fast device. A powerful computer can perform 3 to 4 million arithmetic operations in a second.
(ii) Accuracy: Computer always produces results 100% accurate, depending upon the input data and the instructions to carry out the given task.
(iii) Diligence: Unlike other machines, computer can work for long hours without making any complaint. The longer time duration never affects its working.
(iv) Versatility: Computer is a versatile machine and can perform variety of tasks, like Mathematical/Statistical Data Manipulation, Word Processing, Spreadsheets, Data Bases, Graphics, Communication etc.
People from different fields are using computers in their professions due to its versatility, speed, accuracy and consistency and many other features. In today's world computers are being used in business, banking, stock exchange, education,

17.2 COMPUTER HISTORY

Early Development
Although the development of digital computers is rooted in the abacus and early mechanical calculating devices, Charles Babbage is credited with the design of the first modern computer, the analytical engine, during the 1830s. American scientist Vannevar Bush built a mechanically operated device called a differential analyzer in 1930; it was the first general-purpose analog computer. John Atanasoff constructed the first semi electronic digital computing device in 1939.
(Figure: Analytical Engine)

l o
b
tl rt,i r r t l. {.1* nrj i i I { : g,4:f rl r: lt i n *r

.
Il I * CI t'a u t g r: h ;l r

3
The first fully automatic calculator was the Mark I, or Automatic Sequence Controlled Calculator, begun in 1939 at Harvard by Howard Aiken, while the first all-purpose electronic digital computer, ENIAC (Electronic Numerical Integrator And Calculator), which used thousands of vacuum tubes, was completed in 1946 at the University of Pennsylvania. UNIVAC (UNIVersal Automatic Computer) became (1951) the first computer to handle both numeric and alphabetic data with equal facility; this was the first commercially available computer.

s
tt p
h

ff a mp t' f;rri rjr'{l i r {l r! s


w t *:

Irfutrl I i.l.tij rril lt*t'rlu i 1tr, rttrr{,riu!rrrit}r.1{, rii' riir{:t.r'rrliir: r:rir4lii1[prtt afg elivided into
l'iue 8fnttntlt,t:]ttt:i, rit!1rr.:rr..! ty1tr1 l.lli{-rli i,!tit tirt:liltr-rlut{,ir.rti ill1c.rl,
171 Orientation of 391
- .,

First-generation eornputr:rs ilf] t?-if)59) uiiliacd vi:cuum ttil:e* and


magnet,ic .oirr. Thes* comlrut,ero w€]r-r vur')' rtrgu iit stat,, sl+rv in 1Ueqcl, non reliabic,
u,rrl ,liftj."lt Lo nraint,ais. $e*orrcl-61*,nerat,itlr: nracltirtsu {1{}fi{}"1iiUr) wetre colr}B
intei existence wit[ tirr' ,1{vrrp1, pl trlrrsistor' lr.:cllnol,-.lfi.y. 'l'lii:se ctttuputers were
smnlier, used, less pr-)wrjr, ;rnrl corrlci pr:rfoirrt a itrillion opct'rltrortr: ptrr *eeond' 'l'hcy,
in turn, ,urrre **i,i,rr:..:ri i.;y tirr third*generiitiou (i$eir-tl'l?tlJ irrl.cglatcd"cirurrit
rnaehines. Th*,je \/r-i!'e even sqta.ll*r iinrl vr,:,'.. lt:rt'rnol't reitirl.lltr tnircltirrt:s" Ilottrtlr
generation ccltrltui;er* (1$?()-lt)ii(i) Wcrr-. ,*^i):rr lrr:ierize(i lt1' fiir (irrtrli{)lril}enl of thc

om
Iri*tolraut*o*,rr' ,,rrd tlre evolutiotr 6f 'r'rrt:l'r':lr"i:;iil,"' t:tti:tllet' brtt pon'ct'ftrl r=;ctttlFtrtr-'rs'
such as tlri: persr:pal u6rnputr;r, whielr tr;hr':'e,l i:i ;,t pe,rioti of rilpiti gt'arvth in the

t .
eol,pu[cr" irrdustry, tri(th {iefiei.&ri(;ru (ii}lJr.i-eir.+lrr<!) ii.r a grn*t'ltioii of ci:tli1;uters, c
o
to w6ich scientietlt rvant t* *111i1r wrtii Lhinliu!H li{iwr'!':u'tti c*trraLiiiities erf I'eiisnt'tii1g,

p
leafning, rlfaWiUg itti'err:rreilr irnrl rtr;rii;rlil r1q:r'11-11i1ri iilic i111v1rir:I1 lff:irreg'
17.3 TYPES OF COMPUTERS
Computers are divided into three types. The division is based upon the design and working of the computer, which differs on the type of the input data and the form of the output.
17.3.1 ANALOG COMPUTER

3 .
An arialog rjprilFJqtgr re!,)llleqtltB eletrr ul plr:v,Ft{rai tlttiri!ttiitlrr;1p11 rr1'rrrl-ttE$ Dn l;}iti

9 4
riata tiy R4rrttpulrrtiiii{ ths (ltlantlLieE 1t u rlrsrHrreti lit }rri',rirr'i3 rl;rh iri wlticli tlrr+
variabie qu&*itieH uary eoutinuoilElyi it ttnrrulates ttrr: r.!lrrl.tirni{litlia ltrtwtre:fl f irir

t9
vaiinnfgg Qf a prerLlleln 1ntCI al1alogpllg rolaftonsiillipi Ltrfwt'r:il t':ir:irl'!'tfirtl illtrnt!'tf itttr'

a
F{UelX fiFd r:U4,t:Irl ,rfrd Vrr}tAgf,:, atiel EulverrC t,tln urtgtttri! 1rr,-il.}1rllt
try-.rolvrRU [he

s t q.r;alrlt'tt,liv Ur,tl!'til tit Llitr iiilr!r'littUttr!t rtlril r-rvilirrltittlir uf

/: /
i*Urr,r*, ;inirlclU ,;rr,ri!rr,Lr,,,: rrrlr-r

itynArrtie Eril-ui;io!rd, dLttlt iir.l t,liri fiii:ht, iil'lr ,;ii:i;:1,,r,':ti-)ijilltl


iit' i!1rr i:ir;rt1H'itlf 1"'r'ltttti:t'
ptlttril,nti 1.1y6lf rt ltc!,lrri111 ill'(!ri. Aliiiliig i'iillr1rr,rti.!'I, l!i'ri i,'rrlirrnunly iiliiri,.l rtl E*!t'Jl{ilt'lltg
trg

s
oi'lii,r,i' nrr: ;ili]ii l{ttt}ttJ!, ili'! ltEgtJl.ll-EUt:tl.o*e.
*iiUr,,l,;oroliliri B11a! gj;1lt-!lurrr' rtrrit,i:i'ir,

tt p
Q-amJtulr:tg'"
fT.S.* Si(iI,!.dt, {l(}n.t l,tj,!'!rtli

h
A riigit{l q0lrq]i.it,r,r,ig llplligtri.iii tc 1}ru,:tir;e rlitl,it itl t1-lrtii*iri,:rii ii;i'trt; ltH ctt":ttits
I)tl!,furytl l'[i1t,r:tly t,h* rrrntlir-rrrirl.uiri
tttullrllltt,lit,,,r, rrrt,i ,lr*i,r{ru 'l'lrr:
r]liti!'irl lli!r', !i1' ritiiltl,ir:t!, tJtibLrililt;li]11'
riirrtiirrrl'r; ,tp,,l'rli',I lri' rr 'itl.i'liri' t'rrrq!lrttlcl' iit'i:
11111y1rp;g6r-l tll i[ttr ilr, irr.1 i,\'tili:llt,
'l lrt: feAttl[e Ul',ligtt,al {.:grrrl}r,;tr:}]fi ill'ra rll{rt(-} iir:uiji r!Lii. 1;l'tti.:!ir.! ,1it(i a.-:lttjiit.utlll! ttttll
trl g111y4; ltifgir
l,he fet Lllt,s nf firiirl,,B il,ittI.r,L,:t"o I$i.,rlt.:r'ii rljgrl,erl {:.iliiittl.i;t'|E art'ir.:}llrlll)lr;:
"['ltfiiltl
Bltloilfi[ u{'llrrtit dttd irtllirr,tn6tirrir ,rritl ,r11't,.,,,,,t1tl.t{q,; lIiit.ir ir! t'{'l1y ltigh trtl0titl'
CIoRlpgirrfH Ai.e rrecrl irl rllftlti,;ir rlvryY fil|.l rr. ir r,r ,rtr1 irlr:{r }ili+twtr
rlri &,ef.lgryf l8UXA!}88

EsrrA4drru
1?,S,8 trlY$ItID LlfJll,tiltj'rult :

A Hytl,itl r.rolllFUtFr i$ l,ltp uritrtlttnttir,,:1 ,rl [[1t, i:irnt. fr:ill,ttt'HB (,tl i{'tl;.rlti$ tlEil
Digitlfl r,titlprtlet,, ii.g rl, ltii;:nG$rrr:s Lht: fitr:l.trt'riirrllltl irli:,l rlttiti'lH cuflillitf''rt';irl(l
accuracy like digital computer. It can measure both in terms of physical as well as digital quantities. These are mainly used in specialized applications where both kinds of information need to be processed. These computers are used in fields like air defense systems and different laboratory equipment for medicine.
17.4 CLASSIFICATIONS OF COMPUTERS
The computers are classified into the following four main categories, depending upon the processing power, speed, size of main memory and other capabilities possessed by a computer.
(i) Mainframe Computers (ii) Minicomputers
(iii) Microcomputers (iv) Super computers
17.4.1 MAINFRAME COMPUTERS
Mainframe computers are very large, often filling an entire room. They can store enormous amounts of information, can perform many tasks at the same time, can communicate with many users at the same time, and are very expensive. The price of a mainframe computer frequently runs into the millions of dollars. Mainframe computers usually have many terminals connected to them. These terminals look like small computers but they are only devices used to send and receive information from the actual computer using wires. Terminals can be located in the same room with the mainframe computer, but they can also be in different rooms, buildings, or cities. Large businesses, government agencies, and universities usually use this type of computer. Examples of mainframe computers are Amdahl 580, Burroughs B 7800, Control Data CYBER 176 and IBM 4341.
17.4.2 MINICOMPUTERS
Minicomputers are much smaller than mainframe computers and they are also less expensive. These computers use integrated circuits. The cost of these computers can vary from a few thousand dollars to several hundred thousand dollars. They possess most of the features found on mainframe computers, but on a more limited scale. They can still have many terminals, but not as many as the mainframes. They can store a tremendous amount of information, but again usually not as much as the mainframe. Medium and small businesses typically use these computers. The examples of minicomputers are PRIME 9755, VAX 8650, IBM System 36, etc.
17.4.3 MICROCOMPUTERS
Microcomputers are also known as Personal Computers. These computers are usually divided into desktop models and laptop models. They are terribly limited in what they can do when compared to the larger models discussed above because they can only be used by one person at a time, they are much slower than the larger computers, and they cannot store nearly as much information, but they are excellent when used in small businesses, homes, and school classrooms. These computers are inexpensive and easy to use. They have become an essential part of modern life. Examples of microcomputers are IBM PC, AT, PS/2 and Apple Macintosh, TRS-80.

17.4.4 SUPER COMPUTERS
Super computers are the most expensive computers. These process billions of instructions per second. Most people do not have a direct need for the speed and power of a supercomputer. The supercomputers are mainly used for tasks that require mammoth data manipulation, such as worldwide weather forecasting and weapons research. But now supercomputers are also moving toward the mainstream, for activities as varied as stock analysis, automobile design, special effects for movies, and even sophisticated artworks. Examples of super computers are Cray-1, Cray-2 and CYBER 205.

c o
.
17.5 COMPUTER COMPONENTS
The computer is divided into two main components, namely, computer hardware and computer software.
17.6 COMPUTER HARDWARE
The electronic and mechanical components of a computer are known as computer hardware. These components can be physically handled. The function of these components is typically divided into three main categories: input, output, and storage. Keyboard, mouse, monitor, system unit are some of the common examples of computer hardware. The computer hardware is further sub divided into four main categories:
(i) Input Unit (ii) Central Processing Unit
(iii) Secondary Storage (iv) Output Unit

t
17.6.1 INPUT UNIT
Input Unit provides communication between the user and the computer. The input unit performs the following three functions.
(a) At first, it accepts data from user.
(b) The accepted data is then converted into coded data i.e. data in computer readable form.
(c) The coded data is then moved to the system for processing.
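Step (b) above, converting accepted data into coded data, can be illustrated with a short sketch. The text does not say which code is used; the example below assumes 8-bit ASCII codes purely for illustration, and the function name `to_binary_codes` is hypothetical.

```python
def to_binary_codes(text):
    """Return the 8-bit ASCII code of each typed character (assumed coding)."""
    return [format(ord(ch), "08b") for ch in text]

codes = to_binary_codes("A1")
print(codes)  # ['01000001', '00110001']
```

Each key pressed by the user is thus turned into a pattern of 0s and 1s before it is passed on for processing.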

Examples of input unit
(1) Mouse is a pointing device designed to be gripped by one hand. It has a detection device (usually a ball) on the bottom that enables the user to control the motion of an on-screen pointer, or cursor, by moving the mouse on a flat surface. As the device moves across the surface, the cursor moves across the screen. To select items or choose commands on the screen, the user presses a button on the mouse.

Following are the important terms associated with the actions of the
mouse.
(i) Pointer: The pointer is a symbol that moves on the monitor's screen as the
mouse is rolled over a flat surface.
(ii) Point: To point means to set the pointer on a particular spot.
(iii) Click: To press and release the left mouse button is known as a click.
(iv) Double Click: To press and release the left mouse button twice, in quick
succession, is known as a double click.
(v) Right Click: To press and release the right mouse button is known as a right
click.
(vi) Drag & Drop: To drag and drop means to place the mouse pointer on an object,
hold down the left mouse button, take the mouse pointer to another place on
the screen and then release the button.
(2) Keyboard is a typewriter-like device that allows the user to type in text and
commands to the computer. The keyboard is divided into four different groups.
(i) Alphanumeric Keypad consists of alphabet keys, number keys, punctuation
keys, special character keys and the space bar.
(ii) Numeric Keypad serves a dual purpose. One is to enter numeric data and
the other is to move around the screen.
(iii) Function Keypad consists of 12 keys labeled F1, F2, F3, ..., F12. These
keys perform specific functions.
(iv) Screen Navigation and Editing Keys: this group consists of arrow keys
along with some special keys. These keys help to move around the screen and
perform particular tasks. Page Up, Page Down, Delete and Insert keys are
some examples of screen navigation and editing keys.
(3) Light Pen has a light-sensitive tip that is used to draw directly on a
computer's video screen or to select information on the screen by pressing a clip
in the light pen or by pressing the light pen against the surface of the screen.
The pen contains light sensors that identify which portion of the screen it is
passed over.
(4) Joystick is a pointing device composed of a lever that moves in multiple
directions to navigate a cursor or other graphical object on a computer screen.
17.6.2 CENTRAL PROCESSING UNIT
Central Processing Unit is the most important unit of the computer system. It
is the electronic brain of the computer. In addition to processing data, it controls the
function of all the other components.
Main components and structure of the central processing unit
The main components of the CPU are as under:
(i) Control Unit
(ii) Arithmetic Logic Unit
(iii) Main Memory (It is closely associated with the CPU, although in fact it is
separate from it)
Working of the Central Processing Unit
(i) It carries out instructions and tells the rest of the computer system what
to do. This is done by the control unit of the CPU, which sends command
signals to the other components of the system.
(ii) It performs arithmetic calculations and data manipulation, e.g.
comparisons, sorting, combining, etc. The computer's calculator is a part
of the CPU known as the arithmetic logic unit.
(iii) It holds data and instructions which are in current use. These are kept in
the main store or memory.
To understand how the whole system works, consider the diagram shown
below. This diagram shows the basic components of a generalized CPU. An actual
CPU may have these components or others with different names that provide the
same functions.

[Figure: Central Processing Unit]

Control Unit
The control unit controls and directs the operations of the entire computer
system. The main features of the control unit are discussed as under:
(i) The control unit directs the entire computer system to carry out stored
program instructions.
(ii) The control unit communicates with both the arithmetic logic unit and
main memory.
(iii) The control unit uses the instruction contained in the Instruction Register
to decide which circuits need to be activated.
(iv) The control unit co-ordinates the activities of the other two units as well
as all peripheral and auxiliary storage devices linked to the computer.
(v) The control unit instructs the arithmetic logic unit which arithmetic
operation or logical operation is to be performed.
Arithmetic Logic Unit
The arithmetic logic unit contains the circuits that perform arithmetic and
logical operations. The main features of the arithmetic logic unit are discussed as under:
(i) The arithmetic logic unit executes arithmetic and logical operations.
(ii) Arithmetic operations include addition, subtraction, multiplication and
division.
(iii) Logical operations compare numbers, letters and special characters.
(iv) Comparison operations test for three conditions:
• equal-to (=): condition in which two values are the same
• less-than (<): condition in which one value is smaller than the other
• greater-than (>): condition in which one value is larger than the other
The arithmetic logic unit also performs logic functions such as AND, OR and
NOT.
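
The three kinds of work listed above (arithmetic, comparison and logic) can be imitated in a few lines. This is only a rough sketch of the idea: the operation names such as "ADD" and "EQ" and the function `alu` are labels invented for this example, not the instruction set of any real processor.

```python
import operator

# Illustrative operation tables; the names are assumptions made for this sketch.
ARITHMETIC = {"ADD": operator.add, "SUB": operator.sub,
              "MUL": operator.mul, "DIV": operator.truediv}
COMPARISON = {"EQ": operator.eq, "LT": operator.lt, "GT": operator.gt}
LOGIC = {"AND": lambda a, b: a and b, "OR": lambda a, b: a or b}


def alu(operation, a, b=None):
    """Perform one arithmetic, comparison or logic operation."""
    if operation == "NOT":
        return not a
    for table in (ARITHMETIC, COMPARISON, LOGIC):
        if operation in table:
            return table[operation](a, b)
    raise ValueError(f"unknown operation: {operation}")


print(alu("ADD", 7, 5))         # 12
print(alu("LT", "a", "b"))      # True: letters can be compared too
print(alu("AND", True, False))  # False
print(alu("NOT", True))         # False
```

In a real computer the control unit, not the programmer, tells the arithmetic logic unit which of these operations to perform.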

Main Memory

The Main Memory is the part of the computer that holds data and instructions
for processing. Although it is closely associated with the CPU, in actual fact it is
separate from it. Memory associated with the CPU is also called primary storage,
primary memory, main storage, internal storage and main memory.
When we load software from a floppy disk, hard disk or CD-ROM, it is stored in
the main memory.
There are two types of computer memory inside the computer, RAM and ROM.

RAM

RAM stands for Random Access Memory. This is really the main store and is the
place where the programs and software we load get stored. When the central
processing unit runs a program, it fetches the program instructions from the RAM
and carries them out. If the central processing unit needs to store the results of
calculations, it can store them in RAM. The more RAM in your computer, the
larger the programs you can run. When we switch a computer off, whatever is
stored in the RAM gets erased. The following is a photo of a common RAM chip.

[Photo: RAM chip]
ROM

ROM stands for Read Only Memory. The CPU can only fetch or read instructions
from read only memory (or ROM). ROM comes with instructions permanently
stored inside and these instructions cannot be over-written by the computer's
CPU. ROM is used for storing special sets of instructions which the computer
needs when it starts up. When we switch the computer off, the contents of the ROM
do not become erased but remain stored permanently. Therefore it is non-volatile.
The following is a diagram showing the relationship between the central
processing unit and the main memory (RAM and ROM).
17.6.3 SECONDARY STORAGE

The secondary storage is a device which provides the opportunity to store and
retrieve data and information according to the requirement of the user. The data
and information stored on this device can be erased too, if required. The secondary
storage is also called auxiliary storage, external storage, backup storage or long
term storage. The main features of the secondary storage are as follows:
(i) It provides long term storage for data and programs.
(ii) It provides additional memory to the computer to save data.
(iii) It provides backup to main memory.
(iv) It provides permanent storage so that electricity failure or switching off
the computer does not affect data.
Some of the popular storage devices are discussed below:
Hard Disk: The hard disk (HD) is the main secondary storage device used to
permanently store information and consists of one or more magnetic disks
contained in a box. This is also called a fixed disk. It can store a huge amount of
data and has a faster access speed.
Floppy Disk: A floppy disk is a removable storage device that reads and writes
information magnetically onto floppy diskettes. There are two types of floppy
diskettes. One is the 5¼ inch disk capable of storing 1.2 MB of data and the other
is the 3½ inch disk capable of storing 1.44 MB of data. The 3½ inch diskettes are
the most commonly used diskettes and were introduced in 1987. These have a
plastic exterior shell in order to protect the thin, flexible disk inside.
A hard disk is mounted inside the system unit and only removed for repairs or
upgrades. The floppy disk provides removable storage, giving users the ability to
take their files with them. The drawback to the floppy diskette is that it only holds
1.44 MB of information, although very few PCs are without one. This is plenty of
space for most text documents (Word & Excel files), but if a file contains pictures, a
floppy's capacity may be insufficient.
Memory Units:
The computer is a wonderful machine and can perform a variety of tasks; this is all
due to the computer's capability to store data/information in its memory. The
capacity of computer memory and storage devices is expressed in bits or bytes. So, we
can say that bits/bytes are measuring units for computer memory and storage.
Computers represent data digitally. They use binary digits, which are numbers
using a base 2 number system rather than a decimal (or base 10) number system. A
binary digit, commonly called a bit, has a value of either 0 (zero) or 1 (one). Eight bits
are grouped together to represent a character: a letter, number, or special character.
This group is called a byte. The terms character and byte mean the same thing.
Since one byte equals only one character, storage devices must be capable of storing
thousands, millions, or even billions of bytes. To describe these large capacities, the
terms kilobyte (K), megabyte (M), and gigabyte (G) are used. A kilobyte equals
approximately one thousand bytes, a megabyte equals approximately one million
bytes, and a gigabyte equals approximately one billion bytes. (The actual number of
bytes in a megabyte is slightly higher because computer storage amounts are
actually measured in base 2 numbers.)
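
The remark in brackets can be checked with a little arithmetic. The sketch below assumes the usual base 2 definitions (a kilobyte as 2 to the power 10 bytes, and so on) and one byte per character, as stated above; the constant and function names are chosen only for this illustration.

```python
BITS_PER_BYTE = 8
KILOBYTE = 2 ** 10   # 1,024 bytes: approximately one thousand
MEGABYTE = 2 ** 20   # 1,048,576 bytes: approximately one million
GIGABYTE = 2 ** 30   # 1,073,741,824 bytes: approximately one billion


def bytes_needed(text):
    """Storage needed for a piece of text, at one byte per character."""
    return len(text)


title = "Basic Statistics"
print(bytes_needed(title))                  # 16 bytes
print(bytes_needed(title) * BITS_PER_BYTE)  # 128 bits
print(MEGABYTE - 1_000_000)                 # 48576 bytes more than one million
```

The last line shows how far the "actual" megabyte lies above the round figure of one million bytes.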

Compact Disk: A compact disk, commonly known as a CD, is a secondary storage
medium; a CD-ROM drive reads the information stored on it. It is 4.75 inches in
diameter and is capable of storing 650 MB. Floppy and hard disks are magnetic
media; the compact disk is an optical medium. Magnetism can simply fade away in
time; however, the life span of optical media is counted in tens of years, which
makes the CD-ROM a very useful tool. CD-ROM drives can be housed inside the
computer case (internal), or connected to the computer by a cable (external).
CD-ROMs are useful for installing programs and for running applications that
install some of the files to the hard drive and execute the program by transferring
the data from the CD-ROM to memory while the program is running.
17.6.4 OUTPUT UNIT
The devices which are used to get information or results from the computer are
called output devices. These are a communication link between a computer and its user.
The output can be of two types:
(i) Soft Copy (ii) Hard Copy
Soft Copy: The results that appear on the monitor screen are known as
soft copy. This form of output is known as temporary output as it
cannot be retained once the computer is turned off. Monitors and PC
projectors are two common examples of soft copy output devices.
Hard Copy: The results printed on paper are known as hard
copy. This form of output is known as permanent output as it is
retained even if the computer is turned off. Printers and plotters
are examples of hard copy output devices.

17.7 COMPUTER SOFTWARE

Computer software is the set of detailed, step-by-step instructions (also called a
program) which give a computer the capacity to perform a specific task. The
computer software is mainly divided into the following three types:
(i) Programming Languages (ii) System Software (iii) Application Software
17.7.1 PROGRAMMING LANGUAGES
Software programs must be written in programming languages. Programmers
(people qualified in a programming language) write programs. Before 1952, the only
available programming language was machine language, now called a low level
language. A machine language is recognized by a given brand or design of computer
processor. Machine language consists of nothing but the 0s and 1s with which the
computer works. Machine language is difficult to learn, and early programs were
few and short.
In 1952, a new low-level programming language called assembly language was
introduced. In assembly language, programmers use short letter codes, such as
ADD, which stands for addition.
In the 1960s, high-level programming languages emerged. With a high-level
language, the programmer uses simple English words and familiar mathematical
expressions. Examples of high-level languages are C, C++, PASCAL, and FORTRAN.
17.7.2 SYSTEM SOFTWARE
System software consists of the programs that manage computer resources at a low
level. System software is categorized into the following two types:
(i) Operating system
(ii) Language processor
Operating System
The operating system is the most important program that runs on a computer.
Every general-purpose computer must have an operating system to run other
programs. Operating systems perform basic tasks, such as recognizing input from
the keyboard, sending output to the display screen, keeping track of files and
directories on the disk, and controlling peripheral devices such as disk drives and
printers.
For large systems, the operating system has even greater responsibilities and
powers. It makes sure that different programs and users running at the same
time do not interfere with each other. The operating system is also responsible
for security, ensuring that unauthorized users do not access the system.
Operating systems provide a software platform on top of which other
programs, called application programs, can run.

[Figure: operating system and application programs]

For PCs, the most popular operating systems include DOS.
Disk Operating System (DOS)
DOS stands for "Disk Operating System". It is a generic name for the basic
IBM PC operating system. Several variants of DOS are available, including
Microsoft's version of DOS (MS-DOS), IBM's version (PC-DOS), and several others.
There is even a free version of DOS called OpenDOS.
There are actually several levels to DOS. At the lowest level is the BIOS (Basic
Input/Output System), which is responsible for managing devices like the keyboard
and disk drives at the simplest possible level. The second layer provides a set of
higher level services implemented using the low-level BIOS services. The third layer
is the command interpreter (or shell). The shell's job is to display a command
prompt on the screen to let the user type a command, and then to interpret the
typed command. The user can carry out any assignment, e.g. writing/composing
documents, graphics and games etc., within the environment of DOS.
Language Processor
A language processor, also known as a translator, is software that performs
language translation. It has three types:
(i) Assembler is a program that converts assembly language into machine
language.
(ii) A compiler is a computer program that translates a high-level programming
language into machine language. The compiler usually converts the high-level
language into assembly language first, and then translates the assembly
language into machine language. The program fed into the compiler is called
the source program; the generated machine language program is called the
object program.
(iii) Interpreter is a program which executes source code by reading it one line at a
time and performing each instruction immediately. An interpreter is different
from a compiler, which does not execute the source code, but translates it into
object code (machine language) which is stored in a file and executed later.
Some programming languages must be interpreted; some can be both
interpreted and compiled.
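
The difference between the two approaches can be imitated on a very small scale. The sketch below uses a made-up three-instruction language (ADD, SUB, MUL) that exists only for this example; a real compiler produces machine language, whereas here the "object program" is simply a stored list of prepared operations.

```python
import operator

# Hypothetical toy language invented for this sketch: "ADD 2 3", "MUL 4 5", ...
OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def interpret(source):
    """Read one line at a time and perform it immediately."""
    results = []
    for line in source.splitlines():
        name, a, b = line.split()
        results.append(OPS[name](int(a), int(b)))
    return results


def compile_program(source):
    """Translate the whole source program first; nothing is executed here."""
    object_program = []
    for line in source.splitlines():
        name, a, b = line.split()
        object_program.append((OPS[name], int(a), int(b)))
    return object_program


def run(object_program):
    """Execute a program that was translated earlier."""
    return [op(a, b) for op, a, b in object_program]


source = "ADD 2 3\nMUL 4 5\nSUB 9 1"
print(interpret(source))             # [5, 20, 8]
stored = compile_program(source)     # translation step only
print(run(stored))                   # [5, 20, 8], executed later
```

Both routes give the same results; what differs is when the translation work is done.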

17.7.3 APPLICATION SOFTWARE
Application software is a complete, self-contained program that performs a
specific function directly for the user. These are also known as packages. Some
of the examples of application software are word processors, spreadsheets,
databases, graphics and game packages etc.
17.8 BASIC IDEA OF WRITING AND RUNNING A COMPUTER PROGRAM
A program is an ordered set of computer instructions which enable a computer
to perform a specific task. Programming is a process in which a program is written.
This process is made up of the following steps:
(i) Program design
(ii) Program writing
(iii) Testing and debugging
(iv) Documentation, implementation and maintenance
17.8.1 PROGRAM DESIGN

Program design has two phases:
(a) Function Definition: Clearly define the data requirements (input &
output) of the program and the purpose of the program.
(b) First Level of Refinement: The data to be used is defined in detail and
the main operations of the program are also defined.
17.8.2 PROGRAM WRITING
(a) Describe each operation in detail. This can be done with the help of detailed
program flowcharts or structure diagrams.
(b) The next step is to code the program. To code means to write down the program in
a suitable programming language.
(c) Then, feed the program into the computer. (This may be done as the program is
coded.)
17.8.3 TESTING AND DEBUGGING
(a) In this phase the first step is to eliminate the compile time errors of the
program.
(b) Then the run time errors are eliminated.
(c) The program is run with test data.
(d) The test data provides a check on every possible situation.
(e) The accurate results of running the program with the test data have been
worked out manually in advance.
(f) The programmer uses various debugging aids to correct the program until the
expected output is produced when the test data is used.
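
Steps (c) to (f) can be sketched as follows. The program under test (`average`) and the test data are invented for this illustration; the expected results in `TEST_DATA` were worked out by hand before running the program.

```python
def average(values):
    """Program under test: the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Test data, each case paired with the result worked out manually in advance.
TEST_DATA = [
    ([2, 4, 6], 4.0),
    ([5], 5.0),
    ([-1, 1], 0.0),
    ([1, 2], 1.5),
]


def run_tests():
    """Run the program on every test case and collect the disagreements."""
    failures = []
    for values, expected in TEST_DATA:
        actual = average(values)
        if actual != expected:
            failures.append((values, expected, actual))
    return failures


print(run_tests())  # an empty list means every case gave the expected output
```

If the list is not empty, the programmer corrects the program and runs the same test data again.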
17.8.4 DOCUMENTATION, IMPLEMENTATION AND MAINTENANCE
(a) The documents are produced for programmers and users before the program is
made available.
(b) The new system, of which the program forms a part, then has to be implemented
and maintained.
17.9 NUMBER SYSTEM
The organization and design of a computer is dependent upon the number
systems, as data is manipulated and stored in the computer in coded numeral
formats. To understand these coded numeral formats, it is necessary to have
knowledge of different number systems.

17.9.1 DECIMAL NUMBER SYSTEM
The decimal number system is the most widely used number system in the world. It
consists of ten digits, ranging from 0 to 9. The base of the decimal system is
10. In this system, the successive positions to the left of the decimal point
represent units, tens, hundreds etc.

17.9.2 BINARY NUMBER SYSTEM
The binary number system suits electronic machines. It is composed of two
digits, 0 and 1. The base of this system is two.

17.9.3 OCTAL NUMBER SYSTEM
The octal number system is composed of eight digits, ranging from 0 to 7. The
base of this system is eight.

17.9.4 HEXADECIMAL NUMBER SYSTEM
The hexadecimal number system is composed of sixteen digits. The first 10
digits of this system are 0 to 9 and the remaining six digits are denoted by A, B, C,
D, E, F representing the decimal values 10, 11, 12, 13, 14, 15 respectively.
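
The four systems differ only in their base, so one routine can write a number in any of them. The sketch below uses the common method of repeated division by the base; the function name `to_base` is chosen for this illustration.

```python
DIGITS = "0123456789ABCDEF"  # A to F stand for the values 10 to 15


def to_base(number, base):
    """Write a non-negative whole number in the given base (2 to 16)."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


for base in (10, 2, 8, 16):
    print(base, to_base(138, base))
# 10 138
# 2 10001010
# 8 212
# 16 8A
```

Python's built-in `int("8A", 16)` performs the reverse conversion and returns 138.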

17.10 BINARY NUMBER SYSTEM AS A FOUNDATION OF COMPUTER
PROGRAMMING
A computer is capable of carrying out mathematical computations and it can also
store data. The computer's tasks, like mathematical computations and storing data, are
all about manipulating numbers.
In our routine life we use the number system of ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
It is a base 10 number system and is known as the decimal number system. Computers
don't use the ten digits of the decimal system for counting and arithmetic. Their
CPU and memory are made up of millions of tiny switches that can be either ON or
OFF. Two digits, 0 and 1, can be used to stand for the two states of ON and OFF. So,
the computers could work with a number system based on two digits. This type of
system is called a binary number system.
The decimal system is based on place, or location. That is, the place of each digit
tells the value of that digit. For example, the number 17 has a 7 in the one's place and a
1 in the ten's place. In other words, 1 ten plus 7 ones equals 17. The number 138 has a 1
in the hundred's place, a 3 in the ten's place, and an 8 in the one's place. Written in
numerals this is (1×100) + (3×10) + (8×1) = 138.
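
The place-value idea can be written out as a short routine. This is a sketch for whole numbers only, and the function name `place_values` is made up for the illustration.

```python
def place_values(number):
    """Split a whole number into (digit, place) pairs, largest place first."""
    pairs = []
    for position, digit in enumerate(reversed(str(number))):
        pairs.append((int(digit), 10 ** position))
    return list(reversed(pairs))


pairs = place_values(138)
print(" + ".join(f"({digit}x{place})" for digit, place in pairs))
# (1x100) + (3x10) + (8x1)
print(sum(digit * place for digit, place in pairs))  # 138
```

Adding the products of each digit and its place gives back the original number.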
The binary system works in exactly the same way, except that its place value is
based on the number two. In the binary system, we have the one's place, the two's
place, the four's place, the eight's place, the sixteen's place, and so on. Each place in
the number represents two times (2×) the place to its right.
Here's a comparison of decimal and binary numbers:

Decimal   Binary      Decimal   Binary
   0         0           6        110
   1         1           7        111
   2        10           8       1000
   3        11           9       1001
   4       100          10       1010
   5       101
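
The comparison of decimal and binary numbers can be reproduced by adding up the binary place values (one's, two's, four's, eight's and so on). The sketch below is one way of doing this; the function name `binary_to_decimal` is chosen for the illustration.

```python
def binary_to_decimal(bits):
    """Add up the place values of every 1 in a string of binary digits."""
    total = 0
    place = 1  # start at the one's place on the right
    for bit in reversed(bits):
        if bit == "1":
            total += place
        place *= 2  # each place is two times the place to its right
    return total


print(binary_to_decimal("1010"))  # 8 + 2 = 10

# Rebuild the decimal/binary comparison for 0 to 10.
for n in range(11):
    print(n, format(n, "b"))
```

Here `format(n, "b")` is Python's built-in way of writing a number in binary, used only to generate the rows.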

Since the computer is really made up of tiny switches that can be either OFF or
ON, binary numbers can be seen as a series of light switches. A 1 represents a
switch that is ON, and a 0 means a switch that is OFF.

Numbers can become rather long in the binary system. For example, to show
the number 10, we need four light switches, or four places. However, the real
switches inside a computer are tiny and they are able to turn on and off very rapidly.
The binary number system suits a computer extremely well and it is the main
foundation of computer programming.


MULTIPLE - CHOICE QUESTIONS
1. Interpreter is a type of:
(a) language processor (b) application software
(c) storage device (d) computer hardware
2. One byte equals:
(a) 8 bits (b) 4 bits
(c) 6 bits (d) 12 bits
3. General purpose computers are also known as:
(a) hybrid computers (b) digital computers
(c) analog computers (d) super computers
4. PRIME 9755 is one of the examples of:
(a) minicomputers (b) super computers
(c) microcomputers (d) mainframe computers
5. The electronic and mechanical components of a computer are known as:
(a) computer software (b) computer hardware
(c) none of the above (d) both (a) and (b)
6. Drag and drop is a term associated with:
(a) mouse (b) keyboard
(c) printer (d) scanner
7. All the arithmetic and logical data manipulation is done by the:
(c) control unit (d) main memory
8. When we switch the computer off, the data vanishes which lies on:
(a) hard disk (b) compact disk
(c) RAM (d) floppy disk
9. The 3½ inch diskette can store data of the size:
(a) 1.44 MB (b) 1.2 MB
(c) 2.1 MB (d) 1.54 MB
10. The diameter of a compact disk is:
(a) 4.75 inches (b) 4.85 inches
(c) 4.65 inches (d) 4.55 inches
11. The time period of the third generation of computers is:
(a) 1965-70 (b) 1980-onwards
(c) 1959-65 (d) 1942-65
12. The currently used data and instructions are held in:
(a) main memory (b) control unit
(c) arithmetic logic unit (d) hard disk
13. The secondary storage is also known as:
(a) long term storage (b) backup storage
(c) none of the above (d) both (a) and (b)
14. C++ is an example of:
(a) high level language (b) low level language
(c) assembly language (d) none of the above
15. The analytical engine was invented in:
(a) 1730s (b) 1830s
(c) 1930s (d) 1630s
16. A hybrid computer is the combination of the best features of:
(a) analog and digital computers (b) microcomputers and minicomputers
(c) mainframe and super computers (d) digital and super computers
17. Super computers can process billions of instructions:
(a) per second (b) per micro second
(c) per minute (d) per hour

18. Function keypad consists of:
(a) 12 keys (b) 11 keys
(c) 13 keys (d) 14 keys
19. Joystick is an example of:
(a) input devices (b) output devices
(c) processing devices (d) [illegible] devices
20. Which of the following is not an arithmetic operation:
(a) addition (b) greater than
(c) subtraction (d) multiplication
21. A binary digit is commonly called:
(a) bit (b) byte
(c) kilobyte (d) gigabyte
22. Before 1952, the only available programming language was:
(a) machine language (b) assembly language
(c) C language (d) none of the above
23. The first commercially available computer was:
(a) UNIVAC (b) ENIAC
(c) Mark I (d) analytical engine
24. First generation computers utilized:
(a) vacuum tubes (b) transistors
(c) integrated circuit (d) none of the above
25. Program design is made up of two phases, including:
(a) function design and first level of refinement
(b) function design and program writing
(c) testing and debugging
(d) documentation and implementation

1. (a) 2. (a) 3. (b) 4. (a) 5. (b) 6. (a) 7. (b) 8. (c)
9. (a) 10. (a) 11. (a) 12. (a) 13. (d) 14. (a) 15. (b) 16. (a)
17. (a) 18. (a) 19. (a) 20. (b) 21. (a) 22. (a) 23. (a) 24. (a)
25. (a)

SHORT QUESTIONS
1. Differentiate between hardware and software.
2. Differentiate between control unit and arithmetic logic unit.
3. Differentiate between RAM and ROM.
4. Differentiate between main memory and secondary storage.
5. Differentiate between hard copy and soft copy.
6. Differentiate between system software and application software.
7. Differentiate between assembler and compiler.
8. Differentiate between decimal number system and binary number system.
9. Differentiate between floppy disk and compact disk.
10. Differentiate between low level languages and high level languages.
11. Differentiate between analog computers and digital computers.
12. Write down the different types of computers.
13. Write a short note on computer history.
14. Define the central processing unit.
15. Write a note on disk operating system.
16. What do you know about classification of computers?
17. What are the main components of central processing unit?
18. Write down different types of language processors.
19. What is meant by programming?


STATISTICAL TABLES
RANDOM NUMBERS
100973 2533 . 7652013586 3467 354876 80959091 17 39 29 27 49 45
37 54 20 48 05 64 BS .47 4296 24 80 52 4A 3i 20 63 61 04 0? 00 82 29 16 65
os +z zo ss sg L9 64 50 93 03 23 20 90 25 60 15 95 33 4? 6'1 35 08 03 36 06

m
99 01 90 25 29 09 37 67 07 15 38 31 13 11 65 8887 6i 4397 04 43 62 i6 59
12 80 79 99 70 80 15 73 61 47 64 03 23 66 53. 1.2 L7 ti 68

o
98 95 11 68 77 33

c
66 A6 57 47 17 34 97 z',t 68 50 36 69 73 61 7d 65 81 33 98 85 11 19 92 9,r. 70

.
31 06 01 08 05 45 57 t8 24 06 35 30 34 26 14

t
86 79 90 74 39 23 40 30 97 32
85 26 97 76 02 02 05 i6 56 92 68 66 57 48 18 ',73 05 38 52 47 18 62 38 85 79

o
63 57 33 21 35 05 32 54 70 48 90 55 35 75 48 28 46 82 87 A9 83 49 t2 56 24

p
73 79 64 57 53 03 5296 47 78 35 80 83 42 82 60 93 52 03 44 35 27 38 84 35

s
98 52 0t 77 67 14 90 56 86 07 22 10 94 05 58 60 97 09 34 33 50 50 07 39 98

g
11 80 50 54 31 39 B0 82 i7 32 50 72 56 82 48 29 10 52 42 0L 52 77 56 78 51
l37167 AA 78

o
83 45 29 96 34 06 28 89 80 8A 18 47 53 06 10 68 71 L7 78 17

l
88 68 54 02 00 86 50 74 84 01 36 76 66 79 51 90 36 i17 64 93 29 60 91 10 62

b
99 59 46 73 48 87 5li6 49 69 91 82 60 89 28 93 78 56 13 68 23 47 83 4t t3
65 48 11 7674 1? 46 85 09 50
80 12 43 56 35 l7 '.72 ?0 8A $
58A4i7 6974

3
45 31 82 23 74
. 73 03 95 71 86
21 11 57 82 53
40 21
14 38
8t
55
65 44
37 63

4
?4 35 09 98 17 77 40 27 72 11 43 23 60 02 10 45 52 16 42 37 96 28 60 26 55

9
69 91 62 68 03 66 25 22 91 48 36 93 68 ?2 03 76 62 11 39 90 94 40 05 64 18
09 89 32 05 05 t4 22 56 85 t4 46 12 i5 67 88

9
96 29 77 88 22 54 38 21 45 98

t
gt 49 sr 45 23 68 47 92 76 86 46 16 28 35 54 94 75 08 gg 23 37.08'92 00 48

a
80 33 69 45 98 26 94 03 68 58 70 29 i3 41 35 53 14 03 33 40 42 05 08 23 4t

t
44 t0 48 19 49 85 15 74 79 54 32 97 92 65 75 57 60 04 08 81 22 22 20 64 13

s
t2 55 o7 37 42 11 10 00 20 40 t2 86 oi 16 97 70 72 58't5

/: /
96 64 48 94 39 28
63 60 64 93 29 16 50 53 44 84 40 21 95 25 63 43 65 17 70 82 a7 20'73 17 90
61 19 69 04 46 26 45 i177 74 5192 *.3 37 29 65 39 45 95 93

s
42 5826 A5 27
1541', 445266 9527079953 5936783848 82396101 18 3321 159466

tt p
941 55 72 85 73 67 89 75 43 87
-o4
62 24 4"* 31 $1 19 04 25 92 92 92 74 59 73
42 48 71 62 L3 97 3.1 40 8? 21 16 E6 84 8i 67 ' 03 07 11 20 59 25 70 14t'6 70
23 52 37 83 t7 . 73 2A8 98 37 68 93 59 14 16 2625 2296 63

h
05 52 2825 62
04 49 35 24 94 75 24 63 38 21 ,15 86 25 10 25 61 96 :l i 93 35 65 33 71 24 72
00 54 99 76 54 64 05 i.8 81 59 96 1196 38 96 5.169 28 23 91 23 28 72 95 29
35 96 31 53 07 26 89 B0 93 54 33 35 13 54 62 77 97 45 A0 24 90 10 33 93 33
59 80 B0 83 91 45 42 72 6E 42 83 60 94 97 00 13 02 12 48 92 78 56 52 01 06
46 05 88 52 36 01 39 00 22 86 77 28 i4 40 77 9:i 91 08 36 47 . 70 6t 74 29 4t
32 1? 90 05 t)7 87 37 92 52 4t 05 56 70 70 0? 86 71 31 7r 57 85 39 41 18 38
69 23 46 14 06 20 11 7l 52 0.1 15 S5 66 00 00 L874392423 97 11 89 63 38
19 56 54 14 30 01 75 8? 53 79 4U 41 [}2 15 85 6ti 67 43 68 06 ' 84 99 28 52 07
45 15 51 49 38 19 47 60 72 46 4'.i 66 i9 4:,"43 59 04 79 00 33 20 82 66 95 41
94 86 43 19 94 36 16 81 08 51 34 88 88 15 53 01 54 03 54 56 05 01 45 i1 76
AREAS UNDER THE STANDARD NORMAL CURVE
from 0 to z
(4 places of decimals)

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0754
0.2  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6  0.2258 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2518 0.2549
0.7  0.2580 0.2612 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2996 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1  0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2  0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3  0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4  0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.5  0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
3.6  0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.7  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.8  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.9  0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
Statistical Tables

AREA UNDER THE STANDARD NORMAL CURVE
From 0 to z (5 places of decimals)

z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .00000 .00399 .00798 .01197 .01595 .01994 .02392 .02790 .03188 .03586
0.1  .03983 .04380 .04776 .05172 .05567 .05962 .06356 .06749 .07142 .07535
0.2  .07926 .08317 .08706 .09095 .09483 .09871 .10257 .10642 .11026 .11409
0.3  .11791 .12172 .12552 .12930 .13307 .13683 .14058 .14431 .14803 .15173
0.4  .15542 .15910 .16276 .16640 .17003 .17364 .17724 .18082 .18439 .18793
0.5  .19146 .19497 .19847 .20194 .20540 .20884 .21226 .21566 .21904 .22240
0.6  .22575 .22907 .23237 .23565 .23891 .24215 .24537 .24857 .25175 .25490
0.7  .25804 .26115 .26424 .26730 .27035 .27337 .27637 .27935 .28230 .28524
0.8  .28814 .29103 .29389 .29673 .29955 .30234 .30511 .30785 .31057 .31327
0.9  .31594 .31859 .32121 .32381 .32639 .32894 .33147 .33398 .33646 .33891
1.0  .34134 .34375 .34614 .34850 .35083 .35314 .35543 .35769 .35993 .36214
1.1  .36433 .36650 .36864 .37076 .37286 .37493 .37698 .37900 .38100 .38298
1.2  .38493 .38686 .38877 .39065 .39251 .39435 .39617 .39796 .39973 .40147
1.3  .40320 .40490 .40658 .40824 .40988 .41149 .41309 .41466 .41621 .41774
1.4  .41924 .42073 .42220 .42364 .42507 .42647 .42785 .42922 .43056 .43189
1.5  .43319 .43448 .43574 .43699 .43822 .43943 .44062 .44179 .44295 .44408
1.6  .44520 .44630 .44738 .44845 .44950 .45053 .45154 .45254 .45352 .45449
1.7  .45543 .45637 .45728 .45818 .45907 .45994 .46080 .46164 .46246 .46327
1.8  .46407 .46485 .46562 .46638 .46712 .46784 .46856 .46926 .46995 .47062
1.9  .47128 .47193 .47257 .47320 .47381 .47441 .47500 .47558 .47615 .47670
2.0  .47725 .47778 .47831 .47882 .47932 .47982 .48030 .48077 .48124 .48169
2.1  .48214 .48257 .48300 .48341 .48382 .48422 .48461 .48500 .48537 .48574
2.2  .48610 .48645 .48679 .48713 .48745 .48778 .48809 .48840 .48870 .48899
2.3  .48928 .48956 .48983 .49010 .49036 .49061 .49086 .49111 .49134 .49158
2.4  .49180 .49202 .49224 .49245 .49266 .49286 .49305 .49324 .49343 .49361
2.5  .49379 .49396 .49413 .49430 .49446 .49461 .49477 .49492 .49506 .49520
2.6  .49534 .49547 .49560 .49573 .49585 .49598 .49609 .49621 .49632 .49643
2.7  .49653 .49664 .49674 .49683 .49693 .49702 .49711 .49720 .49728 .49736
2.8  .49744 .49752 .49760 .49767 .49774 .49781 .49788 .49795 .49801 .49807
2.9  .49813 .49819 .49825 .49831 .49836 .49841 .49846 .49851 .49856 .49861
3.0  .49865 .49869 .49874 .49878 .49882 .49886 .49889 .49893 .49897 .49900
3.1  .49903 .49906 .49910 .49913 .49916 .49918 .49921 .49924 .49926 .49929
3.2  .49931 .49934 .49936 .49938 .49940 .49942 .49944 .49946 .49948 .49950
3.3  .49952 .49953 .49955 .49957 .49958 .49960 .49961 .49962 .49964 .49965
3.4  .49966 .49968 .49969 .49970 .49971 .49972 .49973 .49974 .49975 .49976
3.5  .49977
3.6  .49984
3.7  .49989
3.8  .49993
3.9  .49995
4.0  .49997
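
The areas tabulated above can also be computed directly, which is a convenient way to check a table entry. The short Python sketch below is an illustration only and is not part of the original text; the function name `area_0_to_z` is a hypothetical choice. It uses the identity that the area under the standard normal curve between 0 and z equals one half of erf(z/√2).

```python
import math


def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z.

    Uses the identity: area = 0.5 * erf(z / sqrt(2)).
    The function name is an illustrative assumption, not from the book.
    """
    return 0.5 * math.erf(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Print a few entries rounded to 4 and to 5 decimal places,
    # so they can be compared with the two tables.
    for z in (0.50, 1.00, 1.96, 2.58):
        a = area_0_to_z(z)
        print(f"z = {z:.2f}   area = {a:.4f}   ({a:.5f})")
```

A computed value may differ from a printed entry by one unit in the last decimal place, since printed tables are sometimes rounded from a shorter intermediate value.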

ORDINATES (Y) OF THE STANDARD NORMAL CURVE
at z

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.3989 0.3989 0.3989 0.3988 0.3986 0.3984 0.3982 0.3980 0.3977 0.3973
0.1  0.3970 0.3965 0.3961 0.3956 0.3951 0.3945 0.3939 0.3932 0.3925 0.3918
0.2  0.3910 0.3902 0.3894 0.3885 0.3876 0.3867 0.3857 0.3847 0.3836 0.3825
0.3  0.3814 0.3802 0.3790 0.3778 0.3765 0.3752 0.3739 0.3725 0.3712 0.3697
0.4  0.3683 0.3668 0.3653 0.3637 0.3621 0.3605 0.3589 0.3572 0.3555 0.3538
0.5  0.3521 0.3503 0.3485 0.3467 0.3448 0.3429 0.3410 0.3391 0.3372 0.3352
0.6  0.3332 0.3312 0.3292 0.3271 0.3251 0.3230 0.3209 0.3187 0.3166 0.3144
0.7  0.3123 0.3101 0.3079 0.3056 0.3034 0.3011 0.2989 0.2966 0.2943 0.2920
0.8  0.2897 0.2874 0.2850 0.2827 0.2803 0.2780 0.2756 0.2732 0.2709 0.2685
0.9  0.2661 0.2637 0.2613 0.2589 0.2565 0.2541 0.2516 0.2492 0.2468 0.2444
1.0  0.2420 0.2396 0.2371 0.2347 0.2323 0.2299 0.2275 0.2251 0.2227 0.2203
1.1  0.2179 0.2155 0.2131 0.2107 0.2083 0.2059 0.2036 0.2012 0.1989 0.1965
1.2  0.1942 0.1919 0.1895 0.1872 0.1849 0.1826 0.1804 0.1781 0.1758 0.1736
1.3  0.1714 0.1691 0.1669 0.1647 0.1626 0.1604 0.1582 0.1561 0.1539 0.1518
1.4  0.1497 0.1476 0.1456 0.1435 0.1415 0.1394 0.1374 0.1354 0.1334 0.1315
1.5  0.1295 0.1276 0.1257 0.1238 0.1219 0.1200 0.1182 0.1163 0.1145 0.1127
1.6  0.1109 0.1092 0.1074 0.1057 0.1040 0.1023 0.1006 0.0989 0.0973 0.0957
1.7  0.0940 0.0925 0.0909 0.0893 0.0878 0.0863 0.0848 0.0833 0.0818 0.0804
1.8  0.0790 0.0775 0.0761 0.0748 0.0734 0.0721 0.0707 0.0694 0.0681 0.0669
1.9  0.0656 0.0644 0.0632 0.0620 0.0608 0.0596 0.0584 0.0573 0.0562 0.0551
2.0  0.0540 0.0529 0.0519 0.0508 0.0498 0.0488 0.0478 0.0468 0.0459 0.0449
2.1  0.0440 0.0431 0.0422 0.0413 0.0404 0.0396 0.0387 0.0379 0.0371 0.0363
2.2  0.0355 0.0347 0.0339 0.0332 0.0325 0.0317 0.0310 0.0303 0.0297 0.0290
2.3  0.0283 0.0277 0.0270 0.0264 0.0258 0.0252 0.0246 0.0241 0.0235 0.0229
2.4  0.0224 0.0219 0.0213 0.0208 0.0203 0.0198 0.0194 0.0189 0.0184 0.0180
2.5  0.0175 0.0171 0.0167 0.0163 0.0158 0.0154 0.0151 0.0147 0.0143 0.0139
2.6  0.0136 0.0132 0.0129 0.0126 0.0122 0.0119 0.0116 0.0113 0.0110 0.0107
2.7  0.0104 0.0101 0.0099 0.0096 0.0093 0.0091 0.0088 0.0086 0.0084 0.0081
2.8  0.0079 0.0077 0.0075 0.0073 0.0071 0.0069 0.0067 0.0065 0.0063 0.0061
2.9  0.0060 0.0058 0.0056 0.0055 0.0053 0.0051 0.0050 0.0048 0.0047 0.0046
3.0  0.0044 0.0043 0.0042 0.0040 0.0039 0.0038 0.0037 0.0036 0.0035 0.0034
3.1  0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026 0.0025 0.0025
3.2  0.0024 0.0023 0.0022 0.0022 0.0021 0.0020 0.0020 0.0019 0.0018 0.0018
3.3  0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014 0.0013 0.0013
3.4  0.0012 0.0012 0.0012 0.0011 0.0011 0.0010 0.0010 0.0010 0.0009 0.0009
3.5  0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007 0.0007 0.0007 0.0006
3.6  0.0006 0.0006 0.0006 0.0005 0.0005 0.0005 0.0005 0.0005 0.0005 0.0004
3.7  0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003 0.0003 0.0003 0.0003
3.8  0.0003 0.0003 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0002 0.0002
3.9  0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0001 0.0001

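The ordinates in the table above are values of the standard normal density, Y = e^(-z²/2) / √(2π). As an illustration only (not part of the original text, and with the hypothetical function name `ordinate`), the following Python sketch evaluates this formula so that any entry can be checked.

```python
import math


def ordinate(z: float) -> float:
    """Height (ordinate) of the standard normal curve at z.

    Y = exp(-z^2 / 2) / sqrt(2 * pi).
    The function name is an illustrative assumption, not from the book.
    """
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    # Print a few ordinates rounded to 4 decimal places,
    # matching the precision of the table.
    for z in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(f"z = {z:.2f}   Y = {ordinate(z):.4f}")
```

Because the curve is symmetric about z = 0, the ordinate at -z equals the ordinate at z, so the table lists only non-negative values of z.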