Professional Documents
Culture Documents
OF SURVEYS
WITH APPLICATIONS
Other Books on Statistics-
By GEORGE W. SNEDECOR:
Statistical Methods
Published by:
The Iowa State College Press, Ames, Iowa, U.S.A.
Sampling Theory
of Surveys
with Applications
by
PANDURANG V. SUKHATME
Ph.D., D.Sc.
Chief, Statistics Branch, Economics Division,
Food and Agriculture Organization of the
United Nations.. Formerly Statistical Adviser,
Indian Council of Agricultural Research, New
Delhi, India.
U A.~. BANGAtORE
UN'VfASJTY UBRARY,
I 1 DEC 1971
"I ~Ull'''•••''.''•••'''''''.'''''''''.-t
CHAPTER
I. BASIC IDEAS IN SAMPLING
1.1 Sampling Method . 1
1.2 Standard Error 1
1.3 Principle of Choosing among Alterna-
tive Sampling Methods. 2
1.4 Probability Sampling . 3
1.5 Simple Random Sampling 3
1.6 Procedure of Selecting a Random
Sample 6
1.7 Non-Random Methods of Sampling 10
1.8 Non-Sampling Errors . 10
ApPENDIX: Table of Random Numbers 18
PAGE
2b. 1 Introduction 60
2b.2 Sampling with Replacement: Sample
Estimate and its Variance . 62
2b.3 Sampling with Replacement: Estima-
tion of the Sampling Variance 64
2b.4 Sampling without Replacement . 65
2b.5 Sampling without Replacement- General
Case 69
APPENDIX: Tables of Sg (P, Q)/Xl! XI!'.. 74
Tables of 7TI! 7T2,! •• ·gs (P, Q) 78
CONTENTS xv
PAGE
III. STRA TIFIED SAMPLING
A. SELECTION WITH EQUAL PROBABILITY
3a.l Introduction 83
3a.2 Estimate of the Population Mean and
its Variance 84
3a.3 Choice of Sample Sizes in Different
Strata 86
3a.4 Size of Sample for Estimating the Mean
with a Given Variance under (a) Opti-
mum (Neyman), and (b) Proportional
Allocations 89
3a.5 Comparison of Stratified with Unstrati-
fied Simple Random Sampling 91
3a.6* Practical Difficulties in Adopting the
Neyman Method of Allocation 95
3a.7 Evaluation from the Sample of the GPo in
in Precision due to Stratification . 100
3a.8 Use of Strata Sizes for Improving the
Precision of an Unstratified Sample. 106
3a.9 Effect of Increasing the Number of
Strata on the Precision of the Estimate 108
3a. 10 Effects of Inaccuracies in Strata Sizes 109
B. VARYING PROBABILITIES OF SELECTION
3b.l Estimate of the Population Mean and
its Sampling Variance 127
3b.2 Allocation of Sample among Different
Strata . 129
3b.3 Variance of the Estimate under (i) Opti-
mum Allocation, and (ii) Proportional
Allocation when the Total Size of
Sample is Fixed . 131
3b.4 Comparison of Stratified with Unstrati-
fied Sampling 132
3b.5 Evaluation of the Change in Variance
due to Stratification 135
xvi CONTENTS
PAGE
IV. RATIO METHOD OF ESTIMATION
A. SAMPLING WITH EQUAL PROBABILITY OF SELECTION
4a. 1 Introduction 138
4a.2 Notation and Definition of the Ratio
Estimate 138
4a.3 Expected Value of the Ratio Estimate. 139
4a .4* Second Approximation to the Expected
Value of the Ratio Estimate 144
4a.5 Variance of the Ratio Estimate. 146
4a.6 Estimate of the Variance of the Ratio
Estimate 150
4a.7* Second Approximation to the Variance
of the Ratio Estimate 151
4a.8 Conditions for a Ratio Estimate to be
the Best Unbiased Linear Estimate 154
4a.9 Confidence Limits. 158
4a. 10 Efficiency of the Ratio Estimate 160
4a.ll Ratio Estimate in Stratified Sampling 166
4a .12 Ratio Method for Qualitative Charac-
ters: Two Classes 174
4a. 13 Extension to k Classes 177
B. SAMPLING WITH VARYING PROBABILITIES OF
SELECTION
4h.l Ratio Estimate and its Variance 179
ApPENDIX: Expected Values of Certain
Higher Order Product
Moments 188
PAGE
PAGE
8.13 Sub-Sampling with Varying Probabili-
ties of Selection at Each Stage 404
8.14* Sampling without Replacement at Each
Stage 410
X. NON-SAMPLING ERRORS
A. OBSERVATIONAL ERRORS
c-
E o
~
.5
g.
V"l
V"l -
\0
N
00
\0-
-N
xxiv LIST OF EXAMPLES
'2
-
o
o
u
- ~
........
,J::,
;::s
tI)
on ..... 00 00 00 .....
.....
N
on
N
on
N
t'-
N
a-
N
N
~
.....
\0
.* * * * *
LIST OF EXAMPLES xxv
o
o
-:
0\ - N
o
-.:to
....o
INDEX TO PRINCIPAL NOTATION
(The number against each symbol indicates the page where it is first
introduced and explained)
PAGE PAGE
PAGE PAGE
239,265
Yi.
S·2
t
85 YN.,Yn. 239-40
Sb2 . 240 Y.. 239, 265, 447
SlJ'2, Sb"2 267,268 Y.j 447
8w . 92 Yn.',Yn. " 266-67
8W2 241, 289, 308 ,Vw . 85, 206, 479
8.t2' Sp 2 308 yz 193
5.tZ2 360 Yds' 224
Sc2 • 420 y'ds 229
SWT
2 423 Yim, Ynm 286
Swc
2 429 Yi(m i ), Yn(m j ).
The probability that the specified unit is selected at the r-th draw
is clearly the product of (1) the probability of the event that it
is not selected in any of the previous r - 1 draws; and (2) the
probability of the event that it is selected at the roth draw. The
probability that it is not selected at the first draw is, by
definition, (N - l)/N; that it is not selected at the second draw
(N - 2)/(N - 1), and so on. Th.e probability of event (1) is,
therefore,
N-l N-2 N- r + 1
N . 'N-l N -;+2
The probability of the event (2) is clearly I/(N - r + I). The
product of the two is, therefore,
N-I N-2 N-r+I I 1
N-- . N-l ... N':'_r-+l' lV"":-r+I or N
which is the probability of drawing the specified unit at the first
draw.
Since the specified unit may be included in the sample at any
of the n draws, it also follows that the probability that it is
included in the sample is the sum of the probabilities that it is
selected in the first draw, second draw, ... , n-th draw, and is, there-
fore, equal to nj N. Since this result is independent of the specified
BASIC IDEAS IN SAMPLING 5
unit, it follows that every one of the units in the population has
the same chance of being included in the sample under the
procedure of simple random sampling. This, in fact, has some-
times been used as the definition of simple random sampling.
However, this definition does not completely specify the procedure
of simple random sampling, for, as will be clear in Chapter IX,
there can be other procedures of sampling which do not give the
same chance of selection at the first draw to each unit of the
population and yet the probability that any specified unit is
included in the sample is niN.
The method of simple random sampling is also equivalent to
giving an equal probability to each possible cluster of n units
to form the sample of the population. The possible clusters of
n are
(~)
Random sampling implies that everyone of these possible clusters
will have an equal probability, namely,
1
(~)
of being selected as the sample. Thus, if the population consists
of 4 farms serially numbered 1, 2, 3 and 4, having 2, 3, 4 and
7 acres under corn respectively, then the possible clusters of 2 farms
from this population will be the following six:
1,2 2,3
2 1,3 2,4
3 1,4 2, 7
4 2,3 3,4
S 2,4 3,7
6 3,4 4, 7
6 SAMPLING THEORY OF SURVEYS WITH APPLJCATIONS
of random numbers, and (c) taking for the sample the n units
whose numbers correspond to those drawn from the table of
random numbers. The following examples will illustrate the
procedure:
Example 1.1
Select a sample of 34 villages from a list of 338 villages.
Using the three-figure numbers given in columns 1 to 3, 4 to 6,
etc., of the table given in the Appendix and rejecting numbers
greater than 338 (and also the number 000), we have for the
sample: I
be divisible into units which are distinct so that every unit area
of the population belongs to one and only one sampling unit is
thus not fulfilled, with the consequence that the central areas get
a relatively greater chance of selection than those near the
border.
1.7 Non-Random Methods of Sampling
Methods of sampling, which are not based on laws of chance
but in which units of the population to be included in the sample
are determined by the personal judgment of the enumerator, are
called purposive or non~random methods. An example of this
method, where personal judgment is introduced in the selection
of a sample, is provided by the old official method in India of
selecting fields for sample~harvesting for determining the average
yield of a crop. Under this method, the experimenter was required
to select fields which, in his judgment, had an average crop. It
was found that the experimenter tended to select fields which were
poorer than the average when the season was good and better
than the average when the season was bad. The result was a
tendency to over~estimate yields in bad years and to under~estimate
them in good years. The quota method of sampling, so exten~
sively used in the United States of America in opinion surveys,
is another example of this method. Here quotas are set up for
the different categories of the population to be included in the
sample and the selection of units from each ~tegory is left to the
personal discretion of the enumerator. The method is convenient
to use in practice. Its cost is also low relative to that of the
method of probability sampling. However, the sample does not
provide any means of judging the reliability of the estimates based
thereon. If we want to have unbiased estimates of the population
character whose accuracy can be measured from the samples
themselves, probability sampling alone shQuld be used.
1.8 Non-Sampling Errors
The accuracy of a result is affected not only by sampling errors
arising from chance variation in the selection of the sample but also
by (0) lack of precision in reporting observations, (b) incomplete
or faulty canvassing of a designated random sample, and (c)
faulty methods of estimation. These errors, particularly those
BASIC IDEAS IN SAMPLING 11
under (a) and (b), are usually grouped under the heading "non-
sampling errors". Deming (1944, 1950) has listed the different
sources of errors and biases arising from (a). These, in his words,
are principally due to arbitrariness in definition and variable
performance of the man. An eye-estimate of the crop provides
an example of this source of errors. Eye-estimate is a form of
measurement which cannot, in the very nature of things, give
a unique result even when the same field is observed at different
times by the same enumerator. The result will depend upon the
personal judgment of the enumerator, no matter how well he is
trained and consequently there will be variation from enumerator
to enumerator observing the same field and in repeated observa-
tions by the same enumerator. A character like damage to a crop
in the field from rust will similarly involve a certain amount of
arbitrariness in definition and, therefore, give variable response.
Even with factual characters like the area under the crop in a field,
there is found to be marked variation in performance of the same
enumerator measuring the acreage at different times or of different
enumerators measuring the same field. Tn an inspection carried
out by the statistical staff to test the reliability of the area records
maintained by the patwaris (village officials) in 61 villages selected
at random in the Lucknow District (India), about 20% of the
reports were found to be in disagreement. Part of the discre-
pancies could, of course, be explained by carelessness or even
dishonesty but most of them were due to differing descriptions
of the same situation given by different agencies (Sukhatme and
Kishen, 1951).
An example of faulty canvassing of a selected sample is reported
by Kiser (1934) who selected a random sample of households for
studying morbidity. The relative frequency distribution of the size
of households included in the sample and as revealed by the Census
is given in Table 1.1, which shows that the sample is considerably
deficient in the frequency of households of size 2. Kiser attributed
the deficiency to the failure on the part of the enumerators to
re-visit missed households in which childless married women
working away from homes are likely to predominate.
A similar bias attributed to the poor execution by the field force
of the selected sample arose in a survey for estimating the yield
12 SAMPUNG THEORY OF SURVEYS WITH APPLICATIONS
TABLE 1.1
Relative Frequency Distribution of the Size
of the Households
2 19·4 26·8
) 25·9 26·5
4 2~'5 21·9
5 15·4 13·0
6 8·1 5·9
7 3·5 3·2
8 1'9 1'4
9 and over 2·2 I·)
,-.
N
'-'
~
00
V)
";'I
IC
IC
r:-
V)
on
"?
V)
.. I
N
'-'
~
<;>
.....
-
00
V)
~
N
IC
.....
~
on
:: 0\
...,
,-. ~
..., IC
~
00
;::- ..... "'r "? ':' ~
::::. r:.. N
'"
on
~
~
§
...,
..... IC
.;.., "?
V)
0\
0\
"? ...
,-.
N
-'" .....
"':"
00
..;,. ..;,.
on
V)
'"
._, ~ ~ ~ '-' V)
~ ...,
·5
,-.
e C! IC
";'I
00
r:- -
,-.
!::!.
00
V)
.....
~
N
~
~
~
~
00
'" on V) IC
'" IC
.~
.'§~
:::1'-' e'"' '"
N
00
::
r:..
~
N
...:..
-
0
.....
6'
!::!.
~
r:-
V)
IC
on
IC
"'r
on
-
V)
V)
~~
~ § ..., ,-. on .....
~
on 0\
~
0
<u~ E ~ ':' ':' .;..,
0\
"? r:- .;.., ~
~'-' on
V) '-' IC V) V)
V)
·5 ~
~~
§ ~
,-.
~
0\
N
0\
0\
"'r
.....
.....
':'
00
- -
~
.....
,-.
00
'-'
-
.....
V)
~
";'I
V)
'"
"'r
on
N
0\
~
-
N
~~
~:§
~ ~ ;:
~
,-.
~
V)
'"
..;,.
- --
N
~
N ~
..;,. ..;,.
r::-
::::.
IC
"?
IC
00
";'I
IC
...,.....
.;..,
-
00
.;..,
~
-<u
~;§
·5
bo
;..
§
~
N
IC
~
0\
~
~
~
':'
V)
.....
"?"
~
G
::::.
on
':'
IC
~
IC
...,
r:-
V)
~
IC
a- .tIj
.5 ...,
,-.
'-'
~
.....
.;..,
~
";'I
V)
N
':'
V)
0\
on
.;..,
.-.
-
V)
'-'
-
IC
~
.;...
..... ~
IC
.;.., . V)
·s
~
~~
IU
<::)
~ --
N
'-'
0\
~
.....
.....
<;>
IC
on
r:- r:-
V)
V)
V)
..
._,
~
V)
8
.;.., '"
0\
~
~
V)
~
~
:e:>
r.i:
-
,-.
._, ~
V)
..;,. ':'
~
IC
V)
..;,.
00
.....
~
...,
-
,-.
'-'
N
':'
.....
::
'" '"
-
IC ~
";'I
IC
§
~
'"
'0
..
'0
..J § ..J §
:3 .~ u
.§
'iii
'+l
S ~
Q,
,-..
:$ " 8.
-a S -ae go ~ "38.'"
~
Q,
is. e
l
~
~ 0
Po.
~
<:)
~
~ i. l ~ Po. l ron~ Po.0
~ ~
~ ~ ~ ~
16 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and walking from a given point in the field along the direction
of its length will agree that the starting point of the plot and the
direction along which it is to be laid in the field could at best
be determined only approximately. Even if the same ohserver
were to locate and mark the plot determined by a given pair of
random numbers at different times in the same field, the plots
may occupy different positions. The inclusion or exclusion of
particular plants on the border of the plot in demarcating it will
similarly depend upon the judgment of the experimenter. The
area actually cut may also vary from the one intended to be cut
due to unevenly sown crops and errors in measurement. If all
these deviations could be ascribed to a random element, one would
expect the errors to cancel out. The results given in Table I .3,
however, indicate that this is not the case. They show that small
plots significantly over-estimate the yield, although the degree of
over-estimation becomes smaller with larger plots. It is obvious
that the overall influence of the various non-sampling errors
relative to the produce harvested becomes smaller with the increase
in plot size until, when the plot size is large enough, such as is
used in official crop-sampling work, the bias becomes nt'gligible.
The above examples will show the need for exercising care in pre-
paring the design of a survey so that, as far as possible, biases are
absent. Where it is not practicable to ensure absence of bias,
one should at least satisfy oneself that the bias present, if any, in the
sample estimate is so small as to be negligible in comparison with
its standard error.
REFERENCES
I. Tippett, L. H. C. (1927) .,
Random Sampling Numbers, Tracts for Computers,
XV, Cambridge University Press.
2. I.C.A.R., New Delhi (1951) Sumple Suney.\' for the Estimatiofl of Yield of Food
Crops (1944-49), Bulletin No. 72.
3. Deming, W. E. (1944) ., "On Errors in Surveys," Amer. Sociol. Rev., 9,
359--69 .
4. ---(1950) ., Some Theory of Sampliflg, John Wiley & Sons, Ltd.,
New York.
5. Sukhatme, P. V. and "Assesment of the Accuracy of Patwaris' Area
Kishen, K. (1951) Records," Agriculture and Animal Husbandry,
1, No.9, 36-47.
6. Kiser, C. V. (1934) .. .. Pitfalls in Sampling for Population Study,"
Jour. Amer. Statist. Assoc., 29, 250-56.
7. Sukhatme, P. V. (1947) ., "The Problem of Plot Size in Large-Scale Yield
Surveys," Jour. Amer. Statist. Assoc., 42,
297-310.
2
18 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
-
""""0(')O
....
° ~("PI"I-~ -V)"'NO -"00"'" -~-.::rf""'lO" O\OO("f"i-OO
0\
N V>NN""''''!:t ""0000"'0 0\..,. .... "'01 \O-NNCO _t""'>O\ON
'"
N \Qr-t'--Nf"oo. -00000,..... 0\ \0 r"('f")t"'j r-.I.I")O-~ f'''d-f'-"'\O
'"
N NOO--O 0000"'0", 0000 ..... '<1' 0---'" ",r----t'--r---.
'<I' V')-NMO'\ ",ON-O OOMV)"I1't-- OOOOOO-M I,O-r-Ol'-
N
.., OO-_V'lO"l ~-Mt--OO -NO\'O"'tt \OOONt---r---. \Ooot---O\('f')
N
N '<1''''''' ..... 0 "' ..... N"' ..... O\trlOClr-OO "'0\-"''<1' Oo.,r-NN
!,""
-
N OO-~Of"') O'\OONMV) '<I''''V)M'''' \D 00 .............. \0 OOOO"'MO
°'"
0\
'<1'00"'0 .... tI")"III:tet:t""Il.I) 00 ('I V)('.IV) N""O"'''' ('lC'I\Olor.O
:::<u 00
00 0\ t- 00 r- -.:t rf"!f"'-I 0 V) ..... -'<1'0'<1' .... "''<1''''0 I,(')-'"e::t"~~
~ ~ t-ot-oo_
'" '<I''<I'''OCN .... '<1'01 N'" M~OOOOr- -o\\C)NM
~ ~
t: ~
11',
N tf")""'" tr, r- NOO-r-\D "'N'<I''''''' 0\"'0""00 --00-0
~ c-oo_o
'<I' -VN_N NN_r"'-("I'i OOr--M~_ r-t('fj 00 000
~ .... '<1'001 '<I'M 0011)'<1'0\
~
'<I'r-OIN'" N"'OIr-O 00 V')(X) 0\ N
00
OO-Nt--"'I:t '<1'''' ..... '<1'\0 ..... '<1''''''''0 ....,""'<1'''''<1' 0\00'<1'''0\
.,., N'<tN 0 '<I' t'f": 0"'1' l'--tI") O\~--CO "'-NO .... ° '<I"-'<t 1'1
'<I' '<1''''1''10\1'1 0\"'" '" N ..... ""00\0\t- 0\.0\0""'" .... 000\0'"
.....
t-, tnt.r)r-Mt--- O-NO",N\O 0...,,,,<'1\0 t"'1V')\O 0-. 0t- iX)0\ 0\ 0-
°
N 0\ N NO ..... <'I "''''0 v \OO-<'IOr- Nt"it"i":::tN V)-~\OO
-
0-
00
O\OMt"if""l ""OO\\Or- OOV)_M\C) f""'loN\O""'_ 'V 0\ f·'·H··..' f"I")
--
'"ci
~ -
r-
IoCO"'<'IO
r-oo\OO-
\OvO- oo
O-q··~'ll:tr---
00\0"''''
0\----
VO\t-·<'IO
"d"t"iMt---f'f"i
O'\t"I")l"I'il,f')f'I"')
",0'<1"\0_
~ \0
1"'\\00\0'" r-oo_OOV) \ON\O-f""') O<'lr-O'" O\OV\OI"'\
-'
~ '" '<1"\01"'\ <'IV
.... \01"'\"'1'"10 "'Or-\Ooo ",r-","'"" MM<'I"'O
Q v NO\r-OO_ NooMII')O\ tnf"l"'l-r-f"oo t"iO\V".-N "'0 v "'<'I
Z 1"'\
"'-l r-I"'\r-lCv \0\0\0-\0 O\OMO\ar'\ ("f"'Ht') lIi 0 \0 NOOOMf'-
~
~
< ---
<'I
r-ooOooO
_\0 O\t"i 0
oo-OOON
oo"'V\Ooo
V",QOM\OV
OOMO\r---r-
O\-trl"'tOO
r-r--t""-V')
V)ooV')"
V)f'·..·HI')NOO
00
-°
0\
0'<1""'-11')
-r-'<1"vO'l
r-O\-II""IQO
l'f")V"IO\\Ot--
V\OMOoo
r-MOr-<'I
OM r-'<t '"
<'IO'<1""'M
""II:t0\
v'<1"ooO\M
r-tn 00
N 0\0\ .... 0'1\0 0"'''' ........ OOo\Nr--- ..... 0\ .... r- .... r- 00\0\000'"
\Or-ooO\O
~~~~i
-Nf"I"'I~1I")
\0 r-oo 0\0
NNNN .... Mff')Mt'\tf') .~~;~ '<t'<1"'<1"'<1"",
CHAPTER II
BASIC THEORY
A. SIMPLE RANDOM SAMPLING
-
1: Y.
i=1
)IN = N· (1)
N
1: ),,2 - NYN 2
= i=1
(2)
N-=l
BASIC THEORY 21
N - I S2 (4)
N
and
the sample mean square given by
n
S2 = '"
~ (v
. i - /"
'", )2
n-I
n
1: y/ - n)"\2 (6)
n-J
where the summations extend over all
the units in the sample.
(7)
=~ 00
where the symbol E stands, as usual, for expectation. We write
Est. h = jill (9)
and
Est. S2 = S2 (10)
where Yi stands for the value of the i-th unit of the population,
and the summation is taken over all the n units in the sample.
Numbering the units in the sample serially, as 1,2, ... , r, ... , n,
we may write (11) as
(12)
where Yr' now stands for the value of the unit included in the
sample at the r-th draw.
By a well-known theorem in probability, the expected value of
a sum is the sum of the expected values. We, therefore, write
• This section may be Skipped over at the first readinB witt-out losina continuit,
of the tellt,
24 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Now, by definition,
N
E (y,') = .E Pi,Yi (14)
i=t
E (y,') = \'v
N w·
;=1
1
( 17)
where
a, -co I if Yi is in the sample,
and
ai = 0 otherwise.
(18)
Now, clearly,
E (a;) = 1.{Probability that Yi is included in the sample}
n
(19)
N
E(y/2) = ; L:
i=l
y/ (20)
Adding and subtracting jiN~ from th~ right-hand sid! and using
(2), we cm write (20) in an alternative form as;
It follows that
E {I w~ }'_.2} ,,~ fI ~
11
E J'
ll1 ~ _r
'2J{
(22)
Again, if yr' and Ys' are used to denote the values of the units
drawn at the roth and s-th draws respectively, say Yi and Yj, we
have, by definition,
(23)
(25)
26 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Hence, substituting for Pi, and P j ,,, from (24) and (25) in (23),
we get
N
It follows that
n (n ~ 1) E {t y<y!} ,. .~
.#J
n (n '- 1) E { t y,'y.,}
,#.=1
N
= N (~ - I) L
<#1=1
Yi)'J (27)
n (/:"_-1) E {t 1#1
YIYJ} = n (/_ 1) [t i#i=1
E (a,ai) YiY!] (28)
Ii (n ~1) E {t '¥-i
YiYJ} = fir(~-= 1) t
<¥-/-l
Y'YI (32)
BASIC THEORY 21
which can, alternatively, be expressed as:
1
- N<N-I)
(33)
or
E (y'rY. ') -- YN
- 2 - 8~
N (34)
E(y.') ~ E{~ t Y}
=
n
\ E { i y,2 + i' YIY;}
i~j
:2 [nYN2 + ( n - Z) 82 + n (n - 1) h 2
_ n (n Ii 1) S2]
- 2 + (I 1)
= YN n - N S2 (35)
Hence
E (S2) = E {n ~l ( i; y/ - ny,,2)}
= nl:-t {E t y,2 - nE (j,,2) 1
28 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
= S2 (36)
showing that S2 is an unbiased estimate of S2.
2a.6 Sampling Variance, Standard Error and Mean Square Error
We have seen that a sample estimate differs from the population
value by varying magnitudes in different samples. This difference
between the sample estimate and the population value is called
the sampling error of the estimate. An important requirement
of a sampling method is that, in addition to giving an estimate
of the population value, it should provide a measure of the
sampling error in the estimate. Since the actual sampling error
in an estimate cannot be known, we obtain a measure of the
average magnitude over all possible samples of the sampling
error in the estimate. A simple average of the actual sampling
errors over all possible samples is, however, zero in the case of
unbiased estimates, as seen from Table 2. I. An average of the
sampling error without regard to sign provides one measure,
called the mean deviation, but this is not in common use. The
average magnitude of the squares of sampling errors over all
possible samples is called the sampling variance of the estimate
and its square-root is the measure most commonly used for
defining the average sampling error. This measure of the average
sampling error is called the standard error. In defining the standard
error as above, we assume that the sample estimate provides an
unbiased estimate of the popUlation value. More generally, for
a biased estimate of the population value, the sampling variance
is defined as the arithmetic mean of the squares of the differences
between the sample estimate and the expected value of the
estimate over all samples, and its square root is called the
standard error. The arithmetic mean of the squares of the differ-
ences between the sample estimate and the population value, in
this case, is called the mean square error.
V(Yn) = G- ~ ) S2 (38)
The reader may verify that the value of the sampling variance
derived from this formula is the value actually obtained in
Table 2.1 for a sample of size 2.
E(y) =fLl
30 SAMPLING tHEORY OF SURVEYS WITH APPLICATIONS
and
E (y - 1-'1)2 = 1-'2
We may write
(ji. - ILl) = (ji" - YN) + (jiN - 1-'1)
+ E (YN - ILl)2
+ 2E (YN - ILl) E {(y" - h) IYJ' ... , YN}
But we have already seen in (16) that E (Yn) = YN • and so
E {(Yn - YN) IYl' ... , YN} = O. The last term in the above expres-
sion is therefore zero. Hence
or
E{V(ji,,) 1)'1.·· "YN}=(! - k) IL2
It follows that
will be denoted by
(42)
32 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Examples:
n
S (3) = J; Yi 3
i=l
-f log (I - Yi a ) == Slet
2
+ Sa a2 + S8 a3 + ...
3
-_1... -
I a I < maxy!
giving us
g (PI'" Pa"' ... ) = J; g, (P, Q) S (ql'" q2'" ... ) (43)
and
(44)
BASIC THEORY 33
where P and Q stand for the partitions (pt l 1'2"' . .. ) and
«]lx, q2 X , • • • ) of the same number w given by
(45)
Examples:
n(n - l ) ... (n - p + 1)
ep = -N(N-I)
- --- -
... (N-p+l)
(46)
i.e.,
3
34 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Proof:
E {g (Pi", P2",·· ,Ph"I,)}
But
E (a) = 1· Probability (a = 1)
_ n (n - 1) .. . (n - p + 1)
- N(N= 1) .. . (N - p + 1)
= ep
Hence
E {g (Pi"' p~"'" ·h"h)} = E (a) G (Pl"'P2"'" 'PI>"h)
= ep G (PI"' PZ"'" 'P1I"A)
Now
(49)
+ 2!
n
e,G (1 ') (50)
(51)
36 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(52)
It can be verified that the value 19·05 for the sampling variance
of S2 obtained in Table 2.1 agrees with that derived from the
above formula.
The corresponding formula when N is infinite is readily obtained
as the limiting case as N -+ 00. In this case, we have
Nfe; = 0 for i <j
and
Nie J = n (n - 1) .. . (n - j + 1)
Hence
(53)
(54)
where
V (S2) = I-". 4 -
n
I-"~
2
+ n (n 2- I)
1-"2 2
where 1'-2 and 1-"4 denote the second and the fourth moments of the
population. However, sometimes, we also need to know the
behaviour of s, the standard deviation. This is obtained as follows:
Let
where
E (€) = 0
and
E «('2) == V (.1'2)
We may w.ite
s = (S2 + (')!
= s( 1 + ;2)*
Since E will be small as compared with S2 with a probability
approaching 1 as n becomes large, we may expand the right-hand
side as a series, neglecting powers of E higher than the second.
We then have
= 82 - 82 {1 - i . V.(S.2).}2
-8 c
= 82 {I - 1 + ~{~~:) ... }
V (S2)
~ -48 2 (58)
-
Yn - t(a. 00) VJNNn
:_ n S ~ - ~ -
'" YN ',,: Y..
+ t(a.oo) VIN-NIl-
-n S (61)
V Nn S
-
Y .. - t(a . • _1) VIN-n
- Nn <"-
5 ""-
_-
YN "". ),,,
L
T tea, ._1) V/N-n
Nfl S (63)
For the six samples in Section 1.5 and for the same confidenct:
coefficients as given above, these cO:1fidence limits bas~d on 52 are
given in cols. 3 and 5 of Table 2.2. The values of 1(,32,1) and
f('05, 1) have been interpolated from the t-table, being 1· 85 and 12·7
11= (65)
n (66)
(67)
(68)
or
(69)
CJ pft' (1 - p)"'
By definition,
•
E (nl ) = }; nIP (nl )
"1-0
X (n - 1)! (N - n)!
(N -I)!
44 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
. (~~ ~-----
= np \ ' --
~ /) (~:) -
W
.,=1
(N-
n -'I
I)
Now
(n-I
represents the probability that in a sample of n - 1, n1 - ] will
fall in class 1 and n 2 will fall in class 2. Consequently
~ (NPnl -- II____)(Nq)
n 2 = 1
W
.,=1
(Nn --I) 1
Hence
E (nI ) = up (70)
Similarly,
Est. q = ~2 = q. (73)
Now
(74)
BASIC THEORY 45
SO that
V (n 1) = E {nl (Ill - I)} + E (n l ) - {E (1l 1)}2 (75)
Also
"
E {nl (n l - I)} = .[; nl (nl - I) P (nt)
n!(N-n)!
x N!-
= "
n (n - I) Np (Np - I) '\' _(Np - 2)_!
N(N-I) W(nl-2)!(Np-nl)!
(n - 2)! (N _- n)!
(N-2)!---
_ n (n - I) Np (Np - 1)
- -N (N - I) (76)
since the sum of the terms under the summation sign is evidently
1. Substituting the result in (75), we have
(80)
.
y" = ~ I:YI = ~1 = P.. (81)
S2 = Np _ Np2 N
;",1
N-l
=
N-I
. = N-I p (I -p) (82)
.. n _nL2
Sl = -·*-__T
1: Y 2 - nji 2 1 n
= li··..:__T· = Ii _
n
1 p .. (1 - p .. ) (83)
and not by
p. (1 - p,,)
as one might suppose. We write
n (N - 1) (1
Est. {p (1 - p)} = (n-- 1) N • P.. - P..) (85)
(86)
BASte tHEORY 41
and an estimate of the standard error of Pn is given by
(88)
n (90)
1+
For N large and € not too small, n is simply given by
n = [2
(a. 00)
q,
€2p
(91 )
Example 2.1
Material for the construction of 5000 wells was issued during
the year 1944 in a certain district as part of the Grow-More-Food
Campaign in India. The list of cultivators to whom it was issued,
together with the proposed location of each well, is available.
A large part of the material was reported to have been misused
by diverting it to other purposes. It is proposed to assess the
extent of the misuse by means of a sample spot check. Tn other
words, it is proposed to estimate the proportion of wells actually
constructed and used for irrigation p~rposes. The sample is
proposed to be selected by the method of simple random sampling
from the total population of wells for which the material was
issued. The permissible margin of error in the estimated value
is 10% and the degree of assurance desired is 95%. Determine
the size of sample for values of p ranging from ·5 to ,9.
We are given N = 5000, € = ·10 and t(Q, 00) = 1·960.
Substituting in (90), we obtain for different values of p, the
following values of n.
~.----~----~-----.--- --_.,._------ ---
p '5 ·6 ·7 ·8 ·9
n 357 244 159 94 42
Since the worst critics do not place the misuse at more than
half of the material issued, a sample of 357 would appear to be
adequate for the proposed check.
BASIC THEORY 49
2a.tS Generalised Hyper-Geometric Distribution
We shall now extend the preceding results to the population
which is divided into k mutually exclusive classes. Let Ni denote
the number of units in the i-th class of the population (i = 1, 2,
... , k), so that
k
1) N, = N
;~l
(92)
where
N,
P. = N
and
N-n
V (ni) = N-='_l . IIPi (1 - Pi) (94)
Thus
P'I = ~-- v
E (n,nl) - E (nj) 'E (nJ)
V(nj) , V (nJ) .. (96)
Now
Ni,Nj
E (11,111) = E n.l1 j P (n" nj , n - nj -- n/)
n,=1
"/::1
= n
N N,N/
{nN -_::}1 _ Nn}
(98)
(99)
BASIC THEORY 51
2a.19 Expected Value and Variance of the Proportion in One
Class to a Group of Classes
The more general problem in the study of qualitative characters
is the estimation of the proportion of the number in anyone
class to the number in a combination of some classes containing
the class under consideration. Suppose, for instance, the popu-
lation consists of paddy-growing fields in a district, classified
into four classes as shown below, and that a sample of n has
given the results indicated below the corresponding population
numbers.
(rrighted Unirrigatcd
Not Sub- Not Sub·
Total
Manured Manured Total Manured Manured Total
(I) (2) (3) (4)
.-~---.--------.------~-
----- ----.--~--~ ~-------
(100)
(102)
(103)
We know by a well-known theorem in probability that
P {nIl> n12. n - nI} = P {nb n - n I} P {nll. n12 I nI} (104)
(105)
(106)
fJ_H (107)
PI
BASIC THEORY 53
-
--
E [E {nll (nIl -n 1) + nIl I J)J _ /1111
2 N111
2
_
1
2
(108)
1
(109)
(n 1 - 1) Nil (Nil - 1)
(110)
- n; Nl (Nl _:_ If
Substituting in (108), we have
(Ill)
Now let
where
and
54 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
E (nil) ~ npi
E( I ) ~ I {I + N - nIl (I _ )1
nl -- npi N - I n PI PI f (112)
+ NN-n
- I
1
n"PI:' (I -- PI)
} (113)
Evidently,
Np + Nq= N
11N YN.
-
lip
( 116)
(117)
BASIC THEORY 51
Now, for given nb
Np
n, \ ' 2 + n, (11, - J)
Np L..JYi Np(Np -I)
j::q
Hence
E [{ £ Yi rJ = E [E {( £ Yi rIn,} ]
~p (t ",) E (n,)
+ Np (N; _ I) (. ~,N.)
(II'})
Using the results, already established, namely,
E (II,) = np
and
Np (Np - 1)
E (11,. 11, - I) = N (N- J) 11 (n - J)
- l _ N (IV - n) {(
V { nN . nIY",; I) S 2 N - 2}
- n (N _ 1) Np - ,+ PYN,
N (n - 1) N" 2- 2 N2 2- 2
+ n[R-=-l) .p YN, - :P YN,
N(N - n) 2
= n (N _:T) (Np - I) SI
v {: . I1 Yn,} I = V {NYn}
= N2. N -n. S2
N 1/
_-- N (N - n)
n (N _ I)
{L:
N
2_ N-YN2}
.1',
1.,
N(N-I1){~ 2_(Np)2_ 2}
L./'
n (N _ I) N YN,
1=1
BASIC THEORY S9
For purposes of simplification, we will assume that Np is large
enough to permit the following approximations:
Np -1 ~
Np~-
and
v {N.
n -} ._-
n1Y"1 _ N (N
~ 11
- n) { p S
•. 12 + P (1 - P) YN
- 2)r
1 (122)
Nil. 1 {C +
1
2
I - Pt (123)
N n p p J
where C12 denotes the square of the coefficient of variation of the
area irrigated from a well in class 1. For N large, the relative
variance is given by
(124)
p ·5 ·6 '7 ·8 ·9
.-._-----
and
PI, = (Probability that Y. is not drawn at the first draw) X
(126)
where
(127)
It is thus seen that Pi, is not equal to Pi, unless Pi = liN. The
theory of sampling with varying probabilities of selection is
consequently more complex than that of simple random sampling.
One way of introducing simplification into the theory is to replace
a selected unit before another draw is made, so that Pir = Pi
62 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(128)
(129)
, = L..J'
E (z) \' P • Y,_
NP,
'"'1
= YN (131)
It follows that
(132)
BASIC THEORY 63
d{:tZJ-"
=n2 {t Z.2 + t Z,ZI} -
E Z..2
i~j
Now
N
E (z/) = .E
i=l
p,Z.2 (134)
and
E (Z,Zj) = E (z,) . E (Zj)
(135)
Substituting from (134) and (135) in (133), we obtain
V(zn) = ~2 {n t
1=1
P,Z,2 + n(n - 1) z..•} - %.• 2
0"2
• (136)
n
64 SAMPLING THEOR.Y OF SURVEYS WITH APPLICATIONS
where
N
u z2 = lJ Pi (Z; - ZJ2 (137)
;=1
and
N
u,2 = N .L:
("I
(Yi - YN)2
N-I S2
(138)
N n
Lastly, we remark that when the selection probability is propor-
tional to the value of the variate, in other words, when Pi is
proportional to Yi, say, Pi=yi/fL, z assumes a constant value for all
i, and in consequence (136) reduces to zero. In practice the values
of the variate will, of course, not be known in advance but values
of another variate correlated with the variate under study may be
known. We may, therefore, expect that when Pi is proportional
to the measure of size of Yi, the estimate may be considerably
more efficient than that based on simple random sampling.
2b.3 Sampling with Replacement: Estimation of the Sampling
Variance
Consider the mean square of Zi'S obtained from the sample,
defined by
.
S 2= __!_ \ ' (z - Z )2 (139)
, n-l W ' "
BASIC THEORY 65
Expanding, and taking expectations, we get
Now, by definition,
so that
E (z,.2) = V (2,,) + 2 .. 2
.,
a-
z I~ z 2
1141 )
11
E· ( .l'z-") ~
-- 11- Iff I II L...
1=1
p o"
i·~;·---",· ~
---/1Z.-"}
( 142)
(144)
and
Yi (145)
= N (s - I ~;p) Pj
where Yi and Yi are values of the units drawn at the first and
second draws respectively.
Clearly,
N
E (Zl') = ~Pi
w '
Yi
NP i1
1=1
= .YN (146)
Also,
L:
N
E (Z2') Yi
P J,
~ip)
'-C"O
1=1 N(S - 1 PI
= J'N (147)
It follows that a simple arithmetic mean of Zl' and z~' will provide
us with an unbiased estimate of the population mean.
An alternative estimate, in which Z is defined independently of
the order of the draw and which is unbiased, can also be formed.
Let
2 y,
(148)
Z, - N (s + 1 - 1 ~'p) P,
(149)
BASIC THEORY 67
Clearly,
N
E (%(nc2) = t 1.: E (a,) Z, (150)
E(a,) ~~ P, + {s - 1 ~ipJ Pi
= {s + 1 _ Pi } P j
1 - Pi
(151 )
E (%1.:2') = t .L: {s +
i==l
1 - I ~i Pi} PiZ j
x (S +I - 1 ~i pJ Pi
(153)
On substituting for E (ai) from (151) and for E (aiaj) from (155)
in (154), we have
+ [PiP I {I
i#j::q
~ p~ + J ~7J Z;ZI] - YN
2
(156)
%.. '2 = ! {t ( -+
1:1
S I -
P,
1 _ P,
)2 .P 2Z; 2 j
(159)
BASIC THEORY 69
Also,
= L
,01-/=1
E(a,a j ) (S + I Pi ) (
I -- Pi S +I -
Pj )
I - Pi
x __ z'=J_' __
1 + I
I -PI I -p!
(160)
Let
z = __ nYl___ (162)
I NE (a j )
E (,in) - E
-- "
{I W{;
n
fly;}
N E (a.)
nL;
(163)
(164)
Z2 _
n
t Est,
N2 {t y{ +,~,M}
n
1'2
• yv
= Z.2 -_
N2 L . I
E (a,) N2 L
i~j
i, j
E (a,al)
(166)·
N
1: {£ (alaj) - £ (al) £ (all} =0 - £ (aj) {I .- £ (ai)}
j(¥oi)=1
to recast the expression for the variance V (i,.) as a linear function of the
,qua res of the differences of the z's. Thu~, (165) can be written as
v (Z-
. •
) = -
N2
I l EN 1-
1=1
£(a()
_--._ )'. 2
£ (a,) .,
.j
~
LJ
j~j=1
£Ca,4j) ... £(ai)E(aj)
£ (a;) £ (aJ)
Vi)'J
•
]
Sul~stituting for J'l in terms of Zj from (162), and using the ahove result,
they obtain
I N
V{i.) = 2 L' fECal) £ (al) - £ (ajajl} (Zi - Zj)2
21< ,""'1=1
It is easy to ~ee that an unbia~ed e~timate of JI (2:.) is given t.y
_ I n {£(Ol) £(01) - £(aiaj)} 2
Est. V (t.) = 2n 2 i~i ··£(olal) - -.-- (ZI - ZI)
I 1
= Pi, + (I - Pi,) N _ 1 + (I - Pi, - Pi) Iv __: 2 + ...
= P; + (I - P;) N ~ 1 + {I - P, - (I - Pi) N ~ I}
1
x N _ '2 +,"
N-n n-I
= N - 1 Pi + IV --1 (167)
while
n-l n-l
= P; N . . :.:. I + PI IV _ 1 + (1 - P; - Pj)
(n - I) (n - 2)
x (N - if (N - 2)
n - 1 ... - n (P j
N n - 2}
+ P)I + ---- (168)
- N-I { N-2 N-2
APPENDIX
Tables of Sg (P, Q)/Xl! X2!."
--._--- , -
__________ :~__:_________ r
v
w= 1 w=3
Q Q Q
(I) (2) (1") (3) (21) (J3)
p (I) p (2) (3)
P (21)
(13) 3
w=4
(4)
(31)
P (21)
(21') 2
.. 3
(I ') 4 6
w=s
Q
(5) (41) (32) (3 JI) (211) (21 8) (15)
(5)
(41)
(32)
P (311) 2
(221) 2
(211) 3 4 3 3
(15) 5 10 10 IS 10
76 SAMPLING THEORY OF SURVEYS WITIl APPLICATIONS
.-
~
N-
;;;"
'-'
-
N
"""
=-
~
0 or,
<;>
.""" on ...,'"
C
.-.
=-
N
-D '-0 0
...,. S
N
C
,~
..""":.;: '<t 0 ...,
on
~ '-'
>:
01
__
...
- .,
Q..,~
'
r-
Ii
~
...,
01 ._,
..., ..., r- on
N
on
S
0>
Col ..C
::: N '<t '-0 0
N
0
r-
~
Col
~
.C) :::-
N
..., N ..., N 0\
...,on VI
~
0
~
-
::-
~
.-.
,.,
...,
..., ...,
""
..., S -
N
~
N on r- on ...,
on
,-.
N
._,
11'1
N ..., N ...,
"" on
-
N
,.-.
t::;.
. . -
~ :::-
._,
~
N' ...,
on
._,
.-
~
~ C
;;:: ,_,
._,
N
,-.
"'" ._ ,_,
~
.., .....N
Q..
- ....
~
~
~
;:::;
,_, ~ C
C .-
~
._,
N
;;;"
;:::;
M' ._,
~
"-'
Mst C TltEOR.Y
., · .................... .. _
-
._"
::' .....
~
::
~
-c
:;-
..
~
~
~
:..
N
C
..
~
:!
• .•••.......... _ . . lrl-V\O
- .....
..
~
~
•• _ •• _. C""'.V"';V')
-0
~
_... '"N
C
>< ..
~
• •..•.•••••• _ • • • M · 8~~~
.... n
C N
x
---
,-.,
~
.:-
N
OC ::!,
~~
'-"
01 ~
• ••.•••••• _ • • • • "<1"- • 0"<1"0\0
:t " - NV)
Ol ~
'0...,'"> ~
~
:!
N
~
,;;-
... •• - •.. ·M _ _ · .\OM-O\OV')oo
::£. - -N
~
• o __ • '-N-f'I"')-MMV')r-lr)V)
-,..,
...,
~
Q Q Q
(I) (2) (12) (3) (21) (J3:
P (I) P (2) (3)
P (21) -I
() 2) _I (F) 2 -3
w =4
Q
(4) (31) (22) (2P) (I ')
_ ...... __ ... __ . ___. __._---_._.- .._.- - - -
(4)
(31) -I
P (22) -I
2 -2 -I
-6 8 3 -6
w=5
Q
(5) (41) (32) (311) (2'1) (21')
(5)
(41) -1
(32) -I
P (311) 2 - 2 - 1
(221) 2 - 1 - 2
(211) -6 6 5 -3 -3
(1&) 24 -30 -20 20 15 -10
----------------
DASle THEoRY 79
Tables of TTl! TTl!. • • gs (P, Q)
w=6
Q
(6) (51) (42) (3 2) (411) (321) (2 8 ) (31 8) (2112) (21') (1 8)
(6)
(51) -
(42) - 1
(3 2) - 1
(4)8) 2 - 2 - 1
P (321) 2 -- I - I - 1
(2 8 ) 2 - 3
(31") - 6 6 3 2 - 3 - 3
(2 2 1.) - 6 4 5 2 - I 4 -I
(214) 24 -24 -18 - 8 12 20 3 -4 - 6
( ]6) -120 144 90 40 -90 -120 -15 40 45 -J.5
------~---- -------.~--. ~-.~- --.--.-~--.-.~--
SO SAMPLING THEORY OF SURVF.YS WITH APpLlCATlOM
-
;;-
' -'
::-
N
'-'
,-.
-
N
I
;- 0 on
N
'-'
~
!:' on 0
t-
C I
-
"""
&
..., on on
~
:;-
0 0
N
..., '" '" on N
"<t
'-'
I
,-.,
~
-..
,-.
::!.
"<t 0
N
;:
N
I
Q,.~
'-"
01 :::; M M 00 on
M
0
~ C I 1'"
"
I::
.....
it
-
.,_,...,.
,-.
(" 00
'" 0
"<t-
I
0
oo
N
~ '-' I '"
...
~ h'
~
M N "<t
..c ~ '" 0
~ I V)
'-'
N N M M
'" N 00 "<t
C()
"<t-
0
~ on
I
\0
,_,
N
'" "<t N "<t
N :!! ~ ~
00
I
,-.
N N "<t
t-
'-'
N N
'" '" '" "<t
N N
0
N ~
.....
BASIC THEORY !!I
Q
(8) (71) (62) (53) (42) (612) (521) (422) (431) (3 22) (5J8)
(8)
(71) -
(62)
(53) -
(4') -
(6]1) 2 2 -
(521) 2 -
(42') 2 2
(431) 2 - I -
(3'2) 2 - I 2
, (5]3) - 6 6 3 2 3 - 3
(421') - 6 4 3 2 2 - I - 2 - I -- 2
(}2(') - 6 4 4 2 - 4 -
(32'() - 6 2 4 4 2 - 2
(2') - 6 8 3 6
(4(') 24 - 24 - 12 8 - 6 12 12 3 8 - 4
(321 8 ) 24 - 18 - 12 - 14 - 6 6 9 3 12 5 -
(2111) 24 - 12 - 20 - 12 - 6 2 12 9 6 6
(31 6) - 120 120 60 64 30 - 60 - 60 - 15 70 - 20 20
(2'1') - 120 96 84 64 30 36 - 72 33 - 56 - 28 8
(21') 720 - 720 -480 -384 -180 360 504 180 420 160 -120
(18) -5040 5760 3360 2688 1260 -3360 -4032 -1260 -3360 -11:20 1344
82 SAMPLING THEORY OF SURVEYS WITH APPLICATION.;
(8)
,
(71)
(62)
(53)
(42)
(612)
(521)
(421)
(431)
(3 2 2)
P (513)
(4212)
(3 2 1")
(3221 )
(24)
(414) - 6
(32J3) - 3 - 3 - 3
(23 P) - 3 - 6 -I
(316) 30 20 IS - 5 - to
(2214) 30 12 32 3 - 1 8 - 6
STRATIFIED SAMPLING
A. SELECTION WITH EQUAL PROBABILITY
3a.l Introduction
We have seen that the preCISIOn of a sample estimate of the
population mean depends upon two factors: (1) the size of the
sam.ele, and (2) the variability or heterogeneity of the population.
Apart from the size of the sample, therefore, the only way of
increasing the precision of an estimate is to devise sampling
procedures which will effectively red~ce the heterogeneity. One
such procedure is known as the procedure Of stratIfied sampling.
It consists in dividing the population into k classes and drawing
random samples of known sizes, one each from the different
classes. The classes into which the population is divided are called
the strata and the process is termed the procedure of stratified
sampling as distinct from the procedure considered in the previous
chapters, called unrestricted or unstratified sampling. An example
of stratified sampling is furnished by the survey for estimating
the average yield of a crop per acre in which administrative areas
are taken as the strata and random samples of predetermined
numbers of fields are selected from each of the several strata. The
geographical proximity of fields within a stratum makes it more
homogeneous than the entire population and thus helps to increase
the precision of the estimate. In this chapter we shall consider
the theory applicable to the procedure of stratified sampling.
Stratified sampling is a common procedure in sample surveys.
The procedure ensures any desired representation in the sample
of all the strata in the population. In unstratified sampling, on
the other hand, adequate representation of all the strata cannot
always be ensured and indeed a sample may be so distributed among
the different strata that certain strata may be over-represented and
others under-represented. The procedure of stratified sampling
is thus intended to give a better cross-section of the population
than that of unstratified sampling. It follows that one would
84 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
k
.E nl = n
1=1
YN = ~L N;YNi
'=1
(I)
(2)
STRATIFIED SAMPLING 85
k t
= 1: p,2 E U'ni-_v N,)2+ 1: p,pjE {(Y"i-.l'N) (Vni-J'N)} (4)
i=l i-l'"j=1
-,
E (}"i -YNj
-)2 -
--
(In, - N,
I ) Si 2 (5)
where Si2 is the mean square of the population in the i-th stratum
defined by
(6)
The value of the second term in (4) is clearly zero, since samples
are selected independently from each stratum, We therefore have
k
-) -
V (Y. s - LJ (1iii
\' _N,I) PI 2S i
2
(7)
i=1
(8)
STRATIFIED SAMPLING 87
i=l
-t
+ terms independent of n, (10)
Clearly, V (Yw) is minimum for fixed C, or the cost C of a survey
is minimum for a fixed value of V (Yw), when each of the square
terms on the right-hand side of (10) is zero; or in other words,
when
(i = 1,2, ... , k) (11)
88 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(12)
1=1
Hence
P"SiCO (13)
;/(::(--.1;
,=1
P,siV C )-
i
(15)
k (16)
Vo +~ L:
i == 1
p,Sj2
Hence
k
(17)
. I \ , S')
l-o + N W Pi ,-
i~1
VlI + ~ L: j=1
P,SI2 ( 18)
so that the minimum sample required for estimating the mean with
fixed variance Vo is given by
II (19)
kL:
k
Vu + p,S,2
'=1
- NI1) p, 2S 2
, (20)
90 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
V (Y.,)N
=
k
.L:
{ j=l
L'P,S,
npi:'-;-
_ l}p,",
N1 '~i
i=l
=:1 C )'
~ PiSI
I
k
N Lp,S,2
1=1
Neyman allocation.
For proportional allocation of the sample among strata-
(23)
(24)
('''12;k PIS. )2
n = k
(25)
Vo +~ LPIS ,2
I-I
STRATIFIED SAMPLING 91
n = --~'=~I--k~--- (26)
Vo + ~ .z= p,S,2
1=1
V(ji.)us ~C - ~)S2 r)
For purposes of comparing (27) with (28), it is necess,ry to
express S2 in terms of Si2 •
Now the total sum of squares in the population can be split up
into two parts, viz., (0) within strata, and (h) between strata, in
accordance with the identity:
k N, k Nj
2.: 1:
;=1 .i=1
(Yo - jiN)2 == ,1'}.;
i=l j=1
{Y'l - _VNi + YNi - YN)2
k N, k
== ,1' 1,; (Y,j - jiNj)2
1=1 J-=l
+ l) N, (YN; -
4-1
jiN)2
(29)
92 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
or
(30)
(31 )
L
k
The expression shows that the more the strata differ in their
means, the larger is the gain in prec;ision due to proportional
sampling over unstratified simple random sampling.
(h) Neyman Allocation
Here, again, we shall first express the variance of the mean
under Neyman allocation in a form suitable for comparison with
the variance in unstratified simple random sampling. We shall
make usc of the identity:
(33)
where
k
S'D = 1: p,S,
'.1
STRATIFfED SAMPLING 93
(34)
i=l i=l
I: Pi
k
We see that the larger the differences between the strata standard
deviations, the larger is the gain in precision of optimum over
proportional allocation. Further, on substituting for V (Yw)p from
(31), we obtain
-
V(Y")N=
N -- n
Nn {s' . t p , (S'., _.1,)'
- /'_ n t
Since the first term on the right-hand side of (36) represents the
p, (S, - S.)'} (36)
+ N~ n t."'1
pj (S, - S.,)2j- (37)
V<Yn)US = e- ~) 8
2
We therefore have
k
V (Yn)US - V ll'w)s = G- ~) 8 2
- [
i=l
G. - ~) p,28,2
STRATIFIED SAMPLING 95
giving us (32). As the allocation departs from (40) or (41), the first
term may not only become negative but be larger in magnitude
than the second, thus making a stratified sample less efficient than
an unstratified sample. The result is important and suggests the
need for care in the allocation of the sample among the strata.
30.6* Practical Difficulties in Adopting the Neyman Method
of Allocation
There are certain limitations to the use of the Neyman alloca-
tion in practice which will now be pointed out. If more than one
character is to be estimated from a sample survey, then the
allocation of the sample into different strata on the basis of any
one character, using the Neyman method, may lead to loss in preci-
sion on other characters as compared to the method of propor-
tional allocation. If, however, the characters are correlated, or
if certain characters are more important than others, then gains
in precision on the estimates of the more important characters
can still be secured by using the Neyman method of allocation.
However, the more severe limitation on the ut>e of the Neyman
allocation is the absence of the knowledge of Si'S. One method
of overcoming this limitation is to estimate Si'S from a preliminary
sample of n' (Sukhatme, 1935). These estimates will, however,
96 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
1=1
= t{t
j=1
p,2S,2 + t PJlJS?
I""J=l
~:} - ~ t
'E1
p,S/
(44)
This expression involves Si and we are consequently not in a
position to say whether this will give a smaller value or not as
compared to that for proportional allocation, viz.,
(! - ~) L p,S,2
k
V (.Y~)p = (45)
.=1
• Where, as in this case, the decision regarding the size of the additional sam-
ple to be drawn from each stratum depends upon the results of the fir~t sample,
the procedure is essentially what is called sequential sampling.
STRATlfIED SAMPLING 97
We might, however, obtain the average value of (44) in samples
of n' and examine how it compares with (45).
Now it can be shown (Sukhatme, 1935) that if the variate under
study can be considered as normally distributed and consequently
Si is distributed as x, the average value of the right-hand side in
(44) is given approximately by
where
1
I} + 2n'
= 1 (47)
E { V (y.,)
-
I n')J
5j N =
I
n
k
~ [I, S ,
)2 -
Ik
,2
N ~ [liS.
(
k ~
= C- ~) I: i=L
Pi S j
2
- !L
i=l
p, (Sf -- ~tr
t
i=l
P/S/}
(4~)
which, on using (30), can also be written as
7
+ 2!n,{(t i=1
piSI )2 - t '(=1
P/S/} (&9)
98 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
The first part on the right-hand side denotes the variance of the
mean under Neyman allocation when the Si are known. Con-
sequently, when the Si are estimated from a preliminary sample
of size n', this variance is seen to increase, on an average, by
(50)
L:
k
or
(52)
and
where
S, = ~j (1 + ~j) (1 + ~j)-l
sJ SJ S. SJ
E (S,)
s/
'"" 8/8, (1 + C2) (53)
i=l
I:
k
(55)
100 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
•t From Sections 2a .10 and 2a.ll of the last chapter, we know that,
for samples of n', C2 is approximately given by (fJ2 - I)J4n', so
that the size of the preliminary sample should be such that
(57)
STRATIFIED SAMPLING 101
where Si2 is the mean square in the sample drawn from the i-th .
stratum. If the total sample had been selected by the procedure
of simple random sampling without stratification, then the variance
of the sample mean would be
(58)
1=1 i=l
Hence
= t
'=1
PJNj2 + .t N~~jn_,
i=l
p,S,! (64)
It follows that
k k k
1: pS'ft, = 1: PiYN; + 1: p,f',
(=1 4-=1 4=1
or
k
Y., = YN + 1: P,f'i
i=l
(66)
(6g)
Hence
(69)
Est. Lr,
,-)
k
(YN,-YN)2 = 1:1',
k
.-1
(Y,,;_}'.,)2 __ I},
k
'~-l
(I-p,) !V.~~,nl s,·
(70)
STRATIFIED SAMPLING 103
Est. S2 = N N
~J ~ \>( Pi - N
1) Si
2
+ N N_ If' >_;)2
I ~ Pi (J"i ) '"
On simplifying, we obtain
t=l
+
N-n
(N - J) n
{
L
;:1
- - 2
Pi (Yni - y",)
i=t i=l
{~(_
N - n
+ (N - I) n '::! _.
Pi Yn, - Y.;)-
-t P, (I ~ p,) f} (73)
2:
k
Est. V (Yw)p = N in ll
PiS/ (75)
1=1
and the first two terms in (74) vanish. The net reduction in
variance due to stratification is, therefore, given by the last two
terms in (74). On substituting ni/n for Pi, this takes the value
since
=8.,2
Let
k
}; 11; U'n. - .f'w)2 = (k - I) iis h 2 (78)
i:':1
The quantities sw 2 and flsb 2 are caned the mean squares within and
between strata respectively, and are best calculated from what
is familiarly known as the analysis of variance table given below:
10'" _ I
Total n-I .1' 1.;
1=1 J
(Y'I-Y")
106 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Est. V (5'w)p ~
s,/
n (80)
ESt. J? (Yn
- )
us - ES,t V (-) - k - 2 1 {-,
YID P - mb
2 _ 21
SID J (81)
n
The ratio of (81) to (80) gives the relative gain in precision due to
stratification and equals
(82)
(83)
The gain in precision estimated this way is n!(n-l) times the value
in (82), which is not likely to be c5f material difference in large
samples, provided the sample is allocated in proportion to the sizes
of the different strata. When the sample is not so allocated, neither
(82) nor (83) is likely to be satisfactory. The exact expression
given by the ratio of (73) to (57) should be used in that case.
30.8 Use of Strata Sizes for Improving the Precision of an
Unstratified Sample
Stratified sampling presupposes the knowledge of the strata
sizes as well as the availability of the lists of sampling units for
the different strata. The latter are not, however, always available.
Thus, the classification of a population by age is known from the
census tables although the lists of persons belonging to different
age groups may not be available for the selection of samples from
the different age groups. Consequently, it is not possible to know
in advance to which stratum a sampling unit belongs until it is
STRATIFIED SAMPLING 107
contacted in the course of the survey itself. While the sample in such
cases has necessarily to be selected by the method of unstratified
random sampling, we can always classify the selected sample by the
strata and treat it as if it were a stratified sample. In this section we
shall examine the gain in precision arising from such a treatment.
If the sample is to be treated as if it were a stratified sample,
then jiw would be the appropriate estimate of the popUlation mean.
This is easily seen to be an unbiased estimate of the population
mean, since
E (}'o.) = E {E (jill. I ni )}
= E(jiNI)
Hence
E tv.,) =}oN (84)
For fixed n1, n2, ... , n k, the variance of ji w is given by
(85)
E ( 1)
nl
~ 1
npt
+ 1_~
n PI
~j + 0 ( 14)
n
(86)
~~ {CI - Z- Dtp,s,·+~ts,.}
+ 0 (~,) (87)
108 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
It is seen that the first term in (88) is the variance of the mean
of a stratified sample with proportional allocation. We, therefore,
see that the adjustment of the results of an unstratified random
sample as if it were stratified gives almost as high precision as
a stratified sample with proportional allocation, provided the
sample is large. The result is obvious otherwise also, for, a large
sample is expected to be distributed in proportion to the sizes of
the strata.
3a.9 Effect of Increasing the Number of Strata on the Precision
of tbe Estimate
The variance of the estimate of the population mean from
a stratified sample depends upon
(i) the strata values of Pi and Si> and
(ii) the sample numbers ni.
We shall assume that ni is proportional to Pi> so that the variance
will now depend only on the strata values of Pi and Si. being
given by
(89)
The smaller the strata the more alike will presumably be the
sampling units comprising them and the smaller, therefore, will
be the values of Sill. We may, therefore, expect that under
proportional allocation the precision of the estimate will generally
increase as the number of strata increases.
For small departures from proportionality, the effect of increasing
the number of strata is best studied with the help of (88). The
first term in this equation, it will be noticed, is identical with
(89) and will presumably decrease as k increases. On the other
hand, the contribution of the second term to the variance of Yw
STRATIFIED SAMPLING 109
will increase as k increases. For N large and Si2 equal to, say
Sw 2, (88) may be written as
(92)
+ lOP!=1
]; p/p/ (y,,; _ YN) (Y"I - YNj) II ~1'}
• ,
Pk
(94)
For fixed p(s the second term is clearly zero, and we are left with
(95)
The mean square error will be the sum of (95) and the square of
the bias term in (93). We have
+ {t
'"'1
(p/ - Pc) YN,1
J
2
(96)
(97)
STRATIFIED SAMPLING 111
As n increases, (97) will decrease and so will the first term in (96)_
The bias term is, however, independent of the sample size. It
follows that (96) may assume a larger value than (97) beyond
a certain n, making stratified sampling less accurate than simple
random sampling.
An example will help to illustrate the point. Suppose that accord-
ing to an agricultural census taken in an earlier year, 80% of the
holdings were below 5 acres. Information for the current year
is not available but we will assume that the percentage of hold-
ings below 5 acres has increased to 85. Suppose, further, that
we have selected a stratified sample of 11 holdings allocated in
proportion to the known sizes of the two strata. Then, clearly,
the sample estimate of the population mean of the character
under study will be calculated from
Yw = '80j'ft, +- '20Yft, (98)
Stratum
2 2
Then,
M.S.E. (ji.,) = I
n
+ .0025 (104)
and
-)
V(Yn _1'1275 (105)
us - n
The table below gives the values of (104) and (105) for five dif-
ferent values of n.
25 ·0425 ·04510
50 '0225 ·02255
100 ·0125 ·01128
200 ·0075 ·00564
400 ·0050 '00282
It will be seen that for small n, the actual mean square error
of the stratified sample is smaller than that of the unstratified
simple random sample, but the superiority is lost after n = 51.
With a larger size of sample, the bias assumes still larger
proportions. It must, however, be pointed out that the bias will
not be known in practice and consequently the variance of the
mean of a stratified sample will continue to be estimated by the
first term in (96), thereby under-estimating the variance.
Case 11. Double Sampling
We shall now consider the case of double sampling where Pi'S
are estimated from a preliminary simple random sample and the
character under study is observed on a sub-sample selected from
the preliminary sample. let Q denote the size of the preliminary
STRA TIffED SAMPLING 113
sample, Qi the number of units in the i-th stratum and IIi the size
of the sub-sample chosen out of Qi (i = 1,2, ... , k). We have
Est. Pi = ~i
= P.'
and
1=1
k
The estimate E Pi'Yn, is now clearly the unbiased estimate of
(:1
( 106)
i=1
8
114 SAMPLING THEORY OF SURVEYS WrTH APPLICATIONS
Now from (71), (78) and (98) of Chapter II, Wij have
and
(1 J J)
v(r: (=1
k
'-)
PIYni
= r:k ( P..2 + N-Q
'=1
N - 1
p,(J -Pi»)N;-Il; S.2
Q N.Il.·
••
\ ' N - Q 1 (1
-f- W N - 1 Q Pi
) -, 2
- Pi hi ( 112)
1=1
(113)
+ t p, (Y .. - YN)'} (114)
(I16)
where
II;
Ub
2
= 1: p, (jiN, - YN)2 (I 17)
i-I
since the first term in (115) will be small relative to the second.
116 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(119)
n
The reduction in variance is, therefore, given by
(120)
where the letters 'ds' stand for double sampling and, as before,
_ k
Sw = .E PiSi' If Q is large relative to 11, the reduction in
'=1
variance will approximate to the difference between the variances
of an unstratified simple random sample and that of a stratified
sample with Neyman allocation (vide Section 3a.5). In other
words, when Q is large, the procedure of double sampling will
be approximately equivalent to stratified sampling when strata
sizes are accurately known.
This comparison of the precision of single with double sampling
regardless of the cost of the survey is not, however, of practical
STRATIFIED SAMPLING 117
N- Co
_
_ ~c~ 82 (123)
Co
N
v
.:....J
1=1
'i'.)
(r.k Pi. II, ._-~ yk,
.:....J
1=1
p/2S j
~j
2.,,_ UbQ2 (125)
Using the method given III the previous sections, we form the
function c/> given by
cf> = ~
~ 2
p. S,2
'n + uQ +
b
2
IL
(
c1Q
~
+ C2 ~ nj
)
(127)
i
+ y'p,c L
k
i=l
giving us
1 Co
(131)
YP- ( v' C1 O"b + \/C2 i!' PiS.)
Hence
p,Sj Co
(132)
VC2 ~-v'C2 i~ p,S,)
i
n = ( v'C10"b
and
O"b Co
Q = VC; - (133)
( v'C-;O"b + v'C2 i~ PiS.)
Substituting for Q and ni in the expression for the variance given
10(125), we have
(134)
;=1
i.e., if
(135)
120 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
where (S -i PiSiY jb V 2
will usually be a positive proper fraction
for efficient systems of stratification.
so that
S2 = 5
Example 3.1
TABLE 3.1
Strata Means and Standard Deviations of Areas for Vii/ages
in Ghaziabad Subdivision
Size of
Stratum Village in Nj )'Ni Soc; So;
Number Bighas*
• 1 Bigha = i acre.
k )2]
(,EN-
~Ni
k k k
= I __ . [\' NS
N - I LJ
.2_
,to,
\'S
LJ to!
2+ \ '
LJ
N.V N .2
v,
_ 1=1__
N
i=l (:1. iCl
= 70850
122 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
N
s
rJl
;i;rIl
I _
0
N
::::'-
~
on
8 $
0 \0
0
f'I
M
M
§
N
cJj _, ~ M
0
;:;::
0
M
\0
~
on
0
00
~
"i: 00
'"""
"<t
on 0
~ <? ':'l
S -0
l 0\
("
l-
\0
N
I-
N
N
00
00-
S
-.
0
on ~ ~ ....
0
§
rJl
"i: € on
M M
00
00-
0
00- 'I'>
N '<t
""'l
~
~ "i 80:: § ~ §
I
§r--
~ '~
00'
'-'
l-
on
N
M 0
on
N
"'"
0
r--
.5 ~
'" \0 M
N '"'r,
or;
~
II::
.~ i 0 <:>
~
0 0
N .... ,~
~
S \0
0
\0
0
or,
00
or;
<:>
.....
~
t- 0- 01')
or, N N
!"'l
--'-~~
I.Ll
...l
QQ r--
.....i
r!
~
G'
'-'
--
N -0
......
N
...00
on
$00-
....
-II::
. 0
~ § ~ ~
~
'c- ~
,-
~ ~
II::
.:a.... ~ N
00-
\0
M
00 N ~
r--
N
'"
.st
;:s
~
a
...
rJl
-; ~ -
0
......
M
<:>
'" 8 Sf
'"'" '""" on~
M
@
00
00
-:
rIl
;; ,-.
C ..
":'
\0
,
-
~
<?
\0
00
":'
~
M
Hence
N-n 1
V(ji,.)us = - ir- . n S2
306
340 x-j4 x 70850
= 1875
306
x 7994000
(340)2 x 34
= 622
2. (ii) Neyman Allocation
The allocation of the sample to the different strata will be in
proportion to NiSWi shown in col. 9 of Table 3.2. On substituting
in (22), we get
= 460
2. (iii) Allocation Proportional to NiSai
The allocation of the sample will be proportional to NiS a;
given in col. II of Table 3.2. On substituting
n, nN,S.,
= ._._-_-
(1-,f NISal)
in the formula for the variance of the mean of a stratified sample
given in (7), we get
124 SAMPLING THEORY OF SURVEYS WITH APPLICATIO~S
1
115600 X 55840000
483
Now the relative efficiency of any given procedure of sampling
(B) compared to that of a standard procedure (A) for the same size
of sample is defined as the ratio of the inverse of their variances.
Thus
R.E.
VB VA
= -1-
VB
VA
The values of the variance obtained above for the different
procedures of sampling together with those of their relative effici-
encies as compared with unstratified simple random sampling, are
given in Table 3.3. The valu es of the relative efficiencies com-
pared to proportional sampling are also shown in the table.
TABLE 3.3
Relative Efficiencies 0/ Different Methods
0/ Stratified Sampling
R.E. compared R.F. cc mpmcd
Method of Sampling Var:ar.ce to Unstrat.fied to Proporticr.al
(Bighas)2 Sampling
Sampling
Stratified:
--.---.--~--."-
._------_.. _---_
STRATIFIED SAMPLING 125
TABLE 3.4
Crop-Cutting Experiments on Paddy, Kegalle District (Ceylon),
Maha 1951-52
Means and Mean Squares of Village Mean Yields per Plot
= 298·23 - 7·57
= 290·66
379
- 291
= 1·30 or 130%
STRATIFIED SAMPLING 127
TABLE 3.5
Crop-Cutting Experiments on Paddy, Kegalle District,
Maha 1951-52
Calculation of thl! District M(!an Yield and its Variallce
"I
I
Zn, = II,
-
1: j
Z;J
"I
1
= __ o.
n j 1: YII
N~ P~I
(136)
N·
ii' = .E' PljZ il = .)iN; (139)
1"'1
and
k
Z.. = 1} P, i ,. =.)iN (140)
i., = ~ L:
1=1
N,i"j
(141)
(142)
STRATIFIED SAMPLING 129
=
k
E {J;Pi(Zn,-Zi.)
}2
i=l
k k
l: p.2 E (Zn; - Z .. )2 + l: PiPi' E (Z.i -- Zi. )
i=l i-0.i'=l
xE (Zn,' - =.'.)
since samples are drawn independently from the i-th and the i' -th
strata. Hence,
k
in virtue of (137).
Using Section 2h.3, an estimate of V (zw) \\ill be provided by
k
11.
2
(144)
where Siz 2 denotes the mean square between z's in the sample for
the i-th stratum, and is defined by
Siz 2 -- 1 ..
n,- I
L:n, ( Zil _ ' ;....
' ), 2
maximize the precision for given cost or minimize the cost for
given precision. We shall illustrate the principle for the simple
case for which the cost of the survey is represented by
(145)
(t
1=1
Cin,)
where IL is a constant.
Clearly V (zw) is minimum for fixed cost, say Co, or C is
minimum for fixed variance, say Vo, when rP is a minimum. Now
4> can be written as
¢ = i (146)
i=l '=1
(148)
n, = -- P,u.. Co
k - --- - (149)
vc, (};'=1 p,'1,. vc;)
which is seen to be identical in form with (13) in Section 3a. 3.
STRATIFIED SAMPLING 131
n, = n (150)
( 152)
V (z .. )p =--" 1
11
[{t Pla.. }
i=1
2 + .t
i=l
PI (C-•• - a",zFJ ( 154)
where
k
{T",. = .E pja•• (155)
.=1
It follows that the efficiency of optimum over proportional alloca-
tion is dl!e wholly to the yariation among the ~trata stH.G<:J d
132 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
deviations. If th! O"iz are all equal, the two systems of allocation
become equally efficient.
3b.4 Comparison of Stratified with Unstratified Sampling
If a sample of n is selected as an unstratified sample with
replacement with selection probabilities PL (l = I, 2, ... , N), then
we have seen that
zn - -
n Lz,
"
n
L n
y,
NP I
(156)
and
N
Z.. = 1: P,ZI =YN (159)
1'''1
PI} =
1:P,
:1 (j = 1,2, ... , N,) (160)
Nj
where 1: denotes summation over the Ni units in the i-th stratum,
then
k
ZID =E p,!", (161)
'''1
provides an unbiased estimate of YN' with its sampling variance
given by
LP/ ~~~
k
V(!.,) = (162)
t&l
STRATIFIED SAMPLING 133
where
N,
CTiz
2
= }} P il (ZiJ - 2 •. )2 (163)
/=1
(165)
Also
Z,
P.
= Pi. ZII (166)
L
N
CT.
2
= P, (Z, - Z.. )2
1=1
'=1 J=l
k k
\ ' ,e(
Ll Pi. "
a 2 \' lPi' (!!i
+L P
ZI _
.
Z
..
)2 (167)
i.
•-=1 '-I
134 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(168)
(169)
or
(170)
VUS - VP == I
11
1:
k
p I. (P.
p. Z- ..
,.
-- Z- .. )2 (171)
I
n
{~P.
LJ"
i ....
t
( Piai.
Pi.
.=1
+ nI {
L k P
i. (
P, Z
Pi. i. -
_
z.') 2}
(172)
1:
k
{VUS - VS}(Pi.=Pi) =
i=J
+~ L pdt•. - z.Y
i=l
(173)
( 174)
whence
(176)
136 SAMPLING THEORY OF SURVEYS WITH APPI.ICATlON~
whence
Est.
{
~
k
p~:,
2- 2
- Z,,2}
,'<,
L
k
Est. {Vus - Vs } =
i=l
It.
Ie
p,2 zn / _
+1
n Pi,
Zoo
-'J n L Pi~;;Z:_ (~i. -
'=1
1)
(179)
When Pi, = Pi. we get
k
Est. {V us - VShPi. = po = ! I:
1=1
(180)
REFERENCES
I. Neyman, J. (1934) "On the Two Different Aspects of the Representa-
tive Method: The Method of Stratified Sampling
and the Method of Purposive Selection," Jour.
Roy. Statist. Soc., 97, 558-606.
2. Tschuprow, A. A. (1923) "On the Mathematical Expectation of the Moments
of Frequency Distributions in the Case of
Correlated Observations," Metroll, 2, No.4.
3. Sukhatm~, P. V. (1935) . .. "Contribution to the Theory of the Representative
Method," Jour. Roy. Statist. Soc. Suppl., 2,
253-68.
4. Evans, W. D. (1951) "On Stratification and Optimum Allocations,"
Jour. Amer. Statist. Assoc., 46, 95··104.
5. Neyman, J. (1938) "Contribution to the Theory of Sampling Human
Populations," Jour. Amer. Statist. Assoc., 33,
101-16.
Ii. Koshal, R. S. (1953) "Report to the Government of Ce)lon on Sample
Survey of Rice," E.T.A.P., Food and Agricul-
ture Organisation of the United Nafions,
Rome.
CHAPTER IV
40.1 Introduction
In developing the theory of simple random sampling in the
preceding chapters, we have considered only estimates based on
simple arithmetic means of the observed values in the sample.
In this and the next chapter, we shall consider other methods of
estimation which make use of the ancillary information and which,
under certain conditions, give more reliable estimates of the
population values than those based on the simple averages. Two
of these methods are of particular importance. They are: (i) the
ratio method of estimation, and (ii) the regression method of
estimation. In this chapter we shall consider the former.
Let
Let
so that
(3)
where
N-n S/
N n (4)
Similarly, let
so that
(5)
where
(6)
(7)
We shall now suppose that the x's are positive and n is suffici·
~nt1y
large, so that
RATIO METHOD OF ESTIMATION 141
Expanding* then
(1 + ?)-l
as a series in € n', we have
, ,
ill + -
En
'2
E.€ ..
XN ' XN 2
YNX N
in '3 in '4 in in '3 )
XN3+ XN 4 YN XN3
+ ... J (8)
(9)
Hence
N - n x~-n) < I
I N xN
Expanding
(I _l'!_;; n x;;;t
by Taylor's theorem and working out expectation term by term. he reac~ed the
same expression for the expected value of R" as gi\en in this and the next
section.
142 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Now
[(t,) (t . . )
- tfiifit'}
'=1
N--n 1
Nn N-J
N -n J
- --_ pS S (lO)
N nil.
(12)
where
RATIO METHOD OF ESTIMATION 143
and
i.e.,
{3 =RN (15)
It follows that
E(RJdE(~ln
= E c{3x. 1
1. x.. r
={3
=RN
(16)
Appendix to this chapter. Using then (6), (10) and the formulre
in the Appendix, and writing in terms of the moment notation
given by
i=l
+ (N - n) (N2 - 6Nn N + +
6n 2)
(N - I) (N - 2) (N - 3) n3
X (/L04 _ fLI3 )
fLOI 4 fLlOfLOl 3
N(N-n)(N-n-l) 3(n-l)
+ (N -
I) (N - 2) (N =--3) n3
We then have
E2 (Rn) = RN [I + ~ (fL022 -
fLOI
fLU __ )
fLICfLoi
+ 3 fLo22
n 2 fLoi
10
146 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
81 = Bl (1 + ! C. 2 ) (19)
Equation (19) shows that the contribution of the third and fourth
degree terms to the relative bias of a ratio estimate is 3Cx 2 /n
times the value of the latter to a first approximation. Unless
n is small, the contribution can, therefore, be considered to be
negligible. G. R. Ayachit (1953) has assessed the value of contri-
butions to the bias from successive approximations by means of
experimental sampling on a wide range of populations commonly
met with in surveys, and found that the contribution of higher
order terms is negligible. For appreciably large n, say 30 or
larger, even the leading term is found to be of no consequence.
(21 )
(22)
V1(R.) = RN I _ --n
N -1 ( C I
N- . . n ' + C•2 - 2pC,C.) (23)
RATIO METHOD OF ESTIMATION 147
or
(24)
When
(25)
(27)
= N (N_n - n) {s1/
2 + RN2S. 2 _ SS }
2R NP.,
= N2 {V (y,,) + RN 2
V (i.) - 2RN COy (Y.. , i,,)} (28)
and
(31)
(32)
and
(33)
RATIO METHOD OF ESTIMATION 149
Hence
or simply
= N; V (y I i)
and
k
V I (YR ) = N (N -- n) I " N V( I )
N __ I . n LJ i Yo , x, (35)
i==1
and
(c) V Cv I x) oc X2 say yx2, or V GIx) = y (36 c)
VI (Y R )
N-n y N2 (N - n) y
(a) N - 1 . nXN 2 (37 a)
N -I n
N-n y__ N2 (N - n) y_
. --. -.----. XN (37 b)
(b) N - 1 nXN N -1 n
N (N - n)
(37 c)
N -1
Est. VI (:n)N
= NN- n (~~:
n Yn
+ ~~_: _ 2S~.)
Xn .f"x,.
(38)
(40)
RATIO METHOD OF ESTIMATION 151
and
(41)
The reader will note that these are biased estimates but the bias
will be negligible if the coefficients of variation of y and x are small.
One special case of a ratio estimate, for which the estimated
variance takes a particularly simple form, may be mentioned. It
is the case of a weighted mean in which the weights are in the
nature of ancillary information, varying from one sampling unit
to another, with Yi of the form Willi and Xi = wi, so that
"
R n =.;;-,I/) = _l:_ItIi';TJ· (42)
1: IV,
The sample estimate of the variance for this case is given by
Est. VI (~.. ) =
N-n
Nn . (n _~ 1) L:" »,,2 (7], - ~..)2 (43)
and
(44)
(45)
(47)
X ( 1-'22
., 2
__ 21-'13
3
+ /'04)
4
1-'10 -1-'01 I-'lOfl'Ol /Lei
+ 3 (n - I) N (N __ nL<N~ n ---:!)
n 3 (N - I) (N - 2) (N - 3)
2
X (1-'2flIl 02 + 21-'u
2 2
_ 6}'U/L02a+ ~1-'o2.2) (48)
1'10 1-'-01 J.LIOJLol JLol ..
so that
= V(Rn)
RNI
+ 3 C. 2
n {V I (:;) +- ~ (C. - pC.)2}
(49)
Now
E (Rn - RN)2 = E [R. - E (R,,)]2 + [E (R.) - RN]2
Hence, deducting from (50) the square of the relative bias term
given by (13), we obtain
V2 ( R:R) = 2C2 C4
---fj-- + n2
(l - p)
{6 (1 - p) + 5 (I - p)2} (53)
(54)
(58)
or
k
}; (n)'i - N;) Xi = 0 (59)
k
= };n,zA.z V(y", I Xi) (60)
i-I
V (y I x == x,) = )IX,
156 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
i.e.,
where Si2 is the mean square for the i-th class. Hence
(61 )
(62)
(63)
Clearly, V (YR I nl> nz, ... , nk) is minimum when each of the
square terms on the right-hand side of (64) is zero, or in other
words, when
jml
whence
(i = I, 2. """. k) (66)
(67)
and
(68)
f' nx. I (N,=
'-.J j
N -
I)
nj
(69)
and
(70)
showing that the ratio estimate under the conditions stated in the
beginning of this section gives the best unbiased linear estimate,
provided Nt'S are large.
IS8 SAMPUNG THEORY OF SURVEYS WITH APPLICATIONS
When, however, Ni'S are not large, the estimate Y R will be only
approximately given by (69) and the effect on the variance of Y R
will be to multiply (70) by the usual finite multiplier (N - n)jN.
For, estimating Ni from the sample, we have
and hence
N,-l N
Nj =--,,; :::: j{=1i (71)
and
(73)
We notice that the variance depends upon the set of x's which
happen to turn up in the sample. In repeated samples, to a first
approximation, the average value of (73) is given by
(75)
1_ (C 2 -
Vn. 2pC C
• r + C 2)~II
R" _ t
R~ Ca,oo)
}__ (C 2
Vn· - +•
2pC C
•• C ~)l:s:::: I ~;::
"" --, _Rn
RN
(77)
S. I = ~
n
(S/ - 2RN pS.s. + R N 2S. I ) (78)
162 SAMPLING THeoRY OF SURVEYS WITH APPLICATIONS
We have worked out ill Table 4.1 the ratios of YN, to xN , and
of V (y I i) to XN, for the different classes. The ratio of YN , to
xN , will be seen to be fairly constant, showing that the relationship
between y and x is approximately linear. V (y I i) also appears to
vary as x up to 1000 acres but not beyond it. It has, however, to
be observed that the coefficient of correlation for the last class,
namely, with villages having area larger than 1000 acres, is rather
large and the calculated value V (y I i) cannot possibly give for
this class a correct idea of the variance of y about the line
y = RNx. On the other hand, any further division of this class
to study the behaviour of the variance of y with x is also not
feasible owing to the fact that the number of observations is few.
As about 35% of the livestock population is accounted for by
villages with agricultural area larger than 1000 acres, it appears
advisable to study separately areas less than 1000 acres and those
with larger acreage. The ratio estimate may be used to provide
an efficient estimate of the livestock population for the first group
comprising all the villages in the first six classes.
Example 4.2
Calculate for a sample of 64 villages, the sampling variance of
the estimate of the total livestock population for villages with area
less than 1000 acres based on (a) the simple arithmetic mean,
and (b) the ratio method. l-Ience calculate the relative efficiency
of the latter as compared with the former.
For obtaining the variance of the estimate based on the simple
arithmetic mean, we need the value of V (y) based on all the
319 observations (N) with area below 1000 acres. This is given
in col. 8 of Table 4.1.
Substituting in the formula for the variance of the estimated
total based on the simple arithmetic mean, we obtain
_ N-n V(v)
V (Ny..) = JV2 • N":'1 • -,:--
and, therefore,
Also
V(x) = 39528
V(y) = 8292
and
P y'V(x): V(y) = 12378
Substituting these values in the expression for the variance of the
ratio estimate of the total livestock population given above, we
obtain
N (N - n) X ! x J{ ~ 1: 1
i:C:l
Nt V (y I i)
1
= 319 x 255 x 64 x 4602
= 5849000
This value is only slightly larger than the one calculated aho"e,
as one would expect owing to the small values of p within the
classes and the close linear relationship between y and x.
It will be seen that the variance of the simple arithmetic mean
estimate of the total livestock population exceeds the variance of
the ratio estimate by 88%. showing thereby that the latter is 88%
more efficient than the former. This large gain in efficiency of
the ratio estimate is to be expected in view of the high correlation
between the agricultural area in a village and the number of
livestock in it.
lUno METHOD OF EmMAnON 165
'0
",t::;l
- -
'0
~
1t '-'._
.,0.0
.:s;:_~
UCu
.-.
e :2; "! c:-
'" ~
VI
-
VI
0-
co
...,.
10
N
'"
g
VI
N
00
00
'"
"?
0
8
§
'-' C/l.-."'g .~
~
... 1t~.5
"'0.0
.-.
.....,
00
-'" '"
0- ":'
r-
"! 00
N
N
0-
r-
'"
00
"0
....
r:!
-
VI N
~ .:s""'s
U~o
10 '" 0- co 10 U
-s....
"""u '" 0
-
.0
;:s ...,.
......
;:s 8 .-.
t;,
...,.
VI 9 .;.
VI
on
~oo on
VI
10 "!
....
10
'"'
<IS
:a
e
M
~
N
.;: ~ ...,.
M
M
r- 0-
...
eo ....
0- on
6 0
~ ~
§ on 00
":' ::s:
<:II
-l:: '7 € ....
0- ":' "! 00
N
0\
10
.... <IS
...... r-
10
r-
;
N
0\
00 ""M 6 :! ~
'"I:s f6 0
§ v~
,-... 810 0 .~
-
r- on 0-
;:.. ~ ~ 00 N
N ....0
r- r-
- ,~
~
' - ' .~ I
~ 6on r- 0:: '<1' N
~ .~ ~ ...,.
0\ N on N
0
6 3
III
;;..
C
~
......
._~
"> .J:J
;:s
0
0 co
""Vi~
v:, .... 0 "! N r- VI
on
N
on "?
- ...
00
~ 10 0
* "'-l
I
~
.....
0\
0\
r-
~
on N
";,.
6 6
0 ',:j
";:.."§
...
M
~"
-
~
~
~
~
~
i e-
~
~ :E ...:
::1,<:;
§o
c.-
'S
<:II
~ ..
';;':
c:
~
<IS
'- .. ..."
';;': c:. oS
eoo&!
~
Z'"
i:'
",0
-;~';;'
t .. e ~ ; ~
.....,
8-
oS . . .. .::: ~
~
~~< " ~" it en
!::
§ .Eel .
~ -; '-0
en
i'1:i ~I ~ i 81:I~
v:, "'''''
'" - co
..!!! E
~ t :ii ii:.t i ~ '8_
..!!!e..!!! ....0 B e '10(
!~
.0
u'i~ 'i c::> 'i ~
.-. ';0..
::..
--
<
I
l. !
'"c:
'"
II
:l
c
~
::..
Q
b
::.. Q:
i
III; :t
H •
:I
!
166 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
= .Ek f R n , x 2,'
j
NI} X Ii
t=l \. 1-=1
k
=-= E R., X N,x N , (84)
'=1
Now
k
E (Y B) = 2,' N,XN,E (R. t ) (85)
1=1
t=l
= N
L: p, (N, - n,) (S
n,
. _ hi
2..L R 2S
I Nt h
2 - 2RNtP,~StillSI.) (89)
(90)
1: PrY.,
'-1 _____ _
~
(91)
1:
1=1
PIX.!
and denote this ratio by Rn. and the estimate of the population
total by YR. in order to distinguish them from the corresponding
168 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
k k
E Pt.l'nt =YN
t=l
+- £11 and J; p/x llt
t=1
00-= iN +- £ n' (92)
where
t=1 t=l
(94)
Then to a first approximation
(95)
(96)
.
To have an idea as to how rapidly the bias diminishes with the
size of the sample, we will suppose that nt is proportional to Nt
and Stx, Sty and Pt are constant. The relative bias will then be
seen to be given by
N -n I
- -N- --- n
- (C·2 - pC C )
, , (97)
It follows that even when the size of the sample within each
stratum is small, a combined ratio estimate can give a satisfactory
estimate of the population total provided the total sample IS
sufficiently large.
To work out the sampling variance of the combined ratio, we
have, to a first approximation
RATIO METHOD OF ESTIMATION 169
k
8,/
+[ N,N-,n,n,
t=1
. p,
2
.
.
X., 2
N
'!~_~ n,
N,n,
N, - n,. p, fS
(' I.2 TL R N2S I»2 -
S 8 ,_}
2RNP,,,
n,
t=1
(~8)
=L t=l
It will be seen that (99) depends upon the magnitude of the varia-
tion between the strata ratios and the value of
(R N ,S,z2 -- P,S"S,x)
Let
N,-n,
(100)
n,
(101)
Example 4.3
From the livestock data referred to earlier, it is proposed to
draw a stratified random sample of 73 villages (amounting to 20~~
of the total) and to estimate the total livi!stock population for
the entire subdivision. The villages having agricultural area up to
1000 acres constitute the first stratum and the remaining villages
the second stratum. Calculate the optimum allocation of the
villages between the two strata if the method of estimation to be
adopted is (i) ratio method with a common ratio for both strata,
(ii) ratio method with separate ratios for the two strata, and
(iii) simple estimation within each stratum.
Also calculate the sampling variance of the estimated total by
each of the above methods and hence compare their efficiencies.
The relevant calculations for each method step by step are
presented in Tables 4.2, 4.3 and 4.4. The tables are self-
explanatory. The results are tabulated below:
- - - - - - - - - - - - - - - - - - - - - - _ . _ - . - - - - - _.._-----_.-
Number of Villages
Method of Estimation in the Sample Sampling Efficiency
- - - - - - - - - . Variance
Stratum 1 Stratum 2
4.2 TABLE
Stratum I Stratum 2
Agricultural Agricultural
Area Area
0-1000 Acres > 1000 Acres
(1) N, 319 45
(2) V(YI;) 8292 58402
(10) n,· 54 19
(II) N,(N, - n,) 84535 1170
= 6963000 + 1744000
= 8707000
., S.£. = 2951
Y = (151'9) (364)
= 55300
Stratum 1 Stratum 2
Agricultural Agricuit ura I
Area Area
0-1000 Acres > 1000 Acres
(I) N, 319 45
.. S.E. =2948
Y -= (lSI '9) (364)
= 55300
2948
%S.£. =55300.100 = 5·3
• The steps leading to this figure are reproduced from example 4.2.
t Obtained by distributing the total sample in proportion to N,S,,' shown in
row (11).
174 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
TABLE 4.4
Stratified Random Sampling with Simple Estimate
for Each Stratum
(Adopting Optimum Allocation Between the Strata)
Stratum 1 Stratum 2
Agricultural Agricultural
Area Area
0-1000 Acres > 1000 Acres
(I) N, 319 45
V(YR) = 16677000
Y = 55300
S.E. -= 4084
% S.E. = 5~~ .100 = 7'4
and
(104)
.
.E y, III
(105)
R" . 112
.E x,
and
N
.E Yi NJ
RN
i=l (106)
N N2
.E Xi
1=1
(107)
176 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
N-I
Np - Np2
N-1
- - -N- _ . p . q
-- N - 1 (l08)
Similarly,
(109)
and
N
S/= N __::_ 1 pq (110)
Lastly,
N
1: XiYi -NXNYN
pS.Sv = j=L N _ 1
_ 0 -- Npq
- N-l (Ill)
E G:) Z: {I -r-
coc N N~ II (N ~
1. ~ + N ~ I)}
Nl
N2
{I + NN-- nI . 1(Pq + I)}
II
Nl
N2
{I + N-:- n
N -1
. I}
nq
(112)
v
1
(Rn)
RN
= N-
N - 1
II • J (pq
n
+ P f-
q
2)
N-n
N-J
N-n
(113)
N - 1 npq
N (117)
8 ,.2 = N :._ 1 PI (I - PI)
J2
118 SAMPLlNG THEORY OF SURVEYS WITH APPLICATIONS
and
(118)
so that
- . / P,PI (119)
P= V(I--p,)(I-PI)
(120)
It will be seen that the formula is identical with (112) except that
q is now replaced by Pj. •
To obtain the variance, we substitute from (115) to (118) in
(24) and obtain
and
Xi
E (zn) = iN E (1\) = XN
L
N
n P, (z. - S·N)2
i=l
I
= n PC1.C1.
ISO SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
El (Rn) = RN {I + n1(~.22
xN
_ YNX
~(T~(T.)}
N
(122)
(123)
(124)
or
(125)
It follows that
VI (YR ) = ~I f.t
L=l
Pj (Zj - RNVj'f} (126)
or
(127)
RATIO METHOD OF ESTIMA nON 181
"
Est. VI (R,,) = 1 _12 _1 " (z _ R V )2 (128)
n V" n - 1 W i " I
and
(129)
z.
VI =
and
=NYN
showing that the ratio estimate for this case provides an unbiased
estimate of the population total. Further, from (126), we have
TABLE 4.5
Values of Total Cultivated Area and of Area under Wheat
in Two Consecutive Years for a Sample of 34 Villages
in Lucknow Subdivision
TABLE 4. 5-(Contd.)
Let ai denote the cultivated area in the i·th village and Xi and
Yi the areas under wheat for the years 1936 and 1937 respectively.
Then Pi the selection probability for the i-th village is given by
N
Pi = ai/A, where A = 1: ai = 78019.
'=1
Also
Zi and
f I, = 8691 = 0.9138
1: 1/ 9511
YR = Rn X
= (0·9138) (21288)
= 19453 acres
\~
(/
.L.J, _ R ,.iI ')2 = ~,
;.. I 2 - 2R ,,~,'
;, II' +R 2 ;
n ~
1'2
l
= 165082
or
i: (Zi - A
R.V)2 ..c= ( l000N )2 L'• (I, - R.l,')2
= (0·45894)2 (165082)
= 34770
Hence
(170)2
= (34) (33) • 34770
= 895600 acres 2
If the information for the previous year had not been used,
the estimate of the area under wheat in 1937 would have been
N
= (0,45894)
n
(0,45894) (170) (8691)
~- 34
= 19943 acres
and
(170)2 2
(34) (33) (0,45894) (2505925 - 2221573)
0
¥3 (0,21063) (284352)
1542700 acresl
or
= 72,3%
REFERENCES
I. David, F. N. and "Extension of the Markoff Theorem on Lcast
Neyman, J. (1938) Squares," Statist. Res, Mem., 2, 105-16.
2, Cochran, W. G. (1940) .. "The Estimation of the Yields of Cereal Experi-
ments by Sampling for the Ratio of Grain to
Total Produce," 101lr. Atr. Sci., 30, 262-7',
RATIO METHOD OF ESTIMATION 187
APPENDIX
where
and consequently
Let
N
1: €,G€/a. = Np.G, a. (1)
j
(2)
Also
(4)
(N - n) (N - 2n) fLI2
= (N _:-l)(N ="-2)' it2 (5)
It follows that
(N - n) (N - 2n) fL03
= (N~ i) (N- 2) n2
(6)
Similarly
E (i,,«,;1) = E [(i:) (i"i,,'I)]
Substituting from (4), we write
= n'
3 (n - I) (N - n - 1) J
+ (N - I) (N - 2) (N - 3) Nl-'lll-'02 (8)
It follows that
3 (II -- I) (N - n -- I) .1 (10)
+ (N-I)(N-2)(N-3) Nf'OI J
Lastly
14 E ["
.E, Ej 2'2
Ej +.E ft
(2£,E;E, '2 + £, 2'2
EJ + 2£, 2'"" £1
n i=/-J
+ 2E,EjE; ~/) + i
'<FJ=/-k-F1
£'£1£/ £t' + i:
'-!<i+t
(£,E k£/2 + 2£,£tE:£/ + £,2£/£.' +2E1",£,'".')J
(I)
where
and
E «(jj2 I i) ,.'" constant, say, y (2)
k
Y ==" Na + P 1: N,x,
1=1
(3)
When a and f3 are estimated from the sample and the population
total of x is known, the right-hand side of (3) provides an
estimate of the total. This estimate is known as the simple
regression estimate. To distinguish it from the estimate of the
population total based on the ratio and the simple mean methods,
it will be denoted by Yz, and the mean by )il'
13
194 SAMPLING ntEORY OF SURVEYS WITH APpLIcATIONS
L:
1_1
(n,", - N,) (a + (3x,) = 0 (5)
where
=y (6)
Hence
(7)
REGRESSION METHOD OF E'iTIMA TloN 195
To mmlmlze (7) subject to the condition (5), we shall use
the Lagrangian method of multipliers and form a function c/>
given by
k 1.0
</> = 'Y :E
i=l
n/Il - p. :E
i=l
(n,A, -- N,) (a + {3x,) (8)
04> = - p. :E
k
(njA, - N,) =0 (10)
()a ,~[
and
( II)
where
" (a'
'Y + Wx ) = N
n
(14)
and
(1 S)
196 SAMPLI"I'G THEORY OF SURVEYS WITH APPLICATIONS
On substituting for fi' and a' from (16) and (17) in (12), we
obtain
(Xi - Xn) 1f (.
1= ,
I 2
, ... ,
k
) (18)
(19)
V(Y) =
I
N2 Y {I n +
L
k
(X N -Xn)2}
I1j (Xi -
~
Xn)
(22)
i=l
(23)
and
~ Ni
E E )';J (x. - XN )
f3 = '=1 k J0:..1
E N j (x, - XN)2
i-I
or
N
= EX(X -xN )
N
1: (x - XN)2
198 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(27)
(28)
and calculate its conditional expectation for fixed nb n2' ...• nk.
This is best done by expressing Q as a function of the ~ 's, which
are defined by (1).
From (1) we have
(30)
whence
t k
1.: n,ji", = na
i;:::l
+ {3nx" + }; n,E .. ,
'_1
i.e.,
a -t BXn + n
(31)
Also
I.
}; n,f", (x, _- Xn)
p= i;;;::1
k
1.: n, (x, -- Xn)2
1~1
k
1.: n, (x, -- x.f
'=1
k k
f3 1.: n,x, (X, - .in) E n, En, (X, - .in)
k
+ '=1 k
1.: n, (X, - .i.)2 }; n, (X, _- Xn)2
,., ,.,
k
}; n,E n , (X, - X.)
f3 + '&~ (32)
}; n, (x, - .i.)2
200 S4.'tfPLlNG THEORY OF SURVEYS WITH APPLICATIONS
(33)
k
Setting}; ni (Xi - xn)2 = nm2 and expanding the right-hand side,
i=l
we obtain
i==1
Ek
11/£ n,
)
,.'::1
~~-----
II
k
- (x; - xn) }; n, (x, - x.) in,
x '=1
I }~
1 .f n,£ n,
k _
(X, -
_
X.)
nm2
REGRESSION METHOD OF ESTIMATION 201
we obtain
E (Q) = (n -- 2) y (35)
It follows that
Q
Est. 'Y = ( n - '"')
~
(36)
(37)
(ml_~#:'l)1
ma
Without loss of generality we may assume J.'t to be zeto.
Obviously, the expected value of mil/me can be determined
202 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
where
n (n -1) (n -2) . .. (n - j + 1)
N (N _:_. 1)(,v -. 2):-.. (N -- j -f I)
Neglecting terms of order smaller than l/n2 and retaining terms
in liN and I/N2 for completeness, we obtain
4n _ N_l + nN
2
!_~ _ 6_)
NI
(40)
4 --I- ., )
nN ' N?
9
n2
+ 21 _ I _
nN N A"
12) (41).
Substituting in (39) from (40) and (41), and using the known
result that
where
(43)
(44)
(45)
To obtain the variance of the mean 11, we have only to divide the
expression on the right-hand side of (44) or (45) by Nt,
204 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
i.e.,
k k
1: n,>', (a + fjx,) = 1: N, (a + (3x i)
or
t
1: (n,\ - N,) (a + fJx.) = 0 (47)
'-I
The variance of Y wI for fixed x's is clearly given by
'=1
(48)
lHiGRESStON METHOD OF ESTIMATtON 205
where
(49)
and
(50)
i=l ;=1
whence
(52)
k
- /1. J) (nj,\j - N j) = 0 (53)
j=l
and
(54)
(56)
where
a' - p.a. (51)
- 2' ,..R'
206 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
(58)
where
(61)
where
k
_ 1 \' _
(64)
Y. = W LJ. WcY.,
'-1
and
(65)
kEGRESstON METHOD OF ESTIMATION 207
To obtain the variance of Ywz, we substitute for ,Vs from (62)
in (48) and obtain
V(Y,)
'"
0.-= N2
W
[1 + (XN -X"y]
S",.2
(66)
and
k
1: n.Y' i (Xl - x.)
i-I
k
1: n, (X, -- x~)2
i-I
and on substitution we find that (63) and (66) reduce to (19) and
(22) respectively.
An examination of the expression for Y wL in (63) shows that
a knowledge of the true weights Wi is not necessary for calculating
Y wi; numbers proportional to Wi are sufficient for the purpose.
The variance of Y wi, on the other hand, requires a knowledge
of the true weights Wi. This raises a practical difficulty, since
the true weights Wi are rarely known. Often, however, the
relationship between the variance of y for a given x, and X can be
guessed, and numbers proportional to Wi can be known. It is,
therefore, important to investigate the form which the variance
of Ywl takes for certain well-known relationships between V (y I x)
and x. We shall consider only the simplest situation where
V (y I x) is proportional to x.
Let
nj . 1!_,_~}
)'x, N, - n,
(68)
y
where
w/ (69)
and
1 W'
W= (70)
y y
(71)
-
which now depends only on the known numbers wi' and the
constant y.
5.6 Estimation of the Variance of Weighted Regression Estimate
We shall consider this problem for the case where the variance
of y for a given x is proportional to x. Since the variance of the
weighted regression estimate for this case is given by (71), the
problem reduces to that of the estimation of y.
As in Section 5.3, we shall start with the quantity Q defined by
(72)
terms of Eij and x's, and then take expectations. Now Eij is
defined by (1), namely,
where
E (Eli I i) = 0
and
E (<:J I i) = V (y I A)
(73)
(74)
where
yx; • N. - n l
n. N,-I
y
(75)
w/
Also, from (74),
k ,_ k k k
I: w, Y., =
i=1
a I:
i~l
w: + f3 I: w:x, + I:
i~l i;::J
IV/i ft •
(77)
14
210 SAMPLING THEORY OF SURVEYS WITH APPUCATIONS
[t 1=1
W: (Xi - x"') Yni]
I..
={3+ L:
;=1
W/ (Xi - XU') ini (78)
On substituting for Yij from (1), for Yw from (76) and for
p from (78) in (72), we obtain
r
,,=1 i
= t t : ' {«(Ii -
(=1 j
flO)
REGRESSION METHOD OF ESTIMATION 211
'=1
><
{
[k; 11',' (x, -- X.,) i.,12
'~l
_ 2 f- f' 11';'
WW
i=l
tI;
tt
iCl
f'
WW
f..II'i'
fli
+ W'S",,2 {
1:k
,el
11',' (x, - x.) in,
12
i.,}2
r
k
[ ; 11',' (X, - X.,)
,"',
'=1
j~l
1
- W'S:,2 {~
W Wi
'2-
€.j
2(
Xi _. X.,.
- )2
1=1
+ t
.".,-""
W;'W:ioii". (Xi -f.,) (X_ -f.,)} (79)
212 SAMPLING 'tHEORY OF SURVEYS WITH APPLlcA..rto~~
we have
(80)
(81)
Now, let
(82)
and
(83)
(84)
and we have
llEGRESSION MEtHOD OF ESTIMAtION 21~
k nj
.E .E (Yii - ji,,)2
1=1 i
n
k "i
.E.E Yii (Xl - Xn)
r =-~ i= 1 I
Q nstIJ / (I --r2)
E St y = ==
. n-2 11-2
(86)
(87)
where
(88)
214 S,,\fPLlNG THEORY OF SURVEYS Willi AJ'PLICATIONS
where
and
J!'(Y.., ) = _n_
(90)
V (Y, ) ~, n,
Y "-' 2
,=1 Uj.
Now, let
uj
2
= y + Il (an (911
so that
and
provided
I~.~() I <
REGRESSION METHOD OF ESTIMATION 215
(93)
V (YIll , ) }
E { VCY,) ~ 1+ C 2"./
(94)
TABLE 5.1
Summary of Data Relating to Agricultural Area (x) and Number
oj Livestock (y) in a Simple Random Sample oj 64 Villages
Selected Jrom the Population in Table 4. I
Serial A rea of ", "i
1: .l' II 2
No. of illage ni L' J'ij
Class (Xj) I
63·73 2 16 256
2 155·33 5 333 26895
3 245·68 18 1810 281314
4 344·40 16 1991 330113
.5 491·56 13 1815 287079
6 767·49 10 2352 605510
I. V(ylx)=yx
and, II. V (y I x) = constant, say y; and sampling within each
class is with replacement.
Method I
The relevant formulre for the regression estimate of the popu-
lation total and its variance are given by (63) and (85). To
evaluate these we require the values of xw , jiw, swx2, Swy2 and the
estimate of f3. The calculations leading to these values are given
10 Table 5.2. From this table we obtain
Total of col. 5
x'" Tofal of col. 1
= 294·24
Total of col. 9
ji", = TotaiOfcor-7
102·72
k
.E w;' (Xi - XID )2
W'
Total of col. 16
- -total of col. 7
7952
·27297
= 29131
y-, w/ ft;
4646·4 - 2880·2
·27297
1766·2
·27297
= 6470·3
S",. = 80·44
k
E W,' Yn, (x, - i .. )
, leI
2314·6
= (13731) (·7.7297)
8479·3
13731
·6175
k
E W/Y., (XI - i,.)
2314·55
7952
·2911
= 319 f124·0J
= 39556
= 6·27
Method II
The relevant formulre are given by (19) and (R6) respectivt'ly.
We have
8317
64
= 129 ·95
24902
64
.'" 389·09
" ni
2,' .E YIj (Xi -~. Xn)
1=1 j
k
.E 111 (Xi - Xn)2
1=1
644382
c= 2455455
=, 0·2624
=c 319 (124'28)
= 39645
Again
n
1531167 - 1080768
64
= 7037·5
REGRESSION METHOD OF ESTIMATION 119
k
1: n x,2 - nx. 2
j
= '-I n
2455454·7
64
= 38366·48
and
k
.2; nJ"j (Xi - x,,)
n
644382·12
64
10068·47
7037.5 _ (10068'47)2
3lS366'48
= 4395·2
V(Y) =
I
3192.4395'2.
62
(I _.t (367'5338360.5
-389'09)2)
= 101761 x 71·75
0/ SE Y = J()()Y71·7; = 6 ·81
/0 . " I 124.3
TABLE 5.2
Computations Leading to the Values of the Regression Estimate
of the Livestock Population and its Variance
Serial
No. of nj Ni n, (N,-I) NI-ni w/x, XI wi' Y"I
Class
(I) (2) t3) (4) (5) (6) (7) (8)
(l - p2)
V (Y;)
, J.
= Wa, 2 -----
n (95)
REGRESSION METHOD OF ESTIMATION 221
The sampling variance of the ratio estimate under comparable
conditions of sampling with replacement is obtained by dropping
the finite multiplier from (28) in the previous Chapter, and will,
therefore, be
(96)
(97)
Comparing first the simple regression with the mean per sampling
unit estimate, we notice that the regression estimate is always
more accurate than the arithmetic mean estimate. Comparing
next the simple regression with the ratio estimate, we observe
that the former is more accurate than the latter if
i.e., if
i.e., if
(pO'" - RpfT.)2> 0 (98)
~I
There is thus little to choose between the simple regression and the
ratio methods of estimation, since the regression of the number
of livestock on agricultural area is almost a straight line passing
through the origin.
. Sl
nj
(99)
(100)
N2UU2
11
(I _p2) (I + I)
II
(101)
(103)
Since Q' is a random samp~e of N', x'l ' provides the best unbiased
linear estimate of XR" Hence
E (5',.1 nh ns, ... , tlk" Q') = E (5'..) + Z' E {b (X'll - x,,)} (105)
I,
k
}; 11, (x, -- Xn)~
1=1
(108)
(109)
-_ E (v.)
-2 + N'2 {2.."
N2 E b (XQ
_ Xn)
- 2}
Now
(111)
Next
k
1. _
.._ 2
E f' . ~
tfW, n • -I' til (x , - ,x-.)
n }; nj (X, Xn) ;=1
--i-- 1 . . ___
n}; n. (x, - x.)
2 [{t +1=1
nj (a {3X j ) }
i=L
> E (X Q • - X.)2
Lastly,
E UQ , - .\',,)2 = E (x~/ - XN' + XN' - Xn)2
_(1Q' - N'I)S2
- ~IN'
+N2(_
N;~ XN -
-)2
X" (115)
where
N'
I \'
S2~IN' -
- N' _ 1 W (Xi - 'eN' )"•
i=l
Also
(116)
(117)
(118)
REGRESSION METHOD OF ESTfMA TION 229
(119)
Also
k
y
_
}.~
2 ~~
y
Q'n (120)
i~ nj (Xi -- Xn)
(123)
230 S.~MPLING THEORY OF SURVEYS WITH APPLICATIONS
(124)
= ( n1 - Q'+n
1). 1
n - 3
(125)
1: n, (Xi - Xn)2
i=l
from (115)
+ 1 {.2
.1" -
Q l
n .- 2 j
(127)
Q"----jII
5.11 * Successive Sampling
In using the method of regression we have so far assumed that
ancillary information is available for all the units in the sample
for which the character under study is observed. This is not,
however, always the case and the formulre given above need to be
extended to make use of the information contained in additional
units recording only the character under study. This is the case,
for instance, when the same variate is observed on two successive
occasions in time, there being in the units observed some which
are common to both occasions and some which are exclusive
to each occasion. The observations on the earlier occasion are
used as ancillary information to improve the estimate of the
population value on the second occasion. We shall assume the
population to be large.
Let the character be observed on Q' + n 1 units on the first
occasion and nl + n2 units on the second occasion, of which nl
are common to the first occasion. We may, as in (122), form
a regression estimate of the mean on the second occasion based
on the nl units, viz.,
(128)
232 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Thus
v (z)
(131 )
v (v;,) ~F v (z)
where, from (126),
and
S2
V (I'
. n,
) = -y-
n
2
(132)
where
- "Ill
X = mean of the same n', units on the preceding occasion
n
n'I,
}; y (x - i.,)
,,'I,
1.,' (x - Xn'h)2
n'h
Assuming independence of }; (x - Xn 'h)2 and (h __ l-.Xn.,)2, which
will necessarily be true in case of normality, we have
(135)
234 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
But
- )
V(Yi-I + "S2", h_ 1
-
2C OV Vh-I,
(" ..,)
Xn'n
nh
(136)
since
(137)
The validity of (137) follows from the well-known result that the
correlation between any unbiased estimate and an efficient estimate
tends to the square root of the ratio of their variances. The
approximation involved will affect only terms of order (ljn'h)3 in
the expression for V (Zh)' The result is, however, exact if units
are common only between two consecutive occasions, for, we
have in that case
since
Thus
(138 )
x {n'I _
•
.1.
'1'.-1
I} + P2
n.
.-1 i, la-I
S• 2 «P.-I
".
n .-1
J (139)
REGRESSION METHOD OF ESTIMATION 235
Writing
'2 _ 2 I-p\, i-l
p II, ~-l = P A, 1--1 - --,--- -
n,.-
3 (140)
= ( 1 -
v (z)
V(v:,)+ V(z)
)2 V(z)_
1_ ( _ V(z) __ )2 V(- )
I V (V",) t- V (z) )''''
V U'",) V (z)
VU'",) + v(i)
(142)
236 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(143)
(144)
we obtain
(145)
REGRESSION METHOD OF ESTrMA TION 237
A. EQUAL CLUSTERS
V·
. t. the mean per clement of the i-th cluster, given by
(1)
(3)
240 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
_ 1 \' _
Yn . = n W Yi, (4)
V (5'",) = N - n S2 (5)
Nn "
where
N
N -1 1:
1=1
(5'1. -YN,)2 (6)
where
N M
V (f'nM)
E (9)
V (5'n.)
NM -nM 82
NM nM
N-n
. 8 1,2
Nn
(to)
TABLE 6.1
Analysis of Variance
Degrees
Source of Variation of Mean Square
Freedom
L:
N
Example 6.1
To show how efficiency changes with the size of a cluster, we
give a numerical example from data relating to the use of
clusters of different sizes in estimating the area under wheat.
Table 6.2 gives values of the mean square between survey
numbers in a village (8 2) and the mean square between clusters
(M8 b2 ) pooled over 11 villages in the Meerut District (India).
1tl
242 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
TABLE 6.2
Efficiency of Clusters of Size 2, 4, 8 and 16 Survey Numbers
Size of Cluster
Mean Square (Acres)2
2 4 8 16
._-----------_._._-- _._-_
Between clusters within villages .. 138'6 180·7 245 ·1 333'9
Between survey numbers within villages 108·3 108·3 108·3 108·3
Efficiency 0 . 7t! 0·60 0·44 0'32
It will be seen that the efficiency decreases rather rapidly with the
increase in the size of the cluster, clusters of 2 being only about four-
fifths as efficient as individual survey numbers, those of 4 about three-
fifths, while tho~e of 16 are only one-third as efficient as individual
survey numbers. In other words, the sample of clusters of size 16
will have to be three times as large as the sample of individual
survey numbers in order to give an estimate of equal precision.
If clusters are random samples of M from a population of N M
elements, and consequently composed of elements which are not
more alike than those of other clusters, then the mean squares
between and within clusters will behave as random variables and
their expected values will each be of the same order.
For,
E (Mean square between clusters)
CHOICE OF SAMPLING UNIT 243
Now substituting from (35) in Section 20.5, we have
= S2 (II)
Similarly
E (Mean square within clusters)
I
N (M -- I)
{NM (.J'N l +- NMNM·-- I s~)
-- MN ( YN l -/-
NM·- M
NM . M
Sl)}
(12)
since the expected value of the two middle terms is clearly zero.
To evaluate the first term of (14), we firs, work out the expectation
for a given i. We have
E {(YII - y,.) (Ylk - h) I i}
M
= M (~ - 1) L: ?,iJ -
/,pk=l
YI.) (Yik - YIJ
( 15)
i=l
(16)
CHOICE OF SAMPLING UNIT 245
where
L 8/
N
8,.2 N
i= 1
The values of the second term in (14) and that of the denominator
in (13) are known, by definition, to be (N - 1) Sbz/ Nand
(NM - 1) S2jNM respectively. We thus have
N - 18 2 - ~",2
N I, M
p ( 17)
NA1_~ I 8 2
NM
1
p = - NM-l (18)
Now, by definition, S2, Sb2 , and Sw2 are related to each other by
the identity
(19)
For N large, the formulre (17) and (19) to (23) can be approxi-
mated by the simpler expression s
8 2 _ Sw 2
b M
(24)
82
(25)
2
8b2~ 8
M {
1 +(M-I)PS) (26)
Sw 2 ~ 8 2 (I - p) (27)
VI,,) N -n 8 '1
\Y.. :::: N . 11M (28)
and
1
E ~ ..- --~------- (29)
- 1 + (M -- I) p
(30)
(31)
(33)
(34)
(35)
M--I
M . 0-2 • (1 -- p) (36)
and
-; __ N_Il.0- 2 f }
V(}".)--N_I nMl1+(M--I)p (37)
TABLE 6.3
Relative Change in Variance with Increase in Size of Cluster
4
Descrip- Fre- Proportion}; (Yij--!)2 ,E (1'11-y;')2
Class lion of quency of Males !.::i=~l~_~ 1=1
Household PI rl. 4 4
2 male
children
i i fir n
2 1 male -! -! ! 0
child
3 No male
children
i ! "
llr n-
------~-~~"-~-~.---~-.-- ..-..
Total population -! 7
1H ..J~
-~---.-- ..-.
CHOICE OF SAMPLING UNIT 249
table Yij = L if the j-th member in the i-th class is a male, and
Yij = 0, if it is a female.
Since the population under consideration is an infinite population,
we write
=[p,
i=J
(38)
4
Also
7
(39)
32
and
S,,2 ~= 11,,2 = L" Pi 0\ - p2
,=1
I
(40)
32
4 4
The values of 1 .E (Yij - t)2, .1 .E (Yij - j\J2 and (.Vi. - U2 for
j=1 j=1
the different classes, and those for the whole population repre-
senting a 2, uw 2 and ab 2 are also shown in Table 6.4. On substi-
tuting in (33), we obtain
I
6
The result is otherwise obvious also. For, the correlation between
the sexes of husband and wife is -1, but that for every other
of the remaining five pairs is zero, since the sex of husband or
wife will not determine the sex of their children, nor will the sex
of one child determine the sex of another. The average value
of the correlation between sexes of different members in a house-
2SO SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Between clusters n -1
tI M
" u
Total sample nM-1 ;M1_IL L (_1"I-Y.,)2=S2
--_.-_-----------._-_.
•
__._----- -
1-1
CHOICE OF SAMPLING UNIT 251
(41)
Hence substituting from (41) for Est. (S2) and writing .\'b 2 for Sb 2
In (10), we obtain
for large N.
Example 6.2
Table 6.6 gives the analysis of variance of the area under
wheat for a sample of 44 clusters selected from 11 different villages
in the Meerut District (India). Four clusters were selected from
each of the J I villages and each cluster consisted of 8 conse-
cutive survey numbers. Estimate the relative efficiency of the
cluster as a unit of sampling compared with the individual survey
number. The total number of clusters in a village may be
assumed to be large.
On substituting the values from Table 6.6 in (41), we obtain
for large N,
2 M-l_2
Est. S ~ Sb
2
+ '--ij- S",
= 25!'4 +~ (112.8)
= 130·1
252 SAMPLING THEORY OF SURVEYS WITH APPlICATIONS
Hence
Est. (Relative Efficiency)
130·1
251="4
= 0'52
TABLE 6.6
Analysis 0/ Variance 0/ Area under Wheat
Degrees
Source of Variation of Mean Square
Freedom
where S2 will now represent the total mean square in the infinite
population of which the finite population is itself a sample.
Equation (45) also shows that if we regard the popUlation itself
as a single cluster and M is consequently very large, the within
cluster variance Sw 2 will approach S2 as expected.
If instead of assuming the relationship given by (43) we assume
the one given by (45) which satisfies the condition of Sw 2 being
independent of the size of the population, the expression for the
mean square between clusters can be written as follows:
5 b2 = M (~=- i) {(NM - I) 52
52 (NM } (46)
= M (N -I) (Mu - 1
CHOICE OF SAMPLING UNIT 255
which in the limit tends to S2!Mg. Sb2 now depends on Nand
in fact increases with it when 9 < 1, which is logical, since the
variance of cluster means for clusters of a given size when
clusters are widely separated should be larger than the variance
observed when they are close together.
Jessen (1942) showed that although Fairfield Smith's original
formula, namely (43), describes the yield data extremely well and
the refinement suggested by him, namely (46), even improves the
relationship, most economic characters relating to farm data
follow a slightly different law. He postulated that the mean
square among elements within a cluster is a monotone increasing
function of the size of the cluster given by
(b > 0) (47)
The constants S2, a and b are evaluated from the data. For
this purpose, we require: (1) an estimate of the mean square
among elements in the population, and (2) an estimate of the
mean squares between elements within clusters for at least two
values of M. If we regard the total population as a single cluster
containing N M elements so that
S2 = a (NM)b
then we have
S.2 -_ (NM
.-_.- - - ----- _-----_- ~ -N (M -1) aAfb
I) a _..(NM)b
(49)
M(N -1)
2 78· 10 81·53
4 84'28 84·25
8 88'92 87'05
16 93·50 89·95
NM= 1176 108·33 110·22
(52)
(53)
()v v (54)
'On n
we get
(55)
(56)
(57)
(58)
nl = '::-:_CB±_{r:IIIl_±i~lfc01)~ (59)
2c1M
CHOICE OF SAMPLING UNIT 261
M oV
(60)
V oM
1 ()V
-V· 5M
6.9
TABLE
Numbers of Sampling Units which can be Covered, given
Several Cost Situations, Two Expenditure Levels,
and Seven Different Sampling Units
Unstratified Sample in the Statt of Iowa
Individual farm
B.
1·000
-
Total Expenditure of $2000
4012 1452 803 2886 1223 712
Quarter section 0·914 4293 1569 871 3057 1314 769
Half section 1·828 2494 852 462 1900 744 421
Section 3·656 1388 451 241 1112 407 225
Two sections 7·312 749 235 124 623 217 118
Four sections 14·624 396 121 63 338 113 61
Thirty-six sections 131·616 44 14 7 38 I3 7
-.--.------~
TABLE 6.10
Relative Standard Errors (% of Item Means per Farm)
Estimated for Samples of Different Sampling Units
and Taken at Random within the State, 1938
(Expenditure 0/ $1000. I5-minute Qu('Slionn(Jire (Jnd 2£ per Mile)
Sampling Unit
Items _------_-- --------
Individual S 2S 4S
Farm S. S. 36S
II. Number of corn acres .. 1·95 2·06 1·98 2·08 2·37 2·87 6·88
12. N limber of oat acres 2·36 2·59 2·66 3 ·05 3·78 4·91 12·76
13 Corn yield ·82 ·90 ·94 1·09 1·36 1·7B 4·73
14. Oat yield ·84 ·88 ·84 ·86 ·96 1·15 2·71
15 Commercial feed expcnditure5 6·n 7'06 7·60 9·14 11·7R 15·71 43·07
16. Total expenditures, operator 3·96 4·36 4·51 5·21 6·48 R·46 22·36
17. Total receipts, operator .. 3·16 3·49 3·64 4·23 5·29 6·93 18·39
lB. Net cash income, operator .. 3·54 3·82 3·84 4·26 S'D 6·57 16·l\2
264 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
6.11
TABLE
Summary of Sampling Unit Efficiencies
Number of Items Most Efficiently Estimated by the
Six-Grid Sampling Units, 1938 and 1939
-
v --- n
• n.
1["_. }i.
(63)
It is easy to see that this estimate will not give an un biased estimate
of the population value, for
1L
n
E (Y •. ) = E (j.,)
N
1 \' ~
= N LJh
It=t
(64)
266 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
YN. = ~'I:h
j=l
i.-,_
; Yn. -
1 f'M-'.)i.
nM L..J (67)
nk L
n
E (jin.') = E (M'Yj.)
CHOICE OF SAMPLING UNIT 267
= 1• . -n
nM N
LN
M-
ih
i==1
= y" (68)
N-n
v (Yo,') N
1 S '2
n b (69)
where
(71)
N-nS"1 (72)
Nn b
268 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
where
N
1 \ ' M,2 (,' -)2 (73)
N - 1 LJ M2 Vi. - Y ..
i=l
=5' .. (75)
CHOICE OF SAMPLING UNIT 269
It follows that
E(znJ =z .. =Y .. (76)
where Sbz 2 is the mean square between cluster h'S in the sample,
defined by
(80)
n- 1
V(zft. ) = U_l(_
n
(81)
where
(82)
270 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
E st. V(z-n.) = S,,2
n (83)
where
(84)
Further, let
E {(Y'i - Yi. )2 Ii} = rJ j
2 (87)
and
(88)
( M; _ .} 2 M,
Hence
(89)
(9Q)
(91)
(92)
p (93)
Also
(94)
(96)
N N
1) Vi. - 5' .. )2 _ :E (ji i. - .l'N. )2 + N U'N. - .V .. )2
i=l 1=1
- n~ L (~i - 1)
1c1
(Vi. - Y.Y
k Ni
J \' (~;
= - nNW M
- 1) W}·
\' (J). - -V.. )2 (103)
N'
1 \ ' {:;, -)2
N, W Vj. -Y ..
1=1
_ J \'
nN W if
(Mi _ 1)- {:;, _ Vi.
ji
'"
)2
= - n~ L (Z' - 1)
.=1
Ni V (Yi.1 M i ) (J04)
III. V 0\ 1M,)
Case J.-
VO\! M,) = y
- 1W
nN \' (M,
if - 1) (-Yi. - -)2
Y ..
i=l
=
·i=l
+ (~j - Iy - ---}
(106)
276 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
which is again positive, so that for this Clse also we can expect
zn. to be more efficient than Yn ..
Case III.-
V(.Yi.1 M i ) = ~2
,
It can be shown that, following the approach in Case II,
equation (104) can be approximated by
N
1) 0\ - _V .. )2 ,~ n~~2 ~ eMi
1=1
so that for this case also we would expect zn. to be more efficient
than Yn ..
These results are in fact obvious from an examination of the
first term in (102). This term, ignoring the sign, represents the
covariance between Mi and (h - Y.. )2. Now, ordinarily (h-Y .. )2
will decrease as Mi increases so that the covariance will be
negative and consequently the expression on the right-hand side
of (102) will be positive.
Yn./I
(8) Comparison of Zn. with
The mean square error of Yn./I when the finite multiplier is
ignored and n is large is given by
MSE ( - /I)
. . . Yn . ,..,_,
N
1 ,\,Mj
nN W 2,., -)2
M2 \,n - Y .. (108)
j~l
_! '\' ~i
= nNwM (.Y - Y' )2 (~'
j. . . M
- 1) (109)
'=1
Restricting again the discussion to the case in which E (Yi, )=Y .. ,
we notice that the relative precision of Yn." and zn. depends upon
CHOICE OF SAMPLING UNIT 277
the relationship between the variance of .Pi. and Mi'
Case I.-Let
V(j\ I M) =y
Equation (109) can then be written as
L Zi (Z· - 1)
N
(110)
V(Yi.1 M i)
VCY •. I M i ) = };2.
Equation (109) then takes the form
M.S.E. (YR.') - M.S.E. (zn.)
= n:Sf2 Li=l
(M, - Sf) (~)
~ - nJM' l:j .. ,
(M, - Sf)! (111)
278 SAMPLING THEORY OF SURVEYS WrTH APPLICATIONS
Thus for this case, in contrast to the previous two, the estimate
zn. is expected to be less efficient than the ratio estimate in
simple random sampling.
Example 6.5
Table 6. 12 gives the number of villages and the area under
wheat in each of 89 administrative areas* in Hapur Subdivision
of Meerut District (India), and Table 6.13 gives the analysis of
variance on a village basis. It is required to estimate the total
area under wheat in the subdivision using an administrative
circle as the unit of sampling. We shall assume that a sample of
20 circles is to be selected. Calculate the sampling variance of
the estimate of the total area under wheat for each of the
following procedures of sampling and estimation:
(a) equal probability, mean of the cluster means estimate,
TABLE 6.12-Contd.
---- "~"--"---- - ._----------------
57 3 622 75 4 669
58 2 591 76 1187
59 5 273 77 2 852
60 2 781 78 51
61 2 1101 79 1265
62 2 799 80 8 1423
63 2 601 81 2 794
64 3 928 82 1604
65 4 1141 83 3 1621
66 1208 84 2 1764
67 5 1633 85 6 2668
68 4 902 86 1076
69 3 1286 87 348
70 5 1299 88 4 1224
71 7
72 3 . 1947
741
89 4 1490
TABLE 6.13
Analysis of Variance of Areas under Wheat in Villages
in Hapur Subdivision (Acres)2
89
2
• I~~O . 513613
1577 x 10 5 acres 2
69x89
- ... _.... 342043
- 20
= 11 07 x lOS acresl
282 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
V (M - )
oY...
= M 02 • 1
n
\"!!II
L.J (- _ '"
Mo Yi.
ji )2
1=1
_ (299)2. 1 . (36537)
20
~99_x 6~ (68834)
20
710x 105 acres!
The relative efficiencies of the different methods are then as
follows:
--_._ .. _.-.._.-.__._ ._------
Relative
Sampling Unit Sampling Method Method of Estimation Effici-
ency
---- -_---
(0) Circle Equal probability Mean of cluster 12
means
(b) Circle Equal probability Mean of cluster 45
totals
(c) Circle Equal probability Ratio 64
(d) Circle Probability propor- Mean of cluster 43
tional to size means
(e) Village Equal probability Mean per village 100
360()<3800
<3600
<3400
<3200
<3000
<2800
~
<2600
~ <2400
c::
·S <2200 2 2
~
~ <2000
....
~ <1800
§
<::s
<1600 3 2
".... <1400 2
"'= 3
<1200 2 2 3
<1000 3 5 3
< 800 2 7 4 2
<600 7
200< 400 5 2
0< 200 2
2 3 4 5 6 7 8 9 JO
Number of Vii/ages in a Circle
The relationship between the two, i.e., Mih and Mi, is seen to be
approximately linear, the value of the coefficient of correlation
being 0·64. The variability among areas under wheat in circles
of the same size is seen to be rather independent of the size, and
explains the relative superiority of the ratio method (with equal
probability) over the simple arithmetic mean estimate when the
clusters are selected with probability proportional to size. The
284 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
SUB-SAMPLING
7.1 Introduction
So far we have considered only sampling procedures in which
all the elements of the selected clusters are enumerated. We
also saw that the larger the cluster the less efficient it will usually
be relative to the element as the unit of sampling. It is therefore
logical to expect that, for a given number of elements, greater
precision will be attained by distributing them over a large
number of clusters than by taking a small number of clusters
and sampling a large number of elements from each of them or
completely enumerating them. The procedure of first selecting
clusters and then choosing a specified number of elements from
each selected cluster is known as sub-sampling. It is also known
as fIl'o-stage sampling. The clusters which form the units of
sampling at the first stage are called the first-stage units and
the elements or groups of elements within clusters which form the
units of sampling at the second stage are called sub-units or
second-stage units. The procedure is easily generalized to three-
stage or multi-stage sampling. As an example of three-stage
sampling, we may refer to crop surveys in which villages are the
first-stage units, fields within villages are the second-stage units
and plots within fields the third-stage units of saml,ling, the
correlation between yield of adjoining portions of the same field
rendering it unnece~sary and also uneconomical to harvest the
whole field.
7.2 Two-Stage Sampling, Equal First-Stage Units: Estimate of
the Population Mean
We shall first consider the case of equal clusters and assume
that the population is composed of NM elements grouped into
N first-stage units of M second-stage units each. Let n denote
the number of first-stage units in the sample and m the number
of second-stage units to be drawn from each selected first-stage
unit. Further, we shall suppose that the units at each stage are
selected with equal probability.
286 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
YiJ = the value of the j-th second-stage unit in the i-th first-
stage unit (j = 1,2, ... , M; i = 1,2, ... , N)
j\ the mean per second-stage unit in the i-th first-stage unit
in the population (i = 1,2, ... , N)
and
N M
Y.. iM 1: 1: Yij
Further, let
-
Ylm = m
1 f' Yli
W
I
n
1 \' _
nw Yim
ElY...) ~ E {! I>.J
~E {! t Ii)} E (v,.
- E {! ty<}
= YR.
Since the first-stage units are equal,
YR. =Y..
Hence
(1)
thus showing that the simple mean of all elements in the sample
gives an unbiased estimate of the population mean.
By definition, the variance of the sample mean is given by
Now
whence we obtain
'u y-)2 = 1 E f' (J__ l)S2
E Vnm-", n2 W m M j
(6)
SUB-SAMPLING 289
where
(7)
where
=0 (9)
V(y )
nm
= (In _ NI) S + (Im _ M
2
U
l~) n
$... (10)
(11)
V Iv
Vnm
) = (!n _ !_)
N
s + nm
b
S..
2 2 (12)
VI;;
Vn..
) = ~b2
n
~",2
+ nm (13)
(14)
SUB-SAMPLING 291
and Si2 denote the mean square between second-stage units drawn
from the i-th first-stage unit defined by
,
S·2 = (15)
(n - I) Sb
2
= E" .Vim2 - nYn,n2
whence
(n - I) E (Sb 2) = E (t Y; .. 2) - nE (y"",2) (16)
~ E[ t {y" + (~ - ~) S,'1]
= N [t
'''1
Y•.2 + N (~ -it) SW
2
] (17)
The value of the second term in (16) can be directly obtained from
(10). For, by definition,
V(Y,.",) = ECY ....2) -YN. 2
whence
nEcy"",2) = (1 - J) Sb 2
+ (~ -1) S.. 2 + nYN,1 (18)
we obtain
(19)
292 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
whence
(20)
We thus have
i (m~- 1) s?
(21)
n (m - 1)
Est . V ("
v"" ) -- (!-n - N}!) s I
U
+ N_! (1m - M}-) S..
2 (23)
SUB-SAMPLING 293
When (N _- n)/N can be taken as unity, (21) and (22) will still
hold, giving on substitution in (II)
2
Est. V 0\,,) = So (24)
n
- 2
Est. 8,,2 = S/,2 _ Sir (25)
m
and
(29)
V(y- )
nm
= (1n _ N!) 82 + (CCo _ Mn1) S
II /0
2
n• = Co
-~- (31)
c
• Co
n = eM
In other words, there is no sub-sampling.
SUB-SAMPLING 295
The alternative approach of estimating the population mean
with the desired precision for minimum cost leads to the same
solution for m. For, let Vo be the value of the variance with
which it is desired to estimate the population mean, so that
(33)
+(1_1)8
m M IC
2
n = (34)
(35)
where CJ. and Cs are positive constants. From (36) and (10), we
obtain
296 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(38)
or, approximately by
(39)
m=M
SUB-SAMPLING 297
(40)
and iz is 1.
It will be noticed that m is now dependent upon the magnitude
of the two cost constants, as also on the intra-class correlation
of the character under study. In general, if Sb 2 - Sw 2 ( M > 0,
the optimum value for 111 will be smaller if: (i) the travel between
fmt-stage units and other costs which go to make up C1 are
cheaper, (ii) the cost of collecting: sub-samples from the selected
first-stage units is larger. and (iii) the intra-class correlation is
large. It follows that the optimum sub-sampling rate will change
from character to character. For a related group of items,
however. for which the value of p does not materially vary it may
be possible to hit upon a satisfactory optimum for m without
appreciable loss of efficiency. Sample surveys for the estimation
of acreage under major crops provide evidence on this point.
Table 7.1 gives the values of Sb2 , Sw 2 and p for the acreage under
wheat, gram, maize and sugarcane calculated from the records of
complete enumeration during 1936 of all the villages in Hapur
Subdivision of Meerut District (India). The village constituted
the first-stage unit and a grid of 8 fields formed by grouping
consecutive fields within a village was taken as the second-stage
unit. It is seen that p is of the same order for three out of
the four crops and it seems feasible to hit upon a satisfactory
value of m without very much departing from the optimum
for individual crops. When, however, minor crops are also
included in the survey, the value of p is found to vary consi-
derably and it is not possible to design an efficient two-stage
sample with a common sub-sampling rate for all items. This is
true of all general purpose surveys and one solution at least of
minimizing the loss of efficiency lies in grouping together related
298 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Area under
Crops in 1000 p
Biswas·
--~- -- -.-.----------.-----~~.----- ----- __ -
Example 7.1
A yield survey on paddy was carried out in West Godavari
District (India) in 1946-47. Five villages were selected in each
of the seven strata into 4which the district is divided, three fields
were harvested in each village and one plot of 1/100 acre was
harvested in each field. The data are reproduced in Table 7.2.
Obtain pooled values of Sb 2 and sw 2 for the district and the
estimates of Sb 2 and Sw 2 • Finite multipliers at the sub-sampling
stage may be ignored.
Calculate the sampling variance of the estimate of the district
mean yield and the percentage standard error.
Assuming that the sample of villages is to be allocated in
proportion to the numbers of villages in the several strata
and that the cost in rupees of the survey is represented by
C = 7n + 2nm
calculate the values of nand m that may be recommended for
a subsequent survey in order that the district mean yield may be
estimated with standard error of 12 ozs. per plot for the minimum
cost.
SUB-SAMPLING 299
From the last row of Table 7.2, we obtain
k
E (n, - 1) S'I,2
Pooled S 2 = 1=1_ -
b n- k
4 X (46655,2)
28
= 6665·0
k
.E n, (m - 1) s,,,,'
n(m - 1)
66979'5
= 7
= 9568·5
On substituting in (25), we obtain
9568·5
Est. Sb 2 = 6665·0 - - -
3
= 3475·5
and
where Pt = Nt! N.
300 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
7.2 TABLE
No. of Villages
Stratum in the Sample N,
Number Population Sample Mean
t - - - - - - - (Oz./Plot) N
N, n,
-----~-- .. ~~- ..-.--.."--- .. ~--.-.~~----~--
Hence
-
Est. V(Ynm) =- 3475·5 {O'147888
5- - 0·001248 J + 9568·5
)
3
x 0·029578
= 192·80
whence
whence
% S.E. of the estimate of district mean yield == 4·9
SUB-SAMPLING 301
V (- ) -
Ynlll -
S (1n _ N1) + Swnm
U
2
2
whence
S2
b
+ 8",2
m
n = 148·3
TABLE 7.3
Calculations of the Optimum Sub-Sampling Rate
(3)
m n = 148.3 c
(41)
+p {N -:_n • T1J (M - 1)
N-l M
_ M ;m}] (42)
SUB-SAMPLING 303
+ P (NN -=-1
-n
2
M _
( nm N
1) 8 b
2 (45)
1(Mm _ 1) (8 2_ 1M S 2)
n b II}
(46)
showing that the smaller the sub-sampling rate mjM, the larger
will be the reduction in variance of a two-stage sample over a
cluster sample. When Sb2 - Sw 2 jM < 0, sub-sampling will lead
to loss of precision.
7.7 Effect of Change in Size of First-Stage Units on the Variance
We have seen in Section 7.6 that the variance of the mean of
a two-stage sample consisting of n first-stage units with m second-
stage units from each can be expressed as
304 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
- nC
+ pz { Nii= C
m
MC (MC - 1) -
MC -
MC
m} J
where P2 will now represent the intra-class correlation within
first-stage units of size MC. The difference between the two
variances can be expressed as
+ OVll - ~Pz ]
where
al = Z=~ Z (M - 1) - ¥-i-t-m
N-nC m MC-m
az = N--=-C MC (MC - 1) - --Me
Since
al
_
~
= Tn
M
{(c:_-(N1)-(n 1)- (N
1) SNM -
- C)
I)} ~ 0
-
and
m (n - 1) (C - 1)
M(N=-l) (N - C) ~ 0
we conclude that
SUB-SAMPLING 305
20
306 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
N M P
y... =
I
NMP LL I:
'''1 /-1 k-1
Yljk
and
Yli(~), }'I(mp), ji (nmp)
or, simply
(48)
N
I\,_
= N LJ )" ..
;=1
= y_. (49)
SUB-SAMPLING 307
To obtain the variance of 5'nmp, we write
I II
--
2
{2:
" C
_
I' i,n" -
_
•Vf .. ) J
_
(. VfI..
_
- J' .. , )
]
+ (Y"- .. - _,2
Y )" + 11 fL 2:" (.)'"",,- J
- -
Yi .. )
If - -
(y" .. - y ... )
]
(50)
sampling from the i-th and if -th first-stage units is carried out
independently,
V (Y.~) ~ !, E t {( ~ -k) s; + G- ~) q
+ E (Y •.. - .v..J2
+(1n __N1)S2 b
where
M
M -1
L
j=1
(j'ij - y,.. )2
and
N
8 b2 = N·~ L
1=1
(y... - }.... )2
Sw 2 = 1L 8,2 and
'''' 1
we have finally
+ (_!_p _ J) Spa
P nm
(51)
SUB-SAMPLING 309
V(- ) _ Sb
2 +Sw 2 + nmp
Sp2
Ynmp - n nm (52)
E~=~ ~
Again, from (19), we have for a two-stage sample drawn from
a specified first-stage unit,
Finally
Yimp 21')
E(- I
_. { .I.;..
-
-, 2 + (m
I - M 1) S 2 + (IP - p1) 8;2}
I In
__ n S(1
l n
_ N1) S/, + ( m1_ MI) 8n
2 m
2
+ (1p _ P1) nm
8,,2
+ .I-,... Jl
2
= (n - 1) { Sb 2 + (~ - ~) 8.,2
+ (1 __~) 8,,2l
pPm j
or
Substituting from (53) and (54) in (55), and from (53) in (54),
we have
zo; 11 - S 2
Est." .. - .
_ (!p _ P1) ,p p
(56)
SUB-SAMPLING 311
and
Est. Sb 2 =Sb
2
- (~ - ~) S,.2 - G- t) '~~ (57)
whence
Est. V (_V"IIII,) = G- ~) S,,2 + (~ -- ~) :\~ + G- t) ~~
(58)
(59)
(62)
(63)
From (64) and (51), deleting the bars over 8 W 2 and 8 p2 for
convenience, we consider the product
c (v + ~2)
_
- lf( s 2_ M1 S'" 2) + ( S2 _ p
I,
1 S 2) 1 +
Pm 10
1
mp
SI'2}
which can also be written as
+ {C2 (s 2 I, -
1 SIn
M 2) m + (s.,.2_ p1 S 2) mC1}
l'
+ {Ca ( s 2 IV -
~ S
P I'
2) P + .pS,,2}
C2
.
+ {Ca (S 2 I, - M1 S2) mp
IV
+ C1niiiS"z} (65)
c (v + S~2)
= {~Cl (SI,2 _'1t' ~",2) + ~C2 (S",2 - ~-SI,2)
+ v'C~s~,2r
(67)
(68)
(70)
Case II
Suppose next that both Sb 2 - SW2/M and Sw 2 - Sp2/P are
negative or zero.
In this case (65) is minimum when p has the maximum attainable
value, say p, which could also be equal to P, and m is to be
chosen so that
C2 (8 "2- 1 S 2)
M II) m + (s 2 pPm
lQ
1 S 2) C
-
t
Case III
Suppose now that Sb2 - Sw 2 jM > 0, but Sw 2 - Sp2jP~ O.
The solution in this case for pm is given by the nearest
integer to
(73)
Yn(ml)
1 f'-
= It W Y'(m,) =
I
Ii W
f' YjJ
f' m(I W
j
and
Further, let
Mj
N M,
YN. = YN(MI) = ~. L J, L
'-I j.l
y"
316 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
=
f'-Yi
n1 w (lII j )
where the summation extends over the units in the sample. This
can also be written as
N
y,. = ~ L
i;:l
aLVilm;) (74)
= n~ L Mi Yilm;)
where
M; YI(mj) is the estimate of the population total for the i-th first-
stage unit
and
a, = 1 if the i-th cluster is in the sample, and zero otherwise.
SUB-SAMPLING 317
(76)
where
and
L
n
x,' = n~ M;xl(m;J
and
-,
R, = ~',
x,
(77)
We shall study the properties of the different estimates in the
next section.
318 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
E(Y,) = 1L h
'=1
(79)
N
1 \' _
= N /...J y,.
'~1
+ t
'-Ft'
E{(ji;(m,)-Y,') (ji,'(ml')-Yl') Ii, n]
(81)
320 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
where
, =
S.ll
where
N
=0 ~~
The fifth and the sixth terms are obviously zero. We are there-
fore left with
N
+ (YN. - Y..)!
- V(Y,) + (YN~ - Yu)2 (84)
where
SUB-SAMPLING 321
The bias component arises because of the failure to give equal
chance to each second-stage unit being included in the sample.
Unless the Mi vary considerably and the character is correlated
with Mi. this component may not be serious; but it is important
to collect evidence on this point before adopting Ys as the estimate
of Y.. , The procedure of estimation in the yield surveys in India
provides an instance in point (Sukhatme and Panse, 1951). The
number of fields under the crop is often found to vary considerably
from one village to another. thereby pointing to the need for
testing the nature and magnitude of the bias arising from the use of
the simple arithmetic mean of plot yields to estimate the average
yield per plot. Table 7.4 gives a comparison of the yield
estimates in large samples based on Ys with those calculated from
an alternative estimate 5's' which, as will be presently shown,
provides an unbiased estimate of the yield per unit. The standard
error of the difference between the two estimates is known to be
TABLE 7.4
Mean Yield in the Wheat Survey in Punjab, 1943-44
21
322 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
= N~ L
i=l
AlIYi.
= Y.. (86)
+ t
'7'0"
M,Mj,E {U'H ...'-Yi,) (Yi'(m/,-Yi'.) Ii, n]
SUB-SAMPLING 323
L: M,~ (~I -- ~J
i·1
8,2 (HH)
where
N
L (M .
I M 2
8 /, '2 I j. _ i'
N -I j. •. .)
N-I L (UJ-. -
,::::1
.i'.)2 (90)
The last term in (87) is clearly zero and we are left with
N
V (j' ') =
8
(!n - N_!) S b
'2 + fiN
) W ~/ () -- ))
\' M2 mi Mi
8
i
2 (9) )
illll:j
E (j") _ E (j_.')
8 - E(un)
(92)
= Y~
since
E(un ) = 1
V (un) = (~-
n N
I) S U
2 (94)
where
N
Su2-- N 1 '\'
_ I W (u, - 1)2
and
COy (ji,', un) = E {(u" - 1) ('va' - Y.,)}
- 2Y .. t.=1
(Ujii.- y.. ) (U, - 1)1
J
SUB-SAMPLING 325
2
8/ = N~ 1 L: ul(Yj. - Y..
i::;::;t
)2
TABLE 7.5
Percenrage Standard Errors of D(tferent Estimates
of Mean Yield
y, y,' r/
-------
Wheat (U,P.), 1947-48 3·7 14'0 4·7
Wheat (Delhi), 1948-49 2·5 10·0 S'7
Cotton (Madhya Pradesh), 1944-45 .. 5·5 15'0 1l·3
1945-46 .. 6·9 14'0 13·2
326 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
+ t
j#I'
UjU j ' (YHmil -.Yi.) (X,'C.'I', -Xi', >]
t
~ ~ E[ utE «f'.,.,,-y,) (x.,.,,--',) Il)]
= n~ 1:
i==!
u.
2
(~I - ~) Slyr
(101)
where
(102)
(103)
where
(104)
C ov (ji, ' , x,
-') = (IIi - N1) 8'
b••
+ n~ L:
,",1
UI
2
(~I - -k) 8 ,•• (105)
328 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
7 (-,
J 1 YR -
) _ (!n _ N_! ) j S'2
t /.. + L=
X..2
S
bx
'2 _ 2ji " S
X.. bU')
'1
+ _!_
nN
If'
W
i=l
u. 2
'mj
(J __ __!_)
M j
S
'.
2
+ Y..
-
:X:,,2
2
(
L
1=1
N •- •
Ut X ;: -
- 2
Nx ..
)
i=l
+
1 \'
nN W u.
2(1m, _ Mi
_!_)D;2 (106)
i-1
SUB-SAMPLING 329
where
(107)
1",
- n {)iN. s + V (ji,)}
330 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
we obtain
(109)
Also
where m,
S,2 ,'" mi~ .L: (Yii-_}';(lIIi,)2
(111)
Hence
1 (112)
S2 -
I, n
Est. V(y-) 8
= (1n _ N!) S2
I,
+ 1 '\'
nN W
(1m,__ MI1_) S·2
•
(113)
Est. V (ji)
•
= (.1n - N2) m~ + N_! (_!m - M.~-) W
SUB-SAMPLING 33]
Sf.
'2 __
--
1 \'
n __ I LJ
" (M.if -; __J.,,)2
.liC",,) (115)
- n (Y ..2 + V (.V,')}
Substituting from (91), we obtain
N
Also
(117)
Hence
(lIB)
332 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(119)
(I21)
_ 2 [~M (- 2 + ( m!.
W 3 e" _ M.
'.. ) S 2}
J:M
n
j
I
I I
I
.
+ (Im i
_ M
I ) S
t
' 2} +. L'.MM.
' ,}-, •. Y-.]
1'1- 1'
I •
+
"
.E Ml' ""' M 2
.
( .EM, )1 W ·
ft
x (!m, - M,1,) Sl
334 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
)1 ft. •
we have
E
f' M'l.
)_1_ L..l
In- 1
h!J (". \,ft.
_ )-, ")2}
n.
= S"2
b
whence
Est. S:' = so" - .~l t~:U; -~) s,'}
x )1 _ _~!'ii._ + 1; M,2 1 (124)
l 1: M. (1; M.)2 J
SUB-SAMPLING 335
On substituting from (117) and (124) in (96), we get
x {~- n~-d!- ~)
-l~, + (i~j·)}
or, to a first approximation,
(125)
(d) Ratio Estimate YR
The steps leading to the estimation from the sample of the
variance of YR are similar to those given in (c) above. We
shall quote here only the final result. We write, to a first
approximation,
Est. VI (YR) = e- ~) n ~1 L
n
1W
+ nN 'f' U 2 (_Im,
j
_ M,1) d j
2 (126)
where
(127)
n
Em,
(128)
n - 1
(129)
(130)
"
=-= 1 \'mE(I'-)
mo W l . I.
(131 )
I { "
= 2 E I: m:., (1'0(",,)-
- -.,
.1' .. )-
mo
I I" ., - - - - 2
I ~ m - ()' -I- r )
-
-mo 2
E (4..1 i{md - 1
I' I' -
"j . . . I .....
n
-I- 1: mim,' (5'i("'il - L. -I- j .... - Y..)
i-r- t'
-
(y ,'(mi') -
- + Y.'.
y;,.
- - -)1f
Y..
n
+- I: m,m.' {U'd",,) -. J'j.) U'i'(m() - .v... )
+- U,("',) - 5'..) (5'." - ji .. )
since the expectations of all the other terms are clearly zero.
Hence
-t j~i'
m,m,' s;;]
(132)
whence
(134)
(135)
(136)
+ '!!i:
m·
M.2S
"
2) (138)
•
where
(139)
SUB-SAMPLING 341
i>i/~l
+~ L: (
i=l
,lc2 ::1m,
(140)
and this is minimum when each of the two square terms is zero,
giving us
(143)
which is minimum for
(144)
(145)
y/, he will notice that the sampling variance has the same form
as that for the estimate Ys' except that Sb'2 is replaced by Sb H2 •
It follows that the optimum value of mi is so determined that
m - /C;-. ~i s.
i - V c2.J' M I
where
Example 7.2
A corn borer survey is carried out every fall in Iowa (U.S.A.)
for estimating the number of borers per plant. Fifty sampling
units of 25 stalks are selected at random from each district, and
for each sampling unit the number of corn borers per stalk is
estimated. Since it is costly to dissect all of the infested plants
in each sampling unit, a sub-sample of two is dissected to obtain
the estimate of the number of corn borers per infested plant.
When the number of infested plants in a sampling unit is one,
that is dissected.
Two methods of estimation are followed:
(a) The first method consists in computing the simple mean of
the number of borers per plant from each sampling unit.
Thus, if Mi is the number of infested plants in the i-th
sampling unit and Yi(ml) the estimated number of borers
per infested plant in the i-th sampling unit and n the
number of units in the sample, then an estimate of fl., the
number of borers per 25 plants, is given by
ZI = nI "1: _
MJ'I(fIIll
344 SAMPLfNG THEORY OF SURVEYS WITH APPLICATIONS
S~g2_l/g-2l
n(11 -- I) l £.., I • j
19463 - 7750
50x49
= 4·78
1 -
=
II
{v + (11- I) MYN '
}
for large N.
Define
E (M -- M) ('" -)' )
_ _ _ _ _ .. :_ .. ______ _.::___!_:_ __ • N.
p
VE(M. _.. M)2E(i' -1' N. )2
I • 'I • •
where
and
L (- -)'
N
l' - ,
,;, )N, -
1=1
or
....,
--
~
:~o .(o~~:;l8
~-:~"?'~"':'
---
_I 00000 o
'i
.....,
N ~ ~ ~~ooo ~ ~~~oo~ ~
;;-
....... ~ 600000~0000bobb~~~00boob6~~~~ob
;::
~ 1
....... ~
!:i' S
~
.::: Ii 00 0 ~o~~oooo 0 "'''''''00000
\0 II ~~OOOOOO~OO~~~~N~~~o~oo~b~~~~~~
·5 - ~ -NN M -~-MvM-
l"-
e- i
5 :;.. I~
~ ~...
~
1('\0 0 1('\01('\1('\0000 1('\
...
Il.I
~
,;;:.: 6NOOOOOO~oo~~6~~~~~o6oo66~~~~~~
"'I('\I('\OO~O~
~
~ 4)
-5
U
c=:J
0; ~ I ON
~c::
~~
'-J!
°c
0""'
Z
SUB-SAMPLING 347
- .
"..,
- Ii
~
, ..
.....
• V"I
. .r
6
....., ~
N
o ~ ~ ~~~ ~ ~~~ ~
o
Noo~060666060666006 II
-
~o~~6~O~~N~NO~~NO~~
~~- --- ~ -
!!....I
_
I
I .... '
....,
~I",
x-
o O~O~ ~~~o~ ~~~ o~ o ~E,
~oN~~60666~6066NO-0 00 IG
-D
'". ...,_,
g
'l
000
M "'-
_~~~o~~~~~v~-~~~o-~
_ _ N _ _ _ _ _ _ N_NN -N N
348 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
p = r = 0·261
8M 2 = 8 M 2 = 65'58
SM =SM =8·10
while Sb 2 is obtained from (112), being given by
1
= 0·7347 - 50 (14,42)
=0·4463
S. = 0·668
Hence
and
M.S.E. (za) = V (za) + bias 2
= 4·23 + 1'90
= 6·13
7.16 Stratified Sub-Sampling
By far the most common design in surveys is stratified multi-
stage sampling. In this design the population of first-stage units
is first divided into strata, within each stratum a sample of
first-stage units is selected and each of the selected first-stage units
is further sub-sampled. Crop surveys with the subdivision as
the stratum, described in Example 7.1 and the corn borer survey
with the district as the stratum described in Example 7.2, are
examples of this design. In this section we shall give the formul:e
for the estimate of the population mean in stratified two-stage
sampling, and its variance. We shall consider the unbiased
estimate only.
Let the population be divided into k strata with Nt first-stage
units in the t-th stratum, so that
where
and
YII = the corresponding sample estimate
k
- E .\,Y,,' (146)
'''1
S Ib '2
(147)
where
and (148)
MU
SIl! = (M~~::":'T) I:
j;;:"1
(Y"j - ,"" )2
(149)
l: (Ylli - l'
J II(,"ti) )
2
where
"' M,
_1_ \ ' \ ' Ylli = Yh
n,m, l..J
,
l..J
J
and
.\, =
352 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
V(ji,c)s= l: 'lf(~ _
1=1
k A2
11,
I
Nt
)S".2'v
_l
1-
1
11,
(!m, - M!__), 8".21
~ S
where
Nt
Slb
2
- Nt ~1 2:
1:=1
(jill, - 11 ..)2
and
and estimates of Stb2 and Stw 2 are provided by the same formula!
as in (110) and (112), namely,
(152)
SUB-SAMPLING 353
with the sampling variance given by
V (ji)
nm US
""" I) "S
(ln _ N 2 + (Jm _ M.!_) S.,n 2
(153)
I \ ' (j -)9
f.r--=:-I W 'i. - Y. - (154)
i=1
and
Sw 2 = the mean square between second-stage units within first-
stage units in the whole population
(155)
where Pt is the weight for the t-th stratum, and its sampling variance
(156)
k Nt
= 1} 1} (ji". - Y..)2
,el i-I
23
354 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
k N,
= .E .E (y", - y..,
'~l h"l
+ Y, .. - Y.. )2
k k
= .E (N, - 1) Su 2 + .E N,Y, ..2 - NY ..2 (157)
'=1 '=1
The estimate of Stb is known from (150), so that our problem
2
= Cn,1 - N,)
1 1 18
S'b + (m - M) -;,~
2
(158)
whence
Est , Y,..
- 2 =-
Y"
2 _ (__!_
n
_ N
,
_!__),Sb 2
I
_ (1.m _ M_!_) S,wn ,
2
(159)
or
L L'=1 L k) S'b~
k k k
(160)
Also
V{Y ..)s= E(J; P,Y,. )1 _y...
'-1
so that
Est. (NY._2) = N(J; p, y,.)' - N' Est. V{Y.. )s
'&1
+-n,1 (1m
_--
M
1) S,2
~ }
'"
(161)
SUB-SAMl-LiNG 355
Subtracting (161) from (160), we get
Est.
{~~ N~'
1.1.
2 - NI'
. ..
}~
2.
-
" -,!
Nlt'I' -, 2
-.ltD)
N 2
NP, ) S11,"
I
(162)
whence on substituting in (157), we obtain
• = N-:_1 1!...J
S.2 "N
t=l
k
I
(ji,2, - J'"
- 2)
(163)
-
SID
2 (164)
+ N~ {t 1 p, (V,. - Yw)2
t~l
+ (I__ _
m
1 )}_
M n
[1 _!!_~N-I
(165)
Est • V V(:;,) -
.. S -
\'p2
W I
{(~
n t
- I.) S 2+ n_!_ (Im - .I)
Nt M s 2}
tb
t
W
(166)
Hence
I;
- )~
Est. {Vus - Vs} = n (N N - n
_ 1) w\' (-
P, Y,. - Yw-
1=1
k
\'
+ LA {N Nn
-n (
P,
_ _!!_ • Pt (1 - PI»)
N - 1 n,
_ PI!
n,
+ 1!1}
N
S ,'I.
tu
1 1) \' {PI
+ (m - M W n -
k
N- n
n (N - I)
S b2 -
t
(_!_m - _!_)
M
S2
II!
(168)
SUB-SAMPLING (Continued)
8.1 Introduction
In the preceding chapter we have developed the sampling
theory appropriate for sub-sampling systems involving the use of
equal probabilities of selection at each stage of sampling. When
the first-stage units are large and vary considerably in their sizes,
this system of sub-sampling is not usually efficient. This is even
more so in cases where practical considerations demand that the
survey should be confined to only a small number of first-stage
units within each stratum with equal number of second-stage
units from each first-stage unit, although the amount of sub-
sampling from the selected first-stage units would be necessarily
unequal under optimum allocation. A system of sub-sampling
involving the use of varying probabilities has been used with
considerable gains in efficiency in such cases. In particular, a
sub-sampling design in which only one first-stage unit is selected
from each stratum, with probability proportional to the measure
of the size of the unit, and a fixed number of second-stage units
is selected with equal probabilities from each of the selected
first-stage units, has been found to bring about marked improve-
ments in precision, compared with sub-sampling systems involving
the use of equal probabilities. The developments are due to
Hansen and Hurwitz (1943, 1949). In this chapter we shall give
the theory of sub-sampling systems involving the use of varying
probabilities.
8.1 Estimate of the Population Mean and its Variance
We shall assume that the first-stage units are selected with
replacement. Further, we shall suppose that whenever a specified
first-stage unit of the population, say the i-th, is included in the
sample, a sub-sample of mi second-stage units will be drawn
therefrom without replacement, but only after the replacement of
any sub-sample which may have been drawn previously. In
other words, if the i-th unit happens to be selected, say 'Y times,
in a sample of n first-stage units, y sub-samples of mi units each
SUB-SAMPLING (continued) 359
will be drawn therefrom independently of each other, each sub-
sample of mi being drawn without replacement.
Let Pi denote the selection probability assigned to the ;-th
N
first-stage unit of the population (i = 1, 2, ...• N) and E Pi = 1.
I.,
Further, let
M,
Z'I = . ... Yij (I)
Mo Pi
Consider the estimate
Z, = Z,,(mi)
(2)
where the summation is taken over all the /I units in the sample.
Then it is easily shown that Zs is an unbiased estimate of Y...
For, we have
E(:,) ~E {! t , ,. .}
~ E {~ t E ('".,,\ i)}
=z .. (3)
where
(4)
N
\ ' M, _
= l...J Mo YI.
'-I
= y_ (5)
360 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
f' (-
+W zi(ttq) -
- )(.
Zi. Zi'(mi') -
- )1J
Zi'.
1f t '
+ t
i=¥=i'
E (Zi(m,) - Zj.) • E (Z"(m{) - .ii")}
(7)
where
M.
Sjz' = M,l_ I L
1=1
(Zjj - Zj.)2
Mi
(8)
SUB-SAMPLING (continued) 361
The value of the second term in (6) is given by (77) of Chapter
VI. We have
2
E (z •. - Z.. )2 =~~ (9)
n
where
N
2
(Tho =); Pi (Zt. - Z" )2
1=1
N
\' P (MIS'..!.. __
LJ 1 MoP i Y..
)2
(to)
v (Z,) == a~/ + :1 L
{=l
Pi (~i - ~J Si. 2 (11)
v (Z,) =
I
n
{f M/p
LJ
1=,
M2- 2
j
- Y.,2}
+ --nI L
(12)
When the selection probahilities are such that
Pi = Z~ (i = 1,2, ... , N)
(13)
362 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
whence
N
v (z,) M v "
o
- )2 + n~
ld'I" _ Y.. \ ' M, (_!_
LJ Mo m,
_ I-) S.2'
M,
(14)
(15)
n -1 1 E If'
LJ -2
Z ((mi' - nz-2 n(m;) jt
L
N
It follows that
S 2
Est. V (z,) = _!J.!'._._ (19)
n
N
= Cl E l'{l - Probability that the i-th unit is not
j-1
included in any of the n draws}
N
= Cl 1: {I - (1 - PS'} (21)
4=1
(22)
(26)
SUB-SAMPLING (continued) 365
+ 2C2 L
i>i';=1
P,P.,SizSi'z
(2~)
(32)
iz = Co (33)
Cl + cBkMo
We remark that when Pi is proportional to Mi> the optimum
value of mi is a constant, irrespective of the first-stage unit
included in the sample.
8.S* Determination of Optimum Probabilities
In determining the optimum allocation of the sample in the
previous section we assumed that the selection probabilities Pi
were given. These probabilities can be any arbitrary positive
proper fractions, subject to the condition that their sum is 1;
SUB-SAMPLING (continued) 367
(i = 1, 2, ... , N)
368 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
N
But 1: Pi = 1. We, therefore, have
P, = (i = ], 2, "" N) (36)
S/
M;Y~2
for Pimi = kMi. The cost function (39) will now be seen to
depend upon Pi'S. However, if Mi's are unknown, Mo will also
not be known and the estimate Zs can no longer be used. Several
alternative estimates can be formed. We shall consider one
such estimate in this chapter, namely, the ratio estimate, and
thereafter resume discussion of the problem considered in this
section.
-, l'
v" ~- R
8
v
,•
(40)
•
denote the ratio estimate of the population mean, where
Zij
MI
MoP, Y'j
;;
-,
11
n
L: M;
MoP, - 1
YI{>II,)
(41)
"
VI! -
M j
Mo-P i x,! v, II L M,
M~p. -",(m"f
x standing for a supplementary variate observed for all units
in the sample.
Then from the results of Chapter IV we may, for n sufficiently
large, ignore the bias terms in the expected value of YR and
regard it as an unbiased estimate of the popUlation mean for
all practical purposes.
24
370 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(43)
and by analogy
V(v) = 1
• n
L ;~i (~, - ~)
N
(46)
(47)
(48)
372 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
where
(50)
and
R = )',.
x..
When Xij = 1, the expression for the estimate YR is simplified,
being given by
(51)
where
"
Uj = ._--Mi -
MoP,
and ii. n LUi (52)
Also,
~
+W M/
Pi
(1mi _ M1) Si.2}
j
(53)
1=1
where
R, = Est. R = !.
VB
where
(56)
where
LJ' (59)
~ + ~
M;2 _, _ 2 1 2
</>= {
P, (Ji, - RxiJ k M,D,
(63)
SUB-SAMPLING (continued) 375
N
= P.nC3 })
i=l
P;M, + ,\ (65)
N
"~p. (cln + c2nkMo -+ c n }) P.M;)
3
;"1
(67)
(68)
(69)
376 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
~
Di ~-=
(-Y •. - R X,.
-)2 - D.2
M (70)
i
k ~- (71)
and
n= N
(72)
('1 -I- c2 k Mil + C:; }; PiM,
The optima will naturally vary with the cost function and the
sampling system, and care is necessary to determine from pilot
studies the nature of the cost functions before deciding on the
optima to be adopted for the surveys.
8.8 Relative Efficiency of the Two Sub-Sampling Designs
We remarked in the introduction to this chapter that a sub-
sampling design in which the selection probabilities are propor-
tional to the size of the first-stage units, and a constant number
of second-stage units is drawn from each selected first-stage unit,
may bring about a marked improvement in precision compared
to the sub-sampling design involving the use of equal selection
probabilities. In this section we shall compare the two systems,
using Zs as the estimate for the former system and the simple
SUB-SAMPLING (continued) 377
Mi (~i - I) L
!f'k c 1
(Yi) - 5\,) (Y,k - }'•. ) + (5'i. -- .V.. )2
= - S.2
U.
+ (_Yi. -.- Y-)2
.. (74)
,
Substituting the result in (73). we obtain
N N
a2
"!.' Pi + __1___
M,S.E. (z,) = nN
Li-' M nmN L M'S2
if ~i (75)
)78 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
L 5.
N
+ n2N 2
+ (1 - ~) (J'N. - S' .. )2
,=1
+ (1 _ 1)
n (I'
. N.
_ .Tt· .. )2 (76)
The difference between the two mean square errors is, therefore,
given by
M.S.E. (Y,) - M.S.E. (f,) =
N
1
nmNW'M
'\' 5.2 (A!I
1=1
+ (1 - D(J'N. - J.Y
and
Zi, m1'. = the mean per second-stage unit of the sub-sample
, drawn from the i-th first-stage unit
(78)
namely,
P {y. r}
380 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
1
= nE N
{; Y·i z}
,.
{
(80)
N
1)' _
E (:,') n L.., nPiz ..
i=1
= z. (81)
=L
= Y.. (82)
SUB-SAMPLING (continued) 381
V(z,') =E
{
1
n L
i=t
z
1', " '""j
_,,\,
~"J
12
~ E {t L N
1=1
_
1',zi, '"I', -
)
n 2: Y,*i,
1=1
N _
- 1-
1
11 2:
N
,el
__
z· -
1'", z..
X -
( Zi'. III)' -
l'
-
-
=i'. I
"1-
-
Zi'.- =..
-)} (83)
E { 1:
;#;'=1
Y;Y/ (.i" m')',
I
- .i;, + .i,,- .i,,) (.i". In')' , -
j
i,'. + .i;',-.i,,)}
(86)
P
E(y 11'.) = (n -1'). 1 (89)
1 • • I-Pi
E (I', 1')
,
= E {I' , . (n - y.) Pi . }
, I -Pi
0 ..
I -PI Pi {P
n, (1 - P)
, + n1P,2}
(90)
Using (87) and (90), the third term in (86) may now be written as
N
+ E {n (n - 1) p,p/}
'.,k /-1
N
n 1.: Pi (Zi' - Z,,)2
i=l
= nu bz2 (91)
N
since 1.: Pi (Zi. -
• _1
z.. ) is clearly zero .
12 - 1
(92)
Il
V (z ') =
•
1
Il {t
N
n -1
n L (93)
+ n1 \'
LJ Mo
A!, (1m _ M1_) 8
j •
2
i=l
n - 1
nMo
,r:
'=1
N
M,
M o- 8,2 (94)
SUB-SAMPLING (continued) 385
V(z')
"
= u,,2
n
+ S",2
n
(1m _ M~) n-l
/I
(95)
N
= Ei==l
I' z2,_"'1' - nz,'2
1. ,
(96)
- nE (2,'2)
- nE(.!.'II)
20
386 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
n-l
-n-
(97)
1=1
Hence
_
~
\' ~~I + _1 _.
..
\'~:
S b. 'I
n.: .:. 1 LJ M, n- I LJ ".r.
(98)
SUB-SAMPLING (continued) 387
or
.\' ( I 1) S,. '2
W my -
, M;
.
+ ,,(n 1_ I) L (~ - ~) S,,'2
n
\ ' p, S,.'2
(99)
n W 'M,
where
m"Y,
1.: (Zij - Zj. •nY,)2
my, - 1
S E (n') - I
2
all. - W.C'
2
W\'
n -- I
whence, we get
E (n') -- I
•
11-1 [;
(100)
where
n' "~i -)2
1.: 2,; (Zjj - Zj. mY,
S .,.'2 nm -n'
Est. f1
•
I = S '2 _
b
S '2
.,
{Em(n'~ -=J _
(n - 1)
,~ + _I... }'
M NM
(101)
388 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(102)
Est. V (z') =
" '2 S'2
_.~--.
fE (') 1
1
nm }
+ N--M:;o"
"b - It - - (103)
• n nm t n- 1
(1 = 1,2, ... , k)
ft'
I" = k~ %"("111 (lOS)
390 SAMPLING THEORY OF SURVEYS WITH APPLICATfONS
M,o (107)
A, = -
Mo
The variance of Zw is given by
where
N,
= 1: Ph (Zli. - i l.. )2 (W9)
'''I
and
M"
S2 , l("l = M;;-=--r Ljet
(ZUj - %11.)2 (110)
i, (Ill)
where
(112)
.,
V ( ,,) = + ! \'
nnw
(1'b(.) P ( ]
I til, - M,
] ) S'
1(.)
(113)
where
(114)
SUB·SAMPLING (continued) 391
and
(lIS)
/=1
P,
P" (116)
p,.
where
N,
Pt. = 1: P,
A, _ A, _ A,
P I. -z, .. P +Zl.. P - 2..)"
t. I.
I: N,
L: L
tal
p,.
"=1
PI< {~,~2 (2h. - Z, .. )2 + (~,. 2, .. - z.. y}
(117)
'-I ''''I
Also
).,t P SI (118)
p. II "til)
I.
392 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Hence substituting for a2b(z) from (l17) and for P I S2 1(z) from
(118) in (113), we get
1;
I: -p---: I:
NI
At 2
+ n
Pt; (_!_
m ti
~~) S2 ,Hzt )
1"1 1=1
(119)
{Vus- Vs } = L
t:::;l
At
Z
( n~t, - n~) a2 tbt ./)
+ _~n {f'
W At:_ z2
P t , t..
t= I
- p} ..
'\'AZ( 1
+W 1)
nP,. - I n"
NI
X .L
i=1
Pt; (~ti - ~J S2,H• t) (120)
·2 •
a 11.(.,) = S"II,(.t) (122)
11, -
Also
S'2 II(,/) -
-
S2
li('/) (123)
Nt
+ :, 1: !" -- ~,~)
i=l
P li ( S2'i«/)
so that
nl
a·2 Ib (ot)
Est. (Z, ..2) = - 2
Zto
111 ;'7 1: (~'j ~J S2'H:t)
(124)
Similarly
Z..2 = E (Z., 2) - V (Z.,)
whence
k
Z..2 - 1: At 2 {~:'~;~
'=1
.,
+ n,1 L (m!, - 1') SI,,(.,)} (125)
394 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
k 2 } k
Est -_ %, 2 -
'\' ", Z- 2 = "\' .--. Z,- 2 -
",2 - 2
Z
. { W P,." - W P,.' '"
'-1 1=1
.,
x )'
W
(1mil - .Ml II
) S2 ti (zt)
(126)
Est. {Vus - Vs } - n
_ 1
L P,.
1=1
A,2 _
ZI8
2
-
-
Z,n
2
A-.
+ L A~~
1=1
(nn;". - 1 - nP,. + ~) a2"'hll
(127)
1 + IiI)
k
+ "\' A,· (~ _ 1 - nP 2
W n, nP ,. I.
S tbI.,1
(J28)
SUB-SAMPLING (continued) 395
Also
(129)
Example 8.1
A sample survey for estimating the acreage under paddy was
carried out in Orissa State during 1950-51. Each district of the
State was divided into a suitable number of strata by grouping
together adjoining administrative divisions in the district. From
each stratum a sample of villages was selected with probability
proportional to the number of survey numbers in the village.
Each selected village was divided into clusters of 8 consecutive
survey numbers, 1-8, 9-16, 17-24, etc., the last cluster consisting
of the remainder. From the clusters so formed a simple random
sample of 4 clusters was drawn (without replacement). If, how-
ever, a village occurred more than once in the sample, say "
times, a sample of 4" clusters was selected from it.
Table 8. 1 shows the number of villages and the number of
clusters in the population, the number of villages in the sample
and the number nt' of distinct villages in the sample, the
estimated area under paddy per cluster and the values for Stb 2,
Stw·, Stb'2 and Stw'l for each stratum. Calculate the sampling
396 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
TABLE 8.1
Area Survey on Paddy, Orissa, 195()-5i
Values of Means, Mean Squares between Village Means (Slb Z, Slb'f) and Mean Squares
Within Villages (SI .. 2, Slw'2) in Acres per Cluster Basis
+ (0'067843)24.9885
14
+ (0'14261)2 0·6760
18
+ (0 '12127)2 !_-_?~87
15
= 0·001037 + 0·004289 + 0'001492 + 0·003669
+ 0·001640 + 0·000764 + 0·001616
= 0·01451
8.2 TABLE
NtM,
Stratum A, = - ~ n,-I· (n- I)~, (4) >.(5) S'b 2
NM
-...- A,Ytn, .. ~,y2'n,. nn,
(I) (2) (3) (4) (5) (6)
---_._--_._- .._-_ .._-
1 0·19741 0·2549 0·3290 ·000076397 -8,6504 -- '0003343
2 0·12151 0·3156 0·8195 ·000068727 -4'4038 -'0011430
3 0·091189 0·1895 0·3938 ·000029152 +9'6895 + '0011653
4 0·25818 0·5417 1·1364 ·000055835 -1'8543 - '0001938
5 0·067843 0·1731 0·4418 ·000035632 -13,8412 + '0006828
6 0·14261 0'2389 0·4001 '000058256 -2·2524 - ·0000887
7 0·12127 0·2547 0·5348 '000059446 -2·3714 -'0002324
_.._-----_-_-------
Sums 1'9684 4·0554 - '0001441
398 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
For example,
+~ ~
\' nn, {en, - I) - A, (n - I)} r'b""
SUB-SAMPLING (continued) 399
1
= f36 {4'0554 - (1'968)'} + (- 0·0001441)
= 0·001341 - 0'000144
= 0·001197
= 0·082
or
8·2%
where
(131)
SUB-SAMPLING (continued) 401
(132)
where
N,
1: PI, ()\i' - 5\ )2
}
2
alb
1==1
N, (133)
a 21J 2 -_ 1: P 2i Ll'2). - .i· 2 . )2
j=l
and
Mti
8Ii 2
MIl -1 L
,:1
(YH, _. .i'H. )2
(134)
V (Y.) =2
1
{a b
2
+ LP,(~,
1£1
- -k) 8,
2
} (135)
where
a. 2
=
N
1:
1=1
P, (Yz. - Y.. )2 (136)
and
M,
8l M,-l
L
,.1
(y" - YI.)2 (137)
26
402 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
NI
= £ API! {(fll. - YI .. )2 -+ (h. - Y.. )2}
"'-1
We know that
(141)
SUB-SAMPLING (continued) 403
and
(142)
giving us
(146)
R - Yj
i-Xi
and
R the ratio of the population total of y to the
population total of x given by
y
R = X
It is easily seen that
£ (r'l 1i) = R, (151)
406 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
(152)
For,
M,
£ (r ll ) = L~ 'Ii
and
'''''
y
X
=R
Consider now the simple arithmetic mean of the ratios rij given by
_
. ",
l , , \ , ) ' Yo (153)
rn .. = nm W W Xi}
i
=R (154)
SUB-SAMPLING (continued) 401
the second term vanishing since sampling within the i-th and
i' -th units is carried out independently.
From (87) of Chapter VI, we may write
(157)
where
M.
fT,· = L ~:
I-I
(r'l - R,)I (IS8)
408 SAMPLING THEORY OF SURVEYS WiTH APPLICATlONS
E (f,,,,, -
-
R,,)2 = n2
1
E
{1:"j
~~
G 2
(159)
(161)
(164)
n-l [
n {V (r,,,,) + R2} - n {V U... ) + R2}J
SUB-SAMPLING (continued) 409
(165)
whence
2
Est. V( i'om) = Sb
(166)
n
(167)
ab
2
+ a",.. 2 - 2
+!!_,,__ (168)
n nm nmp
E (t.) ~E t t '". J
= E ~ t E(f"." lil}
SUB-SAMPLING (continued) 411
(171)
- ~
- Y ..
412 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(173)
1=1
-, 1
.
\'M-
:, = y, = nM l.-J ;Yj(tftil (175)
and the expression for the variance becomes identical with that
given by (91) of Section 7.12, as expected. Thus,
SUB-SAMPLING (continued) 413
N
V (i,) = ~ G- I) L
---I
+ nN
1=1
where
(178)
-)
Est . V (Z, f'
= W
I - E (al ) M,2 E
{E (a,)}2 Mo2
/" 21 .)
St. Vi. I
"
+ \' E (alaJ) - E (a,) E (aJ) MIMi
W E (a,a,) E (a,) E (ai ) Mo?
'''''J
X Est. (Vi, h 1i, j)
"
+2: I Ml
{E(a,)}2 Mo2
( I
m; - 1 ) E (S 2
M, St. j
I I')
"
+ L
,-,'oJ
{E (ai)I E (a l) - E (!,al)}
V(Z,) = in';. L
1#1=1
{E (at) E(a j ) - E (ain)} (i,. - i,.)2
E (a.)
I (m,__1. - M,
-1).
S
.,
2
Hence
Est. V (i,)
REFERENCES
1. Hansen, M. H. and "On the Theory of Sampling from Finite Popula-
Hurwitz, W. N. (1943) tions," Ann. Math. Statist., 14, 333-62.
2. ---(1949) "On the Determination of Optimum Probabilities
in Sampling," Ann. Math. Statist., 20, 426--32.
3. Sukhatme, P. V. and "Sampling with Replacement," Jour. Ind. Soc. Agr.
Narain, R. D. (1952) Statist., 4, 42-49.
4. Mahalanobis, P. C. (1945) .. Report on the Bihar Crop Survey, 1943-44,"
Sankhya, 7, 29-106.
5. ---(1948) "Report on Bengal Crop Survey, 1944-45," Depart-
ment of Agriculture, Forests and Fisheries, West
Bengal.
6. I.C.A.R., New Delhi (1950) "Report on the Sample Survey for Estimation of
Acreage under Principal Crops in Orissa"
( Unpublished).
7. Horvitz, D. G. and "A Generalisation of Sampling without Replace-
Thompson, D. J. (1952) ment from a Finite Universe," Jour. Amer.
Statist. Assoc., 47, 663-85.
8. Yates, F. (1949) Sampling Methods for Censuses and Surveys, Charles
Griffin & Co., Ltd., London.
CHAPTER IX
SYSTEMATIC SAMPLING
9.1 Introduction
So far we have considered methods of sampling in which the
successive units (whether elements or clusters) were selected with
the help of random numbers. We shall now consider a method
of sampling in which only the first unit is selected with the help
of random numbers, the rest being selected automatically according
to a predetermined pattern. The method is known as systematic
sampling.
The pattern usually followed in selecting a systematic sample
is a simple pattern involving regular spacing of units. Thus,
suppose a population consists of N units, serially numbered from
1 to N. Suppose further that N is expressible as a product of
two integers k and II, so that N = kn. Draw a random number
less than k, say i, and select the unit with the corresponding
serial number and every k-th unit in the population thereafter.
Clearly, the sample will contain the n units i, i -1- k, i + 2k, ... ,
i + n -_:--lk, and is known as a systematic sample. The selection
of every k-th strip in forest sampling for the estimation of
timber, the selection of a corn field, every k-th mile apart, for
observation on incidence of borers, the selection of every k-th
time-interval for observing the number of fishing craft landing on
the coast, the selection of every k-th punched card for advance
tabulation or of every k-th village from a list of villages, after
the first unit is chosen with the help of random numbers less
than k, are all examples of systematic sampling. In the first
three examples, the sequence of numbering is determined by
Nature, the first two providing examples of distribution
in space while the third that of distribution in time. In the
fourth and the fifth, the ordering may be either alphabetical or
arbitrary approximating to a random distribution. In the latter
case, a systematic sample will obviously be equivalent to a random
sample. The method is extensively used in practice on account
of its low cost and simplicity in the selection of the sample.
The latter consideration is particularly important in situations
where the selection of a sample is carried out by the field staff
27
418 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
(2)
and
(3)
E(P") = iL
'.1
YI.
= Y.. (4)
k
_ 1 \ ' (., -)2
- k W VI.-Y .. (5)
''''1
(6)
V{;')
V,. R
= (~n - N}) 8 2 (7)
= t
1 lf1t
4-1 J-1
Y'J - y.}'
- k~ t {t (yu - Y->}'
SYSTEMATIC SAMPLING 421
~ (V.
+ L...J j, I' ) (v
- •..• ..1 '-)'" )}
j~I'~l
~ W
+ L...J ~"' (" _.- Ii r'l) •.•
) (I','
• f}
- ,-, )}
r"
(8)
,--} r/ )'=-1
kn (9)
(kll _. i) S~
or
k "
}; }; (Vi) _. 5'..) (Yi;' - )'.. ) -= (n -- J) (kn - J) pS2 (-10)
1"" i¥I'=1
"-1 kn - 1
= 2 1: k (n - a) p"S2 • - - -
"=1" kn
n-1
(IS)
1: L
k
n
· -r
LY"S
le1
424 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
{ o?;
k "
(YIJ - Y.Y
k n }
+ f...J
"_ w " (v. 'J - V ) ()!., - v·,)
•• ) .1
t)
(18)
1=1 rrr=l
The second term contains kn (n - I) product terms, of which
2k (n - 1) are products of y deviations from the respective strata
means separated by one row, 2k (n - 2) are products of y
deviations separated by two rows, ... , 2k (n - a) are products
of y deviations separated by a rows, ... , and 2k separated by
(n - I) rows. We can, therefore, write (18) as
t~~
~ 2 (y" - 5'.,)(y"," - 5'.,..)}
kn' {t t (y" - y,)'
+2 (~ k (n - aJ P,.,. )
x (f'W Wf'
J=l \=1
(Y;1 - YJ
n(k -1)
2
)} (19)
where
(20)
(Y.J - y.;)2
n (k -- If
SYSTEMATIC SAMPLING 425
-
V(jids~ =
k - 1
-k-·
8 .. ,2
n (1 r+ k
k-
~
I 2n f;: (n - a)
1
P(a)"'5 (21)
Yh = I-' +h (22)
Clearly,
N
Y.. ~ Lh=l
(I-' + h)
= I-'
N+ 1
+ --2- (23)
(N + 1) (2N + 1)
= NI-'2 + N-----'"6--- + I-'N (N + 1) (24)
426 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and hence
S2 = N ~1 {t y.2 - NYoo2}
."1
_ nk (nk + 1)
- 12 (25)
and for the same reason, since the column means corresponding
to k different systematic samples also increase by unity, we have
the mean square between column means given by
S 2 = k (k + 1) (27)
c 12
Substituting from (27) in (6), from (25) in (7), and from (26) in
(16), we get
k2 - 1
VSr = -12- (28)
(k - I)(nk + 1) (29)
VR = 12
and
(30)
Hence
k +1
Vs: Vs.: VR = - n
- : k + 1: nk + 1
or, approximately
1
'" %n
=n 0 0
(31)
Example 9.1
A pilot survey for investigating the possibility of estimating the
catch of marine fish was conducted in a sample of landing
centres on the Malabar coast of India (LC.A.R., 1950). At
each landing centre"' in the sample, a count was made of the
number of boats landing every hour from 6 A.M. to 6 P.M.
Out of the boats landing during each hour, the first one was
selected for observation on weight of fish, the product with the
number of landing boats giving an estimate of the catch brought
during the hour. Table 9. 1 shows the number of boats landing
per hour at Quilandy Centre for seven consecutive Mondays.
Calculate for each day the values of p between observations
separated by k = 2, 3, 4 and 6 hours and hence investigate the
effectiveness of systematic sampling relative to random sampling
for making observations during the day.
The first step in the calculation consists in making the analysis
of variance tables on the number of boatS landing per hour for
each of the several cases (k = 2, n = 6; k = 3, n = 4; k = 4,
n = 3; and k = 6, n = 2) on the data for each day. The next
step is to substitute from these tables the values of the mean
SYSTEMATIC SAMPUNG 429
squares between (nS c2 ) and within systematic samples (Swc 2 ) in
the expression for p, namely,
p (32)
Hour
of Day
6-7 7-8 8-9 9-10 10-11 11-12 12-1 1-2 2-3 3-4 4-5 5-6
Week
'",
42 52 19 6 23 56 36 59 14 14 2 6
2 23 39 33 27 52 32 39 45 33 9 6 0
31 25 13 6 15 9 19 2 9 31 16 27
4 41 14 15 24 19 24 21 27 21 32 45 57
8 16 14 8 47 39 20 26 33 34 61 45
6 2 6 20 28 28 17 40 41 41 37 21 31
7 16 15 4 14 12 4 15 10 17 10 18 16
430 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
9_2 TABLE
Analysis of Variance Table together with the Values of p and of
the Variance of Systematic Relative to Random Sampling for the
Data Collected on the Third Monday
k = 6,n =2 k = 4, n = 3 k = 3, n = 4 k = 2, n = 6
Source of Variation
D.F. Mean Mean Mean Mean
Square D.F. Square D.F. Square D.F. Square
TABLE 9_3
Values of the Intra-Class Coefficient of Correlation and of the Variance
of Systematic Relative to Random Sampling
Values of p
Size of
Sample 2 3- 4 6 2 3 4 6
n
Week
1 1) 2 (33)
( n - nk S",.
where swc 2 is the mean square between units within the selected
systematic sample, i.e., a column of diagram (1). Clearly, (33)
does not provide an unbiased estimate of (6), fOf,
_1 E
n- 1
(r;n )' ' I)
2 _ nv
• I,
2)
J=l
= n ~ I [~ t. t y,/- ny_'
- nn\-;; I ~ {I + (n - p I)}]
= nkn"k 1 SI (l - p) (34)
432 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
_!_)
( !n _ nk nk - 1 82 (1 _ p)
nk
(35)
(37)
«=1
(38)
= 1 E -{
11
i: y,} '
= y.. (39)
where Si2 denotes the mean square between the second-stage units
of the i-th selected first-stage unit, and Pi denotes the intra-class
correlation between second-stage units within M 1m columns of
m units each which can be formed out of the M units of the
i-th selected first-stage unit.
Substituting from (42) in (41), we then obtain
= nlN (I -1t) L
.=1
S,2 {I+p, (m-I)} (43)
(44)
where
N
8. 2
= N~ 1 L (Y .. -y_)2
'-1
436 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
while the last term is obviously zero. Interchanging the first and
the second terms in (40), and substituting from (43) and (44), we
thus obtain
V(Ynm) = G- ~) So2
(47)
V (,., )
VflIn
= (-~n _ N
1) S 2 + ~
°n
(1m _ M1) S., 2 (48)
_
- E . {) + V (y,.,
[ ....,'p-,2
i.
- \J
II')S.J
~ E[ t {S" + (I - ~) ::
;qI+p,(m~ I»)}]
x {I -I p, (m--l)}} (51)
2
E(Sb ) = Sb 2
+~ (I -~) ~ I: {I +Pi(m-··I)}Sh53)
(55)
(56)
E(',) ~ {~
E t '".,J
~E {~ t E('"." I i)}
~ {~ t
E "1
SYSTEMATIC SAMPLING 439
in virtue of (4),
N
= L.J
\' P·i. 110.
= Y.. (57)
To obtain the sampling variance of zs. we write
V(z,) = £(i, -Z.,)2
= £ (in(m;) - in. + zn. - i,,)2
= £ (in(ml) - zn.)2 + E (zn. - Z.. )2
+ 2£ {(.in(m;) - Z". ) (z". - z" )} (58)
Taking the first term in (58), we have
+ 1:, (=I(mll
",of.,
- Zj.) (Zj'(m{)- ZI', >}
= n~ £ [ 1: E {(Zjf..., - .i, .)2 I i}
1: £ {(Z'(tn;l- .i,.) I i}
+ 1",,1'
L k) !i;2
N
(59)
Sb.
2
= 11 ~ 1 L: (Zi(mj) - z,(",,)2 (60)
and
with
jiim A_
Z;", NP, = 59A, Yi",
The values of Zim for the selected centres are given in the last
two columns of Table 9.4, and the various steps in the compu-
tation of the average catch per hour and its standard error are
442 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
201 for m = 6
339 for m = 2
REFERENCES
NON-SAMPLING ERRORS
A. OBSERVATIONAL ERRORS
lOa.l Introduction
In developing the sampling theory in the preceding chapters,
we assumed that the character observed on the i-th unit of the
population (i = 1, 2, ... , N), takes a unique value Yi whenever
the unit is included in the sample, irrespective of the person
who enumerates it. By implication we assumed that a complete
count of all the N units gives a unique value for the mean or
the total of the population. In practice, however, the situa.tion
is rarely so simple as the one described above, since the value
observed on any unit will also depend upon the enumerator
reporting the value. Thus, an eye-estimate of the yield of a crop
in a field will depend upon the judgment of the enumerator making
the estimate and will invariably be different from the true value
of the yield obtained by harvesting the crop in the field. The
magnitude and direc;_tion of the difference will depend upon the
enumerator's intrinsic tendencies or biases and the approach to
the selected unit or interviewee at the time of reporting the
value. Even with factual characters like those of farm facts,
e.g., the area under crops, the number of animals on the farm,
etc., there is found to be a marked variation in the performance
of the same or different enumerators. It follows that even when
the sampling fraction is unity, or, in other words, a complete
count of all the N units is made, the result will vary in repeated
counts. As the errors responsible for this variation arise in the
process of collecting data, they are properly observational errors
but are also referred to as response errors (Hansen et al., 1951).
Together with the errors arising from incomplete samples and
faulty procedures of estimation, they go to make up what are
termed as non-sampling errors.
We have given several examples in Chapter I to show that
the net effect of non-sampling errors on the value of the estimate
NON-SAMPLING ERRORS 445
can sometimes be serious, and that it is, therefore, important to
control them as far as practicable. In Part A of this chapter
we shall deal with the measurement and control of observational
errors and in Part B with the treatment of incomplete samples.
i = I, 2, ... , II
j = 1, 2, ... , m
k = 0, I, 2, ... , nil
denote the value reported by the j-th enumerator on the i-th unit
for the k-th occasion. It will be seen that m enumerators have
been assumed to participate in the survey, with the j-th enumerator
making nij observations on the i-th unit in the sample.
The difference between the reported value and the true value
is called the error of ohservation, and for any given measurement
technique will depend upon the enumerator reporting the value,
the interaction of the enumerator with the true value of the unit,
and the mood and like causes at the time of reporting. The
reported value may, therefore, be considered as being made up
of four uncorrelated components as foHows:
(I)
where
represents the bias of the j-th enumerator in repeated
observations on all units,
the interaction of the j-th enumerator with
the i-th unit,
the deviation from Xi + aj + 0i; when the
j-th enumerator reports on the i-th unit on
the k-th occasion,
446 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
E (£'w;.1 i, j) = OJ (2)
E (8'1 Ii) = °
In most surveys, however, repeated observations on the same
unit by the same enumerator or, indeed, by different enumerators
is not practicable. A model which is likely to be more in line
with the practical situation can therefore be represented by
(3)
Y.; (4)
and
Y.. (5)
Y.i Fz .L:
i
X,n'j + +~ L:,
aj £,PH (6)
and
h m m h
Y..
I
h L: Xj +~ L:a+ L:L:
i
i
l
n- £jjnjj (7)
It follows that
1
E(y)=l
.i N +lJ
(8)
Substituting for Y.j and E (Y.j) from (6) and (8), we have
( 10)
V(j.,) ~E (! ~ ,,)'+
x,n" - E(a, -'1'
+ ~, E ( t ,,,n.) (Ill
jj! 1+'
I " E (X/) n,,2
= S/
(12)
450 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
where
N
where
M
i: E
+ 1#" (£jj£I'J Ij) n1jnl'J}
and
E (£,,£,'j) = 0
Hence, substituting from (12), (14) and (16) in (11), we get
+~ E (t t ,,,n.}(19)
(20)
Similarly,
E (Im W
f' a _
J
a)2 ~~ S (Im _ MI)
CL
2 (21)
I
2
= 8<
n
-
8 <2
- - hp (22)
since
V 0' )
..
=--" 2 (I_ _ MI)
SZh(I _ N1) + 8am 2 _j_
I
8~
•
hp (23)
\.Y.. h m
+ hp< (24)
(26)
.E
. E(y,/) - mE (y .. 2)
(m - 1)'E(s/) =
,
= L'[V(y) + {E(y)Pl - m [v(Y .. ) + {E<Y .. Wl
I
E (s
• In m-p
S -, -_. - - + s" 2 + m S2
2) = (29)
• • hp m - I hp·
(30)
we get
(31)
(33)
S
•
2 - I
-- h - I
L-
h
(h I - -)2
)
J .. (34)
h
~,~ 1: {V (yj. )} - h V (ji.. ) (35)
j
(37)
The set of three equations (29), (33) and (37) provides the
estimates of SX2, S,,2 and S/. In particular, we obtain
I= P (m - 1) (h - 1) {2 m 2
Est...
S pmh - ph - pm + WI s. + h (m _ 1) of.
m' 1
hp (m - I) s.o! J
(38)
456 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
and
E ( s. 2) -- S•2 It. m- 1S
+ S•2 + h-=-I m ..
2
(41)
(II - I) S.2 == (m - I) h_
m
5. 2 + (II - m) s./ (42)
Est. Sa 2-- 5.
2
-;; Iim S ••
2
(43)
and
(44)
The expression for the estimate of Sa2 given in (43) can also be
derived directly from (38) by putting p = 1 and substituting for
SUI from (42). It follows from (25) that for Nand M infinitely
large and p = 1,
2
Est. v(;,
v.. ) = 5.
m (45)
1 h-m I
Est. V (Y .. ) = h 5.
2
+ in-=! -- (Sl-S
h · ••I) (46)
NON-SAMPLING ERRORS 457
;- L L J;;2}
J (
~- ~ ~>}
S 2
+ m-I
"
where
true variance between village means;
f number of fields in the sample in the sub-
division;
number of fields in the sample harvested by
the j-th enumerator;
number of fields in the i-th village for the
j-th enumerator;
h number of villages in the sample in the
subdivision;
and
m = number of enumerators, two in the present
case.
On substituting for B, E and W from the analysis of variance
table, we obtain
S..I = 468·4
NON-SAMPLING ERRORS 459
and
S,.2 is thus seen to be a little larger than 10% of the total variation.
A test of significance of the differential bias is provided by the
ratio of E to B. The design being non-orthogonal, however, the
two mean squares are not independent, and the ratio of the two
cannot be regarded as following the F distribution. An appropriate
test may be provided by a comparison of E with
where
L t. (], ~ J,;)}
-(h ::__ m)
and
{(
I _ kt')
kl
w + ~l' n12
kl f
. (1 - Z:T ~; + (~:Y·~;·
On reference to Snedecor's F Tables, it will be found that the
observed ratio is smaller than the F .• value, showing that S,.2
is not significant.
460 SAMPLING THEOR.Y OF SUR.VEYS WITH APPLICATIONS
Clearly, the sample mean for the loth stratum will be given by
., m,
Y-'=
..
I
h,
\'x,+_2_
W \'a'+E'-
m t W!
I
(47)
where
Nt (48)
P, N
It follows from the results of the previous sections that
k k
= l' P,/L, + ,l.,' p, a,
t=1 1=1
~'IL'+a (50)
Similarly
V (Y,.' I h,)
S 2
,_ t.;_ + S2 + _!_,,:_
__!_
S2
(51)
h, hi m,
where Stx 2 is the variance of xit, St./ the variance of at and S.2,
for the sake of simplicity, is assumed to be constant from stratum
to stratum; and
k
V (Y" I hI' h2' ' , " hk) = E p,2 V (.Y,,' I h,)
,"'I
462 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
whence
and
(56)
Example 10.2
The data for this example are derived from a pilot survey for
comparing the relative efficiency of plots of different size in
estimating the average yield of irrigated wheat in Moradabad
District of U.P. (India) in 1944-45. The design of the survey
and the method of assigning enumerators were similar to those
described in Example 10.1. Thus, in each of the five subdivisions
of the district, two inr1ependent samples of two villages each were
selected and allocated to two enumerators designated A and B.
In each village two fields were selected and in each field two
plots of each size were marked. The data relating to the plot size
of an equilateral triangle of side 25 links (117·9 sq .ft.) are taken here
for illustration. Table 10.2 gives estimates of the average yield
together with the analysis of variance for individual subdivisions,
and also the pooled values for the district. Estimate the contri-
bution to the total variation due to S0.2.
The calculations are straightforward. Substituting from the
table the values of E and B in the formula
BaS I
t. ,. == s,. - 2 m,..t.
h, 0)-,( •• )
(E, - B,)
= -8--
NON-SAMPLING ERRORS
463
ti
.;: g,
~
is .'" \0
~
..... ..., cd~ 0-
..,~
..... co
.;,
~ ....'" ,.,..... Inl
0
~ 0-
~
.s... '-
-« ~~
....~
00
;:g
0-
....
.... ....
N
N
N
.8 o !.I
'OJ '0
o- ~ 00
M
""
..... ~ ....
~ z~ ~
~
0
..,'"g, ':'I
\0
.....
00
....
~
c~
~'"
....
....9 ..,.... ..,..,~
\0
....<.J g,
';;: 0 0 .... c~ 9 ':'I
.... .,~> ,.:..
;;:; ;,., ..... "''''
"::1 ..,\0 3;
,~ -::- .... ~rZ ;:g
~ ~
~
"<t
,_<: ~
00
N
co
N
~
-c
t-i o !.I u:
2- '0
0- co 00 10
N
~
0
~ N
~
z~ ~
l: ~ ., ~
~ ~ "
N 01)
~~ 9
VI N "<t C
...... ':' "?
.,E
-
0
Pol
,:;
....
'c-
~
'"
>
-«
6
N
N
,.:..
10
'"
'"
00
N .'"
.~
~&
N
VI
.....
....,
.....
~
00
...,0 '"
on
~
..J ~ ~
!XI
~ ~
<:u
.. '-
o !.I ~ u:
Ii::
..
~
...
'0
0-
z~
00 10
-
~
'c-
.~
0 N
~ "I:
& ..c- .....
~ 'c- .....'" 9
10
00
,.:..
0-
:g l\l c~
~'"
':'I
..,
':'I
S~
N
~g.
;:.. "I:
~
N
E! >
~ -« '" N
til
0-
on
V'.) .~ '"
'-
on
~ t!i o !.I
'0
0_ 00 00
~
u:
~ z~ 0 N
~ 0 10 c~ ~ ':'
~ 0- ...,
6 .....
0- 11'" .;, 10..,
~g. on
~
N
-« ~ on ~ ~ ..., ~
til
i
'-
o !.I
'0 ~
~
00 00
0-
z~ 0 N
...
8
8
'Wl
B
'S:
;g
'Wl
::s
'S:
;g
~
i&i :>it :!
(I')
~ ~
]
l IJ
II II j rl •
464 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
we obtain the values shown in the last row of the table. The
average magnitude of Sta 2 over the five subdivisions works out
to 6569·3.
Also
= 19644·1
Thus Sa2 accounts for about 33% of the total variation.
It should be pointed out that both in this and the previous
examples the estimates of the component of variance due to
differential bias are based on small numbers and, therefore, not
sufficiently reliable. They are presented here only for purposes
of illustration.
and its expected value and variance for a given set of hI, h2'
... , hk by
t
1L
'~1
h, (iL, + ii,) (58)
NON-SAMPLING ERRORS 465
and
k k
}i2 L:
t::l
ht (St/+ S/) + Z2 1::,., h, 8, ..2
(59)
It is seen from (58) that the conditional expected value of 5'" does
not equal f..L + ii, To the expression (59) we must, therefore,
add the square of the bias component and then take the expecta-
tion in order to obtain the variance of }'", We write
_
- E {
J
JIi 1:k
I=J
2'.
h, (StJ . \ S.)
'2
+ II112 1:A }
11101
h, S'a
2
(60)
~ L Pt (S'J
2
+ S/) + Z 1: PtSta
2
(61 )
E {t Gt - PI) + t)}2
IEl
(fL, u
=E {t (~ - p,Y +
1=1
(fL, 0.,)2
30
466 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
I: Z=7. l!1~!h-=P_J
k
= (/i, + ii,)2
1=1
I: %~-1' 1
'".""1
P1 ..!. (/i, + a,) (/i,' + iii')
N -h 1
N":'_-1 . h
I·
I:
t~-I=t'=l
P,P,' (/it + ii,) (/i" +
N-h
..... .
1 k
\ ' P, (/it + a,-- - 2
/i - a) ]
N-I h [ W
Ie 1
L
k
!L L
k k
!L
k
+ lL t=l
Pt (fLt + iit - fL - ii)2
k k
+Z L PtStr/ - ~ L
t_l
PtS tt1.
2
(64)
whence
k
V (Y ) =
..
~/
h
+ ('m - hi) W
'\' p,Stt1. 2
(66)
t_1
k
If we define St1.2 = }; PtStt1.2, we obtain for V (y,,) the same
,=1
expression as in (26), as we would in fact expect.
lOa.7 An Alternative Expression for the Variance of the Sample
Mean in Terms of tbe Covariance between Responses
Obtained by the Same Enumerator
Let SYI denote the covariance between observations recorded
by the same enumerator. Then, clearly
= E E {(x I + 1 + Ell -
0. fL - a) + 1 + E 1'1 - fL - Ci) Ij}
(XI' 0.
(61)
(68)
(69)
where
and
cp = S/ + S .. [~
2 - ~J + IL (c 1h + C2m + ('3 y'hm - Co)
(70)
where fI- is a Lagrangian constant.
Differentiating ¢> with respect to h, m and fI- and equating to
zero, we obtain
(71)
(72)
and
(73)
(74)
where
S.. 2
(75)
(76)
470 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
Party A B C D E F
_.-.--- .~--- .- _-
% Illiterate 88 75 90 87 62 95
% Economically independent 67 46 77 50 46 31
.~~--'--- .---~--." - . ---_ .. - -
Example 10.4
This relates to the data on acreage collected in the course of
a surprise check to which a reference has already been made
in Section 1.8. Acreage under crops in India is compiled by the
village accountants by noting the names of the crops field by
field in the course of their administrative duties. As all the fields
are surveyed and mapped and the area of each field (survey
number) is, therefore, accurately known, the total area under any
crop is obtained by simply adding the area of the fields growing
that crop.
472 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
TABLE 10.4
Comparison 0/ Crop-Acreages
Gross Area (Sq. ft.) as recorded hy
Name of Crop Difference Percentage
Village Statistical (2) - (3) over (3)
Accountants Staff
strengthening the supervision over the work of the field staff and
the conduct of similar checks in other parts of the country.
On analysing the data, it was found that Sa 2 did not exceed 5~~~
of the total variation in the case of any crop.
B. INCOMPLETE SAMPLES
10b.l The Problem
It is common experience that some of the units selected in the
sample do not respond, at least at the first attempt, and indeed
may not respond even after repeated attempts. Thus the selected
farmers or families may not be found at home at the first attempt
and some may refuse to co-operate with the interviewer even if
contacted at the second attempt. Persuasion and further attempts
are therefore invariably required for achieving completeness.
This, however, increases the cost of the survey. On the other hand.
estimates based on the incomplete sample may be biased. This
extent of incompleteness, called non-response, is sometimes so
large as to completely vitiate the estimate. The problem is
particularly important in interview surveys. In the following
section, we shall give the solution of the problem as first put
forward by Hansen and Hurwitz (1946), which consists in drawing
a sub-sample of non-respondents and enumerating it completely
through later attempts, the size of the total sample and that of
the sub-sample in the non-response group being so determined
as to give an unbiased estimate of the population value with the
desired precision at minimum cost.
10b.2 The Solution of the Problem of Incomplete Samples
We shall suppose that the population can be divided into two
classes, those who will respond at the first attempt and those
who will not. For convenience we shall call the two classes as
the response and non-response classes. If nl units in the sample
respond and n. do not, then we may regard nl a random sample
of the response class and nl a random sample of the non-response
class. Let hI denote the size of the sub-sample from n. to be
enumerated at the second attempt, such that
(81)
Further, let Nl and N. denote the sizes of the response and the
non-response classes in the population. Clearly, Nt and N. cannot
be known and can only be estimated from the sample. We have
Est. Nl = nlN
n
NON-SAMPLING ERRORS 479
and
(82)
(~3)
(84)
and
E (naY., I n) = E E E (nJ'., I hs. Yl> •••• y.,., n)
= E E (n 2}·., I nl. n)
= E (nJ'N, I n)
480 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
whence
E(j.,) = {NtYN1: N~N.}
(85)
= E {(J-' - )-, ) + n
tI N
n "(I)2
It~
_ "I'
"'l
>}2
+ 2n"- C-
V-V-) 'C-- 1 '-)} (86)
n -n - N )h, _',
so that
where
N,
N2 - I " (J'
W . i
-)'
. N,
)2
i=l
Hence
The value of the third term in (86) is clearly zero, Hence, from
(86), (87) and (89), we get
V(j') =
Ie
(In _ NI) 52 + f~-=-J
n
. NN z S.2
-
(90)
and
(93)
which reduces to
Hence
(94)
or
( V. + NS')1I = 1
nl
{N
SI + N
I
(j - 1) Sat }2 (96)
NON-SAMPLING ERRORS 483
82 + Z2 (f - 1)822
n ..... 82 (97)
Va + fir
Equations (94) and (97) thus provide the values of nand f required
to estimate the population mean with the desired standard error
at the minimum cost.
An example will serve to illustrate the method. Suppose the
response rate is 50% and S22 for the non-response group is 4/5
of that in the whole population. In other words,
Nt Nt
N =N
= 0·5
and
482
8 22 =
-:5
Ignoring the finite multiplier, we then obtain from equation (90),
the variance of the estimated mean as
(98)
To work out the cost of the survey let us assume that it costs
one rupee to contact a unit, four rupees to enumerate and
process information on that unit and eight rupees to enumerate
and process information on the unit in the non-response group.
The total cost of the survey is, therefore, given by
(99)
484 SAMPLING THEORY OF SURVEYS WITH APPLICATIONS
f n C
100 700 50
2 - 140 700 35
3 180 780 30
4 220 880 27
tJ " $
4pll" . . . .
BANGALORE
q.'r"__ ~
"
I
UNIVfRSITY LIBRARY,
DEC "to
i
J\C: filo.Y~
r.A flf;a2 III1U11' _ _.....
Regression). Ls
Ancillary inform~ti~ff!.- ~ and
Area sampling, 2 ,389. ( -- ~ -.
Asthana, R. S., 256, 284.
_ interval)
for means, 38-40.
_f~ns,47.
for ratio estimate, 158·~·160.
Consistent estimate, 143.
Ayachit, G. R., 146, 18 Corn-borer damage, estimate of. 343-349.
Correlation coefficient (st't' also Intra-
B class correlation), 142.
Cost functions
Bancroft, T. A., 347. in determining optimum allocation
Bias, 10-17. among strata, 86-89.
due to difficulty of identifying units, in determining optimum design for
14, 16-17. yield surveys, 86, 298-302.
due to inaccuracies in strata sizes, 109. in determining optimum distribution of
due to incomplete samples, 10, 478. sample among three stages, 311-312.
of enumerators, II, 13-14, 17,444-445. in determining optimum distribution of
in estimates from cluster samples, 265- sample betw"en two stages, 294-296,
267. 339-340. 363-364, 373.
in estimates from two-stage sampling, in determining optimum number of
318-322. enumerators, 468-470.
in estimation of the variance of in determining optimum probabilities
systematic samples, 431-433. of selection, 71, 366·~369, 373 ·376.
in ratio estimate and variance, 143-144, in determining sampling rate for non-
146-147, 166-170, 323. respondents, 478-484.
in determining optimum ~ize of
C : sampling unit, 257, 260.
in double sampling for strrtthlcation,
Census 117-118.
utilization for improving precision of CrOI'-cutting (see also Yield surveys), 2,
sample, 182-186. 7-10.
comparison with sampling method, D
453.
Cluster sampling, 5--6, 238-284. David, F. N., 23. 73.155,186,237.
notation, 239-240, 265, 268. Deming, W. E., II, 17.
estimates, 240, 265-267, 268. Double sampling
variances, 240, 247, 266-268, 269. for estimating strata sizes, 112-120.
estimates of variances, 250-251, with regression method, 223-231.
269-270.
comparison with systematic sampling, E
417-419.
comparison with two-stage sampling, Efficiency, 124.
285, 302-303. of cluster sampling, 240-252, 270-284.
efficiency of, 240-250, 270-284. of different estimates, 169-170, 174,
efficiency estimated from sample, 213-215, 272-284, 325-326, 329,
250-252. 343.
efficiency in terms of intra-class corre- of ratio estimate, 160-164, 186.
lation, 243-247, 270-272. of regression estimate, 220-223.
with probability of selection propor- of sampling systems with varyin,
tional to size. 268-284. probabilities, 272-284, 376-379.
488 INDEX
E-(contd.) K
of stratified sampling, 124, 126, 134, Kiser, C. V., 11, 17.
223, 352-357. Kishen, K., II, 17,485.
of sub-sampling designs, 302-303, Koop, J. c., 141.
376-379. Koshal, R. S., 125, 137.
Enumerators
bias, II, 13-14, 17,444-448. L
covariance of response to, 467-468.
errors (see Observational errors). Linked samples (see Replicated samples).
optimum number of, 468-470. Livestock, estimation of number of,
selection and assignment of, 447, 460, 161-164, 171-174, 215-220.
465, 473.
supervision of, 11,471--477.
variance of biases, 450, 453. M
Evans, W. D., 98, 137.
Madow, W. G. and L. H., 423,425,443.
F Mahalanobis, P. C., 255, 257, 284, 389,
416, 474, 475, 485.
Finite multiplier (finite correction factor), Markoff's theorem, 23, 154-156, 194.
29. Marks, E. S., 468, 484.
Fish, estimate of catch, 428--430, 440-443. Mauldin, W. P., 468, 484.
Fisher, R. A., 40, 73. Mean deviation, 28.
Forecasts of crop acreage, 182. Mean square error, 28.
Frame, 3, 238. Midzuno, H., 72, 73.
Models for measurement of observa-
G tional errors, 445-446.
Gurney, M., 167, 187. Moments, expected value of higher order
moments, 188-192.
Multi-purpose surveys, 2'7-264, 297-298.
H Multi-stage sampling (see Sub-sampling).
Hansen, M. H., 167, 187, 247, 284, 305,
358,399,416,444,468,478,484,485. N
Hasel, A. A., 204, 237, 428, 443.
Hendricks, W. A., 256, 284. Narain, R. D., 70, 71, 73, 235, 237, 385,
Horvitz, D. G., 70, 73, 412, '1116. 416.
Hurwitz, W. N., 167, 187, 247,284, 305, Neyman, J., 23, 42,73,86,137,155,186,
358, 399, 416, 468, 478, 484, 485. 237.
Hypergeometric distribution (see also Neyman allocation, 86-88.
Qualitative characters). Non-random methods of sampling, 10.
confidence limits for, 47. Non-response (see Incomp)f)te samples).
for two classes, 42--48. Non-sampling errors (see also Observa-
generalized, 49-54. tional errors), 10-13, 444.
mean value, 43-44. Normal distribution, 145, 203.
variance, 44--47.
o
I
Observational errors, 11-12, 444.
Incomplete samples, II, 12,478--484. control of, 473--477.
Indian Council of Agricultural Research, measurement of, 445-446.
8,9,17,389,416,428,443,472,484. notation, 445, 446, 447, 460.
Intra-class correlation estimation of population mean from
between units of a column, 421. observations subject to, 448, 461,
efficiency of cluster sampling in terms 464.
of, 243-247. variance of observations subject to,
example of negative value, 248-250. 448--452, 461, 465-467, 468.
non-circular serial correlation, 423. Optimum allocation
within first-stage units, 377. case of incomplete samples, 481--484.
within-stratum serial correlation, 425. in assigning units to enumerators,
468--470.
J in double sampling, 117-118.
in stratified sampling with simple mean
Jessen, R. J., 230, 255-259, 284. estimate, 86-90, 92-93, 95-100.
INDEX 489
O-(contd.) R-(contd.)
in stratified sampling with ratio esti- comparison with regre~sion method.
mate, 170. 220-222.
in sub-sampling in three stages, conditions for best unbiased linear
311-314. estimate, 154-158.
in sub-sampling in two stages, 293-297, confidence limits, 158·-160.
339-343, 363-366, 373-376. qualitative characters. 174--178.
in successive sampling, 235-237. with stratification, 166-174.
Osborne, J. G., 428, 443. optimum allocation, 170.
combination of estimates, 167--168,
p 170.
with eluster sampling. 267, 281.
Panse, V. G., 259, 284,321,357. with sub-sampling, 317. 323-329,
Partitional notation, 31··-33. 332-335. 369-376. 404 410.
Pattrrson, H. D., 235, 237. with varying probabilities of selection,
Plot size, 14-17,253,462-464. 179 -186. 369 376. 404-410.
Preliminary sample, 42, 95-100. Regression method. I (}.l. 237.
Probability of inclusion, 4-5, 24-·26, simple, 194-203.213-215,220-237.
65-71. notation. 193.
Probability sampling. 3, 10. estimate. 194·-196.
Probabilities of selection variance. 196-·198.
determination of optimum. 71, estimate of varian{'e, 198-201.
366-369, 373-376. expected value of variance, 201 -203.
equal,4-5. weighted, 204-220.
unequal estimate. 204-206.
at the i-th draw. 24-25. variance, 207-208.
proportional to size, 181-186, estimate of variance. 208--213.
268-284, 361. 368. comparison with simple regression
in single-stage sampling. 60-72, estimate, 213-215.
179-186, 268-284. comparison with simple random
in stratified sampling, 127-136. sampling, 221.
in sub-sampling, 358·-415. comparison with ratio estimate, 220,-
in sub-sampling with systematic 222.
sampling, 438-443. comparison with stratified sampling,
Proportion (see Hypergeometric distri- 222-223.
bution or Qualitative characters). double sampling. 223-231.
Purposive sampling methods, 10. successive sampling, 231-237.
Replacement
sampling with. 3. 62··-65, 127-136, 157,
Q 179,-1 M6, 193-198, 224-230, 26M-270,
358-362. 404-410, 438443.
Qualitative characters, 42-60. sampling without, 3, 20, 65-72, 1911,
combined with quantitative characters, 204-208, 379-385, 410-415.
54-60. Replacement of sample (see Successive
ratio estimates of. 174-178. sampling).
Quota sampling, 10. Replicated samples, 460, 473-477.
Response errors (see Observational
R errors).
Random numbers s
use of, 6-10.
table of, 18--19. Sampling error, 2, 28.
Ratio method Sampling variance, 28.
notation, 138--139, 179,369,405. for incomplete sample. 480--481,
estimates. 139, 166--168, 179, 181,267, in double sampling, 223, 228, 230.
317. 369. 405--406. in successive sampling, 232.
bias, 139--147. 166--168, 176. 178. 180. in systematic sampling, 420, 434--436,
variances, 146-154. 168-170. 177-178, 439-440.
180--181, 370-372, 407-409. of estimates subject to observational
estimates of variances. 150-151,181. errors, 448-452, 461, 465-467, 468.
comparison with simple mean estimate, of mean per element in cluster
160-161, 164. sampling, 240, 247, 266--269,
490 INDEX
S-·(contd.) S-(contd.)
~ Q APR aQw.
2 4 APR 2010
'-16 +o14~ J'
,?"
G1CVlC Lib!'u,