These problems are for your practice only. You do not need to turn them in, but it is strongly
advised that you work through each of them individually.
Q1. Kanhaiya. In a recent Mood of the Nation survey, India Today asked a sample of
15,815 randomly selected respondents, "Do you think that the Kanhaiya Kumar episode has
negatively affected the image of JNU?" In answer to this, 65% of the respondents answered
in the affirmative. For this study, identify the
Q2. Identify the relevant underlying population and suggest an appropriate sampling scheme
that should be used to estimate
a) Proportion of adult IIMA community members who exercise regularly.
Sampling scheme : Each section of the IIMA community (different groups of students,
faculty and their families, groups of staff members) can be treated as a cluster
because exercise habits should not depend on the section of the community. A simple
random sample of clusters can then be selected, and the proportion of members who
exercise regularly can be calculated within each sampled cluster.
Population : All current PGP I, II, X, FPM, FDP & AFP students of IIMA.
Sampling scheme : Use stratified sampling (with proportional allocation) from each of the
above groups, i.e., first decide on the sample size to be drawn from each group
using the proportional allocation rule, and then select a simple random sample of that
size from each group.
c) The proportion of non-vegetarians in Ahmedabad.
Q3. Suppose you want to know the proportion of IIMA PGP students who went home during
the last weekend. Accordingly you select your close friends in your section and ask them
about their plans.
i) Experiment
ii) Simple random sampling
iii) Observational Study
iv) Stratified random sampling
v) Convenience study
Q4. For each of the following studies, explain whether an experiment or observational study
would be more appropriate and also identify the response and predictor/s
c) Whether or not a special coupon attached to the outside of a catalogue makes recipients
more likely to order products from a mail-order company
d) Whether longer hours spent on Facebook tend to be associated with lower grades
e) Whether women working in brick kilns are more prone to give birth to infants with birth
defects compared to women from the general population.
Q5. Suppose the following are the values in some population: 5, 27, 4, 17, 4.5, 19, 2, 11, 3, 6, 13,
18. A sample of size 4 is taken, and is observed to be 3, 4, 4.5, 2. Is it most likely to be (a) a
simple random sample, (b) a stratified sample or (c) a clustered sample? Give reason for your
answer.
A stratified sample should have units from EVERY stratum; a cluster sample should have ALL
units from the sampled clusters (each of which should be heterogeneous).
Hence (a) is the correct answer, since under either (b) or (c) the sample would show more
variation than is observed here. In a cluster sample the chosen cluster(s) would be internally
variable; in a stratified sample the variation between strata should be high, so the sample
would be heterogeneous.
Q6. IIMA income : An agency wants to estimate the average monthly income of IIMA
employees. The agency designs the following sampling plan to get the estimate.
1. Divide the IIMA employees into five different groups: senior faculty, junior faculty,
officers, supervisors, and contract workers. The number of people from whom the
information is collected in each category is proportional to the share of the employees
in that category.
2. On one weekday morning, the agency surveyors go around all the IIMA offices and
collect income information from employees of different categories until they reach the
specified number for each category.
(a) In this plan, is it a good idea to classify the employees into different groups? Why or why
not?
Yes, stratification by employee category makes sense because average salaries are expected to
vary substantially across employee categories. However, more groups/strata may be necessary
since, for example, income of officers may vary quite a bit based on rank and/or seniority.
(b) What kind of a sample is this? Is this sample likely to give the agency a reasonable
estimate of the mean income? Why or why not? Can you think of a better sampling plan than
this?
This is a stratified sample, but the units are not randomly selected from each stratum (in
effect it is a quota sample). The sample may be biased, for example, towards faculty who are
likely to come to office in the mornings; it does not take into account those employees who
come late to the office and hence suffers from undercoverage. Basically, the above sampling
process does not ensure that every employee in each category is equally likely to be in the
sample. A better plan would be to draw a simple random sample of the required size from the
list of employees in each category.
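The better plan asked for in (b), a simple random sample within each category with proportional allocation, could be sketched roughly as follows. The head counts and the total sample size of 50 are hypothetical (they are not given in the problem), and the largest-remainder rounding is just one reasonable way to make the allocations add up.

```python
import random

def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    raw = {k: total_sample * n / population for k, n in stratum_sizes.items()}
    alloc = {k: int(r) for k, r in raw.items()}  # round down first
    # Hand out the units lost to rounding, largest fractional part first.
    leftover = total_sample - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

def stratified_sample(frames, total_sample, seed=None):
    """Draw a simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)
    sizes = {k: len(units) for k, units in frames.items()}
    alloc = proportional_allocation(sizes, total_sample)
    return {k: rng.sample(frames[k], alloc[k]) for k in frames}

# Hypothetical head counts per employee category (assumed for illustration).
sizes = {
    "senior faculty": 40,
    "junior faculty": 60,
    "officers": 100,
    "supervisors": 50,
    "contract workers": 250,
}
# Hypothetical sampling frame: one label per employee in each category.
frames = {k: [f"{k} #{i}" for i in range(n)] for k, n in sizes.items()}

alloc = proportional_allocation(sizes, total_sample=50)
sample = stratified_sample(frames, total_sample=50, seed=1)
print(alloc)
```

Unlike the walk-around plan, every employee within a category has the same chance of selection here, regardless of when they come to the office.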
(c) Someone recommends to the agency that they randomly select one of the employee categories
and sample everyone in that category to estimate the mean income. Would this plan make
sense to you?
Not at all; this is an example of cluster sampling (with very bad clusters since strata are being
treated as clusters). It does not make sense because any one cluster of this kind will not
represent the entire population of employees.
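To see why a single category cannot stand in for the whole population, here is a small deterministic sketch. The head counts and mean monthly incomes are entirely hypothetical, chosen only to make the gap between any one category and the overall mean visible.

```python
# Hypothetical (head count, mean monthly income in rupees) per category.
# These figures are assumed for illustration; they are not from the problem.
groups = {
    "senior faculty": (40, 250_000),
    "junior faculty": (60, 150_000),
    "officers": (100, 80_000),
    "supervisors": (50, 50_000),
    "contract workers": (250, 20_000),
}

def population_mean(groups):
    """Overall mean income: group means weighted by head count."""
    total = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total

def single_cluster_errors(groups):
    """Error made if one whole category is used as the estimate."""
    mu = population_mean(groups)
    return {name: mean - mu for name, (_, mean) in groups.items()}

mu = population_mean(groups)
errors = single_cluster_errors(groups)
for name, err in errors.items():
    print(f"{name}: estimate is off by {err:+,.0f}")
```

Under these assumed figures, whichever category is picked, the estimate misses the overall mean by a wide margin, which is the sense in which strata make very bad clusters.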
Q7. Read the news articles in the links below and answer the questions that follow:
a) http://www.bbc.com/news/education-28327921
Biases/Comments: would have been more interesting if people from different occupations
were examined (since occupations often influence our working hours).
b) http://www.bbc.com/news/health-28546656
Type of study/Sampling design : observational study (people were just questioned about their
alcohol consumption).
Sampling frame : all middle-aged adults in the U.S.
Sample size : 6500.
Biases/Comments : additional information on the amount of alcohol consumed could have
been collected too (just to check whether it has any confounding effect on the perception of
abuse).
c) http://www.bbc.com/news/health-18856658
Type of study/Sampling design : observational study (the women's health status was
tracked/observed over a period of time; it would have been an experiment if one group of women
had been asked to drink alcohol and another group asked to abstain, and their degree of
cognitive impairment had then been measured).
Sampling frame : all elderly women in the U.S.
Sample size : 1300.
Biases/Comments : no comments as such.
d) http://www.bbc.com/news/science-environment-28512781
e) http://www.bbc.com/future/story/20140729-is-it-bad-to-bottle-up-anger
Type of study/Sampling design : experiment (patients given questionnaires and followed up;
health status was matched with their responses in the questionnaires).
Sampling frame : all patients with heart ailments.
Sample size : 13,000.
Biases/Comments : no comments as such.
Q8. Raghuram Rajan. Suppose, in the context of Raghuram Rajan's decision to quit as RBI
governor in September, NDTV would like to conduct a poll to gauge the mindset of urban
Indian youths in the age group 20-25. The question that would be asked is "Do you think
that Raghuram Rajan's decision to quit as RBI governor would negatively impact the Indian
economy in the long run?" The permissible margin of error (of the resulting sample
percentage) is decided to be 3%. Then
a) How many urban youths should the poll sample if NDTV has no idea of the true
population size (i.e total number of urban youths in India).
Ans : Here, $0.03 = 1/\sqrt{n}$, i.e. $n \ge (1/0.03)^2 \approx 1111.1$, which rounds up to 1112.
Thus the poll should sample approximately 1112 urban youths.
b) How many urban youths should be sampled if NDTV assumes that the total number of
urban youths in India (in the age group 20 - 25) is 10 lakh.
Ans : In this case, $0.03 = \frac{1}{\sqrt{n}} \sqrt{\frac{1000000 - n}{1000000 - 1}}$, which implies
$0.0009 = \frac{1000000/n - 1}{999999}$, i.e. $900.999 = 1000000/n$, and hence
$n = \frac{1000000}{900.999} = 1109.88 \approx 1110$.
c) Comment on the difference in the sample sizes obtained in (a) and (b) above.
Ans : The difference in sample sizes between (a) and (b) is negligible. The sample size in
(b) is slightly lower than in (a) because (b) is a finite population scenario, in which the
finite population correction reduces the margin of error, while (a) is an infinite population
one. However, since a population of 10 lakh can be treated as infinite for all practical
purposes, the difference in sample sizes is negligible.
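The arithmetic in (a) and (b) can be checked with a short sketch. It assumes the same conservative margin-of-error formula used in the answers, margin = 1/sqrt(n), multiplied by the finite population correction sqrt((N - n)/(N - 1)) when the population size N is known. The function names are illustrative only.

```python
import math

def sample_size_infinite(margin):
    """Smallest n with 1/sqrt(n) <= margin (population size unknown)."""
    return math.ceil(1 / margin**2)

def sample_size_finite(margin, population):
    """Smallest n with (1/sqrt(n)) * sqrt((N - n)/(N - 1)) <= margin.

    Squaring and solving for n gives n = N / (margin^2 * (N - 1) + 1).
    """
    return math.ceil(population / (margin**2 * (population - 1) + 1))

n_a = sample_size_infinite(0.03)           # part (a)
n_b = sample_size_finite(0.03, 1_000_000)  # part (b), N = 10 lakh
print(n_a, n_b)
```

Trying a much smaller population in `sample_size_finite` shows when the correction starts to matter: the required sample size drops noticeably only once N is no longer large relative to n.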
Q9. Dog bites @ IIMA. Because of the recent spate of dog-bites on the IIMA campus, you
want to design a survey to estimate the proportion of the IIMA community who favour the
proposal of complete eviction of dogs from the campus. Accordingly you walk the length and
breadth of the campuses, both old and new, on a nice Saturday afternoon asking anyone and
everyone you meet, "Do you disagree that the complete eviction of dogs from the campus
may have a significant negative impact on security issues?" Based on the feedback of the 125
people you met, you report that 82% of the entire IIMA community are in favour of the
proposal.
a) Sampling design: Not a simple random sample; it is actually a convenience sample, so there
is sampling bias.
b) Undercoverage: Here the sampling frame consists of only those members of the IIMA
community who tend to stay outdoors (but in campus) on Saturday. Hence a large section of
the IIMA community is not being considered.
c) Nonresponse bias: Unlikely, because this is a face-to-face interview and there seems to be
no reason for the interviewees to refuse to participate.
d) Response bias: Quite likely since the question is worded in a very complicated manner.
Hence the responses may be artificially biased towards a particular direction (positive or
negative).