
IIMA PS II 2016

Practice problem set 1 : solutions

These problems are for your practice only. You do not need to turn them in, but it is strongly
advised that you go through each of them individually.

Q1. Kanhaiya. In a recent Mood of the Nation survey, India Today asked a sample of
15,815 randomly selected respondents, "Do you think that the Kanhaiya Kumar episode has
negatively affected the image of JNU?" In answer to this, 65% of the respondents answered
in the affirmative. For this study, identify the following:

a) Population: adult (educated) population of India.


b) Parameter: true (unknown) % of adult (educated) Indians who believe that the Kanhaiya
Kumar episode has negatively affected the image of JNU.
c) Subject/ Sampling unit: each of the 15,815 individuals who participated in the survey.

d) Sample: 15,815 individuals who participated in the survey.

e) Variable : opinion about the question (yes/no).

f) Statistic: sample proportion of affirmative responses (65%).

Q2. Identify the relevant underlying population and suggest an appropriate sampling scheme
that should be used to estimate
a) Proportion of adult IIMA community members who exercise regularly.

Population : All IIMA community members above a certain age.

Sampling scheme : Each section of the IIMA community (different groups of students,
faculty and their families, groups of staff members) can be treated as a cluster
because exercise habits should not depend on the section of the community. A simple
random sample of clusters can be selected (out of the above clusters) and the proportion of
members who exercise regularly (in each selected cluster) can be calculated.

b) Average number of hours/week an IIMA student studies outside of class.

Population : All current PGP I, II, X, FPM, FDP & AFP students of IIMA.

Sampling scheme : Use stratified sampling (with proportional allocation) from each of the
above groups, i.e. first decide on the sample size to be drawn from each of the above groups
using the proportional allocation rule and then select a simple random sample of that
size from each group.
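
The two steps of this scheme (proportional allocation, then a simple random sample within each group) can be sketched in Python. This is only an illustrative sketch: the group sizes and the total sample size of 100 are hypothetical figures, not actual IIMA enrolment numbers, and students are represented simply by index numbers.

```python
import random

# Hypothetical enrolment per programme (invented for illustration only).
group_sizes = {
    "PGP I": 400, "PGP II": 400, "PGP X": 100,
    "FPM": 60, "FDP": 20, "AFP": 20,
}
total_sample = 100  # hypothetical overall sample size

population_size = sum(group_sizes.values())

# Step 1: proportional allocation. Each group's sample size is
# proportional to its share of the population.
allocation = {
    group: round(total_sample * size / population_size)
    for group, size in group_sizes.items()
}

# Step 2: simple random sample (without replacement) within each group.
# Students in a group are labelled 0, 1, ..., size - 1.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = {
    group: rng.sample(range(group_sizes[group]), allocation[group])
    for group in group_sizes
}

for group in group_sizes:
    print(group, allocation[group], sorted(samples[group]))
```

With these particular figures the allocations are whole numbers; in general the rounded allocations may need a small adjustment so that they add up to the intended total.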
c) The proportion of non-vegetarians in Ahmedabad.

Population : All current residents of Ahmedabad.

Sampling scheme : Use stratified sampling (with proportional allocation, if possible)
based on religion/caste/ethnicity since those may have an effect on food habit. Select a
random sample from each stratum and calculate the proportion of non-vegetarian adults in
each sample.

d) Proportion of female faculty members in Indian universities.

Population : All current faculty members of Indian universities.

Sampling scheme : Use stratified sampling (with proportional allocation, if possible)
based on type of university/discipline since those may have an effect on the gender
composition of the faculty. Select a random sample of faculty members from each stratum
and calculate the proportion of female faculty members in each sample.

Q3. Suppose you want to know the proportion of IIMA PGP students who went home during
the last weekend. Accordingly you select your close friends in your section and ask them
about their plans.

a) What kind of study are you conducting?

i) Experiment
ii) Simple random sampling
iii) Observational Study
iv) Stratified random sampling
v) Convenience study

b) What can be a possible source of bias in your study? Explain.


i) Sampling bias
ii) Non-response bias
iii) Response bias
iv) All of the above

Q4. For each of the following studies, explain whether an experiment or observational study
would be more appropriate and also identify the response and predictor/s

a) Whether or not smoking has an effect on coronary heart disease

Type of study : an observational study would be more appropriate because it is
unethical to randomize subjects into smoking and non-smoking groups and follow
them up over time to check/compare the proportion of those affected with coronary
heart disease. Rather, a random sample of patients (with coronary heart disease)
and healthy subjects can be interviewed with respect to their smoking history.
Response : whether someone has coronary heart disease.
Predictor/s : whether someone is a smoker.
b) Whether class X scores tend to be positively associated with CAT scores

Type of study : an observational study would be more appropriate since it is not
feasible to randomize students to higher and lower class X score categories and
follow them up to check their CAT scores. Rather, it would be more realistic to
select a random sample of students, record their CAT and class X scores to check
for any association between the two.
Response : CAT scores.
Predictor/s : class X scores.

c) Whether or not a special coupon attached to the outside of a catalogue makes recipients
more likely to order products from a mail-order company

Type of study : an experimental study can be conducted in which a random
sample of subjects is mailed the catalogue with the coupon and another random
group is mailed the catalogue without the coupon. The proportion in each
group who order the products can then be compared.
Response : whether someone orders products from the company.
Predictor/s : whether someone has the coupon.

d) Whether longer hours spent on Facebook tend to be associated with lower grades

Type of study : an observational study would be more appropriate since it is not
realistic to randomize students into different Facebook-use categories and follow
them up to check their grades. Rather, it would be more realistic to select a random
sample of students, record the number of hours/minutes they browse Facebook and
their grades and analyse whether these are associated (or not).
Response : grades.
Predictor/s : number of hours spent on Facebook.

e) Whether women working in brick kilns are more prone to give birth to infants with birth
defects compared to women from the general population.

Type of study : an observational study would be more appropriate since it is
absurd and impossible to randomize women into a brick-kiln group and a non-
brick-kiln group and compare the rate of birth defects of children born to them.
Rather, it would be more realistic to select random samples of women from the
brick kilns and from the general population and compare the rate of birth defects
of children born to them.
Response : whether a child has a birth defect.
Predictor/s : whether a woman works in a brick kiln.

Q5. Suppose following are the values in some population: 5, 27, 4, 17, 4.5, 19, 2, 11, 3, 6, 13,
18. A sample of size 4 is taken, and is observed to be 3, 4, 4.5, 2. Is it most likely to be (a) a
simple random sample, (b) a stratified sample or (c) a clustered sample? Give reason for your
answer.
A stratified sample should have units from EVERY stratum; a cluster sample should have ALL
units from the sampled clusters (each of which should be heterogeneous).

Hence (a) is the correct answer, as under either (b) or (c) there would be more variation in the
sample. In a cluster sample, the chosen cluster(s) would have more variability; in
stratified sampling, the variation between the strata should be high and therefore the sample
would be heterogeneous.
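
The argument rests on how little the observed sample varies relative to the population. As a rough check, the spread of each can be computed directly from the values given in the question. The choice of range and standard deviation as measures of variation is an assumption made here for illustration.

```python
import statistics

# Values given in the question
population = [5, 27, 4, 17, 4.5, 19, 2, 11, 3, 6, 13, 18]
sample = [3, 4, 4.5, 2]

# Range: difference between the largest and smallest value
population_range = max(population) - min(population)
sample_range = max(sample) - min(sample)

# Standard deviation: population formula for the full population,
# sample formula (n - 1 in the denominator) for the sample
population_sd = statistics.pstdev(population)
sample_sd = statistics.stdev(sample)

print("range:", population_range, "vs", sample_range)
print("sd   :", round(population_sd, 2), "vs", round(sample_sd, 2))
```

The sample covers only a small part of the population's range, which is the kind of homogeneity that a well-designed stratified or cluster sample would not be expected to show.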

Q6. IIMA income : An agency wants to estimate the average monthly income of IIMA
employees. The agency designs the following sampling plan to get the estimate.

1. Divide the IIMA employees into five different groups: senior faculty, junior faculty,
officers, supervisors, and contract workers. The number of people from whom the
information is collected in each category is proportional to the share of the employees
in that category.

2. On one weekday morning, the agency surveyors go around all the IIMA offices and
collect income information from employees of each category until they reach the
specified number for that category.

(a) In this plan, is it a good idea to classify the employees into different groups? Why or Why
not ?

Yes, stratification by employee category makes sense because average salaries are expected to
vary substantially across employee categories. However, more groups/strata may be necessary
since, for example, income of officers may vary quite a bit based on rank and/or seniority.

(b) What kind of a sample is this ? Is this sample likely to give the agency a reasonable
estimate of the mean income ? Why or Why not ? Can you think of a better sampling plan than
this ?

This is a stratified sample but not randomly selected from each stratum. The sample may be
biased, for example, towards faculty who are likely to come to office in the mornings (it does
not take into account those employees who come late to the office and hence suffers from
undercoverage). Basically, the above sampling process does not ensure that every employee in
each category is equally likely to be in the sample.

(c) Someone recommends to the agency that they randomly select one of the employee categories
and sample everyone in that category to estimate the mean income. Would this plan make
sense to you?

Not at all; this is an example of cluster sampling (with very bad clusters since strata are being
treated as clusters). It does not make sense because any one cluster of this kind will not
represent the entire population of employees.
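
To see why a single employee category represents the whole population poorly, consider a small numerical sketch. All income figures below are hypothetical (in thousand rupees per month) and were invented purely for illustration; they are not IIMA data.

```python
import statistics

# Hypothetical monthly incomes by category (invented for illustration only).
incomes = {
    "senior faculty": [300, 320, 310],
    "junior faculty": [200, 210, 190],
    "officers": [100, 120, 110, 90],
    "supervisors": [60, 70, 65, 55],
    "contract workers": [20, 25, 15, 30, 20, 10],
}

# True mean income over all employees
all_incomes = [x for group in incomes.values() for x in group]
population_mean = statistics.mean(all_incomes)

# Estimate obtained if one category is picked as the "cluster"
# and everyone in it is surveyed
cluster_estimates = {
    category: statistics.mean(values) for category, values in incomes.items()
}

# How far each single-category estimate is from the true mean
errors = {
    category: abs(estimate - population_mean)
    for category, estimate in cluster_estimates.items()
}

print("population mean:", population_mean)
for category in incomes:
    print(category, cluster_estimates[category], errors[category])
```

In this made-up example every single-category estimate misses the overall mean, and which category happens to be drawn decides whether the estimate is far too high or far too low.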
Q7. Read the news articles in the links below and answer the questions that follow :

a) http://www.bbc.com/news/education-28327921

Type of study/Sampling design : observational study (here the observations/test scores of
200 people are being noted; it would have been experimental if, for example, 100 people
were forced to wake up early and a similar number of people were asked to stay up late and
their behavioural patterns were observed/measured over a time period).

Sampling frame: All adult individuals (not too clear though).

Sample size: 200.

Biases/Comments: would have been more interesting if people from different occupations
were examined (since occupations often influence our working hours).

b) http://www.bbc.com/news/health-28546656

Type of study/Sampling design : observational study (people were just questioned about their
alcohol consumption).
Sampling frame : all middle-aged adults in the U.S.
Sample size : 6500.
Biases/Comments : additional information on the amount of alcohol consumed could have
been collected too (just to check whether it has any confounding effect on the perception of
abuse).

c) http://www.bbc.com/news/health-18856658

Type of study/Sampling design : observational study (the women's health status was
tracked/observed over a period of time; it would have been an experiment if one group of women
were asked to drink alcohol and another group asked to abstain and their degree of
cognitive impairment were measured).
Sampling frame : all elderly women in the U.S.
Sample size : 1300.
Biases/Comments : no comments as such.

d) http://www.bbc.com/news/science-environment-28512781

Type of study/Sampling design : experiment (this is an experiment on quantifying facial
features).
Sampling frame : set of all possible facial types.
Sample size : 1000 facial textures.
Biases/Comments : no comments as such.

e) http://www.bbc.com/future/story/20140729-is-it-bad-to-bottle-up-anger

Type of study/Sampling design : experiment (patients given questionnaires and followed up;
health status was matched with their responses in the questionnaires).
Sampling frame : all patients with heart ailments.
Sample size : 13,000.
Biases/Comments : no comments as such.

Q8. Raghuram Rajan. Suppose in the context of Raghuram Rajan's decision to quit as RBI
governor in September, NDTV would like to conduct a poll to gauge the mindset of urban
Indian youths in the age group 20-25. The question that would be asked is "Do you think
that Raghuram Rajan's decision to quit as RBI governor would negatively impact the Indian
economy in the long run?". The permissible margin of error (of the resulting sample
percentage) is decided to be 3%. Then

a) How many urban youths should the poll sample if NDTV has no idea of the true
population size (i.e total number of urban youths in India).

Ans : Here, $.03 = 1/\sqrt{n}$, i.e. $n = (1/.03)^2 \approx 1112$. Thus the poll should sample 1112 urban
youths approximately.
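
The arithmetic can be checked with a short Python sketch. It assumes the conservative rule used in the answer, margin of error = 1/sqrt(n), and rounds up because a sample size must be a whole number; the variable names are illustrative.

```python
import math

margin_of_error = 0.03

# Solve margin_of_error = 1 / sqrt(n) for n
n_exact = (1 / margin_of_error) ** 2

# Round up so the margin of error does not exceed the target
n_required = math.ceil(n_exact)

print(round(n_exact, 2), n_required)
```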

b) How many urban youths should be sampled if NDTV assumes that the total number of
urban youths in India (in the age group 20 - 25) is 10 lakh.

Ans : In this case, $.03 = \frac{1}{\sqrt{n}} \sqrt{\frac{1000000 - n}{1000000 - 1}}$, which implies $.0009 = \frac{1000000/n - 1}{999999}$, i.e.
$900.999 = 1000000/n$, and hence $n = \frac{1000000}{900.999} \approx 1109.88 \approx 1110$.
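
The same calculation can be sketched in Python. Squaring the finite-population formula gives e^2 = (1/n)(N - n)/(N - 1), which rearranges to n = N / (1 + e^2 (N - 1)), where e is the margin of error and N the population size. The function name below is hypothetical and chosen only for this illustration.

```python
import math


def sample_size_finite(margin_of_error, population_size):
    """Sample size n solving e = (1/sqrt(n)) * sqrt((N - n) / (N - 1))."""
    e_squared = margin_of_error ** 2
    return population_size / (1 + e_squared * (population_size - 1))


# 10 lakh = 1,000,000 urban youths, 3% margin of error
n_exact = sample_size_finite(0.03, 1_000_000)
n_required = math.ceil(n_exact)  # round up to a whole respondent

print(round(n_exact, 2), n_required)
```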

c) Comment on the difference in the sample sizes obtained in (a) and (b) above.

Ans : the difference in sample sizes between (a) and (b) is negligible. The sample size in
(b) is slightly lower than in (a) because (b) is a finite population scenario while (a) is an
infinite population one. However, since a population of 10 lakh can be treated as infinite for
all practical purposes, the difference in sample sizes is negligible.
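
One way to see why the two answers nearly coincide is to write the finite-population sample size in closed form and let the population size grow. The symbols $e$ (margin of error) and $N$ (population size) are notation introduced here for this sketch.

```latex
% Finite-population rule: e = \frac{1}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}
% Squaring and solving for n:
\[
  e^2 = \frac{1}{n} \cdot \frac{N - n}{N - 1}
  \quad\Longrightarrow\quad
  n = \frac{N}{1 + e^2 (N - 1)} = \frac{1}{e^2 + \dfrac{1 - e^2}{N}} .
\]
% As N grows, the second term in the denominator vanishes:
\[
  \lim_{N \to \infty} n = \frac{1}{e^2} .
\]
% With e = .03 and N = 1000000: n \approx 1109.88, against 1/e^2 \approx 1111.11.
```

For $e = .03$ and $N = 1000000$ the two values differ by roughly 0.1%, so the finite population correction has almost no effect here.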

Q9. Dog bites @ IIMA. Because of the recent spate of dog-bites on the IIMA campus, you
want to design a survey to estimate the proportion of the IIMA community who favour the
proposal of complete eviction of dogs from the campus. Accordingly you walk the length and
breadth of the campuses, both old and new, on a nice Saturday afternoon asking anyone and
everyone you meet, "Do you disagree that the complete eviction of dogs from the campus
may have a significant negative impact on security issues?" Based on the feedback of the 125
people you met, you report that 82% of the entire IIMA community are in favour of the
proposal.

Identify how/why/whether the following may be of concern:

a) Sampling design: Not a simple random sample but a convenience sample; hence there is
sampling bias.

b) Undercoverage: Here the sampling frame consists of only those members of the IIMA
community who tend to be outdoors (but on campus) on a Saturday afternoon. Hence a large
section of the IIMA community is not being considered.

c) Nonresponse bias: Unlikely, because this is a face-to-face interview and there seems to be
no reason for the interviewees to refuse to participate.

d) Response bias: Quite likely, since the question is worded in a very complicated manner
(it asks whether the respondent disagrees with a negative statement). Hence the responses may
be artificially biased in a particular direction (positive or negative).
