The word population is used to signify the aggregate from which the sample
is to be selected and to which inferences are to be made. The choice of population may
be clear and present few problems for some surveys, while for others difficult decisions
may be required to handle borderline cases. Two specific forms of the population should
be identified and defined by survey planners. The target population is the aggregate to
which inferences will be made or about which information is wanted. The sampling
population is the aggregate from which the sample will actually be selected. Survey
practicable to the target population. Sometimes, for reasons of cost, timing, or feasibility,
the sampling population may be more restrictive than the target population. If so,
statistical conclusions drawn from the sample apply strictly to the sampling population
and judgement is required to extend the conclusions to the broader target population.
about the nature of differences between the sampling and target populations.
Key parameters of the population. The objectives of the survey must be translated into
quantities are called parameters of the population. For example, in a labor force survey,
one of the key parameters may be the unemployment rate, defined as the total number of
unemployed persons divided by the total number of persons in the labor force (excluding
children and retirees). The key parameters in most surveys include population means of
survey variables, population totals of survey variables, ratios of totals, and regression
coefficients that describe the relationship between different survey variables. A given
survey may be faced with hundreds or thousands of population parameters. The survey
researcher will usually identify a small number of key parameters of the population and
then will make most of the important planning decisions about the survey with these key
parameters in mind.
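As a minimal sketch of the kinds of parameters named above, the Python below computes a population total, a population mean, and a ratio of totals (the unemployment rate) for a tiny hypothetical population. The records, field names, and values are all invented for illustration; a real population would never be fully observed this way.

```python
# Hypothetical population of six persons; every value below is invented.
population = [
    {"in_labor_force": 1, "unemployed": 0, "weekly_hours": 40},
    {"in_labor_force": 1, "unemployed": 0, "weekly_hours": 35},
    {"in_labor_force": 1, "unemployed": 1, "weekly_hours": 0},
    {"in_labor_force": 1, "unemployed": 0, "weekly_hours": 20},
    {"in_labor_force": 1, "unemployed": 0, "weekly_hours": 45},
    {"in_labor_force": 0, "unemployed": 0, "weekly_hours": 0},  # retiree, outside the labor force
]


def population_total(records, key):
    """Sum one survey variable over every element of the population."""
    return sum(r[key] for r in records)


def population_mean(records, key):
    """Population total divided by the number of elements."""
    return population_total(records, key) / len(records)


def ratio_of_totals(records, numerator_key, denominator_key):
    """Ratio of two population totals, e.g. unemployed / labor force."""
    return population_total(records, numerator_key) / population_total(records, denominator_key)


total_hours = population_total(population, "weekly_hours")
mean_hours = population_mean(population, "weekly_hours")
unemployment_rate = ratio_of_totals(population, "unemployed", "in_labor_force")
print(total_hours, mean_hours, unemployment_rate)
```

Note that the unemployment rate is a ratio of two totals rather than a simple mean: the retiree is excluded by the denominator, not by dropping the record.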
Sampling frame and sampling unit. Before selecting the sample, the population must be
divided into parts that are called sampling units. The units must cover the whole of the
sampling population. Usually, the units do not overlap one another and every element of
the sampling population belongs to a unit. For example, in a labor force survey, the
objects or elements of the survey are persons; the population is divided into households;
all persons belong to a household; and households usually do not overlap. To enable the
selection of a sample of units, the survey researcher must obtain or develop a list of all
sampling units in the sampling population. This list is called the sampling frame. A
good sampling frame is usually hard to come by. In some simple cases, a list may be
readily available of the elements of the population to be measured. For example, if a
09/26/11
Kirk Wolter
Statistics 331
University of Chicago
American Bar Association or from state licensing boards. In other surveys, choices may
have to be made about the nature of the sampling unit, and such choices are guided by the
Degree of precision sought, cost, and timing. The results of the sample survey are always
subject to uncertainty or error because only a part of the population is measured and
because of errors of measurement. The uncertainty can be reduced by taking a larger
often comes with the price of increased time and money. Absent any constraint on time
and money, the survey researcher would simply enumerate the entire sampling population
and use a superior instrument. In the real world, however, the survey researcher will
always be faced with time and money constraints. Thus, an important step is the
Sampling design and implementation. The term sampling design refers to the methods
of sampling that are to be used in actually selecting the sample. Hundreds, thousands, or
millions of specific samples may be realized given a defined sampling design. The term
sampling implementation refers to the actual selection, given the sampling design, of
the one realized sample that will be employed in the sample survey. Important surveys
with public policy, business, or scientific consequences riding on the survey results
almost universally employ sampling designs that involve probability or random methods
of sample selection in which each element of the population has a known and nonzero
probability of selection into the actual sample. Random methods guard against selection
biases.
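The distinction between design and implementation can be sketched in a few lines of Python. The example assumes the simplest probability design, simple random sampling without replacement, in which every unit has the same known inclusion probability n/N. The frame of 200 households, the sample size, and the seed are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n units without replacement.

    Under this design every unit in the frame has the same known,
    nonzero inclusion probability n / len(frame).
    """
    rng = random.Random(seed)
    sample = rng.sample(frame, n)
    inclusion_probability = n / len(frame)
    return sample, inclusion_probability


# Hypothetical sampling frame: a list of 200 household identifiers.
frame = [f"household-{i:03d}" for i in range(1, 201)]

# The design (SRS of 20 from 200) allows a vast number of possible samples;
# the seeded draw below is the implementation that yields the one realized sample.
sample, pi = simple_random_sample(frame, n=20, seed=331)
print(len(sample), pi)
```

Fixing the seed makes the realized sample reproducible, which is useful when the selection must be documented or audited.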
The sampling design may involve a number of features. It may involve stratification,
in which the sampling frame is partitioned into a number of parts, called strata, and a
sample is selected within each stratum. It may involve one or more stages of selection
of clusters of sampling units within each stratum, such as the selection of counties within
stratum, city blocks within county, housing units within city block, and persons within
housing unit. It may involve the selection of sampling units with equal or unequal
probabilities. It may involve the selection of sampling units in one or more phases,
such as a two-phase design in which certain information that may be obtained with low
cost is collected from a large first-phase sample and other information that entails high
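The stratification feature described above can be sketched as follows. This is an illustration under stated assumptions, not a production sampler: the frame of 100 counties, the two strata ("urban" and "rural"), and the per-stratum sample sizes are all hypothetical. Because the sampling fractions differ by stratum, the example also shows selection with unequal probabilities.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Partition the frame into strata and draw an SRS within each stratum.

    Returns the sample by stratum and each stratum's inclusion probability n_h / N_h.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample, probabilities = {}, {}
    for name, units in strata.items():
        n_h = min(n_per_stratum[name], len(units))
        sample[name] = rng.sample(units, n_h)
        probabilities[name] = n_h / len(units)
    return sample, probabilities


# Hypothetical frame: 30 urban counties and 70 rural counties.
frame = [(f"county-{i}", "urban" if i <= 30 else "rural") for i in range(1, 101)]

sample, probabilities = stratified_sample(
    frame,
    stratum_of=lambda unit: unit[1],
    n_per_stratum={"urban": 6, "rural": 7},
    seed=331,
)
print(probabilities)
```

Here urban counties are sampled at twice the rate of rural counties, so the two strata carry different inclusion probabilities, each of them known and nonzero.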
Data to be collected; what is to be measured? The survey planner must clearly specify
answer the questions the survey is intended to answer. Because most surveys are
expensive and the marginal cost of collecting an additional item of information is low,
survey planners have a tendency to collect a bit more information than the minimum
required to address the survey's key objectives. The survey planner must manage a
tension between the desire to collect more information and the need not to overburden the
In a survey of persons, the items are usually questions to be answered by the respondent.
In a forestry survey, the items may relate to the volume of wood and the type of wood
available for lumber in a certain tract of land. In a market research survey, the items may
metropolitan area, the dollar value of the purchases, the time period in which the
purchases were made, and the promotional conditions (coupon, newspaper ad, end-of-aisle
display) under which the purchases were made.
In some surveys, the main information of interest may be the responses to individual
questions, while in other surveys, the main information may be derived or calculated
The specification of the items to be collected in a survey will nearly always require
consultation with substantive experts. For example, in a labor force survey, the survey
planner will want to consult a labor economist. In a general social survey, the planner
will want to consult a sociologist. In a health care survey, the planner will want to
consult an epidemiologist. And in a survey of soil and land use, the planner will want to
consult a soil scientist. The substantive expert will usually assist or lead the
information are required to answer the key survey questions; and will have some
knowledge of how respondents or survey elements retain, recall, and report items of
information.
Interviewers or data collection agents. Some surveys of households or persons are self
collected by professional data collection agents with special expertise, such as nurses or
laboratory technicians. And some survey data are simply collected, as noted earlier, by
qualifications, recruitment, pay grade, and training are of great importance to the quality
of the resulting data. Survey planners usually put effort into recruiting the best
interviewers the survey can afford; into devising and delivering a program of training on
the specific instruments and data collection protocols to be employed in the survey; and
into practice tests. Poorly qualified and ill-trained data collection agents can wreck the
quality of an otherwise well-designed and well-conceived survey. If, because of lack of
different stimuli to different respondents, then the survey researcher is left with a body of
inconsistent data that cannot be aggregated to make inferences to the survey population.
Methods of measurement. The survey planner must specify the mode of data collection
and the specific format and design of the survey instrument. There are a variety of
survey modes, depending upon the type of survey and the nature of the population.
telephone, or by the internet (web or email). Some surveys are conducted using some
type of mechanical or electronic measuring device. Some surveys use chemical, medical,
or other laboratory testing. And some surveys offer several modes of administration: a
multimode survey.
software language, and managed by the interviewer through keystrokes entered into a
Usually, the same methods of measurement will be used for all of the elements of the
sample. The methods of measurement are vital to the overall quality of the survey and
considerable work is often put into planning these methods. Question wording and
sequencing are of importance, as are the use of color and other aspects of the design of
the instrument. In laboratory testing, the quality of the lab and the nature of the test (an
expensive highly accurate test versus a cheaper test with lesser accuracy) are of
importance.
Data collection operations. The term data collection operations or field period refers
to the actual collection of the survey measurements from the survey respondents or
elements using the survey instrument and the specified mode of interview. The dates and
length of the field period, the level of supervision given to data collection agents, the
follow-up protocol, and the manner of transmission of the survey measurements to survey
The length of the field period must be specified and it will likely contribute to the
resulting quality of data. Shorter field periods often result in less complete measurements
with more missing data, while longer field periods usually cost more and may not be
Highly skilled, experienced, and well trained agents generally require a lower supervisor
ratio than unskilled, inexperienced, and less well trained agents. An interviewer will
occasionally confront an unexpected issue or problem; the supervisor is the one who will
give instruction regarding the appropriate means of resolving the issue and will
communicate the methods of resolution to the rest of the data collection team who may
confront a similar issue in the future. The supervisor is also the one who will assign
cases in the sample to specific interviewers and will transfer cases from one interviewer
The follow-up protocol refers to the number of times and the manner in which the
interviewer will call back in the event the survey cannot be completed. For example, if a
respondent does not answer the telephone, the interviewer may call back 6 to 10 times or
more. The protocol may specify the gaps between the call-backs or the times of day to
call back. Sometimes, the interviewer and respondent may agree on a call-back time: an
For example, the initial interview attempts may be conducted by a personal visit to the
respondent's home, while subsequent call-backs may be made via telephone, assuming
The interviewer and supervisor must transmit the completed instruments to survey
headquarters.
Analysis of data. Once the data collection instruments have been transmitted, they must
and merged into computer databases for subsequent statistical analysis. Survey managers
will usually put in place a receipt control operation for the purpose of actually
receiving the completed instruments and checking to ensure that all instruments are
accounted for, that none are missing, and that no instruments are included for elements of
the population that were not selected into the actual sample.
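The receipt control checks just described amount to comparing two sets of case identifiers. The following is a minimal sketch; the function name, the report keys, and the case IDs are hypothetical.

```python
def receipt_control(selected_ids, received_ids):
    """Compare the cases selected into the sample with the instruments received.

    "missing" lists selected cases with no instrument on hand;
    "unexpected" lists instruments for cases that were never selected.
    """
    selected = set(selected_ids)
    received = set(received_ids)
    return {
        "missing": sorted(selected - received),
        "unexpected": sorted(received - selected),
    }


# Hypothetical case identifiers.
report = receipt_control(
    selected_ids=["A1", "A2", "A3", "A4"],
    received_ids=["A1", "A3", "B9"],
)
print(report)
```

An empty "missing" list and an empty "unexpected" list together indicate that every instrument is accounted for.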
Paper instruments will have to be keyed into a computer format. Some instruments may
contain open-ended text of some sort, such as respondents' reports of their industry and
occupation. Such text may require human coding (i.e., the assignment of a numeric code
that classifies the text as having a specific meaning) prior to or just following the
All of the data for all of the cases in the sample must be merged into computer databases
for subsequent processing. If the survey data have arrived at survey headquarters in a
variety of formats, then the data formats must be standardized prior to the merging of the
data.
Almost all survey data known to me involve some level of missing and erroneous data,
despite the best attempts by survey planners, trainers, interviewers, and supervisors to
produce clean and complete data. Some respondents will not be found (e.g., not at home
or won't answer the telephone) or will refuse to cooperate. Some respondents may
generally cooperate but will refuse to report one or more individual items, either because
they don't know the answer or because they feel it is sensitive information. Some
instruments are involved, they may malfunction or cease operating for a period of time,
leading to missing or faulty data. If laboratory testing is involved, the lab can make
The survey data are now usually put through a cleaning process called edit and
imputation. The edit step identifies the individual fields of information on the individual
case records in the survey database that are missing or faulty. The survey statistician and
substantive expert collaborate to devise computer rules that can be used to check for
faulty data. For example, if a daughter's age is greater than her mother's age, then either
one of the ages is faulty or the information on relationship between the two individuals is
faulty. For a second example, if a store's annual receipts this year are much greater or
less than the same store's receipts last year, then this year's figure, last year's figure, or both may be
faulty.
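The two edit rules above can be written as simple computer checks. The sketch below is illustrative only: the function names are hypothetical, and the threshold of 3.0 for "much greater or less" is an assumption that a real survey would set in consultation with the substantive expert.

```python
def daughter_older_than_mother(mother_age, daughter_age):
    """Flag a mother-daughter pair when the daughter is reported as older.

    A flag means that one of the ages or the relationship is faulty;
    the rule alone cannot say which.
    """
    return daughter_age > mother_age


def receipts_change_is_suspicious(this_year, last_year, max_ratio=3.0):
    """Flag annual receipts that changed by more than max_ratio in either direction.

    max_ratio=3.0 is an assumed threshold, not one taken from the text.
    """
    if this_year <= 0 or last_year <= 0:
        return True  # a ratio is not meaningful here; send the case to review
    ratio = this_year / last_year
    return ratio > max_ratio or ratio < 1 / max_ratio


print(daughter_older_than_mother(mother_age=30, daughter_age=45))
print(receipts_change_is_suspicious(this_year=1_000_000, last_year=200_000))
print(receipts_change_is_suspicious(this_year=210_000, last_year=200_000))
```

Each rule only identifies a record for attention; deciding which field is wrong, and what value to substitute, is left to the later steps of the cleaning process.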
Once the missing and faulty data are identified, the imputation step makes a statistical
estimation of the true value of the missing or faulty item. Once again, the statistician and
the computer to perform the estimation. Some surveys omit the imputation step and
leave the missing data as missing, or fix some of the faulty data but leave other faulty or
Statistical methods are used to aggregate the data that have been cleaned so as to produce
good estimates of the parameters of the population that were specified for study. Often,
the statistical methods include the development of survey weights. The survey weight
is a numeric value that is attached to each element of the sample for which the survey
measurement process was complete. The survey weight may be thought of as the number
of elements of the population that the respondent element represents. For example, a
weight of 100 would signify that the respondent represents himself or herself plus 99
other elements of the population.
The estimation procedure also includes statistical methods for calculating measures of
statistician to calculate how good the survey is using the survey data itself. Remarkably,
external evidence is not required; rather, internal evidence within the survey itself is
enough to establish the precision with which the population parameters are estimated.
The most common measure of precision is called the variance or the standard error
(the square root of the variance). The survey estimate plus or minus 1.96 times the
estimated standard error is called an approximate 95 percent statistical confidence interval for the
population parameter under study. The estimate, the estimated standard error, the confidence
interval, and related methods comprise the statistical framework for inference from the
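The pieces of this framework (survey weight, estimate, estimated standard error, and confidence interval) can be illustrated with a short Python sketch. It assumes the simplest design, a simple random sample of n = 10 from a population of N = 1,000, so that every weight is N/n = 100 and the variance estimate takes the textbook form with a finite population correction. The data values are invented, and more complex designs require different variance formulas.

```python
import math
import statistics


def weighted_total(values, weights):
    """Estimate a population total as the weighted sum of the sample values."""
    return sum(w * y for w, y in zip(weights, values))


def srs_mean_with_ci(values, population_size, z=1.96):
    """Estimate a population mean, its standard error, and a confidence interval.

    Assumes simple random sampling without replacement. The variance is
    estimated from the sample itself: (1 - n/N) * s^2 / n.
    """
    n = len(values)
    mean = statistics.mean(values)
    s2 = statistics.variance(values)      # sample variance, divisor n - 1
    fpc = 1 - n / population_size         # finite population correction
    se = math.sqrt(fpc * s2 / n)
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical sample: weekly hours worked reported by 10 respondents.
hours = [40, 35, 0, 20, 45, 38, 42, 30, 25, 40]
N = 1000
weights = [N / len(hours)] * len(hours)   # each respondent represents 100 persons

total_estimate = weighted_total(hours, weights)
mean_estimate, standard_error, interval = srs_mean_with_ci(hours, N)
print(total_estimate, mean_estimate, standard_error, interval)
```

Nothing outside the sample is used to compute the standard error, which is the sense in which internal evidence is enough to establish the precision of the estimate.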
The survey analysis consists of the calculation of the estimates and estimated standard
errors for all of the population parameters under study in the survey. The analytic results
High quality surveys usually implement a variety of methods to assure the quality of the
results. Some surveys implement verification interviews in which supervisors will call
back to a subsample of the respondents to confirm that the initial interview actually
occurred. Re-asking a set of the key questions allows the survey statistician to estimate
variability due to measurement error. The survey cleaning process (edit and imputation)
is actually a component of the overall survey quality assurance program. If survey data
are keyed to make them computer readable, it is common to re-key the data on a 100
percent or partial basis and to compare the results of the two keyings. Excessive
difference between the two keyings is symptomatic of keying error and suggests a need
for corrective action, such as retraining of keyers. If survey data are coded to make them
computer readable, it is common to re-code a portion of the responses and to compare the
two codings. Excessive difference between the two codings is symptomatic of coding
error and suggests a need for corrective action, such as retraining of coders or
The survey organization will usually implement a variety of checks of the computer
programming to make certain that the programs are calculating correctly. The
organization will also implement checks of the estimates and standard errors produced.
organization.
specifications are usually prepared for each of the steps in the survey. Specifications
provide some level of assurance that work will be done as designed. They help survey
planners clarify approaches and spot and fill gaps that might otherwise have gone unnoticed.
They also help consultants, advisors, and other interested parties to understand, criticize
computer programming.
At the close of a survey, the survey organization may prepare a number of reports. A
methodology report describes all of the methods, protocols, personnel, instruments, and
algorithms used in conducting the survey. An analysis report describes the methods of
analysis, the survey weights, and the estimated standard errors. It includes the tables and
charts that contain the survey results and typically includes expert interpretation of the
substantive meaning of the results. Sometimes the methodology report and analysis
report are combined in one final report.
The survey organization may also prepare a codebook that describes the content of the
final survey database. If the survey database is to be handed over for analysis to others
outside the survey organization, then the organization may prepare a user's guide,
describing the origins of the data and how they may be used, and programs for reading
the database, to assist the external user of the data to get started correctly.
Survey products differ somewhat from survey to survey. For public surveys, the survey
database and documentation, as described in the previous paragraph, are delivered to the
survey sponsor and possibly to others who plan to use the survey. Limited versions of
the database and the user's guide may be released to the public. The database may be
made available through a website or at other designated archives. For commercial or
proprietary surveys, the survey database and documentation are delivered to the
survey sponsor.