Population. The word population is used to signify the aggregate from which the sample is to be selected and to which inferences are to be made. The choice of population may be clear and present few problems for some surveys, while for others difficult decisions may be required to handle borderline cases. Two specific forms of the population should be identified and defined by survey planners. The target population is the aggregate to which inferences will be made or about which information is wanted. The sampling population is the aggregate from which the sample will actually be selected. Survey researchers usually attempt to define the sampling population to be as identical as practicable to the target population. Sometimes, for reasons of cost, timing, or feasibility, the sampling population may be more restrictive than the target population. If so, statistical conclusions drawn from the sample apply strictly to the sampling population, and judgement is required to extend the conclusions to the broader target population. Survey researchers sometimes conduct supplementary studies to ascertain information about the nature of differences between the sampling and target populations.

Key parameters of the population. The objectives of the survey must be translated into statistical quantities or numerical summaries that describe the population. Such quantities are called parameters of the population. For example, in a labor force survey, one of the key parameters may be the unemployment rate, defined as the total number of unemployed persons divided by the total number of persons in the labor force (excluding children and retirees). The key parameters in most surveys include population means of survey variables, population totals of survey variables, ratios of totals, and regression coefficients that describe the relationship between different survey variables. A given survey may be faced with hundreds or thousands of population parameters. The survey researcher will usually identify a small number of key parameters of the population and then make most of the important planning decisions about the survey with these key parameters in mind.
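
The sketch below makes the distinction among totals, means, and ratios of totals concrete. It assumes a made-up population of five persons. The field names (`in_labor_force`, `unemployed`, `weekly_hours`) are illustrative assumptions and do not come from any real survey. The unemployment rate is computed as a ratio of two population totals, following the definition above.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical survey variables for one element of the population.
    age: int
    in_labor_force: bool
    unemployed: bool
    weekly_hours: float


def population_total(values):
    """Population total: the sum of a variable over every element."""
    return sum(values)


def population_mean(values):
    """Population mean: the total divided by the number of elements."""
    values = list(values)
    return sum(values) / len(values)


def ratio_of_totals(numerator, denominator):
    """Ratio of two population totals, e.g. unemployed / labor force."""
    return sum(numerator) / sum(denominator)


# Hypothetical population (not real data).
population = [
    Person(34, True, False, 40.0),
    Person(51, True, True, 0.0),
    Person(12, False, False, 0.0),  # child: outside the labor force
    Person(70, False, False, 0.0),  # retiree: outside the labor force
    Person(28, True, False, 35.0),
]

unemployment_rate = ratio_of_totals(
    [int(p.unemployed) for p in population],
    [int(p.in_labor_force) for p in population],
)
total_hours = population_total(p.weekly_hours for p in population)
mean_age = population_mean(p.age for p in population)

print(f"unemployment rate = {unemployment_rate:.3f}")  # 1 of 3 labor-force members
print(f"total weekly hours = {total_hours}, mean age = {mean_age}")
```

In a real survey these quantities are unknown. The point of the sample is to estimate them.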

09/26/11
Kirk Wolter
Statistics 331
University of Chicago

Sampling frame and sampling unit. Before the sample is selected, the population must be divided into parts called sampling units. The units must cover the whole of the sampling population. Usually, the units do not overlap one another, and every element of the sampling population belongs to a unit. For example, in a labor force survey, the objects or elements of the survey are persons; the population is divided into households; all persons belong to a household; and households usually do not overlap. To enable the selection of a sample of units, the survey researcher must obtain or develop a list of all sampling units in the sampling population. This list is called the sampling frame. A good sampling frame is usually hard to come by. In some simple cases, a list of the elements of the population to be measured may be readily available. For example, if a survey of lawyers is to be conducted, a list of lawyers may be available from the American Bar Association or from state licensing boards. In other surveys, choices may have to be made about the nature of the sampling unit, and such choices are guided by the practicality of obtaining or constructing a complete sampling frame.
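
Once a frame is in hand, two properties can be checked mechanically: the sampling units must cover the sampling population, and they usually should not overlap. The sketch below assumes a hypothetical frame stored as a mapping from household identifiers to the person identifiers they contain. It also assumes a separately known set of persons in the sampling population. Real frames rarely come with such a complete element list, so this illustrates the two properties rather than offering a practical frame audit.

```python
from collections import Counter


def check_frame(frame: dict[str, list[str]], population: set[str]):
    """Return (elements not covered by any unit, elements in more than one unit)."""
    counts = Counter(element for members in frame.values() for element in members)
    uncovered = population - set(counts)
    overlapping = {element for element, n in counts.items() if n > 1}
    return uncovered, overlapping


# Hypothetical households (sampling units) and persons (elements).
frame = {
    "HH1": ["p1", "p2"],
    "HH2": ["p3"],
    "HH3": ["p3", "p4"],  # p3 is listed twice: the units overlap
}
population = {"p1", "p2", "p3", "p4", "p5"}  # p5 is missing from the frame

uncovered, overlapping = check_frame(frame, population)
print("not covered:", sorted(uncovered))
print("in more than one unit:", sorted(overlapping))
```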

Degree of precision sought, cost, and timing. The results of a sample survey are always subject to uncertainty or error, both because only part of the population is measured and because of errors of measurement. The uncertainty can be reduced by taking a larger sample or by using superior instruments of measurement, but such reductions in error often come at the price of increased time and money. Absent any constraint on time and money, the survey researcher would simply enumerate the entire sampling population and use a superior instrument. In the real world, however, the survey researcher will always face constraints on time and money. Thus, an important step is determining an acceptable compromise among the levels of precision, cost, and timing within which the survey will operate.
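
One common way to frame this compromise is to ask how large a sample is needed to reach a target standard error, and what that sample costs. The sketch below assumes simple random sampling without replacement. Under that design, the standard error of the sample mean is sqrt((1 - n/N) * S^2 / n), where S^2 is the population variance of the survey variable. The linear cost model and every numeric input are hypothetical.

```python
import math


def srs_standard_error(n: int, N: int, S: float) -> float:
    """SE of the sample mean under simple random sampling without replacement."""
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    return math.sqrt((1 - n / N) * S**2 / n)


def required_sample_size(target_se: float, N: int, S: float) -> int:
    """Smallest n whose SE is at most target_se.

    Solving (1 - n/N) * S**2 / n <= target_se**2 for n gives
    n >= n0 / (1 + n0 / N), where n0 = S**2 / target_se**2.
    """
    n0 = S**2 / target_se**2
    return min(N, math.ceil(n0 / (1 + n0 / N)))


def survey_cost(n: int, fixed_cost: float, cost_per_unit: float) -> float:
    """Hypothetical linear cost model: fixed overhead plus a per-unit charge."""
    return fixed_cost + cost_per_unit * n


N, S = 10_000, 12.0  # hypothetical population size and standard deviation
for target in (1.0, 0.5, 0.25):
    n = required_sample_size(target, N, S)
    print(f"target SE {target}: n = {n}, cost = {survey_cost(n, 20_000, 50):,.0f}")
```

Under these assumptions, halving the standard error roughly quadruples the required sample, and the variable cost rises with it.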

Sampling design and implementation. The term sampling design refers to the methods of sampling that are to be used in actually selecting the sample. Hundreds, thousands, or millions of specific samples may be possible under a given sampling design. The term sampling implementation refers to the actual selection, under the sampling design, of the one realized sample that will be employed in the survey. Important surveys with public policy, business, or scientific consequences riding on the results almost universally employ sampling designs that involve probability or random methods of sample selection, in which each element of the population has a known and nonzero probability of selection into the actual sample. Random methods guard against selection biases.
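
Simple random sampling without replacement, the most basic probability design, illustrates the distinction between a design and its implementation. In the sketch below, the frame size, sample size, and seed are all hypothetical. The design fixes N and n. That in turn fixes the number of possible samples and each element's inclusion probability, n/N. The implementation is a single seeded call to the random number generator that realizes one sample. Python's `random.Random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import math
import random


def srs_design(N: int, n: int) -> tuple[int, float]:
    """Number of possible samples and each element's inclusion probability."""
    return math.comb(N, n), n / N


def implement_srs(frame: list[str], n: int, seed: int) -> list[str]:
    """Realize one sample from the design (a seeded draw, so it can be reproduced)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


frame = [f"unit{i:02d}" for i in range(1, 51)]  # hypothetical frame of N = 50 units
possible, prob = srs_design(len(frame), 5)
sample = implement_srs(frame, 5, seed=331)
print(f"{possible:,} possible samples; each unit selected with probability {prob}")
print("realized sample:", sample)
```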

The sampling design may involve a number of features. It may involve stratification, in which the sampling frame is partitioned into a number of parts, called strata, and a sample is selected within each stratum. It may involve one or more stages of selection of clusters of sampling units within each stratum, such as the selection of counties within each stratum, city blocks within each county, housing units within each city block, and persons within each housing unit. It may involve the selection of sampling units with equal or unequal probabilities. It may involve the selection of sampling units in one or more phases, such as a two-phase design in which information that can be obtained at low cost is collected from a large first-phase sample and information that entails high cost is collected from a small second-phase sample.
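
Stratification and unequal selection probabilities can be sketched together. Below, a hypothetical frame is partitioned into two strata, and an independent simple random sample is drawn within each. The stratum sample sizes (the allocation) are arbitrary choices made for illustration. Because the sampling rates differ, the inclusion probability n_h/N_h differs by stratum, which is one way unequal selection probabilities arise. Multistage and two-phase selection are not shown.

```python
import random
from collections import defaultdict


def stratified_srs(stratum_of: dict[str, str], allocation: dict[str, int], seed: int):
    """Draw an independent SRS in each stratum.

    stratum_of maps each frame unit to its stratum; allocation gives n_h.
    Returns (unit, stratum, inclusion probability) for every selected unit.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit, h in stratum_of.items():
        strata[h].append(unit)

    selected = []
    for h in sorted(strata):
        units, n_h = strata[h], allocation[h]
        if n_h > len(units):
            raise ValueError(f"stratum {h}: n_h exceeds N_h")
        prob = n_h / len(units)
        selected.extend((unit, h, prob) for unit in rng.sample(units, n_h))
    return selected


# Hypothetical frame: 80 urban and 20 rural units.
stratum_of = {f"u{i}": "urban" for i in range(80)} | {f"r{i}": "rural" for i in range(20)}
sample = stratified_srs(stratum_of, {"urban": 8, "rural": 4}, seed=7)
for unit, h, prob in sample:
    print(unit, h, prob)
```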

Data to be collected; what is to be measured? The survey planner must clearly specify the items of information to be measured or collected. The items must be adequate to answer the questions the survey is intended to answer. Because most surveys are expensive and the marginal cost of collecting an additional item of information is low, survey planners tend to collect a bit more information than the minimum required to address the survey's key objectives. The survey planner must manage a tension between the desire to collect more information and the need not to overburden the respondent or element of the survey.

In a survey of persons, the items are usually questions to be answered by the respondent. In a forestry survey, the items may relate to the volume and type of wood available for lumber in a certain tract of land. In a market research survey, the items may relate to the number of tubes of each brand of toothpaste purchased in a certain metropolitan area, the dollar value of the purchases, the time period in which the purchases were made, and the promotional conditions (coupon, newspaper ad, end-of-aisle display) under which the purchases were made.

In some surveys, the main information of interest may be the responses to individual questions, while in other surveys, the main information may be derived or calculated from two or more of the individual questions.

The specification of the items to be collected in a survey will nearly always require consultation with substantive experts. For example, in a labor force survey, the survey planner will want to consult a labor economist. In a general social survey, the planner will want to consult a sociologist. In a health care survey, the planner will want to consult an epidemiologist. And in a survey of soil and land use, the planner will want to consult a soil scientist. The substantive expert will usually assist with, or lead, the determination of the survey objectives; will understand what items of information are required to answer the key survey questions; and will have some knowledge of how respondents or survey elements retain, recall, and report items of information.

Interviewers or data collection agents. Some surveys of households or persons are self-administered. Some surveys are administered by interviewers. Some surveys are collected by professional data collection agents with special expertise, such as nurses or laboratory technicians. And some survey data are simply collected, as noted earlier, by electronic or mechanical devices, such as the collection of information about purchases in a grocery store using the store's point-of-sale scanning system.


Interviewers or other data collection agents are vital to the success of the survey. Their qualifications, recruitment, pay grade, and training are of great importance to the quality of the resulting data. Survey planners usually put effort into recruiting the best interviewers the survey can afford; into devising and delivering a program of training on the specific instruments and data collection protocols to be employed in the survey; and into practice tests. Poorly qualified and ill-trained data collection agents can wreck the quality of an otherwise well-conceived and well-designed survey. If, for lack of appropriate training in the use of consistent methods, different interviewers administer different stimuli to different respondents, then the survey researcher is left with a body of inconsistent data that cannot be aggregated to make inferences to the survey population.

Methods of measurement. The survey planner must specify the mode of data collection and the specific format and design of the survey instrument. There are a variety of survey modes, depending upon the type of survey and the nature of the population. Surveys of households or persons are conducted by face-to-face interview, mail, telephone, or the internet (web or email). Some surveys are conducted using a mechanical or electronic measuring device. Some surveys use chemical, medical, or other laboratory testing. And some surveys offer several modes of administration; such a survey is called a multimode survey.

The survey instrument is sometimes called the questionnaire. It is the document in which the survey measurements or responses are entered. For self-administered or interviewer-administered surveys, the instrument may be a paper questionnaire or answer sheet. Some interviewer-administered surveys are conducted using a CATI/CAPI (computer-assisted telephone interviewing/computer-assisted personal interviewing) instrument, which is in electronic format, programmed in software, and managed by the interviewer through keystrokes entered into a desktop or laptop computer.

Usually, the same methods of measurement will be used for all of the elements of the sample. The methods of measurement are vital to the overall quality of the survey, and considerable work is often put into planning them. Question wording and sequencing are important, as are the use of color and other aspects of the design of the instrument. In laboratory testing, the quality of the lab and the nature of the test (an expensive, highly accurate test versus a cheaper, less accurate one) are important.

Data collection operations. The term data collection operations, or field period, refers to the actual collection of the survey measurements from the survey respondents or elements using the survey instrument and the specified mode of interview. The dates and length of the field period, the level of supervision given to data collection agents, the follow-up protocol, and the manner of transmitting the survey measurements to survey headquarters are all important.

The length of the field period must be specified, and it is likely to affect the quality of the resulting data. Shorter field periods often result in less complete measurements with more missing data, while longer field periods usually cost more and may not be consistent with the timing expectations of the survey sponsor.

All interviewers or data collection agents require an appropriate level of supervision. Highly skilled, experienced, and well-trained agents generally require a lower ratio of supervisors to agents than unskilled, inexperienced, and less well-trained agents. An interviewer will occasionally confront an unexpected issue or problem. The supervisor is the one who will give instruction regarding the appropriate means of resolving the issue and will communicate the method of resolution to the rest of the data collection team, who may confront a similar issue in the future. The supervisor is also the one who will assign cases in the sample to specific interviewers and will transfer cases from one interviewer to another as the need arises.

The follow-up protocol refers to the number of times and the manner in which the interviewer will call back in the event the survey cannot be completed. For example, if a respondent does not answer the telephone, the interviewer may call back 6 to 10 times or more. The protocol may specify the gaps between call-backs or the times of day to call back. Sometimes the interviewer and respondent may agree on a call-back time: an appointment. Sometimes call-backs are conducted using a different mode of interview. For example, the initial interview attempts may be conducted by a personal visit to the respondent's home, while subsequent call-backs may be made by telephone, assuming the telephone number is available.
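
The sketch below shows how a protocol that specifies both the gap between call-backs and the times of day to call could be turned into a schedule. Every parameter is a hypothetical choice: 8 attempts (within the 6 to 10 range mentioned above), a minimum gap of 20 hours, and call hours rotating through 10:00, 14:00, and 19:00. Appointments and switches of interview mode are not modeled.

```python
from datetime import datetime, timedelta


def callback_schedule(first_attempt: datetime,
                      max_attempts: int = 8,
                      min_gap: timedelta = timedelta(hours=20),
                      call_hours: tuple[int, ...] = (10, 14, 19)) -> list[datetime]:
    """Plan call-back times under a hypothetical protocol.

    Attempt k is placed at call_hours[k % len(call_hours)] o'clock, pushed
    forward one day at a time until it is at least min_gap after the
    previous attempt, so calls rotate through different times of day.
    """
    attempts = [first_attempt]
    while len(attempts) < max_attempts:
        previous = attempts[-1]
        hour = call_hours[len(attempts) % len(call_hours)]
        candidate = previous.replace(hour=hour, minute=0, second=0, microsecond=0)
        while candidate - previous < min_gap:
            candidate += timedelta(days=1)
        attempts.append(candidate)
    return attempts


for attempt in callback_schedule(datetime(2011, 9, 26, 10, 0)):
    print(attempt.strftime("%a %d %b %H:%M"))
```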

The interviewer and supervisor must transmit the completed instruments to survey headquarters. In the modern era, transmission is increasingly accomplished electronically, by computer communication. Some transmission may occur by postal delivery, by express mail services, by special messenger, or by personal delivery to survey headquarters.

Analysis of data. Once the data collection instruments have been transmitted, they must be received at survey headquarters, converted (if necessary) to machine-readable form, and merged into computer databases for subsequent statistical analysis. Survey managers will usually put in place a receipt control operation to receive the completed instruments and to check that all instruments are accounted for: that none are missing, and that no instruments are included for elements of the population that were not selected into the actual sample.
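
At its core, receipt control reconciles two lists of case identifiers: the cases selected into the sample and the instruments actually received. The sketch below uses hypothetical case numbers. It also reports cases whose instrument arrived more than once, a check added here for illustration that goes beyond the two conditions named above.

```python
from collections import Counter


def receipt_control(selected_ids, received_ids):
    """Reconcile received instruments against the selected sample.

    Reports selected cases with no instrument, instruments for cases that
    were never selected, and cases whose instrument arrived more than once.
    """
    counts = Counter(received_ids)
    selected = set(selected_ids)
    received = set(counts)
    return {
        "missing": sorted(selected - received),
        "not_in_sample": sorted(received - selected),
        "duplicates": sorted(case for case, n in counts.items() if n > 1),
    }


# Hypothetical case identifiers.
report = receipt_control(selected_ids=[101, 102, 103, 104],
                         received_ids=[101, 103, 103, 999])
print(report)
```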

Some instruments, such as a CATI/CAPI instrument, will already be in computer format. Paper instruments will have to be keyed into a computer format. Some instruments may contain open-ended text of some sort, such as respondents' reports of their industry and occupation. Such text may require human coding (i.e., the assignment of a numeric code that classifies the text as having a specific meaning) before or just after the conversion of the instrument to computer format.

All of the data for all of the cases in the sample must be merged into computer databases for subsequent processing. If the survey data have arrived at survey headquarters in a variety of formats, then the data formats must be standardized before the data are merged.
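
The sketch below shows the standardize-then-merge step for two arrival formats. Both formats are hypothetical. One is a CATI export that already uses the target field names. The other is a keyed paper file in CSV form with different column names (`CASEID`, `AGE`) and blank cells for unreported items. Each source is first mapped to one common record layout, and only then are the records merged into a single database keyed by case id.

```python
import csv
import io


def standardize_cati(records: list[dict]) -> list[dict]:
    """Hypothetical CATI export: dicts keyed 'case_id' and 'age', ages already ints."""
    return [{"case_id": int(r["case_id"]), "age": r.get("age")} for r in records]


def standardize_keyed(csv_text: str) -> list[dict]:
    """Hypothetical keyed paper file: CSV columns CASEID and AGE, blank when not reported."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"case_id": int(r["CASEID"]), "age": int(r["AGE"]) if r["AGE"] else None}
            for r in rows]


def merge_sources(*sources: list[dict]) -> dict[int, dict]:
    """Merge standardized records into one database keyed by case id."""
    database: dict[int, dict] = {}
    for source in sources:
        for record in source:
            if record["case_id"] in database:
                raise ValueError(f"case {record['case_id']} arrived twice")
            database[record["case_id"]] = record
    return database


database = merge_sources(
    standardize_cati([{"case_id": 1, "age": 34}]),
    standardize_keyed("CASEID,AGE\n2,51\n3,\n"),
)
print(database)
```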

Almost all survey data known to me involve some level of missing and erroneous data, despite the best attempts by survey planners, trainers, interviewers, and supervisors to produce clean and complete data. Some respondents will not be found (e.g., they are not at home or won't answer the telephone) or will refuse to cooperate. Some respondents may generally cooperate but refuse to report one or more individual items, either because they don't know the answer or because they feel it is sensitive information. Some interviewers may record the information incorrectly. If electronic or mechanical instruments are involved, they may malfunction or cease operating for a period of time, leading to missing or faulty data. If laboratory testing is involved, the lab can make mistakes or omit information.

The survey data are now usually put through a cleaning process called edit and imputation. The edit step identifies the individual fields of information on the individual case records in the survey database that are missing or faulty. The survey statistician and substantive expert collaborate to devise computer rules that can be used to check for faulty data. For example, if a daughter's age is greater than her mother's age, then either one of the ages is faulty or the information on the relationship between the two individuals is faulty. For a second example, if a store's annual receipts this year are much greater or much less than the same store's receipts last year, then this year's figure, last year's figure, or both may be faulty.
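
The two edit rules above can be written as small predicate functions, each returning True when a record fails the rule. Both rules are applied to a batch of hypothetical records. The record layout is an assumption. So are the acceptable range for this year's receipts relative to last year's (0.5 to 2.0) and the treatment of a zero or negative prior-year figure. In practice the statistician and substantive expert would settle all of these.

```python
def daughter_older_than_mother(household: dict) -> bool:
    """Edit rule from the text: fails if the daughter is older than her mother."""
    return household["daughter_age"] > household["mother_age"]


def receipts_out_of_range(store: dict, low: float = 0.5, high: float = 2.0) -> bool:
    """Fails if this year's receipts are far from last year's.

    The bounds 0.5 and 2.0 are hypothetical; in practice they would be set
    with the substantive expert.
    """
    last_year = store["receipts_last_year"]
    if last_year <= 0:
        return True  # ratio undefined: refer the case for review
    ratio = store["receipts_this_year"] / last_year
    return not low <= ratio <= high


def run_edits(records: list[dict], rules: dict) -> list[tuple[str, str]]:
    """Return (record id, rule name) for every rule a record fails."""
    return [(rec["id"], name)
            for rec in records
            for name, rule in rules.items()
            if rule(rec)]


households = [{"id": "H1", "daughter_age": 12, "mother_age": 40},
              {"id": "H2", "daughter_age": 45, "mother_age": 41}]
stores = [{"id": "S1", "receipts_this_year": 110, "receipts_last_year": 100},
          {"id": "S2", "receipts_this_year": 500, "receipts_last_year": 100},
          {"id": "S3", "receipts_this_year": 30, "receipts_last_year": 100}]

print(run_edits(households, {"daughter_older_than_mother": daughter_older_than_mother}))
print(run_edits(stores, {"receipts_out_of_range": receipts_out_of_range}))
```

A flagged record says only that something is inconsistent. As the text notes, deciding which field is at fault is a separate judgment.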

Once the missing and faulty data are identified, the imputation step produces a statistical estimate of the true value of each missing or faulty item. Once again, the statistician and substantive expert collaborate to devise statistical algorithms that can be implemented on the computer to perform the estimation. Some surveys omit the imputation step and leave the missing data as missing, or fix some of the faulty data but leave other faulty or missing data as they are.
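
The notes do not prescribe a particular imputation algorithm. One of the simplest is shown below as an illustration: class-mean imputation, which fills a missing value with the mean of the reported values in the same imputation class. Hot-deck and model-based methods are common alternatives. The records, the item name, and the choice of occupation group as the imputation class are all hypothetical. The function flags imputed values so that they remain distinguishable from reported ones.

```python
from collections import defaultdict
from statistics import mean


def impute_class_mean(records: list[dict], item: str, class_var: str) -> list[dict]:
    """Fill missing values of `item` with the mean of reported values in the same class.

    Adds an '<item>_imputed' flag so analysts can tell reported from imputed
    values. A record stays missing if no one in its class reported the item.
    """
    reported = defaultdict(list)
    for rec in records:
        if rec[item] is not None:
            reported[rec[class_var]].append(rec[item])
    class_means = {c: mean(values) for c, values in reported.items()}

    result = []
    for rec in records:
        new = dict(rec)  # leave the input records unchanged
        was_missing = rec[item] is None
        if was_missing:
            new[item] = class_means.get(rec[class_var])
        new[f"{item}_imputed"] = was_missing and new[item] is not None
        result.append(new)
    return result


# Hypothetical records: hours worked, with occupation group as the imputation class.
records = [
    {"id": 1, "occ": "A", "hours": 10},
    {"id": 2, "occ": "A", "hours": 20},
    {"id": 3, "occ": "A", "hours": None},
    {"id": 4, "occ": "B", "hours": 5},
    {"id": 5, "occ": "B", "hours": None},
    {"id": 6, "occ": "C", "hours": None},
]
for rec in impute_class_mean(records, "hours", "occ"):
    print(rec)
```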

Statistical methods are used to aggregate the cleaned data so as to produce good estimates of the population parameters that were specified for study. Often, the statistical methods include the development of survey weights. The survey weight is a numeric value that is attached to each element of the sample for which the survey measurement process was complete. The survey weight may be thought of as the number of elements of the population that the respondent element represents. For example, a weight of 100 would signify that the respondent represents himself or herself plus 99 other elements of the population.
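
A standard starting point in design-based estimation is the base weight, defined as the reciprocal of an element's inclusion probability. The notes do not spell out this construction, and the nonresponse and other adjustments applied to real survey weights are omitted here. The sketch uses a hypothetical simple random sample of 4 respondents from a population of 400. Each respondent then has inclusion probability 0.01 and weight 100, matching the example above.

```python
def base_weights(inclusion_probs: list[float]) -> list[float]:
    """Base weight = 1 / inclusion probability (before any adjustments)."""
    return [1.0 / p for p in inclusion_probs]


def weighted_total(weights: list[float], values: list[float]) -> float:
    """Estimate of a population total: sum of weight * value."""
    return sum(w * y for w, y in zip(weights, values))


def weighted_mean(weights: list[float], values: list[float]) -> float:
    """Estimate of a population mean: weighted total / sum of weights."""
    return weighted_total(weights, values) / sum(weights)


# Hypothetical: 4 respondents drawn by SRS from N = 400, so each has
# inclusion probability 4/400 = 0.01 and weight 100.
weights = base_weights([4 / 400] * 4)
values = [3.0, 5.0, 2.0, 6.0]
print(weights)
print(weighted_total(weights, values), weighted_mean(weights, values))
```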

The estimation procedure also includes statistical methods for calculating measures of precision. The outstanding feature of probability sampling is that it enables the statistician to calculate how precise the survey estimates are using the survey data themselves. Remarkably, external evidence is not required; rather, internal evidence within the survey itself is enough to establish the precision with which the population parameters are estimated. The most common measures of precision are the variance and the standard error (the square root of the variance). The survey estimate plus or minus 1.96 times the estimated standard error is called a statistical confidence interval for the population parameter under study. Because 1.96 is the 97.5th percentile of the standard normal distribution, this interval is an approximate 95 percent confidence interval under the usual normal approximation. The estimate, the estimated standard error, the confidence interval, and related methods comprise the statistical framework for inference from the observed sample to the entire target population.
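
As an illustration of "internal evidence," the sketch below computes the estimate, its estimated standard error, and the 1.96 interval entirely from the sample. It assumes simple random sampling without replacement from a population of known size N. Under that design, the estimated standard error of the sample mean is sqrt((1 - n/N) * s^2 / n), where s^2 is the sample variance. Other designs need other variance formulas. The data and N are hypothetical.

```python
import math
from statistics import mean, variance


def srs_mean_with_ci(sample: list[float], N: int, z: float = 1.96):
    """Estimate, standard error, and confidence interval for a population mean.

    Assumes simple random sampling without replacement from N elements:
    se = sqrt((1 - n/N) * s**2 / n), with s**2 the sample variance (n - 1 divisor).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate a variance")
    estimate = mean(sample)
    se = math.sqrt((1 - n / N) * variance(sample) / n)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical sample of n = 8 from a population of N = 1,000.
estimate, se, (lower, upper) = srs_mean_with_ci([2, 4, 4, 4, 5, 5, 7, 9], N=1000)
print(f"estimate {estimate}, SE {se:.3f}, 95% CI ({lower:.2f}, {upper:.2f})")
```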

The survey analysis consists of calculating the estimates and estimated standard errors for all of the population parameters under study in the survey. The analytic results are often presented in the form of tables and charts.

High-quality surveys usually implement a variety of methods to assure the quality of the results. Some surveys implement verification interviews, in which supervisors call back a subsample of the respondents to confirm that the initial interview actually occurred. Re-asking a set of the key questions allows the survey statistician to estimate variability due to measurement error. The survey cleaning process (edit and imputation) is actually a component of the overall survey quality assurance program. If survey data are keyed to make them computer readable, it is common to re-key the data on a 100 percent or partial basis and to compare the results of the two keyings. Excessive difference between the two keyings is symptomatic of keying error and suggests a need for corrective action, such as retraining of keyers. If survey data are coded to make them computer readable, it is common to re-code a portion of the responses and to compare the two codings. Excessive difference between the two codings is symptomatic of coding error and suggests a need for corrective action, such as retraining of coders or development of clearer coding definitions.
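
Double keying, double coding, and reinterview all rest on the same basic comparison: the share of paired entries that disagree between two independent passes over the same cases. The sketch below computes that share. The batches are hypothetical, and so is the 2 percent tolerance used to trigger corrective action. For a categorical item asked in both the original interview and the reinterview, this share is usually called the gross difference rate. For a yes/no item, half of it is the usual estimate of simple response variance.

```python
def disagreement_rate(first: list, second: list) -> float:
    """Share of paired entries that differ between two independent passes."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a != b for a, b in zip(first, second)) / len(first)


def needs_corrective_action(rate: float, tolerance: float = 0.02) -> bool:
    """Hypothetical rule: flag a batch whose disagreement exceeds the tolerance."""
    return rate > tolerance


# Hypothetical batch keyed twice; the second keyer transposed one age.
first_keying = ["34", "51", "27", "60", "45"]
second_keying = ["34", "15", "27", "60", "45"]
rate = disagreement_rate(first_keying, second_keying)
print(rate, needs_corrective_action(rate))

# The same computation on an item from the original interview and the
# verification reinterview gives the gross difference rate.
original = ["employed", "unemployed", "employed", "not in labor force"]
reinterview = ["employed", "employed", "employed", "not in labor force"]
print(disagreement_rate(original, reinterview))
```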

The survey organization will usually implement a variety of checks of the computer programming to make certain that the programs are calculating correctly. The organization will also implement checks of the estimates and standard errors produced. Tables and charts will be scrutinized by an independent statistician within the organization.

High-quality surveys usually employ a variety of methods of documentation. Written specifications are usually prepared for each of the steps in the survey. Specifications provide some level of assurance that work will be done as designed. They help survey planners clarify approaches and spot and fill gaps that otherwise might have gone unnoticed. They also help consultants, advisors, and other interested parties to understand, criticize, and improve methods. Specifications can be especially important at the stage of computer programming.

At the close of a survey, the survey organization may prepare a number of reports. A methodology report describes all of the methods, protocols, personnel, instruments, and algorithms used in conducting the survey. An analysis report describes the methods of analysis, the survey weights, and the estimated standard errors. It includes the tables and charts that contain the survey results and typically includes expert interpretation of the substantive meaning of the results. Sometimes the methodology report and the analysis report are combined in one final report.

The survey organization may also prepare a codebook that describes the content of the final survey database. If the survey database is to be handed over for analysis to others outside the survey organization, then the organization may prepare a user's guide, describing the origins of the data and how they may be used, and programs for reading the database, to help the external user of the data get started correctly.

Survey products differ somewhat from survey to survey. For public surveys, the survey database and documentation, as described in the previous paragraph, are delivered to the survey sponsor and possibly to others who plan to use the survey. Limited versions of the database and the user's guide may be released to the public. The database may be made available through a website or at other designated archives. For commercial or proprietary surveys, the survey database and documentation are delivered to the survey
