Andy Neely, Cranfield University
Murdoch University
Brian Berry
Leonard Plotnicov, University of Pittsburgh
Charles Brody
Theodore Porter
Ruth Chadwick, Lancaster University
Kenneth Rothman, Boston University
David F. Gillespie, Washington University
Robert W. Sussman, Washington University
Ton de Jong, University of Twente
Tilburg University
George McCall
University of Twente
Manus I. Midlarsky, Rutgers University
James Wright
Editorial Board
Editor-in-Chief
Kimberly Kempf-Leonard
University of Texas at Dallas
Richardson, Texas, USA
Editor Biography
Dr. Kempf-Leonard is Professor of Sociology, Crime and
Justice Studies, and Political Economy at the University of
Texas at Dallas. Prior to her appointment at UTD in 2000,
she was Associate Professor and Graduate Director of
Criminology and Criminal Justice at the University of
Missouri at St. Louis. She also served for ten years as a
gubernatorial appointee to the Missouri Juvenile Justice
Advisory Group. She received her Ph.D. at the University
of Pennsylvania in 1986; M.A. at the University of Pennsylvania in 1983; M.S. at the Pennsylvania State University
in 1982; B.S. at the University of Nebraska in 1980.
Her book Minorities in Juvenile Justice won the 1997 Gustavus Myers Award for Human Rights in North America.
Her publications have appeared in Criminology, Justice
Quarterly, Journal of Criminal Law & Criminology, Crime
& Delinquency, Journal of Quantitative Criminology, Advances in Criminological Theory, Punishment & Society,
Corrections Management Quarterly, the Journal of Criminal Justice, Criminal Justice Policy Review, The Justice
Professional, Youth and Society, The Corporate Finance
Reader, and The Modern Gang Reader.
Editorial Board
Gary King
Harvard University
Paul Tracy
University of Texas at Dallas
Foreword
Not long ago, and perhaps still today, many would expect
an encyclopedia of social measurement to be about
quantitative social science. The Encyclopedia of Social
Measurement excellently defies this expectation by
covering and integrating both qualitative and quantitative
approaches to social science and social measurement. The
Encyclopedia of Social Measurement is the best and strongest sign I have seen in a long time that the barren opposition between quantitative and qualitative research,
which has afflicted the social sciences for half a century,
is on its way out for good. As if the Science Wars proper
between the social and natural sciences were not
enough, some social scientists found it fitting to invent
another war within the social sciences, in effect a civil
war, between quantitative and qualitative social science.
Often younger faculty and doctoral students would be
forced to take sides, and the war would reproduce within
disciplines and departments, sometimes with devastating
effects. This, no doubt, has set social science back.
We cannot thank the editors and contributors to the
Encyclopedia of Social Measurement enough for showing
us there is an effective way out of the malaise.
This volume demonstrates that the sharp separation
often seen in the literature between qualitative and quantitative methods of measurement is a spurious one. The
separation is an unfortunate artifact of power relations and
time constraints in graduate training; it is not a logical
consequence of what graduates and scholars need to
know to do their studies and do them well. The Encyclopedia of Social Measurement shows that good social science is opposed to an either/or and stands for a both/and
on the question of qualitative versus quantitative methods.
Good social science is problem-driven and not methodology-driven, in the sense that it employs those methods which best help answer the research questions at hand.
Preface
Some methodological preferences are discipline-specific. Some can be linked to a specific field of study or research topic; others, related to time
and location, coincide with how new ideas and advances in
technology are shared. Sometimes we don't even agree
on what is the appropriate question we should try to answer!
Although our views differ on what is ideal, and even on
what are the appropriate standards for assessing measurement quality, social scientists generally do agree that the
following five issues should be considered:
1. We agree on the need to be clear about the scope and
purpose of our pursuits. The benchmarks for
evaluating success differ depending on whether
our intent is to describe, explain, or predict and
whether we focus extensively on a single subject or
case (e.g., person, family, organization, or culture) or
more generally on patterns among many cases.
2. We agree on the need to make assurances for the
ethical treatment of the people we study.
3. We agree on the need to be aware of potential
sources of measurement error associated with our
study design, data collection, and techniques of
analysis.
4. We agree it is important to understand the extent to
which our research is a reliable and valid measure of
what we contend. Our measures are reliable if they
are consistent with what others would have found in
the same circumstances. If our measures also are
consistent with those from different research circumstances, for example in studies of other behaviors
or with alternate measurement strategies, then
such replication helps us to be confident about the
quality of our efforts. Sometimes we'd like the results
of our study to extend beyond the people
and behavior we observed. This focus on a wider
applicability for our measures involves the issue of
generalizability. When we're concerned about an accurate portrayal of reality, we use tools to assess
validity. When we don't agree about the adequacy
of the tools we use to assess validity, sometimes the
source of our disagreements is different views on
scientific objectivity.
5. We also agree that objectivity merits consideration,
although we don't agree on the role of objectivity or
our capabilities to be objective in our research. Some
social scientists contend that our inquiries must be
objective to have credibility. In a contrasting view of
social science, or epistemology, objectivity is not possible and, according to some, not preferable. Given
that we study people and are human ourselves, it is
important that we recognize that life experiences
necessarily shape the lens through which people
see reality.
Besides a lack of consensus within the social sciences,
other skeptics challenge our measures and methods. In
what some recently have labeled the "science wars," external critics contend that social scientists suffer "physics
envy" and that human behavior is not amenable to scientific
investigation. Social scientists have responded to antiscience sentiments from the very beginning, such as
Émile Durkheim's efforts in the 19th century to identify
"social facts." As entertaining as some of the debates and
mudslinging can be, they are unlikely to be resolved anytime soon, if ever. One reason that Lazarsfeld and
Rosenberg contend that tolerance and appreciation for
different methodological pathways make for better science
is that no individual scientist can have expertise in all the
available options. We recognize this now more than ever, as
multidisciplinary teams and collaborations between scientists with diverse methodological expertise are commonplace, and even required by some sources of research
funding.
Meanwhile, people who can be our research subjects
continue to behave in ways that intrigue, new strategies are
proffered to reduce social problems and make life better,
and the tool kits or arsenals available to social scientists
continue to grow. The entries in these volumes provide
useful information about how to accomplish social measurement and standards or rules of thumb. As you learn
these standards, keep in mind the following advice from
one of my favorite methodologists: "Avoid the fallacy fallacy. When a theorist or methodologist tells you you cannot
do something, do it anyway. Breaking rules can be fun!"
(Hirschi, 1973, pp. 171-172). In my view nothing could be
more fun than contemporary social science, and I hope this
encyclopedia will inspire even more social science inquiry!
In preparing this encyclopedia the goal has been to
compile entries that cover the entire spectrum of measurement approaches, methods of data collection, and techniques of analysis used by social scientists in their efforts
to understand all sorts of behaviors. The goal of this project
was ambitious, and to the extent that the encyclopedia is
successful there are many people to thank. My first thank
you goes to the members of the Executive Advisory Board
and the Editorial Advisory Board who helped me to identify
my own biased views about social science and hopefully to
achieve greater tolerance and appreciation. These scientists helped identify the ideal measurement topics, locate
the experts and convince them to be authors, review drafts
of the articles, and make the difficult recommendations
required by time and space considerations as the project
came to a close. My second thank you goes to the many
authors of these 356 entries. Collectively, these scholars
represent well the methodological status of social
science today. Third, I thank the many reviewers whose
generous recommendations improved the final product.
In particular I extend my personal thanks to colleagues
at the University of Texas at Dallas, many of whom participated in large and small roles in this project, and all of
whom have helped me to broaden my appreciation of social science.
KIMBERLY KEMPF-LEONARD
Access
Peter Kirby Manning
Northeastern University, Boston, Massachusetts, USA
Glossary
access A working opportunity to gather data in a social unit;
the process of obtaining access may be met with success,
partial success, or failure.
access types Can be preexisting, worked through via a series
of stages, electronic, and global in implications.
craft vs. organizational studies One to two researchers
engage in craftwork, but larger teams require organization
(administration, accountability, and a hierarchy).
natural history of access Stages of access: preliminary work,
the approach, entry, and role relations (or secondary
access).
social units A working taxonomy of what might be accessed:
places, organizations, scenes, databases, or persons.
study purpose A motivational account that focuses, ideally,
on a scientific and intellectual aim.
study rationale A succinct summary of the aims and
purposes of the research, usually vague initially.
All science requires access of some kind. In social measurement, this includes access to socially constituted data
sources and to individuals in naturalistic settings. Access
must be considered as a continuous process rather than as
a single deciding point, and this process should be seen as
complex, not as a linear progression of entries. Access can
be sought in places, organizations, scenes, interactions,
databases, and persons. The efforts required are based on
the foci and rationale of the research, and although
these may not be firmly specified at the outset of
a study, they must be developed by the time of analysis
and report writing. Important issues arise prior to acquisition of access: the size and organization of the research
group, preexisting or ongoing access, and power relations
and access. A natural history of access (preliminaries/
preparation, the approach to the unit, the entry process,
and role relations, or secondary access) is outlined here.
Introduction
Research of socially constituted data sources and people
may require access to naturalistic social life settings,
rather than artificially created settings. Successful social
measurement of extant social settings requires consideration of a series of access points and access levels, from the
initial steps through the completion of the study.
Social Units
It is useful and perhaps necessary to subdivide the process
of obtaining access, and consider first the social units, and
their related patterning of interactions, to be accessed.
The units to be accessed range from the most general to
the most specific:
Places. Organized interactions occur both publicly
and privately. Public places that may be accessed include
parks, parades, restaurants, coffee bars, lobbies, and many
other easily accessed areas.
Organizations. Authoritatively coordinated and characterized by dense interaction of collections of people in a given ecological location, organizations contain internal social units that may be studied even more intensely than the whole organizational entity.
Scenes. Characteristic encounters, episodes, and situations that are key or thematic within the setting vary
from the banal (such as making change, emptying garbage, or attending meetings or dances) to the quasi-sacred
(such as baptisms, weddings, or other ritual events). The
idea of scenes includes typical or characteristic interaction
patterns found within the settings.
Databases. Collections of files, manuscripts, papers,
clippings, or other archived material, in electronic or
paper format, provide global access.
Persons. Access to individuals, which includes physical access (e.g., studies of tattooing, disabilities, or
maladies), is circumscribed by federal, state, and committee
guidelines that oversee issues of human rights and protect
the privacy and dignity of individuals and study participants.
Foci
Sociologists are concerned more with access to organizations, settings, and scenes in the sense provided here, and
are less likely to seek access that invades privacy. Each
social unit has a front stage, so to speak, as well as a back
stage. Access to these unit components for any sort of
study (even a brief survey) is often nested. For example,
access to organizations is one initial step in gaining access
to some other unit (records or recorded and memorialized
daily routines), to participants, and/or to some part of the
back stage of the studied social unit. Depending on the
concern of the study, there may be relevant subparts of
access to front stage material (e.g., what social means are
involved in occupational routines), including the setting
(props and paraphernalia), appearance (status displays),
and manner (expectations of performance). Social science
often contrasts the ideal and the real, and contradictions
of such. Consider Enron. The front stage elements, such
as the accounting system, the rationale in choice of published materials, the rhetoric used, and the public expectations (stockholders), may be fruitfully contrasted with
conventions for concealing losses and spreading expenditures and indebtedness and the corruption in auditing
reviews. To do so, however, requires some form of access
to nonpublic (back stage) information.
Purpose or Rationale
One of the most difficult matters to articulate in the access
process, and in developing the approach, before, during,
and after the negotiations for access, is creating a workable
purpose (a motivational account) and a brief rationale for
the study. This may take some time, and is not equivalent to
simply articulating a precise aim or theoretical grounding of
the study, nor to the one-liner description that might be
used in casual conversation. Clarity of purpose should go to
motive and expected rewards. Ideally, such clarity might
ensure that creative research and its public dissemination
would govern the work, rather than writing an exposé,
or seeking financial gain. The long and emotionally draining
efforts by John Van Maanen, for example, to obtain access to
the Union City (large urban) police department were
finally successful through indirect sponsorship of a local
professor. Van Maanen explains that he used vague terms of
reference with the police chief initially, and later, as he
negotiated further access. The present author used similar
omissions in London in 1973, expressing simply interest in
the job (of policing) and by implication how it differed from
policing in the United States. This vagueness was sharpened
after the initial entree, and remarks informants had made
about what they thought was being studied were echoed!
Although these glittering generalities serve well early on,
because often a field study is not precisely focused initially
(or is defocused, according to Jack Douglas), a stated rationale must be derived eventually. Mitch Duneier's candid
statement of his lack of precise grounding is a good example
of an implicit rationale: his aim was to gather data and
write a good book. A stated research purpose is essential
to the eventual completion of a project, as is honesty to self,
to informants, and to respondents. This clarity of focus
generally emerges, not at once, because, as Kierkegaard
wrote, life is lived forward but understood backward. If
the aim is development of a grounded theory of some kind,
this formulation of an account or vocabulary of motives, and
specification via data, feedback, and reformulation is an
intrinsic aspect of the process.
Craft versus Organizational Studies
Access issues differ depending on whether a study is carried out by an individual, by a loosely coupled set of teammates (a group
project with interlocking facets), as a tightly integrated,
organized, funded, and administered project, or as an
established, funded, ongoing field study. Although most
field studies are done by one person, teams of fewer than
four people are still engaged in craftwork. The major
contrast is between small (craft) studies and those involving more than three people; the larger groups are usually
funded and include a hierarchy of administration and the
requisite coordination. The larger size of the study group
entangles the access process in bureaucratic maneuver;
the access and administration of the project may be complex, handled by administrators who are not in the field,
and may not be the responsibility of the field researchers.
In effect, in larger groups, craftwork in the field is no
longer sufficient to manage the project, but managerial
decisions will directly affect the work in the field.
Preliminaries
Preliminaries involve preparation. Preparation may involve study of the setting or organization, its history
and traditions, the people interacting within it, and organizational ideology and culture. Preparation may also involve assembling a rationale and gathering sponsorship
and/or funding. After obtaining broad access, cultivating relationships and negotiating access to targeted
persons or data may be required. Several alternative strategies for access may be required in addition to managed
interactions (in the unit, or on site), such as developing
ways around obstacles to information. Generally, one or
more strategies will fail to yield the necessary results.
A variety of instruments are generally assembled in
a field study, and these may affect access. Different approaches (use of informants; formal/informal, structured/
unstructured, or face-to-face interviews; questionnaire
mailings; observation) may require different modes of
access.
As a general rule, within any organization, social scientists have been most successful in studying the powerless and lower level participants, e.g., small-time criminals
and ex-cons, young and midlevel gang members, officers
on the street, solo lawyers, and failed members of marginal groups. This is perhaps equivalent to studying academic life by interviewing only those who failed to get
tenure, life-time associate professors, or those who
dropped out of graduate school or were thrown out for
cheating. The systematic work of Mike Useem alone has
focused on the "fat cats" and the "top dogs" in recent
years. Professor of sociology Jack Katz has argued that
the absence of literature reports on the powerful within
social organizations may be a function of lack of access,
social scientists' choices of subjects, or the elites' active
strategies that prevent close study.
The Approach
The published literature suggests that the more a research
effort is sponsored by powerful organizations and persons,
the more likely access will result. It is possible this is an
artifact based on what is published (and the need to write
up results as a part of the contract or grant) rather than
ease of access. Seeking sponsorship is the opening move in
a game. In the early days of police research, for example, it
was rare that the researchers did not acknowledge the
Entry
Entry is profoundly shaped by what is being entered.
Access to public places or to organization front stages
(the official greeters and receptionists in organization offices), and even to other settings of interest, may be easily
available without egregious costs or efforts. Access to bars,
parks, coffeehouses, and public gatherings is easily
accomplished. Semipublic events such as graduations,
weddings, and religious services are quite accessible.
However, after the preliminaries for researching organizations have been accomplished, very often negotiations
for further access flounder. For example, in a large English constabulary where the present author had personal
sponsorship, a presentation, lunch, a tour of the crime
analysis facility, and introductions to the Deputy Chief
Constable went well, but were followed by instructions to
write a letter to the Chief Constable for further access.
This all looked promising until the letter of refusal came
a week or so later. There was no explanation for the refusal, but an indirect inquiry revealed that the Chief Constable was about to resign, and the new deputy to be
appointed and the people running the crime analysis
unit were wary of criticism of a nascent undertaking. Another study involved being given initial sponsorship by
a London officer visiting Michigan State University
on a study bursarship; the sponsorship was linked to
a friend of the London officer, i.e., a Chief Superintendent in the Metropolitan London Police (now called
Metropolitan Police Services). A phone call to the
Chief Superintendent (yielding the response What
middle manager, clerk, or trusted administrative assistant). Here, the usual vagueness obtains. Once
negotiations ensue, those at the point of access and the
access-producer, or gatekeeper, will shape the initial
roles occupied by the researcher, and hence data access
to some extent. Several researchers have suggested that
fieldworkers should take on a fabricated, false, yet agreeably pleasant persona in order to gather data; others disagree with this approach. The stereotypes of academics as
confused, abstracted, distant, and slightly otherworldly
can be both an advantage and a disadvantage. With police,
who are attuned to lies and self-serving fronts, it may be
essential to disagree directly with their views, and to take
positions, particularly concerning issues such as gender
relations (sexism) and race bias.
As a study progresses and data are gathered (whether
from interviews, observations, or questionnaires), myths,
questions, curiosities, and unanticipated events will shape
the study and the nature of the secondary access. In the
worst-case scenario, the researcher is subpoenaed,
thrown out, and the data confiscated or destroyed.
These events cannot be fully anticipated, but they are
common and in that sense predictable. As access continues, the researcher takes on new depth or presence;
personal features, habits, nicknames, and shared humor
and jokes make data gathering more contextual and sensitive to gender, appearance, and interests. Jennifer Hunt,
for example, in studies of the New York Police Department, was treated with great ambivalence by the officers,
but her ability to shoot accurately and well was ultimately
acknowledged and was a source of prestige. Responses to
such events and patterns of interaction shape subsequent
access and unfolding or closing of opportunities. One of
the advantages of a multiperson team study is that people
talk to researchers about the other researchers, thus
facilitating role-building activities. Roles are built up
over time, not immediately thrust into a setting. Some
roles are a result of active decisions by a researcher,
but others are beyond control. Clearly, gender, sexual
orientation, and sexual preferences shape secondary
access in important ways.
The question of what sort of help and assistance
a researcher gives to the organizations members studied
also arises in fieldwork (e.g., in an ambulance, at a fight,
when gang members are fleeing police). Does a researcher
carry a gun, wear a bulletproof vest, engage in fights, do
medical procedures, type up documents, count money or
evidence seized in raids, run errands for coffee, or become
a "gofer"? In general, these little duties become bases
for reciprocity and obligation that exceed the narrow purpose of the research. For example, anthropologists in
Chiapas always overpaid their informants just enough
to keep them out of debt, but not so much as to elevate the
standard rate among the Zinacantecos. This encouraged
the informants to depend on the anthropologists for loans.
Future Issues
Global Ethics and Access
The points made in this article assume that the investigator or team has at least nominally set out a purpose and
that the general purpose is known by the studied groups.
Secret observers and disguised observations raise quite
different questions about access, because access involving
concealed observers in whole or part is patterned by their
roles as they defined them within the organization, or only
slightly expanded versions of these roles.
When access is gained with a broad mandate and/or
a secret purpose that either is not disclosed or is hidden
from the people being studied (or when they do not understand what is happening, given lack of research experience or because of poor reading and language skills),
then access and field tactics become fraught with ethical
questions. James Neel and Napoleon Chagnon entered
the Amazonian forest to study the Yanomamo Indians as
a genetic population, seeking to establish whether the
dominance of headmen shown through fighting and conquests led to a natural selection process that sustained
warfare and conflict and the social structure. Holding fast
to their aim, Neel and Chagnon would not provide
medical treatments; staged, stimulated, and filmed fights;
gathered genealogies that were angering and a violation of
tribal traditions; and in some sense eroded an already thin
and dying culture. In this research, access increased vulnerability of the tribe to diseases and outside forces and
contributed to the shrinking number of the remaining
isolated, preliterate groups.
Comment
Fieldwork requires primary and secondary access, and
some types of access are more troubling and will be negotiated more than others. Fieldwork requires diligence
and open-mindedness. The shifting mandate of a project
will move people in various directions, but research
should always be guided by centered and peripheral concerns, i.e., the purpose of the research and the questions
posed as well as the epistemological assumptions of gathering and seeking to access data. In addition, as every good
research paper should state, pursuing access to new areas
will produce the need for more research.
Further Reading
Becker, H. S. (2001). The epistemology of qualitative research.
In Contemporary Field Research (R. Emerson, ed.),
pp. 317-330. Waveland Press, Prospect Heights, IL.
Berreman, G. (1962). Behind Many Masks. Monograph 4.
Society for Applied Anthropology, Chicago.
Dalton, M. (1962). Men Who Manage. John Wiley, New York.
Accounting Measures
Michael Bromwich
London School of Economics and Political Science, London,
United Kingdom
Glossary
accounting measurement converting physical numbers of
items (such as physical quantities of sales and physical units of
raw materials and components) into monetary amounts.
audit the validating of the financial reporting package and the
confirmation that its contents comply with extant accounting
regulations by independent qualified accountants.
free cash flow the amount of cash and financing available for
future, as yet uncommitted activities.
intangible assets assets not having physical form.
internal goodwill the value of the firm over and above the
total value of its net assets after deducting financing.
purchased goodwill the amount by which the price of an
acquired firm exceeds its net asset value.
off-balance sheet not including some assets and liabilities in
the balance sheet.
true and fair view an unbiased accounting representation of
the economic situation of the business constrained by the
accounting methods used.
working capital short-term assets net of short-term liabilities.
Accounting Theory
Before examining the income statement and balance
sheet in some detail and giving examples of some of
the areas where care is required in interpreting their
meaning, this article provides a theoretical background
of accounting and briefly considers some controversies
and difficulties.
Prior to the mid-1960s, accounting reports were not
strongly regulated. The emphasis was on prudence (recognizing gains only when sure about them but recognizing
all losses immediately and making allowances for likely
future problems) and on conservatism (understate asset
values and overestimate liabilities, thereby reducing the
current value of the firm and current profits but increasing
possible future profits). Such accounting results were
seen as good for contracting purposes, which require as
objective information as possible but were of little help for
decision-making. The predominant accounting approach
was to ensure that the revenues of any period were
matched as best as possible with the costs of obtaining
those revenues whenever these costs were incurred. This
yields a reasonable picture of the profits earned in the
period but at the expense of a balance sheet that may have
little meaning (often being only collections of transactions
that have not yet passed through the profit-and-loss
account). In the United Kingdom, this meant that, generally, management were free to choose whatever accounting practices they wished and generally to
disclose as little as they liked, subject to what was judged
acceptable by the accountancy profession and the giving
of a true and fair view. The latter is difficult to define and
represents the exercise of professional judgment but
relates to a reasonable representation of the economic
situation of the corporation with the accounting model
being used. This means there may be several degrees of
freedom available in determining a true and fair view and
there may be more than one true and fair view. Moreover,
via conservatism and prudence, management were able to
build up hidden reserves of profits in good times and
release them in bad times (thus smoothing profits over time).
Profit-and-Loss Account or
Income Statement
The ability to recognize revenue in a given year may lead
to over- or understating revenues, and therefore profits,
and is being restricted by regulators. Examples of the
overrecognition of revenues include taking all revenues
at the beginning of a long-term and complex contract,
where, for example, a future warranty may have to be
fulfilled or where later environmental clean-up work is
required. Another example of the overrecognition of
revenue is the recording of payments from pseudoindependent companies (as practiced by Enron).
Most costs in the profit-and-loss account will represent
buying prices at the time of purchase, though the cost of
any opening inventories used during the year may be
calculated in a number of ways, which affects the costs
charged to the year when items have been purchased at
different prices. Depreciation represents an allocation of
the cost of a long-lived asset to the year in question. Ideally,
such depreciation should reflect the loss of value of the
asset in the year, but usually accounting depreciation
follows an arbitrary formula of which a number are permissible, the use of which thereby generates different
profits. Similarly, the cost of those intangible assets that
are allowed to be incorporated into the accounts also may
have to be spread over their lifetime on a reasonable basis.
Provisions for future losses may also be spread over
a number of years. The United Kingdom regulator and
the IASB have restricted the ability to spread costs over
years in order to avoid the possibility that management
will use this facility to smooth profits (earnings) so that bad
years are disguised using undeclared profits from better years.
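The point that different permissible formulas generate different profits can be made concrete. The sketch below uses an invented asset (the cost, life, and absence of salvage value are assumptions for illustration) and contrasts two common allocation formulas; neither is singled out by the text as the correct one.

```python
# Hypothetical asset: cost 100,000, five-year life, no salvage value.
# Two permissible depreciation formulas charge different costs to each
# year, and therefore produce different reported profits in any one year.

def straight_line(cost, life):
    """Equal charge in every year of the asset's life."""
    return [cost / life] * life

def double_declining(cost, life):
    """Charge a fixed rate (2/life) against the remaining book value.
    As sketched here it leaves a residual book value; real practice
    typically switches to straight-line in later years."""
    rate = 2.0 / life
    book, charges = cost, []
    for _ in range(life):
        charge = book * rate
        charges.append(charge)
        book -= charge
    return charges

sl = straight_line(100_000, 5)
ddb = double_declining(100_000, 5)
for year, (a, b) in enumerate(zip(sl, ddb), start=1):
    print(f"Year {year}: straight-line {a:>9,.0f}  declining-balance {b:>9,.0f}")
# Year 1 charges 20,000 under one formula and 40,000 under the other:
# the same asset, two different profit figures for the year.
```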
Conclusion
Overall, financial reports have changed enormously
over, say, the past 20 years and greater changes can
be expected but care still must be taken when using
accounting measurements. There is still conflict between seeking to provide reliable information and information useful for decision-making. The valuation
(measurement) bases used for different items still differ.
Some liabilities and many assets (especially intangibles)
are not included in the balance sheet. Income is moving
Further Reading
Benston, G., Bromwich, M., Litan, R. E., and Wagenhofer, A.
(2003). Following the Money: The Enron Failure and the
State of Corporate Disclosure. AEI Brookings Joint Center
for Regulatory Studies, Washington, DC.
De Roover, R. (1956). The Development of Accounting Prior
to Luca Pacioli According to the Account-Books of Medieval
Merchants. In Studies in the History of Accounting
(A. C. Littleton and B. S. Yamey, eds.), pp. 114-174.
Sweet and Maxwell, London.
Lewis, R., and Pendrill (2000). Advanced Financial Accounting. Pitman, London.
Penman, S. H. (2001). Financial Statement Analysis and
Security Valuation. McGraw-Hill, New York.
Wilson, A., Davies, M., Curtis, M., and Wilkinson-Riddle, G.
UK and International GAAP: Generally Accepted
Accounting Practice in the United Kingdom and under
International Accounting Standards. Tolley, London.
Whittington, G. (1983). Inflation Accounting: An Introduction
to the Debate. Cambridge University Press, Cambridge, UK.
Administrative Records
Research
Dean H. Judson
U.S. Census Bureau, Washington, D.C., USA
Carole L. Popoff
U.S. Census Bureau, Washington, D.C., USA
Glossary
administrative record Data collected for an administrative
purpose, as opposed to a formal research data collection
effort.
coverage bias The systematic difference between the database and the population of interest.
database ontology The definition of objects that are in
a database (e.g., What is a business entity? Do sole
proprietors count as business entities? What is an address?
Is a person living in the basement at the same address or
a different address?) and the categories that are used in the
database (e.g., Does race have three, five, eight, or sixty-three categories?).
microdata Information about individual persons, families,
households, addresses, or similar objects, as opposed to
tabulations or aggregate data.
reporting lag The systematic tendency of administrative
records data to be reported in a time frame that is behind
the actual behavior of interest. For example, a person may
change addresses in one year, but that move may not be
recorded in the administrative database until a later year.
response bias The systematic tendency of respondents or
data collectors to respond in ways that differ from what
the ontology intended or from what the researcher
understood.
easy to calculate), and there are systematic response biases and coverage biases in the data that are caused by the
administrative agency and its client base. An important
concept in the use of administrative data is that of
a database ontology, or the structure of the objects in
the database and how data on those objects is collected; in particular, a database ontology that is perfectly
suitable for the administrative agency may not be at all
suitable for the researcher. In such cases, the researcher
can sometimes estimate coverage biases or translate data
from one ontology to another; but always with care.
Figure 2 Illustration of coverage differentials between the database and the population of interest (non-U.S. residents; accidental duplication; contractors accidentally included as employees; deceased; terminated but not yet entered in the database).
When the Job Training Partnership Administration used performance measures to evaluate job training outcomes, unemployment insurance (UI) wage records were proposed
as an alternative to a 13-week follow-up survey, and certain states determined to use UI records for performance
measurement. A similar measure has been proposed for
block grant evaluations under welfare reform. The immediate research question emerges: Do these different outcome measures paint the same basic picture? Can they be
used interchangeably? The answer is, in part, yes and
no: although the overall pattern of outcome measures
is similar for the two data sources, there are significant
slippages in which data captured by one source are not
captured by the other. A similar requirement is the required reporting under the Community Reinvestment
Act (CRA) of 1977. CRA requires that each insured depository institution's record in helping to meet the credit
needs of its community, especially including low-income
neighborhoods, be assessed periodically by examining
bank operating data on persons who receive loans.
A second major count-based/AR alone/aggregate use
of administrative records data is in benchmarking other
kinds of data collection or estimates. For example, the
Current Employment Statistics (CES) program is
a survey of establishments, with the intention of providing
current information about the number of jobs in various
industries. Later, when the Unemployment Insurance
wage records files are complete, these CES statistics
are then benchmarked to the UI wage records and
adjusted to account for coverage differentials. A second
example of benchmarking would be comparisons of
survey-based or demographic estimates of phenomena
(such as food stamp receipt or levels of Medicaid
insurance coverage) with independent calculations of
food stamp distribution or of Medicaid client data.
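The benchmarking logic just described can be sketched as a simple ratio calibration. The industry names and counts below are invented, and real programs apply far more detailed adjustments; this is only a minimal illustration of scaling survey figures to a later, more complete administrative total.

```python
# Hypothetical figures: survey-based job estimates by industry and a
# later, more complete administrative (UI wage-record) total.
survey_jobs = {"manufacturing": 410_000, "retail": 530_000, "services": 960_000}
ui_benchmark_total = 2_014_000  # assumed complete count from UI records

survey_total = sum(survey_jobs.values())
adjustment = ui_benchmark_total / survey_total  # coverage adjustment ratio

# Scale each survey figure so the total agrees with the benchmark.
benchmarked = {ind: n * adjustment for ind, n in survey_jobs.items()}
print(f"adjustment ratio = {adjustment:.4f}")
for ind, n in benchmarked.items():
    print(f"{ind:>13}: {n:,.0f}")
```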
A third, major user of aggregate/mixed administrative
records (with both count and model aspects) is the Bureau
of Economic Analysis (BEA), which produces the National
Income and Product Accounts (NIPA). These are the sum
of all output or expenditures, or the sum of income or
payments to business, government, and individuals. In
a simplifying assumption, over a given period of time,
one must equal the other. Expenditures (or output) are
measured by the value or final price of the nation's output
of goods and services or Gross Domestic Product (GDP).
National Income (NI) is simply GDP minus depreciation
and consumption allowances. Personal Income (PI) is NI
minus indirect business taxes and other adjustments. As
such, PI is composed of payments to individuals in the
form of wages and salaries (including personal income
taxes and nontax payments net of social insurance payments) plus income from interest, dividends, and rent,
plus transfer payments (private and public).
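Under the simplified identities just stated (the actual national accounts involve many further adjustments), the chain from GDP to personal income is direct arithmetic. All figures below are invented, in billions:

```python
# Simplified NIPA chain from the text (invented numbers, in billions):
#   NI = GDP - depreciation and consumption allowances
#   PI = NI - indirect business taxes and other adjustments
gdp = 11_000.0
depreciation_and_allowances = 1_400.0
indirect_taxes_and_adjustments = 800.0

national_income = gdp - depreciation_and_allowances
personal_income = national_income - indirect_taxes_and_adjustments
print(f"NI = {national_income:,.0f}, PI = {personal_income:,.0f}")
```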
Data for NIPA estimates come primarily from administrative records and economic census data.
Weaknesses
The first, and major, weakness of administrative records
comes under the heading of data quality. Defining data
quality is in itself a challenge. For example, what does it
mean for a measured data element to be "right"? How
"wrong" is too wrong? To illustrate with a simple example, involving a relatively simple element, consider race.
Prior to 1980, the Social Security Administration recorded
three races (White, Black, and Other or unknown); beginning in 1980, the race codes reflected five races (White,
Black, Asian or Pacific Islander, Hispanic, American
Indian or Eskimo), and included the codes for Other,
Blank, and Unknown. At the same time, the U.S. Census
Bureau collected race data using four races (White, Black,
Asian or Pacific Islander, and American Indian), which
were crossed with ethnicity (Hispanic or non-Hispanic).
In Census 2000, an additional race was added (Hawaiian
native) and a "mark all that apply" rule was applied, resulting in 63 possible combinations. Thus, if a researcher
wishes to link SSA data with Census data, the differential
recording of race over time would create substantial
comparability problems, but who can say which coding
scheme is right?
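One common remedy is a crosswalk that collapses incompatible schemes to their coarsest common categories before linkage. The sketch below is illustrative only: the code values are invented stand-ins, not the agencies' actual codes, and the collapse rule is an assumption.

```python
# Illustrative crosswalk: collapse two incompatible race-coding schemes
# to their coarsest common categories before linking records.
# Code values here are invented stand-ins, not actual SSA/Census codes.
SSA_PRE1980 = {"W": "White", "B": "Black", "O": "Other/Unknown"}
CENSUS_1990 = {"1": "White", "2": "Black", "3": "Asian/PI", "4": "AmIndian"}

def harmonize(label):
    """Map any category beyond the shared White/Black pair to the
    coarser scheme's residual category."""
    return label if label in ("White", "Black") else "Other/Unknown"

ssa_race = harmonize(SSA_PRE1980["O"])
census_race = harmonize(CENSUS_1990["3"])
print(ssa_race, census_race)  # both collapse to "Other/Unknown"
# The collapse permits linkage but discards detail -- and neither
# coding scheme can be said to be the "right" one.
```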
Every preceding comment can be echoed about every
conceivable data element, and the analyst must firmly
keep in mind that social definitions and the uses of
a database are changing, and that when a coder (either
agency personnel or an individual respondent) does not
have the categories that are appropriately descriptive,
they will most likely choose the best fitting. Choosing
the "best fitting" response, rather than choosing the
"right" one, is constructing social reality, not necessarily recording it.
Figure 4 Illustration of calibrating administrative records (AR) data to an external data source: ground-truth, carefully collected data Y (e.g., number of persons in the household receiving food stamps, and value) from a representative sample are paired with AR data X (e.g., household characteristics) to estimate a model Y = f(X). HH, households; FS, food stamps.
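A minimal sketch of the calibration idea in Figure 4, with entirely simulated data and an assumed linear form for f: fit the model on a representative subsample where the carefully collected measure exists, then apply it to administrative records that lack it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Representative calibration sample: administrative characteristics X
# paired with a carefully collected ground-truth measure Y.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one trait
true_beta = np.array([2.0, 0.8])                       # assumed for simulation
Y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Fit Y = f(X) by least squares on the calibration sample.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Apply the fitted model to administrative records lacking Y.
X_new = np.column_stack([np.ones(5), rng.normal(size=5)])
print(X_new @ beta_hat)  # calibrated estimates for the AR-only records
```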
Acknowledgments
This article reports the results of research and analysis
undertaken by Census Bureau staff. It has undergone
a more limited review by the Census Bureau than have
the official publications on which it is based. This report is
released to inform interested parties and to encourage
discussion.
Further Reading
Bureau of Economic Analysis. (2002). Updated Summary
NIPA Methodologies. Survey of Current Business, October
2002. Available on the Internet at http://www.bea.gov
Bureau of Labor Statistics. (1997). Handbook of Methods.
Bureau of Labor Statistics. Washington, D.C. Available on
the Internet at http://www.bls.gov
Age, Period, and Cohort Effects
Glossary
age effect A consequence of influences that vary by
chronological age.
age-period-cohort conundrum A specific case of the
identification problem in which the interrelated independent variables are age, period, and cohort.
birth cohort The individuals born during a given period, such
as a year or a decade (the kind of cohort usually studied in
cohort analyses).
cohort The individuals (human or otherwise) who commonly
experienced a significant event during a specified period of
time.
cohort effect A consequence of influences that vary by time
of birth or time of some other event (such as marriage) that
defines a cohort.
identification problem The situation that exists when three
or more independent variables that may affect a dependent
variable are interrelated so that the multiple correlation of
each independent variable to the others is unity.
period effect A consequence of influences that vary through
time.
Age, period, and cohort effects are estimated by sociologists and other social scientists primarily to understand
human aging and the nature of social, cultural, and political change.
The age-period-cohort conundrum can be illustrated by the use of a standard cohort table, in which
multiple sets of cross-sectional data (data for one point
in time) relating age to a dependent variable are juxtaposed and in which the intervals between the periods for
which there are data are equal in years to the range in each
age category. For instance, if 10-year age categories are
used, data gathered at 10-year intervals are presented, as
in Table I, in which the dependent variable is whether
respondents to the 1974, 1984, and 1994 American General Social Surveys said they favored the death penalty for
persons convicted of murder. In such a table, the trend
within a cohort can be traced by starting with any but the
oldest age category in the left-hand column and reading
diagonally down and to the right. For instance, according
to the data in Table I, in the cohort that was 20-29 years
old in 1974, the percentage approving of the death penalty
went from 58.2 in 1974 to 73.6 in 1984, to 79.5 in 1994.
This increase may have been an age effect, because the
cohort grew 20 years older, or it may have been a period
effect reflecting general changes in society during the two
decades covered. Or it may have been a combination of
age and period effects. In other words, in this or any other
cohort diagonal, age and period effects may be confounded. Likewise, age and cohort effects may be confounded
in each column, and period and cohort effects may be
confounded in each row, of a standard cohort table.
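Mechanically, tracing a cohort means stepping one row down and one column right per survey wave. A small sketch of that bookkeeping, using the percentages reported in Table I:

```python
import numpy as np

# Table I percentages: rows are age groups 20-29 ... 60-69,
# columns are survey years 1974, 1984, 1994.
table = np.array([
    [58.2, 79.9, 78.5],
    [67.7, 73.6, 80.0],
    [68.5, 76.4, 79.5],
    [74.8, 78.0, 78.9],
    [70.4, 76.7, 82.3],
])

def cohort_trend(table, start_row):
    """Trace one cohort: start in the left-hand column, then move one
    row down and one column right for each successive period."""
    steps = min(table.shape[0] - start_row, table.shape[1])
    return [table[start_row + k, k] for k in range(steps)]

print(cohort_trend(table, 0))  # [58.2, 73.6, 79.5]: cohort aged 20-29 in 1974
```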
It is obvious that a simple examination of Table I cannot
reveal the extent and nature of any age, period, and cohort
effects reflected in the data. What has not been evident to
many researchers interested in the age-period-cohort
conundrum is that no routinely applied statistical analysis
of the data can, by itself, be relied on to provide accurate
estimates of the effects. The reason is illustrated by the
different combinations of effects that could account for
the data in Table II, which is a standard cohort table
reporting hypothetical data. The simplest interpretation
of the data is that they reflect pure linear age effects,
whereby each additional 10 years of age produces
a five-point increase in the dependent variable. For
Table I Percentage of Respondents to the 1974, 1984, and 1994 American General Social Surveys Who Said They Favored the Death Penalty for Persons Convicted of Murder^a

Age (years)    1974          1984          1994
20-29          58.2 (385)    79.9 (368)    78.5 (467)
30-39          67.7 (279)    73.6 (289)    80.0 (651)
40-49          68.5 (265)    76.4 (231)    79.5 (578)
50-59          74.8 (257)    78.0 (181)    78.9 (387)
60-69          70.4 (187)    76.7 (155)    82.3 (238)

^a Numbers of respondents are in parentheses.

For
some dependent variables, this might be the only plausible interpretation, but, as the alternative explanations at
the bottom of the table indicate, it is not the only logically
possible one. Rather, an infinite number of combinations
of age, period, and cohort effects could produce the pattern of
variation in the dependent variable shown in the table.
When the pattern of variation is not as simple as that in
Table II, which is usually the case, the combination of
effects producing the data must be somewhat complex. It
should be obvious that no mechanically applied statistical
analysis can reveal which of the many possible complex
combinations is the correct one. One kind of complexity,
however, sometimes aids interpretation of the data. If
there is an almost completely nonlinear pattern of variation by either age, period, or cohort that is uniform
across the categories of the other two variables, as in
Table III, there is only one reasonable explanation for
the data. In the case of Table III, for instance, it is hard
to imagine that any kinds of effects besides nonlinear age effects could have produced the pattern.
Table II Pattern of Data Showing Pure Age Effects, Offsetting Period and Cohort Effects, or a Combination of Age Effects and Offsetting Period and Cohort Effects^a

Age (years)    1950   1960   1970   1980   1990   2000
20-29           50     50     50     50     50     50
30-39           55     55     55     55     55     55
40-49           60     60     60     60     60     60
50-59           65     65     65     65     65     65
60-69           70     70     70     70     70     70
70-79           75     75     75     75     75     75

^a Unstandardized regression coefficients (reference categories set at zero) for combinations of effects that reproduce these values exactly include: pure linear age effects (constant 50.0; age coefficients rising from 0.0 to 25.0 in 5.0-point steps; all period and cohort coefficients 0.0); offsetting linear period and cohort effects (constant 25.0; period coefficients rising from 0.0 to 25.0; cohort coefficients falling from 50.0 to 0.0); and combinations of the two (e.g., a constant of 28.7 with intermediate age, period, and cohort coefficients).
Table III Pattern of Data Showing Nonlinear Variation by Age, Uniform across Periods and Cohorts^a

Age (years)    1950   1960   1970   1980   1990   2000
20-29           50     50     50     50     50     50
30-39           52     52     52     52     52     52
40-49           62     62     62     62     62     62
50-59           62     62     62     62     62     62
60-69           50     50     50     50     50     50
70-79           45     45     45     45     45     45

^a One set of unstandardized regression coefficients (reference categories set at zero) that reproduces these values exactly is pure nonlinear age effects: a constant of 62.0 with age coefficients of -12.0, -10.0, 0.0, 0.0, -12.0, and -17.0 for the successive age categories and all period and cohort coefficients 0.0; offsetting nonlinear period and cohort combinations can reproduce them as well.
Consider again how the data in Table I should be interpreted. The cross-sectional data in the first column and the intracohort
trends both suggest a positive age effect on approval of
the death penalty; all eight 10-year intracohort changes
shown are upward. However, the upward trend at ages
20-29 from 1974 to 1984 is evidence for rather strong
period influences toward approving of the death penalty.
Therefore, the intracohort trends could well be entirely
period rather than age effects. And the positive relationship between age and support of the death penalty shown
for 1974 could have resulted from earlier anti-death-penalty period influences that affected younger persons
more than older ones. Of course, no serious cohort study
of attitudes toward the death penalty would be based only
on the data in Table I, especially because the question
yielding the data has been asked on other American General Social Surveys, nor would the study stop with a simple
examination of tabular data. However, given the basic
evidence available, a definitive answer as to whether
there has been an age effect on attitudes toward the
death penalty would elude even the most sophisticated
study possible.
Informal means of examining cohort data may not be
satisfying to persons who have a high need for certainty,
but accepting the fact that there is always some ambiguity
in the evidence concerning age, period, and cohort effects
is more scientific than dogmatically embracing statistical
model estimates that are likely to be substantially in error.
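The algebra behind the conundrum is easy to exhibit: because cohort equals period minus age, any design matrix containing all three as linear terms is rank deficient, and whole families of coefficient vectors fit the data identically. A minimal sketch (the values are arbitrary):

```python
import numpy as np

# Age, period, and cohort measured in the same units: cohort = period - age,
# so the three predictors are exactly linearly dependent.
age = np.array([20, 30, 40, 50, 60, 70] * 3)
period = np.repeat([1974, 1984, 1994], 6)
cohort = period - age  # exact linear dependency

X = np.column_stack([np.ones_like(age), age, period, cohort])
print(np.linalg.matrix_rank(X))  # 3, not 4: one column is redundant

# Consequence: distinct coefficient vectors produce identical fits.
# Shifting (b_age, b_period, b_cohort) by (+k, -k, +k) changes nothing,
# because k*age - k*period + k*cohort = 0 for every observation.
b1 = np.array([0.0, 0.5, 0.2, 0.1])
k = 7.0
b2 = b1 + np.array([0.0, k, -k, k])
print(np.allclose(X @ b1, X @ b2))  # True
```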
Conclusions
Social and behavioral scientists have formulated many
important hypotheses about the effects of age, period,
and cohort, and research to test these hypotheses should
not be abandoned. However, researchers should recognize that definitive evidence concerning many of the hypothesized effects may never be forthcoming. Belief that
statistical age-period-cohort models can provide such
evidence has led to much pseudorigorous research and
almost certainly to many incorrect conclusions. If statistical model testing is used to estimate the effects, the
credibility of the estimates should be evaluated on the
basis of theory, common sense, and a priori knowledge
of the phenomena being studied. It is important to avoid
letting the model testing create an illusion of rigor that will
prevent the proper application of human judgment in the
research process. It is often preferable to skip the statistical model testing and proceed directly to more informal
means of distinguishing age, period, and cohort effects.
Although these methods are fallible, they are generally
recognized as such and thus are less likely than formal
model testing to lead to overly confident conclusions.
Further Reading
Abramson, P. R., and Inglehart, R. (1995). Value Change in
Global Perspective. University of Michigan Press, Ann
Arbor, Michigan.
Alwin, D. F. (1991). Family of origin and cohort differences in
verbal ability. Am. Sociol. Rev. 56, 625-638.
Blalock, H. M., Jr. (1967). Status inconsistency, social mobility,
status integration, and structural effects. Am. Sociol. Rev.
32, 790-801.
Converse, P. E. (1976). The Dynamics of Party Support:
Cohort Analyzing Party Identification. Sage Publications,
Beverly Hills, California.
Glenn, N. D. (1987). A caution about mechanical solutions to the identification problem in cohort analysis:
A comment on Sasaki and Suzuki. Am. J. Sociol. 95,
754-761.
Aggregation
D. Stephen Voss
University of Kentucky, Lexington, Kentucky, USA
Glossary
aggregation bias Systematic inaccuracy induced in a method
of statistical inference because of patterns in the process of
grouping data.
areal units problem The recognition that the same individuallevel data can produce a wide variety of aggregate-level
statistics depending on the geographical areas into which
the individuals are grouped.
cross-level inference When analysts attempt to generate
statistics, derive estimates, or draw inferences for units of
analysis at one level of measurement using data measured
at a different level.
ecological inference A generic phrase describing the
attempt to estimate individual-level behavior using aggregate data.
ecological regression Usually associated with Leo Goodman,
refers to the use of linear regression to estimate individuallevel behavior using aggregate data.
King's EI A method of ecological inference based on
maximum-likelihood estimation using the truncated bivariate normal distribution. King's approach has no formal
name, but it is commonly called EI after the software
distributed to implement it.
neighborhood model An ecological inference model that
assumes variations from aggregate unit to aggregate unit are
entirely contextual, and therefore not compositional.
weighted average A version of the standard mean or average
that is invariant to the level of aggregation because of the
method of weighting each unit.
Introduction
Social science research often relies on quasi-experimentation, in which the researcher analyzes data produced
incidentally by the social system rather than data produced experimentally as a direct and controlled outgrowth of the work. In fact, it is not unusual for analysts
to exercise minimal control over the data available for
research, aside from choosing among one or more preformed data sets to exploit in secondary analysis. Analysts
This central insight has remained with the social sciences as they have grown more formalized. Each social
science discipline has its own language for discussing the
problem of linking theory and data at the proper unit of
analysis. Economists have termed the danger a fallacy of
composition. Microeconomic and macroeconomic patterns and relationships may differ sharply. Anthropologists worry that the results of etic observation (simplified,
observation from the perspective of the outsider) may
produce sharply different results from emic observation
(again, simplified, observation from the perspective of
someone within a social system). Quantitative researchers
have long recognized that regressions performed on
a cross-section of data will produce different results
from those produced within a single panel or identifiable
unit across time. In each case, the specific problem falls
into this same general class: theories may have different
observable implications, depending on the sort of data
collected and how they are analyzed.
Cross-Level Inference
Aggregation most commonly poses a threat to inference
when the theory and the data for an analysis appear at
different levels. Practitioners are especially aware of the
pitfalls of cross-level inference when they need to derive
individual-level quantities of interest from data aggregated to a higher level. In these circumstances, it is common to fear an ecological fallacy caused by aggregation
bias. However, the problem can be just as serious in less
familiar circumstances, as when the analyst wishes to understand the phenomena driving aggregate-level patterns
using only individual-level data, or when data for an analysis appear at multiple levels of aggregation.
                     Left-wing vote   Right-wing vote   No vote        Unregistered
Voting-age whites    ?                ?                 ?              ?
Voting-age blacks    ?                ?                 ?              ?
All races            V_iL             V_iR              (R_i - T_i)    (1 - R_i)

^a Table presents the typical situation faced in areal unit i, for which marginal data are available but cross-tabulations are not.
Ecological Regression
Ecological regression is the most common approach for
estimating how groups voted. It is sometimes called
Goodman's ecological regression, after the scholar who
developed it, but most applications of the technique stray
far from Goodman's intent. The basic intuition is fairly
simple. Goodman argued that ecological inference
using linear regression would be possible if underlying racial vote choices were constant (e.g.,
bbi bbi1 bbi2 bb ) or at least constant aside
from random variation. Researchers could collect data
from areal units smaller than the aggregation of interest
and run one or more linear regressions to estimate
quantities missing from the cross-tabulation.
Consider a one-stage version of the equations introduced above, with the racial support rates expressed as constants:

V_iL = β^b X_i + β^w (1 − X_i).
Assume a normally distributed error term ε_i to make room for stochastic processes, and this equation looks like a linear regression formula with no intercept, which makes sense, given that a candidate who receives neither white nor black votes will not have any votes at all.
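To make the procedure concrete, consider a minimal sketch in Python; the data are simulated and the true rates (β^b = 0.9, β^w = 0.4) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
beta_b, beta_w = 0.9, 0.4            # assumed true racial support rates

X = rng.uniform(0.05, 0.95, 500)     # black share of each areal unit
V = beta_b * X + beta_w * (1 - X) + rng.normal(0, 0.02, 500)

# Goodman's specification: regress V on X and (1 - X) with no intercept.
A = np.column_stack([X, 1 - X])
est, *_ = np.linalg.lstsq(A, V, rcond=None)
print("estimated beta_b, beta_w:", est.round(3))   # approximately [0.9, 0.4]
```

When the constancy assumption holds, the two fitted coefficients recover the group rates directly.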
Suppose, however, that white support contains a shifting parameter, β^w_i = g_0 + g_1 X_i, a function of racial density. Straightforward ecological regression, performed on data of this sort, will not estimate the desired quantities (i.e., β^b and the weighted average of the β^w_i, presuming all other assumptions of the model were correct). Instead, it would turn up the following:

V_iL = β^w_i + (β^b − β^w_i) X_i + ε_i
     = (g_0 + g_1 X_i) + (β^b − g_0 − g_1 X_i) X_i + ε_i
     = g_0 + (β^b − g_0 + g_1 − g_1 X_i) X_i + ε_i
     = g_0 + [β^b − g_0 + g_1 (1 − X_i)] X_i + ε_i.
The estimated intercept, assumed to represent white support for Wallace, will be g_0 only. How does g_0 differ from the real white rate? If the contextual effect g_1 is positive, as hypothesized, then the missing component is also positive, because X_i never falls below zero. The estimate of white leftism will be too low. Similarly, the estimated black vote for left-wing candidates will be too high.
Fixing the aggregation bias in this case requires a simple application of the distributive property:

V_iL = g_0 + (β^b − g_0 + g_1) X_i − g_1 X_i² + ε_i.
Results generated from this equation could serve as ecological-regression estimates. Recovering the black rate, for example, requires adding the estimated ĝ_0 to, and subtracting ĝ_1 from, the coefficient on X_i. The estimate for whites is more complicated, because β^w_i contains X_i and therefore requires a weighted average, but it is also obtainable.
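The algebra is easy to verify by simulation. The sketch below (all parameter values are assumptions for illustration) generates areal units in which white support shifts with racial density, shows that the naive regression understates average white support, and recovers β^b from the quadratic specification:

```python
import numpy as np

rng = np.random.default_rng(0)
g0, g1, beta_b = 0.30, 0.25, 0.90        # assumed true parameters

X = rng.uniform(0.0, 0.6, 1000)          # black share of each areal unit
beta_w = g0 + g1 * X                     # white support shifts with racial density
V = beta_w * (1 - X) + beta_b * X + rng.normal(0, 0.01, 1000)

# Naive ecological regression: intercept read as the white rate.
naive = np.polynomial.polynomial.polyfit(X, V, 1)
print("naive white rate:", naive[0].round(3))    # ~0.31, below the true mean ~0.375

# Quadratic specification: the fitted value at X = 1 is beta_b itself.
quad = np.polynomial.polynomial.polyfit(X, V, 2)
print("recovered beta_b:", quad.sum().round(3))  # g0 + (beta_b - g0 + g1) - g1 = beta_b
```

Summing the three estimated coefficients evaluates the fitted polynomial at X_i = 1 (an all-black unit), which is one convenient way of carrying out the adjustment described above.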
The problem with this simple fix arises if black behavior also changes with racial density, say by becoming more Republican in whiter communities. A second shifting parameter, β^b_i = r_0 + r_1 X_i, would produce the following:

V_iL = (g_0 + g_1 X_i) + (r_0 + r_1 X_i − g_0 − g_1 X_i) X_i + ε_i
     = g_0 + (r_0 + g_1 − g_0) X_i + (r_1 − g_1) X_i² + ε_i.
Now there are four constants to identify instead of three: g_0, g_1, r_0, and r_1. Doing so is impossible, though, because the equation produces only three coefficients. The results are underidentified.
Aggregation bias will be present in a naive ecological regression even if the coefficients are not a direct linear function of racial density (i.e., of X_i) but instead change based upon some other community demographic that correlates with racial density and therefore creates an indirect relationship. For example, the poverty rate might increase as X_i increases, and poorer whites might behave differently from whites in wealthier communities.
Similarly,

β^b_i = [V_iL − β^w_i (1 − X_i)] / X_i.
EI has reduced the range of possible estimates to a series of exclusive pairs, all within the range of possible values. The same process is possible for the observed behavior in each areal unit i.
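Those bounds can be computed directly from each unit's marginals. In the small sketch below (the function and example values are ours, for illustration), the accounting identity T_i = β^b_i X_i + β^w_i (1 − X_i) confines the pair (β^b_i, β^w_i) to a line segment whose endpoints the function returns:

```python
def tomography_segment(T, X):
    """Deterministic bounds on the black rate (beta_b) in one areal unit,
    with the white rate (beta_w) implied by the accounting identity."""
    lo_b = max(0.0, (T - (1 - X)) / X)   # floor on beta_b (even if beta_w = 1)
    hi_b = min(1.0, T / X)               # ceiling on beta_b (even if beta_w = 0)
    beta_w = lambda b: (T - b * X) / (1 - X)
    return (lo_b, beta_w(lo_b)), (hi_b, beta_w(hi_b))

# Nearly all-white unit: beta_w is pinned down tightly, beta_b barely at all.
print(tomography_segment(T=0.55, X=0.05))  # beta_w confined to about [0.53, 0.58]
# Heavily black unit: beta_b is pinned down tightly, beta_w barely at all.
print(tomography_segment(T=0.55, X=0.90))  # beta_b confined to about [0.50, 0.61]
```

Each such segment corresponds to one line in the tomography plot discussed next.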
If we graph possible black and white behavior, then, the known information about each parish will be represented by a line segment, the set of all possible combinations. Figure 1 presents a tomography plot, King's name (drawn from medical imaging) for the combined line segments of all areal units in an electoral district. It contains data for 64 Louisiana parishes from 1968 and represents possible racial registration rates. This plot summarizes all the deterministic information contained in the registration and demographic data available; no assumptions were required to produce it. Horizontal lines, that is, lines with very narrow bounds for white registration, correspond to almost entirely white parishes. They contain so few African Americans that we know quite precisely how whites behaved (β^w) but almost nothing about blacks (which is why the slant of such a line allows only a small range of possible values on the y axis but any value on the x axis). A segment becomes more vertical, however, as the black population increases: we grow less sure how whites behaved but correspondingly more certain about black registration (β^b).
[Figure 1. Tomography plot of the 1968 Louisiana registration data: each parish's feasible (β^b, β^w) combinations form a line segment, with β^b on the x axis and β^w on the y axis, both running from 0.00 to 1.00.]
Conclusion
The last section focused on one particular, and particularly intractable, example of data aggregation. It exhibits
all of the traits of aggregation that appear in other methodological contexts: loss of information, statistics that vary
based upon the level or unit of analysis, a search for statistics that are invariant to the level at which they are
computed so that they will tap into the underlying quantities of interest. Nonetheless, there is a wide variety of
examples in social measurement that represent attempts
to deal with aggregated or grouped data, each with its own
vocabulary and notation, but usually with the same underlying concerns.
Sometimes data are clumped across time, for example, when they are computed as five-year averages. This
form of aggregation is usually considered a concern when
performing time-series analysis. Often researchers attempt to combine their data into indices or scales,
which often result in a loss of information at the same
time they provide the researchers with relatively tidy
proxies for a social phenomenon. Sometimes researchers
wish to combine individual-level data with aggregate data,
or for some other reason wish to work with multiple levels
of data at once. Events may not be measured directly, but
instead may be summed into event counts or recorded
according to their durations, both forms of grouping. In
short, aggregation as a concern in social measurement is
an almost universal consideration across numerous
methods and numerous disciplines.
Further Reading
Achen, C. H., and Shively, W. P. (1995). Cross-Level
Inference. University of Chicago Press, Chicago, IL.
Dogan, M., and Rokkan, S. (1969). Quantitative Ecological
Analysis in the Social Sciences. Massachusetts Institute of
Technology, Cambridge, MA.
Duncan, O. D., and Davis, B. (1953). An alternative to ecological correlation. Am. Sociol. Rev. 18, 665–666.
Freedman, D. A., Klein, S. P., Sacks, J., Smyth, C. A., and Everett, C. G. (1991). Ecological regression and voting rights. Eval. Rev. 15, 673–711.
Goodman, L. A. (1953). Ecological regression and behavior of individuals. Am. Sociol. Rev. 18, 663–664.
Goodman, L. A. (1959). Some alternatives to ecological correlation. Am. J. Sociol. 64, 610–625.
Grofman, B., and Migalski, M. (1988). Estimating the extent of racially polarized voting in multicandidate elections. Sociol. Meth. Res. 16, 427–454.
Hannan, M. T. (1971). Aggregation and Disaggregation in
Sociology. D. C. Heath and Co, Lexington, MA.
Michael R. Dowd
The University of Toledo, Toledo, Ohio, USA
Glossary
ad hoc Improvised and often impromptu; an ad hoc model is
one in which the structural equations have not been derived
from microfoundations (i.e., either utility or profit maximization) but rather are postulated from the beliefs of the
modeler.
aggregative model A macroeconomic model with structural
equations postulated by the modeler to describe the
economic relationships in the macroeconomy.
balanced growth path A rate of economic growth along which capital-per-worker and output-per-worker grow at a constant rate. In this situation, the growth of output-per-worker is determined solely by the rate of technological progress.
break-even investment The level of investment in a growth
model that is required to keep the capital-to-effective-labor
ratio constant.
fiscal policy Altering government expenditures and taxes to
affect the level of national income, prices, unemployment,
and other key economic variables.
growth model A framework designed to examine the long-run movements in output, capital, and labor in terms of economic growth.
IS curve A curve that illustrates the combinations of interest
rate and income consistent with equilibrium between
investment (I) and saving (S).
IS-LM model A postulated static aggregative macroeconomic framework of aggregate demand that considers the interaction of its real side (investment-saving, or IS) with its nominal side (money demand and supply, or LM) to determine equilibrium income and interest rates.
LM curve A curve that illustrates the combinations of interest
rate and income consistent with equilibrium between
money demand (L) and money supply (M).
National income identity:
Y = C + I + G,  (1)

Consumption function:
C = C(Y − T, r),  (2)

Investment function:
I = I(Y, r),  (3)

Money market:
M = L(Y − T, P, r).  (4)
The endogenous variables are income (Y), consumption (C), investment (I), and the interest rate (r). The exogenous variables are the money stock (M), price level (P), taxes (T), and government expenditures (G). The IS curve is obtained by using Eqs. (2) and (3) in Eq. (1). In effect, Eqs. (1) and (2) produce the saving curve; the IS curve then portrays the interaction between saving and investment. This is commonly referred to as the goods market. The LM curve represents equilibrium between money demand (L) and money supply (M) in Eq. (4). The IS relation is typically solved for the interest rate, which is then a positive function of government spending and negatively related to income and taxes; along LM, the interest rate is a positive function of income and prices and negatively related to the money supply and taxes. The interaction between IS and LM determines the equilibrium income level and the interest rate, which, in turn, determine consumption, investment, and money demand. In contrast, an aggregative model with no interaction between saving and investment was provided by James Tobin.
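Because Eqs. (1)–(4) leave the functional forms general, any numerical illustration must impose them. The sketch below (all coefficient values hypothetical, and consumption's interest-rate term dropped for brevity) linearizes the system and solves for equilibrium income and the interest rate:

```python
import numpy as np

# Hypothetical linear forms: C = a + b(Y - T), I = e - d*r,
# and money demand M/P = f*Y - h*r (taxes omitted from LM for simplicity).
a, b, d, e, f, h = 50.0, 0.8, 60.0, 100.0, 0.25, 60.0
G, T, M, P = 120.0, 100.0, 170.0, 1.0

# IS: (1 - b)*Y + d*r = a - b*T + e + G.   LM: f*Y - h*r = M/P.
A = np.array([[1 - b, d],
              [f,    -h]])
rhs = np.array([a - b * T + e + G, M / P])
Y_eq, r_eq = np.linalg.solve(A, rhs)
print(f"Y = {Y_eq:.1f}, r = {r_eq:.3f}")        # Y = 800.0, r = 0.500

# Fiscal expansion: a higher G shifts IS outward, raising both Y and r.
rhs_g = np.array([a - b * T + e + (G + 20), M / P])
print(np.linalg.solve(A, rhs_g))                # Y ~ 844.4, r ~ 0.685
```

The comparative-statics experiment at the end mirrors the short-run stabilization use of the model described next.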
The IS-LM model is particularly adept at providing short-run stabilization policy prescriptions, an ability that
The supply side of the model comprises an aggregate production function and labor market conditions:

Y = F(N, K),  (5)

Labor demand:
F_N(N, K) = W/P,

Labor supply:
S(N) = (1 − t)(W/P),  (8)

where N is employment, W/P is the real wage, and t is the tax rate.
Expectations
Though expectations can be introduced into an IS-LM model via a number of avenues, the method most widely used has been that of specifying an adaptive expectations scheme, whereby the change in inflationary expectations (ṗ) is proportional (λ) to the error in expectations in the previous period:

Adaptive expectations:
ṗ = λ [ (Ṗ/P) − p ].  (9)
Adding Eq. (9) to Eqs. (1)–(5) provides a mechanism for tracking the step-by-step movement toward long-run equilibrium (defined as the point at which all expectations are satisfied). The underlying problem, of course, is that including a relationship such as Eq. (9) in an IS-LM model is an attempt to proxy dynamics in a static model.
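In discrete time, Eq. (9) becomes p_{t+1} = p_t + λ(π_t − p_t), where π_t is actual inflation. A minimal sketch (λ and the inflation path are assumptions) shows the expectation error shrinking geometrically toward a constant actual rate:

```python
lam = 0.4           # assumed adjustment speed
pi_actual = 0.05    # assumed constant actual inflation
p = 0.0             # initial expected inflation

for t in range(8):
    p += lam * (pi_actual - p)   # revise by a fraction of the last error
    print(f"period {t}: expected inflation = {p:.4f}")
# The gap to 0.05 shrinks by the factor (1 - lam) each period.
```

This step-by-step revision is exactly the mechanism for tracking movement toward long-run equilibrium described above.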
The typical two-period overlapping-generations (OLG) problem has a young agent choosing current consumption, future consumption, and real money balances:

Max over {C_t^y, C_{t+1}^o, M_t/P_t}:  U = ln C_t^y + ln C_{t+1}^o

Subject to:
C_t^y + M_t/P_t = W^y,
C_{t+1}^o = W^o + (M_t/P_t) R_t.  (10)
Solving this problem yields the allocations

C_t^y = (1/2)(W^y + W^o/R_t),
C_{t+1}^o = (1/2)(W^y R_t + W^o),
M_t/P_t = (1/2)(W^y − W^o/R_t).
These consumption and saving allocations are both intuitive and consistent with assumptions normally postulated in aggregative models. An example of such microfoundations is that an increase in the interest rate increases saving, decreases current consumption, and increases future consumption.

There are a number of general characteristics of equilibrium allocations in the OLG model. First, consumption and saving decisions are based on lifetime income (appropriately discounted). From a young agent's perspective, lifetime income is W^y + W^o/R_t; from the perspective of an old agent, lifetime income is W^y R_t + W^o. Hence, agents consume in each period half of their lifetime income. This consumption smoothing is made possible by transferring excess income in one period to the other. From a young agent's perspective, the present values of income in the two periods are W^y and W^o/R_t. Whichever income is larger, the young agent must transfer (via saving) one-half of the excess income to the other period in order to smooth consumption in each period. Because W^y > W^o is typically assumed, the relationship for M_t/P_t indicates that saving in this example is chosen appropriately. The usual assumption of W^y > W^o can be interpreted in terms of the young period of an agent's life (the working years of that agent) and the old period of an agent's life (his or her retirement years). The situation of W^y < W^o/R_t implies that young agents would have to borrow (perhaps from the government) in order to smooth consumption.
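The allocation rule can be checked numerically; in the sketch below (endowments and the gross return are assumed values), each budget constraint holds and consumption in each period equals half of lifetime income valued in that period:

```python
Wy, Wo, R = 100.0, 40.0, 1.25    # assumed endowments and gross interest rate

lifetime_young = Wy + Wo / R     # lifetime income discounted to the young period
Cy = 0.5 * lifetime_young        # consume half of lifetime income when young
m = Wy - Cy                      # real balances saved: (1/2)(Wy - Wo/R)
Co = Wo + m * R                  # old-age consumption: endowment plus savings

print(Cy, m, Co)                 # 66.0 34.0 82.5
assert abs(Co - 0.5 * (Wy * R + Wo)) < 1e-9  # half of lifetime income valued when old
```

Raising R in this sketch lowers C_t^y and raises both saving and C_{t+1}^o, matching the microfoundations described above.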
The second characteristic of an OLG model to note is the role that money plays. In an IS-LM model, money is postulated to act as a medium of exchange and not as a store of value. In contrast, money in an OLG model is derived from utility maximization to be a store of value and not a medium of exchange. This makes intuitive sense, because the IS-LM model is relevant for the short run and the OLG model is more applicable to long-run analysis, given its generational motivation. In IS-LM, short-run changes in the money supply influence real variables by first altering the nominal interest rate and then the real interest rate. In OLG, changes in the money supply (completely independent of fiscal actions) will alter the nominal interest rate, but that will not be transmitted to either the real interest rate or other real variables.

The third OLG characteristic centers on the alternatives to holding money. The demand for money in
Expectations
In an IS-LM model, expectations usually serve merely to provide a mechanism for moving from a short-run equilibrium toward a long-run equilibrium. In contrast, expectations play a crucial role in determining the current and
future equilibrium allocations in an OLG model. The
reason is straightforward: if young agents are to choose
consumption now and in the future, they must form expectations of future prices to make such decisions today. In
order to form expectations of prices, agents necessarily
must form expectations of both fiscal and monetary
policy actions now and in the future. Typically, strict government policy rules are imposed to facilitate expectation formation; to achieve this, OLG modelers assume that young agents possess complete information
on all past variables, have perfect foresight of all future
variables, and know with certainty all present and future
government policy actions. In general terms, imposing
such assumptions obtains neoclassical policy prescriptions,
and relaxing any such assumptions yields neo-Keynesian
results.
The common imposition of such strong assumptions
has prompted an examination of rational expectations
equilibria versus nonrational expectations equilibria.
One reason the OLG model has been so widely used
was the belief that the assumption of rational expectations
in a choice-theoretic framework would deliver to the economic agent the highest level of utility obtainable. This
belief has been shown to be without foundation. In
a choice-theoretic framework, the utility of an agent
with nonrational expectations will be higher than that
of an agent with rational expectations. This results from
the macro application of a micro model: what is true for an
individual is not necessarily true for the group.
When leisure is added, the problem becomes

Max over {C_t^y, L_t^y, C_{t+1}^o, L_{t+1}^o, M_t/P_t}:  U = U(C_t^y, L_t^y, C_{t+1}^o, L_{t+1}^o)

Subject to:
C_t^y + M_t/P_t = W^y,
C_{t+1}^o = W^o + (M_t/P_t) R_t,
T = L_t^y + N_t^y,
T = L_{t+1}^o + N_{t+1}^o,  (11)

where the time endowment T in each period is split between leisure (L) and labor (N).
A Growth Model
Preliminary Descriptions
The Solow growth model focuses on the long-run movements of output (Y), capital (K), labor (L), and knowledge (or the effectiveness of labor) (A). With the subscript t denoting the time period, the basic model consists of the following system of equations:

Y_t = F(K_t, A_t L_t),  (12)
L_{t+1} = (1 + n) L_t,  (13)
A_{t+1} = (1 + g) A_t,  (14)
Y_t = C_t + I_t,  (15)
S_t = Y_t − C_t = s Y_t,  (16)
I_t = K_{t+1} − K_t + δ K_t.  (17)

[Figure. The Solow diagram plots actual investment, s f(k_t), and break-even investment, (n + g + δ) k_t, against capital per unit of effective labor, k_t; the curves cross at the steady-state value k*.]

Defining k_t = K_t/(A_t L_t) and using k̇_t = k_t (K̇_t/K_t − L̇_t/L_t − Ȧ_t/A_t), we can write this relationship in intensive form:

k̇_t = s f(k_t) − (n + g + δ) k_t.  (18)
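A short simulation illustrates convergence to the steady state implied by Eq. (18). The text leaves f general, so the sketch below assumes Cobb-Douglas production, f(k) = k^α, along with illustrative parameter values, and iterates a discrete-time version of the law of motion:

```python
# Discrete-time Solow sketch; Cobb-Douglas f(k) = k**alpha and all
# parameter values are assumptions for illustration.
s, n, g, delta, alpha = 0.25, 0.01, 0.02, 0.05, 1/3

k = 1.0   # initial capital per unit of effective labor
for t in range(300):
    k += s * k**alpha - (n + g + delta) * k         # Eq. (18), period by period

k_star = (s / (n + g + delta)) ** (1 / (1 - alpha))  # solves s*f(k) = (n+g+delta)*k
print(f"simulated k = {k:.4f}, analytic k* = {k_star:.4f}")   # both ~5.524
```

Whatever the starting value, k_t approaches k*, where actual investment just covers break-even investment.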
The model's balanced growth path is consistent with several of the stylized facts about economic growth. First, the growth rates of labor, capital, and output are roughly constant. Second, the growth rates of capital and output are roughly equal. Third, the growth rates of capital and output are larger than the growth rate of labor (so that output per worker and capital per worker are rising).
Further Reading
Barro, R. J., and King, R. G. (1984). Time-separable preferences and intertemporal-substitution models of business cycles. Q. J. Econ. 99(4), 817–839.
Black, D. C., and Dowd, M. R. (1994). The money multiplier, the money market, and the LM curve. East. Econ. J. 30(3), 301–310.
Blinder, A. S., and Solow, R. M. (1973). Does fiscal policy matter? J. Public Econ. 2(4), 319–337.
Dowd, M. R. (1990). Keynesian Results in a Neo-Classical Framework. Ph.D. dissertation. State University of New York, Buffalo.
Friedman, M. (1968). The role of monetary policy. Am. Econ. Rev. 58(1), 1–17.
Hicks, J. R. (1937). Mr. Keynes and the classics: A suggested interpretation. Econometrica 5(2), 147–159.
Holmes, J. M., and Smyth, D. J. (1972). The specification of the demand for money and the tax multiplier. J. Political Econ. 80(1), 179–185.
Holmes, J. M., Dowd, M., and Black, D. C. (1991). Why real wages do not fall when there is unemployment. Econ. Lett. 35(1), 9–16.
Holmes, J. M., Dowd, M. R., and Black, D. C. (1995). Ignorance may be optimal? Some welfare implications of rational versus non-rational expectations. J. Macroecon. 17(3), 377–386.
Keynes, J. M. (1936). The General Theory of Employment, Interest, and Money. Harcourt Brace Jovanovich, New York.
Poole, W. (1970). Optimal choice of monetary policy instrument in a simple stochastic macro model. Q. J. Econ. 84(2), 197–216.
Romer, D. (1996). Advanced Macroeconomics. McGraw Hill, New York.
Samuelson, P. A. (1958). An exact consumption-loan model of interest with or without the social contrivance of money. J. Political Econ. 66(6), 467–482.
Sargent, T. J. (1987). Dynamic Macroeconomic Theory. Harvard University Press, Cambridge, MA.
Sargent, T. J. (1987). Macroeconomic Theory, 2nd Ed. Academic Press, Boston.
Agricultural Statistics
Dawn Thilmany
Colorado State University, Fort Collins, Colorado, USA
Elizabeth Garner
Colorado State University, Fort Collins, Colorado, USA
Glossary
Agricultural Marketing Service (AMS) An agency that
provides high-frequency, geographically delineated data
and information on prices, marketing, and distribution of
numerous agricultural commodities for the United States
Department of Agriculture and the agricultural industry.
Economic Census The U.S. Census Bureau profile of the
U.S. economy, conducted every 5 years, from the national
to the local level; provides data on establishments, revenue,
value of shipments, payroll, and employment.
Economic Research Service (ERS) The primary source of
economic research and analysis for the United States
Department of Agriculture (its department home); provides
information and analysis on a broader set of agriculture,
agribusiness, consumer, and rural issues.
National Agricultural Statistics Service (NASS) Primary
statistics collector and information provider for the United
States Department of Agriculture (its department home)
and the production agriculture industry.
North American Industry Classification System
(NAICS) A system that has replaced the U.S. Standard
Industrial Classification system. NAICS provides more
detailed classifications and new industries, reflecting the
changing economy and the way business is done.
United States Census Bureau An agency of the United States Department of Commerce; collects a wide variety of data about the nation's people and economy.
United States Department of Agriculture (USDA) A
cabinet-level department with seven major divisions overseeing 19 different services.
Introduction to Agricultural
Statistics and Data
Some of the earliest data collection in the United States
was based on agriculture; the 1790 census counted 9 out
of 10 Americans as living on farms. In 1791, President
George Washington wrote to several farmers requesting
information on land values, crops, yields, livestock prices,
and taxes; this was, in effect, the first agricultural survey in
the United States. According to the historical information
on the United States Department of Agriculture web site
(www.usda.gov), in 1840, detailed agricultural information was collected through the first Census of Agriculture,
which provided a nationwide inventory of production.
Yearly estimates established the general pattern of annual
agricultural reports that continues to this day.
The United States Department of Agriculture (USDA)
was established by Abraham Lincoln in 1862, and its first
The attention to farm and retail price spreads motivates discussion of some of the limitations of available
agricultural price data. As consolidation in the food retail,
distribution, and food service industries has led to fewer
companies making fewer public transactions, the validity
of prices from remaining cash-based open markets is
called into question. Still, the volume and frequency of
price data for food at the farm, wholesale, and terminal
market levels provide a rich informational resource for
business planning and research.
Employment
Data on farm labor employment are collected by four major agencies, the U.S. Department of Labor, the Bureau of Labor Statistics, the National Agricultural Statistics Service, and the Census of Agriculture, from various surveying efforts. Reliable data on farm employment are difficult to collect due to the part-time and transitory nature of the employment, as well as the number of family members working on farms. Employment data for the other industries comprising agribusiness are developed and disseminated primarily by the Bureau of Economic Analysis (BEA) and can be disaggregated to a county level and downloaded from the BEA web site. Often, the publicly released BEA Regional Economic Information System (REIS) data cannot be disaggregated to a small enough industrial classification. Special requests for unsuppressed, more detailed county employment data are often granted through the BEA. Additionally, data from an individual state's Department of Labor and Employment can be used to supplement the BEA data.
Income
Data from the BEA are the primary source for both farm labor and proprietor income, as well as for other agribusiness industries. These data can also be supplemented by data from a state Department of Labor and Employment. Because employment and income data are available annually, they are a comparatively useful measure in determining the overall and changing impact of agriculture on a local economy in response to various shocks (policy, for example).
Sales
Production and price data for commodities leaving the
farm gate were discussed in Sections II and III. Still, it is
important to note that, although the data are available on
an annual basis, geographic delineations of these data are
more limited due to the nature of production agriculture.
Sales or output data for the input, processing, and marketing sectors are somewhat restrictive to work with because they are available only every 5 years from the various
economic censuses.
Multipliers
Multipliers are often used to estimate broader economic impact by estimating the indirect and imputed economic activity related to direct revenues and input expenditures. The two primary sources for regional multipliers are the Regional Input-Output Modeling System (RIMS II) from the BEA and the Impact Analysis for Planning (IMPLAN) model from the Minnesota IMPLAN Group. Sometimes the data used to create the multipliers in IMPLAN have been used to supplement employment, income, and output data needs. Using those data directly is not recommended, because in some cases the local estimates are simple adaptations of national averages; still, they can offer ranges and magnitudes when other sources are limited.
Social Concerns
Indicators of rural economic and social conditions are also
important when addressing the economic condition of
agriculture. The Economic Research Service (ERS)
branch of the USDA researches and compiles data for
measuring the condition of rural America. Some of
the data they monitor include farm and nonfarm rural
employment, labor and education, wages and income,
poverty, infrastructure (including telecommunications
and transportation), federal funding, and general demographics (including age distribution and changes in ethnic
and racial composition). The ERS has also worked extensively on defining more detailed geographic descriptors that enable richer county-level research. This detail is especially rich for analyzing nonmetropolitan area trends that may be related to the degree of rurality and proximity to metropolitan areas. The ERS measures the rurality of an area based on characteristics of the area, in addition to the federal definitions provided by the Office of Management and Budget (OMB). The ERS has developed various codes and spatial aggregations useful for understanding rural areas, including typology, urban influence, rural-urban continuum, commuting zones, and labor market areas. Most of
Land Use
The number of acres planted and harvested by state and county are available annually through the National Agricultural Statistics Service, although collection is limited to the major commodities. The Census of Agriculture collects information on land use every 5 years. Variables can include land in farms, harvested cropland by size of farm, value, water use, and crop use; in essence, the productive resource base of agricultural-based rural areas. The National Resources Inventory, through the Natural Resources Conservation Service (also the USDA), conducts a survey of land every 4 to 5 years as a means of monitoring the status, condition, and trends of soil, water, and other natural resources in the United States.
Food Security
The Economic Research Service leads federal research on
food security and hunger in U.S. households and communities. They also provide data access and technical support
to facilitate food security research in the United States. The
USDA web site provides a list of publicly available national
surveys on food security in the United States. Food security
for a household, as defined by the ERS, means access by all
members at all times to enough food for an active, healthy
life. Food security includes at a minimum (1) the ready
availability of nutritionally adequate and safe foods and
(2) an assured ability to acquire acceptable foods in socially
acceptable ways (that is, without resorting to emergency
food supplies, scavenging, stealing, or other coping strategies). The Current Population Survey Food Security
Supplement (CPS FSS) conducted by the Census
Bureau for the Bureau of Labor Statistics is the primary
source of national and state-level statistics on food security
in the United States. The data are primarily national, with state-level data available for the categories "food insecure" and "food insecure with hunger." Related to food security is food consumption. The ERS also annually calculates the amount of food available for human consumption in the United States. This series tracks historical national aggregate consumption of several hundred basic commodities and nutrient availability annually; the data date from 1909 and run through 2001. Although there are full time series on many foodstuffs and their associated nutrients, more commodities have been gradually added through the years.
Food security is also based on the availability of safe
foods. The ERS provides a variety of resources and research on food safety, including estimating the costs of
Summary Overview
The scope and detail of agricultural statistics have increased greatly since the first agricultural census of 1840.
Further Reading
The following entities maintain Internet web sites that publish information on a variety of topics of national, state, county, or local interest.
Alpha Reliability
Doris McGartland Rubio
University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Glossary
coefficient alpha An index of the internal consistency of the
measure. It is a lower-bound estimate of the reliability of
a measure.
Cronbach, Lee The creator of coefficient alpha, or Cronbach's alpha.
Cronbach's alpha Coefficient alpha (named after the developer of the index).
correlation The degree to which two variables are related.
essential tau-equivalence The condition in which the items measuring a particular factor have the same factor loadings on that factor.
homogeneity An assumption of alpha that all of the items are
equally related and come from the same content domain.
internal consistency A type of reliability that indicates the
extent to which the responses on the items within a measure
are consistent.
items The individual questions, statements, phrases, sentences, or other word arrangements on a measure.
measurement error The amount of variance present in an
item or measure that is not attributable to the construct.
multidimensional measure A measure that assesses more
than one attribute of a variable.
reliability The extent to which a measure is consistent; this
can be demonstrated by either stability within the measure
(consistency) or stability over time.
unidimensional measure A measure that assesses only one
attribute of a construct. Unidimensionality is an assumption
of coefficient alpha.
Coefficient alpha assumes that the items are homogeneous and unidimensional. A high coefficient alpha indicates that one item
can be used to predict the performance of any other
item on the measure. A low coefficient alpha can indicate
either that the measure has poor reliability or that the
items are not homogeneous. Having a high coefficient
alpha does not provide any information as to the construct
validity of the measure. Even if the measure has perfect reliability, this does not address what the measure is
measuring. The measure must be subjected to further psychometric testing in order to ascertain its level of validity.
Introduction
Reliability is the degree to which a measure is consistent.
The consistency of a measure can be shown by either the
consistency of the measure over time or the consistency
of the responses within a measure. When examining the
consistency, we are concerned with the extent to which the
responses vary (either between measures or within a measure) as a result of true variability or as a consequence of
error. Reliability has been shown to represent the degree
to which the variability in scores exemplifies the true
variance in responses. In other words, the reliability
of a measure reflects the amount of true variability, as
opposed to variability attributable to error.
When assessing the psychometric properties of
a measure, researchers often begin by assessing the reliability of a measure. At a minimum, the reliability provides an indication of the amount of error present in
a measure. However, reliability does not address whether
the measure is accurately assessing the construct. Nevertheless, an important component of validity is that the
measure is reliable. In other words, reliability is necessary
but not sufficient for validity.
Definitions
Development of Alpha
Cronbach
Perhaps Lee Cronbach's most famous work stemmed from his 1951 article, in which he presented coefficient alpha. Cronbach was able to demonstrate that coefficient alpha is equivalent to averaging all the split-half correlations. More specifically, if a measure consists of 20 items, the items on the measure can be split into two groups of 10 items each. Computing a correlation between the two groups provides a rough estimate of the reliability of a measure, but only for one-half of the test. If all the possible splits are computed and the resulting estimates are averaged, the result is coefficient alpha.
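One standard computational formula expresses alpha through item and total-score variances, α = [k/(k − 1)][1 − (Σ s²_item)/s²_total] for k items, and it translates directly into a short sketch (the response matrix below is made up for illustration):

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha for an (n respondents) x (k items) score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)       # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five respondents answering a four-item scale (hypothetical data).
X = np.array([[4, 5, 4, 4],
              [2, 3, 2, 3],
              [5, 5, 4, 5],
              [1, 2, 2, 1],
              [3, 4, 3, 3]], dtype=float)
print(round(cronbach_alpha(X), 3))   # 0.975 for these made-up responses
```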
Assumptions
Direction of Relationships
Between Items
In order to accurately assess the internal consistency of a measure using coefficient alpha, all the items should be related to one another in the same direction.
[Figure: coefficient alpha plotted against the number of items in a measure.]
Spearman-Brown Prophecy Formula
The Spearman-Brown prophecy formula is used to estimate the change in reliability of a measure as a function of the measure's length. If the number of items in a measure is doubled, the formula can estimate the reliability of the longer measure. Similarly, if the measure is reduced by half, the formula can estimate the reliability of the shortened form.
For doubling, the formula is

r_new = 2 r_12 / (1 + r_12),

where r_12 is the correlation between the two halves. More generally, when the length of a measure is changed by a factor k,

r_k = k r / (1 + (k − 1) r),

where r is the reliability of the original measure. For example, halving (k = 1/2) a measure whose reliability is 0.95 yields

r_{1/2} = (1/2)(0.95) / [1 + (1/2 − 1)(0.95)] = 0.475 / (1 − 0.475) ≈ 0.90.
Conclusion
Internal consistency indicates the amount of measurement error present within a measure. This type of reliability provides useful information about the consistency of responses in a measure. If a measure has a high internal consistency, then we can assume a high degree of interrelatedness among the items.

Coefficient alpha is a measure of reliability that represents the degree to which a measure is internally consistent. A high coefficient indicates only that the items measure something consistently. Even with a sufficient coefficient alpha, we cannot draw any conclusions as to the validity of the measure. Alpha does not provide any information about what the measure is measuring; the construct validity of a measure still needs to be assessed. Reliability is necessary but not sufficient for validity.

At a minimum, coefficient alpha should be computed for every measure at each administration. Given the ease with which this index is calculated, we would be remiss if we did not study the reliability of each measure in the specific sample at hand.
Further Reading
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. J. Appl. Psychol. 78, 98–104.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika 16, 297–334.
Cronbach, L. J. (1984). Essentials of Psychological Testing.
4th Ed. Harper & Row, New York.
Anthropology, Psychological
Steven Piker
Swarthmore College, Swarthmore, Pennsylvania, USA
Glossary
anthropology A discipline historically dedicated to a holistic
understanding of humankind, including the diversity of
cultural lifeways now and in the past, the social evolution
of same, language, human biology, and the relationship of
humankind to its close evolutionary relatives, extant and
extinct. Psychological anthropology is a part of (mainly
American) anthropology.
culture, general and evolutionary The mode of adaptation
of the human species, featuring, e.g., speech, learned and
diversifiable social relationships, religion, technology, and
material culture, all of which presuppose capacity for
symbol use.
culture, specific The way of life of a specific group, including
especially meaning.
evolution The processes by which species, including the
human species, arise and endure and change and become
extinct.
personality Enduring psychological dispositions. Workers in psychological anthropology variably emphasize emotional or cognitive, and conscious or unconscious, dispositions.
Sometimes the dispositions are understood in the terms of
an established psychological school (e.g., psychoanalysis,
cognitive psychology), sometimes not.
relativism Everything that human beings do and think is
relative, to a significant extent, to the specific cultural
context in which the doing and thinking occur.
Introduction
Psychological anthropology is a recognized subfield of
(mainly) American anthropology. Its organizational
focus is The Society for Psychological Anthropology,
which is a branch of the American Anthropological Association, and its official journal is Ethos. Arguably, the
central and enduring concern of psychological anthropology is the psychological mediation of the individual/
culture relationship. The first organized school within
psychological anthropology was culture and personality,
established at about the time of World War II. Over the
subsequent three decades or so, the subfield of culture
and personality grew and diversified, and a number of its
emphases remain active today. In part in reaction to culture and personality, and with reference to its major issues, new schools also arose, both within and outside
anthropology.
basic personality; independently, a Rorschach expert interpreted the test protocols. If the personality interpretations from all three sources agreed, then, for that culture,
the model was validated.
Meanwhile, contra Kardiner, some culture and personality workers were not finding approximate uniformity of
personalities in small-scale, socially homogeneous
cultures. On several personality dimensions, rather,
weak modality and much overall diversity seemed to be
the pattern. Wallace, with an elegant argument, demonstrated in 1961 that this should be expected to be so in
socially homogeneous small-scale cultures, and that this
circumstance is consistent with and, arguably, favorable
for social stability.
As a sidelight to all of this, the national character
school projected a broadly Kardinerian understanding
of personality and culture onto a vastly larger cultural
screen, that of the modern nation state. However, whereas
the modal personality folks made the establishment of
personality traits and their distribution in a culture an
empirical question, and ditto for socialization practices,
the national character school did not hold itself to this
standard. Not surprisingly, then, although national character often purports to explain features of culture by reference to personality traits supposedly generated by
socialization practices, its interpretations are often insubstantial. Arguably, the school's greatest contribution is
a number of rich and nuanced and informative depictions
of national cultures. These depictions usually arise from
study of the target culture through written sources (in
other words, at a distance). In this methodological respect,
the national character school (along with psychohistory)
departs from the time-honored fieldwork, the empirical
methodology of cultural anthropology, including most of
psychological anthropology.
Issues
The diversification of culture and personality in the 1970s
and 1980s can be glimpsed by noting major issues treated
by leading practitioners. The early work of Anthony
Wallace has already been mentioned. Wallace built organically on his work to treat acculturation and revitalization, and to explore the interfaces among anthropology,
psychiatry, human biology, and neuroscience. In the
manner of Kardiner, Erik Erikson, also an eclectic
psychoanalyst, lavishly imbibed anthropology in pursuit
of an understanding of personality and culture. His conception of ontogenesis is epigenetic, life history is his
method par excellence, identity his master concept,
and biography his main vehicle for relating personality
to culture and history. Although psychological anthropology has largely moved away from psychoanalysis, the
careers of Weston LaBarre and George Devereux have
remained faithful to the early charter, and to excellent
effect. Nowhere is there a fuller and more psychoanalytically informed depiction of religion as a projective institution than in LaBarre's work in 1972, and Devereux,
whose renaissance-like career touched so many important
bases within and beyond psychological anthropology,
leads the field in adapting the clinical methodology of
psychoanalysis to field work. Alfred Irving Hallowell,
whose brilliant and eclectic career predated and outlasted
classical culture and personality, worked out, especially as
regards self and culture, the basics of what cultural psychology, 30 years later, claimed to have invented. Regarding psychological functions of religion, the psychology of
Cultural Psychology
Cultural psychology, now probably the largest part of
psychological anthropology, is founded on a reaction
against much of the discipline of psychology as well as
much of earlier psychological anthropology, and what
purports to be a new and better psychological anthropology inquiry agenda. Cultural psychology dislikes four
things:
1. Psychic unity conceptions, which posit central
psychological processing mechanisms that can be learned
about and characterized independent of any of their
real-life instances, but which everywhere operate to generate thought, feeling, and behavior. Presumably these
mechanisms include the perceptual processes that the
psychologist studies in the laboratory, and the ontogenetic
developmental processes that psychoanalysis posits and
that culture and personality adopted and adapted.
2. Nomothetic interpretation, for reasons shortly to
be seen.
Evolutionary Psychology
Evolutionary psychology is the child of sociobiology, in the
lineage of bioevolutionary theses on human nature and
cultural elaboration of same. Few psychological anthropologists study evolutionary psychology, but, along with
psychological anthropology, it is centrally concerned with
the psychological bases for culture and how they arise.
Unlike psychological anthropology, evolutionary psychology posits evolutionary sources and draws heavily on evolutionary science to make its case, especially those parts of
evolutionary science that examine how natural selection
operates to maximize reproductive fitness. To cut right to
the chase, as expressed by Robert Wright: what the theory of natural selection says ". . . is that people's minds were designed to maximize fitness in the environment in which those minds evolved . . . the . . . environment of evolutionary adaptation. . . . Or . . . the ancestral environment." For humans, the environment of evolutionary
adaptation is the world of the foraging band, in which all
of humankind lived for (conservatively) the first 90% of
human history. In short, the evolved, biologically rooted
human psychological and behavioral repertoire was fine
tuned with reference to the adaptive exigencies of foraging
band lifeways. By the evolutionary clock, it has been but the
blink of an eye since human ancestors relinquished forager
lifeways, and that is still basically human nature. From all of
this, evolutionary psychology sets itself, inter alia, two large
tasks: (1) to identify behavioral expressions of human forager psychologies in contemporary lifeways and (2) to
illustrate the (perhaps cruel) ironies of history, viz., how
historical change has consigned humans to modern lifeways that are often radically at odds with the most fundamental evolved aptitudes.
Accepting that humans are an evolved species, and that
this entails that everything human is in some sense
grounded in and/or an expression of an evolutionary
odyssey, has evolutionary psychology advanced understanding of the specific sense in which those aspects of
humanness that interest us are expressions of the nature of
the evolved, biologically rooted human species? A "yes" answer includes the following dicta:
1. Evolutionary science provides the core theory (e.g.,
natural selection, reproductive fitness, kin selection, and
reciprocal altruism) for evolutionary psychology.
Cross-Cultural Psychology
To begin, take cross-cultural psychology literally: it
largely intends to deploy the concepts and methods of
Western (positivistic) psychology in other cultures. In
doing so, it has fostered lavish communication and collaboration among scholars from many nations and
cultures. For (at least) two reasons, it is fitting to conclude
this discussion with brief mention of cross-cultural psychology, even though it is not studied by anthropologists.
First, in its several iterations, psychological anthropology is mainly about relationships between psychology and culture, studied cross-culturally. And so, largely, is cross-cultural psychology, as the following titles from the authoritative Journal of Cross-Cultural Psychology illustrate: "Sex differences in visual spatial performance among Ghanaian and Norwegian adults"; "Intimacy: a cross-cultural study"; "Relationship of family bonds to family structure and function across cultures." In addition, cross-cultural psychology has taken all of the main branches of academic psychology (e.g., perception, personality, cognitive) around the world.
Second, anthropology, including psychological anthropology, often posits that culturally relevant psychology
is, to a significant extent, irreducibly culturally specific.
Cultural psychology, specifically, sometimes takes this
to the limit by eliminating the dependent clause. Cross-cultural psychology goes in the opposite direction, viz.,
human psychology fundamentally comprises panhuman
processes that can be measured, anywhere, with the
methodologies that psychologists have developed, or
may develop. Within cross-cultural psychology, this
view has been extensively developed and applied by
the five-factor model, which posits five panhuman personality dimensions and claims a cross-culturally valid
methodology for measuring them. Given the importance
of especially personality for psychological anthropology,
this disagreement between cultural psychology and
cross-cultural psychology is emblematic of a (perhaps
the) fundamental metatheoretical issue in play here, viz.: concerning the study of psychology cross-culturally,
can the conceptual and empirical methodologies of
a scientific psychology neutralize or otherwise cut
through the relativizing fog of specific cultures?
Within psychological anthropology as treated here, the
Whiting school sides with cross-cultural psychology,
albeit its concepts and methods are vastly different. Acknowledging the fundamental importance of this issue,
some suppose that it may not be amenable to empirical
resolution.
Further Reading
Bock, P. K. (1980). Continuities in Psychological Anthropology. W. H. Freeman, San Francisco.
Bock, P. K. (ed.) (1994). Handbook of Psychological Anthropology. Greenwood Press, Westport, Connecticut, and London.
Boesch, E. E. (1991). Symbolic Action Theory and Cultural
Psychology. Springer-Verlag, Berlin.
DeVos, G., and Boyer, L. B. (1989). Symbolic Analysis Cross
Culturally: The Rorschach Test. University of California
Press, Berkeley.
Kardiner, A., Linton, R., Dubois, C., and West, J. (1945). The
Psychological Frontiers of Society. Columbia University
Press, New York.
Kleinman, A., and Good, B. (eds.) (1985). Culture and
Depression. University of California Press, Berkeley.
LaBarre, W. (1970). The Ghost Dance: Origins of Religion.
Doubleday, New York.
Paul, R. (1989). Psychoanalytic Anthropology (B. J. Siegel, A. R. Beals, and S. A. Tyler, eds.). Annual Reviews, Palo Alto, California.
Piker, S. (1994). Classical culture and personality. In Handbook of Psychological Anthropology (P. K. Bock, ed.).
Piker, S. (1998). Contributions of psychological anthropology. J. Cross-Cultural Psychol. 29(1).
Spindler, G., and Spindler, L. (1978). The Making of
Psychological Anthropology. University of California Press,
Berkeley.
Suarez-Orozco, M. M., Spindler, G., and Spindler, L. (1994).
The Making of Psychological Anthropology II. Harcourt
Brace, Fort Worth.
Wallace, A. F. C. (1961). Culture and Personality. Random
House, New York.
Wiggins, J. S. (ed.) (1996). The Five-Factor Model of
Personality: Theoretical Perspectives. Guilford Press,
New York.
Wright, R. (1994). The Moral Animal. Why We Are the Way
We Are. The New Science of Evolutionary Psychology.
Random House, New York.
Archaeology
Patricia A. Urban
Kenyon College, Gambier, Ohio, USA
E. Christian Wells
University of South Florida, Tampa, Florida, USA
Glossary
culture history The school of thought that dominated
American archaeology prior to World War II; concerned
with determining the time depth of past cultures and their
spatial extent; little attention paid to explanation, group, or
individual dynamics.
interpretive archaeology A subset of postprocessual archaeology; focus is on understanding past cultures, rather than
explaining processes or change, and finding meaning in the
material remains of the past.
postprocessual archaeology A catch-all term used for
theory developed in opposition to processualist approaches;
rejects fixity of knowledge, holding that all knowledge, and
therefore all reconstructions of the past, are contingent on
the individual and that person's social and historical
context; focus is on human agency, identity, hegemony
and counterhegemony, or resistance; also includes many
gender/feminist and Marxist approaches.
processual (new) archaeology The school of thought
dominant in American archaeology from the 1960s through
the 1980s; focus is on archaeology as a science, on
hypothesis testing, and on development of laws of human
behavior.
History
History and archaeology clearly overlap in temporal
periods, methods (some archaeologists are able to use
texts, whereas some historians rely on materials and
goods to amplify the textual record), and societies studied
(civilizations/high culturese.g., Egypt, Greece, China,
the Aztecs). Archaeology covers the larger time span, and
without written records tends to focus on long-term
processes and group activities, defining societies on the
basis of differences in material culture rather than in
language or self-ascription. Nevertheless, both history
and archaeology are concerned with diachronic and synchronic change and have similar approaches to explanation. Despite this, little conscious attention has been paid
by archaeologists to history, except for recurring
enthusiasms for Collingwood's contention that it is possible to think one's way into the past, that is, to understand it on its own terms. Recently, the Annales school of history, whose leading figure was Fernand Braudel and whose most influential work for archaeologists is Braudel's The Mediterranean and the Mediterranean World in the Age of Philip II, has affected a few archaeologists, but in different, almost opposite, ways. One objective of this school is to examine
human history in the longue durée, the long-term relationship between humans and their environment. Such
inquiry has been basic to archaeology: the current difference is a greater concordance between archaeological and
historical theory about investigating these issues. The
smaller scale aspects of Braudel's theory are also of interest to archaeology. These include medium-range
processes such as conflict and economic change and
shorter trends, such as group or individual decisions.
The latter concern, agency, is of particular interest to
postprocessualists, although they derive much of their
inspiration not from the Annalistes, but from sociology
and cultural anthropology. More recently, the Annales
school has de-emphasized the extreme long term, instead
providing detailed descriptions of events, with special
attention to agency. The descriptive aspects are similar
to "thick description" as defined by Clifford Geertz, again
of interest in contemporary archaeological theory.
Postprocessual, or interpretive, archaeologists also concur
with Annalistes in rejecting the positivism of natural science. Works such as Interpreting Archaeology: Finding
Meaning in the Past, by Ian Hodder and associates, amply
illustrate the trend toward understanding and away from
explanation, particularly of a positivistic sort.
Geography
Political Science
Archaeology and political science overlap in their concern
for the development of the state and the forms of society
leading up to it. In archaeology, this is known as the origins
of social complexity, and archaeologists have drawn most
inspiration from cultural anthropologists, both theorists
(e.g., considerations of evolutionary stages such as bands,
tribes, chiefdoms, and states) and ethnologists (for example, ethnographers working on African polities). Political
science theories have been tested against archaeological
data and largely found wanting; for example, humans have
never lived in a state of nature. To be fair, political scientists are more concerned with modern governmental
structures, international relations, and ideal types of social
organization than with the pottery and stone tools of actual
early states. Nonetheless, the connections between political science and archaeology are few because the overlap
in subject material is more apparent than real. A classic
work addressing both political science perspectives and
archaeology is Origins of the State, edited by Ronald
Cohen and Elman Service.
Economics
Manufacturing, distribution, and consumption activities
are central to archaeological inquiry, because these
processes leave material remains. Interpreting the
Sociology
The relationship between sociology and archaeology is not
direct, and is for the most part mediated by cultural anthropology. Archaeology's recent concern with group and
individual identity, as mentioned previously, is related to
sociology's work with race and ethnicity. More significant
are four sociologists who have had a profound influence
on the postprocessual, or interpretive, school of archaeological thought: Wallerstein, Foucault, Giddens, and
Bourdieu.
Immanuel Wallerstein, an historical sociologist, developed the body of thought known as world systems
theory, which is concerned with understanding the
relationships between core and peripheral states in systems made up of multiple polities; his central works are
the three volumes of The Modern World System. In Wallerstein's view, peripheral states produce raw materials in
bulk, which are shipped to the core states for processing.
The manufactured goods produced in the core states are
then shipped back to the peripheries, the members of
which, lacking industries of their own, are forced to purchase the cores' goods. Raw materials have less value than
finished products, so the peripheral states are kept in
perpetual subjugation to the more prosperous cores. Wallerstein's prime example of such a system, which need not
literally cover the world, is the Mediterranean in the early
stages of capitalism.
Because it was developed to explain relationships
within capitalist economic systems, world system theory
is not applicable in its original formulation to prehistoric
states. The general idea of central, developed, core areas
with dependent, underdeveloped peripheries has,
however, been significant in much recent archaeological
debate. Counterarguments have been mounted stating that "dependency," "coreness," and "peripherality" are in the mind of the analyst, and that use of the terms "core" and "periphery" prejudges the relationships to be found within systems of interaction. Despite any specific
problems with Wallerstein's ideas, the interconnectedness he posits among political units has been crucial to
understanding ancient interpolity interactions, and many
archaeologists focus, as does world systems theory, on
the study of networks of political entities, rather than
on specific political units. The 1999 edited volume World Systems Theory in Practice, by P. Nick Kardulias, and the chapters in Resources, Power, and Interregional Interaction (1992), by Edward Schortman and Patricia Urban, show how archaeologists have amended and/or abandoned aspects of Wallerstein's conceptions.
Wallerstein's concerns with economic patterns make his work of natural interest to archaeologists; the attraction of the more abstract notions of Michel Foucault (who is variably considered an historical sociologist or an historian), Anthony Giddens, and Pierre Bourdieu (who is either a sociologist or an anthropologist, depending on the commentator) is often more difficult to grasp.
Christopher Tilley was among the first to utilize Foucault's self-proclaimed "archaeological" method to examine not
the past, but how archaeologists view and reconstruct the
past. Subsequently, writers such as Julian Thomas and
Trevor Kirk (e.g., in Tilley's Interpretative Archaeology)
have shown how Foucault's work can aid in a reconceptualization of archaeology, one that recognizes the near
impossibility of gaining a true understanding of the past.
Because what is claimed about ancient peoples' lives is
determined not by their intellectual systems, but rather
by our own, we must be aware that there is no definitive
past, only contingent pasts shaped by contingent presents. Finally, Foucault calls our attention to power: it
is part of all social relations and actions. Individuals
have "power to"; that is, they can, through their own agency, try to ensure that events redound to their benefit. The complementary idea of "power over" reminds us that some individuals or groups can affect activities by manipulating other people. These others need not be compliant in the face of manipulation, but may resist or subvert the actions of dominants. Thus, power, like knowledge or
identity, is contingent on circumstance. Responding to
evolutionary thought as applied to culture change,
Giddens has adumbrated structuration theory. Humans
know their history, and act in accordance with it. Their
actions are reinterpretations of and reactions to social
structures; as they act, they change existing conditions,
but within the constraints of their own history.
Finally, contemporary archaeological theory manifests considerable influence from Bourdieu's theory of human action. In his perspective, the everyday activities that we all carry out are largely unconscious, and are passed down from earlier generations. This quotidian activity is termed habitus. Cultures have particular forms of habitus that characterize them, and these are parts of individual and group identity. Even in carrying out habitual activity, however, people make changes in their practices, and therefore engender change in the overarching structures that generate habitus in the first place. Bourdieu's and Giddens's ideas are similar; they are subsumed under the larger rubric of practice theory, which is not confined to sociology. It has had profound influence on contemporary anthropological thought, as discussed, for example, by Sherry Ortner in her 1984 article "Theory in Anthropology since the Sixties."
Sociocultural Anthropology
Earlier, the origins of archaeology in the United States were discussed as the companion science to anthropology, both united in an evolutionary perspective that stressed unilineal change through a series of fixed stages. One of Franz Boas's most enduring imprints on anthropology, and on archaeology, was his debunking of unilineal cultural evolution and the racist tenets that the schema both deliberately and inadvertently promoted. Boas's work did, however, show the value of the direct historical approach for understanding prehistoric Native American societies and furthered the use of ethnographic analogy as an analytical tool; that is, using data from modern peoples' material culture to develop analogies about how prehistoric artifacts were used. The culture historical approach also obtained from Boas's studies the idea of particularism, that each group has its own individualistic developmental trajectory. Although Boasian particularism did much to promote respect and understanding of different cultures, it also had the unfortunate effect of dampening comparative approaches in archaeology, except for comparison of artifacts for dating purposes. Seeing each culture as totally
Those current practitioners who find latter-day versions of processualism congenial are in the so-called "scientific" subset of archaeology, whereas those who abhor positivism are today called postprocessualists or interpretive archaeologists. This latter group is not a coherent school, but rather is composed of a number of groups that may be protoschools, and they take their inspiration from very different currents in anthropology, as compared to the "scientists." Before turning to this most recent
Marxist thought must first be examined, for Marxism,
though often occluded due to pressures and prejudices
in modern Western society, was and continues to be
a highly significant force in archaeological thought.
The ideas outlined in the preceding discussion, as well
as others that cannot be covered due to limitations of
space, often have a Marxist subtext. Sherry Ortner suggests that the predominant strain of Marxist thought in
American anthropology is structural Marxism. Structural
Marxists see the determinative forces behind culture
change in social relations, particularly those subsumed
under modes of production, or those aspects of social
organization that mediate the production of goods and
services. Structural Marxists also emphasize ideas,
which had no independent place in the processualist
scheme. Ideas, or ideology, serve many purposes, such
as legitimating the existing system, obscuring inequality,
or providing a rationale for elite prominence. Thus, this
variant of Marxism deals with materialism and ideas,
a task not accomplished by processualists.
There are additional significant parts of a Marxist perspective that have attracted archaeologists because there
appear to be material correlates for particular social situations or processes. The first of these is inequality, which
can be marked by differential access to basic resources
such as food and housing as well as to exotic materials such
as imported goods, or to luxuries that define different
social roles or statuses. Inequality leads to conflict; thus
one of the crucial aspects of the past to examine is the
degree of inequality, and the factions or groups engaged
in conflict. This conflict may be overt, but can also be
more subtle, a form of resistance rather than outright
hostility.
A final Marxist-related concept salient for contemporary archaeology is hegemony. Two groups of researchers appear to take an interest in hegemony and possible counterhegemony, or resistance: those working with complex, hierarchically organized societies with clear, dominant elites, and those who find themselves in situations wherein a more complex, or so-called high, culture is in a position to influence smaller or less complexly organized neighboring peoples. Hegemony in the former situation indicates domination with control, but for cultural groups in contact, it may simply mean influence or even dominance without any specific controlling mechanism.
The most thorough current review of Marxism and archaeology is Thomas C. Patterson's Marx's Ghost. In sum, the methods archaeologists have obtained from Marxism are analytical ones, such as examining class structure, or looking for evidence of conflict among members of a society.
One critique of Marxist approaches is that, like processualist ones, they emphasize groups and large-scale social dynamics. Individual people, or even smaller groups, are not present, and humans appear to be buffeted by forces beyond their control, rather than being active agents making decisions and acting on their outcomes. For this reason, as Ortner has so cogently pointed out, anthropology has increasingly turned to practice theory; archaeologists have followed suit. Much of practice theory is derived from the works of Foucault and Bourdieu. Cultural anthropologists, to be sure, are making their own contributions to practice theory, but the foundation remains the French theorists discussed earlier. Practice theory can present difficulties for archaeologists, because mental processes are not preserved in the archaeological record, only the material results of those processes. Any given material situation, however, can have many causes, and the difficulty for archaeologists is deciding among those causes. Here lies one of the problems with postprocessual thinking: in rejecting positivism, some archaeologists have rejected as well the possibility of testing or confirming any statement made about the past. In the extreme, this is a sort of radical relativism in which any interpretation is as good as another. In more moderate guise, a number of competing ideas remain, and, if postprocessualists are correct, there are no means of deciding among them.
The preceding discussion has focused on how American archaeologists tend to approach theory borrowed
from sociocultural anthropology. However, it should be
remembered that the British and Continental views of
the relationship between anthropology and sociology,
on the one hand, and archaeology, on the other, may
be quite different. Michael Shanks and Christopher Tilley
evidence this in their book Social Theory and Archaeology. In addition, Ian Hodder, who has been highly influential in the United States for the development of both
processual archaeology, with his work on spatial analysis,
and postprocessual archaeology, comes from the British
tradition. Thus, in the past two decades, American, British, and European strands of thought have become more
closely knit as there is a convergence on structuration,
practice theory, and Marxist approaches.
Conclusion
Postmodernism in general has taught us that knowledge
is contingent; in an archaeological context, this means
that what we say about the past is conditioned by our
Statistical Inference
Statistical inference involves hypothesis testing (evaluating some idea about a population using a sample) and estimation (estimating the value or potential range of values of some characteristic of the population based on that of a sample). Archaeologists were relatively slow to realize the analytical potential of statistical theory and methods; only in the past 20 or 30 years have they begun to use formal methods of data analysis regularly. Influential essays by George Cowgill (e.g., in 1970 and 1977), David Clarke (in 1962), and Spaulding (in 1953), along with Hodder's 1978 Simulation Studies in Archaeology and Orton's 1980 Mathematics in Archaeology, demonstrated to archaeologists that, because most of their data represent samples of larger populations, statistical methods are critical for identifying empirical patterns and for evaluating how precisely and how accurately those patterns represent real trends in the broader world.
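Both modes of inference described above can be made concrete with a small worked example. The following Python sketch uses invented sherd counts and standard textbook formulas; it illustrates the logic of estimation and hypothesis testing rather than any particular study cited here:

```python
import math

# Hypothetical sample: sherd counts from 12 excavated test pits,
# treated as a random sample from a much larger site population
counts = [14, 9, 22, 17, 11, 25, 8, 19, 16, 13, 21, 10]

n = len(counts)
mean = sum(counts) / n
# Sample variance (n - 1 denominator) and standard error of the mean
var = sum((x - mean) ** 2 for x in counts) / (n - 1)
se = math.sqrt(var / n)

# Estimation: a 95% confidence interval using the t value for 11 df
t_crit = 2.201
low, high = mean - t_crit * se, mean + t_crit * se
print(f"estimated mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")

# Hypothesis testing: is the population mean different from 12?
t_stat = (mean - 12) / se
print(f"t = {t_stat:.2f}; reject H0 at the 5% level if |t| > {t_crit}")
```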
In addition to basic, descriptive statistics that
summarize central tendency (what is a typical case?)
and dispersion (how much variation is there?) in batches
Combined Approaches
One of the more influential quantitative studies that emerged in archaeology during the 1970s, based on central-place theory, combines graphic display with statistical inference. The theory, originally developed for market economies by the geographers Christaller and Lösch, proposes that a regular hexagonal distribution of hierarchically ordered central places is optimal for minimizing the cost of travel and transport and for maximizing economic profits. In central-place analysis, the hierarchy of central places is established on the basis of the sizes of centers (e.g., using rank-size measures, as discussed by Haggett in Locational Analysis in Human Geography and in numerous articles by archaeologists). Size is assumed to correlate positively with the number of functions the center performs, such that larger centers perform more functions than do smaller centers. Rank-size analysis can be used to examine the degree of socioeconomic integration of a settlement system by plotting the sizes and ranks (based on size) of all settlements on a graph. For this
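As a minimal illustration of the logic of rank-size analysis, the following Python sketch compares a set of invented settlement sizes against the Zipfian rank-size expectation (the largest settlement's size divided by rank) and computes the slope of the log-log rank-size line; all figures are hypothetical:

```python
import math

# Hypothetical settlement sizes (hectares) for one survey region,
# ordered from largest to smallest
sizes = [120, 45, 38, 20, 15, 12, 9, 7, 5, 4]

largest = sizes[0]
print("rank  observed  rank-size expectation (S1 / rank)")
for rank, size in enumerate(sizes, start=1):
    expected = largest / rank          # Zipfian rank-size rule
    print(f"{rank:4d}  {size:8.1f}  {expected:8.1f}")

# On log-log axes a well-integrated settlement system plots near
# a straight line of slope -1; primate systems sag below it.
points = [(math.log(r), math.log(s)) for r, s in enumerate(sizes, 1)]
slope = (points[-1][1] - points[0][1]) / (points[-1][0] - points[0][0])
print(f"log-log slope from largest to smallest settlement: {slope:.2f}")
```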
Further Reading
Benzécri, J.-P. (1973). L'Analyse des Données, Tome 2: L'Analyse des Correspondances. Dunod, Paris.
Bourdieu, P. (1977). Outline of a Theory of Practice (Richard Nice, transl.). Cambridge University Press, New York.
Claassen, C. (ed.) (1992). Exploring Gender through Archaeology: Selected Papers from the 1991 Boone Conference. Prehistory Press, Madison, Wisconsin.
Clarke, D. L. (1962). Matrix analysis and archaeology with particular reference to British beaker pottery. Proc. Prehistor. Soc. 28, 371-382.
Cowgill, G. L. (1970). Some sampling and reliability problems in archaeology. In Archéologie et Calculateurs (J. C. Gardin, ed.), pp. 161-172. Centre National de la Recherche Scientifique, Paris.
Cowgill, G. L. (1977). The trouble with significance tests and what we can do about it. Am. Antiquity 42, 350-368.
Diehl, M. W. (ed.) (2000). Hierarchies in Action. Center for Archaeological Investigations, Southern Illinois University at Carbondale.
Dobres, M.-A., and Robb, J. E. (eds.) (2000). Agency in Archaeology. Routledge, New York.
Feinman, G. M., and Manzanilla, L. (eds.) (2000). Cultural Evolution: Contemporary Viewpoints. Kluwer Academic/Plenum Publ., New York.
Foucault, M. (1972). The Archaeology of Knowledge (A. M. Sheridan Smith, transl.). Pantheon Books, New York.
Giddens, A. (1984). The Constitution of Society: Outline of the Theory of Structuration. University of California Press, Berkeley.
Haas, J. (ed.) (2001). From Leaders to Rulers. Kluwer Academic/Plenum Publ., New York.
Haggett, P., Cliff, A. D., and Frey, A. (1977). Locational Analysis in Human Geography, 2nd Ed. Wiley, New York.
Hodder, I. (2001). Archaeological Theory Today. Kluwer Academic/Plenum Publ., New York.
Hodder, I., and Hutson, S. (2003). Reading the Past: Current Approaches to Interpretation in Archaeology, 3rd Ed. Cambridge University Press, New York.
McGuire, R. H. (1992). A Marxist Archaeology. Academic Press, San Diego.
McGuire, R. H., and Paynter, R. (eds.) (1991). The Archaeology of Inequality. Berg, Providence.
Nelson, S. M., and Rosen-Ayalon, M. (eds.) (2002). In Pursuit of Gender: Worldwide Archaeological Approaches. AltaMira Press, Walnut Creek, California.
Orton, C. (1980). Mathematics in Archaeology. Collins, London.
Patterson, T. C. (2003). Marx's Ghost: Conversations with Archaeologists. Berg, Providence.
Patterson, T. C., and Gailey, C. W. (eds.) (1987). Power Relations and State Formation. American Anthropological Association, Washington, D.C.
Preucel, R. W. (ed.) (1991). Processual and Postprocessual
Archaeologies: Multiple Ways of Knowing the Past. Center
for Archaeological Investigations, Southern Illinois
University at Carbondale.
Robb, J. E. (ed.) (1999). Material Symbols: Culture and
Economy in Prehistory. Center for Archaeological
Investigations, Southern Illinois University at Carbondale.
Aristotle
James G. Lennox
University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Glossary
definition For Aristotle, a definition is an account of the
essential characteristics of the object being defined. These
essential characteristics are discovered through the use of
division and causal investigation. In fact, for Aristotle, to
know the fundamental cause of something is critical to
having a scientifically valid definition of it. Because scientific
definitions identify a thing's proper kind and the fundamental ways in which it differs from other members of the
kind, a scientific definition can be represented by means of
a demonstration.
demonstration A form of proof that provides scientific
understanding of a fact within the domain of the science.
As Aristotle defines it in his Posterior Analytics, such
understanding comes from knowing the primary causal
explanation of the fact in question.
division A method for systematically examining variation, or
relations of similarity and difference, within a kind. In On
the Parts of Animals and History of Animals, Aristotle
criticizes the a priori method of dichotomous division
defended by his teacher, Plato, and presents an empirical
method of division that permits the systematic organization
of information in scientific domains in which the objects are
complex and multivariate.
polis A transliteration of the Greek word for a common
method of social, political, and economic organization in
classical Greece. On Aristotle's account, a polis is a natural
result of the development of human society.
social animal Aristotle uses the expression politikon zoon
(social animal) to refer to a special subclass of gregarious
animals, which includes humans and the social insects.
Note that politikon is an adjective based on polis, but it
would be very misleading to translate it as meaning
political. Aristotle discusses social animals as a group in
History of Animals, and, in Politics, he discusses human
social organization as a form of animal social organization,
differentiated in accordance with the characteristics that
distinguish humans from other social animals.
the more and less A technical expression in Aristotle's work, along with excess and deficiency, referring to measurable variation in characteristics shared by members of a kind.
Theory of Science
A group of treatises known since the Middle Ages as
the Organon present Aristotles views on the cognitive
tools necessary for thinking clearly about any subject.
Aristotle's Categories presents a list of types of answers to fundamental questions that can be asked about any subject, and a theory about how these questions and answers can be related to one another. At the most abstract level, the answers constitute basic categories: substance, quality, quantity, place, time, relation, and so on. But it is important to note that the Greek names for these categories are nominalized interrogatives: the "what is it," the "what sort," the "how much," the "where," the "when," and so on.
Aristotle also explored the basic features of the ontology behind these categories. He argued that only individuals in the "what is it" category (primary substances, he called them) are self-subsistent; everything else is inherent in these (i.e., dispositions, qualities, sizes, and changes do not exist on their own) and/or is said of them (their species and genus names, their locations, and their relationships). This is no mere grammatical point for Aristotle. In insisting on the ontological primacy of the particular substance, he is taking a stand against Platonism in all its forms, for it was Plato who argued that things like maple trees are merely fleeting and insubstantial participants in eternal forms such as goodness, unity, or equality, thus treating the particular objects encountered with the senses as inappropriate objects of scientific investigation. By arguing that being good, or being a unit, or being equal were all dependent on the existence of particular good, singular, or equal objects, Aristotle saw himself as rescuing empirical science from the Platonists.
The Organon also includes a small treatise that systematically investigates the different forms of propositions and their relationships (De Interpretatione), a much larger treatise on methods for carrying on debates on any subject whatever (Topics), and four books on Analytics, which have traditionally been divided into two books called Prior Analytics and two called Posterior Analytics, though Aristotle makes it clear that all four books constitute a single investigation. The first two are the very first formal investigation of proof; they are part formal logic
Some animals have all their parts the same as one another,
some have their parts different. Some of the parts are the
same in form, as the nose and eye of one human and that
of another, or the flesh and bone of one in comparison
with another; and the same is true of horses and of the
other animals, so long as we say they are the same as one another in form; for just as the whole animal is alike to the whole, so too is each of the parts alike to each. Others, those for which the kind is the same, while the same, differ according to excess or deficiency. And by kind I mean such things as bird and fish; for each of these differs according to kind, and there are many forms of fishes and birds.
Most of the parts in these animals [those that are the
same in kind] differ by way of the oppositions of their
characteristics, such as their color and shape, the same
parts being affected in some cases to a greater and in some
cases to a lesser degree, and again by being more or fewer
in number, or greater and smaller in size, and generally
by excess and deficiency. For some of them will have
soft flesh and some hard flesh, a long beak or a short
beak, many feathers or few feathers. Moreover, some
differ from others in a part that belongs to one not belonging to the other, as for example some birds having spurs
and others not, and some having crests and others not.
But speaking generally, the majority of the parts from
which the entire body is constituted are either the same,
or differ by opposition, i.e. by excess and deficiency; for
we may posit the more and less to be a sort of excess
and deficiency.
The parts of some animals, however, are neither the
same in form nor according to excess and deficiency,
but are the same according to analogy, such as the way
bone is characterized in comparison to fish-spine, nail in
comparison to hoof, hand to claw or feather to scale; for
what is feather in a bird is scale in fish.
[History of Animals 486a 15-486b 22]
Aristotle goes on to note (486b 22-487a 13) that similar differentiations can be made regarding the position of
a part; the same theory of sameness and difference can
be applied to the uniform parts, or tissues, generally,
and indeed to all of the important characteristics of
animals, their ways of life (aquatic, avian, terrestrial),
their activities or functions (modes of locomotion, feeding, reproduction), and their character traits (social, predatory, aggressive). It is precisely these measurable
variations in shared characteristics for which scientific
explanations are sought. Here is a rather typical example,
from On the Parts of Animals, regarding the parts of
birds: "Among birds, differentiation of one from another is by means of excess and deficiency of their parts, i.e. according to the more and less. That is, some of them are long-legged, some short-legged, some have a broad tongue, others a narrow one, and likewise too with the other parts" (On the Parts of Animals Book IV.12, 692b 3-6). Aristotle next notes that all birds have feathers differing by more and less, but that
[Figure 1. Kinds, forms of kinds, and the more and the less: types of bird feet (ptarmigan, feathered; jacana, walking on floating plants; shag, swimming; jungle fowl, walking and scraping; coot, swimming; crow, perching and lifting; sea eagle, raptorial), indicating adaptations for locomotion and predation. Reproduced with permission from Thomson, J. A. (1970), The Biology of Birds, in Encyclopedia Britannica, 14th Ed.]
[Figure 2. Matter, form, and kind: types of bird bills (toucan, hawk, shoveler, petrel, curlew, spoonbill, American robin, cockatoo, goosander, grosbeak), indicating adaptations for feeding. Reproduced with permission from Thomson, J. A. (1970), The Biology of Birds, in Encyclopedia Britannica, 14th Ed.]
which something develops and its goal is best for it, and
self-sufficiency is the goal of communities and best for
them. Thus from these points it is apparent that the polis
exists by nature, and that mankind is by nature a social
animal.
[Politics 1252b 28-1253a 3]
Conclusion
It would not be surprising if the reader were at this point asking the question, "Where is measurement?"
Further Reading
Charles, D. (2000). Meaning and Essence in Aristotle. Oxford
University Press, Oxford.
Cooper, J. M. (1990). Political animals and civic friendship. In Aristoteles Politik (G. Patzig, ed.). Vandenhoeck & Ruprecht Verlag.
Depew, D. (1995). Humans and other political animals in Aristotle's History of Animals. Phronesis 40, 156-181.
Gotthelf, A., and Lennox, J. G. (eds.) (1987). Philosophical Issues in Aristotle's Biology. Cambridge University Press, Cambridge.
Kullmann, W. (1991). Man as a political animal in Aristotle. In A Companion to Aristotle's Politics (D. Keyt and F. D. Miller, Jr., eds.). Oxford University Press.
Kullmann, W., and Föllinger, S. (eds.) (1997). Aristotelische Biologie: Intentionen, Methoden, Ergebnisse. Franz Steiner Verlag, Stuttgart.
Lennox, J. G. (2001). Aristotle's Philosophy of Biology: Studies in the Origins of Life Science. Cambridge University Press, Cambridge.
Miller, F. D., Jr. (1995). Nature, Justice, and Rights in Aristotle's Politics. Oxford University Press, Oxford.
Artificial Societies
J. Stephen Lansing
University of Arizona, Tucson, Arizona, USA
Glossary
agent-based Relying on the activities of individual agents
who respond to events within the area of their local
knowledge, but not necessarily to global events.
Boolean network A system of n interconnected binary
elements; any element in the system can be connected to
a series I of k other elements, where k (and therefore I) can
vary. For each individual element, there is a logical or
Boolean rule B, which computes its value based on the
values of the elements connected with that particular
element.
cellular automaton Usually a two-dimensional organization
of simple finite-state machines where the next state of each
machine depends on its own state and the states of some
defined set of its neighbors.
dynamical systems theory The branch of mathematics
devoted to the motions of systems that evolve according
to simple rules. It was developed originally in the 17th
century by Newton to model the motions of the solar
system, evolving under the rules of his new theory of
universal gravitation.
edge of chaos The hypothesis that in the space of dynamical
systems of a given type, there will generically exist regions
in which systems with simple behavior are likely to be
found and other regions in which systems with chaotic
behavior are to be found. Near the boundaries of these
regions, more interesting behavior, neither simple nor
chaotic, may be expected.
Feigenbaum's number A constant (4.6692...) that represents the rate at which new periods appear during the period-doubling route to chaos.
genetic algorithm An evolutionary algorithm that generates
each individual from an encoded form known as
a chromosome or genome. Chromosomes are combined
or mutated to breed new individuals. A genetic algorithm is
useful for multidimensional optimization problems in which
the chromosome can encode the values for the different
variables being optimized.
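As a concrete illustration of the Boolean network defined in this glossary, the following minimal Python sketch wires n binary elements to k randomly chosen inputs each, assigns every element its own random Boolean rule, and updates the system synchronously; all wiring and rule choices are arbitrary:

```python
import random

# Minimal random Boolean network, following the glossary definition:
# n binary elements, each reading k inputs, each updated by its own
# Boolean rule B (a lookup table over the 2**k possible input states)
n, k = 8, 2
random.seed(1)
inputs = [random.sample(range(n), k) for _ in range(n)]
rules = [[random.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]

state = [random.randint(0, 1) for _ in range(n)]
for step in range(10):
    print(step, state)
    # Synchronous update: every element reads its k inputs at once
    idx = [sum(state[j] << b for b, j in enumerate(inputs[i]))
           for i in range(n)]
    state = [rules[i][idx[i]] for i in range(n)]
```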
Characteristics of Artificial
Societies Models
Contemporary artificial societies are a specialized type of simulation model that typically employs an object-oriented, agent-based system architecture. As Epstein and Axtell observe with regard to one of the first such models, called Sugarscape: "if the pure cellular automaton is a space with no agents living on it, and the pure adaptive agents model represents agent kinetics with no underlying space, then the Sugarscape model is a synthesis of these two research threads." In models such as Sugarscape, agents are implemented as objects that are able to perceive features of their environment, which may include messages from other agents. Typically they are able to process these perceptions and make decisions about their subsequent behavior. Agents may also possess attributes of memory and a capacity for learning. These characteristics add several dimensions of novelty to artificial societies models, in comparison with conventional equilibrium models. To begin with, the architecture of
these models enables the investigator to utilize dynamical
systems theory in order to investigate both equilibrium
and non-equilibrium conditions. In other words, the behavior of agents and societies in state space can be studied
through controlled experiments in which behavioral or
environmental parameters are tuned. Agents can be as
heterogeneous as the investigator chooses to make them
and the model environment can be arbitrarily simple or
complex. Archaeologists are particularly fond of very detailed environmental models based on geographic information systems. Using parameter sweeps, they can model
questions such as the effects of varying rainfall on
a landscape, with respect to the growth of crops and
the spatial patterning of human activities. Unlike equilibrium models in economics and sociology, artificial societies models generally involve time as a critical dimension.
The usual strategy is to create a landscape with agents that
are instructed to follow a set of rules. Such rules might
implement evolutionary dynamics, or trading rules for an
artificial stock market, or kinship rules in an anthropological study. Since the behavior of each agent depends on its
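A toy sketch in the spirit of such landscape-and-agents models (not Epstein and Axtell's actual Sugarscape code; every rule and parameter below is invented for illustration):

```python
import random

random.seed(0)
SIZE = 10
# Toy resource landscape: each cell holds some harvestable amount
sugar = [[random.randint(0, 4) for _ in range(SIZE)] for _ in range(SIZE)]
# Each agent: a grid position, a wealth store, and a metabolic cost
agents = [{"x": random.randrange(SIZE), "y": random.randrange(SIZE),
           "wealth": 5, "metabolism": 1} for _ in range(20)]

for step in range(50):
    for a in agents:
        # Local rule: inspect the four neighbors, move to the richest cell
        options = [(a["x"], a["y"])] + [
            ((a["x"] + dx) % SIZE, (a["y"] + dy) % SIZE)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        a["x"], a["y"] = max(options, key=lambda p: sugar[p[0]][p[1]])
        # Harvest the cell and pay the metabolic cost
        a["wealth"] += sugar[a["x"]][a["y"]] - a["metabolism"]
        sugar[a["x"]][a["y"]] = 0
    # The landscape slowly regrows, up to a cap
    for row in sugar:
        for j in range(SIZE):
            row[j] = min(row[j] + 1, 4)
    agents = [a for a in agents if a["wealth"] > 0]  # starvation
print(f"surviving agents after 50 steps: {len(agents)}")
```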
Subsequently, Kristian Lindgren embedded game-playing agents on a lattice, adding greater flexibility by making
memory length an evolutionary variable. Over tens of
thousands of generations, he observed the emergence
of spatial patterns that resemble evolutionary processes
and clarify preconditions for the emergence of cooperation and competition. Such simulation results have
inspired behavioral ecologists to reexamine biological systems. For example, Manfred Milinski has studied stickleback fish, which enjoy a well-earned reputation for
keeping abreast of the latest trends in animal behavior.
According to Milinski, cooperation in predator
inspection by the sticklebacks follows the dynamics of
the iterated Prisoner's Dilemma. The results of these
simulations have also been used to model problems in
political science and economics. There is a large literature
on this topic.
However, cooperation is by no means the only emergent property investigated by social simulations. Philosopher Brian Skyrms has studied the evolution of the
social contract by modeling it as a problem in the
evolution of strategies. His most ambitious models
tackle large questions such as the evolution of justice,
linguistic meaning, and logical inference. Skyrms finds
that the typical case is one in which there is not
a unique preordained result, but rather a profusion of
possible equilibrium outcomes. The theory predicts what
anthropologists have always known: that many alternative
styles of social life are possible. But this seems a bit
too modest. With respect to the evolution of meaning,
for example, Skyrms shows that evolutionary processes
provide a plausible answer to the fundamental question,
"How do the arbitrary symbols of language become associated with the elements of reality they denote?"
of the Anasazi society of Long House Valley in northeastern Arizona from 1800 B.C. to A.D. 1300. The simple lattice environment of Sugarscape was replaced by paleoenvironmental data on a 96-km² physical landscape. The environment of Artificial Anasazi is populated with human households, so that spatiotemporal patterns of settlement formation and household production can be simulated and compared with the archaeological record. A similar approach was developed by Tim Kohler and Carla van West to model human settlements in Mesa Verde circa A.D. 900-1300. Such models enable their creators to test their intuitions about the complex nonlinear processes involved in human-environmental interactions. As Kohler observes, agent-based approaches "admit an important role for history and contingency (and) can also, in principle, accommodate models that invoke heterogeneity among agents, or which drive social change through shifting coalitions of agents, argued by many to be a critical social dynamic."
Further Reading
Arthur, W. B. (1999). Complexity and the economy. Science 284, 107-109.
Axelrod, R. (1997). The Complexity of Cooperation: Agent-Based Models of Cooperation and Collaboration. Princeton University Press, Princeton, NJ.
Epstein, J., and Axtell, R. (1996). Growing Artificial Societies:
Social Science from the Bottom Up. The Brookings Institution,
Washington DC and MIT Press, Cambridge, MA.
Gilbert, N., and Conte, R. (eds.) (1995). Artificial Societies:
The Computer Simulation of Social Life. UCL Press,
London.
Gumerman, G., and Gell-Mann, M. (eds.) (1994). Understanding Complexity in the Prehistoric Southwest. Santa Fe Institute Studies in the Sciences of Complexity, Proceedings Volume XXIV. Addison-Wesley, Reading, MA.
Helmreich, S. (1998). Silicon Second Nature: Culturing
Artificial Life in a Digital World. University of California
Press, Berkeley, CA.
Stephanie Carmichael
University of Florida, Gainesville, Florida, USA
Glossary
attrition The loss of longitudinal study participants over time.
exposure time The temporary loss of study participants due
to some type of incapacitation effect.
longitudinal data Information from research that assesses
people or other units during more than one time period.
missing data Information on respondents or for variables that
are lost.
mortality Death of an individual during a (long-term)
research project.
Introduction
Stability and change are key themes in social research, and
panel, or longitudinal, data are optimal for the study of
stability and change. Cohort studies examine more specific samples (e.g., birth cohorts) as they change over
time; panel studies are similar to cohort studies except
Longitudinal Data
Longitudinal studies are designed to permit observations
of some specific phenomena over an extended period of
time. Unlike cross-sectional studies, which permit only
a snapshot of individuals or constructs at a particular
point in time, longitudinal studies have the key advantage
in that they can provide information describing how
processes remain stable and/or change over time.
There are several ways that longitudinal research can
be carried out. One of the most obvious is to identify
a cohort at birth and to follow that cohort prospectively
for a long period of time. Though ideal for the study of
within- and between-individual changes in social science
phenomena over time, prospective longitudinal designs
suffer from limitations, including financial costs; history,
panel, and testing effects; and sample attrition. On
a practical level, human life expectancies and stakes
make multidecade longitudinal projects difficult for researchers to complete and sustain, especially because
such projects require significant financial resources.
Another type of longitudinal design is retrospective. This approach avoids the long delay associated with the prospective design by defining a cohort after the fact. In a retrospective design, the researcher defines a cohort, such as all persons born in 1970, and then retrospectively collects various pieces of information, such as offending histories. Limitations also exist with the retrospective longitudinal design. Specifically, such a design introduces potentially serious concerns over recall errors (if self-report information is gathered).
Attrition
As noted earlier, attrition occurs when some study
participants either drop out of the study permanently or
fail to participate in one or more of the follow-up assessments. Although attrition does not necessarily bias the
results, bias does occur when changes in an outcome differ
between cases that remain in the sample and cases that
drop out. In particular, when change in an outcome does
relate to the probability of that change being observed, the
effects observed among those who continue to participate
do not equal the effects in the total sample. Because the
probability of remaining in a study often depends on age,
and because the changes studied often seem likely to affect
the probability of remaining in the sample, panel studies
are susceptible to attrition bias. In sum, it is possible that
individuals who drop out of a longitudinal study differ
in important ways from individuals who do not. The key
issue is whether the attrition is random (i.e., attrition has
nothing to do with the outcome of interest) or nonrandom
(i.e., attrition has something to do with the outcome of
interest).
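A small simulation (all numbers invented) makes the distinction concrete: when the probability of dropping out rises with the outcome itself, the mean among those who remain no longer matches the mean of the full sample:

```python
import random

random.seed(42)
# Simulate an outcome (say, a depression score at follow-up) for
# 10,000 panel members, then let dropout depend on that outcome
scores = [random.gauss(50, 10) for _ in range(10_000)]

def stays(score):
    # Nonrandom attrition: higher scores are more likely to drop out
    return random.random() > min(0.9, max(0.0, (score - 50) / 40))

retained = [s for s in scores if stays(s)]
full_mean = sum(scores) / len(scores)
obs_mean = sum(retained) / len(retained)
print(f"true mean = {full_mean:.1f}, mean among stayers = {obs_mean:.1f}")
# The gap between the two means is the attrition bias; if dropout
# were independent of the score, the two would agree in expectation.
```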
Three examples are worth pointing out here. The first
concerns the relationship between age and depression,
a relationship about which researchers often disagree.
Some report a U-shaped relationship between depression
and age, with middle-aged adults feeling less depressed
than younger or older adults, whereas others believe that
the rise of depression in old age is more myth than fact.
Mirowsky and Reynolds employed data from the first two
waves of the National Survey of Families and Households,
a large sample of more than 13,000 respondents ages 18
and older followed over a 6-year period, from 1988/1989
to 1994/1995. These authors analyzed the impact of attrition on estimates of the age-specific changes in depression over this 6-year period. Their findings indicated that
the cross-sectional relationship of baseline depression to
age differs for those who later drop out, compared to those
who stay in. Interestingly, once the authors controlled for
health and impairment, much of the difference vanished.
In sum, these authors concluded that panel models ignoring attrition will imply that depression decreases in old
age, but models that control for attrition imply that
depression rises by an amount that increases with age.
The second example comes from criminology,
a discipline for which the issue of sample attrition is an
important concern because high-rate offenders may fall
victim to attrition more so than low-rate offenders, and
thus may be less likely to remain in the later waves of
longitudinal studies. Brame and Piquero used the first
and fifth wave of the National Youth Survey, a national
probability sample based on 1725 individuals between the
ages of 11 and 17 years at the first wave, to assess how
sample attrition influenced estimates about the longitudinal relationship between age and crime. In particular,
Mortality
Attrition due to mortality is also a concern for researchers
using panel data to study issues related to continuity and
change in social science phenomena. In several panel
studies, especially those studies that include high-risk
individuals and/or deal with high-risk behaviors, researchers need to be aware that some individuals may not be
present on key theoretical constructs because they
have died.
The issue of mortality has recently been introduced as
a central feature in research on criminal activity over the
life course because researchers have noticed that high-rate offenders are much more likely to die earlier (within
longitudinal studies and at an earlier age) compared to
low-rate offenders, and that these high-rate offenders are
more likely to die via a violent death (i.e., as victims of
homicide). For example, before age 40, delinquent individuals are more likely to die from unnatural causes such
as accidents and homicide than are nondelinquent individuals. In a follow-up study of youthful serious offenders
paroled from the California Youth Authority institutions,
Lattimore et al. found that homicide was the prevailing
cause of death for the youth. The study also revealed that
a higher probability of death by murder was observed for
black youth, for those from Los Angeles, for those with
a history of gang involvement and institutional violence,
and for those with a history of drug arrests. To the extent that such individuals are assumed to have desisted from crime in longitudinal studies, researchers will have incorrectly identified these deceased delinquents as desisted delinquents.
Exposure Time
The final methodological concern is exposure time: the temporary loss of study participants to some type of incapacitation effect, such as hospitalization or a jail/prison term. Unlike sample attrition and subject mortality, the issue of exposure
time has been relatively under-studied because of the lack
of adequate data containing information on time periods
(i.e., spells) associated with exposure time.
Consider a criminological example. In longitudinal
studies of crime over the life course, researchers derive
estimates of individuals offending over some period of
time, typically over a 6- or 12-month period. However,
during these time periods, individuals may not be free to
commit criminal acts. The calculation of time at risk,
street time, or free time, then, is crucial to estimating
individual offending rates, because offenders cannot commit crimes on the street while incarcerated.
Estimating an individual's offending frequency without taking exposure time into consideration assumes that he or she is completely free to commit crimes. Under this assumption, an individual's true rate of offending is likely to be miscalculated because some
offenders are not completely free. Some researchers have
recognized the importance of this problem and have
implemented controls for street time, but many longitudinal self-report studies do not implement controls for
street time. The importance of this issue was recently
demonstrated by Piquero et al. In their study of the recidivism patterns of serious offenders paroled from
the California Youth Authority, these authors found
that conclusions regarding persistence/desistance were
contingent on knowledge of exposure time. For example,
without controlling for street time, they found that 92%
of their sample desisted; however, with controls for exposure time, only 72% of the sample desisted. In sum,
variations in exposure time can affect measurements of
key social science phenomena and need to be considered
in panel studies.
Data Examples
The methodological challenges raised herein can be illustrated with two data examples that describe follow-up
activities of two important longitudinal and panel studies.
First, consider the results of a follow-up survey conducted
by researchers at Johns Hopkins University in Baltimore,
Maryland, who were part of the National Collaborative
Perinatal Project (NCPP), a large-scale medical research
project initiated in the late 1950s. In the 1990s, a team
from Johns Hopkins initiated a follow-up study of a sample
of the original Baltimore portion of the NCPP. Of the
approximately 4000 children who were initially followed
in Baltimore, 2694 were eligible to take part in the followup study at the point in time when the individuals would
have been between 27 and 33 years old. Of the eligible
2694 individuals, 17.59% (n 474) were not located, leaving 82.41% (n 2220) of the individuals available. For
several reasons, 28 of these individuals were not fieldable,
thus leaving 2192 individuals fieldable. At the end of
the follow-up study, 1758, or 80%, of the fieldable individuals received a full follow-up interview. The remaining
cases (1) were still in the field at the end of the follow-up
period (n 157), (2) refused (n 135), (3) had absent
subject data (n 71), (4) were deceased with data
(n 71), (5) were deceased with no data (n 17), or
(6) were unavailable (n 11). Thus, the final response
rate for the full follow-up interview was 65.3%, or 1758
of the originally 2694 eligible sample of individuals.
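The percentages reported above follow directly from the counts; a quick arithmetic check:

```python
# The follow-up counts reported above, checked arithmetically
eligible = 2694
not_located = 474
not_fieldable = 28
interviewed = 1758

available = eligible - not_located            # 2220
fieldable = available - not_fieldable         # 2192
print(f"located: {available / eligible:.2%}")                        # ~82.41%
print(f"of fieldable, interviewed: {interviewed / fieldable:.2%}")   # ~80%
print(f"final response rate: {interviewed / eligible:.2%}")          # ~65.3%
```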
Now consider the National Youth Survey (NYS), a large-scale panel study that was administered to a national
probability sample of youth ranging from age 11 to age
17 in 1976. The survey also employed multistage cluster
sample techniques to produce a representative sample of
United States households. Respondents were then selected from sampled households and interviewed. About
27% of the 2360 individuals who were selected to participate in the study did not participate, leaving a final sample
of 1725 respondents. By the fifth wave of the survey, 1494 of these individuals had been retained and the remaining 231 respondents dropped out. This produces an overall attrition rate of about 13.4% from the first to the fifth
wave of the survey. Data on mortality and exposure time
have not yet been made publicly available. In sum, the
methodological challenges raised here are pertinent
to the follow-ups of these and other important data
sources. The important question for researchers is the
extent to which attrition, mortality, and exposure time
are random or nonrandom.
Conclusion
Missing data in panel studies represent an important
methodological issue that cannot be bypassed when
studying continuity and change in behavior over time.
The results of a recent study help to underscore the
Further Reading
Allison, P. (2001). Missing Data. Sage, Newbury Park,
California.
Brame, R., and Piquero, A. (2003). The role of sample attrition in studying the longitudinal relationship between age and crime. J. Quant. Criminol. 19, 107-128.
Cernkovich, S. A., Giordano, P. C., and Pugh, M. D. (1985). Chronic offenders: The missing cases in self-report delinquency research. J. Crim. Law Criminol. 76, 705-732.
Dempster-McClain, D., and Moen, P. (1998). Finding respondents in a follow-up study. In Methods of Life Course Research: Qualitative and Quantitative Approaches (J. Z. Giele and G. H. Elder, Jr., eds.), pp. 128-151. Sage, Newbury Park, California.
Audiovisual Records,
Encoding of
Marc H. Bornstein
National Institute of Child Health and Human Development,
Bethesda, Maryland, USA
Charissa S. L. Cheah
University of Maryland, Baltimore, County, USA
Glossary
audiovisual records Videotape or digital video representations of an ongoing behavioral stream.
behavior codes Operational definitions of the parameters of
behaviors.
content validity The adequacy with which an observation
instrument samples the behavior of interest.
continuous coding The complete and comprehensive coding
of an audiovisual record.
duration The total time in which a behavior of interest
occurs.
field testing The observation of behavior where it occurs and
the application of a coding system for feasibility.
frequency The number of discrete times a behavior of
interest occurs according to some conventional assignment
of an interbehavior interval.
observation system Formalized rules for the extraction of
information from a stream of behavior.
operational definitions Discrimination rules that specify
target behaviors to be studied.
reactivity Atypical responses from individuals who are being
observed.
sampling Behavioral coding that relies on systematic but
partial coding.
Introduction
The Significance of Behavioral
Observation
Behavioral observation involves recording the manifest
activities of individuals. Direct observation is consistent
with an epistemological emphasis on overt behavior,
quantification, and low levels of inference. For these
reasons, observational procedures are rigorous and powerful, providing measures of both behavioral frequency
and duration on the basis of their occurrence in the noninterrupted natural time flow. This approach to assessment has been called the sine qua non of social science
research; that is, observation is often considered the gold
standard against which other kinds of assessments should
be evaluated. For these reasons, observation has been
employed in the vast majority of published research.
The Challenges
This entry addresses the questions of what and how
to observe about behavior. Conceptual, theoretical, practical, and ethical considerations govern the target behaviors of observation. Audiovisual record data collection procedures; coding systems, including issues related to reliability and validity; selecting and training coders; various methods of scoring observational data; and recording technologies are discussed in turn.
Overview
The questions of what and how to observe are the main
subjects of this entry. Determining the target behaviors of
observation is governed by conceptual, theoretical, practical, and ethical considerations. We review audiovisual
record data collection procedures. Coding systems are
then discussed, including issues related to the reliability
and validity of observational data, procedures for selecting
and training coders, various methods of scoring observational data, and a brief review of recording technologies.
After this, we present two illustrations.
unstructured settings suffer inherent problems and limitations, and the codes that are developed to score them
must take these shortcomings into account.
Continuous Coding
Videotape and digital video technology and scoring
systems based on complete audiovisual records today
permit the recording, documentation, aggregation, and
analysis of all (rather than sampled) ongoing behaviors.
In continuous comprehensive coding, behaviors of interest from an entire observation can be quantified. This
strategy serves several functions: It reduces instantaneous
demands on observers, it allows focused coding, and it
facilitates accurate assessments and reliability. However,
the amount of time necessary for data reduction increases
concomitantly.
Audiovisual records coded using computer-based
coding systems are applicable to a broad spectrum of
behavioral data collected in a wide variety of settings.
Audiovisual records enable observers to spend their
time observing and recording behavior with no distractions imposed by coding requirements; reciprocally,
a computer-based coding system enables coders to
spend their time coding behaviors with no distractions
imposed by observation requirements. With audiovisual
records in hand, it is easy to succumb to the temptation to
undertake highly detailed analyses.
Coders
Coders are meant to bring objectivity to evaluations of
behavior; although variation in rating behaviors should
decrease with training, evaluative biases may not. Shared
meaning systems serve to ensure consensus among coders, and judicious observational training helps to create
and consolidate shared perceptions among coders.
Nonetheless, coders may make systematic errors in assessment and hold biases based on their information-processing limitations and expectations. Coders of behavior may miss information: the human visual and auditory senses can be insensitive or unreliable in detecting certain behaviors. Coders can also suffer from information overload: when a large number of target behaviors occur rapidly or frequently within a short period of time, a coder may have difficulty detecting or recording all of the behaviors. Coders sometimes see patterns of regularity and orderliness in otherwise complex and disordered behavioral data; coders sometimes harbor or develop correct or incorrect hypotheses about the nature and purpose of an investigation, how participants should behave, or even what constitute appropriate data.
To address issues of coder bias and to maintain the
accuracy of quantitative measures, it is necessary to
Scoring
Computerized scores of continuous behavior streams normally yield measures of frequency, total and average duration, rate, and sequencing of component behavior
codes. Frequency is the number of discrete times
a behavior occurs according to some conventional assignment of an interbehavior interval (IBI). Frequencies obtained by continuous coding depend to a certain degree
on arbitrary parameters; for example, in order to define
two instances of a behavior (infant vocalization) as separate rather than as the same continuous vocalization, the
time between the two instances must equal or exceed
a specified minimum IBI. The standardized unit of frequency is the rate or relative frequency of occurrence per
unit time. The duration is the total time that the behavior
occurs. The standardized unit of duration, capturing the
proportional nature of the measure, is prevalence per unit
time. The mean duration of a behavior is its total duration
divided by its frequency. These statistics summarize the
nature of behavior occurrences within the real-time observation session. They describe quantitatively the various
aspects of the behavior that are coded in a series of clock
times. Continuous and comprehensive coding in real time yields unbiased estimates of behavior frequency
and duration.
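The following Python sketch (the onset/offset times and the minimum IBI are invented) illustrates how these summary statistics fall out of a coded record:

```python
# Coded onsets/offsets (seconds) for one behavior in a 300-s session,
# illustrating frequency, rate, duration, and mean-duration measures
events = [(12.0, 14.5), (15.0, 18.0), (40.0, 41.0), (200.0, 210.0)]
session = 300.0
MIN_IBI = 1.0   # two bouts closer together than this count as one

# Merge bouts separated by less than the minimum interbehavior interval
merged = [list(events[0])]
for onset, offset in events[1:]:
    if onset - merged[-1][1] < MIN_IBI:
        merged[-1][1] = offset
    else:
        merged.append([onset, offset])

frequency = len(merged)
duration = sum(off - on for on, off in merged)
print(f"frequency = {frequency}, rate = {frequency / session:.4f} per s")
print(f"duration = {duration:.1f} s, prevalence = {duration / session:.2%}")
print(f"mean duration = {duration / frequency:.2f} s")
```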
A separate coding pass is normally made through each
audiovisual record for each behavioral mode. When coding long audiovisual records, coding in shorter blocks
helps coders to maintain the high level of concentration
needed to score accurately without unduly prolonging
Recording Technologies
Several commercially available systems for the collection,
analysis, presentation, and management of observational
data are now widely used to implement continuous coding of audiovisual records. The most popular include Behavioral Evaluation Strategy and Taxonomy (BEST),
INTERACT, and The Observer; Ethnograph is a qualitative-data counterpart. Most programs share the same virtues.
All these software packages facilitate real-time collection
and analysis of real-life situations or video or multimedia
recordings of observational category system data automatically. They record the start and stop times of multiple,
mutually exclusive, or overlapping events. They comprehensively represent data by automatically recording
Illustrations
Infant-Mother Interaction
It is uncommon in research for the absolute frequencies
of behaviors to be of interest (e.g., whether infants vocalize 7, 10, or 15 times). Although in certain instances population base rates convey meaningful information,
researchers typically use relative frequencies to compare
individuals or groups (e.g., Who vocalizes more: typically
developing or Down's babies?) and to rank individuals or
groups on particular variables with an eye to relating the
ranks to other variables (e.g., Does infant vocalization
predict child language development?). It is assumed, although not established, that continuous coding provides
a more accurate reflection of reality than do sampling
techniques. Do continuous recording and partial-interval
sampling procedures allow investigators to reach similar
conclusions regarding the relative frequency and standing
of behaviors? Do partial-interval sampling procedures
produce reliable estimates of actual frequencies obtained
that assess maternal perceptions of dispositional characteristics (e.g., emotionality, activity level, shyness, and soothability). Factors assessing emotionality (five items; e.g., "child often fusses and cries") and soothability (five items; e.g., "when upset by an unexpected situation, child quickly calms down") were composited to form an index of emotion dysregulation comprising high negative emotionality and low soothability.
Children were assigned to quartets of unfamiliar same-sex, same-age peers and observed in a small playroom filled with attractive toys. Behaviors in the peer play session were coded in 10-s intervals for social participation (unoccupied, onlooking, solitary play, parallel play, conversation, or group play) and the cognitive quality of play (functional, dramatic, and constructive play; exploration; or games-with-rules). For each coding interval, coders selected 1 of 20 possible combinations of cognitive play nested within the social participation categories. The proportion of observational intervals that included the display of anxious behaviors (e.g., digit sucking, hair pulling, or crying) was also coded. Time samples of unoccupied, onlooking, and anxious behaviors were combined to obtain an index of social reticence.
The peer quartet was followed 6-8 weeks later by a visit to the laboratory by each child and his or her mother. All children and mothers were observed in two distinct mother-child situations: During an unstructured free-play session, mother and child were told that the child was free to play with anything in the room (15 min); during a second session, the mother was asked to help guide and teach her child to create a Lego structure that matched a model on the table at which mother and child were seated (15 min). Mothers were asked not to build the model for the child and to refrain from touching the materials during this teaching task, which was thought to be challenging for a 4-year-old. A maternal behavioral rating scale measure was used to assess: (1) proximity and orientation: the parent's physical location with reference to the child and parental nonverbal attentiveness; (2) positive affect: the positive quality of maternal emotional expressiveness toward the child; (3) hostile affect: negative instances of verbal and nonverbal behavior arising from feeling hostile toward the child; (4) negative affect: the negative quality of maternal expressiveness that reflects maternal sadness, fearfulness, and/or anxiety in response to the child's behavior; (5) negative control: the amount of control a mother exerts over the child that is ill-timed, excessive, and inappropriately controlling relative to what the child is doing; and (6) positive control and guidance: the amount that the mother facilitates the child's behavior or provides supportive assistance that is well-timed.
The free-play and Lego-teaching-task sessions were coded in blocks of 1 min each. For each 1-min interval, observers rated each of the maternal behaviors on a three-point scale, with higher maternal behavioral ratings
emotional dysregulation and social reticence among preschoolers whose mothers provided appropriate control
during the Lego paradigm. Thus, in structured situations
in which parental goals are task oriented (such as the Lego
teaching task), the display of maternal direction and
guidance strategies may be normative and appropriate.
Conclusion
Computerized systems for the coding of audiovisual
records of observation are comprehensive and versatile,
with many potential applications in social science research. They can be used to code data collected in naturalistic as well as structured situations: at home, in the
laboratory, and in clinical or educational settings; they can
be adapted for actors of any age; they can be used to code
behavior or to measure other dimensions of the environment; and they can be applied in real time or can be
adapted for coding tasks that require a more prolonged
or repetitive examination of the database. Regardless of
the nature of the data collected, the recording system
preserves most of the information available in the original
data. Multiple behavior records can be juxtaposed, yielding new, composite behavior codes or information about
the ways in which behaviors vary in relation to one another. This information can be used in the continual, iterative process of validating and revising the coding
system. Data storage, retrieval, transmission, and manipulation are computer-based and therefore efficient, both
in terms of time and space.
Acknowledgment
This article summarizes selected aspects of our research,
and portions of the text have appeared in previous scientific publications (see Further Reading). We thank
C. Varron for assistance.
Further Reading
Arcus, D., Snidman, N., Campbell, P., Brandzel, S., and
Zambrano, I. (1991). A program for coding behavior
Autocorrelation
Harold D. Clarke
University of Texas, Dallas, Richardson, Texas, USA
Jim Granato
National Science Foundation, Arlington, Virginia, USA
Glossary
autocorrelated errors Correlations between stochastic
errors ordered over either time or space in a model,
typically a linear regression model.
autoregressive process A data-generating process with memory, such that the value of the process at time t reflects some portion of the value of the process at time t - i.
common factor restriction A restriction on model parameters, produced in the context of linear regression
analysis by the use of a quasi-differencing procedure as
a generalized least-squares correction for autocorrelated
errors.
fractionally integrated process A data-generating process with significant autocorrelations at long lags, such that the system has long memory and shocks erode very slowly.
integrated process An autoregressive process with perfect memory, such that shocks to the system at time t are not discounted in subsequent time periods.
minimum state variable (MSV) solution A solution procedure for rational expectations models that uses the
simplest, least parameterized characterization.
near-integrated process An autoregressive process where a very large portion of the value of the process at time t carries over to time t + 1.
rational expectations equilibrium (REE) Citizen expectations, based on all available information (in the
model), about an outcome that equals the outcome on
average.
stationary A time (data) series (or model) is stationary if there is no systematic change in the mean (e.g., no trend), if there is no systematic change in the stochastic variation, and if strict periodic variations (seasonal) are stable. Time plays no role in the sample moments.
$$d = \frac{\sum_{t=2}^{N} (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{N} \hat{\varepsilon}_t^2},$$
where the $\hat{\varepsilon}_t$ are OLS residuals and $v_t \sim N(0, \sigma_v^2)$ is the innovation in the autoregressive error process of Eq. (2).
Noting that $\sum \hat{\varepsilon}_t^2$ and $\sum \hat{\varepsilon}_{t-1}^2$ are approximately equal when N is large, the formula for d implies that $d \approx 2(1 - \rho)$. Thus, when there is perfect positive first-order autocorrelation (i.e., $\rho = 1.0$), $d = 0$, and when there is perfect negative first-order autocorrelation (i.e., $\rho = -1.0$), $d = 4$. Critical values for d vary by the number of regressors in a model and the number of data points and are characterized by an indeterminate zone, where it is unclear whether the null hypothesis should be rejected.
Econometricians advise that the upper bound ($d_U$) of the critical values should be used in circumstances when regressors are changing slowly and caution that the test is not valid when one of the regressors is a lagged endogenous variable (e.g., $Y_{t-1}$). When a lagged endogenous variable is present, other tests (e.g., Durbin's h, Durbin's M) should be used.
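To make the computation concrete, here is a minimal Python/NumPy sketch (not from the original article; the AR(1) residual series is simulated purely for illustration):

```python
import numpy as np

def durbin_watson(residuals):
    # d = sum of squared first differences of the residuals
    #     divided by the residual sum of squares
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Simulated residuals with positive first-order autocorrelation (rho = 0.7)
rng = np.random.default_rng(0)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.normal()

print(durbin_watson(e))  # approximately 2(1 - 0.7) = 0.6, well below 2
```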
If the null hypothesis of no (first-order) autocorrelation
is rejected, the traditional response is to treat the autocorrelation as a technical difficulty to be corrected,
rather than evidence of possible model misspecification.
The correction is to transform the data such that the error
term of the resulting modified model conforms to the OLS
assumption of no autocorrelation. This generalized least-squares (GLS) transformation involves generalized differencing or quasi-differencing.
Starting with an equation such as Eq. (1), the analyst
lags the equation back one period in time and multiplies it
by r, the first-order autoregressive parameter for the
errors [see Eq. (2) above]. Illustrating the procedure
for a model with a single regressor, the result is
$$\rho Y_{t-1} = \rho\beta_0 + \rho\beta_1 X_{t-1} + \rho\varepsilon_{t-1}.$$
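As an illustration of the full correction (a sketch under simulated data, not the article's own example), one Cochrane-Orcutt-style pass estimates ρ from the OLS residuals, quasi-differences Y and X, and re-estimates by OLS:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                      # AR(1) errors with rho = 0.6
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e                      # true b0 = 1, b1 = 2

X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_ols

# Estimate rho from the residuals, then quasi-difference both variables
rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]

X_star = np.column_stack([np.ones(n - 1), x_star])
b_gls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
b_gls[0] /= 1 - rho          # the transformed intercept is b0 * (1 - rho)
print(rho, b_gls)
```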
[Figure 1: Simulated random walk with drift, random walk, and stationary AR(1) = 0.50 processes; value of series plotted against time period.]
Error correction models such as Eq. (10) are variants of the more familiar autoregressive distributed lag form. For example, Eq. (10) may be written as:
$$Y_t = \beta_0 + (1 + \alpha) Y_{t-1} + \beta_1 X_t - (\alpha c + \beta_1) X_{t-1} + v_t. \qquad (11)$$
Note also that since all variables in a model such as
Eq. (10) are stationary, the spurious regression problem
does not arise. Thus, if other conventional assumptions
hold, the parameters in model (10) may be estimated via
OLS. Engle and Granger suggest a two-step process,
where step 1 is to regress Y on X in levels. Assuming
that the residuals from this regression are stationary,
step 2 is to estimate the parameters in an error correction
model such as Eq. (10). Other analysts have advocated
a one-step method in which all coefficients in an error
correction model are estimated simultaneously. If Y and X
do not cointegrate, α will not differ significantly from zero.
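A minimal sketch of the two-step procedure in Python, assuming the statsmodels library is available and that y and x are I(1) series (variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def engle_granger_two_step(y, x):
    # Step 1: cointegrating regression in levels
    step1 = sm.OLS(y, sm.add_constant(x)).fit()
    z = np.asarray(step1.resid)            # equilibrium errors
    # Residuals should be stationary if y and x cointegrate
    # (strictly, Engle-Granger critical values differ from standard ADF ones)
    adf_stat = adfuller(z)[0]

    # Step 2: error correction model on differenced data
    dy, dx = np.diff(y), np.diff(x)
    regressors = sm.add_constant(np.column_stack([dx, z[:-1]]))
    ecm = sm.OLS(dy, regressors).fit()
    return adf_stat, ecm.params[-1]        # last coefficient is alpha
```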
An error correction model specification is attractive because it enables one to study both short- and long-run relationships among nonstationary variables.
However, establishing that a variable is nonstationary
in the classic sense can be difficult. This is because the
principal statistical tool for this purpose, unit-root tests,
has low statistical power in the face of alternative DGPs
that produce highly persistent data. Two such alternatives
are the near-integrated and fractionally integrated cases.
A near-integrated variable is the product of an autoregressive process [e.g., model (8) above], where the $\phi_1$ parameter is slightly less than 1.0 (e.g., 0.95). In this case, unit-root tests are prone to fail to reject the null hypothesis of a unit root. A fractionally integrated variable may be written as
$$(1 - L)^d Y_t = \frac{\theta(L)}{\phi(L)}\, \omega_t, \qquad (12)$$
where L is the lag operator and d is the (possibly noninteger) order of integration.
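The persistence differences among these data-generating processes are easy to see by simulation; the following Python/NumPy sketch (an illustration, with parameters matching the figures) generates the kinds of series plotted in Figures 2 and 3:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 240
shocks = rng.normal(size=T)

def ar1(phi, shocks):
    # y_t = phi * y_{t-1} + e_t, started at zero
    y = np.zeros(len(shocks))
    for t in range(1, len(shocks)):
        y[t] = phi * y[t - 1] + shocks[t]
    return y

stationary = ar1(0.50, shocks)        # shocks die out quickly
near_integrated = ar1(0.95, shocks)   # shocks persist for many periods
random_walk = np.cumsum(shocks)       # phi = 1.0: shocks never die out
```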
Figure 2 Simulated stationary first-order autoregressive, near-integrated, and fractionally integrated processes.
Figure 3 Autocorrelation functions for simulated autoregressive, near-integrated, fractionally integrated, and random walk processes.
Macropartisanship
$$M_t = a_1 M_{t-1} + a_2 E_{t-1} M_t + a_3 F_t + u_{1t}. \qquad (13)$$
Equation (14) represents citizens' impression (favorability) of a political party ($F_t$). In this model, favorability is a linear function of the lag of favorability ($F_{t-1}$) and an advertising resource variable ($A_t$). There are many ways to measure political advertising resources. These measures include but are not limited to the total dollars spent, the dollars spent relative to a rival party (parties), the ratio of dollars spent relative to a rival party (parties), and the tone, content, timing, and geographic location of the advertisements (on a multinomial scale). Data have been collected for individual races but have the potential to be aggregated along partisan lines. For more details on measurement issues consult the Wisconsin Advertising Project Web site at: http://www.polisci.wisc.edu/tvadvertising. $u_{2t}$ is a stochastic shock that represents unanticipated events (uncertainty), where $u_{2t} \sim N(0, \sigma^2_{u_{2t}})$. The parameter $b_1 > 0$, while $b_2$ may be positive or negative depending on the tone and content of the advertisement.
Favorability
$$F_t = b_1 F_{t-1} + b_2 A_t + u_{2t}. \qquad (14)$$
and, substituting the favorability and advertising equations into Eq. (13) and solving for $M_t$,
$$M_t = \frac{a_1}{1 - b_2 c_2} M_{t-1} + \frac{a_2}{1 - b_2 c_2} E_{t-1} M_t + \frac{b_2 c_1}{1 - b_2 c_2} A_{t-1} + \frac{b_1 + b_2 c_3}{1 - b_2 c_2} F_{t-1} - \frac{b_2 c_2}{1 - b_2 c_2} \bar{M} + \frac{u_{2t} + u_{1t}}{1 - b_2 c_2}, \qquad (16)$$
where the substituted favorability equation is
$$F_t = (b_1 + b_2 c_3) F_{t-1} + b_2 c_1 A_{t-1} + b_2 c_2 (M_t - \bar{M}) + u_{2t} \qquad (17)$$
and the advertising equation is
$$A_t = c_1 A_{t-1} + c_2 (M_t - \bar{M}) + c_3 F_{t-1}. \qquad (18)$$
Simplifying the notation shows that there is an autoregressive component in the reduced form for macropartisanship
$$M_t = \Theta_0 + \Theta_1 M_{t-1} + \Theta_2 E_{t-1} M_t + \Theta_3 A_{t-1} + \Theta_4 F_{t-1} + e_t, \qquad (20)$$
where $\Theta_0 = -b_2 c_2 \bar{M}/(1 - b_2 c_2)$, $\Theta_1 = a_1/(1 - b_2 c_2)$, $\Theta_2 = a_2/(1 - b_2 c_2)$, $\Theta_3 = b_2 c_1/(1 - b_2 c_2)$, $\Theta_4 = (b_1 + b_2 c_3)/(1 - b_2 c_2)$, and $e_t = (u_{2t} + u_{1t})/(1 - b_2 c_2)$.
The system is now simplified to a model of macropartisanship that depends on lagged macropartisanship and also the conditional expectation at time t - 1 of current macropartisanship. The prior values of advertising and favorability also have an effect.
To close the model, the rational expectations equilibrium can be solved by taking the conditional expectation at time t - 1 of Eq. (20) and then substituting this result back into Eq. (20):
$$M_t = \Pi_1 + \Pi_2 M_{t-1} + \Pi_3 A_{t-2} + \Pi_4 F_{t-2} + \xi_t', \qquad (21)$$
where the composite coefficients $\Pi_1, \ldots, \Pi_4$ and the composite error $\xi_t'$ are combinations of the $\Theta$ parameters of Eq. (20) (and hence of the underlying a, b, and c parameters), obtained when the solved expectation $E_{t-1} M_t$ is substituted back into the model.
[Eqs. (22) and (23): the equilibrium autoregressive coefficient, a ratio involving $a_1$, $b_2 c_2$, $c_1$, $b_1$, and $b_2 c_3$ over $1 - b_2 c_2 - a_2$, and its partial derivative with respect to $c_2$, whose denominator is $(1 - a_2 - b_2 c_2)^2$.]
[Figure 4: Autocorrelation as a function of $c_2$.]
Further Reading
Beran, J. (1994). Statistics for Long Memory Processes. Chapman and Hall, New York.
Box, G. E. P., and Jenkins, G. (1976). Time Series Analysis: Forecasting and Control, revised Ed. Holden Day, Oakland, CA.
Clarke, H. D., and Lebo, M. (2003). Fractional (co)integration and governing party support in Britain. Br. J. Polit. Sci. 33, 283-301.
DeBoef, S. (2001). Modeling equilibrium relationships: Error correction models with strongly autoregressive data. Polit. Anal. 9, 78-94.
DeBoef, S., and Granato, J. (2000). Testing for cointegrating relationships with near-integrated data. Polit. Anal. 8, 99-117.
Erikson, R. S., MacKuen, M., and Stimson, J. A. (2002). The Macro Polity. Cambridge University Press, Cambridge, UK.
Basic vs. Applied Social Science Research
Scott Greer
University of Prince Edward Island,
Charlottetown, Prince Edward Island, Canada
Glossary
applied research Concerned with practical knowledge;
outcome focused rather than theory focused, and involving
the application of existing knowledge to solve problems.
applied science Knowledge directed toward producing a product for public interest, and perhaps developing its
commercial value.
basic research Involves questions and investigative practices
that are focused on discovering or formulating fundamental
principles; generally inspired by scientific curiosity rather
than by the need to solve a particular problem.
basic science Involves theories about the world that are
considered foundational to human understanding.
human science Includes methods and theories based on the
idea that human beings are fundamentally unlike other
components of the natural world, and that methods and
assumptions different from those applied to the natural
sciences are necessary to understand human conduct.
idiographic research The study of a single individual case (e.g., a case study); contrasted with nomothetic research.
natural science Includes methods and theories devoted to
understanding the natural world through the measurement
of systematic and controlled observations.
nomothetic research The study of groups of individuals, and
the search for universal generalizable laws of behavior.
paradigm A broad set of assumptions that works as
a framework of knowledge.
social science The study of individual human behavior as
individuals and in groups.
Introduction
Basic social science research involves questions and investigative practices that are focused on discovering or formulating fundamental principles of human behavior, and is generally inspired by the scientist's curiosity rather than an attempt to solve a particular problem. Applied
social science research, driven more by practicality, is
outcome focused rather than theory focused. Basic research answers questions relating to the whys and
hows of human behavior, and does not typically result
in a product or technology. Applied research, on the other
hand, is directed toward producing an improved product,
or toward finding ways of making the use and delivery of
the product better and easier. In short, basic research
tends to improve our understanding of the world, and
applied research tends to improve our ability to function
and interact with it. To fully understand this distinction,
however, it is important to realize that basic and applied
research are sets of practices that are based on two differing conceptions of science, with differing professional
interests. As a result, the relationships between them are
dynamic, highly complex, and often quite ambiguous.
Although the distinction between basic science and
applied science had existed in various forms since the
19th century, contemporary understanding of these
terms is based on the effects of World War II, and the
changes it brought about in the ways science operates.
From Pseudoscience to
Social Science
Early Approaches to Applied Knowledge
A variety of applied social science practices existed long
before the formal disciplines were founded. One could
argue that the assessment and treatment of individuals,
whether for abnormal behavior or other reasons, are as old
as recorded history. The witch doctor, priest, and shaman
are all practitioners of assessment and intervention on the
individual and society, although the means of establishing
credibility and training have evolved considerably, as have
the forms of treatment. Prior to the emergence of the
social sciences, there were other forms of social and psychological measurement that would be regarded today
as pseudoscience, in that they appeared to be based on
empirical observation and measurement, but they often
relied on untenable assumptions or failed to demonstrate
a set of testable principles. Some of these included phrenology, which claimed that bumps and indentations on the
skull were indications of character and aptitudes; physiognomy, which proposed that the face and facial expressions revealed the person's character and intellect; and animal magnetism, the idea that mental and physical pathologies were a result of the body's magnetic fields being
out of alignment. This latter theory proposed that magnets
could be applied to the body to realign the errant fields.
Franz Mesmer made this treatment famous in the late
18th century, claiming that his hands alone possessed
enough animal magnetism to cure his patients. Mesmer
would pass his hands over his patients in a dramatic
fashion, telling them what they would experience, and
Kinnebrook for sloppiness, but that only made the discrepancy greater. As a result, Kinnebrook was fired for
incompetence, although as Friedrich Bessel was to discover, it was through no fault of his own. Some 20 years
later, Bessel discovered what he called personal equations: corrections to individual systematic differences in
reaction time among observers. Bessel found, in other
words, that such differences were a normal part of
astronomical measurements, and could be adjusted by
adding to or subtracting from individual measurements.
A thorough understanding of the implications of this
pointed to some key changes in the way we understood
scientific knowledge: (1) the observer influences the observation, (2) the need to understand how the physical
world was represented through human sensation and
perception, and (3) the gradual realization that knowledge about the natural world deals less with certainty
and more with probability (i.e., the likelihood of a particular value being true or accurate).
importance. Moreover, the individual was no longer separate from the world, but had become interwoven with not
only how society functions, but with the very possibility of
knowledge about the world.
The 19th century would bring great advances in the
biological sciences, and the research that would figure
prominently in the origins of the social sciences would
be that which addressed the discrepancy between objective and subjective reality. One of the earliest research
areas in psychology was the area known as psychophysics, which was defined as the experimental investigation
of the relationship between changes in the physical environment and changes in the perception of the environment. Hermann von Helmholtz, among others, laid the
foundation for such research in demonstrating that
nerve impulses move at a measurable rate, and in proposing a theory of color vision (known as the Young-Helmholtz theory) that shows the perception of all
color is based on three types of color receptors in the
body, corresponding to the primary colors of red, green,
and blue-violet. In each case, it can be seen that perception
is far from just a mirror of reality, but is constructed through
the physiological processing of sensation and learned inferences from our environment; physical reality is not the same
as psychological reality. The case of reaction times is but one
example; visual perception is another. According to Helmholtz, visual perception involves unconscious inferences about distance and size (e.g., the image on our retina shows the railroad tracks converging in the distance, but the tracks are perceived as parallel).
The problems facing scientists were thus both basic and applied: how do we perceive reality (basic) and how do we address the problems arising from human-technology interactions (applied)? Both questions were pursued, initially, as essentially the same question. However, they have become differentiated over time, and scientists with interests in discovery are
time, and scientists with interests in discovery are
drawn toward basic research, and those interested in applications work in the field of applied research. In
Germany, where a number of classic studies in physiology
were carried out during the 19th century, there was
a clear focus on basic physiological research, with the
notion of applications to a market secondary. Of course,
at the time, there was not the enormous complex of industry and business that there is today, nor the multitudes
of professional and scientific interests that they bring.
this underlying desire to know the world and to help improve our place in it.
Consider now a specific example of the ways in which
applied social science differs from basic science, looking
in particular at the way knowledge is transformed as it
evolves from basic theoretical issues to issues of application and social practice.
that information about the self was collected and disseminated. The idea that psychological tests could render
mental attitudes, beliefs, and abilities in objective
terms was found to be incredibly useful, and psychological
research entered into the era of the questionnaire.
A central theme in our understanding of basic and
applied social science research has been language and
the purposes for which research is carried out; here we
see that the shift toward applied research involves
changes not only in how the concept is used, but in
how it is defined and understood at a basic level. This
difference is far from simply semantics, but represents
a conceptual evolution in the meanings of scientific
constructs. Similar changes were made with other personality measures during the 1940s and 1950s. For example, the early predecessors of contemporary personality
testing were the projective tests, such as the Thematic
Apperception Test and the Rorschach Inkblot Test.
These tests purport to tap into the preconscious and unconscious forces at work within the person. Earlier examples of this same general idea resided in physiognomy, or
the analysis of handwriting or other forms of personal
expression. These early personality tests were based on
the same basic idea: that we project parts of who we are
into verbal and physical expressions, and these can be
interpreted to give us insight about hidden parts of our
personality. However, as psychology began to strive for more objective and quantifiable means of measuring the person, these subjective methods fell out of favor (though by no means completely), and were replaced by more quantitative approaches.
Conclusion: The Future of Basic and Applied Social Science Research
The distinction between basic and applied social science is based on an evolving relationship between different forms of scientific and social practice. Basic social science, on the one hand, seeks to discover the fundamental principles of human behavior, whereas applied social science attempts to create and improve technologies that improve our ability to function in and interact with the world.
Further Reading
Benjamin, L. T., and Baker, D. B. (2004). From Seance to
Science: A History of the Profession of Psychology in
America. Thomson Wadsworth, Belmont, California.
Camic, C., and Xie, Y. (1994). The statistical turn in American social science: Columbia University, 1890 to 1915. Am. Sociol. Rev. 59, 773-805.
Danziger, K. (1987). Statistical method and the historical
development of research practice in American psychology.
In The Probabilistic Revolution, Vol. 2: Ideas in the Sciences
(L. Kruger, G. Gigerenzer, and M. Morgan, eds.). MIT
Press, Cambridge.
Danziger, K. (1990). Constructing the Subject: Historical
Origins of Psychological Research. Cambridge University
Press, Cambridge.
Easthope, G. (1974). A History of Social Research Methods.
Longman, London.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J.,
and Kruger, L. (1989). The Empire of Chance: How
Probability Changed Science and Everyday Life.
Cambridge University Press, Cambridge.
Leahey, T. H. (2004). A History of Psychology: Main Currents
in Psychological Thought. Pearson Prentice Hall, Upper
Saddle River, New Jersey.
Stigler, S. (1986). The History of Statistics. Harvard University
Press, Cambridge, Massachusetts.
Wright, B. D. (1997). A history of social science measurement. Educat. Measure. Issues Pract. Winter, 33-52.
Bayes, Thomas
Andrew I. Dale
University of KwaZulu-Natal, Durban, South Africa
Glossary
Act of Uniformity An act, passed by the anti-Puritan
parliament after the Restoration, decreeing that all
ministers who were not episcopally ordained or who
refused to conform were to be deprived of their livings.
Bayes's Theorem A formula that allows the determination of the posterior probabilities $P[C_i \mid E]$ of the (possible) causes $C_i$, given the occurrence of an event E, in terms of the prior probabilities $P[C_i]$ (determined anterior to the conducting of the current investigation) and the likelihoods $P[E \mid C_i]$ of the event, given the causes $C_i$.
Bayesian statistics That branch of modern statistics in which Bayes's Theorem plays a fundamental role in the incorporation of past experience into the making of decisions, statistical analyses, and predictions.
fluxionary calculus Newton's development of the calculus, concerned with problems of tangency and quadrature.
prime and ultimate ratios Introduced by Newton as
a rigorous justification of the methods of his fluxionary
calculus, these ratios, analogous to the modern right- and
left-hand limits, are concerned with the ratios of magnitudes
as generated by motion.
Genealogy
The family of which Thomas Bayes was a member can be
traced back to the early 17th century. The city of Sheffield,
in Yorkshire, England, has long been known for the
manufacture of steel, iron, and brassware, and Thomas's
forebears were of considerable importance in the Company of Cutlers of Hallamshire; for instance, one Richard
Bayes was Master of the Company in 1643, as was his son
Joshua in 1679. But the Bayes family was known not only
in business circles. The 17th and 18th centuries were
times during which Nonconformity and Dissent became
both of importance and of concern to the Established
Church, and Richard's second son, Samuel, having studied at Trinity College, Cambridge, was among those
ejected in 1662 from his living because of his refusal to
accept in full the doctrines of the Established Church and
to take the oaths demanded of her clergy. Samuel moved
to the village of Grendon St. Mary, near Wellingborough
in the county of Northamptonshire, where he probably
remained for some years before moving to Sankey in
Lancashire.
Samuel's younger brother, Joshua, rose in the ranks of the Sheffield cutlery industry. He married Sarah Pearson on 28 May 1667, seven children issuing from this union. Ruth, Joshua and Sarah's eldest daughter, married Elias Wordsworth (who became a Town Burgess in Joshua's place on the latter's death in 1703); another daughter, Elizabeth, married John de la Rose, minister of the Nether Chapel in Sheffield. Joshua and Sarah's eldest son, Joshua, baptized on 10 February 1670 (old style), married Anne Carpenter.
The younger Joshua entered Richard Frankland's Academy at Rathmell, in Yorkshire, in 1686. Frankland,
a dissenting minister, having been ejected from his living,
had started the first Nonconformist academy in that town
in 1669. The various laws and regulations aimed at oppressing the Nonconformists resulted in the academy
having to move several times, before it ended up once
again at Rathmell. Although the training provided by such
dissenting academies was not restricted to those who
felt they had a vocation, Joshua must have attended
this academy with at least some interest in making the
Bayes's Works
Tract on Divine Benevolence
In 1731, an anonymously published tract appeared. Entitled Divine Benevolence, or, an attempt to prove that the principal end of the divine providence and government is the happiness of his creatures, this tract, now known to be by Thomas Bayes, was a rebuttal to an earlier one by John Balguy entitled Divine Rectitude, and was in turn followed by Henry Grove's Wisdom, the first Spring of Action in the Deity. All three authors were trying in these works to find a single principle to which God's moral principles could be ascribed. Whereas Balguy maintained that God's moral attributes, such as truth, justice, and mercy, were modifications of His rectitude, Bayes found the fundamental principle to be benevolence, that is, God's kind affection towards his creatures, leading to the conferring on the universe of the greatest happiness of which it is capable. Bayes's argument is closely reasoned and wide ranging (the tract is some 75 pages long), critical of Balguy in some places and in agreement with him in others, but there is perhaps room for doubt as to whether it is entirely convincing.
Tract on Fluxions
Bayes's second printed work, An Introduction to the Doctrine of Fluxions, was also published anonymously. Its attribution to Bayes is made on the authority of the 19th-century mathematician and bibliophile Augustus de Morgan, who is most reliable in such matters. Writing in response to Bishop George Berkeley's The Analyst, Bayes was concerned more with the logical theory of Isaac Newton's prime and ultimate ratios than with either moments or the methods of the fluxionary calculus. Though he in general approved of the bishop's attention to fluxionary matters, Bayes could not agree with the incorporation of religious aspects, and indeed he declared in his introduction to the tract that he would restrict his attention to "an endeavour to shew that the method of Fluxions is founded upon clear and substantial principles." To this end, he set down postulates, definitions, axioms, propositions, and corollaries. The propositions are carefully proved (though the proofs may sometimes seem slightly deficient), and Bayes's defense of Newton against Berkeley seems unobjectionable. The paper shows a logical mind with concern for mathematical rigor, and the arguments adduced, without the use of limit theory and nonstandard analysis, are perhaps as sound as was possible at that time.
A Semiconvergent Series
Within a few years of Bayes's death, three posthumous papers under his name and communicated to the Royal Society by Richard Price appeared in the Philosophical Transactions. The first of these was little more than a note on some aspects of series, the most important being concerned with the well-known Stirling-de Moivre expansion of the series for log z! as
$$\log z! = \tfrac{1}{2}\log 2\pi + \left(z + \tfrac{1}{2}\right)\log z - z + \frac{1}{12z} - \frac{1}{360z^3} + \frac{1}{1260z^5} - \cdots.$$
Bayes showed that the series actually failed to converge in general, a fact that he was apparently the first to note (though Leonhard Euler, some 6 years before the death of Bayes, had noted the failure for the special case z = 1). Comments are also found here on the divergence of similar series for $\log(2z - 1)!!$ [with $n!! = n(n-2)!!$ and $n!! = 1$ for n = 0 or 1] and $\sum_n (k/n^r)$.
Papers on Chance
The second posthumously published paper by Bayes is
the important An Essay towards Solving a Problem in
the Doctrine of Chances. Here Bayes provided the
seeds of the modern ideas of inverse, prior, and posterior
probabilities, and indeed the whole theory of Bayesian
The Essay is set out in a way that would be quite acceptable to a modern mathematician. The given question is answered, though its solution requires the acceptance of an assumption that has generated a considerable amount of controversy since its publication: that is, in essence and perhaps somewhat crudely, the assumption of a uniform distribution as a prior when one is in a state of ignorance.
Three rules are presented for the obtaining of bounds to the exact probability required (Bayes's solution is effectively given as an infinite series), and proofs of these rules were given in a supplement to the Essay in a subsequent issue of the Philosophical Transactions, with improvements, obtained by Price, of Bayes's bounds. Price also added an appendix to the Essay, in which he explored the use of Bayes's results in a prospective sense. He developed a Rule of Succession (e.g., if an event is known to have occurred m times in n trials, what is the probability that it will occur on the next trial?), and discussed the place of Bayes's results in induction.
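In its now-familiar form (usually credited to Laplace), the Rule of Succession gives
$$P(\text{occurrence on trial } n + 1 \mid m \text{ occurrences in } n \text{ trials}) = \frac{m + 1}{n + 2}.$$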
The importance of these posthumous papers on probability cannot be denied, and though it is not expedient to
speculate here on the reason for Bayes not having published his results, it might be noted that this is sometimes
assumed to flow from his suggested modesty.
Further Reading
Bailey, L., and Bailey, B. (1970). History of Non-conformity in
Tunbridge Wells. Typescript copy in Tunbridge Wells
Library, Kent, United Kingdom.
Bellhouse, D. R. (2002). On some recently discovered manuscripts of Thomas Bayes. Histor. Mathemat. 29, 383-394.
Dale, A. I. (1999). A History of Inverse Probability from Thomas Bayes to Karl Pearson, 2nd Ed. Springer-Verlag, New York.
Dale, A. I. (2003). Most Honourable Remembrance: the Life and Work of Thomas Bayes. Springer-Verlag, New York.
Bayesian Statistics
Scott M. Lynch
Princeton University, Princeton, New Jersey, USA
Glossary
conditional probability density function A density for a random variable that is the ratio of a joint density for two random variables to the marginal density for one. For example, $f(A \mid B) = f(A, B)/f(B)$. Often simply called a conditional density.
joint probability density function A probability density function that assigns probabilities to a set of random variables (see probability density function).
marginal probability density function A density for a random variable in which all other random variables have been integrated out. For example, $f(A) = \int\!\!\int \cdots f(A, B, C, \ldots)\, dB\, dC \cdots$. Often called a marginal density or marginal pdf.
normalizing constant A constant that ensures that
a probability density function is proper, that is, that it
integrates to 1.
probability density function (pdf ) A function that assigns
probabilities to random variables in a continuous parameter
space. A function that assigns probabilities on a discrete
parameter space is called a probability mass function, but
many use pdf for both types of spaces. In both cases, the
function must integrate/sum to unity to be a proper density.
The pdf is often referred to as simply a density, and the
term is also synonymous with distribution.
sampling density The joint probability density for a set of
observations. A normalizing constant is required to make it
proper (i.e., a true density). Expressing an unnormalized
sampling density as a function of the parameters rather
than the data yields a likelihood function.
Bayesian statistics is an approach to statistics that considers probability as the key language for representing uncertainty, including uncertainty about parameters for which inference is to be made. The Bayesian approach to statistics differs fundamentally from the classical approach, although results obtained via Bayesian and classical approaches are often numerically similar, differing only in interpretation.
Bayes's Theorem
In 1763, Reverend Thomas Bayes introduced a theorem for calculating conditional probabilities that ultimately provides a recipe for updating prior uncertainty about parameters of distributions using observed data. Bayes's theorem is simply a double application of the well-known conditional probability rule, and the mathematical basis for the theorem is thus beyond dispute. The theorem states:
$$p(B \mid A) = \frac{p(A \mid B)\, p(B)}{p(A)}.$$
The theorem follows from the definition of conditional probability:
$$p(A \mid B) = \frac{p(A, B)}{p(B)}.$$
Similarly,
$$p(B \mid A) = \frac{p(B, A)}{p(A)},$$
and
$$p(B \mid A)\, p(A) = p(B, A).$$
$$p(\text{p.c.} \mid \text{test}) = \frac{p(\text{test} \mid \text{p.c.})\, p(\text{p.c.})}{p(\text{test} \mid \text{p.c.})\, p(\text{p.c.}) + p(\text{test} \mid \text{no p.c.})\, p(\text{no p.c.})} = \frac{(0.90)(0.00001)}{(0.90)(0.00001) + (0.10)(0.99999)} \approx 0.00009.$$
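A quick check of the arithmetic (a minimal Python sketch; the probabilities are those in the equation above):

```python
p_test_given_pc = 0.90       # probability of a positive test given cancer
p_pc = 0.00001               # prior probability of prostate cancer
p_test_given_no_pc = 0.10    # false-positive rate
p_no_pc = 1 - p_pc

posterior = (p_test_given_pc * p_pc) / (
    p_test_given_pc * p_pc + p_test_given_no_pc * p_no_pc
)
print(posterior)  # ~0.00009: a positive test barely moves the tiny prior
```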
Philosophical Foundation of
Bayesian Statistics
Many researchers use Bayes's theorem for probability
problems, such as the one about prostate cancer, in which
the parameter values are known quantities. However,
historically non-Bayesians have frowned on the use of
Bayes's formula when it involves parameters whose
true values are unknown. This disdain for the formula
arises from the competition between two key philosophical understandings of probability. One understanding of
probability defines probability in terms of the relative
frequency of an event in a long series of trials. For example, the frequentist justification for believing that
the probability of obtaining heads on a single coin flip
is 0.5 is that in a long series of trials, we expect to see
heads approximately 50% of the time.
The frequentist perspective grounds the classical approach to understanding probability. For virtually any
statistical problem, a classical statistician will develop
a likelihood function, which represents the relative frequency of a particular set of data under a particular parameter. For example, suppose a classical statistician is
interested in estimating a population mean from a sample
of n independently and identically distributed (i.i.d.) observations. If the statistician supposes that the data arises
from a normal distribution with mean μ and variance σ², then the likelihood function (or sampling density) for n observations will be:
$$p(Y \mid \mu, \sigma) \propto L(\mu, \sigma \mid Y) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(y_i - \mu)^2}{2\sigma^2} \right\}. \qquad (12)$$
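Numerically, the classical statistician maximizes this likelihood, usually in logs; a minimal Python sketch of Eq. (12) with simulated data (an illustration, not the article's own example):

```python
import numpy as np

def normal_log_likelihood(y, mu, sigma):
    # Log of Eq. (12): the sum of normal log-densities over the sample
    return np.sum(
        -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
    )

y = np.random.default_rng(3).normal(loc=5.0, scale=2.0, size=100)
print(normal_log_likelihood(y, y.mean(), y.std()))  # peaks near the MLE
print(normal_log_likelihood(y, 0.0, y.std()))       # much lower
```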
Bayesian Estimation
For decades, the two arguments against Bayesian statistics, coupled with the computational intensity required to conduct a Bayesian analysis, made Bayesian analyses of marginal use in social science. However, the recent explosion in computing capabilities, coupled with the growth in hierarchical modeling (for which the Bayesian approach is very well suited), has led to rapid growth in the use of Bayesian techniques. In this section, I discuss
the estimation of Bayesian models, primarily focusing
on the contemporary approaches that have made
Bayesian analyses more popular in the social sciences
over the last decade.
Contemporary Approaches to
Estimation
Over the last decade, new methods of estimation, Markov chain Monte Carlo methods, have become popular. These methods have the advantage of being able to handle high-dimensional parameters (something that quadrature methods cannot) and being theoretically exact (something that approximation methods, by definition, are not). I focus extensively on these methods because they hold the most promise for making Bayesian
analyses more accessible to social scientists.
The name Markov chain Monte Carlo (MCMC)
derives from the nature of the techniques; the methods
produce simulated parameter values (hence Monte
Carlo), with each sampled parameter being simulated
based only on the immediately prior value of the parameter (hence Markov chain). In plain English, these
techniques produce sequences of random, but not independent, samples of parameters from their posterior
distributions.
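As a concrete illustration (a sketch, not the article's own code), here is a minimal random-walk Metropolis sampler in Python for the mean of normal data with known variance; the proposal scale, flat prior, and burn-in length are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(loc=2.0, scale=1.0, size=50)   # data; sigma = 1 known

def log_posterior(mu):
    # Flat prior on mu, so the log-posterior is just the log-likelihood
    return -0.5 * np.sum((y - mu) ** 2)

mu, draws = 0.0, []
for _ in range(10_000):
    proposal = mu + rng.normal(scale=0.5)     # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal                         # accept; otherwise keep mu
    draws.append(mu)

posterior_sample = np.array(draws[2_000:])    # discard burn-in draws
print(posterior_sample.mean(), posterior_sample.std())
```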
Bayesian Inference
We have already discussed some aspects of Bayesian inference, including estimating the mean, median, mode,
and variance. Using simulated samples from the posterior
distribution is easy, and there is virtually no limit to the
statistics that we can use. For much of Bayesian inference,
the variance of the posterior distribution is important.
Credible intervals (the Bayesian version of confidence
intervals) can be constructed based on the variance
of the marginal posterior distribution for a parameter,
or we can simply use the sampled iterates themselves
to construct empirical intervals. That is, for a 100(1 - α)% interval for a parameter, we can simply order the simulated sample of iterates from smallest to largest and take the n(α/2)th and n(1 - α/2)th iterates as the bounds of the interval. The interpretation of such an interval differs from the interpretation of a classical interval. From a Bayesian perspective, we simply say that the probability the parameter fell in the interval is 1 - α.
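For example (continuing the sketch above, with illustrative names), a 100(1 - α)% credible interval requires nothing more than sorting the draws:

```python
import numpy as np

def credible_interval(draws, alpha=0.05):
    # Empirical 100*(1 - alpha)% interval from ordered posterior draws
    s = np.sort(np.asarray(draws))
    n = len(s)
    return s[int(n * alpha / 2)], s[int(n * (1 - alpha / 2)) - 1]

# e.g., credible_interval(posterior_sample) -> (lower, upper)
```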
The Bayesian approach also directly allows inferences for parameters that are not in the model but are functions of the parameters that are in the model. For example, suppose our MCMC algorithm generates samples from the distribution for θ, p(θ | y). If we are interested in making inferences for a parameter δ = f(θ), we simply compute δ_j = f(θ_j) for every sampled iterate j and use the collection of δ_j as a sample from p(δ | y). We then proceed with inferential computations as previously discussed.
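Continuing the same sketch, inference for a derived parameter just transforms the draws (here an illustrative δ = exp(θ)):

```python
import numpy as np

delta_sample = np.exp(posterior_sample)   # delta_j = f(theta_j) for all j
print(credible_interval(delta_sample))    # interval for delta = exp(theta)
```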
actually fits the data. In addition, we may also be interested in comparing multiple models. The Bayesian approach has very flexible methods for model evaluation and
comparison.
Bayes Factors
Of interest for many statisticians is the comparison of multiple models that are thought to capture the processes that generate the data. Often, competing theories imply different models. The classical approach to statistics is limited to comparing nested models, but models often are not nested. A Bayesian approach to comparing non-nested discrete sets of models is based on the Bayes factor. If we have two models, M1 and M2, each with parameters θ, then Bayes's theorem gives the posterior probability for each model and a ratio of these probabilities can be formed:
$$\text{Posterior odds} = \frac{p(M_1 \mid y)}{p(M_2 \mid y)}. \qquad (20)$$
Both the numerator and denominator can be broken into their constituent parts, the Bayes factor and the prior odds for model 1 versus model 2:
$$\text{Posterior odds} = \frac{p(y \mid M_1)\, p(M_1)}{p(y \mid M_2)\, p(M_2)}. \qquad (21)$$
The former ratio is simply the ratio of the marginal likelihoods and is called the Bayes factor; the latter ratio is the ratio of the prior odds for the two models.
The marginal likelihoods are so called because they represent the integral of the posterior density over the parameter space $S_i$ for each model $M_i$ (essentially averaging out parametric uncertainty within each model; notice that this is nothing more than the inverse of the normalizing constant required to make a posterior density proper):
$$p(y \mid M_i) = \int_{\theta_i \in S_i} p(y \mid \theta_i)\, p(\theta_i)\, d\theta_i. \qquad (22)$$
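As a toy numerical illustration (not from the article; SciPy is assumed), the marginal likelihood of Eq. (22) can be computed by simple Monte Carlo integration over the prior, here for two competing Beta priors on a binomial proportion:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
k, n = 7, 10  # observed data: 7 successes in 10 trials

def marginal_likelihood(prior_a, prior_b, draws=100_000):
    # Eq. (22) by Monte Carlo: average the likelihood over prior draws
    theta = rng.beta(prior_a, prior_b, size=draws)
    return stats.binom.pmf(k, n, theta).mean()

# M1: Beta(5, 5) prior centered on 0.5; M2: Beta(8, 2) favoring large theta
bayes_factor = marginal_likelihood(5, 5) / marginal_likelihood(8, 2)
print(bayes_factor)  # < 1 here, favoring M2 given the data
```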
Bayesian model averaging combines predictions across the candidate models, weighting each by its posterior model probability:
$$\sum_{j=1}^{J} p(\tilde{y} \mid M_j, y)\, p(M_j \mid y). \qquad (24)$$
Further Reading
Box, George E. P., and Tiao, George C. (1973). Bayesian Inference
in Statistical Analysis. Addison-Wesley, Reading, MA.
DeGroot, Morris H. (1986). Probability and Statistics, 2nd Ed.
Addison-Wesley, Reading, MA.
Gelman, Andrew, Carlin, John B., Stern, Hal S., and Rubin,
Donald B. (1995). Bayesian Data Analysis. Chapman and
Hall, London.
Gilks, Walter R., Richardson, Sylvia, and Spiegelhalter, David
J. (1996). Markov Chain Monte Carlo in Practice. Chapman
and Hall/CRC, Boca Raton.
Hoeting, Jennifer A., Madigan, David, Raftery, Adrian E., and Volinsky, Chris T. (1999). Bayesian model averaging: A tutorial. Stat. Sci. 14(4), 382-417.
Kass, Robert E., and Raftery, Adrian E. (1995). Bayes factors. J. Am. Stat. Assoc. 90(430), 773-795.
Lee, Peter M. (1989). Bayesian Statistics: An Introduction. Oxford University Press, New York.
Raftery, Adrian E. (1995). Bayesian model selection in social research. Sociol. Methodology 25, 111-164.
Behavioral Economics:
The Carnegie School
Mie Augier
Stanford University, Stanford, California, USA
Glossary
bounded rationality A phrase coined by Herbert Simon;
used to describe an approach to economics that is more
realistic than that of neoclassical economics.
maximization One of the assumptions of neoclassical economics and rational choice theory, assuming that all agents maximize expected utility (or profit) over all possible outcomes.
neoclassical economics A branch of economics building on
very strict assumptions about economic behavior as
optimizing behavior.
satisficing The idea that economic agents (and organizations)
do not maximize (due to cognitive limitations and bounded
rationality) but search for an outcome that is good
enough.
theory of the firm A field of economics and organization
theory centered around questions relating to the existence,
boundaries, and internal organization and activities of the
business firm.
Closing
An important branch of behavioral economics was conceived at Carnegie Mellon University during the 1950s
and early 1960s around the work of Herbert Simon,
Richard Cyert, and James March. For these scholars,
behavioral economics meant doing science in an interdisciplinary way, linking economics to organization theory,
cognitive science, sociology, and psychology, and centering around concepts such as uncertainty, ambiguity,
norms, routines, learning, and satisficing. Emphasizing
the concern with the empirical validity of assumptions,
Simon thus wrote that behavioral economics is best characterized not as a single specific theory but as a commitment to empirical testing of the neoclassical assumptions of human behavior and to modifying economic theory on the basis of what is found in the testing process. He included within behavioral economics different approaches, such as new institutional economics, transaction cost economics, evolutionary economics, and the literature on heuristics coming from Kahneman and Tversky.
Further Reading
Augier, M., and March, J. G. (2002). The Economics of Choice,
Change and Organization: Essays in Honor of Richard M.
Cyert. Edward Elgar, United Kingdom.
Augier, M., and March, J. G. (2002). A model scholar. J. Econ. Behav. Organiz. 49, 1-17.
Camerer, C., Loewenstein, G., and Rabin, M. (eds.) (2004).
Advances in Behavioral Economics. Princeton University
Press, New Jersey.
Cyert, R., and March, J. G. (1992). A Behavioral Theory of the
Firm, 2nd Ed. Blackwell, Oxford.
Day, R., and Sunder, S. (1996). Ideas and work of Richard M. Cyert. J. Econ. Behav. Organiz. 31, 139-148.
Earl, P. (ed.) (1988). Behavioral Economics. Edward Elgar,
Aldershot.
March, J. G., and Simon, H. A. (1993). Organizations, 2nd Ed.
Blackwell, Oxford.
Nelson, R., and Winter, S. (1982). An Evolutionary Theory
of Economic Change. Bellknap Press, Cambridge,
Massachusetts.
Simon, H. A. (1955). A behavioral model of rational choice. Q. J. Econ. 69, 99-118.
Simon, H. A. (1991). Models of My Life. MIT Press,
Cambridge.
Williamson, O. E. (1985). The Economic Institutions of
Capitalism. Free Press, New York.
Williamson, O. E. (1996). Transaction cost economics and the Carnegie connection. J. Econ. Behav. Organiz. 31, 149-155.
Williamson, O. E. (2002). Empirical microeconomics: another perspective. Forthcoming in M. Augier and J. G. March (eds.), The Economics of Choice, Change and Organization: Essays in Honor of Richard M. Cyert. Edward Elgar, Cheltenham, UK.
Behavioral Psychology
Francisco J. Silva
University of Redlands, Redlands, California, USA
Glossary
avoidance conditioning A procedure in which a particular
response during a conditional stimulus prevents the
occurrence of an aversive event.
conditional stimulus A stimulus that elicits a conditional
response after being paired with an unconditional stimulus.
extinction In Pavlovian conditioning, when a conditional stimulus is no longer followed by an unconditional stimulus, the conditional response will return to its preconditioning level. In operant conditioning, withholding the positive reinforcer that normally followed a response will cause the response to return to its preconditioning level.
functional relationship A description, often summarized in
a graph, that shows how one variable (the independent
variable) is related to another variable (the dependent
variable). Knowledge of the functional relationship and of
the value of the independent variable allows one to predict
the value of the dependent variable.
habituation The waning of a response elicited by a usually
harmless stimulus because of repeated presentations of that
stimulus.
hypothetical construct Unobserved entities purported to mediate an environment-behavior relationship.
operant conditioning The procedure in which behavior is
modified by its consequences.
Pavlovian conditioning The procedure in which a stimulus
comes to elicit a new response after being paired with
a stimulus that elicits a similar or related response.
positive reinforcer A stimulus whose occurrence after
a response increases the likelihood of that response recurring.
punisher A stimulus whose occurrence after a response
decreases the likelihood of that response recurring.
unconditional stimulus A stimulus that reliably and persistently elicits behavior that is resistant to habituation.
Foundations of Behavioral
Psychology
Historically, behavioral psychology was incorrectly identified with rat psychology because, at one time, its primary data were collected using laboratory rats as the
subjects. It was thus assumed that its theories were
most appropriate to rats and other laboratory animals.
In terms of its application to humans, behavioral psychology was considered most useful for populations with
developmental disabilities, autism, and the seriously
and persistently mentally ill. For many critics, to modify
someones behavior is tantamount to mind control and
evokes disturbing images from Orwells 1984, Huxleys
Brave New World, and Burgesss A Clockwork Orange.
But behavioral psychology resembles none of these
any more than nuclear medicine resembles atomic
bombs, meltdowns, and genetic mutations. In this section,
the scope and origins of behavioral psychology are
presented.
Scope
There are no restrictions on what behavioral psychologists
study, as long as it is something that people and animals
do. Crying during a movie, reading a book, writing
a poem, counting to 10, speaking to a teacher, playing
a guitar, thinking silently, feeling pain, solving a puzzle,
using a tool, remembering a quote, and loving a child are
examples of actions that behavioral psychologists can
study, as are capturing food, attracting a mate, eluding
predators, avoiding poisons, and seeking shelter. Topics
such as stress, personality, learning, intelligence, creativity, and consciousness are also studied by behavioral
psychologists.
There are also no restrictions on where these actions
are studied. Behavior can be studied by unobtrusive observations in the natural environment, such as when a
teacher observes children playing during recess or
a naturalist studies ants carrying food to their nest. Behavior can also be studied by experimentation in the natural
environment, such as when a coach compares the effectiveness of relaxation techniques to alleviate a tennis
player's anxiety before big matches or a biologist removes pine cones and stones near the entrance of a wasp's burrow to see how this affects the insect's ability to find its
home. Finally, behavior can be studied experimentally in
a clinic or laboratory, such as when a therapist measures
the nonverbal behavior of a married couple and then
implements a counseling program to reduce their negative body language or when a psychologist presents
a rat with tones of different durations to study its timing
abilities.
Philosophical Origins
The origins of studying environment-behavior relationships are as old as humans themselves, for humans' earliest ancestors must have tried to understand
the relationship between a season and the migration of
animals, the behavior of prey animals and the noises made
by those who hunt them, the ingestion of a mushroom and
bodily sensations, and so on. Somewhere in the history of
humans, these informal observations and understandings
evolved into explicit and written musings about behavior.
The intellectual roots of behavioral psychology can be traced to several sources: Francis Bacon (1561-1626), who urged an inductive approach to science; John Locke (1632-1704), who emphasized the role of experiences in the formation of knowledge; Julien de la Mettrie (1709-1751), who believed that humans and animals were machines that differed by degree and not kind; David Hume (1711-1776), who postulated several laws of association; Auguste Comte (1798-1857), who insisted that the data of science be publicly observed; Charles Darwin (1809-1882), who argued that living creatures evolved by natural selection; and many other philosophers and scientists of the late 19th and early 20th centuries who were a part of the philosophical movements known as functionalism, logical positivism, and behaviorism.
Despite these early influences, and despite the fact that ethologists in Europe were also studying the relationship between animals' behavior and their environments, the birth of behavioral psychology is most often attributed to Russian physiologist Ivan Pavlov (1849-1936) and American psychologists Edward Thorndike (1874-1949), John Watson (1878-1958), and B. F. Skinner (1904-1990).
Hypothetical Constructs
Hypothetical constructs are unobserved entities purported to mediate an environment-behavior relationship. Behavioral psychologists avoid the use of hypothetical constructs because these are easily misused and, hence, can hinder the scientific study of behavior. One reason for this misuse is that it is often unclear whether these constructs are real, metaphorical, or promissory notes for something that might be observed in the future. For example, it has been said that a rat navigates a maze by scanning with its mind's eye a stored cognitive map of the maze. However, proponents of this type of explanation also say that there is no actual mind's eye, no
Theories
Theories fall along a continuum, ranging from a set of
loosely interconnected observations to precise mathematical formulations. Behavioral psychology is not atheoretical, though its practitioners believe that the best theories
are those that organize facts at the same level that
those facts were collected. This means that if someone
has determined the functional relationship between
certain environmental conditions and behavior, then
it is unnecessary to appeal to physiological mechanisms
or processes that were never manipulated or measured
or whose own relationships to behavior are poorly
understood.
Theories can also be distinguished on the basis of their
use of hypothetical constructs. For behavioral psychologists, a theory with too many hypothetical constructs is
sloppy because it has too many degrees of freedom.
A theory that can explain everything explains nothing.
Finally, theories in behavioral psychology differ from those in other areas by using behavioral primitives (basic terms and relationships that serve as pillars of theory and therefore do not require further explanation) and then coordinating and integrating these to explain more complex behavior (see Operant-Pavlovian Interactions for an example). In the process of integration, behavioral
would probably attribute the player's poor batting performance to a decline in his confidence; for the behavioral psychologist, the player's slump is likely the result of poor batting mechanics and responses that interfere with batting (e.g., intrusive thoughts, tensed muscles). To help the struggling batter, the cognitive psychologist focuses on changing the player's cognitions and feelings. Thus, this psychologist might help the athlete focus on the positive ("Hey, you're swinging the bat well and making good contact!"), lower his goals ("Hitting 1 out of 4 at-bats is okay."), and challenge unrealistic statements ("You're not going to be demoted to the minor leagues because of a batting slump."). The goal is to impact the player's confidence, which will then presumably raise his batting average. In contrast, although there is nothing that prevents a behavioral psychologist from trying to change the player's confidence, this psychologist focuses on video analysis, modeling, and corrective feedback to show the player how he is gripping the bat too rigidly, dropping his right shoulder too soon, as well as how he can shorten his swing, how to steady his head, and the like. Following this instruction, the player will practice batting with these new mechanics. The goal is to change the player's batting behavior directly rather than change a hypothetical construct (e.g., confidence) that might lead to a change in batting performance.
Changing Environment-Behavior Relationships
Although there are many ways of changing environment-behavior relationships, most of these ways can be categorized into three types of procedures: habituation, Pavlovian conditioning, and operant conditioning.
Whether these three procedures represent three distinct
phenomena is controversial; each procedure seems to
contain elements of the others. In this section, each procedure is summarized along with a few of its major properties. An understanding of these properties gives
behavioral psychologists the foundation for changing
environment-behavior relationships.
Habituation
If a usually harmless stimulus occurs repeatedly and there
are no other events associated with it, then behavior
elicited by that stimulus will diminish. This is habituation,
one of the simplest ways to change behavior. By repeatedly presenting some stimuli (the environmental cause),
people and animals stop responding to those stimuli (the
behavioral effect). Habituation is one of the reasons that
listening to the same song causes someone to lose interest
in that song and a roller coaster becomes uninteresting
after a few rides.
To illustrate some of the features of habituation, consider an example of a person staying in a downtown Chicago hotel along the city's famous elevated train line. Initially, trains rumbling down the track and screeching to a stop (the stimuli) elicit responses such as waking up, covering one's ears, and a set of sensations, emotions, and thoughts that collectively might be called
a startle response. This is stimulus discrimination. Habituation to one stimulus might (generalization) or might
not (discrimination) extend to other stimuli.
Pavlovian Conditioning
If a neutral stimulus (e.g., a bell) reliably precedes,
usually in close temporal proximity, a stimulus that reliably and persistently elicits behavior (e.g., food in the
mouth), then people and animals begin reacting during
the neutral stimulus (e.g., by salivating) in a way that
prepares them for the impending stimulus. Although
few if any stimuli are neutral in the sense that they do
not elicit any behavior, behavioral psychologists consider
a stimulus to be neutral when any behavior it elicits readily
wanes with repeated presentations of the stimulus (i.e.,
responding habituates). A ringing bell, for example, might
initially elicit an orienting response directed toward
the sound, but this action will disappear with repeated
ringing.
A stimulus that comes to elicit behavior after being
paired with the stimulus that elicits behavior is a conditional stimulus (CS). A stimulus that reliably and persistently elicits behavior resistant to habituation is an
unconditional stimulus (US). The responses elicited by
the CS and the US are the conditional response (CR)
and the unconditional response (UR), respectively. The
procedure for changing behavior when two or more
stimuli are paired is called Pavlovian, classical, or respondent conditioning.
Although many examples of Pavlovian conditioning
involve biologically significant USs such as food or
water, the US does not have to be biologically significant.
For example, imagine that when Jane's grandparents visit
each week, they give her $20 when they arrive. After
several pairings between these relatives and the money,
how will Jane react when her grandparents visit? She will
probably be happy and expecting money at the sight of
her grandparents at the door because her grandparents
(CS) precede the occurrence of money (US), which normally elicits a constellation of positive emotional responses (URs) when she receives it. By reliably giving
Jane $20 when they arrive, the grandparents come to elicit
a set of similar responses (CRs). However, if these
relatives stop giving Jane the $20 gift when they visit,
then she will become less happy and less likely to expect
money when she sees them. That is, when the CS no
longer predicts the US, the CR weakens and might eventually disappear. Presenting a CS without the US is
termed extinction.
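The joint dynamics of acquisition and extinction can be given a simple quantitative form. The sketch below uses the Rescorla–Wagner update rule, a standard formalization from the later conditioning literature rather than anything this article prescribes; the learning rate and asymptote are illustrative assumptions.

```python
# Minimal sketch of Pavlovian acquisition and extinction using the
# Rescorla-Wagner update rule; alpha (learning rate) and lam
# (asymptote supported by the US) are assumed, illustrative values.

def conditioning_curve(paired_trials, cs_alone_trials, alpha=0.3, lam=1.0):
    """Track associative strength V over paired trials, then CS-alone trials."""
    v, history = 0.0, []
    for _ in range(paired_trials):    # acquisition: CS paired with US
        v += alpha * (lam - v)        # V climbs toward lam
        history.append(v)
    for _ in range(cs_alone_trials):  # extinction: CS presented alone
        v += alpha * (0.0 - v)        # V decays back toward zero
        history.append(v)
    return history

curve = conditioning_curve(10, 10)
print(round(curve[9], 2), round(curve[-1], 2))  # roughly 0.97, then 0.03
```

On this reading, Jane's anticipation of the $20 is the associative strength: it grows while grandparents and money are paired, and shrinks once the visits no longer predict the gift.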
But just as the passage of time without a stimulus
causes the spontaneous recovery of a habituated response, so too does a period of time following the extinction
of a CR cause spontaneous recovery of that response. If
Jane's grandparents stop visiting for a few weeks after
Operant Conditioning
Habituation deals with how a person's or an animal's behavior is changed by repeated presentations of single stimuli. Pavlovian conditioning focuses on how someone's
behavior is changed by a particular relationship among
Operant–Pavlovian Interactions
Although habituation, operant conditioning, and
Pavlovian conditioning are often discussed separately,
the three procedures are almost always involved in varying
proportions in learned behavior. For example, in most
Pavlovian conditioning situations, the neutral stimulus
usually elicits a response that undergoes habituation as
this stimulus is increasingly paired with a US.
Some of the clearest examples of operant–Pavlovian
interactions involve situations where people and animals
avoid aversive events. A man who is afraid to speak in
public will not wait to see how his audience reacts; on
being asked to give a speech, he will decline the request or
make an excuse. A woman who fears that an elevator she
rides aboard might crash will not ride the elevator to see
what will happen; she will use the stairs instead. A gang
member hears gunshots and ducks for cover. In these
examples, being asked to give a speech, the sight of the
elevator, and the sound of gunfire predict possible aversive outcomes (e.g., an unresponsive audience, a crashing
elevator, being shot) unless an avoidance response occurs
(e.g., declining to give a speech, taking the stairs, diving to
the ground). In these circumstances, there is a Pavlovian
and an operant component that controls behavior. The
Pavlovian component is the relationship between a signal
and an aversive event. The operant component is the
response–consequence relationship involving the avoidance response and the consequent absence of the aversive
event. If someone makes a particular response during
the signal that predicts the aversive event, then this
event will not occur and the avoidance response is negatively reinforced.
It is worth noting that, for many people, the aversive
event does not have to be probable, only possible. The
probability that an elevator will crash is low. Despite this,
modifying avoidance behavior is difficult. In the example
of the woman who is afraid to ride in elevators, it is unlikely that this fear will disappear because she will never
sample the real contingencies of safely riding in an elevator. The fear elicited by the CS (sight of the elevator)
causes her to make a response (climbing the stairs) that
prevents her from discovering that her fear (an elevator
she rides aboard will crash) is unrealistic. To eliminate
avoidance behavior related to unrealistic anxiety or fear,
Further Reading
Abramson, C. I. (1994). A Primer of Invertebrate Learning:
The Behavioral Perspective. American Psychological Association, Washington, DC.
Donahoe, J. W., and Palmer, D. C. (1994). Learning and
Complex Behavior. Allyn & Bacon, Boston, MA.
Hearst, E. (1988). Fundamentals of learning and conditioning. In Stevens' Handbook of Experimental Psychology (R. C. Atkinson, R. J. Herrnstein, G. Lindzey, and R. D. Luce, eds.), 2nd Ed., Vol. 2, pp. 3–109. Wiley,
New York.
Kazdin, A. E. (1998). Research Design in Clinical Psychology,
3rd Ed. Allyn & Bacon, Boston, MA.
Bentham, Jeremy
Gilbert Geis
University of California, Irvine, Irvine, California, USA
Glossary
felicity calculus The process by which the balance of
pleasures and pains is measured; sometimes called
hedonistic or hedonic calculus.
happiness For Bentham, a state measured in terms of whether an action adds to an individual's pleasure or diminishes the sum total of his or her pain. The term is generally used by Bentham to refer to the aggregate of a person's pleasures over pains.
paraphrasis Bentham's coined term for the process of demystifying words such as "liberty," by breaking them down into their constituent elements. If such a definitional breakdown is not possible, it demonstrates that the term is a fiction, unrelated to any real thing.
pleasure and pain Bentham maintains that nature has "placed mankind under the governance of two sovereign masters," pleasure and pain. The difference between happiness and pleasure lies in the fact that the former is not susceptible of division, but pleasure is.
utility For Bentham, the principle of utility "approves or disapproves of every action whatsoever according to its tendency to augment or diminish the happiness of the party whose interest is in question."
Biographical Notes
Jeremy Bentham was an eccentric, reclusive intellectual
who spent the better part of his life writing tracts that
sought to enlighten his fellow humans about the paths of
proper thought and persuading persons with power to
support the kinds of programs he believed would improve
the human condition. The son and grandson of successful
London lawyers, Bentham was an exceptionally precocious child. He became a boarder at the fashionable
Westminster School at the age of 7 years and enrolled at
Queen's College, Oxford University, when he was not yet 13. He later studied law and clerked at Lincoln's Inn, developing a deep disdain for both lawyers and lawyering, regarding the latter as a pursuit built on fictions
and marked by the use of terms and concepts that only
initiates could comprehend, all this being so in order to
enrich practitioners and keep laymen from knowing what
was going on. "Lawyers feed upon untruth, as Turks feed upon opium," was one of Bentham's caustic jabs. On his father's death in 1792, Bentham inherited a sizable fortune, realized from property investments, that allowed
him the leisure to spend his days putting together
a monumental outpouring of ideas that reflected his
utilitarian views.
Bentham wrote tirelessly for all his adult life. James
Steintrager insists that to read all that Bentham wrote
would take a lifetime longer than Bentham's 84 years. Besides, Bentham's prose at times can be virtually impenetrable. The essayist William Hazlitt sarcastically observed that his works "have been translated into French; they ought to be translated into English."
Bentham's Utilitarianism
Of Pleasures and Pains
Jeremy Bentham is regarded as the father of the utilitarian
school of thought that seeks to analyze human and social
actions in terms of their consequences for well-being.
Many writers before Bentham had employed the term
and some of the ideas of utilitarianism, but in a much
looser sense than Bentham proposed. Bentham advocated that human action and public policy should seek
to create "the greatest happiness for the greatest number." This concept, as Robert Shackleton's adroit detective work has demonstrated, was taken verbatim from
Cesare Beccaria's Dei delitti e delle pene (1764; translated
into English as An Essay on Crimes and Punishments).
Later, Bentham would truncate the goal to "the greatest happiness," a move Mary Mack regards as mere definitional housekeeping, though it more likely was prompted by Bentham's recognition that, from a mathematical perspective, the original definition contained one too many "greatests." Bentham finally settled on the term "the felicity maximizing principle" to describe the ideal that characterized his recommendations. Bentham also was deeply
concerned not to ignore the needs of persons in the minority. This is reflected in his system of weighting, in which
"everyone was to count for one and no one for more than one." In modern economic welfare theory, this equal
weighting defines a utilitarian principle as being specifically Benthamite.
For Bentham, the wellsprings of human behavior
reside totally in attempts to gain pleasure and to avoid
pain. He maintained that men "calculate [pleasure and pain], some with less exactness, indeed, some with more: but all men calculate." And all humans voluntarily act in
regard to their personal pleasure. It is difficult, Bentham
granted, to prove enjoyment in the case of, say, a Japanese
man who commits hara-kiri; but, nonetheless, the pursuit of
pleasure was his goal. Bentham believed that self-interest,
if enlightened, would produce socially desirable results,
though he was aware that individuals at times fail to appreciate what will bring them pleasure rather than pain, in
part because they cannot adequately anticipate the future.
For Bentham, present and future consequences of actions could be estimated in terms of seven considerations:
(1) intensity, (2) duration, (3) certainty, (4) propinquity, (5)
fecundity, (6) purity, and (7) extent, this last referring to the
number of individuals affected. Bentham conceded that
intensity was not measurable, but maintained that the
other outcomes might be calibrated. Pleasures and pains
were of various kinds, 13 of which Bentham, an inveterate
maker of lists that sought to embrace comprehensively all
possible ramifications of a subject, spelled out: (1) pleasures
and pains of sense, (2) pleasures of wealth, with the corresponding pains of deprivation, (3) pleasures of skill and
Critiques of Bentham
Bentham's appeal was to rational argument and to what James E. Crimmins calls "consequentialist calculation." His aim was to extend "the physical branch [of science] to the moral," but it must be stressed that he was well aware
that his measuring formula inevitably would provide an
inexact answer: his was a roadmap without any precise
Felicity Calculus
To measure the goodness or badness of an act,
Bentham introduces the pseudomathematical concept of
what he calls "felicity calculus." Bentham offers the following vignette to illustrate how a situation might be looked at:
[I]f having a crown in my pocket and not being thirsty,
I hesitate whether I should buy a bottle of claret with it for
my own drinking or lay it out in providing sustenance for
a family . . . it is plain that so long as I continued hesitating
the two pleasures of sensuality in the one case, of sympathy
in the other, were exactly . . . equal.
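Bentham never reduced the felicity calculus to a working formula, but its arithmetic skeleton can be sketched. In the hypothetical Python below, every score and scale is an invented illustration, not a value Bentham supplied; six of the seven considerations are scored per person, and extent enters as the number of persons affected.

```python
# Hypothetical felicity calculus: score an act on six of Bentham's
# considerations per affected person; "extent" is the number of
# persons. All numbers are invented for illustration.

CONSIDERATIONS = ("intensity", "duration", "certainty",
                  "propinquity", "fecundity", "purity")

def felicity(persons):
    """Sum pleasure-minus-pain scores over everyone affected."""
    return sum(sum(p[c] for c in CONSIDERATIONS) for p in persons)

# Buying the claret pleases one person somewhat...
claret = [{"intensity": 4, "duration": 2, "certainty": 9,
           "propinquity": 9, "fecundity": 1, "purity": 6}]
# ...while sustaining a family pleases four people more durably.
family = [{"intensity": 5, "duration": 6, "certainty": 8,
           "propinquity": 8, "fecundity": 5, "purity": 7}
          for _ in range(4)]

print(felicity(claret), felicity(family))  # 31 versus 156
```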
The problem of assigning satisfactory weights to different pleasures (any one of which will also vary for different
persons) was overwhelming: all Bentham really asked
was that responsible people ought to take the utilitarian
approach as a valuable tool for directing and evaluating
behaviors and public policies.
John Stuart Mill, a Bentham disciple with ambivalent
views about his mentor, put the matter particularly well:
He [Bentham] introduced into morals and politics those
habits of thought and modes of investigation, which are
essential to the idea of science; and the absence of which
made those departments of inquiry, as physics had been
before Bacon, a field of interminable discussion, leading
to no result. It was . . . his method that constituted the
novelty and the value of what he did, a value beyond all
price . . . (Mill, 1969/1838).
The issue of slavery offers a good example of
Benthamite thought. Having slaves might create a good
deal of happiness in those who possess them and a good
deal of wretchedness in those who are enslaved. But
rather than attempting to measure these consequences,
Bentham points out that he knows of nobody who would
voluntarily prefer to be a slave; therefore, it is apparent
that the unattractiveness of the status and the pain associated with it overwhelm any advantage slavery might
offer to those who benefit from it.
Bentham's felicity calculus, given his dogged analytical
attempts to pin down all possible consequences of an
action, usually is up to the task of differentiating what
most of us would consider the good from the bad,
or pleasure from pain. He offers, for instance,
a situation of an employee of a medical research center
The disagreement, Bentham observes, can readily be settled by obtaining a measuring stick, the accuracy of which
both parties concede. The same process, Bentham maintains, can be carried out in regard to disagreements about
happiness. But he never truly found an adequate measuring rod. At one point, in fact, in a not uncharacteristic
manner, Bentham observed that the single honk of a goose
some thousands of years earlier undoubtedly has influenced many aspects of contemporary life, and he sought to
enumerate some of these consequences. His was an indefatigable pursuit of a detailed inventory of the consequences of events and actions, so that all of us could be
better informed and thereby persuaded to make wiser
choices.
Conclusion
In a long life dedicated to interminable written analyses
and exhortations, Bentham attempted to translate his
premises regarding happiness into formulas that offered
an opportunity to determine pain and pleasure. He inevitably failed by a wide margin to approximate exactitude in
a mathematical sense. Nonetheless, relying on utilitarian
doctrine and formulas, Bentham had remarkable success
in bringing about a significant number of political reforms,
particularly in criminal law and in regard to the punishment of offenders. He typically did so by specifying with
striking perceptiveness and in excruciating detail the
consequences of the old ways and the advantages of his
recommendations for creating a fairer and a happier society. In addition, Bentham's utilitarianism, scoffed at and
caricatured for decades, later regained favor in the social
sciences, particularly in economics, and today forms the
intellectual skeleton that is being fleshed out by sophisticated analytical and empirical formulations.
Auto-Icon
Bentham's skeleton, stuffed with padding to fill it out, and
seated in his favorite chair and dressed in his own clothes,
is displayed in the South Cloister at University College,
University of London. His head did not preserve well, so
a wax replica sits atop the body (see Fig. 1). He requested
in his will that his body be so preserved, but only after it
furnished final utility for medical science by means of
a public dissection.
A Bibliographic Note
Bentham produced a torrent of written material, much of
it still unpublished. His method of work was highly disorganized. As Shackleton notes, "drafts succeeded drafts, abridgments and expansions followed each other, manuscripts were often dismantled and physically incorporated in others." Bentham's correspondence alone now occupies 11 large volumes that have been published as part of
a project to move all of his manuscript material into print.
Three more volumes are in process, with a possible fourth
to include material that has come to light since the project
began. Bentham characteristically enlisted aides to edit
his material and they often took considerable liberties
reorganizing and rewriting what they regarded as
a sometimes impenetrable thicket of words. The best
guide to the quality and reliability of Bentham's work is provided by Ross Harrison in Bentham (pages ix–xxiv).
The Bentham Project, housed at University College,
University of London, offers a website (available at http://
www.ucl.ac.uk) with up-to-date information on the
Further Reading
Ben-Dor, O. (2000). Constitutional Limits and the Public Sphere: A Critical Study of Bentham's Constitutionalism.
Hart, Oxford.
Bentham, J. (1838–1843). The Works of Jeremy Bentham
(J. Bowring, ed.). W. Tait, Edinburgh.
Bentham, J. (1996) [1789]. An Introduction to the Principles of
Morals and Legislation (J. H. Burns and H. L. A. Hart,
eds.). Clarendon, Oxford.
Crimmins, J. E. (1990). Secular Utilitarianism: Social Science
and the Critique of Religion in the Thought of Jeremy
Bentham. Clarendon, London.
Harrison, R. (1993). Bentham. Routledge & Kegan Paul,
London.
Kelly, P. J. (1990). Utilitarianism and Distributive Justice:
Jeremy Bentham and the Civil Law. Clarendon, Oxford.
Mack, M. (1963). Jeremy Bentham: An Odyssey of Ideas,
1748–1792. Columbia Univ. Press, New York.
Mill, J. S. (1969) [1838]. Bentham. In The Collected Works of
John Stuart Mill (J. M. Robson, ed.), Vol. X, pp. 75–115.
Univ. of Toronto Press, Toronto.
Parekh, B. (ed.) (1993). Jeremy Bentham: Critical Assessments.
Routledge, London.
Bernoulli, Jakob
Ivo Schneider
University of the German Armed Forces Munich, Munich, Germany
Glossary
Bernoulli's measure of probability A measure of probability derived from the transformation of Huygens's value
of expectation into a quantifiable concept. There are
two ways of determining this measure of probability:
(1) (a priori) for an equipossibility of the outcomes of
a finite number of mutually exclusive elementary events,
such as drawing a ball of a certain color out of an urn filled
with balls of different colors, by the ratio of the number of
cases favorable for the event to the total number of cases;
(2) (a posteriori) for events such as a person of known age dying after 5 years, by the relative frequency of the event observed in similar cases.
law of large numbers The theorem that the estimates of unknown probabilities of events based on observed relative frequencies of such events become the more reliable the more observations are made (also called Bernoulli's golden theorem). More precisely, Bernoulli proved that the relative frequency h_n of an event with probability p in n independent trials converges in probability to p, or that for any given small positive number ε and any given large natural number c, for sufficiently large n, the inequality
\[
P\bigl(|h_n - p| \le \varepsilon\bigr) > c \cdot P\bigl(|h_n - p| > \varepsilon\bigr)
\]
holds.
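A modern numerical illustration of the a posteriori route, a simulation sketch rather than Bernoulli's own demonstration: fix a probability p, run independent trials, and watch the relative frequency h_n settle near p.

```python
# Simulate independent Bernoulli trials and print the relative
# frequency h_n for growing n; p = 0.3 is an arbitrary choice.
import random

random.seed(1)
p = 0.3
for n in (100, 10_000, 1_000_000):
    h_n = sum(random.random() < p for _ in range(n)) / n
    print(n, h_n)  # h_n drifts toward 0.3 as n grows
```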
Jakob Bernoulli (1655–1705), one of the leading mathematicians of his time with important contributions to
infinitesimal calculus, is the father of a mathematical
theory of probability. His posthumously published Ars
Conjectandi influenced the development of probability
theory in the 18th century up to Laplace. The basic concept in Bernoulli's art of conjecturing became probability,
the classical measure of which he had derived from
a transformation of Huygens's value of expectation. The Ars Conjectandi was to be applied to what we now call the
social domain and what Bernoulli described as the domain
of civil, moral, and economic affairs. Bernoulli distinguishes two ways of determining, exactly or approximately, the classical measure of probability. The first,
called a priori, presupposes the equipossibility of the outcomes of certain elementary events and allows us to relate
the number of cases favorable for an event to all possible
cases. The second, called a posteriori, is for the determination of the probability of an event for which there are
no equipossible cases that we can count (e.g., mortality).
For these cases, we can inductively, by experiments, get
as close as we desire to the true measure of the probability
by estimating it by the relative frequency of the outcome
of this event in a series of supposedly independent
trials. This he justifies by his theorema aureum, which was
later called by Poisson "Bernoulli's law of large numbers."
1682) a short presentation of differential calculus in algorithmic form, and in 1686 he published some remarks
concerning the fundamental ideas of integral calculus.
These papers occupied the interest of Jakob Bernoulli
and his younger brother Johann (1667–1748). Jakob
tried to get further information from Leibniz in 1687,
but Leibniz could answer Bernoulli's questions only 3
years later because of a diplomatic mission he had to
undertake. At that time, Jakob and Johann had not only
mastered the Leibnizian calculus, but also had added so
considerably to it that Leibniz in a letter of 1694 remarked
that infinitesimal calculus owed as much to the Bernoulli
brothers as to himself. Jakob cultivated the theory of infinite series on the basis of preliminary work done by
Nikolaus Mercator, James Gregory, Newton, and Leibniz.
He published five dissertations on series between 1689 and
1704. He considered series to be the universal means to
integrate arbitrary functions, to square and rectify curves.
In 1690, Jakob had introduced the term "integral" in his
solution to the problem of determining the curve of
constant descent.
In the 1690s, the relationship between Jakob and Johann deteriorated and led to bitter quarrels. In 1687,
Jakob had become professor of mathematics at the University of Basel, in which position he remained until his
death in 1705. He was honored by the memberships of
the Academies of Sciences in Paris (1699) and in Berlin
He had a daughter and a son with Judith Stupan,
whom he had married in 1684.
asked him to send him his copy for a while. In his following
letters, Leibniz disappointed Bernoulli, claiming to be
unable to find the Waerdye or relevant papers of his
own. Instead he declared that in the area of jurisprudence
and politics, which seemed so important for Bernoullis
program, no such extended calculations were usually required because an enumeration of the relevant conditions
would suffice. In his last letter to Bernoulli, of April 1705,
Leibniz explained that the Waerdye contained nothing
Bernoulli could not find in Pascal's Triangle Arithmétique or in Huygens's De Ratiociniis in Ludo Aleae, namely to
take the arithmetical mean between equally uncertain
things as is done by farmers or revenue officers when
estimating the value of real estate or the average income.
Bernoulli, who died the following August, seemed to have
ignored Leibniz's hint about Pascal's Triangle Arithmétique, which was not mentioned in the Ars Conjectandi.
Astoundingly, Leibniz did not mention Edmund Halley's
work on human mortality in the Philosophical Transactions, which was based on data from the city of Breslau
that Leibniz had helped to make available to the Royal
Society. The only relevant work Bernoulli mentioned in
his manuscripts was John Graunt's Natural and Political
Observations from 1662, a German translation of which
appeared in 1702. However, Bernoulli took Leibniz's objections, especially those against the significance of his law
of large numbers, as representative of a critical reader and
tried to refute them in Part IV of the Ars Conjectandi.
Part II of the Ars Conjectandi contains, among other things, a formula for the sums of powers
\[
\sum_{v=1}^{n} v^{c},
\]
where c is a given natural number, in the form of a polynomial in n of degree c + 1 with leading coefficient \(\tfrac{1}{c+1}\), the coefficients of which were constructed with the help of certain constants, later called Bernoullian numbers. This formula played an important role in the demonstration of de Moivre's form of the central limit theorem.
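In modern notation (a reconstruction using the convention B_1 = +1/2), the sum-of-powers formula reads
\[
\sum_{v=1}^{n} v^{c} = \frac{1}{c+1}\sum_{j=0}^{c}\binom{c+1}{j} B_j\, n^{\,c+1-j},
\qquad B_0 = 1,\; B_1 = \tfrac{1}{2},\; B_2 = \tfrac{1}{6},\; B_3 = 0,\;\ldots,
\]
so that, for example, \(c = 2\) gives \(\sum_{v=1}^{n} v^2 = \tfrac{1}{3}n^3 + \tfrac{1}{2}n^2 + \tfrac{1}{6}n = n(n+1)(2n+1)/6\).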
Further Reading
Bernoulli, J. (1975). Die Werke von Jakob Bernoulli. Vol. 3
(Naturforschende Gesellschaft in Basel, ed.), Birkhäuser
Verlag, Basel.
Hald, A. (1990). A History of Probability and Statistics
and Their Applications Before 1750. John Wiley & Sons,
New York.
Schneider, I. (1984). The role of Leibniz and of Jakob
Bernoulli for the development of probability theory.
LLULL 7, 68–89.
Shafer, G. (1978). Non-additive probabilities in the work of
Bernoulli and Lambert. Arch. Hist. Exact Sci. 19, 309–370.
Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press,
Cambridge, MA.
Bertillon, Louis Adolphe

Glossary
arithmetic average A term Bertillon borrowed from Laplace
and Quetelet to describe an average calculated on the basis
of qualitatively different types of objects. A famous example
of an arithmetic average is the average height of houses on
a street. As this example suggests, arithmetic averages do
not give any information about the individual observations
that make up the population or about the population as
a whole.
demography The science of population statistics. In the 19th
century, this included the study of all those features of
a population that could be quantified, including measures
of population size and growth such as births, deaths, and
marriages, physiological measures such as height and
weight, environmental measures such as the type of housing
and climate, and social measures such as income and level
of education.
natural averages Averages taken on (relatively more) homogeneous populations, such as a racial or socioeconomic
group. In contrast to arithmetic averages, natural averages
were seen to provide valuable information about the
characteristics of the observed population.
special rates A term coined by Bertillon to refer to what are now called age-specific rates. They measure the likelihood of a particular event affecting a population of a particular age, generally between ages x and x + 1. In the case of mortality, the special rate for the population of age 20 would be the number of people between ages 20 and 21 dying in the course of a year divided by the number of people aged 20 alive at the beginning of the year (see the sketch following this glossary).
statistical laws A term used by 19th century statisticians to
refer to statistical regularities. By using the term "law," they
indicated their belief that this regularity pointed to the
existence of an underlying cause or force that influenced
all individuals in the observed population with the same
force, thus accounting for the presumed stability of the
phenomena.
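As a worked instance of the special-rate definition above (both counts are invented for illustration):

```python
# Hypothetical special (age-specific) mortality rate at age 20,
# following Bertillon's definition; the counts are invented.
deaths_age_20 = 142    # deaths between exact ages 20 and 21 during the year
alive_at_20 = 20_000   # persons aged 20 alive at the start of the year

special_rate = deaths_age_20 / alive_at_20
print(f"{special_rate:.4f}")  # 0.0071
```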
Bertillon as Professional
Methodologist
Public Hygiene and the Measure of
Well-Being
Louis Adolphe Bertillon was born in Paris on April 2,
1821. Little is known about his parents, except that his
mother died of cholera in 1832 and that his father subsequently placed him in a boarding school where he was
destined for a career in business. Against his father's
wishes, Bertillon went on to study medicine. He attended
the School of Medicine in Paris, where he enthusiastically
embraced the new experimental medicine, inserting
himself in scientific laboratories (generally restricted to
professors) and eventually gaining a position as an assistant (préparateur). While at the School of Medicine,
Bertillon also attended the public lectures of Jules
Michelet and became a devotee of the Republican historian. His Republicanism combined a strong belief in
liberty, especially freedom of expression, with socialist support of workers' associations and mutual aid societies and a strong anti-clericalism. Although Bertillon
opposed capitalism, he did not favor state intervention
or revolutionary action. Instead, he believed that society
had to be allowed to develop naturally according to its
internal logic.
Bertillon's political views developed in the years leading up to the 1848 Revolution. It was during one of his
many visits to political clubs that had sprung up all over
Paris that he first met Achille Guillard, his future father-in-law and cofounder of demography as a discipline.
These activities brought Bertillon to the attention of
the authorities and he was arrested and imprisoned
a number of times in the conservative backlash that followed the 1848 Revolution and the military coup of 1851.
Bertillon's early political activities also introduced him to
the medical researcher Paul Broca, with whom he later
founded the Anthropological Society.
Arithmetic versus Physiological Averages
In 1852, Bertillon completed his medical studies with
a doctoral thesis entitled De quelques éléments de l'hygiène dans leur rapport avec la durée de la vie. The thesis
began from the classical public hygiene problem of how to
measure well-being or, in this case, national prosperity.
The text involves a discussion of the most commonly used
measure, the average age of death. Bertillon groups the
The second half of the thesis reviewed different explanations for variations in longevity. This separation of the
identification of statistical regularities and their explanation was characteristic of Bertillon's work throughout his
life. His studies began with a documentation of statistical
phenomena, either a stable trend or a parallel in the development of two variables over time, and were followed
by a more speculative, qualitative discussion of different
types of explanation. Statistical data were sometimes used
to support or discount particular trends, but no attempt
was made to measure the stability of the observed trend
(degree of certainty) or to discriminate between different
causes. In the case of his thesis, Bertillon considered
various classical public hygiene explanations, including
the influence of climate, temperature, latitude, urban living, race, and profession on longevity. He concluded by
supporting the well-known public hygienist, Louis René Villermé, in his then-novel arguments concerning the (nonmathematical) association of poverty with above-average death rates.
Mortality Differentials
Bertillon's political activism brought him under the suspicion of the authorities in the early years of the Second
Republic. To avoid arrest, Bertillon moved his family to
Montmorency, a town outside of Paris, where he practiced
medicine. It was during this period that Dr. Malgaigne,
a member of the Academy of Medicine and Faculty of
Medicine where Bertillon had studied, asked Bertillon to
participate in an ongoing debate on the efficacy of the
vaccine. Bertillon intervened in the discussion as a statistical expert. The invitation led him to a study of mortality
differentials in France by age and region and a first major
publication entitled Conclusions statistiques contre les
détracteurs de la vaccine: essai sur la méthode statistique appliquée à l'étude de l'homme (1857). The book was
basically a methodological treatise on the construction of
mortality rates in terms of the population at risk and the
effect of age structure on mortality statistics, illustrated
with a systematic study of mortality trends in France in the
18th and 19th centuries. In contrast to public officials who
claimed that the increase in the absolute number of deaths
among 20- to 30-year-olds pointed to a worsening in the
health of the population, Bertillon argued the reverse. His
evaluation rested on an increase in the absolute size of
the cohort owing to a decline in infant mortality. The
memoir received a number of prizes from the Academies
of Sciences and Medicine.
This systematic study led Bertillon to the identification
of an unusually high rate of infant mortality in the departments surrounding Paris. His study was the first to signal
the effects of the practice of wet-nursing on infant mortality. Bertillon first submitted a report on the problem to
the Academy of Medicine in 1858 in a paper entitled
"Mortalité des nouveau-nés," but the problem was
[Figure 1. Camel-shaped distribution: curves of the probability of each height (taille) among conscripts of the Doubs and of the distribution of each height in France.]
similar to the type of common-sense categories that people make every day to negotiate the complexities of social
life. Within the category of subjective averages, Bertillon
distinguishes between typical averages, where the individual observations were all variations of a single type or
race, and indexical averages, where they rested on purely fortuitous and factitious agglomerations, such as the average height of houses on a street (Herschel's example)
or the average height of the conscripts in the Doubs.
The challenge for statisticians was to distinguish between typical and indexical averages, the first being of
scientific value, the latter not. The value of the error
curve (or "curve of probability," as Bertillon referred to
it) was that it provided a means to empirically distinguish
between these types of averages. If the different values
displayed a symmetrical distribution about the mean according to the binomial distribution, then the average was
either objective or a typical subjective average; if not, then
it was an indexical subjective average and of dubious scientific value. Bertillon's 1876 discussion suggests that,
although he himself did not use the error curve to
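In modern terms, the empirical test can be sketched as a simulation: draw heights from one homogeneous population and from a mixture of two types, and compare how the observations cluster about the mean. The populations, parameters, and the crude symmetry check below are invented illustrations, not Bertillon's computation.

```python
# Contrast a typical average (one homogeneous group) with an
# indexical average (a camel-shaped mixture of two groups), in the
# spirit of Bertillon's error-curve test; all parameters are invented.
import random
import statistics

random.seed(0)
homogeneous = [random.gauss(1650, 60) for _ in range(10_000)]    # one type
mixture = ([random.gauss(1620, 40) for _ in range(5_000)] +      # short type
           [random.gauss(1720, 40) for _ in range(5_000)])       # tall type

for name, sample in (("homogeneous", homogeneous), ("mixture", mixture)):
    mean = statistics.fmean(sample)
    share_near_mean = sum(abs(x - mean) < 30 for x in sample) / len(sample)
    # A single error curve piles observations up around its mean;
    # a camel-shaped mixture leaves a trough there instead.
    print(name, round(share_near_mean, 3))  # about 0.38 versus 0.29
```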
Further Reading
Armatte, M. (1991). Une discipline dans tous ses états: La statistique à travers ses traités (1800–1914). Rev. Synth. 2, 161–205.
Bertillon, A., Bertillon, J., and Bertillon, G. (1883). La Vie et
les Oeuvres du Docteur L.-A. Bertillon. G. Masson, Paris.
Brian, E. (1991). Les moyennes à la Société de Statistique de Paris (1874–1885). In Moyenne, Milieu, Centre: Histoires et Usages (J. Feldman, G. Lagneau, and B. Matalon, eds.), pp. 107–134. Éditions de l'École des Hautes Études en Sciences Sociales, Paris.
Cole, J. (2000). The Power of Large Numbers: Population and
Politics in Nineteenth-Century France. Cornell University
Press, Ithaca, NY.
Coleman, W. (1982). Death Is a Social Disease: Public Health
and Political Economy in Early Industrial France. University of Wisconsin Press, Madison, WI.
Dupâquier, M. (1983). La famille Bertillon et la naissance d'une nouvelle science sociale: La démographie. In Annales de Démographie Historique: Études, Chronique, Bibliographie, Documents, pp. 293–311.
Dupâquier, J., and Dupâquier, M. (1985). Histoire de la Démographie: La Statistique de la Population des Origines à 1914. Librairie Académique Perrin, Paris.
Eyler, J. M. (1979). Victorian Social Medicine: The Ideas and
Methods of William Farr. Johns Hopkins University Press,
Baltimore, MD.
Lécuyer, B. P. (1987). Probability in vital and social statistics: Quetelet, Farr, and the Bertillons. In The Probabilistic Revolution (L. Krüger, ed.), pp. 317–335. MIT Press, Cambridge, MA.
Binet, Alfred
Dolph Kohnstamm
Emeritus, Leiden University, Leiden, The Netherlands
Glossary
cephalometry Methods and scientific motives to measure the
size and form of the head of individuals in order to find
characteristics of specific groups of human beings, e.g., of
gifted children, early forms of mankind (in evolution),
criminals, and people belonging to different races. Discredited because of Nazi theories and practice. In modern
times, replaced by measures of brain volume using
magnetic resonance imaging scans.
conservation Jean Piaget's term for the ability of the child to
recognize that certain properties of objects (e.g., mass,
volume, and number) do not change despite transformations in the spatial appearance of the objects.
mental age Stage and level in cognitive (mental) development that are typical for children of a given age, as
determined in large and representative samples of children
in a given population.
pedagogy From the Greek words pais (boy; child) and agō
(to lead), thus the art and science of educating children.
The concept currently comprises more than school education alone. It also encompasses the guidance parents give to
their children, in all respects, not only their cognitive
development.
suggestibility A personality trait indicating the degree to
which a child or adult is susceptible to suggestions made
by other children or adults.
a self-study of psychology at the national library, the famous Bibliothèque Nationale, soon finding his vocation in
that discipline, so new that it was not yet a field of study at
the university. Binet's first enthusiasm was stirred by the
then dominant current in cognitive psychology, associationism. Only 1 year later, in 1880, his first paper, on the
fusion of resemblant sensations, was published in the
Revue Philosophique.
Early Career
In 1882, a former schoolmate of Binet, the neurologist
Joseph Babinski, introduced him to the psychiatrists of the
famous hospital, La Salpêtrière, and Binet began work in
the clinic of Professor Charcot, where hysterical patients
were treated with hypnosis. Only 3 years later, young
Freud came from Vienna to study Charcots methods
in this same clinic. Binet published about a dozen papers
on this subject over the next 7 years, most of them in the
Revue Philosophique. At the same time, he continued his
work in cognitive psychology and published his first book,
La Psychologie du Raisonnement, in 1886, based strictly
on the premises of association psychology.
With John Stuart Mill as his hero, the young Binet had to mitigate his extremely environmentalist beliefs under the influence of his father-in-law, E. G. Balbiani, a professor of embryology at the Collège de France. Binet adapted his father-in-law's lectures on heredity for publication.
All this time, Binet had no paid appointment. He must
have had sufficient access to the family capital, being the
only child of a wealthy father. This allowed him excursions
into another new field, zoology. Studying and dissecting
insects in a laboratory led him to write a doctoral thesis,
entitled A Contribution to the Study of the Subintestinal
Nervous System of Insects, defended in 1894. Meanwhile, he had made yet another move by asking for
and obtaining a nonpaid appointment at the Laboratory
of Physiological Psychology at the Sorbonne. Being accepted as a staff member by its director, Henri Beaunis, in
1891, he was to succeed him 3 years later, when Beaunis
retired. In that same year, which also brought him his
doctor's title, he and Beaunis founded the first psychological journal in France, l'Année Psychologique, of
which Binet was to remain the editor until his death,
17 years later. But that was not all. Two of his books
went to print in that fruitful year, an introduction to experimental psychology and a study on the psychology of
chess players and master calculators. Also, his first article
on experimental child development appeared, in collaboration with his co-worker Victor Henri, foretelling what
was to become the focus of his studies in child intelligence
and education.
With this solid and impressive academic record, it is
amazing that Binet was never offered a chair in the French
universities, notwithstanding his international recognition, even long before his scales of mental tests brought
him international fame. A course of 12 lectures he gave at
the University of Bucharest, in 1895, progressed so well
that the number of students, professors, and interested
citizens increased to the point that the course had to
be continued in a larger audience hall. Also, a year before,
he had been appointed member of the editorial board
of the new Psychological Review, at a time when only
American psychologists were appointed. One can
imagine Binet's great disappointment about not being
given a chair in Paris.
A Father Becomes
a Child Psychologist
In 1884, Binet had married Laure Balbiani, daughter of an
embryologist, and in 1885 and 1887, two children were
born, Madeleine and Alice. The father began to observe
the behavior of his little daughters, just as Piaget would do
some 20 years later. Binet was struck by the differences in
behavior patterns the two girls showed, especially their
different styles of voluntary attention. Madeleine always
concentrated firmly on whatever she was doing, whereas
Alice was impulsive. He took notes and began writing
reports on his observations, using the pseudonyms
Marguerite for Madeleine and Armande for Alice. He
published three of these reports in the Revue
Philosophique, when his eldest daughter was 5 years
old. The following quotation from the first report
(1890) described motoric behavior: "When M. was learning to walk she did not leave one support until she had discovered another near at hand to which she could direct herself . . . while A., in contrast, progressed into empty space without any attention to the consequences (of falling)." On their temperamental differences, Binet noted that M. was "silent, cool, concentrated," while A. was "a laugher, gay, thoughtless, frivolous, and turbulent . . . . Now, [at their present age] the psychological differences . . . have not disappeared. On the contrary, they have disclosed a very clear character to their whole mental development."
In child psychology of the past century, long after
Binet's time, people have tried to explain such individual
differences between two sisters or brothers as resulting
from their difference in birth order. The characteristic
differences between Binet's two daughters echo the typical differences found in some of those studies between
first- and second-born children. But this research has
practically come to a halt because wide-scale confirmation
has failed to materialize. In Binet's day, this was not yet a hypothesis known to be reckoned with, nor had
Freudian psychoanalytic thinking colonized the minds of
Assessment of Individual
Differences in Cognitive
Functioning
Binet tried many cognitive tasks on his two daughters.
Several tests of memory were devised, from remembering
colors seen to recalling sentences spoken. Other tasks
included interpreting simulated emotions in expressions
(using Darwin's original pictures) and asking for word definitions ("What is it? Tell me what it is?"). With answers
by his daughters such as "a snail is to step on" or "a dog bites," Binet was amazed to see how utilitarian
children are. He made discoveries that nowadays are textbook knowledge in child development. He touched on
these insights in his articles, but he did not bring them
into a systematic framework, e.g., by writing a book on
language development or the development of memory. In
this respect, he differed from other child psychologists
who came after him. For example, Binet touched on the
problems young children have in distinguishing between
what looks like more and what is more. He arranged
beads of different colors or sizes on the table and asked his
daughters to make judgments on quantity, much as Piaget
would do two decades later with his children. But Piaget,
who must have known Binet's publications, did not
stop there, and devised many different tasks on number
conservation, describing meticulously how children of
different ages reacted to these tasks, embedding the
Assessment of Intelligence in
Groups of Children and Mentally
Retarded Adults
Not in a position to offer students grades and diplomas,
nor to offer co-workers a salary, Binet was totally dependent on the enthusiasm of one or two volunteer assistants.
Therefore, Theodore Simon was just the person he
needed. Simon worked in a colony for retarded children
and adolescents, all boys, and had permission to use them
as subjects in tests. Binet first tested Simon, for his competence, persistence, and good faith, as he used to do with
prospective co-workers. Luckily, Simon passed the high
standards Binet had in mind for his assistants. A very
fruitful period followed. Both the differences in age
and the differences in mental capacities in the population
of retarded children and adolescents yielded a mass of
differential data on the many cognitive tasks Binet and
Simon invented for them. Together with his co-worker
Victor Henri, Binet joined other European investigators
in studying mental fatigue in school children. The introduction of compulsory education for all children, including those from the working classes, caused concern about
their endurance during long days of mental exertion. With
another co-worker, Nicholas Vaschide, Binet set out to
measure physical, physiological, and anatomical characteristics of the boys, to study individual differences in
physique and physical force. In order to find meaningful
relations between the many different measures, he
invented a crude measure of correlation, using rank
differences. It is fair to note that this research was met
with devastating criticism, also in the Psychological Review, because of its many flaws in computations and the
doubtful reliability of the measures used. A few years
later, Binet and Simon made extensive studies of the relation between certain measures of intelligence and head
size. Cephalometry was very common in those days, in
psychology as well as in criminology. Both men hoped to
find significant correlations between head size and intelligence. But after many years of data collection and publications, among which was Simon's doctoral thesis on
mentally retarded boys, they gave up, because in the normal ranges of intelligence the correlations were too small.
It is only now, with the more precise magnetic resonance
imaging scans of brain volume, that some psychologists
have returned to this old hypothesis. For Binet and his co-workers, it was because of their interest in the distinction
Although Binet's first concern was with retarded children, deaf mutes, and the visually handicapped, he was probably the first to ask for special attention for the gifted child, suggesting the organization of special classes for the above-average. He argued that "it is through the elite, and not through the efforts of the average, that humanity invents and makes progress." Though many of those who
later promoted the Binet–Simon scales, such as Goddard
and Terman in the United States, were ardent hereditarians, Binet held a more balanced view. As Siegler has
documented in his fine biographical article, Binet strongly
believed in the potential of education for increasing
intelligence.
names, these scholars included Alice Descoeudres in Geneva, Otto Bobertag in Breslau, and Ovide Decroly in
Brussels. It was William Stern, Bobertag's colleague in
Breslau, who in 1911 coined the concept "mental age" and
proposed to compute an intelligence quotient (IQ) by
comparing mental age and chronological age. The IQ
was thus born, albeit still without the multiplication by
100. Thus, a child in those days could obtain an IQ of 0.75!
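In the form described here, before Terman's multiplier, Stern's quotient is simply
\[
\mathrm{IQ} = \frac{\text{mental age}}{\text{chronological age}},
\]
so a 12-year-old performing at the level of a typical 9-year-old obtained 9/12 = 0.75, which the later factor of 100 would report as an IQ of 75.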
Nowhere was the reception of the 1908 revision of the
scales to measure intelligence so enthusiastic as in the
United States. In that same year, Henry H. Goddard of
the Training School at Vineland, New Jersey, introduced
to the new scales by Ovide Decroly, took them back to
the States, where he published his adaptation in 1911. He
was so enthusiastic about the discovery he made in old
Europe that he compared the importance of the scales
with Darwin's theory of evolution and Mendel's law of
inheritance. The most widely known American revision
became the one developed by Lewis Terman, published
in 1916 and standardized on about 2000 children. This
test, known as the Stanford–Binet, later revised by
Terman and Merrill in 1937, used the factor 100 as
a multiplier. Terman's standardization brought the normal distribution and standard deviation into Binet's
invention.
Binet's biographer, Theta H. Wolf, quoted John E. Anderson when she wrote that "it is impossible, unless we lived through the period, to recapture the enthusiasm, discussion, and controversy that the Binet tests started. Clearly [Binet's work] substituted for the onus of moral blame, a measurable phenomenon within the child's resources that [was] to be studied henceforth in its own right." Wolf added that these tests made it possible to
show that measurable differences in mental levels, rather
than voluntary and thus punishable moral weakness,
could be responsible for children's [below-standard]
school achievement. Not everyone in the United States
was charmed by the triumphal progress of the intelligence
scales. Kuhlman was among the first to criticize the scales
for their flaws. William Stern reported having read in American publications about "binetists" who tried to "binetize" children and adults. Especially in France, the Binet–Simon scales were met with reservation and disregard. Binet had no easy character and had
been frank with many of his academic colleagues in giving
his opinion on their work. After Binet's death in 1911,
Simon became the director of their now world-famous
laboratory. When at the end of the First World War
the young biologist Jean Piaget, aged 22, came to Paris
to study at Bleuler's psychiatric clinic and at the
Sorbonne, Simon offered him the opportunity to work
in the laboratory. Simon asked Piaget to standardize
Cyril Burt's reasoning test on Parisian children. Although
Piaget undertook this project without much enthusiasm,
his interest grew when he began the actual testing.
Acknowledgments
The author is indebted to an anonymous reviewer of this
article for his critical notes and helpful suggestions.
Further Reading
Note: An exhaustive bibliography of Alfred Binet can be found
at the website of Nancy University (www.univ-nancy2.fr).
Anderson, J. E. (1956). Child development, an historical
perspective. Child Dev. 27, 181–196.
Andrieu, B. (2001). Alfred Binet (1857–1911), sa vie, son oeuvre. In Oeuvres Complètes d'Alfred Binet, Éditions Euredit, St. Pierre du Mont, Landes, France.
Bertrand, F. L. (1930). Alfred Binet et son Oeuvre. Librairie
Félix Alcan, Paris.
Binet, A., and Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. Ann. Psycholog. 11, 191–244.
Franz, S. I. (1898). Review of l'Année Psychologique, 1898, Vol. 4, specifically the series of articles by A. Binet and N. Vaschide. Psychol. Rev. 5, 665–667.
Kuhlman, F. (1912). The present status of the Binet and Simon
tests of intelligence of children. J. Psycho-Asthenics 16,
113–139.
Siegler, R. S. (1992). The other Alfred Binet. Dev. Psychol. 28,
179–190.
Stern, W. (1914, 1928). Psychologie der frühen Kindheit.
Quelle & Meyer, Leipzig.
Stern, W. (1920, 1928). Die Intelligenz der Kinder und
Jugendlichen. Barth, Leipzig.
Wolf, T. H. (1973). Alfred Binet. University of Chicago Press,
Chicago.
Biomedicine
Giora Kaplan
The Gertner Institute, Israel
Glossary
biomedical model Based on the dominant natural science
paradigm that guides the research and practice activities
of the medical enterprise.
coping with disease and treatment Viewed as either a stable
coping style, reflecting a personality trait, or as a flexible,
context-specific behavioral process in which an individual
appraises and copes with illness.
health, disease, illness, sickness Health is as much a social
and cultural as a biological issue; health standards change
over time and across cultures. Disease refers to the medical
practitioner's perspective as summarized in an official
diagnosis. Illness is a subjective experience referring to
how sick persons perceive, live with, and respond to
symptoms and disability. Sickness underlines the role of the
person who is ill and refers to the social aspects of illness
and its consequences for wider social networks and
macrosystems.
health-related locus of control A personality construct
referring to an individual's perception of the locus of events as determined internally, i.e., the belief that his/her outcome is directly the result of his/her own behavior.
This contrasts with the perception that events are determined by external
circumstances, the control of powerful others (doctors), or
the vagaries of fate, luck, or chance.
health-related quality of life Relates both to the adequacy
of material circumstances and to peoples feelings about
these circumstances, including feelings about how actual
circumstances compare with the ideal. Quality of life is seen
today as an ultimate outcome of health care and so is used
for assessing the consequences of specific diseases, for
characterizing the needs of those living with specific chronic
conditions, or as an additional result of the evaluation of
specific treatments.
holistic care Relating to or concerned with complete systems
rather than with parts. Treatment focuses on both the mind and
the body, expanding to include the many personal, familial,
social, and environmental factors that promote health,
prevent illness, and encourage healing.
medical enterprise, methods, specific means of measurement, and approaches that have already been incorporated and are almost integral to medical discourse.
Historical Perspective
In the conceptual relationship between social science and
medicine, four important periods can be identified: the
Greek and Roman holistic approach to health and disease,
the Enlightenment legacy and development of the biomedical model, the development and consequences of the
epidemiological approach, and the postmodern reactions
to biomedical orthodoxy.
converging with a manifest disjunction of mind and matter, was a novel conception of the universe stemming from
Newton's Mathematical Principles of Natural Philosophy.
Newtonian mechanics sanctioned the idea of organized
complexity, like that found in biological systems, as, in
principle, reducible to the interaction of its physical
parts. The influential ideas of this physical paradigm,
which begot the Age of Enlightenment, had a decisive
influence on the life sciences and a revolutionary impact
on medical thinking. The body came to be viewed as
a machine, independent of psychological and environmental factors. As such, disease, like malfunctioning in
a machine, suggested the need for repair. Two defining
characteristics of the Enlightenment period have had an
ongoing influence to the present day, particularly on the
medical sciences. These are faith in science and a belief in
the human ability to exercise control over that which is
understood scientifically. Both notions derive from the
idea of progress. Embracing the natural science paradigm
provided the foundation for the rise of biomedicine. By
the mid-19th century, medicine was infused by a physicalist approach. Most notable during this period was the
recognition that certain organic entities (e.g., bacteria)
caused certain diseases, and their pathogenic effect
could be avoided or reversed by certain substances
(e.g., antitoxins and vaccines). The concepts that are at
the foundation of the biomedical model surface as arguments concerning what is medically relevant. Most members of the medical profession delineate their field and
their responsibility by those elements of disease that can
be explained in the language of physiology, biology, and,
ultimately, biochemistry and physics. They prefer not to
concern themselves with psychosocial issues that lie outside medicine's responsibilities and authority. By the mid-20th century, the underlying premises of medicine were
that the body can be separately considered from the mind
(dualism) and that understanding the body can proceed
from knowledge of its parts and how they interrelate (reductionism).
success in dominating the health care field is so remarkable that its paradigm has attained virtually unique
legitimacy.
In sum, medicine can be seen as a specific culture with
its peculiar and distinctive perspective, concepts, rituals,
and rhetoric. Medicine provides a context in which public
consciousness about health and illness and about the role
of medicine in society is generated.
it takes account of the behavior of others and is thereby
oriented in its course.
[Wallace and Wolf, 1991: 238]
[Figure: Disease → Impairment → Disability → Handicap.]
circumstances, by which control is in the hands of powerful others (doctors), or outcomes are due to the vagaries
of fate, luck, or chance. Some research suggests that what
underlies the internal locus of control is the concept of
self as agent. This means that our thoughts control our
actions, and that when we apply this executive function of
thinking, we can positively affect our beliefs, motivation,
and performance. We can control our own destinies and
we are more effective in influencing our environments
in a wide variety of situations.
A health-related Locus of Control Inventory developed in the mid-1970s was derived from social learning
theory. This tool was a one-dimensional scale containing
a series of statements of people's beliefs that their health
was or was not determined by their own behavior. A further development was the three 8-item Likert-type
internal, powerful others, chance (IPC) scales, which
predicted that the construct could be better understood
by studying fate and chance expectations separately from
external control by powerful others. The locus of control and IPC approaches were combined to develop the
Multidimensional Health Locus of Control (MHLC)
Scale. The MHLC Scale consists of three 6-item scales
also using the Likert format. This tool is used to measure
quality of life in patients with diseases or disabilities such
as breast cancer, irritable bowel syndrome, chronic leg
ulcer, and traumatic spinal cord injury. The second aspect
is medical outcomes as assessment for quality of treatments (for example, for cervicogenic headache, after
cardiac surgery; treatment outcome in subgroups of uncooperative child dental patients; outcomes of parent-child interaction therapy). The last aspect is efficiency
of health services or planning a new service. Examples
of application include prediagnostic decision-making
styles among Australian women, relating to treatment
choices for early breast cancer, intention to breastfeed,
and other important health-related behaviors and beliefs
during pregnancy; predicting the ability of lower limb
amputees to learn to use a prosthesis; and planning
a program of awareness in early-stage Alzheimer's
disease.
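A minimal sketch in Python may make the scoring of such multidimensional instruments concrete. The 18-item, 6-point format matches the MHLC description above, but the item-to-subscale assignments below are illustrative assumptions, not the published form.

LIKERT_MIN, LIKERT_MAX = 1, 6  # assumed 6-point agreement scale

# Hypothetical assignment of 18 items to the three 6-item subscales
SUBSCALES = {
    "internal":        [0, 3, 6, 9, 12, 15],
    "powerful_others": [1, 4, 7, 10, 13, 16],
    "chance":          [2, 5, 8, 11, 14, 17],
}

def score_mhlc(responses):
    """Sum the six item responses of each subscale (range 6 to 36)."""
    if len(responses) != 18:
        raise ValueError("expected 18 item responses")
    if not all(LIKERT_MIN <= r <= LIKERT_MAX for r in responses):
        raise ValueError("response outside Likert range")
    return {name: sum(responses[i] for i in items)
            for name, items in SUBSCALES.items()}

print(score_mhlc([4, 2, 1, 5, 3, 2, 6, 2, 1, 5, 3, 2, 4, 2, 1, 5, 3, 2]))
# {'internal': 29, 'powerful_others': 15, 'chance': 9}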
Sense of Coherence Sense of coherence (SOC) is
a global construct expressing the degree to which
a person has a pervasive and dynamic, but lasting, feeling
that the internal and external stimuli and stressors in his/
her environment are (a) comprehensible, i.e., predictable,
structured, and explicable, (b) manageable, i.e., there are
resources available to meet the demands of these stimuli,
and (c) meaningful, i.e., the demands are challenges worthy of engagement and coping. It has been proposed that
a strong SOC would help to mediate and ameliorate
stresses by influencing coping efforts. This concept is
the basis of the salutogenetic model, i.e., explaining
how people cope with stressors such as illness and how they stay well.
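As with the locus of control example above, the mechanics of scoring a summed Likert instrument of the SOC type can be sketched briefly; the 13-item short form and 7-point items reflect common practice, but the reverse-keyed positions below are hypothetical, chosen only to show the method.

SCALE_MAX = 7
REVERSED = {0, 1, 2, 6, 12}  # hypothetical reverse-keyed item positions

def score_soc(responses):
    """Return a total score for 13 items (possible range 13 to 91)."""
    if len(responses) != 13:
        raise ValueError("short form assumed to have 13 items")
    total = 0
    for i, r in enumerate(responses):
        if not 1 <= r <= SCALE_MAX:
            raise ValueError("response outside scale range")
        total += (SCALE_MAX + 1 - r) if i in REVERSED else r
    return total

print(score_soc([4] * 13))  # all midpoint answers -> 52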
Psychosocial Professional
Partnerships in Different Areas of
the Medical System
Today's complex health care system employs psychosocial
professionals as members of the caregiver staff; as members of planning, priority-setting, or evaluation teams; as
partners in research projects; and as consultants in many
areas.
Program/Project Evaluation
Because the delivery of health care is a social intervention,
many of the approaches and methodologies used are adaptations of techniques developed in other social areas.
Program evaluation is a collection of methods, skills, and
sensitivities necessary to determine whether a human service is needed and is likely to be used, whether it is sufficiently intense to meet all the unmet needs identified,
whether the service is offered as planned, and whether the
human service actually does help people in need at
a reasonable cost and without undesirable side effects.
Through these activities, evaluators seek to help improve
programs, utilizing concepts from psychology, sociology,
administrative and policy sciences, economics, and
education.
Intervention programs, such as education, behavioral training, individual psychotherapy, and group interventions, are developed and operated by health professionals to improve the health/
illness outcomes. Social workers are an integral part of
the hospital care team. Considering the social consequences of the patient's illness, they have acquired
a major role in discharge planning and in helping the
patient to cope better with the illness situation (e.g., utilizing community services, special devices, legal rights,
and home care). However, there is still a need for
more theoretical and methodological development of
the concept of coping and its extensive application in
the medical field.
Professional/Patient Communication
and Relationships
Clinical work is a practice of responding to the experience
of illness, and, as such, its context is a relational encounter
between persons. Though the general parameters of the
doctor-patient relationship have been constant throughout history, important elements vary in time and place:
patient characteristics, the modes of organizing practice,
and the social and economic context within which the
relationship exists. With the breakdown of the paternalistic model of medical care, there is growing awareness of
the importance of effective communication between the
caregiver and the patient and his/her family. Obtaining
the patient's compliance with medical treatment, or with
instructions and recommendations given to him/her by
the medical staff, has become much more difficult. The
expansion of consumerism into health care, in which patients are treated as clients, has also had a strong influence.
Efforts are being made to develop appropriate techniques
among medical personnel, and in some cases new mediator roles have been created. There is a growing interest in
better understanding patient perceptions and in techniques to manipulate patient behaviors.
Research
Psychosocial variables are frequent in medical research;
they take the form of independent variables in the etiology
of disease or in the use of health services, of dependent
variables in evaluating treatments and consequences of
disease, and of intervening variables in understanding the
development of disease or in the impact of treatment.
Many difficulties are encountered when the premises
of biomedicine attempt to explain psychosocial factors
in disease causation. These premises are, in principle,
if not always in practice, reductionist and dualist. The
significance of these difficulties is heightened as psychosocial factors increasingly emerge as factors in today's
disease burden (afflictions of civilization). However, social scientists are much more frequently becoming partners in health research, rising to the great challenge of
influencing medical knowledge.
On the other hand, social scientists are accustomed to referring exclusively to their professional colleagues, with no genuine
attempt to reach other audiences. Both sides need to
change their attitudes in order to improve the dialogue,
not just as individuals but as a true interdisciplinary operation.
Theory-Grounded Research
The development of social science as an occupation, especially in American society, carries with it some distancing from theory and submersion in survey opinion polls
and experiments with strong emphasis on strict and polished techniques and methodologies. In many cases, this
aspect of the social scientist's work focuses only on providing very specific data commissioned by employers or
clients, with almost no significance for the broader theoretical perspective. Statistical data, even social measurements, will not become sociology or economics or social
psychology until they are interpreted and slotted into
a theoretical reference framework. Practical problems
have to be translated into clear conceptual definitions,
and hypotheses have to be derived strictly from theories.
Further Reading
Anthony, R. N., and Young, D. W. (1988). Management
Control in Nonprofit Organizations. Richard Irwin,
Homewood, IL.
Antonovsky, A. (1987). Unraveling the Mystery of Health.
Jossey-Bass, San Francisco, CA.
Antonovsky, A. (1993). The structure and properties of the
sense of coherence scale. Social Sci. Med. 36(6), 725-733.
Armstrong, D. (2003). Social theorizing about health and
illness. In The Handbook of Social Studies in Health
& Medicine (G. L. Albrecht, R. Fitzpatrick, and
S. C. Scrimshaw, eds.). Sage Publ., London.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., and
Erbaugh, J. (1961). An inventory measuring depression.
Arch. Gen. Psychiatr. 4, 551-571.
Berger, P. L. (1967). Invitation to Sociology: A Humanistic
Perspective. Penguin Books, Harmondsworth, UK.
Carver, C. S., Scheier, M. F., and Weintraub, J. K. (1989).
Assessing coping strategies: A theoretically based approach.
J. Personal. Social Psychol. 56, 267-283.
Coser, L. A. (1971). Masters of Sociological Thought: Ideas in
Historical and Social Context. Harcourt Brace Jovanovich,
New York.
Culyer, A., and Newhouse, J. (eds.) (2000). Handbook of
Health Economics. Elsevier, Amsterdam.
Drummond, M. F., O'Brien, B., Stoddart, G. L., and Torrance, G. W. (1997). Methods for the Economic
Evaluation of Health Care Programs, 2nd Ed. Oxford
Medical Publ., Oxford.
Foss, L., and Rothenberg, K. (1988). The Second Medical
Revolution: From Biomedicine to Infomedicine. New
Science Library, Boston, MA.
Foucault, M. (1994). The Birth of the Clinic: An Archaeology
of Medical Perception. Vintage Books, New York.
Fowler, F. J., Jr. (1995). Improving Survey Questions: Design
and Evaluation. Sage Publ., Thousand Oaks, CA.
Freidson, E. (1988). Profession of Medicine: A Study of the
Sociology of Applied Knowledge. The University of Chicago
Press, Chicago.
Gordis, L. (1996). Epidemiology. W. B. Saunders,
Philadelphia, PA.
Hippler, H. J., Schwartz, N., and Sudman, S. (eds.) (1987).
Social Information Processing and Survey Methodology.
Springer-Verlag, New York.
Holland, J. (ed.) (1998). Psycho-Oncology. Oxford University
Press, New York.
Jobe, J. B., and Mingay, D. J. (1991). Cognitive and survey
measurements: History and overview. Appl. Cogn. Psychol.
5, 175-192.
Kabacoff, R. I., Miller, I. W., Bishop, D. S., Epstein, N. B., and
Keitner, G. I. (1990). A psychometric study of the McMaster
Family Assessment Device in psychiatric, medical, and nonclinical samples. J. Family Psychol. 3, 431-439.
Kaplan, G., and Baron-Epel, O. (2003). What lies behind the
subjective evaluation of health status? Social Sci. Med. 56,
1669-1676.
McDowell, I., and Newell, C. (1987). Measuring Health:
A Guide to Rating Scales and Questionnaires. Oxford
University Press, New York.
McHorney, C. A., Ware, J. E., and Raczek, A. E. (1993). The
MOS 36-Item Short-Form Health Survey (SF-36): II.
Psychometric and clinical tests of validity in measuring
physical and mental health constructs. Med. Care 31(3),
247-263.
Merton, R., and Barber, E. (1963). Sociological ambivalence.
In Sociological Theory, Values and Sociocultural Change
(E. A. Tiryakian, ed.). Free Press, New York.
Nichols, D. S. (2001). Essentials of MMPI-2 Assessment. Wiley
Publ., New York.
Parsons, T. (1951). Social structure and dynamic process: The
case of modern medical practice. In The Social System,
pp. 428-479. The Free Press, New York.
Plough, A. L. (1986). Borrowed Time: Artificial Organs and the
Politics of Extending Lives. Temple University Press,
Philadelphia, PA.
Radloff, L. S. (1977). The CES-D scale: A self-report
depression scale for research in the general population.
Appl. Psychol. Measure. 1, 385-401.
Saks, M. (2003). Bringing together the orthodox and alternative in health care. Complement. Therap. Med. 11,
142-145.
Built Environment
Karen A. Franck
New Jersey Institute of Technology, Newark, New Jersey, USA
Glossary
architectural programming The information-gathering
phase that precedes architectural or interior design
decisions; may include behavioral research to determine
organizational goals and user needs.
behavior mapping An observational method for recording
details of ongoing activities in specific locations.
behavior setting A concept referring to the regularly
occurring combinations of physical features and patterns
of use and meaning that constitute everyday environments.
cognitive map An internal spatial representation of the
environment that takes the form of a schematic map.
participatory design A process of designing or planning an
environment to be built (or modified) that includes future
occupants in decision making.
personal space The zone around the body that a person, and
others, recognize as that person's space; zone size varies
according to activity, context, culture.
post-occupancy evaluation Research conducted in a
recently built (or renovated) environment to determine
how well user needs have been met.
Sanborn maps Detailed maps of cities, showing land
subdivision, building forms, construction, uses, and street
widths; used primarily by city governments.
spatial syntax A technique for graphically representing
structure of circulation in or around the interior or exterior
of a built environment.
The built environment has been the subject of social science research in the United States since the 1950s and
1960s. With the twin aims of filling a gap in the social
sciences, which traditionally ignored relationships between the physical environment and human behavior
and experience, and of improving the quality of built
environments for occupants, members of a variety of disciplines have pursued diverse research approaches and methods.
Introduction
When and Why the Built Environment
Social scientists in the United States started to study the
built environment and its relationship to human behavior
and experience in the 1950s and 1960s. In response to
inquiries from psychiatrists and architects about how
to improve the design of mental wards, psychologists
Humphrey Osmond and Robert Sommer in Canada,
and Harold Proshansky and his colleagues William Ittelson and Leanne Rivlin in New York, began to explore how
the design of these settings influences patient behavior.
Psychologist Roger Barker had already been documenting the precise location, content, and sequence of everyday activities of children in small towns in the Midwest.
As interest in the built environment grew among
psychologists, concepts of privacy, personal space, and
territoriality became useful, as evident in the research
of Irwin Altman. Following earlier traditions in their discipline, sociologists, including Lee Rainwater and
Herbert Gans, studied the experiences of residents in
public housing, urban neighborhoods, and suburban
communities. Because social scientists traditionally focus on people and their experiences, much of the research treated the built environment as the independent variable, or as context. One researcher who treated the environment as a dependent variable early on (that is, as the product of human actions) was an architect, Amos Rapoport.
Characteristics of Research
Research in this field is typically conducted in real-world
settings and not in laboratories. Although field research
is common in anthropology and sociology, for psychologists it was initially a disciplinary and methodological departure, which followed the legacy of Kurt Lewin's action
research. Researchers may also explore peoples responses to graphic representations of actual or ideal
environments or may use simulated environments.
These include physical models, full-scale constructions
of rooms or entire buildings, and, more recently, digital
modeling and virtual reality models. Studies may also be
longitudinal, looking at situations before and after physical interventions have been made or before and after
people move, and comparative, across different kinds of environments.
Researchers acknowledge that the built environment
and human behavior and experience affect each other: the environment does not determine human behavior
and experience, and that what influence it does have is
very likely modified by a host of other factors. The built
environment is recognized to be adaptable, to change over
time, and to have fixed, semifixed, and movable
features. Thus, even though any given research endeavor
must, necessarily, focus on a few aspects of environment
and people and often looks primarily at the environment
as an independent variable (context of human action) or as
a dependent variable (product of human action), the true
complexity of the situation has been a key concern in the
development of theories and methods. One concept that
addresses this complexity, originally proposed and used
by Roger Barker and further developed by Allan Wicker,
is the behavior setting. This refers to the combination of
a particular type of built environment and its use, thus
encompassing the regularly occurring combination of
physical features, furniture and objects, patterns of use,
kinds of users, and related rules of a small-scale environment. For instance, an elementary school classroom or
a dentist's office is a behavior setting. Further development of this concept has emphasized the importance of
meaning and cultural variations in behavior settings and
the linking of both settings and activities into systems.
Both built and natural environments are subjects of
study in this field, although studies of the latter are not
discussed here. Research has covered all kinds of built
environments (houses, housing complexes, day care centers, offices, markets, malls, schools, prisons, hospitals,
post offices, courthouses, museums, zoos, parks, streets
and plazas, and environments for particular groups, including housing for the elderly or those with dementia)
at different scales, ranging from individual rooms or
Occupants
For social scientists studying the built environment, the
most immediate and possibly the most obvious sources of
information are the present, past, or future occupants.
In contrast, given their training, designers and other
building-related professionals would be more likely to
look to the environments first. This was a key contribution
social science made to the design disciplines early on: the
recommendation to look at how environments are actually
used and to talk to the occupants.
With their interest in naturally occurring behavior
in everyday environments, researchers often employ observational methods.
Environments
Built environments, regardless of the presence of people,
are key sources of information. One common and very
useful technique is to observe environments for visual,
auditory, and olfactory evidence of current or past use and
for adaptations that occupants have made. With drawings,
cameras, video cameras, plans of the environment, or
precoded checklists, observers record the conditions of
settings and a variety of physical traces. As John Zeisel has
listed them, traces could include by-products of use
(erosions, leftovers, missing traces), adaptations for use
(props, separations, connections), displays of self (personalization, identification, group membership), and public
messages (official, unofficial, and illegitimate signs). Leftovers can include items left outdoors, such as toys or lawn
furniture; erosion might be shown in the way a path has
been worn through the grass, and an adaptation might be
the hole made in a fence.
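A minimal sketch of how precoded checklist observations of this kind might be tallied follows; the field records are invented, and only the trace category labels echo Zeisel's types described above.

from collections import Counter

# Invented field records: (location, trace category, item observed)
observations = [
    ("courtyard", "erosion", "path worn through the grass"),
    ("courtyard", "leftover", "toys left on the lawn"),
    ("fence line", "adaptation", "hole made in the fence"),
    ("lobby", "display of self", "personalized name sign"),
    ("courtyard", "erosion", "path worn through the grass"),
]

# Tally the physical traces by category and by location
traces_by_category = Counter(category for _, category, _ in observations)
traces_by_location = Counter(location for location, _, _ in observations)

print(traces_by_category)
print(traces_by_location)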
To understand how interior space is used, and what
problems may arise with the size and relationships of
rooms, researchers will record on floor plans (and through
photographs) the location, type, and size of furniture and
other belongings, as Sandra Howell did in an early study
of housing for the elderly, resulting in design guidelines for future buildings. This kind of inventory of
space, furniture, and equipment is a very useful tool in post-occupancy evaluations.
Current Status
The built environment in relationship to human behavior
and experience continues to be studied in a variety of
ways, for a variety of purposes. Although the degree of
interest among individual practicing architects and architecture schools is considerably less now, at the beginning
of the 21st century, than it was in the 1960s and 1970s, the field remains active.
Further Reading
Baird, G., Gray, J., Isaacs, N., Kernohan, D., and McIndoe, G. (eds.) (1996). Building Evaluation Techniques. McGraw-Hill, New York.
Bechtel, R., and Churchman, A. (eds.) (2002). Handbook of
Environmental Psychology. John Wiley & Sons, New York.
Bechtel, R., Marans, R., and Michelson, W. (eds.) (1990).
Methods in Environmental and Behavioral Research.
Krieger, Melbourne, FL.
Cherulnik, P. (1993). Applications of Environment Behavior
Research: Case Studies and Analysis. Cambridge University
Press, New York.
Groat, L., and Wang, T. (2002). Architectural Research
Methods. John Wiley & Sons, New York.
Hershberger, R. (1999). Architectural Programming and
Predesign Manager. McGraw-Hill, New York.
Lang, J., Vachon, D., and Moleski, W. (eds.) (1996). Designing
for Human Behavior: Architecture and the Behavioral
Sciences. Van Nostrand Reinhold, New York.
Marans, R. W., and Stokols, D. (eds.) (1993). Environmental
Simulation: Research & Policy Issues. Plenum, New York.
Michelson, W. (1976). Man and his Urban Environment:
A Sociological Approach. Addison-Wesley, Reading, MA.
Business Research,
Theoretical Paradigms That
Inform
Gayle R. Jennings
Central Queensland University, Rockhampton, Queensland, Australia
Glossary
axiology The study of ethics and values.
chaos theory Describes and explains a world of unstable
systems (nonlinear and nonintegral) using descriptive
algorithms. In the social sciences, the theory may be used
metaphorically to describe a setting or organization as
chaotic.
complexity theory A view of the world as being constituted
of complex and open systems composed of agents, each of
which interacts with the others to move from a state of
disorder to order by self-organization.
critical realism A paradigm related to postpositivism in
which truths, laws, and facts are fallible and theory bound
to specific contexts (for example, social and historical). An
objective epistemological stance recognizes that researcher
biases and values may influence research projects. The
methodology is primarily quantitative, although mixed
methods may be incorporated. Axiology is value free and
extrinsically based. Critical realism is utilized by valuebased professions.
critical theory The ontology of this paradigm recognizes that
there are complex, hidden power structures in the world
that result in oppression and subjugation of minority
groups. The researchers role is to make visible these
structures and to become the champion of the minority
via a value-laden axiology that aims to change the world
circumstances of those being studied, using a subjective
epistemology and qualitative as well as some quantitative
methodologies.
epistemology The science of knowledge; also the relationship
between the researcher and that which is to be known. The
relationship assumes either an objective or a subjective
stance.
feminist perspectives A generic term used to describe
a number of feminist perspectives, such as radical feminism,
marxist/socialist feminism, liberal feminism, and postmodern feminism.
Introduction
Business incorporates a number of fields/disciplines
of study, including accounting, auditing and finance, commerce, economics, entrepreneurship, human resource
management, management studies, marketing, organizational studies, political sciences, public administration,
strategic management and planning, and tourism, among
other fields and branches of study. Each field/discipline/
branch has its own specific history and predilections toward social science methods. Because the study of each
field is beyond the scope of this article, the discussion here
focuses on applying business as a generic descriptor
deemed to be representative of each of the related fields.
A consequence of this application is that focus is restricted
to the emblematic social science methods used across the
contributing fields.
Social constructivism and social phenomenology stem from the philosophical phenomenology of Edmund Husserl (1859-1938). In particular, Alfred Schutz (1899-1959) drew on Husserl's work to
fashion what has become known as social phenomenology.
Several of Schutz's students, Peter Berger (1925- ) and Thomas Luckmann (1927- ), are recognized for their
contribution to the development of social constructivism.
The interpretive social science paradigm is linked to
the work of Max Weber (1864-1920) and Wilhelm Dilthey (1833-1911), especially their respective work in regard
to verstehen (empathetic understanding). Hermeneutics
is another related term appearing in the literature
and is often described as aligned with the interpretive
paradigm.
Within business literature (and other literature), there
exist examples of the implicit or explicit use of these
terms interchangeably and/or synonymously. At
a general level, this practice recognizes that each of
these approaches is associated with the social nature
of the construction of social reality, as well as understanding the world from the perspective of an insider
specifically, how that insider constructs meaning in regard to the social context being studied (in this case, the
business world). That being said, it must also be stressed
that each term ascribes somewhat differing emphases in
regard to ontological, epistemological, methodological,
and/or axiological stances. To reiterate, although the
terms are sometimes used interchangeably, they do
not in fact mean the same thing. Therefore, in the following discussions (where space permits), the four terms
will be used together to remind readers that each is
somewhat different from the other. Broadly speaking,
then, these paradigms perceive the world as constituted
of multiple realities (ontology), and assume to understand/interpret those realities through the tenets of
a subjective epistemology, qualitative methodology,
and value-laden and intrinsically based axiology. Associated with the paradigms of social constructionism, social constructivism, social phenomenology, and the
interpretive social science are the related critical theory,
feminist perspectives, postmodern, and participatory
paradigms. These four paradigms are also utilized to
design studies and research projects in business settings.
Together with social constructionism, social constructivism, social phenomenology, and the interpretive social
science paradigms; critical theory, feminist perspectives,
postmodern, and participatory paradigms are associated
with a "soft" science approach. The terms "hard" and "soft" sciences are used to distinguish between the scientific, objective inquiry of the natural or physical sciences and the naturalistic (as occurring in the real world),
subjective inquiry of the social or humanistic sciences.
Critique exists in the literature regarding the use of the
terms, because they are considered to be pejorative and
do not effectively portray the different approaches each
type of science uses for the study of business and business-related phenomena.
Table I provides an overview of the differences between each of the paradigms presented in this section.
This table incorporates a summary of the various paradigms in regard to their origins, associated synonyms and
related terms, and overall focus of the research intent, as
well as stances in regard to ontology, epistemology, methodology, and axiology. (Due to space limitations, of the
social constructionism, social constructivism, social phenomenology, and interpretive social science paradigms,
only social constructivism is represented in Table I.)
Table I  Overview of the Paradigms Informing Business Research (a)

Positivism. Origins: founded on the principles of the hard/natural sciences (Naturwissenschaften). Synonyms and related terms: empiricism, realism, naive realism, objectivism, foundationalism, representationalism. Focus: explanation (Erklaren), realism, objectivism. Epistemology: objective. Methodology: quantitative.

Postpositivism. Origins: described as a midpoint between realism and relativism. Related term: critical realism. Focus: explanation (Erklaren), realism, objectivism. Epistemology: objective, acknowledging the potential for researcher bias. Methodology: quantitative, with use of mixed methods.

Critical realism. Focus: explanation (Erklaren), realism, objectivism. Epistemology: objective, acknowledging the potential for researcher bias. Methodology: quantitative, with inclusion of mixed methods.

Critical theory. Origins: founded in the human (social) sciences (Geisteswissenschaften). Types: Marxist/socialist, postpositivist, and postmodern critical theorists. Focus: understanding (Verstehen), historical realism, perspectivism, interpretivism, intentionalism. Ontology: sociohistorical multiple realities; realities reflective of power relations. Epistemology: subjective, unless postpositivist critical theory (objective). Methodology: qualitative, some quantitative.

Social constructivism. Origins: founded in the human (social) sciences (Geisteswissenschaften). Synonyms and related terms: phenomenology, interpretivism, constructivism, constructionism. Focus: understanding (Verstehen), relativism, perspectivism, interpretivism, intentionalism. Ontology: multiple perspectives/realities. Epistemology: subjective. Methodology: qualitative.

Feminist perspectives. Origins: founded in the human (social) sciences (Geisteswissenschaften). Types: Marxist/socialist, liberal, postmodern, poststructural, critical, feminist empiricism, standpoint theorists. Focus: understanding (Verstehen), relativism, perspectivism, interpretivism, intentionalism. Ontology: multiple realities mediated by gendered constructs. Epistemology: subjective, with participants and researcher/s as coresearchers. Methodology: qualitative (predominantly).

Postmodern. Origins: founded in the human (social) sciences (Geisteswissenschaften). Types: ludic postmodernism, oppositional postmodernism, critical postmodernism. Ontology: multiple realities; no privileging of position; skepticism towards truth and -isms. Epistemology: intersubjectivity. Methodology: qualitative. Axiology: skeptical of emancipation and transformation; continuous deconstruction process.

Participatory. Origins: founded in the human (social) sciences (Geisteswissenschaften). Related terms: cooperative inquiry, participatory action research, action inquiry, appreciative inquiry. Ontology: multiple realities collectively constructed via interactions. Epistemology: subjective-objective. Methodology: qualitative, quantitative, mixed method. Axiology: value laden; transformation.
(a) Developed from texts presented in Denzin and Lincoln's 1994 and 2000 editions of the Handbook of Qualitative Research, particularly the work of Denzin and Lincoln (2000), Lincoln and Guba (2000), Schwandt (2000), and Guba and Lincoln (1994), as well as Guba's 1990 text, The Paradigm Dialog, and Jennings' Tourism Research. The German terms used in this table are drawn from the writings of Dilthey (1833-1911).
Quantitative Methodology
A quantitative methodology is generally associated with
the use of hypotheses to represent causal relationships
as per the ontological perspective of the paradigms informing this methodology. Subsequently, a quantitative
methodology is associated with hypothetico-deductive
paradigms (empiricism, positivism, postpositivism), which
are deductive in nature because they deduce reality and
then establish the nature of that reality by testing hypotheses. As a consequence, researchers generally design their
research projects utilizing hypotheses and a priori theories. The research is conducted by the researcher from an
objective stance; that is, the researcher takes an outsider
position (an etic position) to ensure that bias and values do
not influence any research outcomes. The empirical data
gathered will be used to test the hypotheses to determine
whether the empirical data support or do not support
those hypotheses. The overall research design will be organized and reported so that it may be repeated. Sampling
methods will tend to be random or probabilistically determined. The data that are collected will be reduced
numerically and analyzed using mathematical and statistical methods. The reporting of quantitative methodologically based research follows the structure of the hard
sciences, i.e., introduction, literature review, methodology, findings, discussion, and conclusion. The voice used
in the genre of reports is third person, passive. Findings
from research projects using quantitative methodologies
are usually representative and generalizable to the wider
study population.
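The hypothesis-testing cycle just described can be illustrated with a minimal Python sketch, assuming the SciPy library is available; the data and hypothesis are invented, and an independent-samples t-test stands in for whatever statistical test a given design would actually require.

from scipy import stats  # assumes SciPy is available

# Invented hypothesis: mean daily sales differ between two branches
# (H0: no difference between branch A and branch B)
branch_a = [102, 98, 110, 95, 105, 99, 108]
branch_b = [90, 93, 88, 97, 91, 89, 94]

t_stat, p_value = stats.ttest_ind(branch_a, branch_b)

alpha = 0.05  # conventional significance level
if p_value < alpha:
    print(f"Reject H0 (t={t_stat:.2f}, p={p_value:.4f})")
else:
    print(f"Fail to reject H0 (t={t_stat:.2f}, p={p_value:.4f})")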
Qualitative Methodology
A qualitative methodology is associated with holistic-inductive paradigms (social constructionist, social constructivism, phenomenological, and interpretive social
sciences approaches). Holistic-inductive paradigms enable researchers to study (business) phenomena in
their totality and complexity, instead of focusing on
sets of variables and subsequent causal relationships.
Mixed Methods
The term mixed methods is sometimes used by
business researchers to describe mixed methodologies.
As previously noted, some business researchers would
dispute the potential for this to occur, due to the incommensurability of the ontological perspectives of some
paradigms. Mixed methodologies generally occur in different phases of research design. Some research designs
may include concurrent methodologies, using both a quantitative and a qualitative methodology to gather the data or
empirical materials to illuminate business phenomena
from different paradigmatic perspectives. Generally,
mixed methods refers to methods (data collection and
sometimes analysis) within a research design that may
be identified as either primarily positivistic or social constructionist, social constructivist, phenomenological, or
interpretive in nature. The following examples are of
mixed methods as mixed methodologies:
1. Qualitative exploratory study (phase one of design)
informs quantitative data collection tool construction
(phase two) in a larger quantitative research design.
2. Quantitative exploratory study (phase one of design)
informs qualitative research design (phase two of design).
3. Quantitative and qualitative studies conducted simultaneously or at separate times within a larger interdisciplinary research project.
Table II provides an overview of a comparison between quantitative and qualitative methodologies. Note that mixed methods have not been included, because essentially the mixing of methods will depend on which is the dominant paradigm informing the research process and design.
Table II  Comparison of Quantitative and Qualitative Methodologies

Paradigm: quantitative, hypothetico-deductive; qualitative, holistic-inductive.
Ontological perspective: quantitative, based on causal relationships; qualitative, illuminating multiple realities.
Nature of reality determined by: quantitative, hypotheses and a priori theories; qualitative, grounding in real-world business and business-related contexts.
Epistemological stance: quantitative, objective; qualitative, subjective.
Purpose (a): quantitative, explanation; qualitative, understanding and interpretation.
Research position: quantitative, outsider (etic); qualitative, insider (emic).
Nature of research design: quantitative, structured and replicable; qualitative, emergent/developmental and context specific.
Sampling: quantitative, probabilistically determined; qualitative, nonprobabilistically determined.
Analysis: quantitative, mathematically and statistically determined; qualitative, emblematic themes.
Report style: quantitative, scientific report; qualitative, narrative text.
Outcomes: quantitative, generalizable to the population of interest; qualitative, case specific, though findings may be generalized to other similar cases.

(a) The Purpose descriptors are based on the work of Dilthey (1833-1911), particularly the terms Erklaren (explanation) or Erklarung (abstract explanation) and Verstehen (understanding or empathetic understanding) (see Table I). Verstehen was also used by Max Weber (1864-1920). Mixed methods will assume varying positions regarding each of the descriptors, depending on the degree of proclivity toward either the hypothetico-deductive or holistic-inductive paradigm. The potential permutations are manifold.
Conclusion
Business research is informed by positivistic and social
constructionist, constructivist, phenomenological, or interpretive paradigms. Associated with the latter cluster of
paradigms are critical theory, feminist perspectives, and
postmodern and participatory paradigms. Within the positivistic paradigms, a quantitative methodology informs
the overall research design. Social constructionist, constructivist, phenomenological, and interpretive paradigms will inform business research primarily through
a qualitative methodology.
The positivistic paradigms and their associated methods
of data collection and analysis hold greater sway, compared to other paradigms. A number of business researchers critique the hegemony of the positivistic paradigms,
and as a result, the use of social constructionist, constructivist, phenomenological, and interpretive paradigms informing research gradually increased in the latter half of
the 20th century, a trend that continues in the 21st century. Moreover, mixed methods reflect the recognition of
the advantages of both types of data collection and an
attempt to better represent knowledge and truths of
the world by using both approaches. Consequently,
within current world circumstances at the beginning of
the 21st century, business researchers continue to discuss
and/or query the dominant hegemony. Some scholars
assume a position that recognizes that it is not an either/or choice between paradigms.
Further Reading
Belkaoui, A. (1987). Inquiry and Accounting, Alternative
Methods and Research Perspectives. Quorum Books,
New York.
Bhaskar, R. (1986). Scientific Realism and Human Emancipation. Verso, London.
Collis, J., and Hussey, R. (2003). Business Research, A Practical
Guide for Undergraduate and Postgraduate Students,
2nd Ed. Palgrave Macmillan, Houndmills, Hampshire, UK.
Denzin, N. K., and Lincoln, Y. S. (eds.) (2000). Handbook of
Qualitative Research, 2nd Ed. Sage, Thousand Oaks, CA.
Gummesson, E. (1999). Qualitative Methods in Management
Research, 2nd Ed. Sage, Newbury Park, CA.
Jennings, G. (2001). Tourism Research. John Wiley and Sons,
Milton, Australia.
Remenyi, D., Williams, B., Money, A., and Swartz, E. (1998).
Doing Research in Business and Management, An Introduction to Process and Method. Sage, London.
Robson, C. (2002). Real World Research, 2nd Ed. Blackwell,
Malden, MA.
Glossary
axiology The study of ethics and values.
data Units or records of information gathered in the course of
a research study/project. The term is usually associated with
quantitative methodology. Data units may be derived, for
example, from answers to questions in surveys, observations, or records of experiments. Data units are usually
aggregated, represented in numeric form, and subsequently
analyzed mathematically or statistically.
empirical materials Information gathered through the
processes associated with a qualitative methodology; may
include, for example, records of observations, interview
transcripts, and visual images. Empirical materials are
usually recorded in textual or visual form rather than being
reduced to numeric representations.
epistemology The science of knowledge; also the relationship
between the researcher and that which is to be known. The
relationship assumes either an objective or a subjective
stance.
interpretive social science paradigm Also referred to as the
interpretive paradigm, this term is associated with (social)
constructionism, (social) constructivism, social phenomenology, hermeneutics, and relativist approaches.
method Strategy and/or tool used by researchers to collect
and analyze data or empirical materials.
methodology A set of principles that provides a guiding
framework for the design of research projects/studies.
ontology The worldview or representation of reality particular to a specific theory or paradigm.
paradigm A set of beliefs regarding how the world operates/
functions. Paradigms have specific stances in regard to
ontology, epistemology, methodology, and axiology.
participatory paradigm The worldview (ontology) is generated collectively and recognizes multiple realities. The
research process emphasizes participant involvement,
a subjective epistemology, and the use of qualitative and
quantitative methodologies based on the principles of cooperative inquiry.
Methods of data collection and analysis in business research are guided by the overall paradigm that informs the
research process. Each paradigm associates with primarily a quantitative or qualitative methodology. Some mixing
of methodologies may also occur. Examples of quantitative methodological forms of data collection include surveys, experimental and quasi-experimental methods,
observation, forecasting, nominal group technique,
focus groups, the delphic method, the documentary
method, case studies, and longitudinal studies. Qualitatively informed methods of empirical materials (data)
collection include semistructured interviews, in-depth interviews, observation, action research, focus groups, the
delphic method, the documentary method, case studies,
and longitudinal studies. A quantitative methodology
incorporates various methods of data analysis, including
descriptive and inferential statistics, consideration of
levels of significance, and Type I and Type II errors.
A qualitative methodology draws on, for example, content
analysis, successive approximation, constant comparison,
domain analysis, ideal types, and grounded theory analysis for empirical materials (data) analysis.
Introduction
Business research is predominantly informed by the
paradigms of positivism and postpositivism, particularly
critical realism. However, other paradigms (social constructionism, social constructivism, social phenomenology, and the interpretive social science paradigms)
are now being included in the repertoire of business researchers. Related to this group of paradigms are
critical theory, feminist perspectives, and postmodern
and participatory paradigms, although there is debate
as to whether feminist perspectives and postmodernism
are independent paradigms or differing perspectives
among social constructionism, social constructivism, social phenomenology, and the interpretive social
science paradigms. All of these paradigms espouse specific ontological, epistemological, methodological, and axiological tenets. In particular, the tenets of each paradigm
influence the methodology used to design business research projects. To be specific, positivistic and postpositivistic as well as chaos and complexity theory paradigms
draw on a quantitative methodology. Social constructionist, social constructivist, social phenomenological, and
interpretive social science-related paradigms generally
use a qualitative methodology. Research using mixed
methods incorporates both quantitative and qualitative
methodologies (and quantitative and qualitative methods)
in differing amounts and phases of research projects.
Types of Surveys (a)

By method of administration (cost, implementation time, and response rates are relative to other survey types):
Mail surveys: low cost; long implementation time; low response rates. Questionnaires distributed using postal services.
Telephone surveys: moderate cost; short to medium implementation time; moderate response rates. Interviewer-conducted structured interviews, conducted via telephone.
Face-to-face interviews: high cost; medium to long implementation time; high response rates. Interviewer-completed questionnaires, conducted within businesses and business-related settings.
e-Questionnaires: low cost; short to long implementation time; moderate response rates. Questionnaires distributed electronically to recipients via e-mail and websites.
Self-completion surveys: moderate cost; short to medium implementation time; low response rates (unless an administrator is present).

By location of administration:
In situ surveys (including organizational surveys): high cost; short to long implementation time; moderate to high response rates.
Intercept surveys: moderate cost; short to medium implementation time; low to moderate response rates.
Household surveys: high cost; medium implementation time; moderate to high response rates.
Omnibus surveys: high cost; medium to long implementation time; moderate to high response rates.

(a) Although relative comparisons have been made, both implementation time and response rates may vary due to the number and nature of reminders used.
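As a worked illustration of how such relative ratings translate into planning figures, the sketch below computes cost per completed response, that is, total cost divided by the number of contacts times the response rate. All figures are invented, since the table gives only relative ratings.

# Invented planning figures for three survey modes
modes = {
    # mode: (total cost, contacts attempted, assumed response rate)
    "mail": (2000.0, 1000, 0.20),
    "telephone": (4000.0, 1000, 0.40),
    "face_to_face": (9000.0, 300, 0.70),
}

for mode, (cost, contacts, rate) in modes.items():
    completes = contacts * rate  # expected completed responses
    print(f"{mode}: {completes:.0f} completed responses, "
          f"{cost / completes:.2f} per complete")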
and after implementation. Experimental research involving a pre- and posttest can be classified as a longitudinal
study because it contains temporal measuring points (pre- and post-). Advantages of longitudinal studies relate to
how they assist in determining cause-and-effect
relationships, as well as in facilitating assessment and
evaluation of changes in work practices. Disadvantages
of longitudinal studies are associated with the length
of time it takes to conduct the research, the cost of sustaining the study over time, and the potential for participants to drop out in the course of the study (research
mortality).
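A minimal sketch of the pre-/posttest logic just mentioned follows, assuming SciPy is available; the scores are invented, and a paired t-test is used because the same participants are measured twice.

from scipy import stats  # assumes SciPy is available

# Invented scores for the same eight participants before and after
# an intervention; the pairing respects the pre-/post- design
pre = [12, 15, 11, 14, 13, 16, 12, 15]
post = [14, 18, 13, 15, 16, 19, 14, 17]

t_stat, p_value = stats.ttest_rel(pre, post)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")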
Qualitative Methods
The principles of a qualitative methodology guide qualitative methods of empirical material collection and analysis. A qualitative methodology tends to be associated with
social constructionism, social constructivism, social phenomenology, and interpretive social science paradigm, as
well as the related critical theory, feminist perspectives,
postmodern, and participatory paradigms. A qualitative
methodology focuses on gathering empirical materials
holistically in real-world business settings and contexts.
As a result, theories are inductively determined through
analysis that generates ideographic (thick and depthful)
insights that are specific to the study site and possibly
applicable to other similar business settings and contexts.
Researchers who engage in qualitatively informed research usually assume a subjective epistemological stance
in relation to empirical material collection and analysis.
Some examples of qualitative social science methods of
empirical materials (data) collection used in business include semistructured interviews, in-depth interviews, observation, action research, focus groups, the delphic
method, the documentary method, case
studies, and longitudinal studies.
Semistructured Interviews
Semistructured interviews, which are less formal than
structured interviews, have generic foci and/or a set of
themes. The order of discussion of each of the themes may
vary between interviews, depending on the response to
a "grand tour" question that is used to focus the discussion. Rapport needs to be established to ensure that in-depth information will be generated in the course of the
interview. Issues of reciprocity (mutual exchange of information) need to be clearly outlined. Advantages of
semistructured interviews are similar to those of unstructured (in-depth) interviews; they are interactive between
the participant and the researcher and reflect conversational exchange similar to that in a real-world setting.
Disadvantages are the time it takes to conduct them
and the volume of material generated for analysis.
In-Depth Interviews
In-depth interviews are unstructured interviews that have
similarities with a conversation, albeit a conversation with a purpose, i.e., the research topic. In-depth interviews range in duration from 1 hour upward to 5 hours
and beyond. Interviews in excess of 2 hours may be
conducted over a series of sessions. The keys to successful
interviews are the establishment of rapport, mutual respect, and reciprocity. Advantages of in-depth interviews
are that the researcher will be able to gain from the interview process information with richness and depth. Disadvantages are related to the time taken to gather and
analyze information.
Participant Observation
A number of typologies describe participant observation. Generally, participant observation may be described as a continuum of roles, ranging from a complete
participant to a complete observer. Participant observation, as does all research, requires the adoption of ethical practices and approval. In some instances,
participants may not be aware they are being observed.
The reason for not disclosing a researcher's observation
role to participants is to enable the researcher to observe
phenomena as authentically as possible in the everyday
business or business-related setting. As a complete observer, the researcher does not participate as one of the
group being studied, but observes as an outsider. At the
other end of the continuum is the complete participant
role. In this role, the researcher acts as an insider. Again,
the identity of the researcher role may be hidden from the
insiders in order not to change behaviors. Between the
two end points of the continuum, researchers will adopt
differing degrees of observation and participation as best
suits the business research purpose.
Advantages of participant observation are that information is gathered in real-world business contexts and
settings, first-hand primary information is gathered,
and diverse methods may be incorporated (such as interviews, observations, and documentary analysis conducted
within the context of the business study setting). Disadvantages of the method are associated with behaviors and
patterns that may be influenced by the presence of the
observer/participant, the difficulty in observing all interactions and events if only one observer is present in the
business setting or study site, and with the fact that not all
facets may be available to the observer because some
contexts or interactions may be considered off-limits by
the participants.
Action Research
Action research is specifically associated with the participatory paradigm, which views reality as collectively
constructed from multiple realities/viewpoints. Action
findings are then discussed with each of the experts individually. Opinions may be modified and the researcher
repeats the process until a consensus is achieved regarding a number of multiple outcomes. Advantages relate to
the in-depth information that is gathered and to the fact
that the information is empirically sourced rather than
determined by a priori theories. Disadvantages relate
to the time taken in repeated interviewing rounds and
maintaining panel participation throughout the process.
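The iterative consensus process just described can be sketched algorithmically. In this illustration, consensus is operationalized, purely for the sake of the example, as an interquartile range of at most 1 on a 1-9 rating scale, and experts are simulated as revising halfway toward the fed-back median; real panels revise freely.

import statistics

def iqr(ratings):
    """Interquartile range of the panel's ratings."""
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    return q3 - q1

def revise(ratings, fed_back_median):
    # Simulate experts revising halfway toward the fed-back median
    return [r + (fed_back_median - r) * 0.5 for r in ratings]

ratings = [2.0, 4.0, 5.0, 7.0, 9.0]  # invented first-round ratings (1-9)
for round_no in range(1, 6):
    med = statistics.median(ratings)
    print(f"round {round_no}: median={med:.1f}, IQR={iqr(ratings):.1f}")
    if iqr(ratings) <= 1.0:  # illustrative consensus threshold
        print("consensus reached")
        break
    ratings = revise(ratings, med)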
The Documentary Method
The use of the documentary method within the social
constructionist, social constructivist, social phenomenological, or interpretive social science paradigms differs
from use within the positivistic paradigms in that there
is no a priori theory to guide the analysis specifically,
although a process must be followed. The text documents
are read and thematic units are identified, analyzed, and
grouped into categories and subcategories so that
a taxonomy of the materials is built up. The documents
are analyzed in regard to their purpose, tone, tenor, style,
and explicit, implicit, and tacit texts. Advantages of this
method are its unobtrusive impact and its access to temporal insights in regard to past and present businesses and
business-related practices. Disadvantages relate to the
need for repeated readings of texts, which results in different interpretations, and to the fact that the documents
may be read without knowledge of the context of their
origin.
Case Studies
Under the social constructionist, constructivist, phenomenological, or interpretive paradigms, case studies enable
the extensive study of one case or similar cases across
time or space using a set of methods such as interviews,
focus groups, the documentary method, and participant observation. By using several methods, detailed information relating to the case or cases may be achieved.
The depth of materials obtained in a case study and
access to multiple sources of empirical materials
(data) are distinct advantages, although the amount of
time and resources required to gather details may be
a disadvantage.
Longitudinal Studies
Longitudinal studies involve multiple methods of empirical material (data) collection in relation to one organization or business over time, one set of people over time, or
similar organizations or sets of people over time. The purpose of longitudinal studies is to gather empirical materials
in regard to changes in cultural, social, political, economic,
and environmental trends concerning the research topic
being studied. That temporally extended information can
be gathered using a variety of methods is advantageous.
The time taken to collect, analyze, and maintain the empirical materials over the life of the study is a disadvantage.
Table III  Quantitative, Qualitative, and Mixed Methods of Data Collection (a)

Characteristics of quantitative methods: hypothetico-deductive approach; objective stance of researcher; structured format; data reduced and mathematically and statistically analyzed; study units predominantly randomly assigned or identified.

Characteristics of qualitative methods: holistic-inductive approach; subjective stance of researcher; semistructured or unstructured format; empirical materials (data) maintained in their textual wholeness and analyzed in totality, with textual units preserved; study units identified nonrandomly.

Methods and the methodologies in which they are used:
Questionnaires: quantitative; mixed methods.
Structured interviews: quantitative; mixed methods.
Experimental and quasi-experimental methods: quantitative; mixed methods.
Forecasting: quantitative; mixed methods.
Nominal group technique: quantitative; mixed methods.
Focus groups: quantitative; qualitative; mixed methods.
Delphic method: quantitative; qualitative; mixed methods.
Observation: quantitative (structured observations); qualitative (participant observations); mixed methods.
Documentary method: quantitative; qualitative; mixed methods.
Case studies: quantitative; qualitative; mixed methods.
Longitudinal studies: quantitative; qualitative; mixed methods.
Semistructured interviews: qualitative; mixed methods.
In-depth (unstructured) interviews: qualitative; mixed methods.
Action research: qualitative; mixed methods.

(a) Qualitative researchers may use the terms empirical materials and information instead of data.
A qualitative methodology assumes that multiple perspectives are equally valued, that no one position is privileged over another, and that the interpretation of
the empirical materials (data) is subjectively informed by
the researcher's emic knowledge of the research setting
and the participants, as well as the participants being
involved in validating the interpretations. The methodology continues to use qualitative methods that complement the empirical materials (data) collection methods
used. Analysis also commences as soon as the researcher
enters the field. The position of the researcher is reflexively and reflectively reviewed in the overall analysis. Subsequently, a qualitative methodology utilizes inductive
practices, to illuminate multiple realities, which are
grounded in the real-world businesses and business-related settings being studied. The research design
emerges in the course of the research process as concurrent analysis directs other empirical material (data) collection needs and directions. Emblematic themes arise in
the course of the research. Empirical materials (data)
collected are maintained as textual units that reflect the
themes and motifs arising out of analysis. Reporting of
the research findings should be in narrative form using
first person, active voice. Because sampling practices are
nonrandom, the research represents a slice of life from
the setting and people being studied. Table V provides an
overview of qualitative methods of empirical materials
(data) analysis.
Table IV  Quantitative Methods of Data Analysis

Descriptive statistics:
Univariate analysis: measures of central tendency (mode, median, mean); measures of variation (range, percentile, standard deviation).
Bivariate analysis: cross-tabulation, scattergrams, and measures of association, including lambda, gamma, tau, rho, and chi-square.
Multivariate analysis: multiple regression analysis, path analysis, time-series analysis, factor analysis.

Inferential statistics: tests of statistical significance, including nonparametric tests for nominal-level data.
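A minimal Python sketch of the first two rows of Table IV follows, assuming SciPy is available: univariate descriptive statistics, then a bivariate chi-square test on a cross-tabulation. All data are invented for illustration.

import statistics
from scipy.stats import chi2_contingency  # assumes SciPy is available

# Univariate descriptive statistics on an invented variable
ages = [23, 25, 25, 29, 31, 34, 41]
print("mean:", statistics.mean(ages))
print("median:", statistics.median(ages))
print("mode:", statistics.mode(ages))
print("stdev:", round(statistics.stdev(ages), 2))

# Bivariate analysis: chi-square on an invented cross-tabulation of
# sector (rows) by adoption of a practice (columns)
crosstab = [[30, 10],   # manufacturing: adopted / not adopted
            [18, 22]]   # services: adopted / not adopted
chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")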
Table V  Qualitative Methods of Empirical Materials (Data) Analysis

Content analysis: Textual materials are read, annotated, and coded. Categories are generated from reading, annotating, and coding, and are evaluated in regard to the relevance of the emerging taxonomy to the empirical setting from which they emerged. This involves reflection on, and questioning of, the assignment of codes and categories in relation to the real-world context.

Constant comparative analysis: Involves two generic stages, coding and the comparison of codes to generate categories, to build an ideographic representation of the study phenomenon. Theoretical sampling will also be applied to establish the repetitive presence of concepts. The method has similarities with grounded theory analysis.

Successive approximation: The researcher iteratively and reflectively compares codes and categories to develop concepts, relationships, and theory. Questions in regard to goodness of fit with the empirical world are posed constantly throughout the process. The method has similarities with constant comparison and grounded theory analysis.

Domain analysis: Categorizes study units using a cover term, included terms, and a semantic relationship. Categorization is an ongoing process during data collection. Domain analysis is founded on Spradley's Participant Observation as well as the study of culture.

Ideal types: Ideal types (models of social interactions and processes) establish a standard to which reality may be compared. Ideal types emanate from the work of Max Weber (1864-1920).

Event-structure analysis: The chronological ordering of events, highlighting the causal relationships for their occurrence.

Matrices: Matrices demonstrate interactions between two or more elements of phenomena.

Grounded theory analysis: Grounded theory is attributed to the work of Barney Glaser (1930- ) and Anselm Strauss (1916-1996). It is an inductive process, as are all of the qualitative methods of empirical material analysis. In its original form, theory is produced by identifying conditions that result in a phenomenon occurring, which establishes a specific context, concomitant actions, and related consequences.

Analyses may also be represented visually through networks, models, typologies, taxonomies, conceptual trees, mind maps, semantic webs, and sociograms.
Mixed Methods
Mixing methods for data analysis results in the relevant method of analysis being selected from either a quantitative or a qualitative methodology, to match the appropriate data or empirical material collection tool. Two examples of mixed methods being associated with analysis are (1) qualitative methods of data collection paired with quantitative data analysis and (2) open-ended questions in a questionnaire being analyzed using qualitative methods.
Conclusion
Once the paradigm that informs the overall business research project/study has been determined, business researchers utilize quantitative, qualitative, or mixed
methodologies to guide the design of research projects.
Within a quantitative methodology, researchers adopt
quantitative methods of data collection. Primarily surveys
(interviews and questionnaires), experiments and quasi-experiments, and observation are the key business
methods. Other methods used variously within businesses
and business-related fields and disciplines are documentary analysis, longitudinal studies, forecasting, focus
groups, the nominal group technique, and the delphic
method. Some business fields and disciplines may use specific methods not utilized across disciplines or extensively
due to their specific field or disciplinary nature. Examples
of methods of data analysis include descriptive statistics,
inferential statistics, tests of statistical significance, levels of
significance, and consideration of Type I and Type II
errors.
Within a qualitative methodology, empirical material
(data) collection methods include semistructured and unstructured interviews, participant observation, action research, focus groups, the delphic method, the
documentary method, case studies, and longitudinal
studies. Analysis methods may include content analysis,
constant comparative analysis, successive approximation,
domain analysis, ideal types, event structure analysis, matrices, and grounded theory.
Examples of social sciences methods used in business
represent a wide range of both qualitative and quantitative
methods of data/empirical material collection and analysis,
selected based on the relevant methodology and overarching paradigm that inform the overall research process.
Further Reading
Cavana, R. Y., Delahaye, B. L., and Sekaran, U. (2001).
Applied Business Research, Qualitative and Quantitative
Methods. John Wiley and Sons, Milton, Australia.
Collis, J., and Hussey, R. (2003). Business Research, A Practical
Guide for Undergraduate and Postgraduate Students, 2nd
Ed. Palgrave Macmillan, Houndmills, Hampshire, UK.
Cooper, D. R., and Schindler, P. S. (2003). Business Research
Methods, 8th Ed. Irwin McGraw-Hill, Boston.
Davis, D. (2000). Business Research for Decision Making, 5th
Ed. Duxbury, Thomson Learning, Pacific Grove, CA.
Frazer, L., and Lawley, M. (2000). Questionnaire Design
and Administration. John Wiley and Sons, Brisbane,
Australia.
Frey, J. H. (1989). Survey Research by Telephone, 2nd Ed.
Sage Library of Social Research Volume 150. Sage,
Newbury Park, CA.
Campaign Finance Data
Donald A. Gross
University of Kentucky, Lexington, Kentucky, USA
Glossary
Introduction
Data Availability
The Problem
Solid research requires reliable and valid data. And, even
though the quantity and quality of campaign finance data
have increased dramatically in recent years, a fundamental problem remains: the nature and character of almost all campaign finance data depend on the regulatory environments surrounding the phenomena of interest. This problem is compounded by the fact that regulatory environments, both state and federal, continue to evolve over time. Thus, data availability changes over time and across government units, and new forms of data are likely to come to the forefront. As such, any discussion of campaign finance data
must be viewed in the context of the evolving regulatory
environments governing elections.
Article Overview
An exhaustive discussion of the regulatory environments
found worldwide is simply beyond the scope of this essay.
The focus here is on campaign finance data in the United
States, especially that available for federal elections. In
addition, the article examines only the most widely used
types of campaign finance data, recognizing that there are
other types of data that are typically classified under particular regulatory characteristics of specific states. The
focus is on the data that have become available since
the mid-1970s. First, the general regulatory parameters
that define the characteristics of particular types of campaign finance data are examined. Next, campaign finance data that are typically classified under the rubrics of hard money and soft money are discussed.
Limits on Contributions
Other than reporting and disclosure requirements,
limits on campaign contributions are the most prevalent
characteristic of campaign finance regulatory environments. Limits on contributions tend to differ in terms of three basic characteristics: (1) who or what organizations make contributions, (2) who or what organizations receive contributions, and (3) the dollar amounts of specific types of contributions (these differ across regulatory environments as well as across different electoral offices within a given regulatory environment).
The most typical organizations and individuals that are
limited in the amount that they can contribute are individuals, political parties, the candidate and his or her
family, unions, corporations, foreign nationals, PACs,
and regulated industries. Organizations subject to limits on the size of the contributions they may receive typically include candidate committees, party committees, and PACs. Dollar limits range from a total prohibition on contributions to no
limit on the size of a contribution. In addition, in some
jurisdictions, dollar amounts are adjusted over time to
take into account the effects of inflation.
Limits that are in place for federal elections are illustrative of the possible combinations that we may encounter in a given regulatory environment. Corporations,
unions, and foreign nationals are prohibited from
making contributions to candidates and political committees, whereas there are specific limits on the size of
a contribution from individuals, political parties, and
PACs. Until the passage of the Bipartisan Campaign
Reform Act in 2002, there were no major limits on
the size of contributions to party soft money accounts.
And, pending the outcome of current court action, the strict limits imposed by this legislation may yet be nullified. When we examine the regulatory environments in
the states, we see a great deal of diversity in the parameters governing contributions. States such as Texas and
Illinois have almost no effective regulations limiting the
size of contributions, whereas states such as Kentucky
and Vermont have stringent regulations governing contributions.
Public Financing
There are two basic types of public financing schemes:
those that provide funds to political parties and those that
provide funds directly to candidates. At the federal level,
the major political parties are given some monies to help
pay for their national conventions, but public funding is
only given to candidates for the presidency. At the state
level, depending on the specific years we are analyzing,
approximately 20 states have some type of public financing system. Some give money exclusively to political
parties, some give exclusively to candidates, and some
give money to both political parties and candidates.
There are two primary mechanisms for funding the
state-level programs that fund political parties. Approximately one-half the states rely on a taxpayer check-off
system, whereby a taxpayer can designate part of his or
her tax liability (normally $1 to $5) for the public funding
system. In other states, a taxpayer adds money to the fund through a voluntary "add-on" that increases his or her total tax payment.
Enforcement Provisions
As might be expected, among the states and the federal
government, there is a good deal of diversity in enforcement provisions and the implementation of campaign finance law. Malbin and Gais, in their excellent 1998
discussion of enforcement problems in the states, suggest
that there are data collection errors and note that there
appears to be little effort to systematically cross-check the
validity of the information provided by candidates and
organizations. Perhaps most important, as individuals
and organizations become ever more creative in their
attempts to avoid campaign finance laws, there can be
ambiguities in the coding categories of data.
Even though we know that there are data collection problems, there is, at this time, no evidence to suggest that they pose a serious problem for most analyses. But differences
in the dissemination of information can create problems
for some analysts. Often campaign finance information is
only available on paper, which can become prohibitively
expensive in terms of time and money. And although
a number of states, like the federal government, are increasingly placing campaign finance data on the Internet,
they are not always readily downloadable. Finally, states
differ a good deal in the time frame for which campaign finance data are available.
Hard Money
Hard money is a term that has specific meaning in the
context of the federal regulatory environment. It is money
given to candidates and political parties that is subject to
federal contribution limits and prohibitions. Although
there is no limit on how such money is spent, it must
be fully disclosed. The definition of hard money may differ
in any given state. For simplicity, campaign finance data
are presented here in the context of the federal regulatory
environment, recognizing the fact that similar data obtained at the state level may not necessarily fit the strict
definition of hard money.
Expenditure Data
Probably the most widely used campaign finance data are
aggregate candidate expenditure data. Each datum is the
total amount of money spent by a single candidate for
a single election or electoral cycle. The Federal Election
Commission makes these data available for all federal
elections since 1978. Similar types of data are generally
available for many state elective offices, although there
are often serious data-availability issues when we attempt
to obtain data over time or across state elective offices.
Aggregate candidate expenditures for gubernatorial
elections are readily available for almost all states from
1978 to the present. State legislative elections are more
problematic, with complete data readily available only
since the mid-1990s. For earlier years, availability is
much more episodic.
There are a number of readily identified potential pitfalls when using aggregate candidate data. First, there are
two interrelated problems that are of special significance
when analyzing the effect of candidate spending on electoral outcomes or party competition; this is called the
simultaneity issue. The first way to think about the
simultaneity issue involves the interactive nature of campaign spending. Jacobson established in 1980 that in many
cases candidates spend money in response to spending by
other candidates. Thus, spending by one candidate becomes a function of spending by the others. A second way
to think about the simultaneity issue is that candidate
spending can be seen as a function of anticipated outcomes. Either view of simultaneity suggests that simple
ordinary least-squares regression of candidate spending
on electoral outcomes will necessarily result in biased
235
Contribution Data
Perhaps the simplest type of contribution data is that
which deals with the existence of the contribution
limits themselves. The simplest approach is to treat
each potential type of contribution limit as a dummy variable.
Soft Money
Like hard money, soft money is especially relevant in the
context of the federal regulatory environment. Unlike
hard money, however, the term soft money has often had
a somewhat ambiguous meaning. In its broadest sense,
soft money is all money associated with federal elections
that is not subject to federal contribution limits. This is
a broad category of monies that includes numerous types
of financial transactions. Within this context, there are
really three types of activities that have received
significant attention in the literature: party soft money,
independent expenditures, and issue advocacy money.
Independent Expenditures
Independent expenditures are monies spent on campaign activities that are not coordinated with a candidate or a candidate's campaign.
Issue Advocacy
Issue advocacy is noncandidate campaign activity that
does not expressly advocate the election or defeat of
a candidate. Generally speaking, any communication activity that does not use the words specified in the Buckley decision ("vote for," "support," "cast your ballot for," "elect," "defeat," "reject," or "vote against") may be considered issue advocacy. National political parties have generally been
restricted to a prescribed ratio of hard and soft money
when undertaking issue advocacy. However, in all other
cases, costs are not considered to be either a contribution
or expenditure under the federal regulatory environment.
Thus, in the case of issue advocacy, there are no contribution limits or prohibitions, no expenditure limits, and
no itemized reporting requirements.
The lack of reporting requirements for most issue advocacy creates serious problems for the data analyst. As
previously stated, there are data on national party activities. Some organizations voluntarily publish their expenditures and records can be obtained from television and
radio stations to re-create the amount of money spent on
these broadcast activities by organizations. This does not
guarantee, however, that we can accurately specify the
nature of the organization from the name listed as the
purchasing agent for the media spot. We clearly cannot
specify the ultimate source of the money. Expenditure
data on PACs can be used to estimate their issue advocacy
monies.
Other groupings, including 501s and 527s, whose
names derive from their income tax designation, have
traditionally not been required to publicly disclose either
their contributors or their itemized expenditures. Recent
federal legislation and pending court cases may change
our ability to collect more comprehensive data on these
organizations. For example, new disclosure requirements
on 527s began in June 2002. Nevertheless, there simply are no data sets currently available that can be considered systematic, reliable, and comprehensive when it comes to issue advocacy.
Conclusion
The quantity and quality of campaign finance data are
expanding every day. Entirely new categories of data
are now available that were unheard of only 5 years ago
or only available for a limited number of legal jurisdictions. In addition to governmental agencies, there are
a number of organizations, both commercial and otherwise, that can provide campaign finance data. But, especially when we move from the federal level to the state
level, we remain confronted with the possibility that data
may only be available for a limited number of states or
a limited time frame. In some cases, the data have been
lost forever. Data availability on activities such as bundling, the use of conduits, and the use of internal communication by corporations and unions remains episodic.
The multiple jurisdictions that help define the nature of
campaign finance data often make direct comparisons
among data sets difficult. Differences among jurisdictions
can make it nearly impossible to follow the flow of funds
across jurisdictions. Some data sets have reliability and
validity questions associated with them. And as strategic
actors change their behavior, it is likely that new types of
campaign finance data will become available while other
types become irrelevant.
Further Reading
Corrado, A., Mann, T., Ortiz, D., Potter, T., and Sorauf, F.
(eds.) (1997). Campaign Finance Reform: A Sourcebook.
Brookings Institution Press, Washington, D.C.
Goidel, R., and Gross, D. (1994). A systems approach to
campaign finance in United States House elections. Am.
Polit. Q. 22, 125–153.
Goidel, R., Gross, D., and Shields, T. (1999). Money Matters:
Consequences of Campaign Finance Reform in U.S. House
Elections. Rowman and Littlefield, New York.
Gross, D., and Goidel, R. (2003). The States of Campaign Finance:
Consequences of Campaign Finance Law in Gubernatorial
Elections. Ohio State University Press, Columbus, OH.
Jacobson, G. (1980). Money and Congressional Elections. Yale
University Press, New Haven, CT.
Jacobson, G. (1990). The effects of campaign spending in
House elections. Am. J. Polit. Sci. 34, 334–362.
Magleby, D. (ed.) (2000). Outside Money. Rowman and
Littlefield, New York.
Malbin, M., and Gais, T. (1998). The Day after Reform.
Rockefeller Institute Press, Albany, NY.
Sorauf, F. (1992). Inside Campaign Finance: Myths and
Realities. Yale University Press, New Haven, CT.
Thompson, J., and Moncrief, G. (eds.) (1998). Campaign
Finance in State Legislative Elections. Congressional
Quarterly Press, Washington, D.C.
Campbell, Donald T.
Yvette Bartholomee
University of Groningen, Groningen, The Netherlands
Glossary
attitude Feeling or opinion about something or someone,
learned in the process of becoming a member of a group,
and the way of behaving that follows from this.
external validity Concerns the generalizability of a particular
experimental outcome to other persons, settings, and times.
internal validity Concerns the validity of the claim that in
a specific experiment the factor A, and factor A only, caused
a change in factor B.
multitrait-multimethod matrix A correlational matrix measuring a set of traits by using a set of methods.
quasi-experiment A study that lacks random assignment but
otherwise resembles a randomized experiment.
randomized controlled experiment A study in which
subjects are assigned to either treatment or control groups
at random to equal out preexisting differences between the
groups.
Introduction
The career of the renowned American psychologist and
methodologist Donald T. Campbell (see Fig. 1) developed in parallel to the ever-increasing influence of
the field of psychology in the post-World War II era.
Campbell acquired his first job as an army psychologist.
Evaluation Research
Even though the article on the multitrait-multimethod matrix was widely read and cited, its influence is rather marginal compared to Campbell and Stanley's experimental and quasi-experimental research designs, which became paradigmatic. Campbell's work was foundational for the new field of program evaluation, or social experimentation, which evaluated the effectiveness of policies such as social programs. Examples of social programs are the New Jersey Negative Income Tax Experiments, Head Start, and Sesame Street, large-scale projects in the vein of President Johnson's Great Society initiatives.
The New Jersey Negative Income Tax Experiments
were designed to find out whether guaranteeing an income to poor working families might be an alternative to
welfare. The concept of a negative income tax was that if
an income were to drop below a certain minimum level,
a negative tax would be levied, which meant that the tax
system would pay out cash. This negative tax would be
reduced when someone again earned an income above
the minimum level. The experiments were carried out
from 1968 through 1972 and involved a sample of more
than 1200 households. These households were randomly
divided into several experimental groups, which differed
in the level of guaranteed income and in the reduction rate
of negative taxes when earning income above the minimum level. The experiments had to test whether or not this system of a negative income tax created an incentive to work. These experiments were the first large-scale randomized social experiments in the United States.
Cross-Cultural Psychology
Not only did Campbell become extremely successful in the multidisciplinary field of program evaluation; his quasi-experimental research methodology was also successfully introduced into other social sciences, such as sociology and
economics. He even tried to extend his standardized
methodology to a discipline that predominantly used
qualitative research methods: cultural anthropology.
This effort retrospectively made him one of the founding
fathers of cross-cultural psychology, a subdiscipline
between psychology and cultural anthropology.
Campbell constantly emphasized the need for psychologists to test their theories in the field and he argued that
his quasi-experimental research methodology enabled
them to do so. Using anthropological data to test psychological theories was to him a kind of quasi-experimentation. If, for example, one wanted to investigate the effects
of different modes of child-rearing on personality
formation, it would be impossible to conduct a randomized experiment. In these cases, anthropological data
could form a quasi-experimental alternative to true
experimentation.
However, before psychologists could use these anthropological data, they had to make sure that these data were
reliable, at least according to their standards. Whereas
psychologists were used to testing large groups of
respondents, anthropologists relied on qualitative
data gathered from small groups of people. This made
anthropological data less trustworthy in the eyes
of psychologists. Campbell presented his methods as most useful for standardizing anthropologists' research efforts and detecting the factors that distorted results.
With the help of others, he created field manuals:
ready-made research designs to gather uniform data.
An example of such a field manual is Materials for
a Cross-Cultural Study of Perception, by the anthropologist Melville Herskovits, Campbell, and Marshall Segall,
developed for their study of differences in visual perception across cultures. Fieldworkers (cooperating anthropologists actually gathering the data) received very
precise research instructions through the manual, such
as exactly how to phrase their questions and from what
angle and distance to show the images in the manual. In
total, some 1878 respondents were questioned using this
standardized research design.
Campbell was so satisfied with his field manual method that he decided to use it for investigating another
topic, together with the anthropologist Robert LeVine.
This was the topic of ethnocentrism. Campbell and
LeVine developed an Ethnocentrism Field Manual to
coordinate and standardize the collection of data by anthropologists participating in their cross-cultural study.
The manual prescribed strict research procedures. It
gave detailed instructions on how to choose informants
and how to select interpreters and presented a uniform
interview schedule. An example of one of the interview
questions in the manual was the Bipolar Trait Inquiry
asking interviewees to characterize members of other
ethnic groups in dichotomies, such as peaceful or quarrelsome, hardworking or lazy, filthy or clean, stupid or
intelligent, handsome or ugly. After data were gathered
from approximately 20 societies all over the world, Campbell and LeVines field manual was used to interview 1500
people in East Africa.
In these cross-cultural research projects, Campbell
chose subjects interesting to psychologists, and most importantly, these projects were shaped by his quantitative,
standardized methodology. The field manuals were developed to make sure that the experimental conditions
were as uniform as possible to ensure reliable outcomes.
Conclusion
Campbell resigned from Northwestern University in
1979 at the age of 63. By then, he had become one of
the leading figures in his field. In 1970, he was granted
Further Reading
Brewer, M. B., and Collins, B. E. (eds.) (1981). Scientific Inquiry and the Social Sciences. Jossey-Bass, San Francisco, CA. [Contains a bibliography of Donald T. Campbell, 1947–1979.]
Capshew, J. H. (1999). Psychologists on the March: Science, Practice, and Professional Identity in America, 1929–1969. Cambridge University Press, New York.
Dehue, T. (1997). Deception, efficiency, and random groups: Psychology and the gradual origination of the random group design. Isis 88, 653–673.
Dehue, T. (2001). Establishing the experimenting society: The historical origin of social experimentation according to the randomized controlled design. Am. J. Psychol. 114, 283–302.
Dunn, W. N. (ed.) (1998). The Experimenting Society: Essays in Honor of Donald T. Campbell. Transaction Publishers, New Brunswick, NJ.
Herman, E. (1995). The Romance of American Psychology: Political Culture in the Age of Experts. University of California Press, Berkeley, CA.
Oakley, A. (2000). Experiments in Knowing: Gender and Method in the Social Sciences. Polity Press, Cambridge, UK.
Overman, E. S. (ed.) (1988). Methodology and Epistemology for Social Science: Selected Papers of D. T. Campbell. University of Chicago Press, Chicago, IL. [Contains a bibliography of Donald T. Campbell, 1947–1988.]
Shadish, W. R., Cook, T. D., and Leviton, L. C. (1991). Foundations of Program Evaluation: Theories of Practice. Sage, Newbury Park, CA.
Wuketits, F. M. (2001). The philosophy of Donald T. Campbell: A short review and critical appraisal. Biol. Philos. 16, 171–188.
Case Study
Jack Glazier
Oberlin College, Oberlin, Ohio, USA
Glossary
ethnography Used synonymously with fieldwork, or the
activity of the research anthropologist; also refers to the
written result, usually in book form. Ethnography focuses
on human behavior and belief within a well-defined
community.
fieldwork The characteristic research endeavor of cultural
anthropology, involving residence of the anthropologist
within a community in order to collect data based on
observation and informant testimony.
holism Understanding a community as a social unit in which
beliefs, values, and institutional arrangements are integrated. A holistic perspective requires that the anthropologist see any segment of behavior or belief in its natural context, in relationship to other parts of the culture and to the whole.
informants Members of a community on whom anthropologists rely for answers to questions asked in formal
interviews or, more commonly, through informal interactions during fieldwork. Informant testimony can complement, support, or contradict the behavioral data the
anthropologist gathers through participant observation.
participant observation A major research activity of fieldworkers that includes their immersion in the daily routines
of community life. As anthropologists become increasingly
familiar to their hosts over the course of many months,
the effect of their presence on the data collected should
be minimized. Ideally, the participant observer attempts
to learn about local custom under conditions that are as
natural as possible.
Introduction
The distinctive features of the anthropological case study
in many respects capture the theoretical and methodological qualities of anthropology as a discipline. Accordingly, any discussion of the nature of anthropological case
studies can proceed only by considering the general characteristics, methods, and goals of cultural anthropology.
The case study in anthropology often but not exclusively
represents an extended examination of the culture, or
ways of life, of a particular group of people through fieldwork. The published case study is often referred to as an
ethnography. During the first half of the 20th century,
that examination often covered a wide range of topics. The
breadth of early ethnographic monographs partly reflected a desire to preserve a record of historic cultures in the
process of momentous change under the colonial impact.
At the same time, anthropologists were documenting the
range and particularly the variation in human cultures.
American anthropology for the first four decades of the
20th century utilized the rich data of ethnography to point
up the unique character of individual cultures, thereby
undermining universal explanatory schemes based on cultural evolutionism, biology, or alleged human psychological or temperamental constants. Where early universalist
theories alleged singular lines of cultural and psychological development, 20th century anthropologists, up to the
1940s, were finding endless difference and variation.
Since the 1920s, writing comprehensive ethnographies
has steadily given way to more focused, problem-oriented
studies within a broad ethnographic framework that
continues to emphasize difference over uniformity. Classic examples of early problem-oriented case studies are
Margaret Mead's Coming of Age in Samoa and Bronislaw Malinowski's monographic examination of Freud's oedipal theory, Sex and Repression in Savage Society.
Both Mead and Malinowski presented case studies characteristic of their time. Their work effectively overturned
widely accepted explanations of putative human
universals believed to be derivative of a shared biological
substrate of the human species. Instead, Mead argued
that the phenomenon of adolescence was an American
cultural invention alien to the experience of the people of
Samoa. Likewise, Malinowski demonstrated that the supposed universality of the oedipal complex was only an
expression of a particular kind of family structure (the patriarchal organization familiar in Europe) unknown in matrilineal societies, such as in the Trobriand Islands.
These two cases are emblematic of anthropology's abiding interest in cultural difference and the utility of ethnographic case material in challenging universalist claims.
The term case study may also refer to specific case
material embedded within a larger ethnographic monograph. Here, detailed examples of dispute settlement in
local legal procedures, richly documented ritual performances, and the like can effectively illuminate the dynamics of social life. Especially germane is the work of Max
Gluckman, his students from the University of Manchester, and their colleagues who centered their ethnographic
investigations in Zambia and other areas of Central Africa.
Turner's concept of the social drama, for example,
emerged from his detailed case material about social situations of crisis among the Ndembu of Zambia, where
breaches of widely accepted rules create crisis conditions
threatening to sever ongoing social relationships.
Through formalized ritual or legal procedure, efforts at
redress either may successfully avert schism, by reconciling antagonists, or may lead to social separation. Turner's
social drama strategy focused closely on the processes of
community life as it concentrated on a limited cast of
players within a village, their multiplex relationships,
and their shifting structural positions over time. Other
Central Africanists pioneered similar detailed
processual research, also referred to as the extended
case method and situational analysis. In all instances,
the emphasis lay on close description of the
behavioral dynamics of real people and their complex
social relationships.
Fieldwork
Though cultural anthropologists utilize multiple theoretical orientations and study a wide range of cultural
topics, disciplinary characteristics suffusing the case
study tradition can nonetheless be identified. That
Cultural Similarities or
Cultural Differences
Studies that effectively contribute to culture theory and the
associated construction of generalizations are necessarily
comparative, because theory entails an explanation of multiple cases of cultural regularity. But in extending the reach
of the single case study, theory and generalization almost
exist at cross-purposes with the ethnography of the single
case. On the one hand, the case study stays very faithful to
detail, to native perspectives, to the subtleties of the vernacular language of the community, and to the context of
events considered in holistic fashion. That fidelity to detail
inevitably gives each case study a very distinct character,
because the particular content and concatenation of
events, activities, personalities, informant statements,
and the like are singular. The detailed ethnography deriving from fieldwork may well restrict theory development
and broad comparisons, if one is bent on maintaining the
integrity and holism of the data. For example, anthropological accounts of peoples as diverse as Cantonese villagers, Mundurucu or Yanomamo horticulturists in Brazil and
Venezuela, Mbeere farmer/herders on the Mt. Kenya periphery, and Tikopia Islanders in Polynesia characterize
them as patrilineal. This designation refers to their
mode of reckoning descent through the male line, ascending to father, grandfather, and so on, to an apical ancestor.
Men and women descended from that ancestor through male links belong to a patrilineal group: a lineage, a clan, or a moiety. In this respect, the five peoples cited, as well as many others, appear similar in regard to their
Conclusion
The depiction of anthropology as a discipline traditionally
more concerned with the uniqueness of case study
material than with generalizations about culture of course
Further Reading
Barrett, R. A. (1991). Culture and Conduct, 2nd Ed. Wadsworth, Belmont, California.
Eggan, F. (1954). Social anthropology and the method of controlled comparison. Am. Anthropol. 56, 743–763.
Geertz, C. (1973). Thick description: Toward an interpretative theory of culture. In The Interpretation of Cultures, pp. 3–30. Basic Books, New York.
Glazier, J. (1976). Generation classes among the Mbeere of Central Kenya. Africa 46(4), 313–325.
Glazier, J. (1984). Mbeere ancestors and the domestication of death. Man 19(1), 133–147.
Gluckman, M. (1961). Ethnographic data in British social anthropology. Sociol. Rev. 9, 5–17.
Haines, D. (ed.) (1996). Case Studies in Diversity: Refugees in America in the 1990s. Praeger, Westport, Connecticut.
Mitchell, J. C. (2000). Case and situation analysis. In Case Study Method: Key Issues, Key Texts (R. Gomm, M. Hammersley, and P. Foster, eds.), pp. 165–186. Sage, London.
Spindler, G., and Spindler, L. (eds.) (1977). Native North American Cultures: Four Cases. Holt, Rinehart, and Winston, New York.
Turner, V. W. (1957). Schism and Continuity in an African Society. Manchester University Press, Manchester.
Van Velsen, J. (1967). The extended-case method and situational analysis. In The Craft of Social Anthropology (A. L. Epstein, ed.), pp. 129–149. Tavistock Publ., London.
Watson, J. L. (1975). Emigration and the Chinese Lineage. University of California Press, Berkeley.
Watson, J. L. (1982). Of flesh and bones: The management of death pollution in Cantonese society. In Death and the Regeneration of Life (M. Bloch and J. Parry, eds.), pp. 155–186.
Categorical Modeling/
Automatic Interaction
Detection
William A. V. Clark
University of California, Los Angeles, Los Angeles, California, USA
Marinus C. Deurloo
University of Amsterdam, Amsterdam, The Netherlands
Glossary
artificial neural network A type of data mining technique,
based on biological processes, for efficiently modeling large
and complex problems.
chi-square automatic interaction detection A method
tailored to finding structure in high-dimensional categorical
spaces where the dependent variable is also categorical.
data mining A mathematical and statistical tool for pattern
recognition.
decision trees Methods of representing a series of rules.
entropy A measure of diversity for nominal variables.
entropy-based relevance analysis A simple data mining
method for categorical variables. It identifies variables and
their categories that are highly relevant to the prediction,
reduces the number of variables and their categories in
prediction, and improves both efficiency and reliability of
prediction.
proportional reduction in error A criterion that has been
used for choosing among the various conventional measures
of association between variables.
Introduction
Until the 1980s, most statistical techniques were based
on elegant theory and analytical methods that worked well
Entropy-Based Relevance
Analysis
Creating a predictive model from a large data set is not
straightforward. Most of the variables are redundant or
irrelevant, so a preliminary task is to determine which
variables are likely to be predictive. A common practice
is to exclude independent variables with little correlation
to the dependent variable. A good start perhaps, but such
methods take little notice of redundancy among the
variables or of any relationship with the dependent variable involving more than one independent variable.
Moreover, categorical variables are often handled awkwardly. Entropy-based relevance analysis (ERA) is a simple method that identifies variables and their categories that are highly relevant to the prediction. It is based on the mutual information between a predictor X and the categorical dependent variable Y,

$$I(X;Y) = \sum_i \sum_j p_{ij} \ln \frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}}, \qquad (4)$$

where $p_{ij}$ is the proportion of observations in cell $(i, j)$ of the cross-tabulation of X and Y, and $p_{i\cdot}$ and $p_{\cdot j}$ are the marginal proportions. The relevance of X is measured by $I(X;Y)/H(Y)$, where $H(Y)$ is the entropy of Y, and the associated likelihood ratio statistic is $G^2 = 2n\, I(X;Y)$.
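Under the reconstruction above, the relevance measure and G² can be computed directly from a cross-tabulation. The following is a minimal sketch; the counts are invented for illustration.

```python
# A minimal sketch of the entropy-based relevance computation, assuming
# I(X;Y) = sum_ij p_ij ln(p_ij / (p_i p_j)), relevance = I(X;Y)/H(Y),
# and G^2 = 2 n I(X;Y). The counts below are made up.
import numpy as np

counts = np.array([[30.0, 10.0],   # rows: categories of predictor X
                   [20.0, 40.0]])  # cols: categories of outcome Y
n = counts.sum()
p = counts / n                     # joint proportions p_ij
px = p.sum(axis=1, keepdims=True)  # marginals p_i.
py = p.sum(axis=0, keepdims=True)  # marginals p_.j

nz = p > 0                                         # avoid log(0)
I = (p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()  # mutual information
H_y = -(py[py > 0] * np.log(py[py > 0])).sum()     # entropy of Y

print("relevance I(X;Y)/H(Y):", I / H_y)
print("G^2 = 2 n I(X;Y):", 2 * n * I)
```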
Decision Trees
Decision tree methods are both data mining techniques
and statistical models and are used successfully for
prediction purposes. Decision trees were developed
by Morgan and Sonquist in 1963 in their search for
the determinants of social conditions. In one example,
they tried to untangle the influence of age, education, ethnicity, and profession on a person's income. Their best regression contained 30 terms (including interactions) and accounted for only 36% of the variance. As an
Chi-Square Automatic
Interaction Detection
Chi-square automatic interaction detection (CHAID) is currently the most popular classification tree method. CHAID is much broader in scope than AID and can also be applied when the dependent variable is categorical. The algorithm used in the CHAID model splits
records into groups with the same probability of the outcome, based on values of independent variables. Branching may be binary, ternary, or more. The splits are
determined using the chi-squared test. This test is undertaken on a cross-tabulation between the dependent
variable and each of the independent variables. The
result of the test is a p-value, which is the probability
that the relationship is spurious. The p-values for each
cross-tabulation of all the independent variables are then
ranked, and if the best (the smallest value) is below
a specific threshold, then that independent variable is
chosen to split the root tree node. This testing and splitting is continued for each tree node, building a tree. As the
branches get longer, there are fewer independent
variables available because the rest have already been
used further up the branch. The splitting stops when
the best p-value is not below the specific threshold. The leaf nodes of the tree are nodes for which no split has a p-value below the threshold or for which all independent variables have already been used. Like entropy-based relevance analysis, CHAID also deals with a simplification of the categories of independent variables.
For a given r × c_j cross-table (r ≥ 2 categories of the dependent variable, c_j ≥ 2 categories of a predictor), the method looks for the most significant r × d_j table (1 ≤ d_j ≤ c_j). When there are many predictors, it is not realistic to explore all possible ways of reduction.
Therefore, CHAID uses a method that gives satisfactory
results but does not guarantee an optimal solution. This
method is derived from that used in stepwise regression
analysis for judging if a variable should be included or
excluded. The process begins by finding the two categories of the predictor for which the r × 2 subtable has the
lowest significance. If this significance is below a certain
user-defined threshold value, the two categories are
merged. This process is repeated until no further merging
can be achieved. In a following step, each resulting category composed of three or more of the original categories
is checked; if the most significant split of the compound
category rises above a certain chosen threshold value, the
split is carried into effect and the previous step is entered
again (this extra step ensures a better approximation of
[Figure: a feed-forward artificial neural network with two input nodes (1, 2), three hidden-layer nodes (3, 4, 5), one output node (6), and connection weights W13, W14, W15, W23, W24, W25, W36, W46, and W56.]
Entropy-based relevance analysis for influences on housing choice for previous renters

Independent variables                                      ERA      G²
Step 1A: Selection of the first variable
  Income (4 categories)                                    0.111    710.4
  Age of the head of household (4 categories)              0.064    409.5
  Size of household (4 categories)                         0.052    330.8
  Rent of previous dwelling (5 categories)                 0.039    248.8
  Type of housing market (4 categories)                    0.035    222.5
  Number of rooms in previous dwelling (4 categories)      0.024    156.2
  Type of previous dwelling (2 categories)                 0.015     93.7
  Tenure of previous dwelling (2 categories)               0.007     43.0
Step 1B: Income category simplification
  Income 1 | 2, 3, 4                                       0.103    658.8
  Income 1, 2 | 3, 4                                       0.088    561.0
  Income 1, 2, 3 | 4                                       0.091    581.9
Step 2A: Selection of the second variable
  Housing market                                           0.155    995.1
Step 2B: Simplification
  Housing market 1 | 2 | 3, 4                              0.152    973.2
Step 3A: Selection of the third variable
  Size of household                                        0.203   1301.3
Step 3B: Simplification
  Size of household 1, 2, 3 | 4                            0.199   1277.7
Step 4: Further simplification of income
  Income 1 | 2, 3, 4                                       0.192   1230.5

Adapted and modified from Clark et al. (1988), by permission of the Ohio State University Press.
optimal splits by maximizing the significance of the chi-square statistics at each step).

[Figure 3. Chi-square automatic interaction detection dendrogram for influences on housing choice for previous renters (hh, household). Figure values are the percentages moving to each destination category: 1, multifamily rent; 2, single-family rent; or 3, owner occupied; n is the sample size. Adapted and modified from Clark et al. (1988), by permission of the Ohio State University Press.]

For each of the categories of
an independent variable selected in a previous step, the
CHAID technique considers the most important predictor in the next step. Thus, the process nests the results
and the end product can be presented as a tree (Fig. 3). As
in ERA, in this example, the CHAID procedure selects
income as the most important predictor and does not
simplify the variable (only further splitting is shown for
the first two categories of income in this presentation).
Both housing market type and the size of the household
are important predictors at the second and third stages for
these low-income groups. Interesting additional information is contained in the way in which the two variables
alternate in their contributions at different levels. Because
the results from the CHAID analysis emphasize nesting of
the independent variables within categories, the results
suggest alternate lines of inquiry for a model of the data
structure.
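The split-selection step described above can be sketched in a few lines. This is a schematic illustration of the chi-squared ranking idea only, not the full CHAID algorithm (no category merging or stopping rules), and the data frame and column names are hypothetical.

```python
# Schematic sketch of CHAID-style split selection: cross-tabulate each
# categorical predictor against the dependent variable, compute a chi-squared
# p-value, and split on the predictor with the smallest p-value below a
# threshold. Column names and data are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

def choose_split(df: pd.DataFrame, target: str, predictors: list[str],
                 alpha: float = 0.05):
    best_var, best_p = None, alpha
    for var in predictors:
        table = pd.crosstab(df[var], df[target])
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < best_p:
            best_var, best_p = var, p_value
    return best_var, best_p  # best_var is None if nothing passes the threshold

df = pd.DataFrame({
    "income":  [1, 1, 2, 2, 3, 3, 4, 4, 1, 2, 3, 4],
    "hh_size": [1, 2, 1, 2, 3, 4, 3, 4, 2, 3, 4, 1],
    "tenure":  [0, 0, 0, 1, 1, 1, 2, 2, 0, 1, 2, 2],  # destination category
})
print(choose_split(df, target="tenure", predictors=["income", "hh_size"]))
```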
Further Reading
Abdi, H., Valentin, D., and Edelman, B. (1999). Neural
Networks, Quantitative Applications in the Social Sciences,
Vol. 124. Sage, London.
Causal Inference
Alberto Abadie
Harvard University, Cambridge, Massachusetts, USA
Glossary
assignment mechanism The process that determines which
units are exposed to a particular treatment.
covariate A variable not affected by the treatment.
experimental study A study that uses experimental data.
observational study A study that uses nonexperimental data.
outcome The variable possibly affected by the treatment.
treatment A variable, the effects of which are the objects of
study.
Causal inference comprises a set of tools that aid researchers in identifying and measuring causal relationships from data, using background knowledge or assumptions about the data generating process to disentangle causation from association.
Introduction
Establishing causal relationships is an important goal of
empirical research in social sciences. Unfortunately, specific causal links from one variable, D, to another, Y, cannot usually be assessed from the observed association
between the two variables. The reason is that at least
part of the observed association between two variables
may arise by reverse causation (the effect of Y on D) or
by the confounding effect of a third variable, X, on D
and Y.
Consider, for example, a central question in education
research: Does class size affect test scores of primary
school students? If so, by how much? A researcher
may be tempted to address this question by comparing
test scores between primary school students in large and
small classes. Small classes, however, may prevail in schools whose students differ in other ways that affect test scores, so the observed association need not reflect the effect of class size alone.
Causal Models

A Model of Potential Outcomes
Under random assignment of the treatment, the average treatment effect a0 can be estimated by the difference in average outcomes between treated and nontreated units:

$$\hat{a} = \frac{1}{n_1}\sum_{i:\,D_i=1} Y_i \;-\; \frac{1}{n_0}\sum_{i:\,D_i=0} Y_i,$$

where $n_1 = \sum_i D_i$ and $n_0 = n - n_1$. Then $\hat{a}$ is an unbiased estimator of a0. Usual two-sample testing methods can be applied to perform statistical inference about a0.
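As a minimal sketch, assuming simulated data from a randomized assignment (not the article's own example), the difference-in-means estimator and the usual two-sample test look as follows.

```python
# Difference-in-means estimator with a two-sample t test; data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
D = rng.integers(0, 2, size=200)        # randomized binary treatment
Y = 1.0 * D + rng.normal(size=200)      # outcome with true effect a0 = 1

a_hat = Y[D == 1].mean() - Y[D == 0].mean()
t, p = stats.ttest_ind(Y[D == 1], Y[D == 0], equal_var=False)
print(f"a_hat = {a_hat:.3f}, t = {t:.2f}, p = {p:.3f}")
```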
Selection on Observables
In the absence of experimental data, the independence
condition in Eq. (2) is rarely plausible. The reason is that treated and nontreated units may differ in characteristics, other than treatment exposure, that also have an effect on the outcome variable, so Eq. (2) holds only for fixed
values of those characteristics. In statistical jargon, those
characteristics are called confounders. Let X be the vector
of confounders, then:
$$(Y_1, Y_0) \perp D \mid X.$$

Under this condition, treatment effects can be recovered by averaging the conditional contrast E[Y | X, D = 1] − E[Y | X, D = 0] over the appropriate distribution of the confounders:

$$a_{ATE} = \int \{E[Y \mid X, D=1] - E[Y \mid X, D=0]\}\, dP(X) \qquad (5)$$

and

$$a_{SATE} = E[Y_1 - Y_0 \mid D=1] = \int \{E[Y \mid X, D=1] - E[Y \mid X, D=0]\}\, dP(X \mid D=1). \qquad (6)$$
Matching Estimators When X is discrete and takes on
a small number of values, it is easy to construct estimators
of aATE and aSATE based on Eqs. (5) and (6). Suppose that
$(Y_1, Y_0) \perp D \mid X$; then, by Eq. (5),

$$a_{ATE} = E\left[\frac{Y\,(D - p(X))}{p(X)\,\{1 - p(X)\}}\right],$$

where $p(X) = P(D = 1 \mid X)$ is the propensity score. The sample analog estimator is

$$\hat{a}_{ATE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\{D_i - \hat{p}(X_i)\}\, Y_i}{\hat{p}(X_i)\,\{1 - \hat{p}(X_i)\}},$$

where $\hat{p}(X)$ is an estimator of the propensity score. Alternatively, regression estimators minimize

$$\sum_{i=1}^{n} \{Y_i - g(D_i, X_i; \theta)\}^2 \quad \text{or} \quad \sum_{i=1}^{n} \{Y_i - a D_i - X_i' b\}^2.$$
With a binary instrument Z, the instrumental variables (Wald) estimand is

$$\frac{\mathrm{Cov}(Y, Z)}{\mathrm{Cov}(D, Z)} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]} = E[Y_1 - Y_0 \mid D_1 > D_0],$$

the average treatment effect for those whose treatment status is changed by the instrument (the compliers).
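A sample-analog sketch of the Wald estimator, on simulated data with an invented data generating process in which the instrument shifts treatment monotonically:

```python
# Wald/IV estimate: ratio of the reduced-form and first-stage contrasts.
import numpy as np

rng = np.random.default_rng(2)
n = 10000
Z = rng.integers(0, 2, size=n)                       # binary instrument
U = rng.normal(size=n)                               # unobserved confounder
D = (0.5 * Z + 0.5 * U + rng.normal(size=n) > 0.5).astype(int)
Y = 1.5 * D + U + rng.normal(size=n)                 # effect for compliers = 1.5

wald = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (D[Z == 1].mean() - D[Z == 0].mean())
print(f"Wald/IV estimate = {wald:.3f}")
```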
[Figure: regression discontinuity design, plotting E[Y | X, D] with E[Y0 | X] in the D = 0 region and E[Y1 | X] in the D = 1 region. Identification of a(X) for values of X away from the discontinuity point is done by pure extrapolation of the regression lines.]
Bounds When the outcome variable is bounded (when
it has lower and upper bounds), the effect of the treatment
can be bounded even if no other identification condition
holds.
Here we focus on a simple case, in which the dependent variable is binary, so the lower bound is 0 and the
upper bound is 1. For each individual, we observe Y
and D, so we can estimate E[Y1 j D 1]( E[Y j D 1]),
E[Y0 j D 0]( E[Y j D 0]) and P(D 1). We cannot
estimate E[Y1 j D 0] or E[Y0 j D 1], but we know
that they are in between 0 and 1. We can use this fact
to bound SATE and ATE:
$$E[Y \mid D=1] - 1 \;\le\; \mathrm{SATE} \;\le\; E[Y \mid D=1]$$

and

$$\{E[Y \mid D=1] - 1\}\, P(D=1) - E[Y \mid D=0]\, P(D=0) \;\le\; \mathrm{ATE} \;\le\; E[Y \mid D=1]\, P(D=1) - \{E[Y \mid D=0] - 1\}\, P(D=0).$$
As always, we can estimate the bounds using sample
analogs. It can be easily seen that, for a binary outcome
variable, the width of the bounds is 1. Manski discusses
restrictions that can help narrow the bounds further.
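A minimal sketch of these bounds computed from sample analogs on simulated binary data:

```python
# Bounds for a binary outcome from sample analogs of E[Y|D=1], E[Y|D=0],
# and P(D=1); both intervals have width 1. Data are simulated.
import numpy as np

rng = np.random.default_rng(3)
D = rng.integers(0, 2, size=1000)
Y = rng.binomial(1, 0.3 + 0.3 * D)         # binary outcome

ey1, ey0, p1 = Y[D == 1].mean(), Y[D == 0].mean(), D.mean()
sate_bounds = (ey1 - 1, ey1)
ate_bounds = ((ey1 - 1) * p1 - ey0 * (1 - p1),
              ey1 * p1 - (ey0 - 1) * (1 - p1))
print("SATE bounds:", sate_bounds)
print("ATE bounds:", ate_bounds)
```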
Longitudinal Data
This section presents estimation techniques that are
especially suitable for longitudinal data.
Difference-in-Differences and Fixed Effects We
have studied how to control for observed differences between treated and controls. However, often there are
reasons to believe that treated and nontreated differ in
unobservable characteristics that are associated with potential outcomes even after controlling for differences in
observed characteristics.
Causal Inference
fYi 1 Yi 0g
fY i 1 Y i 0 g
n1 D 1
n0 D 0
i
10
11
E[Y(1) | D = 1]
E[Y1(1) Y0(1) | D = 1]
E[Y0(1) | D = 1]
E[Y(0) | D = 1]
E[Y(1) | D = 0]
E[Y(0) | D = 0]
265
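A minimal sketch of this estimator on simulated two-period data (the data generating process is invented for illustration):

```python
# Difference-in-differences: change for treated minus change for nontreated.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
D = rng.integers(0, 2, size=n)
group_gap = 0.8 * D                                   # time-invariant difference
Y0 = group_gap + rng.normal(size=n)                   # period 0 (pre-treatment)
Y1 = group_gap + 0.5 + 2.0 * D + rng.normal(size=n)   # period 1; true effect = 2

did = (Y1[D == 1] - Y0[D == 1]).mean() - (Y1[D == 0] - Y0[D == 0]).mean()
print(f"difference-in-differences estimate = {did:.3f}")
```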
$$E[Y_{d_0 d_1}] = \iint E[Y \mid X_0, X_1, D_0 = d_0, D_1 = d_1]\; dP(X_1 \mid X_0, D_0 = d_0)\; dP(X_0)$$
for d0 and d1 in {0, 1}. This formula is sometimes
referred to as the G-formula or G-computation formula.
In principle, average potential outcomes may be estimated nonparametrically based on the G-computation
formula. Comparisons of average potential outcomes
inform us about different treatment effects (e.g., the
average effect of one additional period of treatment
given treatment in the first period is E[Y11 Y10]).
Based on these ideas, Robins has developed statistical
models for characteristics of the marginal distributions
of the potential outcomes (marginal structural models)
and models for the effects of additional periods of
treatment (structural nested models).
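A schematic sketch of nonparametric estimation based on the G-computation formula for two periods, using sample frequencies for the conditioning distributions; the data frame and column names are hypothetical, and cells with no observations are simply skipped:

```python
# G-computation for two periods with a binary time-varying covariate:
# average E[Y | X0, X1, D0=d0, D1=d1] over P(X1 | X0, D0=d0) and P(X0),
# all estimated by sample means and frequencies. Data are invented.
import pandas as pd

def g_formula(df: pd.DataFrame, d0: int, d1: int) -> float:
    total = 0.0
    for x0, p_x0 in df["X0"].value_counts(normalize=True).items():
        sub0 = df[(df["X0"] == x0) & (df["D0"] == d0)]
        for x1, p_x1 in sub0["X1"].value_counts(normalize=True).items():
            cell = sub0[(sub0["X1"] == x1) & (sub0["D1"] == d1)]
            if len(cell):                      # skip empty cells in this sketch
                total += p_x0 * p_x1 * cell["Y"].mean()
    return total

df = pd.DataFrame({
    "X0": [0, 0, 1, 1, 0, 1, 0, 1],
    "D0": [0, 1, 0, 1, 0, 1, 1, 0],
    "X1": [0, 1, 0, 1, 1, 1, 0, 0],
    "D1": [0, 1, 0, 1, 1, 0, 1, 0],
    "Y":  [1.0, 3.0, 1.5, 3.5, 2.0, 2.5, 2.8, 1.2],
})
# Average effect of one additional period of treatment given treatment in the
# first period, E[Y_11 - Y_10], as in the text:
print(g_formula(df, 1, 1) - g_formula(df, 1, 0))
```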
[Figure 2. Graphical interpretation of difference-in-differences: the outcomes E[Y(0) | D = d] and E[Y(1) | D = d] are plotted for the treated (D = 1) and nontreated (D = 0) groups at t = 0 and t = 1; the gap E[Y1(1) − Y0(1) | D = 1] between the observed and counterfactual treated outcomes at t = 1 is the effect of the treatment on the treated.]
Further Reading
Abadie, A. (2003). Semiparametric instrumental variable estimation of treatment response models. J. Econometrics 113, 231–263.
Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. J. Am. Stat. Ass. 91, 444–472.
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438.
Hahn, J., Todd, P. E., and van der Klaauw, W. (2000). Identification and estimation of treatment effects with a regression discontinuity design. Econometrica 69, 201–209.
Härdle, W. (1990). Applied Nonparametric Regression. Econometric Society Monograph 19. Cambridge University Press, Cambridge, UK.
Heckman, J. J. (2000). Causal parameters and policy analysis in economics: A twentieth century retrospective. Q. J. Econ. 115, 45–97.
Heckman, J. J., Ichimura, H., and Todd, P. E. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Rev. Econ. Studies 64, 605–654.
Heckman, J. J., and Vytlacil, E. J. (1999). Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc. Natl. Acad. Sci. U.S.A. 96, 4730–4734.
Holland, P. W. (1986). Statistics and causal inference. J. Am. Stat. Ass. 81, 945–960.
Stephen E. Fienberg
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Glossary
accuracy A measure of the closeness of an estimate to the
true value.
contingency table A cross-classified table of counts according to two or more categorical variables.
dual systems estimation A method for combining information from two sources to estimate a population total,
including an estimate for the number of individuals missed
by both sources.
gross error The total error in an estimate of a population
total from a census, consisting of the sum of the errors of
omission and the errors of commission; also known as
erroneous enumerations.
imputation A statistical method for filling in values for missing data. In the context of census-taking, missing questionnaire values are imputed, as is the information for entire households if the Census Bureau believes them to be occupied.
mail-out/mail-back The primary method of collecting census
information in the United States. Questionnaires are mailed
out to all households listed in a master address file compiled
by the Census Bureau; recipients fill them out and then
return them by mail.
measurement error A variety of sources of error in the
census enumeration process that cause the reported
enumeration values to differ from the true values.
postenumeration survey A sample survey conducted after
the census to provide a second source of information on
households that can be used in dual systems estimation.
The U.S. census enumeration process both misses households and individuals within them and erroneously includes many individuals through duplication and other
types of errors. Since 1940, the Census Bureau has
worked to develop methods for correcting census error,
primarily in the form of omissions. The size of census
errors differs across population subgroups, leading to
what has been known as the census differential net undercount. Statistically adjusting census enumeration
counts for net undercount has been the topic of much
controversy. The sources of census error and the methods
that have been used to estimate the true population totals
by geographic area and demographic groups are reviewed
in this article.
Introduction
Taking an accurate and efficient population census of
a large, rapidly growing, diverse, and mobile population
such as that of the United States is fraught with difficulty.
The results of the count are used to allocate political
representation in Congress, tax funds to local areas,
and votes in the Electoral College, and to design legislative districts at federal, state, and local levels of government. Thus, the adequacy of the census is no mere
academic question. A local area overcounted or undercounted relative to others gets too much or too little
political representation and tax revenue.
From the late 1960s to the early 2000s, the census faced charges that it did not count well enough to serve its political functions, and in particular that it was biased
10. Coding and processing errors: After a census questionnaire is completed and returned to the Bureau, the
information must be transferred to a computer file and
subsequently processed and checked for consistency.
Errors creep into the official files despite the new technologically based methods for data capture.
11. Geographic coding errors: These errors occur at
various stages in the process, from the compilation of
initial mailing lists as part of the Topologically Integrated
Geographic Encoding Reference (TIGER) system all the
way through coding. In 1990, local communities complained that the census master address file missed housing
units. These errors could result from geographic coding
problems, thus placing an address in the wrong place.
Some of these errors, presumably, lead to geographic
coding errors in census records. Despite all of the corrections that occurred throughout the various forms of
data review, residual geocoding errors place people in the
wrong census blocks.
12. Fabrication: In every census, there are anecdotal
reports of enumerators "curbstoning," i.e., fabricating
questionnaires for real and often imaginary households,
and of respondents providing fabricated information. The
Bureau has a variety of methods to catch such fabrication,
but inevitably, a substantial number of fabricated questionnaires are included in the official census results.
13. Last resort information: Some questionnaires
are actually filled out as a last resort by enumerators without their directly seeing or interviewing any household
occupants. The household may not submit a form, and the
enumerator fails after repeated tries to contact the household. In such cases, enumerators use information from
mail carriers, neighbors, or building managers, that is,
proxy respondents, that may be inaccurate, incomplete,
or even intentionally false.
14. Imputation: The Census Bureau uses various forms
of statistical estimation (based either explicitly or implicitly
on statistical models) to fill in values for missing data. This
process of filling in is usually referred to as imputation. If
the Bureau had no information about the occupancy status
of a housing unit (e.g., from last-resort methods), it imputed a status to it, i.e., either occupied or vacant. If a unit
was imputed as occupied, or if the Bureau otherwise believed it to be occupied, then it imputed a number of people
to the household, as well as their characteristics. The current method of choice for imputation, known as the sequential hot-deck procedure, selects a housing unit
proximate in processing as the donor of the characteristics.
The assumption in the statistical model underlying the
imputation method is that neighboring housing units are
likely to have similar characteristics. In 1980, the Bureau
added 3.3 million people to the census through imputation.
Of these, 762,000 were added into housing units for which
the Bureau had no knowledge concerning whether the
units were occupied. The 1990 census had a much lower
Census year               1980     1990     2000
Erroneous enumerations     6.0     10.2     12.5
Omissions                  9.2     15.5     15.8
Gross error               15.2     25.7     28.3
Table III  Census (Day 1 Count) by Recount (Day 2 Count), Total and by Race

Total
                Day 2: In    Day 2: Out    Total
Day 1: In          250           150         400
Day 1: Out          50             ?           ?
Total              300             ?          ??

Black
                Recount: In    Recount: Out    Total
Census: In          214             146          360
Census: Out          48               ?            ?
Total               262               ?           ??

Non-Black
                Recount: In    Recount: Out    Total
Census: In           36               4           40
Census: Out           2               ?            ?
Total                38               ?           ??
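The "?" cells are precisely what dual-systems estimation fills in. Under the usual independence assumption, the estimated total population is (Day 1 total x Day 2 total) / (number counted on both days), the Lincoln-Petersen form of the estimator. A minimal sketch using the figures in Table III (function and variable names are ours):

```python
def dse_total(day1_total, day2_total, counted_both):
    """Dual-systems (Lincoln-Petersen) estimate of the total population:
    N_hat = (Day 1 count * Day 2 count) / (counted on both days)."""
    return day1_total * day2_total / counted_both

print(dse_total(400, 300, 250))   # total:      480.0
print(dse_total(360, 262, 214))   # Black:      ~440.7
print(dse_total(40, 38, 36))      # non-Black:  ~42.2
```

Note that the race-specific (stratified) estimates need not sum exactly to the pooled estimate, which is one reason coverage evaluation is carried out within population groups.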
Further Reading
Anderson, M. (1988). The American Census: A Social History.
Yale University Press, New Haven.
Anderson, M. (ed.) (2000). Encyclopedia of the U.S. Census.
CQ Press, Washington, D.C.
Anderson, M., Daponte, B. O., Fienberg, S. E., Kadane, J. B.,
Spencer, B. D., and Steffey, D. L. (2000). Sampling-based
Glossary
economic census A statistical program designed to obtain,
categorize, and publish information every 5 years (in years
that end in 2 and 7) for nearly all U.S. businesses.
enterprise An organization (company) that comprises all of
the establishments that operate under the ownership or
control of a single organization.
establishment A single physical location where business is
conducted or where services or industrial operations are
performed.
industry A detailed category in a larger classification system
used to describe and group business activities.
merchandise line A numerical system used to group major
categories of merchandise sold.
North American Industry Classification System (NAICS) A
numerical classification scheme jointly developed by
Mexico, Canada, and the United States to facilitate the
collection, tabulation, presentation, and analysis of data on
establishments.
retail trade The sector made up of businesses engaged in selling merchandise, usually without transformation, and providing services incidental to the sale of the merchandise.
sector classification A numerical system used to group and
describe similar businesses.
Standard Industrial Classification (SIC) A numerical
scheme or code utilized to classify industries and products;
superseded in 1997 by a new method, the North American
Industry Classification System.
step in the process of distributing merchandise to customers, usually following manufacturers and wholesalers. They sell products in relatively small quantities to consumers. Retailers vary greatly in size, from the one- or two-person mom-and-pop storefront to the hundreds of persons who make up the staff of each SuperTarget store. Chains of retailers often dominate local and regional markets as well as the channels of merchandise distribution (e.g., Wal-Mart). Retailers also vary with respect to the method by which customers are reached. That is, there are both store and nonstore retailers, with the latter selling merchandise using direct-response advertising, telemarketing, broadcast infomercials, catalogues, door-to-door sales, and the Internet, among other methods of reaching customers.
employers and a sample of small employers, i.e., single-establishment firms with a payroll below a fixed point,
received mail surveys and the appropriate follow-up.
Data for the nonmail universe were derived or estimated
from administrative records of other federal agencies.
Published reports in electronic format include those on
merchandise line sales, establishment and firm size, and
miscellaneous subjects.
Market Share
Market share refers to the percentage of sales of a product
in units, dollars, or some other meaningful measure made
by a business relative to all sales of that product, usually
for a specific geographic unit and always for a specific time
period, e.g., December 2003. It is useful to know absolute
sales and trends in sales for a business, but the value of
these data can be extended greatly if market share is calculated. So, if a local clothing store has experienced
a recent increase in sales, e.g., a 50% rise over the past
4 years, but market share has declined, then it may be the
case that even though a favorable demographic profile
Assess Competition
Although the Economic Census, Retail Trade data do not make available the names and associated data for
individual competing businesses, the available data can
be very useful, when combined with other information, in
understanding and measuring the effects of competitor
strength. Telephone business listings, Chamber of Commerce memberships, and other readily available data can
provide basic information on new and old competitors,
including their location, within a particular market. The
same sources can provide data on business expansions,
e.g., new locations, as well as on closings. Figures for
individual businesses (numerator) and U.S. Census
Bureau retail trade data (denominator) can be used to
calculate market share, but in the context of additions to and exits from the market. If a mom-and-pop grocery store loses 30% of its market share in Jefferson County between 1997 and 2002, and in 1999 a new Wal-Mart store opens in Jefferson County, the likely reason for the decline in market share would be the presence of the new Wal-Mart store, provided that other business conditions in the market remained basically the same.
Site Location
Retail trade data can be combined with other demographic indicators as well as information on existing
businesses to make decisions about where physically to
locate a new business. Areas with lower than expected per
capita or per household retail sales, accounting for income
variations and the location of competitors, present
[First table, partially recoverable from extraction:]

                    1997          1992
Establishments      11,268        11,375
Sales ($1000)       16,350,932    11,521,818
Paid employees      149,478       132,157

[Two further rows of figures (1,799,417; 1,307,961) and a set of percentages (0.9/2.3, 41.9/34.4, 37.6/30.4, 13.1/15.0) belong to headings that did not survive extraction. A second table gave establishments and sales ($1000s) by business type for 1982, 1987, 1992, and 1997; its business-type labels were likewise lost.]
Market Share
To calculate 1997 market share for a furniture and home
furnishings store in the Omaha, Nebraska MSA, two sets
of data are needed (Table III). The data in Table III show
that in 1997, the Omaha Furniture Company (fictitious
company) held nearly 12% of the Omaha MSA furniture
and home furnishings market. An expanded analysis
would help document trends in market share (i.e., comparison data for 1992 and 2002), determine what percentage of the Omaha MSA annual payroll and paid employees it took to support the 11.7% market share, and identify what percentage of the Omaha Furniture Company's sales came from outside the MSA (data from company records).
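The underlying arithmetic is a single ratio. A short sketch using the Table III figures (function and variable names are ours):

```python
def market_share(company_sales, market_sales):
    """Market share: a business's sales as a percentage of all sales
    of the product in the same area and time period."""
    return 100 * company_sales / market_sales

# Table III figures ($1000s): Omaha MSA furniture/home furnishings
# sales, and the (fictitious) Omaha Furniture Company's sales
print(f"{market_share(66_500, 566_775):.1f}%")   # -> 11.7%
```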
Pull Factors
One way to estimate the extent to which a community or
place draws its customers from outside its boundaries is to
calculate a pull ratio. Pull ratios are normally produced
with a specific area of retail trade in mind. For the
following example, the focus is on the extent to which
Douglas County, Nebraska draws its customers for new
automobiles from outside the county. Two sets of data
are required to calculate the pull ratio for the scenario
Table III  Furniture and Home Furnishings Sales Market Share for a Furniture Company, 1997

Coverage               Furniture/home furnishings sales ($1000)
Omaha MSA              566,775
Omaha Furniture Co.    66,500
Market share           11.7%
described (Table IV). A set of straightforward manipulations is required to arrive at the pull ratio:

Pull factor = (Douglas County per capita new auto sales / state per capita new auto sales)
            x (state per capita income / county per capita income)
            = (2.284 / 1.904) x (19,613 / 22,879)
            = 1.199 x 0.857
            = 1.027
The first component of the pull factor, the new auto sales
ratio (county per capita sales/state per capita sales),
produces an index of per capita sales using the state per
capita figure as the standard or average figure. Any ratio
larger than 1.0 means that Douglas County new automobile sales exceed the statewide average; in the example
here, a ratio of 1.199 means that per capita sales in Douglas
County are nearly 20% greater than the state average.
However, the first ratio must be adjusted for income
differences, and in this instance, state per capita income
is less than that of Douglas County. That is, purchasing
power for the state is lower than that for Douglas County.
The new ratio, 1.027, means that Douglas County pulls
2.7% more new automobile sales than would be expected
based on population and income data alone.
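The same computation restated compactly (a sketch; the function and argument names are ours):

```python
def pull_factor(county_pc_sales, state_pc_sales,
                state_pc_income, county_pc_income):
    """Pull factor: the county/state per capita sales ratio, adjusted
    by the state/county per capita income (purchasing power) ratio."""
    return (county_pc_sales / state_pc_sales) * \
           (state_pc_income / county_pc_income)

# Douglas County, Nebraska: new automobile sales (per capita, $1000s)
print(round(pull_factor(2.284, 1.904, 19_613, 22_879), 3))
# -> 1.028 (the text reports 1.027 because it rounds the two
#    intermediate ratios to 1.199 and 0.857 before multiplying)
```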
The analysis can be expanded to include multiple points in time to determine if pull is increasing, decreasing, or staying the same. The pull factor can be calculated for different retail NAICS categories (e.g., gasoline stations and used-car dealers) to determine where retail pull is the greatest and least in Douglas County. Moreover, calculating pull factors for numerous NAICS categories and several counties can provide valuable information on dominant markets for different retail sectors.
Table IV  New Automobile Sales, Population, and Per Capita Income for Douglas County and Nebraska

Region           Automobile sales ($1000)   Population   Per capita income
Douglas County   1,007,465                  441,006      $22,879
Nebraska         3,155,814                  1,657,000    $19,613
Transforming absolute sales figures into per capita measures offers a way to make reasonable comparisons. Three sets of data are used in Table V for illustration purposes. Per capita furniture and home furnishings sales are $1251, $61, and $443 for Douglas County, Sarpy County, and Nebraska, respectively. Sales-per-household totals are $3030, $167, and $1103 for Douglas County, Sarpy County, and Nebraska, respectively. The large discrepancy in
per capita figures clearly shows how dominant Douglas
County is with regard to furniture and home furnishings
sales. In fact, the Douglas County pull factor for furniture
and home furnishings is quite high, 2.420! As it turns out,
Douglas County furniture and home furnishing retailers
draw customers from a five-state market area. The market
is dominated by one retailer, Nebraska Furniture Mart.
Table V  Furniture and Home Furnishings Sales, Population, and Households for Douglas County, Sarpy County, and Nebraska

Region           Furniture and home furnishings sales ($1000)   Population   Households
Douglas County   552,054                                         441,006      182,194
Sarpy County     7,278                                           118,571      42,426
Nebraska         734,973                                         1,657,000    666,184

Sources: U.S. Bureau of the Census (1999), 1997 Economic Census, Retail Trade, Nebraska (Table 3), and U.S. Bureau of the Census (2002), Census 2000, Nebraska (Tables 1 and 3), U.S. Government Printing Office, Washington, D.C.
The high per capita and per household furniture and home furnishings sales figures for Douglas County, and the comparatively low per capita and per household numbers for Sarpy County, suggest at first examination significant market potential in Sarpy County.
Although some potential probably exists, the dominance
of Douglas County retailers for furniture and home
furnishings sales has a long history, and Douglas County
competition for a new Sarpy County furniture retailer is
substantial.
Focusing on a different retail category, gasoline stations, other comparisons are possible. All three of the counties have per capita gasoline station sales figures that are substantially smaller than the figure for the entire state (Table VI). All other factors being equal, if Sarpy County residents spent as much per capita as did residents statewide, there would be $106,065,000 in sales (86,232 x 1.23). The difference between the two figures, nearly
$20 million, represents potential new sales. Other confounding factors that affect the demand for the products sold, as well as where people stop to purchase gasoline and related products and services, e.g., commuting patterns, drive-to-work distance, and the range of products/services offered at gas stations, must be accounted for before business decisions are made. It is easier to establish clear sales
potential in markets that are not adjacent to others, especially if the products are not sold via the Internet or by
direct sales.
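The "sales gap" logic in the gasoline station example reduces to one line of arithmetic. A sketch with the Table VI figures (names ours):

```python
def sales_potential(actual_sales, state_county_ratio):
    """Potential new sales if residents spent at the statewide per
    capita rate: actual sales times the state/county ratio, minus
    actual sales."""
    return actual_sales * state_county_ratio - actual_sales

# Sarpy County gasoline stations: $86,232 thousand actual, ratio 1.23
print(round(sales_potential(86_232, 1.23)))   # -> 19833 ($1000s), nearly $20 million
```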
An alternative way of exploring retail opportunities is
to examine retail supply in one or more market areas.
Table VII shows data for seven Nebraska counties with respect to food and beverage stores (NAICS 445). Although
the range in the number of establishments is relatively
small (11), there is substantial variation in the average
number of persons employed in those stores. On average,
the stores are small in Antelope and Boyd counties and
they are larger in Adams and Platte counties. Extended
analyses in each county would identify individual stores to
determine if one or two dominate the market. By combining contiguous counties or places (cities) that were in
close proximity, the market potential for a new competitor
could be determined.
In conclusion, data from the Economic Census, Retail Trade are used for a wide range of business and government purposes. When combined with other U.S.
Table VI  Gasoline Station Sales and Population for Douglas County, Sarpy County, Washington County, and the State of Nebraska

Region              Sales ($1000s)   Population   Per capita sales   State/county ratio
Douglas County      306,109          441,006      $694               1.29
Sarpy County        86,232           118,571      $727               1.23
Washington County   10,826           18,470       $586               1.55
Nebraska            1,488,262        1,657,000    $898

Sources: U.S. Bureau of the Census (1999), 1997 Economic Census, Retail Trade, Nebraska (Table 3), and U.S. Bureau of the Census (2002), Census 2000, Nebraska (Tables 1 and 3), U.S. Government Printing Office, Washington, D.C.
Table VII  Food and Beverage Stores (NAICS 445) in Seven Nebraska Counties

Establishments   Paid employees   Employees per establishment
16               471              29.4
10               73               7.3
10               177              17.7
6                32               5.3
6                136              22.6
14               472              33.7
5                118              23.6

[The county labels for the rows did not survive extraction; the text identifies Antelope and Boyd counties as having the smallest average store size and Adams and Platte counties the largest.]

Source: U.S. Bureau of the Census (1999), 1997 Economic Census, Retail Trade, Nebraska (Table 3), U.S. Government Printing Office, Washington, D.C. NAICS, North American Industry Classification System.
Further Reading
Ahmed, S. A., Blum, L. A., and Wallace, M. E. (1998). Conducting the economic census. Govt. Informat. Q. 15, 275-302.
Casey, D. M. (2002). U.S. Retail Sales, Mall Sales, and Department Store Sales Review. International Council of Shopping Centers, New York.
Dumas, M. W. (1997). Productivity in two retail trade industries: 1987-95. Month. Labor Rev. 120, 35-39.
Glossary
long-form items Additional items asked of a sample of
households and individuals on the census long-form
questionnaire (the long form also includes the short-form
items).
population coverage Census count divided by the census
count plus the estimated net undercount (people missed in
the census minus duplicates and other erroneous enumerations).
public use microdata sample (PUMS) files Contain records
for households and people sampled from census long-form
records, processed to protect confidentiality.
short-form items Basic demographic items asked of everyone; short-form items are included on the short-form and
long-form questionnaires.
summary (SF) files Contain census tabulations for geographic areas, down to the block level (short-form tabulations) and the block group level (long-form tabulations).
The U.S. decennial census (hereafter, census) is conducted every 10 years as required by Article I of the Constitution. The first census in 1790 obtained minimal information for each household. The 2000 census ascertained six basic items, plus name, for everyone (short-form
items) and more than 60 population and housing items for
approximately one in six households on long-form questionnaires. (The number of long-form items exceeded the
number of questions, some of which had multiple parts.)
The data are available in tabular and microdata formats.
The basic data serve constitutional purposes of reapportioning the House of Representatives, drawing new legislative district boundaries, and enforcing provisions of
the Voting Rights Act. The short-form and long-form
data are widely used by federal, state, and local governments, the private sector, academia, the media, and the
Population Estimates
Another important census data product comprises regularly updated small-area population estimates that the
Census Bureau develops based on each census. The Bureau produces estimates of total population by single years
of age, sex, race, and Hispanic origin on a monthly basis
for the United States and annually for states and counties
as of July 1 of each year. The Bureau also produces total
population estimates every 2 years for incorporated places
and minor civil divisions of counties (in states that have
such divisions). The Bureau has begun producing biennial
estimates of total population and children aged 5 through
17 for school districts. Population estimates are produced
by updating the census year figures with data from such
sources as birth and death records and immigration
statistics.
Constitutional Purposes
Reapportionment
The decennial census plays a fundamental role in the U.S.
political system by providing population counts for each
Americans Overseas
Rules for counting Americans who live overseas have
varied from census to census. In 1970, 1990, and
2000, federal military and civilian employees (and their
dependents) living abroad were assigned a home state
from administrative records and included in the state reapportionment counts, but not in other data releases. This
overseas population totaled 576,000 people in 2000 (0.2%
of the U.S. population). Tests will be conducted in 2004 of
counting private U.S. citizens who live overseas.
Population Coverage
Research on census coverage, which began with analyses
of undercounts of draft-age men and young children
in the 1940 census, estimated a net undercount of the
population in every census from 1950 to 1990. Research
also estimated higher net undercount rates for some population groups than others, such as higher rates for blacks
than for nonblacks and children than for adults. Beginning
in 1970, the Census Bureau made special efforts to cover
hard-to-count population groups, although research
showed that such efforts were only partly effective. Beginning in 1980, the Bureau worked on a dual-systems estimation (DSE) method, based on data from a post-enumeration survey and a sample of census records, that could be used to statistically adjust census counts for
measured net undercount. The Bureau originally planned
to use DSE methods to adjust 2000 state population totals
for congressional reapportionment, but a January 1999
decision by the U.S. Supreme Court precluded such adjustment. The Bureau also planned to adjust 2000 counts
for other purposes, but that was not done.
Fund Allocation
Over $200 billion of federal funds are allocated each year
to states and localities by formulas, many of which use
census data and population estimates. Examples include
the following:
(1) Medicaid ($145 billion obligated in fiscal year 2002): Reimburses a percentage of each state's expenditures for medical care services for low-income elderly and disabled people and families with dependent children by a formula that uses per capita income estimates. The U.S. Bureau of Economic Analysis develops these estimates with data from administrative records, the decennial census long-form sample and other censuses and surveys, and census-based population estimates (as denominators). Other programs use the Medicaid formula.
(2) Title I of the Elementary and Secondary Education Act ($9.5 billion obligated in fiscal year 2002): Allocates funds to school districts to help educationally disadvantaged children by a formula that includes estimates of school-age children in families with incomes below the official poverty threshold. The estimates were previously derived from the most recent census long-form sample; currently, they derive from Census Bureau statistical models, which include census poverty data as one input.
(3) Community Development Block Grants and Entitlement Grants ($3 billion authorized in fiscal year 2002):
Allocates the larger amount from two formulas to states,
metropolitan cities, and urban counties; the formulas use
census data on total population, people living in poverty,
overcrowded housing (first formula), and housing built
before 1940 (second formula).
Formula Allocation
Examples include using census estimates of children in
families with income below the poverty line in a formula to
allocate state child welfare block grants to counties and
using census estimates of female-headed households
living in poverty with dependent children in a formula
to allocate state crisis counseling funds to counties.
Facility Planning
Examples include using long-form data on commuting
patterns to help redesign bus routes and plan new roads
and using census socioeconomic data and administrative
records data on public health needs to locate health clinics.
Disaster Planning
Examples include using long-form data on vehicle ownership and disability to estimate the numbers of people
who would need to be evacuated in a disaster and how
many might need transportation and using long-form
place of work and place of residence data to estimate
daytime populations to develop disaster plans for employment centers.
Research Uses
Researchers use census data for analyses on topics such as
aging, educational attainment, migration flows, environmental exposures, and concentrated poverty. Many research studies based on census data have important
public policy implications.
Aging
Census summary tabulations support analyses of
migration flows and concentrations of the elderly by
geographic area for subgroups defined by living
arrangements, income, labor force attachment, and
other characteristics. PUMS files support detailed
Education
Researchers use census summary data on median years of
school completed, average income, and unemployment to
describe challenges facing communities; they use PUMS
files to assess age and race group differences in educational level, income, and housing quality.
Environment
Census small-area socioeconomic information, linked to data on environmental hazards, permits analyses of environmental effects on different population groups.
Migration
Census small-area data are an unparalleled resource for
the study of migration among regions, states, counties,
and places, and consequences for different parts of the
country. The Census Bureau publishes data on county-tocounty migration flows; the PUMS files permit detailed
analysis of the characteristics of long-distance and shortdistance movers and nonmovers.
Poverty
Analyses of 1970 and 1980 census summary data revealed
large increases in densely populated (mainly black) urban
neighborhoods with more than 40% poor families. These
findings stimulated further census-based research on
concentrated urban poverty.
Timeliness
Census data are collected once every 10 years and made
available 1 to 3 years after collection, which may affect
analyses for areas experiencing rapid change. Population estimates update the basic demographic information for small areas; the American Community Survey (ACS) is intended to provide updated long-form-type information if it is implemented.
Coverage
The census misses some people, duplicates others, and
puts others in the wrong location. Although the census is
the best source of population counts (it has better coverage than household surveys), recognition of coverage problems is important when using the data.
Nonresponse Error
The census strives to obtain complete data for everyone
but, even after follow-up, obtains no or limited information for some households. Imputation methods use data
from nearby households to supply records for whole-household nonrespondents and to fill in individual missing items on person and housing unit records. Missing
data rates are high for some long-form items, such as
income and housing finances, and for residents of
group quarters. Imputation introduces variability in
estimates and may bias estimates if nonrespondents differ
from respondents in ways that the imputation procedures
do not reflect.
Further Reading
Anderson, M. J. (1988). The American Census: A Social
History. Yale University Press, New Haven, CT.
Baker, G. E. (1986). Whatever happened to the reapportionment revolution in the United States? In Electoral Laws
Glossary
behaviorism In psychology, the study of human beings in terms of observable behavior, characteristically stressing stimulus and response, without reference to values, beliefs, and attitudes.
behaviorism, pluralistic In sociology, the view that the
individual and interaction among individuals constitute the
basic units of analysis, and that social structures consist of
externally similar acts.
cultural lag A situation where elements of a culture change at
different rates causing disruption, typically when science
and technology change more rapidly than social institutions.
eugenics The study of human heredity to improve humankind through selective breeding.
evolutionism, unilinear The theory that all cultures develop
along similar lines from the simple to the complex, often
coupled with a belief in progress.
latent function The unintended and unrecognized consequences of an activity for group adaptation, for example,
the reinforcement of group identity in a Native American
rain dance.
mores Binding, extra-legal social norms considered essential to a group's welfare.
objectivism A philosophical term to describe the view that
objects of knowledge exist independent of human perception, often used pejoratively by humanistic sociologists who
allege that positivistic sociologists treat human beings as
material objects.
operationalism The view that the only valid scientific
concepts are those that prescribe the means of measuring
the concept.
positivism The belief that knowledge comes only from the
senses and that the methods of the natural sciences thus
provide the only accurate means of attaining knowledge.
positivism, instrumental The use of measurement, statistics
and other quantitative techniques in formulating social
policy.
scientism A term, usually pejorative, applied to positivistic
social science, broadly to social evolutionism, more
Social Measurement
At the University of Minnesota, Chapin moved steadily toward behavioral analysis and scientific neutrality. The results were the "living room scales" and social experiments that won him a reputation as a pioneering technician of the new sociology. Although the first courses in
sociology at the university dated to the early 1890s,
Minnesota did not establish a formal department in the
field until 1910, initially staffed by an ethnographic
popularizer and two reformers in the Social Gospel tradition. In the half-dozen years before Chapin arrived,
a separate department, Social and Civic Training, was
established under Yale Ph.D. Arthur J. Todd; the department was subsequently merged with the Department of
Sociology.
Appointed at age 33 to replace Todd, Chapin was the
youngest chairman of a major sociology department in the
nation. His passion for order was just what the situation
demanded. The department he entered was the site of
a three-way tug of war among an older generation of
amateurs, social workers and social surveyors, and Luther
L. Bernard, a Chicago Ph.D. who was the sole voice for
what was now termed sociological objectivism and who
made no secret of his disdain for the other factions. When Chapin appointed Russian émigré Pitirim Sorokin in 1923, Bernard feared that this distinguished theorist was being groomed as his replacement.
Chapin did not initially transform the department,
partly because his own interests remained extremely
broad and partly because the merger of sociology and
social work gave a practical, socially oriented cast to
the living room cost 2 points. Chapin was aware that these
judgments were culturally conditioned: a different scale
would be needed for homes in China or India or for
a different time in the United States. He insisted nonetheless that using objective factors to measure status
avoided problems of interviewer bias and provided
a valuable tool for social workers in placing children in
foster homes.
From Statistics to
Social Experiments
During and after World War II, Chapin saw new opportunities for planning: measuring national morale, gauging
psychological crosscurrents that might affect peacemaking, organizing postwar conferences, and facilitating postwar demobilization. Previously confined to communities
and nations, social measurement was now a global priority. Against this background, Chapin moved in two directions. From 1941 to 1951, he was a contributing editor
to Sociometry, a journal founded by Jacob Levi Moreno
to study group dynamics. Chapin evidenced an interest in
sociometry in the earlier sense of social measurement in his studies of classroom behavior, leadership, and conference procedures. He rejected, however, narrower definitions of sociometry as the quantification of informal friendship constellations (Moreno's program) or as a therapeutic tool. An exception was an attempt in the American Journal of Sociology in 1950 to prove that individuals ranked sociometrically as "stars" or leaders were socially isolated and aloof, a conclusion that drew almost immediate criticism in a subsequent issue of the same journal. Chapin, the authors observed, was an otherwise skilled social scientist who was "relatively inexperienced in the sociometric field."
In Experimental Designs in Sociological Research
(1947), Chapin revisited the problem of providing controls in social experiments, no longer convinced that statistics could give a truly scientific result. Experimental
designs required experiment and control groups to
be determined by frequency distributions of designated
traits. Comparisons could be made at a single time (cross-sectional design), before and after (projected), or
after the fact, as an effect is traced backward to previous
causes (ex post facto). Applying his method, he divided
102 former Boy Scouts into two groups: the first group
consisted of those who dropped out of scouting after an
average of 1.3 years; the second included those whose
tenure averaged 4 years. These control and experimental groups were then paired by equating frequency distributions on place of birth, father's occupation, health
rating, and mental ability. After 22 cases were eliminated
to make the groups equal, two groups of 40 were
compared for performance in later life as measured by
a Social Participation Scale. Although the methodology
grew ever more complex, the issue remained social adjustment. Whether the subject was social insight, the
effects of good housing, active versus dropout scouts, or
work relief, the conclusions were similar: people who join
more organizations have higher social intelligence; persons affiliated with political, professional, social, and civic
groups rank highest in social insight; good housing and
active scouting foster adjustment; and work relief is
better than the dole. Sociometry, as Chapin defined it in a 1943 article, was essential to "human adjustments in an expanding social universe."
Further Reading
Althouse, R. C. (1964). The Intellectual Career of F. Stuart
Chapin. University Microfilms, Ann Arbor, Michigan.
Bannister, R. C. (1987). Sociology and Scientism. University of
North Carolina Press, Chapel Hill.
Camic, C. E. (1994). The statistical turn in American social science: Columbia University, 1890 to 1915. Am. Sociol. Rev. 59, 773-805.
Fine, G. A., and Severance, J. S. (1985). Great men and hard times: Sociology at the University of Minnesota. Sociol. Q. 26, 117-134.
Platt, J. (1996). A History of Sociological Research Methods in America, 1920-1960. Cambridge University Press, Cambridge, UK.
Ross, D. (1991). The Origins of American Social Science.
Cambridge University Press, Cambridge, UK.
Summary
Let \(F_{X_{jt}}(\cdot)\) be the distribution function of observed score \(X_{jt}\) for a fixed person and \(F_{T_{Jt}}(\cdot)\) be the distribution function of the true score for the population of persons. The two models in Eqs. (3) and (4) can be summarized as the following two-level model:
\[ X_{jt} \sim F_{X_{jt}}(x; \tau_{jt}), \tag{5} \]
\[ \tau_{Jt} \sim F_{T_{Jt}}(\tau). \tag{6} \]
Classical Theory
In spite of its almost tautological nature, it is amazing how many results relevant to measurement in the social sciences can be derived from the classical model in Eqs. (5) and (6). The goal of these derivations is to express unobservable quantities defined in terms of true scores and measurement error as functions of observables.
The following equations are examples of results derived directly from the model in Eqs. (5) and (6):
\[ E(E_{jt}) = 0, \tag{7} \]
\[ \sigma^2_{E_{jt}} = \sigma^2_{X_{jt}}, \tag{8} \]
\[ \operatorname{Cov}(X_{jt}, X'_{jt}) = 0, \tag{9} \]
\[ \operatorname{Cov}(T, E) = 0, \tag{10} \]
\[ \sigma^2_X = \sigma^2_T + \sigma^2_E, \tag{11} \]
and
\[ \Pr\{X'_{jt} < x_o\} > \Pr\{X'_{jt} > x_o\} \quad \text{for } x_o \text{ large.} \tag{12} \]
The first three results show that, for a fixed person j, the expected error is equal to zero, the variance of the error is equal to the variance of the observed score, and the covariance between the observed score and a replication \(X'_{jt}\) is equal to zero. In all three equations, the left-hand side contains an unobservable quantity and the right-hand side contains an observable quantity. The result in Eq. (10) shows that the covariance between true scores and measurement errors in any population is always equal to zero. The derivation of this result is less obvious, but it follows directly from the fact that Eq. (7) implies a horizontal regression line of E on τ. The zero covariance in Eq. (10) leads directly to the equality of the observed-score variance with the sum of the true-score and error variances in Eq. (11).
The property in Eq. (12) shows that if there is
a large test score xo and the test is replicated, it is
more likely that a smaller (rather than larger) second
score will be observed. (By a large test score is meant
a score larger than the median; a comparable statement
is true for a score below the median.) This simple
property, which is a trivial consequence of the assumption of a random observed score, explains the often
misunderstood phenomena of regression to the mean
and capitalization on chance due to measurement
error.
More useful results are possible if the observed score,
true score, or error is used to define new test and item
parameters. Such parameters are often defined with
a practical application in mind (for example, item and
test analysis, choice of test length, or prediction of success
in a validity study). Some results for these parameters can
be derived with the model in Eqs. (5) and (6) as the
only assumption; for others, an auxiliary assumption is
needed. A commonly used auxiliary assumption is the
one of parallel scores on two measurement instruments.
Scores \(X_t\) on test t and \(X_r\) on a second test r are strictly parallel if, for every person j,
\[ \tau_{jt} = \tau_{jr}, \tag{13} \]
\[ \sigma^2_{X_{jt}} = \sigma^2_{X_{jr}}. \tag{14} \]
Reliability
A key parameter in CTT is the reliability coefficient of observed score \(X_{Jt}\). This parameter is defined as the squared (linear) correlation coefficient between the observed and true score on the test,
\[ \rho^2_{TX}. \tag{15} \]
From the classical model it follows that the reliability coefficient can be written as
\[ \rho^2_{TX} = \frac{\sigma^2_T}{\sigma^2_X} \tag{16} \]
and
\[ \rho^2_{TX} = 1 - \frac{\sigma^2_E}{\sigma^2_X}. \tag{17} \]
Internal Consistency
The internal consistency of a test is the degree to which
all of its item scores correlate. If the correlations are
high, the test is taken to measure a common factor.
Index i = 1, . . . , n denotes the items in the test; a second index k is used to denote the same items. A parameter for the internal consistency of a test is coefficient α, which is defined as
\[ \alpha = \frac{n}{n-1}\left[\frac{\sum_{i \neq k} \sigma_{ik}}{\sigma^2_X}\right]. \tag{18} \]
This parameter is thus equal to the sum of the item covariances, σ_ik, as a proportion of the total observed-score variance for the test (corrected by a factor slightly larger than 1 for technical reasons).
As will become clear later, a convenient formulation of coefficient α is
\[ \alpha = \frac{n}{n-1}\left[1 - \frac{\sum_{i=1}^{n} \sigma^2_i}{\sigma^2_X}\right]. \tag{19} \]
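Coefficient α in Eq. (19) is easy to compute from a persons-by-items score matrix. A minimal sketch (our code, using numpy):

```python
import numpy as np

def coefficient_alpha(scores):
    """Coefficient alpha, Eq. (19); scores is a persons x items array."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # sigma_i^2 per item
    total_var = scores.sum(axis=1).var(ddof=1)  # sigma_X^2
    return (n / (n - 1)) * (1 - item_vars.sum() / total_var)

# Five persons, three dichotomous items
X = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
print(round(coefficient_alpha(X), 3))   # -> 0.462
```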
For the special case of dichotomous item scores, coefficient α is known as Kuder-Richardson formula 20 (KR20). If all items are equally difficult, this formula
Validity
If a test score is used to predict the score on another
instrument, e.g., for the measurement of future success
in a therapy or training program, it is important to
have a parameter to represent its predictive power. Let
Y denote this other score. We define the validity coefficient for the observed test scores, X, as the correlation coefficient
\[ \rho_{XY}. \tag{20} \]
The reliability coefficient remains an important parameter in predictive validity studies, but the correlation
of observed score X with Y, instead of with its true score
T in Eq. (15), becomes the ultimate criterion of success
for it in prediction studies.
Item Parameters
Well-known item parameters in CTT are the item-difficulty parameter or item p value, the item-discrimination coefficient, and the item-validity coefficient. Suppose that the items are scored dichotomously, where U_i = 1 is the value for a correct response to item i and U_i = 0 is for an incorrect response. The classical parameter for the difficulty of item i is defined as the expected value or mean of U_i in the population of examinees,
\[ \pi_i = E(U_i). \tag{21} \]
interpretation problem does not exist for the item-validity coefficient, ρ_iY.
It is helpful to know that the following relation holds for the standard deviation of observed score X:
\[ \sigma_X = \sum_{i=1}^{n} \sigma_i \rho_{iX}. \tag{24} \]
Classical theory also relates the reliability coefficient to parallel scores, internal consistency, and validity:
\[ \rho^2_{XT} = \rho_{XX'}, \tag{27} \]
\[ \rho^2_{XT} \geq \alpha, \tag{28} \]
and
\[ \rho_{XT} \geq \rho_{XY}. \tag{29} \]
The first of these results shows that the reliability coefficient equals the correlation between observed scores on two strictly parallel tests; the second shows that coefficient α is a lower bound to the reliability; the third shows that the validity coefficient can never exceed the reliability index ρ_XT.
Test Length
If the length of a test is increased, its reliability is expected to increase too. A well-known result in CTT is the Spearman-Brown prophecy formula, which shows that this expectation is correct. Also, if the lengthening of the test is based on the addition of new parts with parallel scores, the formula allows calculation of its reliability in advance.
Suppose the test is lengthened by a factor k. If the scores on the k − 1 new parts are strictly parallel to the score on the original test according to the definition in Eqs. (13) and (14), the Spearman-Brown formula for the reliability of the lengthened test, with observed score Z and true score T_Z, is
\[ \rho^2_{Z T_Z} = \frac{k\, \rho^2_{X T_X}}{1 + (k - 1)\, \rho^2_{X T_X}}. \tag{30} \]
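Eq. (30) in code, with the familiar doubling case as a check (a sketch; names ours):

```python
def spearman_brown(reliability, k):
    """Predicted reliability of a test lengthened by factor k, Eq. (30)."""
    return k * reliability / (1 + (k - 1) * reliability)

print(spearman_brown(0.60, 2))   # doubling a 0.60-reliability test -> 0.75
```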
Attenuation Corrections
As already discussed, attenuation corrections were the first results for CTT, given by Spearman in 1904. He showed that if we are interested in the correlation between the true scores T_X and T_Y and want to calculate it from the unreliable observed scores X and Y, the following relation can be used:
\[ \rho_{T_X T_Y} = \frac{\rho_{XY}}{\rho_{X T_X}\, \rho_{Y T_Y}}. \tag{31} \]
Likewise, the correlation between observed score X and the true score T_Y on the criterion is
\[ \rho_{X T_Y} = \frac{\rho_{XY}}{\rho_{Y T_Y}}. \tag{32} \]
Parameter Estimation
The statistical treatment of CTT is not well developed. One of the reasons for this is the fact that its model is not based on the assumption of parametric families for the distributions of \(X_{jt}\) and \(T_{Jt}\) in Eqs. (5) and (6). Direct application of standard likelihood or Bayesian theory to the estimation of classical item and test parameters is therefore less straightforward. Fortunately, nearly all classical parameters are defined in terms of first-order and second-order (product) moments of score distributions. Such moments are well estimated by their sample equivalents (with the usual correction for the variance estimator if we are interested in unbiased estimation). CTT item and test parameters are therefore often estimated using plug-in estimators, that is, with sample moments substituted for population moments in the definition of the parameter.
A famous plug-in estimator for the true score of a person is the one based on Kelley's regression line. Kelley showed that, under the classical model, the least-squares regression line for the true score on the observed score is equal to
\[ E(T \mid X = x) = \rho^2_{XT}\, x + (1 - \rho^2_{XT})\, \mu_X. \tag{33} \]
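Eq. (33) shrinks each observed score toward the group mean in proportion to the reliability. A minimal sketch (ours):

```python
def kelley_true_score(x, reliability, mean_x):
    """Kelley's regression estimate of a person's true score, Eq. (33):
    the observed score shrunk toward the group mean."""
    return reliability * x + (1 - reliability) * mean_x

# Observed score 90, reliability 0.80, group mean 70
print(kelley_true_score(90, 0.80, 70))   # -> 86.0
```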
Binomial Model
If the item scores are dichotomous, and observed score \(X_{jt}\) is defined as the number-correct score, the observed-score distribution can sometimes be approximated by the binomial with probability function
\[ f(x) = \binom{n}{x}\, p_{jt}^{x} (1 - p_{jt})^{n - x}, \tag{34} \]
where p_jt is the binomial success parameter. For the binomial distribution it holds that n p_jt = E(X_jt), which shows that this distribution remains within the classical model. The assumption of a binomial distribution is strong in that it only holds exactly for a fixed test if p_jt is the common probability of success for j on all items. The assumption thus requires items of equal difficulty for a fixed test or items randomly sampled from a pool. In either case, p_jt can be estimated in the usual way from the number of correct responses in the test.
The assumption of a beta distribution for the binomial true score, which has density function
\[ f(p) = \frac{p^{a-1} (1 - p)^{n - b}}{B(a,\, n - b + 1)}, \tag{35} \]
combines with Eq. (34) to give the well-known beta-binomial model for the observed scores.
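Eq. (34) can be evaluated directly with the binomial coefficient. A small sketch (ours; the scenario is hypothetical):

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of number-correct score x under Eq. (34)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Chance of 8 or more correct out of 10 for a person with p_jt = 0.7
print(sum(binomial_pmf(x, 10, 0.7) for x in range(8, 11)))   # -> ~0.383
```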
Normal-Normal Model
Another strong true-score model is based on the assumptions of normal distributions for the observed score of a fixed person and for the true scores in the population:
\[ X_{jt} \sim N(\mu_{jt}, \sigma_{jt}), \tag{36} \]
\[ \mu_{Jt} \sim N(\mu_T, \sigma_T). \tag{37} \]
In practice, this model offers only a rough approximation of the empirical distributions. Observed scores are discrete and often have a bounded range; the assumption of a normal distribution cannot hold exactly for such scores.
Applications
The major applications of CTT are item and test analyses,
test assembly from larger sets of pretested items, and
observed-score equating.
Test Construction
If a larger set of items exists and a test of length n has to be assembled from this set, the usual goal is maximization of the reliability or validity of the test. The optimization problem involved in this goal can be formulated as an instance of combinatorial programming. Instead of optimizing the reliability coefficient, it is more convenient to optimize coefficient α; because of Eq. (28), ρ²_XT is then also optimized. Using the facts that n is fixed and
\[ \sigma_X = \sum_{i=1}^{n} \sigma_i \rho_{iX}, \tag{38} \]
the selection problem can be stated with 0-1 decision variables x_i indicating whether item i is included in the test: for example, maximize
\[ \sum_{i=1}^{n} \sigma_i \rho_{iX}\, x_i \tag{39} \]
subject to a constraint on the sum of the item variances,
\[ \sum_{i=1}^{n} \sigma^2_i x_i \leq c. \tag{40} \]
Test Equating
If a new version of an existing standardized test is constructed, the scales of the two versions need to be equated. The equating transformation, which maps the scores on the new version to equivalent scores on the old version, is estimated in an empirical study, often with a randomly-equivalent-groups design in which the two versions of the test are administered to separate random samples of persons from the same population. Let X be the observed score on the old version and Y be the observed score on the new version. In classical equipercentile equating, a study with a randomly-equivalent-groups design is used to estimate the following transformation:
\[ x = \varphi(y) = F_X^{-1}(F_Y(y)). \tag{41} \]
This transformation gives the equated score φ(Y) in the population of persons the same distribution as the observed score on the old version of the test, F_X(x). Estimation of this transformation, which actually is a compromise between the set of conditional transformations needed to give each person an identical observed-score distribution on the two versions of the test, is sometimes more efficient under a strong true-score model, such as the beta-binomial model in Eqs. (34) and (35), because of the implicit smoothing in these models. Using CTT, linear approximations to the transformation in Eq. (41), estimated from equating studies with different types of sampling designs, have been proposed.
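A bare-bones empirical version of Eq. (41), using sample quantiles in place of the smoothed distribution functions used in operational equating (a sketch under simplifying assumptions; names ours):

```python
import numpy as np

def equipercentile_equate(y_new, x_sample, y_sample):
    """Empirical Eq. (41): find the percentile of a score on the new
    form, then return the old-form score at that percentile,
    i.e., x = F_X^{-1}(F_Y(y))."""
    p = np.mean(np.asarray(y_sample) <= y_new)   # F_Y(y)
    return np.quantile(x_sample, p)              # F_X^{-1}(p)

rng = np.random.default_rng(0)
old = rng.normal(50, 10, 1000)    # scores on the old form
new = rng.normal(45, 10, 1000)    # the new form runs about 5 points harder
print(equipercentile_equate(45.0, old, new))   # roughly 50
```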
Current Developments
The main theoretical results for the classical test model were already available when Lord and Novick published their standard text in 1968. In fact, one of the few problems for which newer results have been found is that of finding an approximation to the conditional standard error of measurement σ_E|τ. This standard error is the one that should be used instead of the marginal error σ_E when reporting the accuracy of an observed score. Other developments in test theory have been predominantly in item-response theory (IRT). Results from IRT are not in disagreement with the classical test model, but should be viewed as applying at a deeper level of parameterization for the classical true score in Eq. (1).
Further Reading
Gulliksen, H. (1950). Theory of Mental Tests. Wiley, New York.
Kolen, M. J., and Brennan, R. L. (1995). Test Equating:
Methods and Practices. Springer-Verlag, New York.
Novick, M. R. (1966). The axioms and principal results of classical test theory. J. Math. Psychol. 3, 1-18.
Lord, F. M., and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, Massachusetts.
Spearman, C. (1904). The proof and measurement of association between two things. Am. J. Psychol. 15, 72-101.
Traub, R. E. (1997). Classical test theory in historical perspective. Educat. Psychol. Measure. Issues Pract. 16(4), 8-14.
van der Linden, W. J. (1986). The changing conception of testing in education and psychology. Appl. Psychol. Measure. 10, 325-332.
van der Linden, W. J. (2004). Linear Models for Optimal Test
Design. Springer-Verlag, New York.
van der Linden, W. J. (2004). Evaluating equating error in
observed-score equating. Appl. Psychol. Measure.
von Davier, A. A., Holland, P. W., and Thayer, D. T. (2004).
The Kernel Method of Test Equating. Springer-Verlag,
New York.
Clinical Psychology
Silke Schmidt
University of Hamburg, Hamburg, Germany
Mick Power
University of Edinburgh, Edinburgh, United Kingdom
Glossary
clinical psychology The application of psychological knowledge to a range of psychological and physical problems
across the life span.
life span approach The study of development and change
from infancy to old age.
psychotherapy The use of verbal and behavioral methods by
skilled practitioners to help individuals improve or cope
more effectively with a variety of personal and interpersonal
problems.
reflective practitioner model The therapist's understanding of the personal issues and needs that arise in the practice of therapy, and appropriate action taken to work on such issues.
scientist-practitioner model The application of psychological knowledge in the formulation of psychological problems
following a strategy of hypothesis testing, monitoring,
evaluation, and assessment of interventions and outcomes.
Introduction
Clinical psychology is a subject that focuses on the psychological (that is, the emotional, biological, cognitive,
social, and behavioral) aspects of human functioning in
Background
Historically, psychology was considered academically as
a component of philosophy until the late 19th century;
a separation arose through the application of the scientific empirical method. The scientific empirical method
replaced philosophical analysis as the primary method
for approaching problems and added an additional
source of information linked to the hermeneutic (interpretive) method in order to gain insight. Wilhelm
Wundt established the first psychological laboratory in
Leipzig in 1875; his influence was widespread initially in
Germany, but this spread quickly to Britain and the United States, where psychological laboratories were established along the lines of Wundt's model in Leipzig.
A student of Wundt's, Lightner Witmer, coined the term "clinical psychology," which he defined in 1895 as follows: "While the term 'clinical' has been borrowed from medicine, clinical psychology is not a medical psychology. I have borrowed the word 'clinical' from medicine, because it is the best term I can find to indicate the character of the method, which I deem necessary for this work. . . . The term 'clinical' implies a method, and not a locality. Clinical psychology likewise is a protest against a psychology that derives psychological and pedagogical principles from philosophical speculations and against a psychology that applies the results of laboratory experimentation directly to children in the school room."
Strongly under Wundt's influence, Witmer founded the first psychology clinic in Philadelphia in 1896 and applied the scientific method and knowledge of psychological principles to a range of psychological problems.
Theory
Clinical psychology has been an area that has incorporated the approaches and perspectives of diverse, even
contradictory, schools of psychology, including behaviorism, psychoanalysis, cognitive psychology, humanism, and
systems approaches. Each of the schools incorporates
a wide variety of theoretical models and empirical
evidence from a particular domain, such as the biopsychosocial domain, the behavioral domain, the cognitive-emotional domain, and the psychodynamic domain. The
models developed in these schools can best be understood as paradigms because they employ an exclusive
language code and are based on common assumptions.
Traditionally, the most widespread paradigm in psychopathology and theory is the psychoanalytic, or psychodynamic, paradigm, originally developed by Freud. The
essential assumption within psychodynamic theories is
that psychopathology arises from unconscious conflict.
On the other hand, the biological paradigm of abnormal
behavior is a broad theoretical perspective; it assumes that
mental disorders are caused by aberrant somatic or biological processes. This paradigm has often been referred
to as the medical, or disease, model. The learning (behavioral) paradigm assumes that normal and abnormal
behaviors are learned and acquired through experience,
or shaped by environmental stimuli. Emotional and cognitive approaches have in some ways evolved as a response
to these contradictory approaches, with a combination of
focus both on the inner world and on the outer world.
Apart from these paradigms, a variety of internal and
external processes (related to cognition and emotion) may
also be used to explain the psychodynamic responses
between external stimuli and inner conflicts.
In addition to the basic psychological models within
each of the schools of clinical psychology, a variety of
integrative approaches now span different schools, and
are often pervasive in scope. For example, theories and
models of emotion regulation, attachment, learned helplessness, and self-control have expanded from a distinct
area of interest to the development of sets of common
psychological principles. Some of these models are selectively outlined in the following discussions, in order
to show how the theories of different schools can be
combined into integrative models.
Assessment
Assessment in clinical psychology involves determining
the nature, causes, and potential effects of personal distress, types of dysfunctions, and psychological factors associated with physical and mental disorders. It involves
the statistical, methodological, research, and ethical issues
involved with test development and use, as well as their
proper application in specific settings and with specific
populations. The development, use, and interpretation of
tests based on psychological theory, knowledge, and
principles have seen considerable expansion and effort
in the past 10 years. The ongoing advances in scientifically
based assessment systems, computer technology, and statistical methodology, the increasingly sophisticated uses
of psychological instruments in clinical settings, and the
widespread use of psychological tests in making decisions
that affect the lives of many people have created an exponentially growing body of knowledge and practice that
requires the expertise of the assessment specialist. The
complexity of the dimensions and tasks in clinical psychology has necessitated development of a wide range of
methods of assessment, each appropriate to the different
needs of basic psychology. There is no other field that
utilizes such varied assessment approaches; these include
unstructured and structured clinical assessment and rating approaches, observational methods (including audio/
video), psychometric normative attitudinal assessment,
neuropsychological standardized tests, and physiological
methods. The potential strength of the diversity of
methods is that rather than relying on one particular
type of approach, the comparison of different sources
of information allows a much more complex insight
into a disorder or problem.
This variety of assessment forms has evolved from the
problem that inner states are not directly observable. It is
this problem that caused a dialectic between the nomothetic versus idiographic approach in clinical psychology.
The nomothetic strand relies exclusively on the normative approach toward measurement, that is, on drawing inferences by reference to a norm group with regard to the respective trait or state under consideration; moreover, all individuals have a score or value on a nomothetic test. The normative reference might include a variety of standardized approaches (population, subpopulation,
defined training pathways, or are in the process of implementing registration policies for clinical psychologists and
psychotherapists.
Applied Research
In the spirit of Lightner Witmer's experimental influence, the so-called scientist-practitioner model was formulated at a meeting in Boulder, Colorado in 1947. The scientist-practitioner model is based on the proposition that clinical
practice must be based on systematic experimental investigation into psychopathology, assessment, and intervention. It supports empirically based clinical practice as well
as the training of clinical psychology students in the
methods of scientific investigation and decision making,
and their application to practice. Training in clinical psychology therefore has an emphasis on research skills,
which now go well beyond the specifics of the behaviorally
dominated Boulder model to include methods such as
single-case quasi-experimental designs, longitudinal research, clinical outcome studies, randomized controlled
trials, survey methods, and so on. This broad research
training cuts across quantitative and qualitative methods
and provides clinical psychology with an ever-increasing
role in applied research, especially in North America.
Further Reading
Beck, A. T. (1976). Cognitive Therapy and the Emotional
Disorders. Meridian, New York.
Bowlby, J. (1969). Attachment and Loss. Vol. 1, Attachment.
Hogarth Press, London.
Bowlby, J. (1988). A Secure Base: Clinical Applications of
Attachment Theory. Routledge, London.
Costa, P. T., and McCrae, R. R. (1992). Revised NEO Personality Inventory and NEO Five-Factor Inventory Professional Manual. Psychological Assessment, Odessa, Florida.
Greenwald, A. G., and Banaji, M. R. (1995). Implicit social cognition: Attitudes, self-esteem, and stereotypes. Psychol. Rev. 102, 4-27.
Harris, P. L. (1989). Children and Emotion: The Development
of Psychological Understanding. Blackwell, Oxford.
Kelly, G. A. (1955). The Psychology of Personal Constructs.
Norton, New York.
Clustering
Phipps Arabie
Rutgers Business School, Newark and New Brunswick,
New Jersey, USA
Lawrence J. Hubert
University of Illinois at Champaign, Champaign, Illinois, USA
J. Douglas Carroll
Rutgers Business School, Newark and New Brunswick,
New Jersey, USA
Glossary
agglomerative algorithms Hierarchical clustering algorithms
that begin with each entity as its own (singleton) cluster and
then iteratively merge entities and clusters into a single
cluster, constituting the entire group of entities.
divisive algorithms The reverse of agglomerative algorithms;
begins with the entire set of entities as one cluster, which is
then iteratively divided (usually bifurcated) until each entity
is its own (singleton) cluster.
hierarchical clustering The most commonly used form of
clustering in the social sciences, probably because of widely
available software and the resultant dendrograms (inverted
tree structures). The only form of pairwise overlap of clusters
allowed is that one must be a proper subset of another.
multidimensional scaling A wide variety of techniques,
usually leading to representation of entities in a coordinate
space of specified dimensionality.
overlapping clustering In contrast to hierarchical clustering,
entities may simultaneously be constituent members of
more than one cluster. Hierarchical clustering is subsumed
as a special case.
partitioning clustering Each entity belongs to exactly one
cluster, and the union of these clusters is the complete set
of entities.
Hierarchical Clustering
Another important distinction is the degree of overlap
allowed by the method of clustering. For hierarchical
clustering, the traditional requirement is that in any
two distinct clusters, one is either a proper subset of
the other (nesting) or there is no overlap at all. Most
hierarchical methods assume a one-mode, two-way input
matrix; the highly popular method developed by Joe
H. Ward is a major exception. In agglomerative algorithms for hierarchical clustering, the algorithm begins
with each of the n objects as a singleton cluster and then amalgamates objects into clusters, at each step merging the pair of objects or clusters that is most similar (when the data are similarities, as with correlations) or least dissimilar (when the data are distances). The process continues stepwise until the result is only one cluster containing the entire set of the n objects. Divisive algorithms take the opposite route, beginning with one cluster and stepwise splitting it successively until only singleton clusters remain. Deciding where in the resulting chain of partitions the right place is to claim a final solution is generally an unsolved problem, with a solution usually dictated by the user's fiat, typically in favor of a number of clusters very small compared to n. The
results of this iterative formation of a hierarchically nested
chain of clusters are frequently represented in output
from statistical software as a dendrogram, or inverted
tree, often printed in landscape (i.e., horizontal)
format for mechanical reasons. This dendrogram can
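As a concrete illustration, the following sketch runs an agglomerative analysis in Python with SciPy; the data, the average-linkage choice, and the three-cluster cut are illustrative assumptions, not prescriptions from this article.

    # A minimal sketch of agglomerative hierarchical clustering (SciPy);
    # data and parameter choices here are hypothetical.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))                     # 20 entities, 2 attributes

    d = pdist(X)                                     # pairwise dissimilarities
    Z = linkage(d, method="average")                 # merge least-dissimilar pairs
    labels = fcluster(Z, t=3, criterion="maxclust")  # user's fiat: 3 clusters
    dendrogram(Z)                                    # the inverted-tree display
    plt.show()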
Partitioning
For some researchers, clustering is (erroneously) synonymous with partitioning, which allows no overlap: each
object is in exactly one cluster and no overlap is allowed
among clusters. Most algorithms that implement partitioning assume a two-mode, two-way input matrix and are variations of MacQueen's K-means approach. Many applications of partitioning involve hundreds of thousands of objects, and finding a globally optimal representation is a major problem. Few clustering methods can guarantee a computationally feasible global, as opposed to merely local, optimum, and the resulting partition is highly dependent on the starting configuration, which may be random, supplied by an earlier analysis using hierarchical clustering, or based on some other strategy. It therefore sometimes becomes necessary to repeat the analysis with different starting configurations and then to attempt to decide which solution is best in some sense. Moreover, few methods of clustering of any kind can obtain a global optimum unless the data set is very small and/or complete enumeration (possibly implicit, as in dynamic programming) is used.
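A minimal sketch of this repeat-and-compare strategy, using scikit-learn's K-means on hypothetical data (the cluster count and number of restarts are arbitrary assumptions):

    # K-means from several random starting configurations; the solution
    # with the smallest within-cluster sum of squares (inertia) is kept.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 4))             # hypothetical objects x attributes

    best = None
    for seed in range(20):                     # 20 different starts
        km = KMeans(n_clusters=5, n_init=1, random_state=seed).fit(X)
        if best is None or km.inertia_ < best.inertia_:
            best = km                          # best local optimum so far
    print(best.inertia_, np.bincount(best.labels_))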
Overlapping Clustering
A further approach, overlapping clustering, has two variations: discrete versus continuous. In discrete overlapping clustering, an object may simultaneously belong to more than one cluster, and membership within a cluster is all-or-none; in continuous overlapping clustering, membership is instead a matter of degree.
Consensus Clustering
For hierarchical and for partitioning clustering, it is
common to obtain several different clusterings, using different methods, and then use logical algorithms to form
a consensus clustering. Not surprisingly, the literature
on such algorithms is often closely related to that of rules
for voting or social choice.
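One way to make this concrete is a co-association (evidence accumulation) scheme, sketched below under assumed data; this is only one of many consensus rules in the literature:

    # Consensus from several partitions via a co-association matrix:
    # entities that are clustered together often end up together.
    import numpy as np
    from sklearn.cluster import KMeans
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 3))

    partitions = [KMeans(n_clusters=4, n_init=1, random_state=s).fit_predict(X)
                  for s in range(10)]          # ten different clusterings

    n = X.shape[0]
    co = np.zeros((n, n))
    for p in partitions:
        co += (p[:, None] == p[None, :])       # 1 where i and j share a cluster
    co /= len(partitions)

    # Treat (1 - co-association) as a dissimilarity and cluster it once more
    Z = linkage(squareform(1.0 - co, checks=False), method="average")
    consensus = fcluster(Z, t=4, criterion="maxclust")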
Further Reading
Arabie, P., and Hubert, L. (1996). An overview of combinatorial data analysis. In Clustering and Classification (P. Arabie, L. J. Hubert, and G. De Soete, eds.), pp. 5–63. World Scientific, River Edge, New Jersey.
Carroll, J. D., and Arabie, P. (1980). Multidimensional scaling. In Annual Review of Psychology (M. R. Rosenzweig and L. W. Porter, eds.), Vol. 31, pp. 607–649. Annual Reviews, Palo Alto, California. [Reprinted in Green, P. E., Carmone, F. J., and Smith, S. M. (1989). Multidimensional Scaling: Concepts and Applications, pp. 168–204. Allyn and Bacon, Needham Heights, Massachusetts.]
Carroll, J. D., and Klauer, K. C. (1998). INDNET: An individual-differences method for representing three-way proximity data by graphs. In Psychologische Methoden und Soziale Prozesse (K. C. Klauer and H. Westmeyer, eds.), pp. 63–79. Pabst Science, Berlin.
Carroll, J. D., and Pruzansky, S. (1975). Fitting of hierarchical tree structure (HTS) models, mixtures of HTS models, and hybrid models, via mathematical programming and alternating least squares. Proceedings of the U.S.–Japan Seminar on Multidimensional Scaling, pp. 9–19.
Carroll, J. D., and Pruzansky, S. (1980). Discrete and hybrid scaling models. In Similarity and Choice (E. D. Lantermann and H. Feger, eds.), pp. 108–139. Hans Huber, Bern.
De Soete, G., and Carroll, J. D. (1996). Tree and other network models for representing proximity data. In Clustering and Classification (P. Arabie, L. Hubert, and G. De Soete, eds.), pp. 157–197. World Scientific, River Edge, New Jersey.
Gordon, A. D. (1996). Hierarchical classification. In Clustering and Classification (P. Arabie, L. Hubert, and G. De Soete, eds.), pp. 65–121. World Scientific, River Edge, New Jersey.
Gordon, A. D. (1999). Classification, 2nd Ed. Chapman & Hall/CRC, Boca Raton.
Coding Variables
Lee Epstein
Washington University, St. Louis, Missouri, USA
Andrew Martin
Washington University, St. Louis, Missouri, USA
Glossary
codebook A guide to the database that the researcher is creating: a guide sufficiently rich that it not only enables the researcher to code his or her data reliably but also allows others to replicate, reproduce, update, or build on the variables housed in the database, as well as any analyses generated from it.
observable implications (or expectations or hypotheses) What we expect to detect in the real world if our
theory is right.
reliability The extent to which it is possible to replicate
a measurement, reproducing the same value (regardless of
whether it is the right one) on the same standard for the
same subject at the same time.
theory A reasoned and precise speculation about the answer
to a research question.
variable Observable attributes or properties of the world that
take on different values (i.e., they vary).
variable, values of Categories of a variable (e.g., male and
female are values of the variable gender).
Introduction
Social scientists engaged in empirical researchthat is,
research seeking to make claims or inferences based on
speculate, but an obvious response centers on the seemingly idiosyncratic nature of the undertaking. For some
projects, researchers may be best off coding inductively,
that is, collecting their data, drawing a representative sample, examining the data in the sample, and then developing their coding scheme; for others, investigators proceed
in a deductive manner, that is, they develop their schemes
first and then collect/code their data; and for still a third
set, a combination of inductive and deductive coding may
be most appropriate. (Some writers associate inductive
coding with research that primarily relies on qualitative
[nonnumerical] data/research and deductive coding with
quantitative [numerical] research. Given the [typically]
dynamic nature of the processes of collecting data and
coding, however, these associations do not always or perhaps even usually hold. Indeed, it is probably the ease that
most researchers, regardless of whether their data are
qualitative or quantitative, invoke some combination of
deductive and inductive coding.) The relative ease (or
difficulty) of the coding task also can vary, depending
on the types of data with which the researcher is working, the level of detail for which the coding scheme calls,
and the amount of pretesting the analyst has conducted, to
name just three.
Nonetheless, we believe it is possible to develop some
generalizations about the process of coding variables, as
well as guidelines for so doing. This much we attempt to
accomplish here. Our discussion is divided into two sections, corresponding to the two key phases of the coding
process: (1) developing a precise schema to account for
the values of the variables and (2) methodically assigning
each unit under study a value for every given variable.
Readers should be aware, however, that although we
made as much use as we could of existing literatures,
discussions of coding variables are sufficiently few and
far between (and where they do exist, rather scanty)
that many of the generalizations we make and the
guidelines we offer come largely from our own experience. Accordingly, sins of commission and omission probably loom large in our discussion (with the latter
particularly likely in light of space limitations).
Examples of value labels for a case-disposition variable:
Stay, petition, or motion granted
Affirmed; or affirmed and petition denied
Reversed (including reversed and vacated)
Reversed and remanded (or just remanded)
Vacated and remanded (also set aside and remanded; modified and remanded)
Affirmed in part and reversed in part (or modified, or affirmed and modified)
Affirmed in part, reversed in part, and remanded; affirmed in part, vacated in part, and remanded
Vacated
Petition denied or appeal dismissed
Certification to another court
Codebooks
In line with our earlier definition, codebooks provide a guide to the database that the researchers are creating: a guide sufficiently rich that it not only enables the researchers to code their data reliably but also allows others to replicate, reproduce, update, and build on the variables housed in the database, as well as any analyses generated from it.
typically means that researchers should record the original value, reserving transformations for later. For example, even if the logarithm of age will ultimately serve as an independent variable in the analysis, the researcher ought to code the raw values of age, and do so sensibly (if a person is 27, then the value of the variable age for that person is 27).
Two other rules of thumb are worthy of note. One is
that wherever and whenever possible, researchers should
use standard values. If the zip code of respondents is a variable in the study, it makes little sense to list the codes and then assign new numerical values to them (11791 = 1, 11792 = 2, 11793 = 3, and so on) when the government already has done that; in other words, in this case the researcher should use the actual zip codes as the values. The same holds for other less obvious variables, such as industry, to which the researcher can assign the values (e.g., 11 = Agriculture, 22 = Utilities, and so on) used by the U.S. Census Bureau and other agencies.
The remaining rule is simple enough and follows from virtually all we have written thus far: avoid combining values. Researchers who create a variable gender/religion and code a male (value 0) Baptist (value 10) as the value 010 are asking only for trouble. In addition to working against virtually all the recommendations we have supplied, such values become extremely difficult to separate for purposes of analysis (but gender and religion, coded separately, are simple to combine in most software packages).
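A small sketch of these rules of thumb in Python with pandas (all column names and values here are hypothetical):

    # Raw, standard, and separately coded values; transformations and
    # combinations happen at analysis time, not in the stored data.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "age":      [27, 54, 31],                  # raw values, not log(age)
        "zip_code": ["11791", "11792", "11793"],   # the government's codes, as-is
        "industry": ["11", "22", "11"],            # standard NAICS codes
        "gender":   [0, 1, 0],                     # coded separately ...
        "religion": [10, 3, 10],                   # ... never fused into "010"
    })

    df["log_age"] = np.log(df["age"])              # transform at analysis time
    combined = pd.crosstab(df["gender"], df["religion"])  # trivial to combine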
Missing Values
However carefully researchers plan their project, they
will inevitably confront the problem of missing values.
A respondent may have failed (or refused) to answer
a question about his/her religion, a case may lack a clear
disposition, information simply may be unavailable for
a particular county, and so on. Investigators should be aware of this problem from the outset and prepare accordingly. This is so even if they plan to invoke one of the methods scholars have developed to deal with missing data, because the treatment of missing values might affect the analyses: the various solutions to the problem assume that researchers treated missing data appropriately when they created the original database.
At the very least, investigators must incorporate into their codebook values to take into account the possibility of missing data, with these values distinguishing among the different circumstances under which missing information can arise. These can include "refused to answer/no answer," "don't know," and "not applicable," among others. Whatever the circumstances, researchers should assign values to them rather than simply leaving blank spaces. Simply leaving missing values blank can cause all types of logistical problems; for example, is the observation truly missing, or has the coder simply not yet completed that case?
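The same discipline is easy to enforce in software; in this hypothetical pandas sketch, the negative codes and their meanings are assumptions standing in for a real codebook:

    # Distinct codes for distinct kinds of missingness, converted to NaN
    # only when the analysis begins.
    import numpy as np
    import pandas as pd

    MISSING = {-7: "refused/no answer", -8: "don't know", -9: "not applicable"}
    df = pd.DataFrame({"religion": [10, -7, 3, -9, -8]})

    print(df["religion"].map(MISSING).value_counts())  # why values are missing
    df["religion_clean"] = df["religion"].where(df["religion"] >= 0, np.nan)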
Further Reading
Babbie, E. (2001). The Practice of Social Research.
Wadsworth, Belmont, CA.
Data Documentation Initiative. http://www.icpsr.umich.edu/
DDI/ORG/index.html
Epstein, L., and King, G. (2002). The rules of inference. Univ. Chicago Law Rev. 69, 1–133.
Frankfort-Nachmias, C., and Nachmias, D. (2000). Research
Methods in the Social Sciences. Worth, New York.
Inter-university Consortium for Political and Social Research.
(2002). Guide to social science data preparation and
archiving. Ann Arbor, MI. Available at: http://www.
icpsr.umich.edu/ACESS/dpm.html
King, G., Honaker, J., Joseph, A., and Scheve, K. (2001).
Analyzing incomplete political science data: An alternative algorithm for multiple imputation. Am. Polit. Sci. Rev. 95, 49–69.
King, G., Keohane, R. O., and Verba, S. (1994). Designing
Social Inquiry: Scientific Inference in Qualitative
Research. Princeton University Press, Princeton, NJ.
Little, R. J. A., and Rubin, D. B. (1987). Statistical Analysis
with Missing Data. John Wiley, New York.
Manheim, J. B., and Rich, R. C. (1995). Empirical Political
Analysis, 4th Ed. Longman, New York.
Salkind, N. J. (2000). Exploring Research. Prentice Hall,
Upper Saddle River, NJ.
Shi, L. (1997). Health Services Research Methods. Delmar
Publishers, Albany, NY.
Stark, R., and Roberts, L. (1998). Contemporary Social
Research Methods. MicroCase, Bellevue, WA.
U.S. Census Bureau. http://www.census.gov/epcd/www/
naics.html
U.S. Court of Appeals Data Base. http://www.polisci.msu.edu/
pljp/databases.html
Cognitive Maps
Reginald G. Golledge
University of California, Santa Barbara, California, USA
Glossary
anchor point A dominant environmental cue.
bidimensional regression Regression in the spatial coordinate domain.
cognitive map The product of perceiving and thinking about environments external to one's body. It represents a store of experienced sensations that have been perceived, noticed, identified, encoded, and stored in memory for later manipulation and use in problem-solving or decision-making situations.
cognitive mapping The process of perceiving, encoding,
storing, internally manipulating, and representing spatial
information.
configurational matching The assessment of the degree of
congruence between actual and created spatial configurations.
displacement error The absolute distance between an
objective and a subjective location.
fuzziness The variance associated with a set of spatial
measures.
landmark A generally well-known feature, distinct because of
its physical, social, or historical properties.
multidimensional scaling An iterative procedure for finding
a minimal dimensional solution for configuring spatial
knowledge.
place cells Elements of the brain in which specific spatial
features are stored.
pointing Use of a device or body part to identify direction
from current location to a distant location.
reproduction task An act of using body-turning or locomotion to represent an experienced angle or distance.
sketch map A hand-drawn representation of what one recalls
about an environment.
spatial cognition The thinking and reasoning processes used
to manipulate, interpret, and use encoded spatial information.
spatial decoding errors Errors of translating encoded spatial
information for manipulation in working memory.
spatial encoding errors Errors of perceiving or sensing
feature or object location.
Table I Methods for Obtaining Spatial Products
Collecting unidimensional scaling measures of spatial characteristics, such as interpoint distances or directions
Reproducing spatial properties from memory, such as walking distances or turning angles
Recording paths of travel
Using sketch mapping to define what people know and recall about the structure and content of large urban environments
Construction of latent spatial relations using indirect distance or proximity judgments and nonmetric multidimensional scaling
procedures
Using trilateration procedures to construct configurations from interpoint distance estimates
Constructing three-dimensional table models of routes and environments that have been experienced
Use of imagined locations at which to complete spatial tasks
Examination of verbal reports of spatial wayfinding and layout knowledge
Determining whether people without sight can reproduce distances, directions, and layouts
Use of projective convergence or classical triangulation methods to determine location of obscured features
Spatial updating after location translation
Exploring spatial relations in virtual environments
The Geometries
Much of cognitive map measurement has been focused on specific geometric components, depicted in terms of points, lines, simple or complex networks, sequences or strings, areas, hierarchies, and configurations or layouts of places. Information stored in long-term memory does not necessarily take the form of what is traditionally recognized to be a cartographic map. Discovering the spatial nature of stored information and the spatial relations embedded within it has resulted in a wide variety of experimental designs and innovative task situations, including metric and nonmetric formats. Most efforts have concentrated on two-dimensional Euclidean space, although some researchers have shown that a city-block metric (i.e., a Minkowski metric with r = 1) is most likely used by many people. Using a two-dimensional map metaphor to describe spatial products has the convenience of allowing access to a variety of graphic, cartographic, and geometric measurement procedures, which permit comparisons between the characteristics and structure of spatial products and their counterparts in objective reality.
Locating Places
In objective reality, locations are often known by several
labels or identities (e.g., the corner of High and Main;
the bank corner; the center of town). Before one can
determine locational accuracy, one must find by which
label a location is best known. In a designed experiment,
locations can be arbitrarily identified by a single name or
symbol, thus removing this ambiguity. Location is usually
specified in one of two ways: absolute (within a fixed reference frame, such as a coordinate system) or relative (with respect to other known places).
Multidimensional Scaling
In contrast to simply matching estimated and actual data, a procedure that comes closer to the metaphorical idea of producing a map as the spatial product involves the development of a distance or similarities matrix. Assuming symmetry in judgments between pairs of places, only the upper triangular half of such a matrix is used. Typically, this is input to a metric or nonmetric multidimensional scaling algorithm; these can be found in most statistical software packages, and most are based on the original ideas developed by Kruskal and Wish in the 1960s. The essence of such an algorithm is to generate a set of fitted distances that satisfy the same monotonic relationship expressed in the values in the data cells of the distance matrix. Starting from a random configuration, the procedure iterates to bring the two monotonic sequences into agreement. At each iteration, a badness-of-fit statistic indicates whether the procedure has reached a satisfactory or acceptable minimum in the chosen dimensionality. When dealing with interpoint distances in the real world, the actual configuration of distances can be the starting configuration. Using this procedure, the real pattern is warped to match the subjective pattern in the data matrix as closely as possible. Achieving best fit may involve translating or rotating the subjective configuration. When the actual configuration is used as the starting configuration, however, a fit statistic can be interpreted as an accuracy measure. In most cases, output is forced into a two-dimensional solution so that matching can occur.
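The following sketch runs a nonmetric MDS on a noisy judgment matrix with scikit-learn; the simulated judgments, the noise level, and the use of the actual configuration as the starting point are all illustrative assumptions:

    # Nonmetric MDS of a symmetric dissimilarity matrix; initializing at the
    # actual configuration lets the stress value serve as an accuracy measure.
    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(3)
    actual = rng.uniform(size=(10, 2))                     # true locations
    D = np.linalg.norm(actual[:, None] - actual[None, :], axis=-1)
    judged = D * rng.uniform(0.8, 1.2, size=D.shape)       # noisy judgments
    judged = (judged + judged.T) / 2                       # enforce symmetry
    np.fill_diagonal(judged, 0.0)

    mds = MDS(n_components=2, metric=False,
              dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(judged, init=actual)
    print("badness of fit (stress):", mds.stress_)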
Table III Measuring Configurations
Reliability and validity: Except for the work of Kitchin (1996), researchers have focused little attention on determining the reliability and validity of cognitive distance measures, and there is little evidence indicating whether any individual might give the same responses to estimating distances over various methods and at various times. Kitchin argues that, considering cognitive distance measures, both the validity and reliability of product measurements are uncertain.
Regression toward the mean: Estimates of cognitive distance usually show patterns of overestimation of short distances and underestimation of longer distances. Over a population's response set, there is a tendency to regress toward the mean.
Comparison of cognized and objective distances: Traditionally, cognized and actual distances are compared using linear regression analysis. The typical pattern of overestimating shorter distances and underestimating longer distances favors the nonlinear, power-function form (Y = aX^b rather than Y = a + bX, where Y is the cognitive distance, X is the objective distance, a is the intercept or scaling constant, and b is the slope or exponent).
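Both forms can be fit with ordinary least squares, the power form after taking logarithms; the distances below are simulated for illustration:

    # Fit Y = a + bX directly and Y = a * X**b as a line in log-log space.
    import numpy as np

    X = np.array([0.5, 1, 2, 4, 8, 16, 32.0])    # objective distances (km)
    Y = 1.6 * X ** 0.7                           # cognized distances, b < 1

    b_lin, a_lin = np.polyfit(X, Y, 1)           # linear form
    b_pow, log_a = np.polyfit(np.log(X), np.log(Y), 1)
    a_pow = np.exp(log_a)                        # power form
    print(f"Y = {a_lin:.2f} + {b_lin:.2f}X   vs   Y = {a_pow:.2f}X^{b_pow:.2f}")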
Interpreting Configurations
Multidimensional scaling unpacks the latent two-dimensional (or higher) structure that is embedded in a series of
unidimensional distance judgments. It does so without
requiring the individual to expressly know a specific distance measure (e.g., meters, kilometers, yards, miles).
Although multidimensional scaling is accepted as a
powerful technique to recover latent structure, criticism
is sometimes leveled at the procedure because of potential
impurities in the interpoint distance matrix. For example,
if people are required to estimate the distances between
unfamiliar pairs of points, they may guess. A good guess
may be consistent with other information and allow
a reasonable configuration to be estimated. A bad guess
might considerably warp otherwise useful and accurate
patterns of estimation.
A second criticism is that the use of similarities rather than distances may invoke functional dimensions rather than spatial dimensions. Thus, the appropriate low-dimensional output may exist in three or more dimensions rather than the two dimensions usually required for congruence mapping with real-world locational patterns. As with factor analysis, researchers must identify the dimensions of an output configuration. Often it is assumed that the lower dimensional configurations are most likely matched to spatial coordinate systems (e.g., traditional north, south, east, west frames). This may not be so. A
necessary step before accepting such an assumption
would be to regress the coordinate values along
a specific dimension with the coordinate values along
a particular north, south, east, or west axis. If the correlations are not high, then functional or other dimensions
may be the critical ones involved in making the distance or
similarity judgments. Despite these shortcomings, the
production of interpoint distance matrices and analysis
by metric or nonmetric multidimensional scaling
procedures has become a common and useful way of
measuring cognized spatial configurations.
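That regression check is a one-liner per dimension; in this sketch the coordinates are simulated, and a low correlation would flag a possibly functional rather than spatial dimension:

    # Correlate each MDS output dimension with real-world axes.
    import numpy as np

    rng = np.random.default_rng(4)
    east, north = rng.uniform(size=15), rng.uniform(size=15)
    coords = np.column_stack([east, north]) + 0.1 * rng.normal(size=(15, 2))

    for dim in range(coords.shape[1]):
        r_e = np.corrcoef(coords[:, dim], east)[0, 1]
        r_n = np.corrcoef(coords[:, dim], north)[0, 1]
        print(f"dimension {dim}: r(east) = {r_e:.2f}, r(north) = {r_n:.2f}")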
Matching Configurations
One simple way of illustrating divergence between cognized and actual spatial data is through a grid transformation. This is achieved by first locating a standard
regular square grid over an objective configuration.
The same-sized grid is then applied to the cognitive configuration. The latter grid structure is warped to illustrate
distortions in the cognized layout. Good matches between
cognitive and real-world configurations produce very little
grid distortion, whereas poor fits between the two require
considerable grid distortions (Fig. 1). Distorted grids can
be compiled using standard contouring or interpolation
procedures. Measures are also available in most geographic information system software packages.
The fuzziness of a set of location estimates can be summarized by the standard deviation of the estimates along an axis rotated through an angle $\theta$:
$$\mathrm{SD}_{y\text{-axis}} = \sqrt{\frac{\sin^2\theta \sum x^2 \;-\; 2\sin\theta\cos\theta \sum xy \;+\; \cos^2\theta \sum y^2}{N}}$$
where $x$ and $y$ are deviations from the mean center and $N$ is the number of estimates.
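A direct transcription of that formula (the sample points are hypothetical):

    # Standard deviation of location estimates along a rotated axis.
    import numpy as np

    def sd_along_axis(x, y, theta):
        x, y = x - x.mean(), y - y.mean()      # deviations from the mean center
        s = (np.sin(theta)**2 * np.sum(x**2)
             - 2 * np.sin(theta) * np.cos(theta) * np.sum(x * y)
             + np.cos(theta)**2 * np.sum(y**2))
        return np.sqrt(s / len(x))

    rng = np.random.default_rng(5)
    x, y = rng.normal(size=50), rng.normal(scale=2.0, size=50)
    print(sd_along_axis(x, y, theta=0.0), sd_along_axis(x, y, np.pi / 2))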
Figure 1 Examples of distorted grids. (A) Distortion vectors for a newcomer to a setting.
(B) Distorted grid for a newcomer to a setting. (C) Distortion vectors for a long-time resident
of a setting. (D) Distorted grid for a long-time resident of a setting.
The affine version of Tobler's bidimensional regression relates each objective location $(x_j, y_j)$ to its cognized counterpart $(u_j, v_j)$:
$$u_j = a_1 + b_{11}x_j + b_{12}y_j + e_j, \qquad v_j = a_2 + b_{21}x_j + b_{22}y_j + f_j$$
where $e_j$ and $f_j$ are the residual errors, and $a_1$ and $a_2$ are analogous to the intercept terms of the standard linear regression model and are used to perform appropriate translations. Scaling and rotation are produced by the slope coefficients $b_{11}$, $b_{12}$, $b_{21}$, and $b_{22}$.
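Estimated by ordinary least squares, the fit is two standard regressions sharing the same design matrix; the coordinates below are placeholders:

    # Affine bidimensional regression via least squares.
    import numpy as np

    def bidimensional_regression(x, y, u, v):
        A = np.column_stack([np.ones_like(x), x, y])   # shared design matrix
        (a1, b11, b12), *_ = np.linalg.lstsq(A, u, rcond=None)
        (a2, b21, b22), *_ = np.linalg.lstsq(A, v, rcond=None)
        return (a1, b11, b12), (a2, b21, b22)

    rng = np.random.default_rng(6)
    x, y = rng.uniform(size=12), rng.uniform(size=12)         # objective
    u = 1.0 + 0.9 * x - 0.2 * y + 0.05 * rng.normal(size=12)  # cognized
    v = -0.5 + 0.2 * x + 0.9 * y + 0.05 * rng.normal(size=12)
    print(bidimensional_regression(x, y, u, v))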
Figure: Map locations and the mean centers of location estimates; ellipses represent 0.25 standard deviations.
$$\mathrm{RAS} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{OAS}_i - \mathrm{CAS}_i\right)$$
where n is the number of angle segments, RAS is the relative accuracy score, OAS is the objective angle segment, and CAS is the cognitive angle segment.
However, whereas the above measurement problems
emphasize unidimensional estimation procedures, interpoint directions can be examined using multidimensional
spaces.
Projective Convergence
A two-dimensional map of the location of places based on
directional estimates can be obtained using a traditional
surveying triangulation procedure called projective convergence. In surveying, the location of a point or an object
may be estimated by using an alidade to sight on the object
from a plane table placed at a known location and then
drawing a projection vector toward the target location
after ensuring that the plane table on which a drawing
is placed is oriented in a standard direction. A convergence
of three directional lines is usually subject to a small
amount of error resulting from the plane table orientation
errors. However, the projected lines create a triangle of
error and the simple process of bisecting each angle of the
triangle to define a midpoint gives a best estimate of the target point's location. In projective convergence, the same logic is applied to cognized direction estimates made from several known vantage points.
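A geometric sketch of the idea: intersect pairs of bearing lines and take the center of the resulting triangle of error (the centroid here is a simple stand-in for the angle-bisection rule; the stations and bearings are invented):

    # Projective convergence: three direction lines, triangle of error, center.
    import numpy as np

    def ray_intersection(p1, t1, p2, t2):
        d1 = np.array([np.cos(t1), np.sin(t1)])
        d2 = np.array([np.cos(t2), np.sin(t2)])
        s = np.linalg.solve(np.column_stack([d1, -d2]),
                            np.asarray(p2) - np.asarray(p1))
        return np.asarray(p1) + s[0] * d1

    stations = [np.array([0.0, 0.0]), np.array([10.0, 0.0]), np.array([5.0, 8.0])]
    bearings = [0.70, 2.20, -1.30]      # estimated directions (radians from east)

    corners = [ray_intersection(stations[i], bearings[i], stations[j], bearings[j])
               for i, j in [(0, 1), (1, 2), (0, 2)]]
    print("best estimate:", np.mean(corners, axis=0))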
Cognizing Layouts
This section deals explicitly with measuring cognized layouts rather than indirectly creating layouts from unidimensional judgments of spatial relations. Thus, the section
starts with discussion of a two-dimensional plane instead
of deriving it or assuming that one can be derived.
Anchors
As knowledge about environments is accumulated, increased understanding or familiarity allows the various structural properties of environmental layouts to be enhanced. Studies of the development of knowledge of large-scale complex environments have shown that, initially, except for a few key points labeled anchor points, reliable information is sparse and structural information is distorted and fuzzy. Repeated excursions (e.g., journeys to work, to shop, or to school) add information on a regular basis and, by so doing, assist in the process of knowing where things are and where they are in relation to other things. Given sufficient exposure, environmental knowing reaches a steady state in which a basic spatial structure becomes evident. Eventually, except for minor adjustments as better knowledge is obtained about less frequently visited places, or as new or important features added to the environment grow to the status of anchor points, the known spatial structure becomes relatively stable. For the most part, researchers have paid little attention to the development of spatial layout knowledge over time; rather, they have assumed that cognitive maps would become stable. Indeed, Blades showed in 1990 that sketch map representations of environmental knowledge remain fairly constant over successive time periods once a stage of adequate learning has been achieved. Thus, when attempting to measure spatial layouts, it is generally assumed that people are familiar with the environment in question.
Table IV Measuring Configurations
Freehand sketch mapping: Measurements made on sketch maps usually consist of counts of individual features, classification of such features into point, line, or areal categories, feature class counts, and the imposition of traverses or profile lines to determine whether features are represented in appropriate linear sequence. Information can be obtained at both the individual (disaggregate) and the population (aggregate) levels. The nonmetric information collected by such devices is often used as an adjunct to information collected by other procedures.
Controlled sketching: The information obtained from various controlled sketching procedures is essentially measured in terms of content classification (e.g., Lynch's elements of paths, edges, districts, nodes, and landmarks). The maps are analyzed to record how many of each type of feature occur. No a priori decisions about what to look for on the sketches are needed; instead, one can use panels of judges or focus groups to determine possible information classes. Sequence and ordering characteristics can be collected by referring features to relative positions with regard to the cued places.
Layout completion tasks: Measures include counts of infilled material by feature class, correct sequencing of features, correct identification of features in segments, and correct identification of nearest neighbor.
Recognition tasks: Measures include correct matching of features. These measures parallel the classic types of measurement models and represent a trend toward using forms of traditional psychometric spatial abilities tests to evaluate people's memory of specific configurations (point, line, and area). Generally called configuration tasks, they emphasize a respondent's ability to recognize which of several possible alternatives displays the same spatial relations as are displayed in a cued display. Typical tasks involve estimating which of a series of alternative building floor plans matches a recalled plan (after several learning trials).
Verbal or written spatial language: Measuring recognition of layouts may require respondents to match one of a series of statements with a verbal statement previously given, or require participants to identify which of a series of layouts would be most appropriately described by a set of logical connectives (such as and, or, and not).
Placement knowledge: Classic nearest-neighbor analysis could be implemented by asking which function (e.g., a named recreation area) was the closest nearest neighbor to a given function. Each of a set of features in turn can represent the anchor of a judgment procedure. Counts of the number of correct responses can be made, and a simple rank correlation can be determined between the actual ordering of nearest neighbors and the cognized responses (see the sketch following this table). If placement is determined via MDS, accuracy measures and fuzziness ellipses can be calculated.
Hierarchies: Measures include the number of matches between actual and cognized spatial neighbors.
Model building: Measures include scoring the model on completeness, the number of correct line segments, number of correct turn angles, sequencing of landmark cues, and orientation with respect to a given frame.
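For the rank-correlation measure mentioned under placement knowledge, a Spearman coefficient is enough; the orderings below are invented:

    # Rank correlation between actual and cognized nearest-neighbor orders.
    from scipy.stats import spearmanr

    actual_order   = [1, 2, 3, 4, 5, 6]       # true nearest-neighbor ranks
    cognized_order = [1, 3, 2, 4, 6, 5]       # a respondent's judged ranks
    rho, p = spearmanr(actual_order, cognized_order)
    print(f"rank correlation = {rho:.2f} (p = {p:.2f})")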
Graphic Representations
Freehand Sketching
The oldest and most widely used of these procedures is
that of freehand sketching. Epitomized by the innovative
work of Lynch in 1960, this procedure requires the production of a freehand sketch map of a given environment (such as a city). Although the resulting information was originally treated as if it were metric, it has become commonly accepted that these sketch maps relied so much on graphicacy skills, and had no production controls (e.g., scales, north lines, or other orienting features), that their use was very limited.
Controlled Sketching
Controlled sketching provides some information to the
respondent before the sketching task takes place. This
may be in the form of one or more key locations (such
as current position and a significant distant landmark)
or it may take the form of providing a scale bar, north
line, and a keyed location. Criticisms of controlled sketch
map approaches usually focus on mistakes of attributing
metric properties to nonmetrically represented features,
underestimating the distribution of individual differences
in graphicacy skills, and confusing preferred drawing
styles with content accuracy. This methodology has been shown to have considerable drawbacks when dealing with children and with different disabled groups (e.g., the blind).
Completion Tasks
The extent of layout information is sometimes measured
using a completion task. In this case, individuals may be
given representations with selected information missing and asked to fill in the omitted features.
Recognition Tasks
Placement Knowledge
Layout knowledge can also be tested in various recall
situations by requiring people to recall relative placement
of features in an environment. Here, after viewing an
environmental representation (e.g., map, photo, slide,
image, model), respondents may be asked to extract the
layout pattern of a particular class of features.
Model Building
Further Reading
Blades, M. (1990). The reliability of data collected from sketch maps. J. Environ. Psychol. 10, 327–339.
Gale, N. D. (1982). Some applications of computer cartography to the study of cognitive configurations. Profess. Geogr. 34, 313–321.
Garling, T., Selart, M., and Book, A. (1997). Investigating spatial choice and navigation in large-scale environments. In Handbook of Spatial Research Paradigms and Methodologies (N. Foreman and R. Gillet, eds.), Vol. 1, pp. 153–180. Psychology Press, Hove, UK.
Golledge, R. G. (1992). Do people understand spatial concepts? The case of first-order primitives. In Theories and Methods of Spatio-Temporal Reasoning in Geographic Space: Proceedings of the International Conference on GIS (From space to territory: Theories and methods of spatio-temporal reasoning), Pisa, Italy, September 21–23 (A. U. Frank, I. Campari, and U. Formentini, eds.), pp. 1–21. Springer-Verlag, New York.
Golledge, R. G. (1999). Human wayfinding and cognitive maps. In Wayfinding Behavior: Cognitive Mapping and Other Spatial Processes (R. G. Golledge, ed.), pp. 5–45. Johns Hopkins University Press, Baltimore, MD.
Hirtle, S. C., and Jonides, J. (1985). Evidence of hierarchies in cognitive maps. Memory Cogn. 13, 208–217.
Kitchin, R. (1996). Methodological convergence in cognitive mapping research: Investigating configurational knowledge. J. Environ. Psychol. 16, 163–185.
Kitchin, R., and Blades, M. (2002). The Cognition of Geographic Space. I. B. Tauris, London.
Montello, D. R. (1991). The measurement of cognitive distance: Methods and construct validity. J. Environ. Psychol. 11, 101–122.
Silverman, I., and Eals, M. (1992). Sex differences in spatial ability: Evolutionary theory and data. In The Adapted Mind: Evolutionary Psychology and the Generation of Culture (J. H. Barkow, L. Cosmides, and J. Tooby, eds.), pp. 533–549. Oxford University Press, New York.
Tobler, W. R. (1994). Bidimensional regression. Geogr. Anal. 26, 187–212.
Wakabayashi, Y. (1994). Spatial analysis of cognitive maps. Geogr. Rep. Tokyo Metrop. Univ. 29, 57–102.
Cognitive Neuroscience
Craig Weiss
Northwestern University Feinberg School of Medicine,
Chicago, Illinois, USA
John F. Disterhoft
Northwestern University Feinberg School of Medicine,
Chicago, Illinois, USA
Glossary
amygdala A structure deep in the brain; coordinates autonomic and endocrine responses in conjunction with
emotional states.
cerebellum A highly foliated part of the hindbrain; coordinates and modulates movement and is involved in learning
motor skills.
cognitive map A representation of the spatial characteristics
of an individuals environment; enables movement directly
from one place to another regardless of starting point.
dendrite The structural part of a neuron; receives and
conveys information to the cell body of a neuron.
galvanic skin response (GSR) A change in electric conductivity of the skin; thought to be due to an increase in
activity of sweat glands when the sympathetic nervous
system is active during emotional states.
hippocampus A structure deep in the temporal lobe of the
brain; receives information from the association cortex, and
is required for the storage of new memories.
in vitro Latin for in glass; refers to the technique of working
with tissue that has been removed from the body and
stabilized in a culture dish using physiological buffers.
neural network The structures and interconnections that
comprise a functional unit within the brain.
nucleus A collection of neurons that are functionally related
and located within a relatively confined area of the brain.
soma The cell body of a neuron; inputs reach the cell body via
extensions known as dendrites, and the output leaves along
the axonal projection.
Cognition is an abstract notion that involves awareness, consciousness, and, perhaps, thinking. The compound term cognitive neuroscience refers, then, to the
study of the neurobiological mechanisms of cognition.
This necessarily involves a multidisciplinary approach
and involves a vast amount of research using model systems for the analysis of cognition. The model system includes the animal model (i.e., the species), the paradigm
(i.e., the task that will be used to invoke and measure
cognition), and the methods of analysis. The study of
cognitive neuroscience includes an analysis of more
than just simple sensory-response properties; it also includes an analysis of the highest orders of neural processing and the problems that arise with changes to the
system, whether due to aging, disease, genetic mutation,
or drug abuse. Changes in behavior due to the latter factors have an enormous impact on society, and society can
thus only benefit from a better understanding of the neurobiological mechanisms underlying cognition. The discussion here concentrates on cognitive neuroscientific
studies of learning and memory, the area of special
expertise of the authors.
Model Paradigms
The rigorous scientific study of cognition has benefited
from objective measures using well-defined tasks and situations. Subjective responses from humans can be valuable, but the responses cannot be independently verified
due to the very nature of subjectivity. Furthermore, the
rich experiences of human interactions confound
Figure 1 Average blink traces for a conditioned mouse (light CS paired with airpuff US) and for control mice presented with the light CS alone or the airpuff US alone.
Eyeblink Conditioning
Eyeblink conditioning typically pairs a tone (100–1000 ms) with a brief (100 ms) puff of air to the cornea (similar to that of a glaucoma test) or a mild shock to the side of the eye. The study participant initially has no response to the CS but blinks in response to the US. After the CS and US are repeatedly paired, the individual starts to respond prior to the onset of the US, such that the peak of the CR is often at the onset of the US. This is dramatic evidence that the individual has learned the temporal relationship between the two stimuli. An example can be seen in Fig. 1. The learning is relatively fast when the CS and US coterminate, and there is strong evidence, especially from Richard Thompson and colleagues, that acquisition of this task is critically dependent only on the cerebellum. For example, decerebrated animals, which have their forebrain disconnected from the brain stem, can still learn the association, whereas rabbits with cerebellar lesions as small as 1 mm³ are unable to learn the association when the lesion is placed properly.
The cerebellar dependence of simple conditioning
may suggest that the task is not cognitive in nature.
However, as shown by John Disterhoft and colleagues,
when the CS and US are separated in time, a memory trace of the CS must be formed to interact with the effect of the US. The simple manipulation of introducing a trace
interval between the CS and US appears to make the task
more cognitive in nature, because the forebrain is required to learn trace eyeblink conditioning when the
stimuli are separated in time beyond some critical interval. This interval may be species specific, but a trace interval of only 500 ms appears to be sufficient to require
the activity of the hippocampus and forebrain for
learning to occur in rabbits and humans (250 ms is sufficient for rats and mice).
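Scoring whether a given trial contains a conditioned response usually reduces to asking whether the eyelid signal exceeds a baseline-derived threshold between CS onset and US onset; this sketch, with invented timing and amplitudes, shows one common way to do it:

    # Classify a trial as a CR if the eyelid signal crosses a threshold
    # after CS onset (skipping a reflexive startle window) and before US onset.
    import numpy as np

    def is_cr(trace, t, cs_on, us_on, baseline_sd, k=4.0):
        window = trace[(t >= cs_on + 0.05) & (t < us_on)]
        return bool(np.any(window > k * baseline_sd))

    t = np.arange(-0.25, 1.0, 0.001)               # 1 kHz sampling, CS at 0 s
    trace = np.where(t > 0.35, np.exp(-((t - 0.5) / 0.08) ** 2), 0.0)
    print(is_cr(trace, t, cs_on=0.0, us_on=0.5, baseline_sd=0.02))  # True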
Another manipulation that is often used to make the task more cognitively demanding is discrimination conditioning. This paradigm uses two different conditioning stimuli such that one CS (the CS+) predicts the US and the other (the CS−) is never paired with the US. An individual that learns this task will respond to the CS+ (the stimulus that predicts the US) but not to the CS− (the stimulus that is not associated with the US). Although this task does not require the hippocampus or forebrain to be learned, an animal with such a lesion will not be able to inhibit CRs when the stimulus contingencies of the experiment are switched, i.e., the old CS+ is now the CS− and vice versa. Learning the reversal of the stimulus contingencies seems to require a flexibility of associative learning that appears to be cognitive in nature and requires that the hippocampus and associated circuitry be engaged for successful acquisition.
Fear Conditioning
Fear conditioning typically pairs a tone (3–8 s) with a brief shock to the feet in order to evoke a fear response.
The fear response of rodents appears as the cessation of
normal exploration and grooming such that respiration is
the only movement observed. After the two stimuli are
paired together in time, the subject freezes in response to
the CS. A considerable amount of evidence from investigators such as Michael Davis, Michael Fanselow, Joseph
LeDoux, and Stephen Maren indicates that the association of the two stimuli is critically dependent on the amygdala, a part of the brain that is involved in modulating
autonomic responses. Like eyeblink conditioning, fear
conditioning can also be made more cognitively demanding by separating the tone and the foot shock in time by
a stimulus-free trace interval. The presence of the trace
interval adds the additional requirement of hippocampal
functioning for animals to learn the association of the tone
and the foot shock.
Although the two conditioning paradigms may not appear to be inherently cognitive in nature, several methodologies (e.g., neuronal recordings and lesions) indicate
that those paradigms require and activate higher order
brain structures that are associated with cognition. Two
other widely used behavioral paradigms that also seem to
be cognitive in nature are spatial navigation and object
recognition.
Spatial Navigation
Spatial navigation requires the integration of numerous
multimodal stimuli in order to acquire a flexible cognitive
map of the environment, so that a study participant
can self-direct to a goal, regardless of the starting point
Figure 2 Rats learn the location of a submerged escape platform in a pool of opaque water by forming a cognitive map of
their environment. Subjects were given four trials per session for
four sessions. Each trial started from a different quadrant of the
maze (chosen in random order). The single trial of each session
that started from the north position (black dot is the target) is
shown. Learning is indicated by the gradual decrease in the path
length. Figure courtesy of A. Kuo with permission.
Object Recognition
Object recognition is the ability to recognize a previously
experienced object as familiar. This familiarity can be
measured by recording the amount of time that a study
participant appears to spend attending to the object. The
object can be inanimate, or it can be another study participant, in which case the task is referred to as social
recognition. Primates are often used to study object recognition, especially the task referred to as delayed nonmatching to sample. This task has trials that are separated
by a delay of several seconds and the primate is rewarded
for picking one of two objects that was not picked on the
previous trial. This task takes advantage of the fact that
primates naturally prefer to explore new objects. This task
is impaired by lesions of the hippocampus and seems
cognitive in nature.
Model Systems
Except for the swimming maze and hole-board, all of the
paradigms discussed so far have been used with humans.
However, the use of animal model systems has led to the
greatest understanding of the neurobiological mechanisms that underlie the different tasks. Eyeblink conditioning was originally done with humans starting in the
1920s. The task was adapted for rabbits by Isidore
Gormezano and colleagues in the 1960s. The rabbit
was selected due to its tolerance for restraint and the
lack of many spontaneous eyeblinks. The rabbit is also
a good size for neurophysiological experiments because
it can support the hardware necessary for extracellular
recordings of neural activity from multiple sites. The disadvantage of the rabbit as a model system is that it has few
other behaviors that can be utilized by cognitive neuroscientists, and it has a relatively long life span, impeding
research on the effects of age on cognition.
More comprehensive descriptions of cognitive behaviors, including eyeblink conditioning, were attempted
by adapting the eyeblink paradigm for rats. This was
done by using a tether to connect the equipment to
the rat so that it could move about relatively freely.
The rat-based model allowed an analysis of age-related
impairments and a comparison of multiple tasks such as
eyeblink conditioning, fear conditioning, and water maze
learning. A combination of tasks is invaluable for validating general properties of cognitively demanding tasks.
Most recently, the eyeblink conditioning paradigm
was adapted for the mouse in order to determine the
effects of specific genes on learning, memory, and cognition. The shift from rat to mouse is technically demanding.
Behavioral Measurements
Eyeblink Conditioning
The parameters and methodologies used to measure cognition vary with the task being used. Eyeblink conditioning, as the name implies, relies on detecting movement of
the eyelids. Many investigators who use rabbits will,
however, use extension of the nictitating membrane
(NM) as a measure of the response, although both responses are often collectively referred to as eyeblinks.
The nictitating membrane, found, for example, in rabbits
and cats, acts as a third eyelid. Animals with this membrane also have an extra set of extraocular eye muscles (the
retractor bulbi muscle) that acts to withdraw the eyeball
into the socket. The membrane normally rests within the
inner corner of the eye and passively extends as the eyeball
is retracted during a blink. Extension of the NM is often
measured with either a minitorque potentiometer or
a reflective sensor.
Minitorque Potentiometers
The potentiometer is usually stabilized above the head of
the animal and the rotating arm of the potentiometer is
connected to a loop of silk that is sutured to the distal point
of the NM. This transducer can be calibrated so that there
is a direct conversion of volts to millimeters of NM extension. Some researchers have also used this transducer
in humans by connecting the rotating arm of the potentiometer to the eyelid with a lightweight adhesive.
Reflective Sensors
The other common method of measuring eyeblinks is to
use an infrared reflective sensor that measures changes in
reflected light as the NM extends and withdraws across
the eye. This device combines an infrared light-emitting
diode (LED) and a phototransistor into one package. The
LED is used to aim invisible infrared light at the cornea
and the phototransistor is aimed to detect the reflected
energy. This device does not require any physical attachment to the NM or eyelids, and is often calibrated in
terms of volts per maximum blink. Newer models of the minitorque potentiometer are also based on a reflective sensor, rather than wire windings, so that movement of the potentiometer spins a polarizing filter relative to a fixed element, varying the light that reaches a detector.
Fear Conditioning
As stated previously, the fear response of rodents is
apparent in the cessation of normal exploration and
grooming, such that respiration is the only movement observed. This measure was originally recorded with
a stopwatch by trained observers. The time spent in
a freezing posture relative to the total time of observation
would yield a score of percent freezing. Better learning was
indicated by a greater percent of freezing to the conditioning stimulus. More recently, computer software has been
developed to detect movement either by video tracking or
by comparisons of adjacent video frames (if the two frames
are identical, there is no movement). The computer system
has the advantage of being objective, extremely sensitive to
small movements, vigilant in detection, and able to record
and store the data accurately.
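The frame-comparison idea is simple to sketch; the thresholds and frame sizes below are invented, and a real system would add smoothing over time:

    # Percent freezing from a stack of grayscale video frames (time, h, w):
    # if adjacent frames barely differ, the animal is counted as immobile.
    import numpy as np

    def percent_freezing(frames, motion_threshold=5.0):
        diffs = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
        return 100.0 * np.mean(diffs < motion_threshold)

    rng = np.random.default_rng(7)
    frames = rng.integers(0, 255, size=(100, 48, 64))   # hypothetical video
    print(percent_freezing(frames))                     # noise: ~0% freezing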
Figure 3 Examples of eyeblink data recorded with electromyographic (EMG) electrodes and infrared reflective sensors. Similar equipment can be used across species for a comparative analysis of learning using model systems. The horizontal axis represents time in milliseconds. Figure courtesy of J. Power with permission.
Another technique to assess fear conditioning has been to detect changes in heart rate activity. This involves the
implantation of two subdermal electrodes. The study animal is connected by wires during each training or testing
session. Heart rate measurements are especially useful for
animals that have few baseline movements, such as rabbits
or aged rodents. Heart rate activity can be easily recorded
and presentation of the fear-eliciting stimulus is often
accompanied by a transient decrease in heart rate.
More recent procedures involve telemetry systems that
receive a signal from a surgically implanted transponder.
There are no leads connected to the animal with this
system.
Fear conditioning experiments have also been done
with human study participants. The individuals are
usually told that they will be presented with tones and
a mild shock, but they are not told that the tone predicts
the onset of the shock some seconds later. They can also
be instructed to use a dial to indicate their ongoing expected probability of being shocked. This methodology
has been used by Fred Helmstetter and colleagues, who
find that the study participants quickly learn to expect the
shock at the appropriate interval after the conditioning
tone. Behavioral measurement of fear conditioning in
humans is done with the galvanic skin response (GSR),
which is also used in lie detector tests.
Spatial Navigation
As mentioned previously, the most widely used task for
spatial navigation is the water maze. Learning in this task
was originally measured with a simple stopwatch, i.e., the
time it took for the animal to find the escape platform was
recorded. Decreases in this latency indicated learning and
an awareness of the environment. Computer software that
has since been developed utilizes video tracking. These
computer systems not only record the time for the animal
to find the target, but they also record and measure the
path that the animal took to get to the target. This allows
an analysis of path length, speed, and latency, as well as
measures of proximity to the target. The path length measure has been a valuable tool to discriminate differences
due to motor/swimming abilities from differences due to
the formation of a cognitive map of the environment, i.e.,
recognizing the location in the environment. Many investigators also introduce a probe trial, in which the escape platform is removed, in order to determine how well the platform's location was learned.
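From tracked coordinates, the basic measures fall out of a few array operations; the spiral track below is a stand-in for real tracking data:

    # Path length, latency to the platform, and mean speed from (x, y) tracking.
    import numpy as np

    def swim_measures(xy, dt, target, radius=0.05):
        steps = np.linalg.norm(np.diff(xy, axis=0), axis=1)
        near = np.linalg.norm(xy - target, axis=1) < radius
        latency = np.argmax(near) * dt if near.any() else np.inf
        return steps.sum(), latency, steps.sum() / (len(xy) * dt)

    t = np.linspace(0, 30, 900)                          # 30 s at 30 Hz
    xy = np.column_stack([np.cos(t), np.sin(t)]) * np.exp(-t / 10)
    print(swim_measures(xy, dt=1 / 30, target=np.array([0.0, 0.0])))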
Electrophysiological Measures
of Learning
Electrophysiological correlates of learning have been
sought in order to locate the sites and patterns of activity that might mediate cognition. The general idea is that neural activity that changes in step with learning marks circuitry worth closer study.
Extracellular Recordings
The first experiments to detect electrophysiological correlates of learning and cognition used microelectrodes
with an exposed tip about 0.5 mm in length. These
electrodes recorded the extracellular electrical activity
of numerous neurons within the vicinity of the recording
tip. The activity would typically be gathered by setting
a threshold and recording a time point whenever the activity crossed the threshold. This technique works well
when the neuronal population is homogeneous in function, but the activities of individually isolated neurons are
necessary to understand the function of a region with
heterogeneous responses. Consider an example in
which there are two different cell types with responses
in opposite directions, i.e., one type shows increased activity and the other shows decreased activity. The net sum
of this activity as recorded with a multiunit electrode
would indicate no response, when in fact both cell
types were responding. The solution to this problem is
to record the activity of single neurons. This type of analysis often reveals that a population of neurons has heterogeneous responses, and the different response types can
be quantified in terms of percent representation.
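The multiunit counting method amounts to timestamping threshold crossings; in this sketch the noise level, threshold, and embedded spikes are all fabricated:

    # Spike times as rising-edge threshold crossings of an extracellular trace.
    import numpy as np

    def spike_times(signal, fs, threshold):
        above = signal > threshold
        crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1
        return crossings / fs                      # seconds

    rng = np.random.default_rng(8)
    v = rng.normal(0, 10e-6, size=20000)           # 1 s at 20 kHz, 10 uV noise
    for i in (4000, 9000, 15000):
        v[i:i + 20] += 60e-6                       # crude embedded "spikes"
    print(spike_times(v, fs=20000, threshold=40e-6))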
The activity of single neurons is now recorded by using
electrodes or microwires with much smaller exposed tips
(0.05 mm or less) or by using waveform identification
techniques to cluster the activity according to groups
with similar waveforms. This technique is based on the
idea that each neuron exhibits a characteristic waveform
based on its biophysical properties and its distance from
the electrode; an analysis of subthreshold activity requires
intracellular recordings (see later). More recently, a stereotrode, or closely spaced pair of electrodes (or a tetrode, a bundle of four), has been used to record the same neurons from slightly different positions so that their waveforms can be separated more reliably.
Intracellular Recordings
Electrophysiological recordings can also be made from
individual neurons by either penetrating the neuron with
a sharp micropipette electrode or by attaching the tip of
a micropipette to the surface of a neuron with slight suction. A few researchers use this technique in living animals. They rely on coordinates and characteristic
electrophysiological signals to indicate when the electrode is near a target. A sudden drop in noise and
a recorded voltage near the resting potential of neurons (about −65 mV) indicate when the electrode has penetrated a single neuron. However, most studies with
intracellular recording have been done using in vitro
brain slices. These are slices of tissue that have been
cut from an extracted brain and stabilized in laboratory
containers with appropriate physiological buffers. These
slices remain viable for several hours under proper
conditions.
In vitro slices are often used to study the hippocampus
because it is involved with memory functions and because
its network properties can be isolated within a single slice
(consider a banana, and how each slice is similar to
another). The advantage of intracellular recording
is that it allows measurement of subthreshold changes
in voltage or current, and the electrode can be aimed
at either the soma or a dendrite when the slice is visualized
with an appropriate microscope and the electrode is
advanced with a micromanipulator (see Fig. 4). Furthermore, the results from these slice experiments can be
interpreted as being intrinsic to the structure being studied because it is physically disconnected from the rest of
the brain. This also applies to studies on the effects
of drugs, i.e., any effect of the drug must be acting on
Figure 4 Intracellular recordings from trained and control preparations (resting potential near −65 mV; time scale, 100 ms).
Imaging Cognition
A fishing trip through the brain in search of cognition could be avoided if some sort of radar were available to narrow down the target area. This so-called radar is now available in the form of different neuroimaging techniques. The technique that presently has the greatest
spatial and temporal resolution is functional magnetic
resonance imaging (fMRI), which relies on differences
in the magnetic susceptibility of oxygenated and deoxygenated blood. This blood oxygen level-dependent
(BOLD) response can be followed in time while
a subject learns or performs a task. The response can
then be superimposed on a detailed image of the brain
so that functional activity (increases or decreases in the
BOLD response) can be localized. The technique is most
often used with human study participants because they
can keep their head very still, follow directions, give verbal
feedback, and indicate responses with some sort of detector (e.g., a keystroke). Some experiments have been
done recently with monkeys, but they are rather precious
and require lots of training, some sedation, and extreme
care to maintain in good health.
A simple animal model for neuroimaging of cognition is again based on the rabbit. Alice Wyrwicz, John Disterhoft, Craig Weiss, and colleagues took advantage of the rabbit's natural tolerance for restraint and adapted
the eyeblink conditioning paradigm for the MRI environment. This paradigm allows the detection of functionally
active regions throughout most of the brain while the
rabbit is awake, drug free, and learning a new task. The
results so far have confirmed the involvement of the hippocampus and cerebellum in simple delay conditioning,
and have revealed specific regions that should be explored
more carefully with electrophysiological techniques to
understand fully the neural mechanisms that mediate
cognitive processes. An example of activation in the visual
cortex and hippocampus can be seen in Fig. 5.
[Figure 5: functional activation maps at baseline and after conditioning.]
Further Reading
impairments in aging animals. Neurobiol. Learn. Mem. 80(3), 223–233.
Disterhoft, J. F., Carrillo, M. C., Fortier, C. B., Gabrieli, J. D. E., Knuttinen, M.-G., McGlinchey-Berroth, R., Preston, A., and Weiss, C. (2001). Impact of temporal lobe amnesia, aging and awareness on human eyeblink conditioning. In Neuropsychology of Memory (L. R. Squire and D. L. Schacter, eds.), 3rd Ed., pp. 97–113. Guilford Press, New York.
Gerlai, R. (2001). Behavioral tests of hippocampal function: Simple paradigms, complex problems. Behav. Brain Res. 125(1–2), 269–277.
LaBar, K. S., and Disterhoft, J. F. (1998). Conditioning, awareness, and the hippocampus. Hippocampus 8(6), 620–626.
Loeb, C., and Poggio, G. F. (2002). Neural substrates of memory, affective functions, and conscious experience. Adv. Anat. Embryol. Cell Biol. 166, 1–111.
Cognitive Psychology
Johan Wagemans
University of Leuven, Leuven, Belgium
Glossary
architecture of cognition The basic components of the
cognitive system in terms of the representations and
computations that are used in the different mental
functions.
behaviorism Psychology as the science of behavior, limited to
observable entities such as stimuli and responses, and their
associations.
brain imaging The technique to visualize localized brain
activity corresponding to particular mental processes.
cognitive neurosciences The multidisciplinary consortium
of sciences devoted to mind-brain relationships.
cognitive sciences The multidisciplinary consortium of
sciences devoted to the understanding of the mind and
the mental functions.
computational (information-processing) approach The
attempt to understand mental functions (e.g., perception)
as processes (computations) on internal (symbolic) representations.
ecological approach The attempt to understand mental
functions (e.g., perception) without intermediate processes
but in relation to the behaving organism with its natural
environment.
functionalism The philosophical view behind the information-processing approach, positing that mental processes can be
understood at a functional level, detached from their
hardware implementation in the brain.
neural network models Models of mental functions that rely
on the properties of the brain as a huge network of highly
interconnected simple units.
Cognitive psychology can be characterized in two ways. First, it can be situated historically, as a particular theoretical framework in psychology developed in the 1960s and 1970s that is currently being subsumed under larger multidisciplinary sciences such as the cognitive sciences and the cognitive neurosciences. Second, the domain of cognition can be sketched, both in its narrow sense, including strictly cognitive phenomena, and in its broader sense, including also other phenomena such as attention and performance, perception and action, and emotion and motivation.
In the early days of psychology as an independent discipline, the internal processes were studied by means of
introspection: Trained observers were asked to report the
contents of their own thoughts and the dimensions of their
own subjective experiences. However, this method was soon criticized as unreliable.
trace over time (e.g., when whispering a telephone number between hearing it and dialing it); and (3) a long-term
memory system, with an unlimited capacity (lifelong
learning!) and unlimited duration (reactivation of apparently forgotten material often appears possible), which is
mostly based on a semantic code (i.e., the meaning is
stored, not the details). A second example of this type
of research concerns the format of the internal representations. This issue has given rise to one of the classic
controversies in cognitive psychology, known as the
mental imagery debate. One position holds that there
is a single, universal representation format for all kinds
of information (i.e., encoding all information in a propositional format or language of thought, to borrow a term
introduced by Fodor in 1975). The alternative view (defended strongly by Kosslyn and Paivio) proposes that visual information is not recoded as propositions but is
maintained in an analogue representation format, giving
rise to the idea of mental images inspected by the
mind's eye (e.g., mental rotation).
Another issue that has attracted a lot of interest and
research effort is whether certain internal operations are
performed sequentially (in serial stages) or simultaneously
(in parallel). This type of research has led to a wide variety
of box-and-arrow models of the intermediate mental
processes in-between input stimuli and output responses.
All of these can be taken as attempts to carve up the mind
into its basic building blocks or modules (to borrow another term introduced by Fodor, in 1983), or, more generally, to discover the basic architecture of cognition
(i.e., the title of the 1983 book by J. R. Anderson, one
of the champions of this approach).
The problem with this whole approach is, of course,
that none of these cognitive mechanisms can be observed
directly. Researchers must design clever experiments,
with well-chosen experimental and control conditions,
as well as manipulated and registered variables, to derive
the internal representations and processes from observable responses such as response times and error rates.
These "chronometric explorations of mind" (the title of a book by Posner in 1978) have had some success in some
domains, but it is not the case that universally accepted
unified theories of cognition have resulted from this
endeavor (despite valuable attempts in that direction
by pioneers such as Newell, in his last book in 1990).
and when deciding to perform certain actions (i.e., output). In-between these peripheral input and output devices are central cognitive mechanisms that operate on
internal representations. A critical assumption of this approach is that these internal representations and
processes can be understood at the functional level, without consideration of the hardware level (i.e., the neural
mechanisms implementing them).
This assumption constitutes the heart of the so-called
computational approach to cognition, which is the idea
that cognition is, literally, computation (defended strongly
by Pylyshyn in his 1984 book). This approach thus goes
beyond the computer metaphor in claiming that all cognitive processes can be understood as computations on
internal, symbolic representations. At the functional or
software level, the human mind works in just the same
way as the computer; it is only at the hardware level that
they differ, but that is essentially irrelevant to their operation as information processors. This philosophical position, known as functionalism (i.e., a modern version of
mind-brain dualism), has implications for the relation
of cognitive psychology to artificial intelligence and its
position within the cognitive (neuro)sciences (see the
section below).
Central Cognition
Problem Solving, Decision Making, and
Reasoning
In their highly influential book on this topic, Newell and
Simon studied human problem solving by asking people
to think aloud while solving problems and by attempting
to reproduce the results and the processes in computer
simulations. They introduced the notion of a problem
space, consisting of the initial state of the problem, the goal state, and the set of operators that transform one state into another.
processing mechanisms that are involved. When computer or robot vision and natural or biological vision
are investigated (as Marr does), it becomes immediately
clear that the issue of intermediate mechanisms that detect and process the incoming information is far from
trivial. The computational approach is a bottom-up approach, starting from the information, extracting as much
as possible from it by some well-chosen algorithms, and
representing the results in intermediate representations,
before reaching the final stage of conscious perception of
meaningful objects, scenes, and events. Research has
shown that a whole series of intermediate processes
and representations are needed to achieve tasks such as
object recognition.
This comes back again to the notion of information
processing, as in traditional cognitive psychology, but
with two important differences: (1) the mechanisms are
now spelled out in much more detail (they are written as
algorithms, implemented in a computer, to be simulated
and tested as models for human vision) and (2) world
knowledge stored in memory comes into play only as
late as possible. In light of these similarities and differences
with standard cognitive psychology, it is clear that important issues of empirical research are (1) to test the psychological plausibility of these algorithms (see Palmer's Vision Science: Photons to Phenomenology) and (2) to investigate when cognition penetrates perception (see Pylyshyn's paper in The Behavioral and Brain Sciences in 1999).
Emotion and Motivation
Emotion and motivation are normally not included as core
domains within cognitive psychology, but it is useful to
point out that they, too, have cognitive aspects. What humans feel is at least partly influenced by what they know,
and mood has well-known influences on memory and
thought processes. Regarding motivation, it is clear that
knowledge is quite important in regulating and planning
behavior in trying to reach goals, both lower level, physiological goals (such as satisfying hunger or sexual appetite) and higher-level, cultural goals (such as establishing
a successful career and marriage, or enjoying a great novel,
movie, theater play, or opera). Schank and Abelson's 1977 book (as mentioned earlier) includes quite a bit of interesting cognitive psychology on this topic.
society starting in 1977) until today (e.g., The MIT Encyclopedia of the Cognitive Sciences in 2001). Along with
cognitive psychology, there exists a variety of disciplines,
some of which are associated also with the early pioneers
of cognitive science (names in parentheses): cybernetics
(Wiener), computer science (Turing, von Neumann),
artificial intelligence (Newell and Simon, Minsky,
McCarthy), neural networks (Hebb, McCulloch and
Pitts, Rosenblatt), linguistics (Chomsky, Lakoff, Jackendoff), philosophy (Putnam, Fodor, Dretske, Dennett),
cognitive anthropology (Dougherty, D'Andrade), and cognitive ethology (Griffin, Allen and Bekoff). Because cognitive psychology is a behavioral discipline, with a subject matter belonging to the humanities but a research methodology at least partly inspired by the natural sciences and engineering, it has fulfilled an important bridging function within this consortium. The functionalist philosophy and the computational framework behind
most of the work in cognitive psychology have provided
the foundation for this crucial role within this network of
interdisciplinary collaborations.
It gradually became clear that the analytic approach of decomposing intelligence into its basic building blocks and
then putting the building blocks together in a computer
model was doomed to failure. The radical alternative was
to have intelligence without representation (as Brooks
called it in an influential paper in Artificial Intelligence) or
to let intelligence grow into more natural circumstances by
building small, insectlike robots and studying, from a more
synthetic approach, how they develop survival strategies
and solve realistic problems, in interaction with their environment. In other words, rather than isolating the mind
from the brain, and the cognitive system from its environment, intelligence became embodied again and cognition
became situated in its context again. In artificial intelligence, this new trend has led to the emergence of a new
field called artificial life. It appears that cognitive psychology will have to collaborate more with social scientists
and biologists than with computer scientists.
Conclusion
It may appear that the pendulum has started to swing back
again in the direction of less internal cognition and more
externally guided action, after almost half a century of
dominance of cognitivism over behaviorism. However,
with the experience of the disadvantages of the one-sided exaggerations of both approaches, it should be
possible to achieve a better synthesis. Finally, with the
increased understanding of the brain mechanisms that mediate all of the interactions between the mind and
the world, cognitive theories can now be built on firmer
ground. Scientists from all of the contributing disciplines,
including the social scientists, should work together to
try to understand human cognition in its broadest
possible sense.
Further Reading
Baddeley, A. D. (1999). Essentials of Human Memory.
Psychology Press, Hove, United Kingdom.
Bechtel, W., and Graham, G. (eds.) (1999). A Companion to
Cognitive Science. Blackwell, Oxford, United Kingdom.
Coren, S., Ward, L. M., and Enns, J. T. (1999). Sensation and Perception, 5th Ed. Harcourt Brace College Publishers, Fort Worth.
Eysenck, M. W. (2001). Principles of Cognitive Psychology,
2nd Ed. Psychology Press, Hove, United Kingdom.
Gazzaniga, M. S., Ivry, R. B., and Mangun, G. R. (2002).
Cognitive Neuroscience: The Biology of the Mind, 2nd Ed.
Norton, New York.
Harnish, R. M. (2002). Minds, Brains, Computers: An
Historical Introduction to the Foundations of Cognitive
Science. Blackwell, Malden, Massachusetts.
Lamberts, K., and Goldstone, R. (eds.) (2004). Handbook of
Cognition. Sage, London.
Osherson, D. N. (ed.) (1995). An Invitation to Cognitive
Science, 2nd Ed. Bradford Books/MIT Press, Cambridge,
Massachusetts.
Parkin, A. J. (2000). Essential Cognitive Psychology. Psychology Press, Hove, United Kingdom.
Pashler, H. (ed.) (2002). Stevens' Handbook of Experimental Psychology. Wiley, New York.
Reisberg, D. (2001). Cognition: Exploring the Science of the
Mind, 2nd Ed. Norton, New York.
Sternberg, R. J. (ed.) (1999). The Nature of Cognition.
Bradford Books/MIT Press, Cambridge, Massachusetts.
Styles, E. A. (1997). The Psychology of Attention. Psychology
Press, Hove, United Kingdom.
Cognitive Anthropology

Glossary
computational Describing types of models or approaches
based on actually implemented computer programs, whereby the formal representation is embodied in the computer
program and the output of computer runs can be compared
with empirical data.
ethnoscience Early cognitive anthropology, focused on
studying the semantics of folk knowledge systems with
a methodology from structural linguistics.
formal (models, features, or descriptions) Entities or
statements that are explicitly and precisely specified, often
via mathematical representations.
instantiation The application of an abstract but well-defined
plan (such as a computer program) to a particular concrete
situation (cf. realization in linguistics, which refers to
a filled-out concrete instance of an abstract category).
kinship terminological system The kinship terms of
a language, organized and analyzed as a distinct system.
reference and contrast An assessment in which reference
refers to the semantic relationship of words or cognitive
categories to phenomena in the world that the words or
cognitive categories represent; contrast refers to the
semantic relationship of words or cognitive categories to
other words or categories.
semantic component A dimension of contrasting attributes,
whereby the attributes distinguish members of one
semantic category from another.
similarity matrix A matrix of values that represent the
similarity of each item on a list to each other item. If
similarity is taken as symmetric (as it usually is), then the
information is all contained in half of the matrix (a halfmatrix).
triads test A way of arriving at a similarity matrix by
constructing a list of all possible triples from the list of
items, and then having informants for each triple indicate
which two out of the three items are most similar. Summing
the pairwise similarity scores across triples produces
a similarity matrix. Balanced block designs have been constructed that allow use of only a subset of the possible triples.
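As a rough illustration of the tallying step, the following Python sketch (with hypothetical items and informant picks; a real study would also use a balanced design and many informants) turns triads judgments into pairwise similarity scores:

```python
from itertools import combinations
from collections import defaultdict

items = ["uncle", "father", "cousin", "brother"]  # hypothetical domain
all_triples = list(combinations(items, 3))        # the full triads design

# Hypothetical responses: for each presented triple, the pair an
# informant judged most similar (one entry per informant per triple).
picks = [
    (("uncle", "father", "cousin"), ("uncle", "father")),
    (("uncle", "father", "brother"), ("father", "brother")),
    (("father", "cousin", "brother"), ("father", "brother")),
]

# Summing the "most similar" choices across triples and informants
# yields the similarity half-matrix.
similarity = defaultdict(int)
for triple, pair in picks:
    similarity[frozenset(pair)] += 1

for pair, score in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), score)
```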
Background
The study of cognitive anthropology first emerged in 1956
when Ward Goodenough challenged John Fischer's description and analysis of household composition patterns on the island of Truk. Fischer had used traditional anthropological definitions of concepts such as matrilocal and patrilocal residence (based on whether a new couple were living with the wife's or husband's parents), while
Goodenough argued that, for the Trukese whose land and
houses were owned by corporate matrilineal descent
groups, one could understand residence decisions only
in terms of the options Trukese faced and the rights that
created these options (i.e., in effect, with whose matrilineage the new couple lived). Goodenough argued that
Trukese decisions could be understood only in terms of
the categories (and entailments of categories) they recognized as part of their language and culture. Goodenough's goal was one that goes back to Boas and his students, but
his approach represented a radical departure. This approach was based on the linguistic insight most forcefully
driven home by Edward Sapir (in articles on "Sound Patterns in Language" and "The Psychological Reality of Phonemes") that effective descriptions of languages (e.g., ones unknown to the linguist or the linguistic community) had to be based on the phonological (sound) and grammatical categories and distinctions used by speakers of that language, as opposed to external categories based on the linguist's own language or on some
abstract or philosophic standard. The distinguishing of
the phoneme, the minimal cognitive unit of sound in
a language that distinguishes one meaningful unit (e.g.,
word) from another, from a phone type, a physical
sound unit that linguists use to describe the sounds of
a language, had an importance for linguistics comparable
to the importance for physics of the distinction of mass
from weight; it allowed a compact, formal, systematic
characterization of the sounds of a language, and thus
of the constraints on their patterning and combining
into larger units. Goodenough argued that cultures, like
languages, were systems that could be understood only via
an analysis that was built on the conceptual units used by
their members.
In linguistics, the analysis in terms of phonemes of the
sound system used by speakers of a language was called
phonemic, whereas the description or analysis of
a languages sounds in terms of some external classification scheme, such as the International Phonetic Alphabet
(IPA), was called phonetic. Those terms and the distinction they imply had already been generalized to culture by Kenneth Pike in 1954 as emic and etic, and
analyses such as that advocated by Goodenough came to
be spoken of as emic. In Pike's linguistically derived sense, emic referred to an analysis of knowledge and/or behavior that was structured by the categories that native participants used in making their decisions regarding distinctions or actions, whereas etic referred to an analysis structured by categories brought by the describer from outside the given system; the anthropologist Marvin Harris much later introduced a different, confusing, but ultimately more widely known interpretation of the terms, in which emic referred to a folk theory and contrasted with an etic or scientific theory.
The goal of the approach that grew in response to Goodenough's discussion of residence rules was an emic approach to the description and analysis of culture. Development of the approach began with the area of shared knowledge closest to linguistics, that of the overlap between language and culture: the semantics of word meanings. Groups of semantically related words (or the conceptual universe they referred to) were spoken of as semantic domains.
Frame Eliciting
In frame eliciting, native statements on some topic of
ethnographic interest are elicited, and then sentences
from the statements are turned into frames by replacing
meaningful entities in them with blanks; lists of potential
culturally appropriate fillers of the blanks are then
elicited. These statements typically came from a record of less formal questioning involving versions of "what's that?," "what is it used for?," "who does it?," and so forth.
A statement such as "X is a kind of M" would then be turned into a question, as in "Is ____ a kind of M?," where other elicited entities (such as Y, C, D, etc.) would be used in the blank. Often taxonomic questions would also be involved, such as "what kinds of X are there?" and "what is X a kind of?," and sometimes attribute questions, such as "how do you tell X from Y?" or "what is the difference between X and Y?" Categories with similar privileges of
occurrence are grouped together and the features on
which they contrast with one another are examined, as
are their differential implications for further associations
or actions. Similarly, the relations between superordinate
and subordinate categories (in terms of inclusion
relations) are explored. The method is best described
by Frake in his "Notes on Queries in Ethnography" article.
Componential Analysis
In componential analysis, an exhaustive set of referents of
each of a set of contrasting terms (a domain) is assembled.
Each referent is characterized on a list (ideally, a complete
list) of attribute dimensions that seem relevant. The classic example was in kinship, wherein the contrasting terms were kinterms and the referents were kintypes: genealogical strings connecting ego (the reference person) and alter (the relative of ego labeled by the kinterm); attribute dimensions included information such as sex of alter, alter's generation relative to ego, and lineality (alter on, next to, or off ego's direct line of ancestors and descendants); this is discussed in Wallace and Atkins' "The Meaning of Kinship Terms." Relevance can be assessed on the basis of prior comparative experience.
Data Structures
Paradigmatic Structures
A componential analysis aims at producing one kind of
cognitive structure, a paradigm. Paradigmatic structures are composed of a set of terms (or categories), all
contrasting with one another at a single level of contrast
(like opposites, but with the possibility of being
multinary); the terms are distinguished from one another
by a set of cross-cutting semantic dimensions (i.e., sets of
contrasting attributes, as in Fig. 1: m: m1 vs. m2, x: x1 vs. x2,
y: y1 vs. y2). A paradigmatic structure is like a cross-tabulation in statistics. In principle, each dimension is relevant to each term. Figure 1 shows the actual structure,
but the information can also be presented in tabular form,
as in Table I.
Both Occam's razor and relevant psychological studies
suggest that each of the categories in a paradigmatic structure should be conjunctively defined, but for folk categories the mechanism that produces conjunctivity is
a probabilistic one depending on cognitive ease,
which does allow for the possibility of some disjunctivity
where there exists a good functional reason and there is
sufficient effort invested in relevant learning (as, e.g., with
strikes in baseball). Cognitive ease refers in a general and loosely defined way to the mental work involved in learning and using (in this case) a category, and conjunctivity is one well-defined version or aspect of it. Conjunctivity refers to the intersection (as opposed to the sum) of the defining attributes of a category.
[Figure 1: a paradigmatic structure; terms A through H are distinguished by the cross-cutting dimensions m (m1 vs. m2), x (x1 vs. x2), and y (y1 vs. y2).]
Table I  Tabular Presentation of a Paradigmatic Structure

Term    m     x     y
A       m1    x1    y1
B       m1    x2    y1
C       m2    x1    y1
D       m2    x2    y1
E       m1    x1    y2
F       m1    x2    y2
G       m2    x1    y2
H       m2    x2    y2

a As defined in Fig. 1.
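The cross-tabulation character of a paradigm can be checked mechanically. The short Python sketch below, using the dimension values of Table I, verifies that every combination of dimension values is realized by exactly one term:

```python
from itertools import product

# Terms of Fig. 1 / Table I, each defined as a conjunction of one value
# from each of the three cross-cutting dimensions.
paradigm = {
    "A": ("m1", "x1", "y1"), "B": ("m1", "x2", "y1"),
    "C": ("m2", "x1", "y1"), "D": ("m2", "x2", "y1"),
    "E": ("m1", "x1", "y2"), "F": ("m1", "x2", "y2"),
    "G": ("m2", "x1", "y2"), "H": ("m2", "x2", "y2"),
}
dimensions = [("m1", "m2"), ("x1", "x2"), ("y1", "y2")]

# A fully paradigmatic structure fills every cell of the
# cross-tabulation exactly once.
cells = set(product(*dimensions))
assert set(paradigm.values()) == cells
assert len(paradigm) == len(cells)
print("paradigmatic: every dimension combination names exactly one term")
```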
Taxonomic Structures
Another kind of structure is a taxonomic one. Taxonomic structures are composed of a set of terms at different levels of inclusion, related to one another by
a hierarchy of contrast (e.g., cat vs. dog) and inclusion
relations (cats and dogs are both carnivores) (see
Fig. 2). Such structures have a head term, which includes
some number of subordinate terms, which in turn may
each include terms at still lower levels of contrast. Dimensions of contrast are not normally repeated; successive
dimensions subdivide (rather than crosscut) prior or
higher level ones. Each semantic dimension (in Fig. 2:
m: m1 vs. m2, x: x1 vs. x2, y: y1 vs. y2) is only pertinent to the node at which it occurs in the system; that is, to the subset of items covered by the superordinate term it subdivides (e.g., whatever it is that distinguishes cats from
dogs is irrelevant to horses vs. cows). Figure 2 shows
the actual structure, but the information can also be presented in tabular form, as in Table II.
The clearest examples of taxonomies in the literature
have been ethnobotanical and disease domains. The minimal example of a taxonomy is a set of two terms contrasting with one another under a single head term. Taxonomic
categories, like other folk categories, should in the normal
course of events be conjunctively defined, and the conjunctivity constraint can be a powerful tool when attempting to analyze and understand systems of folk categories.
[Figure 2: a taxonomic structure; head term M is divided by dimension m (m1 vs. m2) into X and Y, which are in turn subdivided by dimensions x and y.]
Mixed Structures
Mixed structures are possible; either of the preceding kinds
of structure can, in principle, be embedded as a node in the
other. That is, an item in a taxonomy could expand multidimensionally (instead of as the normal simple contrast),
whereas an item in a paradigm could be the head term
for a taxonomy. A fruitful research area concerns the
combination of cognitive conditions, conditions of use of
the terms, and nature of the pragmatic phenomena being
categorized by the terms in question that lead to one or
the other kind of semantic structure. Psychological issues,
including concept formation strategies and constraints,
short- vs. long-term memory, rehearsal time and
learning, and so forth, seem relevant to the issue.
Marking
Taxonomic semantic structures are relations among
named entities. But sometimes one name or label will
operate at more than one level of contrast. For instance,
cow can be opposed, as a female, to the male bull, but
cow can also be the cover term for bovines that includes
both cows and bulls. The relation between the words
cow and bull is an example of a contrast between an
unmarked term (cow) and a marked term (bull),
whereby the unmarked term is both opposed to the
marked one and included by it. Such marking relations
can be joined into more extensive marking hierarchies,
as man vs. animal, whereby man includes man vs.
woman, which, in turn, respectively include man vs.
boy and woman vs. girl. The unmarked member of
such an opposition represents a kind of default value for
both members of the opposition when the feature on
which they contrast is neutralized. For cows, the functional basis of the default is simple: the great number of
cows for each bull in a dairy herd. For men and women,
the basis is perhaps more interesting: having to do with women's long-time lack of civil and other rights and
privileges. Marking is extensively seen in ethnobiological
classifications, and indeed, according to Brent Berlin,
Table II  Tabular Presentation of a Taxonomic Structure

Term    m     x     y
A       m1    x1    -
B       m1    x2    -
C       m2    -     y1
D       m2    -     y2
X       m1    -     -
Y       m2    -     -
M       -     -     -

a Dimensions as defined in Fig. 2.
provides a major mechanism via which such systems develop (a taxonomic term is used not just for its primary referents, its unmarked sense, but also as a generic cover term for other similar items; if the other items
have their own specific labels, these will be rarer marked
alternatives to the unmarked default, on the model of
bull). Marking was first worked out for phonological systems by N. Trubetskoy in 1939 (in a work translated as Fundamentals of Phonology in 1969), and then broadly extended by J. Greenberg (1966) to other linguistic phenomena; marking seems to show up widely, as well,
in shared nonlinguistic cognitive systems. It offers
a mechanism by which an existing cognitive category
can be used as a kind of generic to include novel referents,
and thus by which simple systems of classification can
evolve into more complex ones, as the novel referents
eventually get their own specific category under the
older generic.
Semantic Extension
Marking provides examples, as just seen, in which not all
referents of a term (or of a nonverbal cognitive category)
may be equally central for their users, or as well understood. In general, for word use at least, a great amount of
our usage is of words for referents that are not the primary
referents of the words. A normal process in word use is
semantic extension, whereby entities (objects, actions,
attributes, etc.) that do not have their own special labels
are spoken of via the label of some similar entity; the
choice of label depends on how the oppositional structure
of the semantic domain chosen matches the relations between the target entity and other entities from which it is
being distinguished. Such extension can be denotative
(based on attributes of form or appearance), connotative
(based on functional attributeswhat makes the category
useful), or figurative (metaphoric, metonymic, etc.). The
referent to which the label primarily belongs is spoken
of variously as the focal, core, kernel, or prototypic
referent. Prototypicality seems to be a result of
a combination of frequency of usage (what referents
learners most frequently encounter) and functional fit
(what referents best fulfill whatever it is that makes the category important enough to get its own unit or label). For instance, in English, the term uncle prototypically refers to a parent's brother, but it also (entirely literally and correctly) can be used for an aunt's husband (and, sometimes, as an unmarked term for a great uncle, i.e., a grandparent's brother). It can be used connotatively for other close males with whom there is interaction in the way interaction should occur with an uncle; these can include (depending on context and situation) a parent's cousins, family friends of the parents' generation, and so forth. Such usage is not considered technically correct, as can be demonstrated methodologically.
Types of Models
Taxonomic and paradigmatic structures emerged in early
semantic studies as ways of descriptively organizing
chunks of folk knowledge. There was and still is discussion
of the psychological status of such descriptions: to what
degree, or in what senses, are they to be considered psychologically real? The debate about psychological reality was inherited, along with the previously mentioned
methodology, by cognitive anthropology from structural
linguistics. When the discussion was broadened to include
actual psychological data (as opposed to simple reasoning), the issue quickly became one of "real in what sense" and/or "under what conditions." This methodological advance was introduced by Romney and D'Andrade in 1964 in "Cognitive Aspects of English Kin Terms" (reprinted in 1969 in Tyler's Cognitive Anthropology).
Variant versions of psychological reality emerged, relating, inter alia, to folk definitions of categories, folk understandings of the connotations or implications of category membership, differing contexts of definition and application, etc. For kinterms in particular, actual folk definitions (as in the "because he's my mother's brother" answer to "how do I know X is my uncle?") were found to be quite different in form from the definitions implied by componential analysis [first generation, ascending, colineal (or collateral), male, kinsperson]. On the other hand, strong psychological evidence was found for native use of something like componential structures to reason about the attitudes and behavior that pertain to various classes of kin. (This was learned through answers to questions such as "Describe an uncle," "Why are uncles important?," and "How should you treat your uncle?")
Studies aimed at distinguishing psychologically real
from unreal componential structures (among those that
did the basic job of distinguishing the categories from one
another reasonably succinctly) in kinship showed that
interviewees were capable of applying a range of different
models in different contexts, even if some models
appeared more generally salient than others.
Formal Models of
Cultural Knowledge Systems
Attempts to organize and formalize the description of folk
definitions of kinterm categories led to a different kind of
modeling and a different kind of research methodology. It
was found (particularly by S. H. Gould in 2000 and by
Free-Listing Snapshots
One kind of information that is sometimes desired about
a group involves what ideas or thoughts or concerns, etc.,
and maybe what linkages among these, are most salient in
the group in one or another context. One way of getting
a quick sketch of such salient notions is to give informants a short free-listing task (e.g., "list five thoughts that come to mind when you think of anthropology") or sketching task ("sketch an American high school"); it is then possible to form a composite list or picture based on responses produced by more than one or two people (used tellingly by Kevin Lynch in his 1960 The Image of the City). The composite can be ordered so that answers produced by more
people are more saliently displayed, compared to those
produced by fewer people. Because the task is open-ended, and because specific responses are being counted (rather than sorted into categories or ratings), repeated answers can only represent items that are saliently associated with the stimulus concept in the relevant cultural community; the odds of two people coming up with the same response by chance (i.e., where there is no relevant shared linguistic and cultural community understanding) are exceedingly low, given the size of people's vocabularies and the varieties of ways available for describing and talking about even commonplace items. The high-frequency
items provide a kind of snapshot of where the collective
attention is directed. Such a snapshot can be used as
a starting point for more careful and expensive research
regarding the given issues. A comparison of such snapshots for two contrasting communities can focus
a researcher quickly on salient cognitive (or behavioral)
differences. The relevance of the snapshot(s) to the issues under study can then be assessed with more systematic methods.
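A composite free-list is simple to compute. The Python sketch below (with hypothetical informant lists) orders responses by how many informants produced them, a crude salience measure; fuller treatments also weight items by their position within each list:

```python
from collections import Counter

# Hypothetical free lists from three informants responding to
# "list five thoughts that come to mind when you think of anthropology".
free_lists = [
    ["bones", "culture", "ritual", "fieldwork", "Margaret Mead"],
    ["culture", "fieldwork", "apes", "ritual", "museums"],
    ["culture", "bones", "fieldwork", "kinship", "ritual"],
]

# Count how many informants named each item (each item once per list).
counts = Counter(item for lst in free_lists for item in set(lst))

# The composite snapshot: most widely shared responses first.
for item, n in counts.most_common():
    print(f"{item}: {n} informant(s)")
```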
Multidimensional Scaling
Multidimensional scaling, like factor analysis, is
a computational method for taking a matrix of pairwise
similarities or dissimilarities for a list of entities (e.g.,
color terms, kinterms, diseases, or roles) as input and
finding the arrangement of points representing those entities in a space of a specified dimensionality such that the
rank order of interpoint distances best matches the
rank order of interentity dissimilarities. The measure of
this match is called stress. The lowest dimensionality
for which the stress is still low is looked for by running
the procedure for a sequence of dimensionalities. If
stress gradually increases as dimensionality decreases,
it suggests that the data have no clear dimensionality;
on the other hand, if the stress stays more or less constant
and low, down to a certain dimensionality, but spikes up
below that dimensionality, it suggests that the data arrangement really has that intrinsic dimensionality. An
example is the basic six primary color terms in English
(with similarity measured via a triads test), whereby from
5 down through 2 dimensions stress is low, but whereby
it spikes up in one dimension; an examination of the
arrangement reveals the familiar red-orange-yellow-green-blue-purple color circle, sometimes with some
extra distance between the cool colors and the warm
ones, and sometimes with the warm colors a little more
spread out than the cool ones.
MDS then offers a way of getting a metric spatial model
out of ordinal (or interval) data. If the fit of the picture
to the input data is good, as it was with Romney and
D'Andrade's kinship data, then it suggests that the
MDS spatial model is a reasonably good representation
of the model that natives are using to generate the inputted similarity data (of the sort shown in Fig. 1). The picture comes in the form of a list of points with their
coordinates in the given dimensional space; the axes
are arbitrary in terms of the research problem, though
some programs can orient them so that the first accounts
for the greatest amount of interpoint distance, the second
the next amount, and so forth. The analyst needs to scan the resulting configuration for interpretable directions and regions.
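A minimal sketch of the stress-by-dimensionality check, assuming scikit-learn is available and using circular step distances between the six primary color terms as hypothetical triads-derived dissimilarities:

```python
import numpy as np
from sklearn.manifold import MDS

colors = ["red", "orange", "yellow", "green", "blue", "purple"]
n = len(colors)

# Hypothetical dissimilarities: number of steps around the color circle.
D = np.array([[min(abs(i - j), n - abs(i - j)) for j in range(n)]
              for i in range(n)], dtype=float)

# Nonmetric MDS (ordinal data) at decreasing dimensionality; a spike
# in stress below some dimensionality suggests the intrinsic one.
for k in (4, 3, 2, 1):
    mds = MDS(n_components=k, metric=False, dissimilarity="precomputed",
              random_state=0)
    mds.fit(D)
    print(f"{k} dimensions: stress = {mds.stress_:.3f}")
```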
Factor Analysis
Factor analysis does much the same task, but is
a mathematical algorithm that generates a best arrangement of the points in relation to the inputted data. It
requires ratio data. Unlike factor analysis, MDS uses
a heuristic technique that can get locked into
a relatively good solution while missing the optimal
one, but requires only ordinal data. One by-product of
the ratio/ordinal difference is that when imperfect ratio
data are fed into factor analysis (as is typically the case
with similarity measures), each irregularity in the data
forces some extra dimensionality in the picture. The effect
is that factor analytic pictures can never be interpreted
by direct examination of the pattern of points in the
space, but always have to be attacked via indirect
means, usually some attempt to orient the axes of the
space (as in a varimax rotation) such that the resulting
axes become interpretable as meaningful variables or
scales.
One common data type involves a matrix of cases by
data measurements, whereby data measurements are, for
example, a question (e.g., "do you give money to ___?," "do you show respect to ___?," etc.) by topic (e.g., "father," "father's brother," "friend," etc.) matrix for each of a set
of informants, and whereby similarity (question to question, topic to topic, or informant to informant) can be
measured by correlation coefficients or by measures of
the proportion of measurements (or questions), over all
pairs, for which each of a pair of topics elicited the same
measurement (or response) from the same informant.
Another common data type is based on rating tasks
(from psychology), such as paired comparisons or
triads tests.
Hierarchical Clustering
Hierarchical clustering (HC) programs use the same
kinds of similarity data as those used by MDS to produce
hierarchical tree (or dendrogram) structures. Cladistic
analysis in biology is based on such a procedure. The
aim of the program is to find the best or most efficient branching structure, starting with each entity separate from all others and gradually, cumulatively joining entities into successively more inclusive clusters.
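A short sketch with SciPy, reusing the same kind of dissimilarity matrix fed to MDS (hypothetical data; matplotlib is assumed for drawing the dendrogram):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

labels = ["uncle", "father", "cousin", "brother"]  # hypothetical items
D = np.array([[0, 1, 3, 2],
              [1, 0, 3, 1],
              [3, 3, 0, 3],
              [2, 1, 3, 0]], dtype=float)

# Convert the square dissimilarity matrix to condensed form and
# agglomerate with average linkage, joining entities step by step.
Z = linkage(squareform(D), method="average")
dendrogram(Z, labels=labels)
plt.show()
```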
Consensus Measures
A comparable but different approach exists for evaluating answers to questions that have right and wrong answers: the consensus theory approach worked out by Romney and Batchelder. They have developed a statistical technique that can take informants' answers to a set of questions and determine whether the spread of answers to a given question represents differing degrees of knowledge about a single correct answer, the presence in the target community of different correct answers, or the lack of any consensus. When there is a single
correct answer, the technique can tell the analyst what that answer is. "Correct" here can refer to actual facticity (say, the date on which the U.S. Civil War ended) or to a cultural consensus about what might even be a factually incorrect answer (as in beliefs in some communities about who really blew up the World Trade Center). Diverse correct answers, based on different knowledge, can be seen, for instance, in the distinction between interpretations of ritual by priests and by laity, or the distinction, for some cultural sequence such as the cargo system in one Mexican pueblo (shown by Cancian in 1963), between the actual noncanonical path that some know Joe to have taken and the canonical path that others presume Joe to have taken. The method is similar to the free-listing snapshot one in that it
assumes that shared answers must have a basis in shared
knowledge, whereas uninformed guesses are going to be
idiosyncratic. In both cases, the analyst is taking advantage
of the fact that communities are characterized by shared
knowledge, and thus that there are ways of pulling inferences out of shared responses from community members
that could not be pulled out of answers produced by
a random cross-section of strangers; the two methods
represent different ways of getting at shared cognition.
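The following Python sketch conveys the core intuition only (chance-corrected agreement factored to estimate informant competences, then a competence-weighted answer key); it is a loose schematic, not Romney and Batchelder's formal model, and all data are hypothetical:

```python
import numpy as np

# answers[i, q] = informant i's choice (0..L-1) on multiple-choice question q.
answers = np.array([
    [0, 1, 2, 0, 1],
    [0, 1, 2, 0, 2],
    [0, 1, 1, 0, 1],
    [2, 0, 2, 1, 1],
])
n_informants, n_questions = answers.shape
L = 3  # options per question

# Pairwise proportion of matching answers, corrected for lucky guessing.
match = (answers[:, None, :] == answers[None, :, :]).mean(axis=2)
agree = (match - 1.0 / L) / (1.0 - 1.0 / L)
np.fill_diagonal(agree, 0.0)

# If agreement between informants i and j is roughly the product of
# their competences, competences fall out of the leading eigenvector.
vals, vecs = np.linalg.eigh(agree)
competence = np.abs(vecs[:, np.argmax(vals)])

# Competence-weighted voting recovers the consensus answer key.
key = [int(np.bincount(answers[:, q], weights=competence, minlength=L).argmax())
       for q in range(n_questions)]
print("relative competences:", np.round(competence, 2))
print("inferred answer key:", key)
```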
Anthropology has progressed from a mechanistic view
of culture in which culture was seen as unitary, whole,
shared, and canonical. Formerly, everybody, at least in
allegedly simple cultures, was presumed to know and do
the same thing as everyone else, except for some variation
according to age and sex. It is now realized that the kind of
diversity known and experienced within our own culture
characterizes the makeup of other cultures as well (even if
the amount of variation is in part a function of size and
contextual diversity). It is now known that the division of
labor in society involves distributed knowledge as well as
distributed economic activity; we are beginning to understand something of how the shared framework that ties this distributed knowledge together operates.
Implicational Analysis
Joseph Greenberg's 1968 exploration of implicational universals in linguistics has been carried over into anthropology by Roy D'Andrade and by Michael Burton, L. Brudner, and Douglass White (see below). A table is constructed showing the interaction of the presence or absence of one variable with the presence or absence of another, and then a zero cell is looked for in the table, that is, a logically possible combination that no empirical cases fall into (see Table III). The zero cell, if the other numbers in the table are large enough to make it statistically significant, means that there exists a logical implicational relationship between the two variables; in the example here, it means that the presence of Y implies the presence of X, but not vice versa; that is, X must be present before Y, or the presence of X is a necessary condition for the presence of Y. Depending on the data, these implicational relationships can be chained (as W → Y → X → Z) to produce, for instance, a kind of scale on which the sexual division of labor in different societies can be ranked (in Burton et al.'s 1977 example), or the cumulation of symptoms that go into native speakers' narrowing of disease diagnoses (D'Andrade's example), or the structure of Navajo attitudes toward conditions involved in relocation (Schoepfle et al., in Gladwin, 1984). Such chains, based on comparisons among systems, can also provide insight into how conceptual systems can and cannot change over time (as Greenberg showed for linguistic phenomena).
Table III  A Two-by-Two Table, with a Zero-Cell Implicational Relationship

             Y present                      Y absent
X present    cases exist                    cases exist
X absent     0: no cases (hence Y → X)      cases exist

a X and Y are features that are either present or absent in each case; "cases exist" means that there are cases in the given cell.
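A small Python sketch of the zero-cell scan (hypothetical presence/absence data; in practice one would also test whether the nonzero cells are large enough for statistical significance):

```python
from itertools import permutations

# Hypothetical cases coded for the presence/absence of two features.
cases = [
    {"X": True,  "Y": True},
    {"X": True,  "Y": False},
    {"X": True,  "Y": False},
    {"X": False, "Y": False},
]

def implications(cases, features=("X", "Y")):
    """Return candidate rules A -> B, i.e., feature pairs whose
    'A present, B absent' cell is empty."""
    rules = []
    for a, b in permutations(features, 2):
        if not any(c[a] and not c[b] for c in cases):
            rules.append(f"{a} -> {b}")
    return rules

print(implications(cases))  # ['Y -> X'] for the data above
```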
Network Analysis
Network analysis, though not directly cognitive, does offer
considerable insight into the ways in which shared cognitive structure and content spread and into the social
conditions that affect such spread. It can also provide
some insight into the actual makeup of the social entities in which various shared cognitive structures or content inhere, because collective cognition, by definition, has to inhere in some social entity (whether formal or informal) and has to achieve its collective sharing, complementarity, predictability, etc., through patterns of interactions among community members. Network analysis
is concerned with who communicates with whom within
a group and with the analytic insights that come from
considering the overall pattern of linkages within that
group.
Algebraic Models
For systems that are well understood, and for which variability and systematicity have been well explored, the possibility arises of producing algebraic models, with axioms over clearly specified entities and operations, from which different ethnographic examples can be derived.
Conclusion
Cognitive research methods offer a particularly useful
approach to an understanding of the nature of culture
(and thus of society) and how it works. But such usefulness
will depend on significant improvements both in our empirical knowledge of cultural models and in our ability to
formally represent this knowledge. Agent-based computational models seem to offer one very promising method
for addressing such goals. Our understanding of how cultural systems change and adapt will benefit from further
study of the interaction of individual cognitive properties
(including capabilities, constraints, mode of operation,
and so forth) with shared cognitive structures (including
cultural knowledge systems, cultural models, and whatever other kinds of shared cognitive structures future
work may turn up).
Further Reading
Berlin, B. (1992). Ethnobiological Classification. Princeton
University Press, Princeton, New Jersey.
Berlin, B., and Kay, P. (1969). Basic Color Terms: Their
Universality and Evolution. University of California Press,
Berkeley.
Bernard, H. R. (2002). Research Methods in Anthropology:
Qualitative and Quantitative Approaches. Altamira Press,
Walnut Creek, California.
D'Andrade, R. G. (1995). The Development of Cognitive Anthropology. Cambridge University Press, Cambridge.
Gladwin, C. H. (guest ed.) (1984). Frontiers in hierarchical decision modeling. Human Org. 43(3), special issue.
Gould, S. H. (2000). A New System for the Formal Analysis of
Kinship. University Press of America, Lanham, Maryland.
Greenberg, J. H. (1966). Language Universals. Mouton, The
Hague.
Greenberg, J. H. (1968). Anthropological Linguistics: An
Introduction. Random House, New York.
Commensuration
Mitchell L. Stevens
New York University, New York, USA
Glossary
commensuration The comparison of different objects,
attributes, or people according to a common metric.
cost-benefit analyses Decision-making techniques in which
the benefits and costs associated with a course of action are
calculated.
incommensurable Something that is not amenable to any
common measurement; something unique, incomparable.
metric Any standard of measurement.
Theories of Commensuration
The promise and limits of commensuration have engaged thinkers since ancient times. According to Martha
Commensuration as a Practical
Accomplishment
Commensuration is not inevitable. It typically requires
considerable human effort, political power, and large resource outlays to accomplish. The discipline, coordination, and technical expertise that commensuration
demands often take a long time to establish. This partly
explains why commensuration thrives in bureaucracies.
Because commensuration obscures distinctive characteristics, it is also a powerful strategy for producing the impersonal authority that is the hallmark of bureaucracies.
Before metrics can be used they must be invented, and
some instances of their creation stand among the most
enduring and consequential artifacts of human civilization. Money economies are prominent examples.
The recent transition of many European Union (EU)
nations to a common currency aptly demonstrates the amount of effort required to integrate disparate value systems according to a common metric. The move to the euro was in the planning stages for years and took enormous political effort to assure compliance throughout the EU. Literally all components of economic life on continental Europe had to be recalibrated to the new currency, a truly massive organizational feat. Media stories about the transition have ranged from the euro's effect on global financial markets to the Herculean task of reprogramming thousands of automated teller
machines, to the efforts of social workers to teach
blind consumers the distinctive feel of the new bills
and coins.
The rise of cost-benefit analysis in U.S. public policy
provides other examples of the work commensuration
requires. As Theodore Porter demonstrates, even before
the Flood Control Act of 1936, when federal law first
required that the benefits of federally subsidized water
projects exceed their costs, bureaucracies were using cost-benefit analysis. Spurred by law and conflict, agencies
invested more heavily in development of cost-benefit
analysis; over time, the procedures were standardized,
regulated, and eventually incorporated into economic
theory. Attendant professional groups of econometricians, statisticians, and decision theorists emerged
to do the job of measuring and comparing the value of
such disparate objects as riparian habitats, natural vistas,
flood control, Indian land claims, and revenue. Commensurative expertise became institutionalized as distinctive
careers and elaborate professional industries. As
a commensuration strategy, the reach of cost-benefit analysis extended far beyond decision making. It shaped what
information was incorporated, how internal units interacted, relations with other agencies and with constituents,
and even the terms under which government projects
could be challenged or supported.
Incommensurables
Incommensurables are things that are regarded as unique
in ways that make them inimical to quantitative valuation.
Children are perhaps the most prominent example. Feminist legal scholars have made articulate arguments
against the development of markets in children and
have raised concern about the rise of quasimarkets in
international adoption, in which hopeful parents pay
large sums to intermediate firms for the transfer of infants
across national borders. Central to the feminist concern is
the notion that the singular value of human life is eroded
when it meets the cash nexus too directly. How to manage
adoptions in a manner that sufficiently honors human
integrity, alongside very real financial incentives for biological parents and the practical organizational costs of
adoption transactions, remains a controversial legal and
ethical question.
Sometimes people invest such singular value in particular objects that to commensurate them is tantamount
to destroying them. Joseph Raz calls such objects
constitutive incommensurables. Wendy Espeland argues that for the Yavapai Indians of the U.S. Southwest,
land is a constitutive incommensurable. When federal
agents sought to purchase Yavapai land for a large
water reclamation project, the Indians argued that their
land was a part of their identity as Yavapai and so should
not be the object of commensuration. In a more recent
example, several member countries of the EU have refused to adopt the euro, citing the intimate relationship between national currencies and national identity.
Commensuration Politics
Commensuration is rarely a politically neutral process because it transgresses cultural boundaries and reconfigures
values in ways that variably privilege different parties. The
long-standing legal debate over comparable worth pay
policies, in which the pay scales of predominantly female
occupations are rendered commensurate with predominantly male jobs, is a clear example. Proponents of
comparable worth argue that women are systematically
disadvantaged by a labor market that channels them into
occupations that pay less than ostensibly equivalent jobs
that are predominantly male. But the comparable worth
movement has been stymied by two formidable obstacles:
the expense to employers of adjusting female pay scales
upward and the stubborn cultural presumption that
the jobs men and women do are essentially different.
Another broad instance of commensuration politics
involves the national debate over the role of standardized
tests in college admissions. College entrance exams such
as the SAT purport to commensurate all college-bound
students with a common yardstick of college readiness.
SAT scores are closely correlated with race, however.
Asians have the highest average scores and African-Americans the lowest, with those of Whites and Hispanics in between. SAT scores also vary by social class. These unsettling relations are hard to square with the interpretation of the test as a measure of individual merit. At issue
in the debates over comparable worth and the validity of
test scores is the quintessential American question of how
to commensurate opportunity, a political and ethical
problem as well as a technical one.
Commensuration is so ubiquitous that we often fail to notice it. Yet commensuration is crucial for how we implement our most cherished ideals. Values expressed through
commensuration are often associated with precision, objectivity, and rationality. Democracy has become synonymous with voting and polls. Standardized tests identify
merit, assure competence, and hold educators accountable. Cost-benefit analyses evaluate efficiency. Risk
assessments reassure us about uncertain futures. As
a vehicle for assembling and sorting information,
commensuration shapes what we attend to, simplifies
cognition, and makes a complicated world seem more
amenable to our control.
Further Reading
Anderson, E. (1993). Value in Ethics and Economics. Harvard
University Press, Cambridge, MA.
Carruthers, B. G., and Stinchcombe, A. L. (1999). The social structure of liquidity: Flexibility, markets, and states. Theory Soc. 28, 353–382.
Desrosières, A. (1998). The Politics of Large Numbers: A History of Statistical Reasoning (C. Naish, trans.). Harvard University Press, Cambridge, MA.
Espeland, W. N. (2001). Commensuration and cognition. In Culture in Mind (K. Cerulo, ed.), pp. 63–88. Routledge, New York.
Espeland, W. N., and Stevens, M. L. (1998). Commensuration as a social process. Annu. Rev. Sociol. 24, 313–343.
Hacking, I. (1990). The Taming of Chance. Cambridge University Press, Cambridge, UK.
Communication
Ayo Oyeleye
University of Central England in Birmingham,
Birmingham, United Kingdom
Glossary
communication Definitions vary according to theoretical
framework and focus. Media theorist George Gerbner defined the term as "social interaction through messages."
decoding The process of interpreting and making sense of
the nature of messages.
discourse analysis A method for analyzing the content of
mass communication; seeks to explain the ways that mass
media texts are used to convey power and ideology to
readers or audiences.
encoding The process of communicating through the use of
codes (aural, visual, etc.) that are deemed appropriate for
the objectives of the sender of a message.
group dynamics The scientific study of the behavior and
interactions of people in groups.
interaction The mutual relations between people in a social
context.
sociogram A chart used for illustrating the interrelations
among people in groups.
sociometry A technique for identifying the structure of ties in
a group based on affective interaction, as opposed to role
expectations.
Introduction
The origin of the word communication has been traced
back to the Latin word communicare, which means to
impart, i.e., to share with or to make common such things
as knowledge, experience, hope, vision, thought, opinion,
feeling, and belief. This historical link can be seen in the
way that many contemporary dictionaries define the
words communicate and communication. Central to
the various definitions are expressions such as to share,
to make known, to bestow, or to reveal, thus conveying the concepts of imparting information and having
something in common with another being. In communication studies, however, the word takes on a rather more complex guise and, as such, it is not easy to offer a single
definition that will encompass the various perspectives
that communication scholars hold on the concept. Indeed, scholars such as J. Corner and J. Hawthorn argue
that such a task is neither necessary nor desirable. They
contend that the study of communication should be undertaken by reference to its essential characteristics as
constituted by a variety of ideas and methods in the
arts and social science disciplines. The often conflicting
and contentious ideas about the nature of communication
must not be seen as a drawback. Rather, they should
usefully encourage students of communication to develop
a critical awareness of the relative status of all contending
ideas about the nature of the subject, while also engendering a sense of challenge to participate in the ongoing
efforts to explore and enunciate the nature and elements
of communication.
Defining Communication
Over the years, communication scholars have developed
a number of definitions for communication that tend to
reflect the particular perspective from which they are
working. Such definitions can serve the useful purpose
of providing a starting point for framing inquiries into
some aspect of communication, or they can show the
perspective from which scholars are studying communication. A definition of communication as framed by
Berger and Chaffee, for example, inscribes it within the
tradition of scientific inquiry thus: "Communication science seeks to understand the production, processing, and
effects of symbol and signal systems by developing testable theories, containing lawful generalizations, that explain phenomena associated with production, processing,
and effects."
Not all communication concerns or inquiries fall within
this rubric, however, and as McQuail has noted, the definition represents only one of several models of inquiry
into communication: an empirical, quantitative, and behavioral paradigm. In order to make some sense out of the
fragmentation and coalescence that have characterized
the development of communication as a field of studies,
especially from the 1960s, scholars have continued to
classify communication studies according to theoretical
approaches, methods of inquiries, levels of communication, and so on. For instance, Sven Windahl and Benno
Signitzer have identified two broad approaches to the
definition of communication. There is a transmission approach whereby communication involves a sender/message/channel/receiver model. Proponents of this
approach, which is also referred to as a linear model,
attempt to show how an idea, knowledge, emotion, or
vision is transferred from one person to another. Thus,
George and Achilles Theodorson define communication
as "the transmission of information, ideas, attitudes, or
emotion from one person or group to another (or others)
primarily through symbols."
The second approach described by Windahl and
Signitzer is one characterized by mutuality and shared
perceptions. This approach is referred to variously as the
interactional, ritual, or humanistic model. In this approach, there is recognition that communication entails
the active engagement of both the communicator and the
receiver. Indeed, the proponents of this approach prefer
to blur the distinctions between the roles of the communicator and the receiver of messages, stressing instead the
interchangeable nature of both roles. In this regard,
Gerbner defined communication as "social interaction
through messages." Similarly, Wilbur Schramm, in
a later review of his earlier view of communication, offers
this transactional definition: "Communication is now seen
as a transaction in which both parties are active."
Levels of Communication
In the attempt to make sense of the expansive body of
work that constitutes the communication field, a less contentious analytical approach is to distinguish between different levels of the communication process. This
approach allows us to see the different communication
processes involved at different levels. It also helps to map
the field of human communication study in terms of the
research concerns and focus, as well as the various concepts and theoretical and methodological traditions deployed for its study at each level. This explains why the
study of human communication is characterized by
a multidisciplinary mode of inquiry.
There are two broad categories of communication. The
first concerns the locus of communication activity and the
Intrapersonal Communication
The study of intrapersonal communication involves attempts to understand how people make sense of the
world around them and how this in turn impacts the
way people react to the world around them. Thirteen
properties have been proposed to be involved in the
process of intrapersonal communication. These include
perceptions, memories, experiences, feelings, interpretations, inferences, evaluations, attitudes, opinions, ideas,
strategies, images, and states of consciousness. The study
of intrapersonal communication draws on theoretical and
methodological practices from cognitive, behavioral, and
social psychology. Key questions that researchers working
at this level of communication try to find answers to include how people's perceptions, feelings, and understanding affect their interaction with the world around
them; how people respond to symbols; and how people
store and retrieve information.
Communications scholars working at the intrapersonal
level aim to understand the place of the individual in the
communication process. Intrapersonal communication
has been defined as communication within the self,
and of the self to the self. Communication begins and
ends with the self. The individual is the basic unit of any
social formation and, as such, there are important insights
to be gained on the nature of human communication from
understanding the role of the individual in this complex
process. The self can be understood as composed of different elements. First, there is the inner self, which is
made up of a number of elements such as self-perception,
self-evaluation, and personality. Second, surrounding the
inner self are other elements such as needs, which generate the drive to communicate with others and to interpret communications. Third, there is the element of
cognition, which allows us to make sense of the external
world and the communication coming from it. Because
the self is a very dynamic entity that is in constant interaction with the external world, it has a fourth crucial
world of interconnected beliefs and inferences. Well-known communication phenomena such as coordination,
coorientation, misunderstanding, emotional empathy,
and identity negotiation are grounded in intersubjectivity.
Because intersubjectivity exists in degrees, an adequate
account of human communication must identify the
mechanisms that generate degrees of intersubjectivity.
Otherwise, communication scholars will be unable to explain problems associated with misunderstanding and its
impact on relational definition, conflict, the sources of
deception, and the lack of coordination. Conversely, by
understanding the mechanisms that generate degrees of
intersubjectivity, communication scholars are able to intervene and correct misunderstandings in social interactions. A good case in point is the application of social
judgment theory to the study of real conflicts that have
emerged from misunderstanding among social actors. By
basing the method for this on an understanding of the
mechanisms that generate degrees of intersubjectivity, it
is possible to identify sources of misunderstanding as well
as to explain the persistence of misunderstanding, even
when one or both parties have sought to address the
sources of that misunderstanding.
Approaches to Measuring Intrapersonal
Communication
Communication scholars have developed several approaches to study communication problems. Three in
particular, the trait, transindividual, and cognitive/
interpretive approaches, stand out for their potential
for understanding the problems and measurements of
intrapersonal communication.
Trait Approaches Trait approaches locate the core of
communication in the individual and his/her predisposition to initiate action or to respond to behavior. Traits are
understood as a stable predisposition to behavior, and, as
such, trait approaches treat human beings as a complex of
predispositions that are stable over long periods of time
and over various contexts. Although traits are treated as
constant and predictable, the behaviors they give rise to
are often varied. The trait approach has been used widely
to predict social behavior.
Transindividual Approaches These approaches take
communication, particularly its regular features, as deriving from the evolving and ritualized practices of social
aggregates such as dyads, groups, organizations, culture,
and so on. Scholars working in these approaches tend to
locate the center of communication away from the individual and place it instead on social collectives. They tend
to place emphasis on the role that the context of communication plays in the communication process, rather than
the individual. The transindividual approaches use
context to explain communication behavior in two
ways: behavioral contingency and conventionalized versions. In behavioral contingency, the research focus is on
analyzing or measuring the interactions between individuals or among members of a group that is sufficiently
stable to have a pattern. The emphasis is not on any attributes that the individuals bring to the interaction, but
on the communicative acts that evolve from such interactions. The focus here is on analyzing how communication between individuals is contingent on the behavior of
those individuals involved in an interaction. This approach entails the use of relational analysis that begins
at the level of the dyad rather than at the level of the
individual.
In the conventionalized versions approach, just as in
the behavioral contingency approach, emphasis is on the
context of communication. In this approach, emphasis is
on the shared attributes of members of a linguistic
community, such as knowledge of contexts, knowledge
of communicative forms, and knowledge of the appropriate interpretations for them. The assumption of a
communal pool of cultural resources for communicating
and making sense of the world taken by this approach
leads scholars away from focusing on the individual and
instead, to focusing on the context of interaction. If all
members of a linguistic community are assumed to share
a common knowledge of communicative competence and
performance, then they are culturally interchangeable.
These two models within the transindividual approaches
thus place emphasis on the nature of human communication as social interaction. However, these models are
not very good at explaining the extent to which a message
might impact on another person's behavior or state of
mind, because they fail to identify the mechanisms by
which such impact might occur. When such attempts
have been made, they reveal the need to recognize the
role of the individual in the communication process.
from messages, how they retain and transform such information, and how they act on it.
A method of research commonly used in this area of
work is discourse processing. This seeks to explain the
cognitive processes and knowledge structures that individuals use to interpret and produce texts. This research
method seeks to explore the ways that people in a social
interaction accommodate each others communication
competences through adaptive techniques in their production and interpretation of messages. There are many
approaches to the study of discourse, and the term has had
a checkered history in terms of how it is defined and used
in academic disciplines. Suffice it to note here that to the
extent that discourse processing entails the study of the
psychological processes that people deploy in communicative interactions, it can be said to be rooted in the level
of intrapersonal communication. That is not to say,
however, that the method of discourse processing treats
messages as solely governed by psychological processes,
because a considerable amount of attention is also paid to
contextual factors. Work in discourse processing shows
how intrapersonal communication involves a complex
cognitive system that underpins the ability to interpret
and produce messages in social interactions. Although
much communication is social and interactional, an
explanation of the mechanisms by which the twin essential elements of communication, impact and intersubjectivity, take place must incorporate an understanding of
the processes that occur at the intrapersonal level (see
Fig. 1).
Interpersonal Communication
Following from the work of George Herbert Mead, in 1934, which
showed that communication is central to the capacity of
[Figure 1: the intrapersonal communication process. Recoverable labels: the inner self (self-perception, image and esteem, personality; intellectual, physical, and social attributes; self-evaluation); cognition; communication with others; communication from others.]
Group Communication
Research into the study of group influence on human
behavior can be traced back to the 1950s through
the work of the social psychologist Muzafer Sherif. Another social psychologist, Solomon Asch, worked on
the pressures and conformity that the group can exert
on individuals. A major strand of the work in group
[Figure: sociogram fragments illustrating transitive and intransitive relations among group members; only node labels (A, E) and numbered ties survive from the original diagram.]
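The distinction between transitive and intransitive relations lends itself to a simple computational check. The sketch below is not from the original article: the group members and their sociometric choices are hypothetical, and Python is used purely as illustrative notation.

    # Hypothetical sociometric data: each member lists the members he or she chooses.
    ties = {
        "A": {"B", "C"},
        "B": {"C"},
        "C": set(),
        "D": {"A"},
    }

    def intransitive_triples(ties):
        """Triples (x, y, z) with x choosing y and y choosing z but x not
        choosing z -- places where the relation fails to be transitive."""
        violations = []
        for x, chosen in ties.items():
            for y in sorted(chosen):
                for z in sorted(ties.get(y, set())):
                    if z != x and z not in chosen:
                        violations.append((x, y, z))
        return violations

    print(intransitive_triples(ties))
    # [('D', 'A', 'B'), ('D', 'A', 'C')]: D's choice of A does not extend
    # to those whom A chooses, so D's ties are intransitive.

A sociogram of these ties would draw an arrow for each choice; the function simply enumerates the two-step paths that lack a closing arrow.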
Mass Communication
Morris Janowitz defined mass communication in 1968 as
"the institutions and techniques by which specialized
groups employ technological devices (press, radio,
films, etc.) to disseminate symbolic content to large, heterogeneous, and widely dispersed audiences." Building
on the original (1959) work of Charles Wright, communication scholars have identified other characteristics of
mass communication over the years:
Mass communication involves a large, anonymous,
and heterogeneous audience.
The sources of mass communication are institutions
and organizations that are primarily driven by a profit
motive.
The flow of mass communication is one-way, from
the source to a multitude of receivers, with little or no
opportunity of interaction for audiences.
The content of mass communication is formulaic and
standardized.
The relations between the senders and receivers
of mass communication are marked by anonymity and
remoteness.
[Figure: the questions of Harold Lasswell's classic formula for describing an act of communication: Who? Says what? In which channel? To whom? With what effect?]
works and that the notions about the power of the media
were based mostly on observation and speculation.
The limited effects phase. This phase witnessed the arrival of empirical research into mass communication as
exemplified by the Payne Fund studies in the United
States in the early 1930s, and principally the work of
Paul Lazarsfeld et al., from the 1940s onward. This era,
according to McQuail, lasted until the 1960s and involved
a large amount of research into various aspects of media
content and form, films, programs, political campaigns,
public communication campaigns, advertising, and other
marketing campaigns. The research efforts of this period
were largely aimed at understanding how specific media
form and content can be used to persuade or inform the
public and to assess any negative effects of media that
might then lead to measures to control the media. Notable
among the studies of this period are Carl Hovland's study
of attitude change among soldiers in the U.S. Army during
World War II in the 1940s and 1950s, which showed that
orientation films were not effective in changing attitudes.
There was also the research by Eunice Cooper and Marie
Jahoda in 1947 on the "Mr. Biggott" cartoons, which
showed that the factor of selective perception could actually reduce the effectiveness of a message; Lazarsfeld
also did a study of voting behavior, which showed that
people were more likely to be influenced by others around
them than by the mass media.
The return of powerful effects phase. This phase was
characterized by a shift in research focus from investigating
short-term, immediate effects on individuals to investigating long-term, cumulative effects on a large number of
people. Writing in defense of the research focus of this
phase, Gladys and Kurt Lang argued that the conclusion
about a minimal effect of media represents only one particular interpretation, which had gained currency in media
scholarship at the time. They continued: "The evidence
available by the end of the 1950s, even when balanced
against some of the negative findings, gives no justification
for an overall verdict of media impotence." Also at this
time, key among the studies that attributed powerful effects
to mass media is Elisabeth Noelle-Neumann's spiral-of-silence theory, which argued that three characteristics of
mass communication (cumulation, ubiquity, and consonance) gave them powerful effects on public opinion. According to this theory, the opinions that people hold on any
matter of significance can be influenced through
a perception of such opinion as representing a minority
view. People hold back on their opinion when they perceive
it to be a minority one, for fear of isolation. This withholding
of opinion then influences others to do the same thing, thus
creating a spiral of silence. The media are said to be a major
source of defining majority views. George Gerbner's cultivation theory also ascribed a powerful influence to the mass media.
Further Reading
Berger, C., and Chaffee, S. (eds.) (1987). Handbook of
Communication Science. Sage, London.
DeFleur, M., and Ball-Rokeach, S. (1989). Theories of Mass
Communication, 5th Ed. Longman, New York.
Heath, R., and Bryant, J. (1992). Human Communication
Research. Lawrence Erlbaum, New Jersey.
Lowery, S., and DeFleur, M. (1988). Milestones in Mass
Communication Research, 2nd Ed. Longman, New York.
McQuail, D. (1994). Mass Communication Theory, 3rd Ed.
Sage, London.
Myers, G., and Myers, M. (1992). The Dynamics of Human
Communication: A Laboratory Approach, 6th Ed.
McGraw-Hill, New York.
Rosengren, K. (2000). Communication. Sage, London.
Severin, W., and Tankard, J., Jr. (2001). Communication
Theories, 5th Ed. Longman, New York.
Comparative Sociology
John R. Hall
University of California, Davis, Davis, California, USA
Glossary
analytic element A distinctive aspect of social phenomena,
deemed to vary across cases.
indirect method of difference John Stuart Mill's analytic
strategy based on comparison of two or more groups of
cases that differ in the occurrence of basic propensities.
method of agreement John Stuart Mill's analytic strategy
based on identifying propensities always found together in
otherwise diverse cases.
practice of inquiry A research methodology typified on the
basis of how it combines more elemental forms of
discourse, including value discourse, narrative, social
theory, and explanation or interpretation.
qualitative comparative analysis Charles Ragin's Boolean-algebra and fuzzy-set-theory approach to comparative
analysis of potentially multiple causal configurations in
a set of cases.
Introduction
Comparison occurs in social inquiry whenever observations
about one or more aspects of a given case are considered in
relation to observations about other cases or theoretical
models. Comparisons may be formulated either by analyzing relationships among variables for multiple cases or by
investigating the parallels and differences among cases. Yet
similar social phenomena are often genetically connected
to one another. Comparative sociology thus encounters
Classic Approaches
One enduring problem of comparative sociology concerns
how to construe the object of inquiry to be studied. On
the one hand, scientific positivism and, later, analytic realism suggest that research should seek to identify and
study objectively discernible phenomena. Alternatively,
the neo-Kantian legacy suggests that whatever the status
of reality as such, investigators bring phenomena into
distinctive focus by their frames of analytic interest.
From these two alternatives radiate an array of issues
and problems of comparative sociology that are parsimoniously framed by the contrast between two classic approaches, i.e., the scientific comparative method of John
Stuart Mill and the interpretive, or verstehende, comparative practice of Max Weber.
kind of inquiry, namely, a cultural science based on understanding unique social meanings and actions. This was
the central issue of the German Methodenstreit.
Max Weber sought to resolve this conflict by acknowledging the meaningful and cultural bases of both social
life and inquiry, while retaining a rigorous method of
analysis. His approach depended on two central assertions: first, that infinitely varying sociohistorical phenomena become objects of inquiry through cultural (value)
interests in them, and second, that because values cannot
be warranted scientifically, the objects of sociohistorical
inquiry cannot be scientifically determined, even though,
once determined, they may be studied by way of scientific
methods. Weber thus adopted an almost heroic stance,
embracing value neutrality as an ethical obligation,
while engaging in science as a vocation in a way that
warrants pursuit of contextualized truths.
Webers approach was both comparative and based
on the principle of Verstehen (interpretive understanding). Seeking a middle ground between generalization
and historicism, he combined explanation and understanding by using ideal types that would be adequate
on the level of meaning (that is, incorporate specifications of structured cultural meanings and meaningful actions). Such ideal types (what Guenther Roth calls
sociohistorical models) serve as meaningfully coherent
analogues to empirical cases, yet allow for hermeneutic
and causal comparison beyond particularities.
Comparison thus becomes a multifaceted enterprise:
a given case may be compared with various sociohistorical
models, either to explore the degree to which one or
another model subsumes the case within its explanatory
orb, or to pursue refinement of the model. In addition,
multiple cases may be compared with one another in
relation to one or more models. Finally, models may be
compared, whether they are framed at a substantive level
of analysis (e.g., religious ethics of salvation in Christian
Europe) or a purely conceptual one (e.g., instrumental-rational versus value-rational action).
Contemporary Issues
In the 1960s, comparative sociology began to flourish.
Partly, the impetus came from increased interest during
the Cold War in cross-national comparison, and partly
it came from neo-Weberians, neo-Marxists, and other
historically oriented social scientists interested in alternatives to social systems theory, functionalism, and positivism, which they regarded as incapable of critically
engaging issues about alternative historical trajectories.
Various kindred practices also took hold (initially, social
science history, and later, an efflorescence of interdisciplinary research in the human sciences, broadly conceived). By the beginning of the 21st century, researchers
and methodologists had confronted a set of linked problematics concerning case definitions and measurement,
study design, and interdependence of cases and possibilities of generalization.
Consider war again. If the analytic interest is in explaining war as an outcome, it makes obvious good sense to
compare conditions under which wars occur with those in
which they do not occur. However, spatially, if not temporally, nonwar is something of a default condition, and
a question arises as to which nonwar cases to sample.
Comparison of most nonwar situations with war is unlikely
to be very enlightening. The goal, instead, is to include
cases in which war is a serious possibility but does not
occur. In this example and more generally, then, the task is
to define a population broader than those with a certain
value on the dependent variable but narrow enough to
permit relevant comparisons of meaningfully different
cases.
Moreover, how a dependent variable is defined is an
open question, and explanations may be proffered even
without comparison to cases in which a phenomenon is
absent. Continuing with the example of war, the analytic
interest may be in explaining variations across wars, rather
than explaining whether war occurs. Given this interest,
the dependent variable will be defined in a way that
makes the exclusion of nonwars irrelevant. In short,
whether sampling on the dependent variable is problematic depends on the goals of research, and whether such
sampling occurs depends on how the dependent variable
is defined.
incorporated into even formalized comparative methodologies. In a different vein, Geoffrey Hawthorn has developed Weber's logic of the mental experiment into
counterfactual analysis, a procedure that sharpens the
specification of conditions under which empirical cases
may be compared to purely hypothetical ones. Hawthorn
holds that the comparative consideration of hypothetical
scenarios can deepen an analysis if the counterfactual
hypotheses are neither so distant from the course of
events as to be irrelevant nor so unstable in their dynamics
as to make prediction unreliable.
Alternative Practices
By the end of the 20th century, there had been substantial development of comparative methods. Yet the
differences (between inductive and deductive historical
sociologists, between those interested in causal explanation versus interpretation, between historical sociologists
whose methodologies are explicitly comparative and historians using comparison only implicitly) often threaten
to divide practices of inquiry that nevertheless share substantive interests. It is thus important to theorize the
overall domain of sociohistorical inquiry within which
comparative analysis is practiced. For this project, Hall
identifies the ways in which alternative practices of inquiry bring together various forms of discourse, discourses that constitute interrelated moments of analyzing
sociohistorical phenomena.
Four formative discourses (value discourse, narrative,
social theory, and explanation or interpretation) are typically implicated both in comparative inquiry and in case
studies often drawn on in comparative research. Each of
these formative discourses is capable of serving as
a dominant discourse that orders relations among all
four discourses through internal subsumption and external articulation of them, thus consolidating a meaningfully
coherent practice of inquiry. For example, if narrative
discourse predominates, it will order the articulation
among all four discourses in one distinctive practice of
inquiry, whereas the predominance of social theory
will order an articulation of discourses that constitutes
an alternative practice.
In turn, generalizing versus particularizing orientations make a difference in how inquiry works. Thus,
each dominant discourse, such as narrative, orders the
four formative discourses in one distinctive practice
when the goal is general knowledge, and a different
one when the goal is detailed knowledge of a distinctive
phenomenon. Given four alternative ordering discourses
and two (generalizing versus particularizing) orientations
of inquiry, it is possible to identify eight ideal typical
practices of inquiry. In these terms, the three methodologies identified by Skocpol and Somers (theory application,
contrast-oriented comparison, and analytic generalization) are the central comparative practices.
Theory Application
In the practice of theory application, the analyst seeks to
bring parallel phenomena into view via narratives that
apply a particular theoretical lens to the analysis of
cases. The particular social theory dictates the central
issues of comparative plot analysis for the narratives,
and explanation (or interpretation) centers on differentiating theoretically informed versus nontheoretical accounts, and on determining whether the nontheoretical
accounts require modification or disconfirmation of the
theory, or are simply matters that lie outside the theory's
domain. The emphasis on close and careful comparison of
a small number of cases offers bases for deepening
theorization of explanatory accounts and refining theory,
but generalization typically is undermined by the small
number of cases.
Analytic Generalization
Analytic generalization encompasses the formal methods
formulated by Mill and elaborated by Ragin. Here, the
researcher empirically tests or develops hypotheses deduced from theories or induced from observations. Narrative is structured to offer the basis for adjudication of
hypotheses in relation to theories, and the evaluation of
alternative explanations and interpretations mediates the
process of theoretical adjudication. The rigor of this practice approximates the intent of positivism, but problems of
measurement equivalence and sample size, discussed
previously, can threaten validity.
Contrast-Oriented Comparison
Explanation and interpretation are the central discursive
concerns that order inquiry oriented to the production of
"bounded generalizations" and "rules of experience"
through contingent and idiographic analysis of sociohistorical phenomena deemed kindred in relation to
a theoretical theme. The focus is on how a particular social
phenomenon (e.g., proletarianization, fundamentalism)
Configurational History
The configurational history methodology operates by theoretically identifying the elements, conditions, and developments necessary for a particular (configurational)
social phenomenon to occur, e.g., modern capitalism,
or a particular technology of power. The theoretically
defined configuration is then used as a basis for generating
questions of historical analysis about the fulfillment of
conditions, creation of elements, etc. This strategy is
not inherently comparative in the conventional sense,
but it involves a strong use of social theory in relation
to historical analysis, and is thus favored by historical
sociologists (e.g., Max Weber and Michael Mann) who
seek to develop sociologically informed explanations of
distinctive historical developments.
The cultural turn in epistemology, especially as underwritten by Foucault, has created conditions of substantially increased sophistication about comparative
research. Once methodologies of inquiry are understood
via a typology that traces their alternative relationships to
more elemental forms of discourse (narrative, social theory, etc.), comparative research becomes located in relation to a broader domain characterized by a condition of
integrated disparity, in which diverse practices invoke
radically alternative logics, while nevertheless retaining
sufficient points of mutual articulation to permit
Further Reading
Burke, P. (1993). History and Social Theory. Cornell
University Press, Ithaca, New York.
Espeland, W. N., and Stevens, M. L. (1998). Commensuration
as a social process. Annu. Rev. Sociol. 24, 313–343.
Hall, J. R. (1999). Cultures of Inquiry: From Epistemology
to Discourse in Sociohistorical Research. Cambridge
University Press, Cambridge.
Hawthorn, G. (1991). Plausible Worlds: Possibility and Understanding in History and the Social Sciences. Cambridge
University Press, New York.
Lieberson, S. (1992). Small Ns and big conclusions: An
examination of the reasoning in comparative studies
based on a small number of cases. In What Is a Case?
(C. C. Ragin and H. S. Becker, eds.), pp. 105–118.
Cambridge University Press, Cambridge.
McMichael, P. (1990). Incorporating comparison within
a world-historical perspective. Am. Sociol. Rev. 55,
385–397.
Ragin, C. C. (1987). The Comparative Method. University of
California Press, Berkeley.
Ragin, C. C. (2000). Fuzzy-Set Social Science. University of
Chicago Press, Chicago.
Ragin, C. C., and Becker, H. S. (eds.) (1992). What is a Case?
Cambridge University Press, New York.
Roth, G., and Schluchter, W. (1979). Max Weber's Vision of
History. University of California Press, Berkeley.
Skocpol, T. (1984). Emerging agendas and recurrent strategies
in historical sociology. In Vision and Method in Historical
Sociology (T. Skocpol, ed.), pp. 356–391. Cambridge
University Press, Cambridge.
Stinchcombe, A. L. (1978). Theoretical Methods in Social
History. Academic Press, New York.
Tilly, C. (1984). Big Structures, Large Processes, Huge
Comparisons. Russell Sage, New York.
Glossary
complex systems Emergent open systems that are far-from-equilibric, having nonlinear and self-organized behavior.
complexity science A multidisciplinary field concerned with
the analysis of complex systems.
emergent dynamics Those dynamics with properties qualitatively different from the properties of the component
parts.
far-from-equilibric A state in which there is continual flow
and change and yet which enables an overall structure to
form and be maintained.
nonlinearity In nonlinear systems, the output of the system
cannot be calculated as the sum of the results of interactions. This means that small changes within the system
over time can lead to large-scale, often unpredictable,
transformations in the system (illustrated in the sketch
following this glossary).
open system A system that is open to interaction in terms of
the exchange of information or energy with its environment.
self-organization Emergent dynamics that are the consequence of interaction through positive and negative feedback, not fully determined by the early phase of system
development, by the intentions of the interacting parts, or by
external forces.
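Nonlinearity and sensitivity to initial conditions, as defined in this glossary, can be illustrated with the logistic map, a textbook one-variable nonlinear system; the example is an illustration added here, not a model drawn from the article.

    # Logistic map: x(t+1) = r * x(t) * (1 - x(t)); at r = 4 the dynamics are chaotic.
    def trajectory(x0, r=4.0, steps=30):
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1 - xs[-1]))
        return xs

    a = trajectory(0.400000)
    b = trajectory(0.400001)  # differs from a by one part in a million
    for t in (0, 10, 20, 30):
        print(t, round(a[t], 4), round(b[t], 4))
    # After roughly twenty iterations the two trajectories are unrelated:
    # a difference too small to measure has been iteratively amplified
    # into a qualitatively different history.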
Complexity Science
The term complexity science can be misleading because
it implies a unified body of science. There are attempts to
generate such unification, exemplified by the work of the
Santa Fe Institute and more recently the United Kingdom
Complexity Society. Indeed, for some, it is precisely the
possibility of unification between disciplines that has generated the interest in complexity science, exemplified by
the 1996 Gulbenkian Commission on Restructuring the
Social Sciences. While the idea of a unified science has
been very prominent in representations by popular science writers, within social science strong claims have also
been made about the potential of complexity science for
such unification. Social science writers argue that, now
that the inherent complexity and unpredictability of nonlinear natural phenomena have been revealed, the social world can no longer be seen as distinct
by virtue of its complexity. Both the natural and the social world
can, they propose, display similar complex dynamics, requiring, therefore, similar explanation according to the
properties of complex dynamical systems. Though there
are disputes about just what complexity science is (for
example, whether it is a science of modernity or
a postmodern science), there is nonetheless a shared
concern, developed in recent decades, that phenomena
characterized by emergent, self-organized, and nonlinear
behavior share similar characteristics that render traditional linear methods of understanding limited.
Whereas there had been much initial interest in the
application of chaos theory to social science during the
early 1990s, interest in the implications of nonlinearity
became consolidated through the notion of complexity
science. Complexity science brought a focus on nonlinearity, with the limits of knowledge being understood as
a consequence of the character of systems, rather than
a problem of knowledge generation. Because complex
systems could not be broken down into their component
parts, the parts could not be analyzed to explain the whole
and predict future states. Complex systems have emergent
characteristics that are qualitatively different from those of
their parts. Complexity science therefore incorporates an
interest in understanding the emergent patterns of system
behavior and in developing models that can be applied to
a whole host of phenomena. It is the apparent ubiquity
of the characteristics of models of complexity science
that has generated the interest in complexity science, as
a new science that can integrate different disciplines.
Key features can be identified to describe complex
systems:
1. Self-organized emergence. Emergent dynamics
typify those complex systems having properties that are
qualitatively different from the properties of the component parts of the system, so that the whole is not reducible
to the sum of its parts. Complexity science has been interested in how such dynamic emergence comes into
being, given that such emergent differences are not inherent in the early phase of system development, nor in the
intentions of the interacting parts, nor predetermined
or predictable from external forces impinging on the system. Rather, the emergence is self-organized through the
positive and negative feedback relationships between interacting parts, the emergent dynamics, and those parts.
2. Nonlinearity. Important to the interest in self-organized emergence has been recognition of the significance of the nonlinearity of these dynamics, in which
small changes in a system over time can be iteratively
amplified so that they do not necessarily lead commensurately to small effects. In nonlinear systems, recursive
relationships open up the possibility that small changes
over time can lead to large-scale, often unpredictable,
transformations in the system. In other words, small-scale local change, in both natural systems and complex
social or cultural fields, can lead to large-scale global
transformation of the entire system. Nonlinear systems
dynamics can therefore become very sensitive to the initial conditions of the system, rendering specific prediction problematic.
3. Self-organized criticality. Complexity science has
a particular interest in the role of self-organized nonlinear
emergence for understanding life and its evolution. The
quest has been to explore how systems maintain themselves in optimum states for survival, that is, how they
maintain appropriate stability, continuity, and order while
at the same time nurturing the capacity for local forms of
instability, change, and disorder. Particular attention
has been given to systems maintaining themselves at
previously had, in other words, because it is right. Alternatively, emphasis can be placed on shifts in our cultural
condition, in which many areas of life are becoming
destabilized and hence we have developed more sensitivity toward approaches that explore instability, chaos, and
change. This debate is significant for understanding the
development of the application of complexity science to
the social world, and gains increased prominence in the
context of the infamous Sokal affair, in which social
scientists were accused of misusing metaphors from
within the sciences. That said, however, within those
extremes, the strength of complexity science can be
seen in terms of its ability to offer better accounts of
the world in relation to shifting concerns about instabilities humans now face. Importantly then, complexity
social scientists have wanted to do more than just use
the metaphors of complexity science to describe the social
world. They have wanted to show how social science can
contribute to complexity science and how complexity science, applied and developed for the social world, can
illuminate real processes of social dynamics, even if
that knowledge is always local. The problem has been
how to do this. In relation to the concerns of social measurement, there are three particular areas of attempts to achieve
this: mathematical modeling, simulation, and reinterpreting quantitative measurement.
Developing Simulation
While computer technology led to a new interest in nonlinear equations, it also enabled the
(Re)interpreting Quantitative Data
Though there have been some questions about the limits
of applying nonlinear mathematics to the social world, and
dynamic, then this suggests the need for continued adjustment of interventions. One implication is that when
considering strategies or planning, for example, the message is to see these as processes that are continually reflexive, in order to be adaptive to changing internal and
external environments. Another implication is that it becomes necessary to involve different forms of knowledge
throughout the system. For these reasons, many involved
in complexity research work with those engaged in the
world, exemplified, for example, by the notion of integrative method.
Further Reading
Brockman, J. (ed.) (1995). The Third Culture: Beyond the
Scientific Revolution. Simon and Schuster, New York.
Byrne, D. (1998). Complexity Theory and the Social Sciences:
An Introduction. Routledge, London.
Byrne, D. (2002). Interpreting Quantitative Data. Sage,
London.
Capra, F. (1996). The Web of Life: A New Synthesis of Mind
and Matter. Harper Collins, London.
Cilliers, P. (1998). Complexity and Postmodernism: Understanding Complex Systems. Routledge, London.
Eve, R. A., Horsfall, S., et al. (eds.) (1997). Chaos, Complexity,
and Sociology: Myths, Models, and Theories. Sage, London.
Gilbert, N., and Troitzsch, K. G. (1999). Simulation for the
Social Scientist. Open University, Buckingham.
Kauffman, S. (1995). At Home in the Universe: The Search for
Laws of Self-Organization and Complexity. Oxford University Press, Oxford.
Khalil, E. L., and Boulding, K. E. (eds.) (1996). Evolution,
Order and Complexity. Routledge, London.
Kiel, L. D., and Elliott, E. (eds.) (1996). Chaos Theory in the
Social Sciences: Foundations and Applications. University
of Michigan Press, Ann Arbor.
Prigogine, I. (1997). The End of Certainty. The Free Press,
New York.
Prigogine, I., and Stengers, I. (1984). Order Out of Chaos:
Man's New Dialogue with Nature. Heinemann, London.
Richardson, K., and Cilliers, P. (2001). What is complexity
science? Emergence: J. Complex. Iss. Org. Mgmt. 3(1),
special edition.
Urry, J. (2003). Global Complexity. Polity, Cambridge.
Waldrop, M. M. (1992). Complexity: The Emerging Science at
the Edge of Order and Chaos. Viking, London.
Computer Simulation
Louis N. Gray
Washington State University, Pullman, Washington, USA
Glossary
choice point A location, usually within a program, that can
branch in two or more directions. The direction chosen
depends on the underlying program.
compile A procedure by which a program or subroutine is
translated into machine language (the fundamental
structure of the operating system; usually binary).
endogenous variables The variables on which a system
focuses; they are interrelated and may change as processes
evolve within a simulation.
exogenous variables The variables external to a system; they
are ordinarily set prior to simulation and cannot be altered
within the simulation.
falsifiability A property of a statement (proposition or hypothesis) that makes it possible to show that the statement is false.
For example, the statement "I am six feet tall" is falsifiable
(given measurement assumptions); the statement "Someone
is six feet tall" is not falsifiable. In science, specific empirical
statements depend on the availability and applicability of
appropriate measurement techniques.
flowchart A graphic depiction of elements of a simulation,
with specific attention to choice points.
isomorphism A feature of a model or program that operates
conceptually in a manner identical to a feature identified
among humans, social groups, organizations, or social
institutions.
model world The statement of a theory as specified in
a simulation. The statement includes all the choice points
and processes for determining outcomes.
moment-generating function A mathematical function that,
with appropriate variable substitutions, produces the
moments (central or raw) of a defined distribution. These
functions completely specify the characteristics of
a distribution (stated formally just after this glossary).
process A procedure involving a time dimension and thus
one that can result in either stability or change under specified
conditions.
program Sets of instructions that control the operation of
a computer.
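Stated formally (a standard textbook formulation added here for reference, not text from the original article), the moment-generating function of a random variable X and its relation to the raw moments are

    M_X(t) = E\!\left[e^{tX}\right], \qquad E[X^n] = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0},

so that, when M_X exists in a neighborhood of t = 0, the full set of moments, and hence the distribution, is recoverable from a single function. For example, the standard normal distribution has M_X(t) = e^{t^2/2}, whose derivatives at zero give E[X] = 0 and E[X^2] = 1.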
Historically, the term simulation has been used to describe procedures that attempt to mimic systems believed
to exist in nature. Theory-based, simulated model systems must be distinguished from empirical (real-world)
systems, the output of which is observable. Simulation
techniques have come to replace the closed-form mathematical approaches once used to examine model systems. In particular, computer simulation forces
examination of a theory in detail and requires addressing
potential alternatives.
Introduction
Closed-Form Mathematics
Closed-form mathematical solutions are those that
permit unique solutions to equations or systems of
equations. The variables in the equations can be either
exogenous or endogenous, but attention is centered on
[Figure 1: the model world and the empirical world. Recoverable labels: program/theory, operationalization, empirical observations, simulation output.]
Early techniques involved mechanical or paper-and-pencil operations that, it was hoped, behaved isomorphically
with the target system as it moved from state to state. Prior
to the ready availability of personal computers, a variety of
procedures for randomization and for evaluating outcomes were employed. Unfortunately, these approaches
were limited in their applicability due to logistical requirements and the inevitable problems with implementation.
[Flowchart figure (fragment): initialize A and B; test a comparison of A and B; if yes, take action 1; if no, take action 2.]
Computer Simulation
Current simulation techniques involve the computer programming of processes analogous to those believed to
characterize social systems. The speed of modern computers makes it possible to develop virtual representations
of processes involving many variables and extended time
periods. To some extent, computer simulation has replaced the thought experiment as a way of constructing,
understanding, analyzing, and testing theories about social processes.
Computer Simulation Defined
Computer simulation can be defined as a program of
instructions implemented by a machine that is intended
to be isomorphic in output (graphic, numeric, or both) to
measurement of a real or hypothetical social process.
Process The process can be one that is believed to exist
or one whose existence is in question, but with implications that are being addressed. In either case, there is
a theoretical assertion that the process defined in the
model world works as if the hypothesized process were
operating in the empirical world (see Fig. 1). The process
may involve transitions between states of a system (as in
discrete-time Markov processes), the temporal course of
a system (as in continuous-time Markov process), or
any alternative the theorist can envision and program.
The hypothetical process, for example, may involve
feedback-control processes subject to stochastic input
that cannot be easily addressed in closed-form mathematics. The process is usually outlined in a flowchart.
[Flowchart figure (continued): after feedback, update A and B; if not done, return to the comparison; otherwise end.]
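Read as code, the flowchart fragments above have the following general shape. This is a schematic reconstruction only: the comparison between A and B is garbled in the source, and the two actions are placeholders.

    def take_action_1(a, b):
        # Placeholder consequence of the comparison succeeding.
        return a - 1, b + 1

    def take_action_2(a, b):
        # Placeholder consequence of the comparison failing.
        return a + 1, b

    def run_simulation(a, b, steps):
        """Initialize, compare, branch, update via feedback, repeat until done."""
        for _ in range(steps):      # the "Done?" test
            if a > b:               # the comparison of A and B (operator assumed)
                a, b = take_action_1(a, b)
            else:
                a, b = take_action_2(a, b)
        return a, b                 # "End"

    print(run_simulation(5, 3, steps=10))  # -> (7, 7) after ten passes through the loop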
Programming Languages
The programming language used for simulation requires
at least two sometimes contradictory features: (1) it needs
to be flexible enough to permit a variety of mathematical
operations or their approximation and (2) it needs to be
understandable to a focal audience of theorists and researchers without black box features that obscure its
operation.
Clear Operation
For a simulation to be evaluated by other theorists, researchers, and/or programmers, it is necessary that all
aspects of the simulation be understandable to as many
potential users as possible. To the extent that a simulation
contains some kinds of "magic" (i.e., a knowingly unrealistic component of a simulation), there should be skepticism about the results. If processes are involved in the
simulation that are other than those imagined to operate,
or are hidden from view in such a way that problems are
suspected, then the usefulness of the simulation, either
for the development of theory or for the analysis of data, is
suspect. The notion that faith should be maintained is as
foreign to a scientific use of simulation as it is to other
applications of science. This is not to say that elements
used in simulations cannot make use of simplifications or
subcomponents that are compiled separately from the
main code, just that there must be some point at which
these elements are accessible and can be seen to represent
currently understood or hypothesized processes.
Future Directions in
Computer Simulation
The range of activities that future computer simulations
might include is possibly broader than can be anticipated,
but two can be predicted: (1) the growth of specialized
languages and/or software and (2) increased blurring of the
demarcation between simulation and empirical research.
Specialized Languages
As users seek increasingly rapid ways of anticipating research and policy outcomes, the development of software
to fill those needs is likely. Generally, attempts to increase
the ability to respond to requests for information in these
areas should be welcomed, but care should be taken that
the need for information does not result in incorporation
of empirically insupportable processes. It is a very human
Further Reading
Bainbridge, W. S. (1998). Sociology Laboratory: IBM Pc/
Manual and 256K Diskette. Wadsworth Publ., Belmont,
California.
Banks, J. (ed.) (1998). Handbook of Simulation: Principles,
Methodology, Advances, Applications, and Practice. Interscience, New York.
Berk, R. A., Bickel, P., Campbell, K., Keller-McNulty, S. A.,
Fovell, R., Kelly, E. J., Sacks, J., Park, B., Perelson, A.,
Rouphail, N., and Schoenberg, F. (2002). Workshop on
statistical approaches for the evaluation of complex
computer models. Statist. Sci. 17, 173–192.
Cameron, I., and Hangos, K. (2001). Process Modeling and
Model Analysis. Academic Press, San Diego.
Casti, J. L. (1992). Reality Rules: Picturing the World in
Mathematics. 2nd Ed. Wiley-Interscience, New York.
Donahoe, J. W., and Palmer, D. C. (1994). Learning and
Complex Behavior. Allyn and Bacon, Boston.
Computer-Based Mapping
Alberto Giordano
Texas State University, San Marcos, Texas, USA
Glossary
animated cartography The representation of movement or
change on a map. An example is the representation of the
path of a hurricane on a map, achieved by moving a dot
across an area with velocities and directions corresponding
to the ground speeds and directions of the eye of the
hurricane at different times. In another example, consecutive maps of U.S. population by state at census years
could be displayed at set time intervals (e.g., 10 years equals
5 s), allowing the viewer to detect change patterns. The
most important aspect of an animated map is that it depicts
something that would not be evident, or not as evident, if
the frames were viewed individually: what happens
between each frame is more important than what exists
on each frame. The creation of animated (or dynamic) maps
has become possible with the advent of computer
cartography and the interactive map and during the 1990s
dynamic maps came to be routinely used by cartographers.
computer-based mapping (computer cartography) The
creation of maps with computers, made possible by
specialized hardware (e.g., digitizers, scanners) and software
applications. A fundamental characteristic of computer-based mapping is the separation between data management
and display functionalities. This separation makes tasks
such as the update and editing of maps easier than it is in
traditional pen-and-ink paper cartography. Computer cartography has considerably extended the representational
capabilities of traditional maps, for example, making it
possible to create multimedia and animated maps. The
World Wide Web has further revolutionized cartography,
creating an entirely new distribution channel.
interactive map A type of computer-based map designed to
allow user interaction and exploration. Functionalities
implemented in interactive maps include pan, zoom in,
zoom out, and the interaction with hotspots, symbols map
users can click on to access additional information related
to the spot. Additionally, by turning on and off different
layers, users are often able to choose which information
Introduction
For most of its history, cartography was a highly
institutionalized enterprise. Cartographers were often
A Brief History of Computer Cartography
As already mentioned, the first examples of the use
of computers in cartography were meant to aid map
mapping systems are arguably the most important developments to occur in the field in the 1980s and 1990s.
Multimediality and animation have further changed the
face of cartography and will likely revolutionize the field
in the future. These new developments are discussed in
detail in the next sections.
separated from cartographic data management functionalities. When a user clicks on a symbol, the information
that pops up is retrieved from the database and displayed
on the screen in a way that is similar to what was described
earlier in relation to overlays. Interactivity and database
connection have given the user significant freedom when
it comes to making customized maps. For example, the
online version of the National Atlas of the USA (available
at http://www.nationalatlas.gov) lets the user make dozens
of separate maps varying the geography and the variables
mapped. This flexibility is typical of electronic atlases,
a revisited version of traditional paper atlases that are
becoming a prominent feature of government agencies
Web sites, especially in North America and Europe.
A final example of how computers have changed the
way that users interact with the map is the case of choropleth maps. These are maps that portray the geographical
variability of a quantitative variable by shading geographical units according to the value of the variable. For example, a cartographer might wish to create a map showing
median household income in U.S. counties in the year
2000. The choropleth map is often used to display this
type of data. Probably the most important decision
a cartographer has to make when creating a choropleth
map is how to subdivide the data in classes. Assigning a unique shade of color (for example, a shade of
green) to all the over 3000 counties based on their household income would make it very difficult to extract information regarding individual counties from the map.
A solution is to: (1) group the over 3000 values in, say,
five classes ranging from the lowest to the highest value;
(2) assign a shade of green from light to dark to each class
so that the higher the income, the darker the shade of
green; (3) color the counties a shade of green corresponding to the county's median household income. If one tries
to create this same exact map at the U.S. Bureau of the
Census Web site (available at http://www.census.gov)
using Census 2000 data, the result is a choropleth map
in which the five classes are subdivided as follows:
(1) $9243–$23,750; (2) $23,848–$33,006; (3) $33,026–
$41,183; (4) $41,201–$53,804; and (5) $53,945–
$82,929. Looking at this map, the user will recognize
that geographical patterns are present in the distribution
of the value of median household income in the United
States. For example, the corridor between Washington,
DC and Boston, and the suburban counties around
Atlanta, Minneapolis, Denver, and San Francisco are
wealthy, with several counties in the fifth class. The southern states and parts of the southwest and the northwest
have large numbers of relatively poor counties, with several counties in the first class. To obtain the five groups
above, counties were first listed according to their median
household income in ascending order and then grouped
in five classes, each containing the same number of
elements (approximately 600). This procedure is called quantile classification.
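The grouping procedure just described is easy to state as code. In the sketch below, the fifteen income values are invented stand-ins for the roughly 3000 county figures, chosen so that the resulting class bounds echo the five ranges quoted above.

    def quantile_classes(values, k=5):
        """Split the sorted values into k classes of (nearly) equal size and
        return the (lowest, highest) value falling in each class."""
        ordered = sorted(values)
        n = len(ordered)
        groups = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
        return [(g[0], g[-1]) for g in groups]

    incomes = [9243, 15000, 23750, 23848, 28000, 33006, 33026, 37000,
               41183, 41201, 47000, 53804, 53945, 70000, 82929]
    print(quantile_classes(incomes))
    # [(9243, 23750), (23848, 33006), (33026, 41183), (41201, 53804), (53945, 82929)]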
Generalization
These include functionalities to, for example, reduce
the number of points in a line, smoothing its appearance
and reducing its size in bits; merge geographical units
(e.g., create a map of the states of the United States from
a map of the counties of the United States by merging
all counties that belong to the same state); and reduce
the number of cities displayed on the map by using
a population threshold (e.g., no cities with fewer than
1 million people).
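As one concrete example of point reduction (the article does not name an algorithm; the Douglas-Peucker procedure shown here is a classic choice in the cartographic literature), points are dropped whenever the simplified line deviates from them by less than a tolerance:

    import math

    def perp_dist(p, a, b):
        """Perpendicular distance from point p to the line through a and b."""
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        if dx == 0 and dy == 0:
            return math.hypot(px - ax, py - ay)
        return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

    def douglas_peucker(points, tolerance):
        """Keep the endpoints; recurse on the most deviant interior point."""
        if len(points) < 3:
            return points
        dists = [perp_dist(p, points[0], points[-1]) for p in points[1:-1]]
        i = max(range(len(dists)), key=dists.__getitem__) + 1
        if dists[i - 1] > tolerance:
            left = douglas_peucker(points[:i + 1], tolerance)
            right = douglas_peucker(points[i:], tolerance)
            return left[:-1] + right
        return [points[0], points[-1]]

    line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
    print(douglas_peucker(line, tolerance=1.0))
    # [(0, 0), (2, -0.1), (3, 5), (7, 9)]: the near-collinear points are dropped.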
Cartographic Measurements
These operations include calculating the length of
a line, the distance between points, the area and perimeter of a polygon, and so on. Vector and raster systems
implement very different algorithms to take these types of
measurements.
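To indicate what such measurement functions compute in a vector system, the sketch below measures a polyline's length and a simple polygon's area with the shoelace formula; planar (projected) coordinates are assumed, and the example data are invented.

    import math

    def polyline_length(points):
        """Sum of the straight-line segment lengths along the line."""
        return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

    def polygon_area(vertices):
        """Shoelace formula for a simple (non-self-intersecting) polygon."""
        n = len(vertices)
        s = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1]
                for i in range(n))
        return abs(s) / 2

    print(polyline_length([(0, 0), (3, 4), (3, 8)]))       # 9.0
    print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))  # 12.0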
Symbolization
Computer-mapping systems are particularly powerful as
tools to aid cartographic design. Especially in the most
recent releases, cartographic and GIS software packages
present the user with a wide choice of symbols and provide flexibility for designing or importing additional
symbols from other sources and for varying the size,
shape, and color of symbols. Additionally, the user can
easily customize the layout of the map, changing the position and characteristics of the legend, the title, the scale
bar, the neat line, and other cartographic elements.
However, it should be noted that there is one area in
which computer-based mapping systems have thus far
yielded disappointing results: the automated placement
of text on a map. This is a challenging task because different and often contrasting needs must be reconciled
when placing labels on a map. Text needs to be easily
readable and so labels should be quite large, but at the
same time space is often a scarce commodity, especially in
general-purpose maps that aim at portraying all physical
and human elements of a certain area. Also, labels need to
be placed so that it is clear to which feature they refer, but this is often difficult, especially in areas where there are many symbols referring to the same class of features.
For example, placing labels for cities in highly urbanized
regions or labels for countries in Europe can be a very
challenging task. The problem of automatically placing labels has been tackled in different ways. One approach that seems particularly promising is artificial intelligence, perhaps one of the future directions of development of computer-based mapping.
examples of the revolution brought to the world of cartography by the World Wide Web. From the point of view
of computer cartography, the revolution is one of distribution and of democratization, as already mentioned. To
distribute a map to potentially millions of viewers, one need simply put it on the Internet. Compare this relatively
simple action with the investments necessary to produce
and distribute a map in the world of traditional print
cartography. Ease of distribution has led to exponential
growth in the number of maps produced and in the variety
of their styles and designs and it has also facilitated
the public exchange of competing views on the use of environmental and economic resources. Issues such as environmental justice and sustainable development are often
discussed using maps as tools for scientific visualization.
On the other hand, the World Wide Web has not per se
changed the way that maps are made. Multimedia and
animated cartography, however, have. Animated cartography was pioneered by Thrower, who published a paper entitled "Animated Cartography" in a 1959 issue of The Professional Geographer, and by Tobler, who in 1970 developed a computer animation simulating urban growth in the Detroit region. The main advantage of animated (or dynamic) cartography is that it overcomes the traditional immobility of paper maps, in which time is kept constant (e.g., "Household income in U.S. counties in 2000"). Animated maps can show
time changes, for example, by flashing a series of maps
representing household income in 2000, 2001, 2002, and
2003. Animated maps can also represent another phenomenon that is problematic to deal with in traditional paper maps: movement. Traffic can be shown at different times of the day, a hurricane's path can be tracked in almost real time, and migration can be studied more
effectively. When animation is combined with
multimediality, the possibilities of traditional cartography
increase exponentially. Multimedia cartography is the
Further Reading
Cartwright, W., et al. (eds.) (1999). Multimedia Cartography. Springer-Verlag, Berlin/New York.
Clarke, K. C. (1995). Analytical and Computer Cartography.
Prentice Hall, Englewood Cliffs, NJ.
Dodge, M., and Kitchin, R. (2001). Mapping Cyberspace.
Routledge, London.
Kraak, M.-J., and Brown, A. (eds.) (2001). Web Cartography.
Taylor & Francis, London.
MacEachren, A. M. (1995). How Maps Work: Representation,
Visualization, and Design. Guilford, London.
Peterson, M. P. (1995). Interactive and Animated Cartography. Prentice Hall, Englewood Cliffs, NJ.
Thrower, N. (1959). Animated cartography. Profess. Geogr. 11(6), 9–12.
Tobler, W. R. (1959). Automation and cartography. Geogr. Rev. 49, 526–534.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 46(2), 234–240.
Computer-Based Testing
Richard M. Luecht
University of North Carolina at Greensboro, Greensboro,
North Carolina, USA
Glossary
automated test assembly (ATA) Involves the use of
mathematical programming algorithms or heuristics to
select optimal test forms that simultaneously meet statistical
specifications as well as any number of content and other
test construction constraints.
computer-adaptive testing (CAT) A test process that
adapts in difficulty to the apparent proficiency of the
test taker. CAT is usually more efficient than conventional fixed-item testing because it reduces the number of items needed to achieve a prescribed level of measurement precision (reliability) and/or achieves more precision across a broader range of the score scale with a fixed-length test.
computer-based testing (CBT) A test process delivered on
a computer. Computer-based tests tend to differ in terms of
the level of adaptation to the ability of examinees, the size
of test administration units employed (items versus testlets),
the type of connectivity required to interactively transmit
data, the types of items supported, the test assembly
methods employed, and the nature and extent of test form
quality control mechanisms used.
multistage testing (MST) The administration of tests in
stages. Multi-item modules called testlets are typically
assigned to each stage. Examinees complete an entire
testlet before moving on. Scoring and adaptive routing
decisions can be employed between stages to achieve some
degree of test adaptation.
test-delivery driver A software test administration application that typically performs six basic operations: (1) provides authorized navigation by the test taker, (2) selects the items to administer (fixed sequence, random, or heuristic based, such as a computer-adaptive test), (3) renders the test items, (4) captures responses, (5) manages timing (e.g., section time-outs), and (6) performs real-time scoring, which may be needed for adaptive testing as well as final scoring, if a score report is immediately provided to the examinee.
Computers in Testing
Computers have influenced almost every aspect of testing,
from test development and assembly to administration,
scoring, reporting, and analysis. The proliferation of personal computers (PCs), rapid improvements in network
technology and connectivity, and new developments in
adaptive testing technology and automated test assembly
have made computer-based testing (CBT) a mainstay for
virtually all types of testing, including educational tests,
certification and licensure tests, psychological assessments, and even employment tests.
are eligible to take the test (e.g., have met certain educational requirements) within prescribed time frames and
that they pay any appropriate fees. Multiple application
access modes can be used for eligibility and registration,
including mailing in hardcopy applications, telephone access, and on-line Internet-based registration.
The registration and scheduling system also needs to
locate an available computer seat for each examinee.
Many of the commercial CBT vendors use dedicated test
centers with a fixed number of test workstations. Therefore, the registration and scheduling system needs to find
a location and time for each applicant to take the test.
Many of the scheduling systems used by the major CBT
vendors work in a similar way to an airline reservation
system that reserves a particular seat for each passenger
on a specific flight. Due to obvious capacity limitations in
any fixed testing site network, there can be enormous
competition for computer seats during certain times of the year, especially at the most convenient locations in major metropolitan areas.
Test Administration and Delivery
Taking a computer-based test once implied sitting in front
of a dumb computer terminal connected to a mainframe
computer and responding with the keyboard to a sequence
of test questions. The rapid increase in availability of personal computers and improvements in connectivity, including networking technology and the emergence of the
Internet, have led to ever-improving distributed models
for testing. Modern CBT connectivity encompasses PC
local-area networks running in dedicated testing centers,
wide-area networks running in multiple locations, virtual
private networks built on Internet technology, and even
remote wireless networks capable of administering tests
on handheld personal digital assistants, pocket PCs, and
other small digital devices. Today, most computer-based testing (at least most high-stakes computer-based testing) is conducted at dedicated test centers. These testing centers have full-time test proctors and typically offer a secure, quiet, and comfortable testing environment.
The actual test may be housed on a file server at the
center, or the workstations at the center may connect
directly to a central processing facility via the Internet
or a private connection.
One of the most important components of any CBT test administration and delivery system is called the test-delivery driver, which is a software application that logs the examinee into the test, administers the test by presenting the items in some prescribed sequence, may allow the examinee to navigate around the test, carries out executive timing of the test sections, records appropriate actions or responses, and transmits the actions and responses to an appropriate storage repository. The test-delivery driver may also conduct real-time scoring and report the scores to the test taker. A test-delivery driver
needs to support multiple item types, including multiple-choice items, open-ended response items, essays requiring word processing, computational problems, items using interactive graphics, and custom computer-based work simulations. The test-delivery driver must also
support multiple CBT delivery models, some of which
require sophisticated item selection activities, including
using automated test assembly. These latter types of
models include computer-adaptive testing and adaptive
multistage testing.
Postexamination Processing
Scoring and reporting are two of the most common
postexamination processing activities for CBT. Very
few high-stakes computer-based examinations immediately release scores to the test takers, even though that
technology is rather trivial to implement. Instead, the
response data are transmitted to a central processing
facility for additional quality assurance processing and
to ensure the integrity of the scores. Scoring and reporting
to the examinees is done from that facility.
Postexamination processing also includes conducting
many types of psychometric analyses, including item analysis, item calibration and equating, and research studies
meant to improve the quality of the test. These types of
studies are routinely performed by most major testing
organizations.
Linear-on-the-Fly Tests
A variation on CFT is linear-on-the-fly testing (LOFT). LOFT involves the real-time assembly of a unique fixed-length test for each examinee. Classical test theory or item response theory (IRT) can be used to generate randomly parallel LOFT test forms. There are at least two variations of the LOFT model: a large number of unique test forms can be developed far in advance of test administration (which is merely a special case of CFT, where ATA is employed, as noted previously), or test forms can be generated immediately prior to testing (i.e., in real time). A
benefit of developing the test forms in advance is that
content and measurement experts can review each form.
The primary advantage of the LOFT model is that
numerous forms can be developed in real time from
the same item pool. Furthermore, there is typically
some overlap of items allowed across the test forms.
When test forms are assembled just prior to administration, the current exposure levels of the items can be considered in the test assembly algorithm. At-risk items can
be made unavailable for selection. For real-time LOFT,
explicit item exposure controls can be used to limit the
exposure of particular items, in addition to the random
sampling scheme. The benefits of LOFT include all those
associated with CFTs with the addition of more efficient
item pool usage and reduced item exposure.
The disadvantages of LOFT are similar to those of CFTs
(i.e., decreased measurement efficiency and exposure risks
if test banks are relatively small, limiting the number of
forms that can be produced). In addition, real-time LOFT
may limit or altogether preclude certain quality controls
such as test content reviews and data integrity checks. Although some quality assurance can be integrated into the
live test assembly algorithm, doing so tends to complicate
the functionality of the test delivery system and introduces
additional data management challenges (e.g., reconciling
examinee records). The data-integrity risks of this latter problem can be slightly reduced by creative database management (e.g., using system-generated test form identifiers for every LOFT form).
Computer-Adaptive Tests
A computer-adaptive test adapts or tailors the exam to each examinee. Under the purest form of CAT, this tailoring is done by keeping track of an examinee's performance on each test item and then using this information to select the next item to be administered. A CAT is therefore developed item-by-item, in real time, by the test-delivery driver software. The criteria for selecting
[Figure 1: Average standard errors (vertical axis, 0.0–1.0) by item sequence (horizontal axis, 0–50) for a 50-item computer-adaptive test (CAT) vs. 50 randomly selected items.]
[Figure: Computer-adaptive multistage test panels (Panel #001–#003), each assigning testlets (e.g., A, D, F, G) across Stages 1–3, arranged from easier to harder.]
nonadaptive tests, the ability of content experts and sensitivity reviewers to review the preconstructed testlets to
evaluate content quality, and the ability of examinees to
skip, review, and change answers to questions within
a testlet or stage.
Conclusions
CBT was once envisioned to make testing less complex and cheaper, but the opposite tends to be true: CBT systems are complex and expensive to operate. There are also many types of CBT delivery models to consider and, clearly, no singular CBT model is ideal for every testing program. But progress is being made, and each new generation of CBTs seems to improve on the previous generation. Today, testlet-based CAT, CMT, and CA-MST have emerged as highly useful test-delivery models that have attempted to reconcile some of the shortcomings of CFT and CAT. In the case of CA-MST, the use of
preconstructed testlets and highly structured panels
yields distinct improvements in test form quality control,
better security, more parsimony in data management, and
important system performance advantages. Yet, there is
room for improvement, and there will no doubt be new
CBT technologies and delivery models, as well as new
issues and perspectives that should be considered in
evaluating those models.
Further Reading
Folk, V. G., and Smith, R. L. (2002). Models for delivery of CBTs. In Computer-Based Testing: Building the Foundation for Future Assessments (C. Mills, M. Potenza, J. Fremer, and W. Ward, eds.), pp. 41–66. Lawrence Erlbaum, Mahwah, NJ.
Hambleton, R. K., and Swaminathan, H. R. (1985). Item
Response Theory: Principles and Applications. Kluwer,
Hingham, MA.
Lord, F. M. (1980). Applications of Item Response Theory to
Practical Testing Problems. Lawrence Erlbaum Assoc.,
Hillsdale, NJ.
Luecht, R. M. (1998). Computer-assisted test assembly using optimization heuristics. Appl. Psychol. Measure. 22, 224–236.
Luecht, R. M. (2000). Implementing the Computer-Adaptive
Sequential Testing (CA-MST) Framework to Mass Produce
High Quality Computer-Adaptive and Mastery Tests. Paper
presented at the annual meeting of the National Council on
Measurement in Education, New Orleans. Available on the
Internet at www.ncme.org
Luecht, R. M., and Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. J. Educ. Measure. 35, 229–249.
Parshall, C. G., Spray, J. A., Kalohn, J. C., and Davey, T.
(2002). Practical Considerations in Computer-Based Testing. Springer, New York.
Sands, W. A., Waters, B. K., and McBride, J. R. (eds.) (1997). Computerized Adaptive Testing: From Inquiry to Operation. American Psychological Association, Washington, D.C.
Sheehan, K., and Lewis, C. (1992). Computerized mastery testing with nonequivalent testlets. Appl. Psychol. Measure. 16, 65–76.
Computerized Adaptive Testing
Daniel O. Segall
Defense Manpower Data Center, U.S. Department of Defense, Washington, D.C., USA
Glossary
content balancing A set of one or more ancillary item-selection constraints based on content or nonstatistical item features.
conventional testing An approach to individual difference
assessment whereby all examinees receive the same items,
typically (but not necessarily) in printed mode.
exposure control algorithm An algorithmic enhancement to
precision-based item selection that limits the usage rates of
some highly informative items for the purpose of increased
test security.
information A statistical concept related to the asymptotic
variance of maximum-likelihood trait estimates; it can be
expressed as the sum of individual item information
functions, which can be evaluated at specific points along
the trait scale.
item pool A collection of test questions and associated item
parameters from which items are selected for administration by the adaptive item-selection algorithm.
item response function A mathematical function providing the probability of a correct response conditional on the latent trait level θ.
measurement efficiency The ratio of measurement precision to test length. One test or testing algorithm is said to be more efficient than another if it provides more precise scores for a fixed test length, or if it achieves equally precise scores with fewer administered items.
measurement precision An index of the accuracy of test scores, often assessed by the average or expected squared difference between true and estimated trait parameters, E[(θ − θ̂)²].
stopping rule The rule used to determine when to end the
test; typically based on the number of administered items
(fixed length), or on the precision level of the estimated trait
parameter (variable length).
trait A psychological dimension of individual differences;
includes ability, aptitude, proficiency, attitude, or personality characteristics.
Computerized adaptive testing is an approach to individual difference assessment that tailors the administration
of test questions to the trait level of the examinee. The
computer chooses and displays the questions, and then records and processes the examinee's answers. Item selection is adaptive: it depends in part on the examinee's answers to previously administered questions, and in part on the specific statistical qualities of administered and candidate items. Compared to conventional testing,
whereby all examinees receive the same items, computerized adaptive testing administers a larger percentage of
items with appropriate difficulty levels. The adaptive item
selection process of computerized adaptive testing results
in higher levels of test-score precision and shorter test
lengths.
The three-parameter logistic (3PL) model gives the probability of a correct response to item i as

P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-1.7 a_i (\theta - b_i)}},

and the likelihood of a response pattern u_1, ..., u_n is

L(\theta) = \prod_{i=1}^{n} P_i(\theta)^{u_i} Q_i(\theta)^{1 - u_i},

where Q_i(\theta) = 1 - P_i(\theta).
items. In contrast, an IRT-based test score (i.e., the trait estimate θ̂) has the same meaning for tests containing either easy or difficult items (provided all item parameters have been transformed to a common scale). This IRT invariance property enables the comparison of scores from different or overlapping item sets. In the context of IRT, θ̂ test scores are all on a common measurement scale, even though these scores might have been estimated from tests consisting of different items.
items forms the basis of all CAT item-selection algorithms. However, commonly used algorithms differ
along two primary dimensions: first, in the type of statistical estimation procedure used (maximum likelihood versus Bayesian), and second, in the type of item-response
model employed (e.g., 1PL, 2PL, or 3PL).
Maximum-Likelihood Approach

The maximum-likelihood (ML) approach to CAT item selection and scoring is based on the log-likelihood function

l(\theta) = \ln \prod_{i=1}^{n} P_i(\theta)^{u_i} Q_i(\theta)^{1 - u_i}.

The estimate θ̂(ML) is defined as the value of θ for which the likelihood (or, equivalently, the log-likelihood) function is maximized. Because no closed-form expression exists for θ̂(ML), it is typically calculated using an iterative numerical procedure such as the Newton–Raphson algorithm.
The estimator θ̂(ML) is asymptotically normally distributed with mean θ and variance

Var(\hat{\theta} \mid \theta) = \left\{ -E\left[ \frac{\partial^2}{\partial\theta^2}\, l(\theta) \right] \right\}^{-1} = \left[ \sum_{i=1}^{n} I_i(\theta) \right]^{-1},   (5)

where the information function for item i, denoted by I_i(θ), is

I_i(\theta) = \frac{[P_i'(\theta)]^2}{P_i(\theta)\, Q_i(\theta)},   (6)

and where P_i'(θ) denotes the derivative of the item response function with respect to θ. For the one- and three-parameter logistic models, these derivatives are P_i'(θ) = 1.7 P_i(θ) Q_i(θ) and P_i'(θ) = 1.7 a_i Q_i(θ)[P_i(θ) − c_i]/(1 − c_i), respectively.
From Eq. (5), it is clear that the asymptotic variance of the ML estimate θ̂(ML) can be minimized by choosing items with the largest information values. If θ were known in advance of testing, then available items could be rank ordered in terms of their information values [Eq. (6)] at θ, and the most informative items could be selected and administered. Because θ is not known (to know or approximate θ is, of course, the purpose of testing), the most informative item can be selected using item-information functions evaluated at the provisional (most up-to-date) trait estimate, I_i(θ̂_k(ML)). After the chosen item has been administered, and the response scored, a new provisional estimate can be obtained and used to reevaluate item information for the remaining candidate items. These alternating steps of trait estimation and item selection are summarized in Table I.
Table I  Adaptive Item Selection

Step 1. Calculate provisional trait estimate. Obtain a provisional trait estimate, θ̂_k, based on the first k responses.

Step 2. Choose item. Compute information I_i(θ̂_k) for each candidate item by substituting the provisional trait estimate θ̂_k (calculated in step 1) for the true parameter θ in Eq. (6); select for administration the item with the largest item-information value.
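The two steps of Table I can be sketched as follows; this is a minimal illustration (not production code) that uses Fisher scoring, a Newton–Raphson variant replacing the observed second derivative with the expected information:

```python
import math

def p_3pl(theta, a, b, c):  # 3PL item response function, as above
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def item_information(theta, a, b, c):
    """Item information, Eq. (6), using the 3PL derivative P'."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    dp = 1.7 * a * q * (p - c) / (1 - c)
    return dp * dp / (p * q)

def ml_theta(items, responses, theta=0.0, iters=25):
    """Step 1: provisional ML trait estimate by Fisher scoring.
    (Operational test-delivery drivers also bound the estimate,
    since the ML estimate diverges when every response is correct
    or every response is incorrect.)"""
    for _ in range(iters):
        score = info = 0.0
        for (a, b, c), u in zip(items, responses):
            p = p_3pl(theta, a, b, c)
            q = 1.0 - p
            dp = 1.7 * a * q * (p - c) / (1 - c)
            score += dp * (u - p) / (p * q)   # derivative of l(theta)
            info += dp * dp / (p * q)         # expected information
        theta += score / info
    return theta

def choose_item(theta, candidates):
    """Step 2: select the candidate (a, b, c) with the largest
    information at the provisional trait estimate."""
    return max(candidates, key=lambda item: item_information(theta, *item))
```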
Bayesian Approach

In instances when a prior distribution for θ can be specified, some test developers have opted to use a Bayesian framework for item selection and trait estimation. The prior density, denoted by f(θ), characterizes what is known about θ prior to testing. The most common approach to prior specification in the context of CAT sets the prior equal to an estimated θ density calculated from existing (or historical) examinee data. Then the assumption is made that future examinees (taking the CAT test) are independent and identically distributed, θ ~ f(θ). Although in many cases additional background information is known about examinees relating to θ (such as subgroup membership), this information is often ignored in the specification of individual examinee priors; to allow such information to influence the prior could lead to, or magnify, subgroup differences in test score distributions.

A Bayesian approach provides estimates with different statistical properties than are provided by ML estimates. In CAT, Bayesian estimates tend to have the advantage of smaller conditional standard errors, s(θ̂ | θ), but possess the disadvantage of larger conditional bias, B(θ) = m(θ̂ | θ) − θ, especially for extreme θ levels. Thus, the choice of estimation approach involves a trade-off between conditional precision and conditional bias.
For the Bayesian (MAP) estimator, the approximate posterior variance is

Var(\hat{\theta}_{MAP}) \approx \left[ 1/\sigma^2 + \sum_{i=1}^{n} I_i(\theta) \right]^{-1}, evaluated at \theta = \hat{\theta}_{MAP},   (10)

where σ² is the variance of the (normal) prior.
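A minimal sketch of the Bayesian computation, assuming a N(0, σ²) prior and locating the posterior mode by a simple grid search (operational programs use faster iterative methods); the Eq. (10) approximation then gives the posterior variance:

```python
import math

def p_3pl(theta, a, b, c):  # as in the earlier sketches
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def item_information(theta, a, b, c):  # Eq. (6)
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    dp = 1.7 * a * q * (p - c) / (1 - c)
    return dp * dp / (p * q)

def map_estimate(items, responses, sigma=1.0):
    """MAP trait estimate under a N(0, sigma^2) prior, plus the
    approximate posterior variance of Eq. (10)."""
    def log_posterior(t):
        lp = -t * t / (2.0 * sigma * sigma)  # log prior, up to a constant
        for (a, b, c), u in zip(items, responses):
            p = p_3pl(t, a, b, c)
            lp += u * math.log(p) + (1 - u) * math.log(1 - p)
        return lp
    theta_map = max((g / 100.0 for g in range(-400, 401)), key=log_posterior)
    post_var = 1.0 / (1.0 / sigma ** 2 +
                      sum(item_information(theta_map, a, b, c)
                          for (a, b, c) in items))
    return theta_map, post_var
```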
Item-Selection Enhancements
Although the adaptive item-selection algorithms form an
efficient basis for precise measurement, test developers
have often found it beneficial or necessary to alter these
algorithms. These alterations, or enhancements, include
the specification of rules used to choose the first several
items; the specification of rules used to stop the test;
modifications to the item-selection algorithms, intended
to reduce opportunities for test compromise and to help
achieve a more balanced item content; and the use of time
limits.
Stopping Rules
There are two common test termination or stopping rules
used in CAT: fixed length and variable length. Fixed-length tests require that the same number of items be administered to each examinee. One consequence of
fixed-length tests is that measurement precision is likely
to vary among examinees. In contrast, variable-length
tests continue the administration of items until an individualized index of precision satisfies a target precision
level. These precision indices are often based on ML
information [Eq. (5)] or Bayesian posterior variance
[Eq. (10)] statistics.
Test developers have found that the choice of stopping rule is often highly dependent on the test purpose, item-pool characteristics, and operational constraints. In many
instances, for example, equally precise scores among examinees are paramount, helping to ensure that decisions
and interpretations made on the basis of test scores are
equally precise for all examinees. In other instances,
however, the occasionally long test lengths (possible with
variable-length tests) might be judged too burdensome
for examinees, and possibly for test administrators as well.
To moderate some of the operational burdens, variable-length testing has been implemented with upper-bound
constraints on the maximum number of administered
items, and, in some instances, on the maximum amount
of testing time allowed for each examinee. In other
instances, test developers have opted for fixed-length tests.
Content Balancing
Test developers have been compelled in many cases to
depart from strict precision considerations when designing and implementing CAT item-selection algorithms.
These include cases, for example, in which the item
pool consists of items drawn from different content
areas of a more general domain (e.g., math items
drawn from algebra and geometry). In such instances,
item-selection algorithms that maximize precision may
not administer properly balanced tests, resulting in test
scores that have questionable validity. To help ensure
adequately balanced content across examinees, constraints can be placed on the adaptive item-selection algorithms (e.g., constraints that ensure equal numbers of
administered algebra and geometry items).
The most basic approach to content balancing spirals
the sequence of item administration among key content
areas. For example, math items would be administered in
the following order: (1) algebra, (2) geometry, (3) algebra,
(4) geometry, and so forth, where each item represents
the most informative item (passing the exposure-control
screen if used) at the provisional trait level among items in
the given content (i.e., algebra or geometry) domain.
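A minimal sketch of the spiraling rule (assuming each pool entry carries its content area, a used flag, and an info value, i.e., its item information evaluated at the current provisional trait estimate; items failing an exposure-control screen would simply be filtered out of the candidate list):

```python
def next_spiraled_item(pool, n_administered, sequence=("algebra", "geometry")):
    """Cycle through content areas in a fixed order; within the area
    whose turn it is, take the most informative unused item."""
    area = sequence[n_administered % len(sequence)]
    candidates = [item for item in pool
                  if item["area"] == area and not item["used"]]
    return max(candidates, key=lambda item: item["info"])

pool = [
    {"area": "algebra",  "info": 0.82, "used": False},
    {"area": "algebra",  "info": 0.55, "used": False},
    {"area": "geometry", "info": 0.91, "used": False},
]
print(next_spiraled_item(pool, 0)["area"])  # algebra's turn
print(next_spiraled_item(pool, 1)["area"])  # geometry's turn
```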
Although the spiraling approach is adequate for a small
number of mutually exclusive content areas, this approach
is poorly suited for situations in which more complex
content constraints are desired. Consider the case, for
example, in which items are classified along several
dimensions simultaneously, and as a result do not fall
into mutually exclusive categories. In such cases, methods such as the "weighted deviations" or "shadow testing" approaches can be used. These approaches are designed to satisfy many overlapping constraints simultaneously during item selection.
Time Limits
Because the effects of time pressure are not explicitly modeled by standard item-selection and scoring algorithms, the imposition of time limits can in some instances significantly degrade CAT measurement precision. Even in spite of this undesirable consequence, most high-stakes, high-volume testing programs have implemented overall test time limits for a number of reasons, including the desire to help reduce excessive test times. In instances
Item-Pool Development
Characteristics of the item pool (including size, item parameter distributions, and content coverage) directly impact CAT measurement efficiency and test score validity.
Furthermore, particular characteristics of the adaptive
algorithm (such as the stopping rule, number and type
of content balancing constraints, and type and level of
exposure control) can interact with key item-pool characteristics to further affect measurement efficiency and
test score validity. These characteristics are listed in
Table II.
Large item pools are desirable from several standpoints. First, large item pools tend to contain a larger
set of highly discriminating items, which in turn can provide greater measurement efficiency (i.e., greater precision for fixed-length tests and shorter test lengths for
variable-length tests). Second, large pools are more likely
to satisfy content balancing constraints, or to satisfy them
without severely impacting efficiency. For fixed-length
tests, large pools enable lower exposure levels (for the
most used items) and can satisfy these levels without
severely impacting precision. Many test developers
have found that high precision levels can be obtained
Table II  Factors Affecting CAT Measurement Efficiency

Item-pool characteristics: size; item parameter distributions; content coverage.
Algorithm characteristics: stopping rule; content constraints; exposure control.
with pools of a size that is about six to eight times the test
length.
In principle, the ideal item pool contains items with difficulty parameter (b_i) values uniformly distributed throughout the θ range and, for the 3PL model, contains high discrimination parameter (a_i) values and low guessing parameter (c_i) values. In practice, these ideal
parameter distributions are often difficult to achieve. For
some tests, highly discriminating items may be rare, or
may exist only for items with difficulty values that span
a narrow range or for items of specific content areas.
In these cases, CAT algorithms can be very inefficient,
resulting in test scores that have low precision over some
trait ranges (for fixed-length tests), or resulting in long test
lengths (for variable-length tests). Consequently, test
developers, when possible, have tended to write and pretest large numbers of items in hopes of ending up with
a sufficient number of highly discriminating items of
appropriate difficulty and content.
Standard CAT item selection and scoring algorithms
assume that the IRFs for all items are known in advance.
In practice, these are estimated from examinee response
data. For the 3PL model, large-scale testing programs
have tended to use samples containing 500 or more responses per item to estimate item parameters. Programs
that have based their item-selection and scoring algorithms on the 1PL model have typically relied on smaller
sample sizes for IRF estimation. Test developers routinely use conditional (on θ̂) item-score regressions to check model fit. This model-fit analysis typically includes an
additional check of dimensionality or local independence
assumptions.
Many test developers have found it convenient, especially when developing the first set of pools, to collect
calibration data in paper-and-pencil format. This mode
of data collection is often faster and cheaper than collecting the same data by computer. In these cases, test
developers have attempted to ensure that item-parameter estimates obtained from paper-and-pencil data are adequate for use when the items are administered on computer in adaptive format. This assurance has been provided by several studies that have found
inconsequential differences in item-response functioning
due to mode of administration (computer versus paper
and pencil).
Because of the complexity of the interactions between
item-pool characteristics and adaptive testing algorithms,
and the effects these have on measurement efficiency,
test developers routinely conduct computer simulation studies to fine-tune the adaptive algorithms and to examine the adequacy of candidate item pools. These simulations take as input the item-parameter estimates of the items contained in the pool (a, b, and c values) and, if content balancing is proposed, the content classification of each item. Then, the consequences (on precision or test length) of candidate algorithm settings can be examined before operational use.
Trends in Computerized Adaptive Testing
In recent years, research on item-selection and scoring algorithms has continued. This includes work on item-selection algorithms intended to provide greater measurement precision. One class of approaches addresses
the uncertainty regarding the provisional trait estimates toward the beginning of the test. These approaches
include methods such as the global information criterion,
weighted-likelihood information criterion, a-stratified
method, and fully Bayesian approaches. Research has
also continued on improved exposure control algorithms
to further guard against test compromise. Another class of
item-selection approaches that has been developed to
further increase the measurement efficiency of CAT in
the context of multidimensional IRT modeling involves
items that are selected to maximize the information along
several dimensions simultaneously.
As more testing programs have considered the use of CAT, more attention has been given to its cost-effectiveness. In addition to the benefits of increased
measurement precision and reduced test lengths, CAT
offers a host of other benefits associated with the computerized administration of test items. These include immediate and accurate scoring, minimal proctor
intervention, individually timed and paced test administration, standardized instructions and test administration
conditions, improved physical test security (no hard-copy
test booklets are available for compromise), and provisions for handicapped examinees (large print, audio,
and alternate input devices). Many of these benefits, especially when considered alongside the key benefit of
increased measurement efficiency, provide compelling
incentives in favor of CAT. But several obstacles have
prevented many test developers from adopting CAT. In
addition to specialized software requirements (necessary for test development and administration), CAT also requires considerable resources for item-pool development and for the purchase and maintenance of computer test-delivery systems.
Further Reading
Drasgow, F., and Olson-Buchanan, J. B. (eds.) (1999).
Innovations in Computerized Assessment. Lawrence Erlbaum, Hillsdale, New Jersey.
Lord, F. M. (1980). Applications of Item Response Theory to
Practical Testing Problems. Lawrence Erlbaum, Hillsdale,
New Jersey.
Computerized Record Linkage and Statistical Matching
Dean H. Judson
U.S. Census Bureau, Washington, D.C., USA
Glossary
blocking field/strategy A way to limit the search space
by forcing certain fields to match before considering
whether to link two records. Pairs that do not match on
one or more blocking fields are automatically not sent to be
compared.
constrained matching In statistical matching, every record
from both files must be matched one and only one time, as
opposed to unconstrained matching.
database A collection of records laid out in fields.
false negative/false nonlink A nonlink decision between two records that is in fact incorrect (the records should have been linked).
false positive/false link A link between two records that is in
fact not a correct link.
field A datum about the object, sometimes called a variable
(for example, first name, street name, street type, owner
name).
labeled data Pairs of records that already carry link or nonlink decision flags; pairs without such flags are unlabeled.
link decision A decision to join two records; a decision to not
join two records is a nonlink decision.
match Two fields (in two different records) that are
considered the same or sufficiently similar; two records
match if they refer to the same external object.
matching field/strategy For pairs of records that satisfy the
blocking strategy, the matching strategy indicates how to
compare matching fields so as to determine if the two
records should be linked.
record A collection of data about an individual object (for
example, a person, an address, a business entity).
search space The region in which the record of interest is
sought, in either record linkage or statistical matching.
statistical match Two records from different databases that
have been joined together, but are not believed to refer to
the same external object.
coverage overlap between two databases, to evaluate duplication in an existing database, to add new records and
remove unused or unusable records, and to augment data
in one database with data from another. To clarify the
following discussions of database linkage, the first database will be referred to as database A, the second as database B (three-way links between databases are much
less common, and typically they are composed of multiple
two-way links).
Evaluating coverage overlap is important in population
censuses. In censuses and vital registration (e.g., births,
deaths) lists around the world, undercoverage (and occasionally overcoverage) is a constant concern. But how to
evaluate undercoverage, when by definition undercoverage means that the people or addresses are missing
from the list? A common solution is to use a dual system
estimator, in which the census-taking organization takes
the population census and follows up with a second enumeration attempt in a sample of areas. Thus, each person
or address has two opportunities to be captured, and from
these two captures an estimate of coverage (over or under)
is constructed. But, in order to make the dual system estimate, the people and/or addresses in database A have to be
linked to their equivalent record in database B. Because
people write down their names, dates of birth, and their
addresses with variations (e.g., William vs. Billy), linking
equivalent records is not obvious.
Evaluating duplication is also important in vital registries, population censuses, marketing applications, and sample surveys. For example, immunization registries attempt to keep track of which persons have received which immunization. If the list contains duplicate records on a single person, that person's actual immunization history is not accurately maintained. In vital registers, it is certainly not of demographic value to record duplicate births or to record deaths twice; both lead to estimation errors. Further afield, for direct marketing applications or sample surveys, duplication means that the household or person receives more than one solicitation, which is a waste of money at best and an annoyance to the recipient at worst.
Adding new records and removing unused or unusable
records is important in list construction for sample surveys
that use a list frame. In this case, database A represents the
existing list, and database B represents a batch of new candidates for the list. Obviously, adding records that represent the same unit as one already existing on the database
is creating duplication. Just as obviously, linking a candidate record from B with a record from A falsely, and not
adding it to the list, is creating undercoverage. Thus, getting
the link right is of crucial importance in list construction.
A final, important use for record linkage is to use data from database B to augment information in database A, if each database contains information not contained in the other (e.g., one is a credit card database, the other is a tax database).
Table I  Record Pairs That Refer to the Same Person but Would Fail an Exact-Match Rule

First name   Middle name   Last name   Suffix   Day   Year
Arthur       F.            Jones       II       22    1971
Arthur                     Jones                22    1971
Arthur       F.            Jones                22    1971
Art          F             Jones                22    1988
Character Matching
The most natural method of linking records, as already
discussed, is some character-by-character matching of
fields in the database. The primary weaknesses of character matching relate to inflexibility. Another type of
weakness is nonuniqueness of the link/nonlink decision.
Sliding Windows

[Figure 2: Illustration of the space of all possible record pairs when the data shown in Fig. 1 are jumbled; each axis indexes the record numbers of one file.]

Examples of adjacent comparisons in a sorted (sliding-window) pass over the records:

101 Elm Court Apt.1 97111 vs. 101 Elm Street Apt.1 97111
101 Elm Street Apt.1 97111 vs. 101 Elm Street Apt.2 97111
Expert Systems
Character matching and sliding windows have the disadvantage that they rely only on the syntax of the fields,
and not on their semantics. That is, when comparing
"Arthur" to "Artie," the character matchers do not use any special intelligence about the relationship between those two strings; rather, they blindly compare character to character. The sliding-windows approach,
though using intelligence about the relative importance
of fields, also does not use any special knowledge about
the contents of the fields. An expert system attacks (and
partially solves) this problem. Expert systems use collections of rules or clues designed by experts. These collections of rules can accumulate over time, providing locally
specific rules to account for local conditions, special
situations, and the like.
Parsing and Preediting

A necessary pair of steps to computerized record linkage is parsing the incoming data and applying any necessary preedits.

Fellegi–Sunter Theory

Under the Fellegi–Sunter theory of record linkage, the evidence that a candidate pair of records is a true match is summarized by the likelihood ratio

R = \frac{P(x_1 = 1 \mid M)\, P(x_2 = 1 \mid M) \cdots P(x_N = 1 \mid M)}{P(x_1 = 1 \mid \bar{M})\, P(x_2 = 1 \mid \bar{M}) \cdots P(x_N = 1 \mid \bar{M})},

where x_i indicates agreement on field i, M is the set of true matches, and M̄ is the set of nonmatches. Taking logarithms converts the product into a sum of field-by-field weights:

\ln R = \ln \frac{P(x_1 = 1 \mid M)}{P(x_1 = 1 \mid \bar{M})} + \ln \frac{P(x_2 = 1 \mid M)}{P(x_2 = 1 \mid \bar{M})} + \cdots + \ln \frac{P(x_N = 1 \mid M)}{P(x_N = 1 \mid \bar{M})}.
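A minimal sketch of these weights, assuming the conditional agreement probabilities P(x_i = 1 | M) and P(x_i = 1 | M̄) have already been estimated (e.g., from labeled pairs):

```python
import math

def match_weight(agreements, p_agree_m, p_agree_u):
    """Total log-likelihood-ratio weight for one candidate pair.
    agreements[i] is True if field i agrees; p_agree_m[i] is
    P(x_i = 1 | M) and p_agree_u[i] is P(x_i = 1 | M-bar)."""
    total = 0.0
    for agree, m, u in zip(agreements, p_agree_m, p_agree_u):
        if agree:
            total += math.log(m / u)
        else:
            total += math.log((1 - m) / (1 - u))
    return total

# Hypothetical probabilities for three fields (last name, birth year, ZIP):
w = match_weight([True, True, False],
                 p_agree_m=[0.95, 0.90, 0.85],
                 p_agree_u=[0.10, 0.05, 0.20])
print(w)  # large positive totals favor a link decision
```

In the Fellegi–Sunter framework, pairs whose total weight exceeds an upper threshold are linked, pairs below a lower threshold are nonlinked, and the band in between is sent to clerical review.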
The problem of finding donors in the donor data set is conceptually similar to the problem of finding distances between centroids in cluster analysis. There are essentially two proposed methods to find donors: finding the closest match and using those data, or developing a model and using the model predictions as data. The first method is to employ some distance measure algorithm typically used in clustering techniques to find the nearest neighbor or single unique donor in the donor database, then set some function of the value of the missing value Y1 on the target database equal to some function of the amount from the donor and the target, f(Y1, Y2). That is, the recipient on the target database gets imputed the value f(Y1, Y2). The simplest function for a donation would simply be f(Y1, Y2) = Y2, that is, simply take the donated value from the donor data set. Another function that would attempt to reflect uncertainty around the donated value would be f(Y1, Y2) = Y2 + ε, where ε has some distribution reflecting the uncertainty around the donated value.
Another method currently in the literature is to estimate a multiple regression model Ŷ2 = β̂ᵀX2 to generate the expected value Ŷ2 of the variable of interest from the donor data set, and then to calculate the corresponding expected value Ŷ1 for each record in the target data set.
Model-Based Method

In the model-based method, the researcher uses multiple regression (or a generalized variant) to find the expected value of the variable of interest, to calculate the expected value for each record in both data sets, to perform a simple match using a distance measure on each estimated value, and then to set the value of the missing variable equal to a function of the value recorded for the donor. Using this technique, the match would be performed on one variable: the expected value of each case under a regression model. To pick the minimum distance, the distance measure could be Euclidean, squared Euclidean, or city-block (Manhattan) distance (absolute value), because these eliminate negative distance values. (These are not the only metrics; many others are possible.) For the purposes of this exercise, squared Euclidean distance is used as the distance measure to minimize when selecting donors. The model-based algorithm may be described in pseudocode as follows:
1. In the donor database, estimate the regression model Ŷ2 = β̂ᵀX2.
2. Set i := 1.
3. Do while i ≤ N1:
4.   Calculate Ŷ1i = β̂ᵀX1i.
5.   Find j ∈ {1, ..., N2} such that (Ŷ1i − Ŷ2j)ᵀ(Ŷ1i − Ŷ2j) is minimized.
6.   Select Y2j as the donor for case i.
7.   Append f(Y1i, Y2j) to the ith case in the target database.
8.   Set i := i + 1.
9. End.
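A minimal runnable rendering of this pseudocode using NumPy, assuming an unconstrained match (donors may be reused) and the simple donation function f(Y1, Y2) = Y2:

```python
import numpy as np

def model_based_match(X_donor, Y_donor, X_target):
    """Fit Y on X in the donor file, predict for both files, and give
    each target record the Y value of the donor whose predicted value
    is closest in squared Euclidean distance."""
    Xd = np.column_stack([np.ones(len(X_donor)), X_donor])  # add intercept
    beta, *_ = np.linalg.lstsq(Xd, Y_donor, rcond=None)     # step 1
    y_hat_donor = Xd @ beta
    Xt = np.column_stack([np.ones(len(X_target)), X_target])
    donated = []
    for x in Xt:                                            # steps 3-8
        y_hat_target = x @ beta                             # step 4
        j = int(np.argmin((y_hat_target - y_hat_donor) ** 2))  # step 5
        donated.append(Y_donor[j])                          # f(Y1, Y2) = Y2
    return np.array(donated)
```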
Mathematical Relationships

Each of the previously described techniques has common attributes; mathematically, the techniques are similar in effect. The Mahalanobis distance function has two important properties: (1) the diagonal cells of S⁻¹ represent variances, and hence scale the individual distance calculations, and (2) the off-diagonal cells of S⁻¹ represent covariances, and deform the individual distance calculations. Note that the minimum value of any entry in the S⁻¹ matrix is zero; there are no negative entries in the S⁻¹ matrix.
In order to determine the relationship between the Mahalanobis measure and the model-based measure, begin with the function to be minimized:

(\hat{Y}_{1i} - \hat{Y}_{2j})^T (\hat{Y}_{1i} - \hat{Y}_{2j})
 = (\hat{\beta}^T X_{1i} - \hat{\beta}^T X_{2j})^T (\hat{\beta}^T X_{1i} - \hat{\beta}^T X_{2j})
 = [\hat{\beta}^T (X_{1i} - X_{2j})]^T [\hat{\beta}^T (X_{1i} - X_{2j})]
 = (X_{1i} - X_{2j})^T \hat{\beta} \hat{\beta}^T (X_{1i} - X_{2j}).

Now it is seen that the term β̂β̂ᵀ is the analogue of the S⁻¹ of the Mahalanobis distance measure. Instead of scaling the space by variances and covariances, the space is scaled by the estimated coefficients of the model and by cross-products of these estimated coefficients.
Statistical Matching Considerations
When performing a statistical match, the researcher
hopes that the donated data really do represent what
the target record would have had, had the data been collected. Therefore, the researcher's choice of variables to
be used in the match should be based on statistical evidence that they are reliable predictors or indicators of the
missing variable. For model-based methods, as far as possible, fully specify the model; do not leave any indicators
out of the set that would have an effect on the value of the
missing variable, and verify that the functional form of the
model is correct. Furthermore, as with any modeling exercise, the researcher needs to estimate the effect size of
each of the statistical match variables. Some variables
make a larger contribution to the matching of two different records, and the relative contributions of match
variables should make substantive sense.
A second important consideration is whether the
match should be constrained or unconstrained. In unconstrained matching, donor records are free to be reused,
repeatedly if necessary, if the donor record is the closest
match to more than one target record. A constrained
match requires that all records are used, and thus
a particular donor record, if already taken by a target
record, may not be taken by a new target record.
A third consideration is the implicit assumption of conditional independence: When matching two records only
on X, implicitly the researcher is assuming that Y and Z are
independent conditional on their X value. Conditional
independence is a feasible solution, but not the only
one, and in the absence of auxiliary information that reduces the set of feasible solutions to a very narrow range,
the match process should be repeated for various solutions so as to exhibit the uncertainty in the matching process.
A fourth consideration is that, as in any regression-type model, a misspecified (or underspecified) set of predictor variables will bias the estimated regression coefficients β̂ and thus also bias the expected value of the missing variable Ŷ; therefore, the model-based technique may not work as well when the data sets do not contain the needed predictor variables. The centroid method is nonparametric in the sense that it would simply find the match whether or not the indicator variables are good predictors. Of course, when X is not a good predictor of Y (or Z, or
Acknowledgment
This article reports the results of research and analysis
undertaken by Census Bureau staff. It has undergone
a more limited review by the Census Bureau than its
official publications have. This report is released to inform
interested parties and to encourage discussion.
Further Reading
Belin, T. R., and Rubin, D. B. (1995). A method for calibration of false-match rates in record linkage. J. Am. Statist. Assoc. 90, 694–707.
Borthwick, A., Buechi, M., and Goldberg, A. (2003). Key Concepts in the ChoiceMaker 2 Record Matching System. Paper delivered at the First Workshop on Data Cleaning, Record Linkage, and Object Consolidation, July 17, 2003, in conjunction with the Ninth Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD) International Conference on Knowledge Discovery and Data Mining, Washington, D.C.
Christen, P., Churches, T., and Zhu, J. X. (2002). Probabilistic
Name and Address Cleaning and Standardisation. Proceedings, December 2002, of the Australasian Data Mining
Workshop, Canberra.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern
Classification, 2nd Ed. John Wiley and Sons, New York.
Fellegi, I. P., and Sunter, A. B. (1969). A theory for record linkage. J. Am. Statist. Assoc. 64, 1183–1210.
Gu, L., Baxter, R., Vickers, D., and Rainsford, C. (2003). Record Linkage: Current Practice and Future Directions. Technical Report 03/83, CSIRO Mathematical and Information Sciences. Available on the Internet at www.act.cmis.csiro.au
Hastie, T., Tibshirani, R., and Friedman, J. (2001). The
Elements of Statistical Learning: Data Mining, Inference,
and Prediction. Springer-Verlag, New York.
Condorcet
Pierre Crépel
CNRS–Université de Lyon 1, Villeurbanne, France
Glossary
calculus of probabilities A mathematical science initiated by Pascal, Fermat, and Huygens, with contributions by the brothers James and Nicholas Bernoulli and, further, by de Moivre, Bayes, Price, and Laplace. These scientists thought that the nature of the calculus of probabilities changes, that it could be applied to the functions of life, and that it is the only useful part of this science, the only part worthy of serious cultivation by Philosophers.
Condorcet's effect, or paradox Consider the case of an election among three candidates, A, B, and C, with each voter simultaneously showing a preference by placing the candidates in order of merit. It is possible that the collective opinion gives an incoherent (or cyclical) result: for example, A is better than B, B is better than C, and C is better than A. There are several interpretations of this effect (Condorcet and Kenneth Arrow, for example, are not concordant); see the first sketch following this glossary.
political arithmetic (Condorcet), or social mathematics The
application of mathematics (including differential and
integral calculus, the calculus of probabilities, etc.) to
political sciences, involving three parts: (1) collection of
precise facts such that computations can be applied,
(2) derivation of consequences of these facts, and (3) determination of the probabilities of the facts and of their
consequences.
political arithmetic (England, 17th century) The application of (elementary) arithmetical calculations to political
uses and subjects, such as public revenues or population
counts.
théorème du jury The Condorcet jury theorem proposes that n individuals, expressing their opinions independently, each have a probability p (>0.5) of giving a true decision and thus a probability 1 − p (<0.5) of giving a false decision. The collective decision of the majority thus has a probability of truth approaching 1 as n becomes large (see the second sketch following this glossary). The name théorème du jury is Duncan Black's and not Condorcet's.
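As a minimal illustration of the Condorcet effect defined above, the following hypothetical three-voter profile produces a cycle in the pairwise majority tallies:

```python
from itertools import combinations

# Each ballot lists the candidates from most to least preferred.
ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def pairwise_winner(x, y):
    """Candidate preferred to the other by a majority of ballots."""
    x_wins = sum(b.index(x) < b.index(y) for b in ballots)
    return x if x_wins > len(ballots) / 2 else y

for x, y in combinations("ABC", 2):
    print(f"{x} vs {y}: majority prefers {pairwise_winner(x, y)}")
# A beats B, B beats C, and C beats A: a cyclical collective opinion.
```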
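And a minimal sketch of the jury theorem itself, computing the probability that a majority of n independent voters (n odd, each correct with probability p) reaches the true decision:

```python
from math import comb

def majority_correct(n, p):
    """P(more than n/2 of n independent voters decide correctly)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (11, 101, 1001):
    print(n, round(majority_correct(n, 0.6), 4))
# With p = 0.6, the majority's probability of truth rises toward 1
# as n grows, as the theorem asserts.
```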
a deeper (practical and theoretical) reflection on the conditions that make possible and relevant the measurement of social variables. The Controller-General's reforms were very ambitious. In an unpublished manuscript (studied by L. Marquet), Condorcet wrote: "When I was named Inspecteur des monnaies in 1775 by Mr Turgot, the latter said to me that he purposed to include in a general system the reform of weights and measures, the legislation of currencies and the gold and silver trade."
From the very beginning of his first plans, Condorcet
aimed to solve technical problems (physical, chemical,
and mechanical), to evaluate the economic stakes, and
to suggest efficient administrative means for success of
the reforms. He was thus led to clarify concepts, in particular to study the constancy and variability of the phenomena involved, the possibility or not of measuring them with precision, the social conditions of measurement, and what may now be called the examination of hidden economic variables. He also paid attention to these concepts when judging the relevance of various navigation projects (canals in Picardy, 1775–1780; in Berry and Nivernais, 1785–1786), when supporting the gauge reform proposed by Dez (an accurate means of measuring liquid content, 1776), and when refereeing plans for a new cadastre in Haute-Guyenne (1782). As a commissaire (referee) and also as a redactor of the Histoire de l'Académie Royale des Sciences, in the bosom of the Académie, Condorcet had many opportunities to deepen and perfect his reflections on numerous matters, including the controversy between Sage and Tillet on gold and metallurgy (1780) and various other problems during the French Revolution.
innovations, including a theory of mathematical expectation with a solution to the St. Petersburg problem
for a finite horizon; a theory of complexity of random
sequences in regard to regular arrangements; a model
for probabilistic dependence, which is none other than
what are now called Markov chains; and solutions to the
problem of statistical estimation in the case of time-dependent probabilities of events. This latter contribution
foreshadows, perhaps clumsily and not in a very practical
way, the concept of time series. Condorcet also produced
a definition of probabilities starting from classes of events,
and a theory of individual economic choice in a setting of
universal risk and competition. Unfortunately, he was too
daring in his writing, which suggested research programs
rather than concrete theorems; moreover, the exposition
of ideas was so unclear and impractical that his original
contributions were not understood in his lifetime or even
in the two following centuries.
It is not known exactly why Condorcet did not publish his manuscript works on probability before his celebrated memoir (1784–1787) and his Essai sur l'application de l'analyse à la probabilité des jugements rendus à la pluralité des voix (1785). Was he unsatisfied with his own dissertations? Did he fear to displease D'Alembert? There are also questions as to what extent Condorcet's work was related to Laplace's major contributions. How did Condorcet compare his metaphysical views on probability with Hume's views? Historians and scholars can only propose hypotheses about these open questions.
equal importance, the apparently rigorous method may lead us further from an exact result than would a well-devised approximation to that method.
Without giving details here, Condorcet made a clear distinction between three different ends:
1. Do we try to find one winner or to determine a total
order among all the candidates?
2. Do we try first of all to choose the best candidate (as,
for example, in the election of a president) or to choose
only good candidates (as, for example, in the election of
members of Parliament)?
3. If we know the probability of truth of each voter, does
the result of the ballot give a maximum for the probability
of getting the best candidate (or the best order)?
The forms proposed by Condorcet were diverse, admittedly often somewhat complicated, but the priority was
always to respect the essential point (and not all the
abstract possible conditions) in the concrete situation.
Further Reading
Arrow, K. J. (1963). Social Choice and Individual Values, 2nd Ed. Wiley, New York.
Baker, K. M. (1975). Condorcet: From Natural Philosophy to Social Mathematics. University of Chicago Press, Chicago.
Beaune, J. C. (ed.) (1994). La Mesure: Instruments et Philosophie. Champ Vallon, Seyssel.
Black, D. (1958). The Theory of Committees and Elections. Cambridge University Press, Cambridge.
Brian, E. (1994). La Mesure de l'État. Albin Michel, Paris.
Chouillet, A. M., and Crépel, P. (eds.) (1997). Condorcet: Homme des Lumières et de la Révolution. ENS Editions, Fontenay-aux-Roses (Lyon).
Condorcet (1994). Arithmétique Politique: Textes Rares et Inédits (B. Bru and P. Crépel, eds.). INED, Paris.
Crépel, P. (1990). Le dernier mot de Condorcet sur les élections. Mathémat. Informat. Sci. Hum. 111, 7–43.
Crépel, P., and Gilain, C. (eds.) (1989). Condorcet: Mathématicien, Économiste, Philosophe, Homme Politique. Minerve, Paris.
Crépel, P., and Rieucau, J. N. (2004). Condorcet's social mathematics: a few tables. Social Choice Welfare (special issue). To be published.
Guilbaud, G. T. (1968). Éléments de la Théorie Mathématique des Jeux. Dunod, Paris.
Marquet, L. (1989). Turgot, Condorcet et la recherche d'une mesure universelle. Bull. Org. Int. Métrol. Légale 115, 2–8.
McLean, I., and Urken, A. (1992). Did Jefferson and Madison understand Condorcet's theory of voting? Public Choice 73, 445–457.
McLean, I., and Urken, A. (1992). Classics of Social Choice. Michigan University Press, Ann Arbor.
Perrot, J. C. (1992). Une Histoire Intellectuelle de l'Économie Politique. EHESS, Paris.
Rashed, R. (1973). Condorcet: Mathématique et Société. Hermann, Paris.
Rothschild, E. (2001). Economic Sentiments: Adam Smith, Condorcet, and the Enlightenment. Harvard University Press, Cambridge.
Confidence Intervals
George W. Burruss
Southern Illinois University at Carbondale, Illinois
Timothy M. Bray
University of Texas, Dallas, Richardson, Texas, USA
Glossary
alpha level Denoted as α; represents the probability that
a researcher will commit a Type I error (rejecting the null
hypothesis when it is true). Standard alpha levels in social
science research are 0.05 and 0.01, but any level can be
specified. A level of 0.01, for instance, indicates that
a researcher believes they would incorrectly reject the null
hypothesis only 1 in 100 times.
confidence interval The interval estimate around
a population parameter that, under repeated random
samples, would be expected to include the parameter's true value 100(1 − α)% of the time. For instance, it is
highly unlikely that the percentage of persons in a city
favoring a bond issue is exactly the same as the percentage
in the sample. The confidence interval builds a buffer zone
for estimation around the sample percentage.
confidence level The probability that, under repeated
random samples of size N, the interval would be expected
to include the population parameter's true value. Typically,
either 95 or 99% levels are chosen, but these levels are
pure convention and have no scientific justification;
researchers may pick any confidence level that suits
their purpose.
point estimate A sample statistic that represents an estimate
of a population parameter's true value.
population parameter A number that describes some
attribute of the population of interest, such as the
percentage of a county's population that believes jail
sentences are too lenient.
standard error of the estimate The error that results from
taking a sample from a population. The standard error is
derived from the sample size and the standard deviation of
the distribution of samples (i.e., the distribution of repeated
random samples taken from a population). Most often, the
standard deviation of the distribution of samples is not known and is estimated from the standard deviation of the sample itself.
Introduction
Reports of opinion polls, election returns, and survey results typically include mention of a margin of error.
and solved:

3.76 ± 2.09(1.20) = 3.76 ± 2.51,

or

(1.25, 6.27).
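This arithmetic is straightforward to reproduce. A minimal Python sketch of the same computation (the point estimate 3.76 and standard error 1.20 come from the example above; treating 2.09 as a two-tailed t critical value at an assumed 19 degrees of freedom is ours):

    # Sketch: interval = point estimate +/- critical value * standard error.
    from scipy import stats

    mean, se = 3.76, 1.20            # values from the worked example
    t_crit = stats.t.ppf(0.975, 19)  # assumed df; gives approximately 2.09
    lower, upper = mean - t_crit * se, mean + t_crit * se
    print(round(lower, 2), round(upper, 2))  # 1.25 6.27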
Figure 1 Confidence intervals for sample averages; 95% of the confidence intervals include the average (50). Dashed lines represent confidence intervals that do not include the population mean. This figure was generated using the StatPlus Microsoft Excel add-in. (Key: each interval runs from its lower limit through the sample mean to its upper limit.)
dotted lines show confidence intervals that do not intersect the horizontal line at 50; thus, 5 out of the 100 sample confidence intervals do not capture the population mean. Figure 1 demonstrates how sampling error causes the confidence intervals of the repeated samples to vary around the population mean, yet capture the population value in 95 out of 100 random samples. Because there is always a small chance that the sample in a particular study will not actually capture the population mean, replication of research is one way that scientists guard against drawing invalid conclusions from sampling.
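The coverage property illustrated in Figure 1 can be verified directly by simulation. A minimal sketch, assuming a normal population with mean 50 (as in the figure); the population standard deviation of 10 and sample size of 25 are illustrative choices:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    pop_mean, pop_sd, n, reps = 50, 10, 25, 100
    t_crit = stats.t.ppf(0.975, n - 1)

    hits = 0
    for _ in range(reps):
        sample = rng.normal(pop_mean, pop_sd, n)
        half_width = t_crit * sample.std(ddof=1) / np.sqrt(n)  # CI half-width
        hits += (sample.mean() - half_width) <= pop_mean <= (sample.mean() + half_width)
    print(hits, "of", reps, "intervals capture the population mean")  # about 95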
or (0.36, 0.41).
coefficient is calculated as

s.e.(b_k) = s.e.(R) / (s_X_k √(n − 1)),

where s.e.(R) is the standard error of the regression, s_X_k is the standard deviation of X_k, and n is the number of cases. For the FEMALEHOUSE coefficient this gives

2.105 / (2.004 √(50 − 1)) = 0.150.

Presentation of Confidence Intervals in Social Science Research
OLS regression of homicide rates on the percentage of female-headed households (FEMALEHOUSE) and percentage population growth (POPGROWTH)(a):

Variable       b        s.e.(b)   p-Value   s.d.(c)   Lower bound   Upper bound
Constant       11.128   1.677     0.000               7.754         14.503
FEMALEHOUSE    1.616    0.150     0.000     2.003     1.314         1.918
POPGROWTH      0.070    0.026     0.009     11.527    0.018         0.123

(a) Homicide data are from the United States Federal Bureau of Investigation. The data for percentage population growth and percentage of female-headed households in the United States are from the United States Census Bureau.
(b) Standard error.
(c) Standard deviation.
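The interval for any coefficient in the table follows the same recipe, b ± t × s.e. A minimal sketch for the FEMALEHOUSE row; the 47 degrees of freedom (50 cases minus three estimated parameters) are our assumption:

    from scipy import stats

    se_R, s_X, n = 2.105, 2.003, 50        # regression s.e., s.d. of FEMALEHOUSE, cases
    se_b = se_R / (s_X * (n - 1) ** 0.5)   # standard error of the coefficient
    b = 1.616                              # FEMALEHOUSE coefficient
    t_crit = stats.t.ppf(0.975, n - 3)     # assumed df = 47
    print(round(se_b, 3))                                            # 0.15
    print(round(b - t_crit * se_b, 3), round(b + t_crit * se_b, 3))  # 1.314 1.918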
Logistic regression of out-of-home (secure) placement for juvenile court referrals (variable labels follow from the prediction example below):

Variable            b        s.e.(a)   Odds ratio, exp(b)   p-Value
Prior referrals     0.365    0.099     1.441                0.0003
Charged offenses    0.644    0.178     1.905                0.0002
Constant            -3.690   0.437     1.238                0.0000

(a) Standard error.
p̂ = e^(−3.69 + 0.6444(1) + 0.365(3)) / (1 + e^(−3.69 + 0.6444(1) + 0.365(3))),
p̂ = 0.142 / 1.142,
p̂ = 0.1246.
A juvenile with one charged offense and three prior referrals has about a 12.5% chance of being placed out of home. Reporting a predicted value of the dependent variable, given specific values of the independent variables, is often the goal of statistical analysis. What is missing from this predicted probability, however, is the measure of uncertainty that accompanies raw coefficients. Simulation can provide the quantity of interest and a measure of uncertainty from the distribution of simulated results. For example, the same logit regression model here was used in a Monte Carlo simulation. From a distribution of 1000 iterated models, the mean probability for secure placement was almost 13%, with a lower limit of 8% and an upper limit of about 20% at the 95% confidence level, close to the predicted value of about 12.5% calculated here. Furthermore, quantities of interest and confidence intervals can be plotted to present statistical analyses, including measures of uncertainty, to laypersons. Figure 2 shows the change in probability of an out-of-home placement as the number of prior referrals is increased. Note too that the size of the confidence intervals increases. As the probability of placement increases, the uncertainty surrounding the prediction increases as well. Thus, prediction of placement is less certain at higher levels of prior referrals in a juvenile's court record.
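Both the point prediction and the simulation-based interval described above can be sketched in a few lines. The coefficients and standard errors come from the logit table; drawing each coefficient independently from a normal distribution is a simplifying assumption (a fuller treatment would draw from the estimated covariance matrix):

    import numpy as np

    rng = np.random.default_rng(1)
    b = np.array([-3.69, 0.6444, 0.365])   # constant, charged offenses, prior referrals
    se = np.array([0.437, 0.178, 0.099])   # reported standard errors
    x = np.array([1.0, 1.0, 3.0])          # intercept, one offense, three referrals

    p_hat = 1 / (1 + np.exp(-(b @ x)))     # point prediction, about 0.125

    draws = rng.normal(b, se, size=(1000, 3))   # 1000 simulated coefficient sets
    p_sims = 1 / (1 + np.exp(-(draws @ x)))     # distribution of predicted probabilities
    print(round(p_hat, 3), np.percentile(p_sims, [2.5, 97.5]).round(3))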
A researcher's decision to include confidence intervals should depend on the research question, the outlet for the research, and the target audience for the presentation of the results. Though raw coefficients and standard errors may sufficiently describe point estimates and confidence intervals for experts, more reader-friendly results can be given to the public, who are often the final consumers of research. Simulation is one method of presenting quantities of interest along with uncertainty. Regardless of the method, an honest presentation of statistical results should include a measure of uncertainty.
Figure 2 Predicted probability of secure placement for juvenile delinquency cases based on the number of prior referrals, holding the number of offenses constant. Dotted lines represent 95% confidence intervals. (Axes: predicted probability of placement, 0–100%, by number of prior referrals, 0–10.)
Further Reading
Altman, D. G., Machin, D., Bryant, T. N., and Gardner, M. J.
(2001). Statistics with Confidence, 2nd Ed. BMJ Books,
Bristol, England.
Confidentiality and
Disclosure Limitation
Stephen E. Fienberg
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Glossary
confidentiality Broadly, a quality or condition accorded to
statistical information as an obligation not to transmit that
information to an unauthorized party.
contingency table A cross-classified table of counts according to two or more categorical variables.
data masking The disclosure limitation process of transforming a data set when there is a specific functional relationship
(possibly stochastic) between the masked values and the
original data.
disclosure The inappropriate attribution of information to
a data provider, whether it be an individual or organization.
disclosure limitation The broad array of methods used to
protect confidentiality of statistical data.
perturbation An approach to data masking in which the
transformation involves random perturbations of the
original data, either through the addition of noise or via
some form of restricted randomization.
privacy In the context of data, usually the right of individuals
to control the dissemination of information about themselves.
probability of identification of some individual in the relevant population. From this now widely recognized perspective, the goal of the preservation of promises of
confidentiality cannot be absolute, but rather should be
aimed, of necessity, at the limitation of disclosure risk
rather than at its elimination. Assessing the trade-off
between confidentiality and data access is inherently
a statistical issue, as is the development of methods
to limit disclosure risk. That is, formulation of the
problem is statistical, based on inputs from both
the data providers and the users, regarding both risk
and utility.
The article covers some basic definitions of confidentiality and disclosure, the ethical themes associated with confidentiality and privacy, and, albeit briefly, when releasing restricted data is required to achieve confidentiality objectives or when simply restricting access is a necessity. A case is made for unlimited access to restricted data, that is, data altered sufficiently to limit disclosure risk, but not so much as to impair the vast majority of potential research uses of the data.
In recent years, many researchers have argued that the trade-off between protecting confidentiality (i.e., avoiding disclosure) and optimizing data access for others has become more complex, as both technological capabilities and public perceptions have changed in an information age, but also that statistical disclosure techniques have kept pace with these changes. There is a brief introduction
later in the article to some current methods in use for
data disclosure limitation and statistical principles that
underlie them. This article concludes with an overview
of disclosure limitation methodology principles and
a discussion of ethical issues and confidentiality concerns
raised by new forms of statistical data.
(Figure: schematic relating confidentiality obligations, disclosure by an intruder, and harm.)
the secondary data analyst agrees to use the data for statistical research and/or teaching only, and to preserve
confidentiality, even if data sets have already been edited
using disclosure limitation methods. Some statistical
agencies adopt related approaches of licensing. Restricting access to a public good produces bad public
policy because it cannot work effectively. This is primarily
because the gatekeepers for restricted data systems have
little or no incentive to widen access or to allow research
analysts the same freedom to work with a data set (and to
share their results) as they are able to have with unrestricted access. And the gatekeepers can prevent access by
those who may hold contrary views on either methods of
statistical analyses or on policy issues that the data may
inform. In this sense, the public good is better served
by uncontrolled access to restricted data rather than by
restricted access to data that may pose confidentiality
concerns. This presumes, of course, that researchers
are able to do an effective job of statistical disclosure
limitation.
Commonly used disclosure limitation operations, viewed as transformations of a data matrix Z, include the following (a code sketch follows the list):
- Adding noise
- Releasing a subset of observations (delete rows from Z)
- Cell suppression for cross-classifications
- Including simulated data (add rows to Z)
- Releasing a subset of variables (delete columns from Z)
- Switching selected column values for pairs of rows (data swapping)
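A minimal sketch of several of these operations, treating Z as a numeric matrix; the noise scale, subset fraction, and swapped column are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.integers(0, 100, size=(10, 4)).astype(float)  # toy microdata matrix

    noisy = Z + rng.normal(0.0, 2.0, Z.shape)   # adding noise
    row_subset = Z[rng.random(len(Z)) < 0.5]    # release a subset of observations
    col_subset = Z[:, [0, 2]]                   # release a subset of variables

    # Data swapping: exchange one variable's values between a random pair of rows.
    i, j = rng.choice(len(Z), size=2, replace=False)
    swapped = Z.copy()
    swapped[[i, j], 1] = swapped[[j, i], 1]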
Acknowledgments
This work was supported in part by Grant No. EIA-9876619 from the U.S. National Science Foundation to
the National Institute of Statistical Sciences and Grant
No. R01-AG023141 from the National Institutes of
Health.
Further Reading
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975).
Discrete Multivariate Analysis: Theory and Practice. MIT
Press, Cambridge.
Dobra, A., and Fienberg, S. E. (2000). Bounds for cell entries in contingency tables given marginal totals and decomposable graphs. Proc. Natl. Acad. Sci. 97, 11885–11892.
Dobra, A., and Fienberg, S. E. (2001). Bounds for cell entries in contingency tables induced by fixed marginal totals with applications to disclosure limitation. Statist. J. UN ECE 18, 363–371.
Dobra, A., Erosheva, E., and Fienberg, S. E. (2003). Disclosure limitation methods based on bounds for large contingency tables with application to disability data. In Proceedings of the Conference on New Frontiers of Statistical Data Mining (H. Bozdogan, ed.), pp. 93–116. CRC Press, Boca Raton, Florida.
Domingo-Ferrer, J. (ed.) (2002). Inference Control in Statistical Databases: From Theory to Practice. Lecture Notes in Computer Science, Vol. 2316. Springer-Verlag, Heidelberg.
Doyle, P., Lane, J., Theeuwes, J., and Zayatz, L. (eds.) (2001).
Confidentiality, Disclosure and Data Access: Theory and
Practical Applications for Statistical Agencies. Elsevier,
New York.
Consumption and Saving
Joachim H. Spangenberg
Sustainable Europe Research Institute, Cologne, Germany
Glossary
conspicuous consumption The consumption of goods not
only to meet basic needs, but also to serve as a key means of
identification of status and prestige.
consumption In classical economics, the nonproductive use
of savings; typified by expenditures on luxury goods and
services, particularly imported precious goods, jewelry,
artwork, etc. for the wealthy and upper classes.
consumption function The quantification of the positive
relationship between consumption and income; it explains
how much consumption increases as the result of a given
increase in income. According to standard economics, total
consumption is always equivalent to the level of income.
enforced saving In classical economics, the impossibility of
investing available money (mainly past profits) due to a lack of
investment opportunities. In socialist countries in the 20th
century, the definition related to the impossibility of spending
household income due to a lack of consumer goods. Recently
used to describe the externally imposed need for households
to save as a substitute for former public social security systems.
equimarginal principle There is always an equilibrium of
production and consumption and of demand and supply
because (1) the marginal utility of consumption is always
declining and (2) the marginal utility of each production
factor, ceteris paribus, declines as well.
hoarding In classical economics, holding profit, neither
reinvesting it nor using it for productive consumption.
For Keynes, this behavior is part of the liquidity preference;
and in neoclassical economics, it is part of the planned
saving. Speculation on the financial markets is based on
hoarded liquidity, although the term is no longer in use.
savings In classical economics, the conversion of revenues
into capital. Saving as nonconsumption, if used productively, equals investment. Productive use of savings by the entrepreneur is investment in productive (income-generating) assets.
Historical Development
With the evolution of the modern economic system, the
role of saving and consumption has changed considerably,
the highest income levels in human history, stagnant saving rates, an unparalleled diversity and level of private
consumption, and ongoing changes in the composition of
goods (inferior goods are replaced by superior ones, but
these categories evolve as well). Except for those in the low-income brackets (the polarization of income is increasing in all Western societies), the utility from consumption is dominated by symbolism, by status seeking, and by other social functions of consuming (identity, compensation, affiliation, self-esteem, excitement,
etc.). Spending patterns reflect this structural change:
for instance, although spending on food declined significantly in all Organization for Economic Cooperation and
Development (OECD) countries over the past four
decades, the money spent on fashion, entertainment,
travel, leisure in general, and other services rose dramatically. Although a minority in the northern hemispheric
region and the majority in the southern hemispheric region are still restricted to consumption to meet basic
needs (when possible), a global consumer class has
emerged (about one-third of them living in the southern
hemisphere), sharing attitudes, preferences, and consumption habits.
Unlike a quarter century ago, the sources of consumer
goods and the locations of investments from national savings are no longer restricted by national economies. The
share of goods originating in the southern hemispheric
region has increased, and monetary flows are concentrated between the OECD countries that are no longer
industrial, but service economies. The international financial markets, rather than the national savings, determine
the availability of investment capital (hence the converging trend of national interest rates). On the household
level, consumption and saving still represent behavioral
alternatives, but the link to the level of the national economy is mediated by the globalization process. Economic
theory, although developing over time (see subsection
below), still struggles with this complexity (the following
section); for this reason, reliable measurements and
data for decision makers are all the more important
(penultimate section). Nonetheless, a number of open
questions remain (final section).
Contemporary Knowledge on
Saving and Consumption
There is no single, logically coherent theory comprising all
aspects of household saving and consumption behavior.
Instead, several theories and conceptual constructs exist,
some of them more or less closely linked to each other.
Astonishingly, academic discussions of consumption and
saving often occur separately, with economists searching
for the basics of consumption decisions or trying to elaborate the determinants of saving decisions. As a result,
theoretical assumptions and empirical findings frequently
differ among studies, although most share the basic
approach.
According to modern economics, the sum of consumption spending C and savings S constitutes total expenditure and must be equal to total income I: I = C + S. This basic equation is applicable to the household level as well as to national economies. Even if, for a period, consumption expenditures are higher than income, this only reflects negative savings. Those negative savings are either paid from a stock of wealth accumulated from savings of previous periods or come from credits, establishing a negative individual capital stock. In the long run, credits have to be paid back, and the equilibrium will be reestablished. As a result, in modern (macro)economics, although defined quite differently than in the classical period, consumption and saving are still complementary ways to spend income.
Consumption in Contemporary
Economics
Goods providing prestige, and those representing ambitions, fantasies, and dreams, dominate contemporary private demand. Saturation, changing preferences, and
replacement of goods are the rule rather than the exception, with multiutility goods (e.g., those with use value plus
status) gaining an inherent competitive advantage. Although the evolution of demand exhibits a significant inertia, because demands are learned in the socialization
phases and change with generations, the instruments
(what the Chilean economist Manfred Max-Neef would
call the satisfiers used to meet needs) vary within years
or even months, not only in response to income levels but
also because of knowledge of and familiarity with them.
Information plays a crucial role here; asymmetrical information and communication deficits on the consumers' side will restrict knowledge of and familiarity with goods; thus markets are not perfect (Joseph Stiglitz, 2001 Nobel laureate). It is not equilibrium, but rather a permanent
evolution from one nonequilibrium state to another that
characterizes consumer markets and keeps the consumer
society going, based on savings (mostly in Europe) or on
credit and loans (in the United States); in deflationary
Japan and growing China, household savings do not sufficiently translate into private consumption.
Empirical economic research has shown that investment, saving, and consumption are not based on rational decisions, but instead follow social trends based on former experience, historical expectations, and rules of thumb (Daniel Kahneman, 2002 Nobel laureate). Empirically,
increasing saving does not lead to a decrease in interest
rates, which then leads to decreasing savings and thus to
an equilibrium, as the dominant theory suggests. Standard
economic theory still struggles to accommodate all these
findings.
Utility approach
The very basic assumption of neoclassical demand theory
proposes that consumers intend to maximize the utility of
their consumption decisions from a given bundle of consumption goods and services. Among the necessary assumptions are that each consumer is a homo oeconomicus,
seeking to maximize utility, deciding rationally, exclusively based on self-interest, and having complete and
correct information about the commodities (mainly
their price, their availability, and their ability to satisfy
present and future needs). Commodities have to be uniform in quality and highly divisible to permit incremental
changes in consumption. Further on, the utility has to be
measurable and homogeneous with respect to all commodities consumed; consumer tastes and preferences are
stable and externally determined, and consumers decide
independently of other opinions, and the marginal utility
Open Questions
Globalization Effects
The role of saving is changing as financial systems become
increasingly globalized. National saving and credit are no
longer the sole source for investment, and global financial
markets are providing a new reservoir. Electronic commerce and e-money offer consumption opportunities
beyond traditional restrictions imposed by saving and
credit institutions. On the macro level, savings create
global flows of investment and speculation, in particular
as long as the return from financial markets and stock
exchanges is higher than the return from real capital investments. Such international capital flows are shifting
consumption opportunities and growth potentials from
capital-exporting countries (e.g., Japan, the Association
of South East Asian Nations, and the European Union)
to capital-importing countries (e.g., the United States).
This trend has become even more pronounced in the past
decade because new incentives for saving were introduced in many OECD countries (the shift of social security measures from a transfer-based system to one based
on private investment at the stock exchange, mostly in
globally active investment funds).
Further Reading
Blaug, M. (1962). Economic Theory in Retrospect. Richard D.
Irwin, Homewood, Illinois.
Loayza, N., Schmidt-Hebbel, K., and Servén, L. (2000). What drives private saving across the world? Rev. Econ. Statist. 82(2), 165–181.
Content Analysis
Doug Bond
Harvard University, Cambridge, Massachusetts, USA
Glossary
automated (or machine) coding Computer technology that
reads text and extracts user-specified information deemed
relevant to its content and/or context. Such machine
processing of natural language is used to identify and then
record the text-based information in structured data
records for further analysis of the author or message.
Typically, machine coding is used for content analysis when
high volumes of information need to be processed, or when
the results are needed in real-time.
coding manual The decision rules and procedural steps for
creating structured data records from text-based or other
media. These decision rules and procedures follow from the
objectives and explicate the specific steps of a content
analysis process. Coding manuals may also contain information about the assumptions underlying and purpose behind
the content analysis, though sometimes these are implicit.
In any case, coding manuals serve to guide and inform the
coding process.
corpora Sets of tagged text or other media that are used to
train or optimize a machine coding tool or technology. They
may also be used for training humans to code, because they
provide a measure (the tagged, marked-up or coded text)
against which trial coding can be assessed. Corpora
represent samples of a larger pool of information to be
processed and from which certain decision rules pertaining
to syntax or semantics may be inducted.
dictionaries A compilation of decision rules for various
elements or parameters in a content analysis. One set of
dictionary entries may pertain to entities, their attributes,
and relationships, whereas others may detail actions or
events, along with their characteristics. Still other entries
may list alternative spellings, names, or other information
needed to guide and inform the coding process.
When dictionaries contain multiple sets of complex and
related information, they are often referred to as ontologies.
manual (or human) coding The manual equivalent of machine coding that, for content analysis, entails reading text and extracting user-specified information deemed relevant to its content and/or context.
Content analysis is a method to identify, extract, and assess selected information from a form of media for use as
a basis to infer something about the author, content, or
message. The information may appear as text, images, or
sounds, as symbolic representations of the same, or even
in the context of how, and/or the way in which, a message is
conveyed through the media. Content analysis has been
used in a wide range of applications, from analyzing print
advertisements to product specifications, from patents to
treaties, and from health to police records, as well as in
a wide variety of literary and nonfiction texts. Over the
past 70 years, the practice of content analysis has been
broadening to include the investigation of information
presented in audio and visual media forms, such as the
monitoring of violence and sexually explicit images and
language in television programs and in films. Especially
with the advent of automated tools for identifying and
extracting images and sounds, content analysis is no
longer bound to documents, per se. Regardless of the
form of media, content analysis as a field within the context of scientific inquiry seeks replicable inferences from
a limited set of observations, with the goal to generalize to
a larger set of the same. The scientific process of content
analysis therefore is explicit and transparent in its definitions and categorizations of information, and in the independent reproducibility of its results, thereby lending
an intersubjective quality to its findings, regardless of
the specific procedures employed.
rather than being followed from a codebook. The statistical computation approach has the advantage of being
language independent. In other words, the incidence of
large volumes of words or symbols (for example, Chinese
characters) in any language can be assessed statistically
without knowing in advance their intended or referenced
meaning.
The statistical computation approach often relies on
the a priori compilation of large corpora of documents in
which the individual words and phrases are catalogued
along with their attributes, typically established by human
or manual coding of sample texts. Thus, a key word such as "nuclear" might be associated with "war" (say, 60% of the time) or with "family" (say, 30% of the time), depending on the particular substantive domain involved. The computations of content analysis tools typically account
for both the distance between tokens as well as their
sequence within a document.
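A minimal sketch of such a language-independent association count; the toy documents and the window size are illustrative assumptions:

    from collections import Counter

    docs = [
        "debate over nuclear war and nuclear weapons policy",
        "a profile of the nuclear family in the suburbs",
    ]
    window = 3  # tokens to either side of the key word; arbitrary choice

    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        for pos, tok in enumerate(tokens):
            if tok == "nuclear":
                for other in tokens[max(0, pos - window): pos + window + 1]:
                    if other != "nuclear":
                        counts[other] += 1

    total = sum(counts.values())
    for word in ("war", "family"):
        print(word, round(counts[word] / total, 2))  # relative association with "nuclear"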
Ontologies are also used to formalize the knowledge
conveyed in words and phrases, and to represent the
relationships among them. General-purpose ontologies
often contain tens of thousands of fundamental facts
about entities and the relationships among them. Simpler
versions of these large ontologies contain nothing more
than lists of words associated with a particular concept of
interest to a content analyst. These lists are sometimes
referred to as dictionaries, and are typically focused on
a specific substantive domain. Most of the do-it-yourself
automated tools in the content analysis marketplace allow
for the use of standard or basic dictionaries as well as
customizable lists to be compiled by individual users
and tailored to their specific interests with respect to
the documents being analyzed. In any case, both simple
and complex ontologies facilitate the automated generation of reliable associations of selected words to other
words and phrases, and create indices of the documents
that contain them.
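In code, a simple dictionary of this kind is just a mapping from concepts to word lists. A minimal sketch; the concept labels and word lists are invented for illustration:

    # Dictionary-based content coding: tag documents with matching concepts.
    dictionaries = {
        "conflict": {"war", "attack", "protest"},
        "family": {"family", "parent", "child"},
    }

    def code_document(text):
        tokens = set(text.lower().split())
        return {concept for concept, words in dictionaries.items() if tokens & words}

    print(code_document("The protest followed the attack"))  # {'conflict'}
    print(code_document("A parent and child at home"))       # {'family'}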
Such technology is widely used in Internet and other
search engines, so that when searching for one or more
terms, the documents most commonly associated with the
terms will be retrieved. Other search engines are built on
more sophisticated search routines that account for links
across documents as well as the history of users accessing
them. In addition, this technology is useful in the automated generation of summaries of documents. Coupling
interactive search with automated summarizing capabilities makes the finding of specific information much
easier.
Practical applications of content analysis, however,
take search and summary procedures one step further
by presenting the results graphically and geographically.
This visualization is made possible by the computational
statistics used in the content analysis tools that quantify
the information and organize it in ways that are amenable
to visually compelling presentations.
Further Reading
Allen, J. (1995). Natural Language Understanding, 2nd Ed.
Benjamin/Cummings Publ., Redwood City, California.
Bond, D., Bond, J., Oh, C., Jenkins, J. C., and Taylor, C. L.
(2003). Integrated data for events analysis (IDEA): an event
typology for automated events data development. J. Peace Res. 40, 733–745.
Evans, W. (2003). Content Analysis Resources. Available on
the University of Alabama web site at www.car.ua.edu/
Holsti, O. R. (1969). Content Analysis for the Social Sciences
and Humanities. Random House, New York.
Jurafsky, D., and Martin, J. H. (2000). Speech and Language
Processing. An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition.
Prentice Hall, Upper Saddle River, New Jersey.
Krippendorff, K. (2004). Content Analysis. An Introduction to its
Methodology, 2nd. Ed., Sage Publ., Newbury Park, California.
Content Analysis and Television
Glossary
coding The process of examining content (messages and
texts) in order to detect and document patterns and
characteristics.
conceptual definitions Descriptions of concepts that signify
their typical meaning; dictionary-type definitions.
content analysis The research methodology in which messages or texts (often but not exclusively those in the media)
are examined, summarizing their characteristics by using
systematic procedures to place content into carefully
constructed categories.
intercoder reliability The process of establishing agreement
between two or more coders in the coding of content, for
the purpose of minimizing subjectivity and maximizing
reliability.
operational definitions Descriptions of concepts as they will
be measured in the study.
reliability The extent to which a study and its measures can
be replicated by other researchers, or by the same
researchers more than once, and still produce largely the
same results.
sampling The process of selecting particular elements to
examine from a general population about which the
researcher wishes to generalize.
validity The extent to which the variables used in a study
actually measure what the researcher purports or intends to
measure.
Introduction
Why Study Television Content?
Television has a ubiquitous presence in contemporary society, dominating the leisure time of people young and old.
Recent estimates establish average viewing among adults
in the United States at just over 3.5 hours/day, and children
and teenagers watch about an hour less than that daily.
Controversy about television content is as old as the medium. On its initial and rapid diffusion to households in the
United States, television was immediately embraced as
a favorite pastime but also soon sparked concern over
the types of programs and messages the medium conveyed.
Television was in its infancy when the first scientific analyses of its content were conducted by researchers. The
results of two nearly concurrent but separate analyses by
Sydney Head and Dallas Smythe in 1954 largely corroborated the concerns of early critics, with evidence that violence, gender and racial inequalities, and entertainment
programming (rather than educational programming)
abounded on the screen. Since those initial inquiries
were undertaken, there have been hundreds of studies
of television content. Researchers of television programming have documented the presence and pervasiveness
of violence and sex, have investigated portrayals of gender,
race, class, age, and sexual orientation, and have examined
diverse topics, such as the treatment of particular occupations and the depiction of families. Analyses of television
news have also been conducted and have focused on
topics such as the coverage of politics, crime, health
risks, natural disasters, war, and the economy. The studies
have amassed an impressive body of knowledge.
Designing Studies of
Television Content
Conceptualization
Like most studies, content analysis research begins with
an idea of what, specifically, will be analyzed and why. In
this stage, theory may be considered to support
relationships between content variables and extramedia
factors such as social forces, media production and distribution processes, and potential ways that audiences will
receive and respond to the content. A review of the literature would most certainly take place in the conceptualization stage to determine the patterns drawn from prior,
similar studies, to identify gaps in knowledge to be addressed by the present study, and to determine ways in
which key concepts have been measured in the past.
Hypotheses or research questions would typically be
formed to guide the inquiry. Hypotheses posed in content analysis research may include comparisons among or between media types or outlets (e.g., comparing newspaper coverage with television news coverage).
Category Construction
The next stage in designing a content analysis is defining the
concepts to be studied, both conceptually and operationally. The overarching goal is to determine ways of measuring the concepts to be investigated that most closely reflect
their dictionary-type denotative and connotative meanings. A close match between conceptual and operational
definitions ensures validity. Operational definitions in content analysis research take the form of categories that will
be used to summarize the media messages examined. For
example, noting the age group of characters used in commercials requires the placement of each person that
appears in the commercial into categories; these may include children, teenagers, young adults, middle-aged adults, and the elderly. Careful attention must be paid to category construction to ensure that the summary of content is as accurate and meaningful as possible.
Many of the same rules that dictate questionnaire
item construction guide the building of content analysis
categories. Categories must be exhaustive and mutually
exclusive. In other words, the researcher must anticipate
as well as possible any and every type of content that will
be encountered in the study, and must have a means of
placing that content decisively into one and no more than
one category. Placing the gender of reality-based program contestants into categories, for instance, provides
a fairly straightforward example. The categories male
and female would presumably be both exhaustive and
mutually exclusive. If the researcher wanted to determine whether such contestants model stereotypical gender roles, the construction of categories to indicate such
roles would be more complex but would still be guided by
the exhaustive and mutually exclusive requirements.
The unit of analysis must be established. The unit of
analysis is the individual entity that is actually being studied; it will be represented by one row of data in the computer file that will be used to analyze findings. The
identification of the unit of analysis in content analysis
research can be less straightforward than in other
methods, in which a single individual is often the entity studied. Does the study take the television program as the unit of analysis?
Sampling
Sampling decisions are also not unique to content analysis, although there are particular opportunities and challenges that arise in sampling television content. The
objective is to gather a sufficient and appropriate amount
and type of television content to test the hypotheses and/
or examine the research questions. Therefore, content
analysis samples, like other samples in social science research, are ideally fairly large and representative of the
general population about which the researcher wishes to
infer.
Probability sampling techniques that ensure that each
unit in the population has an equal chance of being included in the sample can be utilized. A simple random
sample may be generated, for instance, by compiling a list of all programs, episodes, or characters (depending on the unit of analysis) in the population and then using a table of random numbers to select the units for analysis.
as a dependent measure (number of acts of physical aggression). The researcher would employ an independent
t-test to determine mathematically whether a significant
gender difference in acts of physical aggression had been
found. In other research scenarios, correlations are used
to look for associations between two continuous measures,
analysis of variance is used to examine differences among
three or more groups, and chi-square is used to examine
relationships between nominal-level variables.
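The gender comparison described here maps directly onto a two-sample t-test. A minimal sketch; the per-character counts of aggressive acts are invented solely to show the mechanics:

    from scipy import stats

    # Hypothetical counts of physical aggression acts per character, by gender.
    male_characters = [4, 7, 2, 9, 5, 6, 3]
    female_characters = [1, 3, 0, 2, 4, 1, 2]

    t, p = stats.ttest_ind(male_characters, female_characters)
    print(round(t, 2), round(p, 4))  # reject the null of no difference if p < alpha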
Discussing Results
After performing statistical analyses of the data, the researcher is poised to make more general conclusions
about the study and what it has discovered about the
nature of television content. Limitations to the research
are acknowledged when discussing the implications of the
study, and connections are made to past research to determine how the new data support or refute and replicate
or extend prior findings. If the motivation for the study
was concern for the possible impact of watching the content that was analyzed, for children or for general audiences, then speculation and/or theoretically grounded
claims about the potential effects of the content explored
would be advanced.
Two particular research traditions have artfully combined content analysis research and effects research to
determine affiliations between television messages and
the thoughts or attitudes of its audiences. Agenda-setting
theory, originally introduced by Max McCombs and
Donald Shaw in 1972, compares topics included in
news media content with the topics listed by members
of the public as the most important problems facing society. Cultivation theory, initially proposed by George
Gerbner in 1969, draws a link between the pervasiveness
of violence on television and the perception among viewers who spend a great deal of time watching television that
the world is a mean and dangerous place. Regardless of
whether content findings are associated with factors pertaining to either media production and distribution
processes or influence on audience members, studies
of television content are immensely interesting and important. Television is a vital and pervasive part of contemporary culture, and researchers involved in documenting
its patterns of content assist in furthering understanding
of this highly popular medium.
Further Reading
Berelson, B. R. (1952). Content Analysis in Communication
Research. The Free Press, New York.
McCombs, M. E., and Shaw, D. L. (1972). The agenda-setting function of mass media. Public Opin. Q. 36, 176–187.
Neuendorf, K. A. (2002). The Content Analysis Guidebook.
Sage, Thousand Oaks, California.
Riffe, D., Lacy, S., and Fico, F. G. (1998). Analyzing
Media Messages: Using Quantitative Content Analysis
in Research. Lawrence Erlbaum, Mahwah, New
Jersey.
Smythe, D. W. (1954). Reality as presented by television.
Public Opin. Q. 18(2), 143–156.
Content Validity
Doris McGartland Rubio
University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Glossary
construct validity The extent to which an item or measure
accurately represents the proposed construct.
content validity The extent to which items from a measure are sampled from a particular content area or domain.
content validity index A quantitative assessment of the
degree to which the item or measure is content valid by an
evaluation of a panel of experts.
criterion validity The extent to which an item or measure predicts a criterion or gold standard.
face validity The extent to which a measure appears to be
valid.
factor The domains or categories to which a measure's individual items can be assigned.
factorial validity The extent to which the measure maintains
the theoretical factor structure as assessed by factor
analysis.
factorial validity index A quantitative assessment of the
degree to which the factor structure hypothesized by the
researcher is supported by an evaluation of the measure by
a panel of experts.
interrater agreement The degree to which raters agree
when evaluating something.
logical validity The extent to which a measure is deemed by
a panel of experts to contain content validity.
validity The extent to which an item or measure accurately
assesses what it is intended to measure.
Introduction
Social science researchers often develop instruments to
measure complex constructs. Before they can draw conclusions based on the measure, the measure must be shown
to have adequate validity. Validity assesses the extent to
which an instrument measures the intended construct.
Although all validity contributes to the construct validity
of a measure, validity can be compartmentalized into three
types: content, criterion-related, and construct validity.
Content validity is a critical part of measurement, given
that most research in the social sciences involves
constructs that are difficult to measure. Due to the complexity of constructs, single indicator items are not sufficient to measure these constructs. Rather, multiple items
are needed to approximate the construct. As a result, the
content validity of a construct becomes paramount to
ensure that the items used in the measure all come
from the same domain. Content validity is the first step
in evaluating the validity of the measure. Although this
subjective form of validity relies on people's perceptions,
it can also be objectified with a rigorous content validity
study. Such a study can inform the researcher how representative of the content domain the items are and how
clear the items are, as well as providing an initial assessment of the factorial validity of the measure. Assessing
validity is a never-ending process; however, the process
can be advanced with the use of a content validity study.
Definitions of Validity
Traditionally, validity has been conceived as consisting of
three types: content, criterion-related, and construct.
Content validity generally refers to the extent to which
items in a measure all come from the same content domain. Criterion-related validity indicates the extent to
which the measure predicts a criterion or gold standard.
Construct validity is the highest form of validity because it
subsumes the other types. When we are concerned about whether the measure is measuring what it is supposed to be measuring, we are concerned with construct validity. Construct validity can be assessed by examining the measure's factorial validity, known-groups validity, and/or convergent and discriminant validity. Some would argue that construct validity is the only type of validity to which we should be referring. The differences are only in the ways of testing for validity; all the ways of testing contribute to a measure's construct validity.
Selecting a Panel
Table I Sample content validity rating form

INSTRUCTIONS: This measure is designed to evaluate the content validity of a measure. In order to do that, please rate each item as follows. Please rate the level of representativeness on a scale of 1–4, with 4 being the most representative. Space is provided for you to make comments on the item or to suggest revisions. Please indicate the level of clarity for each item, also on a four-point scale. Again, please make comments in the space provided. Please indicate to which factor the item belongs. The factors are listed along with a definition of each. If you do not think the item belongs with any factor specified, please circle 3 and write in a factor that may be more suitable. Finally, evaluate the comprehensiveness of the entire measure by indicating if any items should be deleted or added. Thank you for your time.

Theoretical definition   Representativeness    Clarity               Factors (1 = factor 1; 2 = factor 2; 3 = other, specify)
1. Item 1                1 2 3 4 Comments:     1 2 3 4 Comments:     1 2 3 Comments:
2. Item 2                1 2 3 4 Comments:     1 2 3 4 Comments:     1 2 3 Comments:

Copyright 2002, National Association of Social Workers, Inc., Social Work Research.
Conclusion
Testing the validity of a measure is a never-ending process.
Conducting a content validity study is the first step in that
process. Although content validity is a subjective form of
validity, assessing content validity with a panel of experts
who evaluate the measure on objective criteria enables
the researcher to make significant improvements to the
measure prior to pilot testing the instrument. Information
can be gleaned from a content validity study that would
not necessarily be realized from other forms of psychometric testing. For example, administering the measure to
200 subjects in order to assess the internal consistency and
factorial validity would not provide any indication of how
items could be revised in order to improve the validity of
the individual items. With the exception of item response
theory and structural equation modeling, most psychometric testing does not evaluate measures on an item-level
analysis. A content validity study provides information on
the individual items of the measure in regards to how
representative and clear each item is. Raters can
provide comments on each item for the researcher by
offering suggestions on how the item can be revised to
improve its content validity. In addition, information is
obtained on the performance of the overall measure in
terms of the measures level of content validity and an
initial assessment of the measures factorial validity.
Next Steps
When the researcher is satisfied with the CVI of the measure, the next steps for the measure involve pilot testing
and further psychometric testing. Different types of reliability and validity need to be assessed before the measure is used in a decision study or a study that is testing
a particular theory. The measure should always be
evaluated and tested prior to testing any theory. No
generalizations can be made regarding the measure
until we are confident that our measure accurately and
consistently measures the construct of interest.
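A common convention, following Lynn (1986, cited below), computes an item's content validity index as the proportion of experts who rate it 3 or 4 on the four-point representativeness scale of Table I. A minimal sketch with invented ratings:

    # Item-level CVI: share of experts rating the item 3 or 4.
    ratings = {
        "item 1": [4, 3, 4, 4, 3, 4],
        "item 2": [2, 3, 4, 2, 3, 2],
    }

    for item, scores in ratings.items():
        cvi = sum(score >= 3 for score in scores) / len(scores)
        print(item, round(cvi, 2))  # item 1: 1.0, item 2: 0.5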
Further Reading
Anastasi, A., and Urbina, S. (1997). Psychological Testing, 7th
Ed. Prentice Hall, Upper Saddle River, NJ.
Davis, L. (1992). Instrument review: Getting the most from your panel of experts. Appl. Nurs. Res. 5, 194–197.
Grant, J. S., and Davis, L. L. (1997). Selection and use of content experts for instrument development. Res. Nurs. Health 20, 269–274.
Hubley, A. M., and Zumbo, B. D. (1996). A dialectic on validity: Where we have been and where we are going. J. Gen. Psychol. 123, 207–215.
Lynn, M. (1986). Determination and quantification of content validity. Nurs. Res. 35, 382–385.
Messick, S. (1989). Validity. In Educational Measurement (R. L. Linn, ed.), 3rd Ed., pp. 13–103. Macmillan, New York.
Nunnally, J. C., and Bernstein, I. H. (1994). Psychometric Theory, 3rd Ed. McGraw-Hill, New York.
Rubio, D. M., Berg-Weger, M., Tebb, S. S., Lee, E. S., and Rauch, S. (2003). Objectifying content validity: Conducting a content validity study in social work research. J. Soc. Work Res. 27, 94–104.
Shepard, L. A. (1994). Evaluating test validity. Rev. Res. Educ. 19, 405–450.
Contingency Tables and Log-Linear Models

Glossary
chi-square tests Goodness-of-fit tests designed to test the
overall fit of a model to the data by comparing the observed
counts in a table with the estimated expected counts
associated with some parametric model; these tests involve
the computation of test statistics that are referred to a chi-square distribution with a suitable number of degrees of freedom.
contingency table A cross-classified table of counts according to two or more categorical variables.
degrees of freedom Associated with a log-linear model, the number of independent parameters in a saturated model that are set equal to zero; the degrees of freedom are used as part of chi-square goodness-of-fit tests.
logit model (linear logistic model) A statistical model
for the log-odds of an outcome variable associated with
a contingency table as a linear function of parameters
associated with explanatory variables; logit models can be
given log-linear model representations.
log-linear model A statistical model for the expected
counts in a contingency table, which is a linear function
of parameters that measure the dependence among the
underlying variables; log-linear models include models of
mutual, conditional, and joint independence.
maximum likelihood A method of statistical estimation that
provides efficient estimates of parameters in a model in
a wide variety of circumstances.
minimal sufficient statistics Succinct data summaries associated with a parametric model; they can be used in lieu of
the original data for estimation purposes. For log-linear
models associated with contingency tables, these statistics
are marginal totals.
multiple systems (multiple-recapture) estimation A
method for combining information from two or more
sources to estimate a population total, including an estimate
for the number of individuals missed by all sources.
saturated model A fully-parameterized model that describes
the data in a contingency table perfectly.
Two-Way Tables
Table I contains a summary of University of California
(Berkeley) graduate admissions data in 1973 for the six
largest programs; it is essentially this information that has
been used to argue that graduate admissions have been
biased against women. Overall, for the six programs,
44.5% of male applicants but only 20.5% of female
applicants were actually admitted to graduate school.
Is this evidence of sex discrimination? Those who compiled the data in Table I argued that it is such evidence, and they presented a standard chi-square test of independence to support their claims: X² = 266.0 with 1 degree of freedom (df), corresponding to a p-value < 0.0001. A similar conclusion appeared to hold for all 101 graduate programs.
Table II contains one version of a classical data set on
social mobility in Great Britain; the data were collected by
the demographer David Glass and have been analyzed by countless others.
Table I Graduate admissions for the six largest programs, University of California, Berkeley, 1973

         Admitted   Refused   Total
Men      1198       1493      2691
Women    354        1372      1726
Total    1552       2865      4417
Table II Social mobility in Great Britain: father's status (rows) by son's status (columns)

           1 (high)   2     3     4     5 (low)
1 (high)   50         45    8     18    8
2          28         174   84    154   55
3          11         78    110   223   96
4          14         150   185   714   447
5 (low)    0          42    72    320   411
Under the model of independence, the estimated expected cell counts are

m̂_ij = x_i+ x_+j / N,

and the Pearson chi-square statistic is

X² = Σ_ij (x_ij − m̂_ij)² / m̂_ij = Σ_ij [x_ij − x_i+ x_+j / N]² / (x_i+ x_+j / N).
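The test for Table I can be reproduced in a few lines; scipy's chi2_contingency applies the formula above (Yates's continuity correction is disabled to match the classical statistic):

    import numpy as np
    from scipy.stats import chi2_contingency

    # Table I: admitted/refused by gender, six largest programs.
    observed = np.array([[1198, 1493],
                         [354, 1372]])

    x2, p, df, expected = chi2_contingency(observed, correction=False)
    print(round(x2, 1), df, p)  # approximately 266.0, 1, p < 0.0001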
Three-Way Tables
Table I reported admissions to the six largest graduate programs at the University of California at Berkeley in 1973. Table III shows these data disaggregated by program (labeled A through F). Note that women were admitted at higher rates than men in four of the six programs! Chi-square tests applied separately to
each program suggest that independence is a reasonable
model to characterize the admissions process, except for
program A, for which the bias appears to be toward
women rather than against them. Thus there appears to
be little or no evidence of sex discrimination once one
introduces the explanatory variable program and looks at
the relationship between gender and admissions status
conditional on it. A relevant overall model for this three-way table would thus be that admissions status is independent of gender, conditional on program. This is the first example of a log-linear model for three-way contingency tables.
Table III Berkeley graduate admissions by program and gender, 1973

                Men                  Women
Program   Admitted   Refused   Admitted   Refused
A         512        313       89         19
B         353        207       17         8
C         120        205       202        391
D         138        279       131        244
E         53         138       94         299
F         22         351       24         317
Table IV Alcohol, cigarette, and marijuana use among high school seniors

Alcohol use   Cigarette use   Marijuana use: Yes   Marijuana use: No
Yes           Yes             911                  538
Yes           No              44                   456
No            Yes             3                    43
No            No              2                    279
Table V Log-linear models for a three-way contingency table

Shorthand notation   Degrees of freedom       Interpretation
[1][2][3]            IJK - I - J - K + 2      Mutual independence of 1, 2, and 3
[1][23]              (I - 1)(JK - 1)          Independence of 1 from 2 and 3 jointly
[2][13]              (J - 1)(IK - 1)          Independence of 2 from 1 and 3 jointly
[3][12]              (K - 1)(IJ - 1)          Independence of 3 from 1 and 2 jointly
[12][13]             I(J - 1)(K - 1)          Conditional independence of 2 and 3 given 1
[12][23]             J(I - 1)(K - 1)          Conditional independence of 1 and 3 given 2
[13][23]             K(I - 1)(J - 1)          Conditional independence of 1 and 2 given 3
[12][13][23]         (I - 1)(J - 1)(K - 1)    No second-order interaction
[123]                0                        Saturated model
For the model of no second-order interaction, [12][13][23], for example, the maximum likelihood estimates reproduce the three sets of two-way marginal totals:

m̂_ij+ = x_ij+,   i = 1, 2, . . . , I,   j = 1, 2, . . . , J,
m̂_i+k = x_i+k,   i = 1, 2, . . . , I,   k = 1, 2, . . . , K,
m̂_+jk = x_+jk,   j = 1, 2, . . . , J,   k = 1, 2, . . . , K.      (9)
The likelihood-ratio goodness-of-fit statistic is

G² = 2 Σ_ijk x_ijk log( x_ijk / m̂_ijk ).      (10b)
The goodness-of-fit statistics for log-linear models fitted to the data of Table IV (1 = alcohol use, 2 = cigarette use, 3 = marijuana use) are:

Model          G²      X²      df   p-value
[12][13]       497.4   443.8   2    <0.0001
[12][23]       92.0    80.8    2    <0.0001
[13][23]       187.8   177.6   2    <0.0001
[12][13][23]   0.4     0.4     1    0.54

Only the model of no second-order interaction, [12][13][23], fits the data well.
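A minimal sketch of G² for the simplest model in Table V, mutual independence [1][2][3], applied to the Table IV counts. For this model the fitted values have the closed form m̂_ijk = x_i++ x_+j+ x_++k / N²; models such as [12][13][23] instead require iterative proportional fitting:

    import numpy as np

    # Table IV counts; axes: alcohol (yes, no) x cigarette (yes, no) x marijuana (yes, no).
    x = np.array([[[911., 538.], [44., 456.]],
                  [[3., 43.], [2., 279.]]])
    N = x.sum()

    # Fitted counts under mutual independence: product of one-way margins over N^2.
    m1, m2, m3 = x.sum(axis=(1, 2)), x.sum(axis=(0, 2)), x.sum(axis=(0, 1))
    m_hat = np.einsum("i,j,k->ijk", m1, m2, m3) / N**2

    g2 = 2 * (x * np.log(x / m_hat)).sum()  # Eq. (10b)
    df = 2*2*2 - 2 - 2 - 2 + 2              # from Table V: IJK - I - J - K + 2
    print(round(g2, 1), "on", df, "df")     # very large G2: independence fits poorly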
URLs retrieved by six search engines responding to a specific query in December 1997: each URL was cross-classified as retrieved (Yes) or not (No) by AltaVista, Excite, HotBot, Infoseek, Lycos, and Northern Light, yielding a 2^6 contingency table. (The cell counts could not be recovered from the source.)
Further Reading
Agresti, A. (2002). Categorical Data Analysis, 2nd Ed. Wiley,
New York.
Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science 187, 398–404.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975).
Discrete Multivariate Analysis: Theory and Practice. MIT
Press, Cambridge, Massachusetts.
Conversation Analysis
Danielle Lavin-Loucks
University of Texas, Dallas, Richardson, Texas, USA
Glossary
adjacency pairs A sequence of two utterances produced by
different speakers that are adjacent, where one utterance is
designated the first pair part and the utterance it occasions
is the second pair part.
ethnomethodology The study of the mundane activities of
social life, common-sense knowledge, and methods used by
social actors to produce and reproduce social order.
preference structure The notion that utterances are fashioned or designed in such a way that they prefer or favor
a given type of response. To provide a dispreferred utterance
in response requires interactional work, whereas a preferred
response is one that is, by design, in line with the prior
utterance.
repair A correction or change made by a speaker, a listener,
or another participant in a conversation that rectifies,
refines, or explains a prior utterance.
utterance A discrete unit of talk produced by a single speaker.
Introduction
Why Study Conversation?
Talk pervades every aspect of social life. As a fundamental
feature of social life, conversation (people talking
together) forms the basis through which social identities
are created and structural relationships and formal organizations are enacted and perpetuated, or literally talked
into existence. Historically, those interested in social
interaction and social organization relied on language
and talk as a means to conduct sociological research,
but gave short shrift to talk as a research topic or as a primary locus of social organization. Although linguists
have analyzed the structure and function of language, they
have not addressed the structure of social interaction or
the practical actions that talk embodies, an oversight that
neglected a domain of underlying social order.
Ordinary conversation provides an opportunity for
studying interaction on its own terms, as opposed to
imposing formal structural variables and theoretical precepts onto talk. Furthermore, the study of conversation
taps into what Schegloff, in 1987, contended is a primordial site of social order and organization. More than this,
for the participants involved in any given instance of interaction, conversation has meaning. Understanding how
this meaning is created and altered through discourse
furnishes sociologists with a study of interaction that is
attentive to the context of interaction, but likewise revelatory of larger patterns of interaction that exist independently of any one instance of interaction. Talk performs
actions, and interactional meaning can be ascertained
through an examination of the sequence of conversation.
Foundations of CA
In the 1960s, Harvey Sacks and Emmanuel Schegloff, and
later Gail Jefferson, developed the conversation analytic
perspective. However, the researchers did not set out to
study language per se; the goal was to develop a new
empirical approach to sociology. The quantitative techniques and grandiose large-scale theories characteristic of mainstream social research, which often failed to fit the data, struck Sacks, Schegloff, and Jefferson as problematic. Hence, CA can be viewed as the outcome of
a general dissatisfaction with conventional sociology.
Although it is a discourse-based approach to social
interaction, CA emerged out of sociological enterprises,
not entirely linguistic in nature. CA finds its roots in phenomenology, sociolinguistics, Austin and Searle's speech act theory, Wittgenstein's ordinary language philosophy, Harold Garfinkel's ethnomethodology, and the pioneering work of Erving Goffman. Below, the ethnomethodological foundations of CA, as well as the contributions of Erving Goffman, are discussed.
Ethnomethodology
The ethnomethodological tradition, formulated by Harold
Garfinkel, had a profound impact on the development
of the new radical approach to sociology. Fundamentally,
Garfinkel's ethnomethodology sought to explain how people do the things they do, thereby illuminating the formal
and ordered properties of the taken-for-granted details of
social life. Although ethnomethodology concentrated on
seemingly mundane features of interaction, it revealed
something far more complex: an interaction-based social
order. The interest in the mundane and taken-for-granted
is mirrored in CA's approach, which takes as its starting point perhaps the most mundane of all activities,
conversation.
Potentially even more profound for the development
of CA, ethnomethodology endeavored to uncover the
common-sense knowledge and background expectancies
that allow individuals to interact with one another and
make sense of the world around them. For example,
Wieder's 1974 ethnomethodological study of the convict code examines deviance in a halfway house and demonstrates how residents invoke a code of conduct to explain their actions and, in turn, how the code serves as an interpretive framework for such deviance. What Wieder's study substantiated was the very existence of underlying
social order, expectancies, and stable properties of interaction that allowed convicts to view their deviance as
abandoning generalized theoretical constructs, and holding at bay preconceived notions of how things are or
should rightly be organized in favor of a neutral process
of inductive discovery. This is not to imply that researchers' interests do not guide the discovery of phenomena,
but the process of discerning phenomena of interest is
driven primarily by noticings and observations.
Appropriate sites for unearthing phenomena of interest include all instances of natural conversation. In other
words, CA limits the appropriate types of data used for
analysis to nonexperimental, unprovoked conversation,
ranging from the everyday telephone exchange to institutional talk, such as discourse in legal trials or exchanges
between doctor and patient. Thus, almost any form of
verbal interaction that is noncontrived constitutes usable
data. The forms of data that conversation analysts use
include audio and video recordings of interaction,
whether collected by the researcher or already preexisting. Supplemental data, such as observations, interviews,
and ethnographic notes, constitute additional resources;
however, these are generally used secondarily.
Because CA is a naturalistic science, and phenomena
are unearthed using a process of unmotivated looking,
sampling issues are somewhat different than those
encountered in traditional sociological research. After
a specific phenomenon has been identified, the CA
approach involves gathering and assembling instances
of the phenomenon within one conversation, across
a number of instances of talk, and/or within varied settings. The unit of analysis (what an instance of a phenomenon consists of) is partially dependent on the
specific phenomenon. The goal is to gather as many specimens from as many different sources as possible. Many of
the phenomena that CA considers are new, in the sense
that they have not been fully explored or, for that matter,
identified. Before the distribution of practices across
social interaction can be analyzed, the practices
themselves require identification. Thus, the notion of
a population, or whether a sample is representative
or generalizable, is not necessarily at issue. What is at
issue is the endogenous organization of conversational
structures, which in some cases can be ascertained with
only one instance. Once a phenomenon is identified, analysts can search for instances that fall outside of the given
pattern.
Transcription
After a site is selected and a recording obtained, CA
research requires repeated examination of video- or
audio-recorded talk. Although written transcripts can
never replace audio and video recordings, they aid in
satisfying the repetitive viewing constraint and additionally serve as a resource for publicly presenting data in
written form. Thus, the need for a transcript is based
not only on practical concerns (i.e., publishing and presenting data), but also on analytical concerns when used
properly in conjunction with recordings.
Developing a detailed transcript entails not only capturing what was said, but also capturing the nuances of
how it was said. However, the transcripts produced by CA
practitioners are not phonetic transcripts, as is the case
with discourse analysis, nor are they glosses or approximations of what participants in an interaction say, as may
be the case with nonrecorded participant observation.
Instead, CA practitioners write down what is said verbatim, while simultaneously attending to the details of the
conversation that are normally corrected or omitted
from records of talk. In other words, it is not sufficient
to rely on accounts of conversational events or
a researcher's description of a particular instance of
talk; these accounts can unintentionally compromise
the integrity of the data. The objective of this sort of
approach is to provide as much detail as possible while
still allowing a reader unfamiliar with CA access to the
data and forthcoming analysis. However, despite the
immense detail that goes into a transcript, even CA
researchers agree that the transcript that is produced is
an artifact and cannot sufficiently capture all of the detail
that is available on a recording, which is why the focus
of analysis remains on the actual recorded data.
The majority of transcripts that CA practitioners produce and rely on in their analyses utilize the transcription
system developed by Gail Jefferson. Table I is a partial
list of transcribing conventions and includes the most
commonly used symbols and their meaning.
The purpose of the detailed notation system is to capture sounds, silences, and the character or details of the
talk. As previously mentioned, ordinarily corrected phenomena, such as word repetitions, uhms, pauses, and
mispronunciations, are included in the transcript and
treated as important resources for accessing the nuances
of conversation. A labeling system similar to the documentation that accompanies ethnographic notes is used to
organize detailed transcripts. Traditionally, transcripts
supply the name of the speaker(s) or pseudonym(s),
the time and date that the recording was made, the
date that the conversation was transcribed, the name of
the transcriber, and a brief description of the setting in
which the conversation occurred.
With the advent of videotaping, new conventions were
necessary to capture elements of gaze, body positioning,
and nonverbal gestures that were available to the participants in the conversation, but previously unavailable to
the researcher. However, despite the increase in the use
of videotaped data, the specific notation system used to
describe actions and activities other than talk is not systematic, nor is it well developed. Although individual
researchers have developed highly specialized conventions that capture the complexity of nonverbal actions,
Table I  Transcribing Conventions: Symbols Common to Conversation Analysis

Symbol      Details                     Technical meaning
↑↓          Arrows                      Marked rise or fall in pitch
[           Left bracket                Onset of overlapping talk
]           Right bracket               End of overlapping talk
[[          Double brackets             Utterances that begin simultaneously
( )         Empty parentheses           Talk that could not be made out
(hello)     Parenthesized word(s)       Uncertain hearing of what was said
(( ))       Double parentheses          Transcriber's descriptions of events
(0.0)       Parenthesized numbers       Silence, timed in tenths of a second
(.)         Parenthesized period        A micropause
HELLO       Uppercase                   Talk markedly louder than surrounding talk
hello       Underline                   Stress or emphasis
hel(h)lo    Parenthesized (h)           Laughter within a word
°hhh        Raised degree and hs        Audible inbreath
hhh         No degree and hs            Audible outbreath
hello:::    Colon(s)                    Prolongation of the preceding sound
hel-        Dash                        Cut-off of the current sound or word
hello,      Comma                       Continuing intonation
hello?      Question mark               Rising intonation
hello.      Period                      Falling, final intonation
hello!      Exclamation point           Animated tone
hello=      Equal sign                  Latching; no gap between utterances
°hello°     Raised degree signs         Talk quieter than surrounding talk
>hello<     Carets (inward)             Talk faster than surrounding talk
<hello>     Carets (outward)            Talk slower than surrounding talk
...         Ellipsis                    Intervening talk omitted
→           Side arrow (left margin)    Line of particular analytic interest

Source: Adapted from Sacks, H., Schegloff, E. A., and Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language 50, 696–735, with permission.
A brief excerpt illustrates the system (numbers in parentheses mark timed pauses; colons mark stretched sounds; dashes mark cut-offs):

2   (0.4)
3   AN: Hhh uh:::- (0.2) I don' think I ca::n: I-
Types of CA
As a burgeoning approach to social interaction, CA
evolved into two distinct, yet complementary trajectories,
which are only moderately elaborated here. Although
both trajectories attended to the phenomenon of order,
and both accepted the general assumptions underlying
the conversation analytic approach, the means whereby
this was achieved and the goal of the analysis revealed
fundamentally different approaches to social research.
The first strand of CA research maintained its pure
focus on structure and sequence in naturally occurring
talk, concentrating on everyday conversations between
friends, strangers, intimates, etc.
Conclusion: Advantages,
Limitations, and Future
Directions
Advantages
What is distinct about CA is its approach to the phenomenon of interest. The type of analysis that is produced can
formally specify structures of talk, locate endogenous
order, and systematically illuminate the patterns that
characterize everyday interactions. What is most notable
about the conversation analytic approach is its appreciation of and attention to detail. Each detail is treated as an
analytical resource. Moreover, it is through the careful
analysis of detail that conversation analysts come to an
appreciation of how institutions are created, sustained,
identified, conveyed, and altered and how relationships
are formed through social interaction. Although conversation analysis may seem unsuited to investigations of
institutions, it can identify minute details of interaction
that comprise large-scale enterprises, as with the justice
system or the medical field.
Limitations
The main disadvantage of CA lies in the limitations it
imposes on the type of data suitable for analysis: recorded
(video or audio) data. Although this constraint guarantees
the veracity of the data, it severely limits the scope of
examinable phenomena. In addition, some of the language surrounding CA and the related literature is highly
specialized and difficult to understand for inexperienced
practitioners. Although the transcription system is relatively standardized, it too can appear obscure or difficult
and is likewise time-consuming to learn, sometimes giving
the impression of a foreign language.
Moreover, just as other social scientific methods are
suitable for answering specific types of questions, so too is
CA. Researchers interested primarily in the distribution
of phenomena, or large-scale macrosociological questions, find that the line-by-line analysis characteristic of
CA may not be well suited to their research topic. Likewise, those large-scale macroprocesses that are not linguistically based fall outside of the realm of appropriate
topics for CA research.
Future Directions
The initial findings of conversational rules and the structure present in conversation have already been extended
into studies of institutional settings and workplace environments including the realms of survey research, human/
computer interaction, news interviews, medical and educational settings, and justice system operations. Some
researchers have used CA to examine how decisions
are made, how news is delivered, how diagnoses are communicated, and how disagreements and arguments are
resolved. Although CA here has been presented as
a method of analysis, or analytical technique, it can also
be characterized as an ideology; CA represents a way of
thinking about the world and the interactions that comprise lived social experiences. As a microinteractional
approach to order, CA is in its infancy. With the variation
in CA research, the potential for extension into the realm
of previously ignored social interaction is limitless.
Although a great deal of progress has been made in
describing the organization of talk, a great deal remains
unexplained and unidentified.
Further Reading
Beach, W. A. (1991). Searching for universal features of conversation. Res. Language Soc. Interact. 24, 351–368.
Boden, D. (1990). The world as it happens: Ethnomethodology and conversation analysis. In Frontiers of Social Theory: The New Synthesis (G. Ritzer, ed.), pp. 185–213. Columbia University Press, New York.
Clayman, S. E., and Maynard, D. W. (1995). Ethnomethodology and conversation analysis. In Situated Order: Studies in the Social Organization of Talk and Embodied Activities (P. ten Have and G. Psathas, eds.), pp. 1–30. University Press of America, Washington, DC.
Drew, P. (1992). Contested evidence in courtroom cross-examination: The case of a trial for rape. In Talk at Work: Interaction in Institutional Settings (P. Drew and J. Heritage, eds.), pp. 470–520. Cambridge University Press, Cambridge, UK.
Drew, P., and Holt, E. (1998). Figures of speech: Idiomatic expressions and the management of topic transition in conversation. Language Soc. 27, 495–522.
Heritage, J. (1997). Conversation analysis and institutional talk: Analysing data. In Qualitative Research: Theory, Method and Practice (D. Silverman, ed.), pp. 161–182. Sage, London.
Mandelbaum, J. (1990/1991). Beyond mundane reasoning: Conversation analysis and context. Res. Language Soc. Interact. 24, 333–350.
Maynard, D. W. (1997). The news delivery sequence: Bad news and good news in conversational interaction. Res. Language Soc. Interact. 30, 93–130.
Pomerantz, A. M. (1988). Offering a candidate answer: An information seeking strategy. Commun. Monographs 55, 360–373.
Psathas, G. (ed.) (1990). Interactional Competence. University Press of America, Washington, DC.
Schegloff, E. A. (1992). The routine as achievement. Hum. Stud. 9, 111–152.
Silverman, D. (1998). Harvey Sacks: Social Science and Conversation Analysis. Polity Press, Oxford, UK.
ten Have, P. (1999). Doing Conversation Analysis: A Practical Guide. Sage, London.
Whalen, J., and Zimmerman, D. H. (1998). Observations on the display and management of emotion in naturally occurring activities: The case of hysteria in calls to 9-1-1. Soc. Psychol. Quart. 61, 141–159.
Clayton Mosher
Washington State University, Vancouver, Washington, USA
Glossary
anonymity No identifying information is recorded that could
be used to link survey respondents to their responses.
confidentiality Identifying information that could be used to
link survey respondents to their responses is available only
to designated research personnel, for specific research
needs.
lambda (λ) The frequency with which an individual commits
offenses.
National Institute on Drug Abuse (NIDA)-5 The five most
commonly used illegal drugs: marijuana, cocaine, methamphetamine, opiates, and phencyclidine.
official data Information derived from the normal functioning of the criminal justice process.
probability sample A sample that relies on a random, or
chance, selection method so that the probability of selection
of population elements is known.
quota sample A nonprobability sample in which elements are
selected to ensure that the sample represents certain
characteristics in proportion to their prevalence in the
population.
response rate The number of individuals participating in
a survey, divided by the number selected in the sample.
selective incapacitation A policy that focuses on incarcerating individuals who are believed to be at high risk of
reoffending.
self-report data Information obtained by asking people
about their criminal behavior.
urinalysis A chemical analysis of a urine sample to determine
if an individual has used drugs.
Introduction
The U.S. Department of Justice/Bureau of Justice Statistics (BJS), the Rand Corporation, and the U.S. Department of Justice/National Institute of Justice Arrestee
Drug Abuse Monitoring (ADAM) program are sources
of data on correctional facilities and known offenders. The
BJS collects data that examine federal, state, and local
correctional facilities and provides counts of inmates in
such facilities; the Rand studies in 1976 and 1978 were
inmate surveys of adult prisoners; and the ADAM program involved self-report data on individuals recently
arrested and awaiting criminal justice processing. Because these data concerned arrested and/or incarcerated
offenders, they suffer from some of the same methodological shortcomings that characterize all official crime
data. The criminal justice system can be depicted using
the analogy of a funnel as offenders are removed from the
system at each stage of processing. Of all offenses
Census of Jails
The Census of Jails is conducted by the Bureau of Justice
Statistics approximately every 6 years and is supplemented by the more abridged Annual Survey of Jails.
The most recent Census of Jails was conducted in
1999, and data have also been collected for the years
1970, 1972, 1978, 1983, 1988, and 1993. The 1999 census
included all locally administered jails that detained inmates beyond arraignment (typically 72 hours, thus excluding facilities such as drunk tanks, which house
individuals apprehended for public drunkenness) and
that were staffed by municipal or county employees.
The 1999 census identified 3084 jail jurisdictions in the
United States, and response rates of the institutions to the
inmates (i.e., the number of crimes committed by offenders in the 90th percentile of offending frequency). Their
findings indicated wide disparities in offending frequency. For example, with respect to the crime of robbery, half of the active robbers reported robbing someone
no more than five times in a given year, but the 10% most
active robbers reported committing no fewer than 87
robberies per year. Evidence of a small percentage of
highly active offenders was also reported for other
types of offenses: for assault, the median number of offenses was 2.40, whereas those in the 90th percentile
reported 13; for burglary, the median was 5.45, whereas
those in the 90th percentile reported 232; for theft, the
median was 8.59, with those in the 90th percentile reporting 425; for forgery and credit card offenses, the median
was 4.50, with the 90th percentile reporting 206; for fraud,
the median was 5.05, with the 90th percentile reporting 258. For crime in general, excepting drug sales, the
median was 14.77 offenses, with those in the 90th percentile reporting 605. These data led researchers to
conclude that most people who engage in crime, even
those who are incarcerated as a result of committing
crime, commit crimes at a relatively low level of
frequency. However, some individuals offend so regularly
that they can be considered career criminals.
Replications of the Rand Surveys
Partially as a result of concerns regarding the methodology and low response rates in the second Rand survey,
there have been three major follow-ups/replications of
Rand II: a reanalysis of the original 1978 Rand II
survey data, a survey of prisoners in Colorado, and
a similar survey of incarcerated felons in Nebraska.
The first of these studies attempted to address the
issue of incomplete or ambiguous survey responses in
the Rand data. Focusing on the crimes of robbery and
burglary, missing or ambiguous data were coded more
conservatively than was done by the original researchers.
On recoding and reanalyzing these data, the derived
estimates of offending frequency were very similar to
the original Rand minimum estimates, although somewhat different from the Rand maximum estimates.
The replication of Rand II in the Colorado survey involved collecting data over a 1-year period between 1988
and 1989; two separate surveys were utilized (a long and
a short form). Some respondents were guaranteed anonymity, others were assured of confidentiality. These
methodological strategies were used in order to examine
whether the use of a particular instrument and/or the
different levels of protection affected inmates' response
rates. The Colorado study was also the first one of this
nature to collect data on female inmates. The overall participation rate of 91% in the Colorado survey was much
higher than that in the Rand surveys; this was attributed to
three factors: the sample for the Colorado survey was
drawn from inmates awaiting processing (with no involvement in other activities), the inmates had only recently
received their sentences and were thus less likely to be
influenced by peer pressure not to participate, and $5 was
credited to each respondent's account for completing the
interview.
Similar methodology was employed in the Nebraska
inmate survey, which yielded a similarly impressive response rate and reported comparable findings. It is also
important to note that although the Nebraska and
Colorado replications of Rand II utilized somewhat different methods and produced much higher response
rates, the findings of all three studies were quite similar
with respect to offending frequency, providing support
for the validity of the Rand findings.
Rand data provide an excellent resource for studying
known offenders and have been used to examine a wide
variety of issues. One application of these data involved
the development of an offender typology. In a report
entitled Varieties of Criminal Behavior, inmates who participated in the Rand surveys were classified into 10 different groups based on the types and combinations of
crimes in which they engaged. Among these was
a category of violent predators who committed a wide
variety of crimes at a very high frequency. It was found
that although these offenders perpetrated assaults, robberies and drug offenses at very high rates, they also
committed more property crimes (e.g., burglaries and
thefts) compared to offenders who specialized in these
types of crimes. Partly because of these findings, research
on chronic offenders and selective incapacitation was
conducted using the Rand data. In one study, it was predicted that targeting highly active violent offenders could
simultaneously reduce crime rates and prison populations. A correctional strategy was proposed that minimized the role of incarceration for most offenders, but
emphasized long-term incarceration for offenders believed to be likely to commit crimes at a high rate in
the future. The selective incapacitation approach was subsequently criticized on both methodological and ethical
grounds, though research on it continues today. Of particular concern regarding this research was a proposed
predictive scale that was supposedly able to identify
chronic offenders; also of concern were questions
about whether the prisoners in the Rand sample were
characteristic of prisoners in general and the cost and
likelihood of a false positive if selective incapacitation were put into policy; that is, incapacitating an individual based on the mistaken belief that the individual will commit future crimes.
Rand data have also been employed to investigate
offenders' beliefs about the financial benefits of crime
and the link between drug use and crime. Indeed, such
research has found that offenders tended to exaggerate
the profits obtained through criminal activity in an effort
to justify their current (i.e., incarcerated) situation. Further, other analyses have employed Rand data to examine
the relationship between drug use and offending frequency. Results indicate that drug use cannot be said
to cause crime, because predatory criminality often
occurs before drug use. However, this research also
found that the amount of drugs used by an offender
is linked to the frequency of their offending, with rates
of offending being two or three times higher when
the offender is using drugs than when abstaining or in
treatment.
In sum, although the Rand surveys may suffer from
methodological problems associated with studying inmate
populations, replications have produced similar findings,
increasing confidence in their conclusions. These data
have provided a significant contribution to knowledge
and understanding of crime and serve as a standard for
collecting and analyzing self-report data from inmates.
I-ADAM
In 1998, the National Institute of Justice launched the
International Drug Abuse Monitoring (I-ADAM) program, partly in recognition of the increasingly global nature of the drug trade. Efforts at understanding substance
use across national borders were often confounded by
the fact that laws, penalties, and recording procedures
varied greatly, depending on the country in question.
I-ADAM has attempted to address this problem by
implementing a common survey, similar to the ADAM
survey used in the United States, in a number of different
countries. Currently, Australia, Chile, England, Malaysia,
Scotland, South Africa, and the United States participate in the I-ADAM program. The Netherlands and
Taiwan have also participated previously.
The I-ADAM program is administered similarly across
the participating countries, facilitating international
comparisons of substance use among arrestees. At each
I-ADAM site, trained interviewers, who are not affiliated
with law enforcement, conduct voluntary and confidential
interviews with arrestees within 48 hours of the arrival of
the arrestee in the detention facility. The survey measurement tool is standardized and common to each I-ADAM
site and is administered by local researchers. Similar to
ADAM in the United States, questions include those
about the frequency of use of various substances, age
at first use, perceptions of substance dependency, and
participation in substance abuse treatment. Demographic
data are also collected. Although the collection of data
on female and juvenile arrestees is optional, each
Conclusion
The sources of data on correctional facilities and known
offenders are particularly important for understanding
crime and corrections. The survey entities enable study
of the correctional process and the offenders under the
system's supervision, and the broad scope of their data
is reflected in the wide variety of studies that have
employed them. Research has been undertaken on
the expansion of the correctional system in terms of
prisoners, correctional officers, and cost; on racial and
socioeconomic disparities in correctional populations;
on offending frequency and the development of criminal typologies; and on assessments of treatment
programs aimed at rehabilitating offenders. The Bureau
of Justice Statistics, the Rand Corporation, and the Arrestee Drug Abuse Monitoring program have each provided a vital and unique contribution for understanding
offenders and their experience with the correctional
process. The federal government's Bureau of Justice
Statistics collects extensive data on federal, state, and
local jails as well as on the inmates incarcerated in these
institutions. These data are collected regularly and allow
for the constant monitoring of correctional populations,
correctional resources, and inmate characteristics. The
Rand inmate survey data have proved invaluable for
learning about serious offenders, particularly those
Further Reading
Arrestee Drug Abuse Monitoring (ADAM). (2000). 1999
Report On Drug Use among Adult and Juvenile Arrestees.
National Institute of Justice, Washington, D.C.
Auerhahn, K. (1999). Selective incapacitation and the problems of prediction. Criminology 37, 703–734.
Bureau of Justice Statistics (BJS). (2001). Prison and Jail
Inmates at Midyear 2000. U.S. Department of Justice,
Washington, D.C.
Chaiken, J. M., and Chaiken, M. R. (1982). Varieties of
Criminal Behavior. National Institute of Justice Report
R-2814-NIJ. RAND Corporation, Santa Monica, CA.
Chaiken, J. M., and Chaiken, M. R. (1990). Drugs and
predatory crime. In Crime and Justice: A Review of
Research (M. Tonry and J. Q. Wilson, eds.), Vol. 13,
pp. 203–239. University of Chicago Press, Chicago, IL.
English, K., and Mande, M. (1992). Measuring Crime Rates of
Prisoners. National Institute of Justice, Washington, D.C.
Greenwood, P., and Abrahamse, A. (1982). Selective
Incapacitation. National Institute of Justice Report
R-2815-NIJ. RAND Corporation, Santa Monica, CA.
Horney, J., and Marshall, I. (1992). An experimental comparison of two self-report methods for measuring lambda.
J. Res. Crime Delinq. 29, 102–121.
Junger-Tas, J., and Marshall, I. (1999). The self-report
methodology in crime research. In Crime and Justice:
A Review of Research (M. Tonry, ed.), Vol. 25, pp. 291–367.
University of Chicago Press, Chicago, IL.
Mosher, C., Miethe, T., and Phillips, D. (2002). The
Mismeasure of Crime. Sage Publ., Thousand Oaks, CA.
Petersilia, J., Greenwood, P., and Lavin, M. (1977). Criminal
Careers of Habitual Felons. Department of Justice Report
R-2144 DOJ. Rand Corporation, Santa Monica, CA.
Correlations
Andrew B. Whitford
University of Kansas, Lawrence, Kansas, USA
Glossary
analysis of variance Method for the analysis of variation in
an experimental outcome that exhibits statistical variance,
to determine the contributions of given factors or variables
to the variance.
cross-product Multiplication of the scores on the two
variables for any single unit of observation in the data.
linear regression Method for estimating a linear functional
relationship between two or more correlated variables.
Regression is empirically determined from data, and may
be used to predict values of one variable when given values
of the others. It provides a function that yields the mean
value of a random variable under the condition that one (in
bivariate regression) or more (in multivariate regression)
independent variables have specified values.
linear relationship Response or output is directly proportional to the input.
moment Random variables can be represented in terms of
deviations from a fixed value. A moment is the expected
value of some power of that deviation.
normal distribution Probability density function with the following form:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \]
polychotomous A measurement that is divided or marked by
division into many parts, classes, or branches.
Correlation represents a family of methods for summarizing the strength and direction of the relationship between two variables X and Y. Pearson's correlation
coefficient is the most familiar version, but is only appropriate given specific assumptions about the functional
relationship between the two variables and the choices
made about their measurement. Correlation as a concept
serves as a basis for statements about relationships and
associations, simple statistics for hypothesis testing, and
Introduction
One measure or statistical descriptor of the strength of the
relationship between two variables X and Y is the correlation coefficient, which is also called the coefficient of
linear correlation. As such, correlation coefficients summarize both the direction of that relationship and its
strength. The most well-known means for assessing correlation is the Pearson product-moment correlation coefficient. Since its introduction in 1896, this approach has
proved a useful mechanism for assessing relationships
because it indexes the degree to which two variables
are related. The values of this statistic are bounded between −1.0 and +1.0 inclusive; the negative or positive sign indicates the direction of the relationship, and the
absolute value indicates magnitude (or strength of association). This statistic is also important because its construction, based on the products of moments in the data,
forms the basis of extensions to other, supplementary
statistics that complement the purpose of Pearson's correlation coefficient to summarize the relationship and its
strength by indexing.
It is important to note throughout this discussion that
a statistical description of correlation does not equate to
proof of causation between two variables. A simple reason
is that correlations can be deduced in single studies even
without the introduction of controls. In contrast, causal
statements require controlled experiments. This is
a simplification of a more complicated reasoning, but
signals the importance of separating statements of causality from statements about the association between two
variables. In many cases, causal claims will require additional statements based on intuition or the theoretical
understanding of underlying mechanisms, but more developed probability-based statements of the concordance
between correlation and causality are also possible. In
short, correlations can be computed without referring
to dependence between two variables, even to the
point of not distinguishing between independent and
dependent variables.
Correlations are discussed here by emphasizing three
fundamental distinctions. First, the technology used to
assess correlations will depend on the measurement
scale employed by the researcher. Second, the technologies available include both parametric and nonparametric
choices. Third, correlation technologies are available for
both linear and nonlinear relationships. Together, these
patterns help separate the landscape of correlations into
convenient groupings that aid the researcher in choosing
the correct tool for data analysis.
Definition of Symbols
\[ s_y = \sqrt{\frac{\sum (Y-\mu_y)^2}{N}}, \qquad z_x = \frac{X-\mu_x}{s_x}, \qquad z_y = \frac{Y-\mu_y}{s_y} \]
Use of Pearson's r
It is important to note that a number of factors affect the
calculation of the correlation coefficient, whether it is
calculated as z-scores, deviation scores, or by the computational formula. First, this is the coefficient of linear
correlation, so the tendency of points to be located on
a straight line will alter the size of the correlation coefficient. As long as the points display a random scatter about
a straight line, any of these calculations will act as
a statistical descriptor of the relationship between the
two variables in the data. If the data are related in
a nonlinear way (for example, in a U-shaped relationship),
the absolute value of the correlation coefficient will suffer
downward bias (will underestimate this relationship). An
alteration of this basic framework for nonlinear
relationships is discussed later.
Even for linear relationships, as the variance of either variable shrinks (as the group becomes more homogeneous on that variable), the size of the correlation coefficient in absolute value terms is smaller. This means that for any two variables, there must be enough heterogeneity in the data for the correlation coefficient to detect the relationship. However, this caveat does not extend to the number of pairs included in the data. As long as n > 2, the correlation coefficient will both detect the relationship and represent its strength. If n = 2, the coefficient will only detect whether there is a relationship and its direction; strength in that case is expressed as one of the two end points of the correlation coefficient's scale. If n = 1, the coefficient fails due to homogeneity in both variables.
It is useful to note that correlation forms the basis for
simple bivariate linear regression. The slope coefficient
is simply
\[ b_1 = r_{xy}\,\sqrt{\frac{\sum (Y-\mu_y)^2}{\sum (X-\mu_x)^2}} = r_{xy}\,\frac{s_y}{s_x}, \tag{10} \]
and the intercept is calculated as \( b_0 = \bar{Y} - b_1\bar{X} \).
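A small sketch of these calculations (Python with NumPy; the data are hypothetical) shows r computed as the mean cross-product of z scores, together with the regression slope and intercept of Eq. (10):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r as the mean cross-product of z scores."""
    zx = (x - x.mean()) / x.std()   # population (N) standard deviations
    zy = (y - y.mean()) / y.std()
    return np.mean(zx * zy)

# Illustrative data (hypothetical)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.2, 4.8, 5.1])

r = pearson_r(x, y)
b1 = r * y.std() / x.std()          # slope, Eq. (10)
b0 = y.mean() - b1 * x.mean()       # intercept
print(r, b1, b0)
```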
Interpretation of Pearson's r
As noted, the correlation coefficient provides information
about the direction of the relationship and its strength.
The range of possible values for the correlation coefficient
provides an ordinal scale of the relationship's strength. In
short, the correlation coefficient reveals relationships
Fisher's transformation of the sample coefficient,
\[ z_r = \frac{1}{2}\,\ln\!\left(\frac{1+r}{1-r}\right), \]
is approximately normally distributed and is used in the tests below.
Tests of the null hypothesis \( \rho = 0 \) proceed by calculating the following statistic, which is distributed as t with n − 2 degrees of freedom:
\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}. \]
Tests of the null hypothesis \( \rho = \rho_0 \) proceed by calculating the following statistic, which is distributed as a normal variable:
\[ z = \frac{z_r - z_{\rho_0}}{\sqrt{1/(n-3)}}. \]
Tests of the null hypothesis \( \rho_1 = \rho_2 \) proceed by calculating the following statistic, which is also distributed as a normal variable:
\[ z = \frac{z_{r_1} - z_{r_2}}{\sqrt{1/(n_1-3) + 1/(n_2-3)}}. \]
Chen and Popovich have developed additional tests,
including those for the difference between more than
two independent r values and the difference between
two dependent correlations.
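The three tests can be sketched as follows (plain Python; function names are illustrative):

```python
import math

def fisher_z(r):
    # Fisher's transformation, z_r = (1/2) ln((1 + r) / (1 - r))
    return 0.5 * math.log((1 + r) / (1 - r))

def t_for_rho_zero(r, n):
    # H0: rho = 0; distributed as t with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def z_for_rho0(r, rho0, n):
    # H0: rho = rho0; approximately standard normal
    return (fisher_z(r) - fisher_z(rho0)) / math.sqrt(1 / (n - 3))

def z_for_two_rs(r1, n1, r2, n2):
    # H0: rho1 = rho2 for two independent samples
    return (fisher_z(r1) - fisher_z(r2)) / math.sqrt(1/(n1 - 3) + 1/(n2 - 3))

print(t_for_rho_zero(0.45, 30))
print(z_for_rho0(0.45, 0.30, 30))
print(z_for_two_rs(0.45, 30, 0.25, 40))
```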
For a 2 × 2 (fourfold) table with cell counts A, B, C, and D, the tetrachoric correlation can be approximated by
\[ r_{\text{tet}} = \cos\!\left(\frac{180^{\circ}}{1+\sqrt{BC/AD}}\right). \tag{12} \]
Point-Biserial Coefficient
The point-biserial correlation coefficient is the special
case of the Pearson product-moment coefficient if one
variable is measured in nominal (or dichotomous) form
and the second is an interval/ratio level. This statistic,
which is analogous to a two-sample t test, is calculated
as follows for populations:
\[ r_{pb} = \frac{\mu_1 - \mu_0}{s_y}\,\sqrt{pq}. \]
In this case, m1 is the mean of all the Y scores for those
individuals with X scores equal to 1, m0 is the mean of
the Y scores for those individuals with X scores equal to
0, sy is the standard deviation for all the scores of
the variable measured at the interval/ratio level, p is the proportion of the pairs for which X = 1, and q is the proportion of the pairs for which X = 0.
The fourfold table referenced in Eq. (12) has the form:

                             Total
Variable X    A      B       A + B
              C      D       C + D
Total        A + C  B + D
For samples, the corresponding calculation is
\[ r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\,\sqrt{pq}. \]
If the dichotomous measure truly has underlying continuity (an underlying interval/ratio representation), then
the more general biserial coefficient is appropriate. The
biserial coefficient is a transformation of the point-biserial
coefficient that accounts for the height of the standard
normal distribution at the point that divides the p and q
proportions under the curve, which is denoted by u:
\[ r_b = r_{pb}\,\frac{\sqrt{pq}}{u} \qquad \text{or} \qquad r_b = \frac{\mu_1 - \mu_0}{s_y}\cdot\frac{pq}{u}. \]
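A minimal sketch of the point-biserial and biserial calculations (Python; assuming NumPy and SciPy are available for the normal ordinate u; data are hypothetical):

```python
import math
import numpy as np
from scipy.stats import norm  # for the ordinate u of the standard normal

def point_biserial(x, y):
    """r_pb for a dichotomous x (0/1) and an interval-level y."""
    x, y = np.asarray(x), np.asarray(y)
    p = x.mean()                 # proportion with x == 1
    q = 1 - p
    return (y[x == 1].mean() - y[x == 0].mean()) / y.std() * math.sqrt(p * q)

def biserial(x, y):
    """r_b: rescales r_pb by the normal ordinate at the p/q split."""
    p = np.asarray(x).mean()
    u = norm.pdf(norm.ppf(p))    # height of N(0,1) at the dividing point
    return point_biserial(x, y) * math.sqrt(p * (1 - p)) / u

x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [8.0, 7.5, 5.0, 9.0, 4.5, 6.0, 8.5, 5.5]
print(point_biserial(x, y), biserial(x, y))
```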
Rank-Biserial Coefficient
The rank-biserial coefficient, proposed by Cureton in
1956 and refined by Glass in 1966, is appropriate if one
variable is measured at the nominal/dichotomous level
and the second is ordinal. This coefficient uses notation
similar to the point-biserial coefficient:
\[ r_{rb} = \frac{2\,(\bar{m}_1 - \bar{m}_0)}{N}, \]
where \( \bar{m}_1 \) and \( \bar{m}_0 \) are the mean ranks for the two groups and N is the total number of cases.
Here, A_ij is the actual count of cases in the ijth cell, and E_ij is the expected count when the null hypothesis of independence is true, calculated from the marginals; C is not bounded by 1.0, but instead depends on the number of categories for the row and column variables. This limitation makes C values calculated across data sets incomparable unless the coding scheme for the two data sets is exactly the same.
Cramér's V is intended to limit this deficiency by ranging only from 0 to 1. The formula is again based on the calculated value of \( \chi^2 \), but now adjusted by the smaller of the number of rows or columns in the table:
\[ V = \sqrt{\frac{\chi^{2}}{n\,[\min(r, c) - 1]}}. \]
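A short sketch of the calculation (Python with NumPy; the table is hypothetical):

```python
import numpy as np

def cramers_v(table):
    """Cramer's V from an r x c table of observed counts."""
    A = np.asarray(table, dtype=float)
    n = A.sum()
    E = np.outer(A.sum(axis=1), A.sum(axis=0)) / n   # expected under independence
    chi2 = ((A - E) ** 2 / E).sum()
    r, c = A.shape
    return np.sqrt(chi2 / (n * (min(r, c) - 1)))

table = [[20, 30, 10], [15, 25, 20]]   # hypothetical 2 x 3 counts
print(cramers_v(table))
```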
Non-Product-Moment
Coefficients
As already noted, a second dividing line between types of
correlation coefficients is the weight of their reliance on
Pearson's product-moment coefficient for the basis of their technology. In each of the following cases, the method significantly departs from Pearson's approach, but still
operates across the first division marking various combinations of the basic measurement schemes employed by
researchers.
Kendall's τ
Up to this point, the non-product-moment-based
coefficients offered have provided means to address combinations of measurements for which an equivalent
product-moment-based coefficient is not available. Like
Spearman's rank-order coefficient, Kendall's τ also
addresses data in which both variables are measured
\[ \tau = \frac{2\,(C_P - D_P)}{n(n-1)}, \]
where \( C_P \) and \( D_P \) are the numbers of concordant and discordant pairs of observations.
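A brute-force sketch of the calculation over all pairs (plain Python; ties are ignored for simplicity):

```python
from itertools import combinations

def kendalls_tau(x, y):
    """tau = 2(C_P - D_P) / (n(n - 1)), ignoring ties."""
    n = len(x)
    concordant = discordant = 0
    for (i, j) in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return 2.0 * (concordant - discordant) / (n * (n - 1))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendalls_tau(x, y))   # 0.6
```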
Nonlinear Relationships
The final division among correlation coefficients addresses the question of nonlinear relationships between
two variables. As noted previously, when two variables are
related in a nonlinear way, the product-moment basis for
Pearsons r will understate the strength of the relationship
between the two variables. This is because r is a statement
of the existence and strength of the linear relationship
between two variables. The correlation ratio, Z, addresses
the relationship between a polychotomous qualitative variable and an interval/ratio level quantitative variable, but
an advantage of this measure is that it also states the
strength of a possible nonlinear relationship between
the two variables. At the same time, it makes only
a statement of the strength of that relationship, not direction, because of the polychotomous nature of the first
variable. For a population, η is calculated by the following equation:
\[ \eta = \sqrt{1 - \frac{\sum (Y-\mu_c)^2}{\sum (Y-\mu_t)^2}}. \]
The mean value of the interval/ratio scores (mc) is
calculated for each category of the first variable; the
grand mean (mt) is also calculated. Summation occurs
over the differences between the score for each
observational unit and these means. For a sample, the
sample means are calculated instead. Generally, η² will be greater than r², and the difference is the degree
of nonlinearity in the data. For two interval/ratio
variables, one variable can be divided into categories
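A minimal sketch of the computation (Python with NumPy; data are hypothetical):

```python
import numpy as np

def correlation_ratio(categories, y):
    """eta = sqrt(1 - SS_within / SS_total) for a polychotomous
    variable and an interval/ratio variable y."""
    y = np.asarray(y, dtype=float)
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_within = sum(((y[categories == c] - y[categories == c].mean()) ** 2).sum()
                    for c in np.unique(categories))
    return np.sqrt(1 - ss_within / ss_total)

cats = np.array(["a", "a", "b", "b", "c", "c"])
y = np.array([1.0, 2.0, 4.0, 5.0, 9.0, 10.0])
print(correlation_ratio(cats, y))
```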
Further Reading
Carroll, J. B. (1961). The nature of the data, or how to choose a correlation coefficient. Psychometrika 26, 347–372.
Chen, P. Y., and Popovich, P. M. (2002). Correlation: Parametric and Nonparametric Measures. Sage, Thousand Oaks, California.
Cureton, E. E. (1956). Rank-biserial correlation. Psychometrika 21, 287–290.
Cureton, E. E. (1968). Rank-biserial correlation: When ties are present. Educat. Psychol. Meas. 28, 77–79.
Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron 1, 1–32.
Glass, G. V. (1966). Note on rank-biserial correlation. Educat. Psychol. Meas. 26, 623–631.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika 30, 81–93.
Kendall, M. G. (1949). Rank and product-moment correlation. Biometrika 36, 177–193.
Lancaster, H. O., and Hamdan, M. A. (1964). Estimates of the correlation coefficient in contingency tables with possibly nonmetrical characters. Psychometrika 29, 383–391.
Correspondence Analysis
and Dual Scaling
Shizuhiko Nishisato
University of Toronto, Ontario, Canada
Glossary
duality Symmetry in quantification of rows and columns.
principal hyperspace Multidimensional space with principal
axes.
singular-value decomposition (SVD) The optimal decomposition of a two-way table into row structure and column
structure.
transition formulas and dual relations Mathematical relations in quantification between rows and columns.
Introduction
Historical Notes
There are a number of names for this family of scaling
techniques, some of which are correspondence analysis,
dual scaling, homogeneity analysis, optimal scaling,
Hayashi's theory of quantification, biplot, and appropriate
scoring. All of these are mathematically equivalent or
closely related to one another. There are two precursors
of this family of scaling techniques: (1) mathematical eigenvalue theory, developed by mathematicians in the 18th
century, and (2) singular-value decomposition (SVD), developed by Beltrami in 1873, Jordan in 1874, and Schmidt in
1907. The first was put into practice by Pearson in 1901 and
Hotelling in 1936 under the name principal component
analysis (PCA), and the second offered the computational
scheme for the most efficient decomposition of a two-way
table through the spacing (weighting) of rows and columns
Basic Rationale
Incidence Data
Data are either frequencies of responses or 1 (presence), 0
(absence) incidences. Contingency tables and multiple-choice data, expressed as (1, 0) response patterns, are
examples of incidence data. For this type of data, there
are a number of ways to formulate CA/DS scaling techniques, all of which, however, can be expressed in the
following bilinear expansion of a data element fij in row
i and column j:
\[ f_{ij} = \frac{f_{i\cdot}\, f_{\cdot j}}{f_t}\left(1 + \rho_1 y_{i1} x_{j1} + \rho_2 y_{i2} x_{j2} + \cdots + \rho_K y_{iK} x_{jK}\right), \tag{1} \]
where \( f_{i\cdot} \) and \( f_{\cdot j} \) are the row and column totals and \( f_t \) is the total frequency.
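In practice, the terms of Eq. (1) can be obtained from the SVD of the standardized residuals of the table. A minimal sketch (Python with NumPy; the contingency table is hypothetical):

```python
import numpy as np

# Toy contingency table (hypothetical counts)
F = np.array([[20.0, 5.0, 5.0],
              [5.0, 20.0, 5.0],
              [5.0, 5.0, 20.0]])

n = F.sum()
P = F / n
r = P.sum(axis=1)                 # row masses
c = P.sum(axis=0)                 # column masses

# Standardized residuals; their SVD yields the terms of Eq. (1)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, rho, Vt = np.linalg.svd(S, full_matrices=False)

Y = U / np.sqrt(r)[:, None]       # standard row coordinates y_ik
X = Vt.T / np.sqrt(c)[:, None]    # standard column coordinates x_jk
print(rho)                        # singular values rho_k
```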
Dominance Data
Dominance data include rank-order data and paired comparison data in which the information is of the type
i jk
>
:
1 if subject i prefers object k to j
For n objects, these are transformed to dominance
numbers by
\[ e_{ij} = \sum_{\substack{k=1 \\ k\neq j}}^{n} f^{(i)}_{jk}. \]
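For complete rankings of n objects, summing the paired-comparison entries reduces to e_ij = n + 1 − 2R_ij, where R_ij is subject i's rank of object j. A minimal sketch (Python with NumPy), using the first two rows of the rank-order data shown in Table V below:

```python
import numpy as np

# Rank-order data: each row is one subject's ranking of n objects
# (1 = most preferred). First two subjects of Table V as an example.
R = np.array([[6, 1, 5, 4, 3, 2],
              [6, 1, 5, 2, 4, 3]])
n = R.shape[1]

# For complete rankings, summing the +1/-1 paired-comparison
# entries over k != j reduces to e_ij = n + 1 - 2 * R_ij.
E = n + 1 - 2 * R
print(E)
# [[-5  5 -3 -1  1  3]
#  [-5  5 -3  3 -1  1]]
```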
Numerical Examples
Multiple-Choice Data: An Example of
Incidence Data
CA is a name for quantification of contingency tables.
When CA is applied to multiple-choice data, it is called
multiple correspondence analysis (MCA). Torgerson in
1958 called MCA and DS principal component analysis
(PCA) of categorical data. To see the differences between
PCA of categorical data and PCA of continuous data, let us
look at a numerical example of the following six multiple-choice questions from a study in 2000 by Nishisato.
1. Rate your blood pressure. (Low, Medium, High):
coded 1, 2, 3
2. Do you get migraines? (Rarely, Sometimes,
Often): coded 1, 2, 3
3. What is your age group? (20–34; 35–49; 50–65): coded 1, 2, 3
4. Rate your daily level of anxiety. (Low, Medium,
High): coded 1, 2, 3
5. How about your weight? (Light, Medium,
Heavy): coded 1, 2, 3
6. How about your height? (Short, Medium, Tall): coded 1, 2, 3
Table I  Likert-Type Scores (PCA) and Corresponding (1, 0) Response Patterns (DS)^a

Subject   BP   Mig   Age   Anx   Wgt   Hgt
01         1    3     3     3     1     1
02         1    3     1     3     2     3
03         3    3     3     3     1     3
04         3    3     3     3     1     1
05         2    3     2     2     3     2
06         2    1     2     3     3     1
07         2    1     2     1     1     3
08         1    2     1     3     1     3
09         2    3     2     1     1     2
10         1    2     2     2     1     3
11         2    3     1     3     2     2
12         2    1     3     3     2     2
13         3    2     3     3     3     1
14         1    3     1     3     1     1
15         3    3     3     3     1     2

For dual scaling, each score of 1, 2, or 3 is expanded into the response pattern 100, 010, or 001, respectively, over the three options of each question.

a Anx, anxiety; BP, blood pressure; Hgt, height; Mig, migraine; Wgt, weight.
Table II  Correlation Matrices for PCA (Likert Scores) and DS^a

PCA:
        BP     Mig    Age    Anx    Wgt    Hgt
BP     1.00
Mig    0.06   1.00
Age    0.66   0.23   1.00
Anx    0.18   0.21   0.22   1.00
Wgt    0.17   0.58   0.02   0.26   1.00
Hgt    0.21   0.10   0.30   0.23   0.31   1.00

DS:
        BP     Mig    Age    Anx    Wgt    Hgt
BP     1.00
Mig    0.99   1.00
Age    0.60   0.58   1.00
Anx    0.47   0.52   0.67   1.00
Wgt    0.43   0.39   0.08   0.33   1.00
Hgt    0.56   0.57   0.13   0.19   0.20   1.00

a Anx, anxiety; BP, blood pressure; Hgt, height; Mig, migraine; Wgt, weight.
Table III  Blood Pressure Cross-Tabulated with Age and Migraine Frequency

            20–34   35–49   50–65    Rarely   Sometimes   Often
High BP       0       0       4        0          0         4
Med BP        1       4       1        3          3         0
Low BP        3       1       1        0          0         5
Figure 1  Dual scaling solutions 1 and 2, plotting the category points (low/average/high BP, rare/occasional/frequent migraines, young/middle age/old, low/mid/high anxiety, light/average/heavy weight, and short/average/tall height). Anx, anxiety; BP, blood pressure.
Table IV  The Two Ends of Solutions 1 and 2

             Solution 1                         Solution 2
One end      Low BP, occasional migraine,      Medium BP, rare migraine, middle age,
             young, tall                       low anxiety, medium height
Other end    High BP, rare migraine, old,      Low BP, high BP, frequent migraine,
             heavy, short                      old, high anxiety, short
Table V  Rank-Order Data and the Dominance Matrix

          Ranks of objects (1)–(6)       Dominance numbers (1)–(6)
Subject   (1) (2) (3) (4) (5) (6)        (1)  (2)  (3)  (4)  (5)  (6)
01         6   1   5   4   3   2         −5    5   −3   −1    1    3
02         6   1   5   2   4   3         −5    5   −3    3   −1    1
03         3   5   2   4   1   6          1   −3    3   −1    5   −5
04         3   4   2   6   1   5          1   −1    3   −5    5   −3
05         5   3   1   4   6   2         −3    1    5   −1   −5    3
06         2   6   3   5   4   1          3   −5    1   −3   −1    5
07         1   2   4   5   3   6          5    3   −1   −3    1   −5
08         4   3   2   6   5   1         −1    1    3   −5   −3    5
09         2   1   4   5   3   6          3    5   −1   −3    1   −5
10         6   1   4   3   5   2         −5    5   −1    1   −3    3
Table VI  Rank-2 Approximation and the Ranks It Implies

          Approximated values (1)–(6)                    Implied ranks (1)–(6)
Subject   (1)    (2)    (3)    (4)    (5)    (6)         (1) (2) (3) (4) (5) (6)
01        1.94   0.73   1.72   1.06   1.67   1.37         6   1   5   2   4   3
02        2.20   0.98   2.02   1.36   1.91   1.71         6   1   5   2   4   3
03        0.67   1.75   1.21   1.53   0.81   1.98         1   5   3   4   2   6
04        0.49   1.63   1.01   1.38   0.69   1.79         1   5   3   4   2   6
05        1.72   1.58   1.18   1.27   1.77   0.54         5   4   2   3   6   1
06        1.70   2.38   1.37   1.95   1.96   1.51         3   6   1   4   5   2
07        1.10   1.42   1.55   1.45   0.89   2.14         2   3   5   4   1   6
08        1.61   1.70   1.07   1.33   1.71   0.76         4   5   2   3   6   1
09        1.40   1.28   1.74   1.45   1.11   2.18         3   2   5   4   1   6
10        2.13   1.02   1.79   1.21   1.91   1.25         6   1   4   2   5   3
Figure 2  Joint plot of subjects (S1–S10) and party plans (pot-luck (day), pot-luck (night), pub/restaurant crawl after work, evening banquet, ritzy lunch) on solutions 1 and 2, which contrast daytime versus night events and costly versus not costly events.
Further Reading
Benzécri, J.-P., et al. (1973). L'Analyse des Données: II. L'Analyse des Correspondances [Data Analysis II: Correspondence Analysis]. Dunod, Paris.
Gifi, A. (1990). Nonlinear Multivariate Analysis. John Wiley, New York.
Gower, J. C., and Hand, D. J. (1996). Biplots. Chapman & Hall, London.
Greenacre, M. J. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.
Greenacre, M. J., and Blasius, J. (eds.) (1994). Correspondence Analysis in the Social Sciences. Academic Press, London.
Lebart, L., Morineau, A., and Warwick, K. M. (1984). Multivariate Descriptive Statistical Analysis. John Wiley, New York.
Meulman, J. J. (1986). Distance Approach to Multivariate Analysis. DSWO Press, Leiden.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and Its Applications. University of Toronto Press, Toronto.
Nishisato, S. (1994). Elements of Dual Scaling. Lawrence Erlbaum Associates, Hillsdale, NJ.
Nishisato, S. (1996). Gleaning in the field of dual scaling. Psychometrika 61, 559–599.
Cost-Benefit Analysis
James Edwin Kee
George Washington University, Washington, DC, USA
Glossary
discount rate The interest rate used to convert future
benefits and costs to their present value in year 1.
marginal benefits/marginal costs Marginal cost is defined
as the incremental (additional) cost of producing one more
unit of output. Marginal benefit is the incremental benefit
generated by that one unit of output.
net present value (NPV) The conversion of a stream of
future benefits less future costs to their equivalent benefits
and costs in year 1, at the beginning of the project or
program.
opportunity cost The value of using a resource (land, money,
etc.) for one thing instead of another.
shadow pricing An attempt to value a benefit or a cost where
no competitive market price exists.
sunk cost Investments previously made in a program or project, such as original research and development costs that cannot be recouped, compared to ongoing costs.
Cost-benefit (or benefit-cost) analysis is a useful quantitative tool for policy analysts and program evaluators. It is used to analyze proposed programs, to conduct evaluations of existing programs (to assess their overall success or failure), to help determine whether programs should be continued or modified, and to assess the probable results of proposed program changes.
Introduction
The Use of Cost-Benefit Analysis
Cost-benefit analysis is an applied economic technique that attempts to assess a government program or project
Table I  Cost-Benefit Illustration

                             Year 1   Year 2   Year 3   Year 4   Year 5   Totals
Benefits
  Direct                      1000     1750     3000     4750     5000    15,500
  Indirect                       0      750     1250     1750     2000     5,750
  Total benefits              1000     2500     4250     6500     7000    21,250
Costs
  Direct
    Start-up                   250        0        0        0        0       250
    Personnel                  400      700      900     1500     1500     5,000
    Equipment and materials    250      500      500      500      500     2,250
    Capital                   2000     1925     1850     1775     1700     9,250
  Indirect
    Overhead                   580      625      650      755      740     3,350
    Mitigation                 200       50       50       50       50       400
  Total costs                 3680     3800     3950     4580     4490    20,500
Net benefits                 (2680)   (1300)     300     1920     2510       750

5-Year total of net benefits: 750; 5-year NPV @ 5%: 78; benefit-cost ratio: 1.0042.
The net present value of a stream of benefits B and costs C over x years, discounted to year 1 at rate r, is
\[ \text{NPV} = (B_1 - C_1) + \frac{B_2 - C_2}{1+r} + \frac{B_3 - C_3}{(1+r)^2} + \cdots + \frac{B_x - C_x}{(1+r)^{x-1}}. \]
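A minimal sketch (plain Python) reproducing the 5-year NPV and benefit-cost ratio of Table I:

```python
# Total benefits and costs per year, from Table I
benefits = [1000, 2500, 4250, 6500, 7000]
costs = [3680, 3800, 3950, 4580, 4490]
r = 0.05

# Discount each year's net benefit back to year 1
npv = sum((b - c) / (1 + r) ** t
          for t, (b, c) in enumerate(zip(benefits, costs)))
print(round(npv))   # ~78, the 5-year NPV @ 5% shown in Table I

# Benefit-cost ratio from discounted totals (~1.0042 in Table I)
pv_b = sum(b / (1 + r) ** t for t, b in enumerate(benefits))
pv_c = sum(c / (1 + r) ** t for t, c in enumerate(costs))
print(pv_b / pv_c)
```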
Cost Categories
Table II  Cost-Benefit Framework

Benefits and costs are each classified as direct or indirect and as tangible or intangible, with transfers listed separately; each category is paired with illustrations and valuation approaches. Illustrations on the cost side include personnel, materials and supplies, rentals (facilities/equipment), capital purchases, and land (direct, tangible); volunteers and fear of harm (direct, intangible); general overhead, spillover costs to third parties, and compliance/client costs (indirect, tangible); and environmental damage and loss of aesthetics (indirect, intangible).
Analysis of Benefits and Costs
Presenting the Results
Net present value of benefits minus costs (or costs minus benefits) is the most traditional format for government agencies to present the results of the analysis; however, a benefit-cost ratio is sometimes used when comparing similar programs. The benefit-cost ratio is determined by dividing the total present value of benefits by the total present value of costs. For example, if benefits equal $20 million and costs equal $10 million, the program is said to have a benefit-cost ratio of 2 to 1, or 2.0. Any project with a benefit-cost ratio of less than 1.0 should not be undertaken, because the government's opportunity cost (its discount rate) is greater than the return from the project.
Unlike the private sector, government evaluators usually do not conduct return-on-investment (ROI) or internal-rate-of-return (IRR) analyses; however, an IRR can be computed: it is the discount rate that would make the total present value of benefits equal to that of costs. It is important for the analyst to conduct a sensitivity analysis of key assumptions to see which have the greatest impact on the analysis. What is the probability that those assumptions will occur? The analyst should examine a range of alternative assumptions and determine how they affect the analysis.
Intangibles
No matter how creative the analyst, there are some benefits and costs that defy quantification. Even if an analyst
can value the cost of an injury, that dollar figure will not
fully capture the pain and suffering involved, and financial
savings from burglaries prevented does not fully capture
the sense of security that comes with crime prevention.
These are often important components of the cost-benefit equation and should be identified and explained as clearly as possible. Cost-benefit analysis may sometimes draw attention to an implicit valuation of some
intangibles that may hitherto have been hidden in rhetoric.
For example, if the costs of Program X exceed the hard
benefits (i.e., those that are quantifiable in dollar terms)
by an amount y, the intangibles must be worth at least y
to the public and their decision makers or the program
should be reconsidered.
Equity Concerns
It is not just the total benefits and costs but also who
benefits and who pays that are of concern to policymakers.
Comparison with
Cost-Effectiveness Analysis
The major alternative to cost-benefit analysis is cost-effectiveness analysis, which relates the cost of a given alternative to specific measures of program objectives.
A cost-effectiveness analysis could compare costs to units
of program objectives, for example, dollars per life saved
on various highway safety programs. Cost-effectiveness
analysis is sometimes the first step in a costbenefit
analysis. It is especially useful when the programs objectives are clear and either singular or sufficiently related so
that the relationship between the objectives is clear and
the evaluator cannot place a dollar value on program benefits. For example, if the goals of certain education
programs are to prevent high school dropouts, alternative
programs can be compared by analyzing the costs per
dropout prevented (or per increase in the percentage
of students graduating) without valuing those benefits
in dollars.
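The dropout example reduces to simple division. The sketch below ranks hypothetical programs (all costs and outcome counts invented for illustration) by cost per dropout prevented.

    # Sketch: cost-effectiveness comparison of hypothetical dropout-prevention
    # programs, ranked by dollars per dropout prevented.

    programs = {
        "Mentoring": {"cost": 2_400_000, "dropouts_prevented": 300},
        "Tutoring": {"cost": 1_500_000, "dropouts_prevented": 250},
        "Family outreach": {"cost": 900_000, "dropouts_prevented": 100},
    }

    for name, p in sorted(programs.items(),
                          key=lambda kv: kv[1]["cost"] / kv[1]["dropouts_prevented"]):
        ratio = p["cost"] / p["dropouts_prevented"]
        print(f"{name}: ${ratio:,.0f} per dropout prevented")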
The major advantage of cost-effectiveness analysis is that it frees the analyst from expressing all benefits in monetary terms. The analyst can simply present the benefits per x dollars and allow the decision maker to assess whether the benefits equal the costs. However, government programs often generate more than one type of benefit. Therefore, the analyst would have to weight the various benefits to achieve a common denominator, whereas in cost–benefit analysis it is dollars that are the common denominator. Nevertheless, when valuing in dollars is impossible or impractical, or where the program objectives are singular, cost-effectiveness analysis provides an alternative economic technique. Other alternatives to cost–benefit analysis include cost–utility analysis, risk analysis, and a variety of decision-making grids that value and weight various aspects of program alternatives.
Further Reading
Adler, M., and Posner, E. (eds.) (2001). Cost–Benefit Analysis: Legal, Economic, and Philosophical Perspectives. University of Chicago Press, Chicago, IL.
Boardman, A. E., Greenberg, D. H., Vining, A. R., and Weimer, D. L. (1996). Cost–Benefit Analysis: Concepts and Practices. Prentice Hall, Upper Saddle River, NJ.
Kee, J. E. (2004). Cost-effectiveness and cost–benefit analysis. In Handbook of Practical Program Evaluation (J. Wholey, H. Hatry, and K. Newcomer, eds.), 2nd Ed., pp. 506–541. Jossey-Bass, San Francisco, CA.
Levin, H. M., and McEwan, P. J. (2001). Cost-Effectiveness Analysis, 2nd Ed. Sage, Thousand Oaks, CA.
Mishan, E. J. (1988). Cost–Benefit Analysis: An Informal Introduction. Unwin Hyman Press, London.
Nas, T. F. (1996). Cost–Benefit Analysis: Theory and Applications. Sage, Thousand Oaks, CA.
Glossary
annexation Changing the boundaries of two municipalities
by decreasing the land area of one and increasing the
land base of the other; extending the boundaries of
a municipality by incorporating the territory of an adjacent
unincorporated area.
borough The primary legal division in Alaska; in other states,
treated by the U.S. Census Bureau as statistically equivalent
to a county.
census area One of several types of areas demarcated by
the U.S. Census Bureau specifically for data collection
purposes.
central city Usually the largest city in a metropolitan
statistical area or consolidated metropolitan statistical area.
Additional cities qualify as central cities if requirements
specified by the Office of Management and Budget are met
concerning population size and commuting patterns.
city A type of incorporated place in all U.S. states and the
District of Columbia.
county The primary legal division in every U.S. state except
Alaska and Louisiana.
incorporated place A type of governmental unit legally
incorporated as either a city, town, borough, or village.
These entities are created to provide government services
for a place that has legally delineated boundaries.
independent city An incorporated place that is not part of
any county.
metropolitan area (MA) A core area with a large population
nucleus, usually of about 50,000 people, together with
adjacent communities that have a high degree of economic
and social integration with that core. In New England, the
total MA population must number at least 75,000.
parish The primary legal subdivision of the state of Louisiana.
place A location where population is concentrated in
sufficient numbers to be either an incorporated place,
or a location that is delineated by the U.S. Census Bureau
for statistical purposes.
Overview of Counties
In the United States, each state is generally divided into
counties, which are smaller administrative districts. According to the U.S. Census Bureau, counties are the
primary legal division of states. The Census Bureau
also recognizes several county equivalents. A large quantity of data is collected and published for counties, which
makes them a good scale for local analysis. The boundaries
of counties remain relatively stable and this fact makes
county data ideal for comparisons from one time period to
another. However, the boundaries of counties do change
periodically, and this has implications for comparison of
data across time periods. The National Association
Table I County Equivalents, by geographical area and whether the designation was new for Census 2000: borough and census area (Alaska); municipality (Alaska, Northern Mariana Islands); municipio (Puerto Rico); parish (Louisiana); independent city (VA, MD, MO, NV).

[Figure: Map of Virginia and its neighbors (West Virginia, Maryland, D.C., Delaware, North Carolina) illustrating independent cities, including Charlottesville City within Albemarle County, Richmond City near Henrico County, and Emporia City within Greensville County.]
Overview of Cities
A city is a legally defined geographic entity that is incorporated. Incorporated places provide a range of public services to the population situated within their bounds.
Metropolitan Areas
Cities are often a part of larger metropolitan areas (MAs).
MAs are designated by the federal Office of Management
and Budget (OMB) according to standards published in
the Federal Register and were designed to be used solely
for statistical purposes. Metropolitan areas are urban
areas that are composed of cities and counties (or county
equivalents) that are closely connected economically and
socially. To date, several types of metropolitan areas have
been designated.
[Figure: Map of the Chicago–Gary–Lake County, IL–IN–WI CMSA on Lake Michigan, showing its component PMSAs (Chicago, IL; Aurora–Elgin, IL; Joliet, IL; Lake County; Gary–Hammond, IN) and the adjacent Milwaukee–Racine, WI CMSA, with state and county boundaries marked.]
Data Sources
County and city data are available through several
sources. Most city and county data are considered to
be secondary data because they are collected by an entity
other than the user. The most commonly used secondary
data sources can be divided into two basic categories:
public domain data and data collected by private or nongovernmental organizations or businesses. The federal
government of the United States is the largest and
most comprehensive source of public domain demographic and economic data for cities and counties. Within
the federal government, the U.S. Census Bureau is the
most important data source and the entity that is most
familiar to the public at large.
Print Sources
City and county data are provided in print, in several
important publications. The best known sources are
The County and City Data Book, the State and Metropolitan Data Book, and USA Counties. The U.S. Census
Bureau publishes each of these reference books. The
County and City Data Book has been published since
the 1940s and contains population and economic data
for all U.S. counties and for cities with 25,000 or more
inhabitants, and places of 2,500 or more inhabitants. The State and Metropolitan Data Book, which has been published since 1979, is a supplement to the Statistical Abstract of the United States. The State and Metropolitan Data Book summarizes social, economic, and political statistics
for states and metropolitan areas. It also serves as a guide
to other publications and data sources. USA Counties,
which includes data from 1982 on, provides over 5000
data items for counties. These data are gathered from
nine different federal and private agencies.
Internet Sources
Since the advent of the World Wide Web, city and county
data have become more widely available and much easier
to procure and manipulate for those with access to and
knowledge of the Internet. From the Census Bureau's website for Census 2000, American FactFinder,
county or city data can be downloaded for any number
of variables. The reports that can be generated may be
designed by the user to make geographical comparisons,
and a user may also generate a data report for a variety
of scales in the same report. For instance, a report can
be designed to provide data on the race and ethnicity breakdown of Chicago City, Cook County, Illinois, Chicago
MSA, and the state of Illinois. The same variables (and
others) could be generated for all the counties in Illinois
as well.
Most public libraries are equipped with Internet access, and may even have a librarian who is trained to assist
patrons with accessing census data from the World Wide
Web. State data centers may also make data available over
the Internet on their websites. There are countless other
sources of county and city data available on the Internet.
One of the easiest ways to begin is to use a gateway of some sort to narrow down or organize the
data sets. A gateway is a website that provides access to
a larger range of data or information from other Internet
sources.
Glossary
European Law Enforcement Organization (Europol) The
European Police Office.
International Criminal Police Organization (Interpol) An international organization, formed to facilitate police cooperation across national borders, that collects and publishes cross-national crime statistics based on crimes known to the police.
self-report surveys Surveys in which individuals are asked to
report on their commission of a variety of criminal acts.
United Nations Crime Surveys (UNCSs) Surveys conducted every 5 years (since the early 1970s) that collect
data on crime and criminal justice system operations in
certain member nations of the United Nations.
victimization surveys Surveys in which individuals are asked
to report on their experiences as victims of crime.
Introduction
Since the early 1800s, social scientists and policy makers
have been interested in measuring the amount of crime in
society in order to examine trends in crime over time and
differences in crime across jurisdictions, address the correlates of crime and develop theories on the causes of
crime, and create policies to deal with the crime problem.
This entry examines criminal justice records, which consist of data on crime collected and compiled by police
in criminal courts in England in 1805, and more standardized judicial statistics recording indictments and convictions for indictable (more serious) offenses were collected
annually in that country beginning in 1834. Early commentators on these statistics emphasized the importance
of exercising caution in interpreting them and using them
as measures of crime. For example, discussing an alleged
increase in crime in England and Wales in the late 1800s,
Morrison asserted in 1892 that it was not possible to determine through the examination of such statistics
whether crime was increasing or decreasing in England
due to the erratic and haphazard manner in which criminal statistics were collected and analyzed. Morrison
further noted that a primary cause of increases in
crime revealed by judicial statistics was changes in legislation that added offenses to criminal codes. At the same
time, decreases in crime could be attributed to the abolition of certain laws and to the greater reluctance of the
public and police to set the law in motion against trivial
offenders by reporting their offenses. The influence of
legislative changes on crime rates was also addressed
by du Cane, who noted in 1893 that offenses against the British Education Acts (which required parents to send their children to school), which were not legislatively mandated in England until 1870, totaled over 96,000 in 1890. In his work The Decrease of Crime, du Cane thus argued that an uninformed comparison of crime rates over the 1870–1890 period in England might conclude that
crime had increased, when in reality the increase was
attributable to an expansion in the definition of crime.
France published crime statistics, based on judicial
data, in 1827 (covering the year 1825). These early
crime statistics were part of the moral statistics movement
that emerged in several Western countries in the early
1800s. The collection of these statistics was also related to
the belief that the quantitative measurement techniques
that were being applied in the physical sciences could be
applied to the measurement of human phenomena. Based
on judicial statistics from France, Quetelet in 1842 examined the correlates of crime, with a particular focus on
gender and age. He also attempted to explain the causes of
crime, examining factors such as the relationship between
the consumption of alcohol and violent crime rates, the
relationship between poverty and relative inequality and
crime rates, and the relationship between the racial composition of the population and crime rates.
Interpol Data
The International Criminal Police Organization (Interpol) has collected international crime statistics based
on crimes known to the police annually since 1950. The
European Law Enforcement Organization (Europol),
formed in 1999, also serves to facilitate the sharing of
crime information among countries of the European
Union. Interpol crime data are accessible at the organization's Web site. Interpol's first report was issued in 1954
and included data for only 36 countries; subsequently,
data were published every 2 years and every year since
1993, with the inclusion of data on a greater number of
commercial enterprises, even when the risk of victimization to private citizens is the same.
prosecuted for; the gender and age of individuals prosecuted; the number of individuals convicted and acquitted;
the number sentenced to capital punishment and various
other sanctions; the number of prisoners, the length of
sentences they received, and prison demographics. It is
important to note, however, that these data are not complete for all countries responding to the surveys for all
years.
Conclusion
Criminal justice records have been collected and compiled since the early 1800s and have been used by social
scientists to study trends in crime and the correlates and
causes of crime. They have also been used by legislators to inform policies on crime. More recently, especially given
concerns about the increasing globalization of crime
and the possible lessons that can be learned from
cross-national comparisons of crime rates, statistics
Further Reading
du Cane, E. (1893). The decrease of crime. Nineteenth Century 33, 480–492.
Gurr, T. (1977). Contemporary crime in historical perspective: A comparative study of London, Stockholm, and Sydney. Ann. Am. Acad. Polit. Soc. Sci. 434, 114–136.
Interpol. (2003). International Crime Statistics. Available at: http://www.interpol.int
Lynch, J. (2002). Crime in international perspective. In Crime (J. Wilson and J. Petersilia, eds.), pp. 5–41. ICS Press, Oakland, CA.
Maltz, M. (1977). Crime statistics: A historical perspective. Crime Delinquency 23, 32–40.
Morrison, W. (1892). The increase of crime. Nineteenth Century 31, 950–957.
Mosher, C., Miethe, T., and Phillips, D. (2002). The Mismeasure of Crime. Sage, Thousand Oaks, CA.
Neapolitan, J. (1997). Cross-National Crime: A Research Review and Sourcebook. Greenwood Press, Westport, CT.
Newman, G. (ed.) (1999). Global Report on Crime and Justice. Oxford University Press, New York.
Quetelet, L. (1842). Treatise on Man and the Development of His Faculties. S. W. and R. Chambers, Edinburgh, Scotland.
Reichel, P. (2002). Comparative Criminal Justice Systems: A Topical Approach. Prentice-Hall, Upper Saddle River, NJ.
United Nations. (2002). The United Nations Crime Survey. Available at: http://www.uncjin.org
U.S. Bureau of Justice Statistics Web site. http://www.ojp.usdoj.gov/bjs
van Dijk, J., and Kangaspunta, K. (2000). Piecing Together the Cross-National Crime Puzzle. National Institute of Justice Journal, Washington, D.C.
Criminology
Chester L. Britt
Arizona State University West, Phoenix, Arizona, USA
Glossary
crime An act committed that violates a law and for which
there is a state-imposed sanction upon conviction.
criminal justice system The agencies responsible for the
enforcement of criminal laws, including the police, courts,
and corrections.
delinquency An act committed by a minor that violates a law
and for which the state may impose a sanction upon
adjudication.
domestic violence Causing or threatening harm to a member
of the family or household.
victimization Physical, psychological, or monetary harming of
an individual victim in a criminal incident.
Introduction
Criminology is a wide-ranging interdisciplinary field that
encompasses the study of crime and the criminal justice
system. Criminological research focuses on issues related
to the causes and consequences of crime, delinquency,
and victimization, as well as the operation of the criminal
justice system, with an emphasis on police, courts, and
corrections. Because of the wide range of topics studied
within criminology, researchers have required a variety of
different methods to address various topics. At the outset,
Self-Reports
Crime and Delinquency Self-report studies of crime
and delinquency were pioneered in the mid-1950s by researchers interested in examining the link between family
characteristics and delinquent behavior. A decade later,
Travis Hirschi systematically linked the testing of theories
of crime and delinquency to the collection and analysis of
self-reported delinquency data. The central feature of self-report surveys of crime and delinquency is the expectation that respondents to a questionnaire will respond to one or more questions about criminal or delinquent acts that they may have committed. In the early work by Hirschi, questions about delinquency were fairly simple and focused on less serious forms of crime. Hirschi's original delinquency scale, for instance, consisted of six items that asked whether the person had stolen something of little value (less than $2), stolen something of moderate value ($2 to $50), stolen something of great value (more than $50), hit or attacked another person, taken someone's car for a ride without the owner's permission, and damaged property that did not belong to them. In the years following Hirschi's work, many other large-scale self-report delinquency surveys that expanded on the list of criminal or delinquent acts were conducted. These surveys included much more serious offenses, such as robbery and sexual assault, than were found in Hirschi's original scale. Self-report surveys have become a staple in criminological research. In addition to investigator-initiated surveys specifically designed to test one or more theories of crime and delinquency, government agencies also conduct surveys. For example, the National Institute on Drug Abuse conducts the annual Household Survey of Drug Use, which is designed to provide annual and national estimates of drug use that can be linked to other background characteristics of the respondent.
Although it strikes many people as odd that respondents to a questionnaire would tell a stranger about crimes
that they had committed, a rather large body of evidence
indicates that people are generally quite willing to self-report their own illicit behavior. This willingness to
self-report crime and drug use appears to hold regardless
of the manner in which the questionnaire is administered.
For example, questionnaires administered in face-to-face
interviews and self-administered questionnaires administered anonymously and nonanonymously have shown the
same general patterns of responses. In other words, there
is no evidence of differential truthfulness that systematically varies by the format in which a questionnaire is
administered. Checks on the accuracy of responses also
indicate that respondents tend to be truthful in self-reporting the behavior in question, although it may not be
with perfect accuracy or recall.
One of the primary benefits to using the self-report
method is that it allows the researcher to gather extensive
a better understanding of the conditions or the circumstances that make a crime event more or less likely. As
victimization data have become more accessible to researchers, analyses have helped to further develop theoretical frameworks for understanding victimization risk.
Newer research relying on victimization survey data has
focused on the dynamics of crime, with a particular interest in the interactions among the victim, the offender,
and the social and physical space of the criminal event.
Life-History Calendars Researchers have begun including life-history calendars to expand on the self-report
method. Respondents are asked to review a calendar that
contains a designated number of days, weeks, months, or
years and to note the date or time when different events
occurred. One of the key advantages in the use of life-history calendars is the possibility for the researcher to establish causal order, although the researcher is clearly dependent on the accuracy of the respondent's memory.
Clearly, if the time period covered by the life-history
calendar is too great, the possibility of error is considerable. For example, one would not expect a respondent to
be able to remember specific dates several years in the
past. However, using a calendar that covered the past 6 to
12 months, one would have a reasonable expectation that
the respondent would correctly pick the month in which
some event (e.g., birth of a child, separation from a spouse)
occurred, which could then be compared to the timing of
a crime and/or arrest. For longer time periods, or events
that may have taken place in the more distant past,
a calendar simply using years of events would provide
the researcher with a rough approximation of the timing
of different events.
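One way to see how calendar data support causal ordering is to compare the recalled month of each life event with the month of an arrest. The sketch below uses invented responses over a hypothetical 12-month recall window.

    # Sketch: using life-history calendar data (hypothetical, 12-month recall
    # window) to establish temporal order between life events and an arrest.

    calendar = {  # respondent's recalled events, keyed by month (1-12)
        "separation from spouse": 3,
        "job loss": 5,
        "birth of a child": 9,
    }
    arrest_month = 7

    for event, month in sorted(calendar.items(), key=lambda kv: kv[1]):
        order = "before" if month < arrest_month else "after"
        print(f"{event} (month {month}) occurred {order} the arrest (month {arrest_month})")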
Official Sources
An official source of data on crime refers to information
collected by a government agency. The level of government may range from the city to the county, state, or
national level. In the United States, the primary source of official data on criminal behavior is the Federal Bureau of Investigation (FBI), which collects such data and publishes them in its annual Uniform Crime Reports (UCR). The UCR contains two different types of information on crimes that
researchers frequently use: crimes known to the police
and arrest reports. Crimes known to the police are used to
calculate the official crime rate and are based on monthly
reports submitted by thousands of police precincts across
the United States. These are crimes that have either been
reported to the police by citizens calling for police service
(either as a victim or as a witness to a crime) or been
discovered by the police while on duty. Arrest statistics
are based on arrest reports submitted by the arresting
agency to the FBI. Arrest statistics provide information
on the demographic characteristics (e.g., age, sex, and race) of persons arrested.
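Official crime rates computed from counts of crimes known to the police are conventionally expressed per 100,000 residents so that jurisdictions of different sizes can be compared. The sketch below shows the arithmetic with hypothetical counts and populations.

    # Sketch: official crime rate per 100,000 population, computed from
    # hypothetical counts of crimes known to the police.

    jurisdictions = {
        "City A": {"crimes_known": 4_200, "population": 350_000},
        "City B": {"crimes_known": 9_800, "population": 1_200_000},
    }

    for name, j in jurisdictions.items():
        rate = j["crimes_known"] / j["population"] * 100_000
        print(f"{name}: {rate:,.1f} crimes known per 100,000 residents")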
(http://www.superiorcourt.maricopa.gov), where individual cases may be accessed, providing a brief history of each case. Although there is relatively little personal
background information on the individual whose case
is being processed, precluding a detailed multivariate
analysis, the information provided makes tracing the
history of a case quite easy.
Experimental Methods
Experimental methods allow the researcher to manipulate
the treatment or intervention and the assignment of cases
to treatment and control groups. Although many disciplines routinely use experiments to test theories of behavior, the study of crime and criminal justice presents a more
challenging venue in which to conduct experimental research. Often, it is not ethical or practical to conduct experimental research in the study of crime. For example,
a theory might suggest that family environment is an important cause of criminal behavior, but it is not realistic to
think that one could randomly assign children to families in
order to test the impact of various family characteristics on
crime and delinquency. Yet, researchers have devised
a number of other creative ways of testing criminological
theory using experimental designs, even those that highlight the importance of family dynamics. More commonly, applications of experimental methods in criminology focus on some aspect of the criminal justice system, where cases can be randomly assigned to different treatments and to a control group without concern for violating an individual's rights or raising concerns about ethics violations or even public safety.
Experimental Research in Criminal Justice
Perhaps the most widely known use of experimental
methods in criminology comes from the 1984 work of
Sherman and Berk on the effect of arrest on domestic violence in Minneapolis, Minnesota. In their study, officers were asked to randomly assign the male partner involved in a domestic violence call to one of three
conditions: verbal warning, physical separation (removed
from the premises for a short time), or arrest. A follow-up
interview then attempted to measure whether rates of
domestic violence changed after intervention by the police. Sherman and Berk concluded that arrests had
a deterrent effect on the individual who had been arrested
and that individuals who had been arrested were less
likely to commit new domestic violence offenses than
those who had received only a verbal warning or been
separated. In the years since the Sherman and Berk paper
was published, there have been many reanalyses of their
data that call into question their conclusions, but their
basic design has provided the framework for a series of
replication studies in other jurisdictions. The use of experimental methods has also been common in the examination of the effectiveness of different types of
supervision for defendants on pretrial release or individuals serving community-based punishments, police intervention into areas with perceived higher rates of crime
(so-called hot spots), and jury decision-making.
Design Sensitivity in Criminal
Justice Experiments
Unfortunately, many of the experimental studies in criminal justice fields show a lack of an effect: the treatment condition does not appear to reduce the incidence of crime or to affect the outcome measured by the researcher. The lack of statistically significant effects of
the treatment on the outcome measure has been
a source of frustration for criminological researchers, because the design of many of these different studies is
usually sound and should provide a convincing and
straightforward way to test the validity of a theory.
Weisburd explains that experiments in the study of
some dimension of the criminal justice system are generally not sensitive enough to detect effects. In part, this is
due to the small samples used in many evaluation studies,
resulting in concerns about the statistical power of studies.
Unfortunately, criminal justice agencies often do not have
the personnel and other resources to perform a large-scale
experiment. A related issue concerns the administration
of the treatments. Weisburd notes that it is usually a small
number of individuals working within an agency who are
responsible for administering the treatment. Thus, when
one particularly highly motivated employee works hard at
accomplishing the goals of the program, the final analysis
may show an effect. Yet, if the entire staff is not so motivated, or does not consistently administer the treatment in
the way expected by the researcher, the chances of detecting a significant effect are low. The inconsistency of
treatment also helps to explain why a program may appear
to be effective in one jurisdiction but fail in another.
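Weisburd's point about sensitivity can be made concrete with a standard power calculation. The sketch below approximates the power of a two-group comparison using the normal approximation; the effect size and group sizes are hypothetical, not drawn from any study discussed here.

    # Sketch: approximate power of a two-sample mean comparison via the
    # normal approximation; effect size and sample sizes are hypothetical.
    from math import sqrt
    from statistics import NormalDist

    def power_two_sample(effect_size, n_per_group, alpha=0.05):
        """Approximate power for a two-sided two-sample mean comparison."""
        z = NormalDist()
        z_crit = z.inv_cdf(1 - alpha / 2)
        noncentrality = effect_size * sqrt(n_per_group / 2)
        return 1 - z.cdf(z_crit - noncentrality)

    for n in (25, 50, 200, 800):
        p = power_two_sample(effect_size=0.2, n_per_group=n)  # "small" effect
        print(f"n per group = {n:4d}  power = {p:.2f}")

With a small effect and 25 cases per group, power is near 0.10; detecting the same effect reliably requires samples far larger than most criminal justice agencies can field.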
Quasi-experimental Methods
One of the most common ways to assess the impact of
a change in policy or in the law is through the use of
quasi-experimental methods. In contrast to experimental
methods, where the treatment is manipulated by the researcher and the control (no treatment) group has been
randomly assigned, quasi-experimental methods use a
nonequivalent control group: it was not randomly assigned.
Researchers will try to ensure that the control (really a comparison) group is similar to the treatment group on as many dimensions as possible, but there is no assurance that the control group in a quasi-experimental study would react in the same way as a true control group. Although the use of a nonequivalent control group may appear to be problematic, in 1979 Cook and Campbell described the properties of a number of different quasi-experimental designs that have the potential to be just as convincing as a true experiment.
Assessing the Impact of a Change in Laws
The quasi-experimental approach is particularly useful in
assessing the impact of a law or policy change, because
one may compare two or more cities, states, or nations that
are similar, except for a change in law or policy that went
into effect in one location. For example, Britt and
co-workers examined the effectiveness of a change in
a gun law in Washington, DC in the late 1970s. According
to those who pushed for passage of the law, it was expected
to reduce the number of gun homicides in the DC metro
area. Britt et al.'s test of the impact of the change in the DC gun law used Baltimore as a control site, since
Baltimore was demographically similar to DC. Also important was that the two cities exhibited similar historical
trends in rates of homicides. Britt et al. concluded that the
change in gun law had no effect on homicides, since
a virtually identical pattern of homicides occurred in
Baltimore, where no change in gun laws had occurred.
Other research has used a similar approach to determine
the impact of changes in drunk-driving legislation on
drunk-driving fatalities.
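The comparison-site logic can be reduced to a simple difference-in-differences: the pre-to-post change in the treated city is compared with the change in the control city. The counts below are invented for illustration (Britt et al. actually used interrupted time-series designs, of which this two-period comparison is only the crudest version).

    # Sketch: difference-in-differences logic for a two-city quasi-experiment.
    # All homicide counts are hypothetical, not the actual study data.

    dc = {"pre": 220, "post": 190}         # treated site (law change)
    baltimore = {"pre": 210, "post": 182}  # comparison site (no law change)

    change_treated = dc["post"] - dc["pre"]
    change_control = baltimore["post"] - baltimore["pre"]
    did = change_treated - change_control

    print(f"Change in treated city: {change_treated:+d}")
    print(f"Change in control city: {change_control:+d}")
    print(f"Difference-in-differences estimate: {did:+d}")
    # A DID estimate near zero is consistent with "no effect of the law change."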
Observational Methods
Observational methods typically require the researcher to
follow along with the individual(s) being studied in order
to gain a fuller understanding of each person's behavior.
The researcher might take notes, conduct unstructured
or structured in-depth interviews, tape conversations, or
videotape activities. The close and regular contact between the researcher and the subject of the research allows for a relationship to develop that permits the
researcher access to information that would otherwise
be unavailable in a more standard survey or in archival
data. For example, a researcher who has spent several
hundred hours riding along with police officers in
a given department will build a rapport, and presumably
trust, with the officers over time. This allows the researcher to ask "why" questions after some event, which provide insight into the motivations of the officers being studied. Observational
methods, usually combined with in-depth interviewing,
have helped to produce some of the richest research on
the criminal justice system that continues to affect the way
that researchers view the operation of different agencies.
Police Behavior
In the area of policing research, the study of police
behavior, with its emphasis on the use of discretion,
often requires researchers to spend many hours of
ride-alongs with police. During the many hours spent
with police, researchers are able to collect detailed information about the conditions that place police and citizen
decision-making in a broader theoretical context. The
ability to observe repeated police–citizen encounters has allowed researchers to identify patterns in these encounters, which in turn has helped to generate theoretical explanations for the use of police discretion. Research continues to explore the links among characteristics of the suspect, the officer, the apparent victim, and the context of the police–citizen encounter and how these affect the chances of an officer making an arrest
in any given situation.
Courts
One of the classic studies of courtroom operations is provided in Eisenstein and Jacob's 1977 analysis of courts in
Baltimore, Chicago, and Detroit. In addition to using official archival data on cases processed in each of these
three jurisdictions, they spent many hours in courtrooms
taking notes on the activities and interactions occurring in
the courtroom. They followed their detailed observations
by interviewing judges, prosecutors, and defense attorneys to have each explain what was taking place in the
courtroom and why. Thus, where prior research and Eisenstein and Jacob's own analysis of archival data had been able to document statistical regularities in courtroom decision-making, such as the effect of pleading
guilty on the severity of punishment, the use of systematic
observation and in-depth interviewing was able to provide
an explanation for why this link existed.
Ethnographic Methods
An ethnography is essentially an attempt by the researcher to understand another world view or culture.
In contrast to the methods described above, ethnographic
research does not try to use information on one characteristic or variable to predict the value of another characteristic or variable. Instead, the intent is to try to
understand the reasons for behaviors, norms, and customs. The information gathered in an ethnography may
come from many different sources, including in-depth
interviews that may be structured or unstructured, content analysis of documents or other cultural forms, and
participant observation.
Mixed Methods
There may be situations where one is interested in the
relationship between two different concepts or variables
and has the opportunity to use multiple methods. Although earlier criminological studies used mixed
methods, it is only since the late 1980s that their use in
criminology has been taken seriously. When researchers
use both quantitative and qualitative methods to test hypotheses, the validity of the results is strengthened considerably if the answers to the research questions appear
to be similar. In those situations where the answers appear
to be different, the use of qualitative methods is likely to
point to important gaps in the quantitative method or the
extant theory on an issue. In light of these gaps, the qualitative methods then help to explain the observed pattern
of results or how the theory could be modified. Ulmer's 1997 study on the use of sentencing guidelines in three
different counties in Pennsylvania helps to illustrate how
in-depth interviews can inform quantitative findings.
After noting that conformity to sentencing guidelines varied significantly across the three counties, even after controlling statistically for predictors of sentence decisions,
he used in-depth interviews with key courtroom actors to
discover why there was a greater (or lesser) emphasis on
complying with the guidelines.
Summary
The range of topics and issues in the field of criminology
has required researchers to use a wide range of methods
to gain a fuller understanding of crime and the criminal
justice system. Again, it is important to note that there is
no single best method to use for the study of crime and
criminal justice; each method is aimed at answering
a different kind of question. Fundamentally, the most
important concern related to the use of different methods
in the study of crime and criminal justice is for the
research question to match the method used. The qualitative and quantitative methods outlined above have the
potential to answer different research questions or to
answer the same research question in different ways.
Consider the link between family structure and crime.
Using the self-report method, a researcher could gather
detailed survey data on characteristics of the respondent's family and self-reported criminal behavior and test for
a link between family structure and crime. A researcher
could also use data from the Uniform Crime Reports
to examine the link between crime rates and rates of
divorce or single-parent households at the city, the county, or the state level. Alternatively, a researcher could
conduct in-depth interviews with individuals confined
Further Reading
Agar, M. (1996). The Professional Stranger. Academic Press, San Diego, CA.
Britt, C., Kleck, G., and Bordua, D. (1996). A reassessment of the D.C. gun law: Some cautionary notes on the use of interrupted time series designs for policy impact assessment. Law Soc. Rev. 30, 361–380.
Cook, T., and Campbell, D. (1979). Quasi-Experiments: Design and Analysis for Field Settings. Houghton Mifflin, Boston, MA.
Eisenstein, J., and Jacob, H. (1977). Felony Justice. Little Brown, Boston, MA.
Fleisher, M. (1998). Dead End Kids: Gang Girls and the Boys They Know. University of Wisconsin Press, Madison, WI.
Harris, R. (1973). The Police Academy: An Inside View. Wiley, New York.
Hindelang, M., Hirschi, T., and Weis, J. (1981). Measuring Delinquency. Sage, Newbury Park, CA.
Hirschi, T. ([1969] 2002). Causes of Delinquency. Transaction Publishers, New Brunswick, NJ.
Marquart, J. (1986). Prison guards and the use of physical coercion as a mechanism of social control. Criminology 24, 347–366.
Sherman, L., and Berk, R. (1984). The specific deterrent effects of arrest for domestic assault. Am. Sociol. Rev. 49, 261–272.
Ulmer, J. (1997). Social Worlds of Sentencing. SUNY Press, Albany, NY.
Weisburd, D. (1991). Design sensitivity in criminal justice experiments. Crime Justice Annu. Rev. 17, 337–379.
Wolfgang, M., Figlio, R., and Sellin, T. (1972). Delinquency in a Birth Cohort. University of Chicago Press, Chicago, IL.
Yablonsky, L. (1959). The delinquent gang as near-group. Soc. Problems 7, 108–117.
Critical Views of
Performance Measurement
Barbara Townley
University of Edinburgh, Edinburgh, Scotland, United Kingdom
Glossary
discourse An accepted way of viewing or talking about a subject
or issue; a framing process operating culturally that circumscribes what may be said or thought about a particular issue.
epistemology The study and theory of the nature, origins,
objects, and limitations of knowledge.
New Public Management A generic term for changes in
public administration in many Organization for Economic
Cooperation and Development (OECD) countries at the
end of the 20th century; involves the introduction of
markets into public bureaucracies and monopolies, the
contracting out of public services and their management
through performance contracts and oversight agencies, and
the increased use of management and accounting controls.
ontology Theory of the nature of being or existence.
performance indicators Statistics, ratios, and other forms of
information that illuminate or measure progress in achieving the aims and objectives of an organization. They are indicative rather than precise and unambiguous, and they alert users to the need to examine an issue further.
performance measures Benchmarks designed to indicate the
economy (minimizing the cost of resources with regard to the quality of inputs), efficiency (the relationship between the output of goods and services and the resources used to produce them),
or effectiveness (relationship between the intended and
actual results) of a current or past activity, unit, organization,
etc. Measures should reflect consistency (over time and
between units), comparability (comparing like with like),
clarity (simple, well defined, and easily understood by those
being measured), comprehensiveness or centrality (reflecting
those elements important to the organization or program, but
also needing to be parsimonious, because too many measures
are problematic), and control (should measure only those
elements over which the individual or unit has control).
consumers or clients. In this sense, performance measures are designed to enhance citizen confidence in
producer and professional delivery. However, rather
than seeing the introduction of performance measures
as the progressive refinement of management techniques, a critical perspective on their use poses a
different set of questions: Why the current concern
with performance measurement? What are performance
measures a problematization of, or an answer to? Within
the public sector, the focus on performance measures
has accompanied state restructuring, with an increased
market component to public sector delivery, including
privatization and outsourcing. This is accompanied by
a discourse that emphasizes the importance of outcomes
of government activity, a refocus from process or procedural compliance, as the main criteria of organizational effectiveness. Within this framework, performance
measures are designed to act as substitutes for market
forces, to achieve organizational decentralization and
managerial flexibility while at the same time ensuring
there is effective service delivery. Thus a critical perspective alerts us to the economic and political contexts
in which appeals to the use of performance measures
have more salience. It does so in the belief that an
awareness of these issues helps in understanding the
reception of, and responses to, their introduction.
Unintended Consequences
The act of measurement is designed to influence behavior.
Performance measures do indeed influence behavior,
but their introduction does not necessarily ensure that
the behavior that results is the desired behavior, hence
the reference to their unintended consequences.
Problems arise with an implicit assumption that influence is synonymous with eliciting desired behavior, or
control. The following discussions present some of the
dysfunctional consequences (tunnel vision, goal displacement, suboptimization, myopia, ossification, gaming, and
misrepresentation) that have been observed with the introduction of performance measures at both individual
and organizational levels.
Tunnel vision occurs when there is a focus on the
measures to the exclusion of other aspects of operations
or organizational functioning. This sometimes results in
goal displacement, whereby individuals focus on behavior
that tries to affect the measure of the performance, rather
than the performance. For example, maintaining a good
record becomes more important than any notion of service to clients, and managerial activity is directed to the
appearance, rather than the substance, of good performance. Suboptimization is characterized by a focus on the
unit to which the measures apply in order to achieve those measures,
regardless of whether a unit focus may detract from the
effectiveness of the organizational performance as a whole
or, indeed, cause problems for the achievement of organizational objectives. Similarly, myopia is the focus on
achieving short-term objectives or measures at the expense of longer term needs.
Once performance measures have been established for
an organization or unit there is a danger that these measures then constrain behavior to the extent that new or
innovative methods or practices are eschewed for fear that
measures may not be reached. This can lead to ossification
of practices and ossification of measures, because they
reflect criteria that were important at the time of their
creation but may not reflect current concerns or objectives. Performance measures may also introduce a climate
of competitiveness and fear, which may in turn introduce
an element of gaming, or the search for strategic advantage as units attempt to achieve their targets, even if this
is to the detriment of other groups or units within the organization.
Representations condense, simplify, and translate an activity located in a specific time and space and enable it to
be controlled at a distance. A three-dimensional world is
reduced to a two-dimensional representation. Any organization needs to use performance measures as a form of
organizational intelligence, but as with any system of
representation, a distinction must be drawn between an
abstract representative system and that which it is claimed
to represent.
One danger lies in the reification of the proxy or model,
i.e., the model, or that which is measured, is then assumed
to be the real. The reliance on measurement implies
that the measurement is of something that is an intrinsic or
objective property of the system, rather than a construct of
the measurement process. For example, the construction
of performance measures tends to focus on those things
that are most easily demonstrated and measured, especially if measures are designed to be comparable and
conform to ideals of replicability (i.e., they can be reproduced over time), portability (they can be transmitted or
reported to other locations or organizations), and calculability (the ease with which they may be combined with
other measures to give overall measures). Because something is easily demonstrated and measured, it does not
necessarily capture what is considered to be important in
the organization, or give a complete picture of organizational activities. For example, an emphasis on traditional
crime statistics (measures of crimes committed, detection and arrest, etc.) focuses specifically on law enforcement and downplays police service roles in crime
prevention and victim support, both of which are more
difficult to measure.
Because of the failure to capture the whole picture in a few measures, there is a tendency for performance measures to expand to take into account all phases of operations and, inevitably, to grow in complexity. An already burgeoning bureaucratization grows as systems are introduced to ensure that all
activities that are engaged in are recorded. As these become too complicated for a clear-cut view of performance,
some pieces of information are disregarded and others
are singled out. Core functions become the things that
get measured, others may get downgraded. Over time,
however, this can lead to a shift in, or loss of, organizational capabilities and a redefinition of organizational
identity. Activities conform to the constructions demanded by the management technique. These changes
are subtle over time, but they are introduced through
measures, not through debate and agreement.
A degree of discrepancy can emerge between managerial and practitioner models, between official knowledge and practice. Measures, simplified versions of
specialist knowledge and complex practice, come to
represent the real. A shadow rationalization, an abstracted form of representation that reflects institutional
Epistemological and Ontological
Foundations
It is useful to examine in more depth some of the assumptions that sustain performance measures, by addressing
two questions: What is the underlying model that supports
them and what are the epistemological assumptions that
need to be in place to sustain performance measurement
as a reasonable practice?
There is a heavy reliance on analysis, the reduction of
entities to their component parts. Implicit within this
model is a reductionism and decomposition, an assumption that elements can be reduced to their component
parts, that an analysis of component parts will lead to
their synthesis, and that the whole is the sum of its
parts. Some of the difficulties with this model are illustrated by the practical problems of trying to identify performance measures. Mention has already been made of
the bureaucratic impact of performance measures. This is in part because of the plethora of terms that are available to
capture activities. Reference may be made to outputs,
intermediate outputs, final outputs, outcomes, intermediate outcomes, etc. These may be supplemented by core
measures, management measures, indicators, activity-based measures, efficiency measures, etc. It is this that
leads to a reported proliferation of measures that accompany these systems over a period of time.
It also, in part, reflects a confusion as to what performance is, whether it is economy, efficiency, effectiveness, outputs, quality, customer service, social results,
etc. The balanced scorecard, for example, was devised
in response to dissatisfaction with reliance solely on accounting measures in organizations, on the grounds that
these did not give an adequate reflection of organizational
performance. Measures should therefore reflect a broader
range of considerations, not only financial ones, and
should thus incorporate customer perspectives, internal
business perspectives, innovation, and learning. There is,
however, no limit as to what may be reported.
This proliferation is the inevitable outcome of trying to
stabilize what are after all complex organizational and
social interactions occurring through time and space.
As Mintzberg observed, organizations "can certainly define on paper whatever they choose . . . but what have these labels to do with real phenomena and working processes?" Mintzberg continues that it is a major fallacy of this approach to assume that a phenomenon has been captured because it is written down, labeled, and put in a box, ideally represented by numbers.
There is concomitantly a high degree of formalization attached to the operation of performance measures: the assumption of carefully delineated steps (mission statements, objectives, targets, performance measures) executed in sequential order, with their integration secured through rules and procedures. This, however, assumes a degree of tight coupling between the various elements of organizational activities and their overall objectives. In many areas, this degree of tight coupling is difficult to identify, as, for example, between school activity and educational levels, or between hospital activity and general health. It is for these reasons that identified targets often do not relate to organizational plans. A related assumption is
that organizational units and interorganizational networks
stay tied together by means of controls in the forms
of incentives and measures. Undue emphasis on this, of
course, is in danger of underestimating the role of values
in securing organizational objectives and integration.
Performance measures place a heavy emphasis on both
objectivity, i.e., that measures reflect what is "out there," and quantification, i.e., the importance of facts and hard data.
There is sometimes the assumption that this may substitute for, rather than guide, social and political issues in
judgment and decision making. Imagine the following
vignette: Two states or governments want to measure
the effectiveness of public schools in relation to the educational attainment of students. The performance measures in both states indicate poor educational attainment
by students in state schools in comparison to private
sector education. With the same information, the decision-making process may be diametrically opposed.
Performance Measures:
A Technical or Political Tool?
A critical perspective on performance measures has
a guiding axiom: it is not that everything is bad, it is
that everything is (potentially) dangerous. Any socially
coordinated system requires some mechanisms that can
act as proxies for activities between their location and
decision-making centers. In this sense, measures provide
an invaluable coordinating role, notwithstanding some of
their obvious drawbacks. However, there is a danger of
viewing organizations and the managerial techniques
within a rational or technocratic framework; that is, that
organizations are rational systems and that procedures or
systems used in their coordination are purely a technical
mechanism. Perhaps a main conclusion that may be
drawn from the studies of their use is that there is
a danger in seeing performance measures as merely technical devices, a transparent snapshot of activity. Rather, it
becomes important to be aware of their political role, not
in a self-interested sense as politics is so often understood,
but in terms of a recognition of the values and perspectives
that measures represent.
It seems important when using and interpreting performance measures to be cognizant of several questions:
What are the origins of the criteria used in the measures? That is, what are the measures and how were they
devised?
Who chose or set the measures, and to what purpose?
Which groups or whose perspectives are being
served by the measures? Which groups or whose perspectives are being silenced by these particular measures?
What is it hoped to achieve by recourse to these
measures?
Further Reading
Carter, N., Klein, R., and Day, P. (1992). How Organizations
Measure Success. Routledge, London.
De Bruijn, H. (2002). Managing Performance in the Public
Sector. Routledge, London.
Kaplan, R. S., and Norton, D. P. (1992). The balanced
scorecard: measures that drive performance. Harvard Bus.
Rev. 70(1), 71–79.
Cross-Cultural Data
Applicability and
Comparisons
Murray J. Leaf
University of Texas at Dallas, Richardson, Texas, USA
Glossary
cross-cultural comparison Similar cultural ideas, behaviors,
or institutions compared across different communities.
culture The ideas, values, artifacts, and organizational arrangements that human beings develop interactively and
disseminate by means of communication and example.
cultural information system A system of mutually interrelated ideas established in a community and referred to in
communications and symbolic interactions.
information In the mathematical theory of information, the information potential of a message varies inversely with its probability; the lower its probability, the higher its information potential. The information potential of a message source corresponds to its entropy, or degree of randomness (see the expressions following this glossary).
system A collection of elements in which each element has
a determinate relation to every other element.
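The "information" entry above can be restated in Shannon's terms (see Shannon and Weaver in the Further Reading). The expressions below are a standard restatement, not part of the original glossary: the information content of a message m with probability p(m), and the entropy H of a source emitting messages with probabilities p_i.

    % Shannon information content of a message and entropy of a source
    I(m) = \log_2 \frac{1}{p(m)} \qquad H = -\sum_{i} p_i \log_2 p_i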
Cultural Systems
Although anthropologists commonly describe culture as "learned," better operational guidance is achieved by thinking of culture as "taught." The idea of "learned" directs attention to what is largely personal and subjective. "Taught" directs attention to what is observable and public. If what the people in a community teach each other is
traced out, it can be seen to fall into a number of quite
distinct bundles. Each bundle involves a distinctive set of
interrelated ideas. Some of these bundles are shared by
everyone within a community, some by most in the community, and some by very few. For a small community,
such as a New Guinea tribe or a Punjab village, the total
number of systems is commonly small, perhaps six to eight
in general consensus and 20 to 30 overall. For a nation-state, the total number of systems shared by absolutely
forms for social relationships that they carry over into the
others. Kinship utilizes ideas of relations established by
birth and marriage and, usually, persisting throughout
life. Such relationships are always defined in reciprocal
pairs: if I am an X to you, you must be a Y to me. As
a logical concomitant of the reciprocal character of kinship definitions, kinship systems are usually described and
can always be graphically represented as ego centered.
That is, the definitions of a position to be occupied by all
possible relatives are framed in relation to a specific "I" or "self." A kinship system does not provide a definition for a father or "father" in general, but for "my father"; not "uncle" in general, but "my uncle," and so on. The person who is "my father" at birth will remain "my father" till death (in most systems), and if X is "father" to me, then I am "son" or "daughter" to him. The specificity of the definitions
permits precise reasoning from one set of relations
to others in the system. For example, brothers (in the English/American system) will have exactly the same people as uncles and exactly the same person as father. But cousins will have all but one uncle in common, and one cousin's uncle will be the other cousin's father, among other things. Additional kinship ideas are attached to the
basic relational ideas. These commonly include rules for
marriage, adoption, inheritance, marital residence, and
definitions of what constitutes a household.
Organizations that are viewed as governments, professional associations, or voluntary associations are not
usually defined by the idea of relationships from birth,
relations as reciprocal pairs, or a central self or ego. They
use instead the basics of an office. An office is a position that a given individual may occupy for a time but can or must leave, to be replaced. Unlike the ideas of kinship relations, this requires making an absolute separation between the social position and the person who occupies the position; in this sense, a much more fully developed cognitive logic, in Piaget's sense, is embodied. Offices are defined in such a way as to be the same for all comers. The President of the United States is not president only in relation to citizens, for example, but also in relation to all other defined or possible political roles or relations.
Systems of ideas that define government are more
standardized around the world than are those that define
kinship, and hence lend themselves more readily both to
holistic comparison (constitutional analysis) and to comparison of selected items of data created by the application
of the definitional categories. The feature that most marks
such systems as governmental is lawful power. The positions are defined in such a way that the one can command
the many. One or a few people occupying certain positions
may properly give orders to any or all of those occupying
other positions, and may command force (which is to say,
people occupying other positions) to compel obedience.
By contrast, the distinguishing feature of professional
associations is that their offices are divided between
Conclusion
International comparative studies now have available
a great deal of data. National censuses are increasingly
available on-line. Numerous international governmental
and nongovernmental agencies are supervising the collection of large amounts of internationally standardized
data on a wide range of important matters including law,
labor, health, agriculture, the environment, and virtually
all commercial activities important for international
Further Reading
Bernard, H. R. (1994). Research Methods in Anthropology:
Qualitative and Quantitative Approaches. Sage, Beverly Hills.
Brokensha, D., Warren, D. M., and Werner, O. (1980).
Indigenous Knowledge Systems and Development.
University Press of America, Lanham, Maryland.
Feinberg, R., and Ottenheimer, M. (eds.) (2001). The Cultural
Analysis of Kinship: The Legacy of David M. Schneider.
University of Illinois Press, Urbana.
Fischer, M. (2002). Integrating anthropological approaches to
the study of culture: The hard and the soft. Cybernetics
and Systems, Vol. 1. Proc. 16th Eur. Mtg. Cybernet. Syst.
Res., Vienna (R. Trappl, ed.), pp. 3767-3772. Austrian
Society for Cybernetic Studies, Vienna.
Krippendorff, K. (1980). Content Analysis: An Introduction to
its Methodology. Sage, Beverly Hills.
Leaf, M. J. (1972). Information and Behavior in a Sikh Village.
University of California Press, Berkeley and Los Angeles.
Neale, W. C. (1990). Developing Rural India: Policies, Politics
and Progress. Perspectives on Asian and African Development #3. Riverdale Publ., Maryland.
North, D. C. (1990). Institutions, Institutional Change and
Economic Performance. Cambridge University Press,
Cambridge.
Ostrom, E., Shroeder, L., and Wynne, S. (1993). Institutional
Incentives and Sustainable Development: Infrastructure
Policies in Perspective. Westview Press, Boulder.
Salzman, P. C. (2001). Understanding Culture: An Introduction to Anthropological Theory. Waveland, Prospect
Heights, Illinois.
Shannon, C. E., and Weaver, W. (1949). The Mathematical
Theory of Communication. University of Illinois Press, Urbana.
Werner, O. (1993). Short take 11: Constructed folk definitions
from interviews. Cult. Anthropol. Methods J. 5(3), 4-7.
Glossary
competence Knowledge; the probability of knowing a correct
answer (without guessing and not by chance).
cultural beliefs Learned and shared beliefs.
cultural competence How much an individual knows or
shares group beliefs. In the formal model, this is the
proportion of answers an individual knows; in the informal
model, this is the correspondence between the responses of
an individual and those of the group.
formal model A formal process model of how questions are
answered. The model proceeds from axioms and uses
mathematical proofs to arrive at estimates of competence and answers to a series of questions. It can only
accommodate categorical-type responses.
informal model A variation of the consensus model that
estimates the relative level of competency in a sample from
the pattern of correlations between individuals' responses.
It can accommodate interval-scale or fully ranked responses.
performance The probability of a correct answer due to
knowledge and chance/guessing.
Introduction
Cultural beliefs are beliefs that are learned and shared
across groups of people. Because the amount of information in a culture is too large for any one individual to
The joint probabilities of the responses of subjects i and j to item k are

$$
\begin{aligned}
\Pr(X_{ik}=1,\,X_{jk}=1) &= p\,\bigl[D_i+(1-D_i)g_i\bigr]\bigl[D_j+(1-D_j)g_j\bigr] + (1-p)\,\bigl[(1-D_i)g_i\bigr]\bigl[(1-D_j)g_j\bigr],\\
\Pr(X_{ik}=1,\,X_{jk}=0) &= p\,\bigl[D_i+(1-D_i)g_i\bigr]\bigl[1-\bigl(D_j+(1-D_j)g_j\bigr)\bigr] + (1-p)\,\bigl[(1-D_i)g_i\bigr]\bigl[1-(1-D_j)g_j\bigr],\\
\Pr(X_{ik}=0,\,X_{jk}=1) &= p\,\bigl[1-\bigl(D_i+(1-D_i)g_i\bigr)\bigr]\bigl[D_j+(1-D_j)g_j\bigr] + (1-p)\,\bigl[1-(1-D_i)g_i\bigr]\bigl[(1-D_j)g_j\bigr],\\
\Pr(X_{ik}=0,\,X_{jk}=0) &= p\,\bigl[1-\bigl(D_i+(1-D_i)g_i\bigr)\bigr]\bigl[1-\bigl(D_j+(1-D_j)g_j\bigr)\bigr] + (1-p)\,\bigl[1-(1-D_i)g_i\bigr]\bigl[1-(1-D_j)g_j\bigr].
\end{aligned}
\tag{4}
$$
Dividing the covariance between the responses of subjects i and j by p(1 - p) yields the product of their competencies,

$$
\frac{C_{ij}}{p(1-p)} \;=\; D_i D_j,
$$

so that, apart from its unknown diagonal, the subject-by-subject matrix of these values is the outer product of the competence vector with itself:

$$
\begin{bmatrix} D_1 \\ D_2 \\ D_3 \\ \vdots \\ D_n \end{bmatrix}
\begin{bmatrix} D_1 & D_2 & D_3 & \cdots & D_n \end{bmatrix}.
$$
The likelihood of the observed responses to item k, given that the correct answer is 1, is

$$
\Pr(\mathbf{X}_k \mid Z_k = 1) \;=\; \prod_{i=1}^{N} \bigl[D_i + (1-D_i)g_i\bigr]^{X_{ik}} \bigl[(1-D_i)(1-g_i)\bigr]^{1-X_{ik}},
\tag{7}
$$

and for a correct answer of 0 is

$$
\Pr(\mathbf{X}_k \mid Z_k = 0) \;=\; \prod_{i=1}^{N} \bigl[(1-D_i)g_i\bigr]^{X_{ik}} \bigl[D_i + (1-D_i)(1-g_i)\bigr]^{1-X_{ik}}.
\tag{8}
$$
Bayes' theorem can then be used to estimate the posterior
probability that an answer is 1, given the responses:
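$$
\Pr(Z_k = 1 \mid \mathbf{X}_k) \;=\; \frac{p \,\Pr(\mathbf{X}_k \mid Z_k = 1)}{p \,\Pr(\mathbf{X}_k \mid Z_k = 1) + (1-p)\,\Pr(\mathbf{X}_k \mid Z_k = 0)},
$$

where $\mathbf{X}_k$ denotes the vector of responses to item $k$, the two likelihoods are those of Eqs. (7) and (8), and $p$ is the prior probability that the answer is 1; this is simply the standard form of Bayes' theorem applied to those quantities.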
The person-by-person agreement matrix may be factored with a minimum residual factoring method
(principal axis factoring or iterated principal factor
analysis without rotation) to solve for the unknown
competence values on the main diagonal. Because there
are two unknowns (Di and Dj) in each equation, three
or more subjects are required to find a solution. A
goodness-of-fit rule is used to determine the dimensionality of the solution. If the ratio of the first to second
eigenvalues is at least 3:1, then it is assumed that
the data contain only a single dimension. This is equivalent to testing the first assumption of the consensus
model, namely, that there is only a single answer key
applicable to all the subjects. Also, all competencies
should be positive (0 ≤ Di ≤ 1.0).
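As an illustration of this factoring step, the following is a minimal sketch in Python using only NumPy; the function name and the simple iteration scheme are illustrative scaffolding, not code from the consensus-analysis literature.

```python
import numpy as np

def consensus_competence(C, iters=100):
    """Estimate competencies from a subject-by-subject correlation
    matrix C via iterated principal-factor analysis (no rotation)."""
    A = np.array(C, dtype=float)
    # Initial communality estimates: largest absolute off-diagonal per row.
    off = np.abs(A - np.diag(np.diag(A)))
    np.fill_diagonal(A, off.max(axis=1))
    for _ in range(iters):
        vals, vecs = np.linalg.eigh(A)            # eigenvalues ascending
        d = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        if d.sum() < 0:                           # resolve sign indeterminacy
            d = -d
        np.fill_diagonal(A, d ** 2)               # re-estimate the diagonal
    ratio = vals[-1] / vals[-2]                   # first:second eigenvalue
    return d, ratio                               # d[i] estimates D_i
```

With three or more subjects and a single answer key, the returned loadings should all lie between 0 and 1 and the eigenvalue ratio should be at least 3:1.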
Sample Size
Sample size determination in a consensus analysis is similar to other types of analyses; namely, when variability is
low, power is high and small samples will suffice. Here,
variability is a function of the concordance (competence)
among subjects. Three parameters are needed in order to
estimate the number of subjects necessary: the average
competence level of the group being studied, the minimum proportion of items to be correctly classified, and the
desired confidence level (Bayesian posterior probability)
in those classifications. When planning a study, it is advisable to use conservative estimates for these parameters.
For example, with relatively low agreement (average competence level of 0.50), a high proportion of items to be
correctly classified (0.95), and high confidence (0.999),
a minimum sample size of 29 is necessary. For higher
levels of competence and lower levels of accuracy and
confidence, smaller sample sizes suffice.
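The interplay of these parameters can be checked by simulation. The sketch below is illustrative only (it assumes a guessing probability g = 0.5 and prior p = 0.5, values not given in the text, and the published planning tables should be used for actual studies); it estimates the proportion of items classified with at least the desired confidence for a given sample size and competence level.

```python
import numpy as np

rng = np.random.default_rng(0)

def prop_confident(n, D, conf=0.999, items=20_000, p=0.5, g=0.5):
    """Share of items whose true answer reaches posterior >= conf."""
    truth = rng.random(items) < p                  # true answers Z_k
    knows = rng.random((n, items)) < D             # who knows each item
    guess = rng.random((n, items)) < g             # guesses otherwise
    X = np.where(knows, truth, guess)              # observed responses
    pr1 = np.where(X, D + (1 - D) * g, (1 - D) * (1 - g))  # Pr(resp | Z=1)
    pr0 = np.where(X, (1 - D) * g, D + (1 - D) * (1 - g))  # Pr(resp | Z=0)
    l1, l0 = pr1.prod(axis=0), pr0.prod(axis=0)
    post = np.where(truth, p * l1, (1 - p) * l0) / (p * l1 + (1 - p) * l0)
    return (post >= conf).mean()

# prop_confident(29, 0.50) should land in the neighborhood of 0.95.
```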
Related Analyses
This model closely parallels the psychometric test model,
with some exceptions. First, the focus of a consensus analysis is on the subjects as the unit of the analysis and not on
the items. Second, the answers are unknown, and so the
analysis estimates the answers and correspondence with
the answers. Third, the answers are estimated with
a weighted aggregation: a linear combination of subject
Summary
The cultural consensus model provides estimates for the
answers to a series of questions, when the answers are
unknown, and estimates individual competence on the
questions. The formal model can accommodate multiple-choice-type questions, and the informal version can accommodate fully ranked or interval response data. The
formal version of the consensus model proceeds from
axioms and uses mathematical derivations to solve for
competence from observed agreement. The confidence
(probability) in each answer is calculated using the competence scores to adjust the prior probabilities. In the
informal version of the model, a subject-by-subject correlation matrix is factored to estimate both the competencies (the first factor loadings) and the answers (the first
Further Reading
Baer, R. D., Weller, S. C., Pachter, L. M., Trotter, R. T.,
de Alba Garcia, J. E., et al. (1999). Beliefs about AIDS in
five Latin and Anglo-American populations: The role of the
biomedical model. Anthropol. Med. 6, 13-29.
Baer, R. D., Weller, S. C., Pachter, L. M., Trotter, R. T.,
de Alba Garcia, J. E., et al. (1999). Cross-cultural perspectives on the common cold: Data from five populations.
Human Org. 58, 251-260.
Batchelder, W. H., and Romney, A. K. (1986). The statistical
analysis of a general Condorcet model for dichotomous
choice situations. In Information Pooling and Group
Decision Making (G. Grofman and G. Owen, eds.),
pp. 103-112. JAI Press, CT.
Batchelder, W. H., and Romney, A. K. (1988). Test theory
without an answer key. Psychometrika 53(1), 71-92.
Boster, J. S. (1986). Exchange of varieties and information
between Aguaruna manioc cultivators. Am. Anthropol. 88,
429-436.
Chavez, L. R. H., Hubbell, F. A., McMullin, J. M.,
Martinez, R. G., and Mishra, S. I. (1995). Structure and
meaning in models of breast and cervical cancer risk
factors: A comparison of perceptions among Latinas, Anglo
women, and physicians. Med. Anthropol. Q. 9, 40-74.
Dressler, W. W., Balieiro, M. C., and Dos Santos, J. E. (1997). The
cultural construction of social support in Brazil: Associations
Data Collection in
Developing Countries
Emmanuela Gakidou
Center for Basic Research in the Social Sciences, Harvard University, USA
Margaret Hogan
World Health Organization, Geneva, Switzerland
Glossary
administrative data Information on indicators of performance and service delivery reported by governments or
government agencies at a national or subnational level.
census An enumeration of people, houses, firms, or other
important items in a country or region at a particular time.
Used alone, the term usually refers to a population census.
developing country Generally, a country that has a per
capita income below a certain threshold; this is usually the
case in Latin America and the Caribbean, Africa, Europe
(not including member states of the European Union or
European Free Trade Area), the Middle East, and Asia (not
including Japan, Australia, and New Zealand).
microdata Information from surveys or censuses at the
individual level, i.e. unaggregated original data; not always
available due to confidentiality concerns.

Introduction

There are four major sources of data in developing countries: censuses, household surveys, administrative data,
and vital registration systems. Each source has built-in
advantages, biases, and problems. Censuses, the preferred source of microdata, are the most commonly available
data source but are conducted infrequently, making social
measurements somewhat difficult. Household surveys are
most important in developing countries. Administrative
data may be complicated by a variety of biases, and complete vital registration systems exist in only a small number
of developing countries. Meaningful analyses of all of the
data sources require thorough assessment of the problems
and usefulness of each type.
Census Availability
In both developed and developing countries, censuses are
the foundation of microdata analysis for many social measurement tasks. Almost every country has some sort of
census initiative; in the period from 1995 to 2004, approximately 85% of countries around the world had or planned
to have a census. The U.S. Census Bureau has a complete
record of the censuses that have been conducted in
countries and areas of the world since 1945.
Limitations of Census
Data Analysis
Although census microdata offer a wealth of information
for analysis, there are some drawbacks associated with
Household Surveys
The most important source of data for developing countries is household surveys. When implemented with an
appropriate sampling plan, household surveys can overcome the significant challenges of selection bias inherent
in most administrative data systems. Contrasted with censuses, the smaller scale of household surveys allows for
longer interviews and smaller pools of more thoroughly
trained interviewer staff, which can result in higher-quality data on a more comprehensive set of topics. The
smaller sample size of household surveys also means
that the surveys can be carried out regularly, allowing
for more immediate feedback on program effectiveness
for policymakers. National statistical offices often carry
out household surveys on an annual or other frequent
periodic basis.
The national household surveys are often part of multicountry efforts using common instruments. Some of the
major programs are the World Fertility Survey, the Living
589
590
Sample Design
Expertise in designing a sample frame and sampling is
essential to the successful implementation of a household
survey. Many household surveys, particularly in countries
without recent national censuses, must deal with fundamental challenges in the design of the sample. The science
of survey sampling is well developed, with widely recognized standards of quality, but the implementation of the
sample design may be problematic. In the process of implementation, there are several opportunities for creating
selection bias. Older members of the household may not
want to be the primary interview respondent even if chosen by Kish tables or a similar approach. For reasons of
cost, interviewer teams may not be able to return to households multiple times in remote areas if respondents were
not available or if follow-up is required.
Post-hoc evaluation of the sample age, sex, and education distribution for many surveys can reveal how far
away a sample is from the actual age and sex distribution of
the population. This problem is particularly pronounced
in conflict or postconflict settings, but may be an important issue in all low-income countries as well. It is an
unfortunate situation that these countries are perhaps
most in need of reliable demographic data. For middle-income developing countries, as for developed countries,
there is also the problem of institutionalized individuals;
this can be very important, particularly because individuals are often institutionalized because they are not like
the rest of the population. The absence of elderly, sick, or
mentally handicapped household members in a household sample can bias population figures dramatically.
Cross-Population Comparability
Comparability has become an important challenge in
modern survey instrument development. Comparability
is required not only across countries, but also within
countries over time, or across different subpopulations
delineated by age, sex, education, income, or other
characteristics.
The fundamental challenge in seeking cross-population comparable measures is that the most accessible sources of data are categorical self-reported data.
When categorical data are used as the basis for understanding quantities that are determined on a continuous,
cardinal scale, the problem of cross-population comparability emerges from differences in the way different
individuals use categorical response scales. Efforts to ensure linguistic equivalence of questions across different
settings may improve the psychometric properties of
these questions in terms of traditional criteria such as
reliability and within-population validity, but they will
not resolve problems stemming from noncomparability
in the interpretation and use of response categories.
Thus, cross-population comparability represents a more
stringent criterion for evaluation of measurement instruments, beyond the traditional concepts of reliability and
validity.
Recent advances in survey design, including work at
the World Health Organization, have resulted in two main
strategies to correct the problem of differential use of
categorical response scales. The first strategy is to establish a scale that is strictly comparable across individuals
and populations. Measurements on the comparable scale
can then be used to establish the response-category cut-points for each survey item. The second approach is to get
categorical responses from different groups for a fixed
level on the latent scale. If the level is fixed, variation
in the responses provides information on the differences
in cut-points across individuals and populations. This
strategy usually involves the introduction of short
stories (vignettes) into survey instruments that allow
analysts to calibrate people's responses to their perceived
cut-points on a latent variable scale. Vignettes are
(usually brief) descriptions of hypothetical people or
situations that survey researchers can use to correct
Administrative Data
Some areas of social measurement depend heavily on
administrative data collected by service providers.
Well-known examples include primary school enrollment
data collected from primary schools and aggregated up to
the national level, and extensive information collected
from public health service providers on the delivery of
various interventions, such as immunizations or the Directly Observed Treatment, Short-course (DOTS) strategy for tuberculosis. These types of data can often be useful,
especially because they may be the only source of data
for a particular field of social measurement for several
countries.
Administrative data, however, have to be used with
caution, and there are at least four fundamental problems
associated with them. First, there is the problem of
denominators: the groups in need of service may not access the service in the first place (e.g., the poor or marginalized groups). For some measures, the problem of
denominators may not be a significant issue when census
data can be used to define the target group, but for other
areas, such as health, defining who needs a service in an
area may be much more challenging. The second problem
involves selection bias: the individuals who receive an
intervention or service in a community are likely to be
a nonrandom sample of the population. This creates many
analytical challenges for using such data to understand
what is happening in a population. Third, there are problems of collation and aggregation: data from different
levels of an administrative system often do not get communicated, or get only partially communicated, to the next level
up in the system. The result is that aggregations for
provinces or for the entire country are often based on
incomplete returns from various levels. This limits comparability across years and across units of study. A final
problem is that, in many countries, financial incentives are
paid based on the performance results reported in the administrative data. Even when incentives are not paid, other nonfinancial incentives may lead to inflated figures. This often
means that administrative data tend to exaggerate service
delivery. An important challenge for social measurement
is to figure out how to deal with these issues in data analysis and how to correct administrative data for known
biases.
Conclusions
Although there are many possibilities for data analysis in
developing countries, there are several problems associated with each data source; these difficulties must be
minimized, corrected for, and acknowledged before
meaningful analysis can be completed. Developing countries have a pressing need for accurate data at the population level, and yet struggle with the collection of
consistently useful and complete data sets. International
efforts at collating and dispersing microdata may lead to
more cross-country comparability and access to what
could be important information related to welfare in
developing countries.
Further Reading
Abu-Libdeh, H., Alam, I., Dackam-Ngatchou, R.,
Freedman, H. A., and Jones, G. C. (2003). Counting the
People: Constraining Census Costs and Assessing Alternative Approaches. Population and Development Strategies
Series No. 7. United Nations Population Fund (UNFPA),
New York. Available on the Internet at www.unfpa.org
African Census Analysis Project (www.acap.upenn.edu).
Centro Latinoamericano y Caribeno de Demografia
(CELADE) (www.eclac.org/celade).
Demographic Database of Population Activities Unit (PAU-DB)
of the United Nations Economic Commission for Europe
(UNECE) (www.unece.org).
Demographic and Health Surveys (DHS) (www.measuredhs.
com).
Integrated Public Use Microdata Series (IPUMS)-International project (www.ipums.umn.edu).
Labor Force Survey (www.ilo.org).
Living Standards Measurement Study (LSMS) (www.
worldbank.org).
Multiple Indicator Cluster Surveys (MICS) (http://childinfo.
org).
Murray, C. J. L., et al. (2003). Cross-population comparability of evidence for health policy. In Health Systems
Performance Assessment: Debates, Methods, and Empiricism
(C. J. L. Murray and D. Evans, eds.), pp. 705-714. World
Health Organization, Geneva.
Pan-Arab Project for Child Development (PAPCHILD) and
the Pan-Arab Project for Family Health (PAPFAM)
(www.papfam.org).
U.S. Census Bureau. Census dates for countries and areas of
the world: 1945 to 2004 (www.census.gov).
World Fertility Survey (http://opr.princeton.edu).
World Health Surveys (WHS) (www.who.int).
Data Collection,
Primary vs. Secondary
Joop J. Hox
Utrecht University, Utrecht, The Netherlands
Hennie R. Boeije
Utrecht University, Utrecht, The Netherlands
Glossary
Introduction
Data collection, primary vs. secondary, explains the advantages and disadvantages of collecting primary data for
a specific study and reusing research material that was
originally collected for a different purpose than the study
at hand. After a brief discussion of the major data collection strategies in primary research, we discuss search
strategies for finding useful secondary data, problems associated with retrieving these data, and methodological
criteria that are applied to evaluate the quality of the
secondary data.
a planned design and observes the effects of the independent variables on the dependent variable, the outcome
variable. The essence of an experiment is that the research
situation is one created by the researcher. This permits
strong control over the design and the procedure, and as
a result the outcome of an experiment permits causal
interpretation. This is referred to as internal validity:
the degree to which the experimental design excludes
alternative explanations of the experiment's results. At
the same time, the fact that the experimenter creates
the research situation, often in a laboratory setting,
implies that the situation is to some degree artificial.
The problem here is ecological validity: the extent
to which we can generalize the results of our study to real-life situations. Experimental laboratory studies put
emphasis on those variables that are easily manageable
rather than on variables that reflect the everyday activities
of people coping with real-life situations. Typically,
because an experiment involves setting up experimental
situations and exposing subjects to different stimuli,
experiments involve a relatively small number of subjects
and variables. However, because there is strong control
over the design, most experiments make an effort to
manipulate several variables, using designs that permit
conclusions about both their individual and their combined effects. Several handbooks describe a variety of
experimental designs, including designs for longitudinal
research and case studies.
objective characteristics of a population. The major methodological problems in interview surveys are obtaining a
representative sample and the validity of the responses
given by the respondents. Obtaining a representative
sample is usually accomplished by drawing a random sample from the population, using scientific sampling
methods. However, in most Western countries survey
nonresponse is considerable and increasing, which may
threaten the representativeness of the sample. In addition, both respondent and question characteristics can
affect the responses. To ensure valid responses, interview
questions must be carefully designed, evaluated, and
tested.
Qualitative Research
Qualitative researchers examine how people learn about
and make sense of themselves and others and how they
structure and give meaning to their daily lives. Therefore,
methods of data collection are used that are flexible and
sensitive to the social context. A popular method of data
collection is the qualitative interview in which interviewees are given the floor to talk about their experiences,
views, and so on. Instead of a rigidly standardized instrument, interview guides are used with a range of topics or
themes that can be adjusted during the study. Another
widely used method is participant observation, which generally refers to methods of generating data that involve
researchers immersing themselves in a research setting
and systematically observing interactions, events, and so
on. Other well-known methods of qualitative data collection are the use of focus (guided-discussion) groups,
documents, photographs, film, and video.
Settings, events, or interviewees are purposively sampled, which means guided by the researcher's need for
information. Provisional analyses constantly change this
need, and therefore sampling takes place during the research and alternates with data collection. Contrary
to probability sampling, which is based on the notion that
the sample will mathematically represent subgroups of
the larger population, purposive sampling is aimed at
constructing a sample that is meaningful theoretically;
it builds in certain characteristics or conditions that
help to develop and test findings and explanations. Sampling strategies include aiming at maximum variation,
snowball sampling, critical case, and stratified purposeful.
The intense role of the researcher brings about issues
with regard to reliability and validity. That the researchers
are their own instrument is necessary to gain valid knowledge about experiences or the culture of a specific individual or group; to reduce the reactivity of the research
subjects, prolonged engagement is recommended. Another issue is the lack of control over the researcher's
activities; therefore, researchers should keep detailed
notes of their fieldwork and the choices they make in
Table 1

                Solicited                     Spontaneous
Quantitative    Experiment                    (Passive) observation
                Interview survey              Monitoring
                Mail survey                   Administrative records
                Structured diary                (e.g., statistical records,
                Web survey                      databases, Internet archives)
Qualitative     Open interview                Existing records (e.g.,
                (Participant) observation       ego-documents, images,
                Focus group                     sounds, news archives)
                Unstructured diary
The analysis of quantitative data often starts with a process in which the data are cleaned: incomplete records
may be edited or automatically imputed, nonnormal data
may be transformed, aggregated scores may be calculated,
and so on. This is all based on the assumptions and
interpretations of the primary researcher. Researchers
who reanalyze the data for another research purpose
may not always agree with the assumptions implied in
the data cleaning and coding. Instead, they may prefer
to use other methods that are more in line with their own
research purpose. If the secondary data contain all the
original variables and if the codebook documents all data
manipulations, researchers can apply their own data-cleaning process. However, tracing the preliminary
data-cleaning process can be difficult and very time-consuming. At the very least, researchers should
be alerted and aware of changes to and recodings of the
original raw data when secondary data are used.
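As an illustration, the sketch below performs a documented cleaning pass of this kind in Python with pandas; the variables and the specific choices are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw survey records; att_1..att_3 form an attitude battery.
raw = pd.DataFrame({
    "region": ["N", "N", "S", "S", "S"],
    "income": [1200.0, np.nan, 800.0, 950.0, np.nan],
    "att_1": [1, 2, 2, 3, 1],
    "att_2": [2, 2, 3, 3, 2],
    "att_3": [1, 1, 2, 2, 3],
})
clean = raw.copy()

# 1. Edit/impute incomplete records: fill missing income with regional median.
clean["income"] = clean.groupby("region")["income"] \
                       .transform(lambda s: s.fillna(s.median()))
# 2. Transform non-normal data: log-transform the skewed income variable.
clean["log_income"] = np.log1p(clean["income"])
# 3. Calculate aggregated scores: sum the attitude battery.
items = [c for c in clean.columns if c.startswith("att_")]
clean["attitude_score"] = clean[items].sum(axis=1)

# Each step belongs in the codebook, and the raw file should be preserved,
# so that secondary analysts can trace, reverse, or redo these choices.
```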
If meta-information on the secondary data is incomplete or even totally lacking, it becomes impossible
to assess the reliability and validity of the original
procedures. Such data, however, can still be useful for
teaching purposes, for use as example data or as practice
data for students. In fact, there are several data archives
and other organizations that make data available specifically for teaching purposes; some of these are general,
whereas others aim at a specific analysis technique.
Further Reading
American Statistical Association. http://www.amstat.org/
Berg, B. L. (2001). Qualitative Research Methods for the Social
Sciences. Allyn & Bacon, Boston.
British Library National Sound Archive. Oral history collection. http://www.cadensa.bl.uk
Copernic. Available at: http://www.copernic.com/
Council of European Social Science Data Archives (CESSDA).
http://www.nsd.uib.no/cessda/
Cresswell, J. W. (1998). Qualitative Inquiry and Research
Design: Choosing among Five Traditions. Sage, Thousand
Oaks, CA.
Dbmscopy. Available at: http://www.dataflux.com
De Leeuw, E. D., and de Heer, W. (2002). Trends in
household survey nonresponse: A longitudinal and international comparison. In Survey Nonresponse (R. M. Groves,
D. A. Dillman, J. L. Eltinge, and R. J. A. Little, eds.), pp. 41-54.
Wiley, New York.
De Vaus, D. (2001). Research Design in Social Research. Sage,
Thousand Oaks, CA.
Doyle, P., Lane, J., Theeuwes, J., and Zayatz, L. (2002).
Confidentiality, Disclosure and Data Access: Theory and
Glossary
Data distribution and cataloging cover the full post-collection production stream: storing the data, creating the
documentation, determining the level of access to be provided, preserving the data and supporting documentation,
and providing the means for the discovery, acquisition,
and intelligent secondary analysis of the data. The decisions made during this process, as well as how thoroughly
it is accomplished, have a direct bearing on the availability
and usability of social science data for secondary analysis.
Distribution of data refers to the format of publication,
access restrictions imposed on the data, archival considerations, and the network chosen for providing access to
the data. Cataloging is viewed in its broadest sense: how
extensively and exhaustively the data collection is described and how well it facilitates discovery of and access
to the data for the secondary user.
Development of Distribution
Methods
Social and economic data have been recorded and distributed for as long as written materials have been accumulated. Some of the earliest clay and stone tablets record
economic transactions, population censuses, and storehouse inventories. Although the media and extent of content have changed over time, the means of distribution had
stayed essentially the same until the mid-1960s and the
introduction of the computer into daily life. Up until this
point, the distribution of social science data depended on
recording information on some medium that could be
stored, copied, and physically delivered in an inflexible
format for interpretation by the end user. Generally, the
data were integrated into the metadata either as an inclusion in a line of text or in a formal table with column
headers and row stubs. In order to transfer the information into another format or another work, researchers had
to copy and transfer the image of the original data or hand-enter the individual data items into a new format.
Access to computing resources brought changes to the
distribution process that are still emerging. The first step
was the creation of a data file that could be processed
electronically by the end user. Initially, the storage medium continued to require copying and physically transporting the file to the end user. These storage media were
primarily off-line and included things such as paper tape,
punch cards, and 9-track or round tapes. However, if one
had the right reader and the right equipment, one could
process the data without additional hand-entry. As on-line
methods of storage became cheaper and Internet transfer
speed improved, data could be downloaded directly to the
end user's computer, ready to use.
This not only opened up opportunities for secondary
data analysis, but also resulted in the separation of data
from their surrounding metadata. The metadata that described the storage structure of the data, as well giving
them context and meaning, remained in the traditional
formats. Archives, such as the Inter-University Consortium for Political and Social Research (ICPSR), stored,
copied, and distributed data on rapidly changing media,
but continued to store, copy, and distribute print versions
of the accompanying metadata. Even when the metadata
information, or parts of it, was put into an electronic format, there was little consistency in its structure. The noted
exceptions were set-up files for heavily used statistical
packages, such as Statistical Package for the Social
Sciences (SPSS) and Statistical Analysis System (SAS).
However, these files contained only a small portion of
the metadata available for accurately interpreting data.
Even the switch to electronic images of metadata material
simply improved the access and transfer rate for obtaining
metadata; it did not radically change how the information
could be processed. Human intervention and interpretation were still required.
Advances in data storage and transfer changed user
expectations. Statements such as the following, made in
reference to the extremely large electronic summary data
files from the 1990 Census, were common: "If it's electronic, you just need to put it in the computer and look at
it." This statement reflects the disjunction between a user's
understanding of the structure of electronic data and the
realities of presenting data in an understandable manner
through the incorporation of data and metadata. The data
referred to here resided on hundreds of 9-track tapes,
stored in files with hundreds of thousands of records
up to 150,000 characters in length. Without the information provided by the printed metadata, it was, in
essence, just a bunch of numbers. Clearly, in order to
meet the rising expectation of users to be able to locate,
access, AND interpret data on-line, the quality, consistency, and manageability of the metadata would have
to improve to the level of the data files themselves.
Levels of Description
Descriptive material, or metadata, for social science data
files can be extensive. For example, the printed metadata
for the National Longitudinal Survey creates a stack over
5 ft high. This may seem excessive, but the metadata for
any single data item should provide information on how,
when, why, and by whom it was collected, how it was
processed, the decisions made in editing it, and the
resulting analysis, as well as its relation to other data.
This information may be well defined, scattered in
a variety of documents and formats, or missing altogether.
It helps to think of metadata as layers of description
related to a data item.
Further Reading
Cruse, P., Fanshier, M., Gey, F., and Low, M. (2001). Using
DDI extensions as an intermediary for data storage and
data display. IASSIST Quart. 25(3), 5-12. Available at
http://iassistdata.org/publications/iq/iq25/iqvol253cruse.pdf
Dodd, S. (1982). Cataloging Machine-Readable Data Files: An
Interpretive Manual. American Library Association, Chicago, IL.
Data Documentation Initiative. Available at http://www.icpsr.
umich.edu/DDI
Green, A., Dionne, J., and Dennis, M. (1999). Preserving the
Whole: A Two-Track Approach to Rescuing Social Science
Data and Metadata. Council on Library and Information
Sciences, Washington, DC.
Mayo, R. (2000). Metadata in international database systems
and the United Nations Common Database (UNCDB).
IASSIST Quart. 24(1), 4-14. Available at http://datalib.
library.ualberta.ca/publications/iq/iq24/iqvol241mayo.pdf
Musgrave, S., and Ryssevik, J. (1999). The Social Science
Dream Machine: Resource Discovery, Analysis and Data
Delivery on the Web. Available at http://www.nesstar.org/
papers/iassist_0599.html
National Archives and Records Administration, Archival
Research and Evaluation Staff. (1990). A National Archives
Strategy for the Development and Implementation of
Standards for the Creation, Transfer, Access, and Long-Term Storage of Electronic Records of the Federal
Government. National Archives and Records Administration, Technical Information Paper No. 8. Available at http://
www.archives.gov/research_room/media_formats/strategy_
for_electronic_records_storage.html
Ryssevik, J. (1999). Providing Global Access to Distributed
Data Through Metadata Standardisation: The Parallel
Stories of NESSTAR and the DDI, Working Paper
No. 10, UN/CEC Work Session on Statistical Metadata.
Available at http://www.nesstar.org/papers/GlobalAccess.
html
Data Envelopment Analysis (DEA)
Mustafa Dinc
The World Bank, Washington, D.C., USA
The findings, interpretations and conclusions are entirely those of the authors and do
not represent the views of the World Bank, its executive directors, or the countries
they represent.
Glossary
allocative efficiency The efficiency of a production process
in converting inputs to outputs, where the cost of
production is minimized for a given set of input prices.
Allocative efficiency can be calculated by the ratio of cost
efficiency to technical efficiency.
decision-making unit (DMU) The designator for units
(firms, organizations, production elements, service delivery
agents, etc.) being analyzed in a data envelopment analysis
model. Use of this term redirects emphasis of the analysis
from profit-making businesses to decision-making entities;
i.e., the analysis can be applied to any unit-based enterprise
that controls its mix of inputs and decides on which outputs
to produce (the enterprise is not dependent on having
profit as an output, although in the private sector, this is
likely to be one of the outputs).
efficiency frontier The frontier represented by the best
performing decision-making units; made up of the units in
the data set that are most efficient in transforming their
inputs into outputs. The units that determine the frontier
are those classified as being 100% efficient, usually with
a value of 1; any unit not on the frontier has an efficiency
rating of less than 100%.
efficiency score/relative efficiency A score allocated to
a unit as a result of data envelopment analysis. This score is
between 0 and 1 (i.e., 0 and 100%). A unit with a score of
100% is relatively efficient; any unit with a score of less than
100% is relatively inefficient (e.g., a unit with a score of
60% is only 60% as efficient as the best performing units in
the data set analyzed). The efficiency score obtained by
a unit will vary depending on the other units and factors
included in the analysis. Scores are relative (not absolute) to
the other units in the data set.
input Any resource used by a unit to produce its outputs
(products or services); can include resources that are not
Econometric approaches offer the ability to test assumptions about the mathematical relationships assumed between input and output variables. Econometric approaches may offer more stable
estimates of efficiency and target input-output levels,
because the estimates are not dependent on only
a small subset of directly observed input-output levels.
Econometric estimates of marginal input-output values and of efficiency are more transparent and
can be more readily communicated to the layperson.
Because DEA is an extremal value method, noise
(even symmetrical noise with zero mean) such as measurement error can cause significant problems.
DEA is good at estimating the relative efficiency of
a DMU, but it converges very slowly to absolute efficiency. In other words, it can tell you how well you are
doing compared to your peers, but not compared to
a theoretical maximum. The latter is a strength of the
econometric approach.
Use of DEA
Since it was first introduced in 1978, DEA has become
a widely used analytical tool for measuring and
evaluating performance of organizations. It has been
successfully applied to different entities operating in
various areas in many contexts worldwide. In many
cases, evaluations of these entities by using traditional
approaches have been very difficult because of complex
and multidimensional aspects of production processes
in which input-output relations were poorly understood. Some examples of the areas in which DEA has
been used are health care, education, banking and
finance, manufacturing, benchmarking, and management evaluation. In addition to these relatively narrow
and focused areas, DEA techniques have been applied
to evaluations of local governments, cities, regions, and
even countries. Studies incorporating a wider scope
include assessments of social and safety-net expenditures as inputs and various quality-of-life dimensions
as outputs.
In other applications, analysts have employed DEA to
get new insights about business activities and methods
and to evaluate these activities. Examples of these applications are benchmarking studies of organizations and
evaluations of the relative efficiencies of mutual vs. corporate forms of organization in the U.S. insurance sector.
Analysts have also used DEA to evaluate governmental
and community activities. The underlying reason DEA
has been used in such a wide variety of activities is
its ability to handle multiple inputs and outputs without
having to specify a production relationship and weighting
system.
Fundamentals of DEA
The application of a DEA model involves a three-stage
process. The first stage is involved with the definition and
selection of DMUs to be analyzed. In a DEA analysis, all
units under consideration should perform similar tasks
with similar objectives under the same set of technological and market conditions. These units should
use the same kind of inputs to produce the same kind
of outputs. The second stage is the determination of
input and output variables that will be used in assessing
the relative efficiency of selected DMUs. The final stage
is the application of one of the DEA models and analysis
of results.
After selecting DMUs to be investigated, the analyst
needs to choose a DEA model appropriate to the analysis.
This process has two important aspects; one is related
to the returns-to-scale assumption and the other is
related to the orientation of the model. The returns-to-scale issue is relatively easy. If the production process is
observed to have constant returns to scale, then an additive model would be appropriate; otherwise, a multiplicative variable return-to-scale model should be
selected. An additive model ratios outputs to inputs; the
model developed by Abraham Charnes, William Cooper,
and Edwardo Rhodes (the CCR model) is probably the
most widely used and best known DEA model. It is used
when a constant returns-to-scale relationship is assumed
between inputs and outputs. This model calculates the
overall efficiency for each unit, where both pure technical
efficiency and scale efficiency are aggregated into one
value. A multiplicative variable return-to-scale model
measures technical efficiency. The convexity constraint
in the model formulation of Rajiv Banker, Charnes,
and Cooper (the BCC model) ensures that the comparable unit is of a scale size similar to that of the unit being
measured. The efficiency score obtained from this model
gives a score that is at least equal to the score obtained
using the CCR model.
Determining the orientation of the model depends on
the purpose of the analysis. Most decision-making
processes have two major aspects: administrative and policy. An input minimization model addresses the administrative aspect of the problem at hand by asking the
question, "How much input (cost) reduction is possible while
producing the same level of output?" This information gives
decision makers an opportunity to reallocate excess inputs
to more needed areas. However, there is also a policy
aspect of the efficiency assessment of institutions. Because many inputs used are fixed or quasi-fixed, it is
very difficult to reduce them in the short run. Moreover,
particularly in public policy-related studies, these inputs
are largely financed by taxpayer money and involve equity
and equality issues. Therefore, policymakers often want
Formulation of a Basic
DEA Model
Consider first the relatively simple fractional programming formulation of DEA. Assume that there are n
DMUs to be evaluated. Each consumes different amounts
of m inputs and produces s different outputs; i.e., DMUj
consumes xij amounts of input i to produce yrj amounts of
output r. It is assumed that these inputs, xij, and outputs, yrj,
are nonnegative, and that each DMU has at least one
positive input and output value. The productivity of
a DMU can then be written as follows:
$$
h_j \;=\; \frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}},
$$

subject to

$$
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \;\le\; 1
\qquad \text{for } j = 1, \ldots, n,
$$

with the weights u_r and v_i restricted to be nonnegative.
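After the usual Charnes-Cooper transformation, this fractional program becomes a linear program that off-the-shelf solvers handle directly. The following is a minimal sketch using SciPy; the data, function name, and the input-oriented multiplier form shown here are illustrative, not taken from any of the packages described below.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """CCR efficiency of DMU j0. X: (n, m) inputs; Y: (n, s) outputs."""
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [u_1..u_s, v_1..v_m]; maximize u'y_j0.
    c = np.concatenate([-Y[j0], np.zeros(m)])        # linprog minimizes
    # Charnes-Cooper normalization: v'x_j0 = 1.
    A_eq = np.concatenate([np.zeros(s), X[j0]]).reshape(1, -1)
    # Ratio constraints: u'y_j - v'x_j <= 0 for every DMU j.
    A_ub = np.hstack([Y, -X])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun                                   # efficiency in (0, 1]

# Five hypothetical DMUs, two inputs, one identical output:
X = np.array([[2., 3.], [4., 1.], [3., 3.], [5., 2.], [4., 4.]])
Y = np.ones((5, 1))
print([round(ccr_efficiency(X, Y, j), 3) for j in range(5)])
```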
DEA Software

Several commercially available DEA software packages
are powerful enough to handle thousands of DMUs and
a large number of input and output variables by using
different extensions of DEA.
OnFront Software
EMQ has developed OnFront software for measuring
economic productivity and quality. The software was
developed by Rolf Färe and Shawna Grosskopf, the originators of the Malmquist productivity index. [The
Malmquist total factor productivity index is defined
using distance functions. Distance functions allow
Efficiency Measurement
System Software
Efficiency Measurement System (EMS) software was developed by Holger Scheel; it contains various models for
efficiency measurement. EMS is free of charge for academic users and can be downloaded from the Internet
(www.wiso.uni-dortmund.de). Features of Version 1.3
include convex and nonconvex technologies; constant,
nonincreasing, nondecreasing, variable returns to scale;
radial, additive, maxAverage (also known as Färe-Lovell),
minAverage, and super efficiency measures; input, output, or nonoriented; weights restrictions; nondiscretionary inputs/outputs; support for program efficiency,
Malmquist indices, and Window analysis; and reports
scores, shadow prices/weights, intensities (lambdas),
benchmarks, and slacks. The operating system is Windows
9x/NT and accepted data files are Excel 97 or ASCII.
DEAP Software
DEAP Version 2.1 was written by Tim Coelli from the
Centre for Efficiency and Productivity Analysis (CEPA).
CEPA was established in 1995. It is located in the School
of Economic Studies at the University of New England
(Armidale, Australia). This program is used to construct
DEA frontiers for the calculation of technical and cost
efficiencies and for the calculation of Malmquist total
factor productivity (TFP) indices. The DEAP program
can be downloaded from the University of New England
website (www.une.edu) free of charge (interested parties
may contact Tim Coelli at the University of New England
to discuss bugs or new versions).
The DEAP program has three principal DEA options:
(1) standard constant returns-to-scale (CRS) and variable
Conclusions
DEA is an exciting, flexible method of assessing relative
efficiency among decision units using the same technology and in the same or very similar organizational circumstances. One of the reasons that DEA is an important
management tool for diagnosis among decision-making
units is its ability to provide guidance for how nonefficient
units can become more efficient. In addition to the DEA
aspects covered here, several other issues may be of interest to performance analysts, ranging from different
formulations of the DEA model, to bounding the relative
weights, to the use of discretionary vs. nondiscretionary
variables, to parametric alternatives to DEA. One of
the most important contributions to the technique is the
incorporation of the Malmquist index, which, in a way,
involves the time dimension in the model. In order
to obtain reliable results from DEA applications, the
technique should be used with a series of sensitivity
assessments.
Further Reading
Arnold, V. L., Bardhan, I. R., Cooper, W. W., and
Kumbhakar, S. C. (1996). New uses of DEA and statistical
regressions for efficiency evaluation and estimation, with
an illustrative application to public sector secondary
schools in Texas. Ann. Op. Res. 66, 255-277.
Banker, R. D., and Thrall, R. M. (1992). Estimation of returns
to scale using data envelopment analysis. Eur. J. Op. Res.
62, 74-84.
Banker, R. D., Charnes, A., and Cooper, W. W. (1984). Some
models for estimating technical and scale inefficiencies
in data envelopment analysis. Mgmt. Sci. 30(9),
1078-1092.
Banker, R. D., Chang, H., and Cooper, W. W. (1996).
Simulation studies of efficiency, returns to scale and
misspecification with nonlinear functions in DEA. Ann.
Op. Res. 66, 233-253.
Bjurek, H., Hjalmarson, L., and Forsund, F. R. (1990).
Deterministic parametric and nonparametric estimation
of efficiency in service production: A comparison.
J. Econometr. 46, 213-227.
Charnes, A., Cooper, W. W., and Rhodes, E. L. (1978).
Measuring the efficiency of decision making units. Eur. J.
Op. Res. 2(6), 429-444.
Charnes, A., Cooper, W. W., Lewin, A. Y., and Seiford, L. M.
(eds.) (1994). Data Envelopment Analysis: Theory,
Methodology and Applications. Kluwer Academic Publ.,
Boston, MA.
Cooper, W. W., Kumbhakar, S., Thrall, R. M., and Yu, X.
(1995). DEA and stochastic frontier analysis of
Data Mining
John A. Bunge
Cornell University, Ithaca, New York, USA
Dean H. Judson
U.S. Census Bureau, Washington, D.C., USA
Glossary
cross-validation In statistics, the process of selecting (random) fractions of the data set, using one fraction for
training and the remaining fractions for test or validation.
data mining The process of using computational and
statistical tools to discover usable information in large
databases.
labeled data Records that have been labeled with the
classification outcome, as opposed to unlabeled data, for
which the classification outcome is not available.
latent variable An outcome or a variable that is presumed to
exist and to be causally important for the model, but is not
observed by the researcher.
supervised learning Statistical learning based on a labeled
data set.
test or validation data set The portion of the data set, or
a new data set, on which the model fit on the training data
set is tested or validated.
training data set A data set (typically labeled) to which the
data mining model is fit.
unsupervised learning Statistical learning based on an
unlabeled data set.
prospects for a new product offering. In another application, Federal law requires all manufacturers using toxic
chemicals to report such usage each year at each facility,
along with a large amount of related information (chemical characteristics, location of plant and disposal facilities,
waste management strategies, etc.); researchers wish to
characterize best-practice facilities that perform especially well in toxic-waste reduction. Data miners may
apply a variety of analytical procedures, separately or in
conjunction, to solve such problems.
In this article, there is no attempt to discuss every
kind of analysis that might be considered data mining.
Rather, the focus is on methods that are judged to be
the most-used and most important, and that will be
most useful to the practitioner. They are classified in
the following ways. First, supervised and unsupervised learning, terms that come from the machine
learning literature, are distinguished. In supervised
learning, there is a target (response, dependent) variable, along with a typically large number of input (predictor, independent) variables: The objective is to
characterize the response in terms of the predictors,
that is, to find a function, rule, or algorithm that relates
the predictors to the response. Doing this requires
a learning or training sample, i.e., a data set in which
a case or observation consists of a large list or vector of
independent variables (called the feature vector in the
pattern recognition literature) and a known response.
A data mining procedure analyzes the learning sample,
generates or builds the desired function or rule, and
assesses its performance. Supervised learning is further
subdivided according to the nature of the target variable, which may be binary (dichotomous), categorical
with several unordered categories (polytomous or multinomial), ordered (such as a Likert scale), or quantitative (discrete or continuous). The first is arguably the
canonical data mining situation, as in the credit-card
marketing example, in which the customers are classified either as buyers or nonbuyers. On the other hand,
in the example of toxic chemicals, there is a quantitative
response, namely, the amount of toxic waste released by
a given facility in a given year; this is more readily
related to classical statistical procedures such as multiple regression.
Unsupervised learning is also considered. In this situation, the search is for patterns in a data set without
a specific known target variable or response. The data
set now consists only of feature vectors without an observed target variable, and data mining must search for
clusters, connections, trends, or patterns in an open-ended way. Thus, unsupervised learning tends to be
more computationally intensive. Both methods may be
used in conjunction: For example, clusters in the data
may first be searched for and defined, and then characterized using supervised learning.
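A minimal sketch of this two-step combination on synthetic data follows, using scikit-learn; the cluster count, tree depth, and data are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Unlabeled feature vectors drawn from two synthetic subpopulations.
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(3, 1, (200, 3))])

# Unsupervised step: search for clusters with no target variable.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised step: characterize the found clusters with a readable rule.
tree = DecisionTreeClassifier(max_depth=2).fit(X, labels)
print(export_text(tree, feature_names=["x1", "x2", "x3"]))
```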
Challenges
The challenges to data mining are effectively the converse
of the benefits. Certainly computers can discover patterns
in data; but do the patterns discovered by the computer
make sense? "Making sense" is a contextual concept, and
computers have not yet achieved the intuitive ability that
a human has. A second challenge is the problem of multiplicity: in a statistical sense, multiplicity refers to the
fact that, in 20 tests of hypotheses with an α-level of 0.05,
a single test is expected to result in significant findings just
by chance. If the computer is instructed to pursue classically significant results, then 5% of the time a random phantom will be chased. A third challenge to the data miner is
selection bias and representativeness of the training data
set: does the database, however large it may be, represent
the population of interest? In many databases currently
subject to data mining techniques, the relationship between the database available to the analyst and any population of interest is questionable, due to factors such as
imprecise population definition, nonrandom sampling, or
temporal drift of the phenomenon in question
since the sample was taken. If the database depicts
a nonrepresentative portion of the population, how can
the results of the analysis accurately reflect that population? A final crucial challenge to the data miner is the
"curse of dimensionality." This refers to the fact that,
as the numbers of observations and measured variables
increase, the volume of the possible data space increases
Supervised Learning:
Binary Response
The problem of binary response, also known as statistical
classification, is a special case of pattern recognition and
machine learning. Observations are partitioned into two
groups or classes, and often one class is smaller and more
interesting than the other. Examples include buyers versus nonbuyers of a product or service, persons receiving
unemployment benefits versus persons not receiving benefits, fraudulent credit-card transactions versus nonfraudulent ones, and so on. The objective is to classify
the individuals based on their features, or more specifically, to characterize the class of interest in terms of
available observed information. More technically, suppose there is a learning or training sample consisting
of some (often large) number of cases, and each case has
a known class and a (large) number, say k, of recorded
feature variables or covariates. When a new (hitherto
unobserved) case is confronted, its class will not be
known (this is the essence of the problem), so the classifier must be constructed based solely on the observable
feature variables.
The range of all possible values of all the feature
variables defines a k-dimensional space called the feature
space. The initial objective of statistical classification is to
partition or subdivide this feature space into two subsets,
one corresponding to each class. When a new observation
is obtained, where it falls in the feature space is checked
and it is classified accordingly. The problem is to define
the boundary between the two subsets in an optimal
way; that is, to find the optimal classification boundary
or classifier. Statistical theory states that, given (1) the
distributions of the data in the two classes, (2) the relative
population sizes (or prior probabilities) of the classes,
and (3) the costs of misclassification, an optimal classifier
exists and can, in principle, be approximated. To find it,
some method is applied (mathematical, statistical, logical, computational, or some combination thereof) that
relates the values of the feature variables to the known
classes of the observations in the learning sample.
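As a concrete illustration of the optimal-classifier idea, the following Python sketch classifies a single feature value by comparing expected misclassification costs; the class distributions, prior probabilities, and cost values are all invented for the example, not taken from any particular application.

    import math

    def normal_pdf(x, mu, sigma):
        # density of a normal distribution; stands in for the class-conditional
        # feature distributions, which here are assumed rather than estimated
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def bayes_classify(x, prior1=0.1, prior0=0.9, cost_miss=5.0, cost_false_alarm=1.0):
        # choose class 1 when the expected cost of predicting class 0
        # exceeds the expected cost of predicting class 1
        f1 = normal_pdf(x, mu=3.0, sigma=1.0)  # assumed distribution, class of interest
        f0 = normal_pdf(x, mu=0.0, sigma=1.0)  # assumed distribution, other class
        return 1 if prior1 * f1 * cost_miss > prior0 * f0 * cost_false_alarm else 0

    print(bayes_classify(2.4))   # classified as 1 despite the small prior

Because the priors and costs enter the decision rule directly, a rare but expensive-to-miss class can dominate the boundary, which is exactly the trade-off the theory describes.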
Discriminant Analysis
The oldest method of statistical classification, dating back to
the 1940s, is discriminant analysis, classically based on the
multivariate normal distribution. In this method, the assumption is that the feature variables follow a multivariate
normal distribution in both classes. Note that this is generally not realistic, because (among other considerations) typically there are different types of feature variables, such as
the binary variable sex, whereas the multivariate normal
assumes that everything is quantitative and continuous.
Nevertheless, discriminant analysis is often employed as
part of a suite of analyses, whether or not the multivariate-normality assumption is met.
Assume that the distribution of the features in
both classes is the same multivariate normal except for
location (multivariate mean); the resulting classification
boundary, then, is linear, so the procedure is called linear
discriminant analysis. This boundary is a line in two-dimensional space, a plane in three-dimensional space,
and a hyperplane in higher dimensional feature spaces.
If the feature distribution in the two classes differs both
in location and in dispersion (multivariate mean and covariance matrix), then the classification boundary is quadratic (that is, a parabola or paraboloid sheet) and is called
quadratic discriminant analysis. It is possible to test
for equality of covariance matrices to decide which to use.
The effectiveness of linear or quadratic discriminant
analysis is often limited by the underlying distributional
assumptions and the simplicity of the classification boundary. Nonparametric discriminant analysis attempts to
address this by allowing a nonparametric (relatively unspecified) form for the feature distributions in the two
classes, and hence a highly flexible shape for the classification boundary. However, in this case, it can be difficult
to understand or characterize the fitted distributions and
boundary, especially in high-dimensional data, and overfitting the learning sample, and consequent lack of
generalizability, may be of concern.
Feature selection is less than straightforward in discriminant analysis. The user may test all possible subsets
of the variables, looking at (for example) overall misclassification rate as the objective function; in some software
implementations, there are various iterative routines
modeled on stepwise regression.
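To make the mechanics concrete, here is a minimal linear discriminant analysis in Python with NumPy; the simulated data and the equal-prior midpoint threshold are assumptions of this sketch, not part of any particular software package. The boundary is the hyperplane implied by the two class means and the pooled covariance matrix.

    import numpy as np

    def fit_lda(X, y):
        # linear discriminant analysis: both classes are assumed to share one
        # covariance matrix and to differ only in their multivariate means
        X0, X1 = X[y == 0], X[y == 1]
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
        w = np.linalg.solve(S, m1 - m0)   # normal vector of the linear boundary
        c = w @ (m0 + m1) / 2.0           # threshold at the midpoint (equal priors)
        return w, c

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    y = np.repeat([0, 1], 100)
    w, c = fit_lda(X, y)
    print(((X @ w > c).astype(int) == y).mean())   # training accuracy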
Logistic Regression
Logistic regression is a well-known procedure that can be
used for classification. This is a variant of multiple
regression in which the response is binary rather than
quantitative. In the simplest version, the feature variables
are taken to be nonrandom. The response, which is the
class, is a binary random variable that takes on the value 1
(for the class of interest) with some probability p, and the
value 0 with probability 1 − p. The success probability p is a function of the values of the feature variables; specifically, the logarithm of the odds ratio, or the log odds, log[p/(1 − p)], is a linear function of the predictor
variables. To use logistic regression for classification,
a cutoff value is set, typically 0.5; a case is assigned to
class 1 if its estimated or fitted success probability is
greater than (or equal to) the cutoff, and it is assigned
to class 0 if the estimated probability is less than the cutoff.
Because of the nature of the functions involved, this is
equivalent to a linear classification boundary, although it
is not (necessarily) the same as would be derived from
linear discriminant analysis.
Like standard multiple regression, logistic regression
carries hypothesis tests for the significance of each variable, along with other tests, estimates, and goodness-of-fit
assessments. In the classification setting, the variable significance tests can be used for feature selection: modern
computational implementations incorporate several
variants of stepwise (iterative) variable selection. Because
of the conceptual analogy with ordinary multiple
regression and the ease of automated variable selection,
logistic classification is probably the most frequently used
data mining procedure. Another advantage is that it
produces a probability of success, given the values of
the feature variables, rather than just a predicted class,
which enables sorting the observations by probability of
success and setting an arbitrary cutoff for classification,
not necessarily 0.5. But wherever the cutoff is set, logistic
classification basically entails a linear classification boundary, and this imposes a limit on the potential efficacy of the
classifier. Some flexibility can be achieved by introducing
transformations (e.g., polynomials) and interactions
among the feature variables.
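The following sketch shows logistic classification in Python using scikit-learn; the library choice, the simulated features, and the coefficient values are assumptions of the example, and any implementation of logistic regression would serve equally well.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                          # three feature variables
    p = 1 / (1 + np.exp(-(0.5 + X @ [2.0, -1.0, 0.0])))    # true log odds linear in X
    y = rng.binomial(1, p)                                 # simulated binary response

    model = LogisticRegression().fit(X, y)
    probs = model.predict_proba(X)[:, 1]        # fitted success probabilities
    yhat_default = (probs >= 0.5).astype(int)   # the usual 0.5 cutoff
    yhat_strict = (probs >= 0.8).astype(int)    # a stricter, arbitrary cutoff

Because the fitted probabilities are available, observations can be sorted by probability of success and the cutoff chosen to suit the costs of the application, as the text notes.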
Neural Networks
Neural networks have attracted a vast amount of theoretical research and commercial software development.
The discussion here is confined to their application in the classification problem. The basic purpose of a neural network in this setting is to relate the feature variables to the class through one or more hidden layers of weighted, nonlinearly transformed combinations of the inputs.
[Figure: a feed-forward neural network diagram with three inputs, one hidden layer, and a single output.]
Tree-Structured Classifiers
These classifiers, also known simply as decision trees,
have several advantages: they are easy to understand and
interpret, they are nonparametric (depending on no distributional assumptions), and they incorporate feature
selection as an integral part of the algorithm. They are
of relatively recent date, but there are nonetheless
a number of variants and competing procedures within
the family. In production data mining, a tree may be used
as a primary model-building procedure or as an independent analysis (for example, as a check on logistic
regression results), or for automated feature selection.
Arguably, the most well-established decision tree algorithm is Classification and Regression Trees (CART);
although there are a number of competitors, for simplicity, the focus here is on CART. The objective is to subdivide the data into successively smaller subsets, according to the descending branches of a decision tree, in such a way that the resulting nodes or leaves are as pure or unmixed as possible (that is, so that the subset of cases represented by each node is dominated as nearly as possible by a single class).
[Figure: an illustrative classification tree. The root splits cases into smoker (15%), nonsmoker (30%), and former smoker (50%) branches; further splits include difficult to quit (90%), easy to quit (40%), never smoked (28%), liberal (50%), and not liberal (10%), with the percentages giving the rate of the class of interest within each node.]
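As a hedged illustration, the following Python sketch fits a small classification tree with scikit-learn's DecisionTreeClassifier, which implements an optimized variant of the CART approach; the simulated data and the depth limit are assumptions of the example.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 4))                        # four feature variables
    y = ((X[:, 0] > 0) & (X[:, 2] > 0.5)).astype(int)    # class defined by two splits

    # a shallow tree; limiting depth keeps the leaves interpretable
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree))                             # the fitted splits, node by node

The printed rules show how the algorithm performs feature selection automatically: the irrelevant variables simply never appear in a split.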
Support Vector Machines

These principles are combined, and solved computationally, in support vector machine software.
Bayesian Networks
Bayesian networks are graphical tools for representing
causal relationships between variables. A Bayesian (or
belief) network is represented by a directed acyclic
graph that encodes the dependencies between variables.
At each point in the graph, the structure of the graph,
prior knowledge, and data are used to update conditional
dependencies. The strength of a Bayesian network is that
it encodes causal and probabilistic information and provides an updating mechanism for adding new data as they
come into a learning system.
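A belief network can be illustrated in plain Python with a toy example; the variables and probabilities below are invented. The directed acyclic graph lets the joint distribution factor into local conditional tables, and conditioning on new evidence updates the probabilities, which is the updating mechanism described above.

    # toy belief network: Rain -> WetGrass <- Sprinkler
    P_rain = {True: 0.2, False: 0.8}
    P_sprinkler = {True: 0.4, False: 0.6}
    P_wet = {  # P(wet | rain, sprinkler)
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.85, (False, False): 0.05,
    }

    def joint(rain, sprinkler, wet):
        # the DAG structure lets the joint probability factor into local pieces
        p_w = P_wet[(rain, sprinkler)]
        return P_rain[rain] * P_sprinkler[sprinkler] * (p_w if wet else 1 - p_w)

    # P(rain | grass is wet), by brute-force conditioning on the evidence
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
    print(num / den)   # posterior probability of rain, about 0.39 here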
Unsupervised Learning
Until now the discussion has focused on situations in
which a collection of records is labeled; that is, the
value of the response variable is known (e.g., it is
known whether a particular credit card transaction in
question was fraudulent or valid). In the unsupervised
learning situation, the records are not labeled; instead,
there is a collection of data, but, a priori, there is nothing
to indicate in what class the particular record is located. In
unsupervised learning, the analysis technique imposes structure on the data by searching for one particular kind of structure (and not some other kind). Of course, if the
structure imposed by the analysis technique is incorrect,
the results will be fallacious.
Cluster Analysis

[Figure: a scatter plot of countries, each plotted as the numeral of its cluster, grouped by cluster analysis as developing (1), developed (2), and in transition (3); one axis is the crude death rate.]

Nonhierarchical

The basic form of cluster analysis is nonhierarchical. In nonhierarchical cluster analysis, we merely attempt to partition the observations into a prespecified number of groups, so that the observations within each group are as similar as possible and the groups themselves are as distinct as possible.
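The classic nonhierarchical procedure is k-means clustering; the following plain-NumPy sketch (the initialization scheme, iteration cap, and simulated data are arbitrary choices of the example) alternates between assigning observations to the nearest cluster center and recomputing the centers.

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # assign each observation to its nearest center
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # move each center to the mean of its assigned observations
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        return labels, centers

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(m, 0.5, (50, 2)) for m in (0, 3, 6)])
    labels, centers = kmeans(X, k=3)

Note that the analyst must specify k in advance; choosing the wrong number of clusters is one way the imposed structure can be incorrect, as cautioned above.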
machines (this has already happened with neural networks), can be anticipated. Most importantly, though, it
is possible to foresee the expansion of data mining from
the almost classical statistical or computational tasks
discussed here, to the ability to mine directly immense
quantities of relatively unstructured data that are not
(at present) amenable to classical statistical analysis,
including text, image, bioinformatic, and Internet mining.
These capabilities are presently in their infancy.
Acknowledgments
The authors wish to thank three anonymous reviewers for
their valuable comments, which improved the content
and readability of the article.
This article reports the results of research and analysis
undertaken by Census Bureau staff. It has undergone a
more limited review by the Census Bureau than its official
publications. This report is released to inform interested
parties and to encourage discussion.
Further Reading
Bellman, R. (1961). Adaptive Control Processes. Princeton
University Press, Princeton, New Jersey.
Bennett, K., and Campbell, C. (2000). Support vector
machines: Hype or hallelujah? SIGKDD Expl. 2, 1-13.
Fayyad, U. M., Piatetsky-Shapiro, G., and Uthurusamy, R.
(eds.) (1996). Advances in Knowledge Discovery and Data
Mining. AAAI Press, Menlo Park.
Hastie, T., Tibshirani, R., and Friedman, J. H. (2001).
The Elements of Statistical Learning. Springer-Verlag,
New York.
Heckerman, D. (1995). A Tutorial on Learning with Bayesian Networks. Microsoft Research Technical Report MSR-TR-95-06. Available on the Internet at ftp://ftp.research.microsoft.com
Mitra, S., and Acharya, T. (2003). Data Mining: Multimedia,
Soft Computing, and Bioinformatics. Wiley, New York.
Pearl, J. (1997). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco.
Glossary
argument Sets of statements that support a conclusion. An
argument is designed to persuade.
inference The evidentiary relationship maintained between
premises and conclusions.
logic A normative discipline that studies the forms of reasoning; seeks the criteria by which to differentiate good from
bad arguments.
soundness A quality possessed by an argument if and only if it is valid and contains all true premises.
validity If the premises of an argument are taken as true and
the conclusion drawn is compelling, the argument is
considered valid. A valid argument is sound if and only if
all the premises are in fact true. Invalid arguments cannot
be sound.
Deduction and induction are two different forms of argument. In a deductive argument, the conclusion necessarily follows from the premises. In an inductive argument,
the premises support a conclusion to varying degrees. Part
of the challenge of the inductive argument is to establish
criteria by which to determine which conclusion is best
supported. In this article, deductive and inductive inferences are examined with the aim of showing how to establish the criteria by which to differentiate between good
and bad arguments.
Deduction
Consider the following argument (a traditional example of
a deductive argument):
1. All humans are mortal.
2. Socrates is human.
3. Therefore, Socrates is mortal.
The conclusion (3) in this argument is established by the premises (1) and (2). It is a valid argument: if (1) and (2) are true, then (3) must be true. But apart from whether this is a good or bad argument, we can recognize it as expressing the standard form of a deduction, i.e., it is a deductive argument.
Deductive arguments are unique in that the claims of
such arguments are conclusive. When, as in the preceding
argument, the conclusion follows from the premises, we
have validity. If the conclusion did not follow, we would
say of this argument that it is invalid. For deductive arguments, when it is not possible for the premises to be true
at the same time that the conclusion is false, we say that
the conclusion follows from the premises. Note well that
the premises need not actually be true. The requirements
of deductive validity require only that it is not possible for
the conclusion to be false while at the same time the
premises are true. The conclusion of a valid deductive
inference is never false so long as its premises are true.
Oftentimes it is suggested that deductive arguments
move from the general to the particular, as in the claim
that Socrates is mortal. In this case, talk of the particular simply refers to the conclusion having to do with Socrates as a single object of the deduction's focus.
Irving Copi and Keith Burgess-Jackson suggest that
this way of thinking of deduction and deductive arguments is more than slightly misleading. The difficulty
lies, they write, in the fact that a valid deductive argument may have universal propositions for its conclusion as
well as for its premises. For example, consider the argument that Copi and Burgess-Jackson provide to substantiate their claim:
1. All animals are mortal.
2. All humans are animals.
3. Therefore, all humans are mortal.
This argument is valid; (3) follows from (1) and (2). But the
conclusion (3) is not about a particular object; the conclusion (3) quantifies over all objects that fall into the class
of mortals (2). Copi and Burgess-Jackson add that a valid
deductive argument may have particular propositions for
its premises as well as for its conclusion. They consider
Not only does (3) follow from the true premises (1) and
(2), we are compelled to believe that (3) is true and so the
argument is sound.
So far, we have been considering arguments that are
deductive; the contents of the conclusions lie entirely
within the domain of the contents of the premises. In
such arguments, it seems relatively straightforward to
think about the differences between validity, truth, and
soundness. Matters get more complicated when these
principles of logic are applied to induction, or to inductive
arguments.
Induction
Truth and Soundness
Suppose that the following argument is made:
1. All trucks have seven wheels.
2. Anna's vehicle is a truck.
3. Therefore, Anna's truck has seven wheels.
If we accept the premises as true in this deductive argument, are we also compelled to accept the conclusion? If
yes, then we have to accept that this argument is valid. So it
can be said that the premises are taken as true in a valid
argument; they establish the conclusion and the validity of
the argument. The problem is that validity guarantees the
truth of the conclusion only on the grounds that the
premises are in fact true. Validity is only the first step
in evaluating deductive arguments. Once we have determined that an argument is valid, we then wish to know whether it is sound, that is, whether the premises of the valid argument
are in fact true. A sound deductive argument is an
argument in which the truth of the conclusion is
guaranteed. Validity is not enough to secure truth. We
require further criteria to establish truth and we require
a further distinction.
An argument is thought to be sound if and only if it is
valid and it contains all true premises. Take the following
argument (it is valid but not sound):
1. Lou is a secretary.
2. All secretaries are female.
3. Lou is female.
Although this argument is valid, premise (2) is not in fact true, so we are compelled to question its conclusion (3). Valid arguments alone do not secure true conclusions (even if we later find out that Lou is short for Louise). In
contrast, consider the following argument (it is sound,
which implies that it is already valid):
1. All medical students are in the Faculty of Medicine.
2. Nafisa is a medical student.
3. Therefore, Nafisa belongs to the Faculty of
Medicine.
Further Reading
Copi, I. M., and Burgess-Jackson, K. (1996). Informal Logic.
Prentice Hall, Upper Saddle River, New Jersey.
Glossary
common cause The description given to variation in a stable process; can also be described as natural variation in a system, when nothing unusual is happening (i.e., no special causes). Common cause variation is also present in processes that are out of control; it is constant over time and throughout the system.
deadly diseases and obstacles The term used by Deming to
describe the things that stand in the way of transformation.
Plan, Do, Study, Act (PDSA) model In its original form, this
was the Plan, Do, Check, Act model for continual
improvement, depicted in the form of a four-spoke wheel.
Though many refer to this as the Deming wheel, Deming
always referred to this as the Shewhart cycle. Deming had taken this four-step approach from Shewhart's method for continually improving processes, to achieve a state of statistical control. In fact, Shewhart intended the wheel to be drawn as a spiral, to represent the continual nature of the improvements taking place.
prediction For Deming, management is prediction. The premise is that, if a process is in statistical control, the future behavior of that process is predictable for the foreseeable future. The notion of prediction comes from
theory. The construction of a chart allows the evidence to
be interpreted, using theory to predict what may happen
next. Data alone do not provide a prediction. Deming stated that rational prediction requires theory, and that knowledge is built through systematic revision and extension of theory based on comparison of prediction with observation.
Shewhart control chart A two-axis chart, used to plot data
taken from processes and products. The plotted data
points on the chart illustrate variation in the entity being
measured. The chart will also show the mean and
upper and lower control limits. Charts are an operational
definition of the distinction between common and special
causes of variation. Charts can be used for judgment of
the past, stability of the present, and prediction of the
future by looking back on a set of results from a process.
Introduction
Dr. William Edwards Deming's (1900-1993) most notable achievement in his long life was probably the impact he had on Japanese manufacturing in the post-World War II period. Deming, a statistician from the United States, was asked by General MacArthur to visit Japan to help train the industrialists on how to improve the quality of their products in the rebuilding process (in Japan, MacArthur was frustrated with the poor-quality telephone systems, which he found very unreliable).
MacArthur was familiar with the work on improving the
quality of telephone manufacture by Dr. W. A. Shewhart,
who famously invented the control chart when he applied
statistical methods to mass production. Deming had followed Shewhart's teachings and developed the application of statistical quality control to an increasing
number of sectors and processes outside of manufacturing. It was this new philosophy that Deming taught the
Japanese.
Deming's impact on Japanese manufacturing remained relatively unknown until the 1960s, when the Western economies started to notice and feel the impact of superior Japanese products. It was only when Western journalists, academics, and industrialists started to investigate the secrets of Japan's success that they came across Deming's name, along with names such as Juran, Sarasohn, Protzman, and Ishikawa. The Japanese Union of Scientists and Engineers (JUSE) recognized the impact of Deming's work and in 1951 named their annual quality award after him (the Deming Prize).
A Potted Biography
Deming, born 14 October 1900, died 20 December 1993.
His first degree, begun in 1917, was at the University of Wyoming, where he studied mathematics. He then read for a Master's degree in mathematics and physics from the University of Colorado in 1922 and earned a Ph.D. in physics from Yale University in 1928. During his summer vacation work (1925-1926) at the Hawthorne, Illinois, plant of Western Electric, he met Dr. W. A. Shewhart, who was conducting a research project to find a more economical way of controlling quality.
This meeting was an important event in the history
of quality management. Deming learned the theory of
statistical quality control from Shewhart, and it was
these ideas that were developed over the next 65 years.
Variation
First, consider Shewhart's invention of the control chart and his notions of common and assignable (Deming called these special) causes of variation. The purpose of the control chart is to detect the presence of special causes
the control chart is to detect the presence of special causes
in a process. This concept is at the root of reducing variation to provide stable and predictable processes.
The control chart is a unique invention, illustrating
variation in a process or product. There are many types
of charts for different processes. The example given in
Fig. 1 shows a simple but effective individuals moving-range chart, with the mean and the upper and lower control
limits all calculated from the data from measuring specific
features of a process. The chart shows data taken from
a typical management report on weekly overtime. If these
data were presented in the form of a table of figures, as is
often the case in monthly management reports, the discussion around the meeting room might be quite exciting
around week 12, with a high of nearly 400 hours. The
managers would return to their departments and warn
about excessive overtime, and lo and behold, the next
2 weeks would show a reduction! Then, the managers
would celebrate getting the overtime down to
150 hours. But, surprise, surprise, the figure would go
up again. If, however, the data are presented in a different
format, on a chart, the flow of the weekly trends is seen more clearly. It is apparent that though the data appear variable, when the mean and the 3σ control limits are calculated and drawn on the chart, the process is in fact stable and in statistical control.
Knowledge of Systems
There are many processes in an organization. Collectively,
they form a system. Deming defined a system as a network
of interdependent components that work together to try
to accomplish the aim of the system. The interdependent
components can be conceived as processes. An orchestra
is an example of how a network of individuals may choose
to cooperate, combine, and communicate in order to produce a piece of music for the listeners to enjoy. Deming
points out that there are no prima donnas in an orchestra;
the players are there to support one another.
[Figure 1 A Shewhart control chart, showing weekly overtime data (hours) over 25 weeks. The points vary around a mean of 264.8 hours, within an upper control limit (UCL) of 406.7 and a lower control limit (LCL) of 122.9.]
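The limits on an individuals moving-range chart such as Fig. 1 can be computed directly from the data. A minimal Python sketch follows; the overtime figures are invented, and the constant 2.66 is the conventional 3/d2 factor for moving ranges of two consecutive points.

    import numpy as np

    def imr_limits(x):
        # individuals/moving-range chart: mean +/- 2.66 times the average
        # moving range (2.66 = 3/d2, with d2 = 1.128 for ranges of two points)
        x = np.asarray(x, dtype=float)
        mr_bar = np.abs(np.diff(x)).mean()
        center = x.mean()
        return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

    # invented weekly overtime hours
    lcl, mean, ucl = imr_limits([260, 300, 240, 390, 220, 280, 250, 310])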
A Theory of Knowledge
The third element of SoPK is a theory of knowledge. In his
writings on this, Deming begins the section with the subheading Management is Prediction, with a footnote to C. I. Lewis's 1929 publication Mind and the World-Order, which is a theory of knowledge based on the notion of conceptualistic pragmatism. It is a little-known fact outside the community of those who know Deming's and Shewhart's work that they were both influenced greatly by Lewis's book. Shewhart was reputed to have read it first and recommended it to Deming; Shewhart apparently read it 14 times, and Deming a mere seven!
So why was this important? First, Shewhart had read
widely as he developed his work in the 1920s. His reading
included several books from the philosophy of science.
Those reading Shewhart's work carefully will find that he talks consistently about prediction, so that if a process is in control, it is predictable, all things remaining equal. Second, a systems view of an organization (or, for that matter, the universe) progresses into a theory of knowledge that goes back to the pre-Socratic philosophy of Heraclitus, famous for claiming that we cannot step in the same river twice. This theory suggests that the universe is in a state of flux, or motion, i.e., a nonstatic universe. Therefore, the application of the statistical method to organizations requires being able to view the world as if it were in flux, because control charts show the variation of data over time. So, for variation, apply the concept of flux.
The organization is therefore a dynamic entity with the
future unfolding before us. The control chart provides
a means of capturing this process, and we have to learn
to interpret the charts from this perspective. This is where
pragmatism comes into the equation. Lewis provided a theory of knowledge that would help us understand the world-order with our minds, based on a theory of conceptualistic pragmatism, which incorporated prediction. Lewis's work gave both Shewhart and Deming a theory that in many ways supported what they were doing.
Further Reading
Deming, W. E. (1950). Some Theory of Sampling. John Wiley
and Sons, New York (reprinted in 1984 by Dover, New York).
Deming, W. E. (1953). On the distinction between enumerative and analytic surveys. J. Am. Statist. Assoc. 48(262),
244-255.
Deming, W. E. (1986). Out of the Crisis. MIT Press,
Cambridge, Massachusetts.
Deming, W. E. (1991). A Tribute to Walter Shewhart on the
Occasion of the 100th Anniversary of His Birth. SPC Press,
Knoxville.
Deming, W. E. (1994). The New Economics for Industry,
Government, Education, 2nd Ed. MIT Press, Cambridge,
Massachusetts.
Democracy, Measures of
Jeffrey Bass
University of Missouri-Columbia, Department of Anthropology,
Columbia, Missouri, USA
Glossary
civil society The social space created by institutions and civil
associations, autonomous of the state and its institutions.
consensus democracy A form of democracy in which political
power is shared and decentralized so as to accommodate
enduring regional, religious, ethnic, and linguistic social
cleavages; the alternative to majoritarian democracy.
illiberal democracy A political system with regular elections,
but in which civil liberties are severely restricted and the
rule of law is routinely violated by the state.
liberal autocracy A political system without elections, but in
which the state respects and maintains the rule of law and
guarantees partial, but significant, civil liberties.
liberal democracy A political system marked not only by free
and fair elections, but also by the rule of law and the
protection of basic civil liberties, such as freedom of
speech, assembly, and religion.
majoritarian democracy A form of democracy in which
political power tends to be centralized and concentrated so
as to reflect the will of the majority, or even a bare plurality;
as to reflect the will of the majority, or even a bare plurality; associated with a strong centralized government, single-party cabinets, and a first-past-the-post plurality electoral
system; the alternative to consensus democracy.
polyarchy An operational definition of political democracy
divorced from democratic ideals; used to describe objectively the procedures and institutions associated with
contemporary liberal democracies.
social democracy A democracy that emphasizes socioeconomic equality and the creation of certain social rights, such
as education, health care, and extensive social services.
In measuring the degree to which modern states are democratic, most social scientists currently evaluate democratic performance using some type of procedural
measure that focuses on the state's democratic institutions and practices. Alternatively, some scholars have begun to develop more substantive measures that also take into account civil society, political culture, and socioeconomic conditions.
Introduction
Social scientists have developed a range of approaches to
measure the degree to which modern states are democratic. Although dictionaries commonly define democracy
very simply as "rule by the people," this definition alone has proved too vague to be used as an operational measure. Historically, both "rule" and "the people" have been very broadly defined. For example, the Soviet Union used its claim to have achieved relative socioeconomic equality to refer to itself as a "people's democracy," even though it was a one-party dictatorship. Alternatively, the United
States strongly identified itself as a political democracy,
even though slavery and later segregation in southern
states politically disenfranchised a large segment of its
African-American population. In both historical cases,
local understandings of democracy differed from those
commonly accepted by social scientists today.
This article begins by briefly outlining various conceptions of democracy, and how it is understood by most
social scientists. It then examines attempts by social scientists to develop procedural indices to compare political democracies and measure the degree to which
democratic ideals are realized in modern states. It also
explores how some scholars have created measures of
democracy that more fully take into account the roles
that civil society, political culture, and socioeconomic
forces have in creating a substantive democracy. In the
concluding section, I argue that future research on democracy should take into account an even broader range
Procedural Approaches
Joseph Schumpeter: Basic Premises of
the Procedural Approach
Joseph Schumpeter has been widely credited with pioneering the influential procedural approach toward
measuring democratic performance. He argued in 1947 against understanding democracy as rule by a general will that achieves a common good, and in favor of what he called a realist model, in which democracy is defined as "an institutional arrangement for arriving at political decisions in which individuals acquire the power to decide by means of a competitive struggle for the people's vote."
Schumpeter viewed democracy as a set of political institutions and procedures divorced from any broader ideal
conceptions of democracy. Later, social scientists augmented Schumpeter's minimalist procedural standard
by adding additional criteria for identifying and measuring political democracy. The following discussions review
some of these augmented procedural standards.
commonly recognized, liberal democratic values. Contemporary debates concerning the importance of minority rights and the democratic implications of different
types of representational institutions reveal dimensions
of liberal democracy not yet taken account of by most
procedural democratic measures.
Minority Rights
Liberal democratic theorists since World War II have
traditionally tended to argue that democratic freedom
and equality can best be ensured through the provision
of individual political rights and civil liberties. But recently some theorists have resurrected a tradition of liberal democratic thought that argues that for freedom and
equality to prosper in multinational (or even multicultural) societies, it is necessary to also require some
group-differentiated or minority rights. This has been
justified on the grounds that these group-differentiated
rights ensure that members of national minorities can
participate as equals in the greater society. For example,
laws that ensure the use of bilingual ballots or allow the
use of minority languages in the court system help to
ensure the protection of the political and civil rights of
minority groups. It has been argued that an additional
measure of the degree to which minority interests are
protected should be included in any comprehensive measure of liberal democratic performance.
The introduction of this additional measure of liberal
democracy, however, creates a paradox, namely, that as
minority rights are expanded, it is possible that some individual political and civil rights might be restricted. For
example, special representation rights in which historically marginalized groups are ensured a certain number
of seats in national legislatures can be interpreted as limiting the individual political rights of other citizens. Additionally, legislation that prioritizes the employment of
members of historically disadvantaged groups can be seen
to restrict the individual civil rights of citizens who are not
members of these groups. Not all democratic values are
necessarily mutually reinforcing. Nevertheless, multidimensional measures of democratic performance that distinguish differences among these value dimensions better
identify and measure these types of potential democratic
trade-offs.
Methodological Issues
Determining the exact point along any procedural measure of democracy at which a country should be designated
as democratic (or polyarchic) is ultimately subjective. In
fact, from a historical perspective, identifying what exactly
a political democracy consists of has proved to be a moving
target. In the 19th-century United States, political elites
(and elites in most other democracies of the time) typically assumed that the limited voting rights of women,
Blacks, and the poor did not compromise the democratic
credentials of their governments. By these typical 19th-century democratic standards, countries such as apartheid
South Africa, for example, could be described as fully functioning liberal democracies. Since this time, however, both
popular and scholarly democratic standards have evolved.
In fact, since the 18th century, there has not only been a steady broadening of who counts as "the people," but also a raising of the standards a political system must meet to be considered democratic.
Substantive Approaches to
Measuring Democracy: Economics,
Culture, and Civil Society
Critics of the procedural approach point out that these
measures typically overlook those components of democracy that exist largely outside of state structures. Liberal
democratic political institutions do not exist in a vacuum.
Socioeconomic conditions, cultural attitudes, and independent mutual-interest associations within a country's
civil society all influence the nature and quality of political
democracy in modern societies.
Further Reading
Beetham, D. (1999). Democracy and Human Rights. Polity
Press, Cambridge.
Bollen, K. (1993). Liberal democracy: Validity and method
factors in cross-national measures. Am. J. Pol. Sci. 37(4),
1207-1230.
Demography
John R. Weeks
San Diego State University, San Diego, California, USA
Glossary
age pyramid A graph of the number (or percentage) of
people in a population distributed by age and sex, in which
the ages are represented by bars stacked horizontally.
age and sex structure The distribution of a population
according to age and sex.
cohort data Data referring to people who share something in
common that is tracked over time; in demography this is
most often the year of birth, thereby producing age cohorts.
demographic balancing (or estimating) equation The
formula expressing that the population at time 2 is based
on the population at time 1 plus births minus deaths plus
the net migration between times 1 and 2.
fecundity The physiological capacity to reproduce.
fertility Reproductive performance rather than the mere capacity to do so; one of the three basic demographic processes.
infant mortality rate The number of deaths of infants under
1 year of age divided by the number of live births in that
year (and usually multiplied by 1000).
life expectancy The average duration of life beyond any age
of people who have attained that age, calculated from a life
table.
longevity The length of life, typically measured as the average
age at death.
migration The process of permanently changing residence
from one geographic location to another; one of the three
basic demographic processes.
period data Population data that refer to a particular year
and represent a cross section of the population at one
specific time.
population projection The calculation of the number of
people who could be alive at a future date, given the
number now alive and given reasonable assumptions about
future mortality, fertility, and migration.
sex ratio The number of males relative to the number of females in
a population.
standardization The adjustment of rates, such as the crude
death rate and crude birth rate, so that differences in age
structure between two populations are taken into account.
Introduction
Population growth is determined by the combination of
mortality, fertility, and migration, so all three processes
must be measured if we are to understand what kind
of demographic change is occurring in a given region.
However, because each of these three demographic
processes varies by age and sex, we must also know
how to measure the population structure according to
those characteristics. Change in population processes
and in the spatial distribution of the population are
influenced especially by changes in the social and economic structure of a society, which means that we must
also know the basic measures by which we track the
demographic characteristics of the population. Population change underlies many, if not most, of the social
changes occurring in the world today; it is both evolutionary and revolutionary. For this reason, methods of
projecting the population are important tools for assessing
alternate scenarios for the future of human society. Because population projections can employ all of the other
demographic methods, the methods for doing projections
are discussed last.
Basic Descriptors of
Population Growth
Human populations, like all living things, have the capacity for exponential increase rather than simple straight-line increase. We can express this growth in terms of the ratio of the population size at two different dates:

\[ \frac{p_2}{p_1} = e^{rn} \tag{1} \]

where r is the annual rate of growth and n is the number of years between the two dates. Growth can also be decomposed with the demographic balancing equation,

\[ p_2 = p_1 + b - d + im - om \]

where b is births, d is deaths, im is in-migration, and om is out-migration between the two dates. Solving Eq. (1) for the rate of growth gives

\[ r = \frac{\ln(p_2/p_1)}{n} \]

and, equivalently, for the elapsed time, \( n = \ln(p_2/p_1)/r \); so,

\[ \text{doubling time} = \frac{\ln 2}{r}. \]
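A short Python sketch of these calculations follows; the population figures are invented for illustration.

    import math

    def growth_rate(p1, p2, n):
        # average annual growth rate implied by Eq. (1)
        return math.log(p2 / p1) / n

    def doubling_time(r):
        # years needed for the population to double at constant rate r
        return math.log(2) / r

    # invented example: growth from 1.0 to 1.5 million over 20 years
    r = growth_rate(1_000_000, 1_500_000, 20)   # about 0.0203 (2.0% per year)
    print(doubling_time(r))                     # about 34 years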
The crude rate of natural increase (CNI) is the difference between the crude birth rate (CBR, see Eq. 17) and the crude death rate (CDR, see Eq. 6):

\[ \text{CNI} = \text{CBR} - \text{CDR}. \]

Measuring Mortality

The crude death rate (CDR) relates all of the deaths (d) in a year to the total mid-year population (p):

\[ \text{CDR} = \frac{d}{p} \times 1000 \tag{6} \]

Mortality is more informative when measured specifically by age and sex: the age/sex-specific death rate (\({}_nM_x\)) divides the deaths (\({}_nd_x\)) of people of a given sex between ages x and x + n by the mid-year population (\({}_np_x\)) of that group, often expressed per 100,000. Because crude rates confound mortality with age structure, comparisons across populations commonly standardize by weighting the age-specific rates by the proportions (\({}_nw^s_x\)) of a standard population in each age group: \( \sum_i {}_nw^s_x \, {}_nM_x \).
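Direct standardization is a one-line computation once the age-specific rates and standard weights are in hand; in the following Python sketch both sets of numbers are invented.

    def standardized_rate(rates, weights):
        # direct standardization: weight each age-specific rate by the share
        # of a standard population in that age group, then sum
        return sum(w * m for w, m in zip(weights, rates))

    # invented age-specific death rates and standard-population shares
    rates = [0.001, 0.002, 0.010, 0.060]
    weights = [0.25, 0.35, 0.25, 0.15]
    print(standardized_rate(rates, weights))   # a single age-adjusted rate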
Life Tables
Another very useful way of standardizing for age is to
calculate the expectation of life at birth, also known as
life expectancy. This measure is derived from a life table,
which was first used in 1662 by John Graunt to uncover
the patterns of mortality in London. Life expectancy can
be summarized as the average age at death for a hypothetical group of people born in a particular year and
being subjected to the risks of death experienced by
people of all ages in that year. An expectation of life at
birth for U.S. females in 1999 of 79.4 years does not mean
that the average age at death in that year for females was
79.4. What it does mean is that if all the females born in
the United States in the year 1999 have the same risks
of dying throughout their lives as those indicated by the
age-specific death rates in 1999, then their average age at
death will be 79.4. Of course, some of them would have
died in infancy whereas others might live to be 120, but
the age-specific death rates for females in 1999 implied
an average of 79.4. Note that life expectancy is based
on a hypothetical population, so the actual longevity of
a population is measured by the average age at death.
Because it is undesirable to have to wait decades to
find out how long people are actually going to live, the
hypothetical situation set up by life expectancy provides
a quick and useful comparison between populations. One
of the limitations of basing the life table on rates for a given
year, however, is that in most instances the death rates of
older people in that year will almost certainly be higher
than will be experienced by today's babies when they
reach that age. This is especially true for a country that
is in the midst of a rapid decline in mortality, but even in
the United States in the twenty-first century, current life
tables are assumed to underestimate the actual life expectancy of an infant by 5 or more years.
Life-table calculations, as shown in Table I for U.S.
females for 1999, begin with a set of age- and sex-specific
death rates, and the first step is to find the probability of
dying during any given age interval. Table I is called an
abridged life table because it groups ages into 5-year categories rather than using single years of age. The probability of dying (\({}_nq_x\)) between ages x and x + n is obtained by converting age/sex-specific death rates (\({}_nM_x\)) into probabilities. A probability of death relates the number of deaths during any given number of years (that is, between any given exact ages) to the number of people who started out being alive and at risk of dying. For most age groups, except the very youngest (less than 5) and oldest (85 and older), for which special adjustments are made, death rates (\({}_nM_x\)) for a given sex for ages x to x + n may be converted to probabilities of dying according to the following formula:

\[ {}_nq_x = \frac{n \, {}_nM_x}{1 + a \, n \, {}_nM_x} \]

where a is the average fraction of the age interval lived by those who die in it (commonly taken to be 0.5, on the assumption that deaths occur evenly across the interval). The life-table deaths between ages x and x + n are then

\[ {}_nd_x = {}_nq_x \, l_x \tag{10} \]
Table I Abridged Life Table for U.S. Females, 1999. Columns: age interval (x to x + n); number of females in the population (nPx); number of deaths in the population (nDx); age-specific death rate (nMx); probability of dying during the interval (nqx); number alive at the beginning of the interval (lx); number dying during the interval (ndx); number of years lived in the interval (nLx); number of years lived in this and all subsequent intervals (Tx); expectation of life at the beginning of the interval (ex).

Age     nPx         nDx      nMx      nqx      lx       ndx     nLx      Tx         ex
0-1     1,867,649   12,291   0.00658  0.00654  100,000  654     99,444   7,939,099  79.4
1-5     7,383,117   2,274    0.00031  0.00123  99,346   122     397,089  7,839,655  78.9
5-10    9,741,935   1,510    0.00016  0.00077  99,223   77      495,924  7,442,566  75.0
10-15   9,538,922   1,593    0.00017  0.00083  99,146   83      495,525  6,946,642  70.1
15-20   9,587,530   3,998    0.00042  0.00208  99,064   206     494,802  6,451,117  65.1
20-25   8,841,667   4,244    0.00048  0.00240  98,857   237     493,694  5,956,315  60.3
25-30   9,150,709   5,161    0.00056  0.00282  98,620   278     492,407  5,462,621  55.4
30-35   9,959,530   7,629    0.00077  0.00382  98,343   376     490,773  4,970,213  50.5
35-40   11,332,470  13,123   0.00116  0.00577  97,967   566     488,419  4,479,440  45.7
40-45   11,231,542  19,015   0.00169  0.00843  97,401   821     484,953  3,991,021  41.0
45-50   9,855,838   24,817   0.00252  0.01251  96,580   1,208   479,879  3,506,068  36.3
50-55   8,447,622   32,498   0.00385  0.01905  95,372   1,817   472,316  3,026,188  31.7
55-60   6,692,991   41,443   0.00619  0.03049  93,555   2,852   460,643  2,553,872  27.3
60-65   5,546,089   54,812   0.00988  0.04822  90,702   4,374   442,577  2,093,229  23.1
65-70   5,110,451   79,166   0.01549  0.07457  86,328   6,437   415,549  1,650,652  19.1
70-75   4,909,038   118,514  0.02414  0.11384  79,891   9,095   376,719  1,235,103  15.5
75-80   4,272,506   163,496  0.03827  0.17463  70,796   12,363  323,074  858,384    12.1
80-85   3,003,063   194,124  0.06464  0.27824  58,433   16,259  251,520  535,310    9.2
85+     2,934,837   436,152  0.14861  1.00000  42,175   42,175  283,790  283,790    6.7

Source: Death rate data from the U.S. National Center for Health Statistics; other calculations by the author.
and

\[ l_{x+n} = l_x - {}_nd_x \tag{11} \]

The number of years lived in the interval combines the years of those who survive the whole interval with the years contributed by those who die during it:

\[ {}_nL_x = n \, l_{x+n} + a \, n \, {}_nd_x \tag{12} \]

For the open-ended interval (ages 85 and older), the years lived are

\[ L_{85+} = \frac{l_{85}}{M_{85+}} \tag{13} \]

The total number of years lived in this and all subsequent age intervals is the cumulative sum

\[ T_x = \sum_{a \geq x} {}_nL_a \tag{14} \]

and the expectation of life at age x is then

\[ e_x = \frac{T_x}{l_x} \tag{15} \]
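The life-table recursion of Eqs. (10)-(15) can be written compactly. The following Python sketch assumes intervals supplied by the user, a radix of 100,000, and deaths spread evenly within each interval (a = 0.5); the rates in the usage example are invented.

    import numpy as np

    def life_table(n, M, a=0.5, radix=100_000):
        # abridged life table from age-specific death rates nMx;
        # the last interval is treated as open ended
        n = np.asarray(n, dtype=float)
        M = np.asarray(M, dtype=float)
        q = (n * M) / (1 + a * n * M)   # probabilities of dying
        q[-1] = 1.0                     # everyone dies in the open interval
        l = np.empty(len(M))
        l[0] = radix
        for i in range(len(M) - 1):
            l[i + 1] = l[i] * (1 - q[i])   # Eq. (11)
        d = q * l                          # Eq. (10)
        L = n * (l - d) + a * n * d        # Eq. (12)
        L[-1] = l[-1] / M[-1]              # Eq. (13)
        T = L[::-1].cumsum()[::-1]         # Eq. (14)
        e = T / l                          # Eq. (15)
        return q, l, d, L, T, e

    # toy four-interval table (0-1, 1-5, 5-50, 50+) with invented rates
    q, l, d, L, T, e = life_table(n=[1, 4, 45, 1], M=[0.007, 0.0004, 0.003, 0.08])
    print(round(e[0], 1))   # life expectancy at birth under these rates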
Measuring Fertility
Demographers use the term fertility to refer to actual
reproductive performance and use the term fecundity
to refer to the biological capacity. This can lead to some
confusion because the medical profession tends to use
the term fertility to refer to what demographers call
fecundity. For example, couples who have tried unsuccessfully for at least 12 months to conceive a child are
usually called infertile by physicians, whereas demographers would say that such a couple is infecund.
A woman is classified as having impaired fecundity if
she believes that it is impossible for her to have a baby,
if a physician has told her not to become pregnant because
the pregnancy would pose a health risk for her or her baby,
or if she has been continuously married for at least 36
months, has not used contraception, and yet has not
gotten pregnant.
Among women who are normally fecund and who regularly engage in unprotected sexual intercourse, the probability is very close to 1.0 that pregnancy will occur
over the course of 12 months. This varies somewhat by
age, however, with the probability peaking in the early 20s
and declining after that. Furthermore, women who are
lactating are much less likely to conceive than nonlactating women.
The measures of fertility used by demographers
attempt generally to gauge the rate at which women of
reproductive age are bearing live births. Because poor
health can lead to lower levels of conception and higher rates of fetal loss, fertility measures also indirectly reflect the health of a population. The simplest measure, the crude birth rate (CBR), relates the number of live births (b) in a year to the total mid-year population (p):

\[ \text{CBR} = \frac{b}{p} \times 1000 \tag{17} \]

The general fertility rate refines this by relating births to the population actually at risk of giving birth, women of reproductive age (here taken as the 30-year span beginning at exact age 15):

\[ \text{GFR} = \frac{b}{{}_{30}F_{15}} \times 1000 \tag{18} \]

When births are not directly recorded, the child-woman ratio can be computed from census data by relating young children to women of reproductive age:

\[ \text{CWR} = \frac{{}_5p_0}{{}_{30}p^f_{15}} \times 1000 \tag{19} \]
Finer measures are age specific. The age-specific fertility rate (ASFR) relates the births (\({}_nb_x\)) to women of a given age group to the mid-year number of women (\({}_np^f_x\)) in that group:

\[ \text{ASFR} = \frac{{}_nb_x}{{}_np^f_x} \times 1000 \tag{21} \]

Summing the age-specific rates across the reproductive ages, and multiplying by 5 when 5-year age groups are used, yields the total fertility rate:

\[ \text{TFR} = 5 \sum_x \text{ASFR}_x \tag{22} \]

Attention can be restricted to female births by means of the proportion of all births that are female,

\[ \frac{b^f}{b} \tag{23} \]

which converts the total fertility rate into the gross reproduction rate; weighting each age group additionally by life-table survivorship (the \({}_nL_x\) column, with its radix \(l_0\)) yields the net reproduction rate:

\[ \text{NRR} = \sum_x \frac{{}_nb^f_x}{{}_np^f_x} \cdot \frac{{}_nL_x}{l_0} \tag{24} \]
Measuring Migration
Migration is defined as any change in usual residence.
Although definitions of migration have a certain arbitrariness to them, the change is usually taken to mean that you
have stayed at your new residence for at least 1 year, and
implicit in the concept of migration is the idea that
a person has moved far enough so that the usual round
of daily activities in the new location does not overlap with
those of the old location. A person who changes residence
within the same area has experienced residential mobility,
but is not a migrant. In the United States, a person who
moves across county lines is typically considered to be
a migrant, whereas a move within county boundaries is
classified as residential mobility but not migration.
When data are available, migration is measured with
rates that are similar to those constructed for fertility and
mortality. Gross or total out-migration represents all people who leave a particular region during a given time
period (usually a year), and so the gross rate of outmigration (OMigR) relates those people to the total
mid-year population (p) in the region (and then we
multiply by 1000):
\[ \text{OMigR} = \frac{OM}{p} \times 1000 \tag{25} \]

Similarly, the gross rate of in-migration relates all in-migrants (IM) during the period to the mid-year population:

\[ \text{IMigR} = \frac{IM}{p} \times 1000 \tag{26} \]
Net migration combines the two flows; relative to the natural increase in the population (births minus deaths), it can be expressed as

\[ \frac{IM - OM}{b - d} \times 1000 \tag{30} \]
Sex Structure
Migration, mortality, and fertility operate differently to
create inequalities in the ratio of males to females, and this
measure is known as the sex ratio (SR):
\[ \text{SR} = \frac{M}{F} \times 100 \tag{31} \]
A sex ratio that is greater than 100 thus means that there
are more males than females, whereas a value of less
than 100 indicates that there are more females than
males. The ratio can obviously be calculated for the
entire population or for specific age groups.
Age Structure
At any given moment, a cross section of all cohorts defines
the current age strata in a society. Figure 1 displays the way successive birth cohorts move through the age structure over time.

[Figure 1: a cohort diagram plotting age (0 to 6) against time in years (2000 to 2006), with diagonal lines (such as those labeled A and C) tracing birth cohorts as they age.]
Age Pyramids
An age pyramid (also known as a population pyramid) is
a graphical representation of the distribution of
a population by age and sex. It can graph either the
total number of people at each age or the percentage
of people at each age. It is called a pyramid because
the classic picture is of a high-fertility, high-mortality society (which characterized most of the world until only
several decades ago) with a broad base built of numerous
births, rapidly tapering to the top (the older ages) because
of high death rates in combination with the high birth rate.
Nigeria's age and sex structure as of the year 2000 reflects
the classic look of the population pyramids, as shown in
Fig. 2A. Developed countries such as the United States
have age and sex distributions that are more rectangular or
barrel-shaped (see Fig. 2B), but the graph is still called an
age pyramid.
[Figure 2 Age pyramids. (A) Nigeria 2000. (B) United States 2000. Each panel plots males (millions) to the left and females (millions) to the right, in 5-year age groups from 0-4 up to 95+. Reproduced from Weeks, Population: An Introduction to Concepts and Issues, 8th Ed., copyright 2002 Wadsworth, a division of Thomson Learning, by permission.]
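An age pyramid is straightforward to draw with horizontal bars. The following matplotlib sketch uses invented counts, truncated to five age groups for brevity, and plots males to the left simply by negating their values.

    import numpy as np
    import matplotlib.pyplot as plt

    ages = ["0-4", "5-9", "10-14", "15-19", "20-24"]   # truncated for brevity
    males = np.array([9.5, 8.9, 8.2, 7.6, 7.1])        # invented counts (millions)
    females = np.array([9.1, 8.6, 8.0, 7.4, 7.0])

    y = np.arange(len(ages))
    plt.barh(y, -males, label="Males")      # males plotted to the left of zero
    plt.barh(y, females, label="Females")   # females to the right
    plt.yticks(y, ages)
    plt.xlabel("Population (millions)")
    plt.legend()
    plt.show()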
population. The higher this ratio is, the more people each
potential worker has to support; conversely, the lower it is,
the fewer people each worker has to support. The DR is
calculated as follows:
\[ \text{DR} = \frac{{}_{15}p_0 + p_{65+}}{{}_{50}p_{15}} \times 100 \tag{32} \]

that is, the number of people under age 15 plus the number aged 65 and older, relative to the working-age population aged 15 to 64.
Population Projections
A population projection is the calculation of the number of
people who could be alive at a future date given the number now alive and given assumptions about the future
course of mortality, fertility, and migration. In many respects, population projections are the single most useful
set of tools available in demographic analysis. By enabling
researchers to see what the future size and composition of
the population might be under varying assumptions about
trends in mortality, fertility and migration, it is possible
intelligently to evaluate what the likely course of events
might be many years from now. Also, by projecting the
population forward through time from some point in
history, it is possible to speculate on the sources of change
in the population over time. It is useful to distinguish projections from forecasts. A population forecast is a statement about what the future population is expected to be.
This is different from a projection, which is a statement
about what the future population could be under a given
set of assumptions. There are two principal ways to project
populations: (1) extrapolation methods and (2) the cohort
component method.
Extrapolation
Extrapolation methods are an adaptation of Eq. (1). They assume that some rate of growth will hold constant between the base year (P1, the population in the beginning year of a population projection) and the target year (the year to which a population is projected forward in time).
\[ p_{\text{target}} = p_{\text{base}} \, e^{rn} \tag{33} \]

where n is the number of years between the base year and the target year.
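In code, the extrapolation of Eq. (33) is a one-liner; the base population and growth rate below are invented.

    import math

    def extrapolate(p_base, r, years):
        # Eq. (33): constant-growth extrapolation to the target year
        return p_base * math.exp(r * years)

    print(extrapolate(1_500_000, 0.0203, 10))   # about 1.84 million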
Further Reading
Chiang, C. (1984). The Life Table and Its Applications.
Krieger, Melbourne, FL.
Hinde, A. (1998). Demographic Methods. Oxford University
Press, New York.
Murdock, S. H., and Ellis, D. R. (1991). Applied Demography:
An Introduction to Basic Concepts, Methods, and Data.
Westview Press, Boulder, CO.
Palmore, J., and Gardner, R. (1983). Measuring Mortality,
Fertility, and Natural Increase. East-West Population
Institute, Honolulu, HI.
Plane, D., and Rogerson, P. (1994). The Geographical Analysis
of Population: With Applications to Planning and Business.
John Wiley & Sons, New York.
Preston, S. H., Heuveline, P., and Guillot, M. (2001).
Demography: Measuring and Modeling Population
Processes. Blackwell Publishers, Oxford.
Descriptive and Inferential Statistics
Rebecca Yang
University of Texas, Dallas, Texas, USA
Glossary
bootstrapping A resampling procedure to empirically estimate standard errors.
central tendency The dominant quantitative or qualitative
trend of a given variable (commonly measured by the mean,
the median, the mode, and related measures).
confidence interval A numeric range, based on a statistic
and its sampling distribution, that contains the population
parameter of interest with a specified probability.
data A plural noun referring to a collection of information in
the form of variables and observations.
descriptive statistics Any of numerous calculations that
attempt to provide a concise summary of the information
content of data (for example, measures of central tendency
and measures of dispersion).
dispersion The tendency of observations of a variable to
deviate from the central tendency (commonly measured by
the variance, the standard deviation, the interquartile
range, etc.).
inferential statistics The science of drawing valid inferences
about a population based on a sample.
level of measurement A characterization of the information content of a variable; for example, variables may be qualitative (nominal or ordinal) or quantitative (interval or ratio).
parameter The true value of a descriptive measure in the
population of interest.
population The total collection of actual and/or potential
realizations of the unit of analysis, whether observed or not.
sample A specific, finite, realized set of observations of the
unit of analysis.
sampling distribution A theoretical construct describing the
behavior of a statistic in repeated samples.
statistic A descriptive measure calculated from sample data to
serve as an estimate of an unknown population parameter.
Descriptive Statistics
A number of terms have specialized meaning in the domain of statistics. First, there is the distinction between a population and a sample.
Forms of Data
All statistics are based on data, which are composed of one
or more variables that represent the characteristics of one
or more of the type of thing being studied. A variable
consists of a defined measurement. The type of thing
on which the measurements are taken is called the unit
of analysis. For example, the unit of analysis could be
individual people, but it could also be families, households, neighborhoods, cities, nations, or galaxies.
The collection of all measurements for one realization
of the unit of analysis is typically called an observation. If
there are n observations and k variables, then the data set
can be thought of as a grid with n × k total items of information, although more complex structures are possible.
It is absolutely central to conducting and interpreting
data analysis to be clear about the unit of analysis. For
example, if one observes that 20% of all crimes are violent
in nature, it does not imply that 20% of all criminals are
violent criminals. Crimes, though obviously related, are
simply a different unit of analysis than criminals; a few
really violent criminals could be committing all the violent
crimes, so that less than 20% of criminals are violent in
nature.
Levels of Measurement
A variable is something that varies between observations,
at least potentially, but not all variables vary in the same
way. The specific values that a variable can take on, also
known as attributes, convey information about the
differences between observations on the dimension measured by the variable. So, for example, two persons differ
on the variable income if one's income is $100,000 and the other's is $200,000. They also differ on the variable sex if one is male and the other is female. But the nature of the information conveyed by the two variables is quite different. For one thing, the incomes can be ranked, but the individuals' sexes cannot be ranked. One can say that one income is higher than the other, but not that one sex is higher than the other.
The issue of coding of categorical variables merits further discussion. Frequently, the attributes of a categorical
variable are represented numerically. For example,
a variable for region may be coded in a data set as follows:
1 represents north, 2 represents south, and so on. It is
essential to understand that these numbers are arbitrary
and serve merely to distinguish one category from
another.
observe the temporal order of events for individual persons, strengthening causal inferences that the researcher
may wish to draw.
With these basic concepts in hand, the most common
descriptive statistics can be explored.
Measures of Variability
Variety may be the spice of life, but variation is the very
meat of science. Variation between observations opens
the door to analysis of causation and ultimately to understanding. To say that variation is important for data analysis is an understatement of the highest order; without
variation, there would be no mysteries and no hope of
unraveling them. The point of data analysis is to understand the world; the point of understanding the world is
the differences between things. Why is one person rich
and another poor? Why is the United States rich and
Mexico poor? Why does one cancer patient respond to
a treatment, but another does not?
It is no accident, therefore, that this most central pillar
of science has a variety of measures that differ both in the
details of their calculations and, more importantly, in their
conceptual significance.
In the world of measures of variability, also called
dispersion or spread tendency, the greatest divide runs
between measures based on the position of observations
versus measures based on deviations from a measure of
central tendency, usually the arithmetic mean. A third,
less common, group of measures is based on the
frequency of occurrence of different attributes of
a variable.
Positional measures of variation are based on percentiles of a distribution; the xth percentile of a distribution is
defined as the value that is higher than x% of all the
observations. The 25th percentile is also known as the
first quartile and the 75th percentile is referred to as
the third quartile. Percentile-based measures of variability are typically paired with the median as a measure
of central tendency; after all, the median is the 50th
percentile.
Deviation-based measures, in contrast, focus on
a summary of some function of the quantitative distance
of each observation from a measure of central tendency.
Such measures are typically paired with a mean of one sort
or another as a measure of central tendency. As in the case
of central tendency, it is impossible to state whether position-based or deviation-based measures provide the best
measure of variation; the answer will always depend on
the nature of the data and the nature of the question
being asked.
Measures of Variability Based on Position
For quantitative variables, the simplest measure of variability is the range, which is the difference between the
maximum and minimum values of a variable. For obvious
reasons, this measure is very sensitive to extreme values.
Better is the interquartile range, which is the distance between the 25th and 75th percentiles. As such, it is completely insensitive to any observation below the first quartile or above the third.
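For readers who compute these measures, a short Python sketch (with hypothetical data) shows how a single extreme value inflates the range while leaving the interquartile range untouched:

```python
import numpy as np

x = np.array([3, 5, 7, 8, 12, 13, 14, 18, 21, 90])  # note the extreme value 90

data_range = x.max() - x.min()       # range = 87, dominated by the outlier
q1, q3 = np.percentile(x, [25, 75])  # first and third quartiles
iqr = q3 - q1                        # interquartile range ignores the extremes
print(data_range, iqr)
```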
Measures of Variability Based on Deviations
Deviation-based measures start from the average deviation of the observations from the mean,
\[ \frac{\sum_{i=1}^{N} (X_i - \mu)}{N}. \]
Because the positive and negative deviations cancel out,
measures of variability must dispense with the signs of
the deviations; after all, a large negative deviation from
the mean is as much of an indication of variability as
a large positive deviation.
In practice, there are two methods to eradicate the
negative signs: either taking the absolute value of the
deviations or squaring the deviations. The mean absolute
deviation is one measure of deviation, but it is seldom
used. The primary measure of variability is, in effect,
the mean squared deviation. For a population, the variance of the variable X, denoted by $\sigma_X^2$, is defined as:
\[ \sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}. \]
However, the units of the variance are different than the
units of the mean or the data themselves. For example,
the variance of wages is in the units of dollars squared,
an odd concept. For this reason, it is more common for
researchers to report the standard deviation, denoted by $\sigma_X$, which for a population is defined as:
\[ \sigma_X = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}. \]
Unlike the formulas for the population and sample
means, there is an important computational difference
between the formulas for the variance and standard
deviation depending on whether one is dealing with
a population or a sample. The formulas for the sample
variance and sample standard deviation are as follows:
\[ S_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}, \qquad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}. \]
Note that the divisor in these calculations is the sample
size, n, minus 1. The reduction is necessary because the
calculation of the sample mean used up some of the information that was contained in the data. Each time an
estimate is calculated from a fixed number of observations, 1 degree of freedom is used up. For example, from
a sample of 1 person, an estimate (probably a very bad
one) of the mean income of the population could be obtained, but there would be no information left to calculate
the population variance. One cannot extract two estimates from one data point. The correction for degrees of freedom adjusts the divisor to reflect this loss of information.
For categorical variables, measures of variability are based on the proportions $p_k$ of observations falling in each of the $K$ categories, through the quantity
\[ \sum_{k=1}^{K} p_k^2. \]
Inferential Statistics
On October 7, 2002, the New York Times reported that
"Two-thirds of Americans say they approve of the United States using military power to oust [Iraqi leader Saddam] Hussein." This is an extraordinary claim, since it is obviously impossible to know precisely what 290 million people say about anything without conducting a full-scale census of the population. The claim by the Times was based on a telephone survey of only 668 adults nationwide, meaning that the Times did not know what the remaining 289,999,332 Americans actually had to say about
the impending war. To claim to know what is on the mind
of the country as a whole from such a small sample seems like utter madness or hubris run amok, even if the Times admits
that their poll has a margin of sampling error of plus or
minus four percentage points.
In fact, they have a very sound basis for their claims, and under the right conditions they can indeed make valid inferences about the population as a whole from their sample. Much of what one thinks one knows about this country's people and their attitudes (the poverty rate, the unemployment rate, the percentage who believe in God, the percentage who want to privatize Social Security, etc.) is information based on surveys of tiny fragments of
the total population. This section briefly describes the
essential logic of sampling theory and subsequent sections
illustrate some of the most important applications.
Sampling Theory
Suppose one has a large population and one wishes
to estimate the value of some variable in that population.
The problem is that for financial or practical reasons, one
can draw only a small sample from the population. There
are parameters of the population that are unknown, such
as the population mean, m, and the population standard
deviation, s. Even the total size of the population, N, may
not be known exactly. Through some selection process,
one draws a sample. Based on the sample, one calculates
the sample statistics, such as the sample mean and
the sample standard deviation. The sample size, n, is
known, but typically is minuscule compared to the
population size.
The key question, as illustrated in Fig. 1, is how can one
make a valid inference about the population parameters
from the sample statistics? One usually does not care at all
about the sample itself. Who cares what 668 individual
people think about the prospect of war? Only the population values are of interest.
The first step is to understand the process for drawing
the sample. If the sample is drawn in a biased fashion,
one will not be able to draw any valid inference about the population parameters. If one's sample is drawn from the population by simple random sampling, however, each possible sample is equally likely to be selected.
Figure 1 A population with unknown parameters (N = ?, μ = ?, σ = ?) yields, through a selection process, a sample of known size n with computable statistics X̄ and s.
Consider, for example, a box of 55 numbered cards from which 3 cards are drawn at random; there are 26,235 total combinations of
3 cards that can be drawn from the box. Figure 2 shows the
sample space, containing all possible outcomes of the process of drawing 3 cards; each is equally likely and each
outcome has a sample mean associated with it. All of
these possible outcomes, considered together, make up
the sampling distribution of the sample mean, that is,
the probability distribution for the variable that records
the result of drawing 3 cards from the box of 55 cards.
What is the mean of the distribution of these potential
sample means? The Central Limit Theorem, the most
important finding in all of statistics, provides the answer.
The mean of the sample means is the population mean.
Symbolically,
\[ \mu_{\bar{X}} = \mu. \]
On the one hand, this is very good news. It says that any
given sample mean is not biased. The expectation for
a sample mean is equal to the population parameter one
is trying to estimate. On the other hand, this information
is not very helpful. The expectation of the sample mean
just tells one what one could expect in the long run if
one could keep drawing samples repeatedly. In most
cases, however, one is able to draw only one sample.
This one sample could still be high or low and thus it
seems that no progress has been made.
However, progress has been made. The sample mean,
as a random variable, also has a variance and a standard
deviation. The Central Limit Theorem also states that:
\[ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}; \quad \text{therefore,} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}. \]
This formula states that the degree of variation in sample means is determined by only two factors, the underlying variability in the population and the sample size. If
there is little variability in the population itself, there will
be little variation in the sample means. If all the students in the class are 24 years of age, each and every sample drawn from that population will also have a mean age of 24; if there is great
variability in the ages of the students, there is also the
potential for great variability in the sample means, but that
variability will unambiguously decline as the sample size,
n, grows larger. The standard deviation of the sample
means is referred to as the standard error of the mean.
Note that the size of the population does not matter,
just the size of the sample. Thus, a sample size of 668 out
of 100,000 yields estimates that are just as accurate as
samples of 668 out of 290 million.
The Central Limit Theorem provides one additional
piece of information about the distribution of the sample
mean. If the population values of the variable X are normally distributed, then the distribution of the sample
mean of X will also be normally distributed. More importantly, even if X is not at all normally distributed, the
distribution of the sample mean of X will approach normality as the sample size approaches infinity. As a matter
of fact, even with relatively small samples of 30 observations, the distribution of the sample mean will approximate normality, regardless of the underlying distribution
of the variable itself.
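This convergence is easy to verify by simulation. The following Python sketch draws repeated samples of 30 from a decidedly non-normal (exponential) population; the scale parameter and number of repetitions are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=10.0, size=(10_000, 30))  # 10,000 samples of n = 30
sample_means = samples.mean(axis=1)

print(sample_means.mean())       # close to the population mean, 10
print(sample_means.std(ddof=1))  # close to sigma / sqrt(n) = 10 / sqrt(30) ~ 1.83
# A histogram of sample_means looks approximately normal despite the
# skewed population.
```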
One thus has a complete description of the probability
distribution of the sample mean, also known as the sampling distribution of the mean. The sampling distribution
is a highly abstract concept, yet it is the key to understanding how one can draw a valid inference about a population of 290 million from a sample of only 668
persons. The central point to understand is that the
unit of analysis in the sampling distribution is the sample
mean. In contrast, the unit of analysis in the population
and the sample is, in this case, people. The sample mean,
assuming a normally distributed population or a large
enough sample, is a normally distributed random
variable.
This property of the sample mean enables one to
make fairly specific statements about how often and
by how much it will deviate from its expectation, the
true population mean. In general, a normally distributed
random variable will take on values within 1.96 standard deviations of the variable's mean approximately 95% of
the time. In this case, the standard deviation of sample
means is called the standard error, and the mean of the
sample means is equal to the underlying population
mean.
Concretely, the normality of the sampling distribution implies that the probability is 0.95 that a randomly
chosen sample will have a sample mean that is within
1.96 standard errors of the true population mean, assuming that the conditions for normality of the sampling
distribution are met. And therefore it follows, as night
follows day, that the probability must also be 0.95 that the true population mean is within 1.96 standard errors of whatever sample mean one obtains in a given sample.
Mathematically,
\[ \Pr\!\left[ \mu - 1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\,\frac{\sigma}{\sqrt{n}} \right] = 0.95 \]
implies that
\[ \Pr\!\left[ \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \right] = 0.95. \]
The resulting 95% confidence interval is reported, for example, as $\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$.
Figure Probability theory connects the population to the sampling distribution; statistics calculated from the sample connect it back to the population.
Table I Sampling Distributions of Common Estimators
Estimator | Mean | Standard error | Distribution
Sample mean: $\bar{X}$ | $\mu_{\bar{X}} = \mu$ | $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ | Normal if X is normal or n is large
Sample proportion: $\hat{P}$ | $\mu_{\hat{P}} = P$ | $\sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}$ | Normal if n is large
Difference of means from 2 independent samples: $\bar{X}_1 - \bar{X}_2$ | $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ | $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ | Normal if both variables are normal in the population or if both samples are large
Difference of proportions from 2 independent samples: $\hat{P}_1 - \hat{P}_2$ | $\mu_{\hat{P}_1 - \hat{P}_2} = P_1 - P_2$ | $\sigma_{\hat{P}_1 - \hat{P}_2} = \sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$ | Normal if both samples are large
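The entries in Table I suffice to reproduce the margin of error the Times reported for its poll of 668 adults. A short Python check, using the conservative value P = 0.5 that maximizes the standard error of a proportion:

```python
import math

n, p = 668, 0.5
standard_error = math.sqrt(p * (1 - p) / n)  # from Table I: sqrt(P(1 - P)/n)
margin = 1.96 * standard_error
print(round(100 * margin, 1))  # ~3.8 percentage points: "plus or minus four"
```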
Further Reading
Blalock, H. M., Jr. (1972). Social Statistics. McGraw-Hill, New
York.
Everitt, B. S. (1998). The Cambridge Dictionary of Statistics.
Cambridge University Press, Cambridge, UK.
Freund, J. E., and Walpole, R. (1992). Mathematical Statistics, 5th Ed. Prentice-Hall, Englewood Cliffs, NJ.
Glossary
adaptive (surface specific) Close correspondence of digital elevation model pattern and density to critical ground-surface features.
critical feature Any of the six fundamental elements of
ground-surface geometry: a peak, pit, pass, pale, ridge, or
course (channel).
digital elevation model (DEM) Any set of terrain heights
defining a topographic surface; commonly arranged in
a square grid, but also a triangulated irregular network or
digitized contour or slope lines.
facet A planar or moderately curved area of sloping terrain,
especially where delimited by explicit rules or procedures.
geometric signature A set of measures describing topographic form sufficiently well to distinguish different
landscapes or landforms.
geomorphometry (morphometry) The numerical characterization of topographic form.
shaded relief (analytical hill shading) A depiction of
topography by varied intensity of reflected light calculated
from slope gradient, aspect, and the location of a simulated sun.
slope line (flow line) A topographic path of steepest descent,
normal to height contours.
triangulated irregular network (TIN) A digital elevation
model comprising heights located at the vertices of triangles
that vary in size and shape commensurate with terrain
complexity.
viewshed The ground area, computed from a digital elevation
model, visible from a specified location.
Digital terrain modeling (DTM), or simply terrain modeling, is a modern approach to the measurement of
Earths surface form. Loosely applied to any computer
manipulation of terrain height (elevation), the term is variously defined (for example, as the representation of the ground surface in digital form).
Introduction
An amalgam of Earth science, computer science, engineering, and mathematics, digital terrain modeling (DTM)
is a comparatively new field, paralleling the development
of digital cartography and geographic information systems
(GIS). DTM provides hydrologic analyses, base maps
for plotting nontopographic information, and other visual
and numerical representations of the ground surface for
civil engineering, national defense, agriculture, resource
management, and education.
Cognate Disciplines
Geomorphometry, the overarching practice of terrain
measurement, includes landforms; discrete features
such as watersheds, landslides, and volcanoes; and landscapes, or continuous topography. DTM principally automates the quantification of landscapes. Originating in
the descriptive geometry of curved surfaces by 19thcentury French and English mathematicians and the
measurement of mountain ranges (orometry) by German
geographers, DTM evolved from disciplines that predate the electronic computer. Two of these are the engineering-directed quantitative terrain analysis and the process-oriented quantitative geomorphology. DTM includes the
computer animation of topographic display, or terrain
rendering; operationally, it resembles metrology, the
measurement of industrial surfaces.
Terrain Geometry
The continuous surface that DTM must approximate
from a finite sample of discrete heights is complex but
not without order. Topography consists of repeated occurrences of a few geometric elements (Fig. 1), six of
which (four points and two lines) are known as critical
features. A peak is a height maximum; a pit is a local
minimum, or closed depression; a pass, or saddle, is
a low point between two peaks; a pale is a high point
between two pits. Two operational constructs, contours (intersections of the terrain with a surface of constant height) and slope lines (paths of steepest descent, normal to height contours), complete the geometric description of the surface.
Figure 1 Some elements of surface form in terrain recontoured to a 10-m interval (source-map interval, 20 ft) from
a square-grid 10-m digital elevation model. 1, Peak; 2, pit; 3, pass;
4, pale; 5, ridge; 6, course; 7, hill; 8, valley; 9, watershed.
A 3.3-km × 2.1-km area near Oakland, California, prepared in
GIS software. Northwest trend reflects local geologic structure
and fault lines. Rectangle locates Fig. 3.
Figure 2 Profile and plan curvature of hillslopes; the panels contrast slopes with no curvature and convex curvature in profile, and with concave curvature in plan.
DEM Structures
Terrain heights usually are arranged in a data structure for
efficient manipulation and display (Fig. 3). As in GIS
generally, the spatial arrangement is either raster (grid
cell) or vector (pointlinepolygon). A DEM may be
arrayed in a square grid (hexagons or equilateral triangles are rare), a triangulated irregular network (TIN), or as height contours or slope lines. All data structures are
compromises, each with advantages and drawbacks.
Square (actually rectangular) grid DEMs mimic the
file structure of digital computers by storing height Z
as an array of implicit X and Y coordinates (Fig. 3A).
Although a grid facilitates data processing, algorithm
development, and registration with spacecraft images,
its discretized (discontinuous) structure and regular
spacing do not accommodate the variable density of terrain features. Many data points are redundant. Grids can
adapt somewhat to topography by recursively subdividing
squares in complex areas, but at a loss of computational
efficiency. The TIN structure (Fig. 3B) is a DEM interpolated from critical features that are extracted manually
from maps or by computer from a grid or contour DEM.
The irregularly distributed heights are vertices of triangles
Figure 3 Digital elevation model structures for the 780-m × 640-m rectangle in Fig. 1. Dots are height locations. (A) Square grid (every fifth point); (B) triangulated irregular network; (C) intersections of 20-m contours (heavy lines) with slope lines (light).
DEM Sources
Most DEMs are produced and distributed as square grids.
Early DEMs were created by field survey, visual interpolation of topographic maps, or semiautomated tracing
of contours coupled with computer interpolation. These
methods were replaced by photogrammetric profiling and
then by optical scanning and automated interpolation of
map contours (Fig. 1). DEMs now cover all of Earth,
Earths seafloor, and the planet Mars. The global
GTOPO30, compiled from many maps and several raster and vector sources, has a 30'' (about 1 km) spacing. The digital
terrain elevation data (DTED) system is the U.S. military
equivalent. Global coverage also exists at 5- and 10-km
spacing. The U.S. National Elevation Dataset (NED) is
a seamless 1'' (30 m) DEM (Alaska is at 2'') assembled
from all 55,000 1 : 24,000- and 1 : 63,360-scale topographic maps. Japan is gridded at 50 m, Australia is at
9'' (250 m), and other areas are at various resolutions.
Remote sensing bypasses contour maps and generates
DEMs from direct measurement of terrain height. Among
current technologies are the Global Positioning System
(GPS), digital photogrammetry, synthetic-aperture radar interferometry (InSAR), and laser altimetry (LiDAR).
DEM Preprocessing
Data Limitations
Regardless of source or spatial structure, all DEMs contain systematic or random flaws. Most error in DEMs
derived from contours originates in the maps, which
were not intended to provide heights of the high density
and accuracy desirable for DTM. Because contour maps
are approximations of terrain, just as DEMs are of the
maps, the level of quality guaranteed by official map
standards is only statistical. Locally, contour accuracy
can be low; precision in most level terrain is poor. Contour-to-grid processing is a second source of error. All
interpolation procedures are compromises, and some algorithms add spurious pits (feature 2 in Fig. 1), terracing,
and linear artifacts. Advanced technology does not ensure DEM quality; the U.S. 1'' SRTM data are no more accurate
than the 1'' National Elevation Dataset (NED). InSAR,
LiDAR, and other remote-sensing techniques all introduce errors, some of them severe, that are unique to
their technologies.
Data Refinement
DEMs generally must be calibrated, amended, or otherwise prepared for subsequent analysis. Computer processing can create a TIN or grid DEM from scattered
heights, or convert from one map projection or datum to another.
Modeling Landscapes
DTM represents continuous topography by a spatial calculation shared with digital image processing, the neighborhood operation, in which a result is computed from
adjacent input values. The input from a grid DEM is
a compact array of heights (a subgrid or window) moved
through the data in regular increments. The default subgrid is 3 × 3 (the result is assigned to the central cell), but
may be any size or shape. The result is assigned to each
triangle in a TIN or to each quadrilateral facet in a DEM
defined by contour and slope lines (Fig. 3). Neighborhood
operations characterize topography in three, broadly
overlapping, domains: vertical, or relief (Z); horizontal,
or spatial (X, Y); and three-dimensional (X, Y, Z). The
spatial pattern and frequency distribution of the derived
attributes can vary greatly with DEM resolution.
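As a concrete illustration, the following Python sketch implements one simple neighborhood operation, slope gradient from centered finite differences on a square-grid DEM; production systems typically use Horn's method or similar kernels, and the grid values here are hypothetical:

```python
import numpy as np

def slope_gradient(dem: np.ndarray, cell: float) -> np.ndarray:
    """Slope gradient (degrees) at interior cells of a square-grid DEM."""
    dz_dx = (dem[1:-1, 2:] - dem[1:-1, :-2]) / (2 * cell)  # east-west difference
    dz_dy = (dem[2:, 1:-1] - dem[:-2, 1:-1]) / (2 * cell)  # north-south difference
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

dem = np.array([[10., 11., 13.],
                [10., 12., 14.],
                [11., 13., 15.]])
print(slope_gradient(dem, cell=10.0))  # slope at the single interior cell, ~12.6 deg
```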
Relief Measures
Because terrain height is measured from an absolute
datum, Z-domain calculations are the most straightforward. Standard statistics describe height and its first
and second derivatives, slope gradient (Fig. 4), and profile
curvature (Fig. 2). The higher derivatives require very
accurate DEMs. Mean, median, and modal height express
central tendency; height range, or local relief, and standard
deviation describe dispersion. Skewness and its approximation, the elevation-relief ratio (the hypsometric integral), express height (and volume) asymmetry. Kurtosis,
a measure of distribution peakedness or the influence
of extreme values, is rare in modeling height and of
uncertain significance for slope or curvature.
Spatial Measures
Parameters of terrain pattern and texture, unreferenced
to an absolute datum, are more abstract. Derivatives of
orientation, distance, and connectivity, three fundamental qualities of location, include shape, elongation, spacing, sinuosity, parallelism, adjacency, sequence (relative
position), and containment (a feature within another).
Among common X, Y measures computed from neighborhood operations are aspect (Fig. 4), the compass direction
faced by a slope, and plan curvature (Fig. 2). Nearestneighbor statistics and other metrics express qualities
such as uniformity of features (clustered, regular, or random), to which standard statistics do not apply. Some
spatial properties are independent of distance and thus
topologic rather than geometric. Not only are these non-Euclidean properties difficult to quantify, but the X, Y
topology imprinted on most terrain by hierarchically ordered drainage is random. Analysis of critical features by
graph theory, equally applicable where the fluvial overprint is absent or faint, offers an approach to quantifying
topologic attributes.
Three-Dimensional Measures
Processing DEMs in the X, Y, Z domain captures the most
complex properties of terrain. Among them are roughness, which comprises both relief and spacing; intervisibility, which quantifies line-of-sight and calculates
a viewshed, the area visible to an observer; openness,
the exposure or enclosure of a location; and anisotropy,
the variance of terrain complexity with azimuth. All
Z-domain attributes vary with the size of the area over
which they are measured. This scale dependence is quantified by the power spectrum, a technique of waveform
analysis; by semivariance, the correlation of relief with
distance; and by the fractal dimension, a measure of
the proportion of fine- to coarse-scale relief.
DTM Applications
Visualization
Diagrams, tables, profiles, and maps all display DTM
results. Histograms show the frequency distribution of
terrain measures, and semivariograms and spectral
functions show their scale dependence. Tables that are
output from correlation and factor analysis reveal redundancy among measures. Though topography may be illustrated in cross-section, by profiles, it is best displayed as
thematic maps in color or gray tone (Figs. 4 and 5). The
most elementary DTM map, showing height contoured in
wide intervals, can be enhanced substantially by adding
three-dimensional perspective or land-cover detail from
an aerial photo or satellite image. Height maps are used
widely as a base for nontopographic information, and vast
resources are committed to the video rendering of height
for computer games and military simulations. The technique of relief shading creates a particularly effective base
map by combining slope gradient and aspect (Fig. 4) to
portray topography as the varied intensity of reflected
illumination (Fig. 5).
Aggregation
DTM may combine several maps of terrain measures
or may incorporate nontopographic data.
Hydrologic Modeling
Ecology and water resources drive much current DTM.
Parameters of hydrologic behavior that are mapped from
DEMs include watershed area (Table I), specific catchment area (As, the upslope area between slope lines, per
unit contour width), the topographic wetness index [ln(As/S), an estimate of soil moisture], and the stream power index (As·S, an estimate of erosive force, where S is
slope gradient). Drainage networks extracted from
a DEM help simulate watershed hydrographs, measure
stream flow, forecast the extent of flooding, and estimate
sediment delivery. The automated techniques delineate
stream channels, by sequentially recording DEM grid
cells along convergent flow paths (Fig. 2) to accumulate
drainage area with decreasing elevation, and then identify
the enclosing ridges (Fig. 7). Fidelity of the resulting
drainage net depends on the algorithm and DEM quality
and structure. Where these procedures organize continuous terrain into a mosaic of watersheds (Fig. 7), the
extracted landforms can be analyzed further as discrete
entities.
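The wetness and stream power indices defined above are simple elementwise calculations once As and S are in hand. A Python sketch with hypothetical values standing in for the output of a flow-accumulation algorithm:

```python
import numpy as np

# Hypothetical per-cell values; a flow-accumulation routine would derive
# these from a DEM.
specific_catchment_area = np.array([50.0, 400.0, 2500.0])  # As, per unit contour width
slope = np.array([0.30, 0.10, 0.02])                       # S, tangent of gradient

wetness_index = np.log(specific_catchment_area / slope)    # ln(As/S): higher = wetter
stream_power = specific_catchment_area * slope             # As*S: erosive force
print(wetness_index, stream_power)
```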
Modeling Landforms
DTM of watersheds and other landforms can reveal correlations with physical processes that explain their origin and morphologic evolution. Linking recognition of a landform with its physical understanding requires defining the feature, sampling its population, choosing and measuring descriptive parameters, analyzing the data, and interpreting the results. A sample of landforms extracted from a DEM, by rule-based procedures, must be large enough to include all sizes and morphologic subtypes. Descriptive measures should reduce landforms to basic dimensions of height, breadth, area, and their derivatives (slope, curvature, and symmetry). Additional rules maintain objectivity in measuring dimensions and recording presence or absence of qualitative attributes. Though symmetric forms such as volcanoes and impact craters are captured by a few measures, watersheds, landslides, and other irregular landforms may require many parameters (Table I).
Because interpreting landforms in the automated environment of DTM risks losing insight from the measurements through their uncritical manipulation by computer, the data should be explored in different ways. Where landforms vary in shape with size or relative age, analysis by subgroups may strengthen correlations and quantify contrasts in morphology. Identifying redundancy among parameters, transforming skewed distributions, and screening data graphically before fitting regression lines or making other calculations also can clarify landform interpretation.
Table I Some Morphometric Parameters of a Landform: A Fluvial Watershed
Parameter | Description
Watershed area | Area of horizontally projected watershed surface
Total drainage length | Length of horizontal projection of all channels
Drainage density | Total drainage length/watershed area
Plan symmetry | Area of largest inscribed circle/area of smallest circumscribed circle
Mean height | Average of 30-50 digital elevation model heights
Relief | Maximum height - minimum height
Ruggedness number | Relief × drainage density
Elevation-relief ratio | (Mean height - minimum height)/relief
Further Reading
Brooks, S. M., and McDonnell, R. A. (eds.) (2000). Geocomputation in hydrology and geomorphology. Hydrol. Proc. 14, 1899-2206.
Discoe, B. (2002). The Virtual Terrain Project. Available on the
Internet at http://www.vterrain.org
Evans, I. S. (2003). Scale-specific landforms and aspects of the land surface. In Concepts and Modeling in Geomorphology: International Perspectives (I. S. Evans, R. Dikau, E. Tokunaga, H. Omori, and M. Hirano, eds.), pp. 61-84. Terrapub, Tokyo.
Guth, P. L. (2002). MicroDEM package, for Windows 95
and NT. Available on the Internet at http://www.nadn.
navy.mil/users/oceano/pguth/website/microdem.htm
Hengl, T., Gruber, S., and Shrestha, D. P. (2003). Digital
Terrain Analysis in ILWIS. Available on the Internet at
http://www.itc.nl/personal/shrestha/DTA/
Hutchinson, M. F., and Gallant, J. C. (1999). Representation of terrain. In Geographical Information Systems (P. A. Longley, M. F. Goodchild, D. J. Maguire, and D. W. Rhind, eds.). Wiley, New York.
Discrimination,
Measurement in
Trond Petersen
University of California, Berkeley, Berkeley, California, USA
Glossary
allocative discrimination Occurs in the employment setting
when there is unequal access to jobs at the point of hiring,
promotion, and firing.
disparate impact discrimination Occurs when rules and
procedures that should be irrelevant to a situation confer advantage to one group over another (for example,
minimum height requirements that do not affect job
performance but give males an advantage).
disparate treatment discrimination Occurs when treatment of people depends on race, gender, or other
characteristics that should be irrelevant to a situation.
valuative discrimination Occurs in the employment setting
when female-dominated occupations command lower
wages than male-dominated occupations do, even though
there are no differences in relevant requirements in the
jobs, such as skills and work effort.
within-job wage discrimination Occurs in the employment
setting when wages for doing the same work for the same
employer are unequal.
Introduction
Discrimination may occur in many settings, including
employment, housing, schools and universities, and retail markets.
There is also concern over disparate impact (structural) discrimination at a broader societal level. Child
care provisions, access to part-time work and over-time
hours, health benefits, pension rules, etc. may have
disparate impacts on men and women. For example, the
degree of wage inequality in a society may be especially
detrimental to women. Women may be at the bottom of an
occupational hierarchy in most societies, but the actual
experience of being at the bottom, and its impact on, say,
the overall gender wage gap, depend on how wide the
distance is between the bottom and the top. These broader
features of entire societies necessitate comparative analyses. They are very important but are rarely directly relevant for understanding the discriminatory aspects of
actions taken by employers within a given society.
Measuring disparate treatment discrimination requires
assessing whether equally qualified people are differentially treated by the same discriminator (i.e., employer),
due, for example, to their race or sex. Measuring disparate
impact discrimination requires investigating employment
rules and procedures. It is necessary to establish that
these are unrelated to job performance, and a correlation
between sex and race, on the one hand, and the rules and
procedures, on the other, must be established. If there
is such a correlation, it can be concluded that the rules
have disparate impact for men and women or for ethnic
groups.
Admissible Evidence
Specific types of data are needed in order to measure
whether discrimination has occurred. The main difficulty
in assessing within-job wage and allocative discrimination arises from the fact that it is employers or their representatives who discriminate, ignoring here co-worker or
customer discrimination. An essential requirement of
obtaining relevant data, making them equivalent to
admissible evidence, is access to specific information
on employer behaviors (for example, how employers handle hiring and promotions related to gender, wages, etc.).
The relevant sampling unit is therefore the employer and
the decisions the employer makes regarding groups. This
is the case in both disparate treatment discrimination and
disparate impact discrimination.
Relevant data usually come from personnel records,
testimonies, interviews, in-depth case studies, and employment audits. A large and mostly quantitative research stream compares the outcomes of men and women who have worked for different employers, but not for the same employer; such studies are problematic in this regard, not because this type of research lacks value, but because it is typically inconclusive in assessing possible discrimination. Standard sample employee surveys do not usually reveal whether differential outcomes among groups were produced by discriminatory treatment.
In allocative discrimination, there may be both disparate treatment discrimination and disparate impact discrimination. A variety of motives for
discrimination may be present. Allocative discrimination
leads to racial and gender segregation, by which is meant,
in the context here, to unequal distribution of races and
genders in occupations, firms, and jobs. In the United
States, there has been much discussion of the impact of
affirmative action, whereby employers make special
efforts to hire and promote women and ethnic minorities.
The fear of some (especially White males) is that this may
lead to reverse discrimination. Others claim that reverse
discrimination, to the degree that it exists, might counteract already existing and widespread discrimination.
Hiring
Discrimination at the point of hiring entails an intricate set
of issues, and three processes need to be analyzed. The
first concerns the recruitment process (for example,
through newspaper ads, employment agencies, or social
networks). The second concerns who gets offers and who
does not when a job is being filled. The third concerns the
conditions (pay, job level, etc.) under which those hired
get hired, or the quality of the offers given.
Assessing the recruitment process requires observing
and measuring how it occurs, whether due diligence was
given when placing employment ads, whether information about the jobs reached potential applicant pools in
a nonbiased way, etc. These processes are difficult to
document. Disparate impact discrimination may easily
arise in recruitment procedures. For example, if recruitment to a large extent takes place through information
networks (say, referrals from male employees or male job
networks), there may be a disparate impact on women or
minorities. In terms of who gets offers or is hired, discrimination is also difficult to document. Information on the
applicant pool is rarely available and all that may be accessible to outsiders is information on those hired. Even in
large firms, information on the hiring process is often
incomplete, or if complete, often is not computerized
or is available only in an incompletely documented format. And when the relevant information is available, it
likely is ambiguous and open to many interpretations.
Data on who gets offers and who does not may come
from personnel records kept by firms on all of their
applicants. With information about the qualifications of
the applicants, and their fit relative to the job, it is possible
to assess whether there is disparate treatment in making
job offers. Audit studies provide another source of data on
who gets hired. If matched pairs of equally qualified men and women apply for the same jobs with the same employer,
a higher rate of success by men, compared to women, in
getting job offers is evidence of discrimination.
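The logic of such an audit comparison can be expressed with the difference-of-proportions standard error from elementary sampling theory. A Python sketch with hypothetical callback counts:

```python
import math

# Hypothetical audit results for equally qualified applicants.
n_m, offers_m = 200, 60   # men: 30% receive offers
n_f, offers_f = 200, 40   # women: 20% receive offers

p_m, p_f = offers_m / n_m, offers_f / n_f
se = math.sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)
z = (p_m - p_f) / se
print(round(z, 2))  # z ~ 2.32: a gap this large is unlikely to arise by chance
```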
To assess the quality of offers made and the placements at hire requires measuring the conditions attached to all offers extended, not only those accepted.
Promotion
Allocative discrimination in the promotion process, compared to hiring, is easier to measure and analyze. Although
deciding which employee is more qualified for promotion
usually involves some amount of subjective judgment, on
which there may be disagreements among various assessors, the relative qualifications of those promoted and
those passed over can be documented in a comparatively
unambiguous manner, given the promotion rules of the
organization. Assessment of allocative discrimination thus
requires measuring the relevant job performance and
qualifications of employees, and whether they were subsequently promoted from one evaluation period to the
next. If women or minorities with qualifications equal to
those of men or nonminorities are promoted at a lower
rate, then there is evidence of disparate treatment.
Disparate impact discrimination can also occur at the
promotion stage. For example, if an employer requires
geographic mobility for an employee to advance, this may,
on average, be more difficult for women than men to
satisfy. A requirement of geographic mobility in order
to be promoted, unless mobility would lead to acquiring
a bona fide occupational qualification, such as experience
in different units of an organization, could be classified as
disparate impact discrimination.
A major difficulty arises in measuring discrimination in
promotion to, or hiring into, top positions, often referred
to as the glass-ceiling problem. For top positions,
a considerable amount of intangible information always
goes into a decision; highly qualified people make judgments about highly qualified candidates, and the judgment
to a large extent not only assesses past performance, but
also future potential. Investigating statistical data is rarely
sufficient in such cases, and what is needed is the supplementary thicker description that can be acquired from
observing the hiring or promotion process, how candidates
were selected, job interviews, and so on. Social scientists
rarely have access to information on these processes.
A similar measurement problem arises when the
groups discriminated against are small. For example, an
employer may not have hired any people with visible disabilities, leaving too few cases for meaningful statistical comparison.
Firing
To assess discrimination in firing and layoffs requires collecting data on the grounds for termination and workforce
reductions, and whether there is disparate treatment (for
example, say two women were fired, but two equally or
less qualified men were kept). In the United States, most
employment discrimination lawsuits relate to firing, but
that need not mean that this is the most prevalent form of
discrimination. It may only reflect that it is a form that is
easy to pursue in the courts. In many work settings, layoffs
have been tied to seniority; the last hired is the first to be
laid off. This may disadvantage women, who often have
lower seniority, and can thus be construed as disparate
impact discrimination. Discrimination by age and health
status may also be a factor in firing.
Valuative Discrimination
Valuative discrimination occurs when employers pay certain job categories lower wages only because the category is female dominated, without there being any other
characteristics of the job that could justify lower wages.
Valuative discrimination has so far arisen only in connection with gender, not in connection with racial or other
kinds of discrimination. It is not necessarily related to any
prejudice against women or to any statistical differences
between the sexes. The employer's sole motive may be
to minimize wage costs, paying lower wages by taking
advantage of occupational categorizations as female
dominated. This is an easy (legally blameless) way to
minimize wage costs, compared to, say, within-job wage
discrimination, because in the latter case, an employer
would be in blatant violation of the law. Legislation
regarding valuative discrimination varies across countries.
It is not illegal in the United States, and though it is
illegal in Canada, it is not enforced.
Documentation of valuative discrimination is difficult
and the evidence is often ambiguous. There are two principal ways to measure it. One approach, which can be
called the market approach, is based on use of sample
or census data on individual employees, their wages, occupations, and qualifications. To each occupation is imputed a value or job worth. This can be obtained from
measuring how hard the occupation is to perform, the
skills it requires (such as technical versus people skills),
and so on. This is sometimes obtained from separate
sources, developed for assessing aspects of occupations,
and then matched to individual-level data. An index of the
factors may be created or each factor may be considered
by itself. A low versus high score on a factor or the index
indicates a low versus high job worth. Jobs or occupations
with equal scores on an index are said to be of equal value.
The final step is to regress the individual wages on the
index for job worth (or on the separate factors entering
into an index), on individual qualifications, on illegitimate characteristics such as gender, and, particularly, on the percentage of females in the occupation. If there is a negative
effect on the wage due to the percentage of females in the
occupation, then the conclusion is that female-dominated
jobs are undervalued in the market, because, even for
same job worth, as measured by the index or the factors,
and for same individual qualifications, wages in female-dominated occupations are lower than they are in male-dominated ones. Many such studies have been undertaken in North America, Europe, and elsewhere.
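In schematic form, the market approach reduces to a multiple regression. The following Python sketch simulates data purely to show the setup; all variable names and coefficients are illustrative, not drawn from any actual study:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
job_worth = rng.normal(50, 10, n)      # imputed job-worth index
education = rng.normal(13, 2, n)       # individual qualification
pct_female = rng.uniform(0, 100, n)    # percentage female in the occupation
log_wage = (1.5 + 0.02 * job_worth + 0.05 * education
            - 0.002 * pct_female + rng.normal(0, 0.2, n))

# Ordinary least squares: intercept, job worth, education, percentage female.
X = np.column_stack([np.ones(n), job_worth, education, pct_female])
coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
print(coef)  # a negative coefficient on pct_female is the telltale pattern
```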
The market-based approach establishes the extent to
which female-dominated occupations earn lower wages,
even when they have the same value as measured by
a job-worth index or the separate job-worth factors.
The job-worth index or factors capture what is valued
in the market. Some factors here that perhaps ought to
be highly valued, such as nurturing skills, which are more
common among women than men, are nevertheless given
a low value in the market. That would indicate a bias in the market valuation itself.
Summary
In measuring employment discrimination, the central
difficulty is in assembling admissible evidence. This is
evidence on how employers treat people differentially
by gender, race, or other illegitimate factors, or how
rules and procedures irrelevant for job performance
advantage one group over another. Evidence of discrimination in hiring, promotions, and setting of wages
needs to be assembled primarily at the workplace level.
The type of measurement needed varies with the kinds of
discrimination.
Discrimination may occur in a number of settings other
than employment. In housing markets, discrimination
against specific ethnic groups can arise as a result of pure
prejudice or from statistical discrimination, as when some
groups are correctly considered to be worse or better
tenants. In retail markets, such as in car sales, discrimination may arise from knowledge of bargaining behaviors
of various groups, which then can be exploited in order to
extract higher sales prices. Regardless of the setting, the
central requirement of a measurement strategy is that
the discriminatory actions are measured on the basis of
how likes are treated as unlikes at the level at which
discrimination occurs.
Further Reading
Bloch, F. (1994). Antidiscrimination Law and Minority
Employment. University of Chicago Press, Chicago, IL.
England, P. (1992). Comparable Worth: Theories and Evidence. Aldine de Gruyter, Hawthorne, New York.
Nelson, R. L., and Bridges, W. P. (1999). Legalizing Gender
Inequality. Courts, Markets, and Unequal Pay for Women
in America. Cambridge University Press, New York.
Reskin, B. F. (1998). The Realities of Affirmative Action
in Employment. American Sociological Association,
Washington, D.C.
Rhoads, S. E. (1993). Incomparable Worth. Pay Equity Meets
the Market. Cambridge University Press, Cambridge, MA.
Valian, V. (1998). Why So Slow? The Advancement of Women.
MIT Press, Cambridge, MA.
Domestic Violence:
Measurement of Incidence
and Prevalence
Jana L. Jasinski
University of Central Florida, Orlando, Florida, USA
Glossary
alpha coefficient A measure of internal consistency, ranging from
0 to 1, that indicates how much the items in an index
measure the same thing.
common couple violence Occasional outbursts of violence
from either partner in an intimate relationship.
incidence Frequency of occurrence of a phenomenon.
patriarchal terrorism Systematic male violence based on
patriarchal control.
prevalence Presence versus the absence of a social phenomenon.
Issues in Definition
What Is Domestic Violence?
Research on domestic violence has almost a 30-year history and there exists a general understanding of the meaning of the term. At the same time, however, when pressed
to indicate exactly which actions and relationships should
be considered as domestic violence, there is little consensus. The Model Code on Domestic and Family Violence, for example, defines it through a list of specific acts committed by one family or household member against another.
Issues in Measurement
How Do We Measure Domestic
Violence?
Although the debate about how to define domestic violence is essential, equally important to consider is how to measure the phenomenon. The most common assessment of domestic violence focuses on physical assaults.
Common Measures of
Domestic Violence
Conflict Tactics Scales
The Conflict Tactics Scale (CTS) is perhaps the most
widely used and accepted instrument to assess assaults
by intimate partners. It is also the subject of a great deal
of controversy. The original CTS was based on the
theoretical framework provided by conflict theory and
was developed as a way to measure overt actions used
as a response to a conflict of interest. Specifically, this
instrument measures the extent to which specific tactics
have been used by either partner in an intimate relationship. The original design of the CTS was to begin with the
less coercive items and move toward the more aggressive
items so that the socially acceptable items were presented
first to the respondents. The CTS is designed to measure
three methods for dealing with conflict: reasoning, verbal
aggression, and physical force. The alpha coefficients for
reliability ranged from a low of 0.50 for the reasoning
scale to 0.83 for the physical assault scale.
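The alpha coefficients reported above can be computed directly from an item-response matrix. A Python sketch with a hypothetical matrix (rows are respondents, columns are items):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

responses = np.array([[1, 1, 1], [2, 1, 2], [3, 3, 2], [4, 4, 4], [2, 3, 3]])
print(round(cronbach_alpha(responses), 2))  # ~0.94 for this toy matrix
```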
Table I Common Measures of Domestic Violence
Measure | Definition
Conflict Tactics Scales (CTS and CTS2) | Both CTS and CTS2 measure the extent to which specific tactics are used by individuals in an intimate relationship
CTS | Measures reasoning, verbal aggression, and physical force (19 items, form R)
CTS2 | Measures sexual coercion, physical injury, physical assault, psychological aggression, and negotiation (39 items)
Composite Abuse Scale | Measures different types of abuse and frequency of abuse; includes four scales (emotional abuse, physical abuse, harassment, and severe combined abuse)
Severity of Violence Against Women Scales | Measures symbolic violence, threats of physical violence, actual violence, and sexual violence (46 acts are included)
Abusive Behavior Inventory | Measures frequency of physical, sexual, and psychological abuse (30 items)
Index of Spouse Abuse | Measures severity or magnitude of both physical and nonphysical abuse (30 items)
Propensity for Abuse Scale | Nonreactive measure of propensity for male violence against a female partner (29 items)
Measure of Wife Abuse | Measures frequency of verbal, physical, psychological, and sexual abuse, and emotional consequences of abuse for the victim (60 items)
Further Reading
Brownridge, D. A., and Halli, S. S. (1999). Measuring family violence: The conceptualization and utilization of prevalence and incidence rates. J. Fam. Violence 14, 333-349.
DeKeseredy, W. S. (2000). Current controversies on defining nonlethal violence against women in intimate heterosexual relationships. Violence Women 6, 728-746.
DeKeseredy, W. S., and Schwartz, M. D. (2001). Definitional issues. In Sourcebook on Violence against Women (C. M. Renzetti, J. L. Edelson, and R. K. Bergen, eds.), pp. 23-34. Sage Publ., Thousand Oaks, California.
Dobash, R. P., Dobash, R. E., Wilson, M., and Daly, M. (1992). The myth of sexual symmetry in marital violence. Social Probl. 39, 71-91.
Dutton, D. G. (1995). A scale for measuring propensity for abusiveness. J. Fam. Violence 10, 203-221.
Hegarty, K., Sheehan, M., and Schonfeld, C. (1999). A multidimensional definition of partner abuse: Development and preliminary validation of the Composite Abuse Scale. J. Fam. Violence 14.
Duncan, Otis Dudley
Glossary
contingency table A multivariable table with cross-tabulated
data.
Likert scale A rating scale to measure attitudes by quantifying subjective information.
quasi-symmetry A generalization of the contingency table symmetry model, corresponding to the generalized multidimensional Rasch model.
Rasch model A model providing a formal representation of
fundamental measurements.
Otis Dudley Duncan is known especially for his contributions to the analysis of social surveys and his work on the sociology of measurement. He eventually developed a very critical attitude
toward both the overemphasis on data analysis, particularly the frequent applications of advanced structural
equation models, and the lack of fundamental work on
design and measurement problems in sociology.
Introduction
Otis Dudley Duncan (b. 1921) is a leading figure in 20th-century social science. Duncan's academic career encompasses over 40 years of groundbreaking contributions to social science theory and methods. Duncan's contribution to the scientific literature includes 16 authored
and 6 edited books, as well as over 200 articles in scientific
journals and other publications.
Duncan was born in Texas in 1921 and grew up in
Stillwater, Oklahoma, where his father headed the
Department of Sociology at the Oklahoma A&M College
(subsequently Oklahoma State University). He started his
training as a sociologist before World War II, but was
drafted immediately after completing his Master of Arts.
After the war, Duncan went to the University of Chicago,
where he was trained by William F. Ogburn and Philip M.
Hauser. He took his doctorate in Chicago in 1949. His long
career includes professorships at the universities of
Chicago, Michigan, Arizona, and Santa Barbara. Short appointments as a visiting scholar to Oxford, England and
Vienna, Austria in the 1970s were important for his contributions to social measurement. Since his retirement in the
fall of 1987, he has been Professor Emeritus at the University of California at Santa Barbara.
Throughout his career as a sociologist, Duncan introduced many methods of data analysis, and often pioneered their use on sociological problems.
Rasch Measurement
In 1973, Duncan spent time at the Institute of Advanced
Studies in Vienna teaching path analysis and structural
equation modeling. There he was introduced by one of his
students to the work of Georg Rasch. Until then, Rasch's work had largely been applied to psychological and educational tests. According to Duncan's own recollection, he
learned about log-linear contingency table analysis in the
late 1970s. In the early 1980s, Duncan chaired the Panel
on Survey Measurement of Subjective Phenomena of the
National Research Council. In the final report from the
panel, he introduced the idea of Rasch measurement as
a way of evaluating social survey measurement. He used
a log-linear contingency table model to demonstrate the
model. The work was more or less contemporaneous with Tue Tjur's work in 1982 and Henk Kelderman's in 1984,
and contributed along with these to the identification of
Rasch models as a special class of log-linear models.
Theoretical Foundation
Many texts on Rasch focus on the mathematical and statistical properties of the models. For Duncan, the theoretical foundation of Rasch's perspective was of major importance. He emphasized the scientific meaning of Rasch's term "specifically objective measurement," used by Rasch in 1966 and 1977 in describing the unique properties of the model. In particular, Duncan cared about its potential consequences for social science. In 1960, Rasch had given his definition of objective measurement an exact formal representation in what was later called the one-parameter Rasch model. Duncan pointed out that social survey measurements must adhere to the
one-parameter Rasch model in order to justify quantitative comparisons between objects measured on any
proposed social scale. The measurement instrument is
independent of the measured objects (specifically
objective), and thereby valid as an instrument, only if
the one-parameter model holds.
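In formal terms, the one-parameter model for a binary item can be written as follows; the multiplicative parameterization shown here is the one convenient for the log-linear representation, though the symbols are generic rather than Duncan's own notation:

```latex
% One-parameter Rasch model for person i and item j:
% person parameter alpha_i, item parameter beta_j.
\[
  \Pr(X_{ij} = 1 \mid \alpha_i, \beta_j)
    = \frac{\alpha_i \beta_j}{1 + \alpha_i \beta_j},
  \qquad
  \log\frac{\Pr(X_{ij} = 1)}{\Pr(X_{ij} = 0)}
    = \log\alpha_i + \log\beta_j .
\]
% Specific objectivity: the odds ratio between two items is beta_j / beta_k,
% free of the person parameter, so item comparisons do not depend on which
% persons happen to be measured.
```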
In view of the difficulty in satisfying the Rasch model,
modern psychometric developments defined the two- and
three-parameter item response theory (IRT) models, relaxing Raschs basic invariance criterion. In contrast to this,
Duncan emphasized the importance of empirically verifying the one-parameter Rasch model on any given set of
empirical data. Only when the data fit this model were
the estimates invariant measurements in Raschs sense.
Such a strong test of the quality of social measurement
had in Duncans view hitherto not been available, and its
potential for the advancement of quantitative social science
was important. Instead of the IRT models Duncan developed models that retained Raschs criterion of specific objectivity in a multidimensional context. The models draw on
earlier work by, for instance, Louis Guttman in the 1950s;
Guttman had suggested the possibility of two-dimensional
response structures for the deterministic Guttman response
model. The multidimensional Rasch model is based on the
same idea of a noninteractive multidimensional response
structure, but allows for a probabilistic rather than
a deterministic relationship between the unmeasured latent
variables and the manifest response.
Under the one-parameter model, the probabilities that person $i$ answers No and Yes are $1/(1+\beta_A\alpha_i)$ and $\beta_A\alpha_i/(1+\beta_A\alpha_i)$ for item A, and $1/(1+\beta_B\alpha_i)$ and $\beta_B\alpha_i/(1+\beta_B\alpha_i)$ for item B. Fixing the scale by setting $\beta_A = 1$, the four response patterns (No, No), (Yes, No), (No, Yes), and (Yes, Yes) have probabilities proportional to $1$, $\alpha_i$, $\beta_B\alpha_i$, and $\beta_B\alpha_i^2$, with normalizing constant $1 + \alpha_i + \beta_B\alpha_i + \beta_B\alpha_i^2$. Table IV gives the log-linear representation of the expected counts in the resulting two-by-two cross-classification: $\ln S_0$, $\ln S_1$, $\ln\beta_B + \ln S_1$, and $\ln\beta_B + \ln S_2$, where $S_t$ is the total for persons with raw score $t$.
Table V Coding of Content and Intensity as Latent Dimensions in the Response Process
Attitude response | Direction (a_i) | Intensity (z_i)
Negative | 0 | 0
Neither negative nor positive | 1 | 1
Positive | 2 | 0
For cross-classifications extending beyond two three-category responses (e.g., 3 × 3 × 3 tables), it is possible to write models that test Rasch-like multidimensional invariance. Duncan and Stenbeck examined the use of Likert response formats in opinion research (e.g., strongly agree, agree, neutral, disagree, strongly disagree). They found evidence of multidimensionality corresponding to the presence of the qualifying intensity labels (e.g., strongly agree, strongly disagree), as well as the neutral response alternatives. This questions the general opinion among survey methodologists that Likert
scales may increase the precision of unidimensional measurement. It may instead introduce unintended additional
sources of variation in the response process. If the additional sources can be estimated as additional traits operating independently of the content dimension, then
multidimensional Rasch measurement is achieved. But
the evidence seemed to suggest that the introduction of
the intensity labels on response categories confound the
measurements in a way that renders the Rasch model
untenable, a result that spoke against the application of
the Likert response format in social survey measurement.
Duncan's Critique of Contemporary Sociology
Duncan's late work on Rasch models as applied to survey data repeatedly produced results that questioned the validity and reliability of a large body of research tools and empirical results derived from social science surveys. Duncan's critique of social science was eventually summarized in his 1984 book Notes on Social Measurement: Historical and Critical.
Further Reading
Blau, P. M., and Duncan, O. D. (1967). The American
Occupational Structure. Wiley, New York.
Durkheim, Emile
Stephen P. Turner
University of South Florida, Tampa, Florida, USA
Glossary
anomie An absence or diminution of the effective power of
social facts, such as obligations, over the individual, or
underregulation.
collective consciousness The source of causes, corresponding to individual consciousness, which constrain individual
thinking, typically through feelings of obligation.
generality and externality The two definitional properties
of social facts: general, meaning appearing throughout
society (which may also refer to such societies as the
family), and external, meaning exercising constraints on
the individual.
social fact A collective cause that is manifested as a constraint
on individuals and is general or collective in origin, or an
effect of such a cause.
social realism The doctrine that collective causes are real
forces or that collective entities, such as society, are real.
The French sociologist Emile Durkheim is generally regarded as one of the two major founding figures of sociology (along with Max Weber) and the primary source of
the central ideas of this discipline about the scientific
significance of statistical facts. An intellectual leader
who built up a school around the journal he edited,
L'Année Sociologique, Durkheim exercised influence
on educational policy in the Third Republic and was
a supporter of secularism and the spiritual or moral
Socialism of the time. His major quantitative work, Le
Suicide, published in 1897, is regarded as the classic
work of quantitative sociology. His shorter methodological treatise, The Rules of Sociological Method, contains
many of the standard arguments for understanding statistical facts as indicators of latent causal processes, which
Durkheim directed at the received tradition of social science methodological thought represented by John Stuart
Social Fact
Durkheim took a great deal of his rhetoric about treating social facts as "things" from Adolphe Quetelet; the idea of treating collective statistics as measures of underlying social phenomena is Quetelet's as well. Quetelet, however, conceived these claims in terms of an analogy between statistical variation and the perturbations of the planetary paths, both of which could be understood as variations around stable curves, or laws, which in the social case were typically represented as the curves that could be fit to a tabular presentation of rates, such as age-specific rates of marriage. As a consequence, Quetelet focused his attention on the theoretical embodiment of societal means or central tendencies, which led to his theory of the homme moyen (the average individual), who was understood to represent the center of gravity produced by social influence on the individual in a given society.
Durkheim's problem in The Rules of Sociological Method was to provide an alternative account of the significance of these more or less stable rates within a society that did not depend on this dubious analogy and that allowed other kinds of evidence, such as the evidence of fixed and stable phenomena (such as laws), to be understood in the same terms. Accordingly, Durkheim urged that collective statistics be treated as things and that many other stable features of social life, such as laws and customs and even proverbs, be understood in the same way. Durkheim's argument is that these social facts are the visible by-products of hidden causal processes, and he locates these causal processes in the collective consciousness. Social facts are by definition general and external, meaning that they operate in a collective consciousness that is part of the mental life of all individuals in a society and impose themselves on the individual as though from outside.
The concept of social fact that he employed in Suicide and defended in his methodological manifesto The Rules of Sociological Method reflected this basic conception.
Collective Consciousness
By collective consciousness, Durkheim now meant something different from what he had meant in The Division of Labor in Society. It now indicated a kind of mental cause that lives within each individual mind, so that the individual is a duplex, with psychological impulses originating from the individual, but also from what Durkheim called collective currents, causal forces that are the product of the pooling of the causal powers of the many minds that join together in a collective social relationship such as a society. Durkheim thought that such forces were produced by any society and considered a national society, a classroom, and a single family to be collective entities with their own social forces. Individual conscious experience reflected, but only imperfectly, the operations of these causal forces. An individual's sense of obligation, for example, is experienced as external, and for Durkheim the externality is genuine in that the force of obligation is a real causal force arising in the collective consciousness. Introspection, however, is an inadequate guide to the actual causal processes at work in the case of the production of a feeling of obligation, and similarly the manifestations in values of underlying collective-consciousness processes are imperfect guides to these processes. But there are circumstances under which rates can serve as good indices of latent lawlike causal relations.
The novelty of this argument, but also its closeness to widely expressed ideas about the relationship between observed social statistics and underlying causal laws, must be understood if we are to consider it as an attempt to solve the puzzle that in many ways represented an intellectual dead end for the older social statistics tradition at the time Durkheim was writing. Texts of this and earlier periods routinely reported results in the form of tabular data, often of rates (such as Halley tables), or visual displays.
Suicide
Suicide rates were routinely invoked in the nineteenth century as evidence of the power of collective forces, however this idea might be understood, and suicide seems to embody the problem and the promise of the quest for underlying laws in its most extreme and dramatic form. Individual decisions to commit suicide are deeply private and paradigmatically individual. They do not directly arise from obligation, law, or any other ordinary external compulsion; nevertheless, suicide rates are stable in a way that makes it appear that they are compelled. The problem of suicide in the collection of suicide statistics and their analysis was a major concern of mid-nineteenth-century statisticians, and their studies produced quite extraordinary collections of analyses of the diurnal, seasonal, and other periodicities of suicide, showing that suicides decreased in times of revolutionary ferment and varied with latitude, light, and temperature, as well as with religion, marital status, age, and so forth. Jacques Bertillon, in an encyclopedia article, analyzed the interaction between age and marital status in order to estimate the preservative effect of marriage once the preservative effect of age had been taken into account, an early form of the reasoning that G. U. Yule was to develop as partialling, but this was unusual. Most of the analyses concerned tables of rates and their secular trends, in particular to answer the question of whether suicide was increasing.
One theorist of suicide who attempted to make sense of the tables, Enrico Morselli, an Italian psychiatrist, collected a vast number of published studies of suicide in order to give a social account of the causes of the secular increase in suicide, and he distinguished patterns in the statistics that could be given different explanations in terms of different social processes. Morselli introduced the notion of egoism, by which he meant unregulated desire, as a cause of suicide in order to explain the higher suicide rates of the richer areas of Italy. But Morselli believed that the intermingling of causes in these cases limited the value of statistical analysis. Nevertheless, he had no technique to deal with this problem and presented tables of rates. Durkheim reanalyzed some of these tables and, with the help of his nephew Marcel Mauss, also analyzed data on France supplied to him by his main intellectual rival, Gabriel Tarde.
The first part of Yule's classic paper on pauperism was published in 1895, and this paper transformed the analysis of intermingled causes. Durkheim acknowledged, but did not employ, Bertillon's methods for controlling for the contributions of marriage (although they influenced his discussion of the preservative effects of marriage) and was unaware of Yule's methods. Neither, indeed, was relevant to the point he wished to make. In keeping with his notion that the statistical imperfectly represented underlying or latent processes, he looked instead for cases in which the statistical patterns pointed most directly to the underlying processes, and this meant in practice that he operated with tables of rates, typically suicide rates by category or region, and argued that only cases in which the patterns were absolutely consistent indicated the presence of a law. Thus, he eliminated, as failing to point to a law, a huge number of well-established statistical relationships in which the monotonic progression of rates was imperfect, and constructed tables that showed perfect monotonically increasing relationships that he then interpreted as indicating genuine relationships. In theory, this was a stringent standard. In practice, the method was somewhat arbitrary in that one could produce imperfections or perfection in the monotonic increases in the "law of the table" by manipulating the intervals. The problem of the arbitrariness of the selection of intervals was known at the time, and it was also discussed in relation to mapping, in which shadings were determined by intervals.
The very backwardness or peculiarity of Durkheim's analytical technique was recognized at the time, even by his associates, but ignored by his American admirers. The book was read by them as a primitive kind of correlational analysis using aggregate variables, a technique that was widely deployed in early quantitative sociology, in part as a result of the computational difficulties in dealing with data for large numbers of individuals and in part because the unit of analysis, both for Durkheim and for many of these early studies, was a collectivity. In most early nineteenth-century American studies, statistics were used on communities (such as per-person rates of church giving). This style of analysis was subsequently criticized on the grounds of the ecological fallacy, but for Durkheim, and indeed for many subsequent studies following him and using these aggregate statistics, the criticism was misplaced because, as Durkheim himself understood, the statistics that were being analyzed were measures of collective currents rather than surrogates for data about individuals.
Durkheim's basic reasoning in Suicide was that increased rates of suicide were a result of imbalances between over- and underregulation and excessive or inadequate social connectedness. In Durkheim's terminology, anomie designates underregulation, and egoism designates lack of social connectedness or integration, which he understood as excessive individuation. Insufficient individuation he called altruism, and he suggested that there was a form of suicide common in primitive societies that corresponded to it. Overregulation, in theory, produced suicide as well, and he speculated that suicides among slaves might be explained in this way, but he treated this possible form of suicide as rare and did not even examine the well-known statistics on suicides under different prison regimes. Egoism can be most easily seen in relation to religion. The well-known fact that suicide rates were higher for Protestants than for Catholics he interpreted as a result of the role of free inquiry and its effect on groups. Protestantism, he argued, produces less integration; Judaism, as a result of anti-Semitism, produces more integration and consequently less suicide. The product of less integration is excessive individualism, and this produces a higher suicide rate. A family life with children preserves from suicide by integration as well. At the other extreme from excessive individuation was insufficient individuation, which resulted in what he called altruistic suicide, marked by an insufficient valuation of one's own life, which he considered to be common in primitive societies and to account for the statistical differences between military suicide and civilian suicide. Marriage has a more complex effect, on which Durkheim spent much effort; he argued that family life with children produced an integrative effect separate from the regulative effect of marriage itself. He knew that whereas married men are less suicide-prone than unmarried men of the same age, women generally do not benefit as much from marriage. So he explained the effect of marriage itself in terms of regulation or anomie and observed that the preservative effect of marriage on men was stronger where divorce was legally difficult, but more favorable to women where divorce was easier. Durkheim's explanation for this difference relied on a claim about feminine psychology, namely that the desires of women are more limited and less mental in character, so that regulation that benefits men is excessive for wives. This explanation might be thought to violate his own methodological principle of explaining social facts solely by reference to social causes, but it also overlooked an obvious possible social cause: fatalism resulting from overregulation. This possibility Durkheim, who was opposed to radical divorce-law reform, was predisposed to ignore, although he acknowledged in passing that there might be some examples of this kind of suicide in modern life among childless young married women or very young husbands.
individuals were grouped, and by such facts as social density. Later on, this picture changed, and, although the terms and emphases did not shift completely, the later emphasis tended to be on the ideational aspects of collective life, particularly on the representations of which Durkheim understood collective life to consist. The representations of a society were formed in moments of collective flux and then crystallized, much as a period of revolutionary enthusiasm and fervor would be followed by a more or less stable set of symbols, national rituals, and sacralized forms of public action.
Reception

The legacy of Durkheim's ideas, and of the general strategy of measurement and quantitative analysis that Durkheim established in Suicide, as I have indicated, is somewhat peculiar for a work that was widely read and studied as a classic. The concept of anomie had a long subsequent history in American sociology, but as a measure applied to the individual, not, as Durkheim had conceived it, to the social milieu. In anthropology, the basic dimensions of Durkheim's account were recharacterized by Mary Douglas as the features of institutions and of perception, especially in her notion of group-grid analysis, which defined group and grid axes as Cartesian coordinates into which institutions could be placed (thus, the Nazi party is considered high in group and high in grid, i.e., high both in integration and in regulation). This use of Durkheim's basic concepts in anthropology, however, has generally not been associated with either measurement or quantitative analysis. During the middle part of the twentieth century, anomie was a widely used concept. Robert Merton's paper on anomie, which explained criminal behavior as sharing societal goals and aspirations while being indifferent to the legality of the means, was one of the most widely cited texts in social research during this period. Within Durkheim's own camp, his protégé Maurice Halbwachs, who continued to write on the statistics of suicide, notably in the 1930 book The Social Causes of Suicide, modernized the statistical approach and sought to treat social causes as a variable contributing to the suicide rate and to reinterpret Durkheim's achievement as having pointed out these causes, which of course had actually been amply done by his predecessors, especially Morselli, for whom the social causation of suicide was the central thesis.
Further Reading

Besnard, P., Borlandi, M., and Vogt, P. (eds.) (1993). Division du Travail Social et Lien Social: La Thèse de Durkheim un Siècle Après. Presses Universitaires de France, Paris.
Borlandi, M., and Mucchielli, L. (eds.) (1995). La Sociologie et Sa Méthode: Les Règles de Durkheim un Siècle Après. L'Harmattan, Paris.
Jones, R. A. (1999). The Development of Durkheim's Social Realism. Cambridge University Press, New York.
Lukes, S. (1972). Émile Durkheim: His Life and Work, a Historical and Critical Study. Harper & Row, New York.
Pickering, W. S. F. (ed.) (2001). Émile Durkheim: Critical Assessments. 3rd series. Routledge, New York.
Schmaus, W. (1994). Durkheim's Philosophy of Science and the Sociology of Knowledge: Creating an Intellectual Niche. University of Chicago Press, Chicago, IL.
Schmaus, W. (ed.) (1996). Durkheimian sociology. J. Hist. Behav. Sci. (special issue) 32(4).
Stedman-Jones, S. (2001). Durkheim Reconsidered. Polity Press, Oxford.
Turner, S. P. (ed.) (1986). The Search for a Methodology of Social Science. D. Reidel, Dordrecht.
Turner, S. P. (ed.) (1995). Celebrating the 100th anniversary of Émile Durkheim's The Rules of Sociological Method. Sociol. Perspect. (special issue) 38(1).
Glossary
cohort A group of like-aged individuals born in the same
calendar year or group of years.
cross-sectional data Data that provide the characteristics of a population at a particular point in time (e.g., census data).
discrete choice models A family of models developed using
utility theory to represent choices between finite, discrete
alternatives.
gravity model A model used to account for a variety of spatial interactions, including migration movements, based on an analogy to Newton's gravity equation.
gross migration The number of in-migrants to a region or
out-migrants from a region, expressed either as an absolute
number or as a migration rate.
longitudinal data Data that include a history of migration
and residences.
Markov property The probability of migrating between two
areas is dependent only on the current location.
migrant An individual who makes a permanent change of
residence that alters social and economic activities.
multiregional cohort models Models that project a population into the future subject to constant birth, death, and
migration rates.
net migration The difference between the number of in- and
out-migrants to a region.
Introduction
Populations in Motion
Since the first settlements along the Atlantic seaboard, the geographic center of the American population has consistently shifted west and south as the population expanded across the country. Defined by five major periods, including western settlement, movement from rural to urban areas, movement from the city to the suburbs, movement out of the rustbelt of the Northeast and Midwest, and counterurbanization, the directions of and reasons for American moves have changed over time. In fact, the seemingly restless nature of the American population translates into a continuously changing population distribution, with the large volume of migration partly attributable to an apparent lack of rootedness within the population, but also reflecting deeper economic, cultural, and social forces. Indeed, Americans are among the most mobile populations in the world, leading Long (1988, p. 57) to comment that Americans "have a reputation for moving often and far away from home, for being committed to careers and life-styles, not places." Americans are not the only highly mobile population: approximately 20% of the Canadian population changes residence each year, and similarly high mobility is observed in other developed countries.
The Problem
The migration literature is full of examples regarding the dynamism and temporal aspects of migration, including the impact of life cycle effects such as marriage, divorce, empty-nesting, and failing health on migration propensities; return and chronic migration (multiple migrations by the same person); the lag between opportunities and migration as information diffuses over space; and the changing economic and social determinants of migration. For instance, the migration literature has noted the declining importance of distance. Whereas distance was once a formidable barrier to long-distance migration, owing to the physical costs of relocation and the costs associated with attaining information on distant opportunities, it has become much less so as enhancements to telecommunications and transportation have eased its burden. Likewise, the importance of amenity effects has likely increased in the past two decades, as individuals and corporations alike are able to "vote with their feet," highlighted by the rapid population and economic growth of states such as Colorado, Utah, and Arizona and the Canadian province of British Columbia.
In simple terms, and given such highly mobile societies, the problem is how to accurately capture the nature of population movement within migration models. As mathematical, conceptual, or statistical representations of testable hypotheses, models frequently try to identify the factors that explain migration behavior by identifying statistical relationships and selecting the dependent and independent variables that operationalize the model. Based loosely on Newtonian physics, the gravity model, for instance, has a long and rich history within the migration literature but is conceptually and empirically limited. Instead, given the propensity to migrate within most populations, there is a need to understand how migration into and out of an area is likely to evolve over time with respect to both personal factors, such as age and education, and regional effects, such as changing employment opportunities. Yet much of the data utilized within migration models is cross-sectional in nature, relying on a static image or snapshot of a population system at a particular point in time. As a consequence, many models typically rely on a single period to capture what is an inherently dynamic process. The problem, therefore, is
$p_{t+n} = G^n p_t$, e.g., $p_{t+15} = G^3 p_t$.
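Only the projection identities survive at this point in the text, so the following minimal sketch simply illustrates the arithmetic they describe; the two-region growth matrix G and the population vector are invented for illustration:

```python
import numpy as np

# Hypothetical two-region growth matrix G: entry [i, j] combines
# survival and fertility with migration from region j to region i
# over one projection interval. The numbers are invented.
G = np.array([[0.95, 0.08],
              [0.04, 0.93]])

# Starting populations of the two regions.
p_t = np.array([1000.0, 500.0])

# p_{t+n} = G^n p_t: with a five-year interval, three applications
# of G project the population 15 years ahead.
p_t15 = np.linalg.matrix_power(G, 3) @ p_t
print(p_t15)
```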
Econometric Methods
Given that migration is typically motivated by economic concerns, it is only logical that analysts have attempted to link both economic and demographic components and their interrelationships within a single model or set of models. In 1986, Kanaroglou et al., for example, described discrete choice models, where potential migrants choose between a finite set of migration options. Rooted in utility theory, the probability of migration from i to j is assumed to be an outcome of a utility maximization process. Migration is treated as a random phenomenon, and probabilities are used to describe migration propensities, owing to uncertainties on the part of the observer and the migrant. A set of variables representing the economic, geographic, or social characteristics of the regions (e.g., regional wage rates, employment, population effects) along with observed personal attributes (e.g., selectivity measures such as age, level of education, and gender of the migrant) operationalizes the model, with the utility function written as

$U_{in} = V_{in} + e_{in}.$

The first component, $V_{in}$, is the deterministic component, describing utility via a set of observed characteristics (e.g., employment growth rate, wage levels, unemployment level, and personal attributes). The second, stochastic component, $e_{in}$, represents all unobserved differences in utility, making the model a random utility model.
Using the random utility concept, the migration decision can be formalized such that the probability that an individual will choose to migrate to j is

$P(M_{ij}) = \Pr(U_j > U_i) \quad \text{for all } j \neq i. \tag{8}$
Assuming that all $e_{in}$ are independent and identically distributed yields the multinomial logit model defined by

$P_i = \dfrac{\exp(V_i)}{\sum_j \exp(V_j)}.$
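A short numerical illustration of the logit formula (the systematic utilities below are invented; they are not estimates from Kanaroglou et al.) shows how the choice probabilities follow from the $V_j$:

```python
import numpy as np

# Invented systematic utilities V_j for staying in region i and for
# two candidate destination regions.
V = np.array([1.2, 0.8, 0.5])

# Multinomial logit: P_j = exp(V_j) / sum_k exp(V_k). Subtracting
# max(V) before exponentiating is a standard numerical safeguard
# and leaves the probabilities unchanged.
expV = np.exp(V - V.max())
P = expV / expV.sum()

print(P)        # migration probabilities for the three options
print(P.sum())  # probabilities sum to 1
```

Raising the systematic utility of any destination (say, through higher regional wages) raises its probability at the expense of all the others, which is the substantive content of the random utility formulation.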
Pseudo-cohort Analysis

Although longitudinal data files offer significant advantages when modeling migration, information relating to relatively small populations, such as the foreign-born or Aboriginal populations, is often limited by small sample sizes. More generally, detailed geographical information may be limited given the relatively small sample size of many longitudinal data files. Reliance on a single census interval, on the other hand, is also imperfect, given that it is a cross-sectional representation of a population at a particular point in time. Ultimately, information derived from a single census tells us little about the dynamics of the population: how it evolved over time, where it evolved from, or the differential directions and paths through which it evolved.
One option is to link consecutive census files (or other cross-sectional sources), creating pseudo or artificial panels. The advantage of linking census files is the ability to follow groups and changes to those groups over time. In particular, the analyst can define cohorts that may be followed through time as they artificially age and make spatial (i.e., migration) or other (e.g., economic, occupational, social) adjustments. Cohort definition should rest on the issues that are most salient to the study, with samples defined with respect to age. For example, a cohort aged 25 to 34 in 1980 would be aged 35 to 44 in 1990. In addition to changes in its age profile, other social, economic, and demographic characteristics of the cohort will evolve over the interval. Clearly, although individuals cannot be followed, this methodology can trace aggregate changes in groups or group behavior over time.
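A minimal sketch of the linkage (the data frames, column names, and values below are all invented) shows how a cohort is defined in one census and traced, in aggregate, in the next:

```python
import pandas as pd

# Invented extracts from two census cross-sections ten years apart.
census_1980 = pd.DataFrame({"age": [27, 30, 33, 50],
                            "moved_5yr": [1, 0, 1, 0]})
census_1990 = pd.DataFrame({"age": [37, 40, 43, 60],
                            "moved_5yr": [0, 1, 1, 1]})

# Pseudo-cohort: aged 25-34 in 1980, hence aged 35-44 in 1990.
cohort_80 = census_1980[census_1980["age"].between(25, 34)]
cohort_90 = census_1990[census_1990["age"].between(35, 44)]

# Individuals are not linked; only aggregate change in the group
# (here, the share who moved in the preceding five years) is traced.
print(cohort_80["moved_5yr"].mean(), cohort_90["moved_5yr"].mean())
```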
Pseudo-cohort analysis has proven particularly useful as a tool to examine the immigrant population, where key questions continue to revolve around assimilation and settlement, both of which occur across a relatively long period of time.
Further Reading
Beaumont, P. M. (1989). ECESIS: An Interregional Economic
Demographic Model of the United States. Garland, New York.
Ecological Fallacy
Paul A. Jargowsky
University of Texas at Dallas, Richardson, Texas, USA
Abstract
In many important areas of social science research,
data on individuals are summarized at higher levels of
aggregation. For example, data on voters may be published only at the precinct level. The ecological fallacy
refers to the incorrect assumption that relationships between variables observed at the aggregated, or ecological,
level are necessarily the same at the individual level.
In fact, estimates of causal effects from aggregate data
can be wrong both in magnitude and in direction. An
understanding of the causes of these differences can
help researchers avoid drawing erroneous conclusions
from ecological data.
was stated in terms of simple bivariate correlation coefficients, his critique is a challenge to regression analysis on
aggregate data as well. All slope coefficients in bivariate
and multiple regressions can be expressed as functions of
either simple or partial correlation coefficients, respectively, scaled by the standard deviations of the dependent
and independent variables. Because standard deviations
are always positive, the sign of any regression coefficient
reflects the sign of the correlation coefficient on which it is
based, whether simple or partial. Thus, regression analysis
on aggregated data, a common practice in several disciplines, runs the risk of committing the ecological fallacy
as well.
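The algebra behind this claim is the standard identity linking slopes to correlations; for the bivariate case:

```latex
% Bivariate regression slope as a scaled correlation. Because the
% standard deviations s_X and s_Y are positive, the slope b always
% carries the sign of r_{XY}; the multiple-regression analogue
% replaces r_{XY} with a partial correlation.
\[
  b \;=\; r_{XY}\,\frac{s_Y}{s_X}.
\]
```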
First, consider a few simplified scenarios using scatterplots, following the 1980 example of Gove and Hughes. Suppose the interest is in a dichotomous dependent variable such as dropping out of high school, coded as either 1 if a person is a dropout or 0 if a person is not. Further, suppose there are two groups, White and Black, and the basic question of interest is whether members of one group or the other are more likely to drop out. Data on individuals are lacking, but the overall proportion of persons who are dropouts in three different neighborhoods is known. Also known is the proportion of Black persons in each of the three neighborhoods, which for the purpose of illustration is set to 0.20, 0.50, and 0.80.
Figure 1 shows how ecological inference is supposed to
work. The figure shows the separate rates for Whites and
Blacks as dashed lines, because the researcher does not
observe these data. The Black group has a higher dropout
rate than does the White group, and so as the proportion
of Black persons in the neighborhood rises, the overall
dropout rate also rises. In this case, it is possible to infer
correctly from the aggregate data that Blacks are more
likely to drop out.
Figure 2 shows how the ecological data can give misleading results. In this case, Whites have a higher dropout
rate than do Blacks in each neighborhood. However, the
dropout rate of both groups rises as the percentage of
Blacks in the neighborhood rises, perhaps because the
percentage of Blacks in the neighborhood is correlated
with some other variable, such as family income. Even
though higher rates exist for Whites than for Blacks in
every neighborhood, the ecological regression coefficient
will have a positive slope, because the overall dropout rate
rises as the percentage of Blacks rises. In this case, the
ecological regression would correctly report that the de-
[Figure 1. Dropout rate (vertical axis, 0 to 0.5) plotted against the percentage of Blacks in the neighborhood (0.2, 0.5, 0.8).]
[Figures 2 and 3. Dropout rate and DV rate (vertical axes, 0 to 0.5) plotted against the percentage of Blacks in the neighborhood (0.2, 0.5, 0.8).]
[Figure 4. Dropout rate (vertical axis, 0 to 0.5) plotted against the percentage of Blacks in the neighborhood (0.2, 0.5, 0.8).]
In Fig. 4, Blacks do have higher dropout rates, compared to Whites, in each neighborhood, and the rates of
both groups rise as the percentage of Blacks increases.
Regression on the aggregate data produces a positive
slope, but virtually all of that slope is driven by the common dropout increase in both groups in the more heavily
minority neighborhoods. Only a small fraction of the slope
reflects the influence of the race of individuals on the
dropout rate. In this case, the direction of the ecological
inference would be correct, but the magnitude of the
effect would be substantially overestimated.
$Y_j = a + bX_j + u_j.$
1. Random grouping is not very likely to arise in practice, but it is instructive to consider the possibility. If
the data are aggregated randomly, and the model
was correctly specified at the individual level, there
will be no aggregation bias. The expected value of
mean X and mean u for all groups will be the grand
mean of X and u, respectively, and they will not be
correlated.
2. If the grouping is based on the X (or multiple Xs),
there will be no aggregation bias. This follows because the conditional mean of the disturbance term
is zero for all values of X if the individual model is
correctly specified.
3. If the grouping is based on Y, aggregation bias is
very likely. For example, if Y and X are positively
related, in the groups with higher levels of Y, both
high values of X and larger than average disturbance
terms would be found, and at lower levels of Y, the
opposite would occur. Clearly, the aggregate levels
of X and u will be correlated and the ecological
regression is misspecified.
4. Grouping based on geography, the most common
method, is also the most difficult to evaluate, because neighborhood selection may be based on
a complex set of factors operating at different
levels. However, if the dependent variable is
something like income, the danger exists that
neighborhood aggregation is more like case 3.
If the dependent variable is less likely to be
involved in the residential choice function, then
sorting by neighborhood will be more like
cases 1 or 2.
When data are provided in an aggregate form, the researcher must understand and evaluate how the groups
were formed. Then the researcher must try to ascertain
whether the procedure is likely to introduce aggregation biases or aggregation gains in view of the specific
dependent variable and explanatory models under
consideration.
$T_i = W_i + (B_i - W_i)P_i = a + bP_i + u_i.$
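A small simulation (the rates below are hypothetical, constructed to mimic the pattern of Fig. 2) makes the danger concrete: Whites drop out more than Blacks in every neighborhood, yet the ecological regression of the overall rate on percentage Black yields a positive slope:

```python
import numpy as np

# Three neighborhoods with 20%, 50%, and 80% Black residents.
pct_black = np.array([0.2, 0.5, 0.8])

# Hypothetical group-specific dropout rates: Whites exceed Blacks
# in every neighborhood, but both rates rise with percentage Black
# (e.g., because it proxies for neighborhood income).
white_rate = np.array([0.15, 0.25, 0.35])
black_rate = np.array([0.10, 0.20, 0.30])

# The observed (ecological) dropout rate is the group-weighted mean.
overall = pct_black * black_rate + (1 - pct_black) * white_rate

# Slope of the ecological regression of overall rate on pct_black.
slope = np.polyfit(pct_black, overall, 1)[0]
print(slope)  # positive, although Blacks drop out less everywhere
```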
concept of a correctly specified individual-level equation "is not helpful in this context, since individual data contain the answer in ecological inference problems with certainty. That is, with individual data, we would not need to specify any equation; we would merely construct the cross-tabulation and read off the answer. Having the extra variables around if individual-level data are available would provide no additional assistance" (p. 49).
Economic Anthropology
E. Paul Durrenberger
Pennsylvania State University, University Park, Pennsylvania, USA
Glossary
formalism An approach based on methodological individualism.
institutional economics A theory of economics that holds
that people create institutions to minimize the costs of
information gathering and transactions.
methodological individualism The assumption that institutional forms emerge from individual decisions.
practice theory The theory that knowledge is a function of
everyday activity.
substantivism The assumption that all economic systems are
unique and understandable in their own terms.
Economic anthropology is not the importation of economics to exotic locales, but anthropological understandings of all human systems of production, exchange, and consumption. This paper is a discussion of measurement issues in economic anthropology. I discuss how theoretical approaches shape our understanding of systems; how our understanding of systems forms our view of which variables to operationalize and measure; the relationships between mental and material phenomena; and issues of measurement of both material and mental phenomena.
Introduction

Anthropology is different from other social science and humanities disciplines because it is at the same time holistic, comparative, and ethnographic. Anthropologists think in terms of whole systems rather than part systems. We think as much of the interconnections among economics, politics, religion, demography, ecology, and other systems as we do about the variables that modern scholarship isolates for separate study by different disciplines such as sociology, political science, economics, demography, and biology. Because we seek to generalize our conclusions, we situate our findings within a matrix of findings from many times and places. We attempt to expand our historic coverage, and thus our range of comparison, by the recovery and interpretation of archaeological remains. Where we can observe social life, we are ethnographic: we base our findings on the meticulous observation of everyday life in particular locales. Although ethnography may be theory-driven, theory in anthropology is highly constrained by ethnographic observation and archaeological findings. There is a plethora of issues and problems centering on the measurement and interpretation of archaeological remains that I do not address here.
Issues of measurement arise from anthropology's being holistic, comparative, and ethnographic, but perhaps most directly in ethnography, when we describe the workings of concrete economic systems. Because we must strive for reliability and validity, measurement issues are paramount in comparative studies as well. How can we recognize an economic system in a society that is radically different from the ones with which we are familiar? How can we compare different economic systems? What kinds of samples allow cross-cultural generalization? The holistic approach also relies on measurement to the extent that we wish to be able to show relationships among different aspects of cultures, for instance, security of land tenure and the allocation of labor.
Because the assumptions we bring to a problem determine what and how we measure, all questions of measurement start with questions of theory. In economic
anthropology, the technical problems of how to measure
variables are almost trivial compared to the theoretical
problems of what to measure and which units to use in
making observations and measurements. On the other
Theory

Economic theory proceeds from the cautionary assumption that we hold all other things equal and from an orientation toward natural science. Does a feather fall faster or slower than a cannonball? If we abstract away atmosphere and weather and think of them in a vacuum, they fall at the same rate. This is fine for physics, but engineers know there are no vacuums in practice; so, while the insight might be of interest to the formulation of theory, it is of little practical importance. In their aspiration to achieve the same abstract level of theory as physicists, economists define similarly uniform, if imaginary, all-things-equal environments. They then proceed to derive the logical implications of a set of assumptions about human nature. Anthropologists build their discipline on the fact that things are never equal.
Neoclassical economists assume that ideal economic individuals know all the alternatives available to them, can assess the results of one decision relative to another, and thus can choose rationally among possibilities. Economic anthropologists dare to challenge this assumption. Institutional economics, in contrast, assumes that people do not have perfect knowledge, that they are deceitful and self-interested, and that incomplete information or disinformation increases uncertainty. Thus, people try to gain more information, but the effort to get information is costly. Even in terms of economic assumptions, it is costly to use markets. People therefore create alternatives that short-circuit the process to gain more certainty: these are institutions. These institutions then guide further decisions.
A major response of economic anthropology to such abstract theoretical formulations has been to ignore them in favor of detailed ethnographic accounts of people in particular locales. This approach so characterizes American anthropology that it is called American particularism or historical particularism. Barrett sums up the features of this approach. There is a limited interest in history, and because this approach assumes that habit and custom, rather than individual decisions, guide social life, it emphasizes values, norms, and emotions. This leads researchers to emphasize interior views rather than external ones. The emphasis on relativism and the associated assumption that cultures are loosely organized and unique imply that little generalization is possible. The emphasis on ethnographic description of individual examples means, as Robin Fox put it in 1991, that fieldwork replaces scholarship and thought.
The substantivist approach assumes that all economic systems are unique and understandable in their own terms. The formalist approach assumes that all people share the mentality that economists attribute to people: that they allocate scarce means among competing ends with perfect knowledge of alternatives and hierarchically ranked objectives. In his 1982 study of domestic production among Oaxacan metate producers and their relationships to the larger capitalist system of which they are part, Scott Cook came to appreciate a number of limitations in economic anthropology that both these approaches share as a consequence of their adherence to the underlying assumption of particularism. He summarizes the limitations as:
1. A tendency to reduce explanations of complex processes involving interrelated and contradictory variables to descriptions of isolated events.
2. A tendency to explain economic process at the empirical level and a failure to develop any conceptual framework to expose underlying social dynamics.
3. A pervasive focus on individual agents.
4. An almost completely circulationist view of the economy.
He finds that economic anthropologists see events as unique and unrelated because they stay close to the empirical; do not analyze systems; and do not understand, appreciate, or develop theories. They are more concerned
Systems

The first problem of measurement is to specify a system. Our assumptions may provide the answer, but, from the global to the household, the definition of the system, its parts, and their relations with one another will determine the variables of interest. These assumptions specify the relationships we can possibly see among events and individuals. This kind of analysis is qualitative, but it specifies the variables and relationships that raise the questions of measurement.
From an ecological perspective, we can say that it is always important to trace energy flows. If the inputs are greater than the outputs, as, for instance, in industrial agriculture, we know immediately that the system is not sustainable. This approach does not tell us why anyone would engage in activity that has no possible future, but our observations show us without a doubt that, for instance, modern states and corporations do engage in such nonsustainable agricultural practices.
From a purely economic perspective, if we look at only market transactions, we ignore the institutional systems that support the market. For instance, people may sell their labor on markets, but a commodity approach cannot ask or answer questions about the source of the labor. The people are created somehow. Furthermore, markets require vast legal and institutional structures to support them. They are not natural. They are costly to maintain, and someone must bear the costs. Seeing economic systems in this way is a consequence of anthropology's insistence on holism: seeing the relationships among demography, markets, law, and households.
As Marshall Sahlins points out, if we ask questions only about production, we ignore the uses to which people put the things they obtain, their reasons for valuing them. So cultural questions have material consequences, even if we argue that material relations determine cultural awareness.
Eric Wolf argues that anthropology, like the other social sciences, straddles the reality of the natural world and the reality of people's organized knowledge about that world. This is the distinction between the exterior view of what is and the interior views of different people at different places and times: what they think or know about external realities. Some analysts treat the problem that this poses by disregarding the impasse between the interior and the exterior realms. Others discount the mental and consider behavior in the material world as primary.
Still others focus on the mental schema that people define for themselves and consider behaviors in the material world as secondary.
Variables
Each set of assumptions suggests different units of analysis and different variables to measure. The first technical methodological question is how to operationalize each variable so that we can recognize it. The second, related methodological question is how to measure each variable. These decisions and categories do not depend on or necessarily reflect locally relevant systems of thought.
These decisions and categories do not depend on or necessarily reflect locally relevant systems of thought.
If we assume that economic activity is that which
produces, exchanges, or uses something, we measure
the amounts of time and goods as people produce, exchange, and use them. This opens all realms of activity to
our analysis. The invisible labor of households and the
informal economy can become the subjects of inquiry,
and we can develop less distorted and more realistic descriptions of economic systems.
One method of measuring the allocation of labor is the time allocation study. The objective of the method is to record what a sample of people are doing at a sample of places and moments in time. From this information on the number of times people do specific activities, we can infer the amount of time they spend doing different things. Bernard discusses the problems and potentials of this method. One problem is that observers affect the behavior of the people they are observing; it is difficult to control for this reaction effect of people being observed. The method assumes answers to the questions of whom to observe, how frequently to observe them, how long to observe them, when to observe them, and where to observe them. A truly random sampling strategy would have to consider each dimension, except perhaps duration. A less than truly random sampling strategy introduces questions of bias that can never be answered after the fact. Another problem is the validity of the categories of observation. How do observers categorize the actions to count them? What counts as an instance of a given activity?
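A minimal sketch of the sampling logic (the people, observation window, activity labels, and the stand-in observe function are all invented) shows how spot observations become time-allocation estimates:

```python
import random
from collections import Counter

random.seed(0)

people = ["p1", "p2", "p3", "p4"]   # invented sample of people
hours = range(6, 20)                # invented observation window

# Draw a random sample of (person, hour) spot checks.
spots = [(random.choice(people), random.choice(hours))
         for _ in range(200)]

def observe(person, hour):
    """Stand-in for the field observation: a real study records the
    activity actually seen at each spot check."""
    return random.choice(["farming", "housework", "leisure"])

counts = Counter(observe(p, h) for p, h in spots)

# The proportion of spot checks per activity estimates the share of
# time allocated to it.
total = sum(counts.values())
print({activity: n / total for activity, n in counts.items()})
```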
regional systems that enmesh villages. Others have expanded these concepts of regional and central place analysis to different geographic areas.
In her 1994 study of the California strawberry industry, Miriam Wells develops quantitative data on many variables from many sources, including government documents, observation, and interviews. This allows a meticulous comparison of areas within the state and illustrates the historical forces at work so that she can assess the causal factors at work, for instance, in the reemergence of share-cropping arrangements after they had all but disappeared. She finds that different factors affected different regions in different ways, something that a theoretical approach, neoclassical or Marxist, could never have shown because it would have assumed a uniformity that was unrealistic.
In his 1965 assessment of the civil-religious hierarchy of the Mayan community of Zinacantan, Frank Cancian meticulously measures the wealth and prestige of individuals to understand how they are related to participation in the system over time.
These examples, and others like them, show that how the anthropologist defines the system indicates the variables to measure, and that this sets the problems of operationalization and measurement that the anthropologist must then solve. The solutions are different for different units of analysis: from agricultural plot to household to village to market area to nation to global system; from production to consumption to exchange; and from questions of marketing to questions of prestige.
Decision Making

In 1956, Ward Goodenough argued that if anthropologists impose outside categories, such as Malinowski's sociological or Harris's etic categories, on a culture, they cannot understand how people make the important decisions that guide their lives. He was discussing residence rules that specify where newlywed couples live after they get married. The categories were some of the most precisely defined in anthropology, yet Goodenough found that they did not guarantee reliable interpretations of data. Another anthropologist had collected census data that were similar to Goodenough's, but had arrived at different conclusions about postmarital residence. Goodenough argued that the problem was that neither anthropologist had understood how people actually decided where to live after they married. He proceeded to outline the factors that were relevant to the people and how they balanced them in making residence decisions. His more general conclusion is that accurate ethnography depends on describing how people think about and decide things (the internal, ethnographic, or emic view) rather than on imposed exogenous categories.
This paper informed ensuing work in cognitive anthropology, much of it collected in Steven Tyler's 1969 book, Cognitive Anthropology. When Roger Keesing returned to the issue of how to understand how people make allocative decisions, he distinguished between statistical models, which describe the observed frequencies of possible outcomes, and decision models, which specify how people decide on allocations in terms of their own categories and judgments of salience and importance.
In 1987, Christina Gladwin and Robert Zabawa agreed with anthropologists' rejections of neoclassical economists' explanations of increasing farm sizes and the decreasing number of farmers in the United States. This critique rejects the idea that such social phenomena were the net result of the decisions of rational self-seeking individuals and argues in favor of a more institutional, sociological, structural, or materialist view that emphasizes social relations, class structures, patterns of ownership, and conflict among sectors, regions, and countries. Like Keesing, they argue that these structures do not themselves account for changes, but that changes in international markets, technology, monetary policies, and inflation shape farmers' decisions. They argue that people change structures by their decisions. This is consistent with Gladwin's other work.
More recently, Joseph Henrich argues that although we can understand phenomena in terms of individual decision making, people do not in fact base their decisions on the weighing of costs and benefits. It is incorrect to assume, he argues, that people use their reasoning abilities to develop strategies to attain goals. In fact, people
One solution to this riddle is the extreme postmodernist one, which argues that structures of meaning are not anchored in the outside world. Another solution might be extreme holism: to affirm that everything affects everything else and that we cannot sort it all out because it is too complex. At best, this view suggests, we can provide an appreciation for the complexity by some attempt to recapitulate it in another mode. In the postmodern view, the explication of reality needs no mediation, only experience (not reflection, analysis, or depiction). This is counter to the principles of both art and science, which highlight some dimensions and underplay others to organize and filter reality rather than replicate it. Although it may be true that everything is related, we do not see hurricane forecasters watching butterflies to detect how the beat of their individual wings will affect El Niño.
Here I return to a practical issue and to the work of Jean Lave. If patterns of thought are situational, determined by changing social structures, then it is not effective to try to change social patterns by changing minds. The prevalent model of education in the United States seems to be the transference of abstractions from teachers to students. Alternatives to this involve learners in practice and more or less resemble apprenticeship programs. Lave challenges the currently popular view of education as transference of knowledge, which centers on the idea that scientific research abstracts the principles underlying phenomena, teachers transfer these principles to students in classrooms, and students then apply them in their work and lives. The idea of transference is that people learn by assimilating abstractions that are transferred to them in classrooms. This is the logic of safety courses that have no relationship to accident rates and of the classroom education of fishermen that has no relationship to their subsequent success. Lave centers her account on computation, mathematics in practice, to show that the principles that teachers teach are not the ones that people use: that the knowledge of mathematics that teachers transfer in classrooms is not used or useful beyond classrooms.
She goes on to argue that schooling has become a means of legitimizing hierarchy in terms of, to use Katherine Newman's phrase, "meritocratic individualism," the ideology of ranking individuals according to their merit and attributing their success, failure, prestige, status, income, and other characteristics to some measure of individual merit or achievement. Schooling is a way of establishing an individual's merit for such rankings. Thus, it becomes its own end, its own objective, and loses reference to the outside world of practice. In the process, schooling becomes the measure of merit and becomes rationalized as the means of transferring knowledge, so that schooling or education becomes identified with knowledge. Thus, when people want to change someone's behavior, it seems obvious that education is the answer.
Conclusion
The theories that we use define the systems that we observe. The systems that we observe define the variables
that we can measure. Some variables are material; some
are mental. We operationalize these variables and measure them in order to compare our theoretical
understandings with the empirical realities we can observe. We use the insights we gain from such work to
refine our theoretical understandings, our understandings of systems, and our measurements. By measuring and using a scientific approach, we can break out of the
endless rounds of other-things-equal speculation and develop reliable and valid understandings of the variety of
economic systems that characterize humanity today and
through history.
Further Reading
Acheson, J. M. (2002). Transaction cost economics. In Theory in Economic Anthropology (Jean Ensminger, ed.), pp. 27-58. Altamira, Walnut Creek, CA.
Bernard, H. R. (1988). Research Methods in Cultural
Anthropology. Sage, Newbury Park, CA.
Cancian, F. (1965). Economics and Prestige in a Maya
Community. Stanford University Press, Stanford, CA.
Cook, S. (1982). Zapotec Stone Workers. University Press of
America, Washington, D.C.
Durrenberger, E. P. (2000). Explorations of class and
consciousness in the U.S. J. Anthropol. Res. 57, 41-60.
Durrenberger, E. P. (2001). Structure, thought, and action.
Am. Anthropol. 104, 93-105.
Durrenberger, E. P., and Erem, S. (2000). The weak suffer
what they must. Am. Anthropol. 101, 783-793.
Durrenberger, E. P., and King, T. D. (eds.) (2000). State and
Community in Fisheries Management. Bergin and Garvey,
Westport, CT.
Durrenberger, E. P., and Tannenbaum, N. (1990). Analytical
Perspectives on Shan Agriculture and Village Economics.
Yale University Southeast Asia Studies, New Haven, CT.
Hannerz, U. (1992). Cultural Complexity. Columbia University
Press, New York.
Lave, J. (1988). Cognition in Practice. Cambridge University
Press, New York.
Malinowski, B. (1922). Argonauts of the Western Pacific.
Waveland, Prospect Heights, IL.
McCay, B. M., and Acheson, J. M. (eds.) (1987). The Question of
the Commons. University of Arizona Press, Tucson, AZ.
Moore, C., Romney, A. K., Hsia, T.-L., and Rusch, G. D.
(1999). Universality of the semantic structure of emotion
terms. Am. Anthropol. 101, 529-546.
Romney, A. K., Weller, S. C., and Batchelder, W. H. (1986).
Culture as consensus. Am. Anthropol. 88, 313-338.
Wells, M. J. (1994). Strawberry Fields. Cornell University
Press, Ithaca, NY.
Wolf, D. L. (1990). Factory Daughters. University of California
Press, Berkeley, CA.
Wolf, E. R. (1997). Europe and the People without History.
University of California Press, Berkeley, CA.
Wolf, E. R. (1999). Envisioning Power. University of California
Press, Berkeley, CA.
Economic Development,
Technological Change, and
Growth
Paul Auerbach
Kingston University, Surrey, United Kingdom
Glossary

economic growth The growth rate of national income.
economic income The maximum amount that an individual, firm, or nation can consume without causing a deterioration in the value of that individual's, firm's, or nation's capital stock or assets.
human capital Human beings viewed as productive assets. Education enhances the value of human capital.
Human Development Index (HDI) An index of development aggregating measures of income per capita, life expectancy, and education.
national income Commonly measured as Gross Domestic Product: the value of output produced within a nation in a given year.
Paretian criterion Economic growth takes place in a society only when at least one member of the society is made better off (by his or her own reckoning) and none made worse off.
Introduction

Why Economic Development and Growth Are Important

For the greater part of its history, humankind has lived in poverty. The long tenure of even the greatest civilizations of the past (the Egyptians, the Romans) evidenced, by contemporary standards, little change in the material existence of the vast majority of the population. By contrast, the exceptional nature of the experience of the past few hundred years in the richer, more privileged parts of the world may be illustrated with the numerical example given in Table I. It is presumed that, at the beginning of the period, the population has a material claim on resources at a low level of subsistence. This claim on resources will be set as equivalent to $50 per year per capita.
One may note the following results from this simple hypothetical calculation. First, it is clear that the long-lived civilizations mentioned above never approached monotonic increases in per capita expenditure of even 1% per year, since even this level of growth would, after a few hundred years, have resulted in material standards for their populations unknown in the ancient world. Second, seemingly small differences between societies in per capita growth rates that are sustained over periods of more than 100 years can evidently cause dramatic changes in their relative fortunes. One may conclude that the past few hundred years mark a new period in the material history of humankind.
Table I  Per Capita Annual Expenditure of $50 and Its Increase over Time for Annual Rates of Growth of 1%, 1.25%, 1.3%, and 1.5%

                   1%        1.25%      1.3%       1.5%
After 10 years     $55       $56        $56        $57
After 50 years     $81       $92        $94        $104
After 100 years    $134      $171       $180       $218
After 150 years    $220      $318       $343       $460
After 200 years    $362      $592       $654       $968
After 250 years    $596      $1102      $1247      $2037
After 300 years    $980      $2052      $2378      $4289
After 350 years    $1611     $3818      $4536      $9029
After 400 years    $2650     $7105      $8652      $19,007
After 450 years    $4358     $13,223    $16,505    $40,016
After 500 years    $7167     $24,609    $31,484    $82,241
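The compounding arithmetic behind Table I is elementary; a minimal sketch in Python (the $50 base and the growth rates are taken from the text) reproduces the table:

rates = [0.01, 0.0125, 0.013, 0.015]
for years in [10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]:
    # per capita expenditure of $50 compounded annually at each rate
    row = [50 * (1 + r) ** years for r in rates]
    print(years, [f"${v:,.0f}" for v in row])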
[Tables II and III: the components of national income. Measured by expenditure: consumption, investment, government expenditure, and exports minus imports. Measured by income: wages, profits, interest, and rent.]
to a market economy (as in the early stages of the Industrial Revolution in Great Britain). Furthermore, wide
gaps between rich and poor can distort investment priorities (e.g., a priority in pharmaceutical research may be
given to the diseases of the rich).
Table IV  Two Hypothetical Streams of Annual Receipts

               Year 1   Year 2   . . .   Year 5   Year 6   . . .   Year 10
Stream A       $150     $150             $150     $150             $150
Stream B       $225     $225             $225     $225             ?

Table V  Economic Income from the $150 Receipt Stream

                                   Year 1  Year 2  . . .  Year 5  Year 6  . . .  Year 10
Annual receipts                    $150    $150           $150    $150           $150
minus Accounting depreciation
  charge (10-year basis)           $120    $120           $120    $120           $120
Economic income                    $30     $30            $30     $30            $30

Table VI  Economic Income from the $225 Receipt Stream

                                   Year 1  Year 2  . . .  Year 5  Year 6  . . .  Year 10
Annual receipts                    $225    $225           $225    $225           $225
minus Accounting depreciation
  charge (10-year basis)           $120    $120           $120    $120           $120
Economic income                    $105    $105           $105    $105           $105

Table VII  Economic Income with a Higher Depreciation Charge

                                   Year 1  Year 2  . . .  Year 5  Year 6  . . .  Year 10
Annual receipts                    $225    $225           $225    $225           $225
minus Accounting depreciation
  charge                           $240    $240           $240    $240           $240
Economic income                    ($15)   ($15)          ($15)   ($15)          ($15)
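A minimal Python sketch of the economic-income arithmetic in Tables V-VII; the $1,200 asset cost is an assumption chosen only to yield the $120 annual charge used in the tables:

annual_charge = 1200 / 10   # straight-line depreciation on a 10-year basis: $120
for receipts in (150, 225):
    print(receipts - annual_charge)   # economic income: $30 (Table V), $105 (Table VI)
print(225 - 240)                      # Table VII: a $240 charge yields ($15)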
Improvements in Communication These improvements can be linked to the most fundamental and important invention of the human species, Homo sapiens:
the invention of language. In the modern world, the ability
of individuals to communicate and absorb information in
writing is an inherent aspect of development.
Emulation Improvements in the education and skills of an individual can hardly be kept secret, and emulation by others is a fundamental aspect of human history. Since individuals can rarely lay claim to, and then capture, the full value of the benefits gained by others as a result of their example, there are bound to be positive spillover effects from an individual's acquisition of improvements in education and skills.
Inherently Interactive Activities A simple example
of an inherently interactive activity is the driving of an
automobile. The legal requirement that drivers must pass
a test of proficiency is linked to the notion that the probable benefits to society as a whole are greater than those
that accrue to any one individual: the likelihood of an
individual avoiding a car accident is a function not only
of his or her own proficiency, but also of that of the other drivers
on the road.
State Governance
Great emphasis has been placed on the quality of state
governance as a key variable affecting economic development. Two aspects have been emphasized.
Democratic Rule
Democracy has been held to be important, not only for its
own sake, but because it can facilitate the process of development. Democratic governments may often have
short time horizons, focusing on the next election, and
their resultant behavior may have deleterious effects on
the nation's long-term development.
[Table VIII: construction of the Human Development Index, with goalpost maximum and minimum values for its components (life expectancy 85 and 25 years; literacy and enrollment 100 and 0; GDP per capita $40,000 and $100).]
For all its weaknesses, the HDI helps to focus attention away from a one-sided emphasis on GDP growth as a measure of economic progress.
Further Reading
Dasgupta, P. (1993). An Inquiry into Well-Being and
Destitution. Clarendon Press, Oxford, UK.
Ghatak, S. (2003). Introduction to Development Economics,
4th Ed. Routledge, Andover, MA.
Maddison, A. (2001). The World Economy: A Millennial
Perspective. Organisation for Economic Cooperation and
Development, Paris.
Sen, A. (1999). Development as Freedom. Oxford University
Press, Oxford, UK.
Streeten, P. (1999). On the search for well-being. Econ. Dev. Cult. Change 48, 214-245.
Economic Forecasts
Charles G. Renfro
Journal of Economic and Social Measurement, New York,
New York, USA
Glossary
Introduction
taking this possibility into account when evaluating predictive accuracy obviously can further muddy the water.
At a deeper level, apart from the forecast values, there
is also the fact that the apparent measured outcomes are
not necessarily determinate. Specifically, the first published estimates of the realized values of many macroeconomic variables, including gross domestic product,
employment, and even price measures, are subject to
revision. They may change significantly in the next
month after the first estimates are made, and then one
or more times in the year after that. They may change
multiple times in succeeding years. For example, for the
United States, the aggregate estimates of nonfarm employment published during or immediately after the 1992
presidential election are not even approximately the same
as the estimates published today for that same 1992 time
period. A forecast of employment made at the beginning
of 1992, which at the end of 1992 might then have
appeared to be a wild overestimate of employment,
could today be assessed in retrospect as an
underprediction of what is now believed to be the contemporaneous actual level of employment. Obviously, today's U.S. employment measurements can have no impact on the 1992 U.S. presidential election, but they are pertinent to today's evaluations of the accuracy of the forecasts made then.
In short, in as much as the estimated, realized values of
many economic variables are subject to multiple revisions,
the measurement of forecast accuracy for these variables
is not simply a matter of applying once, or even twice,
some formula to measure the prediction error.
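To make the point concrete, the following Python sketch scores the same forecasts against two vintages of the "actual" data; all figures are hypothetical, and the apparent sign and size of the errors change with the vintage chosen:

import numpy as np

forecast      = np.array([100.0, 102.0, 104.0])
first_release = np.array([ 99.0, 100.5, 102.0])   # initial published estimates
revised       = np.array([101.5, 103.0, 105.5])   # later-revised estimates

def rmse(f, actual):
    # root mean squared prediction error against a given vintage of "actuals"
    return float(np.sqrt(np.mean((f - actual) ** 2)))

print(rmse(forecast, first_release))   # against first releases: overpredictions
print(rmse(forecast, revised))         # against revised data: underpredictions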
It should be apparent from what has been said so far
that economic prediction, as an activity, involves inherent
complexities that can far exceed those associated with
other types of prediction. For example, the prediction
of the relative movement of the planets around the
sun, based on the laws of physical motion, differs in nature
from economic prediction in several ways. Most fundamentally, the prediction of planetary motion obviously has
no effect on the motion: what is predicted is independent
of the subject of the prediction. However, in addition,
once the prediction is made, the process of evaluating
the accuracy of that prediction involves also a much
less complex measurement problem. The inference to
be drawn from these considerations is not that the laws
of physical motion are better understood than is the nature of economic processes, although this could be true,
but rather that, in the case of large-body physical motion,
the predictive context is a much simpler context. Notice
also that it is only at this point that any specific mention has
been made of the qualitative basis on which an economic forecast might be made, that is, the model or other representation of the economic process that generates the
values that are predicted. Obviously, as a matter of scientific consideration, the adequacy of this representation
is important, and this issue will need to be investigated.
Methodology of Economic
Prediction
The preceding discussion presumes a general familiarity
with the concept of a forecast, but so far, neither have the
specific methods of making economic predictions been
considered nor have many of the terms used been defined.
Considered generally, the methods commonly used to
make economic forecasts range from those that have no
apparent logic, to those that possess a logic that can, at
least at first sight (or even after careful evaluation), appear
to be farfetched, to defensible statistical methodologies
that may be capable of being generally replicated and may
or may not be grounded in economic theory. The first type
of method includes simple guesses that can be made by
anyone, in much the same sense that it is not necessary to
be a meteorologist to predict that it will rain tomorrow;
furthermore, a given prediction of the weather might or
might not be made with reference to any immediately
available meteorological evidence. An example of the second type of methodology is the sunspot theory of the
economist and logician, William Stanley Jevons (1835-1882). Jevons attempted to relate economic activity, particularly the trade cycle, to the observed pattern of solar flares; on first consideration, this theory appears quite tenuous, but it gains a degree of logic, particularly for an agricultural economy, from the idea that crop yields can be affected by changes in the intensity of the sun's
energy. The well-known argument put forward by
Thomas Malthus at the beginning of the 19th century,
that bare subsistence would be the inevitable lot of
mankind in general, because the food supply increases
arithmetically whereas population growth is exponential,
is also a type of forecast. Its logical methodology relies
i ≠ 1,
task. Operationally, in this case, simply knowing the pattern of past price behavior does not provide a valid basis
on which to predict future price behavior. In contrast, if it
is true that, instead, a1 < 1, the current value of the
variable Yt depends systematically on both its previous
values and the random effects of the disturbance term,
ut. Nonzero values of a0 and the other ai constants (i = 2, 3, . . . , k) will affect the particular pattern of the behavior
of the variable Yt over time, but clearly do not change the
basic assertion that future values of this variable are
systematically related to its past values. Obviously, in practice, a number of issues arise, including such things as the
value of k, the maximum number of lagged values of Yt,
and which of the ai take nonzero values, as well as what
specific values. In addition, the properties of the disturbance term can be characterized in a variety of
ways, leading to much more complex mathematical
statistical expressions as potential models, including the
approach taken by George Box and Gwilym Jenkins. To
examine, even in general terms, the various aspects of this
type of statistical model would easily take several hundred
pages, and there are many books devoted to the ways in
which time-series variables can be characterized in order
to then attempt to forecast their future values. A further
extension is to consider not one single variable Yt, but
instead a set of economic variables Y1t, Y2t, . . . , Ynt in
order to form a multivariate forecasting model, but this
extension does not change the nature of the methodology
variable by variable, if each time series is treated as being
independent of the others as a process.
All such characterizations of the behavior of time series
are based on the idea that it is a fixed pattern of past
behavior of the variables that provides the means to produce forecasts of future values. This is not to say that the
potential patterns of behavior cannot be very complexly
represented: practically any regular, and even some seemingly highly irregular, patterns of behavior can be modeled, in principle. However, these patterns will depend
on the specific values taken by the constants, a0 and the ai (i = 1, 2, . . . , k), as experimentation with a graphing
calculator will easily demonstrate. Moreover, as indicated
previously, for a real-world example, these values are a
priori unknown. Consequently, they must be estimated
using appropriate statistical estimation methods, given
the observed past values of the Yt (or the set
of variables Y1t, Y2t, . . . , Ynt). The capability to estimate
these reliably depends on, among other things, the available quantity (and their quality as measurements) of past
observations. Furthermore, during this estimation process, tests must also be made to assess the validity of
any hypotheses adopted concerning certain critically
important values, such as which, if any, of the ai have a value of zero, the occurrence of which evidently implies that certain lagged values of Yt should not appear in the estimated equation. And, of
Economic Forecasts
course, operationally, as a forecasting method, this methodology obviously relies on the persistence of the estimated behavioral patterns over time, at least
throughout the time for which the forecasts are made.
For its validity, therefore, the methodology does not necessarily require any knowledge as to why economic
variables behave as they do, but this is not to say that it
does not require instead a possibly immense amount
of acquired knowledge concerning the representation
of the patterns of that behavior, and not a little faith
that these patterns will persist in the future.
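As an illustration of this methodology, the following Python sketch, with an assumed lag length k and simulated data, estimates the constants a0, a1, . . . , ak of such an equation by least squares and produces a one-step-ahead forecast:

import numpy as np

def fit_ar(y, k):
    # Least-squares estimation of Y_t = a0 + a1*Y_{t-1} + ... + ak*Y_{t-k} + u_t.
    Y = y[k:]
    X = np.column_stack([np.ones(len(Y))] + [y[k - i:-i] for i in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # [a0, a1, ..., ak]
    return coef

def forecast_one_step(y, coef):
    # One-step-ahead forecast using the k most recent observations.
    k = len(coef) - 1
    return coef[0] + coef[1:] @ y[-1:-k - 1:-1]

rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(1, 300):                # simulate a stationary AR(1), a1 = 0.7
    y[t] = 0.7 * y[t - 1] + rng.normal()
a = fit_ar(y, k=1)
print(forecast_one_step(y, a))         # approximately 0.7 * y[-1]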
Forecast methodologies of the type just briefly described clearly have the property of requiring a statistical
characterization of the variable to be predicted. For this
reason, they are commonly employed in those instances in
which the person making the forecast either lacks knowledge as to why a particular economic variable might behave in a particular way, or lacks the time to attempt
a particular detailed study of the whys and wherefores
of that behavior. One of the reasons to adopt the approach
in a particular case might be the need to provide forecasts
for a substantial number of variables. For example, an
employee of a large firm, particularly a firm producing
a variety of products that are sold in various different
markets or in a range of container sizes, might choose
to employ a statistical methodology of this type as
a sales forecasting technique. This decision may be
buoyed, in part, by the belief that among many forecasted
variables, the prediction errors made might, in the
aggregate, cancel to at least some degree, in the end providing a satisfactory composite result. As a normative
consideration, it is important to recognize that the choice
between methodologies can depend on a great variety of
circumstances, so that the particular choices made do not
always need to be justified on the basis of individual predictions. It is, for example, difficult to predict the length of
a given human life, but the life expectancy characteristics
of large populations are known to be quite predictable, at
least under normal circumstances.
As just indicated, the choice of methodology for economic forecasts is obviously, in part, a question of the
particular application, as well as what is meant by an
economic forecast. Prior to the work in the later 1930s
of Jan Tinbergen, who is generally credited with the construction of the first macroeconometric models, economic
forecasts almost always took the form of predictions of
individual prices, as well as a variety of other individual
economic, generally microeconomic, variables. Beginning
in about 1954, with the creation of the first macroeconometric model to be used to make economic forecasts,
known as the Klein-Goldberger model, the prediction
of the performance of entire economies began to take
center stage. It is such macroeconomic forecasts that
many people today commonly consider to be economic
forecasts. These predictions have the characteristic that
Conclusion
Economic prediction, as a subject on its own, might be
approached by first considering, in particular, the question of the degree to which a particular predictive model is
or is not a good representation of the economic process
being modeled. The second issue addressed might then
be the degree to which the characteristics of the process
remain essentially the same over time. If the model is
inherently a good representation and if the characteristics
of the process do not change between the past and the
future, it might logically be expected that any attempt at
prediction would be likely to yield good results. However,
here, these topics have only been considered toward the
end of the narrative. The reason is that when the prediction of economic variables is considered as a measurement problem, there may be certain other aspects of this
particular context that also need to be taken into account
and given an equal weight.
It has been accepted, as a first principle, that any forecast of the future must be rooted in the observation of the
past. Whereas a single guess about the future can possibly
be realized independently of such observation, as a matter
of consistent performance, it is only to the degree that the
past provides the raw materials of the future that scientifically based prediction is possible. In this context, it is
Further Reading
Bodkin, R. G., Klein, L. R., and Marwah, K. (1991). A History
of Macroeconometric Model-Building. Edward Elgar,
Brookfield, VT.
Box, G., and Jenkins, G. (1984). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, CA.
Clements, M. P., and Hendry, D. F. (1998). Forecasting
Economic Time Series. Cambridge University Press, Cambridge.
Hogg, R. V., and Craig, A. T. (1978). Introduction to
Mathematical Statistics. Macmillan, New York.
Keynes, J. M. (1936). William Stanley Jevons 1835-1882: A centenary allocution on his life and work as economist and statistician. J. Roy. Statist. Soc. 99(3), 516-555.
Economics, Strategies in
Social Sciences
Marcel Boumans
University of Amsterdam, Amsterdam, The Netherlands
Glossary
accurate measurement A measurement in which the
difference between a true value of a measurand and the
measurement result is as small as possible.
calibration The adjustment of the model parameters to
obtain the best match between stable properties of the
set of observations and those of the model-generated
outcomes.
filter A model that accomplishes the prediction, separation,
or detection of a random signal.
graduation The process of securing, from an irregular series
of observed values of a variable, a smooth, regular series of
values consistent in a general way with the observed series
of values.
index number A measure of the magnitude of a variable at
one point relative to a value of the variable at another.
passive observation An observation of a quantity, influenced
by a great many factors that, in two ways, cannot be
controlled: it is not possible to insulate from those factors
that fall outside the theoretical domain, and those factors
falling within the theoretical domain cannot be manipulated
systematically.
precise measurement A measurement in which the spread
of the estimated measurement errors is as small as possible.
Introduction
To discuss and compare various measurement strategies
in economics, it is helpful to have a framework to indicate
more precisely the kinds of problems encountered in economics and the kinds of strategies developed to treat
them. To make comparisons of these, at first sight,
quite different strategies possible and transparent, the
scope of these strategies is reduced to the common aim
of finding the true values of the system variables, denoted by the (K × 1) vector xt (see Table I). Throughout this
article it is assumed that these system variables are independent. Because all systems discussed are dynamic systems, t denotes a time index. For all strategies, it is
assumed that xt is not directly measurable. In general,
the value of xt is inferred from a set of available observations yti (i = 1, . . . , N), which always involve noise eti:

yti = f(xt) + eti.    (1)
Table I  Symbols

Symbol      Description
K           Number of system variables
N           Number of observations
t           Time index
xt          (K × 1) vector of system variables
yt          (N × 1) vector of observations
x̂t          Measurement of xt
x̂t|t−1      Estimate of xt made at time t using the information available up to time t − 1
OC          Background conditions
M           Measurement model
Tt          Time domain relevant for the measurement of xt
et          (N × 1) vector of observation errors
ηt          (K × 1) vector of system noise
e0t         (K × 1) vector of measurement errors
eMt         (K × 1) vector of estimated measurement errors
At          (K × N) matrix of model parameters (atij)
Ht          (N × K) observation matrix
Ft          (K × K) system matrix
Q           Variance-covariance matrix of ηt
R           Variance-covariance matrix of et (σ²tij)
‖·‖         Norm
The measurement x̂t is obtained by applying the measurement model to the observations over the relevant time domain,

x̂t = M{ys, s ∈ Tt; a1, a2, . . .},

and, correspondingly,

Mx{xs, s ∈ Tt; a1, a2, . . .}  and  Me{es, s ∈ Tt; a1, a2, . . .}

denote the model applied to the system variables and to the observation errors.
qOC
A necessary condition for x̂t to be an (indirect) measurement of xt is that model M must be a representation of the observation equation [Eq. (1)], in the sense that it
must specify how the observations are related to the
values to be measured. This specification also implies
a specification of the error term. A common feature of
the strategies is that they all are developed to deal with
the problem of taking measurements under circumstances that cannot be controlled. To explore this
problem of (lack of) control, it is useful to rewrite
Eq. (1) as an empirical relationship, f, between the
observations y, the system variables x, and background
conditions OC:
y = f(x, OC),

so that small changes in the observations decompose as

Δy = Δf(x, OC) = Σi=1..K (∂f/∂xi) Δxi + (∂f/∂OC) ΔOC.
Invariance Problem
To find out about ∂f/∂xi and whether it is stable for, and independent of, variations in x and other conditions, the aim, whenever possible, is to carry out controlled experiments. To simplify the notation, ∂f/∂xi will from now on
be denoted by fxi . The idea behind a controlled experiment is to create a specific environment, a laboratory, in
which the relevant variables are manipulated in order to
take measurements of particular parameters, with the aim
to discover the relationship, if any, between these
variables. In a laboratory, a selected set of factors
would be artificially isolated from the other influences;
in other words, care would be taken that ceteris paribus conditions hold, so that fxi is measured by

ΔyCP/Δxi,  with Δxj = 0 for j ≠ i.
Under passive observation, by contrast,

Δy/Δxi = fxi + fOC · (ΔOC/Δxi),

so that the observed ratio confounds fxi with the influence of the background conditions.
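The contrast can be made concrete with a small Python sketch; the linear form of f and the drift of OC with x are assumptions chosen purely for illustration:

def f(x, oc):
    # an assumed empirical relationship y = f(x, OC) = 2x + 3*OC
    return 2.0 * x + 3.0 * oc

dx = 1e-6
# Controlled experiment: OC held fixed, so Delta-y/Delta-x recovers f_x = 2.
fx_lab = (f(1.0 + dx, 0.5) - f(1.0, 0.5)) / dx
# Passive observation: OC drifts with x (here OC = 0.4x), so the observed
# ratio is f_x + f_OC * (Delta-OC/Delta-x) = 2 + 3 * 0.4 = 3.2.
fx_obs = (f(1.0 + dx, 0.4 * (1.0 + dx)) - f(1.0, 0.4)) / dx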
Error Problem
In general, it is desirable that the measurement results, x̂,
are accurate. Accuracy is a statement about the closeness
of the agreement between the measurement results and
the true values of the measurand. Closeness means that
the results should be concentrated near the true value x
in the sense of location and spread: It might be required
that measurement results have their mean close to x (i.e.,
are unbiased) and have little spread (i.e., are precise).
Precision is a statement about the closeness of the individual measurement results of the same measurement
procedure. The difference between precision and unbiasedness is illustrated in Fig. 1 by an analogy of measurement with rifle shooting. A group of shots is precise when
the shots lie close together. A group of shots is unbiased when it has its mean in the bull's-eye.
In an ideal laboratory, accurate measurements can be
achieved by controlling the circumstances such that ΔOC = 0. Then y would be an accurate measurement of x, in the sense that y is an observation of x without
noise. Whenever it is not possible to control the circumstances, which is usually the case in economics, accurate
E[et] = 0,    (16)

and

E[ηt esT] = 0  for all t, s.    (18)

Error-reducing strategies can aim at E[eMt] = 0 (securing unbiasedness), or Var(eMt) < Var(et) (increasing precision). Because it is assumed that E[et] = 0, accuracy will generally be obtained by aiming at precision, that is, reducing Var(eMt). In other words, most error-reducing strategies discussed herein are based on the principle of minimizing the squared error ‖eMt eMTt‖, where ‖·‖ is a norm that each strategy defines in a specific way. Because unbiasedness is assumed, these strategies aim at precision.
An example of an old and very simple strategy to achieve accuracy is taking the average. In this strategy, it is assumed that all errors taken together will nullify each other. Suppose that yi = x + ei (i = 1, . . . , N); then x̂ = (1/N) Σi=1..N yi is considered to be an accurate measurement of x because it is assumed that Σi=1..N ei = 0.
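A minimal Python sketch of the averaging strategy, with simulated noise:

import numpy as np

rng = np.random.default_rng(0)
x_true = 5.0
y = x_true + rng.normal(0.0, 1.0, size=1000)   # y_i = x + e_i
x_hat = y.mean()                               # the averaging strategy
# With E[e_i] = 0, the spread of x_hat falls as 1/sqrt(N): precision,
# and hence accuracy, improves as observations accumulate.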
Econometric Modeling
In general, economists believe that theory can solve the
problem of invariance; theory tells us which factors have
potential influence and which do not. One of the reasons
for believing this is that Haavelmo, who had advanced this
problem, had pointed out the possibility that the empirically found relationships may be simpler than theory
would suggest. This could lead researchers to discard
potential influences that could explain shifts in these
relationships.
A standard econometric textbook describes the basic
task of econometrics: to put empirical flesh and blood
on theoretical structures. This involves three steps of
specification. First, the theory must be specified in explicit functional form. Second, the econometrician
should decide on the appropriate data definitions and
assemble the relevant data series for the variables that
enter the model. The third step is to bridge theory and
data by means of statistical methods. The bridge consists
of various sets of statistics, which cast light on the
validity of the theoretical model that has been specified.
The most important set consists of the numerical
estimates of the model parameters, A. Further statistics
enable assessment of the precision with which these
parameters have been estimated. There are still further
statistics and diagnostic tests that help in assessing the
performance of the model and in deciding whether to
proceed sequentially by modifying the specification in
certain directions and testing out the new variant of the
model against the data.
Predictive Performance
With the generally shared belief among economists that
theory functions as a reliable guide, one of the strategies of
attaining precise measurement is the method of adding variables. Two models are compared:

x̂It+1,i = Σs≤t Σj=1..K aIsij ysj  and  x̂IIt+1,i = Σs≤t Σj=1..K+1 aIIsij ysj,

with prediction errors

eIt+1,i = ‖yt+1,i − x̂It+1,i‖  and  eIIt+1,i = ‖yt+1,i − x̂IIt+1,i‖,  i = 1, . . . , K + 1,

where each observation ysj is a proxy variable of the unobservable xsj, thus N = K and Ht = I (the identity matrix): ysj = xsj + esj. If eIIt+1,i < eIt+1,i for the majority of these error terms (i = 1, . . . , K), choose model II. Note that for each additional variable, the model is enlarged with an extra (independent) equation. As a result, the prediction errors are assumed to be reduced by taking into account more and more variables.
In this strategy, it is assumed that the estimation techniques lead to the true system parameters: At = Ft. The system then evolves as

xt+1 = Σs≤t (As xs + ηs) = Σs≤t As xs + Σs≤t ηs = Σs≤t As xs + η.    (20)
Two naive alternatives are

x̂NIt+1,i = yt,i    (22)

and

x̂NIIt+1,i = yt,i + (yt,i − yt−1,i).    (23)

A more general autoregressive form is

x̂t+1,i = Σs=1..p asi yt−s,i + Σs=0..q Σj=1..K, j≠i asj yt−s,j.    (24)
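The two naive forecasts of Eqs. (22) and (23) can be written directly; the observations below are illustrative:

import numpy as np

y = np.array([100.0, 103.0, 107.0])        # illustrative observations
naive_I = y[-1]                            # Eq. (22): no change
naive_II = y[-1] + (y[-1] - y[-2])         # Eq. (23): no change in the change
print(naive_I, naive_II)                   # 107.0 111.0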
Calibration
Methods of Graduation
London has defined graduation as the process of securing,
from an irregular series of observed values of a continuous
variable, a smooth regular series of values consistent in
a general way with the observed series of values. This
method assumes that some underlying law gave rise to an
irregular sequence of values, that these should be revised,
and that the revised sequence should be taken as a representation of the underlying law. There are two main
classes of graduation methods: the moving weighted
average method and the Whittaker-Henderson method.
The moving weighted average method graduates the observed series by

x̂t = Σs=−m..m as yt+s.    (33)

Since ys = xs + es, this can be written

x̂t = Σs=−m..m as xt+s + Σs=−m..m as et+s,    (34)

so that the graduation error is

eMWAt = Σs=−m..m as et+s.    (35)

Defining

R0² = Σs=−m..m as²,    (39)

it follows, for uncorrelated observation errors of constant variance, that

R0² = Var(x̂t)/Var(yt),    (40)

and, for z-th differences,

Rz² = Var(Δz x̂t)/Var(Δz yt),    (41)

with

Var(Δz x̂t) = σ² Σs (Δz as)².    (42)
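A sketch of the moving weighted average method in Python; the five-term weights below are one standard choice, and any symmetric as summing to one could be substituted:

import numpy as np

# symmetric weights summing to one, so a constant series is left untouched
a = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0

def mwa_graduate(y, a):
    # xhat_t = sum_s a_s * y_{t+s}; values are produced only where the
    # full window fits, i.e., for t = m, ..., n - m - 1
    return np.convolve(y, a[::-1], mode="valid")

rng = np.random.default_rng(0)
y = np.linspace(0.0, 10.0, 60) + rng.normal(0.0, 0.5, 60)
xhat = mwa_graduate(y, a)
R0_sq = float(np.sum(a ** 2))   # Eq. (39): Var(xhat)/Var(y) for white noise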
Whittaker-Henderson Method
According to the Whittaker-Henderson method, graduation should be such that two criteria are taken into account. First, the graduated values should not deviate overly much from the original observed values. This criterion is referred to as fit. A numerical measure of fit might be defined as follows:

Fit = Σs ws (x̂s − ys)².    (44)

Second, the graduated values should progress smoothly. A numerical measure of smoothness is

Smo = Σs (Δz x̂s)²,    (45)

where parameter z establishes the degree of the polynomial (z − 1) inherently being used as a standard of smoothness.
The Whittaker-Henderson method adopts the numerical measures of fit and smoothness defined by Eqs. (44) and (45), and combines them linearly to produce Eq. (46):

G = Fit + λ Smo = Σs ws (x̂s − ys)² + λ Σs (Δz x̂s)².    (46)
The minimization of Eq. (46) is referred to as a Type B Whittaker-Henderson graduation; a Type A graduation is the special case of Eq. (46) in which ws = 1 for all s. When λ = 0, G reduces to Fit, and G is minimized at Fit = 0, which implies x̂s = ys for all s, the no-graduation case. Thus, in general, as λ approaches 0, x̂s approaches ys, and fit is emphasized over smoothness. Conversely, as λ is set very large, the minimization process inherently emphasizes Smo to overcome the influence of the large λ. This, in the limiting case, constrains x̂s toward the least-squares polynomial of degree z − 1, thereby reducing the magnitude of Smo, and securing a least-squares fit. Type A graduation with the linear trend as the standard of smoothness (z = 2) is in macroeconomics better known as the Hodrick-Prescott filter.
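Because the minimand of Eq. (46) is quadratic in the graduated values, it has a closed-form solution, x̂ = (W + λD′D)⁻¹Wy, with D the z-th difference operator; a minimal Python sketch follows (λ = 1600 is the value conventionally used with quarterly data):

import numpy as np

def whittaker_henderson(y, lam, z=2, w=None):
    # minimize sum w_s (xhat_s - y_s)^2 + lam * sum (Delta^z xhat_s)^2, Eq. (46)
    n = len(y)
    W = np.diag(np.ones(n) if w is None else np.asarray(w, dtype=float))
    D = np.diff(np.eye(n), n=z, axis=0)    # (n - z) x n difference operator
    return np.linalg.solve(W + lam * D.T @ D, W @ y)

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=120))        # an irregular observed series
trend = whittaker_henderson(y, lam=1600.0) # Type A graduation with z = 2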
Kalman Filter
A handbook definition of Kalman filtering is a method of
predicting the behavior of a signal xt given observations yt
that are subject to error et, so that yt = Ht xt + et. The term
filtering refers to the removal as much as possible of the
error term et to give a prediction of the true signal xt.
A prerequisite to the application of the Kalman filter is
that the behavior of the system under study be described
by a system equation:
xt+1 = Ft xt + ηt.    (47)
and

e0t|t−1 = x̂t|t−1 − xt.    (49)

The variance-covariance matrix of the measurement errors corresponds to the optimum value of Kt given by Eq. (57); that equation also provides a convenient computational means of updating the variance-covariance matrix of the system variables to take account of the observation made at time t. In accord with Eq. (47), the forecast at time t of the state at time t + 1 is taken to be

x̂t+1|t = Ft x̂t|t.    (59)
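A minimal Python sketch of the recursive predict-and-update cycle shown in Fig. 2, specialized to the scalar case; the gain and variance recursions are the standard Kalman equations, F, H, Q, and R are as in Table I, and all numbers are illustrative:

import numpy as np

def kalman_filter(y, F, H, Q, R, x0, P0):
    # scalar Kalman filter for y_t = H x_t + e_t, x_{t+1} = F x_t + eta_t
    x, P = x0, P0
    filtered = []
    for obs in y:
        K = P * H / (H * P * H + R)      # gain: weight on the new observation
        x = x + K * (obs - H * x)        # updated estimate x_{t|t}
        P = (1 - K * H) * P              # updated error variance
        filtered.append(x)
        x = F * x                        # prediction x_{t+1|t}, as in Eq. (59)
        P = F * P * F + Q                # predicted error variance
    return np.array(filtered)

rng = np.random.default_rng(0)
true_x = 10.0 * 0.95 ** np.arange(50)              # slowly decaying signal
y = true_x + rng.normal(0.0, 1.0, size=50)         # noisy observations
xhat = kalman_filter(y, F=0.95, H=1.0, Q=0.01, R=1.0, x0=y[0], P0=1.0)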
Index Numbers
An index number measures the magnitude of a variable at one point relative to its value at another point. Here, attention is limited to comparisons over time. The values of the variable are compared with the value during a reference base period.
Figure 2 The recursive Kalman filter cycle. Adapted from Welch and Bishop (2002).
C(u, yt) = minq {yt · qt : U(qt) ≥ u}.    (64)

Then the cost of living index is defined as follows:

x0t(u) = C(u, yt) / C(u, y0).    (65)

If U is known, the cost function C can be constructed and thus the cost of living index x0t(u). However, generally, U is not known. Thus x0t(u) is estimated by developing bounds that depend on observable price and quantity data but do not depend on the specific functional form of U. To obtain these bounds, it is assumed that the observed quantity vectors for the two periods, qi (i = t, 0), are solutions to the cost minimization problems; i.e., assume

yi · qi = C(ui, yi),  i = t, 0.    (66)

The Laspeyres and Paasche indexes, x̂L0t and x̂P0t, then bound the cost of living index, with

x̂P0t ≤ x̂L0t.    (68)
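A small Python sketch of the bounding indexes; the prices and quantities are illustrative:

import numpy as np

p0, pt = np.array([1.0, 2.0]), np.array([1.5, 2.2])   # base and current prices
q0, qt = np.array([10.0, 5.0]), np.array([8.0, 6.0])  # base and current quantities

laspeyres = (pt @ q0) / (p0 @ q0)   # cost of the base-period basket
paasche   = (pt @ qt) / (p0 @ qt)   # cost of the current-period basket
# paasche <= true cost-of-living index <= laspeyres under cost minimization
print(paasche, laspeyres)           # 1.26 and 1.30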
Further Reading
Gourieroux, C., and Monfort, A. (1996). Simulation-Based Econometric Methods. Oxford University Press, Oxford.
Haavelmo, T. (1944). The probability approach in econometrics. Econometrica 12(suppl).
Hodrick, R., and Prescott, E. (1997). Postwar U.S. business cycles: An empirical investigation. J. Money Credit Banking 29, 1-16.
Jazairi, N. (1983). Index numbers. In Encyclopedia of Statistical Sciences (S. Kotz and N. Johnson, eds.), Vol. IV, pp. 54-62.
Johnston, J. (1984). Econometric Methods. McGraw-Hill,
Singapore.
London, D. (1985). Graduation: The Revision of Estimates.
ACTEX, Winsted, Connecticut.
Lucas, R. (1976). Econometric policy evaluation: A critique. In The Phillips Curve and Labor Markets (K. Brunner and A. Meltzer, eds.), pp. 19-46. North-Holland, Amsterdam.
O'Connell, P. (1984). Kalman filtering. In Handbook of Applicable Mathematics (W. Ledermann and E. Lloyd, eds.), Vol. VI, pp. 897-938.
Pagan, A. (1987). Three econometric methodologies: A critical appraisal. J. Econ. Surveys 1, 3-24.
Sydenham, P. (1979). Measuring Instruments: Tools of
Knowledge and Control. Peter Peregrinus, Stevenage.
Welch, G., and Bishop, G. (2002). An Introduction to the Kalman Filter. Available on the Internet at http://www.cs.unc.edu
Zellner, A. (1994). Time-series analysis, forecasting and econometric modelling: The structural econometric modelling, time-series analysis (SEMTSA) approach. J. Forecast. 13, 215-233.
Edgeworth, Francis Ysidro

Glossary
calculus of variations The calculus concerned in general
with the choice of a function so as to maximize (or
minimize) the value of a certain integral. An example is
finding a curve that joins two fixed points in a plane with
the minimum distance.
contract curves As introduced by Edgeworth, a mathematical
investigation into the degree to which matters such as utility
functions and boundary conditions restrict the possibility of
exchanges between people.
hedonism An ethical theory of conduct whose criterion is
pleasure. Such happiness, however, is nowadays seen as
that of society rather than that of an individual person.
index numbers Numbers used in the measurement of things
such as relative price levels and the changes in price levels.
They can be calculated over months or even years, attention being focused on their variation in time or space.
subjective probabilities The numerical representation of an individual's own beliefs in the chances of occurrence of events.
utility A number indicating a thing's value, or the numerical representation of an individual's preferences.
Francis Ysidro Edgeworth's writings ranged from comments on Matthew Arnold's interpretation of Butler to
the theory of banking, the behavior of wasps, examination statistics, index numbers, utility, and the theory of
correlation and of the multivariate normal distribution.
Edgeworth made important contributions to the moral
sciences, economics, probability, and statistics. He was,
in the best sense of the word, a scholar.
Genealogy
Francis Ysidro Edgeworth, born on February 8, 1845,
in Edgeworthstown, County Longford, Ireland, was
Edgeworth's Works
Edgeworth's work is neither easily nor comfortably compartmentalized; even Edgeworth himself, in preparing his
Papers Relating to Political Economy for publication, admitted to difficulty in classifying his economics papers
appropriately (he ended up placing them in the following
sections: value-and-distribution, monopoly, money, international trade, taxation, mathematical economics, and reviews). The early work in ethics (and mathematics) flowed
naturally into that in economics and that in turn gave rise
to the work in probability and statistics.
Writers on Edgeworth have usually, at least until relatively recently, made much of the difficulty and obscurity
of his style, qualities that to a large extent have been
responsible for the comparative neglect (which Stigler's 1978 paper did a great deal to remove) of his work. Part of
the problem lies in his terminology and notation. For
example, in considering a random variable X having a normal distribution, with

P(a ≤ X ≤ b) = [1/√(2πσ²)] ∫ab exp[−(x − μ)²/(2σ²)] dx.
Economics
Edgeworth's first contribution to political economy,
Mathematical Psychics: An Essay on the Application of
Mathematics to the Moral Sciences, was published in
1881. Containing the application of mathematical techniques to economics and sociology, this work may be seen
as an extension of the New and Old Methods of Ethics.
Here Edgeworth not only presented, for the first time, his
contract curves, but also declared his intent to attempt
to illustrate the possibility of Mathematical reasoning
without numerical data . . . ; without more precise data
than are afforded by estimates of quantity of pleasure.
He also suggested an analogy between the Principles
of Greatest Happiness, Utilitarian or Egoistic and the
favor of equality contrasted with an approach using probability and discussed the connection between assumptions
of equal frequency and those of equal utility. Edgeworth
was perhaps not altogether successful in what he attempted to do here, although he concluded by writing
There is established then, upon a rough yet sufficiently
sure basis, the calculation both of Probability and Utility;
and the more fundamental of these is Utility.
In his obituary for Edgeworth, Keynes wrote,
[Metretike] is a disappointing volume and not much
worth reading (a judgment with which I know that
Edgeworth himself concurred). Although Keynes may
have been right in his statement, later work by
F. P. Ramsey in the 1920s showed the fecundity of Edgeworth's ideas; however, the simultaneous axiomatization of probability and utility had to wait until L. J. Savage's work in the 1950s. The statistician may
be interested to note that we also find here something
similar to R. A. Fisher's advocacy of the rejection of
a hypothesis if an observation equal to or greater than
some specified value is obtained.
Edgeworth's early thoughts on probability, prompted by John Venn's The Logic of Chance, were published in his The Philosophy of Chance in Mind in 1884. Here
he examines the metaphysical roots rather than the mathematical branches of the science. Probability is described
as importing partial incomplete belief ; also, the object
of the calculus is probability as estimated by statistical
uniformity.
In 1922, in a paper again in Mind and with the same
title as that just discussed, Edgeworth gave his final
thoughts on probabilities, this time in response to
Keynes's A Treatise on Probability. Here he makes the
important observation, often ignored by modern writers,
that there are applications in which the use of a priori probabilities has no connexion with inverse probability.
Here he also discusses some objections to commonly
made assumptions about a priori probability and
shows that these are of little practical importance in
the theory of errors. Once again he notes that the effects
of the a priori probabilities are very often masked by
evidence conveyed by observations when the latter are
sufficiently numerous.
Another of Edgeworth's wide-ranging and important papers is his article on probability published in the 11th edition of the Encyclopædia Britannica. This article, a summary not only of Edgeworth's original work but also of that of other statisticians, covers topics such as the probability of causes and future effects, the measurability of credibility, the binomial distribution, the rule of succession, Buffon's needle problem, the normal law of
error and other laws, regression, and correlation. We
find here too, in his discussion of prior probabilities,
the statement in general the reasoning does not require
the a priori probabilities of the different values to be
Applications
The third volume of McCann's 1996 collection is devoted to
Edgeworths papers on applications in economics, the social sciences, physics, chemistry, biology, education
(mainly to do with examinations, and perhaps sometimes
carried out with his paternal grandfather and aunt in mind),
and psychical research. Even this classification by no means
covers all the topics; there is a paper detailing the behavior
of various species of hymenoptera, based on observations
made over 11 years, and a single paper is concerned with
matters as widespread as bimetallism, bees, and the preponderance of one sex in our nurseries and one party in our
parliaments. We also find in the more applied papers contributions to analysis of variance, stochastic models, multivariate analysis, and (multiple) correlation.
Conclusion
Edgeworth occupied various honorable positions in
learned societies; he was, at various times, president of
Section F (Economics) of the British Association for the
Advancement of Science, president of the Royal Statistical Society, vice-president of the Royal Economic Society,
and a fellow of the British Academy. Edgeworth was the
founding editor of The Economic Journal and essentially
retained this post from March 1891 until his death in 1926.
There is no complete bibliography of Edgeworth's writings. In 1928, Bowley published an annotated bibliography (in itself, no mean feat), with summaries of the papers (and reviews), grouped in a number of classes, of most of Edgeworth's writings on mathematical statistics.
Further lists, with considerable overlap, may be found in
the collections by Edgeworth (1925) and McCann (1996).
What sort of man was Edgeworth? Most of those who
wrote about him after his death did not know him as
a young man. Arthur Bowley, long-time friend and sole
disciple of Edgeworth, became acquainted with him
only in his middle age. Beatrice Webb (then Potter) described him as gentle natured . . . excessively polite . . . diffident, and Keynes, joint editor for some time
with Edgeworth of The Economic Journal, found him
to be modest . . . humorous . . . reserved . . . angular . . .
proud . . . unyielding.
The many facets of Edgeworth's personality were perhaps best summarized by Alfred Marshall, who said Francis is a charming fellow, but you must be careful with Ysidro; nevertheless, I think I should like to have known them both.
Further Reading
Bowley, A. L. (1928). F. Y. Edgeworths Contributions to
Mathematical Statistics. Royal Statistical Society, London.
Edgeworth, F. Y. (1925). Papers Relating to Political Economy.
3 vols. Macmillan, for the Royal Economic Society, London.
Keynes, J. M. (1972). The Collected Writings of John Maynard
Keynes. Vol. X. Essays in Biography. Macmillan, for the
Royal Economic Society, London.
McCann, C. R., Jr. (ed.) (1996). F. Y. Edgeworth: Writings in Probability, Statistics and Economics. 3 vols. Edward Elgar,
Cheltenham, U.K.
Mirowski, P. (ed.) (1994). Edgeworth on Chance, Economic
Hazard, and Statistics. Rowman & Littlefield, Lanham, MD.
Savage, L. J. (1954). The Foundations of Statistics. John Wiley
& Sons, New York.
Stigler, S. M. (1978). Francis Ysidro Edgeworth, statistician.
J. Royal Statist. Soc. 141, 287-322.
Glossary
achievement test A sample of an examinee's behavior,
allowing inferences about the extent to which the examinee
has succeeded in acquiring knowledge or skills in a content
domain in which the examinee has received instruction.
adaptive test A sequential form of individualized testing in
which successive items in the test are chosen based on
psychometric properties and test content, in relation to the
examinee's responses to previous items.
aptitude test A sample of an examinee's behavior; intended
to allow inferences about how successful the examinee will
be in acquiring skills.
certification Voluntary process by which examinees demonstrate some level of knowledge or skill in a specified area,
typically an occupational area.
constructed response item An exercise for which examinees
must create their own responses or products, rather than
choose a response from a specified set of responses.
criterion-referenced test An assessment instrument that
allows its users to make score interpretations as to how well
an examinee has performed with respect to the content
measured by the test. Such tests often provide an indication
as to where the examinee is with respect to a specified
performance level or cut score. These interpretations can
be distinguished from those that are made in relation to the
performance of other examinees.
formative evaluation Concerned with judgments made
during the administration of an instructional program;
directed toward modifying or otherwise improving the
program or the status of students taking the program before
it is completed.
licensing The granting, usually by a governmental or supervisory agency, of authorization or legal permission to
practice an occupation or profession.
norm-referenced test An assessment instrument that allows
its users to make score interpretations in relation to the
performance of a defined group of examinees, as distinguished from those interpretations made in relation to
a specified performance level.
Introduction
The use of tests is pervasive in the field of education.
Before children attend kindergarten, tests may be
administered to them to assess their readiness to undertake the normal activities of their age group. Tests may
be administered to examinees who have completed
their educational programs and want to engage in
teaching others. Tests may be given to senior citizens
interested in getting their high school equivalency diplomas. However, by far, the bulk of the testing done in the
educational arena is related to the evaluation of student
progress, as when a teacher has completed a unit of instruction and wants to see if the students have sufficiently
mastered the content. There are, however, many other
uses made of tests in the field of education. In the material
that follows, the varieties of tests used in education are
first discussed. Delineation of these varieties helps in the
ensuing discussion of the various uses made of tests in
education. Some attention is then paid to the delivery
modes used with educational tests, given the current
rapid influx of computers into everyday lives around
the world. The final section provides information on
how and where to locate many of the existing tests in
education.
Norm-Referenced and
Criterion-Referenced Tests
Norm-referenced tests are made up of test items that have
the effect of spreading examinees out along a score continuum. An examinee's status can then be determined by
seeing where along this continuum the examinee is located in relation to a well-defined group of examinees.
Many of the commercially available achievement tests
given in elementary/secondary schools are of this nature.
Criterion-referenced tests, on the other hand, are not
used to compare a particular examinee to a well-defined
group of other examinees, but rather to provide an indication of how well an examinee has performed with respect to the content measured by the test. Often such tests
provide an indication as to where the examinee is with
respect to a specified performance level. This requires
that a cut score, or passing score, be established so that the examinee's score can be compared to this cut score, to
judge whether the material being tested has been mastered. With criterion-referenced tests, items are selected
that measure as well and as completely as possible the
domain being tested. However, if the passing score has
been set prior to actual test construction, items may be
selected to be content representative and to separate
students at the passing score.
With the expanded amount of testing now being done
on a statewide level in the United States, and with recent
federal mandates with respect to statewide testing,
a new kind of testing, standards-based testing, is often
described. Standards-based tests are, however, really
just a type of criterion-referenced test. They are used
to determine the extent to which examinees have mastered state content standards, indicating what students
should know or be able to do at different grade levels.
Such tests are not intended to be used to compare the
rank ordering of one examinee with another. However,
normative information is often used for groups of examinees, as when the percentage of proficient students in
a particular school is compared with the percentage in
a district or state. Critical to this sort of testing is the
degree of match of content specifications, and resulting
test content, to the state standards. With many commercially available achievement tests of a norm-referenced
nature, a suitable level of match does not exist, and many
states have consequently engaged in the construction of
standards-based tests particular to the given state.
be used to measure attitudes. A test of high school geometry qualifies as a cognitive test. A test in physical education that measures how adept students are on the parallel
bars qualifies as a psychomotor test. Finally, an activities
questionnaire, which assesses students' preferences for
in-school, out-of-class activities, qualifies as an affective
measure. Typically, tests are said to measure only one of
the cognitive, psychomotor, or affective domains, but
sometimes the lines of demarcation are not completely
clear and sometimes a test can measure more than one of
the domains. For example, a test of the ability to speak
a foreign language can be viewed as both a cognitive and
a psychomotor test.
Psychomotor tests and affective measures typically
have problems not usually associated with cognitive
tests. Psychomotor tests are usually more costly and
time consuming to administer, and because test tasks
are typically scored by human raters, may be less reliable
than cognitive tests are. Affective measures suffer from
the problem that students often select the most socially
acceptable response, rather than the response that best
reflects truth. In addition, depending on the nature of the
affective measure, students or other stakeholders may
view the questions as an invasion of privacy.
Diagnosis
Tests are used extensively to evaluate the special needs of
individual students, both in planning for individualized
educational programs and for determining the need for
special services, such as individualized counseling sessions. For the former use, the tests are typically of the
variety that measures basic academic skills and knowledge
in the cognitive domain. It is clearly better if the tests are
criterion referenced in nature, with a close match between the test content and the basic skills, but sometimes
norm-referenced tests can prove useful. For the latter use
(that is, determining the need for special services), measures employed often assess within the affective domain.
However, intelligence quotient (IQ) tests, which measure
within the cognitive domain, may also be employed. With
these tests, which are typically standardized and administered on an individual examinee basis, special skills are
often needed to interpret the results. Hence, such tests
are typically administered, and the scores interpreted, by
school or clinical psychologists.
Self-Discovery
Placement
Selection
Selection tests measure within the cognitive domain and
are used to select individuals for admission to an institution or special program. Such tests are standardized and
almost always are of a selected response nature. An example is the Scholastic Assessment Test (SAT) I Reasoning Test, which is a test of developed verbal and mathematical skills given to high school juniors and seniors in a group-based setting. Another example is the American College Test (ACT) assessment battery, a set
of standardized achievement tests of cognitive abilities
also given in a group setting. Such tests are norm
referenced in nature and are used to rank examinees
on the cognitive attributes being assessed. The test scores
are typically combined with high school grade point average to predict success in college.
For a variety of reasons, selection testing has been
written about extensively. A major reason is because different groups of examinees receive different average
scores on such tests. For example, in the United States,
African-American and Hispanic examinees tend to score
lower than do Caucasian and Asian-American examinees,
and the issue is whether the differences are construct
relevant, i.e., whether the differences are related to
what is being measured and are not due to extraneous
variables. The other major reason for the attention paid
to these tests has to do with the perceived susceptibility
of these tests to coaching (i.e., short-term instruction
on non-construct-related attributes such as test-taking
skills and strategies). Finally, it should be noted that
scores on such tests are used, albeit on an infrequent
basis, to select younger students for gifted and talented
programs. At issue with certain of these programs is
whether this constitutes an appropriate use of the test
score.
The distinction between linear and adaptive testing
comes to the forefront with selection testing because
two prevalent existing admissions testing programs (in
this case, for admission to graduate or professional school)
are computer adaptive in nature. These tests are the Graduate Record Examinations (GRE) General Test and the
Graduate Management Admission Test (GMAT).
Monitoring Trends
Monitoring tests are used to evaluate group performance
over time. Such monitoring can be done using either
a cross-sectional approach, whereby the performances
of successive cohorts of students (i.e., fourth graders in
two successive years) are monitored, or longitudinally,
whereby the same cohort's performance is measured
over time (i.e., fourth graders in year one and the same
group of students, now fifth graders, in year two). A good
example of such a test in the United States is the National
Assessment of Educational Progress, which makes use of
a cross-sectional approach.
Program Evaluation
Program evaluation tests have been implemented to see
how well a particular program has been progressing or
functioning. At the district level, for instance, such a test
would be used to see how well students in the schools in
that district are functioning with respect to state curriculum standards. Typically, cut points are set on such
exams, and students are separated into descriptive categories based on their scores, such as proficient or partly
proficient. The grouped results of fourth grade tests in
math and language arts considered in successive years can
provide a macro-level indicator of how well a particular
district has been performing. When tests are used for
program evaluation purposes, strong sanctions are typically not imposed for schools or districts that may not have
met expectations.
Accountability
Accountability testing in education is carried out with the
express purpose of holding schools, districts, and states
accountable for student performance. Such testing
programs set explicit performance standards, often with
some form of reward for those schools or districts in states
that meet standards, and sanctions for those that do not
meet standards. In the spotlight in the United States is
testing of this sort related to the No Child Left Behind
Act, which provides a federal-level imperative that states
must meet certain specified performance standards to continue to receive federal funding (e.g., the states must
Sources of Information on
Existing Educational Tests
For most educational testing contexts, one or several commercially available tests can likely be considered. One exception is in classroom testing for the management of instruction; a teacher-made test is often preferable in
this case, if only to ensure an exact match between
what has been taught and what is being tested. Even in
this context, however, a commercially available test or an
item pool from which the teacher can build the test may
often prove to be sufficient. There are a number of good
sources for finding out about existing tests and for getting
additional information on any of the tests that may prove
to be of interest. For those individuals interested in
searching or checking online, a good place to start is at
Further Reading
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for Educational and
Psychological Testing. American Psychological Association,
Washington, D.C.
Elaboration
Carol S. Aneshensel
University of California, Los Angeles, California, USA
Glossary
extraneous variable Source of spurious covariation between
two other variables; also known as a control variable.
focal relationship The one cause-and-effect type of relationship that is pivotal to the theory being tested.
internal validity The extent to which cause-and-effect-type
inferences can be drawn.
intervening variable Operationalizes the causal mechanism
by linking an independent variable to a dependent variable; a consequence of the independent variable and
a determinant of the dependent variable.
redundancy Covariation due to correlated independent
variables.
spuriousness Two variables appear to covary because they
depend on a common cause.
suppressor variable Conceals a true relationship or makes
it appear weaker.
test factor Variable added to a bivariate analysis to clarify
the connection between the other two variables.
[Path diagrams: the test factor in relation to the independent and dependent variables.]
Figure 3 The test factor as an antecedent variable to an independent variable and a dependent variable. Reprinted from Aneshensel (2002), with permission.
Revised Multivariate
Elaboration Model
The original development and most textbook discussions of the elaboration model limit its application to cross-tabular analytic techniques for simple two-variable models stratified on the third variable. Although this
approach serves an expository purpose, it is of limited
value to research applications that employ numerous
variables in one simultaneous multivariate model.
A revised elaboration model has recently been presented,
with application to multivariate analysis. This revision
starts by introducing the concept of the focal relationship,
which is implicit in the original version of the model. The
focal relationship is the one relationship that is pivotal to
the theory being tested. It serves as an anchor for the
remainder of the elaboration analysis, which is dedicated
to evaluating its internal validity. In addition, the revised
model organizes the various test factors of the elaboration
model into two strategies for establishing internal validity:
an exclusionary strategy that rules out alternative explanations, and an inclusive strategy that connects the focal
relationship to a theory-based causal nexus.
The focal relationship is the one relationship of primary significance to the theory being tested, the linchpin
that holds the remainder of the model in place. It has
Happenstance
The first possibility to be ruled out is that the two variables
have been observed to co-occur simply by chance. This
possibility is usually addressed with tests of statistical significance, although some researchers use confidence intervals instead. In many applications, this is a simple regression of the following form:

\hat{Y} = c + b_f X_f,  (1)

where \hat{Y} is the predicted value of the focal dependent variable and X_f is the focal independent variable.
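By way of illustration only (this sketch is not from the article), the happenstance check of Eq. (1) can be run as an ordinary least-squares regression; Python with the statsmodels package and simulated data are assumed here:

```python
# Minimal sketch of the happenstance check in Eq. (1). Variable names
# (Xf, Y) follow the article's notation; the data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
Xf = rng.normal(size=n)                # focal independent variable
Y = 0.4 * Xf + rng.normal(size=n)      # focal dependent variable

X = sm.add_constant(Xf)                # adds the intercept c
fit = sm.OLS(Y, X).fit()
print(fit.params)                      # estimates of c and b_f
print(fit.pvalues[1])                  # significance test for b_f
print(fit.conf_int()[1])               # confidence interval for b_f
```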
Spuriousness
As already mentioned, with spuriousness, the values of the
two focal variables coincide because they share a common
cause. When a spurious third variable is added to the
analysis, the covariation between the focal independent
and dependent variables is partly or completely eliminated. Third variables that generate spuriousness are
referred to as control variables. For a control variable to
produce spuriousness, it must be related to both the focal
independent variable and the focal dependent variable. If
either of these conditions is not met, the variable cannot
be a source of spurious covariation. To test for spuriousness, one or more control variables (Xc) are added to the
simple regression that was given by Eq. (1):

\hat{Y} = c + b_f X_f + b_c X_c.  (2)
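A hedged sketch of the spuriousness pattern, again with simulated data: here Xc is constructed as a common cause, so the bivariate b_f is clearly nonzero but collapses once Xc enters the model.

```python
# Illustrative sketch of the spuriousness check in Eq. (2): Xc is a common
# cause of both focal variables, so Xf and Y covary even though Xf has no
# effect on Y. Simulated data; not from the article.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
Xc = rng.normal(size=n)                  # common cause (control variable)
Xf = 0.7 * Xc + rng.normal(size=n)       # focal independent variable
Y = 0.7 * Xc + rng.normal(size=n)        # Y depends on Xc, not on Xf

bivariate = sm.OLS(Y, sm.add_constant(Xf)).fit()
controlled = sm.OLS(Y, sm.add_constant(np.column_stack([Xf, Xc]))).fit()
print(bivariate.params[1])    # spurious b_f, clearly nonzero
print(controlled.params[1])   # b_f collapses toward zero once Xc is added
```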
Redundancy
Third variables that generate redundancy can be referred to simply as other independent variables, although
in the language of the original elaboration model, these
are labeled conjoint. In a redundant association, the
other independent variable covaries with the focal independent variable; it also influences the dependent variable. Consequently, the focal independent variable
appears to be related to the focal dependent variable
when this variable is not included in the analysis. As
with control variables, other independent variables
must be associated with both focal independent and
dependent variables to produce redundancy. If these conditions are not both met, then the association between the
two focal variables cannot be due to redundancy. Redundancy is the result of correlated independent variables.
To assess redundancy, the model presented in Eq. (2) is expanded by the addition of another independent variable (Xi):

\hat{Y} = c + b_f X_f + b_c X_c + b_i X_i.  (3)
Antecedent Variables
Analysis of the antecedent variable is isomorphic to analysis of control and other independent variables. It logically
follows the exclusionary analysis, but often these variables
are considered in one simultaneous step with control and
other independent variables. The cumulative regression is

\hat{Y} = c + b_f X_f + b_c X_c + b_i X_i + b_a X_a.  (4)
Intervening Variables
Intervening variables are, by far, the most important of the
inclusive set of variables, because they specify the causal
mechanisms that generate the observed focal relationship. The intervening variable (Xv) is added to the cumulative Y-regression:

\hat{Y} = c + b_f X_f + b_c X_c + b_i X_i + b_a X_a + b_v X_v.  (5)
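The same mechanics can illustrate the intervening-variable step. Note that, statistically, attenuation by a mediator looks just like attenuation by a common cause; the causal ordering must come from theory, as the article emphasizes. A sketch with simulated data:

```python
# Sketch of the inclusive step for an intervening variable: Xv transmits
# the effect of Xf to Y, so adding Xv to the regression attenuates b_f.
# Simulated data; names follow the article's notation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
Xf = rng.normal(size=n)
Xv = 0.8 * Xf + rng.normal(size=n)    # intervening variable (mechanism)
Y = 0.8 * Xv + rng.normal(size=n)     # Xf affects Y only through Xv

without_v = sm.OLS(Y, sm.add_constant(Xf)).fit()
with_v = sm.OLS(Y, sm.add_constant(np.column_stack([Xf, Xv]))).fit()
print(without_v.params[1])   # total effect of Xf
print(with_v.params[1])      # direct effect shrinks toward zero
```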
Consequent Variables
Consequent variables were not part of the original elaboration model, but have been added to the inclusive strategy because they perform a function isomorphic to that of
antecedent variables. Whereas antecedent variables
extend the causal sequence backward to the conditions
producing the focal independent variable, consequent
variables extend the causal sequence forward to the conditions that follow from the focal dependent variable.
Consequent variables help to establish the validity of
the focal relationship by demonstrating that Y produces
effects anticipated by the theoretical model. Consequent
variables logically succeed the dependent variable and
clarify the continuation of the causal sequence within
which the focal relationship is embedded. Like the antecedent variable, the consequent variable does not alter the
focal relationship but rather enhances our understanding
of it. The consequent variable functions as a dependent
variable and is therefore not part of the cumulative
regression previously developed. Instead, it requires an
additional regression equation in which the consequent
variable is the dependent variable, and the focal dependent variable acts as an independent variable, e.g.,
\hat{Y}_c = c + b Y_f.
Elaboration: Specification
Thus far, the focal relationship has been discussed as if it is
universal, operating at all times and for all persons. Some
relationships, however, are found only under some conditions and other relationships affect only some types of
persons. Therefore, the final component of the elaboration model concerns specification of the focal relationship, that is, examining the possibility that the focal relationship is conditional. A conditional relationship
varies across the values of the test factor. It may be present
for only some values, and otherwise absent, for example,
or positive in sign for a range of values and negative for others. Conditional relationships are tested by adding a product term to the cumulative regression of Eq. (5):

\hat{Y} = c + b_f X_f + b_c X_c + b_i X_i + b_a X_a + b_v X_v + b_x X_f X_v,  (6)

where bx represents the conditional nature of the
association. Suppose that Xv represents membership in
a subgroup of the population (say, gender), and it is
coded 1 for females and 0 for males. Under these
circumstances, bf is the impact of Xf among males and the corresponding value is bf + bx for females. If bx is
significantly different from 0, it means that the focal
relationship is conditional on gender. In other words,
the analysis has specified the focal relationship for
females versus males. Identifying particular circumstances and specific subgroups that modify the focal
relationship is the final step in testing how well the data
align with theoretical expectations.
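A sketch of this specification test with simulated data, using the article's gender coding (1 = female, 0 = male); the data and coefficient values are invented for illustration:

```python
# Sketch of the specification step (Eq. 6): a product term Xf * Xv tests
# whether the focal relationship is conditional on group membership.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
Xf = rng.normal(size=n)
female = rng.integers(0, 2, size=n)        # Xv: 1 = female, 0 = male
Y = 0.3 * Xf + 0.5 * Xf * female + rng.normal(size=n)

design = sm.add_constant(np.column_stack([Xf, female, Xf * female]))
fit = sm.OLS(Y, design).fit()
b_f, b_g, b_x = fit.params[1:]
print(b_f)             # effect of Xf among males
print(b_f + b_x)       # effect of Xf among females
print(fit.pvalues[3])  # is the focal relationship conditional on gender?
```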
[Figure 4. The exclusionary strategy (control variables and other independent variables) and the inclusive strategy (antecedent, intervening, and consequent variables) arrayed around the focal relationship between the independent and dependent variables.]
Summary
To recap, the elaboration model is a method of analysis for
drawing causal inferences from correlational data by systematically adding test factors or third variables to the
analysis of a bivariate association. In multivariate analysis,
two strategies are used to assess the internal validity of the
focal relationship, as illustrated in Fig. 4. In the exclusionary phase of analysis, covariation that is accounted for by
control variables and other independent variables is indicative of spuriousness and redundancy, respectively,
whereas residual covariation may be indicative of relatedness. If no residual covariation remains at the end of
the exclusionary analysis, there is no need to proceed to
the inclusive strategy because there is no relationship
to be elaborated. Otherwise, the analysis turns to
a consideration of the causal system within which the
focal relationship is embedded. The inclusive strategy
elaborates the meaning of the focal relationship. When
expected patterns of association with the focal relationship do indeed materialize, the inference of relatedness is
supported. This analysis entails antecedent, intervening,
and consequent variables. This is an inclusive approach to
Further Reading
Aneshensel, C. S. (2002). Theory-Based Data Analysis for the
Social Sciences. Pine Forge Press, Thousand Oaks, CA.
Clogg, C. C., Petkova, E., and Haritou, A. (1995). Statistical
methods for comparing regression coefficients between
models. Am. J. Sociol. 100, 1261-1293.
Converse, J. M. (1987). Survey Research in the United States:
Roots and Emergence 1890-1960. University of California
Press, Berkeley, CA.
Davis, J. A. (1985). The Logic of Causal Order. Sage, Newbury
Park, CA.
Kenny, D. A. (1979). Correlation and Causality. John Wiley &
Sons, New York.
Rosenberg, M. (1968). The Logic of Survey Analysis. Basic
Books, New York.
Election Polls
Marc D. Weiner
Rutgers University, New Brunswick, New Jersey, USA
Glossary
benchmark survey An election poll conducted early in the
campaign season that (1) provides a starting point from
which a campaign may measure its progress and/or
(2) informs a campaign or media organization about the
public's attitudes, interests, and preferences on issues
relevant in the campaign.
election poll An information-gathering effort that uses survey
research techniques during a political campaign to assess
public opinion on a particular topic or candidate.
exit poll Election day polls sponsored by media organizations,
conducted as in-person interviews predominantly about
vote choice, outside the polling place immediately after the
voter has cast his or her ballot; exit polls are used to inform
election day news coverage, including analysis of how
particular groups are voting, and projections of winners.
horserace journalism The practice of news reporting predominantly or exclusively on which candidate is winning, as
opposed to reporting on the candidates' positions on issues
in the election.
push poll A form of political telemarketing to disseminate
campaign propaganda under the guise of conducting
a legitimate public opinion poll; its purpose is to push
voters toward a particular candidate or issue position.
tracking poll One in a series of election polls that are used
over the course of a campaign to track changes in support
for a candidate, likely voting patterns, and related items of
electoral interest.
trial heat A preelection poll, or part of a preelection poll, that
forces respondents to choose between candidates in order
to assess the viability of a potential candidate or the
immediate strength of an existing candidate.
Introduction
An election poll is an information-gathering effort that
uses survey research techniques during a political campaign to assess public opinion on a particular topic or
candidate. There is a variety of different procedures for
election polls and different organizations and individuals
can sponsor them for several different purposes. Election
polls are distinguished from other instances of survey
research by their relationship to politics, in both their
timing and their content vis-à-vis a given election. By
permitting political actors and the media to assess public
opinion, these information-gathering exercises can play
an important role in democratic elections.
Whereas the concept and purposes of election polls
are essentially straightforward, they are operationally
complex because the population to be sampled, a given electorate, exists only on the given election day. This
makes determining and weighting the sample to accurately
reflect the true voting population for the upcoming election both difficult and fraught with political implications.
In addition, the publication of election polls may have an
effect on the composition and behavior of the electorate,
and so in addition to the operational complications, there
are normative political and ethical considerations.
exit-poll sample very accurately reflects the true electorate, in terms of both demographic and partisan composition as well as vote choice. However, media outlets that
report winner projections on the basis of exit polls have
been the subject of bitter criticism on the basis that publishing known election outcomes in real time affects electoral results in voting jurisdictions in later time zones
where the polls have not yet closed.
In the American national context, coordinating the conduct, aggregation, and reporting of in-person interviews
over a 1-day period in a set of voting precincts sufficient to
represent all voting jurisdictions is a logistically enormous
and extraordinarily expensive undertaking. As a result,
since 1990 the major media networks and large news organizations have banded together to form an exit-polling
consortium to conduct nationwide exit polling and winner
projections. This type of consortium takes advantage of the
economy of scale, in terms of both the aggregation of
information and the minimization of costs. For the 1996
presidential election, the consortium conducted approximately 150,000 in-person interviews in 1 day at over 1500
voting precincts across the country. The newest incarnation of the exit-polling consortium, the National Election Pool, was formed in 2003 to replace Voter News Service,
the organization that performed the function from 1994
through 2002. Voter News Service, formed in 1994 as an
extension of a similar consortium, Voter Research and
Surveys, was disbanded in 2003 after failing to produce
accurate data on the 2002 midterm election and providing
flawed information on the 2000 presidential election.
Its performance on the 2000 presidential election was
particularly poor, most notably as to the news networks' predictions of the Florida results.
Election polls can also be characterized on the basis of
their method. There are four well-developed basic methods of conducting survey research: person-to-person
telephone interviews, written interviews by mail, personal
interviews, and person-to-machine telephone interviews
(also known as interactive voice response interviews).
Since telephone interviewing is the quickest and most
immediate method, most preelection polls are conducted
that way, whether sponsored by political actors or media
organizations. On-line Web interviews show promising
potential, but there are still concerns about respondent
accessibility as well as ensuring representative samples.
A comprehensive typology must also consider what is
not an election poll, such as focus groups, push polls, and
the use of simulated polls to market goods or services or
to raise funds for political and nonprofit organizations.
Although focus groups are legitimate research techniques, push polls, marketing and solicitation, and fundraising are not and often serve to undermine the
legitimacy of the polling industry.
Because they are often used in a campaign setting,
focus groups are sometimes considered a form of election
783
polling. The focus group is a qualitative research technique in which a small group of people is brought together
with a facilitator who stimulates and guides otherwise
informal discussion around a given set of topics. The content of the discussion is in the nature of a group interview,
can provide in-depth insight into issues, and can be used
to supplement quantitative research with a level of detail
that could not be obtained through conventional survey
interviewing. Focus groups cannot, however, be considered surveys because they do not employ probability sampling to permit inferences to be made to the larger
population. Although the participants are sometimes selected on the basis of demographic characteristics, because of the size of the group, typically ranging from 8 to 20, the results are not generalizable. Focus groups are,
however, useful to campaigns to permit the pretesting of
themes and approaches and to explore how the conversation about issues takes place among potential voters.
Similarly, focus groups can help both campaign and
media pollsters to determine the best content for an election survey and to pretest survey instruments.
Another phenomenon often appearing during election
campaigns is the push poll. The push poll, however, is
not a legitimate poll but rather a campaign technique
designed to affect, rather than measure, public opinion
with regard to a candidate or issue. This technique has
been referred to as both a pseudo-poll and a negative persuasion strategy. The American Association of Public
Opinion Researchers described push polling as "a form of political telemarketing [designed] to disseminate campaign propaganda under the guise of conducting a legitimate public opinion poll." A push poll appears
to be something that it is not; as such, it is an unethical
practice.
In a typical push poll operation, a campaign approaches potential voters in a manner that simulates
a legitimate public opinion poll and provides negative
information about an opposing candidate or issue. The
push poll sponsor then asks opinion and preference questions using wording designed to push respondents away
from the opposition and toward the sponsor's position or
candidate. Although push polling has been used by political parties, individual candidate organizations, and interest groups, it is an insidious activity that undermines
the public's confidence in both the polling industry
and the electoral process. As a result, several state legislatures, as well as the House of Representatives, have
entertained bills to ban push polling. Constitutional
free speech considerations, however, have prevented
the adoption of such legislation.
Much in the same way that sponsors of push polling use
the legitimacy of the polling industry to sway public opinion, marketing organizations may use the appearance of
a public opinion poll to sell products and services, and
political parties, candidates, and special-interest groups
for Congress and other lower offices in the states that are
not yet finished voting. Supporters of the projected presidential loser's party may decide to stay home when they
hear the projections, causing candidates of that party in
the other races on the ticket to lose the coattail support
they would otherwise have had.
In the American context, the possibility of legislation to
prevent the publication of any polls, even if laudable
to protect the electoral process from these effects, is extremely limited by the First Amendment's guarantees
against restrictions on freedom of the press and freedom
of speech. Generally, the type of legislation proposed either makes it more difficult to conduct the exit interviews themselves or temporally restricts the publication of the
results. In the international context, according to a late
1990s survey by the European Society for Opinion and Marketing Research and the World Association for Public Opinion Research, 30 of 78 countries have embargoes
or moratoriums on the publication of election polls.
Typically, these types of restrictions take the form of
publication bans during a specified period prior to the
election, for example, 7 days in France or 3 in Canada
(ESOMAR, 2001).
The second concern speaks to the quality of the democratic dialogue between the candidates. The operative
assumption under this concern is that political parties will
alter the content of their platforms and that candidates
will alter which issues they address and/or the position
they will take on those issues simply to curry favor with the
electorate. Ultimately, an assessment of this concern must
speak to complex normative issues of political representation, i.e., whether or how in a representative democracy
a political leader should be affected by public opinion.
Further Reading
American Association of Public Opinion Research. (AAPOR)
(1997). Best Practices for Survey and Public Opinion
Research and Survey Practices AAPOR Condemns. University of Michigan Press, Ann Arbor, MI.
Asher, H. (2001). Polling and the Public: What Every Citizen
Should Know, 5th Ed. CQ Press, Washington, DC.
Foundation for Information. (2001). Who's Afraid of Election
Polls? Normative and Empirical Arguments for the Freedom of Pre-Election Surveys. European Society for Opinion
and Marketing Research, Amsterdam.
Traugott, M. W., and Lavrakas, P. J. (2000). The Voter's Guide
to Election Polls, 2nd Ed. Chatham House, New York.
Traugott, M. W., and Kang, M. (2000). Push Polls as Negative
Persuasion Strategies. In Election Polls, the News Media,
and Democracy (P. Lavrakas and M. Traugott, eds.),
pp. 281-300. Chatham House, New York.
Glossary
confidence level The likelihood that a sample's results will
fall within the statistical margin for error of a given sample
size. Most large, national survey samples are calculated so
that they have a margin for error at plus or minus
3 percentage points at the 95% confidence level, meaning
that the results normally would vary within plus or minus
three percentage points 19 out of 20 times.
hypothesis testing A systematic attempt to verify research
assumptions by testing a hypothesis against the null
hypothesis.
measurement error The error introduced into survey results
that comes from question wording, question order, faulty
sampling design, or other logistics. Any error that is not
a statistical sampling error counts as measurement error.
null hypothesis A statement that is the opposite of the
research hypothesis. In hypothesis testing, the researcher
attempts to disprove the null hypothesis before proceeding
to verify the research hypothesis.
population The entire body of units being studied; the group from which the sample is drawn. For example,
national presidential election polls attempt to choose
a sample of voters from the population of all those who
will vote in the election.
push poll A persuasive, usually negative, campaign message
delivered over the phone and disguised as a polling call. Its
purpose is to push voters' preferences away from one
candidate and toward another.
random digit dialing A technique used by researchers to
ensure that unlisted phone numbers as well as listed
numbers are included in the sample of voters called.
randomness Accurate polling requires a random sample of
voters. Random sampling requires that every member of
the population being studied (for example, voters in a given
jurisdiction) should have an equal chance of being chosen
for the sample.
refusal rate The percentage of people who refuse to
participate in a survey.
sample A systematically selected group of the population
being studied.
that only the Democratic and Republican candidates represent meaningful choices at the ballot box.
message is disguised as a polling call, is downright dishonest. The traditional, large-sample preelection poll,
however, is based on sound social science and statistical
theory. Despite its flaws, the modern telephone poll is the
best, and cheapest, way to get a snapshot of public opinion
in a dynamic election campaign.
Improving Reporting of
Margin for Error
There might be less confusion about the accuracy (or
relative lack of accuracy) in election polls if news organizations more precisely explained how margin for error is
to be interpreted. If, like the New York Times, they explained that each percentage in a two-person race varies
by plus or minus X percentage points, perhaps it would
help news consumers have a more realistic notion of what
polls can and cannot do. One of the problems in this
misinterpretation is the tendency of news organizations
to report that a lead is statistically significant if the
percentage-point difference between the two candidates' levels of support is greater than the statistical margin for
error, given its particular sample size. As in the example
previously given, a 4-point lead in a sample with a 3-point
margin for error does not represent a statistically significant lead.
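A back-of-envelope calculation, assuming simple random sampling, makes the point concrete; the sample size and vote shares below are invented for illustration:

```python
# Why a 4-point lead can fail to be significant with a "3-point margin
# for error": the 3-point margin applies to each candidate's share, and
# the margin on the *difference* between two shares is roughly twice as
# large.
import math

n = 1000                      # sample size giving roughly a 3-point margin
p1, p2 = 0.52, 0.48           # a 4-point lead in a two-person race
z = 1.96                      # 95% confidence level

moe_share = z * math.sqrt(p1 * (1 - p1) / n)
se_diff = math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)
moe_diff = z * se_diff

print(round(100 * moe_share, 1))   # ~3.1 points per candidate
print(round(100 * moe_diff, 1))    # ~6.2 points on the lead itself
print((p1 - p2) > moe_diff)        # False: the 4-point lead is not significant
```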
Further Reading
Asher, H. (2001). Polling and the Public: What Every Citizen
Should Know. CQ Press, Washington, D.C.
Shively, W. P. (1997). The Craft of Political Research. Prentice
Hall, Saddle River, New Jersey.
Traugott, M. W. (2001). Trends: Assessing poll performance in
the 2000 campaign. Public Opin. Q. 65(3), 389-419.
Traugott, M. W., and Lavrakas, P. J. (2000). The Voter's Guide
to Election Polls. Chatham House Publishers, New York.
Van Evera, S. (1997). Guide to Methods for Students of Political
Science. Cornell University Press, Ithaca, New York.
Epidemiology
William W. Dressler
University of Alabama, Tuscaloosa, Alabama, USA
Glossary
buffering effect A moderating effect in which some
factor reduces the impact of a risk factor on a health
outcome.
cultural constructivism A theoretical orientation in social
science emphasizing the study of subject-defined meaning
of events and circumstances.
generalized susceptibility A condition of heightened risk
of falling ill.
qualitative methods Nonnumeric methods of observation
for the description and analysis of meaning and patterns of
association among concepts.
quantitative methods Numeric methods of observation for
the description and analysis of patterns of association
among variables.
relative risk The probability of disease among a group of
people exposed to a risk factor divided by the probability
of disease among people not exposed to that risk factor.
social epidemiology The study of the distribution of disease
and health-related issues in relation to social factors.
social inequality Differential access to resources within
a society, based on membership in one of several ranked
social statuses.
social integration The degree to which an individual is
linked to others by a variety of social relationships.
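As a minimal illustration of the relative-risk definition given in the glossary above (the counts are invented):

```python
# Relative risk: probability of disease among the exposed divided by the
# probability among the unexposed.
def relative_risk(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total):
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# e.g., 30 cases among 1000 exposed vs. 10 cases among 1000 unexposed
print(relative_risk(30, 1000, 10, 1000))  # 3.0: exposure triples the risk
```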
Epidemiology: An Overview
The conceptual framework for epidemiologic research
grew out of its original focus on infectious disease. In
that framework, there are three important elements:
(1) the agent, or the virus, bacteria, or parasite responsible
for the disease; (2) the host, or the person infected; and
(3) the environment or setting in which the agent can
flourish and encounter the host. Because of this broad
perspective on people in their environments, some writers
like to refer to epidemiology as a part of a larger field of
medical ecology. This ecological perspective has been
very useful for locating the production of disease in
human-environment transactions.
The nature of the environment, of disease-producing
agents, and of person-environment transactions has
come to be viewed quite differently in the convergence
Social Epidemiology
Social Inequality and Health
There is probably no more widely replicated finding in
social epidemiology than the inverse association of socioeconomic status and health outcomes. Across a wide
variety of acute, chronic, and traumatic health outcomes,
those lower in the socioeconomic hierarchy are at higher
risk than those in higher strata. This association has
been replicated widely in western European and North
American societies. There is some evidence that there is
a direct association between socioeconomic status and
disease risk in developing societies, although as a nation
develops economically the association becomes inverse.
The pattern of the association between socioeconomic
status and health outcomes differs only slightly using
different measures of socioeconomic status. In general,
the same results are obtained using occupational class,
income, or education as a measure of socioeconomic
status.
The typical pattern can be illustrated by data collected
on the relationship between occupational class and coronary heart disease mortality for men in the industrial
state of Sao Paulo, Brazil, by Duncan and colleagues
in 1995. Men who are unskilled laborers die at a rate of
57.8/100,000 population from coronary artery disease. For
men in semiskilled and skilled occupations the rate declines sharply to 14.2/100,000. Men in service and professional occupations have death rates of 8.0/100,000
population.
There are several basic issues to be noted about these
results. First, in virtually any epidemiologic analysis, the
inclusion of control variables or covariates is essential.
There are differences in disease rates between genders
and age groups, associations which, under some circumstances, can be of social epidemiologic interest in and of
themselves. In many circumstances, however, these are
confounding variables (or variables that limit the analyst's
ability to discern patterns if those variables are left uncontrolled) to be removed from the analysis. Therefore, in
these data, gender is controlled for by including data only
for men, and the data have been adjusted for differences
among men in the age of death.
Second, there is a gradient evident in the relationship.
Whereas there is a sharp decline from unskilled to skilled
occupations, the decline continues from skilled to service and professional occupations. Epidemiologists traditionally refer to this kind of gradient as a dose-response
relationship (i.e., varying doses of a factor lead to corresponding differences in the outcome), which is often considered to be stronger evidence of a causal relationship.
Third, and related to the previous point, these data are
cross-sectional. That is, data were collected from death
certificates, indicating the cause of death and the person's
Implications
As S. V. Kasl once noted, epidemiologic findings tend to
be quite reliable, but opaque. That is, given the methodological emphases in epidemiology, great care is usually
taken to deal with issues of confounding, temporal ambiguity in associations and certain kinds of measurement
error (especially having to do with public data sources
such as death certificates), among others. At the same
time, given that large-scale surveys are often undertaken
that require long-term follow-ups, the nature of the data
concept here, host resistance, is a brilliant rhetorical appeal to ideas fundamental to basic infectious-disease epidemiology. In the classic triad of agent-host-environment, the interaction of the agent and host in
a particular environment does not automatically lead to
the illness of the host; rather, there are factors (e.g., nutritional status) that enable the host to resist the invasion
of a pathogen. Cassel drew the analogy with human social
life. Individuals are exposed to certain kinds of stressful
events and circumstances, often as a result of being disadvantaged in the socioeconomic hierarchy. There are,
however, resources that individuals can bring to bear to
withstand these psychosocially noxious events and circumstances, resources that are social in nature. Higher
social integration can provide individuals a greater opportunity to obtain the help and assistance of others, as well as
their emotional support, in times of crisis. This support, in
Cassel's term, "buffers" the effects of social stressors on the risk of disease. By "buffering," Cassel means that under
conditions of low support from others increased exposure
to stressors is associated with a substantial increase in the
risk of disease, but that under conditions of high social
support increasing exposure to stressors has little association with health outcomes. People with greater access to
social support are thus protected from the harmful effects
of stressors.
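Cassel's buffering pattern amounts to a stressor-by-support interaction. A sketch in logistic form, with coefficients invented solely to show the shape of the effect (this is not an analysis from the literature):

```python
# Buffering as an interaction: stressors raise disease risk sharply when
# support is low and hardly at all when support is high.
import math

def disease_prob(stressors, support,
                 b0=-3.0, b_stress=1.0, b_support=-0.2, b_buffer=-0.9):
    logit = (b0 + b_stress * stressors + b_support * support
             + b_buffer * stressors * support)
    return 1 / (1 + math.exp(-logit))

for support in (0, 1):                  # 0 = low, 1 = high social support
    low, high = disease_prob(0, support), disease_prob(2, support)
    print(support, round(low, 3), round(high, 3))
# With low support, risk climbs steeply as stressors increase;
# with high support, the same increase in stressors barely moves risk.
```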
There was also a second component to Cassel's argument that was crucial to the development of research on
social factors and health. That was his suggestion that
social factors were not specific in their etiologic effects
but rather created a state of generalized susceptibility to
disease. Cassel based his argument on the observation
that social inequality and social integration are associated
with a wide range of disease outcomes, not with a few or
even a category of health outcomes. Therefore, research
on social processes can take any of a number of health
outcomes as its object of study because of the creation of
a state of generalized susceptibility.
It is difficult to overestimate the influence that this
model of the stress process has had on research on social
factors and disease risk. Also, in true epidemiological
fashion, this research depends in an important way on
a complex linkage of research literatures. To use an analogy, explaining the association of elevated serum cholesterol with coronary artery disease mortality does not rest
solely on the epidemiologic observation of association, but also
requires evidence from pathological studies of victims of
heart disease showing the buildup of plaque on the walls
of the main coronary arteries. Similarly, studies of the
stress process do not rest solely on the observation of
associations in population studies, but depend in an important way on laboratory research on psychophysiology
and psychosocial processes, as well as basic social research. But community-based research is fundamental
in examining the distribution of health outcomes.
The impact of social factors on the expression of psychological distress could be taken as an indicator of the importance of those factors in all areas, under the
assumption of generalized susceptibility.
The shift in the definition of health outcomes is evident
in other areas as well. Although many researchers in the
area of cardiovascular disease continue to use clinical
definitions as outcome variables, substantial contributions
have been made by social scientists using the direct measurement of blood pressure as an outcome variable. Blood
pressure is particularly suited to this kind of research
because it is conveniently measured and has a wide continuous distribution in human populations. Many investigators working cross-culturally have employed blood
pressure as an outcome variable for this reason.
Another example of shifting the definition of health
outcomes can be seen in the 2001 study of low birth
weight done by Oths, Dunn, and Palmer. These investigators examined the contribution of job stressors to the
risk of low birth weight in a sample of predominantly low-income women in the rural South. Due to the time-limited
nature of the process (i.e., 9 months) the investigators
were able to use a prospective research design, measuring
job stressors at the beginning and in the third trimester of
pregnancy, and then obtaining birth weights after delivery. Even with a sample of 500 women, however, statistical power was low when treating the outcome as
a conventional clinical assessment (i.e., less than 2500
grams at birth). Using the direct measurement of birth
weight as a continuous variable, however, enabled the
investigators to detect an effect of job-stressor exposure
during pregnancy on birth weight (an effect of 190 grams
or over 7 ounces of birth weight, after controlling for
known covariates).
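A rough simulation, with invented distribution parameters, illustrates the power argument behind analyzing birth weight as a continuous rather than dichotomized outcome:

```python
# With n = 500, a 190-gram effect on continuous birth weight is easy to
# detect, while dichotomizing at the 2500-gram clinical cutoff discards
# most of the information. All parameters are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 500
exposed = rng.random(n) < 0.3                  # job-stressor exposure
weight = rng.normal(3300, 500, size=n) - 190 * exposed

# Continuous outcome: t test on mean birth weight
t, p_cont = stats.ttest_ind(weight[exposed], weight[~exposed])

# Dichotomized outcome: low birth weight (< 2500 g) yes/no
low = weight < 2500
table = [[np.sum(low & exposed), np.sum(~low & exposed)],
         [np.sum(low & ~exposed), np.sum(~low & ~exposed)]]
chi2, p_dich, *_ = stats.chi2_contingency(table)
print(p_cont, p_dich)   # the continuous test is typically far more sensitive
```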
Such a strategy in measurement can lead to disagreements with researchers who place greater stock in biomedical systems of nosology, but this has been an effective
strategy for increasing the quantity and quality of social
scientific research on health outcomes.
Cultural Dimensions of
Population Health
The epidemiologic transition refers to the shift in patterns
of morbidity and mortality within a population from one
dominated by infectious and parasitic diseases (compounded by nutritional deficiencies) to one dominated
by chronic diseases (compounded by nutritional imbalances and excessive caloric intake). The use of this descriptive term followed decades of cross-cultural research
on health, research that has documented the changing
patterns of health accompanying social and cultural
change.
Many of these observations were based on studies of
blood pressure because (as previously noted) the measurement of blood pressure could be conveniently done
under relatively difficult field conditions. By the 1970s,
literally dozens of studies had accumulated on societies
around the world, ranging from peoples still leading
a hunting and gathering way of life to peoples in the throes of rapid industrialization. In an interesting integration of
social science and epidemiology, Waldron and colleagues
in 1982 combined data on sample blood pressure, body
mass, and age and sex distributions with data from the
Human Relations Area Files, a compendium of sociocultural data on world societies. When examined in relation
to subsistence economy, they showed that there were no
blood pressure differences and no increase of blood pressure with age in societies based on simpler technologies
and economies (i.e., foraging, pastoralism, and intensive
agriculture). When compared to societies practicing extensive agriculture (i.e., plantation societies) and industrialized societies, however, there was a sharp and
dramatic increase in community average blood pressures
(independent of differences in age, sex, and body mass
distributions).
This study provides a useful context for understanding
research on modernization (or acculturation; the terms
are often used interchangeably) and disease. In most of
this work, research has been carried out in which communities have been ordered along a continuum of modernization. Traditional communities are those in which
the local economy is still based on production for local
consumption, there is little formal education, national
languages are not frequently used, the social structure
emphasizes extended family relationships, and there is
little penetration by global supernatural belief systems.
Modern communities are those in which wage-labor has
replaced local production, formal education is present,
the national language has supplanted local dialects, the
Summary
At one level, the convergence of social science research
and epidemiological research is uncontroversial because
both approaches share many things in common. Indeed,
social scientists exposed to epidemiological methods for
the first time may wonder why they would be regarded as
distinctive. At the same time, special features of studying
health and disease have led to a distinctive approach with
an attendant set of concerns in epidemiology that social
scientists will come to appreciate. By the same token,
epidemiologists incorporating social scientific sensibilities into the study of health and disease have open to
them a distinctive way of thinking about the subject matter and a novel set of solutions for research on social
processes of clear relevance.
There are, however, compromises to be made in moving
in either direction. For example, epidemiologists may be
dismayed by the labor-intensive nature of social scientific
measurement and the way that this measurement process
places limits on sample size. Anthropologists (for example)
may be equally dismayed at the thought that something as
subtle and nuanced as a cultural syndrome could be studied
by including a single question on a survey. But by stepping
back from their usual conventions and appreciating the
theoretical insights to be gained by this methodological
rapprochement, researchers in these varied fields will
find much to be gained.
Further Reading
Adler, N. E., Boyce, W. T., Chesney, M., Folkman, R., and
Syme, S. L. (1993). Socioeconomic inequalities in health.
JAMA 269, 3140-3145.
Berkman, L. F., and Kawachi, I. (2000). Social Epidemiology.
Oxford University Press, New York.
Berkman, L. F., and Syme, S. L. (1979). Social networks, host
resistance and mortality. Am. J. Epidemiol. 109, 186-204.
Brown, G. W., and Harris, T. W. (1978). Social Origins of
Depression. Free Press, New York.
Cassel, J. C. (1976). The contribution of the social environment to host resistance. Am. J. Epidemiol. 104, 107-123.
Cohen, S., Kessler, R. C., and Gordon, L. U. (1995).
Measuring Stress. Oxford University Press, New York.
Dressler, W. W. (1994). Cross-cultural differences and
social influences in social support and cardiovascular
disease. In Social Support and Cardiovascular Disease
(S. A. Shumaker and S. M. Czajkowski, eds.), pp. 167-192.
Plenum Publishing, New York.
Dressler, W. W., and Santos, J. E. D. (2000). Social and
cultural dimensions of hypertension in Brazil: A review.
Cadernos de Saúde Pública 16, 303-315.
Duncan, B. B., Rumel, D., Zelmanowicz, A., Mengue, S. S.,
Santos, S. D., and Dalmaz, A. (1995). Social inequality and
mortality in Sao Paulo state, Brazil. Int. J. Epidemiol. 24,
359-365.
Durkheim, E. (1951 [1897]). Suicide (J. Spaulding and
G. Simpson, trans.). Free Press, New York.
Guarnaccia, P. J. (1993). Ataques de nervios in Puerto Rico.
Med. Anthropol. 15, 157-170.
Haan, M. N., Kaplan, G. A., and Camacho, T. (1987). Poverty
and health: Prospective evidence from the Alameda County
study. Am. J. Epidemiol. 125, 989-998.
Janes, C. R. (1990). Migration, Social Change and Health.
Stanford University Press, Stanford, CA.
Marmot, M., Rose, G., Shipley M., and Hamilton, P. J. S.
(1978). Employment grade and coronary heart disease in
British civil servants. J. Epidemiol. Community Health 32,
244-249.
Mirowsky, J., and Ross, C. E. (1989). Psychiatric diagnosis as
reified measurement. J. Health Soc. Behav. 30, 11-25.
Oths, K. S., Dunn, L. L., and Palmer, N. S. (2001).
A prospective study of psychosocial strain and birth
outcomes. Epidemiology 12, 744-746.
Waldron, I., Nowotarski, M., Freimek, M., Henry, J. P.,
Post, N., and Witten, C. (1982). Cross-cultural variation in
blood pressure. Soc. Sci. Med. 16, 419-430.
Wilkinson, R. G. (1996). Unhealthy Societies: The Afflictions
of Inequality. Routledge, London.
Equivalence
Johnny R. J. Fontaine
Ghent University, Ghent, Belgium
Glossary
construct bias Generic term used to refer to the cultural
specificity of the theoretical variable and/or domain underrepresentation.
construct equivalence Generic term used to refer to
functional and/or structural equivalence.
cultural specificity of the theoretical variable Occurs
when a theoretical variable can be used validly only within
a specific cultural context.
domain underrepresentation Occurs when important aspects of the domain that a theoretical variable is assumed to
account for are not represented in the measurement
instrument.
full score equivalence Occurs when scores can be directly compared between cultural groups.
functional equivalence Occurs when the same theoretical
variable accounts for measurement outcomes across
cultural groups.
item bias Occurs when scores on a specific item cannot be
compared across cultural groups.
method bias Occurs when method factors have a differential
impact on measurements across cultural groups, leading to
noncomparability of scores.
metric equivalence Occurs when relative comparisons, for
instance, between experimental conditions, are valid
between cultural groups.
structural equivalence Occurs when the same measurement
instrument forms a valid and sufficient indicator of
a theoretical variable across cultural groups.
Introduction
Intensified intercultural exchanges and mass migrations
leading to multicultural societies have also influenced the
behavioral sciences. These sciences have become more
cross-culturally oriented and the types of societal and individual problems with which practitioners are confronted
an inductive reasoning test contain unfamiliar words, verbal abilities have a systematic, additional impact on the
measurement. Moreover, an instrument cannot give any
information about aspects of the domain that are not represented in the items. For example, the results of an
inductive reasoning test do not give an indication for intellectual functioning in general, since the latter theoretical variable refers to a much broader domain than
represented in the test. Although evidence for the relevance and representativeness of the stimulus material
forms a necessary condition for a valid interpretation of
the test scores, it does not form a sufficient condition;
other unintended factors, such as response styles, may
interfere. As already mentioned, it must be demonstrated
that the selected stimuli elicit the intended behavior in the
specific contexts where they are applied.
Psychometric Modeling
Analysis of the relationships between observed and latent
variables provides a first possibility for investigating
whether the intended psychological phenomena are
elicited by a set of stimuli. In psychometrics (but also
in sociometrics and in econometrics), mathematical and
statistical models have been developed that relate observed item behavior to a position on a latent variable.
By specifying precise relationships between items and the
latent variable, a psychometric model also makes predictions about how the scores on the items should interrelate.
Thus, it can be tested whether a psychometric model
adequately accounts for the observed interrelationships.
If this is the case, the psychometric model allows
estimates of the position of an individual test taker on
the latent variable.
The most common psychometric models use two basic
parameters to describe the relationship between an observed and a latent variable, namely, an association or
weight parameter and a level or intercept parameter.
Since these parameters play an important role further
on in this article when the levels of equivalence and
types of bias are distinguished, these two parameters
are presented in somewhat more detail for two prototypical psychometric models, namely, confirmatory factor
analysis and the two-parameter logistic model.
Confirmatory factor analysis (CFA) is often used with
the measurement of attitudes and personality traits. In
this model, it is assumed that both the observed and
the latent variables are measured at interval level. The
relationship between observed and latent variable can
be represented by a regression model, Xij = ai + bi Yj, where Xij is the expected score on the observed variable i and Yj is the position on the latent variable for subject j.
The association parameter, or weight, bi, indicates the
expected change in the observed score when there is
one unit change in the latent variable. The stronger the weight, the better the observed variable discriminates among positions on the latent variable.
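As an illustration of these two parameters, the sketch below simulates the latent score as if it were observable, which a real confirmatory factor analysis would not do (there, Y is latent and both parameters are estimated jointly by dedicated software); all values are invented:

```python
# The linear measurement model X_ij = a_i + b_i * Y_j: with a known
# (simulated) latent score, the intercept a_i and weight b_i can be
# recovered by simple regression.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
Y = rng.normal(size=n)                   # latent variable (simulated as known)
a_i, b_i = 2.0, 0.8                      # true intercept and weight
X = a_i + b_i * Y + rng.normal(scale=0.5, size=n)

b_hat, a_hat = np.polyfit(Y, X, 1)       # slope first, then intercept
print(round(a_hat, 2), round(b_hat, 2))  # close to 2.0 and 0.8
```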
Nomological Network
A fundamental question in scientific research concerns
the interpretation of latent variable scores in terms of the
intended theoretical variable, while excluding other theoretical variables. An important source for the meaning of
a theoretical variable is the network of predicted positive,
zero, and negative relationships with other theoretical
variables, called the nomological net. It reflects major
information on the scientific theory about that variable.
The interpretation of a measurement gains in credibility
to the extent that this network of relationships can be
empirically confirmed. For instance, the credibility of
a score interpretation in terms of inductive reasoning increases if the measurement relates in the theoretically
predicted way to other measures, such as tests of deductive reasoning and school success or failure.
Levels of Equivalence
When the measurement context is extended to more cultural groups, three basic questions can be asked within the
general measurement framework presented, namely,
whether across these groups (1) the same theoretical
variables can account for test behavior, (2) the same observed variables can be used for measurement, and (3)
comparisons can be made based on the same latent
variables. Since two basic parameters can be distinguished in the relationship between observed and latent
variables, these questions lead to four different levels of
equivalence. These are as follows: (1) functional equivalence, which implies that a theoretical variable has the
same psychological meaning across the cultural groups;
(2) structural equivalence, which implies that an observed
variable refers to the same latent variable, or, in measurement terms, that the weight parameter linking observed and latent variables differs (nontrivially) from zero
in each of the groups; (3) metric equivalence, which implies that the weight parameter between an observed and
a latent variable has the same value in the cultural groups
and thus that cross-cultural comparisons of score patterns
can be made; and (4) full score equivalence, which implies
the same value for both the weight and the intercept
parameters between observed and latent variables across
the groups. This allows for cross-cultural comparisons of
scores at face value. Note that these four types are hierarchically ordered in the sense that a higher level of equivalence requires that the conditions for the lower levels
are met. Each of these four levels of equivalence is presented in greater detail here (see also Table I).
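A hedged sketch of the weight and intercept restrictions behind these levels: in practice they are tested with multi-group confirmatory factor analysis, whereas here the latent score is simulated as known so the per-group parameters can simply be read off. Data and group labels are invented.

```python
# Comparing measurement parameters across two groups: equal weights with
# different intercepts correspond to metric, but not full score,
# equivalence.
import numpy as np

rng = np.random.default_rng(6)

def group_params(a, b, n=2000):
    Y = rng.normal(size=n)                        # latent variable
    X = a + b * Y + rng.normal(scale=0.5, size=n)
    slope, intercept = np.polyfit(Y, X, 1)
    return round(intercept, 2), round(slope, 2)

print(group_params(a=2.0, b=0.8))   # group A
print(group_params(a=2.5, b=0.8))   # group B: equal weights, different
                                    # intercepts -> metric equivalence only
```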
Functional Equivalence
Multitrait-Multimethod Approach
One of the major practical concerns in measurement lies
in controlling for the possible impact of the specific method that is used on the estimates of a theoretical variable.
For example, working with Likert scales can introduce
an acquiescent response style. Method variance can be
Table I. Four Levels of Equivalence, with Corresponding Types of Bias and Research and Data Analytic Methods

(1) Functional equivalence. Major condition: a similar network of convergent and discriminant relationships with other theoretical variables across cultural groups. Research and data analytic methods: analysis of the nomological network and of context variables.

(2) Structural equivalence. Major conditions: stimuli should be relevant and representative for the content domain across cultural groups, and should have a nontrivial weight parameter across cultural groups (the same internal structure). Research and data analytic methods: analysis of the domain via expert judgments or qualitative research; multitrait-multimethod measurements.

(3) Metric equivalence. Major condition: identical weight parameters across cultural groups.

(4) Full score equivalence. Major condition: identical intercept parameters across cultural groups.
Structural Equivalence
The second question is whether the same theoretical variable can be operationalized by the same observed
variables across cultural groups. Since such observed
variables reflect the reactions of respondents to specific
stimuli, two conditions that are often treated separately in
the literature must both be met. First, the items or stimuli
of the instrument should be relevant to and representative
of the content domain within each of the cultural groups.
Second, it should be demonstrated that the items or
stimuli indeed elicit the intended behavior in each of
the cultural groups. This implies that each item response
should refer to the same latent variable in each of the
cultural groups. In terms of psychometric modeling,
this means that the latent variable should have positive
(nontrivial) weight parameters for each of the observed
variables in each of the cultural groups. No further restrictions are imposed on the weight and intercept parameters. Since the relationships between observed and latent
variables are referred to as the structure of a test, cross-cultural comparability at this level is called structural equivalence.
Metric Equivalence
The third question is whether it is possible to make quantitative comparisons between cultural groups on the basis
of the latent variable. The type of comparisons that can be
made depends on the restrictions that hold on the intercept and weight parameter in the psychometric model. If
only the values of the weight parameter are the same
across cultural groups, then it is possible to compare patterns of scores between cultural groups. Equal weight
parameters imply that a change in the latent variable
leads to the same expected change in the observed
variables in each of the cultural groups. The restriction
of equal weight parameters implies that the observed and
latent variables are measured on the same metric scale
across cultural groups. Since the origin of the scale can
still differ across cultural groups, only patterns of scores,
referring to, for instance, experimental conditions or
subgroups, can be directly compared across cultural
groups. This level of equivalence is called metric equivalence or measurement unit equivalence in the literature. It
can be compared with measuring temperature on
a Celsius scale in one cultural group and on a Kelvin
scale in another cultural group. The temperatures cannot
be directly compared between the two groups; however,
relative differences can be compared, such as the
difference between the average temperature in summer
and in winter.
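In code, the analogy reduces to a comparison of levels versus differences (temperatures invented; 273 is used as the Celsius-to-Kelvin offset for simplicity):

```python
# Equal units but different origins: raw scores are not comparable across
# groups, while differences (score patterns) are.
summer_c, winter_c = 25, 5        # one group measured in Celsius
summer_k, winter_k = 298, 278     # the other in Kelvin (Celsius + 273)

print(summer_c == summer_k)                            # False: levels differ
print((summer_c - winter_c) == (summer_k - winter_k))  # True: the 20-degree
                                                       # seasonal difference
                                                       # is comparable
```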
In a study on information processing comparing Kpelle
(Liberian) with U.S. children, metric equivalence was
assumed. Children were asked to report the number of
dots they had seen on short-term (0.25 s) displays. In one
condition, the array of dots was random; in the other,
there was patterning. According to the authors, the observed differences in average achievement between
Kpelle and U.S. children could have been caused by factors such as motivation and familiarity with the testing
Types of Bias
Complementary to the three basic questions for equivalence, there are three basic questions for bias: across cultural groups (1) what can cause bias in theoretical
variables, (2) what can cause bias in the observed
variables, and (3) what can cause bias in the direct
comparisons of score patterns or scores based on the
same latent variables? Taking into account that method
factors can have an impact on both the whole instrument
and specific items of the instrument, four major types of
bias can be distinguished. These are as follows: (1) cultural
specificity of the theoretical variable, (2) domain underrepresentation, (3) method bias, and (4) item bias. Each of
these four types of bias is discussed here in more detail
(see also Table I).
Cultural Specificity of
the Theoretical Variable
The first question asks what can cause bias in theoretical variables. Here, the answer lies in the cultural
Domain Underrepresentation
The second question asks about sources of bias in the use
of the same instrument across cultural groups. Since the
observed variables form the interplay between the stimuli
of the measurement procedure and the behavior that
is elicited by it, the causes might be situated (1) in
the stimulus material or (2) in the fact that the intended
behavior is not elicited by the stimulus material. Here,
the first problem is examined. The second problem is
discussed as method bias.
As has been made clear in the presentation of the
general framework, the stimuli must be relevant to and
representative of the domain to which they are supposed
to refer. Since cultural groups can differ widely in their
behavioral repertoire, this can pose serious problems.
A set of stimuli that is relevant to and representative of
the target domain in one cultural group need not be relevant and representative in another cultural group. An
instrument constructed in one cultural group might contain items that are irrelevant for the corresponding domain in another group. For instance, an item about
systematically locking one's windows and doors might
be a good indicator of psychoticism in cold countries
but not in warm countries, where windows must be
opened at night to let in the fresh air. In addition, it is
possible that the stimuli of an instrument designed in one
cultural group are relevant, but not representative of the
target domain in another cultural group. This is called
domain underrepresentation. It means that the results
of the measurement cannot be generalized to the
whole domain. This implies that the same observed
variables are insufficient or cannot be used across cultural
groups to measure the same theoretical variable. For instance, in 1996, Ho demonstrated that the domain of
behaviors relevant to the theoretical variable of filial
piety is much broader in China than in the United States.
Merely translating a U.S. instrument for filial piety
seriously underrepresents the domain in China.
When the behavioral repertoire turns out to be very
different between cultural groups, the question arises as
to whether this points to noncomparability of the underlying theoretical variable. For instance, the question has
arisen as to whether or not filial piety has a different
meaning in a Chinese context than in a U.S. context
and thus is not comparable between the two cultural
groups. Hence, cultural specificity of the theoretical variable and domain underrepresentation are often discussed together under the umbrella term of construct
bias.
Method Bias
The third and last question asks which factors can cause
bias in quantitative comparisons of scores or score patterns between cultural groups. If cultural specificity of the
theoretical variable and domain underrepresentation can
be excluded, then the remaining type of bias is method
bias. The possible threats to full score and metric equivalence are discussed together, because the factors that
have been described in the literature to cause bias in the
full comparability of scores (causing a lack of full score
equivalence) also affect comparability of score patterns
(causing a lack of metric equivalence). Moreover, the
factors can also affect structural equivalence (see also
Table I). Method bias refers to all those biasing effects
that are caused by the specific method and context of the
measurement. It must be noted that method bias does not
affect cross-cultural comparisons, if it operates in the
same direction and to the same extent within each cultural
group. From a cross-cultural perspective, the problem lies
in a differential impact of method bias across cultural
groups.
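The distinction between these levels of equivalence can be stated compactly in the linear measurement model that underlies much of this literature. As a hedged formalization (the notation below, with intercept tau, weight lambda, and latent variable xi in cultural group g, is a common convention assumed here, not drawn from this article):

```latex
X_g = \tau_g + \lambda_g\,\xi + \varepsilon_g
```

Metric equivalence then corresponds to equal weights across groups (lambda_1 = lambda_2), so that patterns of scores are comparable; full score equivalence additionally requires equal intercepts (tau_1 = tau_2), so that the scores themselves can be compared directly.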
In the literature, method bias is restricted to those
factors that have a biasing effect on all or substantial
parts of a measurement instrument. Method factors
that have an effect only on specific items are treated separately as item bias. Although, conceptually, item bias is
just a form of method bias, there is a good reason to
distinguish the two. As seen in the next section, item
bias can often be straightforwardly detected by applying
adequate psychometric methods, whereas the detection
of method bias requires additional research using different methods. According to a 1997 paper by Van de Vijver
and Tanzer, factors that cause method bias relate to the
stimulus material, how it is administered, and to whom it is
administered. These are called, respectively, instrument
bias, administration bias, and sample bias.
Instrument Bias
Instrument bias is caused by specific item content, specific response format, and specific response styles. Differential familiarity across cultural groups with either the
item content or the response format can be considered
a major source of method bias. Lack of familiarity with the
stimulus material or the response format may cause unintended difficulties, whereas familiarity can lead to test-wiseness and unintended test-easiness. For instance, it
has been demonstrated that the direction of writing the
alphabet (from left to right for languages written in the Latin script, or from right to left for Arabic) has an impact
on the difficulty of items in a figural inductive reasoning
test that are presented in a horizontal way. The differential
impact of the response format was demonstrated in a 1979
study by Serpell. British children outperformed Zambian
children in a pattern reproduction task with a paper-and-pencil response format, but not when plasticine or configurations of hand positions were used to present the
items. The cultural difference in performance was even
reversed when respondents had to reproduce patterns in
iron wire; making toys in iron wire is a favorite pastime in
Zambia.
Another form of instrument bias is caused by
differences in response style. Specific response formats
may have a differential impact on response styles, such
as social desirability, extremity scoring, and acquiescence. For instance, Hispanics tend to use the extremes of the scale more than Anglo-Americans do, but only
when a 5-point scale is used, and not when a 10-point
scale is used.
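A minimal sketch of how such an extremity-scoring tendency might be quantified (the helper name extremity_index and the endpoint-proportion operationalization are illustrative assumptions, not a standard taken from this article):

```python
import numpy as np

def extremity_index(responses, scale_min=1, scale_max=5):
    """Proportion of each respondent's answers at either scale endpoint."""
    responses = np.asarray(responses)
    extreme = (responses == scale_min) | (responses == scale_max)
    return extreme.mean(axis=1)

# Hypothetical responses of two cultural groups to 20 items on a 5-point scale
rng = np.random.default_rng(0)
group_a = rng.integers(1, 6, size=(100, 20))
group_b = rng.integers(1, 6, size=(120, 20))
print(extremity_index(group_a).mean(), extremity_index(group_b).mean())
```

A differential response style would show up as a group difference in this index that persists across instruments.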
Administration Bias
Administration bias refers to all biasing effects that are
caused by the way in which a test is administered. This bias
may be due to a lack of standardization across cultural
groups or by different interpretations of standardized administration procedures. A lack of standardization in test
administration between cultural groups may be caused by
(1) differences in physical conditions (such as temperature or luminosity) and social environment (such as class
size when subjects are tested in groups); (2) different
instructions for the respondents due to ambiguous
guidelines or differential expertise of the test administrators; and (3) problems in the communication between
tester and testee due to differences in language, the
use of an interpreter, or culture-specific interpretation
of the instructions.
Even if the test administration is perfectly standardized from an objective point of view, a differential meaning of the characteristics of the test administration may
lead to bias. Effects have been reported in the literature of
(1) the use of measurement or recording devices that arouse
more curiosity or fear in cultural groups less acquainted
with them and (2) differential tester (or interviewer or observer) effects across cultural groups.
Item Bias
Item bias refers to those causes of noncomparability that
are due to responses on specific items. The most obvious
reason for item bias lies in a poor translation of the item.
For instance, in an international study, the original English item used the term "webbed feet" for water birds. In the Swedish version, this item was translated as "swimming feet," which caused an unintended easiness of this
item for Swedish pupils. The impact of a poor translation
may increase when the original item has an ambiguous
meaning. Another problem is that items can cause nuisance variance by invoking unintended traits or processes.
For instance, the word "dozen" in a numerical ability item
might introduce additional verbal abilities. In addition,
individual items might be more appropriate for the domain of interest in one cultural group than in another.
Psychometrically, the item bias can affect the intercept, the weight, or both parameters in the relationship
between the observed and latent variable. Item bias that
affects only the intercept is called uniform item bias.
Within a particular cultural group, the item score (or
log odds of a correct versus an incorrect answer) is systematically higher than in another cultural group independently of (or uniform across) the position on the latent
variable; the bias is the same for all persons. Item bias that
affects the weight parameter is called nonuniform item
bias. If the weight parameter differs, the size of the bias for
a respondent in a group depends on her or his position on
the latent variable. The bias is thus not uniform across the
possible positions on the latent variable.
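This intercept/weight formulation maps directly onto logistic regression screening for differential item functioning, one widely used psychometric approach of the kind mentioned above. A minimal sketch with simulated data (the variable names are illustrative, and the normally distributed proxy for the latent variable is an assumption; in practice an observed total or rest score is often substituted):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
group = rng.integers(0, 2, size=n)   # cultural group indicator (0/1)
theta = rng.normal(size=n)           # stand-in for the latent variable
# Simulate uniform bias: the intercept shifts for group 1, the weight does not
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * theta - 0.8 * group)))
item = rng.binomial(1, p)

# logit P(correct) = b0 + b1*theta + b2*group + b3*(theta*group)
X = sm.add_constant(np.column_stack([theta, group, theta * group]))
result = sm.Logit(item, X).fit(disp=False)
print(result.params)
# A nonzero coefficient on group signals uniform item bias;
# a nonzero coefficient on theta*group signals nonuniform item bias.
```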
Nomological Network
As already discussed, the network of convergent and
discriminant relationships of a theoretical variable with
other theoretical variables is one of the main sources of
information on the scientific meaning of a theoretical variable. Therefore, the empirical study of the nomological
network within each cultural group forms one of the important strategies to exclude cultural specificity of the
theoretical variable and support functional equivalence.
The study of the nomological network, however, is
interesting not only for justifying the identity of theoretical variables cross-culturally. It can also contribute
to elucidating the meaning of cross-cultural differences
obtained with full score equivalent measurements. In
1987, Poortinga and Van de Vijver compared the study
of the impact of culture on behavior with the peeling of
an onion. The variables from the nomological network
that account for the cultural differences are like the layers
of an onion. The impact of culture has been fully grasped
when all cultural differences have disappeared after the
effects of those variables are controlled for. For instance,
in 1989, Earley found that a difference in social loafing
(working less hard in a group than alone) between a U.S.
sample and a Chinese sample disappeared when the allocentric versus idiocentric orientation of the individual respondents was controlled for.
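The onion-peeling logic amounts to covariate adjustment. A minimal sketch (hedged: this does not reproduce Earley's analysis; the simulated data and the OLS regressions below only illustrate how a cultural difference can shrink once a nomological-network variable is controlled for):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
culture = rng.integers(0, 2, size=n)               # hypothetical group indicator
orientation = 0.8 * culture + rng.normal(size=n)   # culture-linked mediator
loafing = -0.9 * orientation + 0.5 * rng.normal(size=n)

# First layer: the raw cultural difference in the outcome
raw = sm.OLS(loafing, sm.add_constant(culture)).fit()
# Peel the layer: control for the nomological-network variable
X = np.column_stack([culture, orientation])
peeled = sm.OLS(loafing, sm.add_constant(X)).fit()
print(raw.params[1], peeled.params[1])  # the culture effect shrinks toward zero
```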
Domain
In the context of achievement testing, judgment methods
are frequently used to detect irrelevance of items for
specific ethnic or cultural groups. Expert judges who
are familiar with the ethnic or cultural group screen
items for inappropriate content. These items can then
be deleted from the instrument.
In other research fields, relevance and representativeness of the stimuli can also be studied by means of key
informants who are well acquainted with the local culture
and language. For instance, in 1992, Schwartz asked local
researchers to add culture-specific value items that they
thought were not represented in the original instrument.
Later analyses indicated that these culture-specific value
items did not lead to culture-specific value dimensions,
which supported the comprehensiveness of the original
model.
Another approach is to study the domain of investigation in a rather open and unstructured way, such as performing content analysis on responses to open questions.
For instance, in a 2002 comparative study of the cognitive
structure of emotions between individuals from Indonesia
and from The Netherlands, Fontaine and co-workers first
asked subjects to write down as many emotion terms as
they could think of in 10 minutes. Thus, they ensured that
the emotion terms used were relevant to and representative of the domain of emotion terms in each of these
groups.
Multimethod Approach
In addition to the problem of generalized uniform bias for
higher level inferences, which was discussed in the previous section, there is the difficulty that method factors
can have a biasing effect on all item responses in
a particular cultural group. For instance, the same
items might be more susceptible to social desirability in
one cultural group than in another. These systematic
method effects may go unnoticed if only the internal
structure of the instrument is analyzed. However, this
type of bias can be detected by applying multiple methods
or, preferably, a multitrait–multimethod design.
Only by applying different methods for measuring the
same theoretical variable can systematic effects associated
with a specific method be revealed.
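A minimal sketch of the structure of such a design (all scores and column names below are simulated and hypothetical; only the shape of the resulting multitrait–multimethod correlation matrix matters):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 200
latent = rng.normal(size=(n, 2))    # two traits
method = 0.5 * rng.normal(size=n)   # variance shared by one method
noise = lambda: 0.3 * rng.normal(size=n)

data = pd.DataFrame({
    "trait1_self_report": latent[:, 0] + method + noise(),
    "trait2_self_report": latent[:, 1] + method + noise(),
    "trait1_observer": latent[:, 0] + noise(),
    "trait2_observer": latent[:, 1] + noise(),
})
# Same-trait/different-method correlations should exceed
# different-trait/same-method ones; the reverse pattern flags method bias.
print(data.corr().round(2))
```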
Conclusions
Cross-cultural measurements can be distorted in many
different ways, and this leads to noncomparability of
scores. The tenet of the entire bias and equivalence literature is that one cannot simply assume full score equivalence in cross-cultural measurement and interpret
cross-cultural differences at face value. In reaction to
the recognition of the plethora of possible biasing effects,
extensive psychometric, methodological, and theoretical
tools have been developed to deal with these effects. This
arsenal of tools offers a range of possibilities to empirically
justify the intended level of equivalence and draw valid
conclusions from cross-cultural measurements.
Further Reading
Berry, J. W., Poortinga, Y. H., Segall, M. H., and Dasen, P. R. (2002). Cross-Cultural Psychology: Research and Applications, 2nd Ed. Cambridge University Press, Cambridge, UK.
Camilli, G., and Shepard, L. A. (1994). Methods for Identifying Biased Test Items. Sage, Thousand Oaks, CA.
Cole, N. S., and Moss, P. A. (1989). Bias in test use. In Educational Measurement (R. L. Linn, ed.), 3rd Ed., pp. 201–219. Macmillan, New York.
Harkness, J. A., Van de Vijver, F. J. R., and Mohler, P. P. (2003). Cross-Cultural Survey Methods. Wiley, Hoboken, NJ.
Holland, P. W., and Wainer, H. (eds.) (1993). Differential Item Functioning. Erlbaum, Hillsdale, NJ.
Messick, S. (1989). Validity. In Educational Measurement (R. L. Linn, ed.), 3rd Ed., pp. 13–103. Macmillan, New York.
Millsap, R. E., and Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Appl. Psychol. Measur. 17, 297–334.
Poortinga, Y. H. (1989). Equivalence of cross-cultural data: An overview of basic issues. Int. J. Psychol. 24, 737–756.
Reynolds, C. R. (1995). Test bias and the assessment of intelligence and personality. In International Handbook of Personality and Intelligence (D. H. Saklofske and M. Zeidner, eds.), pp. 543–573. Plenum, New York.
Steenkamp, J.-B. E. M., and Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. J. Consum. Res. 25, 78–90.
Tanzer, N. K. (1995). Testing across cultures: Theoretical issues and empirical results [Special Issue]. Eur. J. Psychol. Assess. 11(2).
Van de Vijver, F. J. R. (ed.) (1997). Cross-cultural psychological assessment [Special Issue]. Eur. Rev. Appl. Psychol. 47(4).
Van de Vijver, F. J. R., and Leung, K. (1997). Methods and data analysis of comparative research. In Handbook of Cross-Cultural Psychology, Vol. 1, Theory and Method (J. W. Berry, Y. H. Poortinga, and J. Pandey, eds.), 2nd Ed., pp. 257–300. Allyn and Bacon, Boston, MA.
Van de Vijver, F. J. R., and Leung, K. (1997). Methods and Data Analysis for Cross-Cultural Research. Sage, Thousand Oaks, CA.
Van de Vijver, F. J. R., and Tanzer, N. K. (1997). Bias and equivalence in cross-cultural assessment: An overview. Eur. Rev. Appl. Psychol. 47, 263–279.
Garrath Williams
Lancaster University, Lancaster, England, United Kingdom
Glossary
bias The distortion of results; may be deliberate or unwitting.
Bias can occur either by neglect of some relevant factor(s)
or overestimation of others; it can also take place at any
stage of research: in the selection and definition of a topic,
in research design, in the conduct of the study, or in the
interpretation and publication of results.
ethics May have a descriptive or a normative sense; ethics
may offer a description of standards by which a group or
community regulates its behavior and distinguishes what
is legitimate and acceptable as aims, or means of pursuing these. Normative ethics (the main concern of this
article) offers evaluation, vindication, or critique of such
standards.
fabrication, falsification Two important ways in which
research can be undermined. Fabrication is the artificial
creation of data, invariably of data that support a researcher's hypothesis. Falsification is the altering of data,
again usually so as to make a chosen result or hypothesis
appear more strongly supported.
informed consent The voluntary participation of a person in
any transaction with another; guards against both coercion
and deception. There is a high potential for ignorance or
false belief when much research is conducted. Speaking of
consent as "informed" draws attention to the need for
relevant knowledge if participation is indeed to be
voluntary.
plagiarism The copying of another's intellectual endeavors
without appropriate acknowledgment, probably leading
to incorrect attribution of work; can sometimes occur
unwittingly (for instance, by forgetting to reference
a source).
research misconduct A relatively narrow category, by
convention covering only a subset of forms of unethical
research conduct. Different funding and regulatory bodies define its scope somewhat differently.
Ethical issues surrounding research are complex and multifaceted. There are issues concerning methods used, intended purpose, foreseen and unforeseen effects, use and
dissemination of findings, and, not least, what is and what
fails to be researched. In this article, the issues are broken
down into two main categories: first, how research is done,
and second, how it is determined by and in turn affects
a wider context. Familiar issues are discussed, such as the
need for methodologically sound investigation and appropriate protections for human research participants, as well
as issues arising in collaborative research. The second set
of issues is less well investigated and, indeed, particularly
difficult to address; these are issues that extend well beyond the control of the researcher or, quite often, any one
individual or organization. Also discussed are research
topic selection, research funding, publication, and, finally,
the role of external ethical guidelines and their institutional setting.
involves proper conduct toward human research participants. Finally, there are the thorny issues that arise in
collaborative work.
as care in interpreting findings. Quantitative methodologies are less likely to reveal differences in understandings between researcher and the researched, and thus
researchers' assumptions may easily lead to misinterpretation of data. Qualitative research is often thought to be
less ethically problematic because the researcher does not
remain a neutral gatherer of information but is involved
with the research participants, seeks to establish a rapport,
and wants to learn from them. However, qualitative social
research is likely to be more intrusive for the participants
and the researcher's sympathy and involvement are usually limited because he or she will leave the situation having
completed the study. Confidentiality is an ethical issue
in all research, but will often be more problematic with
regard to qualitative work insofar as data reveal more
about the person studied. For instance, when detailed
case studies are reported, care has to be taken to ensure
that individuals are not identifiable.
Respecting Individuals in
Human Research
Respect for the participants in human research has long
been the most debated area of research ethics. The well-known background for these concerns resides in Nazi
experiments on concentration camp inmates during
World War II, gruesome in degrees right up to prolonged
torture and deliberate killing. The Nuremberg Code of 1947 and the World Medical Association Declaration of Helsinki, first promulgated in 1964, presented an explicit
disavowal of such science and affirmed the importance
of both the welfare of each person who was to be the
subject of research, and of each individual's free and explicit consent.
Though this background is medical, the same issues
apply to all social research that poses potential risks to the
interests or welfare of individuals. When the research
does not pose risks to welfare, there may also be a case
for thinking considerations of consent are irrelevant.
However, this is by no means necessarily the case: people
have, and take, an interest in their privacy, which is generally not reducible to welfare considerations. Individuals
care deeply about how they are seen by others, so that
even the most innocuous research that touches on the
privacy of those researched must raise the matter of consent. Here the focus is on three issues: welfare, consent,
and privacy; the discussion then briefly touches on some
possible conflicts that arise when the public interest seems
to contradict these rights of research participants.
Welfare
Welfare is a consideration in regard to all research participants: adult or child, well or infirm, of sound mind or
affected by mental disturbance or degeneration (indeed,
it is also a matter of concern with regard to research
animals). It is usually taken as the first priority that research must not pose undue risks to research participants.
Often this may be a relatively marginal issue in social
research (as opposed to, say, medical research); it will
nonetheless be relevant in some psychological work (recall, for instance, the famous Stanford prison experiment,
whereby volunteers playing roles of guards and inmates
became brutalizers and victims within a space of days), or
in anthropology (whereby the presence of outsiders may
have unpredictable effects on previously closed societies).
Likewise, special consideration must be given to research
on those who are vulnerable for various reasons: the very
young, the infirm, and those belonging to groups who
experience prejudice or systematic injustices such as
poverty.
Consent
Consent poses an obvious problem for social measurement. Though it is a straightforward ethical requirement that researchers not interfere with people's lives
without their consent, the fact remains that someone
who knows they are being observed will almost certainly
act differently than if they believed they were unobserved
by evaluating strangers. Even pollsters have found
a problem here: in Britain in the 1980s, polls consistently
underpredicted the proportion of the electorate that
would vote Conservative. The usual explanation is that
people were reluctant to admit to voting Conservative,
this being perceived by many as the party of not-so-enlightened self-interest.
There are several obvious methods of avoiding this sort
of problem. In the first place, research might be designed
so as to leave people's lives untouched, so that there is
neither risk nor manipulation nor invasion of privacy. In
some disciplines, this may be straightforward: very many
economic statistics can be gathered without raising any of
these problems; historical research, except of the recent
past, is even less likely to raise such issues. Retrospective
use of data gathered for other purposes is another technique, though it may nonetheless pose privacy and consent issues, depending on how the data were initially
gathered. In other areas, it may often be impossible not
to affect peoples lives. Much sociology, for example, will
wish to get close to peoples lived experience, which requires direct contact between researched and researcher.
Observers who do not declare themselves as such are
effectively manipulating other participants, and if discovered would almost certainly be perceived to have betrayed
their trust.
Direct, nonmanipulative contact inevitably gives rise to
observer effects. Withholding information about the particular purpose of the study is one strategy to minimize
these. This means, however, that individuals will not
know, to a greater or lesser extent, what they are
consenting to; in research, as in ordinary life, this naturally
however, that there is an important tension between preserving confidentiality and allowing scrutiny of research
findings and interpretations. This reflects a more general
tension between respecting privacy and sharing data and
results. As a final point, it may be worth emphasizing
that there will be a strong duty of care with regard to
data: carelessness in storage (for instance, the use of
a computer system without adequate security protections)
can obviously lead to significant breaches of the privacy of
research subjects.
Conflicts with the Public Good or the Law
Many social investigations can or do touch on activities
that are illegal or on the fringes of legality, or that damage
public goods. Here, of course, the research can often be
claimed to be in the public interest: to reduce criminal or
antisocial behavior, it may be helpful to understand it
better. (Indeed, it may turn out that such behavior is
not damaging or is at least less so than supposed. The
use of illegal drugs might be thought a case in point,
insofar as the legal context of the activity is much more
of a problem than is the activity.) Nonetheless, when there
is a real, apparent, or claimed injury to the public, state
agencies may take an interest in the research in ways that
threaten research subjects' privacy and undermine or vitiate their consent. (A useful analogy is to compare the
protection of sources in journalism.) There is no straightforward way to deal with such dilemmas, or even to prevent their arising, because apparently innocuous research
may uncover unpleasant surprises. Institutional measures
can, however, help protect the integrity of researchers and
foster trust on the part of research participants. A recent
case in the United Kingdom illustrates the difficulty. In
1997, a study was published of human immunodeficiency
virus (HIV) transmission in a Scottish men's prison. By
including one case of sexual transmission to a female nonprisoner, the study provided concrete evidence of how
a particular strain of HIV could be transmitted by different pathways. The infected woman subsequently approached the police, however, and via a complicated
chain of events, blood samples from the study were seized
by police and used to convict one research participant of
"culpable and reckless conduct." Now, the legality of such
police activity clearly varies with jurisdiction, and, of
course, there may be cases when the privacy accorded
to research participants ought, morally speaking, to be
overridden. The point, however, is that such cases are
often open to dispute, so that researchers may find themselves under pressure from state agencies and may be
doubtful whether compliance is ethical or indeed legal.
In the case given, no protection was offered by the institution where the research had taken place, and individual
researchers had effectively no choice but to comply with
police requests. In the United States, by contrast, it would
be much more likely that the sponsoring institution would
take a strong interest in defending the data or study materials. Whether research confidentiality is likely to be
supported by institutions or is liable to be undermined
by state bodies may, then, be a salient factor in deciding
whether particular sorts of research should be done at all.
Collaborative Issues
As social research has become more and more of
a collaborative endeavor, and the size of research teams
has tended to increase, so the significance of issues posed
by collaboration has grown. These are first and foremost
questions about clarity and fairness in the demarcation of
responsibilities, questions that may be posed prospectively (who will do what?) and retrospectively (who should
be credited, or blamed, for what?). These questions concern not just intellectual division of labor, but also resource allocation, legal and contractual responsibilities,
and intellectual property rights (IPRs). Confusion about
any of these can clearly generate grave practical and moral
problems, and so the first issue will be to ensure, prospectively, as much clarity and fairness as possible.
If there is a lack of clarity regarding responsibilities for
validating data, for instance, it is easy to see that inaccuracies may enter into analysis and reporting, inviting
doubt concerning the honesty of the research and damaging its integrity. Standards of fairness will to a large
degree be conventional, as in researchers salary payments, or contractual regulation of IPRs. However,
conventions may encompass significant injustices, the salary
gap between male and female academics being a notorious
case in point. Moreover, changing circumstances or unexpected findings may alter the division of labor; prospective arrangements will need to be revisited and clear
management will thus be vital to maintaining the integrity
of collaborative work. Here the institutional context may
help or hinder significantly: for instance, by providing
clear channels for oversight and review, straightforward
mechanisms to adjudicate disputes, or clear and fair policies regarding remuneration (or their opposites).
One important subset of issues arises with regard to
publication and authorship in research teams. Different
contributors may have different expectations as to who
should, and should not, be included as an author, particularly in multidisciplinary teams wherein participants
are used to different conventions. Differing levels of seniority pose obvious problems, especially because they are
unlikely to correspond directly to the extent of responsibility for the work or, obviously, to the amount contributed to the project. Junior researchers are also most likely
to be excluded from authorship if, for example, they are no
longer employed on the project when papers are published. In addition, honorary authors who have not
made a direct contribution to the work are sometimes
included, whether because they wield influence in the
Publication Issues
Publication is increasingly recognized as an area posing
many ethical problems. As already noted, serious issues
may arise in commercially sponsored work, when findings
are disliked by the sponsor. In the academic context, more
subtle difficulties arise. In the first place, publication
shares in the problem mentioned previously, that of allowing innovation in the institutional context of research.
Some sorts of research may be easier to publish, or at least
easier to publish in officially recognized or prestigious
places, thus supporting the conventional at the price of
the original.
None of this would be quite so serious, however, were
researchers not exposed to an environment in which extreme weight is placed on publication, both as regards
quantity of publications and, because quality is extremely
difficult to assess objectively, the prestige of the publishing journals. The very idea of publication as an end in
itself, which tends to emerge from this pressure, represents a basic distortion of research ethics. Publication is
fundamentally a means to the most effective and concise
reporting of research and findings; the quantity of publications is no indicator of the significance of research or
the ability of a researcher to produce sound, innovative
research. The pressure to publish creates an obvious incentive to overpublish, or to publish work sooner than
might be desirable from the point of view of intellectual quality.
[Figure: Summary of the principles of the ASA Code of Ethics, originally agreed on in 1997.]
Conclusion
The most important general considerations that need to
be taken into account in the conduct and context of social
measurement have been discussed. It is clear that many of
the issues are complex and usually require that researchers weigh a number of competing and potentially
conflicting factors. A general overview will not necessarily
be immediately helpful in providing specific guidance in
any concrete situation; nonetheless, to make appropriate
and ethical decisions across a professional career, a wide
awareness of the issues is vital.
Further Reading
American Psychological Association (APA). (2004). Ethical
Principles of Psychologists and Code of Conduct. Available
via the APA website at www.apa.org
American Sociological Association (ASA). (2004). Code of
Ethics. Available via the ASA website at www.asanet.org
Ethnocentrism
Thomas F. Pettigrew
University of California, Santa Cruz, Santa Cruz, California, USA
Glossary
acquiescence bias The systematic bias caused by respondents of surveys and questionnaires who agree with
whatever is presented to them. The bias is enhanced when
a scale's items are unidirectional, that is, measure the
phenomenon in the same form.
authoritarian personality A personality syndrome of traits
that is highly associated with outgroup prejudice characterized by submission to authorities and rigid adherence to
conventional standards.
ingroup A group of people who share a sense of belonging
and a feeling of common identity.
ingroup bias A systematic tendency to favor one's own group.
outgroup All groups outside of the ingroup. Especially salient
outgroups are those that are perceived as distinctively
different from the ingroup.
Introduction
The Need for the Concept
From Africa and Northern Ireland to the Middle East,
wars, sectarian strife, and intergroup conflict are major features of the modern world.
Definition of Ethnocentrism
In popular language, ethnocentrism refers to a belief in
the cultural superiority of one's own ethnic group. In this
sense, ethnocentrism involves cultural provincialism.
More recently, the concept has come to signify simply
an unusually high regard for one's ingroup.
William Graham Sumner, a Yale University Professor
of Political and Social Science, coined and introduced the
concept in 1906 in his classic volume Folkways. It soon
gained wide use throughout the social sciences. At the
same time, Sumner also introduced the invaluable concepts ingroup and outgroup, a social differentiation that
is universal among human beings. These three concepts
are still widely employed in social science even though
social science long ago dismissed Sumner's larger theoretical positions as invalid.
The positive functions of ethnocentrism for the ingroup are obvious. High morale, group cohesiveness, patriotism, pride, and loyalty are often linked to a sense of
ingroup superiority. But there are negative consequences
of ethnocentrism as well. In addition to possibly making harmonious relations with other groups more problematic, such a belief can lead to arrogant inflexibility and blindness, which prevent actions needed for the ingroup's own welfare.
Sumner's definition was broader than the now-standard use. He held ethnocentrism to describe a view by ". . . which one's own group is the center of everything,
and all others are scaled and rated in reference to it." Note
that this conception of ethnocentrism has two components: (1) an exaggeration of the ingroup's position and
cultural superiority and (2) a disparagement of all outgroup cultures. These components form the core of
Sumner's hypothesis.
bias did not correlate at all with their trait ratings of the
other tribes or their willingness to interact with the other
tribes.
A Test with the Cross-Cultural Area Files
Cashdan provided the most extensive test of the
Sumnerian hypothesis at the societal level. In 2001, she
used the invaluable data file of published codes on 186
preindustrial societies developed by Murdock and White.
Two independent sets of ratings for ethnic loyalty and
outgroup hostility strengthened her analysis. Two additional independent measures tapped the amount of internal warfare. And five further measures tapped the degree
of threat to the group: two independent ratings of the
threat of famine and three of external conflict. All independent ratings of the same concept proved significantly
and positively correlated.
Catastrophic food shortages, but not routine food
shortages, were consistently associated with stronger ingroup loyalties. External warfare also consistently related
to greater ingroup loyalty. The latter finding can be interpreted in several ways. Cashdan saw it as evidence that
external warfare led to enhanced loyalty. But the causal
paths could go in both directions. As the Sumnerian hypothesis holds, greater loyalty could be contributing to the
increased external warfare.
Other tests, however, failed to support the universality
of Sumners hypothesis. Cashdan found that none of the
four Spearman rank correlations between the ratings of
group loyalty and outgroup hostility attained statistical
significance. With varying numbers of societies in the
tests, the correlations ranged from 0.08 to −0.25. Indeed, three of the relationships were in the opposite direction from that predicted by Sumner. Similarly, the
degree of internal warfare was positively related to hostility toward outgroups, not negatively as predicted by
the hypothesis.
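For readers who want to run this kind of test themselves, a minimal sketch of a Spearman rank correlation between society-level ratings (the numbers below are invented for illustration; scipy.stats.spearmanr is one standard implementation):

```python
from scipy.stats import spearmanr

# Hypothetical ratings for eight societies (higher = stronger loyalty/hostility)
ingroup_loyalty = [3, 5, 2, 4, 1, 5, 3, 2]
outgroup_hostility = [2, 1, 4, 3, 5, 2, 3, 4]

rho, p_value = spearmanr(ingroup_loyalty, outgroup_hostility)
print(rho, p_value)  # the Sumnerian hypothesis predicts a significant positive rho
```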
procedure risks a strong acquiescence bias; that is, regardless of content, some respondents will agree with all
the statements (yea-sayers) and a few others will disagree
with all the statements (nay-sayers). When a scale contains
an equal number of positively and negatively worded
statements, these responses are moved toward the center
of the distribution of responses.
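A minimal sketch of that balancing logic (the helper reverse_score and the 7-point scale are illustrative assumptions):

```python
import numpy as np

def reverse_score(responses, reversed_items, scale_min=1, scale_max=7):
    """Recode negatively worded items so high values always mean more
    of the construct; with a balanced scale, pure yea-saying then
    averages out toward the scale midpoint."""
    scored = np.asarray(responses, dtype=float).copy()
    scored[:, reversed_items] = (scale_min + scale_max) - scored[:, reversed_items]
    return scored

answers = np.array([[7, 7, 7, 7],   # a pure yea-sayer
                    [4, 4, 4, 4]])  # a midpoint responder
print(reverse_score(answers, reversed_items=[1, 3]).mean(axis=1))
# Both rows now average 4.0: the yea-sayer is pulled to the midpoint.
```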
In addition, the E-scale has a fundamental conceptual
problem. By combining two prejudice subscales with
a subscale of rigid patriotism, it simply assumes the universal validity of the Sumnerian hypothesis. And our review of relevant studies demonstrates that such an
assumption is not justified. Ethnocentrism at the individual level of analysis is best tailored to the concept's basic definition of markedly high regard for one's own ethnic
group. Whether ethnocentrism relates positively with
hostility toward outgroups must remain an empirical
question. Consequently, independent measures of attitudes and actions toward outgroups are needed.
Most studies of ethnocentrism at the individual level
have followed this procedure. Researchers have used various measures of ingroup regard while placing increasingly less emphasis on the cultural component of
ethnocentrism. One popular measure is to ask how identified the subject is with the ingroup, for example, the
question "How close do you feel toward other Xs?"
Working with children, Aboud in 2003 used the same
measures to tap both positive and negative evaluations of
the ingroup (whites) and outgroups (blacks and Native
Indians). The ratings of whites by her subjects constituted
her ethnocentrism measure, whereas the other ratings
constituted her outgroup measures. Similarly, other
studies assess ingroup bias by testing for ingroup preference in evaluations. Such preferences can be determined
not only by questionnaires but also in laboratory experiments, by measuring the differential allocation of resources to the ingroup and outgroup.
Further Reading
Aboud, F. E. (2003). The formation of in-group favoritism and out-group prejudice in young children: Are they distinct attitudes? Dev. Psychol. 39, 48–60.
Adorno, T. W., Frenkel-Brunswik, E., Levinson, D. J., and Sanford, R. N. (1950). The Authoritarian Personality. Harper, New York.
Brewer, M. B. (1999). The psychology of prejudice: Ingroup love or outgroup hate? J. Soc. Issues 55(3), 429–444.
Brewer, M. B., and Campbell, D. T. (1976). Ethnocentrism and Intergroup Attitudes: East African Evidence. Sage, Beverly Hills, CA.
Cashdan, E. (2001). Ethnocentrism and xenophobia: A cross-cultural study. Curr. Anthropol. 42, 760–765.
Feshbach, S. (1994). Nationalism, patriotism and aggression: A clarification of functional differences. In Aggressive Behavior: Current Perspectives (L. Huesmann, ed.), pp. 275–291. Plenum, New York.
LeVine, R. A., and Campbell, D. T. (1972). Ethnocentrism: Theories of Conflict, Ethnic Attitudes, and Group Behavior. John Wiley, New York.
Mummendey, A., and Otten, S. (1998). Positive-negative asymmetry in social discrimination. In European Review of Social Psychology (W. Stroebe and M. Hewstone, eds.), Vol. 9, pp. 107–143. John Wiley, New York.
Murdock, G. P., and White, D. R. (1969). Standard cross-cultural sample. Ethnology 8, 399–460.
Struch, N., and Schwartz, S. H. (1989). Intergroup aggression: Its predictors and distinctness from in-group bias. J. Pers. Soc. Psychol. 56, 364–373.
Sumner, W. G. (1906). Folkways. Ginn, New York.
Ethnography
Faye Allard
The University of Pennsylvania, Philadelphia, Pennsylvania, USA
Elijah Anderson
The University of Pennsylvania, Philadelphia, Pennsylvania, USA
Glossary
analytic induction Both a theoretical and a research approach
that aims to uncover causal explanations. This is done
inductively, by testing, redefining, and refining hypothesized concepts and relationships throughout the research
process until no new data contradict the researcher's
explanation.
ethnographic interview An in-depth interview conducted
with the aim of understanding the worldview, beliefs, and
social reality of the respondent. This type of interview is
often open-ended and typically does not adhere to a strict
set of predetermined questions.
field notes The primary way in which ethnographic research
data are recorded; this may include thick descriptions of
observations (dialogue, gestures, facial expressions, body
movements, use of space, and physical surroundings), the
researchers responses to these observations, and the
hypotheses that emerge from the data.
folk ethnography A term coined by Elijah Anderson that
describes the process through which individuals construct
and maintain everyday understanding about their social
world by continually observing and interpreting the
behavior around them. An individual's interpretations of
these observations are shaped by a personal ideological
framework, which develops over time through socialization
and experiences and provides the lenses through which an
individual tries to understand the people encountered in
daily life.
grounded theory A theoretical approach that places emphasis on the construction of theory, which is grounded in data
and is generated through inductive reasoning.
local knowledge A term developed by Clifford Geertz that
describes the shared ideologies, norms, and values that are particular to a given local setting.
Although these definitions appear broad, studying any culture systematically commonly requires posing questions
concerning how people meet the exigencies of everyday
life, how social groups form, function, and maintain themselves, and how shared ideologies, rules, and values are
negotiated and transmitted by people. Ethnography often
addresses these questions.
Introduction
Ethnography is the systematic study of culture. Ethnography is an inductive, qualitative research method, in
which the researcher, or ethnographer, immerses himself
or herself into the field (the social reality of the subjects of
investigation) to study and describe local culture. Typically, the ethnographer uses two main techniques to collect data: participant observation, which is recorded in
the form of field notes, and in-depth, or ethnographic,
interviews. Ethnographic data can also be collected from
historical records and artifacts, journals, and other written
material. Ethnographic data are usually, but not necessarily, derived from the process of ethnography and are
characterized by "thick description," a term coined by
Clifford Geertz in 1973 to describe rich, layered descriptions of the field. The way in which ethnographic data are
presented is not, however, limited to the written form;
photographs, films, audio recordings, and poetry have
been used to render local settings ethnographically. It
is impossible to distinguish between ethnographic theory
and method, because data analysis is a continual process
that is developed as fieldwork progresses. Thus, ethnographic methodology comprises not only the physical
processes of conducting ethnography, but also the logical
procedures of analysis and the presentation of data.
the demands for academic rigor. By immersing themselves in the groups they were studying and acting as
participant observers, Malinowski and Radcliffe-Brown
strived to construct their theories and representations
inductivelyin other words, to base their theory on
their observations. Both presented their findings in
very detailed monographs, favoring impersonal description over personal narrative.
In the United States, anthropologists employed by government agencies to study Native Americans usually reflected the ideology of their employers, who perceived
Native Americans as problem people in need of integration into American life. However, this attitude did not
always prevail; anthropologist Franz Boas, as detailed in
Ronald P. Rohner's The Ethnography of Franz Boas, and
ethnologist Frank Hamilton Cushing reported their findings from the Native American perspective, resulting in
higher levels of objectivity and validity. Through studying
Native Americans, Boas and Cushing also helped to legitimize fieldwork completed closer to home, which was
then an unconventional field site.
Early Sociologists
As the distinctive fieldwork techniques pioneered by
Malinowski and Radcliffe-Brown gained popularity
within their discipline, other academics in Europe and
America were also taking an increasing interest in field
methods. In the late 19th century, Charles Booth,
a prominent London businessman, hybridized field techniques with more traditional quantitative techniques,
such as descriptive statistics, surveys, and mapping
data, to produce a detailed study of London's impoverished population. On the other side of the Atlantic, the
American philanthropist Jane Addams also adopted, albeit unwittingly, the anthropological methods being pioneered by Boas to describe Hull House, a racially and
ethnically diverse working-class settlement in Chicago.
Addams hoped to influence social reform by informing
a wider segment of society about the harsh conditions
faced by the urban poor. This distinguished Addams
from her anthropological forefathers, whose political
motivations were limited to those who were sent by
governments to study how to "civilize primitive people."
One of the first sociological studies to utilize field
methodology to conduct a comprehensive and scientific
community study was The Philadelphia Negro, by William
Edward Burghardt Du Bois in 1899. Drawing heavily on
the hybrid methods forged by Booth, Du Bois's study involved systematically mapping, surveying, interviewing,
and studying the history of Philadelphias black population. This pioneering work resulted in the first systematic
study of the African-American community. In the late
1920s, at the University of Chicago, William I. Thomas
and Florian Znaniecki's The Polish Peasant in Europe and America
belief that social life was fixed, instead positing that the
social world was ever changing. In order to study such
a dynamic entity, Park argued that fieldworkers should
take a pragmatic approach. This notion was heavily influenced by Park's colleagues John Dewey, George Mead,
and William James, who belonged to the first American
philosophical movement, known as Pragmatism, which
explained meaning and truth in terms of the application
of ideas or beliefs to the performance of actions and their
outcomes. Based on the theoretical ideas of the
Pragmatists and the methodology of the anthropologists,
Park believed that the researcher should actively participate in the social world in order to understand the culture
in the context of the setting. Park further stressed that the
fieldworkers should remain in the setting for an extended
period of time, and must enter without prior assumptions
in order to understand the true meaning of the behavior
being observed. Park's principles of studying social life in
the natural setting formed the beginnings of naturalistic
ethnography.
Advances in Ethnographic Methodology
Park and Burgess meticulously trained their students,
who included sociologists from the Second Chicago
School, notably, Everett C. Hughes, Herbert Blumer,
and Louis Wirth. Park taught by example and strongly
encouraged his students to go into the field to conduct
empirical research, even advising them to forgo the library
in favor of getting "[their] hands dirty in real research."
Fieldwork at the Chicago School was characterized by
informal and somewhat unstructured procedures.
Drawing on the eclectic methods of Thomas, Park insisted
that his students should utilize a range of research techniques. Participant observation was one of the major components of the early Chicago School's methodological
repertoire, and Park expected his students to enter
a field site and assume a role within it. Participant observation was sometimes done covertly, as exemplified by
Nels Anderson's study The Hobo and Frederick Thrasher's research on gangs. By shedding their identities as researchers, participant observers were able to gain entry
to and infiltrate a social group in order to participate in and
observe interaction in its natural form, thus improving
the validity of data collected. Yet, covert participant
observation demands a great deal from the researcher,
requiring them to learn how to "speak the speak" of those
being studied and to conform to patterns of behavior that
can be unfamiliar, illogical, and sometimes illegal. Until
Park and his followers advocated the process, there had
been little attempt to conduct covert participant research,
as anthropologists had primarily tended to be physically
conspicuous to the groups they studied. In Chicago, on
the other hand, sociologists generally looked like the populations they were researching and so could observe
covertly.
Symbolic Interactionism
Building on the theoretical discourse of Mead, Dewey,
and other early Chicagoans, Herbert Blumer developed
a theoretical framework that has come to be known as
symbolic interactionism. Symbolic interactionism served
to rejuvenate and legitimate fieldwork at the Chicago
School by responding to many of the positivistic critiques.
Blumer openly criticized sociologists' fixation on establishing scientific causal relationships. He argued that
many sociologists, including some early Chicagoans,
held a superficial knowledge of the social world they studied, and consequently their data had been forced into
preconceived categories and concepts that bore little relation to the social reality being investigated. Blumer insisted that social interaction was the sociologist's primary
subject matter. To Blumer, the social world was composed
of an infinite number of social interactions, taking place in
an infinite number of settings, involving an infinite number of individuals; thus, every social interaction was
a completely unique event. He argued that humans do
not merely react to social events; they interpret, define,
and give meaning to behavior, based on their experiences.
Blumer criticized those who depicted individual actors as
involuntary pawns of social structures, and argued that the
role of social forces should be explained in terms of the
process of interpretation engaged in by the individuals as
they encounter situations. To do so, Blumer argued that it
is imperative to grasp an individual's view of social reality,
by engaging in participant observation and attempting to
categories as necessary. Second, these categories are refined by combining any categories that overlap, a process
that Glaser and Strauss argued should be shaped by the
emerging properties of each category. Third, theory must
be delimited by removing redundant categories. Finally,
when these steps are complete, the theory can be written.
Glaser and Strauss encouraged the use of theoretical
sampling, which they defined as the process of data collection for generating theory whereby the analyst jointly
collects, codes, and analyzes his data and decides what
data to collect next and where to find them, in order to
develop his theory as it emerges. This strategy is intended
to encourage a comprehensive investigation of the research question, because cases will be selected based
on their theoretical relevance to the emerging categories.
When data no longer aid the construction of categories,
Glaser and Strauss claim that the categories have become
saturated and at this point theory can be constructed. Two
types of theory can be generated via grounded theory,
substantive and formal. Substantive theory is applicable
only in specific situations, but from this, formal theory can
be developed, which can be generalized across broader
contexts.
An important progression has been the development of survey-style ethnography. Survey-style ethnography consists
of a large number of in-depth interviews and frequently
requires interviewing over 100 informants. Typically, in
order to expedite this process, data are collected by
a team of collaborative ethnographers, a methodological
process originally cultivated in 1941 by Lloyd Warner in the
Yankee City Series, an epic multivolume, mixed-approach
study of Newburyport, Massachusetts, and also through
the individual work of Howard Becker and Renee C.
Fox, whose collaborative field teams investigated student
physicians. Usually, those who are interviewed are drawn
from a nonrandom sample of individuals whom the ethnographer has reason to believe can shed light on the topic
being investigated. Survey-style interviews are generally
conducted only once with each participant, who is encouraged to speak frankly and at length in his or her own
terms regarding the topic being investigated.
Aided by over 30 research associates, sociologist
Kathryn Edin and anthropologist Laura Lein interviewed
379 single mothers in Chicago, Boston, San Antonio,
Charleston, and rural Minnesota. The interviews were
conducted in order to study the efficacy of welfare reform.
Edin and Lein compared the in-depth interview data for
those mothers who held low-wage, unskilled employment
with data for those mothers who were on welfare, to ascertain the financial well being of both groups, and concluded that mothers who worked were worse off than
those on welfare. In her bid to understand morality
among the working class, in 2000, sociologist Michèle
Lamont singlehandedly interviewed over 300 blue-collar
men, each for approximately 2 hours, the information
from which comprised the entire data set. Katherine
Newman, in an attempt to gain a comprehensive understanding of the working poor, mobilized a large multiethnic research team in 1999 in order to conduct 300 in-depth interviews. Supplementing the data generated by these interviews, Newman's team completed 150 life-history interviews, shadowed 12 individuals for a year, and completed 4 months of participant observation at fast-food restaurants. To overcome problems of consistency
with multiple interviewers, many survey-style ethnographers extensively train their team of interviewers, which
has contributed to a growing trend in the formal teaching of
ethnographic methods. Whereas Park taught by example
and much of what his students learned about methods
was through first-hand experience in the field, ethnographic methodology now forms the basis of classes at
academic institutions and is the focus of a great number
of books. The formal teaching of ethnographic
techniques has resulted in a generation of introspective
and reflexive ethnographers, and a renewed interest in
methodological issues.
Although survey-style ethnography does not result in
the thick description generated by ethnography that is
Conclusion
Ethnography can be considered a fundamental methodology of the social sciences. Over the past century,
ethnographic methodology has led to the discovery of
some of the most valuable concepts, theory and data
produced in the social sciences. Without ethnography
and its attendant fieldwork, labeling theory, our understanding of the plight of the urban poor, and our appreciation of the subjective complexity of social interaction would all be far less developed.
Once marginalized by positivistic paradigms, ethnography has now evolved into multiple flourishing
Further Reading
Anderson, E. (2001). Urban ethnography. In International Encyclopedia of the Social and Behavioral Sciences, pp. 16004–16008. Elsevier.
Anderson, E. (2003). A Place on the Corner. University of Chicago Press, Chicago, IL.
Anderson, E. (2004). The Cosmopolitan Canopy.
Becker, H. (1968). Social observation and social case studies. In International Encyclopedia of the Social Sciences, pp. 232–238. Macmillan, New York.
Becker, H. (1970). Sociological Work: Method and Substance. Aldine, Chicago, IL.
Becker, H. (1998). Tricks of the Trade: How to Think about Your Research While You're Doing It. University of Chicago Press, Chicago, IL.
Burawoy, M., et al. (1991). Ethnography Unbound: Power and Resistance in the Modern Metropolis. University of California Press, Berkeley, CA.
Denzin, N. K. (1997). Interpretive Ethnography: Ethnographic Practices for the 21st Century. Sage, London.
Duneier, M. (1999). Sidewalk. Farrar, Straus & Giroux, New York.
Faris, R. E. L. (1967). Chicago Sociology 1920–1932. University of Chicago Press, Chicago, IL.
Fine, G. A. (1995). A Second Chicago School: The Development of a Postwar American Sociology. University of Chicago Press, Chicago, IL.
Geertz, C. (1983). Local Knowledge: Further Essays in Interpretive Anthropology. Basic Books, New York.
Glaser, B. G., and Strauss, A. L. (1967). The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine de Gruyter, Berlin.
Holstein, J. A., and Gubrium, J. F. (1995). The Active Interview. Sage Press, Thousand Oaks, CA.
Lamont, M. (2000). The Dignity of Working Men: Morality and the Boundaries of Race, Class and Immigration. Harvard University Press, Cambridge, MA.
Newman, K. S. (1999). No Shame in My Game: The Working Poor in the Inner City. Knopf and the Russell Sage Foundation, New York.
Robinson, W. S. (1951). The logical structure of analytic induction. Am. Sociol. Rev. 16, 812–818.
Eugenics
Garland E. Allen
Washington University, St. Louis, Missouri, USA
Glossary
allele One of several forms of a gene for a particular trait; for
example, tall (symbolized T) in pea plants is one form of the
gene for height and dwarf (symbolized t) is another allele;
there can be more than two alternative alleles for a given
trait within a population.
dominant gene A gene whose effects mask those of its
recessive counterpart (for example, tallness in pea plants is
dominant over shortness, or dwarf).
eugenics An ideology and movement of the early 20th
century aimed at improving human social conditions by
ensuring that those deemed genetically unfit have few, if any, offspring, and encouraging those who are deemed fit
to have more offspring.
gene A unit of heredity (known to consist of a specific
segment of DNA) that is passed on from parent to offspring
during reproduction; in sexually reproducing organisms
such as humans, offspring inherit two genes for each trait,
one from the male parent and the other from the female
parent.
genetics The science of heredity in the biological sense.
genotype The particular set of genes any individual carries
for a given trait (TT, Tt, tt); genotype determines the
genes that an individual can pass on to his or her
offspring.
heterozygous The condition in which an individual has two
different alleles for a given trait (in Mendelian notation
given as Aa, Tt, Bb, etc.).
homozygous The condition in which an individual has the
same alleles for a given trait (in Mendelian notation given as
AA or aa, TT or tt, BB or bb).
phenotype The appearance or expression of a trait, regardless of genotype; thus, in pea plants, individuals that are TT or Tt look alike (they are tall) but have different genotypes (a worked Tt × Tt cross follows this glossary).
recessive gene A gene whose effects are masked by
a dominant gene (for example, blue eye color in humans
is recessive to brown eye color).
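The Mendelian vocabulary above can be made concrete with a small worked example. The following Python sketch (a hypothetical illustration, not part of the original entry) crosses two heterozygous (Tt) pea plants and tallies the expected genotype and phenotype ratios.

```python
# Hypothetical illustration of the glossary terms: a Punnett-square
# cross of two heterozygous (Tt) pea plants, where T (tall) is dominant
# over t (dwarf).
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Offspring genotype counts: one allele drawn from each parent."""
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))

def phenotype(genotype):
    """A dominant T masks t, so any genotype containing T looks tall."""
    return "tall" if "T" in genotype else "dwarf"

genotypes = cross("Tt", "Tt")
print(genotypes)                 # Counter({'Tt': 2, 'TT': 1, 'tt': 1})
phenotypes = Counter()
for g, n in genotypes.items():
    phenotypes[phenotype(g)] += n
print(phenotypes)                # Counter({'tall': 3, 'dwarf': 1}) -- the 3:1 ratio
```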
In this brief definition, Galton lays out all the dimensions that came to characterize eugenics as an ideology
and social/political movement during the first half of the
20th century: (1) a clear belief that the methods of selective
breeding, as developed in early 20th century animal and plant breeding, could be applied to the improvement of the human species;
(2) a strong belief in the power of heredity in determining
physical, physiological, and mental (including personality) traits; (3) an inherent ethnocentrism and racism that
included belief in the inferiority of some races and superiority of others (a view extended to ethnic groups and
social classes as well); and (4) a belief in the power of
science, rationally employed, to solve social problems,
including ones so seemingly intractable as pauperism,
crime, violence, urban decay, prostitution, alcoholism,
and various forms of mental disease, including manic depression and feeblemindedness (retardation).
Although the term eugenics and the idea behind it
were Galton's, he himself did not found a movement
either to develop eugenics as a science or to put its
principles into practice. Eugenics movements did not
begin to arise in various countries of Europe or the United
States until the first decade of the 20th century, nor did
they become generally effective in promoting social and
political programs nationally or internationally until after
1910. The earliest eugenics movements were founded in
Germany in 1904, in Great Britain in 1907, and in the
United States in 1908–1910. Other eugenics movements
appeared subsequently around the world: in western
Europe (France, Norway, Sweden, Denmark), Russia,
Latin America (Cuba, Brazil, Mexico), Canada, and
Asia (Japan). However, it was in the United States,
Great Britain, and Germany that eugenics as an intellectual and social movement made its greatest strides
and, from the eugenicists' point of view, achieved its
greatest ideological and political effects.
Eugenicists' specific concerns differed from one country
to another, and over time, from the early 1900s until the
period just before the onset of World War II. For
example, British eugenicists were particularly concerned
with the high fecundity and inherited mental degeneracy
of the urban working class, particularly those labeled as
paupers. By contrast, American eugenicists were more
concerned with the number of feebleminded who filled
the prisons and insane asylums and, after World War I,
with the supposed genetic deficiencies of immigrants. In
Germany, mentally ill, psychotic, psychopathic, and psychiatric patients in general, along with the congenitally
deaf, blind, and feebleminded, were of greatest concern.
German eugenicists were also particularly interested in
increasing the number of fitter elements in society (positive eugenics); prior to the National Socialist
takeover in 1933, fitness was understood more in
terms of class than race. In France and Russia,
where ideas of the inheritance of acquired characteristics
held more sway than in other countries, eugenicists concentrated more on public health reforms than on selective
breeding. Although eugenics was not a monolithic
movement, certain core principles and beliefs did link
various eugenics movements together, and the three
major international eugenics congresses, held in 1912
(London) and in 1921 and 1932 (New York), emphasized
the similarities among the various movements while also
revealing the differences.
3. Eugenicists understood heredity in terms of the Mendelian theory, whereby various adult traits were often spoken of as being transmitted
via discrete units, called genes, from parents to offspring.
This was sometimes referred to as the unit-character hypothesis, in which an adult character or trait (e.g.,
feeblemindedness) was thought to be determined by a
single Mendelian gene (e.g., in this case a recessive gene,
i.e., one whose effects could be masked by its dominant
allele, or alternative form of the gene).
4. Coupled with point (3) above, eugenicists argued
that the environment in which a person was raised was
of much less importance than the germ plasm inherited
from their parents (i.e., biological inheritance) as the
cause of adverse social status, criminality, or general
social maladjustment. A corollary of the latter two points
is that such behaviors, including feeblemindedness, were
so strongly determined by biological factors that they were
virtually unchangeable. Significantly improving the cognitive ability of the feebleminded or making the criminal
into a model citizen was deemed virtually impossible.
Biology was destiny.
Major Leadership
In most countries, eugenics movements combined theory
(about the nature and pattern of inheritance) with various forms of social and political programs (from education committees to lobbying political leaders). For
example, the acknowledged leader of American eugenics,
Charles B. Davenport (1866–1944), was inspired by
spending part of a sabbatical year (1899–1900) in
London with Galton and his protégé Karl Pearson
(1857–1936) to develop eugenics as the quantitative
study of evolution. Indeed, Davenport, like Galton
and Pearson, first applied biometrical principles to the
evolution question and only secondarily to eugenics. In
the years before 1925, most eugenicists were well-respected members of the scientific community and the
eugenic ideas they espoused were not considered eccentric or bizarre. Davenport received his Ph.D. from
Harvard in 1891, taught at the University of Chicago,
and then obtained funds from the Carnegie Institution of
Washington to establish his own research laboratory, the
Station for Experimental Evolution (SEE) in 1904 at
Cold Spring Harbor, Long Island, New York, to promote
the study of heredity (soon to become known as genetics) and its relationship to selection and evolution. He
was a member of the National Academy of Sciences
(United States) and the National Research Council. In
Figure 3 The building that housed the Kaiser-Wilhelm Institute for Anthropology, Human Genetics and Eugenics, as it looks today. Photo courtesy of Garland
Allen.
What, exactly, was an alcoholic or a criminal? How was feeblemindedness defined? Recognizing that such conditions are culturally
defined, Davenport, for example, lumped all such individuals into the category of social defectives or socially
inadequate persons. Although eugenicists would have
liked to have what they could refer to as objective
and quantitative measurements, for most of the behavioral and mental traits in which they were interested, no
such definitions or measurements existed. For the most
part, they had to rely on highly qualitative, subjective
methods of defining traits and categorizing individual behavior.
Measuring the Trait of Intelligence
One trait that could be expressed quantitatively was intelligence, tests for which were developed particularly in
the United States. In 1912, Davenport arranged for his
long-time friend, Henry H. Goddard (1866–1957), then
Director of the Training School for Feebleminded Boys
and Girls at Vineland, New Jersey, to administer versions
of the French Binet-Simon test to immigrants arriving at
Ellis Island. Although the Binet-Simon test was intended
to measure only an individual's mental functioning at
a given point in time, Goddard and a host of American
psychometricians considered that it also measured innate,
or genetically determined intelligence. Goddard coined
the term feeblemindedness to refer to those people who
scored below 70 on his tests and claimed that it was
a condition of the mind or brain which is transmitted
as regularly and surely as color of hair or eyes. Because
Goddard was convinced that feeblemindedness was
a recessive Mendelian trait, he reformulated the concept
of intelligence from a continuous character to that of
a discrete character. And it was Goddard who carried
out the famous study demonstrating the supposed inheritance of mental deficiency in a New Jersey family known
by the pseudonym Kallikak.
For psychometricians and eugenicists, the belief that
their tests measured innate capacity rather than merely
accumulated knowledge meant that the tests could be
used as an instrument for carrying out educational and
social policy, not merely as a measure of an individual's
progress at a specific point in time. For eugenicists, the
new mental tests, especially the Stanford-Binet test first
published in 1916, were seen as a precise, quantitative tool
for measuring an otherwise elusive, but fundamental
human trait. The fact that much of the material, including
terminology, on which the tests were based was culture-bound did not deter psychometricians or eugenicists from
claiming that the tests measured only innate learning capacity. Even when results from the U.S. Army tests during
World War I showed that the longer recruits from immigrant families had lived in the United States, the better
they did on the tests, Carl C. Brigham (1890–1943),
a Princeton psychologist who analyzed the data, argued
that the trends showed a decline in the quality of immigrants over time, not their degree of familiarity with the
cultural content of the tests.
The Family Pedigree Method
The family pedigree chart was one of the main analytical
means of displaying and analyzing data on the heredity of
one or another behavioral trait (Fig. 4). The data that went
into constructing pedigree charts, and on which strong
hereditarian claims were based, were often anecdotal,
subjective, and many times obtained from second- and
third-hand sources. Typical examples are the family
studies carried out under the auspices of the Eugenics Record Office (ERO).
Figure 5 A family pedigree chart for thalassophilia (love of the sea) in the family of Charles
William de la Poer Beresford, traced back to the George Delaval of 1692; the chart's legend keys individuals to traits such as administrator, legislator, clergyman, fearlessness, authorship, inventiveness, musical capacity, and artistic capacity. Davenport interpreted the fact that the trait appears only in males
(squares) and can skip generations to indicate that it was a sex-linked Mendelian recessive. Reprinted from Davenport, C. B. (1919). Naval Officers, Their Heredity and Development, p. 43.
Carnegie Institution of Washington, Washington, DC.
Political Activity
From the start, most eugenicists were anxious to play a role
in the public arena, in what today would be called the formation of public policy (Karl Pearson in England was an
exception in maintaining caution in this regard, and for
that and other reasons related to the structure of British
politics, eugenicists did not influence much legislation in
Parliament). A good deal of eugenicists' efforts in other
countries, however, focused on lobbying for compulsory
sterilization laws for the genetically unfit and, especially
in the United States, for eugenically informed immigration restriction.
Passage of Eugenical Sterilization Laws
The United States pioneered in the passage of eugenical
sterilization laws. After the ERO was launched in 1910,
its superintendent, Harry Laughlin, became particularly active in lobbying for the
passage of a number of such sterilization laws at the state
level. Indeed, Laughlin drew up a Model Sterilization
Law that served as a prototype from which each state could draft its own legislation.
Criticisms of Eugenics
Almost from the beginning, many of the basic premises of
eugenics were brought under critical scrutiny by biologists, medical doctors, social workers, and laypersons from
all walks of life. Criticisms emerged in most countries by
the mid-1920s, though the reasons differed widely.
In his address "The Dominance of Economics over Eugenics," delivered at the Third International Eugenics Congress in New York City in 1932, Muller, who harbored
strong eugenical beliefs as well as socialist leanings, argued that until the economic and social environment
could be equalized it would be impossible to know how
much of any individual's social inadequacy was due to
heredity and how much to environment.
Frederick Osborn, a nephew of the prominent eugenicist Henry Fairfield Osborn, has been described as a moderate or reform eugenicist, perhaps
a misleading term but nonetheless signaling the different
approach he was to give to eugenic ideas, especially in the
postwar period. Osborn advocated not a rejection of eugenic ideology or goals, but a toning down of the overtly
racist and simplistic genetic claims of the previous generation. Osborn was the first secretary and later President
of the Pioneer Fund, set up by millionaire Wickliffe
Draper of New York, with Harry Laughlin and Madison
Grant as advisors, to demonstrate that biologically as well
as sociologically blacks were inferior to whites (in the
1950s and 1960s, Draper funded, through other channels,
a number of citizens groups in the southern United States
to oppose the civil rights movement and school integration;
the Fund exists in the early 21st century under the Presidency of psychologist J. Philippe Rushton). As a major
figure after the war in the Rockefeller-funded Population
Council, Osborn expanded eugenicists' goals from controlling the fertility of individual (what he considered
inferior) families to whole groups and nations: that is,
control of population growth, especially among the poor at
home and Third World nations abroad. Concern about
the high birth rate of non-whites was the central underlying issue of the population control movement, although
its mission was stated in less directly racist terms as saving
the planet from overpopulation. Centered primarily in
the United States, the population control movement (represented by organizations such as the Population Council, Zero Population Growth, and Planned Parenthood)
was the direct heir to the social mission of the older eugenics movement.
France
In France, eugenics was based from the outset on the
soft hereditary concepts associated with the inheritance
of acquired characteristics and so placed less emphasis
on Mendelian interpretations, i.e., the unit-character
hypothesis. The French were never able to pass any significant eugenic-based legislation and, in 1930, with promulgation of the Papal Encyclical Casti connubii, which
directly criticized eugenical sterilization, the movement
suffered a serious setback. As a result, throughout the 1930s
eugenics became much more a public health movement
than a biological determinist movement.
The Soviet Union
Although eugenics flourished in Russia both before and
immediately after the revolution of 1917, in the postrevolutionary period it was viewed as both one of the
most important applications of Marxist science to
human society (Marxism gave high priority to science
and scientific thinking applied to societal issues) and potentially as a class-based and racist ideology that denied
the possibility of human improvement. Several factors
Eugenics Today
The history of the older eugenics movement raises many
issues relevant to the expanding work in genomics, especially the Human Genome Project (HGP). Since the advent of new technologies associated with test-tube babies,
sequencing the human genome, cloning new organisms
from adult cells, stem cell research, genetic testing, and
the prospects of gene therapy, the term eugenics has once
again come into popular culture. Since it is possible,
through in utero testing, to determine whether a fetus
is male or female, has Down syndrome (trisomy 21),
or carries a mutation for Huntington's disease, cystic fibrosis, thalassemia, or Tay-Sachs disease, should these tests be required for
all pregnant women? And if so, who should have access to
the results? Can medical insurance companies refuse to
cover families or their children if the mother does not
undergo genetic testing of the fetus? Some medical ethicists argue that the outcome, controlling births in order
to reduce the number of "defective" people in society, is
identical to that issuing from the old eugenics movement;
only the means and the agency are different. According to
this view, it makes little difference whether state legislation or social and economic pressure forces people to make
reproductive decisions that they might not otherwise
make. Other ethicists, however, argue that state coercion,
as in the old eugenics movement, is qualitatively different
from various forms of social pressure, since the latter still
gives the individual some range of choice. In addition, it
can be argued that modern genetic decisions are made on
a case-by-case basis and do not involve application of
policies to whole groups defined racially, ethnically, or
nationally.
Further Reading
Adams, M. B. (ed.) (1990). The Wellborn Science: Eugenics in
Germany, France, Brazil and Russia. Oxford University
Press, New York.
Allen, G. E. (1986). The eugenics record office at Cold Spring
Harbor, 1910–1940: An essay in institutional history. Osiris
2, 225–264.
Allen, G. E. (2001). Mendel and modern genetics: The legacy
for today. Endeavour 27, 63–68.
Barkan, E. (1992). The Retreat of Scientific Racism: Changing
Concepts of Race in Britain and the United States
between the World Wars. Cambridge University Press,
New York.
Broberg, G., and Roll-Hansen, N. (eds.) (1996). Eugenics and
the Welfare State: Sterilization Policy in Denmark, Sweden,
Norway and Finland. Michigan State University Press, East
Lansing, MI.
Carlson, E. A. (2001). The Unfit: History of a Bad Idea. Cold
Spring Harbor Laboratory Press, Cold Spring Harbor,
New York.
Chase, A. (1977). The Legacy of Malthus. Alfred A. Knopf,
New York.
Kevles, D. J. (1985). In the Name of Eugenics. Alfred A. Knopf,
New York.
Event History Analysis
Glossary
censoring Happens when the timing of events for members
of a sample is observed incompletely because events do
not occur within the observation period.
competing risks Two or more distinct possible changes in
a discrete outcome (e.g., someone may leave a job because
she quits, retires, dies, or is fired).
event An essentially instantaneous change in the value of
a discrete random variable, or of some outcome measured
by such a variable.
hazard rate The limit, as Dt shrinks to zero, of the probability
that an event occurs between t and t Dt, divided by Dt,
given that the event did not occur before t.
risk set Sample members at risk of a certain event at
a particular time (e.g., only currently married people are
at risk of divorce).
spell A continuous interval of time between changes in the
value of the discrete outcome; it may be partitioned into
subspells (shorter continuous intervals of time) to record
changes in predictor variables or changes in their effects.
survival analysis The special case of event history analysis
that examines a single nonrepeatable event (e.g., death, first
birth).
transition rate Analogous to a hazard rate, but focuses on
transitions to a particular state of the discrete outcome (i.e.,
a transition to one of several competing risks).
truncation Occurs when event histories of certain cases are
excluded from the sample because their event occurred
before data collection began or after it ended.
Introduction
Empirical Applications
Social scientists in many fields collect event histories.
The usual goals are to understand the timing, spacing,
and order of events and the factors that influence
them. For example, demographers collect fertility
histories, marital and cohabitation histories, and migration histories. Demographers, economists, and sociologists assemble work histories of people. Sociologists
also gather histories of riots, strikes, and other forms of
collective action. Both political scientists and political sociologists compile histories of wars and other international
conflicts and, by contrast, of various efforts of nation-states to solve world problems (e.g., accession to treaties,
joining of international organizations). Political scientists
also examine histories of various legislative actions and of
political party changes. Organizational scientists accumulate histories of organizational foundings, mergers, and
failures, and of various actions of organizations (e.g.,
changes in organizational strategy, replacement of an
organization's chief officers, shifts in its linkages to
other organizations). Social psychologists sometimes
collect histories of the behaviors and interactions of
individuals (e.g., who talks to whom) in both experimental
and nonexperimental settings.
Social scientists may use event history analysis to address the following types of questions: Do marriages tend
to end in divorce more often and/or sooner if preceded by
the couple cohabiting? What types of workers and jobs
lead to more rapid job turnover? Which kinds of individuals are especially slow to leave unemployment? How
does police intervention in a riot affect the timing and
nature of subsequent collective action in the same locality? Which types of countries are quick (or slow) to accede
to various kinds of treaties? What attributes of businesses
are associated with a business failing or being acquired
by another firm?
Figure 1 A time path for a single case: the discrete status occupied (vertical axis, Status, coded 0–4) plotted against age in years (horizontal axis, 0 to 80).
Conceptualization of
the Change Process
In event history analysis, a time path like the one in Fig. 1
is customarily assumed to represent reality for a member
of the sample. But a specific conceptualization of the
change process, in this example of starting and stopping
jobs, influences the picture in important ways. In particular, analysts may differ as to which statuses should be
distinguished and which ones should be combined. For
example, some might distinguish various kinds of jobs
(e.g., in terms of occupation) and treat "unemployed"
and "not in the labor force" as distinct ways of being
out of work. Others might combine "never worked"
with "out of work" into one status called "not working."
Further, transitions from one status to another are not
always instantaneous. If changes are not instantaneous, analysts may make different decisions about the time of each
event. For example, an analyst might decide that the date of
a divorce occurs when a couple ceases to live together, files
for divorce, or is officially divorced. Such decisions should
be grounded in social scientific theory, but practical considerations also play a role. Available methodological tools
give limited guidance concerning the consequences of making one decision rather than another. Making finer (coarser)
distinctions about statuses tends to increase (decrease) the
number of events and to shorten (lengthen) the time between events. Consequently, these kinds of decisions can
profoundly affect overall findings.
Data
Information like that shown graphically for a single case in
Fig. 1 must be converted into a form of data suitable for
statistical analysis of event histories. How information is
coded and put into machine-readable form depends on
the computer software that will be used for the event
history analysis. A wide variety of software is available,
and new software and new extensions of existing software
appear intermittently.
There is no single standard way of organizing event
histories as data. From a statistical viewpoint, one way is
not intrinsically better than another, but some ways yield
data that are easier to manipulate and require less storage
space. One common way is to subdivide each case's event
history into a series of spells. A spell refers to a continuous
interval of time when the value of the discrete outcome does
not change. It ends when an event occurs or when the
observation period ends so that the spell is censored on
the right. A spell that begins with an event and
terminates either with another event or with the end of
the observation period may be subdivided further into
a series of subspells (shorter time intervals), in which covariates or their effects may change. Splitting spells defined
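To illustrate the spell organization just described, here is a minimal Python sketch (ours; the representation is one of many workable ones) that converts a single case's event history into spells, marking the last spell as right-censored.

```python
# Sketch (invented history) of organizing one case's event history as
# spells: each spell records a status, its start and end, and whether it
# ended with an event or was right-censored at the end of observation.
from dataclasses import dataclass

@dataclass
class Spell:
    status: str
    start: float
    end: float
    censored: bool   # True if cut off by the end of the observation period

def to_spells(history, observation_end):
    """history: list of (time, status) transitions; first entry is the initial state."""
    spells = []
    for (t0, status), nxt in zip(history, history[1:] + [None]):
        end = observation_end if nxt is None else nxt[0]
        spells.append(Spell(status, t0, end, censored=nxt is None))
    return spells

# A work history: not working until 18, a job until 30, out of work
# until 31, then another job still in progress when observation ends.
history = [(0, "not working"), (18, "job"), (30, "out of work"), (31, "job")]
for s in to_spells(history, observation_end=50):
    print(s)
```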
Statistical Concepts
Survival Probability (Survivor Function)
Consider an event in an individual's lifetime, assumed to
begin at t0, when the individual may first become at risk of
the event. For simplicity, t0 is assumed to be zero and is
omitted from subsequent equations. The time of the first
event is a random variable, T1, or simply T for a single
nonrepeatable event, such as death or having a first birth. The survival probability, or survivor function, S(t) = Pr(T > t), gives the probability that the event has not yet occurred by time t.
Hazard Rate
A particularly important concept is the hazard rate,
also called the failure rate (especially in the engineering
literature):
h(t) = lim_{Δt→0} Pr(t ≤ T < t + Δt | T > t)/Δt = f(t)/S(t) = −d ln S(t)/dt,

where f(t) is the probability density function of T.
Transition Rate
An especially important concept when there are competing risks is the transition rate (also known as the transition
intensity, or as the instantaneous rate of a transition) to state k,

rk(t) = h(t)mk(t),

where mk(t) is the conditional probability that an event occurring at time t is a transition to state k.
Exploratory Analyses
Exploratory analyses rely primarily on estimating the
statistics defined in the previous section and then
examining how the estimates vary over time and across
subgroups of cases distinguished on the basis of proposed
predictor variables. These methods are especially useful
in event history analysis because patterns of variation over
time and across cases are complex and are not easily
captured by a single statistic.
Survival Probability
When right-censoring occurs and is independent of the
occurrence of the event being studied, the product-limit
estimator of S(t) proposed by E. L. Kaplan and Paul Meier
(KM) in 1958 is unbiased and asymptotically consistent:
ŜKM(t) = ∏_{t(i) ≤ t} [1 − di/ni],   t(i−1) ≤ t < t(i),   (8)

where di is the number of events observed at time t(i) and ni is the number of cases still at risk just before t(i). The analogous estimator of the cumulative hazard H(t) is

Ĥ(t) = Σ_{t(i) ≤ t} di/ni,   t(i−1) ≤ t < t(i).   (11)
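The product-limit computation in Eq. (8) and the cumulative-hazard sum in Eq. (11) are easy to trace in code. The following Python sketch is ours, not the article's; the data are invented, and a censoring tied with an event at the same time keeps the censored case in the risk set at that time.

```python
# Minimal sketch (invented data) of the Kaplan-Meier product-limit
# estimator of S(t), Eq. (8), and the cumulative-hazard estimator,
# Eq. (11). event = 0 marks a right-censored observation.
from collections import Counter

def km_and_cumhaz(times, events):
    """Return (t, S_hat, H_hat) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)          # all exits (events or censorings) at each time
    n_at_risk = len(times)
    s_hat, h_hat, out = 1.0, 0.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:                   # the estimates step only at event times
            s_hat *= 1.0 - d / n_at_risk   # product-limit factor, Eq. (8)
            h_hat += d / n_at_risk         # cumulative-hazard increment, Eq. (11)
            out.append((t, s_hat, h_hat))
        n_at_risk -= exits[t]       # everyone exiting at t leaves the risk set
    return out

times  = [2, 3, 3, 5, 7, 8, 8, 9]   # e.g., years until leaving a job
events = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = event observed, 0 = censored
for t, s, h in km_and_cumhaz(times, events):
    print(f"t={t}: S={s:.3f}, H={h:.3f}")
```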
Hazard Rate
There is no unambiguously best estimator of hazard
rates. One estimator is based on the Kaplan–Meier
estimator of the survival probability at time t:

ĥKM(t) = −[1/Δt(i)] ln(1 − di/ni),   t(i−1) ≤ t < t(i),   (13)

where Δt(i) = t(i) − t(i−1) is the length of the interval. Two alternative estimators, given in Eqs. (14) and (15), are defined over the same intervals.
Transition Rate
The transition rate to state k, rk(t), is estimated using
analogues to Eqs. (13)–(15), except that di in those
equations is replaced by dk(i), the count of the events
consisting of a transition to state k at time t(i). For example,
the Nelson–Aalen estimator of the transition rate to
state k is:

r̂k(t) = [1/Δt(i)] [dk(i)/ni],   t(i−1) ≤ t < t(i).   (16)
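Continuing in the same spirit, Eq. (13) and Eq. (16) can be evaluated directly; the sketch below is illustrative only, with invented counts.

```python
# Sketch (invented counts) of the hazard estimator of Eq. (13) and the
# transition-rate estimator of Eq. (16) over a single interval.
import math

def hazard_km(d, n, dt):
    """Eq. (13): h_hat = -(1/dt) ln(1 - d/n), d events among n at risk."""
    return -math.log(1.0 - d / n) / dt

def transition_rate(d_k, n, dt):
    """Eq. (16): r_hat_k = (1/dt)(d_k/n), counting only transitions to state k."""
    return (d_k / n) / dt

# Of 40 people at risk over a 1-year interval, 4 leave their jobs:
# 3 quit (transitions to state k = "quit") and 1 is fired.
print(hazard_km(d=4, n=40, dt=1.0))          # overall hazard, about 0.105
print(transition_rate(d_k=3, n=40, dt=1.0))  # quit rate, 0.075
```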
Confirmatory Analyses
Exploratory analyses are usually followed by confirmatory
analyses in which researchers seek to model the process
generating events in terms of proposed explanatory
variables and some basic underlying pattern of time
dependence. After formulating a model, they want to estimate it and test it against plausible alternatives. In principle, any of the previously defined statistical concepts
could be the primary focus of a model. For various
reasons, the most appropriate statistic is often the hazard
rate, or the transition rate if there are competing risks.
Another popular choice is the mean time between events,
or the mean logarithm of the time between events.
Specification

The most common parametric specifications of the hazard rate h(t) are summarized below, together with the corresponding integrated hazard H(t) = ∫0^t h(u) du. All are expressed in terms of time t (with t0 = 0) and parameters (denoted a, b, d, γ, δ, μ, and σ); Γ(b) denotes the gamma function, ∫0^∞ u^(b−1) e^(−u) du, and Φ denotes the cumulative distribution function of the standard normal distribution.

Constant
  Exponential: h(t) = a; H(t) = at.

Monotonic
  Gamma: h(t) = a(at)^(b−1) e^(−at) / [Γ(b) − ∫0^t a(au)^(b−1) e^(−au) du]; H(t) = −ln{[Γ(b) − ∫0^t a(au)^(b−1) e^(−au) du] / Γ(b)}.
  Gompertz: h(t) = b e^(γt); H(t) = (b/γ)(e^(γt) − 1).
  Makeham: h(t) = a + b e^(γt); H(t) = at + (b/γ)(e^(γt) − 1).
  Rayleigh: h(t) = a + bt; H(t) = at + bt²/2.
  Pareto: h(t) = a/(t + b); H(t) = a[ln(t + b) − ln b].
  Weibull: h(t) = b(t + d)^γ; H(t) = [b/(γ + 1)][(t + d)^(γ+1) − d^(γ+1)].

Nonmonotonic
  Generalized Rayleigh: h(t) = a + bt + γt²; H(t) = at + (b/2)t² + (γ/3)t³.
  Log-logistic: h(t) = δγ(δt)^(γ−1) / [1 + (δt)^γ]; H(t) = ln[1 + (δt)^γ].
  Log-Gaussian: h(t) = (σt√(2π))^(−1) exp{−[ln t − μ]²/(2σ²)} / {1 − Φ[(ln t − μ)/σ]}; H(t) = −ln{1 − Φ[(ln t − μ)/σ]}.
  Sickle: h(t) = bt e^(γt); H(t) = (b/γ²)[(γt − 1) e^(γt) + 1].

These specifications show that h(t) may be constant (i.e., not varying with time), may change monotonically (either increasing or decreasing with time), or may change nonmonotonically (typically having only one inflection point).
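Each row above pairs a hazard with its integral, which can be verified numerically. A minimal Python sketch for the Gompertz specification (parameter values invented) follows.

```python
# Sketch (invented parameters) checking one row above: the closed-form
# integrated hazard H(t) for the Gompertz model should equal the
# numerical integral of h(t) over [0, t].
import math

def gompertz_h(t, b, g):              # h(t) = b exp(g t)
    return b * math.exp(g * t)

def gompertz_H(t, b, g):              # H(t) = (b/g)(exp(g t) - 1)
    return (b / g) * (math.exp(g * t) - 1.0)

def numeric_H(h, t, n=10000):         # trapezoid-rule integral of h over [0, t]
    dt = t / n
    return sum(0.5 * (h(i * dt) + h((i + 1) * dt)) * dt for i in range(n))

b, g, t = 0.02, 0.08, 10.0
print(gompertz_H(t, b, g))                          # closed form
print(numeric_H(lambda u: gompertz_h(u, b, g), t))  # numerical check (agrees)
```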
A widely used specification is the proportional hazards model, in which the hazard rate factors into a function of time and a function of the covariates,

h(t) = q(t) y[xi(t)],   (17)

so that the ratio of the hazard rates of two cases i and j,

q(t) y[xi(t)] / {q(t) y[xj(t)]} = y[xi(t)] / y[xj(t)],   (18)

does not depend on time. The time dependence may, for example, take a Gompertz form,

h(t) = b exp(ct),   (19a)
ln h(t) = ln b + ct.   (19b)

The covariate function is often specified multiplicatively,

y[x(t)] = ∏_{j=1}^{J} gj^{xj(t)},   (20a)

where J is the number of covariates, or, writing gj = exp(bj), in exponential form,

y[x(t)] = exp(b′x),   (21a)
ln y[x(t)] = b′x,   (21b)

so that each covariate has an additive effect on the log of the hazard rate (or log transition rate). Combining the Gompertz time dependence with the exponential covariate function gives

h(t) = exp(b′x + ct),   (22a)
ln h(t) = b′x + ct.   (22b)

The model can be broadened to include a vector z(t) of time-varying terms (time-varying covariates or specified functions of time) with coefficients g:

h(t) = exp[b′x + g′z(t)],   (23a)
ln h(t) = b′x + g′z(t).   (23b)

Alternatively, the time dependence may be treated as piecewise constant, taking a separate value within each of a set of periods tp−1 ≤ t < tp.   (24b)
In the accelerated failure time (AFT) approach, models are built around a baseline event time T0 with hazard rate

h0(t) = f0(t)/S0(t),   (25)
where f0(t) and S0(t) refer to the PDF and the survival
probability associated with T0. In Eq. (25), the subscript
0 indicates that the quantities refer to the baseline. In an
AFT model, it is assumed that the CDF of the time of an
event Ti for a case i with covariates xi equals the
CDF for T0 (the baseline) with its time argument multiplied by some function of the covariates.
Model Estimation

Maximum-Likelihood Estimation

For the AFT model, the hazard rate for a case i with covariates xi is

h(t | xi) = f(t | xi)/S(t | xi)   (27)
          = f0[exp(−b′xi)t] exp(−b′xi) / S0[exp(−b′xi)t]   (28)
          = h0[exp(−b′xi)t] exp(−b′xi),   (29a)

which implies that

E(log Ti) = b′xi + E(log T0).   (29b)

Let ci = 1 if case i is right-censored and ci = 0 otherwise. Because f(t) = h(t)S(t), the log-likelihood for a sample of I cases can be written as

ln L = Σ_{i=1}^{I} [(1 − ci) ln f(ti, xi) + ci ln S(ti, xi)]   (30a)
     = Σ_{i=1}^{I} [(1 − ci) ln h(ti, xi) + ln S(ti, xi)].   (30b)
Partial-Likelihood Estimation
The method of partial likelihood (PL) can be used to
estimate the proportional hazard rate model in which
q(t) is taken to be a nuisance function that is not directly
estimated. The partial likelihood is
Lp = ∏_{i=1}^{I} { y[xi(ti)] / Σ_{v ∈ R(ti)} y[xv(ti)] },   (31)

where R(ti) denotes the risk set at time ti.
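A minimal sketch of Eq. (31) for one covariate, taking y(x) = exp(bx) with no ties and no censoring (data invented); note that the nuisance function q(t) never appears:

```python
# Sketch (invented data) of the log of the partial likelihood, Eq. (31),
# for a single covariate with y(x) = exp(b*x); no ties, no censoring.
import math

def log_partial_lik(b, times, xs):
    """Sum over events of ln{ y(x_i) / sum of y(x_v) over the risk set }."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ll = 0.0
    for pos, i in enumerate(order):
        risk_set = order[pos:]       # cases whose event times are >= t_i
        denom = sum(math.exp(b * xs[v]) for v in risk_set)
        ll += b * xs[i] - math.log(denom)
    return ll

times = [2.0, 5.0, 3.0, 8.0]
xs    = [1.0, 0.0, 1.0, 0.0]
for b in (-0.5, 0.0, 0.5, 1.0):
    print(b, round(log_partial_lik(b, times, xs), 4))  # choose b maximizing this
```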
Further Reading
Blossfeld, H.-P., and Rohwer, G. (2002). Techniques of Event
History Modeling: New Approaches to Causal Analysis, 2nd
Ed. Lawrence Erlbaum, Mahwah, New Jersey.
Cox, D. R., and Oakes, D. (1984). Analysis of Survival Data.
Chapman and Hall, London.
Petersen, T. (1991). The statistical analysis of event histories.
Sociol. Methods and Res. 19, 270–323.
Tuma, N. B., and Hannan, M. T. (1984). Social Dynamics:
Models and Methods. Academic Press, Orlando.
Wu, L. L. (2003). Event history models in life course analysis.
In Handbook of the Life Course (J. Mortimer and
M. Shanahan, eds.), pp. 477–502. Plenum, New York.
Experimenter Effects
Robert Rosenthal
University of California, Riverside, Riverside, California, USA
Glossary
Noninteractional Effects
Observer Effects
Interpreter Effects
The interpretation of the data collected is part of the
research process, and a glance at any of the technical
journals of the contemporary behavioral and social sciences will suggest strongly that researchers only rarely
debate each other's observations, but they often debate
the interpretation of those observations. It is as difficult
to state the rules for accurate interpretation of data as it is
to define accurate observation of data, but the variety of
interpretations offered in explanation of the same data
implies that many interpreters must turn out to be
wrong. The history of science generally, and the history
of psychology more specifically, suggest that more observers and interpreters are wrong longer than is necessary because theories are not held quite lightly enough.
The common practice of theory monogamy has its advantages, however. It does maintain motivation to make more
crucial observations. In any case, interpreter effects seem
less serious than observer effects. The reason is that the
former are public whereas the latter are private. Given
a set of observations, their interpretations become generally available to the scientific community. There is freedom to agree or disagree with any specific interpretation.
This is not so with observations. Often these are made by
a single investigator, so others are not simply free to agree
or disagree. Instead, it can only be hoped that no observer
errors occurred, and the observations can and should be
repeated by others.
Some kinds of common interpreter effects in the behavioral and social sciences may be reducible by means of
improved research and data analytic methodology. For
example, the interpretations by many psychotherapy researchers of the literature on psychotherapy outcome, i.e.,
that psychotherapy does not work, held sway for a long
time until, in 1980, Smith, Glass, and Miller showed on
the basis of quantitative, comprehensive analyses of all the
literature that the interpretation that psychotherapy did
not work was not only in error but was in error to a large
and specifiable degree.
As recently as 25 years ago, bodies of literature were
little more than massive inkblots on which the interpreter
could impose a wide range of interpretations. Now,
however, the variety of meta-analytic procedures that
have been evolving makes the summarizing of entire
research domains more objective, more systematic, and more quantitative. As more behavioral researchers employ these newer procedures of meta-analysis, the rate of
Intentional Effects
It happens sometimes in undergraduate laboratory science courses that students collect and report data too
beautiful to be true. (That probably happens most often
when students are taught to be scientists by being told
what results they must get to do well in the course, rather
than being taught the logic of scientific inquiry and the
value of being quite open-eyed and open-minded.) Unfortunately, the history of science tells us that not only
undergraduates have been dishonest in science. Fortunately, such instances are rare; nevertheless, intentional
effects must be regarded as part of the inventory of the
effects of investigators. Four separate reports on important cases of scientific error that were very likely to have
been intentional have been authored by Hearnshaw,
Hixson, Koestler, and Wade, respectively. The first and
last of these, of greatest relevance to behavioral researchers, describe the case of the late Cyril Burt, who, in three
separate reports of over 20, over 30, and over 50 pairs of
twins, reported a correlation coefficient between intelligence quotient (IQ) scores of these twins, who had been
raised apart, of exactly 0.771 for all three studies! Such
consistency of correlation would bespeak a statistical miracle if it were real, and Wade credits Leon Kamin for
having made this discovery on what surely must have
been the first careful critical reading of Burt. Although
the evidence is strong that Burt's errors were intentional,
it is not possible to be sure of the matter. That is often the
case also in instances when fabrication of data is not the
issue, but when there are massive self-serving errors of
citation of the literature.
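The "statistical miracle" claim is a quantitative one and can be checked by simulation. In the sketch below (ours, not from the literature cited) the population correlation 0.77 and the sample sizes 21, 31, and 53 pairs are assumed for illustration; statistics.correlation requires Python 3.10 or later.

```python
# Sketch (assumed population rho and sample sizes): how often would
# three correlations from independent samples of 21, 31, and 53 pairs
# drawn from a population with rho = 0.77 agree to three decimals?
import random
import statistics

def sample_r(n, rho, rng):
    """Correlation of n pairs from a bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + (1 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0))
    return statistics.correlation(xs, ys)

rng = random.Random(0)
trials, hits = 5000, 0
for _ in range(trials):
    rs = [sample_r(n, 0.77, rng) for n in (21, 31, 53)]
    if len({round(r, 3) for r in rs}) == 1:   # all three match to 3 decimals
        hits += 1
print(hits / trials)   # effectively zero: such agreement almost never happens
```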
Intentional effects, interpreter effects, and observer
effects all operate without experimenters affecting
their subjects' responses to the experimental task. In
those effects of the experimenters to be described next,
however, the subjects' responses to the experimental task
are affected.
Interactional Effects
Biosocial Effects
The sex, age, and race of the investigator have all been
found to predict the results of his or her research. It is
not known, but it is necessary to learn, whether participants respond differently simply in reaction to experimenters varying in these biosocial attributes or whether
experimenters varying in these attributes behave differently
toward the participants; in the latter case, experimenters
obtain different responses from participants because the
experimenters have, in effect, altered the experimental situation for the participants. So far, the evidence suggests that
male and female experimenters, for example, conduct the
same experiment quite differently, so the different results
they obtain may well be due to the fact that they unintentionally conducted different experiments. Male experimenters, for example, were found in two experiments to
be more friendly to participants. Biosocial attributes of participants can also affect experimenters behavior, which in
turn affects participants responses. In one study, for example, the interactions between experimenters and the participants were recorded on sound films. In that study, it was
found that only 12% of the experimenters ever smiled at the
male participants, whereas 70% of the experimenters
smiled at the female participants. Smiling by the experimenters, it was found, predicted the results of the experiment. The moral is clear. Before claiming a sex difference in
the results of behavioral research, one must first ensure that
males and females are treated identically by experimenters.
If the treatment is not identical, then sex differences may be
due not to genic, or constitutional, or enculturational, or
other factors, but simply to the fact that males and females
are not really in the same experiment.
Psychosocial Effects
The personality of the experimenter has also been found
to predict the results of his or her research. Experimenters
who differ in anxiety, need for approval, hostility, authoritarianism, status, and warmth tend to obtain different
responses from experimental participants. Experimenters
higher in status, for example, tend to obtain more
conforming responses from participants, and experimenters who are warmer in their interaction with participants
tend to obtain more pleasant responses. Warmer examiners administering standardized tests of intelligence are
likely to obtain better intellectual performance than are
cooler examiners, or examiners who are more threatening
or more strange to the examinees.
Situational Effects
Experimenters who are more experienced at conducting
a given experiment obtain different responses from study
participants as compared to their less experienced colleagues. Experimenters who are acquainted with the participants obtain different responses than do their
colleagues who have never previously met the participants. The things that happen to experimenters during the
course of their experiment, including the responses they
obtain from their first few participants, can influence
the experimenters behavior, and changes in their behavior can predict changes in the participants responses.
When the first few study participants tend to respond
as they are expected to respond, the behavior of the
experimenter appears to change in such a way as to influence the subsequent participants to respond too often
in the direction of the experimenters hypothesis.
Modeling Effects
It sometimes happens that before experimenters conduct their study, they try out the task they will later
have their research participants perform. Though the evidence on this point is not all that clear, it would seem that
at least sometimes, the investigators' own performance
becomes a predictor of their subjects' performance.
For example, when interviewers speak in longer sentences, their research participants tend also to speak in
longer sentences.
Expectancy Effects
Some expectation of how the research will turn out is
virtually a constant in science. Psychologists, like other
scientists generally, conduct research specifically to test
hypotheses or expectations about the nature of things. In
the behavioral sciences, the hypothesis held by the investigators can lead them unintentionally to alter their behavior toward their participants in such a way as to
increase the likelihood that participants will respond so
as to confirm the investigators' hypothesis or expectations.
This describes, in essence, the investigators' hypothesis as a self-fulfilling prophecy. An event is prophesied and the expectation of the event then changes the
behavior of the prophet in such a way as to make the
prophesied event more likely. The history of science
documents the occurrence of this phenomenon, with
the case of Clever Hans serving as a prime example.
Hans was the horse of Mr. von Osten, a German mathematics instructor. By tapping his foot, Hans was able to
perform difficult mathematical calculations and he could
spell, read, and solve problems of musical harmony.
A distinguished panel of scientists and experts on animals
ruled that no fraud was involved. There were no cues
given to Hans to tell him when to start and when to
stop the tapping of his foot. But, of course, there were
such cues, though it remained for Oskar Pfungst to demonstrate that fact. Pfungst, in a series of brilliant experiments, showed that Hans could answer questions only
when the questioners or experimenters knew the answer
and were within Hans' view. Finally, Pfungst learned that
a tiny forward movement of the experimenter's head was
a signal for Hans to start tapping. A tiny upward movement of the head of the questioner or a raising of the
eyebrows was a signal to Hans to stop his tapping.
Hans' questioners expected Hans to give correct answers,
and this expectation was reflected in their unwitting signal
to Hans that the time had come for him to stop tapping.
Animal Experiments
Twelve experimenters were each given five rats that were
to be taught to run a maze with the aid of visual cues. Half
of the experimenters were told their rats had been specially bred for maze-brightness; half of the experimenters
were told their rats had been specially bred for maze-dullness. Actually, of course, rats had been assigned at
random to each of the two groups. At the end of the
experiment, the results were clear. Rats that had been
run by experimenters expecting brighter behavior showed
significantly superior learning compared to rats run by
experimenters expecting dull behavior. The experiment
was repeated, this time employing a series of learning
Implications
Three kinds of implications flow from the work on interpersonal self-fulfilling prophecies. The first are the methodological implications for the conduct of scientific
inquiry to minimize experimenter expectancy effects (and other effects of the experimenter).
For example, the use of double-blind research designs
flows directly from the need to control a variety of experimenter effects, including the effects of experimenter
expectancy. More indirect consequences of a methodological sort flowing from this research are some of the
work focusing on newer procedures for the analyses of
scientific data, including work on meta-analysis, on contrast analysis, and other developments in significance
testing and effect size estimation.
A second line of implications involves those for the
study of nonverbal communication. For some 35 years,
there have been strong reasons to suspect that the mediation of interpersonal expectancy effects depends heavily
on unintended nonverbal communication, and this has
generated a good deal of work on that topic. A third
line of implications involves those for the practical
consequences of these phenomena, for example, in
classrooms, clinics, corporations, courtrooms, and, in particular, interpersonal contexts, such as those involving
gender and race.
All three types of implications have been investigated
intensively, but much of what needs to be known is not yet
known. It seems very likely, however, that efforts to fill
these gaps in knowledge will be repaid by noticeable
progress in the methodological and the substantive
development of the sciences.
Further Reading
Ambady, N., and Rosenthal, R. (1992). Thin slices of
expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychol. Bull. 111, 256–274.
Blanck, P. D. (ed.) (1993). Interpersonal Expectations: Theory,
Research, and Applications. Cambridge University Press,
New York.
Cooper, H. M., and Hedges, L. V. (eds.) (1994). The Handbook
of Research Synthesis. Russell Sage, New York.
Boring, E. G. (1950). A History of Experimental Psychology,
2nd Ed. Appleton-Century-Crofts, New York.
Harris, M. J., and Rosenthal, R. (1985). The mediation of
interpersonal expectancy effects: 31 meta-analyses. Psychol.
Bull. 97, 363–386.
Hearnshaw, L. S. (1979). Cyril Burt: Psychologist. Cornell
University Press, Ithaca, New York.
Hixson, J. (1976). The Patchwork Mouse. Anchor Press/
Doubleday, Garden City, New York.
Koestler, A. (1971). The Case of the Midwife Toad. Random
House, New York.
Lipsey, M. W., and Wilson, D. B. (2001). Practical Metaanalysis. Sage, Thousand Oaks, California.
Pfungst, O. (1907). Clever Hans. Barth, Leipzig. (Translated in
1911 by C. L. Rahn; Henry Holt, New York.).
Rosenthal, R. (1976). Experimenter Effects in Behavioral
Research. Irvington Publ., Halsted Press, Wiley, New York.
Rosenthal, R. (1991). Meta-analytic Procedures for Social
Research. Sage, Newbury Park, California.
Rosenthal, R. (1994). Interpersonal expectancy effects: A
30-year perspective. Curr. Direct. Psychol. Sci. 3, 176–179.
Rosenthal, R., and Jacobson, L. (1968). Pygmalion in the
Classroom. Holt, Rinehart and Winston, New York.
Rosenthal, R., and Rubin, D. B. (1978). Interpersonal
expectancy effects: The first 345 studies. Behav. Brain
Sci. 3, 377–386.
Rosenthal, R., Rosnow, R. L., and Rubin, D. B. (2000).
Contrasts and Effect Sizes in Behavioral Research:
A Correlational Approach. Cambridge University Press,
New York.
Smith, M. L., Glass, G. V., and Miller, T. I. (1980). The
Benefits of Psychotherapy. Johns Hopkins University Press,
Baltimore.
Wade, N. (1976). IQ and heredity: Suspicion of fraud beclouds
classic experiment. Science 194, 916–919.
Experiments, Criminology
David Weisburd
Hebrew University, Jerusalem, Israel, and University of Maryland,
College Park, Maryland, USA
Anthony Petrosino
Nashua, New Hampshire, USA
Glossary
block randomization A type of random allocation that is
stratified in order to maximize equivalence of experimental
groups on key indicators. Block-randomized experiments
also allow examination of interactions between blocking
factors and outcomes in an experimental context.
control group The group in an experiment that does not
receive the proposed intervention; often defined as
a comparison group because it usually receives some form
of traditional criminal justice intervention that is compared
to the experimental condition.
cluster randomized trials A type of experiment in which
large units, such as drug markets, schools, police beats, or
prison living units, are randomly assigned.
eligibility pool The cases or units of analysis that are eligible
for inclusion in an experiment.
experimental criminology The application of randomized
studies to understand crime and justice issues.
experimental group The group in an experiment that
receives the innovative intervention or treatment.
external validity A measure of the extent to which findings or
results from a study sample are seen to represent the
characteristics of the larger population of interest.
evidence-based policy An approach that encourages the
development of public policy based on research findings.
Randomized experiments are generally considered an
important component of strong evidence-based policy.
internal validity A measure of the extent to which an
evaluation design can rule out alternative explanations for
the observed findings; a research design in which the
effects of treatment or intervention can be clearly
distinguished from other effects is defined as having high
internal validity.
Introduction
In criminology, experiments involving programs, policies,
or practices are an important research design because,
when implemented with full integrity, they provide the
Rationale
The major problem that faces evaluation researchers in
criminology, and indeed in the social sciences more generally, is that causes and effects are extremely difficult to
isolate in the complex social world in which treatments
and programs are implemented. The finding that some
people, institutions, or places do better after treatment is
always confronted with the challenge that improvement
was not because of treatment, but because of some other
confounding factor that was not measured. Sometimes
that confounding factor derives from the nature of the
selection processes that lead some people to gain treatment. For example, if a program relies on volunteers, it
may recruit people who are likely to improve in regard to
drug use, crime, or other measures, simply because they
were ready and motivated to improve when they
volunteered. Or if an evaluation compares program completers to those who dropped out, the results may be
confounded by differences (other than prolonged exposure to the program) between those who stay and those
who do not. Sometimes the confounding is simply a matter
of the natural course of events. For example, when sites
for intervention are chosen, it is often because of the need
to address existing serious crime problems. Though
choosing sites with very serious problems makes sense
in terms of distributing scarce criminal justice resources,
it may be that the unusually high crime rates observed
would naturally decline even if there was no intervention
(the technical term for this phenomenon is regression to
the mean).
An evaluation design's ability to rule out alternative
explanations for observed findings is measured in terms
of internal validity. A research design in which the effects
of treatment or intervention can be clearly distinguished
from other effects is defined as having high internal
validity. A research design in which the effects of treatment are confounded with other factors is one in which
there is low internal validity. For example, suppose
a researcher seeks to assess the effects of a specific
drug treatment program on recidivism. If, at the end
of the evaluation, the researcher can present
study results and confidently assert that the effects
Barriers to Experimentation in
Crime and Justice
Despite the theoretical benefits of experimental study
in terms of internal validity, some scholars (e.g., Clarke
and Cornish, in 1972, and Pawson and Tilley, in 1997)
have argued that practical and ethical barriers limit the
use of randomized experiments in real crime and justice
contexts. Ordinarily, such concerns relate to the question
of whether the random allocation of sanctions, programs,
or treatments in criminal justice settings can be justified
on the basis of the benefits accrued to society. Or conversely, the concern is whether the potential costs of not
Structure of a Typical
Criminological Experiment
Though the substantive area and measurements can be
different, experiments in criminology do not differ structurally from other randomized experiments. Criminological experiments typically have an eligibility pool,
randomization, experimental and control groups, and
post-test or follow-up measures relevant to crime and
justice (Fig. 1). The eligibility pool is composed of
those cases or units of analysis that are eligible for the
experiment. For example, in an experimental test of sex
offender treatment, the eligibility pool may consist of only
those sex offenders who volunteered for treatment, have
a certain percentage of time left on their sentence, and
have no prior sexual crime history. The eligibility pool
is thus composed of all sex offenders who meet these
criteria for inclusion in the study. Researchers then randomly assign members of this eligibility pool to the study
conditions. Randomization often follows a simple division
between treatment and control or comparison conditions.
However, some studies use stratification procedures in
randomization; this ensures greater equivalence between
study groups on the characteristics used for stratification.
Such stratification is called blocking, and criminal justice
experiments may block on characteristics such as gender,
education, or criminal background. Block randomization procedures not only enhance equivalence on the
traits used in stratification, they also allow experimental
analysis of interactions between those traits and outcomes. For example, in a block-randomized experiment
in which randomization has been stratified by gender, the
researcher can examine not only the average difference in
outcomes between treatment and control groups, but also
whether the treatment works differently for men or
women. Importantly, the use of block randomization
Figure 1 The structure of a typical criminological experiment: an eligibility pool is created, eligible cases are randomized, and outcome measures are collected for both the experimental and the control groups.
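The following Python sketch (ours; the case attributes and group labels are hypothetical) illustrates the block randomization procedure described above, stratifying an eligibility pool by gender so that the two study groups are balanced on that trait.

```python
# Sketch (hypothetical data) of block randomization: cases are grouped
# by a blocking factor (here, gender) and randomized to experimental or
# control within each block, guaranteeing balance on that factor.
import random

def block_randomize(cases, block_key, rng):
    """cases: list of dicts; block_key: the attribute used for stratification."""
    blocks = {}
    for case in cases:
        blocks.setdefault(case[block_key], []).append(case)
    assignment = {}
    for block in blocks.values():
        rng.shuffle(block)
        half = len(block) // 2
        for case in block[:half]:
            assignment[case["id"]] = "experimental"
        for case in block[half:]:
            assignment[case["id"]] = "control"
    return assignment

cases = [{"id": i, "gender": g} for i, g in enumerate("MMMMFFFF")]
print(block_randomize(cases, "gender", random.Random(42)))
```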
[Figure: a bar chart by five-year period, 1945–50 through 1991–93, with values rising from near zero in the earliest periods to a peak of 11.6.]
Further Reading
Binder, A., and Meeker, J. W. (1988). Experiments as reforms.
J. Crim. Just. 16, 347–358.
Boruch, R. F. (1997). Randomized Experiments for Planning
and Evaluation. A Practical Guide. Sage, Thousand Oaks,
California.
Boruch, R. F., Snyder, B., and DeMoya, D. (2000). The
importance of randomized field trials. Crime Delinq. 46,
156–180.
Campbell, D. T. (1969). Reforms as experiments. Am. Psychol.
24, 409–429.
Campbell, D. T., and Stanley, J. (1963). Experimental and
Quasi-Experimental Designs for Research. Houghton-Mifflin, Boston.
Clarke, R., and Cornish, D. (1972). The Controlled Trial in
Institutional Research: Paradigm or Pitfall for Penal
Evaluators? H. M. Stationery Office, London.
Cullen, F., and Gendreau, P. (2000). Assessing correctional
rehabilitation: Policy, practice, and prospects. In
Experiments, Overview
George Julnes
Utah State University, Logan, Utah, USA
Glossary
context-confirmatory inquiry Quantitative or qualitative
research in which initial hypotheses are tested as in
a traditional deductive experiment, but then subsequent
hypotheses (e.g., of moderated relationships not entailed by
the main effects studied first) are developed and subjected
to a confirmatory test using the same data or other
information available in that context.
controlled experiment Inquiry in which the researcher
manipulates the independent variable(s), with some control
over other factors, in order to observe the effects on the
dependent variable(s).
critical experiment An experiment that provides a definitive
test of two or more competing theories.
experiment A deliberate intervention guided by and in
service of promoting our understanding of the structures
and causal processes of our world.
falsificationism An epistemological and methodological
stance in which knowledge is viewed as advancing by
subjecting the implications of theories to empirical tests
and dismissing as falsified those theories not supported.
hypothetico-deductive inquiry Confirmatory research
guided by deducing implications of preferred theories and
then testing whether those implications are consistent with
the observed patterns.
INUS (insufficient but necessary element of an unnecessary
but sufficient package) conditions Causation conceived in
terms of an insufficient but necessary element achieving
a causal impact by virtue of being part of an unnecessary
but sufficient package.
natural experiment Observations around an event uncontrolled by the researcher but believed to be sufficiently
powerful as to reveal a causal relationship between the
event and changes in outcomes associated with the
event.
randomized experiment A controlled experiment in which
the assignment of levels of the independent variable(s) is
performed based on a random process; ideally, this involves
matching subjects into pairs (or triads, etc.) and randomly
Introduction
Experiments have been central to the development of
the social sciences. What we understand as the experimental method, however, (1) has evolved over centuries,
(2) is somewhat different in different social sciences, and
(3) continues to evolve. Accordingly, we must acknowledge several definitions of experiment, beginning with
Varieties of Research
Experiments
What we accept as the proper definition of experiment
matters to both practice and theory. Because of the particular challenges of social science methodology, definitions of experiment in this field have tended to differ from
those in the natural sciences.
For example, Eratosthenes used the differing shadows cast by the noon sun at two locations to measure the circumference of the Earth (voted the seventh most beautiful experiment). Similarly, Newton placed a prism in a beam of
sunlight to decompose light into a spectrum, refuting
Aristotle's conclusion that white light was pure and colored light was some alteration of it (voted the fourth
most beautiful experiment). Neither of these experiments involves the manipulation of a variable as part
of the study. Or, rather, if placing a prism in a stream
of light is understood as manipulating a variable, then
almost any systematic observational methodology can
be viewed as experimental.
It remains an open question whether it would be productive in the social sciences to view similar systematic
observations as experiments. It seems clear, however, that
the social science definition is not universal across the
sciences and that the more restrictive view common in
social science is motivated by the greater likelihood of
reaching the wrong conclusions in complex situations.
Failure to manipulate causal factors in social settings generally leaves important questions unanswerable.
Of course, even in the inclusive sense, not every observation, or set of observations, qualifies as an experiment. As with Kaplan's, all definitions of experiment refer
to some degree of active organization of the observations,
with the experiments generally inspired by theories and
with the results used to inform theories. Indeed, the common, although perhaps unfair, criticism of the inductive
approach associated with Francis Bacon is that observation is too indiscriminate and of limited use unless
guided by some conceptual framework. Among social scientists, there may be some (e.g., B. F. Skinner) who claim
never to have tested a hypothesis, but most would recognize their efforts as being guided by a paradigm and
devoted to elaborating that paradigm.
Theories of Causation
We have seen that for Newton and his predecessors the
goal of experimentation was to reveal the causes of observed effects. To understand how the modern view of the
experiment came about it is necessary to understand how
causation has been viewed over the past few centuries. We
begin with the skeptical critique of Hume and then highlight some of the ways that people have responded to his
skeptical challenge.
Hume's Skeptical Critique
David Hume (1711–1776), the last of the three major
British Empiricists, sought to offer a theory of knowledge
Functions of Experiments
The examples presented so far suggest that there are
several functions of experiments. Using categories suggested by Kuhn in 1962, one function is the verification or
weakening of theories by comparing theory-based predictions with observations of the phenomena of interest. The
studies conducted by Wertheimer to counter Wundt's
theory illustrate this effort to put theories to empirical
test. Despite this being the central role of the critical
experiment as presented by Newton, Kuhn did not believe
that it is common or easy to find unequivocal points of
contact between theories and nature. Indeed, the Quine-Duhem thesis recognizes that typically there are so many ancillary assumptions involved in translating theory into hypotheses that results inconsistent with theory-based predictions can usually be explained away.
As such, Kuhn emphasized two other functions of experiments. A particularly common function has been the
use of experiments to develop more precise measurements of important facts, such as the speed of light, the
speed of signals sent along neurons, or the mass of
a particular particle. Other than in areas like physiological
psychology, the social sciences do not have many facts or
constants that need to be confirmed or established.
Milgram's initial study, however, provides a parallel to
this concern with facts or constants in establishing the
Structure of Experiments
Given that experiments are employed to serve different
functions, we can expect them to vary somewhat in how
they are structured. And yet there are only a few basic
forms that constitute the domain of design for most social
science experiments. This is perhaps because the structure of experiments is the result of the overriding function of advancing our conceptual understanding of our
world.
Interestingly, in the natural sciences, understanding
the causes of phenomena often involves understanding
the structure of what is being studied. Thus, Rutherford's experiment explained deflected particles as evidence of the dense nuclei of atoms (ranked ninth of the ten most beautiful experiments).
causal factors in applied social science is in trying to understand the impact of policy changes intended to improve social conditions. This was the goal that
Campbell put forward in promoting the "experimenting society," and it remains an attractive goal for many. Although some are skeptical of our ability to carry out randomized experiments in policy settings, the economist
Larry Orr has written persuasively on the merits and practicality of randomized social experiments. He points out
that randomized experiments are particularly important
for understanding the policy effects that result from
a known policy change.
Development of the Logic of
Experimental Design
The developments described so far represented tactics to
control threats to the validity of causal conclusions. This
emphasis on strengthening causal conclusions also led to
the development of the logic of experiments, with each
step in this development intended to reduce the
likelihood of reaching invalid conclusions.
Hypothetico-Deductive Logic Wanting to be as scientific as the natural sciences, many social scientists, such
as the psychologist Clark Hull, promoted a version of the
hypothetico-deductive method that we discussed as
having been developed by Galileo and Newton. This formal model guided the design of experiments by requiring
confirmatory research in which the implications of
a theory are developed through deduction and then tested.
Implications that are supported, or confirmed, by experimental results are accepted as theorems, and our belief in
the theoretical postulates, or axioms, that generated them
is strengthened. The logic of this confirmatory approach
was seen as being scientific in that the testing of implications appeared to be a fair way of testing theories.
As such, although the axioms and theorems of Hull's
behavioral theory were ultimately judged not to be of
lasting value, the method was accepted as a standard
for how experiments were to be designed. This reinforced
the dichotomy of exploratory work that generates hypotheses and confirmatory inquiry that tests the implications
of those hypotheses.
Falsification and Null Hypotheses The confirmatory approach of the hypothetico-deductive method, however, involved a willing acquiescence to the logical fallacy of affirming the consequent. Basically, this involves the logical error of moving from the premise "If my theory is true, Outcome A will be observed" and the empirical result "Outcome A is observed" to the conclusion "My theory is true." To avoid this obvious error (Outcome A might be predicted by a host of conflicting theories), all researchers need to do is follow the appropriate rules of logic and rely instead on the logically valid form, modus tollens: when Outcome A is not observed, the theory that predicted it is rejected as falsified.
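To make the contrast explicit, the two inference patterns can be set side by side (a standard schematic from propositional logic, added here for clarity, with T standing for "my theory is true" and A for "Outcome A is observed"):

\[
\text{modus tollens (valid):}\quad \frac{T \rightarrow A \qquad \neg A}{\therefore\ \neg T}
\qquad\qquad
\text{affirming the consequent (invalid):}\quad \frac{T \rightarrow A \qquad A}{\therefore\ T}
\]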
Alternative Views of
Experimental Inquiry
The standard, or modernist, model of the social science
experiment developed so far has not gone unchallenged.
Alternative views of what experimentation is and should
be have been offered from several perspectives. In considering these critiques, it is useful to view each as a criticism
of one or more consequences of the single-minded focus
in the standard model of experiment on strengthening
causal conclusions.
To frame this presentation of alternatives, recall the
description of the series of experiments by Stanley
Milgram. As presented, Milgram's first study of
Naturalistic Critique
One of the strengths of the standard model is that, with
its reliance on hypothetico-deductive logic and
falsificationism, it offers a foundation for valid inference.
However, Kuhn's early critique of the modern model of
the experimental method concluded that the model did
not correspond well to what real scientists actually do.
Instead of the systematic development and testing of hypotheses, scientists seemed to be more opportunistic in
conducting experiments.
In particular, actual research does not seem to follow
the falsificationist logic emphasized as the ideal for social
science experiments. Milgram's research, for example, makes no real effort to falsify any theory. At most, falsification in Milgram's work, and most social science, is represented by rejecting, or falsifying, a null hypothesis. But, as previously noted in regard to Meehl's criticism, the
null hypothesis in most research reports is not a position
claimed by any meaningful alternative theory. As such,
there is the appearance of following the rigorous standard
model of the experiment in service of the falsificationism
agenda, but the actual practice is different.
This discrepancy between formal model and practice is
an example of Kaplan's distinction between the "reconstructed logics" that are reported in journals and the "logic in use" that is employed by scientists in their everyday scientific lives. According to this view, Milgram's series of
studies appear to reflect a rational unfolding of research
questions, but the actual process was less tidy. In addition
to the chance elements that might have led to some personality factors or environmental conditions being studied
before others, there is the issue of a maturation in understanding occurring as the series of studies is conducted,
with the later understanding then being used to represent
the logic of the earlier studies. More generally, experiments designed with one purpose in mind are recast in
terms of a different logic once the results are understood.
Contextualist Critique
One reason why deviations from the standard model
might be constructive is that the context confronting researchers may be different from what is presumed by the
standard model. Consistent with this, a second criticism of
the modern experiment has been that efforts to isolate
single causal factors, one of the strengths of the modern
experiment, are misguided and destined for disappointment. In this view, causation is much more contextual in
that the impact of one factor is dependent on the levels of
other factors in a specific context. If causation is so context-dependent, the general laws that experiments are so
often designed to discover become ideals that are difficult
to achieve in social science. Furthermore, the deductive
stance reported in most accounts of experimentation has
the difficulty of being useful only if we are able to make
meaningful predictions. If the situation being studied is
sufficiently complex, there will always be caveats in our
predictions, resulting in either post hoc explanations of
why the predicted outcomes did not occur or post hoc
revisions of the predictions. Furthermore, if the reality of
causal relationships is too complex for meaningful predictions, the result of the hypothetico-deductive method will
be an overemphasis on those trivial predictions that will
hold true in almost all contexts (e.g., "Students receiving an experimental form of instruction will learn more than students receiving no instruction").
One response to this recognition of contextual factors
has been to incorporate moderated relationships into the
deductive frameworks used to make predictions. Lee
Cronbach had in 1957 argued for this use of moderated
relationships but later, in 1975, realized that all efforts to
incorporate interaction effects in service of developing
general laws would be of only limited value. In a 1986
article entitled "Social Inquiry by and for Earthlings," he
distinguishes a hypothetical world of constant truths from
the more complex real world that he views us as
confronting. In the hypothetical world, the hypothetico-deductive method is rational: The timeless truths that
social scientists could attain through this approach to
experimentation are worth the effort. If, however, the
truths discovered in social science experiments are so
Constructivist Critique
A third critique of the standard model is concerned not
with the generality of the truths that result from
experiments but rather with the question of whether
our conclusions from experiments are a function of the
reality that we are studying or more a function of the
constructs and operations that we use to make sense of
what we are observing. In this view, different people with
different constructs will interpret the results of an experiment differently. More formally, Kuhn emphasized how
the use of different constructs leads naturally to the incommensurability of different paradigms coexisting within
science.
Like the contextualist critique, the constructivist view
questions the value of the search for general laws. It is not
that constructivists deny that experimentalists could come
up with general laws; they merely dismiss the laws as
regularities that result from imposing researchers' constructs on a more complex reality. The focus instead
for constructivists is to use methodologies that emphasize
our interpretative abilities in making sense of the world
around us. Theory is still possible in the constructivist
view (see Karl Weick's seminal work in the areas of organizational behavior and theory), but the generality of our understanding is constrained by the priority given to developing meaningful representations of specific settings. As with the contextualists, the constructivist critique
claims that induction from exposure to specific settings
Future Trends
The critiques of experimental methods just summarized
challenge the standard model of experiment, but the experiment, in some form, will remain central to social science. In considering changes in the function and structure
of experiments that might strengthen their usefulness, we
can talk with some confidence about minor refinements
and speculate about more fundamental changes.
Beyond Hypothetico-Deductive
Formalism
In addition to these tactical refinements of better statistical methods and more empirical knowledge about the
strengths and weaknesses of the different experimental
designs, there is a notable trend to reframe the role of the
experiment in advancing our understanding. In general
terms, the trend associated with a commonsense realist
philosophy of science involves viewing the experiment as
an evolved methodology that allows us to pose questions
to nature and, when done right, increases our chances of
receiving meaningful answers. This pragmatic turn is behind Kaplan's aforementioned distinction between "logic in use" and "reconstructed logic," with the implication that
the reconstructed logic is useful but is not an ideal that we
must restrict ourselves to in conducting science.
Rom Harré, a realist philosopher of science, amplifies this point in referring to the "fallacy of logical essentialism," which involves the disastrous assumption that formal logic should be given primacy as the preferred form of discourse in science. Hilary Putnam, another realist philosopher of science, refers to "antiformalism" as describing a stance in which we value the insights from the formal theories associated with logic and the experimental method but do not expect or allow these theories to dictate practice (he speaks instead of the "primacy of practice"). To
provide a flavor of this realist trend in the future of the
experiment, we consider two examples of methodological
issues associated with experiments that are increasingly
being viewed in pragmatic terms.
Experimental Methodology and
Ramification Extinction
Donald Campbell was one of the major scholars
concerned with experimental and quasi-experimental inquiry. Whereas his early work was focused on the
advantages of true randomized experiments and ways
to approximate those advantages with weaker designs,
he came to appreciate the larger task of scientific method
and even the role of case studies in addressing that larger
task. In writing the foreword to Robert Yin's book on case study research he noted, "More and more I have come to
we could attempt to confirm this interpretation with further analyses of the data available about these same students. One way to approach this confirmation would be to
predict which students, perhaps based on measurements
of how gifted they are in visualizing quantitative
relationships, might benefit most and least from the intervention and then to examine the moderated relationship (e.g., using interaction terms in regression analysis) to
assess the prediction. Another way to confirm the hypothesis in the same context that generated it is to obtain
additional outcome variables, predict patterns in which
some of the outcomes measured would be affected more
by the intervention than others, and then conduct the
analyses to assess that prediction.
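As a concrete illustration of the first approach, the following sketch tests a moderated relationship by adding an interaction term to a regression model. The data and variable names are hypothetical, invented here for exposition; they are not from the study discussed above.

```python
# Hypothetical sketch: probing a moderated relationship after the data
# are in hand by testing a treatment x ability interaction term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),        # 0 = control, 1 = intervention
    "visual_ability": rng.normal(0, 1, n),     # gift for visualizing quantitative relationships
})
# Simulate an outcome in which the intervention helps high-visualizers more.
df["learning"] = (0.3 * df["treatment"]
                  + 0.2 * df["visual_ability"]
                  + 0.5 * df["treatment"] * df["visual_ability"]
                  + rng.normal(0, 1, n))

# The '*' in the formula expands to both main effects plus the interaction.
model = smf.ols("learning ~ treatment * visual_ability", data=df).fit()
print(model.summary().tables[1])  # the treatment:visual_ability row tests the moderated effect
```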
Because this testing of an interaction effect occurs after
the data are in hand, it is less vulnerable to the limits of
rationality highlighted by the contextualist critique.
However, to the extent that the interaction terms to be
tested cannot be derived in any direct manner from the
analyses already conducted (as is true with tests of the
interaction terms when only the main effects were analyzed previously), this approach satisfies the confirmatory
intent of the hypothetico-deductive model. Interestingly,
this emphasis on the iteration between gathering evidence
and developing theoretical implications is consistent with
the method developed by Newton and also promoted 100
years ago by pragmatic social scientists who preceded the
ascendancy of logical positivism. As such, we might talk of a "prepositivist emancipation" wherein we are now free to
make better use of the insights of the early pragmatists.
The current emphasis on using pattern-matching to
strengthen internal validity is an example of this
pragmatic orientation.
Satisfying the Superego versus Supporting
Cognitive Management
Adherence to good practice in applying quantitative
methods is intended to yield a better understanding of
the phenomena being studied. There are, however, multiple paradigms available for guiding statistical analyses
as applied to experiments. The hypothetico-deductive
model is typically framed in terms of only one of these
alternative paradigms, the Neyman-Pearson paradigm.
Referred to as forward-looking, this approach dominates statistical textbooks in social science and involves the traditional emphasis on deciding, before the data are collected, what criteria will be used to reach conclusions once the analyses are carried out (e.g., the null hypothesis will be rejected if p ≤ 0.05).
Although the Neyman-Pearson approach seems reasonable, we need to ask whether this paradigm is really
suited for the task at hand when interpreting experimental
results. Specifically, Gerd Gigerenzer has argued that the
Neyman-Pearson approach is a statistical parallel of the
psychodynamic superego in that it specifies correct behavior but is out of touch with the demands of everyday
living. Instead, Gigerenzer promotes Fisher's statistical framework as being more consistent with our task of making sense of experimental data. Referred to as a backward-looking approach, the Fisherian experimenter first collects and analyzes the data and then (looking back at these data as analyzed) comes up with what appear to be the best interpretations (Teddy Seidenfeld and Henry E. Kyburg are recent mathematical philosophers who have sought to rehabilitate Fisher's approach).
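A rough illustration of the two stances follows; the data are simulated and the analysis is deliberately minimal, since the contrast in posture, not the numbers, is the point.

```python
# Sketch (hypothetical data): a Neyman-Pearson-style decision rule fixed in
# advance versus a Fisher-style backward-looking reading of the evidence.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 50)
treated = rng.normal(0.4, 1.0, 50)

t, p = stats.ttest_ind(treated, control)

# Forward-looking (Neyman-Pearson): alpha was chosen before any data were seen.
ALPHA = 0.05
decision = "reject H0" if p <= ALPHA else "fail to reject H0"
print(f"decision at alpha={ALPHA}: {decision}")

# Backward-looking (Fisherian): report the observed p-value itself and weigh
# it, together with the effect size, as graded evidence for interpretation.
print(f"observed p = {p:.4f}, mean difference = {treated.mean() - control.mean():.2f}")
```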
Viewing the development of understanding as less
a matter of making decisions and more a matter of supporting interpretations, a pragmatic stance on applying
statistical analyses to experimental results emphasizes
a backward-looking approach to guiding our beliefs.
This distinction between forward- and backward-looking
approaches to statistical inference brings to mind again
the distinction that Kaplan made between the formal "reconstructed logic" that is used to explain results and the "logic in use" that reflects actual practice. Most of the
criticisms of the use of statistical inference in social science (including both hypothesis tests and confidence intervals; see the work of psychologist Charles Reichardt for
more on advantages of confidence intervals) concern their
failings in terms of a presumed reconstructed logic. In
contrast, statistical inference can be useful from the vantage point of supporting a logic in use.
Taking the superego-ego metaphor more broadly, an
additional step in moving beyond hypothetico-deductive
formalism is the view that the role of the experiment in
social science is as a tool for cognitive management,
meaning that it is valued primarily not as an arbiter of
truth but as a means of screening the beliefs that we are
willing to hold. The philosopher Jerry Fodor makes this
point in arguing that the notion of observation in the
empiricist account of the experiment is becoming increasingly abstract. Pointing out that the use of technology
means that observation by humans may be nothing
more than observing computer printouts, Fodor argues
that the essence of the experiment is neither formal deduction nor direct observation but rather a process of
posing questions to nature. Accordingly, what constitutes
an effective tool for cognitive management in one field
might differ from what is effective in other fields. This,
along with Campbell's notion of context-dependent ramification extinction, argues for a flexible definition of what
constitutes an experiment in social sciences. In some
fields, such as neuropsychology, random assignment
may not be as important as other forms of experimental
control. On the other hand, in fields such as anthropology,
it may be useful to conceive of experiments in very different ways, including the use of qualitative methods and
other methods not based on aggregate covariation, to assess the impacts of deliberate interventions.
Conclusion
The design and use of experiments in social science is
evolving. The critiques of the standard model from
other perspectives have highlighted limitations that can
be overcome. A consideration of the variety of physical
science experiments reminds us that we need not restrict
ourselves to a narrow definition. Instead of viewing the
experiment solely as an instrument for the particular
form of empiricism promoted by logical empiricists,
a prepositivist emancipation allows us to view experiments
as tools in the pragmatic task of guiding our efforts to make
sense of our world.
In particular, we can consider the standard model
of the social science experiment, with its design-based
definition (i.e., requiring active, preferably random,
manipulation of causal factors) and its role in promoting
falsificationism, as one valuable model rather than the
only viable model. A more inclusive definition of the experiment as a deliberate intervention designed, based on
current understanding, to probe the nature of our world
(to pose questions in ways that we can expect meaningful
answers) has several advantages. Of greatest import, such
an inclusive definition can avoid most of the limitations
raised by the naturalistic, contextual, and constructivist
critiques while still allowing us to focus on the rigor necessary to counter the rival alternatives that Campbell and
others have identified as the crucial task of scientific
inquiry.
Further Reading
Cook, T. D. (1985). Postpositivist critical multiplism.
Social Science and Social Policy (L. Shotland and
M. M. Mark, eds.), pp. 21–62. Sage, Thousand Oaks, CA.
Cook, T. D., and Campbell, D. T. (1979). Quasi-Experimentation: Design and Analysis Issues for Field Settings. Rand
McNally, Skokie, IL.
Cronbach, L. J. (1986). Social inquiry by and for earthlings. Metatheory in Social Science (D. W. Fiske and R.
A. Shweder, eds.), pp. 83–107. University of Chicago Press,
Chicago, IL.
Guerlac, H. (1973). Newton and the method of analysis.
Dictionary of the History of Ideas (P. P. Weiner, ed.),
pp. 378–391. Charles Scribner's Sons, New York.
Harré, R. (1986). Varieties of Realism. Blackwell, Oxford.
Julnes, G., and Mark, M. (1998). Evaluation as sensemaking:
Knowledge construction in a realist world. Realist Evaluation: An Emerging Theory in Support of Practice (New
Experiments, Overview
Directions for Evaluation, no. 78) (G. Henry, G. Julnes, and
M. Mark, eds.), pp. 33–52. Jossey-Bass, San Francisco.
Kaplan, A. (1964). The Conduct of Inquiry. Chandler, San
Francisco, CA.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions.
University of Chicago Press, Chicago, IL.
Mohr, L. B. (1996). The Causes of Human Behavior:
Implications for Theory and Method in the Social Sciences.
University of Michigan Press, Ann Arbor, MI.
Orr, L. L. (1998). Social Experiments: Evaluating Public Programs
with Experimental Methods. Sage, Thousand Oaks, CA.
Experiments, Political
Science
Rose McDermott
University of California, Santa Barbara, California,
USA
Glossary
artifact An exogenous variable that is created by some aspect
of the experimental manipulation and which can vary with
the independent variable. The effects of these variables can
then become confounded with the experimental findings.
between-subjects design An experimental design in which
each person is exposed to only one of the experimental
manipulations. Differences are then measured between
groups.
control condition The condition of subjects in an experiment
who receive no treatment or manipulation.
direct replication Repeating an experiment as closely as
possible to determine the reliability of results.
experiment A method of investigation with an independent
variable, a dependent variable, high levels of control on the
part of the investigator, and the random assignment of
subjects to treatment conditions.
experimental condition The condition of subjects in an
experiment who are exposed to the manipulated variable or
treatment.
interaction effect When the outcome or effect of one
variable varies with the outcomes or effects of a second
variable. This occurs in factorial design experiments.
systematic replication Repeating an experiment in such
a way that certain aspects, measures, or protocols of the
original experiment are systematically varied. These replications serve to clarify or extend aspects of the original
experiment.
within-subjects design An experimental design in which
each person is exposed to all treatment conditions. Thus,
each person serves as his or her own control and differences
are found within each individual.
smoke caused lung tissue to die under controlled conditions, the causal link was established and it became much
more difficult for the tobacco companies to argue that the
relationship was spurious in origin. This causal link
remains critical in addressing many questions of great
importance in political and social life as well.
There are at least three ways in which experiments
differ from other forms of social measurement. First,
many political scientists begin their inquiries by looking
for naturally occurring phenomena in which they are
interested and then seeking to examine them in
a variety of ways. Thus, if a scholar was interested in
military coups, he could find states where they occurred
and then study them in some systematic fashion. In these
cases, methods of inquiry can include field work, interviews, surveys, archival exploration, and so on. On the
other hand, an experimentalist tends not to wait for events
to occur. Rather, an experimentalist goes about creating
the conditions that produce the events of interest.
Thus, if a scholar is interested in how leaders make
decisions under conditions of stress, he can design an
experiment in which he asks subjects to make various
decisions and then imposes stress on them while they
are completing these tasks. Experimentally induced stress
can take many forms, depending on the interest of the
investigator. One can turn up the heat in the room, impose
a tight time limit, give an impossible task with false-negative feedback, make the room too loud or too bright,
or simply create a crisis in the midst of the task, such as
a fire or medical emergency. Obviously, some issues and
questions of concern lend themselves more readily to
experimental creation and manipulation than others,
but clever experimentalists can gain a great deal of control
by creating their own conditions of interest. The real advantage of this opportunity lies in the experimentalist's
ability to measure and capture exactly what they are interested in studying. Because of this, experimenters are
not easily led astray by irrelevant forces. In other words,
experimentalists do not need to test or replicate every
aspect of the real world in order to investigate their
areas of interest. Rather, they need only examine the
key relationships between variables that they suspect
have causal impact on the outcomes of interest. In this
way, an experimentalist draws on theoretical ideas and
concepts to derive hypotheses and then designs a test,
or multiple tests, to explore the variables and relationships
that appear to be causal in nature. Therefore, experimentalists do not need to re-create the external world in their
laboratories in order to achieve meaningful results; by
carefully restricting their observation and analysis to
those variables deemed central to the relationships of
interest, they can test competing alternative causal hypotheses within controlled conditions.
Second, and related, experimentalists can control and
systematically manipulate the treatment conditions to
Conditions
Several specific aspects of experimental design deserve
mention. First, perhaps the most important aspect of experimental design relates to the creation of the experimental conditions themselves. Experiments require at
least two conditions, representing different levels of the independent variable, in order to see whether the independent variable has had any effect on the dependent variable; obviously, one condition simply constitutes a constant and no effect can be demonstrated.
Typically, an experiment is designed with a control condition and an experimental one. This means that in one
treatment group, the control condition, subjects engage in
some benign but related task, which should take a similar
amount of time, require the same kind and amount of
Interaction Effects
In an experiment, investigators typically look for main
effects, that is, the direct effect of the independent variable on the dependent variable. One of the potential
methodological advantages of introducing more than
one experimental condition at a time, however, lies in
the ability to uncover interaction effects between
variables that might have been hidden if only one variable
were examined at a time. Interaction effects occur when
one independent variable has different effects depending
on the impact of a second independent variable. For example, an interaction effect would be discovered if one
found that party identification had a different impact on
voting behavior depending on race.
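A minimal sketch of how such an interaction might be detected is given below. The data are simulated and the variable names and group labels are purely illustrative; this example is an addition for exposition, not part of the original study design.

```python
# Hypothetical sketch: testing whether party identification has a different
# effect on a voting measure depending on race (a two-way interaction).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "party": rng.choice(["dem", "rep"], n),
    "race": rng.choice(["groupA", "groupB"], n),
})
# Simulate a voting-behavior score whose party effect differs by race.
party_effect = np.where(df["race"] == "groupA", 1.0, 0.2)
df["vote_score"] = np.where(df["party"] == "dem", party_effect, 0.0) + rng.normal(0, 1, n)

model = smf.ols("vote_score ~ C(party) * C(race)", data=df).fit()
print(anova_lm(model, typ=2))  # the C(party):C(race) row tests the interaction
```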
Obviously, many variables can affect social and political behavior. Therefore, depending on the topic under
investigation, sometimes an experimenter will hold certain things constant, sometimes he will allow certain
variables to vary at random, and sometimes he will introduce a systematic variation in independent variables.
When such a systematic variation exists, precautions
must be taken to avoid artifactual findings, those results
that may look like real interaction effects, but in
Replication
Obviously, repeating experiments constitutes a fundamental part of the scientific method in order to determine
the reliability of results. Replication also means that
results remain falsifiable. However, the reasons for, and
forms of, replication shift depending on the purpose.
Replication means only that an investigator repeats an
experiment. In a direct replication, the experimenter tries
very hard to reproduce closely as many elements of the
original experiment as possible. Similar results typically
confirm the reliability of the findings. These kinds of replications are unusual, however. More common is
a replication of some elements of the original experiment
coupled with some kind of extension into a new and different aspect of the original work. Typically, this occurs
when some aspect of the original experiment stimulates
some additional hypotheses about the phenomenon
under investigation.
Another type of repetition takes the form of systematic
replication. In this situation, an experimenter systematically varies some aspect of the original experiment. Most
commonly, this takes place to clarify some unexpected or
unexplained aspects of the original experiment or to seek
some additional information about some previously unconsidered feature of the original study. This can occur,
for example, when a new alternative explanation, which
did not occur to anyone beforehand, presents itself after
the first experiment is complete. Even though it can seem
time-consuming and expensive at the time, pilot testing
almost always ends up saving time, money, and enormous
Subjectivity in Design
The best and most clever experimentalists put a lot of
time and energy into designing successful experiments.
At least two factors really enhance the ability of an experimentalist to design experiments of high quality and
impact. First, strong experimentalists do their best to
put themselves in the place of their subjects. They try
to imagine what it would be like to enter their experimental condition blind, for the first time, without a clear sense
of the purpose or the working hypothesis. In undertaking
this fantasy, it can be helpful to assume that although
subjects may be cooperative and want to help, they will
nonetheless most probably be trying to figure out what the
experimenter wants. A further step in this process
involves careful debriefing of subjects after early or
pilot testing. They should be asked what they experienced,
what made sense to them, and what was confusing.
It should be ascertained whether they were able to determine the purpose, if they were not told outright, and, if not, whether there was any pattern to what they thought was going on that differed from the real intention of the experiment.
A second way to help improve experimental design is to
try to imagine how a different potential pattern of responses might appear. If the results were not what was
wanted or expected, why might that be the case? What
other potential patterns might emerge? How can new
tests be designed to examine these alternative hypotheses? Being able to anticipate alternatives prior to actual
large-scale testing can really help clarify and improve experimental designs beforehand.
Nonexperimental Alternatives
Why?
In many cases, experiments may not be the optimal form
of investigation in political science. There may be other
forms of inquiry that are better suited to the problem
under investigation. In such cases, other forms of nonexperimental research that nonetheless remain related to experimental work may be worth considering.
There are several reasons that researchers might want
to seek nonexperimental alternatives to experimental
work. The first and most important lies in the fact that
many questions of interest are not causal questions. For
example, in trying to predict the outcome of a particular
election, pollsters may not need to know why certain people vote in particular ways; they just want to be able to
survey people in such a way as to obtain an accurate prediction. Other issues that are important to political scientists also fall outside the causal realm. For example,
sometimes investigators merely want to demonstrate
that a particular phenomenon exists, such as incipient
democracy in a formerly communist country.
Second, there are many situations in which experimental work would be impossible to conduct or unethical to
Correlational Work
Correlational work does not involve the administration of
any experimental treatment manipulation or condition. As
noted previously, correlational work can offer useful, interesting, and important hypotheses about the relationship between variables, but it cannot demonstrate
causality. Partly this results from the possibility of
a third spurious cause influencing both factors. On the
other hand, correlational work can establish that no causal
relationship exists. If there is no correlation between
variables, there is no causation. Occasionally, some very
complicated relationship that depends on several mediating factors may exist, but at least a direct or simple causal
relationship is ruled out if no correlation exists. In other
words, it cannot prove, but it can disprove, a causal relationship between variables.
A second problem with correlational work results from
confusion about directionality. In correlational work, it
can be difficult to determine what the cause is and
what the effect is in many circumstances. Some techniques that seek to partially ameliorate the problem of
directionality exist. The cross-lagged panel technique
collects correlational data on two separate occasions.
Thus, in other words, investigators obtain information
on variables 1 and 2 at times 1 and 2. By looking at the
same variables in the same populations over time, some
element of directionality might be illuminated.
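A bare-bones sketch of that comparison appears below, with simulated data and illustrative variable names; if the correlation of variable 1 at time 1 with variable 2 at time 2 clearly exceeds the reverse cross-lagged correlation, directionality from variable 1 to variable 2 becomes more plausible.

```python
# Hypothetical sketch of a cross-lagged panel comparison: two variables
# measured on two occasions, then the two cross-lagged correlations compared.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
v1_t1 = rng.normal(size=n)
v2_t1 = 0.2 * v1_t1 + rng.normal(size=n)
v1_t2 = 0.6 * v1_t1 + rng.normal(size=n)                  # stability of v1
v2_t2 = 0.5 * v2_t1 + 0.4 * v1_t1 + rng.normal(size=n)    # v1 "leads" v2

df = pd.DataFrame({"v1_t1": v1_t1, "v2_t1": v2_t1, "v1_t2": v1_t2, "v2_t2": v2_t2})

print(f"v1(t1) -> v2(t2): {df['v1_t1'].corr(df['v2_t2']):.3f}")
print(f"v2(t1) -> v1(t2): {df['v2_t1'].corr(df['v1_t2']):.3f}")
```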
Pseudo-Experimental Designs
Pseudo-experimental designs represent nonexperimental
designs in which the investigator nonetheless maintains
some control over the manipulation and the collection of
data. In other words, an investigator does create and
Quasi-Experimental Designs
A quasi-experiment allows an investigator to assign treatment conditions to subjects and measure particular outcomes, but the researcher either does not or cannot assign
subjects randomly to those conditions. To be clear, in
pseudo-experimental design, the study lacks a control
condition, whereas in quasi-experimental design, the researcher does not or cannot assign subjects to treatment
conditions at random. This feature actually makes quasi-experiments much easier to use and administer in field
and applied settings outside of the laboratory. However,
what is gained in flexibility and external validity may be
lost in being able to make unequivocal arguments about
causality. However, quasi-experiments do allow scholars
to make some causal inferences and interpretations, just
not fully dependable arguments about causality.
In general, two types of quasi-experimental designs
predominate: the interrupted time series design and
the nonequivalent control group design. In the former,
a within-subjects design is employed to examine the
effects of particular independent variables on the same
group of subjects over time. Typically, subjects are measured both before and after some kind of experimental
treatment is administered. In the latter, a between-subjects design is invoked to measure the impact of
the independent variable on different groups of subjects. What remains common to both types of quasi-experiments is the fact that investigators do not or cannot
assign subjects to treatment condition at random.
Field Experiments
Field experiments are those that take place outside
a laboratory, in a real-world setting. Gosnell published
the first field experiment in political science in the
American Political Science Review in 1927. In conducting
a field experiment, an investigator typically sacrifices control in order to achieve increased generalizability. In psychology, nonexperimental field testing early in a research
program often helps generate novel hypotheses. Once
these ideas are tested and refined in laboratory settings,
psychologists often then return to field testing to validate
their findings. In this way, field studies can prove extremely helpful in specifying the level of generalizability
of laboratory results to the real world.
Two important points should be made about field experiments. First, successful field experiments rarely constitute a simple transfer of a laboratory study to a field
setting. Investigators have less control in the field; they
cannot always control the nature of the treatment, the
comparability of subjects, or the impact of unplanned,
extraneous factors on outcomes of interest. Most importantly, often subjects cannot be assigned to treatment
conditions at random. Second, results from a single
field study, no matter how large the population, can rarely
be taken on their own, because there are often so many
extraneous and potentially confounding factors occurring
during the completion of the field study. Figuring out
what is actually causing the outcome of interest can be
challenging. Such findings typically need to be interpreted within the context of similar studies conducted
using other formats, including laboratory experiments,
interviews, surveys, or other methods. Results that demonstrate convergent findings across methods allow for
greater confidence in accuracy.
Despite these concerns, field experiments can prove
quite beneficial for a variety of reasons. Often, behaviors
of interest cannot be induced in a laboratory, but can
easily be observed in the real world. In addition, experimental findings that replicate in the field increase people's sense of confidence in the generalizability of the
results. Obviously, trade-offs between laboratory and
field experiments exist and the nature of the research
question of interest often determines the appropriate setting for a study.
Rather than creating a setting, as experimenters do in a laboratory, those who conduct field experiments
Experimental Ethics
When conducting any experimental work, ethics remain
an important consideration. The U.S. Department of
Health and Human Services imposes certain guidelines
on the ethical treatment of human subjects. Most universities have human subjects institutional review boards to
oversee the administration and implementation of these
guidelines. These guidelines encompass four aspects.
First, subjects must be able to give their informed
consent before participating in an experiment. Experimenters should provide subjects with a clearly written,
simple statement explaining the potential risks and expected gains from their participation in the experiment.
They should be told that they can stop their participation
at any time without penalty. And they should be given
contact information about whom to contact in case they have
any concerns about the ethics of experiments in which
they participate.
Second, experimenters are required to take every reasonable precaution to avoid harm or risk to their subjects
as a result of their participation. Third, experimenters
should provide a debriefing opportunity to all subjects,
in which they are told as much as possible about the
experiment in which they just participated. In particular,
subjects should be told that their information will be kept
confidential and that no identifying information will be
released without their prior written consent. Often, subjects are told how they can receive copies of the results at
the conclusion of the experiment if they are interested
in doing so.
Finally, the issue of deception remains a controversial topic. Psychologists continue to employ deception
more than behavioral economists. Deception may prove
necessary in those instances in which a subject's prior
knowledge of the working hypotheses would influence
his or her behavior in systematic or inauthentic ways.
This bias can hinder the discovery of important processes
and dynamics. However, when investigators employ deceptive techniques, institutional review boards remain
particularly vigilant to ensure that the use and value of
such experiments are carefully monitored.
Conclusion
Experiments provide a valuable tool for the measurement
of social variables and processes. They provide unequaled
purchase on causal inference through experimental control and the random assignment of subjects to treatment
conditions. And careful design can ensure the ethical
treatment of subjects during the experimental process,
so that the benefits of discovery continue to outweigh
the risks posed to subjects.
Experiments provide an unparalleled ability to clarify
causality in ways that can reduce confusion about important processes and relationships. By showing the true
direction of the causal link, human beings can learn something important about themselves and, possibly, take
steps to change environments and institutions that can
cause ill. In his famous experiment, Stanley Milgram attempted to understand what it was about the German national character that would lead ordinary citizens
to become complicit in the atrocities surrounding the
Holocaust. Before he tested his hypotheses about the
nature of obedience to authority in Germany and
Japan, where he expected high levels of compliance, he
ran his control group of presumed individualist Americans at Yale. All the experts agreed that less than 1% of
subjects would shock learners to their assumed death.
Yet, the majority of subjects did so. As the films show,
these subjects were not obvious sadists nor did they delight in being cruel to their fellow man. They did not easily
or readily comply; they argued, cried, walked around
the room, tried to talk their way out of the situation, and
were inordinately relieved when they found out their
partner was not dead, but they obeyed nonetheless.
After Milgram's experimental findings, those observers who discarded Nazi defenses that claimed that they were "just following orders" were forced to confront
results that proved that the power of the situation
could overcome personal dispositions. One may not be
comfortable knowing this, but this insight teaches one the
importance of deviance when morality is compromised,
the critical significance of opposition to inappropriate
authority, and the transcendent knowledge that removing
oneself from the situation can provide the best defense
against self-destruction.
Further Reading
Aronson, E., Ellsworth, P., Carlsmith, J. M., and Gonzales, M.
(1990). Methods of Research in Social Psychology.
McGraw-Hill, New York.
Brody, R., and Brownstein, C. (1975). Experimentation and
simulation. In Handbook of Political Science (F. Greenstein
and N. Polsby, eds.), Vol. 7, pp. 211–263. Addison-Wesley,
Reading, MA.
Campbell, D., and Stanley, J. (1966). Experimental and Quasi-Experimental Designs for Research. Rand McNally,
Chicago, IL.
Experiments, Psychology
Peter Y. Chen
Colorado State University, Fort Collins, Colorado, USA
Autumn D. Krauss
Colorado State University, Fort Collins, Colorado, USA
Glossary
construct validity Inferences about the extent to which
operations of variables are similar to their correspondent
constructs.
dependent variables Those variables that are affected by the
independent variables.
external validity Inferences about the extent to which
a causal relationship found in an experiment generalizes
across people, settings, time, etc.
extraneous variables Nuisance variables that remain uncontrolled and are confounded with either the independent or
dependent variables.
independent variables Presumed causal variables that are
deliberately manipulated by experimenters.
internal validity Inferences about the extent to which the
observed causal relationship between an independent and
a dependent variable is accurate.
random assignment A process to allow all participants an
equal chance of being selected and/or placed in experimental conditions.
reliability Consistent responses to a measure.
variables Quantitative or qualitative attributes of an object of
interest.
Survey of Psychological
Experiments
Virtually every psychological discipline uses laboratory
experiments to investigate hypotheses derived from
theory or reasoning. For example, some investigators
have examined how different types of cognitive processing
tasks activate different areas of the brain. By manipulating
the type of cognitive processing task (such as repeating
presented words aloud or generating a use for each of the
same words), it has been substantiated that different areas
of the brain are activated.
In another laboratory experiment that simulated an
organization, the effects of belief about decision-making
ability on multiple performance measures were investigated. While participants played the roles of managers,
researchers manipulated beliefs about decision-making
ability by including information in the instructions for
the task that described the ability either as a stable trait
or as an acquirable skill. Compared to participants who
believed that decision-making ability was a stable trait,
those who received information conveying that decision-making ability was an acquirable skill perceived higher
levels of self-efficacy throughout the simulation, set increasingly difficult organizational goals, and subsequently
achieved higher levels of productivity.
Considerable research in psychology has examined
eyewitness testimony and the factors that can affect the
Basic Components in
Psychological Experiments
Although the topics of the preceding examples are diverse, the basic structures of the experiments are similar.
Each experimental structure contains three critical
elements: variables of interest, control, and measurement.
A thorough understanding of these elements and other
related concepts, such as validity, is paramount for successfully conducting an experiment.
Variables
A variable represents a quantitative or qualitative attribute of an object of interest. In a psychological experiment,
the object can be a person, an animal, a time period,
a location, an institution, or almost anything else. There are three types of variables involved in any psychological experiment: independent, dependent, and extraneous variables.
[Figure 1. Snapshot of the causal relationship between peer pressure (Construct A) and aggression (Construct B) as inferred from an independent variable (number of peers present) and a dependent variable (arguing with people). The diagram also shows extraneous constructs, anxiety (Construct E), social support (Construct C), and emotional stability (Construct D), linked to the constructs and variables by numbered paths (1) through (10).]
never observed in psychological experiments. Before experimenters can be certain about the existence of path 4,
they need to provide evidence about path 3, which still
only serves as a necessary but not sufficient requirement.
Three conditions should be considered when evaluating
the legitimacy of path 3. First, the IV must precede the DV
in time. Second, the IV must relate to the DV. Finally, no
extraneous variables can cause these two variables to
covary. The former two conditions are relatively easy to
meet by means of research designs as well as statistical
analyses. It is the violation of the last condition that
presents the greatest threat to the internal validity of
an experiment.
Various factors may directly challenge the veracity of
path 3. These factors, also referred to as threats to internal
validity, include selection, history, and maturation. For
instance, selection bias occurs when participants in different experimental conditions are dissimilar at the beginning of the experiment. History refers to an external event
(e.g., tragedy of September 11, 2001) occurring during
the period of experimentation and potentially altering
the DV. In contrast to history, maturation threatens
internal validity because some internal changes (e.g.,
more experience) occur during the period of experimentation. The preceding examples clearly show that a DV can
be changed even in the absence of any effects caused by
the IV. To reduce threats to the internal validity of
an experiment, it is imperative that experimenters exercise rigorous experimental controls and use adequate
research designs.
Experimental Controls
An experiment conducted without three types of control
(i.e., manipulation, elimination or inclusion, and randomization) is technically not considered a true experiment.
Although statistical control is widely practiced in data
analysis, it should not be confused with the concept of
experimental control. Statistical control is an application
of statistical techniques (e.g., analysis of covariance or
hierarchical regression analysis) to remove the effects
of presumed extraneous variables. It is a type of control
that is performed at the stage of data analysis.
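For instance, here is a minimal sketch of statistical (not experimental) control, using hypothetical variable names and an analysis-of-covariance-style regression adjustment; the data are simulated for illustration.

```python
# Sketch: removing the effect of a presumed extraneous variable (age) at
# the data-analysis stage by including it as a covariate in the model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 120
df = pd.DataFrame({
    "condition": rng.integers(0, 2, n),  # 0 = control, 1 = treatment
    "age": rng.normal(35, 10, n),        # presumed extraneous variable
})
df["dv"] = 0.5 * df["condition"] + 0.1 * df["age"] + rng.normal(0, 1, n)

# Adding the covariate adjusts the estimated treatment effect for age;
# this is control at analysis time, not experimental control.
print(smf.ols("dv ~ condition + age", data=df).fit().params)
```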
Manipulation
Control by manipulation refers to the procedure when
researchers systematically change the properties of IVs
(e.g., different types of cognitive processing tasks
presented to participants, as described earlier) in
a consistent and standardized manner. Consistency and
standardization are met when the properties of the IV
(e.g., strength of the manipulation) are identical for participants within the same experimental condition and different across experimental conditions. It is safe to say that
definite causal conclusions cannot be made without
a systematic manipulation, although a systematic manipulation does not necessarily guarantee causation. There
are several critical points regarding manipulation.
First, to qualify as a true experiment, the properties
of an independent variable need to be deliberately
changed by experimenters. The common pitfall is to
the therapist as an additional independent variable. Specifically, participants can be assigned to one of four experimental conditions: a treatment with a male therapist,
a treatment with a female therapist, a placebo control with
a male therapist, and a placebo control with a female
therapist. This experimental design enables consideration
of the effect of the treatment, the effect of the therapist's
gender, and the interaction of both independent
variables.
Random Assignment
Practically speaking, not all extraneous variables can be
directly controlled by elimination or inclusion. It is random assignment that indirectly controls extraneous
variables. Random assignment refers to a process by
which all participants have an equal chance of being selected and/or placed in experimental conditions. This randomization process works under the assumption that the
threats of extraneous variables are equally distributed
over all experimental conditions in the long run, thereby
reducing the likelihood that IVs are confounded with extraneous variables. As a result of random assignment,
changes in a DV can be attributed to variations in an
IV. There is considerable evidence that nonrandom assignment and random assignment often yield different
results. In practice, researchers can accomplish random
assignment by using random number tables provided by
most statistics books, or by using statistical software produced by various companies (e.g., by SPSS Inc., Minitab
Inc., and SAS Institute Inc.) to generate random numbers.
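In code, the mechanics can be sketched as follows; the participant labels and condition names are illustrative, and any standard random number generator would serve equally well.

```python
# Minimal sketch: randomly assigning a convenience sample of participants
# to experimental conditions using the Python standard library.
import random

participants = [f"P{i:03d}" for i in range(1, 41)]
conditions = ["treatment", "placebo_control"]

random.seed(2005)              # fixed seed so the assignment is reproducible
random.shuffle(participants)   # random order removes any systematic ordering

# Deal shuffled participants out evenly so group sizes stay balanced.
assignment = {p: conditions[i % len(conditions)] for i, p in enumerate(participants)}
print(sum(c == "treatment" for c in assignment.values()), "assigned to treatment")
```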
Random assignment should not be confused with random selection or random sampling. In contrast to random
assignment, which facilitates causal inferences by equating participants in all experimental conditions, random
selection is a process to select randomly a sample of participants from a population of interest as a way to ensure
that findings from the sample can be generalized to the
population. Although researchers can use various probability sampling strategies such as systematic sampling to
select a sample from a population, most published psychological experiments rely on nonprobability or convenience samples. Therefore, results of an experiment using
random assignment based on a convenience sample may
or may not be similar to those found for another group
of participants who are randomly selected from the
population.
The distinction between the process of random assignment and the outcome of random assignment is also very
important, although it is often overlooked or misunderstood. The process of random assignment, in theory,
equates participants across experimental conditions
prior to any manipulations of IVs. In other words, randomly assigned participants are expected to be equal on
every variable across all conditions in the long run.
However, the actual outcome of random assignment
Measurement
As mentioned earlier, adequate construct validity is vital
for both IVs and DVs, though the construct validity of the
DV is often neglected. Having a valid DV, which is often
assessed by an instrument (e.g., measures or records) in
psychological experiments, allows the intended inference
about a construct to be made, compared to other plausible
inferences (path 2 vs. path 6 in Fig. 1).
The major factors that threaten the construct validity of
independent and/or dependent variables have been delineated, including inadequate explication of the construct,
construct confounds, mono-operation bias, monomethod
bias, reactivity to the experimental situation, experimenter expectancies, and resentful demoralization. For
instance, mono-operation bias and monomethod bias are
biases that arise when only one operationalization or one
method is used to assess independent or dependent
variables. If results are similar when the same construct
is operationalized differently or measured by different
methods, greater confidence about construct validity is
obtained.
Contrary to the factors that may threaten construct
validity, other characteristics, such as reliability, may
provide evidence to support the construct validity of
both the IV and the DV. Reliability is a necessary piece
of this type of evidence, and it is an indication of the
relative amount of random fluctuation of individual responses on the measure. A measure cannot be valid if it is
not reliable, although a reliable measure does not ensure
validity. Suppose that aggression in Fig. 1 was not assessed
by the amount of arguing with others but instead by
a questionnaire containing several queries about the
participants' desire to quarrel and fight with others. If this measure possessed low reliability, it would imply that participants' responses to the items would be inconsistent. Inconsistent responses can be attributed to one or
more sources of measurement error, such as the items on
the measure, the time or location of administration, and
the motivation or fatigue of the participants. The magnitude of these measurement errors can be evaluated by
different types of reliability indices, including test/retest
reliability, internal consistency, alternate form reliability,
intercoder (or interrater) reliability, or generalizability
coefficients.
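As an illustration of one such index, internal consistency is often estimated with Cronbach's alpha, sketched below in Python; the function name and the respondents-by-items data layout are illustrative assumptions:

    import numpy as np

    def cronbach_alpha(items):
        # items: (n_respondents x k_items) array of item responses.
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()   # sum of item variances
        total_variance = items.sum(axis=1).var(ddof=1)     # variance of total scores
        return (k / (k - 1)) * (1 - item_variances / total_variance)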
Experimental Designs
In this section, the discussion centers on five experimental
designs, and their variations, that are often utilized in
psychological research. These designs can be used either alone or in combination.
Basic Design
The basic design, also referred to as a treatment–control
posttest design, consists of two groups, a treatment group
and a control group. Only participants in the treatment
group receive a manipulation. Although the control group
generally receives no manipulation, other types of control
groups (e.g., placebo control or wait-list control) receive
manipulations for either ethical or practical reasons. An
assessment of dependent variables from both groups will
be conducted after an IV is manipulated. Part of the treatment-effects study described earlier used the basic design. This design can be depicted as having the following
structure:
Variant 1.1:

    R   Treatment group   X1   O
    R   Control group          O

or, when the control group receives an alternative manipulation such as a placebo (Variant 1.2):

    R   X1   O
    R   Xn   O

With a pretest added to each group (Variant 2.1):

    R   Opre   X1   Opost
    R   Opre        Opost

or (Variant 2.2):

    R   Opre   X1   Opost
    R   Opre   Xn   Opost
Factorial Design
When describing manipulation by inclusion, it was
shown that more than one independent variable can
be included in an experiment. Such a design is referred
to as a factorial design, in which IVs are labeled as
factors (e.g., treatment and gender of therapists), and
each factor has at least two levels (e.g., types of treatments or male and female therapists). The structure of
the design varies contingent on the number of factors
and the number of levels within each factor. The number of experimental conditions in the factorial design is
equal to the product of the number of levels of each
factor, although experimenters can employ fewer conditions if some have no theoretical interest or are difficult to implement. If there are four factors and two
levels within each factor, the number of experimental
conditions is 16 (2 × 2 × 2 × 2).
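This counting rule is easy to verify; a short, illustrative Python check follows (the factor names and levels are hypothetical):

    from itertools import product

    factors = {"treatment": ["drug", "placebo"],
               "therapist": ["male", "female"],
               "setting": ["clinic", "home"],
               "dose": ["low", "high"]}
    conditions = list(product(*factors.values()))
    print(len(conditions))   # 2 * 2 * 2 * 2 = 16 experimental conditions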
The simplest structure can be shown to consist of four
experimental conditions created by two factors (A and B)
and two levels (1 and 2) within each factor. Based on the
combination of factors and levels, participants in the experiment receive one of four treatments: XA1B1, XA1B2,
XA2B1, or XA2B2:
    R   XA1B1   O
    R   XA1B2   O
    R   XA2B1   O
    R   XA2B2   O

With pretests (Variant 3.1):

    R   Opre   XA1B1   Opost
    R   Opre   XA1B2   Opost
    R   Opre   XA2B1   Opost
    R   Opre   XA2B2   Opost
Solomon Four-Group Design
Pretest reactivity refers to a situation in which participants react to pretest measures, which subsequently distorts their performance on posttest measures. It occurs if the pretest measure influences participants' subsequent responses on the posttest
measure (i.e., a main effect of the pretest measure), or the
pretest measure interacts with the treatment and then
influences participants responses on the posttest measure
(i.e., an interaction effect). The structure of the design is
depicted as follows:
    Group 1:   R   Opre   X   Opost
    Group 2:   R   Opre       Opost
    Group 3:   R          X   Opost
    Group 4:   R              Opost
Longitudinal Design
Compared to the prior designs, a longitudinal design provides stronger evidence for internal validity. Recall from
the substance-abuse study that participants receiving the
treatment reported using cocaine on significantly fewer
days for the first 6 months, as compared to those who
received the placebo treatment; however, the treatment
effect diminished when the participants were retested
12 months after the interventions. Without the longitudinal design, the lack of a long-term benefit of the treatment would not have been known. The design consists of
multiple pretests and posttests over a period of time. The
numbers of pretests and posttests do not need to be the
same, and generally there are more posttests than
    R   Opre   Opre   X   Opost   Opost   Opost   Opost
    R   Opre   Opre       Opost   Opost   Opost   Opost
Practical constraints are often encountered when attempting to implement a study using a longitudinal
design. For instance, attrition is common in longitudinal
designs, so data for some participants are often incomplete. Furthermore, it is not clear from a theoretical
viewpoint how long a longitudinal design should be
conducted; as such, requiring participant involvement for an extended period of time may pose ethical
concerns.
Criteria to Evaluate
a Psychological Experiment
All of the characteristics of an experiment described
herein can be used as criteria to evaluate the findings
of a study. Specifically of interest are the reliabilities of
the measures, the internal validity, the construct validity,
the experimental design, and the external validity.
Note the intimate relationships among these criteria. In
the case of internal and external validity, it may be necessary to compromise the generalizability of an experiments results to the world outside the laboratory (external
validity) in order to ensure causation with a strong manipulation (internal validity). Realize that although the
experiment is a useful tool to further knowledge about
human behavior, it is rare that any one experiment can
possess all characteristics at optimum levels (e.g., internal
validity, construct validity). Therefore, when conducting
an experiment, it is appropriate to acknowledge the
stronger features of the design (e.g., manipulation and
random assignment) and the weaker aspects (e.g., weak
experimental realism and mundane realism) so that later
experiments can build on the solid components and
strengthen the limitations. Experiments can be conceptualized as small pieces of a larger puzzle whereby each
study provides a small understanding of a larger overarching human phenomenon. Other designs, such as the field
experiment, quasi-experiment, or correlational design,
would complement traditional experiments by offering
different pieces of the same puzzle.
Further Reading
Abdullaev, Y. G., and Posner, M. I. (1997). Time course of activating brain areas in generating verbal associations. Psychol. Sci. 8, 56–59.
Aronson, E., Ellsworth, P. C., Carlsmith, J. M., and Gonzales, M. H. (1990). Methods of Research in Social Psychology, 2nd Ed. McGraw-Hill, New York.
Chen, P. Y., and Krauss, A. D. (2004). Reliability. In Encyclopedia of Research Methods for the Social Sciences (M. Lewis-Beck, A. Bryman, and T. F. Liao, eds.). Sage Publ., Newbury Park, California.
Chen, P. Y., and Popovich, P. M. (2002). Correlation: Parametric and Nonparametric Measures. Sage Publ., Newbury Park, California.
Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of Item Response Theory. Sage Publ., Newbury Park, California.
Holland, P. W. (1986). Statistics and causal inference. J. Am. Statist. Assoc. 81, 945–960.
Kray, L. J., Thompson, L., and Galinsky, A. (2001). Battle of the sexes: Gender stereotype confirmation and reactance in negotiations. J. Personal. Social Psychol. 80, 942–958.
Lindsay, D. S. (1990). Misleading suggestions can impair eyewitnesses' ability to remember event details. J. Exp. Psychol. Learn. Mem. Cognit. 16, 1077–1083.
Lyskov, E., Sandstroem, M., and Mild, K. H. (2001). Neurophysiological study of patients with perceived electrical hypersensitivity. Int. J. Psychophysiol. 42, 233–241.
Pedhazur, E. J., and Schmelkin, L. P. (1991). Measurement, Design, and Analysis: An Integrated Approach. Lawrence Erlbaum, Hillsdale, New Jersey.
Rohsenow, D. J., Monti, P. M., Martin, R. A., Michalec, E., and Abrams, D. B. (2000). Brief coping skills treatment for cocaine abuse: 12-month substance use outcomes. J. Consult. Clin. Psychol. 68, 515–520.
Shadish, W. R., Cook, T. D., and Campbell, D. T. (2002). Experimental and Quasi-experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston.
Shavelson, R. J., and Webb, N. M. (1991). Generalizability Theory: A Primer. Sage Publ., Newbury Park, California.
Solomon, R. L., and Lessac, M. S. (1968). A control group design for experimental studies of developmental processes. Psychol. Bull. 70, 145–150.
Twenge, J. M., Baumeister, R. F., Tice, D. M., and Stucke, T. S. (2001). If you can't join them, beat them: Effects of social exclusion on aggressive behavior. J. Personal. Social Psychol. 81, 1058–1069.
Wood, R., and Bandura, A. (1989). Impact of conceptions of ability on self-regulatory mechanisms and complex decision making. J. Personal. Social Psychol. 56, 407–415.
Wortman, P. M. (1992). Lessons from the meta-analysis of quasi-experiments. In Methodological Issues in Applied Social Psychology (F. B. Bryant, J. Edwards, R. S. Tindale, E. J. Posavac, L. Heath, E. Henderson, and Y. Suarez-Balcazar, eds.), pp. 65–81. Plenum Press, New York.
Glossary
abuse of discretion The standard that an appellate court uses when reviewing a trial court judge's decision whether to allow a particular expert witness to testify.
Daubert/Kumho Tire test The Supreme Court's guidelines concerning factors to examine in assessing the reliability of expert witness testimony.
ecological fallacy The reliance on aggregate level data to
draw inferences about the behavior of individuals.
expert witness Someone qualified by knowledge, skill,
training, or education to express opinions about facts in
dispute in a case.
general acceptance test The guideline that expert analyses
and testimony must be based on principles and practices
generally accepted as reliable by others working in the same
field.
latitude, subject to the very deferential abuse-of-discretion standard of review by appellate courts. Not surprisingly, one appellate court judge, in a case (United States v. Smithers, 2000) involving expert testimony concerning eyewitness identifications, after referring to "a Daubert test," commented, "whatever that may be."
Figure 1  Percentage of votes for the African-American candidate (votes cast for Davis, %), Louisiana House of Representatives, District 21, 1995 run-off election, in all of the voting precincts in the district (both axes run from 0 to 100).
each set are compared. This simple procedure rarely contains sufficient coverage of a group's voters to constitute a valid jurisdiction-wide estimate of the divisions, however,
and therefore experts also perform analyses that are based
on all of the precincts.
The standard procedure for deriving estimates from all
of the precincts has been regression, typically the ordinary
least squares type, in which the measure of the vote for
a candidate is regressed onto the measure of the minority
presence in the precincts. The intercept and slope coefficients provide the information from which the respective estimates of group support are derived. When an
election analyzed is a contest in which only one person
may win and voters have only one vote to cast, the particular variant of regression most often employed is the
two-equation method, commonly called "double regression."
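As a sketch of the regression step only (not of any particular expert's full procedure), ordinary least squares can be run on precinct-level proportions; the intercept estimates support among voters outside the group, and the intercept plus the slope estimates support within it. The data and names below are hypothetical:

    import numpy as np

    def ecological_regression(group_share, vote_share):
        # OLS of candidate vote share on the group's share of each precinct.
        X = np.column_stack([np.ones_like(group_share), group_share])
        intercept, slope = np.linalg.lstsq(X, vote_share, rcond=None)[0]
        # Predicted support at 0% and at 100% group presence.
        return intercept, intercept + slope

    share = np.array([0.05, 0.20, 0.45, 0.70, 0.90])   # minority share of precinct
    votes = np.array([0.10, 0.22, 0.48, 0.71, 0.88])   # candidate's vote share
    other_support, group_support = ecological_regression(share, votes)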
The EI method developed by Gary King is the most
complex of the three. It supplements the information
employed in regression analyses by adding information
about the empirical limits, or bounds, in group support
within every precinct. It also does not assume that the
group divisions in the vote are constant across precincts,
as do the regression analyses. Maximum likelihood estimation is employed to derive a bivariate normal distribution, truncated by the bounds, of possible combinations of
support levels for a particular candidate by each group.
Tomography lines that reflect the racial composition of
precincts are inserted through the bivariate distribution to
identify the possible values for each specific precinct.
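The bounds themselves follow from simple accounting: if a group casts a share x of a precinct's votes and the candidate receives a share t overall, the group's support rate must lie in [max(0, (t − (1 − x))/x), min(1, t/x)]. A minimal sketch, with illustrative values:

    def support_bounds(x, t):
        # x: group's share of the precinct's voters; t: candidate's overall share.
        if x == 0:
            return (0.0, 1.0)   # group absent; its support rate is unconstrained
        lower = max(0.0, (t - (1 - x)) / x)
        upper = min(1.0, t / x)
        return (lower, upper)

    print(support_bounds(0.8, 0.5))   # (0.375, 0.625)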
Point estimates for the respective precincts are randomly
derived along the tomography line of possible values
Conclusion
Litigation is an adversarial process in which critical concepts under the law, such as racially polarized voting,
often become the subjects of great dispute. Social scientists serving as expert witnesses are engaged in a form of
applied research. This does not usually involve the theory
building and testing prevalent in their basic research,
but it does entail issues of definition and measurement
common to all empirical work. These issues are often at
the center of their analyses and testimony.
Further Reading
Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993).
Dixon, L., and Gill, B. (2001). Changes in the Standards for Admitting Expert Evidence in Federal Civil Cases since the Daubert Decision. Rand Institute for Civil Justice, Santa Monica, CA.
Engstrom, R. L. (1985). The reincarnation of the intent standard: Federal judges and at-large election cases. Howard Law Rev. 28, 495–514.
Frye v. United States, 54 App. D.C. 46 (1923).
Grofman, B. (2000). A primer on racial bloc voting analysis. In The Real Y2K Problem (N. Persily, ed.), pp. 43–81. Brennan Center for Justice, New York.
Johnson, M. T., Krafka, C., and Cecil, J. S. (2000). Expert Testimony in Civil Rights Trials: A Preliminary Analysis. Federal Judicial Center, Washington, DC.
King, G. (1997). A Solution to the Ecological Inference Problem. Princeton University Press, Princeton, NJ.
Kumho Tire Company v. Carmichael, 526 U.S. 137 (1999).
Thornburg v. Gingles, 478 U.S. 30 (1986).
United States v. Hall, 165 F.3d 1095 (1999).
United States v. Smithers, 212 F.3d 306 (2000).
Wuffle, A. (1984). A Wuffle's advice to the expert witness in court. PS: Polit. Sci. Polit. 17, 60–61.
C. Victor Bunderson
The EduMetrics Institute, Provo, Utah, USA
Glossary
design experiments Exhibit all aspects of a design study, except that, in seeking explanatory and design theories, reliance on narrative methods is supplemented with invariant measurement of the growth or change constructs spanning the domain. The measurement instruments evolve over the cycles of design, implementation, evaluation, and redesign, and come to embody an increasingly adequate descriptive theory of the processes operative in the domain. In addition, the technological devices designed to introduce and control the treatment effects are forthrightly described using the emerging layers and languages of technology in that domain. In experimental terminology, design experiments are quasi-experiments, but may include mini-randomized experiments within a larger cycle.
design research Includes design studies and design experiments, both of which build domain-specific descriptive
theories as well as design theories. Design research also
includes research on design methods or design theory, as
applied across two or more domains.
design studies Seek two kinds of theoretical knowledge: first,
a descriptive explanation of the processes operative in
a domain, and second, technological or design knowledge
about how to create and implement the tools, both measurement instruments and the treatment-control
technologies. These studies are attempts to discover new
artifact- and intervention-related principles or to improve
the effectiveness of existing artifacts or intervention plans.
Design studies take place in live settings, and are iterative,
cyclical applications of a process of principled design,
implementation, evaluation, and redesign. Design studies
often aid in exploring a domain and possible treatments,
and thus may be largely qualitative, producing narrative
accounts. These accounts may not provide adequate
[Figure 1: the Explore–Explain–Design triad of knowledge-producing activities.]
sciences) can then approach the ideal of invariant measurement, providing comparability from occasion to occasion and from group to group, so that unprecedented
progress can be made. However, progress will continue to
be limited in the human sciences and in design so long as
the metrics are incommensurable from study to study.
Explore
Early Scientific Exploration: Natural History
Before it is possible to create theories, we must have some
description of nature and the content to which theories
pertain. Natural history has been described as a type of
research involving the collection and cataloguing of specimens as an inventory of "what we've got." This becomes
a register of facts and a compilation of the contents of the
world. This way of knowing is contrasted with experimental inquiry into cause, which originated as natural philosophy, i.e., an explanation of causes rather than an
inventory.
Natural history studies have historically embraced anything that can be named and collected, though it is
a common misperception today that naturalistic studies
are confined to animals and plants. One of the most famous collectors, the founder of the British Museum, Sir
Hans Sloane (1660–1753), collected thousands of plant
and animal specimens, at the same time amassing collections of coins, medals, spoons, goblets, rings, minerals,
weaponry, combs, and a huge library of manuscripts.
Today, similar collections of human-made objects from
around the world compete in size with those of naturally
occurring phenomena. Sloane's collections, donated to the government, formed an important core of the British Museum. Sloane considered his major contribution to science the collection and accurate arrangement of these
curiosities.
Natural history explorations have a long history of rich
collectors assembling collections of oddities, gathered
mainly as conversation pieces and enhancements to social
status. However, natural history research expanded
greatly from approximately 1800 to 1900 and changed
in character. More serious and systematic collectors
emerged to gather and catalogue with the purpose of
arranging specimens in orderly tableaux that emphasized
gradation of similarities, suggested regular underlying
patterns, and identified gaps that supplied targets for further collection. The method of natural history research
became wide search, collection, description, cataloging,
preservation, preparation for display, and archiving. This,
of course, required that systems of measurement be devised for describing the magnitude of the different qualities of specimens. The search of natural historians
[Figure 2: a design/development process intervenes in a natural process to produce an artifact or intervention and desired outcomes.]
Design
What Is Design?
Design research is historically the least well known and
least well understood of the triad in Fig. 1, but interest
in design research is increasing. Tough-minded experimental, sampling, and survey designs are used, and other
aspects of the design of measurement instruments, including new types of items or questions and computer administration, are increasingly common, but the bulk of
these works take their perspective from the framework of
scientific research; the focus here is on discussing design
research in more detail to provide an alternative viewpoint. Design research is emphasized here as a distinct
knowledge-producing activity that discovers processes,
principles, and structural concepts essential for the production of the technological tools and devices used in
explorational research, explanatory research, and in
design research.
931
932
instrument development. Tryon explains this phenomenon by pointing out that test development is not as scientifically respected as hypothesis testing is, even though
test development entails construct validation, which is
a highly theoretical enterprise. The neglect of design as
a knowledge-producing activity has been changing slowly.
An Internet search on the term "design science" shows
that it has been incorporated forthrightly in fields as disparate as engineering design, public administration, and
education.
Relationships among Explore, Explain, and Design
that could produce an explanation of the effect. The discovery of medicines and medical procedures through the
ages has tended to follow this route. Even today, folk
remedies and natural healing concoctions derived from
locally found substances supply the beginning point for
the refinement of pharmaceuticals. Until recently, serendipity, or dogged persistence in trial-and-error, was
a major source of new drug discovery. Interestingly,
this is being replaced with principled anticipatory design
research.
Early in the industrial revolution, some manufacturers
proceeded by trial-and-error in the absence of direction
from explanatory science. The excellence of early
Wedgwood china traces its origins to tireless design experiments involving clay and various minerals from
Wedgwood's own farm. Writer Jenny Uglow has ascertained that ceramics "has always been a mix of science, design, and skill," and that every good potter was "in a sense an experimental chemist, trying out new mixes and glazes, alert to the impact of temperatures and the plasticity of clay." This, along with various patterns of aging, working,
and firing using a variety of glazes, ultimately produced
formulas that looked attractive and survived in their environment of use better than did the products of competitors. In some cases, then, exploratory research turns up
useful structures and processes that become fundamental
technologies. For example, the transistor effect was discovered by chance because of studies on the impurities
in crystal substances.
Explanatory Research Leads to
Exploratory Research
As explanatory theories gain support through scientific
research, inferences using the theory lead to the expectation of finding as-yet undetected natural phenomena.
The early stages of success of a theory constructed and
initially supported through explanatory research often
require additional exploratory research to develop further
support. Armed with predictions of theory, researchers
can go in search of specific phenomena that are predicted
but as yet not observed. Exploratory research motivated
by the explanatory theory of Mendeleyev's periodic table
of the elements led to the discovery of several elements,
most notably the trans-uranium elements, one of which
was named after Mendeleyev, in honor of his discovery.
Likewise, improved astronomical instruments developed
decades after Einstein's relativity theory allowed astronomers to observe the phenomenon of gravitational lensing, just as Einstein's theory had predicted.
Explanatory Research Leads to
Design Research
As descriptive theories gain support through scientific
research, using principles from the theories to exercise control to produce specific outcomes becomes a
a principled design experiment using invariant measurement scales. In publications extending back to 1963,
Donald Campbell and Julian Stanley, and later Thomas
Cook, William Shadish, and many others, have given
guidance much used over the years in how threats to
the validity of causal inferences can be reduced by
employing good experimental design. Classically, three
pre-experimental designs, three true experimental designs, and 10 quasi-experimental designs have been discussed. There has been a shift away from significance
testing in fairly recent times in favor of effect sizes, but
the need to consider threats to the validity of inference
has not abated. The simplest inference is that the introduction of treatment X did indeed cause the change in
observation (measure) O. The emphasis here is that there
is a need for design disciplines to assure both that measurement O and treatment X do indeed involve the
constructs of learning progression along the pathway to
greater expertise. Design theories to guide the development of instructional treatments can succeed far better to
the extent that they have a descriptive account of the
progress that learners typically follow in moving to higher
levels of knowledge and expertise. Figure 2 depicts a design intervention into a natural process. In learning and
instruction, the most useful explanatory account would
describe the sequence of progressive attainments in the
particular learning domain of interest. This descriptive
knowledge guides the design of both the measurement
instruments, to determine outcome effects (O), and the
instructional treatments (X). Instructional-design theories (a term elaborated in a series of books by Charles
Reigeluth) are now available to guide the development of
controlling technologies for the treatment. Designing X
and O in a principled manner can help give this design the
six aspects of construct validity discussed in numerous
publications by validity theorist Samuel Messick. The descriptive account (or theory) of progressive attainments in
learning the increasingly difficult tasks in a learning domain, along with construct-linked measurement scales of
learning and growth, along with a validity argument, can
assure that the O indeed measures valid levels of progress
in the construct. For simplicity, designs that require
adaptation to individual difference measures are not
discussed here. It is sufficient to note that good quasiexperimental and experimental designs exist for examining hypotheses of how treatments might differ for
Table I  A quasi-experimental structure for design experiments: an experimental group and control groups are compared across a baseline measure (O0) and five design cycles, each pairing a treatment revision (X1–X5) with an observation (O); separate control conditions control for cycles 1–5, for cycle 5, and for future cycles.
Conclusion
The three knowledge-producing activities (explore, explain, and design) have been placed into a common context in this article in hopes of distinguishing among the
different types of question each activity addresses; at the
same time, it has been shown that it is not research methodology that allows them to be separated. Contributions
of each form of knowledge-seeking to the other forms
have also been described, and examples have been
given to show that though exploratory research and explanatory research have longer formal histories than design research has, all three forms of research continue
today to contribute to unanswered questions, and none
has outlived its usefulness, especially not exploratory research, which continues today at what might be considered an accelerated, rather than a diminished, rate.
The types of research and knowledge-seeking have
been described in proportion to the measurement theory
and technique concerns addressed in this volume. In particular, measurement has been related to the health and
progress of design research and exploratory research to
correct what can be viewed as a current underemphasis
in those areas caused by living in the shadow of science.
The intent here has not been to isolate the three
enterprises from each other, but to show their essential
Further Reading
Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. J. Learn. Sci. 2(2), 141–178.
Campbell, D. T., and Stanley, J. (1963). Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin, Boston, MA.
Collins, A. (1992). Toward a design science of education. In New Directions in Educational Technology (E. Scanlon and T. O'Shea, eds.), pp. 15–22. Springer-Verlag, New York.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. Am. Psychol. 12, 671–684.
Cronbach, L. (1975). Beyond the two disciplines of scientific psychology. Am. Psychol. 30, 116–127.
Dewey, J. (1916). Democracy and Education. Free Press, New York. [Cited in Tanner, L. N. (1997). Dewey's Laboratory School: Lessons for Today. Teachers College Press, New York.]
Glossary
classical conditioning Also known as Pavlovian conditioning;
a form of learning that transfers or extends an existing
response to a stimulus to that of a new stimulus via the
temporal and/or spatial association of the two stimuli.
correlation A measure of covariation, measured on a scale from +1.00 (perfect positive association) through 0.00 (no association) to −1.00 (perfect negative association).
cortical arousal The level of activity in the brain, with high
levels of arousal indicative of mental alertness or readiness.
covariation The extent that two variables are associated, that
is, vary together.
dimension A continuous psychological variable.
electroencephalography Measures of electrical activity of the
brain, involving the analysis of various forms of brain waves.
excitation–inhibition The action and counteraction of cortical activity.
factor-analysis A statistical tool for summarizing the covariation between sets of variables; the patterns it yields may
point to underlying dimensions in the data.
genes The physical unit contained within the cells of living
organisms by which particular characteristics may be
passed from one generation to the next.
genetic That pertaining to genes.
hereditary Refers to characteristics passed from one generation to the next via genes.
limbic system A section of the midbrain associated with the
regulation of emotions, memory, feeding, and sexuality.
neurosis A term loosely applied to a relatively mild set of
mental disorders, often involving anxiety, phobias, compulsions, and depression.
psychosis A term loosely applied to the more severe mental
disorders, including schizophrenia and manic depression.
which these differences determined conditioned learning, and the net effects of this interaction. For example,
introverts were thought to be far more responsive than
extraverts, learning quicker, better, and for longer; as
a consequence, Eysenck suggested that introverts also
tended to have a more developed sense of morality and
a greater capacity for academic achievement.
Eysenck drew many other implications from his personality theories, suggesting that various forms of social
distress were related to extreme positions on at least one of
these dimensions. Since the early 1950s, he collaborated
with a number of other researchers on pioneering twin
and family studies on the inheritance of personality (indicating a heritability of at least 50%), as well as writing on the genetic basis for intelligence, sexual behavior,
crime, and political attitudes. As the implications were
tested, the results served to elaborate or modify an evolving theoretical framework. By the mid-1960s, the formalism of Hull that inspired earlier work had fallen out of
favor. After some disappointing results in experimental
conditioning work, Eysenck revised the biological basis of
I–E by linking it to cortical arousal thresholds in the brainstem's activation systems, with N related to limbic
system activity.
Several shifts were made to the content of Eysenck's
three personality dimensions in order to maintain their
theoretical coherence, independence, and explanatory
power. High N was finally made up of traits such as anxiety, depression, guilt, and tension, with the opposite pole
being stability. The extraversion end of I–E was characterized by sociability, assertiveness, and sensation-seeking
and the introversion end by low levels of these traits. High
P was defined by aggression, coldness, and impulsivity,
with the opposite being control. Eysenck made an initial
distinction between dysthymic and hysteric neuroticism
that related to introversion–extraversion. However, this
distinction was later dropped as these terms disappeared
from psychiatric discourses. The P dimension was less
theoretically driven and never enjoyed a clearly articulated biological basis. At the extreme, P was initially associated with schizophrenia and manic–depressive illness.
However, P was empirically reworked so that high P
became more indicative of the sociopathy of current psychiatric nomenclature. The high ends of the N and P
dimensions were associated with psychopathology.
Neither extreme on the I–E dimension per se carried
quite the same implications, although scores on this
dimension helped characterize the kind of psychiatric
symptomology that extreme scores on the other scales
indicated.
In a bid to provide standardized measures for his personality dimensions, Eysenck developed successive versions of a relatively short, accessible questionnaire. It first
appeared in 1959 as the Maudsley Personality Inventory
(measuring I–E and N). With considerable input from his
Intelligence
Eysenck's interest in intelligence research started late in his career, partly because his mentor Burt had already carved out the area so assiduously. As a result, his contribution was bound to be overshadowed by his earlier, more definitive work on personality. In the late 1960s, Eysenck proposed a structural model for intelligence similar to Guilford's model that aimed to avoid the circular problems of its psychometric derivation. Throughout his career, he remained committed to a version of the g concept, a notion originated by his intellectual predecessor, Charles Spearman.
Political Attitudes
Eysenck extended his success in getting a grip on personality via factor-analysis into the political realm. Although
he published several more papers in the 1960s and 1970s,
his 1954 book The Psychology of Politics remained his
major statement in the area. According to Eysenck, social
and political attitudes can be organized into two bipolar
dimensions. One dimension followed the traditional means for differentiating left–right political ideology: Radical versus Conservative. However, Eysenck labeled
the second dimension Tough versus Tender-Mindedness
(T), following the classic distinction of William James.
This produced a four-quadrant space, the most provocative implication being that the political extremes of
Fascism and Communism were separated by ideology
but were similar in terms of personal style. Eysenck linked
this conclusion with postwar research on Authoritarianism, controversially arguing that Adorno et al.'s F concept
was practically synonymous with T. For Eysenck, this
balanced out the political picture, explaining the "same but different" paradox he had witnessed in the volatile
politics of prewar Germany.
Eysenck's work on political attitudes met with considerable hostility from liberal researchers, especially in the
United States. He engaged in a somewhat acrimonious,
highly technical debate with Rokeach and Hanley, and
with Christie, over the reality of left-wing authoritarianism and the adequacy of his factor-analytically derived
measures. As he struggled to obtain a pure, independent
Further Reading
Claridge, G. (1998). Contributions to the history of psychology. CXIII. Hans Jurgen Eysenck (4 March 1916–4 September 1997), an appreciation. Psychol. Rep. 83, 392–394.
Corr, P. J. (2000). Reflections on the scientific life of Hans Eysenck. History Philos. Psychol. 2, 18–35.
Eaves, L. J., Eysenck, H. J., and Martin, N. G. (1989). Genes, Culture and Personality: An Empirical Approach. Academic Press, New York.
Eysenck, H. J. (1991). Personality, stress and disease: An interactionist perspective (with commentaries and response). Psychol. Inquiry 2, 221–232.
Eysenck, H. J. (1997). Rebel with a Cause: The Autobiography of Hans Eysenck. Transaction, New Brunswick, NJ.
Eysenck, H. J., and Kamin, L. (1981). The Intelligence Controversy. Wiley, New York.
Eysenck, H. J., and Nias, D. K. B. (1982). Astrology: Science or Superstition? Temple Smith, London.
Furnham, A. (1998). Contributions to the history of psychology. CXIV. Hans Jurgen Eysenck, 1916–1997. Percept. Motor Skills 87, 505–506.
Gibson, H. B. (1981). Hans Eysenck: The Man and His Work. Peter Owen, London.
Gray, J. (1997). Obituary: Hans Jurgen Eysenck (1916–97). Nature 389, 794.
Jones, G. (1984). Behaviour therapy: An autobiographic view. Behav. Psychother. 12, 7–16.
Modgil, S., and Modgil, C. (eds.) (1986). Hans Eysenck: Consensus and Controversy. Falmer Press, London.
Nyborg, H. (ed.) (1997). The Scientific Study of Human Nature: Tribute to Hans J. Eysenck at Eighty. Pergamon, Oxford, UK.
Pelosi, A. J., and Appleby, L. (1992). Psychological influences on cancer and ischaemic heart disease. Br. Med. J. 304, 1295–1298.
Wiggins, J. S. (ed.) (1996). The Five-Factor Model of Personality: Theoretical Perspectives. Guilford, New York.
Preface
discipline-specific. Some preferences can be linked to a specific field of study or research topic; others, related to time
and location, coincide with how new ideas and advances in
technology are shared. Sometimes we don't even agree
on what is the appropriate question we should try to answer!
Although our views differ on what is ideal, and even on
what are the appropriate standards for assessing measurement quality, social scientists generally do agree that the
following five issues should be considered:
1. We agree on the need to be clear about the scope and
purpose of our pursuits. The benchmarks for
evaluating success differ depending on whether
our intent is to describe, explain, or predict and
whether we focus extensively on a single subject or
case (e.g., person, family, organization, or culture) or
more generally on patterns among many cases.
2. We agree on the need to make assurances for the
ethical treatment of the people we study.
3. We agree on the need to be aware of potential
sources of measurement error associated with our
study design, data collection, and techniques of
analysis.
4. We agree it is important to understand the extent to
which our research is a reliable and valid measure of
what we contend. Our measures are reliable if they
are consistent with what others would have found in
the same circumstances. If our measures also are
consistent with those from different research circumstances, for example in studies of other behaviors
or with alternate measurement strategies, then
such replication helps us to be confident about the
quality of our efforts. Sometimes we'd like the results
of our study to extend beyond the people
and behavior we observed. This focus on a wider
applicability for our measures involves the issue of
generalizability. When we're concerned about an accurate portrayal of reality, we use tools to assess validity. When we don't agree about the adequacy
of the tools we use to assess validity, sometimes the
source of our disagreements is different views on
scientific objectivity.
5. We also agree that objectivity merits consideration,
although we don't agree on the role of objectivity or
our capabilities to be objective in our research. Some
social scientists contend that our inquiries must be
objective to have credibility. In a contrasting view of
social science, or epistemology, objectivity is not possible and, according to some, not preferable. Given
that we study people and are human ourselves, it is
important that we recognize that life experiences
necessarily shape the lens through which people
see reality.
Besides a lack of consensus within the social sciences,
other skeptics challenge our measures and methods. In
what some recently have labeled the "science wars," external critics contend that social scientists suffer "physics envy" and that human behavior is not amenable to scientific investigation. Social scientists have responded to antiscience sentiments from the very beginning, such as Emile Durkheim's efforts in the 19th century to identify
social facts. As entertaining as some of the debates and
mudslinging can be, they are unlikely to be resolved anytime soon, if ever. One reason that Lazarsfeld and
Rosenberg contend that tolerance and appreciation for
different methodological pathways make for better science
is that no individual scientist can have expertise in all the
available options. We recognize this now more than ever, as
multidisciplinary teams and collaborations between scientists with diverse methodological expertise are commonplace, and even required by some sources of research
funding.
Meanwhile, people who can be our research subjects
continue to behave in ways that intrigue, new strategies are
proffered to reduce social problems and make life better,
and the tool kits or arsenals available to social scientists
continue to grow. The entries in these volumes provide
useful information about how to accomplish social measurement and standards or rules of thumb. As you learn
these standards, keep in mind the following advice from
one of my favorite methodologists: "Avoid the fallacy fallacy. When a theorist or methodologist tells you you cannot do something, do it anyway. Breaking rules can be fun!" (Hirschi, 1973, pp. 171–172). In my view nothing could be
more fun than contemporary social science, and I hope this
encyclopedia will inspire even more social science inquiry!
In preparing this encyclopedia the goal has been to
compile entries that cover the entire spectrum of measurement approaches, methods of data collection, and techniques of analysis used by social scientists in their efforts
to understand all sorts of behaviors. The goal of this project
was ambitious, and to the extent that the encyclopedia is
successful there are many people to thank. My first thank
you goes to the members of the Executive Advisory Board
and the Editorial Advisory Board who helped me to identify
my own biased views about social science and hopefully to
achieve greater tolerance and appreciation. These scientists helped identify the ideal measurement topics, locate
the experts and convince them to be authors, review drafts
of the articles, and make the difficult recommendations
required by time and space considerations as the project
came to a close. My second thank you goes to the many
authors of these 356 entries. Collectively, these scholars
represent well the methodological status of social
science today. Third, I thank the many reviewers whose
generous recommendations improved the final product.
In particular I extend my personal thanks to colleagues
at the University of Texas at Dallas, many of whom participated in large and small roles in this project, and all of
whom have helped me to broaden my appreciation of social
KIMBERLY KEMPF-LEONARD
Factor Analysis
Christof Schuster
University of Notre Dame, Indiana, USA
Ke-Hai Yuan
University of Notre Dame, Indiana, USA
Glossary
Introduction
$\Sigma = \Lambda\Phi\Lambda' + \Psi$, with conditional expectation $E(x \mid \xi) = \Lambda\xi$; the matrix of covariances between the variables and the factors,

$$\mathrm{Cov}(x, \xi) = \Lambda\Phi, \qquad (3)$$

is called the factor structure. Specializing this expression to a variable–factor pair yields $\mathrm{Cov}(x_i, \xi_j) = \lambda_{i1}\phi_{1j} + \cdots + \lambda_{iq}\phi_{qj}$. The covariances of the factor structure matrix do not control for other factors. This means that even if a particular variable has a zero loading on, say, the first factor, it still can be correlated with the first factor because of indirect effects. Specifically, if the variable loads on a factor that is also correlated with the first factor, the variable will be correlated with the first factor despite its zero loading on this factor. Also note that for standardized variables, the factor structure matrix gives the correlations rather than the covariances between the factors and the variables.
An important special case occurs if the factors are uncorrelated, that is, $\Phi = I$. In this case, the factors are said to be orthogonal and several of the preceding equations simplify. First, the communality of the ith variable reduces to $h_i^2 = \lambda_{i1}^2 + \cdots + \lambda_{iq}^2$. Second, the factor structure matrix [see Eq. (3)] is equal to the factor loading matrix, that is, $\mathrm{Cov}(x, \xi) = \Lambda$. Thus, if the factors are orthogonal, a zero loading will imply a zero covariance between the corresponding variable–factor pair. Again, note that for standardized variables, the $\lambda$ parameters represent correlations rather than covariances between variable–factor pairs.
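These identities are easy to verify numerically; the following is a minimal Python sketch with arbitrary illustrative values (the loading matrix and uniquenesses are not taken from any data set in this article):

    import numpy as np

    Lam = np.array([[0.8, 0.1],    # loadings: p = 3 variables, q = 2 factors
                    [0.7, 0.2],
                    [0.1, 0.9]])
    Phi = np.eye(2)                           # orthogonal factors, Phi = I
    Psi = np.diag([0.3, 0.4, 0.2])            # uniquenesses
    Sigma = Lam @ Phi @ Lam.T + Psi           # implied covariance matrix
    h2 = np.diag(Lam @ Phi @ Lam.T)           # communalities; with Phi = I these
                                              # are the row sums of squared loadings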
legitimate set of factors, with $\Lambda^* = \Lambda T^{-1}$. This follows because the assumptions of the factor model are also fulfilled by the new factors $\xi^* = T\xi$ and the fact that the conditional expectation of the variables is invariant to this transformation. This is easily seen by noting that $E(x \mid \xi^*) = \Lambda^*\xi^* = (\Lambda T^{-1})(T\xi) = \Lambda\xi = E(x \mid \xi)$. When the factors are uncorrelated, the set of legitimate transformation matrices $T$ is limited to orthogonal matrices, which fulfill the condition $TT' = I$. In order to obtain a solution for the factor loadings, it is desirable to remove this indeterminacy. This can be achieved if the factors are uncorrelated, that is, $\Phi = I$, and if the so-called canonical constraint, which requires $\Lambda'\Psi^{-1}\Lambda$ to be diagonal, is satisfied.
Having imposed sufficient model restrictions to define the parameters uniquely, the degrees of freedom (df) can be determined. The degrees of freedom are equal to the difference between the $p(p+1)/2$ freely varying elements in the unconstrained population covariance matrix $\Sigma$ and the number of unrestricted model parameters. The degrees of freedom characterize the extent to which the factor model offers a simple explanation of the correlations among the variables. A necessary condition for the identification of the model parameters is $df \geq 0$. Clearly, if the factors are uncorrelated, all model parameters are either loadings or uniquenesses and the total number of model parameters is $pq + p$. Because the canonical constraint introduces $q(q-1)/2$ restrictions on the model parameters, the degrees of freedom are

$$df = \frac{p(p+1)}{2} - \left[pq + p - \frac{q(q-1)}{2}\right] = \frac{1}{2}\left[(p-q)^2 - (p+q)\right]. \qquad (4)$$
Estimation procedures may yield negative estimates
for the $\psi$ parameters that are outside the permissible
range. Such inadmissible solutions are commonly referred to as Heywood cases. Heywood cases occur
quite frequently in practice. A simple strategy of dealing
with negative uniquenesses is to set them equal to zero.
However, this strategy implies an unrealistic model characteristic, namely, that the factors perfectly explain the
variation of the variables having zero uniquenesses. Finally, note that methods for estimating the factor loadings
assume that the number of factors is known.
Rotating Factors to Simple Structure
Because the estimated factor loadings are based on arbitrary constraints used to define uniquely the model parameters, the initial solution may not be ideal for interpretation. Recall that any factor transformation $\xi^* = T\xi$ for which $\mathrm{diag}(T\Phi T') = I$ is an equally legitimate solution to Eq. (1). To simplify interpretation, it is desirable to rotate the factor loading matrix to simple structure, which has been defined in terms of five criteria: (1) each row of $\Lambda$ should have at least one zero; (2) each of the q columns of $\Lambda$ should have at least q zeros; (3) for every pair of columns of $\Lambda$, there should be several variables with a zero loading in one column but not in the other; (4) for every pair of columns of $\Lambda$, there should be a considerable proportion of loadings that are zero in both columns if $q \geq 4$; and (5) for every pair of columns of $\Lambda$, only few variables should have nonzero loadings in both columns. Rotation techniques differ according to their emphasis on particular simple structure criteria.
Generally, a distinction is made between orthogonal
and oblique rotation techniques, which yield uncorrelated
or correlated factors, respectively. If the substantive concepts identified by the factors are related, correlated factors are appealing because they allow for a more realistic
representation of the concepts, as compared to orthogonal
factors.
One of the most popular orthogonal rotation methods
is the varimax approach. This approach aims at finding
a loading pattern such that the variables have either large
(positive or negative) loadings or loadings that are close to
zero. The varimax approach tries to accomplish this loading pattern by maximizing the variance of the squared
loadings for each factor. Another popular orthogonal rotation technique is the quartimax rotation, which
maximizes the variance of the squared factor loadings
in each row.
One of the most common oblique rotation techniques
is the promax approach. This approach improves the loading pattern obtained from an orthogonal rotation in the
sense of further increasing large loadings and further
decreasing small loadings. Varimax rotation is commonly
used as prerotation to promax. The promax approach
accomplishes this goal in two steps. First a target matrix
is obtained from the normalized loading matrix by replacing each factor loading by its kth power. For even powers,
the signs of the loading matrix elements carry over to the
corresponding target matrix elements. Common values for
k are 3 and 4. Second, the orthogonal factors are rotated
such that the variable loadings are, in the least-squares
sense, as close as possible to the corresponding elements
of the target matrix.
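As an illustration of orthogonal rotation, Kaiser's iterative varimax algorithm can be written compactly with a singular value decomposition. The following Python function is a sketch, not the routine of any particular statistics package:

    import numpy as np

    def varimax(L, tol=1e-8, max_iter=500):
        # Find an orthogonal rotation R maximizing the variance of the
        # squared loadings within each column of L @ R.
        p, q = L.shape
        R = np.eye(q)
        criterion_old = 0.0
        for _ in range(max_iter):
            Lr = L @ R
            G = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
            U, s, Vt = np.linalg.svd(G)
            R = U @ Vt
            criterion = s.sum()
            if criterion_old != 0 and criterion / criterion_old < 1 + tol:
                break
            criterion_old = criterion
        return L @ R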
Table I  Artificial data set: standardized scores of 10 individuals (observations 01–10) on six performance measures (x1–x6): an algebra problem, a trigonometry problem, a logic puzzle, a crossword puzzle, a word recognition task, and a word completion task.
Example
Generally, the results from a factor analysis of a correlation
matrix and the corresponding covariance matrix are not
identical. When analyzing a covariance matrix, variables
having large variance will influence the results of the
analysis more than will variables having small variance.
Because the variances of the variables are intrinsically
linked to the measurement units, it is preferable to
analyze standardized variables, which is equivalent to fitting the factor model based on the correlation matrix, if
the variables have been measured using different units.
Factor analysis can be illustrated using the artificial
data set given in Table I. The data set contains standardized performance scores of 10 individuals obtained from
an algebra problem, a trigonometry problem, a logic
puzzle, a crossword puzzle, a word recognition task,
and a word completion task. The correlation matrix of
these six variables is

$$R = \begin{pmatrix}
1.0 & 0.7 & 0.7 & 0.5 & 0.5 & 0.3\\
0.7 & 1.0 & 0.7 & 0.3 & 0.3 & 0.1\\
0.7 & 0.7 & 1.0 & 0.4 & 0.4 & 0.2\\
0.5 & 0.3 & 0.4 & 1.0 & 0.7 & 0.6\\
0.5 & 0.3 & 0.4 & 0.7 & 1.0 & 0.5\\
0.3 & 0.1 & 0.2 & 0.6 & 0.5 & 1.0
\end{pmatrix}.$$
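The eigenvalues of this matrix, which underlie the scree plot below, can be obtained directly; a minimal Python sketch using the matrix above:

    import numpy as np

    R = np.array([[1.0, 0.7, 0.7, 0.5, 0.5, 0.3],
                  [0.7, 1.0, 0.7, 0.3, 0.3, 0.1],
                  [0.7, 0.7, 1.0, 0.4, 0.4, 0.2],
                  [0.5, 0.3, 0.4, 1.0, 0.7, 0.6],
                  [0.5, 0.3, 0.4, 0.7, 1.0, 0.5],
                  [0.3, 0.1, 0.2, 0.6, 0.5, 1.0]])
    eigenvalues = np.linalg.eigvalsh(R)[::-1]   # largest first
    print(eigenvalues.round(2))                 # a sharp drop after the first two
                                                # eigenvalues suggests q = 2 factors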
[Figure 1: scree plot of the eigenvalues of the correlation matrix R; the eigenvalue axis runs from about 0.5 to 3.0.]
[Table: factor loadings of the six variables (x1–x6) on the two factors for the unrotated, varimax-rotated, and promax-rotated solutions.]
Figure 2  Loadings of variables on unrotated and rotated axes. The left-hand panel depicts the loadings with respect to the unrotated and varimax-rotated factors. The right-hand panel depicts the loadings with respect to the unrotated and promax-rotated axes.
[Table: Bartlett-method factor score estimates for the 10 observations on the two factors.]
Further Reading
Bartholomew, D. J., and Knott, M. (1999). Latent Variable
Models and Factor Analysis, 2nd Ed. Arnold, London.
Falsification in Social
Science Method and Theory
Bryan Benham
University of Utah, Salt Lake City, Utah, USA
Charles Shimp
University of Utah, Salt Lake City, Utah, USA
Glossary
Bayesian inference A use of Bayes' theorem relating
conditional and unconditional probabilities, by which it
has sometimes been hoped to interrelate objective and
subjective probabilities.
falsification Any method by which scientific claims are
evaluated by empirical disconfirmation.
induction Any method for confirming general scientific laws
or theories by appealing to the accumulation of specific
experimental observations.
methodological pluralism The position according to which
diverse methods, perhaps having different ontological
implications, may nevertheless be legitimate means of
discovering scientific truths.
positivism A philosophy of science according to which
scientific observation is independent of theoretical commitments and science does and should rest on a secure
empirical foundation.
Quine–Duhem thesis The position according to which the
meaning or truth of statements cannot be determined
individually, but only holistically.
scientific method Any means by which scientific truths are
reliably obtained and scientific claims are evaluated.
underdetermination The condition in which two or more
mutually inconsistent theories can equally account for the
same data and for which evidence alone does not determine
theory selection.
can rest on a secure foundation, provided that its hypotheses and theories are required to be subject to possible
empirical falsification. Contemporary views of falsification vary greatly. In the philosophy of science, it is seen
chiefly in logical and historical terms. In introductory social science texts on methodology, it is often seen as a way
to reduce scientists' self-imposed biases that reflect theoretical preferences. Accordingly, in texts on rigorous,
quantitative methodology, falsification tends to be praised
either explicitly or implicitly as an essential part of the
scientific method. Alternatively, in texts on qualitative
research methods, it is sometimes described as a largely
unhelpful or even counterproductive method. In social
science practice, there is often an appeal to informal falsification, along with parsimony, as key evaluative criteria
by which theories and empirical claims about truth can be
evaluated. In theories of choice, decision making, and
rational judgment, humans are seen as hypothesis-testing
organisms who, in some contexts, may use an intuitive
form of falsification. In mathematical statistics, many alternatives have been developed in response to the somewhat counterintuitive, negative logic of falsification.
Similarly, the philosophical context from which falsification is viewed has broadened, so that to some scholars,
positivism, within which falsification developed, is often
seen as only a historical phase in the development of the
philosophy of science. To other scholars, the scientific
method is seen as successfully applying to the physical
sciences but not to the social sciences. Most radically,
some scholars see the scientific method within which
falsification is defined as an unworkable method and in
any case as a political tool. This constructionist and historicist view is compatible with the increasing use of
narratives and other qualitative methods in sociological, anthropological, and psychological research.
Origins of Falsification
Induction and Scientific Method
Falsification originated in part as a response to various
problems raised by earlier views of scientific methods
based on induction. Induction has the intuitively desirable
property that data that agree with a scientific hypothesis
support or confirm that hypothesis. Few claims about
scientific method seem more natural to a beginning science student. For example, suppose it is predicted that it
will rain if the temperature falls below the dew point. To
many students, it seems reasonable that there is confirmation of the prediction when the temperature falls below
the dew point and it indeed starts to rain. From the perspective of induction, science progresses by the formulation of a hypothesis and the collection of sufficient
empirical data that agree with the hypothesis. When accumulated evidence consistent with the hypothesis is sufficient, the hypothesis is confirmed. Although this account
is oversimplified, insofar as induction exists in sophisticated forms, the goal of induction may be simply stated as
the confirmation of scientific claims by the accumulation
of data that agree with those claims.
Falsification as a Solution to
the Problems of Induction
Falsification aims to overcome these problems with induction. According to falsification, the hallmark of scientific methodology is not that it uses observation or
empirical evidence to verify or confirm its hypotheses.
After all, many nonscientific practices, e.g., astrology,
also employ this strategy. Rather, according to falsificationists, what makes science unique is that its claims are
open to empirical falsification. What makes a generalization such as "All crows are black" genuinely scientific is not
that there is a huge amount of observational evidence in its
support, but that we know what type of evidence would
count decisively against it; namely, the observation of
only one nonblack crow would falsify it. Although such
Instead, other criteria are needed. Some candidates include considerations of parsimony (e.g., Occam's razor),
consistency with already accepted theory, productivity,
and predictive power, each of which has its own methodological problems. More insidious problems arise for theory selection if scientific method is viewed from
a historicist or sociological view, according to which science involves metatheoretical, arbitrary, possibly irrational, and ideological or political features. From this view,
falsification does not provide a methodology from which
to escape these idiosyncratic features of scientific
research methodology.
Falsification in Contemporary
Social Science Research
The Persistence of Falsification
Despite the problems with falsification, it retains
a prominent position in social science. One reason why
falsification, in one form or another, continues to be used
may be that it so effectively captures a widespread view of
science: Virtually everyone seems to agree that, in some
sense, empirical scientific claims should be subject to
refutation by empirical observation. In theory evaluation,
in research methods, and especially in statistical methods,
falsification is often held up as an essential component of
scientific method. Falsification remains prominent even
in cases in which it is the object of criticism, in part for
reasons already described and in part because some assert
that it does not apply constructively to qualitative and
interpretive methods that are increasingly employed in
the social sciences. The proper place for falsification
in social science research is therefore still a matter of
some debate.
Alternative Methods
Not all social scientists are willing to accept falsification
as an essential part of scientific method, for reasons
previously described. Others reject falsification and
associated views about rigorous hypothesis testing
in accordance with the rational logic of scientific hypothesis testing. For example, Tversky and Kahneman have
argued, especially in terms of base-rate neglect, that
human categorical judgment displays large-scale irrationality. In another series of experiments to determine
whether humans are naturally Popperian falsificationists,
Wason presented subjects with a selection task aimed at
testing an abstract rule. The result was that participants
demonstrated a strong bias toward selecting potentially
confirming evidence, and not equally available potentially
disconfirming evidence. However, others have argued
that rationality in these types of experiments must be
interpreted in terms of cognitive processes evolution
has provided, not in terms of the logic of falsificationist
hypothesis testing. Thus, Tooby and Cosmides showed
that participants in the Wason selection task actually
did use something like falsificationist logic when the
rule to be tested was placed in a more naturalistic setting, involving detecting cheaters in a social setting, not
as a form of abstract rule testing. These types of experiments are provocative in what they can tell us about
the standards of scientific methodology and research
design.
Methodological Pluralism
The acceptance of falsification has been so pervasive in
some circles that it has caused concern among researchers, who have felt its weaknesses have been
underestimated and that domination by a single method
is unhealthy. Alternative methods less committed to falsification and more in line with the tradition of induction
seem to be finding growing acceptance in some parts of
social science. The increasing use of narrative and other
qualitative methods in sociological, anthropological, and
psychological research raises controversial questions
about the feasibility and appropriateness of methodological pluralism. Legitimate scientific method may not be
a unified system relying heavily on falsification in the
context of hypothesis testing, but may involve a great diversity of methods that may or may not be consistent with
one another. How the tension between these competing
approaches and the traditional view of scientific method
will be resolved remains to be seen.
Acknowledgment
Further Reading

Cosmides, L., and Tooby, J. (1992). Cognitive adaptations for social exchange. In The Adapted Mind: Evolutionary Psychology and the Generation of Culture (J. H. Barkow et al., eds.), pp. 163–228. Oxford University Press, Oxford, England.
Duhem, P. (1906/1954). The Aim and Structure of Physical Theory (translated by P. P. Wiener). Princeton University Press, Princeton, New Jersey.
Feyerabend, P. K. (1987). Farewell to Reason. Verso, London.
Gigerenzer, G. (1996). The psychology of good judgment: Frequency formats and simple algorithms. Med. Decis. Making 16, 273–280.
Hanson, N. R. (1969). Perception and Discovery. Freeman, Cooper, San Francisco.
Hume, D. (1739–1740/2000). Treatise on Human Nature. Dent, London.
Kuhn, T. S. (2000). The Road since Structure. University of Chicago Press, Chicago.
Leahey, T. H. (1992). A History of Psychology, 3rd Ed. Prentice Hall, Englewood Cliffs, New Jersey.
Nevin, J. A. (1969). Interval reinforcement of choice behavior in discrete trials. J. Exp. Anal. Behav. 12, 875–885.
Glossary
federalism A system of government in which national and
local units possess autonomous domains of decision making.
fiscal federalism The division of responsibility for public
services, taxation, and debt across multilevel jurisdictions.
general-purpose government A government, such as a state,
municipal, county, or township government, that performs
a variety of functions; in contrast to school districts or special
districts, which perform only one or a few specific roles.
obligation In U.S. budgeting terms, a transaction that may
either be paid in the present period or require payment in
a future period.
Introduction
In the Federalist Papers, James Madison characterized
the U.S. Constitution as a mixture of "federal" and "national" elements. For Madison, "federal" meant decentralized, but more recently, students of intergovernmental relations have used the word "federalism" to refer to
a system of government in which national and local
units divide sovereignty, each level possessing its own
decision-making domain. There are many such nations
in the world today, including Argentina, Australia, Austria,
Brazil, Canada, Germany, India, Malaysia, Mexico,
agencies to state and local government entities only during a given fiscal year. These data are presented at
a higher level of aggregation compared to the FAADS
data, but are useful for those who are interested in annual
federal grant aid to governments, rather than to individuals or private organizations.
Consolidated Federal Funds Report
The Consolidated Federal Funds Report (CFFR) is published annually by the U.S. Census Bureau and includes
data (based on CFDA classifications, with records from
1993 to present available online) on distribution of federal
funds to local areas at least down to the county level
(sometimes down to the city level). This publication combines data from the FAADS, the Federal Aid to the States
report, and several other (relatively minor) sources. An
online query system at the Census Bureau web site allows
users to search for data by geographic area and by agency
(and subagency if applicable). This search produces summary tables with program names, program codes, and
annual dollar figures. Consolidated data files are also
available for download in comma-delimited format.
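Because the consolidated files are comma-delimited, they can be processed with standard tools. The sketch below is hypothetical: the file name and column names are placeholders, since the actual CFFR record layout varies by year and is not reproduced here.

import pandas as pd

# Hypothetical file and column names; consult the CFFR documentation
# for the actual record layout of a given year's download.
cffr = pd.read_csv("cffr_2001.csv")

# Total federal dollars by county and by agency for the fiscal year.
totals = (cffr.groupby(["county", "agency"])["amount"]
              .sum()
              .sort_values(ascending=False))
print(totals.head())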
Public Employment
Here the census includes statistics on employment and payrolls at several different levels of aggregation: nationally by
function, by states, and by county area. It also contains detailed data on major local governments, which include all counties, general-purpose governments with populations greater than 25,000, and school districts with more than 7500 students.
Government Finances
Of most interest for those studying intergovernmental
public finance, this section of the census details all revenue, expenditure, debt, and financial assets of
governments for the fiscal year. Within these broad categories, data are further broken down into very specific
classifications. A scholar interested in transportation expenditures, for example, encounters spending data for
highways, air transportation, parking facilities, sea and
inland port facilities, and transit subsidies.
Further Reading

Collet, C. (1997). State Legislative Election Candidate and Constituency Data, 1993–1994. ICPSR 2019. Inter-University Consortium for Political and Social Research, Ann Arbor, Michigan.
Erikson, R. S., Wright, G. C., and McIver, J. P. (1993). Statehouse Democracy: Public Opinion and Policy in the American States. Cambridge University Press, New York.
Inglehart, R., et al. (2000). World Values Surveys and European Values Surveys 1981–1984, 1990–1993, 1995–1997. ICPSR 2790 (electronic file). Inter-University Consortium for Political and Social Research, Ann Arbor, Michigan.
Inter-University Consortium for Political and Social Research (1992). State Legislative Election Returns in the United States, 1968–1989. ICPSR 8907. Inter-University Consortium for Political and Social Research, Ann Arbor, Michigan.
King, G., Palmquist, B., Adams, G., Altman, M., Benoit, K., Gay, C., Lewis, J. B., Mayer, R., and Reinhardt, E. (1998). Record of American Democracy, 1984–1990. ICPSR 2162. Inter-University Consortium for Political and Social Research, Ann Arbor, Michigan.
Rodden, J. (2002). The dilemma of fiscal federalism: Grants and fiscal performance around the world. Am. J. Pol. Sci. 46(3), 670–687.
Roper Center for Public Opinion Research (2001). Social Capital Benchmark Survey, 2000 (electronic file). Saguaro Seminar at the John F. Kennedy School of Government, Harvard University. Available on the Internet at www.ropercenter.uconn.edu
Saez, L. (1999). India's economic liberalization, interjurisdictional competition and development. Contemp. S. Asia 8(3), 323–345.
Saiegh, S. M., and Tommasi, M. (1999). Why is Argentina's fiscal federalism so inefficient? Entering the labyrinth. J. Appl. Econ. 2(1), 169–209.
Stein, R. M., and Bickers, K. N. (1995). Perpetuating the Pork Barrel: Policy Subsystems and American Democracy. Cambridge University Press, New York.
Ter-Minassian, T. (ed.) (1997). Fiscal Federalism in Theory and Practice. International Monetary Fund, Washington, D.C.
U.S. Bureau of the Census (2000). 1997 Census of Governments, Volume 3-2, Compendium of Public Employment. U.S. Government Printing Office, Washington, D.C.
U.S. Bureau of the Census (2000). 1997 Census of Governments, Volume 4-5, Compendium of Government Finances. U.S. Government Printing Office, Washington, D.C.
U.S. Bureau of the Census (2002). Federal Aid to the States for Fiscal Year: 2001. U.S. Government Printing Office, Washington, D.C.
Wright, G. C., McIver, J. P., and Erickson, R. S. (2003). Pooled CBS/NYT Party ID and Ideology Estimates, 1977–1999 (electronic file). Available on the Internet at http://sobek.colorado.edu
Field Experimentation
Donald P. Green
Yale University, New Haven, Connecticut, USA
Alan S. Gerber
Yale University, New Haven, Connecticut, USA
Glossary
dependent variable The outcome variable to be explained.
external validity The extent to which the results from a given
study inform the understanding of cause and effect in other
settings or populations.
independent variable The explanatory or treatment variable.
instrumental variables regression A statistical technique
designed to correct for the contamination or mismeasurement of an independent variable. The technique involves
predicting this independent variable with one or more
variables that are uncorrelated with unmodeled causes of
the dependent variable.
random assignment A method of distributing experimental
subjects to treatment and control groups such that every
observation has an equal probability of receiving the
treatment.
spurious correlation A correlation between two variables that occurs not because one variable causes the other, but because both variables are correlated with a third variable.
Introduction
Field experiments, as distinct from laboratory experiments, are studies conducted in natural social settings.
Intervention
Unlike passive observation, which tracks putative causes
and effects as they unfold over time or turn up in different
places, intervention-based research seeks to identify
causal patterns by disrupting the normal flow of social
activity, for example, through the introduction of a new
policy, dissemination of information, or creation of new
social arrangements.
Controlled Comparison
Intervention research has a long intellectual history, but
the scientific value of intervention research depends on
the procedures used to determine when and where to
intervene. Although massive interventions such as the
creation of the Soviet Union are sometimes dubbed experiments in the colloquial sense, a basic requirement of
any experiment is the use of a control group against which
changes in the treatment group are to be gauged. Control
groups may be formed in a number of ways. We may
compare different sets of observations to one another,
track a single set of observations over time, or follow
multiple groups over time. But to draw meaningful
Randomization
The comparability of treatment and control groups is ensured by random assignment. The assignment of
observations is said to be random when each observation
has the same ex ante probability of winding up in the
treatment or control groups. The integrity of randomization is best ensured through a set of assignment
procedures, whether it be flipping coins or consulting
a sequence of random numbers. Note that randomization procedures may be under the control of the experimenter or some other agency or institution in charge of
a lottery (e.g., the random assignment of judges to court
cases).
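For illustration, a complete randomization of this kind (here consulting a sequence of pseudorandom numbers rather than flipping coins) might look as follows in Python; the sample size and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(2024)  # seeded so the assignment is reproducible
n = 100                            # number of experimental subjects

# Give every subject the same ex ante probability (0.5) of treatment
# by drawing exactly half of the subject indices at random.
assignment = np.zeros(n, dtype=int)
assignment[rng.choice(n, size=n // 2, replace=False)] = 1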
Other approaches, such as assigning observations to
treatment groups on an alternating basis, might be termed
"near-randomization," because we cannot be certain that
the treatment and control groups differ by chance alone.
Still less secure are attempts to simulate random assignment after the fact by matching treatment and control
observations as closely as possible in terms of other observable characteristics. This approach, the logic of which
undergirds the overwhelming majority of social science
research, is known as quasi-experimentation.
The advantage of randomization is that it guarantees
that treatment and control groups differ solely by chance
prior to an intervention. Moreover, randomization guarantees that chance differences will disappear as the number of observations increases, so that remaining
differences reflect the intervention's influence on the
treatment group. No such guarantees govern near-random or quasi-experimental research. The first to recognize the full significance of this point was R. A. Fisher, who
in 1926 argued vigorously for the advantages of assigning
the units of observation at random to treatment and control conditions, as reported by Box:
As Fisher put it in correspondence, the experimenter
games with the devil; he must be prepared by his layout
to accommodate whatever pattern of soil fertilities the
devil may have chosen in advance. A systematic arrangement is prepared to deal only with a certain sort of devilish plan. But the devil may have chosen any plan, even
the one for which the systematic arrangement is appropriate. To play this game with the greatest chance of
success, the experimenter cannot afford to exclude the
possibility of any possible arrangement of soil fertilities,
and his best strategy is to equalize the chance that any
treatment shall fall on any plot by determining it by
chance himself.
Verisimilitude
Finally, field experimentation reflects the notion that research best occurs in settings that most closely approximate the domain in which knowledge is to be applied. The
term "field" harkens back to the agricultural origins of modern
experimentation. Near-experiments in agriculture date
back to the eighteenth century and grew increasingly sophisticated by the late nineteenth century. The pathbreaking work of R. A. Fisher in the 1920s, which laid
the foundation for randomized experimentation, grew
out of and was first applied to agricultural research.
That research program had the down-to-earth objective
of raising crop yields and lowering production costs on
English farms. In a similar vein, many field experiments
in social science are conceived as an attempt to evaluate
specific programs or pilot projects. Often, however, they
inform larger theoretical debates about behavioral responses to incentives, coercion, information, and moral
suasion.
provides a careful analysis of the data that takes into account the practical problems that arose during the course
of the study.
Advantages of Field
Experimentation
Field experimentation may be viewed as a response to
the limitations of both laboratory experimentation and
observational research. In this section, we briefly summarize these limitations and the advantages of field experimentation.
lengths to have subjects view television in a relaxed environment akin to their own homes, yet the evaluation of
these exposure effects consists of a survey administered
shortly afterward. It is difficult to know how to translate
these survey responses into the terms that matter politically, namely, votes on election day. The advantage of
randomizing media markets in which political advertisements are aired is that we can link the treatment to election results. Similar arguments could be made about
laboratory versus field studies of whether policing deters
crime or pre-school programs augment high school
graduation rates.
overestimated. Under these conditions, the sign and magnitude of the biases in nonexperimental research are
knotty functions of variances and covariances of observed
and unobserved determinants of the vote. Even if by some
accident of fate the positive biases were to offset the
negative, our ability to compare experimental and nonexperimental findings is hampered by the fact that most
surveys fail to gauge the precise nature and frequency
of campaign contact.
As it turns out, experimental studies have found that the
effectiveness of mobilization varies markedly depending
on whether voters are contacted by phone, mail, or face-to-face. In 2000, Gerber and Green conducted a randomized field experiment using a population of 30,000
registered voters in New Haven. Their results indicated
that personal canvassing boosts turnout by approximately
9 percentage points, whereas phone calls have no
discernible effect. Both experimental and nonexperimental studies find statistically significant effects
of campaign activities on voter turnout. Experimental
studies, however, speak to the issue of causality with far
greater clarity and nuance because the researchers have
control over the content of the campaign stimulus and can
rule out threats to valid inference.
Drawbacks
Performing randomized experiments in real-world settings presents a number of practical challenges, the magnitude of which in part explains the relative paucity of this
form of research in social science.
Resource Limitations
Field experiments are often expensive. Even relatively
inexpensive field experiments that require a change in
administrative behavior (e.g., mandatory arrests in domestic violence cases) rather than the allocation of additional
resources still require more funding and effort than the
typical observational study.
Implementation Failure
Assigning subjects at random to treatment and control
groups is no guarantee that subjects will actually receive
the treatment on a purely random basis. Sometimes those
who administer or participate in experiments subvert randomization in various ways. A classic instance of implementation failure may be found in the Lanarkshire milk
study, one of the first randomized experiments ever
conducted. Thousands of children in Lanarkshire schools
were randomly assigned during the 1920s to dietary supplements of pasteurized milk, raw milk, or nothing. Their
physical development was tracked over several years. Out
of concern for the well-being of their students, teachers
reassigned some of the smallest and most needy children
from the control group into one of the milk-drinking conditions, thereby undermining the randomness of the
treatment. Similar problems arise in social experiments
in which administrators steer certain subjects into the
treatment group or when subjects themselves insist on
being reassigned.
Sampling
The central aim of field experimentation is to estimate
the average treatment effect in a given population. For
example, a study may seek to estimate the average extent
to which preparatory classes improve performance on
college entrance examinations. Although it is conceivable
that all students react in the same way to such classes, it
may be that certain types of students benefit more than
others. If effects vary, and the aim is to estimate the
average treatment effect, we must either draw a representative sample of students or narrow our focus (and
conclusions) to students of a particular type.
The issue of sampling and generalizability arises in
several ways. First, the challenges of orchestrating field
experiments mean that they tend to occur in a small
number of sites that are chosen for reasons of logistics
rather than representativeness. Second, within a given
experiment, the numbers of observations may be attenuated by subjects' refusal to participate or decisions to
drop out of the study. Noncompliance and attrition are
remediable problems as long as the decision to participate is unrelated to the strength of the treatment effect.
The statistical correction is to perform an instrumental
variables regression in which the independent variable
is whether a subject was actually treated and the instrumental variable is whether a subject was originally assigned to the treatment group.
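A minimal simulated sketch of this correction (all numbers are illustrative, not drawn from any study discussed here): with a binary instrument, the instrumental variables estimate reduces to the intent-to-treat effect divided by the difference in treatment rates, the so-called Wald estimator.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Random assignment (the instrument) with imperfect compliance:
# only 60% of those assigned to treatment actually receive it.
assigned = rng.integers(0, 2, n)
treated = assigned * (rng.random(n) < 0.6)

# Outcome with a true treatment effect of 9 points.
y = 50 + 9 * treated + rng.normal(0, 10, n)

# Wald / IV estimate: intent-to-treat effect over the compliance rate.
itt = y[assigned == 1].mean() - y[assigned == 0].mean()
compliance = treated[assigned == 1].mean() - treated[assigned == 0].mean()
print("IV estimate:", round(itt / compliance, 2))  # close to 9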
The situation becomes more complex if participation
and responsiveness to the experimental treatment
interact. In this case, the study's conclusions apply only
to the types of people who actually participate in an experiment. The way to address this concern empirically is
to replicate the experiment under conditions that lead to
Conclusion
The results from large-scale field experiments command
unusual attention in both academic circles and the public
at large. Although every experiment has its limitations,
field experiments are widely regarded as exceptionally
authoritative. By combining the power of randomization
with the external validity of field research, these studies
have the potential to eclipse or significantly bolster findings derived from laboratory experiments or nonexperimental research.
Further contributing to the strength of field experimentation is the transparent manner in which the data
are analyzed. In contrast to nonexperimental data analysis, in which the results often vary markedly depending on
the model the researcher imposes on the data and in
which researchers often fit a great many models in an
effort to find the right one, experimental data analysis
tends to be quite robust. Simple comparisons between
control and treatment groups often suffice to give an unbiased account of the treatment effect, and additional
analysis merely estimates the causal parameters with
greater precision. This is not to say that experimental
research is free from data mining, but the transformation
of raw data into statistical results involves less discretion
and therefore fewer moral hazards.
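As a sketch of such a simple comparison (simulated data, with an assumed 2-point true effect), the unadjusted difference in means estimates the average treatment effect under random assignment:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(50, 10, 500)    # outcomes in the control group
treatment = rng.normal(52, 10, 500)  # outcomes in the treatment group

# Difference in means and a conventional two-sample t test.
print("estimated effect:", round(treatment.mean() - control.mean(), 2))
print(stats.ttest_ind(treatment, control))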
Further Reading
Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. J. Am. Statist. Assoc. 91(June), 444–455.
Field Relations
David A. Snow
University of California, Irvine, California, USA
Calvin Morrill
University of California, Irvine, California, USA
Glossary
derived dimension of roles The highly variable, situationally
specific behaviors and orientations associated with the role,
and through which actual relations are negotiated and
maintained.
fieldwork relations The set of researcher/informant relationships established between the fieldworkers and those
members of the scene or setting who function as hosts
and objects of the research.
fieldwork roles The various negotiated positions or vantage
points that situate the fieldworker in relation to the
phenomenon of interest, or in relation to some set of
members of the group or setting studied.
rapport/trust A property of a relationship, referring to the
point in the development of fieldwork relationships at
which the informant feels reasonably comfortable in
functioning as a behavioral and orientational guide to the
setting/group/social world being studied, thus divulging
what s/he knows and does not know, and the corresponding
confidence the fieldworker has in his/her informants and
the information elicited from them.
structural dimension of roles The generic, skeletal boundaries of a role (such as the role of a fieldworker), but without
explicitly defining the various behaviors and orientations
associated with the actual performance of the role.
Field relations encompass the set of relationships established, in the scene or setting being studied, between the
researcher and those informants who function as the
hosts and objects of the research. Because of the importance of field relations to successful field research, such
Introduction
Of the various issues that qualitative field researchers
must consider and negotiate during the course of their
fieldwork, few are as fundamental to the success of
a project as the kinds of field relations established with
members of a research setting. Indeed, it is arguable that
nothing is more determinative of what a field researcher
sees and hears, and thus learns about aspects of a social
context, than the numbers and types of field relationships
developed with informants. This is so for two reasons: first,
intimate, up-close access to the happenings, events, and
routines that constitute any particular social setting is
contingent on establishing relationships in that setting
with one or more members who function as guides to
the organization and perspectives associated with those
happenings, events, and routines. Second, different kinds
of relationships with different individuals variously situated within the setting are likely to yield different perspectives and understandings. Ethnographic knowledge
of any particular family, for example, is likely to vary
somewhat, depending on whether the family is accessed
associated with the idea of roles. However, these troublesome accouterments can be bypassed if roles are understood more flexibly as having two components: the
structural dimension, which delimits the skeletal or generic boundaries of roles, and the derived dimension,
which denotes the more situationally specific, negotiated,
and thus emergent aspects of roles. Although not always
framed in terms of these two dimensions of roles, much of
the discussion of fieldwork roles can be usefully understood in terms of these components. For example,
Raymond Gold's now-classic distinction concerning the observational roles of the complete observer, the observer-as-participant, the participant-as-observer, and the complete participant; Herbert Gans's similar distinction concerning the total participant, the researcher-participant, and the total researcher; and John and Lyn Lofland's even more parsimonious distinction between known (overt) and unknown (covert) investigators all
identify two or more ideal-typical fieldwork roles, with
each implying different kinds of field relations and
informational yield.
Although these typologies of fieldwork roles are useful
insofar as they sensitize researchers to possible variations
in the degree of participation/membership and openness
with respect to research identity and interests, and the
advantages and disadvantages associated with such
variations, such typologies are not very helpful with respect to actually negotiating and maintaining field
relationships. Moreover, they gloss over the multitude
of ways in which it is possible to be a participant-observer
within a setting, or a member of a group, whether known
or unknown. For example, Adler and Adler's examination
of a large number of ethnographic field studies in which
the researcher was a member of the setting or group
revealed that membership roles can vary considerably,
ranging from peripheral through active to complete membership roles, with each entailing different kinds of
relationships with members and commitments to the
group or setting. In the case of peripheral membership
roles, for example, the researcher participates in some of
the activities associated with the group or setting being
studied, but refrains from engaging in its central activities,
as illustrated in Ruth Horowitz's study of Chicano youth gangs in Chicago. Though she hung out with members in some places and at some times, she avoided participation in their more dangerous and sexual activities, thus forgoing the kinds of knowledge that derive from relationships and experiences forged in certain group-specific contexts. In the active membership role, by contrast, the researcher participates in the group's core activities, but without fully embracing the group's values and goals, as illustrated by Burke Rochford's study of the Hare Krishna movement in the United States in the 1980s. And exhibiting both thoroughgoing engagement in a group's activities and embracing its goals and
Researcher Attributes/
Characteristics and Field
Relations
Also significantly affecting the kinds of fieldwork roles
and relations that can be negotiated are the researcher's
social attributes (gender, age, race, and ethnicity) and
personal characteristics and experiences, including biography, personality, and perspectives. Such personal and
social characteristics can be relevant to all social research,
but they are especially significant in the case of fieldwork,
and particularly ethnographic fieldwork, because it is the
most embodied of all social research in the sense that the
fieldworker is the primary research instrument. Because
of this characteristic of fieldwork, the social attributes of
the researcher are likely to be especially relevant to
those studied, not only closing or opening doors, but
also influencing the character of the relationships
that evolve.
Given that we live in a gendered social world, with
gender figuring in the organization of relationships in
most social settings and scenes, the gender of the researcher is likely to be an especially significant factor in
accessing and negotiating relationships in most settings
and groups. Not surprisingly, the settings and activities
that are most gendered, in the sense of being associated
primarily with women or men, tend to get studied primarily by researchers of the matching gender, or used accordingly as research
sites. Public restrooms, for example, can be particularly
good sites for developing fleeting relationships that yield
information about activities and interests that extend beyond the restroom, but both experience and research
show that access and the possibility of developing such
fleeting relationships is highly gendered. In other words,
women fieldworkers traditionally have been more likely to
study topics and issues stereotypically associated with
women, such as child care, emotion work, and veiling,
whereas male fieldworkers have been more likely to
focus on topics and issues more stereotypically associated
with men, such as hunting, police work, and drinking. But
just as gender barriers and logics have changed, so have
there been changes in the link between gender and research; women, in particular, have negotiated sexually
the spot. Such countercentripetal strategies include preempting at the outset of a project, by articulating certain
limitations and constraints on what will be done; finessing through the utterance of evasions and ambiguities in
relation to member overtures and entreaties; directly and unambiguously declining member appeals; and withdrawing from activities. Such responses to centripetal
overtures and appeals further underscore the negotiated
character of the development and maintenance of viable
fieldwork relationships and of the fieldwork process more
generally.
Further Reading

Adler, P. A., and Adler, P. (1987). Membership Roles in Field Research. Sage, Newbury Park, CA.
Emerson, R. M. (2001). Contemporary Field Research: Perspectives and Formulations, 2nd Ed. Waveland Press, Prospect Heights, IL.
Fine, G. A., and Sandstrom, K. L. (1988). Knowing Children: Participant Observation with Minors. Sage, Newbury Park, CA.
Johnson, J. M. (1975). Doing Field Research. The Free Press, New York.
Lofland, J., and Lofland, L. (1995). Analyzing Social Settings, 3rd Ed. Wadsworth, Belmont, CA.
McCall, G. J., and Simmons, J. L. (eds.) (1969). Issues in Participant Observation: A Text and Reader. Addison-Wesley, Reading, MA.
Punch, M. (1986). The Politics and Ethics of Fieldwork. Sage, Newbury Park, CA.
Rosaldo, R. (1989). Culture & Truth: The Remaking of Social Analysis. Beacon Press, Boston, MA.
Snow, D. A., Benford, R. D., and Anderson, L. (1986). Fieldwork roles and informational yield: A comparison of alternative settings and roles. Urban Life 14, 377–408.
Warren, C. A. B., and Hackney, J. K. (2000). Gender Issues in Ethnography, 2nd Ed. Sage, Thousand Oaks, CA.
Wax, R. H. (1971). Doing Fieldwork: Warnings and Advice. University of Chicago Press, Chicago, IL.
Field Studies
Leon Anderson
Ohio University, Athens, Ohio, USA
Steven Rubenstein
Ohio University, Athens, Ohio, USA
Glossary
analytic induction Theory building in field studies through
the collection of new cases that require the revision of
extant theoretical understandings of the phenomenon.
emic The description and analysis of culturally significant
categories of meaning used by the people in a social setting
or culture under study.
etic The description of universal categories of human
behavior for purposes of cross-cultural comparison irrespective of cultural significance.
grounded theory An inductive approach for generating and
confirming theory that emerges from recursive field
research.
informant The individual studied by field researchers.
member validation The sharing of research ideas and
reports with informants in order to correct mistakes or to
provide alternative interpretations of reported events.
purposeful sampling The selection of information-rich cases
in order to deepen understanding of specific aspects of the
topic of study.
triangulation The application of multiple research methods
to provide complementary kinds of data in order to develop
a multidimensional analysis.
Gathering Data
Field Notes
Field notes describing the researcher's experiences and
observations in the field are the fundamental data of field
research, representing the most crucial data log from
which analysis is developed. The most widely recommended technique for writing field notes is for the researcher
to make jotted notes inconspicuously in the field and to
expand on them later. Conscientious (indeed compulsive) field note writing is critical to high-quality research
and analysis, and it is important to write field notes
promptly after observations in order to ensure maximum
recall. Field notes should be concrete and richly descriptive accounts of events, people, things heard and overheard, conversations among people, and conversations
with people. The purpose of field notes is to provide
a fertile informational base from which to create rich,
descriptive, and analytic accounts. Because fieldworkers
often alternate between outsiders' and insiders' perspectives, they frequently distinguish between two different
kinds of accounts. An etic account provides a description
of observed actors, actions, and objects in a language that
the fieldworker considers "objective" or "universal" (in
other words, employing distinctions and categories that
can apply to many different cultures; such a description
would provide the basis of a cross-cultural analysis). An
emic account provides a description of observed actors,
actions, and objects in terms of distinctions meaningful
(either consciously or unconsciously) for the informants
(in other words, the description of actors, actions, and
objects is a means to elicit a description of locally used
categories).
In addition to descriptive accounts of field observations and experiences, field notes should also include
Sampling
Though sampling issues are important in field studies, just
as in other forms of social science research, sampling
strategies in field studies vary according to the goals
of the research and field practicalities. If the goal is to
generalize individual characteristics or behaviors to
a population, then random probability sampling is the
most appropriate technique. However, most field researchers seek instead to develop holistic understandings
of social groups, cultural settings, and/or generic social
processes. In these cases, random sampling is less useful.
Insofar as sampling strategies are consciously pursued in
field studies (and this varies widely), assorted types of
purposeful sampling tend to be invoked. In contrast
to probability sampling, purposeful sampling is designed
in the course of inquiry to pursue emergent analytic leads
or to facilitate the development of analytic insights.
A range of purposeful sampling techniques has been developed, including the relatively widespread techniques
of maximum variation sampling and negative case sampling. Maximum variation sampling involves searching for
the range of variation in a particular phenomenon (e.g.,
kinds of behavior exhibited in a specific social setting or
situation). Utilizing this method involves observing and
documenting unique cases until saturation is achieved.
Negative case sampling, often used in conjunction with analytic induction (discussed later), entails seeking
out manifestations of the phenomenon of interest that do
not fit existing theoretical understandings.
Whatever the sampling strategies pursued by field researchers, the cases sampled (whether they are individuals, groups, or social settings) must be informationally
rich so as to provide ample contextual data for understanding and illustrative or anecdotal information for
writing field reports. The point of sampling in field studies
is to maximize richness of observations so as to enable
comment on the researcher. Moreover, fieldwork typically reveals psychological and cultural biases on the
part of the fieldworker. Those who pursue objective research are wise to attend to these revelations in any attempt to reduce observer bias. In addition to research-driven field notes, fieldworkers should therefore keep
a diary or journal in which they record their own feelings
and personal judgments. Such reflection is also important
because it helps fieldworkers to monitor their own mental
and physical health, which is essential because, in field
research, the person is an instrument of research. In the
1950s and 1960s, some anthropologists published memoirs or autobiographical novels in which they reflected on
these personal thoughts and feelings; such works were
meant both to inspire and to caution future fieldworkers.
In recent years, reflections on fieldwork have become
more popular and influential, in part because they reveal
not only personal biases on the part of fieldworkers, but
shared biases that reveal fieldwork to be not so much
a form of scientific study as a cross-cultural encounter.
Consequently, fieldwork sometimes reveals as much
about the culture of the researcher as it does about the
culture being studied.
Recursivity refers to the effect that experiences in the
field have on the direction of research. This process often
occurs in the course of research in the field: regardless of
the theoretical orientation of the researcher, or the questions or methods established before entering the field,
field research inevitably reveals unanticipated phenomena and questions. Unexpected political, economic, or
physical barriers to research, or the demands of local
informants, may also lead to revision of the project. In
some cases, the researcher must immediately revise his or
her foci, questions, and methods. In other cases, it is
precisely the successful conclusion of a well-planned research project that raises new questions. Recursivity may
also occur in the course of writing after returning from the
field. For many anthropologists, recursivity is made possible by the detachment that comes from distance in time,
space, and mind (achieved through the process of academic writing). Thus, much recent ethnography is dedicated to discovering the questions raised by field
research, in order to suggest new research.
Data Analysis
Field researchers utilize various data analysis strategies
that range along a continuum of procedural rigor and
explicit specification. More explicit and codified strategies, such as Anselm Strauss and Juliet Corbin's version of grounded theory, James Spradley's elaboration of Ward Goodenough's componential analysis, and Christina Gladwin's work, presented in her book Ethnographic Decision Tree Modeling, involve standardized techniques
for manipulating data. Similarly, computer-assisted
Further Reading

Adler, P. A., and Adler, P. (1987). Membership Roles in Field Research. Sage, Newbury Park, CA.
Atkinson, P., Coffey, A., Delamont, S., Lofland, J., and Lofland, L. (2001). Handbook of Ethnography. Sage, Thousand Oaks, CA.
Bernard, H. R. (1995). Research Methods in Anthropology, 2nd Ed. Altamira Press, Walnut Creek, CA.
Davies, C. (1999). Reflexive Ethnography. Routledge, New York.
Denzin, N., and Lincoln, Y. (eds.) (2000). Handbook of Qualitative Research, 2nd Ed. Sage, Thousand Oaks, CA.
Emerson, R. (2001). Contemporary Field Research, 2nd Ed. Waveland Press, Prospect Heights, IL.
Emerson, R., Fretz, R., and Shaw, L. (1995). Writing Ethnographic Fieldnotes. University of Chicago Press, Chicago, IL.
Lincoln, Y., and Guba, E. (1985). Naturalistic Inquiry. Sage, Thousand Oaks, CA.
Lofland, J., and Lofland, L. (1995). Analyzing Social Settings, 3rd Ed. Wadsworth, Belmont, CA.
Fisher, Sir Ronald

Glossary
eugenics The concept of improving a breed by the careful
selection of parents, especially in regard to the human race.
Fundamental Theorem of Natural Selection The rate of
increase in fitness of any organism at any time is equal to its
genetic variance in fitness at that time.
genetics The science of heredity, concerning the similarities
and differences that appear among related organisms.
natural selection Sometimes called "survival of the fittest"; Charles Darwin's theory that those organisms that can best
adapt to their environment are the most likely to survive.
statistics The science of the collection, analysis, and
interpretation of numerical data.
Introduction
Sir Ronald Fisher (1890–1962) was one of the most
prolific scientists of the 20th century. Statisticians
consider him to be one of the founders of modern statistics, both for his contributions to theory and for his
many developments of applied techniques. In population
genetics, Ronald Fisher is one of the first names mentioned. Along with J. B. S. Haldane and Sewall Wright,
Fisher contributed to the neo-Darwinian synthesis, the
Fisher as Statistician
Fisher is regarded by many as one of the founders of
modern statistics. Any beginning student of statistics
will find in his or her textbook a detailed description of the analysis
of variance, without any mention that the technique was
developed in the 1920s by Sir Ronald Fisher. Similarly,
the student will learn to use the F-ratio without being
informed that it is so named as a tribute to Fisher. But
Fishers importance in the field of statistics ranges far
beyond elementary applications. Fishers work in statistics can be largely grouped in three areas, all of equal
prominence: the theoretical foundations of statistics,
practical applications and methods, and experimental
design.
In 1922, Fisher published "On the Mathematical Foundations of Theoretical Statistics." In earlier papers
he had begun considering the accuracy of estimates based
on samples drawn from large populations. The 1922 paper
clarified the method of maximum likelihood, defined several terms important to the field, and made clear the
necessity of considering the sample and the larger population separately. According to Fisher, the population
has parameters that one does not know; from the sample
one calculates one or more statistics, such as the mean or
the standard deviation, in order to estimate the unknown
parameters. Ideally, a statistic should be sufficient; the
statistic computed from the sample should contain all
possible information about the unknown parameter.
That is, for example, the sample standard deviation
s should contain all of the information about the population standard deviation σ that can be obtained from the
sample. Additionally, a statistic should be efficient; that
is, information that the sample contains should not be lost
in the process of computing the statistic from the data.
Finally, a statistic should always be consistent; the
larger the sample size, the closer the statistic should come
to the actual parameter.
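These properties can be made concrete with a small numerical sketch (an illustration, not Fisher's own derivation): for a normal sample with known scale, the value that maximizes the likelihood of the mean coincides with the sample mean, which is also sufficient for that parameter.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=1_000)

# Negative log-likelihood for the mean of a normal with known scale;
# additive constants and the 1/scale**2 factor do not move the optimum.
def nll(mu):
    return 0.5 * np.sum((sample - mu) ** 2)

fit = minimize_scalar(nll)
print(round(fit.x, 4), round(sample.mean(), 4))  # the two values agree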
The 1922 paper, published in the Philosophical
Transactions of the Royal Society, was a landmark
achievement. Fisher followed it with many other papers
on statistical theory and applications. (Many of these,
in complete text, are available on-line.) Fisher did not
Fisher as Geneticist
Among Fishers major accomplishments in genetics were
his genetic theory of natural selection, his work on dominance and the evolution of dominance, and his discovery
of the genetic basis for the very complex Rhesus factor in
human blood groups. Fisher had a lifelong interest in
genetics and always had breeding experiments of some
sort under way. Throughout his entire adult life, he kept
a breeding population of mice or chickens or, for a time,
snails, wherever he was living. In addition to his many
papers on genetics, in 1949 Fisher published a book, The
Theory of Inbreeding.
Fisher as Eugenicist
In the later chapters of The Genetical Theory of Natural
Selection, Fisher expressed his concerns for the future
of human civilization. A staunch believer in eugenics,
he believed that since the lower classes, whom he regarded as less fit, were reproducing at a faster rate than
the upper classes, this was working to the detriment of
society. As an undergraduate at Cambridge, Fisher was
a founder and first president of the Cambridge University
Eugenics Society. This brought him into contact with
Major Leonard Darwin, a son of Charles Darwin and
president of the Eugenics Education Society of London.
Leonard Darwin became a close friend and advisor of
Fisher's.
A survey of Fisher's publications will show that in addition to his many papers on statistics and genetics, interspersed are papers dealing solely with eugenics and the
state of mankind. When Fisher became Galton Professor
of Eugenics at University College London in 1933, along
Further Reading

Archive of papers and correspondence of R. A. Fisher. Barr Smith Library, University of Adelaide, Adelaide, Australia. Available at http://www.library.adelaide.edu.au/digitised/fisher/index.html
Bennett, J. H. (ed.) (1971–1974). Collected Papers of R. A. Fisher, Vols. 1–5. University of Adelaide, Adelaide, Australia.
Box, J. F. (1978). R. A. Fisher: The Life of a Scientist. Wiley, New York.
Yates, F., and Mather, K. (1963). Ronald Aylmer Fisher. Biogr. Mem. Fellows R. Soc. 9, 91–129.
Fixed-Effects Models
George Farkas
The Pennsylvania State University, University Park,
Pennsylvania, USA
Glossary
analysis of variance Statistical methods for comparing means
by dividing the overall variance into parts.
fixed-effects models Statistical methods for estimating causal effects, in which each individual serves as his or her own
control.
panel data Information on each survey unit for multiple time
points.
pooled time-series cross-section data A single data set
formed from pooled panel data, containing information on
multiple survey units for multiple time points.
random-effects models Statistical methods in which the
data describe a hierarchy of different populations, with
differences constrained by the hierarchy.
Introduction
Problems of Causal Inference with
Nonexperimental Data
Random-assignment experiments provide the best means
for testing causal effects. When trying to learn the effect
of a treatment (for example, a medical treatment) on humans, there is no better evidence than the results of
Fixed-Effects Models as
a Partial Solution
How, in the absence of randomization, is it possible to
control (adjust for) correlated variables that are not even
measured on a data set? The answer is to utilize, for each
variable and unit of analysis under study, multiple measures of the variable. Suppose, for example, that it is desired to estimate the effect on an individual's earnings of being employed in a job that is predominantly performed
by female workers. And suppose that there are work
histories for a sample of workers, and that these histories
include information on the percentage of females
employed in each of the different jobs a worker has held
over her or his lifetime. Then, for each of the multiple
measures of the variables to be included in the analysis,
when this person holds a job that is, say, 1% more female
than is usual for them, their earnings fall below the level
that is usual for them. The effects of the unmeasured Z
variables are not permitted to bias the calculation because
they determine the person's average, and these effects
have been eliminated by the construction of variables
that are differences around this average.
Two caveats are in order. First, to succeed in removing
the effects of variables such as Z in the preceding example,
the variables should be unchanging across the multiple
observations for each individual. Thus, the method works
best for variables Z that can be assumed to be unchanging,
or at most, are very slowly changing, across the multiple
observations for each individual. Second, the method requires multiple observations for each individual, and that
X, the independent variable of interest, vary across these
observations. (Thus, for example, this method cannot be
used to measure the effects of X variables such as race or
gender, which do not change as an individual ages.) When
these conditions are met, the fixed-effects method can be
quite flexible and powerful. It applies when the same
individual or other unit of observation (for example,
a state or nation) is observed over time. It also applies
to situations in which, for example, multiple siblings from
the same family are observed, and the aim is to estimate
the effect of family size (number of siblings) on outcomes
for each sibling, and to do so while avoiding bias from the
effects of unmeasured, unchanging family characteristics
(such as genetic endowments and child-raising
procedures).
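A minimal simulated sketch of this within-transformation, using the percentage-female example (the variable names and numbers are illustrative): demeaning each worker's X and Y removes the unchanging, unmeasured Z before the slope is computed.

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, t = 200, 5  # 200 workers observed 5 times each

worker = np.repeat(np.arange(n), t)
ability = np.repeat(rng.normal(0, 1, n), t)              # unmeasured, unchanging Z
pct_female = rng.uniform(0, 100, n * t) + 10 * ability   # X, correlated with Z
wage = 20 - 0.05 * pct_female + 5 * ability + rng.normal(0, 1, n * t)

df = pd.DataFrame({"worker": worker, "x": pct_female, "y": wage})

# Fixed-effects (within) estimator: deviations from each worker's means.
x_dm = df.x - df.groupby("worker").x.transform("mean")
y_dm = df.y - df.groupby("worker").y.transform("mean")
beta = (x_dm * y_dm).sum() / (x_dm**2).sum()
print(round(beta, 3))  # close to the true -0.05 despite the omitted "ability"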
Competing Models
A large literature exists for estimating models similar to
those in Eq. (1), but instead of using dummy variables and
fixed effects, these models assume random effects.
That is, rather than assuming that these effects are
fixed constants present in a particular observed sample,
this method assumes that they are random variables from
a distribution of values. (This "fixed" rather than "random" distinction is the source of the name "fixed effects.") The result is a more parsimonious specification,
but one that, for estimation, must assume that the random
effects are uncorrelated with the X variables in the model.
This is a disadvantage, because it makes it impossible
to net out correlated unmeasured variables that are the
chief source of bias in nonexperimental studies.
Data Demands
Multiple Observations per Unit
As already noted, the fixed-effects methodology requires
multiple observations per unit, with the dependent variable measured in the same metric at each time point. For
pooled time-series cross-section, this means having over-time data for each unit. When individuals are involved,
this is usually referred to as panel (compared to cross-sectional) data. Any individual who has data only for
a single time point must be dropped from the analysis.
Similarly, for families and other groupings, there must be
multiple observations for each family or group.
Thus, any family or group having only one sibling (member) in the data must also be dropped from the analysis.
When families are involved, this could lead to the nonrandom loss of a large percentage of cases from the analysis, and could potentially be a significant cause of bias
in the study.
Other Issues
The fixed-effects methodology makes additional demands
on the data to be analyzed. Perhaps most significantly, it
removes only the effects of unchanging unmeasured
variables. Also, it estimates effects only for those X
variables that change their values at least somewhat across
the multiple observations for each unit. For example, if
individuals are being observed over time, it is possible to
estimate effects only for those variables that show change
over time. Examples include age, work experience, job
characteristics (to the extent that at least some workers do
change their jobs over the period covered by our data),
family characteristics such as married or not, and number of children.
In contrast to earlier studies that did not use fixed effects, the latter study found
that the number of siblings has no causal effect on children's
intellectual attainment, once unmeasured, unchanging
family variables are controlled via fixed-effects estimation.
Other Areas
Increasingly, social scientists are coming to realize that fixed-effects
estimation is potentially useful across a very broad
range of circumstances in which pooled cross-section time-series or otherwise grouped data are analyzed, including
economic and political science scenarios.
Further Reading
Allison, P. D. (1996). Fixed-Effects Partial Likelihood for
Repeated Events. Sociol. Meth. Res. 25, 207–222.
Allison, P. D. (2004). Fixed Effects Regression Methods Using
the SAS System. The SAS Institute, Cary, NC.
Angrist, J. D., and Krueger, A. B. (1999). Empirical strategies
in labor economics. In Handbook of Labor Economics,
(O. Ashenfelter and D. Card, eds.), Vol. 3, Chap. 23.
Elsevier, Amsterdam.
Ashenfelter, O., and Krueger, A. B. (1994). Estimating the
returns to schooling using a new sample of twins. Am. Econ.
Rev. 84, 1157–1173.
Beck, N., and Katz, J. N. (1995). What to do (and not to do) with
time-series cross-section data. Am. Pol. Sci. Rev. 89, 634–647.
England, P., Farkas, G., Kilbourne, B. S., and Dou, T. (1988).
Explaining occupational sex segregation and wages: findings
from a model with fixed effects. Am. Sociol. Rev. 53,
544–558.
Greene, W. H. (2000). Econometric Analysis, 4th Ed. Prentice
Hall, Upper Saddle River, New Jersey.
Griliches, Z., and Hausman, J. (1986). Errors in variables in
panel data. J. Econometr. 31, 93–118.
Guo, G., and VanWey, L. K. (1999). Sibship size and
intellectual development: Is the relationship causal? Am.
Sociol. Rev. 64, 169–187.
Heckman, J. J., and Hotz, J. (1989). Choosing among
alternative nonexperimental methods for estimating the
impact of social programs: The case of manpower training.
J. Am. Statist. Assoc. 84, 862–874.
Matyas, L., and Sevestre, P. (1996). The Econometrics of Panel
Data. Kluwer Academic Publ., Dordrecht.
Mundlak, Y. (1978). On the pooling of time series and cross
sectional data. Econometrica 56, 69–86.
StataCorp. (1999). Stata Statistical Software: Release 6.0. Stata
Corporation, College Station, TX.
Focus Groups
David L. Morgan
Portland State University, Portland, Oregon, USA
Glossary
degree of structure The extent to which the focus group
interview is either directed by the research team (more
structured) or left in the hands of the group participants
(less structured).
funnel approach An interviewing technique that begins with
broad, open questions in a less structured format, and then
proceeds to more narrowly defined questions in a more
structured format.
homogeneity The extent to which all of the participants in
the focus group share a similar orientation to, or
perspective on, the discussion topic.
question wording The actual content of the items in survey
questionnaires and related measurement instruments.
saturation In qualitative research, the extent to which there is
less new information in each additional round of data
collection.
segmentation A strategy for creating homogeneous subsets
of focus groups within a larger project, by assigning
participants to groups according to prespecified characteristics, e.g., gender or previous experiences.
Focus groups are a technique for collecting qualitative data. Focus groups typically bring together six to
eight participants who engage in an open-ended discussion about topics that are supplied by the researchers.
Focus groups are thus a form of interviewing whereby
the researcher provides the focus, which the group uses as
the basis for their discussion. For measurement purposes,
focus groups are most often used as a preliminary method
that provides input to the development of survey
instruments, questionnaires, and other measurement instruments in the social sciences. With that emphasis in
mind, the first portion of this article describes the advantages of using focus groups as a preliminary step in the development of measurement instruments.
Introduction
Focus groups serve as a general-purpose method for collecting data, and as such, they can serve a wide range of
purposes. In particular, qualitative researchers frequently
use focus groups in a self-contained fashion, in that the
groups are the sole source of data for a study. From
a measurement perspective, however, focus groups are
most frequently used as a preliminary method that generates insights into what should be measured and how
those measures should be constructed.
Various forms of group interviewing have played a role
in the development of social science measures from very
early in the 20th century. The basic format for what are
now known as focus groups arose subsequently, in the
1940s, from the work of Paul Lazarsfeld and Robert
Merton, although their approach applied to both individual and group interviews. From the 1950s through the
1970s, focus groups were far more common in marketing
research than in the social sciences. This changed as group
interviewing began to play a more prominent role in creating survey instruments, from the 1980s onward. This
strategy relies on a sequential approach that begins by
using focus groups as an input to the creation of quantitative research instruments. This amounts to collecting
focus group data to enhance the effectiveness of measures
that will be used in survey or experimental research, especially in areas in which researchers want to take a fresh
approach to well-studied topics and for new topics for
which researchers lack basic information. As the
following discussions show, however, the knowledge
Discovery-Oriented Uses of
Focus Groups
The primary reason for beginning a survey project with
discovery-oriented focus groups is to gather basic information about a poorly understood topic or population of
respondents. This approach emphasizes the strengths that
qualitative methods offer in exploratory research. Too
Development-Oriented Uses of
Focus Groups
In projects that rely on preliminary focus groups for development, the goal is to learn how a set of key issues
operates in the lives of the people being studied. Often
this amounts to locating questions that will operationalize
a theoretical concept. This approach emphasizes the
strengths of qualitative methods for learning about others'
perspectives on the things that interest the researcher.
Development-oriented versions of focus groups typically search for a set of questions that can adequately
cover a predetermined topic. The discussions in focus
groups let researchers hear the respondents' perspectives
on that topic. Using preliminary focus groups to develop
operationalizations for the key concepts in a survey reveals
the behaviors and opinions that the respondents associate
with the research topics. This can be especially helpful
when creating sets of questions that apply equally well
to several categories of respondents, such as men and women.
Definition-Oriented Uses of
Focus Groups
When definition is the purpose that drives the use of
preliminary focus groups in a project, the goal is to determine the final content of the survey instrument. This
approach emphasizes the strengths of qualitative methods
for studying social life both in detail and in context. The
primary reason for beginning a survey research project
with a definition-oriented qualitative study is to assist in
creating the actual item wording for the questionnaire.
Because the quality of the data in a survey depends directly on the questions that are asked, it is not enough just
to ask about the right topics (discovery), or to locate question topics that cover the researchers' interests (development); beyond those essential goals, it is also necessary to
write the questions in language that the respondents can not only understand easily, but in language that also
means essentially the same thing to both the respondents
and the researchers (definition).
Regardless of the theoretical or practical concerns that
motivate an interest in a particular research topic, asking
effective survey questions requires an understanding of
how the survey respondents talk about these topics. Put
simply, researchers cannot create meaningful questions
unless they understand the language that the respondents
use. Once again, a useful example comes from research on
AIDS with gay and bisexual men; focus groups helped
researchers to locate appropriate wordings for asking
about a range of sexual behaviors. In general, definition-oriented groups are most useful when a researcher knows
the content areas for the measurement instrument, but is
uncertain about the best ways to state the final wording.
Additional Advantages of
Preliminary Focus Groups
Two different sets of advantages can arise from using
preliminary focus groups to develop measures. The first
set involves the classic criteria of reliability and validity.
The second set is less directly related to the properties of
the measures, but may be just as important for the overall
success of the research project.
Increased validity is the most obvious attraction of locating questions that are meaningful to the people who
will respond to measurement instruments, including the
essential goal of ensuring that the questions mean the
same thing to both the researchers and the respondents.
In the extreme case, preliminary focus groups may show
that a set of theoretically generated constructs of interest
to the researchers actually has very little relevance to the
respondents. For example, a research group was asked to
use focus groups in developing survey questions that
would operationalize a particular theory of how couples
handled household finances; instead, the researchers
found that almost no one described their lives in terms
that fit the prespecified theoretical categories. Without
this qualitative research, it still would have been possible
to write questions that captured this particular theory,
even though those items might not measure anything
that was meaningful to the respondents.
The reliability of the survey instrument can also benefit
from preliminary focus groups. This is especially true for
checklists and attitude scales, which require parallel measures, i.e., multiple items, each targeting the same underlying phenomenon in a similar way. Thus, preliminary
focus groups can improve both the reliability and validity
of survey measures by generating a sufficient number of
survey items that effectively capture what a topic means
to the survey respondents.
The uses for preliminary focus groups can also go beyond improvements to reliability and validity. Often,
quantitative researchers start with relatively limited
Given that focus groups can be used in a number of different ways for a number of different purposes, what are
the essential considerations in designing preliminary
focus groups as an input to creating measures?
Group Composition
Interview Structure
Focus groups work best when the participants are as interested in the topic as the researchers are. Because the
conversation among the participants produces the data in
focus groups, it is important to bring together a set of
Number of Groups
There are no hard and fast rules that determine the number
of focus groups to conduct. Ultimately, the number of
groups depends on the familiar constraint imposed by
the underlying variability of the population. When there
is relatively little variability in the population, then the
researcher will begin to hear the same things during the
second or third group. This sense that additional data gathering is not producing new information is known as saturation in qualitative research. If, however, there is a wide
range of opinions or experiences in the larger population,
then it will take a larger number of focus groups before
nothing new is being said in each additional group.
Analysis Procedures
The analysis strategy for these focus groups should be
determined by the larger goal of generating inputs for
measurement construction. Because this goal is simpler
than the interpretive work that accompanies many full-scale qualitative studies, the analysis procedures can also
be simpler. In particular, it would seldom be necessary to
use either qualitative analysis software or a detailed coding system to analyze data that serve primarily as an input
to creating questionnaires.
The standard procedure for capturing data in focus
groups is to audiotape the discussions. Transcribing
these tapes can be a time-consuming and difficult task,
which is most useful either when there are a great many
tapes to be compared or when the content of each discussion needs to be examined in careful detail. One common alternative to transcribing is to have more than one
member of the research team present during the group
discussion, so they can meet and debrief after the discussion. This approach is most effective when the observers
have a prior protocol that directs their attention to the
aspects of the discussion that are most likely to be useful
for the purpose at hand.
Future Directions
The relationship between focus groups and pretesting, as
just discussed, is part of a broader series of issues involving
different techniques for improving the quality of questionnaire measurements. In particular, even though focus
groups are frequently grouped together with cognitive
interviewing as developmental tools in survey research,
these methods typically serve distinct purposes. Currently, very little is known about optimum strategies for
combining focus groups with cognitive interviewing and
other developmental strategies.
One simple hypothesis is that items and instruments
developed via focus groups would need less refinement
through cognitive interviewing, based on the assumption
that such items were already closer to the respondents' own language.
Further Reading
Barbour, R. S., and Kitzinger, J. (1999). Developing Focus
Group Research. Sage, Thousand Oaks, California.
Berry, W. D., and Feldman, S. (1985). Multiple Regression in
Practice. Sage, Thousand Oaks, California.
Dillman, D. A., Singer, E., Clark, J. R., and Treat, J. B. (1996).
Effects of benefits appeals, mandatory appeals, and variations in statements of confidentiality on completion rates for
census questionnaires. Public Opin. Q. 60, 376–389.
Edmunds, H. (1999). The Focus Group Research Handbook.
NTC Business Books (in association with the American
Marketing Association), Chicago.
Fern, E. F. (2001). Advanced Focus Group Research. Sage,
Thousand Oaks, California.
Greenbaum, T. L. (1998). The Handbook for Focus Group
Research, 2nd Ed. Sage, Thousand Oaks, California.
Joseph, J. G., Emmons, C.-A., Kessler, R. C., Wortman, C. B.,
O'Brien, K. J., Hocker, W. T., and Schaefer, C. (1984).
Coping with the threat of AIDS: An approach to
psychosocial assessment. Am. Psychol. 39, 1297–1302.
Krueger, R. A. (1998). Analyzing and Reporting Focus
Group Results. Sage, Thousand Oaks, California.
Krueger, R. A., and Casey, M. A. (2004). Focus Groups:
A Practical Guide for Applied Research, 3rd Ed. Sage,
Thousand Oaks, California.
Frameworks of Probabilistic
Models for Unfolding
Responses
Guanzhong Luo
Murdoch University, Perth, Western Australia and South China
Normal University, Guangzhou, China
Glossary
cumulative response process A response process in which
the ideal direction principle applies.
ideal direction principle The probability that a person gives
a positive response to an item depends on the difference
between the locations of the person and the item. This
person–item difference may be positive or negative. The
probability is 1 when the person–item difference is positive
infinity.
ideal point principle The probability that a person gives
a positive response to an item depends on the distance
(absolute value of the difference) between the locations of
the person and the item. This person–item distance is
always non-negative. The probability is maximized when
the person–item distance is 0.
unfolding process A response process in which the ideal
point principle applies.
unfolding models for polytomous responses is then established using the rating formulation approach. As direct
results of the frameworks of unfolding models, some
basic techniques of confirming the desirable response
process are presented.
known as the direct-response format, is fundamental, because a response within this format is governed directly by
a single person–item distance. A response with pairwise-preference and ranking formats is governed by the comparison of the distances between the person and the items
involved. The focus of this article is on a framework of
probabilistic models for responses with a single-stimulus
format, though a framework for the pairwise-preference
format will also be derived from the framework for the
single-stimulus format. The rest of this article is restricted
to unidimensional situations.
Louis Thurstone pioneered studies on attitude measurement in the 1920s, using both the ideal direction and
the ideal point principle. Clyde H. Coombs, a student of
Thurstone's, carried on the research within the ideal point
framework with a more systematic approach. Coombs
introduced the term "unfolding" when he analyzed ranking data (collected with a questionnaire in ranking format)
in the early 1950s. Based on this concept, he established
his unfolding theory, which covered various types of data
in different response formats. In particular, his milestone
work in dealing with single-stimulus data (collected with
a questionnaire in single-stimulus format) provided
a prototype of the modern probabilistic unfolding models.
In the unidimensional case, the affective continuum on
which the person and the item are located is termed the
latent trait continuum. In single-stimulus data, the manifest response that a person n gives to item i is denoted as x_ni, which can be dichotomous or polytomous. In the case of dichotomous responses, a legitimate response is either agree/yes (positive, x_ni = 1) or disagree/no (negative, x_ni = 0). The principle for the unfolding process is that the response x_ni is determined by the person–item distance |b_n − d_i|, where b_n is the location of person n and d_i is the location of item i. Coombs explicitly defined x_ni as a step function of the person–item distance:

$$x_{ni} = \begin{cases} 1, & \text{when } |b_n - d_i| \le r, \\ 0, & \text{when } |b_n - d_i| > r. \end{cases} \tag{1}$$
Here, b_n is also known as the ideal point for the person. The value of r in Eq. (1) determines the two thresholds between the response interval for a positive response "yes" (x_ni = 1) and two disjoint segments for a negative response "no" (x_ni = 0), as shown in Fig. 1.
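A minimal Python sketch of this deterministic rule follows; the function name and the numeric values are illustrative only.

    # Coombs's deterministic unfolding response, Eq. (1): agree exactly
    # when the person-item distance on the latent continuum is within r.
    def coombs_response(b_n, d_i, r):
        return 1 if abs(b_n - d_i) <= r else 0

    # Responses fold around the item location: persons at 1.0 and 3.0
    # both disagree with an item located at 2.0 when r = 0.5.
    print([coombs_response(b, 2.0, 0.5) for b in (1.0, 1.8, 2.0, 2.4, 3.0)])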
Although two very different segments on the latent
trait give rise to a negative response, according to
Coombs, a negative manifest response is certain if the
absolute value of the person–item difference is greater
than the given value of r, regardless of in which segment
the person is located. In this sense, the responses are
folded, as shown by the heavy line in Fig. 1. The task
of the data analysis is to unfold these responses to
recover the underlying latent trait continuum with the
locations of the persons and items involved. In addition,
the value of r is to be recovered or estimated. The distance-based single-stimulus responses also determine pairwise preferences: the probability that person n prefers item i to item j is

$$\Pr\{X_{nij} = 1\} = \Pr\{x_{ni} = 1,\ x_{nj} = 0\}. \tag{2}$$
Figure 2 Deterministic and probabilistic single-peaked response functions (probability of X = 1 and X = 0 plotted against location).
From Eq. (2), the probability of any item rank can also be
expressed in terms of the probabilities for all item pairs.
Therefore, it is widely accepted in the literature that the
family of unfolding models includes all the models for all
three response formats mentioned earlier, the underlying
response function of which is single peaked. Consequently, the response process governed by these models is
generally termed the unfolding process. Because of the
shape of the response functions in Fig. 2, they are also
known as single-peaked response processes.
A Framework of Dichotomous
Probabilistic Models for the
Unfolding Response
During the 30 years since Coombs developed his unfolding
theory, particularly in the late 1980s and 1990s, various
other specific probabilistic unfolding models for dichotomous responses have been proposed. Suppose that N persons give responses to a questionnaire consisting of I items.
For any person n with location b_n and any item i with location d_i, the response variable is denoted as X_ni: x_ni ∈ {0, 1}.
By specifying different probabilistic functions that conform
to the unfolding principle, various probabilistic unfolding
models have been proposed. In particular, the following
models are frequently referenced in the literature and are
used in real applications.
1. The simple square logistic model (SSLM):

$$\Pr\{X_{ni} = 1 \mid b_n, d_i\} = \frac{1}{1 + \exp\{(b_n - d_i)^2\}}. \tag{3}$$

2. The PARELLA model:

$$\Pr\{X_{ni} = 1 \mid b_n, d_i, \gamma\} = \frac{1}{1 + (b_n - d_i)^{2\gamma}}. \tag{4}$$

3. The hyperbolic cosine model (HCM):

$$\Pr\{X_{ni} = 1 \mid b_n, d_i, \theta_i\} = \frac{\exp(\theta_i)}{\exp(\theta_i) + 2\cosh(b_n - d_i)}. \tag{5}$$
The HCM can be rewritten, with the unit parameter expressed as r_i, in the form

$$\Pr\{X_{ni} = 1 \mid b_n, d_i, r_i\} = \frac{\cosh(r_i)}{\cosh(r_i) + \cosh(b_n - d_i)}, \tag{6}$$

$$\Pr\{X_{ni} = 0 \mid b_n, d_i, r_i\} = \frac{\cosh(b_n - d_i)}{\cosh(r_i) + \cosh(b_n - d_i)}. \tag{7}$$

These models share a common structure, which can be expressed as a general framework:

$$\Pr\{X_{ni} = 1 \mid b_n, d_i, r_i\} = \frac{C(r_i)}{C(r_i) + C(b_n - d_i)}, \tag{8}$$

$$\Pr\{X_{ni} = 0 \mid b_n, d_i, r_i\} = \frac{C(b_n - d_i)}{C(r_i) + C(b_n - d_i)}, \tag{9}$$

where b_n, d_i, and r_i are as defined earlier, and the function C (the operational function) has the following properties:
(P1) Non-negative: C(t) ≥ 0 for any real t.
(P2) Monotonic in the positive domain: C(t1) > C(t2) for any t1 > t2 > 0.
(P3) C is an even function (symmetric about the origin): C(−t) = C(t) for any real t.
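The general form of Eq. (8) is easy to compute. The following Python sketch assumes three operational functions: cosh for the HCM, exp(t^2) for the SSLM-type model, and t^2 for the PARELLA-type model (the latter two reproduce Eqs. (3) and (4) at particular values of the unit parameter); the numeric values are illustrative.

    import numpy as np

    def prob_agree(b, d, r, C):
        # General unfolding form, Eq. (8): Pr{X = 1} = C(r) / (C(r) + C(b - d)).
        return C(r) / (C(r) + C(b - d))

    C_hcm = np.cosh                      # hyperbolic cosine model
    C_sslm = lambda t: np.exp(t ** 2)    # simple square logistic type
    C_parella = lambda t: t ** 2         # PARELLA type (gamma = 1)

    for name, C in (("HCM", C_hcm), ("SSLM", C_sslm), ("PARELLA", C_parella)):
        # Agreement is most likely when the person is located at the item.
        print(name, round(prob_agree(b=0.5, d=0.0, r=1.0, C=C), 3))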
Using Eq. (8) with Eq. (2), a general form for pairwise-preference responses can be obtained as

$$\Pr\{X_{nij} = 1\} = \frac{C(r_i)\,C(b_n - d_j)}{C(r_i)\,C(b_n - d_j) + C(r_j)\,C(b_n - d_i)}. \tag{10}$$

When the items share a common unit parameter (r_i = r_j), this reduces to

$$\Pr\{X_{nij} = 1\} = \frac{C(b_n - d_j)}{C(b_n - d_j) + C(b_n - d_i)}. \tag{11}$$
A Framework of Polytomous
Probabilistic Models for
Unfolding Responses
Figure 5 Probabilistic curves of unfolding models with pairwise-preference formats (d_i = −1, d_j = 1). HCM, hyperbolic cosine model; SSLM, simple square logistic model; PARELLA curves are also shown.
As b_n approaches infinity (with a common unit parameter and d_j > d_i), the pairwise-preference probability takes different limiting values under the different operational functions, as the curves in Fig. 5 show:

$$\lim_{b_n \to \infty} \Pr\{X_{nij} = 1\} = \begin{cases} 0, & \text{for the SSLM}, \\[4pt] \dfrac{1}{1 + \exp(d_j - d_i)}, & \text{for the HCM}, \\[4pt] 0.5, & \text{for the PARELLA}. \end{cases} \tag{12}$$

Rating Formulation
In the rating formulation, a polytomous response X_ni ∈ {0, 1, . . . , m} is resolved into a sequence of m dichotomous steps Z_1, . . . , Z_m. The response takes the value k when the first k steps are passed and the remaining m − k steps are not:

$$\Pr\Big\{Z = (\underbrace{1, \ldots, 1}_{k}, \underbrace{0, \ldots, 0}_{m-k}) \;\Big|\; \Omega\Big\}, \qquad k = 0, \ldots, m. \tag{16}$$

Denoting by p_k the probability of passing step k, and q_k = 1 − p_k, the step probabilities can be expressed in terms of the category probabilities as

$$p_k = \frac{\Pr\{X = k\}}{\Pr\{X = k-1\} + \Pr\{X = k\}}, \qquad k = 1, \ldots, m. \tag{18}$$

Applying the dichotomous framework of Eq. (8) to each step, with step unit parameters ρ_{i1}, . . . , ρ_{im},

$$p_k = \frac{C(\rho_{ik})}{C(\rho_{ik}) + C(b_n - d_i)}, \qquad q_k = 1 - p_k = \frac{C(b_n - d_i)}{C(\rho_{ik}) + C(b_n - d_i)}, \tag{19}$$

leads to the general unfolding model for polytomous responses,

$$\Pr\{X_{ni} = k\} = \frac{1}{\lambda_{ni}} \prod_{l=1}^{k} C_l(\rho_{il}) \prod_{l=k+1}^{m} C_l(b_n - d_i), \qquad k = 0, \ldots, m, \tag{20}$$

where the normalizing factor

$$\lambda_{ni} = \sum_{k=0}^{m} \prod_{l=1}^{k} C_l(\rho_{il}) \prod_{l=k+1}^{m} C_l(b_n - d_i) \tag{21}$$

ensures that the category probabilities sum to 1 (the empty products for k = 0 and k = m are defined as 1). For example, with C_l(t) = exp(t^2) for every step, the normalizing factor becomes

$$\lambda_{ni} = \sum_{k=0}^{m} \exp\Big\{\sum_{l=1}^{k} \rho_{il}^2\Big\} \exp\big\{(m-k)(b_n - d_i)^2\big\}. \tag{26}$$
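As a computational check on this general form, the following Python sketch evaluates the category probabilities of Eq. (20), using one operational function C for all steps and illustrative parameter values.

    import numpy as np

    def category_probs(b, d, rho, C=np.cosh):
        # rho holds the step unit parameters rho_1..rho_m; returns the
        # normalized probabilities Pr{X = k}, k = 0..m, from Eq. (20).
        m = len(rho)
        weights = [
            np.prod([C(r) for r in rho[:k]]) * C(b - d) ** (m - k)
            for k in range(m + 1)
        ]
        return np.array(weights) / np.sum(weights)

    # Probabilities over categories 0..3 for a person near the item.
    print(category_probs(b=0.3, d=0.0, rho=[2.0, 1.0, 0.5]))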
Figure 6 Probabilistic curves of the general unfolding model for polytomous responses (category curves X = 0, 1, 2, 3 and step curves Z1 = 1, Z2 = 1, Z3 = 1, plotted against location).
Similarly, with C_l(t) = t^2 (the PARELLA form), the category probabilities are

$$\Pr\{X_{ni} = k\} = \frac{(b_n - d_i)^{2(m-k)} \prod_{l=1}^{k} \rho_{il}^2}{\lambda_{ni}}, \qquad k = 0, 1, \ldots, m, \tag{28}$$

where

$$\lambda_{ni} = \sum_{k=0}^{m} (b_n - d_i)^{2(m-k)} \prod_{l=1}^{k} \rho_{il}^2, \tag{30}$$

and with C_l = cosh (the HCM form),

$$\Pr\{X_{ni} = k\} = \frac{[\cosh(b_n - d_i)]^{m-k} \prod_{l=1}^{k} \cosh(\rho_{il})}{\lambda_{ni}}, \qquad k = 0, 1, \ldots, m, \tag{31}$$

where

$$\lambda_{ni} = \sum_{k=0}^{m} [\cosh(b_n - d_i)]^{m-k} \prod_{l=1}^{k} \cosh(\rho_{il}). \tag{32}$$
Table I Ideal Deterministic Response Patterns (0 = Disagree, 1 = Agree), with Persons and Items Ordered by Location

                  Cumulative pattern       Parallelogram pattern
         Item:    1 2 3 4 5 6 7            1 2 3 4 5 6 7
Person 1          0 1 1 1 1 1 1            1 1 0 0 0 0 0
Person 2          0 0 1 1 1 1 1            1 1 1 1 0 0 0
Person 3          0 0 0 1 1 1 1            0 1 1 1 0 0 0
Person 4          0 0 0 0 1 1 1            0 0 1 1 1 0 0
Person 5          0 0 0 0 0 1 1            0 0 0 1 1 1 0
Person 6          0 0 0 0 0 0 1            0 0 0 0 1 1 1
are all positive toward the use of IT, but clearly to different
degrees. If the cumulative process is applied, the ideal
deterministic structure will have a typical pattern, as
shown in the first scalogram block in Table I. That is,
all items present positive attitudes toward the topic,
but to different degrees. A person agrees only to statements with locations lower than that of the person.
However, where the attitudes are expected to range
from highly negative, through neutral, to highly positive
affect, the use of the unfolding process is justified. For
example, the use of capital punishment creates a big debate among the general population. Statements reflecting
a positive attitude (e.g., Capital punishment gives the
criminal what he deserves) as well as those presenting
negative attitude (Capital punishment is one of the most
hideous practices of our time) are needed to cover the
whole range of attitude intensity toward capital punishment. If the unfolding process is applied, the ideal deterministic structure will have the typical parallelogram
pattern shown in Table I. That is, different items provoke
different attitudes (from extremely negative to extremely
positive). A person agrees only to statements with locations close to that of the person. Table I shows both patterns when the responses are dichotomous (0 = disagree,
1 = agree), with persons and items sorted according to
their locations (from lowest to highest).
It can be demonstrated that if a general framework of
unfolding models [Eq. (8) for dichotomous responses or
Eq. (20) for polytomous responses] is applied, the data
collected should be essentially in the structure of a parallelogram with random variations. This fact can be used
after data collection to check if the unfolding process is
actually in place. However, it involves ordering items and
persons by their locations, which is hardly feasible before
the analysis of the data. Nevertheless, the correlation coefficients between the responses on pairs of statements
can be used to examine whether the selected process is
operating. If the cumulative process is applied, the correlation coefficients should all be positive. If the structure
of the data follows the frameworks of unfolding models
1.0
1.0
1.0
1.0
1.0
1.0
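To see the diagnostic at work, here is an illustrative Python simulation; the person and item locations and the two logistic response forms are assumptions made for illustration, not part of the frameworks above.

    import numpy as np

    rng = np.random.default_rng(0)

    # Persons and items spread along the latent continuum.
    persons = rng.uniform(-2, 2, size=500)
    items = np.linspace(-1.5, 1.5, 7)

    def simulate(process):
        diff = persons[:, None] - items[None, :]
        if process == "cumulative":
            p = 1 / (1 + np.exp(-diff))          # monotone in location
        else:
            p = 1 / (1 + np.exp(diff ** 2 - 1))  # single peaked at the item
        return (rng.uniform(size=p.shape) < p).astype(float)

    for process in ("cumulative", "unfolding"):
        r = np.corrcoef(simulate(process).T)
        lowest = r[np.triu_indices(7, k=1)].min()
        # Negative correlations between distant items signal unfolding.
        print(process, "lowest inter-item correlation:", round(lowest, 2))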
Further Reading
Andrich, D. (1988). The application of an unfolding model of
the PIRT type to the measurement of attitude. Appl.
Psychol. Measure. 12, 33–51.
Andrich, D., and Luo, G. (1993). A hyperbolic cosine latent
trait model for unfolding dichotomous single-stimulus
responses. Appl. Psychol. Measure. 17, 253–276.
Coombs, C. H. (1964). A Theory of Data. Wiley, New York.
Glossary
correlation The statistical technique that measures and
describes the relationship between two variables. The
method was invented by Francis Galton, who stated that
correlation was "a very wide subject indeed. It exists
wherever the variations of two objects are in part due to
common causes."
eugenics Defined by Francis Galton as dealing with questions "bearing on what is termed in Greek, eugenes, namely,
good in stock, hereditarily endowed with noble qualities."
ogive Used by Galton in 1875 to describe the shape of the
curve obtained when graphing the inverse normal cumulative distribution function, which has a sinuous shape
similar to that of an ogee molding.
regression Used to characterize the manner in which one set
of variables (for example, on the Y axis of a graph) changes
as the other set of variables (for example, on the X axis of
that graph) changes. For Francis Galton, who invented
regression analysis, regression was more a biological
problem than a statistical one. Regression to the mean,
which he observed in sweet peas and in anthropometric
measurements, meant that the underlying hereditary
processes were always driving metric characters such as
height or weight back toward the mean and would
effectively counterbalance natural selection.
Figure 1 The table published in 1869, in Galton's Hereditary Genius: An Inquiry into Its Laws and Consequences, showing Galton's attempt to assess the distribution of ability in the British male population by grades (a through g below average, A through G above average), with the numbers in each million of the same age and the corresponding proportions. Below the table in the original publication, it was explained that the proportions of men living at different ages are calculated from the proportions that are true for England and Wales. The data were credited to the 1861 census. Redrawn from Galton (1869).
Regression
Not only was Galton interested in applying the normal
distribution to continuously varying traits such as height,
but he also wanted to study the heritability of such traits.
Although he was particularly interested in examining anthropometric data, initially he lacked appropriate data, so,
on the advice of Darwin and the botanist Joseph Hooker,
Galton turned to a model system, sweet peas. He cited
three reasons. Sweet peas had little tendency to cross-fertilize, they were hardy and prolific, and seed weight
was not affected by humidity. Galton planted his first experimental crop at Kew in 1874, but the crop failed. In
order to avoid this outcome the following year, he dispensed seed packets widely to friends throughout Great
Britain. The packets were labeled K, L, M, N, O, P, and Q,
with K containing the heaviest seeds, L, the next heaviest,
and so forth down to packet Q. Galton obtained fairly
complete results from the progeny of 490 carefully
weighed seeds. His discovery was that "the processes
concerned in simple descent are those of Family Variability and Reversion." By simple descent, Galton meant self-fertilization. Family variability referred to the degree of
variation around the mean observed among progeny seeds
irrespective of whether they were large, small, or average
in size. Although the means of the distributions shifted to
some extent between different sets of progeny, the degree
of variation around the mean was similar for all. By
"reversion" Galton meant the tendency of that ideal
mean type to depart from the parental type, reverting
towards the mean of the parental population from which
the parental seeds were selected. He then drew a diagram
that plotted the seed diameter of progeny seeds on the y
axis and parental seeds on the x axis, thereby constructing
the first regression line (Fig. 3).
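A small Python simulation illustrates what Galton saw: when offspring deviate from the population mean by only a fraction of the parental deviation, the fitted line has a slope well below 1. The sample size echoes Galton's 490 seeds and the slope echoes the value in Fig. 3; the remaining numbers are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)

    # Simulated "parent" and "offspring" seed diameters: offspring revert
    # toward the population mean of 17, retaining a third of the parental
    # deviation plus noise.
    parent = rng.normal(loc=17.0, scale=1.5, size=490)
    offspring = 17.0 + 0.33 * (parent - 17.0) + rng.normal(scale=1.0, size=490)

    slope, intercept = np.polyfit(parent, offspring, deg=1)
    print(f"regression slope: {slope:.2f}")  # near 0.33, as in Fig. 3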
At first, Galton referred to the slope of the line as the
"coefficient of reversion," but then changed this to
"regression." Later, with the aid of partial pedigree data he
obtained from his anthropometric laboratory, established
in connection with the International Health Exhibition
held in London in 1884, Galton was able to show that
regression to the mean applied to human stature as
well (Table I). This was to have a profound effect on
Galton's view of the evolutionary process because Galton
believed that regression to the mean would thwart the
action of natural selection and that this would create
a major problem for Darwin's theory of the origin of
species. Hence, Galton supposed that evolution must proceed in discontinuous steps that could not be reversed by
regression to the mean. The utility of regression analysis
and the regression coefficient as statistical techniques was
later recognized by Galton and others, but initially this
was of strictly secondary importance to Galton, for whom
the evolutionary and, hence, hereditary implications of
regression to the mean were of primary significance.
Correlation
There are two accounts of how Galton came upon the idea
of correlation. Sadly, the more romantic of these, which
envisions Galton as having the notion while seeking refuge
from a shower in a reddish recess in a rock near the side of
a pathway on the grounds of Naworth Castle, is probably
wrong. The plausible, and more believable, account is that
Galton was working one day, using his anthropometric
Figure 3 Galton's plot of the diameter of progeny seeds against the diameter of parent seeds, showing the offspring mean, the parental mean, and the fitted line with slope R = 0.33.
Table I Number of Adult Children of Various Statures Born of 205 Mid-parents of Various Statures^a. The totals row records 928 adult children of 205 mid-parents; the medians of the children's heights for the successive mid-parent statures run 72.2, 69.9, 69.5, 68.9, 68.2, 67.6, 67.2, 66.7, and 65.8 inches.

^a All female heights have been multiplied by 1.08. In calculating the medians, the entries have been taken as referring to the middle of the squares in which they stand. The reason why the headings run 62.2, 63.2, etc., instead of 62.5, 63.5, etc., is that the observations are unequally distributed between 62 and 63, 63 and 64, etc., there being a strong bias in favor of integral inches. This inequality was not apparent in the case of the mid-parents. From Galton (1886).
data to plot forearm length against height, when he noticed that the problem was essentially the same as that of
kinship. He summarized these data in 1888 in a table in
one of his most important papers, "Co-relations and Their
Measurements, Chiefly from Anthropometric Data."
Galton extended his correlation data to head breadth versus head length, head breadth versus head height, etc. He
also calculated the first set of correlation coefficients,
using the familiar symbol r. Most of his values for r
were quite high, between 0.7 and 0.9.
Further Reading
Bulmer, M. (1999). The development of Francis Galton's ideas on the mechanism of heredity. J. Hist. Biol. 32, 263–292.
Burbridge, D. (2001). Francis Galton on twins, heredity and social class. Br. J. Hist. Sci. 34, 323–340.
Cowan, R. S. (1972). Francis Galton's statistical ideas: The influence of eugenics. Isis 63, 509–598.
Crow, J. F. (1993). Francis Galton: Count and measure, measure and count. Genetics 135, 1–4.
Darwin, C. R. (1859). On the Origin of Species by Means of Natural Selection, or the Preservation of Favored Races in the Struggle for Life. Murray, London.
Forrest, D. W. (1974). Francis Galton: The Life and Work of a Victorian Genius. Taplinger, New York.
Galton, F. (1865). Hereditary talent and character. Macmillan's Mag. 12, 157–166, 318–327.
Galton, F. (1869). Hereditary Genius: An Inquiry into Its Laws and Consequences. Macmillan, London.
Galton, F. (1875). Statistics by intercomparison, with remarks on the law of frequency of error. Philosoph. Mag. [4th series] 49, 33–46.
Galton, F. (1883). Inquiries into Human Faculty and Its Development. Macmillan, London.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. J. Anthropol. Inst. 15, 246–263.
Galton, F. (1888). Co-relations and their measurements, chiefly from anthropometric data. Proc. R. Soc. London, Ser. B 182, 1–23.
Galton, F. (1889). Presidential address. J. Anthropol. Inst. 18, 401–419 [see p. 403].
Galton, F. (1889). Natural Inheritance. Macmillan, London.
Galton, F. (2001). The Art of Travel; or Shifts and Contrivances Available in Wild Countries, 5th Ed. [reprint]. Phoenix Press, London.
Gillham, N. W. (2001). A Life of Sir Francis Galton: From African Exploration to the Birth of Eugenics. Oxford University Press, New York.
Gillham, N. W. (2001). Sir Francis Galton and the birth of eugenics. Annu. Rev. Genet. 35, 83–101.
Gillham, N. W. (2001). Evolution by jumps: Francis Galton and William Bateson and the mechanism of evolutionary change. Genetics 159, 1383–1392.
Pearson, K. (1914). The Life, Letters, and Labours of Francis Galton. Vol. I. Cambridge University Press, Cambridge, UK.
Pearson, K. (1924). The Life, Letters, and Labours of Francis Galton. Vol. II. Cambridge University Press, Cambridge, UK.
Pearson, K. (1930). The Life, Letters, and Labours of Francis Galton. Vol. III. Cambridge University Press, Cambridge, UK.
Porter, T. (1986). The Rise of Statistical Thinking. Princeton University Press, Princeton, NJ.
Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Belknap/Harvard University Press, Cambridge, MA.
Stigler, S. M. (1995). Galton and identification by fingerprints. Genetics 140, 857–860.
Gambling Studies
Garry J. Smith
University of Alberta, Edmonton, Alberta, Canada
Glossary
chasing An attempt to recoup gambling losses by playing
longer, playing more frequently, and/or increasing bet size.
continuous gambling format Any wagering event for which
there is a short time interval between placing the bet,
playing the game, and learning the outcome.
gambling The act of risking money, property, or something of
value on an event for which the outcome is uncertain.
pathological gambling A mental disorder that includes the
following essential features: (1) a continuous or periodic
loss of control over gambling, (2) a progression in gambling
frequency and in the amounts wagered, in the preoccupation with gambling, and in the acquisition of monies with
which to gamble, and (3) a continuation of gambling
involvement despite adverse consequences.
problem gambling Behavior related to gambling that creates
negative consequences for the gambler, for others in his/her
social network, or for the community.
Introduction
What Are Gambling Studies?
Gambling is a basic human activity dating back to ancient
times and found in nearly all cultures through the ages.
Overview
Gambling-related social science has concentrated more
on perceived deviant aspects of the activity than on its
normative dimensions and has featured methodologies
that include survey, sociohistorical, ethnographic, public
policy analysis, and longitudinal perspectives. This article
Problem-Gambling Prevalence
Research
Rationale for Gambling Studies
Legislative blessing for gambling is based on the premise
that the social good of the activity outweighs any negative
outcomes. A problem with this presumption is that most of
the benefits of gambling are tangible and easily quantifiable in economic terms, whereas the costs to society are
often hidden, indirect, not immediately noticeable, and
are impossible to measure precisely. The costs alluded to
are burdens that problem gamblers impose on other citizens who gamble responsibly or who do not partake in
gambling at all. In order to justify offering legal gambling,
governments must allay citizens' concerns that the social
costs of the activity are not overly onerous. One way this is
done is through problem-gambling prevalence surveys.
Such surveys have been completed in nearly one-half of
the American states, in 9 of the 10 Canadian provinces, and
in several overseas jurisdictions. The object of problem-gambling prevalence surveys is to identify accurately the
percentages of individuals in a population with and without the disorder. Generally, the higher the percentage of
problem and at-risk gamblers in a population, the higher
the social costs of gambling and the greater the need for
public policy to ameliorate the situation. For example, in
making decisions about funding prevention and treatment
services for problem gamblers, legislators need sound
estimates of the numbers in the general population who
require help for their out-of-control gambling, their demographic profiles, and the likelihood that they will use
these services if they are made available.
Semantic Differences
A variety of terms have been applied to individuals whose
out-of-control gambling has an adverse effect on their
lives and creates harms for those around them (e.g., family, friends, and co-workers) and for society in general.
The term often employed by lay audiences and Gamblers
Anonymous (GA) members is "compulsive"; however, researchers and problem-gambling treatment specialists
avoid this term on the grounds that the label implies
that the individual is participating in an unenjoyable
activity. Because gambling can be a pleasing activity,
even for those who later develop problems, terminology
inferring compulsiveness is considered a misnomer.
"Problem gambling" is used by both lay and professional audiences to specify all of the patterns of gambling
behavior that compromise, disrupt, or damage personal,
family, or vocational pursuits, and covers a continuum
from moderate-risk to seriously out-of-control gamblers.
Psychiatrists and mental health therapists prefer the term
"pathological," which incorporates several assumptions
basic to the medical perspective of aberrant gambling,
including the notion that pathological gambling is
a chronic and progressive disorder and a conviction
that there is a clear distinction between a pathological
and a social gambler.
Based on the imprecise terminology used to describe
out-of-control gambling and the methodological chaos
this creates, use of the label "disordered gambling" has
been recommended for the following reasons: (1) disordered gambling embraces all of the previously used terms,
and (2) a disorder suggests a wide range of gradually
shifting behavior. Semantic differences pertaining to uncontrolled gambling continue to confound gambling
studies scholars, because no clear-cut choice has
emerged; however, problem gambling and disordered
gambling are the terms with the most academic currency
at the moment.
Criterion: Definition

Preoccupation: Is preoccupied with gambling (e.g., preoccupied with reliving past gambling experiences, handicapping or planning the next venture, or thinking of ways to get money with which to gamble).
Tolerance: Needs to gamble with increasing amounts of money in order to achieve the desired excitement.
Withdrawal: Is restless or irritable when attempting to cut down or stop gambling.
Escape: Gambles as a way of escaping from problems or relieving dysphoric mood (e.g., feelings of helplessness, guilt, anxiety, or depression).
Chasing: After losing money gambling, often returns another day in order to get even ("chasing" one's losses).
Lying: Lies to family members, therapists, or others to conceal the extent of involvement with gambling.
Loss of control: Has made repeated unsuccessful efforts to control, cut back, or stop gambling.
Illegal acts: Has committed illegal acts (e.g., forgery, fraud, theft, or embezzlement) in order to finance gambling.
Risked significant relationship: Has jeopardized or lost a significant relationship, job, or educational or career opportunity because of gambling.
Bailout: Has relied on others to provide money to relieve a desperate financial situation caused by gambling.
In addition to scholarly disputes over the most appropriate tool for measuring problem gambling in a population,
differences of opinion exist about the value of telephone
surveys and the validity of certain survey questionnaire
items. Most problem-gambling prevalence studies have
been telephone surveys, which have the primary advantage of being able to reach large numbers of respondents in
a short period of time and in a cost-effective manner.
Despite their popularity, telephone surveys generally,
and problem-gambling prevalence surveys particularly,
contain flaws that can skew response rates (generally in
the direction of underestimating the problem):
Meta-analysis
Meta-analytic research has been conducted on 120 problem-gambling prevalence studies that had taken place in
North America over the previous decade. The studies
reviewed represent both adults and adolescents in the
general population, university students, adults and
youth in clinical and prison populations, and a variety
of special populations, including women, specific ethnic
groups, and gambling industry employees. The meta-analysis empirically integrated the findings from these
Longitudinal Research in
Gambling Studies
Prominent gambling studies researchers have stressed the
need for longitudinal designs to determine the natural
history of both disordered and responsible gambling.
As gambling studies research evolves, it is evident that
certain questions about gambling and problem gambling
cannot be answered definitively by one-time surveys or
clinical research. Longitudinal research (or cohort research, as it is sometimes referred to) is required to measure changes in gambling behavior over time, to correlate
gambling patterns and behaviors with the availability of
various legalized gambling formats, and to determine the
social and economic impacts of gambling and problem
gambling on communities and society at large. Although
in its infancy in gambling studies, longitudinal research is
becoming an increasingly popular study design because of
its presumed superiority over "snapshot" or cross-sectional studies. Longitudinal research includes the following advantages:
Following a group of respondents over time allows
researchers to better understand the onset, development, and maintenance of both normative and problem
gambling behavior.
Cultural Studies
Several noteworthy gambling-related works have emerged
from the cultural studies tradition, which is a blend of
social history and sociology. The premise of this approach
is that past events cannot be understood without reference
to prevailing social structures, and that universal categories such as class, race, and gender must be applied to the
analysis. Central to the cultural studies schema is the concept of hegemony, or social control: the process whereby
dominant ideas, ideologies, meanings, and values are
transmitted through various cultural practices so as to
maintain patterns of power and privilege in society. Ostensibly, the moral and philosophical legitimacy of this
Final Thoughts
Until 20 years ago, gambling in North America was a low-profile public issue in terms of its variety and availability;
immoderate gambling was rarely framed as being problematic, governmental concern about gambling was minimal, and scholars were indifferent to the activity.
However, social science research has significantly broadened and deepened our understanding of gambling to the
point where there is general concurrence on the following
points:
Gambling is a mainstream cultural activity that is
practiced in a socially responsible manner by the majority
of those who partake in the activity.
There is such a phenomenon as problem gambling
and it is exacerbated by intermittent reinforcement
schedules and fast-paced gambling formats.
Elevated problem-gambling prevalence rates and
higher per capita annual wagering rates in a jurisdiction
are associated with the widespread availability of gambling outlets, particularly electronic gambling devices.
Pressure to expand gambling comes not from the
public, but from vested interest groups (e.g.,
governments, the gambling industry, aboriginal groups,
and charities) that stand to benefit from the expansion; as
a consequence, commercial considerations generally take
priority over social ones.
Governments that sanction gambling are faced with
the conundrum of maintaining gambling revenues,
while at the same time suppressing its harmful impacts.
Costly treatment programs for problem gambling are
seen to be less effective than is public policy that
emphasizes preventative measures that immunize
those susceptible to gambling addiction, or at least
Further Reading
American Psychiatric Association. (1980). Diagnostic and
Statistical Manual of Mental Disorders, 3rd Ed. American
Psychiatric Association, Washington, DC.
American Psychiatric Association. (1994). Diagnostic and
Statistical Manual of Mental Disorders, 4th Ed. American
Psychiatric Association, Washington, DC.
Beare, M. (1989). Current law enforcement issues in Canadian
gambling. In Gambling in Canada: Golden Goose or Trojan
Horse? (C. Campbell and J. Lowman, eds.), pp. 177–194.
School of Criminology, Simon Fraser University, Burnaby,
British Columbia.
Blaszczynski, A., Dumlao, V., and Lange, M. (1997). How
much do you spend gambling? Ambiguities in survey
questionnaire items. J. Gambling Stud. 13(3), 237–252.
Castellani, B. (2000). Pathological Gambling. State University of
New York Press, Albany.
Clotfelter, C., and Cook, P. (1989). Selling Hope: State Lotteries
in America. Harvard University Press, Cambridge, Massachusetts.
Dixon, D. (1991). From Prohibition to Regulation:
Anti-gambling and the Law. Claredon, Oxford, United
Kingdom.
Dow Schull, N. (2002). Escape Mechanism: Women, Caretaking, and Compulsive Machine Gambling. Working Paper
No. 41. Center for Working Families. University of
California, Berkeley.
Ferris, J., and Wynne, H. (2001). The Canadian Problem
Gambling Index: Final Report. Canadian Centre on
Substance Abuse, Ottawa, Ontario.
Ferris, J., Wynne, H., and Single, E. (1999). Measuring
Problem Gambling in Canada. Phase 1 Final Report to
the Canadian Inter-Provincial Task Force on Problem
Gambling.
Glaser, B., and Strauss, A. (1967). The Discovery of Grounded
Theory. Aldine, New York.
Goodman, R. (1995). The Luck Business. Free Press,
New York.
Hayano, D. (1982). Poker Faces. University of California Press,
Berkeley.
Herman, R. (1974). Gambling as work: A sociological study of
the racetrack. In Sociology for Pleasure (M. Truzzi, ed.),
pp. 298–314. Prentice-Hall, Englewood Cliffs, New Jersey.
Lesieur, H. (1977). The Chase: Career of the Compulsive
Gambler. Anchor Books, Garden City, New York.
Glossary
asymmetric information The type of data acquired when
a move by Nature occurs before any player acts, and some,
but not all, players are ignorant of what Nature chose.
best response A players optimal choice of action in response
to a particular choice of others.
common knowledge Something that all players know, with
the shared understanding among all players that they all
know the same thing.
contractible Some element that can be written into a contract
so that the disposition of the contract is based on the exact
outcome of the element, and a third party, other than the
players, can help in verifying the outcome.
cooperative game An activity in which players can make
binding agreements as to their actions.
equilibrium and Nash equilibrium A set of actions chosen
by all players such that, given the choices of others, no
player wants to change his choice unilaterally.
folk theorem A result, especially in repeated games, that
many hold to be true, but which is proved formally much
later.
incomplete information The type of data acquired when
a move by Nature occurs before any player acts, and some
players do not know what Nature chose.
informed player A game participant who has observed
Natures move in a game of asymmetric information.
mixed strategy A players choice of action that attaches
a probability to more than one available action.
Nature An entity that makes a probabilistic move with respect
to one of the elements of the game.
noncooperative game A type of game in which players
cannot make binding agreements as to their actions.
payoff Profits or utility to players after all players have chosen
their action.
perfect information The type of data acquired when players
know all that has occurred before their move.
players Participants in a game.
pure strategy A players choice of a single action with
probability 1, from among all available actions.
Classification of Games
Questions
What should a person know about game theory to be
able to apply it to problems in social science? This question can be answered using applications and illustrative
examples, mainly in economics and management, that do
not require advanced training. The deeper problems in
economics and management that game theory has been
used to analyze can be accessed in a large body of literature. The hard problems in game theory are not discussed here, yet the reader can get a feel for how the
game theoretic model has been successfully used based
on minimal domain knowledge beyond everyday experience. Issues that some would regard as drawbacks in
using game theory are presented here, although others,
including the author, view them as challenges.
Definitions
A game, denoted by G, can be defined as consisting of
three elements: players, indexed by i (i = 1, 2, . . . , N); an
action or strategy a_i, possibly a vector, chosen by player i
from a set A_i = {a_i}; and a payoff to player i, π_i(a_i, a_{−i}),
where a_{−i} denotes the actions chosen by the players other than i.
Equilibrium
Nash Equilibrium
In single-person decision theory, the main task is to characterize the optimal solution. In game theory, the task is to
characterize the equilibrium. The equilibrium is understandable if the concept of best response is first understood. Consider player i's choice in response to actions by the other players, denoted s_{−i}. Denote b_i(s_{−i}) to be i's best response to s_{−i}. It is then defined as

$$b_i(s_{-i}) = \arg\max_{a_i \in A_i} \pi_i(a_i, s_{-i}).$$

A set of strategies (s_1*, . . . , s_N*) is a Nash equilibrium if s_i* = b_i(s_{−i}*) for all i.
Product Introduction (Firm 1 (P1) versus Firm 2 (P2), each choosing S or I; payoffs are listed as (P1, P2), and only the payoffs for Firm 1 playing I are shown):

                         Firm 2 (P2)
                         S           I
Firm 1 (P1)    I         (5, 10)     (2, 2)

Battle of the Sexes:

                         Woman
Man                      Baseball    Ballet
Baseball                 2, 1        0, 0
Ballet                   0, 0        1, 2

Monitoring a Franchisee:

                         Franchisee
Franchisor               Shirk       Don't shirk
Monitor                  1, 1        4, 1
Don't monitor            3, 0        0, 0
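The best-response condition can be checked mechanically in small games. The following Python sketch enumerates the pure-strategy Nash equilibria of the battle of the sexes above; the function is a generic illustration written for 2 x 2 payoff matrices.

    import numpy as np

    # Battle-of-the-sexes payoffs (row player = man, column player = woman).
    man = np.array([[2, 0], [0, 1]])
    woman = np.array([[1, 0], [0, 2]])

    def pure_nash(a, b):
        eq = []
        for i in range(a.shape[0]):
            for j in range(a.shape[1]):
                # (i, j) is an equilibrium if neither player gains by a
                # unilateral deviation, i.e., each action is a best response.
                if a[i, j] >= a[:, j].max() and b[i, j] >= b[i, :].max():
                    eq.append((i, j))
        return eq

    print(pure_nash(man, woman))  # [(0, 0), (1, 1)]: both choose the same event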
Informational Issues
Perfect Equilibrium
One of the central ideas in dynamic programming is
Bellman's principle of optimality. This has force in multistage games in the following sense: a player's strategy
should be such that it is a best response at each information set. To see how this might affect Nash equilibrium,
consider the following variation of the battle of the sexes.
Subgame Perfection
Suppose in the game of battle of the sexes, the man moves
first in stage 1 and makes a choice, followed by the woman,
in stage 2, and the man gets to have the final say in stage 3.
This multistage game can be represented in extensive
form by the game tree in Fig. 1, similar to a decision
tree in single-person decision situations. At the end of
each terminal branch in the game tree are the payoffs
to both of the players. Read Fig. 1 as follows: If in
stage 1, at node 1, the man chooses baseball (BS), this
leads to node 2 in stage 2 of the tree. If, on the other hand, the man chooses ballet (BL), this leads to node 3.
Figure 1 A three-stage battle-of-the-sexes game in extensive form, with the payoffs to the man and the woman at each terminal branch. BS, baseball; BL, ballet.
One of the interesting things about specifying the equilibrium in this way is that even though, in equilibrium,
nodes 4 and 6 would never be reached, what the man
should do if the game ever got there is specified; i.e.,
the man's strategies are specified on off-equilibrium
paths. Also, note that in solving for the equilibrium to
the game starting at node 1, the games are first solved
for starting at nodes 4–7 in stage 3, then these strategies
are used from stage 3 to solve for games starting from
nodes 2 and 3 in stage 2, resorting to backward induction
to solve for the game. In doing so, the equilibrium is solved
for all subgames, starting from every node. Thus, the
equilibrium specified is a Nash equilibrium, not only to
the game starting from node 1, but to every subgame. For
this reason, the equilibrium is known as a subgame perfect
Nash equilibrium. It is important to note that if sub-game
perfection was not required, other Nash equilibria would
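Backward induction of the kind just described is mechanical enough to automate. The sketch below is an illustration under my own assumptions about the tree, not the article's code: following the Fig. 1 description, the man (player index 0) moves at stages 1 and 3, the woman (index 1) at stage 2, and the payoff is determined by the woman's stage-2 choice and the man's stage-3 choice.

```python
# Backward induction on a finite game tree. A node is either terminal
# ({"payoffs": ...}) or a decision node ({"player": ..., "moves": {...}}).
def solve(node):
    """Return (payoffs, plan) for a subgame by backward induction."""
    player, moves = node.get("player"), node.get("moves")
    if moves is None:                      # terminal node: payoffs given
        return node["payoffs"], {}
    best_action, best_payoffs, plan = None, None, {}
    for action, child in moves.items():
        payoffs, child_plan = solve(child)
        plan.update(child_plan)
        # the mover compares subgame payoffs from his/her own index
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    plan[id(node)] = (player, best_action)
    return best_payoffs, plan

def terminal(man, woman):
    return {"payoffs": (man, woman)}

def stage3(woman_choice):
    # the man has the final say; coordination pays (2,1) on baseball (BS)
    # and (1,2) on ballet (BL), miscoordination pays (0,0)
    return {"player": 0, "moves": {
        "BS": terminal(2, 1) if woman_choice == "BS" else terminal(0, 0),
        "BL": terminal(1, 2) if woman_choice == "BL" else terminal(0, 0)}}

root = {"player": 0, "moves": {                # stage 1: man
    m1: {"player": 1, "moves": {               # stage 2: woman
        w: stage3(w) for w in ("BS", "BL")}}
    for m1 in ("BS", "BL")}}

payoffs, plan = solve(root)
print("subgame perfect payoffs (man, woman):", payoffs)   # (1, 2)
```

Under these assumptions the woman, by effectively committing at stage 2, secures her preferred outcome, so backward induction returns payoffs (1, 2).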
[Figures: extensive-form game trees for the sale of a used refrigerator. Nature first draws quality (good or bad, with probabilities 1 - p and p); the seller, who observes quality (nodes 2 and 3), sets a price (e.g., P = $1000 or $900); the buyer, who cannot tell the seller's nodes apart (an information set containing nodes 4 and 5), chooses Buy or Don't buy.]
the seller can do about it. There are two possibilities. One is for the seller to condition his/her equilibrium strategy on the node where he/she is located. This, in turn, would reveal information to the buyer, and so the buyer should revise his/her probability of a good refrigerator (being at node 4 or 5) suitably, using Bayes' rule. The buyer's equilibrium strategy at the information set containing nodes 4 and 5 should then be arrived at using these revised probabilities. The second possibility is that the seller's equilibrium strategy is independent of the information he/she has, and so of the node where he/she is located. In this case, the buyer needs to make sense of an off-equilibrium outcome in stage 1. In particular, how should the buyer revise his/her probability of a good refrigerator? As will be seen, this has force in perfect Bayesian Nash equilibrium (PBNE), which extends the concept of subgame perfect Nash equilibrium to the case of asymmetric information.
to the IR and IC constraints. This problem can be formulated as a constrained maximization problem.

The problem can also be viewed as a multistage game in which the principal first designs a mechanism, M, that specifies the terms of the contract. In particular, it says what the salesperson would have to sell, and what he/she would earn, both depending on what message he/she sends. After the mechanism has been designed, the salesperson accepts or rejects the contract. Suppose the salesperson accepts. Then, the information he/she has and the terms of the contract he/she chooses simultaneously send a costless message to the manager that will determine the salesperson's payoff. Thus, the principal chooses the mechanism and the agent chooses the message (and the effort), and the equilibrium in this game solves for the optimal contract. An important result in the theory of contracts is that the message can be restricted to be the information the agent has, and an optimal contract has the property that the agent would not have an incentive to report his/her information falsely. This is known as the revelation principle, because the optimal contract uncovers the information that the principal does not have. To see the intuition behind the revelation principle, suppose in the salesperson compensation problem that there are two salespersons with differing abilities, high (H) and low (L). Consider the (optimal) mechanism that offers m(H) and m(L), m(H) ≠ m(L), if salespersons report H and L, respectively. Suppose, contrary to the revelation principle, the mechanism induces H to lie and to report L instead. Then, because the mechanism is optimal, the manager would have lost nothing by making m(H) = m(L). This sort of reasoning can be established more rigorously. It is useful because it allows attention to be restricted to a subset of all possible contracts. In light of the revelation principle, the problem of optimal contracting can now be thought of as maximizing the expected utility of the principal subject to three constraints: IR, IC, and truth telling.
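To see the constrained-maximization view in miniature, the sketch below enumerates menus of (quota, pay) contracts for two ability types; the functional forms, the reservation utility of zero, the prior, and the coarse grid are illustrative assumptions of mine, not the article's model:

```python
# Toy screening problem: the agent's type is ability theta, utility is
# pay minus the cost of meeting a sales quota (w - q/theta), and the
# principal keeps q - w. Search a grid for the expected-profit-maximizing
# menu that satisfies IR and truth telling (IC).
import itertools

types = {"H": 2.0, "L": 1.0}          # ability parameters (assumed)
prob = {"H": 0.5, "L": 0.5}           # principal's prior (assumed)

def utility(theta, q, w):
    return w - q / theta              # pay minus cost of meeting quota

grid = [x / 4 for x in range(0, 9)]   # q, w in {0, 0.25, ..., 2.0}
best, best_profit = None, float("-inf")
for qH, wH, qL, wL in itertools.product(grid, repeat=4):
    menu = {"H": (qH, wH), "L": (qL, wL)}
    # IR: each type weakly prefers its contract to walking away;
    # truth telling: no type gains by reporting the other type
    ok = all(utility(types[t], *menu[t]) >= 0 and
             utility(types[t], *menu[t]) >= utility(types[t], *menu[o])
             for t, o in (("H", "L"), ("L", "H")))
    if ok:
        profit = sum(prob[t] * (menu[t][0] - menu[t][1]) for t in types)
        if profit > best_profit:
            best, best_profit = menu, profit

print("optimal menu:", best, "expected profit:", round(best_profit, 3))
```

On this grid the search excludes the low type and pays the high type exactly enough to report truthfully, illustrating how the truth-telling constraint shapes the optimal contract.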
Auctions
Auctions play a major role in many spheres of economic
activity, including e-commerce, government sale of communication frequencies, sale of state-owned assets to
private firms, and, of course, art. Sellers are interested in
maximizing their revenue and so analysis of equilibrium
$$ u_i = \begin{cases} v_i - \max B_{-i}, & \text{if } B_i > \max B_{-i} \\ 0, & \text{otherwise} \end{cases} \qquad (4) $$
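Equation (4), as reconstructed, is the payoff rule of a second-price (Vickrey) sealed-bid auction: the winner pays the highest competing bid. The simulation sketch below (bidder values and the shading rule are my illustrative assumptions) shows that bidding one's true value does at least as well on average as shading:

```python
# Simulate a second-price sealed-bid auction from bidder 0's viewpoint:
# payoff is v_i - max(rival bids) when winning, 0 otherwise, per Eq. (4).
import random

def payoff(values, bids, i):
    rival_max = max(b for j, b in enumerate(bids) if j != i)
    return values[i] - rival_max if bids[i] > rival_max else 0.0

random.seed(1)
truthful_total, shaded_total = 0.0, 0.0
for _ in range(10_000):
    values = [random.uniform(0, 1) for _ in range(4)]
    rival_bids = values[1:]                     # rivals bid truthfully
    truthful_total += payoff(values, [values[0]] + rival_bids, 0)
    shaded_total += payoff(values, [0.8 * values[0]] + rival_bids, 0)

print("avg payoff, truthful bid:", truthful_total / 10_000)
print("avg payoff, shaded bid:  ", shaded_total / 10_000)
```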
a deviant player without punishing themselves. No punishment can lead to a player receiving a payoff lower than his/her minmax value. One of the consequences of the folk theorem is that subgame perfect Nash equilibrium admits too many possible outcomes, calling into question the usefulness of the equilibrium concept as a modeling device. The challenge, then, is to model carefully what players can observe in a situation and how they can implement punishments. The main contribution of results such as the folk theorem is the reasonable way they offer to capture tacit understanding among players who interact repeatedly.
Acknowledgments
I thank Professors Nanda Kumar, Uday Rajan, and
Miguel Vilas-Boas for their suggestions on early drafts
of this article.
Further Reading
Abreu, D. (1988). Towards a theory of discounted repeated games. Econometrica 56, 383-396.
Akerlof, G. (1970). The market for "lemons": Quality uncertainty and the market mechanism. Q. J. Econ. 84, 488-500.
Aumann, R., and Hart, S. (1992). Handbook of Game Theory with Economic Applications. North Holland, New York.
Benoit, J.-P., and Krishna, V. (1985). Finitely repeated games. Econometrica 17, 317-320.
Binmore, K. (1990). Essays on the Foundations of Game Theory. Basil Blackwell Ltd., Oxford.
Davis, D. D., and Holt, C. A. (1993). Experimental Economics. Princeton University Press, Princeton, NJ.
Diamond, P. A. (1971). A model of price adjustment. J. Econ. Theory 3, 156-168.
Fudenberg, D., and Levine, D. K. (1998). The Theory of Learning in Games. MIT Press, Cambridge, MA.
Fudenberg, D., and Maskin, E. (1986). The folk theorem in repeated games with discounting or with incomplete information. Econometrica 54, 533-554.
Fudenberg, D., and Tirole, J. (1991). Game Theory. MIT Press, Cambridge, MA.
Harsanyi, J., and Selten, R. (1988). A General Theory of Equilibrium Selection in Games. MIT Press, Cambridge, MA.
Kagel, J. H., and Roth, A. E. (1995). The Handbook of Experimental Economics. Princeton University Press, Princeton, NJ.
Krishna, V. (2002). Auction Theory. Academic Press, San Diego, CA.
Lal, R., and Matutes, C. (1994). Retail pricing and advertising strategies. J. Bus. 67(3), 345-370.
Luce, R. D., and Raiffa, H. (1957). Games and Decisions: Introduction and Critical Survey. Wiley, New York.
McAfee, R. P., and McMillan, J. (1987). Auctions and bidding. J. Econ. Lit. 25, 699-754.
Milgrom, P., and Roberts, J. (1986). Price and advertising signals of product quality. J. Pol. Econ. 94, 796-821.
Milgrom, P., and Weber, R. (1982). A theory of auctions and competitive bidding. Econometrica 50, 1089-1122.
Nash, J. (1950). Equilibrium points in n-person games. Proc. Natl. Acad. Sci. U.S.A. 36, 48-49.
Rao, R. C., and Syam, N. (2000). Equilibrium price communication and unadvertised specials by competing supermarkets. Market. Sci. 20(1), 66-81.
Ross, S. (1977). The determination of financial structure: The incentive-signalling approach. Bell J. Econ. 8, 23-40.
Selten, R. (1965). Spieltheoretische Behandlung eines Oligopolmodells mit Nachfragetragheit. Z. Ges. Staatswiss. 121, 301-324, 667-689.
Selten, R. (1978). The chain-store paradox. Theory Decis. 9, 127-159.
Smith, V. L. (1989). Theory, experiment and economics. J. Econ. Perspect. 3(1), 151-169.
Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16, 8-37.
Wilson, R. (1993). Strategic analysis of auctions. In Handbook of Game Theory (R. Aumann and S. Hart, eds.). North Holland, Amsterdam.
Generalizability Theory
Richard J. Shavelson
Stanford University, Stanford, California, USA
Noreen M. Webb
University of California, Los Angeles, Los Angeles, California, USA
Glossary
condition The levels of a facet (e.g., task 1, task 2, . . . , task k).
decision (D) study A study that uses information from a G
study to design a measurement procedure that minimizes
error for a particular purpose.
facet A characteristic of a measurement procedure such as
a task, occasion, or observer that is defined as a potential
source of measurement error.
generalizability (G) study A study specifically designed to
provide estimates of the variability of as many possible
facets of measurement as economically and logistically
feasible considering the various uses a test might be put to.
universe of admissible observations All possible observations that a test user would consider acceptable substitutes for the observation in hand.
universe of generalization The conditions of a facet to
which a decision maker wants to generalize.
universe score The expected value of a person's observed scores over all observations in the universe of generalization (analogous to a person's true score in classical test theory); denoted μp.
variance component The variance of an effect in a G study.
Generalizability Studies
In order to evaluate the dependability of behavioral measurements, a generalizability (G) study is designed to isolate particular sources of measurement error. The facets
that the decision maker might want to generalize over
(e.g., items or occasions) must be included.
Universe of Generalization
The universe of generalization is defined as the set of
conditions to which a decision maker wants to generalize.
A person's universe score (denoted μp) is defined as the expected value of his or her observed scores over all observations in the universe of generalization (analogous to a person's true score in classical test theory).
An estimate of each variance component can be obtained from a traditional analysis of variance (or other
methods such as maximum likelihood). The relative magnitudes of the estimated variance components provide
information about potential sources of error influencing
a behavioral measurement. Statistical tests are not used in
G theory; instead, standard errors for variance component
estimates provide information about sampling variability
of estimated variance components.
In the p × i × o design, for example, the observed score X_pio for person p on item i at occasion o decomposes into components for the grand mean; the person, item, and occasion effects; the two-way interactions; and the residual:

$$ X_{pio} = \mu + (\mu_p - \mu) + (\mu_i - \mu) + (\mu_o - \mu) + (\mu_{pi} - \mu_p - \mu_i + \mu) + (\mu_{po} - \mu_p - \mu_o + \mu) + (\mu_{io} - \mu_i - \mu_o + \mu) + (X_{pio} - \mu_{pi} - \mu_{po} - \mu_{io} + \mu_p + \mu_i + \mu_o - \mu). $$
Decision Studies
G theory distinguishes a decision (D) study from a G study. The G study is associated with the development of a measurement procedure, and the D study uses information from a G study to design a measurement that minimizes error for a particular purpose. In planning a D study, the decision maker defines the universe that he or she wishes to generalize to, called the universe of generalization, which may contain some or all of the facets and conditions in the universe of admissible observations. In the D study, decisions usually are based on the mean over multiple observations rather than on a single observation. The mean score over a sample of n'_i items and n'_o occasions, for example, is denoted as X_pIO, in contrast to a score on a single item and occasion, X_pio. A two-facet, crossed D-study design in which decisions are to be made on the basis of X_pIO is, then, denoted as p × I × O.
The relative error variance for this design is

$$ \sigma^2_\delta = \frac{\sigma^2_{pi}}{n'_i} + \frac{\sigma^2_{po}}{n'_o} + \frac{\sigma^2_{pio,e}}{n'_i n'_o}. $$
Coefficients
Although G theory stresses the importance of variance
components and measurement error, it provides summary coefficients that are analogous to the reliability coefficient in classical test theory (i.e., true-score variance
divided by observed-score variance; an intraclass correlation). The theory distinguishes between a generalizability coefficient for relative decisions and an index of
dependability for absolute decisions.
Generalizability Coefficient

For relative decisions with a p × I × O random-effects design, the generalizability coefficient is

$$ E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}. \qquad (7) $$
Dependability Index
For absolute decisions with a p × I × O random-effects design, the index of dependability is

$$ \Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\Delta}. $$

For decisions involving a fixed cut score λ, the index of dependability is

$$ \Phi_\lambda = \frac{E_p(\mu_p - \lambda)^2}{E_O E_I E_p (X_{pIO} - \lambda)^2} = \frac{\sigma^2_p + (\mu - \lambda)^2}{\sigma^2_p + (\mu - \lambda)^2 + \sigma^2_\Delta}. \qquad (10) $$
An unbiased estimator of $(\mu - \lambda)^2$ is $(\bar{X} - \lambda)^2 - \hat{\sigma}^2(\bar{X})$, where $\bar{X}$ is the observed grand mean over sampled objects of measurement and sampled conditions of measurement in a D-study design.
Generalizability- and Decision-Study Designs

G theory allows the decision maker to use different designs in the G and D studies. Although G studies should use crossed designs whenever possible to avoid confounding of effects, D studies may use nested designs for convenience or for increasing sample size, which typically reduces estimated error variance and, hence, increases estimated generalizability. For example, compare σ²_δ in a crossed p × I × O design and a partially nested p × (I : O) design, where facet i is nested in facet o, and n' denotes the number of conditions of a facet under a decision maker's control:
$$ \sigma^2_\delta \text{ in a } p \times I \times O \text{ design} = \frac{\sigma^2_{pi}}{n'_i} + \frac{\sigma^2_{po}}{n'_o} + \frac{\sigma^2_{pio,e}}{n'_i n'_o} \qquad (11) $$

$$ \sigma^2_\delta \text{ in a } p \times (I:O) \text{ design} = \frac{\sigma^2_{po}}{n'_o} + \frac{\sigma^2_{pi,pio,e}}{n'_i n'_o} \qquad (12) $$
In Eqs. (11) and (12), σ²_pi, σ²_po, and σ²_pio,e are directly available from a G study with design p × i × o, and σ²_pi,pio,e is the sum of σ²_pi and σ²_pio,e. Moreover, given cost, logistics, and other considerations, n' can be manipulated to minimize error variance, trading off, in this example, items and occasions. Due to the difference in the designs, σ²_δ is smaller in Eq. (12) than in Eq. (11).
102
Generalizability Theory
For the crossed p × I × O design, the absolute error variance is

$$ \sigma^2_\Delta = \frac{\sigma^2_i}{n'_i} + \frac{\sigma^2_o}{n'_o} + \frac{\sigma^2_{pi}}{n'_i} + \frac{\sigma^2_{po}}{n'_o} + \frac{\sigma^2_{io}}{n'_i n'_o} + \frac{\sigma^2_{pio,e}}{n'_i n'_o}. \qquad (15) $$
Numerical Example
As an example, consider the following 1998 G study, by Webb, Nemer, Chizhik, and Sugrue, of science achievement test scores. In this study, 33 eighth-grade students completed a six-item test on knowledge of concepts in electricity on two occasions, 3 weeks apart. The test required students to assemble electric circuits so that the bulb in one circuit was brighter than the bulb in another circuit and to answer questions about the circuits. Students' scores on each item ranged from 0 to 1, based on the accuracy of their judgment and the quality of their explanation about which circuit, for example, had higher voltage. The design was considered fully random. Table I gives the estimated variance components from the G study. The estimate σ²_p (0.03862) is fairly large compared to the other components (27% of the total variation). This shows that, averaging over items and occasions, students in the sample differed in their science knowledge. Because people constitute the object of measurement, not error, this variability represents systematic individual differences in achievement. The other large estimated variance components concern the item facet more than the occasion facet.
Table I  Generalizability Study and Alternative Decision Studies for the Measurement of Science Achievement

                          G study            Alternative D studies (n'_i, n'_o)
Source of variation       (n'_i=1, n'_o=1)   (6,1)     (6,2)     (8,3)     (12,1)    (12,2)
Person (p)                0.03862            0.03862   0.03862   0.03862   0.03862   0.03862
Item (i)                  0.00689            0.00115   0.00115   0.00086   0.00057   0.00057
Occasion (o)              0.00136            0.00136   0.00068   0.00045   0.00136   0.00068
pi                        0.03257            0.00543   0.00543   0.00407   0.00271   0.00271
po                        0.00924            0.00924   0.00462   0.00308   0.00924   0.00462
io                        0.00000            0.00000   0.00000   0.00000   0.00000   0.00000
pio,e                     0.05657            0.00943   0.00471   0.00236   0.00471   0.00236
Relative error, σ²_δ      0.09838            0.02410   0.01476   0.00951   0.01667   0.00969
Absolute error, σ²_Δ      0.10663            0.02661   0.01659   0.01082   0.01860   0.01095
Eρ²                       0.28               0.62      0.72      0.80      0.70      0.80
Φ                         0.27               0.59      0.70      0.78      0.67      0.78

(G-study entries are estimated variance components for single observations; D-study entries divide them by n'_i, n'_o, or n'_i n'_o, as in Eqs. (11) and (15).)
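The D-study columns of Table I follow mechanically from the G-study variance components. A minimal sketch (mine, not the authors') reproduces them using Eqs. (11) and (15) and the definitions of Eρ² and Φ:

```python
# Reproduce the D-study rows of Table I from the G-study components.
comp = {"p": 0.03862, "i": 0.00689, "o": 0.00136,
        "pi": 0.03257, "po": 0.00924, "io": 0.0, "pio,e": 0.05657}

def d_study(ni, no):
    rel = comp["pi"]/ni + comp["po"]/no + comp["pio,e"]/(ni*no)    # Eq. (11)
    abs_ = rel + comp["i"]/ni + comp["o"]/no + comp["io"]/(ni*no)  # Eq. (15)
    g_coef = comp["p"] / (comp["p"] + rel)                         # E(rho^2)
    phi = comp["p"] / (comp["p"] + abs_)                           # Phi
    return rel, abs_, g_coef, phi

for ni, no in [(1, 1), (6, 1), (6, 2), (8, 3), (12, 1), (12, 2)]:
    rel, abs_, g, phi = d_study(ni, no)
    print(f"n'_i={ni:2d} n'_o={no}  delta={rel:.5f}  Delta={abs_:.5f}"
          f"  Erho2={g:.2f}  Phi={phi:.2f}")
```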
Multivariate Generalizability
For behavioral measurements involving multiple scores describing individuals' aptitudes or skills, multivariate generalizability can be used to (1) estimate the reliability of difference scores, observable correlations, or universe-score and error correlations for various D study designs and sample sizes; (2) estimate the reliability of a profile of scores using multiple regression of universe scores
$$ \hat{\sigma}^2(\hat{\sigma}^2) = \frac{2}{c^2} \sum_q \frac{(\mathrm{EMS}_q)^2}{\mathrm{df}_q} \qquad (17) $$
variances at least under large-sample conditions. Minimum norm quadratic unbiased estimation (MINQUE) and minimum variance quadratic unbiased estimation (MIVQUE), unlike ML and REML, do not assume normality and do not involve iterative estimation, thus reducing computational complexity. However, MINQUE and MIVQUE can produce different estimators from the same data set, and estimates may be negative and are usually biased. In 2001, Brennan described two resampling techniques, the bootstrap and the jackknife, that can be used to estimate variance components and standard errors. Drawing on Wiley's 2001 dissertation, the bootstrap now appears to be potentially applicable to estimating variance components and their standard errors and confidence intervals when the assumption of normality is suspect.

Another concern with variance component estimation arises when a negative estimate occurs because of sampling errors or because of model misspecification. Possible solutions when negative estimates are small in relative magnitude are to (1) substitute zero for the negative estimate and carry through the zero in other expected mean square equations from the analysis of variance, which produces biased estimates; (2) set negative estimates to zero but use the negative estimates in expected mean square equations for other components; (3) use a Bayesian approach that sets a lower bound of zero on the estimated variance component; and (4) use ML or REML methods, which preclude negative estimates.
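As a small illustration of ANOVA-based estimation and of solution (1) above, the sketch below (the simulated single-facet p × i design and its parameters are my assumptions) estimates variance components from mean squares and substitutes zero for any negative estimate:

```python
# ANOVA-style variance component estimation for a persons x items design.
import numpy as np

rng = np.random.default_rng(0)
n_p, n_i = 50, 6
# simulate scores: person effects, item effects, residual (all assumed)
X = (rng.normal(0, 0.6, (n_p, 1)) + rng.normal(0, 0.3, (1, n_i))
     + rng.normal(0, 0.8, (n_p, n_i)))

grand = X.mean()
ms_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum() / (n_i - 1)
resid = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + grand
ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

var_res = ms_res                              # sigma^2 for pi,e
var_p = max(0.0, (ms_p - ms_res) / n_i)       # clamp negatives to zero
var_i = max(0.0, (ms_i - ms_res) / n_p)
print(f"sigma2_p={var_p:.3f}  sigma2_i={var_i:.3f}  sigma2_res={var_res:.3f}")
```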
Further Reading
Brennan, R. L. (2001). Generalizability Theory. Springer-Verlag, New York.
Cronbach, L. J., Gleser, G. C., Nanda, H., and Rajaratnam, N. (1972). The Dependability of Behavioral Measurements. John Wiley, New York.
Feldt, L. S., and Brennan, R. L. (1989). Reliability. In Educational Measurement (R. L. Linn, ed.), 3rd Ed., pp. 105-146. American Council on Education/Macmillan, Washington, D.C.
Marcoulides, G. A. (1994). Selecting weighting schemes in multivariate generalizability studies. Educ. Psychol. Meas. 54, 3-7.
Searle, S. R. (1987). Linear Models for Unbalanced Data. John Wiley, New York.
Shavelson, R. J., and Webb, N. M. (1981). Generalizability theory: 1973-1980. Br. J. Math. Statist. Psychol. 34, 133-166.
Shavelson, R. J., and Webb, N. M. (1991). Generalizability Theory: A Primer. Sage, Newbury Park, CA.
Webb, N. M., Nemer, K., Chizhik, A., and Sugrue, B. (1998). Equity issues in collaborative group assessment: Group composition and performance. Am. Educ. Res. J. 35, 607-651.
Webb, N. M., Shavelson, R. J., and Maddahian, E. (1983). Multivariate generalizability theory. In Generalizability
Geographic Information Systems
Michael F. Goodchild
University of California, Santa Barbara, California, USA
Glossary
Introduction
Figure 1 A GIS-generated map showing the locations of polluting industrial plants (black dots, from the Environmental Protection Agency's Toxic Release Inventory) and average income by census tract (white denotes lowest average income) for Los Angeles County. The map shows a clear association between pollution and low income. Reprinted from Burke (1993), with permission.
Representation
At the heart of a GIS is a system of representation, by
which features in the real world are coded in the binary
alphabet of the digital computer. GIS representations
Georeferencing
Measuring Location
A system for accurately identifying location on the surface
of the Earth is an essential component of any GIS representation. The Meridian Conference of 1884 established
latitude and longitude as the universal standard for georeferencing, based on measurements from the Greenwich
Visualization
GIS is an inherently visual technology, inviting its users to
take advantage of the power and effectiveness of data
when rendered visually. Maps are the traditional way of
visualizing geographic information and GIS owes much to
the legacy of cartography, the science and art of mapmaking, and to successful efforts by cartographers to systematize the discipline. Summary or aggregate data associated
with polygons are often displayed in the form of
choropleth maps, using shading and other forms of polygon fill to distinguish values of the variable of interest.
Point data are typically displayed as symbols, again with
color or symbol size used to distinguish attribute values.
Commercial GIS software supports a vast array of possible
mapping techniques, including contour or isopleth maps
of fields, and cosmetic features such as legends, north
arrows, annotation, and scale bars.
It is important, however, to recognize the fundamental
differences between GIS displays and paper maps and the
advantages that the digital technology provides over traditional methods. First, GIS has changed mapmaking
from an expensive and slow process carried out by a few
highly trained cartographers to a fast and cheap process
available to all. Anyone armed with a computer, data, and
simple software can produce compelling maps (and also
misleading maps).
Second, GIS displays are inherently dynamic and interactive, whereas paper maps are essentially immutable
once created. GIS displays can portray changes through
time or allow users to zoom and pan to expose new areas or
greater detail. More than one display can be created simultaneously on a single screen. Maps can also be displayed beside other forms of presentation, such as tables,
and tables and maps can be linked in interesting ways
(e.g., clicking on a polygon in a map display can highlight
the corresponding row in a table). The term exploratory
spatial data analysis has been coined to describe the
interactive exploration of GIS data through maps and
other forms of presentation.
Spatial Analysis
Although the display of geographic information in the
form of maps can be powerful, the true power of GIS
lies in its ability to analyze, either inductively in searching
for patterns and anomalies or deductively in attempts to
confirm or deny hypotheses based on theory. The techniques of analysis available in GIS are collectively described
as spatial analysis, reflecting the importance of location.
More precisely, spatial analysis can be defined as a set of
techniques whose results depend on the locations of the
objects of analysis. This test of locational dependence
Query
Interactive displays allow users to determine answers to simple queries, such as "What are the attributes of this object?" or "Where are the objects with this attribute value?" Some queries are best answered by interacting with a map view, by pointing to objects of interest. Other queries are better answered by interacting with a table view, by searching the table for objects whose attributes satisfy particular requirements. A histogram view is useful for finding objects whose attribute values lie within ranges of interest, and a scatterplot view allows objects to be selected based on comparisons of pairs of attributes. Finally, a catalog view allows the user to explore the contents of the many data sets that might constitute a complete GIS project.
Measurement
Earlier discussion of the origins of GIS emphasized the
importance of area measurement in the development of
CGIS. Many other simple measurements are supported
by GIS, including distance, length, terrain slope and aspect, and polygon shape. Measurements are typically returned as additional attributes of objects and can then be
summarized or used as input to more complex forms
of analysis.
Transformation
Many techniques of spatial analysis exist for the purpose
of transforming objects, creating new objects with new
attributes or relationships. The buffer operation creates
new polygons containing areas lying within a specified
distance of existing objects and is used in the analysis
of spatial proximity. The point in polygon operation
Summary Statistics
Search for pattern is often conducted by computing statistics that summarize various interesting properties of
GIS data sets. The center of a point data set is a useful
two-dimensional equivalent to the mean and dispersion is
a useful equivalent to the standard deviation. Measures of
spatial dependence are used to determine the degree
of order in the spatial arrangement of high and low values
of an attribute. For example, rates of unemployment by
census tract might be highly clustered, with adjacent tracts
tending to have similarly high or similarly low values, or
they might be arranged essentially independently, or adjacent tracts might be found to have values that are more
different than expected in a random arrangement.
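One standard statistic of the kind described, though not named above, is Moran's I; the sketch below computes it alongside the mean center, with coordinates, attribute values, and the contiguity rule as illustrative assumptions:

```python
import numpy as np

# Mean center of a point data set, and Moran's I as one standard measure
# of spatial dependence (illustrative data, not from the text).
xy = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2]], float)
z = np.array([4.0, 5.0, 3.5, 5.5, 9.0])          # e.g., unemployment rates

center = xy.mean(axis=0)                          # 2-D analog of the mean

# binary contiguity: points within distance 1.5 are neighbors
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
W = ((d > 0) & (d <= 1.5)).astype(float)

n = len(z)
dev = z - z.mean()
moran_I = (n / W.sum()) * (dev @ W @ dev) / (dev @ dev)
print("mean center:", center, " Moran's I:", round(moran_I, 3))
```

Positive values of I indicate that neighboring points tend to carry similarly high or similarly low values, the clustered case described above.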
Optimization
A large number of methods have been devised to search
for solutions that optimize specific objectives. These include methods for finding point locations for services such
as libraries or retail stores, for finding optimum routes
through street networks that minimize time or cost, for
locating power lines or highways across terrain, or for
planning optimal arrangements of land use. These
methods are often embedded in spatial-decision support
systems and underpinned by GIS software.
Hypothesis Testing
The sixth class consists of methods that apply the concepts
of statistical inference, in reasoning from a sample to the
characteristics of some larger population. Inference is
well established in science and it is common to subject
numerical results to significance tests, in order to determine whether differences or effects could have arisen
by chance because of limited sample size or are truly
indicative of effects in the population as a whole.
It is tempting to adopt statistical inference in dealing
with geographic information, but several problems stand
in the way. First, geographic data sets are often formed
from all of the information available in an area of interest
and it is therefore difficult to believe that the data are
Issues
As will be obvious from the previous section, the use of
GIS raises numerous issues concerning the nature of geographic information and inference from cross-sectional
data. It is generally accepted that cross-sectional data
cannot be used to confirm hypotheses about process,
but they can certainly be used to reject certain false hypotheses and to explore data in the interests of hypothesis
generation. Although GIS has evolved from the static view
inherent in paper maps, there is much interest in adding
dynamics and in developing methods of spatiotemporal
analysis.
Uncertainty is a pervasive issue in GIS. It is impossible to measure location on the Earth's surface exactly, and other forms of uncertainty are common also. For example, summary statistics for reporting zones are means or totals and clearly cannot be assumed to apply uniformly within zones, despite efforts to ensure that census tracts are approximately homogeneous in socioeconomic characteristics. Results of analysis of aggregated data are dependent on the boundaries used to aggregate (the modifiable areal unit problem) and inferences from aggregated data regarding individuals are subject to the ecological fallacy.
Nevertheless, the outcomes of the widespread adoption of GIS in the social sciences since the 1980s are
impressive. It is clear that GIS has brought new power
to the analysis of cross-sectional data and the integration
Further Reading
Allen, K. M. S., Green, S. W., and Zubrow, E. B. W. (eds.) (1990). Interpreting Space: GIS and Archaeology. Taylor and Francis, New York.
Bailey, T. C., and Gatrell, A. C. (1995). Interactive Spatial Data Analysis. Longman, Harlow, UK.
Burke, L. M. (1993). Environmental Equity in Los Angeles. Unpublished M.A. thesis, University of California, Santa Barbara.
Goodchild, M. F. (2000). Communicating geographic information in a digital age. Ann. Assoc. Am. Geogr. 90, 344-355.
King, G. (1997). A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton University Press, Princeton, NJ.
Longley, P. A., Goodchild, M. F., Maguire, D. J., and Rhind, D. W. (2001). Geographic Information Systems and Science. Wiley, New York.
Monmonier, M. S. (1991). How to Lie with Maps. University of Chicago Press, Chicago, IL.
Openshaw, S. (1984). The Modifiable Areal Unit Problem. GeoBooks, Norwich, UK.
Robinson, A. H., Morrison, J. L., Muehrcke, P. C., Kimerling, A. J., and Guptill, S. C. (1995). Elements of Cartography, 6th Ed. Wiley, New York.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
Tobler, W. R. (1970). A computer movie: Simulation of population growth in the Detroit region. Econ. Geogr. 46, 234-240.
Tufte, E. R. (1990). Envisioning Information. Graphics Press, Cheshire, CT.
Geography
James O. Wheeler
University of Georgia, Athens, Georgia, USA
Glossary
gravity model A model that compares the volume of flow, or
spatial interaction, between two or more places based on
the mass (population) of these places and the distance
between the places.
index of dissimilarity A measure that compares the proportions of different occupational types within the study area.
index of segregation A measure that provides a single
statistic summarizing the spatial distribution between
a subcategory and the category as a whole.
information gain statistic A statistic indicating the overall spatial relationship. It is also used as a goodness-of-fit statistic comparing two spatial distributions, Pi and Qi; the closer the fit between the two distributions, the lower the value of I.
location quotient A ratio of ratios used to measure and map
relative distributions or relative concentrations of a subarea
to the area as a whole.
principal components analysis A data transformation technique that measures the degree to which n variables in a data set are intercorrelated. Starting from an m × n data matrix, where m refers to places or areas, the technique uses unities in the diagonal of the correlation matrix and transforms the n variables into a smaller number of independent components, with each component accounting for a decreasing proportion of the total variance.
social area analysis The application of factor analysis to
socioeconomic and demographic data for a number of
Those who were attracted to Seattle by Ullman, including Brian Berry, with his undergraduate degree from University College, University of London, instead chose William Garrison as their mentor. Garrison had just returned from a year's leave of absence in the newly formed Regional Science program at the University of Pennsylvania, headed by Walter Isard, where Garrison had immersed himself in statistics, mathematical modeling, and theory development and hypothesis-testing procedures. Many graduate students were attracted to Garrison and his new geography, emphasizing the use of statistical
Location Quotient
The location quotient, a ratio of ratios, is a widely used
geographic index. It is used to measure and map relative
distributions or relative concentrations of a subarea to the
area as a whole. An example of its use might be to measure
the residential distribution of scientists within a metropolitan area compared to total workers. The formula is:
$$ LQ = \frac{X_i / \sum X_i}{N_i / \sum N_i} \times 100 $$

where LQ is the location quotient, X_i is the value of a variable (scientists) in area i, ΣX_i is the value of the variable (scientists) in all the subareas combined (metropolitan area), N_i is the total of workers in each subarea of the metropolitan area, and ΣN_i is the total number of workers in the metropolitan area (Tables I and II). Thus, the location quotient varies in our example with the proportion of scientists in a given subarea to the total number of
Table I  Scientists (X_i) and Total Workers (N_i) by Subarea (hypothetical data)

Subarea   Scientists, Xi   Total workers, Ni
1               5                100
2              10                 50
3              20                 50
4              20                200
5              50                600
Total         100               1000

Table II  Location Quotients by Subarea

Subarea   Xi/ΣXi   Ni/ΣNi    LQ
1          0.05     0.10      50
2          0.10     0.05     200
3          0.20     0.05     400
4          0.20     0.20     100
5          0.50     0.60      83
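The location quotients in Table II follow directly from the formula; a minimal sketch using the Table I counts:

```python
# Location quotients from the Table I data.
scientists = {1: 5, 2: 10, 3: 20, 4: 20, 5: 50}     # X_i
workers = {1: 100, 2: 50, 3: 50, 4: 200, 5: 600}    # N_i

sum_x, sum_n = sum(scientists.values()), sum(workers.values())
for area in scientists:
    lq = (scientists[area] / sum_x) / (workers[area] / sum_n) * 100
    print(f"subarea {area}: LQ = {lq:.0f}")
# subarea 1: 50, 2: 200, 3: 400, 4: 100, 5: 83
```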
Table III  Location Quotient for Help-Supply Service Workers in the U.S. South by Metropolitan-Area Population Size Categories, 1999

Metropolitan hierarchy (population size categories; category labels not recoverable): location quotients 1.14, 1.22, 1.04, 0.73, 0.58.
Table IV  Indexes of Dissimilarity between Chinese Americans and Other Racial/Ethnic Groups, Four Atlanta Counties

                 Cobb   DeKalb   Fulton   Gwinnett   Average
Whites            53      38       51        44         47
Blacks            62      74       89        44         67
Hispanics         51      46       52        35         46
Asian Indians     62      46       62        41         53
Index of Dissimilarity
The index of dissimilarity compares the proportional distribution of two subcategories, in contrast to the location
quotient, which compares the relative distribution of
a subcategory within a larger category. If, for example,
we wish to compare the number of help-supply service
workers with the number of manufacturing workers, the
index of dissimilarity is the appropriate measure.
$$ ID = \frac{\sum_{i=1}^{N} \left| X_i/\sum X_i - Y_i/\sum Y_i \right|}{2} \times 100 $$

where, for each area i, the absolute difference between the two proportions, |X_i/ΣX_i - Y_i/ΣY_i|, is taken. These absolute differences are summed for all areas i and divided by two to avoid double counting. The index is multiplied by 100 in order to express the index in whole numbers.
Its value ranges from 0 to 100, where 0 reflects identical
proportional distributions between the two subcategories
and 100 indicates a totally opposite spatial distribution.
The 1998 study by Zhang provides a useful example of
the application of the index of dissimilarity. His study was
of the residential locations of racial and ethnic groups in
four counties of the Atlanta metropolitan area (Table IV).
His findings show that Chinese Americans in Atlanta had
residential patterns that were most similar to Hispanics
and Whites, least similar to Blacks, and in an intermediate
position with respect to Asian Indians. The index of dissimilarity here is interpreted as the percentage of Chinese
Americans that would have to change census tracts within
a county in order for their residential locations to be identical with another racial or ethnic group. Thus, 89% of
Chinese Americans would have to shift census tracts in
Fulton County to achieve an even spatial distribution with
Blacks, representing the highest level of dissimilarity in
Table IV.
Index of Segregation
The index of segregation is related to the index of dissimilarity. The difference is that the index of segregation
provides a single statistic that summarizes the proportional spatial distribution between a subgroup and the
group as a whole:
$$ IS = \frac{\sum_{i=1}^{N} \left| X_i/\sum X_i - N_i/\sum N_i \right|}{2} \times 100 $$
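Computationally, the two indexes differ only in the comparison series. A minimal sketch (the area counts are illustrative assumptions, not data from this article):

```python
# Index of dissimilarity (two subcategories) and index of segregation
# (subgroup versus the whole group) share one computation.
def index_of_dissimilarity(x, y):
    sx, sy = sum(x), sum(y)
    return sum(abs(xi / sx - yi / sy) for xi, yi in zip(x, y)) / 2 * 100

help_supply = [30, 10, 25, 35]       # X_i by area (assumed)
manufacturing = [10, 40, 30, 20]     # Y_i by area (assumed)
all_workers = [100, 120, 90, 190]    # N_i by area (assumed)

print("ID:", round(index_of_dissimilarity(help_supply, manufacturing), 1))
# segregation index: compare the subgroup with the group as a whole
print("IS:", round(index_of_dissimilarity(help_supply, all_workers), 1))
```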
The information gain statistic has been used by geographers in a variety of ways. It is a goodness-of-fit model that
has the advantage over correlation-regression analysis in
that it does not assume that the data are normally distributed. The information gain statistic allows the comparison
of two variables at a time, such as actual population scores
versus predicted population numbers for, say, counties in
North Georgia. Conceptually, information is gained from
a set of N messages that transform a set of N a priori
probabilities (Qi) into a set of N a posteriori probabilities
(Pi). Thus, there is an initial message (Qi) and a subsequent
message (Pi). At issue is whether the subsequent message
improves the information beyond the initial message.
The information gain statistic (Ii) is calculated as
follows:
$$ I_i = P_i \ln\left(\frac{P_i}{Q_i}\right) \qquad (1) $$
For a set of N areas, the individual statistics are summed:

$$ I = \sum_{i=1}^{N} P_i \ln\left(\frac{P_i}{Q_i}\right) \qquad (2) $$

For example, an area with P_i = 0.36 and Q_i = 0.20 contributes I_i = 0.36 ln(0.36/0.20) = 0.2116.

Table VI  Information Gain Comparing Corporate Assets (Q_i) with Population (P_i) for Five Metropolitan Areas

Metropolitan area   Corporate assets    Q_i      Population size    P_i       I_i
Houston                 103.4          0.3427        2905          0.2306   -0.0914
Dallas                   84.7          0.2805        2975          0.2362   -0.0406
Washington               69.4          0.2299        3061          0.2430    0.0135
Atlanta                  26.4          0.0875        2030          0.1612    0.0985
Miami                    17.9          0.0594        1626          0.1290    0.1000
Total                   301.8          1.0         12,597          1.0

I_i = P_i ln(P_i/Q_i); I = Σ I_i = 0.08.
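The I_i column and the total I = 0.08 of Table VI can be reproduced directly; a minimal sketch using the table's figures:

```python
# Information gain comparing corporate-asset shares (Q_i) with
# population shares (P_i) for the Table VI metropolitan areas.
import math

assets = {"Houston": 103.4, "Dallas": 84.7, "Washington": 69.4,
          "Atlanta": 26.4, "Miami": 17.9}
population = {"Houston": 2905, "Dallas": 2975, "Washington": 3061,
              "Atlanta": 2030, "Miami": 1626}

sum_a, sum_p = sum(assets.values()), sum(population.values())
total = 0.0
for city in assets:
    q = assets[city] / sum_a              # a priori share, Q_i
    p = population[city] / sum_p          # a posteriori share, P_i
    i_gain = p * math.log(p / q)          # I_i = P_i ln(P_i / Q_i)
    total += i_gain
    print(f"{city:11s} Q={q:.4f} P={p:.4f} I_i={i_gain:+.4f}")
print("I =", round(total, 2))             # about 0.08
```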
Geographic Application of Factor and Principal Components Analysis

In his classic book Applied Factor Analysis, R. J. Rummel considered factor analysis to be "A Calculus of Social Science." Most applications of factor analysis by human geographers have followed the typical social science methodological applications and procedures. The unique contribution by geographers, however, has been that the observation inputs to factor analysis have involved areal units or locational points. As do other social scientists, geographers have typically used factor analysis as a data reduction technique whereby N attributes measured over M areal units (observations) result in a number of factors or components considerably smaller than N. Geographers are not only interested in the composition of the factors but also in the geographic pattern of the factor scores. In
[Table/figure: standardized parameters from a regression involving a family status factor (coefficients include 0.79 and 0.24; N = 129; R² = 0.66). Source: Wheeler (1990), with permission of the American Geographical Society; Dun and Bradstreet data.]

[Table: rotated factor loadings (Factors I-III), with factor labels including family status and ethnic status, for variables including income, occupation, education, housing value, marital status, fertility, family size, age of adults, Black population, Hispanic population, Asian population, and linguistic groups; loadings range from 0.56 to 0.90. Factor I accounts for 30.1% of the variance, and the three factors cumulatively account for 67.1%.]
The data were subjected to principal components analysis and then rotated. Factor I accounts for just more than 30% of the variance in the matrix, and the three factors combined show a cumulative sum of more than 67%. The highest loadings for the three factors show that they are largely concentrated among contiguous traffic zones.
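A minimal sketch of this data-reduction workflow (the attribute matrix is simulated, so all numbers are illustrative): standardized attributes over areal units are reduced through the eigenstructure of their correlation matrix, and component scores are kept for mapping by areal unit:

```python
# Principal components as data reduction: N attributes over M areal units
# reduced to a few components, with scores retained per areal unit.
import numpy as np

rng = np.random.default_rng(42)
M, N = 100, 8                          # areal units x attributes (assumed)
data = rng.normal(size=(M, N)) @ rng.normal(size=(N, N))  # correlated attrs

Z = (data - data.mean(0)) / data.std(0)        # standardize each attribute
R = np.corrcoef(Z, rowvar=False)               # N x N correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)           # returned in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
scores = Z @ eigvecs[:, :3]                    # component scores per area
print("share of variance, first three components:", explained[:3].round(3))
# 'scores' could then be mapped by areal unit to study spatial patterns
```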
Summary Comments
Because simple and multiple linear correlation and regression are commonly used throughout the social sciences, including geography, no attempt has been made here to introduce these techniques. Suffice it to say that geographers typically use places or areas as the observational units. Geographers, as do other social scientists,
Table X Other Common Statistical Techniques
Used by Geographers
Geographic sampling
Measures of central tendency
Measures of spatial dispersion
Chi-square
Point pattern and neighbor analysis
Rank correlation
Spatial autocorrelation
Trend surface models
Graph theoretic techniques
Multidimensional scaling
Analysis of variance
Analysis of covariance
Correlation-regression analysis
Cluster analysis
Discriminant analysis
Canonical correlation
Further Reading
Berry, B. J. L. (1993). Geography's quantitative revolution: Initial conditions, 1954-1960. A personal memoir. Urban Geogr. 14, 434-441.
Boelhouwer, P. J. (2002). Segregation and problem accumulation in the Netherlands: The case of The Hague. Urban Geogr. 23, 560-580.
Clark, W. A. V., and Hoskins, P. (1986). Statistical Methods for Geographers. John Wiley & Sons, New York.
Eyre, J. D. (1978). A man for all regions: The contributions of Edward L. Ullman to geography. Stud. Geogr. (University of North Carolina, Chapel Hill) No. 11, pp. 1-15.
Fotheringham, A. S., Brunsdon, C., and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. Sage, Thousand Oaks, CA.
Gong, H. (2002). Location and expansion of help supply services in the U.S. South. Southeastern Geogr. 42, 49-64.
Griffith, D. A., and Amrhein, C. G. (1991). Statistical Analysis for Geographers. Prentice Hall, Englewood Cliffs, NJ.
Gruenberg, B. C. (1925). Biology and Human Life. Ginn and Company, New York.
LaValle, P., McConnell, H., and Brown, R. G. (1967). Certain aspects of the expansion of quantitative methodology in American geography. Ann. Assoc. Am. Geogr. 57, 423-436.
McGrew, J. C., and Monroe, C. B. (2000). An Introduction to Statistical Problem Solving in Geography, 2nd Ed. McGraw-Hill, New York.
Rogerson, P. A. (2001). Statistical Methods for Geography. Sage, Thousand Oaks, CA.
Rummel, R. J. (1970). Applied Factor Analysis. Northwestern University Press, Evanston, IL.
Wheeler, J. O. (1990). Corporate role of New York City in metropolitan hierarchy. Geogr. Rev. 80, 370-381.
Wheeler, J. O. (1999). Local information links to the national metropolitan hierarchy: The southeastern United States. Environ. Plann. A 31, 841-854.
Wrigley, N. (2002). Categorical Data Analysis for Geographers and Environmental Scientists. Blackburn Press, Caldwell, NJ.
Zhang, Q. (1998). Residential segregation of Asian Americans in the Atlanta metropolitan area, 1990. Southeastern Geogr. 38, 125-141.
Geolibraries
Peter Keenan
University College, Dublin, Ireland
Glossary
Introduction to Geolibraries
The digital indexing of library content has greatly facilitated social research, making the searching and cross-referencing of research material much easier than with paper
formats. The digital format allows the easy search of
a variety of data types in addition to traditional text content. Geographic (or spatial) data are becoming increasingly available in digital form, which allows searching the
data by geographic location. Digital search techniques,
applied in geolibraries, may be of considerable value
to researchers as an extension of traditional paper libraries or nonspatial digital libraries. Digital geolibraries provide a comprehensive collection of all forms of data
related to place, suitably indexed and cataloged to
allow easy access by researchers. Use of geolibraries is
optimized by understanding their potential advantages
and some of the problems that exist in building comprehensive geolibraries.
Geolibraries
A geolibrary is a digital library that, in addition to being
searchable by traditional methods, stores information in
a format searchable by geographic location; such data are
thus georeferenced. In principle, a geolibrary need not
use online techniques for indexing or storage of content,
and paper-based map collections with appropriate catalogs have existed in libraries for many years. However,
geographic location-based indexing is very difficult to implement using paper-based approaches, and comprehensive indexes of this form are not common in traditional
libraries. Computerized techniques have now made comprehensive spatial indexing feasible and this will lead to
spatial referencing becoming increasingly important in
the future. Just as database management techniques
made possible the development of digital library catalogs
for traditional content, the development of the Geographic Information Systems (GIS) facilitated digital
indexing of geographically referenced information.
These techniques are especially valuable for the indexing
of inherently spatial data that might be used for further
processing by a GIS, but geographical referencing also has
a major contribution to make for other forms of content
across a wide range of social science disciplines.
A traditional index might identify resources associated
with a particular author or date of publication; potentially
this could also include multimedia formats, such as a piece
of music by a particular composer. A georeferenced catalog will store material connected with a particular place.
For instance, a geolibrary might store pictures of an
urban street or sound recordings of traffic noise at that
location. Another example is the combination of geographic data with pictorial records to produce threedimensional representations of urban environments.
Though these multimedia formats have great potential,
the seamless integration of these diverse data poses
difficulties in all digital libraries and the coherent
georeferencing of multimedia content poses a distinct
challenge in a geolibrary.
The concept of a geolibrary originated in the 1990s,
and the Alexandria Digital Library at the University of
California in Santa Barbara is generally regarded as the
first major prototype geolibrary. The geolibrary model
was further defined within the GIS community by the
Workshop on Distributed Geolibraries: Spatial Information Resources, convened by the Mapping Science Committee of the U.S. National Research Council in June
1998. The idea of a geolibrary is therefore a relatively
new one, and the data-intensive nature of geographic
information has meant that geolibraries have inevitably
Geolibrary Applications
Geolibrary Software
For any digital data, accessibility is largely determined
by the availability of appropriate tools. A geolibrary will
comprise both georeferenced data and the tools to access
the data. Michael Goodchild, the most prominent
researcher in geolibraries, has identified a number of
Geolibrary Operations
A geolibrary should allow the user to overcome these
problems by providing tools and data to answer questions
related to geographic entities of interest, in addition to the
Geolibrary Example
A number of geolibrary prototype systems exist; although these presently fall some way short of the ultimate potential of a geolibrary, they are constantly evolving to include more information. Examples of current systems available over the World Wide Web include the Alexandria Digital Library web client (http://webclient.alexandria.ucsb.edu/) located at the University of California in Santa Barbara and the Geography Network (http://www.geographynetwork.com) hosted by ESRI Corporation. Because of the limitations of the software commonly used on the web, these provide relatively unsophisticated geolibrary browser access to georeferenced material and a gazetteer.
Using the Alexandria web client as an example, the user
is initially presented with a screen with an interactive map
in one window and the ability to enter text commands.
User interaction through the map browser or text dialogue
boxes can change the bounding box that determines the
footprint being examined. The interface allows different
shapes of bounding box to be selected and the user can
zoom in to the level of detail required (Fig. 2). The object
of a search is to identify information in the database with
a geographic footprint that overlaps in some way with the
footprint of the region of interest to the user. The user can
request information with a georeferenced footprint that
lies entirely inside the selected region, footprints that
overlap with it, or indeed footprints that are entirely outside the region. The library will return the results of such
a search as database entries with the required footprint.
These might include gazetteer entries for towns in the
area selected, details of relevant offline information such
as books, and links to online information such as digital
maps and aerial photographs of the region.
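The footprint queries just described reduce to bounding-box tests. A minimal sketch (the Box structure and catalog entries are my assumptions, not the Alexandria client's actual interface):

```python
# Classify catalog items as inside, overlapping, or outside a query box.
from dataclasses import dataclass

@dataclass
class Box:
    """Bounding box: west, south, east, north."""
    w: float
    s: float
    e: float
    n: float

    def contains(self, o):
        return (self.w <= o.w and self.s <= o.s and
                self.e >= o.e and self.n >= o.n)

    def overlaps(self, o):
        return not (o.w > self.e or o.e < self.w or
                    o.s > self.n or o.n < self.s)

catalog = {"town gazetteer entry": Box(-1, -1, 0, 0),
           "digital map": Box(-0.5, -0.5, 2, 2),
           "aerial photograph": Box(5, 5, 6, 6)}
query = Box(-2, -2, 1, 1)            # the user's region of interest

for name, fp in catalog.items():
    relation = ("inside" if query.contains(fp)
                else "overlapping" if query.overlaps(fp) else "outside")
    print(f"{name}: {relation}")
```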
In their present immature state, geolibrary services
inevitably lack much potentially useful information that
will be added in such resources in the future. The limitations of web technology mean that the interfaces are not
as easy to use as they might be, and dedicated geographic
browser software may be needed to address this
Issues in Geolibraries
Difficulties in Using Spatial Techniques
Social science researchers have become accustomed to
using digital tools such as database queries and statistics.
Though these tools are widely used, not every user makes
full use of their potential, in part because of a lack of
understanding of the techniques involved. Modern statistical analysis packages provide a range of powerful statistical techniques, but the limitations of these may not be
fully comprehended by a user whose primary domain is
not statistics. Spatial techniques are even more complex
and the development of appropriate interfaces to make
these approaches accessible to the nonspecialist user is an
important challenge for geolibrary designers. Geographic
interfaces often assume user familiarity with specialist
geographic terminology; this assumption is unlikely to
hold true for a broader user community. Training can
offer one solution to this lack of experience with spatial
techniques, but it is always likely to be the case that the
desire of social researchers to use geolibraries exceeds
their willingness to undergo long training courses.
There is a danger that inexperienced users may fail
to use a geolibrary to its full potential, owing to
ignorance of many of its capabilities. The geolibrary
tools need to make access as convenient as possible,
while preventing inexperienced users from using the
system inappropriately.
Geolibrary interfaces need to address a number of
issues: the formulation of the problem, the identification
Metadata
Identification of the relevant data for a specific problem
requires that the contents of the geolibrary be comprehensively spatially indexed. As the volume of spatially referenced data continues to grow, indexing of these data
resources plays a vital role in facilitating the wider use of
spatial techniques. The correct spatial association between
spatial entities is complicated by the use of different spatial
units for the aggregation. This is a problem faced by other
forms of digital data storage, and techniques such as data
dictionaries are used to facilitate the organization of digital
databases. The spatial equivalent of these approaches is
needed to facilitate the use of geolibraries. A variety of
spatial directories exist; for example, in the All Fields Postcode Directory in the United Kingdom, these relate spatial
units such as postal codes to other spatial representations.
However, to document spatial relations comprehensively,
Economics of Geolibraries
A geolibrary is a combination of various data sources, collected by different organizations, often at considerable cost. If spatially related data are to become freely available for inclusion in geolibraries, then the question of the funding of these libraries arises. One approach is for geolibraries to be regarded as a public benefit that should be funded by the government. Many spatial data are of public interest, and a large volume of spatial data originates with government agencies such as traditional cartographic agencies (e.g., the Ordnance Survey in the United Kingdom, or the Bundesamt für Kartographie und
Conclusion
Digital indexing and storage of data allow many extended
possibilities for social research. These partly arise from
the convenience of the digital format, but also because
new types of data and data relationships can be exploited.
One such opportunity exists for exploiting spatial relationships; this requires the georeferencing of data, which
has been greatly facilitated by digital processing. For
this form of data to be easily used, the traditional library
needs to be extended in the form of a geolibrary. This
would provide a comprehensive collection of all forms of
data related to any one place, indexed and cataloged to
Further Reading
Boxall, J. (2002). Geolibraries, the Global Spatial Data Infrastructure and Digital Earth. Int. J. Special Librar. 36(1), 1-21.
Goodchild, M. F. (1998). The geolibrary. In Innovations in GIS (V. S. Carver, ed.), pp. 59-68. Taylor and Francis, London.
Jankowska, M. A., and Jankowska, P. (2000). Is this a geolibrary? A case of the Idaho Geospatial Data Center. Inf. Technol. Librar. 19(1), 4-10.
Larson, R. R. (1996). Geographic information retrieval and spatial browsing. In GIS and Libraries: Patrons, Maps and Spatial Information (L. Smith and M. Gluck, eds.), pp. 81-124. University of Illinois, Urbana.
Mapping Science Committee. (1999). Distributed Geolibraries: Spatial Information Resources. National Academy Press, Washington, D.C. Available on the Internet at http://www.nap.edu
Glossary
ballistic relations Causal associations that are probabilistic in
the sense of being better depicted as a joint statistical
distribution than as a deterministic law.
concerted volition The sum of the distinctive responses
made by individuals subjected to common stimuli and
interstimulation that produced similar behavior, such as
cooperation or competition.
consciousness of kind Sympathy, especially through the
recognition of similarity or common membership in
a category, usually with social consequences, for example,
group identity.
index numbers Numbers derived typically by adding heterogeneous items that are assigned predetermined numerical
values.
profit-sharing A method by which workers are rewarded
a fraction of the net profit of an enterprise.
Academic Career
Giddings's academic career began when he was appointed to Bryn Mawr in 1888, taking over the position that Woodrow Wilson vacated for Princeton. He continued to write essays, notably an 1898 essay "Imperialism" (collected in Democracy and Empire, 1900) that responded to William Graham Sumner's attack on the emergence of American imperialism in the wake of the Spanish-American War, in which he argued that empires could be social aggregations founded on positive common ideals. At Bryn Mawr, he taught economics, history, and politics and introduced sociology. At the beginning of his career, he was active in the AEA, holding several positions in the association. He published various articles on economics, two of which were included in The Modern Distributive Process, which he produced with his mentor, the economist John Bates Clark, in 1888. His first published discussion of sociology appeared in 1890, but he had presented a paper on "The Sociological Character of Political Economy" to the AEA in 1887. In 1891, he published an article on "Sociology as a University Subject" and engaged in a series of exchanges on the general theme of the relation between sociology and economics and the other social sciences. He also began in 1892 to travel to New York to lecture at the rapidly developing faculty of social and political science at Columbia University. Statistics in this Columbia faculty was then taught by the economic and social statistician Richmond Mayo-Smith, who represented the tabular tradition in statistics and directed the first American social science dissertation on a recognizably sociological topic, Walter Willcox's 1891 study of divorce. Giddings was appointed Professor of Sociology at Columbia in 1894 (the first sociology professorship in an American college, established through the efforts of active social reformer Seth Low). In 1906, Giddings held both the chair in Sociology and the newly established chair in the History of Civilization as the Carpentier Professor of Sociology and the History of Civilization, a position he held until 1928. He was awarded a D.L.D. by Columbia in 1929.

Giddings published his key theoretical work, The Principles of Sociology, in 1896, and it was promptly and widely translated; his key early methodological work, Inductive Sociology, was published in 1901. His
Public Intellectual
Giddings was a public intellectual and served as a member of the school board; various commissions; the Committee of One Hundred, which was organized to choose a nonpartisan candidate for Mayor of New York City; and the board of the Charity Organization Society. He continued to produce for the press throughout his career, notably for Hamilton Holt's Independent, to which for many years he contributed an editorial a week, as well as writing regularly for Van Norden's and the New York Times Magazine. Never one to shy from controversy, he spoke regularly at the Rand School, a center for radical and socialist lectures, often antagonizing his audience. He was, in general, a defender of the middle class and a skeptic about social amelioration, considering many of the ills to which reform was addressed to be better understood as costs of progress. He was an early supporter of the American entry into World War I and was one of the principal speakers at Carnegie Hall at the Lusitania dinner. He later toured the country for the National Security League in support of the war and was an officer of the League to Enforce Peace. His outspoken position embroiled him in controversy at Columbia, which was the site of conflict over the war (and of a famous academic freedom issue involving James McKeen Cattell, which divided the faculty from the administration), and at the end of his career he was seen by the administration and funding agencies to be an obstacle to the development of sociology at Columbia.
Giddings's Influence

Although his early years at Columbia were occupied with the problems of training social workers and meeting the demand for lectures on social reform topics, he quickly established general sociology, particularly theory and method, as his primary concern, publishing his Principles of Sociology in 1896 and in it staking a claim to the scientific approach to the field. Columbia at the time required training in several fields for its social science graduates, so Giddings was presented with the peculiar
three stages are a modification of Pearson's stages of ideological, observational, and metrical, which Giddings quoted in his lectures and which themselves were a modification of Comte's three stages. In 1920, Giddings gave the following formulation:
Science cannot, as a distinguished scientific thinker said
the other day, even get on without guessing, and one of its
most useful functions is to displace bad and fruitless
guessing by the good guessing that ultimately leads to
the demonstration of new truth. Strictly speaking, all
true induction is guessing; it is a swift intuitive glance
at a mass of facts to see if they mean anything, while
exact scientific demonstration is a complex process of deducing conclusions by the observations of more facts.
Measurement
Statistics and Sociology (1895), by Giddings's Columbia colleague Richmond Mayo-Smith, remained squarely in the nineteenth-century tradition of moral statistics. In it, Mayo-Smith commented, with a sociological purpose, on material of the sort collected by state bureaus, such as tables of vital statistics. In his Inductive Sociology and his articles, such as "The Measurement of Social Pressure" (1908), Giddings attempted something quite different: the measurement of magnitudes that derive either from sociological theories or from social concepts such as labor unrest. The data he proposed to use were not very different from the sort discussed by Mayo-Smith, but they were conceived in a distinctly new fashion.

The key difference was how the problem arose. Giddings was concerned with a theoretical question, with the constraints on political evolution (especially the achievement of democracy) that result from conflicts with primordial ties such as kinship and from the inadequate psychological and characterological evolution of personalities. His approach to this problem is indicative of his approach to sociology. We might investigate this kind of problem impressionistically by deciding if some
Causation
Giddings and his students also struggled with the relationship between correlation and cause. In the 1911 edition of The Grammar of Science, Pearson treated the difference between cause and correlation as merely a matter of degree; the difference between the laws of physics and the relations between parental and adult children's stature, for example, is the quantitative fact of degree of variation, and even observations in physics do not perfectly match up with the idealized laws of physics. Accordingly, he urged the abandonment of the distinction between laws and correlations. Almost everything is correlated with everything else, Pearson insisted. Analogously, as Giddings explained, every manifestation of energy is associated with other manifestations, every condition with other conditions, every known mode of behavior with other modes. Giddings and his students attempted, as did their biometric colleagues, to find a proper middle ground between collapsing cause into correlation and adhering to the traditional notions of cause and law, while searching for an adequate formulation of the compromise.
Further Reading
Bannister, R. C. (1987). Sociology and Scientism: The American Quest for Objectivity, 1880–1940. University of North Carolina Press, Chapel Hill, NC.
Glossary
external evaluation Inquiries aimed at judging the merit,
worth, quality, and effectiveness of an intervention
conducted by independent evaluators.
goal-based evaluation Program evaluation that assesses the
degree and nature of goal attainment.
goal-free evaluation Program evaluation that assesses the extent to which and the ways in which the real needs of participants are met by the program, without regard to the program's stated goals.
internal evaluation Evaluative inquiries undertaken by staff
within a program or organization.
needs assessment Determining those things that are essential and requisite for a particular population but are lacking, the degree to which they are lacking, and the barriers to their being provided or met.
program evaluation The systematic collection of information
about the activities, characteristics, and outcomes of programs to make judgments about the program, improve
program effectiveness, and/or inform decisions about future
programming.
program goals The desired and targeted outcomes or results
of program activities.
Multiple Goals
Another challenge in conducting goal-based evaluations
involves prioritizing goals. Few programs have only one
primary goal. For example, the Head Start program aims to prepare preschool children for success in school, but it also includes health assessment goals, nutrition goals, parenting goals, and sometimes even employment or community
organizing goals. Evaluation priorities must be set to
determine which goals get evaluated and how scarce
resources should be allocated to evaluating different goals.
Goal-Free Evaluation
Philosopher and evaluator Michael Scriven has been
a strong critic of goal-based evaluation and, as an alternative, an advocate of what he has called goal-free
evaluation. Goal-free evaluation involves gathering
data on a broad array of actual effects and evaluating
the importance of these effects in meeting demonstrated
needs. The evaluator makes a deliberate attempt to avoid
all rhetoric related to program goals. No discussion about
goals is held with staff and no program brochures or
proposals are read; only the program's actual outcomes and measurable effects are studied, and these are judged on the extent to which they meet demonstrated participant needs.
Most evaluation designs focus on assessing goal attainment, and goal clarification remains a major task undertaken by evaluators early in the evaluation design process.
Goal-free evaluation has not resonated much either
with evaluators or with those who commission and
use evaluations. Relatively few goal-free evaluations
have been reported in the literature. A few comprehensive evaluations include hybrid designs in which a goalfree evaluator works parallel to a goal-based evaluator, but
such efforts are expensive and, accordingly, rare.
Introduction
Psychometrics, or mathematical psychology, distinguishes itself from general statistics in two ways: (1) psychology has its own goal of understanding the behavior of individuals, rather than dealing with them as mass products, and (2) psychometrics includes the measurement of hypothetical constructs, such as ability, attitude,
etc. Item response theory (IRT) started as modern mental
test theory in psychological measurement and as latent
structure analysis in social attitude measurement. In its
early days, IRT was called latent trait theory, a term that is
still used in latent trait models, which represent mathematical models in IRT.
For several decades IRT dealt solely with dichotomous responses, such as correct/incorrect or true/false. In 1969, the general graded response
model was proposed by Samejima. This model represents a family of mathematical models that deal with
ordered polytomous responses in general. To understand
the need for the graded response model, the following
example may be useful. Suppose that included in
a questionnaire measuring social attitudes toward the
war is the statement: A war is necessary to protect
our own country. Figure 1 presents examples of three
possible types of answer formats. Figure 1A shows
a dichotomous response format that allows the subject to choose one of the two categories: Disagree and Agree. Figure 1B shows a graded response format that allows the subject to select one of five response categories: Strongly Disagree, Disagree, Neutral, Agree, and Strongly Agree. Figure 1C shows a continuous response format in which the subject marks a position on a line ranging from Totally Disagree to Totally Agree.

Figure 1 The three response formats. (A) Dichotomous response format. (B) Graded response format. (C) Continuous response format.
Homogeneous and
Heterogeneous Cases
Let U_g ∈ {0, 1} denote a binary score for dichotomous responses. The item characteristic function (ICF) is defined by:

P_g(θ) = Prob[U_g = 1 | θ].   (6)

This is a special case of Eq. (1), which is obtainable by replacing the graded item score X_g by the dichotomous item score U_g, and by replacing its realization x_g by a specific value, 1. It is noted that, when m_g > 1, the m_g + 1 graded response categories can be redichotomized by choosing one of the m_g borderlines between any pair of adjacent graded response categories and creating two binary categories, as is done when the letter grades A, B, and C are categorized into Pass and D and F into Fail. When the borderline is set between the categories (x_g − 1) and x_g, the cumulative operating characteristic P*_{x_g}(θ), defined by the right-hand side of
Eq. (6) applied to the redichotomized item, can be regarded as the ICF of the resulting binary item. In the logistic model it takes the form

P*_{x_g}(θ) = 1 / (1 + exp[−D a_g (θ − b_{x_g})]),   (11)

where a_g is the item discrimination parameter and b_{x_g} is the difficulty parameter for the graded category x_g.
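As a concrete illustration of this logistic formulation, the following minimal Python sketch computes the operating characteristic of each graded category as the difference between adjacent cumulative characteristics; the parameter values are illustrative assumptions, not values from the text.

import numpy as np

def cumulative_p(theta, a, b, D=1.7):
    # Cumulative operating characteristic P*_{x_g}(theta) under the logistic model.
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))

def category_probs(theta, a, bs, D=1.7):
    # Operating characteristics P_{x_g}(theta) for graded scores 0..m_g,
    # where bs holds strictly increasing boundaries b_1 < ... < b_{m_g};
    # by convention P*_0 = 1 and P*_{m_g + 1} = 0.
    cum = [1.0] + [cumulative_p(theta, a, b, D) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

# A 5-category item (m_g = 4) with assumed parameters:
probs = category_probs(theta=0.5, a=1.2, bs=[-1.5, -0.5, 0.5, 1.5])
print(probs, sum(probs))  # the category probabilities sum to 1

Any adjacent categories can be merged by simply dropping the corresponding boundary from bs, mirroring the redichotomization described above.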
[Figure: Operating characteristics of graded response categories Grade 0 through Grade 6, plotted against the latent trait θ.]
Strongly agree and Moderately agree), the operating characteristics of these new categories can
be found in the original mathematical model.
In the acceleration model, however, the first kind of additivity does not rigorously hold, although it does hold in
most practical situations. It can be said that models in
category 1 are less restrictive and have a broader range
of applicability because of their additivity.
H_{z_g}(θ) = lim_{Δz_g → 0} [P*_{z_g}(θ) − P*_{z_g + Δz_g}(θ)] / Δz_g = −(∂/∂z_g) P*_{z_g}(θ)   (13)

and, by the chain rule,

H_{z_g}(θ) = −[(∂/∂b_{z_g}) P*_{z_g}(θ)] (∂/∂z_g) b_{z_g},   (14)

provided that b_{z_g} is a strictly increasing and differentiable function of z_g. Samejima proposed in 1973 and
1974 the normal ogive and logistic models for continuous responses expanded from those for graded
responses; they are specific cases of the general outcome
given by Eq. (14). Additivity holds in these models in
the homogeneous case, as can be seen in Eqs. (5), (7),
and (8).
Whereas generally graded response models that are
homogeneous can be naturally expanded to continuous
response models, models that are heterogeneous are more diverse, with some belonging to category 1 and others to category 2; that is, they are discrete in nature.
The logistic positive exponent model for graded responses
is a good example of a graded response model that can be
naturally expanded to a continuous response model. The
operating density characteristic of the resulting continuous response model is obtained from Eqs. (10) and (13):
H_{z_g}(θ) = (∂/∂z_g) [C_g(θ)]^{ξ_{z_g}} = [C_g(θ)]^{ξ_{z_g}} [log C_g(θ)] (∂/∂z_g) ξ_{z_g}.   (15)
Operating Characteristic of
a Response Pattern
Let V denote the response pattern, which represents the individual's performance on n graded response items by a sequence of graded item scores X_g, and let v be its realization. Thus,

v = {x_g},   g = 1, 2, . . . , n.   (20)
Under local independence, the operating characteristic of the response pattern is the product of the operating characteristics of the item scores it contains:

P_v(θ) = Π_{x_g ∈ v} P_{x_g}(θ).   (21)
Information Functions
Samejima proposed the item response information function for a general discrete item response. For a graded response x_g, it is denoted by I_{x_g}(θ) and defined by:

I_{x_g}(θ) = −(∂²/∂θ²) log P_{x_g}(θ) = [(∂/∂θ) P_{x_g}(θ) / P_{x_g}(θ)]² − (∂²/∂θ²) P_{x_g}(θ) / P_{x_g}(θ),   (22)

and the item information function I_g(θ) is its conditional expectation, given θ:

I_g(θ) = Σ_{x_g} I_{x_g}(θ) P_{x_g}(θ) = Σ_{x_g} [(∂/∂θ) P_{x_g}(θ)]² / P_{x_g}(θ).   (23)
Note that Eq. (23) includes Birnbaum's 1968 item information function for the dichotomous item, which is based on a somewhat different rationale, as a special case.
Samejima also proposed the response pattern information function, I_v(θ). For graded responses, it is given by:

I_v(θ) = −(∂²/∂θ²) log P_v(θ) = Σ_{x_g ∈ v} I_{x_g}(θ),   (24)

and the test information function is its conditional expectation:

I(θ) = −Σ_v [(∂²/∂θ²) log P_v(θ)] P_v(θ) = Σ_{g=1}^{n} I_g(θ).   (25)
For any aggregate statistic T computed from the response pattern, the amount of information can only be reduced:

I_T(θ) ≤ I(θ).   (27)
Eq. (27) has an important implication: the reduction in the amount of test information leads to inaccuracy in the estimation of θ. A good example of the aggregate T is the test score, which is the sum total of the n x_g values in V = v. Because the reciprocal of the square root of the test information function can be used as an approximate local standard error of estimation, it is obvious from Eq. (27) that the accuracy of the estimation of the latent trait θ will be reduced if the test score is used as the intervening variable. Thus, it is essential that θ be estimated directly from the response pattern, not from the test score.
It should also be noted that the item response information function I_{x_g}(θ) and the response pattern information function I_v(θ), which are defined by Eqs. (22) and (24), respectively, have important roles in the estimation of the latent trait θ based on the subject's response pattern v. For example, even if I_{x_g}(θ) of one or more x_g that belong to the response pattern v assumes negative values for some interval(s) of θ, the item information function I_g(θ) that is defined by Eq. (23) will still assume nonnegative values for the same interval(s), as is obvious from the far right-hand-side expression in the equation. Consideration of this fact is especially necessary when programming CAT to select an item to present to an individual subject when the three-parameter logistic model is adopted for multiple-choice test items in the item pool.
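A small numerical sketch of Eqs. (22) and (23): the item information is computed as the sum over categories of [P'_{x_g}]² / P_{x_g}, with the derivative approximated by central differences (the item parameters are illustrative assumptions):

import numpy as np

def item_information(theta, a, bs, D=1.7, h=1e-4):
    # I_g(theta) = sum over x_g of [dP_{x_g}/dtheta]^2 / P_{x_g}  (Eq. 23),
    # using a numerical central-difference derivative.
    def probs(t):
        cum = np.concatenate(([1.0],
                              1.0 / (1.0 + np.exp(-D * a * (t - np.asarray(bs)))),
                              [0.0]))
        return cum[:-1] - cum[1:]
    p = probs(theta)
    dp = (probs(theta + h) - probs(theta - h)) / (2.0 * h)
    return float(np.sum(dp ** 2 / p))

print(item_information(0.0, a=1.2, bs=[-1.5, -0.5, 0.5, 1.5]))

As the far right-hand side of Eq. (23) makes explicit, every term in the sum is nonnegative, so the computed I_g(θ) is nonnegative even where an individual I_{x_g}(θ) is not.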
In maximum likelihood estimation of θ, the basic function

A_{x_g}(θ) = (∂/∂θ) log P_{x_g}(θ)   (28)

plays a central role: the maximum likelihood estimate of θ for a response pattern v is obtained as the solution of the likelihood equation

Σ_{x_g ∈ v} A_{x_g}(θ) = 0.   (29)
Discussion
A strength of the graded response model in IRT is that it
can be used in many different types of research, and there
is still room for innovation. In applying it, model selection for specific research data should be made substantively, considering the nature of the data and the characteristics of the models.
Acknowledgments
The author is obliged to Wim J. Van der Linden and Philip
S. Livingston for their thorough reviewing and comments.
Graph Theory
Stephen C. Locke
Florida Atlantic University, Boca Raton, Florida, USA
Glossary
algorithm A procedure for solving a problem.
bipartite graph A graph that has two classes of vertices; for
example, vertices may represent men and women, with the
additional property that no edge joins two members of the
same class.
connected A graph is connected if for each pair of vertices,
there is a path from one to the other.
cycle A route, using at least one edge, that returns to its
starting point, but does not repeat any vertex except for the
first and the last.
directed graph A generalization of a graph, in which some
edges may have directions; similar to one-way streets in
a city.
edge A connection between a pair of vertices, denoting that
the two vertices are related in some way.
heuristic A procedure for obtaining an approximate solution
to a problem.
matching A pairing of some vertices, so that the vertices in
each pair are joined by an edge.
NP-complete A set of problems currently assumed to be
computationally difficult; a subset of nondeterministic
polynomial (NP)-time.
path A route that can be followed from one vertex to another
by means of edges between successive vertices of the path.
simple Without loops or multiple edges.
tree A connected graph with no cycles.
vertex A basic element of the graph, possibly representing
physical objects or places.
History
The river Pregel flows through the city of Königsberg (Kaliningrad). In the middle of the river is an island, the Kneiphof, after which the river branches. There are seven bridges over various parts of the river. The citizens of the town enjoy strolling through the city and are curious whether there is a route that crosses each bridge exactly once. This problem, a popular mathematical game known as the Königsberg Bridge Problem, was solved by Leonhard Euler in 1736, and, in doing so, Euler stated the first theorem in the field of graph theory.
Another popular amusement is the Knight's Tour Problem: can a knight travel around a standard chessboard, landing on each square exactly once, and returning to its initial square? This problem was treated by Alexandre-Théophile Vandermonde in 1771. A similar
problem using the vertices and edges of a dodecahedron
was marketed as the Icosian Game by Sir William
Rowan Hamilton in 1857. The game was a flop, but
over 1600 research papers on Hamilton cycles (cycles
through all of the vertices) have since been published,
and thousands more papers have explored related topics.
The word graph is derived from graphical notation, introduced by Alexander Crum Brown in a paper on isomers in 1864. Crum Brown was a contemporary of Arthur Cayley and James Joseph Sylvester, who used related graphs to study chemical properties. One of the most notorious problems in graph theory originated in 1852, when Frederick Guthrie asked his professor, Augustus De Morgan, whether four colors are always sufficient to color any map so that regions sharing a border receive different colors.
Notation
Figure 1 is a labeled graph depicting the relationships
between various pairs of people. In the figure, vertices
representing people have been drawn as small open circles, and an edge (connection) between two vertices indicates that the two people have spoken to each other.
Thus, for example, Brad has spoken with Janet, and both
have spoken with Frank. It would be easy to modify the
picture to record that the interchanges had been in one
direction. An arrow on the edge from Brad to Janet, pointing in the direction of the communication, would show
that Brad has spoken to Janet, but Janet did not respond.
To show that Brad spoke to himself, an edge (loop) could
be drawn from Brad back to Brad. Five edges (a multiple
edge) could also be drawn from Brad to Janet if they have
spoken on five occasions. In the following discussions,
unless otherwise specified, only simple graphs (those that are undirected, loopless, and lacking multiple edges) are considered.
Graph theory notation is not completely standardized
and the reader should be aware that vertices may sometimes be called nodes or points, and edges may be
called lines or links. Graphs can be used to model
social relationships, transportation networks, hierarchies,
chemical or physical structures, or flow of control in an
algorithm. Perhaps one of the largest examples of a (directed) graph that many people use every day is the World Wide Web. The files are the vertices. A link from one file to another is a directed edge (or arc).

[Figure 1 A labeled graph on the vertices Frank, Adrian, Rocky, Brad, Janet, and Moose.]
Several properties of graphs can be studied. In Fig. 1,
Frank and Janet would be called neighbors of Brad. If
Brad wished to get a message to Moose, he could tell
Janet, who would pass the message along to Rocky, and
then Rocky would give the message to Moose. This describes a path from Brad to Moose. In this figure, there is
a path connecting any two of the vertices and the graph is
connected, or the graph has one component. The number
of edges in the path is its length. The length of a shortest
path between any two vertices is the distance between
those vertices. The diameter of a graph is the maximum
number that occurs as a distance. Figure 1 has diameter 3,
because there is no path from Brad to Moose using fewer
than three edges, but for every pair of vertices the path
length is at most three. There are Internet web sites that
will compute the driving distance between any two
points in the United States. A famous metaconjecture
(and movie title, Six Degrees of Separation) proposes
building a graph similar to the example in Fig. 1, but with
each person in the United States as a vertex, and edges
between two people if they normally communicate with
each other; that graph should have diameter 6 at most.
This is a metaconjecture because it would take a great deal of trouble to prove it: the graph would be constantly changing, and it would never be quite correct.
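The path and diameter computations described above are easy to carry out by breadth-first search. The sketch below uses an edge list inferred from the discussion of Fig. 1 (the exact edge set is partly an assumption):

from collections import deque

# Edge list inferred from the text's discussion of Fig. 1.
edges = [("Brad", "Janet"), ("Brad", "Frank"), ("Janet", "Frank"),
         ("Janet", "Rocky"), ("Frank", "Rocky"),
         ("Rocky", "Moose"), ("Rocky", "Adrian")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def distances(source):
    # Breadth-first search: shortest path length from source to every vertex.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# The diameter is the largest distance over all pairs (the graph is connected).
diameter = max(max(distances(v).values()) for v in adj)
print(diameter)  # 3, as stated in the text

Running breadth-first search from every vertex costs O(n(n + m)) for n vertices and m edges, a polynomial-time algorithm in the sense discussed below.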
In Fig. 1, if Rocky were to move away from the social
circle, Brad, Janet, and Frank could still communicate with
each other, but Moose and Adrian would have no route for
their communications, nor could they talk to the other
group. Rocky would thus be defined as a cut vertex, i.e.,
a vertex that increases the number of components when
deleted (in this case, disconnecting the graph). If both
Frank and Janet were to leave, instead of Rocky, there
would again be no way for Brad to communicate with
Moose. Thus {Janet, Frank} is a cut set of vertices. Now,
consider removing edges instead of vertices. Suppose that
Rocky and Adrian have an argument and are unwilling to
speak to each other. Then, there would be no way for, say,
Algorithms
Having a list of problems that can be addressed with the
language of graph theory is a useful end in itself. Even
better, once a problem has been identified, problems of
a similar nature can be sought. Often, these problems
have been solved, and there will be an algorithm available.
An algorithm is a finite, terminating set of instructions to
solve a given problem. An algorithm is of order T(n), or
O(T(n)), if there is a constant c, such that for any input
of size n, the algorithm takes no more than cT(n) steps
to complete its task, as long as n is large enough. If T(n) is
a polynomial in n, the algorithm is polynomial-time, or
good. There is a special, very well-studied class of yes/no
problems for which there are no known good algorithms,
but the existence of a good algorithm for any one problem
in the class would give a good algorithm for all problems in
the class. This is the class of NP-complete problems (the
technical definition of which is complicated and beyond
the scope here). If a mathematician is having difficulty solving a problem from this class, the fact that nobody else knows how to solve it efficiently either is a good excuse!
Consider now some of the standard problems from
graph theory, starting with some NP-complete problems.
Assume in each case there is a graph G, and in some cases,
there is an integer k:
Unsolved Problems
In many areas of mathematics, it takes years of study to
reach the point where it is possible to understand the
interesting unsolved problems. In graph theory, a few
hours of study already leads one to unsolved problems.
The four-color problem, mentioned previously, was unsolved for 140 years, yet it takes little to understand the statement of the problem. Bondy and Murty compiled a list
problems are still open today. The two following problems
interest many researchers.
1. The cycle double cover problem: If a graph has no
edge whose deletion disconnects the graph, is there
a collection of cycles that together contain every edge
exactly twice? If a graph can be drawn in the plane,
and if that graph has no edge whose deletion disconnects
the graph, then the faces of the graph are cycles that
together contain every edge exactly twice.
2. The reconstruction problem: Given the deck of
a graph on at least three vertices, can we determine
uniquely what the original graph was? Given a graph
G, with vertices v1, . . . , vn, let Gk denote the subgraph
of G obtained by deleting the vertex vk. Now, draw these
vertex-deleted subgraphs, G1, . . . , Gn, as unlabeled
graphs, each on a separate index card. This is called
the deck of G.
Further Reading
The books by Chartrand (1985) and Trudeau (1976) are
accessible to high school students. For an undergraduate
or beginning graduate presentation, Bondy and Murty (1976)
or West (2001) would be useful. For further information on
algorithms, see Even (1979). At the graduate level, there are
texts on algebraic graph theory, algorithmic graph theory,
cycles in graphs, graphical enumeration, matroid theory,
random graphs, topological graph theory, as well as texts on
specific unsolved problems.
Graunt, John
Philip Kreager
Somerville College, Oxford University, Oxford, England, United Kingdom
Glossary
life expectation The average number of additional years
a person would live, when the rate of mortality indicated by
a given life table holds.
life table A detailed descriptive model of the mortality of
a population, giving the probability of dying at each age;
17th- and 18th-century tables were based on the assumption that populations are stationary, i.e., their total number
and age distribution are constant.
natural history The scientific study of animals and plants
based on observation rather than on experiment.
scholasticism Teaching based on the medieval university
framework of the trivium (grammar, logic, and rhetoric)
and quadrivium (mathematical arts of arithmetic, geometry,
astronomy, and music); although developed originally in the
context of theology and philosophy, Scholastic models
continued to shape school curricula into the 17th century.
reprintings by 1665, and subsequent editions, translations, excerpts, and summaries over the course of
the next century. Graunt's own life and the stated purposes of his work direct us to the contemporary issues
that gave population arithmetic its immediate
importance.
Biographical Note
The fragmentary but coherent record of Graunt's life
owes largely to the careful research of David Glass.
The story is at once brilliant and tragic. Graunt's formal
education was limited to common schooling, based at the
time on the Scholastic curriculum of logic and rhetoric.
His mathematics was confined to bookkeeping arithmetic,
learned presumably when apprenticed at age 16 to his
father's London haberdashery business. From these modest beginnings, Graunt rose to be a respected London
figure, holding a number of senior offices in the city council and the merchant community, and gaining a reputation
as a shrewd and fair arbitrator of trade disputes. Graunt's
success coincided with the period of the English Commonwealth, the period of religious and civil conflict that
divided the country following the execution of Charles I.
Graunt was a junior officer in Cromwell's forces, and
knowledgeable in the dissenting faith of Socinianism,
which denied claims to divine authority made by monarchists and the church, in favor of more egalitarian ideals.
Contemporaries record Graunt as an assiduous scholar,
competent in Latin and French, and articulate and witty
across a range of philosophical, artistic, and economic
topics. His independent learning and reputation in the
city brought Graunt into contact with Royal Society members (who likewise had combined interests in science and
trade), notably William Petty, for whom Graunt was able
to use influence to secure a professorship in the 1650s.
English society by the early 1660s, however, had arrived at an important point of transition. The restoration
of the monarchy in 1660 put an end to the Commonwealth. Religious and political differences, over which
many lives had been lost, remained critical in science
as well as in government, but there could be much tolerance for privately held views. The method of the Observations, as will be discussed later, reflects the ambivalence
of Graunt's position at this time. It is noteworthy that his
membership in the Royal Society, almost unique for merchants at the time, was favored expressly by Charles II.
The new king's support would certainly have helped to alleviate any questions concerning Graunt's past. Yet, in
the early 1670s, Graunts life took an abrupt turn.
Converting publicly to Catholicism, he rejected religious
moderation and science, and alienated himself from most
friends and contemporaries. Suffering financial ruin,
Graunt died shortly thereafter in poverty.
with the vital forces that enable some people to live longer
than others do, and Graunt's text, in proposing a natural
arithmetic of life and death, followed logically from this.
Graunt, moreover, encouraged readers to use his numerical evidence and methods to develop observations of
their own. Knowledge of the population of a state is at
base a scientific matter that his method opens to anyone
prepared to follow it. This line of reasoning led Graunt to
conclusions more consistent with his Socinianism than
with his first dedication. The second dedication ends
with a comparison of observation to political representation. Graunt likens the Royal Society to a Parliament of
Nature, saying that even though an observer like himself
may be a mere commoner, he and his observations are
entitled to be represented by the Society. The role of such
a parliament, by implication, is to serve as a forum in
which proportional balances in the body politic may be
discussed openly, so that scientific knowledge of natural
forces can be brought to bear on whether governments
actually pursue just policies that conform with nature.
Graunt's Influence
The sensitivity of population arithmetic as a matter of state
remained a dominant factor into the later 18th century.
Before the revolutions in France and America, only powerful monarchies such as those in Prussia, Sweden, and
France dared to establish systems of data collection, but
were careful to restrict public access to information. The
Observations, however, opened the door to quantitative
inquiries by individuals, and the intellectual and technical
development of population arithmetic owes to their initiative. Four main developments followed from Graunts
work. Each entailed a significant narrowing of his vision,
but together they laid the foundation for the generalization of population arithmetic in the statistical reforms of
the 19th century.
First, Huygens' reinterpretation of Graunt's accompt of age structure initiated the first formal population
model, the life table, subsequently developed by
a distinguished line of mathematicians including Halley,
De Moivre, Euler, and Price. By the early 19th century,
this tradition had given rise to the first large-scale corporate structures based on mathematical models (life insurance), and was able to exercise a major influence on data
requirements of the new national census and registration
systems then being erected in Europe. Many simple proportional measures pioneered by Graunt were reintroduced at this time as key parameters of medical and
social evaluation.
Further Reading
Glass, D. V. (1963). John Graunt and his Natural and Political Observations. Proc. Royal Soc., Ser. B 159, 2–37.
Graunt, J. (1662). Natural and Political Observations upon the Bills of Mortality. John Martyn, London.
Hacking, I. (1975). The Emergence of Probability. Cambridge University Press, Cambridge.
Kreager, P. (1988). New light on Graunt. Pop. Stud. 42, 129–140.
Kreager, P. (1993). Histories of demography. Pop. Stud. 47, 519–539.
Kreager, P. (2002). Death and method: The rhetorical space of 17th-century vital measurement. In The Road to Medical Statistics (E. Magnello and A. Hardy, eds.), pp. 1–35. Rodopi, Amsterdam.
Laslett, P. (ed.) (1973). The Earliest Classics: Pioneers of Demography. Gregg Int., Farnborough, Hants.
Pearson, K. (1978). The History of Statistics in the 17th and 18th Centuries (E. S. Pearson, ed.). Charles Griffin and Co., London.
Rusnock, A. (1995). Quantification, precision and accuracy: Determinations of population in the Ancien Régime. In The Values of Precision (M. N. Wise, ed.), pp. 17–38. Princeton University Press, Princeton.
Sutherland, I. (1963). John Graunt: A tercentenary tribute. J. Royal Statist. Soc., Ser. A 126, 537–556.
Guttman Scaling
George Engelhard, Jr.
Emory University, Atlanta, Georgia, USA
Glossary
coefficient of reproducibility Indicator of fit between data and the requirements of a Guttman scale; defined as 1 minus the ratio of the number of errors to the total number of responses; this coefficient ranges from zero to 1.
item characteristic curve Representation of the probabilistic relationship between a response to an item and person
locations on a latent variable.
items Any set of stimuli intended to provide a structural
framework for observing person responses; synonymous
with tasks, exercises, questions.
latent variable The construct that is being measured or
represented by the scale; also called an attribute.
perfect scale Another name for a Guttman scale.
persons The object of measurement; synonymous with
subjects, participants, examinees, respondents, and individuals; groups or institutions can also be objects of
measurement.
population of objects Target group of persons that a scale is
designed to measure.
scalogram analysis Another name for Guttman scaling.
universe of attributes Target set of items that a scale is
designed to represent; universe of content.
Another way to illustrate a Guttman scale is to use item response theory. A scale can be represented as a line with
items ordered from easy to hard and persons ordered
from low to high. This line reflects the attribute or latent
variable that defines the construct being measured. Item
characteristic curves can be used to show the relationship
between the latent variable or attribute being measured
and the probability of correctly answering an item.
A Guttman scale is considered a deterministic model because Guttman items yield the distinctive pattern that is
shown in Fig. 1. The x axis in Fig. 1 represents the attribute or latent variable with Items A to D ordered from easy
to hard. The items are rank-ordered and there is no requirement that the x axis have equal intervals. The y axis
represents the probability of responding with a correct answer.

[Figure 1 Deterministic item characteristic curves, P(x = 1) against the latent variable, for Items A (easy) through D (hard); the regions along the x axis correspond to the response patterns [0000], [1000], [1100], [1110], and [1111], with scores 0 through 4.]
Table I Response Patterns for Four Dichotomous Items

Person score | Perfect Guttman pattern (ABCD)
4            | [1111]
3            | [1110]
2            | [1100]
1            | [1000]
0            | [0000]

Imperfect Guttman patterns: [1101], [1011], [0111], [1010], [0110], [1001], [0101], [0011], [0100], [0010], [0001].

Note: Four dichotomous items yield 16 (2^4 = 2 × 2 × 2 × 2) response patterns. Items are ordered from easy (Item A) to hard (Item D). There are 5 perfect Guttman patterns and 11 imperfect Guttman patterns.
A Guttman scale requires transitivity of the item ordering: if Item A < Item B and Item B < Item C, then Item A < Item C. If such item orderings are not invariant over groups, then it is not possible to compare them. This concern with item invariance across groups appears within modern measurement in the form of differential item functioning.

The reproducibility coefficients Rep_GG (Guttman-Goodenough) and Rep_GS (Guttman-Suchman) each take the form 1 minus the ratio of the total number of errors, counted under the respective convention, to the total number of responses (n persons × k items).
Pr{x_ni = 1 | θ_n, δ_i} = exp(θ_n − δ_i) / [1 + exp(θ_n − δ_i)]

and

Pr{x_ni = 0 | θ_n, δ_i} = 1 / [1 + exp(θ_n − δ_i)],

where x_ni is the observed response from person n on item i (0 wrong, 1 right), θ_n is the location of person n on the latent variable, and δ_i is the difficulty of item i on the same scale. Once estimates of θ_n and δ_i are available, then the probability of each item response and item-response pattern can be calculated based on the SLM. Model-data fit can then be based on the comparison between the observed and expected response patterns, which is conceptually equivalent to other methods of evaluating a Guttman scale. If Guttman's model is written within this framework, then Guttman items can be represented as follows:

Pr{x_ni = 1 | θ_n, δ_i} = 1, if θ_n ≥ δ_i

and

Pr{x_ni = 1 | θ_n, δ_i} = 0, if θ_n < δ_i.

This reflects the deterministic nature of Guttman items that was illustrated in Fig. 1.
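A minimal sketch contrasting the probabilistic SLM with the deterministic Guttman item; the difficulty and trait values are illustrative assumptions:

import math

def rasch_p(theta, delta):
    # SLM: Pr{x = 1 | theta, delta} = exp(theta - delta) / [1 + exp(theta - delta)].
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

def guttman_p(theta, delta):
    # Deterministic Guttman item: 1 if theta >= delta, else 0.
    return 1.0 if theta >= delta else 0.0

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(rasch_p(theta, 0.5), 3), guttman_p(theta, 0.5))

The Guttman item is the limiting case of the SLM: as the distance between θ and δ grows, the SLM probability approaches the deterministic 0 or 1.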
One of the challenges in evaluating a Guttman scale
is that there are different ways to define errors. Mokken
has pointed out that there may even be errors in the
perfect response patterns. Probabilistic models provide
a more flexible framework for comparing observed to
expected item-response patterns and a less rigid perspective on errors. Guttman's requirement of perfect transitivity appears within the Rasch measurement model as
perfect transitivity in the probabilities of ordered item
responses.
Empirical Example
Table II presents data from Stouffer and Toby that are used to illustrate the indices for evaluating a Guttman scale presented in the previous section. Table II provides the response patterns for 216 persons responding to four items (A, B, C, and D) designed to measure whether persons tend toward universalistic values or particularistic values when confronted by four different situations of role conflict. Persons with higher scores provide more particularistic responses (reactions to role conflicts are based on friendship). It should be noted that earlier reanalyses coded positive (+) and negative (−) responses differently. Table II reflects the original Stouffer-Toby scoring with positive responses (+) coded 1 and negative responses (−) coded 0; the items are reordered from easy to hard to endorse (e.g., Stouffer-Toby Item 4 is Item A here) for Form A (Ego faces dilemma).
The four items in the Stouffer-Toby data have the following difficulties (Item A to Item D): 0.21, 0.49, 0.50, and 0.69. Item A is the easiest to endorse, whereas Item D is the hardest to endorse for these persons. Based on this difficulty ordering, the expected patterns for a perfect scale are shown in column 2 of Table II. Column 3 presents the observed patterns and their assignment to a particular expected item pattern based on the sum of the items. For example, the observed pattern [1110] sums to a person score of 3 and is assigned to the expected item pattern [1110]; there were 38 persons with this observed pattern, and there are no errors. In contrast, the observed pattern [1101] also sums to a person score of 3 but differs from the expected pattern [1110] in two positions, and so contributes two errors per person under the Goodenough convention.
Table II Expected and Observed Response Patterns with Goodenough and Suchman Error Counts

Expected item pattern | Observed item pattern (ABCD) | Frequency | Goodenough errors | Goodenough error frequency | Suchman errors | Suchman error frequency
[1111]  | [1111] | 20 | 0 |  0 | 0 |  0
[1110]  | [1110] | 38 | 0 |  0 | 0 |  0
        | [1101] |  9 | 2 | 18 | 1 |  9
        | [1011] |  6 | 2 | 12 | 1 |  6
        | [0111] |  2 | 2 |  4 | 1 |  2
[1100]  | [1100] | 24 | 0 |  0 | 0 |  0
        | [1010] | 25 | 2 | 50 | 1 | 25
        | [0110] |  7 | 2 | 14 | 1 |  7
        | [1001] |  4 | 2 |  8 | 1 |  4
        | [0101] |  2 | 2 |  4 | 2 |  4
        | [0011] |  1 | 4 |  4 | 2 |  2
[1000]  | [1000] | 23 | 0 |  0 | 0 |  0
        | [0100] |  6 | 2 | 12 | 1 |  6
        | [0010] |  6 | 2 | 12 | 1 |  6
        | [0001] |  1 | 2 |  2 | 1 |  1
[0000]  | [0000] | 42 | 0 |  0 | 0 |  0
Total (k = 4) |  | n = 216 |  | 140 |  | 72
Note: Higher person scores indicate a more particularistic response, whereas lower person scores indicate a more universalistic response. Items are ordered from easy (Item A) to hard (Item D). Guttman-Goodenough reproducibility coefficient: 1 − [140/(216 × 4)] = 0.84. Guttman-Suchman reproducibility coefficient: 1 − [72/(216 × 4)] = 0.92.
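The reproducibility arithmetic in the note can be reproduced in a few lines; both coefficients are 1 minus total errors divided by the total number of responses (n × k):

def reproducibility(total_errors, n, k):
    # Rep = 1 - (total errors) / (n * k)
    return 1.0 - total_errors / (n * k)

print(round(reproducibility(140, 216, 4), 2))  # Guttman-Goodenough: 0.84
print(round(reproducibility(72, 216, 4), 2))   # Guttman-Suchman: 0.92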
Table III Observed and Expected Frequencies for the Cross-Tabulation of Items A and B

                      | Item B = 0 | Item B = 1 (error cell) | Total
Item A = 0 (observed) |    50      |     17                  |  67
Item A = 0 (expected) |    33.5    |     33.5                |  67.0
Item A = 1 (observed) |    58      |     91                  | 149
Item A = 1 (expected) |    74.5    |     74.5                | 149.0
Total (observed)      |   108      |    108                  | 216
Total (expected)      |   108.0    |    108.0                | 216.0

Note: The error cell includes persons who fail (0) the easier item (Item A) and succeed on the harder item (Item B).

Table IV Observed and Expected Error Cells and H Coefficients

              | Observed error cell | Expected error cell | H coefficient
Items A, B    |        17           |       33.5          |
Items A, C    |        16           |       32.6          |
Items A, D    |         6           |       14.0          |
Items B, C    |        38           |       52.5          |
Items B, D    |        12           |       22.5          |
Items C, D    |        16           |       23.1          |
Scale (total) |       105           |      178.2          |     0.41
Item A        |        39           |       80.1          |     0.51
Item B        |        67           |      108.5          |     0.38
Item C        |        70           |      108.2          |     0.35
Item D        |        34           |       59.6          |     0.43
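The entries in Table IV are consistent with computing H as 1 minus the ratio of observed to expected errors, with the expected error cell taken from the independence model of Table III (e.g., 67 × 108 / 216 = 33.5 for the pair A, B). A small sketch under that reading:

def h_coefficient(observed_errors, expected_errors):
    # H = 1 - (observed errors) / (errors expected under independence)
    return 1.0 - observed_errors / expected_errors

print(round(h_coefficient(105, 178.2), 2))  # scale-level H = 0.41
print(round(h_coefficient(39, 80.1), 2))    # Item A: 0.51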
Items A to D, respectively, and the estimated person locations on the latent variable are 1.52, 0.06, and −1.54 logits. Estimates of person locations under the Rasch model are not provided for the perfect scores of 0 and 4, although estimates can be obtained for these scores if additional assumptions are made about the data. The item characteristic curves for these four items are shown in Fig. 2. It is important to note how close
Items B and C are in terms of difficulty. Column 4
gives the probability of observing a perfect item-response
pattern for each person score. For example, a person with
a score of 3 (θ = 1.52) has an expected response pattern
in the probability metric of 0.97, 0.85, 0.84, and 0.34 for
Items A to D, respectively. It is clear that persons with
scores of 3 are expected to have more than a 50/50 chance
of succeeding on Items A, B, and C, whereas they have
less than a 50/50 chance of succeeding on Item D. This
ordering reflects a probabilistic transitivity for these
items that mirrors the deterministic transitivity of the
perfect item pattern [1110] for a person with a score of
3. Column 5 gives the conditional probability within each
person score group of obtaining the various item-response
patterns. These values were obtained by estimating the
likelihood of each pattern conditional on θ and standardizing these values so that the likelihoods sum to 1 within each score group. As pointed out by Andrich, a Guttman pattern is more likely to be observed within a score group than the other patterns. For example, the probability of a person with a θ of 1.52 (score
of 3) having an observed response pattern of [1110] is
0.830, whereas the probability of this person having an
observed response pattern of [0111] is only 0.013. In order
to follow the theme of error counting, Rasch measurement theory provides an index of person fit that quantifies the difference between the observed and expected
response probabilities. The person fit statistics (standardized residuals) are reported in the last column of
Table V. It is clear that the most unusual patterns have
the highest standardized residuals, with values greater
than 2.00 reported for patterns [0111], [0101], [0011],
and [0001].
Conceptual Contributions to
Measurement
Guttman made quite a few significant conceptual contributions regarding scaling and measurement theory. One
index of his influence is the strong and
opposing views generated by his work. Cliff viewed
Guttman scales as one of the good ideas in all of measurement, whereas Nunnally cautioned that the intuitive attractiveness of Guttman scaling does not overcome its
fundamental impracticality.
Table V Response Patterns, Conditional Probabilities, and Person Fit for the Stouffer-Toby Data

Person score   | Item pattern (ABCD) | Frequency | Conditional probability of response pattern | Person fit (standardized residual)
4              | [1111]              |    20     |   1.000   |   0.00
3 (θ = 1.52)   | [1110]              |    38     |   0.830   |   0.78
               | [1101]              |     9     |   0.082   |   0.44
               | [1011]              |     6     |   0.075   |   0.50
               | [0111]              |     2     |   0.013   |   2.03
2 (θ = 0.06)   | [1100]              |    24     |   0.461   |   0.63
               | [1010]              |    25     |   0.408   |   0.55
               | [0110]              |     7     |   0.078   |   0.93
               | [1001]              |     4     |   0.039   |   1.51
               | [0101]              |     2     |   0.007   |   2.22
               | [0011]              |     1     |   0.007   |   2.24
1 (θ = −1.54)  | [1000]              |    23     |   0.734   |   0.58
               | [0100]              |     6     |   0.136   |   0.20
               | [0010]              |     6     |   0.120   |   0.25
               | [0001]              |     1     |   0.010   |   2.20
0              | [0000]              |    42     |   1.000   |   0.00
(k = 4, n = 216)

Note: Rasch item difficulties are −1.89, −0.20, −0.10, and 2.20 logits for Items A to D, respectively. Higher person scores indicate a more particularistic response, whereas lower person scores indicate a more universalistic response.
Figure 2 Item characteristic curves for the Rasch model (probabilistic model) using the Stouffer-Toby data.
Further Reading
Andrich, D. A. (1985). An elaboration of Guttman scaling with Rasch models for measurement. In Sociological Methodology (N. B. Tuma, ed.), pp. 33–80. Jossey-Bass, San Francisco, CA.
Cliff, N. (1983). Evaluating Guttman scales: Some old and new thoughts. In Principles of Modern Psychological Measurement: A Festschrift for Frederic M. Lord (H. Wainer and S. Messick, eds.), pp. 283–301. Erlbaum, Hillsdale, NJ.
Goodenough, W. H. (1944). A technique for scale analysis. Educ. Psychol. Measure. 4, 179–190.
Guttman, Louis
Shlomit Levy
The Hebrew University of Jerusalem, Jerusalem, Israel
Glossary
categorical mapping A mapping having one set for its
domain (population) and a Cartesian set for its range (items
are the facets of the range).
facet One way of classifying variables according to some rule;
a set that plays the role of a component set of a Cartesian
set.
Guttman scale A perfect scale; a unidimensional scale.
mapping A rule by which elements from one set are assigned
to elements from another set.
monotone regression A relationship in which the replies on
variable x increase in a particular direction as the replies
on variable y increase without assuming that the increase
is exactly according to a straight line. The trend is always in
one direction (upward or downward) with the possibility
that variable y can occasionally stand still.
perfect scale A set of profiles in which there is a one-to-one
correspondence between the profiles and the scale ranks;
a Guttman scale.
stem facet A content facet that directly modifies the name of
the range of a structioned mapping sentence, but does not
modify any other facet.
structioned mapping sentence A mapping having two
major varieties of facets for its domain (population and
content) and one facet (or a few) for the range.
structuple A profile.
Guttman's scale theory was cited in 1971 in Science as one of 62 major advances in the social sciences from 1900 to 1965. In the 1960s, Guttman started another wave of new developments in intrinsic data analysis and facet theory. These topics constituted his major teaching and research activities at the Hebrew University in Jerusalem and at the Israel Institute of Applied Social Research. In particular, he developed new substantive theories concerning structural lawfulness for social attitudes, intelligence tests, and other aspects of human behavior.
Biographical Highlights
Louis Guttman was born in Brooklyn, New York, on February 10, 1916, to Russian immigrant parents. When he was 3 years old, the family moved to Minneapolis, where he completed his formal education. His father was a self-taught amateur mathematician who published several papers in the American Mathematical Monthly. Although skilled in mathematics, Guttman decided to major in sociology. His studies also included courses in psychology equivalent to a major in the field. Upon realizing the importance of statistics to social research, he returned to his study of mathematics, of which he had a solid knowledge. This led him to formalize, while still a graduate student, techniques for data analysis, some of which were published in 1941 and constitute the foundations of his later work on scale theory, factor analysis, and other topics (his first publication, in 1938, was on regression analysis). Guttman attained his B.A. (1936), M.A. (1939), and Ph.D. (1942) degrees in sociology from the University of Minnesota. Guttman's doctoral dissertation contributed original formulations to the algebra of matrices in general and to the algebra of factor analysis in particular.
Scientific Societies

Guttman participated in a diversity of professional organizations. He was a member of the Israel Academy of Sciences and Humanities.
Awards
Awards and honors bestowed on Guttman during his career include a fellowship at the Center for Advanced Study in the Behavioral Sciences (1955–1956), the Rothschild Prize for Social Science (1963), the Outstanding Achievement Award from the University of Minnesota (1974), the Israel Prize in the Social Sciences (1978), the Educational Testing Service Award for Distinguished Service to Measurement (1984), and the Helen Dinerman Award from the World Association for Public Opinion Research (1988, posthumously).
Scale Theory
Introduction
The overall concept that underlies Guttman's work is structural theory, with the universe of content to be studied as the starting point. Very early, he realized that focusing on concepts and classifying content must precede data analysis; that is, content alone defines the universe, not data analysis. It is the empirical structure of the universe of content that is at issue.
For qualitative data, the first complete example of
structural theory was that of a perfect scale of a class
of attitudes (or attributes), developed in the U.S. Army
during World War II.
Principal Components of
Scalable Attitudes
In 1954, scale theory expanded in the direction of developing substantive psychological and sociological theories
for principal components of scalable attitudes. The first
principal component is a monotone function of the scale
ranks. The second, third, and fourth components are expected to have certain polytone regressions on the ranking
of the population on the attitude: The second (intensity) is
a U-shaped function, the third component (closure) is an
N-shaped function with two bending points, and the
fourth component (involvement) is an M- or W-shaped
function with three bending points. Guttman hypothesized that certain types of personal norms for involvement
may often serve as direction finders for predicting when to
expect a W- or an M-shaped function.
The principal components have proven indispensable
for properly interpreting survey results and for making predictions, especially by marking the proper zero point of the
scale. They also illuminate the problem of attitude change.
Definition of Theory
The point of departure of facet theory is Guttman's definition of the concept of theory itself: a theory is an hypothesis of a correspondence between a definitional system for a universe of observations and an aspect of the empirical structure of those observations, together with a rationale for such an hypothesis.

[Structioned mapping sentence (schematic): the domain comprises the population facet and content facets A, B, C, . . . , N; the range is the facet R. In the mapping sentence for intelligence testing, facet B is the mode in which an item is presented by the tester (1. oral, 2. manual manipulation, 3. paper and pencil), facet C is the format of expression of the item (1. verbal, 2. numerical, 3. geometrical), and the range R runs from high to low correctness.]
Sampling of Items
The number of ordinary sentences derivable from the mapping sentence presented in the previous section is 27 (3 × 3 × 3 = 27). These serve as guides for actual item construction, specifying both the similarities and differences among the designed items. Each test is defined by the mapping sentence in a manner independent of its specific formulation. Consider, for example, the faceted definition a2b1c1: the task imposed on the testee is application (a2) of a rule expressed orally (b1) in a verbal format (c1). A possible test item constructed according to this structuple is: Who is the president of the United States? Similarly, a possible test item constructed according to the structuple a3b1c2 can read as follows: I will now read some numbers aloud. Listen carefully, and immediately after I finish, repeat the numbers. Namely, the task imposed on the testee is learning (a3) expressed orally (b1) in a numerical (c2) format. Many other items can be constructed for each of these two structuples. The same holds for all the structuples derivable from the faceted definition. The Wechsler Manual specifies only 12 subtests, and yet these allow for structural lawfulness because each element of each facet is represented in at least one of the subtests.
The number of derivable sentences from a structioned
mapping may be very large, depending on the number of
facets and the number of elements within each facet. Although generally there is no probability distribution for
a facet design of content, in each case a small sample of items can be systematically constructed that will nevertheless suffice to yield the essential information about the facets. The actual item construction has to conform
to the research topic, which may result in placing different emphases on certain facets and, within facets, on
certain elements. A structioned mapping sentence can
also be used as a culling rule for a universe of content
that already exists, telling us how to choose items from
among existing ones.
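The 27 structuples can be enumerated mechanically; the sketch below uses the facet elements named in the text and in the Wechsler example (treating facet A as inference/application/learning is an assumption based on the elements mentioned there):

from itertools import product

A = ("inference", "application", "learning")             # task facet
B = ("oral", "manual manipulation", "paper and pencil")  # mode of expression
C = ("verbal", "numerical", "geometrical")               # format

structuples = list(product(A, B, C))  # 3 x 3 x 3 = 27 faceted definitions
print(len(structuples))   # 27
print(structuples[9])     # ('application', 'oral', 'verbal'), i.e., a2b1c1

Each tuple is a guide for item construction, not an item itself; many concrete items can be written for any one structuple.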
Strategies of Modification
Formal definitions are necessary for scientific progress. Though striving for formality through its formal facets, the structioned mapping sentence enables the use of fruitful strategies for systematic theory development because it lends itself easily to correction, deletion, substitution, extension (adding elements to a facet), and intension (adding content facets), based on cumulative research.
[Mapping definition of intelligence (schematic): an item belongs to the universe of intelligence items if its domain requires performance with respect to a logical, scientific (factual), or semantic objective rule, and its range is ordered from very right to very wrong.]

This definition encompasses all approaches to intelligence known in this field. It was published simultaneously
Guttman's mapping definition of the universe of attitude items, followed by the First Law of Attitude (Positive Monotonicity Law), is as follows:

[Mapping definition of attitude (schematic): an item belongs to the universe of attitude items if its domain asks about behavior in some modality toward an object, and its range is ordered from very positive to very negative toward that object.]
intrinsic data analysis methods, especially similarity structure analysis (SSA; previously called smallest space analysis). In this technique, as well as in the other techniques
developed by Guttman, the data are treated intrinsically in
terms of inequalities, needing no explicit prespecified
model.
Similarity Structure Analysis
SSA is a technique for viewing a similarity (correlation)
coefficient matrix. It is an intrinsic data analysis technique
with an emphasis on looking at regions in the space of
variables rather than on coordinate systems.
The Symmetric Case The symmetric case (SSA-I) is
the first of the Guttman-Lingoes series designed for analyzing symmetric matrices. SSA treats each variable as
a point in a Euclidean space in such a way that the
higher the correlation between two variables, the closer
they are in the space. The space used is of the smallest
dimensionality that allows such an inverse relationship
between all the pairs of observed correlations and the
geometric distances. Reference axes are not in the general
definition of SSA. (The empirical data to be analyzed are
not limited to coefficients of similarity. They can also be
dissimilarity coefficients, such as geographical distances.
In such a case the monotonicity condition becomes as
follows: the smaller the dissimilarity coefficients between
two variables, the closer their points are in the space.)
Only the relative sizes of coefficients and the relative
distances are of concern. The goodness of fit between
the observed coefficients and the geometrical distances
is assessed by the coefficient of alienation, which varies
between 0 and 1, where 0 designates a perfect fit. The
goodness of fit is also expressed graphically by plotting the
input coefficients versus the computed distances. This
scattergram is called the Shepard diagram, and it actually portrays the metric nature of the implied monotone function. The less the spread around the negative monotone regression, the better the fit. Put another way, the Shepard diagram is a graphical representation of the coefficient of alienation. Any such coefficient is blind to
content considerations and, hence, alone is inadequate
for cumulative science. There is always a need for
a partnership with some content theory in determining
the shape of the space. Lawfulness regarding sizes of
correlations has been established largely in terms of
regions of content of the SSA space.
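SSA itself is the Guttman-Lingoes program, but its central idea, an inverse monotone relationship between similarity coefficients and distances in a space of smallest dimensionality, can be approximated with nonmetric multidimensional scaling. The correlation matrix below is an illustrative assumption, and sklearn's stress value is only analogous in spirit to the coefficient of alienation:

import numpy as np
from sklearn.manifold import MDS

# Illustrative (assumed) correlation matrix for four variables.
R = np.array([[1.0, 0.6, 0.5, 0.2],
              [0.6, 1.0, 0.4, 0.3],
              [0.5, 0.4, 1.0, 0.1],
              [0.2, 0.3, 0.1, 1.0]])

D = 1.0 - R  # convert similarities to dissimilarities

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
coords = mds.fit_transform(D)
print(coords)       # higher correlation -> closer points
print(mds.stress_)  # badness of fit (0 = perfect)

Plotting the input coefficients against the computed distances would give a Shepard-diagram-style picture of how well monotonicity is achieved.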
The Asymmetric Case The Guttman-Lingoes SSA
series also treats asymmetric matrices (SSA-II). In this
program, each item is presented as a point in a Euclidean
space, in such a manner that, for any three variables,
the two most dissimilar will be farthest apart. Namely,
distances are produced within rows or within columns,
but not between rows and columns.
Figure 3 Schematic representation of the cylindrical structure of the Revised Wechsler Intelligence Tests for Children, partitioned by format (verbal, numerical, geometrical), task (inference, application, learning), and mode of expression (oral, manual manipulation, paper and pencil). Reprinted from Intelligence, Vol. 15, L. Guttman and S. Levy, Two Structural Laws of Intelligence Tests, pp. 79–103, Copyright 1991, with permission from Elsevier Science.
[Figure: Schematic partial order of seven structuples: I = 1111; II = 2111; III = 1112; IV = 2121; V = 2112; VI = 1212; VII = 2222.]
direction (x + y). Hence, in the case of a perfect (unidimensional) scale, all the structuples have their points on a line with positive slope. In contrast, two noncomparable profiles have their points on a line with a negative slope, that is, in the lateral direction (x − y). All four kinds of directions in the 2-space (x, y, joint, and lateral) have a role in interpreting the results. These directions are presented schematically in Fig. 5.
The program provides a coefficient of goodness of fit for the representation of the partial order, named CORREP, which specifies the proportion of structuple pairs correctly represented by POSAC. Three CORREP coefficients are specified: for the entire partial order, for the comparable structuples, and for the noncomparable structuples.
As already stated, such coefficients alone are not sufficient for determining the shape of the space. A detailed
analysis of the systematic differences among the items is
made in terms of n POSAC diagrams, one for each item.
These are called item diagrams.
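The comparability relation that underlies the partial order is easy to state in code: one structuple dominates another if it is at least as high on every item. A minimal sketch, using profiles that appear in the figures:

def comparable(p, q):
    # Structuples are comparable if one is >= the other in every item.
    ge = all(a >= b for a, b in zip(p, q))
    le = all(a <= b for a, b in zip(p, q))
    return ge or le

print(comparable((2, 1, 2, 1), (1, 1, 1, 1)))  # True: 2121 dominates 1111
print(comparable((2, 1, 2, 1), (1, 2, 1, 2)))  # False: noncomparable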
The concept of regionality plays a basic role also in the
spaces of structuples. The role of the items (which are the
facets of the categorical mapping) is to partition the space
of the subjects into exclusive and exhaustive regions ordered according to their categories in one or more of the
directions of the POSAC space (Fig. 6). The cell (or point)
High
2121
2222
ra
te
La
Jo
in
1111
Low
Low
1212
High
High
Low
Low
High
Base (x)
Base (y)
Joint (x + y)
Lateral (x y)
Figure 6 Schematic presentation of item diagrams partitioning the POSAC space into regions by item categories, corresponding to the directions of the POSAC.
Further Reading
Borg, I., and Lingoes, J. (1987). Multidimensional Similarity
Structure Analysis. Springer Verlag, New York.
Borg, I., and Shye, S. (1995). Facet Theory Format and
Content. Sage, Thousand Oaks, CA.
Canter, D. (ed.) (1985). Facet Theory: Approaches to Social
Research. Springer Verlag, New York.
Elizur, D. (1987). Systematic Job Evaluation and Comparable
Worth. Gower, Aldershot, UK.
Gratch, H. (ed.) (1973). Twenty-Five Years of Social Research
in Israel. Jerusalem Academic Press, Jerusalem.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In The Prediction of Personal Adjustment (P. Horst, ed.). Social Science Research Council, New York.
Levy, S. (ed.) (1994). Louis Guttman on Theory and Methodology: Selected Writings. Dartmouth, Aldershot, UK.
Levy, S., and Amar, L. (2002). Processing square-asymmetric matrices via the intrinsic data analysis technique WSSA: A new outlook on sociometric issues [CD-ROM]. In Social Science Methodology in the New Millennium (J. Blasius, J. Hox, E. de Leeuw, and P. Schmidt, eds.). Leske and Budrich, Opladen, Germany.
Levy, S., and Guttman, L. (1975). On the multivariate structure of well-being. Soc. Indicators Res. 2, 361–388.
Levy, S., and Guttman, L. (1985). The partial order of severity of thyroid cancer with the prognosis of survival. In Data Analysis in Real Life Environment: Ins and Outs of Solving Problems (J. F. Marcotorchino, J. M. Proth, and J. Janssen, eds.), pp. 111–119. Elsevier Science, North Holland.
Levy, S., and Guttman, L. (1989). The conical structure of adjustive behavior. Soc. Indicators Res. 21, 455–479.
Half-Life Method
Arthur M. Schneiderman
Independent Consultant, Boxford, Massachusetts, USA
Glossary
balanced scorecard (BSC) A deployment approach for
linking strategy to action.
experience curve A normative model of cost reduction based
on the learning curve.
half-life method A normative model for predicting rates of
incremental improvement for processes of different complexity.
incremental improvement Continuous process improvement achieved by breaking a big problem into many easily
solvable little pieces.
organizational complexity The degree to which activities or
decisions within a process confront conflicting demands or
considerations caused by the way that the participants are
organized.
outsourcing Moving an internal process or process step to an
independent external source.
PDCA (Plan-Do-Check-Act) cycle The Shewhart/Deming
cycle used in improvement activities.
quality circle A supervisor and his or her direct reports who
meet regularly to improve the process that they execute.
redesign To fundamentally change the technology employed
by or the reporting relationships of the participants in
a process.
systematic problem-solving method A documented step-by-step improvement process based on scientific methodology, for example the 7-step method.
SDCA (Standard-Do-Check-Act) cycle The cycle that
a worker follows in doing his or her daily job.
technical complexity The degree to which employed technology is understood and mastered by its users.
total quality management (TQM) An evolving set of tools
and methods used as part of a companywide focus on
continuous improvement.
The half-life method is a normative model for predicting the rate of incremental improvement of nonfinancial
performance measures. It can be used both as a diagnostic
tool and to set goals. Because it identifies the fastest rate at
which a measure can be improved using current incremental improvement tools and methods, it is particularly
valuable in testing the achievability of strategically significant balanced-scorecard (BSC) goals without major
process redesign (reengineering) or outsourcing.
Introduction
Consider the following situation. You have identified the
vital few things that you need to do in order to achieve
your strategic objectives, developed actionable metrics,
and determined appropriate time-based goals for each of
them. Now the question is: Can you achieve those goals in
a timely way and thereby feel confident that you have done
everything that you possibly can in order to assure your
strategic success?
Today many organizations have implemented some
form of a balanced scorecard (BSC) that promises to
help them translate their strategy into action. But from
my experience, only a very small fraction of them have
thoughtfully addressed the question of meaningful goals
for each of their scorecard measures: goals whose
achievement will really make a strategic difference.
An even smaller percentage of them have addressed
the issue of which methodology or approach they will
use in order to achieve these goals. That exercise is generally left to those who will be held responsible for their
achievement. There do not appear to be any rigorous
impartial studies that provide evidence in conflict with
my observations.
[Figure 1: a performance-improvement history; the metric is plotted against time, beginning today, and approaches an asymptote.]
and improved limits. In fact, actual performance-improvement histories after process reengineering have a strikingly similar appearance to Fig. 1, but hopefully with a lower asymptote.
To make the economically rational decision whether
to improve, reengineer, or outsource, we need some
way to predict the expected performance improvement
over time.
Serendipity
In 1984, on a flight returning from a study mission to
Japan, I made a remarkable discovery. Like many such
discoveries, this one was also serendipitous. I was reviewing materials from a presentation given by Kenzo Sasaoka, then president of Yokogawa Hewlett-Packard (YHP). YHP had recently won the coveted Deming Prize. One display (see Fig. 2) showed the multiyear efforts of a quality circle working on the reduction of soldering defects in the manufacture of printed circuit boards.
Starting at a defect level of 0.4%, which corresponded on average to two defects on each board produced, the team systematically reduced the defect level to near zero, or so it appeared. Each of the pointing fingers in the figure represented the implementation of a major corrective action resulting from one cycle of their improvement process. Recognizing that their near perfection was a graphical illusion, the team generated the second graph, which measured the defect level in parts per million (ppm).
[Figure 2 charts the defect level falling from 0.4% to roughly 40 ppm and then to 3 ppm over fiscal years 1978-1982, with annotated corrective actions: counter actions I and II, omitting the hand rework process, revision of manufacturing engineering standards, a basic working guide manual, PC board design instructions, and a masking method improvement.]
Figure 2 Process quality improvement. Reduction of soldering defects in a dip soldering process.
[Figure 3: the defect level replotted on logarithmic scales (percent and ppm, 1 to 10,000) against months (0 to 60).]
Half-Life Math
A mathematically inclined reader will immediately recognize that the path of improvement shown in Figs. 1-3 follows an exponential curve described by the formula:

$$m = m_{\min} + (m_0 - m_{\min})\, e^{-a(t - t_0)/t_{1/2}}$$

where:
m = current value of the scorecard metric
m_min = minimum possible value of m
m_0 = initial value of m at t_0
t = current time
t_0 = initial time
t_1/2 = half-life constant
a = ln 2, approximately 0.7
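To see the curve in action, here is a minimal sketch (illustrative function and variable names, not from the article) that evaluates the formula and shows the gap above m_min halving every half-life:

import math

def metric_value(m0, m_min, t, t0, t_half):
    """Scorecard metric at time t under the half-life model: the gap
    between the metric and its floor m_min halves every t_half units."""
    a = math.log(2)  # ~0.7
    return m_min + (m0 - m_min) * math.exp(-a * (t - t0) / t_half)

# A 0.4% defect level with a 5-month half-life and a floor of zero:
for month in (0, 5, 10, 15):
    print(month, round(metric_value(0.4, 0.0, month, 0.0, 5.0), 3))
# Prints 0.4, 0.2, 0.1, 0.05: the level halves every 5 months.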
[Figure 4: sources of solutions to a problem. Alongside systematic problem solving in the real world, the diagram shows the "ins" of the inner self (innovation, insight, invention, inspiration, intuition, instinct, gut feeling), appeals to another world (prayer, clairvoyance), copying from others (imitation, theft, consultation), and avoidance (delegation, negotiation, just waiting, deceiving, hiding, lying, passing-the-buck, confusing, finger pointing, transferring).]
[Figure 5: the 7-step problem-solving method mapped onto the PDCA cycle: identification of theme, data collection and analysis, causal analysis, and solution planning and implementation (Plan/Do, asking what, why, who, when, where, and how); step 5, evaluation of results, worked or failed (Check); and standardization followed by reflection and the next problem (Act).]
[Figure 6: the half-life matrix. Expected improvement half-lives (in months) increase with organizational complexity (low, medium, high) and technical complexity (low, medium, high); the legible cells run from 11 to 22 months (14, 18, and 22 across the high organizational-complexity row). The figure also contrasts the SDCA cycle (purpose: daily control; process: e.g., an 11-step method; metric: control limits, Cpk) with the PDCA cycle (purpose: improvement; process: e.g., the 7-step method; metric: half-life).]
Figure 6 Half-life matrix.
[Figure: translating PDCA cycle performance into a half-life: (% improvement per cycle) / (months per cycle) = % improvement per month. For example, 40% improvement per cycle at one cycle every 4 months gives 10% per month, i.e., 50% in 5 months, a 5-month half-life.]
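The figure's arithmetic is a linear approximation; the exact half-life implied by a given improvement rate per cycle follows from compounding. A minimal sketch (illustrative names, not from the article):

import math

def half_life_months(improvement_per_cycle, months_per_cycle):
    """Exact months for the metric to halve when each PDCA cycle
    removes a fixed fraction of the remaining gap."""
    cycles_to_halve = math.log(0.5) / math.log(1.0 - improvement_per_cycle)
    return cycles_to_halve * months_per_cycle

# 40% improvement per cycle, one cycle every 4 months:
print(round(half_life_months(0.40, 4.0), 1))
# ~5.4 months, close to the figure's back-of-the-envelope 5-month half-life.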
Further Reading
Kaplan, R. S. (1991). Analog Devices: The half-life system.
(Harvard Business School case number 9-190-061, 3/16/90;
revised 7/12/91.). In The Design of Cost Management
Systems (R. Cooper and R. S. Kaplan, eds.), pp. 226-239.
Prentice Hall, Englewood Cliffs, NJ.
Schneiderman, A. M. (1986). Optimum quality costs and zero
defects: Are they contradictory concepts? Quality Prog.
(November), 28.
Schneiderman, A. M. (1988). Setting quality goals. Quality
Prog. (April), 51.
Schneiderman, A. M. (1998). Are there limits to TQM?
Strategy Business (11), 35.
Schneiderman, A. M. (2001). How to build a balanced
scorecard. In Handbook of Performance Measurement
(Mike Bourne, ed.) Gee Publishing, London.
Schneiderman, A. M. http://www.schneiderman.com
Hazards Measurement
Susan L. Cutter
University of South Carolina, Columbia, South Carolina, USA
Glossary
disaster A singular hazard event that results in widespread
human losses or has profound impacts on local environments.
discharge The quantity of water flowing past a point on
a stream or river per some unit time.
exposure models Statistical or analytical models that delineate the probability of risk, its source, the type of risk, and
the geographic extent of danger.
hazard The potential threat to humans (risk) as well as the
impact of an event on society and the environment.
intensity The measure of event severity based on subjective
human experience of it.
magnitude The strength or force of a hazard event.
recurrence intervals The time between events of a given
magnitude or the magnitude range for a specific location.
risk The likelihood or probability of occurrence of a hazard
event.
temporal spacing The sequencing and seasonality of hazard
events.
vulnerability The potential for loss or the capacity to suffer
harm from a hazard event.
Hazard Identification
The identification of hazards poses real challenges because we discover new sources of threats on a daily
basis. For example, the rediscovery of anthrax as
a biological hazard came to light only after the terrorist
attacks of September 11, 2001, despite its longevity as
a known hazard to the agricultural community. Likewise,
a recent federal survey of streams found trace amounts of
antibiotics, steroidal compounds, nonprescription drugs,
and disinfectants, all of which are found in consumer
products. This prompted the Food and Drug Administration (which approves these products for human and
animal use) to begin thinking about the adverse environmental impact of these substances after they have been
excreted from the body (human and animal), thus identifying yet another hazard in the environment.
Two methods are primarily used to identify risks
and hazards: quantitative risk assessment, a procedure
used to identify human health risks from involuntary
exposures to hazardous substances; and environmental indicators, which monitor ecosystem diversity and
Table I  Hazards by Causal Agent

Causal agent: Example

Natural hazards
  Hydrologic: Drought, floods, flash floods
  Atmospheric: Hurricanes (cyclones), tropical storms, tornadoes, severe storms, temperature extremes, lightning
  Seismic: Earthquakes, volcanic eruptions, tsunamis
  Geomorphic: Landslides, mass movements, avalanches, soil subsidence, rockfalls
  Other: Wildfires
Biological agents
  Epidemics: Influenza, cholera, AIDS
  Infestations: Locusts, termites, bees, grasshoppers
  Other: Bioengineered substances, bioterrorism
Social disruptions
  Civil disorders: Ethnic violence, urban riots
  Terrorism: Bombings, chemical/biological weapons, hijackings
  Warfare: Conventional war, weapons of mass destruction
Technological
  Extreme failures: Nuclear accidents, dam failures, industrial explosions
  Common occurrences: Hazardous materials spills, chemical accidents, oil spills
Chronic/globally catastrophic hazards: Pollution, environmental degradation, famine, nuclear war, global environmental change, natural resources depletion

Source: J. T. Mitchell and S. L. Cutter, 1997. Global Change and Environmental Hazards: Is the World Becoming More Disastrous? In Hands On! An Active Learning Module on the Human Dimensions of Global Change. Association of American Geographers, Washington, DC.
health. Traditionally, human health risks have been identified through disease clusters (a large number of outbreaks of illnesses in one location that exceed the
random probability of such events), such as the cancer
clusters found in Love Canal, NY, or Woburn, MA; epidemiological surveys that involve detailed field surveys to
determine the linkage between an environmental contaminant and human disease, such as John Snows nineteenth-century classic study of cholera in London; and
bioassay data, the foundation of toxicological research,
in which chemical exposures in laboratory animals determine dose-response relationships that are then extrapolated to human populations.
For environmental hazards, the most often used
methods for hazard identification are surveillance and
monitoring. Documenting variability in spatial and temporal patterns of environmental indicators is a first step
in identifying environmental hazards. The real-time
detection of seismic events anywhere in the world is now
possible through the global seismic network. The development of WSR-88D Doppler radar has improved our
ability to forecast and track the path of tornadoes, for
example, whereas light airborne detection and ranging
(LIDAR) systems are able to predict the potential extent
of flooding based on digital terrain models or establish the
height of the debris pile from the collapse of New York City's World Trade Center. Improvements in sensor technology and satellites have resulted in better spatial resolution (1 x 1 meter) than in the past, thereby improving the remote monitoring of floods, wildfires, volcanic
eruptions, snow cover, hurricanes and tropical storms,
chemical releases, and nuclear accidents, among others.
These systems have been instrumental in developing better forecasting information, resulting in fewer injuries
and loss of life from environmental hazards.
Magnitude
The sheer strength or force of an event is one characteristic that can be used to compare hazards. For
earthquakes, the Richter scale, developed in 1935, provides a measure of the seismic energy released (in ergs)
from an earthquake, using a logarithmic scale. The magnitude of an earthquake increases 10-fold from one
Richter number to the next, so that an earthquake measured as 7.2 on the Richter scale produces 10 times more
ground motion than a magnitude 6.2 earthquake, but it
releases nearly 32 times more energy. The energy release
best indicates the destructive power of an earthquake.
The Saffir-Simpson hurricane scale (ranging from category 1 to 5) is a measure of hurricane strength based
on maximum sustained winds. In the case of floods, the
magnitude is simply quantified as the maximum height of
floodwaters above some base elevation (mean sea level,
flood stage, or above ground). Snow avalanches also have
a magnitude scale ranging from 1 to 5, based on avalanche
size relative to the avalanche path size. Finally, the volcanic explosivity index (VEI) provides a relative measure
of the explosiveness on a 0-8 scale based on the volume of
materials (tephra) released, eruption cloud heights, explosive energy, and the distance traveled by the ejecta.
For example, the 1980 eruption of Mt. St. Helens is rated
5 on the VEI, whereas the 1815 Tambora, Indonesia,
eruption that killed more than 92,000 people is rated 7
on the VEI.
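The ground-motion and energy comparisons above follow from standard logarithmic scaling: ground motion scales as 10 raised to the magnitude difference, and radiated energy as roughly 10 raised to 1.5 times that difference (the Gutenberg-Richter energy relation). A minimal sketch, not from the original text:

def richter_ratios(m1, m2):
    """Ratios of ground motion (10**dM) and radiated energy
    (10**(1.5*dM)) between two earthquakes dM magnitude units apart."""
    dm = abs(m1 - m2)
    return 10 ** dm, 10 ** (1.5 * dm)

motion, energy = richter_ratios(7.2, 6.2)
print(f"{motion:.0f}x ground motion, {energy:.0f}x energy")  # 10x, ~32x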
Frequency
This characteristic details how often an event of a given
magnitude or intensity occurs. More often than not, the
frequency of events is based on qualitative judgments of
rare or frequent. More quantitative assessments are available in the form of recurrence intervals or frequency of
occurrence (the number of events per number of years in
the period of record) metrics. Flood frequency curves are
generated based on the available history of discharge for
a river or river stretch. For regulatory purposes involving
floodplain management, a commonly used indicator for
flood events is the 100-year flood. This does not mean an
expectation of one large flood every 100 years (a common
misperception) but rather signifies a 1% chance of a flood
with this specific discharge occurring in any given year.
Snow avalanche return intervals are common as well,
but these are normally based on the age of trees on the
avalanche path rather than some arbitrary time frame
such as 100 years.
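The 1%-per-year interpretation is easy to make concrete. Here is a minimal sketch (illustrative only, and assuming independence from year to year) showing why a "100-year" flood is far from a once-a-century guarantee:

def prob_at_least_one(annual_prob, years):
    """Probability of at least one exceedance in a span of years,
    treating years as independent trials."""
    return 1.0 - (1.0 - annual_prob) ** years

# A 100-year flood has a 1% annual chance; over a 30-year span the
# chance of experiencing at least one is roughly 26%.
print(round(prob_at_least_one(0.01, 30), 3))  # 0.26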
Intensity
In addition to magnitude based solely on physical characteristics, intensity offers a metric to gauge the severity of
an event based on the subjective human experience of it.
The modified Mercalli scale (class I to XII) is a measure
of the effects at a particular location. On the modified
Mercalli scale, the intensity of earthquakes is based on
magnitude, distance from the epicenter, building construction, and local geology, all of which contribute to
the damage of structures and human experiences (e.g.,
whether they are felt by people or not). The Saffir-Simpson hurricane scale also has an intensity element
within it, offering qualitative descriptions of probable
property damage and potential flood levels based on
anticipated storm surges. The Fujita scale of tornado intensity is another index used to measure hazard
severity. Based on postevent structural-damage assessments, tornadoes are classified (with approximate wind
speeds) as F0 (minimal tornado) to F5 (catastrophic and
totally devastating). In the case of drought, there are quite
a few drought intensity indices. The most well known in the United States is the Palmer Drought Severity Index.
Duration
Another temporal dimension of environmental hazards
describes how long the event lasts. Some hazard events
have a very short duration measured in seconds to minutes
(earthquakes), whereas others are prolonged events that
can last years to decades (droughts). There are no specific
scales or indices that depict duration.
Speed of Onset
The speed of onset provides a measure of the length of
time between the first appearance of an event and its
peak. Hazards are often described as rapid-onset hazards
(earthquakes and tornadoes) when they offer little or no
opportunity for warnings to get people out of harm's way once the impending signals are received, or slow-onset hazards (soil erosion and drought), which take years to decades to fully develop from their initial appearance. Of course, there is the occasional extreme drought year such as the 1976-1977 drought in California, the Great Plains,
Temporal Spacing
Some hazard events are quite random in their timing,
whereas others have a distinct seasonality (hurricanes and
blizzards) to them. The designation of hazard seasons
(June 1-November 30 for Atlantic basin hurricanes, April-November for tornadoes, winter for blizzards,
and summer for heat) assists in the emergency preparedness, planning, and management of these hazards.
Areal Extent
This is a measure of how much of the geographic area was
affected by the hazard event and is usually described as
large or small. It should not be confused with hazard zones, which are clearly demarcated high-risk areas such as floodplains or barrier islands. Disasters normally have large areal extents, whereas a small tornado or
hazardous material spill has a more localized impact (one
house or one segment of a road). There are no consistent
scales that empirically define the areal extent of impacts
from hazards.
Spatial Dispersion
Hazards are also described and compared based on their
spatial distribution. For example, at a national scale we
can see some cluster of tornado touchdowns in particular
regions (such as the so-called Tornado Alley), yet when
we examine the distributions within this region, they appear to be more random. Tornadoes at smaller spatial
scales may relate more to the nuances of physiography,
whereas at larger scales atmospheric circulation and air
masses are more likely to dictate the spatial dispersion of
tornadoes. Spatial concepts (dispersion, contiguity, density, and concentration) and geographic-scale differences
are useful in examining hazard events because they provide additional information on the distribution of those
types of events in a given area.
Nature of Exposure
The final characteristic of hazards is less indicative of the
physical characteristics of the hazard, but more oriented
toward how individuals and society respond to them. Primarily using the criterion voluntary or involuntary, this
dimension differentiates hazards in which there is some
degree of human intervention in the level of exposure,
either through locational choices (residing in a known
floodplain or on a coastal barrier island), voluntary participation in certain risky activities (sky diving, smoking,
and scuba diving), or consumer preferences for products
Atmospheric Prediction System (NOGAPS) and Geophysical Fluid Dynamics Laboratory (GFDL), improvements in storm path designations have been made. At
the same time, enhancements of some of the cyclone
intensity models have resulted in better estimations of
the wind field strengths in advance of the storm.
Tornado Risk
Tornado risks were first identified visually by people who
happened to see a funnel cloud and called the local
weather station or by designated tornado spotters
whose job was to go out and watch the sky. With the
development of Doppler radar, detection improved
with the ability to identify rotation and air velocity in
parent storms (mesocyclone signatures) that spawn
tornadoes as well as the distinct hook-shaped feature
on the radar. These improvements have increased our
warning times from less than 5 minutes 20 years ago to
more than 10 minutes today.
Flood Events
Flood risk models require precipitation inputs and runoff
estimates. Working in tandem, both the U.S. Geological
Survey (USGS) and the U.S. Weather Service provide
the relevant exposure data for flood events. There are
a variety of rainfall-runoff models that are used for
flood hazards such as the USGS's Distributed Routing Rainfall-Runoff model (DR3M). The most widely used flood damage model is the U.S. Army Corps of Engineers' Hydrologic Engineering Center's Flood Damage
Assessment model (HEC-FDA). Finally, the National
Weather Service has a couple of models that examine
the potentially affected downstream areas from floods
caused by dam failures (DAMBRK and BREACH).
Nonpoint Source Pollution
Developed by the U.S. Department of Agriculture, the
Agricultural Nonpoint Source (AGNPS) pollution suite of
models is a widely used tool to predict soil erosion and
nutrient loadings from agricultural watersheds. This
model captures runoff, sediment, and nutrient transport
(especially nitrogen and phosphorus) inputs in trying to
assess surface water quality.
Seismic Risk
Seismic risks are determined either through mapping of
known surface and/or subsurface faults and by estimates
of ground motion. Both approaches employ historical analogs. Probability approaches that include some indicator of the likelihood of an event or the exceedance of some ground-shaking threshold are the most commonly used.
The Federal Emergency Management Agency (FEMA)
uses effective peak acceleration and effective peak velocity measures to delineate seismic risks.
Data Caveats
There are many federal, state, and local agencies, as well as private-sector organizations, that collect and disseminate data on hazard events and losses. The methods used
vary widely, as do the definitions of losses. The temporal
and geographic coverage is equally variable, rendering
much of these data incompatible with one another. For
example, FEMA compiles statistics on presidential disaster declarations, but does not archive hazard event or
economic loss data that do not qualify for that designation.
Similarly, flood event and loss data are compiled by three
different federal agencies (USGS, National Weather Service, and the U.S. Army Corps of Engineers) for very
different purposes, yet are rarely shared or reconciled.
Natural hazards are cataloged differently than technological hazards. Private insurance companies compile insured loss data, but what about uninsured losses?
Access to data is limited (especially in the private sector)
and is increasingly becoming more so. Along with issues of accuracy, precision, incompatibility, and dissemination come concerns about data maintenance and archiving.
It is often not in the purview of a mission agency to
warehouse historical data on hazard events. The result
Further Reading
Cutter, S. L. (1993). Living with Risk. Edward Arnold,
London.
Cutter, S. L. (ed.) (2001). American Hazardscapes: The
Regionalization of Hazards and Disasters. Joseph Henry
Press, Washington, DC.
Federal Emergency Management Agency (FEMA). (1997).
Multi Hazard Identification and Risk Assessment. Government Printing Office, Washington, DC.
Heinz Center for Science, Economics, and the Environment. (2000). The Hidden Costs of Coastal Hazards: Implications for Risk Assessment and Mitigation. Island Press, Covelo, CA.
Jensen, J. R. (2000). Remote Sensing of the Environment: An
Earth Resource Perspective. Prentice-Hall, Upper Saddle
River, NJ.
Kunreuther, H., and Roth, R. J., Sr. (eds.) (1998). Paying the
Price: The Status and Role of Insurance against Natural
Disasters in the United States. Joseph Henry Press,
Washington, DC.
Monmonier, M. (1997). Cartographies of Danger: Mapping
Hazards in America. University of Chicago Press,
Chicago, IL.
Mitchell, J. T., and Cutter, S. L. (1997). Global change and
environmental hazards: Is the world becoming more
disastrous? In Hands On! An Active Learning Module on the Human Dimensions of Global Change. Association of American Geographers, Washington, DC.
National Research Council. (1999). Reducing Disaster Losses
through Better Information. National Academy Press,
Washington, DC.
National Research Council. (1999). The Impacts of Natural
Disasters: Framework for Loss Estimation. National Academy Press, Washington, DC.
National Research Council. (2000). Ecological Indicators for
the Nation. National Academy Press, Washington, DC.
National Research Council. (2000). Risk Analysis and
Uncertainty in Flood Damage Reduction Studies. National
Academy Press, Washington, DC.
Platt, R. H. (1999). Disasters and Democracy. The Politics of
Extreme Natural Events. Island Press, Washington, DC.
White, G. F. (1994). A perspective on reducing losses from
natural hazards. Bull. Am. Meteorol. Soc. 75, 1237-1240.
Heuristics
Michael D. Mumford
University of Oklahoma, Norman, Oklahoma, USA
Lyle E. Leritz
University of Oklahoma, Norman, Oklahoma, USA
Glossary
correlation A measure of the strength of the relationship
between two measures.
divergent thinking tests Open-ended problem-solving tasks
in which people are asked to generate multiple solutions.
domain An area of work or task performance involving
a shared body of knowledge and skills.
heuristics Strategies for executing the processing operations involved in complex problem solving.
ill-defined problems Situations in which the goals are unclear
and multiple paths to problem solutions are available.
problem construction The cognitive process involved in
structuring, or defining, the problem to be solved.
processes Major cognitive operations that must be executed
during problem solving.
scenario A written description of a problem and its setting.
think aloud A procedure for analyzing problem solving based
on verbalizations during task performance.
Introduction
History
The term heuristics was first applied in the social sciences
some 50 years ago. Initially, this term was used to refer to
the strategies people employed to reduce the cognitive demand associated with certain decision-making tasks. These
strategies involved, for example, satisficing, which refers to people's tendency to use readily available representations
as a basis for framing decision tasks. Means-end analysis
was the term coined to describe a strategy whereby people
work backward from a given goal using trial and error to
identify the operations needed for problem solving.
As interest grew in problem-solving, planning, and
decision-making on the complex, ill-defined cognitive
tasks encountered in the real world (such as planning
the Olympic games, creating a new aircraft, or selecting
an investment portfolio), it became apparent that multiple
solution paths exist that might lead to successful performance. Alternative solution paths, often paths in which
simplification proves useful, allow for multiple alternative
strategies that might contribute to performance. Accordingly, the concept of heuristics was expanded, and the term
is now commonly used to describe both effective and
ineffective strategies people apply in executing complex
cognitive processing operations, although some scholars
prefer to limit use of this term to strategies that simplify
complex cognitive operations.
Objective
Despite the theoretical importance of heuristics, the practical implications of studies of heuristics beg a question:
How is it possible to go about identifying relevant heuristics and measuring their application? The intent here is
to examine the relative strengths and weaknesses of
the various approaches that have been used to identify,
and measure, the heuristics people apply to tasks calling
for complex cognitive processing activities. More specifically, three general approaches that have been applied
are examined: observational, experimental, and psychometric. In examining the methods applied in each of these
three approaches, there is no attempt to provide
a comprehensive review of all pertinent studies. Instead,
the general approach is described and illustrated through
select example studies.
Observational Methods
Naturalistic Observations
Although they are not commonly used to assess differential effectiveness, naturalistic observations are used to
identify the heuristics people apply as they work on
Structured Observations
All unstructured observational techniques suffer, to some
extent, from the problem of selective information presentation in the presence, or assumed presence, of observers.
However, a more critical problem lies in the fact that these
techniques do not capture data about heuristics that remain unarticulated. To address this problem, many
researchers have come to rely on think-aloud protocols.
In this technique, people are asked to verbalize, or think
aloud, as they work on a cognitive task. Transcripts, or
recordings, of these verbalizations are obtained. Heuristics are identified through a content analysis focusing on
verbalizations about strategy selection, justifications for
the actions being taken, and the sequences of actions
described. This approach has been used to identify the
heuristics used by anesthesiologists planning medical operations and to identify the heuristics used by students
working on creative problem-solving tasks.
The widespread use of think-aloud procedures in heuristic identification may be traced to three advantageous
characteristics of this technique. First, protocols can be
obtained for people who presumably are more or
less skilled with respect to the application of relevant heuristics: experts versus novices, successful versus unsuccessful problem solvers, and gifted versus nongifted
students. Comparison of these groups, of course, allows
relatively unambiguous identification of the heuristics
associated with good and poor performance. Moreover, vis-à-vis careful structuring of the conditions of task performance, situational factors moderating heuristic application can be identified. Second, because all groups are
exposed to a common task, or set of tasks, controlled
inferences about heuristic application are possible.
Third, content coding schemes can be tailored to specific
actions on a known set of problems, thereby enhancing
the reliability and validity of assessments.
Although these observations recommend the application of think-aloud procedures in identifying heuristics,
this technique does have limitations. Due to the nature
of verbalization techniques, it is difficult to apply this
approach to highly complex real-world tasks, especially
tasks for which performance unfolds over substantial
periods of time. Moreover, unless multiple, carefully
selected, stimulus tasks are applied, the generality of the conclusions flowing from application of this technique is open to question. Along related lines, it is difficult to collect protocol information for a large number
of people, a characteristic of the technique that limits
power and population inferences. Finally, due to the
complexity of the material obtained, it is difficult to
Experimental Methods
Experimental techniques are used in both the identification of heuristics and the assessments of heuristic application. Experimental techniques base identification and
assessment of heuristics on observable performances on
a particular task or set of tasks. Four general kinds of tasks
are commonly applied in studies of heuristics based on the
experimental approach: optimization tasks, choice tasks,
skill acquisition tasks, and simulation/gaming tasks.
Optimization Tasks
The basic principle underlying optimization tasks is
that departures from an optimal standard can be used to appraise the heuristics associated with subpar performance. Some studies apply a variation on this approach
whereby departures from a theoretical minimum are used
to identify the heuristics that contribute to performance.
Accordingly, application of this approach, an approach
commonly applied in studies of decision-making heuristics, is contingent on the feasibility of determining
theoretical minimums and maximums.
Choice Tasks
Choice tasks, in contrast to optimization tasks, apply relative, rather than absolute, standards to measure heuristics. More specifically, choice tasks measure heuristics by
examining preferences for applying one heuristic over
another as people work on certain cognitive tasks. Assessment is based on the assumption that people will use
preferred heuristics in performing both the task at
hand and in performing other tasks lying in the same
domain. As a result, use of this approach typically requires
prior identification of relevant heuristics. It is applied
when the concern at hand is identifying more or less
useful heuristics. When these conditions are met, however, the ability to link heuristic preferences to performance on a variety of tasks makes this approach an
attractive vehicle for assessing heuristic application.
This approach has been used to assess the heuristics
involved in problem construction, a processing operation
commonly held to play a key role in creative problem-solving. In one study, people were presented with
a short written description of four scenarios in which
the problem at hand could be defined in a number of different ways. People were to review 16 alternative problem definitions and were asked to select the best four
definitions under conditions in which these alternatives
were structured to reflect the tendency to define
problems in terms of (a) goals, (b) procedures, (c) restrictions, and (d) key information. It was found that most
people tended to define problems in terms of key information. However, performance on two complex creative
problem-solving tasks was related to a preference for defining problems in terms of procedures and restrictions.
Thus, people apparently preferred to use a heuristic
that was not one of those contributing to subsequent
performance.
Simulations/Gaming Tasks
The fourth, and final, experimental procedure used in the
identification and assessment of heuristics is simulation or
gaming. In simulation and gaming exercises, heuristics are
Psychometric Methods
In the psychometric approach, heuristics are identified and
assessed with respect to people, and the performance
differences observed among people, rather than with respect to the particular actions observed for a given set of
tasks. This point is of some importance because the psychometric approach tends to emphasize heuristics associated with performance differences, discounting
heuristics that are applied in a similar fashion by all individuals. To elicit these individual differences, psychometric
studies rely on one of two basic techniques: self-report and
tests.
Self-Report
One version of the self-report approach assumes that
people are aware of, and understand the implications of,
the strategies they apply as they execute tasks lying in different domains. Of course, given this assumption, it is possible to identify heuristics simply by asking people about
the heuristics they apply in different endeavors. In accordance with this proposition, in one study, undergraduates
were asked to indicate whether they applied heuristics
such as visualization, step-by-step analysis, and analogies
when working on interpersonal, academic, or daily life
tasks. It was found that people could describe where,
and how frequently, they applied these general heuristics.
This direct reporting approach, however, has proved less
effective when specific heuristics, particularly heuristics
applied nearly automatically, are under consideration.
Testing
In testing, people are not asked to describe heuristics or
the behaviors associated with these heuristics. Instead,
the capability for applying heuristics is inferred based
on people's responses to a series of test items. In one
variation on this approach, the objective scoring approach, test items are developed such that the problems
presented call for certain processing activities. Response
options are structured to capture the application of heuristics linked to effective process application. In one
study, test items were developed to measure selective
encoding, selective comparison, and selective combination. It was found that gifted students differed from
nongifted students in that they were better able to identify
relevant information (selective encoding).
In the subjective scoring approach, problems (typically
open-ended problems) are developed in a way that
a variety of responses might be used to address the problem. Judges are then asked to review people's responses
and assess the extent to which they reflect the application
of certain heuristics. This approach is applied in scoring
the divergent thinking tests commonly used to measure
creativity by having judges evaluate responses with respect to three creative strategies: generating a number
of ideas (fluency), generating unusual ideas (originality),
and generating ideas through the use of multiple
concepts (flexibility). In another illustration of this approach, army officers were asked to list the changes in
people's lives that might happen if certain events occurred
(e.g., What would happen if the sea level rose?). Judges
scored these responses for heuristics such as use of a
longer time frame, application of principles, and a focus
on positive versus negative consequences. Use of longer
time frames and application of principles were found to
be related to both performance on managerial problem-solving tasks and indices of real-world leader performance, yielding multiple correlations in the 0.40s.
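To make the subjective scoring approach concrete, here is a toy sketch (entirely hypothetical data and function names; real scoring relies on trained judges) of the fluency, originality, and flexibility tallies described above:

def divergent_scores(responses, rare_ideas, concept_of):
    """Toy divergent-thinking scoring: fluency counts ideas,
    originality counts ideas judges rated unusual, and flexibility
    counts the distinct concepts the ideas draw on."""
    fluency = len(responses)
    originality = sum(1 for r in responses if r in rare_ideas)
    flexibility = len({concept_of[r] for r in responses})
    return fluency, originality, flexibility

# Hypothetical responses to "list uses for a brick":
responses = ["paperweight", "doorstop", "grind into pigment"]
rare_ideas = {"grind into pigment"}  # stand-in for judges' originality ratings
concept_of = {"paperweight": "weight", "doorstop": "weight",
              "grind into pigment": "material"}
print(divergent_scores(responses, rare_ideas, concept_of))  # (3, 1, 2)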
Conclusions
Clearly, a variety of procedures have been developed for
identification and assessment of the heuristics people use
in working on complex, ill-defined problems. Moreover,
these procedures have proved useful in identifying heuristics, and measuring heuristic application, across
a number of performance domains calling for complex
cognition. What should be recognized, however, is that
all of these approaches evidence certain strengths and
weaknesses. The observational approach has typically
proved most useful in heuristic identification. Experimental and psychometric methods are more commonly used
to identify the impact of heuristics on performance and
measure individual differences in heuristic application.
Along similar lines, certain techniques subsumed under
these three basic methods appear, by virtue of their assumptions, more appropriate than others for addressing some questions and measuring some heuristics.
These observations are of some importance because
they suggest that a comprehensive understanding of the
heuristics involved in a certain type of performance will
require a multimethod, multitechnique approach.
Although the kind of multimethod, multitechnique
studies called for are not a simple undertaking, the evidence indicates that work along these lines will be worth
the time and effort. Not only has identification of the
heuristics involved in high-level cognitive performance
proved critical in theory development, the information
provided by these studies has provided new tools for developing and assessing peoples performance capacities.
Given the foundation of human performance in complex
cognitive activities such as planning, decision making,
and creative problem-solving, it can be expected that
heuristic measurement will become an increasingly
important aspect of social measurement over the course
of the 21st century.
Further Reading
Antonietti, A., Ignazi, S., and Perego, P. (2000). Metacognitive
knowledge about problem-solving methods. Br. J. Educat.
Psychol. 70, 1-16.
Badke-Schaub, P., and Strohschneider, S. (1998). Complex
problem solving in the cultural context. Travail Humain
61, 1-28.
Butler, D. L., and Kline, M. A. (1998). Good versus creative
solutions: A comparison of brainstorming, hierarchical, and
Hierarchical Linear Models

Glossary
analysis of covariance model (ANCOVA) A varying intercept hierarchical linear model with the second-level effect
fixed across groups.
between-unit model The component of a hierarchical linear
model that describes the variability across the groups.
context-level variables Variables defined at the second or
higher level of the hierarchical linear model.
empirical Bayes Using the observed data to estimate
terminal-level hierarchical model parameters.
exchangeability The property of a hierarchical linear model
that the joint probability distribution is not changed by
re-ordering the data values.
expectation-maximization (EM) algorithm An iterative
procedure for computing modal quantities when the data
are incomplete.
fixed effects coefficients Model coefficients that are
assumed to pertain to the entire population and therefore
do not need to be distinguished by subgroups.
hierarchy The structure of data that identifies units and
subunits in the form of nesting.
interaction term A model specification term that applies to
some mathematical composite of explanatory variables,
usually a product.
random coefficients regression model A hierarchical linear
model in which the only specified effect from the second
level is seen through error terms.
random effects coefficients Model coefficients that are
specified to differ by subgroups and are treated probabilistically at the next highest level of the model.
two-level model A hierarchical linear model that specifies
a group level and a single contextual level.
varying intercept model A hierarchical linear model with
only one (noninteractive) effect from the second level of the
model.
within-unit model The component of a hierarchical linear
model that describes variability confined to individual
groups.
Hierarchical linear models (HLMs) are statistical specifications that explicitly recognize multiple levels in
data. Because explanatory variables can be measured at
different points of aggregation, it is often important to
structure inferences that specifically identify multilevel
relationships. In the classic example, student achievement
can be measured at multiple levels: individually, by class,
by school, by district, by state, or nationally. This is not just
an issue of clarity and organization. If there exist differing
effects by level, then the substantive interpretation of the
coefficients will be wrong if levels are ignored. HLMs take
the standard linear model specification and remove the
restriction that the estimated coefficients be constant
across individual cases by specifying levels of additional
effects to be estimated. This approach is also called random effects modeling because the regression coefficients
are now presumed to be random quantities according to
additionally specified distributions.
Essential Description of
Hierarchical Linear Models
The development of the hierarchical linear model (HLM) starts
with a simple bivariate linear regression specification for
individual i:
$$Y_i = \beta_0 + \beta_1 X_i + e_i \qquad (1)$$
$$\begin{bmatrix} \beta_{j0} \\ \beta_{j1} \end{bmatrix} = \begin{bmatrix} 1 & Z_{j0} & 0 & 0 \\ 0 & 0 & 1 & Z_{j1} \end{bmatrix} \begin{bmatrix} \gamma_{00} \\ \gamma_{10} \\ \gamma_{01} \\ \gamma_{11} \end{bmatrix} + \begin{bmatrix} u_{j0} \\ u_{j1} \end{bmatrix} \qquad (10)$$

which is just the vectorized version of Eq. (3). Therefore, it is possible to express Eq. (4) in the very concise form:

$$Y_{ij} = \boldsymbol{\beta}_j' \begin{bmatrix} 1 \\ X_{ij} \end{bmatrix} + e_{ij} \qquad (11)$$
"
bj
bj0
bj1
"
1
0
Zj01
0
Zj02
0
Zj0k0 1
0
0
1
0
Zj11
0
Zj12
g00
g10
..
.
7
6
7
6
7
6
7
6
7
6
7 " #
6
#6
7
6 gk0 10 7
0
uj0
7
6
7
6
Zj1k1 1 6 g01 7
uj1
7
6
6 g11 7
7
6
7
6
..
7
6
.
5
4
gk1 11
12
16
and:
Yij b0j 1
Xij1
Xij2
. . . XijL 0 eij
17
Thus, the HLM in this form allows any number of first- and second-level explanatory variables, as well as differing combinations across contextual levels. Note also that there is no restriction that the number of individual units, n_j, be equal across the contexts (although this can make the estimation process more involved).
The final basic way that the HLM can be made more general is to add further levels of hierarchy with respect to levels. That is, it is possible to specify a third level in exactly the way that the second level was added, by parameterizing the gamma terms according to:

$$\gamma_{pq} = \delta_{0q} + \delta_{1q} W_{pq} + v_{pq}$$

where the p subscript indicates a second level of contexts (p = 1, ..., P), and the q subscript indexes the number of equations (q = 1, ..., Q) specified at this level (analogous to k at the lower level). In this specification, W_pq is a third-level measured explanatory variable and v_pq is the level-associated error term.
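To see how the within- and between-unit pieces fit together, here is a minimal simulation sketch (hypothetical parameter values and variable names; it uses pooled OLS purely to illustrate that the fixed effects surface as an intercept, a Z term, an X term, and a Z x X interaction term, and is not an efficient HLM estimator):

import numpy as np

rng = np.random.default_rng(0)
J, n = 200, 30                              # groups and units per group
g00, g01, g10, g11 = 2.0, 1.5, 0.8, -0.5    # second-level fixed effects

Z = rng.normal(size=J)                      # context-level variable
# Between-unit model: intercepts and slopes vary by group
b0 = g00 + g01 * Z + rng.normal(scale=0.3, size=J)
b1 = g10 + g11 * Z + rng.normal(scale=0.2, size=J)

# Within-unit model: Y_ij = b_j0 + b_j1 * X_ij + e_ij
X = rng.normal(size=(J, n))
Y = b0[:, None] + b1[:, None] * X + rng.normal(size=(J, n))

# Substitution gives Y_ij = g00 + g01*Z_j + g10*X_ij + g11*Z_j*X_ij + error,
# so regressing Y on [1, Z, X, Z*X] recovers the gammas.
Zr = np.repeat(Z, n)
D = np.column_stack([np.ones(J * n), Zr, X.ravel(), Zr * X.ravel()])
print(np.linalg.lstsq(D, Y.ravel(), rcond=None)[0])  # ~[2.0, 1.5, 0.8, -0.5]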
Although these reasons are compelling, it is only relatively recently that hierarchical models have been actively pursued in the social sciences. This is parallel
(and related) to the attachment social scientists have
for the linear model in general. What precipitated the
change was the dramatic improvement in statistical computing that provided solutions to previously intractable
problems. These stochastic simulation tools include the
EM algorithm; Markov chain Monte Carlo techniques
(MCMC), such as the Metropolis-Hastings algorithm;
Further Reading
Goldstein, H. (1995). Multilevel Statistical Models. Edward
Arnold, New York.
Heck, R. H., and Thomas, S. L. (2000). Introduction to
Multilevel Modeling Techniques. Lawrence Erlbaum
Associates, Mahwah, NJ.
Kreft, I., and de Leeuw, J. (1998). Introducing Multilevel
Modeling. Sage Publications, Newbury Park, CA.
Leyland, A. H., and Goldstein, H. (2001). Multilevel Modelling
of Health Statistics. John Wiley & Sons, New York.
Lindley, D. V., and Smith, A. F. M. (1972). Bayes estimates for
the linear model. J. Royal Statist. Soc. B 34, 1-41.
Nelder, J. A. (1977). A reformulation of linear models (with discussion). J. Royal Statist. Soc. A 140, 48-76.
Raudenbush, S., and Bryk, A. S. (1986). A hierarchical model
for studying school effects. Sociol. Educ. 59, 1-17.
Raudenbush, S., and Bryk, A. S. (2002). Hierarchical Linear
Models, 2nd Ed. Sage Publications, Newbury Park, CA.
Reise, S. P., and Duan, N. (2001). Multilevel Models: A Special
Issue of Multivariate Behavioral Research. Lawrence
Erlbaum Associates, Mahwah, NJ.
Smith, A. F. M. (1973). A general Bayesian linear model.
J. Royal Statist. Soc. B 35, 61-75.
Wong, G. Y., and Mason, W. M. (1991). Contextually specific
effects and other generalizations of the hierarchical linear
model for comparative analysis. J. Am. Statist. Assoc. 86, 487-503.
Highway Statistics
Alan E. Pisarski
Independent Consultant, Falls Church, Virginia, USA
Glossary
fatality rate The measure almost universally used in the
United States is fatalities per 100 million miles of travel.
functional classification An engineering classification of
roads based on their function (as opposed to their design
characteristics).
international roughness index (IRI) A calculation of inches
of deflection from the surface per mile of road; e.g., a rating
of 150 equals 150 inches of ruts measured over a mile.
lane-miles The number of miles of route multiplied by the
number of lanes in the road. A four-lane 10-mile road
contains 40 lane-miles.
miles of route The actual length of a system; sometimes
called centerline miles.
pavement serviceability rating (PSR) A subjective professional measure of road quality on a scale of 1 to 5.
person-miles of travel The travel of an individual rather
than a vehicle; two people in a vehicle going 10 miles equals
20 person-miles of travel.
ton-miles of travel The number of miles traveled by trucks
multiplied by the cargo weight carried. One truck carrying
10 tons traveling 10 miles equals 100 ton-miles.
travel time index (TTI) One of several measures of
congestion, but perhaps the most typically used, measuring
the ratio of peak-hour travel to off-peak travel. A TTI of 150
indicates that a 10-minute off-peak trip takes 15 minutes in
the peak time.
vehicle-miles of travel (VMT) The total travel in miles
summed for all vehicles; one vehicle going 10 miles equals
10 VMT.
of these are measurable, and it is through their measurement and description that society recognizes the roles
played by the road system, appreciates the significance
of the road system in daily life, and forms judgments
on future transport needs and expectations. The road-vehicle dyad provides unparalleled mobility and accessibility to the American population.
Introduction
The highway is many things. Clearly, it is a physical
object of considerable scope and dimension. Even if
no vehicle ever traversed a highway, the description
of the attributes and physical characteristics of the entire highway system, just in terms of its dimensions and
design, would be a significant task. But an extraordinary
variety of vehicles do use the system, and their performance, or the performance of the road serving them, is
the object of a broad array of statistical mechanisms.
The highway is an economic and social tool that carries
prodigious quantities of people and goods to fulfill many
economic and social needs; the highway is also a political tool, used to guide economic development to selected areas, to support military logistics, and to create
social and economic interactions among regions to bind
nations together.
As a societal artifact that interacts with the rest of the
world both positively and negatively, in terms of the space
it occupies, the highway is associated with deaths
and accidents and has other social impacts as well. The
vehicle fleet associated with the highway constitutes
a world in itself, a world requiring statistical description
and understanding: something akin to a demography of
the automobile.
System Ownership
According to the U.S. Bureau of the Census, roads in
America are owned by approximately 36,000 units of government, utilizing almost every imaginable system of
government organization for the management of the
roads in the care of different units. Surprisingly, the national government owns few roads; typically these are only
minor roads internal to federal lands. Table I, showing
ownership of miles of road in 2001 for urban and rural
areas, indicates the main entities that own roads in the
United States. Although only approximately 20% of roads
are under state control, these are often the major roads of
the road system in terms of design characteristics and
volume of use. The greatest portion of roads is under
local control, including county, town, township, or municipal ownership and also including Indian tribes and
public authorities especially created to operate roads.
Over 3000 counties in America own more than 1.7 million
miles of road, almost 45% of the total system in the country. Only about 3% of the road system is under federal
control. The mileage of federal roads has tended to
decline as roads on public lands are declassified or reclassified. It is also important to note that the road system is
predominantly rural in location, with almost 80% of roads
in rural areas.
Table I  Ownership of Roads

                     Rural                   Urban                   Total
Jurisdiction   Mileage     Percent     Mileage    Percent     Mileage      Percent
State            665,093     21.7        109,136    12.4         774,229     19.6
Local          2,286,969     74.5        765,633    87.3       3,052,602     77.3
Federal          119,270      3.9          2,234     0.3         121,504      3.1
Total          3,071,332    100.0        877,003   100.0       3,948,335    100.0
Bridges
A system as large as the highway system of the United
States inevitably would have a large number of bridges.
Because of safety concerns, the number, characteristics,
and condition of bridges are carefully monitored under
federal law. The length threshold for bridges that are
monitored is 20 feet. In 2000, there were somewhat
more than 587,000 bridges in the National Bridge Inventory distributed functionally as shown in Table III. Like
the road system, the number of bridges grows slowly, with
about a 1% increase since 1996.
Bridge Condition
The bridges in Table III are grouped by functional class
and area. The National Bridge Inventory reviews
each bridge at least every 3 years and exhaustively
Table II  Pavement Rating Systems(a)

Rating       PSR(b)      IRI(c)
Very good    >=4.0       <60
Good         3.5-3.9     60-94
Fair         2.6-3.4     95-170
Mediocre     2.1-2.5     171-220
Poor         <=2.0       >220

a The ranges shown are for roads in general; interstate roads have a more stringent standard, with anything worse than an IRI of 170 deemed poor (an IRI of 170 equals 170 inches in depth of cracks per mile). Data from Condition and Performance Report, Federal Highway Administration (2002).
b PSR, Pavement serviceability rating.
c IRI, International roughness index.
Table III  Bridges by Functional Class

                  Rural bridges    Urban bridges
Interstate             27,797           27,882
Other arterial         74,796           63,177
Collector             143,357           15,038
Local                 209,415           25,684
Total                 455,365          131,781
different sizes, as measured in terms of population, industrial output, attractiveness, etc., then the average circuity weighted by that measure would be quite different from an average derived assuming that all nodes were equal. Early transportation modeling used
a formula related to a gravitation equation that weighted
travel by the size of areas and the distances between
them.
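A minimal sketch of that gravity-style weighting (illustrative only; the constant k and the distance exponent are calibration parameters, not values from the text):

def gravity_interaction(size_i, size_j, distance, k=1.0, beta=2.0):
    """Gravity-model estimate of interaction between two nodes:
    proportional to the product of their sizes and inversely
    proportional to a power of the distance between them."""
    return k * size_i * size_j / distance ** beta

# Two cities of 1.0 and 0.5 million people, 100 miles apart:
print(gravity_interaction(1.0e6, 0.5e6, 100.0))  # 50000000.0 (arbitrary trip units)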
Another way in which to measure the robustness of the
system would be to conduct hypothetical exercises in
which a link is deleted; for instance, a bridge failure
can be simulated and the effects on the circuity of travel
measured. This measures the redundancy incorporated in
the system and may be an important measure of network
effectiveness, in a military or emergency situation, for
example.
Function
To understand the true nature and use of the road network as a system, it must be described based on function.
A functional classification system has evolved over the
years to provide a description of roads according to the
function they perform in the overall highway system.
There are three major groupings: (1) arterials, which
are designed to serve longer distance travel, (2) collectors,
which provide an intermediate function between local
streets and long distance arterials, and (3) local roads,
which primarily serve adjacent properties. As can be
seen from Table IV, local roads that serve houses, factories, businesses, and farms constitute the largest share
(more than two-thirds) of roads. All arterials constitute
only about 11% of the road system, with the collectors
making up the remainder. In urban and rural areas, there
are subcategorizations within the three main functional
groupings to permit more detailed description of specialized functions. The 46,000-mile-long interstate system
constitutes the highest level of the functional system;
although it accounts for only slightly more than 1% of
Table IV  Percent of Total Road Mileage by Functional System(a)

Functional system              Rural      Urban      Total
Interstate                      0.8%       0.3%       1.2%
Other freeways/expressways      NA(b)      0.2%       0.2%
Other principal arterials       2.5%       1.4%       3.9%
Minor arterials                 3.5%       2.3%       5.8%
Major collector                11.0%       NA        11.0%
Minor collector                 7.0%       NA         7.0%
Collector                       NA         2.3%       2.3%
Local                          53.5%      15.1%      68.6%
All                            78.4%      21.6%     100.0%

a Data from Highway Statistics, Federal Highway Administration (2002).
b NA, Not applicable.
Table V  The National Highway System: Mileage and Travel

                       Mileage                     Travel (millions of VMT)       Share of total VMT (%)
System          Rural     Urban     Total       Rural      Urban      Total       Rural   Urban   Total
Interstate      32,910    13,424    46,334      252,317    377,840    630,157      9.6    14.3    23.9
Other NHS       85,616    28,143   113,759      214,824    315,243    530,067      8.1    11.9    20.1
Total NHS      118,526    41,567   160,093      467,141    693,083  1,160,224     17.7    26.2    43.9
VMT Milestones

Year           VMT
1900           100 million
1905           1.24 billion
1924           105 billion
1936           0.25 trillion
1952           0.50 trillion
1962           0.76 trillion
1967           0.96 trillion
1968           1.0 trillion
1988           2.0 trillion
2000           2.75 trillion
2005 (est.)    3.0 trillion
[Untitled table: rural/urban/total percentages by functional system: Interstate 39%/46%/43%; other freeways/expressways NA/43%/43%; other principal arterials 49%/22%/31%; minor arterials 10%/34%/24%; major collector 12%/NA/12%; minor collector 17%/NA/17%; collector NA/33%/33%; local 29%/24%/26%; total 27%/33%/30%.]
Highway Statistics
Description
Describes primarily free-flow operations; vehicles are almost completely unimpeded in their
ability to maneuver within the traffic stream and the effects of incidents are easily absorbed
Also represents reasonable free-flow in which speeds are generally maintained; ability to
maneuver is only slightly restricted
Flow still at or near free-flow speeds; ability to maneuver restricted and tension increases; incidents
cause local deterioration of service
Speeds begin to decline with increasing volumes; freedom noticeably restricted; minor incidents
cause queuing
Operations at or near capacity; operations volatile, with any disruption causing waves of reaction;
both physical and psychological comfort is extremely poor
Breakdown in flow; queues formed and lower volumes of flow produced
B
C
D
E
F
a
221
Data abstracted by the author from the Highway Capacity Manual 2000, Transportation Research Board (2000).
[Figure: Indexes of growth in vehicles, drivers, miles of road, and VMT, 1950-2000 (1950 = 100). Vertical axis: Index, 0-700; horizontal axis: Year.]
Indexes of growth (1950 = 100):

Year    Vehicles   Drivers   Miles of road   VMT
1950     100        100        100           100
1955     127.4      120.1      103.2         132.169
1960     150.2      140.4      107.0         156.853
1965     183.7      158.4      111.4         193.758
1970     220.3      179.3      112.6         242.187
1975     270.3      208.7      115.8         289.764
1980     316.7      233.6      116.5         333.326
1985     349.0      252.3      116.6         387.342
1990     383.7      268.5      116.7         467.983
1995     409.6      283.9      118.1         528.765
2000     458.9      307.1      118.8         600.131
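Such indexes are simply each year's value divided by the base-year value, times 100. A minimal sketch (the raw VMT inputs below are rough approximations for illustration, not source data):

    # Rebase a time series so the base year equals 100, as in the table above.
    def to_index(series, base_year):
        base = series[base_year]
        return {year: round(100.0 * v / base, 1) for year, v in series.items()}

    vmt_trillions = {1950: 0.458, 1960: 0.719, 1970: 1.110, 1980: 1.527}
    print(to_index(vmt_trillions, 1950))
    # {1950: 100.0, 1960: 157.0, 1970: 242.4, 1980: 333.4}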
routes, etc.; the most direct user measure is, for instance,
average travel time. This information is collected by the
Bureau of the Census for work trips and by the National
Household Travel Survey (NHTS) of the FHWA for all
trips. These averages can improve while the facility-based
measures decline. The census measured an increase in
work trip travel times of only 40 seconds from 1980 to
1990 but observed a 2-minute increase from 1990
to 2000.
There are many other measures of congestion that express various aspects of congestion's attributes; the major approaches and their purposes are summarized in Table IX.

Table IX Measures of Congestiona

Measure/approach                  Purpose
Average speed                     Most direct measure of changes in facility service
[Travel time index]               Depth measure; measure of loss of time in the peak (1.40 = 40% more travel time)
[Annual delay]                    Depth extent measure; accumulated loss of time for the year; provides a cost value
Percentage of system congested    Extent measure; percentage of the road system affected by congestion
Percentage of travel congested    Extent measure; e.g., 40% of total travel was under congested conditions
Buffer or reliability index       [Reliability measure; extra time added in anticipation of variability in expected travel time]

a Data from Condition and Performance Report, Federal Highway Administration (2002), and Urban Mobility Report 2003, Texas Transportation Institute (2003). Bracketed entries reconstruct cells garbled in extraction.
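To make two of these measures concrete, the sketch below computes a travel time index and a buffer index from invented travel-time samples; the (95th percentile - mean)/mean formulation of the buffer index is one common convention, and all numbers are illustrative.

    import statistics

    def travel_time_index(peak_minutes, free_flow_minutes):
        """1.40 means the peak trip takes 40% longer than under free flow."""
        return peak_minutes / free_flow_minutes

    def buffer_index(samples_minutes):
        """(95th percentile - mean) / mean: the share of extra time to budget
        to arrive on time on roughly 19 trips out of 20."""
        times = sorted(samples_minutes)
        mean = statistics.fmean(times)
        p95 = times[min(len(times) - 1, round(0.95 * (len(times) - 1)))]
        return (p95 - mean) / mean

    peak_trips = [30, 32, 31, 35, 40, 33, 55, 34, 31, 36]   # minutes, invented
    print(f"TTI: {travel_time_index(35.7, 25.5):.2f}")       # ~1.40
    print(f"buffer index: {buffer_index(peak_trips):.0%}")   # ~54%

One rare 55-minute trip in the sample drives the buffer index far above the average-based measures, which is precisely the unreliability such indexes are designed to capture.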
Concept
- Are the origins and destinations linked by the transportation system?
- Do the origins and destinations have physical barriers for part of the population?
- Do the origins and destinations have economic barriers for part of the population?
- Is the transportation system available when people want to travel?
- How long does it take to get to the desired destination under the best conditions?
- What is the usual expected time to get to the destination?
- How much time do people add for a trip or shipment in anticipation of variability in expected travel time to reach the destination when they need to be there?
- How much longer than the best travel time does the usual or scheduled trip take because of routing, speed restrictions, expected delay, etc.?
- How late are people and goods when buffer time is exceeded?
- How often do trips and shipments reach their destinations by the usual expected time?
- How many travelers and shipments can be moved over the transportation system under optimum conditions?
- When and where is the number of trips and shipments actually accommodated less than the maximum possible because capacity has been exceeded by the volume of traffic?

Data from R. R. Schmitt, presentation to the North American Statistical Interchange, April 2002.
There is an extraordinary number of ways in which highways interact with society, and each has significant ramifications. The impacts include both positive and negative attributes and can be arrayed into three broad categories: economic, social, and safety/energy/environmental. Although the impacts of roads on the land began well before the United States was a nation, the most important impacts of highways ultimately derive from the rise of the internal combustion engine.
Vehicles
Number of Vehicles
A key measure of the relationship between vehicles, highways, and society is the number of vehicles/household or vehicles/1000 population. In 1910, at the start of the auto age, there were 200 people/vehicle in the United States; 5 years later, this was already down to 40 people/vehicle. By the 1930s, the entire population could be seated in the front and back seats of all of the existing vehicles, and by the 1950s, the entire population could be accommodated in the front seats alone. In 2001, the total motor vehicle fleet exceeded 230 million vehicles; with a population on the order of 285 million, this was a comfortable 1.24 people/vehicle. The vehicle fleet now exceeds not only the number of drivers in the population but the total adult population. In many recent decades, vehicle production has exceeded population increase.
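A quick check of the arithmetic behind these ratios, using the rounded figures cited above:

    # People-per-vehicle and vehicles-per-1000-population, 2001 (rounded inputs).
    population = 285_000_000        # approximate U.S. population, 2001
    vehicles = 230_000_000          # approximate motor vehicle fleet, 2001
    print(f"people per vehicle: {population / vehicles:.2f}")            # ~1.24
    print(f"vehicles per 1000 population: {1000 * vehicles / population:.0f}")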
Other measures of the vehicle/population relationship include the percentage of persons over 16 years of age with driver's licenses, the average number of vehicles per household, the distribution of households by vehicles owned or available for use, and the number of households with no vehicles. Among adults, the United States is near saturation regarding licenses; the same can be said of households with respect to the total number of vehicles. However, these levels are significantly lower among minority groups, which will be a significant source of future growth in vehicle acquisition and travel miles. A key statistical measure of mobility is the number and percentage of households without vehicles in a society highly oriented to mobility. Surprisingly, the number of households without a vehicle has remained at about 10 million for 40 years, and the number of households with one vehicle has remained at about 30 million; both are, of course, a significantly declining percentage of all households. All growth has occurred in the two- and three-vehicle households. Figure 3 depicts this trend over the past 40 years.
[Figure 3 Households by number of vehicles available (0, 1, 2, 3, and total), 1960-2000. Vertical axis: number of households (x10^3); horizontal axis: year.]

Number of households (x10^3) by year:

Vehicles   1960     1970     1980     1990     2000
0          11,400   11,110   10,400   10,600   10,861
1          30,190   30,300   28,600   31,000   36,124
2          10,100   18,600   27,400   34,400   40,462
3           1,300    3,500   14,100   16,000   18,033
Total      52,999   63,500   80,500   92,000  105,480
Another key measure to monitor is the share of total household spending going to transportation. On average, American households spend about 19% of their total expenditures on transportation, or about $7600 per year in 2001, with both the amount and the percentage of spending rising with increasing incomes. Of this amount, more than 96%, all but about $300, is oriented to vehicles and road travel. The percentage of expenditures going to transportation has risen slightly over the decades.
Human Purposes
In considering local road travel, particularly when associated with congestion, the tendency is to think of work travel. In fact, work travel is a small (about 20%) and declining share of all travel. The share of personal travel accommodated by the road system is prodigious. For work trips, the personal auto accounts for over 92% of trips, with transit and other modes accounting for less than 4% each. Even the